Improving to_frames() implementation #1578

brimoor · 2022-02-02T00:33:08Z

Previously, the default syntax for creating frame views, dataset.to_frames(), would check for a parallel directory of frames on disk for each video and synchronously use ffmpeg to sample any missing frames when creating the view.

For example, videos with the following paths

/path/to/video1.mp4
/path/to/video2.mp4
...

would be sampled (if necessary) as follows:

/path/to/video1/
    000001.jpg
    000002.jpg
    ...
/path/to/video2/
    000001.jpg
    000002.jpg
    ...
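
The layout above amounts to a simple path mapping. The following is a minimal sketch of that mapping, not FiftyOne's actual code: the helper name `frame_path` is hypothetical, and the six-digit zero-padded `.jpg` pattern is inferred from the listing.

```python
import os


def frame_path(video_path, frame_number, ext=".jpg"):
    """Return the path of a sampled frame in the video's parallel directory.

    Hypothetical helper: the zero-padded pattern is inferred from the
    directory listing above, not taken from the FiftyOne source.
    """
    # "/path/to/video1.mp4" -> "/path/to/video1"
    frames_dir = os.path.splitext(video_path)[0]
    return f"{frames_dir}/{frame_number:06d}{ext}"


print(frame_path("/path/to/video1.mp4", 1))  # /path/to/video1/000001.jpg
```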

There were numerous problems with this approach:

Sampling frames is expensive, which violates the notion that DatasetViews should defer computation until sample access-time.
In practice, sampling frames rarely goes as expected for non-H.264/H.265 streams. For example, ffmpeg may fail to extract the last X% of frames of a video. The previous implementation had no good mechanism for continuing gracefully after such failures, so every call to dataset.to_frames() would retry, and fail again, to sample these uncomputable frames.

This PR modifies the default behavior of dataset.to_frames() to instead assume that the user has already sampled the frames offline and stored their locations in a filepath field of each Frame of their video dataset. Frames that do not have a filepath populated (e.g., uncomputable ones) are omitted from the returned frames view.
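
The omission rule can be illustrated with a toy model. The sketch below does not call FiftyOne. It assumes a video's frames are represented as a plain dict mapping frame numbers to dicts of frame fields, and `frames_with_filepaths` is a hypothetical name chosen for illustration.

```python
def frames_with_filepaths(frames):
    """Return the (frame_number, filepath) pairs a frames view would contain.

    Toy model only: `frames` maps frame numbers to dicts of frame fields.
    """
    return [
        (frame_number, fields["filepath"])
        for frame_number, fields in sorted(frames.items())
        if fields.get("filepath")  # missing or empty filepath: frame is skipped
    ]


video_frames = {
    1: {"filepath": "/path/to/video1/000001.jpg"},
    2: {"filepath": None},  # e.g. a frame that could not be extracted
    3: {"filepath": "/path/to/video1/000003.jpg"},
}

# Frame 2 has no filepath, so only frames 1 and 3 are returned
print(frames_with_filepaths(video_frames))
```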

One can still ask FiftyOne to do the work of sampling frames via dataset.to_frames(sample_frames=True). Moreover, when sampling frames via this syntax:

The frame filepath fields will be automatically populated on the input dataset/collection so that the default syntax dataset.to_frames() can subsequently be used.
Sampling failures are now (by default) gracefully logged rather than raising a fatal exception.
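
The "log rather than raise" behavior follows a common pattern, sketched below in plain Python. This is an illustration under stated assumptions, not the code in this PR: `sample_all`, `sample_fn`, and the `skip_failures` flag are hypothetical names, and `fake_sampler` stands in for a real ffmpeg call.

```python
import logging

logger = logging.getLogger(__name__)


def sample_all(video_paths, sample_fn, skip_failures=True):
    """Apply `sample_fn` to each video, logging failures instead of raising.

    All names here are hypothetical. Returns a dict mapping each
    successfully sampled video path to the result of `sample_fn`.
    """
    results = {}
    for video_path in video_paths:
        try:
            results[video_path] = sample_fn(video_path)
        except Exception as e:
            if not skip_failures:
                raise
            # Record the failure and move on to the next video
            logger.warning("Failed to sample '%s': %s", video_path, e)
    return results


def fake_sampler(video_path):
    # Stand-in for a real sampling call; fails for one specific input
    if "bad" in video_path:
        raise RuntimeError("could not decode video")
    return 3  # pretend three frames were sampled


sampled = sample_all(["good.mp4", "bad.mp4"], fake_sampler)
print(sampled)  # {'good.mp4': 3}
```

Passing `skip_failures=False` in this sketch restores the fail-fast behavior, which can be useful when a failure should halt a pipeline.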

Example usages:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

# This syntax is currently undocumented
# Use `(video_path, frame_number)` rather than sampling frames
frames = dataset.to_frames(sample_frames="dynamic")
print(dataset.count("frames"))
print(len(frames))

# Sample some frames
# This will store the sampled frame image paths in `filepath` frame fields
frames = dataset.to_frames(sample_frames=True, fps=1, verbose=True)
print(dataset.exists("frames.filepath").count("frames"))
print(len(frames))

# Now sample some different frames
people = dataset.filter_labels("frames.detections", F("label") == "person")
clips = people.to_clips("frames.detections")
frames = clips.to_frames(sample_frames=True, fps=1, verbose=True)
print(len(frames))

# Sample all remaining frames
frames = dataset.to_frames(sample_frames=True, verbose=True)
print(dataset.exists("frames.filepath").count("frames"))
print(dataset.count("frames"))
print(len(frames))

# Now let's actually use the new default behavior, which assumes `filepath` is
# already populated on each frame document
frames = dataset.to_frames()
print(dataset.count("frames"))
print(len(frames))

# Sub-sampling works with the default syntax too
frames = dataset.to_frames(fps=1)
print(len(frames))

# Frames without filepaths are not included in frames views
# To simulate this, we'll randomly clear some filepaths
dataset.match_frames(F.rand() < 0.1).clear_frame_field("filepath")
frames = dataset.to_frames()
print(dataset.count("frames"))
print(dataset.exists("frames.filepath").count("frames"))
print(len(frames))

# If frame documents are missing, these frames are not included in frames views
# by default because no filepaths are available
dataset.match_frames(F("detections.detections").length() > 10).keep_frames()
frames = dataset.to_frames()
print(dataset.count("frames"))
print(len(frames))

# No sampling will happen here because we already sampled all frames
frames = dataset.to_frames(sample_frames=True, sparse=True, verbose=True)
print(dataset.count("frames"))
print(len(frames))

# All frames exist on disk, but frame documents need to be created for the
# frames that we deleted previously so that filepaths can be stored again
frames = dataset.to_frames(sample_frames=True, verbose=True)
print(dataset.count("frames"))
print(len(frames))

# All frames have filepaths stored again, so the default syntax includes all
# frames in the frames view
frames = dataset.to_frames()
print(dataset.count("frames"))
print(len(frames))

# `to_frames()` fails gracefully by default, even when the videos do not exist
dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="non-existent1.mp4"),
        fo.Sample(filepath="non-existent2.mp4"),
        fo.Sample(filepath="non-existent3.mp4"),
        fo.Sample(filepath="non-existent4.mp4"),
        fo.Sample(filepath="non-existent5.mp4"),
    ]
)
view = dataset.to_frames()
view = dataset.to_frames(sample_frames=True, verbose=True)

ehofesmann

LGTM! Big fan of this new default behavior.

brimoor added 13 commits January 16, 2022 22:57

docstring updates

f0744c2

Merge branch 'develop' into to-frames

1903f91

Merge branch 'develop' into to-frames

309fd2c

store frame paths in a Frame.filepath field

11f7186

fixing frame clearing bug

513301b

finishing implementation

1fd3cc5

updating docs and tests

d12782b

Merge branch 'develop' into to-frames

81b350b

linting

630145b

docs tweaks

90ca1bd

bug fixes

d75f813

more bugs

57c6e6c

graceful

75fe3b2

brimoor added the enhancement label Feb 2, 2022

brimoor requested a review from a team February 2, 2022 00:33

brimoor self-assigned this Feb 2, 2022

brimoor added 2 commits February 1, 2022 23:46

finalizing implementation

4c9b2e5

documenting view parameters on SampleCollection

e0bbf64

ehofesmann approved these changes Feb 5, 2022

brimoor merged commit 33d752c into develop Feb 6, 2022

brimoor deleted the to-frames branch February 6, 2022 17:27
