Improving to_frames() implementation #1578
Merged
+490
−193
Previously, the default syntax for frame views, `dataset.to_frames()`, would check for parallel directories of frames on disk for each video and synchronously use `ffmpeg` to sample any non-existent frames when creating the view.

For example, videos with the following paths would be sampled (if necessary) as follows:
There were numerous problems with this approach:

- `DatasetView`s should defer computation until sample access time.
- `ffmpeg` may fail to extract the last X% of frames of a video. The previous implementation lacked a good mechanism for continuing gracefully after such failures without having `dataset.to_frames()` always retry and re-fail to sample these uncomputable frames.

This PR modifies the default behavior of `dataset.to_frames()` to instead assume that the user has already sampled the frames offline and stored their locations in a `filepath` field of each `Frame` of their video dataset. Frames that do not have a `filepath` populated (e.g., uncomputable ones) are omitted from the returned frames view.

One can still ask FiftyOne to do the work of sampling frames via `dataset.to_frames(sample_frames=True)`. Moreover, when sampling frames via this syntax:

- `filepath` fields will be automatically populated on the input dataset/collection so that the default syntax `dataset.to_frames()` can subsequently be used

Example usages:
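The snippet below is not taken from this PR and does not call FiftyOne. It is a minimal plain-Python model of the new default behavior described above, where frames are visited lazily and any frame without a populated `filepath` is left out of the resulting view. The `Frame` and `VideoSample` classes, the `iter_frames_view()` helper, and all paths are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class Frame:
    # Hypothetical stand-in for a video frame. `filepath` stays None until
    # the frame image has been sampled and its location recorded.
    filepath: Optional[str] = None


@dataclass
class VideoSample:
    # Hypothetical stand-in for a video sample keyed by 1-based frame number.
    filepath: str
    frames: Dict[int, Frame] = field(default_factory=dict)


def iter_frames_view(
    samples: Iterable[VideoSample],
) -> Iterator[Tuple[str, int, str]]:
    """Lazily yield (video_path, frame_number, frame_path) tuples.

    Frames whose `filepath` is not populated are skipped, which mirrors the
    omission rule described in the text. No work happens until iteration.
    """
    for sample in samples:
        for frame_number in sorted(sample.frames):
            frame = sample.frames[frame_number]
            if frame.filepath:
                yield sample.filepath, frame_number, frame.filepath


# Illustrative data: frame 3 stands for a frame that could not be extracted.
video = VideoSample(
    filepath="/tmp/example/video.mp4",
    frames={
        1: Frame(filepath="/tmp/example/frames/000001.jpg"),
        2: Frame(filepath="/tmp/example/frames/000002.jpg"),
        3: Frame(),
    },
)

view = list(iter_frames_view([video]))
for video_path, frame_number, frame_path in view:
    print(video_path, frame_number, frame_path)
```

Because the helper is a generator, nothing is computed until the view is iterated, which is the deferral property the first bullet above asks of a `DatasetView`. Under this model, a frame that repeatedly fails extraction simply never gains a `filepath`, so it is skipped rather than retried.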