Enable video_reader backend #220
Conversation
Branch …_28_compile_torchvision updated from 794b413 to ae0d5c9.
These parameters and their values are specified in the BENCHMARKS dict.

All of these benchmarks are evaluated within different timestamps modes corresponding to different frame-loading scenarios:
- `1_frame`: 1 single frame is loaded.
Could you explain what you are trying to achieve with these variations? I'd want to know why I shouldn't add `2_frames_6_spaces`, why `6_frames` tests something fundamentally different to `2_frames`, and why you didn't also do `20_frames`.
These were already present in the script and were done by Rémi, I just added a bit of documentation.
I think the idea is to have different common scenarios that can reflect a typical workload during training (e.g. with delta_timestamps), but I can't answer as to why these values specifically.
@Cadene care to shed some light?
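For context, `delta_timestamps` is what makes several frames get decoded per sample during training; a rough illustration (the key and offsets below are made up for the example, they are not the benchmark's values):

```python
# Illustrative only: a delta_timestamps dict asking for 2 frames per sample,
# one at the current timestamp and one 1/30 s earlier (the previous frame at 30 fps).
delta_timestamps = {"observation.images.top": [-1 / 30, 0.0]}
```

This is presumably the kind of access pattern the `2_frames`-style modes are meant to mimic.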
It's arbitrary based on possible future usage.
Okay then nit: if it were me, I'd add Remi's statement as a comment.
Done 0f1986e
lerobot/common/datasets/_video_benchmark/run_video_benchmark.py
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
@aliberts Thanks for this great PR that benchmarks pyav versus video_reader, plus other very needed additions. Could you double check that video_reader is faster than pyav?
Please ping for a second review ;) Thanks!
The backend can be either "pyav" (default) or "video_reader".
"video_reader" requires installing torchvision from source, see:
https://github.com/pytorch/vision/blob/main/torchvision/csrc/io/decoder/gpu/README.rst
(note that you need to compile against ffmpeg<4.3)
Worth mentioning that we expect video_reader to be faster (or slower?) than pyav, and pointing to the benchmark README.
Fixed bf3dbbd (I'll do a proper benchmark and link to it in a future PR)
(note that you need to compile against ffmpeg<4.3)
+ While both use cpu, "video_reader" is faster than "pyav" but requires additional setup.
+ See our benchmark results for more info on performance:
+ https://github.com/huggingface/lerobot/pull/220
+ See torchvision doc for more info on these two backends:
+ https://pytorch.org/vision/0.18/index.html?highlight=backend#torchvision.set_video_backend
Note: Video benefits from inter-frame compression. Instead of storing every frame individually,
Already on it in #282, I'll go with something like
LGTM
What this does
This enables torchvision's (still experimental) `video_reader` backend for faster video decoding. It adds a `video_backend` option in the config and as a `LeRobotDataset` (and `MultiLeRobotDataset`) argument to select between `pyav` and `video_reader` (defaults to `pyav` as before).
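As a usage sketch (the import path and repo id below are assumptions for illustration; the new argument itself is what this PR adds):

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# "lerobot/aloha_sim_transfer_cube_human" is just an example repo id.
# video_backend="video_reader" requires the torchvision source build described below;
# omit it (or pass "pyav") to keep the default decoder.
dataset = LeRobotDataset(
    "lerobot/aloha_sim_transfer_cube_human",
    video_backend="video_reader",
)
item = dataset[0]  # frames are decoded with the selected backend
```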
How it was tested

Tested with `lerobot/common/datasets/_video_benchmark/run_video_benchmark.py`. Quality metrics (`avg_per_pixel_l2_error`, `avg_psnr`, `avg_ssim`, `avg_mse`) are identical between the two backends. Loading time is generally improved by a factor of ~1.5.

[Benchmark result tables for the `1_frame`, `2_frames`, `2_frames_4_space`, and `6_frames` modes, comparing the two backends.]
I also did a run to reproduce pretrained act on the aloha transfer task with the `video_reader` backend.
The full run is available on wandb here.
How to checkout & try? (for the reviewer)
You first need to compile torchvision from source. Original instructions are available there, but I'll recap here as some of it is not up to date:
- Set the `TORCHVISION_INCLUDE` environment variable to the location of the video codec headers (`nvcuvid.h` and `cuviddec.h`), which would be under the `Interface` directory.
- Set the `TORCHVISION_LIBRARY` environment variable to the location of the video codec library (`libnvcuvid.so`), which would be under the `Lib/linux/stubs/x86_64` directory.
- Set the `CUDA_HOME` environment variable to the cuda root directory.

You can do all of these at once, assuming you have unzipped the codec SDK in `$HOME` and that your cuda home is at `/usr/local/cuda`, which is generally where it's at (see the sketch below).
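The exact command isn't reproduced here; as a rough sketch of what it needs to accomplish (the SDK folder name is hypothetical, and plain shell exports of the same three variables work just as well):

```python
# Sketch only: set the variables the torchvision source build reads.
# "~/Video_Codec_SDK" is a placeholder for wherever you unzipped the codec SDK.
import os

sdk = os.path.expanduser("~/Video_Codec_SDK")
os.environ["TORCHVISION_INCLUDE"] = f"{sdk}/Interface"                # nvcuvid.h, cuviddec.h
os.environ["TORCHVISION_LIBRARY"] = f"{sdk}/Lib/linux/stubs/x86_64"   # libnvcuvid.so
os.environ["CUDA_HOME"] = "/usr/local/cuda"

# A build launched from this process (e.g. via subprocess) inherits these variables;
# if you build torchvision from a shell instead, export the same three variables there.
```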
Install ffmpeg: `conda install -c conda-forge "ffmpeg<4.3"`

Then install the project dependencies from `pyproject.toml`:
`poetry lock --no-cache --no-update && poetry install --sync --all-extras`

If you don't use poetry, you can simply do this instead (add the extras you need): `pip install .`
You can then check that `video_reader` works with this (it shouldn't raise any error):
`python -c "import torchvision; torchvision.set_video_backend('video_reader');"`
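To go one step beyond the import check, a small sketch that actually decodes a frame with the new backend (the mp4 path is a placeholder for any local video file):

```python
# Sketch: confirm the video_reader backend can decode frames, not just import.
# "episode_0.mp4" is a placeholder path; point it at any video file you have.
import torchvision

torchvision.set_video_backend("video_reader")
reader = torchvision.io.VideoReader("episode_0.mp4", "video")
frame = next(iter(reader))
print(frame["data"].shape, frame["pts"])  # CHW uint8 tensor and a timestamp in seconds
```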
Try any run with the option `video_backend=video_reader` (will default to `pyav` if not specified).