This is the repository for the paper Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos, accepted at CVPR 2024.
Most of the code is adapted from MS-TCN.
- main.py: Script to train and evaluate the model.
- model.py: Contains the implementation of the neural network models (MultiStageModel, SingleStageModel, etc.); a minimal architecture sketch follows the list below.
- batch_gen.py: Script for generating batches of data for training and evaluation.
- eval.py: Evaluation script.
- utils/: Utility functions, including write_graph_from_transcripts and write_progress_values.
- data/: Directory containing datasets, including ground truth and feature files.
  - GTEA: Download the GTEA data from link1 or link2. Please refer to ms-tcn or CVPR2024-FACT.
  - EgoProceL: Download the EgoProceL data from G-Drive. Please refer to CVPR2024-FACT.
  - EgoPER: Download the EgoPER data from G-Drive. Please refer to EgoPER for the original data.
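Since most of the code is adapted from MS-TCN, the following is a minimal sketch of that multi-stage temporal convolutional backbone, given for orientation only; the actual model.py extends it with the causal, graph, and progress options described below, and the layer counts and feature sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedResidualLayer(nn.Module):
    """One dilated temporal convolution block with a residual connection."""
    def __init__(self, dilation, in_channels, out_channels):
        super().__init__()
        # padding = dilation keeps the temporal length unchanged
        self.conv_dilated = nn.Conv1d(in_channels, out_channels, 3,
                                      padding=dilation, dilation=dilation)
        self.conv_1x1 = nn.Conv1d(out_channels, out_channels, 1)
        self.dropout = nn.Dropout()

    def forward(self, x):
        out = F.relu(self.conv_dilated(x))
        out = self.dropout(self.conv_1x1(out))
        return x + out

class SingleStageModel(nn.Module):
    """A stack of dilated residual layers followed by a frame-wise classifier."""
    def __init__(self, num_layers, num_f_maps, dim, num_classes):
        super().__init__()
        self.conv_1x1 = nn.Conv1d(dim, num_f_maps, 1)
        self.layers = nn.ModuleList(
            [DilatedResidualLayer(2 ** i, num_f_maps, num_f_maps)
             for i in range(num_layers)])
        self.conv_out = nn.Conv1d(num_f_maps, num_classes, 1)

    def forward(self, x):
        out = self.conv_1x1(x)
        for layer in self.layers:
            out = layer(out)
        return self.conv_out(out)

class MultiStageModel(nn.Module):
    """Later stages refine the softmax predictions of earlier stages."""
    def __init__(self, num_stages, num_layers, num_f_maps, dim, num_classes):
        super().__init__()
        self.stage1 = SingleStageModel(num_layers, num_f_maps, dim, num_classes)
        self.stages = nn.ModuleList(
            [SingleStageModel(num_layers, num_f_maps, num_classes, num_classes)
             for _ in range(num_stages - 1)])

    def forward(self, x):                 # x: (batch, dim, T)
        out = self.stage1(x)
        outputs = out.unsqueeze(0)
        for stage in self.stages:
            out = stage(F.softmax(out, dim=1))
            outputs = torch.cat((outputs, out.unsqueeze(0)), dim=0)
        return outputs                    # (num_stages, batch, num_classes, T)
```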
To generate the target progress values:
python utils/write_progress_values.py
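The exact target definition is implemented in utils/write_progress_values.py. As a rough illustration of the idea (an assumption about the construction, not the script itself), per-frame progress can be derived from frame-level ground-truth labels as the relative position of each frame inside its action segment:

```python
import numpy as np

def progress_from_framewise_labels(labels):
    """Map a frame-level label sequence to per-frame progress targets.

    Within every contiguous action segment, progress rises linearly and
    reaches 1.0 at the last frame of the segment.
    """
    labels = np.asarray(labels)
    progress = np.zeros(len(labels), dtype=np.float32)
    start = 0
    for t in range(1, len(labels) + 1):
        # close the current segment when the label changes or the video ends
        if t == len(labels) or labels[t] != labels[start]:
            length = t - start
            progress[start:t] = np.arange(1, length + 1) / length
            start = t
    return progress

# Example: a segment of length 4 -> 0.25, 0.5, 0.75, 1.0; then length 2 -> 0.5, 1.0
print(progress_from_framewise_labels([0, 0, 0, 0, 3, 3]))
```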
To generate task graphs from video transcripts:
python utils/write_graph.py
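The graphs are built by utils/write_graph.py from the training transcripts. A minimal sketch of the general idea (again an assumption, not the script itself) is to record which action is observed to directly follow which across the transcripts:

```python
import numpy as np

def graph_from_transcripts(transcripts, num_classes):
    """Build a directed transition graph over action classes.

    transcripts: list of transcripts, each an ordered list of action ids.
    Returns an adjacency matrix A with A[i, j] = 1 if action j directly
    follows action i in at least one training transcript.
    """
    adjacency = np.zeros((num_classes, num_classes), dtype=np.float32)
    for transcript in transcripts:
        for prev_action, next_action in zip(transcript[:-1], transcript[1:]):
            adjacency[prev_action, next_action] = 1.0
    return adjacency

# Example: two different orderings of the same task
transcripts = [[0, 1, 2, 4], [0, 2, 1, 4]]
print(graph_from_transcripts(transcripts, num_classes=5))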
To train the model, use the following command:
python main.py --action train --dataset <dataset_name> --split <split_number> --exp_id protas --causal --graph --learnable_graph [other options]
To test the model, use the following command:
python main.py --action predict --dataset <dataset_name> --split <split_number> --exp_id protas --causal --graph --learnable_graph [other options]
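For example, to train and then test on split 1 of GTEA (the dataset key gtea is an assumption; check main.py/batch_gen.py for the exact names accepted by --dataset):
python main.py --action train --dataset gtea --split 1 --exp_id protas --causal --graph --learnable_graph
python main.py --action predict --dataset gtea --split 1 --exp_id protas --causal --graph --learnable_graph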
Note: In principle, to test the model in an online setting you should use the --action predict_online argument, which makes predictions frame by frame. However, when the model is causal, its prediction at each frame depends only on frames up to that frame, so using --action predict produces the same results while being much more efficient.
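A small self-contained check of that equivalence, using a toy causal convolution rather than the repository's model: with left-only padding, the output at frame t of a single full-sequence pass matches the output obtained by running the model on the prefix that ends at t.

```python
import torch
import torch.nn as nn

class ToyCausalConv(nn.Module):
    """1D convolution made causal by padding only on the left."""
    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, T)
        x = nn.functional.pad(x, (self.left_pad, 0))
        return self.conv(x)

torch.manual_seed(0)
model = ToyCausalConv(channels=4).eval()
x = torch.randn(1, 4, 50)

with torch.no_grad():
    offline = model(x)                         # one pass over the whole video
    online = torch.cat(                        # frame-by-frame, prefixes only
        [model(x[:, :, : t + 1])[:, :, -1:] for t in range(x.shape[-1])], dim=-1)

print(torch.allclose(offline, online, atol=1e-6))  # True
```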
If you find the project helpful, we would appreciate it if you cite our work:
@article{Shen:CVPR24,
author = {Y.~Shen and E.~Elhamifar},
title = {Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos},
journal = {{IEEE} Conference on Computer Vision and Pattern Recognition},
year = {2024}}