This is the implementation of the paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models".
The AnomalyRuler pipeline consists of two main stages: induction and deduction. The induction stage has three steps: i) visual perception converts normal reference frames into text descriptions; ii) rule generation derives rules from these descriptions for deciding what is normal and what is anomalous; iii) rule aggregation uses a voting mechanism to mitigate errors in the rules. The deduction stage also has three steps: i) visual perception converts consecutive frames into descriptions; ii) perception smoothing adjusts these descriptions for temporal consistency, so that neighboring frames share similar characteristics; iii) robust reasoning rechecks the earlier dummy answers and outputs the reasoning.
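To make the perception smoothing idea concrete, here is a minimal sketch of a sliding-window majority vote over per-frame labels. It is an illustration only, not the code in this repository: the function name `majority_smooth`, the binary label encoding (0 for normal, 1 for anomalous), and the window size are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_smooth(labels: Sequence[int], window: int = 5) -> List[int]:
    """Replace each label with the most common label in a centered window.

    Hypothetical helper: `labels` holds one label per frame, in frame order.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # The window is clipped at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        top = max(counts.values())
        winners = [label for label, count in counts.items() if count == top]
        # On a tie, keep the frame's own label rather than flipping it.
        smoothed.append(labels[i] if labels[i] in winners else winners[0])
    return smoothed


if __name__ == "__main__":
    raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    print(majority_smooth(raw, window=3))
```

With a window of 3, isolated labels that disagree with both neighbors are overwritten, which is the "neighboring frames share similar characteristics" effect described above.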
pip install torch==2.1.0 torchvision==0.16.0 transformers==4.35.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.22.post7 triton==2.1.0
pip install pandas pillow openai scikit-learn protobuf
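Because the first command pins exact versions, it can help to confirm what actually got installed. The following is a small sketch using only the standard library; the helper name `find_mismatches` and the `PINNED` table are illustrative and simply mirror the pins in the command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Pins copied from the install command above.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "transformers": "4.35.0",
    "accelerate": "0.24.1",
    "sentencepiece": "0.1.99",
    "einops": "0.7.0",
    "xformers": "0.0.22.post7",
    "triton": "2.1.0",
}


def find_mismatches(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (expected, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        base = installed.split("+")[0] if installed is not None else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version.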
Download the datasets and place the {train} and {test} folders under the {dataset_name} folder, for example:
+-- SHTech
|   +-- train
|   +-- test
|       +-- 01_0014
|           +-- 000.jpg
|           +-- ...
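Before running the pipeline, the layout above can be sanity-checked with a short script. This is a sketch under assumptions: the helper name `check_layout` is hypothetical, and it assumes that {train} is organized like {test}, with one subfolder per video containing .jpg frames (the example above only shows this for {test}).

```python
from pathlib import Path
from typing import List, Union


def check_layout(dataset_dir: Union[str, Path]) -> List[str]:
    """Return a list of human-readable problems; empty means the layout looks fine."""
    root = Path(dataset_dir)
    problems = []
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        # Assumption: each video is a subfolder of the split folder.
        videos = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not videos:
            problems.append(f"no video folders in: {split_dir}")
        for video in videos:
            if not any(video.glob("*.jpg")):
                problems.append(f"no .jpg frames in: {video}")
    return problems


if __name__ == "__main__":
    for problem in check_layout("SHTech"):
        print(problem)
```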
Download links:
Step 1: Visual perception
python image2text.py --data='SHTech'
Step 2: Rule generation and rule aggregation
python main.py --data='SHTech' --induct --b=1 --bs=10
Step 3: Perception smoothing
python majority_smooth.py --data='SHTech'
PS: To simply reproduce the results, you can start from Step 3 and reuse the rules.
Step 4: Robust reasoning
python main.py --data='SHTech' --deduct
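The four commands above can also be chained from a small driver script. The sketch below is not part of the repository: `build_commands` and `run_pipeline` are hypothetical helpers, the scripts are assumed to sit in the current working directory, and `start_step=3` mirrors the note about starting from Step 3. By default it only prints the commands.

```python
import subprocess
import sys
from typing import List


def build_commands(dataset: str, start_step: int = 1) -> List[List[str]]:
    """Return the pipeline commands in order, beginning at `start_step` (1 to 4)."""
    if not 1 <= start_step <= 4:
        raise ValueError("start_step must be between 1 and 4")
    steps = [
        ["image2text.py", f"--data={dataset}"],
        ["main.py", f"--data={dataset}", "--induct", "--b=1", "--bs=10"],
        ["majority_smooth.py", f"--data={dataset}"],
        ["main.py", f"--data={dataset}", "--deduct"],
    ]
    # Run every script with the interpreter that runs this driver.
    return [[sys.executable, *step] for step in steps[start_step - 1:]]


def run_pipeline(dataset: str, start_step: int = 1, dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands(dataset, start_step):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("SHTech", start_step=1, dry_run=True)
```

Calling `run_pipeline("SHTech", start_step=3, dry_run=False)` would run only the last two commands.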
@inproceedings{yang2024anomalyruler,
title={Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models},
author={Yuchen Yang and Kwonjoon Lee and Behzad Dariush and Yinzhi Cao and Shao-Yuan Lo},
year={2024},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}
}