The search for efficient neural network architectures has attracted considerable attention in recent years, with modern architectures targeting not only accuracy but also inference time and model size. Here, we present FUN, a family of novel Frequency-domain Utilization Networks. These networks exploit the inherent efficiency of the frequency domain by operating directly in that domain, represented with the Discrete Cosine Transform (DCT). Using modern techniques and building blocks such as compound scaling and inverted-residual layers, we generate a set of such networks that lets one balance size, latency, and accuracy while outperforming competing RGB-based models. Extensive evaluations verify that our networks present strong alternatives to previous approaches. Moreover, we show that working in the frequency domain allows for dynamic compression of the input at inference time without any explicit change to the architecture.
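To make the frequency-domain idea concrete, here is a minimal, standard-library-only sketch of an 8x8 block DCT followed by a simple form of input compression (zeroing high-frequency coefficients). It is an illustration only and not this repository's data pipeline: the block size, the toy input, and the helper names (`dct2`, `idct2`, `keep_low_frequencies`) are assumptions made for this example.

```python
import math

N = 8  # JPEG-style block size (assumption for this sketch)


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """2D DCT of a square block: D * X * D^T."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))


def idct2(coeffs):
    """Inverse 2D DCT: D^T * C * D (D is orthonormal)."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)


def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient whose row or column index is >= keep."""
    return [[c if (u < keep and v < keep) else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


# Smooth toy block standing in for one 8x8 patch of an image channel.
block = [[float(8 * r + c) for c in range(N)] for r in range(N)]
coeffs = dct2(block)
compressed = keep_low_frequencies(coeffs, keep=4)  # at most 16 of 64 coefficients survive
restored = idct2(compressed)
max_err = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
print(f"max reconstruction error after compression: {max_err:.3f}")
```

The coefficient grid keeps its 8x8 shape after the high frequencies are zeroed, which is one way to picture compressing the input without touching the architecture that consumes it.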
Official PyTorch implementation of "Rethinking FUN: Frequency-domain Utilization Networks".
Based on the implementation by Ross Wightman.
- Clone this repo:
git clone https://github.com/kfir99/FUN.git
cd FUN
- Create a conda environment and install the dependencies:
conda create -n torch-env
conda activate torch-env
conda install -c pytorch pytorch torchvision cudatoolkit=10.2 jpeg2dct
conda install pyyaml
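After installation, a quick sanity check can confirm that the packages are visible from the active environment. The sketch below uses only the standard library; the import names are assumptions (in particular, `pyyaml` is normally imported as `yaml`), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Import names assumed for the packages installed above.
REQUIRED = ["torch", "torchvision", "jpeg2dct", "yaml"]


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `torch-env` environment; any name it lists still needs to be installed.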
Pretrained models are available for download:
Model | Accuracy (%) | # Parameters (M) | FPS (V100, batch size = 1) |
---|---|---|---|
eFUN | 77 | 4.2 | 124 |
eFUN-L | 78.8 | 6.2 | 101 |
eFUN-S | 75.6 | 3.4 | 132 |
eFUN-S+ | 73.3 | 2.5 | 145 |
Training and validation data should be organized in the following structure:
* data_dir
  * train
    * class_name_a
      * images
    * class_name_b
      * images
    * ...
  * validation
    * class_name_a
      * images
    * class_name_b
      * images
    * ...
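Before launching a long training run, it can help to verify that a dataset folder follows the layout above. The following is a minimal sketch using only the standard library; `check_data_dir` and `summarize_split` are hypothetical helpers, and the check assumes that every class folder should contain at least one file (directly or in a subfolder) and that both splits share the same class names.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Map each class folder in a split to the number of files found under it."""
    split = Path(split_dir)
    return {
        class_dir.name: sum(1 for f in class_dir.rglob("*") if f.is_file())
        for class_dir in sorted(split.iterdir())
        if class_dir.is_dir()
    }


def check_data_dir(data_dir):
    """Return a list of problems; an empty list means the layout looks as expected."""
    root = Path(data_dir)
    problems = []
    summaries = {}
    for split in ("train", "validation"):
        if not (root / split).is_dir():
            problems.append(f"missing folder: {split}")
            continue
        summaries[split] = summarize_split(root / split)
        if not summaries[split]:
            problems.append(f"{split} contains no class folders")
        for name, count in summaries[split].items():
            if count == 0:
                problems.append(f"{split}/{name} contains no files")
    if len(summaries) == 2 and set(summaries["train"]) != set(summaries["validation"]):
        problems.append("train and validation contain different class folders")
    return problems
```

Calling `check_data_dir("<data_dir>")` returns an empty list when nothing looks wrong, or a list of human-readable messages otherwise.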
./distributed_train.sh \
<number of available GPUs> \
<data_dir> \
--output <desired output path> \
--dct \
--model <efun/efun_l/efun_s/efun_s_plus> \
--drop-path <0.2/0.2/0.2/0.3> \
--no-prefetcher \
-b 128 \
--sched step \
--epochs 450 \
--decay-epochs 2.4 \
--decay-rate .97 \
--opt rmsproptf \
--opt-eps .001 \
-j 8 \
--warmup-lr 1e-6 \
--weight-decay 1e-5 \
--drop 0.2 \
--model-ema \
--model-ema-decay 0.9999 \
--remode pixel \
--reprob 0.2 \
--lr .048
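The two placeholder lists in the command above appear to pair up positionally, so each model gets its own `--drop-path` value. The sketch below spells that pairing out and assembles the full argument list; the pairing itself is an assumption read from the placeholders, `train_command` is a hypothetical helper, and the GPU count and paths in the usage line are made-up example values.

```python
import shlex

# Drop-path rate per model, paired positionally from
# <efun/efun_l/efun_s/efun_s_plus> and <0.2/0.2/0.2/0.3> (assumed pairing).
DROP_PATH = {"efun": 0.2, "efun_l": 0.2, "efun_s": 0.2, "efun_s_plus": 0.3}


def train_command(num_gpus, data_dir, output_dir, model):
    """Build the training command as a list of arguments for one model variant."""
    if model not in DROP_PATH:
        raise ValueError(f"unknown model: {model!r}")
    return [
        "./distributed_train.sh", str(num_gpus), data_dir,
        "--output", output_dir,
        "--dct",
        "--model", model,
        "--drop-path", str(DROP_PATH[model]),
        "--no-prefetcher",
        "-b", "128",
        "--sched", "step",
        "--epochs", "450",
        "--decay-epochs", "2.4",
        "--decay-rate", ".97",
        "--opt", "rmsproptf",
        "--opt-eps", ".001",
        "-j", "8",
        "--warmup-lr", "1e-6",
        "--weight-decay", "1e-5",
        "--drop", "0.2",
        "--model-ema",
        "--model-ema-decay", "0.9999",
        "--remode", "pixel",
        "--reprob", "0.2",
        "--lr", ".048",
    ]


# Example values only: 4 GPUs and placeholder paths.
print(shlex.join(train_command(4, "/path/to/data_dir", "./output", "efun_s_plus")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with paths that contain spaces.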
Test data should be organized in the following structure:
* data_dir
  * class_name_a
    * images
  * class_name_b
    * images
  * ...
python validate.py \
<path to test data> \
--model <efun/efun_l/efun_s/efun_s_plus> \
--checkpoint <path to trained model> \
--dct \
--no-prefetcher \
-j 32 \
-b 256
Files added specifically for eFUN are marked in bold
- sota-bench integration