Please place the datasets and annotations in the cvpods project as follows:
.
└── datasets
    ├── coco
    │   ├── annotations
    │   ├── train2017
    │   └── val2017
    ├── lvis
    │   ├── lvis_v0.5_train.json
    │   ├── lvis_v0.5_val.json
    │   ├── num_shots_v0.5.npy
    │   ├── lvis_v1_train.json
    │   ├── lvis_v1_val.json
    │   └── num_shots_v1.0.json
    └── .....
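Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of cvpods: the helper name `missing_entries` and the list `EXPECTED_ENTRIES` are hypothetical, and the sketch only checks that each path from the tree exists under a given `datasets` directory. It does not validate file contents.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the `datasets` folder.
# EXPECTED_ENTRIES and missing_entries are hypothetical helpers for illustration.
EXPECTED_ENTRIES = [
    "coco/annotations",
    "coco/train2017",
    "coco/val2017",
    "lvis/lvis_v0.5_train.json",
    "lvis/lvis_v0.5_val.json",
    "lvis/num_shots_v0.5.npy",
    "lvis/lvis_v1_train.json",
    "lvis/lvis_v1_val.json",
    "lvis/num_shots_v1.0.json",
]


def missing_entries(datasets_root):
    """Return the expected relative paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains `datasets`.
    for rel in missing_entries("datasets"):
        print(f"missing: datasets/{rel}")
```

An empty result from `missing_entries("datasets")` means every file and folder listed in the tree was found.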
Related files can be downloaded:
- Enter a project folder
- Training with:
pods_train --num-gpus 8
- Inference with:
pods_test --num-gpus 8
(* denotes the cosine classifier)
We refactored the code from the internal version and re-trained all experiments, so the results differ slightly from (and are higher than) those reported in the original paper.
Name | Cls Norm | input size | lr sched | train time (s/iter) | train mem (GB) | box AP | mask AP | Trained Model |
---|---|---|---|---|---|---|---|---|
MaskRCNN-R50-FPN | | 640-800 | 90k | 0.486 | 5.26 | 20.4 | 20.7 | LINK |
MaskRCNN-R50-FPN | Cosine | 640-800 | 90k | 0.500 | 5.26 | 23.0 | 23.8 | LINK |
MaskRCNN-R50-FPN-RFS | | 640-800 | 90k | 0.485 | 5.25 | 23.5 | 24.2 | LINK |
MaskRCNN-R50-FPN-RFS | Cosine | 640-800 | 90k | 0.485 | 5.25 | 24.5 | 24.9 | LINK |
MaskRCNN-R50-FPN-DisAlign | | 640-800 | 90k | 0.486 | 5.26 | 23.7 | 24.3 | LINK |
MaskRCNN-R50-FPN-DisAlign | Cosine | 640-800 | 90k | 0.500 | 5.26 | 26.3 | 27.1 | LINK |
MaskRCNN-R50-FPN-RFS-DisAlign | Cosine | 640-800 | 90k | 0.500 | 5.26 | 27.1 | 27.5 | LINK |
Name | Cls Norm | input size | lr sched | train time (s/iter) | train mem (GB) | box AP | mask AP | Trained Model |
---|---|---|---|---|---|---|---|---|
MaskRCNN-R101-FPN | | 640-800 | 90k | | | 22.6 | 22.8 | LINK |
MaskRCNN-R101-FPN | Cosine | 640-800 | 90k | | | 24.8 | 25.3 | LINK |
MaskRCNN-R101-FPN-RFS | Cosine | 640-800 | 90k | | | 26.6 | 26.8 | LINK |
MaskRCNN-R101-FPN-DisAlign | | 640-800 | 90k | | | 25.9 | 26.2 | LINK |
MaskRCNN-R101-FPN-DisAlign | Cosine | 640-800 | 90k | | | 27.6 | 28.1 | LINK |
MaskRCNN-R101-FPN-RFS-DisAlign | Cosine | 640-800 | 90k | | | 28.7 | 28.9 | LINK |
Name | Cls Norm | input size | lr sched | train time (s/iter) | train mem (GB) | box AP | mask AP | Trained Model |
---|---|---|---|---|---|---|---|---|
MaskRCNN-X101-FPN | | 640-800 | 90k | | | 24.8 | 25.2 | LINK |
MaskRCNN-X101-FPN | Cosine | 640-800 | 90k | | | 27.4 | 28.4 | LINK |
MaskRCNN-X101-FPN-DisAlign | | 640-800 | 90k | | | 26.9 | 27.3 | LINK |
MaskRCNN-X101-FPN-DisAlign | Cosine | 640-800 | 90k | | | 29.6 | 30.2 | LINK |
Name | Cls Norm | input size | lr sched | train time (s/iter) | train mem (GB) | box AP | mask AP | Trained Model |
---|---|---|---|---|---|---|---|---|
MaskRCNN-R50-FPN | | 640-800 | 180k | 0.486 | 5.26 | 18.8 | 18.3 | LINK |
MaskRCNN-R50-FPN | Cosine | 640-800 | 180k | 0.500 | 5.26 | 21.3 | 21.1 | LINK |
MaskRCNN-R50-FPN-RFS | | 640-800 | 180k | 0.485 | 5.25 | 22.9 | 22.5 | LINK |
MaskRCNN-R50-FPN-RFS(A1) | 640-800 | 180k | 22.3 | |||||
MaskRCNN-R50-FPN-DisAlign | | 640-800 | 180k | 0.486 | 5.26 | 21.9 | 21.3 | LINK |
MaskRCNN-R50-FPN-DisAlign | Cosine | 640-800 | 180k | 0.500 | 5.26 | 24.8 | 24.2 | LINK |
- A1: Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details