Release mask2former (open-mmlab#7595)

* Release checkpoints of Mask2Former and MaskFormer * update config link Co-authored-by: luochunhua <luochunhua1996@outlook.com>
Dumoio · Apr 7, 2022 · a9f144e · a9f144e
1 parent acfcf81
commit a9f144e
Show file tree

Hide file tree

Showing 5 changed files with 197 additions and 4 deletions.
diff --git a/configs/mask2former/README.md b/configs/mask2former/README.md
@@ -0,0 +1,55 @@
+# Mask2Former
+
+> [Masked-attention Mask Transformer for Universal Image Segmentation](http://arxiv.org/abs/2112.01527)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
+
+<div align=center>
+<img src="https://camo.githubusercontent.com/455d3116845b1d580b1f8a8542334b9752fdf39364deee2951cdd231524c7725/68747470733a2f2f626f77656e63303232312e6769746875622e696f2f696d616765732f6d61736b666f726d657276325f7465617365722e706e67" height="300"/>
+</div>
+
+## Introduction
+
+Mask2Former requires COCO and [COCO-panoptic](http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip) dataset for training and evaluation. You need to download and extract it in the COCO dataset path.
+The directory should be like this.
+
+```none
+mmdetection
+├── mmdet
+├── tools
+├── configs
+├── data
+│   ├── coco
+│   │   ├── annotations
+|   |   |   ├── instances_train2017.json
+|   |   |   ├── instances_val2017.json
+│   │   │   ├── panoptic_train2017.json
+│   │   │   ├── panoptic_train2017
+│   │   │   ├── panoptic_val2017.json
+│   │   │   ├── panoptic_val2017
+│   │   ├── train2017
+│   │   ├── val2017
+│   │   ├── test2017
+```
+
+## Results and Models
+
+| Backbone |  style  |  Pretrain   | Lr schd | Mem (GB) | Inf time (fps) |  PQ  | box mAP | mask mAP |                                                                                Config                                                                                |                                                                                                                                                                                           Download                                                                                                                                                                                           |
+|:--------:|:-------:|:-----------:|:-------:|:--------:|:--------------:|:----:|:-------:|:--------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+|   R-50   | pytorch | ImageNet-1K |   50e   |   13.9   |       -        | 51.9 |  44.8   |   41.9   |              [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former/mask2former_r50_lsj_8x2_50e_coco.py)              |                           [model](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516.log.json)                           |
+|  Swin-T  |    -    | ImageNet-1K |   50e   |   15.9   |       -        | 53.4 |  46.3   |   43.4   | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553-c92f921c.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553.log.json) |
+
+## Citation
+
+```latex
+@article{cheng2021mask2former,
+  title={Masked-attention Mask Transformer for Universal Image Segmentation},
+  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
+  journal={arXiv},
+  year={2021}
+}
+```
diff --git a/configs/mask2former/metafile.yml b/configs/mask2former/metafile.yml
@@ -0,0 +1,59 @@
+Collections:
+  - Name: Mask2Former
+    Metadata:
+      Training Data: COCO
+      Training Techniques:
+        - AdamW
+        - Weight Decay
+      Training Resources: 8x A100 GPUs
+      Architecture:
+        - Mask2Former
+    Paper:
+      URL: https://arxiv.org/pdf/2112.01527
+      Title: 'Masked-attention Mask Transformer for Universal Image Segmentation'
+    README: configs/mask2former/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmdetection/blob/v2.23.0/mmdet/models/detectors/mask2former.py#L7
+      Version: v2.23.0
+
+Models:
+- Name: mask2former_r50_lsj_8x2_50e_coco
+  In Collection: Mask2Former
+  Config: configs/mask2former/mask2former_r50_lsj_8x2_50e_coco.py
+  Metadata:
+    Training Memory (GB): 13.9
+    Iterations: 368750
+  Results:
+  - Task: Object Detection
+    Dataset: COCO
+    Metrics:
+      box AP: 44.8
+  - Task: Instance Segmentation
+    Dataset: COCO
+    Metrics:
+      mask AP: 41.9
+  - Task: Panoptic Segmentation
+    Dataset: COCO
+    Metrics:
+      PQ: 51.9
+  Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth
+- Name: mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco
+  In Collection: Mask2Former
+  Config: configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py
+  Metadata:
+    Training Memory (GB): 15.9
+    Iterations: 368750
+  Results:
+  - Task: Object Detection
+    Dataset: COCO
+    Metrics:
+      box AP: 46.3
+  - Task: Instance Segmentation
+    Dataset: COCO
+    Metrics:
+      mask AP: 43.4
+  - Task: Panoptic Segmentation
+    Dataset: COCO
+    Metrics:
+      PQ: 53.4
+  Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553-c92f921c.pth
diff --git a/configs/maskformer/README.md b/configs/maskformer/README.md
@@ -36,10 +36,10 @@ mmdetection
 
 ## Results and Models
 
-| Backbone |  style  | Lr schd | Mem (GB) | Inf time (fps) |   PQ   |   SQ   |   RQ   | PQ_th  | SQ_th  | RQ_th  | PQ_st  | SQ_st  | RQ_st  |                                                           Config                                                           |                                                                                                                                                                    Download                                                                                                                                                                     |                                                                         detail                                                                          |
-|:--------:|:-------:|:-------:|:--------:|:--------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:--------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------:|
-|   R-50   | pytorch |   75e   |   16.2   |       -        | 46.854 | 80.617 | 57.085 | 51.089 | 81.511 | 61.853 | 40.463 | 79.269 | 49.888 | [config](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956.log.json) | This version was mentioned in Table XI, in paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) |
-
+| Backbone |  style  | Lr schd | Mem (GB) | Inf time (fps) |   PQ   |   SQ   |   RQ   | PQ_th  | SQ_th  | RQ_th  | PQ_st  | SQ_st  | RQ_st  |                                                                                 Config                                                                                  |                                                                                                                                                                                              Download                                                                                                                                                                                              |                                                                         detail                                                                          |
+|:--------:|:-------:|:-------:|:--------:|:--------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------:|
+|   R-50   | pytorch |   75e   |   16.2   |       -        | 46.854 | 80.617 | 57.085 | 51.089 | 81.511 | 61.853 | 40.463 | 79.269 | 49.888 |            [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/maskformer/maskformer_r50_mstrain_16x1_75e_coco.py)            |                       [model](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956.log.json)                       | This version was mentioned in Table XI, in paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) |
+|  Swin-L  | pytorch |  300e   |   27.2   |       -        | 53.249 | 81.704 | 64.231 | 58.798 | 82.923 | 70.282 | 44.874 | 79.863 | 55.097 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco_20220326_221612-061b4eb8.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco_20220326_221612.log.json) |                                                                            -                                                                            |
 ## Citation
 
 ```latex

diff --git a/configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py b/configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py
@@ -0,0 +1,67 @@
+_base_ = './maskformer_r50_mstrain_16x1_75e_coco.py'
+
+pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth'  # noqa
+depths = [2, 2, 18, 2]
+model = dict(
+    backbone=dict(
+        _delete_=True,
+        type='SwinTransformer',
+        pretrain_img_size=384,
+        embed_dims=192,
+        patch_size=4,
+        window_size=12,
+        mlp_ratio=4,
+        depths=depths,
+        num_heads=[6, 12, 24, 48],
+        qkv_bias=True,
+        qk_scale=None,
+        drop_rate=0.,
+        attn_drop_rate=0.,
+        drop_path_rate=0.3,
+        patch_norm=True,
+        out_indices=(0, 1, 2, 3),
+        with_cp=False,
+        convert_weights=True,
+        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
+    panoptic_head=dict(
+        in_channels=[192, 384, 768, 1536],  # pass to pixel_decoder inside
+        pixel_decoder=dict(
+            _delete_=True,
+            type='PixelDecoder',
+            norm_cfg=dict(type='GN', num_groups=32),
+            act_cfg=dict(type='ReLU')),
+        enforce_decoder_input_project=True))
+
+# weight_decay = 0.01
+# norm_weight_decay = 0.0
+# embed_weight_decay = 0.0
+embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
+norm_multi = dict(lr_mult=1.0, decay_mult=0.0)
+custom_keys = {
+    'norm': norm_multi,
+    'absolute_pos_embed': embed_multi,
+    'relative_position_bias_table': embed_multi,
+    'query_embed': embed_multi
+}
+
+# optimizer
+optimizer = dict(
+    type='AdamW',
+    lr=6e-5,
+    weight_decay=0.01,
+    eps=1e-8,
+    betas=(0.9, 0.999),
+    paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0))
+optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))
+
+# learning policy
+lr_config = dict(
+    policy='step',
+    gamma=0.1,
+    by_epoch=True,
+    step=[250],
+    warmup='linear',
+    warmup_by_epoch=False,
+    warmup_ratio=1e-6,
+    warmup_iters=1500)
+runner = dict(type='EpochBasedRunner', max_epochs=300)
diff --git a/configs/maskformer/metafile.yml b/configs/maskformer/metafile.yml
@@ -29,3 +29,15 @@ Models:
       Metrics:
         PQ: 46.9
     Weights: https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth
+  - Name: maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco
+    In Collection: MaskFormer
+    Config: configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py
+    Metadata:
+      Training Memory (GB): 27.2
+      Epochs: 300
+    Results:
+    - Task: Panoptic Segmentation
+      Dataset: COCO
+      Metrics:
+        PQ: 53.2
+    Weights: https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco_20220326_221612-061b4eb8.pth