Skip to content

Commit

Permalink
Release mask2former (open-mmlab#7595)
Browse files Browse the repository at this point in the history
* Release checkpoints of Mask2Former and MaskFormer

* update config link

Co-authored-by: luochunhua <luochunhua1996@outlook.com>
  • Loading branch information
ZwwWayne and chhluo authored Apr 7, 2022
1 parent acfcf81 commit a9f144e
Show file tree
Hide file tree
Showing 5 changed files with 197 additions and 4 deletions.
55 changes: 55 additions & 0 deletions configs/mask2former/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Mask2Former

> [Masked-attention Mask Transformer for Universal Image Segmentation](http://arxiv.org/abs/2112.01527)
<!-- [ALGORITHM] -->

## Abstract

Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).

<div align=center>
<img src="https://camo.githubusercontent.com/455d3116845b1d580b1f8a8542334b9752fdf39364deee2951cdd231524c7725/68747470733a2f2f626f77656e63303232312e6769746875622e696f2f696d616765732f6d61736b666f726d657276325f7465617365722e706e67" height="300"/>
</div>

## Introduction

Mask2Former requires COCO and [COCO-panoptic](http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip) dataset for training and evaluation. You need to download and extract it in the COCO dataset path.
The directory should be like this.

```none
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
| | | ├── instances_train2017.json
| | | ├── instances_val2017.json
│ │ │ ├── panoptic_train2017.json
│ │ │ ├── panoptic_train2017
│ │ │ ├── panoptic_val2017.json
│ │ │ ├── panoptic_val2017
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
```

## Results and Models

| Backbone | style | Pretrain | Lr schd | Mem (GB) | Inf time (fps) | PQ | box mAP | mask mAP | Config | Download |
|:--------:|:-------:|:-----------:|:-------:|:--------:|:--------------:|:----:|:-------:|:--------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| R-50 | pytorch | ImageNet-1K | 50e | 13.9 | - | 51.9 | 44.8 | 41.9 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former/mask2former_r50_lsj_8x2_50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516.log.json) |
| Swin-T | - | ImageNet-1K | 50e | 15.9 | - | 53.4 | 46.3 | 43.4 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553-c92f921c.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553.log.json) |

## Citation

```latex
@article{cheng2021mask2former,
title={Masked-attention Mask Transformer for Universal Image Segmentation},
author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
journal={arXiv},
year={2021}
}
```
59 changes: 59 additions & 0 deletions configs/mask2former/metafile.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
Collections:
- Name: Mask2Former
Metadata:
Training Data: COCO
Training Techniques:
- AdamW
- Weight Decay
Training Resources: 8x A100 GPUs
Architecture:
- Mask2Former
Paper:
URL: https://arxiv.org/pdf/2112.01527
Title: 'Masked-attention Mask Transformer for Universal Image Segmentation'
README: configs/mask2former/README.md
Code:
URL: https://github.com/open-mmlab/mmdetection/blob/v2.23.0/mmdet/models/detectors/mask2former.py#L7
Version: v2.23.0

Models:
- Name: mask2former_r50_lsj_8x2_50e_coco
In Collection: Mask2Former
Config: configs/mask2former/mask2former_r50_lsj_8x2_50e_coco.py
Metadata:
Training Memory (GB): 13.9
Iterations: 368750
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 44.8
- Task: Instance Segmentation
Dataset: COCO
Metrics:
mask AP: 41.9
- Task: Panoptic Segmentation
Dataset: COCO
Metrics:
PQ: 51.9
Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth
- Name: mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco
In Collection: Mask2Former
Config: configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py
Metadata:
Training Memory (GB): 15.9
Iterations: 368750
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 46.3
- Task: Instance Segmentation
Dataset: COCO
Metrics:
mask AP: 43.4
- Task: Panoptic Segmentation
Dataset: COCO
Metrics:
PQ: 53.4
Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553-c92f921c.pth
8 changes: 4 additions & 4 deletions configs/maskformer/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,10 +36,10 @@ mmdetection

## Results and Models

| Backbone | style | Lr schd | Mem (GB) | Inf time (fps) | PQ | SQ | RQ | PQ_th | SQ_th | RQ_th | PQ_st | SQ_st | RQ_st | Config | Download | detail |
|:--------:|:-------:|:-------:|:--------:|:--------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:--------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------:|
| R-50 | pytorch | 75e | 16.2 | - | 46.854 | 80.617 | 57.085 | 51.089 | 81.511 | 61.853 | 40.463 | 79.269 | 49.888 | [config](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956.log.json) | This version was mentioned in Table XI, in paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) |

| Backbone | style | Lr schd | Mem (GB) | Inf time (fps) | PQ | SQ | RQ | PQ_th | SQ_th | RQ_th | PQ_st | SQ_st | RQ_st | Config | Download | detail |
|:--------:|:-------:|:-------:|:--------:|:--------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------:|
| R-50 | pytorch | 75e | 16.2 | - | 46.854 | 80.617 | 57.085 | 51.089 | 81.511 | 61.853 | 40.463 | 79.269 | 49.888 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/maskformer/maskformer_r50_mstrain_16x1_75e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956.log.json) | This version was mentioned in Table XI, in paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) |
| Swin-L | pytorch | 300e | 27.2 | - | 53.249 | 81.704 | 64.231 | 58.798 | 82.923 | 70.282 | 44.874 | 79.863 | 55.097 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco_20220326_221612-061b4eb8.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco_20220326_221612.log.json) | - |
## Citation

```latex
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
_base_ = './maskformer_r50_mstrain_16x1_75e_coco.py'

pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa
depths = [2, 2, 18, 2]
model = dict(
backbone=dict(
_delete_=True,
type='SwinTransformer',
pretrain_img_size=384,
embed_dims=192,
patch_size=4,
window_size=12,
mlp_ratio=4,
depths=depths,
num_heads=[6, 12, 24, 48],
qkv_bias=True,
qk_scale=None,
drop_rate=0.,
attn_drop_rate=0.,
drop_path_rate=0.3,
patch_norm=True,
out_indices=(0, 1, 2, 3),
with_cp=False,
convert_weights=True,
init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
panoptic_head=dict(
in_channels=[192, 384, 768, 1536], # pass to pixel_decoder inside
pixel_decoder=dict(
_delete_=True,
type='PixelDecoder',
norm_cfg=dict(type='GN', num_groups=32),
act_cfg=dict(type='ReLU')),
enforce_decoder_input_project=True))

# weight_decay = 0.01
# norm_weight_decay = 0.0
# embed_weight_decay = 0.0
embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
norm_multi = dict(lr_mult=1.0, decay_mult=0.0)
custom_keys = {
'norm': norm_multi,
'absolute_pos_embed': embed_multi,
'relative_position_bias_table': embed_multi,
'query_embed': embed_multi
}

# optimizer
optimizer = dict(
type='AdamW',
lr=6e-5,
weight_decay=0.01,
eps=1e-8,
betas=(0.9, 0.999),
paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))

# learning policy
lr_config = dict(
policy='step',
gamma=0.1,
by_epoch=True,
step=[250],
warmup='linear',
warmup_by_epoch=False,
warmup_ratio=1e-6,
warmup_iters=1500)
runner = dict(type='EpochBasedRunner', max_epochs=300)
12 changes: 12 additions & 0 deletions configs/maskformer/metafile.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,3 +29,15 @@ Models:
Metrics:
PQ: 46.9
Weights: https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_r50_mstrain_16x1_75e_coco/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth
- Name: maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco
In Collection: MaskFormer
Config: configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py
Metadata:
Training Memory (GB): 27.2
Epochs: 300
Results:
- Task: Panoptic Segmentation
Dataset: COCO
Metrics:
PQ: 53.2
Weights: https://download.openmmlab.com/mmdetection/v2.0/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco_20220326_221612-061b4eb8.pth

0 comments on commit a9f144e

Please sign in to comment.