Release mask2former (open-mmlab#7595)
* Release checkpoints of Mask2Former and MaskFormer
* Update config link

Co-authored-by: luochunhua <luochunhua1996@outlook.com>
Showing 5 changed files with 197 additions and 4 deletions.
configs/mask2former/README.md
@@ -0,0 +1,55 @@
# Mask2Former

> [Masked-attention Mask Transformer for Universal Image Segmentation](http://arxiv.org/abs/2112.01527)

<!-- [ALGORITHM] -->
## Abstract

Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).

<div align=center>
<img src="https://camo.githubusercontent.com/455d3116845b1d580b1f8a8542334b9752fdf39364deee2951cdd231524c7725/68747470733a2f2f626f77656e63303232312e6769746875622e696f2f696d616765732f6d61736b666f726d657276325f7465617365722e706e67" height="300"/>
</div>

## Introduction
Mask2Former requires the COCO and [COCO-panoptic](http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip) datasets for training and evaluation. Download and extract them into the COCO dataset path so that the directory layout matches the tree below (a minimal download sketch follows the tree).
```none
mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   │   ├── instances_train2017.json
│   │   │   ├── instances_val2017.json
│   │   │   ├── panoptic_train2017.json
│   │   │   ├── panoptic_train2017
│   │   │   ├── panoptic_val2017.json
│   │   │   ├── panoptic_val2017
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
```
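
If the panoptic annotations are not present yet, a minimal Python sketch along the following lines can fetch and unpack them. The `data/coco` root is taken from the tree above; the archive's internal layout is an assumption and may need manual adjustment afterwards.

```python
import urllib.request
import zipfile
from pathlib import Path

# Annotation URL from the COCO-panoptic link above.
URL = ('http://images.cocodataset.org/annotations/'
       'panoptic_annotations_trainval2017.zip')

coco_root = Path('data/coco')  # assumed dataset root, matching the tree above
coco_root.mkdir(parents=True, exist_ok=True)
zip_path = coco_root / 'panoptic_annotations_trainval2017.zip'

if not zip_path.exists():
    urllib.request.urlretrieve(URL, str(zip_path))  # large download, run once

with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(coco_root)
# Note: the extracted layout may still need renaming/unzipping of the
# mask archives to match the directory tree above exactly.
```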

## Results and Models

| Backbone | style | Pretrain | Lr schd | Mem (GB) | Inf time (fps) | PQ | box mAP | mask mAP | Config | Download |
| :------: | :---: | :------: | :-----: | :------: | :------------: | :--: | :-----: | :------: | :----: | :------: |
| R-50 | pytorch | ImageNet-1K | 50e | 13.9 | - | 51.9 | 44.8 | 41.9 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former/mask2former_r50_lsj_8x2_50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516.log.json) |
| Swin-T | - | ImageNet-1K | 50e | 15.9 | - | 53.4 | 46.3 | 43.4 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553-c92f921c.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553.log.json) |
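
As a quick sanity check of a released checkpoint, the standard mmdet 2.x inference API can be used. This is a minimal sketch; the local file names are assumptions, and `demo.jpg` is a placeholder image.

```python
from mmdet.apis import inference_detector, init_detector

# Assumed local copies of the config/model links from the table above.
config_file = 'mask2former_r50_lsj_8x2_50e_coco.py'
checkpoint_file = 'mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'demo.jpg')
# The result format depends on the task heads (boxes/masks/panoptic);
# inspect it before further processing.
```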

## Citation

```latex
@article{cheng2021mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  journal={arXiv},
  year={2021}
}
```
configs/mask2former/metafile.yml
@@ -0,0 +1,59 @@
```yaml
Collections:
  - Name: Mask2Former
    Metadata:
      Training Data: COCO
      Training Techniques:
        - AdamW
        - Weight Decay
      Training Resources: 8x A100 GPUs
      Architecture:
        - Mask2Former
    Paper:
      URL: https://arxiv.org/pdf/2112.01527
      Title: 'Masked-attention Mask Transformer for Universal Image Segmentation'
    README: configs/mask2former/README.md
    Code:
      URL: https://github.com/open-mmlab/mmdetection/blob/v2.23.0/mmdet/models/detectors/mask2former.py#L7
      Version: v2.23.0

Models:
  - Name: mask2former_r50_lsj_8x2_50e_coco
    In Collection: Mask2Former
    Config: configs/mask2former/mask2former_r50_lsj_8x2_50e_coco.py
    Metadata:
      Training Memory (GB): 13.9
      Iterations: 368750
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 44.8
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 41.9
      - Task: Panoptic Segmentation
        Dataset: COCO
        Metrics:
          PQ: 51.9
    Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_r50_lsj_8x2_50e_coco/mask2former_r50_lsj_8x2_50e_coco_20220326_224516-0091ce2b.pth
  - Name: mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco
    In Collection: Mask2Former
    Config: configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py
    Metadata:
      Training Memory (GB): 15.9
      Iterations: 368750
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 46.3
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 43.4
      - Task: Panoptic Segmentation
        Dataset: COCO
        Metrics:
          PQ: 53.4
    Weights: https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_20220326_224553-c92f921c.pth
```
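
Because the metafile is plain YAML, it can be consumed programmatically. A small sketch, assuming PyYAML is available and that the file sits at the usual `configs/<model>/metafile.yml` location:

```python
import yaml

# Path is an assumption based on the usual configs/<model>/metafile.yml layout.
with open('configs/mask2former/metafile.yml') as f:
    meta = yaml.safe_load(f)

# List each released model with its checkpoint URL.
for model in meta['Models']:
    print(model['Name'], '->', model['Weights'])
```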
configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py (67 additions, 0 deletions)
@@ -0,0 +1,67 @@
```python
_base_ = './maskformer_r50_mstrain_16x1_75e_coco.py'

pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth'  # noqa
depths = [2, 2, 18, 2]
model = dict(
    backbone=dict(
        _delete_=True,
        type='SwinTransformer',
        pretrain_img_size=384,
        embed_dims=192,
        patch_size=4,
        window_size=12,
        mlp_ratio=4,
        depths=depths,
        num_heads=[6, 12, 24, 48],
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.3,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        with_cp=False,
        convert_weights=True,
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    panoptic_head=dict(
        in_channels=[192, 384, 768, 1536],  # pass to pixel_decoder inside
        pixel_decoder=dict(
            _delete_=True,
            type='PixelDecoder',
            norm_cfg=dict(type='GN', num_groups=32),
            act_cfg=dict(type='ReLU')),
        enforce_decoder_input_project=True))

# weight_decay = 0.01
# norm_weight_decay = 0.0
# embed_weight_decay = 0.0
embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
norm_multi = dict(lr_mult=1.0, decay_mult=0.0)
custom_keys = {
    'norm': norm_multi,
    'absolute_pos_embed': embed_multi,
    'relative_position_bias_table': embed_multi,
    'query_embed': embed_multi
}

# optimizer
optimizer = dict(
    type='AdamW',
    lr=6e-5,
    weight_decay=0.01,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    gamma=0.1,
    by_epoch=True,
    step=[250],
    warmup='linear',
    warmup_by_epoch=False,
    warmup_ratio=1e-6,
    warmup_iters=1500)
runner = dict(type='EpochBasedRunner', max_epochs=300)
```
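
For reference, a rough sketch of launching training on this config from Python follows; the usual route is `tools/train.py` or `tools/dist_train.sh`. The `64x1` in the file name indicates the reference schedule used 64 GPUs with 1 image each, so the single-GPU settings below, the work directory, and the seed handling are all assumptions for illustration.

```python
from mmcv import Config
from mmdet.apis import train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector

cfg = Config.fromfile(
    'configs/maskformer/maskformer_swin-l-p4-w12_mstrain_64x1_300e_coco.py')
cfg.work_dir = './work_dirs/maskformer_swin-l'  # assumed output directory
cfg.gpu_ids = [0]  # single GPU for illustration; reference run used 64
cfg.seed = 0       # fields normally filled in by tools/train.py

model = build_detector(cfg.model)
model.init_weights()
datasets = [build_dataset(cfg.data.train)]
train_detector(model, datasets, cfg, distributed=False, validate=True)
```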