yingfei1016/CMT
Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

This repository is an official implementation of CMT.


CMT is a robust 3D detector for end-to-end 3D multi-modal detection. A DETR-like framework is designed for multi-modal detection (CMT) and lidar-only detection (CMT-L), which obtain 73.0% and 70.1% NDS, respectively, on the nuScenes benchmark. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. CMT can be a strong baseline for further research.
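
For readers unfamiliar with the metric, NDS (nuScenes Detection Score) is the benchmark's composite score: it weights mAP by 5 and adds five true-positive terms (translation, scale, orientation, velocity and attribute error), each clipped to at most 1 and converted to a score, then divides by 10. The sketch below illustrates that formula only. It is not part of the CMT code, the function and argument names are hypothetical, and the example values are made up rather than taken from any CMT run.

```python
from typing import Sequence


def nuscenes_detection_score(mean_ap: float, tp_errors: Sequence[float]) -> float:
    """Combine mAP with the five mean true-positive errors into NDS.

    tp_errors holds mATE, mASE, mAOE, mAVE and mAAE (order does not matter).
    All names here are illustrative, not taken from the CMT repository.
    """
    if len(tp_errors) != 5:
        raise ValueError("NDS expects exactly five true-positive error values")
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each true-positive score weight 1; total weight is 10.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


if __name__ == "__main__":
    # Made-up inputs, purely to show the call.
    example = nuscenes_detection_score(0.5, [0.3, 0.25, 0.4, 0.3, 0.2])
    print(f"NDS = {example:.3f}")
```

Because mAP accounts for half of the total weight, two models with similar mAP can still differ in NDS through their box-quality errors.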

Preparation

Main Results

We provide some results on the nuScenes val set. The default batch size is 2 on each GPU.

| config | mAP | NDS | GPU | schedule | time |
| --- | --- | --- | --- | --- | --- |
| CMT-pillar0200-r50-704x256 | 53.8% | 58.5% | 8 x 2080ti | 20 epoch | 13 hours |
| CMT-voxel0100-r50-800x320 | 60.1% | 63.4% | 8 x 2080ti | 20 epoch | 14 hours |
| CMT-voxel0075-vov-1600x640 | 69.4% | 71.9% | 8 x A100 | 15e+5e (with cbgs) | 45 hours |
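
The config names above appear to follow a fixed pattern, so a small helper can split them into fields when comparing runs. The sketch below is a hypothetical utility, not part of this repository. It assumes the four digits after `pillar`/`voxel` are a grid size in millimetres (so `0200` would mean 0.2 m), that the third field names the image backbone, and that the last field is the image width x height. None of these readings is confirmed by the README. It also computes the total batch size implied by 2 samples per GPU.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the three config names in the table (an assumption).
_CONFIG_RE = re.compile(r"^CMT-(pillar|voxel)(\d{4})-([A-Za-z0-9]+)-(\d+)x(\d+)$")


@dataclass(frozen=True)
class ParsedConfig:
    lidar_encoding: str   # "pillar" or "voxel"
    grid_size_m: float    # assumed: digits are millimetres
    image_backbone: str   # e.g. "r50" or "vov", kept verbatim
    image_width: int
    image_height: int


def parse_config_name(name: str) -> ParsedConfig:
    """Split a config name such as 'CMT-voxel0100-r50-800x320' into fields."""
    match = _CONFIG_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised config name: {name!r}")
    encoding, digits, backbone, width, height = match.groups()
    return ParsedConfig(encoding, int(digits) / 1000, backbone, int(width), int(height))


def total_batch_size(num_gpus: int, per_gpu: int = 2) -> int:
    """Total batch size for the stated default of 2 samples per GPU."""
    return num_gpus * per_gpu


if __name__ == "__main__":
    print(parse_config_name("CMT-voxel0075-vov-1600x640"))
    print(total_batch_size(8))
```

With the 8-GPU setups listed in the table and the default of 2 samples per GPU, the total batch size works out to 16.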

Contact

If you have any questions, feel free to open an issue or contact us at yanjunjie@megvii.com, liuyingfei@megvii.com, sunjianjian@megvii.com or wangtiancai@megvii.com.
