CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion
We use PyTorch 1.9.0 with CUDA 11.1. To build the environment, install the following packages:

```
matplotlib
opencv-python
plyfile
trimesh>=2.35.39,<2.35.40
networkx>=2.2,<2.3
tqdm
ninja
easydict
argparse
h5py
scipy
```
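A sketch of a full environment build is below. The conda environment name `cvsformer` is our choice, not from the repo; the PyTorch/CUDA versions are the tested combination stated above.

```shell
# Conda env name "cvsformer" is an assumption, not from the repo.
conda create -n cvsformer python=3.8 -y
conda activate cvsformer
# Tested combination: PyTorch 1.9.0 + CUDA 11.1 (torchvision 0.10.0 pairs with torch 1.9.0)
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
pip install matplotlib opencv-python plyfile 'trimesh>=2.35.39,<2.35.40' \
    'networkx>=2.2,<2.3' tqdm ninja easydict argparse h5py scipy
```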
- For NYU, download the dataset from https://github.com/charlesCXK/TorchSSC.
- For NYUCAD, download the dataset from https://github.com/UniLauX/PALNet.
You can train your own 2D segmentation network to obtain the 2D input of CVSformer. We pre-train DeepLabv3 for 1,000 epochs to segment the RGB images.
To compile the CUDA extensions, run:

```
$ cd extensions
$ python setup.py install
```

Then launch distributed training, with `$NGPUS` set to the number of GPUs:

```
$ python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py
```
You can obtain our SOTA model from https://pan.xunlei.com/s/VN_rkyLfy43RBn1BfXwrTJtGA1?pwd=qumh#.
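The tables below report scene completion (SC) IoU and semantic scene completion (SSC) mIoU. As a minimal pure-Python sketch of how these two metrics are computed: SC IoU treats all occupied voxels as a single class, while SSC mIoU averages per-class IoU over the semantic categories. Real SSC evaluation additionally masks out unobserved voxels, which we omit here.

```python
def sc_iou(pred, gt):
    """Scene completion IoU: binary occupancy, class id 0 = empty voxel."""
    inter = sum(1 for p, g in zip(pred, gt) if p > 0 and g > 0)
    union = sum(1 for p, g in zip(pred, gt) if p > 0 or g > 0)
    return inter / union if union else 0.0

def ssc_miou(pred, gt, num_classes):
    """Semantic scene completion mIoU: mean IoU over semantic classes 1..C."""
    ious = []
    for c in range(1, num_classes + 1):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```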
NYU

| Method | SC IoU | SSC mIoU |
| --- | --- | --- |
| Sketch | 71.3 | 41.1 |
| SemanticFu | 73.1 | 42.8 |
| FFNet | 71.8 | 44.4 |
| SISNet(voxel) | 70.8 | 45.6 |
| PCANet | 78.9 | 48.9 |
| SISNet(voxel) | 78.2 | 52.4 |
| CVSformer | 73.7 | 52.6 |
NYUCAD

| Method | SC IoU | SSC mIoU |
| --- | --- | --- |
| Sketch | 84.2 | 55.2 |
| SemanticFu | 84.8 | 57.2 |
| FFNet | 85.5 | 57.4 |
| SISNet(voxel) | 82.8 | 57.4 |
| PCANet | 84.3 | 59.6 |
| SISNet(voxel) | 86.3 | 63.5 |
| CVSformer | 86.0 | 63.9 |