DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

Daoyi Gao, Dávid Rozenberszki, Stefan Leutenegger, and Angela Dai

DiffCAD is a weakly-supervised approach for CAD model retrieval and alignment from an RGB image. Our approach uses disentangled diffusion models to tackle the ambiguities of monocular perception, and achieves robust cross-domain performance while being trained only on synthetic data.

Environment

We tested with Ubuntu 20.04, Python 3.8, CUDA 11, and PyTorch 2.0.

Dependencies

We provide an Anaconda environment with the dependencies. To install it, run:

conda env create -f env.yaml
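
After activating the environment created from env.yaml, a quick sanity check can confirm that the interpreter and PyTorch build roughly match the tested setup. The snippet below is a minimal sketch, not part of the repository: the helper name `report_environment` is hypothetical, and it only reports versions rather than enforcing them.

```python
import importlib
import platform


def report_environment(modules=("torch",)):
    """Collect the Python version and the versions of the given modules, if importable."""
    info = {"python": platform.python_version()}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            info[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            info[name] = None  # not installed in the active environment
    if info.get("torch"):
        import torch

        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, the environment is most likely not activated.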

Available Resources

Data

We provide our synthetic 3D-FRONT renderings (RGB, rendered/predicted depth, masks, camera poses); processed watertight (mesh-fusion) and canonicalized meshes (ShapeNet and 3D-FUTURE) together with their encoded latent vectors; and machine-estimated depth and masks on the validation set of the ScanNet25k data. Since the rendered data takes up a large amount of storage, we also encourage you to generate the synthetic renderings yourself following BlenderProc or 3DFront-Rendering.

Source Dataset	Description
3D-FRONT-CONFIG	Scene configs for rendering, which we also augment with ShapeNet objects.
3D-FRONT-RENDERING	Renderings of the 3D-FRONT dataset for each target category.
Object Meshes	Canonicalized, watertight meshes from ShapeNet and 3D-FUTURE.
Object Meshes - AUG	ShapeNet objects scaled by the scale of their nearest-neighbor (NN) 3D-FUTURE (3DF) object, which we use to augment the synthetic dataset.
Object Latents	Encoded object latents for retrieval.
Val ScanNet25k	Predicted depth, GT and predicted masks, CAD pools, and GT poses on the validation set.
ScanNet25k data	The processed data from ROCA.

Pretrained Checkpoint

We also provide checkpoints for the scene scale, object pose, and shape diffusion models.

Checkpoint	Description
Scale	Joint-category latent diffusion model (LDM)
Pose	Category-specific LDM
Shape	Category-specific LDM

Training

For scene scale:

python train_scale.py --base=configs/scale/depth_feat.yaml -t --gpus=0, --logdir=logs

For object NOCs:

python train_pose.py --base=configs/pose/depth_gcn.yaml -t --gpus=0, --logdir=logs

For object latents:

python train_shape.py --base=configs/shape/nocs_embed.yaml -t --gpus=0, --logdir=logs
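
The three training commands share the same flags and differ only in script and config, so they can be assembled from a small table. The following is a sketch under assumptions: the helper names `build_train_command` and `run_all` are hypothetical, the stage order simply follows the list above (the README does not state any dependency between stages), and `dry_run=True` only prints the commands.

```python
import subprocess
import sys

# Script and config pairs copied from the training commands above.
TRAIN_JOBS = {
    "scale": ("train_scale.py", "configs/scale/depth_feat.yaml"),
    "pose": ("train_pose.py", "configs/pose/depth_gcn.yaml"),
    "shape": ("train_shape.py", "configs/shape/nocs_embed.yaml"),
}


def build_train_command(stage, gpus="0,", logdir="logs"):
    """Return the argument list for one training stage."""
    script, config = TRAIN_JOBS[stage]
    return [
        sys.executable,
        script,
        f"--base={config}",
        "-t",
        f"--gpus={gpus}",  # trailing comma kept exactly as in the commands above
        f"--logdir={logdir}",
    ]


def run_all(stages=("scale", "pose", "shape"), dry_run=True):
    """Print each training command, and execute it unless dry_run is set."""
    for stage in stages:
        cmd = build_train_command(stage)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing stage


if __name__ == "__main__":
    run_all(dry_run=True)
```

Run it from the repository root so that the relative script and config paths resolve.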

Inference

For scene scale sampling:

python scripts/generate_multi_scale_candidates.py --config_path weights/scale/scale.yaml --model_path weights/scale/scale.ckpt --data_path datasets/Scan2CAD --split_path splits_redo/scale/val_joint.txt --outdir output_redo --num_iters 10 --mask SEEM_redo

For object NOCs generation (per category):

python scripts/generate_multi_nocs_candidates.py --category 02818832 --config_path weights/pose/pose.yaml --model_path weights/pose/02818832.ckpt --data_path datasets/Scan2CAD --split_path splits/pose/02818832/val_nonocc_centroid_maskexist.txt --outdir output/nocs --num_iters 5 --pred_scale_dir output/scale/predictions.json

For alignment (per category):

python scripts/alignment_from_nocs.py --category 02818832 --prediction_path output/nocs/02818832 --pose_gt_root datasets/Scan2CAD/val_pose_gt/scan2cad_val_02818832.json --mesh_root /project/3dlg-hcvc/diorama/diffcad/object_meshes/02818832 --split_path splits/pose/02818832/val_nonocc_centroid_maskexist.txt --outdir output/pose --num_iters 5

For object latent sampling:

python scripts/generate_multi_shape_candidates.py --category 02818832 --config_path weights/shape/shape.yaml --model_path weights/shape/02818832.ckpt --data_path datasets/Scan2CAD --ply_path output/nocs/02818832 --split_path splits/shape/02818832/val_nonocc_centroid_maskexist.txt --outdir output/shape --num_iters 5 --latent_root /project/3dlg-hcvc/diorama/diffcad/object_latents/02818832/latents_train

For evaluation:

python scripts/eval_alignments.py --category 02818832 --prediction_path output/pose/pose_predictions_02818832.json --data_path datasets/Scan2CAD --pose_gt_path datasets/Scan2CAD/val_pose_gt/scan2cad_val_02818832.json --split_path splits/pose/02818832/val_02818832.txt --mesh_data_path /project/3dlg-hcvc/diorama/diffcad/object_meshes/02818832 --full_annotation_path datasets/Scan2CAD/full_annotations.json
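
The per-category commands above all use 02818832, the ShapeNet synset ID for the bed category. To repeat them for another category, the arguments can be generated from a template. This is a sketch, not part of the repository: `category_commands` is a hypothetical helper, the path patterns are generalized from the bed example (an assumption that may not hold for every category), and `mesh_dir` and `latent_dir` stand for wherever you unpacked the object meshes and latents.

```python
import shlex


def category_commands(category, mesh_dir, latent_dir,
                      data_path="datasets/Scan2CAD", num_iters=5):
    """Build the per-category inference and evaluation commands as argument lists.

    Path patterns are generalized from the 02818832 example commands (an assumption).
    """
    iters = str(num_iters)
    val_split = "val_nonocc_centroid_maskexist.txt"
    pose_split = f"splits/pose/{category}/{val_split}"
    pose_gt = f"{data_path}/val_pose_gt/scan2cad_val_{category}.json"
    nocs_dir = f"output/nocs/{category}"
    return {
        "nocs": [
            "python", "scripts/generate_multi_nocs_candidates.py",
            "--category", category,
            "--config_path", "weights/pose/pose.yaml",
            "--model_path", f"weights/pose/{category}.ckpt",
            "--data_path", data_path,
            "--split_path", pose_split,
            "--outdir", "output/nocs",
            "--num_iters", iters,
            "--pred_scale_dir", "output/scale/predictions.json",
        ],
        "alignment": [
            "python", "scripts/alignment_from_nocs.py",
            "--category", category,
            "--prediction_path", nocs_dir,
            "--pose_gt_root", pose_gt,
            "--mesh_root", mesh_dir,
            "--split_path", pose_split,
            "--outdir", "output/pose",
            "--num_iters", iters,
        ],
        "shape": [
            "python", "scripts/generate_multi_shape_candidates.py",
            "--category", category,
            "--config_path", "weights/shape/shape.yaml",
            "--model_path", f"weights/shape/{category}.ckpt",
            "--data_path", data_path,
            "--ply_path", nocs_dir,
            "--split_path", f"splits/shape/{category}/{val_split}",
            "--outdir", "output/shape",
            "--num_iters", iters,
            "--latent_root", latent_dir,
        ],
        "eval": [
            "python", "scripts/eval_alignments.py",
            "--category", category,
            "--prediction_path", f"output/pose/pose_predictions_{category}.json",
            "--data_path", data_path,
            "--pose_gt_path", pose_gt,
            "--split_path", f"splits/pose/{category}/val_{category}.txt",
            "--mesh_data_path", mesh_dir,
            "--full_annotation_path", f"{data_path}/full_annotations.json",
        ],
    }


if __name__ == "__main__":
    # Hypothetical local directories; substitute your own download locations.
    commands = category_commands(
        "02818832",
        "data/object_meshes/02818832",
        "data/object_latents/02818832/latents_train",
    )
    for step, argv in commands.items():
        print(f"# {step}")
        print(shlex.join(argv))
```

The scene scale step is not included because it uses the joint model rather than a per-category one. The NOCs step reads its output through `--pred_scale_dir`, so check that this path matches the `--outdir` you used for scale sampling.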

BibTeX

@article{gao2023diffcad,
title= {DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image},
author={Gao, Daoyi and Rozenberszki, David and Leutenegger, Stefan and Dai, Angela},
journal={arXiv preprint},
year={2023}
}

Reference

Our diffusion code is adapted from the official latent-diffusion implementation.
