More documentation on VLM reasoning and the real-world experiments is on the way. The README still needs substantial cleanup and updates.
- [2024-10-17] Installation for Hardware Integration / 3D Printing updated.
- [2024-10-15] Installation for Robotics Software updated.
- [2024-10-11] Made public.
This is the official implementation of FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction
Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang
FusionSense is a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. It enables visually and geometrically accurate scene and object reconstruction, even for conventionally challenging objects.
We used a depth camera mounted on a ROS2-powered robot arm to acquire images with accurate pose information. We also used a tactile sensor for Active Touch Selection.
If you have no need for this part, feel free to jump to Step 1 for the 3D Gaussian pipeline of Robust Global Shape Representation and Local Geometric Optimization.
- For installing robotics software, please see Robotics Software Installation.
- For hardware integration, please see 3D Printing Instructions.
Note: Because our major dependencies, Nerfstudio and Grounded-SAM-2, officially support two different CUDA versions (11.8 vs. 12.1), we have to create two separate environments. We hope to resolve this once Nerfstudio bumps its officially supported CUDA version.
git clone --recursive https://github.com/ai4ce/FusionSense.git
cd FusionSense
conda env create -f config.yml
conda activate fusionsense
Install compatible PyTorch and cuda-toolkit versions:
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
Install tinycudann:
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
Build the environment:
pip install -e .
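Optionally, you can sanity-check the environment before moving on. This is a minimal sketch, assuming the CUDA 11.8 PyTorch build installed above and the standard tinycudann import name:
# quick environment check (optional)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import tinycudann as tcnn; print('tinycudann OK')"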
We use Grounded-SAM-2 for segmenting the foreground and background. Please make sure to use our modified submodule. We recommend creating a separate conda environment, since Grounded-SAM-2 requires CUDA 12.1, which is not yet officially supported by Nerfstudio.
cd Grounded-SAM2-for-masking
cd checkpoints
bash download_ckpts.sh
cd ../gdino_checkpoints
bash download_ckpts.sh
conda create -n G-SAM-2
conda activate G-SAM-2
conda install pip
conda install opencv supervision transformers
pip install torch torchvision torchaudio
# select cuda version 12.1
export CUDA_HOME=/path/to/cuda-12.1/
# install Segment Anything 2
pip install -e .
# install Grounding DINO
pip install --no-build-isolation -e grounding_dino
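Optionally, verify this second environment as well. A minimal sketch, assuming the packages install under their usual import names sam2 and groundingdino:
# quick environment check (optional)
python -c "import torch; print(torch.version.cuda)"
python -c "import sam2, groundingdino; print('Grounded-SAM-2 imports OK')"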
For further installation problems:
- For dn-splatter, see its Installation instructions.
- For Grounded-SAM2-for-masking, see its Installation instructions.
Set train.txt with the IDs of the images you want to use for training.
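For illustration, here is a minimal sketch of generating train.txt. It assumes the file simply lists one image ID per line; the IDs and the exact format expected by the pipeline may differ in your setup.
# hypothetical helper, not part of the repo
from pathlib import Path

dataset = Path("datasets/ds_name")            # dataset root, as in the layout below
chosen = ["rgb_1", "rgb_4", "rgb_7"]          # hypothetical sparse views picked by hand

# Assumption: train.txt lists one image ID per line.
with open(dataset / "train.txt", "w") as f:
    f.write("\n".join(chosen) + "\n")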
Switch your conda environment first. Then set your scene path and a prompt text that ends with a '.',
e.g. 'transparent white statue.'
conda activate G-SAM-2
cd Grounded-SAM2-for-masking
python grounded_sam2_hf_model_imgs_MaskExtract.py --path {ABSOLUTE_PATH} --text {TEXT_PROMPT_FOR_TARGET_OBJ}
cd ..
Run the script above to extract masks. If num_no_detection is not 0, you need to select the frames again. You will then find the mask images in /masks, and you can check the /annotated frames to inspect the results more directly.
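A quick way to sanity-check the extraction output is sketched below. It assumes the masks/ and annotated/ folders sit directly under the scene path passed via --path:
# hypothetical check, not part of the repo
from pathlib import Path

scene = Path("/absolute/path/to/scene")        # same path passed via --path
masks = list((scene / "masks").glob("*"))
annotated = list((scene / "annotated").glob("*"))
print(f"{len(masks)} mask images, {len(annotated)} annotated frames")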
You can change configs here: configs/config.py
conda activate fusionsense
python scripts/train.py --data_name {DATASET_NAME} --model_name {MODEL_NAME} --configs {CONFIG_PATH}
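For example, with the dataset layout shown below and assuming configs/config.py is the config file referenced above (the model name here is a placeholder):
python scripts/train.py --data_name ds_name --model_name my_model --configs configs/config.py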
To render JPEG or MP4 outputs using Nerfstudio, we recommend installing ffmpeg in your conda environment:
conda install -c conda-forge x264=='1!161.3030' ffmpeg=4.3.2
To render outputs of pretrained models:
python scripts/render_video.py camera-path --load_config your-model-config --camera_path_filename camera_path.json --rendered_output_names rgb depth normal
More details can be found in the Nerfstudio ns-render documentation.
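For example, using the output layout shown further below (the camera path file is one you export yourself, e.g. from the Nerfstudio viewer; names here are placeholders):
python scripts/render_video.py camera-path --load_config outputs/ds_name/config.yml --camera_path_filename camera_path.json --rendered_output_names rgb depth normal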
datasets/
    ds_name/
    │
    ├── transforms.json   # needed for training
    │
    ├── train.txt
    │
    ├── images/
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── realsense_depth/
    │   ├── depth_1.png
    │   └── depth_2.png
    │
    ├── tactile/
    │   ├── image
    │   ├── mask
    │   ├── normal
    │   └── patch
    │
    ├── model.stl          # needed for evaluation
    │
    ├── normals_from_pretrain/   # generated
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── foreground_pcd.ply
    │
    └── merged_pcd.ply
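For reference, transforms.json is expected in a Nerfstudio-style format. The sketch below is only an assumption about the minimal fields (shared intrinsics plus per-frame poses and depth paths), not the authoritative schema used by this repo:
# hypothetical illustration of a Nerfstudio-style transforms.json
import json

transforms = {
    # shared camera intrinsics (placeholder values)
    "fl_x": 600.0, "fl_y": 600.0,
    "cx": 320.0, "cy": 240.0,
    "w": 640, "h": 480,
    "frames": [
        {
            "file_path": "images/rgb_1.png",
            # assumption: per-frame depth path in the dn-splatter style
            "depth_file_path": "realsense_depth/depth_1.png",
            # 4x4 camera-to-world matrix from the robot arm's pose (identity as a placeholder)
            "transform_matrix": [[1, 0, 0, 0],
                                 [0, 1, 0, 0],
                                 [0, 0, 1, 0],
                                 [0, 0, 0, 1]],
        },
    ],
}

with open("datasets/ds_name/transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)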
outputs/
    ds_name/
    │
    ├── MESH/
    │   └── mesh.ply
    │
    ├── nerfstudio_models/
    │   └── 30000.ckpt
    │
    ├── cluster_centers.npy
    │
    ├── config.yml
    │
    ├── high_grad_pts.pcd
    │
    ├── high_grad_pts_ascii.pcd
    │
    └── dataparser_transforms.json
eval/
    ds_name/   *evaluation result files*