More documentation on VLM reasoning and the real-world experiments is on the way. The README still needs substantial cleanup and updates.
- [2024-10-17] Installation for Hardware Integration / 3D Printing updated.
- [2024-10-15] Installation for Robotics Software updated.
- [2024-10-11] Made public.
This is the official implementation of FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction
Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang
FusionSense is a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. It enables visually and geometrically accurate scene and object reconstruction, even for conventionally challenging objects.
We used a depth camera mounted on a ROS2-powered robot arm to acquire images with accurate pose information. We also used a tactile sensor for Active Touch Selection.
If you have no need for this part, feel free to jump to Step 1 for the 3D Gaussian pipeline of Robust Global Shape Representation and Local Geometric Optimization.
- For installing robotics software, please see Robotics Software Installation.
- For hardware integration, please see 3D Printing Instructions.
Note: Because our major dependencies, Nerfstudio and Grounded-SAM-2, officially support two different CUDA versions (11.8 vs. 12.1), we have to create two separate environments. We hope to resolve this once Nerfstudio bumps its officially supported CUDA version.
git clone --recursive https://github.com/ai4ce/FusionSense.git
cd FusionSense
conda env create -f config.yml
conda activate fusionsense
Install compatible PyTorch and cuda-toolkit versions:
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
Install tinycudann:
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
Build the environment:
pip install -e .
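Optionally, you can sanity-check the environment before moving on. This is a minimal sketch, assuming the CUDA 11.8 PyTorch build installed above and the standard tinycudann import name:
# quick environment check (optional)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import tinycudann as tcnn; print('tinycudann OK')"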
We use Grounded-SAM-2 for segmenting the foreground and background. Please make sure to use our modified submodule. We recommend creating a separate conda environment, since Grounded-SAM-2 requires CUDA 12.1, which is not yet officially supported by Nerfstudio.
cd Grounded-SAM2-for-masking
cd checkpoints
bash download_ckpts.sh
cd ../gdino_checkpoints
bash download_ckpts.sh
conda create -n G-SAM-2
conda activate G-SAM-2
conda install pip
conda install opencv supervision transformers
pip install torch torchvision torchaudio
# select cuda version 12.1
export CUDA_HOME=/path/to/cuda-12.1/
# install Segment Anything 2
pip install -e .
# install Grounding DINO
pip install --no-build-isolation -e grounding_dino
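Optionally, verify this second environment as well. A minimal sketch, assuming the packages install under their usual import names sam2 and groundingdino:
# quick environment check (optional)
python -c "import torch; print(torch.version.cuda)"
python -c "import sam2, groundingdino; print('Grounded-SAM-2 imports OK')"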
For further installation problems:
- For dn-splatter, see its Installation instructions.
- For Grounded-SAM2-for-masking, see its Installation instructions.
Set train.txt with the IDs of the images you want to use for training.
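For illustration, here is a minimal sketch of generating train.txt. It assumes the file simply lists one image ID per line; the IDs and the exact format expected by the pipeline may differ in your setup.
# hypothetical helper, not part of the repo
from pathlib import Path

dataset = Path("datasets/ds_name")            # dataset root, as in the layout below
chosen = ["rgb_1", "rgb_4", "rgb_7"]          # hypothetical sparse views picked by hand

# Assumption: train.txt lists one image ID per line.
with open(dataset / "train.txt", "w") as f:
    f.write("\n".join(chosen) + "\n")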
Switch your conda environment first. Then set your scene path and a prompt text that ends with a '.',
e.g. 'transparent white statue.'
conda activate G-SAM-2
cd Grounded-SAM2-for-masking
python grounded_sam2_hf_model_imgs_MaskExtract.py --path {ABSOLUTE_PATH} --text {TEXT_PROMPT_FOR_TARGET_OBJ}
cd ..
Run the script above to extract masks. If num_no_detection is not 0, you need to select the frames again. You will then find the mask images in /masks, and you can check the /annotated frames to inspect the results more directly.
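A quick way to sanity-check the extraction output is sketched below. It assumes the masks/ and annotated/ folders sit directly under the scene path passed via --path:
# hypothetical check, not part of the repo
from pathlib import Path

scene = Path("/absolute/path/to/scene")        # same path passed via --path
masks = list((scene / "masks").glob("*"))
annotated = list((scene / "annotated").glob("*"))
print(f"{len(masks)} mask images, {len(annotated)} annotated frames")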
You can change configs here: configs/config.py
conda activate fusionsense
python scripts/train.py --data_name {DATASET_NAME} --model_name {MODEL_NAME} --configs {CONFIG_PATH}
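For example, with the dataset layout shown below and assuming configs/config.py is the config file referenced above (the model name here is a placeholder):
python scripts/train.py --data_name ds_name --model_name my_model --configs configs/config.py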
To render JPEG or MP4 outputs using Nerfstudio, we recommend installing ffmpeg in your conda environment:
conda install -c conda-forge x264=='1!161.3030' ffmpeg=4.3.2
To render outputs of pretrained models:
python scripts/render_video.py camera-path --load_config your-model-config --camera_path_filename camera_path.json --rendered_output_names rgb depth normal
More details can be found in the Nerfstudio ns-render documentation.
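For example, using the output layout shown further below (the camera path file is one you export yourself, e.g. from the Nerfstudio viewer; names here are placeholders):
python scripts/render_video.py camera-path --load_config outputs/ds_name/config.yml --camera_path_filename camera_path.json --rendered_output_names rgb depth normal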
datasets/
    ds_name/
    │
    ├── transforms.json   # needed for training
    │
    ├── train.txt
    │
    ├── images/
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── realsense_depth/
    │   ├── depth_1.png
    │   └── depth_2.png
    │
    ├── tactile/
    │   ├── image
    │   ├── mask
    │   ├── normal
    │   └── patch
    │
    ├── model.stl          # needed for evaluation
    │
    ├── normals_from_pretrain/   # generated
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── foreground_pcd.ply
    │
    └── merged_pcd.ply
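For reference, transforms.json is expected in a Nerfstudio-style format. The sketch below is only an assumption about the minimal fields (shared intrinsics plus per-frame poses and depth paths), not the authoritative schema used by this repo:
# hypothetical illustration of a Nerfstudio-style transforms.json
import json

transforms = {
    # shared camera intrinsics (placeholder values)
    "fl_x": 600.0, "fl_y": 600.0,
    "cx": 320.0, "cy": 240.0,
    "w": 640, "h": 480,
    "frames": [
        {
            "file_path": "images/rgb_1.png",
            # assumption: per-frame depth path in the dn-splatter style
            "depth_file_path": "realsense_depth/depth_1.png",
            # 4x4 camera-to-world matrix from the robot arm's pose (identity as a placeholder)
            "transform_matrix": [[1, 0, 0, 0],
                                 [0, 1, 0, 0],
                                 [0, 0, 1, 0],
                                 [0, 0, 0, 1]],
        },
    ],
}

with open("datasets/ds_name/transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)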
outputs/
    ds_name/
    │
    ├── MESH/
    │   └── mesh.ply
    │
    ├── nerfstudio_models/
    │   └── 30000.ckpt
    │
    ├── cluster_centers.npy
    │
    ├── config.yml
    │
    ├── high_grad_pts.pcd
    │
    ├── high_grad_pts_ascii.pcd
    │
    └── dataparser_transforms.json
eval/
    ds_name/   *evaluation result files*