More documentation for VLM Reasoning and real-world experiments is ongoing. The README still needs a lot of cleaning and updating.
🆕 [2024-10-17] Installation for Hardware Integration/3D Printing Updated.
🆕 [2024-10-15] Installation for Robotics Software Updated.
🆕 [2024-10-11] Made Public
This is the official implementation of FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction
Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang
FusionSense is a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. It enables visually and geometrically accurate scene and object reconstruction, even for conventionally challenging objects.
This repo has been tested on Ubuntu 20.04 and 22.04. The real-world experiment is conducted on 22.04, as ROS2 Humble requires it.
We used a depth camera mounted on a robot arm powered by ROS2 to acquire pictures with accurate pose information. We also used a tactile sensor for Active Touch Selection. If you do not need this part, feel free to jump to Step 1 for the 3D Gaussian pipeline of Robust Global Shape Representation and Local Geometric Optimization.
- For installing robotics software, please see Robotics Software Installation.
- For hardware integration, please see 3D Printing Instructions.
Note: ROS2 generally does not play well with Conda. See the official doc and this issue in the ROS2 repo. As a result, in this project we limited the direct interaction between ROS2 and the Python perception modules.
We will need two independent virtual environments due to compatibility issues.
Please see DN-Splatter and Metric3D Installation
Please see Grounded-SAM-2
Set train.txt with the IDs of the images you want to use for training (see the sketch below).
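As a rough illustration only (we assume here that train.txt lists one image ID per line, matching the rgb_{i}.png naming in the dataset layout further down; check your own data before relying on this), the split file could be written like so:

```python
# Hypothetical helper: write the sparse-view training split.
# Assumption: train.txt lists one image ID per line (e.g. the i in images/rgb_{i}.png).
train_ids = [1, 2, 5, 9]  # example selection of views

with open("datasets/ds_name/train.txt", "w") as f:
    for i in train_ids:
        f.write(f"{i}\n")
```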
Switch your conda environment first, then set your scene path and a text prompt ending with a '.', e.g. 'transparent white statue.'
conda activate G-SAM-2
cd Grounded-SAM2-for-masking
python grounded_sam2_hf_model_imgs_MaskExtract.py --path {ABSOLUTE_PATH} --text {TEXT_PROMPT_FOR_TARGET_OBJ}
cd ..
Run the script above to extract masks. If num_no_detection is not 0, you need to select the frames again. You will then see the mask images in /masks, and you can check the frames in /annotated to inspect the results more directly.
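For a quick sanity check of the extraction, a small sketch like the following can be used (it assumes the masks and annotated frames are written as image files directly under masks/ and annotated/ inside your scene path):

```python
import os

scene_path = "/path/to/your/scene"  # the same --path passed to the masking script

# Count the extracted masks and annotated frames (assumed to be image files).
masks = [f for f in os.listdir(os.path.join(scene_path, "masks")) if f.endswith(".png")]
annotated = [f for f in os.listdir(os.path.join(scene_path, "annotated")) if f.lower().endswith((".png", ".jpg"))]
print(f"{len(masks)} masks, {len(annotated)} annotated frames")
```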
You can change configs here: configs/config.py
conda activate fusionsense
python scripts/train.py --data_name {DATASET_NAME} --model_name {MODEL_NAME} --configs {CONFIG_PATH}
To render JPEG or MP4 outputs using nerfstudio, we recommend installing ffmpeg in the conda environment:
conda install -c conda-forge x264=='1!161.3030' ffmpeg=4.3.2
To render outputs of pretrained models:
python scripts/render_video.py camera-path --load_config your-model-config --camera_path_filename camera_path.json --rendered_output_names rgb depth normal
More details can be found in the nerfstudio ns-render documentation.
datasets/
ds_name/
│
├── transforms.json # need for training
│
├── train.txt
│
├── images/
│ ├── rgb_1.png
│ └── rgb_2.png
│
├── realsense_depth/
│ ├── depth_1.png
│ └── depth_2.png
│
├── tactile/
│ ├── image
│ ├── mask
│ ├── normal
│ └── patch
│
├── model.stl # need for evaluation
│
├── normals_from_pretrain/ # generated
│ ├── rgb_1.png
│ └── rgb_2.png
│
├── foreground_pcd.ply
│
└── merged_pcd.ply
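Before training, it can help to verify that a scene folder follows this layout. Below is a minimal sketch (assuming the folder names shown above; normals_from_pretrain/, foreground_pcd.ply, and merged_pcd.ply are generated later, so only the inputs are checked):

```python
from pathlib import Path

def check_dataset(ds_root: str) -> None:
    """Check that the input files needed for training are present."""
    root = Path(ds_root)
    required = [
        "transforms.json",   # camera poses, needed for training
        "train.txt",         # training split
        "images",            # RGB frames
        "realsense_depth",   # depth frames
        "tactile",           # tactile data (image / mask / normal / patch)
        "model.stl",         # needed for evaluation
    ]
    for name in required:
        status = "ok" if (root / name).exists() else "MISSING"
        print(f"{status:8s} {name}")

check_dataset("datasets/ds_name")
```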
outputs/
ds_name/
│
├── MESH/
│ └── mesh.ply
│
├── nerfstudio_models/
│ └── 30000.ckpt
│
├── cluster_centers.npy
│
├── config.yml
│
├── high_grad_pts.pcd
│
├── high_grad_pts_ascii.pcd
│
└── dataparser_transforms.json
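To quickly inspect a trained model's outputs, a sketch like the following can be used (assuming Open3D and NumPy are installed; file names follow the output layout above):

```python
import numpy as np
import open3d as o3d

out = "outputs/ds_name"

mesh = o3d.io.read_triangle_mesh(f"{out}/MESH/mesh.ply")          # extracted mesh
pts = o3d.io.read_point_cloud(f"{out}/high_grad_pts_ascii.pcd")   # high-gradient points
centers = np.load(f"{out}/cluster_centers.npy")                   # cluster centers

print(mesh)
print(pts)
print("cluster centers:", centers.shape)

o3d.visualization.draw_geometries([mesh, pts])
```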
eval/
ds_name/   # evaluation result files