Integrates the vision, touch, and common-sense information of foundational models, customized to the agent's perceptual needs.

ai4ce/FusionSense

More documentation is in progress for VLM reasoning and real-world experiments. The README still needs substantial cleanup and updating.

🆕 [2024-10-17] Installation for Hardware Integration/3D Printing Updated.

🆕 [2024-10-15] Installation for Robotics Software Updated.

🆕 [2024-10-11] Made Public

FusionSense

[Page] | [Paper] | [Video]

This is the official implementation of FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang

FusionSense is a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. It enables visually and geometrically accurate scene and object reconstruction, even for conventionally challenging objects.

FusionSense Snapshot

Preparation

This repo has been tested on Ubuntu 20.04 and 22.04. The real-world experiments were conducted on 22.04, as ROS2 Humble requires it.

Step 0: Install Everything Robotics

We used a depth camera mounted on a robot arm powered by ROS2 to acquire pictures with accurate pose information. We also used a tactile sensor for Active Touch Selection.

If you have no need for this part, feel free to jump into Step 1 for the 3D Gaussian pipeline of Robust Global Shape Representation and Local Geometric Optimization.

Note: ROS2 doesn't play well with Conda in general. See the official doc and this issue in the ROS2 repo. As a result, in this project, ROS2 uses the minimal system Python environment and has limited direct interaction with the Python perception modules.

Step 1: Install 3D Gaussian Dependencies

We will need two independent virtual environments because of compatibility issues.

Step 1.1: DN-Splatter and Metric3D

Please see DN-Splatter and Metric3D Installation

Step 1.2: Grounded-SAM-2

Please see Grounded-SAM-2

Usage

0. Prepare Data

You can see here for an example dataset structure.

Note that many of the folders are generated during the pipeline. The data needed to start this project are: images, realsense_depth, tactile, gelsight_transform.json and transforms.json.
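Before starting the pipeline, it can help to confirm that these required inputs are present. The following is a minimal sketch, not part of the repository: the helper name `missing_inputs` is hypothetical, and it only checks for the five names listed above, not their contents.

```python
from pathlib import Path

# Inputs listed above as required before the pipeline starts.
REQUIRED_DIRS = ("images", "realsense_depth", "tactile")
REQUIRED_FILES = ("gelsight_transform.json", "transforms.json")


def missing_inputs(dataset_root):
    """Return the required entries that are absent under dataset_root.

    Hypothetical helper: checks existence only, not file contents.
    """
    root = Path(dataset_root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_inputs("/home/irving/FusionSense_data/transparent_bunny")` returns an empty list when all five entries exist, and otherwise the names that still need to be provided.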

The ROS2 packages I shared can be used to acquire them. Or you can manually format your dataset this way.

In the following documentation, I will assume that the dataset is put under /home/irving/.

1. Extract Mask

Switch your conda env first

conda activate G-SAM-2

Then move into our Grounded-SAM2 submodule

cd Grounded-SAM2-for-masking

Run the script to extract masks, setting your scene path and a text prompt that ends with a '.'. For example: --path /home/irving/FusionSense_data/transparent_bunny --text 'transparent bunny statue.'

python grounded_sam2_hf_model_imgs_MaskExtract.py --path {ABSOLUTE_PATH} --text {TEXT_PROMPT_FOR_TARGET_OBJ}

You will see mask_imgs in the newly created /masks folder, and you can check the /annotated folder to inspect the results more directly.
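The trailing '.' in the text prompt is easy to forget. Below is a minimal sketch of a wrapper that appends it when missing and builds the argument list for the command above. `build_mask_command` is a hypothetical helper, not part of the repository; it uses only the script name and the two flags shown above.

```python
import shlex


def build_mask_command(scene_path, prompt):
    """Build the argv list for the mask-extraction script (hypothetical helper)."""
    prompt = prompt.strip()
    if not prompt.endswith("."):
        prompt += "."  # the prompt text must end with a '.'
    return [
        "python",
        "grounded_sam2_hf_model_imgs_MaskExtract.py",
        "--path", str(scene_path),
        "--text", prompt,
    ]


cmd = build_mask_command(
    "/home/irving/FusionSense_data/transparent_bunny", "transparent bunny statue"
)
print(shlex.join(cmd))  # shell-quoted form of the command
```

The list can also be passed to `subprocess.run(cmd, check=True)`, provided the G-SAM-2 environment is active and the working directory is Grounded-SAM2-for-masking.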

2. Select Frames

List the IDs of the selected images in train.txt.
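The exact layout expected in train.txt is not spelled out here, so the sketch below rests on stated assumptions: one integer ID per line, with IDs taken from the numeric suffix of files named rgb_<id>.png as in the dataset layout. Even spacing is only one illustrative way to choose sparse views, not necessarily the authors' selection method, and both function names are hypothetical.

```python
import re
from pathlib import Path


def select_frame_ids(image_names, num_frames):
    """Pick num_frames roughly evenly spaced IDs from names like rgb_<id>.png."""
    ids = sorted(
        int(m.group(1))
        for name in image_names
        if (m := re.fullmatch(r"rgb_(\d+)\.png", name))
    )
    if num_frames >= len(ids):
        return ids
    if num_frames <= 1:
        return ids[:max(num_frames, 0)]
    step = (len(ids) - 1) / (num_frames - 1)
    return [ids[round(i * step)] for i in range(num_frames)]


def write_train_txt(dataset_root, num_frames):
    """Write the selected IDs to train.txt (ASSUMED format: one ID per line)."""
    root = Path(dataset_root)
    names = [p.name for p in (root / "images").iterdir()]
    ids = select_frame_ids(names, num_frames)
    (root / "train.txt").write_text("\n".join(map(str, ids)) + "\n")
    return ids
```

If the pipeline expects a different format, only `write_train_txt` needs to change; the ID selection stays the same.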

3. Run pipeline

You can change configs here: configs/config.py

conda activate fusionsense
python scripts/train.py --data_name {DATASET_NAME} --model_name {MODEL_NAME} --configs {CONFIG_PATH}

4. Render outputs

To render JPEG or MP4 outputs using nerfstudio, we recommend installing ffmpeg in the conda environment:

conda install -c conda-forge x264=='1!161.3030' ffmpeg=4.3.2

To render outputs of pretrained models:

python scripts/render_video.py camera-path --load_config your-model-config --camera_path_filename camera_path.json --rendered_output_names rgb depth normal

See nerfstudio's ns-render documentation for more details.

Dataset Format

datasets/
    ds_name/
    │
    ├── transforms.json # needed for training
    │
    ├── train.txt
    │
    ├── images/
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── realsense_depth/
    │   ├── depth_1.png
    │   └── depth_2.png
    │
    ├── tactile/
    │   ├── image
    │   ├── mask
    │   ├── normal
    │   └── patch
    │
    ├── model.stl       # needed for evaluation
    │
    ├── normals_from_pretrain/ # generated
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── foreground_pcd.ply
    │
    └── merged_pcd.ply

Outputs Format

outputs/
    ds_name/
    │
    ├── MESH/
    │   └── mesh.ply
    │
    ├── nerfstudio_models/
    │   └── 30000.ckpt
    │
    ├── cluster_centers.npy
    │
    ├── config.yml
    │
    ├── high_grad_pts.pcd
    │
    ├── high_grad_pts_ascii.pcd
    │
    └── dataparser_transforms.json

eval/
    ds_name/ *evaluation results files*
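When scripting against this layout, a common need is to find the most recent checkpoint of a run. The sketch below is a hypothetical helper, not part of the repository. It assumes checkpoints live in nerfstudio_models/ and that each file name ends in its step number, as 30000.ckpt does above.

```python
import re
from pathlib import Path


def latest_checkpoint(run_dir):
    """Return the .ckpt with the highest trailing step number, or None."""
    ckpt_dir = Path(run_dir) / "nerfstudio_models"
    if not ckpt_dir.is_dir():
        return None
    best, best_step = None, -1
    for path in ckpt_dir.glob("*.ckpt"):
        match = re.search(r"(\d+)$", path.stem)  # ASSUMED: stem ends in the step
        if match and int(match.group(1)) > best_step:
            best, best_step = path, int(match.group(1))
    return best
```

For the tree above, `latest_checkpoint("outputs/ds_name")` would return the path to 30000.ckpt.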