[ICRA2025] Integrates the vision, touch, and common-sense information of foundational models, customized to the agent's perceptual needs.




  1. Select frames:

    Run the frame selection script to pick the frames you want, or select them manually. Either way, you get a folder of selected frames and a transforms.json.

    Make sure transforms.json is in the correct format.
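
The README does not spell out the expected transforms.json layout. As a minimal sketch, assuming a Nerfstudio-style file (a top-level "frames" list whose entries carry "file_path" and a 4x4 "transform_matrix"), a quick sanity check could look like the following. The function name `check_transforms` is hypothetical and not part of this repository.

```python
def check_transforms(meta):
    """Return a list of problems found in a parsed transforms.json dict.

    Assumes a Nerfstudio-style layout; an empty list means the checks passed.
    """
    frames = meta.get("frames")
    if not isinstance(frames, list) or not frames:
        return ["'frames' must be a non-empty list"]

    problems = []
    for i, frame in enumerate(frames):
        # Every frame should point at its image.
        if not isinstance(frame.get("file_path"), str):
            problems.append(f"frame {i}: missing 'file_path'")
        # Every frame should carry a 4x4 pose matrix as nested lists.
        m = frame.get("transform_matrix")
        is_4x4 = (
            isinstance(m, list)
            and len(m) == 4
            and all(isinstance(row, list) and len(row) == 4 for row in m)
        )
        if not is_4x4:
            problems.append(f"frame {i}: 'transform_matrix' must be 4x4")
    return problems
```

Load the file with `json.load` and pass the resulting dict to the function; any returned message names the frame index to fix.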

  2. Generate mask images with Grounded_SAM_2:

    Set your scene path and a text prompt that ends with '.', e.g. 'transparent white statue.'

    Run the script to extract masks.

    If num_no_detection is not 0, you need to select frames again. Afterwards, the mask images are in path/masks, and you can check path/annotated frames to inspect the results more directly.

  3. Generate the VisualHull from the masks and transforms.json:

    Run the script to generate the visual hull.

    python --path your-path

    You will get a point cloud file foreground_pcd.ply and a screenshot voxels.png for checking whether the generated VisualHull is correct.
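
To make the idea behind this step concrete, here is a minimal pure-Python sketch of visual hull carving: a voxel is kept only if its center projects inside the object mask in every view. It is not the repository's script. It assumes pinhole intrinsics given as `(fx, fy, cx, cy)`, world-to-camera matrices in the OpenCV convention (z pointing forward), and masks as nested lists indexed `mask[v][u]`. If your transforms.json stores camera-to-world poses, as Nerfstudio-style files do, they would need to be inverted and converted first. All function names are hypothetical.

```python
import math


def project(K, w2c, point):
    """Pinhole projection of a world point; returns (u, v) or None if behind the camera."""
    fx, fy, cx, cy = K
    # Transform the point into the camera frame with the 4x4 world-to-camera matrix.
    x, y, z = (
        sum(w2c[r][c] * point[c] for c in range(3)) + w2c[r][3] for r in range(3)
    )
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def inside_mask(K, w2c, mask, point):
    """True if the point projects onto a foreground pixel of this view's mask."""
    uv = project(K, w2c, point)
    if uv is None:
        return False
    u, v = math.floor(uv[0]), math.floor(uv[1])
    return 0 <= v < len(mask) and 0 <= u < len(mask[0]) and bool(mask[v][u])


def carve(views, lo, hi, n):
    """Return centers of voxels in the box [lo, hi] that lie inside every mask.

    views: list of (K, w2c, mask) tuples; n: voxels per axis.
    """
    step = [(hi[a] - lo[a]) / n for a in range(3)]
    kept = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                center = (
                    lo[0] + (i + 0.5) * step[0],
                    lo[1] + (j + 0.5) * step[1],
                    lo[2] + (k + 0.5) * step[2],
                )
                if all(inside_mask(K, w2c, m, center) for K, w2c, m in views):
                    kept.append(center)
    return kept
```

The triple loop is only practical for small grids; a real implementation would vectorize the projection. The logic also shows why a frame without a detection matters: an empty mask in any single view carves away every voxel.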

  4. RealSense depth & Metric3Dv2 depth:

    Extract the RealSense depth from your camera file into the realsense_depth folder.

    Use your RGB images to generate predicted depth with Metric3Dv2.

    python --root_dir your-path

    Remember to set your camera intrinsics and image size in that script.

  5. Generate initial GS model sparse points:

    Run the script to generate the initial sparse points, using the VisualHull point cloud as foreground and the Metric3Dv2 depth as background.

    python --path your-path

    The initial points are saved to path/merged_pcd.ply.
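
As a rough illustration of how background points can be obtained from a depth map, the sketch below back-projects every valid pixel outside the object mask with the pinhole model and moves the result into the world frame. It is an assumption about the general technique, not the repository's implementation: the function names are hypothetical, depth is taken to be metric and stored as nested lists, and `c2w` is assumed to be a 4x4 camera-to-world matrix in the same convention as the intrinsics.

```python
def backproject(depth, fx, fy, cx, cy, mask=None):
    """Turn a depth map into camera-frame points, skipping masked (object) pixels.

    depth[v][u] is the metric depth at pixel (u, v); values <= 0 count as invalid.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no depth measured or predicted here
            if mask is not None and mask[v][u]:
                continue  # object pixel: the foreground comes from the visual hull
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points, c2w):
    """Apply a 4x4 camera-to-world matrix to a list of 3D points."""
    return [
        tuple(sum(c2w[r][c] * p[c] for c in range(3)) + c2w[r][3] for r in range(3))
        for p in points
    ]


def merge(foreground, background):
    """Concatenate foreground (visual hull) and background (depth) points."""
    return list(foreground) + list(background)
```

Masking out the object pixels keeps the predicted depth from contributing points on the object itself, where the visual hull is used instead.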

  6. Generate normals with DSINE:

    Set your RGB image path to generate normals.

    python dn_splatter/scripts/ --data-dir [PATH_TO_DATA] --model-type dsine

  7. Set transforms and configs:

    To use the RealSense depth, set "depth_file_path": "realsense_depth/depth_0.png" for each frame, pointing at that frame's own depth image.

    To use the initial points, set "ply_file_path": "merged_pcd.ply".

    To use the Visual Hull pruning supervision, set "object_pc_path": "object.ply".
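
Editing these keys by hand gets tedious with many frames, so a small helper can fill them in. The sketch below is hypothetical and rests on two assumptions that the README does not confirm: that "ply_file_path" and "object_pc_path" live at the top level of transforms.json, and that the depth image for rgb_N.png is named depth_N.png, as the Dataset Format tree suggests.

```python
from pathlib import Path


def add_supervision_paths(
    meta,
    depth_dir="realsense_depth",
    ply="merged_pcd.ply",
    object_pc="object.ply",
):
    """Add depth, initial-point and object point cloud paths to a transforms dict."""
    for frame in meta["frames"]:
        # "images/rgb_1.png" -> "rgb_1" -> "1" (assumed naming scheme).
        index = Path(frame["file_path"]).stem.split("_")[-1]
        frame["depth_file_path"] = f"{depth_dir}/depth_{index}.png"
    meta["ply_file_path"] = ply        # assumed to be a top-level key
    meta["object_pc_path"] = object_pc  # assumed to be a top-level key
    return meta
```

Read transforms.json with `json.load`, pass the dict through the helper, and write it back with `json.dump`. Leave out whichever key you do not want to use.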

  8. Train:

    Select your method and configs.

    ns-train dn-splatter --pipeline.model.use-depth-loss True \
                        --pipeline.model.normal-lambda 0.4 \
                        --pipeline.model.sensor-depth-lambda 0.2 \
                        --pipeline.model.use-depth-smooth-loss True \
                        --pipeline.model.use-normal-loss True \
                        --pipeline.model.normal-supervision mono \
                        --pipeline.model.random_init False normal-nerfstudio \
                        --data your-path \
                        --load-pcd-normals True --load-3D-points True --normal-format opencv

  9. Mesh Extraction:

    gs-mesh {dn, tsdf, sugar-coarse, gaussians, marching} --load-config [PATH] --output-dir [PATH]

Dataset Format

├── transforms.json
├── images/
│   ├── rgb_1.png
│   └── rgb_2.png
├── normals_from_pretrain/
│   ├── rgb_1.png
│   └── rgb_2.png
├── realsense_depth/
│   ├── depth_1.png
│   └── depth_2.png
├── object.ply
└── merged_pcd.ply
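
Before training, it can help to confirm that a scene folder matches the tree above. The following is a minimal sketch, not a tool shipped with this repository. The function name is hypothetical, and it assumes that every images/rgb_N.png has a normal map with the same file name and a depth image named depth_N.png.

```python
from pathlib import Path

# Top-level entries taken from the Dataset Format tree.
EXPECTED = [
    "transforms.json",
    "images",
    "normals_from_pretrain",
    "realsense_depth",
    "object.ply",
    "merged_pcd.ply",
]


def missing_entries(root):
    """Return relative paths that the layout expects but that do not exist."""
    root = Path(root)
    missing = [name for name in EXPECTED if not (root / name).exists()]
    images = root / "images"
    if images.is_dir():
        for img in sorted(images.glob("rgb_*.png")):
            # Per-image companions: a normal map and a depth image (assumed naming).
            normal = root / "normals_from_pretrain" / img.name
            depth = root / "realsense_depth" / img.name.replace("rgb_", "depth_", 1)
            for path in (normal, depth):
                if not path.exists():
                    missing.append(path.relative_to(root).as_posix())
    return missing
```

An empty result means every expected file was found; otherwise each returned path names a file or folder to create.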