AI Depth Estimation with Nav2 Costmap

Overview
Requirements
Tutorial Steps

Overview

Traditional 3D navigation typically requires expensive hardware like LiDAR or Stereo/RGB-D depth cameras. This pipeline allows a standard monocular camera (in this example, a USB camera) to behave like a depth sensor by leveraging an AI model, which estimates depth from 2D images. The resulting depth image is then converted into PointCloud2 message that Nav2 can use as Voxel costmap layer for obstacle avoidance and path planning, enabling cost-effective navigation on the robots.

What is Depth Anything 3 AI model?

Depth Anything 3 (DA3) is an AI model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. For more details. In this tutorial, we are using ROS 2 implementation of Depth Anything 3 [2], which provides the ROS 2 composable node to run the inference of the DA3 model. The attached image shows Rviz2 with two image views, one is showig the Color Image topic and other is showing Depth Image topic, published by DA3 ROS2 Node.

Pipeline to process image

../../_images/depth_ai_data_pipeline.png

The data flows sequentially through five distinct steps:

USB Cam Node: Captures the raw RGB video stream from your physical camera. This may be easily replaced with a camera from any source, not only a USB camera.
Crop Decimate Node: Crops or skips pixels to remove unneeded peripheral data and save processing power.
Resize Node: Scales the image down to match the exact input dimensions required by the AI model.
Depth Anything V3 Node: Processes the 2D image and calculates an estimated depth map.
Point Cloud Projection: Converts the 2D depth map into a 3D Point Cloud (sensor_msgs/msg/PointCloud2), which Nav2 can natively understand.

Note

In this tutorial, we are using Depth Anything V3 TensorRT ROS 2 package to run the inference of the DA3 model. This model already provides the depth estimation and point cloud projection functionalities, so we have commented out the pointcloud node. If your are using different model, then you just need to configure the required parameters for the nodes in the nav2_depth_estimation_ai package and launch the pipeline.

Requirements

Before starting, make sure you have the following hardware and software baselines established:

Hardware

Robot: A physical TurtleBot 3 (Burger, Waffle, or Custom setup) running ROS 2 and Nav2.

Camera: Any standard, Linux-compatible Monocular USB RGB camera.

Compute Host: An edge computer mounted on the robot (e.g., NVIDIA Jetson or an x86 laptop) preferably equipped with a CUDA-compatible GPU to run the AI model at a functional frame rate.

Software

Successfully completed the official tutorial for Navigating with a Physical TurtleBot 3.

Calibrate your camera before following the tutorial steps using the official camera calibration tutorial.

Tutorial Steps

We need to install the core image processing stacks alongside the specialized TensorRT-accelerated Depth Anything V3 ROS 2 package.

1. Install Core Dependencies

Run the following commands on your robot’s main computer to install the foundational ROS 2 perception packages:

sudo apt update
sudo apt install ros-$ROS_DISTRO-image-proc ros-$ROS_DISTRO-depth-image-proc ros-$ROS_DISTRO-usb-cam

2. Build Depth Anything V3 package

Navigate to your workspace, clone the TensorRT Depth Anything stack, and build it:

cd ~/ros2_ws/src
git clone https://github.com/ika-rwth-aachen/ros2-depth-anything-v3-trt.git
git clone https://github.com/ros-navigation/navigation2.ai.git

# Resolve any missing package-level dependencies
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y

# Build the packages with optimization flags enabled
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash

1. Model Weights Preparation

You need the compiled ONNX model weights for the pipeline to compute depth maps. You can choose any option from below:

Download the ONNX file from Huggingface
Generate ONNX following the instruction here

You can place this file anywhere as you fit, e.g. depth_anything_v3/models/, but remember to modify this path in the nav2_depth_ai_params.yaml file as shown below.

4. Configure Params

Now, you need to configure the parameters for the nodes used to process the pipeline in the nav2_depth_ai_params.yaml file of nav2_depth_estimation_ai pkg.

Params for USB Cam Node

Note

Replace with your sensor driver as you see fit (i.e. realsense)

usb_cam:
  ros__parameters:
    video_device: /dev/video0
    image_width: 640
    image_height: 480
    pixel_format: mjpeg2rgb
    frame_rate: 30.0

Params for Crop Decimate Node

Note

This node is used to crop the image, update the params as you fit.

crop_decimate:
 ros__parameters:
   x_offset: 0
   y_offset: 0
   width: 640
   height: 480
   decimation_x: 1
   decimation_y: 1

Params to resize the image

Here we are using Depth Anything V3 model, and it is exported to ONNX with these params for the input image:

resize:
  ros__parameters:
    width: 504
    height: 280

Params for Depth Anything V3

Note

These params need to be configured for the Depth Anything V3 AI Model

depth_anything_v3:
  ros__parameters:
    # Model configuration
    onnx_path: "$(find-pkg-share depth_anything_v3)/models/DA3METRIC-LARGE.onnx"
    precision: "fp16"  # fp16 or fp32

    # Debug configuration
    enable_debug: false
    debug_colormap: "JET"  # JET, HOT, COOL, SPRING, SUMMER, AUTUMN, WINTER, BONE, GRAY, HSV, PARULA, PLASMA, INFERNO, VIRIDIS, MAGMA, CIVIDIS
    debug_filepath: "/tmp/depth_anything_v3_debug/"
    write_colormap: false
    debug_colormap_min_depth: 0.0    # Minimum depth value for colormap visualization
    debug_colormap_max_depth: 50.0   # Maximum depth value for colormap visualization
    sky_threshold: 0.3               # Threshold for sky classification (lower = more sky)
    sky_depth_cap: 200.0             # Maximum depth value to fill sky regions

    # Point cloud downsampling (1 = no downsampling, 10 = every 10th point)
    point_cloud_downsample_factor: 2

    # Point cloud colorization with RGB from input image
    colorize_point_cloud: true  # Set to true to publish RGB point cloud instead of XYZ only

Point Cloud Node to project the points from depth image

Note

Uncomment if you want to output the point cloud. As in this example, we are using point cloud from depth_anything_v3 node.

pointcloud:
  ros__parameters:
    image_transport: "raw"  # "raw" or "compressed"
    depth_image_transport: "raw"  # "raw" or "compressed"
    queue_size: 10
    invalid_depth: 0.0  # Depth value to use for invalid points (e.g., sky)
    colorize: true  # Set to true to publish RGB point cloud instead of XYZ only
    exact_sync: true  # Set to true to use exact sync, false for approximate synchronization

Voxel Costmap Nav2

To integrate this pointcloud with Nav2 Voxel Costmap Layer, configure the params in your Nav2 Params file.

local_costmap:
  local_costmap:
    ros__parameters:
      update_frequency: 5.0
      publish_frequency: 2.0
      global_frame: odom
      robot_base_frame: base_link
      rolling_window: true
      width: 3
      height: 3
      resolution: 0.05
      robot_radius: 0.15
      plugins: ["voxel_layer", "inflation_layer"]

      voxel_layer:
        plugin: "nav2_costmap_2d::VoxelLayer"
        enabled: true
        publish_voxel_map: true
        origin_z: 0.0
        z_resolution: 0.05
        z_voxels: 16
        max_obstacle_height: 2.0
        mark_threshold: 0
        observation_sources: pointcloud

        pointcloud:
          topic: /pipeline/points
          data_type: "PointCloud2"
          max_obstacle_height: 2.0
          min_obstacle_height: 0.2  # Ignores reflections/noise on the floor
          obstacle_max_range: 8.0    # Maximum reliable distance for AI depth
          obstacle_min_range: 0.0
          raytrace_max_range: 6.0
          raytrace_min_range: 0.0
          clearing: True
          marking: True

      inflation_layer:
        plugin: "nav2_costmap_2d::InflationLayer"
        inflation_radius: 0.5
        cost_scaling_factor: 5.0

5. Launch the Pipeline

Finally, launch the entire pipeline using the provided launch file:

ros2 launch nav2_depth_estimation_ai depth_estimation_pipeline_launch.py

Note

This will start the camera node (if not using already), process the image topic using pipeline through the DA3 model, and publish the resulting PointCloud2 topic. This topic is subscribed by Nav2 to add voxel costmap layer for path planning and obstacle avoidance.

If everything is configured and running correctly, your results in RViz2 should match the demonstration in the video below.

Acknowledgements

This tutorial was developed in collaboration with the ROS 2 community. Special thanks to the contributors who provided insights and feedback during the development process.