Introduction

A 3D occupancy map that is accurately modeled after real-world environments is essential for reliably performing robotic tasks. Probabilistic volumetric mapping (PVM) is a well-known environment mapping method using volumetric voxel grids that represent the probability of occupancy. The main bottleneck of current CPU-based PVM, such as OctoMap, is determining voxel grids with occupied and free states using ray-shooting. In this paper, we propose an octree-based PVM, called OctoMap-RT, using a hybrid of off-the-shelf ray-tracing GPUs and CPUs to substantially improve CPU-based PVM. OctoMap-RT employs massively parallel ray-shooting using GPUs to generate occupied and free voxel grids and to update their occupancy states in parallel, and it exploits CPUs to restructure the PVM using the updated voxels. Our experiments using various large-scale real-world benchmarking environments with dense and high-resolution sensor measurements demonstrate that OctoMap-RT builds maps up to 41.2 times faster than OctoMap and 9.3 times faster than the recent SuperRay CPU implementation. Moreover, OctoMap-RT constructs a map with 0.52% higher accuracy, in terms of the number of occupancy grids, than both OctoMap and SuperRay.

Our Approach

Fig.1. The pipeline of OctoMap-RT consists of intertwined CPU-based serial tasks (gray boxes) and GPU-based parallel tasks (yellow boxes). Steps 1 to 2a are preprocessing and run only once, while steps 2b to 5 are repeated online as new sensor data are measured. In step 3, unknown, free, and occupied voxels are colored white, green, and blue, respectively, and the states are encoded as 00 (unknown), 01 (occupied), 10 (free), and 11 (free or occupied).

    With large, high-resolution point cloud data, voxel determination becomes the bottleneck of a CPU-based PVM such as OctoMap. We focus on leveraging ray-tracing GPUs, which support massively parallel ray-shooting, to speed up CPU-based PVM. Moreover, we distribute the workload between the CPU and GPU according to the characteristics of each device and the required tasks. Specifically, the ray-tracing GPU handles intensive and regular streaming tasks, such as BVH construction, ray-voxel intersection, and the occupancy update. In contrast, the CPU handles irregular workflows, such as updating the voxels' occupancy probabilities and the octree. As shown in Fig. 1, OctoMap-RT builds a PVM with the following steps, each of which corresponds to its numbered blue box in the figure:

    1) Voxel representation (CPU):
    We estimate the local extent of the sensor measurements to set up the shared ray space for ray-shooting, which minimizes the size of the BVH built over the voxels. We subdivide the shared ray space into uniform voxel grids, map the voxels to AABBs, and build a shared BVH of the AABBs (a minimal indexing sketch is given right after this list).
    2) Ray-shooting (GPU):
    a) BVH instancing: When constructing a BVH for a dense set of uniformly sized AABBs, we build a BVH for a subset of the AABBs and instance it into multiple copies, which speeds up construction of the full BVH and reduces GPU memory consumption.
    b) Voxel intersection: We launch rays in a massively parallel fashion to find the intersected AABBs. AABBs containing ray endpoints correspond to occupied voxels, whereas AABBs that are merely intersected by rays correspond to free voxels.
    3) Consistent occupancy update (GPU):
    When a voxel contains a ray endpoint and is also intersected by other rays, parallel processing can classify its state as both occupied and free. We must ensure that such a voxel is consistently classified as occupied (a small illustration follows the pipeline summary below).
    4) Voxel readback (GPU→CPU):
    All voxels with consistent occupancy information are read from the GPU back to the CPU.
    5) Octree update (CPU):
    The read voxels are used to update the occupancy probability and octree.
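
    The following is a minimal C++ sketch of the uniform voxel indexing and voxel-to-AABB mapping behind step 1. The structure and member names (SharedRaySpace, voxelIndex, voxelAABB) are illustrative assumptions rather than the actual OctoMap-RT code; in the real pipeline, the resulting AABBs are handed to the GPU ray-tracing API to build the shared BVH.

#include <array>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Shared ray space estimated from the local extent of the current scan.
// Names and layout are assumptions for illustration.
struct SharedRaySpace {
    Vec3 origin;                    // minimum corner of the shared ray space
    float res;                      // uniform voxel edge length, e.g., 0.1 m
    std::array<uint32_t, 3> dims;   // number of voxels along x, y, and z

    // Map a measured point (assumed to lie inside the space) to the
    // linear index of the voxel containing it.
    uint32_t voxelIndex(const Vec3& p) const {
        uint32_t ix = static_cast<uint32_t>(std::floor((p.x - origin.x) / res));
        uint32_t iy = static_cast<uint32_t>(std::floor((p.y - origin.y) / res));
        uint32_t iz = static_cast<uint32_t>(std::floor((p.z - origin.z) / res));
        return (iz * dims[1] + iy) * dims[0] + ix;
    }

    // AABB of a voxel; the shared BVH is built over these boxes.
    AABB voxelAABB(uint32_t ix, uint32_t iy, uint32_t iz) const {
        Vec3 lo{origin.x + ix * res, origin.y + iy * res, origin.z + iz * res};
        return {lo, {lo.x + res, lo.y + res, lo.z + res}};
    }
};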

    The voxel representation (step 1) and BVH instancing (step 2a) are preprocessing steps, while the remaining steps are repeated online as new sensor data are fed into the update loop. In the remaining sections, we provide detailed explanations of each step.
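
    To make steps 2b and 3 concrete, the sketch below reproduces the 2-bit state encoding of Fig. 1 on the CPU using C++ atomics. The buffer and function names are assumptions; in OctoMap-RT the corresponding updates run in GPU shaders. The point is that concurrent rays may tag the same voxel with both occupied and free evidence (state 11), and the consistent occupancy update collapses such voxels to occupied.

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// 2-bit voxel states as encoded in Fig. 1.
enum : uint8_t { UNKNOWN = 0b00, OCCUPIED = 0b01, FREE = 0b10 };

constexpr std::size_t kNumVoxels = 1 << 20;   // illustrative grid size
std::atomic<uint8_t> g_state[kNumVoxels];     // stand-in for the GPU voxel buffer

// Ray-shooting (step 2b): each hit ORs its evidence into the voxel state.
// A voxel containing a ray endpoint gets OCCUPIED; one merely crossed gets FREE.
void mark(std::size_t v, uint8_t evidence) {
    g_state[v].fetch_or(evidence, std::memory_order_relaxed);   // may yield 0b11
}

// Consistent occupancy update (step 3): occupied evidence always wins,
// so 0b11 ("free or occupied") collapses to OCCUPIED before readback.
uint8_t resolve(uint8_t s) {
    return (s & OCCUPIED) ? OCCUPIED : s;
}

int main() {
    mark(42, OCCUPIED);   // an endpoint landed in voxel 42
    mark(42, FREE);       // another ray passed through voxel 42
    std::printf("voxel 42 -> %u\n", unsigned(resolve(g_state[42].load())));   // prints 1
}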

Results

All the experiments were conducted on an Intel i9-13900K CPU with 64 GB of RAM and an NVIDIA RTX 4090 GPU. We used DirectX Raytracing (DXR) as the ray-tracing API and implemented the system in C++ with Microsoft Visual Studio 2017 under 64-bit Windows 10.


Table.I. DATASET STATISTICS

Fig.2. PVM results of OctoMap-RT. The voxels are color-coded according to their vertical height from the floor, and each voxel is a 10 cm cube.

Fig.3. Performance breakdown of OctoMap-RT and its comparisons against OctoMap and SuperRay in log scale with various voxel dimensions using the datasets. For OctoMap and SuperRay, the labels in the stacked bars represent the times for ray data preparation (X), ray-shooting (1), consistent occupancy update (2), and octree update (3), respectively. For OctoMap-RT, the labels in the stacked bars represent the times for voxel representation (1, CPU), voxel/ray data upload (Y, from CPU to GPU), voxel intersection (2, GPU), consistent occupancy update (3, GPU), voxel readback (4, from GPU to CPU), and octree update (5, CPU), respectively.

The average performance of building PVMs with OctoMap-RT, compared with OctoMap, improved by factors of 10.7, 10.1, 25.4, and 26.2 for the datasets in Fig. 2a, 2b, 2c, and 2d, respectively (see the timing breakdown in Fig. 3). OctoMap-RT is also 4.2, 4.1, 4.7, and 4.9 times faster than SuperRay on the same datasets.


Fig.4. Number of free/occupied voxels generated using OctoMap (O) vs. OctoMap-RT (RT) using the benchmarks in Fig. 2. (a) 649K/126K vs. 651K/125K. (b) 84M/3.2M vs. 85M/3.1M. (c) 209K/99.3K vs. 210K/99.2K. (d) 460K/232.5K vs. 461K/232.1K.

OctoMap-RT builds the PVM with more voxels than OctoMap, particularly more free voxels. Fig. 4 supports this finding using the datasets in Fig. 2 with a voxel size of 10 cm. The leaf-level voxels consist of occupied and free voxels. Compared with OctoMap and SuperRay, on average, OctoMap-RT has 0.52% more leaf-level voxels in total, 0.59% fewer occupied voxels, and 0.65% more free voxels.


Table.II. COMPARISONS OF THE GPU MEMORY USAGE IN SHARED BVH

BVH instancing effectively mitigates memory consumption for the large-scale outdoor environment. For instance, when the voxel size is 10 cm, even though the shared ray space of Freiburg is 87.5 times larger than that of SKT-Rooms, as shown in Table I, the shared BVH of the former nevertheless consumes 5.8 times less memory than that of the latter.
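
As a rough illustration of how BVH instancing (step 2a) can be expressed with DXR, the sketch below fills one instance descriptor per copy of a shared bottom-level acceleration structure (BLAS). Every instance references the same BLAS address, so the voxel AABB geometry is stored only once; the tiling scheme, function name, and parameters are assumptions for illustration, not the authors' exact code.

#include <cstddef>
#include <d3d12.h>
#include <vector>

// Build one top-level instance per copy of the shared bottom-level BVH.
// blasVa is the GPU virtual address of the single BLAS built over a tile
// of voxel AABBs; (nx, ny, nz) copies tile the shared ray space.
std::vector<D3D12_RAYTRACING_INSTANCE_DESC> makeInstances(
    D3D12_GPU_VIRTUAL_ADDRESS blasVa, UINT nx, UINT ny, UINT nz, float tileSize) {
    std::vector<D3D12_RAYTRACING_INSTANCE_DESC> instances;
    instances.reserve(std::size_t(nx) * ny * nz);
    UINT id = 0;
    for (UINT z = 0; z < nz; ++z)
        for (UINT y = 0; y < ny; ++y)
            for (UINT x = 0; x < nx; ++x) {
                D3D12_RAYTRACING_INSTANCE_DESC d = {};
                d.Transform[0][0] = d.Transform[1][1] = d.Transform[2][2] = 1.0f;
                d.Transform[0][3] = x * tileSize;   // translate each copy into place
                d.Transform[1][3] = y * tileSize;
                d.Transform[2][3] = z * tileSize;
                d.InstanceID = id++;
                d.InstanceMask = 0xFF;
                d.AccelerationStructure = blasVa;   // shared BLAS: geometry stored once
                instances.push_back(d);
            }
    return instances;   // consumed when building the top-level acceleration structure
}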

Contact

Ewha Graphics Lab
Department of Computer Science & Engineering, Ewha Womans University
  52, Ewhayeodae-gil, Seodaemun-gu, Seoul, Korea, 03760
  +82-2-3277-6798

  Heajung Min, hjmin@ewhain.net
  Kyung Min Han, hankm@ewha.ac.kr
  Young J. Kim, kimy@ewha.ac.kr