Introduction

A 3D occupancy map that is accurately modeled after real-world environments is essential for reliably performing robotic tasks. Probabilistic volumetric mapping (PVM) is a well-known environment mapping method using volumetric voxel grids that represent the probability of occupancy. The main bottleneck of current CPU-based PVM, such as OctoMap, is determining voxel grids with occupied and free states using ray-shooting. In this paper, we propose an octree-based PVM, called OctoMap-RT, using a hybrid of off-the-shelf ray-tracing GPUs and CPUs to substantially improve CPU-based PVM. OctoMap-RT employs massively parallel ray-shooting using GPUs to generate occupied and free voxel grids and to update their occupancy states in parallel, and it exploits CPUs to restructure the PVM using the updated voxels. Our experiments using various large-scale real-world benchmarking environments with dense and high-resolution sensor measurements demonstrate that OctoMap-RT builds maps up to 41.2 times faster than OctoMap and 9.3 times faster than the recent SuperRay CPU implementation. Moreover, OctoMap-RT constructs a map with 0.52% higher accuracy, in terms of the number of occupancy grids, than both OctoMap and SuperRay.

Our Approach

Fig.1. The pipeline of OctoMap-RT consists of intertwined CPU-based serial tasks (gray boxes) and GPU-based parallel tasks (yellow boxes). Steps 1 to 2a are preprocessing and run only once, while steps 2b to 5 are repeated online as new sensor data are measured. In step 3, unknown, free, and occupied voxels are colored white, green, and blue, respectively, and the states are encoded as 00 (unknown), 01 (occupied), 10 (free), and 11 (free or occupied).

    With large, high-resolution point cloud data, voxel determination becomes the bottleneck of a CPU-based PVM such as OctoMap. We focus on leveraging ray-tracing GPUs, which support massively parallel ray-shooting, to speed up CPU-based PVM. Moreover, we distribute the workload between the CPU and GPU according to the characteristics of each device and the required tasks. Specifically, the ray-tracing GPU handles intensive and regular streaming tasks, such as BVH construction, ray-voxel intersection, and the occupancy update. In contrast, the CPU handles irregular workflows, such as updating the voxels' occupancy probabilities and the octree. As shown in Fig. 1, OctoMap-RT builds a PVM with the following steps, each of which corresponds to its numbered blue box in the figure:

    1) Voxel representation (CPU):
    We estimate the local extent of the sensor measurements to set up the shared ray space for ray-shooting, which minimizes the size of the BVH built over the voxels. We subdivide the shared ray space into uniform voxel grids, map the voxels to AABBs, and build a shared BVH of the AABBs (a minimal indexing sketch is given right after this list).
    2) Ray-shooting (GPU):
    a) BVH instancing: When constructing a BVH for a dense set of uniformly sized AABBs, we build a BVH for a subset of the AABBs and instance it into multiple copies, which speeds up construction of the full BVH and reduces GPU memory consumption.
    b) Voxel intersection: We launch rays in a massively parallel fashion to find the intersected AABBs. AABBs containing ray endpoints correspond to occupied voxels, whereas AABBs that are merely intersected by rays correspond to free voxels.
    3) Consistent occupancy update (GPU):
    When a voxel contains a ray endpoint and is also intersected by other rays, parallel processing can classify its state as both occupied and free. We must ensure that such a voxel is consistently classified as occupied (a small illustration follows the pipeline summary below).
    4) Voxel readback (GPU→CPU):
    All voxels with consistent occupancy information are read from the GPU back to the CPU.
    5) Octree update (CPU):
    The read voxels are used to update the occupancy probability and octree.
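
    The following is a minimal C++ sketch of the uniform voxel indexing and voxel-to-AABB mapping behind step 1. The structure and member names (SharedRaySpace, voxelIndex, voxelAABB) are illustrative assumptions rather than the actual OctoMap-RT code; in the real pipeline, the resulting AABBs are handed to the GPU ray-tracing API to build the shared BVH.

#include <array>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Shared ray space estimated from the local extent of the current scan.
// Names and layout are assumptions for illustration.
struct SharedRaySpace {
    Vec3 origin;                    // minimum corner of the shared ray space
    float res;                      // uniform voxel edge length, e.g., 0.1 m
    std::array<uint32_t, 3> dims;   // number of voxels along x, y, and z

    // Map a measured point (assumed to lie inside the space) to the
    // linear index of the voxel containing it.
    uint32_t voxelIndex(const Vec3& p) const {
        uint32_t ix = static_cast<uint32_t>(std::floor((p.x - origin.x) / res));
        uint32_t iy = static_cast<uint32_t>(std::floor((p.y - origin.y) / res));
        uint32_t iz = static_cast<uint32_t>(std::floor((p.z - origin.z) / res));
        return (iz * dims[1] + iy) * dims[0] + ix;
    }

    // AABB of a voxel; the shared BVH is built over these boxes.
    AABB voxelAABB(uint32_t ix, uint32_t iy, uint32_t iz) const {
        Vec3 lo{origin.x + ix * res, origin.y + iy * res, origin.z + iz * res};
        return {lo, {lo.x + res, lo.y + res, lo.z + res}};
    }
};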

    The voxel representation (step 1) and BVH instancing (step 2a) are preprocessing steps, while the remaining steps are repeated online as new sensor data are fed into the update loop. In the remaining sections, we provide detailed explanations of each step.
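
    To make steps 2b and 3 concrete, the sketch below reproduces the 2-bit state encoding of Fig. 1 on the CPU using C++ atomics. The buffer and function names are assumptions; in OctoMap-RT the corresponding updates run in GPU shaders. The point is that concurrent rays may tag the same voxel with both occupied and free evidence (state 11), and the consistent occupancy update collapses such voxels to occupied.

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// 2-bit voxel states as encoded in Fig. 1.
enum : uint8_t { UNKNOWN = 0b00, OCCUPIED = 0b01, FREE = 0b10 };

constexpr std::size_t kNumVoxels = 1 << 20;   // illustrative grid size
std::atomic<uint8_t> g_state[kNumVoxels];     // stand-in for the GPU voxel buffer

// Ray-shooting (step 2b): each hit ORs its evidence into the voxel state.
// A voxel containing a ray endpoint gets OCCUPIED; one merely crossed gets FREE.
void mark(std::size_t v, uint8_t evidence) {
    g_state[v].fetch_or(evidence, std::memory_order_relaxed);   // may yield 0b11
}

// Consistent occupancy update (step 3): occupied evidence always wins,
// so 0b11 ("free or occupied") collapses to OCCUPIED before readback.
uint8_t resolve(uint8_t s) {
    return (s & OCCUPIED) ? OCCUPIED : s;
}

int main() {
    mark(42, OCCUPIED);   // an endpoint landed in voxel 42
    mark(42, FREE);       // another ray passed through voxel 42
    std::printf("voxel 42 -> %u\n", unsigned(resolve(g_state[42].load())));   // prints 1
}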

Results

All the experiments were conducted on an Intel i9-13900K CPU with 64 GB of RAM and an NVIDIA RTX 4090 GPU. We used DirectX Raytracing (DXR) as the ray-tracing API and implemented the system in C++ with Microsoft Visual Studio 2017 under 64-bit Windows 10.


Table.I. DATASET STATISTICS

Fig.2. PVM results of OctoMap-RT. The voxels are color-coded according to their vertical height from the floor, and each voxel is a 10 cm cube.

Fig.3. Performance breakdown of OctoMap-RT and its comparisons against OctoMap and SuperRay in log scale with various voxel dimensions using the datasets. For OctoMap and SuperRay, the labels in the stacked bars represent the times for ray data preparation (X), ray-shooting (1), consistent occupancy update (2), and octree update (3), respectively. For OctoMap-RT, the labels in the stacked bars represent the times for voxel representation (1, CPU), voxel/ray data upload (Y, from CPU to GPU), voxel intersection (2, GPU), consistent occupancy update (3, GPU), voxel readback (4, from GPU to CPU), and octree update (5, CPU), respectively.

The average performance of building PVMs with OctoMap-RT, compared with OctoMap, improved by factors of 10.7, 10.1, 25.4, and 26.2 for the datasets in Fig. 2a, 2b, 2c, and 2d, respectively (see the timing breakdown in Fig. 3). OctoMap-RT is also 4.2, 4.1, 4.7, and 4.9 times faster than SuperRay on the same datasets.


Fig.4. Number of free/occupied voxels generated using OctoMap (O) vs. OctoMap-RT (RT) using the benchmarks in Fig. 2. (a) 649K/126K vs. 651K/125K. (b) 84M/3.2M vs. 85M/3.1M. (c) 209K/99.3K vs. 210K/99.2K. (d) 460K/232.5K vs. 461K/232.1K.

OctoMap-RT builds the PVM with more voxels than OctoMap, particularly more free voxels. Fig. 4 supports this finding using the datasets in Fig. 2 with a voxel size of 10 cm. The leaf-level voxels consist of occupied and free voxels. Compared with OctoMap and SuperRay, on average, OctoMap-RT has 0.52% more leaf-level voxels in total, 0.59% fewer occupied voxels, and 0.65% more free voxels.


Table.II. COMPARISONS OF THE GPU MEMORY USAGE IN SHARED BVH

BVH instancing effectively mitigates memory consumption for the large-scale outdoor environment. For instance, when the voxel size is 10 cm, even though the shared ray space of Freiburg is 87.5 times larger than that of SKT-Rooms, as shown in Table I, the shared BVH of the former nevertheless consumes 5.8 times less memory than that of the latter.
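
As a rough illustration of how BVH instancing (step 2a) can be expressed with DXR, the sketch below fills one instance descriptor per copy of a shared bottom-level acceleration structure (BLAS). Every instance references the same BLAS address, so the voxel AABB geometry is stored only once; the tiling scheme, function name, and parameters are assumptions for illustration, not the authors' exact code.

#include <cstddef>
#include <d3d12.h>
#include <vector>

// Build one top-level instance per copy of the shared bottom-level BVH.
// blasVa is the GPU virtual address of the single BLAS built over a tile
// of voxel AABBs; (nx, ny, nz) copies tile the shared ray space.
std::vector<D3D12_RAYTRACING_INSTANCE_DESC> makeInstances(
    D3D12_GPU_VIRTUAL_ADDRESS blasVa, UINT nx, UINT ny, UINT nz, float tileSize) {
    std::vector<D3D12_RAYTRACING_INSTANCE_DESC> instances;
    instances.reserve(std::size_t(nx) * ny * nz);
    UINT id = 0;
    for (UINT z = 0; z < nz; ++z)
        for (UINT y = 0; y < ny; ++y)
            for (UINT x = 0; x < nx; ++x) {
                D3D12_RAYTRACING_INSTANCE_DESC d = {};
                d.Transform[0][0] = d.Transform[1][1] = d.Transform[2][2] = 1.0f;
                d.Transform[0][3] = x * tileSize;   // translate each copy into place
                d.Transform[1][3] = y * tileSize;
                d.Transform[2][3] = z * tileSize;
                d.InstanceID = id++;
                d.InstanceMask = 0xFF;
                d.AccelerationStructure = blasVa;   // shared BLAS: geometry stored once
                instances.push_back(d);
            }
    return instances;   // consumed when building the top-level acceleration structure
}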

Contact

Ewha Graphics Lab
Department of Computer Science & Engineering, Ewha Womans University
  52, Ewhayeodae-gil, Seodaemun-gu, Seoul, Korea, 03760
  +82-2-3277-6798

  Heajung Min, hjmin@ewhain.net
  Kyung Min Han, hankm@ewha.ac.kr
  Young J. Kim, kimy@ewha.ac.kr