Probabilistic volumetric mapping (PVM) represents a 3D map of the environment for autonomous robot navigation. Octomap, a popular PVM implementation, is widely used in the robotics community for this purpose. Octomap represents a PVM with an octree, and its main bottleneck lies in the massive ray shooting needed to determine the occupancy of the underlying volumetric voxel grid. In this paper, we propose GPU-based ray shooting to drastically improve ray-shooting performance in Octomap. Our main idea is to use recent ray-tracing RTX GPUs, designed mainly for real-time photo-realistic computer graphics, together with the accompanying graphics API, DXR. Our method first maps the leaf-level voxels of a given octree to a set of axis-aligned bounding boxes (AABBs) and then performs massively parallel ray shooting against them on the GPU to find free and occupied voxels. These results are fed back to the CPU to update voxel occupancy and restructure the octree. In our experiments, ray shooting on an RTX GPU was more than three orders of magnitude faster than a state-of-the-art CPU implementation of Octomap, on benchmark environments consisting of more than 77K points and 25K to 34K voxels.

Our Approach

Mapping an Octree on CPU to AABBs on GPU

To leverage GPU-accelerated ray shooting for an octree, the voxels of the octree must be converted into geometric primitives that a ray-tracing GPU can process; in the case of RTX, these are triangles or AABBs. We opt for AABBs as the target primitives because their geometry closely matches the shape of a voxel.

Mapping an octree on the CPU to a set of AABBs on the GPU for GPU-based ray shooting.

  • We convert all leaf-level voxels in an octree, with occupied (blue), free (green), and unknown (gray) labels, to individual AABBs.
  • Spatial subdivision, with labels for occupied, free, and unknown space.
  • The corresponding octree on the CPU, with leaf-level nodes highlighted in yellow, and the set of AABBs on the GPU that correspond to those leaf-level nodes.
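The leaf-to-AABB mapping described above can be sketched in C++ as follows. This is an illustrative sketch, not the authors' code: `LeafVoxel`, `Label`, `nodeSize`, and `leavesToAabbs` are hypothetical names, and we assume each leaf is known by its center and octree depth, with the finest voxels at depth 16 (the maximum depth used in our experiments). `Aabb` mirrors the six-float layout of DXR's `D3D12_RAYTRACING_AABB` so that the resulting array could be uploaded to a GPU buffer and referenced from a `D3D12_RAYTRACING_GEOMETRY_AABBS_DESC` when the BVH is built.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Six floats (min xyz, then max xyz): the same layout as DXR's
// D3D12_RAYTRACING_AABB, redeclared so this sketch builds without DirectX headers.
struct Aabb {
    float MinX, MinY, MinZ;
    float MaxX, MaxY, MaxZ;
};

// Hypothetical leaf record gathered from the CPU octree.
enum class Label { Free, Occupied, Unknown };
struct LeafVoxel {
    double cx, cy, cz;  // voxel center in world coordinates
    unsigned depth;     // depth of the leaf node (0 = root)
    Label label;        // kept on the CPU; the AABB's index links back to it
};

// Edge length of a node at `depth` when the finest voxels, at `treeDepth`,
// have edge length `resolution`: each level closer to the root doubles the size.
double nodeSize(double resolution, unsigned depth, unsigned treeDepth = 16) {
    assert(depth <= treeDepth);
    return resolution * std::ldexp(1.0, static_cast<int>(treeDepth - depth));
}

// One AABB per leaf, in the same order as `leaves`, so the primitive index
// reported by a ray hit identifies the originating leaf voxel.
std::vector<Aabb> leavesToAabbs(const std::vector<LeafVoxel>& leaves, double resolution) {
    std::vector<Aabb> boxes;
    boxes.reserve(leaves.size());
    for (const LeafVoxel& v : leaves) {
        const double h = 0.5 * nodeSize(resolution, v.depth);  // half edge length
        boxes.push_back({static_cast<float>(v.cx - h), static_cast<float>(v.cy - h),
                         static_cast<float>(v.cz - h), static_cast<float>(v.cx + h),
                         static_cast<float>(v.cy + h), static_cast<float>(v.cz + h)});
    }
    return boxes;
}
```

Keeping the AABB array in the same order as the leaf list is a simple design choice: on the GPU, HLSL's `PrimitiveIndex()` then directly identifies which CPU-side leaf a ray hit.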

Massively-Parallel Ray Shooting

(a) Mapped AABBs; (b) Subdivision. Leaf-level voxels are mapped to a set of AABBs and subdivided to the finest resolution for occupancy labeling.

  • Once the BVH over the AABBs has been built on the GPU, we set up multiple rays in the ray generation shader and shoot them from the sensor origin toward the obstacles in the environment, captured by the sensor as a point cloud, to find occupied and free voxels.
  • If an AABB intersected by a ray is larger than a finest-resolution voxel, we use a 3D digital differential analyzer (DDA) to subdivide the subspace traversed by the ray into sub-voxels at the finest resolution (typically depth 16).
  • When a ray intersects an AABB, the intersection shader determines the occupancy of the voxel. When the ray intersects no further AABBs, the miss shader is executed; since our work requires no shading, the miss shader simply terminates the ray.
  • The pseudocode for our complete ray-shooting procedure is given in Algorithm 1.
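The DDA subdivision step above can be illustrated with a CPU-side C++ sketch of the classic Amanatides and Woo voxel traversal over the finest-resolution grid. In the paper this work happens on the GPU inside DXR shaders; the version below only shows the traversal logic. `traverseDda`, `RayLabels`, and `toKey` are hypothetical names, and we assume the Octomap convention that voxels crossed before the endpoint are free while the voxel containing the measured point is occupied.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;
using VoxelKey = std::array<long, 3>;  // integer voxel coordinates at finest resolution

struct RayLabels {
    std::vector<VoxelKey> freeVoxels;  // voxels traversed before the endpoint
    VoxelKey occupiedVoxel;            // voxel containing the endpoint
};

VoxelKey toKey(const Vec3& p, double res) {
    return {static_cast<long>(std::floor(p[0] / res)),
            static_cast<long>(std::floor(p[1] / res)),
            static_cast<long>(std::floor(p[2] / res))};
}

// Walks the finest-resolution voxels from `origin` to `end` (Amanatides and Woo DDA).
RayLabels traverseDda(const Vec3& origin, const Vec3& end, double res) {
    assert(res > 0.0);
    RayLabels out;
    VoxelKey cur = toKey(origin, res);
    const VoxelKey last = toKey(end, res);
    out.occupiedVoxel = last;

    Vec3 dir{end[0] - origin[0], end[1] - origin[1], end[2] - origin[2]};
    const double len = std::sqrt(dir[0] * dir[0] + dir[1] * dir[1] + dir[2] * dir[2]);
    if (len == 0.0) return out;
    for (double& d : dir) d /= len;  // unit direction, so t is a distance

    const double inf = std::numeric_limits<double>::infinity();
    long step[3];
    double tMax[3], tDelta[3];  // t of next boundary crossing; t to cross one voxel
    for (int i = 0; i < 3; ++i) {
        if (dir[i] > 0.0) {
            step[i] = 1;
            tMax[i] = ((cur[i] + 1) * res - origin[i]) / dir[i];
            tDelta[i] = res / dir[i];
        } else if (dir[i] < 0.0) {
            step[i] = -1;
            tMax[i] = (cur[i] * res - origin[i]) / dir[i];
            tDelta[i] = -res / dir[i];
        } else {
            step[i] = 0;
            tMax[i] = inf;
            tDelta[i] = inf;
        }
    }

    while (cur != last) {
        out.freeVoxels.push_back(cur);
        int axis = 0;  // step along the axis whose voxel boundary is reached first
        if (tMax[1] < tMax[axis]) axis = 1;
        if (tMax[2] < tMax[axis]) axis = 2;
        if (tMax[axis] > len) break;  // guard against floating-point overshoot
        cur[axis] += step[axis];
        tMax[axis] += tDelta[axis];
    }
    return out;
}
```

For example, a ray from (0.5, 0.5, 0.5) to (3.5, 0.5, 0.5) with unit resolution labels voxels (0,0,0), (1,0,0), and (2,0,0) as free and (3,0,0) as occupied.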


GPU-based ray shooting for map building was implemented in C++ with Microsoft Visual Studio 2017 on 64-bit Windows 10, running on an AMD Ryzen 7 3700X CPU, an NVIDIA RTX 2080 GPU, and 16 GB of RAM. We used DirectX Raytracing (DXR) to drive GPU-accelerated ray tracing on the RTX GPU. As a benchmarking platform, we used a virtual indoor environment built in the Tesse-Unity simulator, in which a mobile robot equipped with a stereo camera navigated to collect a point cloud data set and build an octree-based map. The acquired point cloud data is built into an octree with a maximum depth of 16, yielding 25K to 34K leaf-level voxels.

Results of GPU-based ray shooting from different viewpoints while navigating inside a complex virtual building. The top row shows the target scene. The bottom row shows the corresponding ray hit counts for the voxels in the space; colors ranging from blue to red indicate an increasing number of ray-voxel intersections, and gray indicates regions where the distance sensor obtained no point cloud due to reflections in the environment.

We shoot 320×240 (76,800) rays per view to collect and identify octree cells. We measure ray-shooting time as the GPU elapsed time of the DXR DispatchRays call that performs the ray tracing.
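The per-view ray setup can be sketched as follows, assuming (hypothetically) that the 320×240 dispatch maps one launch index to one point of an organized point cloud, as HLSL's `DispatchRaysIndex().xy` would within `DispatchRaysDimensions()` of 320×240 set via the `Width` and `Height` of `D3D12_DISPATCH_RAYS_DESC`. `RayDescCpu` mirrors the fields of HLSL's `RayDesc`; `makeRay` and the row-major indexing are illustrative assumptions, not the authors' shader code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Mirrors the fields of HLSL's RayDesc; redeclared for a standalone sketch.
struct RayDescCpu {
    Vec3 Origin;
    float TMin;
    Vec3 Direction;
    float TMax;
};

constexpr std::size_t kWidth = 320;   // dispatch width (rays per row)
constexpr std::size_t kHeight = 240;  // dispatch height (rows)

// Hypothetical mapping: launch index (x, y) reads point y * kWidth + x of the
// sensor's organized point cloud and shoots a ray from the sensor origin to it.
RayDescCpu makeRay(const Vec3& sensorOrigin, const std::vector<Vec3>& cloud,
                   std::size_t x, std::size_t y) {
    assert(x < kWidth && y < kHeight && cloud.size() == kWidth * kHeight);
    const Vec3& p = cloud[y * kWidth + x];
    Vec3 d{p[0] - sensorOrigin[0], p[1] - sensorOrigin[1], p[2] - sensorOrigin[2]};
    const float dist = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    if (dist > 0.0f) {
        for (float& c : d) c /= dist;  // normalize so TMax equals the distance to the point
    }
    // TMax stops the ray at the measured point, which bounds the traversal.
    return {sensorOrigin, 0.0f, d, dist};
}
```

Clamping `TMax` to the point's distance keeps each ray from traversing voxels beyond the observed obstacle.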

Comparisons of Ray Shooting Performance on Octomap (CPU) and Ours (GPU), and Timing Breakdown

For GPU-based ray shooting, we additionally include the BVH construction time on the GPU and the time to read the intersected voxels back from the GPU to the CPU.
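After readback, the CPU updates voxel occupancy. The sketch below assumes an Octomap-style log-odds update with Octomap's default sensor model (hit probability 0.7, miss probability 0.4, clamping at 0.1192 and 0.971); `OccupancyMap`, `integrateRay`, and the `std::map` storage are hypothetical simplifications standing in for the actual octree update.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

using VoxelKey = std::array<long, 3>;  // integer voxel coordinates at finest resolution

float logodds(double p) { return static_cast<float>(std::log(p / (1.0 - p))); }
double probability(float l) { return 1.0 - 1.0 / (1.0 + std::exp(l)); }

// Hypothetical flat occupancy store; a real system would update the octree.
struct OccupancyMap {
    float hit = logodds(0.7), miss = logodds(0.4);
    float clampMin = logodds(0.1192), clampMax = logodds(0.971);
    std::map<VoxelKey, float> logOdds;  // voxels never observed are absent (unknown)

    void update(const VoxelKey& k, bool occupied) {
        float& l = logOdds[k];  // new entries start at 0 log-odds, i.e. probability 0.5
        l = std::clamp(l + (occupied ? hit : miss), clampMin, clampMax);
    }
    bool isOccupied(const VoxelKey& k) const {
        auto it = logOdds.find(k);
        return it != logOdds.end() && probability(it->second) > 0.5;
    }
    // Applies one ray's read-back result: free voxels first, then the endpoint voxel.
    void integrateRay(const std::vector<VoxelKey>& freeVoxels, const VoxelKey& occupiedVoxel) {
        for (const VoxelKey& k : freeVoxels) update(k, false);
        update(occupiedVoxel, true);
    }
};
```

Clamping bounds the log-odds so that a voxel can still change state quickly when the environment changes, and it lets stable voxels saturate.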

Relative ray-shooting performance of Octomap (CPU, blue bars) and ours (GPU, gold bars) on a logarithmic scale. BVH construction and GPU-to-CPU readback are shown as orange and purple bars, respectively.

On average, GPU-based ray shooting is three orders of magnitude faster than CPU-based ray shooting when GPU-CPU readback is excluded. Even when readback time is included, it remains two orders of magnitude faster than the CPU version.
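The two speedup figures can be stated precisely as follows. The symbols are ours, not the paper's: $T_{\text{CPU}}$ is Octomap's ray-shooting time, $T_{\text{BVH}}$ the GPU BVH construction time, $T_{\text{trace}}$ the GPU ray-tracing time, and $T_{\text{rb}}$ the GPU-to-CPU readback time; we assume BVH construction is counted in both ratios, consistent with the timing breakdown.

```latex
S_{\text{excl}} = \frac{T_{\text{CPU}}}{T_{\text{BVH}} + T_{\text{trace}}},
\qquad
S_{\text{incl}} = \frac{T_{\text{CPU}}}{T_{\text{BVH}} + T_{\text{trace}} + T_{\text{rb}}},
\qquad
\text{orders of magnitude} = \log_{10} S .
```

Under these definitions, the reported results correspond to $S_{\text{excl}}$ on the order of $10^3$ and $S_{\text{incl}}$ on the order of $10^2$; since $T_{\text{rb}} \ge 0$, $S_{\text{incl}} \le S_{\text{excl}}$ always holds.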


Ewha Graphics Lab
Department of Computer Science & Engineering, Ewha Womans University
  52, Ewhayeodae-gil, Seodaemun-gu, Seoul, Korea, 03760

  Heajung Min,
  Kyung Min Han,
  Young J. Kim