GPU-based Massively parallel motion planning algorithms under uncertainty using POMDP


Taekhee Lee         Young J. Kim

Ewha Womans University, Korea

watersp@gmail.com  kimy@ewha.ac.kr


1. Accepted in International Journal of Robotics Research (IJRR) 2015 in
PDF (4612KB), BibTex

2. Publication in International Conference on Robotics and Automation (ICRA) 2013 in
PDF (900KB), BibTex


We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize this algorithm using the multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to utilize the massive data parallelism available on modern GPUs. Our GPU-based method uses the two workload distribution techniques, compute/data interleaving and workload balancing, in order to obtain the maximum parallel performance at the highest level. Here we also present a CPU-GPU hybrid method that takes advantage of both CPU and GPU parallelism in order to solve highly complex POMDP planning problems. The CPU is responsible for data preparation, while the GPU performs Monte Cacrlo simulations; these operations are performed concurrently using the compute/data overlap technique between the CPU and GPU. To the best of the authors' knowledge, our algorithms are the first parallel algorithms that efficiently execute POMDP in a massively parallel fashion utilizing the GPU or a hybrid of the GPU and CPU. Our algorithms outperform the existing CPU-based algorithm by a factor of 75~99 based on the chosen benchmark.


• a GPU-based method (gPOMDP) for solving POMDP planning problems with a moderate complexity as well as a CPU-GPU hybrid method (hPOMDP) for solving more complicated POMDP planning problems.

• Solve continuous-state POMDP problems

• Present two workload distribution techniques of the compute/data interleaving technique and workload balancing using the paradigm of gathering, scheduling and running (GSR)

• gPOMDP and hPOMDP outperform the existing CPU-based algorithm by a factor of 75~99 based on the chosen benchmark


• POMDP-based MCVI

Figure 1: The process of POMDP-based MCVI.

The algorithm samples a new belief node that is reachable from the initial belief node using a particle-filtering method. Then, it updates the policy graph using the MCVI method. Note that many loops are required to identify the best node, best policy graph action and best observation.

• gPOMDP: GPU multi-level POMDP-based parallel algorithm

Figure 2: The process of gPOMDP.

The step numbers are matched with those in Figure 1, and the multiply layered boxes of each step indicate parallel execution. The green, blue, and red boxes indicate policy graph node-, action-, and belief-level parallelisms, respectively. Note that, in gPOMDP, five loops are eliminated from the process shown in Figure 1.

• hPOMDP: CPU-GPU POMDP-based hybrid algorithm

Figure 3: The process of hPOMDP.

The step numbers are matched with those in Figure 2. hPOMDP generates particles on the CPU side in order to maximize the utilization rate of the GPU. When the number of particles is less than 7000 and the complexity of the benchmark is low, GPU particle generation can compensate for CPU-GPU data transfer; however, it is more efficient to generate particles in the CPU when the number of particles is greater than 7000 and the complexity of the benchmark is high because the particle simulation time exceeds the particle generation and upload time.


Figure 4: Various benchmarks.

Underwater navigation (Figure 4(a)) : The agent receives a positive reward when it reaches the destination "D" while it receives the negative reward when it encounters the rock "R" or takes an action for moving. A robot starts from an initial point among "S" marks and tries to reach the destination "D" by taking actions for moving. The robot can localize itself only at the bottom or top of the map at "O" marks.
Corridor (Figure 4(b)) : The robot navigates a corridor with four doors. The goal is to enter the second door from the right "O" mark. The robot receives a positive reward when it enters the target door while receiving a negative reward when entering wrong door "X" mark or moving beyond both ends of the corridor.
Herding (Figure 4(c)) : Two robots track 12 escaped crocodiles in a 21X21 grid map with obstacles. The robots have to collaborate with each other to catch the crocodiles and are centrally controlled. The robots receive a positive reward when they successfully capture a crocodile.
Aircraft (Figure 4(d)) : Two unmanned aircrafts try to avoid collision by maintaining a minimum safety distance from each other. Each aircraft receives a negative reward when the distance between them falls below the safety distance. This benchmark is defined in 3D continuous space, and the movement of each aircraft is calculated using 12 parameters including the 3D position (x, y, z), heading angle Θ, and horizontal and vertical speed (u, v) of two aircrafts. This benchmark has a higher complexity than the other 3 benchmarks because it works in 3D space rather than 2D space and uses 12 parameters, while the others use no more than 4 parameters.


Figure 5: Performance comparisons.

The first column indicates the benchmarks. gPOMDP is applied to all benchmarks except the aircraft benchmark, which uses hPOMDP. Columns 2~6 are the total times in seconds to reach the target using the policy graph nodelevel, action-level, belief-level without GSR, belief-level with GSR, and the APPL library. The last column shows speed-ups of gPOMDP or hPOMDP compared with APPL. For hPOMDP, we adopted the optimal parallelism in the GPU, the GSR method.

Figure 6: Number of simulations per second for underwater navigation.

Our GPU-based method (red) performs 10,000 simulations per second on average irrespective of the problem size, while CPU-based APPL does only 100~200 simulations in the same amount of time.

Figure 6: Number of simulations per second for underwater navigation.

The performance comparison of gPOMDP and hPOMDP. We measured the number of simulations per second while increasing the number of particles. The blue line denotes gPOMDP and the red line denotes hPOMDP. A total of 1024 GPU cores were used for both gPOMDP and hPOMDP. hPOMDP performs better when more than 7000 particles are present.

Related Links

APPL : Approximate POMDP Planning ToolKit

Copyright 2015 Computer Graphics Laboratory
Dept of Computer Science & Engineering
Ewha Womans University, Seoul, Korea
Last update: Aug 28, 2015