We present a new parallel algorithm for collision detection on many-core computing platforms, including both CPUs and GPUs.
Based on the notion of a p-partition front, our algorithm is able to evenly partition and distribute the workload of BVH traversal
among multiple processing cores without the need for dynamic balancing, while minimizing the memory overhead inherent to
state-of-the-art parallel collision detection algorithms. We demonstrate the scalability of our algorithm on different benchmarking
scenarios, with and without temporal coherence, including dynamic simulation of rigid bodies, cloth simulation, and random
collision courses. In these experiments, performance improves nearly linearly with the number of processing cores on both
CPUs and GPUs.
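
To make the idea of statically partitioning a BVH traversal concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes that a front can be obtained by expanding the tree of node-pair tests breadth-first until at least p pairs are available, and that these pairs are then dealt round-robin to p workers, each of which finishes its sub-traversals independently. The 1D intervals (a stand-in for 3D bounding boxes), the names `make_front` and `collide`, and the round-robin rule are all illustrative assumptions; round-robin alone does not guarantee the even workload that the abstract describes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    lo: float
    hi: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None  # primitive index, set only on leaves


def build(intervals, ids):
    """Build a BVH over 1D intervals by median split (hypothetical helper)."""
    lo = min(intervals[i][0] for i in ids)
    hi = max(intervals[i][1] for i in ids)
    if len(ids) == 1:
        return Node(lo, hi, prim=ids[0])
    ids = sorted(ids, key=lambda i: intervals[i][0] + intervals[i][1])
    mid = len(ids) // 2
    return Node(lo, hi, build(intervals, ids[:mid]), build(intervals, ids[mid:]))


def overlap(a, b):
    return a.lo <= b.hi and b.lo <= a.hi


def children(a, b):
    """Child pairs of the node pair (a, b); descend into every non-leaf side."""
    if a.prim is None and b.prim is None:
        return [(x, y) for x in (a.left, a.right) for y in (b.left, b.right)]
    if a.prim is None:
        return [(a.left, b), (a.right, b)]
    return [(a, b.left), (a, b.right)]


def make_front(root_a, root_b, p):
    """Expand node pairs breadth-first until at least p pairs are pending.

    Assumption of this sketch: such a set of pending pairs plays the role
    of a front that can be split among p workers.
    """
    front = deque([(root_a, root_b)])
    done = []  # overlapping leaf pairs, which cannot be expanded further
    while front and len(front) + len(done) < p:
        a, b = front.popleft()
        if not overlap(a, b):
            continue  # culled: this whole sub-traversal is skipped
        if a.prim is not None and b.prim is not None:
            done.append((a, b))
        else:
            front.extend(children(a, b))
    return done + list(front)


def traverse(a, b, out):
    """Sequential traversal below one front pair."""
    if not overlap(a, b):
        return
    if a.prim is not None and b.prim is not None:
        out.append((a.prim, b.prim))
        return
    for x, y in children(a, b):
        traverse(x, y, out)


def collide(intervals_a, intervals_b, p):
    """Return sorted overlapping primitive pairs using p static work chunks."""
    root_a = build(intervals_a, list(range(len(intervals_a))))
    root_b = build(intervals_b, list(range(len(intervals_b))))
    front = make_front(root_a, root_b, p)
    chunks = [front[k::p] for k in range(p)]  # static round-robin assignment
    results = []
    for chunk in chunks:  # each chunk is independent and could run on its own core
        out = []
        for a, b in chunk:
            traverse(a, b, out)
        results.append(out)
    return sorted(pair for out in results for pair in out)
```

The chunks are processed in a plain loop here to keep the sketch dependency-free. Because they share no mutable state, each one could be handed to a separate thread, process, or GPU block without any work stealing between them.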