Kinodynamic TAMP

Kinodynamic Task and Motion Planning
using VLM-guided and Interleaved Sampling

Ewha Womans University, Dept. of Computer Science and Engineering
IEEE International Conference on Robotics and Automation (ICRA), 2026 (accepted)
†Indicates corresponding author

Video

Our kinodynamic TAMP planner leverages a physics simulator as a black-box transition model to verify both geometric and dynamic constraints, and uses VLMs for backtracking to enable robust recovery from failures.

Abstract

Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly on long-horizon problems because of excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, so that task and motion decisions are made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the search for a TAMP solution and backtracks it based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than those of traditional and LLM-based TAMP planners, along with reduced planning time on complex problems; ablations further highlight the benefits of VLM guidance.

Our approach

(1) Skeleton Space Generation: Given a problem PDDL and a domain PDDL, a top-k symbolic planner first generates a discrete state graph G, which serves as a reduced skeleton space that guides steps (2) and (3).

(2) Hybrid Tree Expansion: Guided by G, we then expand a hybrid state tree T, where each edge is expanded through motion planning and validated by physics simulation.

(3) Replanning: If a node h_t fails to expand, we retry random motion sampling up to K times. If expansion still fails, we prompt the VLM to predict a backtrack node h_r, from which the search resumes. This process repeats until a goal is found or a timeout is reached.
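The three steps above can be sketched as a small search loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `HybridNode`, `build_skeleton_graph`, `try_expand`, and `search` are hypothetical; the top-k planner is assumed to return plans as sequences of symbolic states; the motion planner plus physics simulator is abstracted as a callback `sample_and_simulate` that returns a numeric state or `None`; the VLM is abstracted as a callback `choose_backtrack`; and an iteration cap stands in for the timeout. The sketch also retries K times per symbolic successor, which is one possible reading of the retry rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional, Set

@dataclass(eq=False)
class HybridNode:
    """Hypothetical hybrid state: a symbolic state plus an opaque numeric state."""
    symbolic: Hashable
    numeric: object
    parent: Optional["HybridNode"] = None
    children: List["HybridNode"] = field(default_factory=list)
    tried: Set[Hashable] = field(default_factory=set)  # successors already attempted

def build_skeleton_graph(plans: List[List[Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Step (1): merge top-k symbolic plans (assumed to be state sequences) into G."""
    graph: Dict[Hashable, List[Hashable]] = {}
    for plan in plans:
        for state in plan:
            graph.setdefault(state, [])
        for s, s_next in zip(plan, plan[1:]):
            if s_next not in graph[s]:
                graph[s].append(s_next)
    return graph

def try_expand(node, graph, sample_and_simulate, K):
    """Step (2): try each untried successor in G, sampling motions up to K times."""
    for s_next in graph.get(node.symbolic, []):
        if s_next in node.tried:
            continue
        node.tried.add(s_next)
        for _ in range(K):
            numeric = sample_and_simulate(node, s_next)  # None means infeasible
            if numeric is not None:
                child = HybridNode(s_next, numeric, parent=node)
                node.children.append(child)
                return child
    return None

def search(root, graph, is_goal: Callable, sample_and_simulate: Callable,
           choose_backtrack: Callable, K: int = 5, max_iters: int = 1000):
    """Steps (2)-(3): expand the tree; on failure, ask the callback for h_r."""
    tree, node = [root], root
    for _ in range(max_iters):  # iteration cap stands in for the timeout
        if is_goal(node.symbolic):
            path = []
            while node is not None:  # walk back to the root
                path.append(node.symbolic)
                node = node.parent
            return path[::-1]
        child = try_expand(node, graph, sample_and_simulate, K)
        if child is not None:
            tree.append(child)
            node = child
        else:
            node = choose_backtrack(tree, node)  # VLM stand-in picks h_r
    return None
```

In this sketch each node records which symbolic successors it has already attempted, so resuming from a backtrack node explores a different branch of G instead of repeating the failed one. How the actual planner orders successors and selects h_r is determined by the VLM and is not modeled here.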

Experimental Results

Simulation Results

Average success rates (%) and planning times (s) of all baselines for 3 ≤ n ≤ 6. n denotes the number of target objects. Planning times are averaged over successful trials only.
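To make the averaging convention in this caption concrete, the following sketch computes both metrics from a list of trial records. The function name `summarize` and the trial data are hypothetical and purely illustrative; they are not results from the paper.

```python
from statistics import mean
from typing import List, Optional, Tuple

def summarize(trials: List[Tuple[bool, float]]) -> Tuple[float, Optional[float]]:
    """Return (success rate in %, mean planning time in s over successes only).

    Each trial is (succeeded, planning_time_s). Failed trials count toward the
    success rate but are excluded from the time average.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    success_times = [t for ok, t in trials if ok]
    rate = 100.0 * len(success_times) / len(trials)
    avg_time = mean(success_times) if success_times else None
    return rate, avg_time

# Hypothetical data: three successes and one failure (e.g. a timeout).
rate, avg_time = summarize([(True, 10.0), (True, 20.0), (False, 60.0), (True, 30.0)])
print(rate, avg_time)
```

Because failed trials are dropped from the time average, planning times of methods with different success rates are averaged over different subsets of problems and are best read alongside the success rates.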

Our approach outperforms both PDDLStream, a domain-independent traditional TAMP baseline, and LLM³, an LLM-based TAMP baseline, in the Blocksworld and Kitchen domains.

Real-World Demonstration

We demonstrate our TAMP planner in the Blocksworld domain using dual UR5e manipulators.

For n = 3 and n = 4, the success rates are 100%.
For n = 5, two trials failed due to collisions between the gripper and the object, caused by inaccurate object localization under occlusion.
For n = 6, the success rate is 80%, consistent with simulation results.

BibTeX


@article{kwon2025kinodynamic,
  title={Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling},
  author={Kwon, Minseo and Kim, Young J},
  journal={arXiv preprint arXiv:2510.26139},
  year={2025}
}