Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly on long-horizon problems because of excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, so that task and motion decisions are made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, while a VLM guides the search for a TAMP solution and backtracks it based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than those of traditional and LLM-based TAMP planners, along with reduced planning time on complex problems; ablations further highlight the benefits of VLM guidance.
(1) Skeleton Space Generation: Given a problem PDDL and a domain PDDL, a top-k symbolic planner first generates a discrete state graph G, which serves as a reduced skeleton space that guides steps (2) and (3).
(2) Hybrid Tree Expansion: Guided by G, we then expand a hybrid state tree T, where each edge is expanded through motion planning and validated by physics simulation.
(3) Replanning: If a node h_t fails to expand, we retry random motion sampling up to K times. If expansion still fails, we prompt the VLM to predict a backtrack node h_r, from which the search resumes. This process repeats until a goal is found or a timeout is reached.
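The three steps above can be sketched as a small search loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `HybridNode`, `expand_tree`, `sample_edge`, and `choose_backtrack` are hypothetical names, the state graph G is a plain dictionary, the motion planner and physics simulator are collapsed into one callback, the VLM is replaced by a stub that returns the parent node, and the timeout is approximated by a step budget.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class HybridNode:
    """A tree node holding a symbolic state and a numeric state together."""
    symbolic: str
    numeric: Tuple[float, ...]
    parent: Optional["HybridNode"] = None


def path_to(node: Optional[HybridNode]) -> List[str]:
    """Symbolic states from the root down to `node`."""
    out: List[str] = []
    while node is not None:
        out.append(node.symbolic)
        node = node.parent
    return out[::-1]


def expand_tree(
    graph: Dict[str, List[str]],  # skeleton space G (assumed adjacency dict)
    start: str,
    goal: str,
    sample_edge: Callable[[HybridNode, str], Optional[Tuple[float, ...]]],
    choose_backtrack: Callable[[List[HybridNode], HybridNode], HybridNode],
    K: int = 3,
    max_steps: int = 100,  # stands in for the timeout
) -> Optional[List[str]]:
    root = HybridNode(start, ())
    tree = [root]
    node = root
    for _step in range(max_steps):
        if node.symbolic == goal:
            return path_to(node)
        child = None
        for succ in graph.get(node.symbolic, []):
            # Try up to K motion samples for this edge (assumed retry policy).
            for _attempt in range(K):
                numeric = sample_edge(node, succ)
                if numeric is not None:
                    child = HybridNode(succ, numeric, node)
                    break
            if child is not None:
                break
        if child is None:
            # In the described method a VLM picks this node; here it is a stub.
            node = choose_backtrack(tree, node)
        else:
            tree.append(child)
            node = child
    return None


# Hypothetical toy problem: edge B->C fails on its first two samples.
graph = {"A": ["B"], "B": ["C"]}
calls = {"B->C": 0}


def flaky_sampler(node: HybridNode, succ: str) -> Optional[Tuple[float, ...]]:
    key = f"{node.symbolic}->{succ}"
    if key == "B->C":
        calls[key] += 1
        if calls[key] <= 2:
            return None  # simulated infeasible motion sample
    return (0.0,)  # placeholder numeric state


def parent_backtrack(tree: List[HybridNode], failed: HybridNode) -> HybridNode:
    return failed.parent if failed.parent is not None else tree[0]


result = expand_tree(graph, "A", "C", flaky_sampler, parent_backtrack, K=2)
```

With `K=2`, the toy run exhausts its samples on B->C once, backtracks to A, re-expands B, and then reaches the goal. Keeping the sampler and the backtrack chooser as callbacks is a design choice of this sketch, so that a real motion planner, simulator, or VLM query could be swapped in without touching the loop.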
Average success rates (%) and planning times (s) of all baselines for 3 ≤ n ≤ 6. n denotes the number of target objects. Planning times are averaged over successful trials only.
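The averaging convention in this caption can be made concrete with a short sketch. The function name `summarize` and the trial data below are made up for illustration and are not results from the paper; the point is only that failed trials count toward the success rate but are excluded from the mean planning time.

```python
from statistics import mean
from typing import List, Optional, Tuple


def summarize(trials: List[Tuple[bool, float]]) -> Tuple[float, Optional[float]]:
    """Return (success rate in %, mean planning time over successes in s).

    Each trial is (succeeded, planning_time_s). The mean time is None
    when no trial succeeded, since there is nothing to average.
    """
    success_times = [t for ok, t in trials if ok]
    rate = 100.0 * len(success_times) / len(trials) if trials else 0.0
    avg_time = mean(success_times) if success_times else None
    return rate, avg_time


# Hypothetical trials: two successes, two failures that hit a time limit.
example_trials = [(True, 10.0), (False, 60.0), (True, 20.0), (False, 60.0)]
rate, avg_time = summarize(example_trials)
```

Under this convention a planner that fails often can still report a short average time, so the two columns are best read together.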
Our approach outperforms both PDDLStream, a domain-independent traditional TAMP baseline, and LLM3, an LLM-based TAMP baseline, in the Blocksworld and Kitchen domains.
We demonstrate our TAMP planner in the Blocksworld domain using dual UR5e manipulators.
@article{kwon2025kinodynamic,
title={Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling},
author={Kwon, Minseo and Kim, Young J},
journal={arXiv preprint arXiv:2510.26139},
year={2025}
}