Video
Abstract
IEEE ICRA 2025, Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition:
In robotic task planning, symbolic planners using rule-based representations like PDDL are effective but struggle with long-sequential tasks in complicated environments due to an exponentially increasing search space. Meanwhile, LLM-based approaches, which are grounded in artificial neural networks, offer faster inference and commonsense reasoning but suffer from lower success rates. To address the limitations of current symbolic (slow speed) and LLM-based (low accuracy) approaches, we propose a novel neuro-symbolic task planner that decomposes complex tasks into subgoals using an LLM and carries out task planning for each subgoal using either a symbolic or an MCTS-based LLM planner, depending on the subgoal complexity. This decomposition reduces planning time and improves success rates by narrowing the search space and enabling LLMs to focus on more manageable tasks. Our method significantly reduces planning time while maintaining high success rates across task planning domains, as well as real-world and simulated robotics environments. More details are available at http://graphics.ewha.ac.kr/LLMTAMP/.
KROS 2025 (preliminary version without subgoal decomposition and MCTS), Neuro-Symbolic Task Replanning using Large Language Models:
We introduce a novel task replanning algorithm that combines a symbolic task planner with a multimodal Large Language Model (LLM). Our algorithm first describes the scene by extracting the semantic and spatial relationships of objects in the environment through a multimodal LLM and an open-vocabulary object detection model. The LLM then formulates a planning problem in symbolic form from the scene description and the user's goal description, which the symbolic planner processes to create task plans. These plans are converted into low-level executable code for the robot, with the LLM performing syntax and semantic checks to ensure validity and to facilitate replanning if necessary. We demonstrate our replanning pipeline using dual UR5e manipulators in various benchmark tasks, including pick-and-place operations, block-stacking, and block rearrangement.
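The check-and-replan loop described above can be summarized in a few lines of pseudocode. The following is a minimal sketch, not the paper's implementation: every helper (describe_scene, formulate_pddl, symbolic_plan, to_robot_code, llm_check) and the retry budget are assumptions for illustration.

```python
MAX_ATTEMPTS = 3   # assumed replanning budget, not a value from the paper

def plan_with_replanning(image, goal_text):
    # Scene description via a multimodal LLM + open-vocabulary detector.
    scene = describe_scene(image)
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        # The LLM writes the problem PDDL from the scene, the goal,
        # and any feedback from a previous failed attempt.
        problem = formulate_pddl(scene, goal_text, feedback)
        plan = symbolic_plan(problem)            # symbolic task planner
        code = to_robot_code(plan)               # low-level executable robot code
        ok, feedback = llm_check(code, problem)  # LLM syntax/semantic check
        if ok:
            return code
    raise RuntimeError("replanning budget exhausted")
```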
Our Approach

Neuro-symbolic task planning pipeline. The LLM (green blocks) and symbolic languages (orange blocks) are used at various steps of the pipeline.
1) Planning formulation: Given a planning goal described in natural language and the domain knowledge, our task planner relies on PDDL to encode the problem description. We also obtain the semantic and spatial relationships of target objects in the environment using a multi-modal LLM, and these relationships are translated and encoded into the problem PDDL (see the first sketch following this list).
2) Subgoal generation: We utilize the L-Model to generate a sequence of subgoals by decomposing the given goal.
3) Task planning: If the task is moderately complex, we rely on a symbolic planner to solve each subgoal; if the task complexity is high, we generate and expand a search tree and use the MCTS algorithm with the L-Policy as a roll-out policy to solve the subgoal. This subgoal planning is repeated for each sub-task, and the resulting plans are combined to form the overall plan (a combined sketch of steps 2 and 3 follows this list).
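To make step 1 concrete, the sketch below renders object relations extracted by the multi-modal LLM into a Blocksworld-style problem PDDL. The predicate and object names and the render_problem helper are assumptions for illustration, not the paper's exact encoding.

```python
# Illustrative only: turning extracted scene relations into problem PDDL.
def render_problem(objects, init_facts, goal_facts, domain="blocksworld-new"):
    def fmt(facts):
        return "\n    ".join(f"({f})" for f in facts)
    return f"""(define (problem demo)
  (:domain {domain})
  (:objects {' '.join(objects)} - block)
  (:init
    {fmt(init_facts)})
  (:goal (and
    {fmt(goal_facts)})))"""

print(render_problem(
    objects=["b1", "b2", "b3"],
    init_facts=["ontable b1", "on b2 b1", "clear b2",
                "ontable b3", "clear b3", "handempty"],
    goal_facts=["on b3 b2"]))
```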
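Steps 2 and 3 can then be sketched as a decomposition-then-routing loop. The prompt wording, the complexity threshold, and all helpers (ask_llm, estimate_complexity, symbolic_plan, mcts_llm_plan, apply_plan) are hypothetical; the source only states that the planner is selected per subgoal based on task complexity.

```python
import json

# Hypothetical prompt for the L-Model; the paper's actual prompts may differ.
DECOMPOSE_PROMPT = """Decompose the goal below into an ordered list of at most {k}
subgoals, each expressed as a PDDL goal condition. Answer as a JSON list of strings.

Current state:
{state}

Goal:
{goal}
"""

COMPLEXITY_THRESHOLD = 4   # assumed cutoff, not a value from the paper

def decompose(ask_llm, state_pddl, goal_pddl, k=5):
    """Step 2: the L-Model splits the goal into subgoals (the paper uses 3 <= n_s <= 5)."""
    reply = ask_llm(DECOMPOSE_PROMPT.format(k=k, state=state_pddl, goal=goal_pddl))
    return json.loads(reply)   # e.g. ["(on b9 b8)", "(on b7 b9)", "(on b10 b7)"]

def plan_task(state, subgoals):
    """Step 3: route each subgoal to the symbolic or the MCTS LLM planner."""
    overall_plan = []
    for subgoal in subgoals:
        if estimate_complexity(state, subgoal) <= COMPLEXITY_THRESHOLD:
            sub_plan = symbolic_plan(state, subgoal)    # moderately complex subgoal
        else:
            sub_plan = mcts_llm_plan(state, subgoal)    # MCTS with L-Policy roll-outs
        overall_plan += sub_plan
        state = apply_plan(state, sub_plan)             # advance to the next subgoal
    return overall_plan
```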

An overview of the MCTS LLM Planner.
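For reference, a compact MCTS skeleton with an LLM roll-out policy is sketched below. The selection/expansion/backpropagation bookkeeping follows the textbook UCT form; successors(), llm_policy(), and goal_test() are assumed interfaces, not the paper's code.

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Upper confidence bound; only called on already-visited children.
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root_state, iterations=100, horizon=20):
    root = Node(root_state)
    for _ in range(iterations):
        # 1) Selection: descend via UCB while the node is fully expanded.
        node = root
        while node.children and all(ch.visits > 0 for ch in node.children):
            node = max(node.children, key=ucb)
        # 2) Expansion: add successor states as children.
        if not node.children:
            node.children = [Node(s, node) for s in successors(node.state)]
        if node.children:
            unvisited = [ch for ch in node.children if ch.visits == 0]
            node = random.choice(unvisited or node.children)
        # 3) Roll-out: the LLM policy proposes actions instead of random play.
        state, reward = node.state, 0.0
        for _ in range(horizon):
            if goal_test(state):
                reward = 1.0
                break
            state = llm_policy(state)
        # 4) Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)  # most-visited child
```

Using the LLM as the roll-out policy replaces random play with commonsense action proposals, which is what allows the search tree to stay shallow on long-sequential tasks.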
Experimental Results
Task Planning Results

Success rates (top row) and planning time (bottom row) of the CoT, FD, Symbolic LLM, and MCTS LLM planners with $3 \leq n_s \leq 5$, and of the MCTS LLM planner without goal decomposition with $n_s = 5$. The x-axis in all graphs denotes the domain complexity $n$.
Our Symbolic LLM and MCTS LLM planners significantly reduce planning time compared to the FD planner and improve success rates compared to the CoT planner across three PDDL domains (Barman-new, Blocksworld-new, Gripper-new).
Robot Demonstration

Physical robot demonstration of our planner in the Blocksworld-new domain. Initially, ten blocks, labeled 1 to 10, are divided into three stacks and placed on the table (leftmost image). The goal is to restack the blocks at the same positions in the following order: 10 on 7, 7 on 9, 9 on 8, 1 on 3, 3 on 2, 6 on 5, and 5 on 4 (rightmost image). Please refer to the video above for a real-time demonstration.

Simulated robot demonstration of our planner in the Barman-new domain. Initially, three ingredients, three shots, and a shaker are placed on the table (leftmost image). The goal is to make a cocktail and pour it into a shot (rightmost image). Please refer to the video above for a real-time demonstration.
Replanning Results


[Table 1] and [Table 2] summarize the success rates and failure causes for the two domains, block-stacking and block rearrangement, comparing cases with and without replanning. For each domain, 30 problems were randomly generated, and we observed whether task planning and low-level code execution succeeded.
Bibtex
IEEE ICRA 2025, "Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition":
@article{kwon2024fast,
  title={Fast and accurate task planning using neuro-symbolic language models and multi-level goal decomposition},
  author={Kwon, Minseo and Kim, Yaesol and Kim, Young J},
  journal={arXiv preprint arXiv:2409.19250},
  year={2024}
}
Contact
Ewha Computer Graphics Lab
Department of Computer Science & Engineering, Ewha Womans University
📍 52, Ewhayeodae-gil, Seodaemun-gu, Seoul, Korea, 03760
📞 +82-2-3277-6798
✉️ Minseo Kwon1, tahitiro2@gmail.com
✉️ Yaesol Kim1, kimyaesol@gmail.com
✉️ Young J. Kim1, kimy@ewha.ac.kr
1 Ewha Womans University, Korea