2.2. Planning Systems — Goal Decomposition & Reflection
🪄 Step 1: Intuition & Motivation
Core Idea: Imagine giving an agent a big, open-ended task like:
“Design a marketing campaign for a new product.” A normal LLM might generate a single, long answer. But an agentic system says — “Wait! That’s a project, not a sentence.” So it breaks the big goal into smaller steps: research → strategy → messaging → execution.
This is Planning — teaching the agent to think in steps, like a project manager with logic and memory.
Simple Analogy: Think of a chef planning a multi-course meal. They don’t start cooking randomly. They:
- Decide the courses (goals).
- Break each dish into ingredients and steps (subgoals).
- Reflect after tasting: “Too salty? Adjust next dish.”

Agents plan in the same way: structure before action, reflection after.
🌱 Step 2: Core Concept
Planning is how agents turn abstract intentions into executable action graphs. Let’s unpack two core planning paradigms: Hierarchical Task Networks (HTNs) and Tree-of-Thoughts (ToT).
Hierarchical Task Networks (HTNs)
HTNs are like organizational charts for tasks. They represent a big problem as a tree:
Goal: Launch Marketing Campaign
├── Research Market
├── Design Strategy
│   ├── Define Audience
│   ├── Choose Channels
│   └── Set Budget
└── Execute Plan
    ├── Create Content
    └── Track Results

Each node can be decomposed into smaller, atomic tasks until the agent can directly execute them (e.g., call a tool or API).
HTNs are deterministic and structured, which makes them a great fit for tasks with clear hierarchies and dependencies.
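To make this concrete, here is a minimal HTN-style sketch in Python. The task network and the `execute_atomic` helper are illustrative assumptions, not part of any specific planning library:

```python
# Minimal HTN sketch: a task is either composite (mapped to ordered subtasks)
# or atomic (not in the network), in which case it is executed directly.

htn = {
    "Launch Marketing Campaign": ["Research Market", "Design Strategy", "Execute Plan"],
    "Design Strategy": ["Define Audience", "Choose Channels", "Set Budget"],
    "Execute Plan": ["Create Content", "Track Results"],
}

def execute_atomic(task: str) -> None:
    # Placeholder for a real tool or API call (assumption for illustration).
    print(f"Executing: {task}")

def run_htn(task: str, network: dict[str, list[str]]) -> None:
    """Depth-first traversal: decompose composite tasks, execute atomic ones."""
    subtasks = network.get(task)
    if subtasks is None:           # leaf node -> atomic, directly executable
        execute_atomic(task)
        return
    for sub in subtasks:           # internal node -> recurse in order
        run_htn(sub, network)

run_htn("Launch Marketing Campaign", htn)
```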
Tree-of-Thoughts (ToT)
While HTNs are rigid, ToT is exploratory. It models reasoning as a search tree, where each node is a partial idea or decision, and branches represent possible continuations.
The agent explores multiple branches (thoughts), evaluates them, and prunes the weak ones — similar to how a chess player anticipates several moves ahead.
Example:
Goal: “Write a blog post about AI.” Possible branches:
- Branch 1: Focus on AI Ethics.
- Branch 2: Focus on AI Productivity.
- Branch 3: Focus on AI in Education.

The agent then evaluates which path best meets the goal and refines it step by step.
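A minimal sketch of this explore-evaluate-prune loop, assuming hypothetical `llm_generate` and `llm_score` helpers in place of real model calls:

```python
# Tree-of-Thoughts sketch: expand candidate "thoughts", score them,
# and keep only the most promising branches at each level.
# `llm_generate` and `llm_score` are hypothetical stand-ins for LLM calls.

def llm_generate(partial_plan: str, n: int) -> list[str]:
    # In practice: prompt an LLM for n candidate continuations of the plan.
    return [f"{partial_plan} -> idea {i}" for i in range(n)]

def llm_score(thought: str, goal: str) -> float:
    # In practice: ask an LLM (or a heuristic) to rate relevance to the goal.
    return float(len(set(goal.split()) & set(thought.split())))

def tree_of_thoughts(goal: str, depth: int = 3, branching: int = 3, beam: int = 2) -> str:
    frontier = [goal]                                   # root of the search tree
    for _ in range(depth):
        candidates = []
        for node in frontier:
            candidates.extend(llm_generate(node, branching))   # expand branches
        # prune: keep only the top-`beam` thoughts by score
        frontier = sorted(candidates, key=lambda t: llm_score(t, goal), reverse=True)[:beam]
    return frontier[0]                                  # best surviving reasoning path

print(tree_of_thoughts("Write a blog post about AI"))
```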
Why It Works This Way
Because agents, like humans, often don’t know the full solution upfront — they explore, evaluate, and iterate.
HTNs provide structure and efficiency for well-defined problems. ToT provides creativity and adaptability for open-ended reasoning.
Together, they balance planning (logic) and reflection (learning from outcomes).
How It Fits in ML Thinking
Planning aligns closely with search-based reasoning in AI and hierarchical reinforcement learning (HRL) in ML.
- HTNs resemble HRL’s “options framework” — where each subgoal is like a macro-action.
- ToT resembles beam search or Monte Carlo Tree Search (MCTS) — guided exploration of many reasoning paths.
In essence, planning turns the agent’s reasoning process into a structured decision tree rather than a single straight line.
📐 Step 3: Mathematical Foundation
Let’s describe planning as recursive goal decomposition.
Recursive Goal Function
At any time $t$, the agent has a goal $G_t$. The planning function $f$ decomposes it into smaller subgoals:

$$ f(G_t) = \{ g_{t1}, g_{t2}, \ldots, g_{tn} \} $$

Each subgoal $g_{ti}$ may itself be decomposed recursively until it is directly executable:

$$ f(g_{ti}) = \{ g_{ti1}, g_{ti2}, \ldots, g_{tim} \} $$

At each layer, the agent assigns priority scores or utility values to subgoals based on relevance and cost.
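A minimal sketch of this recursion, where `is_executable` and `decompose` are hypothetical helpers that a real agent would back with a tool registry and LLM calls:

```python
# Recursive goal decomposition sketch: split a goal until each leaf is
# directly executable (or a depth limit is hit), then return the leaf tasks.
# `is_executable` and `decompose` are assumed helpers for illustration.

def is_executable(goal: str) -> bool:
    # In practice: check whether a known tool or API can handle the goal directly.
    return len(goal.split()) <= 3

def decompose(goal: str) -> list[str]:
    # In practice: ask an LLM to propose subgoals f(G_t) = {g_t1, ..., g_tn}.
    return [f"{goal} / part {i}" for i in range(1, 3)]

def plan(goal: str, depth: int = 0, max_depth: int = 3) -> list[str]:
    if is_executable(goal) or depth >= max_depth:
        return [goal]                                   # base case: atomic or depth-limited
    tasks: list[str] = []
    for subgoal in decompose(goal):                     # f(G_t) = {g_t1, ..., g_tn}
        tasks.extend(plan(subgoal, depth + 1, max_depth))   # recurse on each g_ti
    return tasks

print(plan("Design a marketing campaign for a new product"))
```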
🧠 Step 4: Reflection — The Agent’s Self-Improvement Engine
Once tasks are executed, reflection modules help agents learn from their performance. This step answers the meta-question:
“Did I do well, and how can I do better next time?”
Agents use techniques like:
- Post-task summarization: creating a short description of what worked and what didn’t.
- Plan refinement: adjusting the goal hierarchy or next steps.
- Prompt-based scoring: using an LLM to rate its own responses on criteria like relevance, accuracy, and completeness.
Reflection transforms agents from static executors into learning organisms — capable of evolving with experience.
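A sketch of prompt-based self-scoring, where `call_llm` is a hypothetical stand-in for a real model API and the 1-10 scale and refinement threshold are illustrative choices:

```python
# Reflection sketch: score the agent's own output and decide whether
# to refine the plan. `call_llm` is a hypothetical stand-in for a model call.

def call_llm(prompt: str) -> str:
    # In practice: send the prompt to an LLM API and return its text response.
    return "7"

def reflect(task: str, result: str) -> dict:
    prompt = (
        f"Task: {task}\n"
        f"Result: {result}\n"
        "Rate this result from 1-10 for relevance, accuracy, and completeness. "
        "Return only the overall score."
    )
    score = int(call_llm(prompt))
    return {
        "score": score,
        "needs_refinement": score < 8,   # threshold is an illustrative choice
    }

feedback = reflect("Write a product tagline", "Buy our stuff.")
if feedback["needs_refinement"]:
    print("Refine the plan and retry the task.")
```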
🧠 Step 5: Avoiding Recursive Explosion in ToT
Tree-of-Thoughts is powerful but computationally dangerous — it can explode into thousands of reasoning paths.
To control this, agents use pruning heuristics like:
| Heuristic | Description |
|---|---|
| Beam Search | Keep only the top-$k$ best candidate branches at each level. |
| Utility Thresholds | Stop exploring paths whose utility (usefulness) falls below a threshold. |
| Depth Limits | Restrict how deep the search tree can go. |
| Heuristic Scoring | Score branches based on context relevance or similarity to goal. |
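A sketch combining several of these heuristics in a single pruning step; the scores, threshold, beam width, and depth limit below are illustrative assumptions:

```python
# Pruning sketch: apply a utility threshold, then keep only the top-k
# (beam search) branches, subject to a depth limit.

def prune(branches: list[tuple[str, float]], k: int = 3,
          min_utility: float = 0.4, depth: int = 0, max_depth: int = 5) -> list[tuple[str, float]]:
    if depth >= max_depth:
        return []                                                # depth limit: stop expanding
    viable = [(b, u) for b, u in branches if u >= min_utility]   # utility threshold
    viable.sort(key=lambda bu: bu[1], reverse=True)              # heuristic scoring
    return viable[:k]                                            # beam search: top-k survivors

candidates = [("AI Ethics", 0.72), ("AI Productivity", 0.65), ("AI in Education", 0.35)]
print(prune(candidates, k=2))   # keeps the two highest-utility branches above 0.4
```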
⚖️ Step 6: Strengths, Limitations & Trade-offs
Strengths:
- Enables structured reasoning and planning.
- Reflection promotes continuous improvement.
- Scales from small goals to multi-step projects.

Limitations:
- Planning can become computationally expensive for large tasks.
- Without pruning, ToT may explore redundant or nonsensical branches.
- Over-reflection may cause analysis paralysis: endless loops of self-review.
The balance lies between depth and breadth:
- Too deep = slow and overthinking.
- Too shallow = underdeveloped reasoning.

Successful agents dynamically adapt their planning depth based on task complexity.
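One simple way to sketch that adaptation; the word-count complexity proxy below is purely an illustrative assumption:

```python
# Sketch: choose planning depth from a rough task-complexity estimate.
# Using word count as a complexity proxy is an illustrative simplification.

def planning_depth(task: str, min_depth: int = 1, max_depth: int = 5) -> int:
    complexity = len(task.split())       # crude proxy for task complexity
    depth = 1 + complexity // 5          # deeper plans for bigger tasks
    return max(min_depth, min(max_depth, depth))

print(planning_depth("Summarize this paragraph"))                  # shallow plan
print(planning_depth("Design and launch a multi-channel marketing "
                     "campaign for a new product"))                # deeper plan
```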
🚧 Step 7: Common Misunderstandings
- “Planning is just outlining steps.” No — it involves reasoning, evaluation, and reorganization based on feedback.
- “Reflection is optional.” Reflection is essential for self-correction — it’s how agents evolve beyond trial and error.
- “More subgoals mean better planning.” Not always. Over-decomposition leads to inefficiency and noise.
🧩 Step 8: Mini Summary
🧠 What You Learned: How agents turn abstract goals into structured, hierarchical action plans (HTN) and explore multiple reasoning paths (ToT).
⚙️ How It Works: Through recursive goal decomposition and reflective feedback, agents continually plan, act, and refine their strategies.
🎯 Why It Matters: Planning gives agents foresight; reflection gives them insight — together, they make autonomy intelligent rather than random.