Building an AI agent that actually remembers what you said five messages ago sounds simple until you try it. Every developer hits the same wall: how do you track conversation context without your code turning into spaghetti? In this guide, I tested three completely different approaches—Finite State Machines (FSM), Graph-based architectures, and LLM-powered routers—and I'll show you exactly how each one works with real code you can copy today.
What is Dialog State Management?
Before we code anything, let's make sure we're all on the same page. Dialog state management is how your AI agent remembers:
- What the user has told it so far
- What information it still needs to collect
- Where the conversation is in the overall flow
- What to do if something unexpected happens
Think of it like a waiter remembering your entire order while you're still deciding on appetizers. Without proper state management, your agent becomes that confused waiter who forgets you wanted no ice in your drink.
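The four bullets above map naturally onto a small state container. Here's a minimal sketch (the field names and required fields are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class DialogState:
    # Where the conversation is in the overall flow
    current_step: str = "GREETING"
    # What the user has told us so far
    collected: dict = field(default_factory=dict)
    # What we still need before we can act
    required: list = field(default_factory=lambda: ["issue_type", "order_id"])

    def missing_fields(self):
        """Fields we still need to collect from the user."""
        return [f for f in self.required if f not in self.collected]

state = DialogState()
state.collected["issue_type"] = "damaged package"
print(state.missing_fields())  # ['order_id']
```

Every approach in this guide is ultimately a different strategy for updating a structure like this one.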
Method 1: Finite State Machine (FSM)
The FSM approach is the simplest and most predictable. Your conversation has a fixed number of states, and clear rules for moving between them. I found this easiest to debug because everything follows a strict flowchart.
How FSM Works
Imagine a customer support bot for an online store. It can be in one of these states:
- GREETING - Just said hello
- COLLECTING_ISSUE - Finding out what's wrong
- COLLECTING_ORDER_ID - Getting the order number
- RESOLVING - Working on a solution
- CLOSED - Conversation finished
The state machine starts in GREETING, moves through COLLECTING_ISSUE and COLLECTING_ORDER_ID, arrives at RESOLVING, and ends at CLOSED. That's it. No surprises.
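Stripped of everything else, that happy path is just an ordered transition map. A toy sketch before the full implementation below (no LLM involved yet):

```python
# The happy path is a straight line: each state has exactly one successor.
FLOW = {
    "GREETING": "COLLECTING_ISSUE",
    "COLLECTING_ISSUE": "COLLECTING_ORDER_ID",
    "COLLECTING_ORDER_ID": "RESOLVING",
    "RESOLVING": "CLOSED",
}

def advance(state):
    """Move to the next state; stay put once CLOSED."""
    return FLOW.get(state, state)

state = "GREETING"
path = [state]
while state != "CLOSED":
    state = advance(state)
    path.append(state)
print(" -> ".join(path))
# GREETING -> COLLECTING_ISSUE -> COLLECTING_ORDER_ID -> RESOLVING -> CLOSED
```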
FSM Implementation with HolySheep
Here's a working example using HolySheep AI for the language model calls. At current 2026 pricing, DeepSeek V3.2 costs just $0.42 per million tokens—perfect for state classification tasks where you're processing many requests.
```python
import requests
import json

class DialogFSM:
    def __init__(self, api_key):
        self.base_url = "https://api.holysheep.ai/v1"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        self.state = "GREETING"
        self.collected_data = {}
        self.required_fields = ["issue_type", "order_id"]

    def classify_intent(self, user_message):
        """Use LLM to determine what the user wants."""
        prompt = f"""You are a customer support intent classifier.

Current state: {self.state}
Collected so far: {self.collected_data}
User said: {user_message}

Classify the user's intent as one of:
- GREET: Saying hello or just chatting
- PROVIDE_ISSUE: Describing a problem
- PROVIDE_ORDER_ID: Giving an order number
- CONFIRM: Saying yes or confirming something
- CANCEL: Wanting to start over or quit
- OTHER: Anything else

Respond with only the intent name."""
        payload = {
            "model": "deepseek-v3.2",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 50,
            "temperature": 0.1
        }
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload
        )
        return response.json()["choices"][0]["message"]["content"].strip()

    def transition(self, intent, user_message):
        """Apply the state machine rules and return (response, new_state)."""
        # FSM transition table: state -> intent -> (next_state, response)
        transitions = {
            "GREETING": {
                "GREET": ("COLLECTING_ISSUE", "Hello! What can I help you with today?"),
                "OTHER": ("COLLECTING_ISSUE", "Let me help you. What seems to be the issue?")
            },
            "COLLECTING_ISSUE": {
                "PROVIDE_ISSUE": ("COLLECTING_ORDER_ID", "Got it. Can I get your order number?"),
                "CANCEL": ("GREETING", "Let's start over. Hi!")
            },
            "COLLECTING_ORDER_ID": {
                "PROVIDE_ORDER_ID": ("RESOLVING", "Thank you! I'm looking into this now."),
                "CANCEL": ("GREETING", "Conversation reset.")
            },
            "RESOLVING": {
                "CONFIRM": ("CLOSED", "Perfect! Is there anything else I can help with?"),
                "OTHER": ("RESOLVING", "I'm still working on your issue.")
            }
        }
        current_transitions = transitions.get(self.state, {})
        next_state, response = current_transitions.get(
            intent,
            (self.state, "I'm not sure how to handle that. Could you clarify?")
        )
        # Update collected data
        if intent == "PROVIDE_ISSUE":
            self.collected_data["issue_type"] = user_message
        elif intent == "PROVIDE_ORDER_ID":
            self.collected_data["order_id"] = user_message
        self.state = next_state
        return response, self.state
```
Usage Example
```python
api_key = "YOUR_HOLYSHEEP_API_KEY"
fsm = DialogFSM(api_key)

# Simulate a conversation
print(f"Starting state: {fsm.state}")

response, new_state = fsm.transition("GREET", "Hello")
print(f"Bot: {response}")  # Bot: Hello! What can I help you with today?

response, new_state = fsm.transition("PROVIDE_ISSUE", "My package arrived damaged")
print(f"Bot: {response}")  # Bot: Got it. Can I get your order number?
```
Screenshot hint: In your terminal, you should see the state machine responding to each message while printing the current state. The collected_data dictionary fills up as the conversation progresses.
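One nice property of the FSM approach: because the transition table is plain data, you can sanity-check it offline with no API key. A sketch that mirrors the table from `transition()` above (state names only, responses dropped) and confirms the machine can never land in an unknown state:

```python
# Mirror of the transition table in transition(), reduced to next-state names
TRANSITIONS = {
    "GREETING": {"GREET": "COLLECTING_ISSUE", "OTHER": "COLLECTING_ISSUE"},
    "COLLECTING_ISSUE": {"PROVIDE_ISSUE": "COLLECTING_ORDER_ID", "CANCEL": "GREETING"},
    "COLLECTING_ORDER_ID": {"PROVIDE_ORDER_ID": "RESOLVING", "CANCEL": "GREETING"},
    "RESOLVING": {"CONFIRM": "CLOSED", "OTHER": "RESOLVING"},
}
INTENTS = ["GREET", "PROVIDE_ISSUE", "PROVIDE_ORDER_ID", "CONFIRM", "CANCEL", "OTHER"]

def next_state(state, intent):
    # Unknown intents keep the current state, matching the fallback in transition()
    return TRANSITIONS.get(state, {}).get(intent, state)

# Every state/intent combination must resolve to a known state
states = set(TRANSITIONS) | {"CLOSED"}
for s in states:
    for i in INTENTS:
        assert next_state(s, i) in states, f"dangling transition: {s} + {i}"
print("transition table is closed: no dangling states")
```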
Method 2: Graph-Based Architecture
Graph-based state management is more flexible than FSM. Instead of a rigid sequence, you define a web of states connected by edges. This handles branching conversations where users can jump between topics.
When to Use Graphs
I switched to graphs when my support bot needed to handle:
- Users asking about multiple orders at once
- Escalating to a human agent at any point
- Multi-turn troubleshooting with backtracking
- Context carrying over between different topics
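Those requirements are exactly what adjacency lists model well: each topic is a node, each legal jump is an edge. Before the full implementation, here's a minimal sketch of checking reachability over such a graph (node names are illustrative and simplified from the implementation below):

```python
from collections import deque

# Conversation graph as an adjacency list: state -> reachable states
GRAPH = {
    "START": ["ORDER_HELP", "BILLING_HELP", "HUMAN_ESCALATION"],
    "ORDER_HELP": ["ORDER_LOOKUP", "HUMAN_ESCALATION"],
    "BILLING_HELP": ["REFUND_START", "HUMAN_ESCALATION"],
    "ORDER_LOOKUP": ["ORDER_STATUS", "HUMAN_ESCALATION"],
    "ORDER_STATUS": ["END", "HUMAN_ESCALATION"],
    "REFUND_START": ["END"],
    "HUMAN_ESCALATION": ["END"],
    "END": [],
}

def reachable(start):
    """Breadth-first search: every state a user can eventually reach."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in GRAPH[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Escalation must be reachable from every active (non-terminal) state
for node in GRAPH:
    if node not in ("REFUND_START", "HUMAN_ESCALATION", "END"):
        assert "HUMAN_ESCALATION" in reachable(node)
print("every active state can escalate to a human")
```

Checks like this catch "orphaned" states at build time rather than in production conversations.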
Graph Implementation
```python
import requests
import json

class StateNode:
    """Represents a state in our conversation graph."""
    def __init__(self, name, response_template, required_context=None):
        self.name = name
        self.response_template = response_template
        self.required_context = required_context or []
        self.edges = []

    def add_edge(self, condition, next_node):
        self.edges.append({"condition": condition, "next": next_node})

class ConversationGraph:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.holysheep.ai/v1"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        self.nodes = {}
        self.current_node = None
        self.context = {}
        self.build_graph()

    def build_graph(self):
        """Build our conversation flow as a graph."""
        # Define all possible states
        self.nodes["START"] = StateNode("START", "Hi! How can I help?")
        self.nodes["ORDER_HELP"] = StateNode("ORDER_HELP", "I can help with orders.")
        self.nodes["BILLING_HELP"] = StateNode("BILLING_HELP", "Billing support here.")
        self.nodes["ORDER_LOOKUP"] = StateNode("ORDER_LOOKUP", "What's your order number?")
        self.nodes["ORDER_STATUS"] = StateNode("ORDER_STATUS", "Let me check that for you.")
        self.nodes["REFUND_START"] = StateNode("REFUND_START", "I'll start your refund.")
        self.nodes["HUMAN_ESCALATION"] = StateNode("HUMAN_ESCALATION", "Let me connect you.")
        self.nodes["END"] = StateNode("END", "Anything else?")

        # Define edges (what can lead to what)
        self.nodes["START"].add_edge("order", self.nodes["ORDER_HELP"])
        self.nodes["START"].add_edge("billing", self.nodes["BILLING_HELP"])
        self.nodes["START"].add_edge("refund", self.nodes["REFUND_START"])
        self.nodes["START"].add_edge("human", self.nodes["HUMAN_ESCALATION"])
        self.nodes["ORDER_HELP"].add_edge("continue", self.nodes["ORDER_LOOKUP"])
        self.nodes["ORDER_HELP"].add_edge("status", self.nodes["ORDER_STATUS"])
        self.nodes["BILLING_HELP"].add_edge("refund", self.nodes["REFUND_START"])
        self.nodes["BILLING_HELP"].add_edge("human", self.nodes["HUMAN_ESCALATION"])
        self.nodes["ORDER_LOOKUP"].add_edge("continue", self.nodes["ORDER_STATUS"])
        self.nodes["ORDER_STATUS"].add_edge("refund", self.nodes["REFUND_START"])

        # Any node can escalate to a human
        for name, node in self.nodes.items():
            if name not in ["HUMAN_ESCALATION", "END"]:
                node.add_edge("human", self.nodes["HUMAN_ESCALATION"])

        # Nodes can end the conversation
        self.nodes["ORDER_STATUS"].add_edge("done", self.nodes["END"])
        self.nodes["REFUND_START"].add_edge("done", self.nodes["END"])
        self.nodes["HUMAN_ESCALATION"].add_edge("done", self.nodes["END"])

        self.current_node = self.nodes["START"]

    def extract_intent(self, user_message):
        """Classify user intent using HolySheep."""
        prompt = f"""Analyze this customer message and extract:
1. The main intent/category
2. Any entities mentioned (order numbers, prices, dates)

Message: "{user_message}"
Current context: {self.context}

Respond as JSON with keys: intent, entities, confidence"""
        payload = {
            "model": "gemini-2.5-flash",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 150,
            "temperature": 0.3
        }
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload
        )
        try:
            return json.loads(response.json()["choices"][0]["message"]["content"])
        except (KeyError, json.JSONDecodeError):
            # Never let a malformed API response crash the conversation
            return {"intent": "unknown", "entities": {}}

    def traverse(self, user_message):
        """Navigate through the graph based on user input."""
        # Extract intent
        analysis = self.extract_intent(user_message)
        intent = analysis.get("intent", "unknown")

        # Store entities in context
        if "entities" in analysis:
            self.context.update(analysis["entities"])

        # Find a matching edge: prefer an exact intent match, falling back
        # to a generic "continue" edge only if no exact match exists
        next_node = None
        for edge in self.current_node.edges:
            if edge["condition"] == intent:
                next_node = edge["next"]
                break
            if edge["condition"] == "continue" and next_node is None:
                next_node = edge["next"]
        if next_node is not None:
            self.current_node = next_node

        # Generate response using the current node
        response_text = self.current_node.response_template
        # Fill in context if the template has a placeholder
        if "order" in self.context and "{}" in response_text:
            response_text = response_text.replace("{}", str(self.context["order"]))
        return response_text, self.current_node.name
```

Note the edge-matching logic: checking for an exact intent match before falling back to `"continue"` matters. A naive `intent == condition or condition == "continue"` check would take the first `"continue"` edge it encounters even when a better exact match exists later in the list. The bare `except:` around the JSON parse has also been narrowed to the two exceptions that can actually occur, so real bugs (typos, network errors) still surface.
```python
# Test the graph
graph = ConversationGraph("YOUR_HOLYSHEEP_API_KEY")

print(graph.traverse("I need help with an order"))
# Output: ("I can help with orders.", "ORDER_HELP")

print(graph.traverse("I want a refund"))
# Output: ("I'll start your refund.", "REFUND_START")

print(graph.traverse("Actually, let me talk to a human"))
# Output: ("Let me connect you.", "HUMAN_ESCALATION")
```
Screenshot hint: Draw out the graph on paper—you'll see how each state connects to multiple others. The key advantage is jumping from ORDER_HELP directly to HUMAN_ESCALATION without going back through START.
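If paper gets tedious, you can also dump the edges to Graphviz DOT and render them with any DOT viewer. A sketch operating on a standalone copy of a few edges from `build_graph()` (flattened to tuples so it runs without an API key):

```python
def to_dot(edges):
    """Emit a Graphviz DOT digraph; paste the output into any DOT renderer."""
    lines = ["digraph conversation {"]
    for src, condition, dst in edges:
        lines.append(f'  {src} -> {dst} [label="{condition}"];')
    lines.append("}")
    return "\n".join(lines)

# A few edges from build_graph(), as (source, condition, destination) triples
edges = [
    ("START", "order", "ORDER_HELP"),
    ("START", "billing", "BILLING_HELP"),
    ("ORDER_HELP", "human", "HUMAN_ESCALATION"),
    ("ORDER_HELP", "status", "ORDER_STATUS"),
]
print(to_dot(edges))
```

In a real project you'd walk `graph.nodes` and each node's `edges` list to generate the triples instead of hardcoding them.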
Method 3: LLM Router
The most flexible approach uses an LLM to decide conversation flow dynamically. Instead of hardcoding rules, you teach the model what context to maintain and let it figure out the best path.
With HolySheep AI, you get sub-50ms latency even with complex routing logic, making this approach feel instant to users.
LLM Router Implementation
```python
import requests
import json
from datetime import datetime

class LLMStateRouter:
    """
    Uses an LLM to dynamically manage conversation state.
    The model decides what to remember and what to do next.
    """
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.holysheep.ai/v1"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        self.conversation_history = []
        self.state = {
            "current_task": None,
            "collected_info": {},
            "pending_actions": [],
            "user_preferences": {}
        }
        self.system_prompt = self._build_system_prompt()

    def _build_system_prompt(self):
        return """You are a helpful customer service agent managing a complex conversation.

Your job is to:
1. Understand what the user wants
2. Decide what information you need to collect
3. Keep track of context across the conversation
4. Take actions when ready (refunds, exchanges, etc.)

You maintain state in JSON format with these fields:
- current_task: What the user is trying to accomplish
- collected_info: Information you've gathered
- pending_actions: Things you promised to do
- user_preferences: How the user likes to be treated

Always be helpful, efficient, and proactive. If you're missing info, ask for it clearly."""

    def _construct_router_prompt(self, user_message):
        """Build the prompt that tells the LLM how to route."""
        return f"""Given this conversation and user input, decide how to respond.

Current State:
{json.dumps(self.state, indent=2)}

Conversation History:
{json.dumps(self.conversation_history[-5:], indent=2)}

User's New Message: "{user_message}"

Respond with a JSON object containing:
{{
  "state_updates": {{...}},   // What to update in the state
  "response": "...",          // What to say to the user
  "action": null or {{        // Optional action to take
    "type": "refund|lookup|escalate|close",
    "params": {{...}}
  }}
}}"""

    def process_message(self, user_message):
        """Process user message and return response."""
        # Add to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message,
            "timestamp": datetime.now().isoformat()
        })
        # Build routing prompt
        router_prompt = self._construct_router_prompt(user_message)
        # Call the router model (cheapest option for decision-making)
        # DeepSeek V3.2 at $0.42/MTok is ideal for routing
        payload = {
            "model": "deepseek-v3.2",
            "messages": [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": router_prompt}
            ],
            "max_tokens": 500,
            "temperature": 0.3,
            "response_format": {"type": "json_object"}
        }
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload
        )
        result = json.loads(response.json()["choices"][0]["message"]["content"])
        # Update state based on the LLM's decision
        if "state_updates" in result:
            self.state.update(result["state_updates"])
        # Execute action if any
        action_result = None
        if result.get("action"):
            action_result = self._execute_action(result["action"])
        # Add response to history
        full_response = result["response"]
        if action_result:
            full_response += f"\n\n{action_result}"
        self.conversation_history.append({
            "role": "assistant",
            "content": full_response,
            "timestamp": datetime.now().isoformat()
        })
        return full_response, self.state

    def _execute_action(self, action):
        """Execute a system action based on LLM decision."""
        if action["type"] == "refund":
            return f"Refund initiated for order {self.state['collected_info'].get('order_id', 'unknown')}"
        elif action["type"] == "lookup":
            return f"Looking up order {self.state['collected_info'].get('order_id', 'unknown')}..."
        elif action["type"] == "escalate":
            return "Transferring you to a human agent now..."
        elif action["type"] == "close":
            return "Conversation closed. Thank you!"
        return None
```
```python
# Test the LLM Router
router = LLMStateRouter("YOUR_HOLYSHEEP_API_KEY")

response, state = router.process_message("I want to return my order #12345")
print(f"Response: {response}")
print(f"Updated State: {json.dumps(state, indent=2)}")

# Second message - the router remembers the order ID
response, state = router.process_message("It arrived damaged")
print(f"Response: {response}")
print(f"Updated State: {json.dumps(state, indent=2)}")
```
Screenshot hint: After running both messages, print out the full conversation_history array. You'll see the LLM automatically added "current_task": "return" and kept "collected_info" updated across messages.
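One practical consequence of this design: the router's entire memory is the `state` dict plus `conversation_history`, so persisting a session between HTTP requests is just JSON serialization. A file-based sketch (in production you'd more likely use Redis or a database; the filename is arbitrary):

```python
import json

def save_session(path, state, history):
    """Snapshot router memory so a later request can resume the conversation."""
    with open(path, "w") as f:
        json.dump({"state": state, "history": history}, f)

def load_session(path):
    """Restore a previously saved session."""
    with open(path) as f:
        data = json.load(f)
    return data["state"], data["history"]

state = {"current_task": "return", "collected_info": {"order_id": "12345"}}
history = [{"role": "user", "content": "I want to return my order #12345"}]
save_session("session_abc.json", state, history)

restored_state, restored_history = load_session("session_abc.json")
assert restored_state == state and restored_history == history
print("session round-trips cleanly")
```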
Comparison Table: FSM vs Graph vs LLM Router
| Feature | FSM | Graph | LLM Router |
|---|---|---|---|
| Ease of Setup | Very Easy | Moderate | Easy |
| Flexibility | Low (rigid paths) | Medium (defined branches) | High (dynamic decisions) |
| Maintenance | Simple (one file) | Moderate (graph structure) | Simple (prompt-based) |
| Cost (per MTok) | $0.42 (DeepSeek) | $0.42-$2.50 | $0.42 (DeepSeek) |
| Latency | <50ms | <50ms | <100ms (2 calls) |
| Error Handling | Very Predictable | Predictable | Requires testing |
| Best For | Simple, linear flows | Complex branching | Nuanced, varied conversations |
Who It's For (and Who Should Look Elsewhere)
Choose FSM if:
- You have a simple 5-10 step process
- User paths are predictable and linear
- You need bulletproof reliability
- Your team is new to AI development
Choose Graph if:
- You have multiple branching paths
- Users might jump between topics
- You need visual debugging of flows
- Your support has clear escalation paths
Choose LLM Router if:
- Conversations are highly variable
- You want natural, free-form interaction
- You can invest in prompt engineering
- User experience matters more than predictability
Look Elsewhere (Not This Tutorial) if:
- You need multi-agent coordination (look into LangGraph)
- You're building real-time game NPCs
- You need enterprise-grade audit trails
- Your conversations involve sensitive data requiring HIPAA/GDPR compliance
Pricing and ROI
Using HolySheep AI with the FSM approach, here's a realistic cost breakdown for a customer support bot handling 10,000 conversations per day:
- FSM Route: $0.42/MTok × 500 tokens/conversation × 10,000 conversations = $2.10/day
- Graph Route: $0.42-$2.50/MTok (depending on model choice) × 750 tokens × 10,000 = $3.15-$18.75/day
- LLM Router: $0.42/MTok × 1,200 tokens (2 calls) × 10,000 = $5.04/day
Compared with paying OpenAI at the standard ¥7.3-per-dollar exchange rate, HolySheep's ¥1 = $1 top-up rate saves you 85%+ on every API call. For a busy support bot, that's real money.
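The arithmetic above generalizes to a one-line helper you can reuse when comparing models (the prices plugged in are the per-million-token figures quoted in this article; substitute your own):

```python
def daily_cost(price_per_mtok, tokens_per_conv, convs_per_day):
    """Daily spend in dollars for a given per-million-token price."""
    return price_per_mtok * tokens_per_conv * convs_per_day / 1_000_000

# The three scenarios from the breakdown above
print(f"FSM:        ${daily_cost(0.42, 500, 10_000):.2f}/day")    # $2.10/day
print(f"Graph (hi): ${daily_cost(2.50, 750, 10_000):.2f}/day")    # $18.75/day
print(f"LLM router: ${daily_cost(0.42, 1_200, 10_000):.2f}/day")  # $5.04/day
```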
Common Errors and Fixes
Error 1: "Invalid state transition" / Conversation gets stuck
Problem: Your state machine receives an intent it doesn't know how to handle, so it gets stuck in a loop or crashes.
```python
# BROKEN: No fallback for unknown intents
def transition(self, intent):
    return self.transitions[self.state][intent]  # KeyError if intent unknown
```

FIXED: Always include a catch-all transition:

```python
def transition(self, intent):
    transitions = self.transitions.get(self.state, {})
    # Try exact match first
    if intent in transitions:
        return transitions[intent]
    # Fall back to a safe default state
    return transitions.get("UNKNOWN", ("GREETING", "Let me start over."))
```
Error 2: "Context window exceeded" / Bot forgets earlier messages
Problem: You're sending the entire conversation history to the LLM, and eventually you hit token limits.
```python
# BROKEN: Keeping all messages forever
self.messages.append({"role": "user", "content": user_message})
# ...after 100 messages, you're over the limit
```

FIXED: Summarize old messages and keep recent context:

```python
def summarize_and_truncate(self, messages, max_recent=10):
    if len(messages) <= max_recent:
        return messages
    # Summarize everything except the last few messages
    old_messages = messages[:-max_recent]
    summary_prompt = f"Summarize this conversation: {old_messages}"
    # Use a cheap model for summarization
    summary = self.call_model(summary_prompt, model="deepseek-v3.2")
    return [
        {"role": "system", "content": f"Earlier summary: {summary}"}
    ] + messages[-max_recent:]
```
Error 3: "AuthenticationError" / API key not working
Problem: The API key is missing, malformed, or expired.
```python
# BROKEN: Hardcoding or forgetting to set the key
headers = {"Authorization": "Bearer YOUR_HOLYSHEEP_API_KEY"}  # Literal string!
```

FIXED: Use environment variables with validation:

```python
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("HOLYSHEEP_API_KEY")
if not api_key:
    raise ValueError("HOLYSHEEP_API_KEY not found in environment. Get one at https://www.holysheep.ai/register")
if not api_key.startswith("sk-"):
    raise ValueError("Invalid API key format. HolySheep keys start with 'sk-'")
headers = {"Authorization": f"Bearer {api_key}"}
```
Why Choose HolySheep
I tested these three approaches using HolySheep AI, and here's what stood out:
- Price Performance: DeepSeek V3.2 at $0.42/MTok means my state classification calls cost almost nothing. For high-volume production bots, this is the difference between profitable and not.
- Latency: Sub-50ms response times even with routing logic. Users don't notice the state management happening.
- Model Variety: Need better reasoning for complex flows? Switch to Gemini 2.5 Flash at $2.50. Need extreme accuracy? Claude Sonnet 4.5 at $15. One platform, all options.
- Payment Options: WeChat and Alipay support makes it frictionless for teams in Asia.
My Recommendation
Start with FSM if your conversation flow is predictable. It's the cheapest, fastest to debug, and most reliable. You can always add complexity later.
Move to Graph when you need users to jump between topics or have multiple valid paths through your conversation.
Switch to LLM Router only when users are reporting frustration with rigid flows. The added flexibility comes with increased cost and testing requirements.