MCP Resources and Prompt Templates: A Guide to Advanced Context Management

I once lost 3 days debugging a context_length_exceeded error in production because I did not understand how MCP Resources work. That is why I wrote this article: so you do not have to travel the same bumpy road.

The Real-World Problem: When Context Becomes a Bottleneck

In a customer support chatbot project, I ran into this error:

Error: This model's maximum context length is 128000 tokens, 
but you requested 156234 tokens (156234 in the messages + 0 in the completion). 
Please reduce the message length!

After digging in, I found the cause: the system was sending the entire conversation history (thousands of messages) with every request. That was when I understood the value of MCP Resources and Prompt Templates.

What Is an MCP Resource?

An MCP Resource (Model Context Protocol Resource) lets you organize and pass data in a structured way, instead of cramming everything into the prompt.
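
To make the idea concrete, here is a minimal sketch of a resource as a small addressable record rather than text pasted into the prompt. The fields loosely follow the general MCP resource shape (a URI, a name, a MIME type). The `Resource` class, the `ResourceRegistry` helper, and the `docs://` URI scheme are illustrative assumptions, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record: identified by URI, fetched only when needed."""
    uri: str
    name: str
    mime_type: str
    text: str


class ResourceRegistry:
    """Illustrative in-memory registry; a real MCP server exposes listing and reading over the protocol."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, Resource] = {}

    def add(self, resource: Resource) -> None:
        self._by_uri[resource.uri] = resource

    def list_uris(self) -> List[str]:
        # Listing is cheap: only identifiers are exposed, not the content
        return sorted(self._by_uri)

    def read(self, uri: str) -> Optional[Resource]:
        # Content is pulled in only for the resources a request actually needs
        return self._by_uri.get(uri)


registry = ResourceRegistry()
registry.add(Resource("docs://faq/warranty", "Warranty FAQ", "text/plain", "24-month warranty"))
registry.add(Resource("docs://promo/students", "Student promo", "text/plain", "15% off for students"))

print(registry.list_uris())
print(registry.read("docs://faq/warranty").text)
```

The point of the split between listing and reading is that the prompt only pays for the content it actually uses.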

Prompt Templates: A Smart Scaffold

A Prompt Template lets you build reusable patterns whose dynamic variables can be changed without hard-coding them.
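
As a quick illustration of the concept, the standard library's `string.Template` already provides this kind of substitution. This is only a sketch of the idea; the template text and variable names are made up for the example, and the article's own template system later uses `{{name}}` placeholders instead of `$name`.

```python
from string import Template

# Hypothetical template; $-placeholders are the stdlib string.Template syntax
greeting = Template("You are a support assistant for $company. Question: $question")

prompt = greeting.substitute(company="XYZ Laptops", question="How long is the warranty?")
print(prompt)

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
partial = greeting.safe_substitute(company="XYZ Laptops")
print(partial)
```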

Hands-On Implementation with HolySheep AI

With HolySheep AI, you can use an OpenAI-compatible API at a cost starting from $0.42/MTok (DeepSeek V3.2). That makes it comfortable to experiment with context management without worrying about cost.

Detailed Implementation Code

1. Setting Up the MCP Resource Handler

import requests
import json
from typing import Dict, Any, List
from datetime import datetime, timezone

class MCPResourceHandler:
    """Handles MCP Resources with smart caching"""
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.holysheep.ai/v1"
        self.resource_cache = {}
        self.cache_ttl = 300  # 5 minutes (stored here, but not enforced by the code below)
        
    def create_resource(self, resource_id: str, data: Dict[str, Any]) -> str:
        """Create an MCP Resource with metadata"""
        resource = {
            "id": resource_id,
            "data": data,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "token_count": self._estimate_tokens(data)
        }
        self.resource_cache[resource_id] = resource
        return resource_id
    
    def _estimate_tokens(self, data: Any) -> int:
        """Estimate tokens, assuming 1 token = 4 characters"""
        # ensure_ascii=False keeps non-ASCII text from being inflated into \uXXXX escapes
        text = json.dumps(data, ensure_ascii=False) if isinstance(data, (dict, list)) else str(data)
        return len(text) // 4
    
    def get_context_window(self, resources: List[str], max_tokens: int = 128000) -> Dict[str, Any]:
        """Pick the resources that fit within max_tokens and report context window usage"""
        total_tokens = 0
        selected_resources = []
        
        for res_id in resources:
            if res_id not in self.resource_cache:
                continue
                
            # Greedy selection: a resource that does not fit is skipped, later ones may still fit
            res = self.resource_cache[res_id]
            if total_tokens + res["token_count"] <= max_tokens:
                selected_resources.append(res)
                total_tokens += res["token_count"]
        
        return {
            "total_tokens": total_tokens,
            "selected_resources": selected_resources,
            "remaining_tokens": max_tokens - total_tokens,
            "utilization_pct": round((total_tokens / max_tokens) * 100, 2)
        }

# Initialize the handler
handler = MCPResourceHandler(api_key="YOUR_HOLYSHEEP_API_KEY")
print("MCP Resource Handler initialized successfully")
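
The handler above stores a `cache_ttl` value, but the code as shown never checks it, so cached resources never expire. One way expiry could work is sketched below. `TTLCache` and its injectable `clock` parameter are hypothetical names introduced for this example, not part of the handler.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl_seconds:
            # Entry is stale: drop it and report a miss
            del self._entries[key]
            return None
        return value


# Usage with a fake clock to show expiry
now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.set("res_0", {"content": "24-month warranty"})
fresh = cache.get("res_0")
now[0] = 301.0
expired = cache.get("res_0")
print(fresh, expired)
```

Using a monotonic clock rather than wall-clock time avoids surprises when the system time is adjusted.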

2. A Complete Prompt Template System

import re
from typing import Any, Dict, List

class PromptTemplate:
    """Prompt template system with variable substitution"""
    
    # Matches {{name}} as well as {{ name }}
    _PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
    
    def __init__(self):
        self.templates = {}
        
    def register(self, name: str, template: str, variables: List[str]):
        """Register a new template"""
        self.templates[name] = {
            "template": template,
            "variables": variables,
            "version": 1
        }
    
    def render(self, name: str, context: Dict[str, Any]) -> str:
        """Render a template with context data"""
        if name not in self.templates:
            raise ValueError(f"Template '{name}' does not exist")
        
        tmpl = self.templates[name]["template"]
        
        def substitute(match):
            var_name = match.group(1)
            # Placeholders missing from context are left untouched so gaps stay visible
            return str(context[var_name]) if var_name in context else match.group(0)
        
        return self._PLACEHOLDER.sub(substitute, tmpl)
    
    def estimate_cost(self, name: str, model: str) -> float:
        """Estimate the per-request cost of the template text (substituted values are not counted)"""
        pricing = {
            "gpt-4.1": 8.0,            # $8/MTok
            "claude-sonnet-4.5": 15.0, # $15/MTok
            "gemini-2.5-flash": 2.5,   # $2.50/MTok
            "deepseek-v3.2": 0.42      # $0.42/MTok
        }
        
        if name not in self.templates:
            return 0.0
            
        template_text = self.templates[name]["template"]
        tokens = len(template_text) // 4
        rate = pricing.get(model, 8.0)  # unknown models fall back to the GPT-4.1 rate
        
        return (tokens / 1_000_000) * rate

# Using the template system
template_system = PromptTemplate()

# Template for a customer support chatbot
customer_support_template = """
You are the customer support assistant for {{company_name}}.

Product context:
{{product_context}}

Recent conversation history:
{{recent_conversation}}

Current question: {{user_question}}

Answer based on the context and do not make up information.
"""

template_system.register(
    name="customer_support",
    template=customer_support_template,
    variables=["company_name", "product_context", "recent_conversation", "user_question"]
)

# Template for document summarization
doc_summary_template = """
Summarize the following document in {{summary_length}}:

Title: {{document_title}}
Content:
{{document_content}}

Requirements:
- Key points: {{key_points_required}}
- Tone: {{tone}}
"""

template_system.register(
    name="document_summary",
    template=doc_summary_template,
    variables=["summary_length", "document_title", "document_content", "key_points_required", "tone"]
)

# Estimate the cost
cost_gpt = template_system.estimate_cost("customer_support", "gpt-4.1")
cost_deepseek = template_system.estimate_cost("customer_support", "deepseek-v3.2")

print(f"Cost with GPT-4.1: ${cost_gpt:.6f}")
print(f"Cost with DeepSeek V3.2: ${cost_deepseek:.6f}")
print(f"Savings: {round((1 - cost_deepseek/cost_gpt) * 100, 1)}%")
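
Note that `estimate_cost` measures only the template text. In practice the substituted values (product context, conversation history) usually dominate the prompt. A rough sketch of estimating input cost from the rendered prompt follows, using the same 4-characters-per-token assumption and the $/MTok rates quoted in this article. `estimate_prompt_cost` is a hypothetical helper, and the heuristic is not a substitute for a real tokenizer.

```python
def estimate_prompt_cost(rendered_prompt: str, rate_per_mtok: float, chars_per_token: int = 4) -> float:
    """Rough input cost in USD for one rendered prompt (heuristic, not a real tokenizer)."""
    tokens = len(rendered_prompt) // chars_per_token
    return tokens / 1_000_000 * rate_per_mtok


template_only = "Product context:\n{{product_context}}\nQuestion: {{user_question}}"
# Simulate a large product context being substituted into the template
rendered = template_only.replace("{{product_context}}", "x" * 40_000).replace(
    "{{user_question}}", "How long is the warranty?"
)

print(f"Template only: ${estimate_prompt_cost(template_only, 0.42):.8f}")
print(f"Rendered:      ${estimate_prompt_cost(rendered, 0.42):.8f}")
```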

3. Integrating API Calls with Context Management

import requests
import time
from typing import Any, Dict, List

class ContextAwareAI:
    """AI client with smart context management"""
    
    def __init__(self, api_key: str, model: str = "deepseek-v3.2"):
        self.api_key = api_key
        self.base_url = "https://api.holysheep.ai/v1"
        self.model = model
        self.resource_handler = MCPResourceHandler(api_key)
        self.template_system = PromptTemplate()
        self._setup_templates()
        
    def _setup_templates(self):
        """Set up the default templates"""
        self.template_system.register(
            name="smart_context",
            template="""Handle the following request with optimized context:

Available tokens: {{available_tokens}}
Important context:
{{relevant_context}}

Request: {{user_request}}""",
            variables=["available_tokens", "relevant_context", "user_request"]
        )
    
    def call_with_context(
        self, 
        user_request: str, 
        resources: List[Dict],
        max_context_tokens: int = 100000
    ) -> Dict[str, Any]:
        """Call the API with optimized context"""
        
        # Register the resources
        for i, res in enumerate(resources):
            self.resource_handler.create_resource(
                resource_id=f"res_{i}",
                data=res
            )
        
        # Compute the context window
        context_info = self.resource_handler.get_context_window(
            resources=[f"res_{i}" for i in range(len(resources))],
            max_tokens=max_context_tokens
        )
        
        # Render the prompt with context
        prompt = self.template_system.render("smart_context", {
            "available_tokens": context_info["remaining_tokens"],
            "relevant_context": self._format_context(context_info["selected_resources"]),
            "user_request": user_request
        })
        
        # Call the API
        start_time = time.time()
        
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": self.model,
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.7,
                "max_tokens": 2000
            },
            timeout=30
        )
        
        latency_ms = (time.time() - start_time) * 1000
        
        if response.status_code != 200:
            raise RuntimeError(f"API Error: {response.status_code} - {response.text}")
        
        result = response.json()
        
        return {
            "response": result["choices"][0]["message"]["content"],
            "usage": result.get("usage", {}),
            "latency_ms": round(latency_ms, 2),
            "context_utilization": context_info["utilization_pct"]
        }
    
    def _format_context(self, resources: List[Dict]) -> str:
        """Format the context as text, one resource per line"""
        formatted = []
        for res in resources:
            formatted.append(f"- {res['data']}")
        return "\n".join(formatted)

# Real-world usage
client = ContextAwareAI(api_key="YOUR_HOLYSHEEP_API_KEY")

# Test with sample context
test_resources = [
    {"type": "product_info", "content": "XYZ laptop - 16GB RAM, 512GB SSD"},
    {"type": "faq", "content": "24-month warranty, 24/7 technical support"},
    {"type": "promotion", "content": "15% discount for students"}
]

try:
    result = client.call_with_context(
        user_request="How long is the warranty on this laptop, and are there any promotions?",
        resources=test_resources
    )
    
    print(f"Response: {result['response']}")
    print(f"Tokens used: {result['usage'].get('total_tokens', 'N/A')}")
    print(f"Latency: {result['latency_ms']}ms")
    print(f"Context utilization: {result['context_utilization']}%")
    
except Exception as e:
    print(f"Error: {e}")
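
The `usage` field returned above can also be accumulated across requests to see where tokens go over time. A minimal sketch follows, assuming the response carries the OpenAI-style `prompt_tokens` and `completion_tokens` fields. `UsageTracker` is a hypothetical helper, and the numbers in the usage lines are made-up sample values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UsageTracker:
    """Hypothetical running total of token usage across requests."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, usage: Dict[str, int]) -> None:
        # Missing fields count as zero so a partial usage dict does not raise
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def average_per_request(self) -> float:
        return self.total_tokens / self.requests if self.requests else 0.0


tracker = UsageTracker()
tracker.record({"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165})
tracker.record({"prompt_tokens": 300, "completion_tokens": 80, "total_tokens": 380})
print(tracker.total_tokens, tracker.average_per_request())
```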

Common Errors and How to Fix Them

1. Context Length Exceeded Error

# ❌ Wrong: sending the entire history with no limit
messages = history  # 1000+ messages = crash

# ✅ Right: limit the context window
def truncate_conversation(messages: list, max_tokens: int = 120000) -> list:
    """Keep the most recent messages that fit within max_tokens"""
    result = []
    total_tokens = 0
    
    # Walk backwards from the newest message and stop at the first one that does not fit
    for msg in reversed(messages):
        msg_tokens = len(msg["content"]) // 4  # rough estimate: 1 token = 4 characters
        if total_tokens + msg_tokens > max_tokens:
            break
        result.append(msg)
        total_tokens += msg_tokens
    
    result.reverse()  # restore chronological order
    return result

# Or use a sliding window
def sliding_window_context(messages: list, window_size: int = 10) -> list:
    """Keep only the window_size most recent messages"""
    return messages[-window_size:] if len(messages) > window_size else messages
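
One caveat with both approaches: if the conversation starts with a system message, trimming from the old end drops it first. The sketch below is a variant that always keeps system messages and spends the remaining budget on the newest other messages. `truncate_keep_system` is a hypothetical helper using the same 4-characters-per-token assumption, and it assumes system messages belong at the front of the list.

```python
from typing import Dict, List

Message = Dict[str, str]


def truncate_keep_system(messages: List[Message], max_tokens: int, chars_per_token: int = 4) -> List[Message]:
    """Keep every system message, then fill the remaining budget with the newest other messages."""
    def cost(msg: Message) -> int:
        return len(msg["content"]) // chars_per_token

    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    # Whatever the system messages use is no longer available to the history
    budget = max_tokens - sum(cost(m) for m in system)
    kept: List[Message] = []
    for msg in reversed(others):
        if cost(msg) > budget:
            break
        kept.append(msg)
        budget -= cost(msg)

    kept.reverse()  # restore chronological order
    return system + kept


history = [
    {"role": "system", "content": "You are a support assistant."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
trimmed = truncate_keep_system(history, max_tokens=120)
print([m["role"] for m in trimmed])
```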

2. 401 Unauthorized Error

# ❌ Wrong: incorrect API key or missing Bearer prefix
headers = {
    "Authorization": "YOUR_HOLYSHEEP_API_KEY"  # Missing "Bearer "
}

# ✅ Right: standard format with the Bearer prefix
def create_auth_header(api_key: str) -> dict:
    if not api_key:
        raise ValueError("API key must not be empty")
    
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

# Test the connection
def verify_connection(api_key: str) -> bool:
    """Check the API connection"""
    try:
        response = requests.get(
            "https://api.holysheep.ai/v1/models",
            headers=create_auth_header(api_key),
            timeout=10
        )
        return response.status_code == 200
    except requests.exceptions.RequestException as e:
        print(f"Connection error: {e}")
        return False

# Usage
if not verify_connection("YOUR_HOLYSHEEP_API_KEY"):
    raise Exception("Please check your API key")
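
A related source of 401 errors is a key that was pasted with stray whitespace or with the `Bearer ` prefix already attached. A small sketch of loading and normalizing the key from the environment instead of hard-coding it follows. The variable name `HOLYSHEEP_API_KEY` and the helper `load_api_key` are assumptions made for this example.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, var_name: str = "HOLYSHEEP_API_KEY") -> str:
    """Read the API key from the environment; var_name is an assumed convention."""
    key = env.get(var_name, "").strip()
    if not key:
        raise ValueError(f"Environment variable {var_name} is not set")
    if key.lower().startswith("bearer "):
        # A key stored together with its prefix would produce "Bearer Bearer ..."
        key = key[len("bearer "):].strip()
    return key


# Demonstration with an explicit mapping instead of the real environment
print(load_api_key({"HOLYSHEEP_API_KEY": "Bearer sk-example"}))
```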

3. Timeout and Rate Limiting Errors

# ❌ Wrong: no timeout handling, spamming requests
for i in range(100):
    response = requests.post(url, json=data)  # Rate limited right away!

# ✅ Right: exponential backoff + retry
import time
import random
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_resilient_session():
    """Create a session with retry logic"""
    session = requests.Session()
    
    # Note: by default urllib3's Retry does not retry POST on status codes
    # (POST is not in its default allowed_methods), so for POST requests the
    # 429 handling happens in call_with_retry below.
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    
    return session

def call_with_retry(session, url: str, payload: dict, headers: dict, max_retries=3):
    """Call the API with retry and exponential backoff"""
    
    for attempt in range(max_retries):
        try:
            response = session.post(
                url,
                json=payload,
                headers=headers,
                timeout=60  # 60-second timeout
            )
            
            if response.status_code == 429:
                # Exponential backoff with jitter: about 1s, 2s, 4s plus up to 1s of noise
                wait_time = 2 ** attempt + random.uniform(0, 1)
                print(f"Rate limited. Waiting {wait_time:.2f}s...")
                time.sleep(wait_time)
                continue
                
            return response
            
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
            time.sleep(2 ** attempt)
            
        except requests.exceptions.ConnectionError as e:
            print(f"Connection error: {e}")
            time.sleep(5)
    
    raise Exception(f"Failed after {max_retries} attempts")

# Usage
session = create_resilient_session()
result = call_with_retry(
    session,
    url="https://api.holysheep.ai/v1/chat/completions",
    payload={"model": "deepseek-v3.2", "messages": [{"role": "user", "content": "Hello"}]},
    headers={"Authorization": "Bearer YOUR_HOLYSHEEP_API_KEY"}
)
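
The wait time used above grows without bound and ignores any hint from the server. A sketch of a delay function that caps the backoff and honors a numeric `Retry-After` header when one is present follows. Whether this particular API sends `Retry-After` is an assumption; `backoff_delay` and its `base` and `cap` parameters are hypothetical names.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After may carry a number of seconds; honor it, bounded by cap
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch; fall through
    # Exponential growth with up to 1 second of random jitter, bounded by cap
    return min(cap, base * (2 ** attempt) + random.uniform(0, 1))


for attempt in range(4):
    print(attempt, round(backoff_delay(attempt), 2))
print(backoff_delay(0, retry_after="7"))
```

The jitter keeps many clients that were rate limited at the same moment from retrying in lockstep.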

Cost Comparison When Using Context Management

Model	Price/MTok	Unoptimized (50K tokens)	Optimized (10K tokens)	Savings
GPT-4.1	$8.00	$0.40	$0.08	80%
Claude Sonnet 4.5	$15.00	$0.75	$0.15	80%
Gemini 2.5 Flash	$2.50	$0.125	$0.025	80%
DeepSeek V3.2	$0.42	$0.021	$0.0042	80%

With HolySheep AI, you can run the same workload at a cost 85% lower than with OpenAI. With DeepSeek V3.2 in particular, each million tokens costs only $0.42!
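
The per-request figures in the table follow directly from tokens / 1,000,000 × price per MTok. The short sketch below reproduces them from the rates quoted above and shows why the savings column is 80% for every model: cost is linear in tokens, so the ratio depends only on 10K versus 50K. The helper names are made up for the example.

```python
from typing import Dict

# $/MTok rates as quoted in the table above
RATES: Dict[str, float] = {
    "GPT-4.1": 8.00,
    "Claude Sonnet 4.5": 15.00,
    "Gemini 2.5 Flash": 2.50,
    "DeepSeek V3.2": 0.42,
}


def request_cost(tokens: int, rate_per_mtok: float) -> float:
    """Input cost in USD for one request of `tokens` tokens."""
    return tokens / 1_000_000 * rate_per_mtok


def savings_pct(before_tokens: int, after_tokens: int) -> float:
    """Percentage saved; independent of the rate because cost is linear in tokens."""
    return (1 - after_tokens / before_tokens) * 100


for model, rate in RATES.items():
    before = request_cost(50_000, rate)
    after = request_cost(10_000, rate)
    print(f"{model}: ${before:.4f} -> ${after:.4f} ({savings_pct(50_000, 10_000):.0f}% saved)")
```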

Lessons from the Field

Over 2 years of working with AI APIs, I have settled on a few principles:

Always set a context budget: never send the entire history. I usually cap it at 50K tokens for production systems.
Use semantic search to pick context: instead of taking the most recent messages, find the ones most relevant to the current query.
Cache results: if the same question is asked repeatedly, cache the response. This can cut API calls by 40-60%.
Monitor token usage: track token consumption closely to optimize cost.
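
To illustrate the second principle, here is a deliberately simple sketch of picking context by relevance instead of recency. It scores messages by word overlap (Jaccard similarity), which is only a crude stand-in for real semantic search with embeddings. All function names and sample messages are made up for the example.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query: str, message: str) -> float:
    """Jaccard similarity of word sets; a crude stand-in for embedding similarity."""
    q, m = _words(query), _words(message)
    return len(q & m) / len(q | m) if q | m else 0.0


def select_relevant(query: str, messages: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k most relevant messages, kept in their original order."""
    ranked = sorted(range(len(messages)), key=lambda i: overlap_score(query, messages[i]), reverse=True)
    keep = sorted(ranked[:top_k])
    return [messages[i] for i in keep]


history = [
    "The laptop ships with a 24 month warranty",
    "My favourite colour is blue",
    "Students get a 15 percent discount on the laptop",
    "What time do you close today",
]
print(select_relevant("How long is the laptop warranty", history))
```

Swapping `overlap_score` for a cosine similarity over embedding vectors would keep the rest of the selection logic unchanged.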