Structured Output in Practice: Using JSON Schema to Control LLM Output Format

When working with large language models (LLMs), one of the biggest challenges is making sure the output always arrives in the expected format. You can ask a model to return JSON, but it may still add extra text, drop fields, or change the structure. The fix is Structured Output with JSON Schema.

This article shows how to use JSON Schema to constrain LLM output precisely, and compares cost and latency across leading API providers.

Why Structured Output?

In production, LLM output is typically used to:

Populate database records
Call downstream APIs
Parse data for the frontend
Generate automated reports

If the format is inconsistent, your code has to handle countless edge cases and production turns into debugging hell. Structured Output addresses this by forcing the model to return data that matches a predefined schema.

API cost and performance comparison

Criterion	HolySheep AI	OpenAI (official API)	Anthropic (official API)
GPT-4.1	$8/MTok	$8/MTok	-
Claude Sonnet 4.5	$15/MTok	-	$15/MTok
Gemini 2.5 Flash	$2.50/MTok	-	-
DeepSeek V3.2	$0.42/MTok	-	-
Average latency	<50ms	200-500ms	150-400ms
Payment	WeChat/Alipay/USD	International card	International card
Exchange rate	¥1 = $1 (85%+ savings)	Standard USD pricing	Standard USD pricing
Free credits	On sign-up	$5 trial	Limited
Structured Output	Fully supported	Native	Native

With DeepSeek V3.2 priced at just $0.42/MTok and the sub-50ms latency listed in the table, HolySheep AI is a cost-effective choice for high-volume projects that still need reliable output quality.
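
To make the per-MTok prices concrete, here is a rough cost estimator. It is only a sketch: it assumes a single blended rate per token (the table does not split input and output pricing), and the workload figures are hypothetical.

```python
# Prices in USD per million tokens (MTok), copied from the comparison table.
PRICE_PER_MTOK = {
    "GPT-4.1": 8.00,
    "Claude Sonnet 4.5": 15.00,
    "Gemini 2.5 Flash": 2.50,
    "DeepSeek V3.2": 0.42,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated cost in USD for processing `tokens` tokens."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000


# Hypothetical workload: 10,000 requests of about 1,500 tokens each.
total_tokens = 10_000 * 1_500
for model in PRICE_PER_MTOK:
    print(f"{model}: ${estimate_cost(model, total_tokens):.2f}")
```

At 15 million tokens, the gap between the cheapest and most expensive model in the table is roughly two orders of magnitude, which is why model choice matters more than provider choice for bulk extraction jobs.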

What is JSON Schema?

JSON Schema is a standard for describing the structure of JSON data. Combined with an LLM's Structured Output feature, it lets you:

Define exactly which fields are required
Specify data types (string, number, boolean, array, object)
Constrain values with enum or pattern
Require the model to return data only, with no explanatory text
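
To show what those constraints mean in practice, the sketch below checks a dict against `required`, `type`, `enum`, and `pattern` using only the standard library. It is deliberately simplified and is not a full JSON Schema validator; the `check` helper and the `order_schema` example are hypothetical, and a real project would normally use a dedicated validation library.

```python
import re

# Maps JSON Schema type names to Python types (top-level fields only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check(instance: dict, schema: dict) -> list:
    """Return a list of violations; an empty list means the instance passed.

    Simplified on purpose: handles only `required`, `type`, `enum`, and
    `pattern` on top-level properties.
    """
    errors = []
    for field in schema.get("required", []):
        if field not in instance:
            errors.append(f"{field}: missing required field")
    for field, rules in schema.get("properties", {}).items():
        if field not in instance:
            continue
        value = instance[field]
        expected = TYPE_MAP.get(rules.get("type"))
        if expected is not None:
            # bool is a subclass of int in Python, so exclude it from "number".
            bool_as_number = rules["type"] == "number" and isinstance(value, bool)
            if bool_as_number or not isinstance(value, expected):
                errors.append(f"{field}: expected {rules['type']}")
                continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: {value!r} is not one of {rules['enum']}")
        if "pattern" in rules and isinstance(value, str):
            # JSON Schema patterns are unanchored, hence re.search.
            if not re.search(rules["pattern"], value):
                errors.append(f"{field}: does not match {rules['pattern']}")
    return errors


# Hypothetical schema for illustration.
order_schema = {
    "type": "object",
    "properties": {
        "sku": {"type": "string", "pattern": r"^[A-Z]{3}-\d{4}$"},
        "status": {"type": "string", "enum": ["pending", "paid", "shipped"]},
        "quantity": {"type": "number"},
    },
    "required": ["sku", "status"],
}

print(check({"sku": "ABC-1234", "status": "paid", "quantity": 2}, order_schema))
print(check({"sku": "abc", "status": "lost", "quantity": "2"}, order_schema))
```

The first call prints an empty list; the second reports one violation each for `sku`, `status`, and `quantity`.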

Hands-on code with HolySheep AI

Example 1: Extracting product information

Suppose you need to extract product information from a text description to populate a database. Here is how to implement it with HolySheep AI:

import anthropic
import json

# Connect to HolySheep AI
client = anthropic.Anthropic(
    base_url="https://api.holysheep.ai/v1",
    api_key="YOUR_HOLYSHEEP_API_KEY"  # Replace with your API key
)

# Define the JSON Schema for product information
product_schema = {
    "name": "ProductInfo",
    "description": "Product information extracted from a description",
    "type": "object",
    "properties": {
        "product_name": {
            "type": "string",
            "description": "Main product name"
        },
        "price": {
            "type": "number",
            "description": "Product price (VND)"
        },
        "category": {
            "type": "string",
            "enum": ["electronics", "fashion", "food", "home", "other"],
            "description": "Product category"
        },
        "features": {
            "type": "array",
            "items": {"type": "string"},
            "description": "List of key features"
        },
        "in_stock": {
            "type": "boolean",
            "description": "Whether the product is in stock"
        }
    },
    "required": ["product_name", "category"]
}

# Product description used as the prompt
product_description = """
Lenovo ThinkPad X1 Carbon Gen 11 laptop
Price: 32,990,000 VND
Intel Core i7-1365U CPU, 16GB LPDDR5 RAM, 512GB NVMe SSD
14-inch 2.8K OLED display, fingerprint sensor, Windows 11 Pro
Phone: 0901234567 - In stock, delivery within 24h
"""

message = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=1024,
    system="You are an expert at extracting product information. Return only JSON in the correct format, with no extra text.",
    messages=[
        {
            "role": "user",
            "content": f"Extract the product information from the following description:\n\n{product_description}"
        }
    ],
    extra_headers={"x-ai-beta-name": "product-extraction"},
    extra_body={
        "response_format": {
            "type": "json_schema",
            "json_schema": product_schema
        }
    }
)

# Parse the result
result = json.loads(message.content[0].text)
print(f"Product: {result['product_name']}")
# price, features and in_stock are optional in the schema, so read them defensively
price = result.get("price")
print(f"Price: {price:,.0f} VND" if price is not None else "Price: n/a")
print(f"Category: {result['category']}")
print(f"Features: {', '.join(result.get('features', []))}")
print(f"In stock: {result.get('in_stock')}")
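
Since the goal of this example is to populate a database, the sketch below shows one way the parsed dict could be stored. The table layout, the `save_product` helper, and the `sample` record are all hypothetical, and SQLite is used only because it ships with Python.

```python
import json
import sqlite3


def save_product(conn: sqlite3.Connection, product: dict) -> None:
    """Insert one extracted product using a parameterized query."""
    conn.execute(
        "INSERT INTO products (product_name, price, category, features, in_stock) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            product["product_name"],   # required by the schema
            product.get("price"),      # optional, stored as NULL when absent
            product["category"],       # required by the schema
            json.dumps(product.get("features", []), ensure_ascii=False),
            product.get("in_stock"),   # optional, stored as NULL when absent
        ),
    )
    conn.commit()


# Hypothetical table layout; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "product_name TEXT NOT NULL, price REAL, category TEXT NOT NULL, "
    "features TEXT, in_stock INTEGER)"
)

# Stand-in for the parsed result dict from the API call.
sample = {
    "product_name": "Lenovo ThinkPad X1 Carbon Gen 11",
    "price": 32990000,
    "category": "electronics",
    "features": ["Intel Core i7-1365U", "16GB LPDDR5"],
    "in_stock": True,
}
save_product(conn, sample)
print(conn.execute("SELECT product_name, category FROM products").fetchall())
```

Only the two fields marked as required in the schema are read with direct indexing; everything else goes through `.get()` so a missing optional field becomes NULL instead of raising `KeyError`.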

Example 2: Classifying customer support tickets

In a CRM system, you need to classify tickets automatically so they are routed to the right team:

import anthropic
import json

client = anthropic.Anthropic(
    base_url="https://api.holysheep.ai/v1",
    api_key="YOUR_HOLYSHEEP_API_KEY"
)

# Schema for the ticket classification
ticket_schema = {
    "name": "SupportTicket",
    "description": "Support ticket classification result",
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "technical", "refund", "complaint", "inquiry"],
            "description": "Ticket type"
        },
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high", "urgent"],
            "description": "Handling priority"
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative", "angry"],
            "description": "Customer sentiment"
        },
        "summary": {
            "type": "string",
            "maxLength": 100,
            "description": "Summary of the content (at most 100 characters)"
        },
        "department": {
            "type": "string",
            "enum": ["sales", "support", "finance", "management"],
            "description": "Responsible department"
        },
        "action_required": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["refund", "replace", "fix", "explain", "escalate", "none"]
            },
            "description": "Actions to take"
        }
    },
    "required": ["category", "priority", "department"]
}

ticket_content = """
I placed an order 5 days ago but still have not received it.
Order #12345, and the payment has already been charged.
I called the hotline 3 times and nobody answered.
This is the second time I have bought from you, and there has been a problem every time.
I am very disappointed and demand an immediate refund.
"""

response = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=512,
    system="Analyze and classify the support ticket. Return JSON that matches the schema exactly.",
    messages=[{"role": "user", "content": ticket_content}],
    extra_headers={"x-ai-beta-name": "ticket-classification"},
    extra_body={"response_format": {"type": "json_schema", "json_schema": ticket_schema}}
)

result = json.loads(response.content[0].text)
print(f"[{result['priority'].upper()}] {result['category']} → {result['department']}")
# sentiment, summary and action_required are optional in the schema
print(f"Sentiment: {result.get('sentiment')}")
print(f"Summary: {result.get('summary')}")
print(f"Actions: {result.get('action_required', [])}")
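
Once the ticket is classified, routing can be a plain function over the parsed dict. The sketch below is illustrative only: the queue naming and the escalation rules are hypothetical, not part of any CRM or of the HolySheep API.

```python
PRIORITY_RANK = {"low": 0, "medium": 1, "high": 2, "urgent": 3}


def route_ticket(result: dict) -> dict:
    """Turn a classified ticket into a routing decision.

    The queue naming and escalation rules are hypothetical examples.
    """
    escalate = (
        PRIORITY_RANK[result["priority"]] >= PRIORITY_RANK["high"]
        or result.get("sentiment") == "angry"
        or "escalate" in result.get("action_required", [])
    )
    return {"queue": f"{result['department']}-queue", "escalate": escalate}


# Stand-in for a parsed model response.
classified = {
    "category": "refund",
    "priority": "urgent",
    "department": "finance",
    "sentiment": "angry",
    "action_required": ["refund", "escalate"],
}
print(route_ticket(classified))
# → {'queue': 'finance-queue', 'escalate': True}
```

Because `priority` and `department` are constrained by enums in the schema, the lookup tables here can stay small and exhaustive.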

Example 3: Automatic validation and retry

An important best practice is to always validate the output and retry if the model returns data that does not match the schema:

import anthropic
import json
import jsonschema
import time

client = anthropic.Anthropic(
    base_url="https://api.holysheep.ai/v1",
    api_key="YOUR_HOLYSHEEP_API_KEY"
)

def validate_with_retry(client, prompt, schema, max_retries=3):
    """Call the API, validate the output, and retry automatically."""

    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4.5",
                max_tokens=1024,
                system="Return JSON that matches the schema exactly. Do not add explanations.",
                messages=[{"role": "user", "content": prompt}],
                extra_headers={"x-ai-beta-name": "validated-extraction"},
                extra_body={"response_format": {"type": "json_schema", "json_schema": schema}}
            )

            # Parse JSON
            result = json.loads(response.content[0].text)

            # Validate against the JSON Schema
            jsonschema.validate(instance=result, schema=schema)

            print(f"✓ Succeeded on attempt {attempt + 1}")
            return result

        except jsonschema.exceptions.ValidationError as e:
            print(f"✗ Attempt {attempt + 1}: Schema validation failed - {e.message}")
            if attempt < max_retries - 1:
                time.sleep(1)  # Wait 1 second before retrying

        except json.JSONDecodeError as e:
            print(f"✗ Attempt {attempt + 1}: JSON parse error - {e}")
            if attempt < max_retries - 1:
                time.sleep(1)

    raise RuntimeError(f"Validation failed after {max_retries} attempts")

# Example usage
user_schema = {
    "name": "UserProfile",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number", "minimum": 0, "maximum": 150},
        "email": {"type": "string", "format": "email"},
        "interests": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["name", "email"]
}

result = validate_with_retry(
    client,
    "Extract the profile: Nguyễn Văn A, 28 years old, email [email protected], interests: reading and travel",
    user_schema
)
print(f"Result: {json.dumps(result, indent=2, ensure_ascii=False)}")
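
The example above waits a fixed second between attempts. A common variation is exponential backoff, sketched below with the standard library only. The `call_with_backoff` helper is hypothetical: `call` stands in for any function that returns the raw response text, and `sleep` is injectable so the logic can be exercised without real delays.

```python
import json
import time


def call_with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Parse call() as JSON, doubling the wait after each failed attempt."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return json.loads(call())
        except json.JSONDecodeError as exc:
            last_error = exc
            if attempt < max_retries - 1:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no valid JSON after {max_retries} attempts") from last_error


# Simulated responses: the first is malformed, the second is valid.
responses = iter(["not json", '{"ok": true}'])
delays = []
print(call_with_backoff(lambda: next(responses), sleep=delays.append))
print(delays)
```

Here the first response fails to parse, one delay of `base_delay` seconds is recorded, and the second response is returned as a dict.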

Common errors and how to fix them

Error 1: The model returns extra text around the JSON

Description: The model wraps the JSON in a markdown code block or adds explanatory text, which breaks JSON parsing.

Solution:

import anthropic
import json

# Wrong - the model often adds prose or markdown around the JSON
"""
Here is the information you requested:
{
  "name": "Nguyễn Văn A",
  "age": 28
}
"""

# Right - use a clear system prompt and strict mode
client = anthropic.Anthropic(
    base_url="https://api.holysheep.ai/v1",
    api_key="YOUR_HOLYSHEEP_API_KEY"
)

response = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=1024,
    system="""You must return ONLY one valid JSON object.
    NO markdown, NO explanations, NO other text.
    Start directly with { and end with }.""",
    messages=[{"role": "user", "content": "Extract the information..."}],
    extra_headers={"x-ai-beta-name": "strict-json"},
    extra_body={
        "response_format": {
            "type": "json_schema",
            "json_schema": your_schema,  # Placeholder: the schema dict you defined
            "strict": True  # Enable strict mode if available
        }
    }
)

# Strip markdown fences before parsing
raw = response.content[0].text.strip()
if raw.startswith("```json"):
    raw = raw[7:]
if raw.startswith("```"):
    raw = raw[3:]
if raw.endswith("```"):
    raw = raw[:-3]
result = json.loads(raw.strip())
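
Stripping fences handles a response wrapped in a markdown code block, but not one that opens with a sentence before the JSON. As a fallback, the sketch below scans for the first parseable JSON object using `json.JSONDecoder.raw_decode` from the standard library. The helper name `extract_first_json_object` is hypothetical.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object found in `text`, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing characters after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass
        else:
            if isinstance(obj, dict):
                return obj
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in text")


noisy = 'Here is the information you requested:\n```json\n{"name": "Nguyễn Văn A", "age": 28}\n```'
print(extract_first_json_object(noisy))
# → {'name': 'Nguyễn Văn A', 'age': 28}
```

Treat this as a safety net rather than the main path: if it triggers often, the prompt or the response format settings probably need attention.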

Error 2: Missing required fields

Description: The model skips some fields defined in the schema, which causes validation errors in downstream processing.

Solution:

import json

# Check required fields before processing.
# Assumes `client` is the anthropic.Anthropic instance created in the earlier examples.
def ensure_required_fields(result, schema):
    required = schema.get("required", [])
    missing = [field for field in required if field not in result]

    if missing:
        # Retry with a prompt that names the missing required fields.
        # The conversation must start with a user turn, so the previous
        # output is embedded in the user message instead of an assistant turn.
        retry_prompt = (
            f"Previous JSON output:\n{json.dumps(result, ensure_ascii=False)}\n\n"
            f"Missing required fields: {', '.join(missing)}.\n"
            "Return the JSON again with ALL of these fields included."
        )

        response = client.messages.create(
            model="claude-sonnet-4.5",
            max_tokens=1024,
            system="Make sure every required field has a value.",
            messages=[{"role": "user", "content": retry_prompt}],
            extra_headers={"x-ai-beta-name": "fill-missing"},
            extra_body={"response_format": {"type": "json_schema", "json_schema": schema}}
        )
        return json.loads(response.content[0].text)

    return result

# Or declare default values in the schema
schema_with_defaults = {
    "name": "Order",
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "processing", "shipped", "delivered"],
            "default": "pending"  # Annotation only: validators do not insert it, so apply it in your own code
        },
        "notes": {
            "type": "string",
            "default": ""  # Empty string by default
        }
    }
}
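
Note that `default` in JSON Schema is an annotation: validators do not fill in missing values on their own. A minimal sketch of applying those defaults in application code follows. The `apply_defaults` helper is hypothetical and only handles top-level properties.

```python
import copy


def apply_defaults(result: dict, schema: dict) -> dict:
    """Return a copy of `result` with missing top-level fields filled from
    each property's `default` annotation in the schema."""
    filled = dict(result)
    for field, rules in schema.get("properties", {}).items():
        if field not in filled and "default" in rules:
            # deepcopy so mutable defaults (lists, dicts) are never shared
            filled[field] = copy.deepcopy(rules["default"])
    return filled


order_defaults_schema = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "processing", "shipped", "delivered"],
            "default": "pending",
        },
        "notes": {"type": "string", "default": ""},
    },
}

print(apply_defaults({"status": "shipped"}, order_defaults_schema))
# → {'status': 'shipped', 'notes': ''}
```

Values the model did return are never overwritten; only absent keys receive their default.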

Error 3: Wrong data type (