Mở Đầu: Khi "ConnectionError: timeout" Phá Hủy Production

23:47 tối, hệ thống chatbot của tôi ngừng hoạt động. Logs tràn ngập lỗi: ConnectionError: timeout after 30000ms. Khách hàng không thể trò chuyện, đội ngũ hỗ trợ báo động liên tục. Nguyên nhân? Tôi đã hardcode API endpoint cũ của nhà cung cấp — endpoint đó đã bị deprecated từ tháng trước.

Đó là khoảnh khắc tôi nhận ra: Model Context Protocol (MCP) không chỉ là một spec mới — nó là cách duy nhất để đảm bảo ứng dụng AI của bạn không bị "break" khi providers thay đổi API.

Bài viết này sẽ đưa bạn từ hiểu biết zero về MCP đến triển khai production-ready, với tất cả lỗi thực tế mà tôi đã gặp và cách khắc phục chúng.

MCP Là Gì? Tại Sao 2026 Là Năm Của MCP

Model Context Protocol (MCP) là một giao thức chuẩn hóa cho phép các ứng dụng kết nối với các Language Model Providers một cách thống nhất. Thay vì viết code riêng cho từng provider (OpenAI, Anthropic, Google...), bạn chỉ cần implement MCP client — và nó hoạt động với mọi provider tuân thủ spec.

Lợi ích cốt lõi:

Kiến Trúc MCP 2026 — Sơ Đồ Request Flow

┌─────────────────────────────────────────────────────────────────┐
│                     MCP Client Application                       │
├─────────────────────────────────────────────────────────────────┤
│  1. MCPClient.connect(provider, api_key)                         │
│     ↓                                                            │
│  2. MCPClient.createSession({                                    │
│        model: "gpt-4.1",                                         │
│        max_tokens: 4096,                                         │
│        temperature: 0.7                                           │
│     })                                                           │
│     ↓                                                            │
│  3. Session.send({                                               │
│        role: "user",                                             │
│        content: "Explain MCP protocol"                           │
│     })                                                           │
│     ↓                                                            │
│  4. Streaming Response (SSE/WebSocket)                           │
│     ↓                                                            │
│  5. Session.close() → Connection cleanup                          │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                    MCP-Compatible Gateway                        │
│              (HolySheep AI, OpenAI, Anthropic...)                │
├─────────────────────────────────────────────────────────────────┤
│  POST /v1/chat/completions                                       │
│  Authorization: Bearer YOUR_API_KEY                              │
│  Content-Type: application/json                                  │
└─────────────────────────────────────────────────────────────────┘

Triển Khai MCP Client Với HolySheep AI

Tại Sao Chọn HolySheep AI?

HolySheep AI là MCP-compatible provider với chi phí thấp hơn 85%+ so với các provider phương Tây:

Bảng Giá So Sánh 2026

ModelProviderGiá/1M TokensTiết kiệm
GPT-4.1OpenAI$60-
GPT-4.1HolySheep AI$886%
Claude Sonnet 4.5Anthropic$105-
Claude Sonnet 4.5HolySheep AI$1585%
Gemini 2.5 FlashGoogle$17.50-
Gemini 2.5 FlashHolySheep AI$2.5085%
DeepSeek V3.2HolySheep AI$0.42Best Value

Code Mẫu: MCP Client Hoàn Chỉnh

1. MCP Client Cơ Bản (Python)

# mcp_client.py
import aiohttp
import asyncio
from typing import AsyncIterator, Optional
from dataclasses import dataclass
import json

@dataclass
class MCPMessage:
    role: str
    content: str
    name: Optional[str] = None

class HolySheepMCPClient:
    """MCP-compatible client cho HolySheep AI Gateway"""
    
    def __init__(self, api_key: str, base_url: str = "https://api.holysheep.ai/v1"):
        self.api_key = api_key
        self.base_url = base_url.rstrip('/')
        self._session: Optional[aiohttp.ClientSession] = None
    
    async def __aenter__(self):
        self._session = aiohttp.ClientSession(
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            timeout=aiohttp.ClientTimeout(total=60)
        )
        return self
    
    async def __aexit__(self, *args):
        if self._session:
            await self._session.close()
    
    async def create_session(self, model: str = "gpt-4.1", **kwargs) -> dict:
        """Tạo MCP session - trả về session metadata"""
        async with self._session.post(
            f"{self.base_url}/sessions",
            json={
                "model": model,
                "temperature": kwargs.get("temperature", 0.7),
                "max_tokens": kwargs.get("max_tokens", 4096)
            }
        ) as resp:
            if resp.status == 401:
                raise MCPError("401 Unauthorized - Kiểm tra API key của bạn")
            if resp.status != 201:
                text = await resp.text()
                raise MCPError(f"Session creation failed: {resp.status} - {text}")
            return await resp.json()
    
    async def send_message(self, session_id: str, message: MCPMessage) -> dict:
        """Gửi message trong session"""
        payload = {
            "role": message.role,
            "content": message.content
        }
        if message.name:
            payload["name"] = message.name
        
        async with self._session.post(
            f"{self.base_url}/sessions/{session_id}/messages",
            json=payload
        ) as resp:
            if resp.status == 429:
                raise RateLimitError("Rate limit exceeded - Thử lại sau")
            if resp.status == 400:
                text = await resp.text()
                raise MCPError(f"Bad request: {text}")
            return await resp.json()
    
    async def stream_completions(
        self, session_id: str, messages: list[MCPMessage]
    ) -> AsyncIterator[str]:
        """Stream response - phù hợp cho real-time applications"""
        payload = {
            "messages": [
                {"role": m.role, "content": m.content} 
                for m in messages
            ],
            "stream": True
        }
        
        async with self._session.post(
            f"{self.base_url}/chat/completions",
            json=payload
        ) as resp:
            if resp.status != 200:
                raise MCPError(f"Stream failed: {resp.status}")
            
            async for line in resp.content:
                line = line.decode('utf-8').strip()
                if line.startswith('data: '):
                    data = json.loads(line[6:])
                    if 'choices' in data and len(data['choices']) > 0:
                        delta = data['choices'][0].get('delta', {})
                        if 'content' in delta:
                            yield delta['content']

class MCPError(Exception):
    """Base exception cho MCP errors"""
    pass

class RateLimitError(MCPError):
    """Rate limit exceeded"""
    pass

============================================================

VÍ DỤ SỬ DỤNG

============================================================

async def main(): async with HolySheepMCPClient( api_key="YOUR_HOLYSHEEP_API_KEY" # Thay bằng key thực của bạn ) as client: # Tạo session session = await client.create_session( model="gpt-4.1", temperature=0.7, max_tokens=2048 ) print(f"Session created: {session['session_id']}") # Gửi message response = await client.send_message( session_id=session['session_id'], message=MCPMessage(role="user", content="Giải thích MCP protocol") ) print(f"Response: {response['content']}") if __name__ == "__main__": asyncio.run(main())

2. MCP Server Middleware (Node.js/TypeScript)

// mcp-server.ts
import express, { Request, Response, NextFunction } from 'express';
import { createHmac } from 'crypto';

interface MCPRequest {
  jsonrpc: '2.0';
  id: string | number;
  method: string;
  params?: {
    provider?: string;
    model?: string;
    messages?: Array<{role: string; content: string}>;
    temperature?: number;
    max_tokens?: number;
  };
}

interface MCPResponse {
  jsonrpc: '2.0';
  id: string | number;
  result?: {
    session_id?: string;
    choices?: Array<{message: {content: string}}>;
    usage?: {prompt_tokens: number; completion_tokens: number};
  };
  error?: {
    code: number;
    message: string;
    data?: unknown;
  };
}

const HOLYSHEEP_BASE_URL = 'https://api.holysheep.ai/v1';
const API_KEY = process.env.HOLYSHEEP_API_KEY || 'YOUR_HOLYSHEEP_API_KEY';

// MCP Protocol Validator
function validateMCPRequest(req: MCPRequest): boolean {
  if (req.jsonrpc !== '2.0') return false;
  if (!req.method || typeof req.method !== 'string') return false;
  if (req.id === undefined) return false;
  return true;
}

// HolySheep API Adapter
async function forwardToProvider(
  method: string, 
  params: MCPRequest['params']
): Promise {
  const endpoint = method.startsWith('chat') 
    ? '/chat/completions' 
    : method === 'sessions.create' 
      ? '/sessions' 
      : '/completions';
  
  const headers = {
    'Authorization': Bearer ${API_KEY},
    'Content-Type': 'application/json',
  };
  
  // Map MCP params → HolySheep format
  const body: Record = {};
  if (params?.model) body.model = params.model;
  if (params?.messages) body.messages = params.messages;
  if (params?.temperature !== undefined) body.temperature = params.temperature;
  if (params?.max_tokens) body.max_tokens = params.max_tokens;
  
  try {
    const response = await fetch(${HOLYSHEEP_BASE_URL}${endpoint}, {
      method: 'POST',
      headers,
      body: JSON.stringify(body),
    });
    
    if (response.status === 401) {
      throw { code: -32602, message: 'Invalid API key - Kiểm tra HolySheep API key của bạn' };
    }
    
    if (response.status === 429) {
      throw { code: -32603, message: 'Rate limit exceeded - Retry-After header sẽ được set' };
    }
    
    if (!response.ok) {
      const errorText = await response.text();
      throw { 
        code: -32603, 
        message: Provider error: ${response.status},
        data: errorText 
      };
    }
    
    const data = await response.json();
    return data;
    
  } catch (error: unknown) {
    if ((error as {code?: number}).code) {
      throw error;
    }
    throw { 
      code: -32603, 
      message: Network error: ${(error as Error).message} 
    };
  }
}

// Express Middleware
const app = express();
app.use(express.json());

app.post('/mcp', async (req: Request, res: Response) => {
  const mcpRequest = req.body as MCPRequest;
  
  // Validate MCP request
  if (!validateMCPRequest(mcpRequest)) {
    const errorResponse: MCPResponse = {
      jsonrpc: '2.0',
      id: mcpRequest.id || 0,
      error: { code: -32600, message: 'Invalid Request' }
    };
    return res.status(400).json(errorResponse);
  }
  
  try {
    const result = await forwardToProvider(
      mcpRequest.method, 
      mcpRequest.params
    );
    
    const response: MCPResponse = {
      jsonrpc: '2.0',
      id: mcpRequest.id,
      result
    };
    
    res.json(response);
    
  } catch (error: unknown) {
    const err = error as {code: number; message: string; data?: unknown};
    const response: MCPResponse = {
      jsonrpc: '2.0',
      id: mcpRequest.id || 0,
      error: { code: err.code, message: err.message }
    };
    
    // Map error codes to HTTP status
    const statusMap: Record = {
      '-32600': 400,
      '-32602': 401,
      '-32603': 502,
    };
    res.status(statusMap[err.code] || 500).json(response);
  }
});

// Streaming endpoint
app.post('/mcp/stream', async (req: Request, res: Response) => {
  const { messages, model = 'gpt-4.1', temperature = 0.7 } = req.body;
  
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  
  try {
    const response = await fetch(${HOLYSHEEP_BASE_URL}/chat/completions, {
      method: 'POST',
      headers: {
        'Authorization': Bearer ${API_KEY},
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages,
        temperature,
        stream: true,
      }),
    });
    
    if (!response.ok) {
      throw new Error(Provider returned ${response.status});
    }
    
    // Forward stream to client
    for await (const chunk of response.body!) {
      res.write(data: ${chunk.toString()}\n\n);
    }
    
  } catch (error: unknown) {
    res.write(data: [DONE]\n\n);
  }
  
  res.end();
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(MCP Server running on port ${PORT});
  console.log(Connected to HolySheep AI: ${HOLYSHEEP_BASE_URL});
});

export { app, validateMCPRequest, forwardToProvider };

3. MCP Integration Test Suite

# test_mcp_client.py
import pytest
import asyncio
from mcp_client import HolySheepMCPClient, MCPMessage, MCPError, RateLimitError

============================================================

FIXTURES

============================================================

@pytest.fixture def api_key(): """Lấy API key từ environment hoặc test key""" import os return os.environ.get('HOLYSHEEP_API_KEY', 'test_key_for_ci') @pytest.fixture async def client(api_key): async with HolySheepMCPClient(api_key=api_key) as c: yield c

============================================================

TESTS: Connection & Authentication

============================================================

class TestAuthentication: @pytest.mark.asyncio async def test_valid_api_key(self, api_key): """Test với API key hợp lệ""" async with HolySheepMCPClient(api_key=api_key) as client: # Không raise exception = pass assert client.api_key == api_key @pytest.mark.asyncio async def test_invalid_api_key_raises_401(self): """Invalid key phải raise 401 Unauthorized""" async with HolySheepMCPClient(api_key="invalid_key_xyz") as client: with pytest.raises(MCPError) as exc_info: await client.create_session(model="gpt-4.1") assert "401" in str(exc_info.value) @pytest.mark.asyncio async def test_empty_api_key_raises_error(self): """Empty key phải raise error ngay lập tức""" with pytest.raises((MCPError, ValueError)): async with HolySheepMCPClient(api_key="") as client: pass

============================================================

TESTS: Session Management

============================================================

class TestSessionManagement: @pytest.mark.asyncio async def test_create_session_success(self, client): """Tạo session thành công""" session = await client.create_session( model="gpt-4.1", temperature=0.7, max_tokens=100 ) assert 'session_id' in session assert session['model'] == 'gpt-4.1' @pytest.mark.asyncio async def test_create_session_with_invalid_model(self, client): """Model không tồn tại phải raise error""" with pytest.raises(MCPError) as exc_info: await client.create_session(model="non-existent-model-xyz") # HolySheep trả về 400 cho invalid model assert "400" in str(exc_info.value) or "not found" in str(exc_info.value).lower() @pytest.mark.asyncio async def test_session_persistence(self, client): """Session có thể reuse cho multiple requests""" session = await client.create_session(model="gpt-4.1") session_id = session['session_id'] # Gửi 2 messages trong cùng session await client.send_message( session_id, MCPMessage(role="user", content="Hello") ) await client.send_message( session_id, MCPMessage(role="user", content="Continue from previous") ) # Không raise exception = pass

============================================================

TESTS: Rate Limiting

============================================================

class TestRateLimiting: @pytest.mark.asyncio async def test_rate_limit_returns_429(self): """Khi rate limit exceeded, server phải return 429""" # Sử dụng key với rate limit thấp cho test async with HolySheepMCPClient(api_key="rate_limited_test_key") as client: # Gửi 100 requests liên tục tasks = [ client.create_session(model="gpt-4.1") for _ in range(100) ] # Chờ tất cả hoàn thành results = await asyncio.gather(*tasks, return_exceptions=True) # Ít nhất 1 request phải fail với 429 errors = [r for r in results if isinstance(r, RateLimitError)] assert len(errors) > 0, "Expected at least one 429 error under load"

============================================================

TESTS: Message Formatting

============================================================

class TestMessageFormatting: def test_message_with_name(self): """Message có name field được format đúng""" msg = MCPMessage(role="system", content="You are assistant", name="config") assert msg.name == "config" def test_message_without_name(self): """Message không có name vẫn valid""" msg = MCPMessage(role="user", content="Hello") assert msg.name is None @pytest.mark.asyncio async def test_system_message_in_session(self, client): """System message hoạt động trong session""" session = await client.create_session(model="gpt-4.1") # Gửi system message await client.send_message( session_id=session['session_id'], message=MCPMessage( role="system", content="You always respond in Vietnamese" ) ) # Không raise exception

============================================================

RUN TESTS

============================================================

if __name__ == "__main__": pytest.main([__file__, "-v", "--tb=short"])

Cấu Hình Production

# config.yaml - Production configuration
server:
  host: "0.0.0.0"
  port: 8080
  workers: 4
  timeout: 60

mcp:
  provider: "holysheep"
  base_url: "https://api.holysheep.ai/v1"
  api_key_env: "HOLYSHEEP_API_KEY"
  
  # Retry configuration
  retry:
    max_attempts: 3
    backoff_factor: 2
    retry_on:
      - 429  # Rate limit
      - 500  # Server error
      - 502  # Bad gateway
      - 503  # Service unavailable
  
  # Circuit breaker
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout: 60
    half_open_max_calls: 3

models:
  gpt-4.1:
    temperature: 0.7
    max_tokens: 4096
    cost_per_1k_tokens: 0.008  # $8/1M tokens
  
  claude-sonnet-4.5:
    temperature: 0.7
    max_tokens: 4096
    cost_per_1k_tokens: 0.015  # $15/1M tokens
  
  deepseek-v3.2:
    temperature: 0.5
    max_tokens: 8192
    cost_per_1k_tokens: 0.00042  # $0.42/1M tokens

rate_limits:
  requests_per_minute: 60
  tokens_per_minute: 120000
  concurrent_sessions: 10

logging:
  level: "INFO"
  format: "json"
  destinations:
    - type: "stdout"
    - type: "file"
      path: "/var/log/mcp/app.log"
      max_size: "100MB"
      rotation: "daily"

Lỗi Thường Gặp Và Cách Khắc Phục

1. Lỗi 401 Unauthorized — API Key Không Hợp Lệ

# ❌ WRONG - Copy paste sai hoặc có khoảng trắng thừa
api_key = " YOUR_HOLYSHEEP_API_KEY "  # Khoảng trắng thừa!
api_key = "sk-xxx\nxxx"  # Xuống dòng do copy

✅ CORRECT

api_key = "YOUR_HOLYSHEEP_API_KEY".strip()

Hoặc sử dụng environment variable

import os api_key = os.environ.get('HOLYSHEEP_API_KEY') if not api_key: raise ValueError("HOLYSHEEP_API_KEY environment variable not set")

Verify key format

def validate_api_key(key: str) -> bool: if not key or len(key) < 20: return False # HolySheep keys không chứa ký tự whitespace return all(c not in key for c in [' ', '\n', '\t', '\r'])

2. Lỗi Connection Timeout — Network Issue

# ❌ WRONG - Timeout quá ngắn cho production
async with HolySheepMCPClient(api_key=key) as client:
    client._session = aiohttp.ClientSession(
        timeout=aiohttp.ClientTimeout(total=5)  # 5 seconds quá ngắn!
    )

✅ CORRECT - Timeout adaptive theo request type

class AdaptiveTimeoutClient(HolySheepMCPClient): TIMEOUTS = { 'session_create': 10, # Tạo session: nhanh 'chat_simple': 30, # Chat đơn giản 'chat_complex': 60, # Chat phức tạp 'embedding': 45, # Embedding 'streaming': 120, # Streaming - cần lâu hơn } async def _request_with_timeout(self, method: str, timeout_type: str, **kwargs): timeout = self.TIMEOUTS.get(timeout_type, 30) try: return await asyncio.wait_for(method(**kwargs), timeout=timeout) except asyncio.TimeoutError: # Log và retry với exponential backoff logger.warning(f"Timeout {timeout_type} after {timeout}s, retrying...") await asyncio.sleep(min(timeout * 0.5, 10)) return await asyncio.wait_for(method(**kwargs), timeout=timeout * 2)

3. Lỗi 429 Rate Limit Exceeded

# ❌ WRONG - Retry ngay lập tức không giải quyết được rate limit
for i in range(10):
    try:
        response = await client.send_message(...)
        break
    except RateLimitError:
        await asyncio.sleep(0.1)  # Quá ngắn!

✅ CORRECT - Exponential backoff với jitter

import random async def send_with_intelligent_retry(client, message, max_retries=5): """Gửi message với retry thông minh""" for attempt in range(max_retries): try: return await client.send_message(message) except RateLimitError as e: if attempt == max_retries - 1: raise # Calculate backoff: base * 2^attempt + jitter base_delay = 1 # 1 giây max_delay = 60 # Tối đa 60 giây # Nếu có Retry-After header, sử dụng nó if hasattr(e, 'retry_after'): delay = int(e.retry_after) else: delay = min(base_delay * (2 ** attempt), max_delay) delay += random.uniform(0, 0.5 * delay) # Thêm jitter logger.info(f"Rate limited, retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})") await asyncio.sleep(delay) except MCPError as e: # Non-retryable error logger.error(f"Non-retryable error: {e}") raise

4. Lỗi Invalid Model Name

# ❌ WRONG - Hardcode model name dễ gây lỗi
MODEL = "gpt-4.1"  # Nếu provider đổi tên, code break

✅ CORRECT - Dynamic model resolution

AVAILABLE_MODELS = { 'gpt-4.1': 'gpt-4.1', 'gpt-4.1-turbo': 'gpt-4.1-turbo', 'claude-sonnet-4.5': 'claude-sonnet-4.5', 'gemini-2.5-flash': 'gemini-2.5-flash', 'deepseek-v3.2': 'deepseek-v3.2', } def resolve_model(requested: str) -> str: """Resolve model name với fallback""" if requested in AVAILABLE_MODELS: return AVAILABLE_MODELS[requested] # Thử lowercase lower_requested = requested.lower() for key in AVAILABLE_MODELS: if key.lower() == lower_requested: return AVAILABLE_MODELS[key] # Fallback to default logger.warning(f"Model '{requested}' not found, falling back to 'gpt-4.1'") return 'gpt-4.1'

Sử dụng

model = resolve_model(user_config.get('model', 'gpt-4.1')) session = await client.create_session(model=model)

5. Lỗi Streaming Response Parsing

# ❌ WRONG - Parse không đúng format
async def stream_handler(response):
    full_content = ""
    async for line in response.content:
        # Giả định format sai
        data = json.loads(line)  # Có thể fail!
        full_content += data['content']
    return full_content

✅ CORRECT - Handle multiple SSE formats

async def parse_sse_stream(response): """Parse SSE stream với nhiều format có thể""" buffer = "" async for chunk in response.content.iter_chunked(1024): buffer += chunk.decode('utf-8') # Xử lý từng dòng hoàn chỉnh while '\n' in buffer: line, buffer = buffer.split('\n', 1) line = line.strip() if not line: continue # Format 1: "data: {...}" if line.startswith('data: '): data_str = line[6:] if data_str == '[DONE]': return try: data = json.loads(data_str) yield parse_chunk_data(data) except json.JSONDecodeError: logger.warning(f"Invalid JSON in stream: {data_str[:50]}") continue # Format 2: Raw JSON (không có prefix) else: try: data = json.loads(line) yield parse_chunk_data(data) except json.JSONDecodeError: continue def parse_chunk_data(data: dict) -> str: """Extract content từ chunk data""" # OpenAI format if 'choices' in data: delta = data['choices'][0].get('delta', {}) return delta.get('content', '') # HolySheep format if 'message' in data: return data['message'].get('content', '') return ''

Best Practices Cho Production

1. Connection Pooling

# connection_pool.py
from contextlib import asynccontextmanager
import asyncio
from typing import Optional

class ConnectionPool:
    """Connection pool cho MCP clients - giảm latency đáng kể"""
    
    def __init__(self, api_key: str, pool_size: int = 10):
        self.api_key = api_key
        self.pool_size = pool_size
        self._semaphore = asyncio.Semaphore(pool_size)
        self._clients: list[HolySheepMCPClient] = []
        self._client_ready: asyncio.Event = asyncio.Event()
        self._initialized = False
    
    async def initialize(self):
        """Pre-warm connections"""
        if self._initialized:
            return
        
        self._clients = [
            HolySheepMCPClient(api_key=self.api_key)
            for _ in range(self.pool_size)
        ]
        
        # Warm up connections
        await asyncio.gather(*[
            client.__aenter__()
            for client in self._clients
        ])
        
        self._initialized = True
        self._client_ready.set()
    
    @asynccontextmanager
    async def acquire(self):
        """Acquire a client từ pool"""
        await self._client_ready.wait()
        
        async with self._semaphore:
            # Round-robin selection
            client = self._clients[len(self._clients) % self.pool_size]
            self._pool_index = (self._pool_index + 1) % self.pool_size
            yield client
    
    async def close(self):
        """Close all connections"""
        await asyncio.gather(*[
            client.__aexit__(None, None, None)
            for client in self._clients
        ])
        self._initialized = False

Sử dụng

pool = ConnectionPool(api_key="YOUR_KEY", pool_size=10) await pool.initialize() async def handle_request(message: str): async with pool.acquire() as client: return await client.send_message(message)

2. Cost Optimization

# cost_optimizer.py
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class TokenUsage:
    prompt_tokens: int
    completion_tokens: int
    total_cost: float
    timestamp: float

class CostOptimizer:
    """Tối ưu chi phí bằng model selection thông minh"""
    
    MODEL_COSTS = {
        'gpt-4.1': 0.008,           # $8/1M tokens
        'claude-sonnet-4.5': 0.015, # $15/1M tokens
        'gemini-2.5-flash': 0.0025, # $2.50/1M tokens
        'deepseek-v3.2': 0.00042,  # $0.42/1M tokens
    }
    
    def __init__(self, monthly_budget: float):
        self.monthly_budget = monthly_budget
        self.spent = 0.0
        self.usage_history: list[TokenUsage] = []
    
    def select_model(self, task_complexity: str) -> str: