Setting Up a Local AI API Development Environment with Docker Compose: The Complete Guide (2026)

In November 2025, a colleague of mine, Minh, a senior developer at an e-commerce startup, called me at 11 p.m. His RAG (Retrieval-Augmented Generation) project for a customer-support system was close to its deadline, but the team's OpenAI API key had run out of credits. He needed an alternative immediately so the team could keep developing and testing. This is not a rare story, and in this guide I will show you how to build a local AI API development environment with Docker Compose so you never end up in the same situation.

Why do you need a local AI development environment?

Before getting into the technical details, here are the practical reasons:

Development speed: Your application talks to a gateway on the local Docker network, so that internal hop takes <50ms; only the gateway's upstream call depends on the external network
Cost savings: At a rate of ¥1=$1, you save 85%+ compared with the original APIs of Western providers
Unlimited testing: Development and staging do not consume production credits
Flexible model selection: Easily switch between models such as GPT-4.1 ($8/MTok), Claude Sonnet 4.5 ($15/MTok), Gemini 2.5 Flash ($2.50/MTok), DeepSeek V3.2 ($0.42/MTok)
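To make the price differences above concrete, here is a small sketch that estimates spend for a given token volume. It uses the per-million-token prices quoted in the list; the `estimate_cost` helper and the 50-million-token workload are illustrative assumptions, and the calculation ignores any difference between input and output token pricing.

```python
# Prices in USD per million tokens, as quoted in the list above.
PRICE_PER_MTOK = {
    "gpt-4.1": 8.00,
    "claude-sonnet-4.5": 15.00,
    "gemini-2.5-flash": 2.50,
    "deepseek-v3.2": 0.42,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated cost in USD for `tokens` tokens on `model`."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000


# Hypothetical workload: 50 million tokens in a month of development.
monthly_tokens = 50_000_000
for model in sorted(PRICE_PER_MTOK, key=PRICE_PER_MTOK.get):
    print(f"{model:<20} ${estimate_cost(model, monthly_tokens):>8.2f}")
```

Swapping the `model` field in a request is all it takes to move between these price points, which is why routing everything through one gateway is convenient during development.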

System architecture

Our development environment consists of:

Reverse Proxy: Nginx for routing and SSL termination
AI API Gateway: A service that unifies requests to HolySheep AI
Development Database: PostgreSQL for application data (not included in the docker-compose.yml shown in this guide)
Vector Database: Qdrant or Milvus for RAG workloads (this guide uses Qdrant)
Monitoring: Prometheus + Grafana for tracking performance

Step 1: Install Docker and Docker Compose

Make sure you have Docker Engine (version 20.10+) and Docker Compose V2 installed. If not, refer to the official Docker documentation.
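One way to confirm the minimum version is to parse the output of `docker --version`. The sketch below assumes the usual `Docker version X.Y.Z, build ...` output format; `parse_docker_version` and `meets_minimum` are hypothetical helper names introduced for illustration.

```python
import re
import shutil
import subprocess

MINIMUM = (20, 10)  # Docker Engine 20.10, the minimum stated above


def parse_docker_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from text such as 'Docker version 24.0.7, build afdd53b'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(output: str, minimum: tuple[int, int] = MINIMUM) -> bool:
    """True if the version found in `output` is at least `minimum`."""
    return parse_docker_version(output) >= minimum


if shutil.which("docker"):
    # Only query Docker when the CLI is actually on PATH.
    result = subprocess.run(["docker", "--version"], capture_output=True, text=True)
    if result.returncode == 0:
        verdict = "OK" if meets_minimum(result.stdout) else "too old"
        print(result.stdout.strip(), "->", verdict)
else:
    print("docker CLI not found on PATH")
```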

Step 2: Create the project directory structure

ai-dev-environment/
├── docker-compose.yml
├── nginx/
│   └── nginx.conf
├── api-gateway/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app.py
├── qdrant/
│   └── storage/  (volume)
├── prometheus/
│   └── prometheus.yml
└── .env
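If you prefer to generate this skeleton rather than create it by hand, a short script along these lines would do it. This is a sketch: `scaffold` is a hypothetical helper, and it only creates empty placeholder files that the following steps fill in.

```python
from pathlib import Path

# Directories and placeholder files taken from the tree above.
DIRECTORIES = ["nginx", "api-gateway", "qdrant/storage", "prometheus"]
FILES = [
    "docker-compose.yml",
    "nginx/nginx.conf",
    "api-gateway/Dockerfile",
    "api-gateway/requirements.txt",
    "api-gateway/app.py",
    "prometheus/prometheus.yml",
    ".env",
]


def scaffold(root: Path) -> list[Path]:
    """Create the project skeleton under `root`; existing files are left untouched."""
    created = []
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        path = root / name
        if not path.exists():
            path.touch()  # empty placeholder, to be filled in later
            created.append(path)
    return created
```

Calling `scaffold(Path("ai-dev-environment"))` from the parent directory creates the layout; running it a second time creates nothing new and returns an empty list.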

Step 3: The Docker Compose configuration file

This is the main docker-compose.yml file, the heart of the whole system:

version: '3.8'

services:
  # Nginx reverse proxy - unified entry point
  nginx:
    image: nginx:alpine
    container_name: ai-nginx
    ports:
      - "8080:80"
      - "8443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./logs/nginx:/var/log/nginx
    depends_on:
      - api-gateway
      - qdrant
    networks:
      - ai-network
    restart: unless-stopped

  # API Gateway - HolySheep AI integration
  api-gateway:
    build:
      context: ./api-gateway
      dockerfile: Dockerfile
    container_name: ai-api-gateway
    environment:
      - HOLYSHEEP_API_KEY=${HOLYSHEEP_API_KEY}
      - HOLYSHEEP_BASE_URL=https://api.holysheep.ai/v1
      - LOG_LEVEL=DEBUG
    volumes:
      - ./api-gateway:/app
    depends_on:
      - prometheus
    networks:
      - ai-network
    restart: unless-stopped

  # Qdrant Vector Database for RAG
  qdrant:
    image: qdrant/qdrant:v1.7.0
    container_name: ai-qdrant
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - ./qdrant/storage:/qdrant/storage
    networks:
      - ai-network
    restart: unless-stopped

  # Prometheus metrics collection
  prometheus:
    image: prom/prometheus:v2.45.0
    container_name: ai-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - ai-network
    restart: unless-stopped

  # Grafana dashboards
  grafana:
    image: grafana/grafana:10.0.0
    container_name: ai-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin123}
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
    networks:
      - ai-network
    restart: unless-stopped

networks:
  ai-network:
    driver: bridge

volumes:
  prometheus-data:
  grafana-data:

Step 4: API Gateway Service

This is a Python Flask service that acts as an intermediary: you develop locally against a familiar interface while the requests actually go to HolySheep AI, a platform that supports payment via WeChat/Alipay at very competitive cost.

requirements.txt

flask==3.0.0
flask-cors==4.0.0
requests==2.31.0
prometheus-client==0.19.0
python-dotenv==1.0.0
gunicorn==21.2.0

app.py — API Gateway Implementation

import os
import requests
from flask import Flask, request, jsonify
from flask_cors import CORS
from prometheus_client import Counter, Histogram, generate_latest
from dotenv import load_dotenv

load_dotenv()

app = Flask(__name__)
CORS(app)

# HolySheep AI Configuration
HOLYSHEEP_API_KEY = os.getenv('HOLYSHEEP_API_KEY')
HOLYSHEEP_BASE_URL = os.getenv('HOLYSHEEP_BASE_URL', 'https://api.holysheep.ai/v1')

# Prometheus metrics
request_counter = Counter('api_requests_total', 'Total API requests', ['endpoint', 'status'])
request_latency = Histogram('api_request_latency_seconds', 'Request latency', ['endpoint'])

@app.route('/health', methods=['GET'])
def health_check():
    """Health check endpoint for Docker healthchecks"""
    return jsonify({
        'status': 'healthy',
        'service': 'ai-api-gateway',
        'provider': 'HolySheep AI'
    }), 200

@app.route('/v1/chat/completions', methods=['POST'])
@request_latency.labels(endpoint='chat/completions').time()
def chat_completions():
    """OpenAI-compatible chat completions endpoint"""
    try:
        headers = {
            'Authorization': f'Bearer {HOLYSHEEP_API_KEY}',
            'Content-Type': 'application/json'
        }
        
        payload = request.get_json()
        
        # Forward to HolySheep AI
        response = requests.post(
            f'{HOLYSHEEP_BASE_URL}/chat/completions',
            headers=headers,
            json=payload,
            timeout=30
        )
        
        request_counter.labels(endpoint='chat/completions', status=response.status_code).inc()
        
        return jsonify(response.json()), response.status_code
        
    except requests.exceptions.Timeout:
        request_counter.labels(endpoint='chat/completions', status=408).inc()
        return jsonify({'error': 'Request timeout'}), 408
    except Exception as e:
        request_counter.labels(endpoint='chat/completions', status=500).inc()
        return jsonify({'error': str(e)}), 500

@app.route('/v1/embeddings', methods=['POST'])
@request_latency.labels(endpoint='embeddings').time()
def embeddings():
    """OpenAI-compatible embeddings endpoint"""
    try:
        headers = {
            'Authorization': f'Bearer {HOLYSHEEP_API_KEY}',
            'Content-Type': 'application/json'
        }
        
        payload = request.get_json()
        
        response = requests.post(
            f'{HOLYSHEEP_BASE_URL}/embeddings',
            headers=headers,
            json=payload,
            timeout=60
        )
        
        request_counter.labels(endpoint='embeddings', status=response.status_code).inc()
        
        return jsonify(response.json()), response.status_code
        
    except Exception as e:
        request_counter.labels(endpoint='embeddings', status=500).inc()
        return jsonify({'error': str(e)}), 500

@app.route('/v1/models', methods=['GET'])
def list_models():
    """List available models - useful for development"""
    return jsonify({
        'object': 'list',
        'data': [
            {'id': 'gpt-4.1', 'object': 'model', 'pricing': '$8/MTok'},
            {'id': 'claude-sonnet-4.5', 'object': 'model', 'pricing': '$15/MTok'},
            {'id': 'gemini-2.5-flash', 'object': 'model', 'pricing': '$2.50/MTok'},
            {'id': 'deepseek-v3.2', 'object': 'model', 'pricing': '$0.42/MTok'},
        ]
    }), 200

@app.route('/metrics')
def metrics():
    """Prometheus metrics endpoint"""
    return generate_latest(), 200, {'Content-Type': 'text/plain'}

if __name__ == '__main__':
    print(f"🚀 AI API Gateway starting...")
    print(f"   HolySheep Base URL: {HOLYSHEEP_BASE_URL}")
    print(f"   API Key configured: {'Yes' if HOLYSHEEP_API_KEY else 'No'}")
    app.run(host='0.0.0.0', port=5000, debug=True)

Dockerfile for the API Gateway

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY app.py .

# Expose port
EXPOSE 5000

# Run with gunicorn for production-like behavior
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "2", "--timeout", "120", "app:app"]

Step 5: Nginx Configuration

events {
    worker_connections 1024;
}

http {
    upstream api_gateway {
        server api-gateway:5000;
    }

    upstream qdrant {
        server qdrant:6333;
    }

    server {
        listen 80;
        server_name localhost;

        # Request logging
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        location /api/ {
            proxy_pass http://api_gateway/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            
            # Handle streaming responses
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 300s;
        }

        location /qdrant/ {
            proxy_pass http://qdrant/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        location /health {
            return 200 'OK';
            add_header Content-Type text/plain;
        }
    }
}
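Because `proxy_pass` in both `location` blocks ends with a URI part (`/`), Nginx replaces the matched location prefix with that URI before forwarding. The following sketch only models that prefix replacement in Python, to show which path each upstream service ends up seeing; `upstream_path` is a hypothetical helper and not part of Nginx.

```python
from typing import Optional

# Location prefix -> (upstream name, URI part of proxy_pass), from the config above.
ROUTES = {
    "/api/": ("api_gateway", "/"),
    "/qdrant/": ("qdrant", "/"),
}


def upstream_path(request_path: str) -> Optional[tuple[str, str]]:
    """Return (upstream, forwarded path), or None if no proxied prefix matches."""
    for prefix, (upstream, replacement) in ROUTES.items():
        if request_path.startswith(prefix):
            return upstream, replacement + request_path[len(prefix):]
    return None


for path in ("/api/v1/chat/completions", "/api/health", "/qdrant/collections", "/health"):
    print(path, "->", upstream_path(path))
```

This is why a client pointed at `http://localhost:8080/api/v1` reaches the gateway's `/v1/...` routes, while a bare `/health` is answered by Nginx itself and never reaches the gateway.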

Step 6: Prometheus Configuration

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'api-gateway'
    static_configs:
      - targets: ['api-gateway:5000']
    metrics_path: '/metrics'
    scrape_interval: 5s
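Prometheus scrapes the gateway's `/metrics` endpoint, which serves plain-text samples of the form `name{label="value"} number`. As a rough sketch of what that data looks like, the snippet below parses a few hand-written sample lines (made up for illustration, not real output) and totals `api_requests_total` by status. The regular expression handles only this simple shape, not the full exposition format.

```python
import re
from collections import defaultdict

# Hand-written sample in the text exposition format; the values are invented.
SAMPLE = '''\
# HELP api_requests_total Total API requests
# TYPE api_requests_total counter
api_requests_total{endpoint="chat/completions",status="200"} 12.0
api_requests_total{endpoint="chat/completions",status="500"} 1.0
api_requests_total{endpoint="embeddings",status="200"} 4.0
'''

LINE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_by_status(text: str) -> dict[str, float]:
    """Sum api_requests_total samples per value of the status label."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None or match.group("name") != "api_requests_total":
            continue  # skip comment lines and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get("status", "unknown")] += float(match.group("value"))
    return dict(totals)


print(requests_by_status(SAMPLE))
```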

Step 7: Environment Variables

Create a .env file in the project root:

# HolySheep AI - Register at https://holysheep.ai/register
HOLYSHEEP_API_KEY=YOUR_HOLYSHEEP_API_KEY

# Grafana credentials
GRAFANA_PASSWORD=dev_password_change_in_production

⚠️ Important: Replace YOUR_HOLYSHEEP_API_KEY with your actual API key. Register a new HolySheep AI account to receive free credits on sign-up.
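A forgotten placeholder is an easy mistake to make, so it can be worth checking the file before starting the stack. The sketch below is one possible pre-flight check; `parse_env` and `check_env` are hypothetical helpers that understand only simple `KEY=VALUE` lines, not the full .env syntax Compose accepts.

```python
from pathlib import Path

PLACEHOLDER = "YOUR_HOLYSHEEP_API_KEY"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    key = values.get("HOLYSHEEP_API_KEY", "")
    if not key:
        problems.append("HOLYSHEEP_API_KEY is missing")
    elif key == PLACEHOLDER:
        problems.append("HOLYSHEEP_API_KEY still holds the placeholder value")
    if "GRAFANA_PASSWORD" not in values:
        problems.append("GRAFANA_PASSWORD is not set; Compose falls back to admin123")
    return problems


env_file = Path(".env")
if env_file.exists():
    for problem in check_env(parse_env(env_file.read_text())):
        print("WARNING:", problem)
```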

Step 8: Start the whole system

# Build and start all services
docker-compose up -d --build

# View logs for all services
docker-compose logs -f

# Check status
docker-compose ps

After a successful start, the services are available at:

API Gateway: http://localhost:8080/api/
Qdrant Dashboard: http://localhost:6333/dashboard
Prometheus: http://localhost:9090
Grafana: http://localhost:3000 (user admin; the password is GRAFANA_PASSWORD from .env, or admin123 if that variable is unset)
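To check the gateway URL above without installing a client library, the standard library is enough. The following sketch builds the two requests and leaves sending them to an explicit `send` call; the helper names are hypothetical, and the paths assume the Nginx and Flask routes shown earlier.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # Nginx entry point from the Compose file


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /api/health, which Nginx forwards to the gateway's /health route."""
    return urllib.request.Request(f"{base_url}/api/health", method="GET")


def chat_request(prompt: str, model: str = "gpt-4.1",
                 base_url: str = BASE_URL) -> urllib.request.Request:
    """POST an OpenAI-style chat payload to the gateway through Nginx."""
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        f"{base_url}/api/v1/chat/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> tuple[int, dict]:
    """Send a request and return (HTTP status, decoded JSON body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))
```

With the stack running, `send(health_request())` should return status 200 together with the gateway's health JSON. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a failing upstream call surfaces as an exception.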

Using it in development: a Python client example

# Requires the openai package, version 1.0 or later
from openai import OpenAI

# Connect to the local API Gateway (Docker)
client = OpenAI(
    base_url="http://localhost:8080/api/v1",
    api_key="local-dev-key",  # Not a real key; it only satisfies the client
)

# Call Chat Completions; the gateway forwards the request to HolySheep AI
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain the RAG pipeline in 3 sentences"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

# Use embeddings for RAG
embedding_response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Content to embed for vector search"
)

print(f"Embedding dimensions: {len(embedding_response.data[0].embedding)}")

Demo: Building a simple RAG pipeline

With the system set up, you can build a RAG pipeline.
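As a minimal sketch of the retrieval step in such a pipeline, the code below ranks documents against a question and assembles a prompt. It rests on loudly stated assumptions: ranking happens in memory rather than in Qdrant, and `toy_embed` is a made-up bag-of-words stand-in for a real call to the gateway's `/v1/embeddings` endpoint.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding call: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the `top_k` documents most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, toy_embed(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble retrieved passages and the question into one prompt string."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


docs = [
    "Orders ship within two business days",
    "Refunds are processed within five business days",
    "Our support team is available on weekdays",
]
question = "how long do refunds take"
print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

In the real stack, `toy_embed` would be replaced by a call to the embeddings endpoint and the in-memory ranking by a Qdrant search; the resulting prompt would then be sent to `/v1/chat/completions`.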
