LLM Streaming Optimization: A Complete SSE vs WebSocket Comparison Guide (2026)

While building AI applications at HolySheep AI, I tested and deployed both Server-Sent Events (SSE) and WebSocket for streaming LLM responses. This article shares that hands-on experience, compares the two approaches in detail, and helps you choose the one that fits your project.

Introduction to Streaming in LLMs

When a user asks an LLM a long question, they do not want to wait 10-30 seconds for the whole answer. Streaming displays each token as soon as it is generated, so the exchange feels like a live chat.

At HolySheep AI, we support both SSE and WebSocket with an average latency under 50ms, one of the fastest figures on the market. Register here to try it: https://www.holysheep.ai/register

What is SSE (Server-Sent Events)?

SSE lets a server push data to a client over a long-lived HTTP connection. The channel is receive-only for the client: once the connection is established, the client cannot send data back over it.
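
To make the wire format concrete, here is a minimal sketch of how a single event is serialized in the text/event-stream format. `formatSSE` is a hypothetical helper written for this illustration and is not part of any library; the field names (`event`, `id`, `retry`, `data`) are the ones defined by the format itself.

```javascript
// Minimal sketch: serialize one SSE event in the text/event-stream format.
// formatSSE is a hypothetical helper for illustration, not a library API.
function formatSSE({ data, event, id, retry }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // A payload containing newlines is sent as one "data:" line per line.
  for (const part of String(data).split('\n')) {
    lines.push(`data: ${part}`);
  }
  // A blank line terminates the event.
  return lines.join('\n') + '\n\n';
}

const frame = formatSSE({ id: 7, data: JSON.stringify({ token: 'Hello' }) });
console.log(JSON.stringify(frame));
```

On the server, the returned string would be passed straight to `res.write(...)` on a response whose Content-Type is text/event-stream.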

Advantages of SSE

Easy to deploy; plain HTTP/1.1 is enough
Automatic reconnection if the connection drops (when using the browser's EventSource API)
No complex client-side library required
Works well through proxies and firewalls
Uses the standard Content-Type: text/event-stream header

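Disadvantages of SSE

One-way communication only (server → client)
Browser connection limit under HTTP/1.1 (6 connections per domain)
Over HTTP/2 the streams are multiplexed on one connection, so that limit becomes the negotiated maximum number of concurrent streams (100 by default)
Headers are sent again on every reconnect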

What is WebSocket?

WebSocket creates a persistent, full-duplex connection between client and server over a single TCP connection. After an HTTP handshake, the connection switches to the WebSocket protocol.
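
As an illustration of the handshake step, the server proves it understood the upgrade request by hashing the client's Sec-WebSocket-Key together with a fixed GUID and returning the result in Sec-WebSocket-Accept. The sketch below follows RFC 6455; `computeAcceptKey` is a hypothetical name, and libraries such as ws do this for you.

```javascript
const crypto = require('node:crypto');

// GUID fixed by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// computeAcceptKey is a hypothetical helper name used for illustration.
// It returns the value the server sends back in Sec-WebSocket-Accept.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Sample key taken from the RFC 6455 handshake example.
const accept = computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ==');
console.log(accept);
```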

Advantages of WebSocket

Full-duplex: client and server can send and receive at the same time
Low latency, because HTTP headers are not sent with every message
Efficient binary data support
Unrestricted server push
Well suited to real-time bidirectional communication

Disadvantages of WebSocket

More complex to implement
Requires a dedicated library and adds infrastructure cost
Does not work well through some proxies
Sticky connections make load balancing harder
Uses more server resources for the connection pool

Detailed comparison: SSE vs WebSocket

Criterion	SSE	WebSocket	Winner
Average latency	45-80ms	30-55ms	WebSocket
P99 latency	120-200ms	80-150ms	WebSocket
Throughput	10K-50K msg/s	50K-200K msg/s	WebSocket
Memory usage/client	~2KB	~5KB	SSE
CPU overhead	Low	Medium	SSE
Auto-reconnect	Built in (EventSource)	Must be implemented	SSE
HTTP/2 multiplexing	Yes (one stream per SSE response)	Limited (needs RFC 8441 support)	SSE
Binary data	Base64 encoded	Native support	WebSocket
Firewall/proxy friendly	Very good	Can be problematic	SSE
Complexity (dev)	Simple	Complex	SSE
Server resources/10K conn	~500MB RAM	~1.2GB RAM	SSE
Streaming LLM tokens	Good	Very good	WebSocket
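
The "Binary data" row reflects that SSE carries UTF-8 text only, so binary payloads have to be encoded first. A quick sketch of the size cost of Base64, using a 300-byte zero-filled buffer as a stand-in for real binary data (`base64Length` is a hypothetical helper):

```javascript
// Length in characters of the padded Base64 encoding of byteCount bytes.
// base64Length is a hypothetical helper for illustration.
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

const payload = Buffer.alloc(300); // 300 zero bytes standing in for binary data
const encoded = payload.toString('base64');

console.log(payload.length, encoded.length, base64Length(300));
```

Every 3 input bytes become 4 output characters, so the encoded payload is about one third larger than the raw bytes a WebSocket binary frame would carry.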

Overall score

Criterion	Weight	SSE	WebSocket
Performance	25%	★★★☆☆	★★★★★
Ease of Use	20%	★★★★★	★★★☆☆
Scalability	20%	★★★☆☆	★★★★☆
Reliability	15%	★★★★☆	★★★☆☆
Cost Efficiency	20%	★★★★★	★★★☆☆
Total score	100%	3.95/5	3.70/5
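
The totals follow from a weighted sum of the star ratings. The sketch below reproduces the arithmetic from the weights and stars in the table; weights are kept as integer percentages to avoid floating-point drift, and `weightedScore` is a hypothetical helper name.

```javascript
// Star ratings and weights copied from the table above.
const criteria = [
  { name: 'Performance',     weightPct: 25, sse: 3, ws: 5 },
  { name: 'Ease of Use',     weightPct: 20, sse: 5, ws: 3 },
  { name: 'Scalability',     weightPct: 20, sse: 3, ws: 4 },
  { name: 'Reliability',     weightPct: 15, sse: 4, ws: 3 },
  { name: 'Cost Efficiency', weightPct: 20, sse: 5, ws: 3 },
];

// Weighted average on a 5-point scale (hypothetical helper).
function weightedScore(rows, key) {
  const total = rows.reduce((sum, row) => sum + row.weightPct * row[key], 0);
  return total / 100;
}

const sseScore = weightedScore(criteria, 'sse');
const wsScore = weightedScore(criteria, 'ws');
console.log(sseScore, wsScore);
```

With these inputs the sums are 395/100 for SSE and 370/100 for WebSocket, that is 3.95 and 3.70.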

Example implementation code

1. Streaming with SSE (JavaScript Client)

// SSE Client - JavaScript
class SSEClient {
  constructor(baseUrl, apiKey) {
    this.baseUrl = baseUrl;
    this.apiKey = apiKey;
  }

  async *stream(prompt, model = 'gpt-4.1') {
    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({
        model: model,
        messages: [{ role: 'user', content: prompt }],
        stream: true,
        stream_options: { include_usage: true }
      }),
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') return;
          
          try {
            const parsed = JSON.parse(data);
            if (parsed.choices?.[0]?.delta?.content) {
              yield parsed.choices[0].delta.content;
            }
            if (parsed.usage) {
              console.log('Usage:', parsed.usage);
            }
          } catch (e) {
            // Skip invalid JSON
          }
        }
      }
    }
  }
}

// Usage with HolySheep AI
const client = new SSEClient('https://api.holysheep.ai/v1', 'YOUR_HOLYSHEEP_API_KEY');

async function main() {
  const output = document.getElementById('output');
  
  for await (const token of client.stream('Explain LLM streaming', 'gpt-4.1')) {
    output.textContent += token;
  }
}

main().catch(console.error);
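
The buffering step in the client above matters because a network chunk can end in the middle of a line. Here is a small sketch that isolates that logic into a pure function so the chunk-boundary behaviour is easy to see; `createSSEParser` is a hypothetical helper, not part of the HolySheep API or any library.

```javascript
// Hypothetical helper: accumulates text chunks and returns the payloads of
// every complete "data:" line seen so far.
function createSSEParser() {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is an incomplete line (or '')
    const payloads = [];
    for (const raw of lines) {
      // Tolerate CRLF line endings.
      const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw;
      if (line.startsWith('data:')) {
        // The format allows one optional space after the colon.
        const value = line.slice(5);
        payloads.push(value.startsWith(' ') ? value.slice(1) : value);
      }
    }
    return payloads;
  };
}

// Three chunks that split one JSON event and the [DONE] marker mid-line.
const push = createSSEParser();
const first = push('data: {"a"');
const second = push(':1}\n\ndata: [DO');
const third = push('NE]\n\n');
console.log(first, second, third);
```

Nothing is emitted for the first chunk because its line is not yet complete; the JSON payload appears only once the second chunk supplies the newline.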

2. Streaming with WebSocket (Python Client)

# WebSocket client in Python, using the websockets library
import asyncio
import json
import websockets
from websockets.exceptions import ConnectionClosed

class WebSocketStreamingClient:
    def __init__(self, api_key, base_url="wss://stream.holysheep.ai/v1/ws/chat"):
        self.api_key = api_key
        self.base_url = base_url

    async def stream(self, prompt, model="gpt-4.1"):
        """Stream the LLM response over WebSocket."""
        uri = f"{self.base_url}?model={model}&api_key={self.api_key}"
        
        try:
            async with websockets.connect(uri) as ws:
                # Send the request
                request = {
                    "type": "chat.completion",
                    "messages": [
                        {"role": "user", "content": prompt}
                    ],
                    "stream": True
                }
                await ws.send(json.dumps(request))

                # Receive the response
                full_response = ""
                async for message in ws:
                    data = json.loads(message)
                    
                    if data.get("type") == "content_delta":
                        token = data.get("content", "")
                        full_response += token
                        yield token
                        
                    elif data.get("type") == "usage":
                        print(f"Tokens used: {data}")
                        
                    elif data.get("type") == "done":
                        break
                        
        except ConnectionClosed as e:
            print(f"Connection closed: {e}")
            raise

async def main():
    client = WebSocketStreamingClient("YOUR_HOLYSHEEP_API_KEY")
    
    print("Streaming response:\n")
    async for token in client.stream(
        "Write Python code to implement a binary search tree",
        model="deepseek-v3.2"
    ):
        print(token, end="", flush=True)
    print("\n")

if __name__ == "__main__":
    asyncio.run(main())

3. Server-side: SSE Endpoint (Node.js/Express)

// Server-side SSE implementation - Node.js/Express (Node 18+ for global fetch)
const express = require('express');
const cors = require('cors');

const app = express();
app.use(cors());
app.use(express.json());

// SSE endpoint for LLM streaming
app.post('/api/stream/sse', async (req, res) => {
  const { prompt, model } = req.body;

  // Set headers for SSE (CORS headers are already added by the cors middleware)
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Flush headers immediately
  res.flushHeaders();

  try {
    // Call the HolySheep AI API with streaming enabled
    const response = await fetch('https://api.holysheep.ai/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${process.env.HOLYSHEEP_API_KEY}`,
      },
      body: JSON.stringify({
        model: model || 'gpt-4.1',
        messages: [{ role: 'user', content: prompt }],
        stream: true,
        // Needed for the usage chunk that the token count below relies on
        stream_options: { include_usage: true },
      }),
    });

    if (!response.ok) {
      res.write(`data: ${JSON.stringify({ error: 'API Error' })}\n\n`);
      res.end();
      return;
    }

    // Read the upstream stream and forward it to the client
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    let totalTokens = 0;

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Buffer partial lines: a network chunk can end in the middle of an event
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') continue; // sent exactly once after the loop

        // Forward SSE data to the client
        res.write(`${line}\n\n`);

        // Parse the payload to read the token count
        try {
          const parsed = JSON.parse(data);
          if (parsed.usage?.completion_tokens) {
            totalTokens = parsed.usage.completion_tokens;
          }
        } catch (e) {
          // Ignore payloads that are not valid JSON
        }
      }

      // res.flush exists only when the compression middleware is installed
      if (typeof res.flush === 'function') res.flush();
    }

    res.write('data: [DONE]\n\n');
    console.log(`Total tokens reported: ${totalTokens}`);
    res.end();

  } catch (error) {
    console.error('SSE Error:', error);
    res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
    res.end();
  }
});

// Simple health check. To keep an idle SSE stream alive, write comment lines
// (starting with ':') on the stream itself rather than relying on this route.
app.get('/api/health', (req, res) => {
  res.status(200).send('OK');
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`SSE Server running on port ${PORT}`);
});
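
Idle SSE connections can be closed by proxies or load balancers when no bytes flow for a while. One common remedy, sketched here under the assumption of a 15-second interval, is to write an SSE comment line periodically; clients ignore lines that start with a colon. `writeHeartbeat` and `startHeartbeat` are hypothetical helpers, and `res` is any object with a `write` method, such as an Express response.

```javascript
// Write one SSE comment line; clients ignore lines that begin with ':'.
function writeHeartbeat(res) {
  res.write(': ping\n\n');
}

// Start a periodic heartbeat and return a function that stops it.
// The 15000 ms default is an assumption; tune it below your proxy's idle timeout.
function startHeartbeat(res, intervalMs = 15000) {
  const timer = setInterval(() => writeHeartbeat(res), intervalMs);
  return () => clearInterval(timer);
}

// Demonstration with a stand-in response object that records writes.
const writes = [];
const fakeRes = { write: (chunk) => writes.push(chunk) };
writeHeartbeat(fakeRes);
const stop = startHeartbeat(fakeRes, 60000);
stop(); // stop immediately so the demo does not keep the process alive
console.log(writes);
```

Inside a route handler this would typically look like `const stop = startHeartbeat(res); req.on('close', stop);` so the timer is cleared when the client disconnects.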

4. Server-side: WebSocket Endpoint (Node.js)

// Server-side WebSocket implementation - Node.js
const { WebSocketServer, WebSocket } = require('ws');
const http = require('http');
const express = require('express');

const app = express();
app.use(express.json());

// HTTP server + WebSocket
const server = http.createServer(app);
const wss = new WebSocketServer({ server, path: '/ws/chat' });

wss.on('connection', (ws, req) => {
  const url = new URL(req.url, 'http://localhost');
  const apiKey = url.searchParams.get('api_key');
  const model = url.searchParams.get('model') || 'gpt-4.1';

  // Avoid logging any part of the API key
  console.log('New WebSocket connection');

  let holySheepWs = null;

  ws.on('message', (message) => {
    try {
      const data = JSON.parse(message.toString());

      if (data.type === 'chat.completion') {
        // Open a connection to the HolySheep WebSocket endpoint.
        // fetch() cannot speak wss://, so use a WebSocket client instead.
        const holyUri = `wss://stream.holysheep.ai/v1/ws/chat?model=${encodeURIComponent(model)}&api_key=${encodeURIComponent(apiKey)}`;
        if (holySheepWs) holySheepWs.terminate(); // drop any previous upstream
        holySheepWs = new WebSocket(holyUri, {
          headers: { 'Authorization': `Bearer ${apiKey}` }
        });
        const upstream = holySheepWs;

        // Send the request once the upstream connection is open
        upstream.on('open', () => upstream.send(JSON.stringify(data)));

        // Forward messages from HolySheep to the client
        upstream.on('message', (chunk) => {
          if (ws.readyState === WebSocket.OPEN) {
            ws.send(chunk.toString());
          }
        });

        upstream.on('close', () => {
          if (ws.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify({ type: 'done' }));
          }
        });

        upstream.on('error', (error) => {
          if (ws.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify({ type: 'error', message: error.message }));
          }
        });
      }
    } catch (error) {
      ws.send(JSON.stringify({ type: 'error', message: error.message }));
    }
  });

  ws.on('close', () => {
    console.log('Client disconnected');
    if (holySheepWs) holySheepWs.terminate();
  });

  ws.on('error', (error) => {
    console.error('WebSocket error:', error);
  });
});

// Health check
app.get('/health', (req, res) => {
  res.json({ 
    status: 'healthy',
    connections: wss.clients.size,
    uptime: process.uptime()
  });
});

const PORT = process.env.PORT || 8080;
server.listen(PORT, () => {
  console.log(`WebSocket server running on port ${PORT}`);
});

Common errors and how to fix them

1. CORS errors when using SSE

// ❌ ERROR: Access to fetch has been blocked by CORS policy
// Cause: missing CORS headers or wrong origin

// ✅ FIX 1: Add the full set of CORS headers
app.use((req, res, next) => {
  res.setHeader('Access-Control-Allow-Origin', '*');
  res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
  res.setHeader('Access-Control-Allow-Headers', 'Content-Type, Authorization');
  // Do not add Access-Control-Allow-Credentials: true together with a wildcard
  // origin; browsers reject that pairing. Name a specific origin if you need credentials.
  next();
});

// ✅ FIX 2: Use the cors middleware
const cors = require('cors');
app.use(cors({
  origin: '*', // Or specify a concrete domain
  methods: ['GET', 'POST'],
  allowedHeaders: ['Content-Type', 'Authorization']
}));

// ✅ FIX 3: Handle the preflight OPTIONS request
// (Express 4 syntax; Express 5 needs a named wildcard such as '/*splat')
app.options('*', cors());
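
When cookies or other credentials are involved, a wildcard origin is not accepted by browsers, so the server has to echo back a specific allowed origin. The following is a minimal sketch of that allowlist check; `corsHeadersFor` and the example origins are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: build CORS headers for a credentialed request.
// Returns an empty object when the origin is not on the allowlist.
function corsHeadersFor(requestOrigin, allowlist) {
  if (!allowlist.includes(requestOrigin)) return {};
  return {
    'Access-Control-Allow-Origin': requestOrigin, // echo the exact origin
    'Access-Control-Allow-Credentials': 'true',
    'Vary': 'Origin', // responses differ per origin, so caches must key on it
  };
}

// Example origins are placeholders, not real deployments.
const allowlist = ['https://app.example.com'];
const allowed = corsHeadersFor('https://app.example.com', allowlist);
const denied = corsHeadersFor('https://evil.example.net', allowlist);
console.log(allowed, denied);
```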

2. WebSocket Connection Refused / Timeout errors

// ❌ ERROR: WebSocket connection failed - ECONNREFUSED
// ❌ ERROR: The operation timed out

// ✅ FIX 1: Implement reconnection logic
class WebSocketClient {
  constructor(url, options = {}) {
    this.url = url;
    this.maxRetries = options.maxRetries || 5;
    this.retryDelay = options.retryDelay || 1000;
    this.retryCount = 0;
    this.closedByUser = false;
  }

  connect() {
    return new Promise((resolve, reject) => {
      const ws = new WebSocket(this.url);
      let opened = false;

      ws.onopen = () => {
        console.log('Connected');
        opened = true;
        this.retryCount = 0;
        resolve(ws);
      };

      ws.onerror = (error) => {
        console.error('WebSocket error:', error);
      };

      ws.onclose = (event) => {
        console.log(`Connection closed: ${event.code}`);
        if (this.closedByUser) return; // intentional close: do not retry
        const retry = this.handleReconnect();
        // If this socket never opened, settle the pending promise with the retry result
        if (!opened) retry.then(resolve, reject);
        else retry.catch((err) => console.error(err.message));
      };

      this.ws = ws;
    });
  }

  close() {
    this.closedByUser = true;
    if (this.ws) this.ws.close();
  }

  async handleReconnect() {
    if (this.retryCount >= this.maxRetries) {
      throw new Error('Max retries reached');
    }

    this.retryCount++;
    const delay = this.retryDelay * Math.pow(2, this.retryCount - 1);
    console.log(`Retrying in ${delay}ms (attempt ${this.retryCount})`);

    await new Promise(r => setTimeout(r, delay));
    return this.connect();
  }
}

// ✅ FIX 2: Use exponential backoff with an upper bound
const backoff = {
  delay: 1000,
  maxDelay: 30000,
  factor: 2,
  
  getDelay() {
    const d = this.delay;
    this.delay = Math.min(this.delay * this.factor, this.maxDelay);
    return d;
  },
  
  reset() {
    this.delay = 1000;
  }
};
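
To see what the backoff schedule looks like in numbers, here is a stateless sketch using the same base (1000 ms), factor (2), and cap (30000 ms) as above. The jitter variant is an addition not present in the original code: randomizing the delay is a common way to keep many clients from reconnecting at the same instant. Both function names are hypothetical.

```javascript
// Capped exponential delay for a zero-based attempt number.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, factor = 2 } = {}) {
  return Math.min(baseMs * Math.pow(factor, attempt), maxMs);
}

// "Full jitter" variant: pick a random delay between 0 and the capped value.
// The random source is injectable so the function can be tested deterministically.
function jitteredDelay(attempt, options, random = Math.random) {
  return Math.floor(random() * backoffDelay(attempt, options));
}

const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelay(attempt));
console.log(schedule);
```

The sixth attempt would be 32000 ms without the cap, so it is clamped to 30000 ms.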

3. SSE buffering / no data received

// ❌ ERROR: SSE events not firing, all data comes at once
// ❌ ERROR: Response buffered by proxy/gateway

// ✅ FIX 1: Disable buffering on the server
app.use((req, res, next) => {
  // Disable response caching
  res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
  res.setHeader('Pragma', 'no-cache');
  res.setHeader('Expires', '0');
  
  // Ask Nginx not to buffer this response. Do not overwrite res.flush with a
  // no-op: that would disable the real flush added by the compression middleware.
  res.setHeader('X-Accel-Buffering', 'no');
  
  next();
});

// ✅ FIX 2: Client side - read the stream directly
async function* streamSSE(url, options) {
  const response = await fetch(url, options);
  
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  
  // Read the body as a ReadableStream instead of using EventSource
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  
  while (true) {
    const { done, value } = await reader.read();
    
    if (done) break;
    
    // Keep the trailing partial line until the next chunk completes it
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';
    
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data !== '[DONE]') {
          try {
            yield JSON.parse(data);
          } catch {
            yield data;
          }
        }
      }
    }
  }
}

// ✅ FIX 3: Configure Nginx not to buffer
// Add to nginx.conf:
// proxy_buffering off;
// proxy_cache off;
// chunked_transfer_encoding on;

4. Memory leaks when handling many connections

// ❌ ERROR: Server memory usage keeps growing
// ❌ ERROR: Old SSE/WebSocket connections are never cleaned up

// ✅ FIX: Implement proper cleanup

class ConnectionManager {
  constructor(maxConnections = 10000) {
    this.connections = new Map();
    this.maxConnections = maxConnections;
    
    // Cleanup stale connections every 30s
    setInterval(() => this.cleanup(), 30000);
  }

  add(id, connection, metadata = {}) {
    if (this.connections.size >= this.maxConnections) {
      console.warn('Max connections reached');
      connection.close(1001, 'Server full');
      return false;
    }

    const entry = {
      connection,
      metadata,
      createdAt: Date.now(),
      lastActivity: Date.now(),
      messageCount: 0
    };

    // Cleanup handlers
    connection.on('close', () => this.remove(id));
    connection.on('error', () => this.remove(id));
    connection.on('message', () => {
      entry.lastActivity = Date.now();
      entry.messageCount++;
    });

    this.connections.set(id, entry);
    console.log(`Connections: ${this.connections.size}`);
    return true;
  }

  remove(id) {
    const entry = this.connections.get(id);
    if (entry) {
      this.connections.delete(id);
      console.log(`Connection ${id} removed. Active: ${this.connections.size}`);
    }
  }

  cleanup() {
    const now = Date.now();
    const timeout = 5 * 60 * 1000; // 5 minutes
    let cleaned = 0;

    for (const [id, entry] of this.connections) {
      if (now - entry.lastActivity > timeout) {
        entry.connection.close(1000, 'Timeout');
        this.connections.delete(id);
        cleaned++;
      }
    }

    if (cleaned > 0) {
      console.log(`Cleaned ${cleaned} stale connections`);
    }
  }

  getStats() {
    return {
      total: this.connections.size,
      max: this.maxConnections,
      memory: process.memoryUsage().heapUsed
    };
  }
}

Who each option suits

Criterion	Use SSE	Use WebSocket
Use case	Chat UI, notifications, simple live updates	Real-time games, collaborative editing, trading
Team size	Small startups, MVPs, indie developers	Larger teams with resources for infrastructure
Budget	Limited, cost must be optimized	Budget available for server infrastructure
Scale	Under 10K concurrent users	10K - 100K+ concurrent users
Bidirectional	❌ Not needed	✅ Client → server messages needed
Firewall	Strict corporate proxies	Developer environments, public APIs

🎯 Use SSE when:

Building a chatbot or AI assistant that streams its responses
Showing real-time metrics on a dashboard
Running a simple notification system
Supporting multiple platforms (web, mobile web)
The team has limited experience with WebSocket
You want to minimize infrastructure cost

🎯 Use WebSocket when:

The application sends commands from the client continuously (e.g. games)
You need collaborative features (e.g. Google Docs-style editing)
Data is exchanged at high frequency in both directions
You transfer binary data (e.g. image streaming)
You operate at large scale with many persistent two-way connections
Your infrastructure already supports WebSocket

Pricing and ROI

Provider	GPT-4.1	Claude Sonnet 4.5	Gemini 2.5 Flash	DeepSeek V3.2	Exchange rate
HolySheep AI	$8/MTok	$15/MTok	$2.50/MTok	$0.42/MTok	¥1=$1
OpenAI (US)	$15/MTok	N/A	N/A	N/A	$1=$1
Anthropic (US)	N/A	$18/MTok	N/A	N/A	$1=$1
Google	N/A	N/A	$3.50/MTok	N/A	$1=$1

Worked ROI example

Example: a chatbot application handling 1 million requests per month

Average tokens/request: 500 input tokens + 300 output tokens = 800 tokens
Total tokens/month: 1,000,000 × 800 = 800 million tokens = 800 MTok

Provider	Price/MTok	Cost/month	Difference vs OpenAI
OpenAI	$15	$12,000	-
Anthropic	$18	$14,400	+20% (more expensive)
HolySheep AI	$8	$6,400	-47% (saves $5,600/month)
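
The monthly figures can be reproduced with a few lines of arithmetic. This sketch uses the request volume, tokens per request, and per-MTok prices from the example above, and, like the example, applies a single blended price to input and output tokens; `monthlyCostUSD` is a hypothetical helper.

```javascript
// Monthly cost in USD for a given request volume and a blended per-MTok price.
function monthlyCostUSD(requests, tokensPerRequest, pricePerMTok) {
  const megaTokens = (requests * tokensPerRequest) / 1_000_000;
  return megaTokens * pricePerMTok;
}

const openaiCost = monthlyCostUSD(1_000_000, 800, 15);
const holysheepCost = monthlyCostUSD(1_000_000, 800, 8);
const savingUSD = openaiCost - holysheepCost;
const savingPct = (savingUSD / openaiCost) * 100;

console.log(openaiCost, holysheepCost, savingUSD, savingPct.toFixed(1));
```

With 800 MTok per month the difference is $5,600, or roughly 46.7% of the OpenAI figure, which the table rounds to 47%.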

With HolySheep AI:

Savings of $5,600/month = $67,200/year compared with OpenAI
The ¥1=$1 rate (payment via WeChat/Alipay) saves a further 7% in exchange fees
Total ROI: roughly 47% lower API spend in this example, before the exchange-fee saving