Calling AI APIs from Kotlin with Ktor: Concurrent Requests with Coroutines

In this article I share hands-on experience using the Ktor client in Kotlin to call AI APIs efficiently with coroutines. Across several projects, I have found that how you handle many concurrent API requests is a deciding factor in application performance.

Comparing AI API Services

Before getting into the code, here is a comparison of the providers:

Criterion	HolySheep AI	Official API	Other relay services
Exchange rate	¥1 = $1 (85%+ savings)	Original USD price	10-50% markup
Payment	WeChat/Alipay/Visa	Visa only	Limited
Average latency	<50ms	100-300ms	80-200ms
Free credits	Yes, on sign-up	No	Rarely
GPT-4.1	$8/MTok	$60/MTok	$50-55/MTok
Claude Sonnet 4.5	$15/MTok	$18/MTok	$16-17/MTok
Gemini 2.5 Flash	$2.50/MTok	$7.50/MTok	$6-7/MTok
DeepSeek V3.2	$0.42/MTok	$0.55/MTok	$0.50/MTok

Going by the prices and latency figures listed above, HolySheep AI is a strong fit for applications that need to handle many concurrent requests.

Installing Dependencies

First, add the dependencies to build.gradle.kts:

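plugins {
    kotlin("jvm") version "1.9.22"
    // Required so that @Serializable classes get generated serializers
    kotlin("plugin.serialization") version "1.9.22"
    id("io.ktor.plugin") version "2.3.7"
}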

repositories {
    mavenCentral()
}

dependencies {
    implementation("io.ktor:ktor-client-core:2.3.7")
    implementation("io.ktor:ktor-client-cio:2.3.7")
    implementation("io.ktor:ktor-client-content-negotiation:2.3.7")
    implementation("io.ktor:ktor-serialization-kotlinx-json:2.3.7")
    implementation("io.ktor:ktor-client-logging:2.3.7")
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3")
    implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.2")
}

Model Classes for Requests and Responses

Create data classes to serialize and deserialize the JSON payloads:

import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable

@Serializable
data class ChatMessage(
    val role: String,
    val content: String
)

@Serializable
data class ChatRequest(
    val model: String,
    val messages: List<ChatMessage>,
    @SerialName("max_tokens")
    val maxTokens: Int = 1000,
    val stream: Boolean = false
)

@Serializable
data class ChatResponse(
    val id: String,
    val model: String,
    val choices: List<Choice>,
    val usage: Usage? = null
)

@Serializable
data class Choice(
    val message: ChatMessage,
    @SerialName("finish_reason")
    val finishReason: String? = null
)

@Serializable
data class Usage(
    @SerialName("prompt_tokens")
    val promptTokens: Int,
    @SerialName("completion_tokens")
    val completionTokens: Int,
    @SerialName("total_tokens")
    val totalTokens: Int
)

AI API Client with Coroutines

This is the most important part, a client that handles concurrent requests with coroutines:

import io.ktor.client.*
import io.ktor.client.call.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.contentnegotiation.*
import io.ktor.client.plugins.logging.*
import io.ktor.client.request.*
import io.ktor.http.*
import io.ktor.serialization.kotlinx.json.*
import kotlinx.coroutines.*

class HolySheepAIClient(
    private val apiKey: String,
    private val baseUrl: String = "https://api.holysheep.ai/v1"
) {
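    private val httpClient = HttpClient(CIO) {
        expectSuccess = true  // throw ResponseException on non-2xx so errors are not parsed as ChatResponse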
        install(ContentNegotiation) {
            json(kotlinx.serialization.json.Json {
                prettyPrint = true
                isLenient = true
                ignoreUnknownKeys = true
                encodeDefaults = true  // otherwise fields equal to their default (max_tokens, stream) are omitted
            })
        }
        install(Logging) {
            logger = Logger.DEFAULT
            level = LogLevel.BODY
        }
        engine {
            requestTimeout = 60_000
            // CIO exposes per-endpoint settings through a single `endpoint` block
            endpoint {
                connectTimeout = 10_000
                socketTimeout = 30_000
            }
        }
    }

    suspend fun chat(model: String, messages: List<ChatMessage>): Result<ChatResponse> {
        return try {
            val response = httpClient.post("$baseUrl/chat/completions") {
                contentType(ContentType.Application.Json)
                header("Authorization", "Bearer $apiKey")
                setBody(ChatRequest(
                    model = model,
                    messages = messages,
                    maxTokens = 1000
                ))
            }
            Result.success(response.body<ChatResponse>())
        } catch (e: CancellationException) {
            throw e  // let coroutine cancellation and timeouts propagate
        } catch (e: Exception) {
            Result.failure(e)
        }
    }
}

// Usage inside a coroutine scope
suspend fun main() = coroutineScope {
    val client = HolySheepAIClient(apiKey = "YOUR_HOLYSHEEP_API_KEY")

    // Run 5 requests concurrently
    val models = listOf(
        "gpt-4.1" to "Explain what a coroutine is",
        "claude-sonnet-4.5" to "Compare async and sync",
        "gemini-2.5-flash" to "Applications of AI",
        "deepseek-v3.2" to "LLM trends in 2025",
        "gpt-4.1" to "Kotlin best practices"
    )

    val tasks = models.map { (model, prompt) ->
        async {
            val start = System.currentTimeMillis()
            val result = client.chat(model, listOf(ChatMessage("user", prompt)))
            val latency = System.currentTimeMillis() - start
            Triple(model, result, latency)
        }
    }

    val results = tasks.awaitAll()
    
    results.forEach { (model, result, latency) ->
        result.onSuccess { response ->
            println("[$model] ✓ (${latency}ms) - Tokens: ${response.usage?.totalTokens ?: 0}")
        }.onFailure { error ->
            println("[$model] ✗ Error: ${error.message}")
        }
    }
}
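
The effect of `async` plus `awaitAll` is easier to see without a network in the way. The following is a minimal sketch, not part of the client above: `fakeCall` and `fanOut` are hypothetical names, and `delay` stands in for the HTTP round trip. It assumes only the `kotlinx-coroutines-core` dependency already listed in the build file.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for a network call: suspends for latencyMs, then returns a label.
suspend fun fakeCall(name: String, latencyMs: Long): String {
    delay(latencyMs)
    return "$name done"
}

// Starts every call concurrently and waits for all results, returned in input order.
suspend fun fanOut(calls: List<Pair<String, Long>>): List<String> = coroutineScope {
    calls.map { (name, latency) -> async { fakeCall(name, latency) } }.awaitAll()
}

fun main() = runBlocking {
    val calls = listOf("a" to 300L, "b" to 200L, "c" to 100L)
    var results = emptyList<String>()
    val elapsed = measureTimeMillis { results = fanOut(calls) }
    println(results)                   // [a done, b done, c done]
    println("elapsed ≈ ${elapsed}ms")  // close to the slowest call, not the 600 ms sum
}
```

Because `awaitAll` preserves the order of the deferred list, results line up with the inputs even though the shorter calls finish first.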

Processing Batch Requests Efficiently

With coroutines, we can process hundreds of requests in parallel while capping how many are in flight at once:

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

class BatchAIProcessor(
    private val client: HolySheepAIClient,
    private val maxConcurrent: Int = 10  // Concurrency limit
) {
    private val semaphore = Semaphore(maxConcurrent)

    suspend fun processBatch(
        items: List<Pair<String, String>>  // Pair(model, prompt)
    ): List<BatchResult> = coroutineScope {
        items.mapIndexed { index, (model, prompt) ->
            async {
                semaphore.withPermit {
                    val start = System.currentTimeMillis()
                    val result = client.chat(model, listOf(ChatMessage("user", prompt)))
                    val latency = System.currentTimeMillis() - start
                    BatchResult(
                        index = index,
                        model = model,
                        success = result.isSuccess,
                        latencyMs = latency,
                        error = result.exceptionOrNull()?.message
                    )
                }
            }
        }.awaitAll()
    }

    // Streaming response (placeholder: no network call is made yet)
    fun streamChat(
        model: String,
        prompt: String,
        onChunk: (String) -> Unit
    ): Flow<String> = flow {
        val request = ChatRequest(
            model = model,
            messages = listOf(ChatMessage("user", prompt)),
            maxTokens = 2000,
            stream = true
        )
        // A real implementation would POST `request` and emit each chunk as it arrives
        emit("Processing...")
    }.flowOn(Dispatchers.IO)
}

data class BatchResult(
    val index: Int,
    val model: String,
    val success: Boolean,
    val latencyMs: Long,
    val error: String? = null
)

// Example: processing 50 requests
suspend fun processFiftyRequests() = coroutineScope {
    val client = HolySheepAIClient(apiKey = "YOUR_HOLYSHEEP_API_KEY")
    val processor = BatchAIProcessor(client, maxConcurrent = 10)

    val batch = (1..50).map { i ->
        when (i % 4) {
            0 -> "gpt-4.1" to "Question $i about AI"
            1 -> "claude-sonnet-4.5" to "Question $i about Kotlin"
            2 -> "gemini-2.5-flash" to "Question $i about programming"
            else -> "deepseek-v3.2" to "Question $i about tech"
        }
    }

    val startTime = System.currentTimeMillis()
    val results = processor.processBatch(batch)
    val totalTime = System.currentTimeMillis() - startTime

    val successCount = results.count { it.success }
    val avgLatency = results.map { it.latencyMs }.average()
    val throughput = results.size * 1000.0 / totalTime

    println("=== Batch results ===")
    println("Total requests: ${results.size}")
    println("Succeeded: $successCount")
    println("Total time: ${totalTime}ms")
    println("Avg latency: ${avgLatency.toLong()}ms")
    println("Throughput: ${"%.2f".format(throughput)} req/s")
}
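
To sanity-check the numbers such a batch run prints, a rough back-of-the-envelope model helps. The sketch below is an assumption-laden estimate, not a measurement: `estimateBatchTimeMs` is a hypothetical helper, it assumes every request takes about the same time, and the 800 ms latency is an illustrative value only.

```kotlin
import kotlin.math.ceil

// Rough model: requests run in "waves" of maxConcurrent,
// and each wave takes about avgLatencyMs (assumes uniform latency).
fun estimateBatchTimeMs(totalRequests: Int, maxConcurrent: Int, avgLatencyMs: Long): Long {
    val waves = ceil(totalRequests.toDouble() / maxConcurrent).toLong()
    return waves * avgLatencyMs
}

fun main() {
    // 50 requests, 10 at a time, hypothetical 800 ms per request
    val totalMs = estimateBatchTimeMs(totalRequests = 50, maxConcurrent = 10, avgLatencyMs = 800)
    val throughput = 50 * 1000.0 / totalMs
    println("$totalMs ms total, $throughput req/s")  // 4000 ms total, 12.5 req/s
}
```

If measured throughput falls well below this kind of estimate, the usual suspects are rate limiting on the server side or a few slow requests holding permits.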

Retry Logic with Exponential Backoff

An important best practice when calling APIs is handling transient errors:

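import io.ktor.client.plugins.ResponseException
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.delay
import kotlin.random.Random

suspend fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    baseDelayMs: Long = 1000,
    maxDelayMs: Long = 10000,
    block: suspend () -> T
): Result<T> {
    repeat(maxAttempts) { attempt ->
        try {
            return Result.success(block())
        } catch (e: CancellationException) {
            throw e  // never retry or swallow coroutine cancellation
        } catch (e: Exception) {
            // Prefer the real HTTP status when Ktor threw a ResponseException
            // (this assumes the client uses expectSuccess = true); otherwise
            // fall back to looking for the status code in the message.
            val status = (e as? ResponseException)?.response?.status?.value
            val isRetryable = when {
                status != null -> status == 429 || status == 500 || status == 503
                e.message?.contains("429") == true -> true  // Rate limit
                e.message?.contains("500") == true -> true  // Server error
                e.message?.contains("503") == true -> true  // Service unavailable
                else -> false
            }

            if (!isRetryable || attempt == maxAttempts - 1) {
                return Result.failure(e)
            }

            // Exponential backoff plus up to 1 s of random jitter, capped at maxDelayMs
            val delayMs = minOf(
                baseDelayMs * (1 shl attempt) + Random.nextLong(0, 1000),
                maxDelayMs
            )
            println("Retry ${attempt + 1}/$maxAttempts in ${delayMs}ms...")
            delay(delayMs)
        }
    }
    return Result.failure(Exception("Unexpected retry failure"))
}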

// Using retry with the AI client
suspend fun callWithRetry(client: HolySheepAIClient) {
    val result = retryWithBackoff(maxAttempts = 3) {
        client.chat(
            model = "gpt-4.1",
            messages = listOf(ChatMessage("user", "Hello!"))
        ).getOrThrow()
    }

    result.onSuccess { println("Success: ${it.choices.first().message.content}") }
        .onFailure { println("Failed: ${it.message}") }
}
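
To make the delay schedule concrete, here is a small sketch of the deterministic part of the backoff formula used above. `backoffDelayMs` is a hypothetical helper introduced for illustration; it deliberately leaves out the random jitter (up to 1000 ms) that the retry function adds before capping.

```kotlin
// Delay before retry number `attempt` (0-based), without jitter:
// baseDelayMs * 2^attempt, capped at maxDelayMs.
fun backoffDelayMs(attempt: Int, baseDelayMs: Long = 1000, maxDelayMs: Long = 10000): Long =
    minOf(baseDelayMs * (1L shl attempt), maxDelayMs)

fun main() {
    val schedule = (0 until 5).map { backoffDelayMs(it) }
    println(schedule)  // [1000, 2000, 4000, 8000, 10000]
}
```

With the default `maxAttempts = 3`, only the first two entries are ever used, since the last attempt returns the failure instead of waiting.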

Common Errors and How to Fix Them

1. 401 Unauthorized - Wrong API Key

Description: the request is rejected with "Invalid authentication credentials".

// ❌ WRONG - hard-coded or invalid key
// val client = HolySheepAIClient(apiKey = "sk-xxx")

// ✅ RIGHT - read the key from the environment and validate it
fun createClient(): HolySheepAIClient {
    val apiKey = System.getenv("HOLYSHEEP_API_KEY")
        ?: throw IllegalStateException("Please set HOLYSHEEP_API_KEY")

    if (!apiKey.startsWith("hsa-")) {
        throw IllegalArgumentException("API key must start with 'hsa-'")
    }

    return HolySheepAIClient(apiKey = apiKey)
}

// Or verify the key with a small test call before real use
suspend fun verifyKey(client: HolySheepAIClient): Boolean {
    return client.chat(
        model = "gpt-4.1",
        messages = listOf(ChatMessage("user", "test"))
    ).fold(
        onSuccess = { true },
        onFailure = { 
            println("Authentication check failed: ${it.message}")
            false 
        }
    )
}

2. 429 Rate Limit - Too Many Requests

Description: requests are blocked because too many were sent in a short period.

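import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

class RateLimiter(
    private val requestsPerSecond: Int = 10
) {
    // Bounded channel used as the token bucket; its capacity caps the burst size
    private val channel = Channel<Unit>(capacity = requestsPerSecond)
    private val scope = CoroutineScope(Dispatchers.Default)

    init {
        // Simple token bucket: top up to requestsPerSecond tokens every second
        scope.launch {
            while (isActive) {
                repeat(requestsPerSecond) {
                    channel.trySend(Unit)  // dropped when the bucket is already full
                }
                delay(1000)
            }
        }
    }

    // Suspends until a token is available
    suspend fun acquire() {
        channel.receive()
    }

    // Stop the refill coroutine when the limiter is no longer needed
    fun close() = scope.cancel()
}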

// Using the rate limiter
class HolySheepRateLimitedClient(
    private val apiKey: String,
    private val rateLimiter: RateLimiter = RateLimiter(requestsPerSecond = 10)
) {
    private val client = HolySheepAIClient(apiKey)

    suspend fun chat(model: String, prompt: String): Result<ChatResponse> {
        return retryWithBackoff(maxAttempts = 3) {
            rateLimiter.acquire()  // Suspends until a token is available
            client.chat(model, listOf(ChatMessage("user", prompt)))
                .getOrThrow()
        }
    }
}

// Tune the rate limit per model
val rateLimits = mapOf(
    "gpt-4.1" to 10,
    "claude-sonnet-4.5" to 8,
    "gemini-2.5-flash" to 20,
    "deepseek-v3.2" to 30
)
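
The token bucket idea behind the rate limiter can be checked in isolation. The following is a minimal, single-threaded sketch with an injectable clock so that its behavior is deterministic; `TokenBucket`, `tryAcquire`, and `nowMs` are hypothetical names, and this is a different (non-suspending, not thread-safe) formulation from the channel-based limiter above. The limit of 8 per second mirrors the `claude-sonnet-4.5` entry in the map.

```kotlin
// Hypothetical single-threaded token bucket with an injectable clock (not thread-safe).
class TokenBucket(
    private val capacity: Int,
    private val refillPerSecond: Int,
    private val nowMs: () -> Long
) {
    private var tokens = capacity.toDouble()
    private var lastRefillMs = nowMs()

    // Returns true and consumes one token if one is available right now.
    fun tryAcquire(): Boolean {
        val now = nowMs()
        val elapsedMs = now - lastRefillMs
        // Refill in proportion to elapsed time, never above capacity
        tokens = minOf(capacity.toDouble(), tokens + elapsedMs * refillPerSecond / 1000.0)
        lastRefillMs = now
        if (tokens < 1.0) return false
        tokens -= 1.0
        return true
    }
}

fun main() {
    var fakeNow = 0L
    val bucket = TokenBucket(capacity = 8, refillPerSecond = 8, nowMs = { fakeNow })

    val burst = (1..10).count { bucket.tryAcquire() }
    println("granted in initial burst: $burst")        // granted in initial burst: 8

    fakeNow += 500  // half a second later, half the tokens are back
    val afterHalfSecond = (1..10).count { bucket.tryAcquire() }
    println("granted after 500 ms: $afterHalfSecond")  // granted after 500 ms: 4
}
```

Passing the clock in as a function keeps the logic testable without sleeping; production code would pass something like `System::currentTimeMillis` and add synchronization.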

3. Timeout - Request Hangs Too Long

Description: the request gets no response within the allowed wait time.

import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.*
import kotlinx.coroutines.withTimeoutOrNull
import java.util.concurrent.TimeoutException

// ❌ Timeouts that are too short, or disabled entirely
// httpClient.config { engine { requestTimeout = 0 } }

// ✅ Reasonable timeouts
val httpClient = HttpClient(CIO) {
    install(HttpTimeout) {
        requestTimeoutMillis = 60_000
        connectTimeoutMillis = 10_000
        socketTimeoutMillis = 30_000
    }
}

// Streaming and other long generations need their own, longer deadline
suspend fun streamWithTimeout(
    client: HolySheepAIClient,
    timeoutMs: Long = 120_000
): Result<ChatResponse> = withTimeoutOrNull(timeoutMs) {
    client.chat(
        model = "gpt-4.1",
        messages = listOf(ChatMessage("user", "Write some complex code..."))
    )
    // TimeoutCancellationException has no public constructor, so report a plain TimeoutException
} ?: Result.failure(TimeoutException("Request exceeded ${timeoutMs}ms"))

// Or use supervisorScope so one child's failure does not cancel its siblings
suspend fun parallelWithTimeout(client: HolySheepAIClient) = supervisorScope {
    val longTask = launch {
        withTimeoutOrNull(30_000) {
            client.chat("gpt-4.1", listOf(ChatMessage("user", "...")))
        }
    }

    val quickTask = launch {
        client.chat("deepseek-v3.2", listOf(ChatMessage("user", "Hi")))
    }

    // quickTask still completes even if longTask times out
}

4. JSON Parse Error - Malformed Response

Description: the response from the API cannot be parsed.

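import io.ktor.client.call.body
import io.ktor.client.statement.HttpResponse
import kotlinx.serialization.json.*

// A more forgiving JSON parser configuration
val json = Json {
    ignoreUnknownKeys = true  // Skip fields the data classes do not declare
    isLenient = true          // Accept unquoted keys and string values
    coerceInputValues = true  // Use the default for an explicit null or an unknown enum value
    decodeEnumsCaseInsensitive = true  // Experimental API in kotlinx.serialization
}

// Handler for non-standard responses
fun handleFlexibleResponse(
    responseBody: String
): ChatResponse? {
    return try {
        json.decodeFromString<ChatResponse>(responseBody)
    } catch (e: Exception) {
        println("Parse error: ${e.message}")
        // Fall back to parsing as a raw JSON tree for diagnostics
        try {
            val raw = json.parseToJsonElement(responseBody)
            println("Raw response: $raw")
        } catch (e2: Exception) {
            println("Could not parse: $responseBody")
        }
        null
    }
}

// Hypothetical exception type; the original snippet uses it without defining it
class RateLimitException(message: String) : Exception(message)

// Map error responses to failures
suspend fun handleErrorResponse(
    response: HttpResponse
): Result<ChatResponse> {
    return when (response.status.value) {
        200 -> Result.success(response.body<ChatResponse>())
        400 -> Result.failure(IllegalArgumentException("Invalid request"))
        401 -> Result.failure(SecurityException("Invalid API key"))
        429 -> Result.failure(RateLimitException("Rate limited"))
        // The source text is cut off here; this catch-all branch is an assumption
        else -> Result.failure(IllegalStateException("Unexpected HTTP status ${response.status.value}"))
    }
}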