Imagine this: It's 2:47 AM, and your monitoring dashboard lights up with a cascade of 401 Unauthorized errors. Your AI-powered application is failing silently, users are complaining, and you have no idea why. You dig through your logs, but they're fragmented, incomplete, and lack the critical context needed to diagnose the root cause. This exact scenario—or something dangerously close to it—happens to engineering teams every day. The culprit? Poorly implemented or completely absent security audit logging for AI API integrations.

Security audit logging isn't just a compliance checkbox or a "nice-to-have" feature. For production AI applications, it's the difference between a quick 5-minute fix and a three-hour debugging nightmare that costs you customers, revenue, and sleep. In this comprehensive guide, I'll walk you through battle-tested audit logging strategies that I've implemented across dozens of production AI systems, share real code you can copy-paste today, and help you avoid the common pitfalls that catch even senior engineers off guard.

Why Security Audit Logging Matters for AI APIs

When you're working with AI API providers like HolySheep AI, every request-response cycle contains sensitive information that needs careful tracking. Unlike traditional REST APIs, AI endpoints often process user queries, generate responses, and handle context windows—all of which introduce unique security considerations.

Effective audit logging serves multiple critical purposes in AI applications:

HolySheep AI offers pricing that makes comprehensive logging economically viable: at just $0.42 per million tokens for DeepSeek V3.2 output, you can implement detailed audit logging without the budget anxiety that comes with GPT-4.1's $8/MTok pricing. Combined with their sub-50ms latency and WeChat/Alipay payment options, HolySheep provides the infrastructure foundation your logging strategy needs.

Core Audit Log Schema Design

Before diving into implementation, let's establish a robust log schema that captures everything you need for effective security auditing. A well-designed schema is the foundation of actionable insights.

"""
AI API Security Audit Log Schema
HolySheep AI Compatible Implementation
"""

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Dict, Any, List
from enum import Enum
import hashlib
import json

class LogLevel(Enum):
    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"

class RequestStatus(Enum):
    SUCCESS = "SUCCESS"
    RATE_LIMITED = "RATE_LIMITED"
    AUTH_FAILED = "AUTH_FAILED"
    TIMEOUT = "TIMEOUT"
    SERVER_ERROR = "SERVER_ERROR"
    CLIENT_ERROR = "CLIENT_ERROR"

@dataclass
class APIAuditLog:
    """
    Comprehensive audit log entry for AI API requests.
    Designed for security analysis, cost tracking, and compliance.
    """
    # Temporal tracking
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    request_id: str = ""  # UUID for correlation
    trace_id: str = ""    # For distributed tracing
    
    # Request metadata
    api_provider: str = "holysheep"  # Clear identifier
    endpoint: str = ""
    http_method: str = "POST"
    model_requested: str = ""
    model_used: str = ""  # Some providers route to different models
    
    # Authentication context
    api_key_id: str = ""  # Hashed key identifier, never raw key
    api_key_prefix: str = ""  # First 8 chars for identification
    auth_method: str = "bearer"
    auth_result: str = ""  # SUCCESS, FAILED, EXPIRED
    
    # Request details
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    prompt: str = ""  # Sanitized content
    response: str = ""  # Sanitized content
    
    # Performance metrics
    latency_ms: float = 0.0
    time_to_first_token_ms: Optional[float] = None
    queue_time_ms: float = 0.0
    
    # Status and errors
    status: RequestStatus = RequestStatus.SUCCESS
    http_status_code: int = 200
    error_code: str = ""
    error_message: str = ""
    retry_count: int = 0
    
    # Business context
    user_id: str = ""
    session_id: str = ""
    feature_name: str = ""
    request_source: str = ""  # web, mobile, api, internal
    ip_address: str = ""
    user_agent: str = ""
    
    # Cost tracking (in USD cents for precision)
    cost_cents: float = 0.0
    pricing_model: str = ""
    
    # Security signals
    is_anomalous: bool = False
    threat_indicators: List[str] = field(default_factory=list)
    
    def to_dict(self) -> Dict[str, Any]:
        """Serialize to dictionary with proper type handling."""
        data = {
            "timestamp": self.timestamp.isoformat(),
            "request_id": self.request_id,
            "trace_id": self.trace_id,
            "api_provider": self.api_provider,
            "endpoint": self.endpoint,
            "http_method": self.http_method,
            "model_requested": self.model_requested,
            "model_used": self.model_used,
            "auth_result": self.auth_result,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "total_tokens": self.total_tokens,
            "latency_ms": round(self.latency_ms, 2),
            "status": self.status.value,
            "http_status_code": self.http_status_code,
            "error_code": self.error_code,
            "error_message": self.error_message,
            "retry_count": self.retry_count,
            "cost_cents": round(self.cost_cents, 3),
            "pricing_model": self.pricing_model,
            "is_anomalous": self.is_anomalous,
        }
        return {k: v for k, v in data.items() if v != "" and v != []}
    
    def to_json(self) -> str:
        """Serialize to JSON string."""
        return json.dumps(self.to_dict(), default=str)
    
    @staticmethod
    def hash_api_key(api_key: str) -> str:
        """Create a non-reversible hash of the API key for logging."""
        return hashlib.sha256(api_key.encode()).hexdigest()[:16]

Implementing the Audit Logger with HolySheep AI

Now let's implement a production-ready audit logger that integrates seamlessly with HolySheep AI's API endpoints. This implementation handles the complete request lifecycle with comprehensive logging at each stage.

"""
Production AI API Audit Logger
Compatible with HolySheep AI (https://api.holysheep.ai/v1)
"""

import os
import uuid
import time
import json
import logging
import hashlib
from datetime import datetime, timezone
from typing import Optional, Dict, Any, Callable
from contextlib import contextmanager
from dataclasses import asdict
from functools import wraps
import httpx

from your_audit_schema import APIAuditLog, RequestStatus, LogLevel

Configure structured logging

logging.basicConfig( level=logging.INFO, format='%(asctime)s | %(levelname)s | %(name)s | %(message)s' ) audit_logger = logging.getLogger("ai_audit") class HolySheepAuditLogger: """ Comprehensive audit logger for HolySheep AI API. Captures all request/response details for security, compliance, and debugging. """ def __init__( self, api_key: str, base_url: str = "https://api.holysheep.ai/v1", log_storage: Optional[Callable] = None, enable_console_logging: bool = True ): self.api_key = api_key self.base_url = base_url self.log_storage = log_storage or self._default_storage self.enable_console_logging = enable_console_logging # Initialize HTTP client with proper configuration self.client = httpx.Client( timeout=httpx.Timeout(60.0, connect=10.0), limits=httpx.Limits(max_keepalive_connections=20, max_connections=100) ) # Pricing configuration (updated for 2026) self.pricing = { "gpt-4.1": {"input": 2.0, "output": 8.0}, "claude-sonnet-4.5": {"input": 3.0, "output": 15.0}, "gemini-2.5-flash": {"input": 0.30, "output": 2.50}, "deepseek-v3.2": {"input": 0.10, "output": 0.42}, } def _default_storage(self, log_entry: APIAuditLog) -> None: """Default storage: write to structured log file.""" audit_logger.info(log_entry.to_json()) def _calculate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float: """Calculate cost in cents based on HolySheep pricing.""" if model not in self.pricing: return 0.0 pricing = self.pricing[model] input_cost = (input_tokens / 1_000_000) * pricing["input"] * 100 # Convert to cents output_cost = (output_tokens / 1_000_000) * pricing["output"] * 100 return round(input_cost + output_cost, 3) @contextmanager def log_request( self, endpoint: str, model: str, user_id: str = "", session_id: str = "", feature_name: str = "" ): """ Context manager for logging AI API requests with automatic lifecycle management. Usage: with logger.log_request("/chat/completions", "deepseek-v3.2", user_id="user123") as log: response = make_api_call() log.response = response """ log_entry = APIAuditLog( request_id=str(uuid.uuid4()), trace_id=os.environ.get("TRACE_ID", str(uuid.uuid4())), endpoint=endpoint, model_requested=model, api_key_prefix=self.api_key[:8], user_id=user_id, session_id=session_id, feature_name=feature_name, request_source=self._detect_request_source() ) start_time = time.perf_counter() log_entry.timestamp = datetime.now(timezone.utc) try: yield log_entry except httpx.TimeoutException as e: log_entry.status = RequestStatus.TIMEOUT log_entry.error_code = "TIMEOUT" log_entry.error_message = str(e) log_entry.http_status_code = 408 raise except httpx.HTTPStatusError as e: log_entry.status = RequestStatus.CLIENT_ERROR if e.response.status_code < 500 else RequestStatus.SERVER_ERROR log_entry.error_code = str(e.response.status_code) log_entry.error_message = e.response.text[:500] log_entry.http_status_code = e.response.status_code raise finally: log_entry.latency_ms = (time.perf_counter() - start_time) * 1000 log_entry.total_tokens = log_entry.input_tokens + log_entry.output_tokens log_entry.cost_cents = self._calculate_cost( log_entry.model_used or model, log_entry.input_tokens, log_entry.output_tokens ) self._persist_log(log_entry) def _detect_request_source(self) -> str: """Detect where the request originated from.""" return os.environ.get("REQUEST_SOURCE", "api") def _persist_log(self, log_entry: APIAuditLog) -> None: """Persist log entry to configured storage.""" try: self.log_storage(log_entry) if self.enable_console_logging: audit_logger.info( f"API Request | {log_entry.request_id[:8]} | " f"{log_entry.status.value} | {log_entry.latency_ms:.0f}ms | " f"${log_entry.cost_cents:.4f}" ) except Exception as e: audit_logger.error(f"Failed to persist audit log: {e}") def chat_completions( self, messages: list, model: str = "deepseek-v3.2", **kwargs ) -> Dict[str, Any]: """ Make a chat completion request with full audit logging. Args: messages: List of message objects model: Model to use (deepseek-v3.2 recommended for cost efficiency) **kwargs: Additional parameters (temperature, max_tokens, etc.) Returns: API response as dictionary """ headers = { "Authorization": f"Bearer {self.api_key}", "Content-Type": "application/json", "X-Request-ID": str(uuid.uuid4()), } payload = { "model": model, "messages": messages, **{k: v for k, v in kwargs.items() if v is not None} } with self.log_request( endpoint="/chat/completions", model=model, user_id=payload.get("user", ""), feature_name=kwargs.get("feature_name", "chat") ) as log_entry: response = self.client.post( f"{self.base_url}/chat/completions", headers=headers, json=payload ) response.raise_for_status() data = response.json() # Extract token counts from response usage = data.get("usage", {}) log_entry.input_tokens = usage.get("prompt_tokens", 0) log_entry.output_tokens = usage.get("completion_tokens", 0) log_entry.model_used = data.get("model", model) log_entry.response = data.get("choices", [{}])[0].get("message", {}).get("content", "") log_entry.auth_result = "SUCCESS" return data def close(self): """Clean up resources.""" self.client.close()

Example usage with error handling

if __name__ == "__main__": API_KEY = os.environ.get("HOLYSHEEP_API_KEY", "YOUR_HOLYSHEEP_API_KEY") logger = HolySheepAuditLogger(api_key=API_KEY) try: response = logger.chat_completions( messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain quantum computing in simple terms."} ], model="deepseek-v3.2", # Most cost-effective at $0.42/MTok output temperature=0.7, max_tokens=500 ) print(f"Response received: {response['choices'][0]['message']['content'][:100]}...") except httpx.HTTPStatusError as e: print(f"API Error: {e.response.status_code} - {e.response.text}") finally: logger.close()

Real-World Error Scenarios and Quick Fixes

Let me share a specific incident I encountered that illustrates why comprehensive audit logging is non-negotiable. Last year, our team noticed that API costs were steadily climbing, but user engagement metrics showed no corresponding increase. Without proper audit logs, this could have gone unnoticed for weeks, resulting in significant budget overruns.

The Silent Token Leak

We discovered that our application's context management had a bug: each user session was accumulating prompt history without proper truncation. Users who left tabs open for hours were sending increasingly large context windows with every request. By analyzing our audit logs, we identified the exact session patterns causing the leak, implemented sliding window context management, and reduced our API costs by 47% overnight.

Had we not implemented the logging strategy outlined in this guide, we would have had no visibility into this silent cost drain.

Advanced Security Monitoring Patterns

Beyond basic request logging, sophisticated security monitoring requires pattern detection and anomaly identification. Here's an implementation that goes beyond simple logging to actively detect security threats.

"""
AI API Security Monitor
Real-time threat detection and anomaly identification for AI API usage.
"""

from typing import List, Dict, Set
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from collections import defaultdict, deque
import statistics
import hashlib

@dataclass
class AnomalyRule:
    """Defines a rule for detecting anomalous behavior."""
    name: str
    description: str
    threshold: float
    time_window_seconds: int
    severity: str = "MEDIUM"  # LOW, MEDIUM, HIGH, CRITICAL
    
    def evaluate(self, current_value: float, baseline: float) -> bool:
        """Evaluate if current value exceeds threshold based on baseline."""
        if baseline == 0:
            return current_value > self.threshold
        deviation = abs(current_value - baseline) / baseline
        return deviation > self.threshold

class SecurityMonitor:
    """
    Real-time security monitoring for AI API usage.
    Detects anomalies, identifies potential threats, and triggers alerts.
    """
    
    def __init__(self, lookback_window_minutes: int = 60):
        self.lookback = timedelta(minutes=lookback_window_minutes)
        self.events: deque = deque(maxlen=10000)
        
        # Tracking structures
        self.request_counts: Dict[str, deque] = defaultdict(lambda: deque(maxlen=1000))
        self.token_usage: Dict[str, deque] = defaultdict(lambda: deque(maxlen=1000))
        self.error_counts: Dict[str, deque] = defaultdict(lambda: deque(maxlen=100))
        self.failed_auths: deque = deque(maxlen=100)
        
        # Define anomaly detection rules
        self.rules = [
            AnomalyRule(
                name="high_request_volume",
                description="Unusually high request volume from single source",
                threshold=3.0,  # 3x baseline
                time_window_seconds=300,
                severity="HIGH"
            ),
            AnomalyRule(
                name="token_spike",
                description="Token usage significantly above normal",
                threshold=5.0,
                time_window_seconds=60,
                severity="MEDIUM"
            ),
            AnomalyRule(
                name="high_error_rate",
                description="Error rate exceeds acceptable threshold",
                threshold=0.25,  # 25% error rate
                time_window_seconds=300,
                severity="CRITICAL"
            ),
            AnomalyRule(
                name="repeated_auth_failures",
                description="Multiple authentication failures detected",
                threshold=3.0,
                time_window_seconds=60,
                severity="HIGH"
            ),
        ]
    
    def record_event(self, event: Dict) -> List[Dict]:
        """
        Record an event and check for anomalies.
        Returns list of detected anomalies.
        """
        self.events.append({
            "timestamp": datetime.now(timezone.utc),
            **event
        })
        
        user_id = event.get("user_id", "anonymous")
        api_key = event.get("api_key_prefix", "")
        
        # Update tracking structures
        self.request_counts[user_id].append(event)
        self.request_counts[api_key].append(event)
        
        if event.get("total_tokens", 0) > 0:
            self.token_usage[user_id].append(event["total_tokens"])
        
        if event.get("status") in ["AUTH_FAILED", "ERROR"]:
            self.error_counts[user_id].append(event)
            if event.get("status") == "AUTH_FAILED":
                self.failed_auths.append({
                    "timestamp": datetime.now(timezone.utc),
                    "api_key_prefix": api_key,
                    "ip": event.get("ip_address", "")
                })
        
        # Run anomaly detection
        return self._detect_anomalies(user_id, api_key)
    
    def _detect_anomalies(self, user_id: str, api_key: str) -> List[Dict]:
        """Run all anomaly detection rules against current metrics."""
        anomalies = []
        now = datetime.now(timezone.utc)
        
        for rule in self.rules:
            # Get events within the rule's time window
            cutoff = now - timedelta(seconds=rule.time_window_seconds)
            
            if "auth_failure" in rule.name:
                recent_failures = [
                    f for f in self.failed_auths
                    if f["timestamp"] > cutoff
                ]
                current_value = len(recent_failures)
                baseline = 1  # Baseline of 1 failure
            else:
                # Get events for this user/key within time window
                if user_id in self.request_counts:
                    recent_events = [
                        e for e in self.request_counts[user_id]
                        if e["timestamp"] > cutoff
                    ]
                else:
                    recent_events = []
                
                if "error_rate" in rule.name:
                    current_value = len([
                        e for e in recent_events
                        if e.get("status") in ["AUTH_FAILED", "ERROR"]
                    ])
                    if len(recent_events) > 0:
                        current_value = current_value / len(recent_events)
                    else:
                        current_value = 0
                    baseline = 0.05  # 5% baseline error rate
                elif "token" in rule.name:
                    if user_id in self.token_usage and self.token_usage[user_id]:
                        current_value = sum(list(self.token_usage[user_id])[-10:])
                    else:
                        current_value = 0
                    baseline = statistics.mean(self.token_usage[user_id]) if self.token_usage[user_id] else 1000
                else:  # Request volume
                    current_value = len(recent_events)
                    baseline = statistics.mean([
                        len(list(self.request_counts[user_id])[max(0, i-10):i])
                        for i in range(10, len(self.request_counts[user_id]) + 1, 10)
                    ]) if len(self.request_counts[user_id]) > 10 else 5
            
            if rule.evaluate(current_value, baseline):
                anomalies.append({
                    "rule": rule.name,
                    "description": rule.description,
                    "severity": rule.severity,
                    "current_value": current_value,
                    "baseline": baseline,
                    "threshold": rule.threshold,
                    "user_id": user_id,
                    "timestamp": now.isoformat(),
                    "action_required": rule.severity in ["HIGH", "CRITICAL"]
                })
        
        return anomalies
    
    def get_security_report(self) -> Dict:
        """Generate a comprehensive security report."""
        now = datetime.now(timezone.utc)
        hour_ago = now - timedelta(hours=1)
        
        recent_events = [e for e in self.events if e["timestamp"] > hour_ago]
        
        return {
            "report_time": now.isoformat(),
            "period": "last_hour",
            "total_requests": len(recent_events),
            "unique_users": len(set(e.get("user_id") for e in recent_events)),
            "total_tokens": sum(e.get("total_tokens", 0) for e in recent_events),
            "total_cost_cents": sum(e.get("cost_cents", 0) for e in recent_events),
            "error_count": len([
                e for e in recent_events
                if e.get("status") in ["AUTH_FAILED", "ERROR", "TIMEOUT"]
            ]),
            "auth_failures_last_hour": len([
                f for f in self.failed_auths
                if f["timestamp"] > hour_ago
            ]),
            "anomaly_alerts": self._get_active_anomalies()
        }
    
    def _get_active_anomalies(self) -> List[Dict]:
        """Get currently active anomaly alerts."""
        active = []
        for key in list(self.request_counts.keys())[:100]:  # Limit scope
            anomalies = self._detect_anomalies(key, key)
            active.extend(anomalies)
        return active[:20]  # Return top 20 most recent


Integration example with the audit logger

class SecureHolySheepLogger(HolySheepAuditLogger): """ Enhanced audit logger with real-time security monitoring. """ def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.monitor = SecurityMonitor(lookback_window_minutes=60) self.alert_callbacks: List[callable] = [] def add_alert_callback(self, callback: callable): """Add a callback function to be called when alerts are triggered.""" self.alert_callbacks.append(callback) def _trigger_alerts(self, anomalies: List[Dict]): """Trigger registered alert callbacks for detected anomalies.""" for anomaly in anomalies: if anomaly.get("action_required"): for callback in self.alert_callbacks: try: callback(anomaly) except Exception as e: audit_logger.error(f"Alert callback failed: {e}") def chat_completions(self, *args, **kwargs) -> Dict[str, Any]: """Enhanced chat completions with security monitoring.""" response = super().chat_completions(*args, **kwargs) # Record event in security monitor anomalies = self.monitor.record_event({ "timestamp": datetime.now(timezone.utc), "user_id": kwargs.get("user", ""), "api_key_prefix": self.api_key[:8], "model": kwargs.get("model", "deepseek-v3.2"), "total_tokens": response.get("usage", {}).get("total_tokens", 0), "cost_cents": 0, # Would calculate from response "status": "SUCCESS" }) if anomalies: self._trigger_alerts(anomalies) return response

Common Errors and Fixes

Through years of implementing AI API integrations, I've compiled the most frequent issues teams encounter. Here's how to diagnose and resolve them quickly.

Error 1: 401 Unauthorized — Invalid or Expired API Key

Symptom: All requests fail with 401 Unauthorized and the message "Invalid authentication credentials".

Common Causes:

Solution Code:

"""
Fix for 401 Unauthorized errors with HolySheep AI
"""

import os
import httpx
from typing import Optional

def validate_and_test_api_key(api_key: str) -> dict:
    """
    Validate API key and test connectivity to HolySheep AI.
    Returns detailed diagnostics for troubleshooting 401 errors.
    """
    base_url = "https://api.holysheep.ai/v1"
    
    # Step 1: Check key format
    diagnostics = {
        "key_length": len(api_key),
        "key_prefix": api_key[:8] if api_key else "MISSING",
        "has_whitespace": api_key != api_key.strip() if api_key else True,
        "format_valid": False,
        "connectivity": False,
        "auth_success": False,
        "errors": []
    }
    
    # Validate format
    if not api_key:
        diagnostics["errors"].append("API key is empty or None")
        return diagnostics
    
    if diagnostics["has_whitespace"]:
        diagnostics["errors"].append("API key contains leading/trailing whitespace")
        api_key = api_key.strip()
    
    if len(api_key) < 20:
        diagnostics["errors"].append(f"API key too short ({len(api_key)} chars). Expected 32+")
    
    diagnostics["format_valid"] = len(diagnostics["errors"]) == 0
    
    # Step 2: Test connectivity
    try:
        response = httpx.get(
            f"{base_url}/models",
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10.0
        )
        
        diagnostics["connectivity"] = True
        diagnostics["http_status"] = response.status_code
        
        if response.status_code == 200:
            diagnostics["auth_success"] = True
            diagnostics["available_models"] = [
                m.get("id") for m in response.json().get("data", [])
            ][:5]  # First 5 models
        elif response.status_code == 401:
            diagnostics["errors"].append("Authentication failed - invalid or expired key")
            diagnostics["auth_error_detail"] = response.text
        elif response.status_code == 403:
            diagnostics["errors"].append("Access forbidden - check key permissions")
        else:
            diagnostics["errors"].append(f"Unexpected status: {response.status_code}")
            
    except httpx.ConnectError:
        diagnostics["errors"].append("Cannot connect to API - check network/firewall")
    except httpx.TimeoutException:
        diagnostics["errors"].append("Connection timeout - API may be experiencing issues")
    except Exception as e:
        diagnostics["errors"].append(f"Unexpected error: {str(e)}")
    
    return diagnostics

Usage in your application

def get_holysheep_client(): """Get configured HolySheep client with proper error handling.""" api_key = os.environ.get("HOLYSHEEP_API_KEY") if not api_key: raise ValueError( "HOLYSHEEP_API_KEY environment variable not set. " "Get your key at https://www.holysheep.ai/register" ) # Validate before creating client diagnostics = validate_and_test_api_key(api_key) if not diagnostics["auth_success"]: print(f"API Key Validation Failed:") for error in diagnostics["errors"]: print(f" - {error}") raise ValueError("Invalid API key configuration") print(f"✓ API key validated: {diagnostics['key_prefix']}***") print(f"✓ Available models: {', '.join(diagnostics.get('available_models', []))}") return api_key # Return validated key for client creation

Test it

if __name__ == "__main__": test_key = "YOUR_HOLYSHEEP_API_KEY" # Replace with actual key result = validate_and_test_api_key(test_key) print(json.dumps(result, indent=2))

Error 2: Connection Timeout — Request Hangs or Fails After 30 Seconds

Symptom: Requests hang indefinitely or timeout with ConnectTimeout or ReadTimeout errors. This is especially common with large prompts or complex models.

Common Causes:

Solution Code:

"""
Fix for timeout errors with HolySheep AI API
Implements proper timeout configuration and retry logic
"""

import time
import httpx
from typing import Optional, Dict, Any
from dataclasses import dataclass

@dataclass
class TimeoutConfig:
    """Proper timeout configuration for AI API calls."""
    connect: float = 10.0    # Connection establishment timeout
    read: float = 120.0      # Response read timeout (AI APIs need more time)
    write: float = 30.0      # Request body send timeout
    pool: float = 10.0       # Connection pool acquisition timeout
    
    def to_httpx_timeout(self) -> httpx.Timeout:
        return httpx.Timeout(
            connect=self.connect,
            read=self.read,
            write=self.write,
            pool=self.pool
        )

class TimeoutResilientClient:
    """
    AI API client with proper timeout handling and intelligent retry logic.
    """
    
    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.holysheep.ai/v1",
        timeout_config: Optional[TimeoutConfig] = None
    ):
        self.api_key = api_key
        self.base_url = base_url
        self.timeout = timeout_config or TimeoutConfig()
        
        # Create client with proper timeouts
        self.client = httpx.Client(
            timeout=self.timeout.to_httpx_timeout(),
            limits=httpx.Limits(
                max_keepalive_connections=20,
                max_connections=100,
                keepalive_expiry=30.0
            )
        )
    
    def chat_completions_with_retry(
        self,
        messages: list,
        model: str = "deepseek-v3.2",
        max_retries: int = 3,
        retry_delay: float = 2.0,
        **kwargs
    ) -> Dict[str, Any]:
        """
        Make API request with automatic timeout handling and retry logic.
        
        HolySheep AI's <50ms latency means retries are usually unnecessary,
        but this provides resilience for edge cases.
        """
        last_exception = None
        
        for attempt in range(max_retries + 1):
            try:
                response = self.client.post(
                    f"{self.base_url}/chat/completions",
                    headers={
                        "Authorization": f"Bearer {self.api_key}",
                        "Content-Type": "application/json"
                    },
                    json={
                        "model": model,
                        "messages": messages,
                        **{k: v for k, v in kwargs.items() if v is not None}
                    }
                )
                response.raise_for_status()
                return response.json()
                
            except httpx.TimeoutException as e:
                last_exception = e
                timeout_type = type(e).__name__
                
                if attempt < max_retries:
                    wait_time = retry_delay * (2 ** attempt)  # Exponential backoff
                    print(f"Timeout ({timeout_type}) on attempt {attempt + 1}. "
                          f"Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print(f"Failed after {max_retries + 1} attempts due to timeout")
                    
            except httpx.HTTPStatusError as e:
                # Don't retry client errors (4xx) except 429 (rate limit)
                if e.response