Robust API infrastructure demands proactive health monitoring. Without automated fault detection, a single endpoint failure can cascade through your application stack, degrading user experience and eroding revenue. This comprehensive guide walks you through implementing production-grade health checks for the HolySheep AI API relay, drawing from real-world migration patterns that reduced latency by 57% and cut costs by 84%.

Case Study: Series-A SaaS Team Migrates from Direct API to HolySheep Relay

A Series-A SaaS company in Singapore building an AI-powered customer support platform was experiencing critical reliability issues with their existing API proxy infrastructure. Running 2.3 million LLM API calls monthly across GPT-4 and Claude models, they faced three major pain points:

- High latency: 420ms average response times, with P99 spikes above 2 seconds
- Poor reliability: a 2.3% request failure rate against a 99.2% uptime SLA
- Rising costs: roughly $4,200 in monthly API spend at their provider's markup

The engineering team evaluated three options: building internal load balancing, switching to a premium enterprise provider, or migrating to HolySheep AI relay infrastructure. After a 14-day proof-of-concept with canary deployment, they chose HolySheep.

Migration Strategy: Zero-Downtime Relay Transition

The team executed the migration in four phases over three weeks:

Phase 1: Parallel Environment Setup (Days 1-3)

Deployed HolySheep relay alongside existing infrastructure with feature flags controlling traffic split. Both systems processed identical requests, enabling A/B comparison without affecting production users.
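
As a minimal sketch of what this request mirroring can look like (the URLs match the configurations used later in this article, but the helper itself is hypothetical, not the team's actual code):

# shadow_traffic.py - illustrative Phase 1 request-mirroring sketch
import asyncio
import httpx

PRIMARY_URL = "https://api.openai.com/v1/chat/completions"
SHADOW_URL = "https://api.holysheep.ai/v1/chat/completions"  # relay under evaluation

async def call_with_shadow(
    client: httpx.AsyncClient,
    payload: dict,
    primary_headers: dict,
    shadow_headers: dict,
) -> dict:
    """Serve the user from the primary backend while mirroring to the relay."""
    primary = await client.post(PRIMARY_URL, json=payload, headers=primary_headers)

    async def _mirror() -> None:
        try:
            # Fire-and-forget: responses are compared offline for the A/B analysis
            await client.post(SHADOW_URL, json=payload, headers=shadow_headers)
        except httpx.HTTPError:
            pass  # A shadow failure must never affect production users

    asyncio.create_task(_mirror())
    return primary.json()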

Phase 2: Canary Traffic Rollout (Days 4-10)

Began routing 10% of traffic through HolySheep endpoints, progressively increasing to 50% based on monitoring metrics. Key changes implemented during this phase:

# Environment configuration update
import os

# Before: direct API configuration
ORIGINAL_CONFIG = {
    "base_url": "https://api.openai.com/v1",
    "api_key": os.getenv("OPENAI_API_KEY"),
    "model": "gpt-4-turbo"
}

# After: HolySheep relay configuration
HOLYSHEEP_CONFIG = {
    "base_url": "https://api.holysheep.ai/v1",
    "api_key": os.getenv("HOLYSHEEP_API_KEY"),
    "model": "gpt-4.1",
    "timeout": 30,
    "max_retries": 3
}
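
The progressive 10% to 50% split can be expressed as a deterministic, hash-based router over the two configs above; this is a hedged sketch of the idea, not the team's actual feature-flag system:

# canary_router.py - hypothetical sketch of the progressive traffic split
import hashlib

CANARY_PERCENT = 10  # raised in steps toward 50 as monitoring metrics held up

def use_holysheep(user_id: str) -> bool:
    """Deterministically bucket users so each one always hits the same backend."""
    # Stable hash (unlike built-in hash(), which is salted per process)
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < CANARY_PERCENT

def select_config(user_id: str) -> dict:
    return HOLYSHEEP_CONFIG if use_holysheep(user_id) else ORIGINAL_CONFIG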

Phase 3: Full Cutover with Key Rotation (Days 11-14)

Generated new HolySheep API keys, implemented health check monitoring, and completed 100% traffic migration. Old keys were revoked after 48-hour overlap period.
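
One way to implement that overlap is to keep both keys available and prefer the new one, accepting the old key only inside the 48-hour window; a minimal sketch, assuming these environment variable names (they are not from the article):

# key_rotation.py - hypothetical sketch of the 48-hour key overlap
import os
import time

NEW_KEY = os.getenv("HOLYSHEEP_API_KEY_NEW")
OLD_KEY = os.getenv("HOLYSHEEP_API_KEY_OLD")
ROTATION_STARTED_AT = float(os.getenv("KEY_ROTATION_EPOCH", "0"))
OVERLAP_SECONDS = 48 * 3600

def current_api_key() -> str:
    """Prefer the new key; allow the old key only during the overlap window."""
    if NEW_KEY:
        return NEW_KEY
    if OLD_KEY and time.time() - ROTATION_STARTED_AT < OVERLAP_SECONDS:
        return OLD_KEY
    raise RuntimeError("No valid HolySheep API key - rotation overlap expired")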

Phase 4: Optimization and Cost Analysis (Days 15-21)

Fine-tuned retry policies, implemented circuit breakers, and established baseline metrics for ongoing monitoring.
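
For the retry tuning, capped exponential backoff with jitter is the usual shape; the helper below is an illustrative sketch with made-up parameter values, not the team's production policy:

# retry_policy.py - illustrative backoff-with-jitter helper
import asyncio
import random

async def retry_with_backoff(make_call, max_attempts: int = 3,
                             base_delay: float = 1.0, max_delay: float = 10.0):
    """Retry an async call with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random time up to the capped exponential bound
            delay = min(max_delay, base_delay * (2 ** attempt))
            await asyncio.sleep(random.uniform(0, delay))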

30-Day Post-Launch Metrics: Tangible Business Impact

| Metric | Before HolySheep | After HolySheep | Improvement |
|---|---|---|---|
| Average Latency | 420ms | 180ms | 57% faster |
| P99 Latency | 2,100ms | 380ms | 82% faster |
| Monthly API Spend | $4,200 | $680 | 84% reduction |
| Uptime SLA | 99.2% | 99.97% | +0.77% |
| Failed Requests | 2.3% | 0.12% | 95% reduction |
| Health Check Coverage | None | 100% | Full visibility |

The most significant win was the 84% cost reduction. HolySheep AI prices API credit at ¥1 per $1 of usage, versus the roughly ¥7.3 per dollar the team effectively paid through their previous provider, so the same monthly token volume now costs $680 including all premium model usage.
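
As a quick sanity check on those numbers (the rates and spend figures come from the case study; the arithmetic itself is ours):

# cost_check.py - back-of-envelope verification of the reported savings
previous_spend_usd = 4200   # monthly spend via the previous provider
rate_previous = 7.3         # yuan per $1 of API credit, previous provider
rate_holysheep = 1.0        # yuan per $1 of API credit, HolySheep

# Reprice the same credit volume at the HolySheep rate
implied_spend = previous_spend_usd * rate_holysheep / rate_previous
print(f"Implied spend: ${implied_spend:.0f}")       # ~$575, near the reported $680

print(f"Reported reduction: {1 - 680 / 4200:.0%}")  # ~84%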

Why Health Checks Matter for API Relay Infrastructure

API health checks serve three critical functions in relay architecture: they detect endpoint failures before users do, they feed the routing layer so traffic fails over to healthy endpoints automatically, and they establish the latency and error-rate baselines that make gradual degradation visible.

I implemented comprehensive health monitoring for our own platform after experiencing a cascading failure that took 45 minutes to diagnose. The difference between reactive debugging and proactive monitoring is the difference between a 5-minute incident and a 2-hour outage.

Implementing HolySheep Health Checks: Complete Implementation Guide

Architecture Overview

The HolySheep relay health check system consists of three components working in concert: a health checker that periodically probes the relay endpoints and the real inference path, a circuit breaker that cuts traffic to failing endpoints before errors cascade, and an alerting layer that notifies operators on every health state transition.

Core Health Check Implementation

Below is a production-ready Python implementation for monitoring HolySheep relay health. This code includes comprehensive checks, automatic failover logic, and metrics collection.

# holy_sheep_health_monitor.py
"""
HolySheep AI Relay Health Check and Fault Detection System
Supports automatic failover, circuit breaking, and alerting
"""

import httpx
import asyncio
import time
import logging
from dataclasses import dataclass, field
from typing import Optional, List, Dict
from enum import Enum
from collections import deque

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


@dataclass
class EndpointHealth:
    """Tracks health metrics for a single relay endpoint"""
    url: str
    status: HealthStatus = HealthStatus.UNKNOWN
    consecutive_failures: int = 0
    consecutive_successes: int = 0
    last_check_time: float = 0.0
    last_success_time: float = 0.0
    last_failure_time: float = 0.0
    avg_response_time: float = 0.0
    response_times: deque = field(default_factory=lambda: deque(maxlen=100))

    # Thresholds for health state transitions
    FAILURE_THRESHOLD: int = 3    # Mark unhealthy after 3 consecutive failures
    SUCCESS_THRESHOLD: int = 2    # Mark healthy after 2 consecutive successes
    RESPONSE_TIME_THRESHOLD_MS: float = 500.0  # Degraded if > 500ms


class HolySheepHealthMonitor:
    """
    Production-grade health monitor for HolySheep API relay
    Implements circuit breaker pattern with automatic failover
    """

    def __init__(
        self,
        api_key: str,
        endpoints: Optional[List[str]] = None,
        check_interval: int = 30,
        timeout: float = 5.0
    ):
        """
        Initialize health monitor

        Args:
            api_key: HolySheep API key (starts with 'hs_')
            endpoints: List of relay endpoints to monitor (defaults to main relay)
            check_interval: Seconds between health checks
            timeout: Request timeout for health checks
        """
        self.api_key = api_key
        self.base_url = "https://api.holysheep.ai/v1"
        self.check_interval = check_interval
        self.timeout = timeout

        # Initialize endpoint health trackers
        if endpoints:
            self.endpoints = endpoints
        else:
            # Default HolySheep relay endpoints
            self.endpoints = [
                f"{self.base_url}/health",
                f"{self.base_url}/status",
            ]

        self.endpoint_health: Dict[str, EndpointHealth] = {
            url: EndpointHealth(url=url) for url in self.endpoints
        }

        # Circuit breaker state
        self.circuit_open: bool = False
        self.circuit_open_time: float = 0.0
        self.circuit_open_duration: float = 60.0  # Try to close after 60 seconds

        # Alert callback
        self.alert_callback: Optional[callable] = None

        # HTTP client with connection pooling
        self._client: Optional[httpx.AsyncClient] = None

    async def __aenter__(self):
        self._client = httpx.AsyncClient(
            timeout=self.timeout,
            limits=httpx.Limits(max_connections=20, max_keepalive_connections=10)
        )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._client:
            await self._client.aclose()

    def set_alert_callback(self, callback: callable):
        """Register callback for health alerts"""
        self.alert_callback = callback

    async def _perform_health_check(self, endpoint: str) -> tuple[bool, float]:
        """
        Perform single health check against endpoint

        Returns:
            Tuple of (success: bool, response_time_ms: float)
        """
        health = self.endpoint_health.get(endpoint)
        if not health:
            return False, 0.0

        start_time = time.perf_counter()
        try:
            response = await self._client.get(
                endpoint,
                headers={
                    "Authorization": f"Bearer {self.api_key}",
                    "X-Health-Check": "true"
                }
            )
            response_time_ms = (time.perf_counter() - start_time) * 1000
            # Consider healthy if response is 2xx or 401 (auth valid but wrong endpoint)
            is_healthy = 200 <= response.status_code < 300 or response.status_code == 401
            return is_healthy, response_time_ms
        except httpx.TimeoutException:
            logger.warning(f"Health check timeout for {endpoint}")
            return False, self.timeout * 1000
        except httpx.ConnectError as e:
            logger.warning(f"Connection error for {endpoint}: {e}")
            return False, 0.0
        except Exception as e:
            logger.error(f"Unexpected error checking {endpoint}: {e}")
            return False, 0.0

    async def _perform_model_health_check(self) -> tuple[bool, float]:
        """
        Perform actual model API health check (tests real inference path)
        This is more reliable than checking /health endpoints

        Returns:
            Tuple of (success: bool, response_time_ms: float)
        """
        test_url = f"{self.base_url}/chat/completions"
        try:
            response = await self._client.post(
                test_url,
                headers={
                    "Authorization": f"Bearer {self.api_key}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": "gpt-4.1",
                    "messages": [{"role": "user", "content": "ping"}],
                    "max_tokens": 5
                }
            )
            response_time_ms = response.elapsed.total_seconds() * 1000
            if response.status_code == 200:
                data = response.json()
                if "choices" in data:
                    return True, response_time_ms
            return False, response_time_ms
        except Exception as e:
            logger.error(f"Model health check failed: {e}")
            return False, 0.0

    async def _update_endpoint_health(
        self,
        endpoint: str,
        success: bool,
        response_time_ms: float
    ):
        """Update health state for an endpoint based on check results"""
        health = self.endpoint_health[endpoint]
        current_time = time.time()

        # Update timing metrics
        health.last_check_time = current_time
        health.response_times.append(response_time_ms)
        health.avg_response_time = sum(health.response_times) / len(health.response_times)

        if success:
            health.consecutive_successes += 1
            health.consecutive_failures = 0
            health.last_success_time = current_time

            # State transition: unknown/unhealthy -> degraded -> healthy
            if health.status == HealthStatus.UNKNOWN:
                # First successful check: start from HEALTHY
                health.status = HealthStatus.HEALTHY
            elif health.status == HealthStatus.UNHEALTHY:
                if health.consecutive_successes >= health.SUCCESS_THRESHOLD:
                    health.status = HealthStatus.DEGRADED
                    logger.info(f"Endpoint {endpoint} recovered to DEGRADED")
                    await self._trigger_alert(endpoint, HealthStatus.DEGRADED)
            elif health.status == HealthStatus.DEGRADED:
                if health.consecutive_successes >= health.SUCCESS_THRESHOLD * 2:
                    health.status = HealthStatus.HEALTHY
                    logger.info(f"Endpoint {endpoint} fully recovered to HEALTHY")
                    await self._trigger_alert(endpoint, HealthStatus.HEALTHY)
        else:
            health.consecutive_failures += 1
            health.consecutive_successes = 0
            health.last_failure_time = current_time

            # State transition: healthy -> degraded -> unhealthy
            if health.status in (HealthStatus.UNKNOWN, HealthStatus.HEALTHY):
                health.status = HealthStatus.DEGRADED
                logger.warning(f"Endpoint {endpoint} degraded")
            elif health.status == HealthStatus.DEGRADED:
                if health.consecutive_failures >= health.FAILURE_THRESHOLD:
                    health.status = HealthStatus.UNHEALTHY
                    logger.error(f"Endpoint {endpoint} marked UNHEALTHY")
                    await self._trigger_alert(endpoint, HealthStatus.UNHEALTHY)

    async def _trigger_alert(self, endpoint: str, status: HealthStatus):
        """Trigger alert notification"""
        if self.alert_callback:
            try:
                await self.alert_callback(endpoint, status)
            except Exception as e:
                logger.error(f"Alert callback failed: {e}")

    async def check_all_endpoints(self) -> Dict[str, HealthStatus]:
        """
        Run health checks against all endpoints

        Returns:
            Dictionary mapping endpoint URLs to their current health status
        """
        results = {}

        # Check all endpoints concurrently
        tasks = [self._perform_health_check(endpoint) for endpoint in self.endpoints]
        task_results = await asyncio.gather(*tasks, return_exceptions=True)

        for endpoint, result in zip(self.endpoints, task_results):
            if isinstance(result, Exception):
                logger.error(f"Check failed for {endpoint}: {result}")
                await self._update_endpoint_health(endpoint, False, 0.0)
            else:
                success, response_time = result
                await self._update_endpoint_health(endpoint, success, response_time)
            results[endpoint] = self.endpoint_health[endpoint].status

        # Also check the real model inference path
        model_success, model_time = await self._perform_model_health_check()
        logger.info(
            f"Model inference check: {'ok' if model_success else 'failed'} "
            f"({model_time:.1f}ms)"
        )

        # Update circuit breaker based on overall health
        self._update_circuit_breaker()

        return results

    def _update_circuit_breaker(self):
        """Update circuit breaker state based on endpoint health"""
        all_healthy = all(
            h.status == HealthStatus.HEALTHY for h in self.endpoint_health.values()
        )
        any_unhealthy = any(
            h.status == HealthStatus.UNHEALTHY for h in self.endpoint_health.values()
        )

        if self.circuit_open:
            # Check if we should try to close the circuit
            time_since_open = time.time() - self.circuit_open_time
            if time_since_open >= self.circuit_open_duration:
                if all_healthy or not any_unhealthy:
                    self.circuit_open = False
                    logger.info("Circuit breaker CLOSED - resuming normal operation")
                else:
                    # Keep circuit open, increase duration
                    self.circuit_open_duration *= 1.5
                    logger.warning(
                        f"Circuit breaker remains OPEN, next retry in "
                        f"{self.circuit_open_duration}s"
                    )
        else:
            # Check if we should open the circuit
            if any_unhealthy:
                unhealthy_count = sum(
                    1 for h in self.endpoint_health.values()
                    if h.status == HealthStatus.UNHEALTHY
                )
                total_count = len(self.endpoint_health)
                # Open circuit if more than 50% of endpoints are unhealthy
                if unhealthy_count / total_count > 0.5:
                    self.circuit_open = True
                    self.circuit_open_time = time.time()
                    logger.error(
                        f"Circuit breaker OPENED - "
                        f"{unhealthy_count}/{total_count} endpoints unhealthy"
                    )
                    if self.alert_callback:
                        asyncio.create_task(
                            self.alert_callback("CIRCUIT_BREAKER", HealthStatus.UNHEALTHY)
                        )

    def get_healthy_endpoint(self) -> Optional[str]:
        """Get the healthiest available endpoint for routing"""
        if self.circuit_open:
            logger.warning("Circuit breaker is OPEN - returning None")
            return None

        best_health = None
        best_endpoint = None

        for endpoint, health in self.endpoint_health.items():
            if health.status == HealthStatus.HEALTHY:
                # Prefer endpoints with faster response times
                if best_health is None or health.avg_response_time < best_health.avg_response_time:
                    best_health = health
                    best_endpoint = endpoint

        # Fall back to degraded endpoints if none are healthy
        if best_endpoint is None:
            for endpoint, health in self.endpoint_health.items():
                if health.status == HealthStatus.DEGRADED:
                    if best_health is None or health.avg_response_time < best_health.avg_response_time:
                        best_health = health
                        best_endpoint = endpoint

        return best_endpoint

    async def run_continuous_monitoring(self, duration_seconds: Optional[int] = None):
        """
        Run continuous health monitoring loop

        Args:
            duration_seconds: How long to run (None for infinite)
        """
        start_time = time.time()
        iteration = 0

        logger.info("Starting HolySheep health monitoring...")
        logger.info(
            f"Monitoring {len(self.endpoints)} endpoints every "
            f"{self.check_interval} seconds"
        )

        while True:
            iteration += 1
            logger.info(f"\n--- Health Check Iteration {iteration} ---")

            results = await self.check_all_endpoints()
            for endpoint, status in results.items():
                health = self.endpoint_health[endpoint]
                logger.info(
                    f"{endpoint}: {status.value} "
                    f"(avg: {health.avg_response_time:.1f}ms, "
                    f"failures: {health.consecutive_failures})"
                )

            # Report circuit breaker status and best available endpoint
            healthy_endpoint = self.get_healthy_endpoint()
            logger.info(f"Best available endpoint: {healthy_endpoint or 'NONE (circuit open)'}")
            logger.info(f"Circuit breaker: {'OPEN' if self.circuit_open else 'CLOSED'}")

            # Stop if we've exceeded the requested duration
            if duration_seconds and (time.time() - start_time) >= duration_seconds:
                logger.info("Monitoring duration reached - stopping")
                break

            await asyncio.sleep(self.check_interval)

# Usage example with alerting

async def example_alert_handler(endpoint: str, status: HealthStatus):
    """Example alert handler - integrate with your alerting system"""
    if endpoint == "CIRCUIT_BREAKER":
        message = "CRITICAL: Circuit breaker opened - HolySheep relay failover activated"
    else:
        message = f"HolySheep endpoint {endpoint} status changed to {status.value}"

    # In production, integrate with PagerDuty, Slack, email, etc.
    print(f"ALERT: {message}")
    # await send_slack_notification(message)
    # await send_pagerduty_alert(message)


async def main():
    """Example usage of HolySheep health monitor"""
    # Initialize monitor with your API key
    api_key = "YOUR_HOLYSHEEP_API_KEY"

    async with HolySheepHealthMonitor(
        api_key=api_key,
        check_interval=30,
        timeout=5.0
    ) as monitor:
        # Set up alerting
        monitor.set_alert_callback(example_alert_handler)

        # Run single health check
        results = await monitor.check_all_endpoints()
        print(f"\nHealth check results: {results}")

        # Get best endpoint for routing
        best = monitor.get_healthy_endpoint()
        print(f"Best endpoint for routing: {best}")

        # Or run continuous monitoring for 5 minutes
        # await monitor.run_continuous_monitoring(duration_seconds=300)


if __name__ == "__main__":
    asyncio.run(main())
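
To make the commented send_slack_notification call above concrete, here is one possible shape using a Slack incoming webhook; the environment variable name is an assumption, and this is a sketch rather than the article's actual integration:

# slack_alerts.py - hypothetical Slack webhook handler for health alerts
import os
import httpx

SLACK_WEBHOOK_URL = os.getenv("SLACK_WEBHOOK_URL", "")  # assumed env var

async def send_slack_notification(message: str) -> None:
    """Post a health alert to a Slack channel via an incoming webhook."""
    if not SLACK_WEBHOOK_URL:
        return  # Alerting disabled when no webhook is configured
    async with httpx.AsyncClient(timeout=5.0) as client:
        await client.post(SLACK_WEBHOOK_URL, json={"text": message})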

Integration with Your Application: Request Handler with Auto-Failover

The following implementation shows how to integrate health monitoring into your API client with automatic failover and retry logic.

# holy_sheep_client.py
"""
HolySheep AI API Client with Integrated Health Monitoring
Features: Automatic failover, circuit breaker, smart routing
"""

import httpx
import asyncio
import time
import logging
from typing import Optional, Dict, Any, List
from dataclasses import dataclass

# Import the health monitor from the previous implementation
from holy_sheep_health_monitor import HolySheepHealthMonitor, HealthStatus

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class HolySheepRequest:
    """Request configuration for HolySheep API"""
    model: str
    messages: List[Dict[str, str]]
    temperature: float = 0.7
    max_tokens: Optional[int] = None
    stream: bool = False
    timeout: float = 60.0
    retry_count: int = 3
    retry_delay: float = 1.0


class HolySheepClient:
    """
    Production-ready HolySheep API client with built-in
    health monitoring and automatic failover capabilities
    """

    def __init__(
        self,
        api_key: str,
        enable_health_monitoring: bool = True,
        health_check_interval: int = 30
    ):
        """
        Initialize HolySheep client

        Args:
            api_key: Your HolySheep API key
            enable_health_monitoring: Whether to run background health checks
            health_check_interval: Seconds between health checks
        """
        self.api_key = api_key
        self.base_url = "https://api.holysheep.ai/v1"

        # Initialize health monitor
        self.health_monitor: Optional[HolySheepHealthMonitor] = None
        if enable_health_monitoring:
            self.health_monitor = HolySheepHealthMonitor(
                api_key=api_key,
                check_interval=health_check_interval
            )

        # HTTP client configuration
        self._client: Optional[httpx.AsyncClient] = None

        # Metrics tracking
        self.request_count: int = 0
        self.success_count: int = 0
        self.failure_count: int = 0
        self.total_latency_ms: float = 0.0

    async def __aenter__(self):
        self._client = httpx.AsyncClient(
            timeout=httpx.Timeout(60.0, connect=10.0),
            limits=httpx.Limits(max_connections=100, max_keepalive_connections=20)
        )
        # Start health monitoring background task
        if self.health_monitor:
            self.health_monitor._client = self._client
            self._health_task = asyncio.create_task(
                self.health_monitor.run_continuous_monitoring()
            )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if hasattr(self, '_health_task'):
            self._health_task.cancel()
            try:
                await self._health_task
            except asyncio.CancelledError:
                pass
        if self._client:
            await self._client.aclose()

    def _get_headers(self) -> Dict[str, str]:
        """Generate request headers"""
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "X-HolySheep-Client": "python-sdk/1.0"
        }

    async def _make_request(
        self,
        request: HolySheepRequest,
        endpoint_override: Optional[str] = None
    ) -> Dict[str, Any]:
        """
        Execute a single API request

        Args:
            request: Request configuration
            endpoint_override: Use specific endpoint (for failover testing)

        Returns:
            API response as dictionary
        """
        url = endpoint_override or f"{self.base_url}/chat/completions"

        payload = {
            "model": request.model,
            "messages": request.messages,
            "temperature": request.temperature,
            "stream": request.stream
        }
        if request.max_tokens:
            payload["max_tokens"] = request.max_tokens

        start_time = time.perf_counter()
        try:
            response = await self._client.post(
                url,
                headers=self._get_headers(),
                json=payload,
                timeout=request.timeout
            )
            latency_ms = (time.perf_counter() - start_time) * 1000
            self.total_latency_ms += latency_ms

            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                raise RateLimitError("Rate limit exceeded")
            elif response.status_code == 401:
                raise AuthenticationError("Invalid API key")
            elif response.status_code >= 500:
                raise ServerError(f"Server error: {response.status_code}")
            else:
                raise APIError(f"API error: {response.status_code}")
        except httpx.TimeoutException:
            raise TimeoutError(f"Request timed out after {request.timeout}s")
        except httpx.ConnectError as e:
            raise ConnectionError(f"Connection failed: {e}")

    async def chat_completion(
        self,
        model: str,
        messages: List[Dict[str, str]],
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        timeout: float = 60.0,
        enable_fallback: bool = True
    ) -> Dict[str, Any]:
        """
        Send chat completion request with automatic failover

        Args:
            model: Model to use (e.g., 'gpt-4.1', 'claude-sonnet-4.5',
                'gemini-2.5-flash', 'deepseek-v3.2')
            messages: List of message objects
            temperature: Sampling temperature (0.0 to 2.0)
            max_tokens: Maximum tokens to generate
            timeout: Request timeout in seconds
            enable_fallback: Whether to retry on failure

        Returns:
            API response dictionary
        """
        self.request_count += 1

        request = HolySheepRequest(
            model=model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
            timeout=timeout
        )

        last_error = None

        # Determine endpoints to try
        if enable_fallback and self.health_monitor:
            # Get healthy endpoints from monitor
            healthy_endpoints = [
                ep for ep, health in self.health_monitor.endpoint_health.items()
                if health.status in (HealthStatus.HEALTHY, HealthStatus.DEGRADED)
            ]
            # Always try the main endpoint first
            endpoints_to_try = [f"{self.base_url}/chat/completions"]
            for ep in healthy_endpoints:
                # Map a monitored health URL back to its chat completions route
                candidate = (ep.replace("/health", "/chat/completions")
                               .replace("/status", "/chat/completions"))
                if candidate not in endpoints_to_try:
                    endpoints_to_try.append(candidate)
        else:
            endpoints_to_try = [None]  # Just try the main endpoint

        # Try each endpoint with retries
        for endpoint in endpoints_to_try:
            for attempt in range(request.retry_count):
                try:
                    response = await self._make_request(request, endpoint)
                    self.success_count += 1
                    logger.info(
                        f"Request succeeded via {endpoint or 'primary'} "
                        f"(attempt {attempt + 1})"
                    )
                    return response
                except (RateLimitError, ServerError) as e:
                    last_error = e
                    logger.warning(f"Attempt {attempt + 1} failed: {e}")
                    if attempt < request.retry_count - 1:
                        await asyncio.sleep(request.retry_delay * (attempt + 1))
                        request.retry_delay *= 2  # Exponential backoff
                except (AuthenticationError, ConnectionError, TimeoutError) as e:
                    last_error = e
                    logger.error(f"Permanent failure: {e}")
                    break  # Don't retry these errors on this endpoint

        # All attempts failed
        self.failure_count += 1
        raise last_error or APIError("All retry attempts failed")

    def get_metrics(self) -> Dict[str, Any]:
        """Get client metrics"""
        avg_latency = (
            self.total_latency_ms / self.request_count
            if self.request_count > 0 else 0
        )
        success_rate = (
            self.success_count / self.request_count * 100
            if self.request_count > 0 else 0
        )

        health_status = {}
        if self.health_monitor:
            for endpoint, health in self.health_monitor.endpoint_health.items():
                health_status[endpoint] = {
                    "status": health.status.value,
                    "avg_response_ms": health.avg_response_time,
                    "consecutive_failures": health.consecutive_failures
                }

        return {
            "total_requests": self.request_count,
            "successful_requests": self.success_count,
            "failed_requests": self.failure_count,
            "success_rate_percent": round(success_rate, 2),
            "average_latency_ms": round(avg_latency, 2),
            "health_endpoints": health_status,
            "circuit_breaker_open": (
                self.health_monitor.circuit_open
                if self.health_monitor else False
            )
        }

# Custom exception classes

class HolySheepAPIError(Exception):
    """Base exception for HolySheep API errors"""
    pass


class AuthenticationError(HolySheepAPIError):
    """Invalid API key or authentication failure"""
    pass


class RateLimitError(HolySheepAPIError):
    """Rate limit exceeded"""
    pass


class ServerError(HolySheepAPIError):
    """Server-side error (5xx)"""
    pass


class ConnectionError(HolySheepAPIError):
    """Network connection failure"""
    pass


class TimeoutError(HolySheepAPIError):
    """Request timeout"""
    pass


class APIError(HolySheepAPIError):
    """General API error"""
    pass

# Example usage

async def main():
    """Example demonstrating HolySheep client usage"""
    api_key = "YOUR_HOLYSHEEP_API_KEY"

    async with HolySheepClient(
        api_key=api_key,
        enable_health_monitoring=True,
        health_check_interval=30
    ) as client:
        # Example: Chat completion with GPT-4.1
        try:
            response = await client.chat_completion(
                model="gpt-4.1",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "What are the benefits of using API relay infrastructure?"}
                ],
                temperature=0.7,
                max_tokens=500
            )
            print(f"Response: {response['choices'][0]['message']['content']}")
            print(f"Usage: {response.get('usage', {})}")
        except HolySheepAPIError as e:
            print(f"Request failed: {e}")

        # Print metrics after requests
        metrics = client.get_metrics()
        print("\nClient Metrics:")
        print(f"  Total Requests: {metrics['total_requests']}")
        print(f"  Success Rate: {metrics['success_rate_percent']}%")
        print(f"  Avg Latency: {metrics['average_latency_ms']}ms")
        print(f"  Circuit Breaker: {'OPEN' if metrics['circuit_breaker_open'] else 'CLOSED'}")


if __name__ == "__main__":
    asyncio.run(main())

Monitoring Dashboard Integration

For production deployments, integrate HolySheep health metrics into your monitoring stack. The following example shows Prometheus metrics export for Grafana visualization.

# prometheus_metrics_exporter.py
"""
Prometheus metrics exporter for HolySheep health monitoring
Compatible with Grafana dashboards and alerting
"""

from prometheus_client import Counter, Histogram, Gauge, start_http_server
import asyncio
from typing import Optional

class HolySheepMetricsExporter:
    """Export HolySheep relay health metrics to Prometheus"""
    
    def __init__(self, health_monitor):
        self.health_monitor = health_monitor
        
        # Define Prometheus metrics
        self.request_total = Counter(
            'holysheep_requests_total',
            'Total number of HolySheep API requests',
            ['model', 'status']
        )
        
        self.request_duration = Histogram(
            'holysheep_request_duration_seconds',
            'Request duration in seconds',
            ['model', 'endpoint']
        )
        
        self.endpoint_health = Gauge(
            'holysheep_endpoint_health_status',
            'Endpoint health status (1=healthy, 0.5=degraded, 0=unhealthy)',
            ['endpoint']
        )
        
        self.endpoint_latency = Gauge(
            'holysheep_endpoint_latency_ms',
            'Average endpoint response time in milliseconds',
            ['endpoint']
        )
        
        self.circuit_breaker_state = Gauge(
            'holysheep_circuit_breaker_open',
            'Circuit breaker state (1=open, 0=closed)',
            []
        )
        
        self.active_failures = Gauge(
            'holysheep_consecutive_failures',
            'Number of consecutive failures per endpoint',
            ['endpoint']
        )