MLflow로 파인튜닝 모델 버전 관리 및 자동 배포 파이프라인 구축

프로덕션 환경에서 파인튜닝된 모델을 관리하다 보면 어떤 버전이 어떤 환경에 배포되어 있는지 추적하기 어려워집니다. 저도 세 달 전 새벽에 "Model version mismatch" 오류로 인하여 중요한 API 호출이 실패한 경험이 있습니다. 오늘은 MLflow와 HolySheep AI를 결합하여 파인튜닝 모델의 버전 관리부터 자동 배포 파이프라인까지 구축하는 방법을 단계별로 설명드리겠습니다.

1. 문제 상황: 파인튜닝 모델 관리의 현실

여러 파인튜닝 실험을 진행하다 보면 다음과 같은 문제가 발생합니다:

어떤 하이퍼파라미터 조합이最优 성능을 냈는지 기억이 안남
모델 파일이 어디에 저장되어 있는지 파악이 어려움
새로운 버전 배포 시 기존 호환성이 깨지는 문제
환경별(staging, production) 배포 상태 추적의 어려움

2. MLflow 설치 및 기본 설정

# MLflow 및 관련 패키지 설치
pip install mlflow==2.14.0
pip install boto3==1.34.0
pip install hyperopt==0.2.7
pip install scikit-learn==1.4.0

HolySheep AI SDK 설치
pip install openai==1.12.0

MLflow 서버 실행 (로컬 환경)
mlflow server --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./artifacts \
    --host 0.0.0.0 \
    --port 5000

MLflow 서버가 실행되면 http://localhost:5000에서 웹 UI를 통해 실험 결과를 확인할 수 있습니다. 실제 프로덕션 환경에서는 PostgreSQL과 S3-compatible 스토리지를 사용하는 것을 권장합니다.

3. 파인튜닝 모델 로깅 및 버전 관리

import mlflow
from openai import OpenAI
import json
import time

HolySheep AI API 설정
client = OpenAI(
    api_key="YOUR_HOLYSHEEP_API_KEY",
    base_url="https://api.holysheep.ai/v1"
)

MLflow 실험 설정
mlflow.set_experiment("fine-tuned-sentiment-analysis")

파인튜닝 실행 및 로깅
def train_and_log_model(
    training_file_path: str,
    validation_file_path: str,
    model_base: str = "gpt-4o-mini",
    hyperparameters: dict = None
):
    """
    HolySheep AI에서 파인튜닝 작업을 실행하고 MLflow로 로깅
    """
    mlflow.start_run()
    
    try:
        # 파라미터 로깅
        params = {
            "model_base": model_base,
            "n_epochs": hyperparameters.get("n_epochs", 3),
            "batch_size": hyperparameters.get("batch_size", "auto"),
            "learning_rate_multiplier": hyperparameters.get("learning_rate_multiplier", "auto"),
            "prompt_loss_weight": hyperparameters.get("prompt_loss_weight", 0.01)
        }
        mlflow.log_params(params)
        
        # HolySheep AI에서 파인튜닝 작업 생성
        print(f"[{time.strftime('%H:%M:%S')}] Starting fine-tuning job...")
        start_time = time.time()
        
        # 1단계: 파일 업로드
        with open(training_file_path, "rb") as f:
            training_file = client.files.create(
                file=f,
                purpose="fine-tune"
            )
        
        with open(validation_file_path, "rb") as f:
            validation_file = client.files.create(
                file=f,
                purpose="fine-tune"
            )
        
        # 2단계: 파인튜닝 작업 생성
        fine_tune_job = client.fine_tuning.jobs.create(
            training_file=training_file.id,
            validation_file=validation_file.id,
            model=model_base,
            method="sft",
            hyperparameters=hyperparameters
        )
        
        job_id = fine_tune_job.id
        print(f"Fine-tuning job ID: {job_id}")
        
        # 3단계: 작업 상태 모니터링
        while True:
            job_status = client.fine_tuning.jobs.retrieve(job_id)
            status = job_status.status
            
            print(f"[{time.strftime('%H:%M:%S')}] Status: {status}")
            
            if status == "succeeded":
                trained_model = job_status.fine_tuned_model
                print(f"Training completed! Model: {trained_model}")
                break
            elif status == "failed":
                error_msg = job_status.error.get("message", "Unknown error")
                raise RuntimeError(f"Fine-tuning failed: {error_msg}")
            elif status == "cancelled":
                raise RuntimeError("Fine-tuning job was cancelled")
            
            time.sleep(60)  # 1분마다 상태 확인
        
        training_duration = time.time() - start_time
        
        # 메트릭 로깅
        metrics = {
            "training_duration_seconds": training_duration,
            "training_steps": job_status.trained_tokens // 1000 if job_status.trained_tokens else 0,
            "training_cost_estimate_usd": training_duration * 0.008  # HolySheep estimate
        }
        mlflow.log_metrics(metrics)
        
        # 모델 아티팩트 로깅
        model_info = {
            "model_id": job_id,
            "fine_tuned_model": trained_model,
            "base_model": model_base,
            "training_file": training_file.id,
            "validation_file": validation_file.id,
            "status": status
        }
        
        # 로컬 파일로 모델 메타데이터 저장
        metadata_path = "model_metadata.json"
        with open(metadata_path, "w") as f:
            json.dump(model_info, f, indent=2)
        
        # MLflow에 아티팩트 로깅
        mlflow.log_artifact(metadata_path)
        
        # 태그로 배포 상태 추적
        mlflow.set_tag("deployment_status", "staging")
        mlflow.set_tag("environment", "development")
        
        return trained_model, job_id
        
    except Exception as e:
        mlflow.set_tag("error", str(e))
        raise
    finally:
        mlflow.end_run()

실행 예시
if __name__ == "__main__":
    hyperparameters = {
        "n_epochs": 4,
        "batch_size": "auto",
        "learning_rate_multiplier": 2,
        "prompt_loss_weight": 0.01
    }
    
    try:
        model_name, job_id = train_and_log_model(
            training_file_path="./data/train.jsonl",
            validation_file_path="./data/valid.jsonl",
            model_base="gpt-4o-mini",
            hyperparameters=hyperparameters
        )
        print(f"✓ Model trained and logged: {model_name}")
    except Exception as e:
        print(f"✗ Training failed: {e}")

4. 모델 버전 추적 및 registry 설정

import mlflow
from mlflow.tracking import MlflowClient

MLflow 모델 레지스트리 설정
mlflow.set_registry_uri("sqlite:///mlflow.db")
client = MlflowClient()

def register_model_version(
    run_id: str,
    model_name: str,
    model_alias: str,
    stage: str = "Staging"
):
    """
    학습된 모델을 MLflow 레지스트리에 등록 및 버전 관리
    """
    # 모델 URI 생성
    model_uri = f"runs:/{run_id}/model"
    
    # 모델 레지스트리에 등록
    mv = mlflow.register_model(model_uri, model_name)
    print(f"Registered model: {model_name}, version: {mv.version}")
    
    # 버전별 메타데이터 업데이트
    client.update_model_version(
        name=model_name,
        version=mv.version,
        description=f"Fine-tuned model for sentiment analysis - HolySheep AI job"
    )
    
    # alias 설정 (더 직관적인 버전 참조)
    client.set_registered_model_alias(
        name=model_name,
        alias=model_alias,
        version=mv.version
    )
    
    # stage 전환
    client.transition_model_version_stage(
        name=model_name,
        version=mv.version,
        stage=stage
    )
    
    return mv.version

def get_production_model():
    """
    프로덕션 환경용 모델 조회
    """
    # 방법 1: Stage로 조회
    prod_models = mlflow.search_models(stages=["Production"])
    
    # 방법 2: Alias로 조회 (권장)
    try:
        latest_prod = client.get_model_version_by_alias(
            name="sentiment-analysis-gpt4",
            alias="production"
        )
        return latest_prod.name, latest_prod.version
    except Exception:
        return None, None

def compare_model_versions(model_name: str):
    """
    모든 버전 비교 및 성능 추이 확인
    """
    versions = client.search_model_versions(name=model_name)
    
    print(f"\n{'='*60}")
    print(f"Model: {model_name}")
    print(f"{'='*60}")
    
    for v in sorted(versions, key=lambda x: x.version):
        run = mlflow.get_run(v.run_id)
        metrics = run.data.metrics if run.data.metrics else {}
        
        print(f"\nVersion {v.version} | Stage: {v.current_stage}")
        print(f"  Created: {v.creation_timestamp}")
        print(f"  Description: {v.description}")
        print(f"  Metrics: {metrics}")
        
        # 태그 출력
        tags = v.tags
        if tags:
            print(f"  Tags: {tags}")
    
    return versions

사용 예시
if __name__ == "__main__":
    # 최신 실행(run) 가져오기
    experiment = mlflow.get_experiment_by_name("fine-tuned-sentiment-analysis")
    runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])
    
    if not runs.empty:
        latest_run = runs.iloc[-1]
        run_id = latest_run.run_id
        
        # 모델 등록
        version = register_model_version(
            run_id=run_id,
            model_name="sentiment-analysis-gpt4",
            model_alias="v1.2.0-stable",
            stage="Staging"
        )
        
        # 모든 버전 비교
        compare_model_versions("sentiment-analysis-gpt4")

5. HolySheep AI를 통한 자동 배포 파이프라인

import mlflow
from openai import OpenAI
from datetime import datetime
import time

client = OpenAI(
    api_key="YOUR_HOLYSHEEP_API_KEY",
    base_url="https://api.holysheep.ai/v1"
)

class ModelDeploymentPipeline:
    """
    MLflow + HolySheep AI 기반 자동 배포 파이프라인
    """
    
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.client = client
        
    def deploy_to_staging(self, model_version: int) -> dict:
        """
        Staging 환경에 배포
        """
        mlflow_client = mlflow.tracking.MlflowClient()
        mv = mlflow_client.get_model_version(self.model_name, model_version)
        
        # MLflow에서 모델 메타데이터 가져오기
        model_uri = f"models:/{self.model_name}/{model_version}"
        
        deploy_info = {
            "model_name": self.model_name,
            "version": model_version,
            "environment": "staging",
            "deployed_at": datetime.now().isoformat(),
            "fine_tuned_model_id": mv.description,
            "endpoint": f"https://api.holysheep.ai/v1/fine-tunes/{mv.description}/inferences"
        }
        
        # MLflow 태그 업데이트
        mlflow_client.update_model_version(
            name=self.model_name,
            version=model_version,
            description=f"Deployed to staging at {deploy_info['deployed_at']}"
        )
        mlflow_client.transition_model_version_stage(
            name=self.model_name,
            version=model_version,
            stage="Staging"
        )
        
        print(f"✓ Deployed {self.model_name}:v{model_version} to Staging")
        return deploy_info
    
    def promote_to_production(self, model_version: int) -> dict:
        """
        프로덕션 환경으로 승격
        """
        mlflow_client = mlflow.tracking.MlflowClient()
        
        # 현재 프로덕션 버전 조회
        current_prod = mlflow_client.get_model_version(
            name=self.model_name,
            version=mlflow_client.get_latest_versions(
                self.model_name, stages=["Production"]
            )[0].version if mlflow_client.get_latest_versions(self.model_name, stages=["Production"]) else None
        )
        
        # 모델 성능 검증 (예시)
        validation_result = self._validate_model(model_version)
        
        if not validation_result["passed"]:
            raise ValueError(f"Model validation failed: {validation_result['reason']}")
        
        # 기존 프로덕션 → Archive
        if current_prod:
            mlflow_client.transition_model_version_stage(
                name=self.model_name,
                version=current_prod.version,
                stage="Archived"
            )
            print(f"Archived previous production version: v{current_prod.version}")
        
        # 새 버전 프로덕션 배포
        mlflow_client.transition_model_version_stage(
            name=self.model_name,
            version=model_version,
            stage="Production"
        )
        
        # Alias 업데이트
        mlflow_client.set_registered_model_alias(
            name=self.model_name,
            alias="production",
            version=model_version
        )
        
        deploy_result = {
            "model_name": self.model_name,
            "version": model_version,
            "previous_version": current_prod.version if current_prod else None,
            "environment": "production",
            "deployed_at": datetime.now().isoformat(),
            "validation_metrics": validation_result
        }
        
        # 배포 이력 로깅
        self._log_deployment(deploy_result)
        
        return deploy_result
    
    def _validate_model(self, model_version: int) -> dict:
        """
        모델 성능 검증 (실제 환경에서는 더 엄격한 검증 필요)
        """
        mlflow_client = mlflow.tracking.MlflowClient()
        mv = mlflow_client.get_model_version(self.model_name, model_version)
        run = mlflow.get_run(mv.run_id)
        
        # 최소 성능 기준치 확인
        metrics = run.data.metrics or {}
        
        # 예시: accuracy >= 0.85
        passed = metrics.get("accuracy", 0) >= 0.85
        
        return {
            "passed": passed,
            "accuracy": metrics.get("accuracy", 0),
            "f1_score": metrics.get("f1_score", 0),
            "reason": "Accuracy meets threshold" if passed else "Accuracy below threshold"
        }
    
    def rollback_to_version(self, version: int) -> dict:
        """
        특정 버전으로 롤백
        """
        mlflow_client = mlflow.tracking.MlflowClient()
        
        # 현재 프로덕션 → Archived
        current_prod = mlflow_client.get_latest_versions(
            self.model_name, stages=["Production"]
        )
        if current_prod:
            mlflow_client.transition_model_version_stage(
                name=self.model_name,
                version=current_prod[0].version,
                stage="Archived"
            )
        
        # 지정된 버전 → Production
        mlflow_client.transition_model_version_stage(
            name=self.model_name,
            version=version,
            stage="Production"
        )
        
        return {
            "status": "success",
            "rolled_back_to": version,
            "deployed_at": datetime.now().isoformat()
        }
    
    def _log_deployment(self, deploy_info: dict):
        """
        배포 이력 MLflow 로깅
        """
        with mlflow.start_run(run_name=f"deployment-{deploy_info['version']}"):
            mlflow.log_params({
                "deployed_model": deploy_info['model_name'],
                "deployed_version": deploy_info['version'],
                "environment": deploy_info['environment']
            })
            mlflow.log_metrics({
                "validation_accuracy": deploy_info['validation_metrics']['accuracy'],
                "validation_f1": deploy_info['validation_metrics']['f1_score']
            })

사용 예시
if __name__ == "__main__":
    pipeline = ModelDeploymentPipeline("sentiment-analysis-gpt4")
    
    # 1. Staging 배포
    staging_result = pipeline.deploy_to_staging(model_version=3)
    
    # 2. 테스트 후 프로덕션 승격
    try:
        prod_result = pipeline.promote_to_production(model_version=3)
        print(f"✓ Production deployment successful: {prod_result}")
    except ValueError as e:
        print(f"✗ Validation failed: {e}")
    
    # 3. 문제 발생 시 롤백
    rollback_result = pipeline.rollback_to_version(version=2)
    print(f"✓ Rollback successful: {rollback_result}")

6. HolySheep AI API 비용 모니터링

import mlflow
from datetime import datetime, timedelta

def calculate_model_cost():
    """
    HolySheep AI 기반 모델 사용 비용 계산 및 로깅
    """
    mlflow_client = mlflow.tracking.MlflowClient()
    
    # HolySheep AI 가격표 (2024년 기준)
    pricing = {
        "gpt-4o": {"input": 2.50, "output": 10.00},      # $/MTok
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "gpt-4.1": {"input": 8.00, "output": 24.00},
        "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
        "gemini-2.5-flash": {"input": 0.40, "output": 2.80}
    }
    
    # 모델별 사용량 통계 조회 (실제 환경에서는 HolySheep 대시보드 API 활용)
    usage_stats = {
        "sentiment-analysis-gpt4": {
            "input_tokens": 1_250_000,
            "output_tokens": 450_000,
            "requests": 15_000,
            "avg_latency_ms": 850
        }
    }
    
    total_cost = 0
    for model_name, stats in usage_stats.items():
        # mlflow에서 모델 정보 조회
        versions = mlflow_client.search_model_versions(name=model_name)
        
        for version in versions:
            if version.current_stage == "Production":
                base_model = version.description or "gpt-4o-mini"
                
                # 비용 계산
                input_cost = (stats["input_tokens"] / 1_000_000) * pricing[base_model]["input"]
                output_cost = (stats["output_tokens"] / 1_000_000) * pricing[base_model]["output"]
                version_cost = input_cost + output_cost
                
                total_cost += version_cost
                
                print(f"\n{'='*50}")
                print(f"Model: {model_name} (v{version.version})")
                print(f"Base Model: {base_model}")
                print(f"Input Tokens: {stats['input_tokens']:,}")
                print(f"Output Tokens: {stats['output_tokens']:,}")
                print(f"Requests: {stats['requests']:,}")
                print(f"Avg Latency: {stats['avg_latency_ms']}ms")
                print(f"Cost: ${version_cost:.2f}")
                
                # MLflow에 비용 로깅
                mlflow.log_metric("monthly_cost_usd", version_cost)
                mlflow.log_metric("avg_latency_ms", stats["avg_latency_ms"])
    
    return total_cost

if __name__ == "__main__":
    monthly_cost = calculate_model_cost()
    print(f"\n{'='*50}")
    print(f"Total Monthly Cost: ${monthly_cost:.2f}")

7. CI/CD 파이프라인 통합

# .github/workflows/model-deployment.yml
name: MLflow Model Deployment

on:
  push:
    branches: [main]
    paths: ['models/**']
  pull_request:
    branches: [main]

jobs:
  train-and-deploy:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          pip install mlflow openai boto3
          pip install -r requirements.txt
      
      - name: Train model
        env:
          HOLYSHEEP_API_KEY: ${{ secrets.HOLYSHEEP_API_KEY }}
        run: |
          python scripts/train_model.py
      
      - name: Register model to MLflow
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python scripts/register_model.py
      
      - name: Run integration tests
        run: |
          pytest tests/ --model-version=latest
      
      - name: Deploy to staging
        if: github.ref == 'refs/heads/main'
        env:
          HOLYSHEEP_API_KEY: ${{ secrets.HOLYSHEEP_API_KEY }}
        run: |
          python scripts/deploy.py --env=staging
      
      - name: Deploy to production (with approval)
        if: github.ref == 'refs/heads/main'
        environment: 
          name: production
          url: https://api.example.com
        env:
          HOLYSHEEP_API_KEY: ${{ secrets.HOLYSHEEP_API_KEY }}
        run
관련 리소스
📚 AI API 기술 문서
💰 요금제 보기
📖 개발자 문서
🚀 무료 가입
관련 문서
AI 모델 편향 탐지: 공정성 평가 도구와 지표 완벽 가이드
Flutter AI 대화 애플리케이션 개발: HolySheep AI로跨플랫폼 모바일 통합하기
Kotlin Ktor로 AI API 호출하기: 코루틴 동시성 완벽 가이드

1. 문제 상황: 파인튜닝 모델 관리의 현실

2. MLflow 설치 및 기본 설정

HolySheep AI SDK 설치

MLflow 서버 실행 (로컬 환경)

3. 파인튜닝 모델 로깅 및 버전 관리

HolySheep AI API 설정

MLflow 실험 설정

파인튜닝 실행 및 로깅

실행 예시

4. 모델 버전 추적 및 registry 설정

MLflow 모델 레지스트리 설정

사용 예시

5. HolySheep AI를 통한 자동 배포 파이프라인

사용 예시

6. HolySheep AI API 비용 모니터링

7. CI/CD 파이프라인 통합

관련 리소스

관련 문서

🔥 HolySheep AI를 사용해 보세요