Feature Engineering for Quantitative Trading: Building Machine Learning Factors from Order Book Data

In quantitative trading, effective feature engineering is at the heart of a successful strategy. Order book data is especially valuable: it reflects every resting order in the market in real time, which lets us analyze price and volume behavior in fine detail. In this article we walk through how to build machine learning features from the order book and feed them into a trading strategy model, with the goal of a strategy that is profitable in practice.

Why Order Book Data Matters for Building ML Features

The order book records all outstanding buy and sell orders in the market at a given moment. It consists of:

Bid side: buy orders waiting to be matched, sorted from highest price to lowest
Ask side: sell orders waiting to be matched, sorted from lowest price to highest
Spread: the difference between the lowest ask price and the highest bid price
Depth of market: the volume of orders waiting at each price level

These data describe supply and demand at the micro level, and they can be turned into features that help predict short-term price movements.
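
To make the four terms above concrete, here is a minimal sketch on a hand-written snapshot. The prices and volumes are made up for illustration and do not come from any exchange.

```python
# Toy snapshot: each level is (price, volume). Values are illustrative only.
bids = [(100.0, 2.0), (99.5, 3.0), (99.0, 5.0)]    # sorted high -> low
asks = [(100.5, 1.0), (101.0, 4.0), (101.5, 6.0)]  # sorted low -> high

best_bid = bids[0][0]          # highest price a buyer is willing to pay
best_ask = asks[0][0]          # lowest price a seller is willing to accept
spread = best_ask - best_bid   # gap between the two best quotes
mid_price = (best_bid + best_ask) / 2

# Depth: total resting volume on each side of the book
bid_depth = sum(volume for _, volume in bids)
ask_depth = sum(volume for _, volume in asks)

print(spread, mid_price, bid_depth, ask_depth)  # 0.5 100.25 10.0 11.0
```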

Installing Tools and Libraries

# Install the libraries needed for order book feature engineering
pip install pandas numpy scipy scikit-learn
pip install ccxt websocket-client requests

# For visualization
pip install matplotlib plotly

# For machine learning models
pip install xgboost lightgbm catboost

Fetching Order Book Data from an Exchange

import ccxt
import pandas as pd
import numpy as np
from datetime import datetime

class OrderBookCollector:
    """Collects order book data from an exchange."""
    
    def __init__(self, exchange_id='binance'):
        self.exchange = getattr(ccxt, exchange_id)()
        
    def fetch_order_book(self, symbol='BTC/USDT', limit=100):
        """Fetch the current order book."""
        order_book = self.exchange.fetch_order_book(symbol, limit)
        
        bids = pd.DataFrame(order_book['bids'], 
                           columns=['price', 'volume'])
        asks = pd.DataFrame(order_book['asks'], 
                           columns=['price', 'volume'])
        
        return bids, asks
    
    def calculate_spread(self, bids, asks):
        """Compute the spread between the best bid and the best ask."""
        best_bid = bids['price'].max()
        best_ask = asks['price'].min()
        spread = best_ask - best_bid
        spread_pct = (spread / best_bid) * 100
        
        return {
            'best_bid': best_bid,
            'best_ask': best_ask,
            'spread': spread,
            'spread_pct': spread_pct
        }

# Usage example
collector = OrderBookCollector('binance')
bids, asks = collector.fetch_order_book('BTC/USDT', limit=50)
spread_info = collector.calculate_spread(bids, asks)

print(f"Best Bid: {spread_info['best_bid']}")
print(f"Best Ask: {spread_info['best_ask']}")
print(f"Spread: {spread_info['spread_pct']:.4f}%")

Building Machine Learning Features from the Order Book

The following are the key features we can build from order book data:

1. Price-Based Features

import numpy as np
from scipy import stats

def calculate_price_features(bids, asks):
    """
    Build price-related features from the order book.
    
    Features returned:
    - Spread and spread percentage
    - Mid price
    - Weighted mid price (best prices weighted by the opposite side's top-5 volume)
    """
    best_bid = float(bids['price'].max())
    best_ask = float(asks['price'].min())
    
    # Mid price: halfway between the best bid and the best ask
    mid_price = (best_bid + best_ask) / 2
    
    # Weighted mid price: each best price is weighted by the opposite side's volume
    bid_volumes = bids['volume'].values[:5]
    ask_volumes = asks['volume'].values[:5]
    
    wmid_price = (best_bid * sum(ask_volumes) + 
                  best_ask * sum(bid_volumes)) / \
                 (sum(bid_volumes) + sum(ask_volumes))
    
    # Spread Features
    spread = best_ask - best_bid
    spread_pct = (spread / mid_price) * 100
    
    # Price imbalance: always negative and equal to -spread_pct / 2,
    # so it carries no information beyond the spread itself
    price_imbalance = (best_bid - best_ask) / (best_bid + best_ask) * 100
    
    return {
        'mid_price': mid_price,
        'weighted_mid_price': wmid_price,
        'spread': spread,
        'spread_pct': spread_pct,
        'price_imbalance': price_imbalance
    }
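
A small worked example may help show what the weighted mid price is doing. This sketch isolates the same formula in a standalone helper (the function name and the zero-volume fallback are assumptions made for illustration, not part of the code above).

```python
def weighted_mid_price(best_bid, best_ask, bid_volume, ask_volume):
    """Mid price where each quote is weighted by the opposite side's volume."""
    total = bid_volume + ask_volume
    if total == 0:
        # Assumed fallback: with no volume information, use the plain mid
        return (best_bid + best_ask) / 2
    return (best_bid * ask_volume + best_ask * bid_volume) / total

# Heavy bid side: buyers dominate, so the estimate moves up toward the ask
print(weighted_mid_price(100.0, 101.0, bid_volume=9.0, ask_volume=1.0))  # 100.9

# Balanced book: the result equals the plain mid price
print(weighted_mid_price(100.0, 101.0, bid_volume=5.0, ask_volume=5.0))  # 100.5
```

When the bid side is much larger than the ask side, the weighted mid sits close to the best ask, which matches the intuition that buying pressure is more likely to lift the price.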

2. Volume-Based Features

def calculate_volume_features(bids, asks, depth=20):
    """
    Build volume-related features from the order book.
    
    Features returned:
    - Total bid volume / ask volume
    - Volume imbalance
    - VWAP (Volume Weighted Average Price)
    - Cumulative volume profile
    """
    bid_vol = bids['volume'].values[:depth]
    ask_vol = asks['volume'].values[:depth]
    bid_prices = bids['price'].values[:depth]
    ask_prices = asks['price'].values[:depth]
    
    # Total Volume
    total_bid_vol = np.sum(bid_vol)
    total_ask_vol = np.sum(ask_vol)
    
    # Volume imbalance, bounded in [-1, 1]
    # > 0 means more bid volume (bullish), < 0 means more ask volume (bearish)
    total_vol = total_bid_vol + total_ask_vol
    vol_imbalance = (total_bid_vol - total_ask_vol) / total_vol \
                    if total_vol > 0 else 0.0
    
    # Bid/Ask Volume Ratio
    vol_ratio = total_bid_vol / total_ask_vol if total_ask_vol > 0 else 0
    
    # VWAP of the resting orders on each side of the book
    bid_vwap = np.sum(bid_prices * bid_vol) / np.sum(bid_vol)
    ask_vwap = np.sum(ask_prices * ask_vol) / np.sum(ask_vol)
    
    # Cumulative Volume Profile (CVP)
    cum_bid_vol = np.cumsum(bid_vol)
    cum_ask_vol = np.cumsum(ask_vol)
    
    # Midpoint of the bid-side and ask-side VWAPs (a depth-weighted mid, not a distance)
    vwap_distance = (bid_vwap + ask_vwap) / 2
    
    return {
        'total_bid_vol': total_bid_vol,
        'total_ask_vol': total_ask_vol,
        'vol_imbalance': vol_imbalance,
        'vol_ratio': vol_ratio,
        'bid_vwap': bid_vwap,
        'ask_vwap': ask_vwap,
        'vwap_distance': vwap_distance,
        'cum_bid_profile': cum_bid_vol,
        'cum_ask_profile': cum_ask_vol
    }
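
The volume imbalance is easiest to read at its extremes. The sketch below restates the same ratio as a standalone helper (the name and the empty-book fallback are illustrative assumptions) and checks a few hand-computed cases.

```python
def volume_imbalance(bid_volumes, ask_volumes):
    """(bid - ask) / (bid + ask), bounded between -1 and 1."""
    total_bid = sum(bid_volumes)
    total_ask = sum(ask_volumes)
    total = total_bid + total_ask
    if total == 0:
        return 0.0  # assumed convention for an empty book
    return (total_bid - total_ask) / total

print(volume_imbalance([3.0, 3.0], [1.0, 1.0]))  # 0.5  -> bids dominate
print(volume_imbalance([1.0], [0.0]))            # 1.0  -> only bids present
print(volume_imbalance([0.0], [2.0]))            # -1.0 -> only asks present
```

Because the ratio is bounded, it can be compared across symbols and across time without rescaling, unlike the raw bid/ask volume ratio.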

3. Microstructure Features — Order Flow Analysis

def calculate_microstructure_features(order_book_history, window=10):
    """
    Analyze order flow from a history of order book snapshots.
    
    Each snapshot is a dict with 'bids' and 'asks' lists of [price, volume];
    `window` is the time span covered by the history, in seconds.
    
    Features returned:
    - Order Flow Imbalance (OFI) statistics
    - Snapshot arrival rate
    - Cumulative OFI and OFI trend
    """
    def top_volume(levels, n=5):
        # Total resting volume in the first n levels of one side
        return sum(float(x[1]) for x in levels[:n])
    
    ofi_list = []
    
    for i in range(1, len(order_book_history)):
        current = order_book_history[i]
        previous = order_book_history[i-1]
        
        # Simplified OFI: change in bid volume minus change in ask volume
        d_bid = top_volume(current['bids']) - top_volume(previous['bids'])
        d_ask = top_volume(current['asks']) - top_volume(previous['asks'])
        ofi_list.append(d_bid - d_ask)
    
    if not ofi_list:
        raise ValueError("order_book_history needs at least two snapshots")
    
    # OFI statistics
    ofi_mean = np.mean(ofi_list)
    ofi_std = np.std(ofi_list)
    ofi_skew = stats.skew(ofi_list)
    ofi_kurtosis = stats.kurtosis(ofi_list)
    
    # Snapshot rate (snapshots per second over the window)
    arrival_rate = len(order_book_history) / window
    
    # Cumulative OFI
    cum_ofi = np.cumsum(ofi_list)
    
    return {
        'ofi_mean': ofi_mean,
        'ofi_std': ofi_std,
        'ofi_skew': ofi_skew,
        'ofi_kurtosis': ofi_kurtosis,
        'arrival_rate': arrival_rate,
        'cum_ofi': cum_ofi,
        'ofi_trend': ofi_list[-1] - ofi_list[0]  # last OFI minus first OFI
    }
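
The function above works on aggregated volume. For comparison, here is a sketch of the best-level order flow imbalance in the form commonly used in the microstructure literature (often attributed to Cont, Kukanov and Stoikov), which also accounts for moves in the best prices. It is a different technique from the function above, and the snapshot field names are hypothetical, chosen only for this example.

```python
def best_level_ofi(prev, curr):
    """Order flow imbalance between two snapshots of the best quotes.

    Each snapshot is a dict with keys bid_price, bid_size, ask_price, ask_size
    (hypothetical field names used only in this sketch).
    """
    # Bid side: a higher bid is new demand, a lower bid means the old queue left
    if curr["bid_price"] > prev["bid_price"]:
        bid_flow = curr["bid_size"]
    elif curr["bid_price"] == prev["bid_price"]:
        bid_flow = curr["bid_size"] - prev["bid_size"]
    else:
        bid_flow = -prev["bid_size"]

    # Ask side mirrors it: a lower ask is new supply, a higher ask means supply left
    if curr["ask_price"] < prev["ask_price"]:
        ask_flow = curr["ask_size"]
    elif curr["ask_price"] == prev["ask_price"]:
        ask_flow = curr["ask_size"] - prev["ask_size"]
    else:
        ask_flow = -prev["ask_size"]

    # Positive values indicate net buying pressure
    return bid_flow - ask_flow

prev = {"bid_price": 100.0, "bid_size": 5.0, "ask_price": 101.0, "ask_size": 4.0}
curr = {"bid_price": 100.0, "bid_size": 8.0, "ask_price": 101.0, "ask_size": 4.0}
print(best_level_ofi(prev, curr))  # 3.0: three units were added at the best bid
```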

4. Depth Features — Market Depth Analysis

def calculate_depth_features(bids, asks, levels=10):
    """
    Analyze market depth across several price levels.
    
    Features returned:
    - Depth ratio
    - Price slope across levels
    - Volume concentration
    - Price distance of each level from the best quote
    """
    bid_vols = bids['volume'].values[:levels]
    ask_vols = asks['volume'].values[:levels]
    bid_prices = bids['price'].values[:levels]
    ask_prices = asks['price'].values[:levels]
    
    # Depth Ratio
    depth_ratio = np.sum(bid_vols) / np.sum(ask_vols) \
                  if np.sum(ask_vols) > 0 else 1
    
    # Volume Concentration (Herfindahl Index)
    bid_concentration = np.sum(bid_vols**2) / (np.sum(bid_vols)**2) \
                        if np.sum(bid_vols) > 0 else 0
    ask_concentration = np.sum(ask_vols**2) / (np.sum(ask_vols)**2) \
                        if np.sum(ask_vols) > 0 else 0
    
    # Price slope: average price change per level (negative for bids, positive for asks)
    bid_slope = np.polyfit(range(len(bid_vols)), bid_prices, 1)[0]
    ask_slope = np.polyfit(range(len(ask_vols)), ask_prices, 1)[0]
    
    # Price distance of each level from the best quote on its side, in percent
    bid_distances = (bid_prices[0] - bid_prices) / bid_prices[0] * 100
    ask_distances = (ask_prices - ask_prices[0]) / ask_prices[0] * 100
    
    # Volume Profile
    total_depth = np.sum(bid_vols) + np.sum(ask_vols)
    bid_ratio = np.sum(bid_vols) / total_depth
    
    return {
        'depth_ratio': depth_ratio,
        'bid_concentration': bid_concentration,
        'ask_concentration': ask_concentration,
        'bid_slope': bid_slope,
        'ask_slope': ask_slope,
        'bid_ratio': bid_ratio,
        'bid_distances': bid_distances,
        'ask_distances': ask_distances
    }
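
The Herfindahl-style concentration used above is easier to interpret with two extreme cases. The following sketch restates it as a standalone helper (the function name is illustrative) and evaluates it on hand-made volume vectors.

```python
def herfindahl(volumes):
    """Sum of squared volume shares: 1/n when evenly spread, 1 when one level holds all."""
    total = sum(volumes)
    if total == 0:
        return 0.0  # same convention as the depth features above
    return sum((v / total) ** 2 for v in volumes)

print(herfindahl([1.0, 1.0, 1.0, 1.0]))  # 0.25: volume spread evenly over 4 levels
print(herfindahl([4.0, 0.0, 0.0, 0.0]))  # 1.0: all volume sits at a single level
```

A value near 1 suggests one large resting order (a "wall") dominates that side of the book, while a value near 1/levels suggests liquidity is spread evenly.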

Combining Features and Preparing Data for the ML Model

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

class FeatureEngineeringPipeline:
    """Pipeline that builds all order book features."""
    
    def __init__(self):
        self.feature_names = []
        self.scaler = StandardScaler()
        
    def create_all_features(self, bids, asks, order_book_history=None):
        """Combine all feature groups into one dict."""
        features = {}
        
        # Price Features
        price_feats = calculate_price_features(bids, asks)
        features.update(price_feats)
        
        # Volume Features
        volume_feats = calculate_volume_features(bids, asks)
        features.update(volume_feats)
        
        # Depth Features
        depth_feats = calculate_depth_features(bids, asks)
        features.update(depth_feats)
        
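        # Microstructure features (only when a history is available)
        if order_book_history is not None:
            micro_feats = calculate_microstructure_features(
                order_book_history
            )
            features.update(micro_feats)
        
        # Keep scalar features only: array-valued entries (cumulative
        # profiles, per-level distances) cannot go into a flat feature row
        return {k: v for k, v in features.items() if np.ndim(v) == 0}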
    
    def create_dataset(self, raw_data_list):
        """Build the training dataset."""
        feature_dicts = []
        
        for data in raw_data_list:
            bids = data['bids']
            asks = data['asks']
            history = data.get('history', None)
            
            feats = self.create_all_features(bids, asks, history)
            feats['target'] = data['target']  # future return
            feature_dicts.append(feats)
        
        df = pd.DataFrame(feature_dicts)
        
        # Handle missing values
        df = df.fillna(0)
        
        # Store the feature names
        self.feature_names = [col for col in df.columns 
                             if col != 'target']
        
        return df
    
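    def prepare_train_test(self, df, test_size=0.2):
        """Prepare a chronological train/test split (rows must be in time order)."""
        X = df[self.feature_names]
        y = df['target']
        
        # Split in time order: shuffling would leak future rows into training
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, shuffle=False
        )
        
        # Fit the scaler on the training rows only, then apply it to both sets
        X_train = self.scaler.fit_transform(X_train)
        X_test = self.scaler.transform(X_test)
        
        return X_train, X_test, y_train, y_test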

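# Usage example (raw_data_list is assumed to be a list of dicts with
# 'bids', 'asks', 'target' and optionally 'history')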
pipeline = FeatureEngineeringPipeline()
df = pipeline.create_dataset(raw_data_list)
X_train, X_test, y_train, y_test = pipeline.prepare_train_test(df)

print(f"Total features: {len(pipeline.feature_names)}")
print(f"Training Samples: {len(X_train)}")
print(f"Test Samples: {len(X_test)}")
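
The pipeline reads a ready-made `target` (the future return) from each record but does not show how it is produced. One common way, sketched here under the assumption that a chronological list of mid prices is available, is to compare each mid price with the one a fixed number of steps later. The function name and the `horizon` parameter are illustrative.

```python
def forward_returns(mid_prices, horizon):
    """Return mid[t + horizon] / mid[t] - 1 for every t that has a future value."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return [
        mid_prices[t + horizon] / mid_prices[t] - 1.0
        for t in range(len(mid_prices) - horizon)
    ]

mids = [100.0, 101.0, 100.0, 102.0]
targets = forward_returns(mids, horizon=1)

# The last snapshot has no future price, so it gets no label
print(len(mids), len(targets))  # 4 3
```

The final `horizon` snapshots have no label and should be dropped from the training set rather than filled with zeros, since a zero would be read as "no price change".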

Training a Machine Learning Model for Price Prediction

import xgboost as xgb
from sklearn.metrics import accuracy_score, log_loss

def train_price_prediction_model(X_train, y_train, X_test, y_test):
    """
    Train an XGBoost model to predict the direction of the return.
    
    Parameter settings:
    - n_estimators: number of trees
    - max_depth: maximum tree depth
    - learning_rate: learning rate
    - objective: loss function
    """
    # Build a binary target (up/down)
    y_train_binary = (y_train > 0).astype(int)
    y_test_binary = (y_test > 0).astype(int)
    
    # XGBoost classifier
    model = xgb.XGBClassifier(
        n_estimators=200,
        max_depth=6,
        learning_rate=0.05,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='binary:logistic',
        eval_metric='logloss'
    )
    
    # Train
    model.fit(
        X_train, y_train_binary,
        eval_set=[(X_test, y_test_binary)],
        verbose=False
    )
    
    # Predictions
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    
    # Metrics: both are scored against the binary labels, because the
    # classifier outputs a probability of "up", not a return
    accuracy = accuracy_score(y_test_binary, y_pred)
    logloss = log_loss(y_test_binary, y_pred_proba, labels=[0, 1])
    
    print(f"Model Accuracy: {accuracy:.4f}")
    print(f"Log Loss: {logloss:.6f}")
    
    return model

# Feature importance
def get_feature_importance(model, feature_names):
    """Extract feature importances from the model."""
    importance = model.feature_importances_
    feat_imp = pd.DataFrame({
        'feature': feature_names,
        'importance': importance
    }).sort_values('importance', ascending=False)
    
    return feat_imp

# Train the model
model = train_price_prediction_model(X_train, y_train, X_test, y_test)

# Inspect feature importance
feat_imp = get_feature_importance(model, pipeline.feature_names)
print("\nTop 10 most important features:")
print(feat_imp.head(10))
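
A single train/test split gives only one estimate of out-of-sample performance. For time-ordered data such as order book snapshots, a walk-forward scheme is a common alternative: train on an expanding window of past data and test on the block that follows. The generator below is a minimal sketch of that idea; its name and parameters are assumptions for illustration, not part of the pipeline above.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) with training data always before test data."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size   # training window grows each fold
        test_end = train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)

for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3, min_train=4):
    print(len(train_idx), list(test_idx))
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

Each fold's indices can be used to slice the feature matrix and retrain the model, so that every evaluation uses only information available before the test block.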

Using an AI API to Analyze the Order Book and Build a Strategy

When developing a complex quantitative trading strategy, an AI API such as HolySheep AI can help analyze order book data, explain patterns, and generate trading signals.

import requests
import json

class AIOrderBookAnalyzer:
    """Uses an AI API to analyze order book data."""
    
    def __init__(self, api_key):
        self.base_url = "https://api.holysheep.ai/v1"
        self.api_key = api_key
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
    
    def analyze_order_book(self, order_book_data, market_context):
        """
        Analyze the order book with AI.
        
        Sends the order book data to the model and asks for:
        - Order Flow Imbalance
        - Potential Support/Resistance
        - Trading Signals
        """
        prompt = f"""
        As a quantitative analyst specializing in order book analysis:
        
        Analyze the following order book data and give recommendations:
        
        Order Book Data:
        {json.dumps(order_book_data, indent=2)}
        
        Market Context:
        {market_context}
        
        Please analyze:
        1. Order flow imbalance and what it implies
        2. Support/resistance levels from volume clusters
        3. Trading signal (Long/Short/Neutral)
        4. Risk assessment and position sizing
        """
        
        payload = {
            "model": "gpt-4.1",
            "messages": [
                {"role": "system", "content": "You are a quantitative trading expert."},
                {"role": "user", "content": prompt}
            ],
            "temperature": 0.3,
            "max_tokens": 1000
        }
        
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            timeout=30  # avoid hanging forever on a stalled connection
        )
        
        if response.status_code == 200:
            return response.json()['choices'][0]['message']['content']
        else:
            raise Exception(f"API Error: {response.status_code}")
    
    def generate_trading_signals(self, features_df, current_market_data):
        """
        Use AI to generate trading signals from the features.
        
        Combines ML model features with AI analysis.
        """
        prompt = f"""
        Build a trading strategy from the following data:
        
        ML Model Features:
        {features_df.describe().to_string()}
        
        Current Market Data:
        {current_market_data}
        
        Provide recommendations for:
        1. Entry/Exit Points
        2. Stop Loss / Take Profit Levels
        3. Position Size
        4. Risk/Reward Ratio
        """
        
        payload = {
            "model": "deepseek-v3.2",
            "messages": [
                {"role": "user", "content": prompt}
            ],
            "temperature": 0.2,
            "max_tokens": 800
        }
        
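        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            timeout=30  # avoid hanging forever on a stalled connection
        )
        
        # Fail loudly on HTTP errors instead of a KeyError on the error body
        response.raise_for_status()
        return response.json()['choices'][0]['message']['content']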

# Usage example
analyzer = AIOrderBookAnalyzer("YOUR_HOLYSHEEP_API_KEY")

order_book_data = {
    "bids": [[45000, 1.5], [44900, 2.3], [44800, 4.1]],
    "asks": [[45100, 1.2], [45200, 3.5], [45300, 5.2]],
    "spread_pct": 0.22,
    "vol_imbalance": 0.15
}

market_context = "BTC/USD - High volatility period, Fed announcement expected"

analysis = analyzer.analyze_order_book(order_book_data, market_context)
print(analysis)

Comparing AI API Costs for Trading Analysis

When building a quantitative trading system that uses AI to analyze the order book and generate trading signals, choosing the right AI API provider can cut costs substantially, especially when large volumes of data are processed on every market tick.

AI Model	Price (USD/MTok)	Cost at 10M tokens/month	Latency	Best suited for
DeepSeek V3.2	$0.42	$4.20	<50ms	Bulk Analysis, Feature Engineering
Gemini 2.5 Flash	$2.50	$25.00	<100ms	Fast Analysis, Real-time Signals
GPT-4.1	$8.00	$80.00	<200ms	Complex Strategy Design
Claude Sonnet 4.5	$15.00	$150.00	<150ms	In-depth Research, Backtesting

ROI calculation: using DeepSeek V3.2 instead of Claude Sonnet 4.5 for 10M tokens saves $145.80 per month, or 97.2% of the cost, while still delivering quality sufficient for order book analysis.
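
The monthly figures follow from one multiplication: price per million tokens times the number of millions of tokens used. The short sketch below reproduces the arithmetic using the per-MTok prices listed in the table; the function name is illustrative.

```python
PRICE_PER_MTOK = {  # USD per million tokens, taken from the table above
    "DeepSeek V3.2": 0.42,
    "Gemini 2.5 Flash": 2.50,
    "GPT-4.1": 8.00,
    "Claude Sonnet 4.5": 15.00,
}

def monthly_cost(model, tokens_per_month):
    """Cost in USD for a given monthly token volume."""
    return PRICE_PER_MTOK[model] * tokens_per_month / 1_000_000

cheap = monthly_cost("DeepSeek V3.2", 10_000_000)
expensive = monthly_cost("Claude Sonnet 4.5", 10_000_000)
saving = expensive - cheap

print(round(cheap, 2), round(expensive, 2), round(saving, 2),
      round(saving / expensive * 100, 1))  # 4.2 150.0 145.8 97.2
```

The percentage saving depends only on the price ratio, so it stays at 97.2% for any token volume, while the dollar amounts scale linearly with usage.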

Who It Is For / Who It Is Not For

✅ Who it is for

Quantitative traders: those who want to build automated trading systems with machine learning
Algorithmic trading teams: teams that need to analyze the order book in real time
HFT firms: high-frequency trading companies that need low latency
Retail traders: individual traders who want to learn advanced feature engineering
Data scientists: those who want to build datasets for financial ML

❌ Who it is not for

Manual traders: those who trade mainly on intuition and do not use automated systems
Long-term investors: those who hold assets for the long term and do not need real-time analysis
Beginners: those without a Python and ML background may need further study first

Pricing and ROI

Investing in an AI-powered trading system has a clear ROI, especially with HolySheep AI, which is priced more than 85% lower than other providers.
