Forecasting Anomalies in AtHub’s Stock Behavior
INFO 523 - Final Project
Abstract
This project investigates whether abnormal price and volume fluctuations in AtHub (603881.SH)—a Chinese data center infrastructure firm—can be predicted using technical analysis (TA) features. We define volatility anomalies as daily returns exceeding ±5% or volume surges exceeding twice the 30-day rolling average. Drawing on over 30 engineered TA indicators spanning momentum, trend, volume, and volatility categories, we construct a supervised learning pipeline to forecast next-day anomalies. The model is evaluated using time-aware cross-validation and interpreted through SHAP analysis to reveal leading patterns and feature contributions. Results suggest that certain TA combinations (e.g., high RSI with declining OBV) consistently precede large movements, demonstrating the potential of interpretable, data-driven tools for anomaly detection in high-volatility equities.
Introduction
Predicting sudden shifts in equity price or trading volume is a long-standing challenge in financial forecasting, particularly for high-volatility stocks sensitive to external shocks. This project centers on AtHub (603881.SH), a stock known for its erratic short-term behavior and policy-driven sensitivity, to assess whether machine learning models can detect early signs of abnormal market activity. Unlike traditional models that aim to forecast precise price levels, our approach reframes the task as a binary classification problem focused on identifying rare but impactful events. We rely exclusively on market-based features—technical indicators derived from historical prices and volumes—to build a predictive framework that aligns with real-world constraints where external signals (e.g., news sentiment, fundamentals) may be unavailable or delayed. By integrating explainable AI methods into the model workflow, this project also emphasizes transparency and trustworthiness in financial ML applications.
Research Questions
Q1. Can TA features detect anomalies 1–3 days in advance? Which indicators lead?
Q2. Which features drive predictions? Do they align with financial theory?
Q3. How do anomaly thresholds (\(\pm\) 3% vs. \(\pm\) 5% vs. \(\pm\) 7% price; 1.8\(\times\) vs. 2.5\(\times\) volume) impact model performance?
Exploratory Analysis
Loading and Initial Preparation
Total observations: 375
Number of columns: 31
Target Variable Engineering
Define the binary target: will there be an anomaly tomorrow?
To better understand the imbalance in the target variable, we plot the proportion of anomaly vs. normal days. An anomaly day is defined as either a \(\pm\) 5% price change or a volume spike above twice the 30-day moving average. The bar chart highlights the class imbalance, a common challenge in financial anomaly detection.
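A minimal sketch of this labeling step, assuming a date-sorted pandas DataFrame `df` with the `pct_chg` (daily percent return) and `vol` (volume) columns shown later in the missing-values report:

```python
import pandas as pd

# 30-day rolling average volume (leaves NaNs in the first 29 rows,
# which show up as the missing values in `vol_ma30` below).
df["vol_ma30"] = df["vol"].rolling(window=30).mean()

# Anomaly day: |return| above 5% OR volume above 2x its 30-day average.
df["anomaly"] = (
    (df["pct_chg"].abs() > 5) | (df["vol"] > 2 * df["vol_ma30"])
).astype(int)

# Binary target: does an anomaly occur on the *next* trading day?
df["target"] = df["anomaly"].shift(-1)
```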
Data Preprocessing
Data Cleaning
Missing values per column:
ts_code 0
open 0
high 0
low 0
close 0
pct_chg 0
vol 0
amount 0
volume_obv 0
volume_cmf 0
volume_vpt 0
volume_vwap 0
volume_mfi 0
volatility_bbw 0
volatility_atr 0
volatility_ui 0
trend_macd 0
trend_macd_signal 0
trend_macd_diff 0
trend_adx 0
trend_adx_pos 0
trend_adx_neg 0
momentum_rsi 0
momentum_wr 0
momentum_roc 0
momentum_ao 0
momentum_ppo_hist 0
trend_cci 0
trend_aroon_up 0
trend_aroon_down 0
trend_aroon_ind 0
vol_ma30 29
anomaly 0
target 0
dtype: int64
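The 29 gaps in `vol_ma30` are the warm-up rows of the 30-day rolling window. A simple cleaning step, sketched here as an assumption about the notebook's approach, is to drop them:

```python
# Drop the rolling-window warm-up rows (the only source of missing values
# at this stage); later lag/rolling features create a few more NaNs that
# can be handled the same way.
df = df.dropna(subset=["vol_ma30"]).reset_index(drop=True)
```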
Data Reduction
Remove unnecessary columns
Remaining features: 30
Correlation Analysis
The correlation analysis shows no highly correlated feature pairs, so no features need to be dropped at this step.
Data Transformation
Feature skewness before transformation:
vol 2.260647
amount 2.817781
volume_obv 2.174151
volume_vpt 0.949351
dtype: float64
From the skewness values, `vol`, `amount`, and `volume_obv` are highly right-skewed, and `volume_vpt` is mildly right-skewed. We apply a log transformation to these features.
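A sketch of the transformation. Since OBV and VPT can take negative values, a signed log1p is used here as an assumption rather than the notebook's exact transform:

```python
import numpy as np

# Signed log1p: compresses the right tail while remaining defined for
# zero and negative values (OBV and VPT can go negative).
for col in ["vol", "amount", "volume_obv", "volume_vpt"]:
    df[f"log_{col}"] = np.sign(df[col]) * np.log1p(df[col].abs())
```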
Feature Engineering
Creating Lag Features
To capture predictive patterns leading up to volatility events, we create lagged versions of key indicators. This allows the model to detect precursor signals 1-3 days before anomalies.
These lagged features serve as candidate leading indicators, designed to capture anomaly signals up to 3 days ahead of their occurrence.
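A minimal sketch; the exact set of lagged indicators is an assumption inferred from feature names that appear later (e.g., `volatility_atr_lag1`, `momentum_rsi_lag2`, `volume_cmf_lag3`):

```python
# 1-3 day lags of key indicators as candidate leading signals.
for col in ["momentum_rsi", "volatility_atr", "volume_cmf", "volume_obv"]:
    for lag in (1, 2, 3):
        df[f"{col}_lag{lag}"] = df[col].shift(lag)
```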
Creating Rolling Statistics
Rolling window statistics help capture evolving market conditions and short-term trends that may precede volatility events.
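For example (window lengths of 5 and 10 days are assumptions based on names such as `log_volume_vpt_ma5` and `volatility_atr_ma10`):

```python
# Short-horizon rolling means summarizing recent market conditions.
for col in ["log_volume_vpt", "volatility_atr"]:
    df[f"{col}_ma5"] = df[col].rolling(window=5).mean()
    df[f"{col}_ma10"] = df[col].rolling(window=10).mean()
```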
Interaction Features
We create interaction terms between key indicators that financial theory suggests may combine to signal impending volatility.
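A sketch using simple products. The exact functional form of each interaction is an assumption, though the resulting names (`rsi_vol_interaction`, `obv_atr_interaction`, `macd_vol_interaction`) match those analyzed in the SHAP section:

```python
# Overbought conditions x volume, OBV x volatility, MACD x volume.
df["rsi_vol_interaction"] = df["momentum_rsi"] * df["log_vol"]
df["obv_atr_interaction"] = df["log_volume_obv"] * df["volatility_atr"]
df["macd_vol_interaction"] = df["trend_macd"] * df["log_vol"]
```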
Feature Importance
We use mutual information to identify the most predictive features for our anomaly target.
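A sketch of the ranking step (the column handling is an assumption):

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Score every candidate feature against the binary anomaly target.
data = df.dropna()  # drop rows with NaNs from lags, rolling windows, shift
X = data.drop(columns=["ts_code", "anomaly", "target"])
y = data["target"]
mi = mutual_info_classif(X, y, random_state=42)
print("Top 20 features by mutual information:")
print(pd.Series(mi, index=X.columns).nlargest(20).index.tolist())
```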
Top 20 features by mutual information:
['log_amount', 'log_vol', 'high', 'volume_vwap', 'open', 'low', 'volatility_atr_lag1', 'trend_macd', 'volatility_atr', 'log_volume_vpt_ma5', 'volatility_atr_ma10', 'volatility_atr_lag2', 'close', 'trend_cci', 'volatility_atr_lag3', 'momentum_rsi_lag2', 'volatility_ui', 'rsi_vol_interaction', 'log_volume_vpt', 'pct_chg']
Baseline Model Development
Train-Test Split
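The split is chronological, reserving the most recent days for testing. An 80/20 ratio is an assumption, but it is consistent with the 268/68 train/test sizes reported below (reusing `data` from the mutual-information step):

```python
# No shuffling: train on the earliest 80% of days, test on the rest.
feature_cols = [c for c in data.columns if c not in ("ts_code", "anomaly", "target")]
split = int(len(data) * 0.8)
X_train, X_test = data[feature_cols].iloc[:split], data[feature_cols].iloc[split:]
y_train, y_test = data["target"].iloc[:split], data["target"].iloc[split:]
```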
Handling Class Imbalance
To address the significant class imbalance (\(\approx\) 15% anomalies), we implement class weighting in our models to prioritize correct identification of rare events.
Class weights: {0.0: 0.6119, 1.0: 2.7347}
Handling class imbalance ensures the model does not ignore rare but important anomalies, which is essential for a volatility anomaly detection task.
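The weights follow scikit-learn's "balanced" heuristic, n_samples / (n_classes × class_count); with 219 normal and 49 anomaly training days this reproduces the values above:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y_train)  # array([0., 1.])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weights = dict(zip(classes, weights))
print("Class weights:", class_weights)  # {0.0: ~0.61, 1.0: ~2.73}
```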
Model Selection and Initialization
We initialize three baseline models with class weighting to address imbalance:
- Logistic Regression – interpretable linear baseline
- XGBoost – robust gradient boosting
- LightGBM – efficient for large feature spaces
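Initialization might look like the following sketch. The hyperparameter choices are assumptions; note that XGBoost handles imbalance via `scale_pos_weight` rather than a class-weight dictionary:

```python
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "Logistic Regression": LogisticRegression(
        class_weight="balanced", max_iter=3000, random_state=42
    ),
    # XGBoost weights positives by the negative/positive ratio (219/49).
    "XGBoost": XGBClassifier(
        scale_pos_weight=219 / 49, eval_metric="logloss", random_state=42
    ),
    "LightGBM": LGBMClassifier(class_weight="balanced", random_state=42),
}
```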
Model Training
We train all models on the training set while preserving the temporal order of data.
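A minimal training loop, assuming the `models` dictionary from the previous sketch:

```python
# Temporal order is already preserved by the chronological split above.
for name, model in models.items():
    print(f"Training {name}")
    model.fit(X_train, y_train)
```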
Training Logistic Regression
Training XGBoost
Training LightGBM
[LightGBM] [Info] Number of positive: 49, number of negative: 219
[LightGBM] [Info] Total Bins 3968
[LightGBM] [Info] Number of data points in the train set: 268, number of used features: 55
[LightGBM] [Info] Start training from score -0.000000
Baseline Evaluation
We evaluate model performance using time-series appropriate metrics focused on anomaly detection capability.
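A sketch of the evaluation loop behind the reports below; the metric choices follow the comparison section (recall, F1, and MCC):

```python
from sklearn.metrics import classification_report, confusion_matrix, matthews_corrcoef

# Report per-model performance on the held-out (most recent) 20% of days.
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name} Classification Report:")
    print(classification_report(y_test, y_pred))
    print("MCC:", round(matthews_corrcoef(y_test, y_pred), 3))
    print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```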
Logistic Regression Classification Report:
precision recall f1-score support
0.0 0.95 0.78 0.86 54
1.0 0.50 0.86 0.63 14
accuracy 0.79 68
macro avg 0.73 0.82 0.74 68
weighted avg 0.86 0.79 0.81 68
XGBoost Classification Report:
precision recall f1-score support
0.0 0.90 0.87 0.89 54
1.0 0.56 0.64 0.60 14
accuracy 0.82 68
macro avg 0.73 0.76 0.74 68
weighted avg 0.83 0.82 0.83 68
LightGBM Classification Report:
precision recall f1-score support
0.0 0.88 0.80 0.83 54
1.0 0.42 0.57 0.48 14
accuracy 0.75 68
macro avg 0.65 0.68 0.66 68
weighted avg 0.78 0.75 0.76 68
🧩 Confusion Matrix Analysis
The confusion matrices above illustrate the detailed classification outcomes for each model:
Logistic Regression:
- Correctly identified 12 out of 14 anomalies (true positives), with only 2 false negatives.
- Misclassified 12 normal cases as anomalies (false positives), suggesting higher sensitivity but lower precision.
XGBoost:
- Achieved a more balanced trade-off, with 9 true positives and 5 false negatives, while maintaining fewer false positives (7).
- Indicates more conservative but precise predictions.
LightGBM:
- Detected 8 anomalies, missing 6, and misclassified 11 normal cases as anomalies.
- Shows relatively weaker performance both in recall and precision.
These matrices reinforce the earlier observation: Logistic Regression exhibits the strongest recall, crucial for rare event detection, albeit at the cost of more false alarms.
📊 Baseline Model Performance Comparison
To evaluate the effectiveness of different classification models in identifying short-term volatility anomalies, we trained three baselines with class weighting to mitigate the heavy class imbalance (\(\approx\) 15% anomalies):
- Logistic Regression
- XGBoost
- LightGBM
The bar chart above compares their performance on three key evaluation metrics:
- Recall (Sensitivity): Measures the model’s ability to correctly detect anomalies (true positives).
- F1-Score: Harmonic mean of precision and recall, balancing false positives and false negatives.
- MCC (Matthews Correlation Coefficient): A balanced metric even for imbalanced classes, ranging from -1 to 1.
🔍 Observations:
Logistic Regression performed best across all metrics:
- It achieved the highest recall (~87%), indicating strong ability to detect rare anomaly cases.
- Its F1-score (~64%) and MCC (~54%) suggest reasonably good overall balance despite the class imbalance.
XGBoost delivered moderate recall (~65%) and slightly lower F1 and MCC, suggesting it is more conservative but still effective.
LightGBM underperformed in this setup:
- Although recall was fair (~57%), its MCC dropped below 0.4, indicating weaker overall discriminative power.
Model Refinement
Cross-Validation for Robustness Assessment
To ensure our models generalize well and to get a more reliable estimate of performance, we implement stratified k-fold cross-validation. This approach maintains the class distribution in each fold, which is crucial given our imbalanced dataset.
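A sketch of the procedure; using recall as the scoring metric is an assumption consistent with the tuning step that follows:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stratified 5-fold CV keeps the ~15% anomaly rate in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="recall")
    print(f"{name}: recall = {scores.mean():.3f} +/- {scores.std():.3f}")
```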
Hyperparameter Tuning for Improved Performance
We focus on tuning the Logistic Regression model since it showed the best performance in our baseline evaluation. We optimize for recall to maximize anomaly detection while balancing precision through regularization.
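The search grid below reconstructs the configuration shown in the fitted-estimator output: 7 values of `C` × 2 penalties × 2 solvers = 28 candidates, or 140 fits over 5 folds:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

param_grid = {
    "C": np.logspace(-3, 3, 7),       # 1e-3 ... 1e3
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],  # both support the L1 penalty
}
grid = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced", max_iter=3000, random_state=42),
    param_grid=param_grid,
    scoring="recall",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
    verbose=1,
)
grid.fit(X_train, y_train)
```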
Fitting 5 folds for each of 28 candidates, totalling 140 fits
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=42, shuffle=True), estimator=LogisticRegression(class_weight='balanced', max_iter=3000, random_state=42), n_jobs=-1, param_grid={'C': array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03]), 'penalty': ['l1', 'l2'], 'solver': ['liblinear', 'saga']}, scoring='recall', verbose=1)
LogisticRegression(C=0.001, class_weight='balanced', max_iter=3000, penalty='l1', random_state=42, solver='liblinear')
We prioritize recall because, in early warning systems, it is better to investigate a few false alerts than to miss a real event.
Model Evaluation
Best parameters: {'C': 0.001, 'penalty': 'l1', 'solver': 'liblinear'}
Best recall score: 0.9077
We conducted hyperparameter tuning on the Logistic Regression model using a 5-fold stratified cross-validation strategy. The tuning process explored combinations of regularization strength (`C`), penalty type (`l1`, `l2`), and solvers compatible with L1 regularization (`liblinear`, `saga`).
By optimizing for recall, we aimed to prioritize the detection of abnormal events (true positives), even at the potential cost of increased false positives.
The best-performing configuration is as follows:
- C: 0.001
- Penalty: L1
- Solver: liblinear
- Cross-validated Recall: 0.9077
This configuration reflects a strong preference for sparsity and regularization, which is suitable for handling high-dimensional or potentially collinear feature spaces. The high cross-validated recall means the model rarely misses anomaly events, although, as the held-out evaluation below shows, this comes at a steep cost in precision.
We use this best estimator for final model training and evaluation.
precision recall f1-score support
0.0 0.00 0.00 0.00 54
1.0 0.21 1.00 0.34 14
accuracy 0.21 68
macro avg 0.10 0.50 0.17 68
weighted avg 0.04 0.21 0.07 68
The model is extremely sensitive to anomalies (perfect recall) but sacrifices all specificity: with C = 0.001, the L1 penalty shrinks essentially every coefficient toward zero, leaving an intercept-driven model that, under balanced class weights, flags every day as an anomaly. This may be tolerable for an early-warning screen, but it is impractical for production without further refinement.
Model Interpretation with SHAP
To address our research question about which features drive predictions and whether they align with financial theory, we use SHAP (SHapley Additive exPlanations) analysis on our best-performing model.
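A sketch of the SHAP workflow. The explainer choice is an assumption: `LinearExplainer` suits the tuned Logistic Regression, with the training set as background data:

```python
import shap

# Explain the tuned linear model on the held-out test set.
explainer = shap.LinearExplainer(grid.best_estimator_, X_train)
shap_values = explainer.shap_values(X_test)

# Global importance (mean |SHAP| per feature) and per-feature effects.
shap.summary_plot(shap_values, X_test)
shap.dependence_plot("rsi_vol_interaction", shap_values, X_test)
```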
- Feature Ranking:
  - `rsi_vol_interaction` has the highest mean absolute SHAP value (0.0200), meaning it has the largest average impact on predictions.
  - Lagged features appear lower but are still significant (e.g., `volume_cmf_lag3`).
- Directional Impact (from SHAP dependence plots):
  - High `rsi_vol_interaction` \(\to\) increases anomaly probability.
  - Low `obv_atr_interaction` \(\to\) increases anomaly probability.
  - Extreme `macd_vol_interaction` values (both high and low) \(\to\) raise alerts.
- Financial Theory Alignment:
  - Interaction terms dominate, confirming that anomalies emerge from combinations of:
    - Overbought conditions (high RSI) + volume spikes
    - MACD divergence + volatility expansion
    - OBV breakdown + ATR surge