Evaluation & monitoring

Methodology

How StockPredict AI evaluates models, avoids lookahead bias, and monitors performance over time.

1. Walk-forward validation

Instead of a single static train/test split, StockPredict AI uses walk-forward validation. Historical data is split into sequential windows:

  • Train on an initial history window.
  • Evaluate on the following period (out-of-sample).
  • Roll the window forward and repeat across the timeline.

This mirrors production behavior, where you always train on the past and predict the future, and it reduces the risk of overly optimistic backtests.
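
A minimal sketch of the idea in Python; the window lengths, step size, and the placeholder fit/evaluate step are illustrative assumptions, not the platform's actual implementation:

    # Illustrative walk-forward split generator (not StockPredict AI's internal API).
    def walk_forward_splits(n_samples, train_size, test_size, step):
        """Yield (train_indices, test_indices) pairs that roll forward in time."""
        start = 0
        while start + train_size + test_size <= n_samples:
            train_idx = range(start, start + train_size)
            test_idx = range(start + train_size, start + train_size + test_size)
            yield train_idx, test_idx
            start += step  # roll the window forward and repeat

    # Example: 1,000 daily observations, train on 500 days, evaluate on the next 60.
    for train_idx, test_idx in walk_forward_splits(1000, train_size=500, test_size=60, step=60):
        pass  # fit on train_idx, then score out-of-sample on test_idx

Each test window sits strictly after its training window, so every evaluation period is out-of-sample relative to the data the model saw.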

2. Avoiding lookahead bias

Lookahead bias occurs when a model inadvertently uses future information during training or feature construction. StockPredict AI avoids this by:

  • Building features only from data available up to each prediction timestamp.
  • Separating training and evaluation periods in time (no shuffling across the timeline).
  • Using holdout windows that simulate “live” deployment conditions.

The goal is not to “fit the past” perfectly, but to estimate how the model might behave on unseen data.
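
As a concrete illustration, the sketch below builds features from trailing data only and takes the target from the following period; the pandas column names and window lengths are hypothetical examples, not StockPredict AI's actual feature set:

    import pandas as pd

    def build_point_in_time_features(prices: pd.Series) -> pd.DataFrame:
        """Build features that use only data available up to each timestamp."""
        df = pd.DataFrame({"close": prices})
        # Features look strictly backward from each row's timestamp.
        df["ret_1d"] = df["close"].pct_change()        # prior-period return
        df["ma_20"] = df["close"].rolling(20).mean()   # trailing 20-period average
        # The target is the *next* period's return; it is never used as a feature.
        df["target_next_ret"] = df["close"].pct_change().shift(-1)
        return df.dropna()

    # Training and evaluation are split by date, never shuffled across the timeline,
    # e.g. train = df.loc[:"2022-12-31"], test = df.loc["2023-01-01":]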

3. Evaluation metrics

The platform tracks multiple metrics to understand performance from different angles:

  • MAE (Mean Absolute Error): average absolute difference between predicted and realized returns/prices.
  • Directional accuracy: how often the model predicts the sign correctly (up vs. down).
  • Calibration / Brier score: how well predicted probabilities match observed frequencies.
  • Rank metrics: how well the model orders stocks (e.g., top decile vs. bottom decile).

No single metric tells the full story; a dashboard of metrics helps catch drift and overfitting.
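
For reference, the core metrics are straightforward to compute. The sketch below shows one plausible formulation; the function names and inputs are illustrative, not the platform's API:

    import numpy as np

    def mae(y_true, y_pred):
        """Mean absolute error between predicted and realized values."""
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

    def directional_accuracy(y_true, y_pred):
        """Fraction of periods where the predicted sign matches the realized sign."""
        return np.mean(np.sign(y_true) == np.sign(y_pred))

    def brier_score(p_up, went_up):
        """Mean squared gap between the predicted probability of 'up' and the 0/1 outcome."""
        return np.mean((np.asarray(p_up) - np.asarray(went_up)) ** 2)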

4. Drift monitoring

Markets change, so the pipeline includes drift checks to detect when the relationship between features and targets may be breaking down:

  • Population Stability Index (PSI) between historical and recent feature distributions.
  • Rolling directional accuracy and error metrics over time.
  • Hit rates across market segments and regimes.

When drift is detected, models may need retraining, retuning, or, in some cases, feature redesign.
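
As an illustration of the first check, PSI can be computed by binning the historical (reference) distribution and comparing recent data against those same bins. The bin count and the ~0.2 rule of thumb below are conventional choices, not necessarily the platform's exact settings:

    import numpy as np

    def population_stability_index(reference, recent, bins=10):
        """PSI between a historical feature sample and a recent one."""
        edges = np.histogram_bin_edges(reference, bins=bins)
        ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
        new_pct = np.histogram(recent, bins=edges)[0] / len(recent)
        # Avoid log(0) / division by zero for empty bins.
        ref_pct = np.clip(ref_pct, 1e-6, None)
        new_pct = np.clip(new_pct, 1e-6, None)
        return np.sum((new_pct - ref_pct) * np.log(new_pct / ref_pct))

    # A common rule of thumb: PSI above ~0.2 suggests a meaningful distribution shift.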

5. Interpreting results

Even with careful validation, ML forecasts are noisy. StockPredict AI emphasizes:

  • Using predictions as probabilistic signals, not guarantees.
  • Combining model outputs with human judgment and risk management.
  • Being transparent about limitations and assumptions.

The platform is designed as an educational and research tool, not a plug-and-play trading system.

Example evaluation snapshot

The monitoring layer tracks multiple signals together so regressions are visible quickly.

  • Directional accuracy: rolling, by horizon & ticker.
  • Calibration: Brier score, predicted probabilities vs. observed outcomes.
  • Data drift: PSI, feature distribution shift.

For end-to-end pipeline details, see How it works.