8. Forecast Evaluation Metrics
💪 Step 1: Intuition & Motivation
Core Idea: Building a forecasting model is only half the story – the other half is judging how good it is.
But unlike classification (where accuracy or F1-score works), time series forecasting deals with continuous values – future temperatures, stock prices, or sales volumes. We don't ask whether a forecast is "right or wrong"; we ask how far off it is.
That's why we use error-based metrics, which tell us how close our predictions are to the actual truth.
Simple Analogy: If you throw darts at a board, accuracy tells you how many hit the bullseye – but in forecasting, we measure how far each dart lands from the center.
🌱 Step 2: Core Concept
Let's uncover how each metric views error differently – some forgive small mistakes, others heavily punish big ones.
What's Happening Under the Hood?
🧮 1. Mean Absolute Error (MAE)
The simplest and most intuitive metric.
$$ MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t| $$

It measures the average absolute difference between predictions ($\hat{y}_t$) and actuals ($y_t$).
- Treats all errors linearly – a 10-unit miss costs exactly ten times a 1-unit miss.
- Easy to interpret: "On average, my forecast is off by X units."
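A minimal sketch of MAE with NumPy – the series values below are invented purely for illustration:

```python
import numpy as np

def mae(y_true, y_pred):
    """Average absolute miss, in the same units as the data."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = [100.0, 102.0, 98.0, 105.0]
y_pred = [98.0, 103.0, 99.0, 110.0]
print(mae(y_true, y_pred))  # (2 + 1 + 1 + 5) / 4 = 2.25
```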
🔥 2. Root Mean Squared Error (RMSE)
The fancier cousin of MAE.
$$ RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2} $$

RMSE squares errors before averaging – so large mistakes hurt more. If you're forecasting electricity load or revenue, where big mistakes are disastrous, RMSE highlights them.
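The same sketch for RMSE, reusing the invented series from above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square the errors, average, then take the root: big misses dominate."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# RMSE (~2.78) comes out above MAE (2.25) because of the single 5-unit miss.
print(rmse([100.0, 102.0, 98.0, 105.0], [98.0, 103.0, 99.0, 110.0]))
```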
📈 3. Mean Absolute Percentage Error (MAPE)
Gives errors as a percentage – easier to communicate to non-technical folks.
$$ MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| $$

MAPE says: "On average, our forecasts are X% off." But it fails when actual values ($y_t$) are near zero – the division explodes!
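A minimal sketch showing both the friendly case and the near-zero failure mode (values invented):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Unstable near y_true = 0."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

print(mape([100, 200, 50], [110, 180, 55]))  # every term is 10% -> 10.0
print(mape([0.01, 200, 50], [1, 180, 55]))   # first term alone is 9900% -> explodes
```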
⚖️ 4. Symmetric MAPE (SMAPE)
A more stable version of MAPE that avoids zero-division issues.
$$ SMAPE = \frac{100}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{(|y_t| + |\hat{y}_t|)/2} $$

By dividing by the average of the predicted and actual magnitudes, SMAPE stays well-behaved even for tiny targets.
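The same near-zero case that broke MAPE stays bounded here – each SMAPE term can never exceed 200%:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE, in percent; denominator is positive unless both values are 0."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

print(smape([0.01, 200, 50], [1, 180, 55]))  # ~72%, instead of MAPE's ~3300%
```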
Why It Works This Way
Each metric tells a slightly different story:
- MAE: Measures typical error – stable and interpretable.
- RMSE: Highlights large deviations – sensitive to outliers.
- MAPE: Communicates scale-free percentage errors – intuitive, but unstable near zero.
- SMAPE: Balances percentage comparison and robustness – good for mixed-scale data.
In forecasting, no single metric rules them all. You pick based on what matters most: interpretability, penalty severity, or scale-independence.
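To make the MAE-vs-RMSE contrast concrete, here is a tiny sketch with invented residuals – nine small misses and one large one:

```python
import numpy as np

# Nine 1-unit misses and a single 10-unit miss.
errors = np.array([1.0] * 9 + [10.0])

mae_val = np.mean(np.abs(errors))         # (9*1 + 10) / 10 = 1.9
rmse_val = np.sqrt(np.mean(errors ** 2))  # sqrt((9*1 + 100) / 10) ~ 3.30

# One outlier barely moves MAE but pulls RMSE well above it.
print(mae_val, rmse_val)
```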
How It Fits in ML Thinking
In machine learning, we evaluate models with loss functions – forecasting metrics are just loss functions for temporal predictions.
They help compare models (ARIMA vs Prophet vs XGBoost), select hyperparameters, and track model drift over time.
But one crucial rule stands:
Evaluation must always respect time – no random splits or shuffling!
Use chronological train-test splits to ensure the model is judged on unseen future data, just like in production.
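A minimal sketch of a chronological split, assuming a simple 80/20 cut; scikit-learn's TimeSeriesSplit covers the rolling-evaluation case:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(100, dtype=float)         # stand-in for a series already in time order
split = int(len(y) * 0.8)
y_train, y_test = y[:split], y[split:]  # train on the past, evaluate on the future

# For repeated evaluation, TimeSeriesSplit keeps every fold chronological:
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y):
    assert train_idx.max() < test_idx.min()  # test data always comes after training data
```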
📐 Step 3: Mathematical Foundation
Let's see the math side-by-side for quick comparison:
| Metric | Formula | Penalizes Large Errors? | Scale-Free? | Handles Zero Values? |
|---|---|---|---|---|
| MAE | $\frac{1}{n} \sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert$ | ❌ No | ❌ No | ✅ Yes |
| RMSE | $\sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$ | ✅ Yes | ❌ No | ✅ Yes |
| MAPE | $\displaystyle \frac{100}{n}\sum_{t=1}^{n}\left\lvert \frac{y_t-\hat{y}_t}{y_t} \right\rvert$ | ❌ No | ✅ Yes | ❌ No |
| SMAPE | $\frac{100}{n} \sum_{t=1}^{n} \frac{\lvert y_t - \hat{y}_t \rvert}{(\lvert y_t \rvert + \lvert \hat{y}_t \rvert)/2}$ | ❌ No | ✅ Yes | ✅ Yes |
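For sanity-checking, the first three metrics also ship with scikit-learn (MAPE since version 0.24, returned as a fraction rather than a percent; SMAPE is not built in, so the sketch above still applies):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error)

y_true = np.array([100.0, 102.0, 98.0, 105.0])
y_pred = np.array([98.0, 103.0, 99.0, 110.0])

print(mean_absolute_error(y_true, y_pred))                   # MAE = 2.25
print(np.sqrt(mean_squared_error(y_true, y_pred)))           # RMSE ~ 2.78
print(100 * mean_absolute_percentage_error(y_true, y_pred))  # MAPE ~ 2.19 (%)
```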
🧠 Step 4: Assumptions or Key Ideas
- Metrics assume predictions and actuals are aligned in time.
- Evaluation uses future data only (no shuffling).
- Different metrics fit different business goals – e.g., RMSE where large misses are costly, MAPE where stakeholders need percentages.
⚖️ Step 5: Strengths, Limitations & Trade-offs
✅ Strengths
- Provide quantitative, comparable performance across models.
- Simple to compute and interpret.
- Customizable to match business objectives.
⚠️ Limitations
- MAE and RMSE depend on data scale (hard to compare across datasets).
- MAPE/SMAPE struggle with near-zero actuals.
- None capture directional accuracy (whether you over- or under-predicted); see the bias sketch below.
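Since none of the four metrics keep the sign of the error, a simple companion metric – plain mean error, often called forecast bias – can reveal systematic over- or under-prediction. A minimal sketch with invented values:

```python
import numpy as np

def bias(y_true, y_pred):
    """Mean error keeps the sign: positive means over-prediction on average."""
    return float(np.mean(np.asarray(y_pred) - np.asarray(y_true)))

print(bias([100, 100], [110, 110]))  # 10.0 -> consistently over-predicting
print(bias([100, 100], [110, 90]))   #  0.0 -> same MAE (10), but the bias cancels
```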
⚖️ Trade-offs
- MAE – robust and interpretable, but doesn't single out large misses.
- RMSE – sensitive to large errors; great when big misses are costly.
- MAPE/SMAPE – ideal for business communication, but can misbehave on small targets.

Choosing a metric is really about what hurts most in your domain – a big miss or consistent small ones.
🚧 Step 6: Common Misunderstandings
- "We can use accuracy or R² for forecasting." → Wrong: those metrics assume categorical labels or independent samples.
- "RMSE is always better than MAE." → Not necessarily: RMSE just emphasizes outliers more.
- "MAPE is safe for all data." → Be careful when actuals approach zero; it can explode.
🧩 Step 7: Mini Summary
🧠 What You Learned: Forecasting metrics quantify how close predictions are to reality – using absolute, squared, or percentage-based error measures.
⚙️ How It Works: Metrics like MAE, RMSE, MAPE, and SMAPE each highlight different aspects of model performance – average error, large deviations, or scale-free comparison.
🎯 Why It Matters: Choosing the right metric ensures models are judged fairly and aligned with real-world goals – accuracy means nothing if the error cost isn't understood.