8. Forecast Evaluation Metrics
💪 Step 1: Intuition & Motivation
Core Idea: Building a forecasting model is only half the story – the other half is judging how good it is.
But unlike classification (where accuracy or F1-score works), time series forecasting deals with continuous values – future temperatures, stock prices, or sales volumes. We don't ask whether a forecast is "right or wrong"; we ask how far off it is.
That's why we use error-based metrics, which tell us how close our predictions are to the actual truth.
Simple Analogy: If you throw darts at a board, accuracy tells you how many hit the bullseye – but in forecasting, we measure how far each dart lands from the center.
🌱 Step 2: Core Concept
Let's uncover how each metric views error differently – some forgive small mistakes, others heavily punish big ones.
What's Happening Under the Hood?
🧮 1. Mean Absolute Error (MAE)
The simplest and most intuitive metric.
$$ MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t| $$

It measures the average absolute difference between predictions ($\hat{y}_t$) and actuals ($y_t$).
- Treats all errors linearly – a 10-unit miss costs exactly ten times a 1-unit miss.
- Easy to interpret: "On average, my forecast is off by X units."
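A minimal sketch of MAE with NumPy – the series values below are invented purely for illustration:

```python
import numpy as np

def mae(y_true, y_pred):
    """Average absolute miss, in the same units as the data."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = [100.0, 102.0, 98.0, 105.0]
y_pred = [98.0, 103.0, 99.0, 110.0]
print(mae(y_true, y_pred))  # (2 + 1 + 1 + 5) / 4 = 2.25
```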
🔥 2. Root Mean Squared Error (RMSE)
The fancier cousin of MAE.
$$ RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2} $$

RMSE squares errors before averaging – so large mistakes hurt more. If you're forecasting electricity load or revenue, where big mistakes are disastrous, RMSE highlights them.
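The same sketch for RMSE, reusing the invented series from above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square the errors, average, then take the root: big misses dominate."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# RMSE (~2.78) comes out above MAE (2.25) because of the single 5-unit miss.
print(rmse([100.0, 102.0, 98.0, 105.0], [98.0, 103.0, 99.0, 110.0]))
```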
📈 3. Mean Absolute Percentage Error (MAPE)
Gives errors as a percentage – easier to communicate to non-technical folks.
$$ MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| $$

MAPE says: "On average, our forecasts are X% off." But it fails when actual values ($y_t$) are near zero – the division explodes!
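A minimal sketch showing both the friendly case and the near-zero failure mode (values invented):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Unstable near y_true = 0."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

print(mape([100, 200, 50], [110, 180, 55]))  # every term is 10% -> 10.0
print(mape([0.01, 200, 50], [1, 180, 55]))   # first term alone is 9900% -> explodes
```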
⚖️ 4. Symmetric MAPE (SMAPE)
A more stable version of MAPE that avoids zero-division issues.
$$ SMAPE = \frac{100}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{(|y_t| + |\hat{y}_t|)/2} $$

By dividing by the average of the predicted and actual magnitudes, SMAPE stays well-behaved even for tiny targets.
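The same near-zero case that broke MAPE stays bounded here – each SMAPE term can never exceed 200%:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE, in percent; denominator is positive unless both values are 0."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

print(smape([0.01, 200, 50], [1, 180, 55]))  # ~72%, instead of MAPE's ~3300%
```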
Why It Works This Way
Each metric tells a slightly different story:
- MAE: Measures typical error – stable and interpretable.
- RMSE: Highlights large deviations – sensitive to outliers.
- MAPE: Communicates scale-free percentage errors – intuitive, but unstable near zero.
- SMAPE: Balances percentage comparison and robustness – good for mixed-scale data.
In forecasting, no single metric rules them all. You pick based on what matters most: interpretability, penalty severity, or scale-independence.
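To make the MAE-vs-RMSE contrast concrete, here is a tiny sketch with invented residuals – nine small misses and one large one:

```python
import numpy as np

# Nine 1-unit misses and a single 10-unit miss.
errors = np.array([1.0] * 9 + [10.0])

mae_val = np.mean(np.abs(errors))         # (9*1 + 10) / 10 = 1.9
rmse_val = np.sqrt(np.mean(errors ** 2))  # sqrt((9*1 + 100) / 10) ~ 3.30

# One outlier barely moves MAE but pulls RMSE well above it.
print(mae_val, rmse_val)
```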
How It Fits in ML Thinking
In machine learning, we evaluate models with loss functions – forecasting metrics are just loss functions for temporal predictions.
They help compare models (ARIMA vs Prophet vs XGBoost), select hyperparameters, and track model drift over time.
But one crucial rule stands:
Evaluation must always respect time – no random splits or shuffling!
Use chronological train-test splits to ensure the model is judged on unseen future data, just like in production.
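A minimal sketch of a chronological split, assuming a simple 80/20 cut; scikit-learn's TimeSeriesSplit covers the rolling-evaluation case:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(100, dtype=float)         # stand-in for a series already in time order
split = int(len(y) * 0.8)
y_train, y_test = y[:split], y[split:]  # train on the past, evaluate on the future

# For repeated evaluation, TimeSeriesSplit keeps every fold chronological:
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y):
    assert train_idx.max() < test_idx.min()  # test data always comes after training data
```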
📐 Step 3: Mathematical Foundation
Let's see the math side-by-side for quick comparison:
| Metric | Formula | Penalizes Large Errors? | Scale-Free? | Handles Zero Values? |
|---|---|---|---|---|
| MAE | $\frac{1}{n} \sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert$ | ❌ No | ❌ No | ✅ Yes |
| RMSE | $\sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$ | ✅ Yes | ❌ No | ✅ Yes |
| MAPE | $\displaystyle \frac{100}{n}\sum_{t=1}^{n}\left\lvert \frac{y_t-\hat{y}_t}{y_t} \right\rvert$ | ❌ No | ✅ Yes | ❌ No |
| SMAPE | $\frac{100}{n} \sum_{t=1}^{n} \frac{\lvert y_t - \hat{y}_t \rvert}{(\lvert y_t \rvert + \lvert \hat{y}_t \rvert)/2}$ | ❌ No | ✅ Yes | ✅ Yes |
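For sanity-checking, the first three metrics also ship with scikit-learn (MAPE since version 0.24, returned as a fraction rather than a percent; SMAPE is not built in, so the sketch above still applies):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error)

y_true = np.array([100.0, 102.0, 98.0, 105.0])
y_pred = np.array([98.0, 103.0, 99.0, 110.0])

print(mean_absolute_error(y_true, y_pred))                   # MAE = 2.25
print(np.sqrt(mean_squared_error(y_true, y_pred)))           # RMSE ~ 2.78
print(100 * mean_absolute_percentage_error(y_true, y_pred))  # MAPE ~ 2.19 (%)
```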
🧠 Step 4: Assumptions or Key Ideas
- Metrics assume predictions and actuals are aligned in time.
- Evaluation uses future data only (no shuffling).
- Different metrics fit different business goals – e.g., RMSE where large misses are costly, MAPE where stakeholders need percentages.
⚖️ Step 5: Strengths, Limitations & Trade-offs
✅ Strengths
- Provide quantitative, comparable performance across models.
- Simple to compute and interpret.
- Customizable to match business objectives.
⚠️ Limitations
- MAE and RMSE depend on data scale (hard to compare across datasets).
- MAPE/SMAPE struggle with near-zero actuals.
- None capture directional accuracy (whether you over- or under-predicted); see the bias sketch below.
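Since none of the four metrics keep the sign of the error, a simple companion metric – plain mean error, often called forecast bias – can reveal systematic over- or under-prediction. A minimal sketch with invented values:

```python
import numpy as np

def bias(y_true, y_pred):
    """Mean error keeps the sign: positive means over-prediction on average."""
    return float(np.mean(np.asarray(y_pred) - np.asarray(y_true)))

print(bias([100, 100], [110, 110]))  # 10.0 -> consistently over-predicting
print(bias([100, 100], [110, 90]))   #  0.0 -> same MAE (10), but the bias cancels
```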
⚖️ Trade-offs
- MAE – robust and interpretable, but doesn't single out large misses.
- RMSE – sensitive to large errors; great when big misses are costly.
- MAPE/SMAPE – ideal for business communication, but can misbehave on small targets.

Choosing a metric is really about what hurts most in your domain – a big miss or consistent small ones.
🚧 Step 6: Common Misunderstandings
- "We can use accuracy or R² for forecasting." → Wrong: those metrics assume categorical labels or independent samples.
- "RMSE is always better than MAE." → Not necessarily: RMSE just emphasizes outliers more.
- "MAPE is safe for all data." → Be careful when actuals approach zero; it can explode.
🧩 Step 7: Mini Summary
🧠 What You Learned: Forecasting metrics quantify how close predictions are to reality – using absolute, squared, or percentage-based error measures.
⚙️ How It Works: Metrics like MAE, RMSE, MAPE, and SMAPE each highlight different aspects of model performance – average error, large deviations, or scale-free comparison.
🎯 Why It Matters: Choosing the right metric ensures models are judged fairly and aligned with real-world goals – accuracy means nothing if the error cost isn't understood.