2. Stationarity & Differencing

🪄 Step 1: Intuition & Motivation

Core Idea: In time series modeling, we crave stability. If the statistical properties (like mean and variance) of a series keep changing over time, we can’t learn consistent patterns — it’s like aiming at a moving target. A stationary series is one whose story stays consistent: same average behavior, same ups and downs, same volatility throughout.

Simple Analogy: Imagine driving on a flat road — your car runs smoothly. That’s stationary. Now imagine the road slopes up and down unpredictably — your speed keeps fluctuating. That’s non-stationary. Models prefer the flat road; they understand it better.


🌱 Step 2: Core Concept

Let’s unwrap what stationarity means and how to achieve it.


What’s Happening Under the Hood?

A stationary time series is one whose statistical character doesn’t change with time. Formally, its mean and variance are constant, and its autocovariance depends only on the lag between observations.

  • Mean → constant over time (no upward/downward trend)
  • Variance → consistent spread (no volatility bursts)
  • Covariance → depends only on lag, not on actual time

In simpler words: the process generating data behaves the same way yesterday, today, and tomorrow.

When the mean drifts upward → trend non-stationarity. When variance grows or shrinks over time → variance non-stationarity.


Why It Works This Way

Most time series models (like ARIMA) assume stability in data relationships. If the mean or variance changes, the model can’t learn one consistent rule — it’s like trying to predict grades while the grading system keeps changing.

So before modeling, we must “stabilize” the series — remove trends or varying volatility so the model focuses on relationships rather than drifts.


How It Fits in ML Thinking

In traditional ML, we often normalize or standardize data so the model trains smoothly. In time series, differencing and transformation serve a similar purpose — they normalize behavior over time.

Stationarity ensures that the relationships learned from the past will still hold in the future — a critical assumption for forecasting.


📐 Step 3: Mathematical Foundation

Let’s peek into the formulas behind the idea.


Stationarity Definition

A time series $\{X_t\}$ is strictly stationary if, for all time shifts $k$,

$$ P(X_{t_1}, X_{t_2}, \dots, X_{t_n}) = P(X_{t_1+k}, X_{t_2+k}, \dots, X_{t_n+k}) $$

This means the joint probability distribution doesn’t change when you shift through time. In practice, we relax this to weak stationarity, which requires only a constant mean, a constant variance, and an autocovariance that depends only on the lag.

Stationarity = same pattern, different timeline. The series behaves the same way no matter when you observe it.
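To make this concrete, here is a minimal sketch (using NumPy; the seed and window sizes are arbitrary choices) comparing a stationary series with a non-stationary one:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

white_noise = rng.normal(0, 1, n)     # stationary: mean and variance stay constant
random_walk = np.cumsum(white_noise)  # non-stationary: variance grows with time

# For a stationary series, early and late windows look statistically alike.
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    early, late = series[: n // 4], series[-n // 4:]
    print(f"{name:12s} var(early)={early.var():6.2f}  var(late)={late.var():6.2f}")
```

The white-noise variances agree across windows; the random walk’s late window is far more spread out — its “story” changes as time passes.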

Differencing Formula
$$ Y_t = X_t - X_{t-1} $$
  • $Y_t$: differenced value (new stationary series)
  • $X_t$: original observation
  • $X_{t-1}$: previous observation

This operation removes trends — it measures change, not level.

If one round of differencing isn’t enough, apply it twice (the second difference):

$$ Y_t = (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2} $$

Differencing is like focusing on how much something changes, not its absolute value. You stop caring about “where you are,” only about “how you moved.”
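As a sketch of this in code (pandas’ `diff()` implements exactly this formula; the trending series here is synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic series with a linear upward trend: clearly non-stationary in the mean
x = pd.Series(0.5 * np.arange(200) + rng.normal(0, 1, 200))

y1 = x.diff()         # first difference: Y_t = X_t - X_{t-1}
y2 = x.diff().diff()  # second difference, for stubborn trends (rarely needed)

# The differenced series hovers around the slope (0.5) instead of drifting upward.
print(f"mean of first half of y1: {y1[:100].mean():.2f}, "
      f"mean of second half: {y1[100:].mean():.2f}")
```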

ADF (Augmented Dickey-Fuller) Test

The ADF test checks for a unit root — a sign of non-stationarity.

$$ \Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \delta_1 \Delta X_{t-1} + \dots + \epsilon_t $$
  • If $\gamma = 0$ → series has a unit root (non-stationary)
  • If $\gamma < 0$ → series is stationary

The test outputs a p-value:

  • p < 0.05 → reject the null → stationary
  • p ≥ 0.05 → fail to reject → non-stationary

In plain terms, the ADF test asks: “Is this series just wandering aimlessly?” If yes → non-stationary. If no → it has a stable pattern → stationary.
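In Python, a common way to run this test is `adfuller` from statsmodels — a minimal sketch on a simulated random walk (the series and seed are illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(0, 1, 500))  # non-stationary by construction

for name, series in [("levels", random_walk),
                     ("first difference", np.diff(random_walk))]:
    stat, pvalue, *_ = adfuller(series)
    verdict = "stationary" if pvalue < 0.05 else "non-stationary"
    print(f"{name:16s} ADF stat={stat:7.2f}  p={pvalue:.3f} -> {verdict}")
```

The raw random walk should fail the test, while its first difference (plain white noise) should pass.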

🧠 Step 4: Assumptions or Key Ideas

  • Stationarity is required for models like ARIMA to make valid predictions.
  • Differencing assumes the trend is linear or smooth enough to remove.
  • ADF assumes residuals (noise) are uncorrelated after differencing.
  • Rolling mean/variance helps visually detect instability (see the sketch just below).
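A minimal sketch of that visual check (the window size of 50 is an arbitrary choice; matplotlib is assumed for plotting):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = pd.Series(np.cumsum(rng.normal(0, 1, 500)))  # simulated non-stationary series

window = 50
stats = pd.DataFrame({
    "series": x,
    "rolling mean": x.rolling(window).mean(),  # drifts if the mean is unstable
    "rolling std": x.rolling(window).std(),    # wanders if the variance is unstable
})
stats.plot(title="Rolling statistics: flat lines suggest stationarity")
plt.show()
```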

⚖️ Step 5: Strengths, Limitations & Trade-offs

Strengths

  • Simplifies modeling — removes moving trends and shifts.
  • Enables use of powerful linear models (ARIMA, SARIMA).
  • Differencing is easy and intuitive to apply.

⚠️ Limitations

  • Over-differencing can erase meaningful structure.
  • ADF test can be unreliable with small samples.
  • Variance-stationarity issues remain unsolved by differencing alone — we may need log, Box-Cox, or power transforms.
⚖️ Trade-offs

Differencing stabilizes the mean, but not always the variance. That’s where transformations like log (for exponential growth) or Box-Cox (for power scaling) step in. It’s a balancing act: remove instability without erasing useful information.
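As a sketch of those transformations (SciPy’s `boxcox` estimates the power parameter; the exponentially growing series here is synthetic):

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(3)
# Synthetic series with exponential growth: the variance rises with the level
x = np.exp(0.01 * np.arange(300) + rng.normal(0, 0.1, 300))

log_x = np.log(x)      # log transform: tames exponential growth
bc_x, lam = boxcox(x)  # Box-Cox: fits the power lambda by maximum likelihood

# A lambda near 0 means Box-Cox effectively chose a log-like transform.
print(f"estimated Box-Cox lambda: {lam:.2f}")
```

Note that Box-Cox requires strictly positive data; shift or rescale the series first if it contains zeros or negatives.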

🚧 Step 6: Common Misunderstandings

  • “Differencing always fixes stationarity.” ❌ Sometimes only the mean stabilizes — variance may still fluctuate.
  • “ADF test says non-stationary → hopeless.” ❌ You can often fix it with transformations.
  • “Stationarity means flat line.” ❌ A stationary series can wiggle, as long as its wiggles are statistically consistent.

🧩 Step 7: Mini Summary

🧠 What You Learned: Stationarity means the statistical behavior of a time series stays consistent through time — a vital condition for forecasting models.

⚙️ How It Works: By using differencing and tests like ADF, we stabilize the mean and identify when our data stops “drifting.”

🎯 Why It Matters: Without stationarity, model parameters drift too, and forecasts become unreliable. Achieving stationarity gives your model solid ground to stand on.
