7. Feature Engineering for Time Series ML
🪄 Step 1: Intuition & Motivation
Core Idea: Traditional ML algorithms (like XGBoost, Random Forests, or Neural Nets) can’t see time. They treat each row of data as independent — but in time series, each row is connected to the past.
So, before ML models can forecast, we must teach them the language of time. That’s what feature engineering does — it transforms a sequential story into a format ML can understand, by creating meaningful features that encode history, patterns, and temporal context.
Simple Analogy: Imagine you’re predicting a student’s next test score. You wouldn’t just use their name — you’d use their previous scores, study consistency, and exam season. That’s exactly what we do in time series: create lagged versions of the past to predict the future.
🌱 Step 2: Core Concept
Let’s unpack how we “reshape” time series for machine learning.
What’s Happening Under the Hood?
🕰️ Lag Features
Lag features capture how the past influences the present. For a time series $X_t$:
- lag_1 = $X_{t-1}$ (previous day's value)
- lag_7 = $X_{t-7}$ (last week's value)
These lags act like “memory snapshots.”
For instance:
If you’re predicting tomorrow’s sales, lag_7 tells you what sales looked like the same day last week — a strong seasonal clue.
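As a minimal sketch, here is how lag features might look in pandas. The DataFrame `df`, its `sales` column, and the date range are all illustrative assumptions, not from the original text:

```python
import pandas as pd

# Hypothetical daily sales series (names and values are illustrative)
df = pd.DataFrame(
    {"sales": [100, 120, 115, 130, 125, 140, 135, 150, 145, 160, 155, 170, 165, 180]},
    index=pd.date_range("2024-01-01", periods=14, freq="D"),
)

# Lag features: shift pushes past values forward so the row at time t sees X_{t-1} and X_{t-7}
df["lag_1"] = df["sales"].shift(1)   # previous day's value
df["lag_7"] = df["sales"].shift(7)   # same day last week

print(df.head(10))
```

The first few rows contain NaNs because their lags would reach before the start of the series; those rows are typically dropped before training.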
📈 Rolling Features
Rolling (or moving) features summarize patterns over a sliding window.
Examples:
- Rolling Mean (trend): $\text{mean}(X_{t-3:t})$
- Rolling Std (volatility): $\text{std}(X_{t-7:t})$
They give your model a sense of “momentum” — is the value rising, steady, or fluctuating?
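Continuing the hypothetical `df` from the lag sketch above, rolling features can be computed like this. The `shift(1)` is an assumption chosen so each window ends strictly before the row being predicted, in line with the "no future information" rule later in this section:

```python
# Rolling features over a sliding window.
# shift(1) first, so the 3-day window covers X_{t-3}..X_{t-1} and the
# 7-day window covers X_{t-7}..X_{t-1}; neither includes the current value.
df["roll_mean_3"] = df["sales"].shift(1).rolling(window=3).mean()  # short-term trend
df["roll_std_7"] = df["sales"].shift(1).rolling(window=7).std()    # recent volatility
```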
📆 Time-Based Encodings
These extract cyclical patterns like:
- Day of week (0–6)
- Month (1–12)
- Quarter (1–4)
But beware — the calendar is cyclical! December and January are close, even though numerically they’re far apart (12 vs 1). To handle this, we use cyclical encodings:
$$ \text{sin\_month} = \sin\left(\frac{2\pi \cdot \text{month}}{12}\right) $$

$$ \text{cos\_month} = \cos\left(\frac{2\pi \cdot \text{month}}{12}\right) $$

This makes "December" and "January" close again — restoring periodic logic.
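A short sketch of these encodings, again reusing the hypothetical `df` with a datetime index from the earlier examples:

```python
import numpy as np

# Cyclical month encoding: sin/cos map month 12 and month 1 to nearby points
# on the unit circle, restoring the calendar's periodicity.
month = df.index.month
df["sin_month"] = np.sin(2 * np.pi * month / 12)
df["cos_month"] = np.cos(2 * np.pi * month / 12)

# Plain ordinal encodings are still useful for tree-based models
df["day_of_week"] = df.index.dayofweek  # 0 = Monday, 6 = Sunday
df["quarter"] = df.index.quarter        # 1-4
```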
Why It Works This Way
Machine learning models love structured, tabular input. But time series data is inherently sequential — it lives in one column ($X_t$).
Feature engineering unfolds that sequence into columns of memory: lagged values, moving summaries, and date encodings. This gives ML models the context they need to learn temporal dependencies — without violating chronological order.
Essentially, you’re converting your series into a supervised learning dataset:
$$ (X_{t-1}, X_{t-2}, X_{t-3}, \dots) \rightarrow X_t $$

This allows even models that have no "sense of time" to learn forecasting patterns.
How It Fits in ML Thinking
Think of each lag or window feature as a “temporal feature column.” In deep learning, RNNs or Transformers handle this automatically by remembering sequences — but for tree-based or linear models, we have to build those memories manually.
This process bridges the world of classical time series and machine learning pipelines — enabling hybrid forecasting systems that scale.
📐 Step 3: Mathematical Foundation
Let’s formalize this transformation.
Supervised Transformation
Given a univariate time series $X_t$, we create features as:
$$ \text{Feature Matrix } F_t = [X_{t-1}, X_{t-2}, \dots, X_{t-n}] $$

and

$$ \text{Target } y_t = X_t $$

Now each row represents a snapshot of the past $n$ steps used to predict the current step.
This process is called windowing — turning the temporal sequence into supervised samples.
Sliding Window Mechanism
A sliding window keeps moving forward — always using the latest $n$ points to predict the next one.
For example, with window size = 3:
| Time | X_t | lag_1 | lag_2 | lag_3 |
|---|---|---|---|---|
| t=4 | 10 | 9 | 8 | 7 |
| t=5 | 12 | 10 | 9 | 8 |
| t=6 | 11 | 12 | 10 | 9 |
This ensures training examples reflect the evolving nature of time — without peeking into the future.
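As a sketch, the windowing in the table above can be reproduced with a few lines of pandas. The toy series 7, 8, 9, 10, 12, 11 and the variable names are illustrative:

```python
import pandas as pd

# Toy series indexed by t = 1..6
series = pd.Series([7, 8, 9, 10, 12, 11], index=range(1, 7), name="X_t")

# Turn the sequence into a supervised dataset: target X_t plus lag features
supervised = pd.DataFrame({"X_t": series})
for k in (1, 2, 3):
    supervised[f"lag_{k}"] = series.shift(k)

# Drop the first rows, whose windows would extend before the start of the series
supervised = supervised.dropna()
print(supervised)
# Rows t=4, 5, 6 match the table: e.g. t=4 -> X_t=10 with lags 9, 8, 7
```

From here, `supervised[["lag_1", "lag_2", "lag_3"]]` is the feature matrix $F_t$ and `supervised["X_t"]` is the target $y_t$.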
🧠 Step 4: Assumptions or Key Ideas
- No Future Information: Never use future data when building features — that’s data leakage.
- Consistent Intervals: Missing timestamps must be filled (interpolation or forward-fill).
- Window Size Matters: Too small → not enough context; too large → unnecessary noise.
- Temporal Alignment: Every feature used for prediction must come strictly before the target time.
⚖️ Step 5: Strengths, Limitations & Trade-offs
✅ Strengths
- Enables ML models to forecast sequential data.
- Flexible: works for univariate and multivariate series.
- Can capture nonlinear relationships better than ARIMA-like models.
⚠️ Limitations
- Manual engineering can be tedious for large-scale systems.
- Risk of leakage if temporal order isn’t strictly maintained.
- Lacks interpretability compared to ARIMA-family models.
🚧 Step 6: Common Misunderstandings
- “Lag features can include future data.” ❌ Never — that’s data leakage.
- “Rolling means are always safe.” ❌ Only if each window is computed from data available before the target time; it must never include the target value or anything after it.
- “Calendar encodings are just numeric.” ❌ Use cyclical encodings to preserve time continuity.
🧩 Step 7: Mini Summary
🧠 What You Learned: Feature engineering transforms time-dependent data into ML-friendly format using lags, rolling windows, and time encodings.
⚙️ How It Works: By creating lag-based, rolling, and cyclical features, we help ML models learn temporal patterns without violating time order.
🎯 Why It Matters: It’s the key bridge between statistical forecasting and modern ML — allowing scalable, automated, and robust forecasting pipelines.