Time Series Analysis
Note
The Top Tech Company Angle: Time series problems test your ability to reason under temporal dependencies — where yesterday influences today. Interviewers want to see if you understand why shuffling breaks the data, how to diagnose non-stationarity, and whether you can apply the right transformation or model (ARIMA, SARIMA, Prophet, LSTM) for structured forecasting tasks. Your mastery here shows that you can handle real-world, sequential data like financial trends, server load, or user engagement over time.
1: Understanding Temporal Dependencies and Data Structure
- Begin with the definition: a time series is a sequence of observations indexed in time order (e.g., daily sales, hourly temperature).
- Understand components — trend, seasonality, cyclic behavior, and residuals.
- Learn how autocorrelation captures the relationship between past and present values.
- Visualize data to detect non-stationarity (e.g., using rolling mean/variance).
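A minimal rolling-statistics sketch with pandas, assuming `series` is a pd.Series with a DatetimeIndex (the window size is illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_rolling_stats(series: pd.Series, window: int = 30) -> None:
    """Overlay rolling mean and std on the raw series to eyeball stationarity."""
    ax = series.plot(label="observed", figsize=(10, 4))
    series.rolling(window).mean().plot(ax=ax, label=f"rolling mean ({window})")
    series.rolling(window).std().plot(ax=ax, label=f"rolling std ({window})")
    ax.legend()
    plt.show()  # a drifting mean or changing spread suggests non-stationarity
```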
Deeper Insight: Time-based leakage is one of the most common interview traps. Never shuffle time series data — always use chronological order for training/testing splits.
2: Stationarity & Differencing
- Grasp the concept of stationarity — the statistical properties (mean, variance, covariance) should not change over time.
- Learn to test stationarity using:
- Rolling statistics (plot moving averages)
- ADF (Augmented Dickey-Fuller) Test
- Apply differencing (subtracting previous observations) to stabilize mean levels and remove trends.
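A minimal stationarity-check sketch, run on a synthetic random walk to show the before/after effect of differencing:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
walk = pd.Series(np.cumsum(rng.normal(size=300)))  # random walk: non-stationary

for name, s in [("raw", walk), ("diff(1)", walk.diff().dropna())]:
    stat, pvalue, *_ = adfuller(s)
    # ADF null hypothesis is a unit root; p < 0.05 suggests stationarity
    print(f"{name}: ADF stat = {stat:.2f}, p-value = {pvalue:.3f}")
```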
Probing Question: “How do you handle a series that is trend-stationary but not variance-stationary?”
Be ready to discuss log transforms and power transforms (e.g., Box-Cox) for variance stabilization.
3: ACF and PACF — Model Identification Tools
- Learn Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots:
- ACF → measures the total correlation between the series and its lagged values, including indirect effects.
- PACF → isolates the direct correlation at each lag, removing the influence of intermediate lags.
- Use them to identify AR (AutoRegressive) and MA (Moving Average) order parameters.
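A minimal sketch that simulates an AR(1) process and plots its ACF/PACF; with a coefficient of 0.7 you should see the ACF decay gradually and the PACF cut off after lag 1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(1)
y = [0.0]
for _ in range(499):
    y.append(0.7 * y[-1] + rng.normal())  # AR(1) with coefficient 0.7
y = pd.Series(y)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=20, ax=axes[0])   # gradual decay
plot_pacf(y, lags=20, ax=axes[1])  # sharp cut-off after lag 1
plt.show()
```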
Deeper Insight: Interviewers might ask you to interpret real ACF/PACF plots — practice identifying AR(1), MA(1), ARMA(1,1) signatures. The “cut-off” behavior is the key diagnostic signal.
4: ARIMA — The Statistical Workhorse
- Understand ARIMA(p, d, q) model:
- p: number of AR terms
- d: degree of differencing
- q: number of MA terms
- Learn the Box-Jenkins methodology for model identification, estimation, and validation.
- Implement an ARIMA model using statsmodels.tsa.arima.model.ARIMA.
- Evaluate with residual diagnostics — residuals should look like white noise.
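A minimal ARIMA(1, 1, 1) sketch with statsmodels, fit on a synthetic random walk with drift (so d = 1 is appropriate); the Ljung-Box test formalizes the white-noise check on residuals:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(42)
y = pd.Series(np.cumsum(rng.normal(loc=0.1, size=200)))  # random walk with drift

result = ARIMA(y, order=(1, 1, 1)).fit()  # order = (p, d, q)
print(result.summary())

# Residuals should resemble white noise; a small Ljung-Box p-value
# means leftover autocorrelation, i.e. the model is underfitted.
print(acorr_ljungbox(result.resid, lags=[10]))
```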
Probing Question: “What happens if residuals show autocorrelation?”
Answer: “The model is underfitted — examine the ACF/PACF of the residuals and increase model complexity (e.g., add AR/MA or seasonal terms).”
5: SARIMA — Handling Seasonality
- Extend ARIMA to SARIMA(p, d, q)(P, D, Q, s):
- Seasonal terms capture repeating cycles (like weekly or yearly patterns).
- Identify seasonality using seasonal decomposition and ACF spikes at lag multiples of s.
- Implement SARIMA via statsmodels.tsa.statespace.sarimax.SARIMAX.
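A minimal SARIMAX sketch, assuming monthly data with a yearly cycle (s = 12); the synthetic series is illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
t = np.arange(120)
# Synthetic monthly series: linear trend + yearly cycle + noise.
y = pd.Series(0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(size=120))

model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=12))  # forecast one full seasonal cycle ahead
```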
Deeper Insight: Interviewers love to ask “What is the difference between SARIMA and ARIMA?”
Answer: SARIMA explicitly models seasonality with periodic lags; ARIMA doesn’t.
6: Facebook Prophet — Practical Forecasting at Scale
- Understand Prophet’s additive model:
\( y(t) = g(t) + s(t) + h(t) + \epsilon_t \)
where \( g(t) \) is trend, \( s(t) \) is seasonality, and \( h(t) \) captures holiday effects.
- Learn how Prophet auto-detects changepoints and handles irregular intervals.
- Implement Prophet for business-friendly forecasting and interpret output plots.
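A minimal Prophet sketch (assuming the current `prophet` package name; it expects a DataFrame with `ds` and `y` columns):

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Synthetic daily series: linear trend plus a weekly cycle.
dates = pd.date_range("2023-01-01", periods=365, freq="D")
values = 0.1 * np.arange(365) + 3 * np.sin(2 * np.pi * np.arange(365) / 7)
df = pd.DataFrame({"ds": dates, "y": values})

m = Prophet()                                  # changepoints detected automatically
m.fit(df)
future = m.make_future_dataframe(periods=30)   # extend 30 days past the history
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```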
Probing Question: “Why would you prefer Prophet over ARIMA?”
Mention Prophet’s scalability, ease of use, and robustness to missing data and outliers — ideal for production pipelines.
7: Feature Engineering for Time Series ML
- Engineer lag features (e.g., lag_1, lag_7), rolling mean/variance, and time-based encodings (day, month, quarter).
- Learn about windowing and sliding windows for supervised ML transformation.
- Avoid leakage — use only past information for each timestamp.
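A minimal feature-engineering sketch with pandas, assuming `df` has a DatetimeIndex and a hypothetical `sales` column; note that every feature is built from values strictly before the current timestamp:

```python
import pandas as pd

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # shift() pulls past values forward, so no future information leaks in
    out["lag_1"] = out["sales"].shift(1)
    out["lag_7"] = out["sales"].shift(7)
    # shift before rolling so the window excludes the current observation
    out["rolling_mean_7"] = out["sales"].shift(1).rolling(7).mean()
    out["rolling_std_7"] = out["sales"].shift(1).rolling(7).std()
    out["day_of_week"] = out.index.dayofweek
    out["month"] = out.index.month
    return out.dropna()  # drop the warm-up rows that lack enough history
```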
Deeper Insight: For large-scale ML systems, time-based feature pipelines are implemented using feature stores and backfills to maintain temporal integrity.
8: Forecast Evaluation Metrics
- Use time-series-specific metrics:
- MAE (Mean Absolute Error)
- RMSE (Root Mean Squared Error)
- MAPE (Mean Absolute Percentage Error)
- SMAPE (Symmetric MAPE)
- Understand pros and cons — MAPE fails when true values are near zero; RMSE penalizes large errors more.
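Minimal NumPy sketches of the four metrics (y_true and y_pred are 1-D arrays):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))  # squaring punishes big misses

def mape(y_true, y_pred):
    # blows up when y_true has values at or near zero
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

def smape(y_true, y_pred):
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return 100 * np.mean(np.abs(y_true - y_pred) / denom)
```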
Probing Question: “Why not use accuracy for forecasting?”
Because forecasts are continuous-valued, and the goal is minimizing error magnitude, not classification accuracy.
9: Scaling to Real-World Systems
- Learn about rolling retraining and time-based cross-validation (e.g., TimeSeriesSplit in scikit-learn, sketched below).
- Understand concept drift — data patterns changing over time — and mitigation strategies (retrain schedules, online learning).
- Explore model monitoring for drift detection and performance degradation.
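A minimal TimeSeriesSplit sketch: each fold trains on the past and validates on the immediate future, never the reverse:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # stand-in features, ordered by time
y = np.arange(100)

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # training window always ends before the test window begins
    print(f"fold {fold}: train [0..{train_idx[-1]}], "
          f"test [{test_idx[0]}..{test_idx[-1]}]")
```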
Deeper Insight: Expect interviewers to ask about deployment. For example:
“How do you ensure your time series model remains valid after a sudden event (like a pandemic or server outage)?”
Highlight adaptability: dynamic retraining, drift detection, or regime-switching models.
10: From ARIMA to Deep Learning
- Bridge from classical to modern models:
- RNNs / LSTMs / GRUs for long temporal dependencies.
- Temporal Convolutional Networks (TCNs) for parallelism.
- Time series Transformers (e.g., the Temporal Fusion Transformer) for multivariate forecasting.
- Understand trade-offs: interpretability (ARIMA) vs. representation power (LSTM).
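To make the LSTM side of that trade-off concrete, a minimal PyTorch sketch, assuming fixed-length windows of past values predicting the next step (shapes and sizes are illustrative):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)            # x: (batch, seq_len, 1)
        return self.head(out[:, -1, :])  # predict next value from last hidden state

model = LSTMForecaster()
windows = torch.randn(8, 30, 1)  # batch of 8 windows, 30 past steps each
print(model(windows).shape)      # torch.Size([8, 1])
```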
Probing Question: “When would you still prefer ARIMA over LSTM?”
When data is small, interpretability is required, or training compute is constrained.