3.3 Time Series and Lag Analysis


🪄 Step 1: Intuition & Real-World Analogy

🎯 Core Idea

Time-series analysis in SQL is all about understanding change over time — tracking trends, growth, and periodic patterns directly inside your database.

You can use:

  • LAG() / LEAD() → compare current values to previous or next rows.
  • Moving averages → smooth out daily noise and highlight trends.

This allows analysts to say things like:

“Revenue grew by 15% compared to last week,” “Average sales over the last 7 days are trending up,” or “User logins are dropping compared to yesterday.”


🌍 Real-Life Analogy

Imagine you’re a fitness trainer 📈 tracking your clients’ weights:

  • You write down their weights daily.
  • You want to see how much they lost since yesterday → that’s LAG().
  • You want to see the average weight over the last 7 days → that’s a moving average.

The more historical data you look at (the wider your window), the smoother your trend line — but the heavier your notebook. That’s the trade-off: insight vs. computation cost.


💬 Conversational Style

You might think:

“Wait, can’t I just compute differences in Excel?”

Sure — but imagine your dataset has 10 million daily transactions. SQL lets you compute trends at scale, using OVER windows that operate directly in the query engine. And it’s not just about numbers — it’s about patterns and deltas.


🌱 Step 2: The Core Concept Explained

What It Really Does

Time series analysis in SQL lets you:

  • Compare a value to its neighbors (LAG, LEAD).
  • Compute rolling metrics like averages or sums over time windows.
  • Detect trends (growth rate, spikes, seasonality).

It’s all powered by window frames (ROWS BETWEEN X PRECEDING AND CURRENT ROW) that define how many rows around the current one the function can “see.”

How It Works Internally

For each row:

  1. SQL looks at its partition (optional) and order (usually time).
  2. It builds a window frame — a set of nearby rows.
  3. Functions like LAG(), LEAD(), or AVG() compute using those rows.

Each time you widen the window, the engine stores more intermediate data in memory, which affects performance.

Common Use Cases
  • Day-over-day revenue growth (LAG).
  • Week-over-week moving average (AVG OVER).
  • Detecting churn or inactivity gaps (LEAD).
  • Computing rolling retention, session durations, or sales trends.

💻 Step 3: Code Demonstration & Explanation

Let’s explore practical examples that build intuition and query fluency.


🧩 Example 1: Day-over-Day Change with LAG()

SELECT 
  order_date,
  SUM(amount) AS daily_revenue,
  LAG(SUM(amount)) OVER (ORDER BY order_date) AS prev_day_revenue,
  SUM(amount) - LAG(SUM(amount)) OVER (ORDER BY order_date) AS revenue_change
FROM sales
GROUP BY order_date
ORDER BY order_date;

Explanation:

  • LAG() lets the current day look “back” at the previous day’s revenue.
  • You can easily compute deltas, growth percentages, or detect drop-offs.
| order_date | daily_revenue | prev_day_revenue | revenue_change |
|------------|---------------|------------------|----------------|
| 2025-01-01 | 1200          | NULL             | NULL           |
| 2025-01-02 | 1800          | 1200             | 600            |
| 2025-01-03 | 1600          | 1800             | -200           |
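The table above can be reproduced end-to-end. This sketch runs the same `LAG()` query against an in-memory SQLite database (window functions require SQLite 3.25+, bundled with Python 3.8+); the table and column names match the example, but the individual transaction amounts are invented so that the daily totals come out to the figures shown.

```python
import sqlite3  # SQLite 3.25+ ships window functions

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount INTEGER)")
# Transaction-level rows whose daily totals match the table above (invented)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2025-01-01", 700), ("2025-01-01", 500),
     ("2025-01-02", 1000), ("2025-01-02", 800),
     ("2025-01-03", 900), ("2025-01-03", 700)],
)

# Same query as the example: LAG() runs after GROUP BY, so it compares
# daily totals, not raw transactions.
rows = conn.execute("""
    SELECT
      order_date,
      SUM(amount) AS daily_revenue,
      LAG(SUM(amount)) OVER (ORDER BY order_date) AS prev_day_revenue,
      SUM(amount) - LAG(SUM(amount)) OVER (ORDER BY order_date) AS revenue_change
    FROM sales
    GROUP BY order_date
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# ('2025-01-01', 1200, None, None)
# ('2025-01-02', 1800, 1200, 600)
# ('2025-01-03', 1600, 1800, -200)
```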

🧩 Example 2: Next Event Preview with LEAD()

SELECT 
  customer_id,
  order_date,
  LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
  LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) - order_date AS days_between_orders
FROM orders;

Explanation:

  • LEAD() peeks forward to find the next event for each user.
  • Useful for measuring re-engagement, session frequency, or churn risk.
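One portability note: subtracting one date from another to get a day count, as in the query above, is PostgreSQL-style syntax. A runnable sketch in SQLite needs `julianday()` instead; the customer IDs and dates below are invented for illustration.

```python
import sqlite3  # window functions need SQLite 3.25+

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2025-01-01"), (1, "2025-01-04"), (1, "2025-01-10")],
)

# julianday() stands in for PostgreSQL's `next_order_date - order_date`;
# the final order per customer has no "next" row, so the gap is NULL.
rows = conn.execute("""
    SELECT
      customer_id,
      order_date,
      LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
      CAST(julianday(LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
           - julianday(order_date) AS INTEGER) AS days_between_orders
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()

gaps = [r[3] for r in rows]
print(gaps)  # [3, 6, None]
```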

🧩 Example 3: Moving Average (Rolling Weekly)

SELECT 
  order_date,
  SUM(amount) AS daily_sales,
  AVG(SUM(amount)) OVER (
    ORDER BY order_date 
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_weekly_avg
FROM sales
GROUP BY order_date
ORDER BY order_date;

Explanation:

  • Looks at a 7-day window (current + 6 days before).
  • Smooths daily fluctuations — ideal for identifying weekly trends.
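As a sanity check, the rolling average can be cross-validated against a plain Python computation. This sketch feeds ten invented daily totals through the same frame clause and compares each result to the mean of the trailing (up to) seven values.

```python
import sqlite3

daily = [(f"2025-01-{d:02d}", d * 10) for d in range(1, 11)]  # invented sales

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", daily)

rows = conn.execute("""
    SELECT
      order_date,
      SUM(amount) AS daily_sales,
      AVG(SUM(amount)) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
      ) AS rolling_weekly_avg
    FROM sales
    GROUP BY order_date
    ORDER BY order_date
""").fetchall()

# Cross-check: each value equals the mean of the trailing 7-day slice
amounts = [amt for _, amt in daily]
for i, (_, _, avg) in enumerate(rows):
    window = amounts[max(0, i - 6): i + 1]
    assert abs(avg - sum(window) / len(window)) < 1e-9

print(rows[-1])  # ('2025-01-10', 100, 70.0) — average of days 4..10
```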

🧩 Example 4: Rolling Sum with an Explicit Frame

SELECT 
  order_date,
  SUM(amount) AS daily_sales,
  SUM(SUM(amount)) OVER (
    ORDER BY order_date 
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS weekly_total
FROM sales
GROUP BY order_date
ORDER BY order_date;

Explanation: Perfect for rolling totals — e.g., “total revenue over the last 7 days.”


⚙️ Step 4: Performance Insights & Optimization


🧠 What Happens Under the Hood

  • Each window frame requires temporary storage of multiple rows.
  • For LAG() / LEAD(), the system just fetches one row offset — very cheap.
  • For moving averages, SQL maintains an in-memory sliding buffer of the last N rows. The larger the window → the more memory and I/O needed.

⚙️ Optimizing Time Series Queries

  1. Use the Smallest Possible Window

    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

    → better than UNBOUNDED PRECEDING when you only need a short period.

  2. Pre-Aggregate by Date. If you have billions of transactions per day, first summarize daily totals:

    CREATE MATERIALIZED VIEW daily_sales AS
    SELECT order_date, SUM(amount) AS daily_revenue
    FROM sales
    GROUP BY order_date;
  3. Partition by Entity When Possible. Example:

    PARTITION BY region ORDER BY order_date

    Limits memory usage by splitting windows per region.

  4. Cluster or Index by Time. This ensures data is physically ordered by order_date, reducing sort cost.

  5. Monitor Execution Plans. Use:

    EXPLAIN (ANALYZE, BUFFERS)

    Look for “WindowAgg” nodes; if they show large buffer usage, consider breaking the query into chunks.
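Tip 2 above can be sketched end-to-end. SQLite has no materialized views, so a plain summary table (an assumption standing in for the materialized view) plays the same role here: the rolling window then scans one row per day instead of every transaction. All data below is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount INTEGER)")
# Three transaction rows per day for five days (invented values)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(f"2025-01-{d:02d}", t) for d in range(1, 6) for t in (100, 200, 300)],
)

# Stand-in for CREATE MATERIALIZED VIEW: a pre-aggregated summary table
conn.execute("""
    CREATE TABLE daily_sales AS
    SELECT order_date, SUM(amount) AS daily_revenue
    FROM sales
    GROUP BY order_date
""")

# The rolling window now touches 5 summary rows instead of 15 raw rows
rows = conn.execute("""
    SELECT order_date,
           SUM(daily_revenue) OVER (
             ORDER BY order_date
             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS weekly_total
    FROM daily_sales
    ORDER BY order_date
""").fetchall()

print(rows[-1])  # ('2025-01-05', 3000) — 5 days at 600/day
```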

🚀 Rule of Thumb

The wider your window frame, the heavier the memory footprint. Wide rolling windows ≈ temporary cache explosion.


🎯 Step 5: Interview Perspective & Reasoning

Common Interview Questions
  • “What is the difference between LAG() and LEAD()?”
  • “When would you use a moving average in SQL?”
  • “If your window frame is too wide, what happens to performance?”
  • “What’s the difference between ROWS and RANGE in window frames?”
  • “How do you compute day-over-day growth for millions of rows efficiently?”
Reasoning Expectation

Interviewers want to see:

  • Clarity on temporal logic (time ordering & partitions).
  • Awareness of execution costs (sorts, memory, and buffers).
  • Ability to justify design decisions: “I’d pre-aggregate or partition by date for performance.”
Bonus Challenge

You have sales data at the transaction level for 1 billion rows. Compute 7-day rolling revenue per region efficiently. Discuss indexing, partitioning, and when you’d switch to materialized views.


🧩 Step 6: Common Mistakes & Debugging Insights

🚨 Common Mistakes
  • Forgetting ORDER BY in window functions → undefined ordering = meaningless results.

  • Using UNBOUNDED PRECEDING unnecessarily → performance killer.

  • Confusing RANGE and ROWS:

    • ROWS = a fixed number of physical rows around the current one.
    • RANGE = all rows whose ORDER BY value falls within a range (e.g., an interval of 7 days); rows with the same ORDER BY value (“peers”) are included together.
  • Not pre-aggregating → applying rolling windows over raw events instead of daily summaries.

  • Missing index on time column → expensive sorts.
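The ROWS vs RANGE distinction above can be seen concretely: with duplicate ORDER BY values, RANGE treats the tied rows as peers and includes them together, while ROWS advances one physical row at a time. A minimal sketch with invented data (SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day INTEGER, v INTEGER)")
# Two events on day 1 (tied ORDER BY values), one on day 2
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10), (1, 10), (2, 5)])

rows = conn.execute("""
    SELECT day,
           SUM(v) OVER (ORDER BY day ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
           SUM(v) OVER (ORDER BY day RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM events
    ORDER BY day
""").fetchall()

print([r[1] for r in rows])  # ROWS:  [10, 20, 25] — one physical row at a time
print([r[2] for r in rows])  # RANGE: [20, 20, 25] — both day-1 peers summed together
```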

Debugging Tip: Use EXPLAIN (ANALYZE, BUFFERS) to check memory usage of WindowAgg. If your rolling query is slow:

  1. Limit window frame.
  2. Materialize intermediate results.
  3. Check physical data order.

🧠 Step 7: Key Takeaways

🧠 Main Idea: Window functions like LAG(), LEAD(), and moving averages let SQL understand change over time — powering real analytics directly in queries.

💻 How It Works: They operate over ordered partitions using window frames that define how many past/future rows are visible.

🚀 Why It Matters: They’re essential for trend analysis, anomaly detection, and forecasting pipelines — but mastering performance trade-offs is what sets experts apart.
