3.1 Window Functions and Analytics

Core Skills Guide for AI Interviews (Math, Code, SQL) 2025

5 min read 1018 words

🪄 Step 1: Intuition & Real-World Analogy

🎯 Core Idea

Window functions allow you to perform calculations across sets of rows that are related to the current row — without collapsing data into groups like GROUP BY does.

Think of them as “aggregate functions with a memory” — they let each row peek at its neighbors and compute running totals, ranks, or averages while keeping all rows intact.

🌍 Real-Life Analogy

Imagine a marathon leaderboard 🏃‍♀️. As each runner crosses the finish line:

You want to know their rank.
Their cumulative distance.
How their time compares to the average of their group.

A window function is like a running scoreboard that updates for every participant — considering the context (region, team, or category) they belong to.

Each runner (row) sees both their individual data and their contextual statistics.

💬 Conversational Style

You might wonder:

“Can’t I just use GROUP BY?”

Well, GROUP BY aggregates — it collapses rows. But window functions annotate — they add insights to each row while keeping them all visible.

It’s the difference between:

“What’s the total revenue per region?” (GROUP BY)
vs “What’s the running total for each region over time?” (WINDOW FUNCTION)

🌱 Step 2: The Core Concept Explained

What It Really Does

Window functions apply calculations over a “window” — a subset of rows defined by:

PARTITION BY — groups rows (like categories).
ORDER BY — defines row order within each group.
FRAME CLAUSE — (optional) defines the exact range of rows the window sees (e.g., current row + previous 2).

How It Works Internally

When the SQL engine executes a window function:

It first partitions data (like GROUP BY).
It orders rows within each partition.
Then it applies the function to each row’s window frame — computing cumulative or relative metrics.

This happens after the WHERE and GROUP BY stages but before ORDER BY in the final result set.

Common Use Cases

Ranking customers or products by sales.
Calculating running totals or moving averages.
Comparing each row to group statistics (e.g., percentage of total).
Time-series trend analysis.

💻 Step 3: Code Demonstration & Explanation

Let’s build progressively from simple to powerful.

🧩 Example 1: Basic Window Function

SELECT 
  customer_id,
  region,
  SUM(amount) OVER (PARTITION BY region ORDER BY order_date) AS running_total
FROM orders;

Explanation:

PARTITION BY region: resets totals for each region.
ORDER BY order_date: ensures accumulation over time.
Each row keeps its original detail while showing cumulative progress.

So the output could look like this:

customer_id	region	order_date	amount	running_total
C01	East	2025-01-01	100	100
C02	East	2025-01-03	200	300
C03	East	2025-01-05	400	700
C04	West	2025-01-02	300	300
C05	West	2025-01-04	500	800

🧩 Example 2: Ranking with ROW_NUMBER(), RANK(), DENSE_RANK()

SELECT
  customer_id,
  amount,
  ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num,
  RANK()       OVER (ORDER BY amount DESC) AS rank_pos,
  DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank_pos
FROM orders;

Explanation:

amount	ROW_NUMBER	RANK	DENSE_RANK
500	1	1	1
400	2	2	2
400	3	2	2
300	4	4	3

ROW_NUMBER() → unique sequence, even if tied.
RANK() → skips rank numbers after ties.
DENSE_RANK() → doesn’t skip numbers; ties get same rank, next rank increments by 1.

🧩 Example 3: Using WINDOW FRAME Clauses

SELECT
  order_date,
  SUM(amount) OVER (
      ORDER BY order_date
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_total
FROM orders;

Explanation:

This computes a 3-row moving total (current + previous 2 orders).
Perfect for trend analysis or rolling averages.

🧩 Example 4: Comparing Each Row to Group Average

SELECT
  region,
  customer_id,
  amount,
  AVG(amount) OVER (PARTITION BY region) AS regional_avg,
  amount - AVG(amount) OVER (PARTITION BY region) AS diff_from_avg
FROM orders;

Explanation: Each customer sees how their order compares to the average in their region — no grouping, no data loss.

⚙️ Step 4: Performance Insights & Optimization

ℹ️

Execution Flow:

FROM → Load data.
WHERE → Filter rows.
GROUP BY / HAVING → Aggregate (if any).
WINDOW FUNCTIONS → Compute per-row analytics.
ORDER BY / LIMIT → Final sort and pagination.

Optimization Tips:

Indexes help with PARTITION BY and ORDER BY sorting.
Use proper data ordering to minimize window recalculations.
Avoid combining too many window functions — they share computation if frame definitions match.
For time-series analytics, pre-sort data in the storage layer (e.g., clustering by date).

Trade-off:

Window functions use temporary buffers to hold partitioned results. Large windows (no frame restriction) → higher memory consumption.

🎯 Step 5: Interview Perspective & Reasoning

Common Interview Questions

“What’s the difference between GROUP BY and window functions?”
“Explain how RANK() differs from DENSE_RANK().”
“Can you calculate running totals without removing detail rows?”
“How does PARTITION BY differ from GROUP BY?”
“What happens if you omit ORDER BY in a window function?”

Reasoning Expectation

Strong candidates:

Can articulate that window functions don’t collapse rows.
Can demonstrate difference between ranking functions.
Understand execution flow and performance implications.
Provide business use cases like customer ranking, retention, or trend analysis.

Bonus Challenge

Write a query to find the top 3 highest-selling customers per region, breaking ties fairly. Use window functions (no subqueries).
Hint: Use ROW_NUMBER() inside a PARTITION BY region ORDER BY SUM(amount) DESC window.

🧩 Step 6: Common Mistakes & Debugging Insights

🚨 Common Mistakes (Click to Expand)

Forgetting to use ORDER BY → window functions lose deterministic meaning.
Using RANK() when DENSE_RANK() is desired (leads to gaps).
Misusing window functions with GROUP BY — they can’t coexist directly without nesting.
Confusing PARTITION BY (logical grouping) with GROUP BY (aggregating).
Applying window functions inside WHERE (not allowed; must be in SELECT or ORDER BY).

Debugging Tip: Use EXPLAIN ANALYZE to check sort operations — if you see multiple “Sort” nodes, merge window definitions or pre-sort data.

🧠 Step 7: Key Takeaways

🧠 Main Idea: Window functions let you perform per-row analytics within context — perfect for ranking, cumulative totals, and comparisons.

💻 How It Works: Operates over partitions of data using ORDER BY and optional frame clauses — keeping all rows intact.

🚀 Why It Matters: They turn SQL from a data retriever into a full-blown analytical tool — essential for business insights, reporting, and interview excellence.

3.2 Cohort and Retention Analysis 2.4 Avoiding Common Pitfalls