🎯 SQL for Data Analyst

Core Skills Guide for AI Interviews (Math, Code, SQL) 2025

5 min read 1002 words

This roadmap takes you from core query fluency to database performance mastery, mirroring the level of reasoning and rigor expected in high-bar technical interviews.

🧩 Core SQL Concepts

Note

The Top Tech Interview Angle (SELECT, WHERE, and Aggregations): The foundation of every SQL screen. These questions assess your ability to manipulate data accurately, reason about filtering and aggregation order, and balance correctness with performance. Strong candidates explain why queries behave a certain way — not just what the result is.

1.1: Learn the SQL Query Lifecycle

Understand SQL’s logical execution order:

FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

Internalize this order to reason about why filters or aggregates behave differently depending on where they are placed.

Deeper Insight: Interviewers often ask: “Why can’t you use an aggregate function in the WHERE clause?” This tests whether you understand query evaluation stages.

1.2: Filter and Aggregate Like an Analyst

Practice writing filters:

SELECT *
FROM sales
WHERE region = 'West' AND amount > 1000;

Combine with aggregation:

SELECT region, SUM(amount) AS total_sales
FROM sales
WHERE amount > 1000
GROUP BY region;

Learn when to use HAVING vs WHERE and how each affects grouped data.

Probing Question: “If you move a condition from WHERE to HAVING, how does the output change?” This distinguishes between pre-group filtering and post-group filtering.

1.3: Nulls, Case, and Conditional Logic

Understand how NULL propagates in expressions.

Learn conditional constructs like:

SELECT 
  CASE 
    WHEN discount IS NULL THEN 'No Discount'
    ELSE 'Discount Applied'
  END AS discount_flag
FROM orders;

Realize that NULL ≠ 0 ≠ ‘’ — a subtlety that trips up many candidates.

Deeper Insight: Top interviewers might ask: “Why does COUNT(column) exclude NULL values?” Be ready to explain the design rationale behind SQL’s treatment of nulls.

1.4: Combine Tables with Joins

Master INNER, LEFT, RIGHT, and FULL OUTER joins.

Visualize data relationships through simple examples.

SELECT c.customer_id, c.name, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

Understand how join order and join type affect row count and performance.

Probing Question: “If a LEFT JOIN returns fewer rows than expected — what could be wrong?” Tests reasoning about join conditions, NULL mismatches, and filters.

⚙️ Query Optimization and Performance

Note

The Top Tech Interview Angle: At senior levels, interviewers focus less on syntax and more on how you reason about performance, scalability, and data volume. Can you diagnose slow queries? Can you choose the right indexing or query rewrite?

2.1: Read and Interpret Execution Plans

Use EXPLAIN or EXPLAIN ANALYZE to visualize query plans.

EXPLAIN ANALYZE
SELECT region, SUM(amount)
FROM sales
GROUP BY region;

Identify sequential scans, index scans, nested loops, and hash joins.

Probing Question: “If your query uses a nested loop join over millions of rows, what would you check first?” Tests awareness of join algorithms and data size scaling.

2.2: Indexing for Speed

Understand when indexes help and when they don’t.

CREATE INDEX idx_customer_id ON orders(customer_id);

Realize that indexes speed up lookups but slow down writes (insert/update).

Deeper Insight: “Would an index on a low-cardinality column help?” Tests understanding of selectivity trade-offs.

2.3: Query Refactoring and CTEs

Break down complex queries using Common Table Expressions (CTEs):

WITH regional_sales AS (
  SELECT region, SUM(amount) AS total
  FROM sales
  GROUP BY region
)
SELECT region, total
FROM regional_sales
WHERE total > 10000;

Discuss CTE performance vs subqueries — some databases materialize CTEs, others inline them.

Probing Question: “Would using a CTE always make a query faster?” Excellent candidates explain that readability improves, but performance may degrade depending on DB engine behavior.

2.4: Avoiding Common Pitfalls

Watch out for:
- Functions on indexed columns (breaks index usage)
- Using SELECT *
- Overusing DISTINCT to hide duplicates instead of fixing logic
Apply query simplification and index-friendly design.

Deeper Insight: “Your query works but is slow. How do you debug it?” Expect to discuss using query plans, analyzing I/O cost, and caching layers.

🧮 Analytical SQL for Data Insights

Note

The Top Tech Interview Angle: Analytical SQL questions simulate real-world reporting and metric derivation — e.g., retention, ranking, moving averages. They test your ability to transform raw data into insights efficiently and correctly.

3.1: Window Functions and Analytics

Learn core syntax:

SELECT 
  customer_id,
  SUM(amount) OVER (PARTITION BY region ORDER BY order_date) AS running_total
FROM orders;

Understand ROW_NUMBER, RANK, DENSE_RANK, and their differences.

Probing Question: “If two rows have the same amount, how does RANK() differ from DENSE_RANK()?” Tests detail-oriented understanding of window mechanics.

3.2: Cohort and Retention Analysis

Implement cohort grouping:

SELECT 
  customer_id,
  MIN(order_date) AS cohort_start,
  DATE_PART('month', order_date - MIN(order_date) OVER (PARTITION BY customer_id)) AS months_since_signup
FROM orders;

Learn how to calculate user retention or conversion funnels in pure SQL.

Deeper Insight: “How would you optimize this on billions of rows?” Expect discussions on materialized views, partitions, or pre-aggregation.

3.3: Time Series and Lag Analysis

Use LAG, LEAD, and moving averages:

SELECT 
  order_date,
  amount,
  AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_weekly_avg
FROM sales;

Probing Question: “If your window frame is too wide, what happens to performance?” Tests understanding of memory usage and intermediate computation.

🧱 Database Design & Normalization

Note

The Top Tech Interview Angle: Even in analyst roles, interviewers expect reasoning about schema design — normalization, redundancy, and trade-offs between performance and maintainability.

4.1: Understand Normal Forms

Learn 1NF, 2NF, 3NF intuitively:
- 1NF: Atomic columns
- 2NF: Remove partial dependencies
- 3NF: Remove transitive dependencies

Probing Question: “Why might you deliberately denormalize a database?” Tests understanding of analytical performance vs design purity.

4.2: Star and Snowflake Schemas

Understand fact and dimension tables.
Recognize that analytical warehouses prefer star schemas for faster joins.

Deeper Insight: “Why are dimension tables small and fact tables large?” Candidates should explain cardinality and join selectivity.

4.3: Data Integrity and Keys

Distinguish primary keys, foreign keys, and unique constraints.
Learn about surrogate vs natural keys and trade-offs in analytics systems.

Probing Question: “When would you choose a surrogate key over a natural key?” Demonstrates reasoning around key stability and schema evolution.

📘 SQL Coding Interview Questions Interview Theories Solutions