🎯 SQL for Data Analyst
This roadmap takes you from core query fluency to database performance mastery, mirroring the level of reasoning and rigor expected in high-bar technical interviews.
🧩 Core SQL Concepts
Note
The Top Tech Interview Angle (SELECT, WHERE, and Aggregations): The foundation of every SQL screen. These questions assess your ability to manipulate data accurately, reason about filtering and aggregation order, and balance correctness with performance. Strong candidates explain why queries behave a certain way — not just what the result is.
1.1: Learn the SQL Query Lifecycle
Understand SQL’s logical execution order:
FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMITInternalize this order to reason about why filters or aggregates behave differently depending on where they are placed.
Deeper Insight: Interviewers often ask: “Why can’t you use an aggregate function in the WHERE clause?” This tests whether you understand query evaluation stages.
1.2: Filter and Aggregate Like an Analyst
Practice writing filters:
SELECT * FROM sales WHERE region = 'West' AND amount > 1000;Combine with aggregation:
SELECT region, SUM(amount) AS total_sales FROM sales WHERE amount > 1000 GROUP BY region;Learn when to use
HAVINGvsWHEREand how each affects grouped data.
Probing Question: “If you move a condition from WHERE to HAVING, how does the output change?” This distinguishes between pre-group filtering and post-group filtering.
1.3: Nulls, Case, and Conditional Logic
Understand how
NULLpropagates in expressions.Learn conditional constructs like:
SELECT CASE WHEN discount IS NULL THEN 'No Discount' ELSE 'Discount Applied' END AS discount_flag FROM orders;Realize that NULL ≠ 0 ≠ ‘’ — a subtlety that trips up many candidates.
Deeper Insight: Top interviewers might ask: “Why does
COUNT(column)exclude NULL values?” Be ready to explain the design rationale behind SQL’s treatment of nulls.
1.4: Combine Tables with Joins
Master
INNER,LEFT,RIGHT, andFULL OUTERjoins.Visualize data relationships through simple examples.
SELECT c.customer_id, c.name, o.amount FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;Understand how join order and join type affect row count and performance.
Probing Question: “If a LEFT JOIN returns fewer rows than expected — what could be wrong?” Tests reasoning about join conditions, NULL mismatches, and filters.
⚙️ Query Optimization and Performance
Note
The Top Tech Interview Angle: At senior levels, interviewers focus less on syntax and more on how you reason about performance, scalability, and data volume. Can you diagnose slow queries? Can you choose the right indexing or query rewrite?
2.1: Read and Interpret Execution Plans
Use
EXPLAINorEXPLAIN ANALYZEto visualize query plans.EXPLAIN ANALYZE SELECT region, SUM(amount) FROM sales GROUP BY region;Identify sequential scans, index scans, nested loops, and hash joins.
Probing Question: “If your query uses a nested loop join over millions of rows, what would you check first?” Tests awareness of join algorithms and data size scaling.
2.2: Indexing for Speed
Understand when indexes help and when they don’t.
CREATE INDEX idx_customer_id ON orders(customer_id);Realize that indexes speed up lookups but slow down writes (insert/update).
Deeper Insight: “Would an index on a low-cardinality column help?” Tests understanding of selectivity trade-offs.
2.3: Query Refactoring and CTEs
Break down complex queries using Common Table Expressions (CTEs):
WITH regional_sales AS ( SELECT region, SUM(amount) AS total FROM sales GROUP BY region ) SELECT region, total FROM regional_sales WHERE total > 10000;Discuss CTE performance vs subqueries — some databases materialize CTEs, others inline them.
Probing Question: “Would using a CTE always make a query faster?” Excellent candidates explain that readability improves, but performance may degrade depending on DB engine behavior.
2.4: Avoiding Common Pitfalls
Watch out for:
- Functions on indexed columns (breaks index usage)
- Using
SELECT * - Overusing DISTINCT to hide duplicates instead of fixing logic
Apply query simplification and index-friendly design.
Deeper Insight: “Your query works but is slow. How do you debug it?” Expect to discuss using query plans, analyzing I/O cost, and caching layers.
🧮 Analytical SQL for Data Insights
Note
The Top Tech Interview Angle: Analytical SQL questions simulate real-world reporting and metric derivation — e.g., retention, ranking, moving averages. They test your ability to transform raw data into insights efficiently and correctly.
3.1: Window Functions and Analytics
Learn core syntax:
SELECT customer_id, SUM(amount) OVER (PARTITION BY region ORDER BY order_date) AS running_total FROM orders;Understand ROW_NUMBER, RANK, DENSE_RANK, and their differences.
Probing Question: “If two rows have the same amount, how does RANK() differ from DENSE_RANK()?” Tests detail-oriented understanding of window mechanics.
3.2: Cohort and Retention Analysis
Implement cohort grouping:
SELECT customer_id, MIN(order_date) AS cohort_start, DATE_PART('month', order_date - MIN(order_date) OVER (PARTITION BY customer_id)) AS months_since_signup FROM orders;Learn how to calculate user retention or conversion funnels in pure SQL.
Deeper Insight: “How would you optimize this on billions of rows?” Expect discussions on materialized views, partitions, or pre-aggregation.
3.3: Time Series and Lag Analysis
Use LAG, LEAD, and moving averages:
SELECT order_date, amount, AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_weekly_avg FROM sales;
Probing Question: “If your window frame is too wide, what happens to performance?” Tests understanding of memory usage and intermediate computation.
🧱 Database Design & Normalization
Note
The Top Tech Interview Angle: Even in analyst roles, interviewers expect reasoning about schema design — normalization, redundancy, and trade-offs between performance and maintainability.
4.1: Understand Normal Forms
Learn 1NF, 2NF, 3NF intuitively:
- 1NF: Atomic columns
- 2NF: Remove partial dependencies
- 3NF: Remove transitive dependencies
Probing Question: “Why might you deliberately denormalize a database?” Tests understanding of analytical performance vs design purity.
4.2: Star and Snowflake Schemas
- Understand fact and dimension tables.
- Recognize that analytical warehouses prefer star schemas for faster joins.
Deeper Insight: “Why are dimension tables small and fact tables large?” Candidates should explain cardinality and join selectivity.
4.3: Data Integrity and Keys
- Distinguish primary keys, foreign keys, and unique constraints.
- Learn about surrogate vs natural keys and trade-offs in analytics systems.
Probing Question: “When would you choose a surrogate key over a natural key?” Demonstrates reasoning around key stability and schema evolution.