🎯 SQL for Data Analyst

5 min read 1002 words

This roadmap takes you from core query fluency to database performance mastery, mirroring the level of reasoning and rigor expected in high-bar technical interviews.


🧩 Core SQL Concepts

Note

The Top Tech Interview Angle (SELECT, WHERE, and Aggregations): The foundation of every SQL screen. These questions assess your ability to manipulate data accurately, reason about filtering and aggregation order, and balance correctness with performance. Strong candidates explain why queries behave a certain way — not just what the result is.


1.1: Learn the SQL Query Lifecycle

  • Understand SQL’s logical execution order:

    FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
  • Internalize this order to reason about why filters or aggregates behave differently depending on where they are placed.

Deeper Insight: Interviewers often ask: “Why can’t you use an aggregate function in the WHERE clause?” This tests whether you understand query evaluation stages.


1.2: Filter and Aggregate Like an Analyst

  • Practice writing filters:

    SELECT *
    FROM sales
    WHERE region = 'West' AND amount > 1000;
  • Combine with aggregation:

    SELECT region, SUM(amount) AS total_sales
    FROM sales
    WHERE amount > 1000
    GROUP BY region;
  • Learn when to use HAVING vs WHERE and how each affects grouped data.

Probing Question: “If you move a condition from WHERE to HAVING, how does the output change?” This distinguishes between pre-group filtering and post-group filtering.


1.3: Nulls, Case, and Conditional Logic

  • Understand how NULL propagates in expressions.

  • Learn conditional constructs like:

    SELECT 
      CASE 
        WHEN discount IS NULL THEN 'No Discount'
        ELSE 'Discount Applied'
      END AS discount_flag
    FROM orders;
  • Realize that NULL ≠ 0 ≠ ‘’ — a subtlety that trips up many candidates.

Deeper Insight: Top interviewers might ask: “Why does COUNT(column) exclude NULL values?” Be ready to explain the design rationale behind SQL’s treatment of nulls.


1.4: Combine Tables with Joins

  • Master INNER, LEFT, RIGHT, and FULL OUTER joins.

  • Visualize data relationships through simple examples.

    SELECT c.customer_id, c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id;
  • Understand how join order and join type affect row count and performance.

Probing Question: “If a LEFT JOIN returns fewer rows than expected — what could be wrong?” Tests reasoning about join conditions, NULL mismatches, and filters.


⚙️ Query Optimization and Performance

Note

The Top Tech Interview Angle: At senior levels, interviewers focus less on syntax and more on how you reason about performance, scalability, and data volume. Can you diagnose slow queries? Can you choose the right indexing or query rewrite?


2.1: Read and Interpret Execution Plans

  • Use EXPLAIN or EXPLAIN ANALYZE to visualize query plans.

    EXPLAIN ANALYZE
    SELECT region, SUM(amount)
    FROM sales
    GROUP BY region;
  • Identify sequential scans, index scans, nested loops, and hash joins.

Probing Question: “If your query uses a nested loop join over millions of rows, what would you check first?” Tests awareness of join algorithms and data size scaling.


2.2: Indexing for Speed

  • Understand when indexes help and when they don’t.

    CREATE INDEX idx_customer_id ON orders(customer_id);
  • Realize that indexes speed up lookups but slow down writes (insert/update).

Deeper Insight: “Would an index on a low-cardinality column help?” Tests understanding of selectivity trade-offs.


2.3: Query Refactoring and CTEs

  • Break down complex queries using Common Table Expressions (CTEs):

    WITH regional_sales AS (
      SELECT region, SUM(amount) AS total
      FROM sales
      GROUP BY region
    )
    SELECT region, total
    FROM regional_sales
    WHERE total > 10000;
  • Discuss CTE performance vs subqueries — some databases materialize CTEs, others inline them.

Probing Question: “Would using a CTE always make a query faster?” Excellent candidates explain that readability improves, but performance may degrade depending on DB engine behavior.


2.4: Avoiding Common Pitfalls

  • Watch out for:

    • Functions on indexed columns (breaks index usage)
    • Using SELECT *
    • Overusing DISTINCT to hide duplicates instead of fixing logic
  • Apply query simplification and index-friendly design.

Deeper Insight: “Your query works but is slow. How do you debug it?” Expect to discuss using query plans, analyzing I/O cost, and caching layers.


🧮 Analytical SQL for Data Insights

Note

The Top Tech Interview Angle: Analytical SQL questions simulate real-world reporting and metric derivation — e.g., retention, ranking, moving averages. They test your ability to transform raw data into insights efficiently and correctly.


3.1: Window Functions and Analytics

  • Learn core syntax:

    SELECT 
      customer_id,
      SUM(amount) OVER (PARTITION BY region ORDER BY order_date) AS running_total
    FROM orders;
  • Understand ROW_NUMBER, RANK, DENSE_RANK, and their differences.

Probing Question: “If two rows have the same amount, how does RANK() differ from DENSE_RANK()?” Tests detail-oriented understanding of window mechanics.


3.2: Cohort and Retention Analysis

  • Implement cohort grouping:

    SELECT 
      customer_id,
      MIN(order_date) AS cohort_start,
      DATE_PART('month', order_date - MIN(order_date) OVER (PARTITION BY customer_id)) AS months_since_signup
    FROM orders;
  • Learn how to calculate user retention or conversion funnels in pure SQL.

Deeper Insight: “How would you optimize this on billions of rows?” Expect discussions on materialized views, partitions, or pre-aggregation.


3.3: Time Series and Lag Analysis

  • Use LAG, LEAD, and moving averages:

    SELECT 
      order_date,
      amount,
      AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_weekly_avg
    FROM sales;

Probing Question: “If your window frame is too wide, what happens to performance?” Tests understanding of memory usage and intermediate computation.


🧱 Database Design & Normalization

Note

The Top Tech Interview Angle: Even in analyst roles, interviewers expect reasoning about schema design — normalization, redundancy, and trade-offs between performance and maintainability.


4.1: Understand Normal Forms

  • Learn 1NF, 2NF, 3NF intuitively:

    • 1NF: Atomic columns
    • 2NF: Remove partial dependencies
    • 3NF: Remove transitive dependencies

Probing Question: “Why might you deliberately denormalize a database?” Tests understanding of analytical performance vs design purity.


4.2: Star and Snowflake Schemas

  • Understand fact and dimension tables.
  • Recognize that analytical warehouses prefer star schemas for faster joins.

Deeper Insight: “Why are dimension tables small and fact tables large?” Candidates should explain cardinality and join selectivity.


4.3: Data Integrity and Keys

  • Distinguish primary keys, foreign keys, and unique constraints.
  • Learn about surrogate vs natural keys and trade-offs in analytics systems.

Probing Question: “When would you choose a surrogate key over a natural key?” Demonstrates reasoning around key stability and schema evolution.


Any doubt in content? Ask me anything?
Chat
🤖 👋 Hi there! I'm your learning assistant. If you have any questions about this page or need clarification, feel free to ask!