
👩‍💼 Business Analyst (BA)

(Beginner-friendly, example-driven Q&A for Top Tech Company Interviews)


❓ Q1: Retrieve all customer records located in a specific city.

🧩 Problem Statement

Return every customer row where the customer lives in a given city (e.g., Mumbai). This is basic selection + filtering.


💾 Step 1: Assume Example Tables

customers — list of customers.

| customer_id | name | email | city | joined_date |
|---|---|---|---|---|
| 1 | Raj Patel | raj@example.com | Mumbai | 2023-05-10 |
| 2 | Aisha Khan | aisha@example.com | Delhi | 2022-11-01 |
| 3 | Sunil Rao | sunil@example.com | Mumbai | 2024-02-15 |
| 4 | Priya Verma | priya@example.com | Pune | 2021-08-20 |

Columns:

  • customer_id: unique id.
  • city: used to filter.

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Tested concept: SELECT + WHERE filtering. We simply select rows where city = 'Mumbai'.

🏗️ SQL Query

SELECT customer_id, name, email, city, joined_date
FROM customers
WHERE city = 'Mumbai';

🔍 Explanation

  • SELECT chooses columns.
  • WHERE city = 'Mumbai' filters only customers in Mumbai.

🧾 Step 3: Expected Output

| customer_id | name | email | city | joined_date |
|---|---|---|---|---|
| 1 | Raj Patel | raj@example.com | Mumbai | 2023-05-10 |
| 3 | Sunil Rao | sunil@example.com | Mumbai | 2024-02-15 |

This shows both customers living in Mumbai.


⚙️ Step 4: Why This Query Works

A direct equality filter returns only matching rows; it’s efficient with an index on city.


⚡ Step 5: Common Mistakes

  • Forgetting quotes around string values.
  • Using LIKE when exact match is intended (or vice versa).
  • Case-sensitivity issues depending on DB collation.

🧠 Step 6: Alternate Approach (Optional)

-- Using parameter / placeholder (safer in application code)
SELECT *
FROM customers
WHERE city = ?;

Use this in prepared statements to avoid SQL injection.
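For instance, the placeholder can be bound by the database driver so the city value is never spliced into the SQL string. A minimal sketch using Python's sqlite3 module (the sample data mirrors the customers table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(customer_id INTEGER, name TEXT, city TEXT);
INSERT INTO customers VALUES
 (1,'Raj Patel','Mumbai'),(2,'Aisha Khan','Delhi'),(3,'Sunil Rao','Mumbai');
""")

city = "Mumbai"  # in real code this would come from user input
# The driver binds the value; the SQL text never changes, so injection is impossible.
rows = conn.execute(
    "SELECT customer_id, name FROM customers WHERE city = ?",
    (city,),
).fetchall()
# rows → [(1, 'Raj Patel'), (3, 'Sunil Rao')]
```

Other drivers use different placeholder styles (`%s`, `:city`), but the binding principle is the same.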


🎯 Interview Tip

Explain that you would parameterize the city in production code and consider indexing city if queries are frequent.


🔍 Interviewer’s Analysis

  • Concept Tested: WHERE filter / selective projection
  • Difficulty: Beginner
  • Time Complexity: O(n) scan (or O(log n) if indexed)
  • Follow-Up Prompts: Indexing, case-sensitivity, parameterization.

🪄 End of Q1 — ready for the next question.


❓ Q2: Find employees whose names start with “A”.

🧩 Problem Statement

Return employee records where name begins with the letter A.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary | hired_date |
|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 2020-01-15 |
| 2 | Bhavna Rai | 20 | 55000 | 2019-03-10 |
| 3 | Ananya Roy | 10 | 70000 | 2021-07-22 |
| 4 | Chetan Joshi | 30 | 50000 | 2022-09-05 |

Columns:

  • name: text to search with pattern.

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Tested concept: pattern matching (LIKE) or ILIKE for case-insensitive checks.

🏗️ SQL Query

SELECT emp_id, name, dept_id, salary, hired_date
FROM employees
WHERE name LIKE 'A%';

(If your DB supports case-insensitive ILIKE and you want to catch lower-case a: use WHERE name ILIKE 'a%'.)

🔍 Explanation

  • LIKE 'A%' matches strings starting with A. % means any following characters.

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | hired_date |
|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 2020-01-15 |
| 3 | Ananya Roy | 10 | 70000 | 2021-07-22 |

⚙️ Step 4: Why This Query Works

Pattern matching selects rows where the first character equals A. Use ILIKE for case-insensitive behavior if needed.


⚡ Step 5: Common Mistakes

  • Forgetting %.
  • Using = (exact match) instead of LIKE.
  • Not considering case sensitivity.

🧠 Step 6: Alternate Approach (Optional)

SELECT emp_id, name
FROM employees
WHERE SUBSTR(name,1,1) = 'A';

Less flexible and may be slower than LIKE, since applying a function to the column can prevent index use.


🎯 Interview Tip

Mention whether collation is case-sensitive in your DB and prefer ILIKE or LOWER(name) LIKE 'a%' accordingly.
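The LOWER(name) approach is the most portable. A small sketch in Python's sqlite3 (sample rows deliberately mix cases); note that SQLite's own LIKE is already case-insensitive for ASCII by default, which is exactly why checking your engine's collation matters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, name TEXT);
INSERT INTO employees VALUES (1,'Amit Sharma'),(2,'bhavna rai'),(3,'ananya roy');
""")

# Portable case-insensitive prefix match: normalize both sides with LOWER().
rows = conn.execute(
    "SELECT name FROM employees WHERE LOWER(name) LIKE 'a%' ORDER BY emp_id"
).fetchall()
# rows → [('Amit Sharma',), ('ananya roy',)]
```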


🪄 End of Q2.


❓ Q3: Display unique department names from the employee table.

🧩 Problem Statement

Return distinct department names listed in employees (or a departments table). Use DISTINCT.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Bhavna Rai | HR |
| 3 | Ananya Roy | Sales |
| 4 | Chetan Joshi | Engineering |

Columns:

  • dept_name: department label (could also be a foreign key to departments).

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: SELECT DISTINCT to remove duplicates.

🏗️ SQL Query

SELECT DISTINCT dept_name
FROM employees;

🔍 Explanation

DISTINCT collapses duplicate department values so each name appears once.


🧾 Step 3: Expected Output

| dept_name |
|---|
| Sales |
| HR |
| Engineering |

⚙️ Step 4: Why This Query Works

DISTINCT deduplicates values after selection.


⚡ Step 5: Common Mistakes

  • Selecting multiple columns with DISTINCT when only one is desired.
  • Not trimming whitespace/case differences leading to duplicates.

🧠 Step 6: Alternate Approach (Optional)

SELECT dept_name
FROM employees
GROUP BY dept_name;

Group by also returns unique dept names.


🎯 Interview Tip

If department IDs exist, prefer selecting ID + name to avoid ambiguity.


🪄 End of Q3.


❓ Q4: Fetch employees sorted by their joining date (latest first).

🧩 Problem Statement

Return employees ordered by joined_date descending so newest hires appear first.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | joined_date |
|---|---|---|
| 1 | Amit | 2020-01-15 |
| 2 | Bhavna | 2022-03-10 |
| 3 | Ananya | 2024-07-01 |
| 4 | Chetan | 2021-09-05 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: ORDER BY with DESC.

🏗️ SQL Query

SELECT emp_id, name, joined_date
FROM employees
ORDER BY joined_date DESC;

🔍 Explanation

ORDER BY joined_date DESC sorts newest first.


🧾 Step 3: Expected Output

| emp_id | name | joined_date |
|---|---|---|
| 3 | Ananya | 2024-07-01 |
| 2 | Bhavna | 2022-03-10 |
| 4 | Chetan | 2021-09-05 |
| 1 | Amit | 2020-01-15 |

⚙️ Step 4: Why This Query Works

Ordering controls presentation; databases optimize sorting but large sorts can be costly—consider indexing if needed.


⚡ Step 5: Common Mistakes

  • Forgetting DESC and getting oldest first.
  • Sorting on string date instead of date datatype.

🧠 Step 6: Alternate Approach (Optional)

-- Show top 10 newest hires
SELECT emp_id, name, joined_date
FROM employees
ORDER BY joined_date DESC
LIMIT 10;

🎯 Interview Tip

State whether joined_date is stored as DATE/TIMESTAMP to avoid string-sorting issues.


🪄 End of Q4.


❓ Q5: Get the top 10 highest-paid employees.

🧩 Problem Statement

Return the 10 employees with the largest salaries. If tie-handling is required, define a deterministic order.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 120000 |
| 2 | Bhavna Rai | 90000 |
| 3 | Ananya Roy | 120000 |
| 4 | Chetan Joshi | 80000 |
| 5 | Deepa Shah | 75000 |

(Assume more rows beyond 5 in real table.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: ORDER BY + LIMIT (or window functions for ties).

🏗️ SQL Query

SELECT emp_id, name, salary
FROM employees
ORDER BY salary DESC
LIMIT 10;

(If you want deterministic tie-break: ORDER BY salary DESC, emp_id ASC)

🔍 Explanation

Sort by salary descending, then take the first 10 rows.


🧾 Step 3: Expected Output (top few shown)

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 120000 |
| 3 | Ananya Roy | 120000 |
| 2 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 80000 |

⚙️ Step 4: Why This Query Works

Ordering then limiting fetches top rows efficiently; use index on salary to improve speed.


⚡ Step 5: Common Mistakes

  • Using TOP syntax in non-supporting DBs (use LIMIT or FETCH FIRST).
  • Not specifying tie-breaker for reproducibility.

🧠 Step 6: Alternate Approach (Optional)

-- Using ROW_NUMBER to handle ties / pagination
WITH ranked AS (
  SELECT emp_id, name, salary,
         ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rn
  FROM employees
)
SELECT emp_id, name, salary
FROM ranked
WHERE rn <= 10;

🎯 Interview Tip

Discuss tie-handling and how pagination will be implemented for large result sets.
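To see how tie-handling differs between ROW_NUMBER and RANK, here is a runnable sketch (Python sqlite3, data from the sample table above). ROW_NUMBER breaks the 120000 tie arbitrarily unless given a tie-breaker; RANK gives both the same rank and then skips:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, name TEXT, salary INTEGER);
INSERT INTO employees VALUES
 (1,'Amit Sharma',120000),(2,'Bhavna Rai',90000),
 (3,'Ananya Roy',120000),(4,'Chetan Joshi',80000);
""")

rows = conn.execute("""
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rn,  -- unique, deterministic
       RANK()       OVER (ORDER BY salary DESC)         AS rnk  -- ties share a rank, gaps follow
FROM employees
ORDER BY rn
""").fetchall()
# rows → [('Amit Sharma', 120000, 1, 1), ('Ananya Roy', 120000, 2, 1),
#         ('Bhavna Rai', 90000, 3, 3), ('Chetan Joshi', 80000, 4, 4)]
```

With `WHERE rnk <= 10` you would keep all tied rows at the boundary; `WHERE rn <= 10` returns exactly 10.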


🪄 End of Q5.


❓ Q6: Retrieve all transactions made after January 1, 2024.

🧩 Problem Statement

Return transactions with transaction_date strictly after 2024-01-01.


💾 Step 1: Assume Example Tables

transactions

| tx_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 1 | 1 | 1200 | 2023-12-20 |
| 2 | 2 | 5500 | 2024-01-05 |
| 3 | 1 | 3000 | 2024-03-11 |
| 4 | 3 | 450 | 2022-08-10 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: date comparison with WHERE.

🏗️ SQL Query

SELECT tx_id, customer_id, amount, transaction_date
FROM transactions
WHERE transaction_date > '2024-01-01';

(If the column is a DATETIME/TIMESTAMP, note that > '2024-01-01' means after midnight, so it still includes later times on Jan 1; use >= '2024-01-02' to exclude the whole day.)

🔍 Explanation

Rows with date strictly later than Jan 1, 2024 are returned.


🧾 Step 3: Expected Output

| tx_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 2 | 2 | 5500 | 2024-01-05 |
| 3 | 1 | 3000 | 2024-03-11 |

⚙️ Step 4: Why This Query Works

Date comparisons use chronological ordering; ensure correct datatype.


⚡ Step 5: Common Mistakes

  • Using >= '2024-01-01' when you want strictly after (>= includes Jan 1 itself).
  • Comparing strings if dates stored as text with inconsistent format.

🧠 Step 6: Alternate Approach (Optional)

-- Using DATE() conversion (if transaction_date includes time)
SELECT *
FROM transactions
WHERE DATE(transaction_date) > '2024-01-01';

🎯 Interview Tip

Clarify whether “after” includes the date (>=) or strictly greater (>).


🪄 End of Q6.


❓ Q7: Find the total salary expense for each department.

🧩 Problem Statement

Group employees by department and sum salaries to get department-level salary expense.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | dept_name | salary |
|---|---|---|---|---|
| 1 | Amit | 10 | Sales | 60000 |
| 2 | Ananya | 10 | Sales | 70000 |
| 3 | Bhavna | 20 | HR | 55000 |
| 4 | Chetan | 30 | Eng | 80000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: GROUP BY + aggregate SUM.

🏗️ SQL Query

SELECT dept_id, dept_name, SUM(salary) AS total_salary_expense
FROM employees
GROUP BY dept_id, dept_name;

🔍 Explanation

Group rows by department and compute sum of salary per group.


🧾 Step 3: Expected Output

| dept_id | dept_name | total_salary_expense |
|---|---|---|
| 10 | Sales | 130000 |
| 20 | HR | 55000 |
| 30 | Eng | 80000 |

⚙️ Step 4: Why This Query Works

GROUP BY collapses rows into department buckets; SUM adds salaries within each bucket.


⚡ Step 5: Common Mistakes

  • Omitting dept_name from GROUP BY when selected.
  • Using COUNT instead of SUM.

🧠 Step 6: Alternate Approach (Optional)

-- With join to a departments table
SELECT d.dept_id, d.dept_name, COALESCE(SUM(e.salary),0) AS total_salary
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name;

This includes departments with zero employees.


🎯 Interview Tip

Mention COALESCE to show departments with no employees return 0 instead of NULL.


🪄 End of Q7.


❓ Q8: Count how many employees belong to each department.

🧩 Problem Statement

For each department, count employees (i.e., headcount per dept).


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | dept_name |
|---|---|---|---|
| 1 | Amit | 10 | Sales |
| 2 | Ananya | 10 | Sales |
| 3 | Bhavna | 20 | HR |
| 4 | Chetan | NULL | NULL |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: COUNT(*) with GROUP BY. Decide whether to include NULL dept.

🏗️ SQL Query

SELECT dept_id, dept_name, COUNT(*) AS employee_count
FROM employees
WHERE dept_id IS NOT NULL
GROUP BY dept_id, dept_name;

(To include NULLs as a separate group, remove the WHERE clause.)

🔍 Explanation

Group by dept and count rows per group.


🧾 Step 3: Expected Output

| dept_id | dept_name | employee_count |
|---|---|---|
| 10 | Sales | 2 |
| 20 | HR | 1 |

⚙️ Step 4: Why This Query Works

COUNT(*) counts rows in each department group.


⚡ Step 5: Common Mistakes

  • Using COUNT(dept_id) which ignores NULLs (sometimes desired).
  • Forgetting GROUP BY for non-aggregated columns.

🧠 Step 6: Alternate Approach (Optional)

-- Include departments with zero employees using departments table
SELECT d.dept_id, d.dept_name, COUNT(e.emp_id) AS employee_count
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name;

🎯 Interview Tip

Explain the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column).
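The difference is easy to demonstrate. A sketch in Python's sqlite3 with one NULL dept_id and one duplicate dept_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, dept_id INTEGER);
INSERT INTO employees VALUES (1,10),(2,10),(3,20),(4,NULL);
""")

row = conn.execute(
    # COUNT(*) counts rows; COUNT(col) skips NULLs; COUNT(DISTINCT col) skips NULLs and dedupes.
    "SELECT COUNT(*), COUNT(dept_id), COUNT(DISTINCT dept_id) FROM employees"
).fetchone()
# row → (4, 3, 2)
```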


🪄 End of Q8.


❓ Q9: Get departments with more than 5 employees.

🧩 Problem Statement

Find departments whose employee count exceeds 5.


💾 Step 1: Assume Example Tables

employees

(Only showing relevant counts.)

| dept_id | dept_name | employee_count (derived) |
|---|---|---|
| 10 | Sales | - |
| 20 | HR | - |

We will compute counts from employees.


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: GROUP BY + HAVING for filtering groups.

🏗️ SQL Query

SELECT dept_id, dept_name, COUNT(*) AS employee_count
FROM employees
GROUP BY dept_id, dept_name
HAVING COUNT(*) > 5;

🔍 Explanation

HAVING filters aggregated groups by their computed count.


🧾 Step 3: Expected Output

| dept_id | dept_name | employee_count |
|---|---|---|
| 50 | Ops | 12 |
| 70 | Sales | 8 |

(Example departments meeting the criterion.)


⚙️ Step 4: Why This Query Works

WHERE cannot filter aggregated results; HAVING can.


⚡ Step 5: Common Mistakes

  • Using WHERE COUNT(*) > 5 (invalid).
  • Forgetting GROUP BY on selected non-aggregated columns.

🧠 Step 6: Alternate Approach (Optional)

-- Using subquery
SELECT dept_id, dept_name, employee_count
FROM (
  SELECT dept_id, dept_name, COUNT(*) AS employee_count
  FROM employees
  GROUP BY dept_id, dept_name
) t
WHERE employee_count > 5;

🎯 Interview Tip

Explain HAVING vs WHERE clearly before coding.


🪄 End of Q9.


❓ Q10: Find the average salary per job title.

🧩 Problem Statement

Group by job_title and compute average salary for each title.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | job_title | salary |
|---|---|---|---|
| 1 | Amit | Sales Executive | 60000 |
| 2 | Ananya | Sales Executive | 70000 |
| 3 | Bhavna | HR Manager | 90000 |
| 4 | Chetan | Engineer | 80000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: AVG() aggregate with GROUP BY.

🏗️ SQL Query

SELECT job_title, ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY job_title;

🔍 Explanation

AVG(salary) calculates the mean salary per job title. ROUND formats to 2 decimals.


🧾 Step 3: Expected Output

| job_title | avg_salary |
|---|---|
| Sales Executive | 65000.00 |
| HR Manager | 90000.00 |
| Engineer | 80000.00 |

⚙️ Step 4: Why This Query Works

Standard aggregation by category returns an average per job title.


⚡ Step 5: Common Mistakes

  • Integer truncation: in some DBs (e.g. SQL Server) AVG over an integer column returns an integer, so cast to a decimal type first; ROUND only formats the result.
  • Selecting non-aggregated columns without grouping.

🧠 Step 6: Alternate Approach (Optional)

-- Exclude zero or NULL salaries
SELECT job_title, AVG(salary) AS avg_salary
FROM employees
WHERE salary IS NOT NULL AND salary > 0
GROUP BY job_title;

🎯 Interview Tip

Clarify how to treat NULL salaries and outliers before computing averages.


🪄 End of Q10.


❓ Q11: Retrieve total sales per region.

🧩 Problem Statement

Group transactions by region and sum amount to compute total sales per region.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | region | amount |
|---|---|---|---|
| 1 | 1 | West | 1200 |
| 2 | 2 | North | 5500 |
| 3 | 1 | West | 3000 |
| 4 | 3 | South | 450 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: GROUP BY region + SUM(amount).

🏗️ SQL Query

SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region;

🔍 Explanation

Sums order amounts grouped by region.


🧾 Step 3: Expected Output

| region | total_sales |
|---|---|
| West | 4200 |
| North | 5500 |
| South | 450 |

⚙️ Step 4: Why This Query Works

Aggregates amounts per region to produce regional totals.


⚡ Step 5: Common Mistakes

  • Forgetting GROUP BY.
  • Not handling NULL region values (may want to group them separately).

🧠 Step 6: Alternate Approach (Optional)

SELECT COALESCE(region, 'Unknown') AS region, SUM(amount) AS total_sales
FROM orders
GROUP BY COALESCE(region, 'Unknown');

🎯 Interview Tip

Discuss currency and timezone normalization if amounts come from different locales.


🪄 End of Q11.


❓ Q12: Count the total number of customers who placed at least one order.

🧩 Problem Statement

Count unique customers appearing in orders (customers with >=1 order).


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1200 |
| 2 | 2 | 5500 |
| 3 | 1 | 3000 |
| 4 | 4 | 450 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: COUNT(DISTINCT customer_id).

🏗️ SQL Query

SELECT COUNT(DISTINCT customer_id) AS customers_with_orders
FROM orders;

🔍 Explanation

DISTINCT ensures each customer counted once, even with multiple orders.


🧾 Step 3: Expected Output

| customers_with_orders |
|---|
| 3 |

(Customers: 1, 2, and 4)


⚙️ Step 4: Why This Query Works

Counting distinct IDs gives unique customer count.


⚡ Step 5: Common Mistakes

  • Using COUNT(*) which counts rows (orders), not unique customers.
  • Counting distinct on wrong column.

🧠 Step 6: Alternate Approach (Optional)

SELECT COUNT(*) FROM (SELECT 1 FROM orders GROUP BY customer_id) t;

Equivalent but less direct.


🎯 Interview Tip

If large tables, mention potential cost of DISTINCT and ways to optimize (indexes, pre-aggregated tables).


🪄 End of Q12.


❓ Q13: Find customers who made transactions above $5,000 more than once.

🧩 Problem Statement

Find customers who have at least two transactions where amount > 5000 (i.e., count of such transactions per customer > 1).


💾 Step 1: Assume Example Tables

transactions

| tx_id | customer_id | amount | tx_date |
|---|---|---|---|
| 1 | 1 | 6000 | 2024-02-01 |
| 2 | 2 | 7000 | 2024-03-01 |
| 3 | 1 | 5500 | 2024-04-01 |
| 4 | 3 | 4000 | 2024-05-01 |
| 5 | 2 | 2000 | 2024-06-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: filtering + grouping + HAVING.

🏗️ SQL Query

SELECT customer_id, COUNT(*) AS high_value_tx_count
FROM transactions
WHERE amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;

🔍 Explanation

Filter transactions > 5000, then count per customer, and keep those with count > 1.


🧾 Step 3: Expected Output

| customer_id | high_value_tx_count |
|---|---|
| 1 | 2 |

(Customer 1 has two high-value transactions; customer 2 has only one >5000.)


⚙️ Step 4: Why This Query Works

WHERE narrows rows, GROUP BY groups per customer, HAVING filters groups by count.


⚡ Step 5: Common Mistakes

  • Filtering on amount in HAVING instead of WHERE (invalid for a non-aggregated column in most DBs, and less efficient even where allowed).
  • Forgetting GROUP BY.

🧠 Step 6: Alternate Approach (Optional)

SELECT customer_id
FROM (
  SELECT customer_id,
         SUM(CASE WHEN amount > 5000 THEN 1 ELSE 0 END) AS cnt
  FROM transactions
  GROUP BY customer_id
) t
WHERE cnt > 1;

Useful when counting multiple conditional buckets.
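For example, two conditional buckets can be counted in one pass. A runnable sketch (Python sqlite3, data mirroring the transactions table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions(tx_id INTEGER, customer_id INTEGER, amount INTEGER);
INSERT INTO transactions VALUES
 (1,1,6000),(2,2,7000),(3,1,5500),(4,3,4000),(5,2,2000);
""")

rows = conn.execute("""
SELECT customer_id,
       SUM(CASE WHEN amount >  5000 THEN 1 ELSE 0 END) AS high_cnt,  -- high-value bucket
       SUM(CASE WHEN amount <= 5000 THEN 1 ELSE 0 END) AS low_cnt   -- low-value bucket
FROM transactions
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()
# rows → [(1, 2, 0), (2, 1, 1), (3, 0, 1)]
```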


🎯 Interview Tip

Discuss performance trade-offs when transactions is large: an index on amount or partitioning could help.


🪄 End of Q13.


❓ Q14: Identify the most profitable regions by total transaction amount.

🧩 Problem Statement

Compute total transaction amount per region and rank regions by descending totals to find top performers.


💾 Step 1: Assume Example Tables

transactions

| tx_id | region | amount |
|---|---|---|
| 1 | West | 12000 |
| 2 | North | 5500 |
| 3 | West | 3000 |
| 4 | South | 450 |
| 5 | North | 8000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: aggregation + ordering + optional ranking.

🏗️ SQL Query

SELECT region, SUM(amount) AS total_amount
FROM transactions
GROUP BY region
ORDER BY total_amount DESC;

(To get top N regions: add LIMIT N.)

🔍 Explanation

Sums amounts per region and sorts highest to lowest.


🧾 Step 3: Expected Output

| region | total_amount |
|---|---|
| West | 15000 |
| North | 13500 |
| South | 450 |

⚙️ Step 4: Why This Query Works

Group & sum reveals aggregated revenue per region; ORDER BY shows rank.


⚡ Step 5: Common Mistakes

  • Forgetting GROUP BY.
  • Not handling NULL regions.

🧠 Step 6: Alternate Approach (Optional)

-- Add rank column
SELECT region, total_amount,
       RANK() OVER (ORDER BY total_amount DESC) AS region_rank
FROM (
  SELECT region, SUM(amount) AS total_amount
  FROM transactions
  GROUP BY region
) t
ORDER BY total_amount DESC;

🎯 Interview Tip

Discuss whether revenue or profit should be used; profit requires cost data.


🪄 End of Q14.


❓ Q15: Find customers who have not purchased anything in the last 6 months.

🧩 Problem Statement

Return customers with no orders/transactions in the past 6 months (relative to a reference date). For this example assume reference date = 2024-10-31 (today-like).


💾 Step 1: Assume Example Tables

customers

| customer_id | name | joined_date |
|---|---|---|
| 1 | Raj | 2021-05-10 |
| 2 | Aisha | 2022-02-01 |
| 3 | Sunil | 2023-07-22 |

orders

| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-01-01 |
| 2 | 2 | 2024-08-01 |
| 3 | 1 | 2024-03-05 |

(Reference date for “last 6 months”: 2024-10-31. Six months before: 2024-04-30.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: anti-join (LEFT JOIN ... WHERE ... IS NULL) or NOT EXISTS. Date arithmetic needed.

🏗️ SQL Query (preferred NOT EXISTS)

SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > DATE '2024-04-30'  -- 6 months before 2024-10-31
);

(If your DB has CURRENT_DATE, compute CURRENT_DATE - INTERVAL '6 months'.)
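A runnable sketch of the anti-join (Python sqlite3, sample data mirroring the tables above, cutoff passed as a bound parameter). With this data, both Raj (last order 2024-03-05) and Sunil (no orders) qualify:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(customer_id INTEGER, name TEXT);
CREATE TABLE orders(order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1,'Raj'),(2,'Aisha'),(3,'Sunil');
INSERT INTO orders VALUES (1,1,'2024-01-01'),(2,2,'2024-08-01'),(3,1,'2024-03-05');
""")

cutoff = "2024-04-30"  # in production: derive from CURRENT_DATE / date('now','-6 months')
rows = conn.execute("""
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > ?          -- ISO dates compare correctly even as text
)
""", (cutoff,)).fetchall()
# rows → [(1, 'Raj'), (3, 'Sunil')]
```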

🔍 Explanation

For each customer, check if any order exists after the cutoff date. If none, return the customer.


🧾 Step 3: Expected Output

| customer_id | name |
|---|---|
| 1 | Raj |
| 3 | Sunil |

(Neither Raj nor Sunil has an order after 2024-04-30; Raj's last order was 2024-03-05 and Sunil never ordered. Aisha's 2024-08-01 order keeps her active.)


⚙️ Step 4: Why This Query Works

NOT EXISTS efficiently filters customers with zero qualifying orders.


⚡ Step 5: Common Mistakes

  • Using LEFT JOIN incorrectly and forgetting WHERE o.order_id IS NULL.
  • Wrong date arithmetic or inclusive/exclusive boundary confusion.

🧠 Step 6: Alternate Approach (Optional)

-- LEFT JOIN method
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
  AND o.order_date > DATE '2024-04-30'
WHERE o.order_id IS NULL;

🎯 Interview Tip

Clarify the reference date and whether “6 months” is exact calendar months or 180 days.


🪄 End of Q15.


❓ Q16: Calculate ARPU (Average Revenue Per User).

🧩 Problem Statement

ARPU = (total revenue in period) / (number of active users in period). For simplicity, compute for all-time: total transaction amount divided by number of unique customers who made at least one transaction.


💾 Step 1: Assume Example Tables

transactions

| tx_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 2000 |
| 3 | 1 | 3000 |
| 4 | 3 | 500 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Compute aggregate sum and distinct count, then divide. Use NULLIF to prevent division by zero.

🏗️ SQL Query

SELECT
  ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) AS arpu
FROM transactions;

🔍 Explanation

  • SUM(amount) total revenue.
  • COUNT(DISTINCT customer_id) active users.
  • NULLIF(...,0) avoids division by zero if no customers.

🧾 Step 3: Expected Output

| arpu |
|---|
| 2166.67 |

(Total revenue = 6500; unique customers = 3 → 6500/3 ≈ 2166.67)


⚙️ Step 4: Why This Query Works

Direct arithmetic on aggregates yields the average revenue per active user.


⚡ Step 5: Common Mistakes

  • Dividing by COUNT(*) (orders) instead of distinct users.
  • Not handling zero users.

🧠 Step 6: Alternate Approach (Optional)

Compute ARPU for a specific period using date filters:

SELECT ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) AS arpu
FROM transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-09-30';

🎯 Interview Tip

Define active user (e.g., made at least one purchase in period) before coding.


🪄 End of Q16.


❓ Q17: Calculate the monthly customer retention rate.

🧩 Problem Statement

Monthly retention rate = (customers active this month who were also active previous month) / (customers active previous month). We’ll compute for two months example: previous month = Sept 2024, current month = Oct 2024.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-09-10 |
| 2 | 2 | 2024-09-20 |
| 3 | 1 | 2024-10-05 |
| 4 | 3 | 2024-10-15 |
| 5 | 2 | 2024-08-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

We need sets: users active in previous month (Sept) and current month (Oct), then compute intersection size / previous month size.

🏗️ SQL Query

WITH prev_month AS (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date BETWEEN '2024-09-01' AND '2024-09-30'
),
curr_month AS (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date BETWEEN '2024-10-01' AND '2024-10-31'
),
counts AS (
  SELECT
    (SELECT COUNT(*) FROM prev_month) AS prev_count,
    (SELECT COUNT(*) FROM curr_month) AS curr_count,
    (SELECT COUNT(*) FROM prev_month pm
     JOIN curr_month cm ON pm.customer_id = cm.customer_id) AS retained_count
)
SELECT
  prev_count,
  curr_count,
  retained_count,
  CASE WHEN prev_count = 0 THEN NULL
       ELSE ROUND((retained_count::numeric / prev_count) * 100, 2)
  END AS retention_rate_percent
FROM counts;

(For DBs without ::numeric, use CAST.)

🔍 Explanation

  • Gather distinct customers per month.
  • Count previous, current, and intersection.
  • Retention = retained / prev * 100.

🧾 Step 3: Expected Output

| prev_count | curr_count | retained_count | retention_rate_percent |
|---|---|---|---|
| 2 | 2 | 1 | 50.00 |

(Prev month customers: 1 and 2; current: 1 and 3; retained is customer 1 → 1/2 = 50%)


⚙️ Step 4: Why This Query Works

Set operations via CTEs create clear intermediate results; join computes intersection.


⚡ Step 5: Common Mistakes

  • Forgetting DISTINCT leading to inflated counts.
  • Using current month size as denominator instead of previous month.

🧠 Step 6: Alternate Approach (Optional)

Compute rolling retention for many months by month buckets and lagging; more advanced but same idea.
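A runnable sketch of that rolling idea (Python sqlite3, same sample orders): bucket active customers by month with strftime, then self-join each month to the following month to count retained customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders(order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
 (1,1,'2024-09-10'),(2,2,'2024-09-20'),(3,1,'2024-10-05'),
 (4,3,'2024-10-15'),(5,2,'2024-08-01');
""")

rows = conn.execute("""
WITH monthly AS (
  SELECT DISTINCT customer_id,
         strftime('%Y-%m', order_date) AS ym   -- month bucket per active customer
  FROM orders
)
SELECT prev.ym AS prev_month,
       COUNT(DISTINCT prev.customer_id) AS prev_count,
       COUNT(DISTINCT curr.customer_id) AS retained
FROM monthly prev
LEFT JOIN monthly curr
  ON curr.customer_id = prev.customer_id
 AND curr.ym = strftime('%Y-%m', date(prev.ym || '-01', '+1 month'))  -- the next month
GROUP BY prev.ym
ORDER BY prev.ym
""").fetchall()
# rows → [('2024-08', 1, 1), ('2024-09', 2, 1), ('2024-10', 2, 0)]
```

Retention rate per month is then retained / prev_count; the date arithmetic syntax varies by engine (e.g. DATE_TRUNC + INTERVAL in PostgreSQL).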


🎯 Interview Tip

Explain definition of “active” and whether to use purchases, logins, or other events to define activity.


🪄 End of Q17.


❓ Q18: Identify churned customers (inactive in the last 6 months).

🧩 Problem Statement

Return customers with no orders in the past 6 months (similar to Q15 but labelled as churn). Use a reference date (assume 2024-10-31 → cutoff 2024-04-30).


💾 Step 1: Assume Example Tables

customers

| customer_id | name |
|---|---|
| 1 | Raj |
| 2 | Aisha |
| 3 | Sunil |
| 4 | Priya |

orders

| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-01-10 |
| 2 | 2 | 2023-12-05 |
| 3 | 3 | 2024-05-01 |
| 4 | 4 | 2024-03-15 |

Cutoff: 2024-04-30. Customers whose most recent order is on or before the cutoff (or who never ordered at all) are churned.


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Find customers whose most recent order is older than the cutoff or who never ordered.

🏗️ SQL Query

WITH last_order AS (
  SELECT customer_id, MAX(order_date) AS last_order_date
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_id, c.name, lo.last_order_date
FROM customers c
LEFT JOIN last_order lo ON c.customer_id = lo.customer_id
WHERE COALESCE(lo.last_order_date, DATE '1900-01-01') <= DATE '2024-04-30';

(Here COALESCE treats customers with no orders as churned.)

🔍 Explanation

  • last_order finds each customer’s latest order date.
  • Left join to include customers without orders.
  • Filter where last order is on/before cutoff → churned.

🧾 Step 3: Expected Output

| customer_id | name | last_order_date |
|---|---|---|
| 1 | Raj | 2024-01-10 |
| 2 | Aisha | 2023-12-05 |
| 4 | Priya | 2024-03-15 |

(Customer 3 had order on 2024-05-01 > cutoff, so not churned.)


⚙️ Step 4: Why This Query Works

The latest order date captures each customer’s recency; COALESCE treats customers who never ordered as churned.


⚡ Step 5: Common Mistakes

  • Using WHERE on orders before aggregation, which can drop customers entirely.
  • Not handling customers with no orders.

🧠 Step 6: Alternate Approach (Optional)

-- Using NOT EXISTS to find customers with no orders after cutoff
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > DATE '2024-04-30'
);

🎯 Interview Tip

State churn definition clearly: “no purchase in X period” vs “decreasing engagement”, and adjust query accordingly.


🪄 End of Q18.


———

👩‍💼 Data Analyst

❓ Q1: Combine employee and department data using an INNER JOIN.

🧩 Problem Statement

Show each employee’s name along with their department name. Only include employees who belong to an existing department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit Sharma | 10 |
| 2 | Ananya Roy | 20 |
| 3 | Bhavna Rai | 30 |
| 4 | Chetan Joshi | 40 |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: INNER JOIN returns rows where a match exists in both tables.

🏗️ SQL Query

SELECT e.emp_id,
       e.name,
       d.dept_name
FROM employees e
INNER JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

  • e and d are aliases for clarity.
  • Only employees whose dept_id appears in departments are shown.

🧾 Step 3: Expected Output

| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Ananya Roy | HR |
| 3 | Bhavna Rai | Engineering |

Chetan (Dept 40) is excluded — no matching department.


⚙️ Step 4: Why This Query Works

INNER JOIN merges only intersecting rows by dept_id, giving valid employee-department pairs.


⚡ Step 5: Common Mistakes

  • Forgetting the ON condition → Cartesian join.
  • Selecting columns with same name without prefix → ambiguity.

🧠 Step 6: Alternate Approach

SELECT e.*, d.dept_name
FROM employees e, departments d
WHERE e.dept_id = d.dept_id;

Old-style syntax — functionally same but less readable.


🎯 Interview Tip

Mention that INNER JOIN filters unmatched rows, ideal for integrity-checked relations.


🔍 Interviewer’s Analysis

  • Concept Tested: INNER JOIN
  • Difficulty: Beginner
  • Complexity: ≈ O(n × m) without index; O(n log m) with index
  • Follow-Ups: Explain LEFT JOIN vs INNER JOIN; Indexing strategy on dept_id.

🪄 End of Q1.


❓ Q2: List all employees with their departments (even if unassigned) using a LEFT JOIN.

🧩 Problem Statement

Return every employee and their department name. If an employee has no department, still include them with NULL as department.


💾 Step 1: Assume Example Tables

Same as Q1 but keep Chetan’s department 40 missing.


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

LEFT JOIN keeps all rows from the left table (employees), even when no match on the right.

🏗️ SQL Query

SELECT e.emp_id,
       e.name,
       d.dept_name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

All employees appear; unmatched ones show NULL for department.


🧾 Step 3: Expected Output

| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Ananya Roy | HR |
| 3 | Bhavna Rai | Engineering |
| 4 | Chetan Joshi | NULL |

⚙️ Step 4: Why This Query Works

LEFT JOIN includes all employees (left table), filling missing matches with NULL.


⚡ Step 5: Common Mistakes

  • Using INNER JOIN instead → drops unassigned employees.
  • Forgetting to check for NULL when filtering post-join.

🧠 Step 6: Alternate Approach

SELECT e.emp_id, e.name,
       COALESCE(d.dept_name, 'Unassigned') AS dept_name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id;

COALESCE replaces NULL with “Unassigned”.


🎯 Interview Tip

Highlight that LEFT JOIN preserves completeness of the left entity — vital in reports.


🔍 Interviewer’s Analysis

  • Concept Tested: LEFT JOIN / NULL handling
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: How to show departments with no employees (→ RIGHT JOIN).

🪄 End of Q2.


❓ Q3: Retrieve all data from both tables with a FULL OUTER JOIN.

🧩 Problem Statement

Show every employee and every department. Include:

  • employees without departments
  • departments without employees.

💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit | 10 |
| 2 | Ananya | 20 |
| 3 | Chetan | 40 |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

FULL OUTER JOIN = LEFT JOIN UNION RIGHT JOIN. It keeps all rows from both sides.

🏗️ SQL Query

SELECT e.emp_id,
       e.name,
       e.dept_id AS emp_dept,
       d.dept_name
FROM employees e
FULL OUTER JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

  • Matches where possible.
  • When no match, one side’s columns → NULL.

🧾 Step 3: Expected Output

| emp_id | name | emp_dept | dept_name |
|---|---|---|---|
| 1 | Amit | 10 | Sales |
| 2 | Ananya | 20 | HR |
| 3 | Chetan | 40 | NULL |
| NULL | NULL | 30 | Engineering |

⚙️ Step 4: Why This Query Works

FULL OUTER JOIN merges the results of left and right joins, ensuring nothing is lost.


⚡ Step 5: Common Mistakes

  • Using DB that doesn’t support FULL OUTER JOIN (PostgreSQL yes, MySQL no → need UNION).
  • Forgetting to distinguish which dept_id comes from which table.

🧠 Step 6: Alternate Approach

SELECT e.emp_id, e.name, e.dept_id AS emp_dept, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.emp_id, e.name, e.dept_id, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

Equivalent manual implementation.
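A close variant that needs only LEFT JOIN is handy on engines that support neither FULL OUTER nor RIGHT JOIN (e.g. older SQLite): take the left join, then UNION ALL the right-side rows that found no match. A runnable sketch with the sample tables above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE departments(dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES (1,'Amit',10),(2,'Ananya',20),(3,'Chetan',40);
INSERT INTO departments VALUES (10,'Sales'),(20,'HR'),(30,'Engineering');
""")

rows = conn.execute("""
SELECT e.emp_id, e.name, e.dept_id AS emp_dept, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id   -- all employees
UNION ALL
SELECT NULL, NULL, d.dept_id, d.dept_name
FROM departments d
LEFT JOIN employees e ON e.dept_id = d.dept_id     -- departments with no employees
WHERE e.emp_id IS NULL
ORDER BY emp_dept
""").fetchall()
# rows → [(1, 'Amit', 10, 'Sales'), (2, 'Ananya', 20, 'HR'),
#         (None, None, 30, 'Engineering'), (3, 'Chetan', 40, None)]
```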


🎯 Interview Tip

Explain which DBs lack native FULL OUTER JOIN and how to emulate it.


🔍 Interviewer’s Analysis

  • Concept Tested: FULL OUTER JOIN / Set operations
  • Difficulty: Intermediate
  • Follow-Ups: “Show only unmatched rows” (→ use WHERE dept_name IS NULL OR name IS NULL).

❓ Q4: Perform a SELF JOIN to show employees along with their managers.

🧩 Problem Statement

Display each employee’s name together with their manager’s name and department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | manager_id |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 4 |
| 2 | Ananya Roy | 10 | 4 |
| 3 | Bhavna Rai | 20 | 5 |
| 4 | David Kumar | 10 | NULL |
| 5 | Erin Singh | 20 | NULL |

  • manager_id points to another row’s emp_id — hence “self” join.

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept tested: SELF JOIN — joining a table to itself with aliases.

🏗️ SQL Query

SELECT e.name       AS employee_name,
       m.name       AS manager_name,
       e.dept_id
FROM employees e
LEFT JOIN employees m
  ON e.manager_id = m.emp_id;

🔍 Explanation

  • e = employee copy, m = manager copy.
  • LEFT JOIN keeps top-level managers (NULL manager id).

🧾 Step 3: Expected Output

| employee_name | manager_name | dept_id |
|---|---|---|
| Amit Sharma | David Kumar | 10 |
| Ananya Roy | David Kumar | 10 |
| Bhavna Rai | Erin Singh | 20 |
| David Kumar | NULL | 10 |
| Erin Singh | NULL | 20 |

⚙️ Step 4: Why This Query Works

The same table is scanned twice — once as employees, once as managers — matching on manager_id = emp_id.


⚡ Step 5: Common Mistakes

  • Forgetting aliases → ambiguous column error.
  • Using INNER JOIN → managers only (drops top bosses).

🧠 Step 6: Alternate Approach

SELECT e.name, m.name AS manager_name
FROM employees e
JOIN employees m USING (emp_id)

(Syntactically legal but logically wrong: USING (emp_id) joins each employee to themselves, not to their manager — it shows why aliases plus an explicit e.manager_id = m.emp_id condition are required.)


🎯 Interview Tip

Explain clearly the difference between self-join and hierarchical CTE (for multi-level chains).


🔍 Interviewer’s Analysis

  • Concept Tested: Self-Join / Aliasing
  • Difficulty: Intermediate
  • Follow-Ups: Find multi-level chain; show manager count per manager.

🪄 End of Q4.


❓ Q5: Find employees who do not belong to any department.

🧩 Problem Statement

Return employees whose dept_id does not exist in the departments table.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit Sharma | 10 |
| 2 | Ananya Roy | 20 |
| 3 | Bhavna Rai | 40 |
| 4 | Chetan Joshi | NULL |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Anti-Join using LEFT JOIN … IS NULL or NOT IN.

🏗️ SQL Query

SELECT e.emp_id, e.name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;

🔍 Explanation

  • LEFT JOIN keeps all employees.
  • The WHERE d.dept_id IS NULL condition isolates those without a matching department.

🧾 Step 3: Expected Output

| emp_id | name |
|---|---|
| 3 | Bhavna Rai |
| 4 | Chetan Joshi |

⚙️ Step 4: Why This Query Works

Unmatched rows (from left table) show NULLs on the right — filtering those identifies missing links.


⚡ Step 5: Common Mistakes

  • Using INNER JOIN (returns only matched rows).
  • Using NOT IN without handling NULLs → no rows returned if NULL exists in subquery.
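The NOT IN trap is easy to reproduce. A minimal sketch with SQLite via Python's sqlite3, where a single NULL in the subquery silences every row while NOT EXISTS keeps working:

```python
import sqlite3

# One NULL in the subquery makes `dept_id NOT IN (...)` evaluate to
# UNKNOWN for every unmatched row, so the "broken" query returns nothing;
# NOT EXISTS is unaffected.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT);
CREATE TABLE departments (dept_id INT);
INSERT INTO employees VALUES (1,'Amit',10),(3,'Bhavna',40);
INSERT INTO departments VALUES (10),(20),(NULL);
""")
broken = conn.execute("""
SELECT name FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments)
""").fetchall()
fixed = conn.execute("""
SELECT name FROM employees e
WHERE NOT EXISTS (SELECT 1 FROM departments d WHERE d.dept_id = e.dept_id)
""").fetchall()
print(broken, fixed)
```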

🧠 Step 6: Alternate Approach

SELECT emp_id, name
FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE dept_id IS NOT NULL);

🎯 Interview Tip

If interviewer asks “how to optimize,” mention that an anti-join using NOT EXISTS often beats NOT IN.


🔍 Interviewer’s Analysis

  • Concept Tested: Left Anti-Join
  • Difficulty: Intermediate
  • Follow-Ups: Handle NULL values in subqueries; compare NOT IN vs NOT EXISTS.

🪄 End of Q5.


❓ Q6: Display products purchased by customers from more than one country.

🧩 Problem Statement

Find products that have been bought by customers in multiple countries, i.e., count of distinct countries > 1.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | country |
|---|---|---|
| 1 | Raj | India |
| 2 | Aisha | India |
| 3 | John | USA |
| 4 | Maria | UK |

orders

| order_id | customer_id | product_id |
|---|---|---|
| 1 | 1 | 100 |
| 2 | 2 | 200 |
| 3 | 3 | 100 |
| 4 | 4 | 300 |
| 5 | 3 | 200 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concepts: JOIN → GROUP BY → COUNT(DISTINCT) with HAVING.

🏗️ SQL Query

SELECT o.product_id,
       COUNT(DISTINCT c.country) AS country_count
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id
GROUP BY o.product_id
HAVING COUNT(DISTINCT c.country) > 1;

🔍 Explanation

  • Join orders ↔ customers to access country.
  • Group by product and count unique countries.
  • Keep only those sold in > 1 country.
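The whole pipeline can be verified on the sample data with an in-memory SQLite database (a quick sketch, not production code):

```python
import sqlite3

# JOIN -> GROUP BY -> COUNT(DISTINCT) -> HAVING on the article's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, name TEXT, country TEXT);
CREATE TABLE orders (order_id INT, customer_id INT, product_id INT);
INSERT INTO customers VALUES
  (1,'Raj','India'),(2,'Aisha','India'),(3,'John','USA'),(4,'Maria','UK');
INSERT INTO orders VALUES (1,1,100),(2,2,200),(3,3,100),(4,4,300),(5,3,200);
""")
rows = conn.execute("""
SELECT o.product_id, COUNT(DISTINCT c.country) AS country_count
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
GROUP BY o.product_id
HAVING COUNT(DISTINCT c.country) > 1
ORDER BY o.product_id
""").fetchall()
print(rows)
```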

🧾 Step 3: Expected Output

| product_id | country_count |
|---|---|
| 100 | 2 |
| 200 | 2 |

(Product 100 bought in India & USA; Product 200 in India & USA.)


⚙️ Step 4: Why This Query Works

Aggregation over joined data computes cross-customer metrics (“distinct countries per product”).


⚡ Step 5: Common Mistakes

  • Forgetting DISTINCT → over-count.
  • Filtering with WHERE instead of HAVING.

🧠 Step 6: Alternate Approach

-- Using a CTE for clarity
WITH product_countries AS (
  SELECT o.product_id, c.country
  FROM orders o
  JOIN customers c ON o.customer_id = c.customer_id
  GROUP BY o.product_id, c.country
)
SELECT product_id
FROM product_countries
GROUP BY product_id
HAVING COUNT(*) > 1;

🎯 Interview Tip

Discuss indexing (customer_id, product_id) for large joins and explain why COUNT(DISTINCT) is CPU-intensive.


🔍 Interviewer’s Analysis

  • Concept Tested: Join + Aggregation + Distinct
  • Difficulty: Intermediate
  • Follow-Ups: Optimize COUNT(DISTINCT); filter by time period.

🪄 End of Q6.


❓ Q7: Find employees who earn more than the average salary.

🧩 Problem Statement

List employees whose salary is greater than the overall average salary across the company.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 75000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 50000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept tested: Subquery returning a single value (the company-wide average).

🏗️ SQL Query

SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
);

🔍 Explanation

  • The inner query computes one scalar — the average salary.
  • The outer query filters all employees whose salary > that value.

🧾 Step 3: Expected Output

| name | salary |
|---|---|
| Ananya Roy | 75000 |
| Bhavna Rai | 90000 |

Average salary = (60000 + 75000 + 90000 + 50000) / 4 = 68750.


⚙️ Step 4: Why This Query Works

The scalar subquery executes once; each row compares its salary to the result.


⚡ Step 5: Common Mistakes

  • Forgetting parentheses around subquery.
  • Trying to compare multiple rows without aggregation → “subquery returns more than one row.”

🧠 Step 6: Alternate Approach

-- Using a window function (no subquery)
SELECT name, salary
FROM (
  SELECT name, salary, AVG(salary) OVER () AS avg_sal
  FROM employees
) t
WHERE salary > avg_sal;

More modern and efficient.
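Both forms can be checked against each other in an in-memory SQLite database (window functions need SQLite 3.25+); a sketch on the sample salaries:

```python
import sqlite3

# Scalar-subquery version vs window-function version return the same rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [("Amit", 60000), ("Ananya", 75000),
                  ("Bhavna", 90000), ("Chetan", 50000)])
sub = conn.execute("""
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY name
""").fetchall()
win = conn.execute("""
SELECT name FROM (
  SELECT name, salary, AVG(salary) OVER () AS avg_sal FROM employees
) WHERE salary > avg_sal
ORDER BY name
""").fetchall()
print(sub, win)
```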


🎯 Interview Tip

If asked, mention that the window function version avoids recalculating averages repeatedly.


🔍 Interviewer’s Analysis

  • Concept Tested: Scalar subquery / Aggregation
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Compare performance between subquery and window approach.

🪄 End of Q7.


❓ Q8: Fetch the second-highest salary using subquery and DENSE_RANK.

🧩 Problem Statement

Retrieve the employee(s) with the second-highest salary — including ties.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 60000 |
| 2 | Ananya Roy | 75000 |
| 3 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 90000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We can either:

  1. Use a subquery with MAX and <> logic, or
  2. Use DENSE_RANK() window function.

We’ll demonstrate both.


🏗️ SQL Query (Approach 1 — Subquery)

SELECT name, salary
FROM employees
WHERE salary = (
  SELECT MAX(salary)
  FROM employees
  WHERE salary < (SELECT MAX(salary) FROM employees)
);

🔍 Explanation

  • Inner-most: Finds highest salary (90000).
  • Middle: Finds max salary below that (75000).
  • Outer: Returns employees with that salary.

🏗️ SQL Query (Approach 2 — Using DENSE_RANK)

SELECT name, salary
FROM (
  SELECT name, salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) t
WHERE rnk = 2;

🔍 Explanation

  • DENSE_RANK() assigns same rank for duplicates.
  • We select rows with rank = 2 → second-highest.

🧾 Step 3: Expected Output

| name | salary |
|---|---|
| Ananya Roy | 75000 |

⚙️ Step 4: Why This Query Works

Both methods logically identify the “runner-up” salary value and filter for matching rows.


⚡ Step 5: Common Mistakes

  • Using LIMIT 2 incorrectly → returns top two, not specifically second.
  • Using ROW_NUMBER (which skips ties).

🎯 Interview Tip

Explain how DENSE_RANK handles duplicates gracefully; ROW_NUMBER would not.
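The tie-handling difference is easy to demonstrate with SQLite (3.25+ for window functions): with two employees tied at the top, DENSE_RANK's rank 2 is the true runner-up, while ROW_NUMBER's row 2 is just one of the tied top earners, chosen arbitrarily:

```python
import sqlite3

# DENSE_RANK finds the runner-up even with a tie at the top;
# ROW_NUMBER's "row 2" lands on one of the tied 90000 earners.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [("Amit", 60000), ("Ananya", 75000),
                  ("Bhavna", 90000), ("Chetan", 90000)])
dense = conn.execute("""
SELECT name FROM (
  SELECT name, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees
) WHERE rnk = 2
""").fetchall()
rownum = conn.execute("""
SELECT name FROM (
  SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees
) WHERE rn = 2
""").fetchall()
print(dense, rownum)
```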


🔍 Interviewer’s Analysis

  • Concept Tested: Subqueries / Ranking Functions
  • Difficulty: Intermediate
  • Follow-Ups: Compare performance; show Nth-highest.

🪄 End of Q8.


❓ Q9: Retrieve products that have never been sold.

🧩 Problem Statement

Find products in the catalog that do not appear in the sales table.


💾 Step 1: Assume Example Tables

products

| product_id | product_name |
|---|---|
| 100 | Laptop |
| 101 | Phone |
| 102 | Tablet |
| 103 | Smartwatch |

sales

| sale_id | product_id | amount |
|---|---|---|
| 1 | 100 | 50000 |
| 2 | 101 | 20000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Anti-Join / NOT EXISTS to find unmatched rows.

🏗️ SQL Query

SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
  SELECT 1
  FROM sales s
  WHERE s.product_id = p.product_id
);

🔍 Explanation

The subquery checks if a product appears in sales. If no match exists, it’s unsold.


🧾 Step 3: Expected Output

| product_id | product_name |
|---|---|
| 102 | Tablet |
| 103 | Smartwatch |

⚙️ Step 4: Why This Query Works

NOT EXISTS efficiently excludes all products present in sales.


⚡ Step 5: Common Mistakes

  • Using NOT IN without NULL handling (fails if sales.product_id has NULL).
  • Using INNER JOIN instead of anti-join.

🧠 Step 6: Alternate Approach

SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
WHERE s.product_id IS NULL;

🎯 Interview Tip

Mention that NOT EXISTS is typically faster than LEFT JOIN for large datasets due to early exits.


🔍 Interviewer’s Analysis

  • Concept Tested: Anti-Join / Exclusion Logic
  • Difficulty: Intermediate
  • Follow-Ups: Performance difference between NOT EXISTS vs LEFT JOIN.

🪄 End of Q9.


❓ Q10: Display customers who never placed an order.

🧩 Problem Statement

List all customers who do not appear in the orders table.


💾 Step 1: Assume Example Tables

customers

| customer_id | name |
|---|---|
| 1 | Raj |
| 2 | Aisha |
| 3 | Sunil |
| 4 | Priya |

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 2000 |
| 3 | 1 | 1500 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Anti-Join / NOT EXISTS again, this time applied to customers and orders.

🏗️ SQL Query

SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

🔍 Explanation

Each customer checked against the orders table. Only those with no orders are returned.


🧾 Step 3: Expected Output

| customer_id | name |
|---|---|
| 3 | Sunil |
| 4 | Priya |

⚙️ Step 4: Why This Query Works

The correlated subquery ensures exclusion only when there are no matches.


⚡ Step 5: Common Mistakes

  • Using NOT IN directly → issues with NULL values.
  • Forgetting correlation (o.customer_id = c.customer_id).

🧠 Step 6: Alternate Approach

SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

🎯 Interview Tip

In large-scale data, NOT EXISTS tends to outperform joins when proper indexes exist.


🔍 Interviewer’s Analysis

  • Concept Tested: Correlated Subquery / Anti-Join
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Add condition for “no order in last 6 months”.

🪄 End of Q10.


❓ Q11: Identify departments that don’t have any employees hired after 2020.

🧩 Problem Statement

Find all departments where no employee joined after the year 2020.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | hire_date |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 2019-05-01 |
| 2 | Ananya Roy | 10 | 2021-03-01 |
| 3 | Bhavna Rai | 20 | 2018-06-10 |
| 4 | Chetan Joshi | 30 | 2017-01-15 |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Tech |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Use NOT EXISTS to exclude departments with any recent hire.

🏗️ SQL Query

SELECT d.dept_id, d.dept_name
FROM departments d
WHERE NOT EXISTS (
  SELECT 1
  FROM employees e
  WHERE e.dept_id = d.dept_id
    AND e.hire_date > '2020-12-31'
);

🔍 Explanation

  • The subquery searches for employees hired after 2020 in that department.
  • If none exist → department qualifies.

🧾 Step 3: Expected Output

| dept_id | dept_name |
|---|---|
| 20 | HR |
| 30 | Tech |

Sales excluded (has Ananya hired in 2021).


⚙️ Step 4: Why This Query Works

NOT EXISTS removes departments with any post-2020 hire.


⚡ Step 5: Common Mistakes

  • Using HAVING incorrectly (not applicable).
  • Mixing up WHERE hire_date < vs > logic.

🧠 Step 6: Alternate Approach

SELECT d.dept_id, d.dept_name
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id AND e.hire_date > '2020-12-31'
WHERE e.emp_id IS NULL;
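In this alternate form, the placement of the date condition matters: it must live in the ON clause, not in WHERE. A sketch with SQLite showing both placements on the sample data:

```python
import sqlite3

# Same anti-join, two placements of the date filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT, hire_date TEXT);
CREATE TABLE departments (dept_id INT, dept_name TEXT);
INSERT INTO employees VALUES
  (1,'Amit',10,'2019-05-01'),(2,'Ananya',10,'2021-03-01'),
  (3,'Bhavna',20,'2018-06-10'),(4,'Chetan',30,'2017-01-15');
INSERT INTO departments VALUES (10,'Sales'),(20,'HR'),(30,'Tech');
""")
# Filter in ON: unmatched departments survive with NULLs -> correct anti-join.
on_clause = conn.execute("""
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id AND e.hire_date > '2020-12-31'
WHERE e.emp_id IS NULL
ORDER BY d.dept_id
""").fetchall()
# Filter moved to WHERE: it can never be true on the NULL-extended rows,
# so the query returns nothing.
in_where = conn.execute("""
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.hire_date > '2020-12-31' AND e.emp_id IS NULL
""").fetchall()
print(on_clause, in_where)
```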

🎯 Interview Tip

State clearly why you used NOT EXISTS: it ensures logical clarity and performance.


🔍 Interviewer’s Analysis

  • Concept Tested: Correlated Subquery / Temporal Filtering
  • Difficulty: Intermediate
  • Follow-Ups: How to adapt for “no new hires in past N months.”

🪄 End of Q11.


❓ Q12: Retrieve customers whose total purchases exceed the overall customer average.

🧩 Problem Statement

Find customers whose total spend (sum of order amounts) is greater than the average of all customers’ total spend.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 2000 |
| 3 | 2 | 500 |
| 4 | 3 | 4000 |
| 5 | 3 | 1000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Two levels of aggregation:

  1. Inner query: total per customer.
  2. Outer query: compare each total to overall average.

🏗️ SQL Query

SELECT customer_id, total_amount
FROM (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id
) t
WHERE total_amount > (
  SELECT AVG(total_amount)
  FROM (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
  ) sub
);

🔍 Explanation

  • Inner aggregation: computes SUM(amount) per customer.
  • Outer query compares each to average of those totals.

🧾 Step 3: Expected Output

| customer_id | total_amount |
|---|---|
| 3 | 5000 |
| 1 | 3000 |

Average customer total = (3000 + 500 + 5000)/3 = 2833.33.


⚙️ Step 4: Why This Query Works

It’s a double subquery: one for per-customer totals, one for the global average.


⚡ Step 5: Common Mistakes

  • Comparing to AVG(amount) directly → wrong (compares per-order, not per-customer).
  • Forgetting aliasing in subqueries.

🧠 Step 6: Alternate Approach

WITH customer_totals AS (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM customer_totals
WHERE total_amount > (SELECT AVG(total_amount) FROM customer_totals);

CTE improves readability.
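The CTE version runs unchanged on SQLite; a quick check against the sample orders:

```python
import sqlite3

# Per-customer totals in a CTE, compared against the average of those totals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)",
                 [(1, 1, 1000), (2, 1, 2000), (3, 2, 500),
                  (4, 3, 4000), (5, 3, 1000)])
rows = conn.execute("""
WITH customer_totals AS (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM customer_totals
WHERE total_amount > (SELECT AVG(total_amount) FROM customer_totals)
ORDER BY customer_id
""").fetchall()
print(rows)  # average of the per-customer totals is 2833.33
```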


🎯 Interview Tip

Explain trade-off between nested subquery and CTE — same result, but CTE improves maintainability.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation within subqueries / CTEs
  • Difficulty: Intermediate
  • Follow-Ups: Optimize with temporary tables or materialized views.

🪄 End of Q12.


❓ Q13: Assign ranks to employees by salary within each department.

🧩 Problem Statement

Rank employees based on salary within their department, so that the highest earner in each department gets rank 1.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 75000 |
| 4 | Chetan Joshi | 20 | 72000 |
| 5 | Deepa Singh | 20 | 72000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Window functionRANK() or DENSE_RANK() partitioned by department.

🏗️ SQL Query

SELECT emp_id,
       name,
       dept_id,
       salary,
       DENSE_RANK() OVER (
         PARTITION BY dept_id
         ORDER BY salary DESC
       ) AS dept_rank
FROM employees;

🔍 Explanation

  • PARTITION BY dept_id = restart rank per department.
  • ORDER BY salary DESC = rank by salary descending.
  • DENSE_RANK() assigns same rank to ties (vs RANK() adds gaps).

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | dept_rank |
|---|---|---|---|---|
| 2 | Ananya Roy | 10 | 80000 | 1 |
| 1 | Amit Sharma | 10 | 60000 | 2 |
| 3 | Bhavna Rai | 20 | 75000 | 1 |
| 4 | Chetan Joshi | 20 | 72000 | 2 |
| 5 | Deepa Singh | 20 | 72000 | 2 |

⚙️ Step 4: Why This Query Works

Each department acts as its own mini-table; ranks are computed per partition.


⚡ Step 5: Common Mistakes

  • Forgetting PARTITION BY → global ranking instead of per dept.
  • Using ROW_NUMBER() when ties should share rank.

🧠 Step 6: Alternate Approach

-- Using ROW_NUMBER to get unique ordering
SELECT emp_id, name, dept_id, salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;

Ensures unique rank per employee.


🎯 Interview Tip

Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() — interviewers love this follow-up.
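A side-by-side sketch in SQLite, run on department 20's salaries (a hypothetical fourth employee, Esha at 70000, is added here to make the gap after the tie visible): RANK leaves a gap (1, 2, 2, 4), DENSE_RANK doesn't (1, 2, 2, 3), ROW_NUMBER breaks the tie arbitrarily (1, 2, 3, 4):

```python
import sqlite3

# RANK vs DENSE_RANK vs ROW_NUMBER on one department's salaries.
# Esha (70000) is a hypothetical extra row that exposes RANK's gap.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [("Bhavna", 75000), ("Chetan", 72000),
                  ("Deepa", 72000), ("Esha", 70000)])
rows = conn.execute("""
SELECT name,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees
ORDER BY salary DESC, name
""").fetchall()
for r in rows:
    print(r)
```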


🔍 Interviewer’s Analysis

  • Concept Tested: Ranking & Partitioning
  • Difficulty: Intermediate
  • Follow-Ups: Explain how to show only top-3 per dept using a subquery filter.

🪄 End of Q13.


❓ Q14: Calculate cumulative (running) sales per employee.

🧩 Problem Statement

Show each employee’s cumulative sales over time, ordered by transaction date.


💾 Step 1: Assume Example Tables

sales

| sale_id | emp_id | sale_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-01-01 | 1000 |
| 2 | 1 | 2024-02-01 | 2000 |
| 3 | 2 | 2024-01-15 | 1500 |
| 4 | 2 | 2024-03-01 | 2500 |
| 5 | 1 | 2024-03-05 | 3000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Cumulative total via SUM() OVER (ORDER BY …) partitioned by employee.

🏗️ SQL Query

SELECT emp_id,
       sale_date,
       amount,
       SUM(amount) OVER (
         PARTITION BY emp_id
         ORDER BY sale_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_sales
FROM sales;

🔍 Explanation

  • SUM() OVER() calculates total up to current row per employee.
  • The ROWS clause ensures true running sum.

🧾 Step 3: Expected Output

| emp_id | sale_date | amount | cumulative_sales |
|---|---|---|---|
| 1 | 2024-01-01 | 1000 | 1000 |
| 1 | 2024-02-01 | 2000 | 3000 |
| 1 | 2024-03-05 | 3000 | 6000 |
| 2 | 2024-01-15 | 1500 | 1500 |
| 2 | 2024-03-01 | 2500 | 4000 |

⚙️ Step 4: Why This Query Works

Windowed SUM lets totals accumulate row by row without collapsing groups.


⚡ Step 5: Common Mistakes

  • Forgetting ORDER BY → full sum repeated on each row.
  • Using aggregate SUM() instead of windowed version.

🧠 Step 6: Alternate Approach

When ORDER BY is present, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which produces the same running sum as the explicit ROWS clause unless two rows share the same sale_date (RANGE treats ties as one group).
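The running total can be verified on the sample sales with SQLite (3.25+ for window functions):

```python
import sqlite3

# Windowed SUM accumulates per employee, ordered by sale_date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (emp_id INT, sale_date TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?,?,?)", [
    (1, "2024-01-01", 1000), (1, "2024-02-01", 2000), (2, "2024-01-15", 1500),
    (2, "2024-03-01", 2500), (1, "2024-03-05", 3000)])
rows = conn.execute("""
SELECT emp_id, sale_date, amount,
       SUM(amount) OVER (
         PARTITION BY emp_id ORDER BY sale_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_sales
FROM sales ORDER BY emp_id, sale_date
""").fetchall()
for r in rows:
    print(r)
```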


🎯 Interview Tip

Use this to explain running totals, moving averages, or trend tracking — all derived from similar window logic.


🔍 Interviewer’s Analysis

  • Concept Tested: Window Aggregation
  • Difficulty: Intermediate
  • Follow-Ups: Compute for company-wide cumulative totals (no partition).

🪄 End of Q14.


❓ Q15: Find the difference between each month’s and previous month’s sales.

🧩 Problem Statement

Show monthly sales and how much they increased/decreased compared to the previous month.


💾 Step 1: Assume Example Tables

sales_summary

| month | total_sales |
|---|---|
| Jan | 50000 |
| Feb | 60000 |
| Mar | 55000 |
| Apr | 70000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: LAG() window function — look at previous row’s value.

🏗️ SQL Query

SELECT month,
       total_sales,
       LAG(total_sales) OVER (ORDER BY month) AS prev_sales,
       total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_from_prev
FROM sales_summary;

🔍 Explanation

  • LAG() fetches previous month’s sales.
  • Subtract to find difference.

🧾 Step 3: Expected Output

| month | total_sales | prev_sales | change_from_prev |
|---|---|---|---|
| Jan | 50000 | NULL | NULL |
| Feb | 60000 | 50000 | 10000 |
| Mar | 55000 | 60000 | -5000 |
| Apr | 70000 | 55000 | 15000 |

⚙️ Step 4: Why This Query Works

Window functions maintain chronological context; LAG() is ideal for sequential comparisons.


⚡ Step 5: Common Mistakes

  • Ordering alphabetically instead of chronologically (month should be date-type).
  • Using JOIN for month-over-month — unnecessary and inefficient.
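A sketch with SQLite, storing months as sortable 'YYYY-MM' strings to sidestep the alphabetical-ordering trap just mentioned:

```python
import sqlite3

# LAG-based month-over-month delta; 'YYYY-MM' strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_summary (month TEXT, total_sales INT)")
conn.executemany("INSERT INTO sales_summary VALUES (?,?)",
                 [("2024-01", 50000), ("2024-02", 60000),
                  ("2024-03", 55000), ("2024-04", 70000)])
rows = conn.execute("""
SELECT month, total_sales,
       total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_from_prev
FROM sales_summary ORDER BY month
""").fetchall()
print(rows)  # the first month has no predecessor, so its delta is NULL/None
```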

🧠 Step 6: Alternate Approach

-- Using LEAD() to show next month’s change instead
SELECT month,
       total_sales,
       LEAD(total_sales) OVER (ORDER BY month) - total_sales AS change_to_next
FROM sales_summary;

🎯 Interview Tip

Talk about using LAG() for retention, revenue growth, or churn — all based on similar deltas.


🔍 Interviewer’s Analysis

  • Concept Tested: LAG() / Time-series comparison
  • Difficulty: Intermediate
  • Follow-Ups: Handle missing months or irregular time gaps.

🪄 End of Q15.


❓ Q16: Show employee salaries and the salary gap vs department average.

🧩 Problem Statement

For each employee, show how far their salary differs from their department’s average salary.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 75000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use window AVG() to compute departmental mean per row, then subtract.

🏗️ SQL Query

SELECT emp_id,
       name,
       dept_id,
       salary,
       ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg,
       salary - AVG(salary) OVER (PARTITION BY dept_id) AS salary_gap
FROM employees;

🔍 Explanation

  • AVG(salary) OVER (PARTITION BY dept_id) gives department average.
  • Subtract it from each employee’s salary.

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | dept_avg | salary_gap |
|---|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 70000.00 | -10000 |
| 2 | Ananya Roy | 10 | 80000 | 70000.00 | 10000 |
| 3 | Bhavna Rai | 20 | 90000 | 82500.00 | 7500 |
| 4 | Chetan Joshi | 20 | 75000 | 82500.00 | -7500 |

⚙️ Step 4: Why This Query Works

Window functions allow comparing each row to its group’s aggregate without collapsing rows.


⚡ Step 5: Common Mistakes

  • Using GROUP BY dept_id instead of window → collapses rows.
  • Forgetting PARTITION BY → average over all departments.

🎯 Interview Tip

Show that you can mix aggregates and raw rows — that’s what makes windows powerful.


🔍 Interviewer’s Analysis

  • Concept Tested: Window Aggregation / Comparative Analytics
  • Difficulty: Intermediate
  • Follow-Ups: Add flag column for “above/below average”.

🪄 End of Q16.


❓ Q17: Find the Nth highest salary in each department.

🧩 Problem Statement

Retrieve the Nth (e.g., 2nd or 3rd) highest salary per department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 10 | 70000 |
| 4 | Chetan Joshi | 20 | 95000 |
| 5 | Deepa Singh | 20 | 90000 |
| 6 | Esha Patel | 20 | 88000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use DENSE_RANK() to assign salary order within department, then filter.

🏗️ SQL Query

SELECT dept_id, name, salary
FROM (
  SELECT dept_id, name, salary,
         DENSE_RANK() OVER (
           PARTITION BY dept_id
           ORDER BY salary DESC
         ) AS rnk
  FROM employees
) t
WHERE rnk = 2;

🔍 Explanation

  • Assign rank by salary (highest = 1).
  • Filter for the target rank (Nth = 2 in this example).

🧾 Step 3: Expected Output (N = 2)

| dept_id | name | salary |
|---|---|---|
| 10 | Bhavna Rai | 70000 |
| 20 | Deepa Singh | 90000 |

⚙️ Step 4: Why This Query Works

DENSE_RANK() allows rank-based filtering while preserving ties.


⚡ Step 5: Common Mistakes

  • Using LIMIT → gives top-N overall, not per department.
  • Using ROW_NUMBER when ties should share rank.

🧠 Step 6: Alternate Approach

To find Nth highest salary company-wide:

SELECT salary
FROM employees
ORDER BY salary DESC
OFFSET (N - 1) ROWS FETCH NEXT 1 ROWS ONLY;

🎯 Interview Tip

Discuss tie-handling (DENSE_RANK vs ROW_NUMBER) and efficiency with large datasets (index on dept_id, salary).


🔍 Interviewer’s Analysis

  • Concept Tested: Window Ranking / Conditional Filtering
  • Difficulty: Intermediate
  • Follow-Ups: Modify for top-3 instead of exactly N.

🪄 End of Q17.


❓ Q18: Calculate the moving average of posts per user (7-day window).

🧩 Problem Statement

Compute each user’s 7-day moving average of posts, using their posting dates.


💾 Step 1: Assume Example Tables

posts

| post_id | user_id | post_date | post_count |
|---|---|---|---|
| 1 | 1 | 2024-01-01 | 2 |
| 2 | 1 | 2024-01-02 | 3 |
| 3 | 1 | 2024-01-05 | 4 |
| 4 | 1 | 2024-01-08 | 1 |
| 5 | 2 | 2024-01-01 | 5 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Moving average with date-based window — sliding frame of last 7 days.

🏗️ SQL Query

SELECT user_id,
       post_date,
       post_count,
       AVG(post_count) OVER (
         PARTITION BY user_id
         ORDER BY post_date
         RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
       ) AS moving_avg_7d
FROM posts;

🔍 Explanation

  • RANGE BETWEEN … dynamically covers 7 calendar days before each post.
  • Average recalculates over that rolling range.

🧾 Step 3: Expected Output (simplified)

| user_id | post_date | post_count | moving_avg_7d |
|---|---|---|---|
| 1 | 2024-01-01 | 2 | 2.00 |
| 1 | 2024-01-02 | 3 | 2.50 |
| 1 | 2024-01-05 | 4 | 3.00 |
| 1 | 2024-01-08 | 1 | 2.50 |
| 2 | 2024-01-01 | 5 | 5.00 |

⚙️ Step 4: Why This Query Works

The sliding window dynamically shifts with each row, capturing exactly 7 days’ worth of history.


⚡ Step 5: Common Mistakes

  • Using ROWS instead of RANGE → counts 7 rows, not 7 days.
  • Forgetting to partition by user_id → merges all users.

🧠 Step 6: Alternate Approach

For databases lacking INTERVAL syntax:

SELECT user_id, post_date, post_count,
       AVG(post_count) OVER (
         PARTITION BY user_id
         ORDER BY post_date
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7rows
FROM posts;
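A portable middle ground for SQLite (3.28+ supports RANGE with numeric offsets) is to order the window by julianday(), which turns the calendar window into a numeric one, so you keep a true 7-day window rather than a 7-row one:

```python
import sqlite3

# 7-day moving average: ORDER BY julianday(post_date) makes the date
# window numeric, so RANGE BETWEEN 7 PRECEDING covers 7 calendar days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INT, user_id INT, post_date TEXT, post_count INT)")
conn.executemany("INSERT INTO posts VALUES (?,?,?,?)", [
    (1, 1, "2024-01-01", 2), (2, 1, "2024-01-02", 3), (3, 1, "2024-01-05", 4),
    (4, 1, "2024-01-08", 1), (5, 2, "2024-01-01", 5)])
rows = conn.execute("""
SELECT user_id, post_date, post_count,
       AVG(post_count) OVER (
         PARTITION BY user_id
         ORDER BY julianday(post_date)
         RANGE BETWEEN 7 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7d
FROM posts
ORDER BY user_id, post_date
""").fetchall()
for r in rows:
    print(r)
```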

🎯 Interview Tip

Explain that this query mimics “rolling averages” used in dashboards, retention, or trend lines.


🔍 Interviewer’s Analysis

  • Concept Tested: RANGE-based Windowing
  • Difficulty: Advanced
  • Follow-Ups: Compare ROWS vs RANGE; extend to 30-day moving avg.

🪄 End of Q18.


❓ Q19: Extract first and last names from a full_name column.

🧩 Problem Statement

Given an employee table with full_name, split it into two separate columns: first_name and last_name.


💾 Step 1: Assume Example Tables

employees

| emp_id | full_name |
|---|---|
| 1 | Amit Sharma |
| 2 | Ananya Roy |
| 3 | Bhavna Rai |
| 4 | Chetan Joshi |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: String splitting using functions like SUBSTRING_INDEX, SPLIT_PART, or CHARINDEX depending on SQL dialect.

🏗️ SQL Query (MySQL)

SELECT emp_id,
       SUBSTRING_INDEX(full_name, ' ', 1) AS first_name,
       SUBSTRING_INDEX(full_name, ' ', -1) AS last_name
FROM employees;

🏗️ SQL Query (PostgreSQL)

SELECT emp_id,
       SPLIT_PART(full_name, ' ', 1) AS first_name,
       SPLIT_PART(full_name, ' ', 2) AS last_name
FROM employees;

🔍 Explanation

  • Split by the first space ' '.
  • The first token = first name; the last = last name.

🧾 Step 3: Expected Output

| emp_id | first_name | last_name |
|---|---|---|
| 1 | Amit | Sharma |
| 2 | Ananya | Roy |
| 3 | Bhavna | Rai |
| 4 | Chetan | Joshi |

⚙️ Step 4: Why This Query Works

String functions parse the full name at the space character and extract text before and after.


⚡ Step 5: Common Mistakes

  • Names with middle names or extra spaces cause wrong splits.
  • Forgetting to trim extra whitespace.

🧠 Step 6: Alternate Approach

SELECT emp_id,
       LEFT(full_name, CHARINDEX(' ', full_name)-1) AS first_name,
       RIGHT(full_name, LEN(full_name) - CHARINDEX(' ', full_name)) AS last_name
FROM employees;

(For SQL Server)
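For engines with neither SUBSTRING_INDEX nor SPLIT_PART (e.g. SQLite), the same split can be built from INSTR and SUBSTR; a runnable sketch:

```python
import sqlite3

# Portable first/last split: INSTR finds the first space, SUBSTR slices
# around it (same idea as the CHARINDEX version above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, full_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [(1, "Amit Sharma"), (2, "Ananya Roy")])
rows = conn.execute("""
SELECT emp_id,
       SUBSTR(full_name, 1, INSTR(full_name, ' ') - 1) AS first_name,
       SUBSTR(full_name, INSTR(full_name, ' ') + 1)    AS last_name
FROM employees ORDER BY emp_id
""").fetchall()
print(rows)
```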


🎯 Interview Tip

Mention how to handle multi-word names using regex or splitting by position of first/last space.


🔍 Interviewer’s Analysis

  • Concept Tested: String parsing
  • Difficulty: Beginner
  • Follow-Ups: Handle multiple spaces; trim names with LTRIM() / RTRIM().

🪄 End of Q19.


❓ Q20: Find customers whose email domain is “gmail.com”.

🧩 Problem Statement

Identify all customers whose email addresses belong to the Gmail domain.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@gmail.com |
| 2 | Aisha | aisha@yahoo.com |
| 3 | Sunil | sunil@gmail.com |
| 4 | Priya | priya@outlook.com |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: String pattern matching using LIKE or SUBSTRING_INDEX().

🏗️ SQL Query

SELECT customer_id, name, email
FROM customers
WHERE email LIKE '%@gmail.com';

🔍 Explanation

  • % matches any preceding text.
  • Ensures email ends with “@gmail.com”.

🧾 Step 3: Expected Output

| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@gmail.com |
| 3 | Sunil | sunil@gmail.com |

⚙️ Step 4: Why This Query Works

LIKE '%@gmail.com' filters text ending with that domain.


⚡ Step 5: Common Mistakes

  • Forgetting % before domain → matches only exact @gmail.com.
  • Ignoring case sensitivity (LIKE is case-insensitive under MySQL’s default collation but case-sensitive in PostgreSQL; use ILIKE or LOWER() there).

🧠 Step 6: Alternate Approach

-- Extract domain and filter
SELECT customer_id, name, email
FROM customers
WHERE SUBSTRING_INDEX(email, '@', -1) = 'gmail.com';

🎯 Interview Tip

Mention how to normalize case (LOWER(email)) before comparing domains.


🔍 Interviewer’s Analysis

  • Concept Tested: String pattern search
  • Difficulty: Beginner
  • Follow-Ups: Extract username vs domain; group customers by domain.

🪄 End of Q20.


❓ Q21: Retrieve orders made in the last 7 days.

🧩 Problem Statement

Find all orders whose order_date falls within the last 7 days from today.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-02-20 | 5000 |
| 2 | 2 | 2024-02-25 | 3000 |
| 3 | 3 | 2024-02-28 | 7000 |
| 4 | 4 | 2024-03-01 | 4000 |

(Assume today = 2024-03-01)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Date arithmetic using CURRENT_DATE or NOW().

🏗️ SQL Query (MySQL/PostgreSQL)

SELECT order_id, customer_id, order_date, amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '7' DAY;

🔍 Explanation

Subtracts 7 days from today’s date and filters all orders on or after that date.


🧾 Step 3: Expected Output

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 2 | 2 | 2024-02-25 | 3000 |
| 3 | 3 | 2024-02-28 | 7000 |
| 4 | 4 | 2024-03-01 | 4000 |

⚙️ Step 4: Why This Query Works

Date functions support subtraction intervals, allowing dynamic filtering without hardcoding.


⚡ Step 5: Common Mistakes

  • Using NOW() without DATE() → includes time precision errors.
  • Forgetting to include the current date boundary.

🧠 Step 6: Alternate Approach

-- SQL Server
SELECT *
FROM orders
WHERE order_date >= DATEADD(DAY, -7, GETDATE());
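SQLite spells the same arithmetic with date modifiers; in this sketch "today" is pinned to 2024-03-01 (the article's assumption) so the result is reproducible:

```python
import sqlite3

# date(<anchor>, '-7 day') computes the window start (2024-02-23 here);
# ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, order_date TEXT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)", [
    (1, "2024-02-20", 5000), (2, "2024-02-25", 3000),
    (3, "2024-02-28", 7000), (4, "2024-03-01", 4000)])
rows = conn.execute("""
SELECT order_id FROM orders
WHERE order_date >= date('2024-03-01', '-7 day')
ORDER BY order_id
""").fetchall()
print(rows)  # orders from 2024-02-23 onward
```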

🎯 Interview Tip

Always confirm timezone alignment — especially in analytics pipelines.


🔍 Interviewer’s Analysis

  • Concept Tested: Date filtering / relative intervals
  • Difficulty: Beginner
  • Follow-Ups: Fetch last month’s orders; group by day.

🪄 End of Q21.


❓ Q22: Calculate the age of each employee.

🧩 Problem Statement

Given birth_date, calculate the employee’s age in years as of today.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | birth_date |
|---|---|---|
| 1 | Amit Sharma | 1990-05-10 |
| 2 | Ananya Roy | 1995-08-22 |
| 3 | Bhavna Rai | 1987-03-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Date difference functions (DATEDIFF, AGE, TIMESTAMPDIFF).

🏗️ SQL Query (MySQL)

SELECT emp_id,
       name,
       TIMESTAMPDIFF(YEAR, birth_date, CURDATE()) AS age
FROM employees;

🏗️ SQL Query (PostgreSQL)

SELECT emp_id,
       name,
       EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date)) AS age
FROM employees;

🔍 Explanation

Subtracts birth_date from today’s date to find total years elapsed.


🧾 Step 3: Expected Output

| emp_id | name | age |
|---|---|---|
| 1 | Amit Sharma | 34 |
| 2 | Ananya Roy | 29 |
| 3 | Bhavna Rai | 37 |

⚙️ Step 4: Why This Query Works

SQL’s date functions automatically handle leap years and partial months accurately.


⚡ Step 5: Common Mistakes

  • Using YEAR(CURDATE()) - YEAR(birth_date) → gives wrong result before birthday.
  • Storing dates as text (use proper DATE type).
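The year-subtraction pitfall is worth seeing concretely. A pure-Python sketch (with "today" pinned to the hypothetical date 2024-06-01) comparing the naive year difference with a correct completed-years calculation:

```python
from datetime import date

def age(birth: date, today: date) -> int:
    # Completed years: subtract 1 if this year's birthday hasn't happened yet.
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))

today = date(2024, 6, 1)   # pinned so the output is reproducible
naive = today.year - 1995  # plain year subtraction for Ananya (born 1995-08-22)
print(age(date(1990, 5, 10), today))   # 34 (birthday already passed)
print(age(date(1995, 8, 22), today))   # 28 (birthday not yet reached)
print(naive)                           # 29 -- overstates the age before the birthday
```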

🧠 Step 6: Alternate Approach

SELECT emp_id, name,
       FLOOR(DATEDIFF(CURDATE(), birth_date) / 365.25) AS age
FROM employees;

🎯 Interview Tip

Mention that for precise age (down to days), use interval math rather than dividing by 365.


🔍 Interviewer’s Analysis

  • Concept Tested: Date functions / age calculation
  • Difficulty: Beginner
  • Follow-Ups: Compute employee tenure; find oldest/youngest employee.

🪄 End of Q22.


❓ Q23: Count how many users registered each month.

🧩 Problem Statement

Group users by registration month and count how many joined in each month.


💾 Step 1: Assume Example Tables

users

| user_id | name | registration_date |
|---|---|---|
| 1 | Raj | 2024-01-15 |
| 2 | Aisha | 2024-01-20 |
| 3 | Sunil | 2024-02-05 |
| 4 | Priya | 2024-02-25 |
| 5 | Deepa | 2024-03-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Date truncation using MONTH() or DATE_TRUNC() for grouping.

🏗️ SQL Query (MySQL)

SELECT DATE_FORMAT(registration_date, '%Y-%m') AS month,
       COUNT(*) AS user_count
FROM users
GROUP BY DATE_FORMAT(registration_date, '%Y-%m')
ORDER BY month;

🏗️ SQL Query (PostgreSQL)

SELECT TO_CHAR(DATE_TRUNC('month', registration_date), 'YYYY-MM') AS month,
       COUNT(*) AS user_count
FROM users
GROUP BY DATE_TRUNC('month', registration_date)
ORDER BY month;

🔍 Explanation

  • Truncate each registration date to month level.
  • Count users per month.

🧾 Step 3: Expected Output

| month | user_count |
|---|---|
| 2024-01 | 2 |
| 2024-02 | 2 |
| 2024-03 | 1 |

⚙️ Step 4: Why This Query Works

Truncating to month ensures all dates in the same month fall into the same group.


⚡ Step 5: Common Mistakes

  • Grouping by full date instead of month.
  • Forgetting to format or order chronologically (lexicographic sort misorders months).

🧠 Step 6: Alternate Approach

SELECT EXTRACT(YEAR FROM registration_date) AS year,
       EXTRACT(MONTH FROM registration_date) AS month,
       COUNT(*) AS user_count
FROM users
GROUP BY year, month
ORDER BY year, month;

Cleaner for year-over-year comparisons.
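As a sanity check, the same grouping can be run against an in-memory SQLite database (illustrative only; SQLite has no DATE_FORMAT, so strftime('%Y-%m', …) plays that role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INT, name TEXT, registration_date TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "Raj", "2024-01-15"), (2, "Aisha", "2024-01-20"),
    (3, "Sunil", "2024-02-05"), (4, "Priya", "2024-02-25"),
    (5, "Deepa", "2024-03-01"),
])
# Truncate each date to YYYY-MM, then count users per month.
rows = conn.execute("""
    SELECT strftime('%Y-%m', registration_date) AS month,
           COUNT(*) AS user_count
    FROM users
    GROUP BY month
    ORDER BY month
""").fetchall()
print(rows)  # [('2024-01', 2), ('2024-02', 2), ('2024-03', 1)]
```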


🎯 Interview Tip

Highlight that monthly grouping is common in KPIs like MAU (Monthly Active Users) and retention dashboards.


🔍 Interviewer’s Analysis

  • Concept Tested: Date grouping / aggregation
  • Difficulty: Intermediate
  • Follow-Ups: Extend to week/day granularity; cumulative monthly users.

🪄 End of Q23.


❓ Q24: Find users who made purchases every month of a year.

🧩 Problem Statement

Identify customers who made at least one purchase in every month of 2024.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-01-15 | 500 |
| 2 | 1 | 2024-02-10 | 300 |
| 3 | 1 | 2024-03-05 | 400 |
| 4 | 2 | 2024-01-25 | 600 |
| 5 | 2 | 2024-04-15 | 200 |
| 6 | 3 | 2024-01-20 | 800 |
| 7 | 3 | 2024-02-20 | 900 |
| 8 | 3 | 2024-03-10 | 1000 |
| 9 | 3 | 2024-04-01 | 700 |
| 10 | 3 | 2024-05-03 | 900 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We must check if a user appears in all 12 months for that year.

🏗️ SQL Query

SELECT customer_id
FROM (
  SELECT customer_id,
         COUNT(DISTINCT EXTRACT(MONTH FROM order_date)) AS active_months
  FROM orders
  WHERE EXTRACT(YEAR FROM order_date) = 2024
  GROUP BY customer_id
) t
WHERE active_months = 12;

🔍 Explanation

  • Inner query counts distinct months with orders per user.
  • Outer query keeps users with all 12 active months.

🧾 Step 3: Expected Output

| customer_id |
|---|

(Empty for this sample data: no customer ordered in all 12 months of 2024. A customer_id appears only if that customer was active every month.)

⚙️ Step 4: Why This Query Works

Distinct month counting ensures even one order per month counts as active.


⚡ Step 5: Common Mistakes

  • Using COUNT(*) instead of COUNT(DISTINCT MONTH) — overcounts duplicates.
  • Forgetting to filter by year.

🧠 Step 6: Alternate Approach

To check for specific months dynamically:

SELECT customer_id
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY customer_id
HAVING COUNT(DISTINCT TO_CHAR(order_date, 'YYYY-MM')) = 12;
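The distinct-month counting can be verified on SQLite with the sample data (strftime('%m', …) stands in for EXTRACT(MONTH …)). None of the sample customers reach 12 months, so the lead-in also shows the "at least N months" relaxation from the follow-ups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, "2024-01-15", 500), (2, 1, "2024-02-10", 300), (3, 1, "2024-03-05", 400),
    (4, 2, "2024-01-25", 600), (5, 2, "2024-04-15", 200),
    (6, 3, "2024-01-20", 800), (7, 3, "2024-02-20", 900), (8, 3, "2024-03-10", 1000),
    (9, 3, "2024-04-01", 700), (10, 3, "2024-05-03", 900),
])
# Strict "all 12 months" check — empty for the sample data.
active_all = conn.execute("""
    SELECT customer_id
    FROM orders
    WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
    GROUP BY customer_id
    HAVING COUNT(DISTINCT strftime('%m', order_date)) = 12
""").fetchall()
# Relaxed "active at least 4 months" — only customer 3 qualifies (5 months).
active_4 = conn.execute("""
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(DISTINCT strftime('%m', order_date)) >= 4
    ORDER BY customer_id
""").fetchall()
print(active_all, active_4)  # [] [(3,)]
```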

🎯 Interview Tip

Mention that this is useful in retention or engagement metrics (“active all months”).


🔍 Interviewer’s Analysis

  • Concept Tested: Conditional aggregation + DISTINCT counting
  • Difficulty: Intermediate
  • Follow-Ups: Adapt to “active at least 6 months.”

🪄 End of Q24.


❓ Q25: Identify customers’ first and last purchase dates.

🧩 Problem Statement

For each customer, show the earliest and latest order dates.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2023-01-10 | 500 |
| 2 | 1 | 2024-02-05 | 300 |
| 3 | 2 | 2023-07-01 | 200 |
| 4 | 2 | 2024-01-20 | 400 |
| 5 | 3 | 2023-03-15 | 900 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We use MIN() and MAX() grouped by customer_id.

🏗️ SQL Query

SELECT customer_id,
       MIN(order_date) AS first_purchase,
       MAX(order_date) AS last_purchase
FROM orders
GROUP BY customer_id;

🔍 Explanation

MIN finds earliest date; MAX finds latest date per customer.


🧾 Step 3: Expected Output

| customer_id | first_purchase | last_purchase |
|---|---|---|
| 1 | 2023-01-10 | 2024-02-05 |
| 2 | 2023-07-01 | 2024-01-20 |
| 3 | 2023-03-15 | 2023-03-15 |

⚙️ Step 4: Why This Query Works

Grouping aggregates across rows for each customer, returning their date range.


⚡ Step 5: Common Mistakes

  • Forgetting GROUP BY.
  • Using ORDER BY with LIMIT per customer instead of grouped MIN()/MAX() — clumsier and less efficient.

🧠 Step 6: Alternate Approach

A window-function version (swap order_date for amount inside FIRST_VALUE/LAST_VALUE to also fetch the first/last purchase amounts):

SELECT DISTINCT o.customer_id,
       FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_date,
       LAST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_date
FROM orders o;

🎯 Interview Tip

Always clarify if the interviewer wants “first ever” or “first in 2024”.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation / Date extremum
  • Difficulty: Beginner
  • Follow-Ups: Add gap between purchases.

🪄 End of Q25.


❓ Q26: Calculate repeat purchase rate (customers who ordered more than once).

🧩 Problem Statement

Find the percentage of customers who placed more than one order.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 1500 |
| 3 | 2 | 700 |
| 4 | 3 | 1200 |
| 5 | 3 | 800 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We need:

  1. Number of customers with more than one order.
  2. Total customers.
  3. Divide (1) by (2).

🏗️ SQL Query

WITH customer_orders AS (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
)
SELECT
  ROUND(
    (COUNT(CASE WHEN order_count > 1 THEN 1 END) * 100.0) /
    COUNT(*),
  2
  ) AS repeat_purchase_rate_percent
FROM customer_orders;

🔍 Explanation

  • CTE counts orders per customer.
  • Then computes ratio of repeaters to total customers.

🧾 Step 3: Expected Output

| repeat_purchase_rate_percent |
|---|
| 66.67 |

(2 of 3 customers ordered more than once.)


⚙️ Step 4: Why This Query Works

By summarizing at customer level first, we can derive aggregate proportions.


⚡ Step 5: Common Mistakes

  • Counting orders instead of customers.
  • Forgetting to multiply by 100 for percentage.

🧠 Step 6: Alternate Approach

SELECT
  (SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) * 1.0 / COUNT(*)) AS repeat_rate
FROM (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
) sub;
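Both variants can be checked against an in-memory SQLite database with the sample rows (an illustrative sketch, not production code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 1000), (2, 1, 1500), (3, 2, 700), (4, 3, 1200), (5, 3, 800)])
# Customers 1 and 3 ordered twice; customer 2 once -> 2/3 = 66.67%.
(rate,) = conn.execute("""
    WITH customer_orders AS (
      SELECT customer_id, COUNT(*) AS order_count
      FROM orders
      GROUP BY customer_id
    )
    SELECT ROUND(COUNT(CASE WHEN order_count > 1 THEN 1 END) * 100.0 / COUNT(*), 2)
    FROM customer_orders
""").fetchone()
print(rate)  # 66.67
```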

🎯 Interview Tip

This metric is key for customer retention analytics — worth mentioning contextually.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation, CTEs, Conditional Logic
  • Difficulty: Intermediate
  • Follow-Ups: Add time filter (“repeat within same quarter”).

🪄 End of Q26.


❓ Q27: Calculate year-over-year (YoY) growth.

🧩 Problem Statement

Compute the YoY growth percentage in sales between consecutive years.


💾 Step 1: Assume Example Tables

sales_summary

| year | total_sales |
|---|---|
| 2022 | 50000 |
| 2023 | 65000 |
| 2024 | 78000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Compare each year’s sales to the previous year using LAG().

🏗️ SQL Query

SELECT year,
       total_sales,
       LAG(total_sales) OVER (ORDER BY year) AS prev_year_sales,
       ROUND(
         ((total_sales - LAG(total_sales) OVER (ORDER BY year)) * 100.0) /
         LAG(total_sales) OVER (ORDER BY year),
         2
       ) AS yoy_growth_percent
FROM sales_summary;

🔍 Explanation

  • LAG() pulls previous year’s total.
  • Computes percentage change.

🧾 Step 3: Expected Output

| year | total_sales | prev_year_sales | yoy_growth_percent |
|---|---|---|---|
| 2022 | 50000 | NULL | NULL |
| 2023 | 65000 | 50000 | 30.00 |
| 2024 | 78000 | 65000 | 20.00 |

⚙️ Step 4: Why This Query Works

The window function allows comparison without self-join.


⚡ Step 5: Common Mistakes

  • Forgetting to handle first year (NULL previous).
  • Using integer division → 0 results.

🧠 Step 6: Alternate Approach

SELECT a.year, a.total_sales,
       ROUND(((a.total_sales - b.total_sales) * 100.0) / b.total_sales, 2) AS yoy_growth
FROM sales_summary a
JOIN sales_summary b
  ON a.year = b.year + 1;

Manual join alternative.
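The LAG() version runs unchanged on SQLite (window functions require SQLite 3.25+), which makes it easy to sanity-check the expected output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_summary (year INT, total_sales INT)")
conn.executemany("INSERT INTO sales_summary VALUES (?, ?)",
                 [(2022, 50000), (2023, 65000), (2024, 78000)])
# LAG() pulls the previous year's total; the first year has no predecessor.
rows = conn.execute("""
    SELECT year,
           ROUND((total_sales - LAG(total_sales) OVER (ORDER BY year)) * 100.0
                 / LAG(total_sales) OVER (ORDER BY year), 2) AS yoy_growth_percent
    FROM sales_summary
""").fetchall()
print(rows)  # [(2022, None), (2023, 30.0), (2024, 20.0)]
```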


🎯 Interview Tip

Explain how YoY helps detect seasonal patterns or growth anomalies.


🔍 Interviewer’s Analysis

  • Concept Tested: LAG() / Ratio calculation
  • Difficulty: Intermediate
  • Follow-Ups: Extend to quarter-over-quarter (QoQ) growth.

🪄 End of Q27.


❓ Q28: Identify categories where sales decreased for 3 consecutive months.

🧩 Problem Statement

Detect product categories showing continuous decline in sales across 3 months.


💾 Step 1: Assume Example Tables

sales_by_category

| category | month | total_sales |
|---|---|---|
| Electronics | Jan | 10000 |
| Electronics | Feb | 9000 |
| Electronics | Mar | 8500 |
| Furniture | Jan | 5000 |
| Furniture | Feb | 5500 |
| Furniture | Mar | 6000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Compare each month with the previous two months using LAG() or a pattern check.

🏗️ SQL Query

SELECT DISTINCT category
FROM (
  SELECT category,
         month,
         total_sales,
         LAG(total_sales, 1) OVER (PARTITION BY category ORDER BY month) AS prev_month,
         LAG(total_sales, 2) OVER (PARTITION BY category ORDER BY month) AS prev_2_months
  FROM sales_by_category
) t
WHERE total_sales < prev_month
  AND prev_month < prev_2_months;

🔍 Explanation

  • For each category, pull sales for 2 prior months.
  • Filter where 3 consecutive months show decreasing totals.

🧾 Step 3: Expected Output

category
Electronics

⚙️ Step 4: Why This Query Works

Window shifts create lagged comparisons; chained inequalities detect downward trends.


⚡ Step 5: Common Mistakes

  • Comparing in the wrong direction (prev < current instead of current < prev).
  • Forgetting to partition by category.
  • Storing month as text ('Jan', 'Feb', …): ORDER BY month then sorts alphabetically, not chronologically, so the LAG() windows compare the wrong months. Use a DATE or numeric month column.

🧠 Step 6: Alternate Approach

WITH ranked AS (
  SELECT category,
         month,
         total_sales,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY month) AS rn
  FROM sales_by_category
)
SELECT DISTINCT s1.category
FROM ranked s1
JOIN ranked s2 ON s1.category = s2.category AND s1.rn = s2.rn - 1
JOIN ranked s3 ON s1.category = s3.category AND s1.rn = s3.rn - 2
WHERE s1.total_sales > s2.total_sales AND s2.total_sales > s3.total_sales;

🎯 Interview Tip

Mention that you can generalize this for any “n-month declining” trend using dynamic window logic.


🔍 Interviewer’s Analysis

  • Concept Tested: Trend detection / time-series windows
  • Difficulty: Advanced
  • Follow-Ups: Adapt to “increasing trend” or add threshold (% drop).

🪄 End of Q28.


SQL Developer

❓ Q1: Write a query using a CROSS JOIN to generate product–region combinations.

🧩 Problem Statement

Create all possible combinations between products and regions, regardless of whether sales exist. This is useful for generating sales forecast grids or target matrices.


💾 Step 1: Assume Example Tables

products

| product_id | product_name |
|---|---|
| 1 | Laptop |
| 2 | Phone |
| 3 | Tablet |

regions

| region_id | region_name |
|---|---|
| 101 | North |
| 102 | South |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

A CROSS JOIN creates a Cartesian product — every row from one table is paired with every row from the other.

🏗️ SQL Query

SELECT p.product_name, r.region_name
FROM products p
CROSS JOIN regions r;

🔍 Explanation

  • No join condition — every product is paired with every region.
  • Total combinations = product count × region count.

🧾 Step 3: Expected Output

| product_name | region_name |
|---|---|
| Laptop | North |
| Laptop | South |
| Phone | North |
| Phone | South |
| Tablet | North |
| Tablet | South |

⚙️ Step 4: Why This Query Works

CROSS JOIN doesn’t rely on keys — it systematically pairs all records.


⚡ Step 5: Common Mistakes

  • Forgetting that CROSS JOIN can explode data volume.
  • Confusing it with INNER JOIN (which requires matching condition).

🧠 Step 6: Alternate Approach

-- Equivalent syntax
SELECT p.product_name, r.region_name
FROM products p, regions r;

(Comma syntax is older and less explicit — prefer CROSS JOIN for clarity.)
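A quick SQLite sketch confirms the combination count (3 products × 2 regions = 6 rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INT, product_name TEXT)")
conn.execute("CREATE TABLE regions (region_id INT, region_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "Laptop"), (2, "Phone"), (3, "Tablet")])
conn.executemany("INSERT INTO regions VALUES (?, ?)",
                 [(101, "North"), (102, "South")])
# CROSS JOIN has no ON clause: every product is paired with every region.
combos = conn.execute("""
    SELECT p.product_name, r.region_name
    FROM products p
    CROSS JOIN regions r
    ORDER BY p.product_id, r.region_id
""").fetchall()
print(len(combos))  # 6
```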


🎯 Interview Tip

Explain when you’d use CROSS JOIN — e.g., generating all possible region-product slots before filling sales data.


🔍 Interviewer’s Analysis

  • Concept Tested: Cartesian Product Joins
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Add sales totals using LEFT JOIN after generating combinations.

🪄 End of Q1.


❓ Q2: Find students who enrolled in all available courses.

🧩 Problem Statement

Identify students who have registered for every course offered.


💾 Step 1: Assume Example Tables

students

| student_id | name |
|---|---|
| 1 | Aisha |
| 2 | Raj |
| 3 | Meena |

courses

| course_id | course_name |
|---|---|
| 101 | Math |
| 102 | Science |
| 103 | History |

enrollments

| student_id | course_id |
|---|---|
| 1 | 101 |
| 1 | 102 |
| 1 | 103 |
| 2 | 101 |
| 3 | 101 |
| 3 | 103 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We need students whose count of enrolled distinct courses = total number of courses.

🏗️ SQL Query

SELECT e.student_id
FROM enrollments e
GROUP BY e.student_id
HAVING COUNT(DISTINCT e.course_id) = (
  SELECT COUNT(*) FROM courses
);

🔍 Explanation

  • Inner query → total course count.
  • Outer query → counts how many each student has taken.
  • If equal → enrolled in all.

🧾 Step 3: Expected Output

| student_id |
|---|
| 1 |

Aisha enrolled in all 3 courses.


⚙️ Step 4: Why This Query Works

It compares per-student distinct course enrollments with global total courses.


⚡ Step 5: Common Mistakes

  • Forgetting DISTINCT (double-counts re-enrollments).
  • Comparing with wrong total (e.g., active courses only).

🧠 Step 6: Alternate Approach

SELECT s.student_id, s.name
FROM students s
WHERE NOT EXISTS (
  SELECT 1 FROM courses c
  WHERE NOT EXISTS (
    SELECT 1 FROM enrollments e
    WHERE e.student_id = s.student_id AND e.course_id = c.course_id
  )
);

Uses double NOT EXISTS logic — elegant but advanced.
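The double NOT EXISTS version reads as "there is no course this student is not enrolled in" and can be run directly on SQLite with the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INT, name TEXT);
    CREATE TABLE courses (course_id INT, course_name TEXT);
    CREATE TABLE enrollments (student_id INT, course_id INT);
    INSERT INTO students VALUES (1,'Aisha'),(2,'Raj'),(3,'Meena');
    INSERT INTO courses VALUES (101,'Math'),(102,'Science'),(103,'History');
    INSERT INTO enrollments VALUES (1,101),(1,102),(1,103),(2,101),(3,101),(3,103);
""")
# Relational division: keep students with no missing course.
rows = conn.execute("""
    SELECT s.student_id, s.name
    FROM students s
    WHERE NOT EXISTS (
      SELECT 1 FROM courses c
      WHERE NOT EXISTS (
        SELECT 1 FROM enrollments e
        WHERE e.student_id = s.student_id AND e.course_id = c.course_id
      )
    )
""").fetchall()
print(rows)  # [(1, 'Aisha')]
```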


🎯 Interview Tip

Mention that this is a classic relational division problem (subset logic).


🔍 Interviewer’s Analysis

  • Concept Tested: Grouping / Subquery comparison
  • Difficulty: Intermediate
  • Follow-Ups: Find students enrolled in at least 2 courses.

🪄 End of Q2.


❓ Q3: List managers who oversee more than 3 distinct departments.

🧩 Problem Statement

Return all managers responsible for more than three unique departments.


💾 Step 1: Assume Example Tables

departments

| dept_id | dept_name | manager_id |
|---|---|---|
| 10 | Sales | 1001 |
| 20 | HR | 1002 |
| 30 | Tech | 1001 |
| 40 | Marketing | 1001 |
| 50 | Finance | 1003 |
| 60 | Support | 1001 |

managers

| manager_id | name |
|---|---|
| 1001 | Amit Kumar |
| 1002 | Neha Sharma |
| 1003 | David Roy |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We count departments per manager and filter where count > 3.

🏗️ SQL Query

SELECT m.manager_id, m.name, COUNT(DISTINCT d.dept_id) AS dept_count
FROM managers m
JOIN departments d ON m.manager_id = d.manager_id
GROUP BY m.manager_id, m.name
HAVING COUNT(DISTINCT d.dept_id) > 3;

🔍 Explanation

  • JOIN ensures managers linked to their departments.
  • COUNT(DISTINCT dept_id) avoids duplicates.
  • HAVING filters based on aggregate.

🧾 Step 3: Expected Output

| manager_id | name | dept_count |
|---|---|---|
| 1001 | Amit Kumar | 4 |

⚙️ Step 4: Why This Query Works

Group aggregation summarizes data by manager; HAVING filters after grouping.


⚡ Step 5: Common Mistakes

  • Using WHERE instead of HAVING.
  • Forgetting DISTINCT for multiple department duplicates.

🧠 Step 6: Alternate Approach

To check managers who oversee exactly 3 departments:

HAVING COUNT(DISTINCT d.dept_id) = 3;

🎯 Interview Tip

Always clarify whether to count distinct or total departments — crucial difference in real-world data.


🔍 Interviewer’s Analysis

  • Concept Tested: Grouped aggregation + HAVING
  • Difficulty: Intermediate
  • Follow-Ups: Include average salary per manager’s team.

🪄 End of Q3.


❓ Q4: Identify employees who earn less than their managers.

🧩 Problem Statement

Show employees whose salary is lower than their manager’s.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | manager_id | salary |
|---|---|---|---|
| 1 | Raj Patel | 4 | 60000 |
| 2 | Aisha Khan | 4 | 75000 |
| 3 | Meena Joshi | 5 | 80000 |
| 4 | David Kumar | NULL | 95000 |
| 5 | Neha Sharma | NULL | 90000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Self-join comparing employee’s salary to manager’s salary.

🏗️ SQL Query

SELECT e.name AS employee_name,
       e.salary AS employee_salary,
       m.name AS manager_name,
       m.salary AS manager_salary
FROM employees e
JOIN employees m
  ON e.manager_id = m.emp_id
WHERE e.salary < m.salary;

🔍 Explanation

  • Join employees table to itself — employees (e) vs managers (m).
  • Compare salaries across the relationship.

🧾 Step 3: Expected Output

| employee_name | employee_salary | manager_name | manager_salary |
|---|---|---|---|
| Raj Patel | 60000 | David Kumar | 95000 |
| Aisha Khan | 75000 | David Kumar | 95000 |
| Meena Joshi | 80000 | Neha Sharma | 90000 |

⚙️ Step 4: Why This Query Works

Self-joins allow hierarchical comparisons between rows in the same table.


⚡ Step 5: Common Mistakes

  • Forgetting to alias tables.
  • Missing WHERE e.salary < m.salary condition — leads to all pairs.

🧠 Step 6: Alternate Approach

-- Using subquery
SELECT name
FROM employees e
WHERE salary < (
  SELECT salary
  FROM employees
  WHERE emp_id = e.manager_id
);

Simpler but less efficient on large datasets.
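The self-join runs as-is on SQLite; a sketch with the sample rows confirms the three expected matches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, name TEXT, manager_id INT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Raj Patel", 4, 60000), (2, "Aisha Khan", 4, 75000),
    (3, "Meena Joshi", 5, 80000), (4, "David Kumar", None, 95000),
    (5, "Neha Sharma", None, 90000),
])
# Alias the same table twice: e = employee rows, m = their manager rows.
rows = conn.execute("""
    SELECT e.name, m.name
    FROM employees e
    JOIN employees m ON e.manager_id = m.emp_id
    WHERE e.salary < m.salary
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```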


🎯 Interview Tip

Explain why you chose a join vs subquery — discuss readability and performance trade-offs.


🔍 Interviewer’s Analysis

  • Concept Tested: Self-Join / Hierarchical Comparison
  • Difficulty: Intermediate
  • Follow-Ups: Add tie condition or average comparison per manager.

🪄 End of Q4.


❓ Q5: Find departments where the average salary exceeds the company average.

🧩 Problem Statement

List all departments whose average employee salary is higher than the overall company average salary.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 95000 |
| 5 | Deepa Singh | 30 | 55000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We compare:

  1. Average salary per department.
  2. Overall company average. Return departments where (1) > (2).

🏗️ SQL Query

SELECT dept_id,
       AVG(salary) AS dept_avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > (
  SELECT AVG(salary) FROM employees
);

🔍 Explanation

  • Inner query → company-wide average salary.
  • Outer → computes per-department average.
  • HAVING compares group aggregate with scalar subquery.

🧾 Step 3: Expected Output

| dept_id | dept_avg_salary |
|---|---|
| 20 | 92500 |

(Dept 20’s avg = 92,500 > company avg = 76,000.)


⚙️ Step 4: Why This Query Works

The HAVING clause filters aggregated results post-grouping, comparing each department’s average to the global mean.


⚡ Step 5: Common Mistakes

  • Using WHERE with aggregate → invalid (must use HAVING).
  • Forgetting parentheses around subquery.

🧠 Step 6: Alternate Approach

WITH dept_avg AS (
  SELECT dept_id, AVG(salary) AS dept_avg_salary
  FROM employees
  GROUP BY dept_id
),
company_avg AS (
  SELECT AVG(salary) AS company_avg_salary
  FROM employees
)
SELECT d.dept_id, d.dept_avg_salary
FROM dept_avg d, company_avg c
WHERE d.dept_avg_salary > c.company_avg_salary;

CTE version — more readable for large analyses.


🎯 Interview Tip

Always show you can articulate multi-level aggregation logic — Interviewers value why you chose a subquery vs CTE.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation comparison / HAVING with subquery
  • Difficulty: Intermediate
  • Follow-Ups: Add department names via join; compare median instead of average.

🪄 End of Q5.


❓ Q6: Find employees earning more than their department’s average.

🧩 Problem Statement

Show all employees whose salary is above the average salary of their own department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 75000 |
| 5 | Deepa Singh | 30 | 55000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We need each employee's department average for comparison. A correlated subquery computes it per row; a window function (Step 6) is the cleaner alternative.

🏗️ SQL Query

SELECT emp_id,
       name,
       dept_id,
       salary,
       ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees e2
  WHERE e2.dept_id = employees.dept_id
);

🔍 Explanation

  • Inner subquery calculates per-department average dynamically.
  • Outer query keeps only those above it.

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | dept_avg |
|---|---|---|---|---|
| 2 | Ananya Roy | 10 | 80000 | 70000.00 |
| 3 | Bhavna Rai | 20 | 90000 | 82500.00 |

⚙️ Step 4: Why This Query Works

Each employee’s salary is compared against the average of their department group, computed independently per row.


⚡ Step 5: Common Mistakes

  • Omitting alias in subquery → ambiguous dept_id.
  • Forgetting to group subquery by department → wrong aggregate.

🧠 Step 6: Alternate Approach

-- Using window function (cleaner and faster)
SELECT emp_id,
       name,
       dept_id,
       salary,
       AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees
QUALIFY salary > AVG(salary) OVER (PARTITION BY dept_id);

(Snowflake/BigQuery syntax; QUALIFY filters windowed results.)


🎯 Interview Tip

Mention that this is a common “above-average performer” query — interviewers expect you to know both the subquery and window methods.


🔍 Interviewer’s Analysis

  • Concept Tested: Correlated subquery / window partition
  • Difficulty: Intermediate
  • Follow-Ups: Rank employees within each department by salary.

🪄 End of Q6.


❓ Q7: Calculate the cumulative percentage contribution of sales per region.

🧩 Problem Statement

Display each region’s total sales and its cumulative percentage of the company’s overall sales.


💾 Step 1: Assume Example Tables

sales

| region | total_sales |
|---|---|
| North | 50000 |
| South | 30000 |
| East | 15000 |
| West | 5000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use:

  1. SUM() for total sales.
  2. SUM() OVER(ORDER BY …) for cumulative sum.
  3. Division to calculate cumulative percentage.

🏗️ SQL Query

SELECT region,
       total_sales,
       SUM(total_sales) OVER (ORDER BY total_sales DESC) AS cumulative_sales,
       ROUND(
         (SUM(total_sales) OVER (ORDER BY total_sales DESC) * 100.0) /
         SUM(total_sales) OVER (),
         2
       ) AS cumulative_percent
FROM sales
ORDER BY total_sales DESC;

🔍 Explanation

  • Sort regions by total sales (descending).
  • Compute running total using window sum.
  • Divide by overall total to get cumulative percentage.

🧾 Step 3: Expected Output

| region | total_sales | cumulative_sales | cumulative_percent |
|---|---|---|---|
| North | 50000 | 50000 | 50.00 |
| South | 30000 | 80000 | 80.00 |
| East | 15000 | 95000 | 95.00 |
| West | 5000 | 100000 | 100.00 |

⚙️ Step 4: Why This Query Works

Window SUM() functions let us calculate both running totals and percent contributions in a single pass.


⚡ Step 5: Common Mistakes

  • Omitting OVER() turns the window SUM into a plain aggregate, which then requires GROUP BY.
  • Missing ORDER BY inside OVER() → cumulative values incorrect.
  • Dividing by the per-row total instead of the global total.

🧠 Step 6: Alternate Approach

-- Simpler: just percentage of total
SELECT region,
       total_sales,
       ROUND(total_sales * 100.0 / SUM(total_sales) OVER (), 2) AS percent_of_total
FROM sales;

Gives each region’s individual contribution, not cumulative.
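The running-total query works unchanged on SQLite (window functions require 3.25+); a sketch with the sample rows reproduces the cumulative percentages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total_sales INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 50000), ("South", 30000), ("East", 15000), ("West", 5000)])
# Running total (ordered window) divided by grand total (empty OVER()).
rows = conn.execute("""
    SELECT region,
           ROUND(SUM(total_sales) OVER (ORDER BY total_sales DESC) * 100.0
                 / SUM(total_sales) OVER (), 2) AS cumulative_percent
    FROM sales
    ORDER BY total_sales DESC
""").fetchall()
print(rows)  # [('North', 50.0), ('South', 80.0), ('East', 95.0), ('West', 100.0)]
```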


🎯 Interview Tip

Explain how cumulative contribution is used for Pareto analysis (80/20 rule) — key business insight.


🔍 Interviewer’s Analysis

  • Concept Tested: Windowed SUM / Percentage contribution
  • Difficulty: Intermediate → Advanced
  • Follow-Ups: Filter top regions contributing 80% of total sales.

🪄 End of Q7.


❓ Q8: Show employee performance rank reset every quarter.

🧩 Problem Statement

Rank employees based on performance score, but restart the ranking every quarter.


💾 Step 1: Assume Example Tables

employee_performance

| emp_id | quarter | score |
|---|---|---|
| 1 | Q1 | 88 |
| 2 | Q1 | 75 |
| 3 | Q1 | 92 |
| 1 | Q2 | 91 |
| 2 | Q2 | 84 |
| 3 | Q2 | 78 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We use RANK() with PARTITION BY quarter to reset per quarter.

🏗️ SQL Query

SELECT emp_id,
       quarter,
       score,
       RANK() OVER (
         PARTITION BY quarter
         ORDER BY score DESC
       ) AS quarterly_rank
FROM employee_performance
ORDER BY quarter, quarterly_rank;

🔍 Explanation

  • PARTITION BY quarter → restarts rank each quarter.
  • ORDER BY score DESC → higher score = higher rank.

🧾 Step 3: Expected Output

| emp_id | quarter | score | quarterly_rank |
|---|---|---|---|
| 3 | Q1 | 92 | 1 |
| 1 | Q1 | 88 | 2 |
| 2 | Q1 | 75 | 3 |
| 1 | Q2 | 91 | 1 |
| 2 | Q2 | 84 | 2 |
| 3 | Q2 | 78 | 3 |

⚙️ Step 4: Why This Query Works

Window partitioning divides data by quarter; within each group, ranking restarts independently.


⚡ Step 5: Common Mistakes

  • Forgetting PARTITION BY → rank continues across all quarters.
  • Using ORDER BY in main query but not in OVER() → wrong rank order.

🎯 Interview Tip

This is a classic “rank reset” question — emphasize PARTITION BY logic clearly.


🔍 Interviewer’s Analysis

  • Concept Tested: Partitioned Ranking
  • Difficulty: Intermediate
  • Follow-Ups: Add tie-handling (switch RANK to DENSE_RANK).

🪄 End of Q8.


❓ Q9: Calculate monthly revenue change (month-over-month).

🧩 Problem Statement

Show each month’s revenue and how much it changed compared to the previous month.


💾 Step 1: Assume Example Tables

monthly_sales

| month | revenue |
|---|---|
| Jan | 50000 |
| Feb | 60000 |
| Mar | 55000 |
| Apr | 70000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use LAG() to access previous month’s revenue, then compute difference.

🏗️ SQL Query

SELECT month,
       revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) AS change_amount,
       ROUND(
         ((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0) /
         LAG(revenue) OVER (ORDER BY month), 2
       ) AS change_percent
FROM monthly_sales;

🔍 Explanation

  • LAG() fetches the previous month’s revenue.
  • Subtracts to get absolute and percentage change.

🧾 Step 3: Expected Output

| month | revenue | prev_revenue | change_amount | change_percent |
|---|---|---|---|---|
| Jan | 50000 | NULL | NULL | NULL |
| Feb | 60000 | 50000 | 10000 | 20.00 |
| Mar | 55000 | 60000 | -5000 | -8.33 |
| Apr | 70000 | 55000 | 15000 | 27.27 |

⚙️ Step 4: Why This Query Works

LAG() makes it possible to calculate time-based deltas without self-joins.


⚡ Step 5: Common Mistakes

  • Forgetting to order by month (causes random differences).
  • Storing month as text ('Jan', 'Feb', …): it sorts alphabetically, so ORDER BY month is not chronological. Use a date or numeric month column.
  • Using integer division for percentages.

🎯 Interview Tip

Always mention how to handle missing months or gaps in time series.


🔍 Interviewer’s Analysis

  • Concept Tested: LAG / Sequential Comparison
  • Difficulty: Intermediate
  • Follow-Ups: Calculate cumulative MoM growth over time.

🪄 End of Q9.


❓ Q10: Calculate YoY growth for each product.

🧩 Problem Statement

For each product, compute its year-over-year (YoY) percentage growth.


💾 Step 1: Assume Example Tables

sales

| product_id | year | revenue |
|---|---|---|
| 1 | 2022 | 50000 |
| 1 | 2023 | 65000 |
| 1 | 2024 | 70000 |
| 2 | 2022 | 30000 |
| 2 | 2023 | 33000 |
| 2 | 2024 | 30000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We use LAG() within each product partition to compare consecutive years.

🏗️ SQL Query

SELECT product_id,
       year,
       revenue,
       LAG(revenue) OVER (
         PARTITION BY product_id
         ORDER BY year
       ) AS prev_year_revenue,
       ROUND(
         ((revenue - LAG(revenue) OVER (PARTITION BY product_id ORDER BY year)) * 100.0) /
         LAG(revenue) OVER (PARTITION BY product_id ORDER BY year),
         2
       ) AS yoy_growth_percent
FROM sales
ORDER BY product_id, year;

🔍 Explanation

  • PARTITION BY product_id ensures per-product calculations.
  • The LAG() retrieves prior year’s revenue for the same product.

🧾 Step 3: Expected Output

| product_id | year | revenue | prev_year_revenue | yoy_growth_percent |
|---|---|---|---|---|
| 1 | 2022 | 50000 | NULL | NULL |
| 1 | 2023 | 65000 | 50000 | 30.00 |
| 1 | 2024 | 70000 | 65000 | 7.69 |
| 2 | 2022 | 30000 | NULL | NULL |
| 2 | 2023 | 33000 | 30000 | 10.00 |
| 2 | 2024 | 30000 | 33000 | -9.09 |

⚙️ Step 4: Why This Query Works

It performs rolling comparisons within logical groups (per product) — no self-join needed.


⚡ Step 5: Common Mistakes

  • Missing PARTITION BY → compares across products.
  • Dividing by 0 when previous revenue = 0 (handle with NULLIF).

🧠 Step 6: Alternate Approach

-- Join-based version
SELECT a.product_id, a.year,
       ROUND(((a.revenue - b.revenue) * 100.0) / b.revenue, 2) AS yoy_growth
FROM sales a
JOIN sales b
  ON a.product_id = b.product_id AND a.year = b.year + 1;

Works but slower for large tables.


🎯 Interview Tip

Always mention partitioning when discussing window functions — it’s the magic word interviewers love.


🔍 Interviewer’s Analysis

  • Concept Tested: Partitioned LAG / Ratio Calculations
  • Difficulty: Intermediate → Advanced
  • Follow-Ups: Handle missing years (2022 → 2024 jump).

🪄 End of Q10.


❓ Q11: Display products with 3 consecutive months of sales decline.

🧩 Problem Statement

Identify products whose monthly sales dropped 3 times in a row.


💾 Step 1: Assume Example Tables

monthly_sales

| product_id | month | sales |
|---|---|---|
| 1 | Jan | 1000 |
| 1 | Feb | 900 |
| 1 | Mar | 800 |
| 1 | Apr | 700 |
| 2 | Jan | 1200 |
| 2 | Feb | 1300 |
| 2 | Mar | 1250 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use multiple LAG() windows to detect a downward chain.

🏗️ SQL Query

SELECT DISTINCT product_id
FROM (
  SELECT product_id,
         month,
         sales,
         LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_sales,
         LAG(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS prev2_sales
  FROM monthly_sales
) t
WHERE sales < prev_sales AND prev_sales < prev2_sales;

🔍 Explanation

  • Two lagged comparisons ensure 3 consecutive months of decline.
  • The outer filter checks monotonic decreasing trend.

🧾 Step 3: Expected Output

| product_id |
|---|
| 1 |

⚙️ Step 4: Why This Query Works

Sequential window comparisons reveal downward patterns efficiently — no need for recursive logic.


⚡ Step 5: Common Mistakes

  • Forgetting to partition by product_id.
  • Using < inconsistently (e.g., <= creates false positives for flat months).
  • Month stored as text sorts alphabetically; the ORDER BY inside LAG() must be chronological (use a date or numeric month).

🧠 Step 6: Alternate Approach

-- Using row numbers and self-join
WITH ranked AS (
  SELECT product_id, month, sales,
         ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY month) AS rn
  FROM monthly_sales
)
SELECT DISTINCT r1.product_id
FROM ranked r1
JOIN ranked r2 ON r1.product_id = r2.product_id AND r1.rn = r2.rn - 1
JOIN ranked r3 ON r1.product_id = r3.product_id AND r1.rn = r3.rn - 2
WHERE r1.sales > r2.sales AND r2.sales > r3.sales;

More verbose but works in all SQL dialects.
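A SQLite sketch of the LAG-chaining approach, using numeric months so the window's ORDER BY is chronological (the text months in the sample table would sort alphabetically):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (product_id INT, month INT, sales INT)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", [
    (1, 1, 1000), (1, 2, 900), (1, 3, 800), (1, 4, 700),  # steady decline
    (2, 1, 1200), (2, 2, 1300), (2, 3, 1250),
])
# Two lagged columns per row; three-in-a-row decline when both inequalities hold.
rows = conn.execute("""
    SELECT DISTINCT product_id
    FROM (
      SELECT product_id, sales,
             LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_sales,
             LAG(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS prev2_sales
      FROM monthly_sales
    )
    WHERE sales < prev_sales AND prev_sales < prev2_sales
""").fetchall()
print(rows)  # [(1,)]
```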


🎯 Interview Tip

Mention this query demonstrates trend analysis — often used for detecting declining products or churn.


🔍 Interviewer’s Analysis

  • Concept Tested: Consecutive trend detection / LAG chaining
  • Difficulty: Advanced
  • Follow-Ups: Detect 3 consecutive increases; detect “flat” patterns.

🪄 End of Q11.


❓ Q12: Assign ROW_NUMBER to employees ordered by salary.

🧩 Problem Statement

Give each employee a unique sequential number based on their salary order (highest first).


💾 Step 1: Assume Example Tables

employees

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 60000 |
| 2 | Ananya Roy | 80000 |
| 3 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 70000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use ROW_NUMBER() to generate unique sequence numbers based on salary order.

🏗️ SQL Query

SELECT emp_id,
       name,
       salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;

🔍 Explanation

  • ORDER BY salary DESC sorts employees by salary.
  • ROW_NUMBER() assigns unique numbers starting at 1.

🧾 Step 3: Expected Output

| emp_id | name | salary | salary_rank |
|---|---|---|---|
| 3 | Bhavna Rai | 90000 | 1 |
| 2 | Ananya Roy | 80000 | 2 |
| 4 | Chetan Joshi | 70000 | 3 |
| 1 | Amit Sharma | 60000 | 4 |

⚙️ Step 4: Why This Query Works

Unlike RANK(), ROW_NUMBER() always increments sequentially — even for ties.


⚡ Step 5: Common Mistakes

  • Using RANK() expecting sequential numbers — produces gaps.
  • Forgetting to specify ORDER BY → arbitrary numbering.

🧠 Step 6: Alternate Approach

-- Add department partition
SELECT emp_id, name, dept_id, salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_salary_rank
FROM employees;
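Since tie handling is the classic follow-up, here is a SQLite sketch comparing all three ranking functions. Note: Chetan's salary is changed to 80000 here (not the original table's value) purely to create a tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Amit Sharma", 60000), (2, "Ananya Roy", 80000),
    (3, "Bhavna Rai", 90000), (4, "Chetan Joshi", 80000),  # tie with Ananya
])
rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rownum,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
    FROM employees
    ORDER BY rownum
""").fetchall()
# ROW_NUMBER keeps counting through the tie, RANK skips a value after it,
# DENSE_RANK does not skip.
for r in rows:
    print(r)
```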

🎯 Interview Tip

Interviewers often ask, “What’s the difference between ROW_NUMBER, RANK, and DENSE_RANK?” Be ready to explain how they handle ties.


🔍 Interviewer’s Analysis

  • Concept Tested: Window Ranking / ROW_NUMBER
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Find top-N per department using ROW_NUMBER in subquery.

🪄 End of Q12.


❓ Q13: Delete duplicate rows from a table.

🧩 Problem Statement

Remove duplicate records from a table while keeping only one instance of each unique row.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | email |
|---|---|---|
| 1 | Aisha | aisha@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |

(Notice duplicates for Raj and Meena.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use a window function to assign row numbers to duplicates, then delete all but the first.

🏗️ SQL Query (Generic SQL; assumes each row has a unique id)

DELETE FROM customers
WHERE customer_id NOT IN (
  SELECT MIN(customer_id)
  FROM customers
  GROUP BY name, email
);

Caveat: in the sample above the duplicate rows share the same customer_id, so MIN(customer_id) cannot tell them apart. In that case use a physical row identifier (SQLite rowid, PostgreSQL ctid) or the ROW_NUMBER() approach.

🏗️ SQL Query (SQL Server, which allows DELETE through a CTE)

WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY customer_id) AS rn
  FROM customers
)
DELETE FROM ranked WHERE rn > 1;

In PostgreSQL a CTE can't be the DELETE target; select ctid in the CTE and write DELETE FROM customers WHERE ctid IN (SELECT ctid FROM ranked WHERE rn > 1) instead.

🔍 Explanation

  • Group duplicates by logical uniqueness (name, email).
  • Keep the first row, delete others.
  • ROW_NUMBER() ensures deterministic deletion.

🧾 Step 3: Expected Output

| customer_id | name | email |
|-------------|------|-------|
| 1 | Aisha | aisha@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |

⚙️ Step 4: Why This Query Works

The ranking-based approach ensures exactly one record per group remains.


⚡ Step 5: Common Mistakes

  • Forgetting to partition by correct columns.
  • Running DELETE without filtering — wiping out all rows.

🎯 Interview Tip

Interviewers often ask: “How would you do this without CTEs?” — Be ready to explain both approaches (group aggregate + window).
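Since customer_id itself is duplicated in this sample, a practical variant leans on a hidden row identifier. Here is a sketch in Python with sqlite3, where SQLite's implicit rowid plays that role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, name TEXT, email TEXT);
INSERT INTO customers VALUES
  (1, 'Aisha', 'aisha@gmail.com'),
  (2, 'Raj',   'raj@gmail.com'),
  (3, 'Meena', 'meena@gmail.com'),
  (2, 'Raj',   'raj@gmail.com'),
  (3, 'Meena', 'meena@gmail.com');
""")

# customer_id is duplicated, so group on the hidden rowid instead
conn.execute("""
DELETE FROM customers
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM customers GROUP BY name, email
)
""")

remaining = conn.execute(
    "SELECT customer_id, name, email FROM customers ORDER BY customer_id"
).fetchall()
print(remaining)
```

Exactly one row per (name, email) group survives, matching the expected output above.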


🔍 Interviewer’s Analysis

  • Concept Tested: Deduplication / Window filtering
  • Difficulty: Intermediate
  • Follow-Ups: Preserve the latest record (use ORDER BY in ROW_NUMBER()).

🪄 End of Q13.


❓ Q14: Keep only the first occurrence (lowest ID) of duplicates.

🧩 Problem Statement

From a table with duplicate entries, retain only the first row (lowest ID) for each unique combination.


💾 Step 1: Assume Example Tables

transactions

| trans_id | user_id | product_id | amount |
|----------|---------|------------|--------|
| 1 | 101 | P1 | 500 |
| 2 | 101 | P1 | 500 |
| 3 | 101 | P2 | 300 |
| 4 | 102 | P3 | 700 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We want the earliest record per user-product pair.

🏗️ SQL Query

DELETE FROM transactions
WHERE trans_id NOT IN (
  SELECT MIN(trans_id)
  FROM transactions
  GROUP BY user_id, product_id
);

🔍 Explanation

  • MIN(trans_id) identifies the first occurrence per group.
  • Deletes all others with higher IDs.

🧾 Step 3: Expected Output

| trans_id | user_id | product_id | amount |
|----------|---------|------------|--------|
| 1 | 101 | P1 | 500 |
| 3 | 101 | P2 | 300 |
| 4 | 102 | P3 | 700 |

⚙️ Step 4: Why This Query Works

Using the MIN() of unique identifiers ensures deterministic retention.


⚡ Step 5: Common Mistakes

  • Using DISTINCT instead of deletion — doesn’t remove from table.
  • Forgetting composite uniqueness (user_id + product_id).

🎯 Interview Tip

In production, never DELETE directly — test first with SELECT to confirm affected rows.
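That SELECT-first discipline can be scripted: build the predicate once, preview the doomed rows, then reuse the identical predicate in the DELETE. A sketch with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (trans_id INT, user_id INT, product_id TEXT, amount INT);
INSERT INTO transactions VALUES
  (1, 101, 'P1', 500),
  (2, 101, 'P1', 500),
  (3, 101, 'P2', 300),
  (4, 102, 'P3', 700);
""")

dedup_filter = """
trans_id NOT IN (
  SELECT MIN(trans_id) FROM transactions GROUP BY user_id, product_id
)"""

# Step 1: preview the rows the DELETE would remove
doomed = conn.execute(
    f"SELECT trans_id FROM transactions WHERE {dedup_filter}").fetchall()
print("will delete:", doomed)

# Step 2: run the DELETE with the identical predicate
conn.execute(f"DELETE FROM transactions WHERE {dedup_filter}")
kept = [r[0] for r in conn.execute(
    "SELECT trans_id FROM transactions ORDER BY trans_id")]
print("kept:", kept)
```

Only trans_id 2 (the duplicate of 1) is removed; rows 1, 3, and 4 remain.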


🔍 Interviewer’s Analysis

  • Concept Tested: Deduplication via group min/max
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Keep latest record (replace MIN() with MAX()).

🪄 End of Q14.


❓ Q15: Standardize phone numbers (remove special characters).

🧩 Problem Statement

Clean messy phone numbers by removing -, (, ), spaces, and converting to a uniform 10-digit string.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | phone |
|-------------|------|-------|
| 1 | Raj | (123)-456-7890 |
| 2 | Aisha | 123 456 7890 |
| 3 | Meena | 123.456.7890 |
| 4 | Amit | +91-98765-43210 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use nested REPLACE() or regex functions to strip unwanted characters.

🏗️ SQL Query (MySQL)

SELECT customer_id,
       name,
       REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
         phone, '-', ''), '(', ''), ')', ''), ' ', ''), '.', ''), '+', '') AS clean_phone
FROM customers;

(The sample data also contains '.' and '+', so those must be stripped as well.)

🏗️ SQL Query (PostgreSQL)

SELECT customer_id,
       name,
       REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS clean_phone
FROM customers;

🔍 Explanation

  • Regex removes all non-digit characters.
  • Results in consistent numeric-only phone format.

🧾 Step 3: Expected Output

| customer_id | name | clean_phone |
|-------------|------|-------------|
| 1 | Raj | 1234567890 |
| 2 | Aisha | 1234567890 |
| 3 | Meena | 1234567890 |
| 4 | Amit | 919876543210 |

⚙️ Step 4: Why This Query Works

The regex [^0-9] removes every character that isn’t a digit, producing clean phone data for analytics or joining.


⚡ Step 5: Common Mistakes

  • Forgetting the 'g' flag (global replace).
  • Not handling international codes consistently.

🎯 Interview Tip

Mention you’d validate lengths post-cleanup (10 or 12 digits) before loading into systems.
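The regex approach plus the length check suggested above can be sketched in plain Python (the 10-or-12-digit rule here is an assumption for this sample):

```python
import re

phones = {
    "Raj":   "(123)-456-7890",
    "Aisha": "123 456 7890",
    "Meena": "123.456.7890",
    "Amit":  "+91-98765-43210",
}

# Same idea as REGEXP_REPLACE(phone, '[^0-9]', '', 'g'): keep digits only
cleaned = {name: re.sub(r"[^0-9]", "", raw) for name, raw in phones.items()}

# Post-cleanup validation: bare 10-digit numbers or 12 digits with a country code
valid = {name: len(num) in (10, 12) for name, num in cleaned.items()}

print(cleaned)
print(valid)
```

All four messy inputs collapse to digit-only strings, and every one passes the length check.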


🔍 Interviewer’s Analysis

  • Concept Tested: String cleaning / regex
  • Difficulty: Intermediate
  • Follow-Ups: Enforce specific length via CHAR_LENGTH().

🪄 End of Q15.


❓ Q16: Split address into street, city, and state.

🧩 Problem Statement

Given a combined address column, extract its components: street, city, and state.


💾 Step 1: Assume Example Tables

addresses

| id | address |
|----|---------|
| 1 | 123 Main St, Mumbai, MH |
| 2 | 456 Oak Road, Delhi, DL |
| 3 | 789 Palm Ave, Bangalore, KA |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use string-splitting functions such as SUBSTRING_INDEX() or SPLIT_PART().

🏗️ SQL Query (MySQL)

SELECT id,
       TRIM(SUBSTRING_INDEX(address, ',', 1)) AS street,
       TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 2), ',', -1)) AS city,
       TRIM(SUBSTRING_INDEX(address, ',', -1)) AS state
FROM addresses;

🏗️ SQL Query (PostgreSQL)

SELECT id,
       SPLIT_PART(address, ',', 1) AS street,
       SPLIT_PART(address, ',', 2) AS city,
       SPLIT_PART(address, ',', 3) AS state
FROM addresses;

🔍 Explanation

  • Split the string by commas.
  • Extract the 1st, 2nd, and 3rd parts.

🧾 Step 3: Expected Output

| id | street | city | state |
|----|--------|------|-------|
| 1 | 123 Main St | Mumbai | MH |
| 2 | 456 Oak Road | Delhi | DL |
| 3 | 789 Palm Ave | Bangalore | KA |

⚙️ Step 4: Why This Query Works

The SPLIT_PART() or SUBSTRING_INDEX() functions allow parsing multi-part text columns into normalized attributes.


⚡ Step 5: Common Mistakes

  • Forgetting to trim spaces.
  • Assuming consistent delimiters — real data often varies.

🎯 Interview Tip

Mention that in ETL pipelines, this logic often moves to staging tables to avoid slowing the main DB.
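The same three-part split, with a guard for the inconsistent delimiters mentioned above, can be sketched in Python:

```python
addresses = {
    1: "123 Main St, Mumbai, MH",
    2: "456 Oak Road, Delhi, DL",
    3: "789 Palm Ave, Bangalore, KA",
}

parsed = {}
for addr_id, addr in addresses.items():
    # Split on commas and trim spaces, mirroring SPLIT_PART + TRIM
    parts = [p.strip() for p in addr.split(",")]
    if len(parts) != 3:
        # real data often has missing or extra delimiters; route these to review
        continue
    parsed[addr_id] = dict(zip(("street", "city", "state"), parts))

print(parsed)
```

In an ETL staging step, the rows that fail the three-part check would land in a rejects table rather than being silently dropped.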


🔍 Interviewer’s Analysis

  • Concept Tested: String splitting / text parsing
  • Difficulty: Intermediate
  • Follow-Ups: Handle inconsistent delimiters (“;” or “|”).

🪄 End of Q16.


❓ Q17: Convert customer names into Proper Case.

🧩 Problem Statement

Standardize customer names so that each word starts with an uppercase letter.


💾 Step 1: Assume Example Tables

customers

| customer_id | name |
|-------------|------|
| 1 | raj patel |
| 2 | AISHA ROY |
| 3 | meena singh |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use text transformation functions — varies by SQL dialect.

🏗️ SQL Query (PostgreSQL)

SELECT customer_id,
       INITCAP(name) AS proper_name
FROM customers;

🏗️ SQL Query (MySQL)

SELECT customer_id,
       CONCAT(UCASE(LEFT(name, 1)), LCASE(SUBSTRING(name, 2))) AS proper_name
FROM customers;

🔍 Explanation

  • INITCAP() capitalizes the first letter of each word.
  • MySQL has no INITCAP(); the substring version capitalizes only the first character of the whole string, so multi-word names need a stored function or application-side logic.

🧾 Step 3: Expected Output

| customer_id | proper_name |
|-------------|-------------|
| 1 | Raj Patel |
| 2 | Aisha Roy |
| 3 | Meena Singh |

⚙️ Step 4: Why This Query Works

Transforms messy casing into consistent, human-readable names.


⚡ Step 5: Common Mistakes

  • Ignoring multi-word names in MySQL (only first letter handled).
  • Forgetting locale-specific capitalization rules.

🎯 Interview Tip

Mention that standardized text improves joins, reporting, and ML data quality.
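The gap between INITCAP() and the single-character MySQL version shows up immediately in a Python sketch (per-word str.capitalize standing in for INITCAP()):

```python
names = ["raj patel", "AISHA ROY", "meena singh"]

# INITCAP-style: capitalize every word
initcap = [" ".join(w.capitalize() for w in n.split()) for n in names]

# The one-shot MySQL version: only the first character of the whole string
first_letter_only = [n[:1].upper() + n[1:].lower() for n in names]

print(initcap)
print(first_letter_only)
```

The second list leaves surnames lowercase ("Raj patel"), which is exactly the multi-word pitfall called out in Step 5.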


🔍 Interviewer’s Analysis

  • Concept Tested: String transformation
  • Difficulty: Beginner
  • Follow-Ups: Extend to addresses or product titles.

🪄 End of Q17.


❓ Q18: Design tables to store credit card transactions with normalization.

🧩 Problem Statement

Design a normalized schema for storing credit card transactions to ensure data consistency, minimal redundancy, and fast lookups.


💾 Step 1: Conceptual Breakdown

We’re modeling:

  • Customers — who own cards
  • Cards — issued to customers
  • Merchants — where transactions occur
  • Transactions — main fact table

We’ll normalize to 3NF (Third Normal Form):

  • Each table has a unique key.
  • No transitive dependencies.

🧮 Step 2: Table Designs

1️⃣ customers

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100) UNIQUE,
  phone VARCHAR(15)
);

2️⃣ cards

CREATE TABLE cards (
  card_id INT PRIMARY KEY,
  customer_id INT,
  card_number CHAR(16) UNIQUE,
  issue_date DATE,
  expiry_date DATE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

3️⃣ merchants

CREATE TABLE merchants (
  merchant_id INT PRIMARY KEY,
  name VARCHAR(100),
  category VARCHAR(50),
  city VARCHAR(50)
);

4️⃣ transactions

CREATE TABLE transactions (
  transaction_id BIGINT PRIMARY KEY,
  card_id INT,
  merchant_id INT,
  transaction_date TIMESTAMP,
  amount DECIMAL(10, 2),
  status VARCHAR(20),
  FOREIGN KEY (card_id) REFERENCES cards(card_id),
  FOREIGN KEY (merchant_id) REFERENCES merchants(merchant_id)
);

🧾 Step 3: Sample Data Flow

customers

| customer_id | name | email |
|-------------|------|-------|
| 1 | Raj | raj@bank.com |
| 2 | Aisha | aisha@bank.com |

cards

| card_id | customer_id | card_number |
|---------|-------------|-------------|
| 10 | 1 | 1234567812345678 |
| 11 | 2 | 8765432187654321 |

merchants

| merchant_id | name | category | city |
|-------------|------|----------|------|
| 101 | Amazon | E-Com | Mumbai |
| 102 | Starbucks | Food | Delhi |

transactions

| transaction_id | card_id | merchant_id | amount | status |
|----------------|---------|-------------|--------|--------|
| 9001 | 10 | 101 | 1200.50 | Success |
| 9002 | 10 | 102 | 350.00 | Success |

⚙️ Step 4: Why This Schema Works

  • Fully normalized → no duplication.
  • Relationships clearly defined with foreign keys.
  • Easy joins for transaction analysis and fraud detection.

🎯 Interview Tip

Interviewers often ask: “How would you denormalize this for analytics?” → Create a materialized view joining all four tables for dashboard reads.
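As a sanity check, a trimmed version of this schema (fewer columns, sample values assumed) can be built in SQLite via Python, exercising both foreign-key enforcement and the four-table analytics join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE);
CREATE TABLE cards (
  card_id INTEGER PRIMARY KEY, customer_id INTEGER, card_number TEXT UNIQUE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id));
CREATE TABLE merchants (merchant_id INTEGER PRIMARY KEY, name TEXT, category TEXT, city TEXT);
CREATE TABLE transactions (
  transaction_id INTEGER PRIMARY KEY, card_id INTEGER, merchant_id INTEGER, amount REAL,
  FOREIGN KEY (card_id) REFERENCES cards(card_id),
  FOREIGN KEY (merchant_id) REFERENCES merchants(merchant_id));

INSERT INTO customers VALUES (1, 'Raj', 'raj@bank.com');
INSERT INTO cards VALUES (10, 1, '1234567812345678');
INSERT INTO merchants VALUES (101, 'Amazon', 'E-Com', 'Mumbai');
INSERT INTO transactions VALUES (9001, 10, 101, 1200.50);
""")

# The "denormalized" analytics read: one join across all four tables
row = conn.execute("""
SELECT c.name, m.name, t.amount
FROM transactions t
JOIN cards     cd ON t.card_id = cd.card_id
JOIN customers c  ON cd.customer_id = c.customer_id
JOIN merchants m  ON t.merchant_id = m.merchant_id
""").fetchone()
print(row)

# Consistency in action: a transaction on a nonexistent card is rejected
fk_rejected = False
try:
    conn.execute("INSERT INTO transactions VALUES (9002, 999, 101, 10.0)")
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

The rejected insert is the normalization payoff: the schema refuses transactions that reference cards that don't exist.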


🔍 Interviewer’s Analysis

  • Concept Tested: Schema design / normalization
  • Difficulty: Advanced
  • Follow-Ups: Add partitioning for transactions; introduce currency dimension.

🪄 End of Q18.


❓ Q19: Create an index on customer_id for fast lookups.

🧩 Problem Statement

You frequently filter transactions by customer_id. Improve query performance using indexes.


💾 Step 1: Base Table

CREATE TABLE transactions (
  transaction_id BIGINT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10,2),
  transaction_date DATE
);

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Indexing creates a sorted access path — reducing search time from O(n) to O(log n).

🏗️ SQL Query

CREATE INDEX idx_customer_id
ON transactions (customer_id);

🔍 Explanation

  • The database builds a B-tree index on customer_id.

  • Queries like:

    SELECT * FROM transactions WHERE customer_id = 101;

    will use this index to locate rows faster.


⚙️ Step 4: Why This Query Works

An index avoids full-table scans for frequent lookups on indexed columns.


⚡ Step 5: Common Mistakes

  • Indexing low-cardinality columns (e.g., status) → no gain.
  • Over-indexing → slows down inserts and updates.

🎯 Interview Tip

Mention “Index Selectivity”

The ratio of unique values to total rows — higher is better.
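The before/after effect is visible in the execution plan itself. A sketch using SQLite's EXPLAIN QUERY PLAN via Python, with 10,000 synthetic rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, customer_id INT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(i, i % 1000, 9.99) for i in range(10_000)])

query = "SELECT * FROM transactions WHERE customer_id = 101"

before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
conn.execute("CREATE INDEX idx_customer_id ON transactions (customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before)  # full table scan
print(after)   # search using idx_customer_id
```

The plan flips from a SCAN of the table to a SEARCH using idx_customer_id; the detail text is in the last column of each plan row.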


🔍 Interviewer’s Analysis

  • Concept Tested: Index design / performance tuning
  • Difficulty: Intermediate
  • Follow-Ups: Explain difference between clustered and non-clustered indexes.

🪄 End of Q19.


❓ Q20: Implement partitioning for large tables.

🧩 Problem Statement

Your transactions table has billions of rows. Implement partitioning to improve performance and maintainability.


💾 Step 1: Example Table

CREATE TABLE transactions (
  transaction_id BIGINT,
  transaction_date DATE,
  amount DECIMAL(10,2)
);

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Partitioning divides data physically into chunks — usually by date, region, or hash — improving query speed and manageability.

🏗️ SQL Query (MySQL RANGE partitioning)

CREATE TABLE transactions_partitioned (
  transaction_id BIGINT,
  transaction_date DATE,
  amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(transaction_date)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

🔍 Explanation

  • Each partition holds one year’s worth of data.

  • Queries like

    SELECT * FROM transactions_partitioned WHERE transaction_date >= '2024-01-01';

    will automatically scan only the relevant partition.


⚙️ Step 4: Why This Query Works

Partitioning prunes irrelevant data blocks → faster scans.


⚡ Step 5: Common Mistakes

  • Using too many partitions → metadata overhead.
  • Forgetting to update partition definitions annually.

🎯 Interview Tip

Always mention partition pruning — the DB engine only reads needed partitions.


🔍 Interviewer’s Analysis

  • Concept Tested: Table partitioning / scalability
  • Difficulty: Advanced
  • Follow-Ups: Discuss hash vs range vs list partitioning.

🪄 End of Q20.


❓ Q21: Create a temporary table for storing summary results.

🧩 Problem Statement

During ETL, you need to store intermediate aggregations temporarily — e.g., daily revenue per region.


💾 Step 1: Example Base Table

sales

| region | sale_date | amount |
|--------|-----------|--------|
| North | 2024-01-01 | 5000 |
| South | 2024-01-01 | 7000 |
| North | 2024-01-02 | 8000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

A temporary table lives only within your session — perfect for multi-step ETL transformations.

🏗️ SQL Query

CREATE TEMPORARY TABLE temp_daily_revenue AS
SELECT region,
       sale_date,
       SUM(amount) AS total_sales
FROM sales
GROUP BY region, sale_date;

🔍 Explanation

  • Creates a temp table with summarized data.
  • Auto-drops when session ends.

🧾 Step 3: Expected Output

| region | sale_date | total_sales |
|--------|-----------|-------------|
| North | 2024-01-01 | 5000 |
| South | 2024-01-01 | 7000 |
| North | 2024-01-02 | 8000 |

⚙️ Step 4: Why This Query Works

Temporary tables allow multi-step computation without polluting the permanent schema.


⚡ Step 5: Common Mistakes

  • Using permanent tables for staging data.
  • Forgetting session cleanup (though most databases handle it automatically).

🎯 Interview Tip

Mention that temporary tables are critical for incremental ETL pipelines and intermediate staging.
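A sketch with Python's sqlite3, where CREATE TEMP TABLE ... AS SELECT works the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, sale_date TEXT, amount INT);
INSERT INTO sales VALUES
  ('North', '2024-01-01', 5000),
  ('South', '2024-01-01', 7000),
  ('North', '2024-01-02', 8000);

-- Session-scoped staging table for the aggregated step
CREATE TEMP TABLE temp_daily_revenue AS
SELECT region, sale_date, SUM(amount) AS total_sales
FROM sales
GROUP BY region, sale_date;
""")

summary = conn.execute(
    "SELECT region, sale_date, total_sales FROM temp_daily_revenue "
    "ORDER BY sale_date, region"
).fetchall()
print(summary)
```

The temp table disappears when the connection closes, so downstream ETL steps can read it freely without any cleanup DDL.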


🔍 Interviewer’s Analysis

  • Concept Tested: Temporary tables / staging logic
  • Difficulty: Intermediate
  • Follow-Ups: Use CTEs instead of temporary tables for one-time queries.

🪄 End of Q21.


❓ Q22: Create a view summarizing customer purchases.

🧩 Problem Statement

Define a view that shows each customer’s total purchase amount and last purchase date.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|----------|-------------|------------|--------|
| 1 | 101 | 2024-01-10 | 500 |
| 2 | 101 | 2024-02-15 | 300 |
| 3 | 102 | 2024-01-20 | 700 |
| 4 | 103 | 2024-03-01 | 200 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

A view is a stored query result that can be referenced like a table.

🏗️ SQL Query

CREATE VIEW customer_summary AS
SELECT customer_id,
       SUM(amount) AS total_purchases,
       MAX(order_date) AS last_purchase_date
FROM orders
GROUP BY customer_id;

🔍 Explanation

  • Aggregates total purchase amount per customer.
  • Tracks most recent purchase date.
  • Reusable for dashboards and reporting.

🧾 Step 3: Expected Output

| customer_id | total_purchases | last_purchase_date |
|-------------|-----------------|--------------------|
| 101 | 800 | 2024-02-15 |
| 102 | 700 | 2024-01-20 |
| 103 | 200 | 2024-03-01 |

⚙️ Step 4: Why This Query Works

The view encapsulates recurring logic — improving maintainability and code reuse.


⚡ Step 5: Common Mistakes

  • Using views with volatile data in critical transactions → may degrade performance.
  • Forgetting to refresh materialized views (in DBs that support them).

🎯 Interview Tip

Explain when to use:

  • Views → Logical abstraction.
  • Materialized Views → Precomputed aggregates for faster reporting.
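A sketch in SQLite via Python, which also shows the key property of a plain view: it re-runs its query on every read, so new orders appear automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, amount INT);
INSERT INTO orders VALUES
  (1, 101, '2024-01-10', 500),
  (2, 101, '2024-02-15', 300),
  (3, 102, '2024-01-20', 700),
  (4, 103, '2024-03-01', 200);

CREATE VIEW customer_summary AS
SELECT customer_id,
       SUM(amount) AS total_purchases,
       MAX(order_date) AS last_purchase_date
FROM orders
GROUP BY customer_id;
""")

summary = conn.execute(
    "SELECT * FROM customer_summary ORDER BY customer_id").fetchall()
print(summary)

# Plain views are never stale: a new order shows up on the next read
conn.execute("INSERT INTO orders VALUES (5, 101, '2024-04-01', 100)")
total_101 = conn.execute(
    "SELECT total_purchases FROM customer_summary WHERE customer_id = 101"
).fetchone()[0]
print(total_101)
```

A materialized view would have kept returning 800 for customer 101 until refreshed; the plain view returns 900 immediately.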

🔍 Interviewer’s Analysis

  • Concept Tested: View creation / data summarization
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Convert to materialized view for faster reads.

🪄 End of Q22.


❓ Q23: Rewrite a correlated subquery as a join to improve performance.

🧩 Problem Statement

Optimize a query that uses a correlated subquery by rewriting it as a JOIN, which performs better on large datasets.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|--------|------|---------|--------|
| 1 | Raj Patel | 10 | 70000 |
| 2 | Aisha Khan | 10 | 85000 |
| 3 | Meena Joshi | 20 | 95000 |
| 4 | Chetan Joshi | 20 | 88000 |

departments

| dept_id | dept_name | location |
|---------|-----------|----------|
| 10 | Sales | Mumbai |
| 20 | Engineering | Delhi |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

The original query uses a correlated subquery to fetch department names.

🚫 Inefficient Version

SELECT e.name,
       (SELECT dept_name
        FROM departments d
        WHERE d.dept_id = e.dept_id) AS dept_name
FROM employees e;

This subquery executes once per employee row, effectively a nested loop (roughly O(n × m) on large data without an index on dept_id).


✅ Optimized JOIN Version

SELECT e.name, d.dept_name
FROM employees e
JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

  • The join retrieves all needed data in one scan.
  • Modern SQL optimizers convert simple correlated subqueries into joins, but not always — explicitly writing the join is safer and clearer.

🧾 Step 3: Expected Output

| name | dept_name |
|------|-----------|
| Raj Patel | Sales |
| Aisha Khan | Sales |
| Meena Joshi | Engineering |
| Chetan Joshi | Engineering |

⚙️ Step 4: Why This Query Works

JOINs allow set-based operations, minimizing per-row lookups and improving scalability.


⚡ Step 5: Common Mistakes

  • Forgetting to use table aliases (e, d).
  • Using LEFT JOIN unnecessarily when an INNER JOIN suffices.

🎯 Interview Tip

Interviewers often ask:

“How does rewriting improve performance?” Answer: “Because JOINs allow SQL engines to optimize via hash joins or index scans, while correlated subqueries often force nested-loop evaluations.”
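Both forms return the same rows, which is easy to verify. A sketch in SQLite via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT, salary INT);
CREATE TABLE departments (dept_id INT, dept_name TEXT);
INSERT INTO employees VALUES
  (1, 'Raj Patel', 10, 70000), (2, 'Aisha Khan', 10, 85000),
  (3, 'Meena Joshi', 20, 95000), (4, 'Chetan Joshi', 20, 88000);
INSERT INTO departments VALUES (10, 'Sales'), (20, 'Engineering');
""")

# Correlated subquery: one department lookup per employee row
correlated = conn.execute("""
SELECT e.name,
       (SELECT dept_name FROM departments d WHERE d.dept_id = e.dept_id) AS dept_name
FROM employees e
ORDER BY e.emp_id
""").fetchall()

# Equivalent join: one set-based operation
joined = conn.execute("""
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.emp_id
""").fetchall()

print(correlated == joined)  # same result set, different access patterns
```

The equality holds because every employee has a matching department; with orphaned dept_ids, the correlated form would return NULL where the inner join drops the row, so a LEFT JOIN would be the faithful rewrite.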


🔍 Interviewer’s Analysis

  • Concept Tested: Subquery optimization
  • Difficulty: Intermediate
  • Follow-Ups: Compare runtime via EXPLAIN plans.

🪄 End of Q23.


❓ Q24: Find the difference in execution time between indexed vs non-indexed queries.

🧩 Problem Statement

Demonstrate how indexes improve query performance by comparing execution time of a filtered query with and without an index.


💾 Step 1: Assume Example Table

orders

| order_id | customer_id | order_date | amount |
|----------|-------------|------------|--------|
| 1 | 101 | 2024-01-01 | 500 |
| 2 | 102 | 2024-01-05 | 700 |
(Assume ~1M rows in practice.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll test a lookup query with and without an index on customer_id.


🚫 Without Index

SELECT * FROM orders WHERE customer_id = 101;

  • Execution plan: Full table scan
  • Time complexity: O(n)

✅ With Index

CREATE INDEX idx_customer_id ON orders(customer_id);

SELECT * FROM orders WHERE customer_id = 101;

  • Execution plan: Index seek
  • Time complexity: O(log n)

🧾 Step 3: Expected Result (Performance Conceptually)

| Scenario | Operation Type | Approx. Time (for 1M rows) |
|----------|----------------|----------------------------|
| Without Index | Full Scan | ~1.2 sec |
| With Index | Index Seek | ~0.01 sec |

⚙️ Step 4: Why This Query Works

An index acts like a sorted lookup table — dramatically reduces search space. Query planners automatically use the best available index for filtering conditions.


⚡ Step 5: Common Mistakes

  • Indexing low-cardinality columns (e.g., gender) → negligible benefit.
  • Forgetting to analyze query plans using EXPLAIN.

🧠 Step 6: Verify with EXPLAIN

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101;

Compare cost estimates before and after indexing.


🎯 Interview Tip

If asked, say:

“Indexing improves read performance but adds overhead on writes — so I only index high-selectivity columns used in frequent filters.”
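A rough benchmark can be scripted (timings vary by machine; the 200,000 synthetic rows below are an assumption for illustration). A sketch with Python's sqlite3:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 10_000, 9.99) for i in range(200_000)])

def timed(query):
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    return rows, time.perf_counter() - start

lookup = "SELECT * FROM orders WHERE customer_id = 101"

full_rows, full_t = timed(lookup)          # no index: full scan
conn.execute("CREATE INDEX idx_customer_id ON orders (customer_id)")
indexed_rows, indexed_t = timed(lookup)    # index seek

print(f"full scan: {full_t:.4f}s  indexed: {indexed_t:.4f}s")
```

Both runs return the identical 20 rows; the only thing the index changes is how many rows the engine inspects to find them.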


🔍 Interviewer’s Analysis

  • Concept Tested: Indexing impact / EXPLAIN plans
  • Difficulty: Intermediate
  • Follow-Ups: Composite indexes and index maintenance cost.

🪄 End of Q24.


❓ Q25: Demonstrate the safe use of query hints (USE INDEX, FORCE ORDER).

🧩 Problem Statement

Show how to guide the SQL optimizer using query hints, while understanding when it’s safe (or unsafe) to do so.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|--------|------|---------|
| 1 | Raj | 10 |
| 2 | Aisha | 20 |
| 3 | Meena | 10 |

departments

| dept_id | dept_name |
|---------|-----------|
| 10 | Sales |
| 20 | HR |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Most of the time, the optimizer chooses the best execution plan automatically. But sometimes — e.g., in skewed data — you can use hints to influence behavior.


✅ Example 1 — Force Index (MySQL)

SELECT e.name, d.dept_name
FROM employees e USE INDEX (idx_dept)
JOIN departments d ON e.dept_id = d.dept_id;

Use Case: If multiple indexes exist, this hint forces the optimizer to use idx_dept.


✅ Example 2 — Force Join Order (SQL Server)

SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
OPTION (FORCE ORDER);

Use Case: When the query planner’s automatic join reordering produces inefficient plans for known data distributions. (MySQL 8 offers a comparable /*+ JOIN_ORDER(...) */ optimizer hint.)


⚙️ Step 4: Why This Query Works

Hints let you override optimizer heuristics only when you’re certain your data pattern justifies it.


⚡ Step 5: Common Mistakes

  • Overusing hints — they can break performance as data evolves.
  • Hardcoding index names that later change.

🎯 Interview Tip

Say this line:

“Hints are a scalpel, not a hammer — I use them sparingly and only after verifying the query plan and data distribution.”
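SQLite's analogous hint, INDEXED BY, makes both the mechanism and the brittleness concrete. A sketch via Python, showing the query break outright once the hardcoded index is dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT);
CREATE INDEX idx_dept ON employees (dept_id);
INSERT INTO employees VALUES (1, 'Raj', 10), (2, 'Aisha', 20), (3, 'Meena', 10);
""")

# INDEXED BY insists on the named index for this query
rows = conn.execute(
    "SELECT name FROM employees INDEXED BY idx_dept "
    "WHERE dept_id = 10 ORDER BY name"
).fetchall()
print(rows)

# The brittleness the hint introduces: drop the index and the query errors out
conn.execute("DROP INDEX idx_dept")
try:
    conn.execute(
        "SELECT name FROM employees INDEXED BY idx_dept WHERE dept_id = 10")
    hint_survived = True
except sqlite3.OperationalError:
    hint_survived = False
print(hint_survived)
```

This is the "hardcoding index names that later change" mistake from Step 5 playing out: an unhinted query would have silently fallen back to a scan, but the hinted one fails.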


🔍 Interviewer’s Analysis

  • Concept Tested: Optimizer behavior / Query tuning
  • Difficulty: Advanced
  • Follow-Ups: Explain how to verify plan changes with EXPLAIN ANALYZE.

🪄 End of Q25.

