
👩‍💼 Business Analyst (BA)

(Beginner-friendly, example-driven Q&A for Top Tech Company Interviews)


❓ Q1: Retrieve all customer records located in a specific city.

🧩 Problem Statement

Return every customer row where the customer lives in a given city (e.g., Mumbai). This is basic selection + filtering.


💾 Step 1: Assume Example Tables

customers — list of customers.

| customer_id | name | email | city | joined_date |
|---|---|---|---|---|
| 1 | Raj Patel | raj@example.com | Mumbai | 2023-05-10 |
| 2 | Aisha Khan | aisha@example.com | Delhi | 2022-11-01 |
| 3 | Sunil Rao | sunil@example.com | Mumbai | 2024-02-15 |
| 4 | Priya Verma | priya@example.com | Pune | 2021-08-20 |

Columns:

  • customer_id: unique id.
  • city: used to filter.

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Tested concept: SELECT + WHERE filtering. We simply select rows where city = 'Mumbai'.

🏗️ SQL Query

SELECT customer_id, name, email, city, joined_date
FROM customers
WHERE city = 'Mumbai';

🔍 Explanation

  • SELECT chooses columns.
  • WHERE city = 'Mumbai' filters only customers in Mumbai.

🧾 Step 3: Expected Output

| customer_id | name | email | city | joined_date |
|---|---|---|---|---|
| 1 | Raj Patel | raj@example.com | Mumbai | 2023-05-10 |
| 3 | Sunil Rao | sunil@example.com | Mumbai | 2024-02-15 |

This shows both customers living in Mumbai.


⚙️ Step 4: Why This Query Works

A direct equality filter returns only matching rows; it’s efficient with an index on city.


⚡ Step 5: Common Mistakes

  • Forgetting quotes around string values.
  • Using LIKE when exact match is intended (or vice versa).
  • Case-sensitivity issues depending on DB collation.

🧠 Step 6: Alternate Approach (Optional)

-- Using parameter / placeholder (safer in application code)
SELECT *
FROM customers
WHERE city = ?;

Use this in prepared statements to avoid SQL injection.
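For instance, the placeholder can be bound by the database driver so the city value is never spliced into the SQL string. A minimal sketch using Python's sqlite3 module (the sample data mirrors the customers table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(customer_id INTEGER, name TEXT, city TEXT);
INSERT INTO customers VALUES
 (1,'Raj Patel','Mumbai'),(2,'Aisha Khan','Delhi'),(3,'Sunil Rao','Mumbai');
""")

city = "Mumbai"  # in real code this would come from user input
# The driver binds the value; the SQL text never changes, so injection is impossible.
rows = conn.execute(
    "SELECT customer_id, name FROM customers WHERE city = ?",
    (city,),
).fetchall()
# rows → [(1, 'Raj Patel'), (3, 'Sunil Rao')]
```

Other drivers use different placeholder styles (`%s`, `:city`), but the binding principle is the same.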


🎯 Interview Tip

Explain that you would parameterize the city in production code and consider indexing city if queries are frequent.


🔍 Interviewer’s Analysis

  • Concept Tested: WHERE filter / selective projection
  • Difficulty: Beginner
  • Time Complexity: O(n) scan (or O(log n) if indexed)
  • Follow-Up Prompts: Indexing, case-sensitivity, parameterization.

🪄 End of Q1 — ready for the next question.


❓ Q2: Find employees whose names start with “A”.

🧩 Problem Statement

Return employee records where name begins with the letter A.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary | hired_date |
|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 2020-01-15 |
| 2 | Bhavna Rai | 20 | 55000 | 2019-03-10 |
| 3 | Ananya Roy | 10 | 70000 | 2021-07-22 |
| 4 | Chetan Joshi | 30 | 50000 | 2022-09-05 |

Columns:

  • name: text to search with pattern.

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Tested concept: pattern matching (LIKE) or ILIKE for case-insensitive checks.

🏗️ SQL Query

SELECT emp_id, name, dept_id, salary, hired_date
FROM employees
WHERE name LIKE 'A%';

(If your DB supports case-insensitive ILIKE and you want to catch lower-case a: use WHERE name ILIKE 'a%'.)

🔍 Explanation

  • LIKE 'A%' matches strings starting with A. % means any following characters.

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | hired_date |
|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 2020-01-15 |
| 3 | Ananya Roy | 10 | 70000 | 2021-07-22 |

⚙️ Step 4: Why This Query Works

Pattern matching selects rows where the first character equals A. Use ILIKE for case-insensitive behavior if needed.


⚡ Step 5: Common Mistakes

  • Forgetting %.
  • Using = (exact match) instead of LIKE.
  • Not considering case sensitivity.

🧠 Step 6: Alternate Approach (Optional)

SELECT emp_id, name
FROM employees
WHERE SUBSTR(name,1,1) = 'A';

Less flexible and may be slower than LIKE, since applying a function to the column can prevent index use.


🎯 Interview Tip

Mention whether collation is case-sensitive in your DB and prefer ILIKE or LOWER(name) LIKE 'a%' accordingly.
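The LOWER(name) approach is the most portable. A small sketch in Python's sqlite3 (sample rows deliberately mix cases); note that SQLite's own LIKE is already case-insensitive for ASCII by default, which is exactly why checking your engine's collation matters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, name TEXT);
INSERT INTO employees VALUES (1,'Amit Sharma'),(2,'bhavna rai'),(3,'ananya roy');
""")

# Portable case-insensitive prefix match: normalize both sides with LOWER().
rows = conn.execute(
    "SELECT name FROM employees WHERE LOWER(name) LIKE 'a%' ORDER BY emp_id"
).fetchall()
# rows → [('Amit Sharma',), ('ananya roy',)]
```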


🪄 End of Q2.


❓ Q3: Display unique department names from the employee table.

🧩 Problem Statement

Return distinct department names listed in employees (or a departments table). Use DISTINCT.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Bhavna Rai | HR |
| 3 | Ananya Roy | Sales |
| 4 | Chetan Joshi | Engineering |

Columns:

  • dept_name: department label (could also be a foreign key to departments).

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: SELECT DISTINCT to remove duplicates.

🏗️ SQL Query

SELECT DISTINCT dept_name
FROM employees;

🔍 Explanation

DISTINCT collapses duplicate department values so each name appears once.


🧾 Step 3: Expected Output

| dept_name |
|---|
| Sales |
| HR |
| Engineering |

⚙️ Step 4: Why This Query Works

DISTINCT deduplicates values after selection.


⚡ Step 5: Common Mistakes

  • Selecting multiple columns with DISTINCT when only one is desired.
  • Not trimming whitespace/case differences leading to duplicates.

🧠 Step 6: Alternate Approach (Optional)

SELECT dept_name
FROM employees
GROUP BY dept_name;

Group by also returns unique dept names.


🎯 Interview Tip

If department IDs exist, prefer selecting ID + name to avoid ambiguity.


🪄 End of Q3.


❓ Q4: Fetch employees sorted by their joining date (latest first).

🧩 Problem Statement

Return employees ordered by joined_date descending so newest hires appear first.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | joined_date |
|---|---|---|
| 1 | Amit | 2020-01-15 |
| 2 | Bhavna | 2022-03-10 |
| 3 | Ananya | 2024-07-01 |
| 4 | Chetan | 2021-09-05 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: ORDER BY with DESC.

🏗️ SQL Query

SELECT emp_id, name, joined_date
FROM employees
ORDER BY joined_date DESC;

🔍 Explanation

ORDER BY joined_date DESC sorts newest first.


🧾 Step 3: Expected Output

| emp_id | name | joined_date |
|---|---|---|
| 3 | Ananya | 2024-07-01 |
| 2 | Bhavna | 2022-03-10 |
| 4 | Chetan | 2021-09-05 |
| 1 | Amit | 2020-01-15 |

⚙️ Step 4: Why This Query Works

Ordering controls presentation; databases optimize sorting but large sorts can be costly—consider indexing if needed.


⚡ Step 5: Common Mistakes

  • Forgetting DESC and getting oldest first.
  • Sorting on string date instead of date datatype.

🧠 Step 6: Alternate Approach (Optional)

-- Show top 10 newest hires
SELECT emp_id, name, joined_date
FROM employees
ORDER BY joined_date DESC
LIMIT 10;

🎯 Interview Tip

State whether joined_date is stored as DATE/TIMESTAMP to avoid string-sorting issues.


🪄 End of Q4.


❓ Q5: Get the top 10 highest-paid employees.

🧩 Problem Statement

Return the 10 employees with the largest salaries. If tie-handling is required, define a deterministic order.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 120000 |
| 2 | Bhavna Rai | 90000 |
| 3 | Ananya Roy | 120000 |
| 4 | Chetan Joshi | 80000 |
| 5 | Deepa Shah | 75000 |

(Assume more rows beyond 5 in real table.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: ORDER BY + LIMIT (or window functions for ties).

🏗️ SQL Query

SELECT emp_id, name, salary
FROM employees
ORDER BY salary DESC
LIMIT 10;

(If you want deterministic tie-break: ORDER BY salary DESC, emp_id ASC)

🔍 Explanation

Sort by salary descending, then take the first 10 rows.


🧾 Step 3: Expected Output (top few shown)

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 120000 |
| 3 | Ananya Roy | 120000 |
| 2 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 80000 |

⚙️ Step 4: Why This Query Works

Ordering then limiting fetches top rows efficiently; use index on salary to improve speed.


⚡ Step 5: Common Mistakes

  • Using TOP syntax in non-supporting DBs (use LIMIT or FETCH FIRST).
  • Not specifying tie-breaker for reproducibility.

🧠 Step 6: Alternate Approach (Optional)

-- Using ROW_NUMBER to handle ties / pagination
WITH ranked AS (
  SELECT emp_id, name, salary,
         ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rn
  FROM employees
)
SELECT emp_id, name, salary
FROM ranked
WHERE rn <= 10;

🎯 Interview Tip

Discuss tie-handling and how pagination will be implemented for large result sets.
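To see how tie-handling differs between ROW_NUMBER and RANK, here is a runnable sketch (Python sqlite3, data from the sample table above). ROW_NUMBER breaks the 120000 tie arbitrarily unless given a tie-breaker; RANK gives both the same rank and then skips:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, name TEXT, salary INTEGER);
INSERT INTO employees VALUES
 (1,'Amit Sharma',120000),(2,'Bhavna Rai',90000),
 (3,'Ananya Roy',120000),(4,'Chetan Joshi',80000);
""")

rows = conn.execute("""
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rn,  -- unique, deterministic
       RANK()       OVER (ORDER BY salary DESC)         AS rnk  -- ties share a rank, gaps follow
FROM employees
ORDER BY rn
""").fetchall()
# rows → [('Amit Sharma', 120000, 1, 1), ('Ananya Roy', 120000, 2, 1),
#         ('Bhavna Rai', 90000, 3, 3), ('Chetan Joshi', 80000, 4, 4)]
```

With `WHERE rnk <= 10` you would keep all tied rows at the boundary; `WHERE rn <= 10` returns exactly 10.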


🪄 End of Q5.


❓ Q6: Retrieve all transactions made after January 1, 2024.

🧩 Problem Statement

Return transactions with transaction_date strictly after 2024-01-01.


💾 Step 1: Assume Example Tables

transactions

| tx_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 1 | 1 | 1200 | 2023-12-20 |
| 2 | 2 | 5500 | 2024-01-05 |
| 3 | 1 | 3000 | 2024-03-11 |
| 4 | 3 | 450 | 2022-08-10 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: date comparison with WHERE.

🏗️ SQL Query

SELECT tx_id, customer_id, amount, transaction_date
FROM transactions
WHERE transaction_date > '2024-01-01';

(If the column is a DATETIME/TIMESTAMP, note that > '2024-01-01' means after midnight, so it still includes later times on Jan 1; use >= '2024-01-02' to exclude the whole day.)

🔍 Explanation

Rows with date strictly later than Jan 1, 2024 are returned.


🧾 Step 3: Expected Output

| tx_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 2 | 2 | 5500 | 2024-01-05 |
| 3 | 1 | 3000 | 2024-03-11 |

⚙️ Step 4: Why This Query Works

Date comparisons use chronological ordering; ensure correct datatype.


⚡ Step 5: Common Mistakes

  • Using >= '2024-01-01' when you want strictly after (>= includes Jan 1 itself).
  • Comparing strings if dates stored as text with inconsistent format.

🧠 Step 6: Alternate Approach (Optional)

-- Using DATE() conversion (if transaction_date includes time)
SELECT *
FROM transactions
WHERE DATE(transaction_date) > '2024-01-01';

🎯 Interview Tip

Clarify whether “after” includes the date (>=) or strictly greater (>).


🪄 End of Q6.


❓ Q7: Find the total salary expense for each department.

🧩 Problem Statement

Group employees by department and sum salaries to get department-level salary expense.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | dept_name | salary |
|---|---|---|---|---|
| 1 | Amit | 10 | Sales | 60000 |
| 2 | Ananya | 10 | Sales | 70000 |
| 3 | Bhavna | 20 | HR | 55000 |
| 4 | Chetan | 30 | Eng | 80000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: GROUP BY + aggregate SUM.

🏗️ SQL Query

SELECT dept_id, dept_name, SUM(salary) AS total_salary_expense
FROM employees
GROUP BY dept_id, dept_name;

🔍 Explanation

Group rows by department and compute sum of salary per group.


🧾 Step 3: Expected Output

| dept_id | dept_name | total_salary_expense |
|---|---|---|
| 10 | Sales | 130000 |
| 20 | HR | 55000 |
| 30 | Eng | 80000 |

⚙️ Step 4: Why This Query Works

GROUP BY collapses rows into department buckets; SUM adds salaries within each bucket.


⚡ Step 5: Common Mistakes

  • Omitting dept_name from GROUP BY when selected.
  • Using COUNT instead of SUM.

🧠 Step 6: Alternate Approach (Optional)

-- With join to a departments table
SELECT d.dept_id, d.dept_name, COALESCE(SUM(e.salary),0) AS total_salary
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name;

This includes departments with zero employees.


🎯 Interview Tip

Mention COALESCE to show departments with no employees return 0 instead of NULL.


🪄 End of Q7.


❓ Q8: Count how many employees belong to each department.

🧩 Problem Statement

For each department, count employees (i.e., headcount per dept).


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | dept_name |
|---|---|---|---|
| 1 | Amit | 10 | Sales |
| 2 | Ananya | 10 | Sales |
| 3 | Bhavna | 20 | HR |
| 4 | Chetan | NULL | NULL |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: COUNT(*) with GROUP BY. Decide whether to include NULL dept.

🏗️ SQL Query

SELECT dept_id, dept_name, COUNT(*) AS employee_count
FROM employees
WHERE dept_id IS NOT NULL
GROUP BY dept_id, dept_name;

(To include NULLs as a separate group, remove the WHERE clause.)

🔍 Explanation

Group by dept and count rows per group.


🧾 Step 3: Expected Output

| dept_id | dept_name | employee_count |
|---|---|---|
| 10 | Sales | 2 |
| 20 | HR | 1 |

⚙️ Step 4: Why This Query Works

COUNT(*) counts rows in each department group.


⚡ Step 5: Common Mistakes

  • Using COUNT(dept_id) which ignores NULLs (sometimes desired).
  • Forgetting GROUP BY for non-aggregated columns.

🧠 Step 6: Alternate Approach (Optional)

-- Include departments with zero employees using departments table
SELECT d.dept_id, d.dept_name, COUNT(e.emp_id) AS employee_count
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name;

🎯 Interview Tip

Explain the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column).
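The difference is easy to demonstrate. A sketch in Python's sqlite3 with one NULL dept_id and one duplicate dept_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, dept_id INTEGER);
INSERT INTO employees VALUES (1,10),(2,10),(3,20),(4,NULL);
""")

row = conn.execute(
    # COUNT(*) counts rows; COUNT(col) skips NULLs; COUNT(DISTINCT col) skips NULLs and dedupes.
    "SELECT COUNT(*), COUNT(dept_id), COUNT(DISTINCT dept_id) FROM employees"
).fetchone()
# row → (4, 3, 2)
```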


🪄 End of Q8.


❓ Q9: Get departments with more than 5 employees.

🧩 Problem Statement

Find departments whose employee count exceeds 5.


💾 Step 1: Assume Example Tables

employees

(Only showing relevant counts.)

| dept_id | dept_name | employee_count (derived) |
|---|---|---|
| 10 | Sales | - |
| 20 | HR | - |

We will compute counts from employees.


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: GROUP BY + HAVING for filtering groups.

🏗️ SQL Query

SELECT dept_id, dept_name, COUNT(*) AS employee_count
FROM employees
GROUP BY dept_id, dept_name
HAVING COUNT(*) > 5;

🔍 Explanation

HAVING filters aggregated groups by their computed count.


🧾 Step 3: Expected Output

| dept_id | dept_name | employee_count |
|---|---|---|
| 50 | Ops | 12 |
| 70 | Sales | 8 |

(Example departments meeting the criterion.)


⚙️ Step 4: Why This Query Works

WHERE cannot filter aggregated results; HAVING can.


⚡ Step 5: Common Mistakes

  • Using WHERE COUNT(*) > 5 (invalid).
  • Forgetting GROUP BY on selected non-aggregated columns.

🧠 Step 6: Alternate Approach (Optional)

-- Using subquery
SELECT dept_id, dept_name, employee_count
FROM (
  SELECT dept_id, dept_name, COUNT(*) AS employee_count
  FROM employees
  GROUP BY dept_id, dept_name
) t
WHERE employee_count > 5;

🎯 Interview Tip

Explain HAVING vs WHERE clearly before coding.


🪄 End of Q9.


❓ Q10: Find the average salary per job title.

🧩 Problem Statement

Group by job_title and compute average salary for each title.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | job_title | salary |
|---|---|---|---|
| 1 | Amit | Sales Executive | 60000 |
| 2 | Ananya | Sales Executive | 70000 |
| 3 | Bhavna | HR Manager | 90000 |
| 4 | Chetan | Engineer | 80000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: AVG() aggregate with GROUP BY.

🏗️ SQL Query

SELECT job_title, ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY job_title;

🔍 Explanation

AVG(salary) calculates the mean salary per job title. ROUND formats to 2 decimals.


🧾 Step 3: Expected Output

| job_title | avg_salary |
|---|---|
| Sales Executive | 65000.00 |
| HR Manager | 90000.00 |
| Engineer | 80000.00 |

⚙️ Step 4: Why This Query Works

Standard aggregation by category returns an average per job title.


⚡ Step 5: Common Mistakes

  • Integer truncation: in some DBs (e.g. SQL Server) AVG over an integer column returns an integer, so cast to a decimal type first; ROUND only formats the result.
  • Selecting non-aggregated columns without grouping.

🧠 Step 6: Alternate Approach (Optional)

-- Exclude zero or NULL salaries
SELECT job_title, AVG(salary) AS avg_salary
FROM employees
WHERE salary IS NOT NULL AND salary > 0
GROUP BY job_title;

🎯 Interview Tip

Clarify how to treat NULL salaries and outliers before computing averages.


🪄 End of Q10.


❓ Q11: Retrieve total sales per region.

🧩 Problem Statement

Group transactions by region and sum amount to compute total sales per region.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | region | amount |
|---|---|---|---|
| 1 | 1 | West | 1200 |
| 2 | 2 | North | 5500 |
| 3 | 1 | West | 3000 |
| 4 | 3 | South | 450 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: GROUP BY region + SUM(amount).

🏗️ SQL Query

SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region;

🔍 Explanation

Sums order amounts grouped by region.


🧾 Step 3: Expected Output

| region | total_sales |
|---|---|
| West | 4200 |
| North | 5500 |
| South | 450 |

⚙️ Step 4: Why This Query Works

Aggregates amounts per region to produce regional totals.


⚡ Step 5: Common Mistakes

  • Forgetting GROUP BY.
  • Not handling NULL region values (may want to group them separately).

🧠 Step 6: Alternate Approach (Optional)

SELECT COALESCE(region, 'Unknown') AS region, SUM(amount) AS total_sales
FROM orders
GROUP BY COALESCE(region, 'Unknown');

🎯 Interview Tip

Discuss currency and timezone normalization if amounts come from different locales.


🪄 End of Q11.


❓ Q12: Count the total number of customers who placed at least one order.

🧩 Problem Statement

Count unique customers appearing in orders (customers with >=1 order).


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1200 |
| 2 | 2 | 5500 |
| 3 | 1 | 3000 |
| 4 | 4 | 450 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concept: COUNT(DISTINCT customer_id).

🏗️ SQL Query

SELECT COUNT(DISTINCT customer_id) AS customers_with_orders
FROM orders;

🔍 Explanation

DISTINCT ensures each customer counted once, even with multiple orders.


🧾 Step 3: Expected Output

| customers_with_orders |
|---|
| 3 |

(Customers: 1, 2, and 4)


⚙️ Step 4: Why This Query Works

Counting distinct IDs gives unique customer count.


⚡ Step 5: Common Mistakes

  • Using COUNT(*) which counts rows (orders), not unique customers.
  • Counting distinct on wrong column.

🧠 Step 6: Alternate Approach (Optional)

SELECT COUNT(*) FROM (SELECT 1 FROM orders GROUP BY customer_id) t;

Equivalent but less direct.


🎯 Interview Tip

If large tables, mention potential cost of DISTINCT and ways to optimize (indexes, pre-aggregated tables).


🪄 End of Q12.


❓ Q13: Find customers who made transactions above $5,000 more than once.

🧩 Problem Statement

Find customers who have at least two transactions where amount > 5000 (i.e., count of such transactions per customer > 1).


💾 Step 1: Assume Example Tables

transactions

| tx_id | customer_id | amount | tx_date |
|---|---|---|---|
| 1 | 1 | 6000 | 2024-02-01 |
| 2 | 2 | 7000 | 2024-03-01 |
| 3 | 1 | 5500 | 2024-04-01 |
| 4 | 3 | 4000 | 2024-05-01 |
| 5 | 2 | 2000 | 2024-06-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: filtering + grouping + HAVING.

🏗️ SQL Query

SELECT customer_id, COUNT(*) AS high_value_tx_count
FROM transactions
WHERE amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;

🔍 Explanation

Filter transactions > 5000, then count per customer, and keep those with count > 1.


🧾 Step 3: Expected Output

| customer_id | high_value_tx_count |
|---|---|
| 1 | 2 |

(Customer 1 has two high-value transactions; customer 2 has only one >5000.)


⚙️ Step 4: Why This Query Works

WHERE narrows rows, GROUP BY groups per customer, HAVING filters groups by count.


⚡ Step 5: Common Mistakes

  • Filtering on amount in HAVING instead of WHERE (invalid for a non-aggregated column in most DBs, and less efficient even where allowed).
  • Forgetting GROUP BY.

🧠 Step 6: Alternate Approach (Optional)

SELECT customer_id
FROM (
  SELECT customer_id,
         SUM(CASE WHEN amount > 5000 THEN 1 ELSE 0 END) AS cnt
  FROM transactions
  GROUP BY customer_id
) t
WHERE cnt > 1;

Useful when counting multiple conditional buckets.
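For example, two conditional buckets can be counted in one pass. A runnable sketch (Python sqlite3, data mirroring the transactions table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions(tx_id INTEGER, customer_id INTEGER, amount INTEGER);
INSERT INTO transactions VALUES
 (1,1,6000),(2,2,7000),(3,1,5500),(4,3,4000),(5,2,2000);
""")

rows = conn.execute("""
SELECT customer_id,
       SUM(CASE WHEN amount >  5000 THEN 1 ELSE 0 END) AS high_cnt,  -- high-value bucket
       SUM(CASE WHEN amount <= 5000 THEN 1 ELSE 0 END) AS low_cnt   -- low-value bucket
FROM transactions
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()
# rows → [(1, 2, 0), (2, 1, 1), (3, 0, 1)]
```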


🎯 Interview Tip

Discuss performance trade-offs when transactions is large: an index on amount or partitioning could help.


🪄 End of Q13.


❓ Q14: Identify the most profitable regions by total transaction amount.

🧩 Problem Statement

Compute total transaction amount per region and rank regions by descending totals to find top performers.


💾 Step 1: Assume Example Tables

transactions

| tx_id | region | amount |
|---|---|---|
| 1 | West | 12000 |
| 2 | North | 5500 |
| 3 | West | 3000 |
| 4 | South | 450 |
| 5 | North | 8000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: aggregation + ordering + optional ranking.

🏗️ SQL Query

SELECT region, SUM(amount) AS total_amount
FROM transactions
GROUP BY region
ORDER BY total_amount DESC;

(To get top N regions: add LIMIT N.)

🔍 Explanation

Sums amounts per region and sorts highest to lowest.


🧾 Step 3: Expected Output

| region | total_amount |
|---|---|
| West | 15000 |
| North | 13500 |
| South | 450 |

⚙️ Step 4: Why This Query Works

Group & sum reveals aggregated revenue per region; ORDER BY shows rank.


⚡ Step 5: Common Mistakes

  • Forgetting GROUP BY.
  • Not handling NULL regions.

🧠 Step 6: Alternate Approach (Optional)

-- Add rank column
SELECT region, total_amount,
       RANK() OVER (ORDER BY total_amount DESC) AS region_rank
FROM (
  SELECT region, SUM(amount) AS total_amount
  FROM transactions
  GROUP BY region
) t
ORDER BY total_amount DESC;

🎯 Interview Tip

Discuss whether revenue or profit should be used; profit requires cost data.


🪄 End of Q14.


❓ Q15: Find customers who have not purchased anything in the last 6 months.

🧩 Problem Statement

Return customers with no orders/transactions in the past 6 months (relative to a reference date). For this example assume reference date = 2024-10-31 (today-like).


💾 Step 1: Assume Example Tables

customers

| customer_id | name | joined_date |
|---|---|---|
| 1 | Raj | 2021-05-10 |
| 2 | Aisha | 2022-02-01 |
| 3 | Sunil | 2023-07-22 |

orders

| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-01-01 |
| 2 | 2 | 2024-08-01 |
| 3 | 1 | 2024-03-05 |

(Reference date for “last 6 months”: 2024-10-31. Six months before: 2024-04-30.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Concepts: anti-join (LEFT JOIN ... WHERE ... IS NULL) or NOT EXISTS. Date arithmetic needed.

🏗️ SQL Query (preferred NOT EXISTS)

SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > DATE '2024-04-30'  -- 6 months before 2024-10-31
);

(If your DB has CURRENT_DATE, compute CURRENT_DATE - INTERVAL '6 months'.)
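A runnable sketch of the anti-join (Python sqlite3, sample data mirroring the tables above, cutoff passed as a bound parameter). With this data, both Raj (last order 2024-03-05) and Sunil (no orders) qualify:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(customer_id INTEGER, name TEXT);
CREATE TABLE orders(order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1,'Raj'),(2,'Aisha'),(3,'Sunil');
INSERT INTO orders VALUES (1,1,'2024-01-01'),(2,2,'2024-08-01'),(3,1,'2024-03-05');
""")

cutoff = "2024-04-30"  # in production: derive from CURRENT_DATE / date('now','-6 months')
rows = conn.execute("""
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > ?          -- ISO dates compare correctly even as text
)
""", (cutoff,)).fetchall()
# rows → [(1, 'Raj'), (3, 'Sunil')]
```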

🔍 Explanation

For each customer, check if any order exists after the cutoff date. If none, return the customer.


🧾 Step 3: Expected Output

| customer_id | name |
|---|---|
| 1 | Raj |
| 3 | Sunil |

(Neither Raj nor Sunil has an order after 2024-04-30; Raj's last order was 2024-03-05 and Sunil never ordered. Aisha's 2024-08-01 order keeps her active.)


⚙️ Step 4: Why This Query Works

NOT EXISTS efficiently filters customers with zero qualifying orders.


⚡ Step 5: Common Mistakes

  • Using LEFT JOIN incorrectly and forgetting WHERE o.order_id IS NULL.
  • Wrong date arithmetic or inclusive/exclusive boundary confusion.

🧠 Step 6: Alternate Approach (Optional)

-- LEFT JOIN method
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
  AND o.order_date > DATE '2024-04-30'
WHERE o.order_id IS NULL;

🎯 Interview Tip

Clarify the reference date and whether “6 months” is exact calendar months or 180 days.


🪄 End of Q15.


❓ Q16: Calculate ARPU (Average Revenue Per User).

🧩 Problem Statement

ARPU = (total revenue in period) / (number of active users in period). For simplicity, compute for all-time: total transaction amount divided by number of unique customers who made at least one transaction.


💾 Step 1: Assume Example Tables

transactions

| tx_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 2000 |
| 3 | 1 | 3000 |
| 4 | 3 | 500 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Compute aggregate sum and distinct count, then divide. Use NULLIF to prevent division by zero.

🏗️ SQL Query

SELECT
  ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) AS arpu
FROM transactions;

🔍 Explanation

  • SUM(amount) total revenue.
  • COUNT(DISTINCT customer_id) active users.
  • NULLIF(...,0) avoids division by zero if no customers.

🧾 Step 3: Expected Output

| arpu |
|---|
| 2166.67 |

(Total revenue = 6500; unique customers = 3 → 6500/3 ≈ 2166.67)


⚙️ Step 4: Why This Query Works

Direct arithmetic on aggregates yields the average revenue per active user.


⚡ Step 5: Common Mistakes

  • Dividing by COUNT(*) (orders) instead of distinct users.
  • Not handling zero users.

🧠 Step 6: Alternate Approach (Optional)

Compute ARPU for a specific period using date filters:

SELECT ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) AS arpu
FROM transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-09-30';

🎯 Interview Tip

Define active user (e.g., made at least one purchase in period) before coding.


🪄 End of Q16.


❓ Q17: Calculate the monthly customer retention rate.

🧩 Problem Statement

Monthly retention rate = (customers active this month who were also active previous month) / (customers active previous month). We’ll compute for two months example: previous month = Sept 2024, current month = Oct 2024.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-09-10 |
| 2 | 2 | 2024-09-20 |
| 3 | 1 | 2024-10-05 |
| 4 | 3 | 2024-10-15 |
| 5 | 2 | 2024-08-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

We need sets: users active in previous month (Sept) and current month (Oct), then compute intersection size / previous month size.

🏗️ SQL Query

WITH prev_month AS (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date BETWEEN '2024-09-01' AND '2024-09-30'
),
curr_month AS (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date BETWEEN '2024-10-01' AND '2024-10-31'
),
counts AS (
  SELECT
    (SELECT COUNT(*) FROM prev_month) AS prev_count,
    (SELECT COUNT(*) FROM curr_month) AS curr_count,
    (SELECT COUNT(*) FROM prev_month pm
     JOIN curr_month cm ON pm.customer_id = cm.customer_id) AS retained_count
)
SELECT
  prev_count,
  curr_count,
  retained_count,
  CASE WHEN prev_count = 0 THEN NULL
       ELSE ROUND((retained_count::numeric / prev_count) * 100, 2)
  END AS retention_rate_percent
FROM counts;

(For DBs without ::numeric, use CAST.)

🔍 Explanation

  • Gather distinct customers per month.
  • Count previous, current, and intersection.
  • Retention = retained / prev * 100.

🧾 Step 3: Expected Output

| prev_count | curr_count | retained_count | retention_rate_percent |
|---|---|---|---|
| 2 | 2 | 1 | 50.00 |

(Prev month customers: 1 and 2; current: 1 and 3; retained is customer 1 → 1/2 = 50%)


⚙️ Step 4: Why This Query Works

Set operations via CTEs create clear intermediate results; join computes intersection.


⚡ Step 5: Common Mistakes

  • Forgetting DISTINCT leading to inflated counts.
  • Using current month size as denominator instead of previous month.

🧠 Step 6: Alternate Approach (Optional)

Compute rolling retention for many months by month buckets and lagging; more advanced but same idea.
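A runnable sketch of that rolling idea (Python sqlite3, same sample orders): bucket active customers by month with strftime, then self-join each month to the following month to count retained customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders(order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
 (1,1,'2024-09-10'),(2,2,'2024-09-20'),(3,1,'2024-10-05'),
 (4,3,'2024-10-15'),(5,2,'2024-08-01');
""")

rows = conn.execute("""
WITH monthly AS (
  SELECT DISTINCT customer_id,
         strftime('%Y-%m', order_date) AS ym   -- month bucket per active customer
  FROM orders
)
SELECT prev.ym AS prev_month,
       COUNT(DISTINCT prev.customer_id) AS prev_count,
       COUNT(DISTINCT curr.customer_id) AS retained
FROM monthly prev
LEFT JOIN monthly curr
  ON curr.customer_id = prev.customer_id
 AND curr.ym = strftime('%Y-%m', date(prev.ym || '-01', '+1 month'))  -- the next month
GROUP BY prev.ym
ORDER BY prev.ym
""").fetchall()
# rows → [('2024-08', 1, 1), ('2024-09', 2, 1), ('2024-10', 2, 0)]
```

Retention rate per month is then retained / prev_count; the date arithmetic syntax varies by engine (e.g. DATE_TRUNC + INTERVAL in PostgreSQL).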


🎯 Interview Tip

Explain definition of “active” and whether to use purchases, logins, or other events to define activity.


🪄 End of Q17.


❓ Q18: Identify churned customers (inactive in the last 6 months).

🧩 Problem Statement

Return customers with no orders in the past 6 months (similar to Q15 but labelled as churn). Use a reference date (assume 2024-10-31 → cutoff 2024-04-30).


💾 Step 1: Assume Example Tables

customers

| customer_id | name |
|---|---|
| 1 | Raj |
| 2 | Aisha |
| 3 | Sunil |
| 4 | Priya |

orders

| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-01-10 |
| 2 | 2 | 2023-12-05 |
| 3 | 3 | 2024-05-01 |
| 4 | 4 | 2024-03-15 |

Cutoff: 2024-04-30. Customers whose most recent order is on or before the cutoff (or who never ordered at all) are churned.


🧮 Step 2: Step-by-Step Solution

🧠 Understanding the Problem

Find customers whose most recent order is older than the cutoff or who never ordered.

🏗️ SQL Query

WITH last_order AS (
  SELECT customer_id, MAX(order_date) AS last_order_date
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_id, c.name, lo.last_order_date
FROM customers c
LEFT JOIN last_order lo ON c.customer_id = lo.customer_id
WHERE COALESCE(lo.last_order_date, DATE '1900-01-01') <= DATE '2024-04-30';

(Here COALESCE treats customers with no orders as churned.)

🔍 Explanation

  • last_order finds each customer’s latest order date.
  • Left join to include customers without orders.
  • Filter where last order is on/before cutoff → churned.

🧾 Step 3: Expected Output

| customer_id | name | last_order_date |
|---|---|---|
| 1 | Raj | 2024-01-10 |
| 2 | Aisha | 2023-12-05 |
| 4 | Priya | 2024-03-15 |

(Customer 3 had order on 2024-05-01 > cutoff, so not churned.)


⚙️ Step 4: Why This Query Works

The latest order date captures each customer’s recency; COALESCE treats customers who never ordered as churned.


⚡ Step 5: Common Mistakes

  • Using WHERE on orders before aggregation, which can drop customers entirely.
  • Not handling customers with no orders.

🧠 Step 6: Alternate Approach (Optional)

-- Using NOT EXISTS to find customers with no orders after cutoff
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > DATE '2024-04-30'
);

🎯 Interview Tip

State churn definition clearly: “no purchase in X period” vs “decreasing engagement”, and adjust query accordingly.


🪄 End of Q18.


———

👩‍💼 Data Analyst

❓ Q1: Combine employee and department data using an INNER JOIN.

🧩 Problem Statement

Show each employee’s name along with their department name. Only include employees who belong to an existing department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit Sharma | 10 |
| 2 | Ananya Roy | 20 |
| 3 | Bhavna Rai | 30 |
| 4 | Chetan Joshi | 40 |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: INNER JOIN returns rows where a match exists in both tables.

🏗️ SQL Query

SELECT e.emp_id,
       e.name,
       d.dept_name
FROM employees e
INNER JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

  • e and d are aliases for clarity.
  • Only employees whose dept_id appears in departments are shown.

🧾 Step 3: Expected Output

| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Ananya Roy | HR |
| 3 | Bhavna Rai | Engineering |

Chetan (Dept 40) is excluded — no matching department.


⚙️ Step 4: Why This Query Works

INNER JOIN merges only intersecting rows by dept_id, giving valid employee-department pairs.


⚡ Step 5: Common Mistakes

  • Forgetting the ON condition → Cartesian join.
  • Selecting columns with same name without prefix → ambiguity.

🧠 Step 6: Alternate Approach

SELECT e.*, d.dept_name
FROM employees e, departments d
WHERE e.dept_id = d.dept_id;

Old-style syntax — functionally same but less readable.


🎯 Interview Tip

Mention that INNER JOIN filters unmatched rows, ideal for integrity-checked relations.


🔍 Interviewer’s Analysis

  • Concept Tested: INNER JOIN
  • Difficulty: Beginner
  • Complexity: ≈ O(n × m) without index; O(n log m) with index
  • Follow-Ups: Explain LEFT JOIN vs INNER JOIN; Indexing strategy on dept_id.

🪄 End of Q1.


❓ Q2: List all employees with their departments (even if unassigned) using a LEFT JOIN.

🧩 Problem Statement

Return every employee and their department name. If an employee has no department, still include them with NULL as department.


💾 Step 1: Assume Example Tables

Same as Q1 but keep Chetan’s department 40 missing.


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

LEFT JOIN keeps all rows from the left table (employees), even when no match on the right.

🏗️ SQL Query

SELECT e.emp_id,
       e.name,
       d.dept_name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

All employees appear; unmatched ones show NULL for department.


🧾 Step 3: Expected Output

| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Ananya Roy | HR |
| 3 | Bhavna Rai | Engineering |
| 4 | Chetan Joshi | NULL |

⚙️ Step 4: Why This Query Works

LEFT JOIN includes all employees (left table), filling missing matches with NULL.


⚡ Step 5: Common Mistakes

  • Using INNER JOIN instead → drops unassigned employees.
  • Forgetting to check for NULL when filtering post-join.

🧠 Step 6: Alternate Approach

SELECT e.emp_id, e.name,
       COALESCE(d.dept_name, 'Unassigned') AS dept_name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id;

COALESCE replaces NULL with “Unassigned”.


🎯 Interview Tip

Highlight that LEFT JOIN preserves completeness of the left entity — vital in reports.


🔍 Interviewer’s Analysis

  • Concept Tested: LEFT JOIN / NULL handling
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: How to show departments with no employees (→ RIGHT JOIN).

🪄 End of Q2.


❓ Q3: Retrieve all data from both tables with a FULL OUTER JOIN.

🧩 Problem Statement

Show every employee and every department. Include:

  • employees without departments
  • departments without employees.

💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit | 10 |
| 2 | Ananya | 20 |
| 3 | Chetan | 40 |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

FULL OUTER JOIN = LEFT JOIN UNION RIGHT JOIN. It keeps all rows from both sides.

🏗️ SQL Query

SELECT e.emp_id,
       e.name,
       e.dept_id AS emp_dept,
       d.dept_name
FROM employees e
FULL OUTER JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

  • Matches where possible.
  • When no match, one side’s columns → NULL.

🧾 Step 3: Expected Output

| emp_id | name | emp_dept | dept_name |
|---|---|---|---|
| 1 | Amit | 10 | Sales |
| 2 | Ananya | 20 | HR |
| 3 | Chetan | 40 | NULL |
| NULL | NULL | 30 | Engineering |

⚙️ Step 4: Why This Query Works

FULL OUTER JOIN merges the results of left and right joins, ensuring nothing is lost.


⚡ Step 5: Common Mistakes

  • Using DB that doesn’t support FULL OUTER JOIN (PostgreSQL yes, MySQL no → need UNION).
  • Forgetting to distinguish which dept_id comes from which table.

🧠 Step 6: Alternate Approach

SELECT e.emp_id, e.name, e.dept_id AS emp_dept, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.emp_id, e.name, e.dept_id, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

Equivalent manual implementation.
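A close variant that needs only LEFT JOIN is handy on engines that support neither FULL OUTER nor RIGHT JOIN (e.g. older SQLite): take the left join, then UNION ALL the right-side rows that found no match. A runnable sketch with the sample tables above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees(emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE departments(dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES (1,'Amit',10),(2,'Ananya',20),(3,'Chetan',40);
INSERT INTO departments VALUES (10,'Sales'),(20,'HR'),(30,'Engineering');
""")

rows = conn.execute("""
SELECT e.emp_id, e.name, e.dept_id AS emp_dept, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id   -- all employees
UNION ALL
SELECT NULL, NULL, d.dept_id, d.dept_name
FROM departments d
LEFT JOIN employees e ON e.dept_id = d.dept_id     -- departments with no employees
WHERE e.emp_id IS NULL
ORDER BY emp_dept
""").fetchall()
# rows → [(1, 'Amit', 10, 'Sales'), (2, 'Ananya', 20, 'HR'),
#         (None, None, 30, 'Engineering'), (3, 'Chetan', 40, None)]
```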


🎯 Interview Tip

Explain which DBs lack native FULL OUTER JOIN and how to emulate it.


🔍 Interviewer’s Analysis

  • Concept Tested: FULL OUTER JOIN / Set operations
  • Difficulty: Intermediate
  • Follow-Ups: “Show only unmatched rows” (→ use WHERE dept_name IS NULL OR name IS NULL).

❓ Q4: Perform a SELF JOIN to show employees along with their managers.

🧩 Problem Statement

Display each employee’s name together with their manager’s name and department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | manager_id |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 4 |
| 2 | Ananya Roy | 10 | 4 |
| 3 | Bhavna Rai | 20 | 5 |
| 4 | David Kumar | 10 | NULL |
| 5 | Erin Singh | 20 | NULL |

  • manager_id points to another row’s emp_id — hence “self” join.

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept tested: SELF JOIN — joining a table to itself with aliases.

🏗️ SQL Query

SELECT e.name       AS employee_name,
       m.name       AS manager_name,
       e.dept_id
FROM employees e
LEFT JOIN employees m
  ON e.manager_id = m.emp_id;

🔍 Explanation

  • e = employee copy, m = manager copy.
  • LEFT JOIN keeps top-level managers (NULL manager id).

🧾 Step 3: Expected Output

| employee_name | manager_name | dept_id |
|---|---|---|
| Amit Sharma | David Kumar | 10 |
| Ananya Roy | David Kumar | 10 |
| Bhavna Rai | Erin Singh | 20 |
| David Kumar | NULL | 10 |
| Erin Singh | NULL | 20 |

⚙️ Step 4: Why This Query Works

The same table is scanned twice — once as employees, once as managers — matching on manager_id = emp_id.


⚡ Step 5: Common Mistakes

  • Forgetting aliases → ambiguous column error.
  • Using INNER JOIN → managers only (drops top bosses).

🧠 Step 6: Alternate Approach

SELECT e.name, m.name AS manager_name
FROM employees e
JOIN employees m USING (emp_id)

(Syntactically legal but logically wrong: USING (emp_id) joins each employee to themselves, not to their manager — it shows why aliases plus an explicit e.manager_id = m.emp_id condition are required.)


🎯 Interview Tip

Explain clearly the difference between self-join and hierarchical CTE (for multi-level chains).


🔍 Interviewer’s Analysis

  • Concept Tested: Self-Join / Aliasing
  • Difficulty: Intermediate
  • Follow-Ups: Find multi-level chain; show manager count per manager.

🪄 End of Q4.


❓ Q5: Find employees who do not belong to any department.

🧩 Problem Statement

Return employees whose dept_id does not exist in the departments table.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit Sharma | 10 |
| 2 | Ananya Roy | 20 |
| 3 | Bhavna Rai | 40 |
| 4 | Chetan Joshi | NULL |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Anti-Join using LEFT JOIN … IS NULL or NOT IN.

🏗️ SQL Query

SELECT e.emp_id, e.name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;

🔍 Explanation

  • LEFT JOIN keeps all employees.
  • The WHERE d.dept_id IS NULL condition isolates those without a matching department.

🧾 Step 3: Expected Output

| emp_id | name |
|---|---|
| 3 | Bhavna Rai |
| 4 | Chetan Joshi |

⚙️ Step 4: Why This Query Works

Unmatched rows (from left table) show NULLs on the right — filtering those identifies missing links.


⚡ Step 5: Common Mistakes

  • Using INNER JOIN (returns only matched rows).
  • Using NOT IN without handling NULLs → no rows returned if NULL exists in subquery.
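The NOT IN trap is easy to reproduce. A minimal sketch with SQLite via Python's sqlite3, where a single NULL in the subquery silences every row while NOT EXISTS keeps working:

```python
import sqlite3

# One NULL in the subquery makes `dept_id NOT IN (...)` evaluate to
# UNKNOWN for every unmatched row, so the "broken" query returns nothing;
# NOT EXISTS is unaffected.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT);
CREATE TABLE departments (dept_id INT);
INSERT INTO employees VALUES (1,'Amit',10),(3,'Bhavna',40);
INSERT INTO departments VALUES (10),(20),(NULL);
""")
broken = conn.execute("""
SELECT name FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments)
""").fetchall()
fixed = conn.execute("""
SELECT name FROM employees e
WHERE NOT EXISTS (SELECT 1 FROM departments d WHERE d.dept_id = e.dept_id)
""").fetchall()
print(broken, fixed)
```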

🧠 Step 6: Alternate Approach

SELECT emp_id, name
FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE dept_id IS NOT NULL);

🎯 Interview Tip

If interviewer asks “how to optimize,” mention that an anti-join using NOT EXISTS often beats NOT IN.


🔍 Interviewer’s Analysis

  • Concept Tested: Left Anti-Join
  • Difficulty: Intermediate
  • Follow-Ups: Handle NULL values in subqueries; compare NOT IN vs NOT EXISTS.

🪄 End of Q5.


❓ Q6: Display products purchased by customers from more than one country.

🧩 Problem Statement

Find products that have been bought by customers in multiple countries, i.e., count of distinct countries > 1.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | country |
|---|---|---|
| 1 | Raj | India |
| 2 | Aisha | India |
| 3 | John | USA |
| 4 | Maria | UK |

orders

| order_id | customer_id | product_id |
|---|---|---|
| 1 | 1 | 100 |
| 2 | 2 | 200 |
| 3 | 3 | 100 |
| 4 | 4 | 300 |
| 5 | 3 | 200 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concepts: JOIN → GROUP BY → COUNT(DISTINCT) with HAVING.

🏗️ SQL Query

SELECT o.product_id,
       COUNT(DISTINCT c.country) AS country_count
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id
GROUP BY o.product_id
HAVING COUNT(DISTINCT c.country) > 1;

🔍 Explanation

  • Join orders ↔ customers to access country.
  • Group by product and count unique countries.
  • Keep only those sold in > 1 country.
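The whole pipeline can be verified on the sample data with an in-memory SQLite database (a quick sketch, not production code):

```python
import sqlite3

# JOIN -> GROUP BY -> COUNT(DISTINCT) -> HAVING on the article's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, name TEXT, country TEXT);
CREATE TABLE orders (order_id INT, customer_id INT, product_id INT);
INSERT INTO customers VALUES
  (1,'Raj','India'),(2,'Aisha','India'),(3,'John','USA'),(4,'Maria','UK');
INSERT INTO orders VALUES (1,1,100),(2,2,200),(3,3,100),(4,4,300),(5,3,200);
""")
rows = conn.execute("""
SELECT o.product_id, COUNT(DISTINCT c.country) AS country_count
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
GROUP BY o.product_id
HAVING COUNT(DISTINCT c.country) > 1
ORDER BY o.product_id
""").fetchall()
print(rows)
```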

🧾 Step 3: Expected Output

| product_id | country_count |
|---|---|
| 100 | 2 |
| 200 | 2 |

(Product 100 bought in India & USA; Product 200 in India & USA.)


⚙️ Step 4: Why This Query Works

Aggregation over joined data computes cross-customer metrics (“distinct countries per product”).


⚡ Step 5: Common Mistakes

  • Forgetting DISTINCT → over-count.
  • Filtering with WHERE instead of HAVING.

🧠 Step 6: Alternate Approach

-- Using a CTE for clarity
WITH product_countries AS (
  SELECT o.product_id, c.country
  FROM orders o
  JOIN customers c ON o.customer_id = c.customer_id
  GROUP BY o.product_id, c.country
)
SELECT product_id
FROM product_countries
GROUP BY product_id
HAVING COUNT(*) > 1;

🎯 Interview Tip

Discuss indexing (customer_id, product_id) for large joins and explain why COUNT(DISTINCT) is CPU-intensive.


🔍 Interviewer’s Analysis

  • Concept Tested: Join + Aggregation + Distinct
  • Difficulty: Intermediate
  • Follow-Ups: Optimize COUNT(DISTINCT); filter by time period.

🪄 End of Q6.


❓ Q7: Find employees who earn more than the average salary.

🧩 Problem Statement

List employees whose salary is greater than the overall average salary across the company.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 75000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 50000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept tested: Subquery returning a single value (the company-wide average).

🏗️ SQL Query

SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
);

🔍 Explanation

  • The inner query computes one scalar — the average salary.
  • The outer query filters all employees whose salary > that value.

🧾 Step 3: Expected Output

| name | salary |
|---|---|
| Ananya Roy | 75000 |
| Bhavna Rai | 90000 |

Average salary = (60000 + 75000 + 90000 + 50000) / 4 = 68750.


⚙️ Step 4: Why This Query Works

The scalar subquery executes once; each row compares its salary to the result.


⚡ Step 5: Common Mistakes

  • Forgetting parentheses around subquery.
  • Trying to compare multiple rows without aggregation → “subquery returns more than one row.”

🧠 Step 6: Alternate Approach

-- Using a window function (no subquery)
SELECT name, salary
FROM (
  SELECT name, salary, AVG(salary) OVER () AS avg_sal
  FROM employees
) t
WHERE salary > avg_sal;

More modern and efficient.
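Both forms can be checked against each other in an in-memory SQLite database (window functions need SQLite 3.25+); a sketch on the sample salaries:

```python
import sqlite3

# Scalar-subquery version vs window-function version return the same rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [("Amit", 60000), ("Ananya", 75000),
                  ("Bhavna", 90000), ("Chetan", 50000)])
sub = conn.execute("""
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY name
""").fetchall()
win = conn.execute("""
SELECT name FROM (
  SELECT name, salary, AVG(salary) OVER () AS avg_sal FROM employees
) WHERE salary > avg_sal
ORDER BY name
""").fetchall()
print(sub, win)
```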


🎯 Interview Tip

If asked, mention that the window function version avoids recalculating averages repeatedly.


🔍 Interviewer’s Analysis

  • Concept Tested: Scalar subquery / Aggregation
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Compare performance between subquery and window approach.

🪄 End of Q7.


❓ Q8: Fetch the second-highest salary using subquery and DENSE_RANK.

🧩 Problem Statement

Retrieve the employee(s) with the second-highest salary — including ties.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 60000 |
| 2 | Ananya Roy | 75000 |
| 3 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 90000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We can either:

  1. Use a subquery with MAX and <> logic, or
  2. Use DENSE_RANK() window function.

We’ll demonstrate both.


🏗️ SQL Query (Approach 1 — Subquery)

SELECT name, salary
FROM employees
WHERE salary = (
  SELECT MAX(salary)
  FROM employees
  WHERE salary < (SELECT MAX(salary) FROM employees)
);

🔍 Explanation

  • Inner-most: Finds highest salary (90000).
  • Middle: Finds max salary below that (75000).
  • Outer: Returns employees with that salary.

🏗️ SQL Query (Approach 2 — Using DENSE_RANK)

SELECT name, salary
FROM (
  SELECT name, salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) t
WHERE rnk = 2;

🔍 Explanation

  • DENSE_RANK() assigns same rank for duplicates.
  • We select rows with rank = 2 → second-highest.

🧾 Step 3: Expected Output

| name | salary |
|---|---|
| Ananya Roy | 75000 |

⚙️ Step 4: Why This Query Works

Both methods logically identify the “runner-up” salary value and filter for matching rows.


⚡ Step 5: Common Mistakes

  • Using LIMIT 2 incorrectly → returns top two, not specifically second.
  • Using ROW_NUMBER (which skips ties).

🎯 Interview Tip

Explain how DENSE_RANK handles duplicates gracefully; ROW_NUMBER would not.
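The tie-handling difference is easy to demonstrate with SQLite (3.25+ for window functions): with two employees tied at the top, DENSE_RANK's rank 2 is the true runner-up, while ROW_NUMBER's row 2 is just one of the tied top earners, chosen arbitrarily:

```python
import sqlite3

# DENSE_RANK finds the runner-up even with a tie at the top;
# ROW_NUMBER's "row 2" lands on one of the tied 90000 earners.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [("Amit", 60000), ("Ananya", 75000),
                  ("Bhavna", 90000), ("Chetan", 90000)])
dense = conn.execute("""
SELECT name FROM (
  SELECT name, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees
) WHERE rnk = 2
""").fetchall()
rownum = conn.execute("""
SELECT name FROM (
  SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees
) WHERE rn = 2
""").fetchall()
print(dense, rownum)
```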


🔍 Interviewer’s Analysis

  • Concept Tested: Subqueries / Ranking Functions
  • Difficulty: Intermediate
  • Follow-Ups: Compare performance; show Nth-highest.

🪄 End of Q8.


❓ Q9: Retrieve products that have never been sold.

🧩 Problem Statement

Find products in the catalog that do not appear in the sales table.


💾 Step 1: Assume Example Tables

products

| product_id | product_name |
|---|---|
| 100 | Laptop |
| 101 | Phone |
| 102 | Tablet |
| 103 | Smartwatch |

sales

| sale_id | product_id | amount |
|---|---|---|
| 1 | 100 | 50000 |
| 2 | 101 | 20000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Anti-Join / NOT EXISTS to find unmatched rows.

🏗️ SQL Query

SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
  SELECT 1
  FROM sales s
  WHERE s.product_id = p.product_id
);

🔍 Explanation

The subquery checks if a product appears in sales. If no match exists, it’s unsold.


🧾 Step 3: Expected Output

| product_id | product_name |
|---|---|
| 102 | Tablet |
| 103 | Smartwatch |

⚙️ Step 4: Why This Query Works

NOT EXISTS efficiently excludes all products present in sales.


⚡ Step 5: Common Mistakes

  • Using NOT IN without NULL handling (fails if sales.product_id has NULL).
  • Using INNER JOIN instead of anti-join.

🧠 Step 6: Alternate Approach

SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
WHERE s.product_id IS NULL;

🎯 Interview Tip

Mention that NOT EXISTS is typically faster than LEFT JOIN for large datasets due to early exits.


🔍 Interviewer’s Analysis

  • Concept Tested: Anti-Join / Exclusion Logic
  • Difficulty: Intermediate
  • Follow-Ups: Performance difference between NOT EXISTS vs LEFT JOIN.

🪄 End of Q9.


❓ Q10: Display customers who never placed an order.

🧩 Problem Statement

List all customers who do not appear in the orders table.


💾 Step 1: Assume Example Tables

customers

| customer_id | name |
|---|---|
| 1 | Raj |
| 2 | Aisha |
| 3 | Sunil |
| 4 | Priya |

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 2000 |
| 3 | 1 | 1500 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Anti-Join / NOT EXISTS again, this time applied to customers and orders.

🏗️ SQL Query

SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

🔍 Explanation

Each customer checked against the orders table. Only those with no orders are returned.


🧾 Step 3: Expected Output

| customer_id | name |
|---|---|
| 3 | Sunil |
| 4 | Priya |

⚙️ Step 4: Why This Query Works

The correlated subquery ensures exclusion only when there are no matches.


⚡ Step 5: Common Mistakes

  • Using NOT IN directly → issues with NULL values.
  • Forgetting correlation (o.customer_id = c.customer_id).

🧠 Step 6: Alternate Approach

SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

🎯 Interview Tip

In large-scale data, NOT EXISTS tends to outperform joins when proper indexes exist.


🔍 Interviewer’s Analysis

  • Concept Tested: Correlated Subquery / Anti-Join
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Add condition for “no order in last 6 months”.

🪄 End of Q10.


❓ Q11: Identify departments that don’t have any employees hired after 2020.

🧩 Problem Statement

Find all departments where no employee joined after the year 2020.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | hire_date |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 2019-05-01 |
| 2 | Ananya Roy | 10 | 2021-03-01 |
| 3 | Bhavna Rai | 20 | 2018-06-10 |
| 4 | Chetan Joshi | 30 | 2017-01-15 |

departments

| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Tech |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Use NOT EXISTS to exclude departments with any recent hire.

🏗️ SQL Query

SELECT d.dept_id, d.dept_name
FROM departments d
WHERE NOT EXISTS (
  SELECT 1
  FROM employees e
  WHERE e.dept_id = d.dept_id
    AND e.hire_date > '2020-12-31'
);

🔍 Explanation

  • The subquery searches for employees hired after 2020 in that department.
  • If none exist → department qualifies.

🧾 Step 3: Expected Output

| dept_id | dept_name |
|---|---|
| 20 | HR |
| 30 | Tech |

Sales excluded (has Ananya hired in 2021).


⚙️ Step 4: Why This Query Works

NOT EXISTS removes departments with any post-2020 hire.


⚡ Step 5: Common Mistakes

  • Using HAVING incorrectly (not applicable).
  • Mixing up WHERE hire_date < vs > logic.

🧠 Step 6: Alternate Approach

SELECT d.dept_id, d.dept_name
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id AND e.hire_date > '2020-12-31'
WHERE e.emp_id IS NULL;
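In this alternate form, the placement of the date condition matters: it must live in the ON clause, not in WHERE. A sketch with SQLite showing both placements on the sample data:

```python
import sqlite3

# Same anti-join, two placements of the date filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT, hire_date TEXT);
CREATE TABLE departments (dept_id INT, dept_name TEXT);
INSERT INTO employees VALUES
  (1,'Amit',10,'2019-05-01'),(2,'Ananya',10,'2021-03-01'),
  (3,'Bhavna',20,'2018-06-10'),(4,'Chetan',30,'2017-01-15');
INSERT INTO departments VALUES (10,'Sales'),(20,'HR'),(30,'Tech');
""")
# Filter in ON: unmatched departments survive with NULLs -> correct anti-join.
on_clause = conn.execute("""
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id AND e.hire_date > '2020-12-31'
WHERE e.emp_id IS NULL
ORDER BY d.dept_id
""").fetchall()
# Filter moved to WHERE: it can never be true on the NULL-extended rows,
# so the query returns nothing.
in_where = conn.execute("""
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.hire_date > '2020-12-31' AND e.emp_id IS NULL
""").fetchall()
print(on_clause, in_where)
```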

🎯 Interview Tip

State clearly why you used NOT EXISTS: it ensures logical clarity and performance.


🔍 Interviewer’s Analysis

  • Concept Tested: Correlated Subquery / Temporal Filtering
  • Difficulty: Intermediate
  • Follow-Ups: How to adapt for “no new hires in past N months.”

🪄 End of Q11.


❓ Q12: Retrieve customers whose total purchases exceed the overall customer average.

🧩 Problem Statement

Find customers whose total spend (sum of order amounts) is greater than the average of all customers’ total spend.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 2000 |
| 3 | 2 | 500 |
| 4 | 3 | 4000 |
| 5 | 3 | 1000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Two levels of aggregation:

  1. Inner query: total per customer.
  2. Outer query: compare each total to overall average.

🏗️ SQL Query

SELECT customer_id, total_amount
FROM (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id
) t
WHERE total_amount > (
  SELECT AVG(total_amount)
  FROM (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
  ) sub
);

🔍 Explanation

  • Inner aggregation: computes SUM(amount) per customer.
  • Outer query compares each to average of those totals.

🧾 Step 3: Expected Output

| customer_id | total_amount |
|---|---|
| 3 | 5000 |
| 1 | 3000 |

Average customer total = (3000 + 500 + 5000)/3 = 2833.33.


⚙️ Step 4: Why This Query Works

It’s a double subquery: one for per-customer totals, one for the global average.


⚡ Step 5: Common Mistakes

  • Comparing to AVG(amount) directly → wrong (compares per-order, not per-customer).
  • Forgetting aliasing in subqueries.

🧠 Step 6: Alternate Approach

WITH customer_totals AS (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM customer_totals
WHERE total_amount > (SELECT AVG(total_amount) FROM customer_totals);

CTE improves readability.
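The CTE version runs unchanged on SQLite; a quick check against the sample orders:

```python
import sqlite3

# Per-customer totals in a CTE, compared against the average of those totals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)",
                 [(1, 1, 1000), (2, 1, 2000), (3, 2, 500),
                  (4, 3, 4000), (5, 3, 1000)])
rows = conn.execute("""
WITH customer_totals AS (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM customer_totals
WHERE total_amount > (SELECT AVG(total_amount) FROM customer_totals)
ORDER BY customer_id
""").fetchall()
print(rows)  # average of the per-customer totals is 2833.33
```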


🎯 Interview Tip

Explain trade-off between nested subquery and CTE — same result, but CTE improves maintainability.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation within subqueries / CTEs
  • Difficulty: Intermediate
  • Follow-Ups: Optimize with temporary tables or materialized views.

🪄 End of Q12.


❓ Q13: Assign ranks to employees by salary within each department.

🧩 Problem Statement

Rank employees based on salary within their department, so that the highest earner in each department gets rank 1.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 75000 |
| 4 | Chetan Joshi | 20 | 72000 |
| 5 | Deepa Singh | 20 | 72000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Window functionRANK() or DENSE_RANK() partitioned by department.

🏗️ SQL Query

SELECT emp_id,
       name,
       dept_id,
       salary,
       DENSE_RANK() OVER (
         PARTITION BY dept_id
         ORDER BY salary DESC
       ) AS dept_rank
FROM employees;

🔍 Explanation

  • PARTITION BY dept_id = restart rank per department.
  • ORDER BY salary DESC = rank by salary descending.
  • DENSE_RANK() assigns same rank to ties (vs RANK() adds gaps).

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | dept_rank |
|---|---|---|---|---|
| 2 | Ananya Roy | 10 | 80000 | 1 |
| 1 | Amit Sharma | 10 | 60000 | 2 |
| 3 | Bhavna Rai | 20 | 75000 | 1 |
| 4 | Chetan Joshi | 20 | 72000 | 2 |
| 5 | Deepa Singh | 20 | 72000 | 2 |

⚙️ Step 4: Why This Query Works

Each department acts as its own mini-table; ranks are computed per partition.


⚡ Step 5: Common Mistakes

  • Forgetting PARTITION BY → global ranking instead of per dept.
  • Using ROW_NUMBER() when ties should share rank.

🧠 Step 6: Alternate Approach

-- Using ROW_NUMBER to get unique ordering
SELECT emp_id, name, dept_id, salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;

Ensures unique rank per employee.


🎯 Interview Tip

Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() — interviewers love this follow-up.
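A side-by-side sketch in SQLite, run on department 20's salaries (a hypothetical fourth employee, Esha at 70000, is added here to make the gap after the tie visible): RANK leaves a gap (1, 2, 2, 4), DENSE_RANK doesn't (1, 2, 2, 3), ROW_NUMBER breaks the tie arbitrarily (1, 2, 3, 4):

```python
import sqlite3

# RANK vs DENSE_RANK vs ROW_NUMBER on one department's salaries.
# Esha (70000) is a hypothetical extra row that exposes RANK's gap.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [("Bhavna", 75000), ("Chetan", 72000),
                  ("Deepa", 72000), ("Esha", 70000)])
rows = conn.execute("""
SELECT name,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees
ORDER BY salary DESC, name
""").fetchall()
for r in rows:
    print(r)
```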


🔍 Interviewer’s Analysis

  • Concept Tested: Ranking & Partitioning
  • Difficulty: Intermediate
  • Follow-Ups: Explain how to show only top-3 per dept using a subquery filter.

🪄 End of Q13.


❓ Q14: Calculate cumulative (running) sales per employee.

🧩 Problem Statement

Show each employee’s cumulative sales over time, ordered by transaction date.


💾 Step 1: Assume Example Tables

sales

| sale_id | emp_id | sale_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-01-01 | 1000 |
| 2 | 1 | 2024-02-01 | 2000 |
| 3 | 2 | 2024-01-15 | 1500 |
| 4 | 2 | 2024-03-01 | 2500 |
| 5 | 1 | 2024-03-05 | 3000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Cumulative total via SUM() OVER (ORDER BY …) partitioned by employee.

🏗️ SQL Query

SELECT emp_id,
       sale_date,
       amount,
       SUM(amount) OVER (
         PARTITION BY emp_id
         ORDER BY sale_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_sales
FROM sales;

🔍 Explanation

  • SUM() OVER() calculates total up to current row per employee.
  • The ROWS clause ensures true running sum.

🧾 Step 3: Expected Output

| emp_id | sale_date | amount | cumulative_sales |
|---|---|---|---|
| 1 | 2024-01-01 | 1000 | 1000 |
| 1 | 2024-02-01 | 2000 | 3000 |
| 1 | 2024-03-05 | 3000 | 6000 |
| 2 | 2024-01-15 | 1500 | 1500 |
| 2 | 2024-03-01 | 2500 | 4000 |

⚙️ Step 4: Why This Query Works

Windowed SUM lets totals accumulate row by row without collapsing groups.


⚡ Step 5: Common Mistakes

  • Forgetting ORDER BY → full sum repeated on each row.
  • Using aggregate SUM() instead of windowed version.

🧠 Step 6: Alternate Approach

When ORDER BY is present, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which produces the same running sum as the explicit ROWS clause unless two rows share the same sale_date (RANGE treats ties as one group).
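The running total can be verified on the sample sales with SQLite (3.25+ for window functions):

```python
import sqlite3

# Windowed SUM accumulates per employee, ordered by sale_date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (emp_id INT, sale_date TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?,?,?)", [
    (1, "2024-01-01", 1000), (1, "2024-02-01", 2000), (2, "2024-01-15", 1500),
    (2, "2024-03-01", 2500), (1, "2024-03-05", 3000)])
rows = conn.execute("""
SELECT emp_id, sale_date, amount,
       SUM(amount) OVER (
         PARTITION BY emp_id ORDER BY sale_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_sales
FROM sales ORDER BY emp_id, sale_date
""").fetchall()
for r in rows:
    print(r)
```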


🎯 Interview Tip

Use this to explain running totals, moving averages, or trend tracking — all derived from similar window logic.


🔍 Interviewer’s Analysis

  • Concept Tested: Window Aggregation
  • Difficulty: Intermediate
  • Follow-Ups: Compute for company-wide cumulative totals (no partition).

🪄 End of Q14.


❓ Q15: Find the difference between each month’s and previous month’s sales.

🧩 Problem Statement

Show monthly sales and how much they increased/decreased compared to the previous month.


💾 Step 1: Assume Example Tables

sales_summary

| month | total_sales |
|---|---|
| Jan | 50000 |
| Feb | 60000 |
| Mar | 55000 |
| Apr | 70000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: LAG() window function — look at previous row’s value.

🏗️ SQL Query

SELECT month,
       total_sales,
       LAG(total_sales) OVER (ORDER BY month) AS prev_sales,
       total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_from_prev
FROM sales_summary;

🔍 Explanation

  • LAG() fetches previous month’s sales.
  • Subtract to find difference.

🧾 Step 3: Expected Output

| month | total_sales | prev_sales | change_from_prev |
|---|---|---|---|
| Jan | 50000 | NULL | NULL |
| Feb | 60000 | 50000 | 10000 |
| Mar | 55000 | 60000 | -5000 |
| Apr | 70000 | 55000 | 15000 |

⚙️ Step 4: Why This Query Works

Window functions maintain chronological context; LAG() is ideal for sequential comparisons.


⚡ Step 5: Common Mistakes

  • Ordering alphabetically instead of chronologically (month should be date-type).
  • Using JOIN for month-over-month — unnecessary and inefficient.
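A sketch with SQLite, storing months as sortable 'YYYY-MM' strings to sidestep the alphabetical-ordering trap just mentioned:

```python
import sqlite3

# LAG-based month-over-month delta; 'YYYY-MM' strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_summary (month TEXT, total_sales INT)")
conn.executemany("INSERT INTO sales_summary VALUES (?,?)",
                 [("2024-01", 50000), ("2024-02", 60000),
                  ("2024-03", 55000), ("2024-04", 70000)])
rows = conn.execute("""
SELECT month, total_sales,
       total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_from_prev
FROM sales_summary ORDER BY month
""").fetchall()
print(rows)  # the first month has no predecessor, so its delta is NULL/None
```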

🧠 Step 6: Alternate Approach

-- Using LEAD() to show next month’s change instead
SELECT month,
       total_sales,
       LEAD(total_sales) OVER (ORDER BY month) - total_sales AS change_to_next
FROM sales_summary;

🎯 Interview Tip

Talk about using LAG() for retention, revenue growth, or churn — all based on similar deltas.


🔍 Interviewer’s Analysis

  • Concept Tested: LAG() / Time-series comparison
  • Difficulty: Intermediate
  • Follow-Ups: Handle missing months or irregular time gaps.

🪄 End of Q15.


❓ Q16: Show employee salaries and the salary gap vs department average.

🧩 Problem Statement

For each employee, show how far their salary differs from their department’s average salary.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 75000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use window AVG() to compute departmental mean per row, then subtract.

🏗️ SQL Query

SELECT emp_id,
       name,
       dept_id,
       salary,
       ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg,
       salary - AVG(salary) OVER (PARTITION BY dept_id) AS salary_gap
FROM employees;

🔍 Explanation

  • AVG(salary) OVER (PARTITION BY dept_id) gives department average.
  • Subtract it from each employee’s salary.

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | dept_avg | salary_gap |
|---|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 70000.00 | -10000 |
| 2 | Ananya Roy | 10 | 80000 | 70000.00 | 10000 |
| 3 | Bhavna Rai | 20 | 90000 | 82500.00 | 7500 |
| 4 | Chetan Joshi | 20 | 75000 | 82500.00 | -7500 |

⚙️ Step 4: Why This Query Works

Window functions allow comparing each row to its group’s aggregate without collapsing rows.


⚡ Step 5: Common Mistakes

  • Using GROUP BY dept_id instead of window → collapses rows.
  • Forgetting PARTITION BY → average over all departments.

🎯 Interview Tip

Show that you can mix aggregates and raw rows — that’s what makes windows powerful.


🔍 Interviewer’s Analysis

  • Concept Tested: Window Aggregation / Comparative Analytics
  • Difficulty: Intermediate
  • Follow-Ups: Add flag column for “above/below average”.

🪄 End of Q16.


❓ Q17: Find the Nth highest salary in each department.

🧩 Problem Statement

Retrieve the Nth (e.g., 2nd or 3rd) highest salary per department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 10 | 70000 |
| 4 | Chetan Joshi | 20 | 95000 |
| 5 | Deepa Singh | 20 | 90000 |
| 6 | Esha Patel | 20 | 88000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use DENSE_RANK() to assign salary order within department, then filter.

🏗️ SQL Query

SELECT dept_id, name, salary
FROM (
  SELECT dept_id, name, salary,
         DENSE_RANK() OVER (
           PARTITION BY dept_id
           ORDER BY salary DESC
         ) AS rnk
  FROM employees
) t
WHERE rnk = 2;

🔍 Explanation

  • Assign rank by salary (highest = 1).
  • Filter for the target rank (Nth = 2 in this example).

🧾 Step 3: Expected Output (N = 2)

| dept_id | name | salary |
|---|---|---|
| 10 | Bhavna Rai | 70000 |
| 20 | Deepa Singh | 90000 |

⚙️ Step 4: Why This Query Works

DENSE_RANK() allows rank-based filtering while preserving ties.


⚡ Step 5: Common Mistakes

  • Using LIMIT → gives top-N overall, not per department.
  • Using ROW_NUMBER when ties should share rank.

🧠 Step 6: Alternate Approach

To find Nth highest salary company-wide:

SELECT salary
FROM employees
ORDER BY salary DESC
OFFSET (N - 1) ROWS FETCH NEXT 1 ROWS ONLY;

🎯 Interview Tip

Discuss tie-handling (DENSE_RANK vs ROW_NUMBER) and efficiency with large datasets (index on dept_id, salary).


🔍 Interviewer’s Analysis

  • Concept Tested: Window Ranking / Conditional Filtering
  • Difficulty: Intermediate
  • Follow-Ups: Modify for top-3 instead of exactly N.

🪄 End of Q17.


❓ Q18: Calculate the moving average of posts per user (7-day window).

🧩 Problem Statement

Compute each user’s 7-day moving average of posts, using their posting dates.


💾 Step 1: Assume Example Tables

posts

| post_id | user_id | post_date | post_count |
|---|---|---|---|
| 1 | 1 | 2024-01-01 | 2 |
| 2 | 1 | 2024-01-02 | 3 |
| 3 | 1 | 2024-01-05 | 4 |
| 4 | 1 | 2024-01-08 | 1 |
| 5 | 2 | 2024-01-01 | 5 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Moving average with date-based window — sliding frame of last 7 days.

🏗️ SQL Query

SELECT user_id,
       post_date,
       post_count,
       AVG(post_count) OVER (
         PARTITION BY user_id
         ORDER BY post_date
         RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
       ) AS moving_avg_7d
FROM posts;

🔍 Explanation

  • RANGE BETWEEN … dynamically covers 7 calendar days before each post.
  • Average recalculates over that rolling range.

🧾 Step 3: Expected Output (simplified)

| user_id | post_date | post_count | moving_avg_7d |
|---|---|---|---|
| 1 | 2024-01-01 | 2 | 2.00 |
| 1 | 2024-01-02 | 3 | 2.50 |
| 1 | 2024-01-05 | 4 | 3.00 |
| 1 | 2024-01-08 | 1 | 2.50 |
| 2 | 2024-01-01 | 5 | 5.00 |

⚙️ Step 4: Why This Query Works

The sliding window dynamically shifts with each row, capturing exactly 7 days’ worth of history.


⚡ Step 5: Common Mistakes

  • Using ROWS instead of RANGE → counts 7 rows, not 7 days.
  • Forgetting to partition by user_id → merges all users.

🧠 Step 6: Alternate Approach

For databases lacking INTERVAL syntax:

SELECT user_id, post_date, post_count,
       AVG(post_count) OVER (
         PARTITION BY user_id
         ORDER BY post_date
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7rows
FROM posts;
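A portable middle ground for SQLite (3.28+ supports RANGE with numeric offsets) is to order the window by julianday(), which turns the calendar window into a numeric one, so you keep a true 7-day window rather than a 7-row one:

```python
import sqlite3

# 7-day moving average: ORDER BY julianday(post_date) makes the date
# window numeric, so RANGE BETWEEN 7 PRECEDING covers 7 calendar days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INT, user_id INT, post_date TEXT, post_count INT)")
conn.executemany("INSERT INTO posts VALUES (?,?,?,?)", [
    (1, 1, "2024-01-01", 2), (2, 1, "2024-01-02", 3), (3, 1, "2024-01-05", 4),
    (4, 1, "2024-01-08", 1), (5, 2, "2024-01-01", 5)])
rows = conn.execute("""
SELECT user_id, post_date, post_count,
       AVG(post_count) OVER (
         PARTITION BY user_id
         ORDER BY julianday(post_date)
         RANGE BETWEEN 7 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7d
FROM posts
ORDER BY user_id, post_date
""").fetchall()
for r in rows:
    print(r)
```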

🎯 Interview Tip

Explain that this query mimics “rolling averages” used in dashboards, retention, or trend lines.


🔍 Interviewer’s Analysis

  • Concept Tested: RANGE-based Windowing
  • Difficulty: Advanced
  • Follow-Ups: Compare ROWS vs RANGE; extend to 30-day moving avg.

🪄 End of Q18.


❓ Q19: Extract first and last names from a full_name column.

🧩 Problem Statement

Given an employee table with full_name, split it into two separate columns: first_name and last_name.


💾 Step 1: Assume Example Tables

employees

| emp_id | full_name |
|---|---|
| 1 | Amit Sharma |
| 2 | Ananya Roy |
| 3 | Bhavna Rai |
| 4 | Chetan Joshi |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: String splitting using functions like SUBSTRING_INDEX, SPLIT_PART, or CHARINDEX depending on SQL dialect.

🏗️ SQL Query (MySQL)

SELECT emp_id,
       SUBSTRING_INDEX(full_name, ' ', 1) AS first_name,
       SUBSTRING_INDEX(full_name, ' ', -1) AS last_name
FROM employees;

🏗️ SQL Query (PostgreSQL)

SELECT emp_id,
       SPLIT_PART(full_name, ' ', 1) AS first_name,
       SPLIT_PART(full_name, ' ', 2) AS last_name
FROM employees;

🔍 Explanation

  • Split by the first space ' '.
  • The first token = first name; the last = last name.

🧾 Step 3: Expected Output

| emp_id | first_name | last_name |
|---|---|---|
| 1 | Amit | Sharma |
| 2 | Ananya | Roy |
| 3 | Bhavna | Rai |
| 4 | Chetan | Joshi |

⚙️ Step 4: Why This Query Works

String functions parse the full name at the space character and extract text before and after.


⚡ Step 5: Common Mistakes

  • Names with middle names or extra spaces cause wrong splits.
  • Forgetting to trim extra whitespace.

🧠 Step 6: Alternate Approach

SELECT emp_id,
       LEFT(full_name, CHARINDEX(' ', full_name)-1) AS first_name,
       RIGHT(full_name, LEN(full_name) - CHARINDEX(' ', full_name)) AS last_name
FROM employees;

(For SQL Server)
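For engines with neither SUBSTRING_INDEX nor SPLIT_PART (e.g. SQLite), the same split can be built from INSTR and SUBSTR; a runnable sketch:

```python
import sqlite3

# Portable first/last split: INSTR finds the first space, SUBSTR slices
# around it (same idea as the CHARINDEX version above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, full_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?,?)",
                 [(1, "Amit Sharma"), (2, "Ananya Roy")])
rows = conn.execute("""
SELECT emp_id,
       SUBSTR(full_name, 1, INSTR(full_name, ' ') - 1) AS first_name,
       SUBSTR(full_name, INSTR(full_name, ' ') + 1)    AS last_name
FROM employees ORDER BY emp_id
""").fetchall()
print(rows)
```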


🎯 Interview Tip

Mention how to handle multi-word names using regex or splitting by position of first/last space.


🔍 Interviewer’s Analysis

  • Concept Tested: String parsing
  • Difficulty: Beginner
  • Follow-Ups: Handle multiple spaces; trim names with LTRIM() / RTRIM().

🪄 End of Q19.


❓ Q20: Find customers whose email domain is “gmail.com”.

🧩 Problem Statement

Identify all customers whose email addresses belong to the Gmail domain.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@gmail.com |
| 2 | Aisha | aisha@yahoo.com |
| 3 | Sunil | sunil@gmail.com |
| 4 | Priya | priya@outlook.com |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: String pattern matching using LIKE or SUBSTRING_INDEX().

🏗️ SQL Query

SELECT customer_id, name, email
FROM customers
WHERE email LIKE '%@gmail.com';

🔍 Explanation

  • % matches any preceding text.
  • Ensures email ends with “@gmail.com”.

🧾 Step 3: Expected Output

| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@gmail.com |
| 3 | Sunil | sunil@gmail.com |

⚙️ Step 4: Why This Query Works

LIKE '%@gmail.com' filters text ending with that domain.


⚡ Step 5: Common Mistakes

  • Forgetting % before domain → matches only exact @gmail.com.
  • Ignoring case sensitivity (LIKE is case-insensitive under MySQL’s default collation but case-sensitive in PostgreSQL; use ILIKE or LOWER() there).

🧠 Step 6: Alternate Approach

-- Extract domain and filter
SELECT customer_id, name, email
FROM customers
WHERE SUBSTRING_INDEX(email, '@', -1) = 'gmail.com';

🎯 Interview Tip

Mention how to normalize case (LOWER(email)) before comparing domains.


🔍 Interviewer’s Analysis

  • Concept Tested: String pattern search
  • Difficulty: Beginner
  • Follow-Ups: Extract username vs domain; group customers by domain.

🪄 End of Q20.


❓ Q21: Retrieve orders made in the last 7 days.

🧩 Problem Statement

Find all orders whose order_date falls within the last 7 days from today.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-02-20 | 5000 |
| 2 | 2 | 2024-02-25 | 3000 |
| 3 | 3 | 2024-02-28 | 7000 |
| 4 | 4 | 2024-03-01 | 4000 |

(Assume today = 2024-03-01)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Date arithmetic using CURRENT_DATE or NOW().

🏗️ SQL Query (MySQL/PostgreSQL)

SELECT order_id, customer_id, order_date, amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '7' DAY;

🔍 Explanation

Subtracts 7 days from today’s date and filters all orders on or after that date.


🧾 Step 3: Expected Output

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 2 | 2 | 2024-02-25 | 3000 |
| 3 | 3 | 2024-02-28 | 7000 |
| 4 | 4 | 2024-03-01 | 4000 |

⚙️ Step 4: Why This Query Works

Date functions support subtraction intervals, allowing dynamic filtering without hardcoding.


⚡ Step 5: Common Mistakes

  • Using NOW() without DATE() → includes time precision errors.
  • Forgetting to include the current date boundary.

🧠 Step 6: Alternate Approach

-- SQL Server
SELECT *
FROM orders
WHERE order_date >= DATEADD(DAY, -7, GETDATE());
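SQLite spells the same arithmetic with date modifiers; in this sketch "today" is pinned to 2024-03-01 (the article's assumption) so the result is reproducible:

```python
import sqlite3

# date(<anchor>, '-7 day') computes the window start (2024-02-23 here);
# ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, order_date TEXT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)", [
    (1, "2024-02-20", 5000), (2, "2024-02-25", 3000),
    (3, "2024-02-28", 7000), (4, "2024-03-01", 4000)])
rows = conn.execute("""
SELECT order_id FROM orders
WHERE order_date >= date('2024-03-01', '-7 day')
ORDER BY order_id
""").fetchall()
print(rows)  # orders from 2024-02-23 onward
```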

🎯 Interview Tip

Always confirm timezone alignment — especially in analytics pipelines.


🔍 Interviewer’s Analysis

  • Concept Tested: Date filtering / relative intervals
  • Difficulty: Beginner
  • Follow-Ups: Fetch last month’s orders; group by day.

🪄 End of Q21.


❓ Q22: Calculate the age of each employee.

🧩 Problem Statement

Given birth_date, calculate the employee’s age in years as of today.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | birth_date |
|---|---|---|
| 1 | Amit Sharma | 1990-05-10 |
| 2 | Ananya Roy | 1995-08-22 |
| 3 | Bhavna Rai | 1987-03-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Date difference functions (DATEDIFF, AGE, TIMESTAMPDIFF).

🏗️ SQL Query (MySQL)

SELECT emp_id,
       name,
       TIMESTAMPDIFF(YEAR, birth_date, CURDATE()) AS age
FROM employees;

🏗️ SQL Query (PostgreSQL)

SELECT emp_id,
       name,
       EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date)) AS age
FROM employees;

🔍 Explanation

Subtracts birth_date from today’s date to find total years elapsed.


🧾 Step 3: Expected Output

| emp_id | name | age |
|---|---|---|
| 1 | Amit Sharma | 34 |
| 2 | Ananya Roy | 29 |
| 3 | Bhavna Rai | 37 |

⚙️ Step 4: Why This Query Works

SQL’s date functions automatically handle leap years and partial months accurately.


⚡ Step 5: Common Mistakes

  • Using YEAR(CURDATE()) - YEAR(birth_date) → gives wrong result before birthday.
  • Storing dates as text (use proper DATE type).
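The year-subtraction pitfall is worth seeing concretely. A pure-Python sketch (with "today" pinned to the hypothetical date 2024-06-01) comparing the naive year difference with a correct completed-years calculation:

```python
from datetime import date

def age(birth: date, today: date) -> int:
    # Completed years: subtract 1 if this year's birthday hasn't happened yet.
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))

today = date(2024, 6, 1)   # pinned so the output is reproducible
naive = today.year - 1995  # plain year subtraction for Ananya (born 1995-08-22)
print(age(date(1990, 5, 10), today))   # 34 (birthday already passed)
print(age(date(1995, 8, 22), today))   # 28 (birthday not yet reached)
print(naive)                           # 29 -- overstates the age before the birthday
```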

🧠 Step 6: Alternate Approach

SELECT emp_id, name,
       FLOOR(DATEDIFF(CURDATE(), birth_date) / 365.25) AS age
FROM employees;

🎯 Interview Tip

Mention that for precise age (down to days), use interval math rather than dividing by 365.


🔍 Interviewer’s Analysis

  • Concept Tested: Date functions / age calculation
  • Difficulty: Beginner
  • Follow-Ups: Compute employee tenure; find oldest/youngest employee.

🪄 End of Q22.


❓ Q23: Count how many users registered each month.

🧩 Problem Statement

Group users by registration month and count how many joined in each month.


💾 Step 1: Assume Example Tables

users

| user_id | name | registration_date |
|---|---|---|
| 1 | Raj | 2024-01-15 |
| 2 | Aisha | 2024-01-20 |
| 3 | Sunil | 2024-02-05 |
| 4 | Priya | 2024-02-25 |
| 5 | Deepa | 2024-03-01 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Date truncation using MONTH() or DATE_TRUNC() for grouping.

🏗️ SQL Query (MySQL)

SELECT DATE_FORMAT(registration_date, '%Y-%m') AS month,
       COUNT(*) AS user_count
FROM users
GROUP BY DATE_FORMAT(registration_date, '%Y-%m')
ORDER BY month;

🏗️ SQL Query (PostgreSQL)

SELECT TO_CHAR(DATE_TRUNC('month', registration_date), 'YYYY-MM') AS month,
       COUNT(*) AS user_count
FROM users
GROUP BY DATE_TRUNC('month', registration_date)
ORDER BY month;

🔍 Explanation

  • Truncate each registration date to month level.
  • Count users per month.

🧾 Step 3: Expected Output

| month | user_count |
|---|---|
| 2024-01 | 2 |
| 2024-02 | 2 |
| 2024-03 | 1 |

⚙️ Step 4: Why This Query Works

Truncating to month ensures all dates in the same month fall into the same group.


⚡ Step 5: Common Mistakes

  • Grouping by full date instead of month.
  • Forgetting to format or order chronologically (lexicographic sort misorders months).

🧠 Step 6: Alternate Approach

SELECT EXTRACT(YEAR FROM registration_date) AS year,
       EXTRACT(MONTH FROM registration_date) AS month,
       COUNT(*) AS user_count
FROM users
GROUP BY year, month
ORDER BY year, month;

Cleaner for year-over-year comparisons.
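As a sanity check, the same grouping can be run against an in-memory SQLite database (illustrative only; SQLite has no DATE_FORMAT, so strftime('%Y-%m', …) plays that role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INT, name TEXT, registration_date TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "Raj", "2024-01-15"), (2, "Aisha", "2024-01-20"),
    (3, "Sunil", "2024-02-05"), (4, "Priya", "2024-02-25"),
    (5, "Deepa", "2024-03-01"),
])
# Truncate each date to YYYY-MM, then count users per month.
rows = conn.execute("""
    SELECT strftime('%Y-%m', registration_date) AS month,
           COUNT(*) AS user_count
    FROM users
    GROUP BY month
    ORDER BY month
""").fetchall()
print(rows)  # [('2024-01', 2), ('2024-02', 2), ('2024-03', 1)]
```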


🎯 Interview Tip

Highlight that monthly grouping is common in KPIs like MAU (Monthly Active Users) and retention dashboards.


🔍 Interviewer’s Analysis

  • Concept Tested: Date grouping / aggregation
  • Difficulty: Intermediate
  • Follow-Ups: Extend to week/day granularity; cumulative monthly users.

🪄 End of Q23.


❓ Q24: Find users who made purchases every month of a year.

🧩 Problem Statement

Identify customers who made at least one purchase in every month of 2024.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-01-15 | 500 |
| 2 | 1 | 2024-02-10 | 300 |
| 3 | 1 | 2024-03-05 | 400 |
| 4 | 2 | 2024-01-25 | 600 |
| 5 | 2 | 2024-04-15 | 200 |
| 6 | 3 | 2024-01-20 | 800 |
| 7 | 3 | 2024-02-20 | 900 |
| 8 | 3 | 2024-03-10 | 1000 |
| 9 | 3 | 2024-04-01 | 700 |
| 10 | 3 | 2024-05-03 | 900 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We must check if a user appears in all 12 months for that year.

🏗️ SQL Query

SELECT customer_id
FROM (
  SELECT customer_id,
         COUNT(DISTINCT EXTRACT(MONTH FROM order_date)) AS active_months
  FROM orders
  WHERE EXTRACT(YEAR FROM order_date) = 2024
  GROUP BY customer_id
) t
WHERE active_months = 12;

🔍 Explanation

  • Inner query counts distinct months with orders per user.
  • Outer query keeps users with all 12 active months.

🧾 Step 3: Expected Output

| customer_id |
|---|

(Empty for this sample data: no customer ordered in all 12 months of 2024. A customer_id appears only if that customer was active every month.)

⚙️ Step 4: Why This Query Works

Distinct month counting ensures even one order per month counts as active.


⚡ Step 5: Common Mistakes

  • Using COUNT(*) instead of COUNT(DISTINCT MONTH) — overcounts duplicates.
  • Forgetting to filter by year.

🧠 Step 6: Alternate Approach

To check for specific months dynamically:

SELECT customer_id
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY customer_id
HAVING COUNT(DISTINCT TO_CHAR(order_date, 'YYYY-MM')) = 12;
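The distinct-month counting can be verified on SQLite with the sample data (strftime('%m', …) stands in for EXTRACT(MONTH …)). None of the sample customers reach 12 months, so the lead-in also shows the "at least N months" relaxation from the follow-ups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, "2024-01-15", 500), (2, 1, "2024-02-10", 300), (3, 1, "2024-03-05", 400),
    (4, 2, "2024-01-25", 600), (5, 2, "2024-04-15", 200),
    (6, 3, "2024-01-20", 800), (7, 3, "2024-02-20", 900), (8, 3, "2024-03-10", 1000),
    (9, 3, "2024-04-01", 700), (10, 3, "2024-05-03", 900),
])
# Strict "all 12 months" check — empty for the sample data.
active_all = conn.execute("""
    SELECT customer_id
    FROM orders
    WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
    GROUP BY customer_id
    HAVING COUNT(DISTINCT strftime('%m', order_date)) = 12
""").fetchall()
# Relaxed "active at least 4 months" — only customer 3 qualifies (5 months).
active_4 = conn.execute("""
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(DISTINCT strftime('%m', order_date)) >= 4
    ORDER BY customer_id
""").fetchall()
print(active_all, active_4)  # [] [(3,)]
```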

🎯 Interview Tip

Mention that this is useful in retention or engagement metrics (“active all months”).


🔍 Interviewer’s Analysis

  • Concept Tested: Conditional aggregation + DISTINCT counting
  • Difficulty: Intermediate
  • Follow-Ups: Adapt to “active at least 6 months.”

🪄 End of Q24.


❓ Q25: Identify customers’ first and last purchase dates.

🧩 Problem Statement

For each customer, show the earliest and latest order dates.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2023-01-10 | 500 |
| 2 | 1 | 2024-02-05 | 300 |
| 3 | 2 | 2023-07-01 | 200 |
| 4 | 2 | 2024-01-20 | 400 |
| 5 | 3 | 2023-03-15 | 900 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We use MIN() and MAX() grouped by customer_id.

🏗️ SQL Query

SELECT customer_id,
       MIN(order_date) AS first_purchase,
       MAX(order_date) AS last_purchase
FROM orders
GROUP BY customer_id;

🔍 Explanation

MIN finds earliest date; MAX finds latest date per customer.


🧾 Step 3: Expected Output

| customer_id | first_purchase | last_purchase |
|---|---|---|
| 1 | 2023-01-10 | 2024-02-05 |
| 2 | 2023-07-01 | 2024-01-20 |
| 3 | 2023-03-15 | 2023-03-15 |

⚙️ Step 4: Why This Query Works

Grouping aggregates across rows for each customer, returning their date range.


⚡ Step 5: Common Mistakes

  • Forgetting GROUP BY.
  • Using ORDER BY with LIMIT per customer instead of grouped MIN()/MAX() — clumsier and less efficient.

🧠 Step 6: Alternate Approach

A window-function version (swap order_date for amount inside FIRST_VALUE/LAST_VALUE to also fetch the first/last purchase amounts):

SELECT DISTINCT o.customer_id,
       FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_date,
       LAST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_date
FROM orders o;

🎯 Interview Tip

Always clarify if the interviewer wants “first ever” or “first in 2024”.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation / Date extremum
  • Difficulty: Beginner
  • Follow-Ups: Add gap between purchases.

🪄 End of Q25.


❓ Q26: Calculate repeat purchase rate (customers who ordered more than once).

🧩 Problem Statement

Find the percentage of customers who placed more than one order.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 1500 |
| 3 | 2 | 700 |
| 4 | 3 | 1200 |
| 5 | 3 | 800 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We need:

  1. Number of customers with more than one order.
  2. Total customers.
  3. Divide (1) by (2).

🏗️ SQL Query

WITH customer_orders AS (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
)
SELECT
  ROUND(
    (COUNT(CASE WHEN order_count > 1 THEN 1 END) * 100.0) /
    COUNT(*),
  2
  ) AS repeat_purchase_rate_percent
FROM customer_orders;

🔍 Explanation

  • CTE counts orders per customer.
  • Then computes ratio of repeaters to total customers.

🧾 Step 3: Expected Output

| repeat_purchase_rate_percent |
|---|
| 66.67 |

(2 of 3 customers ordered more than once.)


⚙️ Step 4: Why This Query Works

By summarizing at customer level first, we can derive aggregate proportions.


⚡ Step 5: Common Mistakes

  • Counting orders instead of customers.
  • Forgetting to multiply by 100 for percentage.

🧠 Step 6: Alternate Approach

SELECT
  (SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) * 1.0 / COUNT(*)) AS repeat_rate
FROM (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
) sub;
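Both variants can be checked against an in-memory SQLite database with the sample rows (an illustrative sketch, not production code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 1000), (2, 1, 1500), (3, 2, 700), (4, 3, 1200), (5, 3, 800)])
# Customers 1 and 3 ordered twice; customer 2 once -> 2/3 = 66.67%.
(rate,) = conn.execute("""
    WITH customer_orders AS (
      SELECT customer_id, COUNT(*) AS order_count
      FROM orders
      GROUP BY customer_id
    )
    SELECT ROUND(COUNT(CASE WHEN order_count > 1 THEN 1 END) * 100.0 / COUNT(*), 2)
    FROM customer_orders
""").fetchone()
print(rate)  # 66.67
```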

🎯 Interview Tip

This metric is key for customer retention analytics — worth mentioning contextually.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation, CTEs, Conditional Logic
  • Difficulty: Intermediate
  • Follow-Ups: Add time filter (“repeat within same quarter”).

🪄 End of Q26.


❓ Q27: Calculate year-over-year (YoY) growth.

🧩 Problem Statement

Compute the YoY growth percentage in sales between consecutive years.


💾 Step 1: Assume Example Tables

sales_summary

| year | total_sales |
|---|---|
| 2022 | 50000 |
| 2023 | 65000 |
| 2024 | 78000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Compare each year’s sales to the previous year using LAG().

🏗️ SQL Query

SELECT year,
       total_sales,
       LAG(total_sales) OVER (ORDER BY year) AS prev_year_sales,
       ROUND(
         ((total_sales - LAG(total_sales) OVER (ORDER BY year)) * 100.0) /
         LAG(total_sales) OVER (ORDER BY year),
         2
       ) AS yoy_growth_percent
FROM sales_summary;

🔍 Explanation

  • LAG() pulls previous year’s total.
  • Computes percentage change.

🧾 Step 3: Expected Output

| year | total_sales | prev_year_sales | yoy_growth_percent |
|---|---|---|---|
| 2022 | 50000 | NULL | NULL |
| 2023 | 65000 | 50000 | 30.00 |
| 2024 | 78000 | 65000 | 20.00 |

⚙️ Step 4: Why This Query Works

The window function allows comparison without self-join.


⚡ Step 5: Common Mistakes

  • Forgetting to handle first year (NULL previous).
  • Using integer division → 0 results.

🧠 Step 6: Alternate Approach

SELECT a.year, a.total_sales,
       ROUND(((a.total_sales - b.total_sales) * 100.0) / b.total_sales, 2) AS yoy_growth
FROM sales_summary a
JOIN sales_summary b
  ON a.year = b.year + 1;

Manual join alternative.
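The LAG() version runs unchanged on SQLite (window functions require SQLite 3.25+), which makes it easy to sanity-check the expected output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_summary (year INT, total_sales INT)")
conn.executemany("INSERT INTO sales_summary VALUES (?, ?)",
                 [(2022, 50000), (2023, 65000), (2024, 78000)])
# LAG() pulls the previous year's total; the first year has no predecessor.
rows = conn.execute("""
    SELECT year,
           ROUND((total_sales - LAG(total_sales) OVER (ORDER BY year)) * 100.0
                 / LAG(total_sales) OVER (ORDER BY year), 2) AS yoy_growth_percent
    FROM sales_summary
""").fetchall()
print(rows)  # [(2022, None), (2023, 30.0), (2024, 20.0)]
```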


🎯 Interview Tip

Explain how YoY helps detect seasonal patterns or growth anomalies.


🔍 Interviewer’s Analysis

  • Concept Tested: LAG() / Ratio calculation
  • Difficulty: Intermediate
  • Follow-Ups: Extend to quarter-over-quarter (QoQ) growth.

🪄 End of Q27.


❓ Q28: Identify categories where sales decreased for 3 consecutive months.

🧩 Problem Statement

Detect product categories showing continuous decline in sales across 3 months.


💾 Step 1: Assume Example Tables

sales_by_category

| category | month | total_sales |
|---|---|---|
| Electronics | Jan | 10000 |
| Electronics | Feb | 9000 |
| Electronics | Mar | 8500 |
| Furniture | Jan | 5000 |
| Furniture | Feb | 5500 |
| Furniture | Mar | 6000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Compare each month with the previous two months using LAG() or a pattern check.

🏗️ SQL Query

SELECT DISTINCT category
FROM (
  SELECT category,
         month,
         total_sales,
         LAG(total_sales, 1) OVER (PARTITION BY category ORDER BY month) AS prev_month,
         LAG(total_sales, 2) OVER (PARTITION BY category ORDER BY month) AS prev_2_months
  FROM sales_by_category
) t
WHERE total_sales < prev_month
  AND prev_month < prev_2_months;

🔍 Explanation

  • For each category, pull sales for 2 prior months.
  • Filter where 3 consecutive months show decreasing totals.

🧾 Step 3: Expected Output

category
Electronics

⚙️ Step 4: Why This Query Works

Window shifts create lagged comparisons; chained inequalities detect downward trends.


⚡ Step 5: Common Mistakes

  • Comparing in the wrong direction (prev < current instead of current < prev).
  • Forgetting to partition by category.
  • Storing month as text ('Jan', 'Feb', …): ORDER BY month then sorts alphabetically, not chronologically, so the LAG() windows compare the wrong months. Use a DATE or numeric month column.

🧠 Step 6: Alternate Approach

WITH ranked AS (
  SELECT category,
         month,
         total_sales,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY month) AS rn
  FROM sales_by_category
)
SELECT DISTINCT s1.category
FROM ranked s1
JOIN ranked s2 ON s1.category = s2.category AND s1.rn = s2.rn - 1
JOIN ranked s3 ON s1.category = s3.category AND s1.rn = s3.rn - 2
WHERE s1.total_sales > s2.total_sales AND s2.total_sales > s3.total_sales;

🎯 Interview Tip

Mention that you can generalize this for any “n-month declining” trend using dynamic window logic.


🔍 Interviewer’s Analysis

  • Concept Tested: Trend detection / time-series windows
  • Difficulty: Advanced
  • Follow-Ups: Adapt to “increasing trend” or add threshold (% drop).

🪄 End of Q28.


SQL Developer

❓ Q1: Write a query using a CROSS JOIN to generate product–region combinations.

🧩 Problem Statement

Create all possible combinations between products and regions, regardless of whether sales exist. This is useful for generating sales forecast grids or target matrices.


💾 Step 1: Assume Example Tables

products

| product_id | product_name |
|---|---|
| 1 | Laptop |
| 2 | Phone |
| 3 | Tablet |

regions

| region_id | region_name |
|---|---|
| 101 | North |
| 102 | South |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

A CROSS JOIN creates a Cartesian product — every row from one table is paired with every row from the other.

🏗️ SQL Query

SELECT p.product_name, r.region_name
FROM products p
CROSS JOIN regions r;

🔍 Explanation

  • No join condition — every product is paired with every region.
  • Total combinations = product count × region count.

🧾 Step 3: Expected Output

| product_name | region_name |
|---|---|
| Laptop | North |
| Laptop | South |
| Phone | North |
| Phone | South |
| Tablet | North |
| Tablet | South |

⚙️ Step 4: Why This Query Works

CROSS JOIN doesn’t rely on keys — it systematically pairs all records.


⚡ Step 5: Common Mistakes

  • Forgetting that CROSS JOIN can explode data volume.
  • Confusing it with INNER JOIN (which requires matching condition).

🧠 Step 6: Alternate Approach

-- Equivalent syntax
SELECT p.product_name, r.region_name
FROM products p, regions r;

(Comma syntax is older and less explicit — prefer CROSS JOIN for clarity.)
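A quick SQLite sketch confirms the combination count (3 products × 2 regions = 6 rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INT, product_name TEXT)")
conn.execute("CREATE TABLE regions (region_id INT, region_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "Laptop"), (2, "Phone"), (3, "Tablet")])
conn.executemany("INSERT INTO regions VALUES (?, ?)",
                 [(101, "North"), (102, "South")])
# CROSS JOIN has no ON clause: every product is paired with every region.
combos = conn.execute("""
    SELECT p.product_name, r.region_name
    FROM products p
    CROSS JOIN regions r
    ORDER BY p.product_id, r.region_id
""").fetchall()
print(len(combos))  # 6
```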


🎯 Interview Tip

Explain when you’d use CROSS JOIN — e.g., generating all possible region-product slots before filling sales data.


🔍 Interviewer’s Analysis

  • Concept Tested: Cartesian Product Joins
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Add sales totals using LEFT JOIN after generating combinations.

🪄 End of Q1.


❓ Q2: Find students who enrolled in all available courses.

🧩 Problem Statement

Identify students who have registered for every course offered.


💾 Step 1: Assume Example Tables

students

| student_id | name |
|---|---|
| 1 | Aisha |
| 2 | Raj |
| 3 | Meena |

courses

| course_id | course_name |
|---|---|
| 101 | Math |
| 102 | Science |
| 103 | History |

enrollments

| student_id | course_id |
|---|---|
| 1 | 101 |
| 1 | 102 |
| 1 | 103 |
| 2 | 101 |
| 3 | 101 |
| 3 | 103 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We need students whose count of enrolled distinct courses = total number of courses.

🏗️ SQL Query

SELECT e.student_id
FROM enrollments e
GROUP BY e.student_id
HAVING COUNT(DISTINCT e.course_id) = (
  SELECT COUNT(*) FROM courses
);

🔍 Explanation

  • Inner query → total course count.
  • Outer query → counts how many each student has taken.
  • If equal → enrolled in all.

🧾 Step 3: Expected Output

| student_id |
|---|
| 1 |

Aisha enrolled in all 3 courses.


⚙️ Step 4: Why This Query Works

It compares per-student distinct course enrollments with global total courses.


⚡ Step 5: Common Mistakes

  • Forgetting DISTINCT (double-counts re-enrollments).
  • Comparing with wrong total (e.g., active courses only).

🧠 Step 6: Alternate Approach

SELECT s.student_id, s.name
FROM students s
WHERE NOT EXISTS (
  SELECT 1 FROM courses c
  WHERE NOT EXISTS (
    SELECT 1 FROM enrollments e
    WHERE e.student_id = s.student_id AND e.course_id = c.course_id
  )
);

Uses double NOT EXISTS logic — elegant but advanced.
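The double NOT EXISTS version reads as "there is no course this student is not enrolled in" and can be run directly on SQLite with the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INT, name TEXT);
    CREATE TABLE courses (course_id INT, course_name TEXT);
    CREATE TABLE enrollments (student_id INT, course_id INT);
    INSERT INTO students VALUES (1,'Aisha'),(2,'Raj'),(3,'Meena');
    INSERT INTO courses VALUES (101,'Math'),(102,'Science'),(103,'History');
    INSERT INTO enrollments VALUES (1,101),(1,102),(1,103),(2,101),(3,101),(3,103);
""")
# Relational division: keep students with no missing course.
rows = conn.execute("""
    SELECT s.student_id, s.name
    FROM students s
    WHERE NOT EXISTS (
      SELECT 1 FROM courses c
      WHERE NOT EXISTS (
        SELECT 1 FROM enrollments e
        WHERE e.student_id = s.student_id AND e.course_id = c.course_id
      )
    )
""").fetchall()
print(rows)  # [(1, 'Aisha')]
```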


🎯 Interview Tip

Mention that this is a classic relational division problem (subset logic).


🔍 Interviewer’s Analysis

  • Concept Tested: Grouping / Subquery comparison
  • Difficulty: Intermediate
  • Follow-Ups: Find students enrolled in at least 2 courses.

🪄 End of Q2.


❓ Q3: List managers who oversee more than 3 distinct departments.

🧩 Problem Statement

Return all managers responsible for more than three unique departments.


💾 Step 1: Assume Example Tables

departments

| dept_id | dept_name | manager_id |
|---|---|---|
| 10 | Sales | 1001 |
| 20 | HR | 1002 |
| 30 | Tech | 1001 |
| 40 | Marketing | 1001 |
| 50 | Finance | 1003 |
| 60 | Support | 1001 |

managers

| manager_id | name |
|---|---|
| 1001 | Amit Kumar |
| 1002 | Neha Sharma |
| 1003 | David Roy |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We count departments per manager and filter where count > 3.

🏗️ SQL Query

SELECT m.manager_id, m.name, COUNT(DISTINCT d.dept_id) AS dept_count
FROM managers m
JOIN departments d ON m.manager_id = d.manager_id
GROUP BY m.manager_id, m.name
HAVING COUNT(DISTINCT d.dept_id) > 3;

🔍 Explanation

  • JOIN ensures managers linked to their departments.
  • COUNT(DISTINCT dept_id) avoids duplicates.
  • HAVING filters based on aggregate.

🧾 Step 3: Expected Output

| manager_id | name | dept_count |
|---|---|---|
| 1001 | Amit Kumar | 4 |

⚙️ Step 4: Why This Query Works

Group aggregation summarizes data by manager; HAVING filters after grouping.


⚡ Step 5: Common Mistakes

  • Using WHERE instead of HAVING.
  • Forgetting DISTINCT for multiple department duplicates.

🧠 Step 6: Alternate Approach

To check managers who oversee exactly 3 departments:

HAVING COUNT(DISTINCT d.dept_id) = 3;

🎯 Interview Tip

Always clarify whether to count distinct or total departments — crucial difference in real-world data.


🔍 Interviewer’s Analysis

  • Concept Tested: Grouped aggregation + HAVING
  • Difficulty: Intermediate
  • Follow-Ups: Include average salary per manager’s team.

🪄 End of Q3.


❓ Q4: Identify employees who earn less than their managers.

🧩 Problem Statement

Show employees whose salary is lower than their manager’s.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | manager_id | salary |
|---|---|---|---|
| 1 | Raj Patel | 4 | 60000 |
| 2 | Aisha Khan | 4 | 75000 |
| 3 | Meena Joshi | 5 | 80000 |
| 4 | David Kumar | NULL | 95000 |
| 5 | Neha Sharma | NULL | 90000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Concept: Self-join comparing employee’s salary to manager’s salary.

🏗️ SQL Query

SELECT e.name AS employee_name,
       e.salary AS employee_salary,
       m.name AS manager_name,
       m.salary AS manager_salary
FROM employees e
JOIN employees m
  ON e.manager_id = m.emp_id
WHERE e.salary < m.salary;

🔍 Explanation

  • Join employees table to itself — employees (e) vs managers (m).
  • Compare salaries across the relationship.

🧾 Step 3: Expected Output

| employee_name | employee_salary | manager_name | manager_salary |
|---|---|---|---|
| Raj Patel | 60000 | David Kumar | 95000 |
| Aisha Khan | 75000 | David Kumar | 95000 |
| Meena Joshi | 80000 | Neha Sharma | 90000 |

⚙️ Step 4: Why This Query Works

Self-joins allow hierarchical comparisons between rows in the same table.


⚡ Step 5: Common Mistakes

  • Forgetting to alias tables.
  • Missing WHERE e.salary < m.salary condition — leads to all pairs.

🧠 Step 6: Alternate Approach

-- Using subquery
SELECT name
FROM employees e
WHERE salary < (
  SELECT salary
  FROM employees
  WHERE emp_id = e.manager_id
);

Simpler but less efficient on large datasets.
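The self-join runs as-is on SQLite; a sketch with the sample rows confirms the three expected matches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, name TEXT, manager_id INT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Raj Patel", 4, 60000), (2, "Aisha Khan", 4, 75000),
    (3, "Meena Joshi", 5, 80000), (4, "David Kumar", None, 95000),
    (5, "Neha Sharma", None, 90000),
])
# Alias the same table twice: e = employee rows, m = their manager rows.
rows = conn.execute("""
    SELECT e.name, m.name
    FROM employees e
    JOIN employees m ON e.manager_id = m.emp_id
    WHERE e.salary < m.salary
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```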


🎯 Interview Tip

Explain why you chose a join vs subquery — discuss readability and performance trade-offs.


🔍 Interviewer’s Analysis

  • Concept Tested: Self-Join / Hierarchical Comparison
  • Difficulty: Intermediate
  • Follow-Ups: Add tie condition or average comparison per manager.

🪄 End of Q4.


❓ Q5: Find departments where the average salary exceeds the company average.

🧩 Problem Statement

List all departments whose average employee salary is higher than the overall company average salary.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 95000 |
| 5 | Deepa Singh | 30 | 55000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We compare:

  1. Average salary per department.
  2. Overall company average. Return departments where (1) > (2).

🏗️ SQL Query

SELECT dept_id,
       AVG(salary) AS dept_avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > (
  SELECT AVG(salary) FROM employees
);

🔍 Explanation

  • Inner query → company-wide average salary.
  • Outer → computes per-department average.
  • HAVING compares group aggregate with scalar subquery.

🧾 Step 3: Expected Output

| dept_id | dept_avg_salary |
|---|---|
| 20 | 92500 |

(Dept 20’s avg = 92,500 > company avg = 76,000.)


⚙️ Step 4: Why This Query Works

The HAVING clause filters aggregated results post-grouping, comparing each department’s average to the global mean.


⚡ Step 5: Common Mistakes

  • Using WHERE with aggregate → invalid (must use HAVING).
  • Forgetting parentheses around subquery.

🧠 Step 6: Alternate Approach

WITH dept_avg AS (
  SELECT dept_id, AVG(salary) AS dept_avg_salary
  FROM employees
  GROUP BY dept_id
),
company_avg AS (
  SELECT AVG(salary) AS company_avg_salary
  FROM employees
)
SELECT d.dept_id, d.dept_avg_salary
FROM dept_avg d, company_avg c
WHERE d.dept_avg_salary > c.company_avg_salary;

CTE version — more readable for large analyses.


🎯 Interview Tip

Always show you can articulate multi-level aggregation logic — Interviewers value why you chose a subquery vs CTE.


🔍 Interviewer’s Analysis

  • Concept Tested: Aggregation comparison / HAVING with subquery
  • Difficulty: Intermediate
  • Follow-Ups: Add department names via join; compare median instead of average.

🪄 End of Q5.


❓ Q6: Find employees earning more than their department’s average.

🧩 Problem Statement

Show all employees whose salary is above the average salary of their own department.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 75000 |
| 5 | Deepa Singh | 30 | 55000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We need each employee's department average for comparison. A correlated subquery computes it per row; a window function (Step 6) is the cleaner alternative.

🏗️ SQL Query

SELECT emp_id,
       name,
       dept_id,
       salary,
       ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees e2
  WHERE e2.dept_id = employees.dept_id
);

🔍 Explanation

  • Inner subquery calculates per-department average dynamically.
  • Outer query keeps only those above it.

🧾 Step 3: Expected Output

| emp_id | name | dept_id | salary | dept_avg |
|---|---|---|---|---|
| 2 | Ananya Roy | 10 | 80000 | 70000.00 |
| 3 | Bhavna Rai | 20 | 90000 | 82500.00 |

⚙️ Step 4: Why This Query Works

Each employee’s salary is compared against the average of their department group, computed independently per row.


⚡ Step 5: Common Mistakes

  • Omitting alias in subquery → ambiguous dept_id.
  • Forgetting to group subquery by department → wrong aggregate.

🧠 Step 6: Alternate Approach

-- Using window function (cleaner and faster)
SELECT emp_id,
       name,
       dept_id,
       salary,
       AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees
QUALIFY salary > AVG(salary) OVER (PARTITION BY dept_id);

(Snowflake/BigQuery syntax; QUALIFY filters windowed results.)


🎯 Interview Tip

Mention that this is a common “above-average performer” query — interviewers expect you to know both the subquery and window methods.


🔍 Interviewer’s Analysis

  • Concept Tested: Correlated subquery / window partition
  • Difficulty: Intermediate
  • Follow-Ups: Rank employees within each department by salary.

🪄 End of Q6.


❓ Q7: Calculate the cumulative percentage contribution of sales per region.

🧩 Problem Statement

Display each region’s total sales and its cumulative percentage of the company’s overall sales.


💾 Step 1: Assume Example Tables

sales

| region | total_sales |
|---|---|
| North | 50000 |
| South | 30000 |
| East | 15000 |
| West | 5000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use:

  1. SUM() for total sales.
  2. SUM() OVER(ORDER BY …) for cumulative sum.
  3. Division to calculate cumulative percentage.

🏗️ SQL Query

SELECT region,
       total_sales,
       SUM(total_sales) OVER (ORDER BY total_sales DESC) AS cumulative_sales,
       ROUND(
         (SUM(total_sales) OVER (ORDER BY total_sales DESC) * 100.0) /
         SUM(total_sales) OVER (),
         2
       ) AS cumulative_percent
FROM sales
ORDER BY total_sales DESC;

🔍 Explanation

  • Sort regions by total sales (descending).
  • Compute running total using window sum.
  • Divide by overall total to get cumulative percentage.

🧾 Step 3: Expected Output

| region | total_sales | cumulative_sales | cumulative_percent |
|---|---|---|---|
| North | 50000 | 50000 | 50.00 |
| South | 30000 | 80000 | 80.00 |
| East | 15000 | 95000 | 95.00 |
| West | 5000 | 100000 | 100.00 |

⚙️ Step 4: Why This Query Works

Window SUM() functions let us calculate both running totals and percent contributions in a single pass.


⚡ Step 5: Common Mistakes

  • Omitting OVER() turns the window SUM into a plain aggregate, which then requires GROUP BY.
  • Missing ORDER BY inside OVER() → cumulative values incorrect.
  • Dividing by the per-row total instead of the global total.

🧠 Step 6: Alternate Approach

-- Simpler: just percentage of total
SELECT region,
       total_sales,
       ROUND(total_sales * 100.0 / SUM(total_sales) OVER (), 2) AS percent_of_total
FROM sales;

Gives each region’s individual contribution, not cumulative.
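The running-total query works unchanged on SQLite (window functions require 3.25+); a sketch with the sample rows reproduces the cumulative percentages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total_sales INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 50000), ("South", 30000), ("East", 15000), ("West", 5000)])
# Running total (ordered window) divided by grand total (empty OVER()).
rows = conn.execute("""
    SELECT region,
           ROUND(SUM(total_sales) OVER (ORDER BY total_sales DESC) * 100.0
                 / SUM(total_sales) OVER (), 2) AS cumulative_percent
    FROM sales
    ORDER BY total_sales DESC
""").fetchall()
print(rows)  # [('North', 50.0), ('South', 80.0), ('East', 95.0), ('West', 100.0)]
```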


🎯 Interview Tip

Explain how cumulative contribution is used for Pareto analysis (80/20 rule) — key business insight.


🔍 Interviewer’s Analysis

  • Concept Tested: Windowed SUM / Percentage contribution
  • Difficulty: Intermediate → Advanced
  • Follow-Ups: Filter top regions contributing 80% of total sales.

🪄 End of Q7.


❓ Q8: Show employee performance rank reset every quarter.

🧩 Problem Statement

Rank employees based on performance score, but restart the ranking every quarter.


💾 Step 1: Assume Example Tables

employee_performance

| emp_id | quarter | score |
|---|---|---|
| 1 | Q1 | 88 |
| 2 | Q1 | 75 |
| 3 | Q1 | 92 |
| 1 | Q2 | 91 |
| 2 | Q2 | 84 |
| 3 | Q2 | 78 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We use RANK() with PARTITION BY quarter to reset per quarter.

🏗️ SQL Query

SELECT emp_id,
       quarter,
       score,
       RANK() OVER (
         PARTITION BY quarter
         ORDER BY score DESC
       ) AS quarterly_rank
FROM employee_performance
ORDER BY quarter, quarterly_rank;

🔍 Explanation

  • PARTITION BY quarter → restarts rank each quarter.
  • ORDER BY score DESC → higher score = higher rank.

🧾 Step 3: Expected Output

| emp_id | quarter | score | quarterly_rank |
|---|---|---|---|
| 3 | Q1 | 92 | 1 |
| 1 | Q1 | 88 | 2 |
| 2 | Q1 | 75 | 3 |
| 1 | Q2 | 91 | 1 |
| 2 | Q2 | 84 | 2 |
| 3 | Q2 | 78 | 3 |

⚙️ Step 4: Why This Query Works

Window partitioning divides data by quarter; within each group, ranking restarts independently.


⚡ Step 5: Common Mistakes

  • Forgetting PARTITION BY → rank continues across all quarters.
  • Using ORDER BY in main query but not in OVER() → wrong rank order.

🎯 Interview Tip

This is a classic “rank reset” question — emphasize PARTITION BY logic clearly.


🔍 Interviewer’s Analysis

  • Concept Tested: Partitioned Ranking
  • Difficulty: Intermediate
  • Follow-Ups: Add tie-handling (switch RANK to DENSE_RANK).

🪄 End of Q8.


❓ Q9: Calculate monthly revenue change (month-over-month).

🧩 Problem Statement

Show each month’s revenue and how much it changed compared to the previous month.


💾 Step 1: Assume Example Tables

monthly_sales

| month | revenue |
|---|---|
| Jan | 50000 |
| Feb | 60000 |
| Mar | 55000 |
| Apr | 70000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use LAG() to access previous month’s revenue, then compute difference.

🏗️ SQL Query

SELECT month,
       revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) AS change_amount,
       ROUND(
         ((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0) /
         LAG(revenue) OVER (ORDER BY month), 2
       ) AS change_percent
FROM monthly_sales;

🔍 Explanation

  • LAG() fetches the previous month’s revenue.
  • Subtracts to get absolute and percentage change.

🧾 Step 3: Expected Output

| month | revenue | prev_revenue | change_amount | change_percent |
|---|---|---|---|---|
| Jan | 50000 | NULL | NULL | NULL |
| Feb | 60000 | 50000 | 10000 | 20.00 |
| Mar | 55000 | 60000 | -5000 | -8.33 |
| Apr | 70000 | 55000 | 15000 | 27.27 |

⚙️ Step 4: Why This Query Works

LAG() makes it possible to calculate time-based deltas without self-joins.


⚡ Step 5: Common Mistakes

  • Forgetting to order by month (causes random differences).
  • Storing month as text ('Jan', 'Feb', …): it sorts alphabetically, so ORDER BY month is not chronological. Use a date or numeric month column.
  • Using integer division for percentages.

🎯 Interview Tip

Always mention how to handle missing months or gaps in time series.


🔍 Interviewer’s Analysis

  • Concept Tested: LAG / Sequential Comparison
  • Difficulty: Intermediate
  • Follow-Ups: Calculate cumulative MoM growth over time.

🪄 End of Q9.


❓ Q10: Calculate YoY growth for each product.

🧩 Problem Statement

For each product, compute its year-over-year (YoY) percentage growth.


💾 Step 1: Assume Example Tables

sales

| product_id | year | revenue |
|---|---|---|
| 1 | 2022 | 50000 |
| 1 | 2023 | 65000 |
| 1 | 2024 | 70000 |
| 2 | 2022 | 30000 |
| 2 | 2023 | 33000 |
| 2 | 2024 | 30000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We use LAG() within each product partition to compare consecutive years.

🏗️ SQL Query

SELECT product_id,
       year,
       revenue,
       LAG(revenue) OVER (
         PARTITION BY product_id
         ORDER BY year
       ) AS prev_year_revenue,
       ROUND(
         ((revenue - LAG(revenue) OVER (PARTITION BY product_id ORDER BY year)) * 100.0) /
         LAG(revenue) OVER (PARTITION BY product_id ORDER BY year),
         2
       ) AS yoy_growth_percent
FROM sales
ORDER BY product_id, year;

🔍 Explanation

  • PARTITION BY product_id ensures per-product calculations.
  • The LAG() retrieves prior year’s revenue for the same product.

🧾 Step 3: Expected Output

| product_id | year | revenue | prev_year_revenue | yoy_growth_percent |
|---|---|---|---|---|
| 1 | 2022 | 50000 | NULL | NULL |
| 1 | 2023 | 65000 | 50000 | 30.00 |
| 1 | 2024 | 70000 | 65000 | 7.69 |
| 2 | 2022 | 30000 | NULL | NULL |
| 2 | 2023 | 33000 | 30000 | 10.00 |
| 2 | 2024 | 30000 | 33000 | -9.09 |

⚙️ Step 4: Why This Query Works

It performs rolling comparisons within logical groups (per product) — no self-join needed.


⚡ Step 5: Common Mistakes

  • Missing PARTITION BY → compares across products.
  • Dividing by 0 when previous revenue = 0 (handle with NULLIF).

🧠 Step 6: Alternate Approach

-- Join-based version
SELECT a.product_id, a.year,
       ROUND(((a.revenue - b.revenue) * 100.0) / b.revenue, 2) AS yoy_growth
FROM sales a
JOIN sales b
  ON a.product_id = b.product_id AND a.year = b.year + 1;

Works but slower for large tables.


🎯 Interview Tip

Always mention partitioning when discussing window functions — it’s the magic word interviewers love.


🔍 Interviewer’s Analysis

  • Concept Tested: Partitioned LAG / Ratio Calculations
  • Difficulty: Intermediate → Advanced
  • Follow-Ups: Handle missing years (2022 → 2024 jump).

🪄 End of Q10.


❓ Q11: Display products with 3 consecutive months of sales decline.

🧩 Problem Statement

Identify products whose monthly sales dropped 3 times in a row.


💾 Step 1: Assume Example Tables

monthly_sales

| product_id | month | sales |
|---|---|---|
| 1 | Jan | 1000 |
| 1 | Feb | 900 |
| 1 | Mar | 800 |
| 1 | Apr | 700 |
| 2 | Jan | 1200 |
| 2 | Feb | 1300 |
| 2 | Mar | 1250 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use multiple LAG() windows to detect a downward chain.

🏗️ SQL Query

SELECT DISTINCT product_id
FROM (
  SELECT product_id,
         month,
         sales,
         LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_sales,
         LAG(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS prev2_sales
  FROM monthly_sales
) t
WHERE sales < prev_sales AND prev_sales < prev2_sales;

🔍 Explanation

  • Two lagged comparisons ensure 3 consecutive months of decline.
  • The outer filter checks monotonic decreasing trend.

🧾 Step 3: Expected Output

| product_id |
|---|
| 1 |

⚙️ Step 4: Why This Query Works

Sequential window comparisons reveal downward patterns efficiently — no need for recursive logic.


⚡ Step 5: Common Mistakes

  • Forgetting to partition by product_id.
  • Using < inconsistently (e.g., <= creates false positives for flat months).
  • Month stored as text sorts alphabetically; the ORDER BY inside LAG() must be chronological (use a date or numeric month).

🧠 Step 6: Alternate Approach

-- Using row numbers and self-join
WITH ranked AS (
  SELECT product_id, month, sales,
         ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY month) AS rn
  FROM monthly_sales
)
SELECT DISTINCT r1.product_id
FROM ranked r1
JOIN ranked r2 ON r1.product_id = r2.product_id AND r1.rn = r2.rn - 1
JOIN ranked r3 ON r1.product_id = r3.product_id AND r1.rn = r3.rn - 2
WHERE r1.sales > r2.sales AND r2.sales > r3.sales;

More verbose but works in all SQL dialects.
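A SQLite sketch of the LAG-chaining approach, using numeric months so the window's ORDER BY is chronological (the text months in the sample table would sort alphabetically):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (product_id INT, month INT, sales INT)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", [
    (1, 1, 1000), (1, 2, 900), (1, 3, 800), (1, 4, 700),  # steady decline
    (2, 1, 1200), (2, 2, 1300), (2, 3, 1250),
])
# Two lagged columns per row; three-in-a-row decline when both inequalities hold.
rows = conn.execute("""
    SELECT DISTINCT product_id
    FROM (
      SELECT product_id, sales,
             LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_sales,
             LAG(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS prev2_sales
      FROM monthly_sales
    )
    WHERE sales < prev_sales AND prev_sales < prev2_sales
""").fetchall()
print(rows)  # [(1,)]
```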


🎯 Interview Tip

Mention this query demonstrates trend analysis — often used for detecting declining products or churn.


🔍 Interviewer’s Analysis

  • Concept Tested: Consecutive trend detection / LAG chaining
  • Difficulty: Advanced
  • Follow-Ups: Detect 3 consecutive increases; detect “flat” patterns.

🪄 End of Q11.


❓ Q12: Assign ROW_NUMBER to employees ordered by salary.

🧩 Problem Statement

Give each employee a unique sequential number based on their salary order (highest first).


💾 Step 1: Assume Example Tables

employees

| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 60000 |
| 2 | Ananya Roy | 80000 |
| 3 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 70000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use ROW_NUMBER() to generate unique sequence numbers based on salary order.

🏗️ SQL Query

SELECT emp_id,
       name,
       salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;

🔍 Explanation

  • ORDER BY salary DESC sorts employees by salary.
  • ROW_NUMBER() assigns unique numbers starting at 1.

🧾 Step 3: Expected Output

| emp_id | name | salary | salary_rank |
|---|---|---|---|
| 3 | Bhavna Rai | 90000 | 1 |
| 2 | Ananya Roy | 80000 | 2 |
| 4 | Chetan Joshi | 70000 | 3 |
| 1 | Amit Sharma | 60000 | 4 |

⚙️ Step 4: Why This Query Works

Unlike RANK(), ROW_NUMBER() always increments sequentially — even for ties.


⚡ Step 5: Common Mistakes

  • Using RANK() expecting sequential numbers — produces gaps.
  • Forgetting to specify ORDER BY → arbitrary numbering.

🧠 Step 6: Alternate Approach

-- Add department partition
SELECT emp_id, name, dept_id, salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_salary_rank
FROM employees;
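Since tie handling is the classic follow-up, here is a SQLite sketch comparing all three ranking functions. Note: Chetan's salary is changed to 80000 here (not the original table's value) purely to create a tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Amit Sharma", 60000), (2, "Ananya Roy", 80000),
    (3, "Bhavna Rai", 90000), (4, "Chetan Joshi", 80000),  # tie with Ananya
])
rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rownum,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
    FROM employees
    ORDER BY rownum
""").fetchall()
# ROW_NUMBER keeps counting through the tie, RANK skips a value after it,
# DENSE_RANK does not skip.
for r in rows:
    print(r)
```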

🎯 Interview Tip

Interviewers often ask, “What’s the difference between ROW_NUMBER, RANK, and DENSE_RANK?” Be ready to explain how they handle ties.


🔍 Interviewer’s Analysis

  • Concept Tested: Window Ranking / ROW_NUMBER
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Find top-N per department using ROW_NUMBER in subquery.

🪄 End of Q12.


❓ Q13: Delete duplicate rows from a table.

🧩 Problem Statement

Remove duplicate records from a table while keeping only one instance of each unique row.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | email |
|---|---|---|
| 1 | Aisha | aisha@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |

(Notice duplicates for Raj and Meena.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use a window function to assign row numbers to duplicates, then delete all but the first.

🏗️ SQL Query (Generic SQL; assumes each row has a unique id)

DELETE FROM customers
WHERE customer_id NOT IN (
  SELECT MIN(customer_id)
  FROM customers
  GROUP BY name, email
);

Caveat: in the sample above the duplicate rows share the same customer_id, so MIN(customer_id) cannot tell them apart. In that case use a physical row identifier (SQLite rowid, PostgreSQL ctid) or the ROW_NUMBER() approach.

🏗️ SQL Query (SQL Server, which allows DELETE through a CTE)

WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY customer_id) AS rn
  FROM customers
)
DELETE FROM ranked WHERE rn > 1;

In PostgreSQL a CTE can't be the DELETE target; select ctid in the CTE and write DELETE FROM customers WHERE ctid IN (SELECT ctid FROM ranked WHERE rn > 1) instead.

🔍 Explanation

  • Group duplicates by logical uniqueness (name, email).
  • Keep the first row, delete others.
  • ROW_NUMBER() ensures deterministic deletion.

🧾 Step 3: Expected Output

| customer_id | name | email |
|-------------|------|-------|
| 1 | Aisha | aisha@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |

⚙️ Step 4: Why This Query Works

The ranking-based approach ensures exactly one record per group remains.


⚡ Step 5: Common Mistakes

  • Forgetting to partition by correct columns.
  • Running DELETE without filtering — wiping out all rows.

🎯 Interview Tip

Interviewers often ask: “How would you do this without CTEs?” — Be ready to explain both approaches (group aggregate + window).
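Since customer_id itself is duplicated in this sample, a practical variant leans on a hidden row identifier. Here is a sketch in Python with sqlite3, where SQLite's implicit rowid plays that role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, name TEXT, email TEXT);
INSERT INTO customers VALUES
  (1, 'Aisha', 'aisha@gmail.com'),
  (2, 'Raj',   'raj@gmail.com'),
  (3, 'Meena', 'meena@gmail.com'),
  (2, 'Raj',   'raj@gmail.com'),
  (3, 'Meena', 'meena@gmail.com');
""")

# customer_id is duplicated, so group on the hidden rowid instead
conn.execute("""
DELETE FROM customers
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM customers GROUP BY name, email
)
""")

remaining = conn.execute(
    "SELECT customer_id, name, email FROM customers ORDER BY customer_id"
).fetchall()
print(remaining)
```

Exactly one row per (name, email) group survives, matching the expected output above.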


🔍 Interviewer’s Analysis

  • Concept Tested: Deduplication / Window filtering
  • Difficulty: Intermediate
  • Follow-Ups: Preserve the latest record (use ORDER BY in ROW_NUMBER()).

🪄 End of Q13.


❓ Q14: Keep only the first occurrence (lowest ID) of duplicates.

🧩 Problem Statement

From a table with duplicate entries, retain only the first row (lowest ID) for each unique combination.


💾 Step 1: Assume Example Tables

transactions

| trans_id | user_id | product_id | amount |
|----------|---------|------------|--------|
| 1 | 101 | P1 | 500 |
| 2 | 101 | P1 | 500 |
| 3 | 101 | P2 | 300 |
| 4 | 102 | P3 | 700 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We want the earliest record per user-product pair.

🏗️ SQL Query

DELETE FROM transactions
WHERE trans_id NOT IN (
  SELECT MIN(trans_id)
  FROM transactions
  GROUP BY user_id, product_id
);

🔍 Explanation

  • MIN(trans_id) identifies the first occurrence per group.
  • Deletes all others with higher IDs.

🧾 Step 3: Expected Output

| trans_id | user_id | product_id | amount |
|----------|---------|------------|--------|
| 1 | 101 | P1 | 500 |
| 3 | 101 | P2 | 300 |
| 4 | 102 | P3 | 700 |

⚙️ Step 4: Why This Query Works

Using the MIN() of unique identifiers ensures deterministic retention.


⚡ Step 5: Common Mistakes

  • Using DISTINCT instead of deletion — doesn’t remove from table.
  • Forgetting composite uniqueness (user_id + product_id).

🎯 Interview Tip

In production, never DELETE directly — test first with SELECT to confirm affected rows.
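That SELECT-first discipline can be scripted: build the predicate once, preview the doomed rows, then reuse the identical predicate in the DELETE. A sketch with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (trans_id INT, user_id INT, product_id TEXT, amount INT);
INSERT INTO transactions VALUES
  (1, 101, 'P1', 500),
  (2, 101, 'P1', 500),
  (3, 101, 'P2', 300),
  (4, 102, 'P3', 700);
""")

dedup_filter = """
trans_id NOT IN (
  SELECT MIN(trans_id) FROM transactions GROUP BY user_id, product_id
)"""

# Step 1: preview the rows the DELETE would remove
doomed = conn.execute(
    f"SELECT trans_id FROM transactions WHERE {dedup_filter}").fetchall()
print("will delete:", doomed)

# Step 2: run the DELETE with the identical predicate
conn.execute(f"DELETE FROM transactions WHERE {dedup_filter}")
kept = [r[0] for r in conn.execute(
    "SELECT trans_id FROM transactions ORDER BY trans_id")]
print("kept:", kept)
```

Only trans_id 2 (the duplicate of 1) is removed; rows 1, 3, and 4 remain.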


🔍 Interviewer’s Analysis

  • Concept Tested: Deduplication via group min/max
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Keep latest record (replace MIN() with MAX()).

🪄 End of Q14.


❓ Q15: Standardize phone numbers (remove special characters).

🧩 Problem Statement

Clean messy phone numbers by removing -, (, ), spaces, and converting to a uniform 10-digit string.


💾 Step 1: Assume Example Tables

customers

| customer_id | name | phone |
|-------------|------|-------|
| 1 | Raj | (123)-456-7890 |
| 2 | Aisha | 123 456 7890 |
| 3 | Meena | 123.456.7890 |
| 4 | Amit | +91-98765-43210 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use nested REPLACE() or regex functions to strip unwanted characters.

🏗️ SQL Query (MySQL)

SELECT customer_id,
       name,
       REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
         phone, '-', ''), '(', ''), ')', ''), ' ', ''), '.', ''), '+', '') AS clean_phone
FROM customers;

(The sample data also contains '.' and '+', so those must be stripped as well.)

🏗️ SQL Query (PostgreSQL)

SELECT customer_id,
       name,
       REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS clean_phone
FROM customers;

🔍 Explanation

  • Regex removes all non-digit characters.
  • Results in consistent numeric-only phone format.

🧾 Step 3: Expected Output

| customer_id | name | clean_phone |
|-------------|------|-------------|
| 1 | Raj | 1234567890 |
| 2 | Aisha | 1234567890 |
| 3 | Meena | 1234567890 |
| 4 | Amit | 919876543210 |

⚙️ Step 4: Why This Query Works

The regex [^0-9] removes every character that isn’t a digit, producing clean phone data for analytics or joining.


⚡ Step 5: Common Mistakes

  • Forgetting the 'g' flag (global replace).
  • Not handling international codes consistently.

🎯 Interview Tip

Mention you’d validate lengths post-cleanup (10 or 12 digits) before loading into systems.
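The regex approach plus the length check suggested above can be sketched in plain Python (the 10-or-12-digit rule here is an assumption for this sample):

```python
import re

phones = {
    "Raj":   "(123)-456-7890",
    "Aisha": "123 456 7890",
    "Meena": "123.456.7890",
    "Amit":  "+91-98765-43210",
}

# Same idea as REGEXP_REPLACE(phone, '[^0-9]', '', 'g'): keep digits only
cleaned = {name: re.sub(r"[^0-9]", "", raw) for name, raw in phones.items()}

# Post-cleanup validation: bare 10-digit numbers or 12 digits with a country code
valid = {name: len(num) in (10, 12) for name, num in cleaned.items()}

print(cleaned)
print(valid)
```

All four messy inputs collapse to digit-only strings, and every one passes the length check.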


🔍 Interviewer’s Analysis

  • Concept Tested: String cleaning / regex
  • Difficulty: Intermediate
  • Follow-Ups: Enforce specific length via CHAR_LENGTH().

🪄 End of Q15.


❓ Q16: Split address into street, city, and state.

🧩 Problem Statement

Given a combined address column, extract its components: street, city, and state.


💾 Step 1: Assume Example Tables

addresses

| id | address |
|----|---------|
| 1 | 123 Main St, Mumbai, MH |
| 2 | 456 Oak Road, Delhi, DL |
| 3 | 789 Palm Ave, Bangalore, KA |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll use string-splitting functions such as SUBSTRING_INDEX() or SPLIT_PART().

🏗️ SQL Query (MySQL)

SELECT id,
       TRIM(SUBSTRING_INDEX(address, ',', 1)) AS street,
       TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 2), ',', -1)) AS city,
       TRIM(SUBSTRING_INDEX(address, ',', -1)) AS state
FROM addresses;

🏗️ SQL Query (PostgreSQL)

SELECT id,
       SPLIT_PART(address, ',', 1) AS street,
       SPLIT_PART(address, ',', 2) AS city,
       SPLIT_PART(address, ',', 3) AS state
FROM addresses;

🔍 Explanation

  • Split the string by commas.
  • Extract the 1st, 2nd, and 3rd parts.

🧾 Step 3: Expected Output

| id | street | city | state |
|----|--------|------|-------|
| 1 | 123 Main St | Mumbai | MH |
| 2 | 456 Oak Road | Delhi | DL |
| 3 | 789 Palm Ave | Bangalore | KA |

⚙️ Step 4: Why This Query Works

The SPLIT_PART() or SUBSTRING_INDEX() functions allow parsing multi-part text columns into normalized attributes.


⚡ Step 5: Common Mistakes

  • Forgetting to trim spaces.
  • Assuming consistent delimiters — real data often varies.

🎯 Interview Tip

Mention that in ETL pipelines, this logic often moves to staging tables to avoid slowing the main DB.
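The same three-part split, with a guard for the inconsistent delimiters mentioned above, can be sketched in Python:

```python
addresses = {
    1: "123 Main St, Mumbai, MH",
    2: "456 Oak Road, Delhi, DL",
    3: "789 Palm Ave, Bangalore, KA",
}

parsed = {}
for addr_id, addr in addresses.items():
    # Split on commas and trim spaces, mirroring SPLIT_PART + TRIM
    parts = [p.strip() for p in addr.split(",")]
    if len(parts) != 3:
        # real data often has missing or extra delimiters; route these to review
        continue
    parsed[addr_id] = dict(zip(("street", "city", "state"), parts))

print(parsed)
```

In an ETL staging step, the rows that fail the three-part check would land in a rejects table rather than being silently dropped.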


🔍 Interviewer’s Analysis

  • Concept Tested: String splitting / text parsing
  • Difficulty: Intermediate
  • Follow-Ups: Handle inconsistent delimiters (“;” or “|”).

🪄 End of Q16.


❓ Q17: Convert customer names into Proper Case.

🧩 Problem Statement

Standardize customer names so that each word starts with an uppercase letter.


💾 Step 1: Assume Example Tables

customers

| customer_id | name |
|-------------|------|
| 1 | raj patel |
| 2 | AISHA ROY |
| 3 | meena singh |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Use text transformation functions — varies by SQL dialect.

🏗️ SQL Query (PostgreSQL)

SELECT customer_id,
       INITCAP(name) AS proper_name
FROM customers;

🏗️ SQL Query (MySQL)

SELECT customer_id,
       CONCAT(UCASE(LEFT(name, 1)), LCASE(SUBSTRING(name, 2))) AS proper_name
FROM customers;

🔍 Explanation

  • INITCAP() capitalizes the first letter of each word.
  • MySQL has no INITCAP(); the substring version capitalizes only the first character of the whole string, so multi-word names need a stored function or application-side logic.

🧾 Step 3: Expected Output

| customer_id | proper_name |
|-------------|-------------|
| 1 | Raj Patel |
| 2 | Aisha Roy |
| 3 | Meena Singh |

⚙️ Step 4: Why This Query Works

Transforms messy casing into consistent, human-readable names.


⚡ Step 5: Common Mistakes

  • Ignoring multi-word names in MySQL (only first letter handled).
  • Forgetting locale-specific capitalization rules.

🎯 Interview Tip

Mention that standardized text improves joins, reporting, and ML data quality.
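The gap between INITCAP() and the single-character MySQL version shows up immediately in a Python sketch (per-word str.capitalize standing in for INITCAP()):

```python
names = ["raj patel", "AISHA ROY", "meena singh"]

# INITCAP-style: capitalize every word
initcap = [" ".join(w.capitalize() for w in n.split()) for n in names]

# The one-shot MySQL version: only the first character of the whole string
first_letter_only = [n[:1].upper() + n[1:].lower() for n in names]

print(initcap)
print(first_letter_only)
```

The second list leaves surnames lowercase ("Raj patel"), which is exactly the multi-word pitfall called out in Step 5.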


🔍 Interviewer’s Analysis

  • Concept Tested: String transformation
  • Difficulty: Beginner
  • Follow-Ups: Extend to addresses or product titles.

🪄 End of Q17.


❓ Q18: Design tables to store credit card transactions with normalization.

🧩 Problem Statement

Design a normalized schema for storing credit card transactions to ensure data consistency, minimal redundancy, and fast lookups.


💾 Step 1: Conceptual Breakdown

We’re modeling:

  • Customers — who own cards
  • Cards — issued to customers
  • Merchants — where transactions occur
  • Transactions — main fact table

We’ll normalize to 3NF (Third Normal Form):

  • Each table has a unique key.
  • No transitive dependencies.

🧮 Step 2: Table Designs

1️⃣ customers

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100) UNIQUE,
  phone VARCHAR(15)
);

2️⃣ cards

CREATE TABLE cards (
  card_id INT PRIMARY KEY,
  customer_id INT,
  card_number CHAR(16) UNIQUE,
  issue_date DATE,
  expiry_date DATE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

3️⃣ merchants

CREATE TABLE merchants (
  merchant_id INT PRIMARY KEY,
  name VARCHAR(100),
  category VARCHAR(50),
  city VARCHAR(50)
);

4️⃣ transactions

CREATE TABLE transactions (
  transaction_id BIGINT PRIMARY KEY,
  card_id INT,
  merchant_id INT,
  transaction_date TIMESTAMP,
  amount DECIMAL(10, 2),
  status VARCHAR(20),
  FOREIGN KEY (card_id) REFERENCES cards(card_id),
  FOREIGN KEY (merchant_id) REFERENCES merchants(merchant_id)
);

🧾 Step 3: Sample Data Flow

customers

| customer_id | name | email |
|-------------|------|-------|
| 1 | Raj | raj@bank.com |
| 2 | Aisha | aisha@bank.com |

cards

| card_id | customer_id | card_number |
|---------|-------------|-------------|
| 10 | 1 | 1234567812345678 |
| 11 | 2 | 8765432187654321 |

merchants

| merchant_id | name | category | city |
|-------------|------|----------|------|
| 101 | Amazon | E-Com | Mumbai |
| 102 | Starbucks | Food | Delhi |

transactions

| transaction_id | card_id | merchant_id | amount | status |
|----------------|---------|-------------|--------|--------|
| 9001 | 10 | 101 | 1200.50 | Success |
| 9002 | 10 | 102 | 350.00 | Success |

⚙️ Step 4: Why This Schema Works

  • Fully normalized → no duplication.
  • Relationships clearly defined with foreign keys.
  • Easy joins for transaction analysis and fraud detection.

🎯 Interview Tip

Interviewers often ask: “How would you denormalize this for analytics?” → Create a materialized view joining all four tables for dashboard reads.
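As a sanity check, a trimmed version of this schema (fewer columns, sample values assumed) can be built in SQLite via Python, exercising both foreign-key enforcement and the four-table analytics join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE);
CREATE TABLE cards (
  card_id INTEGER PRIMARY KEY, customer_id INTEGER, card_number TEXT UNIQUE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id));
CREATE TABLE merchants (merchant_id INTEGER PRIMARY KEY, name TEXT, category TEXT, city TEXT);
CREATE TABLE transactions (
  transaction_id INTEGER PRIMARY KEY, card_id INTEGER, merchant_id INTEGER, amount REAL,
  FOREIGN KEY (card_id) REFERENCES cards(card_id),
  FOREIGN KEY (merchant_id) REFERENCES merchants(merchant_id));

INSERT INTO customers VALUES (1, 'Raj', 'raj@bank.com');
INSERT INTO cards VALUES (10, 1, '1234567812345678');
INSERT INTO merchants VALUES (101, 'Amazon', 'E-Com', 'Mumbai');
INSERT INTO transactions VALUES (9001, 10, 101, 1200.50);
""")

# The "denormalized" analytics read: one join across all four tables
row = conn.execute("""
SELECT c.name, m.name, t.amount
FROM transactions t
JOIN cards     cd ON t.card_id = cd.card_id
JOIN customers c  ON cd.customer_id = c.customer_id
JOIN merchants m  ON t.merchant_id = m.merchant_id
""").fetchone()
print(row)

# Consistency in action: a transaction on a nonexistent card is rejected
fk_rejected = False
try:
    conn.execute("INSERT INTO transactions VALUES (9002, 999, 101, 10.0)")
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

The rejected insert is the normalization payoff: the schema refuses transactions that reference cards that don't exist.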


🔍 Interviewer’s Analysis

  • Concept Tested: Schema design / normalization
  • Difficulty: Advanced
  • Follow-Ups: Add partitioning for transactions; introduce currency dimension.

🪄 End of Q18.


❓ Q19: Create an index on customer_id for fast lookups.

🧩 Problem Statement

You frequently filter transactions by customer_id. Improve query performance using indexes.


💾 Step 1: Base Table

CREATE TABLE transactions (
  transaction_id BIGINT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10,2),
  transaction_date DATE
);

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Indexing creates a sorted access path — reducing search time from O(n) to O(log n).

🏗️ SQL Query

CREATE INDEX idx_customer_id
ON transactions (customer_id);

🔍 Explanation

  • The database builds a B-tree index on customer_id.

  • Queries like:

    SELECT * FROM transactions WHERE customer_id = 101;

    will use this index to locate rows faster.


⚙️ Step 4: Why This Query Works

An index avoids full-table scans for frequent lookups on indexed columns.


⚡ Step 5: Common Mistakes

  • Indexing low-cardinality columns (e.g., status) → no gain.
  • Over-indexing → slows down inserts and updates.

🎯 Interview Tip

Mention “Index Selectivity”

The ratio of unique values to total rows — higher is better.
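The before/after effect is visible in the execution plan itself. A sketch using SQLite's EXPLAIN QUERY PLAN via Python, with 10,000 synthetic rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, customer_id INT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(i, i % 1000, 9.99) for i in range(10_000)])

query = "SELECT * FROM transactions WHERE customer_id = 101"

before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
conn.execute("CREATE INDEX idx_customer_id ON transactions (customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before)  # full table scan
print(after)   # search using idx_customer_id
```

The plan flips from a SCAN of the table to a SEARCH using idx_customer_id; the detail text is in the last column of each plan row.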


🔍 Interviewer’s Analysis

  • Concept Tested: Index design / performance tuning
  • Difficulty: Intermediate
  • Follow-Ups: Explain difference between clustered and non-clustered indexes.

🪄 End of Q19.


❓ Q20: Implement partitioning for large tables.

🧩 Problem Statement

Your transactions table has billions of rows. Implement partitioning to improve performance and maintainability.


💾 Step 1: Example Table

CREATE TABLE transactions (
  transaction_id BIGINT,
  transaction_date DATE,
  amount DECIMAL(10,2)
);

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Partitioning divides data physically into chunks — usually by date, region, or hash — improving query speed and manageability.

🏗️ SQL Query (MySQL RANGE partitioning)

CREATE TABLE transactions_partitioned (
  transaction_id BIGINT,
  transaction_date DATE,
  amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(transaction_date)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

🔍 Explanation

  • Each partition holds one year’s worth of data.

  • Queries like

    SELECT * FROM transactions_partitioned WHERE transaction_date >= '2024-01-01';

    will automatically scan only the relevant partition.


⚙️ Step 4: Why This Query Works

Partitioning prunes irrelevant data blocks → faster scans.


⚡ Step 5: Common Mistakes

  • Using too many partitions → metadata overhead.
  • Forgetting to update partition definitions annually.

🎯 Interview Tip

Always mention partition pruning — the DB engine only reads needed partitions.


🔍 Interviewer’s Analysis

  • Concept Tested: Table partitioning / scalability
  • Difficulty: Advanced
  • Follow-Ups: Discuss hash vs range vs list partitioning.

🪄 End of Q20.


❓ Q21: Create a temporary table for storing summary results.

🧩 Problem Statement

During ETL, you need to store intermediate aggregations temporarily — e.g., daily revenue per region.


💾 Step 1: Example Base Table

sales

| region | sale_date | amount |
|--------|-----------|--------|
| North | 2024-01-01 | 5000 |
| South | 2024-01-01 | 7000 |
| North | 2024-01-02 | 8000 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

A temporary table lives only within your session — perfect for multi-step ETL transformations.

🏗️ SQL Query

CREATE TEMPORARY TABLE temp_daily_revenue AS
SELECT region,
       sale_date,
       SUM(amount) AS total_sales
FROM sales
GROUP BY region, sale_date;

🔍 Explanation

  • Creates a temp table with summarized data.
  • Auto-drops when session ends.

🧾 Step 3: Expected Output

| region | sale_date | total_sales |
|--------|-----------|-------------|
| North | 2024-01-01 | 5000 |
| South | 2024-01-01 | 7000 |
| North | 2024-01-02 | 8000 |

⚙️ Step 4: Why This Query Works

Temporary tables allow multi-step computation without polluting the permanent schema.


⚡ Step 5: Common Mistakes

  • Using permanent tables for staging data.
  • Forgetting session cleanup (though most databases handle it automatically).

🎯 Interview Tip

Mention that temporary tables are critical for incremental ETL pipelines and intermediate staging.
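A sketch with Python's sqlite3, where CREATE TEMP TABLE ... AS SELECT works the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, sale_date TEXT, amount INT);
INSERT INTO sales VALUES
  ('North', '2024-01-01', 5000),
  ('South', '2024-01-01', 7000),
  ('North', '2024-01-02', 8000);

-- Session-scoped staging table for the aggregated step
CREATE TEMP TABLE temp_daily_revenue AS
SELECT region, sale_date, SUM(amount) AS total_sales
FROM sales
GROUP BY region, sale_date;
""")

summary = conn.execute(
    "SELECT region, sale_date, total_sales FROM temp_daily_revenue "
    "ORDER BY sale_date, region"
).fetchall()
print(summary)
```

The temp table disappears when the connection closes, so downstream ETL steps can read it freely without any cleanup DDL.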


🔍 Interviewer’s Analysis

  • Concept Tested: Temporary tables / staging logic
  • Difficulty: Intermediate
  • Follow-Ups: Use CTEs instead of temporary tables for one-time queries.

🪄 End of Q21.


❓ Q22: Create a view summarizing customer purchases.

🧩 Problem Statement

Define a view that shows each customer’s total purchase amount and last purchase date.


💾 Step 1: Assume Example Tables

orders

| order_id | customer_id | order_date | amount |
|----------|-------------|------------|--------|
| 1 | 101 | 2024-01-10 | 500 |
| 2 | 101 | 2024-02-15 | 300 |
| 3 | 102 | 2024-01-20 | 700 |
| 4 | 103 | 2024-03-01 | 200 |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

A view is a stored query result that can be referenced like a table.

🏗️ SQL Query

CREATE VIEW customer_summary AS
SELECT customer_id,
       SUM(amount) AS total_purchases,
       MAX(order_date) AS last_purchase_date
FROM orders
GROUP BY customer_id;

🔍 Explanation

  • Aggregates total purchase amount per customer.
  • Tracks most recent purchase date.
  • Reusable for dashboards and reporting.

🧾 Step 3: Expected Output

| customer_id | total_purchases | last_purchase_date |
|-------------|-----------------|--------------------|
| 101 | 800 | 2024-02-15 |
| 102 | 700 | 2024-01-20 |
| 103 | 200 | 2024-03-01 |

⚙️ Step 4: Why This Query Works

The view encapsulates recurring logic — improving maintainability and code reuse.


⚡ Step 5: Common Mistakes

  • Using views with volatile data in critical transactions → may degrade performance.
  • Forgetting to refresh materialized views (in DBs that support them).

🎯 Interview Tip

Explain when to use:

  • Views → Logical abstraction.
  • Materialized Views → Precomputed aggregates for faster reporting.
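A sketch in SQLite via Python, which also shows the key property of a plain view: it re-runs its query on every read, so new orders appear automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, amount INT);
INSERT INTO orders VALUES
  (1, 101, '2024-01-10', 500),
  (2, 101, '2024-02-15', 300),
  (3, 102, '2024-01-20', 700),
  (4, 103, '2024-03-01', 200);

CREATE VIEW customer_summary AS
SELECT customer_id,
       SUM(amount) AS total_purchases,
       MAX(order_date) AS last_purchase_date
FROM orders
GROUP BY customer_id;
""")

summary = conn.execute(
    "SELECT * FROM customer_summary ORDER BY customer_id").fetchall()
print(summary)

# Plain views are never stale: a new order shows up on the next read
conn.execute("INSERT INTO orders VALUES (5, 101, '2024-04-01', 100)")
total_101 = conn.execute(
    "SELECT total_purchases FROM customer_summary WHERE customer_id = 101"
).fetchone()[0]
print(total_101)
```

A materialized view would have kept returning 800 for customer 101 until refreshed; the plain view returns 900 immediately.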

🔍 Interviewer’s Analysis

  • Concept Tested: View creation / data summarization
  • Difficulty: Beginner → Intermediate
  • Follow-Ups: Convert to materialized view for faster reads.

🪄 End of Q22.


❓ Q23: Rewrite a correlated subquery as a join to improve performance.

🧩 Problem Statement

Optimize a query that uses a correlated subquery by rewriting it as a JOIN, which performs better on large datasets.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id | salary |
|--------|------|---------|--------|
| 1 | Raj Patel | 10 | 70000 |
| 2 | Aisha Khan | 10 | 85000 |
| 3 | Meena Joshi | 20 | 95000 |
| 4 | Chetan Joshi | 20 | 88000 |

departments

| dept_id | dept_name | location |
|---------|-----------|----------|
| 10 | Sales | Mumbai |
| 20 | Engineering | Delhi |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

The original query uses a correlated subquery to fetch department names.

🚫 Inefficient Version

SELECT e.name,
       (SELECT dept_name
        FROM departments d
        WHERE d.dept_id = e.dept_id) AS dept_name
FROM employees e;

This subquery executes once per employee row, effectively a nested loop (roughly O(n × m) on large data without an index on dept_id).


✅ Optimized JOIN Version

SELECT e.name, d.dept_name
FROM employees e
JOIN departments d
  ON e.dept_id = d.dept_id;

🔍 Explanation

  • The join retrieves all needed data in one scan.
  • Modern SQL optimizers convert simple correlated subqueries into joins, but not always — explicitly writing the join is safer and clearer.

🧾 Step 3: Expected Output

| name | dept_name |
|------|-----------|
| Raj Patel | Sales |
| Aisha Khan | Sales |
| Meena Joshi | Engineering |
| Chetan Joshi | Engineering |

⚙️ Step 4: Why This Query Works

JOINs allow set-based operations, minimizing per-row lookups and improving scalability.


⚡ Step 5: Common Mistakes

  • Forgetting to use table aliases (e, d).
  • Using LEFT JOIN unnecessarily when an INNER JOIN suffices.

🎯 Interview Tip

Interviewers often ask:

“How does rewriting improve performance?” Answer: “Because JOINs allow SQL engines to optimize via hash joins or index scans, while correlated subqueries often force nested-loop evaluations.”
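Both forms return the same rows, which is easy to verify. A sketch in SQLite via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT, salary INT);
CREATE TABLE departments (dept_id INT, dept_name TEXT);
INSERT INTO employees VALUES
  (1, 'Raj Patel', 10, 70000), (2, 'Aisha Khan', 10, 85000),
  (3, 'Meena Joshi', 20, 95000), (4, 'Chetan Joshi', 20, 88000);
INSERT INTO departments VALUES (10, 'Sales'), (20, 'Engineering');
""")

# Correlated subquery: one department lookup per employee row
correlated = conn.execute("""
SELECT e.name,
       (SELECT dept_name FROM departments d WHERE d.dept_id = e.dept_id) AS dept_name
FROM employees e
ORDER BY e.emp_id
""").fetchall()

# Equivalent join: one set-based operation
joined = conn.execute("""
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.emp_id
""").fetchall()

print(correlated == joined)  # same result set, different access patterns
```

The equality holds because every employee has a matching department; with orphaned dept_ids, the correlated form would return NULL where the inner join drops the row, so a LEFT JOIN would be the faithful rewrite.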


🔍 Interviewer’s Analysis

  • Concept Tested: Subquery optimization
  • Difficulty: Intermediate
  • Follow-Ups: Compare runtime via EXPLAIN plans.

🪄 End of Q23.


❓ Q24: Find the difference in execution time between indexed vs non-indexed queries.

🧩 Problem Statement

Demonstrate how indexes improve query performance by comparing execution time of a filtered query with and without an index.


💾 Step 1: Assume Example Table

orders

| order_id | customer_id | order_date | amount |
|----------|-------------|------------|--------|
| 1 | 101 | 2024-01-01 | 500 |
| 2 | 102 | 2024-01-05 | 700 |
(Assume ~1M rows in practice.)


🧮 Step 2: Step-by-Step Solution

🧠 Understanding

We’ll test a lookup query with and without an index on customer_id.


🚫 Without Index

SELECT * FROM orders WHERE customer_id = 101;

  • Execution plan: Full table scan
  • Time complexity: O(n)

✅ With Index

CREATE INDEX idx_customer_id ON orders(customer_id);

SELECT * FROM orders WHERE customer_id = 101;

  • Execution plan: Index seek
  • Time complexity: O(log n)

🧾 Step 3: Expected Result (Performance Conceptually)

| Scenario | Operation Type | Approx. Time (for 1M rows) |
|----------|----------------|----------------------------|
| Without Index | Full Scan | ~1.2 sec |
| With Index | Index Seek | ~0.01 sec |

⚙️ Step 4: Why This Query Works

An index acts like a sorted lookup table — dramatically reduces search space. Query planners automatically use the best available index for filtering conditions.


⚡ Step 5: Common Mistakes

  • Indexing low-cardinality columns (e.g., gender) → negligible benefit.
  • Forgetting to analyze query plans using EXPLAIN.

🧠 Step 6: Verify with EXPLAIN

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101;

Compare cost estimates before and after indexing.


🎯 Interview Tip

If asked, say:

“Indexing improves read performance but adds overhead on writes — so I only index high-selectivity columns used in frequent filters.”
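A rough benchmark can be scripted (timings vary by machine; the 200,000 synthetic rows below are an assumption for illustration). A sketch with Python's sqlite3:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 10_000, 9.99) for i in range(200_000)])

def timed(query):
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    return rows, time.perf_counter() - start

lookup = "SELECT * FROM orders WHERE customer_id = 101"

full_rows, full_t = timed(lookup)          # no index: full scan
conn.execute("CREATE INDEX idx_customer_id ON orders (customer_id)")
indexed_rows, indexed_t = timed(lookup)    # index seek

print(f"full scan: {full_t:.4f}s  indexed: {indexed_t:.4f}s")
```

Both runs return the identical 20 rows; the only thing the index changes is how many rows the engine inspects to find them.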


🔍 Interviewer’s Analysis

  • Concept Tested: Indexing impact / EXPLAIN plans
  • Difficulty: Intermediate
  • Follow-Ups: Composite indexes and index maintenance cost.

🪄 End of Q24.


❓ Q25: Demonstrate the safe use of query hints (USE INDEX, FORCE ORDER).

🧩 Problem Statement

Show how to guide the SQL optimizer using query hints, while understanding when it’s safe (or unsafe) to do so.


💾 Step 1: Assume Example Tables

employees

| emp_id | name | dept_id |
|--------|------|---------|
| 1 | Raj | 10 |
| 2 | Aisha | 20 |
| 3 | Meena | 10 |

departments

| dept_id | dept_name |
|---------|-----------|
| 10 | Sales |
| 20 | HR |

🧮 Step 2: Step-by-Step Solution

🧠 Understanding

Most of the time, the optimizer chooses the best execution plan automatically. But sometimes — e.g., in skewed data — you can use hints to influence behavior.


✅ Example 1 — Force Index (MySQL)

SELECT e.name, d.dept_name
FROM employees e USE INDEX (idx_dept)
JOIN departments d ON e.dept_id = d.dept_id;

Use Case: If multiple indexes exist, this hint forces the optimizer to use idx_dept.


✅ Example 2 — Force Join Order (SQL Server)

SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
OPTION (FORCE ORDER);

Use Case: When the query planner’s automatic join reordering produces inefficient plans for known data distributions. (MySQL 8 offers a comparable /*+ JOIN_ORDER(...) */ optimizer hint.)


⚙️ Step 4: Why This Query Works

Hints let you override optimizer heuristics only when you’re certain your data pattern justifies it.


⚡ Step 5: Common Mistakes

  • Overusing hints — they can break performance as data evolves.
  • Hardcoding index names that later change.

🎯 Interview Tip

Say this line:

“Hints are a scalpel, not a hammer — I use them sparingly and only after verifying the query plan and data distribution.”
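SQLite's analogous hint, INDEXED BY, makes both the mechanism and the brittleness concrete. A sketch via Python, showing the query break outright once the hardcoded index is dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT);
CREATE INDEX idx_dept ON employees (dept_id);
INSERT INTO employees VALUES (1, 'Raj', 10), (2, 'Aisha', 20), (3, 'Meena', 10);
""")

# INDEXED BY insists on the named index for this query
rows = conn.execute(
    "SELECT name FROM employees INDEXED BY idx_dept "
    "WHERE dept_id = 10 ORDER BY name"
).fetchall()
print(rows)

# The brittleness the hint introduces: drop the index and the query errors out
conn.execute("DROP INDEX idx_dept")
try:
    conn.execute(
        "SELECT name FROM employees INDEXED BY idx_dept WHERE dept_id = 10")
    hint_survived = True
except sqlite3.OperationalError:
    hint_survived = False
print(hint_survived)
```

This is the "hardcoding index names that later change" mistake from Step 5 playing out: an unhinted query would have silently fallen back to a scan, but the hinted one fails.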


🔍 Interviewer’s Analysis

  • Concept Tested: Optimizer behavior / Query tuning
  • Difficulty: Advanced
  • Follow-Ups: Explain how to verify plan changes with EXPLAIN ANALYZE.

🪄 End of Q25.

