👩💼 Business Analyst (BA)
(Beginner-friendly, example-driven Q&A for Top Tech Company Interviews)
❓ Q1: Retrieve all customer records located in a specific city.
🧩 Problem Statement
Return every customer row where the customer lives in a given city (e.g., Mumbai). This is basic selection + filtering.
💾 Step 1: Assume Example Tables
customers — list of customers.
| customer_id | name | email | city | joined_date |
|---|---|---|---|---|
| 1 | Raj Patel | raj@example.com | Mumbai | 2023-05-10 |
| 2 | Aisha Khan | aisha@example.com | Delhi | 2022-11-01 |
| 3 | Sunil Rao | sunil@example.com | Mumbai | 2024-02-15 |
| 4 | Priya Verma | priya@example.com | Pune | 2021-08-20 |
Columns:
- `customer_id`: unique id.
- `city`: used to filter.
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Tested concept: SELECT + WHERE filtering. We simply select rows where city = 'Mumbai'.
🏗️ SQL Query
```sql
SELECT customer_id, name, email, city, joined_date
FROM customers
WHERE city = 'Mumbai';
```
🔍 Explanation
- `SELECT` chooses the columns.
- `WHERE city = 'Mumbai'` filters only customers in Mumbai.
🧾 Step 3: Expected Output
| customer_id | name | email | city | joined_date |
|---|---|---|---|---|
| 1 | Raj Patel | raj@example.com | Mumbai | 2023-05-10 |
| 3 | Sunil Rao | sunil@example.com | Mumbai | 2024-02-15 |
This shows both customers living in Mumbai.
⚙️ Step 4: Why This Query Works
A direct equality filter returns only matching rows; it’s efficient with an index on city.
⚡ Step 5: Common Mistakes
- Forgetting quotes around string values.
- Using `LIKE` when an exact match is intended (or vice versa).
- Case-sensitivity issues depending on DB collation.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Using parameter / placeholder (safer in application code)
SELECT *
FROM customers
WHERE city = ?;
```
Use this in prepared statements to avoid SQL injection.
🎯 Interview Tip
Explain that you would parameterize the city in production code and consider indexing city if queries are frequent.
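As a quick sketch of that advice, here is what the parameterized version looks like from application code, using Python's built-in sqlite3 module (the in-memory database and toy rows mirroring the example table are assumptions for the demo):

```python
import sqlite3

# Hypothetical in-memory copy of the customers example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Raj Patel", "Mumbai"), (2, "Aisha Khan", "Delhi"), (3, "Sunil Rao", "Mumbai")],
)

# The ? placeholder keeps the city value out of the SQL string itself,
# which is what prevents SQL injection.
rows = conn.execute(
    "SELECT customer_id, name FROM customers WHERE city = ? ORDER BY customer_id",
    ("Mumbai",),
).fetchall()
print(rows)  # [(1, 'Raj Patel'), (3, 'Sunil Rao')]
```

The same `?` (or `%s` / named) placeholder style applies in most DB drivers.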
🔍 Interviewer’s Analysis
- Concept Tested: WHERE filter / selective projection
- Difficulty: Beginner
- Time Complexity: O(n) scan (or O(log n) if indexed)
- Follow-Up Prompts: Indexing, case-sensitivity, parameterization.
🪄 End of Q1 — ready for the next question.
❓ Q2: Find employees whose names start with “A”.
🧩 Problem Statement
Return employee records where name begins with the letter A.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary | hired_date |
|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 2020-01-15 |
| 2 | Bhavna Rai | 20 | 55000 | 2019-03-10 |
| 3 | Ananya Roy | 10 | 70000 | 2021-07-22 |
| 4 | Chetan Joshi | 30 | 50000 | 2022-09-05 |
Columns:
name: text to search with pattern.
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Tested concept: pattern matching (LIKE) or ILIKE for case-insensitive checks.
🏗️ SQL Query
```sql
SELECT emp_id, name, dept_id, salary, hired_date
FROM employees
WHERE name LIKE 'A%';
```
(If your DB supports the case-insensitive `ILIKE` and you want to catch lower-case a, use `WHERE name ILIKE 'a%'`.)
🔍 Explanation
- `LIKE 'A%'` matches strings starting with `A`.
- `%` matches any following characters.
🧾 Step 3: Expected Output
| emp_id | name | dept_id | salary | hired_date |
|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 2020-01-15 |
| 3 | Ananya Roy | 10 | 70000 | 2021-07-22 |
⚙️ Step 4: Why This Query Works
Pattern matching selects rows where the first character equals A. Use ILIKE for case-insensitive behavior if needed.
⚡ Step 5: Common Mistakes
- Forgetting `%`.
- Using `=` (exact match) instead of `LIKE`.
- Not considering case sensitivity.
🧠 Step 6: Alternate Approach (Optional)
```sql
SELECT emp_id, name
FROM employees
WHERE SUBSTR(name, 1, 1) = 'A';
```
Less flexible and may be slower than `LIKE`.
🎯 Interview Tip
Mention whether collation is case-sensitive in your DB and prefer ILIKE or LOWER(name) LIKE 'a%' accordingly.
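A portable way to get case-insensitive matching is to normalize with `LOWER()`, sketched here with Python's sqlite3 (the sample rows, including a lower-case name, are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Amit Sharma"), (2, "Bhavna Rai"), (3, "ananya roy")],
)

# LOWER() normalizes the column, so the match no longer depends on collation.
rows = conn.execute(
    "SELECT emp_id, name FROM employees WHERE LOWER(name) LIKE 'a%' ORDER BY emp_id"
).fetchall()
print(rows)  # [(1, 'Amit Sharma'), (3, 'ananya roy')]
```

Note that a function call on the column can defeat a plain index; a functional index on `LOWER(name)` restores index use where supported.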
🪄 End of Q2.
❓ Q3: Display unique department names from the employee table.
🧩 Problem Statement
Return distinct department names listed in employees (or a departments table). Use DISTINCT.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Bhavna Rai | HR |
| 3 | Ananya Roy | Sales |
| 4 | Chetan Joshi | Engineering |
Columns:
`dept_name`: department label (could also be a foreign key to `departments`).
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: SELECT DISTINCT to remove duplicates.
🏗️ SQL Query
```sql
SELECT DISTINCT dept_name
FROM employees;
```
🔍 Explanation
DISTINCT collapses duplicate department values so each name appears once.
🧾 Step 3: Expected Output
| dept_name |
|---|
| Sales |
| HR |
| Engineering |
⚙️ Step 4: Why This Query Works
DISTINCT deduplicates values after selection.
⚡ Step 5: Common Mistakes
- Selecting multiple columns with `DISTINCT` when only one is desired.
- Not trimming whitespace/case differences, leaving apparent duplicates.
🧠 Step 6: Alternate Approach (Optional)
```sql
SELECT dept_name
FROM employees
GROUP BY dept_name;
```
`GROUP BY` also returns unique department names.
🎯 Interview Tip
If department IDs exist, prefer selecting ID + name to avoid ambiguity.
🪄 End of Q3.
❓ Q4: Fetch employees sorted by their joining date (latest first).
🧩 Problem Statement
Return employees ordered by joined_date descending so newest hires appear first.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | joined_date |
|---|---|---|
| 1 | Amit | 2020-01-15 |
| 2 | Bhavna | 2022-03-10 |
| 3 | Ananya | 2024-07-01 |
| 4 | Chetan | 2021-09-05 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: ORDER BY with DESC.
🏗️ SQL Query
```sql
SELECT emp_id, name, joined_date
FROM employees
ORDER BY joined_date DESC;
```
🔍 Explanation
ORDER BY joined_date DESC sorts newest first.
🧾 Step 3: Expected Output
| emp_id | name | joined_date |
|---|---|---|
| 3 | Ananya | 2024-07-01 |
| 2 | Bhavna | 2022-03-10 |
| 4 | Chetan | 2021-09-05 |
| 1 | Amit | 2020-01-15 |
⚙️ Step 4: Why This Query Works
Ordering controls presentation; databases optimize sorting but large sorts can be costly—consider indexing if needed.
⚡ Step 5: Common Mistakes
- Forgetting `DESC` and getting oldest first.
- Sorting on a string date instead of a date datatype.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Show top 10 newest hires
SELECT emp_id, name, joined_date
FROM employees
ORDER BY joined_date DESC
LIMIT 10;
```
🎯 Interview Tip
State whether joined_date is stored as DATE/TIMESTAMP to avoid string-sorting issues.
🪄 End of Q4.
❓ Q5: Get the top 10 highest-paid employees.
🧩 Problem Statement
Return the 10 employees with the largest salaries. If tie-handling is required, define a deterministic order.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 120000 |
| 2 | Bhavna Rai | 90000 |
| 3 | Ananya Roy | 120000 |
| 4 | Chetan Joshi | 80000 |
| 5 | Deepa Shah | 75000 |
(Assume more rows beyond 5 in real table.)
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concepts: ORDER BY + LIMIT (or window functions for ties).
🏗️ SQL Query
```sql
SELECT emp_id, name, salary
FROM employees
ORDER BY salary DESC
LIMIT 10;
```
(If you want a deterministic tie-break: `ORDER BY salary DESC, emp_id ASC`.)
🔍 Explanation
Sort by salary descending, then take the first 10 rows.
🧾 Step 3: Expected Output (top few shown)
| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 120000 |
| 3 | Ananya Roy | 120000 |
| 2 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 80000 |
⚙️ Step 4: Why This Query Works
Ordering then limiting fetches top rows efficiently; use index on salary to improve speed.
⚡ Step 5: Common Mistakes
- Using `TOP` syntax in non-supporting DBs (use `LIMIT` or `FETCH FIRST`).
- Not specifying a tie-breaker for reproducibility.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Using ROW_NUMBER to handle ties / pagination
WITH ranked AS (
  SELECT emp_id, name, salary,
         ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS rn
  FROM employees
)
SELECT emp_id, name, salary
FROM ranked
WHERE rn <= 10;
```
🎯 Interview Tip
Discuss tie-handling and how pagination will be implemented for large result sets.
🪄 End of Q5.
❓ Q6: Retrieve all transactions made after January 1, 2024.
🧩 Problem Statement
Return transactions with transaction_date strictly after 2024-01-01.
💾 Step 1: Assume Example Tables
transactions
| tx_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 1 | 1 | 1200 | 2023-12-20 |
| 2 | 2 | 5500 | 2024-01-05 |
| 3 | 1 | 3000 | 2024-03-11 |
| 4 | 3 | 450 | 2022-08-10 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: date comparison with WHERE.
🏗️ SQL Query
```sql
SELECT tx_id, customer_id, amount, transaction_date
FROM transactions
WHERE transaction_date > '2024-01-01';
```
(If the column is a datetime, `> '2024-01-01'` also matches times during Jan 1 itself; use `>= '2024-01-02'` to exclude the whole day.)
🔍 Explanation
Rows with date strictly later than Jan 1, 2024 are returned.
🧾 Step 3: Expected Output
| tx_id | customer_id | amount | transaction_date |
|---|---|---|---|
| 2 | 2 | 5500 | 2024-01-05 |
| 3 | 1 | 3000 | 2024-03-11 |
⚙️ Step 4: Why This Query Works
Date comparisons use chronological ordering; ensure correct datatype.
⚡ Step 5: Common Mistakes
- Using `>= '2024-01-01'` when you want strictly after (it includes Jan 1 itself).
- Comparing strings if dates are stored as text in an inconsistent format.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Using DATE() conversion (if transaction_date includes time)
SELECT *
FROM transactions
WHERE DATE(transaction_date) > '2024-01-01';
```
🎯 Interview Tip
Clarify whether “after” includes the date (>=) or strictly greater (>).
🪄 End of Q6.
❓ Q7: Find the total salary expense for each department.
🧩 Problem Statement
Group employees by department and sum salaries to get department-level salary expense.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | dept_name | salary |
|---|---|---|---|---|
| 1 | Amit | 10 | Sales | 60000 |
| 2 | Ananya | 10 | Sales | 70000 |
| 3 | Bhavna | 20 | HR | 55000 |
| 4 | Chetan | 30 | Eng | 80000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concepts: GROUP BY + aggregate SUM.
🏗️ SQL Query
```sql
SELECT dept_id, dept_name, SUM(salary) AS total_salary_expense
FROM employees
GROUP BY dept_id, dept_name;
```
🔍 Explanation
Group rows by department and compute sum of salary per group.
🧾 Step 3: Expected Output
| dept_id | dept_name | total_salary_expense |
|---|---|---|
| 10 | Sales | 130000 |
| 20 | HR | 55000 |
| 30 | Eng | 80000 |
⚙️ Step 4: Why This Query Works
GROUP BY collapses rows into department buckets; SUM adds salaries within each bucket.
⚡ Step 5: Common Mistakes
- Omitting `dept_name` from `GROUP BY` when it is selected.
- Using `COUNT` instead of `SUM`.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- With a join to a departments table
SELECT d.dept_id, d.dept_name, COALESCE(SUM(e.salary), 0) AS total_salary
FROM departments d
LEFT JOIN employees e
  ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name;
```
This includes departments with zero employees.
🎯 Interview Tip
Mention COALESCE to show departments with no employees return 0 instead of NULL.
🪄 End of Q7.
❓ Q8: Count how many employees belong to each department.
🧩 Problem Statement
For each department, count employees (i.e., headcount per dept).
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | dept_name |
|---|---|---|---|
| 1 | Amit | 10 | Sales |
| 2 | Ananya | 10 | Sales |
| 3 | Bhavna | 20 | HR |
| 4 | Chetan | NULL | NULL |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: COUNT(*) with GROUP BY. Decide whether to include NULL dept.
🏗️ SQL Query
SELECT dept_id, dept_name, COUNT(*) AS employee_count
FROM employees
WHERE dept_id IS NOT NULL
GROUP BY dept_id, dept_name;(To include NULLs as a separate group, remove the WHERE clause.)
🔍 Explanation
Group by dept and count rows per group.
🧾 Step 3: Expected Output
| dept_id | dept_name | employee_count |
|---|---|---|
| 10 | Sales | 2 |
| 20 | HR | 1 |
⚙️ Step 4: Why This Query Works
COUNT(*) counts rows in each department group.
⚡ Step 5: Common Mistakes
- Using `COUNT(dept_id)`, which ignores NULLs (sometimes desired).
- Forgetting `GROUP BY` for non-aggregated columns.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Include departments with zero employees using the departments table
SELECT d.dept_id, d.dept_name, COUNT(e.emp_id) AS employee_count
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name;
```
🎯 Interview Tip
Explain the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column).
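The three COUNT variants can be seen side by side in a small sqlite3 session (the toy rows, including one NULL department, are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, dept_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, None)],  # emp 4 has a NULL dept_id
)

# COUNT(*) counts rows, COUNT(col) skips NULLs, COUNT(DISTINCT col) deduplicates.
total_rows, non_null, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(dept_id), COUNT(DISTINCT dept_id) FROM employees"
).fetchone()
print(total_rows, non_null, distinct)  # 4 3 2
```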
🪄 End of Q8.
❓ Q9: Get departments with more than 5 employees.
🧩 Problem Statement
Find departments whose employee count exceeds 5.
💾 Step 1: Assume Example Tables
employees
(Only showing relevant counts.)
| dept_id | dept_name | employee_count (derived) |
|---|---|---|
| 10 | Sales | - |
| 20 | HR | - |
We will compute counts from employees.
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concepts: GROUP BY + HAVING for filtering groups.
🏗️ SQL Query
SELECT dept_id, dept_name, COUNT(*) AS employee_count
FROM employees
GROUP BY dept_id, dept_name
HAVING COUNT(*) > 5;🔍 Explanation
HAVING filters aggregated groups by their computed count.
🧾 Step 3: Expected Output
| dept_id | dept_name | employee_count |
|---|---|---|
| 50 | Ops | 12 |
| 70 | Sales | 8 |
(Example departments meeting the criterion.)
⚙️ Step 4: Why This Query Works
WHERE cannot filter aggregated results; HAVING can.
⚡ Step 5: Common Mistakes
- Using `WHERE COUNT(*) > 5` (invalid; aggregates cannot appear in `WHERE`).
- Forgetting `GROUP BY` on selected non-aggregated columns.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Using a subquery
SELECT dept_id, dept_name, employee_count
FROM (
  SELECT dept_id, dept_name, COUNT(*) AS employee_count
  FROM employees
  GROUP BY dept_id, dept_name
) t
WHERE employee_count > 5;
```
🎯 Interview Tip
Explain HAVING vs WHERE clearly before coding.
🪄 End of Q9.
❓ Q10: Find the average salary per job title.
🧩 Problem Statement
Group by job_title and compute average salary for each title.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | job_title | salary |
|---|---|---|---|
| 1 | Amit | Sales Executive | 60000 |
| 2 | Ananya | Sales Executive | 70000 |
| 3 | Bhavna | HR Manager | 90000 |
| 4 | Chetan | Engineer | 80000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: AVG() aggregate with GROUP BY.
🏗️ SQL Query
SELECT job_title, ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY job_title;🔍 Explanation
AVG(salary) calculates the mean salary per job title. ROUND formats to 2 decimals.
🧾 Step 3: Expected Output
| job_title | avg_salary |
|---|---|
| Sales Executive | 65000.00 |
| HR Manager | 90000.00 |
| Engineer | 80000.00 |
⚙️ Step 4: Why This Query Works
Standard aggregation by category returns an average per job title.
⚡ Step 5: Common Mistakes
- Integer division in some DBs (cast to a decimal type if needed; `ROUND` only formats the result).
- Selecting non-aggregated columns without grouping.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Exclude zero or NULL salaries
SELECT job_title, AVG(salary) AS avg_salary
FROM employees
WHERE salary IS NOT NULL AND salary > 0
GROUP BY job_title;
```
🎯 Interview Tip
Clarify how to treat NULL salaries and outliers before computing averages.
🪄 End of Q10.
❓ Q11: Retrieve total sales per region.
🧩 Problem Statement
Group transactions by region and sum amount to compute total sales per region.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | region | amount |
|---|---|---|---|
| 1 | 1 | West | 1200 |
| 2 | 2 | North | 5500 |
| 3 | 1 | West | 3000 |
| 4 | 3 | South | 450 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: GROUP BY region + SUM(amount).
🏗️ SQL Query
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region;🔍 Explanation
Sums order amounts grouped by region.
🧾 Step 3: Expected Output
| region | total_sales |
|---|---|
| West | 4200 |
| North | 5500 |
| South | 450 |
⚙️ Step 4: Why This Query Works
Aggregates amounts per region to produce regional totals.
⚡ Step 5: Common Mistakes
- Forgetting `GROUP BY`.
- Not handling `NULL` region values (you may want to group them separately).
🧠 Step 6: Alternate Approach (Optional)
```sql
SELECT COALESCE(region, 'Unknown') AS region, SUM(amount) AS total_sales
FROM orders
GROUP BY COALESCE(region, 'Unknown');
```
🎯 Interview Tip
Discuss currency and timezone normalization if amounts come from different locales.
🪄 End of Q11.
❓ Q12: Count the total number of customers who placed at least one order.
🧩 Problem Statement
Count unique customers appearing in orders (customers with >=1 order).
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1200 |
| 2 | 2 | 5500 |
| 3 | 1 | 3000 |
| 4 | 4 | 450 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concept: COUNT(DISTINCT customer_id).
🏗️ SQL Query
SELECT COUNT(DISTINCT customer_id) AS customers_with_orders
FROM orders;🔍 Explanation
DISTINCT ensures each customer counted once, even with multiple orders.
🧾 Step 3: Expected Output
| customers_with_orders |
|---|
| 3 |
(Customers: 1, 2, and 4)
⚙️ Step 4: Why This Query Works
Counting distinct IDs gives unique customer count.
⚡ Step 5: Common Mistakes
- Using `COUNT(*)`, which counts rows (orders), not unique customers.
- Counting distinct values of the wrong column.
🧠 Step 6: Alternate Approach (Optional)
```sql
SELECT COUNT(*) FROM (SELECT 1 FROM orders GROUP BY customer_id) t;
```
Equivalent but less direct.
🎯 Interview Tip
If large tables, mention potential cost of DISTINCT and ways to optimize (indexes, pre-aggregated tables).
🪄 End of Q12.
❓ Q13: Find customers who made transactions above $5,000 more than once.
🧩 Problem Statement
Find customers who have at least two transactions where amount > 5000 (i.e., count of such transactions per customer > 1).
💾 Step 1: Assume Example Tables
transactions
| tx_id | customer_id | amount | tx_date |
|---|---|---|---|
| 1 | 1 | 6000 | 2024-02-01 |
| 2 | 2 | 7000 | 2024-03-01 |
| 3 | 1 | 5500 | 2024-04-01 |
| 4 | 3 | 4000 | 2024-05-01 |
| 5 | 2 | 2000 | 2024-06-01 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concepts: filtering + grouping + HAVING.
🏗️ SQL Query
SELECT customer_id, COUNT(*) AS high_value_tx_count
FROM transactions
WHERE amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;🔍 Explanation
Filter transactions > 5000, then count per customer, and keep those with count > 1.
🧾 Step 3: Expected Output
| customer_id | high_value_tx_count |
|---|---|
| 1 | 2 |
(Customer 1 has two high-value transactions; customer 2 has only one >5000.)
⚙️ Step 4: Why This Query Works
WHERE narrows rows, GROUP BY groups per customer, HAVING filters groups by count.
⚡ Step 5: Common Mistakes
- Applying the `amount > 5000` condition after grouping (via a conditional count) instead of filtering rows first with `WHERE`; it works but is less efficient.
- Forgetting `GROUP BY`.
🧠 Step 6: Alternate Approach (Optional)
```sql
SELECT customer_id
FROM (
  SELECT customer_id,
         SUM(CASE WHEN amount > 5000 THEN 1 ELSE 0 END) AS cnt
  FROM transactions
  GROUP BY customer_id
) t
WHERE cnt > 1;
```
Useful when counting multiple conditional buckets.
🎯 Interview Tip
Discuss performance trade-offs when transactions is large: an index on amount or partitioning could help.
🪄 End of Q13.
❓ Q14: Identify the most profitable regions by total transaction amount.
🧩 Problem Statement
Compute total transaction amount per region and rank regions by descending totals to find top performers.
💾 Step 1: Assume Example Tables
transactions
| tx_id | region | amount |
|---|---|---|
| 1 | West | 12000 |
| 2 | North | 5500 |
| 3 | West | 3000 |
| 4 | South | 450 |
| 5 | North | 8000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concepts: aggregation + ordering + optional ranking.
🏗️ SQL Query
```sql
SELECT region, SUM(amount) AS total_amount
FROM transactions
GROUP BY region
ORDER BY total_amount DESC;
```
(To get the top N regions, add `LIMIT N`.)
🔍 Explanation
Sums amounts per region and sorts highest to lowest.
🧾 Step 3: Expected Output
| region | total_amount |
|---|---|
| West | 15000 |
| North | 13500 |
| South | 450 |
⚙️ Step 4: Why This Query Works
Group & sum reveals aggregated revenue per region; ORDER BY shows rank.
⚡ Step 5: Common Mistakes
- Forgetting `GROUP BY`.
- Not handling NULL regions.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Add a rank column
SELECT region, total_amount,
       RANK() OVER (ORDER BY total_amount DESC) AS region_rank
FROM (
  SELECT region, SUM(amount) AS total_amount
  FROM transactions
  GROUP BY region
) t
ORDER BY total_amount DESC;
```
🎯 Interview Tip
Discuss whether revenue or profit should be used; profit requires cost data.
🪄 End of Q14.
❓ Q15: Find customers who have not purchased anything in the last 6 months.
🧩 Problem Statement
Return customers with no orders/transactions in the past 6 months (relative to a reference date). For this example assume reference date = 2024-10-31 (today-like).
💾 Step 1: Assume Example Tables
customers
| customer_id | name | joined_date |
|---|---|---|
| 1 | Raj | 2021-05-10 |
| 2 | Aisha | 2022-02-01 |
| 3 | Sunil | 2023-07-22 |
orders
| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-01-01 |
| 2 | 2 | 2024-08-01 |
| 3 | 1 | 2024-03-05 |
(Reference date for “last 6 months”: 2024-10-31. Six months before: 2024-04-30.)
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Concepts: anti-join (LEFT JOIN ... WHERE ... IS NULL) or NOT EXISTS. Date arithmetic needed.
🏗️ SQL Query (preferred NOT EXISTS)
```sql
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > DATE '2024-04-30'  -- 6 months before 2024-10-31
);
```
(If your DB has CURRENT_DATE, compute CURRENT_DATE - INTERVAL '6 months'.)
🔍 Explanation
For each customer, check if any order exists after the cutoff date. If none, return the customer.
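The anti-join can be checked end to end with Python's sqlite3 (the same example rows; ISO date strings compare chronologically, so plain string comparison stands in for the DATE literal here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1, 'Raj'), (2, 'Aisha'), (3, 'Sunil');
INSERT INTO orders VALUES (1, 1, '2024-01-01'), (2, 2, '2024-08-01'), (3, 1, '2024-03-05');
""")

# Keep customers with no order after the cutoff date.
inactive = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.customer_id
          AND o.order_date > '2024-04-30'
    )
    ORDER BY c.customer_id
""").fetchall()
print(inactive)  # [(1, 'Raj'), (3, 'Sunil')]
```

Note that Raj qualifies too: both of his orders fall before the cutoff.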
🧾 Step 3: Expected Output
| customer_id | name |
|---|---|
| 1 | Raj |
| 3 | Sunil |
(Raj’s latest order is 2024-03-05 and Sunil has no orders, so neither has an order after 2024-04-30; Aisha ordered on 2024-08-01 and is excluded.)
⚙️ Step 4: Why This Query Works
NOT EXISTS efficiently filters customers with zero qualifying orders.
⚡ Step 5: Common Mistakes
- Using `LEFT JOIN` incorrectly and forgetting `WHERE o.order_id IS NULL`.
- Wrong date arithmetic, or inclusive/exclusive boundary confusion.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- LEFT JOIN method
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
  AND o.order_date > DATE '2024-04-30'
WHERE o.order_id IS NULL;
```
🎯 Interview Tip
Clarify the reference date and whether “6 months” is exact calendar months or 180 days.
🪄 End of Q15.
❓ Q16: Calculate ARPU (Average Revenue Per User).
🧩 Problem Statement
ARPU = (total revenue in period) / (number of active users in period). For simplicity, compute for all-time: total transaction amount divided by number of unique customers who made at least one transaction.
💾 Step 1: Assume Example Tables
transactions
| tx_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 2000 |
| 3 | 1 | 3000 |
| 4 | 3 | 500 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Compute aggregate sum and distinct count, then divide. Use NULLIF to prevent division by zero.
🏗️ SQL Query
```sql
SELECT
  ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) AS arpu
FROM transactions;
```
🔍 Explanation
- `SUM(amount)`: total revenue.
- `COUNT(DISTINCT customer_id)`: active users.
- `NULLIF(..., 0)`: avoids division by zero if there are no customers.
🧾 Step 3: Expected Output
| arpu |
|---|
| 2166.67 |
(Total revenue = 6500; unique customers = 3 → 6500/3 ≈ 2166.67)
⚙️ Step 4: Why This Query Works
Direct arithmetic on aggregates yields the average revenue per active user.
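The same arithmetic can be verified against the example rows with Python's sqlite3 (the in-memory table is an assumption for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tx_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 1, 1000), (2, 2, 2000), (3, 1, 3000), (4, 3, 500)],
)

# Total revenue 6500 over 3 distinct customers -> 2166.67.
(arpu,) = conn.execute(
    "SELECT ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) "
    "FROM transactions"
).fetchone()
print(arpu)  # 2166.67
```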
⚡ Step 5: Common Mistakes
- Dividing by COUNT(*) (orders) instead of distinct users.
- Not handling zero users.
🧠 Step 6: Alternate Approach (Optional)
Compute ARPU for a specific period using date filters:
```sql
SELECT ROUND(SUM(amount) / NULLIF(COUNT(DISTINCT customer_id), 0), 2) AS arpu
FROM transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-09-30';
```
🎯 Interview Tip
Define active user (e.g., made at least one purchase in period) before coding.
🪄 End of Q16.
❓ Q17: Calculate the monthly customer retention rate.
🧩 Problem Statement
Monthly retention rate = (customers active this month who were also active last month) / (customers active last month). We’ll compute a two-month example: previous month = Sept 2024, current month = Oct 2024.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-09-10 |
| 2 | 2 | 2024-09-20 |
| 3 | 1 | 2024-10-05 |
| 4 | 3 | 2024-10-15 |
| 5 | 2 | 2024-08-01 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
We need sets: users active in previous month (Sept) and current month (Oct), then compute intersection size / previous month size.
🏗️ SQL Query
```sql
WITH prev_month AS (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date BETWEEN '2024-09-01' AND '2024-09-30'
),
curr_month AS (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date BETWEEN '2024-10-01' AND '2024-10-31'
),
counts AS (
  SELECT
    (SELECT COUNT(*) FROM prev_month) AS prev_count,
    (SELECT COUNT(*) FROM curr_month) AS curr_count,
    (SELECT COUNT(*) FROM prev_month pm
     JOIN curr_month cm ON pm.customer_id = cm.customer_id) AS retained_count
)
SELECT
  prev_count,
  curr_count,
  retained_count,
  CASE WHEN prev_count = 0 THEN NULL
       ELSE ROUND((retained_count::numeric / prev_count) * 100, 2)
  END AS retention_rate_percent
FROM counts;
```
(For DBs without `::numeric`, use `CAST(retained_count AS DECIMAL)`.)
🔍 Explanation
- Gather distinct customers per month.
- Count previous, current, and intersection.
- Retention = retained / prev * 100.
🧾 Step 3: Expected Output
| prev_count | curr_count | retained_count | retention_rate_percent |
|---|---|---|---|
| 2 | 2 | 1 | 50.00 |
(Prev month customers: 1 and 2; current: 1 and 3; retained is customer 1 → 1/2 = 50%)
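The set logic behind the SQL can be stated in a few lines of plain Python, using the distinct monthly customer sets from the example:

```python
# Distinct active customers per month, taken from the example orders.
prev_month = {1, 2}   # active in Sept 2024
curr_month = {1, 3}   # active in Oct 2024

retained = prev_month & curr_month  # customers active in both months
retention_rate = (
    round(len(retained) / len(prev_month) * 100, 2) if prev_month else None
)
print(retention_rate)  # 50.0
```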
⚙️ Step 4: Why This Query Works
Set operations via CTEs create clear intermediate results; join computes intersection.
⚡ Step 5: Common Mistakes
- Forgetting DISTINCT leading to inflated counts.
- Using current month size as denominator instead of previous month.
🧠 Step 6: Alternate Approach (Optional)
Compute rolling retention for many months by month buckets and lagging; more advanced but same idea.
🎯 Interview Tip
Explain definition of “active” and whether to use purchases, logins, or other events to define activity.
🪄 End of Q17.
❓ Q18: Identify churned customers (inactive in the last 6 months).
🧩 Problem Statement
Return customers with no orders in the past 6 months (similar to Q15 but labelled as churn). Use a reference date (assume 2024-10-31 → cutoff 2024-04-30).
💾 Step 1: Assume Example Tables
customers
| customer_id | name |
|---|---|
| 1 | Raj |
| 2 | Aisha |
| 3 | Sunil |
| 4 | Priya |
orders
| order_id | customer_id | order_date |
|---|---|---|
| 1 | 1 | 2024-01-10 |
| 2 | 2 | 2023-12-05 |
| 3 | 3 | 2024-05-01 |
| 4 | 4 | 2024-03-15 |
Cutoff: 2024-04-30. A customer is churned if their latest order date is on or before the cutoff (or they never ordered).
🧮 Step 2: Step-by-Step Solution
🧠 Understanding the Problem
Find customers whose most recent order is older than the cutoff or who never ordered.
🏗️ SQL Query
```sql
WITH last_order AS (
  SELECT customer_id, MAX(order_date) AS last_order_date
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_id, c.name, lo.last_order_date
FROM customers c
LEFT JOIN last_order lo ON c.customer_id = lo.customer_id
WHERE COALESCE(lo.last_order_date, DATE '1900-01-01') <= DATE '2024-04-30';
```
(Here COALESCE treats customers with no orders as churned.)
🔍 Explanation
- `last_order` finds each customer’s latest order date.
- Left join to include customers without orders.
- Filter where the last order is on/before the cutoff → churned.
🧾 Step 3: Expected Output
| customer_id | name | last_order_date |
|---|---|---|
| 1 | Raj | 2024-01-10 |
| 2 | Aisha | 2023-12-05 |
| 4 | Priya | 2024-03-15 |
(Customer 3 had order on 2024-05-01 > cutoff, so not churned.)
⚙️ Step 4: Why This Query Works
The latest order date measures each customer’s recency; COALESCE handles customers who never ordered.
⚡ Step 5: Common Mistakes
- Using `WHERE` on `orders` before aggregation, which can drop customers entirely.
- Not handling customers with no orders.
🧠 Step 6: Alternate Approach (Optional)
```sql
-- Using NOT EXISTS to find customers with no orders after the cutoff
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date > DATE '2024-04-30'
);
```
🎯 Interview Tip
State churn definition clearly: “no purchase in X period” vs “decreasing engagement”, and adjust query accordingly.
🪄 End of Q18.
———
👩💼 Data Analyst
❓ Q1: Combine employee and department data using an INNER JOIN.
🧩 Problem Statement
Show each employee’s name along with their department name. Only include employees who belong to an existing department.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit Sharma | 10 |
| 2 | Ananya Roy | 20 |
| 3 | Bhavna Rai | 30 |
| 4 | Chetan Joshi | 40 |
departments
| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: INNER JOIN returns rows where a match exists in both tables.
🏗️ SQL Query
```sql
SELECT e.emp_id,
       e.name,
       d.dept_name
FROM employees e
INNER JOIN departments d
  ON e.dept_id = d.dept_id;
```
🔍 Explanation
- `e` and `d` are table aliases for clarity.
- Only employees whose `dept_id` appears in `departments` are shown.
🧾 Step 3: Expected Output
| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Ananya Roy | HR |
| 3 | Bhavna Rai | Engineering |
Chetan (Dept 40) is excluded — no matching department.
⚙️ Step 4: Why This Query Works
INNER JOIN merges only intersecting rows by dept_id, giving valid employee-department pairs.
⚡ Step 5: Common Mistakes
- Forgetting the `ON` condition → Cartesian join.
- Selecting same-named columns without a table prefix → ambiguity.
🧠 Step 6: Alternate Approach
```sql
SELECT e.*, d.dept_name
FROM employees e, departments d
WHERE e.dept_id = d.dept_id;
```
Old-style syntax — functionally the same but less readable.
🎯 Interview Tip
Mention that INNER JOIN filters unmatched rows, ideal for integrity-checked relations.
🔍 Interviewer’s Analysis
- Concept Tested: INNER JOIN
- Difficulty: Beginner
- Complexity: ≈ O(n × m) without index; O(n log m) with index
- Follow-Ups: Explain LEFT JOIN vs INNER JOIN; indexing strategy on `dept_id`.
🪄 End of Q1.
❓ Q2: List all employees with their departments (even if unassigned) using a LEFT JOIN.
🧩 Problem Statement
Return every employee and their department name. If an employee has no department, still include them with NULL as department.
💾 Step 1: Assume Example Tables
Same as Q1 but keep Chetan’s department 40 missing.
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
LEFT JOIN keeps all rows from the left table (employees), even when no match on the right.
🏗️ SQL Query
```sql
SELECT e.emp_id,
       e.name,
       d.dept_name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id;
```
🔍 Explanation
All employees appear; unmatched ones show NULL for department.
🧾 Step 3: Expected Output
| emp_id | name | dept_name |
|---|---|---|
| 1 | Amit Sharma | Sales |
| 2 | Ananya Roy | HR |
| 3 | Bhavna Rai | Engineering |
| 4 | Chetan Joshi | NULL |
⚙️ Step 4: Why This Query Works
LEFT JOIN includes all employees (left table), filling missing matches with NULL.
⚡ Step 5: Common Mistakes
- Using `INNER JOIN` instead → drops unassigned employees.
- Forgetting to check for `NULL` when filtering after the join.
🧠 Step 6: Alternate Approach
```sql
SELECT e.emp_id, e.name,
       COALESCE(d.dept_name, 'Unassigned') AS dept_name
FROM employees e
LEFT JOIN departments d
  ON e.dept_id = d.dept_id;
```
COALESCE replaces NULL with “Unassigned”.
🎯 Interview Tip
Highlight that LEFT JOIN preserves completeness of the left entity — vital in reports.
🔍 Interviewer’s Analysis
- Concept Tested: LEFT JOIN / NULL handling
- Difficulty: Beginner → Intermediate
- Follow-Ups: How to show departments with no employees (→ RIGHT JOIN).
🪄 End of Q2.
❓ Q3: Retrieve all data from both tables with a FULL OUTER JOIN.
🧩 Problem Statement
Show every employee and every department. Include:
- employees without departments
- departments without employees.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit | 10 |
| 2 | Ananya | 20 |
| 3 | Chetan | 40 |
departments
| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
FULL OUTER JOIN = LEFT JOIN UNION RIGHT JOIN. It keeps all rows from both sides.
🏗️ SQL Query
SELECT e.emp_id,
e.name,
e.dept_id AS emp_dept,
d.dept_name
FROM employees e
FULL OUTER JOIN departments d
ON e.dept_id = d.dept_id;
🔍 Explanation
- Matches where possible.
- When no match, one side’s columns → NULL.
🧾 Step 3: Expected Output
| emp_id | name | emp_dept | dept_name |
|---|---|---|---|
| 1 | Amit | 10 | Sales |
| 2 | Ananya | 20 | HR |
| 3 | Chetan | 40 | NULL |
| NULL | NULL | NULL | Engineering |
(The unmatched department’s emp_dept is NULL because the query selects e.dept_id; add d.dept_id to the SELECT list if you also need the department’s own id, 30.)
⚙️ Step 4: Why This Query Works
FULL OUTER JOIN merges the results of left and right joins, ensuring nothing is lost.
⚡ Step 5: Common Mistakes
- Using a DB that doesn’t support FULL OUTER JOIN (PostgreSQL: yes; MySQL: no → emulate with UNION).
- Forgetting to distinguish which dept_id comes from which table.
🧠 Step 6: Alternate Approach
SELECT e.emp_id, e.name, e.dept_id AS emp_dept, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.emp_id, e.name, e.dept_id, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Equivalent manual implementation.
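The UNION emulation can be verified end to end with Python’s sqlite3 (an assumption for demonstration; the article itself targets MySQL/PostgreSQL). Since older SQLite versions lack both FULL OUTER JOIN and RIGHT JOIN, the sketch below writes the second branch as a reversed LEFT JOIN; the rows mirror the Q3 tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES (1, 'Amit', 10), (2, 'Ananya', 20), (3, 'Chetan', 40);
INSERT INTO departments VALUES (10, 'Sales'), (20, 'HR'), (30, 'Engineering');
""")

# FULL OUTER JOIN emulated as two LEFT JOINs; UNION removes the
# matched rows that appear in both branches.
rows = conn.execute("""
    SELECT e.emp_id, e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    UNION
    SELECT e.emp_id, e.name, d.dept_name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.dept_id
""").fetchall()
print(sorted(rows, key=lambda r: (r[0] is None, r[0])))
```

Both the unmatched employee (Chetan) and the unmatched department (Engineering) survive, one side NULL each.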
🎯 Interview Tip
Explain which DBs lack native FULL OUTER JOIN and how to emulate it.
🔍 Interviewer’s Analysis
- Concept Tested: FULL OUTER JOIN / Set operations
- Difficulty: Intermediate
- Follow-Ups: “Show only unmatched rows” (→ use WHERE dept_name IS NULL OR name IS NULL).
🪄 End of Q3.
❓ Q4: Perform a SELF JOIN to show employees along with their managers.
🧩 Problem Statement
Display each employee’s name together with their manager’s name and department.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | manager_id |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 4 |
| 2 | Ananya Roy | 10 | 4 |
| 3 | Bhavna Rai | 20 | 5 |
| 4 | David Kumar | 10 | NULL |
| 5 | Erin Singh | 20 | NULL |
manager_id points to another row’s emp_id — hence “self” join.
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept tested: SELF JOIN — joining a table to itself with aliases.
🏗️ SQL Query
SELECT e.name AS employee_name,
m.name AS manager_name,
e.dept_id
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.emp_id;
🔍 Explanation
- e = employee copy, m = manager copy.
- LEFT JOIN keeps top-level managers (NULL manager_id).
🧾 Step 3: Expected Output
| employee_name | manager_name | dept_id |
|---|---|---|
| Amit Sharma | David Kumar | 10 |
| Ananya Roy | David Kumar | 10 |
| Bhavna Rai | Erin Singh | 20 |
| David Kumar | NULL | 10 |
| Erin Singh | NULL | 20 |
⚙️ Step 4: Why This Query Works
The same table is scanned twice — once as employees, once as managers — matching on manager_id = emp_id.
⚡ Step 5: Common Mistakes
- Forgetting aliases → ambiguous column error.
- Using INNER JOIN → managers only (drops top bosses).
🧠 Step 6: Alternate Approach
SELECT e.name, m.name AS manager_name
FROM employees e
JOIN employees m USING (emp_id);
(Logically wrong here: USING (emp_id) joins each employee row to itself, not to its manager. It shows why aliases plus an explicit ON e.manager_id = m.emp_id condition are required.)
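As a sanity check, the alias-based self-join runs as written in a quick sqlite3 sketch (SQLite is an assumption here; the query is dialect-neutral), using the Q4 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Amit Sharma', 10, 4), (2, 'Ananya Roy', 10, 4),
  (3, 'Bhavna Rai', 20, 5), (4, 'David Kumar', 10, NULL),
  (5, 'Erin Singh', 20, NULL);
""")

# Two aliases of the same table: e is the employee row, m the manager row.
rows = conn.execute("""
    SELECT e.name, m.name
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```

The top-level managers come back with a NULL manager name, matching the expected output table.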
🎯 Interview Tip
Explain clearly the difference between self-join and hierarchical CTE (for multi-level chains).
🔍 Interviewer’s Analysis
- Concept Tested: Self-Join / Aliasing
- Difficulty: Intermediate
- Follow-Ups: Find multi-level chain; show manager count per manager.
🪄 End of Q4.
❓ Q5: Find employees who do not belong to any department.
🧩 Problem Statement
Return employees whose dept_id does not exist in the departments table.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id |
|---|---|---|
| 1 | Amit Sharma | 10 |
| 2 | Ananya Roy | 20 |
| 3 | Bhavna Rai | 40 |
| 4 | Chetan Joshi | NULL |
departments
| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Engineering |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Anti-Join using LEFT JOIN … IS NULL or NOT IN.
🏗️ SQL Query
SELECT e.emp_id, e.name
FROM employees e
LEFT JOIN departments d
ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;
🔍 Explanation
- LEFT JOIN keeps all employees.
- The WHERE d.dept_id IS NULL condition isolates those without a matching department.
🧾 Step 3: Expected Output
| emp_id | name |
|---|---|
| 3 | Bhavna Rai |
| 4 | Chetan Joshi |
⚙️ Step 4: Why This Query Works
Unmatched rows (from left table) show NULLs on the right — filtering those identifies missing links.
⚡ Step 5: Common Mistakes
- Using INNER JOIN (returns only matched rows).
- Using NOT IN without handling NULLs → no rows returned if NULL exists in the subquery.
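The NOT IN pitfall is easy to reproduce. In the sqlite3 sketch below (SQLite used purely for demonstration), a single NULL in the subquery result silently empties the NOT IN variant, while NOT EXISTS is unaffected; the NULL-keyed department row is inserted only to trigger the bug:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES (1,'Amit',10), (2,'Ananya',20), (3,'Bhavna',40), (4,'Chetan',NULL);
INSERT INTO departments VALUES (10,'Sales'), (20,'HR'), (30,'Engineering');
""")
# A NULL in the NOT IN list makes every comparison UNKNOWN -> zero rows.
conn.execute("INSERT INTO departments VALUES (NULL, 'Ghost')")

bad = conn.execute(
    "SELECT name FROM employees WHERE dept_id NOT IN (SELECT dept_id FROM departments)"
).fetchall()

# NOT EXISTS is NULL-safe: d.dept_id = e.dept_id is simply never true for NULLs.
good = conn.execute("""
    SELECT name FROM employees e
    WHERE NOT EXISTS (SELECT 1 FROM departments d WHERE d.dept_id = e.dept_id)
""").fetchall()
print(bad, good)
```

`bad` comes back empty despite two genuinely unmatched employees; `good` returns Bhavna (unknown dept 40) and Chetan (NULL dept).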
🧠 Step 6: Alternate Approach
SELECT emp_id, name
FROM employees
WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE dept_id IS NOT NULL);
🎯 Interview Tip
If interviewer asks “how to optimize,” mention that an anti-join using NOT EXISTS often beats NOT IN.
🔍 Interviewer’s Analysis
- Concept Tested: Left Anti-Join
- Difficulty: Intermediate
- Follow-Ups: Handle NULL values in subqueries; compare NOT IN vs NOT EXISTS.
🪄 End of Q5.
❓ Q6: Display products purchased by customers from more than one country.
🧩 Problem Statement
Find products that have been bought by customers in multiple countries, i.e., count of distinct countries > 1.
💾 Step 1: Assume Example Tables
customers
| customer_id | name | country |
|---|---|---|
| 1 | Raj | India |
| 2 | Aisha | India |
| 3 | John | USA |
| 4 | Maria | UK |
orders
| order_id | customer_id | product_id |
|---|---|---|
| 1 | 1 | 100 |
| 2 | 2 | 200 |
| 3 | 3 | 100 |
| 4 | 4 | 300 |
| 5 | 3 | 200 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concepts: JOIN → GROUP BY → COUNT(DISTINCT) with HAVING.
🏗️ SQL Query
SELECT o.product_id,
COUNT(DISTINCT c.country) AS country_count
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
GROUP BY o.product_id
HAVING COUNT(DISTINCT c.country) > 1;
🔍 Explanation
- Join orders ↔ customers to access country.
- Group by product and count unique countries.
- Keep only those sold in > 1 country.
🧾 Step 3: Expected Output
| product_id | country_count |
|---|---|
| 100 | 2 |
| 200 | 2 |
(Product 100 bought in India & USA; Product 200 in India & USA.)
⚙️ Step 4: Why This Query Works
Aggregation over joined data computes cross-customer metrics (“distinct countries per product”).
⚡ Step 5: Common Mistakes
- Forgetting DISTINCT → over-count.
- Filtering with WHERE instead of HAVING.
🧠 Step 6: Alternate Approach
-- Using a CTE for clarity
WITH product_countries AS (
SELECT o.product_id, c.country
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY o.product_id, c.country
)
SELECT product_id
FROM product_countries
GROUP BY product_id
HAVING COUNT(*) > 1;
🎯 Interview Tip
Discuss indexing (customer_id, product_id) for large joins and explain why COUNT(DISTINCT) is CPU-intensive.
🔍 Interviewer’s Analysis
- Concept Tested: Join + Aggregation + Distinct
- Difficulty: Intermediate
- Follow-Ups: Optimize COUNT(DISTINCT); filter by time period.
🪄 End of Q6.
❓ Q7: Find employees who earn more than the average salary.
🧩 Problem Statement
List employees whose salary is greater than the overall average salary across the company.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 75000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 50000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept tested: Subquery returning a single value (the company-wide average).
🏗️ SQL Query
SELECT name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
🔍 Explanation
- The inner query computes one scalar — the average salary.
- The outer query filters all employees whose salary > that value.
🧾 Step 3: Expected Output
| name | salary |
|---|---|
| Ananya Roy | 75000 |
| Bhavna Rai | 90000 |
Average salary = (60000 + 75000 + 90000 + 50000) / 4 = 68750.
⚙️ Step 4: Why This Query Works
The scalar subquery executes once; each row compares its salary to the result.
⚡ Step 5: Common Mistakes
- Forgetting parentheses around subquery.
- Trying to compare multiple rows without aggregation → “subquery returns more than one row.”
🧠 Step 6: Alternate Approach
-- Using a window function (no subquery)
SELECT name, salary
FROM (
SELECT name, salary, AVG(salary) OVER () AS avg_sal
FROM employees
) t
WHERE salary > avg_sal;
More modern; the average is computed in the same pass over the table instead of a separate subquery scan.
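Both variants can be checked quickly with sqlite3 (window functions need SQLite 3.25+, bundled with all recent Pythons); a minimal sketch over the Q7 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'Amit Sharma', 10, 60000), (2, 'Ananya Roy', 10, 75000),
  (3, 'Bhavna Rai', 20, 90000), (4, 'Chetan Joshi', 20, 50000);
""")

# AVG(salary) OVER () attaches the company-wide average (68750) to every row,
# so the outer filter compares against it without a second table scan.
rows = conn.execute("""
    SELECT name, salary
    FROM (SELECT name, salary, AVG(salary) OVER () AS avg_sal FROM employees) t
    WHERE salary > avg_sal
""").fetchall()
print(rows)
```

Only the two salaries above 68750 survive, matching the expected output.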
🎯 Interview Tip
If asked, mention that the subquery version scans the table twice (once for the average, once to filter), while the window version can be satisfied in a single pass.
🔍 Interviewer’s Analysis
- Concept Tested: Scalar subquery / Aggregation
- Difficulty: Beginner → Intermediate
- Follow-Ups: Compare performance between subquery and window approach.
🪄 End of Q7.
❓ Q8: Fetch the second-highest salary using subquery and DENSE_RANK.
🧩 Problem Statement
Retrieve the employee(s) with the second-highest salary — including ties.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 60000 |
| 2 | Ananya Roy | 75000 |
| 3 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 90000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We can either:
- Use a nested MAX subquery (the max salary strictly below the overall max), or
- Use the DENSE_RANK() window function.
We’ll demonstrate both.
🏗️ SQL Query (Approach 1 — Subquery)
SELECT name, salary
FROM employees
WHERE salary = (
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
);
🔍 Explanation
- Inner-most: Finds highest salary (90000).
- Middle: Finds max salary below that (75000).
- Outer: Returns employees with that salary.
🏗️ SQL Query (Approach 2 — Using DENSE_RANK)
SELECT name, salary
FROM (
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk = 2;
🔍 Explanation
- DENSE_RANK() assigns the same rank to duplicates.
- We select rows with rank = 2 → second-highest.
🧾 Step 3: Expected Output
| name | salary |
|---|---|
| Ananya Roy | 75000 |
⚙️ Step 4: Why This Query Works
Both methods logically identify the “runner-up” salary value and filter for matching rows.
⚡ Step 5: Common Mistakes
- Using LIMIT 2 incorrectly → returns the top two rows, not specifically the second-highest salary.
- Using ROW_NUMBER() (which numbers ties arbitrarily instead of letting them share a rank).
🎯 Interview Tip
Explain how DENSE_RANK handles duplicates gracefully; ROW_NUMBER would not.
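The tie behaviour is easy to demonstrate with sqlite3 (used here only as a convenient runnable engine): with two employees at 90000, DENSE_RANK yields exactly the 75000 row at rank 2, while ROW_NUMBER pushes one of the tied top earners into position 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'Amit Sharma', 60000), (2, 'Ananya Roy', 75000),
  (3, 'Bhavna Rai', 90000), (4, 'Chetan Joshi', 90000);
""")

# DENSE_RANK: both 90000 rows share rank 1, so rank 2 is the true runner-up.
dense = conn.execute("""
    SELECT name, salary FROM (
        SELECT name, salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees) t
    WHERE rnk = 2
""").fetchall()

# ROW_NUMBER: position 2 is still one of the tied 90000 rows -- the wrong answer.
row_num = conn.execute("""
    SELECT name, salary FROM (
        SELECT name, salary,
               ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
        FROM employees) t
    WHERE rn = 2
""").fetchall()
print(dense, row_num)
```

Which of the two 90000 earners lands at row number 2 is arbitrary; that non-determinism is exactly the interview point.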
🔍 Interviewer’s Analysis
- Concept Tested: Subqueries / Ranking Functions
- Difficulty: Intermediate
- Follow-Ups: Compare performance; show Nth-highest.
🪄 End of Q8.
❓ Q9: Retrieve products that have never been sold.
🧩 Problem Statement
Find products in the catalog that do not appear in the sales table.
💾 Step 1: Assume Example Tables
products
| product_id | product_name |
|---|---|
| 100 | Laptop |
| 101 | Phone |
| 102 | Tablet |
| 103 | Smartwatch |
sales
| sale_id | product_id | amount |
|---|---|---|
| 1 | 100 | 50000 |
| 2 | 101 | 20000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Anti-Join / NOT EXISTS to find unmatched rows.
🏗️ SQL Query
SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM sales s
WHERE s.product_id = p.product_id
);
🔍 Explanation
The subquery checks if a product appears in sales.
If no match exists, it’s unsold.
🧾 Step 3: Expected Output
| product_id | product_name |
|---|---|
| 102 | Tablet |
| 103 | Smartwatch |
⚙️ Step 4: Why This Query Works
NOT EXISTS efficiently excludes all products present in sales.
⚡ Step 5: Common Mistakes
- Using NOT IN without NULL handling (fails if sales.product_id has NULL).
- Using INNER JOIN instead of an anti-join.
🧠 Step 6: Alternate Approach
SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
WHERE s.product_id IS NULL;
🎯 Interview Tip
Mention that NOT EXISTS can be faster than the LEFT JOIN form on large datasets (it can stop at the first match), though many modern optimizers compile both into the same anti-join plan.
🔍 Interviewer’s Analysis
- Concept Tested: Anti-Join / Exclusion Logic
- Difficulty: Intermediate
- Follow-Ups: Performance difference between
NOT EXISTSvsLEFT JOIN.
🪄 End of Q9.
❓ Q10: Display customers who never placed an order.
🧩 Problem Statement
List all customers who do not appear in the orders table.
💾 Step 1: Assume Example Tables
customers
| customer_id | name |
|---|---|
| 1 | Raj |
| 2 | Aisha |
| 3 | Sunil |
| 4 | Priya |
orders
| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 2000 |
| 3 | 1 | 1500 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Anti-Join / NOT EXISTS again, applied on customers ↔ orders.
🏗️ SQL Query
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
🔍 Explanation
Each customer checked against the orders table.
Only those with no orders are returned.
🧾 Step 3: Expected Output
| customer_id | name |
|---|---|
| 3 | Sunil |
| 4 | Priya |
⚙️ Step 4: Why This Query Works
The correlated subquery ensures exclusion only when there are no matches.
⚡ Step 5: Common Mistakes
- Using NOT IN directly → issues with NULL values.
- Forgetting the correlation (o.customer_id = c.customer_id).
🧠 Step 6: Alternate Approach
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
🎯 Interview Tip
In large-scale data, NOT EXISTS tends to outperform joins when proper indexes exist.
🔍 Interviewer’s Analysis
- Concept Tested: Correlated Subquery / Anti-Join
- Difficulty: Beginner → Intermediate
- Follow-Ups: Add condition for “no order in last 6 months”.
🪄 End of Q10.
❓ Q11: Identify departments that don’t have any employees hired after 2020.
🧩 Problem Statement
Find all departments where no employee joined after the year 2020.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | hire_date |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 2019-05-01 |
| 2 | Ananya Roy | 10 | 2021-03-01 |
| 3 | Bhavna Rai | 20 | 2018-06-10 |
| 4 | Chetan Joshi | 30 | 2017-01-15 |
departments
| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
| 30 | Tech |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Use NOT EXISTS to exclude departments with any recent hire.
🏗️ SQL Query
SELECT d.dept_id, d.dept_name
FROM departments d
WHERE NOT EXISTS (
SELECT 1
FROM employees e
WHERE e.dept_id = d.dept_id
AND e.hire_date > '2020-12-31'
);
🔍 Explanation
- The subquery searches for employees hired after 2020 in that department.
- If none exist → department qualifies.
🧾 Step 3: Expected Output
| dept_id | dept_name |
|---|---|
| 20 | HR |
| 30 | Tech |
Sales excluded (has Ananya hired in 2021).
⚙️ Step 4: Why This Query Works
NOT EXISTS removes departments with any post-2020 hire.
⚡ Step 5: Common Mistakes
- Using HAVING incorrectly (it is not needed here).
- Mixing up the WHERE hire_date < vs > direction.
🧠 Step 6: Alternate Approach
SELECT d.dept_id, d.dept_name
FROM departments d
LEFT JOIN employees e
ON d.dept_id = e.dept_id AND e.hire_date > '2020-12-31'
WHERE e.emp_id IS NULL;
🎯 Interview Tip
State clearly why you used NOT EXISTS: it ensures logical clarity and performance.
🔍 Interviewer’s Analysis
- Concept Tested: Correlated Subquery / Temporal Filtering
- Difficulty: Intermediate
- Follow-Ups: How to adapt for “no new hires in past N months.”
🪄 End of Q11.
❓ Q12: Retrieve customers whose total purchases exceed the overall customer average.
🧩 Problem Statement
Find customers whose total spend (sum of order amounts) is greater than the average of all customers’ total spend.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 2000 |
| 3 | 2 | 500 |
| 4 | 3 | 4000 |
| 5 | 3 | 1000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Two levels of aggregation:
- Inner query: total per customer.
- Outer query: compare each total to overall average.
🏗️ SQL Query
SELECT customer_id, total_amount
FROM (
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
) t
WHERE total_amount > (
SELECT AVG(total_amount)
FROM (
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
) sub
);
🔍 Explanation
- Inner aggregation computes SUM(amount) per customer.
- The outer query compares each total to the average of those totals.
🧾 Step 3: Expected Output
| customer_id | total_amount |
|---|---|
| 3 | 5000 |
| 1 | 3000 |
Average customer total = (3000 + 500 + 5000)/3 = 2833.33.
⚙️ Step 4: Why This Query Works
It’s a double subquery: one for per-customer totals, one for the global average.
⚡ Step 5: Common Mistakes
- Comparing to AVG(amount) directly → wrong (compares per-order, not per-customer).
- Forgetting aliases on the subqueries.
🧠 Step 6: Alternate Approach
WITH customer_totals AS (
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM customer_totals
WHERE total_amount > (SELECT AVG(total_amount) FROM customer_totals);
The CTE improves readability.
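The CTE version runs unchanged under sqlite3 (chosen here only because it is easy to run inline); a sketch with the Q12 orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
INSERT INTO orders VALUES (1,1,1000), (2,1,2000), (3,2,500), (4,3,4000), (5,3,1000);
""")

# customer_totals names the per-customer sums once, then is referenced twice:
# as the row source and inside the scalar AVG subquery.
rows = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, total_amount
    FROM customer_totals
    WHERE total_amount > (SELECT AVG(total_amount) FROM customer_totals)
""").fetchall()
print(rows)
```

Per-customer totals are 3000, 500, and 5000, so only customers 1 and 3 clear the 2833.33 average.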
🎯 Interview Tip
Explain trade-off between nested subquery and CTE — same result, but CTE improves maintainability.
🔍 Interviewer’s Analysis
- Concept Tested: Aggregation within subqueries / CTEs
- Difficulty: Intermediate
- Follow-Ups: Optimize with temporary tables or materialized views.
🪄 End of Q12.
❓ Q13: Assign ranks to employees by salary within each department.
🧩 Problem Statement
Rank employees based on salary within their department, so that the highest earner in each department gets rank 1.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 75000 |
| 4 | Chetan Joshi | 20 | 72000 |
| 5 | Deepa Singh | 20 | 72000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Window function — RANK() or DENSE_RANK() partitioned by department.
🏗️ SQL Query
SELECT emp_id,
name,
dept_id,
salary,
DENSE_RANK() OVER (
PARTITION BY dept_id
ORDER BY salary DESC
) AS dept_rank
FROM employees;
🔍 Explanation
- PARTITION BY dept_id = restart the rank per department.
- ORDER BY salary DESC = rank by salary, highest first.
- DENSE_RANK() assigns the same rank to ties (whereas RANK() leaves gaps after them).
🧾 Step 3: Expected Output
| emp_id | name | dept_id | salary | dept_rank |
|---|---|---|---|---|
| 2 | Ananya Roy | 10 | 80000 | 1 |
| 1 | Amit Sharma | 10 | 60000 | 2 |
| 3 | Bhavna Rai | 20 | 75000 | 1 |
| 4 | Chetan Joshi | 20 | 72000 | 2 |
| 5 | Deepa Singh | 20 | 72000 | 2 |
⚙️ Step 4: Why This Query Works
Each department acts as its own mini-table; ranks are computed per partition.
⚡ Step 5: Common Mistakes
- Forgetting PARTITION BY → a global ranking instead of per-department.
- Using ROW_NUMBER() when ties should share a rank.
🧠 Step 6: Alternate Approach
-- Using ROW_NUMBER to get unique ordering
SELECT emp_id, name, dept_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;
Ensures a unique rank per employee.
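A quick sqlite3 check of the partitioned ranking, including the 72000 tie in department 20, makes the DENSE_RANK vs ROW_NUMBER contrast concrete (SQLite is an assumption for the harness; the SQL itself is portable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'Amit Sharma', 10, 60000), (2, 'Ananya Roy', 10, 80000),
  (3, 'Bhavna Rai', 20, 75000), (4, 'Chetan Joshi', 20, 72000),
  (5, 'Deepa Singh', 20, 72000);
""")

# Both functions restart per department; they differ only on the tied salaries.
rows = conn.execute("""
    SELECT name, dept_id,
           DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS d_rnk,
           ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS row_n
    FROM employees
    ORDER BY dept_id, salary DESC, emp_id
""").fetchall()
for r in rows:
    print(r)
```

The two 72000 earners share dense rank 2, but ROW_NUMBER forces them into distinct positions 2 and 3 in an arbitrary order.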
🎯 Interview Tip
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() — interviewers love this follow-up.
🔍 Interviewer’s Analysis
- Concept Tested: Ranking & Partitioning
- Difficulty: Intermediate
- Follow-Ups: Explain how to show only top-3 per dept using a subquery filter.
🪄 End of Q13.
❓ Q14: Calculate cumulative (running) sales per employee.
🧩 Problem Statement
Show each employee’s cumulative sales over time, ordered by transaction date.
💾 Step 1: Assume Example Tables
sales
| sale_id | emp_id | sale_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-01-01 | 1000 |
| 2 | 1 | 2024-02-01 | 2000 |
| 3 | 2 | 2024-01-15 | 1500 |
| 4 | 2 | 2024-03-01 | 2500 |
| 5 | 1 | 2024-03-05 | 3000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Cumulative total via SUM() OVER (ORDER BY …) partitioned by employee.
🏗️ SQL Query
SELECT emp_id,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY emp_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sales
FROM sales;
🔍 Explanation
- SUM() OVER (…) calculates the total up to the current row, per employee.
- The ROWS clause makes the running-sum frame explicit.
🧾 Step 3: Expected Output
| emp_id | sale_date | amount | cumulative_sales |
|---|---|---|---|
| 1 | 2024-01-01 | 1000 | 1000 |
| 1 | 2024-02-01 | 2000 | 3000 |
| 1 | 2024-03-05 | 3000 | 6000 |
| 2 | 2024-01-15 | 1500 | 1500 |
| 2 | 2024-03-01 | 2500 | 4000 |
⚙️ Step 4: Why This Query Works
Windowed SUM lets totals accumulate row by row without collapsing groups.
⚡ Step 5: Common Mistakes
- Forgetting ORDER BY → the full sum is repeated on each row.
- Using the plain aggregate SUM() instead of the windowed version.
🧠 Step 6: Alternate Approach
If the ROWS clause is omitted, the default frame with an ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which produces the same running total unless two rows share the same sale_date (peer rows are then summed together).
🎯 Interview Tip
Use this to explain running totals, moving averages, or trend tracking — all derived from similar window logic.
🔍 Interviewer’s Analysis
- Concept Tested: Window Aggregation
- Difficulty: Intermediate
- Follow-Ups: Compute for company-wide cumulative totals (no partition).
🪄 End of Q14.
❓ Q15: Find the difference between each month’s and previous month’s sales.
🧩 Problem Statement
Show monthly sales and how much they increased/decreased compared to the previous month.
💾 Step 1: Assume Example Tables
sales_summary
| month | total_sales |
|---|---|
| Jan | 50000 |
| Feb | 60000 |
| Mar | 55000 |
| Apr | 70000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: LAG() window function — look at previous row’s value.
🏗️ SQL Query
SELECT month,
total_sales,
LAG(total_sales) OVER (ORDER BY month) AS prev_sales,
total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_from_prev
FROM sales_summary;
🔍 Explanation
- LAG() fetches the previous month’s sales.
- Subtract to find the difference.
🧾 Step 3: Expected Output
| month | total_sales | prev_sales | change_from_prev |
|---|---|---|---|
| Jan | 50000 | NULL | NULL |
| Feb | 60000 | 50000 | 10000 |
| Mar | 55000 | 60000 | -5000 |
| Apr | 70000 | 55000 | 15000 |
⚙️ Step 4: Why This Query Works
Window functions maintain chronological context; LAG() is ideal for sequential comparisons.
⚡ Step 5: Common Mistakes
- Ordering alphabetically instead of chronologically (month should be a date type).
- Using a self-JOIN for month-over-month — unnecessary and inefficient.
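Following the common-mistake note about alphabetical month ordering, the sketch below stores months as ISO dates (a month_start column is assumed in place of the text month labels) and verifies the LAG arithmetic under sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_summary (month_start TEXT, total_sales INTEGER);
INSERT INTO sales_summary VALUES
  ('2024-01-01', 50000), ('2024-02-01', 60000),
  ('2024-03-01', 55000), ('2024-04-01', 70000);
""")

# ISO dates sort chronologically even as text, so ORDER BY is safe here.
rows = conn.execute("""
    SELECT month_start,
           total_sales - LAG(total_sales) OVER (ORDER BY month_start) AS change_from_prev
    FROM sales_summary
    ORDER BY month_start
""").fetchall()
print(rows)
```

The first month has no predecessor, so its delta is NULL; the rest reproduce the +10000, -5000, +15000 swings from the expected output.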
🧠 Step 6: Alternate Approach
-- Using LEAD() to show next month’s change instead
SELECT month,
total_sales,
LEAD(total_sales) OVER (ORDER BY month) - total_sales AS change_to_next
FROM sales_summary;
🎯 Interview Tip
Talk about using LAG() for retention, revenue growth, or churn — all based on similar deltas.
🔍 Interviewer’s Analysis
- Concept Tested: LAG() / Time-series comparison
- Difficulty: Intermediate
- Follow-Ups: Handle missing months or irregular time gaps.
🪄 End of Q15.
❓ Q16: Show employee salaries and the salary gap vs department average.
🧩 Problem Statement
For each employee, show how far their salary differs from their department’s average salary.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 75000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Use window AVG() to compute departmental mean per row, then subtract.
🏗️ SQL Query
SELECT emp_id,
name,
dept_id,
salary,
ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY dept_id) AS salary_gap
FROM employees;
🔍 Explanation
- AVG(salary) OVER (PARTITION BY dept_id) gives the department average.
- Subtract it from each employee’s salary.
🧾 Step 3: Expected Output
| emp_id | name | dept_id | salary | dept_avg | salary_gap |
|---|---|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 | 70000.00 | -10000 |
| 2 | Ananya Roy | 10 | 80000 | 70000.00 | 10000 |
| 3 | Bhavna Rai | 20 | 90000 | 82500.00 | 7500 |
| 4 | Chetan Joshi | 20 | 75000 | 82500.00 | -7500 |
⚙️ Step 4: Why This Query Works
Window functions allow comparing each row to its group’s aggregate without collapsing rows.
⚡ Step 5: Common Mistakes
- Using GROUP BY dept_id instead of a window → collapses rows.
- Forgetting PARTITION BY → averages over all departments.
🎯 Interview Tip
Show that you can mix aggregates and raw rows — that’s what makes windows powerful.
🔍 Interviewer’s Analysis
- Concept Tested: Window Aggregation / Comparative Analytics
- Difficulty: Intermediate
- Follow-Ups: Add flag column for “above/below average”.
🪄 End of Q16.
❓ Q17: Find the Nth highest salary in each department.
🧩 Problem Statement
Retrieve the Nth (e.g., 2nd or 3rd) highest salary per department.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 10 | 70000 |
| 4 | Chetan Joshi | 20 | 95000 |
| 5 | Deepa Singh | 20 | 90000 |
| 6 | Esha Patel | 20 | 88000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Use DENSE_RANK() to assign salary order within department, then filter.
🏗️ SQL Query
SELECT dept_id, name, salary
FROM (
SELECT dept_id, name, salary,
DENSE_RANK() OVER (
PARTITION BY dept_id
ORDER BY salary DESC
) AS rnk
FROM employees
) t
WHERE rnk = 2;
🔍 Explanation
- Assign rank by salary (highest = 1).
- Filter for the target rank (Nth = 2 in this example).
🧾 Step 3: Expected Output (N = 2)
| dept_id | name | salary |
|---|---|---|
| 10 | Bhavna Rai | 70000 |
| 20 | Deepa Singh | 90000 |
⚙️ Step 4: Why This Query Works
DENSE_RANK() allows rank-based filtering while preserving ties.
⚡ Step 5: Common Mistakes
- Using LIMIT → gives the top N overall, not per department.
- Using ROW_NUMBER() when ties should share a rank.
🧠 Step 6: Alternate Approach
To find Nth highest salary company-wide:
SELECT salary
FROM employees
ORDER BY salary DESC
OFFSET (N - 1) ROWS FETCH NEXT 1 ROWS ONLY;
🎯 Interview Tip
Discuss tie-handling (DENSE_RANK vs ROW_NUMBER) and efficiency with large datasets (index on dept_id, salary).
🔍 Interviewer’s Analysis
- Concept Tested: Window Ranking / Conditional Filtering
- Difficulty: Intermediate
- Follow-Ups: Modify for top-3 instead of exactly N.
🪄 End of Q17.
❓ Q18: Calculate the moving average of posts per user (7-day window).
🧩 Problem Statement
Compute each user’s 7-day moving average of posts, using their posting dates.
💾 Step 1: Assume Example Tables
posts
| post_id | user_id | post_date | post_count |
|---|---|---|---|
| 1 | 1 | 2024-01-01 | 2 |
| 2 | 1 | 2024-01-02 | 3 |
| 3 | 1 | 2024-01-05 | 4 |
| 4 | 1 | 2024-01-08 | 1 |
| 5 | 2 | 2024-01-01 | 5 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Moving average with date-based window — sliding frame of last 7 days.
🏗️ SQL Query
SELECT user_id,
post_date,
post_count,
AVG(post_count) OVER (
PARTITION BY user_id
ORDER BY post_date
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM posts;
🔍 Explanation
- RANGE BETWEEN … dynamically covers the 7 calendar days before each post.
- The average recalculates over that rolling range.
🧾 Step 3: Expected Output (simplified)
| user_id | post_date | post_count | moving_avg_7d |
|---|---|---|---|
| 1 | 2024-01-01 | 2 | 2.00 |
| 1 | 2024-01-02 | 3 | 2.50 |
| 1 | 2024-01-05 | 4 | 3.00 |
| 1 | 2024-01-08 | 1 | 2.50 |
| 2 | 2024-01-01 | 5 | 5.00 |
⚙️ Step 4: Why This Query Works
The sliding window dynamically shifts with each row, capturing exactly 7 days’ worth of history.
⚡ Step 5: Common Mistakes
- Using ROWS instead of RANGE → counts 7 rows, not 7 days.
- Forgetting to partition by user_id → merges all users.
🧠 Step 6: Alternate Approach
For databases lacking INTERVAL syntax:
SELECT user_id, post_date, post_count,
AVG(post_count) OVER (
PARTITION BY user_id
ORDER BY post_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7rows
FROM posts;
🎯 Interview Tip
Explain that this query mimics “rolling averages” used in dashboards, retention, or trend lines.
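The row-based fallback from Step 6 runs as-is under sqlite3, which accepts ROWS frames but not INTERVAL-based RANGE frames; note it averages the last 7 rows, not 7 calendar days:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER, user_id INTEGER, post_date TEXT, post_count INTEGER);
INSERT INTO posts VALUES
  (1, 1, '2024-01-01', 2), (2, 1, '2024-01-02', 3),
  (3, 1, '2024-01-05', 4), (4, 1, '2024-01-08', 1),
  (5, 2, '2024-01-01', 5);
""")

# ROWS BETWEEN 6 PRECEDING AND CURRENT ROW: a sliding frame of up to 7 rows,
# restarted for each user_id.
rows = conn.execute("""
    SELECT user_id, post_date,
           AVG(post_count) OVER (
               PARTITION BY user_id
               ORDER BY post_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM posts
    ORDER BY user_id, post_date
""").fetchall()
print(rows)
```

On this sparse sample the row-based and day-based averages happen to coincide; with more than 7 posts inside a week they would diverge, which is the ROWS vs RANGE follow-up in a nutshell.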
🔍 Interviewer’s Analysis
- Concept Tested: RANGE-based Windowing
- Difficulty: Advanced
- Follow-Ups: Compare ROWS vs RANGE; extend to 30-day moving avg.
🪄 End of Q18.
❓ Q19: Extract first and last names from a full_name column.
🧩 Problem Statement
Given an employee table with full_name, split it into two separate columns: first_name and last_name.
💾 Step 1: Assume Example Tables
employees
| emp_id | full_name |
|---|---|
| 1 | Amit Sharma |
| 2 | Ananya Roy |
| 3 | Bhavna Rai |
| 4 | Chetan Joshi |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: String splitting using functions like SUBSTRING_INDEX, SPLIT_PART, or CHARINDEX depending on SQL dialect.
🏗️ SQL Query (MySQL)
SELECT emp_id,
SUBSTRING_INDEX(full_name, ' ', 1) AS first_name,
SUBSTRING_INDEX(full_name, ' ', -1) AS last_name
FROM employees;
🏗️ SQL Query (PostgreSQL)
SELECT emp_id,
SPLIT_PART(full_name, ' ', 1) AS first_name,
SPLIT_PART(full_name, ' ', 2) AS last_name
FROM employees;
🔍 Explanation
- Split on the space character ' '.
- The first token = first name; the last token = last name.
🧾 Step 3: Expected Output
| emp_id | first_name | last_name |
|---|---|---|
| 1 | Amit | Sharma |
| 2 | Ananya | Roy |
| 3 | Bhavna | Rai |
| 4 | Chetan | Joshi |
⚙️ Step 4: Why This Query Works
String functions parse the full name at the space character and extract text before and after.
⚡ Step 5: Common Mistakes
- Names with middle names or extra spaces cause wrong splits.
- Forgetting to trim extra whitespace.
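Outside SQL, the same first-token / last-token rule is a couple of lines of plain Python; the sketch below (a hypothetical helper, not from the article) also shows why middle names demand the last-space rule rather than "second token":

```python
def split_name(full_name: str) -> tuple[str, str]:
    """First token is the first name; everything after the LAST space is the last name."""
    cleaned = " ".join(full_name.split())   # collapse stray/extra whitespace first
    first, _, rest = cleaned.partition(" ")
    last = rest.rsplit(" ", 1)[-1] if rest else ""
    return first, last

print(split_name("Amit Sharma"))        # plain two-word name
print(split_name(" Amit Kumar Sharma ")) # middle name and stray spaces handled
```

A SPLIT_PART(…, ' ', 2)-style "second token" approach would return "Kumar" as the last name for the second input.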
🧠 Step 6: Alternate Approach
SELECT emp_id,
LEFT(full_name, CHARINDEX(' ', full_name)-1) AS first_name,
RIGHT(full_name, LEN(full_name) - CHARINDEX(' ', full_name)) AS last_name
FROM employees;
(For SQL Server.)
🎯 Interview Tip
Mention how to handle multi-word names using regex or splitting by position of first/last space.
🔍 Interviewer’s Analysis
- Concept Tested: String parsing
- Difficulty: Beginner
- Follow-Ups: Handle multiple spaces; trim names with LTRIM()/RTRIM().
🪄 End of Q19.
❓ Q20: Find customers whose email domain is “gmail.com”.
🧩 Problem Statement
Identify all customers whose email addresses belong to the Gmail domain.
💾 Step 1: Assume Example Tables
customers
| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@gmail.com |
| 2 | Aisha | aisha@yahoo.com |
| 3 | Sunil | sunil@gmail.com |
| 4 | Priya | priya@outlook.com |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: String pattern matching using LIKE or SUBSTRING_INDEX().
🏗️ SQL Query
SELECT customer_id, name, email
FROM customers
WHERE email LIKE '%@gmail.com';
🔍 Explanation
- % matches any preceding text.
- The pattern ensures the email ends with “@gmail.com”.
🧾 Step 3: Expected Output
| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@gmail.com |
| 3 | Sunil | sunil@gmail.com |
⚙️ Step 4: Why This Query Works
LIKE '%@gmail.com' filters text ending with that domain.
⚡ Step 5: Common Mistakes
- Forgetting % before the domain → matches only the exact string @gmail.com.
- Assuming one case-sensitivity behaviour everywhere (LIKE is case-insensitive under MySQL’s default collation, but case-sensitive in PostgreSQL).
🧠 Step 6: Alternate Approach
-- Extract domain and filter
SELECT customer_id, name, email
FROM customers
WHERE SUBSTRING_INDEX(email, '@', -1) = 'gmail.com';
🎯 Interview Tip
Mention how to normalize case (LOWER(email)) before comparing domains.
🔍 Interviewer’s Analysis
- Concept Tested: String pattern search
- Difficulty: Beginner
- Follow-Ups: Extract username vs domain; group customers by domain.
🪄 End of Q20.
❓ Q21: Retrieve orders made in the last 7 days.
🧩 Problem Statement
Find all orders whose order_date falls within the last 7 days from today.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-02-20 | 5000 |
| 2 | 2 | 2024-02-25 | 3000 |
| 3 | 3 | 2024-02-28 | 7000 |
| 4 | 4 | 2024-03-01 | 4000 |
(Assume today = 2024-03-01)
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Date arithmetic using CURRENT_DATE or NOW().
🏗️ SQL Query (MySQL/PostgreSQL)
SELECT order_id, customer_id, order_date, amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '7 day';
🔍 Explanation
Subtracts 7 days from today’s date and filters all orders on or after that date. (PostgreSQL syntax shown; MySQL uses the unquoted form CURRENT_DATE - INTERVAL 7 DAY.)
🧾 Step 3: Expected Output
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 2 | 2 | 2024-02-25 | 3000 |
| 3 | 3 | 2024-02-28 | 7000 |
| 4 | 4 | 2024-03-01 | 4000 |
⚙️ Step 4: Why This Query Works
Date functions support subtraction intervals, allowing dynamic filtering without hardcoding.
⚡ Step 5: Common Mistakes
- Using `NOW()` without `DATE()` → time-of-day precision errors.
- Forgetting to include the current date boundary.
🧠 Step 6: Alternate Approach
-- SQL Server
SELECT *
FROM orders
WHERE order_date >= DATEADD(DAY, -7, GETDATE());

🎯 Interview Tip
Always confirm timezone alignment — especially in analytics pipelines.
🔍 Interviewer’s Analysis
- Concept Tested: Date filtering / relative intervals
- Difficulty: Beginner
- Follow-Ups: Fetch last month’s orders; group by day.
🪄 End of Q21.
❓ Q22: Calculate the age of each employee.
🧩 Problem Statement
Given birth_date, calculate the employee’s age in years as of today.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | birth_date |
|---|---|---|
| 1 | Amit Sharma | 1990-05-10 |
| 2 | Ananya Roy | 1995-08-22 |
| 3 | Bhavna Rai | 1987-03-01 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Date difference functions (DATEDIFF, AGE, TIMESTAMPDIFF).
🏗️ SQL Query (MySQL)
SELECT emp_id,
name,
TIMESTAMPDIFF(YEAR, birth_date, CURDATE()) AS age
FROM employees;

🏗️ SQL Query (PostgreSQL)
SELECT emp_id,
name,
EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date)) AS age
FROM employees;

🔍 Explanation
Subtracts birth_date from today’s date to find total years elapsed.
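SQLite lacks `TIMESTAMPDIFF`/`AGE`, but the same logic can be sketched with `strftime`, including the before-birthday correction; the as-of date is an assumed value, fixed for reproducibility:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT, birth_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Amit", "1990-05-10"), (2, "Ananya", "1995-08-22"), (3, "Bhavna", "1987-03-01")],
)

# Year subtraction alone overstates age before the birthday; the comparison
# of month-day strings yields 0 or 1 and subtracts that correction.
asof = "2024-09-01"  # assumed reference date, kept fixed for reproducibility
rows = conn.execute(
    """
    SELECT emp_id,
           CAST(strftime('%Y', :asof) AS INTEGER)
             - CAST(strftime('%Y', birth_date) AS INTEGER)
             - (strftime('%m-%d', :asof) < strftime('%m-%d', birth_date)) AS age
    FROM employees
    """,
    {"asof": asof},
).fetchall()
ages = dict(rows)
```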
🧾 Step 3: Expected Output
| emp_id | name | age |
|---|---|---|
| 1 | Amit Sharma | 34 |
| 2 | Ananya Roy | 29 |
| 3 | Bhavna Rai | 37 |

(Ages shown assume a current date in late 2024.)
⚙️ Step 4: Why This Query Works
SQL’s date functions automatically handle leap years and partial months accurately.
⚡ Step 5: Common Mistakes
- Using `YEAR(CURDATE()) - YEAR(birth_date)` → gives the wrong result before the birthday.
- Storing dates as text (use a proper DATE type).
🧠 Step 6: Alternate Approach
SELECT emp_id, name,
       FLOOR(DATEDIFF(CURDATE(), birth_date) / 365.25) AS age
FROM employees;

🎯 Interview Tip
Mention that for precise age (down to days), use interval math rather than dividing by 365.
🔍 Interviewer’s Analysis
- Concept Tested: Date functions / age calculation
- Difficulty: Beginner
- Follow-Ups: Compute employee tenure; find oldest/youngest employee.
🪄 End of Q22.
❓ Q23: Count how many users registered each month.
🧩 Problem Statement
Group users by registration month and count how many joined in each month.
💾 Step 1: Assume Example Tables
users
| user_id | name | registration_date |
|---|---|---|
| 1 | Raj | 2024-01-15 |
| 2 | Aisha | 2024-01-20 |
| 3 | Sunil | 2024-02-05 |
| 4 | Priya | 2024-02-25 |
| 5 | Deepa | 2024-03-01 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Date truncation using MONTH() or DATE_TRUNC() for grouping.
🏗️ SQL Query (MySQL)
SELECT DATE_FORMAT(registration_date, '%Y-%m') AS month,
COUNT(*) AS user_count
FROM users
GROUP BY DATE_FORMAT(registration_date, '%Y-%m')
ORDER BY month;

🏗️ SQL Query (PostgreSQL)
SELECT TO_CHAR(DATE_TRUNC('month', registration_date), 'YYYY-MM') AS month,
COUNT(*) AS user_count
FROM users
GROUP BY DATE_TRUNC('month', registration_date)
ORDER BY month;

🔍 Explanation
- Truncate each registration date to month level.
- Count users per month.
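The grouping can be reproduced in SQLite, where `strftime('%Y-%m', …)` plays the role of `DATE_FORMAT`/`DATE_TRUNC` (a minimal sketch with the sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, registration_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "2024-01-15"), (2, "2024-01-20"), (3, "2024-02-05"),
     (4, "2024-02-25"), (5, "2024-03-01")],
)

# strftime truncates each date to year-month, mirroring DATE_FORMAT('%Y-%m').
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', registration_date) AS month, COUNT(*) AS user_count
    FROM users
    GROUP BY month
    ORDER BY month
    """
).fetchall()
```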
🧾 Step 3: Expected Output
| month | user_count |
|---|---|
| 2024-01 | 2 |
| 2024-02 | 2 |
| 2024-03 | 1 |
⚙️ Step 4: Why This Query Works
Truncating to month ensures all dates in the same month fall into the same group.
⚡ Step 5: Common Mistakes
- Grouping by full date instead of month.
- Forgetting to format or order chronologically (lexicographic sort misorders months).
🧠 Step 6: Alternate Approach
SELECT EXTRACT(YEAR FROM registration_date) AS year,
EXTRACT(MONTH FROM registration_date) AS month,
COUNT(*) AS user_count
FROM users
GROUP BY year, month
ORDER BY year, month;

Cleaner for year-over-year comparisons.
🎯 Interview Tip
Highlight that monthly grouping is common in KPIs like MAU (Monthly Active Users) and retention dashboards.
🔍 Interviewer’s Analysis
- Concept Tested: Date grouping / aggregation
- Difficulty: Intermediate
- Follow-Ups: Extend to week/day granularity; cumulative monthly users.
🪄 End of Q23.
❓ Q24: Find users who made purchases every month of a year.
🧩 Problem Statement
Identify customers who made at least one purchase in every month of 2024.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2024-01-15 | 500 |
| 2 | 1 | 2024-02-10 | 300 |
| 3 | 1 | 2024-03-05 | 400 |
| 4 | 2 | 2024-01-25 | 600 |
| 5 | 2 | 2024-04-15 | 200 |
| 6 | 3 | 2024-01-20 | 800 |
| 7 | 3 | 2024-02-20 | 900 |
| 8 | 3 | 2024-03-10 | 1000 |
| 9 | 3 | 2024-04-01 | 700 |
| 10 | 3 | 2024-05-03 | 900 |
| … | … | … | … |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We must check if a user appears in all 12 months for that year.
🏗️ SQL Query
SELECT customer_id
FROM (
SELECT customer_id,
COUNT(DISTINCT EXTRACT(MONTH FROM order_date)) AS active_months
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY customer_id
) t
WHERE active_months = 12;

🔍 Explanation
- Inner query counts distinct months with orders per user.
- Outer query keeps users with all 12 active months.
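A scaled-down, runnable version of the same pattern in SQLite — three months stand in for twelve so the data stays short, and the duplicate January order exercises the `DISTINCT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
# Customer 1 orders in all 3 months of the window (twice in January);
# customer 2 skips February.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-01-20"), (1, "2024-02-10"), (1, "2024-03-01"),
     (2, "2024-01-25"), (2, "2024-03-15")],
)

rows = conn.execute(
    """
    SELECT customer_id
    FROM orders
    WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'
    GROUP BY customer_id
    HAVING COUNT(DISTINCT strftime('%Y-%m', order_date)) = 3
    ORDER BY customer_id
    """
).fetchall()
active_every_month = [r[0] for r in rows]
```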
🧾 Step 3: Expected Output
| customer_id |
|---|
| 3 |

(Assuming the elided rows give customer 3 at least one order in every remaining month of 2024.)
⚙️ Step 4: Why This Query Works
Distinct month counting ensures even one order per month counts as active.
⚡ Step 5: Common Mistakes
- Using `COUNT(*)` instead of `COUNT(DISTINCT month)` — overcounts multiple orders in one month.
- Forgetting to filter by year.
🧠 Step 6: Alternate Approach
To check for specific months dynamically:
SELECT customer_id
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY customer_id
HAVING COUNT(DISTINCT TO_CHAR(order_date, 'YYYY-MM')) = 12;

🎯 Interview Tip
Mention that this is useful in retention or engagement metrics (“active all months”).
🔍 Interviewer’s Analysis
- Concept Tested: Conditional aggregation + DISTINCT counting
- Difficulty: Intermediate
- Follow-Ups: Adapt to “active at least 6 months.”
🪄 End of Q24.
❓ Q25: Identify customers’ first and last purchase dates.
🧩 Problem Statement
For each customer, show the earliest and latest order dates.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 1 | 2023-01-10 | 500 |
| 2 | 1 | 2024-02-05 | 300 |
| 3 | 2 | 2023-07-01 | 200 |
| 4 | 2 | 2024-01-20 | 400 |
| 5 | 3 | 2023-03-15 | 900 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We use MIN() and MAX() grouped by customer_id.
🏗️ SQL Query
SELECT customer_id,
MIN(order_date) AS first_purchase,
MAX(order_date) AS last_purchase
FROM orders
GROUP BY customer_id;

🔍 Explanation
MIN finds earliest date; MAX finds latest date per customer.
🧾 Step 3: Expected Output
| customer_id | first_purchase | last_purchase |
|---|---|---|
| 1 | 2023-01-10 | 2024-02-05 |
| 2 | 2023-07-01 | 2024-01-20 |
| 3 | 2023-03-15 | 2023-03-15 |
⚙️ Step 4: Why This Query Works
Grouping aggregates across rows for each customer, returning their date range.
⚡ Step 5: Common Mistakes
- Forgetting `GROUP BY`.
- Using `ORDER BY` and `LIMIT` per customer — less efficient.
🧠 Step 6: Alternate Approach
To expose the same first/last dates on every row with window functions (useful when you also need other columns of those orders):
SELECT o.customer_id,
FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_date,
LAST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_date
FROM orders o;

🎯 Interview Tip
Always clarify if the interviewer wants “first ever” or “first in 2024”.
🔍 Interviewer’s Analysis
- Concept Tested: Aggregation / Date extremum
- Difficulty: Beginner
- Follow-Ups: Add gap between purchases.
🪄 End of Q25.
❓ Q26: Calculate repeat purchase rate (customers who ordered more than once).
🧩 Problem Statement
Find the percentage of customers who placed more than one order.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | amount |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 1500 |
| 3 | 2 | 700 |
| 4 | 3 | 1200 |
| 5 | 3 | 800 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We need:
- Number of customers with >1 orders.
- Total customers.
- Divide (1) by (2).
🏗️ SQL Query
WITH customer_orders AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
)
SELECT
ROUND(
(COUNT(CASE WHEN order_count > 1 THEN 1 END) * 100.0) /
COUNT(*),
2
) AS repeat_purchase_rate_percent
FROM customer_orders;

🔍 Explanation
- CTE counts orders per customer.
- Then computes ratio of repeaters to total customers.
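The CTE version runs unchanged in SQLite (sketch with the sample orders; `100.0` keeps the division in floating point):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3)])

# The CTE summarizes per customer; the outer query derives the share of repeaters.
(rate,) = conn.execute(
    """
    WITH customer_orders AS (
        SELECT customer_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
    )
    SELECT ROUND(COUNT(CASE WHEN order_count > 1 THEN 1 END) * 100.0 / COUNT(*), 2)
    FROM customer_orders
    """
).fetchone()
```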
🧾 Step 3: Expected Output
| repeat_purchase_rate_percent |
|---|
| 66.67 |
(2 of 3 customers ordered more than once.)
⚙️ Step 4: Why This Query Works
By summarizing at customer level first, we can derive aggregate proportions.
⚡ Step 5: Common Mistakes
- Counting orders instead of customers.
- Forgetting to multiply by 100 for percentage.
🧠 Step 6: Alternate Approach
SELECT
(SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) * 1.0 / COUNT(*)) AS repeat_rate
FROM (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) sub;

🎯 Interview Tip
This metric is key for customer retention analytics — worth mentioning contextually.
🔍 Interviewer’s Analysis
- Concept Tested: Aggregation, CTEs, Conditional Logic
- Difficulty: Intermediate
- Follow-Ups: Add time filter (“repeat within same quarter”).
🪄 End of Q26.
❓ Q27: Calculate year-over-year (YoY) growth.
🧩 Problem Statement
Compute the YoY growth percentage in sales between consecutive years.
💾 Step 1: Assume Example Tables
sales_summary
| year | total_sales |
|---|---|
| 2022 | 50000 |
| 2023 | 65000 |
| 2024 | 78000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Compare each year’s sales to the previous year using LAG().
🏗️ SQL Query
SELECT year,
total_sales,
LAG(total_sales) OVER (ORDER BY year) AS prev_year_sales,
ROUND(
((total_sales - LAG(total_sales) OVER (ORDER BY year)) * 100.0) /
LAG(total_sales) OVER (ORDER BY year),
2
) AS yoy_growth_percent
FROM sales_summary;

🔍 Explanation
- `LAG()` pulls the previous year's total.
- The expression computes the percentage change.
🧾 Step 3: Expected Output
| year | total_sales | prev_year_sales | yoy_growth_percent |
|---|---|---|---|
| 2022 | 50000 | NULL | NULL |
| 2023 | 65000 | 50000 | 30.00 |
| 2024 | 78000 | 65000 | 20.00 |
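A runnable check of the `LAG()` arithmetic in SQLite (window functions need SQLite 3.25+, which ships with modern Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_summary (year INTEGER, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_summary VALUES (?, ?)",
    [(2022, 50000), (2023, 65000), (2024, 78000)],
)

# LAG() is NULL for the first year, so its growth comes back as None.
rows = conn.execute(
    """
    SELECT year,
           ROUND((total_sales - LAG(total_sales) OVER (ORDER BY year)) * 100.0
                 / LAG(total_sales) OVER (ORDER BY year), 2) AS yoy_growth_percent
    FROM sales_summary
    ORDER BY year
    """
).fetchall()
```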
⚙️ Step 4: Why This Query Works
The window function allows comparison without self-join.
⚡ Step 5: Common Mistakes
- Forgetting to handle first year (NULL previous).
- Using integer division → 0 results.
🧠 Step 6: Alternate Approach
SELECT a.year, a.total_sales,
       (a.total_sales - b.total_sales) * 100.0 / b.total_sales AS yoy_growth
FROM sales_summary a
JOIN sales_summary b
  ON a.year = b.year + 1;

Manual join alternative (`100.0` keeps the division in floating point).
🎯 Interview Tip
Explain how YoY helps detect seasonal patterns or growth anomalies.
🔍 Interviewer’s Analysis
- Concept Tested: LAG() / Ratio calculation
- Difficulty: Intermediate
- Follow-Ups: Extend to quarter-over-quarter (QoQ) growth.
🪄 End of Q27.
❓ Q28: Identify categories where sales decreased for 3 consecutive months.
🧩 Problem Statement
Detect product categories showing continuous decline in sales across 3 months.
💾 Step 1: Assume Example Tables
sales_by_category
| category | month | total_sales |
|---|---|---|
| Electronics | Jan | 10000 |
| Electronics | Feb | 9000 |
| Electronics | Mar | 8500 |
| Furniture | Jan | 5000 |
| Furniture | Feb | 5500 |
| Furniture | Mar | 6000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Compare each month with the next two months using LAG() or pattern check.
🏗️ SQL Query
SELECT DISTINCT category
FROM (
SELECT category,
month,
total_sales,
LAG(total_sales, 1) OVER (PARTITION BY category ORDER BY month) AS prev_month,
LAG(total_sales, 2) OVER (PARTITION BY category ORDER BY month) AS prev_2_months
FROM sales_by_category
) t
WHERE total_sales < prev_month
AND prev_month < prev_2_months;

🔍 Explanation
- For each category, pull sales for the 2 prior months.
- Filter where 3 consecutive months show decreasing totals.
- Note: `month` must sort chronologically — with text month names like 'Feb', use a numeric month or a date column instead.
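A runnable SQLite sketch — note it stores a numeric `month_no` column (an assumption not in the sample table) precisely so that `ORDER BY` is chronological:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# month_no is numeric so ORDER BY is chronological; text names like 'Feb'
# would sort alphabetically and shuffle the LAG() windows.
conn.execute(
    "CREATE TABLE sales_by_category (category TEXT, month_no INTEGER, total_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_by_category VALUES (?, ?, ?)",
    [("Electronics", 1, 10000), ("Electronics", 2, 9000), ("Electronics", 3, 8500),
     ("Furniture", 1, 5000), ("Furniture", 2, 5500), ("Furniture", 3, 6000)],
)

rows = conn.execute(
    """
    SELECT DISTINCT category
    FROM (
        SELECT category, total_sales,
               LAG(total_sales, 1) OVER (PARTITION BY category ORDER BY month_no) AS prev_month,
               LAG(total_sales, 2) OVER (PARTITION BY category ORDER BY month_no) AS prev_2_months
        FROM sales_by_category
    )
    WHERE total_sales < prev_month AND prev_month < prev_2_months
    """
).fetchall()
declining = [r[0] for r in rows]
```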
🧾 Step 3: Expected Output
| category |
|---|
| Electronics |
⚙️ Step 4: Why This Query Works
Window shifts create lagged comparisons; chained inequalities detect downward trends.
⚡ Step 5: Common Mistakes
- Comparing in the wrong order (`prev < current`).
- Forgetting to partition by category.
🧠 Step 6: Alternate Approach
WITH ranked AS (
SELECT category,
month,
total_sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY month) AS rn
FROM sales_by_category
)
SELECT s1.category
FROM ranked s1
JOIN ranked s2 ON s1.category = s2.category AND s1.rn = s2.rn - 1
JOIN ranked s3 ON s1.category = s3.category AND s1.rn = s3.rn - 2
WHERE s1.total_sales > s2.total_sales AND s2.total_sales > s3.total_sales;

🎯 Interview Tip
Mention that you can generalize this for any “n-month declining” trend using dynamic window logic.
🔍 Interviewer’s Analysis
- Concept Tested: Trend detection / time-series windows
- Difficulty: Advanced
- Follow-Ups: Adapt to “increasing trend” or add threshold (% drop).
🪄 End of Q28.
SQL Developer
❓ Q1: Write a query using a CROSS JOIN to generate product–region combinations.
🧩 Problem Statement
Create all possible combinations between products and regions, regardless of whether sales exist.
This is useful for generating sales forecast grids or target matrices.
💾 Step 1: Assume Example Tables
products
| product_id | product_name |
|---|---|
| 1 | Laptop |
| 2 | Phone |
| 3 | Tablet |
regions
| region_id | region_name |
|---|---|
| 101 | North |
| 102 | South |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
A CROSS JOIN creates a Cartesian product — every row from one table is paired with every row from the other.
🏗️ SQL Query
SELECT p.product_name, r.region_name
FROM products p
CROSS JOIN regions r;

🔍 Explanation
- No join condition — every product is paired with every region.
- Total combinations = product count × region count.
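The combination count can be verified in SQLite (minimal sketch with the sample products and regions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT)")
conn.execute("CREATE TABLE regions (region_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)", [("Laptop",), ("Phone",), ("Tablet",)])
conn.executemany("INSERT INTO regions VALUES (?)", [("North",), ("South",)])

# No join condition: every product pairs with every region (3 x 2 = 6 rows).
rows = conn.execute(
    "SELECT p.product_name, r.region_name FROM products p CROSS JOIN regions r"
).fetchall()
```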
🧾 Step 3: Expected Output
| product_name | region_name |
|---|---|
| Laptop | North |
| Laptop | South |
| Phone | North |
| Phone | South |
| Tablet | North |
| Tablet | South |
⚙️ Step 4: Why This Query Works
CROSS JOIN doesn’t rely on keys — it systematically pairs all records.
⚡ Step 5: Common Mistakes
- Forgetting that CROSS JOIN can explode data volume.
- Confusing it with INNER JOIN (which requires matching condition).
🧠 Step 6: Alternate Approach
-- Equivalent syntax
SELECT p.product_name, r.region_name
FROM products p, regions r;

(Comma syntax is older and less explicit — prefer CROSS JOIN for clarity.)
🎯 Interview Tip
Explain when you’d use CROSS JOIN — e.g., generating all possible region-product slots before filling sales data.
🔍 Interviewer’s Analysis
- Concept Tested: Cartesian Product Joins
- Difficulty: Beginner → Intermediate
- Follow-Ups: Add sales totals using LEFT JOIN after generating combinations.
🪄 End of Q1.
❓ Q2: Find students who enrolled in all available courses.
🧩 Problem Statement
Identify students who have registered for every course offered.
💾 Step 1: Assume Example Tables
students
| student_id | name |
|---|---|
| 1 | Aisha |
| 2 | Raj |
| 3 | Meena |
courses
| course_id | course_name |
|---|---|
| 101 | Math |
| 102 | Science |
| 103 | History |
enrollments
| student_id | course_id |
|---|---|
| 1 | 101 |
| 1 | 102 |
| 1 | 103 |
| 2 | 101 |
| 3 | 101 |
| 3 | 103 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We need students whose count of enrolled distinct courses = total number of courses.
🏗️ SQL Query
SELECT e.student_id
FROM enrollments e
GROUP BY e.student_id
HAVING COUNT(DISTINCT e.course_id) = (
SELECT COUNT(*) FROM courses
);

🔍 Explanation
- Inner query → total course count.
- Outer query → counts how many each student has taken.
- If equal → enrolled in all.
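The count-comparison form of relational division runs as-is in SQLite (sketch using the sample enrollments):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_id INTEGER)")
conn.execute("CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?)", [(101,), (102,), (103,)])
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [(1, 101), (1, 102), (1, 103), (2, 101), (3, 101), (3, 103)],
)

# A student qualifies only when their distinct course count equals the catalog size.
rows = conn.execute(
    """
    SELECT student_id
    FROM enrollments
    GROUP BY student_id
    HAVING COUNT(DISTINCT course_id) = (SELECT COUNT(*) FROM courses)
    """
).fetchall()
fully_enrolled = [r[0] for r in rows]
```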
🧾 Step 3: Expected Output
| student_id |
|---|
| 1 |
Aisha enrolled in all 3 courses.
⚙️ Step 4: Why This Query Works
It compares per-student distinct course enrollments with global total courses.
⚡ Step 5: Common Mistakes
- Forgetting `DISTINCT` (double-counts re-enrollments).
- Comparing with the wrong total (e.g., active courses only).
🧠 Step 6: Alternate Approach
SELECT s.student_id, s.name
FROM students s
WHERE NOT EXISTS (
SELECT 1 FROM courses c
WHERE NOT EXISTS (
SELECT 1 FROM enrollments e
WHERE e.student_id = s.student_id AND e.course_id = c.course_id
)
);

Uses double NOT EXISTS logic — elegant but advanced.
🎯 Interview Tip
Mention that this is a classic relational division problem (subset logic).
🔍 Interviewer’s Analysis
- Concept Tested: Grouping / Subquery comparison
- Difficulty: Intermediate
- Follow-Ups: Find students enrolled in at least 2 courses.
🪄 End of Q2.
❓ Q3: List managers who oversee more than 3 distinct departments.
🧩 Problem Statement
Return all managers responsible for more than three unique departments.
💾 Step 1: Assume Example Tables
departments
| dept_id | dept_name | manager_id |
|---|---|---|
| 10 | Sales | 1001 |
| 20 | HR | 1002 |
| 30 | Tech | 1001 |
| 40 | Marketing | 1001 |
| 50 | Finance | 1003 |
| 60 | Support | 1001 |
managers
| manager_id | name |
|---|---|
| 1001 | Amit Kumar |
| 1002 | Neha Sharma |
| 1003 | David Roy |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We count departments per manager and filter where count > 3.
🏗️ SQL Query
SELECT m.manager_id, m.name, COUNT(DISTINCT d.dept_id) AS dept_count
FROM managers m
JOIN departments d ON m.manager_id = d.manager_id
GROUP BY m.manager_id, m.name
HAVING COUNT(DISTINCT d.dept_id) > 3;

🔍 Explanation
- The JOIN links each manager to their departments.
- `COUNT(DISTINCT dept_id)` avoids double-counting duplicates.
- `HAVING` filters on the aggregate.
🧾 Step 3: Expected Output
| manager_id | name | dept_count |
|---|---|---|
| 1001 | Amit Kumar | 4 |
⚙️ Step 4: Why This Query Works
Group aggregation summarizes data by manager; HAVING filters after grouping.
⚡ Step 5: Common Mistakes
- Using `WHERE` instead of `HAVING`.
- Forgetting `DISTINCT` when department rows can repeat.
🧠 Step 6: Alternate Approach
To check managers who oversee exactly 3 departments:
HAVING COUNT(DISTINCT d.dept_id) = 3;

🎯 Interview Tip
Always clarify whether to count distinct or total departments — crucial difference in real-world data.
🔍 Interviewer’s Analysis
- Concept Tested: Grouped aggregation + HAVING
- Difficulty: Intermediate
- Follow-Ups: Include average salary per manager’s team.
🪄 End of Q3.
❓ Q4: Identify employees who earn less than their managers.
🧩 Problem Statement
Show employees whose salary is lower than their manager’s.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | manager_id | salary |
|---|---|---|---|
| 1 | Raj Patel | 4 | 60000 |
| 2 | Aisha Khan | 4 | 75000 |
| 3 | Meena Joshi | 5 | 80000 |
| 4 | David Kumar | NULL | 95000 |
| 5 | Neha Sharma | NULL | 90000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Concept: Self-join comparing employee’s salary to manager’s salary.
🏗️ SQL Query
SELECT e.name AS employee_name,
e.salary AS employee_salary,
m.name AS manager_name,
m.salary AS manager_salary
FROM employees e
JOIN employees m
ON e.manager_id = m.emp_id
WHERE e.salary < m.salary;

🔍 Explanation
- Join the employees table to itself — employees (`e`) vs managers (`m`).
- Compare salaries across the relationship.
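A runnable SQLite sketch of the self-join (sample rows mirror the table above; names are shortened):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, name TEXT, manager_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Raj", 4, 60000), (2, "Aisha", 4, 75000), (3, "Meena", 5, 80000),
     (4, "David", None, 95000), (5, "Neha", None, 90000)],
)

# Alias the same table twice: e is the employee row, m is that row's manager.
rows = conn.execute(
    """
    SELECT e.name
    FROM employees e
    JOIN employees m ON e.manager_id = m.emp_id
    WHERE e.salary < m.salary
    ORDER BY e.name
    """
).fetchall()
earning_less = [r[0] for r in rows]
```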
🧾 Step 3: Expected Output
| employee_name | employee_salary | manager_name | manager_salary |
|---|---|---|---|
| Raj Patel | 60000 | David Kumar | 95000 |
| Aisha Khan | 75000 | David Kumar | 95000 |
| Meena Joshi | 80000 | Neha Sharma | 90000 |
⚙️ Step 4: Why This Query Works
Self-joins allow hierarchical comparisons between rows in the same table.
⚡ Step 5: Common Mistakes
- Forgetting to alias tables.
- Missing the `WHERE e.salary < m.salary` condition — leads to all pairs.
🧠 Step 6: Alternate Approach
-- Using subquery
SELECT name
FROM employees e
WHERE salary < (
SELECT salary
FROM employees
WHERE emp_id = e.manager_id
);

Simpler but less efficient on large datasets.
🎯 Interview Tip
Explain why you chose a join vs subquery — discuss readability and performance trade-offs.
🔍 Interviewer’s Analysis
- Concept Tested: Self-Join / Hierarchical Comparison
- Difficulty: Intermediate
- Follow-Ups: Add tie condition or average comparison per manager.
🪄 End of Q4.
❓ Q5: Find departments where the average salary exceeds the company average.
🧩 Problem Statement
List all departments whose average employee salary is higher than the overall company average salary.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 95000 |
| 5 | Deepa Singh | 30 | 55000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We compare:
- Average salary per department.
- Overall company average.

Return departments where (1) > (2).
🏗️ SQL Query
SELECT dept_id,
AVG(salary) AS dept_avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > (
SELECT AVG(salary) FROM employees
);

🔍 Explanation
- Inner query → company-wide average salary.
- Outer query → computes per-department average.
- `HAVING` compares the group aggregate with the scalar subquery.
🧾 Step 3: Expected Output
| dept_id | dept_avg_salary |
|---|---|
| 20 | 92500 |
(Dept 20’s avg = 92,500 > company avg = 76,000.)
⚙️ Step 4: Why This Query Works
The HAVING clause filters aggregated results post-grouping, comparing each department’s average to the global mean.
⚡ Step 5: Common Mistakes
- Using `WHERE` with an aggregate → invalid (must use `HAVING`).
- Forgetting parentheses around the subquery.
🧠 Step 6: Alternate Approach
WITH dept_avg AS (
SELECT dept_id, AVG(salary) AS dept_avg_salary
FROM employees
GROUP BY dept_id
),
company_avg AS (
SELECT AVG(salary) AS company_avg_salary
FROM employees
)
SELECT d.dept_id, d.dept_avg_salary
FROM dept_avg d, company_avg c
WHERE d.dept_avg_salary > c.company_avg_salary;

CTE version — more readable for large analyses.
🎯 Interview Tip
Always show you can articulate multi-level aggregation logic — Interviewers value why you chose a subquery vs CTE.
🔍 Interviewer’s Analysis
- Concept Tested: Aggregation comparison / HAVING with subquery
- Difficulty: Intermediate
- Follow-Ups: Add department names via join; compare median instead of average.
🪄 End of Q5.
❓ Q6: Find employees earning more than their department’s average.
🧩 Problem Statement
Show all employees whose salary is above the average salary of their own department.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Amit Sharma | 10 | 60000 |
| 2 | Ananya Roy | 10 | 80000 |
| 3 | Bhavna Rai | 20 | 90000 |
| 4 | Chetan Joshi | 20 | 75000 |
| 5 | Deepa Singh | 30 | 55000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We need department-level averages per employee — this is where window functions shine.
🏗️ SQL Query
SELECT emp_id,
name,
dept_id,
salary,
ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.dept_id = employees.dept_id
);

🔍 Explanation
- Inner subquery calculates per-department average dynamically.
- Outer query keeps only those above it.
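The correlated subquery can be exercised directly in SQLite (sketch with the sample employees; first names only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Amit", 10, 60000), (2, "Ananya", 10, 80000),
     (3, "Bhavna", 20, 90000), (4, "Chetan", 20, 75000), (5, "Deepa", 30, 55000)],
)

# The subquery is re-evaluated per outer row, correlated on that row's dept_id.
rows = conn.execute(
    """
    SELECT name
    FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.dept_id = e.dept_id)
    ORDER BY emp_id
    """
).fetchall()
above_dept_avg = [r[0] for r in rows]
```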
🧾 Step 3: Expected Output
| emp_id | name | dept_id | salary | dept_avg |
|---|---|---|---|---|
| 2 | Ananya Roy | 10 | 80000 | 70000.00 |
| 3 | Bhavna Rai | 20 | 90000 | 82500.00 |
⚙️ Step 4: Why This Query Works
Each employee’s salary is compared against the average of their department group, computed independently per row.
⚡ Step 5: Common Mistakes
- Omitting the alias in the subquery → ambiguous `dept_id`.
- Forgetting to correlate the subquery on department → compares against the global average.
🧠 Step 6: Alternate Approach
-- Using window function (cleaner and faster)
SELECT emp_id,
name,
dept_id,
salary,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees
QUALIFY salary > AVG(salary) OVER (PARTITION BY dept_id);

(Snowflake/BigQuery syntax; QUALIFY filters windowed results.)
🎯 Interview Tip
Mention that this is a common “above-average performer” query — interviewers expect you to know both the subquery and window methods.
🔍 Interviewer’s Analysis
- Concept Tested: Correlated subquery / window partition
- Difficulty: Intermediate
- Follow-Ups: Rank employees within each department by salary.
🪄 End of Q6.
❓ Q7: Calculate the cumulative percentage contribution of sales per region.
🧩 Problem Statement
Display each region’s total sales and its cumulative percentage of the company’s overall sales.
💾 Step 1: Assume Example Tables
sales
| region | total_sales |
|---|---|
| North | 50000 |
| South | 30000 |
| East | 15000 |
| West | 5000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We’ll use:
- `SUM(…) OVER ()` for the overall total.
- `SUM(…) OVER (ORDER BY …)` for the cumulative sum.
- Division to calculate the cumulative percentage.
🏗️ SQL Query
SELECT region,
total_sales,
SUM(total_sales) OVER (ORDER BY total_sales DESC) AS cumulative_sales,
ROUND(
(SUM(total_sales) OVER (ORDER BY total_sales DESC) * 100.0) /
SUM(total_sales) OVER (),
2
) AS cumulative_percent
FROM sales
ORDER BY total_sales DESC;

🔍 Explanation
- Sort regions by total sales (descending).
- Compute running total using window sum.
- Divide by overall total to get cumulative percentage.
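The window arithmetic can be verified in SQLite (3.25+; sketch with the sample regions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 50000), ("South", 30000), ("East", 15000), ("West", 5000)],
)

# Running SUM() OVER (ORDER BY ...) divided by the grand total SUM() OVER ().
rows = conn.execute(
    """
    SELECT region,
           ROUND(SUM(total_sales) OVER (ORDER BY total_sales DESC) * 100.0
                 / SUM(total_sales) OVER (), 2) AS cumulative_percent
    FROM sales
    ORDER BY total_sales DESC
    """
).fetchall()
```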
🧾 Step 3: Expected Output
| region | total_sales | cumulative_sales | cumulative_percent |
|---|---|---|---|
| North | 50000 | 50000 | 50.00 |
| South | 30000 | 80000 | 80.00 |
| East | 15000 | 95000 | 95.00 |
| West | 5000 | 100000 | 100.00 |
⚙️ Step 4: Why This Query Works
Window SUM() functions let us calculate both running totals and percent contributions in a single pass.
⚡ Step 5: Common Mistakes
- Forgetting `OVER ()` → turns the window sum into a group sum.
- Missing `ORDER BY` → cumulative values incorrect.
- Dividing by a per-row total instead of the global total.
🧠 Step 6: Alternate Approach
-- Simpler: just percentage of total
SELECT region,
total_sales,
ROUND(total_sales * 100.0 / SUM(total_sales) OVER (), 2) AS percent_of_total
FROM sales;

Gives each region’s individual contribution, not cumulative.
🎯 Interview Tip
Explain how cumulative contribution is used for Pareto analysis (80/20 rule) — key business insight.
🔍 Interviewer’s Analysis
- Concept Tested: Windowed SUM / Percentage contribution
- Difficulty: Intermediate → Advanced
- Follow-Ups: Filter top regions contributing 80% of total sales.
🪄 End of Q7.
❓ Q8: Show employee performance rank reset every quarter.
🧩 Problem Statement
Rank employees based on performance score, but restart the ranking every quarter.
💾 Step 1: Assume Example Tables
employee_performance
| emp_id | quarter | score |
|---|---|---|
| 1 | Q1 | 88 |
| 2 | Q1 | 75 |
| 3 | Q1 | 92 |
| 1 | Q2 | 91 |
| 2 | Q2 | 84 |
| 3 | Q2 | 78 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We use RANK() with PARTITION BY quarter to reset per quarter.
🏗️ SQL Query
SELECT emp_id,
quarter,
score,
RANK() OVER (
PARTITION BY quarter
ORDER BY score DESC
) AS quarterly_rank
FROM employee_performance
ORDER BY quarter, quarterly_rank;

🔍 Explanation
- `PARTITION BY quarter` → restarts the rank each quarter.
- `ORDER BY score DESC` → higher score = better rank.
🧾 Step 3: Expected Output
| emp_id | quarter | score | quarterly_rank |
|---|---|---|---|
| 3 | Q1 | 92 | 1 |
| 1 | Q1 | 88 | 2 |
| 2 | Q1 | 75 | 3 |
| 1 | Q2 | 91 | 1 |
| 2 | Q2 | 84 | 2 |
| 3 | Q2 | 78 | 3 |
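A runnable sketch of the rank reset in SQLite (3.25+), pulling out each quarter's top performer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_performance (emp_id INTEGER, quarter TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO employee_performance VALUES (?, ?, ?)",
    [(1, "Q1", 88), (2, "Q1", 75), (3, "Q1", 92),
     (1, "Q2", 91), (2, "Q2", 84), (3, "Q2", 78)],
)

# PARTITION BY quarter restarts the ranking inside each quarter.
rows = conn.execute(
    """
    SELECT quarter, emp_id,
           RANK() OVER (PARTITION BY quarter ORDER BY score DESC) AS quarterly_rank
    FROM employee_performance
    ORDER BY quarter, quarterly_rank
    """
).fetchall()
top_per_quarter = [(q, e) for (q, e, r) in rows if r == 1]
```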
⚙️ Step 4: Why This Query Works
Window partitioning divides data by quarter; within each group, ranking restarts independently.
⚡ Step 5: Common Mistakes
- Forgetting `PARTITION BY` → the rank continues across all quarters.
- Using `ORDER BY` in the main query but not in `OVER()` → wrong rank order.
🎯 Interview Tip
This is a classic “rank reset” question — emphasize PARTITION BY logic clearly.
🔍 Interviewer’s Analysis
- Concept Tested: Partitioned Ranking
- Difficulty: Intermediate
- Follow-Ups: Add tie-handling (switch `RANK` to `DENSE_RANK`).
🪄 End of Q8.
❓ Q9: Calculate monthly revenue change (month-over-month).
🧩 Problem Statement
Show each month’s revenue and how much it changed compared to the previous month.
💾 Step 1: Assume Example Tables
monthly_sales
| month | revenue |
|---|---|
| Jan | 50000 |
| Feb | 60000 |
| Mar | 55000 |
| Apr | 70000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Use LAG() to access previous month’s revenue, then compute difference.
🏗️ SQL Query
SELECT month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS change_amount,
ROUND(
((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0) /
LAG(revenue) OVER (ORDER BY month), 2
) AS change_percent
FROM monthly_sales;

🔍 Explanation
- `LAG()` fetches the previous month’s revenue.
- Subtraction gives the absolute and percentage change.
- Note: text month names ('Jan', 'Feb', …) sort alphabetically — in practice, order by a real date or numeric month column.
🧾 Step 3: Expected Output
| month | revenue | prev_revenue | change_amount | change_percent |
|---|---|---|---|---|
| Jan | 50000 | NULL | NULL | NULL |
| Feb | 60000 | 50000 | 10000 | 20.00 |
| Mar | 55000 | 60000 | -5000 | -8.33 |
| Apr | 70000 | 55000 | 15000 | 27.27 |
⚙️ Step 4: Why This Query Works
LAG() makes it possible to calculate time-based deltas without self-joins.
⚡ Step 5: Common Mistakes
- Forgetting to order by month (causes random differences).
- Using integer division for percentages.
🎯 Interview Tip
Always mention how to handle missing months or gaps in time series.
🔍 Interviewer’s Analysis
- Concept Tested: LAG / Sequential Comparison
- Difficulty: Intermediate
- Follow-Ups: Calculate cumulative MoM growth over time.
🪄 End of Q9.
❓ Q10: Calculate YoY growth for each product.
🧩 Problem Statement
For each product, compute its year-over-year (YoY) percentage growth.
💾 Step 1: Assume Example Tables
sales
| product_id | year | revenue |
|---|---|---|
| 1 | 2022 | 50000 |
| 1 | 2023 | 65000 |
| 1 | 2024 | 70000 |
| 2 | 2022 | 30000 |
| 2 | 2023 | 33000 |
| 2 | 2024 | 30000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We use LAG() within each product partition to compare consecutive years.
🏗️ SQL Query
SELECT product_id,
year,
revenue,
LAG(revenue) OVER (
PARTITION BY product_id
ORDER BY year
) AS prev_year_revenue,
ROUND(
((revenue - LAG(revenue) OVER (PARTITION BY product_id ORDER BY year)) * 100.0) /
LAG(revenue) OVER (PARTITION BY product_id ORDER BY year),
2
) AS yoy_growth_percent
FROM sales
ORDER BY product_id, year;

🔍 Explanation
- `PARTITION BY product_id` ensures per-product calculations.
- `LAG()` retrieves the prior year’s revenue for the same product.
🧾 Step 3: Expected Output
| product_id | year | revenue | prev_year_revenue | yoy_growth_percent |
|---|---|---|---|---|
| 1 | 2022 | 50000 | NULL | NULL |
| 1 | 2023 | 65000 | 50000 | 30.00 |
| 1 | 2024 | 70000 | 65000 | 7.69 |
| 2 | 2022 | 30000 | NULL | NULL |
| 2 | 2023 | 33000 | 30000 | 10.00 |
| 2 | 2024 | 30000 | 33000 | -9.09 |
⚙️ Step 4: Why This Query Works
It performs rolling comparisons within logical groups (per product) — no self-join needed.
⚡ Step 5: Common Mistakes
- Missing `PARTITION BY` → compares across products.
- Dividing by 0 when the previous revenue = 0 (handle with `NULLIF`).
🧠 Step 6: Alternate Approach
-- Join-based version
SELECT a.product_id, a.year,
       (a.revenue - b.revenue) * 100.0 / b.revenue AS yoy_growth
FROM sales a
JOIN sales b
  ON a.product_id = b.product_id AND a.year = b.year + 1;

Works but slower for large tables (`100.0` avoids integer division).
🎯 Interview Tip
Always mention partitioning when discussing window functions — it’s the magic word interviewers love.
🔍 Interviewer’s Analysis
- Concept Tested: Partitioned LAG / Ratio Calculations
- Difficulty: Intermediate → Advanced
- Follow-Ups: Handle missing years (2022 → 2024 jump).
🪄 End of Q10.
❓ Q11: Display products with 3 consecutive months of sales decline.
🧩 Problem Statement
Identify products whose monthly sales dropped 3 times in a row.
💾 Step 1: Assume Example Tables
monthly_sales
| product_id | month | sales |
|---|---|---|
| 1 | Jan | 1000 |
| 1 | Feb | 900 |
| 1 | Mar | 800 |
| 1 | Apr | 700 |
| 2 | Jan | 1200 |
| 2 | Feb | 1300 |
| 2 | Mar | 1250 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We’ll use multiple LAG() windows to detect a downward chain.
🏗️ SQL Query
SELECT DISTINCT product_id
FROM (
SELECT product_id,
month,
sales,
LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_sales,
LAG(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS prev2_sales
FROM monthly_sales
) t
WHERE sales < prev_sales AND prev_sales < prev2_sales;

🔍 Explanation
- Two lagged comparisons ensure 3 consecutive months of decline.
- The outer filter checks monotonic decreasing trend.
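The double-LAG check can be run locally. A sketch via Python's sqlite3 (assuming SQLite ≥ 3.25); months are stored as numbers (1 = Jan, …) so `ORDER BY month` is chronological, and `DISTINCT` is added since a long decline matches the filter more than once:

```python
import sqlite3

# Detect products with a strictly decreasing 3-month run using two LAG()s.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_sales (product_id INT, month INT, sales INT);
INSERT INTO monthly_sales VALUES
  (1, 1, 1000), (1, 2, 900), (1, 3, 800), (1, 4, 700),
  (2, 1, 1200), (2, 2, 1300), (2, 3, 1250);
""")
decliners = conn.execute("""
SELECT DISTINCT product_id
FROM (
  SELECT product_id, sales,
         LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY month) AS prev_sales,
         LAG(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS prev2_sales
  FROM monthly_sales
) t
WHERE sales < prev_sales AND prev_sales < prev2_sales;
""").fetchall()
print(decliners)   # only product 1 has a 3-month decreasing run
```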
🧾 Step 3: Expected Output
| product_id |
|---|
| 1 |
⚙️ Step 4: Why This Query Works
Sequential window comparisons reveal downward patterns efficiently — no need for recursive logic.
⚡ Step 5: Common Mistakes
- Forgetting to partition by `product_id`.
- Using `<` inconsistently (e.g., `<=` creates false positives for flat months).
- Sorting by month names ('Jan', 'Feb', …): text sorts alphabetically, so order by a real date or month number.
🧠 Step 6: Alternate Approach
-- Using row numbers and self-join
WITH ranked AS (
SELECT product_id, month, sales,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY month) AS rn
FROM monthly_sales
)
SELECT r1.product_id
FROM ranked r1
JOIN ranked r2 ON r1.product_id = r2.product_id AND r1.rn = r2.rn - 1
JOIN ranked r3 ON r1.product_id = r3.product_id AND r1.rn = r3.rn - 2
WHERE r1.sales > r2.sales AND r2.sales > r3.sales;
More verbose, but works in all SQL dialects.
🎯 Interview Tip
Mention this query demonstrates trend analysis — often used for detecting declining products or churn.
🔍 Interviewer’s Analysis
- Concept Tested: Consecutive trend detection / LAG chaining
- Difficulty: Advanced
- Follow-Ups: Detect 3 consecutive increases; detect “flat” patterns.
🪄 End of Q11.
❓ Q12: Assign ROW_NUMBER to employees ordered by salary.
🧩 Problem Statement
Give each employee a unique sequential number based on their salary order (highest first).
💾 Step 1: Assume Example Tables
employees
| emp_id | name | salary |
|---|---|---|
| 1 | Amit Sharma | 60000 |
| 2 | Ananya Roy | 80000 |
| 3 | Bhavna Rai | 90000 |
| 4 | Chetan Joshi | 70000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We’ll use ROW_NUMBER() to generate unique sequence numbers based on salary order.
🏗️ SQL Query
SELECT emp_id,
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
🔍 Explanation
- `ORDER BY salary DESC` sorts employees by salary.
- `ROW_NUMBER()` assigns unique numbers starting at 1.
🧾 Step 3: Expected Output
| emp_id | name | salary | salary_rank |
|---|---|---|---|
| 3 | Bhavna Rai | 90000 | 1 |
| 2 | Ananya Roy | 80000 | 2 |
| 4 | Chetan Joshi | 70000 | 3 |
| 1 | Amit Sharma | 60000 | 4 |
⚙️ Step 4: Why This Query Works
Unlike RANK(), ROW_NUMBER() always increments sequentially — even for ties.
⚡ Step 5: Common Mistakes
- Using `RANK()` expecting sequential numbers — produces gaps.
- Forgetting to specify `ORDER BY` → arbitrary numbering.
🧠 Step 6: Alternate Approach
-- Add department partition
SELECT emp_id, name, dept_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_salary_rank
FROM employees;
🎯 Interview Tip
Interviewers often ask, “What’s the difference between ROW_NUMBER, RANK, and DENSE_RANK?”
Be ready to explain how they handle ties.
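The tie behaviour is easy to demonstrate. A sketch in Python's sqlite3 (SQLite ≥ 3.25), with a hypothetical fifth employee added to create a salary tie, comparing all three ranking functions side by side:

```python
import sqlite3

# ROW_NUMBER vs RANK vs DENSE_RANK on a salary tie.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, salary INT);
INSERT INTO employees VALUES
  (1, 'Amit Sharma', 60000), (2, 'Ananya Roy', 80000),
  (3, 'Bhavna Rai', 90000),  (4, 'Chetan Joshi', 70000),
  (5, 'Divya Nair', 80000);  -- hypothetical row, ties with Ananya
""")
rows = conn.execute("""
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC, name;
""").fetchall()
for r in rows:
    print(r)
```

On the tied 80000 rows: `ROW_NUMBER` gives 2 and 3 (arbitrary order), `RANK` gives 2 and 2 then skips to 4, `DENSE_RANK` gives 2 and 2 then continues with 3.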
🔍 Interviewer’s Analysis
- Concept Tested: Window Ranking / ROW_NUMBER
- Difficulty: Beginner → Intermediate
- Follow-Ups: Find top-N per department using `ROW_NUMBER` in a subquery.
🪄 End of Q12.
❓ Q13: Delete duplicate rows from a table.
🧩 Problem Statement
Remove duplicate records from a table while keeping only one instance of each unique row.
💾 Step 1: Assume Example Tables
customers
| customer_id | name | email |
|---|---|---|
| 1 | Aisha | aisha@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |
(Notice duplicates for Raj and Meena.)
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Use a window function to assign row numbers to duplicates, then delete all but the first.
🏗️ SQL Query (Generic SQL)
DELETE FROM customers
WHERE customer_id NOT IN (
SELECT MIN(customer_id)
FROM customers
GROUP BY name, email
);
🏗️ SQL Query (if the DB supports deleting through a CTE, e.g. SQL Server)
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY customer_id) AS rn
FROM customers
)
DELETE FROM ranked WHERE rn > 1;
🔍 Explanation
- Group duplicates by logical uniqueness (`name`, `email`).
- Keep the first row, delete the others.
- `ROW_NUMBER()` ensures deterministic deletion.
🧾 Step 3: Expected Output
| customer_id | name | email |
|---|---|---|
| 1 | Aisha | aisha@gmail.com |
| 2 | Raj | raj@gmail.com |
| 3 | Meena | meena@gmail.com |
⚙️ Step 4: Why This Query Works
The ranking-based approach ensures exactly one record per group remains.
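One subtlety worth testing: in this example the `customer_id` itself is duplicated, so `MIN(customer_id)` alone cannot tell the two identical Raj rows apart. A sketch of a variant that handles fully duplicated rows, using SQLite's implicit `rowid` as the physical row identifier (PostgreSQL's `ctid` plays the same role):

```python
import sqlite3

# Deduplicate fully identical rows by keeping the lowest physical rowid
# per (name, email) group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, name TEXT, email TEXT);
INSERT INTO customers VALUES
  (1, 'Aisha', 'aisha@gmail.com'),
  (2, 'Raj',   'raj@gmail.com'),
  (3, 'Meena', 'meena@gmail.com'),
  (2, 'Raj',   'raj@gmail.com'),
  (3, 'Meena', 'meena@gmail.com');
""")
conn.execute("""
DELETE FROM customers
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM customers GROUP BY name, email
);
""")
remaining = conn.execute(
    "SELECT customer_id, name FROM customers ORDER BY customer_id").fetchall()
print(remaining)   # one row each for Aisha, Raj, Meena
```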
⚡ Step 5: Common Mistakes
- Forgetting to partition by the correct columns.
- Running `DELETE` without filtering — wiping out all rows.
- Assuming IDs are unique: when whole rows (ID included) are duplicated, `MIN(customer_id)` keeps both copies; fall back to a physical row identifier (`rowid` in SQLite, `ctid` in PostgreSQL).
🎯 Interview Tip
Interviewers often ask: “How would you do this without CTEs?” — Be ready to explain both approaches (group aggregate + window).
🔍 Interviewer’s Analysis
- Concept Tested: Deduplication / Window filtering
- Difficulty: Intermediate
- Follow-Ups: Preserve the latest record (use `ORDER BY` in `ROW_NUMBER()`).
🪄 End of Q13.
❓ Q14: Keep only the first occurrence (lowest ID) of duplicates.
🧩 Problem Statement
From a table with duplicate entries, retain only the first row (lowest ID) for each unique combination.
💾 Step 1: Assume Example Tables
transactions
| trans_id | user_id | product_id | amount |
|---|---|---|---|
| 1 | 101 | P1 | 500 |
| 2 | 101 | P1 | 500 |
| 3 | 101 | P2 | 300 |
| 4 | 102 | P3 | 700 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We want the earliest record per user-product pair.
🏗️ SQL Query
DELETE FROM transactions
WHERE trans_id NOT IN (
SELECT MIN(trans_id)
FROM transactions
GROUP BY user_id, product_id
);
🔍 Explanation
- `MIN(trans_id)` identifies the first occurrence per group.
- Deletes all others with higher IDs.
🧾 Step 3: Expected Output
| trans_id | user_id | product_id | amount |
|---|---|---|---|
| 1 | 101 | P1 | 500 |
| 3 | 101 | P2 | 300 |
| 4 | 102 | P3 | 700 |
⚙️ Step 4: Why This Query Works
Using the MIN() of unique identifiers ensures deterministic retention.
⚡ Step 5: Common Mistakes
- Using `DISTINCT` instead of deletion — doesn’t remove rows from the table.
- Forgetting composite uniqueness (`user_id` + `product_id`).
🎯 Interview Tip
In production, never DELETE directly — test first with SELECT to confirm affected rows.
🔍 Interviewer’s Analysis
- Concept Tested: Deduplication via group min/max
- Difficulty: Beginner → Intermediate
- Follow-Ups: Keep the latest record (replace `MIN()` with `MAX()`).
🪄 End of Q14.
❓ Q15: Standardize phone numbers (remove special characters).
🧩 Problem Statement
Clean messy phone numbers by removing -, (, ), spaces, and converting to a uniform 10-digit string.
💾 Step 1: Assume Example Tables
customers
| customer_id | name | phone |
|---|---|---|
| 1 | Raj | (123)-456-7890 |
| 2 | Aisha | 123 456 7890 |
| 3 | Meena | 123.456.7890 |
| 4 | Amit | +91-98765-43210 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We’ll use nested REPLACE() or regex functions to strip unwanted characters.
🏗️ SQL Query (MySQL)
SELECT customer_id,
name,
REPLACE(
REPLACE(
REPLACE(
REPLACE(phone, '-', ''),
'(', ''),
')', ''),
' ', '') AS clean_phone
FROM customers;
🏗️ SQL Query (PostgreSQL)
SELECT customer_id,
name,
REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS clean_phone
FROM customers;
🔍 Explanation
- Regex removes all non-digit characters.
- Results in consistent numeric-only phone format.
🧾 Step 3: Expected Output
| customer_id | name | clean_phone |
|---|---|---|
| 1 | Raj | 1234567890 |
| 2 | Aisha | 1234567890 |
| 3 | Meena | 1234567890 |
| 4 | Amit | 919876543210 |
⚙️ Step 4: Why This Query Works
The regex `[^0-9]` removes every character that isn’t a digit, making phone data clean for analytics or joins.
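The same pattern is easy to verify outside the database before running it in production. A Python mirror of the regex cleanup, using the sample phone numbers above:

```python
import re

# Strip every non-digit character, mirroring REGEXP_REPLACE(phone, '[^0-9]', '', 'g').
phones = ["(123)-456-7890", "123 456 7890", "123.456.7890", "+91-98765-43210"]
clean = [re.sub(r"[^0-9]", "", p) for p in phones]
print(clean)   # ['1234567890', '1234567890', '1234567890', '919876543210']
```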
⚡ Step 5: Common Mistakes
- Forgetting the `'g'` flag (global replace) in PostgreSQL.
- Not handling international codes consistently.
🎯 Interview Tip
Mention you’d validate lengths post-cleanup (10 or 12 digits) before loading into systems.
🔍 Interviewer’s Analysis
- Concept Tested: String cleaning / regex
- Difficulty: Intermediate
- Follow-Ups: Enforce a specific length via `CHAR_LENGTH()`.
🪄 End of Q15.
❓ Q16: Split address into street, city, and state.
🧩 Problem Statement
Given a combined address column, extract its components: street, city, and state.
💾 Step 1: Assume Example Tables
addresses
| id | address |
|---|---|
| 1 | 123 Main St, Mumbai, MH |
| 2 | 456 Oak Road, Delhi, DL |
| 3 | 789 Palm Ave, Bangalore, KA |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We’ll use string-splitting functions such as SUBSTRING_INDEX() or SPLIT_PART().
🏗️ SQL Query (MySQL)
SELECT id,
TRIM(SUBSTRING_INDEX(address, ',', 1)) AS street,
TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 2), ',', -1)) AS city,
TRIM(SUBSTRING_INDEX(address, ',', -1)) AS state
FROM addresses;
🏗️ SQL Query (PostgreSQL)
SELECT id,
       TRIM(SPLIT_PART(address, ',', 1)) AS street,
       TRIM(SPLIT_PART(address, ',', 2)) AS city,
       TRIM(SPLIT_PART(address, ',', 3)) AS state
FROM addresses;
🔍 Explanation
- Split the string by commas.
- Extract the 1st, 2nd, and 3rd parts.
🧾 Step 3: Expected Output
| id | street | city | state |
|---|---|---|---|
| 1 | 123 Main St | Mumbai | MH |
| 2 | 456 Oak Road | Delhi | DL |
| 3 | 789 Palm Ave | Bangalore | KA |
⚙️ Step 4: Why This Query Works
The SPLIT_PART() or SUBSTRING_INDEX() functions allow parsing multi-part text columns into normalized attributes.
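SQLite has no `SPLIT_PART`, so a quick Python check of the comma-split-and-trim logic (equivalent to extracting the 1st, 2nd, and 3rd parts above) is a handy way to validate the approach on sample data:

```python
# Split each address on commas and trim the surrounding whitespace,
# mirroring TRIM(SPLIT_PART(address, ',', n)).
addresses = ["123 Main St, Mumbai, MH",
             "456 Oak Road, Delhi, DL",
             "789 Palm Ave, Bangalore, KA"]
parsed = [tuple(part.strip() for part in a.split(",")) for a in addresses]
print(parsed[0])   # ('123 Main St', 'Mumbai', 'MH')
```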
⚡ Step 5: Common Mistakes
- Forgetting to trim spaces.
- Assuming consistent delimiters — real data often varies.
🎯 Interview Tip
Mention that in ETL pipelines, this logic often moves to staging tables to avoid slowing the main DB.
🔍 Interviewer’s Analysis
- Concept Tested: String splitting / text parsing
- Difficulty: Intermediate
- Follow-Ups: Handle inconsistent delimiters (“;” or “|”).
🪄 End of Q16.
❓ Q17: Convert customer names into Proper Case.
🧩 Problem Statement
Standardize customer names so that each word starts with an uppercase letter.
💾 Step 1: Assume Example Tables
customers
| customer_id | name |
|---|---|
| 1 | raj patel |
| 2 | AISHA ROY |
| 3 | meena singh |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Use text transformation functions — varies by SQL dialect.
🏗️ SQL Query (PostgreSQL)
SELECT customer_id,
INITCAP(name) AS proper_name
FROM customers;
🏗️ SQL Query (MySQL)
SELECT customer_id,
CONCAT(UCASE(LEFT(name, 1)), LCASE(SUBSTRING(name, 2))) AS proper_name
FROM customers;
🔍 Explanation
- `INITCAP()` capitalizes the first letter of each word.
- The MySQL version manually constructs it using substring logic.
🧾 Step 3: Expected Output
| customer_id | proper_name |
|---|---|
| 1 | Raj Patel |
| 2 | Aisha Roy |
| 3 | Meena Singh |
⚙️ Step 4: Why This Query Works
Transforms messy casing into consistent, human-readable names.
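A quick Python illustration (not DB code) of the difference between the two queries: the MySQL single-substring expression only fixes the first character of the whole string, while `INITCAP`-style logic capitalizes every word:

```python
names = ["raj patel", "AISHA ROY", "meena singh"]

# MySQL-style: uppercase first character, lowercase the rest of the string.
mysql_style = [n[:1].upper() + n[1:].lower() for n in names]
print(mysql_style)     # ['Raj patel', 'Aisha roy', 'Meena singh']

# INITCAP-style: capitalize every word.
initcap_style = [n.title() for n in names]
print(initcap_style)   # ['Raj Patel', 'Aisha Roy', 'Meena Singh']
```

This is exactly the "multi-word names" pitfall listed in Step 5.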
⚡ Step 5: Common Mistakes
- Ignoring multi-word names in MySQL (only first letter handled).
- Forgetting locale-specific capitalization rules.
🎯 Interview Tip
Mention that standardized text improves joins, reporting, and ML data quality.
🔍 Interviewer’s Analysis
- Concept Tested: String transformation
- Difficulty: Beginner
- Follow-Ups: Extend to addresses or product titles.
🪄 End of Q17.
❓ Q18: Design tables to store credit card transactions with normalization.
🧩 Problem Statement
Design a normalized schema for storing credit card transactions to ensure data consistency, minimal redundancy, and fast lookups.
💾 Step 1: Conceptual Breakdown
We’re modeling:
- Customers — who own cards
- Cards — issued to customers
- Merchants — where transactions occur
- Transactions — main fact table
We’ll normalize to 3NF (Third Normal Form):
- Each table has a unique key.
- No transitive dependencies.
🧮 Step 2: Table Designs
1️⃣ customers
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100) UNIQUE,
phone VARCHAR(15)
);
2️⃣ cards
CREATE TABLE cards (
card_id INT PRIMARY KEY,
customer_id INT,
card_number CHAR(16) UNIQUE,
issue_date DATE,
expiry_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
3️⃣ merchants
CREATE TABLE merchants (
merchant_id INT PRIMARY KEY,
name VARCHAR(100),
category VARCHAR(50),
city VARCHAR(50)
);
4️⃣ transactions
CREATE TABLE transactions (
transaction_id BIGINT PRIMARY KEY,
card_id INT,
merchant_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2),
status VARCHAR(20),
FOREIGN KEY (card_id) REFERENCES cards(card_id),
FOREIGN KEY (merchant_id) REFERENCES merchants(merchant_id)
);
🧾 Step 3: Sample Data Flow
| customer_id | name | email |
|---|---|---|
| 1 | Raj | raj@bank.com |
| 2 | Aisha | aisha@bank.com |
| card_id | customer_id | card_number |
|---|---|---|
| 10 | 1 | 1234567812345678 |
| 11 | 2 | 8765432187654321 |
| merchant_id | name | category | city |
|---|---|---|---|
| 101 | Amazon | E-Com | Mumbai |
| 102 | Starbucks | Food | Delhi |
| transaction_id | card_id | merchant_id | amount | status |
|---|---|---|---|---|
| 9001 | 10 | 101 | 1200.50 | Success |
| 9002 | 10 | 102 | 350.00 | Success |
⚙️ Step 4: Why This Schema Works
- Fully normalized → no duplication.
- Relationships clearly defined with foreign keys.
- Easy joins for transaction analysis and fraud detection.
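The whole schema can be smoke-tested in SQLite. A sketch that loads the four tables with one sample row each and runs the kind of four-way join a fraud-analysis query would use (types are lightly adapted to SQLite, e.g. `CHAR(16)` becomes `TEXT`):

```python
import sqlite3

# Load the normalized schema and verify a join across all four tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON;")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, name TEXT,
                        email TEXT UNIQUE, phone TEXT);
CREATE TABLE cards (card_id INT PRIMARY KEY, customer_id INT,
                    card_number TEXT UNIQUE, issue_date DATE, expiry_date DATE,
                    FOREIGN KEY (customer_id) REFERENCES customers(customer_id));
CREATE TABLE merchants (merchant_id INT PRIMARY KEY, name TEXT,
                        category TEXT, city TEXT);
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, card_id INT,
                           merchant_id INT, transaction_date TIMESTAMP,
                           amount DECIMAL(10,2), status TEXT,
                           FOREIGN KEY (card_id) REFERENCES cards(card_id),
                           FOREIGN KEY (merchant_id) REFERENCES merchants(merchant_id));
INSERT INTO customers VALUES (1, 'Raj', 'raj@bank.com', NULL);
INSERT INTO cards VALUES (10, 1, '1234567812345678', NULL, NULL);
INSERT INTO merchants VALUES (101, 'Amazon', 'E-Com', 'Mumbai');
INSERT INTO transactions VALUES (9001, 10, 101, NULL, 1200.50, 'Success');
""")
row = conn.execute("""
SELECT c.name, m.name, t.amount
FROM transactions t
JOIN cards ca    ON t.card_id = ca.card_id
JOIN customers c ON ca.customer_id = c.customer_id
JOIN merchants m ON t.merchant_id = m.merchant_id;
""").fetchone()
print(row)   # ('Raj', 'Amazon', 1200.5)
```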
🎯 Interview Tip
Interviewers often ask: “How would you denormalize this for analytics?” → Create a materialized view joining all four tables for dashboard reads.
🔍 Interviewer’s Analysis
- Concept Tested: Schema design / normalization
- Difficulty: Advanced
- Follow-Ups: Add partitioning for `transactions`; introduce a `currency` dimension.
🪄 End of Q18.
❓ Q19: Create an index on customer_id for fast lookups.
🧩 Problem Statement
You frequently filter transactions by customer_id. Improve query performance using indexes.
💾 Step 1: Base Table
CREATE TABLE transactions (
transaction_id BIGINT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10,2),
transaction_date DATE
);
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Indexing creates a sorted access path — reducing search time from O(n) to O(log n).
🏗️ SQL Query
CREATE INDEX idx_customer_id
ON transactions (customer_id);
🔍 Explanation
The database builds a B-tree index on `customer_id`. Queries like:
SELECT * FROM transactions WHERE customer_id = 101;
will use this index to locate rows faster.
⚙️ Step 4: Why This Query Works
An index avoids full-table scans for frequent lookups on indexed columns.
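The scan-versus-seek difference is directly observable in SQLite with `EXPLAIN QUERY PLAN` (the exact plan wording varies by SQLite version, so only the keywords matter here):

```python
import sqlite3

# Show the query plan before and after creating the index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY,"
             " customer_id INT, amount REAL, transaction_date DATE)")

plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM transactions WHERE customer_id = 101"
).fetchall()
print(plan_before)   # full table SCAN

conn.execute("CREATE INDEX idx_customer_id ON transactions (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM transactions WHERE customer_id = 101"
).fetchall()
print(plan_after)    # SEARCH using idx_customer_id
```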
⚡ Step 5: Common Mistakes
- Indexing low-cardinality columns (e.g., `status`) → no gain.
- Over-indexing → slows down inserts and updates.
🎯 Interview Tip
Mention “Index Selectivity” — the ratio of unique values to total rows; higher is better.
🔍 Interviewer’s Analysis
- Concept Tested: Index design / performance tuning
- Difficulty: Intermediate
- Follow-Ups: Explain difference between clustered and non-clustered indexes.
🪄 End of Q19.
❓ Q20: Implement partitioning for large tables.
🧩 Problem Statement
Your transactions table has billions of rows.
Implement partitioning to improve performance and maintainability.
💾 Step 1: Example Table
CREATE TABLE transactions (
transaction_id BIGINT,
transaction_date DATE,
amount DECIMAL(10,2)
);
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Partitioning divides data physically into chunks — usually by date, region, or hash — improving query speed and manageability.
🏗️ SQL Query (MySQL RANGE partitioning)
CREATE TABLE transactions_partitioned (
transaction_id BIGINT,
transaction_date DATE,
amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(transaction_date)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
🔍 Explanation
Each partition holds one year’s worth of data.
Queries like
SELECT * FROM transactions_partitioned WHERE transaction_date >= '2024-01-01';
will automatically scan only the relevant partitions (here p2024 and pmax).
⚙️ Step 4: Why This Query Works
Partitioning prunes irrelevant data blocks → faster scans.
⚡ Step 5: Common Mistakes
- Using too many partitions → metadata overhead.
- Forgetting to update partition definitions annually.
🎯 Interview Tip
Always mention partition pruning — the DB engine only reads needed partitions.
🔍 Interviewer’s Analysis
- Concept Tested: Table partitioning / scalability
- Difficulty: Advanced
- Follow-Ups: Discuss hash vs range vs list partitioning.
🪄 End of Q20.
❓ Q21: Create a temporary table for storing summary results.
🧩 Problem Statement
During ETL, you need to store intermediate aggregations temporarily — e.g., daily revenue per region.
💾 Step 1: Example Base Table
sales
| region | sale_date | amount |
|---|---|---|
| North | 2024-01-01 | 5000 |
| South | 2024-01-01 | 7000 |
| North | 2024-01-02 | 8000 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
A temporary table lives only within your session — perfect for multi-step ETL transformations.
🏗️ SQL Query
CREATE TEMPORARY TABLE temp_daily_revenue AS
SELECT region,
sale_date,
SUM(amount) AS total_sales
FROM sales
GROUP BY region, sale_date;
🔍 Explanation
- Creates a temp table with summarized data.
- Auto-drops when session ends.
🧾 Step 3: Expected Output
| region | sale_date | total_sales |
|---|---|---|
| North | 2024-01-01 | 5000 |
| South | 2024-01-01 | 7000 |
| North | 2024-01-02 | 8000 |
⚙️ Step 4: Why This Query Works
Temporary tables allow multi-step computation without polluting the permanent schema.
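The temp-table step runs as-is in SQLite. A sketch using the sample rows, with the aggregation materialized into `temp_daily_revenue` and then queried like a regular table:

```python
import sqlite3

# CREATE TEMPORARY TABLE ... AS SELECT, then read the summary back.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, sale_date DATE, amount INT);
INSERT INTO sales VALUES
  ('North', '2024-01-01', 5000),
  ('South', '2024-01-01', 7000),
  ('North', '2024-01-02', 8000);
CREATE TEMPORARY TABLE temp_daily_revenue AS
SELECT region, sale_date, SUM(amount) AS total_sales
FROM sales
GROUP BY region, sale_date;
""")
rows = conn.execute(
    "SELECT * FROM temp_daily_revenue ORDER BY sale_date, region").fetchall()
print(rows)   # matches the expected-output table above
```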
⚡ Step 5: Common Mistakes
- Using permanent tables for staging data.
- Forgetting session cleanup (though most databases handle it automatically).
🎯 Interview Tip
Mention that temporary tables are critical for incremental ETL pipelines and intermediate staging.
🔍 Interviewer’s Analysis
- Concept Tested: Temporary tables / staging logic
- Difficulty: Intermediate
- Follow-Ups: Use CTEs instead of temporary tables for one-time queries.
🪄 End of Q21.
❓ Q22: Create a view summarizing customer purchases.
🧩 Problem Statement
Define a view that shows each customer’s total purchase amount and last purchase date.
💾 Step 1: Assume Example Tables
orders
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 101 | 2024-01-10 | 500 |
| 2 | 101 | 2024-02-15 | 300 |
| 3 | 102 | 2024-01-20 | 700 |
| 4 | 103 | 2024-03-01 | 200 |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
A view is a stored query result that can be referenced like a table.
🏗️ SQL Query
CREATE VIEW customer_summary AS
SELECT customer_id,
SUM(amount) AS total_purchases,
MAX(order_date) AS last_purchase_date
FROM orders
GROUP BY customer_id;
🔍 Explanation
- Aggregates total purchase amount per customer.
- Tracks most recent purchase date.
- Reusable for dashboards and reporting.
🧾 Step 3: Expected Output
| customer_id | total_purchases | last_purchase_date |
|---|---|---|
| 101 | 800 | 2024-02-15 |
| 102 | 700 | 2024-01-20 |
| 103 | 200 | 2024-03-01 |
⚙️ Step 4: Why This Query Works
The view encapsulates recurring logic — improving maintainability and code reuse.
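The view definition above runs unchanged in SQLite. A sketch that creates it over the sample orders and queries it like a table:

```python
import sqlite3

# Build the customer_summary view and read the per-customer aggregates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date DATE, amount INT);
INSERT INTO orders VALUES
  (1, 101, '2024-01-10', 500), (2, 101, '2024-02-15', 300),
  (3, 102, '2024-01-20', 700), (4, 103, '2024-03-01', 200);
CREATE VIEW customer_summary AS
SELECT customer_id,
       SUM(amount) AS total_purchases,
       MAX(order_date) AS last_purchase_date
FROM orders
GROUP BY customer_id;
""")
rows = conn.execute(
    "SELECT * FROM customer_summary ORDER BY customer_id").fetchall()
print(rows)   # matches the expected-output table above
```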
⚡ Step 5: Common Mistakes
- Using views with volatile data in critical transactions → may degrade performance.
- Forgetting to refresh materialized views (in DBs that support them).
🎯 Interview Tip
Explain when to use:
- Views → Logical abstraction.
- Materialized Views → Precomputed aggregates for faster reporting.
🔍 Interviewer’s Analysis
- Concept Tested: View creation / data summarization
- Difficulty: Beginner → Intermediate
- Follow-Ups: Convert to materialized view for faster reads.
🪄 End of Q22.
❓ Q23: Rewrite a correlated subquery as a join to improve performance.
🧩 Problem Statement
Optimize a query that uses a correlated subquery by rewriting it as a JOIN, which performs better on large datasets.
💾 Step 1: Assume Example Tables
employees
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Raj Patel | 10 | 70000 |
| 2 | Aisha Khan | 10 | 85000 |
| 3 | Meena Joshi | 20 | 95000 |
| 4 | Chetan Joshi | 20 | 88000 |
departments
| dept_id | dept_name | location |
|---|---|---|
| 10 | Sales | Mumbai |
| 20 | Engineering | Delhi |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
The original query uses a correlated subquery to fetch department names.
🚫 Inefficient Version
SELECT e.name,
(SELECT dept_name
FROM departments d
WHERE d.dept_id = e.dept_id) AS dept_name
FROM employees e;
This subquery executes once per row (O(n²) time for large data).
✅ Optimized JOIN Version
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d
ON e.dept_id = d.dept_id;
🔍 Explanation
- The join retrieves all needed data in one scan.
- Modern SQL optimizers convert simple correlated subqueries into joins, but not always — explicitly writing the join is safer and clearer.
🧾 Step 3: Expected Output
| name | dept_name |
|---|---|
| Raj Patel | Sales |
| Aisha Khan | Sales |
| Meena Joshi | Engineering |
| Chetan Joshi | Engineering |
⚙️ Step 4: Why This Query Works
JOINs allow set-based operations, minimizing per-row lookups and improving scalability.
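A useful habit when rewriting queries is to prove equivalence on sample data first. A sketch running both forms in SQLite and checking they return identical rows:

```python
import sqlite3

# Correlated-subquery version vs JOIN version: same rows, different plans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT, name TEXT, dept_id INT, salary INT);
CREATE TABLE departments (dept_id INT, dept_name TEXT, location TEXT);
INSERT INTO employees VALUES
  (1, 'Raj Patel', 10, 70000), (2, 'Aisha Khan', 10, 85000),
  (3, 'Meena Joshi', 20, 95000), (4, 'Chetan Joshi', 20, 88000);
INSERT INTO departments VALUES
  (10, 'Sales', 'Mumbai'), (20, 'Engineering', 'Delhi');
""")
correlated = conn.execute("""
SELECT e.name,
       (SELECT dept_name FROM departments d WHERE d.dept_id = e.dept_id)
FROM employees e ORDER BY e.emp_id;
""").fetchall()
joined = conn.execute("""
SELECT e.name, d.dept_name
FROM employees e JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.emp_id;
""").fetchall()
print(correlated == joined)   # True
```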
⚡ Step 5: Common Mistakes
- Forgetting to use table aliases (`e`, `d`).
- Using `LEFT JOIN` unnecessarily when an `INNER JOIN` suffices.
🎯 Interview Tip
Interviewers often ask:
“How does rewriting improve performance?” Answer: “Because JOINs allow SQL engines to optimize via hash joins or index scans, while correlated subqueries often force nested-loop evaluations.”
🔍 Interviewer’s Analysis
- Concept Tested: Subquery optimization
- Difficulty: Intermediate
- Follow-Ups: Compare runtimes via `EXPLAIN` plans.
🪄 End of Q23.
❓ Q24: Find the difference in execution time between indexed vs non-indexed queries.
🧩 Problem Statement
Demonstrate how indexes improve query performance by comparing execution time of a filtered query with and without an index.
💾 Step 1: Assume Example Table
orders
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 1 | 101 | 2024-01-01 | 500 |
| 2 | 102 | 2024-01-05 | 700 |
| … | … | … | … |
(Assume ~1M rows in practice.)
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
We’ll test a lookup query with and without an index on customer_id.
🚫 Without Index
SELECT * FROM orders WHERE customer_id = 101;
- Execution plan: Full table scan
- Time complexity: O(n)
✅ With Index
CREATE INDEX idx_customer_id ON orders(customer_id);
SELECT * FROM orders WHERE customer_id = 101;
- Execution plan: Index seek
- Time complexity: O(log n)
🧾 Step 3: Expected Result (Performance Conceptually)
| Scenario | Operation Type | Approx. Time (for 1M rows) |
|---|---|---|
| Without Index | Full Scan | ~1.2 sec |
| With Index | Index Seek | ~0.01 sec |
⚙️ Step 4: Why This Query Works
An index acts like a sorted lookup table — dramatically reduces search space. Query planners automatically use the best available index for filtering conditions.
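The comparison can be reproduced with a rough timing harness (absolute numbers vary by machine; the row count is scaled down from 1M to 200k so the script stays quick, but the ordering of the two timings holds):

```python
import sqlite3
import time

# Time the same lookup before and after creating an index on customer_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
             " customer_id INT, order_date DATE, amount INT)")
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(i % 10_000, 100) for i in range(200_000)])

def timed(query, reps=50):
    start = time.perf_counter()
    for _ in range(reps):
        conn.execute(query, (101,)).fetchall()
    return time.perf_counter() - start

t_scan = timed("SELECT * FROM orders WHERE customer_id = ?")
conn.execute("CREATE INDEX idx_customer_id ON orders (customer_id)")
t_seek = timed("SELECT * FROM orders WHERE customer_id = ?")
print(f"full scan: {t_scan:.4f}s, index seek: {t_seek:.4f}s")
```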
⚡ Step 5: Common Mistakes
- Indexing low-cardinality columns (e.g., gender) → negligible benefit.
- Forgetting to analyze query plans using `EXPLAIN`.
🧠 Step 6: Verify with EXPLAIN
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101;
Compare cost estimates before and after indexing.
🎯 Interview Tip
If asked, say:
“Indexing improves read performance but adds overhead on writes — so I only index high-selectivity columns used in frequent filters.”
🔍 Interviewer’s Analysis
- Concept Tested: Indexing impact / EXPLAIN plans
- Difficulty: Intermediate
- Follow-Ups: Composite indexes and index maintenance cost.
🪄 End of Q24.
❓ Q25: Demonstrate the safe use of query hints (USE INDEX, FORCE ORDER).
🧩 Problem Statement
Show how to guide the SQL optimizer using query hints, while understanding when it’s safe (or unsafe) to do so.
💾 Step 1: Assume Example Tables
employees
departments
| emp_id | name | dept_id |
|---|---|---|
| 1 | Raj | 10 |
| 2 | Aisha | 20 |
| 3 | Meena | 10 |
| dept_id | dept_name |
|---|---|
| 10 | Sales |
| 20 | HR |
🧮 Step 2: Step-by-Step Solution
🧠 Understanding
Most of the time, the optimizer chooses the best execution plan automatically. But sometimes — e.g., in skewed data — you can use hints to influence behavior.
✅ Example 1 — Force Index
SELECT e.name, d.dept_name
FROM employees e USE INDEX (idx_dept)
JOIN departments d ON e.dept_id = d.dept_id;
Use Case:
If multiple indexes exist, this MySQL hint restricts the optimizer to idx_dept (`FORCE INDEX` is the stricter variant).
✅ Example 2 — Force Join Order
SELECT /*+ FORCE ORDER */ e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;Use Case: When query planner’s automatic join reordering produces inefficient plans for known data distributions.
⚙️ Step 4: Why This Query Works
Hints let you override optimizer heuristics only when you’re certain your data pattern justifies it.
⚡ Step 5: Common Mistakes
- Overusing hints — they can break performance as data evolves.
- Hardcoding index names that later change.
🎯 Interview Tip
Say this line:
“Hints are a scalpel, not a hammer — I use them sparingly and only after verifying the query plan and data distribution.”
🔍 Interviewer’s Analysis
- Concept Tested: Optimizer behavior / Query tuning
- Difficulty: Advanced
- Follow-Ups: Explain how to verify plan changes with `EXPLAIN ANALYZE`.
🪄 End of Q25.