2.4 Avoiding Common Pitfalls
🪄 Step 1: Intuition & Real-World Analogy
🎯 Core Idea
Most slow queries don’t suffer from bad databases — they suffer from bad habits.
Tiny mistakes like adding a function in a WHERE clause or writing SELECT * can make your query crawl.
The secret to SQL performance isn’t fancy optimization — it’s avoiding the traps that silently sabotage you.
🌍 Real-Life Analogy
Imagine you’re driving to work:
- You know a highway that’s fast — but one wrong turn (say, a “function” detour) dumps you into traffic.
- Sometimes you carry too much luggage (
SELECT *). - And instead of organizing your trunk, you just tape things down (
DISTINCThack). 😅
SQL queries face the same fate — they’re powerful machines that need clean routes, lightweight loads, and no duct-tape fixes.
💬 Conversational Style
A slow query isn’t always obvious at first glance. It works, but when you look at the execution plan — it’s doing a full table scan when it could’ve used an index.
So let’s explore the most common pitfalls — and how to write SQL that thinks smart, not hard.
🌱 Step 2: The Core Concept Explained
What It Really Does
Common SQL pitfalls degrade performance by confusing the query optimizer or forcing unnecessary work:
- Wrapping indexed columns in functions.
- Using
SELECT *(fetching all columns even if unused). - Using
DISTINCTto mask duplicate logic. - Ignoring caching and index-friendliness.
How It Works Internally
SQL engines rely on statistics and indexes to optimize access.
When you use an expression like LOWER(email) = 'john@example.com', the index on email becomes unusable — because the function changes the data before lookup.
This forces a sequential scan, killing performance.
Similarly, SELECT * forces the engine to:
- Read every column from disk.
- Allocate extra I/O and memory.
- Potentially invalidate caching for future queries.
And DISTINCT? It adds a deduplication step — often implemented as a sort + unique operation.
If you’re using it to fix messy joins — you’re paying for logic mistakes with performance.
Common Use Cases
- Cleaning up production queries that randomly use
DISTINCT. - Replacing function-wrapped columns with computed indexes or preprocessed columns.
- Debugging slow
SELECT *reports. - Teaching query authors about I/O cost awareness.
💻 Step 3: Code Demonstration & Explanation
🧩 Example 1: Functions on Indexed Columns
-- ❌ Bad
SELECT * FROM customers
WHERE LOWER(email) = 'john@example.com';Problem:
Even if there’s an index on email, it won’t be used — because LOWER() transforms every value before comparison.
Fix: Use a functional index or normalize before storing.
-- ✅ Better
CREATE INDEX idx_lower_email ON customers (LOWER(email));
SELECT * FROM customers
WHERE LOWER(email) = 'john@example.com';Now the database can use the functional index directly.
🧩 Example 2: The SELECT * Problem
-- ❌ Bad
SELECT * FROM orders WHERE order_id = 1001;Even if you only need two columns (amount, status), the DB reads every column — maybe dozens of them.
Fix:
-- ✅ Better
SELECT order_id, amount, status
FROM orders
WHERE order_id = 1001;This avoids unnecessary I/O and reduces buffer cache pressure — critical in large tables.
🧩 Example 3: Misusing DISTINCT
-- ❌ Bad
SELECT DISTINCT customer_id
FROM orders o
JOIN payments p ON o.id = p.order_id;If your join logic is wrong, duplicates appear — and DISTINCT hides the symptom, not the disease.
Fix the logic, not the output:
-- ✅ Better
SELECT o.customer_id
FROM orders o
WHERE EXISTS (
SELECT 1 FROM payments p
WHERE p.order_id = o.id
);Now the join is logically correct and efficient — no redundant deduplication.
🧩 Example 4: Index-Friendly Design
-- ❌ Bad: Non-sargable (not Search ARGument Able)
SELECT * FROM sales
WHERE YEAR(order_date) = 2024;This applies a function to every row — making the index on order_date useless.
Fix:
-- ✅ Better
SELECT * FROM sales
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';Now the DB can use the range scan on the order_date index.
⚙️ Step 4: Performance Insights & Optimization
🔍 Execution Order:
- The optimizer can only use indexes on raw column references.
- Any function, wildcard, or data-type mismatch breaks sargability.
📊 I/O Cost:
SELECT *→ higher disk reads, more memory usage.- Non-sargable filters → full table scan.
DISTINCT→ sort + unique = O(N log N) complexity.
🧩 Optimization Tips:
Always filter early and select narrowly.
Prefer indexed columns in filters and joins.
Use
EXPLAIN ANALYZEto check scan types:EXPLAIN ANALYZE SELECT * FROM orders WHERE LOWER(email) = 'x';→ If you see Seq Scan, your index isn’t being used.
Normalize data for index-friendliness — e.g., store lowercase emails.
🎯 Step 5: Interview Perspective & Reasoning
Common Interview Questions
- “Why is using
SELECT *bad practice?” - “How do functions in WHERE clauses affect performance?”
- “When is DISTINCT a bad idea?”
- “How do you detect whether a query uses an index?”
- “Your query works but runs slow — what steps do you take to debug it?”
Reasoning Expectation
Strong candidates demonstrate:
- System-level awareness (I/O, caching, optimizer behavior).
- Understanding of sargability (search argument friendliness).
- The ability to explain and prove why something is slow using
EXPLAIN ANALYZE.
Bonus Challenge
You have this query:
SELECT * FROM orders WHERE UPPER(region) = 'WEST';There’s an index on
region.1️⃣ Explain why the index isn’t used. 2️⃣ Suggest two ways to fix it without changing the application logic. 3️⃣ Show how you’d verify your fix using
EXPLAIN ANALYZE.
🧩 Step 6: Common Mistakes & Debugging Insights
🚨 Common Mistakes (Click to Expand)
- Using functions on columns in
WHEREclauses (breaks index usage). - Assuming
DISTINCTsolves duplicates — instead of checking join conditions. - Fetching every column when only a few are needed.
- Ignoring type mismatches (
idas INT vsidas TEXT). - Not running
ANALYZEto refresh statistics after large data loads. - Over-optimizing with too many indexes — slows down writes.
Debugging Checklist:
- Run
EXPLAIN (ANALYZE, BUFFERS)to inspect query path. - Look for
Seq Scan— if it appears, check if your filter prevents index use. - Compare
actual timevscostto see if estimates are realistic. - Investigate caching — repeated slow queries may need a query cache or materialized view.
🧠 Step 7: Key Takeaways
🧠 Main Idea: Performance problems often come from subtle mistakes — not lack of indexes.
💻 How It Works: Query engines skip indexes if you transform columns, pull all fields, or use brute-force DISTINCTs.
🚀 Why It Matters: Clean, index-friendly SQL scales naturally. The best optimization is knowing what not to do.