2.4 Avoiding Common Pitfalls

🪄 Step 1: Intuition & Real-World Analogy

🎯 Core Idea

Most slow queries don’t suffer from bad databases — they suffer from bad habits. Tiny mistakes like adding a function in a WHERE clause or writing SELECT * can make your query crawl.

The secret to SQL performance isn’t fancy optimization — it’s avoiding the traps that silently sabotage you.


🌍 Real-Life Analogy

Imagine you’re driving to work:

  • You know a highway that’s fast — but one wrong turn (say, a “function” detour) dumps you into traffic.
  • Sometimes you carry too much luggage (SELECT *).
  • And instead of organizing your trunk, you just tape things down (DISTINCT hack). 😅

SQL queries face the same fate — they’re powerful machines that need clean routes, lightweight loads, and no duct-tape fixes.


💬 Conversational Style

A slow query isn’t always obvious at first glance. It works, but when you look at the execution plan — it’s doing a full table scan when it could’ve used an index.

So let’s explore the most common pitfalls — and how to write SQL that thinks smart, not hard.


🌱 Step 2: The Core Concept Explained

What It Really Does

Common SQL pitfalls degrade performance by confusing the query optimizer or forcing unnecessary work:

  • Wrapping indexed columns in functions.
  • Using SELECT * (fetching all columns even if unused).
  • Using DISTINCT to mask duplicate logic.
  • Ignoring caching and index-friendliness.

How It Works Internally

SQL engines rely on statistics and indexes to optimize access. When you use an expression like LOWER(email) = 'john@example.com', the index on email becomes unusable — because the function changes the data before lookup. This forces a sequential scan, killing performance.
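
You can watch this happen with SQLite's EXPLAIN QUERY PLAN (SQLite's rough analogue of EXPLAIN in Postgres/MySQL). A minimal sketch, using Python's built-in sqlite3 module; the table and index names are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_email ON customers(email)")  # plain index on the raw column

# Wrapping the column in lower() hides it from the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM customers WHERE lower(email) = 'john@example.com'"
).fetchall()
detail = plan[0][3]
print(detail)  # something like "SCAN customers": a full scan, idx_email is ignored
```

The plan says SCAN, not SEARCH: the engine never even considers idx_email, because the index stores raw email values, not lowercased ones.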

Similarly, SELECT * forces the engine to:

  • Read every column from disk.
  • Allocate extra I/O and memory.
  • Crowd the buffer cache with column data you never use (and rule out index-only scans).

And DISTINCT? It adds a deduplication step — often implemented as a sort + unique operation. If you’re using it to fix messy joins — you’re paying for logic mistakes with performance.

Common Use Cases
  • Cleaning up production queries that randomly use DISTINCT.
  • Replacing function-wrapped columns with computed indexes or preprocessed columns.
  • Debugging slow SELECT * reports.
  • Teaching query authors about I/O cost awareness.

💻 Step 3: Code Demonstration & Explanation

🧩 Example 1: Functions on Indexed Columns

-- ❌ Bad
SELECT * FROM customers
WHERE LOWER(email) = 'john@example.com';

Problem: Even if there’s an index on email, it won’t be used — because LOWER() transforms every value before comparison.

Fix: Use a functional index or normalize before storing.

-- ✅ Better
CREATE INDEX idx_lower_email ON customers (LOWER(email));

SELECT * FROM customers
WHERE LOWER(email) = 'john@example.com';

Now the database can use the functional index directly.
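
SQLite supports the same trick (expression indexes), so the fix is easy to verify locally. A sketch with a made-up schema, assuming the query repeats the exact expression the index was built on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_lower_email ON customers (lower(email))")  # expression index

# The filter uses the same expression the index was built on, so it qualifies.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM customers WHERE lower(email) = 'john@example.com'"
).fetchall()
detail = plan[0][3]
print(detail)  # e.g. "SEARCH customers USING INDEX idx_lower_email (<expr>=?)"
```

The plan flips from SCAN to SEARCH: the optimizer matches the filter expression against the functional index and does a direct lookup.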


🧩 Example 2: The SELECT * Problem

-- ❌ Bad
SELECT * FROM orders WHERE order_id = 1001;

Even if you only need two columns (amount, status), the DB reads every column — maybe dozens of them.

Fix:

-- ✅ Better
SELECT order_id, amount, status
FROM orders
WHERE order_id = 1001;

This avoids unnecessary I/O and reduces buffer cache pressure — critical in large tables.
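
There's a concrete payoff to a narrow select list: when an index already contains every column you ask for, the engine can answer from the index alone and skip the table entirely. A SQLite sketch, with a hypothetical composite index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, status TEXT, notes TEXT)")
conn.execute("CREATE INDEX idx_orders ON orders (order_id, amount, status)")

narrow = conn.execute(
    "EXPLAIN QUERY PLAN SELECT order_id, amount, status FROM orders WHERE order_id = 1001"
).fetchall()[0][3]
star = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE order_id = 1001"
).fetchall()[0][3]
print(narrow)  # "... USING COVERING INDEX idx_orders ...": no table access at all
print(star)    # "... USING INDEX idx_orders ...": must still visit the table for notes
```

The narrow query gets a covering-index plan; the SELECT * version drags in the unindexed notes column, so every matching row still costs a table lookup.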


🧩 Example 3: Misusing DISTINCT

-- ❌ Bad
SELECT DISTINCT customer_id
FROM orders o
JOIN payments p ON o.id = p.order_id;

If your join logic is wrong, duplicates appear — and DISTINCT hides the symptom, not the disease.

Fix the logic, not the output:

-- ✅ Better
SELECT o.customer_id
FROM orders o
WHERE EXISTS (
  SELECT 1 FROM payments p
  WHERE p.order_id = o.id
);

Now the query is logically correct and efficient: EXISTS stops at the first matching payment, so there are no duplicates to remove in the first place.
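
A quick way to see what DISTINCT was papering over. In this SQLite sketch (toy schema and values), one order paid in two installments makes the join fan out, while EXISTS does not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, order_id INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 42)")                     # one order...
conn.executemany("INSERT INTO payments VALUES (?, 1)", [(1,), (2,)])  # ...two payments

joined = conn.execute(
    "SELECT o.customer_id FROM orders o JOIN payments p ON o.id = p.order_id"
).fetchall()
exists = conn.execute(
    "SELECT o.customer_id FROM orders o "
    "WHERE EXISTS (SELECT 1 FROM payments p WHERE p.order_id = o.id)"
).fetchall()
print(joined)  # [(42,), (42,)]  the join fans out, one row per payment
print(exists)  # [(42,)]         EXISTS returns each order at most once
```

DISTINCT on the join would have produced the same final rows, but only after materializing the duplicates and deduplicating them.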


🧩 Example 4: Index-Friendly Design

-- ❌ Bad: Non-sargable (not Search ARGument Able)
SELECT * FROM sales
WHERE YEAR(order_date) = 2024;

This applies a function to every row — making the index on order_date useless.

Fix:

-- ✅ Better
SELECT * FROM sales
WHERE order_date >= '2024-01-01'
  AND order_date < '2025-01-01';

Now the DB can use the range scan on the order_date index.
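
The same before/after is easy to reproduce in SQLite. It has no YEAR() function, so strftime() stands in for the non-sargable version in this sketch (schema invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, order_date TEXT)")
conn.execute("CREATE INDEX idx_sales_date ON sales(order_date)")

# strftime() plays the role of YEAR() here.
bad = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE strftime('%Y', order_date) = '2024'"
).fetchall()[0][3]
good = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
).fetchall()[0][3]
print(bad)   # "SCAN sales": the function on the column forces a full scan
print(good)  # "SEARCH sales ... idx_sales_date (order_date>? AND order_date<?)"
```

Same logical result, but the range predicate turns a full scan into an index range search.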


⚙️ Step 4: Performance Insights & Optimization

🔍 Execution Order:

  • The optimizer can use an index only when the filter matches what was indexed: usually a raw column reference, or the exact expression of a functional index.
  • A function wrapped around the column, a leading wildcard (e.g., LIKE '%son'), or a data-type mismatch breaks sargability.

📊 I/O Cost:

  • SELECT * → higher disk reads, more memory usage.
  • Non-sargable filters → full table scan.
  • DISTINCT → sort + unique = O(N log N) complexity.

🧩 Optimization Tips:

  1. Always filter early and select narrowly.

  2. Prefer indexed columns in filters and joins.

  3. Use EXPLAIN ANALYZE to check scan types:

    EXPLAIN ANALYZE SELECT * FROM customers WHERE LOWER(email) = 'x';

    → If you see Seq Scan, your index isn’t being used.

  4. Normalize data for index-friendliness — e.g., store lowercase emails.
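
EXPLAIN ANALYZE is PostgreSQL syntax; for a self-contained playground, the same check can be scripted against SQLite's EXPLAIN QUERY PLAN. A small helper sketch (the function name and schema are invented):

```python
import sqlite3

def avoids_full_scan(conn, sql):
    """True if no step of SQLite's plan for `sql` is a full table scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Plan detail strings look like "SCAN t" (full scan) or "SEARCH t USING INDEX ...".
    return not any(row[3].startswith("SCAN") for row in plan)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_email ON customers(email)")

print(avoids_full_scan(conn, "SELECT * FROM customers WHERE email = 'x'"))         # True
print(avoids_full_scan(conn, "SELECT * FROM customers WHERE lower(email) = 'x'"))  # False
```

The same idea carries over to Postgres: run the EXPLAIN, and treat Seq Scan in the output as the red flag.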


🎯 Step 5: Interview Perspective & Reasoning

Common Interview Questions
  • “Why is using SELECT * bad practice?”
  • “How do functions in WHERE clauses affect performance?”
  • “When is DISTINCT a bad idea?”
  • “How do you detect whether a query uses an index?”
  • “Your query works but runs slow — what steps do you take to debug it?”
Reasoning Expectation

Strong candidates demonstrate:

  • System-level awareness (I/O, caching, optimizer behavior).
  • Understanding of sargability (search argument friendliness).
  • The ability to explain and prove why something is slow using EXPLAIN ANALYZE.
Bonus Challenge

You have this query:

SELECT * FROM orders WHERE UPPER(region) = 'WEST';

There’s an index on region.

1️⃣ Explain why the index isn’t used.
2️⃣ Suggest two ways to fix it without changing the application logic.
3️⃣ Show how you’d verify your fix using EXPLAIN ANALYZE.


🧩 Step 6: Common Mistakes & Debugging Insights

🚨 Common Mistakes
  • Using functions on columns in WHERE clauses (breaks index usage).
  • Assuming DISTINCT solves duplicates — instead of checking join conditions.
  • Fetching every column when only a few are needed.
  • Ignoring type mismatches (id as INT vs id as TEXT).
  • Not running ANALYZE to refresh statistics after large data loads.
  • Over-optimizing with too many indexes — slows down writes.

Debugging Checklist:

  1. Run EXPLAIN (ANALYZE, BUFFERS) to inspect query path.
  2. Look for Seq Scan — if it appears, check if your filter prevents index use.
  3. Compare actual time vs cost to see if estimates are realistic.
  4. Investigate caching — repeated slow queries may need a query cache or materialized view.

🧠 Step 7: Key Takeaways

🧠 Main Idea: Performance problems often come from subtle mistakes — not lack of indexes.

💻 How It Works: Query engines skip indexes if you transform columns, pull all fields, or use brute-force DISTINCTs.

🚀 Why It Matters: Clean, index-friendly SQL scales naturally. The best optimization is knowing what not to do.
