2.2 Indexing for Speed

Core Skills Guide for AI Interviews (Math, Code, SQL) 2025

5 min read 939 words

🪄 Step 1: Intuition & Real-World Analogy

🎯 Core Idea

Indexes make data retrieval lightning-fast — but they come with a cost. They act like a shortcut for reading, but every shortcut has a maintenance fee.

When you run:

CREATE INDEX idx_customer_id ON orders(customer_id);

you’re telling the database:

“Hey, please remember where each customer’s rows are stored — so you can find them instantly later.”

But the trade-off is that every INSERT, UPDATE, or DELETE now has extra work to keep that “shortcut” accurate.

🌍 Real-Life Analogy

Think of an index like a library catalog.

Without a catalog (no index): you walk through every shelf until you find the book. → Sequential scan.
With a catalog (index): you look up the author’s name and go directly to the shelf. → Index scan.

But every time a new book arrives (new record inserted), the librarian must update the catalog — which takes time.

So:

“Indexes make readers happy but librarians cry.” 😅

💬 Conversational Style

You might think, “If indexes make things faster, why not index every column?” Because — every index is like a performance tax on writes. Databases love balanced trade-offs, not blind optimization.

🌱 Step 2: The Core Concept Explained

What It Really Does

An index is a data structure (usually a B-Tree or Hash) that allows the database to quickly find rows matching a condition — without scanning the entire table.

How It Works Internally

When you query:

SELECT * FROM orders WHERE customer_id = 101;

the DBMS can:

Scan the entire table (Sequential Scan) — O(N)
Use the index to jump directly to matching rows — O(log N)

That’s like skipping to the right page instead of reading every line. Internally, a B-Tree index stores (key → pointer to row) pairs, ordered for binary search.

Common Use Cases

Columns frequently used in WHERE, JOIN, or ORDER BY clauses.
Foreign keys (like customer_id in orders).
Columns with high cardinality (many unique values).

💻 Step 3: Code Demonstration & Explanation

🧩 Example 1: Without Index

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Output:

Seq Scan on orders  (cost=0.00..350.00 rows=100 width=50)
  Filter: (customer_id = 42)

Interpretation: The database reads every row, checking if customer_id = 42. If the table has 10 million orders — it scans all 10 million. 🐢

🧩 Example 2: With Index

CREATE INDEX idx_customer_id ON orders(customer_id);

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Output:

Index Scan using idx_customer_id on orders  (cost=0.43..8.52 rows=100 width=50)
  Index Cond: (customer_id = 42)

Interpretation: The DB now jumps directly to the rows for customer 42. → No full table scan. Result: Query that took seconds may now finish in milliseconds.

🧩 Example 3: Write Trade-Offs

INSERT INTO orders VALUES (10001, 42, 499.99, '2025-10-24');

Now the DB must:

Insert into the table.
Update every relevant index (like idx_customer_id).

So bulk inserts or frequent updates slow down slightly.

🧩 Example 4: Low-Cardinality Trap

CREATE INDEX idx_status ON orders(status);

If status only has 3 values: 'Pending', 'Shipped', 'Cancelled' → The index doesn’t help — the DB might still scan most rows.

That’s like having a catalog sorted by “color” when 90% of books are blue. 😅

Low selectivity = poor index performance.

⚙️ Step 4: Performance Insights & Optimization

ℹ️

Execution Choice: The optimizer decides between a Seq Scan and Index Scan based on:
- Table size
- Index selectivity (uniqueness of values)
- Cost of random I/O vs sequential read
Rule of Thumb:
- Indexes shine when queries filter down to <10% of rows.
- For large scans, sequential reads are faster (fewer disk seeks).

Composite Indexes:

CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

Great for queries like:

WHERE customer_id = 42 AND order_date > '2025-01-01'

But order matters: The leftmost column (customer_id) must appear in the condition.

Maintenance Tip: Periodically run:
```
REINDEX TABLE orders;
```
to rebuild fragmented indexes and keep lookups fast.

🎯 Step 5: Interview Perspective & Reasoning

Common Interview Questions

“When does an index improve performance?”
“Why might an index be ignored by the query planner?”
“What’s the downside of having too many indexes?”
“Would an index help on a boolean column?”
“Explain the difference between clustered and non-clustered indexes.”

Reasoning Expectation

Interviewers test depth of understanding, not memorization:

Can you explain the trade-off between read and write performance?
Do you know how selectivity affects index usefulness?
Can you justify why an index isn’t being used in a given plan?

Bonus Challenge

Suppose your query filters on gender = 'M' and region = 'North'. Which index is better: 1️⃣ (gender, region) or 2️⃣ (region, gender)?
Discuss why — in terms of cardinality and leftmost prefix rules.

🧩 Step 6: Common Mistakes & Debugging Insights

🚨 Common Mistakes (Click to Expand)

Over-indexing: Creating indexes on every column → slower inserts & wasted space.
Wrong column order in composite indexes: The optimizer can’t use (A, B) index if you only query by B.

Function-wrapped columns:

WHERE LOWER(name) = 'john'

won’t use an index on name. Use a functional index instead:

CREATE INDEX idx_lower_name ON customers(LOWER(name));

Ignoring low-cardinality columns: Index on is_active (Y/N) rarely helps — DB will still scan most rows.
Forgotten maintenance: After major data changes, run:
```
ANALYZE;
```
to update table stats; otherwise, planner may choose bad paths.

🧠 Step 7: Key Takeaways

🧠 Main Idea: Indexes accelerate lookups by maintaining sorted key references — but they slow down data modifications.

💻 How It Works: Uses data structures like B-Trees or Hash maps to find rows in logarithmic time instead of scanning everything.

🚀 Why It Matters: Smart indexing is one of the most powerful performance tuning tools in SQL — mastering it separates data users from database engineers.

2.3 Query Refactoring and CTEs 2.1 Read and Interpret Execution Plans