1.4. Combinatorics & Counting

Core Skills Guide for AI Interviews (Math, Code, SQL) 2025

Probability & Statistics for Data Science

5 min read 916 words

🪄 Step 1: Intuition & Motivation

Core Idea: Combinatorics is the mathematics of counting — it tells us how many different ways something can happen.
In probability, knowing “how many possible outcomes” exist versus “how many favorable ones” is the key to calculating $P(\text{event}) = \frac{\text{favorable outcomes}}{\text{total outcomes}}$.
Simple Analogy: Think of a wardrobe with 3 shirts and 2 pants.
- You can wear 6 different outfits (3×2). That’s combinatorics — it helps you count possibilities systematically instead of listing them one by one.

🌱 Step 2: Core Concept

What’s Happening Under the Hood?

When dealing with randomness, we often ask:

“How many ways can this event happen?”

Combinatorics gives structured answers through three key tools:

Permutations → when order matters.
Combinations → when order doesn’t matter.
Binomial coefficients → the mathematical link between the two.

These ideas make probability calculable, especially when direct measurement or simulation isn’t feasible.

Why It Works This Way

Counting is the foundation of uncertainty. If you can’t count all possible ways an event can occur, you can’t find its probability.

By classifying scenarios based on whether order matters and whether we can repeat choices, combinatorics helps us compute the number of outcomes systematically — without double counting or missing cases.

How It Fits in ML Thinking

Combinatorics isn’t just about dice or cards — it shapes data science thinking:

Sampling strategies (bootstrapping, cross-validation) rely on counting subsets.
In feature engineering, understanding combinations helps estimate feature explosion (e.g., $n$ features → $2^n$ subsets).
In Bayesian models, combinatorial coefficients appear in likelihood functions (like the Binomial distribution).

So even if you’re not shuffling cards, you’re counting possibilities all the time.

📐 Step 3: Mathematical Foundation

Permutations — Order Matters

A permutation counts how many ways you can arrange $r$ objects chosen from $n$ total, where order matters:

$$ P(n, r) = \frac{n!}{(n - r)!} $$

Example: How many 3-digit numbers can be made using digits {1, 2, 3, 4} without repetition? $P(4, 3) = \frac{4!}{1!} = 24$

Permutation is about ordering — swapping items creates new outcomes. If your order changes, your permutation changes.

Combinations — Order Doesn’t Matter

A combination counts how many ways you can choose $r$ objects from $n$ total, where order doesn’t matter:

$$ C(n, r) = \frac{n!}{r!(n - r)!} $$

Example: How many ways can you pick 3 students out of 10 for a project team? $C(10, 3) = \frac{10!}{3!7!} = 120$

Combinations are about groups, not arrangements. Choosing {A, B, C} is the same as choosing {C, B, A}.

Binomial Coefficients — The Bridge

The combination formula is also known as the binomial coefficient, written as:

$$ \binom{n}{r} = C(n, r) = \frac{n!}{r!(n - r)!} $$

These coefficients appear in the Binomial Theorem:

$$(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r$$

Each $\binom{n}{r}$ tells us how many ways we can pick $r$ “b’s” out of $n$ trials — it’s the combinatorial heart of probabilities like “3 successes out of 5.”

Sampling with/without Replacement

Sampling determines whether we reuse selected items or not:

Type	Description	Formula (if order matters)
With Replacement	Same item can be chosen again	$n^r$
Without Replacement	Chosen items are removed	$P(n, r) = \frac{n!}{(n - r)!}$

Example: Drawing 3 cards from a 52-card deck:

Without replacement: each draw reduces options.
With replacement: every draw has 52 options (since the card goes back).

“Replacement” is like refilling the cookie jar after each grab — the total choices stay constant. Without it, each grab leaves fewer cookies (and fewer outcomes).

🧮 Example Walkthrough: “3 Face Cards from a Deck”

Let’s tackle the interview-style question:

“If you pick 3 cards from a 52-card deck, what’s the probability that all are face cards?”

There are 12 face cards total (J, Q, K of each suit).
We’re choosing 3 cards without replacement.

Total possible 3-card combinations: $C(52, 3) = \frac{52!}{3!49!} = 22{,}100$

Favorable combinations (all face cards): $C(12, 3) = \frac{12!}{3!9!} = 220$

Probability:

$$ P(\text{3 face cards}) = \frac{C(12, 3)}{C(52, 3)} = \frac{220}{22{,}100} \approx 0.00995 $$

So there’s about a 1% chance — a great demonstration of how combinatorics directly enables probability computation.

🧠 Step 4: Assumptions or Key Ideas

Every item or outcome is distinct.
The total set size ($n$) is known and finite.
When sampling, each selection is equally likely.
“With/without replacement” determines independence between draws.

⚖️ Step 5: Strengths, Limitations & Trade-offs

Provides the foundation for all discrete probability calculations.
Enables exact reasoning for complex random setups.
Reduces brute-force enumeration to neat formulas.

Grows combinatorially — numbers explode quickly for large $n$.
Assumes perfect randomness (real data often isn’t).
Hard to apply directly in continuous or dependent systems.

Combinatorics offers clarity at small scales and complexity at large scales. It’s the bridge between raw counting and analytical probability — beautiful, but computationally heavy.

🚧 Step 6: Common Misunderstandings

🚨 Common Misunderstandings (Click to Expand)

Confusing permutations with combinations: → Remember: order matters in permutations, not in combinations.
Forgetting replacement rule: → It drastically changes total outcomes. Always ask: “Do I put the item back?”
Using $n!$ blindly: → Factorials grow huge; always simplify using division (like $C(n, r)$).

🧩 Step 7: Mini Summary

🧠 What You Learned: Combinatorics is how we count possibilities — permutations for order, combinations for groups, and binomial coefficients to bridge both.

⚙️ How It Works: It uses factorial-based formulas to quantify how many distinct ways outcomes can occur under given rules.

🎯 Why It Matters: Without combinatorics, probability has no denominator — it’s the skeleton of uncertainty modeling.

2.1. Core Discrete Distributions 1.3. Bayes’ Theorem & Bayesian Reasoning