2.1. Core Discrete Distributions
🪄 Step 1: Intuition & Motivation
Core Idea: Discrete probability distributions describe how likely each possible outcome of a countable random variable is.
Think of them as the fingerprints of randomness — each distribution captures a unique pattern of how outcomes behave (like flipping coins, counting arrivals, or waiting for success).
Simple Analogy: Imagine tossing a coin, counting customers entering a shop, or tracking how many times you need to roll a die before getting a “6.” Each of these has its own style of randomness — and discrete distributions are how we mathematically describe those styles.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
Discrete distributions are defined by a Probability Mass Function (PMF) — a rule that assigns a probability to each possible value of a discrete random variable.
The PMF must satisfy two rules:
- $P(X = x_i) \geq 0$ for all possible $x_i$
- $\sum_i P(X = x_i) = 1$
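Here's a minimal sketch of those two rules in Python, using a hypothetical fair six-sided die as the example PMF:

```python
# Minimal sketch: check the two PMF rules for a hypothetical fair six-sided die.
pmf = {face: 1 / 6 for face in range(1, 7)}

assert all(prob >= 0 for prob in pmf.values())   # Rule 1: every probability is non-negative
assert abs(sum(pmf.values()) - 1.0) < 1e-12      # Rule 2: probabilities sum to 1
print("Valid PMF:", pmf)
```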
Different distributions arise from different assumptions about the experiment:
- Are we counting successes in fixed trials? (→ Binomial)
- Waiting for the first success? (→ Geometric)
- Counting rare events over time? (→ Poisson)
- A single success/failure trial? (→ Bernoulli)
Why It Works This Way
Each discrete distribution reflects a unique experiment design:
- Bernoulli: one-shot yes/no.
- Binomial: multiple yes/no with fixed trials.
- Poisson: rare random events over time or space.
- Geometric: waiting time until success.
Changing the setup changes the formula — but the underlying principle is always “counting the probability of specific outcomes.”
How It Fits in ML Thinking
Discrete distributions shape how we model and sample data:
- Bernoulli → binary classification targets (0/1).
- Binomial → model accuracy or click-through rates.
- Poisson → event counts (e.g., number of logins/hour).
- Geometric → expected attempts before success (e.g., retries in RL systems).
Understanding their shapes helps in choosing loss functions, likelihood models, and synthetic data generators.
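As a rough illustration (a NumPy sketch with arbitrary parameter values), here is how each of these distributions might show up when sampling synthetic data:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Bernoulli(0.3): binary classification targets (a Bernoulli draw is a Binomial with n=1)
labels = rng.binomial(n=1, p=0.3, size=10)

# Binomial(100, 0.05): e.g. clicks out of 100 impressions
clicks = rng.binomial(n=100, p=0.05, size=5)

# Poisson(4): e.g. logins per hour
logins = rng.poisson(lam=4, size=5)

# Geometric(0.2): number of attempts until the first success
retries = rng.geometric(p=0.2, size=5)

print(labels, clicks, logins, retries, sep="\n")
```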
📐 Step 3: Mathematical Foundation
Let’s unpack each distribution one by one.
🎯 1. Bernoulli Distribution
Definition & Formula
Models a single binary event — success (1) or failure (0).
$$ P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\} $$
- $p$: probability of success
- $(1 - p)$: probability of failure
Expected Value: $E[X] = p$
Variance: $Var(X) = p(1 - p)$
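A quick sketch (assuming SciPy is available; $p = 0.3$ is just an example value) comparing the formula against `scipy.stats.bernoulli`:

```python
from scipy.stats import bernoulli

p = 0.3  # arbitrary example value

# PMF from the formula p^x (1-p)^(1-x), compared with SciPy's implementation
for x in (0, 1):
    manual = p**x * (1 - p)**(1 - x)
    print(x, manual, bernoulli.pmf(x, p))

print("E[X] =", bernoulli.mean(p), "Var(X) =", bernoulli.var(p))  # 0.3 and 0.21
```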
🎯 2. Binomial Distribution
Definition & Formula
Counts the number of successes in $n$ independent Bernoulli trials.
$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n $$
- $n$: number of trials
- $k$: number of successes
- $p$: probability of success in each trial
Expected Value: $E[X] = np$
Variance: $Var(X) = np(1 - p)$
Example: The number of customers who buy a product out of 100 visitors.
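Here's a small sketch of that example (the 4% purchase probability per visitor is an assumed, illustrative number):

```python
from math import comb
from scipy.stats import binom

n, p, k = 100, 0.04, 5  # illustrative: 100 visitors, 4% purchase rate, exactly 5 buyers

manual = comb(n, k) * p**k * (1 - p)**(n - k)
print("P(X = 5):", manual, binom.pmf(k, n, p))        # both ≈ 0.16
print("E[X] =", n * p, "Var(X) =", n * p * (1 - p))   # 4.0 and 3.84
```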
🎯 3. Poisson Distribution
Definition & Formula
Models the number of rare events in a fixed time or space interval.
$$ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots $$
- $\lambda$: average rate of occurrence per interval
Expected Value: $E[X] = \lambda$
Variance: $Var(X) = \lambda$
Example:
- Number of customer arrivals per hour
- Number of typos in a page of text
Connection to Binomial: If $n$ is large and $p$ is small with $np = \lambda$, the Binomial$(n, p)$ distribution is well approximated by a Poisson distribution with mean $\lambda$.
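A small sketch of this limit (illustrative numbers: $n = 1000$, $p = 0.002$, so $\lambda = 2$), comparing the two PMFs with SciPy:

```python
from scipy.stats import binom, poisson

n, p = 1000, 0.002          # illustrative: many trials, small success probability
lam = n * p                 # lambda = 2

for k in range(5):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))  # the two columns nearly match
```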
🎯 4. Geometric Distribution
Definition & Formula
Models the number of trials until the first success.
$$ P(X = k) = (1 - p)^{k - 1} p, \quad k = 1, 2, 3, \dots $$
- $p$: probability of success on each trial
Expected Value: $E[X] = \frac{1}{p}$
Variance: $Var(X) = \frac{1 - p}{p^2}$
Example: Number of times you roll a die until a “6” appears.
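A small sketch of the die example with SciPy ($p = 1/6$):

```python
from scipy.stats import geom

p = 1 / 6  # probability of rolling a "6" on any given roll

# Probability the first "6" appears on roll k, for a few values of k
for k in (1, 2, 3, 6):
    print(k, geom.pmf(k, p))           # matches (1 - p)^(k - 1) * p

print("E[X] =", geom.mean(p))          # 6.0 rolls on average
print("Var(X) =", geom.var(p))         # (1 - p) / p^2 = 30.0
```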
💡 Interview Question Connection
“If a rare event happens twice in 1000 trials, what distribution models it?”
That’s a Poisson distribution, because:
- The event is rare ($p$ small).
- There are many independent trials ($n$ large).
- The average rate ($\lambda = np$) defines the event frequency.
So, for $\lambda = 2$:
$$ P(X = k) = \frac{e^{-2} 2^k}{k!} $$
This models “exactly $k$ rare events in 1000 trials.”
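A one-liner sketch to evaluate this, e.g. for $k = 2$:

```python
from math import exp, factorial

lam = 2  # lambda = n * p = 1000 * 0.002, as in the interview setup
k = 2    # probability of seeing exactly 2 rare events

print(exp(-lam) * lam**k / factorial(k))  # ≈ 0.2707
```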
🧠 Step 4: Assumptions or Key Ideas
| Distribution | Core Assumption |
|---|---|
| Bernoulli | Single binary trial |
| Binomial | Fixed number of independent Bernoulli trials |
| Poisson | Rare events over time or space, independent and random |
| Geometric | Repeated identical trials until first success |
Each assumes independence between trials — the moment outcomes influence each other, these models break down.
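To see why the independence assumption matters, here's a small sketch (assuming SciPy) comparing the exact probability of an outcome under sampling without replacement against the Binomial formula, which pretends the draws are independent:

```python
from scipy.stats import binom, hypergeom

# Drawing 5 cards from a 52-card deck: what is P(exactly 1 ace)?
# Without replacement the draws are dependent, so Binomial is only an approximation.
exact = hypergeom.pmf(1, 52, 4, 5)      # population 52, 4 aces, 5 draws
approx = binom.pmf(1, 5, 4 / 52)        # treats each draw as independent
print(exact, approx)                    # close here, but not equal
```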
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Simple yet powerful building blocks of probability.
- Capture real-world event patterns (counts, success/failure, waiting time).
- Easy to compute, interpret, and connect to other distributions.
Limitations:
- Assume perfect independence and fixed probabilities — often unrealistic.
- Sensitive to outliers or dependent data.
- Not ideal for multi-modal or continuous outcomes.
🚧 Step 6: Common Misunderstandings
- Mixing Binomial and Poisson: → Poisson is a limit case of Binomial, not the same thing.
- Forgetting the independence assumption: → Once trials depend on each other (like card draws without replacement), Binomial no longer applies.
- Confusing “successes” with “positive outcomes”: → “Success” just means “the event we’re tracking,” not necessarily “good.”
🧩 Step 7: Mini Summary
🧠 What You Learned: Discrete distributions describe the probability structure of countable outcomes like coin flips, successes, or event counts.
⚙️ How It Works: Each distribution (Bernoulli, Binomial, Poisson, Geometric) represents a different way of counting randomness under fixed rules.
🎯 Why It Matters: These distributions are the foundation of probabilistic modeling — they power everything from A/B testing to click prediction and anomaly detection.