2.1. Core Discrete Distributions


🪄 Step 1: Intuition & Motivation

  • Core Idea: Discrete probability distributions describe how likely each possible outcome of a countable random variable is.

    Think of them as the fingerprints of randomness — each distribution captures a unique pattern of how outcomes behave (like flipping coins, counting arrivals, or waiting for success).

  • Simple Analogy: Imagine tossing a coin, counting customers entering a shop, or tracking how many times you need to roll a die before getting a “6.” Each of these has its own style of randomness — and discrete distributions are how we mathematically describe those styles.


🌱 Step 2: Core Concept

What’s Happening Under the Hood?

Discrete distributions are defined by a Probability Mass Function (PMF) — a rule that assigns a probability to each possible value of a discrete random variable.

The PMF must satisfy two rules:

  1. $P(X = x_i) \geq 0$ for all possible $x_i$
  2. $\sum_i P(X = x_i) = 1$
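As a quick sanity check, here is a minimal sketch (plain Python; the toy distribution over $\{0, 1, 2\}$ is hypothetical) verifying both rules for a small PMF:

```python
# Sanity-check the two PMF rules on a small, hypothetical toy distribution.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# Rule 1: every probability is non-negative.
assert all(prob >= 0 for prob in pmf.values())

# Rule 2: probabilities sum to 1 (allow tiny floating-point error).
assert abs(sum(pmf.values()) - 1.0) < 1e-9

print("Valid PMF:", pmf)
```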

Different distributions arise from different assumptions about the experiment:

  • Are we counting successes in fixed trials? (→ Binomial)
  • Waiting for the first success? (→ Geometric)
  • Counting rare events over time? (→ Poisson)
  • A single success/failure trial? (→ Bernoulli)

Why It Works This Way

Each discrete distribution reflects a unique experiment design:

  • Bernoulli: one-shot yes/no.
  • Binomial: multiple yes/no with fixed trials.
  • Poisson: rare random events over time or space.
  • Geometric: waiting time until success.

Changing the setup changes the formula — but the underlying principle is always “counting the probability of specific outcomes.”

How It Fits in ML Thinking

Discrete distributions shape how we model and sample data:

  • Bernoulli → binary classification targets (0/1).
  • Binomial → model accuracy or click-through rates.
  • Poisson → event counts (e.g., number of logins/hour).
  • Geometric → expected attempts before success (e.g., retries in RL systems).

Understanding their shapes helps in choosing loss functions, likelihood models, and synthetic data generators.
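To make that concrete, here is a minimal sampling sketch (Python with NumPy; the parameter values are illustrative, not from the text) showing how each distribution could generate synthetic data:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # reproducible sampling

# Bernoulli: binary targets (a Bernoulli trial is a Binomial with n=1).
bernoulli_samples = rng.binomial(n=1, p=0.3, size=5)

# Binomial: e.g., clicks out of 100 impressions, each with p=0.05.
binomial_samples = rng.binomial(n=100, p=0.05, size=5)

# Poisson: e.g., logins per hour at an average rate of 4.
poisson_samples = rng.poisson(lam=4.0, size=5)

# Geometric: e.g., attempts up to and including the first success, p=1/6.
geometric_samples = rng.geometric(p=1/6, size=5)

print(bernoulli_samples, binomial_samples, poisson_samples, geometric_samples)
```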


📐 Step 3: Mathematical Foundation

Let’s unpack each distribution one by one.


🎯 1. Bernoulli Distribution

Definition & Formula

Models a single binary event — success (1) or failure (0).

$$ P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\} $$
  • $p$: probability of success
  • $(1 - p)$: probability of failure

Expected Value: $E[X] = p$

Variance: $Var(X) = p(1 - p)$

Think of Bernoulli as the mathematical version of a coin toss — simple, binary, fundamental.
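A quick numerical check (plain Python; $p = 0.3$ is an illustrative choice) that the PMF, expected value, and variance behave as stated:

```python
# Bernoulli PMF: P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}.
p = 0.3  # illustrative success probability

pmf = {x: p**x * (1 - p) ** (1 - x) for x in (0, 1)}
assert abs(sum(pmf.values()) - 1.0) < 1e-9  # probabilities sum to 1

mean = sum(x * prob for x, prob in pmf.items())                 # E[X] = p
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())  # p(1 - p)

print(pmf)             # {0: 0.7, 1: 0.3}
print(mean, variance)  # 0.3, 0.21
```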

🎯 2. Binomial Distribution

Definition & Formula

Counts the number of successes in $n$ independent Bernoulli trials.

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n $$
  • $n$: number of trials
  • $k$: number of successes
  • $p$: probability of success in each trial

Expected Value: $E[X] = np$

Variance: $Var(X) = np(1 - p)$

Example: The number of customers who buy a product out of 100 visitors.

Binomial = “Many Bernoulli trials rolled together.” Each trial adds a small piece of uncertainty — together, they form a smooth probability curve.
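Here is a minimal sketch (plain Python; 100 visitors with an assumed purchase probability of $p = 0.1$) computing the PMF for the customer example above:

```python
from math import comb

n, p = 100, 0.1  # 100 visitors, each buying with probability 0.1 (assumed)

def binomial_pmf(k: int) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability that exactly 10 of the 100 visitors buy.
print(binomial_pmf(10))        # ~0.1319

# Mean and variance match np and np(1 - p).
print(n * p, n * p * (1 - p))  # 10.0, 9.0
```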

🎯 3. Poisson Distribution

Definition & Formula

Models the number of rare events in a fixed time or space interval.

$$ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots $$
  • $\lambda$: average rate of occurrence per interval

Expected Value: $E[X] = \lambda$

Variance: $Var(X) = \lambda$

Example:

  • Number of customer arrivals per hour
  • Number of typos in a page of text

Connection to Binomial: If $n$ is large and $p$ is small such that $np = \lambda$, the Binomial distribution is well approximated by a Poisson distribution with mean $\lambda$.

Poisson captures rare, independent events — like spontaneous sparks of randomness over time.
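A short sketch (plain Python; $\lambda = 2$ with $n = 1000$, $p = 0.002$ are illustrative) showing the Poisson PMF alongside its Binomial counterpart to illustrate the connection above:

```python
from math import comb, exp, factorial

lam = 2.0                 # average rate per interval
n, p = 1000, lam / 1000   # large n, small p, with np = lambda

def poisson_pmf(k: int) -> float:
    """P(X = k) = e^(-lambda) * lambda^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# For small k, the two PMFs nearly coincide when n is large and p is small.
for k in range(5):
    print(k, round(poisson_pmf(k), 5), round(binomial_pmf(k), 5))
```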

🎯 4. Geometric Distribution

Definition & Formula

Models the number of trials needed to obtain the first success (the successful trial is included in the count).

$$ P(X = k) = (1 - p)^{k - 1} p, \quad k = 1, 2, 3, \dots $$
  • $p$: probability of success on each trial

Expected Value: $E[X] = \frac{1}{p}$

Variance: $Var(X) = \frac{1 - p}{p^2}$

Example: Number of times you roll a die until a “6” appears.

Geometric is “waiting-time probability” — it measures patience under uncertainty.
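A minimal simulation sketch (plain Python; the fair-die setup is from the example above) confirming that the average waiting time comes out near $1/p = 6$:

```python
import random

random.seed(0)
p = 1 / 6  # probability of rolling a "6" on a fair die

def rolls_until_six() -> int:
    """Count trials up to and including the first success."""
    rolls = 1
    while random.random() >= p:  # failure with probability 1 - p
        rolls += 1
    return rolls

trials = [rolls_until_six() for _ in range(100_000)]
print(sum(trials) / len(trials))  # close to E[X] = 1/p = 6
```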

💡 Interview Question Connection

“If a rare event happens twice in 1000 trials, what distribution models it?”

That’s a Poisson distribution, because:

  • The event is rare ($p$ small).
  • There are many independent trials ($n$ large).
  • The average rate ($\lambda = np$) defines the event frequency.

So, for $\lambda = 2$:

$$ P(X = k) = \frac{e^{-2} 2^k}{k!} $$

This models “exactly $k$ rare events in 1000 trials.”
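Plugging in $k = 2$ (plain Python) gives the probability of seeing exactly two rare events:

```python
from math import exp, factorial

lam = 2  # lambda = n * p = 1000 * 0.002

# P(X = 2) = e^(-2) * 2^2 / 2!
print(exp(-lam) * lam**2 / factorial(2))  # ~0.2707
```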


🧠 Step 4: Assumptions or Key Ideas

| Distribution | Core Assumption |
| --- | --- |
| Bernoulli | Single binary trial |
| Binomial | Fixed number of independent Bernoulli trials |
| Poisson | Rare events over time or space, independent and random |
| Geometric | Repeated identical trials until first success |

Each assumes independence between trials — the moment outcomes influence each other, these models break down.


⚖️ Step 5: Strengths, Limitations & Trade-offs

Strengths:

  • Simple yet powerful building blocks of probability.
  • Capture real-world event patterns (counts, success/failure, waiting time).
  • Easy to compute, interpret, and connect to other distributions.

Limitations:

  • Assume perfect independence and fixed probabilities — often unrealistic.
  • Sensitive to outliers or dependent data.
  • Not ideal for multi-modal or continuous outcomes.

These distributions are the “atoms” of probability — clean, idealized, and useful for reasoning, but real-world data often needs hybrid or continuous versions for better fit.

🚧 Step 6: Common Misunderstandings

  • Mixing Binomial and Poisson: → Poisson is a limit case of Binomial, not the same thing.
  • Forgetting the independence assumption: → Once trials depend on each other (like card draws without replacement), Binomial no longer applies.
  • Confusing “successes” with “positive outcomes”: → “Success” just means “the event we’re tracking,” not necessarily “good.”

🧩 Step 7: Mini Summary

🧠 What You Learned: Discrete distributions describe the probability structure of countable outcomes like coin flips, successes, or event counts.

⚙️ How It Works: Each distribution (Bernoulli, Binomial, Poisson, Geometric) represents a different way of counting randomness under fixed rules.

🎯 Why It Matters: These distributions are the foundation of probabilistic modeling — they power everything from A/B testing to click prediction and anomaly detection.
