2.2. Core Continuous Distributions
🪄 Step 1: Intuition & Motivation
Core Idea: Continuous distributions describe probabilities for uncountably infinite outcomes — values that can fall anywhere in an interval (like height, temperature, or model error).
Instead of assigning probabilities to specific values (which would be zero!), we assign probabilities to ranges of values — using something called a Probability Density Function (PDF).
Simple Analogy: Think of a smooth hill rather than discrete pebbles.
- Each point on the hill represents a possible value.
- The area under the curve represents probability.
That's how continuous distributions behave — you don't count outcomes, you measure them.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
In discrete probability, we sum probabilities across outcomes. In continuous probability, we integrate the density over an interval to find the probability that a value lies in that range:
$$ P(a \leq X \leq b) = \int_a^b f(x)\,dx $$
Here, $f(x)$ is the Probability Density Function (PDF), and the total area under it equals 1.
The Cumulative Distribution Function (CDF), $F(x)$, tells us the probability that the variable takes a value less than or equal to $x$:
$$ F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt $$
So the PDF gives instantaneous likelihood, and the CDF gives accumulated probability.
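A minimal sketch of both ideas, assuming SciPy is available and using a standard Normal purely as an example density: the probability of an interval is the same whether you integrate the PDF directly or take a difference of CDF values.

```python
from scipy import stats
from scipy.integrate import quad

a, b = -1.0, 1.0                        # interval of interest
pdf = stats.norm(loc=0, scale=1).pdf    # f(x) for a standard Normal (illustrative choice)

area, _ = quad(pdf, a, b)               # numerically integrate the density over [a, b]
via_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)   # F(b) - F(a)

print(area, via_cdf)                    # both ≈ 0.6827
```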
Why It Works This Way
Continuous variables are infinitely fine-grained — we can't assign a positive probability to any single point, because $P(X = x) = 0$ for every $x$.
Instead, we use areas under the density to represent ranges of outcomes. This lets the model handle infinitely many possibilities while keeping the total probability equal to 1.
How It Fits in ML Thinking
Most real-world data — weights, times, sensor readings — is continuous.
Continuous distributions appear in ML through:
- Loss functions: Maximizing the Gaussian log-likelihood is equivalent to minimizing Mean Squared Error (MSE).
- Optimization: Exponential and Gamma define priors and regularizers.
- Uncertainty estimation: PDFs model confidence and data spread.
Understanding these distributions helps explain why models behave the way they do when fitting or generalizing.
📐 Step 3: Mathematical Foundation
Let’s unpack the four most important continuous distributions.
🌈 1. Uniform Distribution
Definition & Formula
All outcomes in a range $[a, b]$ are equally likely.
PDF:
$$ f(x) = \begin{cases} \frac{1}{b - a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} $$
Expected Value: $E[X] = \frac{a + b}{2}$ Variance: $Var(X) = \frac{(b - a)^2}{12}$
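As a quick sanity check (a hypothetical interval $[2, 8]$, with NumPy assumed), the closed-form mean and variance match a Monte Carlo estimate:

```python
import numpy as np

a, b = 2.0, 8.0                                          # hypothetical interval
samples = np.random.default_rng(0).uniform(a, b, size=1_000_000)

print((a + b) / 2, samples.mean())       # E[X] = (a + b) / 2 = 5.0
print((b - a) ** 2 / 12, samples.var())  # Var(X) = (b - a)^2 / 12 = 3.0
```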
🔔 2. Normal (Gaussian) Distribution
Definition & Formula
The most important continuous distribution — models natural variation and noise.
PDF:
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$
- $\mu$: mean (center)
- $\sigma^2$: variance (spread)
Expected Value: $E[X] = \mu$ Variance: $Var(X) = \sigma^2$
Key Features:
- Symmetric around $\mu$
- 68–95–99.7 rule: about 68%, 95%, and 99.7% of values fall within 1σ, 2σ, and 3σ of the mean (checked numerically below)
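A short check of the rule using the Normal CDF (SciPy assumed; the rule holds for any $\mu$ and $\sigma$, the standard Normal is just a convenient choice):

```python
from scipy import stats

mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    # P(mu - k*sigma <= X <= mu + k*sigma) = F(mu + k*sigma) - F(mu - k*sigma)
    p = stats.norm.cdf(mu + k * sigma, mu, sigma) - stats.norm.cdf(mu - k * sigma, mu, sigma)
    print(f"P(|X - mu| <= {k} sigma) = {p:.4f}")   # ≈ 0.6827, 0.9545, 0.9973
```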
📏 3. Standardization and Z-scores
Concept & Formula
To compare different Normal distributions, we standardize values using a Z-score:
$$ Z = \frac{X - \mu}{\sigma} $$
This converts any Normal variable $X$ into the Standard Normal Distribution (mean 0, variance 1).
Why it’s useful:
- Enables comparison across different scales.
- Simplifies probability lookup via standard normal tables.
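A tiny illustration with made-up exam scores from two classes on different scales; the Z-scores make the two scores directly comparable:

```python
mu_a, sigma_a = 70.0, 10.0        # class A: mean 70, std 10 (hypothetical)
mu_b, sigma_b = 55.0, 5.0         # class B: mean 55, std 5 (hypothetical)

score_a, score_b = 85.0, 64.0
z_a = (score_a - mu_a) / sigma_a  # 1.5 standard deviations above class A's mean
z_b = (score_b - mu_b) / sigma_b  # 1.8 standard deviations above class B's mean

print(z_a, z_b)                   # the lower raw score is actually more exceptional
```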
⏱️ 4. Exponential Distribution
Definition & Formula
Models waiting time between independent events occurring at a constant average rate (like time between web requests).
PDF:
$$ f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases} $$
- $\lambda$: rate parameter
Expected Value: $E[X] = \frac{1}{\lambda}$ Variance: $Var(X) = \frac{1}{\lambda^2}$
Unique Property: Memoryless — $P(X > s + t \mid X > s) = P(X > t)$: the time you've already waited tells you nothing about how much longer you'll wait.
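A rough Monte Carlo check of memorylessness, assuming NumPy and an arbitrary rate $\lambda = 0.5$: the conditional survival probability $P(X > s + t \mid X > s)$ matches $P(X > t) = e^{-\lambda t}$.

```python
import numpy as np

lam, s, t = 0.5, 2.0, 3.0
x = np.random.default_rng(1).exponential(scale=1 / lam, size=1_000_000)

cond = (x > s + t).sum() / (x > s).sum()   # estimate of P(X > s + t | X > s)
print(cond, np.exp(-lam * t))              # both ≈ e^(-lambda * t) ≈ 0.223
```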
🔋 5. Gamma Distribution
Definition & Formula
Generalizes the Exponential — models total waiting time until k events occur.
PDF:
$$ f(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \ge 0 $$
- $k$: shape parameter (number of events)
- $\lambda$: rate parameter
Expected Value: $E[X] = \frac{k}{\lambda}$ Variance: $Var(X) = \frac{k}{\lambda^2}$
Special Case: When $k = 1$, Gamma becomes Exponential.
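A quick sketch of that special case (SciPy assumed; note that SciPy's `gamma` and `expon` take a scale parameter, so `scale = 1/λ` under the rate parameterization above):

```python
import numpy as np
from scipy import stats

lam, k = 2.0, 1.0                            # arbitrary rate, shape k = 1
x = np.linspace(0.01, 5, 200)

gamma_pdf = stats.gamma.pdf(x, a=k, scale=1 / lam)   # Gamma with shape 1, rate lam
expon_pdf = stats.expon.pdf(x, scale=1 / lam)        # Exponential with rate lam

print(np.allclose(gamma_pdf, expon_pdf))     # True: the two densities coincide
```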
💡 Deeper Insight — The Central Limit Theorem (CLT)
The Intuition Behind the CLT
The Central Limit Theorem says:
When you add up (or average) many independent random variables with finite variance, whatever their individual distributions, their appropriately scaled sum tends toward a Normal distribution.
This is why the Normal distribution is everywhere — it’s the natural end state of randomness.
Example: Exam scores, daily sales, model errors — all shaped by many small, independent influences — will roughly follow a bell curve.
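A minimal simulation of the effect, assuming NumPy: averages of Uniform(0, 1) samples (a flat, decidedly non-bell-shaped base distribution) already cluster around 0.5 in a Gaussian-looking way.

```python
import numpy as np

rng = np.random.default_rng(42)
# Mean of 30 Uniform(0, 1) draws, repeated 100,000 times
sample_means = rng.uniform(0, 1, size=(100_000, 30)).mean(axis=1)

# The CLT predicts mean ≈ 0.5 and std ≈ sqrt(1/12) / sqrt(30) ≈ 0.0527
print(sample_means.mean(), sample_means.std(), (1 / 12) ** 0.5 / 30 ** 0.5)
```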
🧠 Step 4: Assumptions or Key Ideas
| Distribution | Core Assumption |
|---|---|
| Uniform | Equal likelihood across interval |
| Normal | Symmetric variability around mean |
| Exponential | Constant event rate, memoryless process |
| Gamma | Total waiting time until k events |
| CLT | Many small, independent random influences |
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Describe real-world continuous phenomena beautifully.
- Underpin key ML tools (e.g., MSE, Gaussian priors, Kalman filters).
- Provide analytic simplicity for modeling and inference.
Limitations:
- Many real datasets deviate from Gaussian assumptions.
- Heavy tails or skewed data break Normal-based methods.
- Exponential's memoryless property rarely holds in real-world timings.
🚧 Step 6: Common Misunderstandings
- “PDF gives probability.” → No! The PDF gives density — the area under it gives probability.
- “Normal = always appropriate.” → Many real-world distributions are non-Gaussian — always test assumptions.
- “Z-scores only work for Gaussian data.” → You can compute Z-scores anywhere, but their probabilistic meaning holds best under Normality.
🧩 Step 7: Mini Summary
🧠 What You Learned: Continuous distributions describe smooth, infinite-valued randomness — from flat Uniforms to bell-shaped Gaussians.
⚙️ How It Works: They use PDFs and CDFs to assign probabilities to ranges of values, and the CLT explains why Gaussian shapes dominate nature.
🎯 Why It Matters: Most ML models assume continuous uncertainty — mastering these distributions is key to understanding everything from prediction confidence to model noise.