3.1. Sampling & Estimation
🪄 Step 1: Intuition & Motivation
Core Idea: In real life, we rarely know everything about a population — we only get samples. From these small pieces, we try to estimate the true parameters (like the average income in a city or the true accuracy of a model).
Sampling & Estimation is how we turn limited data into reliable conclusions about the bigger world.
Simple Analogy: Imagine tasting a spoonful of soup to decide if it needs salt.
- That spoon is your sample.
- The whole pot is the population.
If the soup is well stirred (randomly mixed), your sample gives a good estimate of the entire flavor; that's the intuition behind sampling theory.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
When we collect a random sample, we compute statistics (like sample mean or variance) to estimate parameters (like population mean or variance).
These sample statistics are themselves random variables — if you took another sample, they’d likely differ slightly.
To study their behavior, we define sampling distributions — the probability distributions of these statistics over repeated samples.
From these, we build:
- Point estimates: one best guess (e.g., sample mean).
- Interval estimates: a range of plausible values (e.g., confidence interval).
Why It Works This Way
Random sampling ensures every member of the population has an equal chance of being included in the sample.
This randomness makes our estimates unbiased — on average, they’ll hit the true value. Larger samples reduce variability (like stirring the soup more thoroughly), leading to more stable and accurate estimates.
How It Fits in ML Thinking
Sampling and estimation are the bridge between data and inference:
- Every model’s accuracy, loss, or metric is an estimate based on samples.
- The Law of Large Numbers (LLN) explains why larger datasets stabilize results.
- The Central Limit Theorem (CLT) justifies assuming normality in model errors and confidence intervals.
Without sampling theory, we couldn’t trust our models’ performance or interpret confidence in predictions.
📐 Step 3: Mathematical Foundation
📊 1. Sample Statistics
Sample Mean and Variance
Given a sample $\{x_1, x_2, \dots, x_n\}$:
Sample Mean:
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
This estimates the population mean ($\mu$).
Sample Variance:
$$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{X})^2 $$
Dividing by $n - 1$ (Bessel's correction) removes the bias the estimator would have if we divided by $n$.
Sample Standard Deviation:
$$ s = \sqrt{s^2} $$
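A minimal NumPy sketch of these three statistics. The sample values are made up purely for illustration; `ddof=1` tells NumPy to divide by $n - 1$ (Bessel's correction):

```python
import numpy as np

# Hypothetical sample of 8 observations (values are illustrative only)
x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4])
n = len(x)

sample_mean = x.mean()        # X̄ = (1/n) * Σ x_i
sample_var = x.var(ddof=1)    # s² with Bessel's correction (divide by n - 1)
sample_std = x.std(ddof=1)    # s = sqrt(s²)

print(f"n = {n}")
print(f"sample mean = {sample_mean:.3f}")
print(f"sample var  = {sample_var:.3f}")
print(f"sample std  = {sample_std:.3f}")
```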
📈 2. Sampling Distributions & Standard Error
Concept & Formula
If you took many samples, each sample mean $\bar{X}$ would be slightly different. Their collective behavior forms the sampling distribution of $\bar{X}$.
The Standard Error (SE) measures the spread of that sampling distribution:
$$ SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} $$
- $\sigma$: population standard deviation
- $n$: sample size
Since $\sigma$ is often unknown, we estimate it with $s$:
$$ SE_{\bar{X}} \approx \frac{s}{\sqrt{n}} $$
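To see the standard error in action, here is a small simulation sketch. The Normal(10, 2) population and the sample size of 50 are arbitrary assumptions; the point is that $s/\sqrt{n}$ computed from one sample tracks the actual spread of many sample means:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 50          # assumed population parameters and sample size

# One sample: estimate the standard error as s / sqrt(n)
sample = rng.normal(mu, sigma, size=n)
se_estimated = sample.std(ddof=1) / np.sqrt(n)

# Many samples: the empirical standard deviation of 10,000 sample means
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
se_empirical = means.std(ddof=1)

print(f"theoretical SE = {sigma / np.sqrt(n):.4f}")   # σ / √n
print(f"estimated SE   = {se_estimated:.4f}")         # s / √n from a single sample
print(f"empirical SE   = {se_empirical:.4f}")         # spread of the sample means
```

All three values should land near $2/\sqrt{50} \approx 0.283$.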
🔄 3. Law of Large Numbers (LLN)
Statement & Meaning
The LLN states that as sample size $n$ increases, the sample mean $\bar{X}$ converges to the population mean $\mu$.
Formally:
$$ \lim_{n \to \infty} P(|\bar{X} - \mu| < \epsilon) = 1 $$
for any small $\epsilon > 0$.
In plain words: The more data you collect, the closer your average gets to the truth.
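A quick simulation sketch of the LLN, assuming rolls of a fair six-sided die (true mean $\mu = 3.5$): the running average drifts toward 3.5 as the number of rolls grows.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 3.5                                    # mean of a fair six-sided die

rolls = rng.integers(1, 7, size=100_000)           # simulated rolls: integers 1..6
running_mean = rolls.cumsum() / np.arange(1, rolls.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    error = abs(running_mean[n - 1] - true_mean)
    print(f"n = {n:>6}: running mean = {running_mean[n - 1]:.4f}  |error| = {error:.4f}")
```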
🔔 4. Central Limit Theorem (CLT)
Statement & Meaning
The CLT states that for large $n$, the sampling distribution of $\bar{X}$ approximates a Normal distribution — no matter the original population’s shape.
$$ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \rightarrow N(0,1) $$
Key takeaway: Even if the population is non-normal, the distribution of sample means becomes approximately normal as $n$ grows.
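A sketch of the CLT using a deliberately skewed population, here an Exponential(1) distribution (mean 1, standard deviation 1); the sample size and number of repeats are assumptions for illustration. The raw data are strongly skewed, yet the standardized sample means behave much like a standard normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, num_samples = 40, 5_000                     # assumed sample size and number of repeats

# Skewed population: Exponential(1) has mean 1 and standard deviation 1
samples = rng.exponential(scale=1.0, size=(num_samples, n))
sample_means = samples.mean(axis=1)

# Standardize the means: (X̄ - μ) / (σ / √n)
z = (sample_means - 1.0) / (1.0 / np.sqrt(n))

print(f"skewness of raw data      = {stats.skew(samples.ravel()):.2f}")  # ≈ 2, far from normal
print(f"skewness of sample means  = {stats.skew(sample_means):.2f}")     # much closer to 0
print(f"std of standardized means = {z.std(ddof=1):.3f}")                # ≈ 1, as N(0,1) predicts
```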
🎯 5. Point vs Interval Estimation
Definition & Comparison
- Point Estimate: A single best guess of a parameter (e.g., $\bar{X}$ estimates $\mu$).
- Interval Estimate: A range of plausible values around the point estimate, often with confidence level (like 95%).
Example:
“We estimate the true mean income is $55,000 ± 2,000, with 95% confidence.”
The interval width depends on:
- Sample size ($n$ ↑ → narrower interval)
- Confidence level (higher confidence → wider interval)
- Data variability ($s$ ↑ → wider interval)
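A minimal sketch of a 95% t-based confidence interval with `scipy.stats`; the synthetic income data and the 95% level are assumptions chosen to mirror the example above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
incomes = rng.normal(55_000, 8_000, size=200)   # hypothetical income sample

n = incomes.size
mean = incomes.mean()                           # point estimate of μ
se = incomes.std(ddof=1) / np.sqrt(n)           # estimated standard error

confidence = 0.95
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)   # critical value of the t-distribution
margin = t_crit * se

print(f"point estimate: {mean:,.0f}")
print(f"95% CI        : ({mean - margin:,.0f}, {mean + margin:,.0f})")
```

Raising `confidence` widens the interval, while increasing `n` narrows it, matching the trade-offs listed above.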
🧠 Step 4: Assumptions or Key Ideas
- Samples must be random and independent.
- Larger $n$ improves stability (LLN) and normal approximation (CLT).
- Estimators such as $\bar{X}$ are unbiased only when the sample is collected without systematic selection bias.
- The CLT holds for almost any population with finite variance.
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Provides the backbone for all inferential statistics.
- Justifies using normal-based confidence intervals in ML metrics.
- Enables robust estimation even from partial data.
Limitations:
- Biased or non-random samples break the theory.
- CLT needs sufficiently large $n$ (depends on population skewness).
- Point estimates hide uncertainty if not paired with intervals.
🚧 Step 6: Common Misunderstandings
- “The sample mean always equals the population mean.” → No — it only approximates it, improving as $n$ grows.
- “The CLT means all data is normal.” → False — it’s the distribution of sample means that becomes normal, not the raw data.
- “Bigger samples always fix bias.” → Wrong — large biased samples still produce biased results.
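A quick sketch of that last point, with an assumed Normal(50, 10) population and a hypothetical selection rule that only ever observes values above 45: the huge biased sample still misses the true mean, while a much smaller random sample does not.

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(50, 10, size=1_000_000)            # assumed population, true mean ≈ 50

# Small but random sample
random_sample = rng.choice(population, size=1_000, replace=False)

# Huge but biased sample: a selection effect only lets us see values above 45
biased_pool = population[population > 45]
biased_sample = rng.choice(biased_pool, size=100_000, replace=False)

print(f"true mean             = {population.mean():.2f}")
print(f"random sample (1k)    = {random_sample.mean():.2f}")   # close to 50
print(f"biased sample (100k)  = {biased_sample.mean():.2f}")   # systematically too high (~55)
```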
🧩 Step 7: Mini Summary
🧠 What You Learned: Sampling and estimation transform limited data into informed population-level insights.
⚙️ How It Works: Through sample means, variances, and the CLT, we connect small observations to large truths.
🎯 Why It Matters: This is the mathematical foundation of all data-driven inference — how we generalize from our data to the world beyond it.