3.1. Sampling & Estimation
🪄 Step 1: Intuition & Motivation
Core Idea: In real life, we rarely know everything about a population — we only get samples. From these small pieces, we try to estimate the true parameters (like the average income in a city or the true accuracy of a model).
Sampling & Estimation is how we turn limited data into reliable conclusions about the bigger world.
Simple Analogy: Imagine tasting a spoonful of soup to decide if it needs salt.
- That spoon is your sample.
- The whole pot is the population.
If the soup is well stirred (randomly mixed), your sample gives a good estimate of the entire flavor; that's the intuition behind sampling theory.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
When we collect a random sample, we compute statistics (like sample mean or variance) to estimate parameters (like population mean or variance).
These sample statistics are themselves random variables — if you took another sample, they’d likely differ slightly.
To study their behavior, we define sampling distributions — the probability distributions of these statistics over repeated samples.
From these, we build:
- Point estimates: one best guess (e.g., sample mean).
- Interval estimates: a range of plausible values (e.g., confidence interval).
Why It Works This Way
Random sampling ensures every member of the population has an equal chance of being included in the sample.
This randomness makes our estimates unbiased — on average, they’ll hit the true value. Larger samples reduce variability (like stirring the soup more thoroughly), leading to more stable and accurate estimates.
How It Fits in ML Thinking
Sampling and estimation are the bridge between data and inference:
- Every model’s accuracy, loss, or metric is an estimate based on samples.
- The Law of Large Numbers (LLN) explains why larger datasets stabilize results.
- The Central Limit Theorem (CLT) justifies assuming normality in model errors and confidence intervals.
Without sampling theory, we couldn’t trust our models’ performance or interpret confidence in predictions.
📐 Step 3: Mathematical Foundation
📊 1. Sample Statistics
Sample Mean and Variance
Given a sample $\{x_1, x_2, \dots, x_n\}$:
Sample Mean:
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
This estimates the population mean ($\mu$).
Sample Variance:
$$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{X})^2 $$
Dividing by $n - 1$ (Bessel's correction) removes the bias the estimator would have if we divided by $n$.
Sample Standard Deviation:
$$ s = \sqrt{s^2} $$
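A minimal NumPy sketch of these three statistics. The sample values are made up purely for illustration; `ddof=1` tells NumPy to divide by $n - 1$ (Bessel's correction):

```python
import numpy as np

# Hypothetical sample of 8 observations (values are illustrative only)
x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4])
n = len(x)

sample_mean = x.mean()        # X̄ = (1/n) * Σ x_i
sample_var = x.var(ddof=1)    # s² with Bessel's correction (divide by n - 1)
sample_std = x.std(ddof=1)    # s = sqrt(s²)

print(f"n = {n}")
print(f"sample mean = {sample_mean:.3f}")
print(f"sample var  = {sample_var:.3f}")
print(f"sample std  = {sample_std:.3f}")
```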
📈 2. Sampling Distributions & Standard Error
Concept & Formula
If you took many samples, each sample mean $\bar{X}$ would be slightly different. Their collective behavior forms the sampling distribution of $\bar{X}$.
The Standard Error (SE) measures the spread of that sampling distribution:
$$ SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} $$
- $\sigma$: population standard deviation
- $n$: sample size
Since $\sigma$ is often unknown, we estimate it with $s$:
$$ SE_{\bar{X}} \approx \frac{s}{\sqrt{n}} $$
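To see the standard error in action, here is a small simulation sketch. The Normal(10, 2) population and the sample size of 50 are arbitrary assumptions; the point is that $s/\sqrt{n}$ computed from one sample tracks the actual spread of many sample means:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 50          # assumed population parameters and sample size

# One sample: estimate the standard error as s / sqrt(n)
sample = rng.normal(mu, sigma, size=n)
se_estimated = sample.std(ddof=1) / np.sqrt(n)

# Many samples: the empirical standard deviation of 10,000 sample means
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
se_empirical = means.std(ddof=1)

print(f"theoretical SE = {sigma / np.sqrt(n):.4f}")   # σ / √n
print(f"estimated SE   = {se_estimated:.4f}")         # s / √n from a single sample
print(f"empirical SE   = {se_empirical:.4f}")         # spread of the sample means
```

All three values should land near $2/\sqrt{50} \approx 0.283$.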
🔄 3. Law of Large Numbers (LLN)
Statement & Meaning
The LLN states that as sample size $n$ increases, the sample mean $\bar{X}$ converges to the population mean $\mu$.
Formally:
$$ \lim_{n \to \infty} P(|\bar{X} - \mu| < \epsilon) = 1 $$
for any small $\epsilon > 0$.
In plain words: The more data you collect, the closer your average gets to the truth.
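A quick simulation sketch of the LLN, assuming rolls of a fair six-sided die (true mean $\mu = 3.5$): the running average drifts toward 3.5 as the number of rolls grows.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 3.5                                    # mean of a fair six-sided die

rolls = rng.integers(1, 7, size=100_000)           # simulated rolls: integers 1..6
running_mean = rolls.cumsum() / np.arange(1, rolls.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    error = abs(running_mean[n - 1] - true_mean)
    print(f"n = {n:>6}: running mean = {running_mean[n - 1]:.4f}  |error| = {error:.4f}")
```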
🔔 4. Central Limit Theorem (CLT)
Statement & Meaning
The CLT states that for large $n$, the sampling distribution of $\bar{X}$ approximates a Normal distribution — no matter the original population’s shape.
$$ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \rightarrow N(0,1) $$
Key takeaway: Even if the population is non-normal, the distribution of sample means becomes approximately normal as $n$ grows.
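A sketch of the CLT using a deliberately skewed population, here an Exponential(1) distribution (mean 1, standard deviation 1); the sample size and number of repeats are assumptions for illustration. The raw data are strongly skewed, yet the standardized sample means behave much like a standard normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, num_samples = 40, 5_000                     # assumed sample size and number of repeats

# Skewed population: Exponential(1) has mean 1 and standard deviation 1
samples = rng.exponential(scale=1.0, size=(num_samples, n))
sample_means = samples.mean(axis=1)

# Standardize the means: (X̄ - μ) / (σ / √n)
z = (sample_means - 1.0) / (1.0 / np.sqrt(n))

print(f"skewness of raw data      = {stats.skew(samples.ravel()):.2f}")  # ≈ 2, far from normal
print(f"skewness of sample means  = {stats.skew(sample_means):.2f}")     # much closer to 0
print(f"std of standardized means = {z.std(ddof=1):.3f}")                # ≈ 1, as N(0,1) predicts
```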
🎯 5. Point vs Interval Estimation
Definition & Comparison
- Point Estimate: A single best guess of a parameter (e.g., $\bar{X}$ estimates $\mu$).
- Interval Estimate: A range of plausible values around the point estimate, often with confidence level (like 95%).
Example:
“We estimate the true mean income is $55,000 ± 2,000, with 95% confidence.”
The interval width depends on:
- Sample size ($n$ ↑ → narrower interval)
- Confidence level (higher confidence → wider interval)
- Data variability ($s$ ↑ → wider interval)
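A minimal sketch of a 95% t-based confidence interval with `scipy.stats`; the synthetic income data and the 95% level are assumptions chosen to mirror the example above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
incomes = rng.normal(55_000, 8_000, size=200)   # hypothetical income sample

n = incomes.size
mean = incomes.mean()                           # point estimate of μ
se = incomes.std(ddof=1) / np.sqrt(n)           # estimated standard error

confidence = 0.95
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)   # critical value of the t-distribution
margin = t_crit * se

print(f"point estimate: {mean:,.0f}")
print(f"95% CI        : ({mean - margin:,.0f}, {mean + margin:,.0f})")
```

Raising `confidence` widens the interval, while increasing `n` narrows it, matching the trade-offs listed above.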
🧠 Step 4: Assumptions or Key Ideas
- Samples must be random and independent.
- Larger $n$ improves stability (LLN) and normal approximation (CLT).
- Estimators such as $\bar{X}$ are unbiased only when the sample is collected without systematic selection bias.
- The CLT holds for almost any population with finite variance.
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Provides the backbone for all inferential statistics.
- Justifies using normal-based confidence intervals in ML metrics.
- Enables robust estimation even from partial data.
Limitations:
- Biased or non-random samples break the theory.
- CLT needs sufficiently large $n$ (depends on population skewness).
- Point estimates hide uncertainty if not paired with intervals.
🚧 Step 6: Common Misunderstandings
- “The sample mean always equals the population mean.” → No — it only approximates it, improving as $n$ grows.
- “The CLT means all data is normal.” → False — it’s the distribution of sample means that becomes normal, not the raw data.
- “Bigger samples always fix bias.” → Wrong — large biased samples still produce biased results.
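A quick sketch of that last point, with an assumed Normal(50, 10) population and a hypothetical selection rule that only ever observes values above 45: the huge biased sample still misses the true mean, while a much smaller random sample does not.

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(50, 10, size=1_000_000)            # assumed population, true mean ≈ 50

# Small but random sample
random_sample = rng.choice(population, size=1_000, replace=False)

# Huge but biased sample: a selection effect only lets us see values above 45
biased_pool = population[population > 45]
biased_sample = rng.choice(biased_pool, size=100_000, replace=False)

print(f"true mean             = {population.mean():.2f}")
print(f"random sample (1k)    = {random_sample.mean():.2f}")   # close to 50
print(f"biased sample (100k)  = {biased_sample.mean():.2f}")   # systematically too high (~55)
```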
🧩 Step 7: Mini Summary
🧠 What You Learned: Sampling and estimation transform limited data into informed population-level insights.
⚙️ How It Works: Through sample means, variances, and the CLT, we connect small observations to large truths.
🎯 Why It Matters: This is the mathematical foundation of all data-driven inference — how we generalize from our data to the world beyond it.