3.1. Random Variables & Distributions
🪄 Step 1: Intuition & Motivation
Core Idea: Probability is how we describe uncertainty. Data is messy, unpredictable, and random — yet we still want to make sense of it. Random variables and distributions are the mathematical grammar for talking about that randomness.
Simple Analogy: Imagine rolling a die. The die’s value is uncertain — that’s your random variable. The pattern of how often each number appears — that’s your distribution. In data science, everything — from predicting customer behavior to noise in measurements — behaves like dice rolls drawn from hidden distributions.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
A random variable is a quantity whose value is uncertain but follows some probabilistic rule.
- Discrete random variable — takes specific, countable values (e.g., number of clicks, dice rolls).
- Continuous random variable — can take any value within an interval (e.g., height, temperature).
Every random variable follows a distribution — a pattern describing how likely each value is. For discrete variables, that’s a PMF (Probability Mass Function). For continuous variables, it’s a PDF (Probability Density Function).
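To make this concrete, here is a minimal sketch (assuming NumPy is available) that simulates both kinds of random variable and tabulates how often each outcome appears:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Discrete random variable: 10,000 rolls of a fair six-sided die.
rolls = rng.integers(1, 7, size=10_000)
values, counts = np.unique(rolls, return_counts=True)
empirical_pmf = counts / rolls.size
print(dict(zip(values.tolist(), empirical_pmf.round(3))))  # each value near 1/6 ≈ 0.167

# Continuous random variable: heights drawn from a Normal distribution.
heights = rng.normal(loc=170, scale=8, size=10_000)
# No single height has positive probability; we can only ask about ranges.
print(np.mean((heights >= 165) & (heights <= 175)))  # fraction in [165, 175] cm
```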
Why It Works This Way
Think of probability as a description of belief about what could happen.
- PMF gives probabilities for exact outcomes: $P(X = k)$
- PDF gives a “density” for intervals — we can’t talk about exact probabilities for continuous values (since $P(X = x) = 0$), only ranges.
To compute probabilities in the continuous case, we integrate the PDF:
$$ P(a \le X \le b) = \int_a^b f(x)\,dx $$
The Cumulative Distribution Function (CDF) accumulates these probabilities up to a point:
$$ F(x) = P(X \le x) $$
How It Fits in ML Thinking
In data science and ML:
- Distributions model the noise or uncertainty in data.
- Random variables represent uncertain features or parameters.
- Many loss functions (e.g., cross-entropy, log loss) come from probabilistic reasoning.
- Understanding Normal distributions helps interpret model outputs, and Bernoulli/Binomial/Poisson distributions describe discrete event data.
Probabilistic intuition is crucial for model evaluation — it helps you interpret uncertainty, design better tests, and handle noisy or incomplete data.
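As one example of that probabilistic reasoning, binary cross-entropy (log loss) is simply the negative log-likelihood under a Bernoulli model. A small sketch, assuming NumPy:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative average Bernoulli log-likelihood of the observed labels."""
    p = np.clip(p_pred, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 1])            # observed binary outcomes
p = np.array([0.9, 0.2, 0.7, 0.6])    # model's predicted P(y = 1)
print(binary_cross_entropy(y, p))     # lower when predictions match outcomes
```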
📐 Step 3: Mathematical Foundation
Probability Mass Function (PMF)
For a discrete random variable $X$:
$$ P(X = x_i) = p_i, \quad \text{where } \sum_i p_i = 1 $$
Example: For a fair 6-sided die, $P(X = k) = \frac{1}{6}$ for $k = 1, 2, \dots, 6$.
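A quick sanity check of that PMF (NumPy assumed), which also previews the expected value and variance defined below:

```python
import numpy as np

faces = np.arange(1, 7)
pmf = np.full(6, 1 / 6)                  # fair die: P(X = k) = 1/6

print(pmf.sum())                         # probabilities sum to 1
mean = np.sum(faces * pmf)               # E[X] = sum_k k * P(X = k) = 3.5
var = np.sum((faces - mean) ** 2 * pmf)  # Var(X) ≈ 2.917
print(mean, var)
```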
Probability Density Function (PDF)
For a continuous variable $X$, the PDF $f(x)$ satisfies:
$$ P(a \le X \le b) = \int_a^b f(x)\,dx, \quad \text{and } \int_{-\infty}^{\infty} f(x)\,dx = 1 $$
Example: For a Normal distribution:
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
Cumulative Distribution Function (CDF)
For a continuous variable, $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$; the CDF increases monotonically from 0 to 1 as $x$ grows.
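A small check, assuming SciPy is available, that integrating the Normal PDF over an interval matches the CDF difference $F(b) - F(a)$:

```python
from scipy.integrate import quad
from scipy.stats import norm

dist = norm(loc=0, scale=1)               # standard Normal: mu = 0, sigma = 1

# P(a <= X <= b) by numerically integrating the PDF ...
area, _ = quad(dist.pdf, -1, 1)
# ... equals the CDF difference F(b) - F(a).
print(area, dist.cdf(1) - dist.cdf(-1))   # both ≈ 0.6827

# A single point carries zero probability for a continuous variable.
print(quad(dist.pdf, 0.5, 0.5)[0])        # 0.0
```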
Expected Value & Variance
Expected value (mean):
$$ E[X] = \sum_i x_i P(X = x_i) \quad \text{(discrete)} \quad \text{or} \quad E[X] = \int x f(x)\,dx \quad \text{(continuous)} $$
Variance:
$$ Var(X) = E[(X - E[X])^2] $$
🔢 Common Distributions
Bernoulli Distribution
Models a binary outcome: success (1) or failure (0).
$$ P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0,1\} $$
Mean: $E[X] = p$, Variance: $Var(X) = p(1-p)$
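A quick check with SciPy's `bernoulli` (SciPy is an assumption here; any probability library would do):

```python
from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)

print(X.pmf(1), X.pmf(0))     # p and 1 - p
print(X.mean(), X.var())      # p = 0.3 and p*(1-p) = 0.21
```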
Binomial Distribution
Sum of $n$ independent Bernoulli trials:
$$ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} $$
Mean: $E[X] = np$, Variance: $Var(X) = np(1-p)$
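The same kind of check for the Binomial, again using SciPy as an illustration:

```python
from scipy.stats import binom

n, p = 10, 0.3
X = binom(n, p)

print(X.pmf(3))               # P(exactly 3 successes in 10 trials)
print(X.mean(), X.var())      # n*p = 3.0 and n*p*(1-p) = 2.1
```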
Poisson Distribution
Models counts of rare events over a fixed time or space:
$$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} $$
Mean = Variance = $\lambda$
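And for the Poisson (SciPy assumed):

```python
from scipy.stats import poisson

lam = 4                       # average event rate per interval
X = poisson(lam)

print(X.pmf(2))               # P(exactly 2 events)
print(X.mean(), X.var())      # both equal lambda = 4
```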
Normal (Gaussian) Distribution
The famous bell curve:
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
Mean = $\mu$, Variance = $\sigma^2$
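A short SciPy sketch verifying the familiar 68-95-99.7 rule, i.e., how much probability mass a Normal distribution places within 1, 2, and 3 standard deviations of its mean:

```python
from scipy.stats import norm

mu, sigma = 100, 15
X = norm(loc=mu, scale=sigma)

for k in (1, 2, 3):
    print(k, X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma))  # ≈ 0.683, 0.954, 0.997
```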
🧠 Step 4: Key Ideas
- Random variables quantify uncertainty.
- PMF (discrete) and PDF (continuous) describe probability structure.
- Expected value and variance summarize distributions.
- Common distributions (Bernoulli, Binomial, Poisson, Normal) cover most real-world randomness.
- The Central Limit Theorem (CLT) ensures averages of many independent random variables tend toward a Normal distribution — even if the original variables weren’t Normal (see the sketch below).
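A quick CLT illustration (NumPy assumed): averaging 50 draws from a skewed Exponential distribution already produces nearly Normal behavior in the sample means:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 10,000 experiments, each averaging 50 draws from a skewed Exponential(1).
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(sample_means.mean())    # close to the underlying mean, 1.0
print(sample_means.std())     # close to 1/sqrt(50) ≈ 0.141, as the CLT predicts
```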
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Provides the foundation for statistical inference and model uncertainty.
- Connects real-world randomness to mathematical structure.
- Enables probabilistic modeling and hypothesis testing.
Limitations:
- Real data may not follow ideal distributions.
- Assumptions of independence and identical distribution (i.i.d.) are often violated.
- Estimating parameters (e.g., $\mu$, $\sigma$) can be noisy for small samples.
🚧 Step 6: Common Misunderstandings
- Myth: The Normal distribution applies to everything. → Truth: It emerges from CLT but not all data is Normal (especially skewed or heavy-tailed).
- Myth: PDF gives probability directly. → Truth: It gives density; probability requires integrating over an interval.
- Myth: Independence isn’t a big deal. → Truth: Violating i.i.d. assumptions breaks statistical inference (confidence intervals, CLT).
🧩 Step 7: Mini Summary
🧠 What You Learned: Random variables capture uncertainty, and distributions describe how likely outcomes are.
⚙️ How It Works: PMFs and PDFs define probability structure; expected values summarize it. Common distributions like Normal, Binomial, and Poisson model real-world randomness.
🎯 Why It Matters: Probability is the language of data science — without it, we can’t quantify uncertainty or trust our models’ predictions.