Math for Data Science - Roadmap
🧠 Linear Algebra — The Language of Data
Note
The Top Tech Company Angle: Linear algebra is the foundation of every ML model — from linear regression and PCA to transformers and diffusion models. Interviews test how well you understand data as vectors and matrices, how transformations work, and how you connect geometric intuition to model behavior.
1.1: Vectors & Operations
- Grasp vectors as ordered collections of numbers representing data points or directions in space.
- Master operations — addition, scalar multiplication, dot and cross products — and their geometric meanings.
- Learn vector norms ($L_1$, $L_2$) and their importance in regularization and optimization.
Note: Expect probing questions like “What does cosine similarity actually measure?” or “Why does $L_2$ regularization shrink weights?” Interviewers want to see intuition over formulas.
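A minimal NumPy sketch of these operations (the vectors and their values are arbitrary, chosen only for illustration):

```python
import numpy as np

# Two toy 4-dimensional "data point" vectors (arbitrary values).
a = np.array([1.0, 2.0, 0.0, -1.0])
b = np.array([2.0, 1.0, 1.0, 0.0])

dot = a @ b                      # dot product: sum of elementwise products
l1_norm = np.sum(np.abs(a))      # L1 norm of a
l2_norm = np.linalg.norm(a)      # L2 (Euclidean) norm of a

# Cosine similarity = dot(a, b) / (||a|| * ||b||): it measures the angle
# between the vectors, ignoring their magnitudes.
cos_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))

print(dot, l1_norm, l2_norm, cos_sim)
```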
1.2: Matrix Operations
- Understand matrix representations of datasets ($X \in \mathbb{R}^{n \times d}$).
- Learn multiplication rules, transposes, and the geometric interpretation of linear transformations.
- Practice with identity, diagonal, and orthogonal matrices.
Note: Be prepared to derive matrix-vector gradients like $\nabla_W \|XW - y\|^2$, connecting to backprop intuition.
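As a sanity check on that derivation, a small NumPy sketch (random toy data, assuming the squared-error loss $\|XW - y\|^2$) comparing the analytic gradient $2X^\top(XW - y)$ with a finite-difference estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))   # toy data matrix, X in R^{n x d}
y = rng.normal(size=n)        # toy targets
W = rng.normal(size=d)        # current weights

# Analytic gradient of ||XW - y||^2 with respect to W: 2 X^T (XW - y).
grad = 2 * X.T @ (X @ W - y)

# Finite-difference check of the first coordinate to confirm the formula.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0] += eps
W_minus[0] -= eps
num_grad0 = (np.sum((X @ W_plus - y) ** 2) - np.sum((X @ W_minus - y) ** 2)) / (2 * eps)

print(grad[0], num_grad0)  # the two values should agree closely
```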
1.3: Determinants, Inverses & Rank
- Understand when a matrix is invertible and what rank deficiency implies about multicollinearity.
- Compute determinants conceptually (not just numerically) as the factor by which a transformation scales volume.
- Explore the pseudo-inverse and its role in least-squares regression.
Note: A classic probing question: “What happens when $X^TX$ is not invertible? How do you regularize it?”
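A short NumPy sketch of that failure mode on a deliberately rank-deficient toy matrix (made-up data): the pseudo-inverse still returns the minimum-norm least-squares solution, and a ridge term makes $X^TX$ invertible again:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, 2 * x1])           # second column is a multiple of the first
y = 3 * x1 + rng.normal(scale=0.1, size=n)  # toy targets

print(np.linalg.matrix_rank(X))             # 1: X^T X is singular, a plain inverse fails

# The pseudo-inverse still gives the minimum-norm least-squares solution.
w_pinv = np.linalg.pinv(X) @ y

# Ridge regularization: X^T X + lambda * I is invertible for any lambda > 0.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(w_pinv, w_ridge)
```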
1.4: Eigenvalues, Eigenvectors & SVD
- Understand eigenvectors as directions that a transformation only stretches or shrinks, never rotates.
- Derive the PCA connection: variance maximization and orthogonal projections.
- Study Singular Value Decomposition (SVD) for dimensionality reduction and matrix factorization.
Note: Be ready for follow-ups like: “Why is PCA robust to rotation but not scaling?” or “What’s the geometric meaning of SVD?”
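A small NumPy sketch (random toy data) of the PCA/SVD connection: the eigenvalues of the covariance matrix equal the scaled squared singular values of the centered data matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data with very different variances along each axis.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])
Xc = X - X.mean(axis=0)                  # center the data first

# Route 1: eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# The covariance eigenvalues match S^2 / (n - 1), and the rows of Vt match
# the eigenvectors up to sign.
print(eigvals[::-1])
print(S ** 2 / (len(Xc) - 1))
```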
📈 Calculus — The Engine of Learning
Note
The Top Tech Company Angle: Calculus drives every optimization routine. You must understand how changes in the parameters change the loss. Interviews test your fluency in gradients, partial derivatives, and the chain rule, not your ability to memorize formulas.
2.1: Limits, Continuity & Differentiability
- Review the concepts of continuity and smoothness — vital for understanding loss surfaces.
- Practice identifying non-differentiable points (e.g., ReLU activation).
Note: Expect to explain why ReLU “breaks” differentiability and how subgradients fix it.
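A tiny sketch of ReLU and its subgradient (returning 0 at $x = 0$ is one valid choice from the subgradient set $[0, 1]$, and the convention most frameworks use):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_subgradient(x):
    # The derivative is 1 for x > 0 and 0 for x < 0; at x == 0 any value in
    # [0, 1] is a valid subgradient -- this implementation simply picks 0.
    return (x > 0).astype(float)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(xs))
print(relu_subgradient(xs))
```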
2.2: Derivatives & Gradients
- Understand derivatives as rates of change and gradients as directions of steepest ascent.
- Compute gradients for scalar, vector, and matrix functions using Jacobian and Hessian intuition.
- Connect derivatives to backpropagation in neural networks.
Note: A good probing question: “What happens when your gradient vanishes or explodes? How do you stabilize training?”
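A minimal sketch (assuming the toy function $f(v) = \sum_i v_i^2$) comparing the analytic gradient with a central-difference estimate, the standard way to verify a hand-derived gradient:

```python
import numpy as np

def f(v):
    # Simple scalar function of a vector: f(v) = sum(v_i^2).
    return np.sum(v ** 2)

def analytic_grad(v):
    # Gradient of sum(v_i^2) is 2v: the direction of steepest ascent.
    return 2 * v

def numerical_grad(func, v, eps=1e-6):
    g = np.zeros_like(v)
    for i in range(len(v)):
        v_plus, v_minus = v.copy(), v.copy()
        v_plus[i] += eps
        v_minus[i] -= eps
        g[i] = (func(v_plus) - func(v_minus)) / (2 * eps)
    return g

v = np.array([1.0, -2.0, 3.0])
print(analytic_grad(v))
print(numerical_grad(f, v))   # should match the analytic gradient closely
```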
2.3: Chain Rule & Backpropagation
- Learn the multivariable chain rule step by step and trace computation graphs.
- Understand automatic differentiation (used in PyTorch, TensorFlow).
- Implement simple backprop for linear layers and activations.
Note: They might ask: “Why is computational graph ordering crucial for efficiency in deep learning frameworks?”
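A hand-rolled forward and backward pass for one linear layer followed by ReLU and a mean-squared-error loss (random toy data); each backward line is one application of the chain rule in reverse graph order:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 3))   # batch of 4 inputs with 3 features
W = rng.normal(size=(3, 2))   # weights of a linear layer
y = rng.normal(size=(4, 2))   # targets

# Forward pass: linear layer -> ReLU -> mean squared error.
z = x @ W
h = np.maximum(0.0, z)
loss = np.mean((h - y) ** 2)

# Backward pass, applying the chain rule node by node in reverse order.
dL_dh = 2 * (h - y) / h.size   # gradient of the MSE node
dL_dz = dL_dh * (z > 0)        # ReLU passes gradient only where z > 0
dL_dW = x.T @ dL_dz            # linear layer: dL/dW = x^T (dL/dz)

print(loss, dL_dW.shape)
```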
2.4: Optimization & Convexity
- Study gradient descent, stochastic gradient descent (SGD), and momentum methods.
- Explore convex vs. non-convex functions and why convexity guarantees global minima.
- Understand learning rate schedules and convergence diagnostics.
Note: You’ll often be asked: “Why does Adam sometimes fail to converge? When would you prefer SGD?”
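A sketch of gradient descent with and without momentum on an ill-conditioned convex quadratic (the matrix, learning rate, and momentum coefficient are arbitrary choices for illustration):

```python
import numpy as np

# Convex quadratic f(w) = 0.5 * w^T A w with very different curvatures per axis.
A = np.diag([1.0, 25.0])

def grad(w):
    return A @ w

def descend(lr, beta, steps=100):
    w = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = beta * v + grad(w)   # momentum accumulates past gradients
        w = w - lr * v
    return w

print(descend(lr=0.03, beta=0.0))  # plain gradient descent
print(descend(lr=0.03, beta=0.9))  # momentum: ends closer to the minimum at (0, 0)
```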
🎲 Probability & Statistics — The Language of Uncertainty
Note
The Top Tech Company Angle: This area tests your reasoning about uncertainty, model assumptions, and statistical significance. You must interpret what model outputs mean probabilistically and reason about variance, bias, and distributions.
3.1: Random Variables & Distributions
- Learn discrete vs. continuous random variables.
- Understand PDFs, PMFs, CDFs, and expected values.
- Get comfortable with Normal, Bernoulli, Binomial, and Poisson distributions.
Note: Expect: “Why does the Central Limit Theorem matter for model evaluation?” or “What happens if your data isn’t i.i.d.?”
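A quick simulation of the Central Limit Theorem with made-up exponential data: sample means are approximately Normal around the true mean with spread $\sigma/\sqrt{n}$, even though the underlying distribution is heavily skewed:

```python
import numpy as np

rng = np.random.default_rng(4)

# 10,000 independent samples of size n = 50 from a skewed (exponential) distribution.
samples = rng.exponential(scale=1.0, size=(10_000, 50))

# Each row's mean is one realization of the sample mean.
sample_means = samples.mean(axis=1)

# CLT: the sample means cluster near the true mean (1.0) with standard
# deviation close to sigma / sqrt(n) = 1.0 / sqrt(50).
print(sample_means.mean(), sample_means.std(), 1.0 / np.sqrt(50))
```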
3.2: Expectation, Variance & Covariance
- Derive expectation and variance formulas by hand.
- Understand covariance and correlation — especially their geometric interpretation.
- Apply to multivariate normal distributions.
Note: Interviewers often test whether you can connect covariance matrices to elliptical contours in data visualization.
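A small NumPy sketch (made-up covariance matrix) comparing the empirical covariance and correlation of a correlated 2D Gaussian sample; the covariance matrix is exactly what defines the elliptical contours mentioned above:

```python
import numpy as np

rng = np.random.default_rng(5)

# Sample from a 2D Gaussian with correlated components (toy covariance).
true_cov = np.array([[2.0, 1.2],
                     [1.2, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5_000)

# np.cov and np.corrcoef expect variables in rows, hence the transpose.
emp_cov = np.cov(X.T)
emp_corr = np.corrcoef(X.T)

print(emp_cov)    # close to true_cov
print(emp_corr)   # correlation = covariance rescaled by the standard deviations
```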
3.3: Bayes’ Theorem & Conditional Probability
- Derive Bayes’ theorem and apply it to classification intuition.
- Understand independence, conditional independence, and posterior probability updates.
- Connect this to Naive Bayes and probabilistic graphical models.
Note: Be ready for “How do you interpret $P(A|B)$ in model calibration terms?” or “Why can Naive Bayes work despite unrealistic assumptions?”
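The classic diagnostic-test illustration of Bayes' theorem, with made-up numbers; the point is that a strong likelihood combined with a low prior can still produce a modest posterior:

```python
# All probabilities below are invented for illustration.
p_disease = 0.01            # prior P(D)
p_pos_given_disease = 0.95  # likelihood P(+ | D)
p_pos_given_healthy = 0.05  # false-positive rate P(+ | not D)

# Law of total probability: P(+) = P(+ | D) P(D) + P(+ | not D) P(not D).
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos)  # ~0.16, far below the test's 0.95 sensitivity
```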
3.4: Sampling & Estimation
- Learn Maximum Likelihood Estimation (MLE) and its geometric interpretation.
- Study bias-variance tradeoff, consistency, and efficiency.
- Explore confidence intervals and hypothesis testing basics.
Note: Expect trade-off discussions like “Why might a biased estimator be preferable in high-variance regimes?”
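A minimal sketch of Gaussian maximum likelihood estimates plus a CLT-based 95% confidence interval for the mean, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # toy sample

# Gaussian MLE: the sample mean, and the variance with a 1/n factor
# (slightly biased, unlike the 1/(n-1) sample variance).
mu_mle = data.mean()
var_mle = np.mean((data - mu_mle) ** 2)

# Approximate 95% confidence interval for the mean via the CLT.
se = data.std(ddof=1) / np.sqrt(len(data))
ci = (mu_mle - 1.96 * se, mu_mle + 1.96 * se)

print(mu_mle, var_mle, ci)
```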
⚙️ Information Theory — The Mathematics of Model Learning
Note
The Top Tech Company Angle: Used to evaluate your understanding of model uncertainty and entropy minimization. You’ll encounter these in loss functions (cross-entropy), mutual information, and regularization theory.
4.1: Entropy, Cross-Entropy & KL Divergence
- Derive Shannon entropy and its intuition as “average information content.”
- Understand cross-entropy loss and its connection to maximum likelihood.
- Explore KL divergence for comparing probability distributions.
Note: Common probing question: “Why does minimizing cross-entropy correspond to maximizing likelihood?”
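A short sketch (made-up distributions $p$ and $q$) verifying the identity $H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$, which is the algebra behind that connection:

```python
import numpy as np

def entropy(p):
    # Average information content of p, in nats.
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    # Average code length for samples from p using a code optimized for q.
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])   # "true" distribution (made up)
q = np.array([0.5, 0.3, 0.2])   # model distribution (made up)

# Identity: cross_entropy(p, q) = entropy(p) + KL(p || q).
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```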
4.2: Mutual Information
- Learn how mutual information measures dependence between variables.
- Understand its role in feature selection and representation learning.
Note: Expect a discussion around “Why is mutual information non-negative?” or “How do VAEs use KL divergence?”
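A sketch computing mutual information directly from a made-up joint distribution over two binary variables:

```python
import numpy as np

# Joint distribution P(X, Y) over two binary variables (made-up numbers, sums to 1).
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

px = joint.sum(axis=1, keepdims=True)   # marginal P(X)
py = joint.sum(axis=0, keepdims=True)   # marginal P(Y)

# I(X; Y) = sum_{x,y} P(x,y) log( P(x,y) / (P(x) P(y)) ).
# It is always >= 0 and equals 0 exactly when X and Y are independent.
mi = np.sum(joint * np.log(joint / (px * py)))
print(mi)
```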
📊 Statistical Learning Foundations
Note
The Top Tech Company Angle: This section ties the math together into modeling intuition — from loss function derivations to regularization effects.
5.1: Bias-Variance Tradeoff
- Derive the decomposition $\text{MSE} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$.
- Understand implications for underfitting vs. overfitting.
Note: Interviewers love when you explain this with intuitive graphs and real model examples.
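A simulation sketch of the decomposition (made-up sinusoidal ground truth, polynomial fits of two degrees): refit on many fresh training sets and measure the squared bias and variance of the prediction at a single test point:

```python
import numpy as np

rng = np.random.default_rng(7)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_test = 0.3                          # evaluate bias and variance at one point

def bias_variance(degree, trials=500):
    preds = []
    for _ in range(trials):           # a fresh training set each trial
        x = rng.uniform(size=20)
        y = true_f(x) + rng.normal(scale=0.3, size=20)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias_sq = (preds.mean() - true_f(x_test)) ** 2
    return bias_sq, preds.var()

print("degree 1:", bias_variance(1))  # high bias, low variance (underfits)
print("degree 9:", bias_variance(9))  # low bias, high variance (overfits)
```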
5.2: Regularization
- Understand $L_1$ (Lasso) and $L_2$ (Ridge) regularization mathematically and geometrically.
- Connect to sparsity and weight decay.
Note: “Why does $L_1$ induce sparsity while $L_2$ doesn’t?” is a top-tier conceptual question.
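One way to see the difference is through the per-coordinate shrinkage each penalty induces. The sketch below applies ridge-style scaling and the lasso soft-thresholding (proximal) operator to an arbitrary weight vector, a simplified isolated-weight view rather than a full regression fit:

```python
import numpy as np

w = np.array([3.0, 0.4, -0.1, -2.5, 0.05])   # toy weight vector
lam = 0.5                                    # regularization strength

# L2 (ridge) shrinkage: every weight is scaled toward zero but stays nonzero.
w_ridge = w / (1 + lam)

# L1 (lasso) soft-thresholding: weights with magnitude below lam become exactly zero.
w_lasso = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

print(w_ridge)
print(w_lasso)   # note the exact zeros -> sparsity
```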
5.3: Gradient-Based Optimization in Practice
- Explore convergence challenges, learning rate tuning, and adaptive optimizers.
- Study batch normalization and gradient clipping as numerical stabilizers.
Note: Discuss numerical precision trade-offs in large-scale model training.
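A minimal sketch of global-norm gradient clipping (the helper name and toy gradients are made up): rescale all gradients together when their combined $L_2$ norm exceeds a threshold, keeping the update direction while bounding its size:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Rescale the whole gradient list if its combined L2 norm exceeds max_norm.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # toy per-layer gradients
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm, clipped)   # norm is 13.0; the clipped gradients have norm 5.0
```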
5.4: PCA, SVD & Dimensionality Reduction
- Derive PCA step-by-step from variance maximization.
- Understand the SVD decomposition and its computational benefits.
Note: Expect practical questions like “How would you implement PCA from scratch using NumPy?” or “Why do we center data first?”
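A sketch of PCA from scratch with NumPy on random correlated toy data (the pca helper and the data are made up for illustration); centering comes first because the principal directions are defined relative to the data mean:

```python
import numpy as np

def pca(X, k):
    # 1. Center the data: without centering, the first component mostly
    #    points at the data's offset from the origin rather than its spread.
    Xc = X - X.mean(axis=0)
    # 2. SVD of the centered matrix; the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    explained_variance = S[:k] ** 2 / (len(X) - 1)
    # 3. Project the centered data onto the top-k components.
    return Xc @ components.T, components, explained_variance

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
Z, comps, var = pca(X, k=2)
print(Z.shape, var)
```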