1.4. Eigenvalues, Eigenvectors & SVD
🪄 Step 1: Intuition & Motivation
Core Idea: Eigenvalues and eigenvectors reveal the hidden geometry of linear transformations — how they stretch, shrink, or flip space. And Singular Value Decomposition (SVD) generalizes that idea to any matrix, letting us understand and simplify data transformations in all directions.
Simple Analogy: Imagine pushing and twisting a lump of clay (your dataset). Some directions stretch the clay, some compress it, and some stay perfectly aligned. Those special “stay-aligned” directions are eigenvectors, and the amount of stretch or shrink along them are the eigenvalues.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
A linear transformation (matrix) acts like a geometric machine:
$$ A \mathbf{v} = \lambda \mathbf{v} $$
Here:
- $\mathbf{v}$ is an eigenvector (the direction that doesn’t rotate under $A$).
- $\lambda$ is an eigenvalue (how much $A$ scales that direction).
So, while most vectors change direction under $A$, eigenvectors are the “steady directions” that only stretch or compress.
Why It Works This Way
If you think of $A$ as a transformation, then applying it to different vectors changes their direction — except for some privileged ones. Those privileged directions (eigenvectors) mark where $A$ acts purely as scaling.
Example: For a 2×2 transformation matrix that stretches space horizontally twice as much as vertically,
- one eigenvector aligns horizontally (scaled by 2),
- another aligns vertically (scaled by 1).
Every other direction is a mix of these two, so it changes direction under the transformation.
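A minimal NumPy sketch of this example; the concrete matrix `[[2, 0], [0, 1]]` is an assumed choice that matches the description above:

```python
import numpy as np

# Assumed concrete matrix: stretch x by 2, leave y unchanged.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)               # [2. 1.]
print(eigenvectors)              # columns [1, 0] and [0, 1] -- the axes

# Eigenvector: direction is preserved, only the length changes.
v = eigenvectors[:, 0]
print(A @ v, eigenvalues[0] * v)   # both are [2. 0.]

# Generic vector: direction changes.
w = np.array([1.0, 1.0])
print(A @ w)                     # [2. 1.] -- no longer parallel to [1, 1]
```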
How It Fits in ML Thinking
Eigenvalues and eigenvectors appear everywhere in data science:
- Covariance matrices (PCA): Eigenvectors define principal components (directions of maximum variance).
- Graph analysis: Eigenvalues of a graph's adjacency or Laplacian matrix describe its connectivity and stability.
- Optimization: The Hessian’s eigenvalues tell you about curvature (how steep or flat a region is).
- Neural networks: Weight matrices’ singular values affect gradient flow and training stability.
📐 Step 3: Mathematical Foundation
Eigenvalue–Eigenvector Equation
To find them, solve:
$$ (A - \lambda I)\mathbf{v} = 0 $$
Non-trivial solutions exist only when:
$$ \det(A - \lambda I) = 0 $$
This is the characteristic equation; its roots are the eigenvalues $\lambda$.
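As a quick numeric check (the 2×2 matrix below is an arbitrary example), NumPy can build the characteristic polynomial $\det(A - \lambda I)$ and confirm that its roots are exactly the eigenvalues:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# Coefficients of det(A - lambda*I), highest power first:
# lambda^2 - 7*lambda + 10
coeffs = np.poly(A)
print(coeffs)                   # [ 1. -7. 10.]

# The roots of the characteristic polynomial are the eigenvalues.
print(np.roots(coeffs))         # [5. 2.]
print(np.linalg.eigvals(A))     # [5. 2.] (possibly in a different order)
```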
Geometric Meaning
If $A$ represents a transformation of space:
- Eigenvectors = invariant directions (arrows that stay along the same line).
- Eigenvalues = stretch or shrink factor along those directions.
So, in a 2D space:
- $|\lambda| > 1$ → stretches that direction.
- $0 < |\lambda| < 1$ → compresses it.
- $\lambda < 0$ → flips it (and still scales it by $|\lambda|$).
Imagine arrows drawn in every direction. Only a few arrows keep pointing the same way after $A$ acts — those are eigenvectors.
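A tiny sketch of these three cases, using an assumed diagonal matrix whose eigenvalues are simply its diagonal entries:

```python
import numpy as np

# Eigenvalues 1.5 (stretch), 0.5 (compress), -1.0 (flip); eigenvectors are the axes.
A = np.diag([1.5, 0.5, -1.0])

for i, lam in enumerate(np.diag(A)):
    v = np.eye(3)[i]            # i-th standard basis vector is an eigenvector
    print(f"lambda = {lam:+.1f}:  v = {v} -> A @ v = {A @ v}")
# lambda = +1.5 stretches, +0.5 compresses, -1.0 flips the direction
```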
Connection to PCA (Variance Maximization)
PCA finds directions (principal components) where data variance is maximized.
- Compute the covariance matrix of the mean-centered data $X$: $$ \Sigma = \frac{1}{n} X^T X $$
- Find eigenvectors $v_i$ and eigenvalues $\lambda_i$: $$ \Sigma v_i = \lambda_i v_i $$
- The eigenvector with the largest $\lambda_i$ points along the direction of maximum variance.
Hence, PCA = eigen-decomposition of the covariance matrix.
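A minimal sketch of these three steps in NumPy. The synthetic data, the 45° rotation, and the $1/n$ convention are assumptions for illustration; samples are the rows of `X`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: large spread along one axis, small along the other, rotated by 45 degrees.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(500, 2)) * [3.0, 0.5]) @ R.T

# 1. Covariance of the mean-centered data.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(Xc)

# 2. Eigen-decomposition (eigh handles symmetric matrices), sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. The top eigenvector is the direction of maximum variance.
print("explained variance:", eigvals)                # roughly [9, 0.25]
print("first principal component:", eigvecs[:, 0])   # ~ +/-[0.707, 0.707]

# Project the data onto the first principal component.
scores = Xc @ eigvecs[:, :1]
```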
Singular Value Decomposition (SVD)
Any matrix $A$ (even non-square) can be decomposed as:
$$ A = U \Sigma V^T $$
where:
- $U$: left singular vectors (orthogonal basis in output space),
- $V$: right singular vectors (orthogonal basis in input space),
- $\Sigma$: diagonal matrix of singular values (stretching factors).
Relationship with eigenvalues:
$$ A^T A v_i = \sigma_i^2 v_i $$
So the singular values are $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^T A$.
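A short numeric check of this relationship on an arbitrary non-square matrix (the shape is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))                  # non-square matrix

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of A^T A, sorted to match the descending singular values.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

print(np.allclose(S**2, eigvals))            # True: sigma_i^2 = lambda_i
print(np.allclose(S, np.sqrt(eigvals)))      # True: sigma_i = sqrt(lambda_i)
print(np.allclose(U @ np.diag(S) @ Vt, A))   # True: A = U Sigma V^T
```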
Dimensionality Reduction via SVD
We can approximate $A$ using only top $k$ singular values:
$$ A \approx U_k \Sigma_k V_k^T $$
This yields a rank-$k$ approximation that captures the most important structure while discarding noise.
This forms the mathematical basis of:
- PCA
- Latent Semantic Analysis (LSA)
- Matrix factorization in recommender systems
SVD compresses data like a summary — keeping the strongest “notes” and dropping faint background noise.
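A sketch of a rank-$k$ approximation on a synthetic low-rank-plus-noise matrix; the sizes, the true rank of 3, and the noise level are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# A matrix that is essentially rank 3, plus a little noise.
signal = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 50))
A = signal + 0.01 * rng.normal(size=(100, 50))

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(S[:5])        # three large singular values, then tiny "noise" values

# Keep only the top k singular values/vectors.
k = 3
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"relative error of the rank-{k} approximation: {rel_error:.4f}")  # close to 0
```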
🧠 Step 4: Key Ideas
- Eigenvectors: Invariant directions of a transformation.
- Eigenvalues: Amount of stretching or compression along those directions.
- PCA Connection: Eigenvectors of the covariance matrix = principal components.
- SVD: General form of eigen-decomposition that works for all matrices.
- Dimensionality Reduction: Keep top singular values → approximate data efficiently.
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Reveals fundamental data structure (directions of maximum variance).
- SVD works for any matrix (not just square ones).
- Used in compression, noise reduction, and feature extraction.
Limitations:
- Sensitive to scaling: features with larger magnitudes dominate the variance.
- Doesn't capture nonlinear patterns.
- Computationally expensive for large datasets.
🚧 Step 6: Common Misunderstandings
- Myth: Eigenvectors are only for square matrices. → Truth: SVD extends the idea to any matrix.
- Myth: PCA always improves model accuracy. → Truth: It helps when redundancy exists — otherwise, it may remove meaningful variance.
- Myth: Eigenvalues are just math trivia. → Truth: They define the energy (variance) captured by each component — the core of dimensionality reduction.
🧩 Step 7: Mini Summary
🧠 What You Learned: Eigenvectors and eigenvalues describe invariant directions and their scaling; SVD extends this to all matrices.
⚙️ How It Works: PCA uses these ideas to find the directions of maximum variance — the most “informative” axes of the data.
🎯 Why It Matters: Understanding eigenvectors and SVD lets you see inside data transformations — the foundation of compression, denoising, and modern ML embeddings.