1.4. Eigenvalues, Eigenvectors & SVD


🪄 Step 1: Intuition & Motivation

  • Core Idea: Eigenvalues and eigenvectors reveal the hidden geometry of linear transformations — how they stretch, shrink, or flip space. And Singular Value Decomposition (SVD) generalizes that idea to any matrix, letting us understand and simplify data transformations in all directions.

  • Simple Analogy: Imagine pushing and twisting a lump of clay (your dataset). Some directions stretch the clay, some compress it, and some stay perfectly aligned. Those special “stay-aligned” directions are eigenvectors, and the amount of stretch or shrink along them are the eigenvalues.


🌱 Step 2: Core Concept

What’s Happening Under the Hood?

A linear transformation (matrix) acts like a geometric machine:

$$ A \mathbf{v} = \lambda \mathbf{v} $$

Here:

  • $\mathbf{v}$ is an eigenvector (the direction that doesn’t rotate under $A$).
  • $\lambda$ is an eigenvalue (how much $A$ scales that direction).

So, while most vectors change direction under $A$, eigenvectors are the “steady directions” that only stretch or compress.

Eigenvectors are the axes of change, and eigenvalues are the amount of change.

Why It Works This Way

If you think of $A$ as a transformation, then applying it to different vectors changes their direction — except for some privileged ones. Those privileged directions (eigenvectors) mark where $A$ acts purely as scaling.

Example: For a 2×2 transformation matrix that stretches space horizontally by a factor of 2 and leaves the vertical direction unchanged,

  • one eigenvector aligns horizontally (eigenvalue 2),
  • another aligns vertically (eigenvalue 1).

All other directions are mixtures of these two, so they change direction under the transformation. (A short NumPy check of this example follows below.)
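
To make this concrete, here is a minimal NumPy sketch of the 2×2 example above (the diagonal matrix `A` below is an illustrative stand-in):

```python
import numpy as np

# A stretches the horizontal direction by 2 and leaves the vertical one unchanged.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [2. 1.]
print(eigenvectors)   # columns are the eigenvectors: [1, 0] and [0, 1]

# Applying A to an eigenvector only rescales it; no rotation happens.
v = eigenvectors[:, 0]
print(A @ v, eigenvalues[0] * v)   # both are [2. 0.]
```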

How It Fits in ML Thinking

Eigenvalues and eigenvectors appear everywhere in data science:

  • Covariance matrices (PCA): Eigenvectors define principal components (directions of maximum variance).
  • Graph analysis: Eigenvalues define connectivity and stability.
  • Optimization: The Hessian’s eigenvalues describe curvature: whether the loss surface curves upward, downward, or forms a saddle along each direction.
  • Neural networks: Weight matrices’ singular values affect gradient flow and training stability.

📐 Step 3: Mathematical Foundation

Eigenvalue–Eigenvector Equation
$$ A \mathbf{v} = \lambda \mathbf{v} $$

To find them, solve:

$$ (A - \lambda I)\mathbf{v} = 0 $$

Non-trivial solutions exist only when:

$$ \det(A - \lambda I) = 0 $$

This gives the characteristic equation for $\lambda$.

We’re looking for directions where the transformation $A$ acts as pure stretching, not twisting.
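
As a small hedged illustration (the 2×2 matrix below is a hypothetical example, not one from the text), NumPy can build the characteristic polynomial and confirm that its roots are the eigenvalues:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Coefficients of det(lambda*I - A): here lambda^2 - 7*lambda + 10.
coeffs = np.poly(A)
print(coeffs)                 # [ 1. -7. 10.]

# Eigenvalues are the roots of the characteristic polynomial ...
print(np.roots(coeffs))       # [5. 2.]

# ... and they agree with np.linalg.eigvals (possibly in a different order).
print(np.linalg.eigvals(A))
```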

Geometric Meaning

If $A$ represents a transformation of space:

  • Eigenvectors = invariant directions (arrows that stay along the same line).
  • Eigenvalues = stretch or shrink factor along those directions.

So, in a 2D space:

  • $|\lambda| > 1$ → stretches.

  • $0 < |\lambda| < 1$ → compresses.

  • $\lambda < 0$ → flips direction.

    Imagine arrows drawn in every direction. Only a few arrows keep pointing the same way after $A$ acts — those are eigenvectors.

Connection to PCA (Variance Maximization)

PCA finds directions (principal components) where data variance is maximized.

  1. Compute the covariance matrix of the mean-centered data: $$ \Sigma = \frac{1}{n} X^T X $$
  2. Find eigenvectors $v_i$ and eigenvalues $\lambda_i$: $$ \Sigma v_i = \lambda_i v_i $$
  3. The eigenvector with the largest $\lambda_i$ points along the direction of maximum variance.

Hence, PCA = eigen-decomposition of the covariance matrix.

The first principal component is the “longest axis” of your data cloud — the direction along which your data spreads the most.
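
A minimal sketch of these three steps, assuming a small synthetic 2D dataset (the data and variable names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 correlated 2D points.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

# 1. Center the data and form the covariance matrix.
Xc = X - X.mean(axis=0)
cov = (Xc.T @ Xc) / len(Xc)

# 2. Eigen-decomposition (eigh is the right tool for symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Sort descending: the eigenvector with the largest eigenvalue
#    is the first principal component.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("variance along each component:", eigvals)
print("first principal component:", eigvecs[:, 0])
```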

Singular Value Decomposition (SVD)

Any matrix $A$ (even non-square) can be decomposed as:

$$ A = U \Sigma V^T $$

where:

  • $U$: left singular vectors (orthogonal basis in output space),
  • $V$: right singular vectors (orthogonal basis in input space),
  • $\Sigma$: diagonal matrix of singular values (stretching factors).

Relationship with eigenvalues:

$$ A^T A v_i = \sigma_i^2 v_i $$

So the singular values satisfy $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^T A$ (and the $v_i$ are its eigenvectors, i.e. the right singular vectors).

SVD generalizes the concept of eigen-decomposition — even for rectangular data — by finding how data “stretches” along orthogonal axes.
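
A quick NumPy check of both the decomposition and the $\sigma_i = \sqrt{\lambda_i}$ relationship (the random rectangular matrix is just an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))   # rectangular: A itself has no eigen-decomposition

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print("singular values:", S)

# Singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # sorted descending
print("sqrt of eigenvalues of A^T A:", np.sqrt(eigvals))

# U and V have orthonormal columns, and A is exactly reconstructed.
print(np.allclose(A, U @ np.diag(S) @ Vt))    # True
```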

Dimensionality Reduction via SVD

We can approximate $A$ using only top $k$ singular values:

$$ A \approx U_k \Sigma_k V_k^T $$

This yields a rank-$k$ approximation — capturing the most important structure while discarding noise.

This forms the mathematical basis of:

  • PCA

  • Latent Semantic Analysis (LSA)

  • Matrix factorization in recommender systems

    SVD compresses data like a summary — keeping the strongest “notes” and dropping faint background noise.
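
A minimal sketch of a rank-$k$ approximation, assuming a synthetic low-rank-plus-noise matrix (the sizes and noise level below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# A rank-2 "signal" buried in small noise.
signal = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 50))
A = signal + 0.01 * rng.normal(size=(100, 50))

U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2   # keep only the top-k singular values
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# The rank-k reconstruction captures almost all of the structure.
rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"relative reconstruction error with k={k}: {rel_error:.4f}")
```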

🧠 Step 4: Key Ideas


⚖️ Step 5: Strengths, Limitations & Trade-offs

Rotating the data simply rotates the principal components, so PCA handles orientation gracefully. It is not scale-invariant, however: rescaling a single feature (e.g. switching from metres to millimetres) changes which direction carries the most variance. This is why features are typically standardized before PCA, as the sketch below demonstrates.
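
A small demonstration of this scale sensitivity (the synthetic data and the `first_pc` helper below are illustrative, not a library API):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two uncorrelated features with similar spread.
X = rng.normal(size=(500, 2))

def first_pc(X):
    """Return the eigenvector of the covariance matrix with the largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / len(Xc)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]          # eigh sorts eigenvalues ascending

# Express feature 0 in different units (e.g. metres -> millimetres).
X_rescaled = X * np.array([1000.0, 1.0])

print(first_pc(X))            # no strongly dominant direction in the original units
print(first_pc(X_rescaled))   # ~[1, 0]: the rescaled feature dominates the variance

# Standardizing each feature to unit variance removes the artifact.
X_std = X_rescaled / X_rescaled.std(axis=0)
print(first_pc(X_std))
```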

🚧 Step 6: Common Misunderstandings


🧩 Step 7: Mini Summary

🧠 What You Learned: Eigenvectors and eigenvalues describe invariant directions and their scaling; SVD extends this to all matrices.

⚙️ How It Works: PCA uses these ideas to find the directions of maximum variance — the most “informative” axes of the data.

🎯 Why It Matters: Understanding eigenvectors and SVD lets you see inside data transformations — the foundation of compression, denoising, and modern ML embeddings.
