2.1. K-Means as an Expectation-Maximization (EM) Special Case
🪄 Step 1: Intuition & Motivation
Core Idea: At first glance, K-Means and Gaussian Mixture Models (GMMs) seem totally different — one is geometric, the other probabilistic. But here’s the twist: K-Means is actually a simplified cousin of GMM, running under the same logic as the Expectation-Maximization (EM) algorithm — just without the “softness.”
In essence, K-Means says “Each point belongs to exactly one cluster”, while GMM gently replies, “Maybe, but let’s give it a probability.”
Simple Analogy:
Imagine sorting songs into playlists. K-Means says, “This song goes only in the Rock playlist.” GMM says, “This song sounds 70% Rock, 30% Pop — let’s record that uncertainty.”
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
Both K-Means and GMM try to explain data by finding cluster structures, but they think differently:
- K-Means: Uses hard assignments. Each data point strictly belongs to one cluster based on nearest centroid.
- GMM: Uses soft assignments. Each data point belongs to every cluster with a probability.
Underneath, both follow an iterative two-step process that alternates between “guessing” and “updating.”
E-step (Expectation):
- Estimate which cluster each point belongs to.
- In K-Means → assign to the nearest centroid (hard).
- In GMM → compute probabilities of belonging to each cluster (soft).
M-step (Maximization):
- Update cluster parameters (centroids or Gaussian parameters) to best fit the new assignments.
- In K-Means → recompute the mean of assigned points.
- In GMM → recompute mean, covariance, and mixing weights using probabilities.
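To see the alternation concretely, here is a minimal NumPy sketch of the K-Means version of this loop: a hard E-step (nearest-centroid assignment) followed by a mean-update M-step. The function name, toy data, and iteration count are illustrative choices, not a reference implementation; in the GMM version, the E-step would return probabilities and the M-step would use them as weights.

```python
import numpy as np

def kmeans(X, K, n_iters=20, seed=0):
    """Minimal K-Means: alternate a hard E-step with a mean-update M-step."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()  # random init

    for _ in range(n_iters):
        # E-step (hard): assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # M-step: move each centroid to the mean of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)

    return centroids, labels

# Toy data: two well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(5.0, 1.0, size=(50, 2))])
print(kmeans(X, K=2)[0])  # two centroids, near (0, 0) and (5, 5)
```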
Why K-Means is a Special Case of EM
The EM algorithm for GMMs assumes the data is generated by a mixture of Gaussian distributions:
$$ p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x | \mu_k, \Sigma_k) $$
If we make these simplifying assumptions:
- Each cluster has equal covariance (spherical and identical $\Sigma = \sigma^2 I$).
- All clusters have equal prior probability ($\pi_k = 1/K$).
- We assign each point entirely to its most likely cluster (a hard choice rather than a probability).
Then, GMM’s EM steps collapse into K-Means’ simple steps of:
- Assign → closest centroid.
- Update → mean of assigned points.
That’s why K-Means is viewed as the “hard-assignment limit” of EM when the variance $\sigma^2 \to 0$.
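A quick way to see this limit is to compute GMM responsibilities for a single point while shrinking a shared variance. The sketch below uses made-up 1-D numbers and equal mixing weights; as $\sigma$ shrinks, the soft responsibilities approach the one-hot assignment K-Means would make.

```python
import numpy as np
from scipy.stats import norm

# One 1-D point between two cluster means; equal weights and a shared variance sigma^2.
x = 1.8                      # illustrative point
mu = np.array([0.0, 5.0])    # illustrative cluster means

for sigma in [5.0, 1.0, 0.3]:
    dens = norm.pdf(x, loc=mu, scale=sigma)  # N(x | mu_k, sigma^2) for each k
    resp = dens / dens.sum()                 # responsibilities (equal pi_k cancel out)
    print(f"sigma = {sigma:>3}: responsibilities = {np.round(resp, 4)}")

# As sigma -> 0 the responsibilities approach [1, 0]:
# exactly the hard nearest-centroid assignment K-Means makes.
```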
How GMM Handles Overlapping Clusters
Unlike K-Means, which forces each point into one cluster, GMM acknowledges uncertainty — it gives each data point a membership probability for each cluster.
For example, if a point sits halfway between two centroids, K-Means will still commit it entirely to one side, while GMM will say:
“This point is 55% likely to belong to Cluster A and 45% to Cluster B.”
This makes GMM better for real-world data where boundaries blur or clusters overlap — think of customer groups with shared behaviors or songs that mix genres.
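Here is a small illustrative comparison using scikit-learn's KMeans and GaussianMixture on two deliberately overlapping blobs; the blob locations, the query point, and the printed values are arbitrary choices for the demo, not canonical settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two deliberately overlapping 2-D blobs
X = np.vstack([rng.normal([0.0, 0.0], 1.5, size=(200, 2)),
               rng.normal([3.0, 0.0], 1.5, size=(200, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

point = np.array([[1.5, 0.0]])  # sits in the overlap region
print("K-Means label:   ", km.predict(point))        # one hard label, e.g. [0]
print("GMM memberships: ", gmm.predict_proba(point))  # a split like [[~0.5, ~0.5]]
```

For the point in the overlap region, K-Means commits to a single label, while predict_proba reports a split membership instead.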
📐 Step 3: Mathematical Foundation
K-Means Objective (Hard Assignments)
Hard assignment: each point $x_n$ belongs to exactly one cluster.
Goal: minimize the total squared distance between points and their assigned centroids:
$$ J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, ||x_n - \mu_k||^2 $$
where $r_{nk} = 1$ if point $n$ is assigned to cluster $k$ and $0$ otherwise.
K-Means simply measures closeness and says “belong to the nearest group.”
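As a sanity check, here is a tiny NumPy snippet that evaluates this objective for hand-picked points, centroids, and assignments (all the numbers are made up for illustration):

```python
import numpy as np

def kmeans_objective(X, centroids, labels):
    """J: sum of squared distances from each point to its assigned centroid."""
    return np.sum(np.linalg.norm(X - centroids[labels], axis=1) ** 2)

# Hand-picked toy values
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
centroids = np.array([[0.5, 0.0], [10.5, 0.0]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(X, centroids, labels))  # 4 points * 0.25 = 1.0
```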
GMM Log-Likelihood (Soft Assignments)
$$ \log p(X) = \sum_{n=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n | \mu_k, \Sigma_k) \right) $$
Here, $\pi_k$ are the mixing weights and $\mathcal{N}(x_n | \mu_k, \Sigma_k)$ is the Gaussian probability density. Each data point contributes to every cluster, weighted by its membership probability.
Soft vs. Hard Membership
In K-Means, the membership matrix $r_{nk}$ (point $n$ in cluster $k$) is binary:
$$ r_{nk} = \begin{cases} 1, & \text{if } k = \arg\min_j ||x_n - \mu_j||^2 \\ 0, & \text{otherwise} \end{cases} $$
In GMM, it’s probabilistic:
$$ r_{nk} = \frac{\pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j)} $$
K-Means assigns points like “yes or no.” GMM assigns points like “how much.”
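To make the two formulas tangible, the sketch below evaluates both membership styles for one point under a toy two-component mixture (the means, covariances, weights, and query point are invented for the example); it also evaluates the log-likelihood term that this point contributes.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Invented two-component 2-D mixture
mus = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
pis = np.array([0.5, 0.5])

x = np.array([1.9, 0.3])  # a point sitting between the two means

# Hard membership (K-Means): one-hot on the nearest mean
hard = np.zeros(2)
hard[np.argmin([np.sum((x - mu) ** 2) for mu in mus])] = 1.0

# Soft membership (GMM): r_k = pi_k N(x | mu_k, Sigma_k) / sum_j pi_j N(x | mu_j, Sigma_j)
dens = np.array([multivariate_normal.pdf(x, mean=mu, cov=cov)
                 for mu, cov in zip(mus, covs)])
soft = pis * dens / np.sum(pis * dens)

print("hard r_nk:", hard)               # e.g. [1. 0.]
print("soft r_nk:", np.round(soft, 3))  # something like [0.49 0.51]
print("point's log-likelihood term:", np.log(np.sum(pis * dens)))
```

Note that if both components shared a spherical covariance and that variance shrank toward zero, the soft vector would sharpen into the hard one, which is exactly the K-Means limit described earlier.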
🧠 Step 4: Assumptions or Key Ideas
- Both algorithms model clusters as compact, Gaussian-like blobs (GMM explicitly, K-Means implicitly).
- K-Means assumes spherical, equal-sized clusters.
- GMM allows different shapes and densities by learning covariance matrices.
- EM provides a probabilistic foundation — each point’s uncertainty is quantified.
The key difference is that K-Means simplifies away probability — it commits early, while GMM keeps its options open.
⚖️ Step 5: Strengths, Limitations & Trade-offs
✅ Strengths
- Reveals the connection between geometric and probabilistic clustering.
- GMM generalizes K-Means — more flexible and interpretable.
- EM provides a principled statistical framework for optimization.
⚠️ Limitations
- GMM is computationally heavier — more parameters to estimate.
- Can overfit small datasets if cluster covariance isn’t constrained.
- K-Means may fail when clusters overlap or differ in size/shape.
🚧 Step 6: Common Misunderstandings
- “K-Means and GMM are completely unrelated.” False — K-Means is the special, simplified case of GMM’s EM algorithm.
- “GMM replaces K-Means.” Not always — K-Means is faster and sufficient when clusters are clearly separated.
- “Soft assignments mean overlap confusion.” Not confusion — it’s clarity about uncertainty.
🧩 Step 7: Mini Summary
🧠 What You Learned: K-Means is actually the hard-assignment version of the EM algorithm used in GMMs. The two share the same alternating structure — estimate memberships, then update parameters.
⚙️ How It Works: K-Means minimizes squared distances with hard assignments, while GMM maximizes the data’s likelihood with soft, probabilistic assignments. The latter introduces soft membership and covariance flexibility.
🎯 Why It Matters: This connection explains why K-Means converges — and sets the stage for understanding advanced clustering techniques rooted in probabilistic reasoning.