1.5. Evaluate and Interpret Results
🪄 Step 1: Intuition & Motivation
Core Idea: Once K-Means finishes clustering, you’re left with groups — but how do you know if they make sense? This is where evaluation comes in. We measure how “tight” and “separated” the clusters are — in short, how good our grouping is.
Simple Analogy:
Imagine sorting colored marbles into bowls. You’d want marbles of the same color to stay together (tight clusters) and different colors to be far apart (well-separated clusters). Cluster evaluation is just a mathematical way of checking how well you did that.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
After running K-Means, we get:
- Cluster Assignments: Which data point belongs where.
- Centroids: The average position of each cluster.
Now we want to check two things:
- Are points close to their own cluster’s centroid?
- Are clusters far apart from each other?
If both are true, we’ve achieved good clustering.
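To make this concrete, here is a minimal sketch of both checks, assuming scikit-learn and synthetic blob data (both are illustrative assumptions, not part of the text above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 4 well-separated blobs (illustrative only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
labels = km.labels_              # cluster assignment for each point
centroids = km.cluster_centers_  # average position of each cluster

# Check 1: are points close to their own cluster's centroid?
dist_to_own = np.linalg.norm(X - centroids[labels], axis=1)
print("mean distance to own centroid:", dist_to_own.mean())

# Check 2: are clusters far apart from each other?
gaps = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=2)
print("smallest gap between centroids:", gaps[gaps > 0].min())
```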
Why Evaluation Matters
Unlike supervised learning (where we have labels and can measure accuracy), clustering is unsupervised — we don’t have the “right answers.” So, we use internal metrics, which evaluate clustering based only on the data itself — not external labels.
They help us:
- Detect whether we chose too few or too many clusters.
- Compare clustering results across different runs.
- Quantify how well-separated and cohesive our clusters are.
How It Fits in ML Thinking
Evaluating K-Means is about developing data intuition. In real-world projects, you often need to justify:
- Why you chose $K = 4$ instead of $K = 5$ (a comparison sketched below).
- Whether your clustering is meaningful or just mathematically neat.
These metrics turn those judgments into defensible, data-driven decisions.
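For example, justifying $K = 4$ over $K = 5$ can be as simple as fitting both and comparing a metric. A minimal sketch, assuming scikit-learn and the same kind of synthetic data as above:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# Fit both candidate values of K and let the metric arbitrate.
for k in (4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f"K={k}: silhouette = {silhouette_score(X, labels):.3f}")
```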
📐 Step 3: Mathematical Foundation
1️⃣ Inertia / WCSS (Within-Cluster Sum of Squares)
$$ \text{WCSS} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$
Where:
$C_i$ = the set of points assigned to cluster $i$.
$\mu_i$ = the centroid of cluster $i$.
- Measures how close data points are to their respective centroids.
- Lower WCSS = tighter, more compact clusters.
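As a sanity check, WCSS can be computed directly from the definition and compared against the value scikit-learn stores as `inertia_` (a sketch under the same synthetic-data assumption as above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Sum of squared distances from each point to its own centroid...
wcss = sum(
    np.sum((X[km.labels_ == i] - mu) ** 2)
    for i, mu in enumerate(km.cluster_centers_)
)
# ...which should agree with scikit-learn's stored inertia.
print(wcss, km.inertia_)
```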
2️⃣ Silhouette Score
For each point:
$$ s = \frac{b - a}{\max(a, b)} $$
Where:
$a$ = average distance from the point to the other points in its own cluster (cohesion).
$b$ = average distance from the point to the points in the nearest other cluster (separation).
Range: $-1 \leq s \leq 1$
High $s$ (~1): Point is well placed.
Around 0: Point lies between clusters.
Negative $s$: Point may be in the wrong cluster.
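In scikit-learn, `silhouette_score` gives the average $s$ over all points, while `silhouette_samples` exposes the per-point values, which is where negative scores (likely misplacements) show up. A minimal sketch:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# One number summarizing the whole fit.
print("mean silhouette:", silhouette_score(X, labels))

# Per-point scores: negative values flag points that may be misassigned.
s = silhouette_samples(X, labels)
print("points with s < 0:", int((s < 0).sum()))
```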
3️⃣ Davies–Bouldin Index
$$ DBI = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(\mu_i, \mu_j)} $$
Where:
$\sigma_i$ = average distance of points in cluster $i$ to their centroid (cluster scatter).
$d(\mu_i, \mu_j)$ = distance between cluster centroids $i$ and $j$.
Lower DBI = better clustering (tight, well-separated clusters).
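scikit-learn ships this metric as `davies_bouldin_score`; a quick sketch under the same synthetic-data assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Lower is better; tight, well-separated blobs should score well below 1.
print("Davies-Bouldin index:", davies_bouldin_score(X, labels))
```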
🧠 Step 4: Assumptions or Key Ideas
- K-Means assumes clusters are spherical and roughly equal in size, so evaluation metrics work best under these conditions.
- Metrics like WCSS depend on Euclidean distance; they don't work well with non-numeric or categorical data, and features on very different scales should be standardized first (see the sketch below).
- There’s no universal “perfect K” — it depends on data complexity and purpose.
The right number of clusters isn't simply found; it's a balance between simplicity and usefulness.
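Because everything above runs on Euclidean distance, standardizing features before clustering is usually a safe default. A sketch (the exaggerated second feature is a contrived assumption to mimic mixed units):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
X[:, 1] *= 100  # contrived: one feature on a much larger scale

# Standardizing first keeps any single feature from dominating the distances.
pipe = make_pipeline(StandardScaler(), KMeans(n_clusters=4, n_init=10, random_state=42))
labels = pipe.fit_predict(X)
```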
⚖️ Step 5: Strengths, Limitations & Trade-offs
✅ Strengths
- Quantitative methods (like Silhouette score) make evaluation objective.
- Helps identify overfitting (too many clusters) or underfitting (too few).
- Makes comparison across models systematic.
⚠️ Limitations
- Metrics may disagree — one might favor fewer clusters while another prefers more.
- Sensitive to scaling and noisy data.
- Internal metrics can’t measure “real-world meaning.”
🚧 Step 6: Common Misunderstandings
- “Lower WCSS always means better clusters.” Not always: more clusters naturally reduce WCSS but may overfit (see the sketch after this list).
- “The Silhouette score always gives a single best K.” It’s a guide, not a rule — interpret it alongside visual inspection.
- “Cluster quality = perfect separation.” Real data is messy; some overlap is expected and acceptable.
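The first point above is easy to verify empirically: WCSS keeps falling as $K$ grows, even past the true number of clusters. A minimal sketch:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Data generated with 4 true clusters.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# WCSS shrinks as K grows even beyond 4, so a lower value alone proves little.
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(f"K={k}: WCSS = {km.inertia_:.1f}")
```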
🧩 Step 7: Mini Summary
🧠 What You Learned: You explored how to evaluate cluster quality using metrics like WCSS, Silhouette score, and Davies–Bouldin index, and how to choose the optimal number of clusters ($K$).
⚙️ How It Works: These measures assess intra-cluster tightness and inter-cluster separation, guiding us toward meaningful groupings.
🎯 Why It Matters: Evaluation bridges the gap between algorithmic success and real-world usefulness — it ensures your clustering not only converges but also makes intuitive, actionable sense.