2.1 User–User and Item–Item Collaborative Filtering
🪄 Step 1: Intuition & Motivation
Core Idea: If predictive modeling (Series 1) was about using features to make predictions, collaborative filtering (CF) is about using people.
Instead of saying “This movie is romantic, so you might like it,” CF says,
“People who liked what you liked also loved this — maybe you will too.”
It’s crowd wisdom at scale — using other users’ behavior to predict your preferences.
Simple Analogy: Think of it like asking your friends for recommendations. You say, “I loved Inception.” Your friend, who also loved Inception, says, “You’ll adore Interstellar.” Voilà — collaborative filtering in action.
🌱 Step 2: Core Concept
Collaborative Filtering (CF) relies on the user–item interaction matrix — a big grid showing who rated what.
Example (simplified):
| User | Inception | Interstellar | Titanic | Joker |
|---|---|---|---|---|
| A | 5 | ? | 3 | ? |
| B | 4 | 5 | ? | ? |
| C | ? | 4 | 2 | 5 |
The goal:
Predict the missing ? entries — “what rating would User A give to Interstellar?”
We can approach this two ways:
- User–User CF: Find users similar to A, and combine their ratings of Interstellar.
- Item–Item CF: Find items similar to Interstellar that A has already rated (like Inception), and combine A's ratings of those.
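To make the matrix concrete in code, here is a minimal sketch of the toy grid above, assuming NumPy and using `np.nan` to stand in for the ? cells:

```python
import numpy as np

# Rows: users A, B, C.  Columns: Inception, Interstellar, Titanic, Joker.
# np.nan marks the "?" cells -- the ratings collaborative filtering tries to predict.
ratings = np.array([
    [5.0,    np.nan, 3.0,    np.nan],  # User A
    [4.0,    5.0,    np.nan, np.nan],  # User B
    [np.nan, 4.0,    2.0,    5.0],     # User C
])
```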
What’s Happening Under the Hood?
Both methods rely on similarity computation. We compute how similar two users (or items) are based on their rating patterns.
Then, to predict a rating:
- Find the top k most similar neighbors.
- Combine their known ratings to estimate the unknown one.
In math terms, for predicting user $u$'s rating on item $i$:
$$ \hat{r}_{ui} = \frac{\sum_{v \in N(u)} sim(u, v) \times r_{vi}}{\sum_{v \in N(u)} |sim(u, v)|} $$
where:
- $sim(u, v)$ = similarity between users $u$ and $v$
- $r_{vi}$ = rating given by user $v$ to item $i$
- $N(u)$ = set of most similar users to $u$
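Here is that formula as a minimal NumPy sketch on the toy matrix from Step 2, using cosine similarity over co-rated items (the function names are just for illustration):

```python
import numpy as np

# Same toy matrix as above: rows = users A, B, C; columns =
# Inception, Interstellar, Titanic, Joker. np.nan marks a missing rating.
ratings = np.array([
    [5.0,    np.nan, 3.0,    np.nan],
    [4.0,    5.0,    np.nan, np.nan],
    [np.nan, 4.0,    2.0,    5.0],
])

def user_similarity(ratings, u, v):
    """Cosine similarity between users u and v over their co-rated items."""
    both = ~np.isnan(ratings[u]) & ~np.isnan(ratings[v])
    if not both.any():
        return 0.0
    ru, rv = ratings[u, both], ratings[v, both]
    denom = np.linalg.norm(ru) * np.linalg.norm(rv)
    return float(ru @ rv / denom) if denom else 0.0

def predict_user_user(ratings, u, i, k=2):
    """Weighted average of the k most similar users' ratings of item i."""
    # Neighbours can only be users who actually rated item i.
    candidates = [v for v in range(ratings.shape[0])
                  if v != u and not np.isnan(ratings[v, i])]
    neighbours = sorted(((user_similarity(ratings, u, v), v) for v in candidates),
                        reverse=True)[:k]
    num = sum(s * ratings[v, i] for s, v in neighbours)
    den = sum(abs(s) for s, v in neighbours)
    return num / den if den else np.nan

# "What rating would User A (row 0) give Interstellar (column 1)?"
print(predict_user_user(ratings, u=0, i=1))  # 4.5 on this toy matrix
```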
Why It Works This Way
Humans naturally form preference clusters — we subconsciously group by taste. If a few users’ behaviors resemble yours, chances are, your future choices will align too.
Item-based CF works similarly but flips the logic:
- Instead of “users like users,” it’s “items like items.”
- E.g., Inception and Interstellar are often co-rated highly by the same people — so they’re “neighbors” in taste-space.
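The flip is literal in code: compare columns instead of rows, and average the user's own ratings of the neighbouring items. A sketch under the same assumptions as before:

```python
import numpy as np

ratings = np.array([
    [5.0,    np.nan, 3.0,    np.nan],  # User A
    [4.0,    5.0,    np.nan, np.nan],  # User B
    [np.nan, 4.0,    2.0,    5.0],     # User C
])

def item_similarity(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    both = ~np.isnan(ratings[:, i]) & ~np.isnan(ratings[:, j])
    if not both.any():
        return 0.0
    ri, rj = ratings[both, i], ratings[both, j]
    denom = np.linalg.norm(ri) * np.linalg.norm(rj)
    return float(ri @ rj / denom) if denom else 0.0

def predict_item_item(ratings, u, i, k=2):
    """Predict r_ui from user u's own ratings of the items most similar to i."""
    rated = [j for j in range(ratings.shape[1])
             if j != i and not np.isnan(ratings[u, j])]
    neighbours = sorted(((item_similarity(ratings, i, j), j) for j in rated),
                        reverse=True)[:k]
    num = sum(s * ratings[u, j] for s, j in neighbours)
    den = sum(abs(s) for s, j in neighbours)
    return num / den if den else np.nan

print(predict_item_item(ratings, u=0, i=1))  # User A's predicted Interstellar rating
```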
How It Fits in ML Thinking
Collaborative Filtering is one of the first steps beyond “feature-based prediction.” It doesn’t need explicit item attributes (like genre or tags). It purely learns from interaction patterns — hence, collaborative.
In the grand ML picture, it’s:
- Non-parametric (memory-based: it keeps the interaction data itself rather than learning a fixed set of parameters)
- Based on similarity computation and weighted averaging
- Easy to interpret but hard to scale
This makes it foundational for understanding later methods like Matrix Factorization and Neural Collaborative Filtering — which compress these patterns into learned embeddings.
📐 Step 3: Mathematical Foundation
Let’s look at how similarity measures shape CF behavior.
Cosine Similarity
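In symbols (written here over the full rating vectors; in practice it is often restricted to co-rated items):
$$ sim(u, v) = \frac{r_u \cdot r_v}{\|r_u\| \, \|r_v\|} $$
where: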
- $r_u, r_v$: rating vectors of users $u$ and $v$
- Measures the angle between rating vectors, not magnitude.
Intuition: Two users who rated the same movies similarly will have small angles between their rating vectors — higher cosine similarity.
Pearson Correlation
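In symbols, computed over $I_{uv}$, the set of items rated by both users, with $\bar{r}_u$ and $\bar{r}_v$ each user's mean rating:
$$ sim(u, v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}} $$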
- Adjusts for rating scale differences.
- Ignores absolute values; focuses on relative patterns.
Intuition: Even if User A rates harshly (always 3s) and User B generously (always 5s), they may still have similar tastes if their preferences rise and fall together.
Jaccard Similarity
Used for implicit data (clicks, views) — where only presence matters.
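In set terms, with $I_u$ and $I_v$ the sets of items users $u$ and $v$ interacted with:
$$ sim(u, v) = \frac{|I_u \cap I_v|}{|I_u \cup I_v|} $$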
Intuition: It measures overlap — how many items we both interacted with, out of total items either of us touched.
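To see all three measures side by side, here is a small sketch (assuming NumPy and SciPy are available; the two rating vectors are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

# Two users' ratings on the same five items (0 = not rated).
a = np.array([5, 3, 0, 4, 0], dtype=float)
b = np.array([4, 2, 5, 5, 0], dtype=float)

# Cosine: angle between the raw rating vectors (treating "not rated" as 0, a common simplification).
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pearson: correlation of the ratings on co-rated items only (here items 0, 1, 3).
both = (a > 0) & (b > 0)
pearson, _ = pearsonr(a[both], b[both])

# Jaccard: overlap of the *sets* of items each user touched, ignoring rating values.
set_a, set_b = set(np.flatnonzero(a)), set(np.flatnonzero(b))
jaccard = len(set_a & set_b) / len(set_a | set_b)

print(cosine, pearson, jaccard)
```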
🧠 Step 4: Assumptions or Key Ideas
- Taste Similarity: Users with similar past behavior will have similar future preferences.
- Stationarity: User preferences don’t shift too rapidly over time.
- Data Density: Works best when the rating matrix isn’t too sparse — i.e., users have rated enough items.
When these assumptions fail (especially with sparse data), CF performance drops sharply.
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Easy to understand and implement.
- No need for explicit item or user features.
- Captures human-like reasoning: “people like me liked this.”
Limitations:
- Scalability issues: $O(n^2)$ similarity computations as users/items grow.
- Sparsity problem: many users rate very few items, reducing overlap.
- Struggles with cold-start — it can’t recommend to new users or items that have no interaction history yet.
🚧 Step 6: Common Misunderstandings
- “Cosine similarity always works best.” Not true — use Pearson when rating scales differ, and Jaccard for implicit feedback.
- “User–User and Item–Item CF are the same.” They’re not: they behave differently at scale. Item–Item similarities are usually cheaper to precompute (there are typically far fewer items than users) and more stable over time.
- “More neighbors = better accuracy.” Too many neighbors add noise; usually 20–50 suffice.
🧩 Step 7: Mini Summary
🧠 What You Learned: Collaborative Filtering uses similarity between users or items to fill missing preferences in the rating matrix.
⚙️ How It Works: It computes similarities (Cosine, Pearson, Jaccard), finds nearest neighbors, and aggregates their ratings for prediction.
🎯 Why It Matters: It’s the foundation of personalized recommendations — capturing crowd wisdom without needing hand-crafted features.