6.1 Graph-Based Recommendations
🪄 Step 1: Intuition & Motivation
Core Idea: Traditional recommendation systems view users and items as rows and columns in a matrix — but what if we treated them as nodes in a network connected by interactions (likes, views, purchases)?
That’s the idea behind Graph-Based Recommendations: We model relationships explicitly as a graph — capturing not just who liked what, but how everyone and everything is connected.
Simple Analogy: Imagine a party 🎉 where every person (user) is connected to the snacks (items) they’ve tasted. If you liked chocolate cake, and your friend liked chocolate cake and brownies, there’s a good chance you’ll enjoy brownies too.
That’s how graph-based systems think — “friends of friends” reasoning, but generalized to any network of preferences.
🌱 Step 2: Core Concept
Let’s begin by visualizing the user–item bipartite graph — the foundation of all graph-based recommenders.
User–Item Bipartite Graph
A bipartite graph has two distinct types of nodes:
- Users ($U = \{u_1, u_2, \dots, u_m\}$)
- Items ($I = \{i_1, i_2, \dots, i_n\}$)
Edges connect users and items when there’s an interaction (e.g., a rating, click, or purchase).
Formally:
$$ G = (V, E), \quad V = U \cup I, \quad E = \{(u, i) \mid u \text{ interacted with } i\} $$

Each edge may carry a weight:
- Explicit feedback: rating score (1–5 stars)
- Implicit feedback: frequency or recency of interaction
This structure naturally captures similarity:
- Two users are similar if they connect to similar items.
- Two items are similar if they’re consumed by similar users.
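To make this concrete, here is a minimal sketch in plain Python (the interaction log is invented for illustration) that stores the weighted bipartite graph as two adjacency maps and reads off user similarity from overlapping item neighborhoods:

```python
from collections import defaultdict

# Toy interaction log: (user, item, weight). The weight could be a rating
# (explicit feedback) or an interaction count (implicit feedback).
interactions = [
    ("alice", "chocolate_cake", 5.0),
    ("alice", "brownies", 4.0),
    ("bob", "chocolate_cake", 4.0),
    ("carol", "fruit_salad", 3.0),
]

# Adjacency maps for both sides of the bipartite graph.
user_to_items = defaultdict(dict)
item_to_users = defaultdict(dict)

for user, item, weight in interactions:
    user_to_items[user][item] = weight
    item_to_users[item][user] = weight

# Two users are similar if their item neighborhoods overlap.
shared = set(user_to_items["alice"]) & set(user_to_items["bob"])
print(shared)  # {'chocolate_cake'}
```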
Graphs let your recommender see beyond individual behavior — they see the community pattern.
From Collaborative Filtering to Graphs
Traditional collaborative filtering (CF) only looks at direct connections — user–item pairs.
Graphs extend this idea to multi-hop relationships:
- User A → Item X → User B → Item Y ⇒ If User A liked Item X, and User B also liked X and Y, maybe A will like Y.
This propagation of influence through the graph is what makes Graph Neural Networks (GNNs) so powerful in recommendation tasks.
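You can see this propagation numerically: powers of the interaction matrix count multi-hop paths. A tiny NumPy sketch (the 2×2 matrix is made up to mirror the example above):

```python
import numpy as np

# Rows = users (A, B), columns = items (X, Y).
# User A liked X; User B liked both X and Y.
A = np.array([[1, 0],
              [1, 1]])

# (A @ A.T) counts shared items between user pairs (user-user similarity);
# (A @ A.T) @ A counts 3-hop paths: user -> item -> user -> item.
three_hop = A @ A.T @ A
print(three_hop)
# User A's row gives a nonzero score for Item Y, reached only
# through the path A -> X -> B -> Y.
```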
Message Passing in GNNs
The central mechanism in GNNs is message passing — where nodes aggregate information from their neighbors to update their embeddings.
For each node $v$:
$$ h_v^{(k+1)} = \sigma \Big( W^{(k)} \cdot \text{AGG}\big(\{h_u^{(k)} : u \in \mathcal{N}(v)\}\big) \Big) $$

Where:
- $h_v^{(k)}$ = embedding of node $v$ at layer $k$
- $\mathcal{N}(v)$ = neighbors of node $v$
- $\text{AGG}$ = aggregation function (mean, sum, attention, etc.)
- $\sigma$ = non-linear activation (e.g., ReLU)
- $W^{(k)}$ = learnable weights
After $K$ layers, each node’s embedding encodes information from its $K$-hop neighborhood.
Think of it like word-of-mouth learning — each node “listens” to what its friends are saying before updating its own opinion.
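Here is a minimal sketch of one such layer in PyTorch, using mean aggregation over a dense adjacency matrix (the toy graph and embedding sizes are invented for illustration):

```python
import torch
import torch.nn as nn

class MeanMessagePassing(nn.Module):
    """One message-passing layer with mean aggregation:
    h_v^(k+1) = sigma(W · mean({h_u : u in N(v)}))."""

    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.W = nn.Linear(dim_in, dim_out)

    def forward(self, h, adj):
        # h:   (num_nodes, dim_in) node embeddings at layer k
        # adj: (num_nodes, num_nodes) dense 0/1 adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        neighbor_mean = (adj @ h) / deg                  # AGG = mean over N(v)
        return torch.relu(self.W(neighbor_mean))         # sigma = ReLU

# Toy usage on a 3-node graph.
adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 0.],
                    [1., 0., 0.]])
h = torch.randn(3, 8)
layer = MeanMessagePassing(8, 16)
print(layer(h, adj).shape)  # torch.Size([3, 16])
```

Stacking $K$ such layers gives each node a view of its $K$-hop neighborhood, exactly as described above.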
GraphSAGE: Inductive Representation Learning
Traditional GNNs like GCN (Graph Convolutional Network) require the entire graph in memory — impractical for millions of users and items.
GraphSAGE (Hamilton et al., 2017) addresses this by sampling a fixed-size set of neighbors and learning aggregation functions over node features — making it inductive (it can embed nodes never seen during training).
Its update rule:
$$ h_v^{(k+1)} = \sigma \Big( W^{(k)} \big[\, h_v^{(k)} \Vert \text{AGG}\big(\{h_u^{(k)} : u \in \mathcal{N}(v)\}\big) \big] \Big) $$

Here $\Vert$ denotes concatenation.
Key benefit:
- You can generate embeddings for new users or items without retraining the whole model — perfect for cold-start.
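A compact sketch of the GraphSAGE-style update in PyTorch, combining neighbor sampling with self/neighbor concatenation (the sampling budget and toy shapes are assumptions, not the paper's exact pipeline):

```python
import random
import torch
import torch.nn as nn

class SAGELayer(nn.Module):
    """Concatenate a node's own embedding with the mean of a *sampled*
    set of its neighbors, then project: h_v || AGG(sampled N(v))."""

    def __init__(self, dim_in, dim_out, num_samples=5):
        super().__init__()
        self.W = nn.Linear(2 * dim_in, dim_out)  # 2x because of concatenation
        self.num_samples = num_samples

    def forward(self, h, neighbors):
        # h: (num_nodes, dim_in); neighbors: list of neighbor-id lists
        agg = []
        for v, nbrs in enumerate(neighbors):
            sampled = random.sample(nbrs, min(self.num_samples, len(nbrs)))
            agg.append(h[sampled].mean(dim=0) if sampled
                       else torch.zeros(h.size(1)))
        agg = torch.stack(agg)
        return torch.relu(self.W(torch.cat([h, agg], dim=1)))  # [h_v || AGG]

# Toy usage: a new node only needs its features and sampled neighbors,
# which is what makes the approach inductive.
h = torch.randn(3, 8)
neighbors = [[1, 2], [0], [0]]
layer = SAGELayer(8, 16)
print(layer(h, neighbors).shape)  # torch.Size([3, 16])
```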
PinSage: Scalable GNN for Recommendations
PinSage (Ying et al., 2018) — developed at Pinterest — scales GNNs to billions of nodes.
How it works:
- Samples neighbors using random walks (biased toward more important nodes).
- Aggregates neighbor features with importance weights derived from those walk visit counts (frequently visited neighbors count more).
- Optimizes embeddings to bring positive pairs (clicked items) closer and negative pairs further apart.
Loss function (simplified contrastive objective):
$$ \mathcal{L} = -\log \frac{\exp(\text{sim}(h_u, h_i^+))}{\sum_{j \in \text{Neg}(u)} \exp(\text{sim}(h_u, h_j^-))} $$

where $\text{Neg}(u)$ is a set of sampled negative items for user $u$. PinSage learns from graph context instead of matrix rows — perfect for discovering niche, serendipitous items. 🌟
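A hedged PyTorch sketch of this objective, assuming dot-product similarity; unlike the simplified formula above, this common variant includes the positive score in the softmax denominator:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(h_u, h_pos, h_negs):
    """Simplified sampled-softmax loss: pull the user embedding toward the
    clicked item, push it away from sampled negatives.
    h_u: (d,), h_pos: (d,), h_negs: (num_negatives, d)."""
    pos = (h_u * h_pos).sum()                # sim = dot product (an assumption)
    negs = h_negs @ h_u                      # (num_negatives,) negative scores
    logits = torch.cat([pos.unsqueeze(0), negs])
    # Cross-entropy with the positive at index 0 equals
    # -log( exp(pos) / sum(exp(all scores)) ).
    return F.cross_entropy(logits.unsqueeze(0),
                           torch.zeros(1, dtype=torch.long))

# Toy usage with random embeddings of dimension 8 and 5 negatives.
loss = contrastive_loss(torch.randn(8), torch.randn(8), torch.randn(5, 8))
print(float(loss))
```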
📐 Step 3: Mathematical Foundation
Let’s condense the main mathematical pieces you should understand intuitively.
Graph Representation
Adjacency matrix $A$ captures the user–item graph:
$$ A_{ui} = \begin{cases} 1, & \text{if user } u \text{ interacted with item } i \\ 0, & \text{otherwise} \end{cases} $$

We learn embeddings $P$ (users) and $Q$ (items) through neighborhood propagation:
$$ P' = f(A, Q), \quad Q' = f(A^\top, P) $$

After message passing, user embeddings encode multi-hop relationships through $A^k$ (the $k$-hop connectivity).
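In code, one propagation round is just a degree-normalized matrix product. A NumPy sketch with invented shapes:

```python
import numpy as np

# Toy setup (all sizes invented): 2 users x 3 items, embedding dim 4.
A = np.array([[1., 1., 0.],
              [0., 1., 1.]])        # user-item interaction matrix
Q = np.random.randn(3, 4)           # item embeddings

# P' = f(A, Q): each user becomes the average of its items' embeddings.
P_new = (A / A.sum(axis=1, keepdims=True)) @ Q
# Q' = f(A^T, P): each item aggregates the users who interacted with it.
Q_new = (A.T / A.T.sum(axis=1, keepdims=True)) @ P_new
print(P_new.shape, Q_new.shape)     # (2, 4) (3, 4)
```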
Link Prediction Objective
Ultimately, recommendation = link prediction between user $u$ and item $i$:
$$ \hat{r}_{ui} = \sigma(h_u^\top h_i) $$

where $\sigma$ is the sigmoid function (or a softmax, when scoring a user against many items at once).
We train this via binary cross-entropy loss:
$$ \mathcal{L} = -\sum_{(u,i)} \big[ y_{ui} \log \hat{r}_{ui} + (1 - y_{ui}) \log(1 - \hat{r}_{ui}) \big] $$

The model learns to predict missing edges in the graph — i.e., potential new likes or purchases.
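Putting the two pieces together, here is a minimal PyTorch training step (the embedding tables, edge list, and labels are all toy assumptions):

```python
import torch
import torch.nn as nn

# Toy embedding tables (sizes invented): 4 users, 6 items, dimension 8.
user_emb = nn.Embedding(4, 8)
item_emb = nn.Embedding(6, 8)
opt = torch.optim.Adam(
    list(user_emb.parameters()) + list(item_emb.parameters()), lr=0.01)

# Observed edges (y = 1) plus sampled non-edges (y = 0).
users = torch.tensor([0, 0, 1, 2])
items = torch.tensor([1, 3, 0, 5])
labels = torch.tensor([1., 1., 1., 0.])

# r_hat_ui = sigmoid(h_u . h_i), trained with binary cross-entropy.
logits = (user_emb(users) * item_emb(items)).sum(dim=1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```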
🧠 Step 4: Assumptions or Key Ideas
- Graph Structure Encodes Context: Relationships (edges) are as important as nodes themselves.
- Homophily: Connected nodes likely share similar preferences.
- Inductive Reasoning: GraphSAGE and PinSage can infer embeddings for unseen users/items.
- Neighborhood Aggregation: Each node learns by averaging or attending to its neighbors.
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Captures higher-order connections (multi-hop reasoning).
- Handles cold-start better than traditional CF.
- Learns from both structure (edges) and attributes (node features).
- Scales efficiently with methods like GraphSAGE and PinSage.
Limitations:
- Graph construction can be expensive and memory-heavy.
- Sensitive to noisy edges (incorrect or random interactions).
- Training GNNs requires careful sampling and batching.
- Interpretation of embeddings is non-trivial.
🚧 Step 6: Common Misunderstandings
🚨 Common Misunderstandings (Click to Expand)
- “Graphs are only useful for social networks.” Nope — any recommender with relational data (users ↔ items) forms a natural graph.
- “Graph models eliminate cold-start completely.” They reduce it by leveraging neighbors — but still need some connections or metadata.
- “GNNs replace collaborative filtering.” They extend CF — retaining its intuition while adding multi-hop intelligence.
🧩 Step 7: Mini Summary
🧠 What You Learned: Graph-based recommenders represent users and items as nodes connected by interactions, learning from the structure of relationships.
⚙️ How It Works: Through message passing (GraphSAGE, PinSage), embeddings are updated by aggregating information from neighbors, enabling inductive reasoning and multi-hop learning.
🎯 Why It Matters: Graph-based models capture deep relational patterns and solve cold-start challenges — empowering next-generation, context-aware recommenders.