R-squared and Adjusted R-squared: Linear Regression
🪄 Step 1: Intuition & Motivation
Core Idea: Once we’ve trained our regression model, the next big question is: how good is it? That’s where R-squared and Adjusted R-squared come in. These metrics tell us how well our model explains the variation in the target — or in plain English, how much of the “story” our model actually captures.
Simple Analogy: Imagine your target variable ($y$) as a messy room. R-squared measures how much of that mess your model manages to organize neatly into labeled boxes (explained variance). Adjusted R-squared then walks in and says, “Hey, don’t just add random boxes that don’t help! I’ll only reward you if each new box actually helps tidy the room.”
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
R-squared quantifies how much of the variation in your data is captured by your model’s predictions.
Let’s define two quantities:
Total variation in $y$ (TSS) — how much $y$ varies overall:
$$ TSS = \sum (y_i - \bar{y})^2 $$

Residual variation (RSS) — how much of that variation the model failed to explain:
$$ RSS = \sum (y_i - \hat{y_i})^2 $$
Then, R-squared is just:
$$ R^2 = 1 - \frac{RSS}{TSS} $$

If $R^2 = 1$, your model explains everything (perfect fit).
If $R^2 = 0$, your model explains nothing (you might as well predict the mean).
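Here is a minimal sketch of these two quantities and the resulting $R^2$, assuming NumPy and scikit-learn are available (the numbers are made up purely for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative target values and some model's predictions (made-up numbers)
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

tss = np.sum((y - y.mean()) ** 2)   # total variation around the mean
rss = np.sum((y - y_hat) ** 2)      # variation the model failed to explain

print(1 - rss / tss)                # R^2 computed from the formula above
print(r2_score(y, y_hat))           # same value from scikit-learn
```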
Why It Works This Way
R-squared compares your model’s errors against a lazy baseline: just predicting the mean every time.
If your model reduces the error a lot compared to the mean, $R^2$ goes up.
But there’s a catch — R-squared never decreases, even if you add useless features.
That’s where Adjusted R-squared comes in — it penalizes unnecessary complexity.
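To see the catch in action, here is a small sketch on synthetic data, assuming scikit-learn's `LinearRegression`: appending a pure-noise column never lowers the training $R^2$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                                    # two informative features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

noise = rng.normal(size=(n, 1))                                # a column unrelated to y
X_plus_noise = np.hstack([X, noise])

r2_base  = LinearRegression().fit(X, y).score(X, y)
r2_noisy = LinearRegression().fit(X_plus_noise, y).score(X_plus_noise, y)

print(r2_base, r2_noisy)   # r2_noisy >= r2_base: training R^2 never goes down
```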
How It Fits in ML Thinking
Adjusted R-squared bakes a complexity penalty into the metric itself, making it the statistical ancestor of today’s regularization penalties (like L1 and L2).
📐 Step 3: Mathematical Foundation
R-squared ($R^2$)
$$ R^2 = 1 - \frac{RSS}{TSS} $$
Where:
- $RSS = \sum (y_i - \hat{y_i})^2$ (residual sum of squares)
- $TSS = \sum (y_i - \bar{y})^2$ (total sum of squares)
Interpretation:
$R^2$ tells you the proportion of total variance in $y$ that’s explained by the model.
If $R^2 = 0.8$, it means “80% of the variance in the outcome can be explained by the model.”
Think of $R^2$ as your model’s report card (each grade is illustrated in the sketch after this list):
- 1.0 → “You’ve explained everything!”
- 0.0 → “You didn’t explain anything new!”
- Negative → “You made things worse than just predicting the mean!”
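To make the report card concrete, here is a quick sketch with made-up numbers that hits all three grades:

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([2.0, 4.0, 6.0, 8.0])

print(r2_score(y, y))                           # 1.0  -> explained everything
print(r2_score(y, np.full_like(y, y.mean())))   # 0.0  -> no better than predicting the mean
print(r2_score(y, np.full_like(y, 20.0)))       # negative -> worse than predicting the mean
```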
Adjusted R-squared ($\bar{R}^2$)
$$ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} $$
Where:
- $n$ = number of samples
- $p$ = number of predictors (features)
Interpretation:
Adjusted R-squared corrects for adding too many features that don’t help.
When you add a useless feature, $R^2$ goes up slightly, but $\bar{R}^2$ drops, warning you that the improvement isn’t genuine.
“Don’t just bring more friends to the group project — unless they actually do work, I’ll lower your grade.”
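scikit-learn has no built-in adjusted R-squared, so here is a minimal sketch (synthetic data, hypothetical `adjusted_r2` helper) showing the penalty at work when useless features are added:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R^2 from plain R^2, n samples, and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 1))                       # one genuinely useful feature
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=n)

X_big = np.hstack([X, rng.normal(size=(n, 5))])   # plus five useless features

r2_small = LinearRegression().fit(X, y).score(X, y)
r2_big   = LinearRegression().fit(X_big, y).score(X_big, y)

print(r2_small, adjusted_r2(r2_small, n, p=1))
print(r2_big,   adjusted_r2(r2_big,   n, p=6))    # R^2 creeps up; the adjustment claws it back
```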
🧠 Step 4: Key Ideas and Assumptions
1️⃣ Variance-based interpretation:
R-squared assumes your data has measurable variance to begin with. If your target is almost constant, R-squared becomes meaningless (see the sketch after this list).
2️⃣ Monotonicity:
R-squared never decreases when adding features — but Adjusted R-squared can, if those features add noise instead of value.
3️⃣ Model context:
R-squared only compares models using the same dependent variable. It’s not absolute truth — it’s a relative measure of “fit.”
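Here is a tiny sketch, with made-up near-constant targets, of why point 1️⃣ matters: once TSS is almost zero, even a microscopic prediction error sends $R^2$ off a cliff.

```python
import numpy as np

def r2(y, y_hat):
    tss = np.sum((y - y.mean()) ** 2)
    rss = np.sum((y - y_hat) ** 2)
    return 1 - rss / tss

# A target that is almost constant (made-up values)
y = np.array([5.000001, 4.999999, 5.000002, 4.999998])

print(r2(y, np.full_like(y, 5.0)))    # ~0: predicting the mean
print(r2(y, np.full_like(y, 5.001)))  # enormously negative, despite being off by only 0.001
```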
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Easy to interpret — intuitive measure of “fit.”
- Adjusted R-squared discourages overfitting.
- Helpful for comparing nested models (one inside another).

Limitations:
- R-squared alone can mislead — it always increases with more variables.
- Doesn’t measure predictive accuracy on new data.
- Can’t compare across different targets or data scales.

Trade-off: a high R-squared means a good fit on the training data, but not always good generalization.
Use Adjusted R-squared (and cross-validation) for a reality check, as sketched below.
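As a sketch of that reality check, assuming scikit-learn: `cross_val_score` with `scoring="r2"` reports out-of-sample $R^2$ per fold, a far better guide to generalization than the training-set number.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=1.0, size=200)

model = LinearRegression()
print(model.fit(X, y).score(X, y))                       # in-sample (training) R^2
print(cross_val_score(model, X, y, scoring="r2", cv=5))  # out-of-sample R^2, one per fold
```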
🚧 Step 6: Common Misunderstandings
“A higher R-squared always means a better model.”
Not necessarily — you might just be overfitting with too many features.

“R-squared can tell you how accurate your predictions are.”
It can’t. It measures fit to the training data, not prediction performance (see the sketch below).

“R-squared can decrease with more variables.”
No — it can’t. But Adjusted R-squared can, and that’s the point.
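To illustrate the second point, here is a small sketch on synthetic data with a deliberately over-flexible polynomial fit; the training $R^2$ looks spectacular while the held-out $R^2$ usually tells a humbler story.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(scale=0.2, size=20)    # roughly linear data with noise

x_train, y_train = x[::2], y[::2]             # 10 points used for fitting
x_test,  y_test  = x[1::2], y[1::2]           # 10 points held out

coefs = np.polyfit(x_train, y_train, deg=9)   # degree-9 polynomial: hits every training point

print(r2_score(y_train, np.polyval(coefs, x_train)))  # training R^2: essentially perfect
print(r2_score(y_test,  np.polyval(coefs, x_test)))   # held-out R^2: typically much worse
```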
🧩 Step 7: Mini Summary
🧠 What You Learned: R-squared measures how much variance your model explains, while Adjusted R-squared penalizes unnecessary complexity.
⚙️ How It Works: R-squared compares your model’s errors to a naive mean predictor; Adjusted R-squared corrects for added features.
🎯 Why It Matters: These metrics teach you to balance fit with parsimony — the essence of good modeling.