Polynomial and Interaction Terms: Linear Regression
🪄 Step 1: Intuition & Motivation
- Core Idea: Linear Regression loves straight lines. But real-world relationships? They curve, bend, twist — sometimes even loop around. To capture these non-linear patterns without abandoning linearity in parameters, we use polynomial and interaction terms.
They let the model draw gentle curves and account for how features might work together — while still keeping the math simple and interpretable.
- Simple Analogy: Think of Linear Regression as a painter with only a ruler — great for straight strokes but terrible at waves. Polynomial terms hand that painter a flexible ruler — they can now draw smooth bends while still knowing the exact equation behind every stroke.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
We transform the input features so that a linear model can represent non-linear relationships.
Suppose your model is:
$$ y = \beta_0 + \beta_1x + \epsilon $$

This only fits a straight line.
But if the relationship between $x$ and $y$ is curved, we can add polynomial terms:

$$ y = \beta_0 + \beta_1x + \beta_2x^2 + \dots + \beta_kx^k + \epsilon $$

Now the model can represent curves and bends by combining these powered-up versions of $x$.
Each added power ($x^2$, $x^3$, …) adds flexibility — but also risk (overfitting).
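Here's a minimal sketch of that idea using scikit-learn (the toy data and the degree-2 choice are just for illustration): expand $x$ into $[x, x^2]$, then fit an ordinary linear model on the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data: a curved (roughly quadratic) relationship plus noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * x.ravel()**2 - x.ravel() + rng.normal(0, 0.5, size=100)

# Expand [x] -> [x, x^2]; the regression itself stays linear in the betas
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

lr = model.named_steps["linearregression"]
print(lr.intercept_, lr.coef_)   # approximately beta_0, beta_1, beta_2
```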
Interaction Terms
Interaction terms capture how two features combine their influence on the outcome.
For features $x_1$ and $x_2$:
$$ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3(x_1 \cdot x_2) + \epsilon $$

Here, $\beta_3$ represents the extra effect when both $x_1$ and $x_2$ act together.
This helps when one variable changes the impact of another (e.g., “age” modifies how “exercise hours” affect health).
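As a rough sketch (the feature names and simulated effect sizes here are invented for illustration), an interaction column is simply the elementwise product of the two features added to the design matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: exercise hours (x1) and age (x2)
rng = np.random.default_rng(1)
n = 200
exercise = rng.uniform(0, 10, n)
age = rng.uniform(20, 70, n)

# Simulated outcome in which the benefit of exercise shrinks with age
health = (50 + 3.0 * exercise - 0.2 * age
          - 0.04 * exercise * age + rng.normal(0, 2, n))

# Design matrix with x1, x2, and the interaction x1 * x2
X = np.column_stack([exercise, age, exercise * age])
model = LinearRegression().fit(X, health)

print(model.coef_)   # beta_1, beta_2, beta_3 (the interaction effect)
```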
Why It Works This Way
Even though the features are squared or multiplied together, the coefficients $\beta$ still enter the model linearly. That means we can still use all the familiar Linear Regression machinery — least squares, gradient descent, interpretability — while letting the model flex more naturally around real data curves.
How It Fits in ML Thinking
Polynomial and interaction terms are feature engineering in disguise — they expand the feature space to make simple models powerful.
In deep learning, this same idea is achieved automatically through non-linear activations and layer combinations.
In Linear Regression, you build the nonlinearity yourself.
📐 Step 3: Mathematical Foundation
Polynomial Regression
A polynomial regression of degree $k$ has the form:
$$ y = \beta_0 + \beta_1x + \beta_2x^2 + \dots + \beta_kx^k + \epsilon $$

The model fits curves up to order $k$.
The higher $k$, the more flexible (and wiggly) the curve.
Matrix Form:
$$ y = X^*\beta + \epsilon $$

where $X^*$ is the expanded design matrix containing powers of $x$:

$$ X^* = [1, x, x^2, x^3, \dots, x^k] $$

Interaction Terms
If you have two features, $x_1$ and $x_2$, an interaction term adds a new feature $x_1x_2$.
$$ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + \epsilon $$

Effectively, this allows the slope of one variable to depend on another variable’s value.
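To make the matrix form concrete, here is a small sketch with plain NumPy (the degree and data are arbitrary choices): build the expanded design matrix $X^*$ by hand and solve ordinary least squares directly.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 2, 50)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.1, x.size)

# Expanded design matrix X* = [1, x, x^2] for a degree-2 polynomial
k = 2
X_star = np.column_stack([x**p for p in range(k + 1)])

# Ordinary least squares: find beta minimizing ||X* beta - y||^2
beta, *_ = np.linalg.lstsq(X_star, y, rcond=None)
print(beta)   # estimates of beta_0, beta_1, beta_2
```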
🧠 Step 4: Key Ideas and Assumptions
1️⃣ Linearity in coefficients remains:
The model stays mathematically linear — only the features are nonlinear.
2️⃣ Feature scaling becomes crucial:
Polynomial features (like $x^2$, $x^3$) can explode in magnitude.
Always standardize before fitting (see the pipeline sketch after this list).
3️⃣ Higher-degree polynomials increase flexibility — and risk.
Beyond a certain degree, the model fits noise rather than signal (overfitting).
4️⃣ Interaction terms require interpretability care:
Coefficients now represent conditional effects, not isolated influences.
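A sketch that puts points 2️⃣ and 3️⃣ together (the data and degree grid are arbitrary): standardize the expanded features inside a pipeline, then watch the held-out error deteriorate as the degree grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(7)
x = np.linspace(-2, 2, 80).reshape(-1, 1)
y = np.sin(2 * x).ravel() + rng.normal(0, 0.2, 80)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 3, 9, 15):
    pipe = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         StandardScaler(),   # tames the exploding magnitudes of x^k
                         LinearRegression())
    pipe.fit(x_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_tr, pipe.predict(x_tr)):.3f}, "
          f"test MSE {mean_squared_error(y_te, pipe.predict(x_te)):.3f}")
```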
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Captures non-linear relationships while keeping interpretability.
- Enables richer feature interactions without complex algorithms.
- Works with regularization (Ridge, Lasso) to control complexity.

Limitations:
- Susceptible to overfitting — especially at high degrees.
- Introduces multicollinearity (strong correlation among powers).
- Harder to interpret when many interaction terms exist.
Polynomial terms add expressive power, but at the cost of simplicity.
Smart practitioners use regularization and cross-validation to find the balance.
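One way that balance is often found in practice (a sketch under arbitrary data assumptions, not a prescription): cross-validate jointly over the polynomial degree and a Ridge penalty.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
x = np.linspace(-2, 2, 120).reshape(-1, 1)
y = np.sin(2 * x).ravel() + rng.normal(0, 0.2, 120)

pipe = make_pipeline(PolynomialFeatures(include_bias=False),
                     StandardScaler(),
                     Ridge())

# Search jointly over curve flexibility (degree) and shrinkage strength (alpha)
param_grid = {
    "polynomialfeatures__degree": [1, 3, 5, 9, 15],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(x, y)

print(search.best_params_)   # the degree/alpha pair that balances fit and complexity
print(-search.best_score_)   # cross-validated MSE at that setting
```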
🚧 Step 6: Common Misunderstandings
“Polynomial regression isn’t linear regression.”
It is — because the coefficients ($\beta$) still appear linearly.

“Higher degree always means better fit.”
Not necessarily — beyond a certain point, you start fitting noise.

“Interaction terms are optional fluff.”
Nope — they can reveal deep relationships (like “exercise improves health more for younger people”).
🧩 Step 7: Mini Summary
🧠 What You Learned: Polynomial and interaction terms extend Linear Regression to model curves and feature dependencies while remaining linear in parameters.
⚙️ How It Works: Add powers of features (for curvature) or products of features (for interactions) — then fit with the same OLS machinery.
🎯 Why It Matters: This is the first step from simple linear models to more expressive, non-linear relationships — and the reason regularization and validation are essential.