Polynomial and Interaction Terms: Linear Regression
🪄 Step 1: Intuition & Motivation
- Core Idea: Linear Regression loves straight lines. But real-world relationships? They curve, bend, twist — sometimes even loop around. To capture these non-linear patterns without abandoning linearity in parameters, we use polynomial and interaction terms.
They let the model draw gentle curves and account for how features might work together — while still keeping the math simple and interpretable.
- Simple Analogy: Think of Linear Regression as a painter with only a ruler — great for straight strokes but terrible at waves. Polynomial terms hand that painter a flexible ruler — they can now draw smooth bends while still knowing the exact equation behind every stroke.
🌱 Step 2: Core Concept
What’s Happening Under the Hood?
We transform the input features so that a linear model can represent non-linear relationships.
Suppose your model is:
$$ y = \beta_0 + \beta_1x + \epsilon $$

This only fits a straight line.
But if the relationship between $x$ and $y$ is curved, we can add polynomial terms:

$$ y = \beta_0 + \beta_1x + \beta_2x^2 + \dots + \beta_kx^k + \epsilon $$

Now the model can represent curves and bends by combining these powered-up versions of $x$.
Each added power ($x^2$, $x^3$, …) adds flexibility — but also risk (overfitting).
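Here's a minimal sketch of that idea using scikit-learn (the toy data and the degree-2 choice are just for illustration): expand $x$ into $[x, x^2]$, then fit an ordinary linear model on the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data: a curved (roughly quadratic) relationship plus noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * x.ravel()**2 - x.ravel() + rng.normal(0, 0.5, size=100)

# Expand [x] -> [x, x^2]; the regression itself stays linear in the betas
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

lr = model.named_steps["linearregression"]
print(lr.intercept_, lr.coef_)   # approximately beta_0, beta_1, beta_2
```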
Interaction Terms
Interaction terms capture how two features combine their influence on the outcome.
For features $x_1$ and $x_2$:
$$ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3(x_1 \cdot x_2) + \epsilon $$

Here, $\beta_3$ represents the extra effect when both $x_1$ and $x_2$ act together.
This helps when one variable changes the impact of another (e.g., “age” modifies how “exercise hours” affect health).
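As a rough sketch (the feature names and simulated effect sizes here are invented for illustration), an interaction column is simply the elementwise product of the two features added to the design matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: exercise hours (x1) and age (x2)
rng = np.random.default_rng(1)
n = 200
exercise = rng.uniform(0, 10, n)
age = rng.uniform(20, 70, n)

# Simulated outcome in which the benefit of exercise shrinks with age
health = (50 + 3.0 * exercise - 0.2 * age
          - 0.04 * exercise * age + rng.normal(0, 2, n))

# Design matrix with x1, x2, and the interaction x1 * x2
X = np.column_stack([exercise, age, exercise * age])
model = LinearRegression().fit(X, health)

print(model.coef_)   # beta_1, beta_2, beta_3 (the interaction effect)
```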
Why It Works This Way
Even though the features are squared or multiplied together, the coefficients $\beta$ still enter the model linearly. That means we can still use all the familiar Linear Regression machinery — least squares, gradient descent, interpretability — while letting the model flex more naturally around real data curves.
How It Fits in ML Thinking
Polynomial and interaction terms are feature engineering in disguise — they expand the feature space to make simple models powerful.
In deep learning, this same idea is achieved automatically through non-linear activations and layer combinations.
In Linear Regression, you build the nonlinearity yourself.
📐 Step 3: Mathematical Foundation
Polynomial Regression
A polynomial regression of degree $k$ has the form:
$$ y = \beta_0 + \beta_1x + \beta_2x^2 + \dots + \beta_kx^k + \epsilon $$

The model fits curves up to order $k$.
The higher $k$, the more flexible (and wiggly) the curve.
Matrix Form:
$$ y = X^*\beta + \epsilon $$

where $X^*$ is the expanded design matrix containing powers of $x$:

$$ X^* = [1, x, x^2, x^3, \dots, x^k] $$

Interaction Terms
If you have two features, $x_1$ and $x_2$, an interaction term adds a new feature $x_1x_2$.
$$ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + \epsilon $$

Effectively, this allows the slope of one variable to depend on another variable’s value.
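To make the matrix form concrete, here is a small sketch with plain NumPy (the degree and data are arbitrary choices): build the expanded design matrix $X^*$ by hand and solve ordinary least squares directly.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 2, 50)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.1, x.size)

# Expanded design matrix X* = [1, x, x^2] for a degree-2 polynomial
k = 2
X_star = np.column_stack([x**p for p in range(k + 1)])

# Ordinary least squares: find beta minimizing ||X* beta - y||^2
beta, *_ = np.linalg.lstsq(X_star, y, rcond=None)
print(beta)   # estimates of beta_0, beta_1, beta_2
```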
🧠 Step 4: Key Ideas and Assumptions
1️⃣ Linearity in coefficients remains:
The model stays mathematically linear — only the features are nonlinear.
2️⃣ Feature scaling becomes crucial:
Polynomial features (like $x^2$, $x^3$) can explode in magnitude.
Always standardize before fitting (see the pipeline sketch after this list).
3️⃣ Higher-degree polynomials increase flexibility — and risk.
Beyond a certain degree, the model fits noise rather than signal (overfitting).
4️⃣ Interaction terms require interpretability care:
Coefficients now represent conditional effects, not isolated influences.
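A sketch that puts points 2️⃣ and 3️⃣ together (the data and degree grid are arbitrary): standardize the expanded features inside a pipeline, then watch the held-out error deteriorate as the degree grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(7)
x = np.linspace(-2, 2, 80).reshape(-1, 1)
y = np.sin(2 * x).ravel() + rng.normal(0, 0.2, 80)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 3, 9, 15):
    pipe = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                         StandardScaler(),   # tames the exploding magnitudes of x^k
                         LinearRegression())
    pipe.fit(x_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_tr, pipe.predict(x_tr)):.3f}, "
          f"test MSE {mean_squared_error(y_te, pipe.predict(x_te)):.3f}")
```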
⚖️ Step 5: Strengths, Limitations & Trade-offs
Strengths:
- Captures non-linear relationships while keeping interpretability.
- Enables richer feature interactions without complex algorithms.
- Works with regularization (Ridge, Lasso) to control complexity.

Limitations:
- Susceptible to overfitting — especially at high degrees.
- Introduces multicollinearity (strong correlation among powers).
- Harder to interpret when many interaction terms exist.
Polynomial terms add expressive power, but at the cost of simplicity.
Smart practitioners use regularization and cross-validation to find the balance.
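One way that balance is often found in practice (a sketch under arbitrary data assumptions, not a prescription): cross-validate jointly over the polynomial degree and a Ridge penalty.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
x = np.linspace(-2, 2, 120).reshape(-1, 1)
y = np.sin(2 * x).ravel() + rng.normal(0, 0.2, 120)

pipe = make_pipeline(PolynomialFeatures(include_bias=False),
                     StandardScaler(),
                     Ridge())

# Search jointly over curve flexibility (degree) and shrinkage strength (alpha)
param_grid = {
    "polynomialfeatures__degree": [1, 3, 5, 9, 15],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(x, y)

print(search.best_params_)   # the degree/alpha pair that balances fit and complexity
print(-search.best_score_)   # cross-validated MSE at that setting
```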
🚧 Step 6: Common Misunderstandings
“Polynomial regression isn’t linear regression.”
It is — because the coefficients ($\beta$) still appear linearly.

“Higher degree always means better fit.”
Not necessarily — beyond a certain point, you start fitting noise.

“Interaction terms are optional fluff.”
Nope — they can reveal deep relationships (like “exercise improves health more for younger people”).
🧩 Step 7: Mini Summary
🧠 What You Learned: Polynomial and interaction terms extend Linear Regression to model curves and feature dependencies while remaining linear in parameters.
⚙️ How It Works: Add powers of features (for curvature) or products of features (for interactions) — then fit with the same OLS machinery.
🎯 Why It Matters: This is the first step from simple linear models to more expressive, non-linear relationships — and the reason regularization and validation are essential.