1. Linear Regression
Flashcards
- Scikit-learn models are trained with the `.fit()` method.
Short Theories
Regression predicts continuous outputs; classification predicts discrete labels.
Overfitting = high variance, underfitting = high bias.
Gradient descent iteratively adjusts weights to minimize the cost function.
Batch GD uses all data per step; SGD uses one sample; mini-batch balances both.
Scikit-learn's `LinearRegression` uses OLS; `SGDRegressor` uses iterative GD.
Interview Q&A
Q1: What is Linear Regression and when is it used?
TL;DR: Linear Regression models the relationship between input features and a continuous target by fitting a linear function.
Conceptual Explanation
It assumes that the dependent variable can be expressed as a weighted sum of independent variables plus an error term. Think of it like drawing the best straight line through a scatter plot of data.
Technical / Math Details
Univariate case:
$$ \hat{y} = w_0 + w_1 x $$
Multivariate case:
$$ \hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n $$
Where $w_0$ is the bias, $w_1, w_2, \dots, w_n$ are the weights, and $x$ is the feature vector.
Trade-offs & Production Notes
- Fast, interpretable, baseline model.
- Struggles with non-linear relationships.
- Sensitive to outliers.
Common Pitfalls
- Forgetting to scale features before applying GD.
- Ignoring multicollinearity.
Interview-ready Answer
“Linear Regression predicts continuous outcomes by fitting a linear equation between features and target; it's simple, interpretable, but limited for non-linear data.”
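A minimal sketch of Q1 in practice, fitting a univariate model with scikit-learn's `LinearRegression`; the data and coefficients below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 3 + 2x plus a little noise (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))           # one feature, shape (n_samples, 1)
y = 3 + 2 * X[:, 0] + rng.normal(0, 0.5, 50)   # continuous target

model = LinearRegression().fit(X, y)           # fits the best straight line
print(model.intercept_, model.coef_)           # roughly 3 and [2]
print(model.predict([[4.0]]))                  # roughly [11]
```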
Q2: Explain the cost function in Linear Regression.
TL;DR: The cost function (MSE) measures average squared error between predicted and true values; we minimize it.
Conceptual Explanation
The cost function quantifies model error. By minimizing it, the model finds the line of best fit.
Technical / Math Details
$$ J(w) = \frac{1}{2m} \sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2 $$
- $m$: number of samples
- $\hat{y}^{(i)}$: predicted value
- $y^{(i)}$: true value
Trade-offs & Production Notes
- Convex → guarantees global minimum.
- Sensitive to outliers since errors are squared.
Common Pitfalls
- Confusing cost with evaluation metrics.
- Using non-scaled features → slow GD convergence.
Interview-ready Answer
“The cost function in Linear Regression is mean squared error, which penalizes large deviations by squaring residuals.”
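A quick NumPy sketch of the cost above; note it uses the $\frac{1}{2m}$ convention from the formula, which differs from scikit-learn's `mean_squared_error` (no factor of 2):

```python
import numpy as np

def cost(y_hat: np.ndarray, y: np.ndarray) -> float:
    """MSE cost J(w) = 1/(2m) * sum of squared residuals."""
    m = len(y)
    return float(np.sum((y_hat - y) ** 2) / (2 * m))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])
print(cost(y_pred, y_true))  # (0.25 + 0 + 1) / (2 * 3) = 0.2083...
```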
Q3: How does Gradient Descent work in Linear Regression?
TL;DR: Gradient descent iteratively updates weights by moving in the opposite direction of the gradient of the cost function.
Conceptual Explanation
It's like walking downhill blindfolded: each step follows the slope until you reach the valley (minimum cost).
Technical / Math Details
Update rule:
$$ w_j := w_j - \alpha \frac{\partial J}{\partial w_j} $$
- $\alpha$: learning rate
- $\frac{\partial J}{\partial w_j}$: gradient
Trade-offs & Production Notes
- Batch GD: stable, but slow for large data.
- SGD: faster, but noisier.
- Mini-batch: balance between both.
Common Pitfalls
- Learning rate too small → slow.
- Too large → divergence.
Interview-ready Answer
“Gradient Descent reduces error by updating weights opposite the gradient of the cost until convergence.”
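A minimal batch gradient descent sketch implementing the update rule above; learning rate, iteration count, and data are illustrative:

```python
import numpy as np

def batch_gd(X, y, alpha=0.05, n_iters=2000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend bias column x_0 = 1
    w = np.zeros(n + 1)
    for _ in range(n_iters):
        y_hat = Xb @ w
        grad = Xb.T @ (y_hat - y) / m      # gradient of J(w) under the 1/(2m) convention
        w -= alpha * grad                  # step opposite the gradient
    return w

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 3 + 2 * X[:, 0] + rng.normal(0, 0.1, 100)
print(batch_gd(X, y))                      # roughly [3, 2]
```

Replacing the full-data pass with a single random sample per step gives SGD; a small random chunk per step gives mini-batch.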
Q4: What's the difference between Univariate and Multivariate Linear Regression?
TL;DR: Univariate uses one feature; multivariate uses multiple features to predict the target.
Conceptual Explanation
Univariate draws a line in 2D space; multivariate draws a hyperplane in higher dimensions.
Technical / Math Details
- Univariate:
$$ \hat{y} = w_0 + w_1 x $$
- Multivariate:
$$ \hat{y} = w^T x $$
Trade-offs & Production Notes
- Multivariate captures richer relationships.
- Risk of multicollinearity with many features.
Common Pitfalls
- Using too many irrelevant features → overfitting.
Interview-ready Answer
“Univariate Linear Regression predicts with one feature, while multivariate uses multiple features, forming a hyperplane.”
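A short sketch of the same estimator fit with one feature versus several; the synthetic data and coefficients are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))               # three features
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * X[:, 2]

uni = LinearRegression().fit(X[:, :1], y)   # univariate: a line, one weight plus bias
multi = LinearRegression().fit(X, y)        # multivariate: a hyperplane, one weight per feature
print(uni.coef_.shape, multi.coef_.shape)   # (1,) vs (3,)
print(multi.intercept_, multi.coef_)        # roughly 1.0 and [2.0, -0.5, 0.1]
```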
Q5: Explain Overfitting vs Underfitting in Regression.
TL;DR: Overfitting → memorizes training data; underfitting → too simple to learn patterns.
Conceptual Explanation
Overfit = model too complex, performs poorly on new data. Underfit = model too simple, fails even on training data.
Technical / Math Details
- Overfit: high variance.
- Underfit: high bias.
Trade-offs & Production Notes
- Use regularization (L1/L2) to combat overfitting.
- Add features or complexity to reduce underfitting.
Common Pitfalls
- Evaluating only on training set.
Interview-ready Answer
“Overfitting means the model is too complex and generalizes poorly; underfitting means it's too simple to capture patterns.”
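One way to see both failure modes is to vary model complexity and compare train vs test scores; a sketch using polynomial degree as the complexity knob (degrees, sizes, and noise level are arbitrary):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 60)          # non-linear, noisy target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):                             # too simple / reasonable / too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))
# Degree 1: low R^2 on both sets (underfit, high bias).
# Degree 15: high train R^2 but noticeably worse test R^2 (overfit, high variance).
```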
Q6: How does Scikit-learn implement Linear Regression?
TL;DR: `LinearRegression` uses closed-form OLS; `SGDRegressor` uses iterative optimization.
Conceptual Explanation
Scikit-learn provides both analytical and iterative implementations for Linear Regression, making it easy to use in practice.
Technical / Math Details
- `LinearRegression`: solves $$ w = (X^T X)^{-1} X^T y $$
- `SGDRegressor`: applies gradient descent updates.
Trade-offs & Production Notes
- OLS is fast for small/medium data.
- SGD scales better for huge datasets.
Common Pitfalls
- Not normalizing input data before SGD.
Interview-ready Answer
“Scikit-learn's LinearRegression uses closed-form OLS, while SGDRegressor applies gradient descent for scalability.”
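A minimal side-by-side of the two estimators; the scaling step addresses the pitfall above, and the data and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 4.0 + rng.normal(0, 0.1, 1000)

ols = LinearRegression().fit(X, y)                     # closed-form OLS
sgd = make_pipeline(StandardScaler(),                  # scale features before SGD
                    SGDRegressor(max_iter=1000, tol=1e-4, random_state=0)).fit(X, y)

print(ols.coef_)                                       # roughly [1, -2, 0.5, 0, 3]
print(ols.score(X, y), sgd.score(X, y))                # both R^2 close to 1.0
```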
Key Formulas
Univariate Regression Equation
$$ \hat{y} = w_0 + w_1 x $$
- $w_0$: bias/intercept
- $w_1$: slope/weight
- $x$: input feature
Interpretation: Models output as a straight line function of input.
Multivariate Regression Equation
$$ \hat{y} = w^T x $$
- $w$: weight vector
- $x$: feature vector (with bias term $x_0=1$)
Interpretation: Generalizes line to a hyperplane in higher dimensions.
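A small NumPy sketch of the $\hat{y} = w^T x$ form with an explicit bias column, solving for $w$ via the closed-form OLS expression from Q6; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1]        # noise-free so the check is exact

Xb = np.hstack([np.ones((len(X), 1)), X])      # add x_0 = 1 so w_0 acts as the bias
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)       # w = (X^T X)^{-1} X^T y, without an explicit inverse
print(w)                                       # roughly [4.0, 1.5, -2.0]
print(Xb[0] @ w, y[0])                         # y_hat = w^T x reproduces the target
```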
Mean Squared Error (Cost Function)
$$ J(w) = \frac{1}{2m} \sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2 $$
- $m$: number of samples
- $y^{(i)}$: true output
- $\hat{y}^{(i)}$: predicted output
Interpretation: Average squared deviation between predictions and truth; convex, easy to optimize.
Gradient Descent Update Rule
$$ w_j := w_j - \alpha \frac{\partial J}{\partial w_j} $$
- $\alpha$: learning rate
- $\frac{\partial J}{\partial w_j}$: gradient for parameter $w_j$
Interpretation: Iteratively adjusts weights to minimize cost function.
Cheatsheet
- Regression predicts continuous outputs; classification → discrete.
- Overfitting = high variance, underfitting = high bias.
- Univariate: one feature → straight line; multivariate: multiple features → hyperplane.
- Cost function: MSE, convex, optimized via gradient descent.
- Batch vs SGD vs Mini-batch: trade-off between stability and speed.
- Scikit-learn: `LinearRegression` (OLS), `SGDRegressor` (gradient descent).