Linear Regression

Theoretical Understanding:

  • Regression fits a line (or curve) through the data points on a target-predictor graph in such a way that the vertical distances between the data points and the regression line are as small as possible.

  • In simple words, linear regression tries to find the best fit line, i.e. the one with the minimum residuals (actual values − predicted values), by updating the intercept and slope using gradient descent.


Hypothesis function

y = θ_0 + θ_1 x

  • In the above equation, y is the target variable, x is the independent variable, θ_0 is the intercept of the line, and θ_1 is its slope (a minimal sketch follows this list).
  • Now we have to update θ_0 and θ_1 so that the error is minimized and we get the best fit line for our linear regression model. The θ parameters are updated with the help of Gradient Descent.
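As a minimal sketch (plain NumPy; the sample values are illustrative), the hypothesis function is simply:

```python
import numpy as np

def hypothesis(x, theta0, theta1):
    """Predicted y for input x, given intercept theta0 and slope theta1."""
    return theta0 + theta1 * x

x = np.array([1.0, 2.0, 3.0])
print(hypothesis(x, 2.0, 0.5))  # [2.5 3.  3.5]
```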

### How to minimize the error?

  • Here comes the role of the cost function, which is given by:

J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x_i) − y_i)²

where h_θ(x_i) = θ_0 + θ_1 x_i is the prediction for the i-th of the m training examples.

  • To get the best fit line we have to minimize the error our model makes during training, which is done by minimizing the above cost function. The values of θ_0 and θ_1 that give the minimum error are found using the Gradient Descent method. A sketch of the cost computation follows.
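A minimal NumPy sketch of this cost computation (the 1/(2m) scaling follows the formula above; the sample values are illustrative):

```python
import numpy as np

def cost(x, y, theta0, theta1):
    """J(theta0, theta1): squared-error cost with the 1/(2m) convention."""
    m = len(y)
    residuals = (theta0 + theta1 * x) - y
    return np.sum(residuals ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.5, 3.5])
print(cost(x, y, 2.0, 0.5))  # ≈ 0.083
```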

What is gradient descent?

  • To update the θ_0 and θ_1 values in order to reduce the cost function (minimizing the MSE) and achieve the best fit line, the model uses Gradient Descent. The idea is to start with random θ_0 and θ_1 values and then iteratively update them until the minimum cost (the global minimum) is reached, as sketched below.

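A minimal gradient descent sketch for simple linear regression (the learning rate and iteration count are illustrative choices, not tuned values):

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit y = theta0 + theta1 * x by stepping down the gradient of J."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0  # arbitrary starting values
    for _ in range(n_iters):
        error = (theta0 + theta1 * x) - y
        # partial derivatives of the 1/(2m)-scaled squared-error cost
        theta0 -= lr * error.sum() / m
        theta1 -= lr * (error * x).sum() / m
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated by y = 1 + 2x
print(gradient_descent(x, y))  # approaches (1.0, 2.0)
```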

Important Interview Topics in Linear Regression

1. What Are the Basic Assumptions?

There are four assumptions associated with a linear regression model; a sketch for checking them on the residuals follows the list:

  1. Linearity: The relationship between X and the mean of Y is linear.
  2. Homoscedasticity: The variance of the residuals is the same for any value of X.
  3. Independence: Observations are independent of each other.
  4. Normality: For any fixed value of X, Y is normally distributed.
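These assumptions are usually checked on the residuals after fitting. A minimal sketch (the data is synthetic and illustrative; scipy's Shapiro-Wilk test is one common normality check):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 100)  # synthetic linear data

# Fit a least-squares line and compute the residuals
slope, intercept, *_ = stats.linregress(x, y)
residuals = y - (intercept + slope * x)

# Normality: Shapiro-Wilk on the residuals (a large p-value is consistent with normality)
print(stats.shapiro(residuals))

# Homoscedasticity and linearity are commonly eyeballed on a residuals-vs-x plot:
# the spread should look constant and patternless across x
```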

2. Advantages

  1. Linear regression performs exceptionally well when the relationship between the features and the target is linear
  2. Easy to implement and train
  3. Overfitting can be reduced using dimensionality reduction, cross-validation, and regularization (see the sketch after this list)
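For point 3, a minimal scikit-learn sketch of regularization combined with cross-validation (Ridge regression with an illustrative alpha, on synthetic data):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=100)

# Ridge adds an L2 penalty on the coefficients, which curbs overfitting
model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean())  # cross-validated R^2
```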

3. Disadvantages

  1. Sometimes a lot of feature engineering is required
  2. If the independent features are correlated (multicollinearity), it may affect performance
  3. It is quite sensitive to noise and prone to overfitting

4. Is Feature Scaling Required?

  • Yes. Scaling is not strictly needed for the closed-form solution, but it speeds up gradient descent convergence and matters once regularization is applied, as sketched below.
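A minimal scikit-learn sketch of scaling inside a pipeline, so the scaler is fit only on the training data (SGDRegressor, a gradient-descent-based linear model, benefits most; the data is synthetic):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(200, 3))  # features on a large, uneven scale
y = X @ np.array([0.1, 0.02, 0.3]) + rng.normal(size=200)

# Standardizing inside the pipeline keeps train/predict scaling consistent
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000))
model.fit(X, y)
print(model.predict(X[:3]))
```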

5. Impact of Missing Values?

  • Linear regression is sensitive to missing values; rows containing them must be imputed or dropped before fitting, as sketched below.
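A minimal sketch of mean imputation before fitting (the strategy choice and the tiny dataset are illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# LinearRegression cannot fit rows containing NaN, so impute first
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
print(LinearRegression().fit(X_filled, y).predict(X_filled))
```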

6. Impact of Outliers?

  • Linear regression requires the relationship between the independent and dependent variables to be linear. It is also important to check for outliers, since the squared-error loss gives points far from the line a disproportionately large influence on the fit; a simple screening sketch follows.

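A minimal sketch of screening for outliers with the interquartile-range rule (the 1.5 × IQR threshold is a common convention, not a requirement):

```python
import numpy as np

y = np.array([2.1, 2.3, 2.2, 2.4, 9.8, 2.3])  # 9.8 looks suspicious

q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
mask = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)
print(y[mask])  # values kept after dropping outliers
```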

Types of Problems it can solve (Supervised)

  1. Regression

Overfitting

  • In linear regression, overfitting occurs when the model is “too complex”. This usually happens when the number of parameters is large compared to the number of observations. Such a model will not generalise well to new data: it performs well on training data but poorly on test data. The sketch below illustrates this.
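A minimal sketch of the effect: a degree-9 polynomial (10 parameters for 10 training points) fits the training data almost perfectly but scores much worse on held-out data (sizes and degree are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 10).reshape(-1, 1)
y_train = 2 * x_train.ravel() + rng.normal(scale=0.1, size=10)
x_test = rng.uniform(0, 1, 50).reshape(-1, 1)
y_test = 2 * x_test.ravel() + rng.normal(scale=0.1, size=50)

# Many parameters relative to observations -> near-perfect train fit
model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
model.fit(x_train, y_train)
print("train R^2:", model.score(x_train, y_train))  # ~1.0
print("test  R^2:", model.score(x_test, y_test))    # noticeably lower
```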

Different Problem Statements you can solve using Linear Regression

  1. Advanced House Price Prediction
  2. Flight Price Prediction

Practical Implementation

  1. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
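A minimal usage sketch of the class linked above (the data is synthetic and illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 1.0 + 2.0 * X.ravel() + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # close to 1.0 and [2.0]
print(model.predict([[5.0]]))         # prediction for x = 5
```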