Simple and Multiple Linear Regression for Beginners

Thomas Tam
6 min read · Sep 10, 2020


Linear Regression is a Machine Learning algorithm based on Supervised Learning. It models the linear relationship between one or more predictor variables and a continuous target variable. By learning the relationship between the predictors and the target, we can predict target values. To give an example, based on certain house features (predictors) such as the number of bedrooms and total square feet, we can predict house prices (target)!

Types of Variables

Independent Variables (features): standalone variables that are not affected by the other variables in the model; they have a direct effect on the dependent variable

Dependent Variable (target): the variable we are trying to predict; its value depends on, and is directly affected by, the independent variables

Example: income (dependent) depends on other features (independent) such as education level, age, and marital status

What is Regression?

Regression: statistical method used to understand the relationships between variables

Simple Linear Regression: uses a single feature to model a linear relationship with a target variable

Multiple Linear Regression: uses multiple features to model a linear relationship with a target variable

Simple Linear Regression

Let’s start off with simple linear regression since that’s the easiest. We have a set of data points where we plot the independent variable on the X-axis and the dependent variable on the Y-axis.

The straight line is the “line of best fit” as it best describes the relationship between the independent and dependent variable. Once we have the line of best fit, we can start predicting!

The goal of regression analysis is to fit the one line, out of an infinite number of possible lines, that best describes the data.

The equation for a simple linear regression is shown below.
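In standard notation, the two forms are typically written as:

y = mx + b

Ŷ = B̂0 + B̂1·X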

The first equation should look familiar — we learned this in Algebra! The second equation is an alternative form of the first; it can be written either way and will give the same result. Let’s break down the formula:

  • Y is the dependent variable we’re predicting for
  • X is the independent variable
  • B1 is the slope, which determines the angle of the line
  • B0 is the intercept, which is a constant determining the value of y when x is 0

What do those carets above each variable mean? That’s “hat” notation, which stands for estimates. Because we can’t expect the model to predict the true values exactly, we can only estimate the predictions.

Once we calculate the regression coefficients, the slope and intercept, we can plug in any value of X to get a predicted Y.
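As a minimal sketch of how this works in practice (using scikit-learn and made-up house data, so the numbers are purely illustrative), we can fit a simple linear regression and read off the slope and intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: total square feet (X) and house price (Y)
X = np.array([[1100], [1400], [1800], [2200], [2600]])  # 2D: one column per feature
y = np.array([199_000, 245_000, 310_000, 366_000, 421_000])

model = LinearRegression()
model.fit(X, y)  # estimates the intercept (B0) and slope (B1)

print("Intercept (B0):", model.intercept_)
print("Slope (B1):", model.coef_[0])

# Plug in a new X (a 2,000 sq ft house) to get a predicted Y
print("Predicted price:", model.predict([[2000]])[0])
```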

Multiple Linear Regression

When working with multiple independent variables, we’re still trying to find a relationship between the features and the target variable. The only difference is that there are more features we need to deal with.

With only two features, years of education and seniority, we can still plot the data on a 3D plane. Imagine if we had more than three features: visualizing a multiple linear model starts becoming difficult. No need to be frightened, though; let’s look at the equation and things will start becoming familiar.

The equation for a multiple linear regression is shown below.
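In standard notation, it extends the simple equation with one term per feature:

Ŷ = B̂0 + B̂1·X1 + B̂2·X2 + … + B̂n·Xn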

n stands for the number of features (independent variables)

If we look at the first half of the equation, it’s exactly the same as the simple linear regression equation! What does the other half of the equation mean? Well, those are just the added features! If we add more features, our equation becomes bigger.
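A rough sketch of the same idea in code (again with made-up data; the two features here, total square feet and number of bedrooms, are just hypothetical examples): the scikit-learn call is identical, we simply pass more columns and get one coefficient per feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: [total square feet, number of bedrooms] -> house price
X = np.array([
    [1100, 2],
    [1400, 3],
    [1800, 3],
    [2200, 4],
    [2600, 4],
])
y = np.array([199_000, 245_000, 310_000, 366_000, 421_000])

model = LinearRegression().fit(X, y)

print("Intercept (B0):", model.intercept_)
print("Coefficients (B1, B2):", model.coef_)  # one per feature

# Predict the price of a 2,000 sq ft, 3-bedroom house
print("Predicted price:", model.predict([[2000, 3]])[0])
```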

Multicollinearity of Features

Because we’re dealing with more features, we run into something called multicollinearity. Let’s say we’re predicting house prices again and we have several features such as total square feet and basement square feet. Wait a minute… isn’t basement square feet similar to total square feet? These are most likely correlated with each other, and that is a problem in regression.

Multicollinearity: when independent variables are correlated with each other

Why is this a problem? Independent variables should be independent of each other. When independent variables are correlated, changes in one variable are associated with shifts in another, so the model cannot cleanly attribute the effect on the target to either feature, and the coefficient estimates become unstable and hard to interpret.

How do we deal with multicollinearity? We need to check how correlated the independent variables are with each other, for example by plotting them against each other or by computing a correlation matrix. If two features are highly correlated, we should remove the less important one.
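A minimal sketch of that check, assuming the features live in a pandas DataFrame (the column names and values below are made up):

```python
import pandas as pd

# Hypothetical feature data
df = pd.DataFrame({
    "total_sqft":    [1100, 1400, 1800, 2200, 2600],
    "basement_sqft": [500, 650, 900, 1100, 1300],
    "bedrooms":      [2, 3, 3, 4, 4],
})

# Pairwise correlations between the independent variables
corr = df.corr()
print(corr)

# Flag feature pairs with a high absolute correlation (the threshold is a judgment call)
threshold = 0.9
for i, col_a in enumerate(corr.columns):
    for col_b in corr.columns[i + 1:]:
        if abs(corr.loc[col_a, col_b]) > threshold:
            print(f"Highly correlated: {col_a} and {col_b} ({corr.loc[col_a, col_b]:.2f})")
```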

Assumptions for Linear Regression

In order to get the best results or best estimates for the regression model, we need to satisfy a few assumptions. If they are not satisfied, you might not be able to trust the results.

Linearity

  • The linearity assumption requires that there is a linear relationship between the dependent (Y) and independent (X) variables
  • Why is this important? If there is no trend or pattern, it will result in bad predictions.

Normality

  • The normality assumption states that the model residuals should follow a normal distribution
  • Why is this important? If the residuals are not normal, the model’s estimates and the statistical tests built on them (confidence intervals, p-values) become unreliable.
(Example plots: residuals that are normal vs. residuals that are not normal, skewed to the right.)

Homoscedasticity

  • The homoscedasticity assumption requires the residuals to have a roughly equal spread (variance) across the regression line.
  • Heteroscedasticity is the inverse, meaning the spread of the residuals is unequal across the regression line.
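One way to eyeball the normality and homoscedasticity assumptions is to inspect the residuals directly. Here is a rough sketch (reusing the made-up house data from the earlier snippets, so the plots are only illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

# Hypothetical data and fit (same made-up house data as the earlier sketch)
X = np.array([[1100], [1400], [1800], [2200], [2600]])
y = np.array([199_000, 245_000, 310_000, 366_000, 421_000])
model = LinearRegression().fit(X, y)

fitted = model.predict(X)
residuals = y - fitted  # actual Y minus predicted Y

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: roughly bell-shaped residuals support the normality assumption
axes[0].hist(residuals, bins=5)
axes[0].set_title("Residual histogram")

# Q-Q plot: points close to the diagonal also support normality
stats.probplot(residuals, plot=axes[1])
axes[1].set_title("Q-Q plot")

# Residuals vs. fitted: a constant band (no funnel shape) supports homoscedasticity
axes[2].scatter(fitted, residuals)
axes[2].axhline(0, linestyle="--")
axes[2].set_title("Residuals vs. fitted")

plt.tight_layout()
plt.show()
```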

How do we interpret the line of best fit?

Coefficient of Determination (R²): A statistical measure that is used to assess the goodness of fit of a regression model. Must be a value between 0 and 1.

We have below the same plot as above with an added “baseline model”. R² compares the line of best fit to this baseline model, which is a horizontal line at the mean of the observed values of the dependent variable.

Is our fitted regression line better than our baseline model?

  • Residual Sum of Squares (RSS) is the sum of all squared differences between the actual Y and the predicted Y. (red arrow above)
  • Total Sum of Squares (TSS) is the sum of all squared differences between Y and the mean of Y. (vertical orange arrow above)
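These two quantities combine into the R² formula:

R² = 1 − (RSS / TSS)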

As the R² value gets closer to 1, the model makes better predictions, and as it gets closer to 0, the model makes poorer predictions.

Interpretation of R²

Let’s say we got an R² of 0.88. The way to interpret this R² is…

“88% of the variation in the dependent variable Y is explained by the independent variables in our model”
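As a small sanity check in code (again on the made-up house data), we can compute R² by hand from RSS and TSS and compare it with what scikit-learn reports:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same hypothetical house data as the earlier sketches
X = np.array([[1100], [1400], [1800], [2200], [2600]])
y = np.array([199_000, 245_000, 310_000, 366_000, 421_000])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

rss = np.sum((y - y_pred) ** 2)    # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - rss / tss

print("R² (manual):     ", r_squared)
print("R² (model.score):", model.score(X, y))  # should match
```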

How to improve the model?

  • The more data, the better. With more data, we can make more accurate predictions.
  • Choosing features that are more strongly correlated with the target variable can help make better predictions. Feature selection is important here: it reduces the number of unimportant features and keeps only the important ones in the model.

Sources

A lot of the information here was taught through the curriculum of Flatiron School. The pictures helped me a lot with understanding the material when I was learning, which is why I used them in this article. Thank you Flatiron School!
