Beginner’s Notes on Linear Regression
This article is a simple summary of my notes on linear regression, an “explain it like I’m 5” version.
Linear regression is a foundational machine learning algorithm. It is a supervised method, meaning it learns from labeled data: examples for which we already know the correct outcome. It models the relationship between one or more input features and an outcome (the result we want to predict).
Some of the questions we could pursue with a linear regression analysis include: What is the relationship between this variable and that variable? Does this set of variables have a significant correlation with a particular outcome?
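One way to put a number on the first question is the Pearson correlation coefficient r, which ranges from -1 (a perfect negative linear relationship) to 1 (a perfect positive linear relationship), with 0 meaning no linear relationship. Below is a minimal plain-Python sketch using made-up data; the function name pearson_r is hypothetical. Deciding whether a correlation is statistically significant would need an extra step (such as computing a p-value) that is not shown here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))
```

A value close to 1, as with this made-up data, suggests a strong positive linear relationship. On Python 3.10 and later, the standard library's statistics.correlation computes the same quantity.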
The ultimate goal of linear regression analysis is to create a function that models the relationship between the given variables. In practice, this means finding the line (or, with more than one feature, the plane) that minimizes the errors in our predictions when compared with the labeled data. Here, the line or plane is the model we are building in order to make predictions. The errors measure the difference between the correct answers (the labeled data) and the predictions made by the model. The labeled data is simply the set of correct answers we already have, which we use to train, or fit, the model in the first place.
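To make “the line that minimizes the errors” concrete, here is a minimal sketch in plain Python. It assumes the error is measured as the mean squared error (one common choice, not the only one) and fits a single-feature line y = b0 + b1·x with the standard least-squares formulas. The data and the function names fit_line and mean_squared_error are hypothetical, chosen for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between the labels and the predictions."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return sum(r ** 2 for r in residuals) / len(residuals)


def fit_line(xs, ys):
    """Least-squares line y = b0 + b1 * x for a single feature; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    b1 = num / den
    # Intercept: the least-squares line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical labeled data: the "correct answers" the model learns from
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best_preds = [b0 + b1 * x for x in xs]
guess_preds = [0 + 1.5 * x for x in xs]  # an arbitrary line for comparison

print(f"fitted line: y = {b0:.2f} + {b1:.2f}x")
print("MSE of fitted line:", mean_squared_error(ys, best_preds))
print("MSE of guessed line:", mean_squared_error(ys, guess_preds))
```

Any other line you try on this data, such as the arbitrary guess above, will have an equal or larger mean squared error; that is exactly what “best fit” means under this definition of the error.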
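There are different ways to define the error term; the most common is the squared difference between each prediction and its label, which leads to ordinary least squares (OLS). The parameters of the model are the coefficients, bᵢ, and the y-intercept, b₀, much like the slope m and intercept b in the familiar straight-line formula y = mx + b. In univariate linear regression (a single feature), this is: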
y = b₀ + b₁x
…and in multiple linear regression (often called multivariate linear regression), where there are several features, this is:
y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ
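With several features, the least-squares idea leads to the normal equations (Aᵀ A) b = Aᵀ y, where A is the table of feature values with an extra column of 1s so that b₀ acts as the intercept. The sketch below solves them in plain Python with Gaussian elimination. It assumes there are more examples than features and that no feature is an exact linear combination of the others (otherwise Aᵀ A is singular and the solve may fail or give unreliable results). The function name fit_multivariate and the data are hypothetical.

```python
def fit_multivariate(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn.

    X is a list of rows, one row of n feature values per example.
    Returns [b0, b1, ..., bn] by solving the normal equations
    (A^T A) b = A^T y, where A is X with a leading column of 1s.
    """
    # Prepend a 1 to every row so that b0 plays the role of the intercept
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Normal equations: M = A^T A (k x k) and v = A^T y (length k)
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]

    # Gaussian elimination with partial pivoting on the augmented matrix [M | v]
    aug = [M[i] + [v[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution, from the last coefficient to the first
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(aug[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (aug[i][k] - s) / aug[i][i]
    return b


# Hypothetical, noise-free data generated by y = 1 + 2*x1 + 3*x2
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
print([round(b, 6) for b in fit_multivariate(X, y)])
```

Because the data was generated without noise, the fitted coefficients should come out very close to 1, 2 and 3. Passing a single feature per row reduces this to the univariate case. Solving the normal equations directly is fine for small examples like this, but it can be numerically fragile; in practice you would usually reach for a library routine such as NumPy's numpy.linalg.lstsq or scikit-learn's LinearRegression.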
So, all in all, linear regression is a foundational powerhouse and an essential skill for a beginner data scientist to learn. It comes in many flavors and variations, depending on the context and on the statistical and mathematical assumptions being made.