Simple Linear Regression In Python

Hi there. In this post, I experiment with Python to create a simple linear regression line (line of best fit) for some (fake) sample data.

Setup & Sample Data

To start, I import matplotlib and the linear_model module from sklearn into Python. (I am currently using Anaconda, so these packages are installed from the Anaconda Prompt and not from the regular Windows command prompt.)
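As a rough sketch of this setup step, the imports and some sample data could look like the following. The numbers are made up for illustration and are not the values from the original post; the variable names x_values and y_values are also just assumptions. With Anaconda, the packages can be installed with `conda install matplotlib scikit-learn` (the sklearn import comes from the scikit-learn package).

```python
import matplotlib.pyplot as plt
from sklearn import linear_model

# Made-up sample data for illustration only (not the original post's values).
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Each x-value should be paired with exactly one y-value.
print(len(x_values), len(y_values))
```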

A Scatterplot

Generating a simple scatterplot in matplotlib does not require many lines of code. Here is the code and output.
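A minimal sketch of such a scatterplot, assuming the same made-up sample data as an illustration (the labels, title and colour are my own choices, not necessarily the ones used for the original plot):

```python
import matplotlib.pyplot as plt

# Made-up sample data for illustration only.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Draw one point per (x, y) pair and label the axes.
fig, ax = plt.subplots()
ax.scatter(x_values, y_values, color="blue")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatterplot of Sample Data")
```

Calling plt.show() afterwards opens the plot in a window; in a Jupyter notebook the figure is usually displayed automatically.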

Linear Regression Line Plot

When it comes to linear regression, we want to represent these points with one single line. This line can be seen as a line of best fit: of all possible straight lines, it is the one for which the sum of the squared vertical distances from each point to the line is the lowest possible.
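To make "lowest possible" a bit more concrete, here is a small sketch in plain Python. It assumes the usual least squares criterion (sum of squared vertical distances) and made-up sample data; the helper sum_squared_residuals is a hypothetical name of my own. It computes the least squares slope and intercept by hand and checks that nudging the line in either direction makes the total distance worse.

```python
# Made-up sample data for illustration only.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]


def sum_squared_residuals(slope, intercept, xs, ys):
    """Total squared vertical distance between the points and a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Closed-form least squares estimates for a single predictor.
n = len(x_values)
x_mean = sum(x_values) / n
y_mean = sum(y_values) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values)) / sum(
    (x - x_mean) ** 2 for x in x_values
)
intercept = y_mean - slope * x_mean

best = sum_squared_residuals(slope, intercept, x_values, y_values)
steeper = sum_squared_residuals(slope + 0.5, intercept, x_values, y_values)
shifted = sum_squared_residuals(slope, intercept + 1.0, x_values, y_values)

# The least squares line should beat both of the nudged lines.
print(best < steeper, best < shifted)
```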

(A straight line is simple to understand, but it is not necessarily the closest fit overall. You could propose other functions which would be closer to the points, but they would not be as easy to understand. There is a tradeoff here.)
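As a rough illustration of this tradeoff, the sketch below compares a straight line with a degree 5 polynomial on made-up sample data. It uses numpy.polyfit rather than sklearn, and the choice of degree 5 is arbitrary. The polynomial ends up at least as close to the points, but six coefficients are much harder to explain than a slope and an intercept.

```python
import numpy as np

# Made-up sample data for illustration only.
x_values = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y_values = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])


def sum_squared_error(coeffs, xs, ys):
    """Sum of squared vertical distances between the points and a polynomial."""
    return float(np.sum((ys - np.polyval(coeffs, xs)) ** 2))


# Least squares fits of degree 1 (a line) and degree 5 (arbitrary choice).
line_coeffs = np.polyfit(x_values, y_values, 1)
poly_coeffs = np.polyfit(x_values, y_values, 5)

sse_line = sum_squared_error(line_coeffs, x_values, y_values)
sse_poly = sum_squared_error(poly_coeffs, x_values, y_values)

print("line:", len(line_coeffs), "coefficients, error", sse_line)
print("polynomial:", len(poly_coeffs), "coefficients, error", sse_poly)
```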

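After some trial and error, I found out that each x-value needs to be converted into its own list, so that the x-values become a list of lists (for example, [1, 2, 3] becomes [[1], [2], [3]]). sklearn expects the inputs as a two-dimensional structure with one row per sample and one column per feature. If this is not done, then you get an error when running a linear regression with sklearn.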
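A short sketch of that conversion, assuming made-up x-values and the hypothetical names x_nested and x_2d. The list comprehension is one way to do it; the NumPy reshape gives the same shape and is a common alternative.

```python
import numpy as np

# Made-up x-values for illustration only.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]

# Wrap every x-value in its own list: one row per sample, one column per feature.
x_nested = [[x] for x in x_values]

# Alternative with NumPy: reshape the flat values into a single column.
x_2d = np.array(x_values).reshape(-1, 1)

print(x_nested)
print(x_2d.shape)
```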

Here is the rest of the code and the resulting linear regression plot.
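A minimal sketch of the fitting and plotting step, again on made-up sample data and with my own choice of variable names, labels and colours. It fits sklearn's LinearRegression on the nested x-values and draws the predicted line over the scatterplot.

```python
import matplotlib.pyplot as plt
from sklearn import linear_model

# Made-up sample data for illustration only.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# sklearn needs the x-values as a list of lists (one inner list per sample).
x_nested = [[x] for x in x_values]

# Fit the ordinary least squares model.
model = linear_model.LinearRegression()
model.fit(x_nested, y_values)

slope = model.coef_[0]
intercept = model.intercept_
predicted = model.predict(x_nested)
print("slope:", slope, "intercept:", intercept)

# Scatterplot of the data with the fitted line drawn on top.
fig, ax = plt.subplots()
ax.scatter(x_values, y_values, color="blue", label="Sample data")
ax.plot(x_values, predicted, color="red", label="Line of best fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Linear Regression Line")
ax.legend()
```

As before, plt.show() displays the figure when running this as a script.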

References

• Datacamp’s Python Cheat Sheet for matplotlib
• http://dataconomy.com/2015/02/linear-regression-implementation-in-python/
• http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
• https://stackoverflow.com/questions/15569529/convert-list-elements-into-list-of-lists