ItamarCohen28

# What's Linear Regression?

Linear Regression describes the relationship between variables by fitting a line to the observed data.
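To make "fitting a line" concrete, here is a minimal sketch in plain Python that finds the least-squares line through a few points. The function name `fit_line` and the size/rent numbers are made up for illustration; they are not from the dataset used later.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: apartment size (sqft) and monthly rent ($).
sizes = [500, 750, 1000, 1250]
rents = [2000, 2750, 3500, 4250]

slope, intercept = fit_line(sizes, rents)
print(f"rent = {slope:.2f} * size + {intercept:.2f}")
```

With one input variable the line has a single slope; the rest of the tutorial uses many input variables at once.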

# Why do we need it?

We need it to make predictions, for example predicting the rent of an apartment from features such as its size and its distance to the subway (as you will see in the example).

# Why MLR?

MLR (Multiple Linear Regression) predicts a value from two or more input variables instead of just one, so you can then compare the predicted data with the real data.
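In symbols, a model with n input variables can be written as follows. The symbol names are notation assumed here: y is the predicted value, b is the intercept, and each m_i is the coefficient of the input variable x_i.

```latex
y = b + m_1 x_1 + m_2 x_2 + \dots + m_n x_n
```

In the example in this post, y would be the rent and the x_i would be columns such as the number of bedrooms and the size in square feet.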

# The importing:

We need to import all the libraries that we need for the Linear Regression.

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

Now we need to read the CSV file. My data is already cleaned up, but you will need to clean up your own data; I'll cover that in a different tutorial.
For now, here's the code:

streeteasy = pd.read_csv("https://raw.githubusercontent.com/sonnynomnom/Codecademy-Machine-Learning-Fundamentals/master/StreetEasy/manhattan.csv")

df = pd.DataFrame(streeteasy)

x = df[['bedrooms', 'bathrooms', 'size_sqft', 'min_to_subway', 'floor', 'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer', 'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio', 'has_gym']]

y = df[['rent']]

(We also declared x and y: x holds the columns the model learns from, and y holds the rent we want to predict.)

# The train and test splitting for gradient descent

Gradient descent is an iterative method: it tries hundreds or even thousands of candidate values for the line's parameters and moves a little closer to the best fit on each step.
Each iteration is scored by a formula that measures how well the predictions match the real data.
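As a minimal sketch of this idea, the code below runs gradient descent for a line with one input variable. It assumes the iterations are scored with the mean squared error, which is a common choice but an assumption here, and the function name, parameter names, and data are all hypothetical.

```python
def gradient_descent(xs, ys, learning_rate=0.01, num_iterations=1000):
    """Fit y = m * x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(num_iterations):
        # Prediction errors with the current slope and intercept.
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to m and b.
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient; the learning rate sets the step size.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = gradient_descent(xs, ys, learning_rate=0.05, num_iterations=5000)
print(f"m = {m:.3f}, b = {b:.3f}")
```

On this toy data, a much larger `learning_rate` (for example 0.2) makes the updates overshoot and diverge, while a much smaller one needs more iterations to get close.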

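The learning rate is important: if it's too small, the program can take hours to run, so you must increase it, and if it's too big, the steps can overshoot and never settle on a good fit, so you must decrease it. (scikit-learn's LinearRegression solves for the best-fit coefficients directly, so there is no learning rate to tune in the code in this post.)
The score that lm.score() prints later is R², and its best value is 1, but real data almost never reaches it, because no model is perfect.
Here's the code for the split, which keeps 80% of the rows for training and 20% for testing: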

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 0.8, test_size = 0.2, random_state=6)
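To show what an 80/20 split does, here is a hand-rolled sketch in plain Python. It is not scikit-learn's implementation, it will not reproduce the same row order as `train_test_split`, and the name `simple_train_test_split` is hypothetical.

```python
import random


def simple_train_test_split(rows, test_size=0.2, seed=6):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)
    # A fixed seed gives the same shuffle, and so the same split, on every run.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 rows of data
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```

The fixed seed plays the same role as `random_state=6` above: it makes the split repeatable, so the scores do not change between runs just because the rows were shuffled differently.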

# Creating and fitting the model

Now we just fit the model with scikit-learn's LinearRegression and plot the results with Matplotlib so they're clear. We'll also print the score on the training data and on the test data.

lm = LinearRegression()

model = lm.fit(x_train, y_train)

y_predict = lm.predict(x_test)

print("Train score:")
print(lm.score(x_train, y_train))

print("Test score:")
print(lm.score(x_test, y_test))
plt.scatter(y_test, y_predict)

plt.plot(range(20000), range(20000))
plt.xlabel("Prices: $Y_i$")
plt.ylabel(r"Predicted prices: $\hat{Y}_i$")  # raw string so \h is not treated as an escape sequence
plt.title("Actual Rent vs Predicted Rent")

plt.show()
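The train and test scores printed above are R² values. As a rough illustration of what that number measures, here is a plain-Python sketch of the R² formula with hypothetical rents and predictions; the function name `r_squared` is made up for this example.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: how far the predictions are from the real values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: how far the real values are from their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical rents and two sets of predictions.
actual = [2000, 3000, 4000, 5000]
perfect = [2000, 3000, 4000, 5000]
rough = [2500, 2500, 4500, 4500]

print(r_squared(actual, perfect))
print(r_squared(actual, rough))
```

Perfect predictions give the best score, and the further the predictions drift from the real values, the lower the score gets.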

Hope you enjoyed!
That's the full Repl:
MLR - Full Code
