How to plot a linear regression in Python
Learn how to plot linear regression in Python. Discover different methods, tips, real-world applications, and how to debug common errors.

A linear regression plot visualizes the relationship between variables, a key step in data analysis. Python offers powerful libraries to create these plots with clarity and precision for any dataset.
Here, you'll explore different techniques to create these plots effectively. The article covers practical tips, shows real-world applications, and provides advice to help you debug common errors and refine your visualizations.
Basic linear regression plot with NumPy and Matplotlib
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
m, b = np.polyfit(x, y, 1)
plt.scatter(x, y)
plt.plot(x, m*x + b, color='red')
plt.show()

Output: a scatter plot with blue dots representing the data points and a red line showing the linear regression fit.
This approach combines NumPy’s calculation power with Matplotlib’s visualization tools. The core of the regression is the np.polyfit(x, y, 1) function. It computes the slope (m) and intercept (b) for a line of best fit. The final argument, 1, specifies a first-degree polynomial, which is simply a straight line.
Once you have the slope and intercept, you can visualize the results. First, plt.scatter() plots your original data points. Then, plt.plot(x, m*x + b) draws the regression line by applying the calculated coefficients to the x-values, effectively overlaying the trend on your data.
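With the coefficients in hand, you can also reuse the fitted line for prediction. This small sketch runs on the same data; the x = 6 query point is just an illustration, chosen because it lies outside the original range:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# Fit a first-degree polynomial (a straight line)
m, b = np.polyfit(x, y, 1)

# Reuse the coefficients to predict y at a point outside the data
y_new = m * 6 + b
print(f"y = {m:.2f}x + {b:.2f}, prediction at x=6: {y_new:.2f}")
```

Because `np.polyfit` returns plain floats, the line equation works anywhere, not just at the observed x-values.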
Common libraries for regression visualization
While the NumPy and Matplotlib approach is fundamental, libraries like pandas, seaborn, and scikit-learn offer more direct and powerful methods for regression plotting.
Using pandas for linear regression plots
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 3.5, 5, 6.2, 7.5]})
plt.scatter(df.x, df.y)
plt.plot(df.x, df.x * 1.35 + 0.7, color='green')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()

Output: a scatter plot with data points and a green regression line, with labeled X and Y axes.
Using pandas organizes your data into a DataFrame, a common practice that simplifies data handling. From there, you can plot columns directly with matplotlib, as seen with plt.scatter(df.x, df.y).
- Unlike the previous method, the regression line here is manually defined with the equation df.x * 1.35 + 0.7.
- This example also adds clarity by labeling the axes using plt.xlabel() and plt.ylabel().
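Rather than hardcoding approximate coefficients, you can fit the slope and intercept from the DataFrame columns directly. A minimal sketch using the same np.polyfit() approach on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 3.5, 5, 6.2, 7.5]})

# Fit the line from the DataFrame columns instead of hardcoding 1.35 and 0.7
m, b = np.polyfit(df['x'], df['y'], 1)
print(f"fitted line: y = {m:.2f}x + {b:.2f}")
```

The fitted values (about 1.37 and 0.73) are close to the rounded coefficients used in the manual example above.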
Creating regression plots with seaborn
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
sns.regplot(x=x, y=y, line_kws={"color":"purple"})
plt.title("Linear Regression with Seaborn")
plt.show()

Output: a scatter plot with data points, a purple regression line, and a shaded confidence interval region.
Seaborn streamlines regression plotting with its regplot() function. It’s a high-level tool that combines the scatter plot and regression line fitting into a single command, so you don't need to calculate the slope and intercept yourself.
- The function automatically draws both the data points and the line of best fit.
- A key feature is the shaded confidence interval it adds around the regression line, which visualizes the uncertainty in the model's fit.
- You can easily style the line using the line_kws parameter to pass a dictionary of keyword arguments, such as color.
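A sketch of a few more styling options, assuming a recent seaborn version: ci controls the confidence band, scatter_kws styles the points, and line_kws styles the fit line. The title text is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; assumption: no display is available
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# Disable the confidence band and style both layers of the plot
ax = sns.regplot(
    x=x, y=y,
    ci=None,                                          # no shaded band
    scatter_kws={"s": 60},                            # larger markers
    line_kws={"color": "purple", "linestyle": "--"},  # dashed purple fit line
)
ax.set_title("regplot without confidence band")
plt.show()
```

Turning the band off with ci=None is useful when the dataset is large and the interval is too narrow to see anyway.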
Using scikit-learn for regression visualization
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 3.5, 5, 6.2, 7.5])
model = LinearRegression().fit(X, y)
plt.scatter(X, y)
plt.plot(X, model.predict(X), color='orange')
plt.text(1, 7, f'R² = {model.score(X, y):.3f}')
plt.show()

Output: a scatter plot with data points, an orange regression line, and an R-squared value displayed.
scikit-learn frames regression as a machine learning task. It’s a powerful library where you first create and train a LinearRegression model using the .fit(X, y) method. This prepares the model to make predictions from your data.
- Your feature data X must be reshaped with .reshape(-1, 1), as scikit-learn expects a 2D array.
- The regression line is drawn using model.predict(X), which applies the trained model to generate the line's points.
- The .score() method conveniently calculates the R-squared value, a metric showing how well the line fits the data, which is then displayed on the plot.
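If you want the fitted equation itself, the trained model exposes its parameters through the coef_ and intercept_ attributes. A short sketch on the same data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 3.5, 5, 6.2, 7.5])
model = LinearRegression().fit(X, y)

# The fitted parameters live on the trained model: one coefficient per feature
slope = model.coef_[0]
intercept = model.intercept_
print(f"y = {slope:.2f}x + {intercept:.2f}, R² = {model.score(X, y):.3f}")
```

These match the slope and intercept that np.polyfit() produces for the same data, since both minimize squared error.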
Advanced regression plotting techniques
With the fundamentals covered, you're ready to tackle more complex visualizations, such as plotting multiple variables, showing uncertainty, and building interactive regression plots.
Visualizing multiple regression with 3D plots
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.linear_model import LinearRegression
x1 = np.random.rand(100)
x2 = np.random.rand(100)
y = 2*x1 + 3*x2 + np.random.randn(100)*0.5
X = np.column_stack((x1, x2))
model = LinearRegression().fit(X, y)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x1, x2, y)
x1_range = np.linspace(0, 1, 10)
x2_range = np.linspace(0, 1, 10)
X1, X2 = np.meshgrid(x1_range, x2_range)
Z = model.predict(np.column_stack((X1.ravel(), X2.ravel()))).reshape(X1.shape)
ax.plot_surface(X1, X2, Z, alpha=0.3)
plt.show()

Output: a 3D scatter plot with data points and a semi-transparent surface representing the multiple regression plane.
When your outcome depends on two variables, you move from a regression line to a regression plane. This code visualizes that relationship in 3D using matplotlib and scikit-learn. A LinearRegression model is trained on two independent variables (x1, x2) to predict a dependent one (y).
- First, ax.scatter() plots the raw data points in 3D space.
- Then, np.meshgrid() creates a coordinate grid, and model.predict() calculates the corresponding Z-values to define the regression plane.
- Finally, ax.plot_surface() draws this plane over the data.
Adding confidence intervals to regression plots
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_pred = intercept + slope * x
plt.scatter(x, y)
plt.plot(x, y_pred, 'r-')
plt.fill_between(x, y_pred - std_err*2, y_pred + std_err*2, alpha=0.2)
plt.show()

Output: a scatter plot with data points, a red regression line, and a light red shaded region representing the confidence interval.
This approach uses SciPy to visualize the uncertainty in your regression model. It's a powerful way to show how much the predicted values might vary from the actual data.
- The stats.linregress() function is the workhorse here. It returns several statistical values, including the standard error of the slope (std_err).
- You then use plt.fill_between() to draw a shaded region around the regression line. Scaling the band by twice the slope's standard error gives a quick visual approximation of uncertainty rather than an exact confidence interval.
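For a statistically exact band, a textbook 95% confidence interval for the mean response uses the residual standard error and the t-distribution. A minimal sketch of that calculation (no plotting) on the same data:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
res = stats.linregress(x, y)
y_pred = res.intercept + res.slope * x

# Residual standard error with n - 2 degrees of freedom
n = len(x)
resid = y - y_pred
s = np.sqrt(np.sum(resid**2) / (n - 2))

# 95% confidence band for the mean response at each x
t = stats.t.ppf(0.975, n - 2)
se_mean = s * np.sqrt(1/n + (x - x.mean())**2 / np.sum((x - x.mean())**2))
lower, upper = y_pred - t * se_mean, y_pred + t * se_mean
```

The resulting band is narrowest at the mean of x and widens toward the edges, which the flat ±2·std_err band above does not capture. You can pass lower and upper straight to plt.fill_between().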
Creating interactive regression plots with plotly
import plotly.express as px
import pandas as pd
import numpy as np
# Create sample data
np.random.seed(42)
x = np.arange(1, 101)
y = 2*x + 10*np.random.randn(100)
df = pd.DataFrame({'x': x, 'y': y})
# Create interactive regression plot
fig = px.scatter(df, x='x', y='y', trendline='ols',
trendline_color_override='red')
fig.update_layout(title='Interactive Linear Regression')
fig.show()

Output: an interactive scatter plot with data points and a red regression line, with hover capabilities showing point values.
Plotly Express makes creating interactive visualizations incredibly straightforward. The px.scatter() function handles both the scatter plot and the regression line in a single command, which is a significant time saver.
- The key is the trendline='ols' argument. It tells Plotly to automatically compute and draw an Ordinary Least Squares regression line.
- The resulting plot is fully interactive. You can hover over data points to see their values, making it perfect for data exploration.
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the regression techniques covered in this article, Replit Agent can turn them into production-ready tools.
- Build a financial forecasting tool that uses linear regression to predict stock prices from historical data and visualizes the trendline.
- Create a sales dashboard that plots monthly revenue and displays a regression line with confidence intervals to project future growth.
- Deploy a scientific analysis utility that generates interactive 3D regression plots for researchers to explore relationships between multiple experimental variables.
You can turn any of these concepts into a working application. Try Replit Agent by describing your idea, and it will write, test, and deploy the code for you.
Common errors and challenges
Plotting regression models can be tricky, but most errors have straightforward fixes you can master quickly.
Dealing with NaN values in regression data
Missing data, often represented as NaN (Not a Number), can stop a regression analysis in its tracks. Functions like np.polyfit() or LinearRegression().fit() can't operate on incomplete datasets, which will usually raise an error.
- You can remove rows with missing values using the dropna() method in pandas.
- Alternatively, you could fill them with a calculated value, such as the column's mean or median, using fillna().
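A quick sketch contrasting the two strategies on a small DataFrame with gaps:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'x': [1, 2, np.nan, 4, 5],
    'y': [2, np.nan, 5, 6.2, 7.5],
})

# Option 1: drop any row containing a missing value
dropped = data.dropna()

# Option 2: fill gaps with each column's mean
filled = data.fillna(data.mean())

print(dropped.shape)                 # only complete rows remain
print(filled.isna().sum().sum())     # no missing values left
```

Dropping is safest when you have plenty of data; filling preserves sample size but can bias the fit if many values are missing.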
Correcting array shapes for sklearn regression models
When using scikit-learn, you might encounter a ValueError because your data has the wrong shape. The library’s models expect the feature data, X, to be a 2D array, even if you only have one feature. A 1D array or pandas Series won't work on its own.
The fix is to reshape your data. Calling .reshape(-1, 1) on your feature array converts it into a single-column 2D array, satisfying scikit-learn's input requirements and allowing the model to train correctly.
Fixing axis limits for proper regression visualization
Sometimes your plot's axes might not adjust properly, cutting off data points or extending the regression line awkwardly beyond your data's range. This can make the visualization confusing or misleading. You can take control by setting the axis boundaries yourself.
Use matplotlib functions like plt.xlim() and plt.ylim() after creating your plot. This lets you define the exact visual window, ensuring your data and trendline are framed clearly and effectively.
Dealing with NaN values in regression data
Missing data, or NaN values, are a common roadblock in regression analysis. Most libraries can't perform calculations on incomplete datasets, which will cause the code to fail. The example below shows what happens when np.polyfit() encounters NaN values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Dataset with missing values
data = pd.DataFrame({
'x': [1, 2, np.nan, 4, 5],
'y': [2, np.nan, 5, 6.2, 7.5]
})
# Will fail with missing values
plt.scatter(data['x'], data['y'])
m, b = np.polyfit(data['x'], data['y'], 1)
plt.plot(data['x'], m*data['x'] + b, color='red')
plt.show()
The calculation fails because np.polyfit() receives columns containing np.nan values, which are mathematically undefined. The corrected code below demonstrates how to prepare the data before plotting.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Dataset with missing values
data = pd.DataFrame({
'x': [1, 2, np.nan, 4, 5],
'y': [2, np.nan, 5, 6.2, 7.5]
})
# Fix: drop missing values before plotting
clean_data = data.dropna()
plt.scatter(clean_data['x'], clean_data['y'])
m, b = np.polyfit(clean_data['x'], clean_data['y'], 1)
plt.plot(clean_data['x'], m*clean_data['x'] + b, color='red')
plt.show()
The fix is to clean the data before analysis. By calling data.dropna(), you create a new DataFrame that excludes any rows with missing values. This clean dataset can then be used by np.polyfit() without causing an error.
- Always check for and handle NaN values before performing calculations, especially when working with data from external sources, as missing data is a common source of errors.
Correcting array shapes for sklearn regression models
You'll often hit a ValueError with scikit-learn if your data isn't shaped correctly. The library expects a 2D array for your features, but it's easy to accidentally pass a 1D array. The following code demonstrates this common mistake.
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Incorrect shape for sklearn
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
# This will raise an error
model = LinearRegression()
model.fit(x, y) # x needs to be 2D
plt.scatter(x, y)
plt.plot(x, model.predict(x), color='red')
plt.show()
The error occurs because the model.fit() method receives the x data as a simple, one-dimensional array. It's expecting a columnar format instead. The following code demonstrates the necessary adjustment.
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Correcting shape for sklearn
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
# Fix: reshape x to be 2D
x_2d = x.reshape(-1, 1)
model = LinearRegression()
model.fit(x_2d, y)
plt.scatter(x, y)
plt.plot(x, model.predict(x_2d), color='red')
plt.show()
The fix is to reshape your feature data before passing it to the fit() method. scikit-learn requires a 2D array for features, even when you only have one. This is because the library is designed to handle multiple features by default.
- The code x.reshape(-1, 1) converts your 1D array into a 2D array with a single column.
This simple change aligns the data with the library's input requirements, allowing the model to train without error.
Fixing axis limits for proper regression visualization
By default, a regression line in matplotlib only spans the range of your data points. This can make the trend look abrupt or incomplete. The code below illustrates this common visualization issue, where the line stops short at the first and last points.
import numpy as np
import matplotlib.pyplot as plt
# Data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
# Linear regression
m, b = np.polyfit(x, y, 1)
plt.scatter(x, y)
plt.plot(x, m*x + b, color='red')
# Line only spans the x range of data points
plt.show()
The issue arises because plt.plot() is only given the original x values to draw upon. As a result, the line doesn't extend beyond your data's boundaries. The following code shows how to correct this.
import numpy as np
import matplotlib.pyplot as plt
# Data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
# Linear regression
m, b = np.polyfit(x, y, 1)
plt.scatter(x, y)
# Fix: extend the line beyond data points
x_line = np.array([0, 6]) # Extended range
plt.plot(x_line, m*x_line + b, color='red')
plt.xlim(0, 6) # Set explicit axis limits
plt.show()
The fix is to manually extend the line's range. Instead of plotting with your original x values, you create a new array, x_line, that spans a wider area. This new array is then used to draw the regression line with plt.plot().
- To ensure the entire line is visible, you can adjust the plot's boundaries using plt.xlim().
- This makes your trendline look more complete and is helpful for visualizing extrapolations beyond your dataset.
Real-world applications
With these common errors solved, you can apply regression plotting to practical scenarios like real estate analysis and model diagnostics.
Using numpy to predict real estate prices
This practical example shows how to model the relationship between house size and price, allowing you to visualize and predict property values with a simple regression line.
import numpy as np
import matplotlib.pyplot as plt
sizes = np.array([750, 850, 950, 1050, 1150, 1250])
prices = np.array([150, 170, 195, 215, 235, 260])
m, b = np.polyfit(sizes, prices, 1)
plt.scatter(sizes, prices)
plt.plot(sizes, m*sizes + b, 'r-')
plt.xlabel('House Size (sq ft)')
plt.ylabel('Price ($1000s)')
plt.show()
This snippet uses NumPy to perform the core math for a linear regression. After defining the sizes and prices arrays, it calls np.polyfit() to compute the slope (m) and intercept (b) of the trendline. Matplotlib then handles the visualization, and labeling the axes with plt.xlabel() and plt.ylabel() makes the final plot easy to interpret.
- The plt.scatter() function displays the original data as individual points.
- The plt.plot() function overlays the calculated regression line, styled as a solid red line with 'r-'.
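The same fitted coefficients can price a house that isn't in the dataset; the 1,100 sq ft query below is a made-up example:

```python
import numpy as np

sizes = np.array([750, 850, 950, 1050, 1150, 1250])
prices = np.array([150, 170, 195, 215, 235, 260])
m, b = np.polyfit(sizes, prices, 1)

# Estimate the price of a hypothetical 1,100 sq ft house
estimate = m * 1100 + b
print(f"Estimated price: ${estimate:.1f}k")
```

Since the line is fit in units of thousands of dollars, the estimate (about $226k) reads directly off the same scale as the y-axis.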
Creating residual plots to diagnose model fit
A residual plot visualizes the errors in your model’s predictions, which is a great way to diagnose how well the regression line fits your data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 5, 4, 6, 8, 7, 10, 11, 14]) + np.random.randn(10)
model = LinearRegression().fit(x.reshape(-1, 1), y)
residuals = y - model.predict(x.reshape(-1, 1))
plt.scatter(x, residuals)
plt.axhline(y=0, color='red')
plt.title('Residual Plot')
plt.show()
This code trains a LinearRegression model and calculates the residuals—the difference between the actual y values and the model's predictions. It then creates a scatter plot to visualize how these residuals are distributed.
- The residuals are calculated by subtracting the output of model.predict() from the original y array.
- plt.scatter() plots these residuals against the original x values.
- A horizontal line at zero is added with plt.axhline(), representing where the model's prediction perfectly matches the actual data.
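One quick sanity check before trusting a residual plot: for OLS with an intercept, the residuals always sum to (numerically) zero. A sketch, using a seeded generator so the noise is reproducible:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # seeded for reproducibility
x = np.arange(1, 11)
y = np.array([2, 4, 5, 4, 6, 8, 7, 10, 11, 14]) + rng.standard_normal(10)

model = LinearRegression().fit(x.reshape(-1, 1), y)
residuals = y - model.predict(x.reshape(-1, 1))

# With an intercept term, OLS residuals sum to zero up to floating-point error
print(abs(residuals.sum()))
```

If this sum is far from zero, the residuals were computed against the wrong predictions, which is a common plotting bug.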
Get started with Replit
Turn these techniques into a real tool with Replit Agent. Describe what you want, like “a web app that predicts housing prices from square footage and plots the regression line,” or “a dashboard visualizing sales data with a trendline.”
The agent writes the code, tests for errors, and deploys the app, turning your prompt into a finished product. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.