How to plot a decision boundary in Python
Plotting decision boundaries in Python? Learn various methods, tips, real-world applications, and how to debug common errors.

A decision boundary visualizes a model's classification logic by showing where it separates data points into classes. Python's libraries make it simple to plot these boundaries, giving you better insight into how a model behaves.
In this article, you'll learn several techniques to plot decision boundaries. We'll cover practical tips, real-world applications, and advice to debug common issues, so you can master this skill.
Basic decision boundary with LogisticRegression
```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=2, random_state=42)
model = LogisticRegression().fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                     np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
plt.show()
```

Output: a plot showing blue and orange data points separated by a light blue/orange shaded decision boundary.
This code visualizes the model's decision-making process across the feature space. It starts by using np.meshgrid to create a dense grid of points that covers the entire plot area, essentially forming a canvas. The model then runs predict on every point in this grid to determine which class it would fall into.
Finally, plt.contourf colors the regions of this canvas based on the predicted class for each point. The line where the colors change is the decision boundary. Plotting the original data points on top shows you how the model's logic separates the actual data.
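If you want the boundary drawn as an explicit line rather than inferred from where the shading changes, you can overlay `plt.contour` on the same grid. Here is a minimal, self-contained sketch of that idea; the contour level 0.5 works because `make_blobs(centers=2)` labels the classes 0 and 1:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=2, random_state=42)
model = LogisticRegression().fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
# Drawing the contour at level 0.5 traces the line where the prediction flips
plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.show()
```

The solid black line sits exactly where `predict` switches from class 0 to class 1, which can make the boundary easier to read than shading alone.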
Common techniques for decision boundaries
While the basic method is effective, you can build more versatile and informative plots using a custom function, predict_proba for probabilities, or decision_function with SVMs.
Using a custom plot_boundary function for reusability
```python
def plot_boundary(model, X, y):
    xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                         np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')

plot_boundary(model, X, y)
plt.show()
```

Output: a plot showing the same decision boundary as before, created with a reusable function.
Encapsulating the plotting logic inside a custom function like plot_boundary is a great practice for reusability. This allows you to easily visualize boundaries for different models without rewriting code.
- The function takes the model, features `X`, and labels `y` as arguments.
- This makes it simple to swap in a new classifier or dataset.
- It keeps your main script clean and focused on the modeling itself.
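To see that reusability in action, the sketch below defines `plot_boundary` once and calls it for two different classifiers side by side. `KNeighborsClassifier` is chosen here purely as an illustrative second model; any scikit-learn classifier with a `predict` method would work:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import numpy as np

def plot_boundary(model, X, y):
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')

X, y = make_blobs(centers=2, random_state=42)
# Same plotting code, two different models
for i, clf in enumerate([LogisticRegression(), KNeighborsClassifier(n_neighbors=5)]):
    plt.subplot(1, 2, i + 1)
    plot_boundary(clf.fit(X, y), X, y)
    plt.title(type(clf).__name__)
plt.show()
```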
Visualizing probability contours with predict_proba
```python
# Reuses model, xx, and yy from the basic example above
Z_prob = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,1].reshape(xx.shape)
plt.contourf(xx, yy, Z_prob, alpha=0.6, cmap='RdBu_r')
plt.colorbar(label='Probability of class 1')
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
plt.show()
```

Output: a plot showing a red-blue gradient of probabilities with a colorbar indicating confidence levels.
Instead of a simple yes-or-no classification, predict_proba reveals the model's confidence for each prediction. This method plots a gradient showing how the probability of belonging to a class changes across the feature space, offering a more nuanced view than a hard boundary.
- The `predict_proba` method returns the probability for each class; here, we're visualizing the probability of class 1.
- A color map like `RdBu_r` creates a smooth gradient, with colors intensifying as the model's certainty increases.
- The `colorbar` provides a legend, mapping the colors on the plot to specific probability values.
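A useful nuance: the hard decision boundary is exactly the contour where the class-1 probability crosses 0.5, so you can overlay it on the gradient to get both views at once. A self-contained sketch:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=2, random_state=42)
model = LogisticRegression().fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
Z_prob = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, Z_prob, alpha=0.6, cmap='RdBu_r')
plt.colorbar(label='Probability of class 1')
# The 0.5-probability contour is the hard decision boundary
plt.contour(xx, yy, Z_prob, levels=[0.5], colors='black', linestyles='--')
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.show()
```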
Using decision_function with SVM classifiers
```python
from sklearn.svm import SVC

# Reuses X, y, xx, and yy from the earlier examples
svm = SVC(kernel='linear').fit(X, y)
Z_svm = svm.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z_svm, alpha=0.3, cmap='RdBu_r')
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
plt.show()
```

Output: a plot showing decision function values with a red-blue gradient and a linear boundary.
For classifiers like Support Vector Machines (SVMs), the decision_function offers a unique perspective. Instead of predicting a class label directly, it calculates each point's signed distance to the decision boundary. This value tells you not only which side of the line a point falls on but also how far away it is.
- The decision boundary is the line where the function's output is zero.
- Positive and negative values correspond to the different classes.
- The color gradient in the plot visualizes this distance, showing the model's "margin" of confidence.
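Those points can be made visible directly by drawing contour lines at decision-function levels -1, 0, and +1, which for a linear SVM correspond to the margin edges and the boundary itself. A minimal sketch:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=2, random_state=42)
svm = SVC(kernel='linear').fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
Z_svm = svm.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Level 0 is the boundary; levels -1 and +1 trace the margin edges
plt.contour(xx, yy, Z_svm, levels=[-1, 0, 1],
            colors='black', linestyles=['--', '-', '--'])
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.show()
```

The support vectors are the points lying on or inside the dashed margin lines.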
Advanced visualization techniques
With the basics down, you can adapt these plotting methods for more advanced models that handle multiple classes or create complex, non-linear boundaries.
Plotting multi-class boundaries with make_blobs
```python
X_multi, y_multi = make_blobs(centers=3, random_state=42)
multi_model = LogisticRegression().fit(X_multi, y_multi)
xx_m, yy_m = np.meshgrid(np.linspace(X_multi[:,0].min()-1, X_multi[:,0].max()+1, 100),
                         np.linspace(X_multi[:,1].min()-1, X_multi[:,1].max()+1, 100))
Z_multi = multi_model.predict(np.c_[xx_m.ravel(), yy_m.ravel()]).reshape(xx_m.shape)
plt.contourf(xx_m, yy_m, Z_multi, alpha=0.3, cmap='viridis')
plt.scatter(X_multi[:,0], X_multi[:,1], c=y_multi, cmap='viridis', edgecolors='k')
plt.show()
```

Output: a plot showing three differently colored classes of data points with decision boundaries between them.
Plotting boundaries for models with more than two classes follows the same core logic. This example uses make_blobs(centers=3) to generate data with three distinct clusters, and the LogisticRegression model learns to separate them.
- The `predict` method now assigns each point in the grid to one of the three classes.
- `contourf` then draws the decision regions, creating boundaries where the predictions change from one class to another.
- Using a distinct color map like `'viridis'` helps differentiate the class regions clearly.
Non-linear boundaries with RBF kernel SVC
```python
# Reuses X, y, xx, and yy from the earlier examples
rbf_model = SVC(kernel='rbf').fit(X, y)
Z_rbf = rbf_model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z_rbf, alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
plt.show()
```

Output: a plot showing a curved non-linear decision boundary separating two classes.
Linear models aren't always enough. When your data is more complex, you can use a Support Vector Classifier with a non-linear kernel. By setting kernel='rbf' in the SVC model, it learns a curved boundary instead of a straight line. This is perfect for capturing intricate patterns in your data.
- The Radial Basis Function (RBF) kernel allows the `SVC` to create flexible, non-linear decision boundaries.
- This approach is highly effective for datasets where the classes are not linearly separable.
- The resulting plot shows how the model can bend its boundary to correctly classify more points.
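How curved the boundary gets is controlled largely by the `gamma` parameter of the RBF kernel. The sketch below compares a small and a large value side by side; the specific values 0.1 and 10 are arbitrary choices for illustration:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=2, random_state=42)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))

# Small gamma -> smoother, almost linear boundary; large gamma -> tight, wiggly fit
for i, gamma in enumerate([0.1, 10]):
    model = SVC(kernel='rbf', gamma=gamma).fit(X, y)
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.subplot(1, 2, i + 1)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.title(f'gamma={gamma}')
plt.show()
```

Very large `gamma` values tend to overfit, so the plot is also a quick visual check on model complexity.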
Decision boundaries with RandomForest classifier
```python
from sklearn.ensemble import RandomForestClassifier

# Reuses X, y, xx, and yy from the earlier examples
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
Z_rf = rf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z_rf, alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
plt.show()
```

Output: a plot showing a more complex, potentially jagged decision boundary created by the random forest.
A RandomForestClassifier builds an entire "forest" of decision trees and combines their outputs for a final vote. This ensemble approach creates a highly flexible decision boundary that can capture very complex patterns. Unlike the smooth curves of an SVC, a random forest's boundary can appear more jagged or irregular as it fits the data closely.
- The final boundary is an aggregation of many individual tree predictions.
- This method is powerful for non-linear data, allowing the model to learn intricate separations between classes.
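If the jaggedness is a concern, one common lever is limiting tree depth, which smooths the aggregated boundary. The comparison below is a sketch; `max_depth=2` is an arbitrary illustrative value, not a recommended setting:

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import numpy as np

X, y = make_blobs(centers=2, random_state=42)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))

# Shallow trees produce a coarser, smoother boundary; unlimited depth fits closely
for i, depth in enumerate([2, None]):
    rf = RandomForestClassifier(n_estimators=100, max_depth=depth,
                                random_state=42).fit(X, y)
    Z = rf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.subplot(1, 2, i + 1)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.title(f'max_depth={depth}')
plt.show()
```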
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the decision boundary techniques we've explored, Replit Agent can turn them into production-ready tools:
- Build an interactive dashboard that visualizes how different classifiers, like `SVC` or `RandomForestClassifier`, draw boundaries on a given dataset.
- Create a fraud detection simulator that plots the decision boundary separating legitimate and fraudulent transactions based on two input features.
- Deploy a medical diagnosis tool that visualizes how a model classifies patient data into different risk categories.
Bring your own machine learning concepts to life. Try Replit Agent and watch it build, test, and deploy your application automatically.
Common errors and challenges
Plotting decision boundaries can be tricky, but you can navigate common errors involving data shapes, feature scaling, and probability predictions with a few fixes.
Fixing shape mismatch when plotting with predict
A frequent error is a ValueError related to mismatched array shapes. This usually happens because the predict function returns a one-dimensional array of predictions, but contourf expects a two-dimensional grid that matches your meshgrid.
- The fix is to reshape the predictions array to match the shape of your grid.
- After making predictions on your flattened grid, simply add `.reshape(xx.shape)` to transform the output back into the correct 2D structure for plotting.
Improving boundaries with proper feature StandardScaler
If your decision boundary looks skewed or counterintuitive, it might be a feature scaling issue. When features have vastly different ranges—like age (18-80) and income (50k-500k)—models can incorrectly prioritize the feature with larger values.
- Use `StandardScaler` from scikit-learn to standardize your features before fitting the model.
- This process rescales your data so that all features have a mean of 0 and a standard deviation of 1, ensuring they contribute equally and leading to a more accurate boundary.
Fixing predict_proba errors with SVC
Calling predict_proba on a Support Vector Classifier (SVC) will throw an error by default. This is because probability estimation is computationally intensive and isn't enabled automatically.
- To fix this, you must set `probability=True` when you first create the model, for example: `SVC(kernel='rbf', probability=True)`.
- This tells the classifier to perform the extra steps needed to calculate probability scores, allowing you to visualize the model's confidence levels.
Fixing shape mismatch when plotting with predict
A common pitfall is passing your meshgrid arrays directly to the predict function. Scikit-learn models expect a single array of data points, not two separate coordinate grids. This input shape mismatch will immediately trigger an error. The code below demonstrates this mistake.
```python
X, y = make_blobs(centers=2, random_state=42)
model = LogisticRegression().fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                     np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
# This will cause an error - incorrect input shape
Z = model.predict(xx, yy).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
```
The predict function expects a single array of points, but here it receives two separate grids, xx and yy. This input mismatch is the source of the error. The corrected version below shows how to fix this.
```python
X, y = make_blobs(centers=2, random_state=42)
model = LogisticRegression().fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                     np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
# Correctly combine xx and yy into a 2D feature array
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
```
The solution is to correctly format the grid for the predict function, which expects a single array of data points. You can't pass the xx and yy grids separately. Instead, you must combine them into a two-column array that represents all the points in your plot.
- The expression `np.c_[xx.ravel(), yy.ravel()]` handles this by first flattening each grid with `ravel()`.
- Then, `np.c_` stacks them as columns, creating the exact input shape the model needs.
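The shape bookkeeping is easy to verify on a tiny grid. This standalone sketch shows how two coordinate grids become a single `(n_points, 2)` feature array:

```python
import numpy as np

# A tiny 2x3 grid keeps the shapes easy to follow
xx, yy = np.meshgrid(np.linspace(0, 1, 2), np.linspace(0, 1, 3))
print(xx.shape)   # (3, 2): two separate coordinate grids

# Flatten each grid, then stack them as columns
grid = np.c_[xx.ravel(), yy.ravel()]
print(grid.shape)  # (6, 2): one row per grid point, one column per feature
```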
Improving boundaries with proper feature StandardScaler
When features aren't scaled properly, a model can produce a skewed decision boundary that doesn't accurately reflect the data's structure. The feature with the larger scale often dominates the learning process, leading to a distorted and less effective model.
The code below demonstrates this problem. We'll intentionally make one feature much larger than the other to show how it impacts the boundary created by the SVC model.
```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# n_informative=2 and n_redundant=0 are required when n_features=2
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X[:,0] = X[:,0] * 1000  # Scaling first feature to be much larger
model = SVC(kernel='linear').fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                     np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
```
By multiplying one feature by 1000, the code forces the SVC to prioritize it, creating a distorted boundary that ignores the data's true structure. The following code demonstrates how to correct this imbalance for a more accurate plot.
```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X[:,0] = X[:,0] * 1000  # Scaling first feature to be much larger
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model = SVC(kernel='linear').fit(X_scaled, y)
xx, yy = np.meshgrid(np.linspace(X_scaled[:,0].min()-1, X_scaled[:,0].max()+1, 100),
                     np.linspace(X_scaled[:,1].min()-1, X_scaled[:,1].max()+1, 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X_scaled[:,0], X_scaled[:,1], c=y, edgecolors='k')
```
The solution is to apply StandardScaler to your features before fitting the model. This step rescales the data, so features with large numerical ranges don't unfairly influence the model's logic.
- The model is then trained and plotted using this new `X_scaled` data.
- This produces a more accurate boundary that reflects the data's true structure.
Always consider scaling when your features have different units or scales, like age and income, to avoid this common pitfall.
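One way to make scaling hard to forget is to bundle the scaler and the model into a scikit-learn `Pipeline`, so standardization happens automatically inside both `fit` and `predict`. This is a sketch of that design choice, not the only way to structure it:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import numpy as np

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X[:, 0] = X[:, 0] * 1000  # deliberately unbalanced scales

# The pipeline scales inside fit and predict, so the grid can stay in original units
pipe = make_pipeline(StandardScaler(), SVC(kernel='linear')).fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
Z = pipe.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

A side benefit: because the pipeline handles scaling internally, you can build the meshgrid in the original feature units and keep axis labels meaningful.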
Fixing predict_proba errors with SVC
Calling predict_proba on a Support Vector Classifier (SVC) will raise an error by default. This is because probability estimation is computationally intensive and isn't enabled automatically. The code below demonstrates what happens when you try to call it without proper setup.
```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(centers=2, random_state=42)
model = SVC(kernel='linear').fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                     np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
# This will raise an error - SVC doesn't have predict_proba by default
Z_prob = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,1].reshape(xx.shape)
plt.contourf(xx, yy, Z_prob, alpha=0.6, cmap='RdBu_r')
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
```
The code fails because the default SVC model lacks the predict_proba method, causing an error when it's called. The following snippet demonstrates the simple fix required when you initialize the model.
```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(centers=2, random_state=42)
# Enable probability estimates during model creation
model = SVC(kernel='linear', probability=True).fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100),
                     np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100))
Z_prob = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,1].reshape(xx.shape)
plt.contourf(xx, yy, Z_prob, alpha=0.6, cmap='RdBu_r')
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
```
The fix is to enable probability estimates when you initialize the SVC model. By default, this feature is off to save on computation, so you need to turn it on by setting probability=True.
- This tells the classifier to perform the extra calculations needed for probability scores.
- Once enabled, you can call `predict_proba` to get the model's confidence levels, which is great for creating more nuanced visualizations.
Real-world applications
With the technical challenges solved, you can apply these plotting skills to practical fields like credit scoring and medical diagnosis.
Credit risk assessment with LogisticRegression
A LogisticRegression model can create a decision boundary to visualize how lenders might assess credit risk, separating applicants based on their income and debt ratio.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

np.random.seed(42)  # seed so the synthetic data is reproducible
X = np.column_stack([np.random.normal(50000, 15000, 100),
                     np.random.normal(0.3, 0.1, 100)])
y = ((X[:,0] > 45000) & (X[:,1] < 0.4)).astype(int)
model = LogisticRegression().fit(X, y)
xx, yy = np.meshgrid(np.linspace(20000, 80000, 100), np.linspace(0.1, 0.6, 100))
plt.contourf(xx, yy, model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape), alpha=0.3)
plt.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
plt.xlabel('Income ($)'); plt.ylabel('Debt Ratio')
plt.show()
```
This code generates a synthetic dataset for a binary classification task, using np.random.normal to create two features. It then programmatically assigns labels to the data points: one class for points where the first feature is above 45,000 and the second is below 0.4, and another class for all other points.
- A `LogisticRegression` model is trained on this labeled data.
- The plot then visualizes the decision boundary the model learned, showing how it separates the two classes across the feature space.
Visualizing medical diagnoses with SVC and PCA
By combining Principal Component Analysis (PCA) with a Support Vector Classifier (SVC), you can visualize how a model classifies high-dimensional data from the load_breast_cancer dataset into diagnostic categories.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(cancer.data))
model = SVC(kernel='rbf', probability=True).fit(X_pca, cancer.target)
xx, yy = np.meshgrid(np.linspace(X_pca[:,0].min()-1, X_pca[:,0].max()+1, 100),
                     np.linspace(X_pca[:,1].min()-1, X_pca[:,1].max()+1, 100))
plt.contourf(xx, yy, model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,1].reshape(xx.shape), alpha=0.6)
plt.scatter(X_pca[:,0], X_pca[:,1], c=cancer.target, edgecolors='k')
plt.show()
```
This code prepares the high-dimensional breast cancer dataset for a 2D plot. First, it uses StandardScaler to ensure all features contribute equally. Then, PCA condenses the data into its two most informative components, making it possible to visualize.
- An `SVC` model with a non-linear `rbf` kernel is trained on this simplified 2D data to capture complex relationships.
- The final plot uses `predict_proba` to visualize the model's confidence, showing a probability gradient rather than a simple boundary line.
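Before trusting a 2D projection like this, it's worth checking how much of the original 30-dimensional variance the two components actually retain. PCA exposes this through its `explained_variance_ratio_` attribute:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
pca = PCA(n_components=2)
X_pca = pca.fit_transform(StandardScaler().fit_transform(cancer.data))

# Fraction of total variance captured by each of the two components
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```

If the summed ratio is low, the 2D decision boundary plot may be a misleading picture of how the model behaves in the full feature space.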
Get started with Replit
Turn what you've learned into a real tool. Tell Replit Agent: “Build a dashboard to visualize customer churn” or “Create a tool that plots loan approval boundaries based on income and credit score.”
The agent writes the code, tests for errors, and deploys your app automatically. Start building with Replit and bring your ideas to life.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.



