How to make a confusion matrix in Python
Learn to create a confusion matrix in Python. Explore methods, tips, real-world applications, and how to debug common errors.

A confusion matrix is a key tool to evaluate machine learning model performance. It provides a clear summary of prediction accuracy so you can understand where your model succeeds and fails.
In this article, you'll learn how to create a confusion matrix with libraries like Scikit-learn and its confusion_matrix function. You'll explore practical techniques, real-world applications, and debugging advice for your projects.
Using sklearn.metrics.confusion_matrix
```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
```

Output:

```
[[3 1]
 [1 3]]
```
The confusion_matrix function from Scikit-learn takes two primary arguments: y_true for the actual labels and y_pred for your model's predictions. It compares these arrays to generate the matrix.
The output [[3 1], [1 3]] breaks down performance at a glance:
- Top-left (3): True Negatives, where class 0 was correctly identified.
- Top-right (1): False Positives, where class 0 was mistaken for class 1.
- Bottom-left (1): False Negatives, where class 1 was mistaken for class 0.
- Bottom-right (3): True Positives, where class 1 was correctly identified.
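If you need those four counts as plain numbers, for example to compute metrics yourself, you can unpack them from the matrix with `ravel()`; a minimal sketch using the same labels as above:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# For binary classification, ravel() flattens the 2x2 matrix into
# the four counts in row-major order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```

This is handy whenever a downstream calculation needs the individual cell counts rather than the full array.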
Basic visualization techniques
That raw array is a good start, but visualizing it with tools like Matplotlib, Seaborn, and Pandas gives you a much clearer picture of performance.
Using matplotlib to plot the matrix
```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

plt.imshow(cm, cmap='Blues')
plt.colorbar()
plt.show()
```

Output: a blue-colored heatmap with a color scale.
Matplotlib’s imshow function renders the confusion matrix as an image, giving you a quick visual representation of the data. The cmap='Blues' argument sets the color map, so cells with higher values appear in darker shades of blue.
- The `plt.colorbar()` function adds a scale to the side of the plot, which helps you map the colors back to their numerical values.
- Using a heatmap like this makes it easy to spot the most frequent outcomes at a glance, especially with larger matrices.
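As an alternative to calling `imshow` yourself, Scikit-learn bundles its own Matplotlib-based helper, `ConfusionMatrixDisplay`, which handles the rendering and cell annotations for you; a minimal sketch:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

# ConfusionMatrixDisplay draws the heatmap and writes the count
# into each cell without any manual imshow/annotation code
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot(cmap='Blues')
plt.show()
```

Because it lives in `sklearn.metrics`, this approach needs no extra plotting library beyond Matplotlib itself.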
Creating a heatmap with seaborn
```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu')
plt.show()
```

Output: a yellow-green-blue heatmap with numbers in each cell.
For a more polished and informative plot, Seaborn is an excellent choice. Its heatmap() function is tailor-made for this task and builds upon Matplotlib to offer a cleaner result with just one line.
- Setting `annot=True` is the most useful feature, as it writes the data value in each cell.
- The `fmt='d'` argument tells Seaborn to format those numbers as integers.
- A different color map like `cmap='YlGnBu'` gives your plot a fresh look.
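If you want labeled axes without building a DataFrame first, `heatmap` also accepts tick labels directly through its `xticklabels` and `yticklabels` parameters; a small sketch:

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

# xticklabels/yticklabels put readable names on the axes
ax = sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu',
                 xticklabels=['Predicted 0', 'Predicted 1'],
                 yticklabels=['Actual 0', 'Actual 1'])
plt.show()
```

This keeps the labeling in one call instead of splitting it between the data and the plot.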
Using pandas for better labeling
```python
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

df_cm = pd.DataFrame(cm, index=['Actual 0', 'Actual 1'], columns=['Predicted 0', 'Predicted 1'])
print(df_cm)
```

Output:

```
          Predicted 0  Predicted 1
Actual 0            3            1
Actual 1            1            3
```
While the raw array is useful, `pandas` makes it much more readable. By wrapping the matrix in a `pd.DataFrame`, you can assign explicit labels to the rows and columns. This transforms the output from a simple grid of numbers into a self-explanatory table.
- The `index` argument labels the rows, showing the actual ground truth values.
- The `columns` argument labels the columns, showing your model's predictions.
This small change eliminates any guesswork, making it clear what each value in the matrix represents.
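If you'd rather skip `confusion_matrix` entirely, pandas can build the same labeled table straight from the label lists with `crosstab`; a minimal sketch:

```python
import pandas as pd

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# crosstab tabulates actual vs. predicted labels directly,
# producing the same counts as confusion_matrix
df_cm = pd.crosstab(
    pd.Series(y_true, name='Actual'),
    pd.Series(y_pred, name='Predicted'),
)
print(df_cm)
```

The named `Series` objects become the axis titles, so the output is labeled without any extra work.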
Advanced techniques
Beyond the basic plots, advanced techniques can help you extract more nuanced insights and tell a clearer story about your model's performance.
Creating a normalized confusion matrix
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

# Divide each row by its sum to get per-class proportions
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm_normalized, annot=True, cmap='Blues', fmt='.2f')
plt.show()
```

Output: a heatmap with normalized values (between 0 and 1) in each cell.
Normalizing the confusion matrix shifts the focus from raw counts to proportions, which is great for evaluating performance on imbalanced datasets. It helps you see the percentage of correct and incorrect predictions for each class. The code achieves this by dividing each value in the matrix by the sum of its corresponding row.
- The key operation is `cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]`.
- When plotting, `fmt='.2f'` formats the cell annotations as floating-point numbers with two decimal places.
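Newer Scikit-learn releases (0.22 and later) can also do this division for you: `confusion_matrix` accepts a `normalize` parameter, and `normalize='true'` normalizes over the rows (the true labels). A quick sketch:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# normalize='true' divides each row by its sum, so every row of
# the result sums to 1 (per-class recall sits on the diagonal)
cm_norm = confusion_matrix(y_true, y_pred, normalize='true')
print(cm_norm)
```

The parameter also accepts `'pred'` (normalize over columns) and `'all'` (normalize over the whole matrix), depending on which proportion you want to highlight.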
Using the yellowbrick library for classification reports
```python
from yellowbrick.classifier import ConfusionMatrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(random_state=42)
model = RandomForestClassifier(random_state=42)

cm = ConfusionMatrix(model, classes=[0, 1])
cm.fit(X, y)
cm.score(X, y)
cm.show()
```

Output: a styled confusion matrix plot.
The yellowbrick library offers a high-level approach that streamlines plotting. Instead of manually generating the matrix and then plotting it, you wrap your model directly in a ConfusionMatrix visualizer object. This object handles the entire workflow for you, from training to rendering.
- You initialize `ConfusionMatrix` with your model, such as a `RandomForestClassifier`.
- The `.fit()` and `.score()` methods train and evaluate the model internally.
- Calling `.show()` generates and displays a polished, ready-to-read plot.
Creating an interactive confusion matrix with plotly
```python
import plotly.figure_factory as ff
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

x = ['Predicted 0', 'Predicted 1']
y = ['Actual 0', 'Actual 1']
fig = ff.create_annotated_heatmap(cm, x=x, y=y, colorscale='Blues')
fig.show()
```

Output: an interactive heatmap you can hover over for details.
Plotly takes visualization a step further by making it interactive. The create_annotated_heatmap function from plotly.figure_factory generates a plot you can hover over to see details for each cell. This is especially useful for presentations or for doing a deep dive into your model's results without cluttering the static image.
- You simply pass your confusion matrix `cm` to the function.
- The `x` and `y` arguments let you provide clear, descriptive labels for the axes.
- The result is a clean, interactive heatmap that makes exploring your model's performance more intuitive.
Move faster with Replit
Learning these techniques is a great first step, but Replit helps you go from experimenting with code to shipping complete applications. Replit is an AI-powered development platform with all Python dependencies pre-installed, so you can skip setup and start coding instantly.
Instead of just piecing together functions, you can use Agent 4 to build a full-featured product from a simple description. You can take the concepts from this article and build a complete tool:
- An interactive dashboard that visualizes model performance with a `plotly` confusion matrix based on uploaded predictions.
- A model evaluation utility that generates a full `yellowbrick` classification report from true and predicted labels.
- A performance monitoring tool that retrains a classifier and compares the new confusion matrix against a baseline to detect model drift.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
When building a confusion matrix, you might run into a few common issues, but they are all simple to solve.
A frequent challenge is a label order mismatch. By default, the confusion_matrix function sorts labels alphanumerically. If your classes are strings like "cat" and "dog," the matrix will be ordered with "cat" first, which might not be what you expect. You can enforce a specific order by passing a list to the labels parameter, such as labels=['dog', 'cat'], to ensure the axes align with your interpretation.
You'll also get a ValueError if your input arrays have different shapes. The confusion_matrix function requires y_true and y_pred to have the exact same number of elements because it performs a one-to-one comparison. If they don't match, double-check your data preparation steps to find where the discrepancy was introduced.
Sometimes, a particular batch of data might not contain every possible class. This can cause the output matrix to have a smaller shape than you expect, breaking any code that relies on fixed dimensions. To prevent this, use the labels parameter to provide a complete list of all possible classes. This guarantees the confusion matrix always has the correct size, even if some classes aren't present in a given prediction set.
Handling label order mismatch in confusion_matrix
By default, confusion_matrix sorts the labels it finds in your data and uses that sorted order for both axes. If the order you expect is different, the rows and columns won't line up with your interpretation. The following code builds a matrix without specifying the order explicitly.
```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [1, 0, 2, 1, 0, 2]

# Without an explicit labels argument, the axis order is
# whatever sorted order scikit-learn infers from the data
cm = confusion_matrix(y_true, y_pred)
print(cm)
```
The off-diagonal cells show that the model consistently swaps its predictions for classes 0 and 1. With integer labels, the inferred order happens to be [0, 1, 2], but relying on that implicit sorting is fragile, especially once your labels are strings. See how to make the order explicit below.
```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [1, 0, 2, 1, 0, 2]

# Explicitly specify the label order to ensure consistency
labels = [0, 1, 2]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```
The solution is to pass an explicit order to the `labels` parameter. This forces `confusion_matrix` to build the matrix with your desired axis sequence, making the output predictable and easy to interpret.
- This is crucial when your class labels are strings or non-sequential numbers, as it prevents the default sorting from producing an order you didn't expect.
By setting `labels=[0, 1, 2]`, you ensure the output accurately reflects the model's behavior.
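The same pattern matters even more with string labels, where the default alphabetical order is easy to overlook. A short sketch with hypothetical 'cat'/'dog' labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical string labels: the default order would be
# alphabetical ('cat' first), so we force 'dog' onto the first axis
y_true = ['dog', 'cat', 'dog', 'cat']
y_pred = ['dog', 'dog', 'dog', 'cat']

cm = confusion_matrix(y_true, y_pred, labels=['dog', 'cat'])
print(cm)
```

Here the first row and column correspond to 'dog' rather than 'cat', matching the order you specified rather than the alphabetical default.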
Handling wrong input shapes with confusion_matrix
You'll get a ValueError if you pass raw probabilities to confusion_matrix instead of class labels. This often happens with neural network outputs, which produce continuous scores. The function expects 1D arrays for both inputs, as the code below demonstrates.
```python
from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 0, 1]
# Probability outputs from a neural network (one row per sample)
y_pred = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.3, 0.7]
])

# This will raise a ValueError
cm = confusion_matrix(y_true, y_pred)
```
The function fails because y_pred is a 2D array of probabilities, not the final class predictions. It needs a 1D array to match y_true's shape. The following code shows how to make that conversion.
```python
from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 0, 1]
y_pred = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.3, 0.7]
])

# Convert probabilities to class predictions
y_pred_classes = np.argmax(y_pred, axis=1)
cm = confusion_matrix(y_true, y_pred_classes)
print(cm)
```
The solution is to convert the probability scores into definite class labels before generating the matrix. You can do this with np.argmax(y_pred, axis=1), which finds the index of the highest probability for each prediction. This index represents the final predicted class.
- This is a crucial step when your model, like a neural network, outputs probabilities instead of discrete class labels. By converting them, you align the input with what `confusion_matrix` expects.
Dealing with missing labels in confusion_matrix
It's common for a batch of data to be missing a class your model was trained on. When this happens, confusion_matrix returns a smaller matrix than you expect, which can break your code. The following example demonstrates this exact problem.
```python
from sklearn.metrics import confusion_matrix

# Training data had 3 classes, but the test set is missing class 2
y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

# This creates a 2x2 matrix, losing information about class 2
cm = confusion_matrix(y_true, y_pred)
print(cm)
```
The function is unaware of the missing class 2, so it dynamically creates a 2x2 matrix based on the provided data. This can break any code expecting a fixed size. See the correct implementation below.
```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

# Specify all expected labels even if some are missing
labels = [0, 1, 2]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```
The solution is to explicitly define all possible classes using the labels parameter. This forces confusion_matrix to create a matrix with a consistent shape, even if a class is missing from the data batch. This prevents your code from breaking unexpectedly.
- This guarantees the output matrix includes rows and columns for all classes, filling missing ones with zeros.
- It's crucial when processing data in chunks or when a test set doesn't represent every class.
Real-world applications
Beyond the code, confusion matrices are essential for evaluating real-world applications, from spam filters to sentiment analysis models.
Evaluating a spam filter with confusion_matrix metrics
A confusion matrix is essential for evaluating a spam filter because it reveals the critical trade-off between blocking legitimate emails and letting spam through, something a simple accuracy score can't show.
```python
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Actual labels (1 = spam, 0 = not spam)
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
# Predicted labels from our spam filter
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]

# Create and analyze the confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall: {recall_score(y_true, y_pred):.2f}")
```
This code evaluates a spam filter by comparing its predictions (y_pred) against the actual labels (y_true). It uses Scikit-learn's metrics to get a detailed performance picture beyond just a confusion matrix.
- `accuracy_score` shows the overall percentage of correct predictions.
- `precision_score` measures the model's exactness: of all emails flagged as spam, how many were actually spam?
- `recall_score` measures completeness: of all actual spam emails, how many did the filter successfully catch?
These metrics provide a nuanced view of the model's effectiveness.
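To make the link between the matrix and these metrics explicit, you can also compute precision and recall by hand from the four cells; a quick sketch reusing the same labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision and recall are just ratios of the matrix cells
precision = tp / (tp + fp)   # of emails flagged as spam, how many were spam
recall = tp / (tp + fn)      # of actual spam emails, how many were caught
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Seeing the formulas written against the cell counts makes it clear why a model can have high precision but low recall, or vice versa.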
Multi-class sentiment analysis with confusion_matrix
A confusion matrix is just as useful for multi-class problems like sentiment analysis, where it reveals exactly which categories your model tends to confuse.
```python
from sklearn.metrics import confusion_matrix, classification_report

# Simulated sentiment analysis results (0=negative, 1=neutral, 2=positive)
y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 1, 2, 0, 2, 2]

# Create and display the confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Sentiment Analysis Confusion Matrix:")
print(cm)

# Display the classification report
target_names = ['Negative', 'Neutral', 'Positive']
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=target_names))
```
This code evaluates a multi-class model by generating two key outputs. First, it creates a raw confusion matrix with confusion_matrix, which helps you see exactly where the model gets confused—for example, mistaking a "neutral" sentiment for "positive."
- The `classification_report` function offers a more detailed text summary of performance.
- Providing a list of strings to the `target_names` parameter makes the report much easier to read by replacing numeric labels like `0` and `1` with descriptive names like 'Negative' and 'Neutral'.
Get started with Replit
Now, turn what you've learned into a real tool with Replit Agent. Describe what you want, like "build a dashboard that visualizes a confusion matrix from uploaded CSVs" or "create a utility that generates a classification report."
Replit Agent will write the code, test for errors, and help you deploy your application. Start building with Replit.
