How to make a confusion matrix in Python

Learn how to create a confusion matrix in Python. Explore different methods, tips, real-world applications, and how to debug common errors.

Published on: Tue, Mar 3, 2026
Updated on: Thu, Mar 5, 2026
The Replit Team

A confusion matrix is a vital tool for machine learning model evaluation. It shows correct and incorrect predictions in a simple table, which gives you clear insights into your model's performance.

In this article, you'll build a confusion matrix with Python. You will also explore practical techniques, real-world examples, and tips to debug your models and master this essential metric.

Using sklearn.metrics.confusion_matrix

from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)

--OUTPUT--

[[3 1]
 [1 3]]

The confusion_matrix function from scikit-learn streamlines the process. It takes two key arguments: y_true, a list of the correct labels, and y_pred, a list of your model's corresponding predictions.

The function returns a 2x2 array that neatly summarizes performance. In this case, the output is [[3 1], [1 3]]. Here’s what that means:

  • True Negatives (Top-Left): 3 instances of class 0 were correctly identified.
  • False Positives (Top-Right): 1 instance of class 0 was incorrectly labeled as class 1.
  • False Negatives (Bottom-Left): 1 instance of class 1 was incorrectly labeled as class 0.
  • True Positives (Bottom-Right): 3 instances of class 1 were correctly identified.
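For binary problems, you can unpack these four cells directly with ravel(), which flattens the 2x2 matrix in the order (tn, fp, fn, tp). This is handy when you want the counts as plain variables:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# ravel() flattens row by row: (tn, fp, fn, tp) for a binary matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # → 3 1 1 3
```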

Basic visualization techniques

That raw array is a good start, but visualizing your confusion matrix with tools like matplotlib and seaborn makes it far easier to understand.

Using matplotlib to plot the matrix

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
plt.imshow(cm, cmap='Blues')
plt.colorbar()
plt.show()

--OUTPUT--

[A blue heatmap with a color scale]

This code snippet uses matplotlib to create a simple heatmap from your confusion matrix. The core of this visualization is the plt.imshow() function, which renders the matrix array as an image. Using cmap='Blues' colors the cells based on their value—darker shades represent higher numbers, making it easy to spot where most predictions fall.

  • The plt.colorbar() function adds a legend that maps the colors to their corresponding counts.
  • Finally, plt.show() displays the generated plot for you to analyze.
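If you want labels and cell annotations without hand-rolling them, scikit-learn also ships its own plotting helper, ConfusionMatrixDisplay, which wraps matplotlib for you (available in modern scikit-learn versions):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

# display_labels sets the axis tick labels; plot() draws the count in each cell
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot(cmap='Blues')
plt.show()
```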

Creating a heatmap with seaborn

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu')
plt.show()

--OUTPUT--

[A yellow-green-blue heatmap with the count in each cell]

For a more polished visual, you can use seaborn. Its sns.heatmap() function builds on matplotlib but adds useful features out of the box. The most significant improvement comes from setting annot=True, which displays the actual count inside each cell of the heatmap.

  • The fmt='d' argument ensures these annotations are displayed as integers.
  • Like before, cmap='YlGnBu' sets the color palette for the visualization.

Using pandas for better labeling

import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
df_cm = pd.DataFrame(cm, index=['Actual 0', 'Actual 1'], columns=['Predicted 0', 'Predicted 1'])
print(df_cm)

--OUTPUT--

          Predicted 0  Predicted 1
Actual 0            3            1
Actual 1            1            3

While a raw array is useful, it lacks context. By wrapping your confusion matrix in a pandas DataFrame, you can add clear, descriptive labels. This makes the output much easier to interpret at a glance.

  • The pd.DataFrame() function converts your NumPy array into a structured table.
  • You use the index parameter to label the rows, such as ['Actual 0', 'Actual 1'].
  • The columns parameter labels the columns, like ['Predicted 0', 'Predicted 1'].

The result is a clean, self-explanatory table that clearly distinguishes between actual and predicted values without needing extra explanation.
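A useful side effect of the pandas wrapper: passing the labeled DataFrame straight to sns.heatmap carries the row and column names onto the plot axes, so you get an annotated, labeled heatmap for free. A minimal sketch combining the two approaches:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)

# seaborn reads the DataFrame's index/columns and uses them as axis tick labels
df_cm = pd.DataFrame(cm, index=['Actual 0', 'Actual 1'],
                     columns=['Predicted 0', 'Predicted 1'])
sns.heatmap(df_cm, annot=True, fmt='d', cmap='Blues')
plt.show()
```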

Advanced techniques

While basic visualizations are a great starting point, you can unlock deeper insights by normalizing the matrix, generating full classification reports, and creating interactive plots.

Creating a normalized confusion matrix

import numpy as np
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm_normalized, annot=True, cmap='Blues', fmt='.2f')
plt.show()

--OUTPUT--

[A heatmap with normalized values between 0 and 1 in each cell]

Normalizing your confusion matrix converts raw counts into proportions, making it easier to interpret performance, especially with imbalanced datasets. It shifts the focus from absolute numbers to the percentage of correct and incorrect predictions for each class.

  • The core of this technique is the line cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]. This divides each value in the confusion matrix by the total number of instances of that actual class, giving you a row-wise proportion.
  • When plotting with seaborn, using fmt='.2f' formats the annotations to two decimal places, which is perfect for displaying these new proportions.
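If you're on a recent scikit-learn (0.22 or later), you can skip the manual division entirely: confusion_matrix has a normalize parameter that does the same thing.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

# normalize='true' divides each row by that actual class's total;
# 'pred' normalizes over columns, 'all' over the whole matrix
cm_normalized = confusion_matrix(y_true, y_pred, normalize='true')
print(cm_normalized)
```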

Using the yellowbrick library for classification reports

from yellowbrick.classifier import ConfusionMatrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(random_state=42)
model = RandomForestClassifier(random_state=42)
cm = ConfusionMatrix(model, classes=[0, 1])
cm.fit(X, y)
cm.score(X, y)
cm.show()

--OUTPUT--

[A styled confusion matrix with per-cell counts]

The yellowbrick library offers a high-level approach that bundles model fitting, prediction, and visualization into one convenient package. It acts as a wrapper around your scikit-learn classifier, automating the entire workflow for you.

  • First, you instantiate the ConfusionMatrix visualizer with your chosen model, like RandomForestClassifier.
  • Then, you call the .fit() and .score() methods directly on the visualizer object.
  • Finally, .show() generates and displays a polished plot that includes the matrix and other useful classification metrics.

Creating an interactive confusion matrix with plotly

import plotly.figure_factory as ff
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
x = ['Predicted 0', 'Predicted 1']
y = ['Actual 0', 'Actual 1']
fig = ff.create_annotated_heatmap(cm, x=x, y=y, colorscale='Blues')
fig.show()

--OUTPUT--

[An interactive heatmap that shows details on hover]

For a more dynamic analysis, you can use plotly to build an interactive heatmap. Unlike static plots, this allows you to hover over each cell to see its value, which is perfect for detailed exploration or sharing your findings in a dashboard.

  • The core of this method is plotly.figure_factory.create_annotated_heatmap().
  • You simply pass it your confusion matrix array, along with lists for the x and y axis labels.
  • The function handles the rest, generating a clean, interactive visual that you can display with fig.show().

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. You just describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.

It’s a fast way to turn the concepts from this article into production-ready tools. For example, Replit Agent can build:

  • A model evaluation dashboard that visualizes a normalized confusion matrix with seaborn.
  • An interactive reporting tool using plotly that lets stakeholders explore model performance.
  • A monitoring service that automatically generates and emails a full yellowbrick classification report.

You can bring these ideas to life by simply describing them. Try Replit Agent to build, test, and deploy your next machine learning application directly from your browser.

Common errors and challenges

While creating a confusion matrix is often straightforward, a few common issues can trip you up and lead to misleading results.

Handling label order mismatch in confusion_matrix

By default, confusion_matrix sorts the union of labels it finds in y_true and y_pred. With plain integer classes that usually matches your mental model, but with string labels or a domain-specific ordering, the axes may not represent what you think they do. For example, the row you read as "Actual 0" might not align with the column for "Predicted 0."

To prevent this ambiguity, you should explicitly set the order with the labels parameter. Passing a list like labels=[0, 1] forces the function to use that structure, ensuring the top-left cell always corresponds to class 0 predicted as class 0.
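The surprise is most common with string labels, where the default alphabetical sort may not match the order you had in mind. A small sketch (the 'spam'/'ham' labels here are illustrative):

```python
from sklearn.metrics import confusion_matrix

# Without labels=, sklearn would sort alphabetically: 'ham' before 'spam'
y_true = ['spam', 'ham', 'spam', 'ham']
y_pred = ['spam', 'ham', 'ham', 'ham']

# Force 'spam' first so the top-left cell is spam predicted as spam
cm = confusion_matrix(y_true, y_pred, labels=['spam', 'ham'])
print(cm)
```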

Handling wrong input shapes with confusion_matrix

You'll encounter a ValueError if your y_true and y_pred arrays don't have the same number of elements. This common error occurs when your prediction set is a different size than your ground truth set, making a one-to-one comparison impossible. Always confirm both inputs have the same length before generating the matrix.
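A small guard makes the failure explicit before scikit-learn raises its more generic error. The safe_confusion_matrix helper name here is just for illustration:

```python
from sklearn.metrics import confusion_matrix

def safe_confusion_matrix(y_true, y_pred):
    # Fail fast with a clear message instead of sklearn's generic shape error
    if len(y_true) != len(y_pred):
        raise ValueError(
            f"Length mismatch: {len(y_true)} true labels vs "
            f"{len(y_pred)} predictions"
        )
    return confusion_matrix(y_true, y_pred)

print(safe_confusion_matrix([0, 1, 0, 1], [0, 1, 1, 1]))
```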

Dealing with missing labels in confusion_matrix

It's possible for a class to exist in your true labels but never appear in your model's predictions. When this happens, the function might produce a matrix with fewer columns than rows, which can break your visualization code or make interpretation difficult.

The most reliable solution is to use the labels parameter to provide a complete list of all possible classes. This guarantees the output matrix has the correct dimensions, even if the model never predicts certain labels, giving you a full and accurate picture of its performance.

Handling label order mismatch in confusion_matrix

By default, the confusion_matrix function sorts the labels it finds in the data. If your downstream code or axis labels assume a different order, the matrix's rows and columns won't line up with your interpretation. The following code relies on that implicit default order.

from sklearn.metrics import confusion_matrix

# Classes in dataset are [0, 1, 2] but predictions use [1, 0, 2]
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [1, 0, 2, 1, 0, 2]

# This creates a confusing matrix that's hard to interpret
cm = confusion_matrix(y_true, y_pred)
print(cm)

Because no label order was specified, the function falls back to its default sorted order. If you assumed a different ordering when reading the matrix, you could attribute counts to the wrong predicted class. See how to enforce an explicit order below.

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [1, 0, 2, 1, 0, 2]

# Explicitly specify the label order to ensure consistency
labels = [0, 1, 2]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

The fix is simple: explicitly set the class order with the labels parameter. When you call confusion_matrix(y_true, y_pred, labels=[0, 1, 2]), you're telling the function exactly how to structure the matrix. This guarantees the axes align correctly, so you can trust your results. It's a crucial step in multiclass classification, where the order of labels can easily become jumbled between your true values and predictions.

Handling wrong input shapes with confusion_matrix

The confusion_matrix function expects a simple list of predicted labels, not raw probabilities. You'll run into a ValueError if you pass it a multi-dimensional array—like the direct output from a neural network—because the input shapes don't match. The code below shows this exact error.

from sklearn.metrics import confusion_matrix
import numpy as np

# Probability outputs from a neural network (one column per class)
y_true = [0, 1, 0, 1]
y_pred = np.array([
   [0.9, 0.1],
   [0.2, 0.8],
   [0.7, 0.3],
   [0.3, 0.7]
])

# This will raise an error
cm = confusion_matrix(y_true, y_pred)

The error occurs because y_pred contains raw probabilities instead of the final predicted labels. The confusion_matrix function needs a simple list of predictions to work correctly. See how to fix this in the code below.

from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 0, 1]
y_pred = np.array([
   [0.9, 0.1],
   [0.2, 0.8],
   [0.7, 0.3],
   [0.3, 0.7]
])

# Convert probabilities to class predictions
y_pred_classes = np.argmax(y_pred, axis=1)
cm = confusion_matrix(y_true, y_pred_classes)
print(cm)

The solution is to convert the raw probabilities into definite class labels before passing them to the function. You can do this with np.argmax(y_pred, axis=1), which finds the index of the highest probability for each prediction. This index corresponds to the predicted class. This is a common step when working with models like neural networks that output probabilities instead of discrete labels, ensuring your input has the correct shape for confusion_matrix.

Dealing with missing labels in confusion_matrix

Your model might not predict every possible class, or a class might be missing from your test data. This causes confusion_matrix to return a matrix with incorrect dimensions, potentially leading to errors in your visualization code. The code below demonstrates this issue.

from sklearn.metrics import confusion_matrix

# Training data had 3 classes, but test set is missing class 2
y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

# This creates a 2x2 matrix, losing information about class 2
cm = confusion_matrix(y_true, y_pred)
print(cm)

The confusion_matrix function builds its structure based only on the labels it finds. Because class 2 is missing from the inputs, the output is a 2x2 matrix that misrepresents the model's full scope. See the fix below.

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

# Specify all expected labels even if some are missing
labels = [0, 1, 2]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

The fix is to explicitly define all possible classes with the labels parameter. By passing labels=[0, 1, 2], you force confusion_matrix to generate a 3x3 matrix, even though class 2 is absent from the data. This ensures your matrix dimensions are always correct and prevents errors in your visualization code. It's a crucial step when your test data might not represent every class from your training set.

Real-world applications

Now that you've handled the technical details, you can apply confusion matrices to practical problems like spam filtering and sentiment analysis.

Evaluating a spam filter with confusion_matrix metrics

In a spam filter, a confusion matrix is essential because it quantifies the critical difference between mistakenly blocking a legitimate email and accidentally letting spam into the inbox.

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Actual labels (1=spam, 0=not spam)
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
# Predicted labels from our spam filter
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]

# Create and analyze confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall: {recall_score(y_true, y_pred):.2f}")

This code uses scikit-learn to calculate key performance metrics for the spam filter. It compares the ground truth (y_true) against the model's predictions (y_pred). It’s a way to measure more than just overall accuracy, which can be misleading.

  • precision_score tells you how trustworthy the "spam" predictions are. A high score means fewer legitimate emails are incorrectly flagged.
  • recall_score shows how many of the actual spam emails the filter caught. A high score means less spam gets into the inbox.
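Both scores fall straight out of the matrix cells. As a sanity check, you can recompute them by hand from ravel():

```python
from sklearn.metrics import confusion_matrix

# Same spam-filter labels as above (1=spam, 0=not spam)
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # of everything flagged as spam, how much really was
recall = tp / (tp + fn)     # of all actual spam, how much was caught
print(precision, recall)    # → 0.75 0.75
```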

Multi-class sentiment analysis with confusion_matrix

A confusion matrix is just as useful for multi-class problems, like sentiment analysis, where it can show you if your model is confusing neutral comments with positive ones.

from sklearn.metrics import confusion_matrix, classification_report

# Simulated sentiment analysis results (0=negative, 1=neutral, 2=positive)
y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 1, 2, 0, 2, 2]

# Create and display confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Sentiment Analysis Confusion Matrix:")
print(cm)

# Display classification report
target_names = ['Negative', 'Neutral', 'Positive']
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=target_names))

This code evaluates a multi-class sentiment model by comparing true labels (y_true) with predictions (y_pred). It goes beyond a simple matrix by also generating a full classification_report, which offers a more complete performance summary.

  • The confusion_matrix provides the raw counts of correct and incorrect predictions for each sentiment class.
  • The classification_report uses these counts to calculate precision, recall, and F1-score, giving you a detailed breakdown for each category. Using the target_names parameter makes the output much clearer.
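The report's per-class recall values can also be read straight off the matrix: the diagonal holds the correct predictions, and dividing by the row sums gives recall for each class.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 1, 2, 0, 2, 2]
cm = confusion_matrix(y_true, y_pred)

# diagonal = correct predictions per class; row sums = actual instances per class
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```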

Get started with Replit

Turn your knowledge into a real tool. Give Replit Agent a prompt like "build a dashboard that visualizes a confusion_matrix with plotly" or "create a tool that calculates precision and recall from user-inputted labels."

Replit Agent writes the code, tests for errors, and deploys your application. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
