How to plot a box plot in Python
Learn how to plot a box plot in Python. This guide covers various methods, tips, real-world examples, and common error debugging.

Box plots are a powerful tool to visualize data distributions in Python. They offer a concise summary of key statistical measures through a simple graphical representation.
In this article, you'll learn techniques to create box plots using popular libraries. You'll also find practical tips for customization, explore real-world applications, and get debugging advice for common issues.
Basic box plot with matplotlib
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
plt.boxplot(data)
plt.title('Simple Box Plot')
plt.show()
# Output: no text output; displays a box plot figure
To demonstrate the plot, the code first generates sample data. The line np.random.randn(100) creates an array of 100 samples drawn from a standard normal distribution, which is a quick way to get a dataset without loading an external file.
Creating the plot is surprisingly simple. The plt.boxplot(data) function does all the heavy lifting; it takes the raw data and automatically calculates the statistical summary needed to draw the box and whiskers. The final call to plt.show() renders the visual output.
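To make that statistical summary concrete, here is a minimal NumPy sketch of the numbers behind a default box plot. It assumes Matplotlib's default whisker setting (whis=1.5); the seed and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility
data = rng.standard_normal(100)

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# With the default whis=1.5, each whisker ends at the most extreme
# data point that lies within 1.5 * IQR of the box
lower_whisker = data[data >= q1 - 1.5 * iqr].min()
upper_whisker = data[data <= q3 + 1.5 * iqr].max()

# Anything beyond the whiskers is drawn as an individual outlier point
fliers = data[(data < lower_whisker) | (data > upper_whisker)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"Whiskers: {lower_whisker:.2f} to {upper_whisker:.2f}")
print(f"Outlier points: {len(fliers)}")
```

Comparing these values with the plotted box is a quick sanity check when a chart looks different from what you expected.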
Basic customization techniques
That simple matplotlib plot is a great start, but with a little more code you can make plots that are more informative, easier on the eye, and better suited to comparing groups.
Using seaborn for more attractive box plots
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
data = np.random.randn(100)
sns.boxplot(y=data)
plt.title('Seaborn Box Plot')
plt.show()
# Output: no text output; displays a styled box plot figure
For a more polished look with minimal effort, you can use the seaborn library. It’s built on top of matplotlib and offers improved default styles that make your plots easier to read.
The core of the operation is the sns.boxplot() function. Note a couple of small differences:
- You explicitly assign the data to an axis, in this case with y=data.
- Since seaborn works with matplotlib, you can still use familiar functions like plt.show() to display the final plot.
Customizing box plot colors and appearance
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
box = plt.boxplot(data, patch_artist=True)
for patch in box['boxes']:
    patch.set_facecolor('lightblue')
plt.grid(linestyle='--', alpha=0.7)
plt.show()
# Output: no text output; displays a box plot with custom styling
You can go beyond the default styles by directly manipulating the plot's components. The key is setting patch_artist=True in your plt.boxplot() call, which tells matplotlib to treat the box as a fillable shape.
- The function returns a dictionary of plot elements. You can loop through box['boxes'] to access the main rectangle.
- Inside the loop, use a method like set_facecolor('lightblue') to change its color.
- You can also add a background grid with plt.grid() to make the plot easier to read.
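The returned dictionary holds more than the boxes. As a rough sketch, the snippet below prints its keys and restyles the median line and whiskers as well; the colors, line styles, and seed are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, used only for reproducibility
data = rng.standard_normal(100)

fig, ax = plt.subplots()
box = ax.boxplot(data, patch_artist=True)

# Each key maps to a list of Matplotlib artists you can restyle
print(sorted(box.keys()))

for patch in box['boxes']:
    patch.set_facecolor('lightblue')
for median_line in box['medians']:
    median_line.set_color('darkred')
    median_line.set_linewidth(2)
for whisker in box['whiskers']:
    whisker.set_linestyle('--')

ax.grid(linestyle='--', alpha=0.7)
```

Call plt.show() afterwards to display the figure.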
Creating grouped box plots for comparison
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
# Note: Matplotlib 3.9+ renames this parameter to tick_labels (labels is deprecated)
plt.boxplot(data, labels=['Group 1', 'Group 2', 'Group 3'])
plt.ylabel('Value')
plt.show()
# Output: no text output; displays multiple box plots side by side
Box plots are especially useful for comparing distributions across different groups. The key is to pass a list of datasets to the plt.boxplot() function. In the example, a list comprehension creates three distinct datasets, each with a different standard deviation.
- The data variable holds a list, where each item is a separate array of numbers.
- The labels parameter assigns a name to each corresponding box plot, making the comparison clear.
This simple structure allows you to visualize multiple groups side by side in a single chart.
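One way to keep names and datasets paired is to store them together in a dictionary and set the tick labels on the axes afterwards. This is a sketch rather than the only approach; the group names, sizes, and seed are made up for illustration, and the groups do not need to be the same length.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, used only for reproducibility

# Hypothetical groups of different sizes, keyed by display name
groups = {
    'Group 1': rng.normal(0, 1, 100),
    'Group 2': rng.normal(0, 2, 80),
    'Group 3': rng.normal(0, 3, 120),
}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))

# Setting tick labels on the axes avoids depending on the name of the
# boxplot label parameter, which differs between Matplotlib versions
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel('Value')
```

Because the names and arrays come from the same dictionary, the label count always matches the number of boxes.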
Advanced box plot techniques
Beyond basic styling and grouping, you can create even more powerful visualizations by adding statistical annotations, combining plot types, and introducing interactivity.
Adding statistical annotations to box plots
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
plt.boxplot(data)
plt.axhline(y=np.mean(data), color='r', linestyle='--', label=f'Mean: {np.mean(data):.2f}')
plt.text(1.1, np.median(data), f'Median: {np.median(data):.2f}')
plt.legend()
plt.show()
# Output: no text output; displays a box plot with statistical information
Adding statistical context directly to your plot makes it much easier to interpret. This example uses a couple of handy matplotlib functions to display the mean and median right on the chart.
- The plt.axhline() function draws a horizontal line across the plot. Here, it's used to mark the data's mean, styled as a dashed red line.
- You can add specific text with plt.text(). The code places the median's value right next to the median line on the box plot itself.
Finally, plt.legend() displays the label you created for the mean's horizontal line.
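As an alternative to drawing the mean by hand, boxplot() can mark it for you through its showmeans and meanline parameters. The sketch below assumes you want the mean as a dashed red line confined to the box; the styling values, text position, and seed are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, used only for reproducibility
data = rng.standard_normal(100)

fig, ax = plt.subplots()
box = ax.boxplot(
    data,
    showmeans=True,   # draw the mean in addition to the median
    meanline=True,    # as a line across the box rather than a marker
    meanprops={'color': 'red', 'linestyle': '--'},
)

# Place the median value just to the right of the box
median = np.median(data)
ax.text(1.3, median, f'Median: {median:.2f}', va='center')
```

Unlike axhline(), the mean line drawn this way stays inside its own box, which keeps grouped plots readable.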
Combining box plots with other visualization types
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
data = np.random.randn(100)
fig, ax = plt.subplots()
ax.boxplot(data, positions=[0])  # seaborn draws a single category at x=0
sns.stripplot(y=data, jitter=True, alpha=0.5, ax=ax)
plt.tight_layout()
plt.show()
# Output: no text output; displays a box plot with overlaid strip plot
Combining plots can reveal more than a single visualization. This example overlays a strip plot on a box plot, giving you both the five-number summary and a look at every individual data point. This is great for spotting density, gaps, and outliers that a standard box plot might hide.
- The process starts by creating a shared canvas for both plots using fig, ax = plt.subplots().
- You then draw the ax.boxplot() and the sns.stripplot() on the same ax object.
- Using jitter=True in the strip plot spreads the points out so they don't overlap, making the underlying data distribution easier to see.
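If you would rather not mix seaborn's categorical axis with Matplotlib's numeric positions, a similar overlay can be sketched with Matplotlib alone by adding the jitter yourself. The jitter width of 0.08, the marker size, and the seed are arbitrary assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed, used only for reproducibility
data = rng.standard_normal(100)

fig, ax = plt.subplots()

# Hide the box plot's own outlier markers, since every point is drawn below
ax.boxplot(data, positions=[1], showfliers=False)

# Spread the points horizontally around the box position to reduce overlap
x_jitter = 1 + rng.uniform(-0.08, 0.08, size=data.size)
points = ax.scatter(x_jitter, data, alpha=0.5, s=12)
```

Both layers share the same numeric x position here, so the points always sit on top of the box.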
Creating interactive box plots with plotly
import plotly.graph_objects as go
import numpy as np
data = np.random.randn(100)
fig = go.Figure()
fig.add_trace(go.Box(y=data, name='Data'))
fig.update_layout(title='Interactive Box Plot')
fig.show()
# Output: no text output; opens an interactive box plot in the browser
For plots you can interact with, the plotly library is an excellent tool. It creates visualizations that let you hover over elements to see precise statistical values, a clear step up from static images.
The workflow is straightforward:
- You initialize a figure object with go.Figure(), which acts as a canvas for your plot.
- Next, you add the visualization by calling fig.add_trace() with a go.Box object that contains your data.
- Finally, fig.show() renders the complete, interactive chart.
Move faster with Replit
Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly.
Instead of piecing together individual techniques, you can use Agent 4 to build complete, working applications. It's designed to take your project from a simple description to a finished product by handling the code, databases, APIs, and deployment.
- A performance dashboard that compares server response times across different regions using grouped box plots.
- An interactive financial analysis tool that visualizes stock price volatility, complete with hover-over details for key stats like the median and quartiles.
- A data validation utility that automatically generates box plots and strip plots from a CSV file to help you spot outliers and distribution anomalies before analysis.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Even with simple plots, you can run into a few common snags; here’s how to fix them quickly.
When creating grouped box plots, you might find that your custom labels don't appear. This happens when you forget to pass the labels argument to plt.boxplot(), in which case Matplotlib simply numbers the groups. If you do pass labels but the count doesn't match the number of datasets, Matplotlib raises a ValueError instead. Double-check that your list of labels has exactly one entry for each group you're plotting.
NaN (or 'Not a Number') values in your dataset are another common source of trouble. Matplotlib's boxplot function doesn't skip them: a dataset that contains a NaN yields NaN quartiles, so its box is typically drawn empty rather than raising an explicit error. Clean your data before plotting, for example by using the dropna() method on a pandas DataFrame or by creating a boolean mask with NumPy's isnan() function.
Sometimes your box plot shows up horizontally when you expected it to be vertical. With seaborn, this is typically because the data was assigned to the wrong axis, like using x=data instead of y=data. In matplotlib, plots are vertical by default, but you can make them horizontal by setting the vert=False parameter in the boxplot function.
Fixing missing labels in grouped boxplot
It’s easy to end up with a confusing chart when plotting multiple groups. Without labels, your box plots become anonymous, defeating the purpose of comparison. This common issue usually stems from forgetting to include the labels argument in your plt.boxplot() call.
The code below demonstrates what this looks like in practice. Notice how the x-axis simply shows numbers instead of meaningful group names, making the plot difficult to read.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, 1, 100), np.random.normal(2, 1, 100), np.random.normal(4, 1, 100)]
plt.boxplot(data)
plt.title('Comparison of Groups')
plt.show()
The plt.boxplot(data) call receives multiple datasets but no corresponding names. Matplotlib defaults to numbering the groups on the x-axis, resulting in a generic chart. See how a small addition fixes this in the corrected code below.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, 1, 100), np.random.normal(2, 1, 100), np.random.normal(4, 1, 100)]
plt.boxplot(data, labels=['Group A', 'Group B', 'Group C'])
plt.title('Comparison of Groups')
plt.show()
The solution is to provide a list of names to the labels parameter in the plt.boxplot() function. This pairs each dataset in your list with a corresponding string, making the chart instantly readable. Always double-check that the number of labels matches the number of datasets you're plotting, since a mismatch raises a ValueError. Note that Matplotlib 3.9 and later rename this parameter to tick_labels and deprecate labels, so use the newer name if you see a deprecation warning.
Resolving NaN values error in boxplot
A blank or broken box plot is often due to NaN (or 'Not a Number') values lurking in your data. Matplotlib's boxplot function doesn't skip these missing values: the quartiles of a dataset containing NaN are themselves NaN, so the box typically comes out empty rather than raising an error. You'll need to clean them first.
The code below demonstrates what happens when you try to plot a dataset containing NaN values without handling them first.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
data[20:30] = np.nan # Adding some NaN values
plt.boxplot(data)
plt.title('Box Plot with NaN Values')
plt.show()
The code intentionally corrupts the dataset by replacing a slice of numbers with np.nan values. This injection of missing data is what leaves the plot empty. See how the corrected code below addresses this issue.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
data[20:30] = np.nan # Adding some NaN values
clean_data = data[~np.isnan(data)]
plt.boxplot(clean_data)
plt.title('Box Plot with NaN Values Removed')
plt.show()
The solution is to filter out NaN values before plotting. The code creates a boolean mask with np.isnan(data) to find the missing entries. By using the ~ operator, you invert this mask to select only the valid numbers, creating a clean dataset that can be passed to plt.boxplot().
This is a common problem when you're working with real-world data from files or APIs, since they often contain missing values.
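The same mask works for grouped plots if you apply it to each dataset separately. The sketch below uses made-up groups and NaN positions; it also shows that a plain median becomes NaN as soon as one value is missing, which is why an uncleaned box has nothing to draw.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed, used only for reproducibility
groups = [rng.normal(0, 1, 50), rng.normal(2, 1, 50)]
groups[0][:5] = np.nan      # simulate 5 missing values in the first group
groups[1][10:13] = np.nan   # and 3 in the second

# A single NaN is enough to turn the summary statistic into NaN
print(np.median(groups[0]))  # nan

# Filter each group on its own, since they may lose different numbers of values
clean_groups = [g[~np.isnan(g)] for g in groups]
print([len(g) for g in clean_groups])  # [45, 47]

# NumPy's nan-aware functions give the same answer without building a mask
print(np.isclose(np.nanmedian(groups[0]), np.median(clean_groups[0])))  # True
```

The clean_groups list can be passed straight to plt.boxplot() in place of the original data.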
Troubleshooting incorrect boxplot orientation
Sometimes your box plot comes out in the wrong orientation, for example vertical when you intended it to be horizontal. This can make your chart look awkward and hard to read, especially if the axis labels and figure dimensions were chosen for the other layout.
The code below shows how a plain plt.boxplot() call produces a vertical plot even though the axis labels assume a horizontal one.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
plt.figure(figsize=(4, 8)) # Tall figure, although a horizontal box plot is intended
plt.boxplot(data)
plt.xlabel('Value')
plt.ylabel('Data Distribution')
plt.show()
By default, plt.boxplot() creates a vertical plot, so the values run along the y-axis while the code labels the x-axis 'Value'. The axis labels describe a horizontal plot that was never requested, leading to a misleading and visually awkward result. The corrected code below shows how to resolve this orientation mismatch.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(100)
plt.figure(figsize=(8, 4)) # Wide figure for horizontal boxplot
plt.boxplot(data, vert=False)
plt.xlabel('Value')
plt.ylabel('Data Distribution')
plt.show()
The fix is simple: setting vert=False in the plt.boxplot() function flips the plot from vertical to horizontal. This aligns the visualization with the wider figure dimensions and the axis labels. You'll want a horizontal layout when you have many groups or long group names that would otherwise overlap along the x-axis of a vertical plot. Note that Matplotlib 3.10 and later also accept orientation='horizontal', which is intended to replace vert over time.
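The long-label case can be sketched as follows. The group names, data, and seed are invented for illustration, and the example uses vert=False as in the code above; on Matplotlib 3.10 or later you could pass orientation='horizontal' instead.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed, used only for reproducibility

# Hypothetical group names that would collide on a horizontal tick axis
names = ['Region: North America', 'Region: Europe', 'Region: Asia-Pacific']
data = [rng.normal(loc, 1, 60) for loc in (0, 1, 2)]

fig, ax = plt.subplots(figsize=(8, 4))
box = ax.boxplot(data, vert=False)

# In a horizontal plot the groups sit on the y-axis, so label that axis
ax.set_yticklabels(names)
ax.set_xlabel('Value')
fig.tight_layout()
```

With the names on the y-axis, each label gets a full row to itself, so nothing needs to be rotated or truncated.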
Real-world applications
Moving past technical hurdles, you can see how these plots provide clear, actionable insights in fields like education and clinical research.
Comparing student performance across schools with boxplot
You can use a grouped box plot to effectively compare student test scores from different schools, making it easy to see how performance distributions differ.
import matplotlib.pyplot as plt
import numpy as np
# Simulate test scores from different schools
school_a = np.random.normal(72, 8, 30)
school_b = np.random.normal(68, 10, 30)
school_c = np.random.normal(77, 6, 30)
plt.boxplot([school_a, school_b, school_c], labels=['School A', 'School B', 'School C'])
plt.title('Test Score Distribution by School')
plt.ylabel('Score')
plt.grid(linestyle='--', alpha=0.3)
plt.show()
This code simulates test scores for three different schools using NumPy's np.random.normal() function. Each school's data is generated with a unique mean and standard deviation, creating distinct distributions for comparison.
- The key to the grouped plot is passing a list of these datasets, [school_a, school_b, school_c], directly to plt.boxplot().
- The labels parameter then assigns a name to each corresponding dataset, which makes the final chart easy to interpret.
Finally, the code adds a title, a y-axis label, and a faint grid to improve the plot's readability.
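If you also want the numbers behind each box, for a report or a table, Matplotlib exposes the helper it uses internally, matplotlib.cbook.boxplot_stats. The sketch below regenerates comparable scores with its own seed, so the values will not match the unseeded example above.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(7)  # arbitrary seed, used only for reproducibility
scores = [rng.normal(72, 8, 30), rng.normal(68, 10, 30), rng.normal(77, 6, 30)]
names = ['School A', 'School B', 'School C']

# Returns one dictionary of summary statistics per dataset
stats = cbook.boxplot_stats(scores, labels=names)

for s in stats:
    print(f"{s['label']}: median={s['med']:.1f}, "
          f"IQR={s['iqr']:.1f}, outliers={len(s['fliers'])}")
```

Each dictionary also includes the quartiles (q1, q3) and whisker ends (whislo, whishi), which is handy when you need to quote exact figures alongside the chart.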
Identifying outliers in clinical trials with boxplot
Box plots are particularly effective in clinical trials for spotting outliers, which can represent unusual patient responses to a treatment.
import matplotlib.pyplot as plt
import numpy as np
# Simulate drug trial data (blood pressure reduction)
np.random.seed(42)
placebo = np.random.normal(5, 3, 30)
drug_x = np.concatenate([np.random.normal(10, 3, 27), np.random.normal(25, 2, 3)])
drug_y = np.random.normal(8, 4, 30)
box = plt.boxplot([placebo, drug_x, drug_y],
                  labels=['Placebo', 'Drug X', 'Drug Y'],
                  patch_artist=True)
for patch in box['boxes']:
    patch.set_facecolor('lightblue')
plt.title('Blood Pressure Reduction by Treatment')
plt.ylabel('Reduction (mm Hg)')
plt.show()
# Identify outliers in Drug X
outliers = [x for x in drug_x if abs(x - np.mean(drug_x)) > 2 * np.std(drug_x)]
print(f"Drug X outliers: {outliers}")
This code simulates clinical trial data to demonstrate outlier detection, using np.random.seed(42) to ensure the results are reproducible.
- The dataset for drug_x is unique. It's created by combining two different data distributions with np.concatenate, which intentionally introduces a few high values that will appear as outliers.
- The box plot visualizes these groups, and setting patch_artist=True allows for custom styling like changing the box colors.
- Finally, a list comprehension programmatically identifies outliers by finding values more than two standard deviations from the mean. Note that this rule differs from the one the box plot itself uses, which flags points lying more than 1.5 times the interquartile range beyond the box, so the printed list may not match the points drawn on the chart.
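To list the same points the box plot draws as outliers, you can apply the 1.5 × IQR rule directly. The sketch below rebuilds drug_x with the same seed and call order as the example above and compares the two rules; it assumes Matplotlib's default whisker setting (whis=1.5).

```python
import numpy as np

np.random.seed(42)
placebo = np.random.normal(5, 3, 30)  # drawn first so drug_x matches the example above
drug_x = np.concatenate([np.random.normal(10, 3, 27), np.random.normal(25, 2, 3)])

# Box plot rule: points beyond 1.5 * IQR from the quartiles
q1, q3 = np.percentile(drug_x, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = drug_x[(drug_x < lower_fence) | (drug_x > upper_fence)]

# Two-standard-deviation rule, as used in the list comprehension above
sigma_outliers = drug_x[np.abs(drug_x - drug_x.mean()) > 2 * drug_x.std()]

print(f"IQR rule:     {np.round(iqr_outliers, 1)}")
print(f"2-sigma rule: {np.round(sigma_outliers, 1)}")
```

The two rules can disagree because extreme values inflate the mean and standard deviation, while quartiles are largely unaffected by them.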
Get started with Replit
Turn what you've learned into a real tool with Replit Agent. Describe what you want to build, like “a dashboard to visualize server response times with box plots” or “a utility that finds outliers in a CSV.”
Replit Agent will write the code, test for errors, and deploy your application for you. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.