How to calculate standard deviation in Python
Learn how to calculate standard deviation in Python. Explore different methods, real-world applications, and common errors with our guide.

Standard deviation is a key statistical measure that shows data variability. Python offers powerful, built-in tools to calculate it, which simplifies complex analysis for developers and data scientists alike.
In this article, we'll cover several methods to calculate standard deviation. You'll find practical techniques, real-world applications, and debugging advice to help you master this essential statistical skill in Python.
Basic standard deviation with the statistics module
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = statistics.stdev(data)
print(f"Standard Deviation: {std_dev}")
# Output:
# Standard Deviation: 2.138089935299395
Python’s built-in statistics module offers the most direct path for calculating standard deviation. It’s designed for fundamental statistical operations, so you don’t have to implement the underlying mathematical formulas yourself.
The statistics.stdev() function specifically calculates the sample standard deviation from an iterable, like the list of numbers shown in the example. This approach is ideal for simple datasets or when you want to avoid adding external libraries like NumPy or pandas for a straightforward task.
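The same module also provides a population counterpart, statistics.pstdev(). As a minimal sketch reusing the data above, the two can be compared side by side (the variable names are only illustrative):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation: squared deviations are divided by n - 1
sample_std = statistics.stdev(data)

# Population standard deviation: squared deviations are divided by n
population_std = statistics.pstdev(data)

print(f"Sample: {sample_std}")
print(f"Population: {population_std}")
```

Reach for stdev() when the list is a sample drawn from a larger group, and pstdev() when the list is the whole group.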
Standard deviation with popular libraries
For more complex datasets or performance-heavy tasks, you'll often turn to powerful libraries like NumPy and pandas or even implement the calculation from scratch.
Using numpy for standard deviation calculation
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = np.std(data)
print(f"NumPy Standard Deviation: {std_dev}")
# Output:
# NumPy Standard Deviation: 2.0
NumPy is a cornerstone for numerical computing in Python, offering highly optimized functions for large datasets. The np.std() function provides a fast and efficient way to compute standard deviation, especially when you're working with arrays.
It's important to note that np.std() calculates the population standard deviation by default. This is different from the statistics module, which finds the sample standard deviation. This distinction explains the different results you see and is a critical detail in statistical analysis.
Using pandas for standard deviation
import pandas as pd
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = pd.Series(data).std()
print(f"Pandas Standard Deviation: {std_dev}")
# Output:
# Pandas Standard Deviation: 2.138089935299395
Pandas is a go-to library for data manipulation, and it handles statistical calculations smoothly. The first step is converting your data into a pandas Series, which is a core, one-dimensional data structure in the library. Once you have a Series, you can call the .std() method directly on it.
- It's important to know that, by default, pandas calculates the sample standard deviation, aligning with the statistics module's behavior.
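The pandas default can be overridden. As a small sketch, Series.std() accepts a ddof argument, and passing ddof=0 should give the population figure instead (the variable names below are illustrative):

```python
import pandas as pd

series = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# Default behaviour: sample standard deviation (ddof=1)
sample_std = series.std()

# ddof=0 switches the divisor from n - 1 to n (population standard deviation)
population_std = series.std(ddof=0)

print(f"Sample: {sample_std}")
print(f"Population: {population_std}")
```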
Computing standard deviation from scratch
import math
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
std_dev = math.sqrt(variance)
print(f"Manual Standard Deviation: {std_dev}")
# Output:
# Manual Standard Deviation: 2.138089935299395
Implementing the standard deviation calculation from scratch gives you a clear view of the underlying mechanics. This manual approach breaks the statistical formula into three distinct steps, giving you full control over the process.
- First, you calculate the mean (average) by using sum(data) / len(data).
- Next, you find the variance. This is the sum of the squared differences from the mean, divided by the number of data points minus one (len(data) - 1). Dividing by len(data) - 1 rather than len(data) ensures you're calculating the sample standard deviation.
- Finally, you get the standard deviation by taking the square root of the variance with math.sqrt().
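The three steps can also be wrapped in a reusable function. The sketch below is one possible shape, not a canonical implementation: std_from_scratch is a hypothetical helper name, and its ddof argument is borrowed from NumPy's naming so the same function covers both the sample and the population case.

```python
import math


def std_from_scratch(values, ddof=1):
    """Standard deviation of values; ddof=1 for a sample, ddof=0 for a population."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    # Step 1: mean
    mean = sum(values) / n
    # Step 2: variance (sum of squared differences divided by n - ddof)
    variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
    # Step 3: square root of the variance
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_from_scratch(data))          # sample: divides by n - 1
print(std_from_scratch(data, ddof=0))  # population: divides by n
```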
Advanced standard deviation techniques
With the fundamentals down, you can now handle more nuanced tasks like adjusting calculations with the ddof parameter, processing 2D arrays, and using scipy.stats for robust analysis.
Adjusting degrees of freedom with ddof parameter
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
population_std = np.std(data, ddof=0) # Population standard deviation
sample_std = np.std(data, ddof=1) # Sample standard deviation
print(f"Population std: {population_std}\nSample std: {sample_std}")
# Output:
# Population std: 2.0
# Sample std: 2.138089935299395
The ddof parameter, short for "Delta Degrees of Freedom," gives you direct control over the divisor in the standard deviation formula. This is what allows you to switch between calculating for a population versus a sample.
- Using ddof=0, which is NumPy's default, calculates the population standard deviation by dividing by N (the total number of data points).
- Setting ddof=1 calculates the sample standard deviation by dividing by N-1. This makes NumPy's output match the behavior of the statistics module.
Calculating standard deviation for 2D arrays
import numpy as np
data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_std = np.std(data_2d, axis=1) # Standard deviation for each row
col_std = np.std(data_2d, axis=0) # Standard deviation for each column
print(f"Row std devs: {row_std}\nColumn std devs: {col_std}")
# Output:
# Row std devs: [0.81649658 0.81649658 0.81649658]
# Column std devs: [2.44948974 2.44948974 2.44948974]
NumPy's np.std() function isn't limited to one-dimensional data; it works seamlessly with 2D arrays. The key is the axis parameter, which lets you specify whether to calculate the standard deviation along rows or columns. This gives you granular control over your analysis.
- Setting axis=1 computes the standard deviation for each row individually.
- Using axis=0 computes the standard deviation for each column.
This is incredibly useful for analyzing structured data, like finding the variability within different features or samples in a dataset.
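One common use of per-column standard deviations is standardizing each feature to z-scores. The sketch below assumes columns are features and rows are samples, and uses ddof=1 as an illustrative choice. The keepdims=True argument keeps the results as 1 x 3 arrays so they broadcast cleanly against the original data.

```python
import numpy as np

data_2d = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

# Per-column mean and sample standard deviation, kept as 1 x 3 arrays
col_mean = data_2d.mean(axis=0, keepdims=True)
col_std = data_2d.std(axis=0, ddof=1, keepdims=True)

# Each value expressed as a number of standard deviations from its column mean
z_scores = (data_2d - col_mean) / col_std
print(z_scores)
```

A column with zero spread would cause a division by zero here, so real data may need a guard for constant columns.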
Using scipy.stats for statistical analysis
import math
from scipy import stats
data = [2, 4, 4, 4, 5, 5, 7, 9]
# describe() returns (nobs, minmax, mean, variance, skewness, kurtosis);
# the variance uses ddof=1 by default, so its square root is the sample standard deviation
mean, variance, skewness, kurtosis = stats.describe(data)[2:]
std_dev = math.sqrt(variance)
print(f"Standard Deviation: {std_dev}\nSkewness: {skewness}\nKurtosis: {kurtosis}")
# Output:
# Standard Deviation: 2.138089935299395
# Skewness: 0.65625
# Kurtosis: -0.21875
SciPy is your tool for deeper statistical dives. Instead of just one metric, the stats.describe() function returns a comprehensive summary of your dataset all at once.
- It efficiently calculates key statistics like mean, variance, skewness, and kurtosis.
- In this example, the standard deviation is then found by taking the square root of the variance returned by stats.describe().
This approach is perfect when you need a broader understanding of your data's characteristics beyond just its variability.
Common errors and challenges
When calculating standard deviation in Python, you might encounter a few common errors, but they're all straightforward to solve with the right approach.
Handling empty lists with statistics.stdev()
Passing an empty list to statistics.stdev() will trigger a StatisticsError. This happens because standard deviation measures spread, and you can't measure the spread of nothing. The calculation requires at least two data points to work correctly.
- To avoid this error, you can add a simple check in your code to ensure your list contains two or more items before you attempt the calculation.
Understanding the difference between np.std() and statistics.stdev()
It's a common point of confusion when np.std() and statistics.stdev() return different values for the same dataset. This isn't an error but a result of their different default assumptions about your data.
- The statistics module calculates the sample standard deviation, assuming your data is a subset of a larger group.
- NumPy, on the other hand, calculates the population standard deviation by default, treating your data as the entire group.
As mentioned earlier, you can make NumPy behave like the statistics module by setting the ddof=1 parameter.
Handling NaN values in standard deviation calculations
Missing data, often represented as NaN (Not a Number), can cause problems. How your calculation is affected depends entirely on the library you're using.
- The built-in statistics.stdev() function does not skip NaN values. A NaN in the input typically propagates through the calculation, so the result is NaN rather than a usable number.
- Libraries like NumPy and pandas are built to handle missing data more gracefully. NumPy offers a specific function, np.nanstd(), that computes the standard deviation while ignoring NaN values. Pandas methods typically ignore them by default.
If you expect missing values, using NumPy or pandas can save you the extra step of cleaning the data manually.
Handling empty lists with statistics.stdev()
A common pitfall is feeding an empty list into statistics.stdev(). Since standard deviation needs data to measure variability, an empty dataset logically causes a StatisticsError. The function requires at least two data points to produce a meaningful result. See this error in action in the code below.
import statistics
data = [] # Empty list
std_dev = statistics.stdev(data)
print(f"Standard Deviation: {std_dev}")
The error is triggered because the empty data list is passed directly to statistics.stdev(). You can keep the program from crashing by catching the exception, as the code below shows.
import statistics
data = [] # Empty list
try:
    std_dev = statistics.stdev(data)
    print(f"Standard Deviation: {std_dev}")
except statistics.StatisticsError as e:
    # The message says at least two data points are required; exact wording varies by Python version
    print(f"Error: {e}")
The fix is to wrap the statistics.stdev() call in a try...except block. This structure lets your program attempt the calculation and gracefully catch the statistics.StatisticsError if the list is empty. Instead of crashing, your code can execute a fallback, like printing a helpful message. It's a crucial safeguard anytime you're working with data that might be empty, such as results from a database query or user input.
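An alternative is the length check mentioned earlier, which avoids raising the exception in the first place. In the sketch below, safe_stdev is a hypothetical helper name, and returning None for too-short input is just one possible convention:

```python
import statistics


def safe_stdev(values):
    """Return the sample standard deviation, or None when there are fewer than two values."""
    if len(values) < 2:
        return None
    return statistics.stdev(values)


print(safe_stdev([]))                        # None: nothing to measure
print(safe_stdev([5]))                       # None: one point has no sample spread
print(safe_stdev([2, 4, 4, 4, 5, 5, 7, 9]))
```

The explicit check suits cases where short input is expected and ordinary. The try...except form suits cases where it signals something unusual.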
Understanding the difference between np.std() and statistics.stdev()
It’s a common point of confusion when np.std() and statistics.stdev() return different values for the same data. This isn't an error but a result of their different assumptions—one calculates population standard deviation, the other sample. The code below shows this in practice.
import numpy as np
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
np_std = np.std(data)
stats_std = statistics.stdev(data)
print(f"NumPy std: {np_std}, Statistics std: {stats_std}") # Different results
By calling both np.std() and statistics.stdev() on the same data, the code highlights their different default behaviors. See how to reconcile this difference in the code that follows.
import numpy as np
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
np_std = np.std(data, ddof=1) # Set degrees of freedom to 1
stats_std = statistics.stdev(data)
print(f"NumPy std: {np_std}, Statistics std: {stats_std}") # Same results
The fix is to align NumPy's calculation with the statistics module's behavior. By setting the ddof=1 parameter in np.std(), you're telling it to compute the sample standard deviation instead of its default population standard deviation. This makes both functions produce the same result.
- This adjustment is crucial when you need consistent statistical analysis across different libraries, especially when your data represents a sample of a larger population.
Handling NaN values in standard deviation calculations
Missing data, represented as NaN (Not a Number), can throw a wrench in your calculations. Depending on the library, these values can either be ignored or cause your entire function to return NaN. The code below shows what happens when NumPy encounters one.
import numpy as np
data = [2, 4, np.nan, 5, 7, 9] # Contains a NaN value
std_dev = np.std(data)
print(f"Standard Deviation: {std_dev}") # Results in NaN
The standard np.std() function includes the NaN value in its calculation, which causes the entire operation to return NaN. The code below shows how to get a valid result by ignoring missing values.
import numpy as np
data = [2, 4, np.nan, 5, 7, 9] # Contains a NaN value
std_dev = np.nanstd(data) # Use nanstd to ignore NaN values
print(f"Standard Deviation: {std_dev}") # Correct result
The fix is to use NumPy’s np.nanstd() function. It’s designed to compute the standard deviation while automatically ignoring any NaN values in your data, preventing a single missing value from invalidating your entire result.
This is crucial when working with real-world datasets—like sensor data or survey results—where missing information is common. Using np.nanstd() ensures your statistical analysis remains robust and accurate, even with incomplete data.
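If you would rather stay within the standard library, one option is to filter the NaN values out yourself before calling statistics.stdev(). The sketch below assumes missing values are float NaNs, and the name clean is illustrative:

```python
import math
import statistics

data = [2, 4, float("nan"), 5, 7, 9]  # Contains a NaN value

# Keep only real numbers; math.isnan() is True for NaN values
clean = [x for x in data if not math.isnan(x)]

std_dev = statistics.stdev(clean)
print(f"Standard Deviation (NaN removed): {std_dev}")
```

This gives the sample standard deviation of the remaining five values. np.nanstd() uses ddof=0 unless told otherwise, so the two figures differ unless you pass ddof=1 to NumPy.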
Real-world applications
Moving beyond the calculations, standard deviation helps solve practical problems, from setting fair grading curves to spotting unusual data points.
Setting grading thresholds with statistics.stdev()
Calculating the standard deviation of test scores lets you set objective grading thresholds, such as defining an 'A' grade as any score one standard deviation above the class mean.
import statistics
scores = [65, 72, 78, 80, 81, 85, 87, 88, 89, 90, 92, 95, 98]
mean = statistics.mean(scores)
std_dev = statistics.stdev(scores)
a_threshold = mean + std_dev
c_threshold = mean - std_dev
print(f"Mean: {mean:.1f}, Standard Deviation: {std_dev:.1f}")
print(f"A grade threshold (mean + 1σ): {a_threshold:.1f}")
print(f"C grade threshold (mean - 1σ): {c_threshold:.1f}")
This code uses the statistics module to analyze a list of scores. It first computes the average with statistics.mean() and the data's spread with statistics.stdev(). From there, it establishes two key benchmarks based on how far values are from the mean.
- The a_threshold is set to one standard deviation above the average.
- The c_threshold is set to one standard deviation below it.
This technique helps you identify a central range in your dataset. Scores falling between these two points are statistically typical for this group.
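To see which scores land in each range, the thresholds can drive a simple classifier. This is only a sketch: the band() helper and its three labels are hypothetical, and whether a score exactly on a threshold counts as inside or outside is an arbitrary choice made here.

```python
import statistics

scores = [65, 72, 78, 80, 81, 85, 87, 88, 89, 90, 92, 95, 98]
mean = statistics.mean(scores)
std_dev = statistics.stdev(scores)


def band(score):
    """Hypothetical three-way split around the mean."""
    if score >= mean + std_dev:
        return "above"
    if score <= mean - std_dev:
        return "below"
    return "typical"


# Group the scores by band, preserving their original order
bands = {}
for s in scores:
    bands.setdefault(band(s), []).append(s)

print(bands)
```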
Detecting outliers using the 3 * std_dev threshold
A common and effective method for spotting anomalies in your data is to flag any point that falls beyond a 3 * std_dev threshold from the mean.
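import numpy as np
# The last five readings are added for illustration so the sample is large enough
# for the 3 * std_dev rule to be able to flag a value
temperatures = [22.1, 22.5, 22.3, 22.7, 22.0, 23.1, 22.8, 35.2, 22.9, 23.0,
                22.4, 22.6, 22.2, 22.8, 22.5]
mean = np.mean(temperatures)
std_dev = np.std(temperatures)
outliers = [t for t in temperatures if abs(t - mean) > 3 * std_dev]
outlier_positions = [i for i, t in enumerate(temperatures) if abs(t - mean) > 3 * std_dev]
print(f"Temperature outliers detected: {outliers}")
print(f"Outlier positions in dataset: {outlier_positions}")
This code calculates the statistical center and spread of the temperatures data. It then establishes a boundary for what's considered normal using a common rule of thumb: three standard deviations from the mean.
- A list comprehension iterates through the data, flagging any temperature that falls outside this boundary.
- A second comprehension uses enumerate() to pinpoint the exact position of these anomalies within the original list.
This approach isolates unusual values like the 35.2 reading (position 7), giving you both the outlier and its location. Sample size matters here: with the population standard deviation that np.std() computes by default, no value among n readings can lie more than (n - 1) / sqrt(n) standard deviations from the mean. For ten or fewer readings that limit is below 3, so the rule could not flag anything, which is why the list holds fifteen readings.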
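The same rule can be packaged as a small reusable function that needs only the standard library. In this sketch, find_outliers and its k parameter are hypothetical names and the readings are made up for illustration. It uses statistics.pstdev() to mirror the population default of np.std().

```python
import statistics


def find_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k population std devs from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        return []  # All values identical: nothing can be an outlier
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) > k * std_dev]


readings = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 22, 20, 21, 19, 60]
print(find_outliers(readings))
```

Lowering k makes the check more sensitive. The right value depends on how noisy the data is and how costly a false alarm would be.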
Get started with Replit
Now, turn what you've learned into a real tool. Describe what you want to build to Replit Agent, like “a tool that calculates standard deviation for uploaded data” or “a dashboard that visualizes stock price volatility.”
The Agent writes the code, tests for errors, and deploys your application. You can focus on refining your idea instead of debugging. Start building with Replit.