How to calculate standard deviation in Python
Learn how to calculate standard deviation in Python. Explore different methods, real-world applications, and common errors to debug.

Standard deviation is a key statistical measure of data dispersion. Python offers powerful, built-in tools to calculate it, which simplifies complex analysis and provides deeper insights into your datasets.
Here, you'll learn several techniques to compute standard deviation. You will get practical tips, see real-world applications, and receive advice to debug common errors, so you can master this essential statistical skill.
Basic standard deviation with the statistics module
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = statistics.stdev(data)
print(f"Standard Deviation: {std_dev}")
# Output: Standard Deviation: 2.138089935299395
Python's built-in statistics module is your go-to for straightforward statistical operations. It's designed for simplicity and readability, making it perfect for quick analyses without needing external libraries like NumPy or SciPy.
The statistics.stdev() function calculates the sample standard deviation. This is an important distinction—it treats your data as a subset of a larger population. For most real-world data analysis, this is exactly what you need, as you're typically working with samples rather than complete datasets.
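If your list really is the whole population rather than a sample, the same module also provides statistics.pstdev(). A minimal sketch comparing the two on the list from the example above (the variable names are only for illustration):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation: divides the squared deviations by n - 1
sample_std = statistics.stdev(data)

# Population standard deviation: divides by n
population_std = statistics.pstdev(data)

print(f"Sample: {sample_std:.4f}")          # Sample: 2.1381
print(f"Population: {population_std:.4f}")  # Population: 2.0000
```

The sample figure is always the larger of the two, because it divides the same sum of squared deviations by a smaller number.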
Standard deviation with popular libraries
Beyond the built-in module, you can leverage robust libraries like NumPy and Pandas for advanced analysis or even compute the standard deviation from scratch.
Using numpy for standard deviation calculation
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = np.std(data)
print(f"NumPy Standard Deviation: {std_dev}")
# Output: NumPy Standard Deviation: 2.0
NumPy is a powerhouse for numerical work in Python. When you use np.std(), you're tapping into a highly optimized function. You might notice its result is different from the statistics module. This isn't an error; it's a difference in what's being calculated.
- NumPy's np.std() computes the population standard deviation by default, which treats your data as the entire group.
- To calculate the sample standard deviation instead, pass the argument ddof=1 to the function.
Using pandas for standard deviation
import pandas as pd
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = pd.Series(data).std()
print(f"Pandas Standard Deviation: {std_dev}")
# Output: Pandas Standard Deviation: 2.138089935299395
Pandas is essential for data analysis, and calculating standard deviation is straightforward. You first wrap your data in a pd.Series, a fundamental one-dimensional array structure in the library. From there, you can call the .std() method directly on the Series object, making it a seamless part of your data manipulation workflow.
- Like the statistics module, pandas calculates the sample standard deviation by default, which is why their results align. This is ideal for most analytical scenarios where your data represents a subset of a larger population.
Computing standard deviation from scratch
import math
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
std_dev = math.sqrt(variance)
print(f"Manual Standard Deviation: {std_dev}")
# Output: Manual Standard Deviation: 2.138089935299395
Building the calculation from scratch reveals the statistical logic at work. It’s a three-step process that you can implement with basic Python functions and the math module.
- First, you find the mean (average) by dividing the sum() of the data by its len().
- Next, you compute the sample variance by summing the squared differences from the mean and dividing by the number of data points minus one, len(data) - 1.
- Finally, the standard deviation is the square root of the variance, which you find using math.sqrt().
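The three steps above can be wrapped in a small reusable function. The sketch below is one possible way to do it, not a standard API: the function name std_dev and its ddof argument (borrowed from NumPy's naming) are assumptions made for illustration.

```python
import math
import statistics


def std_dev(values, ddof=1):
    """Standard deviation with a divisor of n - ddof (hypothetical helper)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(values) / n                                 # step 1: mean
    squared_diffs = sum((x - mean) ** 2 for x in values)   # step 2: squared differences
    return math.sqrt(squared_diffs / (n - ddof))           # step 3: square root of variance


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))          # sample version, divides by n - 1
print(std_dev(data, ddof=0))  # population version, divides by n
```

Checking the divisor up front means an empty or single-item list produces a clear ValueError instead of a ZeroDivisionError.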
Advanced standard deviation techniques
Beyond these foundational methods, you can fine-tune calculations with parameters like ddof, handle multidimensional data, and tap into advanced libraries for deeper statistical analysis.
Adjusting degrees of freedom with ddof parameter
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
population_std = np.std(data, ddof=0) # Population standard deviation
sample_std = np.std(data, ddof=1) # Sample standard deviation
print(f"Population std: {population_std}\nSample std: {sample_std}")
# Output:
# Population std: 2.0
# Sample std: 2.138089935299395
The ddof parameter in NumPy's np.std() function stands for "Delta Degrees of Freedom" and lets you control the divisor in the standard deviation calculation. This is how you switch between calculating for a population versus a sample.
- ddof=0: This is the default. It calculates the population standard deviation, dividing by the total number of data points.
- ddof=1: This calculates the sample standard deviation by dividing by the number of data points minus one. This adjustment, known as Bessel's correction, gives a less biased estimate of the spread when your data is a sample of a larger group.
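Because the two settings differ only in the divisor, the sample result is always the population result multiplied by sqrt(n / (n - 1)). The sketch below, using a hypothetical helper named correction_factor, shows how quickly that factor approaches 1 as the dataset grows:

```python
import math


def correction_factor(n):
    """Ratio of the ddof=1 result to the ddof=0 result for n data points."""
    return math.sqrt(n / (n - 1))


for n in (8, 30, 1000):
    print(f"n={n}: sample std = {correction_factor(n):.4f} x population std")

# For the eight-value list used in this article: 2.0 * sqrt(8 / 7)
print(2.0 * correction_factor(8))
```

For eight values the gap is roughly 7 percent, for thirty it is under 2 percent, and for a thousand it is negligible, so the choice of ddof matters most for small samples.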
Calculating standard deviation for 2D arrays
import numpy as np
data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_std = np.std(data_2d, axis=1) # Standard deviation for each row
col_std = np.std(data_2d, axis=0) # Standard deviation for each column
print(f"Row std devs: {row_std}\nColumn std devs: {col_std}")
# Output:
# Row std devs: [0.81649658 0.81649658 0.81649658]
# Column std devs: [2.44948974 2.44948974 2.44948974]
When working with multidimensional data like a 2D array, NumPy's np.std() becomes even more powerful. You can specify the direction of your calculation using the axis parameter, which is essential for structured datasets.
- axis=1 calculates the standard deviation for each row, analyzing the spread of values horizontally.
- axis=0 calculates it for each column, giving you the deviation vertically.
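If the axis numbering feels abstract, it can help to see the same grouping written out in plain Python. This is only an illustrative equivalent, not how NumPy works internally; it uses statistics.pstdev() because np.std() defaults to the population formula.

```python
import statistics

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Equivalent of axis=1: one result per row
row_std = [statistics.pstdev(row) for row in rows]

# Equivalent of axis=0: zip(*rows) regroups the values column by column
col_std = [statistics.pstdev(col) for col in zip(*rows)]

print(row_std)
print(col_std)
```

Each row holds consecutive numbers, so the row results are small, while each column steps by three, which is why the column results are larger.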
Using scipy.stats for statistical analysis
import math
from scipy import stats
data = [2, 4, 4, 4, 5, 5, 7, 9]
# describe() returns (nobs, minmax, mean, variance, skewness, kurtosis)
mean, variance, skewness, kurtosis = stats.describe(data)[2:]
std_dev = math.sqrt(variance)  # describe() uses ddof=1 for the variance by default
print(f"Standard Deviation: {std_dev}\nSkewness: {skewness}\nKurtosis: {kurtosis}")
# Output:
# Standard Deviation: 2.138089935299395
# Skewness: 0.65625
# Kurtosis: -0.21875
For a broader statistical picture, you can use SciPy. The stats.describe() function is a powerhouse, returning a full descriptive summary of your dataset in one go. It's perfect when you need more than just a single measure of dispersion.
- The function provides the variance, from which you can calculate the standard deviation using math.sqrt().
- You also get advanced metrics like skewness, which measures the data's asymmetry, and kurtosis, which describes how heavy the tails of its distribution are.
This makes SciPy an efficient choice for comprehensive data exploration.
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. It’s designed to help you take the statistical concepts from this article and build production-ready tools without getting bogged down in boilerplate code.
Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment. It can turn the standard deviation functions you've just learned into practical applications.
- Build a financial volatility dashboard that calculates and visualizes the standard deviation of stock prices using numpy.std().
- Create a quality control monitor that uses pandas.std() to track manufacturing consistency and flag deviations in real time.
- Deploy a statistical analysis tool that provides a full descriptive summary, including skewness and kurtosis, with scipy.stats.describe().
Describe your app idea to Replit Agent, and it writes the code, tests it, and fixes issues automatically, all in your browser.
Common errors and challenges
Even with powerful tools, you can run into a few common pitfalls when calculating standard deviation in Python; here’s how to navigate them.
- Handling empty lists with statistics.stdev(): If you try to calculate the standard deviation of an empty list using statistics.stdev(), Python will raise a StatisticsError. This happens because standard deviation measures dispersion, and you need at least two data points to measure any spread. Always check that your list contains more than one value before passing it to the function to avoid this error.
- Understanding the difference between np.std() and statistics.stdev(): A frequent point of confusion is the different results from NumPy's np.std() and the built-in statistics.stdev(). Remember that np.std() calculates the population standard deviation by default, while statistics.stdev() computes the sample standard deviation. To align their results, you can instruct NumPy to calculate the sample standard deviation by setting the ddof=1 parameter.
- Handling NaN values in standard deviation calculations: Missing data, often represented as NaN (Not a Number), can disrupt your calculations. NumPy’s np.std() will return NaN if any are present, but Pandas’ .std() method automatically ignores them. For NumPy, you can either filter out the NaN values beforehand or use the specialized np.nanstd() function, which is designed to perform the calculation while skipping over missing entries.
Handling empty lists with statistics.stdev()
Feeding an empty list to the statistics.stdev() function is a quick way to trigger a runtime error. Since the sample standard deviation needs at least two data points, the function raises a StatisticsError when it receives fewer. The following code demonstrates this with an empty list.
import statistics
data = [] # Empty list
std_dev = statistics.stdev(data)
print(f"Standard Deviation: {std_dev}")
The code defines an empty list and passes it directly to statistics.stdev(). Since the function requires data to measure spread, this input is invalid and triggers the error. The following example demonstrates how to avoid this crash.
import statistics
data = [] # Empty list
try:
    std_dev = statistics.stdev(data)
    print(f"Standard Deviation: {std_dev}")
except statistics.StatisticsError as e:
    # The message says at least two data points are required (exact wording varies by Python version)
    print(f"Error: {e}")
To prevent a crash, you can wrap the statistics.stdev() call in a try...except block. This approach attempts the calculation and, if a statistics.StatisticsError occurs, executes the code in the except block instead of halting the program. It’s a robust way to handle dynamic data—such as inputs from a user or results from a database query—which might occasionally be empty.
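An alternative to catching the exception is to check the length before calling the function, as suggested earlier. The helper below is a sketch under that assumption; the name safe_stdev and the choice to return None are illustrative, not part of the statistics module.

```python
import statistics


def safe_stdev(values):
    """Return the sample standard deviation, or None for fewer than two values."""
    if len(values) < 2:
        return None
    return statistics.stdev(values)


print(safe_stdev([]))            # None
print(safe_stdev([5]))           # None
print(safe_stdev([2, 4, 4, 4]))  # 1.0
```

Returning None forces the caller to decide what an undefined spread should mean, which is often clearer than substituting 0.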
Understanding the difference between np.std() and statistics.stdev()
It's a common source of confusion when NumPy's np.std() and the built-in statistics.stdev() return different values for the same data. This isn't an error but a fundamental difference in their default calculations. The code below demonstrates this discrepancy.
import numpy as np
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
np_std = np.std(data)
stats_std = statistics.stdev(data)
print(f"NumPy std: {np_std}, Statistics std: {stats_std}") # Different results
This code directly compares the outputs of np.std() and statistics.stdev() on the same list, revealing their different default results. See how to make NumPy's calculation match the statistics module in the following example.
import numpy as np
import statistics
data = [2, 4, 4, 4, 5, 5, 7, 9]
np_std = np.std(data, ddof=1) # Set degrees of freedom to 1
stats_std = statistics.stdev(data)
print(f"NumPy std: {np_std}, Statistics std: {stats_std}") # Same results
To align the results from np.std() and statistics.stdev(), tell NumPy to calculate the sample standard deviation by passing ddof=1. This matters when you're working with data samples and need consistent results across different Python libraries, so that you're always comparing apples to apples in your statistical work.
Handling NaN values in standard deviation calculations
Missing data, represented as NaN values, can poison your calculations. When you use NumPy's np.std() function on a dataset containing even one NaN, the entire result becomes NaN, which can silently break your analysis. The code below shows this in action.
import numpy as np
data = [2, 4, np.nan, 5, 7, 9] # Contains a NaN value
std_dev = np.std(data)
print(f"Standard Deviation: {std_dev}") # Results in NaN
This code intentionally includes a np.nan value. NaN is a valid floating-point value, but any arithmetic that involves it also yields NaN, so it propagates through the mean and the squared differences, and np.std() returns NaN without raising an error. The following example demonstrates how to correctly compute the standard deviation in this scenario.
import numpy as np
data = [2, 4, np.nan, 5, 7, 9] # Contains a NaN value
std_dev = np.nanstd(data) # Use nanstd to ignore NaN values
print(f"Standard Deviation: {std_dev}") # Correct result
NumPy provides a specialized function to solve this: np.nanstd(). This function computes the standard deviation while ignoring any NaN values in your dataset, which is useful for real-world data that often contains missing entries. Like np.std(), it returns the population standard deviation unless you pass ddof=1.
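The other option mentioned earlier, filtering out the NaN values beforehand, also works without NumPy. A minimal sketch using only the standard library (the variable names are illustrative):

```python
import math
import statistics

data = [2, 4, float("nan"), 5, 7, 9]

# Keep only real numbers; math.isnan() is True for NaN entries
clean = [x for x in data if not math.isnan(x)]

population_std = statistics.pstdev(clean)  # divides by n, like np.nanstd()'s default
sample_std = statistics.stdev(clean)       # divides by n - 1

print(f"Kept {len(clean)} of {len(data)} values")
print(f"Population: {population_std:.4f}, Sample: {sample_std:.4f}")
```

Printing how many values were dropped is a cheap way to notice when missing data makes up a large share of the input.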
Real-world applications
Now that you can sidestep common errors, you can confidently apply these calculations to solve real-world problems.
Setting grading thresholds with statistics.stdev()
Standard deviation offers a statistical method for setting grading thresholds, allowing you to define what constitutes an 'A' or 'C' based on the overall spread of scores.
import statistics
scores = [65, 72, 78, 80, 81, 85, 87, 88, 89, 90, 92, 95, 98]
mean = statistics.mean(scores)
std_dev = statistics.stdev(scores)
a_threshold = mean + std_dev
c_threshold = mean - std_dev
print(f"Mean: {mean:.1f}, Standard Deviation: {std_dev:.1f}")
print(f"A grade threshold (mean + 1σ): {a_threshold:.1f}")
print(f"C grade threshold (mean - 1σ): {c_threshold:.1f}")
This code shows a practical way to set data-driven grading cutoffs. It uses the statistics module to find the mean and std_dev of a list of scores. From there, it defines grade boundaries based on how far scores deviate from the average.
- The a_threshold is set one standard deviation above the mean.
- The c_threshold is set one standard deviation below it.
This approach makes the grading scale responsive to the class's actual performance, rather than using fixed percentages.
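Once the thresholds exist, turning them into letter grades is a short step. The sketch below assumes a simple three-band scheme; the function name assign_grade and the "B" label for the middle band are hypothetical, since the example above only defines the A and C cutoffs.

```python
import statistics

scores = [65, 72, 78, 80, 81, 85, 87, 88, 89, 90, 92, 95, 98]
mean = statistics.mean(scores)
std_dev = statistics.stdev(scores)


def assign_grade(score):
    """Hypothetical banding: A at or above mean + 1σ, C at or below mean - 1σ."""
    if score >= mean + std_dev:
        return "A"
    if score <= mean - std_dev:
        return "C"
    return "B"  # assumed label for everything in between


grades = {score: assign_grade(score) for score in scores}
print(grades)
```

Whether scores exactly on a threshold count as inside the band is a policy choice; here both boundaries are inclusive.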
Detecting outliers using the 3 * std_dev threshold
Standard deviation is also a powerful tool for outlier detection; a common rule of thumb is to flag any data point that is more than three standard deviations (3 * std_dev) away from the mean.
import numpy as np
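# Twenty readings: with ten or fewer, no value can exceed 3 population std devs
temperatures = [22.1, 22.5, 22.3, 22.7, 22.0, 23.1, 22.8, 35.2, 22.9, 23.0,
                22.4, 22.6, 22.2, 22.8, 22.5, 22.9, 22.3, 22.7, 22.6, 22.4]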
mean = np.mean(temperatures)
std_dev = np.std(temperatures)
outliers = [t for t in temperatures if abs(t - mean) > 3 * std_dev]
outlier_positions = [i for i, t in enumerate(temperatures) if abs(t - mean) > 3 * std_dev]
print(f"Temperature outliers detected: {outliers}")
print(f"Outlier positions in dataset: {outlier_positions}")
This script demonstrates a practical way to spot unusual data points. It leverages NumPy's np.mean() and np.std() functions to establish a baseline for a list of temperatures. The key logic is in the list comprehensions.
- The first one filters the temperatures list, keeping only values where the distance from the mean, found with abs(t - mean), exceeds the 3 * std_dev threshold.
- The second uses enumerate() to pinpoint the exact index of these outliers in the original dataset.
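This approach isolates values that sit unusually far from the rest of the data. The rule needs enough data to work, though: in a dataset of n values, no point can lie more than sqrt(n - 1) population standard deviations from the mean, so with ten or fewer readings the 3 * std_dev test can never fire. For small samples, collect more readings or lower the multiplier.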
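To see how the threshold interacts with sample size, here is a sketch that expresses the same rule as a z-score and makes the multiplier adjustable. The function name find_outliers and its k parameter are assumptions for illustration, and the readings are a ten-value sample with one obvious spike.

```python
import statistics


def find_outliers(values, k=3.0):
    """Return (index, value) pairs more than k population std devs from the mean."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        return []  # all values identical, nothing to flag
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / std_dev > k]


readings = [22.1, 22.5, 22.3, 22.7, 22.0, 23.1, 22.8, 35.2, 22.9, 23.0]
print(find_outliers(readings, k=3.0))  # [] (the spike's z-score is about 2.99)
print(find_outliers(readings, k=2.5))  # [(7, 35.2)]
```

The spike inflates the standard deviation it is measured against, which is why a single extreme value in a short list can slip under a strict threshold.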
Get started with Replit
Now, turn these concepts into a working tool. Describe your idea to Replit Agent, like "build a financial dashboard that visualizes stock price volatility" or "create a quality control monitor that flags outliers."
The agent writes the code, tests for errors, and deploys your application. Start building with Replit to turn your idea into a finished product.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.


