How to calculate Euclidean distance in Python

Learn how to calculate Euclidean distance in Python. Discover different methods, real-world applications, and tips for debugging common errors.

Published on: Tue, Mar 17, 2026
Updated on: Fri, Mar 20, 2026
The Replit Team

Euclidean distance measures the straight-line distance between two points. In Python, you can calculate this fundamental metric for data analysis, machine learning, and scientific computing applications.

In this article, we'll cover several techniques to compute this distance. You'll find practical tips, explore real-world applications, and get advice to debug your code effectively.

Basic calculation with the Pythagorean formula

import math

point1 = (1, 2)
point2 = (4, 6)
distance = math.sqrt((point2[0] - point1[0])**2 + (point2[1] - point1[1])**2)
print(f"Euclidean distance: {distance}")

Output:
Euclidean distance: 5.0

This approach translates the Pythagorean formula directly into Python code. You're essentially calculating the length of the hypotenuse of a right triangle formed by the two points.

  • First, the code finds the difference between the x-coordinates (point2[0] - point1[0]) and the y-coordinates.
  • Next, each difference is squared using the **2 operator.
  • Finally, the math.sqrt() function computes the square root of their sum, giving you the final distance.
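
The three steps above can also be written out one at a time. The following is a minimal sketch: the names dx and dy are illustrative and not part of the original example. It also cross-checks the result against math.hypot(), a standard-library function that computes the hypotenuse length directly.

```python
import math

point1 = (1, 2)
point2 = (4, 6)

# Step 1: differences along each axis
dx = point2[0] - point1[0]  # 3
dy = point2[1] - point1[1]  # 4

# Steps 2 and 3: square each difference, sum them, take the square root
distance = math.sqrt(dx**2 + dy**2)

# math.hypot() returns the same hypotenuse length in a single call
assert math.isclose(distance, math.hypot(dx, dy))
print(distance)  # 5.0
```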

Standard libraries for distance calculations

Moving beyond the manual approach, you'll find that widely used third-party libraries like NumPy and SciPy provide more concise ways to calculate the same distance, and they scale better when you work with large arrays of points.

Using numpy.linalg.norm() for efficient calculation

import numpy as np

point1 = np.array([1, 2])
point2 = np.array([4, 6])
distance = np.linalg.norm(point2 - point1)
print(f"Euclidean distance: {distance}")

Output:
Euclidean distance: 5.0

The NumPy library simplifies this calculation by treating points as vectors. First, you convert your coordinate tuples into NumPy arrays using np.array(). Subtracting one array from the other—point2 - point1—creates a new vector representing the displacement between the points.

  • The np.linalg.norm() function then calculates the length of this displacement vector.
  • By default, for a 1-D array it computes the L2 norm (also called the Euclidean norm), so the norm of the displacement vector is the Euclidean distance.
  • This approach is cleaner and more efficient, especially when you're working with higher-dimensional data.
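
To see that np.linalg.norm() really is the square root of the sum of squares, the sketch below compares the two on a pair of 4-dimensional points. The point values are made up for illustration.

```python
import numpy as np

# Two points in 4-dimensional space (values chosen for illustration)
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])

diff = b - a                             # displacement vector [4, 4, 4, 4]
by_norm = np.linalg.norm(diff)           # default 2-norm for a 1-D array
by_formula = np.sqrt(np.sum(diff ** 2))  # explicit square root of the sum of squares

print(by_norm, by_formula)
assert np.isclose(by_norm, by_formula)
```

The same call works unchanged for any number of dimensions, which is why it is convenient for feature vectors.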

Using scipy.spatial.distance.euclidean() function

from scipy.spatial import distance

point1 = (1, 2)
point2 = (4, 6)
dist = distance.euclidean(point1, point2)
print(f"Euclidean distance: {dist}")

Output:
Euclidean distance: 5.0

SciPy provides a specialized function for this task within its scipy.spatial module. The distance.euclidean() function is highly readable and communicates its purpose clearly, making your code easy to understand.

  • It works directly with list-like data structures, such as tuples, so you don't need to convert your points into a specific array type first.

This approach is an excellent choice when you need a straightforward and explicit method for calculating Euclidean distance, especially if you're already using SciPy for other scientific computing tasks.

Using math.sqrt() and sum() for multi-dimensional points

import math

point1 = (1, 2, 3)
point2 = (4, 6, 8)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))
print(f"Euclidean distance in 3D: {distance}")

Output:
Euclidean distance in 3D: 7.0710678118654755

This method cleverly extends the Pythagorean formula to handle points with any number of dimensions. It’s a flexible approach that works for 2D, 3D, or even higher-dimensional spaces without needing external libraries.

  • The zip() function first pairs up the corresponding coordinates from each point.
  • A generator expression, (a - b) ** 2 for a, b in zip(...), calculates the squared difference for each dimension.
  • The sum() function adds all these squared differences together.
  • Finally, math.sqrt() computes the square root of the total sum.
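
If you are on Python 3.8 or later, the standard library also offers math.dist(), which takes two points of equal dimension and returns the Euclidean distance between them. The sketch below simply checks that it agrees with the manual formula.

```python
import math

point1 = (1, 2, 3)
point2 = (4, 6, 8)

# Manual formula from this section
manual = math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))

# Built-in equivalent, available in Python 3.8 and later
builtin = math.dist(point1, point2)

print(builtin)
assert math.isclose(manual, builtin)
```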

Advanced Euclidean distance techniques

With the basics covered, you're ready to tackle more complex scenarios by creating reusable functions, optimizing calculations for multiple points, and implementing weighted distance.

Creating a reusable Euclidean distance function

import math

def euclidean_distance(point1, point2):
    return math.sqrt(sum([(a - b) ** 2 for a, b in zip(point1, point2)]))

print(euclidean_distance([1, 2, 3], [4, 6, 8]))
print(euclidean_distance([1, 2, 3, 4], [5, 6, 7, 8]))

Output:
7.0710678118654755
8.0

Encapsulating the logic in a function like euclidean_distance makes your code much cleaner and more reusable. Instead of rewriting the formula each time, you can simply call this function with different sets of points. It’s a practical way to abstract away the complexity.

  • This approach is flexible: the function works for points of any dimension. Both inputs must have the same number of coordinates, though, because zip() stops at the shorter input and silently ignores any extra coordinates.
  • You can pass lists or tuples directly, making it easy to integrate into your projects without extra data conversion.
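
Because zip() stops at the shorter input, a call with mismatched points would return a number without any warning. One way to guard against that is an explicit length check. The function name euclidean_distance_checked below is hypothetical, and this is only a sketch of one possible safeguard.

```python
import math

def euclidean_distance_checked(point1, point2):
    """Hypothetical variant of euclidean_distance that rejects mismatched lengths."""
    if len(point1) != len(point2):
        raise ValueError(
            f"Points must have the same number of coordinates, "
            f"got {len(point1)} and {len(point2)}"
        )
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))

print(euclidean_distance_checked([1, 2, 3], [4, 6, 8]))

try:
    euclidean_distance_checked([1, 2, 3], [4, 6])
except ValueError as exc:
    print(f"Rejected: {exc}")
```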

Vectorized calculation for multiple points

import numpy as np

points1 = np.array([[1, 2], [3, 4], [5, 6]])
points2 = np.array([[7, 8], [9, 10], [11, 12]])
distances = np.linalg.norm(points2 - points1, axis=1)
print(distances)

Output:
[8.48528137 8.48528137 8.48528137]

When you're working with multiple pairs of points, NumPy's vectorized operations are incredibly efficient. Instead of looping, you can subtract the entire arrays—points2 - points1—in a single operation. This instantly gives you an array of displacement vectors, one for each pair of points.

  • The key is the axis=1 argument. It instructs np.linalg.norm() to perform the calculation along each row.
  • This means you get the distance for each point pair individually, all in one go, which is much faster than processing them one by one.
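
The example above pairs the points row by row. If you instead need the distance from every point in one set to every point in another, broadcasting can produce the full distance matrix. The sketch below assumes two small illustrative arrays; the name all_pairs is not from the original.

```python
import numpy as np

points1 = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
points2 = np.array([[7, 8], [9, 10]])         # shape (2, 2)

# Adding new axes makes the subtraction broadcast to shape (3, 2, 2):
# diffs[i, j] is the displacement from points1[i] to points2[j]
diffs = points2[np.newaxis, :, :] - points1[:, np.newaxis, :]

# Taking the norm over the last axis collapses each displacement to a distance
all_pairs = np.linalg.norm(diffs, axis=2)     # shape (3, 2)

print(all_pairs.shape)
print(all_pairs)
```

Each row of all_pairs holds the distances from one point in points1 to all points in points2. Memory use grows with the product of the two set sizes, so this suits moderately sized inputs.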

Implementing weighted Euclidean distance

import numpy as np

def weighted_euclidean(point1, point2, weights):
    return np.sqrt(np.sum(weights * (point2 - point1) ** 2))

p1 = np.array([1, 2, 3])
p2 = np.array([4, 6, 8])
weights = np.array([0.5, 1.0, 2.0])  # Different importance per dimension
print(f"Weighted distance: {weighted_euclidean(p1, p2, weights):.4f}")

Output:
Weighted distance: 8.3964

Sometimes, not all dimensions are equally important. Weighted Euclidean distance lets you assign more or less significance to certain coordinates using a weights array. It's a useful variation when some features in your data matter more than others.

  • The calculation starts by finding the squared difference for each dimension, but then it multiplies each result by its corresponding weight.
  • A weight greater than 1.0 increases a dimension's influence on the total distance, while a weight less than 1.0 diminishes it.
  • Finally, np.sum() adds these adjusted values before np.sqrt() computes the final distance.
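
Two quick sanity checks help confirm how the weights behave. With all weights set to 1.0 the result should match the plain Euclidean distance, and a weight of 0.0 should remove a dimension entirely. The sketch below reuses the weighted_euclidean function from this section.

```python
import numpy as np

def weighted_euclidean(point1, point2, weights):
    return np.sqrt(np.sum(weights * (point2 - point1) ** 2))

p1 = np.array([1, 2, 3])
p2 = np.array([4, 6, 8])

# Uniform weights reduce to the ordinary Euclidean distance
uniform = weighted_euclidean(p1, p2, np.ones(3))
plain = np.linalg.norm(p2 - p1)
assert np.isclose(uniform, plain)

# A zero weight drops the third dimension, leaving sqrt(3**2 + 4**2)
ignore_last = weighted_euclidean(p1, p2, np.array([1.0, 1.0, 0.0]))
print(ignore_last)  # 5.0
```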

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.

For the distance calculation techniques we've explored, Replit Agent can turn them into production-ready tools:

  • Build a nearest neighbor finder that identifies the closest data points in a machine learning model. This could use the vectorized np.linalg.norm() for speed.
  • Create a geographic distance utility that calculates the straight-line distance between two sets of coordinates.
  • Deploy a simple recommendation engine that suggests similar items by measuring the distance between their feature vectors.

Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.

Common errors and challenges

Even with powerful libraries, you might run into issues like type mismatches, division by zero, or missing data when calculating distances.

A common hurdle with numpy.linalg.norm() is the TypeError. This usually happens when your input data isn't purely numerical. For instance, if you read data from a file and a number is accidentally stored as a string, NumPy won't be able to perform the mathematical operations.

To fix this, you need to ensure your arrays contain only numbers like integers or floats before you pass them to the function. You can use the .astype() method on your NumPy array to explicitly convert the data to a numerical type.
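
As a minimal sketch of that conversion, assume the coordinates were read from a file and arrived as strings. The variable names raw_point1 and raw_point2 are illustrative.

```python
import numpy as np

# Coordinates that were accidentally stored as strings
raw_point1 = np.array(["1", "2", "3"])
raw_point2 = np.array(["4", "6", "8"])

# Convert to a numerical dtype before doing any arithmetic
point1 = raw_point1.astype(float)
point2 = raw_point2.astype(float)

distance = np.linalg.norm(point2 - point1)
print(distance)
```

If a value can't be parsed as a number, .astype(float) raises a ValueError, which at least surfaces the bad data early.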

Vector normalization is a related task where you divide a vector by its magnitude to get a unit vector. A zero vector, which is what you get when you subtract two identical points, has a magnitude of zero. Dividing a NumPy array by that zero doesn't raise an exception: NumPy emits a RuntimeWarning and fills the result with nan values. (Plain Python numbers would raise a ZeroDivisionError instead.)

It’s good practice to add a check before you normalize. You can calculate the norm first and, if it's zero, decide how to handle it. Depending on your goal, you might return the zero vector as is or flag it as a special case.
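
One way to write that check is sketched below. The helper name normalize is hypothetical, and returning the zero vector unchanged is just one of the possible policies mentioned above.

```python
import numpy as np

def normalize(vector):
    """Hypothetical helper: return a unit vector, or the zero vector unchanged."""
    vector = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(vector)
    if norm == 0:
        return vector  # special case: a zero vector has no direction
    return vector / norm

print(normalize([3, 4]))  # [0.6 0.8]
print(normalize([0, 0]))  # [0. 0.]
```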

Real-world data is messy, and you'll often find missing values, represented in NumPy as np.nan. When you try to calculate the distance with data containing np.nan, the result will also be np.nan, which can silently disrupt your analysis.

You have a few options for dealing with this:

  • You can filter out any points that contain missing values, though this might mean losing valuable data.
  • Another approach is imputation, where you replace the np.nan with a calculated value, such as the mean or median of that feature across all your data points.

Functions like np.isnan() are useful for finding these missing values so you can handle them before the distance calculation.
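
Both options can be sketched in a few lines. The data array below is made up for illustration, with one missing value in the second row.

```python
import numpy as np

data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
])

# Option 1: keep only the rows that contain no missing values
complete_rows = data[~np.isnan(data).any(axis=1)]

# Option 2: replace each NaN with the mean of its column, ignoring NaNs
column_means = np.nanmean(data, axis=0)
imputed = np.where(np.isnan(data), column_means, data)

print(complete_rows)
print(imputed)
```

Here the missing value is replaced by 5.0, the mean of 2.0 and 8.0. Which option is appropriate depends on how much data you can afford to lose.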

Fixing type errors when calculating with numpy.linalg.norm()

A TypeError is a frequent stumbling block with numpy.linalg.norm(), often caused by passing standard Python sequences instead of arrays. Lists and tuples don't support the subtraction operator (-) at all, so the expression fails before np.linalg.norm() is even called. The code below demonstrates this common error.

import numpy as np

point1 = [1, 2, 3] # List
point2 = (4, 6, 8) # Tuple
distance = np.linalg.norm(point2 - point1) # TypeError: unsupported operand type(s)
print(f"Euclidean distance: {distance}")

The code fails because the subtraction operator (-) isn't defined for Python lists or tuples, whether you mix them or not. Element-wise math only works on NumPy's own array type. The corrected implementation resolves this incompatibility.

import numpy as np

point1 = np.array([1, 2, 3]) # Convert to numpy array
point2 = np.array([4, 6, 8]) # Convert to numpy array
distance = np.linalg.norm(point2 - point1)
print(f"Euclidean distance: {distance}")

The solution is to convert both points into NumPy arrays using np.array(), which enables NumPy's element-wise subtraction. Keep an eye out for this issue when you're combining data from different sources, and always ensure your inputs are NumPy arrays before performing vectorized calculations.
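
If your inputs may arrive as lists, tuples, or arrays, one option is to do the conversion inside a small wrapper so callers don't have to remember it. The function name euclidean below is hypothetical; np.asarray() leaves existing arrays of the right dtype as they are and converts everything else.

```python
import numpy as np

def euclidean(point1, point2):
    """Hypothetical wrapper that accepts lists, tuples, or NumPy arrays."""
    a = np.asarray(point1, dtype=float)
    b = np.asarray(point2, dtype=float)
    return np.linalg.norm(b - a)

# A list and a tuple now work together
print(euclidean([1, 2, 3], (4, 6, 8)))
```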

Avoiding division by zero when normalizing vectors

Vector normalization involves dividing a vector by its magnitude. If you try to normalize a zero vector, like [0, 0], the magnitude is zero and NumPy ends up dividing 0 by 0. Rather than raising a ZeroDivisionError, NumPy emits a RuntimeWarning and returns nan values. This is a common issue when processing datasets with null entries.

The following code demonstrates what happens when you attempt this operation on an array containing a zero vector without any safeguards.

import numpy as np

points = np.array([[1, 2], [0, 0], [5, 6]])
distances = np.linalg.norm(points, axis=1)
normalized = points / distances[:, np.newaxis]  # RuntimeWarning: the [0, 0] row becomes [nan, nan]
print(normalized)

The distances array includes a zero for the [0, 0] vector. The broadcasting operation points / distances[:, np.newaxis] then divides by that zero, which triggers a RuntimeWarning and leaves nan values in that row of the result. The following implementation shows how to handle this case.

import numpy as np

points = np.array([[1, 2], [0, 0], [5, 6]])
distances = np.linalg.norm(points, axis=1)
# Add small epsilon to avoid division by zero
normalized = points / (distances[:, np.newaxis] + 1e-10)
print(normalized)

The solution is a common numerical stability trick. By adding a tiny value, or epsilon, to the denominator with distances[:, np.newaxis] + 1e-10, you ensure the division never encounters a true zero. The zero vector then stays [0, 0] instead of turning into nan values, and the impact on non-zero vectors is negligible. It’s a practical way to handle datasets that might contain zero vectors, which can arise from duplicate data points or null entries.
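
An alternative to the epsilon trick is to skip the division for zero-length rows altogether. The sketch below uses np.divide() with its out and where parameters, so non-zero rows are normalized exactly and zero rows keep the zeros they were initialized with. The name norms is illustrative.

```python
import numpy as np

points = np.array([[1, 2], [0, 0], [5, 6]], dtype=float)
norms = np.linalg.norm(points, axis=1, keepdims=True)  # shape (3, 1)

# Divide only where the norm is non-zero; other entries keep the
# zeros that the `out` array was initialized with
normalized = np.divide(
    points, norms, out=np.zeros_like(points), where=norms != 0
)

print(normalized)
```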

Handling missing values with np.nan in distance calculations

Since any calculation involving np.nan results in np.nan, your distance function will return a non-numerical value if even one coordinate is missing. This can silently break your analysis downstream. The following code shows how this plays out with np.linalg.norm().

import numpy as np

point1 = np.array([1, 2, np.nan])
point2 = np.array([4, 5, 6])
distance = np.linalg.norm(point2 - point1) # Returns NaN
print(f"Euclidean distance: {distance}")

The subtraction operation propagates the np.nan from point1 into the resulting vector. Consequently, np.linalg.norm() can't compute a numerical distance and returns nan. The following implementation shows one way to address this.

import numpy as np

point1 = np.array([1, 2, np.nan])
point2 = np.array([4, 5, 6])
valid_indices = ~np.isnan(point1) & ~np.isnan(point2)
distance = np.linalg.norm(point2[valid_indices] - point1[valid_indices])
print(f"Euclidean distance (ignoring NaNs): {distance}")

This solution calculates the distance using only the dimensions where both points have valid data. It first identifies indices without np.nan values using ~np.isnan(). The & operator then creates a final mask for indices valid in both arrays. By indexing the arrays with this mask, as in point1[valid_indices], you ensure np.linalg.norm() only operates on complete data pairs. This lets you handle incomplete data from real-world sources without discarding entire data points. Keep in mind that a distance computed over fewer dimensions isn't directly comparable with one computed over all of them.

Real-world applications

With the technical challenges solved, you can now use these distance calculations to tackle practical problems in e-commerce and data analysis.

Finding similar products with euclidean_distance()

By treating product features like price and rating as coordinates, you can use Euclidean distance to find the most similar item for a customer.

import numpy as np

# Product features: [price, rating, popularity]
products = np.array([
    [59.99, 4.2, 85],  # Product 1
    [29.99, 3.8, 70],  # Product 2
    [49.99, 4.5, 90],  # Product 3
])
user_preference = np.array([45.00, 4.0, 80])

# Find most similar product
distances = np.linalg.norm(products - user_preference, axis=1)
most_similar = np.argmin(distances)
print(f"Most similar product: {most_similar+1} with distance: {distances[most_similar]:.2f}")

This example uses NumPy's vectorized operations to efficiently find the best match from a list of products based on a user's preference. It treats each product and the user's preference as arrays of numerical features.

  • The np.linalg.norm() function calculates the distance between the user's preference and all products at once.
  • Using axis=1 is key, as it ensures the distance is computed for each product row individually.
  • Finally, np.argmin() returns the index of the product with the smallest distance, identifying the closest match.
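
If you want a ranked list rather than a single best match, np.argsort() returns the product indices ordered from closest to farthest. The sketch below reuses the same data; the name ranking is illustrative. Note that the features are on different scales, so price and popularity contribute far more to these distances than the rating does.

```python
import numpy as np

# Product features: [price, rating, popularity]
products = np.array([
    [59.99, 4.2, 85],  # Product 1
    [29.99, 3.8, 70],  # Product 2
    [49.99, 4.5, 90],  # Product 3
])
user_preference = np.array([45.00, 4.0, 80])

distances = np.linalg.norm(products - user_preference, axis=1)
ranking = np.argsort(distances)  # indices from closest to farthest

for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}. Product {idx + 1} (distance {distances[idx]:.2f})")
```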

Clustering customer data with k-means using Euclidean distance

The k-means algorithm leverages Euclidean distance to partition customer data into distinct clusters, which helps you identify patterns in their behavior.

from sklearn.cluster import KMeans
import numpy as np

# Customer data: [annual_income($K), spending_score(1-100)]
customer_data = np.array([
    [45, 85], [50, 88], [35, 35], [75, 40],
    [60, 30], [80, 70], [30, 30], [65, 90]
])

kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(customer_data)
print(f"Customer clusters: {clusters}")
print(f"Cluster centers: \n{kmeans.cluster_centers_}")

This code uses scikit-learn's KMeans algorithm to group customer data based on income and spending scores. The model is configured to find three distinct groups by setting n_clusters=3, and random_state=42 ensures the results are reproducible.

  • The fit_predict() method does all the work. It trains the model on your customer_data and assigns each customer to a cluster in a single step.
  • The output shows these assignments as an array. It also reveals the final cluster centers, which represent the average income and spending score for each group.
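
To make the role of Euclidean distance concrete, the sketch below performs a single k-means iteration by hand with NumPy: assign each customer to the nearest centroid, then move each centroid to the mean of its group. This is a simplified illustration, not scikit-learn's implementation, and the starting centroids are hypothetical values picked by hand.

```python
import numpy as np

customer_data = np.array([
    [45, 85], [50, 88], [35, 35], [75, 40],
    [60, 30], [80, 70], [30, 30], [65, 90]
], dtype=float)

# Hypothetical starting centroids, chosen by hand for illustration
centroids = np.array([[50.0, 88.0], [35.0, 35.0], [75.0, 40.0]])

# distances[i, j] = Euclidean distance from customer i to centroid j
distances = np.linalg.norm(
    customer_data[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2
)

# Assignment step: each customer joins the cluster of its nearest centroid
labels = np.argmin(distances, axis=1)

# Update step: move each centroid to the mean of its assigned customers
new_centroids = np.array([
    customer_data[labels == k].mean(axis=0) for k in range(len(centroids))
])

print(labels)
print(new_centroids)
```

A full run repeats these two steps until the assignments stop changing.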

Get started with Replit

Put these distance calculations to work. Tell Replit Agent to “build a tool that finds the closest color in a palette” or “create a utility that suggests similar items based on feature distance.”

The agent writes the code, tests for errors, and deploys the app for you. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.