How to differentiate in Python

Learn how to differentiate in Python. Discover different methods, tips, real-world applications, and how to debug common errors.

Published on: Wed, Mar 25, 2026
Updated on: Thu, Mar 26, 2026
The Replit Team

Differentiation is a core calculus concept, essential for tasks like optimization in machine learning. Python libraries make it straightforward to compute derivatives and avoid complex manual calculations.

You will explore several techniques to perform differentiation and get practical implementation tips for your own projects. You'll also see real-world applications and receive specific, actionable advice to debug your code for successful integration.

Basic numerical differentiation with NumPy

import numpy as np

def numerical_derivative(f, x, h=1e-5):
    return (f(x + h) - f(x)) / h

# Example with f(x) = x^2
f = lambda x: x**2
print(f"Derivative of x^2 at x=3: {numerical_derivative(f, 3)}")

Output:
Derivative of x^2 at x=3: 6.00010000002468

The numerical_derivative function approximates a derivative using the finite difference method. This approach mirrors the foundational definition of a derivative from calculus, which measures the rate of change over an infinitesimally small interval.

Here’s how it works:

  • The parameter h represents this small interval, set to a tiny value like 1e-5.
  • The function calculates the slope between two very close points: f(x) and f(x + h).

Notice the output for x^2 at x=3 isn't exactly 6. This highlights that numerical methods provide an approximation, not an exact symbolic result. The precision depends heavily on the choice of h.
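
To see this sensitivity directly, you can sweep h and compare the error against the known derivative. Here is a minimal sketch reusing the function above:

```python
def numerical_derivative(f, x, h=1e-5):
    return (f(x + h) - f(x)) / h

f = lambda x: x**2  # exact derivative at x=3 is 6

# The forward-difference error shrinks roughly in proportion to h
for h in (1e-1, 1e-3, 1e-5):
    approx = numerical_derivative(f, 3, h)
    print(f"h={h:g}: approx={approx:.8f}, error={abs(approx - 6):.2e}")
```

Each factor-of-100 reduction in h cuts the error by roughly the same factor, until floating-point round-off eventually takes over for very small h.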

Common differentiation methods

Beyond the basic finite difference method, you can use more robust techniques to achieve greater accuracy or find exact analytical derivatives.

Using sympy for symbolic differentiation

import sympy as sp

x = sp.Symbol('x')
expr = x**2 + 3*x + 2
derivative = sp.diff(expr, x)
print(f"Expression: {expr}")
print(f"Derivative: {derivative}")
print(f"Derivative at x=3: {derivative.subs(x, 3)}")

Output:
Expression: x**2 + 3*x + 2
Derivative: 2*x + 3
Derivative at x=3: 9

For exact analytical results, you can use sympy. This library performs symbolic differentiation, which means it works with mathematical expressions directly rather than numerical approximations.

  • First, you define your variable as a symbol with sp.Symbol('x').
  • Then, the sp.diff() function computes the exact derivative formula—in this case, 2*x + 3.
  • Finally, you can evaluate this new expression at any point using subs(), which substitutes a numerical value for the symbol.

This approach gives you the precise answer of 9, avoiding the small errors inherent in numerical methods.
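
If you need to evaluate a symbolic derivative at many points, sympy's lambdify can compile it into a plain numerical function. A small sketch:

```python
import sympy as sp

x = sp.Symbol('x')
derivative = sp.diff(x**2 + 3*x + 2, x)  # 2*x + 3

# lambdify turns the symbolic expression into an ordinary Python function,
# which is much faster than calling subs() in a loop
df = sp.lambdify(x, derivative)
print(df(3))   # 9
print(df(10))  # 23
```

This pattern combines the best of both worlds: an exact symbolic derivative, evaluated at numerical speed.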

Using central difference approximation

def central_diff(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x**2 + 3*x + 2

print(f"Derivative at x=3: {central_diff(f, 3)}")

Output:
Derivative at x=3: 9.000000000000666

The central difference method offers a more accurate approximation than the basic finite difference approach. It achieves this by calculating the slope between two points that are symmetric around x. This balanced approach often cancels out errors more effectively, leading to a better result.

  • The function evaluates the expression at both x + h and x - h.
  • It then divides the difference by 2 * h.

As you can see from the output, this method yields a result that's much closer to the exact derivative of 9. It's a great choice when you need higher precision without switching to symbolic math.
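
A quick way to see the difference is to compare both methods against a known derivative. The sketch below uses sin(x), whose exact derivative is cos(x):

```python
import math

def forward_diff(f, x, h=1e-3):
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-3):
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)  # derivative of sin at x=1

# With the same h, the central difference is far more accurate
fwd_err = abs(forward_diff(math.sin, 1.0) - exact)
cen_err = abs(central_diff(math.sin, 1.0) - exact)
print(f"forward error: {fwd_err:.2e}, central error: {cen_err:.2e}")
```

The forward difference has error proportional to h, while the central difference's error scales with h squared, which is why it wins at the same step size.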

Using autograd for automatic differentiation

import autograd.numpy as np
from autograd import grad

def f(x):
    return x**2 + 3*x + 2

df = grad(f)
print(f"Derivative at x=3: {df(3.0)}")
print(f"Derivative at x=5: {df(5.0)}")

Output:
Derivative at x=3: 9.0
Derivative at x=5: 13.0

Automatic differentiation with autograd offers a powerful middle ground. It's more precise than numerical methods and often more efficient than symbolic computation—making it a go-to for machine learning.

  • The grad() function takes your original function, f, and returns a new function, df, that calculates its derivative.
  • You can then call df with any number to get the exact derivative at that point, as shown with df(3.0).
  • It's important to use autograd's wrapped version of NumPy so it can track all the operations needed for differentiation.

Advanced differentiation techniques

Once you've mastered the basics, you can extend these techniques to compute higher-order and partial derivatives or use powerful libraries like torch for gradients.

Computing higher-order derivatives

import sympy as sp

x = sp.Symbol('x')
f = sp.sin(x) + sp.exp(x)
df = sp.diff(f, x)
ddf = sp.diff(df, x)
print(f"Function: {f}")
print(f"First derivative: {df}")
print(f"Second derivative: {ddf}")

Output:
Function: sin(x) + exp(x)
First derivative: cos(x) + exp(x)
Second derivative: -sin(x) + exp(x)

You can also compute higher-order derivatives, which are simply derivatives of derivatives. With sympy, this process is straightforward. You just apply the sp.diff() function sequentially to get the result you need.

  • First, you calculate the initial derivative of your function, which the code saves as df.
  • Then, you differentiate that result again to find the second derivative, ddf.

This method isn't limited to the second derivative; you can repeat it to find the third, fourth, and so on.
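
As a shortcut, sp.diff also accepts the derivative order as an extra argument, so you can skip the intermediate steps. For example:

```python
import sympy as sp

x = sp.Symbol('x')
f = sp.sin(x) + sp.exp(x)

# Passing the order directly computes the third derivative in one call
third = sp.diff(f, x, 3)
print(third)
```

This returns exp(x) - cos(x), the same result you would get by chaining three separate sp.diff calls.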

Computing partial derivatives

import sympy as sp

x, y = sp.symbols('x y')
f = 2*x**2 + 3*x*y + y**2
df_dx = sp.diff(f, x)
df_dy = sp.diff(f, y)
print(f"Function: {f}")
print(f"∂f/∂x: {df_dx}")
print(f"∂f/∂y: {df_dy}")

Output:
Function: 2*x**2 + 3*x*y + y**2
∂f/∂x: 4*x + 3*y
∂f/∂y: 3*x + 2*y

For functions with multiple variables, you can compute partial derivatives. It's a key concept in multivariable calculus, and sympy makes it simple. You start by defining all your variables, for example, with sp.symbols('x y').

  • To find the partial derivative with respect to x, you use sp.diff(f, x). This treats y as a constant during the calculation.
  • Similarly, sp.diff(f, y) differentiates with respect to y while holding x constant.

This lets you isolate how the function changes as each variable shifts independently.
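
The partials can also be collected into a gradient vector and evaluated at a specific point with subs(). A small sketch (the point (1, 2) is just an illustrative choice):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 2*x**2 + 3*x*y + y**2

# Build the gradient as a list of partial derivatives, then evaluate it
gradient = [sp.diff(f, v) for v in (x, y)]
point = {x: 1, y: 2}
values = [g.subs(point) for g in gradient]
print(gradient)
print(values)
```

At (1, 2) this yields the gradient [10, 7], which is exactly the kind of quantity optimization algorithms like gradient descent consume.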

Using torch for gradient computation

import torch

x = torch.tensor([3.0], requires_grad=True)
y = x**2 + 3*x + 2
y.backward()
print(f"Function value at x=3: {y.item()}")
print(f"Derivative at x=3: {x.grad.item()}")

Output:
Function value at x=3: 20.0
Derivative at x=3: 9.0

PyTorch, a go-to library for deep learning, excels at computing gradients using automatic differentiation. This method is highly efficient, especially for the complex functions found in AI models. It works by tracking operations to calculate derivatives automatically.

  • You start by creating a tensor and setting requires_grad=True to tell PyTorch you want to compute gradients with respect to it.
  • After defining your function, you call the .backward() method on the output tensor.
  • This triggers backpropagation, and the resulting gradient is stored in the input tensor’s .grad attribute.

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. You can describe what you want to build, and Replit Agent will create it—complete with a user interface, backend logic, and deployment.

You can use the differentiation methods from this article as a starting point. Replit Agent can take these concepts and build fully functional applications, such as:

  • A gradient descent visualizer that animates how an optimization algorithm finds the minimum of a function.
  • A physics simulation tool that calculates an object’s velocity and acceleration using higher-order derivatives.
  • A financial modeling dashboard that uses partial derivatives to show how different variables impact a business outcome.

Bring your own ideas to life by describing them in natural language. Replit Agent can write the code, set up the environment, and deploy your application, turning a concept into a reality in minutes.

Common errors and challenges

Even with powerful libraries, you might run into a few common pitfalls when differentiating in Python.

Fixing step size issues in the numerical_derivative function is a frequent challenge. The accuracy of this method hinges on the step size, represented by h.

  • If h is too large, the approximation becomes too coarse and strays far from the true derivative.
  • If h is too small, you can run into floating-point precision errors, where the computer struggles to handle the tiny numbers and produces inaccurate results.
  • Finding the right balance often requires some experimentation to see what works best for your specific function.

When using a torch.tensor, it's easy to make a gradient accumulation error. By default, PyTorch adds new gradients to any existing ones every time you call the .backward() method.

  • While this is useful for some advanced models, it often leads to incorrect results in standard optimization loops.
  • To avoid this, you must manually reset the gradients to zero before each backpropagation pass.
  • This is typically done by calling tensor.grad.zero_() on each parameter or using an optimizer's built-in optimizer.zero_grad() function.

You may also need to resolve convergence problems in an algorithm like gradient_descent. Sometimes, the model fails to find a function's minimum, which is often tied to the learning rate.

  • A learning rate that is too high can cause the algorithm to overshoot the minimum, bouncing around erratically without settling.
  • On the other hand, a learning rate that is too low can make convergence incredibly slow or cause the algorithm to get stuck in a local minimum.
  • Tuning the learning rate is the first step. If issues persist, you can explore adaptive learning rate methods that adjust the step size automatically.
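
You can quantify this for a steep function like f(x) = 10*x**2: each update multiplies x by (1 - 20 * learning_rate), so the iteration diverges whenever that factor exceeds 1 in magnitude. A small sketch (gd_path is a hypothetical helper for illustration):

```python
def gd_path(df, x0, lr, steps=10):
    # Record every iterate so we can inspect the trajectory
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * df(xs[-1]))
    return xs

df = lambda x: 20 * x  # derivative of f(x) = 10*x**2

diverging = gd_path(df, 5.0, lr=0.11)   # factor 1 - 2.2 = -1.2, |x| grows
converging = gd_path(df, 5.0, lr=0.01)  # factor 1 - 0.2 = 0.8, |x| shrinks
print(f"lr=0.11 -> |x| after 10 steps: {abs(diverging[-1]):.2f}")
print(f"lr=0.01 -> |x| after 10 steps: {abs(converging[-1]):.2f}")
```

Printing the trajectory like this is a cheap diagnostic: if the magnitudes grow from step to step, the learning rate is too high for the function's steepness.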

Fixing step size issues in the numerical_derivative function

Choosing the right step size, h, is critical for the numerical_derivative function. When a function oscillates quickly, a large h value will average out the changes, giving a poor approximation of the true derivative. See how this plays out in the code below.

import numpy as np

def numerical_derivative(f, x, h=0.1):  # step size too large
    return (f(x + h) - f(x)) / h

def f(x):
    return np.sin(10*x)  # function with high frequency

print(f"Derivative at x=0.5: {numerical_derivative(f, 0.5)}")
print(f"Actual derivative: {10*np.cos(10*0.5)}")

With a step size h of 0.1, the calculation steps over the rapid changes in the np.sin(10*x) function, missing its steep curve entirely. This leads to a wildly inaccurate derivative. See how a small adjustment fixes this.

import numpy as np

def numerical_derivative(f, x, h=1e-6):  # much smaller step size
    return (f(x + h) - f(x)) / h

def f(x):
    return np.sin(10*x)  # function with high frequency

print(f"Derivative at x=0.5: {numerical_derivative(f, 0.5)}")
print(f"Actual derivative: {10*np.cos(10*0.5)}")

By shrinking the step size h to a tiny value like 1e-6, the approximation becomes far more accurate. A smaller step captures the rapid oscillations in a high-frequency function like np.sin(10*x), preventing the calculation from "stepping over" the curve's steep parts. This is why the new result is much closer to the true derivative. You'll need to pay close attention to step size for any function that changes quickly to avoid inaccurate results.

Avoiding the gradient accumulation error with torch.tensor

When you call .backward() in PyTorch, it doesn't replace the old gradient—it adds the new one on top. This accumulation is a feature, but it can cause unexpected results if you're not aware of it. The following code demonstrates this common pitfall.

import torch

x = torch.tensor([2.0], requires_grad=True)

# Multiple computations without zeroing gradients
y1 = x**2
y1.backward()
print(f"First gradient: {x.grad}")

y2 = 3*x
y2.backward() # No error is raised, but the gradients silently accumulate
print(f"Second gradient: {x.grad}")

The first .backward() call stores the gradient of y1 (which is 4 at x=2) in x.grad. Because that value is never reset, the second call adds the gradient of y2 (which is 3) on top, leaving x.grad at 7 instead of 3. See the proper approach below.

import torch

x = torch.tensor([2.0], requires_grad=True)

# First computation
y1 = x**2
y1.backward()
print(f"First gradient: {x.grad}")

# Zero gradients before next computation
x.grad.zero_()

y2 = 3*x
y2.backward()
print(f"Second gradient: {x.grad}")

The fix is simple—you must manually reset the gradients before each new calculation. By calling x.grad.zero_() after the first .backward() call, you clear the existing gradient. This ensures the next .backward() call computes the gradient for y2 correctly, without adding it to the old one. This is essential in training loops where you update weights iteratively, as you need a fresh gradient for each step.

Resolving convergence problems in gradient_descent

The gradient_descent algorithm can fail to converge if its learning_rate is too high. When a function is steep, the algorithm's steps become too large, causing it to overshoot the minimum repeatedly. The following code demonstrates what happens when it doesn't work.

def gradient_descent(f, df, x0, learning_rate=1.0, iterations=100):
    x = x0
    for i in range(iterations):
        x = x - learning_rate * df(x)
    return x

# Trying to find minimum of a steep function
df = lambda x: 20*x
min_x = gradient_descent(lambda x: 10*x**2, df, 5.0)
print(f"Minimum found at x = {min_x}")

With learning_rate=1.0 and df(x) = 20*x, the update x - learning_rate * df(x) simplifies to -19*x, so every step multiplies the distance from the minimum by 19. The algorithm diverges instead of converging. See how a small change corrects this.

def gradient_descent(f, df, x0, learning_rate=0.05, iterations=100):
    x = x0
    for i in range(iterations):
        gradient = df(x)
        x = x - learning_rate * gradient
    return x

# Trying to find minimum of a steep function
df = lambda x: 20*x
min_x = gradient_descent(lambda x: 10*x**2, df, 5.0)
print(f"Minimum found at x = {min_x}")

The fix is to lower the learning_rate to a smaller value like 0.05. This ensures the gradient_descent algorithm takes smaller, more manageable steps toward the minimum.

  • A smaller step size prevents the algorithm from overshooting the target, especially on steep functions.
  • This allows the updates to converge steadily instead of diverging.

Keep an eye on this when your algorithm's values jump around erratically or move away from the minimum.

Real-world applications

Now that you've learned to navigate differentiation's common pitfalls, you can use it to find a function's minimum or calculate an object's velocity.

Finding function minima with gradient_descent

A primary application of differentiation is finding a function's minimum with an algorithm like gradient_descent, which iteratively takes steps in the opposite direction of the derivative to find the lowest point.

import numpy as np

def gradient_descent(f, df, x0, learning_rate=0.1, iterations=100):
    x = x0
    for i in range(iterations):
        x = x - learning_rate * df(x)
    return x

df = lambda x: 2*x + 4
min_x = gradient_descent(lambda x: x**2 + 4*x + 4, df, 5.0)
print(f"Minimum found at x = {min_x}")
print(f"Function value at minimum: {min_x**2 + 4*min_x + 4}")

The gradient_descent function iteratively finds a function's minimum. It starts at an initial point x0 and refines its guess over a set number of iterations. In each step, it calculates the gradient with df(x) to determine the slope and moves the current value x slightly downhill.

  • The learning_rate controls how big each step is—a crucial parameter for convergence.
  • The core logic is x = x - learning_rate * df(x), which updates the position.
  • The example finds the minimum for the function x**2 + 4*x + 4, starting from x=5.0.
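
In practice, you often stop when the updates become negligible rather than after a fixed iteration count. Here is a sketch of that variant (gradient_descent_tol and its tol parameter are illustrative names, not from a library):

```python
def gradient_descent_tol(df, x0, learning_rate=0.1, tol=1e-10, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        step = learning_rate * df(x)
        if abs(step) < tol:  # stop once updates are effectively zero
            break
        x = x - step
    return x

# f(x) = x**2 + 4*x + 4 has its minimum at x = -2, where f'(x) = 2*x + 4 = 0
min_x = gradient_descent_tol(lambda x: 2*x + 4, 5.0)
print(min_x)
```

A tolerance-based stopping rule avoids wasting iterations once the algorithm has converged, and it makes the stopping behavior independent of how you guess the iteration count.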

Calculating velocity and acceleration from position data

Differentiation is also fundamental for analyzing physical motion, allowing you to calculate an object's velocity and acceleration from its position data over time.

import numpy as np

time = np.array([0, 1, 2, 3, 4])
position = np.array([0, 1, 4, 9, 16]) # x = t²

def numerical_derivative(f, t, dt=1):
    return [(f[i+1] - f[i])/dt for i in range(len(f)-1)]

velocity = numerical_derivative(position, time)
acceleration = numerical_derivative(velocity, time[:-1])
print(f"Velocities: {velocity}")
print(f"Accelerations: {acceleration}")

The numerical_derivative function calculates the rate of change between discrete data points, like measurements taken over time. It uses a list comprehension to compute the difference between each consecutive point in an array. This is a practical way to approximate a derivative when you don't have a continuous mathematical function to work with.

  • First, the code applies this function to the position data to calculate the velocity.
  • It then reuses the function on the resulting velocity data to find the acceleration.

The default time step, dt=1, assumes each measurement was taken one unit of time apart.
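
When your samples are not evenly spaced, NumPy's np.gradient can take the time array itself and applies central differences in the interior. A brief sketch with hypothetical sample times:

```python
import numpy as np

time = np.array([0.0, 0.5, 1.5, 3.0, 4.0])  # unevenly spaced samples
position = time**2                          # x = t², so velocity = 2t

# np.gradient accepts the sample coordinates directly, so it handles
# non-uniform spacing without any manual bookkeeping
velocity = np.gradient(position, time)
print(velocity)
```

For the interior points, the computed velocities match the true derivative 2t exactly here, because np.gradient's second-order scheme is exact for quadratics; only the one-sided endpoint estimates are approximate.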

Get started with Replit

Put these concepts into practice by building a real tool. Tell Replit Agent to “build a gradient descent visualizer” or “create a symbolic derivative calculator that accepts a math expression and shows the steps.”

Replit Agent will write the code, test for errors, and deploy your app for you. Start building with Replit and bring your ideas to life.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
