How to use regex in Python

Learn how to use regex in Python with our guide. Discover tips, real-world applications, and how to debug common errors.

Published on: Tue, Mar 10, 2026

Updated on: Wed, Apr 1, 2026

The Replit Team

Regular expressions, or regex, are powerful tools for pattern matching in text. Python's re module provides a robust way to search, split, and manipulate strings with complex patterns.

In this article, you’ll explore key techniques and real-world applications. You'll also get practical tips and debugging advice to help you master regex for your specific needs.

Basic pattern matching with `re.search()`

```python
import re

text = "Python is awesome"
match = re.search(r"Python", text)
print(f"Found match: {match.group(0)}")
# Output: Found match: Python
```

The re.search() function is your go-to for finding the first occurrence of a pattern anywhere in a string. Rather than returning the matched text directly, it returns a match object that holds details about what it found, or None if the pattern doesn't appear at all.

Here’s a breakdown of the key steps:

The function looks for the pattern r"Python" within the text variable. Using a raw string with the r prefix is a best practice that prevents backslashes in your pattern from being misinterpreted.
When a match is found, you can access the matched text itself by calling the group(0) method on the resulting match object.
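The match object also records where the match sits in the string. Here is a short sketch using the standard start(), end(), and span() methods. The search for "awesome" is only for illustration:

```python
import re

text = "Python is awesome"
match = re.search(r"awesome", text)

if match:
    # group(0) is the full matched text; group() with no argument is equivalent.
    print(match.group(0))             # awesome
    # start() and end() are the slice boundaries of the match in the original string.
    print(match.start(), match.end())  # 10 17
    # span() returns both boundaries as a tuple.
    print(match.span())               # (10, 17)
```

Because start() and end() are slice boundaries, text[match.start():match.end()] gives back exactly the matched text.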

Common regex operations

Beyond finding the first match with re.search(), you'll often need to find all occurrences of a pattern or replace text entirely.

Finding all occurrences with `re.findall()`

```python
import re

text = "Python is great. Python is powerful."
matches = re.findall(r"Python", text)
print(f"Found {len(matches)} matches: {matches}")
# Output: Found 2 matches: ['Python', 'Python']
```

When you need to find every instance of a pattern, re.findall() is the tool for the job. Unlike re.search(), which stops after the first hit, this function scans the entire string and collects all non-overlapping matches.

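It returns a list of strings, not match objects. If the pattern contains capturing groups, each item holds the group's text (or a tuple of groups) rather than the full match.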
If no matches are found, you’ll get an empty list. This simplifies your code since you don't need to check for None.
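The shape of each list item depends on how many capturing groups the pattern has. Here is a small sketch. The version-number text is made-up sample data:

```python
import re

text = "Python 3.12 and Python 3.11"

# No groups: each list item is the full match.
full = re.findall(r"Python \d+\.\d+", text)
# One group: each item is only that group's text.
versions = re.findall(r"Python (\d+\.\d+)", text)
# Several groups: each item is a tuple with one string per group.
parts = re.findall(r"Python (\d+)\.(\d+)", text)

print(full)      # ['Python 3.12', 'Python 3.11']
print(versions)  # ['3.12', '3.11']
print(parts)     # [('3', '12'), ('3', '11')]
```

If you need the full match and the groups together, re.finditer() (covered later) gives you match objects instead.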

Replacing text with `re.sub()`

```python
import re

text = "Python is my favorite language"
modified = re.sub(r"Python", "RegEx", text)
print(f"Original: {text}\nModified: {modified}")
# Output:
# Original: Python is my favorite language
# Modified: RegEx is my favorite language
```

For search-and-replace tasks, re.sub() is your best friend. It scans a string for all occurrences of a pattern and replaces them with a specified replacement string, returning a new string with the changes.

The function takes three key arguments: the pattern to find, the string to replace it with, and the input text.
Unlike re.search(), which stops at the first match, re.sub() replaces every match by default. You can pass the count argument to limit the number of replacements.
It returns a completely new string, leaving your original data intact.
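Two variations come up often: limiting replacements with count, and passing a function instead of a string so that each replacement can be computed from its match. Here is a minimal sketch. The helper name double is hypothetical and used only for illustration:

```python
import re

text = "Python is great. Python is powerful."

# count=1 limits the substitution to the first match.
first_only = re.sub(r"Python", "RegEx", text, count=1)
print(first_only)  # RegEx is great. Python is powerful.

# Hypothetical helper: re.sub() calls it with each match object
# and uses the returned string as that match's replacement.
def double(match):
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```

Pass count as a keyword argument, as shown. Recent Python versions deprecate passing it positionally.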

Using raw strings for pattern clarity

```python
import re

# Without a raw string, the backslash must be escaped for Python first
match1 = re.search("\\d+", "I have 123 apples")
# With a raw string
match2 = re.search(r"\d+", "I have 123 apples")
print(f"Both find: {match1.group(0)} and {match2.group(0)}")
# Output: Both find: 123 and 123
```

Python strings process backslashes for escape sequences, which can clash with regex patterns that also use them. To match a digit with \d, you'd need to write "\\d" in a regular string to escape the backslash for Python's interpreter first.

Using a raw string, prefixed with an r like in r"\d+", is the standard solution. It tells the interpreter to pass the string directly to the regex engine without processing the backslashes.

This simple practice makes your patterns much cleaner and helps you avoid hard-to-spot bugs, especially as your expressions get more complex.
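The word-boundary token \b is a classic example of this kind of bug. In a regular Python string, "\b" is a recognized escape for the backspace character, so it is converted before the regex engine ever sees it, and you get no error. Here is a short demonstration:

```python
import re

text = "the cat sat"

# In a regular string, "\b" becomes a backspace character (\x08), not a word boundary.
plain = re.search("\bcat\b", text)
# In a raw string, the regex engine receives \b and treats it as a word boundary.
raw = re.search(r"\bcat\b", text)

print(plain)         # None
print(raw.group(0))  # cat
```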

Advanced regex techniques

Now that you can find and replace simple patterns, you can tackle more complex challenges using capturing groups, search flags, and lookaround assertions.

Capturing groups with parentheses

```python
import re

text = "Email me at user@example.com for details"
match = re.search(r"(\w+)@(\w+)\.(\w+)", text)
print(f"Username: {match.group(1)}, Domain: {match.group(2)}, TLD: {match.group(3)}")
# Output: Username: user, Domain: example, TLD: com
```

Capturing groups, created by placing parentheses () in your pattern, let you extract specific substrings from a larger match. They're perfect for when you need to parse structured data, like pulling apart an email address.

The pattern r"(\w+)@(\w+)\.(\w+)" uses three groups to isolate the username, domain, and TLD.
You can access these captured parts individually using the group() method with an index, like match.group(1) for the first group.
Remember that match.group(0) still returns the entire matched string, user@example.com.
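Numeric indexes get hard to track as patterns grow. Python's re module also supports named groups with the (?P<name>...) syntax. Here is a minimal sketch of the same pattern with names added. The names username, domain, and tld are illustrative choices, and like the original, this pattern handles only simple addresses:

```python
import re

text = "Email me at user@example.com for details"
match = re.search(r"(?P<username>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)", text)

if match:
    # Groups can be fetched by name instead of by position.
    print(match.group("username"))  # user
    # groups() returns every captured group as a tuple, in order.
    print(match.groups())           # ('user', 'example', 'com')
    # groupdict() maps each group name to its captured text.
    print(match.groupdict())        # {'username': 'user', 'domain': 'example', 'tld': 'com'}
```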

Modifying search behavior with flags

```python
import re

text = "Python\nJAVA\nC++"
matches_case_sensitive = re.findall(r"java", text)
matches_insensitive = re.findall(r"java", text, re.IGNORECASE)
print(f"Case sensitive: {matches_case_sensitive}\nCase insensitive: {matches_insensitive}")
# Output:
# Case sensitive: []
# Case insensitive: ['JAVA']
```

You can fine-tune your searches with regex flags, which are special arguments that modify a pattern's behavior. By default, matching is case-sensitive, so a search for "java" won't find "JAVA". Adding a flag changes this.

Passing re.IGNORECASE to functions like re.findall() makes your search case-insensitive.
This allows you to write simpler patterns that work more flexibly, without needing to account for every possible capitalization.
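Another standard flag, re.MULTILINE, changes how the anchors ^ and $ behave. You can combine flags with the | operator. Here is a small sketch that reuses the same three-line text:

```python
import re

text = "Python\nJAVA\nC++"

# Without re.MULTILINE, ^ and $ anchor only at the start and end of the whole string.
no_multiline = re.findall(r"^\w+$", text)
# With re.MULTILINE, ^ and $ also match at the start and end of each line.
# "C++" is skipped because + is not a word character (\w).
per_line = re.findall(r"^\w+$", text, re.MULTILINE)
# Flags combine with the | operator.
combined = re.findall(r"^java$", text, re.IGNORECASE | re.MULTILINE)

print(no_multiline)  # []
print(per_line)      # ['Python', 'JAVA']
print(combined)      # ['JAVA']
```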

Using lookahead and lookbehind assertions

```python
import re

text = "Price: $50, Cost: $30, Value: $80"
# Find numbers that follow a dollar sign
prices = re.findall(r"(?<=\$)\d+", text)
print(f"Extracted prices: {prices}")
# Output: Extracted prices: ['50', '30', '80']
```

Lookarounds let you match text based on what’s around it—without including that context in the final result. They’re perfect for when you need to enforce a rule, like “must be preceded by a dollar sign,” but only want to extract the value itself.

The pattern r"(?<=\$)\d+" uses a positive lookbehind, (?<=\$), to assert that the match must be preceded by a literal $.
Only the \d+ part (the digits) ends up in the match, because the lookbehind is a zero-width condition that consumes no characters.

This is why you get a clean list of numbers, without the currency symbols.
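The same idea works in the other direction with a lookahead, (?=...), and either assertion can be negated with (?!...) or (?<!...). Here is a short sketch. The sample strings are made up. Keep in mind that Python's re module requires lookbehind patterns to have a fixed width:

```python
import re

# Positive lookahead: digits followed by "px" (the unit itself is not included).
sizes = re.findall(r"\d+(?=px)", "10px 20em 30px")
print(sizes)  # ['10', '30']

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
# \b stops the engine from matching the tail of "50" (the "0") on its own.
quantities = re.findall(r"(?<!\$)\b\d+", "Price: $50, Qty: 3")
print(quantities)  # ['3']
```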

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Describe what you want to build, and Agent 4 handles everything from writing the code to connecting databases and deploying it live.

Instead of piecing together techniques, you can describe the app you actually want to build and the Agent will take it from idea to working product:

An invoice parser that scans documents to extract all item numbers and their corresponding prices, using patterns to find values that follow specific labels.
A content sanitizer that automatically finds and replaces all sensitive information, like credit card numbers or personal IDs, with a placeholder.
A URL validator that checks if a link is properly formatted and breaks it down into its core components like the protocol, domain, and path.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Regex is powerful, but you'll likely run into a few common pitfalls as you get started.

Handling potential None results from re.search(): If re.search() doesn't find a match, it returns None instead of a match object. A common mistake is to immediately try calling a method like .group() on the result, which will cause an AttributeError. Always check if the result is not None before you try to use it.
Escaping special characters in patterns: Characters like ., *, +, and $ have special meanings in regex. If you want to match these characters literally—for instance, to find an actual period in a sentence—you must escape them with a backslash (\). Forgetting to escape them, like using . when you meant \., is a frequent cause of patterns not working as expected.
Understanding greedy vs. non-greedy quantifiers: By default, quantifiers like * and + are "greedy," meaning they try to match as much text as possible. For example, if you use <.*> to find a tag in <h1>Title</h1>, it will match the entire string from the first < to the last >. To fix this, you can make the quantifier "non-greedy" by adding a question mark, like <.*?>. This version matches the shortest possible string, correctly stopping at the first closing >.

Handling potential `None` results from `re.search()`

A frequent stumbling block when using re.search() is forgetting that it returns None if no match is found. If you assume a match object always exists and try to call .group(), your script will crash with an AttributeError. The following code demonstrates this common error.

```python
import re

text = "Python is awesome"
match = re.search(r"Java", text)
print(f"Found match: {match.group(0)}")
# AttributeError: 'NoneType' object has no attribute 'group'
```

The pattern r"Java" doesn't exist in the string, so match becomes None. The script fails when it attempts to call .group(0) on this None object. Here’s how to handle it correctly.

```python
import re

text = "Python is awesome"
match = re.search(r"Java", text)
if match:
    print(f"Found match: {match.group(0)}")
else:
    print("No match found")
# Output: No match found
```

The solution is to wrap your logic in a conditional check. By writing if match:, you test whether the re.search() function returned a match object or None. Since None evaluates to False in a boolean context, the code inside the if block only runs when a match is actually found. This simple check prevents the AttributeError and makes your code more robust, especially when patterns might not always match your input text.
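In Python 3.8 and later, you can fold the search and the check into one line with the assignment expression (walrus) operator, :=. This is a stylistic alternative to the version above, and it behaves the same way:

```python
import re

text = "Python is awesome"

# := assigns the search result to match and tests it in the same expression.
if (match := re.search(r"Java", text)):
    print(f"Found match: {match.group(0)}")
else:
    print("No match found")
```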

Escaping special characters in patterns

In regex, characters like . and $ have special jobs. If you want to match them literally, you'll need to escape them with a backslash. Forgetting this is a common pitfall that causes patterns to fail. The following code shows this in action.

```python
import re

text = "The price is $50.00"
match = re.search(r"$\d+.\d+", text)
print(f"Price: {match.group(0) if match else 'Not found'}")
# Output: Price: Not found
```

The pattern r"$\d+.\d+" fails because $ is an anchor for the end of the string, not a literal dollar sign. Additionally, . matches any character, not a decimal point. The corrected code below shows the proper approach.

```python
import re

text = "The price is $50.00"
match = re.search(r"\$\d+\.\d+", text)
print(f"Price: {match.group(0) if match else 'Not found'}")
# Output: Price: $50.00
```

The fix works by "escaping" the special characters with a backslash. The pattern r"\$\d+\.\d+" tells the regex engine to look for a literal dollar sign (\$) and a literal period (\.), not their special meanings. You'll need to do this whenever your search pattern includes characters that have a special function in regex, like . or $, to ensure they're matched as plain text. This is a common solution for unexpected match failures.
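Escaping by hand gets error-prone when the literal text comes from a variable, such as user input. The standard re.escape() function adds the backslashes for you. Here is a minimal sketch. The printed pattern assumes Python 3.7 or later, where re.escape() escapes only regex special characters:

```python
import re

text = "The price is $50.00"
literal = "$50.00"

# re.escape() backslash-escapes every regex metacharacter in the string.
pattern = re.escape(literal)
print(pattern)  # \$50\.00

match = re.search(pattern, text)
print(match.group(0) if match else "Not found")  # $50.00
```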

Understanding greedy vs. non-greedy `*` quantifiers

By default, quantifiers like * are "greedy," meaning they'll match as much text as they possibly can. This often leads to unexpected results when you're trying to extract a smaller piece of a string. The following code demonstrates this common issue.

```python
import re

html = "<div>First content</div><div>Second content</div>"
match = re.search(r"<div>.*</div>", html)
print(f"Extracted: {match.group(0)}")
# Output: Extracted: <div>First content</div><div>Second content</div>
```

The .* pattern matches everything from the first <div> to the very last </div> in the string, capturing both divs instead of just the first one. The corrected code below shows how to fix this behavior.

```python
import re

html = "<div>First content</div><div>Second content</div>"
match = re.search(r"<div>.*?</div>", html)
print(f"Extracted: {match.group(0)}")
# Output: Extracted: <div>First content</div>
```

The solution is to make the quantifier non-greedy by adding a question mark: .*?. This simple change flips the behavior from "match as much as possible" to "match as little as possible." The engine now stops at the first </div> it encounters, correctly isolating the first element. You'll find this technique crucial whenever you're parsing text with repeating structures, like HTML tags or log entries, and need to capture distinct blocks.
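You can combine a non-greedy quantifier with a capturing group and re.findall() to pull out the text inside every block at once. Here is a small sketch on the same string. Note that . does not match newline characters unless you also pass re.DOTALL, which matters if your blocks span several lines:

```python
import re

html = "<div>First content</div><div>Second content</div>"

# The group around .*? captures just the text between each pair of tags.
contents = re.findall(r"<div>(.*?)</div>", html)
print(contents)  # ['First content', 'Second content']
```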

Real-world applications

Moving beyond theory and troubleshooting, you can apply regex to practical challenges like parsing log files and standardizing messy text data.

Extracting data from log files with `re.finditer()`

The re.finditer() function is particularly useful for scanning long text such as log files. Instead of building a complete list up front, it returns an iterator that yields a match object for each match, including its position in the string.

```python
import re

log_entry = "192.168.1.1 - - [21/Oct/2023:10:32:24 +0000] \"GET /index.html HTTP/1.1\" 200 4523"
ip_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
for match in re.finditer(ip_pattern, log_entry):
    print(f"Found IP address at position {match.start()}: {match.group(0)}")
# Output: Found IP address at position 0: 192.168.1.1
```

This example shows how you can parse a log entry to find an IP address. The pattern, r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", matches the four-part, dot-separated shape of an IPv4 address. It checks only the shape, not the value range, so a string like 999.1.1.1 would also match.

The \d{1,3} component looks for a sequence of one to three digits.
The \. part matches a literal dot, which is crucial for separating the numbers.

The re.finditer() function scans the string for all non-overlapping occurrences of this pattern. Your for loop then processes each resulting match object, and match.group(0) extracts the full matched string—the IP address.
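Named groups let one pattern pull every field out of a log line at once. Here is a minimal sketch, assuming lines shaped like the entry above (Common Log Format). The second log line is made-up sample data, and the group names are illustrative choices, not a standard. Other log formats would need a different pattern:

```python
import re

# Hypothetical sample log text: the entry from above plus one invented line.
logs = (
    '192.168.1.1 - - [21/Oct/2023:10:32:24 +0000] "GET /index.html HTTP/1.1" 200 4523\n'
    '10.0.0.7 - - [21/Oct/2023:10:33:02 +0000] "POST /login HTTP/1.1" 302 0'
)

# Compiling once lets you reuse the pattern across many lines.
log_pattern = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '   # client IP, identity, user
    r'\[(?P<timestamp>[^\]]+)\] '                  # [timestamp]
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '   # "METHOD /path PROTOCOL"
    r'(?P<status>\d{3}) (?P<size>\d+)'             # status code and response size
)

# groupdict() turns each match into a dict keyed by group name.
records = [m.groupdict() for m in log_pattern.finditer(logs)]
for record in records:
    print(record["ip"], record["method"], record["path"], record["status"])
# 192.168.1.1 GET /index.html 200
# 10.0.0.7 POST /login 302
```

The (?:...) syntax is a non-capturing group. It repeats the ".digits" part three times without adding an extra entry to the match's groups.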

Cleaning and standardizing text data with regex

When you're dealing with messy data from different sources, regex provides a powerful way to find and reformat information into a consistent style.

```python
import re

raw_data = ["Phone: (555) 123-4567", "Tel: 555.987.6543", "Contact: 555-789-0123"]
clean_numbers = []
for entry in raw_data:
    number = re.search(r"(\d{3})[)\s.-]+(\d{3})[.\s-]+(\d{4})", entry)
    if number:
        standardized = f"{number.group(1)}-{number.group(2)}-{number.group(3)}"
        clean_numbers.append(standardized)
print(f"Standardized phone numbers: {clean_numbers}")
# Output: Standardized phone numbers: ['555-123-4567', '555-987-6543', '555-789-0123']
```

This code standardizes phone numbers from a list of messy strings. It uses re.search() to locate and pull out the digits, successfully navigating the different separators like parentheses, spaces, and dots.

Capturing groups like (\d{3}) are used to isolate the three distinct parts of the phone number.
Character sets like [)\s.-]+ and [.\s-]+ match one or more of the inconsistent separators between the number segments. The first one also accepts the closing parenthesis after the area code.

Once the parts are captured, an f-string rebuilds them into a consistent format, which is then added to a new, clean list.
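If you'd rather normalize numbers in place inside running text, re.sub() accepts group references like \1 in its replacement string. Here is a minimal sketch, assuming 10-digit numbers formatted like the ones above. The sample sentence is invented, and making the parentheses optional with \(? and \)? is an illustrative variation on the original pattern:

```python
import re

text = "Call (555) 123-4567 or 555.987.6543 today."

# \(? and \)? make the parentheses around the area code optional.
phone = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]+(\d{4})")

# \1, \2, \3 in the replacement refer to the three captured groups.
normalized = phone.sub(r"\1-\2-\3", text)
print(normalized)  # Call 555-123-4567 or 555-987-6543 today.
```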

Get started with Replit

Turn your new regex skills into a real tool. Just tell Replit Agent what you need: "a script to parse log files for IP addresses" or "a tool to standardize all phone numbers in a CSV file".

Replit Agent writes the code, tests for errors, and deploys your app. You just provide the instructions. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.

Create & deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.