How to use regex in Python
Master regex in Python. Explore methods, tips, real-world examples, and learn how to debug common errors.

Regular expressions, or regex, are a powerful tool for pattern matching in text. Python's re module provides a comprehensive set of functions to search, split, and manipulate strings with precision.
In this article, you'll explore core regex techniques and see real-world applications. You'll find practical tips and debugging advice to help you confidently apply these patterns in your own projects.
Basic pattern matching with re.search()
import re
text = "Python is awesome"
match = re.search(r"Python", text)
print(f"Found match: {match.group(0)}")
# Output: Found match: Python
The re.search() function scans a string to find the first location where the pattern produces a match. It’s ideal when you only need to confirm a pattern’s existence or find its first occurrence anywhere in the text. If successful, it returns a match object that holds details about what was found.
The r before the pattern string is a crucial detail. It signifies a raw string, which prevents backslashes in your pattern from being misinterpreted by Python. Once you have a match object, you can use methods like .group(0) to extract the specific text that matched the pattern.
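A match object also records where the match occurred. The sketch below uses a made-up sample sentence to show the position methods `.start()`, `.end()` and `.span()`, and contrasts `re.search()` with `re.match()`, which only tries the pattern at the very beginning of the string.

```python
import re

# Made-up sample sentence for illustration
text = "I love Python and Python loves me"

# re.search() scans the whole string and stops at the first hit
match = re.search(r"Python", text)
if match:
    print(match.group(0))              # Python
    print(match.start(), match.end())  # 7 13
    print(match.span())                # (7, 13)

# re.match() only tries the pattern at the start of the string,
# so it finds nothing here because the text begins with "I"
anchored = re.match(r"Python", text)
print(anchored)  # None
```

Because `re.search()` keeps scanning, it is usually the safer default. Reach for `re.match()` only when the pattern must appear at position 0.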
Common regex operations
While re.search() locates the first pattern match, you'll often need to perform other common operations like finding every instance or replacing text altogether.
Finding all occurrences with re.findall()
import re
text = "Python is great. Python is powerful."
matches = re.findall(r"Python", text)
print(f"Found {len(matches)} matches: {matches}")
# Output: Found 2 matches: ['Python', 'Python']
Unlike re.search(), which stops after the first hit, re.findall() scans the entire string to find every non-overlapping match. This makes it perfect for tasks where you need to count occurrences or extract all instances of a pattern.
- When the pattern has no capturing groups, the function returns a list of strings, where each string is a full match. As seen in the example, matches becomes ['Python', 'Python'].
- If the pattern isn't found, you get an empty list instead of an error or None, so you can loop over the result or check its length without a guard.
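The shape of the list `re.findall()` returns depends on how many capturing groups the pattern contains. This small sketch, using a made-up `key=value` string, shows the three cases plus the empty-list result:

```python
import re

# Made-up sample input
text = "width=10, height=20"

# No groups: each list item is the full match
whole = re.findall(r"\w+=\d+", text)      # ['width=10', 'height=20']

# One group: each item is only that group's text
values = re.findall(r"\w+=(\d+)", text)   # ['10', '20']

# Several groups: each item is a tuple of the groups
pairs = re.findall(r"(\w+)=(\d+)", text)  # [('width', '10'), ('height', '20')]

# No match: an empty list, never None
missing = re.findall(r"depth", text)      # []
```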
Replacing text with re.sub()
import re
text = "Python is my favorite language"
modified = re.sub(r"Python", "RegEx", text)
print(f"Original: {text}\nModified: {modified}")
# Output:
# Original: Python is my favorite language
# Modified: RegEx is my favorite language
The re.sub() function is your go-to for search-and-replace tasks. It finds all occurrences of a pattern and substitutes them with a new string, returning a modified copy while leaving the original text untouched.
- Its core arguments are the pattern, the replacement string, and the text you're searching.
- Unlike a simple string replacement, you can use complex regex patterns to find what needs changing.
This makes it incredibly powerful for tasks like sanitizing user input or refactoring code across multiple files.
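Beyond a fixed replacement string, `re.sub()` accepts backreferences to captured groups, a `count` limit, and even a function that computes each replacement. A short sketch with made-up sample text:

```python
import re

# Made-up sample text
text = "Released 2023-10-21, patched 2024-01-05"

# Numbered backreferences (\1, \2, ...) reuse captured groups in the replacement
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
# 'Released 21/10/2023, patched 05/01/2024'

# count limits how many replacements are made (0, the default, means all)
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)
# 'Released YYYY-10-21, patched 2024-01-05'

# A function receives each match object and returns the replacement string
def double(match):
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples, 10 pears")
# '6 apples, 20 pears'
```

Note that the replacement string is also written as a raw string, since `\1` in a normal string would be interpreted by Python before `re.sub()` ever sees it.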
Using raw strings for pattern clarity
import re
# Without raw string
match1 = re.search("\\d+", "I have 123 apples")
# With raw string
match2 = re.search(r"\d+", "I have 123 apples")
print(f"Both find: {match1.group(0)} and {match2.group(0)}")
# Output: Both find: 123 and 123
The r prefix before a pattern string isn't just for show—it creates a raw string. In standard Python strings, a backslash is an escape character. This means you'd have to type "\\d+" to represent the regex pattern \d+, which can quickly become unreadable.
- Raw strings treat backslashes as literal characters, letting you write patterns like r"\d+" naturally.
- This simple habit makes your code cleaner and helps you avoid hard-to-spot bugs, especially with complex patterns.
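Here is one of those hard-to-spot bugs in action. In a normal Python string, `"\b"` is the backspace character, while in a raw string it reaches the regex engine as a word boundary. The sample sentence is made up for illustration:

```python
import re

text = "the cat sat"

# In a normal string, "\b" is the backspace character (\x08),
# so this pattern looks for literal backspaces and never matches
broken = re.search("\bcat\b", text)

# In a raw string, \b reaches the regex engine as a word boundary
fixed = re.search(r"\bcat\b", text)

print(broken)          # None
print(fixed.group(0))  # cat
```

No error or warning is raised for the broken version, which is exactly why this mistake is so easy to miss.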
Advanced regex techniques
Now that you've seen how to find and replace text, you can unlock more powerful patterns with capturing groups, search flags, and lookaround assertions.
Capturing groups with parentheses
import re
text = "Email me at user@example.com for details"
match = re.search(r"(\w+)@(\w+)\.(\w+)", text)
print(f"Username: {match.group(1)}, Domain: {match.group(2)}, TLD: {match.group(3)}")
# Output: Username: user, Domain: example, TLD: com
Parentheses () in your pattern create capturing groups, letting you isolate and extract specific parts of a match. This is incredibly useful when you need to pull out structured data from a string, not just find the pattern itself.
- You can access these captured substrings using the match.group() method with an index, starting from 1.
- In the example, match.group(1) grabs the username, match.group(2) gets the domain, and match.group(3) isolates the top-level domain.
This turns a simple match into a powerful parsing tool.
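When a pattern has several groups, numeric indexes get hard to track. A minimal sketch using named groups with the `(?P<name>...)` syntax on the same sample email; like the original pattern, it is not a full email validator (a dot in the username or a multi-level domain would break it):

```python
import re

text = "Email me at user@example.com for details"

# (?P<name>...) gives each capturing group a name
pattern = r"(?P<user>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)"
match = re.search(pattern, text)

if match:
    print(match.group("user"))  # user
    print(match.groups())       # ('user', 'example', 'com')
    print(match.groupdict())    # {'user': 'user', 'domain': 'example', 'tld': 'com'}
```

Named groups still have numbers, so `match.group(1)` keeps working alongside `match.group("user")`.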
Modifying search behavior with flags
import re
text = "Python\nJAVA\nC++"
matches_case_sensitive = re.findall(r"java", text)
matches_insensitive = re.findall(r"java", text, re.IGNORECASE)
print(f"Case sensitive: {matches_case_sensitive}\nCase insensitive: {matches_insensitive}")
# Output:
# Case sensitive: []
# Case insensitive: ['JAVA']
Flags let you tweak how a regex pattern behaves. By default, searches are case-sensitive, which is why the initial search for "java" finds nothing in the example text.
- By adding the re.IGNORECASE flag, you instruct the function to disregard case, allowing it to match "JAVA".
- This is just one of several flags. re.MULTILINE makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string, while re.DOTALL makes the . metacharacter match newlines too.
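The sketch below reuses the same three-line string to show what re.MULTILINE and re.DOTALL change, and how flags combine with the `|` operator:

```python
import re

text = "Python\nJAVA\nC++"

# Without re.MULTILINE, ^ matches only at the start of the whole string
first_only = re.findall(r"^\w+", text)               # ['Python']
# With re.MULTILINE, ^ also matches right after each newline
per_line = re.findall(r"^\w+", text, re.MULTILINE)   # ['Python', 'JAVA', 'C']

# By default . does not match "\n"; re.DOTALL lets it cross line breaks
no_dotall = re.search(r"Python.JAVA", text)               # None
with_dotall = re.search(r"Python.JAVA", text, re.DOTALL)  # matches 'Python\nJAVA'

# Flags can be combined with the | operator
combined = re.findall(r"^java$", text, re.IGNORECASE | re.MULTILINE)  # ['JAVA']
```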
Using lookahead and lookbehind assertions
import re
text = "Price: $50, Cost: $30, Value: $80"
# Find numbers that follow a dollar sign
prices = re.findall(r"(?<=\$)\d+", text)
print(f"Extracted prices: {prices}")
# Output: Extracted prices: ['50', '30', '80']
Lookaround assertions let you match patterns based on what comes before or after your target text without including that context in the final result. They act like conditional checks for your regex, ensuring a pattern is present but not capturing it.
- The pattern (?<=\$) is a positive lookbehind. It asserts that the character immediately preceding the match must be a dollar sign $.
- Because it's an assertion, re.findall() returns only the digits matched by \d+, not the dollar signs.
- Lookaheads work similarly but check for patterns that follow your match.
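Each lookaround also has a negative form. The sketch below shows a positive lookahead on the article's price string, then negative lookbehind and negative lookahead on two made-up sample strings:

```python
import re

text = "Price: $50, Cost: $30, Value: $80"

# Positive lookahead (?=...): words immediately followed by a colon,
# without including the colon in the result
labels = re.findall(r"\w+(?=:)", text)  # ['Price', 'Cost', 'Value']

# Negative lookbehind (?<!...): numbers NOT preceded by "$".
# The \d inside the lookbehind stops the engine from matching
# the tail of a price, such as the "0" in "$50".
order = "Order 15 costs $50"
plain_numbers = re.findall(r"(?<![$\d])\d+", order)  # ['15']

# Negative lookahead (?!...): whole numbers not followed by a percent sign
rates = "Rate 5% on 200 items"
counts = re.findall(r"\b\d+\b(?!%)", rates)  # ['200']
```

Python requires lookbehind patterns to have a fixed width, which is why the lookbehind above checks a single character class rather than something like `\$+`.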
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. It's designed to help you build and deploy software directly from a description of what you want to create.
With the regex techniques you've just learned, you can use Replit Agent to turn those concepts into production-ready tools. Describe what you want to build, and the agent creates it—complete with databases, APIs, and deployment.
- Build a log file analyzer that uses capturing groups to extract timestamps, error levels, and messages from unstructured text.
- Create a data sanitization API that uses re.sub() to automatically find and redact sensitive information like phone numbers or email addresses.
- Deploy a web scraper that uses re.findall() and lookarounds to pull all prices from a product page without including the currency symbols.
Describe your app idea, and Replit Agent will write the code, test it, and deploy it automatically.
Common errors and challenges
Even with its power, regex comes with a few common tripwires that can catch you off guard if you're not prepared.
A frequent source of errors is forgetting that re.search() returns None when no match is found. If you immediately try to call a method like .group() on the result, your program will crash with an AttributeError. It's a good practice to always check if the match object is not None before you try to use it.
Many characters—including ., *, +, and $—have special meanings in regex. If you need to match one of these characters literally, you must escape it with a backslash (\). For example, to find a literal dot, your pattern needs to use \.; otherwise, . will match any character and give you unexpected results.
By default, quantifiers like * and + are "greedy," meaning they'll match as much text as possible. This can be tricky when you're trying to extract text between two specific markers. To fix this, you can make the quantifier "non-greedy" by adding a question mark (e.g., *? or +?), which tells the engine to match the shortest possible string instead.
Handling potential None results from re.search()
When re.search() fails to find a pattern, it returns None instead of a match object. Attempting to call .group() on this None value is a classic mistake that results in an AttributeError. See what happens when a search comes up empty.
import re
text = "Python is awesome"
match = re.search(r"Java", text)
print(f"Found match: {match.group(0)}")  # AttributeError: 'NoneType' object has no attribute 'group'
Since the pattern "Java" isn't found, the match variable becomes None. The code then fails when it tries to call .group(0) on that empty result. See how to properly check for a match first.
import re
text = "Python is awesome"
match = re.search(r"Java", text)
if match:
    print(f"Found match: {match.group(0)}")
else:
    print("No match found")
To avoid the error, always check if the match object exists before using it. The corrected code uses a simple if match: statement, which works because a match object evaluates to true while None is false. This conditional guard prevents the AttributeError by ensuring you only call .group() when a valid match is present. It’s a crucial habit for any function that might return None, not just re.search().
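If you run the same guarded search in many places, you can fold the check into a small helper. A minimal sketch, assuming Python 3.8+ for the walrus operator; `first_match_or_default` is a hypothetical name, not part of the `re` module:

```python
import re

def first_match_or_default(pattern, text, default="No match found"):
    """Hypothetical helper: return the first match's text, or a default."""
    # The walrus operator (Python 3.8+) assigns and tests in one step
    if (match := re.search(pattern, text)) is not None:
        return match.group(0)
    return default

print(first_match_or_default(r"Python", "Python is awesome"))  # Python
print(first_match_or_default(r"Java", "Python is awesome"))    # No match found
```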
Escaping special characters in patterns
Characters like $ and . are powerful in regex, but they don't match their literal counterparts by default. The $ character, for instance, anchors a pattern to the end of a string. See what happens when you try to match a price without escaping these special characters.
import re
text = "The price is $50.00"
match = re.search(r"$\d+.\d+", text)
print(f"Price: {match.group(0) if match else 'Not found'}")
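The search for r"$\d+.\d+" comes up empty because $ is an anchor: it asserts the end of the string, so no digits can ever follow it and the pattern cannot match. The unescaped dot is a second problem, since . matches any character rather than a literal period. Here's the corrected approach.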
import re
text = "The price is $50.00"
match = re.search(r"\$\d+\.\d+", text)
print(f"Price: {match.group(0) if match else 'Not found'}")
To get the correct match, you must escape the special characters. The original pattern fails because $ tries to anchor the match to the end of the string, and . matches any character. By adding a backslash, you tell the regex engine to treat them as literals. The corrected pattern uses \$ to find the dollar sign and \. to find the period, successfully capturing the price. Keep this in mind whenever your pattern includes metacharacters.
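When the text you want to match literally comes from somewhere else, such as user input, escaping it by hand is error-prone. `re.escape()` backslash-escapes every regex metacharacter for you. A short sketch, with a hypothetical search term:

```python
import re

# Hypothetical user-supplied search term containing metacharacters
search_term = "$50.00"

# re.escape() backslash-escapes every regex metacharacter in the string
pattern = re.escape(search_term)
print(pattern)  # \$50\.00

match = re.search(pattern, "The price is $50.00 today")
print(match.group(0) if match else "Not found")  # $50.00
```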
Understanding greedy vs. non-greedy * quantifiers
The greedy nature of the * quantifier can cause problems when you're parsing text with repeating patterns, like HTML. It will match from the first opening tag to the very last closing tag. See what happens with this code.
import re
html = "<div>First content</div><div>Second content</div>"
match = re.search(r"<div>.*</div>", html)
print(f"Extracted: {match.group(0)}")
Because the .* pattern is greedy, it doesn't stop at the first closing tag. Instead, it captures everything from the opening <div> to the final </div>. See how to adjust the pattern for the correct result.
import re
html = "<div>First content</div><div>Second content</div>"
match = re.search(r"<div>.*?</div>", html)
print(f"Extracted: {match.group(0)}")
By adding a question mark to the quantifier, you create a non-greedy pattern: .*?. This tells the regex engine to match the shortest possible string that satisfies the pattern. Instead of capturing everything from the first <div> to the last </div>, it stops at the first closing tag it finds. This is essential when you're parsing text with repeating start and end markers, like HTML or log entries.
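Combining the non-greedy quantifier with a capturing group and `re.findall()` extracts the content of every block at once. The sketch also shows a negated character class, `[^<]*`, as an alternative that cannot run past the next tag. For nested or real-world HTML, a proper parser such as Python's built-in `html.parser` module is usually more robust than regex.

```python
import re

html = "<div>First content</div><div>Second content</div>"

# With one capturing group, findall returns just the text between the tags
lazy = re.findall(r"<div>(.*?)</div>", html)
# ['First content', 'Second content']

# The greedy version swallows everything up to the last </div>
greedy = re.findall(r"<div>(.*)</div>", html)
# ['First content</div><div>Second content']

# A negated character class also stops at the next tag
negated = re.findall(r"<div>([^<]*)</div>", html)
# ['First content', 'Second content']
```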
Real-world applications
Understanding the theory and potential pitfalls prepares you to apply regex in real-world scenarios like log analysis and data cleaning.
Extracting data from log files with re.finditer()
The re.finditer() function is particularly useful for log analysis because it returns an iterator of match objects, giving you access to both the matched text and its metadata for every single find.
import re
log_entry = "192.168.1.1 - - [21/Oct/2023:10:32:24 +0000] \"GET /index.html HTTP/1.1\" 200 4523"
ip_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
for match in re.finditer(ip_pattern, log_entry):
    print(f"Found IP address at position {match.start()}: {match.group(0)}")
This example uses re.finditer() to efficiently scan a log entry for an IP address. The pattern r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" is designed to find four groups of one to three digits, each separated by a literal dot.
- Unlike re.findall(), re.finditer() returns an iterator. This is memory-efficient because it yields match objects one at a time instead of building a full list.
- The loop then reads each match object's details, like the matched text with .group(0) and its starting index with .start().
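The same approach scales to several fields per line. A minimal sketch, assuming log lines in the same format as the example entry: the first line below is the article's entry and the second is a made-up sample. Named groups pull out the IP address, request method, path and status code, and `re.MULTILINE` lets `^` anchor at the start of each line:

```python
import re

# First line: the article's log entry; second line: a made-up sample
log_text = (
    '192.168.1.1 - - [21/Oct/2023:10:32:24 +0000] "GET /index.html HTTP/1.1" 200 4523\n'
    '10.0.0.7 - - [21/Oct/2023:10:33:02 +0000] "POST /login HTTP/1.1" 401 512'
)

# Assumed line layout: IP, anything, "METHOD PATH PROTOCOL", status, size
LOG_PATTERN = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) '            # IP address at line start
    r'.*?"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '  # quoted request line
    r'(?P<status>\d{3}) (?P<size>\d+)',             # status code and size
    re.MULTILINE,
)

records = []
for match in LOG_PATTERN.finditer(log_text):
    records.append({
        "ip": match.group("ip"),
        "method": match.group("method"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "offset": match.start(),  # where this line starts in log_text
    })

for record in records:
    print(record)
```

Compiling the pattern once with `re.compile()` is a reasonable choice here, since a real log file would run the same pattern over many lines.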
Cleaning and standardizing text data with regex
Regular expressions are perfect for cleaning up messy data, allowing you to turn inconsistent entries like phone numbers into a single, standard format.
import re
raw_data = ["Phone: (555) 123-4567", "Tel: 555.987.6543", "Contact: 555-789-0123"]
clean_numbers = []
for entry in raw_data:
    number = re.search(r"(\d{3})[)\s.-]+(\d{3})[.\s-]+(\d{4})", entry)
    if number:
        standardized = f"{number.group(1)}-{number.group(2)}-{number.group(3)}"
        clean_numbers.append(standardized)
print(f"Standardized phone numbers: {clean_numbers}")
This script loops through a list of raw text entries to extract and reformat phone numbers. Its power comes from a flexible regex pattern that handles multiple formats at once.
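- Three capturing groups, (\d{3}), (\d{3}), and (\d{4}), isolate the area code, prefix, and line number.
- The character set [)\s.-]+ handles the first separator, matching any run of closing parentheses, whitespace, dots, or hyphens; the second set, [.\s-]+, does the same without the parenthesis. The opening parenthesis never needs to be matched because re.search() starts the match at the first digit.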
The captured digits are then reassembled into a single, standardized format, showing how you can parse structured data from messy strings.
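A different technique, not the one used above, is to ignore separators entirely: strip every non-digit with `re.sub()` and then slice the result. A minimal sketch, assuming US-style 10-digit numbers; `normalize_phone` is a hypothetical helper name. The trade-off is that it accepts any text that happens to contain exactly ten digits, so it is looser than the pattern-based version.

```python
import re

# Compile once when the same pattern is reused many times
NON_DIGITS = re.compile(r"\D")

def normalize_phone(raw):
    """Hypothetical helper: keep only digits, then format 10-digit numbers."""
    digits = NON_DIGITS.sub("", raw)  # \D matches any non-digit character
    if len(digits) != 10:
        return None  # assumption: anything else is not a valid number
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

raw_data = ["Phone: (555) 123-4567", "Tel: 555.987.6543", "Contact: 555-789-0123"]
clean_numbers = [n for n in map(normalize_phone, raw_data) if n]
print(clean_numbers)  # ['555-123-4567', '555-987-6543', '555-789-0123']
```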
Get started with Replit
Turn your regex skills into a real tool with Replit Agent. Describe what you want to build, like “a log parser that extracts IP addresses and status codes” or “a script that validates a list of emails.”
The agent will write the code, test for errors, and deploy your application automatically. Start building with Replit and bring your idea to life.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.



