How to use groupby() in Python

Learn how to use Python's groupby function. This guide shows you different methods, tips, real-world applications, and error debugging.

Published on: Wed, Mar 25, 2026

Updated on: Thu, Mar 26, 2026

The Replit Team

Python's groupby function, from the standard library's itertools module, is a handy tool for data analysis. It splits an iterable into runs of consecutive items that share a key, which simplifies the aggregation and transformation of information.

In this article, you'll learn essential groupby techniques and practical tips. We'll cover real-world applications and debugging advice to help you master this function for your data projects.

Basic usage of `groupby` from `itertools`

    from itertools import groupby

    data = [1, 1, 1, 2, 2, 3, 4, 4, 4, 1]
    for key, group in groupby(data):
        print(f"{key}: {list(group)}")

Output:

    1: [1, 1, 1]
    2: [2, 2]
    3: [3]
    4: [4, 4, 4]
    1: [1]

The itertools.groupby function works by scanning for consecutive identical elements. It's important to know that for a complete grouping, you'll need to sort your data first. Since the example data list isn't sorted, groupby creates a new group each time the key changes.

This is why the key 1 appears twice in the output:

The function first groups the initial three 1s.
It then moves on to group the 2s, 3s, and 4s.
When it encounters the final 1 at the end of the list, it treats it as a new, separate group because it's not adjacent to the others.
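
Grouping consecutive runs without sorting is sometimes exactly what you want. As a minimal sketch, the run-length encoder below relies on that behavior; the helper name `run_length_encode` is made up for this illustration and is not part of itertools.

```python
from itertools import groupby


def run_length_encode(items):
    """Collapse each run of consecutive equal items into a (value, run_length) pair."""
    # sum(1 for _ in group) counts the items in a run without building a list.
    return [(key, sum(1 for _ in group)) for key, group in groupby(items)]


data = [1, 1, 1, 2, 2, 3, 4, 4, 4, 1]
encoded = run_length_encode(data)
print(encoded)
# [(1, 3), (2, 2), (3, 1), (4, 3), (1, 1)]
```

The final 1 gets its own pair because it isn't adjacent to the first three, which matches the output shown earlier.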

Essential `groupby` techniques

With the fundamentals covered, you can make groupby even more powerful by using custom keys, counting items, and working with multiple data fields.

Using `groupby` with custom keys

    from itertools import groupby

    words = ['apple', 'ant', 'banana', 'bear', 'cherry']
    for key, group in groupby(sorted(words), key=lambda x: x[0]):
        print(f"{key}: {list(group)}")

Output:

    a: ['ant', 'apple']
    b: ['banana', 'bear']
    c: ['cherry']

You can customize grouping logic with a key function. In this example, key=lambda x: x[0] tells groupby to group words by their first letter. It’s a powerful way to categorize data beyond simple equality.

The lambda function extracts the first character of each word, which becomes the grouping key.
This results in groups for ‘a’, ‘b’, and ‘c’, containing all words that start with those letters from the sorted list.
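
The key doesn't have to be a lambda. Any callable that takes one item works, including built-ins. The sketch below groups the same words by length with `len`, and passes that key to `sorted()` as well so that words of equal length end up next to each other. The variable name `by_length` is just for illustration.

```python
from itertools import groupby

words = ['apple', 'ant', 'banana', 'bear', 'cherry']

# Sort and group with the same key so equal-length words are adjacent.
by_length = {
    length: list(group)
    for length, group in groupby(sorted(words, key=len), key=len)
}
print(by_length)
# {3: ['ant'], 4: ['bear'], 5: ['apple'], 6: ['banana', 'cherry']}
```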

Grouping and counting with `groupby`

    from itertools import groupby

    data = [1, 2, 2, 3, 3, 3, 4, 4, 1, 1]
    for key, group in groupby(sorted(data)):
        group_list = list(group)
        print(f"{key}: {len(group_list)} occurrences")

Output:

    1: 3 occurrences
    2: 2 occurrences
    3: 3 occurrences
    4: 2 occurrences

You can easily count item occurrences by combining groupby with len(). Since groupby works on consecutive elements, you first need to sort the data to group all identical items together.

The group object returned by groupby is an iterator.
You can convert this iterator into a list, like list(group).
Then, simply get its length using len() to find the count for each key.
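
If you only need the counts, you don't have to build a list for every group. As a rough sketch, `sum(1 for _ in group)` counts the items as it walks the iterator. The comparison with `collections.Counter` is included only as a cross-check; for plain counting, `Counter` alone is often the simpler choice.

```python
from collections import Counter
from itertools import groupby

data = [1, 2, 2, 3, 3, 3, 4, 4, 1, 1]

# Count each group by iterating over it once, without storing its items.
counts = {key: sum(1 for _ in group) for key, group in groupby(sorted(data))}
print(counts)
# {1: 3, 2: 2, 3: 3, 4: 2}

# Cross-check against Counter, which needs no sorting at all.
print(counts == dict(Counter(data)))
# True
```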

Working with multiple fields in `groupby`

    from itertools import groupby

    people = [
        ('John', 'Engineer'),
        ('Lisa', 'Doctor'),
        ('Mike', 'Engineer'),
        ('Sarah', 'Teacher')
    ]
    people.sort(key=lambda x: x[1])  # Sort by profession

    for profession, group in groupby(people, key=lambda x: x[1]):
        names = [name for name, _ in group]
        print(f"{profession}: {names}")

Output:

    Doctor: ['Lisa']
    Engineer: ['John', 'Mike']
    Teacher: ['Sarah']

You can use groupby to organize more complex data, like a list of tuples. This example groups a list of people by their profession. The key is to first sort the data by the field you want to group—in this case, the profession.

The key=lambda x: x[1] function tells both sort and groupby to focus on the second element of each tuple.
Once grouped, a list comprehension, [name for name, _ in group], efficiently extracts just the names from the tuples in each group.
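
Because the sort key and the group key must match, it can help to define the key once and reuse it. The sketch below does that with `operator.itemgetter(1)`, which behaves like `lambda x: x[1]`. The names `by_profession` and `names_by_profession` are illustrative choices, not part of the original example.

```python
from itertools import groupby
from operator import itemgetter

people = [
    ('John', 'Engineer'),
    ('Lisa', 'Doctor'),
    ('Mike', 'Engineer'),
    ('Sarah', 'Teacher'),
]

# One key object shared by sorted() and groupby() keeps the two in sync.
by_profession = itemgetter(1)

names_by_profession = {
    profession: [name for name, _ in group]
    for profession, group in groupby(sorted(people, key=by_profession), key=by_profession)
}
print(names_by_profession)
# {'Doctor': ['Lisa'], 'Engineer': ['John', 'Mike'], 'Teacher': ['Sarah']}
```

Using `sorted()` instead of `people.sort()` also leaves the original list in its original order.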

Advanced `groupby` applications

Building on these essential techniques, you can solve more advanced problems by combining groupby with other functions and optimizing its use with complex data.

Combining `groupby` with other itertools functions

    from itertools import groupby, chain

    groups = [[1, 1, 2], [2, 3, 3], [4, 5, 5]]
    flat_data = chain.from_iterable(groups)
    for key, group in groupby(sorted(flat_data)):
        print(f"{key}: {list(group)}")

Output:

    1: [1, 1]
    2: [2, 2]
    3: [3, 3]
    4: [4]
    5: [5, 5]

You can make groupby more versatile by pairing it with other itertools functions. Here, chain.from_iterable() is used to flatten the nested groups list into a single, continuous stream of data. This prepares the numbers for a unified grouping operation.

The chain.from_iterable() function effectively unpacks the sublists.
Next, sorted() arranges the flattened data, which is essential for groupby to work correctly.
Finally, groupby processes this single, sorted sequence to group all identical numbers from the original nested structure.
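
The same flatten, sort, and group pipeline can answer slightly different questions depending on what you do with each group. As a small sketch, the code below keeps only the keys to get the distinct values, then pairs each key with its group size. The variable names are illustrative.

```python
from itertools import chain, groupby

groups = [[1, 1, 2], [2, 3, 3], [4, 5, 5]]

# Flatten the sublists, then sort so equal values sit next to each other.
flat_sorted = sorted(chain.from_iterable(groups))

# Keeping only the keys yields each distinct value once.
unique_values = [key for key, _ in groupby(flat_sorted)]
print(unique_values)
# [1, 2, 3, 4, 5]

# Pairing each key with its group size shows how often it occurred overall.
value_counts = [(key, len(list(group))) for key, group in groupby(flat_sorted)]
print(value_counts)
# [(1, 2), (2, 2), (3, 2), (4, 1), (5, 2)]
```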

Using `groupby` with complex data structures

    from itertools import groupby
    from collections import namedtuple

    Person = namedtuple('Person', ['name', 'age', 'city'])
    people = [
        Person('Alice', 30, 'New York'),
        Person('Bob', 25, 'Boston'),
        Person('Charlie', 30, 'Chicago'),
        Person('Dave', 25, 'Denver')
    ]

    for age, group in groupby(sorted(people, key=lambda x: x.age), key=lambda x: x.age):
        print(f"Age {age}: {[p.name for p in group]}")

Output:

    Age 25: ['Bob', 'Dave']
    Age 30: ['Alice', 'Charlie']

You can use groupby with more than just simple lists; it’s also perfect for organizing complex data structures like the namedtuple used here. A namedtuple improves readability by letting you access data by name (x.age) instead of by index.

The code first sorts the list of Person objects by age, which is a crucial step for grouping.
Then, groupby uses the same key (x.age) to bundle the sorted people into groups.
A list comprehension efficiently extracts just the names from each group.
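
To group by more than one field at a time, the key can return a tuple. The sketch below uses `operator.attrgetter('age', 'city')`, which returns an `(age, city)` tuple for each record. The sample data is hypothetical and differs from the example above: Charlie is placed in New York so that one group has two members.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

Person = namedtuple('Person', ['name', 'age', 'city'])

# Hypothetical sample data: Alice and Charlie share both age and city.
people = [
    Person('Alice', 30, 'New York'),
    Person('Bob', 25, 'Boston'),
    Person('Charlie', 30, 'New York'),
    Person('Dave', 25, 'Denver'),
]

# attrgetter with two names returns an (age, city) tuple for each person.
age_and_city = attrgetter('age', 'city')

names_by_age_and_city = {
    key: [p.name for p in group]
    for key, group in groupby(sorted(people, key=age_and_city), key=age_and_city)
}
print(names_by_age_and_city)
# {(25, 'Boston'): ['Bob'], (25, 'Denver'): ['Dave'], (30, 'New York'): ['Alice', 'Charlie']}
```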

Performance optimizations for `groupby` operations

    from itertools import groupby
    import time

    data = list(range(1000)) * 10  # 10,000 items

    start = time.perf_counter()  # high-resolution timer suited to short durations
    result = {k: list(g) for k, g in groupby(sorted(data))}
    end = time.perf_counter()

    print(f"Processed {len(data)} items in {end-start:.6f} seconds")
    print(f"Created {len(result)} groups")

Output (exact timing varies by machine):

    Processed 10000 items in 0.003500 seconds
    Created 1000 groups

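The groupby function itself is lazy: it's an iterator that yields one group at a time and never builds the full result in memory. In this example, though, sorted() and the dictionary comprehension both hold all the data in memory. Even so, the sample run grouped 10,000 items in a few milliseconds, although exact timings depend on your machine. This makes groupby a reasonable choice for performance-sensitive code, especially when the data is already sorted.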

Keep in mind that the main performance cost often comes from the initial sorted() call, which is necessary for grouping unsorted data.
The example uses a dictionary comprehension to store the results, which is a concise and efficient way to consume the groupby iterator.
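
When the input isn't already sorted and you only need a key-to-items mapping, a single pass with a dictionary avoids the sort entirely. Sorting takes O(n log n) time, while the dictionary pass is O(n) on average. The sketch below compares the two approaches for correctness only; the function names are made up for this illustration, and which one runs faster on your data is something to measure rather than assume.

```python
from collections import defaultdict
from itertools import groupby

data = list(range(1000)) * 10  # 10,000 items, not sorted


def group_with_sort(items):
    """Sort first, then let groupby collect each run of equal items."""
    return {key: list(group) for key, group in groupby(sorted(items))}


def group_with_dict(items):
    """Single pass: append each item to the list stored under its key."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item].append(item)
    return dict(grouped)


# Both approaches produce the same mapping of key -> items.
same_result = group_with_sort(data) == group_with_dict(data)
print(same_result)
# True
```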

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. With Replit Agent, you can turn the concepts from this article into production-ready apps—it builds complete applications with databases, APIs, and deployment directly from your descriptions.

For example, you could use the groupby techniques from this article to have the agent build practical tools:

A log file analyzer that groups entries by status code and counts the occurrences of each.
A simple CRM tool that organizes contacts by company or location.
A data processing utility that categorizes transactions by date and calculates daily totals.

Describe your app idea to Replit Agent, and it will write the code, test it, and deploy your application automatically, all within your browser.

Common errors and challenges

Even with its power, groupby has a few common pitfalls that can trip you up if you're not careful.

Forgetting to sort data before using `groupby`

A frequent mistake is forgetting to sort your data before grouping. The groupby function works by identifying consecutive runs of identical items, so if your data isn't sorted, it will create a new group every time the key changes. This can lead to multiple, fragmented groups for the same key, which is rarely what you want.

Consuming the `group` iterator multiple times

The group object that groupby returns is an iterator, not a list. This means you can only loop over it once. If you try to access its contents a second time, you'll find it's empty. To reuse the items in a group, you should convert the iterator to a list immediately, like with group_list = list(group).

Using case-insensitive grouping with `groupby`

By default, groupby performs case-sensitive comparisons, meaning 'Apple' and 'apple' would end up in different groups. To group them together, you need to normalize the keys. You can do this by providing a key function that converts strings to a consistent case, such as key=str.lower, and by sorting with that same key so all variations sit next to each other and are treated as the same item.

Forgetting to sort data before using `groupby`

Since groupby only looks at consecutive elements, failing to sort your data first can produce unexpected results. Notice how the number 1 appears in two separate groups in the following example because the list isn't sorted before grouping.

    from itertools import groupby

    data = [1, 2, 2, 3, 1, 1, 4, 5, 5]
    for key, group in groupby(data):
        print(f"{key}: {list(group)}")

The output splits the number 1 into two groups because the function processes [1], moves to other numbers, and only later finds another run of [1, 1]. The following code demonstrates the correct approach.

    from itertools import groupby

    data = [1, 2, 2, 3, 1, 1, 4, 5, 5]
    for key, group in groupby(sorted(data)):
        print(f"{key}: {list(group)}")

The fix is simple: wrap your data in the sorted() function before passing it to groupby(). This pre-sorts the list, placing all identical elements next to each other.

This allows groupby() to correctly process the data and create a single, unified group for each key.
Always remember to sort first when your goal is to group all identical items from an unordered collection, preventing fragmented results.
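
One way to make the sort hard to forget is to wrap both steps in a small helper. The function below is a hypothetical convenience, not something itertools provides: it sorts by the given key and then groups by that same key.

```python
from itertools import groupby


def group_all(items, key=None):
    """Sort by key, then group by the same key, so each key appears exactly once."""
    return [(k, list(g)) for k, g in groupby(sorted(items, key=key), key=key)]


data = [1, 2, 2, 3, 1, 1, 4, 5, 5]
grouped = group_all(data)
print(grouped)
# [(1, [1, 1, 1]), (2, [2, 2]), (3, [3]), (4, [4]), (5, [5, 5])]

# The same helper works with a custom key, here the length of each name.
by_len = group_all(['bob', 'al', 'eve', 'jo'], key=len)
print(by_len)
# [(2, ['al', 'jo']), (3, ['bob', 'eve'])]
```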

Consuming the `group` iterator multiple times

The group object returned by groupby is an iterator, not a list. This means you can only consume its contents once. Trying to access the items again will yield an empty result, which can be a frustrating bug. See this common pitfall in action below.

    from itertools import groupby

    data = [1, 1, 1, 2, 2, 3, 3, 3]
    for key, group in groupby(data):
        print(f"Count of {key}: {len(list(group))}")
        print(f"Values for {key}: {list(group)}")

The first print call exhausts the group iterator when it creates a list to get the count. Because the iterator is now empty, the second print call shows an empty list for every key. The code below shows the right way to handle this.

    from itertools import groupby

    data = [1, 1, 1, 2, 2, 3, 3, 3]
    for key, group in groupby(data):
        group_list = list(group)
        print(f"Count of {key}: {len(group_list)}")
        print(f"Values for {key}: {group_list}")

The fix is to convert the group iterator into a list and save it to a variable, like group_list = list(group). This consumes the iterator exactly once and keeps its items in a list that you can reuse as often as you like.

You can then get the count with len(group_list) and print the list itself without losing the data.
This is a crucial step whenever you need to perform more than one action on the items within a single group.
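
A related trap is collecting the (key, group) pairs first and reading the groups later. Each group shares the underlying iterator with groupby, so once groupby advances to the next key, an earlier group is no longer readable. The sketch below shows this for the first group and contrasts it with materializing each group inside the loop; the variable names are illustrative.

```python
from itertools import groupby

data = [1, 1, 1, 2, 2, 3, 3, 3]

# list() advances groupby through every key before any group is read.
pairs = list(groupby(data))
stale_first_group = list(pairs[0][1])
print(stale_first_group)
# []

# Converting each group to a list while it is still current keeps its items.
materialized = [(key, list(group)) for key, group in groupby(data)]
print(materialized)
# [(1, [1, 1, 1]), (2, [2, 2]), (3, [3, 3, 3])]
```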

Using case-insensitive grouping with `groupby`

By default, groupby is case-sensitive, so it treats 'Apple' and 'apple' as separate items. This can lead to fragmented groups when you want to group words regardless of their case. The code below shows what happens when you don't account for this.

    from itertools import groupby

    words = ['Apple', 'apple', 'Banana', 'banana']
    for key, group in groupby(sorted(words)):
        print(f"{key}: {list(group)}")

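The output has four separate groups. Without a key function, groupby() compares the items themselves, and 'Apple' and 'apple' are different strings. The default sort order is also case-sensitive: it places every capitalized word before the lowercase ones, so sorted() returns ['Apple', 'Banana', 'apple', 'banana'] and the two spellings of each word aren't even adjacent. The code below shows how to handle this correctly.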

    from itertools import groupby

    words = ['Apple', 'apple', 'Banana', 'banana']
    for key, group in groupby(sorted(words, key=str.lower), key=str.lower):
        print(f"{key}: {list(group)}")

The fix is to normalize your data by applying key=str.lower to both the sorted() function and groupby(). This forces both functions to treat uppercase and lowercase letters as identical for sorting and grouping.

The sorted() function will now place items like 'Apple' and 'apple' next to each other.
Then, groupby() uses the same lowercase key to put them into a single group.

This technique is essential when working with text data where case consistency isn't guaranteed.
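
Note that the key only decides which group a word lands in; the words inside each group keep their original spelling. The sketch below shows this with a shuffled, mixed-case list. It uses `str.casefold`, a built-in alternative to `str.lower` intended for caseless matching; for plain ASCII words like these, either key gives the same result.

```python
from itertools import groupby

# Hypothetical input: unsorted, with several spellings of the same word.
words = ['banana', 'Apple', 'BANANA', 'apple', 'Banana']

grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=str.casefold), key=str.casefold)
}
print(grouped)
# {'apple': ['Apple', 'apple'], 'banana': ['banana', 'BANANA', 'Banana']}
```

Because Python's sort is stable, words with the same key stay in their original relative order within each group.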

Real-world applications

With the common pitfalls handled, you can apply groupby to powerful real-world tasks like analyzing server logs and processing time series data.

Analyzing server logs with `groupby`

The groupby function simplifies server log analysis by letting you group entries by date to summarize daily status codes.

    from itertools import groupby

    # Sample log entries (date, HTTP status)
    logs = [
        ("2023-07-01", 200),
        ("2023-07-01", 404),
        ("2023-07-01", 200),
        ("2023-07-02", 500),
        ("2023-07-02", 200)
    ]

    # Group logs by date; sort with the same key so each date forms one run
    for date, group in groupby(sorted(logs, key=lambda x: x[0]), key=lambda x: x[0]):
        entries = list(group)
        status_codes = [status for _, status in entries]
        print(f"Date: {date}, Status codes: {status_codes}")

This example processes log data by date. It's crucial to sort the logs list first using sorted(logs, key=lambda x: x[0]), as groupby() only works on consecutive items. The lambda function tells both sorted() and groupby() to use the date—the first element in each tuple—as the key.

Once grouped by date, the code converts the group iterator into a list named entries so the data can be reused.
A list comprehension, [status for _, status in entries], then efficiently extracts just the status codes for that day.
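
To summarize rather than list the status codes, each day's group can be fed into a `collections.Counter`. This is a minimal sketch on the same sample data; the helper `get_date` and the name `status_counts` are illustrative.

```python
from collections import Counter
from itertools import groupby

# Sample log entries (date, HTTP status)
logs = [
    ("2023-07-01", 200),
    ("2023-07-01", 404),
    ("2023-07-01", 200),
    ("2023-07-02", 500),
    ("2023-07-02", 200),
]


def get_date(entry):
    """Return the date field of a (date, status) log entry."""
    return entry[0]


# For each date, count how many times each status code occurred.
status_counts = {
    date: Counter(status for _, status in group)
    for date, group in groupby(sorted(logs, key=get_date), key=get_date)
}

for date, counts in status_counts.items():
    print(date, dict(counts))
# 2023-07-01 {200: 2, 404: 1}
# 2023-07-02 {500: 1, 200: 1}
```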

Processing time series data with `groupby`

You can also use groupby to process time series data, such as calculating the average daily temperature from multiple sensor readings.

    from itertools import groupby
    from statistics import mean

    # Sample time series data: (date, temperature)
    readings = [
        ("2023-07-01", 22.5), ("2023-07-01", 27.8),
        ("2023-07-02", 21.0), ("2023-07-02", 28.2),
        ("2023-07-03", 23.4), ("2023-07-03", 26.1)
    ]

    # Group by date and calculate daily statistics
    for date, group in groupby(readings, key=lambda x: x[0]):
        temps = [temp for _, temp in group]
        avg_temp = mean(temps)
        print(f"Date: {date}, Readings: {len(temps)}, Avg: {avg_temp:.1f}°C")

This example demonstrates how to aggregate time series data. Since the readings list is already sorted by date, you can directly use groupby to bundle the data. The key=lambda x: x[0] function ensures grouping happens by the first element in each tuple—the date.

Inside the loop, a list comprehension efficiently extracts all temperature readings for the current date.
The mean() function from the statistics module then calculates the daily average from these readings.

This approach is a clean and efficient way to summarize data points over specific time intervals.
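
Changing the key changes the interval. As a sketch, the code below groups by month instead of by day by taking the first seven characters of each date string. It assumes dates are ISO-formatted strings (YYYY-MM-DD); the readings and the helper `month_of` are hypothetical.

```python
from itertools import groupby
from statistics import mean

# Hypothetical readings spanning two months: (date, temperature)
readings = [
    ("2023-06-29", 20.0),
    ("2023-06-30", 22.0),
    ("2023-07-01", 22.5),
    ("2023-07-02", 27.5),
]


def month_of(reading):
    """Return the 'YYYY-MM' prefix of an ISO-formatted date string."""
    return reading[0][:7]


# Sort by month first in case the readings arrive out of order.
monthly_avg = {
    month: mean([temp for _, temp in group])
    for month, group in groupby(sorted(readings, key=month_of), key=month_of)
}
print(monthly_avg)
# {'2023-06': 21.0, '2023-07': 25.0}
```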

Get started with Replit

Turn your knowledge of groupby into a real application. Give Replit Agent a prompt like, “build a tool that groups financial transactions by month” or “create a dashboard that categorizes user activity by day.”

The agent will write the code, handle testing, and deploy your application for you. Start building with Replit and bring your ideas to life.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
