How to pivot a dataframe in Python

Learn how to pivot a dataframe in Python. This guide covers different methods, real-world applications, and common errors to help you master it.

Published on:

Tue

Mar 17, 2026

Updated on:

Tue

Mar 24, 2026

The Replit Team

ON THIS PAGE

Example H2

A dataframe pivot in Python is a crucial technique to reshape data. You can transform data from a long format to a wide format, which makes analysis much easier.

In this article, you'll learn various techniques using functions like pivot(). You will also find practical tips, real-world applications, and common debugging advice to master dataframe transformations.

Basic pivoting with `pivot()`

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], 'sales': [10, 15, 20, 25] }) pivoted = df.pivot(index='date', columns='product', values='sales') print(pivoted)--OUTPUT--product apple banana date 2023-01-01 10 15 2023-01-02 20 25

The pivot() function is the simplest way to restructure your data. It takes the unique values from one column and turns them into new columns, effectively reshaping the table from a long to a wide format.

Here’s how the arguments work together:

index='date' sets each unique date as a new row.
columns='product' creates new columns for each unique product.
values='sales' populates the table with the sales data.

This transformation makes it much easier to compare product sales side by side for any given day.

Reshaping with built-in pandas functions

While pivot() handles the basics, pandas offers more powerful functions like pivot_table(), unstack(), and melt() for tackling aggregation and advanced reshaping needs.

Using `pivot_table()` for aggregation

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], 'sales': [10, 15, 5, 25] }) pivot_table = df.pivot_table(index='date', columns='product', values='sales', aggfunc='sum') print(pivot_table)--OUTPUT--product apple banana date 2023-01-01 15 15 2023-01-02 NaN 25

When your data has duplicate entries for the index and column pairs, pivot() will fail. That’s where pivot_table() comes in—it’s designed to handle this by aggregating data. You can specify how to combine these duplicate values using the aggfunc argument.

aggfunc='sum' tells pandas to add the values. In this case, it sums the two 'apple' sales on the first day to get 15. Other common functions include 'mean' and 'count'.

You'll also notice NaN in the output. This simply indicates missing data, as there were no 'apple' sales recorded on the second day.

Using `unstack()` with multi-index DataFrames

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], 'sales': [10, 15, 20, 25] }) multi_index = df.set_index(['date', 'product'])['sales'] unstacked = multi_index.unstack() print(unstacked)--OUTPUT--product apple banana date 2023-01-01 10 15 2023-01-02 20 25

The unstack() function is your tool for pivoting a DataFrame that already has multiple index levels. It works by taking one of the index levels and turning it into columns. In the example, you first create a hierarchical index from the date and product columns using set_index().

Calling unstack() then pivots the innermost index level—in this case, product.
Each unique item in the product index becomes a new column, effectively reshaping the data.

Converting wide to long format with `melt()`

import pandas as pd wide_df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-02'], 'apple': [10, 20], 'banana': [15, 25] }) melted = pd.melt(wide_df, id_vars='date', var_name='product', value_name='sales') print(melted)--OUTPUT--date product sales 0 2023-01-01 apple 10 1 2023-01-02 apple 20 2 2023-01-01 banana 15 3 2023-01-02 banana 25

The melt() function is the inverse of pivot(), transforming data from a wide to a long format. This is incredibly useful when you need to unpivot columns back into rows for certain types of analysis or plotting.

id_vars='date' specifies the column to keep as an identifier.
var_name='product' creates a new column containing the names of the unpivoted columns—in this case, 'apple' and 'banana'.
value_name='sales' creates a column to hold the values from those unpivoted columns.

Advanced pivoting techniques

Beyond the standard functions, you can also implement custom aggregation logic, reshape data with multiple value columns, or use crosstab() for frequency counts.

Pivoting with custom aggregation functions

import pandas as pd import numpy as np df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], 'sales': [10, 15, 5, 25] }) custom_pivot = df.pivot_table( index='date', columns='product', values='sales', aggfunc={'sales': [np.sum, np.mean, np.max]} ) print(custom_pivot)--OUTPUT--sales sum mean max product apple banana apple banana apple banana date 2023-01-01 15 15 7.5 15 10 15 2023-01-02 NaN 25 NaN 25 NaN 25

You're not limited to single aggregation functions. The pivot_table() function lets you apply multiple calculations at once by passing a dictionary to the aggfunc argument. This is perfect for getting a deeper summary of your data in one go.

The dictionary key, 'sales', targets the column you want to aggregate.
The value is a list of functions—like np.sum, np.mean, and np.max—that you want to apply.

The resulting pivot table has multi-level columns, showing each calculation for every product on each date.

Handling multiple value columns

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], 'sales': [10, 15, 20, 25], 'quantity': [100, 80, 120, 90] }) multi_val_pivot = df.pivot_table(index='date', columns='product', values=['sales', 'quantity']) print(multi_val_pivot)--OUTPUT--sales quantity product apple banana apple banana date 2023-01-01 10 15 100 80 2023-01-02 20 25 120 90

You aren't limited to pivoting a single value column. The pivot_table() function can reshape a DataFrame based on multiple metrics, such as both sales and quantity, all at once.

To do this, simply pass a list of column names to the values argument, like ['sales', 'quantity'].
The result is a pivot table with multi-level columns. The top level represents the value columns (sales, quantity), and the second level shows the pivoted categories (product).

Using `crosstab()` for frequency analysis

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana', 'apple'], 'region': ['east', 'west', 'east', 'west', 'west'] }) cross_tab = pd.crosstab( index=df['date'], columns=df['product'], values=df['region'].map({'east': 1, 'west': 2}), aggfunc='sum' ) print(cross_tab)--OUTPUT--product apple banana date 2023-01-01 1 2 2023-01-02 3 2

The crosstab() function is a specialized tool for creating a cross-tabulation, which is essentially a frequency table. While it can simply count occurrences, you can also use it for more complex aggregations—making it a convenient alternative to pivot_table() when you're working with Series or arrays directly.

The index and columns arguments define the table's structure, using data from the date and product columns.
Here, values are taken from the region column after converting strings to numbers with map().
Finally, aggfunc='sum' computes the total for each date and product combination.

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.

For the pivoting techniques covered in this article, Replit Agent can turn them into production applications:

Build an interactive sales dashboard that uses pivot_table() to display monthly revenue by product category.
Create a data transformation utility that converts wide-format survey results into a long format using melt().
Deploy a market analysis tool that generates frequency tables from user behavior data with crosstab().

Describe your app idea, and Replit Agent can write the code, test it, and get it running for you.

Common errors and challenges

Pivoting DataFrames can be tricky; here are solutions to some of the most common roadblocks you'll encounter when reshaping your data.

One of the most frequent errors occurs when using pivot() on data with duplicate entries for your chosen index and columns. The function will raise a ValueError because it doesn't know how to assign multiple values to a single cell. The fix is to switch to pivot_table(), which is built to handle this by aggregating the duplicates with a function you specify in the aggfunc argument.

After pivoting, you'll often find your new DataFrame is sprinkled with NaN (Not a Number) values. This isn't an error—it just means there was no original data for that specific row and column combination. However, these missing values can disrupt calculations, so you'll likely want to handle them. You can use the fillna(0) method to replace them with zeros, which is often a sensible default for numerical data like sales figures.

If your numbers in a pivot_table() look off, the aggregation function is the first place to check. A common mistake is forgetting to set the aggfunc argument, which causes pandas to default to calculating the mean of the values. If you were expecting a total, this will give you incorrect results. Always be explicit with your aggregation—use aggfunc='sum' for totals or 'count' for frequencies to ensure your analysis is accurate.

Handling duplicate values when using `pivot()`

The pivot() function is strict, requiring unique index and column combinations. If your data has duplicate entries, like two sales records for the same product on one day, the function can't decide which value to use and will fail. The following code illustrates this common scenario.

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], # Duplicate (date, product) pair 'sales': [10, 15, 5, 25] }) # This will raise a ValueError pivoted = df.pivot(index='date', columns='product', values='sales') print(pivoted)

The code fails because the pivot() function encounters two sales values, 10 and 5, for 'apple' on the same date. It can't assign both to one cell. The following example demonstrates the correct way to handle this.

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'], 'product': ['apple', 'banana', 'apple', 'banana'], 'sales': [10, 15, 5, 25] }) # Use pivot_table instead with an aggregation function pivoted = df.pivot_table(index='date', columns='product', values='sales', aggfunc='sum') print(pivoted)

The fix is to switch to pivot_table(), which is designed to handle duplicates. You just need to tell it how to combine the values.

The aggfunc='sum' argument instructs pandas to add the duplicate entries together. In this example, it correctly sums the two 'apple' sales (10 and 5) to get 15.

This is the go-to method anytime you suspect your data might have multiple records for the same index and column pair.

Dealing with missing values in pivoted data

Pivoting often leaves you with NaN values where data is missing. This isn't an error, but it can disrupt calculations like sums or averages, leading to unexpected results in your analysis. The following code demonstrates how these missing entries can cause problems.

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02'], 'product': ['apple', 'banana', 'apple'], # Missing banana for 2023-01-02 'sales': [10, 15, 20] }) pivoted = df.pivot(index='date', columns='product', values='sales') # Calculations will be affected by NaN values total_by_date = pivoted.sum(axis=1) print(pivoted) print("Total sales by date:", total_by_date)

The code calculates daily totals, but the NaN value for 'banana' on the second day results in an incomplete sum, which can skew your analysis. The following example demonstrates how to get an accurate total.

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02'], 'product': ['apple', 'banana', 'apple'], 'sales': [10, 15, 20] }) pivoted = df.pivot(index='date', columns='product', values='sales') # Fill NaN values with 0 before calculations pivoted_filled = pivoted.fillna(0) total_by_date = pivoted_filled.sum(axis=1) print(pivoted) print("Total sales by date:", total_by_date)

The fix is to chain the fillna(0) method to your pivoted DataFrame. This simple step swaps out any NaN values for zeros, which is essential before you start running calculations.

By replacing the missing data, you ensure that functions like sum() give you an accurate total instead of a result skewed by NaN.

This is a common and necessary cleanup step whenever you pivot data that might have gaps, especially with numerical values.

Troubleshooting incorrect aggregation with `pivot_table()`

It's a common pitfall with pivot_table() to get misleading results when a single aggregation function is applied to columns that need different calculations. For example, you might want to sum quantities but average prices. The following code demonstrates this exact issue.

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'], 'product': ['apple', 'apple', 'banana', 'banana'], 'quantity': [5, 10, 15, 20], 'price': [1.0, 1.2, 0.8, 0.9] }) # Incorrect: Using mean for both quantity and price revenue_table = df.pivot_table( index='date', columns='product', values=['quantity', 'price'], aggfunc='mean' ) print(revenue_table)

The code applies aggfunc='mean' to both the quantity and price columns. This incorrectly calculates the average quantity instead of the total, skewing your results. The next example demonstrates the proper way to handle this.

import pandas as pd df = pd.DataFrame({ 'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'], 'product': ['apple', 'apple', 'banana', 'banana'], 'quantity': [5, 10, 15, 20], 'price': [1.0, 1.2, 0.8, 0.9] }) # Correct: Using sum for quantity and mean for price revenue_table = df.pivot_table( index='date', columns='product', values=['quantity', 'price'], aggfunc={'quantity': 'sum', 'price': 'mean'} ) print(revenue_table)

The fix is to pass a dictionary to the aggfunc argument. This lets you assign a specific calculation to each value column, ensuring your results are accurate. This is crucial when you're summarizing columns that need different logic, like totaling quantities while averaging prices.

The dictionary keys specify the columns (e.g., 'quantity').
The values define the function to apply (e.g., 'sum').

Real-world applications

Pivoting moves beyond a technical exercise when you apply it to real-world scenarios, transforming raw data into clear business intelligence.

Analyzing sales performance by region with `pivot()`

The pivot() function is ideal for transforming raw sales data into a clear summary, making it easy to compare quarterly revenue across different regions.

import pandas as pd # Sales data across regions and quarters sales_data = { 'region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'], 'quarter': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2'], 'revenue': [120, 105, 95, 115, 125, 110, 100, 120] } df = pd.DataFrame(sales_data) regional_performance = df.pivot(index='region', columns='quarter', values='revenue') regional_performance['Total'] = regional_performance.sum(axis=1) print(regional_performance)

This code transforms a long list of sales records into a wide, easy-to-read summary table. The pivot() function is the key, reorganizing the DataFrame so you can see performance at a glance.

The index='region' argument sets each region as a distinct row.
columns='quarter' turns unique quarters into new columns.
values='revenue' populates the new cells with the corresponding revenue data.

After pivoting, a new Total column is calculated by summing the revenues across each row with sum(axis=1).

Creating a customer segment analysis with `pivot_table()`

With pivot_table(), you can segment customer data to analyze key metrics like average spending and purchase frequency across different demographic groups.

import pandas as pd # Customer purchase data with demographics purchase_data = { 'customer_id': [1, 2, 3, 4, 5, 6, 7, 8], 'age_group': ['18-25', '26-35', '18-25', '36-45', '26-35', '36-45', '18-25', '26-35'], 'gender': ['M', 'F', 'F', 'M', 'M', 'F', 'M', 'F'], 'purchase_amount': [120, 195, 105, 240, 180, 210, 90, 170] } df = pd.DataFrame(purchase_data) segment_analysis = df.pivot_table( index='age_group', columns='gender', values='purchase_amount', aggfunc=['mean', 'count'] ) print(segment_analysis)

This code uses the pivot_table() function to transform raw purchase data into a summary table. It organizes the data by customer segments, allowing for a multi-faceted analysis in a single view.

The index='age_group' and columns='gender' arguments set up the rows and columns of the new table.
values='purchase_amount' specifies which data to analyze.
aggfunc=['mean', 'count'] is the key—it calculates both the average purchase amount and the number of transactions for each age and gender combination, creating a detailed, multi-level report.

Get started with Replit

Put your knowledge into practice by building a real tool. Tell Replit Agent to “build a dashboard that pivots sales data by region” or “create a utility that converts wide-format survey results into a long format.”

The agent will write the code, test for errors, and deploy your application for you. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.

Get started free

Get started for free

Create & deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.

Get started for free