How to pivot a dataframe in Python
Learn how to pivot a dataframe in Python. This guide covers different methods, real-world applications, and common errors to help you master it.

A dataframe pivot in Python is a crucial technique to reshape data. You can transform data from a long format to a wide format, which makes analysis much easier.
In this article, you'll learn various techniques using functions like pivot(). You will also find practical tips, real-world applications, and common debugging advice to master dataframe transformations.
Basic pivoting with pivot()
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 20, 25]
})
pivoted = df.pivot(index='date', columns='product', values='sales')
print(pivoted)--OUTPUT--product apple banana
date
2023-01-01 10 15
2023-01-02 20 25
The pivot() function is the simplest way to restructure your data. It takes the unique values from one column and turns them into new columns, effectively reshaping the table from a long to a wide format.
Here’s how the arguments work together:
index='date'sets each unique date as a new row.columns='product'creates new columns for each unique product.values='sales'populates the table with the sales data.
This transformation makes it much easier to compare product sales side by side for any given day.
Reshaping with built-in pandas functions
While pivot() handles the basics, pandas offers more powerful functions like pivot_table(), unstack(), and melt() for tackling aggregation and advanced reshaping needs.
Using pivot_table() for aggregation
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 5, 25]
})
pivot_table = df.pivot_table(index='date', columns='product', values='sales', aggfunc='sum')
print(pivot_table)--OUTPUT--product apple banana
date
2023-01-01 15 15
2023-01-02 NaN 25
When your data has duplicate entries for the index and column pairs, pivot() will fail. That’s where pivot_table() comes in—it’s designed to handle this by aggregating data. You can specify how to combine these duplicate values using the aggfunc argument.
aggfunc='sum'tells pandas to add the values. In this case, it sums the two 'apple' sales on the first day to get 15. Other common functions include'mean'and'count'.
You'll also notice NaN in the output. This simply indicates missing data, as there were no 'apple' sales recorded on the second day.
Using unstack() with multi-index DataFrames
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 20, 25]
})
multi_index = df.set_index(['date', 'product'])['sales']
unstacked = multi_index.unstack()
print(unstacked)--OUTPUT--product apple banana
date
2023-01-01 10 15
2023-01-02 20 25
The unstack() function is your tool for pivoting a DataFrame that already has multiple index levels. It works by taking one of the index levels and turning it into columns. In the example, you first create a hierarchical index from the date and product columns using set_index().
- Calling
unstack()then pivots the innermost index level—in this case,product. - Each unique item in the
productindex becomes a new column, effectively reshaping the data.
Converting wide to long format with melt()
import pandas as pd
wide_df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-02'],
'apple': [10, 20],
'banana': [15, 25]
})
melted = pd.melt(wide_df, id_vars='date', var_name='product', value_name='sales')
print(melted)--OUTPUT--date product sales
0 2023-01-01 apple 10
1 2023-01-02 apple 20
2 2023-01-01 banana 15
3 2023-01-02 banana 25
The melt() function is the inverse of pivot(), transforming data from a wide to a long format. This is incredibly useful when you need to unpivot columns back into rows for certain types of analysis or plotting.
id_vars='date'specifies the column to keep as an identifier.var_name='product'creates a new column containing the names of the unpivoted columns—in this case, 'apple' and 'banana'.value_name='sales'creates a column to hold the values from those unpivoted columns.
Advanced pivoting techniques
Beyond the standard functions, you can also implement custom aggregation logic, reshape data with multiple value columns, or use crosstab() for frequency counts.
Pivoting with custom aggregation functions
import pandas as pd
import numpy as np
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 5, 25]
})
custom_pivot = df.pivot_table(
index='date',
columns='product',
values='sales',
aggfunc={'sales': [np.sum, np.mean, np.max]}
)
print(custom_pivot)--OUTPUT--sales
sum mean max
product apple banana apple banana apple banana
date
2023-01-01 15 15 7.5 15 10 15
2023-01-02 NaN 25 NaN 25 NaN 25
You're not limited to single aggregation functions. The pivot_table() function lets you apply multiple calculations at once by passing a dictionary to the aggfunc argument. This is perfect for getting a deeper summary of your data in one go.
- The dictionary key,
'sales', targets the column you want to aggregate. - The value is a list of functions—like
np.sum,np.mean, andnp.max—that you want to apply.
The resulting pivot table has multi-level columns, showing each calculation for every product on each date.
Handling multiple value columns
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 20, 25],
'quantity': [100, 80, 120, 90]
})
multi_val_pivot = df.pivot_table(index='date', columns='product', values=['sales', 'quantity'])
print(multi_val_pivot)--OUTPUT--sales quantity
product apple banana apple banana
date
2023-01-01 10 15 100 80
2023-01-02 20 25 120 90
You aren't limited to pivoting a single value column. The pivot_table() function can reshape a DataFrame based on multiple metrics, such as both sales and quantity, all at once.
- To do this, simply pass a list of column names to the
valuesargument, like['sales', 'quantity']. - The result is a pivot table with multi-level columns. The top level represents the value columns (
sales,quantity), and the second level shows the pivoted categories (product).
Using crosstab() for frequency analysis
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana', 'apple'],
'region': ['east', 'west', 'east', 'west', 'west']
})
cross_tab = pd.crosstab(
index=df['date'],
columns=df['product'],
values=df['region'].map({'east': 1, 'west': 2}),
aggfunc='sum'
)
print(cross_tab)--OUTPUT--product apple banana
date
2023-01-01 1 2
2023-01-02 3 2
The crosstab() function is a specialized tool for creating a cross-tabulation, which is essentially a frequency table. While it can simply count occurrences, you can also use it for more complex aggregations—making it a convenient alternative to pivot_table() when you're working with Series or arrays directly.
- The
indexandcolumnsarguments define the table's structure, using data from thedateandproductcolumns. - Here,
valuesare taken from theregioncolumn after converting strings to numbers withmap(). - Finally,
aggfunc='sum'computes the total for each date and product combination.
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the pivoting techniques covered in this article, Replit Agent can turn them into production applications:
- Build an interactive sales dashboard that uses
pivot_table()to display monthly revenue by product category. - Create a data transformation utility that converts wide-format survey results into a long format using
melt(). - Deploy a market analysis tool that generates frequency tables from user behavior data with
crosstab().
Describe your app idea, and Replit Agent can write the code, test it, and get it running for you.
Common errors and challenges
Pivoting DataFrames can be tricky; here are solutions to some of the most common roadblocks you'll encounter when reshaping your data.
One of the most frequent errors occurs when using pivot() on data with duplicate entries for your chosen index and columns. The function will raise a ValueError because it doesn't know how to assign multiple values to a single cell. The fix is to switch to pivot_table(), which is built to handle this by aggregating the duplicates with a function you specify in the aggfunc argument.
After pivoting, you'll often find your new DataFrame is sprinkled with NaN (Not a Number) values. This isn't an error—it just means there was no original data for that specific row and column combination. However, these missing values can disrupt calculations, so you'll likely want to handle them. You can use the fillna(0) method to replace them with zeros, which is often a sensible default for numerical data like sales figures.
If your numbers in a pivot_table() look off, the aggregation function is the first place to check. A common mistake is forgetting to set the aggfunc argument, which causes pandas to default to calculating the mean of the values. If you were expecting a total, this will give you incorrect results. Always be explicit with your aggregation—use aggfunc='sum' for totals or 'count' for frequencies to ensure your analysis is accurate.
Handling duplicate values when using pivot()
The pivot() function is strict, requiring unique index and column combinations. If your data has duplicate entries, like two sales records for the same product on one day, the function can't decide which value to use and will fail. The following code illustrates this common scenario.
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'], # Duplicate (date, product) pair
'sales': [10, 15, 5, 25]
})
# This will raise a ValueError
pivoted = df.pivot(index='date', columns='product', values='sales')
print(pivoted)
The code fails because the pivot() function encounters two sales values, 10 and 5, for 'apple' on the same date. It can't assign both to one cell. The following example demonstrates the correct way to handle this.
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'],
'product': ['apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 5, 25]
})
# Use pivot_table instead with an aggregation function
pivoted = df.pivot_table(index='date', columns='product', values='sales', aggfunc='sum')
print(pivoted)
The fix is to switch to pivot_table(), which is designed to handle duplicates. You just need to tell it how to combine the values.
- The
aggfunc='sum'argument instructs pandas to add the duplicate entries together. In this example, it correctly sums the two 'apple' sales (10 and 5) to get 15.
This is the go-to method anytime you suspect your data might have multiple records for the same index and column pair.
Dealing with missing values in pivoted data
Pivoting often leaves you with NaN values where data is missing. This isn't an error, but it can disrupt calculations like sums or averages, leading to unexpected results in your analysis. The following code demonstrates how these missing entries can cause problems.
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02'],
'product': ['apple', 'banana', 'apple'], # Missing banana for 2023-01-02
'sales': [10, 15, 20]
})
pivoted = df.pivot(index='date', columns='product', values='sales')
# Calculations will be affected by NaN values
total_by_date = pivoted.sum(axis=1)
print(pivoted)
print("Total sales by date:", total_by_date)
The code calculates daily totals, but the NaN value for 'banana' on the second day results in an incomplete sum, which can skew your analysis. The following example demonstrates how to get an accurate total.
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02'],
'product': ['apple', 'banana', 'apple'],
'sales': [10, 15, 20]
})
pivoted = df.pivot(index='date', columns='product', values='sales')
# Fill NaN values with 0 before calculations
pivoted_filled = pivoted.fillna(0)
total_by_date = pivoted_filled.sum(axis=1)
print(pivoted)
print("Total sales by date:", total_by_date)
The fix is to chain the fillna(0) method to your pivoted DataFrame. This simple step swaps out any NaN values for zeros, which is essential before you start running calculations.
- By replacing the missing data, you ensure that functions like
sum()give you an accurate total instead of a result skewed byNaN.
This is a common and necessary cleanup step whenever you pivot data that might have gaps, especially with numerical values.
Troubleshooting incorrect aggregation with pivot_table()
It's a common pitfall with pivot_table() to get misleading results when a single aggregation function is applied to columns that need different calculations. For example, you might want to sum quantities but average prices. The following code demonstrates this exact issue.
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'product': ['apple', 'apple', 'banana', 'banana'],
'quantity': [5, 10, 15, 20],
'price': [1.0, 1.2, 0.8, 0.9]
})
# Incorrect: Using mean for both quantity and price
revenue_table = df.pivot_table(
index='date',
columns='product',
values=['quantity', 'price'],
aggfunc='mean'
)
print(revenue_table)
The code applies aggfunc='mean' to both the quantity and price columns. This incorrectly calculates the average quantity instead of the total, skewing your results. The next example demonstrates the proper way to handle this.
import pandas as pd
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'product': ['apple', 'apple', 'banana', 'banana'],
'quantity': [5, 10, 15, 20],
'price': [1.0, 1.2, 0.8, 0.9]
})
# Correct: Using sum for quantity and mean for price
revenue_table = df.pivot_table(
index='date',
columns='product',
values=['quantity', 'price'],
aggfunc={'quantity': 'sum', 'price': 'mean'}
)
print(revenue_table)
The fix is to pass a dictionary to the aggfunc argument. This lets you assign a specific calculation to each value column, ensuring your results are accurate. This is crucial when you're summarizing columns that need different logic, like totaling quantities while averaging prices.
- The dictionary keys specify the columns (e.g.,
'quantity'). - The values define the function to apply (e.g.,
'sum').
Real-world applications
Pivoting moves beyond a technical exercise when you apply it to real-world scenarios, transforming raw data into clear business intelligence.
Analyzing sales performance by region with pivot()
The pivot() function is ideal for transforming raw sales data into a clear summary, making it easy to compare quarterly revenue across different regions.
import pandas as pd
# Sales data across regions and quarters
sales_data = {
'region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
'quarter': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2'],
'revenue': [120, 105, 95, 115, 125, 110, 100, 120]
}
df = pd.DataFrame(sales_data)
regional_performance = df.pivot(index='region', columns='quarter', values='revenue')
regional_performance['Total'] = regional_performance.sum(axis=1)
print(regional_performance)
This code transforms a long list of sales records into a wide, easy-to-read summary table. The pivot() function is the key, reorganizing the DataFrame so you can see performance at a glance.
- The
index='region'argument sets each region as a distinct row. columns='quarter'turns unique quarters into new columns.values='revenue'populates the new cells with the corresponding revenue data.
After pivoting, a new Total column is calculated by summing the revenues across each row with sum(axis=1).
Creating a customer segment analysis with pivot_table()
With pivot_table(), you can segment customer data to analyze key metrics like average spending and purchase frequency across different demographic groups.
import pandas as pd
# Customer purchase data with demographics
purchase_data = {
'customer_id': [1, 2, 3, 4, 5, 6, 7, 8],
'age_group': ['18-25', '26-35', '18-25', '36-45', '26-35', '36-45', '18-25', '26-35'],
'gender': ['M', 'F', 'F', 'M', 'M', 'F', 'M', 'F'],
'purchase_amount': [120, 195, 105, 240, 180, 210, 90, 170]
}
df = pd.DataFrame(purchase_data)
segment_analysis = df.pivot_table(
index='age_group',
columns='gender',
values='purchase_amount',
aggfunc=['mean', 'count']
)
print(segment_analysis)
This code uses the pivot_table() function to transform raw purchase data into a summary table. It organizes the data by customer segments, allowing for a multi-faceted analysis in a single view.
- The
index='age_group'andcolumns='gender'arguments set up the rows and columns of the new table. values='purchase_amount'specifies which data to analyze.aggfunc=['mean', 'count']is the key—it calculates both the average purchase amount and the number of transactions for each age and gender combination, creating a detailed, multi-level report.
Get started with Replit
Put your knowledge into practice by building a real tool. Tell Replit Agent to “build a dashboard that pivots sales data by region” or “create a utility that converts wide-format survey results into a long format.”
The agent will write the code, test for errors, and deploy your application for you. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
Create & deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.


.png)
.png)