How to copy a dataframe in Python
Learn to copy a dataframe in Python. Explore various methods, tips, real-world uses, and how to fix common errors.

Copying a dataframe is a core operation in Python's data analysis workflow. It's essential to preserve original data as you experiment, which prevents unintended modifications and ensures data integrity.
In this article, we'll cover essential techniques for dataframe duplication. You'll find practical tips, real-world applications, and debugging advice to help you handle data manipulations with confidence.
Using the copy() method
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = df.copy()
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
The copy() method is the standard and most explicit way to duplicate a dataframe in pandas. As the example shows, calling df.copy() creates df_copy, a new object that’s a complete, independent replica of the original dataframe.
This is crucial because it performs a deep copy. It means any changes you make to df_copy won't affect df. If you simply used df_copy = df, you would only create a reference, and any changes to df_copy would also change df.
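To see the difference in action, here's a minimal sketch (the names alias and true_copy are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

alias = df             # plain assignment: both names point to the same object
true_copy = df.copy()  # an independent copy of the data

alias.loc[0, 'A'] = 100      # changes df as well, since alias IS df
true_copy.loc[1, 'A'] = 200  # leaves df untouched

print(df['A'].tolist())         # [100, 2, 3]
print(true_copy['A'].tolist())  # [1, 200, 3]
```

The assignment through alias shows up in df because no copy was ever made; the assignment through true_copy does not.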
Basic techniques for copying DataFrames
While copy() is the most direct method, you can also duplicate dataframes using the DataFrame() constructor or even simple slicing with [:].
Using copy(deep=True) for deep copying
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_deep_copy = df.copy(deep=True)
df.at[0, 'A'] = 100 # Modify the original
print("Original:", df)
print("Deep copy:", df_deep_copy)

--OUTPUT--

Original:      A  B
0  100  4
1    2  5
2    3  6
Deep copy:    A  B
0  1  4
1  2  5
2  3  6
The copy() method defaults to a deep copy, so setting deep=True is often redundant but makes your intent explicit. This process creates an entirely new dataframe and copies the underlying data. As the code shows, when the original df is modified, df_deep_copy remains completely untouched.
- This guarantees that your original data is safe from any changes you make to the copy, which is crucial for data integrity.
Using the DataFrame() constructor
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = pd.DataFrame(df)
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
You can also create a copy by passing your original dataframe into the DataFrame() constructor. Be careful, though: when given an existing dataframe, the constructor does not copy the underlying data by default, so the new object can share memory with the original, and modifications may propagate between the two.
- To guarantee an independent copy, pass the copy flag explicitly: pd.DataFrame(df, copy=True).
- Even then, df.copy() is generally preferred. It's more idiomatic and makes your code easier to read by clearly signaling your intent.
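Whether the constructor shares or duplicates the underlying data depends on the copy argument. One way to check, sketched here with NumPy's shares_memory:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

default_ctor = pd.DataFrame(df)            # may share the underlying buffer
forced_copy = pd.DataFrame(df, copy=True)  # always duplicates the data

# The default constructor result typically shares memory with df;
# the forced copy never does, so writes to it can't leak back.
print(np.shares_memory(df['A'].to_numpy(), default_ctor['A'].to_numpy()))
print(np.shares_memory(df['A'].to_numpy(), forced_copy['A'].to_numpy()))  # False
```

Because forced_copy owns its data, modifying it leaves df untouched.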
Using the slice notation [:]
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = df[:]
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Slicing a dataframe with [:] is another way to create a copy. This syntax, familiar from Python lists, selects all rows and returns them in a new dataframe object. It's a quick shorthand for duplication, but it comes with a few caveats you should be aware of.
- This method performs a shallow copy. While the new dataframe is a separate object, the underlying data might be shared. For simple data types, it works fine, but with complex objects in your dataframe, changes could unexpectedly affect the original.
- Using df.copy() is generally better practice because it's more explicit and less ambiguous about your intent to create a copy.
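A quick way to see the shallow nature of slicing is to check object identity and memory sharing (a sketch using NumPy's shares_memory):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
sliced = df[:]

print(sliced is df)  # False: slicing returns a new DataFrame object
# ...but, at least initially, it can share the same underlying buffer:
print(np.shares_memory(df['A'].to_numpy(), sliced['A'].to_numpy()))
```

The two objects are distinct, yet the data behind them may be the same memory, which is exactly why df.copy() is the safer choice.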
Advanced techniques for copying DataFrames
When basic copying isn't enough, you can tackle more complex scenarios with Python's copy module, selective column copying, or serialization using pickle.
Using the copy module for deep copying
import pandas as pd
import copy
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = copy.deepcopy(df)
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Python's built-in copy module provides a general-purpose way to duplicate objects, and its copy.deepcopy() function works on DataFrames. Under the hood, pandas implements __deepcopy__ by delegating to df.copy(deep=True), so the result is functionally identical to a pandas deep copy.
- While effective, using the pandas-native df.copy() is more idiomatic and clearly communicates your intent within a data analysis script.
- copy.deepcopy() is especially useful when you need to copy complex, nested Python structures, such as a dictionary of DataFrames, in a single call.
Copying with specific columns
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_subset_copy = df[['A', 'B']].copy()
print(df_subset_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Often, you only need a subset of your data. You can create a copy containing just specific columns by passing a list of their names, like df[['A', 'B']]. Chaining the .copy() method right after the selection is a crucial step.
- This creates a new, independent dataframe, so modifications to your subset won't affect the original data.
- It also explicitly tells pandas you're working with a copy, which helps you avoid the common SettingWithCopyWarning.
Using serialization with pickle
import pandas as pd
import pickle
from io import BytesIO
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
buffer = BytesIO()
pickle.dump(df, buffer)
buffer.seek(0)
df_copy = pickle.load(buffer)
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Serialization with Python's pickle module offers another powerful way to create a deep copy. The process involves converting the DataFrame into a byte stream using pickle.dump() and then reconstructing it with pickle.load(). This creates a completely independent clone of your original data.
- This technique is especially useful for more than just in-memory copying. You can use it to save a DataFrame to a file or send it over a network.
- While effective, it's more verbose than df.copy(). It's best reserved for when you need to persist or transfer the DataFrame, not just duplicate it within your script.
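If you don't need a file or network buffer, the same round trip can be done in memory with pickle.dumps and pickle.loads. Notably, unlike df.copy(), this also duplicates mutable Python objects stored in cells:

```python
import pickle
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [3, 4], [5, 6]]})

# serialize to bytes and immediately deserialize: a fully independent clone,
# including brand-new list objects in the object column 'B'
df_copy = pickle.loads(pickle.dumps(df))

df_copy.at[0, 'B'][0] = 99  # mutate a nested list in the copy
print(df.at[0, 'B'])        # [1, 2] -- the original list is untouched
```

This makes the pickle round trip a handy escape hatch when your columns hold lists or dictionaries.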
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the DataFrame copying techniques we've explored, Replit Agent can turn them into production-ready tools. You can build applications that rely on safe, independent data manipulation.
- Create a data sandbox tool that lets users apply transformations to a copy of a dataset, preserving the original source.
- Build a financial modeling dashboard that generates multiple scenarios by duplicating a base dataframe with df.copy() and applying different variables to each copy.
- Deploy a feature engineering utility that isolates specific columns to test new data features without risking the main dataset.
Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.
Common errors and challenges
Even with the right methods, copying dataframes can lead to confusing warnings and unexpected behavior if you're not careful.
The SettingWithCopyWarning is one of the most common hurdles. It’s not an error, but a heads-up from pandas that you might be modifying a temporary copy of your data instead of the original dataframe. This often happens with "chained indexing," where you select rows and columns in two separate steps, like df[df['A'] > 1]['B'] = 100.
- To fix this, use .loc to perform the selection and assignment in a single operation: df.loc[df['A'] > 1, 'B'] = 100.
- If you truly intend to modify a new dataframe, make it explicit. Create a copy first with subset = df[df['A'] > 1].copy(), then make your changes to subset.
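The .loc fix, with selection and assignment in one call, looks like this in practice:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# one .loc call selects the rows and assigns in a single step,
# so pandas writes directly to the original with no ambiguity
df.loc[df['A'] > 1, 'B'] = 100

print(df['B'].tolist())  # [4, 100, 100]
```

Because there is no intermediate object, there is nothing for pandas to warn about.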
Nested objects like lists or dictionaries inside a dataframe can also cause silent bugs. Even df.copy(deep=True) duplicates only the dataframe's internal arrays; Python objects stored in cells are copied by reference, so both the original and the copy can end up pointing to the exact same list.
- If you change an element in a list through the copied dataframe, that change will unexpectedly appear in the original dataframe, too.
- The solution is to duplicate the nested objects themselves, for example with df['col'].apply(copy.deepcopy), or to round-trip the dataframe through pickle.
Method chaining can be powerful, but it’s another source of ambiguity that the .copy() method can resolve. When you chain multiple filtering or selection operations, pandas can't always tell if the final result is a view of the original data or a new copy. Trying to assign a value at the end of a long chain often triggers the SettingWithCopyWarning.
- By inserting .copy() into the chain, you explicitly tell pandas to create a new, independent dataframe at that point.
- This makes your code's intent clear, silences the warning, and ensures your operations are performed on a distinct piece of data.
Troubleshooting the SettingWithCopyWarning
The SettingWithCopyWarning is one of pandas' most common alerts. It appears when you try to modify a slice of a DataFrame, leaving it unclear whether the change should affect the original. The following code shows this ambiguity in action.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
subset = df[df['A'] > 1] # Chained indexing: pandas can't tell if this is a view or a copy
subset['B'] = 0 # May trigger SettingWithCopyWarning
print(subset)
print(df) # Original df might be unexpectedly modified
Pandas raises this warning because the assignment to subset happens in a separate step after the initial filtering. This chained operation makes it unclear if you're modifying a temporary view or a new copy. The example below shows how to make your intent explicit.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
subset = df[df['A'] > 1].copy() # Explicitly create a copy
subset['B'] = 0 # No warning now
print(subset)
print(df) # Original df remains unchanged
The fix is to explicitly create a new dataframe by adding .copy() after your filtering operation. This signals your intent to pandas, ensuring that any modifications are made to a separate copy, not a view of the original data.
- This prevents the SettingWithCopyWarning and guarantees your original dataframe remains unchanged.
Beware of nested objects in shallow copies
A copy can introduce subtle bugs when your DataFrame contains nested objects like lists or dictionaries. pandas copies such objects by reference, so the copy and the original end up sharing them, and a change in one can unexpectedly affect the other. The following example demonstrates this problem.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})
df_copy = df.copy() # Deep copy of the frame, but nested lists are shared
df_copy.at[0, 'B'][0] = 99 # Modifies the shared nested list
print("Original:", df)
print("Copy:", df_copy) # Both show the same change!
The copy duplicates the DataFrame's structure and internal arrays, but not the lists inside: pandas' deep copy does not recurse into Python objects stored in cells, and copy.deepcopy(df) behaves the same way, since pandas delegates it to df.copy(deep=True). When you modify the list in df_copy, the change reflects in the original because both DataFrames share the same list object. The following example shows how to prevent this.
import copy
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})
df_copy = df.copy()
df_copy['B'] = df['B'].apply(copy.deepcopy) # Duplicate the nested lists themselves
df_copy.at[0, 'B'][0] = 99 # Only modifies the copy's nested list
print("Original:", df)
print("Copy:", df_copy) # Only the copy shows the change
The fix is to copy the nested objects explicitly, here by mapping copy.deepcopy over the object column. This gives the copy its own list in column 'B', so modifying it leaves the original DataFrame untouched, preserving your data's integrity.
- Always duplicate mutable cell values like lists or dictionaries explicitly (or use a pickle round trip) to prevent these unintended side effects.
Fixing method chaining with copy()
Method chaining is powerful but can create ambiguity. When you string together multiple operations, pandas might not know if you're working on a view or a copy. The placement of .copy() determines which object gets duplicated, as the code below shows.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
result = df[df['A'] > 2]['B'].copy() + 10 # copy() applied only to the final Series
print(result) # Works for reading, but the chained indexing stays ambiguous
The problem is that .copy() is called on the final Series, not the filtered DataFrame, so it does nothing to resolve the ambiguity of the chained indexing before it; assigning through a chain like this can still trigger the SettingWithCopyWarning. The following example shows the clearer structure.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
result = df[df['A'] > 2].copy()['B'] + 10 # Copy the filtered frame first
print(result) # Operates on an independent copy of the data
The solution is to place .copy() right after the filtering operation, as in df[df['A'] > 2].copy(). It's a clear signal to pandas to create a new, independent DataFrame at that exact moment, which resolves the ambiguity of the method chain.
- It ensures that any following operations, like selecting column 'B' and adding 10, are performed on a separate copy.
- This prevents the SettingWithCopyWarning and guarantees your original data remains untouched.
Real-world applications
Now that you can navigate the common challenges, you can apply the copy() method confidently in real-world data analysis.
Using copy() in data preprocessing workflows
The copy() method is a critical tool in data preprocessing, as it lets you safely apply transformations like imputation to a duplicate dataset while preserving the integrity of your original raw data.
import pandas as pd
# Original dataset with missing values
raw_data = pd.DataFrame({'age': [25, 30, None, 40], 'income': [50000, None, 75000, 90000]})
processed_data = raw_data.copy()
# Apply preprocessing to the copy only
processed_data.fillna(round(processed_data.mean()), inplace=True)
print("Original data:\n", raw_data)
print("\nProcessed data:\n", processed_data)
This example demonstrates a standard data cleaning workflow. A new DataFrame, processed_data, is created as an exact duplicate of raw_data using the copy() method. This step is essential for protecting your original dataset from any changes.
- The fillna() method is then called on the copy to replace any missing values with the rounded mean of their respective columns.
- Since all modifications happen on processed_data, the raw_data DataFrame is left completely untouched, ensuring your source data remains pristine for other analyses.
Creating data snapshots with copy() for A/B test analysis
In A/B test analysis, the copy() method lets you create separate data snapshots for each group, making it easy to compare their performance without affecting the original dataset.
import pandas as pd
# Sample A/B test results
ab_data = pd.DataFrame({
'user_id': range(1, 7),
'group': ['A', 'B', 'A', 'B', 'A', 'B'],
'conversion': [1, 0, 0, 1, 0, 1]
})
# Create separate copies for each test group
group_a = ab_data[ab_data['group'] == 'A'].copy()
group_b = ab_data[ab_data['group'] == 'B'].copy()
# Calculate conversion rates
a_conversion_rate = group_a['conversion'].mean()
b_conversion_rate = group_b['conversion'].mean()
print(f"Group A conversion rate: {a_conversion_rate:.2f}")
print(f"Group B conversion rate: {b_conversion_rate:.2f}")
print(f"Lift: {(b_conversion_rate - a_conversion_rate) / a_conversion_rate:.2%}")
This code demonstrates a common workflow for analyzing A/B test results. It starts by filtering the main ab_data DataFrame to create two separate copies—one for group_a and another for group_b. Using .copy() ensures each segment is independent.
- With the data cleanly separated, it's easy to calculate the average conversion rate for each group by applying the .mean() method to the conversion column.
- Finally, the code prints the individual conversion rates and the overall lift, which measures the performance difference between the two groups.
Get started with Replit
Put your new skills to work. Tell Replit Agent: "Build a data sandbox that uses copy() to let users test transformations safely," or "Create a scenario modeling tool that duplicates a base dataframe for analysis."
Replit Agent writes the code, tests for errors, and deploys your application directly from your browser. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.