How to copy a dataframe in Python
Learn to copy a dataframe in Python. Explore various methods, tips, real-world uses, and how to fix common errors.

Copying a dataframe is a core operation in Python's data analysis workflow. It's essential to preserve original data as you experiment, which prevents unintended modifications and ensures data integrity.
In this article, we'll cover essential techniques for dataframe duplication. You'll find practical tips, real-world applications, and debugging advice to help you handle data manipulations with confidence.
Using the copy() method
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = df.copy()
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
The copy() method is the standard and most explicit way to duplicate a dataframe in pandas. As the example shows, calling df.copy() creates df_copy, a new object that’s a complete, independent replica of the original dataframe.
This is crucial because it performs a deep copy. It means any changes you make to df_copy won't affect df. If you simply used df_copy = df, you would only create a reference, and any changes to df_copy would also change df.
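To see the difference in action, here's a minimal sketch (the names alias and true_copy are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

alias = df             # plain assignment: both names point to the same object
true_copy = df.copy()  # an independent copy of the data

alias.loc[0, 'A'] = 100      # changes df as well, since alias IS df
true_copy.loc[1, 'A'] = 200  # leaves df untouched

print(df['A'].tolist())         # [100, 2, 3]
print(true_copy['A'].tolist())  # [1, 200, 3]
```

The assignment through alias shows up in df because no copy was ever made; the assignment through true_copy does not.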
Basic techniques for copying DataFrames
While copy() is the most direct method, you can also duplicate dataframes using the DataFrame() constructor or even simple slicing with [:].
Using copy(deep=True) for deep copying
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_deep_copy = df.copy(deep=True)
df.at[0, 'A'] = 100 # Modify the original
print("Original:", df)
print("Deep copy:", df_deep_copy)

--OUTPUT--

Original:      A  B
0  100  4
1    2  5
2    3  6
Deep copy:    A  B
0  1  4
1  2  5
2  3  6
The copy() method defaults to a deep copy, so setting deep=True is often redundant but makes your intent explicit. This process creates an entirely new dataframe and copies the underlying data. As the code shows, when the original df is modified, df_deep_copy remains completely untouched.
- This guarantees that your original data is safe from any changes you make to the copy, which is crucial for data integrity.
Using the DataFrame() constructor
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = pd.DataFrame(df)
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
You can also create a copy by passing your original dataframe into the DataFrame() constructor. Be careful, though: when given an existing dataframe, the constructor does not copy the underlying data by default, so the new object can share memory with the original, and modifications may propagate between the two.
- To guarantee an independent copy, pass the copy flag explicitly: pd.DataFrame(df, copy=True).
- Even then, df.copy() is generally preferred. It's more idiomatic and makes your code easier to read by clearly signaling your intent.
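Whether the constructor shares or duplicates the underlying data depends on the copy argument. One way to check, sketched here with NumPy's shares_memory:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

default_ctor = pd.DataFrame(df)            # may share the underlying buffer
forced_copy = pd.DataFrame(df, copy=True)  # always duplicates the data

# The default constructor result typically shares memory with df;
# the forced copy never does, so writes to it can't leak back.
print(np.shares_memory(df['A'].to_numpy(), default_ctor['A'].to_numpy()))
print(np.shares_memory(df['A'].to_numpy(), forced_copy['A'].to_numpy()))  # False
```

Because forced_copy owns its data, modifying it leaves df untouched.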
Using the slice notation [:]
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = df[:]
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Slicing a dataframe with [:] is another way to create a copy. This syntax, familiar from Python lists, selects all rows and returns them in a new dataframe object. It's a quick shorthand for duplication, but it comes with a few caveats you should be aware of.
- This method performs a shallow copy. While the new dataframe is a separate object, the underlying data might be shared. For simple data types, it works fine, but with complex objects in your dataframe, changes could unexpectedly affect the original.
- Using df.copy() is generally better practice because it's more explicit and less ambiguous about your intent to create a copy.
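A quick way to see the shallow nature of slicing is to check object identity and memory sharing (a sketch using NumPy's shares_memory):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
sliced = df[:]

print(sliced is df)  # False: slicing returns a new DataFrame object
# ...but, at least initially, it can share the same underlying buffer:
print(np.shares_memory(df['A'].to_numpy(), sliced['A'].to_numpy()))
```

The two objects are distinct, yet the data behind them may be the same memory, which is exactly why df.copy() is the safer choice.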
Advanced techniques for copying DataFrames
When basic copying isn't enough, you can tackle more complex scenarios with Python's copy module, selective column copying, or serialization using pickle.
Using the copy module for deep copying
import pandas as pd
import copy
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = copy.deepcopy(df)
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Python's built-in copy module provides a general-purpose way to duplicate objects, and its copy.deepcopy() function works on DataFrames. Under the hood, pandas implements __deepcopy__ by delegating to df.copy(deep=True), so the result is functionally identical to a pandas deep copy.
- While effective, using the pandas-native df.copy() is more idiomatic and clearly communicates your intent within a data analysis script.
- copy.deepcopy() is especially useful when you need to copy complex, nested Python structures, such as a dictionary of DataFrames, in a single call.
Copying with specific columns
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_subset_copy = df[['A', 'B']].copy()
print(df_subset_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Often, you only need a subset of your data. You can create a copy containing just specific columns by passing a list of their names, like df[['A', 'B']]. Chaining the .copy() method right after the selection is a crucial step.
- This creates a new, independent dataframe, so modifications to your subset won't affect the original data.
- It also explicitly tells pandas you're working with a copy, which helps you avoid the common SettingWithCopyWarning.
Using serialization with pickle
import pandas as pd
import pickle
from io import BytesIO
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
buffer = BytesIO()
pickle.dump(df, buffer)
buffer.seek(0)
df_copy = pickle.load(buffer)
print(df_copy)

--OUTPUT--

   A  B
0  1  4
1  2  5
2  3  6
Serialization with Python's pickle module offers another powerful way to create a deep copy. The process involves converting the DataFrame into a byte stream using pickle.dump() and then reconstructing it with pickle.load(). This creates a completely independent clone of your original data.
- This technique is especially useful for more than just in-memory copying. You can use it to save a DataFrame to a file or send it over a network.
- While effective, it's more verbose than df.copy(). It's best reserved for when you need to persist or transfer the DataFrame, not just duplicate it within your script.
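If you don't need a file or network buffer, the same round trip can be done in memory with pickle.dumps and pickle.loads. Notably, unlike df.copy(), this also duplicates mutable Python objects stored in cells:

```python
import pickle
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [3, 4], [5, 6]]})

# serialize to bytes and immediately deserialize: a fully independent clone,
# including brand-new list objects in the object column 'B'
df_copy = pickle.loads(pickle.dumps(df))

df_copy.at[0, 'B'][0] = 99  # mutate a nested list in the copy
print(df.at[0, 'B'])        # [1, 2] -- the original list is untouched
```

This makes the pickle round trip a handy escape hatch when your columns hold lists or dictionaries.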
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the DataFrame copying techniques we've explored, Replit Agent can turn them into production-ready tools. You can build applications that rely on safe, independent data manipulation.
- Create a data sandbox tool that lets users apply transformations to a copy of a dataset, preserving the original source.
- Build a financial modeling dashboard that generates multiple scenarios by duplicating a base dataframe with df.copy() and applying different variables to each copy.
- Deploy a feature engineering utility that isolates specific columns to test new data features without risking the main dataset.
Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.
Common errors and challenges
Even with the right methods, copying dataframes can lead to confusing warnings and unexpected behavior if you're not careful.
The SettingWithCopyWarning is one of the most common hurdles. It’s not an error, but a heads-up from pandas that you might be modifying a temporary copy of your data instead of the original dataframe. This often happens with "chained indexing," where you select rows and columns in two separate steps, like df[df['A'] > 1]['B'] = 100.
- To fix this, use .loc to perform the selection and assignment in a single operation: df.loc[df['A'] > 1, 'B'] = 100.
- If you truly intend to modify a new dataframe, make it explicit. Create a copy first with subset = df[df['A'] > 1].copy(), then make your changes to subset.
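The .loc fix, with selection and assignment in one call, looks like this in practice:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# one .loc call selects the rows and assigns in a single step,
# so pandas writes directly to the original with no ambiguity
df.loc[df['A'] > 1, 'B'] = 100

print(df['B'].tolist())  # [4, 100, 100]
```

Because there is no intermediate object, there is nothing for pandas to warn about.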
Nested objects like lists or dictionaries inside a dataframe can also cause silent bugs. Even df.copy(deep=True) duplicates only the dataframe's internal arrays; Python objects stored in cells are copied by reference, so both the original and the copy can end up pointing to the exact same list.
- If you change an element in a list through the copied dataframe, that change will unexpectedly appear in the original dataframe, too.
- The solution is to duplicate the nested objects themselves, for example with df['col'].apply(copy.deepcopy), or to round-trip the dataframe through pickle.
Method chaining can be powerful, but it’s another source of ambiguity that the .copy() method can resolve. When you chain multiple filtering or selection operations, pandas can't always tell if the final result is a view of the original data or a new copy. Trying to assign a value at the end of a long chain often triggers the SettingWithCopyWarning.
- By inserting .copy() into the chain, you explicitly tell pandas to create a new, independent dataframe at that point.
- This makes your code's intent clear, silences the warning, and ensures your operations are performed on a distinct piece of data.
Troubleshooting the SettingWithCopyWarning
The SettingWithCopyWarning is one of pandas' most common alerts. It appears when you try to modify a slice of a DataFrame, leaving it unclear whether the change should affect the original. The following code shows this ambiguity in action.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
subset = df[df['A'] > 1] # Chained indexing: pandas can't tell if this is a view or a copy
subset['B'] = 0 # May trigger SettingWithCopyWarning
print(subset)
print(df) # Original df might be unexpectedly modified
Pandas raises this warning because the assignment to subset happens in a separate step after the initial filtering. This chained operation makes it unclear if you're modifying a temporary view or a new copy. The example below shows how to make your intent explicit.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
subset = df[df['A'] > 1].copy() # Explicitly create a copy
subset['B'] = 0 # No warning now
print(subset)
print(df) # Original df remains unchanged
The fix is to explicitly create a new dataframe by adding .copy() after your filtering operation. This signals your intent to pandas, ensuring that any modifications are made to a separate copy, not a view of the original data.
- This prevents the SettingWithCopyWarning and guarantees your original dataframe remains unchanged.
Beware of nested objects in shallow copies
A copy can introduce subtle bugs when your DataFrame contains nested objects like lists or dictionaries. pandas copies such objects by reference, so the copy and the original end up sharing them, and a change in one can unexpectedly affect the other. The following example demonstrates this problem.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})
df_copy = df.copy() # Deep copy of the frame, but nested lists are shared
df_copy.at[0, 'B'][0] = 99 # Modifies the shared nested list
print("Original:", df)
print("Copy:", df_copy) # Both show the same change!
The copy duplicates the DataFrame's structure and internal arrays, but not the lists inside: pandas' deep copy does not recurse into Python objects stored in cells, and copy.deepcopy(df) behaves the same way, since pandas delegates it to df.copy(deep=True). When you modify the list in df_copy, the change reflects in the original because both DataFrames share the same list object. The following example shows how to prevent this.
import copy
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})
df_copy = df.copy()
df_copy['B'] = df['B'].apply(copy.deepcopy) # Duplicate the nested lists themselves
df_copy.at[0, 'B'][0] = 99 # Only modifies the copy's nested list
print("Original:", df)
print("Copy:", df_copy) # Only the copy shows the change
The fix is to copy the nested objects explicitly, here by mapping copy.deepcopy over the object column. This gives the copy its own list in column 'B', so modifying it leaves the original DataFrame untouched, preserving your data's integrity.
- Always duplicate mutable cell values like lists or dictionaries explicitly (or use a pickle round trip) to prevent these unintended side effects.
Fixing method chaining with copy()
Method chaining is powerful but can create ambiguity. When you string together multiple operations, pandas might not know if you're working on a view or a copy. The placement of .copy() determines which object gets duplicated, as the code below shows.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
result = df[df['A'] > 2]['B'].copy() + 10 # copy() applied only to the final Series
print(result) # Works for reading, but the chained indexing stays ambiguous
The problem is that .copy() is called on the final Series, not the filtered DataFrame, so it does nothing to resolve the ambiguity of the chained indexing before it; assigning through a chain like this can still trigger the SettingWithCopyWarning. The following example shows the clearer structure.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
result = df[df['A'] > 2].copy()['B'] + 10 # Copy the filtered frame first
print(result) # Operates on an independent copy of the data
The solution is to place .copy() right after the filtering operation, as in df[df['A'] > 2].copy(). It's a clear signal to pandas to create a new, independent DataFrame at that exact moment, which resolves the ambiguity of the method chain.
- It ensures that any following operations, like selecting column 'B' and adding 10, are performed on a separate copy.
- This prevents the SettingWithCopyWarning and guarantees your original data remains untouched.
Real-world applications
Now that you can navigate the common challenges, you can apply the copy() method confidently in real-world data analysis.
Using copy() in data preprocessing workflows
The copy() method is a critical tool in data preprocessing, as it lets you safely apply transformations like imputation to a duplicate dataset while preserving the integrity of your original raw data.
import pandas as pd
# Original dataset with missing values
raw_data = pd.DataFrame({'age': [25, 30, None, 40], 'income': [50000, None, 75000, 90000]})
processed_data = raw_data.copy()
# Apply preprocessing to the copy only
processed_data.fillna(round(processed_data.mean()), inplace=True)
print("Original data:\n", raw_data)
print("\nProcessed data:\n", processed_data)
This example demonstrates a standard data cleaning workflow. A new DataFrame, processed_data, is created as an exact duplicate of raw_data using the copy() method. This step is essential for protecting your original dataset from any changes.
- The fillna() method is then called on the copy to replace any missing values with the rounded mean of their respective columns.
- Since all modifications happen on processed_data, the raw_data DataFrame is left completely untouched, ensuring your source data remains pristine for other analyses.
Creating data snapshots with copy() for A/B test analysis
In A/B test analysis, the copy() method lets you create separate data snapshots for each group, making it easy to compare their performance without affecting the original dataset.
import pandas as pd
# Sample A/B test results
ab_data = pd.DataFrame({
'user_id': range(1, 7),
'group': ['A', 'B', 'A', 'B', 'A', 'B'],
'conversion': [1, 0, 0, 1, 0, 1]
})
# Create separate copies for each test group
group_a = ab_data[ab_data['group'] == 'A'].copy()
group_b = ab_data[ab_data['group'] == 'B'].copy()
# Calculate conversion rates
a_conversion_rate = group_a['conversion'].mean()
b_conversion_rate = group_b['conversion'].mean()
print(f"Group A conversion rate: {a_conversion_rate:.2f}")
print(f"Group B conversion rate: {b_conversion_rate:.2f}")
print(f"Lift: {(b_conversion_rate - a_conversion_rate) / a_conversion_rate:.2%}")
This code demonstrates a common workflow for analyzing A/B test results. It starts by filtering the main ab_data DataFrame to create two separate copies—one for group_a and another for group_b. Using .copy() ensures each segment is independent.
- With the data cleanly separated, it's easy to calculate the average conversion rate for each group by applying the .mean() method to the conversion column.
- Finally, the code prints the individual conversion rates and the overall lift, which measures the performance difference between the two groups.
Get started with Replit
Put your new skills to work. Tell Replit Agent: "Build a data sandbox that uses copy() to let users test transformations safely," or "Create a scenario modeling tool that duplicates a base dataframe for analysis."
Replit Agent writes the code, tests for errors, and deploys your application directly from your browser. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.