How to append a dataframe in Python

Learn to append DataFrames in Python with various methods. Get tips, see real-world applications, and learn how to debug common errors.

Published on: Tue, Mar 3, 2026
Updated on: Wed, Apr 1, 2026
The Replit Team

To combine datasets in Python, you often append a pandas DataFrame. This core operation adds new rows, which is essential for many data analysis tasks.

In this article, you'll explore techniques like pd.concat(). We'll also provide practical performance tips, review real-world applications, and offer debugging advice to help you confidently handle data combination tasks.

Using pd.concat() for basic appending

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2])
print(result)

--OUTPUT--

   A  B
0  1  3
1  2  4
0  5  7
1  6  8

The pd.concat() function is the primary tool for this task. It takes an iterable—in this case, a list of DataFrames [df1, df2]—and joins them. By default, it stacks them vertically along axis=0, which is exactly what you need when appending rows.

Pay attention to the index in the output. The function preserves the original indices from each DataFrame, resulting in duplicate labels. This is a key behavior to be aware of, as you'll often want to reset the index for a clean, sequential series after combining your data.
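If you've already concatenated and want to clean up afterward, `reset_index(drop=True)` rebuilds a sequential index (a minimal sketch using the same two DataFrames as above):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenate, then rebuild a clean 0..n-1 index; drop=True discards the old one
result = pd.concat([df1, df2]).reset_index(drop=True)
print(result.index.tolist())  # [0, 1, 2, 3]
```

This is equivalent to passing `ignore_index=True` to `pd.concat()`, which is covered later in this article.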

Basic DataFrame appending techniques

Beyond the basic pd.concat() function, you can also use the older DataFrame.append() method or combine multiple DataFrames and Series in a single operation.

Using the DataFrame.append() method

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = df1.append(df2)  # Raises AttributeError on pandas 2.0+; only works on older versions
print(result)

--OUTPUT--

   A  B
0  1  3
1  2  4
0  5  7
1  6  8

The DataFrame.append() method offers another way to add rows. Unlike the top-level pd.concat() function, you call append() directly on a DataFrame, such as df1.append(df2).

  • It’s removed: append() was deprecated in pandas 1.4 and removed entirely in pandas 2.0, so calling it on a current install raises an AttributeError. The pandas team recommends pd.concat() for all code.
  • Identical result: It produces the same output as the previous example, including the duplicate index labels from the original DataFrames.

For these reasons, sticking with pd.concat() is the best practice for writing future-proof code.

Concatenating multiple DataFrames with pd.concat()

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})
result = pd.concat([df1, df2, df3])
print(result)

--OUTPUT--

    A   B
0   1   3
1   2   4
0   5   7
1   6   8
0   9  11
1  10  12

The real strength of pd.concat() is its ability to handle more than two DataFrames at once. You aren't limited to pairwise combinations.

  • Simply pass a list containing all the DataFrames you want to join, like [df1, df2, df3].
  • The function appends them in the order they appear in the list, creating one unified DataFrame.

This approach is much cleaner and more efficient than repeatedly calling an append method. As before, the original indices are preserved in the output.

Appending a Series to a DataFrame

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
series = pd.Series({'A': 5, 'B': 6})
result = pd.concat([df, pd.DataFrame([series])])
print(result)

--OUTPUT--

   A  B
0  1  3
1  2  4
0  5  6

You can also append a Series, which is useful for adding a single row of data. Note that if you pass the Series to pd.concat() directly, pandas treats its index entries ('A' and 'B') as row labels and its values as a new column rather than a row. To append it as a row, convert it into a one-row DataFrame first.

  • The key is to wrap the Series in a list and pass it to the DataFrame constructor: pd.DataFrame([series]).

This simple step transforms your Series into a single-row DataFrame, making it compatible for concatenation.
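An equivalent idiom, sketched below, is `Series.to_frame().T`: `to_frame()` turns the Series into a one-column DataFrame, and `.T` transposes it into a one-row DataFrame whose columns match the Series labels.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
row = pd.Series({'A': 5, 'B': 6})

# to_frame() makes a one-column DataFrame; .T flips it into a one-row DataFrame
result = pd.concat([df, row.to_frame().T], ignore_index=True)
print(result)
```

Both spellings produce the same one-row DataFrame, so pick whichever reads better in your codebase.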

Advanced DataFrame appending techniques

With the fundamentals covered, you can now address common challenges like mismatched columns, duplicate indices, and optimizing the append process for greater efficiency.

Handling different column sets when appending

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
result = pd.concat([df1, df2], sort=False)
print(result)

--OUTPUT--

   A    B    C
0  1  3.0  NaN
1  2  4.0  NaN
0  5  NaN  7.0
1  6  NaN  8.0

When you combine DataFrames that don't share the same columns, pd.concat() creates a new DataFrame containing all columns from both. It aligns the data by column name, not by position.

  • For columns that exist in one DataFrame but not the other, pandas fills the missing spots with NaN (Not a Number).
  • Passing sort=False (the default since pandas 1.0) keeps pandas from alphabetically sorting the non-aligned columns, preserving the original column order. Spelling it out also makes the intent explicit on older pandas versions where the default differed.
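If you'd rather drop the mismatched columns than fill them with NaN, `pd.concat()` also accepts `join='inner'`, which keeps only the columns shared by every input (a minimal sketch with the same two DataFrames):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# join='inner' keeps only the columns present in every DataFrame (here, just 'A')
result = pd.concat([df1, df2], join='inner', ignore_index=True)
print(result.columns.tolist())  # ['A']
```

The default, `join='outer'`, is what produces the NaN-padded union of columns shown above.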

Resetting index with ignore_index=True

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2], ignore_index=True)
print(result)

--OUTPUT--

   A  B
0  1  3
1  2  4
2  5  7
3  6  8

As you've seen, concatenating DataFrames can lead to a messy index with duplicate labels. The ignore_index=True parameter is the solution. When you set it, pd.concat() discards the original indices entirely.

  • It creates a fresh, sequential index for the new DataFrame, starting from 0.
  • This is the standard way to get a clean result, making your combined data much easier to work with.
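If you want to discard the duplicate labels but still remember which DataFrame each row came from, the `keys` parameter offers a middle ground: it builds a MultiIndex whose outer level records the source (a short sketch; the labels 'first' and 'second' are arbitrary):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# keys= labels each block, producing a MultiIndex that records the source
result = pd.concat([df1, df2], keys=['first', 'second'])
print(result.loc['second'])  # just the rows that came from df2
```

Note that `keys` and `ignore_index=True` are mutually exclusive in effect: `ignore_index=True` throws away all labels, keys included.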

Efficient appending with list comprehension

import pandas as pd

dataframes = [
    pd.DataFrame({'A': [i, i+1], 'B': [i+2, i+3]})
    for i in range(1, 6, 2)
]
result = pd.concat(dataframes, ignore_index=True)
print(result)

--OUTPUT--

   A  B
0  1  3
1  2  4
2  3  5
3  4  6
4  5  7
5  6  8

When you need to combine many DataFrames, it's far more efficient to collect them in a list first and then perform a single concatenation. This example uses a Python list comprehension to quickly generate a list of DataFrames. Afterward, pd.concat() is called just once on the entire collection.

  • This approach is much faster because it avoids the performance penalty of creating a new DataFrame in memory with every single append operation.
  • It’s a standard pattern for performance-critical tasks that keeps your code clean and memory-friendly.
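The same collect-then-concat pattern applies when the DataFrames arrive from a loop rather than a comprehension, for example when reading file chunks or API pages (a sketch; the three `start` values stand in for whatever source you're iterating over):

```python
import pandas as pd

chunks = []
for start in (0, 10, 20):  # e.g. one chunk per file or API page
    chunks.append(pd.DataFrame({'A': [start, start + 1]}))

# One concat at the end instead of growing a DataFrame row by row
result = pd.concat(chunks, ignore_index=True)
print(len(result))  # 6
```

Growing a DataFrame inside the loop would copy all existing rows on every iteration, so the total cost becomes quadratic; a single final concat stays linear.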

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. This lets you move from learning individual techniques like pd.concat() to building complete applications with Agent 4, which can take an idea to a working product directly from a description.

Instead of piecing together techniques, you can describe the app you want to build and let Agent 4 construct it for you:

  • A utility that consolidates daily log files from different sources into a single master file for analysis.
  • A script that appends new user registration data to an existing customer DataFrame.
  • A dashboard that combines monthly sales reports from various regional offices into one comprehensive annual summary.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Appending DataFrames can introduce issues like duplicate indices, missing values, and type errors, but you can easily manage them with the right approach.

Fixing index duplication with ignore_index=True

One of the most common side effects of using pd.concat() is ending up with duplicate index labels. This happens because pandas preserves the original index from each DataFrame you combine. While this behavior is predictable, it can make selecting or slicing data difficult.

The fix is simple: set the ignore_index=True parameter. When you do this, pd.concat() discards the old indices and generates a new, clean index from 0. It’s a crucial step for creating a tidy, usable final DataFrame.

Handling missing values after concatenation

You'll often see NaN (Not a Number) values appear after concatenating DataFrames with different columns. This isn't an error—it's pandas' way of handling misaligned data by filling in the gaps. However, these missing values can cause problems for calculations or machine learning models.

  • Use the .fillna() method to replace NaN with a specific value, like 0 for numerical columns or an empty string for text.
  • Use the .dropna() method to remove rows or columns that contain any NaN values. Be careful with this, as you might lose important data.

Fixing TypeError when concatenating non-DataFrame objects

A TypeError from pd.concat() usually means the list you passed contains something that is neither a DataFrame nor a Series, such as a raw NumPy array, list, or dictionary. Those are the only two object types the function accepts.

A Series is accepted, but concatenating one along the rows treats its index entries as row labels and its values as a new column, which is rarely what you want. To add a Series as a single row, convert it to a one-row DataFrame first, like pd.DataFrame([my_series]), before you concatenate.

Fixing index duplication with ignore_index=True

When you use pd.concat() without resetting the index, you get duplicate labels. This isn't just messy; it creates ambiguity. Trying to select data using .loc[0], for example, becomes unreliable. The following code demonstrates this exact problem in action.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenating without handling indexes
result = pd.concat([df1, df2])
print(result)
print(f"Values at index 0:\n{result.loc[0]}") # Returns every row labeled 0, not a single row

The result.loc[0] call is ambiguous because multiple rows share the index 0. Instead of a single row as a Series, pandas returns a DataFrame containing every row labeled 0, which can silently break downstream code that expects exactly one row. The following code demonstrates the correct approach for a clean result.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Reset index to avoid duplication
result = pd.concat([df1, df2], ignore_index=True)
print(result)
print(f"Values at index 0: {result.loc[0]}")

By setting ignore_index=True, you instruct pd.concat() to discard the original indices and generate a new, sequential one. This ensures every row in the final DataFrame has a unique label, which resolves the ambiguity. As a result, a call like result.loc[0] becomes predictable and reliably returns a single row. You should use this approach whenever you need a clean, usable index after combining DataFrames.
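If duplicate labels would indicate a bug rather than an expected side effect, `pd.concat()` also accepts `verify_integrity=True`, which fails fast instead of producing an ambiguous index (a minimal sketch):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# verify_integrity=True raises ValueError instead of silently keeping duplicate labels
try:
    pd.concat([df1, df2], verify_integrity=True)
    duplicates_found = False
except ValueError:
    duplicates_found = True

print(duplicates_found)  # True: both frames carry the labels 0 and 1
```

This check adds some overhead, so reserve it for cases where index uniqueness is a real invariant of your data.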

Handling missing values after concatenation

When you concatenate DataFrames with mismatched columns, pandas fills the gaps with NaN values. While this prevents errors during the merge, these missing markers can cause unexpected results in later calculations, like when you try to sum a column. The following code demonstrates this problem.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# This creates NaN values and may cause issues in calculations
result = pd.concat([df1, df2])
print(result)
total_b = result['B'].sum()
print(f"Sum of column B: {total_b}") # sum() silently skips the NaN values

The sum() function on column B skips the NaN values, but leaving them unaddressed can cause problems in other calculations. The code below shows a better way to handle these gaps before they become an issue.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# Fill missing values with a default value
result = pd.concat([df1, df2]).fillna(0)
print(result)
total_b = result['B'].sum()
print(f"Sum of column B: {total_b}")

The solution is to chain the .fillna(0) method directly after pd.concat(). This replaces all NaN values with 0, ensuring your columns are ready for mathematical operations like .sum(). You should always consider this step when your source DataFrames have different column sets, as it prevents unexpected behavior in your analysis and keeps your data clean and consistent.

Fixing TypeError when concatenating non-DataFrame objects

The pd.concat() function is strict—it only accepts a list of DataFrames. If you include another object type, like a raw NumPy array, Python will raise a TypeError. The code below demonstrates what happens when this rule is broken.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
array = np.array([5, 6])

# Trying to concatenate DataFrame with numpy array directly
result = pd.concat([df, array])  # Raises TypeError: cannot concatenate a numpy.ndarray
print(result)  # Never reached

The TypeError is triggered because the NumPy array isn't wrapped in a DataFrame structure that pd.concat() expects. The function requires all items in the list to be DataFrames. The following code demonstrates the correct approach.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
array = np.array([5, 6])

# Convert array to DataFrame first
array_df = pd.DataFrame([array], columns=['A', 'B'])
result = pd.concat([df, array_df])
print(result)

The solution is to convert the NumPy array into a DataFrame before concatenating. This ensures every item you pass to pd.concat() is the correct type. To do this:

  • Wrap the array in a list and pass it to pd.DataFrame(), making sure to specify the columns so the new row aligns correctly with the existing structure.

This error is common when you mix data types from different libraries, so always check your inputs before combining them.
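One defensive pattern, sketched below, is a small coercion helper that normalizes inputs before concatenation; the `to_frame` function and its `columns` parameter are hypothetical names introduced here for illustration, not part of pandas:

```python
import pandas as pd
import numpy as np

def to_frame(obj, columns=None):
    """Coerce common inputs to a DataFrame before concatenation (illustrative helper)."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, pd.Series):
        return obj.to_frame().T  # Series labels become column names
    return pd.DataFrame(np.atleast_2d(obj), columns=columns)  # arrays/lists become rows

pieces = [
    pd.DataFrame({'A': [1], 'B': [2]}),
    pd.Series({'A': 3, 'B': 4}),
    np.array([5, 6]),
]
result = pd.concat([to_frame(p, columns=['A', 'B']) for p in pieces], ignore_index=True)
print(result.shape)  # (3, 2)
```

Centralizing the coercion in one place keeps the type-checking out of your analysis code and makes the accepted input types explicit.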

Real-world applications

With the troubleshooting covered, you can confidently apply DataFrame appending to practical scenarios like consolidating monthly reports or building a unified product catalog.

Combining monthly sales reports with pd.concat()

Consolidating data from different time periods, like merging monthly sales reports, is a perfect use case for pd.concat().

import pandas as pd

jan_sales = pd.DataFrame({
    'Product': ['Widget A', 'Widget B'],
    'Units': [100, 150],
    'Month': ['Jan', 'Jan']
})

feb_sales = pd.DataFrame({
    'Product': ['Widget A', 'Widget C'],
    'Units': [120, 90],
    'Month': ['Feb', 'Feb']
})

quarterly_report = pd.concat([jan_sales, feb_sales], ignore_index=True)
print(quarterly_report)

This example demonstrates how to stack two DataFrames, jan_sales and feb_sales, on top of each other. The pd.concat() function handles the combination, creating a unified quarterly_report.

  • The key parameter here is ignore_index=True. It discards the original indices from each DataFrame.
  • As a result, the final table gets a fresh, sequential index starting from 0, making the data easier to reference.

This technique is essential for cleanly merging datasets that share a similar structure.
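Once the reports are combined, the payoff is that cross-month analysis becomes a one-liner. A short sketch, using the same data the combined report would contain:

```python
import pandas as pd

quarterly_report = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget A', 'Widget C'],
    'Units': [100, 150, 120, 90],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb']
})

# After combining, aggregate across months per product
totals = quarterly_report.groupby('Product')['Units'].sum()
print(totals['Widget A'])  # 220 (100 in Jan + 120 in Feb)
```

This is the usual workflow: concatenate first to get one tidy table, then group and aggregate on the unified data.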

Building a comprehensive product catalog from different suppliers

Building a unified product catalog from multiple suppliers often requires you to standardize inconsistent column names before you can append the data.

import pandas as pd

supplier_a = pd.DataFrame({
    'ProductID': ['A001', 'A002'],
    'Name': ['Premium Widget', 'Deluxe Gadget'],
    'Price': [19.99, 24.99]
})

supplier_b = pd.DataFrame({
    'Product_Code': ['B001', 'B002'],
    'Product_Name': ['Economy Widget', 'Basic Tool'],
    'Wholesale_Price': [12.99, 9.99]
})

supplier_b = supplier_b.rename(columns={
    'Product_Code': 'ProductID',
    'Product_Name': 'Name',
    'Wholesale_Price': 'Price'
})

complete_catalog = pd.concat([supplier_a, supplier_b], ignore_index=True)
print(complete_catalog)

This example tackles a common data-wrangling problem: combining datasets with mismatched column names. Before appending, you must align the schemas. The .rename() method is used on supplier_b to make its column names—like Product_Code and Product_Name—match those in supplier_a.

  • Once the columns are consistent, pd.concat() stacks the two DataFrames vertically.
  • Using ignore_index=True creates a new, clean index for the combined catalog, making the final data much easier to work with.

Get started with Replit

Now, turn what you've learned into a real tool. Describe what you want to build to Replit Agent, like "a utility that merges multiple CSV log files" or "a script that appends new survey responses to an existing dataset".

Replit Agent writes the code, tests for errors, and helps you deploy your application. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.

Get started for free

Create & deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.