How to remove a row from a dataframe in Python
Learn to remove rows from a Python dataframe. Explore different methods, tips, real-world applications, and how to debug common errors.
You often need to remove rows from a pandas DataFrame for data cleaning and preparation. Python provides powerful methods to filter your dataset, which ensures data integrity and improves model accuracy.
In this article, you'll learn various techniques to remove rows, from the simple drop() method to conditional selection. We'll also cover practical tips, real-world applications, and common debugging advice.
Using drop() to remove a row by index
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_new = df.drop(index=1)  # Remove row with index 1
print(df_new)
```

Output:

```
   A  B
0  1  4
2  3  6
```
The drop() method is a straightforward way to remove rows when you know their index. In the example, we target the row with index 1 by passing index=1 as an argument. This tells pandas exactly which row to eliminate from the dataset.
It's important to note that drop() doesn't modify the original DataFrame by default. Instead, it returns a new DataFrame without the specified row. That's why the result is assigned to a new variable, df_new, preserving the original df for any further operations.
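drop() also accepts a list of labels, so you can remove several rows in one call. A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Pass a list of index labels to drop several rows at once
df_new = df.drop(index=[0, 2])
print(df_new)  # Rows with index 1 and 3 remain
```

As before, the original df is untouched; only df_new reflects the removal.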
Basic row removal techniques
While the drop() method is perfect for known indices, you'll often need to remove rows based on conditions, position, or labels instead.
Using boolean indexing to filter rows
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_filtered = df[df['A'] != 2]  # Keep rows where A is not 2
print(df_filtered)
```

Output:

```
   A  B
0  1  4
2  3  6
```
Boolean indexing lets you filter rows based on their content. It works by creating a temporary Series of True and False values based on a condition you provide. Pandas then keeps only the rows that correspond to a True value.
- The condition `df['A'] != 2` checks each value in column 'A'.
- It returns `True` for rows where the value isn't 2 and `False` for rows where it is.
- The DataFrame is then filtered, keeping only the rows marked `True`, creating a new DataFrame.
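The same pattern extends to multiple conditions. Note the bitwise `&` operator and the parentheses around each comparison, both of which pandas requires here:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7]})

# Wrap each comparison in parentheses and combine with & (and) or | (or)
df_filtered = df[(df['A'] != 2) & (df['B'] < 7)]
print(df_filtered)
```

Only rows satisfying both comparisons survive the filter.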
Using iloc to remove rows by position
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_new = df.iloc[[0, 2]]  # Keep only rows at positions 0 and 2
print(df_new)
```

Output:

```
   A  B
0  1  4
2  3  6
```
The iloc indexer selects data purely by its integer position. Rather than telling pandas which rows to remove, you provide a list of the row positions you want to keep. It's a great way to filter your DataFrame when you know the exact location of the data you need, regardless of its index label.
- The expression `df.iloc[[0, 2]]` tells pandas to create a new DataFrame.
- It includes only the rows at the specified integer positions: in this case, the first and third rows.
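If you'd rather name the position to remove than list the positions to keep, one common idiom looks up the label at that position and passes it to drop(). A sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# df.index[1] looks up the label at integer position 1,
# so this drops the second row regardless of its label
df_new = df.drop(index=df.index[1])
print(df_new)
```

This keeps the intent ("remove the second row") explicit without enumerating everything else.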
Using loc to remove rows by label
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df_new = df.loc[['a', 'c']]  # Keep only rows with labels 'a' and 'c'
print(df_new)
```

Output:

```
   A  B
a  1  4
c  3  6
```
Unlike iloc, the loc indexer selects rows based on their labels, not their integer position. When your DataFrame has a custom index, such as ['a', 'b', 'c'], you can use loc to cherry-pick rows by name.
- The expression `df.loc[['a', 'c']]` creates a new DataFrame containing only the rows with those specific labels.
- This effectively removes any rows whose labels aren't in the list: in this case, the row labeled 'b' is excluded from the new DataFrame.
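Equivalently, drop() accepts labels directly, so you can name the row to remove instead of listing the rows to keep:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Dropping by label is the inverse of selecting the keepers with loc
df_new = df.drop(index='b')
print(df_new)  # Rows 'a' and 'c' remain
```

Which form reads better depends on whether the rows to remove or the rows to keep are easier to describe.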
Advanced row removal techniques
With the fundamentals down, you can use specialized methods to remove rows with missing values, drop duplicates, or filter using complex queries.
Removing rows with missing values using dropna()
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})
df_clean = df.dropna()  # Remove rows with any NaN values
print(df_clean)
```

Output:

```
     A    B
0  1.0  4.0
```
Missing data, often shown as NaN values, can disrupt your analysis. The dropna() method is a quick way to clean it up by removing rows with any missing information. It's a fundamental step in preparing data for modeling.
By default, dropna() operates with a simple rule:
- It scans each row for `NaN` values and drops the entire row if it finds even one.
This is why only the first row, which is complete, remains in the resulting df_clean DataFrame. The original df is left unchanged.
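dropna() also accepts parameters that soften the default all-or-nothing rule. A sketch of two common ones:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, np.nan], 'B': [4, 5, np.nan]})

# how='all' drops only rows where every value is missing
all_missing = df.dropna(how='all')

# subset limits the check to specific columns
checked_b = df.dropna(subset=['B'])

print(all_missing)
print(checked_b)
```

Both calls keep the row with a single missing value that the default `dropna()` would have discarded.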
Conditionally removing rows with query()
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7]})
df_filtered = df.query('A > 2 and B < 7')  # Keep rows matching the condition
print(df_filtered)
```

Output:

```
   A  B
2  3  6
```
The query() method offers a highly readable way to filter DataFrames using a string expression. It's especially useful for complex conditions because it keeps your code clean and intuitive. This approach lets you write filtering logic that reads almost like a natural sentence.
- The method evaluates the string `'A > 2 and B < 7'` against the DataFrame.
- It keeps only the rows where the value in column 'A' is greater than 2 and the value in column 'B' is less than 7.
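query() can also reference Python variables with the `@` prefix, which is handy when a threshold comes from elsewhere in your code rather than being hard-coded in the string:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7]})

threshold = 2  # a local variable, referenced inside the query string with @
df_filtered = df.query('A > @threshold')
print(df_filtered)  # Rows where A is 3 or 4
```

This keeps the filtering expression readable while letting the cutoff vary at runtime.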
Removing duplicate rows with drop_duplicates()
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]})
df_unique = df.drop_duplicates()  # Remove duplicate rows
print(df_unique)
```

Output:

```
   A  B
0  1  4
1  2  5
3  3  6
```
The drop_duplicates() method is your go-to for cleaning datasets with redundant entries. It works by scanning for rows that are identical across all columns and removing them, which is crucial for maintaining data quality.
- By default, it keeps the first occurrence of a duplicated row and discards any subsequent copies.
- Similar to other filtering operations, it returns a new DataFrame and leaves the original one unchanged, preserving your source data.
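The `subset` and `keep` parameters give you finer control over what counts as a duplicate and which copy survives. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 9, 6]})

# subset checks for duplicates only in column 'A'; keep='last'
# retains the final occurrence instead of the first
df_unique = df.drop_duplicates(subset=['A'], keep='last')
print(df_unique)
```

Here the two rows sharing A=2 differ in column 'B', yet only the later one survives because the duplicate check is restricted to 'A'.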
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. You can describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the row removal techniques we've covered, Replit Agent can turn them into production-ready tools:
- Build a data cleaning utility that automatically scrubs datasets using `dropna()` before they're fed into a machine learning model.
- Create a contact list manager that finds and removes duplicate entries from imported files with `drop_duplicates()`.
- Deploy a dynamic sales dashboard that filters transactions using complex conditions similar to the `query()` method.
Turn your own concepts into working applications. Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all from your browser.
Common errors and challenges
When removing rows from a DataFrame, a few common mistakes can lead to unexpected results and bugs in your code.
Forgetting that drop() doesn't modify the original DataFrame
A frequent oversight is calling drop() and assuming it has altered your original DataFrame. Most pandas operations, including drop(), return a new, modified DataFrame by default, leaving the original untouched. This design choice protects your source data from accidental changes.
If you want to modify the DataFrame directly, you must either reassign the result back to the variable, like df = df.drop(index=1), or use the inplace=True argument. However, reassigning is often preferred because it makes the code's intent clearer.
Incorrectly chaining multiple boolean conditions
When you filter with multiple conditions, you can't use Python's standard and or or keywords. These operators try to evaluate the truthiness of an entire pandas Series at once, which is ambiguous and will raise an error. Instead, you must use the bitwise operators & for "and" and | for "or".
It's also crucial to wrap each condition in parentheses. Due to Python's operator precedence rules, an expression like df['A'] > 2 & df['B'] < 7 will fail. The correct syntax is (df['A'] > 2) & (df['B'] < 7), which ensures each condition is evaluated separately before they are combined.
Not handling NaN values when filtering rows
Missing values, represented as NaN, behave uniquely during comparisons. Conditions such as df['column'] > some_value or df['column'] == some_value always evaluate to False for rows where the value is NaN. This means rows with missing data can be unintentionally dropped from your filtered result.
To avoid this, you should handle NaN values explicitly. You can either remove them beforehand with dropna() or include a specific check for them in your filter using methods like pd.notna().
Forgetting that drop() doesn't modify the original DataFrame
It’s a common trip-up for new pandas users. You call the drop() method, expecting it to change your DataFrame, but nothing happens. This is because pandas returns a new DataFrame by default, leaving the original one intact.
The following code demonstrates this behavior. Notice how the printed output still includes the row you tried to remove.
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.drop(index=1)  # This doesn't modify df
print(df)  # Still contains all rows
```
The code calls df.drop(index=1), but the new DataFrame it returns is never saved. Because the result isn't assigned to a variable, printing df shows the original, unmodified data. Check the example below for the fix.
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Either assign to a new variable
df_new = df.drop(index=1)

# Or use inplace=True parameter
# df.drop(index=1, inplace=True)

print(df_new)
```
To make your changes stick, you need to capture the new DataFrame that drop() returns. You can do this by assigning the result to a new variable, like df_new = df.drop(index=1). This is the safest way to preserve your original data while working with a modified version.
Alternatively, you can modify the DataFrame directly by setting the inplace=True parameter. Just be careful—this approach permanently alters your data, so it's best used when you're certain you won't need the original rows.
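One related pitfall: when inplace=True is set, drop() returns None, so assigning its result to a variable loses your data. A short demonstration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# With inplace=True the DataFrame is modified in place
# and the call itself returns None
result = df.drop(index=1, inplace=True)
print(result)  # None
print(df)      # df itself no longer has the row with index 1
```

Combining assignment with inplace=True is almost never what you want; pick one style or the other.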
Incorrectly chaining multiple boolean conditions
It's a common mistake to forget parentheses when chaining multiple conditions with operators like & or |. Due to Python's operator precedence, this will raise an error. The code below shows what happens when you try to filter without them.
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# This will fail due to operator precedence
filtered_df = df[df['A'] > 1 & df['B'] < 8]
print(filtered_df)
```
The code fails because Python evaluates the bitwise & operator before the comparison >. It incorrectly tries to compute 1 & df['B'], which is an invalid operation. Check the example below for the correct syntax.
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Use parentheses to correctly group conditions
filtered_df = df[(df['A'] > 1) & (df['B'] < 8)]
print(filtered_df)
```
The fix is to wrap each condition in parentheses. This forces Python to evaluate each comparison, such as df['A'] > 1, before combining them with the & operator. Without the parentheses, Python’s operator precedence rules cause it to incorrectly evaluate 1 & df['B'] first, which triggers an error. It's a simple but crucial habit to adopt whenever you're filtering a DataFrame with multiple conditions to ensure your logic works as intended.
Not handling NaN values when filtering rows
Missing values, or NaNs, can cause silent errors when you're filtering. Because comparisons such as > or == with a NaN value evaluate to False, rows with missing data often get excluded from your results without any warning, which can skew your analysis.
The code below demonstrates how a simple filter like df['A'] > 1 unintentionally drops a row containing a NaN value, even though the condition isn't directly targeting it.
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, 6, 7, 8]})

# This will exclude the row with NaN
filtered_df = df[df['A'] > 1]
print(filtered_df)
```
The filter df['A'] > 1 drops the row containing NaN because the comparison evaluates to False. This silently removes the row, which can skew your results. The code below demonstrates how to handle this correctly.
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, 6, 7, 8]})

# Explicitly handle NaN values
filtered_df = df[(df['A'] > 1) | df['A'].isna()]
print(filtered_df)
```
To prevent NaN values from being silently dropped, you must explicitly account for them. The solution combines your original condition, df['A'] > 1, with a check for missing values using the | (or) operator and the isna() method.
This tells pandas to keep rows that either meet the primary condition or contain a NaN value in that column. Always watch for this when your filtering logic must preserve rows with incomplete data.
Real-world applications
These filtering skills are crucial for real-world tasks, from removing outliers in sales data to analyzing specific time periods.
Removing outliers from sales data
Outliers like an unusually large sale can distort your analysis, so filtering them out is a common way to keep your sales metrics accurate.
```python
import pandas as pd

# Sample sales dataset with outliers
sales = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E'],
    'units_sold': [150, 200, 1500, 140, 170]
})

# Remove outliers (values over 1000)
sales_clean = sales[sales['units_sold'] <= 1000]
print(sales_clean)
```
This example shows how to remove rows using a numerical condition. The code creates a boolean mask with the expression sales['units_sold'] <= 1000. This operation checks each value in the units_sold column.
- Rows that satisfy the condition are marked `True`, while others are marked `False`.
Pandas then uses this mask to select only the rows marked True. The final result is a new DataFrame, sales_clean, which no longer contains the row with 1500 units sold.
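A fixed cutoff like 1000 works when you already know the data. For a threshold derived from the data itself, the interquartile range (IQR) rule is one common sketch:

```python
import pandas as pd

sales = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E'],
    'units_sold': [150, 200, 1500, 140, 170]
})

# Compute an upper bound from the data: Q3 + 1.5 * IQR
q1 = sales['units_sold'].quantile(0.25)
q3 = sales['units_sold'].quantile(0.75)
upper = q3 + 1.5 * (q3 - q1)

sales_clean = sales[sales['units_sold'] <= upper]
print(sales_clean)
```

The 1.5 multiplier is a conventional choice, not a law; tighten or loosen it to match how aggressive your outlier removal should be.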
Filtering time series data with timestamp conditions
You can filter time series data by combining date and time conditions with other data validation checks, which is useful for cleaning sensor logs or financial records.
```python
import pandas as pd

# Sample temperature readings dataset
readings = pd.DataFrame({
    'timestamp': pd.date_range('2023-01-01', periods=5, freq='h'),
    'sensor': ['A', 'A', 'B', 'B', 'A'],
    'temperature': [22.1, 22.5, -15.0, 21.8, 22.3]
})

# Remove anomalous readings and filter by time
valid_readings = readings[
    (readings['temperature'] > 0) &
    (readings['timestamp'] < '2023-01-01 03:00:00')
]
print(valid_readings)
```
This example demonstrates how to filter a DataFrame using multiple conditions on different data types. The code keeps only the rows that meet two specific criteria, which are combined using the & operator.
- The first condition, `readings['temperature'] > 0`, removes anomalous readings by filtering out any negative temperatures.
- The second condition, `readings['timestamp'] < '2023-01-01 03:00:00'`, narrows the dataset to a specific time window.
Only rows where both conditions are met are included in the final valid_readings DataFrame.
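After any of these removals, the surviving rows keep their original index labels, leaving gaps where rows were dropped. If downstream code expects a clean sequence, reset_index(drop=True) renumbers it:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})

# Filtering leaves gaps in the index (0, 2, 3 here)
filtered = df[df['A'] != 2]

# drop=True discards the old index instead of adding it as a column
filtered = filtered.reset_index(drop=True)
print(filtered)  # Index runs 0, 1, 2 again
```

Skip this step when the original labels carry meaning, such as timestamps or IDs.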
Get started with Replit
Turn what you've learned into a real tool. Tell Replit Agent to "build a utility that cleans CSVs by removing duplicate rows" or "create a dashboard that filters out entries with missing data".
It writes the code, tests for errors, and deploys the app directly from your prompt. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.