How to use MongoDB in Python

Learn how to use MongoDB with Python. This guide covers various methods, tips, real-world applications, and common error debugging.

Published on: Tue, Mar 10, 2026
Updated on: Wed, Apr 1, 2026
The Replit Team

MongoDB is a popular NoSQL database that pairs powerfully with Python for flexible data management. Its document-oriented structure makes it a great choice for modern applications that require scalability and speed.

In this article, you'll explore key techniques to connect Python with MongoDB. You'll find practical tips, real-world applications, and debugging advice to help you build robust, data-driven projects with confidence.

Connect to MongoDB with PyMongo

# Install pymongo if not already installed
# pip install pymongo

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

print("Connected to MongoDB successfully!")

Output:
Connected to MongoDB successfully!

To begin, you import MongoClient from the PyMongo library, which is the primary tool for establishing a connection. You then create a client instance by passing it a connection string. The string 'mongodb://localhost:27017/' tells PyMongo to connect to a MongoDB server running on your local machine (localhost) at the default port, 27017.

Once connected, you can select a database and a collection using dictionary-style syntax, like client['mydatabase']. A key feature of MongoDB is its flexibility; if the database or collection you specify doesn't exist, it will be created automatically the first time you insert data.

Basic MongoDB operations

With the connection established, you can now interact with your data by creating, querying, updating, and deleting documents in your collection.

Create and insert documents with insert_one() and insert_many()

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

result = collection.insert_one({"name": "John", "age": 30})
print(f"Inserted document ID: {result.inserted_id}")

customers = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]
results = collection.insert_many(customers)
print(f"Number of documents inserted: {len(results.inserted_ids)}")

Output:
Inserted document ID: 64a7f8d93652e845c2d8f1a2
Number of documents inserted: 2

To add data, you use the insert_one() method for a single document or insert_many() for multiple. These methods take Python dictionaries as arguments, which become documents in your collection. MongoDB automatically assigns a unique _id to each new document.

  • The insert_one() method returns an object containing the new document's unique inserted_id.
  • For bulk operations, insert_many() accepts a list of dictionaries and returns an object with a list of all the new inserted_ids.

Query documents with find() and find_one()

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

doc = collection.find_one({"name": "John"})
print(f"Found document: {doc}")

cursor = collection.find({"age": {"$gt": 25}})
for customer in cursor:
    print(f"{customer['name']}: {customer['age']}")

Output:
Found document: {'_id': ObjectId('64a7f8d93652e845c2d8f1a2'), 'name': 'John', 'age': 30}
John: 30
Bob: 35

Retrieving data is straightforward with PyMongo's query methods. You can fetch a single document using find_one(), which returns the first entry matching your query. If no document is found, it returns None.

  • To get multiple documents, use the find() method. This returns a cursor—an iterable object you can loop through to process each result.
  • Queries can be simple key-value matches or use advanced operators. For example, {"age": {"$gt": 25}} uses the $gt operator to find all customers older than 25.

Update and delete documents with MongoDB

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

update_result = collection.update_one(
    {"name": "John"},
    {"$set": {"age": 31}}
)
print(f"Modified {update_result.modified_count} document")

delete_result = collection.delete_one({"name": "Alice"})
print(f"Deleted {delete_result.deleted_count} document")

Output:
Modified 1 document
Deleted 1 document

To modify existing data, you use the update_one() method. It requires two arguments: a query to locate the document and an update document using the $set operator to specify which fields to change.

  • For removing data, delete_one() finds and deletes the first document that matches your query.
  • Both methods return result objects that confirm how many documents were affected via the modified_count and deleted_count properties, respectively.

Advanced MongoDB techniques

Moving beyond simple data manipulation, you can unlock deeper insights and greater control over your data with aggregation, indexing, and schema validation.

Use MongoDB aggregation pipeline for data analysis

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

pipeline = [
    {"$match": {"age": {"$gt": 25}}},
    {"$group": {"_id": None, "avg_age": {"$avg": "$age"}, "count": {"$sum": 1}}},
    {"$project": {"_id": 0, "avg_age": 1, "count": 1}}
]

result = list(collection.aggregate(pipeline))
print(f"Aggregation result: {result[0]}")

Output:
Aggregation result: {'avg_age': 33.0, 'count': 2}

The aggregation framework lets you perform complex data processing directly in the database. You define a pipeline—a series of stages—that transforms your data step-by-step. Each stage takes the output from the previous one, allowing for sophisticated analysis.

  • The $match stage acts as a filter, selecting only documents where the age is greater than 25.
  • The $group stage then takes those documents and calculates the average age using $avg and the total count with $sum.
  • Finally, $project reshapes the output, removing the default _id and including only the final avg_age and count fields.

Improve query performance with MongoDB indexes

from pymongo import MongoClient, ASCENDING

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

# Create an index on the name field
collection.create_index([("name", ASCENDING)])

# List all indexes in the collection
indexes = list(collection.list_indexes())
print("Collection indexes:")
for index in indexes:
    print(f" - {index['name']}: {index['key']}")

# Use explain to see how a query uses the index
explanation = collection.find({"name": "John"}).explain()
print(f"Query uses index: {explanation['queryPlanner']['winningPlan']['inputStage']['indexName']}")

Output:
Collection indexes:
 - _id_: SON([('_id', 1)])
 - name_1: SON([('name', 1)])
Query uses index: name_1

Indexes are a game-changer for performance, especially as your dataset grows. Using create_index() on a field like name builds a special data structure that makes lookups incredibly fast. Think of it like the index in a book—it lets MongoDB jump straight to the right data instead of reading every page.

  • You can confirm your index exists by calling list_indexes(), which shows all active indexes for the collection.
  • To see if your query is actually using the index, the explain() method is your best friend. It reveals the query plan, confirming that MongoDB used the index for a faster search.

Implement MongoDB schema validation

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']

# Create collection with validation
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"}
        }
    }
}

if "validated_customers" in db.list_collection_names():
    db.validated_customers.drop()

db.create_collection("validated_customers", validator=validator)
validated_coll = db.validated_customers

# Insert valid document
valid_doc = {"name": "John", "email": "john@example.com"}
result = validated_coll.insert_one(valid_doc)
print(f"Valid document inserted with ID: {result.inserted_id}")

Output:
Valid document inserted with ID: 64a7f8d93652e845c2d8f1a6

While MongoDB is known for its flexibility, schema validation is a powerful feature for enforcing data consistency. You define a validator object using $jsonSchema to set the rules for your documents. This validator is then applied when you create a collection with create_collection().

  • The required key lists all mandatory fields, such as name and email in this example.
  • The properties key defines data types (bsonType) and can even use a pattern to check formats, ensuring emails are structured correctly.

This approach ensures that only documents matching your schema can be inserted, protecting your data's integrity at the database level.

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed. This lets you skip the setup and start coding instantly, without worrying about environment configuration.

Instead of piecing together individual techniques, you can use Agent 4 to build a complete application from a simple description. It helps you move from learning methods like find() and aggregate() to building real products. For example, you could create:

  • A customer management dashboard that uses find() to search and display user profiles from your database.
  • An analytics tool that runs an aggregation pipeline to calculate key metrics, like average user age or total sign-ups per day.
  • A data entry utility that enforces a schema to validate new records before using insert_one() to add them to a collection.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with the right tools, you might run into a few common hurdles when integrating Python and MongoDB.


Working with MongoDB ObjectId in queries

A frequent stumbling block is treating MongoDB's unique _id as a simple string. It's actually a special ObjectId type, and this distinction matters. If you query for a document using its ID as a string, the database won't find it. The code below shows this exact problem in action.

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

# This won't work - string ID won't match ObjectId
document_id = "64a7f8d93652e845c2d8f1a2"
result = collection.find_one({"_id": document_id})
print(f"Document found: {result}")

The code returns None because it passes a raw string to the _id field. The stored value is an ObjectId, not a string, so the comparison never matches. See the corrected version below.

from pymongo import MongoClient
from bson.objectid import ObjectId

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

# Convert string ID to ObjectId for correct matching
document_id = "64a7f8d93652e845c2d8f1a2"
result = collection.find_one({"_id": ObjectId(document_id)})
print(f"Document found: {result}")

The solution is to convert the string ID into a MongoDB ObjectId. By importing ObjectId from bson.objectid and wrapping the ID string in your query—{"_id": ObjectId(document_id)}—you provide the exact data type MongoDB needs to locate the document. This mismatch often occurs when you use an ID that has been passed through your application as a string, for example, from a URL parameter. This small change makes your query successful.

Handling connection errors with try-except

A resilient application must anticipate that a database connection isn't guaranteed. If your MongoDB server is offline or unreachable, any script that tries to connect without safeguards will immediately crash. The following code shows what happens in this exact scenario.

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=1000)
db = client['mydatabase']
collection = db['customers']

# The program will crash if MongoDB is not running
print("Connected to MongoDB successfully!")

This code tries to connect but has no fallback for when the server is down. PyMongo raises an exception that crashes the script. The corrected version below shows how to build a more resilient connection.

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError

try:
    client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=1000)
    # Force a connection to verify it works
    client.admin.command('ping')
    print("Connected to MongoDB successfully!")
except (ConnectionFailure, ServerSelectionTimeoutError) as e:
    print(f"MongoDB connection error: {e}")

To prevent crashes from a downed server, wrap your connection logic in a try-except block. This allows your application to handle interruptions gracefully instead of stopping unexpectedly. This approach is essential for building robust applications that can report an error and continue running.

  • The code catches specific errors like ConnectionFailure.
  • It uses client.admin.command('ping') to actively verify the connection, since PyMongo connects lazily by default.

Resolving data type mismatches in MongoDB queries

MongoDB is precise about data types, a detail that often trips developers up. A query that searches for a stored number with a string value returns no results, even if the characters look identical. The code below demonstrates this common mistake in action.

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

# The age stored as integer but queried as string
# This won't match any documents
result = collection.find_one({"age": "30"})
print(f"Customer found: {result}")

The query returns None because it searches for the string "30", but the database stores the age as an integer. This data type mismatch causes the search to fail. See the corrected version below for the solution.

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']

# Use the correct data type (integer) in the query
result = collection.find_one({"age": 30})
print(f"Customer found: {result}")

The fix is simple but crucial: your query must use the correct data type. The corrected code works because it searches for the integer 30, which matches how the age is stored in the database. The previous attempt failed because it looked for the string "30". Keep an eye out for this when handling user input or API data, as values often arrive as strings by default. Always ensure your query types match your schema.

Real-world applications

Putting these concepts into practice helps you build powerful features, from logging user activity to running complex sales analytics.

Log user activity in a web application with datetime

MongoDB's native support for date objects makes it simple to log events with Python's datetime module, allowing you to perform time-based queries efficiently.

from pymongo import MongoClient
from datetime import datetime

client = MongoClient('mongodb://localhost:27017/')
db = client['webapp']
logs = db['user_activity']

# Log a user login event
log_entry = {
    "user_id": "user123",
    "action": "login",
    "timestamp": datetime.now(),
    "ip_address": "192.168.1.1"
}
logs.insert_one(log_entry)

# Query recent activity for a specific user
recent_activity = list(logs.find(
    {"user_id": "user123"},
    {"action": 1, "timestamp": 1, "_id": 0}
).sort("timestamp", -1).limit(5))
print(recent_activity)

This code demonstrates how to create and retrieve time-sensitive data. First, it builds a log entry as a Python dictionary, using datetime.now() to capture the current time. This entry is then saved to the database with insert_one().

The second part shows a powerful query chain to fetch recent activity. It uses:

  • find() to filter for a specific user and project only the needed fields.
  • sort() to arrange the logs from newest to oldest.
  • limit() to retrieve just the five most recent entries.

Implement data aggregation for sales analytics with $match and $group

MongoDB's aggregation pipeline is ideal for sales analytics, allowing you to filter transactions with $match and then use $group to calculate key metrics like total revenue and average price per category.

from pymongo import MongoClient
from bson.son import SON
from datetime import datetime

client = MongoClient('mongodb://localhost:27017/')
db = client['store']
sales = db['transactions']

# Insert sample sales data
sales.insert_many([
    {"product": "laptop", "category": "electronics", "price": 1200, "date": datetime(2023, 7, 1)},
    {"product": "phone", "category": "electronics", "price": 800, "date": datetime(2023, 7, 2)},
    {"product": "desk", "category": "furniture", "price": 350, "date": datetime(2023, 7, 3)},
    {"product": "chair", "category": "furniture", "price": 150, "date": datetime(2023, 7, 3)}
])

# Analyze sales by category with sum and average
pipeline = [
    {"$group": {
        "_id": "$category",
        "total_sales": {"$sum": "$price"},
        "avg_price": {"$avg": "$price"},
        "count": {"$sum": 1}
    }},
    {"$sort": SON([("total_sales", -1)])}
]
results = list(sales.aggregate(pipeline))
for result in results:
    print(f"{result['_id']}: ${result['total_sales']} ({result['count']} items, avg: ${result['avg_price']:.2f})")

This code uses an aggregation pipeline to analyze sales data directly in the database. The aggregate() method processes documents through a series of stages to produce a summarized result.

  • The $group stage first organizes all transactions by their category. It then calculates the total sales, average price, and item count for each group using operators like $sum and $avg.
  • Next, the $sort stage arranges the resulting categories by total_sales in descending order, showing the most profitable ones first.

Get started with Replit

Turn your knowledge into a real tool with Replit Agent. Describe what you want to build, like “a sales dashboard that aggregates transactions by category” or “a utility to log and retrieve recent user activity.”

Replit Agent writes the code, tests for errors, and helps you deploy your application. It handles the heavy lifting so you can focus on your idea. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.

Get started for free

Create & deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.