How to use MongoDB in Python
Learn to use MongoDB with Python. This guide covers various methods, tips, real-world applications, and how to debug common errors.

MongoDB is a popular NoSQL database for modern applications. When you pair it with Python, you get a flexible, scalable way to manage data in your projects.
In this article, you'll explore essential techniques to connect Python with MongoDB. We cover practical tips, real-world applications, and common debugging advice to help you master database interactions effectively.
Connect to MongoDB with PyMongo
# Install pymongo if not already installed
# pip install pymongo
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
print("Connected to MongoDB successfully!")
--OUTPUT--
Connected to MongoDB successfully!
To start, you create an instance of MongoClient, which acts as the main connection to your MongoDB server. The connection string specifies the server's address—in this case, a local instance running on the default port.
Once connected, you can select a database and a collection using simple bracket notation, like client['mydatabase']. A key feature of MongoDB is that it doesn't actually create the database or the customers collection until you insert the first document. This lazy creation makes development more flexible.
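Connection strings often carry credentials as well as an address. The sketch below assembles a URI from its parts and percent-escapes the username and password with the standard library's quote_plus. The helper name build_mongo_uri and the sample credentials are illustrative assumptions, not part of PyMongo.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Assemble a mongodb:// URI, percent-escaping any credentials."""
    credentials = ""
    if username is not None and password is not None:
        # Characters such as '@', ':' and '/' must be escaped inside a URI
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}/"


print(build_mongo_uri())
print(build_mongo_uri(username="app_user", password="p@ss/word"))
```

The resulting string can be passed directly to MongoClient in place of the hard-coded address.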
Basic MongoDB operations
With your connection ready, you can now manage your collection's documents using core methods like insert_one(), find(), and others for updating and deleting data.
Create and insert documents with insert_one() and insert_many()
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
result = collection.insert_one({"name": "John", "age": 30})
print(f"Inserted document ID: {result.inserted_id}")
customers = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]
results = collection.insert_many(customers)
print(f"Number of documents inserted: {len(results.inserted_ids)}")
--OUTPUT--
Inserted document ID: 64a7f8d93652e845c2d8f1a2
Number of documents inserted: 2
Adding documents to your collection is straightforward. Since MongoDB documents are structured like Python dictionaries, you simply pass a dictionary to the method.
- Use insert_one() to add a single document. It returns a result object whose inserted_id attribute holds the unique _id that MongoDB assigns automatically when the document has none.
- For bulk operations, use insert_many() with a list of dictionaries. It sends the documents in batches, which is typically much faster than calling insert_one() in a loop.
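Data often arrives as rows rather than dictionaries, for example from a CSV file. As a minimal sketch, the hypothetical helper rows_to_documents below turns such rows into the list of dictionaries that insert_many() expects. The field names and sample rows are assumptions chosen to match the example above.

```python
def rows_to_documents(field_names, rows):
    """Turn rows (tuples or lists) into dictionaries ready for insert_many()."""
    documents = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError(f"expected {len(field_names)} values, got {len(row)}")
        documents.append(dict(zip(field_names, row)))
    return documents


customers = rows_to_documents(["name", "age"], [("Alice", 25), ("Bob", 35)])
print(customers)
```

The returned list can then be handed to collection.insert_many(customers).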
Query documents with find() and find_one()
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
doc = collection.find_one({"name": "John"})
print(f"Found document: {doc}")
cursor = collection.find({"age": {"$gt": 25}})
for customer in cursor:
    print(f"{customer['name']}: {customer['age']}")
--OUTPUT--
Found document: {'_id': ObjectId('64a7f8d93652e845c2d8f1a2'), 'name': 'John', 'age': 30}
John: 30
Bob: 35
Retrieving data is just as simple. You can fetch documents using a query filter, which is a dictionary specifying the criteria you're looking for.
- Use find_one() to get the first document that matches your query, or None if nothing matches. It's a good fit when you expect a single result.
- For multiple documents, find() returns a Cursor object. You can loop through this cursor to access each matching document, such as all customers older than 25 selected with the "$gt" operator.
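Query filters are ordinary dictionaries, so they can be built programmatically. The following sketch assumes a hypothetical helper, age_range_filter, that combines the "$gte" and "$lte" comparison operators into an inclusive range on the age field.

```python
def age_range_filter(min_age=None, max_age=None):
    """Build a find() filter for an inclusive age range.

    Either bound may be omitted; with no bounds the filter matches everything.
    """
    bounds = {}
    if min_age is not None:
        bounds["$gte"] = min_age
    if max_age is not None:
        bounds["$lte"] = max_age
    return {"age": bounds} if bounds else {}


print(age_range_filter(min_age=26))
print(age_range_filter(26, 40))
```

Passing the result to collection.find() would return only the customers whose age falls inside the range.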
Update and delete documents with MongoDB
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
update_result = collection.update_one(
    {"name": "John"},
    {"$set": {"age": 31}}
)
print(f"Modified {update_result.modified_count} document")
delete_result = collection.delete_one({"name": "Alice"})
print(f"Deleted {delete_result.deleted_count} document")
--OUTPUT--
Modified 1 document
Deleted 1 document
Modifying and removing documents is just as direct. You use a filter to target the documents you want to change and an update operator to specify the modifications.
- The update_one() method finds the first document matching your filter and applies the changes. It's crucial to use the $set operator to modify specific fields without replacing the entire document.
- Similarly, delete_one() removes the first document that matches your query. For bulk operations, you can use update_many() and delete_many().
Both methods return a result object that confirms how many documents were affected, such as modified_count or deleted_count.
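An update result also exposes matched_count, which can differ from modified_count when a matched document already holds the requested values. The sketch below interprets the two counters. The function name describe_update is an assumption, and SimpleNamespace objects stand in for real UpdateResult instances so the example runs without a server.

```python
from types import SimpleNamespace


def describe_update(result):
    """Summarize an update result using matched_count and modified_count."""
    if result.matched_count == 0:
        return "no document matched the filter"
    if result.modified_count == 0:
        return "document matched but already had the requested values"
    return f"modified {result.modified_count} document(s)"


# Stand-ins for the UpdateResult objects PyMongo returns
print(describe_update(SimpleNamespace(matched_count=0, modified_count=0)))
print(describe_update(SimpleNamespace(matched_count=1, modified_count=0)))
print(describe_update(SimpleNamespace(matched_count=1, modified_count=1)))
```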
Advanced MongoDB techniques
Moving beyond basic document management, you can unlock deeper insights and greater control with tools for aggregation, indexing, and schema validation.
Use MongoDB aggregation pipeline for data analysis
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
pipeline = [
    {"$match": {"age": {"$gt": 25}}},
    {"$group": {"_id": None, "avg_age": {"$avg": "$age"}, "count": {"$sum": 1}}},
    {"$project": {"_id": 0, "avg_age": 1, "count": 1}}
]
result = list(collection.aggregate(pipeline))
print(f"Aggregation result: {result[0]}")
--OUTPUT--
Aggregation result: {'avg_age': 33.0, 'count': 2}
The aggregation framework lets you run complex data processing tasks directly in the database. You define a pipeline—a series of stages that transform your data step by step—and pass it to the aggregate() method.
- The $match stage filters documents, much like a find() query. Here, it selects customers older than 25.
- Next, $group aggregates the filtered data. Setting _id to None places every document in a single group; the stage then calculates the average age using $avg and counts the documents with $sum.
- Finally, $project reshapes the output, removing the default _id and including only the average age and count.
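To make the three stages concrete, here is a rough plain-Python equivalent of the same computation. It is only an illustration of what each stage does, not how MongoDB executes it, and the sample documents are assumed data.

```python
customers = [
    {"name": "John", "age": 31},
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 35},
]


def summarize_ages(documents, min_age):
    """Mirror $match -> $group -> $project for the average-age pipeline."""
    # $match: keep documents whose age is greater than min_age
    matched = [doc for doc in documents if doc["age"] > min_age]
    if not matched:
        return []  # aggregate() yields no documents when nothing matches
    # $group with _id None: one group holding every matched document
    ages = [doc["age"] for doc in matched]
    # $project: expose only avg_age and count
    return [{"avg_age": sum(ages) / len(ages), "count": len(ages)}]


print(summarize_ages(customers, 25))
```

The empty-list case matters in practice: indexing result[0] on an empty aggregation result raises an IndexError, so it is worth checking the list before reading from it.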
Improve query performance with MongoDB indexes
from pymongo import MongoClient, ASCENDING
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# Create an index on the name field
collection.create_index([("name", ASCENDING)])
# List all indexes in the collection
indexes = list(collection.list_indexes())
print("Collection indexes:")
for index in indexes:
    print(f" - {index['name']}: {index['key']}")
# Use explain to see how a query uses the index
explanation = collection.find({"name": "John"}).explain()
print(f"Query uses index: {explanation['queryPlanner']['winningPlan']['inputStage']['indexName']}")
--OUTPUT--
Collection indexes:
- _id_: SON([('_id', 1)])
- name_1: SON([('name', 1)])
Query uses index: name_1
Indexes are essential for optimizing query performance. Think of them like an index in a book—they help the database find data quickly without scanning every page. Using create_index() on the name field creates a sorted list that makes searches on that field significantly faster.
- You can verify that your index exists by calling list_indexes(), which shows all indexes on the collection.
- The explain() method provides a detailed query plan, confirming that MongoDB is using your new index to efficiently retrieve documents.
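The shape of the explain() output depends on the query and the server version, so a hard-coded path into the plan can raise a KeyError. One more tolerant approach, sketched below under that assumption, walks the plan and collects the index name from every IXSCAN stage. The helper find_index_names is hypothetical, and the dictionary is a trimmed stand-in for real explain() output.

```python
def find_index_names(plan):
    """Collect indexName values from every IXSCAN stage in an explain() plan."""
    names = []
    if isinstance(plan, dict):
        if plan.get("stage") == "IXSCAN" and "indexName" in plan:
            names.append(plan["indexName"])
        for value in plan.values():
            names.extend(find_index_names(value))
    elif isinstance(plan, list):
        for item in plan:
            names.extend(find_index_names(item))
    return names


# Trimmed-down stand-in for the dictionary returned by explain()
explanation = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "name_1"},
        }
    }
}
print(find_index_names(explanation["queryPlanner"]["winningPlan"]))
```

An empty list means no index scan appears in the plan, which usually indicates a full collection scan.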
Implement MongoDB schema validation
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
# Create collection with validation
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"}
        }
    }
}
if "validated_customers" in db.list_collection_names():
    db.validated_customers.drop()
db.create_collection("validated_customers", validator=validator)
validated_coll = db.validated_customers
# Insert valid document
valid_doc = {"name": "John", "email": "john@example.com"}
result = validated_coll.insert_one(valid_doc)
print(f"Valid document inserted with ID: {result.inserted_id}")
--OUTPUT--
Valid document inserted with ID: 64a7f8d93652e845c2d8f1a6
While MongoDB's flexibility is a major advantage, you can enforce a specific document structure using schema validation. You define a validator object with rules using $jsonSchema and apply it when you run create_collection(). This ensures data integrity from the start.
- The schema in the example makes the name and email fields required.
- It also validates data types and uses a pattern to check that the email field contains an @ with at least one character on each side, a loose approximation of an email address.
With MongoDB's default validation settings, inserts and updates that break these rules are rejected, which keeps inconsistent or invalid data out of the validated_customers collection.
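It can also help to check documents in Python before sending them, so users get a specific message rather than a database error. The sketch below mirrors the example schema on the client side. The helper validation_errors is hypothetical, and it complements server-side validation rather than replacing it.

```python
import re

EMAIL_PATTERN = re.compile(r"^.+@.+$")  # same pattern as the $jsonSchema validator


def validation_errors(document):
    """Return a list of problems a document would have under the example schema."""
    errors = []
    for field in ("name", "email"):
        if field not in document:
            errors.append(f"missing required field: {field}")
        elif not isinstance(document[field], str):
            errors.append(f"{field} must be a string")
    email = document.get("email")
    if isinstance(email, str) and not EMAIL_PATTERN.search(email):
        errors.append("email does not match the pattern")
    return errors


print(validation_errors({"name": "John", "email": "john@example.com"}))
print(validation_errors({"name": "John"}))
```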
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. You can describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the MongoDB techniques we've explored, Replit Agent can turn them into production-ready tools:
- Build a customer management dashboard that uses insert_one() to add new users and find() to search for them.
- Create a real-time analytics tool that leverages the aggregation pipeline to calculate metrics like average user age with $avg.
- Deploy a user registration system that enforces data integrity with schema validation, ensuring every new account has a valid email.
Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.
Common errors and challenges
Even with its flexibility, you might run into a few common roadblocks when integrating MongoDB into your Python applications.
Working with MongoDB ObjectId in queries
MongoDB automatically assigns a unique _id to every document, but it's not a simple string—it's a special ObjectId type. A frequent mistake is trying to query for a document using its ID as a plain string, which won't return any results because the types don't match.
- To fix this, you need to convert the string ID into a proper ObjectId object before passing it to your query.
- First, import ObjectId from the bson.objectid module, then wrap the ID string like this: find_one({"_id": ObjectId("your_id_string")}). This ensures your query uses the correct data type.
Handling connection errors with try-except
Your application can't always assume the database will be available. Network problems or server downtime can lead to connection failures, which will crash your script if they aren't handled.
The best practice is to wrap your connection code in a try-except block. This allows you to catch exceptions like ConnectionFailure and handle them gracefully—perhaps by logging the error and retrying the connection or notifying the user that the database is temporarily unavailable.
Resolving data type mismatches in MongoDB queries
MongoDB queries are sensitive to data types. If you store a number in a field like age, you must use a number in your query to find it. Searching with a string, such as {"age": "30"}, won't match a document where the age is stored as the integer 30.
This kind of mismatch is a common source of "empty" results that can be tricky to debug. Always double-check that the data types in your query filter match the types of the data stored in your collection to ensure your queries work as expected.
Working with MongoDB ObjectId in queries
Because MongoDB's unique _id is a special ObjectId type, you can't query for it using a plain string. This data type mismatch is a frequent source of empty results. The following code demonstrates what happens when you try it.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# This won't work - string ID won't match ObjectId
document_id = "64a7f8d93652e845c2d8f1a2"
result = collection.find_one({"_id": document_id})
print(f"Document found: {result}")
The query returns None because it tries to match a plain string with MongoDB's special ObjectId type. Since the types are different, the search fails. The following code shows how to properly query by ID.
from pymongo import MongoClient
from bson.objectid import ObjectId
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# Convert string ID to ObjectId for correct matching
document_id = "64a7f8d93652e845c2d8f1a2"
result = collection.find_one({"_id": ObjectId(document_id)})
print(f"Document found: {result}")
The solution is to convert the ID string into a proper ObjectId before querying. You'll need to import ObjectId from the bson.objectid module, which is installed along with PyMongo, and then wrap the string ID, like ObjectId("your_id_string"). This ensures the data types match, allowing MongoDB to find the document. Keep an eye out for this when handling IDs from API requests or URL parameters, as they're typically passed as strings.
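IDs taken from URLs may also be malformed, and ObjectId() raises bson.errors.InvalidId when given a string that is not valid. As a small sketch, the hypothetical helper below checks that a value is a 24-character hexadecimal string before any conversion is attempted. The bson package offers ObjectId.is_valid() for a similar check; this version uses only the standard library so it runs anywhere.

```python
import re

OBJECT_ID_PATTERN = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Return True if value is a 24-character hexadecimal string."""
    return isinstance(value, str) and OBJECT_ID_PATTERN.fullmatch(value) is not None


print(looks_like_object_id("64a7f8d93652e845c2d8f1a2"))
print(looks_like_object_id("not-an-id"))
```

A web handler could return a "not found" response when the check fails instead of letting the conversion error propagate.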
Handling connection errors with try-except
Database connections can be unreliable. MongoClient connects lazily, so creating the client does not fail even if your MongoDB server is offline; the first operation that needs the server raises ServerSelectionTimeoutError once the selection timeout expires. If nothing catches that exception, the application crashes. The code below shows what this looks like.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=1000)
db = client['mydatabase']
collection = db['customers']
# The first real operation raises ServerSelectionTimeoutError if MongoDB is not running
collection.find_one()
print("Connected to MongoDB successfully!")
The code runs a query without any error handling. If the server is down, find_one() raises an exception after the one-second selection timeout and the script stops immediately. The next example shows how to catch this error and prevent the crash.
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError
try:
    client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=1000)
    # Force a connection to verify it works
    client.admin.command('ping')
    print("Connected to MongoDB successfully!")
except (ConnectionFailure, ServerSelectionTimeoutError) as e:
    print(f"MongoDB connection error: {e}")
To prevent crashes, wrap your connection logic in a try-except block. This lets you catch specific errors like ConnectionFailure and handle them gracefully. The client.admin.command('ping') call actively checks the connection. If it fails, the except block runs, printing an error message instead of stopping your application. This is crucial for building robust applications that can handle unexpected downtime.
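When an outage is likely to be brief, retrying a few times before giving up is one option. The sketch below is a generic retry helper, not a PyMongo feature. The name call_with_retries and its parameters are assumptions, and the built-in ConnectionError stands in for PyMongo's ConnectionFailure so the example runs without a server.

```python
import time


def call_with_retries(operation, attempts=3, delay_seconds=1.0, retry_on=(Exception,)):
    """Call operation(), retrying on the given exception types.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


# Simulated check that fails twice before succeeding
state = {"calls": 0}


def flaky_ping():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("server unavailable")
    return "pong"


print(call_with_retries(flaky_ping, delay_seconds=0, retry_on=(ConnectionError,)))
```

With PyMongo, the operation could be lambda: client.admin.command('ping') and retry_on could be (ConnectionFailure,).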
Resolving data type mismatches in MongoDB queries
MongoDB queries are sensitive to data types, a detail that can easily trip you up. If you store a field like age as a number but search for it using a string, the query won't find a match. The code below demonstrates this common pitfall.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# The age stored as integer but queried as string
# This won't match any documents
result = collection.find_one({"age": "30"})
print(f"Customer found: {result}")
The query for {"age": "30"} returns nothing because the database stores the age as a number, not a string. This type mismatch causes the search to fail. The following code shows the correct way to query.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# Use the correct data type (integer) in the query
result = collection.find_one({"age": 30})
print(f"Customer found: {result}")
The fix is to ensure your query's data type matches what's stored in the database. The query {"age": 30} works because it uses an integer, correctly matching the document's data. This issue is especially common when handling user input from web forms or APIs, which typically provide data as strings. Always verify and, if necessary, convert data types before querying to prevent your searches from returning empty results unexpectedly.
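For input that arrives as text, a small conversion step before building the filter keeps the types aligned. The helper parse_age below is a hypothetical sketch of that step; it rejects values that cannot be read as a whole number instead of silently querying with the wrong type.

```python
def parse_age(raw):
    """Convert form or API input to an int suitable for an age query."""
    if isinstance(raw, bool):
        raise ValueError("age must be a number, not a boolean")
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str):
        return int(raw.strip())  # raises ValueError for non-numeric text
    raise ValueError(f"unsupported age value: {raw!r}")


query = {"age": parse_age(" 30 ")}
print(query)
```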
Real-world applications
These MongoDB techniques translate directly into practical applications, from logging user activity to building powerful sales analytics dashboards.
Log user activity in a web application with datetime
A practical application of MongoDB is creating a logging system where each user action is recorded as a document containing a precise timestamp from Python's datetime module.
from pymongo import MongoClient
from datetime import datetime, timezone
client = MongoClient('mongodb://localhost:27017/')
db = client['webapp']
logs = db['user_activity']
# Log a user login event
log_entry = {
    "user_id": "user123",
    "action": "login",
    # Use an explicit UTC timestamp; PyMongo treats naive datetimes as UTC
    "timestamp": datetime.now(timezone.utc),
    "ip_address": "192.168.1.1"
}
logs.insert_one(log_entry)
# Query recent activity for a specific user
recent_activity = list(logs.find(
    {"user_id": "user123"},
    {"action": 1, "timestamp": 1, "_id": 0}
).sort("timestamp", -1).limit(5))
print(recent_activity)
This example shows how to handle time-sensitive data. A document is created with a current UTC timestamp using datetime.now(timezone.utc) and saved with insert_one().
- The query chains multiple methods to refine the results. find() filters for a specific user, and a projection is used to return only the action and timestamp fields.
- sort("timestamp", -1) arranges the results from newest to oldest.
- limit(5) then restricts the output to the top five entries, giving you a concise history of recent activity.
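Log queries are often limited to a time window rather than a fixed number of entries. As a sketch, assuming timestamps are stored in UTC, the hypothetical helper below builds a filter for one user's activity over the last few hours using the "$gte" operator.

```python
from datetime import datetime, timedelta, timezone


def activity_since_filter(user_id, hours, now=None):
    """Build a find() filter for a user's log entries from the last `hours` hours."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return {"user_id": user_id, "timestamp": {"$gte": cutoff}}


# A fixed "now" keeps the example reproducible
fixed_now = datetime(2023, 7, 3, 12, 0, tzinfo=timezone.utc)
print(activity_since_filter("user123", 24, now=fixed_now))
```

The filter can be passed to logs.find() in place of the plain user_id filter shown earlier.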
Implement data aggregation for sales analytics with $group and $sort
You can build a sales analytics report by using the aggregation pipeline to summarize transactions by category with $group and then rank the categories with $sort.
from pymongo import MongoClient
from datetime import datetime
from bson.son import SON
client = MongoClient('mongodb://localhost:27017/')
db = client['store']
sales = db['transactions']
# Insert sample sales data
sales.insert_many([
    {"product": "laptop", "category": "electronics", "price": 1200, "date": datetime(2023, 7, 1)},
    {"product": "phone", "category": "electronics", "price": 800, "date": datetime(2023, 7, 2)},
    {"product": "desk", "category": "furniture", "price": 350, "date": datetime(2023, 7, 3)},
    {"product": "chair", "category": "furniture", "price": 150, "date": datetime(2023, 7, 3)}
])
# Analyze sales by category with sum and average
pipeline = [
    {"$group": {
        "_id": "$category",
        "total_sales": {"$sum": "$price"},
        "avg_price": {"$avg": "$price"},
        "count": {"$sum": 1}
    }},
    {"$sort": SON([("total_sales", -1)])}
]
results = list(sales.aggregate(pipeline))
for result in results:
    print(f"{result['_id']}: ${result['total_sales']} ({result['count']} items, avg: ${result['avg_price']:.2f})")
This code uses an aggregation pipeline to analyze sales data directly within MongoDB. The pipeline processes documents in stages, transforming raw transaction data into a summarized report.
- The $group stage organizes transactions by their category. It then calculates the total sales with $sum, the average price with $avg, and the number of items for each group.
- Next, the $sort stage arranges the resulting categories in descending order based on total_sales, putting the highest-revenue ones first.
Because the grouping and sorting run inside MongoDB, only the summarized results are sent back to your Python code, which makes this approach well suited to on-the-fly analytics.
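A report like this is often restricted to a date range. One way to do that, sketched here as an assumption rather than a fixed recipe, is to place a $match stage ahead of $group so only the relevant transactions are summarized. The function build_sales_pipeline is hypothetical; the field names follow the sample data above.

```python
from datetime import datetime


def build_sales_pipeline(start, end):
    """Pipeline that keeps sales in [start, end) and then summarizes by category."""
    return [
        {"$match": {"date": {"$gte": start, "$lt": end}}},
        {"$group": {
            "_id": "$category",
            "total_sales": {"$sum": "$price"},
            "avg_price": {"$avg": "$price"},
            "count": {"$sum": 1},
        }},
        {"$sort": {"total_sales": -1}},
    ]


pipeline = build_sales_pipeline(datetime(2023, 7, 1), datetime(2023, 7, 3))
print([next(iter(stage)) for stage in pipeline])
```

Filtering first means the later stages process fewer documents, and the returned list can be passed to sales.aggregate() unchanged.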
Get started with Replit
Turn what you’ve learned into a real tool. Describe your idea to Replit Agent, like “a user activity logger that saves events to MongoDB” or “a sales dashboard that aggregates data by category.”
The agent writes the code, tests for errors, and deploys your app automatically. It handles the entire development lifecycle from start to finish. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.



