How to use MongoDB in Python
Learn how to use MongoDB with Python. This guide covers various methods, tips, real-world applications, and common error debugging.

MongoDB is a popular NoSQL database that pairs powerfully with Python for flexible data management. Its document-oriented structure makes it a great choice for modern applications that require scalability and speed.
In this article, you'll explore key techniques to connect Python with MongoDB. You'll find practical tips, real-world applications, and debugging advice to help you build robust, data-driven projects with confidence.
Connect to MongoDB with PyMongo
# Install pymongo if not already installed
# pip install pymongo
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
print("Connected to MongoDB successfully!")
Output:
Connected to MongoDB successfully!
To begin, you import MongoClient from the PyMongo library, which is the primary tool for establishing a connection. You then create a client instance by passing it a connection string. The string 'mongodb://localhost:27017/' tells PyMongo to connect to a MongoDB server running on your local machine (localhost) at the default port, 27017.
Once connected, you can select a database and a collection using dictionary-style syntax, like client['mydatabase']. A key feature of MongoDB is its flexibility; if the database or collection you specify doesn't exist, it will be created automatically the first time you insert data.
Basic MongoDB operations
With the connection established, you can now interact with your data by creating, querying, updating, and deleting documents in your collection.
Create and insert documents with insert_one() and insert_many()
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
result = collection.insert_one({"name": "John", "age": 30})
print(f"Inserted document ID: {result.inserted_id}")
customers = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]
results = collection.insert_many(customers)
print(f"Number of documents inserted: {len(results.inserted_ids)}")
Output:
Inserted document ID: 64a7f8d93652e845c2d8f1a2
Number of documents inserted: 2
To add data, you use the insert_one() method for a single document or insert_many() for multiple. These methods take Python dictionaries as arguments, which become documents in your collection. MongoDB automatically assigns a unique _id to each new document.
- The insert_one() method returns an object containing the new document's unique inserted_id.
- For bulk operations, insert_many() accepts a list of dictionaries and returns an object with a list of all the new inserted_ids.
Query documents with find() and find_one()
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
doc = collection.find_one({"name": "John"})
print(f"Found document: {doc}")
cursor = collection.find({"age": {"$gt": 25}})
for customer in cursor:
    print(f"{customer['name']}: {customer['age']}")
Output:
Found document: {'_id': ObjectId('64a7f8d93652e845c2d8f1a2'), 'name': 'John', 'age': 30}
John: 30
Bob: 35
Retrieving data is straightforward with PyMongo's query methods. You can fetch a single document using find_one(), which returns the first entry matching your query. If no document is found, it returns None.
- To get multiple documents, use the find() method. This returns a cursor, an iterable object you can loop through to process each result.
- Queries can be simple key-value matches or use advanced operators. For example, {"age": {"$gt": 25}} uses the $gt operator to find all customers older than 25.
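Because query filters are plain Python dictionaries, you can compose them programmatically before handing them to find(). A small sketch using the standard $gte, $lt, $in, and $and operators (the field values are illustrative):

```python
# Query filters are ordinary dicts, so they can be built up in pieces.
age_range = {"age": {"$gte": 25, "$lt": 40}}        # 25 <= age < 40
name_choices = {"name": {"$in": ["Alice", "Bob"]}}  # name is one of these
combined = {"$and": [age_range, name_choices]}      # both conditions must hold

print(combined)
```

Passing combined to collection.find() would return only customers named Alice or Bob whose age falls in that range.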
Update and delete documents with MongoDB
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
update_result = collection.update_one(
{"name": "John"},
{"$set": {"age": 31}}
)
print(f"Modified {update_result.modified_count} document")
delete_result = collection.delete_one({"name": "Alice"})
print(f"Deleted {delete_result.deleted_count} document")
Output:
Modified 1 document
Deleted 1 document
To modify existing data, you use the update_one() method. It requires two arguments: a query to locate the document and an update document using the $set operator to specify which fields to change.
- For removing data, delete_one() finds and deletes the first document that matches your query.
- Both methods return result objects that confirm how many documents were affected via the modified_count and deleted_count properties, respectively.
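Update documents are dictionaries too, and you can combine several standard update operators in one call, such as $set alongside $inc. A sketch (the status field is a hypothetical example):

```python
# An update document combining two standard MongoDB update operators.
update_doc = {
    "$set": {"status": "active"},  # overwrite (or create) the status field
    "$inc": {"age": 1},            # increment the numeric age field by 1
}
# collection.update_many({"age": {"$gte": 30}}, update_doc)

print(update_doc)
```

Used with update_many(), this would apply both changes to every matching document in a single round trip.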
Advanced MongoDB techniques
Moving beyond simple data manipulation, you can unlock deeper insights and greater control over your data with aggregation, indexing, and schema validation.
Use MongoDB aggregation pipeline for data analysis
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
pipeline = [
{"$match": {"age": {"$gt": 25}}},
{"$group": {"_id": None, "avg_age": {"$avg": "$age"}, "count": {"$sum": 1}}},
{"$project": {"_id": 0, "avg_age": 1, "count": 1}}
]
result = list(collection.aggregate(pipeline))
print(f"Aggregation result: {result[0]}")
Output:
Aggregation result: {'avg_age': 33.0, 'count': 2}
The aggregation framework lets you perform complex data processing directly in the database. You define a pipeline—a series of stages—that transforms your data step-by-step. Each stage takes the output from the previous one, allowing for sophisticated analysis.
- The $match stage acts as a filter, selecting only documents where the age is greater than 25.
- The $group stage then takes those documents and calculates the average age using $avg and the total count with $sum.
- Finally, $project reshapes the output, removing the default _id and including only the final avg_age and count fields.
Improve query performance with MongoDB indexes
from pymongo import MongoClient, ASCENDING
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# Create an index on the name field
collection.create_index([("name", ASCENDING)])
# List all indexes in the collection
indexes = list(collection.list_indexes())
print("Collection indexes:")
for index in indexes:
    print(f" - {index['name']}: {index['key']}")
# Use explain to see how a query uses the index
explanation = collection.find({"name": "John"}).explain()
print(f"Query uses index: {explanation['queryPlanner']['winningPlan']['inputStage']['indexName']}")
Output:
Collection indexes:
- _id_: SON([('_id', 1)])
- name_1: SON([('name', 1)])
Query uses index: name_1
Indexes are a game-changer for performance, especially as your dataset grows. Using create_index() on a field like name builds a special data structure that makes lookups incredibly fast. Think of it like the index in a book—it lets MongoDB jump straight to the right data instead of reading every page.
- You can confirm your index exists by calling list_indexes(), which shows all active indexes for the collection.
- To see if your query is actually using the index, the explain() method is your best friend. It reveals the query plan, confirming that MongoDB used the index for a faster search.
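Indexes can also span multiple fields or enforce uniqueness. An index specification is just a list of (field, direction) pairs, where 1 means ascending and -1 descending (PyMongo exposes these as the ASCENDING and DESCENDING constants). A sketch (the email field is an assumption):

```python
# A compound index specification: sort by name ascending, then age descending.
compound_index = [("name", 1), ("age", -1)]
# collection.create_index(compound_index)

# A unique index additionally rejects documents with duplicate values:
# collection.create_index([("email", 1)], unique=True)

print(compound_index)
```

Compound indexes support queries that filter or sort on the leading fields, so the order of the pairs matters.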
Implement MongoDB schema validation
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
# Create collection with validation
validator = {
"$jsonSchema": {
"bsonType": "object",
"required": ["name", "email"],
"properties": {
"name": {"bsonType": "string"},
"email": {"bsonType": "string", "pattern": "^.+@.+$"}
}
}
}
if "validated_customers" in db.list_collection_names():
    db.validated_customers.drop()
db.create_collection("validated_customers", validator=validator)
validated_coll = db.validated_customers
# Insert valid document
valid_doc = {"name": "John", "email": "[email protected]"}
result = validated_coll.insert_one(valid_doc)
print(f"Valid document inserted with ID: {result.inserted_id}")
Output:
Valid document inserted with ID: 64a7f8d93652e845c2d8f1a6
While MongoDB is known for its flexibility, schema validation is a powerful feature for enforcing data consistency. You define a validator object using $jsonSchema to set the rules for your documents. This validator is then applied when you create a collection with create_collection().
- The required key lists all mandatory fields, such as name and email in this example.
- The properties key defines data types (bsonType) and can even use a pattern to check formats, ensuring emails are structured correctly.
This approach ensures that only documents matching your schema can be inserted, protecting your data's integrity at the database level.
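To fail fast in application code, you can mirror the schema's required rule locally before hitting the database. This helper is a sketch, not part of PyMongo; the server-side validator remains the authority, and an invalid insert against it raises pymongo.errors.WriteError:

```python
# A lightweight client-side mirror of the schema's "required" rule.
REQUIRED_FIELDS = ["name", "email"]

def missing_fields(doc):
    """Return the required fields absent from doc."""
    return [field for field in REQUIRED_FIELDS if field not in doc]

print(missing_fields({"name": "John"}))                             # ['email']
print(missing_fields({"name": "John", "email": "john@example.com"}))  # []
```

Checking documents up front lets you return a helpful error message instead of surfacing a raw database exception.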
Move faster with Replit
Replit is an AI-powered development platform that comes with all Python dependencies pre-installed. This lets you skip the setup and start coding instantly, without worrying about environment configuration.
Instead of piecing together individual techniques, you can use Agent 4 to build a complete application from a simple description. It helps you move from learning methods like find() and aggregate() to building real products. For example, you could create:
- A customer management dashboard that uses find() to search and display user profiles from your database.
- An analytics tool that runs an aggregation pipeline to calculate key metrics, like average user age or total sign-ups per day.
- A data entry utility that enforces a schema to validate new records before using insert_one() to add them to a collection.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Even with the right tools, you might run into a few common hurdles when integrating Python and MongoDB.
Working with MongoDB ObjectId in queries
One of the most frequent mix-ups involves MongoDB's unique _id field. It's not a simple string but a special ObjectId type. If you try to query for a document using just the string version of its ID, your query will come up empty.
- To fix this, you need to import ObjectId from the bson.objectid module.
- Then, wrap the ID string in the ObjectId() constructor within your query, like find_one({"_id": ObjectId("your_id_string")}). This ensures MongoDB can correctly locate the document.
Handling connection errors with try-except
Your application can't always assume a successful database connection. The server might be down, or credentials could be wrong. Without proper error handling, a connection failure will crash your script.
The standard Pythonic way to manage this is with a try-except block. By wrapping your connection logic and catching specific exceptions like pymongo.errors.ConnectionFailure, you can build more resilient applications that handle interruptions gracefully.
Resolving data type mismatches in MongoDB queries
MongoDB is precise about data types during queries. A common mistake is searching for a number where a string is stored, or vice versa. For example, a query for {"age": 30} will not find a document where the age is stored as "30".
This strictness means you need to be consistent with your data types. If your queries aren't returning the results you expect, double-check that the data types in your query match the types stored in the database. This is where schema validation, discussed earlier, can be a lifesaver.
Working with MongoDB ObjectId in queries
A frequent stumbling block is treating MongoDB's unique _id as a simple string. It's actually a special ObjectId type, and this distinction matters. If you query for a document using its ID as a string, the database won't find it. The code below shows this exact problem in action.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# This won't work - string ID won't match ObjectId
document_id = "64a7f8d93652e845c2d8f1a2"
result = collection.find_one({"_id": document_id})
print(f"Document found: {result}")
The code returns None because it passes a raw string to the _id field. The stored value is an ObjectId, and MongoDB compares types strictly, so the string never matches. See the corrected version below.
from pymongo import MongoClient
from bson.objectid import ObjectId
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# Convert string ID to ObjectId for correct matching
document_id = "64a7f8d93652e845c2d8f1a2"
result = collection.find_one({"_id": ObjectId(document_id)})
print(f"Document found: {result}")
The solution is to convert the string ID into a MongoDB ObjectId. By importing ObjectId from bson.objectid and wrapping the ID string in your query—{"_id": ObjectId(document_id)}—you provide the exact data type MongoDB needs to locate the document. This mismatch often occurs when you use an ID that has been passed through your application as a string, for example, from a URL parameter. This small change makes your query successful.
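When IDs arrive from untrusted input, it helps to check their shape before converting. An ObjectId string is exactly 24 hexadecimal characters; bson's ObjectId.is_valid() performs this check for you, and the regex below is a dependency-free sketch of the same shape test:

```python
import re

# ObjectId strings are 24 hex characters; this mirrors ObjectId.is_valid().
def looks_like_object_id(value):
    return bool(re.fullmatch(r"[0-9a-fA-F]{24}", value))

print(looks_like_object_id("64a7f8d93652e845c2d8f1a2"))  # True
print(looks_like_object_id("not-an-id"))                 # False
```

Validating first lets you return a clean "invalid ID" error instead of letting ObjectId() raise an InvalidId exception deep in your request handler.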
Handling connection errors with try-except
A resilient application must anticipate that a database connection isn't guaranteed. If your MongoDB server is offline or unreachable, any script that tries to connect without safeguards will immediately crash. The following code shows what happens in this exact scenario.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=1000)
db = client['mydatabase']
collection = db['customers']
# The program will crash if MongoDB is not running
print("Connected to MongoDB successfully!")
This code tries to connect but has no fallback for when the server is down. PyMongo raises an exception that crashes the script. The corrected version below shows how to build a more resilient connection.
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError
try:
    client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=1000)
    # Force a connection to verify it works
    client.admin.command('ping')
    print("Connected to MongoDB successfully!")
except (ConnectionFailure, ServerSelectionTimeoutError) as e:
    print(f"MongoDB connection error: {e}")
To prevent crashes from a downed server, wrap your connection logic in a try-except block. This allows your application to handle interruptions gracefully instead of stopping unexpectedly. This approach is essential for building robust applications that can report an error and continue running.
- The code catches specific errors like ConnectionFailure.
- It uses client.admin.command('ping') to actively verify the connection, since PyMongo connects lazily by default.
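For transient outages you can go a step further and retry with backoff before giving up. This is a generic sketch: connect stands in for building a MongoClient and issuing the ping command, and OSError stands in for PyMongo's ConnectionFailure:

```python
import time

# Retry a connection attempt with exponential backoff (sketch).
def with_retries(connect, attempts=3, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return connect()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            time.sleep(base_delay * (2 ** attempt))  # back off and retry

# A stand-in connect function that fails twice, then succeeds.
state = {"calls": 0}

def flaky_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise OSError("server unavailable")
    return "connected"

print(with_retries(flaky_connect))  # connected (after two failed attempts)
```

Keep the attempt count and delays small for interactive applications so a dead server fails fast rather than hanging the user.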
Resolving data type mismatches in MongoDB queries
MongoDB is precise about data types, a detail that often trips developers up. A query that uses a string value where a number is stored returns no results, even if the characters look the same. The code below demonstrates this common mistake in action.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# The age stored as integer but queried as string
# This won't match any documents
result = collection.find_one({"age": "30"})
print(f"Customer found: {result}")
The query returns None because it searches for the string "30", but the database stores the age as an integer. This data type mismatch causes the search to fail. See the corrected version below for the solution.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']
collection = db['customers']
# Use the correct data type (integer) in the query
result = collection.find_one({"age": 30})
print(f"Customer found: {result}")
The fix is simple but crucial: your query must use the correct data type. The corrected code works because it searches for the integer 30, which matches how the age is stored in the database. The previous attempt failed because it looked for the string "30". Keep an eye out for this when handling user input or API data, as values often arrive as strings by default. Always ensure your query types match your schema.
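The practical habit is to coerce incoming values to the stored type before building the query. A minimal sketch (the field name is illustrative):

```python
# Values from URLs, forms, or CSV files arrive as strings by default.
raw_age = "30"                 # e.g. a request parameter
query = {"age": int(raw_age)}  # coerced to match documents storing age as int

print(query)
```

If the input might not be numeric, wrap the int() call in a try-except ValueError so bad input produces a clear error instead of a silent empty result.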
Real-world applications
Putting these concepts into practice helps you build powerful features, from logging user activity to running complex sales analytics.
Log user activity in a web application with datetime
MongoDB's native support for date objects makes it simple to log events with Python's datetime module, allowing you to perform time-based queries efficiently.
from pymongo import MongoClient
from datetime import datetime
client = MongoClient('mongodb://localhost:27017/')
db = client['webapp']
logs = db['user_activity']
# Log a user login event
log_entry = {
"user_id": "user123",
"action": "login",
"timestamp": datetime.now(),
"ip_address": "192.168.1.1"
}
logs.insert_one(log_entry)
# Query recent activity for a specific user
recent_activity = list(logs.find(
{"user_id": "user123"},
{"action": 1, "timestamp": 1, "_id": 0}
).sort("timestamp", -1).limit(5))
print(recent_activity)
This code demonstrates how to create and retrieve time-sensitive data. First, it builds a log entry as a Python dictionary, using datetime.now() to capture the current time. This entry is then saved to the database with insert_one().
The second part shows a powerful query chain to fetch recent activity. It uses:
- find() to filter for a specific user and project only the needed fields.
- sort() to arrange the logs from newest to oldest.
- limit() to retrieve just the five most recent entries.
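Because timestamps are stored as real date objects, you can also filter by time windows with Python's timedelta. A sketch (the 24-hour window and field names are illustrative):

```python
from datetime import datetime, timedelta

# Build a filter for activity within the last 24 hours.
cutoff = datetime.now() - timedelta(hours=24)
recent_query = {"user_id": "user123", "timestamp": {"$gte": cutoff}}
# logs.find(recent_query).sort("timestamp", -1)

print(recent_query)
```

The same pattern supports daily or weekly reports: just change the timedelta and pair $gte with $lt to bound both ends of the window.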
Implement data aggregation for sales analytics with $match and $group
MongoDB's aggregation pipeline is ideal for sales analytics, allowing you to filter transactions with $match and then use $group to calculate key metrics like total revenue and average price per category.
from pymongo import MongoClient
from bson.son import SON
from datetime import datetime
client = MongoClient('mongodb://localhost:27017/')
db = client['store']
sales = db['transactions']
# Insert sample sales data
sales.insert_many([
{"product": "laptop", "category": "electronics", "price": 1200, "date": datetime(2023, 7, 1)},
{"product": "phone", "category": "electronics", "price": 800, "date": datetime(2023, 7, 2)},
{"product": "desk", "category": "furniture", "price": 350, "date": datetime(2023, 7, 3)},
{"product": "chair", "category": "furniture", "price": 150, "date": datetime(2023, 7, 3)}
])
# Analyze sales by category with sum and average
pipeline = [
{"$group": {
"_id": "$category",
"total_sales": {"$sum": "$price"},
"avg_price": {"$avg": "$price"},
"count": {"$sum": 1}
}},
{"$sort": SON([("total_sales", -1)])}
]
results = list(sales.aggregate(pipeline))
for result in results:
    print(f"{result['_id']}: ${result['total_sales']} ({result['count']} items, avg: ${result['avg_price']:.2f})")
This code uses an aggregation pipeline to analyze sales data directly in the database. The aggregate() method processes documents through a series of stages to produce a summarized result.
- The $group stage first organizes all transactions by their category. It then calculates the total sales, average price, and item count for each group using operators like $sum and $avg.
- Next, the $sort stage arranges the resulting categories by total_sales in descending order, showing the most profitable ones first.
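Since stages run in order, you can prepend a $match stage to restrict the analysis to a date range before grouping, which reduces the work the later stages do. A sketch (the July 2023 window is illustrative):

```python
from datetime import datetime

# Filter to one month, then group and sort as before.
pipeline = [
    {"$match": {"date": {"$gte": datetime(2023, 7, 1),
                         "$lt": datetime(2023, 8, 1)}}},
    {"$group": {"_id": "$category", "total_sales": {"$sum": "$price"}}},
    {"$sort": {"total_sales": -1}},
]
# results = list(sales.aggregate(pipeline))

print(len(pipeline))  # 3
```

Filtering early also lets MongoDB use an index on the date field, which matters as the transactions collection grows.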
Get started with Replit
Turn your knowledge into a real tool with Replit Agent. Describe what you want to build, like “a sales dashboard that aggregates transactions by category” or “a utility to log and retrieve recent user activity.”
Replit Agent writes the code, tests for errors, and helps you deploy your application. It handles the heavy lifting so you can focus on your idea. Start building with Replit.
