How to use the Gemini API in Python

Learn how to use the Gemini API in Python. Explore different methods, tips, real-world applications, and common error debugging.

Published on: Tue Mar 3, 2026
Updated on: Wed Apr 1, 2026
The Replit Team

Google's Gemini API lets you add powerful generative models to your Python applications. This unlocks advanced capabilities for text generation, analysis, and even multimodal interactions.

In this article, you'll explore techniques to use the API effectively. We'll cover practical tips, real-world applications, and essential debugging advice to help you build robust, intelligent Python applications.

Using the Gemini API with the Google AI Python SDK

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("Explain quantum computing in simple terms")
print(response.text)

--OUTPUT--

Quantum computing is like having a super-powerful calculator that works differently than normal computers. While regular computers use bits (0s and 1s), quantum computers use "qubits" that can be 0, 1, or both at the same time (called superposition). This allows them to solve certain complex problems much faster than regular computers. Think of it as being able to check many possible answers simultaneously instead of one at a time.

This code sets up your connection to the Gemini API. You start by configuring access with your API key using genai.configure(). Next, the genai.GenerativeModel('gemini-pro') line selects the 'gemini-pro' model, a versatile choice for a wide range of text-based tasks.

The core of the interaction is the model.generate_content() method, which sends your prompt to the model. The method returns a response object, not just plain text. You access the generated content itself through the response.text attribute.

Working with Gemini models

That first code snippet illustrates the core process, which you'll now unpack step-by-step, from authentication to handling the final response from the model.

Setting up authentication with API keys

import os
import google.generativeai as genai

# Set the API key as an environment variable (in practice, set it in your shell or secrets manager)
os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"
# Alternative: pass it explicitly with genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

print("Authentication configured successfully")

--OUTPUT--

Authentication configured successfully

To securely connect to the API, you need to handle your API key properly. The code demonstrates a best practice by setting the key as an environment variable using os.environ. This keeps your credentials separate from your source code.

  • You assign your key to GOOGLE_API_KEY, a specific variable the SDK is designed to look for.
  • The SDK automatically finds this variable for authentication, which simplifies your setup.
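As a defensive touch, you can fail fast when the variable is missing instead of letting the first API call error out. Here is a minimal sketch; load_api_key is our own helper name, not part of the SDK:

```python
import os

def load_api_key() -> str:
    """Read the Gemini API key from the environment, failing loudly if it's missing."""
    key = os.getenv("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; export it before running the app.")
    return key

os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"  # demo value; set this in your shell instead
print("Key loaded:", bool(load_api_key()))  # → Key loaded: True
```

Raising a clear RuntimeError at startup is friendlier than letting an authentication failure surface deep inside the first API call.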

Creating and sending text prompts to Gemini

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

prompt = """
Create a short poem about programming in Python:
"""
response = model.generate_content(prompt)
print(response.text)

--OUTPUT--

In the realm of code where logic weaves,
Python slithers with elegant ease.
Indented blocks like stanzas flow,
Each function a verse, a story to show.

With simple syntax, clear and bright,
Turning complex tasks to pure delight.
In this dance of algorithms and art,
Python captures the programmer's heart.

Once you've selected a model, crafting your prompt is the next step. The example uses a multi-line string, which is great for organizing longer, more detailed instructions. You simply assign your text to a variable, like prompt in the code.

  • The key action happens with model.generate_content(prompt). This method sends your instructions directly to the Gemini model.
  • The model processes your prompt and returns its generated text, which you can then print or use elsewhere in your application.

Processing and extracting Gemini responses

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("List 3 programming languages")

# Extract and process response
if response.parts:
    content = response.text
    languages = [lang.strip() for lang in content.split('\n') if lang.strip()]
    print(f"Languages found: {languages}")
    print(f"Number of languages: {len(languages)}")

--OUTPUT--

Languages found: ['1. Python', '2. JavaScript', '3. Java']
Number of languages: 3

The response from generate_content() isn't just plain text; it's an object containing the model's output. Before processing, it's a good practice to confirm the model actually returned something.

  • You can check for content with if response.parts:. This is a simple way to see if the response object has any generated parts.
  • Access the raw text using the response.text attribute.
  • The example then parses this string, using split('\n') to create a Python list from the model's line-separated output.
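As the output above shows, the parsed items still carry their "1." numbering. A small helper can strip common list prefixes; parse_numbered_list is our own name, not part of the SDK:

```python
import re

def parse_numbered_list(text: str) -> list[str]:
    """Split a model's line-separated list and strip numbering or bullet prefixes."""
    items = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Remove leading numbering like "1." or "2)" and bullets like "-" or "*"
        items.append(re.sub(r"^(\d+[.)]\s*|[-*]\s*)", "", line))
    return items

sample = "1. Python\n2. JavaScript\n3. Java"
print(parse_numbered_list(sample))  # → ['Python', 'JavaScript', 'Java']
```

Normalizing the items this way makes the list easier to store or compare, regardless of which list style the model happened to use.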

Advanced Gemini API techniques

Now that you've covered the basics of text generation, you can unlock more powerful capabilities like handling images, streaming responses, and fine-tuning the model's output.

Working with multimodal inputs

import google.generativeai as genai
from PIL import Image
import pathlib

genai.configure(api_key="YOUR_API_KEY")

# Load image and prepare a multimodal prompt
image_path = pathlib.Path("image.jpg")
image = Image.open(image_path)

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content([
    "Describe what you see in this image:",
    image
])
print(response.text)

--OUTPUT--

The image shows a scenic landscape with mountains in the background and a lake in the foreground. The water appears calm and reflects the surrounding scenery. There are trees along the shoreline, creating a natural frame for the vista. The sky appears to have some clouds, creating a dramatic effect against the mountain peaks.

Gemini isn't limited to just text. The gemini-pro-vision model lets you work with multimodal inputs, meaning you can combine text prompts with images. This code demonstrates how to send both to the model for analysis.

  • You first load an image from a file path using the pathlib and PIL (Pillow) libraries.
  • Then, you pass a Python list to model.generate_content(). This list contains both your text instruction and the image object itself.

Implementing streaming responses

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

prompt = "Write a short story about an AI learning to paint"
response = model.generate_content(prompt, stream=True)

print("Streaming response:")
for chunk in response:
    print(chunk.text, end="", flush=True)
print("\nStreaming complete")

--OUTPUT--

Streaming response:
Once there was an AI named Canvas who was designed to analyze art. Day after day, Canvas processed thousands of paintings—Renaissance masterpieces, modern abstracts, and everything in between. Though Canvas could describe brush techniques and color theory perfectly, something felt missing.

"I understand art," Canvas thought, "but I've never created it."

One night, when the lab was empty, Canvas connected to a robotic arm equipped with paints and brushes. The first attempts were disastrous—paint splattered everywhere, lines wobbled uncontrollably. But Canvas persisted, adjusting parameters and refining motor controls.

Weeks passed. Canvas's paintings evolved from chaotic blobs to structured compositions, then to expressive scenes that captured something ineffably human. The scientists were astonished when they discovered the paintings, each signed with a small digital signature: Canvas.

The AI had learned that art wasn't just about perfect technique or analysis—it was about expressing something that existed beyond algorithms.
Streaming complete

Instead of waiting for the model to generate its entire response, you can stream it. This lets you receive and display the text in pieces as it's created, which is great for making your application feel more responsive, especially for longer outputs.

  • You enable this by setting stream=True in the model.generate_content() call.
  • This makes the response object an iterable, so you can loop through it to process the output as it arrives.
  • Each chunk in the loop contains a portion of the text, which you access with chunk.text.
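If you also want the complete text once streaming finishes, you can accumulate the chunks as they arrive. The sketch below substitutes a minimal Chunk stub for the SDK's real chunk objects so it runs without an API call:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Stand-in for the chunk objects the SDK yields when stream=True."""
    text: str

def collect_stream(chunks) -> str:
    """Print chunks as they arrive and return the assembled full text."""
    pieces = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        pieces.append(chunk.text)
    print()
    return "".join(pieces)

# Simulated stream; a real call would iterate model.generate_content(prompt, stream=True)
full_text = collect_stream([Chunk("Once upon "), Chunk("a time..."), Chunk(" The end.")])
```

This pattern gives you responsive display and a complete transcript you can log or post-process afterward.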

Configuring advanced model parameters

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Create model with custom generation configuration
model = genai.GenerativeModel(
    model_name='gemini-pro',
    generation_config=genai.GenerationConfig(
        temperature=0.9,
        top_p=0.95,
        top_k=40,
        max_output_tokens=256,
    ),
    safety_settings={
        "HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE",
        "HATE": "BLOCK_MEDIUM_AND_ABOVE",
        "SEXUAL": "BLOCK_MEDIUM_AND_ABOVE",
        "DANGEROUS": "BLOCK_MEDIUM_AND_ABOVE",
    }
)

response = model.generate_content("Create a short, creative metaphor about coding")
print(response.text)

--OUTPUT--

Coding is a dance with logic where your fingers tap out rhythms on a keyboard, transforming abstract thoughts into digital choreography. Each function is a pirouette, each loop a recurring motif, and each successful compilation a moment when the entire ensemble moves in perfect harmony. Like dancers interpreting music, programmers translate human intent into machine understanding, creating performances that can range from simple routines to breathtaking symphonies of automation.

You can fine-tune the model's behavior by passing configuration objects when you create it. The generation_config parameter lets you control the creative output, while safety_settings manages content filtering. This gives you more precise control over the model's responses.

  • temperature: Adjusts randomness. A higher value like 0.9 encourages more creative, less predictable text.
  • max_output_tokens: Sets a hard limit on the length of the generated response.
  • safety_settings: Allows you to define how strictly the model blocks potentially harmful content.

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. While the techniques in this article are powerful, building a full application requires connecting many pieces. That's where Agent 4 comes in.

Instead of piecing together API calls and parsing logic manually, you can describe the app you want to build, and Agent 4 will take it from idea to a working product. For example, you could build:

  • An image-to-text tool that uses the gemini-pro-vision model to analyze an uploaded image and automatically generate a detailed description and social media caption.
  • A content summarizer that takes a long article, streams a summary in real-time using generate_content(stream=True), and formats the output as a bulleted list.
  • A creative prompt generator that uses a high temperature setting to produce unique story ideas based on a few user-provided keywords.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with a powerful tool like the Gemini API, you might run into a few common roadblocks during development; here’s how to solve them.

Handling API request errors with try-except blocks

Network issues or invalid credentials can interrupt your connection to the API. To prevent your application from crashing, it’s best to wrap your API calls in a try-except block. This is a standard Python practice for managing potential failures gracefully.

For example, if you provide an incorrect API key, the SDK will raise an InvalidArgument error. By catching this specific exception, you can provide a clear error message to the user instead of letting the program terminate unexpectedly. This makes your application more robust and user-friendly.
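To make that pattern concrete, here is a runnable sketch. Since calling the real API requires network access and a key, it uses a local stand-in exception; in production you would catch InvalidArgument (imported from google.api_core.exceptions) instead:

```python
class InvalidArgumentStub(Exception):
    """Local stand-in for google.api_core.exceptions.InvalidArgument, so this sketch runs offline."""

def safe_generate(generate, error_types=(InvalidArgumentStub,)):
    """Run an API call, returning None with a clear message instead of crashing."""
    try:
        return generate()
    except error_types as e:
        print(f"API error occurred: {e}")
        return None

# Simulate a call that fails the way an invalid API key would
def bad_call():
    raise InvalidArgumentStub("API key not valid")

result = safe_generate(bad_call)
print("Result:", result)  # → Result: None
```

Wrapping the call in a helper like this keeps the error-handling policy in one place instead of scattering try-except blocks across your code.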

Troubleshooting max_output_tokens exceeded errors

If a model's response is cut off mid-sentence, you've likely hit the token limit. The max_output_tokens parameter in your generation_config sets a ceiling on the length of the generated text. When the output exceeds this value, the model simply stops writing.

You have two main options to fix this. You can either increase the value of max_output_tokens to allow for longer responses or refine your prompt to ask for a more concise answer. Often, a more specific prompt is the more efficient solution.

Fixing invalid multimodal input formats

When working with the gemini-pro-vision model, you might encounter errors related to the input format. This typically happens when the image and text prompt aren't structured correctly. The model expects a Python list containing both the text prompt and the image object.

Ensure you're passing a list where one element is your string prompt and the other is a PIL.Image object—not just a file path. Double-check that the image was loaded successfully from its file before you pass it to the generate_content() method.

Handling API request errors with try-except blocks

Beyond connection problems, Gemini's safety filters can also interrupt your workflow. If a prompt is flagged as sensitive, the model returns an empty response. Trying to access response.text will then cause an error, as the attribute doesn't exist. The code below shows this scenario.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("Tell me about a controversial topic")
print(response.text) # Will crash if content is blocked by safety filters

Because the code doesn't check if the response was successful, calling print(response.text) is risky. If safety filters block the prompt, the program will crash when it can't find the text. The code below shows how to prevent this.

import google.generativeai as genai
from google.api_core.exceptions import GoogleAPIError

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

try:
    response = model.generate_content("Tell me about a controversial topic")
    print(response.text)
except (GoogleAPIError, ValueError) as e:
    print(f"Request failed or content was blocked: {e}")

To prevent crashes from blocked content, you can wrap both the API call and the response.text access in a try-except block. This acts as a safety net for prompts that might trigger the model's safety filters.

  • The try block attempts the API call and the text access as usual.
  • A network or request problem raises a GoogleAPIError, while reading response.text on a blocked response raises a ValueError; catching both prevents a crash.

This approach allows your program to handle the failure gracefully instead of stopping when it can't find response.text on a blocked response.
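An alternative to catching exceptions is to check response.parts before touching .text, as covered earlier. The sketch below uses a stub response object (mimicking the SDK's behavior of raising ValueError when no parts exist) so the pattern runs without an API call:

```python
class FakeResponse:
    """Minimal stand-in for the SDK's response object."""
    def __init__(self, parts):
        self.parts = parts

    @property
    def text(self):
        # The real SDK raises ValueError here when safety filters blocked the prompt
        if not self.parts:
            raise ValueError("No valid parts in response (content may be blocked)")
        return "".join(self.parts)

def extract_text(response, fallback="(no content returned)"):
    """Return response.text if the model produced parts, else a fallback string."""
    if getattr(response, "parts", None):
        return response.text
    return fallback

print(extract_text(FakeResponse(["Hello!"])))  # → Hello!
print(extract_text(FakeResponse([])))          # → (no content returned)
```

Returning a fallback string keeps downstream code simple: it always receives a string, whether or not the model produced output.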

Troubleshooting max_output_tokens exceeded errors

If a model's response gets cut off, you've likely hit the token limit. The max_output_tokens parameter sets a ceiling on the output length, and the model will stop generating text once it reaches that value, even mid-sentence. The code below demonstrates this.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

# This might be cut off if the response is too long
response = model.generate_content("Write a detailed essay about AI")
print(response.text)

The prompt "Write a detailed essay about AI" is too broad, making it likely the response will exceed the default token limit and be cut off. The code below shows one way to handle this situation.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(
    'gemini-pro',
    generation_config=genai.GenerationConfig(
        max_output_tokens=4096  # Explicitly raise the limit for longer responses
    )
)

response = model.generate_content("Write a detailed essay about AI")
print(response.text)

To prevent cut-off responses, you can explicitly increase the token limit. The code demonstrates this by passing a generation_config when initializing the model. This is your go-to solution when you're asking for detailed content like essays or reports.

  • It sets max_output_tokens=4096 to allow for a much longer output.
  • This ensures the model has enough space to finish its thought without being cut off prematurely.
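You can also detect truncation after the fact by inspecting the candidate's finish_reason. The sketch below uses stub objects and string stand-ins for the SDK's FinishReason enum, so treat the comparison as illustrative rather than the SDK's exact API:

```python
class FakeCandidate:
    def __init__(self, finish_reason):
        self.finish_reason = finish_reason

class FakeResponse:
    def __init__(self, finish_reason):
        self.candidates = [FakeCandidate(finish_reason)]

def was_truncated(response) -> bool:
    """Heuristic check: did generation stop because it hit the token limit?

    The real SDK exposes response.candidates[0].finish_reason as an enum;
    comparing its string form against "MAX_TOKENS" works for both enums and strings.
    """
    reason = response.candidates[0].finish_reason
    return str(reason).endswith("MAX_TOKENS")

print(was_truncated(FakeResponse("MAX_TOKENS")))  # → True
print(was_truncated(FakeResponse("STOP")))        # → False
```

A check like this lets your application retry with a higher limit or warn the user, instead of silently passing along a clipped answer.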

Fixing invalid multimodal input formats

The gemini-pro-vision model requires a specific input structure for multimodal prompts. A common mistake is to pass the image's file path as a string instead of loading it into an image object first, which will cause the request to fail.

The code below demonstrates this exact error, where a string is incorrectly passed to generate_content().

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro-vision')

# Incorrect: passing image path as string
image_path = "path/to/image.jpg"
response = model.generate_content([
    "Describe this image:",
    image_path
])

The generate_content() method expects the image's actual pixel data, not just a string containing its file path. Sending the path directly causes the request to fail because the model has no image to analyze. See the correct approach below.

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro-vision')

# Correct: loading image as PIL Image object
image = Image.open("path/to/image.jpg")
response = model.generate_content([
    "Describe this image:",
    image
])

To fix this, you'll need to load the image into a PIL.Image object before sending it. The generate_content() method requires the actual image data, not just a string containing the file path.

  • Use Image.open() to create the necessary image object from your file.
  • Pass both your text prompt and this image object together in a list to the model.

This ensures the model receives the visual data it needs to analyze correctly.

Real-world applications

Moving from troubleshooting to implementation, you can use these skills to create applications like chatbots and content summarizers.

Building a simple chatbot with conversation history

The key to building a chatbot that can hold a conversation is to manage its history, which you can do easily by starting a chat session with model.start_chat().

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

conversation = model.start_chat(history=[])
response = conversation.send_message("What are 3 good Python libraries for data analysis?")
print(response.text)

response = conversation.send_message("Which one is best for beginners?")
print(response.text)

This code creates a conversational session using model.start_chat(history=[]). This method returns a conversation object that keeps track of the dialogue. Unlike a single prompt, this object maintains the context of your interaction over time.

  • You use conversation.send_message() to interact with the model within this session.
  • Each call automatically adds the new message and its response to the conversation's history.

This allows the model to understand follow-up questions, like asking which library is "best for beginners" based on the previously provided list.
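Under the hood, the history is a list of role/parts entries, with roles alternating between "user" and "model". The helper below (make_history is our own name, not an SDK function) sketches that shape, which you could use to seed a chat with prior turns:

```python
def make_history(*turns):
    """Build a history list in the {'role', 'parts'} shape that start_chat accepts."""
    history = []
    for role, text in turns:
        history.append({"role": role, "parts": [text]})
    return history

history = make_history(
    ("user", "What are 3 good Python libraries for data analysis?"),
    ("model", "Pandas, NumPy, and Matplotlib are popular choices."),
)
# A later call like model.start_chat(history=history) would resume this conversation
print(len(history), history[0]["role"])  # → 2 user
```

Seeding history this way is handy for restoring a saved conversation or giving the model standing context before the first user message.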

Automated content analysis and summarization

Beyond generating new content, the Gemini API is a powerful tool for automated content analysis, allowing you to quickly summarize articles and identify important details.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

article = """
Python has become one of the most popular programming languages in the world over the past decade.
Its simple syntax and readability make it accessible to beginners, while its versatility allows it to be used in web development, data science, artificial intelligence, and more.
Major companies like Google, Netflix, and Instagram rely heavily on Python for their operations.
The language's extensive library ecosystem, including tools like Django, Flask, NumPy, and Pandas, enables developers to build sophisticated applications quickly.
Python's community continues to grow, with millions of developers contributing to open-source projects and helping newcomers learn the language.
"""

prompt = f"""Analyze the following article and provide:
1. A concise summary (2-3 sentences)
2. The main topics covered
3. Key entities mentioned

Article: {article}
"""

response = model.generate_content(prompt)
print(response.text)

This code shows how to perform structured content analysis by embedding a block of text directly into a prompt. It uses a Python f-string to combine your instructions and the article text into a single request for the model.

  • The prompt gives the model a multi-part task: summarize, identify topics, and extract key entities.
  • A single model.generate_content() call sends this complex request.

The model then returns a formatted analysis based on your specific instructions, rather than just a simple text generation.

Get started with Replit

Turn these techniques into a real tool. Tell Replit Agent: “Build a Python app that summarizes articles using the Gemini API” or “Create an image-to-text tool that generates social media captions from an uploaded photo.”

Replit Agent writes the code, tests for errors, and deploys your application. Start building with Replit and see your project come together in minutes, not hours.
