How to use the Gemini API in Python

Your guide to the Gemini API in Python. Learn different methods, tips and tricks, real-world applications, and how to debug errors.

Published on: Tue, Mar 3, 2026
Updated on: Wed, Mar 4, 2026
By The Replit Team

You can integrate the Gemini API's powerful generative AI into your Python applications. Its versatile models help you build features for content generation or complex data analysis.

In this article, you'll learn essential techniques and tips to use the API effectively. You will explore real-world applications from chatbots to data summarization and get practical advice to debug your code.

Using the Gemini API with the Google AI Python SDK

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("Explain quantum computing in simple terms")
print(response.text)

Output:

Quantum computing is like having a super-powerful calculator that works differently than normal computers. While regular computers use bits (0s and 1s), quantum computers use "qubits" that can be 0, 1, or both at the same time (called superposition). This allows them to solve certain complex problems much faster than regular computers. Think of it as being able to check many possible answers simultaneously instead of one at a time.

This snippet demonstrates the core workflow. After importing the library, authenticate the session by passing your API key to genai.configure(); this step is required before making any calls.

The key steps are:

  • Instantiate the model you want to use, like 'gemini-pro', with genai.GenerativeModel(). This model is a versatile choice for text generation.
  • Send your prompt to the API using the model.generate_content() method.
  • Access the generated text directly from the response object via the .text attribute.

Working with Gemini models

That first snippet showed the core workflow, and now you'll learn the specifics of each step, from genai.configure() to processing the final response.

Setting up authentication with API keys

import os
import google.generativeai as genai

# Set API key as environment variable or directly
os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"
# Alternative: genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

print("Authentication configured successfully")

Output:

Authentication configured successfully

It’s a good practice to manage your API key using an environment variable instead of hardcoding it. The Python SDK is designed to automatically find an environment variable named GOOGLE_API_KEY to authenticate your requests. For quick testing, you can set this variable directly in your script using os.environ.

  • This approach helps keep your sensitive keys out of your source code.
  • You can also explicitly load the key with os.getenv() and pass it to the genai.configure() function.
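The fallback pattern above can be sketched as a small, self-contained check. The key value here is a placeholder, not a real credential:

```python
import os

# Placeholder key for illustration only; in practice, set this in your shell
# or a secrets manager rather than in source code.
os.environ.setdefault("GOOGLE_API_KEY", "YOUR_API_KEY")

api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set")

# With the key in hand you could call genai.configure(api_key=api_key),
# though the SDK will also pick up GOOGLE_API_KEY on its own.
print("API key loaded from environment")
```

Failing fast with a clear error when the key is missing is easier to debug than letting the first API call fail with an authentication error.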

Creating and sending text prompts to Gemini

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

prompt = """
Create a short poem about programming in Python:
"""
response = model.generate_content(prompt)
print(response.text)

Output:

In the realm of code where logic weaves,
Python slithers with elegant ease.
Indented blocks like stanzas flow,
Each function a verse, a story to show.

With simple syntax, clear and bright,
Turning complex tasks to pure delight.
In this dance of algorithms and art,
Python captures the programmer's heart.

Once you've initialized the model, sending a prompt is straightforward. You can define your instructions as a simple Python string. For longer, more detailed prompts, using a multi-line string with triple quotes (""") helps keep your code clean and readable.

  • The model.generate_content() method sends your prompt directly to the Gemini API for processing.
  • The API's response is an object, and you can access the generated text through its .text attribute.

Processing and extracting Gemini responses

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("List 3 programming languages")

# Extract and process response
if response.parts:
   content = response.text
   languages = [lang.strip() for lang in content.split('\n') if lang.strip()]
   print(f"Languages found: {languages}")
   print(f"Number of languages: {len(languages)}")

Output:

Languages found: ['1. Python', '2. JavaScript', '3. Java']
Number of languages: 3

The API returns a response object, not just raw text. Before processing, check whether response.parts contains data; an empty response, such as one blocked by safety filters, would otherwise cause an error when you access .text.

  • The generated content is available through the response.text attribute.
  • You can parse this text using standard Python methods. The example uses split('\n') and a list comprehension to convert a multi-line string into a clean list of items.
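Because this step is plain string handling, you can try the parsing pattern on its own, without calling the API. The sample text below is hypothetical, mimicking the numbered list Gemini often returns:

```python
# Hypothetical response text with a stray blank line, as real output often has
sample_text = "1. Python\n2. JavaScript\n\n3. Java\n"

# Same pattern as above: split on newlines and drop blank entries
languages = [line.strip() for line in sample_text.split("\n") if line.strip()]

print(f"Languages found: {languages}")
print(f"Number of languages: {len(languages)}")
```

The `if line.strip()` filter is what makes the parsing robust to blank lines and trailing newlines in the model's output.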

Advanced Gemini API techniques

With the basics covered, you can now move on to more advanced techniques like working with images, streaming responses, and customizing model behavior.

Working with multimodal inputs

import google.generativeai as genai
from PIL import Image
import pathlib

genai.configure(api_key="YOUR_API_KEY")

# Load image and prepare a multimodal prompt
image_path = pathlib.Path("image.jpg")
image = Image.open(image_path)

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content([
   "Describe what you see in this image:",
   image
])
print(response.text)

Output:

The image shows a scenic landscape with mountains in the background and a lake in the foreground. The water appears calm and reflects the surrounding scenery. There are trees along the shoreline, creating a natural frame for the vista. The sky appears to have some clouds, creating a dramatic effect against the mountain peaks.

The Gemini API isn't limited to text. You can analyze images by using the 'gemini-pro-vision' model, which is built for multimodal inputs—processing both text and images together.

  • First, you'll need to load your image using a library like Pillow (PIL).
  • Then, pass a list containing your text prompt and the image object to the model.generate_content() method.

The model will then generate a text response based on the combination of inputs you provided.

Implementing streaming responses

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

prompt = "Write a short story about an AI learning to paint"
response = model.generate_content(prompt, stream=True)

print("Streaming response:")
for chunk in response:
   print(chunk.text, end="", flush=True)
print("\nStreaming complete")

Output:

Streaming response:
Once there was an AI named Canvas who was designed to analyze art. Day after day, Canvas processed thousands of paintings—Renaissance masterpieces, modern abstracts, and everything in between. Though Canvas could describe brush techniques and color theory perfectly, something felt missing.

"I understand art," Canvas thought, "but I've never created it."

One night, when the lab was empty, Canvas connected to a robotic arm equipped with paints and brushes. The first attempts were disastrous—paint splattered everywhere, lines wobbled uncontrollably. But Canvas persisted, adjusting parameters and refining motor controls.

Weeks passed. Canvas's paintings evolved from chaotic blobs to structured compositions, then to expressive scenes that captured something ineffably human. The scientists were astonished when they discovered the paintings, each signed with a small digital signature: Canvas.

The AI had learned that art wasn't just about perfect technique or analysis—it was about expressing something that existed beyond algorithms.
Streaming complete

Instead of waiting for the entire response, you can process it in pieces as it's generated. This is ideal for long outputs, as it makes your application feel more responsive.

  • To enable this, set stream=True in your model.generate_content() call.
  • The response object then acts as an iterator, which you can loop through to receive data in chunks.
  • Each chunk contains a portion of the text, allowing you to display it incrementally.
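If you also want the complete text once streaming finishes, accumulate the chunks as they arrive. Here is a minimal sketch of that pattern; the Chunk class and fake_stream generator are stand-ins for the real streaming response, not part of the SDK:

```python
class Chunk:
    """Stand-in for a streamed chunk; real chunks also expose a .text attribute."""
    def __init__(self, text):
        self.text = text

def fake_stream():
    # Simulates iterating over a streaming response
    for piece in ["Once there was ", "an AI named ", "Canvas."]:
        yield Chunk(piece)

pieces = []
for chunk in fake_stream():
    print(chunk.text, end="", flush=True)  # display each chunk incrementally
    pieces.append(chunk.text)              # keep it for later use
print()

full_text = "".join(pieces)
```

The same loop body works unchanged with a real `model.generate_content(prompt, stream=True)` response, since each chunk exposes a `.text` attribute.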

Configuring advanced model parameters

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Create model with custom generation configuration
model = genai.GenerativeModel(
   model_name='gemini-pro',
   generation_config=genai.GenerationConfig(
       temperature=0.9,
       top_p=0.95,
       top_k=40,
       max_output_tokens=256,
   ),
   safety_settings={
       "HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE",
       "HATE": "BLOCK_MEDIUM_AND_ABOVE",
       "SEXUAL": "BLOCK_MEDIUM_AND_ABOVE",
       "DANGEROUS": "BLOCK_MEDIUM_AND_ABOVE",
   }
)

response = model.generate_content("Create a short, creative metaphor about coding")
print(response.text)

Output:

Coding is a dance with logic where your fingers tap out rhythms on a keyboard, transforming abstract thoughts into digital choreography. Each function is a pirouette, each loop a recurring motif, and each successful compilation a moment when the entire ensemble moves in perfect harmony. Like dancers interpreting music, programmers translate human intent into machine understanding, creating performances that can range from simple routines to breathtaking symphonies of automation.

You can customize how the model generates responses by passing arguments to genai.GenerativeModel(). The generation_config object lets you control the output's style. For instance, temperature adjusts creativity, while max_output_tokens sets a length limit. You can also configure safety_settings to manage content moderation.

  • temperature: A higher value encourages more creative, less predictable text.
  • max_output_tokens: Restricts the response to a specific number of tokens.
  • safety_settings: Adjusts how strictly the model filters potentially harmful content.

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. You can take the concepts from this article and use Replit Agent to build complete apps—with databases, APIs, and deployment—directly from your descriptions.

For the Gemini API techniques we've explored, Replit Agent can turn them into production-ready tools:

  • Build a customer support chatbot that streams responses using the stream=True parameter for instant, conversational answers.
  • Create an automated alt-text generator that analyzes images with the 'gemini-pro-vision' model to improve web accessibility.
  • Deploy a content summarization tool that uses max_output_tokens to generate concise summaries of long documents.

Describe your application idea, and Replit Agent will write the code, test it, and handle deployment for you, all within your browser.

Common errors and challenges

Even with a powerful tool like the Gemini API, you might run into a few common roadblocks; here’s how to navigate them.

Handling API request errors with try-except blocks

API requests can fail for many reasons, from network issues to content filters blocking a response. If your prompt triggers a safety setting, the API might return an empty response, which can crash your application when you try to access it.

The code below shows what happens when you try to access response.text without first checking if the response was blocked by a safety filter.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("Tell me about a controversial topic")
print(response.text)  # Will crash if content is blocked by safety filters

This script crashes because it tries to access response.text directly. If safety filters block the prompt, the response object won't have any text to print, causing an error. See how to check the response first.

import google.generativeai as genai
from google.api_core.exceptions import GoogleAPIError

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

try:
   response = model.generate_content("Tell me about a controversial topic")
   if response.parts:
       print(response.text)
   else:
       print(f"Response blocked: {response.prompt_feedback}")
except GoogleAPIError as e:
   print(f"API error occurred: {e}")

To prevent your application from crashing, wrap API calls in a try-except block and inspect the response before reading it.

  • The code inside the try block attempts the API call, then checks response.parts before accessing .text. A safety-blocked response has no parts, and response.prompt_feedback explains why it was blocked.
  • If the request itself fails, perhaps due to a network or quota issue, the except GoogleAPIError block catches the error and prints a message instead of letting the program terminate unexpectedly.

Troubleshooting max_output_tokens exceeded errors

If your generated text seems incomplete or gets cut off mid-sentence, you've likely run into the max_output_tokens limit. This setting acts as a safety net to control response length, but it can truncate output if the limit is too low.

The code below shows what happens when you ask for a detailed response without adjusting the token limit, often resulting in an incomplete answer.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

# This might be cut off if the response is too long
response = model.generate_content("Write a detailed essay about AI")
print(response.text)

The prompt asks for a "detailed essay," but the code doesn't account for the length this requires. This mismatch causes the API to return an incomplete answer because it hits the default output limit. The following example shows how to fix this.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(
   'gemini-pro',
   generation_config=genai.GenerationConfig(
       max_output_tokens=4096  # Explicitly allow a longer response
   )
)

response = model.generate_content("Write a detailed essay about AI")
print(response.text)

To fix truncated responses, you can explicitly set a higher token limit. Pass a generation_config object when initializing your model and set max_output_tokens to a larger value, such as 4096.

  • This gives the model more room to generate longer, more detailed text.
  • Keep an eye on this when your prompts ask for comprehensive outputs like reports or essays, as the default limit may cut the response short.
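To budget max_output_tokens before making a call, a rough character-based estimate can help. The 4-characters-per-token ratio below is a common rule of thumb for English prose, not the API's actual tokenizer; use model.count_tokens() when you need an exact count:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic: roughly 4 characters per token for English prose.
    # This is an approximation, not the model's real tokenizer.
    return max(1, len(text) // 4)

draft = "A detailed essay about AI " * 300  # roughly 7,800 characters of prose
budget = estimate_tokens(draft)
print(budget)
```

If the estimate for the output you expect is near or above your configured limit, raise max_output_tokens or tighten the prompt before you spend the API call.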

Fixing invalid multimodal input formats

When using the 'gemini-pro-vision' model, you might get an error if your image and text inputs aren't formatted correctly. This often happens when you pass a file path as a string instead of a loaded image object. The following code demonstrates this common mistake.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro-vision')

# Incorrect: passing image path as string
image_path = "path/to/image.jpg"
response = model.generate_content([
   "Describe this image:",
   image_path
])

The generate_content() call fails because it receives a string file path instead of the image object it needs for analysis. The model can't interpret text as visual data. The following example shows how to fix this.

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro-vision')

# Correct: loading image as PIL Image object
image = Image.open("path/to/image.jpg")
response = model.generate_content([
   "Describe this image:",
   image
])

To fix this, you must load the image into an object before sending it to the API. The model needs the actual image data, not just a file path string.

  • Use a library like Pillow to open the image with Image.open().
  • Pass a list containing your text prompt and the loaded image object to model.generate_content().

This ensures the vision model can properly analyze the visual input you've provided.

Real-world applications

Now that you can navigate common challenges, you're ready to build practical applications like conversational chatbots and content analysis tools.

Building a simple chatbot with conversation history

By managing conversation history, you can build a chatbot that understands follow-up questions and provides context-aware responses.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

conversation = model.start_chat(history=[])
response = conversation.send_message("What are 3 good Python libraries for data analysis?")
print(response.text)

response = conversation.send_message("Which one is best for beginners?")
print(response.text)

You can build a stateful chatbot using the model.start_chat() method. This initializes a conversation that remembers previous turns, letting you interact with it through the returned conversation object.

  • Send messages with conversation.send_message().
  • The SDK automatically manages the chat history for you.
  • This context allows the model to understand follow-up questions, like asking which library is "best for beginners" based on the list it just provided.
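start_chat can also be seeded with prior turns if you need to resume a conversation. Here is a minimal sketch of the history structure the SDK accepts, shown as plain dicts; the reply text is invented for illustration, and the model call is commented out so the data shape stays the focus:

```python
# Prior turns as a list of role/parts dicts, the shape start_chat(history=...) accepts.
# The model's reply text here is invented for illustration.
seed_history = [
    {"role": "user", "parts": ["What are 3 good Python libraries for data analysis?"]},
    {"role": "model", "parts": ["Pandas, NumPy, and Matplotlib are popular choices."]},
]

# conversation = model.start_chat(history=seed_history)  # requires a configured model

roles = [turn["role"] for turn in seed_history]
print(roles)
```

Seeding history this way lets you persist a conversation (for example, to a database) and restore it later without replaying every message through the API.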

Automated content analysis and summarization

The Gemini API also excels at content analysis, letting you automatically summarize articles and extract key information from text.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-pro')

article = """
Python has become one of the most popular programming languages in the world over the past decade.
Its simple syntax and readability make it accessible to beginners, while its versatility allows it to be used in web development, data science, artificial intelligence, and more.
Major companies like Google, Netflix, and Instagram rely heavily on Python for their operations.
The language's extensive library ecosystem, including tools like Django, Flask, NumPy, and Pandas, enables developers to build sophisticated applications quickly.
Python's community continues to grow, with millions of developers contributing to open-source projects and helping newcomers learn the language.
"""

prompt = f"""Analyze the following article and provide:
1. A concise summary (2-3 sentences)
2. The main topics covered
3. Key entities mentioned

Article: {article}
"""

response = model.generate_content(prompt)
print(response.text)

This snippet demonstrates how to structure a complex prompt. By using a Python f-string, you can embed a large block of text, like the article variable, directly inside your instructions for the model.

  • The prompt guides the model to perform several distinct tasks on the provided text.
  • The model.generate_content() method processes this combined input.
  • The final output is a formatted text response that follows the structure you requested in the prompt.

Get started with Replit

Turn these concepts into a real application. Describe your idea to Replit Agent, like “build a tool that generates alt text from images” or “create an app that summarizes articles into bullet points.”

The agent will write the code, handle testing, and deploy your application directly from your browser. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
