Image2Text AI App with Anthropic

Guide overview #
This guide provides a step-by-step walkthrough to build an Image2Text AI app. The app allows a user to upload an image, and Anthropic's Claude model will extract the text. Your final deployed app will look like this:

Getting started #
Start by forking this template. Click "Use template" and label your project.
Add Anthropic API key to Secrets #
First, we need to create an Anthropic API key. Go to the Anthropic dashboard and create an API key. API keys are sensitive information, so we want to store them securely.
In the bottom-left corner of the Replit workspace, there is a section called "Tools." Select "Secrets" within the Tools pane, and add your Anthropic API Key to the Secret labeled "ANTHROPIC_API_KEY."
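To confirm the secret is wired up correctly, it can help to fail fast with a clear message when it is missing. Here is a minimal sketch; `require_secret` is a hypothetical helper, not part of the template:

```python
import os

def require_secret(name):
    """Fetch a required secret from the environment, failing fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing secret: {name}. Add it in the Secrets pane.")
    return value
```

Calling `require_secret("ANTHROPIC_API_KEY")` then either returns the key or raises a descriptive error instead of failing later inside the API client.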
Import necessary modules and set up the Anthropic client #
Begin by importing the necessary modules and setting up the Anthropic client with your API key. Paste the following code to your main.py file:
```python
import base64
import os

import anthropic
import gradio as gr
from ratelimit import RateLimitException, limits

client = anthropic.Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
```
Here's a quick overview of the packages we are using:
- The base64 module is used for encoding images
- os for accessing environment variables
- anthropic for interacting with the Anthropic API
- gradio for creating the web interface
- ratelimit for handling rate limiting
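If you are not starting from the template, the third-party packages need to be installed. A minimal `requirements.txt` might look like this (package names assumed from the imports above):

```text
anthropic
gradio
ratelimit
```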
Building the application #
We will now create utility functions: small, pre-defined helpers that prepare the image data before it is sent to the Large Language Model (LLM). We will create two:
- image_to_base64 for converting images to base64 encoding
- get_media_type for determining the media type of the uploaded image based on its file extension.
Add this code to the bottom of main.py:
```python
def image_to_base64(image_path):
    """Convert the image to base64."""
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
    return base64.b64encode(image_data).decode("utf-8")


def get_media_type(image_name):
    """Get the media type of the uploaded image based on its file extension."""
    if image_name.lower().endswith(".jpg") or image_name.lower().endswith(".jpeg"):
        return "image/jpeg"
    elif image_name.lower().endswith(".png"):
        return "image/png"
    else:
        raise ValueError(f"Unsupported image format: {image_name}")
```
These functions are used to prepare the image data for sending to the Anthropic API.
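You can sanity-check this logic in isolation before wiring it into the app. The sketch below re-declares `get_media_type` so it is self-contained, and uses a hypothetical bytes-based variant of `image_to_base64` so the round trip can be verified without a file on disk:

```python
import base64

def image_to_base64_bytes(data):
    # Hypothetical bytes-based variant of image_to_base64, for testing
    # the encoding round trip without touching the filesystem.
    return base64.b64encode(data).decode("utf-8")

def get_media_type(image_name):
    """Same logic as in main.py: map a file extension to a MIME type."""
    name = image_name.lower()
    if name.endswith((".jpg", ".jpeg")):
        return "image/jpeg"
    if name.endswith(".png"):
        return "image/png"
    raise ValueError(f"Unsupported image format: {image_name}")
```

Note that the extension check is case-insensitive, so uploads named `PHOTO.JPG` are handled the same as `photo.jpg`.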
Implement the image-to-text functionality with rate limiting #
We now want to create a function that converts the image to text. We also want to add rate limiting: expensive queries like this can be abused, leading to large overages, so rate limiting ensures a request is denied when a user abuses the service.
The image_to_text function takes the base64-encoded image and media type as input, sends it to the Anthropic API along with a prompt, and extracts the text from the response.
Add this code to the bottom of main.py:
```python
@limits(calls=5, period=1800)
def image_to_text(b64_img, media_type):
    prompt = """
    The following image may contain text. There may be multiple parts of the
    image where text is present, possibly in different sizes and fonts; return
    ALL TEXT. Un-obstruct text if it is covered by something, to make it
    readable. Interpret the text in the image as written and return ALL TEXT
    found, even smaller text, starting with the ★ symbol. Do not return any
    output other than the text that's in the image. If no text can be found,
    return "No text found." Examples: ★NO PARKING violators may be towed at
    the owner's expense, ★No text found.
    """
    message = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": [{
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": b64_img,
                },
            }, {
                "type": "text",
                "text": prompt,
            }],
        }],
    )
    return message.content[0].text.split('★')[1]
```
The @limits decorator is used to limit the number of calls to this function to 5 calls every 30 minutes (1800 seconds) to avoid exceeding the API rate limits.
Create the Gradio web interface #
Create the Gradio web interface using the gr.Blocks and gr.Interface components. Define the app function that handles the image upload, calls the image_to_text function, and returns the extracted text.
Add this code to the bottom of main.py:
```python
js = "<script>console.log('Hello, World!');</script>"


def app(img):
    b64 = image_to_base64(img)
    media_type = get_media_type(img)
    try:
        msg = image_to_text(b64, media_type)
    except RateLimitException:
        msg = "Rate limit exceeded. Please try again later."
    return msg


with gr.Blocks(title="Image2Text", head=js) as demo:
    with gr.Row():
        gr.Markdown("""
        # 🚀 Image2Text
        This is an image to text demo built and deployed on [Replit](https://replit.com?utm_source=matt&utm_medium=twitter&utm_campaign=image2text) with [Anthropic](https://anthropic.com). The entire demo was built in under 30 minutes using a combination of Replit's ModelFarm, Gradio, and Anthropic's Python SDK. Upload or paste an image below to get started.
        """)
    gr.Interface(
        fn=app,
        inputs=gr.Image(type="filepath", label="👨‍🎨 Add your image",
                        sources=['upload', 'clipboard']),
        outputs="text",
    )

demo.launch(favicon_path="assets/favicon.png")
```
The gr.Blocks component is used to create the overall structure of the web interface, and gr.Interface is used to define the input (image upload) and output (extracted text) components. The demo.launch() function is called to start the web interface.
Deploy your project #
Your Webview uses a replit.dev URL. The development URL is good for rapid iteration, but it will stop working shortly after you close Replit. To have a permanent URL that you can share, you need to deploy your project.

Click "Deploy" in the top-right of the Replit Workspace, and you will see the Deployment pane open. For more information on our Deployment types, check our documentation.
Once your application is deployed, you will receive a replit.app domain that you can share.
What next #
Check out other guides on our guides page. Share your deployed URL with us on social media so we can amplify it.