GuidesBuild an applicationGoogle Gemini AI: Barista Bot

Google Gemini AI: Barista Bot

A guide by

Guide overview #

What will you learn in this guide? #

In this guide, we'll show you how to create an automatic function-calling Barista Bot using the Gemini API's Python SDK to build an agent. You can test the final product here.

This tutorial is based heavily on Google's tutorial, but we'll be using Replit for this one, so you don't have to worry about your environment or configuration.

What is Google Gemini API? #

Google Gemini is the Google suite of Large Language Models (LLMs). Gemini has a tool you can search here, but they also have APIs, so you can use the LLMs to build your own applications.

What is Gradio? #

Gradio is a Python framework that makes it very simple to create chat interfaces for your AI application. We will use Gradio to make your barista bot shareable.

What are agents? #

Agents are applications that use large language models (LLMs) to complete complex tasks.

Think of an agent as a smart assistant where the LLM acts as the brain, directing various operations to meet a user's request.

To work effectively, LLM agents may use additional modules like planning, memory, and tools to help them achieve their goals. In simple terms, these agents combine the thinking power of an LLM with specialized tools to get jobs done.

What are functions? #

In the context of AI, Function calling is when a model has the capacity to execute a function on the user's behalf—like fetching the weather via an API or, in our case, ordering coffee.

With a model like Gemini, we can use functions to enable agentic behavior in our model. Though this is a simple example, our model will operate on an agent loop that interacts with the user to order espresso-based beverages!

Getting started #

Start by opening the Replit tutorial here. Click “Use template”, and then follow along here, or use the “Tutorial” feature within the Repl.

Adding a Google API Key #

You'll need an API key. Head over to Google AI Studio and follow the instructions there.

Next, head over to Replit "Secrets" by typing "Secrets" in the search on the left. Add your key to the GOOGLE_API_KEY secret and save it. Replit secrets are encrypted and a secure place to store your application credentials.

Using Gemini #

Start by opening the file 02_start.py. This can be found in the src folder within the filetree. We will start by importing the generativeai module:

import os
import google.generativeai as genai
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
model = genai.GenerativeModel('gemini-1.5-pro-latest')

Then define our main function:

if __name__ == "__main__":
  chat = model.start_chat()
  response = chat.send_message(
      "What's fun to do in San Francisco for someone who's just moved here?")
  print(response.text.strip())

Let's run this file and see what happens. Open a pane and type "Shell." In the Shell pane, type:

python src/02_start.py

In a few seconds, output should appear. Nice—you've just called Google's Gemini API.

Now, let's make a barista.

Building the barista bot #

Define a prompt #

We need to tell our AI what its task is! Luckily, we have a prompt for you to use. Within the src folder there is a file labeled coffee_bot_prompt.tx. Give it a read. If you'd like to make any changes, feel free!

To read it in the project, we'll add the lines:

1
2

with open('coffee_bot_prompt.txt', 'r') as f:
  COFFEE_BOT_PROMPT = f.read()

Create tools for the LLM to use #

The barista needs a list of tools to call. Here’s an example of the functions we provide in the src/03_enable_functions.py file:

from random import randint
from typing import Iterable

placed_order = []
order = []


def add_to_order(drink: str, modifiers: Iterable[str] = ()) -> None:
  """Adds the specified drink to the customer's order, including any modifiers."""
  order.append((drink, modifiers))


def get_order() -> Iterable[tuple[str, Iterable[str]]]:
  """Returns the customer's order."""
  return order


def remove_item(n: int) -> str:
  """Remove the nth (one-based) item from the order.

  Returns:
    The item that was removed.
  """
  item, modifiers = order.pop(int(n) - 1)
  return item


def clear_order() -> None:
  """Removes all items from the customer's order."""
  order.clear()


def confirm_order() -> str:
  """Asks the customer if the order is correct.

  Returns:
    The user's free-text response.
  """
  order_str = "Your order:"
  if not order:
    order_str += '\n  (no items)'

  for drink, modifiers in order:
    order_str += f'\n  {drink}'
    if modifiers:
      order_str += f'\n   - {", ".join(modifiers)}'

  order_str += '\nIs this correct?'

  print(order_str)
  return order_str


def place_order() -> int:
  """Submit the order to the kitchen.

  Returns:
    The estimated number of minutes until the order is ready.
  """
  placed_order[:] = order.copy()
  clear_order()

  return randint(1, 10)

Each tool can be found in the code block above as def {function_name}:. This is how you create a function in Python. The tools define a list of actions the barista can take. Here are the tools we gave:

add_to_order - Adds items the customer asks for to the order
get_oder - Lists all of the items in the order
remove_item - removes an item from the customer’s order upon request
clear_order - removes all items from the customer’s order
confirm_order - asks the customer to confirm the items in their order
place_order - submits the order to the kitchen

You can add more tools if you would like. For example, a fun challenge might be seeing if you can get the barista to return a joke! We will test all of these in a few sections.

Give access to the tools #

Now, let's beef up our model. We want the model to use our tools (which are well documented and type-hinted), so in src/04_frontend.py, we pass the function names as a list to the tools argument of the GenerativeModel.

tools=[
    add_to_order, get_order, remove_item,
    clear_order, confirm_order, place_order
]

Next, we pass the system prompt:

system_instruction=COFFEE_BOT_PROMPT

So now we have:

model = genai.GenerativeModel('gemini-1.5-pro-latest',
tools=[
    add_to_order, get_order, remove_item,
    clear_order, confirm_order, place_order
],
system_instruction=COFFEE_BOT_PROMPT)

Building the chat interface #

Now, we need to interface with our model! Let's define our chat interface, which will live at 03_enable_functions.py.

First, we need to start the chat with automatic function calling to enable the barista functions. In 03_enable_functions.py, we will add:

1
2

if __name__ == "__main__":
  chat = model.start_chat(enable_automatic_function_calling=True, )

Great! Now let's add a welcome message right after that:

1
2

print("Welcome to Gemini Coffee, what can I get started for you?")
print('---')

The placed_order variable tracks the status of the order, so while we don't have an order, we'll want to accept user input and return the response.

We can accept user input with the Python input function:

while not placed_order:
  msg = input()
  if len(msg) > 0:
    response = chat.send_message(msg)
    print('---')
    print('Gemini: ' + response.text.strip())
    print('---')
  
  else:
    print("Please enter some input!")

Nice! Now, our inputs will send a message to the chat. Gemini is equipped to build our order thanks to our functions and it will respond accordingly.

Order a coffee #

Head over to src/03_enable_functions.py. You can see that we've defined our functions within the file (along with the variables placed_order and order, which the function use to track state).

Open the Shell and run:

python src/03_enable_functions.py

You should see:

1
2

Welcome to Gemini Coffee, what can I get started for you?
---

You can test any of the tools. Add items to your order. Then ask the barista to list your order. If you need suggestions, check out our prompt file for options.

Add a frontend and deploy #

Our interface is great, but it's not a scalable way to interact with our barista. In the next two steps, we'll add a frontend.

So now we have our barista bot, but it's only in our console. We need a frontend to deploy it to the cloud. A simple, chat-friendly frontend is Gradio.

We'll wrap our code using Gradio syntax (04_frontend.py), so we can deploy it on Replit. You will notice the file has added the following throughout the project:

import gradio as gr - adds the Gradio package
with gr.Blocks() as demo: wraps our application with a Gradio interface
with gr.Column() - Starts a block of code that defines the layout of the interface

For more specifics, I recommend visiting the Gradio quickstart.

First, we'll configure our Repl to run the frontend by default. Head over to the .replit configuration file and change:

run = ["python3", "src/01_coffee_functions.py"]

run = ["python3", "src/04_frontend.py"]

Now, when you click "Run," you'll trigger the app. Click "Run" and open a new "Webview" pane where you can start ordering.

Your application can now be deployed and shared with anyone, but the replit.dev URL is only temporary. To get a permanent project, you need to Deploy.

Go to the top-right of the editor, and click “Deploy”. Follow the steps in the demo above, and within minutes, you application will be on the internet to share with friends.

What's next #

Check out the remainder of our guides at replit.com/guides to see what to build next.