How to use speech recognition in Python

Learn how to use speech recognition in Python. This guide covers different methods, tips, real-world applications, and how to debug common errors.

Published on: Wed, Mar 25, 2026
Updated on: Fri, Mar 27, 2026
The Replit Team

Speech recognition lets your Python applications understand spoken commands and convert audio to text. This technology opens up new possibilities for user interaction and data input, all with a few lines of code.

Here, you'll learn essential techniques and practical tips to implement speech recognition. You'll explore real-world applications and find advice to debug common issues, so you can build robust voice-enabled features.

Basic speech recognition with SpeechRecognition

import speech_recognition as sr
recognizer = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something:")
   audio = recognizer.listen(source)
   text = recognizer.recognize_google(audio)
   print(f"You said: {text}")

Output:
Say something:
You said: hello world

This snippet hinges on the Recognizer class, which acts as the central hub for speech recognition. After creating an instance, you use it to process audio from a source. The example assumes the SpeechRecognition package is installed (it is imported as speech_recognition), along with PyAudio, which sr.Microphone needs to access the microphone.

The with sr.Microphone() as source: block is key because it opens the microphone and closes it again when the block exits. Inside, recognizer.listen() records until it detects a pause in speech. The most important step is recognizer.recognize_google(), which sends the audio to Google's servers for transcription, so a single function call handles the speech-to-text step.

Basic speech recognition techniques

Now that you've seen the basics, you can expand your skills by fine-tuning microphone capture, transcribing pre-recorded audio files, and handling common recognition errors.

Capturing audio from your microphone with recognize_google()

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   r.adjust_for_ambient_noise(source)
   audio = r.listen(source, timeout=5)
text = r.recognize_google(audio)
print(f"Recognized text: {text}")

Output:
Recognized text: this is a test sentence

This snippet refines the capture process for better accuracy and control. It introduces two key adjustments to make your application more robust:

  • The adjust_for_ambient_noise() function calibrates the recognizer by sampling background sound (for one second by default) and adjusting its energy threshold. This helps it distinguish your voice from ambient noise.
  • Adding the timeout=5 parameter to listen() makes it wait at most five seconds for speech to begin. If nobody speaks in that time, listen() raises sr.WaitTimeoutError, so your program doesn't get stuck waiting for input that may never come.
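
The two adjustments above can be bundled into one small helper. This is a minimal sketch and not a pattern the library prescribes: capture_phrase and its parameter names are hypothetical, and it assumes recognizer behaves like sr.Recognizer and source is an already-open sr.Microphone. It relies only on adjust_for_ambient_noise() and listen().

```python
def capture_phrase(recognizer, source, calibrate_seconds=1.0,
                   wait_seconds=5.0, max_phrase_seconds=10.0):
    """Calibrate for background noise, then record a single phrase.

    `recognizer` is assumed to behave like sr.Recognizer and `source`
    like an already-open sr.Microphone (assumptions of this sketch).
    """
    # Sample the background briefly so the energy threshold fits the room.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: how long to wait for speech to begin.
    # phrase_time_limit: the longest phrase that will be recorded.
    return recognizer.listen(
        source,
        timeout=wait_seconds,
        phrase_time_limit=max_phrase_seconds,
    )
```

With the real library, you would call it as audio = capture_phrase(sr.Recognizer(), source) inside a with sr.Microphone() as source: block. Because listen() raises sr.WaitTimeoutError when nobody speaks within wait_seconds, wrap the call in try...except wherever silence is a normal outcome.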

Converting audio files to text

import speech_recognition as sr
recognizer = sr.Recognizer()
with sr.AudioFile('example.wav') as source:
   audio_data = recognizer.record(source)
   text = recognizer.recognize_google(audio_data)
   print(f"Content: {text}")

Output:
Content: this is an example audio file for speech recognition

Beyond live microphone input, you can also transcribe existing audio files. This approach is perfect for processing recordings you already have. The main difference is using sr.AudioFile() to specify the path to your file instead of capturing from a microphone.

  • The recognizer.record(source) function reads the entire audio file into memory.
  • From there, you use recognizer.recognize_google() just as you would with live audio to convert the recorded data into text.
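
Reading a whole file into memory is fine for short clips, but longer recordings are often easier to handle in pieces. The sketch below is one possible approach, not something the original example does: it uses the standard library's wave module to measure a PCM WAV file, then calls record() with its offset and duration arguments to transcribe one slice at a time. The function names and the 30-second default are illustrative choices.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds (standard library only)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def chunk_spans(total_seconds, chunk_seconds):
    """Split a duration into (offset, duration) pairs of at most chunk_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    spans = []
    offset = 0.0
    while offset < total_seconds:
        spans.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return spans


def transcribe_in_chunks(path, chunk_seconds=30.0):
    """Transcribe a WAV file slice by slice and return the list of texts."""
    # Imported here so the helpers above work without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    texts = []
    for offset, duration in chunk_spans(wav_duration_seconds(path), chunk_seconds):
        # Reopen the file for each slice so offset is measured from the start.
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        texts.append(recognizer.recognize_google(audio))
    return texts
```

Fixed-length slices can split a word at the boundary, and a silent slice will make recognize_google() raise sr.UnknownValueError, so a real tool would likely add error handling around the recognition call.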

Handling recognition exceptions

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   audio = r.listen(source)
try:
   text = r.recognize_google(audio)
   print(f"You said: {text}")
except sr.UnknownValueError:
   print("Could not understand audio")
except sr.RequestError:
   print("API unavailable")

Output:
You said: testing speech recognition

Sometimes, speech recognition doesn't work perfectly. Wrapping your recognize_google() call in a try...except block prevents your app from crashing when things go wrong. This code catches two common errors:

  • sr.UnknownValueError: This error occurs when the API can't understand the audio you provided.
  • sr.RequestError: This happens if there's a problem connecting to the Google Speech Recognition service, like a network issue.

Advanced speech recognition techniques

Once you're comfortable handling basic recognition and errors, you can unlock more power by switching engines, implementing continuous listening, and customizing language options.

Using alternative speech recognition engines

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   audio = r.listen(source)
# Try with different recognition settings
text1 = r.recognize_google(audio)
text2 = r.recognize_google(audio, language="en-GB")
print(f"US English: {text1}")
print(f"British English: {text2}")

Output:
US English: hello python speech recognition
British English: hello python speech recognition

The recognize_google() function is more flexible than it first appears. You can customize its behavior by passing optional arguments, which can improve accuracy for different speakers. The library also exposes other engines through sibling methods such as recognize_sphinx(), which runs offline, but this example stays with Google's engine and varies its settings.

  • The language parameter tells the API which language or dialect to expect.

In this example, the code processes the same audio twice. The first call uses the default language, "en-US", while the second specifies "en-GB" for British English. Matching the language code to the speaker can improve recognition for users with different accents.
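
If you do want to switch engines, one option is to try several recognizer methods in order and keep the first result. The helper below is a hypothetical sketch under that assumption: transcribe_with_fallback is not part of SpeechRecognition, and it simply accepts a list of (name, function) pairs so that any recognize_*() method can be plugged in.

```python
def transcribe_with_fallback(audio, engines, recoverable=(Exception,)):
    """Try each (name, recognize) pair in order.

    Returns (name, text) from the first engine that succeeds. Exceptions
    listed in `recoverable` move on to the next engine; anything else
    propagates. The broad default is for illustration only.
    """
    errors = []
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except recoverable as error:
            errors.append(f"{name}: {error!r}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


# Possible usage with the real library (recognize_sphinx needs the
# pocketsphinx package installed):
#
#   engines = [("google", r.recognize_google), ("sphinx", r.recognize_sphinx)]
#   name, text = transcribe_with_fallback(
#       audio, engines, recoverable=(sr.RequestError, sr.UnknownValueError)
#   )
```

Passing the narrow exception tuple shown in the usage comment is the safer choice, since it lets genuine bugs surface instead of silently falling through to the next engine.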

Implementing continuous speech recognition

import speech_recognition as sr
import time
r = sr.Recognizer()
with sr.Microphone() as source:
   r.adjust_for_ambient_noise(source)
   for _ in range(3):  # Listen 3 times
       audio = r.listen(source)
       text = r.recognize_google(audio)
       print(f"Recognized: {text}")
       time.sleep(1)

Output:
Recognized: what time is it
Recognized: open browser
Recognized: hello world

To create a continuous listening experience, you can wrap the recognition logic in a loop. This example uses a for loop to listen and transcribe three separate times, making it feel like an ongoing conversation with your application. This approach is ideal for building simple voice command systems.

  • On each pass, r.listen() captures a new audio segment.
  • The r.recognize_google() function processes that specific segment into text.
  • time.sleep(1) introduces a one-second pause between recognition attempts, giving you a moment between commands.
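
One limitation of the loop above is that a single unintelligible phrase raises an exception and ends it. The following sketch shows one way to keep listening. run_voice_loop is a hypothetical helper, and it takes the listen and recognize steps as plain callables so the skipping logic stays separate from the library calls.

```python
def run_voice_loop(listen, recognize, on_text, rounds, ignore=()):
    """Listen `rounds` times, skipping rounds that raise an ignored error.

    listen():          returns one captured audio segment.
    recognize(audio):  returns the transcribed text.
    on_text(text):     called for every successful transcription.
    ignore:            tuple of exception types that should not stop the loop.
    Returns the number of phrases that were recognized.
    """
    heard = 0
    for _ in range(rounds):
        try:
            text = recognize(listen())
        except ignore:
            continue  # Nothing usable this round; try again.
        on_text(text)
        heard += 1
    return heard


# Possible usage with the real library:
#
#   with sr.Microphone() as source:
#       run_voice_loop(
#           listen=lambda: r.listen(source, timeout=5),
#           recognize=r.recognize_google,
#           on_text=print,
#           rounds=3,
#           ignore=(sr.UnknownValueError, sr.WaitTimeoutError),
#       )
```

Leaving sr.RequestError out of ignore is deliberate in this sketch: a network failure is unlikely to fix itself between rounds, so it is allowed to stop the loop.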

Customizing recognition with language parameters

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   audio = r.listen(source)
# Specify language and show confidence scores
result = r.recognize_google(audio, language="en-US", show_all=True)
print(f"Full result: {result}")

Output:
Full result: {'alternative': [{'transcript': 'hello world', 'confidence': 0.98762346}, {'transcript': 'helo world'}, {'transcript': 'hello word'}], 'final': True}

Beyond just specifying a language, you can get more detailed feedback from the engine by setting show_all=True in your recognize_google() call. Instead of just a string, this returns a dictionary containing multiple transcription possibilities. This is useful for understanding how confident the API is in its primary guess.

  • The 'alternative' key holds a list of possible transcriptions.
  • The first item is the most likely result and often includes a 'confidence' score, a float indicating the API's certainty.
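
To act on that dictionary, you usually need to pull out a single transcript. Here is a minimal sketch that assumes the result has the shape shown in the output above. best_alternative is a hypothetical helper, and it treats anything that isn't a dictionary with alternatives (for example, an empty result when nothing was recognized) as "no match".

```python
def best_alternative(result):
    """Return (transcript, confidence) for the most confident alternative.

    Returns None when `result` has no alternatives. The confidence is None
    when the chosen alternative carries no score.
    """
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative") or []
    if not alternatives:
        return None
    # Alternatives without a score count as 0.0, so scored ones win.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"], best.get("confidence")


sample = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.98762346},
        {"transcript": "helo world"},
        {"transcript": "hello word"},
    ],
    "final": True,
}
print(best_alternative(sample))  # ('hello world', 0.98762346)
```

You could compare the returned confidence against a threshold of your choosing and ask the user to repeat the command when it falls below that value.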

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it, complete with databases, APIs, and deployment.

For the speech recognition techniques you've learned, Replit Agent can turn them into production applications:

  • Build a voice-activated calculator that processes spoken equations using recognize_google().
  • Create a transcription utility that converts audio files to text and uses exception handling to manage unclear speech.
  • Deploy a simple voice command system that uses continuous listening to trigger specific actions.

Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser. Try Replit Agent to turn your concept into a working application.

Common errors and challenges

When using speech recognition in Python, you'll likely encounter a few common hurdles, from microphone access to API connection failures.

  • Handling microphone access errors with Microphone(): Your application can't listen if it can't access a microphone. This error often happens when no microphone is connected or your operating system hasn't granted the necessary permissions. Always check that your hardware is properly set up and that your app has permission to access it.
  • Fixing speech timeout issues with listen(): The listen() function can cause your program to hang if it waits indefinitely for speech. The timeout and phrase_time_limit parameters help, but setting them poorly has a cost: a short timeout gives up before the user starts talking, a short phrase_time_limit cuts users off mid-sentence, and overly long values make the app feel unresponsive. Finding the right balance is key to a smooth user experience.
  • Dealing with API connection issues in recognize_google(): Since recognize_google() depends on an internet connection to an external service, it can fail if the network is down or the API is unavailable. Wrapping your recognition calls in a try...except block to catch a RequestError allows you to handle these situations gracefully instead of crashing.

Handling microphone access errors with Microphone()

A frequent first hurdle is when your application can't find a microphone. This usually happens if no device is connected or if your system permissions are blocking access. The Microphone() class will raise an error, stopping your code cold.

The following code snippet demonstrates what happens when you run the basic recognition script without a properly configured microphone: it fails with an unhandled exception, typically an OSError when no input device is available.

import speech_recognition as sr
recognizer = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something:")
   audio = recognizer.listen(source)
   text = recognizer.recognize_google(audio)
   print(f"You said: {text}")

The script crashes because sr.Microphone() raises an OSError when no default input device is found. (If PyAudio itself isn't installed, it raises an AttributeError instead.) A more robust approach handles this potential failure before attempting to listen. See how this is done in the following code.

import speech_recognition as sr
recognizer = sr.Recognizer()
try:
   with sr.Microphone() as source:
       print("Say something:")
       audio = recognizer.listen(source)
       text = recognizer.recognize_google(audio)
       print(f"You said: {text}")
except OSError:
   print("Error: Could not access the microphone. Check your device.")

To prevent your app from crashing when a microphone isn't available, wrap the with sr.Microphone() as source: block in a try...except OSError: block. This catches the error raised when no audio input device is detected. Instead of an abrupt crash, your program can display a helpful message that guides the user to check their hardware. A missing PyAudio installation is a separate problem: it surfaces as an AttributeError and is fixed by installing the package, not by checking the device. Perform this check whenever your application initializes audio input.
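
When a machine has several audio devices, it can also help to pick one explicitly and not rely on the system default. The sketch below assumes you get the device names from sr.Microphone.list_microphone_names() and pass the chosen position to sr.Microphone(device_index=...). The find_device_index helper itself is hypothetical.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword.

    The match is case-insensitive. Returns None when nothing matches,
    which sr.Microphone treats as "use the default device".
    """
    wanted = keyword.lower()
    for index, name in enumerate(names):
        # A backend may report a device without a name; skip those entries.
        if name and wanted in name.lower():
            return index
    return None


# Possible usage with the real library:
#
#   names = sr.Microphone.list_microphone_names()
#   index = find_device_index(names, "usb")
#   with sr.Microphone(device_index=index) as source:
#       audio = recognizer.listen(source)
```

Printing the list of names is also a quick diagnostic: an empty list suggests that no audio device is visible to your program at all.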

Fixing speech timeout issues with listen()

Your app can feel unresponsive when the listen() function waits indefinitely for someone to speak. It's a common issue that leaves the program stuck, creating a poor user experience. The following code demonstrates what happens without a timeout.

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something:")
   audio = r.listen(source)  # Might wait indefinitely
   text = r.recognize_google(audio)
   print(f"You said: {text}")

Without a timeout, the listen() function waits for speech that may never come, freezing the program. This leaves the user with no way to proceed. The following code demonstrates how to manage this behavior.

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something (you have 5 seconds):")
   try:
       audio = r.listen(source, timeout=5, phrase_time_limit=5)
       text = r.recognize_google(audio)
       print(f"You said: {text}")
   except sr.WaitTimeoutError:
       print("Timeout: No speech detected within time limit.")

This solution prevents your app from hanging by setting limits within the listen() function and catching potential errors. The timeout=5 parameter makes listen() give up with sr.WaitTimeoutError if no speech begins within five seconds, while phrase_time_limit=5 caps the length of the recording once speech has started.

By wrapping the call in a try...except sr.WaitTimeoutError: block, you can gracefully handle cases where no speech is detected. This approach provides clear feedback instead of letting the program freeze, which is essential for creating a responsive user experience.

Dealing with API connection issues in recognize_google()

The recognize_google() function is powerful, but its reliance on an internet connection is a potential weak point. Without a network, the function can't reach Google's servers, causing an error that will halt your program. The following code demonstrates this abrupt crash.

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   audio = r.listen(source)
text = r.recognize_google(audio)  # Will raise exception if internet is down
print(f"You said: {text}")

Because the call to recognize_google() isn't wrapped in any error handling, a simple network issue will cause an unhandled exception and crash the program. The code below shows how to build a more resilient application.

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   audio = r.listen(source)
try:
   text = r.recognize_google(audio)
   print(f"You said: {text}")
except sr.RequestError as e:
   print(f"Could not request results; {e}")
   print("Check your internet connection and try again.")

This solution prevents your app from crashing due to network issues. By wrapping the recognize_google() call in a try...except sr.RequestError block, you can gracefully handle connection failures. Instead of an unhandled exception, the user gets a helpful message suggesting they check their internet. It’s a crucial pattern to use whenever your code relies on an external web service, as it keeps your application running even when the network is down.
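
Some connection failures are brief, so retrying a few times before giving up can be worthwhile. The following is a generic sketch of retrying with exponential backoff, not something the library provides: call_with_retries and its defaults are hypothetical, and the exception types to retry on are passed in by the caller.

```python
import time


def call_with_retries(call, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run call(); on a `retry_on` exception, wait and try again.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    After `attempts` failures the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the last error.
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage with the real library:
#
#   text = call_with_retries(
#       lambda: r.recognize_google(audio),
#       retry_on=(sr.RequestError,),
#   )
```

Only sr.RequestError is retried in the usage comment, because repeating the request won't help when the audio itself is unintelligible.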

Real-world applications

Now that you can handle the core techniques and common errors, you're ready to build practical, voice-enabled applications.

Creating a simple voice command system with recognize_google()

You can use the text from recognize_google() to check for specific commands and trigger corresponding actions, like opening a web browser.

import speech_recognition as sr
import webbrowser

r = sr.Recognizer()
with sr.Microphone() as source:
   print("Say a command:")
   audio = r.listen(source)
command = r.recognize_google(audio)
if "open browser" in command.lower():
   print(f"You said: {command}")
   webbrowser.open("https://www.google.com")

This script demonstrates how to translate a spoken phrase into a system action. After capturing and transcribing your speech, it checks the resulting text for a specific trigger phrase.

  • The recognized text is converted to lowercase with .lower(), making the check case-insensitive.
  • An if statement then determines if the substring "open browser" is present within your command.
  • If the phrase is found, the script calls webbrowser.open() to launch Google in your default browser, linking your voice command to a tangible outcome.
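
Once you have more than one command, a chain of if statements gets unwieldy. One common alternative is a lookup table that maps trigger phrases to functions. This is a sketch of that idea, with dispatch as a hypothetical helper that works on the transcribed text and doesn't depend on the recognition library.

```python
def dispatch(command_text, handlers):
    """Run the handler for the first trigger phrase found in command_text.

    `handlers` maps lowercase trigger phrases to zero-argument functions.
    Returns the matched phrase, or None when nothing matched.
    """
    spoken = command_text.lower()
    for phrase, handler in handlers.items():
        if phrase in spoken:
            handler()
            return phrase
    return None


# Possible usage with the text returned by recognize_google():
#
#   import webbrowser
#   handlers = {
#       "open browser": lambda: webbrowser.open("https://www.google.com"),
#       "say hello": lambda: print("Hello!"),
#   }
#   if dispatch(command, handlers) is None:
#       print("Sorry, I don't know that command.")
```

Dictionaries preserve insertion order, so when one trigger phrase contains another, list the longer phrase first.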

Building a multilingual voice translator with SpeechRecognition

You can also combine SpeechRecognition with a translation library to capture spoken words in one language and output the text in another.

import speech_recognition as sr
from deep_translator import GoogleTranslator

r = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something in English:")
   audio = r.listen(source)
text = r.recognize_google(audio, language="en-US")
translation = GoogleTranslator(source='en', target='es').translate(text)
print(f"English: {text}")
print(f"Spanish: {translation}")

This script passes the output of one library to another to create a simple speech translator. It first captures your voice with r.listen() and transcribes it to English text using r.recognize_google(), specifying the language as en-US so the recognizer knows which language to expect.

  • The transcribed text is then passed to an instance of GoogleTranslator from the deep_translator library.
  • You configure the translator by setting the source to 'en' and the target to 'es'.
  • Finally, the .translate() method performs the conversion, and the script prints both the original and translated strings.

Get started with Replit

Turn your knowledge into a real application. Describe your idea to Replit Agent, like “build a voice calculator that solves spoken math problems” or “create a tool that transcribes audio files to text.”

The agent writes the code, tests for errors, and deploys your app from a single prompt. Start building with Replit.

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.