In this tutorial, we’ll learn how to take notes in Python using our voice. This means we can take an audio file and use AI speech-to-text to transcribe it. One could imagine dozens of scenarios where this could be helpful: from capturing the content of voice memos to providing a tidy written recap of a meeting to folks who couldn't attend.

Getting transcriptions out of these recordings is a pretty straightforward process. This project builds on Deepgram's speech-to-text APIs, which deliver high-quality AI-generated transcripts from both real-time streaming and batch processing pre-recorded audio sources. The project we'll do in this tutorial works with pre-recorded audio files.

Let’s walk step by step through taking notes with our voice in Python.

A Learn-by-Doing Speech AI Project in Python

Here’s a list of what we’ll cover in this project:

  • Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

  • Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

  • Step 3 - Set Up Your Python Project

  • Step 4 - Install Your Python Libraries and Packages using pip

  • Step 5 - How to Transcribe the Audio File in Python with Voice

  • Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

  • Final Step - Run the Python Voice Note-Taking Project and Export the Results

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

Deepgram has a Python SDK that we can tap into, located on GitHub. We’ll also need an API key, which we can grab from the Deepgram Console, a game-like hub for trying out the different types of transcription in many programming languages, including Python. When you first sign up, you'll get $150 in API credits to try out Deepgram's speech AI capabilities.
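Once you have a key, it's best to keep it out of your source code. Here's a minimal sketch, assuming you export the key in your shell as an environment variable named DEEPGRAM_API_KEY before running the script (the later examples hard-code a placeholder for simplicity):

import os

# Read the API key from the environment instead of hard-coding it.
# DEEPGRAM_API_KEY is an assumed variable name; export it first, e.g.
#   export DEEPGRAM_API_KEY=your_key_here
DEEPGRAM_API_KEY = os.environ.get('DEEPGRAM_API_KEY')
if not DEEPGRAM_API_KEY:
    raise RuntimeError('Set the DEEPGRAM_API_KEY environment variable first.')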

Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

Our project, taking notes with voice in Python, will use the Deepgram speech-to-text transcription API and some of its more advanced capabilities to enhance our voice notes. Here are the features we’ll use alongside transcribing audio:

  • Diarization - Recognizes multiple people speaking and assigns a speaker to each word in the transcript.

  • Summarization - Summarize sections of the transcript so that you can quickly scan it.

We’ll see in a few sections how to easily implement these features in our Python project.

Step 3 - Set Up Your Python Project

There are a few items we need to set up before we begin coding. I’m using Python 3.10 for this project, but any version of Python 3.7 or higher will work. Create a directory anywhere on your computer; let’s call it voice-notes-with-python.

Then, open that same directory in a code editor like Visual Studio Code.

Next, create a virtual environment. This ensures our Python libraries get installed for this project only rather than system-wide. Make sure you’re in the project directory, then run these quick commands from the terminal to create the Python virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate
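The activate command above works on macOS and Linux. If you’re on Windows, activate the virtual environment with this command instead:

venv\Scripts\activate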

Finally, let’s create a Python file inside our directory called take_voice_notes.py.

Step 4 - Install Your Python Libraries and Packages using pip

Now we are ready to install Deepgram using pip. Make sure your virtual environment is activated and run the following command:

pip install deepgram-sdk

This allows us to use the Deepgram speech-to-text Python SDK for transcription, and tap into the features we mentioned earlier. 

To verify that Deepgram was installed correctly, from the terminal type:

pip freeze

We should see the deepgram-sdk package from PyPI in the list, confirming it is installed and ready for use.
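If pip freeze prints a long list, you can also inspect just the Deepgram package directly:

pip show deepgram-sdk

This prints the installed version of deepgram-sdk along with its location and dependencies.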

Step 5 - How to Transcribe the Audio File in Python with Voice

We’ll use Deepgram’s pre-recorded transcription for this taking-notes-with-voice Python project. This type of transcription is used to transcribe an audio file, either one stored locally on your drive or one hosted online. In this tutorial, we’ll transcribe a local audio file, but with this AI speech recognition provider it’s very simple to do both. Let’s see how to transcribe an audio file, first as a local file and then as a hosted online file.

Transcribe a Local Audio File with Python

from deepgram import Deepgram
import json

DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'

PATH_TO_FILE = 'some/file.wav'
def main():
    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        # Replace the mimetype as appropriate for your audio file
        source = {'buffer': audio, 'mimetype': 'audio/wav'}
        response = deepgram.transcription.sync_prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

main()
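Printing the whole JSON response is useful for exploring, but for note-taking we usually just want the transcript text. As a minimal sketch based on the response structure shown later in this post, the transcript for the first channel can be pulled out of the response dictionary like this:

# Drill into the response: results -> channels -> alternatives -> transcript.
transcript = response['results']['channels'][0]['alternatives'][0]['transcript']
print(transcript)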

Transcribe a Hosted Online Audio File with Python

from deepgram import Deepgram
import json

# The API key we created in Step 1
DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'

# Hosted sample file
AUDIO_URL = "{YOUR_URL_TO_HOSTED_ONLINE_AUDIO_GOES_HERE}"

def main():
    # Initializes the Deepgram SDK
    dg_client = Deepgram(DEEPGRAM_API_KEY)

    source = {'url': AUDIO_URL}
    options = { "punctuate": True, "model": "general", "language": "en-US", "tier": "enhanced" }
    response = dg_client.transcription.sync_prerecorded(source, options)
    print(json.dumps(response, indent=4))

main()
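Network problems or an invalid API key will surface as an exception when the request is made. Here's a minimal, hedged sketch of guarding the call; we catch a broad Exception for brevity, but your own error handling may be more specific:

try:
    response = dg_client.transcription.sync_prerecorded(source, options)
    print(json.dumps(response, indent=4))
except Exception as error:
    # Report the failure instead of letting the script crash with a raw traceback.
    print(f'Transcription request failed: {error}')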

Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

Now that we have an idea of what our Python code looks like, let’s see an example with our diarization and summarization features. In the same function as above, we can simply pass those features as keys in the options dictionary and set their values to True, like so:

with open(PATH_TO_FILE, 'rb') as audio:
    source = {'buffer': audio, 'mimetype': 'audio/mp3'}
    response = deepgram.transcription.sync_prerecorded(source,
                                                       {'diarize': True,
                                                        'summarize': True})
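With diarization and summarization enabled, every word in the response carries a speaker number, and the alternative includes a summaries list, as the sample output in the final step shows. Here's a minimal sketch, assuming that response shape, which stitches the punctuated words back into speaker-labelled lines and prints each summary as a quick-scan bullet:

alternative = response['results']['channels'][0]['alternatives'][0]

# Rebuild speaker-labelled lines from the diarized words.
lines = []
current_speaker = None
current_words = []
for word in alternative['words']:
    if word['speaker'] != current_speaker:
        if current_words:
            lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
        current_speaker = word['speaker']
        current_words = []
    current_words.append(word['punctuated_word'])
if current_words:
    lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
print('\n'.join(lines))

# Each summary covers a span of words; print them as bullets for quick scanning.
for section in alternative.get('summaries', []):
    print('- ' + section['summary'])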

Final Step - Run the Python Voice Note-Taking Project and Export the Results

We’ve reached the final step! In this step, we’ll run the Python project so we can see the JSON response with the transcript split by speaker, along with the summaries.

From our terminal type:

python3 take_voice_notes.py > notes.txt

This runs our project and outputs a file called notes.txt, which is now in our directory. 

Open the file and we see a JSON response that looks like the following, depending on which audio file was transcribed:

"alternatives": [
                    {
                        "transcript": "Hello, and thank you for being in this meeting...",
                        "confidence": 0.9916992,
                        "words": [
                            {
                                "word": "hello",
                                "start": 15.259043,
                                "end": 15.338787,
                                "confidence": 0.95751953,
                                "speaker": 0,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "Hello,"
                            },
                            {
                                "word": "and",
                                "start": 15.418532,
                                "end": 15.617893,
                                "confidence": 0.99853516,
                                "speaker": 1,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "and"
                            },
                            {
                                "word": "thank",
                                "start": 15.617893,
                                "end": 15.777383,
                                "confidence": 0.9975586,
                                "speaker": 1,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "thank"
                            },
                            {
                                "word": "you",
                                "start": 15.777383,
                                "end": 15.9368725,
                                "confidence": 0.9975586,
                                "speaker": 0,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "you"
                            },
],
"summaries": [
                            {
                                "summary": "Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes. How may I help you today? I'm having some serious problem with my phone. Can you describe in detail for me? What kind of issues you're having with your device? Well, it isn't working.",
                                "start_word": 0,
                                "end_word": 649
                            },
                            {
                                "summary": "My phone won't turn on. I don't know what's wrong. My dad said I should get a new phone, but I didn't listen to him. I also never backed up my photos on the cloud like I know I should.",
                                "start_word": 649,
                                "end_word": 1288
                            },
}
]


We received the transcript, each word in it is assigned a speaker, and the summaries of the transcript appear at the end of the response.

Conclusion of the Python Voice Note-taking Project with Speech Recognition

We’ve learned how to transcribe audio and take notes with our voice in Python using an AI speech-to-text provider.

There are many ways to extend this project by using some of Deepgram's other features, like redaction, which hides sensitive information such as credit card numbers or social security numbers, or search, which looks through a transcript for terms and phrases. For a full list of all the features, please visit this page.
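Both features are enabled the same way as diarization and summarization, by adding options to the request. Here's a hedged sketch; the option values below are illustrative, so check the feature documentation for the exact names and supported redaction types:

options = {
    'punctuate': True,
    # Redact payment card numbers and social security numbers from the transcript.
    'redact': ['pci', 'ssn'],
    # Search the audio for specific terms and phrases.
    'search': ['action items', 'deadline'],
}
response = deepgram.transcription.sync_prerecorded(source, options)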

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub Discussions.
