Automatically Transcribing Podcast Episodes with Pipedream and Python

Kevin Lewis September 9, 2022 in Tutorials

I love podcasts, but I rarely find the time to listen to every new episode of my favorites. However, I often find time to read an article or newsletter, so I thought it would be fun to automatically transcribe new episodes and send myself the output via email.

We’ll be using Pipedream - an online workflow builder that integrates with many services while allowing us to write code in Python, JavaScript, Go, and more. Each workflow has one trigger that starts it and any number of actions that can happen as a result.

Before you start, you will need a Pipedream account, a Google account, and a free Deepgram API Key.

Create a new empty Pipedream workflow for this project.

Trigger Your Workflow When New Episodes Are Released

All podcasts publish via a public RSS feed, which is updated whenever there is a new episode. Pipedream has a built-in trigger that checks every 15 minutes whether a new item has been added.

Use the ‘New Item in RSS Feed’ trigger with any podcast RSS feed - you can use https://feeds.buzzsprout.com/1976304.rss, which is for the new Deepgram podcast Voice of the Future.

Trigger configuration showing a feed URL, timer, and an empty 'name' textbox

Click Create Source and select the first event. This is representative of one podcast episode.

Select event shows a dropdown of items. The first reads "Can AI get a read on you"

Once selected, you can see all of the data contained in that RSS feed entry - including the episode’s metadata and direct media link. All of this data can be used in future steps within the workflow.

A large amount of JSON-formatted data about the episode. Lightly highlighted is the direct media URL.
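
If you are curious where that direct media link comes from, here is a minimal sketch of reading it from raw RSS with Python's standard library. The feed below is a made-up, trimmed-down example (its titles and URL are placeholders, not the real Deepgram feed), and episode_media_urls is a hypothetical helper name. Pipedream does this parsing for you, so this is only an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down feed with a single episode.
# Real podcast feeds carry many more fields per item.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Example Episode</title>
      <description>A placeholder description.</description>
      <enclosure url="https://example.com/audio/episode-1.mp3"
                 length="12345" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def episode_media_urls(feed_xml):
    """Return (episode title, media URL) pairs for items that have an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            # Not every RSS item is guaranteed to carry audio
            continue
        episodes.append((item.findtext("title", default=""), enclosure.get("url")))
    return episodes


print(episode_media_urls(SAMPLE_FEED))
```

The enclosure element is the part of an RSS item that points at the audio file, which is why the trigger data exposes it under enclosures.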

Transcribe Podcast With Python

Create a new step (now an ‘action’) that will be run whenever the workflow is triggered. Pick Python -> Run Python Code. In the Configure section, click Add an App and select Deepgram. Insert your API Key and save it before continuing.

Pipedream step showing a Deepgram account has been configured, with boilerplate code in the editor.

Delete all of the code except def handler(pd: "pipedream"):, which is the required function signature that will be executed when this step is reached in the workflow. Make sure you have indented your code underneath this line. Then, get the URL from the trigger and your Deepgram API Key from the configured app:

url = pd.steps["trigger"]["event"]["enclosures"][0]["url"]
token = pd.inputs["deepgram"]["$auth"]['api_key']

As mentioned above, Pipedream requires the def handler(pd: "pipedream"): signature for the main function in a Python step. Because of this, the asynchronous Deepgram Python SDK won’t be usable in this context. Instead, we’ll directly make an API request with the requests library.

At the very top of your code, add the following line:

import requests

Then, at the bottom of your handler function, prepare your Deepgram API request:

listen = "https://api.deepgram.com/v1/listen?tier=enhanced&punctuate=true&diarize=true&paragraphs=true"
headers = { "Authorization": f"Token {token}" }
json = { "url": url }

This request will use Deepgram’s enhanced tier, diarization (speaker detection), and format the output using punctuation and paragraphs.
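
If you expect to toggle these features often, one option is to build the query string from keyword arguments instead of editing the URL by hand. This is just a sketch: build_listen_url is a hypothetical helper, not part of any Deepgram library, and it only uses Python's standard urlencode.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN = "https://api.deepgram.com/v1/listen"


def build_listen_url(**options):
    """Build the listen URL, turning Python booleans into 'true'/'false'."""
    params = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in options.items()
    }
    return f"{DEEPGRAM_LISTEN}?{urlencode(params)}"


# Same features as the URL used in this post
listen = build_listen_url(tier="enhanced", punctuate=True, diarize=True, paragraphs=True)
print(listen)
```

With the options shown, this produces the same URL as the hard-coded string above.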

Now that you are set up, make the request, extract the formatted response, and return the value:

r = requests.post(listen, json=json, headers=headers)
response = r.json()
transcript = response['results']['channels'][0]['alternatives'][0]['paragraphs']['transcript']
return transcript

Final code:

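import requests

def handler(pd: "pipedream"):
  # Direct media URL of the episode, taken from the RSS trigger event
  url = pd.steps["trigger"]["event"]["enclosures"][0]["url"]
  # Deepgram API key from the app connected to this step
  token = pd.inputs["deepgram"]["$auth"]['api_key']
  # Transcription endpoint with the enhanced tier, punctuation, diarization, and paragraphs enabled
  listen = "https://api.deepgram.com/v1/listen?tier=enhanced&punctuate=true&diarize=true&paragraphs=true"
  headers = { "Authorization": f"Token {token}" }
  json = { "url": url }
  # Send the request and parse the JSON response
  r = requests.post(listen, json=json, headers=headers)
  response = r.json()
  # Pull out the paragraph-formatted transcript and export it from the step
  transcript = response['results']['channels'][0]['alternatives'][0]['paragraphs']['transcript']
  return transcript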
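
The long chain of keys used to reach the transcript will raise a KeyError or IndexError if the response does not have the expected shape, for example when the request fails. As a sketch, you could wrap the lookup in a small helper that fails with a clearer message. extract_transcript is a hypothetical name, the sample responses are placeholders, and the fallback assumes the first alternative also carries a plain transcript field.

```python
def extract_transcript(response):
    """Return the paragraph-formatted transcript, or the plain one as a fallback."""
    try:
        alternative = response["results"]["channels"][0]["alternatives"][0]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("Unexpected response shape from the transcription request") from exc

    paragraphs = alternative.get("paragraphs") or {}
    # Assumption: a plain 'transcript' field sits alongside 'paragraphs'
    return paragraphs.get("transcript") or alternative.get("transcript", "")


# Placeholder response mirroring only the keys used in this post
sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "plain text",
                        "paragraphs": {"transcript": "Formatted text."},
                    }
                ]
            }
        ]
    }
}

print(extract_transcript(sample))
```

Inside the handler, you would then return extract_transcript(response) instead of indexing into the response directly.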

Click Test and the URL from the trigger will be sent to Deepgram, and the returned value will be shown in Pipedream:

A formatted transcript is shown in Pipedream's 'Exports' section for the Python step. In it is a set of paragraphs each starting with ''

Send Email With Transcript

Now that a transcript has been automatically generated, you can do anything with it, either through Pipedream’s integrations or by adding another Python step. As mentioned at the start of this post, the outcome of this project is to send yourself an email with the content of the podcast. You can also include variables in the subject line.

Add a Send Email To Self step, and set the subject line to:

New episode of {{steps.trigger.event.meta.title}}: {{steps.trigger.event.title}}

Set the text to:

Episode description: {{steps.trigger.event.description}}\n\n{{steps.python.$return_value}}

It should look like the following:

A send email to self step with a subject line and text containing variables
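
If you would rather assemble the message in a Python step, the same subject and text can be built with f-strings. This is only a sketch: build_email is a hypothetical helper, the sample event is made up, and the field names simply mirror the template variables used above.

```python
def build_email(event, transcript):
    """Return (subject, text) matching the templates used in the email step."""
    subject = f"New episode of {event['meta']['title']}: {event['title']}"
    text = f"Episode description: {event['description']}\n\n{transcript}"
    return subject, text


# Made-up event containing only the fields the templates reference
sample_event = {
    "meta": {"title": "Example Podcast"},
    "title": "Example Episode",
    "description": "A placeholder description.",
}

subject, text = build_email(sample_event, "Transcript goes here.")
print(subject)
print(text)
```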

Test the step, and you should receive an email in just a few seconds. Deploy the workflow, and enjoy reading new podcast episodes. If you have questions about anything in this post, we’d love to hear from you. Head over to our forum and create a new discussion with your questions, or send us a tweet @DeepgramAI.
