Using your own audio data to train a custom model

In the Quickstart Guide for the Deepgram MissionControl API, you learned how to train a Custom Model using one of the free training-ready datasets from your MissionControl account.

Now that you’ve had success there, let’s kick things up a notch and build a custom-trained speech model using your unique audio.

In this guide you’ll:

  1. Create a dataset
  2. Add audio files to your dataset
  3. Add labels to your audio files
  4. Train a Custom Model with your datasets

Create a dataset

Assuming you’ve already created a MissionControl account, you can create a new dataset by replacing the USERNAME:PASSWORD with your credentials and submitting the following curl command. Be sure to give your dataset a useful name.

We highly recommend running these requests through jq for easy-to-read outputs.

curl -X POST -u USERNAME:PASSWORD "https://missioncontrol.deepgram.com/v1/datasets?name=MyDataset"

In no time, you’ll get back a response containing your new dataset’s id, which you’ll use as the dataset-id in later requests.

{
  "id": "dddddddd-1111-2222-3333-444444444444",
  "name": "MyDataset",
  "created": "2020-05-01T23:23:37.708528Z",
  "resource_count": 0,
  "total_duration": 0,
  "status": "UNLABELED",
  "read_only": false
}
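If you prefer to script this step, the curl command above can be mirrored with Python’s standard library. This is a minimal sketch that only builds the request (it does not send it), with Basic auth standing in for curl’s -u flag; create_dataset_request is a hypothetical helper name, not part of any Deepgram SDK:

```python
import base64
import urllib.parse
import urllib.request

def create_dataset_request(username, password, name):
    """Build the POST request for creating a MissionControl dataset.

    Mirrors: curl -X POST -u USER:PASS .../v1/datasets?name=...
    Credentials are sent via HTTP Basic auth, like curl's -u flag.
    """
    query = urllib.parse.urlencode({"name": name})
    url = f"https://missioncontrol.deepgram.com/v1/datasets?{query}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )

req = create_dataset_request("USERNAME", "PASSWORD", "MyDataset")
# urllib.request.urlopen(req) would actually submit it; we only build it here.
```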

Add audio files to your dataset

You’ll notice that the response you received in the last step noted that your dataset has a resource_count of 0. In MissionControl, a resource is made up of an audio file and its corresponding labels. Let’s go ahead and start creating those resources.

To upload an audio file to your dataset, use this curl command, being sure to:

  • Swap in your dataset-id
  • Point to the path of your audio file
  • Give your resource a name

The MissionControl API supports both local and publicly accessible remote file uploads. We accept most audio types.

Uploading a Local File

To upload a file from your computer, run this in a terminal or your favorite API client:

curl -X POST -u USERNAME:PASSWORD --data-binary @path/to/file.wav "https://missioncontrol.deepgram.com/v1/resources?name=myfile.wav&dataset-id=dddddddd-1111-2222-3333-444444444444"

Uploading a Remote File

To upload a remote file that is publicly accessible (e.g. hosted in AWS S3 or another server), run this in a terminal or your favorite API client:

curl -X POST -u USERNAME:PASSWORD -H "Content-Type: application/json" --data '{"url": "https://www.deepgram.com/examples/interview_speech-analytics.wav"}' "https://missioncontrol.deepgram.com/v1/resources?name=myfile.wav&dataset-id=dddddddd-1111-2222-3333-444444444444"

You’ll receive a response like this:

{
  "id": "ffffffff-0000-0000-0000-ffffffffffff",
  "name": "myfile.wav",
  "created": "2020-05-01T17:08:32.353733Z",
  "duration": 2705.436,
  "status": "UNLABELED",
  "read_only": false
}
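The two upload variants above differ only in how the audio arrives: raw bytes in the body for a local file, or a small JSON payload pointing at a public URL. A minimal Python sketch of both request shapes, again only constructing requests rather than sending them (the helper names are illustrative, not an official SDK):

```python
import base64
import json
import urllib.parse
import urllib.request

BASE = "https://missioncontrol.deepgram.com/v1/resources"

def _auth(username, password):
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def upload_local_request(username, password, dataset_id, name, audio_bytes):
    """POST raw audio bytes, like curl's --data-binary @file."""
    query = urllib.parse.urlencode({"name": name, "dataset-id": dataset_id})
    return urllib.request.Request(
        f"{BASE}?{query}", data=audio_bytes, method="POST",
        headers=_auth(username, password),
    )

def upload_remote_request(username, password, dataset_id, name, url):
    """POST a JSON body pointing at a publicly accessible file."""
    query = urllib.parse.urlencode({"name": name, "dataset-id": dataset_id})
    headers = {**_auth(username, password), "Content-Type": "application/json"}
    body = json.dumps({"url": url}).encode()
    return urllib.request.Request(
        f"{BASE}?{query}", data=body, method="POST", headers=headers
    )
```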

Repeat this process with all the files you’d like to add to your dataset.

Add labels to your audio files

You’ll notice that all the responses from the previous step included a status of UNLABELED. We’ll fix this now.

In order for your dataset to be ready for training, you’ll need to associate labels with your audio. These labels pair with your audio to form the ground-truth data your custom model trains on. For that reason, the labels you upload should be as close to 100% accurate as possible. If you skimp on this step, your model can only hope to be as good as the data you’ve trained it on. That’s right: “garbage in, garbage out.”

There are a couple ways for you to go about this step:

  1. If you already have labels, go ahead and upload them.
  2. If you don’t have labels, you can
    • Request that a professional transcriptionist create some for you. Remember, your MissionControl account grants you 10 minutes of free professional data labeling.
    • Label your resources yourself in MissionControl.

Uploading existing labels

Before you upload, you’ll want to check that the format of your labels matches what MissionControl expects.

Your labels should:

  • Be verbatim. If your labels skip spoken words or paraphrase, your model will learn to do the same.
  • Have numbers and symbols written out. Numerals can be spoken in a variety of ways, so writing them out removes ambiguity for the speech model. For example, write “four” instead of “4” and “plus” instead of “+”.
Once your labels are ready, upload each one to its corresponding resource:

curl -X PUT -u USERNAME:PASSWORD --data-binary @path/to/test.txt "https://missioncontrol.deepgram.com/v1/resources/{resource-id}/transcript"
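The “write numbers and symbols out” rule above can be partially automated. This is a deliberately tiny sketch of the idea, handling only single digits and a few symbols; a real labeling pipeline would need full number-to-words conversion (e.g. “42” becoming “forty two”), which this does not attempt:

```python
import re

# Small lookup tables -- illustrative only, not exhaustive.
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
SYMBOLS = {"+": "plus", "%": "percent", "&": "and", "$": "dollars"}

def spell_out(text):
    """Replace single digits and common symbols with their spoken forms."""
    def repl(match):
        token = match.group(0)
        return DIGITS.get(token) or SYMBOLS.get(token, token)
    return re.sub(r"\d|[+%&$]", repl, text)

print(spell_out("4 + 4"))  # four plus four
```

However you normalize them, spot-check the results: an automated pass can mangle things like phone numbers or currency that need context-aware handling.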

Requesting professional data labeling

Labeling speech data is quite time consuming and requires a high level of accuracy. Luckily, MissionControl comes with 10 minutes of free professional data labeling to help you get started.

To request these labels, submit the following command, being sure to specify the list of resource-ids you’d like to have labeled.

Warning: If the total duration of the audio submitted in your request exceeds your available labeling credits, your request will fail. If you’d like to increase your labeling credits, request an upgrade.

curl \
-X POST \
-u USERNAME:PASSWORD \
-H "Content-Type: application/json" \
--data '{
  "resource_ids": [
    "ffffffff-0000-0000-0000-ffffffffffff",
    "gggggggg-1111-1111-1111-gggggggggggg"
  ]
}' \
"https://missioncontrol.deepgram.com/v1/label"

The confirmation response for each resource will look like:

{
  "id": "ffffffff-0000-0000-0000-ffffffffffff",
  "name": "myfile.wav",
  "created": "2020-04-27T20:36:01.813237Z",
  "duration": 3565.548,
  "status": "IN_PROGRESS",
  "read_only": false
}

Professional labeling will take some time to complete, so kick back while they go to work. You’ll be emailed when your labels are complete.

Labeling yourself in MissionControl

We’ve armed you with a transcript editor in the DataFactory of MissionControl. To start using it, log in to your account, navigate to the DataFactory, select your desired dataset, and then click “Label Myself”. Make use of the helpful keyboard shortcuts and be sure to follow the instructions for optimum training results.

If at any point you decide you’ve had enough, you can always send your resources to professionals for labeling.

Train a Custom Model with your datasets

Now that your dataset is training-ready, you’re ready to build your Custom Model.

Go ahead and submit a curl command that names your model and associates the dataset you prepared with it.

curl -X POST -u USERNAME:PASSWORD https://missioncontrol.deepgram.com/v1/models?dataset-id=dddddddd-1111-2222-3333-444444444444 -H 'content-type: application/json' -d '{"name": "MyCustomModel"}'  

To associate additional datasets to your model, take advantage of PUT /models/{model-id}/datasets.

You’ll quickly get back a response that shows your new model.

{
  "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "version_id": "12345678-1234-1234-1234-1234567890ab",
  "model_name": "MyCustomModel",
  "created": "2020-05-01T18:56:40.316185Z",
  "model_type": "USER",
  "wer": null,
  "trained_at": null,
  "status": "CANCELLED"
}
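In a script, you’d pull the model_id straight out of this response and use it to build the training request that follows. A minimal sketch, using a trimmed copy of the response above (the base-model-id is the example value from the training command in this guide):

```python
import json
import urllib.parse

# Trimmed copy of the model-creation response shown above.
create_response = '''{
  "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "model_name": "MyCustomModel"
}'''

model = json.loads(create_response)
query = urllib.parse.urlencode({
    "model-id": model["model_id"],
    # Example base-model-id from the curl command in this guide.
    "base-model-id": "e1eea600-6c6b-400a-a707-a491509e52f1",
})
train_url = f"https://missioncontrol.deepgram.com/v1/train?{query}"
print(train_url)
```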

Go ahead and copy the model_id from the response, plug it into the following command, and run it:

curl -X POST -u USERNAME:PASSWORD "https://missioncontrol.deepgram.com/v1/train?model-id={model-id}&base-model-id=e1eea600-6c6b-400a-a707-a491509e52f1"  

You’ll see a response confirming that your model has been submitted and that its status has changed to PENDING:

{
 "id":"a21e82a7-5bac-4b2a-a816-cb2f84e08ca8",
 "model_id":"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
 "submitted":"2020-05-01T19:12:24.913587Z",
 "finished": null,
 "status":"PENDING"
}

Training will take some time, but you’ll be emailed once your model has finished.

Once it’s finished training, take a look at the steps for reviewing your custom model’s performance and deploying it for use at scale.

To transcribe with your new model, you'll need to deploy it to SpeechEngine.

Nice work. You’re on the road to superior transcription!