Introduction to Amazon Polly

Amazon Polly uses deep learning to synthesize human speech from text. With Amazon Polly you can convert meeting notes, research papers and magazine and newspaper articles to a digital audio format. And, with support for various voices and dozens of languages, Amazon Polly can bring your textual content to life in minutes.


The Solution

In this tutorial, we’ll walk through a simple example that shows how to synthesize speech from text and store the synthesized speech in a digital audio format such as MP3.

Remember, for any example solution from AWS with .NET, we focus on the code that illustrates the problem we are trying to solve. We don’t include logging, input validation, exception handling, etc., and we embed the configuration data within classes instead of using environment variables, configuration files, key/value stores and the like. These items should not be skipped in production solutions.

Prerequisites

To complete this solution, you will need the .NET CLI, which is included in the .NET SDK. You will also need to create an AWS IAM user with programmatic access and the appropriate permissions to interact with Amazon Polly. Finally, you will need to download the AWS CLI and configure your environment.

Warning: some AWS services may have fees associated with them.

Our Dev Environment

This tutorial was developed using Ubuntu 23.10, .NET 8 SDK and Visual Studio Code 1.86.1. Some commands/constructs may vary across systems.

Create the Amazon Polly .NET Application

First, we’ll create the .NET Amazon Polly application with the following command:

$ dotnet new console -n TextToSpeech --use-program-main

Add Dependencies to the .NET Polly App

Add the app dependencies for the AWS SDK with the following commands:

$ dotnet add package AWSSDK.Core

$ dotnet add package AWSSDK.Polly

Developing the .NET Amazon Polly App

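In the newly created .NET app, open the Program.cs file and add the following using directives at the top of the file:

using Amazon.Polly;
using Amazon.Polly.Model;

Then change the “Main” function definition to:

static async Task Main(string[] args)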

With that complete, let’s create a couple of variables inside the “Main” function.

string outputFileName = "synthesized-text.mp3";
string textToSynthesize = "This is a test for AWS with dot net.";

With the variables created, we’ll create the Amazon Polly client, which we’ll use to interact with the Amazon Polly service. The parameterless constructor picks up the credentials and Region from the environment you configured in the prerequisites.

AmazonPollyClient amazonPollyClient = new AmazonPollyClient();
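If you would rather not rely on the Region from your configured environment, the client can also be created with an explicit Region. The following is a minimal standalone sketch, not part of the tutorial's Program.cs; the class name ExplicitRegionExample and the choice of us-east-1 are assumptions made only for illustration, and credentials are still expected to come from your configured environment.

```csharp
using System;
using Amazon;
using Amazon.Polly;

public class ExplicitRegionExample
{
    public static void Main(string[] args)
    {
        // Assumption: us-east-1 is only an example; pick a Region where
        // Amazon Polly is available to your account.
        RegionEndpoint region = RegionEndpoint.USEast1;

        // The client is disposable, so a using declaration releases it
        // when Main returns.
        using AmazonPollyClient amazonPollyClient = new AmazonPollyClient(region);

        Console.WriteLine("Created a Polly client for " + region.SystemName);
    }
}
```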

With the client created, let’s construct the SynthesizeSpeechRequest object. Here we select “Mia” as the voice and MP3 as the output format. For the text, we pass in the textToSynthesize variable that we created earlier. We then send the request with SynthesizeSpeechAsync and await the response.

SynthesizeSpeechRequest synthesizeSpeechRequest = new 
   SynthesizeSpeechRequest
   {
       VoiceId = VoiceId.Mia,
       OutputFormat = OutputFormat.Mp3,
       Text = textToSynthesize
   };

SynthesizeSpeechResponse synthesizeSpeechResponse = await 
   amazonPollyClient.SynthesizeSpeechAsync(synthesizeSpeechRequest);
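Each Polly voice is tied to a language, so it can help to list the voices available for the language of your text before choosing a VoiceId. The following is a minimal standalone sketch using the SDK's DescribeVoicesAsync call; the class name ListVoicesExample and the US English filter are assumptions for illustration, and the sketch reads only the first page of results.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Polly;
using Amazon.Polly.Model;

public class ListVoicesExample
{
    public static async Task Main(string[] args)
    {
        using AmazonPollyClient amazonPollyClient = new AmazonPollyClient();

        // Assumption: filtering by US English is only an example.
        DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest
        {
            LanguageCode = LanguageCode.EnUS
        };

        // Only the first page is read here; a NextToken in the response
        // indicates that more voices are available.
        DescribeVoicesResponse describeVoicesResponse =
            await amazonPollyClient.DescribeVoicesAsync(describeVoicesRequest);

        foreach (Voice voice in describeVoicesResponse.Voices)
        {
            Console.WriteLine($"{voice.Id} ({voice.LanguageName}, {voice.Gender})");
        }
    }
}
```

Because this sketch defines its own Main method, it would need to live in a separate console project rather than alongside the tutorial's Program.cs.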

To complete the code, we will create a FileStream object. Then we will read from the synthesizeSpeechResponse.AudioStream object into the newly created FileStream object.

Note: we flush the FileStream so that any bytes still held in its buffer are written to the file. Without this step, the last part of the synthesized speech could be missing from the output.

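// The using declaration closes the file when Main returns.
using FileStream outputFileStream = new FileStream(outputFileName, 
   FileMode.Create, FileAccess.Write);

int c = 2048; // count: maximum number of bytes to read per call
int o = 0; // offset: index in the buffer at which to begin storing the data read
byte[] b = new byte[c]; // buffer: holds the bytes read from the audio stream
int readSpeechBytes; // number of bytes actually read in each iteration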

while ((readSpeechBytes = synthesizeSpeechResponse.AudioStream.Read(b, o, c)) > 0)
{
   outputFileStream.Write(b, 0, readSpeechBytes);
}

outputFileStream.Flush();
Console.WriteLine("Polly created synthesized speech. The " + outputFileName + " file was saved to local storage.");
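As a possible alternative to the manual read/write loop above, the standard Stream.CopyToAsync method performs the same copy in a single call. The sketch below wraps it in a small helper; AudioFileWriter and SaveAsync are hypothetical names introduced here for illustration and are not part of the AWS SDK.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AudioFileWriter
{
    // Hypothetical helper: copies any readable stream to a file on disk.
    public static async Task SaveAsync(Stream audioStream, string outputFileName)
    {
        // "await using" disposes the FileStream asynchronously when the
        // method ends, which flushes buffered bytes and closes the file.
        await using FileStream outputFileStream = new FileStream(
            outputFileName, FileMode.Create, FileAccess.Write);

        // CopyToAsync reads from the source until it is exhausted and
        // writes everything it reads to the destination stream.
        await audioStream.CopyToAsync(outputFileStream);
    }
}
```

In the tutorial's Main method, it could be called as await AudioFileWriter.SaveAsync(synthesizeSpeechResponse.AudioStream, outputFileName); in place of the FileStream, loop and Flush steps.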

Testing the .NET Amazon Polly App

With the code complete, we can now run the app with the following commands:

$ dotnet build && dotnet run

When the app finishes running, you should see a message like the following:

Polly created synthesized speech. The synthesized-text.mp3 file was saved to local storage.

Summary

In this tutorial, you learned how to synthesize speech from text using Amazon Polly and the AWS .NET SDK.
