Amazon Polly uses deep learning to synthesize human speech from text. With Amazon Polly you can convert meeting notes, research papers and magazine and newspaper articles to a digital audio format. And, with support for various voices and dozens of languages, Amazon Polly can bring your textual content to life in minutes.
The Solution
In this tutorial, we’ll walk through a simple example that synthesizes speech from text and stores the result in a digital audio format such as MP3.
Remember, for any example solution from AWS with .NET, we focus on the code that exemplifies the problem we are trying to solve. We don’t include logging, input validation, exception handling, etc., and we embed the configuration data within classes instead of using environment variables, configuration files, key/value stores and the like. These items should not be skipped for proper solutions.
Prerequisites
To complete this solution, you will need the .NET CLI, which is included in the .NET SDK. You will also need an AWS IAM user with programmatic access and permission to call Amazon Polly (the code in this tutorial uses the polly:SynthesizeSpeech action). Finally, you will need to install the AWS CLI and configure your environment with that user's credentials.
Warning: some AWS services may have fees associated with them.
Our Dev Environment
This tutorial was developed using Ubuntu 23.10, .NET 8 SDK and Visual Studio Code 1.86.1. Some commands/constructs may vary across systems.
Create the Amazon Polly .NET Application
First, we’ll create the .NET Amazon Polly application with the following command:
$ dotnet new console -n TextToSpeech --use-program-main
Add Dependencies to the .NET Polly App
Add the app dependencies for the AWS SDK with the following commands:
$ dotnet add package AWSSDK.Core
$ dotnet add package AWSSDK.Polly
Developing the .NET Amazon Polly App
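In the newly created .NET app, open the Program.cs file and add the following using directives at the top of the file so that the Amazon Polly types resolve:
using Amazon.Polly;
using Amazon.Polly.Model;
Then change the “Main” function definition to:
static async Task Main(string[] args)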
With that complete, let’s create a couple of variables: one for the output file name and one for the text to synthesize.
string outputFileName = "synthesized-text.mp3";
string textToSynthesize = "This is a test for AWS with dot net.";
With the variables created, we’ll create the Amazon Polly client. We’ll use this client to interact with the Amazon Polly service.
AmazonPollyClient amazonPollyClient = new AmazonPollyClient();
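The parameterless constructor resolves both credentials and region from the environment configured in the prerequisites. If no default region is configured, it can help to pass one explicitly. The following is a minimal standalone sketch, assuming us-east-1 as a placeholder region (the tutorial does not prescribe one); credentials are still expected to come from the default chain.

```csharp
using System;
using Amazon;
using Amazon.Polly;

class Program
{
    static void Main(string[] args)
    {
        // Assumption: us-east-1 is only a placeholder; substitute the region
        // you configured. Credentials still come from the default chain
        // (for example, the profile created with the AWS CLI).
        using AmazonPollyClient amazonPollyClient =
            new AmazonPollyClient(RegionEndpoint.USEast1);

        // Print the region the client was configured with.
        Console.WriteLine("Client region: " +
            amazonPollyClient.Config.RegionEndpoint.SystemName);
    }
}
```

The client is disposable, so the using declaration releases its underlying HTTP resources when Main exits.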
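With the client created, let’s construct the SynthesizeSpeechRequest object. Here we select “Mia” as the voice and MP3 as the output format. For the text, we pass in the textToSynthesize variable that we created earlier. Note that Amazon Polly lists Mia as a Mexican Spanish voice, so the English sample text will be read with that voice's pronunciation; an English voice such as Joanna (VoiceId.Joanna) may suit the sample text better. We then send the request with SynthesizeSpeechAsync and await the response.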
SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest
{
    VoiceId = VoiceId.Mia,
    OutputFormat = OutputFormat.Mp3,
    Text = textToSynthesize
};
SynthesizeSpeechResponse synthesizeSpeechResponse =
    await amazonPollyClient.SynthesizeSpeechAsync(synthesizeSpeechRequest);
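Because each voice is tied to a language, it may be worth checking which voices are available before settling on a VoiceId. The following standalone sketch lists voices through the DescribeVoices operation. The en-US filter is an assumption made only because the sample text is English, and the class name ListVoices is hypothetical. Run it as its own console project, since a project can have only one Main entry point.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Polly;
using Amazon.Polly.Model;

class ListVoices
{
    static async Task Main(string[] args)
    {
        using AmazonPollyClient amazonPollyClient = new AmazonPollyClient();

        // Assumption: filter on US English because the sample text is English.
        DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest
        {
            LanguageCode = LanguageCode.EnUS
        };

        DescribeVoicesResponse describeVoicesResponse =
            await amazonPollyClient.DescribeVoicesAsync(describeVoicesRequest);

        // Only the first page of results is printed; a fuller version would
        // keep requesting pages while the response carries a NextToken.
        foreach (Voice voice in describeVoicesResponse.Voices)
        {
            Console.WriteLine($"{voice.Id} ({voice.LanguageName}, {voice.Gender})");
        }
    }
}
```

Any Id printed here can be used as the VoiceId in the SynthesizeSpeechRequest.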
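To complete the code, we will create a FileStream object. Then we will read from the synthesizeSpeechResponse.AudioStream object in fixed-size chunks and write each chunk to the newly created FileStream object until Read returns 0, which signals the end of the audio stream.
Note: FileStream buffers writes in memory, so we call Flush() to push any bytes still held in the buffer to disk. Without it, the tail end of the synthesized audio could be missing from the file. Disposing the stream also flushes it.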
// The using declaration disposes (and closes) the file when Main exits.
using FileStream outputFileStream = new FileStream(outputFileName,
    FileMode.Create, FileAccess.Write);
int bufferSize = 2048; // maximum number of bytes to read per call
byte[] buffer = new byte[bufferSize]; // receives each chunk read from the audio stream
int readSpeechBytes; // number of bytes actually read in the current pass
// Read fills the buffer starting at offset 0 and returns 0 at end of stream.
while ((readSpeechBytes =
    synthesizeSpeechResponse.AudioStream.Read(buffer, 0, bufferSize)) > 0)
{
    outputFileStream.Write(buffer, 0, readSpeechBytes);
}
outputFileStream.Flush();
Console.WriteLine("Polly created synthesized speech. The " +
outputFileName + " file was saved to local storage.");
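The manual read loop above can also be expressed with Stream.CopyToAsync, which performs the chunked copy internally. The sketch below is one way to do that and is not part of the original tutorial. The helper name AudioFileWriter.SaveAsync and the file name copy-demo.bin are hypothetical, and a MemoryStream stands in for synthesizeSpeechResponse.AudioStream so the example runs without calling AWS.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class AudioFileWriter
{
    // Hypothetical helper: copies any readable stream into a new file.
    public static async Task SaveAsync(Stream audioStream, string outputFileName)
    {
        // await using flushes and closes the file when the method returns,
        // so no explicit Flush() call is needed.
        await using FileStream outputFileStream = new FileStream(
            outputFileName, FileMode.Create, FileAccess.Write);
        await audioStream.CopyToAsync(outputFileStream);
    }
}

class Program
{
    static async Task Main(string[] args)
    {
        // Stand-in for synthesizeSpeechResponse.AudioStream (assumption:
        // any readable Stream behaves the same way for this copy).
        using MemoryStream audioStream = new MemoryStream(new byte[5000]);

        await AudioFileWriter.SaveAsync(audioStream, "copy-demo.bin");

        long bytesWritten = new FileInfo("copy-demo.bin").Length;
        Console.WriteLine("Wrote " + bytesWritten + " bytes.");
    }
}
```

In the tutorial's code, the call would take synthesizeSpeechResponse.AudioStream and outputFileName as its arguments.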
Testing the .NET Amazon Polly App
With the code complete, we can now run the app with the following commands:
$ dotnet build && dotnet run
When the app finishes running, you should see a message like the following:
Polly created synthesized speech. The synthesized-text.mp3 file was saved to local storage.
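If the build fails or the message does not appear, it may help to compare your file against the snippets from this tutorial assembled into a single Program.cs. The listing below is a sketch of that assembly. The using directives, the TextToSpeech namespace (taken from the project name) and the using declaration on the FileStream are assumptions about how the pieces fit together, not something the original snippets show.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.Polly;
using Amazon.Polly.Model;

namespace TextToSpeech;

class Program
{
    static async Task Main(string[] args)
    {
        string outputFileName = "synthesized-text.mp3";
        string textToSynthesize = "This is a test for AWS with dot net.";

        // Credentials and region come from the environment configured earlier.
        AmazonPollyClient amazonPollyClient = new AmazonPollyClient();

        SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest
        {
            VoiceId = VoiceId.Mia,
            OutputFormat = OutputFormat.Mp3,
            Text = textToSynthesize
        };

        SynthesizeSpeechResponse synthesizeSpeechResponse =
            await amazonPollyClient.SynthesizeSpeechAsync(synthesizeSpeechRequest);

        // Assumption: a using declaration closes the file when Main exits.
        using FileStream outputFileStream = new FileStream(outputFileName,
            FileMode.Create, FileAccess.Write);

        int bufferSize = 2048;                 // maximum bytes to read per call
        byte[] buffer = new byte[bufferSize];  // receives each chunk of audio
        int readSpeechBytes;

        // Read returns 0 once the end of the audio stream is reached.
        while ((readSpeechBytes =
            synthesizeSpeechResponse.AudioStream.Read(buffer, 0, bufferSize)) > 0)
        {
            outputFileStream.Write(buffer, 0, readSpeechBytes);
        }
        outputFileStream.Flush();

        Console.WriteLine("Polly created synthesized speech. The " +
            outputFileName + " file was saved to local storage.");
    }
}
```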
Summary
In this tutorial, you learned how to synthesize speech from text using Amazon Polly and the AWS SDK for .NET, and how to save the resulting audio stream to an MP3 file.