June 3, 2023

Golang / Go Crash Course 10 | Generating Text To Speech MP3 files with Amazon Polly

In this article we are going to use Amazon Polly from Golang to generate text-to-speech audio files.

What is Amazon Polly?

  • Amazon Polly is a Text-To-Speech cloud-based service.
  • It uses advanced machine learning technologies to synthesize natural-sounding human speech.
  • You can build speech-enabled applications in multiple languages.
  • Use Cases:
    • USA TODAY NETWORK produces audio content with Polly.
    • Mapbox uses Polly for voice guidance as part of its navigation solution.
    • Volley is a top developer of voice-controlled games that also uses Polly.

Create IAM Policy to use Polly

Let’s create a new policy in the AWS Console so that the user who owns the credentials in our local environment is allowed to call the Amazon Polly API through the AWS SDK for Golang.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PollyDevelopersPolicy1",
                "Effect": "Allow",
                "Action": "polly:SynthesizeSpeech",
                "Resource": "*"
            }
        ]
    }

Then attach this policy to the IAM user group named “rest-api-developers” that we already created in the previous article: Golang / Go Crash Course 09 | Connecting our REST API with Amazon (AWS) DynamoDB
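
If you prefer to generate the policy document instead of typing it by hand, the following is a minimal sketch using only the Go standard library. The type and function names (statement, policyDocument, buildPollyPolicy, allowsAction) are hypothetical, and the structs assume the simple shape used in this article, where "Action" is a single string. Note that IAM action names separate the service prefix from the operation with a colon, as in polly:SynthesizeSpeech.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// statement mirrors one entry of the "Statement" array.
// Assumption: Action is a single string, as in this article's policy.
type statement struct {
	Sid      string `json:"Sid"`
	Effect   string `json:"Effect"`
	Action   string `json:"Action"`
	Resource string `json:"Resource"`
}

// policyDocument mirrors the top level of the policy document.
type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// buildPollyPolicy returns the article's policy as indented JSON.
func buildPollyPolicy() (string, error) {
	doc := policyDocument{
		Version: "2012-10-17",
		Statement: []statement{{
			Sid:      "PollyDevelopersPolicy1",
			Effect:   "Allow",
			Action:   "polly:SynthesizeSpeech",
			Resource: "*",
		}},
	}
	out, err := json.MarshalIndent(doc, "", "    ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// allowsAction reports whether the policy JSON contains an Allow
// statement whose Action matches the given action exactly.
func allowsAction(policyJSON, action string) (bool, error) {
	var doc policyDocument
	if err := json.Unmarshal([]byte(policyJSON), &doc); err != nil {
		return false, err
	}
	for _, s := range doc.Statement {
		if s.Effect == "Allow" && s.Action == action {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	policy, err := buildPollyPolicy()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(policy)
}
```

The printed JSON can be pasted into the JSON tab of the policy editor in the AWS Console.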

Consume the Amazon Polly API

Now that the permissions are in place on AWS, we can access the Amazon Polly API through the AWS SDK for Golang. Let’s start a new project and initialize a Go module for it:

$ mkdir golang-aws-polly
$ cd golang-aws-polly
$ go mod init github.com/favtuts/golang-amazon-polly

Next, we install the AWS SDK for Golang:

$ go get -u github.com/aws/aws-sdk-go
go: added github.com/aws/aws-sdk-go v1.44.235
go: added github.com/jmespath/go-jmespath v0.4.0

Now we are going to create a PollyService in the service package:

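package service

import (
	"io"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/polly"
)

// PollyService converts text to speech and saves the result as an audio file.
type PollyService interface {
	Synthesize(text string, fileName string) error
}

// pollyConfig holds the voice used for synthesis.
type pollyConfig struct {
	voice string
}

// NewKimberlyPollyService returns a PollyService that uses the Kimberly voice.
func NewKimberlyPollyService() PollyService {
	return &pollyConfig{
		voice: KIMBERLY_VOICE,
	}
}

// NewJoeyPollyService returns a PollyService that uses the Joey voice.
func NewJoeyPollyService() PollyService {
	return &pollyConfig{
		voice: JOEY_VOICE,
	}
}

const (
	AUDIO_FORMAT   = "mp3"
	KIMBERLY_VOICE = "Kimberly"
	JOEY_VOICE     = "Joey"
)

// createPollyClient builds a Polly client from the shared AWS config and credentials files.
func createPollyClient() *polly.Polly {
	sess := session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))

	return polly.New(sess)
}

// Synthesize sends the text to Amazon Polly and writes the returned audio stream to fileName.
func (config *pollyConfig) Synthesize(text string, fileName string) error {
	pollyClient := createPollyClient()

	input := &polly.SynthesizeSpeechInput{
		OutputFormat: aws.String(AUDIO_FORMAT),
		Text:         aws.String(text),
		VoiceId:      aws.String(config.voice),
	}

	output, err := pollyClient.SynthesizeSpeech(input)
	if err != nil {
		return err
	}
	defer output.AudioStream.Close()

	outFile, err := os.Create(fileName)
	if err != nil {
		return err
	}
	defer outFile.Close()

	_, err = io.Copy(outFile, output.AudioStream)
	if err != nil {
		return err
	}

	return nil
}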
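
The file-writing part of Synthesize can be exercised without AWS credentials if it depends on a small interface instead of the Polly client directly. The following is only a sketch of that idea, not part of the project above: speechSynthesizer, fakeSynthesizer, saveSpeech and demo are hypothetical names, and the fake simply echoes the text back as bytes instead of returning real audio.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// speechSynthesizer is a hypothetical, minimal abstraction over the Polly
// call: it turns text into a stream of audio bytes.
type speechSynthesizer interface {
	SynthesizeStream(text string) (io.ReadCloser, error)
}

// fakeSynthesizer stands in for Amazon Polly; it echoes the text as bytes.
type fakeSynthesizer struct{}

func (fakeSynthesizer) SynthesizeStream(text string) (io.ReadCloser, error) {
	return io.NopCloser(strings.NewReader(text)), nil
}

// saveSpeech streams the synthesized audio into fileName and returns the
// number of bytes written.
func saveSpeech(s speechSynthesizer, text, fileName string) (int64, error) {
	stream, err := s.SynthesizeStream(text)
	if err != nil {
		return 0, err
	}
	defer stream.Close()

	outFile, err := os.Create(fileName)
	if err != nil {
		return 0, err
	}

	n, err := io.Copy(outFile, stream)
	if err != nil {
		outFile.Close()
		return n, err
	}
	// Close explicitly so that a failed flush is reported, not ignored.
	return n, outFile.Close()
}

// demo writes a fake "audio" file into a temporary directory, removes the
// directory afterwards, and returns the size of the file that was written.
func demo() (int64, error) {
	dir, err := os.MkdirTemp("", "polly-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	return saveSpeech(fakeSynthesizer{}, "hello", filepath.Join(dir, "out.mp3"))
}

func main() {
	n, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("bytes written:", n)
}
```

In a real project, a thin adapter that calls SynthesizeSpeech and returns output.AudioStream (an io.ReadCloser) could satisfy the same interface.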

Create Polly application

Let’s create main.go with a couple of examples that use PollyService:

package main

import (
	"log"

	"github.com/favtuts/golang-amazon-polly/service"
)

var (
	kimberly service.PollyService = service.NewKimberlyPollyService()
	joey     service.PollyService = service.NewJoeyPollyService()
)

func main() {
	err := kimberly.Synthesize("Hi, I am Kimberly, how are you?", "kimberly.mp3")
	if err != nil {
		// Stop on the first failure, e.g. missing credentials or permissions.
		log.Fatal(err)
	}

	err = joey.Synthesize("Hi, I am Joey. Nice to meet you.", "joey.mp3")
	if err != nil {
		log.Fatal(err)
	}
}
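
Hard-coded file names are easy to mistype. One option, shown here only as a sketch, is to derive the file name from the voice name. The helper outputFileName is hypothetical and not part of the project above.

```go
package main

import (
	"fmt"
	"strings"
)

// outputFileName is a hypothetical helper: it trims and lowercases the
// voice name and appends the extension for the given audio format.
func outputFileName(voice, format string) string {
	return strings.ToLower(strings.TrimSpace(voice)) + "." + format
}

func main() {
	for _, voice := range []string{"Kimberly", "Joey"} {
		fmt.Println(outputFileName(voice, "mp3"))
	}
}
```

With such a helper, main.go could pass outputFileName("Joey", "mp3") as the second argument to Synthesize.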

Run the Polly application

Let’s run the application to see what happens:

$ go run *.go

Now we have two MP3 files. Let’s play them, for example with the play command that ships with SoX:

$ play kimberly.mp3
$ play joey.mp3
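
If no command-line player is available, a rough sanity check is to look at the first bytes of each file. The sketch below assumes that an MP3 file starts either with an ID3v2 tag ("ID3") or with an MPEG audio frame sync (eleven set bits). It does not decode or validate the audio, and the names looksLikeMP3 and checkFile are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

// looksLikeMP3 reports whether header starts like an MP3 file: either an
// ID3v2 tag ("ID3") or an MPEG audio frame sync (0xFF followed by a byte
// whose top three bits are set).
func looksLikeMP3(header []byte) bool {
	if len(header) >= 3 && string(header[:3]) == "ID3" {
		return true
	}
	return len(header) >= 2 && header[0] == 0xFF && header[1]&0xE0 == 0xE0
}

// checkFile reads up to three bytes from path and applies looksLikeMP3.
func checkFile(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	header := make([]byte, 3)
	n, err := io.ReadFull(f, header)
	// Short or empty files are not an I/O failure; they just fail the check.
	if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
		return false, err
	}
	return looksLikeMP3(header[:n]), nil
}

func main() {
	for _, name := range []string{"kimberly.mp3", "joey.mp3"} {
		ok, err := checkFile(name)
		if err != nil {
			log.Printf("%s: %v", name, err)
			continue
		}
		fmt.Printf("%s looks like MP3: %v\n", name, ok)
	}
}
```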

Download Source Code

$ git clone https://github.com/favtuts/golang-amazon-polly.git
$ cd golang-amazon-polly
$ go build

$ go run *.go
