Custom Voice
Build a recognizable, one-of-a-kind voice for your Text-to-Speech apps from your own speech recordings. You can then fine-tune the voice output by adjusting a set of voice parameters.
Get started
New to Speech Services? Create a Speech resource
Create a voice as unique as your business
Do you need to give your voice agent a unique, recognizable brand voice? The Text-to-Speech voice customization feature lets you create one-of-a-kind voice-enabled apps, with no expertise required. To customize your voice agent, record and upload training data, and Microsoft creates a unique voice font tuned to your recordings. The system scales as you add more data, producing an even more natural voice.

When using either default or customized voices, you can further tailor your audio by controlling parameters such as the speed of speech, pitch, volume, additional pauses, and pronunciations.
Hear Custom Voice in action
Build a highly natural voice without a single line of code, starting from just a few minutes of audio.
2 hours of speech data or less
3 hours of high-quality voice recordings
8 hours of high-quality voice recordings
Creating a custom voice model
1. Prepare training data and create a Speech resource before you start training a Custom Voice.
2. Upload your data through the Custom Voice portal or the Custom Voice API, and check its quality.
3. Use your data to train a custom model. Test the model with your own script when it’s ready.
4. Deploy the voice model to get your custom API endpoint. Test the endpoint before you integrate it into your system.
5. Use the voice in your apps, starting from the code samples provided for the endpoint.
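As a rough illustration of the last two steps, the sketch below prepares (but does not send) an HTTP request that posts SSML to a deployed voice endpoint. It is a sketch under stated assumptions rather than the official sample: the endpoint URL, the key, and the voice name `my-custom-voice` are placeholders, and the header names follow the Speech service REST conventions as we understand them, so check them against the code samples shown for your own endpoint.

```python
import urllib.request


def build_tts_request(
    endpoint_url: str, subscription_key: str, ssml: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying SSML to a voice endpoint."""
    headers = {
        # Assumed header names and values; confirm them against the
        # samples provided for your deployed endpoint.
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
    }
    return urllib.request.Request(
        endpoint_url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


# All three values below are placeholders for illustration only.
request = build_tts_request(
    "https://example.invalid/custom-voice-endpoint",
    "YOUR_SPEECH_RESOURCE_KEY",
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="en-US"><voice name="my-custom-voice">'
    "Hello from a custom voice.</voice></speak>",
)
print(request.get_method(), request.full_url)

# To actually synthesize audio, send the request and keep the returned bytes:
# with urllib.request.urlopen(request) as response:
#     audio = response.read()
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and SSML body while you test the endpoint, before wiring the call into an application.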
Voice Tuning preview
Quickly refine high-quality voice output with an SSML-powered web tool
Customize and fine-tune spoken text to best suit your needs. Experiment with voice styles in real time, tailor speech patterns, and quickly create high-quality voice output.

Custom Voice offers tools to easily tune standard, custom, and neural voices. Choose from a wide palette of SSML-supported voice attributes such as rate, pitch, volume, pronunciation, and breaks.
Customize output by <prosody rate="-50.00%">slowing down the speech rate.</prosody>
Add a break <break time="600ms"/> between words.
You can pronounce it ASAP or <sub alias="as soon as possible">ASAP</sub>.
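The three examples above are SSML fragments. To be synthesized, they are normally wrapped in a `<speak>` root with a `<voice>` element. The following Python sketch shows one way to assemble such a document from small helpers. The helper names and the voice name `my-custom-voice` are illustrative assumptions, not part of the Voice Tuning tool.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, rate: str) -> str:
    """Wrap text in <prosody> to change the speaking rate, e.g. rate="-50.00%"."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def pause(milliseconds: int) -> str:
    """Insert a silent break of the given length in milliseconds."""
    return f'<break time="{int(milliseconds)}ms"/>'


def substitute(written: str, spoken: str) -> str:
    """Have `written` read aloud as `spoken` by using <sub alias="...">."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def speak(voice_name: str, body: str, lang: str = "en-US") -> str:
    """Wrap already-built SSML fragments in <speak> and <voice> elements."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{body}</voice>"
        "</speak>"
    )


# "my-custom-voice" is a placeholder, not a real voice name.
ssml = speak(
    "my-custom-voice",
    "Customize output by "
    + prosody("slowing down the speech rate.", "-50.00%")
    + " Add a break "
    + pause(600)
    + " between words. You can pronounce it ASAP or "
    + substitute("ASAP", "as soon as possible")
    + ".",
)

ET.fromstring(ssml)  # raises ParseError if the document is not well-formed XML
print(ssml)
```

Escaping text and quoting attribute values keeps characters such as `&` or `"` in user-supplied text from breaking the markup, and parsing the result once catches malformed documents before they are sent for synthesis.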
Get started with Voice Tuning now
Start tuning
Custom Voice turnkey solutions
Premium services that take care of everything for you
State-of-the-art custom voices
The highest-quality digital voices, fully tested and fine-tuned, in 49 supported languages.

Custom Voice turnkey solutions now support Neural Text-to-Speech, the latest deep neural network technology, which creates digital voices nearly indistinguishable from human recordings.
Get in touch to learn more about the Custom Voice turnkey solution
Contact us
Creating a premium custom voice
Design a persona
Help you define a voice persona to match your target customer
Frame speech styles and identify the key domains where the voice will be used
Collect voice data
Help you select the right scripts to prepare the recording data
Select voice talent that matches the designed persona
Supervise the recording process
Build a quality voice model
Check consistency between scripts and recordings
Train the voice model with fine-tuning
Evaluate the voice model quality
Deploy to your solution
Deploy the voice model on Azure, on-premises, or at the edge
Support scalability