What You’ll Learn
Phoneme Basics
Understand SSML phoneme tag syntax and usage
Provider Differences
Learn how ElevenLabs, AWS Polly, and Google handle pronunciation
CMU Arpabet
Use the CMU Arpabet phonetic alphabet for pronunciation
Practical Examples
Apply pronunciation control to real video content
Before You Begin
Make sure you have:
- A Pictory API key (get one here)
- Basic understanding of the Storyboard API
- Familiarity with voice-over configuration
Understanding Voice-Over Types
Pictory supports three voice-over services, each with different SSML phoneme support:
| Service | Category | Phoneme Support | Alphabet |
|---|---|---|---|
| ElevenLabs | Premium | English only | CMU Arpabet |
| AWS Polly | Standard | Full support | IPA, X-SAMPA |
| Google TTS | Standard | Full support | IPA |
You can identify the voice-over service by the `service` field in the Get Voiceover Tracks API response. Values are `elevenlabs`, `aws`, or `google`.
Enabling SSML in Your Story
To use SSML tags, including phoneme tags, set `isSSMLStory: true` in your scene configuration.
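A minimal sketch of a scene with SSML enabled, written as a Python dict that would be serialized to JSON. Only `isSSMLStory` comes from this guide; the `story` field name and the overall scene shape are assumptions, so check the Storyboard API reference for the exact schema.

```python
import json

def make_ssml_scene(ssml_text: str) -> dict:
    """Build a scene dict with SSML processing turned on.

    `story` is an assumed field name for the scene text; `isSSMLStory`
    is the flag this guide requires for SSML tags to be interpreted.
    """
    return {
        "story": ssml_text,
        "isSSMLStory": True,
    }

scene = make_ssml_scene(
    'Welcome to <phoneme alphabet="ipa" ph="ˈpɪktɔːri">Pictory</phoneme>.'
)
# ensure_ascii=False keeps the IPA characters readable in the output.
print(json.dumps(scene, ensure_ascii=False, indent=2))
```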
ElevenLabs (Premium Voices)
ElevenLabs premium voices provide high-quality, natural-sounding speech with phoneme support for English-language content.
Key Requirements
- Model Configuration: When using phoneme tags with ElevenLabs, specify a `modelId` in `premiumVoiceSettings`. If you omit it, `eleven_flash_v2` is used by default for phoneme processing.
- Limited Model Support: Only three ElevenLabs models support phoneme tags. Other models ignore phoneme markup.
- English Only: Phoneme pronunciation control works only with English-language content in ElevenLabs.
- CMU Arpabet: ElevenLabs uses the CMU Arpabet phonetic alphabet.
Models with Phoneme Support
| Model ID | Description | Phoneme Support |
|---|---|---|
| `eleven_flash_v2` | Fast, efficient model (default for phonemes) | Yes |
| `eleven_turbo_v2` | Optimized for speed | Yes |
| `eleven_monolingual_v1` | English-optimized | Yes |
Models Without Phoneme Support
The following models do not support phoneme tags:
| Model ID | Description | Phoneme Support |
|---|---|---|
| `eleven_flash_v2_5` | Enhanced flash model | No |
| `eleven_turbo_v2_5` | Enhanced turbo model | No |
| `eleven_multilingual_v2` | Multi-language support | No |
| `eleven_multilingual_v1` | Legacy multi-language | No |
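Because unsupported models ignore phoneme markup rather than raising an error, it can help to check the model ID before sending a request. The sketch below hard-codes the model IDs from the tables above; `resolve_phoneme_model` is a hypothetical helper, not part of the Pictory API.

```python
from typing import Optional

# Model IDs listed in this guide as supporting phoneme tags.
PHONEME_MODELS = {"eleven_flash_v2", "eleven_turbo_v2", "eleven_monolingual_v1"}
# Default named in this guide when no modelId is provided.
DEFAULT_PHONEME_MODEL = "eleven_flash_v2"

def resolve_phoneme_model(model_id: Optional[str] = None) -> str:
    """Return a model ID that is safe to use with phoneme tags.

    Falls back to the documented default when no ID is given and
    raises ValueError for models that would ignore the markup.
    """
    if model_id is None:
        return DEFAULT_PHONEME_MODEL
    if model_id not in PHONEME_MODELS:
        raise ValueError(
            f"{model_id!r} does not support phoneme tags; "
            f"use one of {sorted(PHONEME_MODELS)}"
        )
    return model_id

print(resolve_phoneme_model())                   # falls back to the default
print(resolve_phoneme_model("eleven_turbo_v2"))  # accepted as-is
```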
Complete Example
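A hedged end-to-end sketch for ElevenLabs, assuming the request is built in Python and serialized to JSON. The `cmu-arpabet` alphabet value follows ElevenLabs' SSML convention. The payload shape (`scenes`, `story`, and where `premiumVoiceSettings` sits) is an assumption for illustration, so confirm it against the Storyboard API reference.

```python
import json
from xml.sax.saxutils import escape, quoteattr

def arpabet_tag(word: str, ph: str) -> str:
    """Wrap `word` in a phoneme tag that uses CMU Arpabet symbols."""
    # quoteattr adds the surrounding quotes and escapes the value for XML.
    return (
        '<phoneme alphabet="cmu-arpabet" ph=' + quoteattr(ph) + ">"
        + escape(word) + "</phoneme>"
    )

tag = arpabet_tag("Pictory", "P IH1 K T AO0 R IY0")
story = "Welcome to " + tag + ", your video assistant."

# Assumed request layout; only isSSMLStory, premiumVoiceSettings and
# modelId are names taken from this guide.
payload = {
    "scenes": [{"story": story, "isSSMLStory": True}],
    "premiumVoiceSettings": {"modelId": "eleven_flash_v2"},
}

body = json.dumps(payload)
print(body)
```

`json.dumps` escapes the double quotes inside the SSML attributes as `\"`, which is the same escaping a hand-written JSON body would need.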
ElevenLabs External Documentation
For detailed information about ElevenLabs pronunciation features, see the ElevenLabs documentation.
AWS Polly (Standard Voices)
AWS Polly voices provide reliable SSML support with multiple phonetic alphabets for precise pronunciation control.
Key Features
- Multiple Alphabets: AWS Polly supports the IPA (International Phonetic Alphabet) and X-SAMPA phonetic systems.
- SSML Categories: AWS Polly voices have different SSML support levels (Category A or B). Check the `ssmlSupportCategory` field from the tracks API.
- Neural and Standard Engines: Different voices use different engines with varying SSML capabilities.
Phoneme Tag Syntax
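In Polly's SSML, a phoneme tag takes an `alphabet` attribute (`ipa` or `x-sampa`) and a `ph` attribute holding the phonetic string, wrapped around the word to be replaced. The sketch below builds such tags in Python; `polly_phoneme` is a hypothetical helper, and the "pecan" transcriptions are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr

# Alphabet values accepted by Polly's phoneme tag.
POLLY_ALPHABETS = {"ipa", "x-sampa"}

def polly_phoneme(word: str, ph: str, alphabet: str = "ipa") -> str:
    """Return a phoneme tag for `word` using the given alphabet."""
    if alphabet not in POLLY_ALPHABETS:
        raise ValueError(f"Unsupported alphabet for Polly: {alphabet!r}")
    # quoteattr switches to single quotes when the value contains a
    # double quote, which X-SAMPA uses as its primary-stress mark.
    return (
        f'<phoneme alphabet="{alphabet}" ph=' + quoteattr(ph) + ">"
        + escape(word) + "</phoneme>"
    )

print(polly_phoneme("pecan", "pɪˈkɑːn"))
print(polly_phoneme("pecan", 'pI"kA:n', alphabet="x-sampa"))
```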
Complete Example
AWS Polly External Documentation
For detailed information about AWS Polly phoneme tags, see the AWS Polly documentation.
Google Text-to-Speech (Standard Voices)
Google TTS voices offer high-quality neural speech synthesis with comprehensive IPA phoneme support.
Key Features
- IPA Support: Google TTS uses the International Phonetic Alphabet (IPA) for phoneme specification.
- WaveNet and Neural2 Voices: Google offers advanced neural voice engines with natural-sounding output.
- Multi-language: Phoneme support across multiple languages with language-specific IPA symbols.
Phoneme Tag Syntax
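For Google TTS the tag has the same shape with `alphabet="ipa"`. As a sanity check before rendering, the sketch below parses an SSML fragment with Python's standard XML parser and lists the phoneme tags it finds. The `<speak>` wrapper is added only so the fragment parses as a single XML document; whether Pictory expects that wrapper in the story text is not stated in this guide.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def extract_phonemes(ssml_fragment: str) -> List[Tuple[str, str, str]]:
    """Return (alphabet, ph, spoken text) for each phoneme tag.

    Raises xml.etree.ElementTree.ParseError if the markup is malformed,
    for example an unclosed tag or an unescaped quote in an attribute.
    """
    root = ET.fromstring("<speak>" + ssml_fragment + "</speak>")
    return [
        (el.get("alphabet", ""), el.get("ph", ""), el.text or "")
        for el in root.iter("phoneme")
    ]

fragment = (
    'We ship to <phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">Manitoba</phoneme> '
    "every week."
)
print(extract_phonemes(fragment))
```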
Complete Example
Google TTS External Documentation
For detailed information about Google TTS phoneme support, see the Google Cloud Text-to-Speech documentation.
CMU Arpabet Reference
The CMU Arpabet is a phonetic alphabet commonly used with ElevenLabs. Here’s a quick reference:
Vowels
| Arpabet | IPA | Example Word |
|---|---|---|
| AA | ɑ | father |
| AE | æ | cat |
| AH | ʌ | cut |
| AO | ɔ | caught |
| EH | ɛ | bed |
| ER | ɝ | bird |
| IH | ɪ | bit |
| IY | i | beat |
| UH | ʊ | book |
| UW | u | boot |
Stress Markers
| Marker | Meaning |
|---|---|
| 0 | No stress |
| 1 | Primary stress |
| 2 | Secondary stress |
Example Breakdown
For “Pictory” pronounced as `P IH1 K T AO0 R IY0`:
| Symbol | Sound |
|---|---|
| P | /p/ as in pat |
| IH1 | /ɪ/ as in bit (primary stress) |
| K | /k/ as in kit |
| T | /t/ as in top |
| AO0 | /ɔ/ as in caught (no stress) |
| R | /r/ as in run |
| IY0 | /i/ as in beat (no stress) |
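The breakdown above can be reproduced mechanically: each space-separated token is a symbol, optionally followed by a stress digit. The sketch below splits an Arpabet string that way. It checks only the shape of each token (uppercase letters plus an optional 0, 1 or 2), not whether the symbol is a real Arpabet phoneme.

```python
import re
from typing import List, Optional, Tuple

STRESS = {"0": "no stress", "1": "primary stress", "2": "secondary stress"}
_TOKEN = re.compile(r"([A-Z]+)([012]?)")

def parse_arpabet(ph: str) -> List[Tuple[str, Optional[str]]]:
    """Split an Arpabet string into (symbol, stress description) pairs.

    Consonants carry no stress digit, so their stress is None.
    """
    parsed = []
    for token in ph.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"Not a well-formed Arpabet token: {token!r}")
        symbol, digit = match.groups()
        parsed.append((symbol, STRESS.get(digit)))
    return parsed

for symbol, stress in parse_arpabet("P IH1 K T AO0 R IY0"):
    print(symbol, stress or "")
```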
Common Pronunciation Examples
Here are phoneme representations for words commonly mispronounced:
| Word | CMU Arpabet | IPA |
|---|---|---|
| Pictory | P IH1 K T AO0 R IY0 | ˈpɪktɔːri |
| AI | EY1 AY1 | ˌeɪˈaɪ |
| Video | V IH1 D IY0 OW0 | ˈvɪdioʊ |
| Tutorial | T UW0 T AO1 R IY0 AH0 L | tuːˈtɔːriəl |
Best Practices
Test Pronunciation Before Rendering
Always test your phoneme tags with a short video before creating longer content. Different voices may interpret phonemes slightly differently.
Use Consistent Phonetic Alphabet
Stick to one phonetic alphabet per voice provider:
- ElevenLabs: CMU Arpabet
- AWS Polly: IPA or X-SAMPA
- Google TTS: IPA
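One way to enforce this pairing is a lookup keyed by the `service` value returned by the Get Voiceover Tracks API. The sketch below is illustrative: the service names come from this guide, the alphabet strings are the SSML attribute values each provider uses, and `alphabet_allowed` is a hypothetical helper.

```python
# Alphabets each voice-over service accepts, per the list above.
ALPHABETS_BY_SERVICE = {
    "elevenlabs": {"cmu-arpabet"},
    "aws": {"ipa", "x-sampa"},
    "google": {"ipa"},
}

def alphabet_allowed(service: str, alphabet: str) -> bool:
    """Return True if `alphabet` is valid for the given service value."""
    try:
        return alphabet in ALPHABETS_BY_SERVICE[service]
    except KeyError:
        raise ValueError(f"Unknown voice-over service: {service!r}") from None

print(alphabet_allowed("aws", "x-sampa"))
print(alphabet_allowed("elevenlabs", "ipa"))
```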
Keep Phoneme Tags Simple
Document Your Phonemes
Keep a reference document of phoneme tags used for your brand names and technical terms for consistency across videos.
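Such a reference can also live in code as a small lexicon that is applied to every script. The sketch below assumes plain-text input with no existing markup and performs case-sensitive whole-word replacement. `LEXICON` and `apply_lexicon` are hypothetical names, and the entries reuse Arpabet strings from this guide.

```python
import re
from typing import Dict
from xml.sax.saxutils import quoteattr

# Brand names and technical terms with their agreed pronunciations.
LEXICON = {
    "Pictory": "P IH1 K T AO0 R IY0",
    "Tutorial": "T UW0 T AO1 R IY0 AH0 L",
}

def apply_lexicon(text: str, lexicon: Dict[str, str],
                  alphabet: str = "cmu-arpabet") -> str:
    """Wrap every lexicon word found in `text` in a phoneme tag."""
    if not lexicon:
        return text
    # Longest words first so longer entries win over their prefixes.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")

    def replace(match):
        word = match.group(1)
        return (
            f'<phoneme alphabet="{alphabet}" ph=' + quoteattr(lexicon[word])
            + ">" + word + "</phoneme>"
        )

    return pattern.sub(replace, text)

print(apply_lexicon("Pictory makes video simple.", LEXICON))
```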
Troubleshooting
Phoneme tags are read as text
Pronunciation sounds incorrect with ElevenLabs
Problem: The word is still mispronounced even with phoneme tags.
Solution:
- Verify you’re using CMU Arpabet (not IPA) with ElevenLabs
- Check that `premiumVoiceSettings.modelId` is specified
- Ensure stress markers (0, 1, 2) are correctly placed
SSML not working with certain voices
Problem: Some voices don’t process SSML tags correctly.
Solution: Check the `ssmlSupportCategory` field from the Get Voiceover Tracks API. Some voices have limited SSML support.
Special characters causing errors
Problem: Request fails when using special characters in phoneme strings.
Solution: Ensure proper escaping of special characters. In JSON, use `\"` for quotes within the phoneme attribute.
Next Steps
AI Voice-Over Guide
Learn the basics of adding voice-over to videos
Multi-Level Voice-Over
Use different voices for different scenes
Get Voiceover Tracks
Discover all available AI voices
Render Storyboard Video
Complete API reference for video rendering
