What You’ll Learn
AI Voice-Over
Add natural-sounding narration to your videos
Multiple Voices
Choose from various AI voice speakers
Auto-Sync
Voice automatically syncs with video scenes
Voice Customization
Control speed and volume for perfect delivery
Before You Begin
Make sure you have:
- A Pictory API key
- Node.js or Python installed on your machine
- Text content ready for video conversion
- Basic understanding of voice-over concepts
How Text-to-Video with Voice-Over Works
When you create a video with AI voice-over:
- Text Processing - Your text content is analyzed and prepared
- Scene Generation - Text is split into logical video scenes
- Visual Selection - AI selects appropriate stock visuals for each scene
- Voice Generation - Professional AI narration is created from your text
- Synchronization - Voice-over is automatically synchronized with video timing
- Caption Creation - Subtitles are generated to match the narration
- Video Rendering - Final video is assembled with all elements combined
The AI voice-over is automatically synchronized with your video scenes. The narration timing adjusts based on text length, voice speed, and scene duration for natural pacing.
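To get a feel for how text length and voice speed interact, the sketch below estimates narration time locally. It is only a rough approximation: the helper name `estimate_narration_seconds` and the baseline of 150 words per minute at speed 100 are assumptions made for illustration, not values documented by Pictory.

```python
def estimate_narration_seconds(text: str, speed: int = 100, base_wpm: float = 150.0) -> float:
    """Roughly estimate narration length in seconds.

    base_wpm is an ASSUMED speaking rate at speed=100, not a Pictory value.
    """
    if not 50 <= speed <= 200:
        raise ValueError("speed must be between 50 and 200")
    words = len(text.split())
    # A speed of 100 is treated as 1.0x, so 150 means 1.5x the baseline rate.
    effective_wpm = base_wpm * (speed / 100)
    return words / effective_wpm * 60


story = "AI voice-over turns plain text into narrated video. " * 5
for speed in (75, 100, 150):
    print(speed, round(estimate_narration_seconds(story, speed), 1))
```

Because the real timing also depends on the chosen voice and on scene duration, treat the result as a planning aid and confirm pacing in the rendered video.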
Complete Example
Understanding the Parameters
Voice-Over Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| voiceOver.enabled | boolean | Yes | Set to true to enable voice-over narration |
| voiceOver.aiVoices | array | Yes | Array of AI voice configurations (currently supports one voice) |
| voiceOver.aiVoices[].speaker | string | Yes | Name of the AI voice (e.g., “Brian”, “Emma”) |
| voiceOver.aiVoices[].speed | number | No | Voice speed 50-200 (default: 100 = normal) |
| voiceOver.aiVoices[].amplificationLevel | number | No | Volume level -1 to 1 (default: 0 = normal) |
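The parameters in this table can be assembled and range-checked before a request is sent. The following is a minimal sketch: `build_voice_over` is a hypothetical helper, and it only builds the `voiceOver` object. The rest of the request body and the endpoint are not shown here and should be taken from the API reference.

```python
import json


def build_voice_over(speaker: str, speed: int = 100, amplification_level: float = 0.0) -> dict:
    """Build a voiceOver object from the documented fields, validating ranges."""
    if not speaker:
        raise ValueError("speaker is required")
    if not 50 <= speed <= 200:
        raise ValueError("speed must be between 50 and 200")
    if not -1 <= amplification_level <= 1:
        raise ValueError("amplificationLevel must be between -1 and 1")
    return {
        "enabled": True,
        # The table notes that one voice is currently supported.
        "aiVoices": [
            {
                "speaker": speaker,
                "speed": speed,
                "amplificationLevel": amplification_level,
            }
        ],
    }


payload_fragment = {"voiceOver": build_voice_over("Brian", speed=95, amplification_level=0.2)}
print(json.dumps(payload_fragment, indent=2))
```

Validating locally means an out-of-range value fails fast with a clear message instead of surfacing later as an API error.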
Scene Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| scenes[].story | string | - | Text content to be narrated |
| createSceneOnNewLine | boolean | false | Create new scene at line breaks |
| createSceneOnEndOfSentence | boolean | false | Create new scene at sentence endings |
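To see how the two scene-break flags might divide a story before sending it, you can approximate the behavior locally. This sketch is an illustration only: `preview_scenes` is a hypothetical helper, and the splitting rules it uses (line breaks, and whitespace after `.`, `!`, or `?`) are assumptions that may not match what the API does.

```python
import re


def preview_scenes(story: str, on_new_line: bool = False, on_end_of_sentence: bool = False) -> list[str]:
    """Approximate locally how a story might be divided into scenes.

    Illustration only; the API's real splitting rules may differ.
    """
    chunks = story.splitlines() if on_new_line else [story]
    if on_end_of_sentence:
        # Split after ., ! or ? when followed by whitespace.
        chunks = [s for c in chunks for s in re.split(r"(?<=[.!?])\s+", c)]
    return [c.strip() for c in chunks if c.strip()]


story = "Welcome to the course. Today we cover voice-over.\nLet's begin!"
print(preview_scenes(story, on_new_line=True))
print(preview_scenes(story, on_new_line=True, on_end_of_sentence=True))
```

With both flags off, the whole story stays in a single scene, which matches the `false` defaults in the table.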
Voice Speed Reference
| Speed Value | Playback Rate | Best Used For |
|---|---|---|
| 50 | 0.5x (Very slow) | Complex technical content, learning materials |
| 75 | 0.75x (Slower) | Detailed explanations, emphasis |
| 90 | 0.9x (Slightly slower) | Professional presentations, important information |
| 100 | 1.0x (Normal) | Standard content, most use cases |
| 110-120 | 1.1-1.2x (Slightly faster) | Casual content, social media |
| 150 | 1.5x (Fast) | Quick summaries, energetic content |
| 200 | 2.0x (Very fast) | Speed reading, urgent updates |
Amplification Level Reference
| Level | Effect | Best Used For |
|---|---|---|
| -1.0 | Quietest | Background narration, ambient voice |
| -0.5 | Quieter than normal | De-emphasized content |
| 0 | Normal volume | Standard narration, most content |
| 0.3 | Slightly louder | Mild emphasis, important points |
| 0.5 | Moderately louder | Strong emphasis |
| 1.0 | Loudest | Maximum emphasis, calls-to-action |
Choosing Your AI Voice
The Pictory API supports various AI voice speakers with different characteristics:
| Voice Type | Example Names | Best Used For |
|---|---|---|
| Male Professional | Brian, Matthew, Joey | Business content, technical tutorials, corporate videos |
| Female Professional | Emma, Joanna, Amy | Educational content, friendly tutorials, customer service |
| Conversational | Multiple options | Casual content, social media, storytelling |
Common Use Cases
Educational Content
Marketing and Sales
Social Media Content
Technical Tutorials
Best Practices
Select the Right Voice
Match voice characteristics to your content:
- Professional Content: Use Brian, Matthew, or Joanna
- Casual/Friendly: Use Emma, Amy, or Joey
- Educational: Choose clear voices like Emma or Brian
- Brand Consistency: Use the same voice across all your videos
- Test Options: Try different voices to find the best fit
Optimize Voice Speed
Adjust speed based on content type:
- Complex Content: Use 80-95 for technical or educational material
- Standard Content: Use 95-105 for most narration
- Social Media: Use 110-120 for energetic, engaging delivery
- Urgent Content: Use 120-150 for quick updates
- Test Pacing: Always preview to ensure speed feels natural
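The ranges above can be kept in a small lookup so that every video of the same kind starts from the same speed. In this sketch the category keys and the `suggest_speed` helper are made-up names; only the numeric ranges come from the list above.

```python
# Ranges copied from the list above; the category keys are illustrative labels.
SPEED_RANGES = {
    "complex": (80, 95),
    "standard": (95, 105),
    "social": (110, 120),
    "urgent": (120, 150),
}


def suggest_speed(content_type: str) -> int:
    """Return the midpoint of the recommended range as a starting point."""
    low, high = SPEED_RANGES[content_type]
    return (low + high) // 2


for kind in SPEED_RANGES:
    print(kind, suggest_speed(kind))
```

The midpoint is only a starting value; preview the result and adjust within the range.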
Use Amplification Wisely
Apply volume strategically:
- Subtle Emphasis: Use 0.1-0.3 for gentle highlighting
- Normal Content: Keep at 0 for most narration
- Strong Emphasis: Use 0.4-0.6 for key messages
- Avoid Extremes: Don’t exceed 0.7 to prevent distortion
- Consistency: Maintain similar levels across similar content
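A quick pre-flight check can catch amplification values that are valid but risky. The sketch below assumes a hypothetical helper, `check_amplification`, which rejects values outside the documented -1 to 1 range and warns above the 0.7 threshold mentioned in the list.

```python
def check_amplification(level: float) -> list[str]:
    """Return warnings for an amplificationLevel value; raise if out of range."""
    if not -1 <= level <= 1:
        raise ValueError("amplificationLevel must be between -1 and 1")
    warnings = []
    if level > 0.7:
        # Threshold taken from the guidance above, not enforced by the API.
        warnings.append(f"level {level} is above 0.7 and may cause distortion")
    return warnings


for level in (0, 0.3, 0.8):
    print(level, check_amplification(level))
```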
Write for Voice-Over
Prepare text that sounds natural when spoken:
- Conversational Tone: Write as you would speak rather than in a formal written style
- Short Sentences: Keep sentences concise for better pacing
- Clear Pronunciation: Spell out acronyms and difficult terms
- Proper Punctuation: Use periods and commas for natural pauses
- Test by Reading: Read your text aloud before converting
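Some of the checks above can be automated before text is submitted. The following is a rough sketch: `lint_narration` is a hypothetical helper, the 25-word sentence limit is an arbitrary threshold chosen for illustration, and the acronym check simply flags runs of capital letters.

```python
import re


def lint_narration(text: str, max_words: int = 25) -> list[str]:
    """Flag patterns the checklist above suggests avoiding.

    max_words is an arbitrary threshold chosen for this sketch.
    """
    issues = []
    stripped = text.strip()
    if stripped and stripped[-1] not in ".!?":
        issues.append("text does not end with terminal punctuation")
    # Treat ., ! or ? followed by whitespace as a sentence boundary.
    for sentence in re.split(r"(?<=[.!?])\s+", stripped):
        count = len(sentence.split())
        if count > max_words:
            issues.append(f"long sentence ({count} words): {sentence[:40]}")
    # Runs of two or more capitals are treated as acronyms to spell out.
    for acronym in sorted(set(re.findall(r"\b[A-Z]{2,}\b", stripped))):
        issues.append(f"consider spelling out '{acronym}'")
    return issues


for issue in lint_narration("Call the API now"):
    print(issue)
```

A linter like this complements reading the text aloud; it does not replace it.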
Plan Scene Breaks
Structure scenes for effective narration:
- Logical Breaks: Split at natural pauses in your content
- Scene Length: Aim for 5-15 seconds per scene (30-100 words)
- Visual Match: Ensure each scene’s visuals match the narration
- Pacing: Use scene breaks to control video rhythm
- Test Flow: Review to ensure smooth transitions
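The scene-length guideline can be checked by counting words per scene. This sketch uses the 30-100 word range from the list above; `check_scene_lengths` is a hypothetical helper and counts whitespace-separated words only.

```python
def check_scene_lengths(scenes: list[str], min_words: int = 30, max_words: int = 100) -> list[tuple[int, int]]:
    """Return (scene_index, word_count) for scenes outside the suggested range."""
    flagged = []
    for index, story in enumerate(scenes):
        count = len(story.split())
        if not min_words <= count <= max_words:
            flagged.append((index, count))
    return flagged


scenes = ["word " * 50, "too short", "word " * 120]
for index, count in check_scene_lengths(scenes):
    print(f"scene {index} has {count} words")
```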
Troubleshooting
Voice sounds robotic or unnatural
Problem: The AI narration doesn’t sound natural.
Solution:
- Try a different AI voice speaker
- Adjust speed to 95-105 for more natural cadence
- Ensure your text has proper punctuation
- Avoid using all caps or unusual formatting
- Write in a conversational tone rather than a formal one
- Use the Get Voiceover Tracks API to explore voice options
Narration is too fast or too slow
Problem: Voice-over pacing doesn’t match your expectations.
Solution:
- Adjust the speed parameter:
  - Decrease to 80-90 for slower narration
  - Increase to 110-120 for faster delivery
- Test different speeds to find the sweet spot
- Consider your audience - educational content needs slower pacing
- Very complex content may need speeds as low as 75
Voice-over doesn't sync with scenes
Problem: Narration timing seems off compared to visuals.
Solution:
- The API automatically syncs voice to video duration
- Adjust scene breaks (createSceneOnNewLine, createSceneOnEndOfSentence)
- Keep scenes to reasonable lengths (5-15 seconds each)
- If using manual scenes, ensure text length matches desired scene duration
- Review the completed video - timing may feel different than expected
Audio quality is poor or distorted
Problem: Voice-over sounds muffled, distorted, or has audio artifacts.
Solution:
- Reduce amplificationLevel to 0 or below
- Avoid levels above 0.7, which can cause clipping
- Try a different AI voice - some have better audio quality
- Check for unusual characters or formatting in source text
- Ensure text doesn’t have excessive special characters
Can't find the right voice
Problem: None of the voices seem right for your content.
Solution:
- Use the Get Voiceover Tracks API to see all options
- Test multiple voices with the same content
- Consider the voice characteristics table above for guidance
- Try adjusting speed and amplification with different voices
- Some voices work better for specific content types
Next Steps
Enhance your voice-over videos with these features:
Multi-Level Voice-Over
Use different voices or settings for different scenes
Background Music
Add music to complement voice-over narration
Custom Captions
Add translated or custom subtitles to your videos
Basic Text to Video
Create videos without voice-over
