This guide shows you how to create videos from text with professional AI-generated voice-over narration. Transform written content into narrated videos perfect for social media, YouTube, educational content, or marketing materials.
What You’ll Learn
- AI Voice-Over: Add natural-sounding narration to your videos
- Multiple Voices: Choose from various AI voice speakers
- Auto-Sync: Voice automatically syncs with video scenes
- Voice Customization: Control speed and volume for perfect delivery
Before You Begin
Make sure you have:
- A Pictory API key (get one here)
- Node.js or Python installed on your machine
- Text content ready for video conversion
- Basic understanding of voice-over concepts
How Text-to-Video with Voice-Over Works
When you create a video with AI voice-over:
- Text Processing - Your text content is analyzed and prepared
- Scene Generation - Text is split into logical video scenes
- Visual Selection - AI selects appropriate stock visuals for each scene
- Voice Generation - Professional AI narration is created from your text
- Synchronization - Voice-over is automatically synchronized with video timing
- Caption Creation - Subtitles are generated to match the narration
- Video Rendering - Final video is assembled with all elements combined
The AI voice-over is automatically synchronized with your video scenes. The narration timing adjusts based on text length, voice speed, and scene duration for natural pacing.
Complete Example
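As a minimal sketch, a request body for this workflow could be assembled as follows. It uses only the parameters documented in this guide. The helper name `build_voiceover_payload` is hypothetical, and placing `createSceneOnNewLine` and `createSceneOnEndOfSentence` inside each `scenes[]` entry is an assumption to verify against the API reference. Sending the request (endpoint and authentication) is intentionally left out.

```python
import json


def build_voiceover_payload(story, speaker="Brian", speed=100, amplification_level=0):
    """Assemble a request body from the parameters documented in this guide.

    Assumption: the scene-splitting flags belong inside each scenes[] entry.
    """
    return {
        "voiceOver": {
            "enabled": True,
            # The guide notes that only one AI voice is currently supported.
            "aiVoices": [
                {
                    "speaker": speaker,
                    "speed": speed,
                    "amplificationLevel": amplification_level,
                }
            ],
        },
        "scenes": [
            {
                "story": story,
                "createSceneOnNewLine": True,
                "createSceneOnEndOfSentence": True,
            }
        ],
    }


if __name__ == "__main__":
    payload = build_voiceover_payload(
        "AI narration turns written content into video. Each sentence becomes a scene."
    )
    print(json.dumps(payload, indent=2))
```

Printing the payload before sending it makes it easy to compare against the parameter tables in this guide.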
Understanding the Parameters
Voice-Over Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| `voiceOver.enabled` | boolean | Yes | Set to `true` to enable voice-over narration |
| `voiceOver.aiVoices` | array | Yes | Array of AI voice configurations (currently supports one voice) |
| `voiceOver.aiVoices[].speaker` | string | Yes | Name or voice ID of the AI voice. You can pass a voice name (e.g., "Brian", "Emma"), a numeric track ID, or an ElevenLabs voice ID (e.g., "pNInz6obpgDQGcFmaJgB"). ElevenLabs voices are automatically discovered and added to your library if not already present. |
| `voiceOver.aiVoices[].speed` | number | No | Voice speed 50-200 (default: 100 = normal) |
| `voiceOver.aiVoices[].amplificationLevel` | number | No | Volume level -1 to 1 (default: 0 = normal) |
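The ranges in the table can be checked locally before a request is sent. The following is a small sketch, not part of the Pictory API: the function name `validate_ai_voice` and its error messages are illustrative, and it assumes the speaker is always passed as a string.

```python
def validate_ai_voice(voice):
    """Return a list of problems found in one aiVoices[] entry (empty if none)."""
    problems = []

    speaker = voice.get("speaker")
    if not isinstance(speaker, str) or not speaker.strip():
        problems.append("speaker is required and must be a non-empty string")

    speed = voice.get("speed", 100)  # documented default
    if isinstance(speed, bool) or not isinstance(speed, (int, float)) or not 50 <= speed <= 200:
        problems.append("speed must be a number from 50 to 200")

    level = voice.get("amplificationLevel", 0)  # documented default
    if isinstance(level, bool) or not isinstance(level, (int, float)) or not -1 <= level <= 1:
        problems.append("amplificationLevel must be a number from -1 to 1")

    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem in one pass.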
Scene Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `scenes[].story` | string | - | Text content to be narrated |
| `createSceneOnNewLine` | boolean | false | Create new scene at line breaks |
| `createSceneOnEndOfSentence` | boolean | false | Create new scene at sentence endings |
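To get a feel for how the two scene flags interact, the splitting can be approximated locally. This is only a rough sketch: the service's actual splitting logic is not documented here, and the sentence rule used below (split after `.`, `!`, or `?` followed by whitespace) is an assumption. The function name `preview_scenes` is hypothetical.

```python
import re


def preview_scenes(story, on_new_line=False, on_end_of_sentence=False):
    """Approximate how a story might be divided into scenes."""
    chunks = story.splitlines() if on_new_line else [story]
    scenes = []
    for chunk in chunks:
        if on_end_of_sentence:
            # Split after ., ! or ? when followed by whitespace.
            parts = re.split(r"(?<=[.!?])\s+", chunk)
        else:
            parts = [chunk]
        scenes.extend(part.strip() for part in parts if part.strip())
    return scenes
```

For a two-line story whose first line holds two sentences, enabling only the line-break option yields two scenes in this approximation, while enabling the sentence option yields three.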
Voice Speed Reference
| Speed Value | Playback Rate | Best Used For |
|---|---|---|
| 50 | 0.5x (Very slow) | Complex technical content, learning materials |
| 75 | 0.75x (Slower) | Detailed explanations, emphasis |
| 90 | 0.9x (Slightly slower) | Professional presentations, important information |
| 100 | 1.0x (Normal) | Standard content, most use cases |
| 110-120 | 1.1-1.2x (Slightly faster) | Casual content, social media |
| 150 | 1.5x (Fast) | Quick summaries, energetic content |
| 200 | 2.0x (Very fast) | Speed reading, urgent updates |
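The table implies a simple relationship: the playback rate is the speed value divided by 100. The sketch below encodes that and, as an assumption, estimates narration length by dividing the normal-speed duration by the playback rate. Both function names are illustrative.

```python
def playback_rate(speed):
    """Convert a speed value (50-200) to a playback multiplier, e.g. 150 -> 1.5."""
    if not 50 <= speed <= 200:
        raise ValueError("speed must be between 50 and 200")
    return speed / 100


def narration_seconds(seconds_at_normal_speed, speed):
    """Estimate narration length after a speed value is applied.

    Assumption: duration scales inversely with the playback rate.
    """
    return seconds_at_normal_speed / playback_rate(speed)
```

Under that assumption, a narration that runs 30 seconds at speed 100 would take about 60 seconds at speed 50 and about 15 seconds at speed 200.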
Amplification Level Reference
| Level | Effect | Best Used For |
|---|---|---|
| -1.0 | Quietest | Background narration, ambient voice |
| -0.5 | Quieter than normal | De-emphasized content |
| 0 | Normal volume | Standard narration, most content |
| 0.3 | Slightly louder | Mild emphasis, important points |
| 0.5 | Moderately louder | Strong emphasis |
| 1.0 | Loudest | Maximum emphasis, calls-to-action |
Choosing Your AI Voice
The Pictory API supports various AI voice speakers with different characteristics:
| Voice Type | Example Names | Best Used For |
|---|---|---|
| Male Professional | Brian, Matthew, Joey | Business content, technical tutorials, corporate videos |
| Female Professional | Emma, Joanna, Amy | Educational content, friendly tutorials, customer service |
| Conversational | Multiple options | Casual content, social media, storytelling |
| ElevenLabs Premium | Pass voice ID directly | Ultra-realistic, expressive narration |
ElevenLabs Auto-Discovery: You can pass any ElevenLabs voice ID directly as the `speaker` value. If the voice is not already in your library, it will be automatically discovered and added from the ElevenLabs catalog. No manual setup required. See the ElevenLabs Voices Guide for details.
Common Use Cases
- Educational Content
- Marketing and Sales
- Social Media Content
- Technical Tutorials
Best Practices
Select the Right Voice
Match voice characteristics to your content:
- Professional Content: Use Brian, Matthew, or Joanna
- Casual/Friendly: Use Emma, Amy, or Joey
- Educational: Choose clear voices like Emma or Brian
- Brand Consistency: Use the same voice across all your videos
- Test Options: Try different voices to find the best fit
Optimize Voice Speed
Adjust speed based on content type:
- Complex Content: Use 80-95 for technical or educational material
- Standard Content: Use 95-105 for most narration
- Social Media: Use 110-120 for energetic, engaging delivery
- Urgent Content: Use 120-150 for quick updates
- Test Pacing: Always preview to ensure speed feels natural
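The speed ranges in the list above can be kept in one place so that every video of a given type starts from the same value. This is a sketch only: the dictionary keys are illustrative labels, and picking the midpoint of each range is an arbitrary starting point, not a documented recommendation.

```python
# Ranges copied from the list above; the keys are illustrative labels.
SPEED_RANGES = {
    "complex": (80, 95),
    "standard": (95, 105),
    "social": (110, 120),
    "urgent": (120, 150),
}


def suggest_speed(content_type):
    """Return the midpoint of the listed range as a starting value to preview."""
    low, high = SPEED_RANGES[content_type]
    return (low + high) // 2
```

The returned value is meant as a first guess to preview and adjust, in line with the advice to test pacing.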
Use Amplification Wisely
Apply volume strategically:
- Subtle Emphasis: Use 0.1-0.3 for gentle highlighting
- Normal Content: Keep at 0 for most narration
- Strong Emphasis: Use 0.4-0.6 for key messages
- Avoid Extremes: Don’t exceed 0.7 to prevent distortion
- Consistency: Maintain similar levels across similar content
Write for Voice-Over
Prepare text that sounds natural when spoken:
- Conversational Tone: Write as you would speak, not formal writing
- Short Sentences: Keep sentences concise for better pacing
- Clear Pronunciation: Spell out acronyms and difficult terms
- Proper Punctuation: Use periods and commas for natural pauses
- Test by Reading: Read your text aloud before converting
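Spelling out acronyms can be automated with a small preprocessing step. The sketch below assumes you maintain your own mapping of acronyms to spoken forms. The function name `prepare_for_narration` and the sample mapping are hypothetical, and how any particular voice pronounces the result still needs to be checked by listening.

```python
import re


def prepare_for_narration(text, expansions):
    """Spell out known acronyms and tidy spacing before text is sent for narration."""
    for acronym, spoken in expansions.items():
        # \b keeps "API" from matching inside a longer word such as "APIs".
        pattern = rf"\b{re.escape(acronym)}\b"
        # A function replacement avoids backslash interpretation in `spoken`.
        text = re.sub(pattern, lambda _match, s=spoken: s, text)
    # Collapse runs of spaces and tabs, leaving line breaks intact.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = "Our  API uses SSO."
    print(prepare_for_narration(sample, {"API": "A P I", "SSO": "single sign-on"}))
```

Line breaks are left untouched so that scene splitting on new lines still behaves as intended.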
Plan Scene Breaks
Structure scenes for effective narration:
- Logical Breaks: Split at natural pauses in your content
- Scene Length: Aim for 5-15 seconds per scene (roughly 12-40 words at a typical narration pace of about 150 words per minute)
- Visual Match: Ensure each scene’s visuals match the narration
- Pacing: Use scene breaks to control video rhythm
- Test Flow: Review to ensure smooth transitions
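Scene length can be estimated from word count before rendering. The sketch below assumes a narration pace of 150 words per minute at speed 100, which is a common rule of thumb and not a documented property of any Pictory voice. Under that assumption, 5-15 seconds corresponds to roughly 12-38 words. Function names are illustrative.

```python
def estimate_scene_seconds(text, speed=100, words_per_minute=150):
    """Estimate narration time for one scene.

    Assumption: 150 words per minute at speed 100; real voices will vary.
    """
    words = len(text.split())
    return words * 60 / (words_per_minute * speed / 100)


def scenes_outside_target(scenes, min_seconds=5, max_seconds=15, **kwargs):
    """Return (index, seconds) for scenes shorter or longer than the target range."""
    flagged = []
    for index, text in enumerate(scenes):
        seconds = estimate_scene_seconds(text, **kwargs)
        if not min_seconds <= seconds <= max_seconds:
            flagged.append((index, round(seconds, 1)))
    return flagged
```

Flagged scenes are candidates for merging or splitting. The estimate is only a planning aid, so review the rendered video for the actual timing.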
Troubleshooting
Voice sounds robotic or unnatural
Problem: The AI narration does not sound natural.
Solution:
- Try a different AI voice speaker
- Adjust speed to 95-105 for more natural cadence
- Ensure your text has proper punctuation
- Avoid using all caps or unusual formatting
- Write in a conversational tone, not formal writing
- Use the Get Voiceover Tracks API to explore voice options
Narration is too fast or too slow
Problem: Voice-over pacing does not match your expectations.
Solution:
- Adjust the `speed` parameter:
  - Decrease to 80-90 for slower narration
  - Increase to 110-120 for faster delivery
- Test different speeds to find the sweet spot
- Consider your audience - educational content needs slower pacing
- Very complex content may need speeds as low as 75
Voice-Over Does Not Sync with Scenes
Problem: Narration timing seems off compared to visuals.
Solution:
- The API automatically syncs voice to video duration
- Adjust scene breaks (`createSceneOnNewLine`, `createSceneOnEndOfSentence`)
- Keep scenes to reasonable lengths (5-15 seconds each)
- If using manual scenes, ensure text length matches desired scene duration
- Review the completed video - timing may feel different than expected
Audio quality is poor or distorted
Problem: Voice-over sounds muffled, distorted, or has audio artifacts.
Solution:
- Reduce `amplificationLevel` to 0 or below
- Avoid levels above 0.7, which can cause clipping
- Try a different AI voice - some have better audio quality
- Check for unusual characters or formatting in source text
- Ensure text does not have excessive special characters
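One way to check source text for unusual characters is to scan it by Unicode category. The sketch below is an illustration only: which characters actually cause narration problems is not documented here, so the set of flagged categories (control, format, private-use, and symbol characters) is an assumption, and the function name is hypothetical.

```python
import unicodedata


def find_unusual_characters(text):
    """Return (position, character, Unicode name) for characters worth reviewing."""
    findings = []
    for position, char in enumerate(text):
        category = unicodedata.category(char)
        # Cc = control, Cf = format (e.g. zero-width space),
        # Co = private use, So = other symbols such as emoji.
        if category in {"Cc", "Cf", "Co", "So"} and char not in "\n\t":
            findings.append((position, char, unicodedata.name(char, "UNNAMED")))
    return findings
```

Ordinary line breaks and tabs are skipped, so text prepared for line-based scene splitting does not produce false positives.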
Can't find the right voice
Problem: None of the voices seem right for your content.
Solution:
- Use the Get Voiceover Tracks API to see all options
- Test multiple voices with the same content
- Consider the voice characteristics table above for guidance
- Try adjusting speed and amplification with different voices
- Some voices work better for specific content types
Next Steps
Enhance your voice-over videos with these features:
- Multi-Level Voice-Over: Use different voices or settings for different scenes
- Background Music: Add music to complement voice-over narration
- Custom Captions: Add translated or custom subtitles to your videos
- Basic Text to Video: Create videos without voice-over
API Reference
For complete technical details, see:
- Get Voiceover Tracks: List all available AI voices
- Render Storyboard Video: Direct video rendering with voice-over
- Create Storyboard Preview: Create preview before rendering
- Get Storyboard Preview Job: Monitor storyboard creation progress
