Text to Video with AI Voice-Over

This guide shows you how to create videos from text with professional AI-generated voice-over narration. Transform written content into narrated videos perfect for social media, YouTube, educational content, or marketing materials.

What You’ll Learn

AI Voice-Over

Add natural-sounding narration to your videos

Multiple Voices

Choose from various AI voice speakers

Auto-Sync

Voice automatically syncs with video scenes

Voice Customization

Control speed and volume for perfect delivery

Before You Begin

Make sure you have:

A Pictory API key
Node.js or Python installed on your machine
Text content ready for video conversion
Basic understanding of voice-over concepts

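Install axios, the HTTP client used in the example below:

npm install axios

The example uses ES module syntax (import), so save it with a .mjs extension or set "type": "module" in your package.json.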

How Text-to-Video with Voice-Over Works

When you create a video with AI voice-over:

Text Processing - Your text content is analyzed and prepared
Scene Generation - Text is split into logical video scenes
Visual Selection - AI selects appropriate stock visuals for each scene
Voice Generation - Professional AI narration is created from your text
Synchronization - Voice-over is automatically synchronized with video timing
Caption Creation - Subtitles are generated to match the narration
Video Rendering - Final video is assembled with all elements combined

The AI voice-over is automatically synchronized with your video scenes. The narration timing adjusts based on text length, voice speed, and scene duration for natural pacing.

Complete Example

import axios from "axios";

const API_BASE_URL = "https://api.pictory.ai/pictoryapis";
const API_KEY = "YOUR_API_KEY";

const SAMPLE_TEXT =
  "AI is poised to significantly impact educators and course creators on social media. " +
  "By automating tasks like content generation, visual design, and video editing, " +
  "AI will save time and enhance consistency.";

async function createTextToVideoWithVoiceOver() {
  try {
    console.log("Creating video with AI voice-over...");

    const response = await axios.post(
      `${API_BASE_URL}/v2/video/storyboard/render`,
      {
        videoName: "text_to_video_with_ai_voice",

        // Voice-over configuration
        voiceOver: {
          enabled: true,                    // Enable voice-over
          aiVoices: [
            {
              speaker: "Brian",              // AI voice name
              speed: 100,                    // Normal speed (50-200)
              amplificationLevel: 0,         // Normal volume (-1 to 1)
            },
          ],
        },

        // Scene configuration
        scenes: [
          {
            story: SAMPLE_TEXT,
            createSceneOnNewLine: true,      // New scene per line
            createSceneOnEndOfSentence: true, // New scene per sentence
          },
        ],
      },
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: API_KEY,
        },
      }
    );

    const jobId = response.data.data.jobId;
    console.log("✓ Video creation started!");
    console.log("Job ID:", jobId);

    // Monitor progress
    console.log("\nMonitoring video creation...");
    let jobCompleted = false;
    let jobResult = null;

    while (!jobCompleted) {
      const statusResponse = await axios.get(
        `${API_BASE_URL}/v1/jobs/${jobId}`,
        {
          headers: { Authorization: API_KEY },
        }
      );

      const status = statusResponse.data.data.status;
      console.log("Status:", status);

      if (status === "completed") {
        jobCompleted = true;
        jobResult = statusResponse.data;
        console.log("\n✓ Video with AI voice-over is ready!");
        console.log("Video URL:", jobResult.data.videoURL);
      } else if (status === "failed") {
        throw new Error("Video creation failed: " + JSON.stringify(statusResponse.data));
      } else {
        // Still in progress: wait 5 seconds before polling again
        await new Promise((resolve) => setTimeout(resolve, 5000));
      }
    }

    return jobResult;
  } catch (error) {
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

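// The function already logs the error before rethrowing, so only set a failing exit code here
createTextToVideoWithVoiceOver().catch(() => {
  process.exitCode = 1;
});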
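
The polling loop in the example keeps running until the job reports "completed" or "failed". The following is a minimal sketch of a bounded variant that gives up after a fixed number of status checks. It assumes the same status values shown above; the helper name waitForJob, its options, and the injected fetchStatus callback are illustrative and not part of the Pictory API.

```javascript
// Hypothetical helper: poll a job until it finishes or a maximum number of checks is reached.
// fetchStatus is any async function that resolves to the job's `data` object
// (for example, a wrapper around the GET /v1/jobs/{jobId} call used above).
async function waitForJob(fetchStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();

    if (job.status === "completed") {
      return job;
    }
    if (job.status === "failed") {
      throw new Error("Video creation failed: " + JSON.stringify(job));
    }

    // Not finished yet: pause before the next check (no pause after the last one)
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job did not finish after ${maxAttempts} status checks`);
}

// Example wiring with the axios call from the complete example (not executed here):
// const job = await waitForJob(async () => {
//   const res = await axios.get(`${API_BASE_URL}/v1/jobs/${jobId}`, {
//     headers: { Authorization: API_KEY },
//   });
//   return res.data.data;
// });
// console.log("Video URL:", job.videoURL);
```

Passing the status call in as a function keeps the retry logic independent of the HTTP client, which also makes it easy to exercise with a fake status source.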

Understanding the Parameters

Voice-Over Configuration

Parameter	Type	Required	Description
`voiceOver.enabled`	boolean	Yes	Set to `true` to enable voice-over narration
`voiceOver.aiVoices`	array	Yes	Array of AI voice configurations (currently supports one voice)
`voiceOver.aiVoices[].speaker`	string	Yes	Name of the AI voice (e.g., “Brian”, “Emma”)
`voiceOver.aiVoices[].speed`	number	No	Voice speed 50-200 (default: 100 = normal)
`voiceOver.aiVoices[].amplificationLevel`	number	No	Volume level -1 to 1 (default: 0 = normal)
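
Checking a configuration against the table before sending the request can surface mistakes early. The sketch below is a hypothetical client-side check, not something the API provides. The ranges and defaults are taken from the table above, and the one-voice limit reflects the "currently supports one voice" note.

```javascript
// Hypothetical client-side check; ranges and defaults come from the parameter table above.
function validateVoiceOver(voiceOver) {
  const errors = [];

  if (!voiceOver || typeof voiceOver.enabled !== "boolean") {
    errors.push("voiceOver.enabled must be a boolean");
  }

  const voices = voiceOver && voiceOver.aiVoices;
  if (!Array.isArray(voices) || voices.length !== 1) {
    errors.push("voiceOver.aiVoices must be an array with exactly one voice");
    return errors;
  }

  // speed and amplificationLevel are optional, so fall back to the documented defaults
  const { speaker, speed = 100, amplificationLevel = 0 } = voices[0] ?? {};
  if (typeof speaker !== "string" || speaker.trim() === "") {
    errors.push("aiVoices[0].speaker must be a non-empty string");
  }
  if (!Number.isFinite(speed) || speed < 50 || speed > 200) {
    errors.push("aiVoices[0].speed must be a number from 50 to 200");
  }
  if (!Number.isFinite(amplificationLevel) || amplificationLevel < -1 || amplificationLevel > 1) {
    errors.push("aiVoices[0].amplificationLevel must be a number from -1 to 1");
  }
  return errors;
}

// Reports the out-of-range speed; an empty array means the configuration passed
console.log(validateVoiceOver({ enabled: true, aiVoices: [{ speaker: "Brian", speed: 250 }] }));
```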

Scene Configuration

Parameter	Type	Default	Description
`scenes[].story`	string	-	Text content to be narrated
`scenes[].createSceneOnNewLine`	boolean	false	Create new scene at line breaks
`scenes[].createSceneOnEndOfSentence`	boolean	false	Create new scene at sentence endings
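
To get a feel for how the two flags interact, it can help to preview the split locally before rendering. The sketch below is only a rough approximation written for this guide: the function name previewSceneSplit and its splitting rules are assumptions, and the API's actual scene detection may differ.

```javascript
// Rough local preview of how a story might be split into scenes.
// This only approximates the two flags; the API's real splitting rules may differ.
function previewSceneSplit(
  story,
  { createSceneOnNewLine = false, createSceneOnEndOfSentence = false } = {}
) {
  let parts = [story];

  if (createSceneOnNewLine) {
    // Break at one or more line breaks
    parts = parts.flatMap((part) => part.split(/\r?\n+/));
  }
  if (createSceneOnEndOfSentence) {
    // Break after ., ! or ? when followed by whitespace
    parts = parts.flatMap((part) => part.split(/(?<=[.!?])\s+/));
  }

  // Drop surrounding whitespace and empty fragments
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

const story = "AI saves time. It also improves consistency.\nTry it on your next video!";

// Line breaks only: two scenes
console.log(previewSceneSplit(story, { createSceneOnNewLine: true }));

// Line breaks and sentence endings: three scenes
console.log(previewSceneSplit(story, { createSceneOnNewLine: true, createSceneOnEndOfSentence: true }));
```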

Voice Speed Reference

Speed Value	Playback Rate	Best Used For
50	0.5x (Very slow)	Complex technical content, learning materials
75	0.75x (Slower)	Detailed explanations, emphasis
90	0.9x (Slightly slower)	Professional presentations, important information
100	1.0x (Normal)	Standard content, most use cases
110-120	1.1-1.2x (Slightly faster)	Casual content, social media
150	1.5x (Fast)	Quick summaries, energetic content
200	2.0x (Very fast)	Speed reading, urgent updates
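
The table implies a linear mapping, where the playback rate is the speed value divided by 100. The sketch below applies that relationship to estimate how long a narration lasts at a different speed. The function names are illustrative, and the linear scaling is inferred from the table rather than confirmed by the API reference.

```javascript
// Convert a speed value (50-200) to the playback rate shown in the table: 100 -> 1.0x
function playbackRate(speed) {
  if (!Number.isFinite(speed) || speed < 50 || speed > 200) {
    throw new RangeError("speed must be between 50 and 200");
  }
  return speed / 100;
}

// Estimate how long narration lasts at a given speed,
// based on its measured (or assumed) duration at normal speed.
function durationAtSpeed(secondsAtNormalSpeed, speed) {
  return secondsAtNormalSpeed / playbackRate(speed);
}

console.log(durationAtSpeed(12, 150)); // 12 s at 1.0x takes 8 s at 1.5x
console.log(durationAtSpeed(12, 75));  // 12 s at 1.0x takes 16 s at 0.75x
```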

Amplification Level Reference

Level	Effect	Best Used For
-1.0	Quietest	Background narration, ambient voice
-0.5	Quieter than normal	De-emphasized content
0	Normal volume	Standard narration, most content
0.3	Slightly louder	Mild emphasis, important points
0.5	Moderately louder	Strong emphasis
1.0	Loudest	Maximum emphasis, calls-to-action

Volume Considerations: Amplification levels of 0.7 to 1.0 may cause audio distortion. Test your settings and prefer moderate values for professional results.
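
One way to act on this note is to cap the value in code before it reaches the request. The sketch below is a hypothetical guard: the 0.7 ceiling comes from this guide's distortion warning, not from an API constant, and the helper name safeAmplification is made up for illustration.

```javascript
// Ceiling taken from the distortion note in this guide; it is not an API constant
const DISTORTION_THRESHOLD = 0.7;

// Hypothetical guard: keep amplificationLevel at or above the API minimum (-1)
// and at or below the level where distortion becomes a risk.
function safeAmplification(level) {
  if (!Number.isFinite(level)) {
    throw new TypeError("amplificationLevel must be a finite number");
  }
  const clamped = Math.min(Math.max(level, -1), DISTORTION_THRESHOLD);
  if (clamped !== level) {
    console.warn(`amplificationLevel ${level} adjusted to ${clamped}`);
  }
  return clamped;
}

console.log(safeAmplification(0.2)); // within range, returned unchanged
console.log(safeAmplification(1));   // capped at the distortion threshold
```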

Choosing Your AI Voice

The Pictory API supports various AI voice speakers with different characteristics:

Voice Type	Example Names	Best Used For
Male Professional	Brian, Matthew, Joey	Business content, technical tutorials, corporate videos
Female Professional	Emma, Joanna, Amy	Educational content, friendly tutorials, customer service
Conversational	Multiple options	Casual content, social media, storytelling

To get a complete list of available voices programmatically, use the Get Voiceover Tracks API endpoint. This returns all available AI voices with their names, languages, and characteristics.

Common Use Cases

Educational Content

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Emma",        // Clear female voice
      speed: 95,              // Slightly slower for comprehension
      amplificationLevel: 0
    }]
  }
}

Result: Clear, paced narration perfect for learning materials.

Marketing and Sales

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Matthew",     // Professional male voice
      speed: 110,             // Slightly faster for energy
      amplificationLevel: 0.2 // Louder for emphasis
    }]
  }
}

Result: Dynamic, engaging narration for promotional content.

Social Media Content

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Amy",         // Friendly female voice
      speed: 115,             // Faster pace for social
      amplificationLevel: 0.1
    }]
  }
}

Result: Energetic narration for short-form social videos.

Technical Tutorials

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Brian",       // Professional male voice
      speed: 90,              // Slower for technical content
      amplificationLevel: 0
    }]
  }
}

Result: Measured, clear narration for complex topics.
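
The voice settings above can be kept in one place and reused across requests. The sketch below collects them into presets and builds a request body in the same shape as the complete example. The names VOICE_PRESETS and buildRenderRequest are illustrative, and the scene flags simply mirror the complete example.

```javascript
// Voice presets collected from the use cases above
const VOICE_PRESETS = {
  educational: { speaker: "Emma", speed: 95, amplificationLevel: 0 },
  marketing: { speaker: "Matthew", speed: 110, amplificationLevel: 0.2 },
  socialMedia: { speaker: "Amy", speed: 115, amplificationLevel: 0.1 },
  technical: { speaker: "Brian", speed: 90, amplificationLevel: 0 },
};

// Hypothetical helper: build a render request body for a named preset
function buildRenderRequest(videoName, story, presetName) {
  if (!Object.prototype.hasOwnProperty.call(VOICE_PRESETS, presetName)) {
    throw new Error(`Unknown preset: ${presetName}`);
  }
  return {
    videoName,
    voiceOver: {
      enabled: true,
      aiVoices: [{ ...VOICE_PRESETS[presetName] }], // copy so callers cannot mutate the preset
    },
    scenes: [
      { story, createSceneOnNewLine: true, createSceneOnEndOfSentence: true },
    ],
  };
}

const body = buildRenderRequest("intro_lesson", "Welcome to the course.", "educational");
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be passed as the request body of the axios.post call shown in the complete example.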

Best Practices

Select the Right Voice

Match voice characteristics to your content:

Professional Content: Use Brian, Matthew, or Joanna
Casual/Friendly: Use Emma, Amy, or Joey
Educational: Choose clear voices like Emma or Brian
Brand Consistency: Use the same voice across all your videos
Test Options: Try different voices to find the best fit

Optimize Voice Speed

Adjust speed based on content type:

Complex Content: Use 80-95 for technical or educational material
Standard Content: Use 95-105 for most narration
Social Media: Use 110-120 for energetic, engaging delivery
Urgent Content: Use 120-150 for quick updates
Test Pacing: Always preview to ensure speed feels natural

Use Amplification Wisely

Apply volume strategically:

Subtle Emphasis: Use 0.1-0.3 for gentle highlighting
Normal Content: Keep at 0 for most narration
Strong Emphasis: Use 0.4-0.6 for key messages
Avoid Extremes: Don’t exceed 0.7 to prevent distortion
Consistency: Maintain similar levels across similar content

Write for Voice-Over

Prepare text that sounds natural when spoken:

Conversational Tone: Write as you would speak, not formal writing
Short Sentences: Keep sentences concise for better pacing
Clear Pronunciation: Spell out acronyms and difficult terms
Proper Punctuation: Use periods and commas for natural pauses
Test by Reading: Read your text aloud before converting

Plan Scene Breaks

Structure scenes for effective narration:

Logical Breaks: Split at natural pauses in your content
Scene Length: Aim for 5-15 seconds per scene (roughly 12-40 words at a typical speaking pace of about 150 words per minute)
Visual Match: Ensure each scene’s visuals match the narration
Pacing: Use scene breaks to control video rhythm
Test Flow: Review to ensure smooth transitions
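
The scene-length guideline can be checked before rendering with a rough word-count estimate. The sketch below assumes a baseline of 150 words per minute at speed 100, which is a common figure for narration and not a value published by the API, so treat the results as approximate. The function names are illustrative.

```javascript
// Assumed baseline speaking rate at speed 100; measure your chosen voice for a better figure
const ASSUMED_WORDS_PER_MINUTE = 150;

// Estimate narration time for a scene's text at a given speed value (50-200)
function estimateSceneSeconds(text, speed = 100) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  const wordsPerSecond = (ASSUMED_WORDS_PER_MINUTE * (speed / 100)) / 60;
  return words / wordsPerSecond;
}

// Flag scenes that fall outside the 5-15 second guideline
function checkSceneLength(text, speed = 100) {
  const seconds = estimateSceneSeconds(text, speed);
  if (seconds < 5) return { seconds, verdict: "short" };
  if (seconds > 15) return { seconds, verdict: "long" };
  return { seconds, verdict: "ok" };
}

const sceneText =
  "By automating tasks like content generation, visual design, and video editing, " +
  "AI will save time and enhance consistency.";
console.log(checkSceneLength(sceneText));      // estimate at normal speed
console.log(checkSceneLength(sceneText, 120)); // same text at a faster speed
```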

Troubleshooting

Voice sounds robotic or unnatural

Problem: The AI narration does not sound natural.

Solution:

Try a different AI voice speaker
Adjust speed to 95-105 for more natural cadence
Ensure your text has proper punctuation
Avoid using all caps or unusual formatting
Write in a conversational tone, not formal writing
Use the Get Voiceover Tracks API to explore voice options

Narration is too fast or too slow

Problem: Voice-over pacing does not match your expectations.

Solution:

Adjust the speed parameter:
- Decrease to 80-90 for slower narration
- Increase to 110-120 for faster delivery
Test different speeds to find the sweet spot
Consider your audience - educational content needs slower pacing
Very complex content may need speeds as low as 75

Voice-over does not sync with scenes

Problem: Narration timing seems off compared to visuals.

Solution:

The API automatically syncs voice to video duration
Adjust scene breaks (createSceneOnNewLine, createSceneOnEndOfSentence)
Keep scenes to reasonable lengths (5-15 seconds each)
If using manual scenes, ensure text length matches desired scene duration
Review the completed video - timing may feel different than expected

Audio quality is poor or distorted

Problem: Voice-over sounds muffled, distorted, or has audio artifacts.

Solution:

Reduce amplificationLevel to 0 or below
Avoid levels above 0.7 which can cause clipping
Try a different AI voice - some have better audio quality
Check for unusual characters or formatting in source text
Ensure text does not have excessive special characters

Can't find the right voice

Problem: None of the voices seem right for your content.

Solution:

Use the Get Voiceover Tracks API to see all options
Test multiple voices with the same content
Consider the voice characteristics table above for guidance
Try adjusting speed and amplification with different voices
Some voices work better for specific content types

Next Steps

Enhance your voice-over videos with these features:

Multi-Level Voice-Over

Use different voices or settings for different scenes

Background Music

Add music to complement voice-over narration

Custom Captions

Add translated or custom subtitles to your videos

Basic Text to Video

Create videos without voice-over

API Reference

For complete technical details, see:

Get Voiceover Tracks

List all available AI voices

Render Storyboard Video

Direct video rendering with voice-over

Create Storyboard Preview

Create preview before rendering

Get Job Status

Monitor video creation progress
