This guide shows you how to create videos from text with professional AI-generated voice-over narration. Transform written content into narrated videos perfect for social media, YouTube, educational content, or marketing materials.

What You’ll Learn

  • AI Voice-Over: Add natural-sounding narration to your videos
  • Multiple Voices: Choose from various AI voice speakers
  • Auto-Sync: Voice automatically syncs with video scenes
  • Voice Customization: Control speed and volume for perfect delivery

Before You Begin

Make sure you have:
  • A Pictory API key
  • Node.js or Python installed on your machine
  • Text content ready for video conversion
  • Basic understanding of voice-over concepts
For the Node.js example below, install the axios HTTP client:
npm install axios

How Text-to-Video with Voice-Over Works

When you create a video with AI voice-over:
  1. Text Processing - Your text content is analyzed and prepared
  2. Scene Generation - Text is split into logical video scenes
  3. Visual Selection - AI selects appropriate stock visuals for each scene
  4. Voice Generation - Professional AI narration is created from your text
  5. Synchronization - Voice-over is automatically synchronized with video timing
  6. Caption Creation - Subtitles are generated to match the narration
  7. Video Rendering - Final video is assembled with all elements combined
The AI voice-over is automatically synchronized with your video scenes. The narration timing adjusts based on text length, voice speed, and scene duration for natural pacing.

Complete Example

import axios from "axios";

const API_BASE_URL = "https://api.pictory.ai/pictoryapis";
const API_KEY = "YOUR_API_KEY";

const SAMPLE_TEXT =
  "AI is poised to significantly impact educators and course creators on social media. " +
  "By automating tasks like content generation, visual design, and video editing, " +
  "AI will save time and enhance consistency.";

async function createTextToVideoWithVoiceOver() {
  try {
    console.log("Creating video with AI voice-over...");

    const response = await axios.post(
      `${API_BASE_URL}/v2/video/storyboard/render`,
      {
        videoName: "text_to_video_with_ai_voice",

        // Voice-over configuration
        voiceOver: {
          enabled: true,                    // Enable voice-over
          aiVoices: [
            {
              speaker: "Brian",              // AI voice name
              speed: 100,                    // Normal speed (50-200)
              amplificationLevel: 0,         // Normal volume (-1 to 1)
            },
          ],
        },

        // Scene configuration
        scenes: [
          {
            story: SAMPLE_TEXT,
            createSceneOnNewLine: true,      // New scene per line
            createSceneOnEndOfSentence: true, // New scene per sentence
          },
        ],
      },
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: API_KEY,
        },
      }
    );

    const jobId = response.data.data.jobId;
    console.log("✓ Video creation started!");
    console.log("Job ID:", jobId);

    // Monitor progress
    console.log("\nMonitoring video creation...");
    let jobCompleted = false;
    let jobResult = null;

    while (!jobCompleted) {
      const statusResponse = await axios.get(
        `${API_BASE_URL}/v1/jobs/${jobId}`,
        {
          headers: { Authorization: API_KEY },
        }
      );

      const status = statusResponse.data.data.status;
      console.log("Status:", status);

      if (status === "completed") {
        jobCompleted = true;
        jobResult = statusResponse.data;
        console.log("\n✓ Video with AI voice-over is ready!");
        console.log("Video URL:", jobResult.data.videoURL);
      } else if (status === "failed") {
        throw new Error("Video creation failed: " + JSON.stringify(statusResponse.data));
      } else {
        // Still in progress: wait 5 seconds before polling again
        await new Promise((resolve) => setTimeout(resolve, 5000));
      }
    }

    return jobResult;
  } catch (error) {
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

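createTextToVideoWithVoiceOver().catch(() => {
  // The error was already logged inside the function; exit with a failure code.
  process.exitCode = 1;
});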
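
The polling loop in the example keeps running until the job reports "completed" or "failed". A minimal sketch of the same loop with an upper time limit is shown below, so that a job which never reaches a final status cannot block the script forever. The names `pollUntilDone`, `fetchStatus`, `intervalMs`, and `timeoutMs` are hypothetical and not part of the Pictory API; only the two final status strings come from the example above.

```javascript
// Hypothetical helper: polls a status function until the job finishes or a deadline passes.
// fetchStatus is any async function that resolves to a status string such as
// "completed" or "failed" (the two final statuses used in the example above).
async function pollUntilDone(fetchStatus, { intervalMs = 5000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const status = await fetchStatus();
    if (status === "completed") return status;
    if (status === "failed") throw new Error("Job failed");

    // Give up if the next wait would go past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job did not finish within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the complete example, `fetchStatus` could wrap the same GET request to the jobs endpoint and return `statusResponse.data.data.status`.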

Understanding the Parameters

Voice-Over Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voiceOver.enabled | boolean | Yes | Set to true to enable voice-over narration |
| voiceOver.aiVoices | array | Yes | Array of AI voice configurations (currently supports one voice) |
| voiceOver.aiVoices[].speaker | string | Yes | Name of the AI voice (e.g., “Brian”, “Emma”) |
| voiceOver.aiVoices[].speed | number | No | Voice speed 50-200 (default: 100 = normal) |
| voiceOver.aiVoices[].amplificationLevel | number | No | Volume level -1 to 1 (default: 0 = normal) |
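
The ranges in the table can be checked locally before a request is sent. The following is a minimal sketch under that assumption; `validateVoiceOver` is a hypothetical helper, not part of the Pictory API, and it only reports problems it can detect from the documented types and ranges.

```javascript
// Hypothetical client-side check based on the types and ranges in the table above.
// It does not call the API; it only returns a list of problems found locally.
function validateVoiceOver(voiceOver) {
  const errors = [];

  if (typeof voiceOver?.enabled !== "boolean") {
    errors.push("voiceOver.enabled must be a boolean");
  }
  if (!Array.isArray(voiceOver?.aiVoices) || voiceOver.aiVoices.length !== 1) {
    errors.push("voiceOver.aiVoices must be an array with exactly one voice");
    return errors;
  }

  // Optional fields fall back to the documented defaults.
  const { speaker, speed = 100, amplificationLevel = 0 } = voiceOver.aiVoices[0] ?? {};
  if (typeof speaker !== "string" || speaker.trim() === "") {
    errors.push("aiVoices[0].speaker must be a non-empty string");
  }
  if (!Number.isFinite(speed) || speed < 50 || speed > 200) {
    errors.push("aiVoices[0].speed must be a number from 50 to 200");
  }
  if (!Number.isFinite(amplificationLevel) || amplificationLevel < -1 || amplificationLevel > 1) {
    errors.push("aiVoices[0].amplificationLevel must be a number from -1 to 1");
  }
  return errors;
}

const problems = validateVoiceOver({
  enabled: true,
  aiVoices: [{ speaker: "Brian", speed: 250 }],
});
console.log(problems); // one entry, reporting the out-of-range speed
```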

Scene Configuration

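| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scenes[].story | string | - | Text content to be narrated |
| scenes[].createSceneOnNewLine | boolean | false | Create a new scene at each line break |
| scenes[].createSceneOnEndOfSentence | boolean | false | Create a new scene at the end of each sentence |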
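
To get a feel for how the two flags affect scene count before rendering, the split can be approximated locally. This is only a rough sketch: the API's actual splitting rules are not described in this guide, so `previewScenes` (a hypothetical helper) assumes that a line break is a newline character and that a sentence ends at ".", "!" or "?" followed by whitespace.

```javascript
// Rough local preview only; the API's actual splitting rules may differ.
function previewScenes(
  story,
  { createSceneOnNewLine = false, createSceneOnEndOfSentence = false } = {}
) {
  let parts = [story];

  if (createSceneOnNewLine) {
    parts = parts.flatMap((part) => part.split(/\r?\n+/));
  }
  if (createSceneOnEndOfSentence) {
    // Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
    parts = parts.flatMap((part) => part.split(/(?<=[.!?])\s+/));
  }
  // Drop empty fragments left over from repeated separators.
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

const scenes = previewScenes(
  "AI saves time. It also improves consistency.\nTry it on your next video!",
  { createSceneOnNewLine: true, createSceneOnEndOfSentence: true }
);
console.log(scenes.length); // 3
```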

Voice Speed Reference

| Speed Value | Playback Rate | Best Used For |
| --- | --- | --- |
| 50 | 0.5x (Very slow) | Complex technical content, learning materials |
| 75 | 0.75x (Slower) | Detailed explanations, emphasis |
| 90 | 0.9x (Slightly slower) | Professional presentations, important information |
| 100 | 1.0x (Normal) | Standard content, most use cases |
| 110-120 | 1.1-1.2x (Slightly faster) | Casual content, social media |
| 150 | 1.5x (Fast) | Quick summaries, energetic content |
| 200 | 2.0x (Very fast) | Speed reading, urgent updates |
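
Because the speed value maps directly to a playback rate (speed divided by 100), narration length can be estimated before rendering. The sketch below assumes a baseline of about 150 words per minute at speed 100; that figure is an assumption for illustration, not a documented property of any Pictory voice, and `estimateNarrationSeconds` is a hypothetical helper.

```javascript
// Assumption: about 150 words per minute at speed 100. Measure your chosen voice
// and pass its real rate as baseWordsPerMinute for better estimates.
function estimateNarrationSeconds(text, speed = 100, baseWordsPerMinute = 150) {
  if (speed < 50 || speed > 200) {
    throw new RangeError("speed must be between 50 and 200");
  }
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  const playbackRate = speed / 100; // e.g. 150 -> 1.5x, as in the table above
  return (words * 60) / (baseWordsPerMinute * playbackRate);
}

const script = "word ".repeat(75); // 75 placeholder words
console.log(estimateNarrationSeconds(script, 100)); // 30
console.log(estimateNarrationSeconds(script, 150)); // 20
```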

Amplification Level Reference

| Level | Effect | Best Used For |
| --- | --- | --- |
| -1.0 | Quietest | Background narration, ambient voice |
| -0.5 | Quieter than normal | De-emphasized content |
| 0 | Normal volume | Standard narration, most content |
| 0.3 | Slightly louder | Mild emphasis, important points |
| 0.5 | Moderately louder | Strong emphasis |
| 1.0 | Loudest | Maximum emphasis, calls-to-action |
Volume Considerations: Very high amplification levels (0.7-1.0) may cause audio distortion. Test your settings and use moderation for professional results.
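
One way to apply this advice in code is to clamp the value before building the request. The sketch below keeps the level within the documented -1 to 1 range and, by default, no higher than 0.7; `safeAmplification` and its `maxLevel` parameter are hypothetical names used for illustration.

```javascript
// Hypothetical helper: keeps amplificationLevel inside the documented -1 to 1 range
// and, by default, at or below the 0.7 level where distortion may start.
function safeAmplification(level, maxLevel = 0.7) {
  if (!Number.isFinite(level)) {
    throw new TypeError("amplificationLevel must be a finite number");
  }
  const upper = Math.min(maxLevel, 1); // never allow more than the documented maximum
  return Math.max(-1, Math.min(level, upper));
}

console.log(safeAmplification(0.9)); // 0.7
console.log(safeAmplification(-2));  // -1
console.log(safeAmplification(0.3)); // 0.3
```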

Choosing Your AI Voice

The Pictory API supports various AI voice speakers with different characteristics:
| Voice Type | Example Names | Best Used For |
| --- | --- | --- |
| Male Professional | Brian, Matthew, Joey | Business content, technical tutorials, corporate videos |
| Female Professional | Emma, Joanna, Amy | Educational content, friendly tutorials, customer service |
| Conversational | Multiple options | Casual content, social media, storytelling |
To get a complete list of available voices programmatically, use the Get Voiceover Tracks API endpoint. This returns all available AI voices with their names, languages, and characteristics.

Common Use Cases

Educational Content

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Emma",        // Clear female voice
      speed: 95,              // Slightly slower for comprehension
      amplificationLevel: 0
    }]
  }
}
Result: Clear, paced narration perfect for learning materials.

Marketing and Sales

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Matthew",     // Professional male voice
      speed: 110,             // Slightly faster for energy
      amplificationLevel: 0.2 // Louder for emphasis
    }]
  }
}
Result: Dynamic, engaging narration for promotional content.

Social Media Content

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Amy",         // Friendly female voice
      speed: 115,             // Faster pace for social
      amplificationLevel: 0.1
    }]
  }
}
Result: Energetic narration for short-form social videos.

Technical Tutorials

{
  voiceOver: {
    enabled: true,
    aiVoices: [{
      speaker: "Brian",       // Professional male voice
      speed: 90,              // Slower for technical content
      amplificationLevel: 0
    }]
  }
}
Result: Measured, clear narration for complex topics.
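
The four configurations above can be collected into a small preset lookup so that each video type uses consistent settings. This is a sketch only: the values are copied from the examples above, while the preset keys and the `buildVoiceOver` helper are hypothetical names.

```javascript
// Values copied from the four use-case examples above; the preset keys are hypothetical.
const VOICE_PRESETS = {
  educational: { speaker: "Emma", speed: 95, amplificationLevel: 0 },
  marketing: { speaker: "Matthew", speed: 110, amplificationLevel: 0.2 },
  social: { speaker: "Amy", speed: 115, amplificationLevel: 0.1 },
  technical: { speaker: "Brian", speed: 90, amplificationLevel: 0 },
};

// Returns a voiceOver object for the request body; overrides replace preset fields.
function buildVoiceOver(presetName, overrides = {}) {
  if (!Object.prototype.hasOwnProperty.call(VOICE_PRESETS, presetName)) {
    throw new Error(`Unknown preset: ${presetName}`);
  }
  return {
    enabled: true,
    aiVoices: [{ ...VOICE_PRESETS[presetName], ...overrides }],
  };
}

console.log(JSON.stringify(buildVoiceOver("social", { speed: 120 })));
```

Keeping presets in one place also supports the brand-consistency advice: every video of a given type is narrated with the same voice and pacing.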

Best Practices

Match voice characteristics to your content:
  • Professional Content: Use Brian, Matthew, or Joanna
  • Casual/Friendly: Use Emma, Amy, or Joey
  • Educational: Choose clear voices like Emma or Brian
  • Brand Consistency: Use the same voice across all your videos
  • Test Options: Try different voices to find the best fit
Adjust speed based on content type:
  • Complex Content: Use 80-95 for technical or educational material
  • Standard Content: Use 95-105 for most narration
  • Social Media: Use 110-120 for energetic, engaging delivery
  • Urgent Content: Use 120-150 for quick updates
  • Test Pacing: Always preview to ensure speed feels natural
Apply volume strategically:
  • Subtle Emphasis: Use 0.1-0.3 for gentle highlighting
  • Normal Content: Keep at 0 for most narration
  • Strong Emphasis: Use 0.4-0.6 for key messages
  • Avoid Extremes: Don’t exceed 0.7 to prevent distortion
  • Consistency: Maintain similar levels across similar content
Prepare text that sounds natural when spoken:
  • Conversational Tone: Write as you would speak, not formal writing
  • Short Sentences: Keep sentences concise for better pacing
  • Clear Pronunciation: Spell out acronyms and difficult terms
  • Proper Punctuation: Use periods and commas for natural pauses
  • Test by Reading: Read your text aloud before converting
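
Some of these text checks can be automated before a script is submitted. The sketch below flags long sentences, missing end punctuation, and acronyms that may need spelling out. `lintNarrationText` is a hypothetical helper, and its thresholds (25 words per sentence, two or more capital letters for an acronym) are assumptions, not API rules.

```javascript
// Hypothetical checks; the thresholds are assumptions, not API rules.
function lintNarrationText(text, maxWordsPerSentence = 25) {
  const warnings = [];
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter(Boolean);

  for (const sentence of sentences) {
    const wordCount = sentence.split(/\s+/).length;
    if (wordCount > maxWordsPerSentence) {
      warnings.push(`Long sentence (${wordCount} words): "${sentence.slice(0, 40)}..."`);
    }
    if (!/[.!?]$/.test(sentence)) {
      warnings.push(`Missing end punctuation: "${sentence.slice(0, 40)}"`);
    }
  }

  // Words made of two or more capital letters are treated as acronyms.
  const acronyms = [...new Set(text.match(/\b[A-Z]{2,}\b/g) ?? [])];
  if (acronyms.length > 0) {
    warnings.push(`Consider spelling out: ${acronyms.join(", ")}`);
  }
  return warnings;
}

console.log(lintNarrationText("The API returns JSON. Keep it short"));
// flags the missing final punctuation and the acronyms API and JSON
```
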
Structure scenes for effective narration:
  • Logical Breaks: Split at natural pauses in your content
  • Scene Length: Aim for 5-15 seconds per scene (roughly 12-40 words at a typical speaking pace of about 150 words per minute)
  • Visual Match: Ensure each scene’s visuals match the narration
  • Pacing: Use scene breaks to control video rhythm
  • Test Flow: Review to ensure smooth transitions

Troubleshooting

Problem: The AI narration doesn’t sound natural.
Solution:
  • Try a different AI voice speaker
  • Adjust speed to 95-105 for more natural cadence
  • Ensure your text has proper punctuation
  • Avoid using all caps or unusual formatting
  • Write in a conversational tone, not formal writing
  • Use the Get Voiceover Tracks API to explore voice options
Problem: Voice-over pacing doesn’t match your expectations.
Solution:
  • Adjust the speed parameter:
    • Decrease to 80-90 for slower narration
    • Increase to 110-120 for faster delivery
  • Test different speeds to find the sweet spot
  • Consider your audience - educational content needs slower pacing
  • Very complex content may need speeds as low as 75
Problem: Narration timing seems off compared to visuals.
Solution:
  • The API automatically syncs voice to video duration
  • Adjust scene breaks (createSceneOnNewLine, createSceneOnEndOfSentence)
  • Keep scenes to reasonable lengths (5-15 seconds each)
  • If using manual scenes, ensure text length matches desired scene duration
  • Review the completed video - timing may feel different than expected
Problem: Voice-over sounds muffled, distorted, or has audio artifacts.
Solution:
  • Reduce amplificationLevel to 0 or below
  • Avoid levels above 0.7 which can cause clipping
  • Try a different AI voice - some have better audio quality
  • Check for unusual characters or formatting in source text
  • Ensure text doesn’t have excessive special characters
Problem: None of the voices seem right for your content.
Solution:
  • Use the Get Voiceover Tracks API to see all options
  • Test multiple voices with the same content
  • Consider the voice characteristics table above for guidance
  • Try adjusting speed and amplification with different voices
  • Some voices work better for specific content types
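
Testing several voices and speeds with the same content is easier when the request bodies are generated rather than written by hand. The sketch below builds one body per speaker and speed combination, using the same fields as the complete example; `buildVoiceTestRequests` and the `voice_test_` naming scheme are hypothetical.

```javascript
// Hypothetical helper: builds one request body per speaker/speed combination so the
// same story can be rendered several times and the results compared.
function buildVoiceTestRequests(story, speakers, speeds) {
  const requests = [];
  for (const speaker of speakers) {
    for (const speed of speeds) {
      requests.push({
        videoName: `voice_test_${speaker}_${speed}`,
        voiceOver: {
          enabled: true,
          aiVoices: [{ speaker, speed, amplificationLevel: 0 }],
        },
        scenes: [{ story, createSceneOnEndOfSentence: true }],
      });
    }
  }
  return requests;
}

const requests = buildVoiceTestRequests("A short sample script.", ["Brian", "Emma"], [95, 105]);
console.log(requests.map((request) => request.videoName));
```

Each body can then be sent to the render endpoint shown in the complete example. A short sample script keeps the comparison quick.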
