This guide shows you how to create videos with sophisticated voice-over configurations. Learn to set a default voice for your entire video while customizing specific scenes with different voice settings, speeds, or amplification levels.

What You’ll Learn

  • Video-Level Default: Set a default voice for all scenes
  • Scene-Level Override: Customize voice for specific scenes
  • Voice Customization: Control speed and amplification per voice
  • Multiple Voices: Use different AI voices in one video

Before You Begin

Make sure you have:
  • A Pictory API key
  • Node.js or Python installed on your machine
  • Basic understanding of voice-over concepts

Install axios, the HTTP client used in the Node.js example:

npm install axios

How Multi-Level Voice-Over Works

Multi-level voice-over gives you granular control over narration:
  1. Video-Level Settings: Define default voice-over configuration for all scenes
  2. Scene-Level Overrides: Customize voice settings for specific scenes
  3. Automatic Fallback: Scenes without custom settings use the video-level default
  4. Flexible Control: Mix and match voices, speeds, and volumes throughout your video
Scene-level voice-over settings always override video-level settings for that specific scene. This allows precise control while maintaining consistency across your video.
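
The fallback rule can be modelled locally as a small lookup, which is handy for checking a request body before sending it. This is a minimal sketch: `resolveVoiceOver` is a hypothetical helper, not part of the Pictory API, and the real resolution happens on the server.

```javascript
// Hypothetical helper: mirrors the documented fallback rule on the client side.
function resolveVoiceOver(videoVoiceOver, scene) {
  // A scene-level voiceOver object replaces the video-level one entirely.
  return scene.voiceOver !== undefined ? scene.voiceOver : videoVoiceOver;
}

const videoDefault = {
  enabled: true,
  aiVoices: [{ speaker: "Brian", speed: 100, amplificationLevel: 0 }],
};

const scenes = [
  {
    story: "Intro",
    voiceOver: {
      enabled: true,
      aiVoices: [{ speaker: "Brian", speed: 85, amplificationLevel: 0.3 }],
    },
  },
  { story: "Main" }, // no override: falls back to videoDefault
];

const effectiveSpeeds = scenes.map(
  (scene) => resolveVoiceOver(videoDefault, scene).aiVoices[0].speed
);
console.log(effectiveSpeeds); // [ 85, 100 ]
```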

Complete Example

import axios from "axios";

const API_BASE_URL = "https://api.pictory.ai/pictoryapis";
const API_KEY = "YOUR_API_KEY";

// Different text for different scenes
const INTRO_TEXT = "Welcome to our comprehensive guide on AI in education.";
const MAIN_TEXT = "AI is transforming how educators create content, automate workflows, and personalize learning experiences.";
const OUTRO_TEXT = "Thank you for watching. Subscribe for more insights.";

async function createVideoWithMultiLevelVoiceOver() {
  try {
    console.log("Creating video with multi-level voice-over...");

    const response = await axios.post(
      `${API_BASE_URL}/v2/video/storyboard/render`,
      {
        videoName: "multilevel_voiceover_demo",

        // Video-level voice-over (default for all scenes)
        voiceOver: {
          enabled: true,
          aiVoices: [
            {
              speaker: "Brian",
              speed: 100,           // Normal speed
              amplificationLevel: 0, // Normal volume
            },
          ],
        },

        scenes: [
          {
            story: INTRO_TEXT,
            createSceneOnNewLine: false,
            createSceneOnEndOfSentence: false,

            // Scene-specific voice-over (slower and louder for emphasis)
            voiceOver: {
              enabled: true,
              aiVoices: [
                {
                  speaker: "Brian",
                  speed: 85,           // 15% slower for emphasis
                  amplificationLevel: 0.3, // Slightly louder
                },
              ],
            },
          },
          {
            story: MAIN_TEXT,
            createSceneOnNewLine: false,
            createSceneOnEndOfSentence: false,
            // No scene-level voice-over = uses video-level default
          },
          {
            story: OUTRO_TEXT,
            createSceneOnNewLine: false,
            createSceneOnEndOfSentence: false,

            // Different scene-level settings for outro
            voiceOver: {
              enabled: true,
              aiVoices: [
                {
                  speaker: "Brian",
                  speed: 90,           // Slightly slower
                  amplificationLevel: 0.1, // Slightly louder
                },
              ],
            },
          },
        ],
      },
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: API_KEY,
        },
      }
    );

    const jobId = response.data.data.jobId;
    console.log("✓ Video creation started!");
    console.log("Job ID:", jobId);

    // Monitor progress
    console.log("\nMonitoring video creation...");
    let jobCompleted = false;
    let jobResult = null;

    while (!jobCompleted) {
      const statusResponse = await axios.get(
        `${API_BASE_URL}/v1/jobs/${jobId}`,
        {
          headers: { Authorization: API_KEY },
        }
      );

      const status = statusResponse.data.data.status;
      console.log("Status:", status);

      if (status === "completed") {
        jobCompleted = true;
        jobResult = statusResponse.data;
        console.log("\n✓ Video is ready!");
        console.log("Video URL:", jobResult.data.videoURL);
      } else if (status === "failed") {
        throw new Error("Video creation failed: " + JSON.stringify(statusResponse.data));
      } else {
        // Still in progress: wait 5 seconds before polling again
        await new Promise(resolve => setTimeout(resolve, 5000));
      }
    }

    return jobResult;
  } catch (error) {
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

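// The error is already logged inside the function; set a non-zero exit code here
createVideoWithMultiLevelVoiceOver().catch(() => {
  process.exitCode = 1;
});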
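
The polling loop in the example runs until the job reports "completed" or "failed". For unattended scripts it may be worth capping the number of attempts. The sketch below shows one way to do that; `pollUntilDone`, its options, and the injected `fetchStatus` function are hypothetical names, and the status strings are the ones used in the example.

```javascript
// Hypothetical helper: polls an injected status function, with an attempt cap.
// fetchStatus must return a promise that resolves to an object like { status: "..." }.
async function pollUntilDone(fetchStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchStatus();
    if (result.status === "completed") return result;
    if (result.status === "failed") {
      throw new Error("Video creation failed: " + JSON.stringify(result));
    }
    // Still in progress: wait before the next attempt (skip the wait after the last one).
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

With axios, `fetchStatus` could be a function that requests `${API_BASE_URL}/v1/jobs/${jobId}` with the `Authorization` header and returns `response.data.data`, matching the response shape read in the example.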

Understanding the Configuration

Video-Level Voice-Over (Default Settings)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| voiceOver.enabled | boolean | false | Enable voice-over for the entire video |
| voiceOver.aiVoices | array | - | Array of AI voice configurations |
| voiceOver.aiVoices[].speaker | string | - | AI voice name (e.g., “Brian”, “Emma”) |
| voiceOver.aiVoices[].speed | number | 100 | Voice speed (50-200) |
| voiceOver.aiVoices[].amplificationLevel | number | 0 | Volume level (-1 to 1) |
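
Checking a voice entry against these ranges before sending the request can catch mistakes early. The following is a minimal sketch: `validateVoice` is a hypothetical client-side helper, and it assumes `speaker` is required while `speed` and `amplificationLevel` fall back to the defaults listed in the table.

```javascript
// Hypothetical validator based on the documented types, defaults, and ranges.
function validateVoice(voice) {
  const errors = [];
  if (typeof voice.speaker !== "string" || voice.speaker.length === 0) {
    errors.push("speaker must be a non-empty string");
  }
  const speed = voice.speed ?? 100; // documented default
  if (!Number.isFinite(speed) || speed < 50 || speed > 200) {
    errors.push("speed must be a number between 50 and 200");
  }
  const level = voice.amplificationLevel ?? 0; // documented default
  if (!Number.isFinite(level) || level < -1 || level > 1) {
    errors.push("amplificationLevel must be a number between -1 and 1");
  }
  return errors;
}

console.log(validateVoice({ speaker: "Brian", speed: 100, amplificationLevel: 0 })); // []
console.log(validateVoice({ speaker: "Brian", speed: 250 }));
// [ 'speed must be a number between 50 and 200' ]
```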

Scene-Level Voice-Over (Overrides)

| Parameter | Type | Description |
| --- | --- | --- |
| scenes[].voiceOver | object | Scene-specific voice-over settings (same structure as video-level) |
| scenes[].voiceOver.enabled | boolean | Enable/disable voice-over for this scene |
| scenes[].voiceOver.aiVoices | array | Custom voice configuration for this scene |
Override Behavior: When you define scene-level voice-over, it completely replaces the video-level settings for that scene. If you want to use the same voice but different settings, you must specify the voice name again.
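
Because an override replaces the video-level settings entirely, one convenient pattern is to derive each scene-level object from the default so that the speaker name is never left out. This is a sketch only: `overrideVoice` is a hypothetical helper, and it assumes a single entry in `aiVoices`.

```javascript
// Hypothetical helper: builds a complete scene-level voiceOver from the
// video-level default, so the speaker name is always carried over.
function overrideVoice(videoVoiceOver, changes) {
  const baseVoice = videoVoiceOver.aiVoices[0];
  return {
    enabled: true,
    aiVoices: [{ ...baseVoice, ...changes }],
  };
}

const videoVoiceOver = {
  enabled: true,
  aiVoices: [{ speaker: "Brian", speed: 100, amplificationLevel: 0 }],
};

const introScene = {
  story: "Welcome to our comprehensive guide on AI in education.",
  voiceOver: overrideVoice(videoVoiceOver, { speed: 85, amplificationLevel: 0.3 }),
};

console.log(introScene.voiceOver.aiVoices[0]);
// { speaker: 'Brian', speed: 85, amplificationLevel: 0.3 }
```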

Voice Speed Reference

| Speed Value | Playback Rate | Best Used For |
| --- | --- | --- |
| 50 | 0.5x (Very slow) | Complex technical content, learning materials |
| 75 | 0.75x (Slower) | Detailed explanations, emphasis |
| 85-90 | 0.85-0.9x (Slightly slower) | Introduction, important points |
| 100 | 1.0x (Normal) | Standard content, most scenes |
| 110-125 | 1.1-1.25x (Slightly faster) | Casual content, transitions |
| 150 | 1.5x (Fast) | Quick summaries, energetic content |
| 200 | 2.0x (Very fast) | Speed reading, urgent calls-to-action |
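
The rows suggest that the playback rate is simply the speed value divided by 100. Under that assumption, the effect of a speed setting on narration length can be estimated roughly as below. Both function names are hypothetical, and the real audio length depends on the voice engine.

```javascript
// Assumption: playback rate = speed / 100, as the reference rows suggest.
function playbackRate(speed) {
  return speed / 100;
}

// Rough estimate of narration length at a given speed, starting from
// its length in seconds at normal speed (100).
function estimateDuration(secondsAtNormalSpeed, speed) {
  return secondsAtNormalSpeed / playbackRate(speed);
}

console.log(playbackRate(85));          // 0.85
console.log(estimateDuration(30, 150)); // 20
console.log(estimateDuration(30, 50));  // 60
```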

Amplification Level Reference

| Level | Effect | Best Used For |
| --- | --- | --- |
| -1.0 | Quietest (background) | Subtle narration, ambient voice |
| -0.5 | Quieter than normal | De-emphasized content |
| 0 | Normal volume | Standard narration |
| 0.3 | Slightly louder | Mild emphasis |
| 0.5 | Moderately louder | Important points |
| 1.0 | Loudest | Strong emphasis, calls-to-action |
Volume Considerations: Very high amplification levels (0.7-1.0) may cause audio distortion. Test your settings and use moderation for professional results.

Common Use Cases

Emphasizing Key Sections

// Intro: Slower and louder for impact
voiceOver: {
  enabled: true,
  aiVoices: [{
    speaker: "Brian",
    speed: 85,
    amplificationLevel: 0.3
  }]
}

// Main content: Normal settings
// (uses video-level default)

// Call-to-action: Faster and louder
voiceOver: {
  enabled: true,
  aiVoices: [{
    speaker: "Brian",
    speed: 110,
    amplificationLevel: 0.5
  }]
}

Tutorial Videos

// Step explanations: Slower for clarity
voiceOver: {
  enabled: true,
  aiVoices: [{
    speaker: "Emma",
    speed: 90,
    amplificationLevel: 0
  }]
}

// Quick transitions: Faster pace
voiceOver: {
  enabled: true,
  aiVoices: [{
    speaker: "Emma",
    speed: 120,
    amplificationLevel: 0
  }]
}

Multi-Language Support

// English sections
voiceOver: {
  enabled: true,
  aiVoices: [{
    speaker: "Brian",  // English voice
    speed: 100,
    amplificationLevel: 0
  }]
}

// Spanish sections
voiceOver: {
  enabled: true,
  aiVoices: [{
    speaker: "Maria",  // Spanish voice
    speed: 100,
    amplificationLevel: 0
  }]
}
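
When a script mixes languages, the scene list can be generated from tagged text segments instead of being written by hand. The sketch below assumes a hypothetical `VOICE_BY_LANGUAGE` lookup and `buildScenes` helper; "Brian" and "Maria" are the speaker names used in the snippets above, and the available voices should be checked against your account.

```javascript
// Hypothetical mapping from a language tag to a speaker name.
const VOICE_BY_LANGUAGE = { en: "Brian", es: "Maria" };

// Builds one scene per text segment, each with its own scene-level voice.
function buildScenes(segments) {
  return segments.map(({ text, language }) => {
    const speaker = VOICE_BY_LANGUAGE[language];
    if (!speaker) {
      throw new Error(`No voice configured for language: ${language}`);
    }
    return {
      story: text,
      createSceneOnNewLine: false,
      createSceneOnEndOfSentence: false,
      voiceOver: {
        enabled: true,
        aiVoices: [{ speaker, speed: 100, amplificationLevel: 0 }],
      },
    };
  });
}

const scenes = buildScenes([
  { text: "Welcome to our guide.", language: "en" },
  { text: "Bienvenidos a nuestra guía.", language: "es" },
]);
console.log(scenes.map((s) => s.voiceOver.aiVoices[0].speaker)); // [ 'Brian', 'Maria' ]
```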

Best Practices

Don’t vary speed too dramatically between scenes; sudden changes can be jarring:
  • Good: Vary by 10-20 points (e.g., 90-110)
  • Avoid: Extreme jumps (e.g., 50 to 200)
  • Tip: Test your video to ensure smooth transitions
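
A quick check over the per-scene speeds can flag jarring transitions before rendering. This is a minimal sketch: `findSpeedJumps` is a hypothetical helper, and the default threshold of 20 points is an assumption taken from the guidance above.

```javascript
// Hypothetical lint: flags adjacent scenes whose speeds differ by more than maxJump.
function findSpeedJumps(speeds, maxJump = 20) {
  const jumps = [];
  for (let i = 1; i < speeds.length; i++) {
    const delta = Math.abs(speeds[i] - speeds[i - 1]);
    if (delta > maxJump) {
      jumps.push({ fromScene: i - 1, toScene: i, delta });
    }
  }
  return jumps;
}

console.log(findSpeedJumps([85, 100, 90])); // []
console.log(findSpeedJumps([50, 200]));
// [ { fromScene: 0, toScene: 1, delta: 150 } ]
```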
Subtle volume changes are more professional than dramatic ones:
  • Good: Use 0.1-0.3 for emphasis
  • Moderate: Use 0.5 for strong emphasis
  • Avoid: Levels above 0.7 unless intentional
  • Tip: Let the content quality drive emphasis, not just volume
Keep voice settings consistent across similar scenes:
  • Use the same voice for all sections
  • Apply similar speed/volume to similar content types
  • Create a “voice style guide” for your brand
Always review a test video before creating multiple videos:
  • Generate a short sample with your settings
  • Listen on different devices (phone, desktop, headphones)
  • Adjust based on feedback
  • Document successful settings for reuse

Troubleshooting

Problem: Scene isn’t using your custom voice-over settings.
Solution:
  • Ensure scene-level voiceOver object is properly formatted
  • Check that you’ve included the speaker name (it must be specified even if using the same voice)
  • Verify JSON syntax is correct (commas, brackets, quotes)
Problem: Audio quality is poor or distorted.
Solution:
  • Reduce amplificationLevel (try 0.3 or lower)
  • Avoid combining high speed (>150) with high amplification (>0.5)
  • Check that speed values are within 50-200 range
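
These checks can be automated for each voice entry. The sketch below is illustrative only: `audioQualityWarnings` is a hypothetical helper, and its thresholds are the ones quoted in this guide, not limits enforced by the API.

```javascript
// Hypothetical check: returns warnings for settings this guide associates
// with poor or distorted audio.
function audioQualityWarnings({ speed = 100, amplificationLevel = 0 }) {
  const warnings = [];
  if (speed < 50 || speed > 200) {
    warnings.push("speed is outside the 50-200 range");
  }
  if (amplificationLevel >= 0.7) {
    warnings.push("amplificationLevel of 0.7 or higher may cause distortion");
  }
  if (speed > 150 && amplificationLevel > 0.5) {
    warnings.push("high speed combined with high amplification");
  }
  return warnings;
}

console.log(audioQualityWarnings({ speed: 100, amplificationLevel: 0.3 })); // []
console.log(audioQualityWarnings({ speed: 180, amplificationLevel: 0.8 }).length); // 2
```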
Problem: Scene-level overrides aren’t being applied.
Solution:
  • Verify scene-level voiceOver is inside the scene object
  • Check that enabled: true is set at scene level
  • Ensure scene-level configuration is complete (speaker, speed, amplificationLevel)
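
The checklist above can be walked programmatically for a single scene object. This is a sketch under the assumption that every override should set `enabled: true` and list all three voice fields; `diagnoseSceneOverride` is a hypothetical name.

```javascript
// Hypothetical diagnostic: reports which checklist items a scene object fails.
function diagnoseSceneOverride(scene) {
  const issues = [];
  const voiceOver = scene.voiceOver;
  if (voiceOver === undefined) {
    return ["no voiceOver inside the scene object (video-level default applies)"];
  }
  if (voiceOver.enabled !== true) {
    issues.push("enabled is not set to true");
  }
  if (!Array.isArray(voiceOver.aiVoices) || voiceOver.aiVoices.length === 0) {
    issues.push("aiVoices is missing or empty");
    return issues;
  }
  voiceOver.aiVoices.forEach((voice, i) => {
    for (const key of ["speaker", "speed", "amplificationLevel"]) {
      if (voice[key] === undefined) {
        issues.push(`aiVoices[${i}].${key} is missing`);
      }
    }
  });
  return issues;
}

// Reports two issues: speaker and amplificationLevel are missing.
console.log(diagnoseSceneOverride({
  story: "Intro",
  voiceOver: { enabled: true, aiVoices: [{ speed: 85 }] },
}));
```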
