This guide shows you how to create videos with custom captions that are separate from your story text. This is useful for multilingual content, where visuals are selected from text in one language while subtitles are displayed in another.

What You’ll Learn

  • Custom Captions - Add subtitles separate from story text
  • Multi-Language Support - Display captions in different languages
  • Dual Text System - Story for visuals, captions for display
  • Translation Ready - Suited to international audiences

Before You Begin

Make sure you have:
  • A Pictory API key
  • Node.js or Python installed on your machine
  • Your content text and translated captions prepared
  • Language codes for your target language
To follow the Node.js example in this guide, install the axios HTTP client:
npm install axios

How the Caption System Works

Pictory’s caption system uses a dual-text approach:
  1. Story Text - The main content used by AI to understand context and select appropriate visuals
  2. Caption Text - The text displayed as subtitles in your final video
  3. Language Code - Specifies the caption language for proper formatting
  4. Automatic Sync - Captions are automatically synchronized with video timing
This separation allows you to create videos in one language (for visual selection) while displaying subtitles in another language, making your content accessible to global audiences.

Complete Example

import axios from "axios";

const API_BASE_URL = "https://api.pictory.ai/pictoryapis";
const API_KEY = "YOUR_API_KEY";

// Story text - used for AI visual selection and content understanding
const STORY_TEXT = "AI is poised to significantly impact educators and course creators on social media.";

// Caption text - displayed as subtitles (Spanish translation)
const CAPTION_TEXT = "La IA está destinada a impactar significativamente a los educadores y creadores de cursos en las redes sociales.";

async function createTextToVideoWithCaption() {
  try {
    console.log("Creating video with custom captions...");

    const response = await axios.post(
      `${API_BASE_URL}/v2/video/storyboard/render`,
      {
        videoName: "text_to_video_with_caption",
        scenes: [
          {
            story: STORY_TEXT,              // English for visual selection
            caption: CAPTION_TEXT,          // Spanish for subtitles
            captionLanguage: "es",          // Spanish language code
            createSceneOnNewLine: false,    // Required when using captions
            createSceneOnEndOfSentence: false, // Required when using captions
          },
        ],
      },
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: API_KEY,
        },
      }
    );

    const jobId = response.data.data.jobId;
    console.log("✓ Video creation started!");
    console.log("Job ID:", jobId);

    // Monitor progress
    console.log("\nMonitoring video creation...");
    let jobCompleted = false;
    let jobResult = null;

    while (!jobCompleted) {
      const statusResponse = await axios.get(
        `${API_BASE_URL}/v1/jobs/${jobId}`,
        {
          headers: { Authorization: API_KEY },
        }
      );

      const status = statusResponse.data.data.status;
      console.log("Status:", status);

      if (status === "completed") {
        jobCompleted = true;
        jobResult = statusResponse.data;
        console.log("\n✓ Video with captions is ready!");
        console.log("Video URL:", jobResult.data.videoURL);
      } else if (status === "failed") {
        throw new Error("Video creation failed: " + JSON.stringify(statusResponse.data));
      } else {
        // Still in progress: wait 5 seconds before polling again
        await new Promise(resolve => setTimeout(resolve, 5000));
      }
    }

    return jobResult;
  } catch (error) {
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

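// The error is already logged inside the function; just set a failing exit code
createTextToVideoWithCaption().catch(() => {
  process.exitCode = 1;
});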

Caption Configuration Parameters

Scene Configuration

| Parameter | Type | Required | Description |
|---|---|---|---|
| story | string | Yes | The main text used for AI visual selection and content understanding |
| caption | string | Yes | The text that will be displayed as subtitles in the video |
| captionLanguage | string | Yes | ISO 639-1 language code for the caption (e.g., “es” for Spanish) |
| createSceneOnNewLine | boolean | Yes | Must be false when using captions |
| createSceneOnEndOfSentence | boolean | Yes | Must be false when using captions |
Important Limitation: When using custom captions, both createSceneOnNewLine and createSceneOnEndOfSentence must be set to false. This means you need to manually create multiple scenes if you want scene breaks with captions.

Supported Caption Languages

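| Language | Code | Language | Code |
|---|---|---|---|
| Chinese | zh | Portuguese | pt |
| Dutch | nl | Russian | ru |
| English | en | Spanish | es |
| French | fr | Hindi | hi |
| German | de | Japanese | ja |
| Italian | it | Korean | ko |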
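
The rules in the two tables above can be checked locally before a request is sent. The following is a minimal sketch, not part of the Pictory API: `validateCaptionScene` and `SUPPORTED_CAPTION_LANGUAGES` are hypothetical names introduced here, and the checks only mirror what this guide states (three required strings, a supported language code, and both scene-break flags set to false).

```javascript
// Language codes copied from the "Supported Caption Languages" table.
const SUPPORTED_CAPTION_LANGUAGES = new Set([
  "zh", "nl", "en", "fr", "de", "it", "pt", "ru", "es", "hi", "ja", "ko",
]);

// Hypothetical helper (not a Pictory API call): returns a list of problems
// found in a scene object, or an empty list when none are found.
function validateCaptionScene(scene) {
  const problems = [];

  // story, caption, and captionLanguage are all required strings.
  for (const field of ["story", "caption", "captionLanguage"]) {
    if (typeof scene[field] !== "string" || scene[field].trim() === "") {
      problems.push(`${field} must be a non-empty string`);
    }
  }

  // Only report an unsupported code when a code was actually provided.
  const code = scene.captionLanguage;
  if (typeof code === "string" && code.trim() !== "" &&
      !SUPPORTED_CAPTION_LANGUAGES.has(code)) {
    problems.push(`captionLanguage "${code}" is not in the supported table`);
  }

  // Both automatic scene-break flags must be explicitly false.
  for (const flag of ["createSceneOnNewLine", "createSceneOnEndOfSentence"]) {
    if (scene[flag] !== false) {
      problems.push(`${flag} must be false when using captions`);
    }
  }

  return problems;
}

console.log(validateCaptionScene({
  story: "AI is transforming education",
  caption: "La IA está transformando la educación",
  captionLanguage: "spanish",
  createSceneOnNewLine: false,
  createSceneOnEndOfSentence: true,
}));
```

For this input the helper reports two problems: the language code "spanish" is not in the table, and createSceneOnEndOfSentence is not false.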

Story vs Caption: Understanding the Difference

Story text is the content the AI uses to:
  • Understand the context and meaning of your video
  • Select appropriate stock visuals and images
  • Determine the overall theme and mood
  • Generate relevant background content
Use story text in the language that best describes your visual needs, even if your audience speaks a different language.
Caption text is what your viewers will see:
  • Displayed as subtitles throughout the video
  • Can be a translation of the story text
  • Can be simplified or adapted for your audience
  • Should match your target audience’s language
Use caption text in the language your viewers will understand.
Scenario: Creating a video for Spanish-speaking students about AI in education.
{
  story: "AI is transforming education through personalized learning",
  caption: "La IA está transformando la educación mediante el aprendizaje personalizado",
  captionLanguage: "es"
}
The AI uses the English story text to select educational visuals, while your audience sees the Spanish subtitles.

Common Use Cases

Creating Multilingual Marketing Videos

{
  story: "Our product saves you time and increases productivity",
  caption: "Notre produit vous fait gagner du temps et augmente la productivité",
  captionLanguage: "fr"
}
Result: A marketing video with French subtitles and visuals selected from the English story text.

Educational Content for International Students

{
  story: "Learn programming fundamentals step by step",
  caption: "プログラミングの基礎を段階的に学ぶ",
  captionLanguage: "ja"
}
Result: A programming tutorial with Japanese subtitles and visuals selected from the English story text.

Accessibility and Localization

{
  story: "Welcome to our company presentation",
  caption: "Willkommen zu unserer Unternehmenspräsentation",
  captionLanguage: "de"
}
Result: Corporate presentation accessible to German-speaking audiences.
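
When the same story needs subtitles in several languages, the request bodies can be generated from a single map of translations. This is a sketch under assumptions: `buildLocalizedRequests` is a hypothetical helper, the body shape is copied from the Complete Example, and the `videoName` suffix convention is an illustration only. It builds the bodies but does not send them.

```javascript
// Hypothetical helper: builds one render request body per target language.
// `translations` maps a language code to the already-translated caption text.
function buildLocalizedRequests(baseName, story, translations) {
  return Object.entries(translations).map(([languageCode, caption]) => ({
    videoName: `${baseName}_${languageCode}`, // naming scheme is an assumption
    scenes: [
      {
        story,                             // same story drives visual selection
        caption,                           // translated subtitle text
        captionLanguage: languageCode,
        createSceneOnNewLine: false,       // required when using captions
        createSceneOnEndOfSentence: false, // required when using captions
      },
    ],
  }));
}

const requests = buildLocalizedRequests(
  "product_promo",
  "Our product saves you time and increases productivity",
  {
    fr: "Notre produit vous fait gagner du temps et augmente la productivité",
    de: "Unser Produkt spart Ihnen Zeit und steigert die Produktivität",
  }
);

console.log(requests.map((request) => request.videoName));
```

Each element of `requests` could then be passed as the body of the render call shown in the Complete Example, one job per language.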

Working with Multiple Scenes

Since automatic scene breaks are disabled when using captions, you need to manually create scenes:
scenes: [
  {
    story: "First concept in English",
    caption: "Primer concepto en español",
    captionLanguage: "es",
    createSceneOnNewLine: false,
    createSceneOnEndOfSentence: false
  },
  {
    story: "Second concept in English",
    caption: "Segundo concepto en español",
    captionLanguage: "es",
    createSceneOnNewLine: false,
    createSceneOnEndOfSentence: false
  }
]
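
Writing each scene object by hand becomes repetitive as the number of segments grows. One possible approach, sketched below, is to keep the story segments and their translated captions in two parallel arrays and zip them into scenes. `buildCaptionScenes` is a hypothetical helper, not a Pictory function, and it assumes the two arrays are already aligned segment by segment.

```javascript
// Hypothetical helper: pairs story segments with caption segments and
// returns one scene object per pair, with scene-break flags disabled.
function buildCaptionScenes(storySegments, captionSegments, captionLanguage) {
  if (storySegments.length !== captionSegments.length) {
    throw new Error(
      `Expected one caption per story segment, got ${storySegments.length} ` +
      `story segments and ${captionSegments.length} captions`
    );
  }
  return storySegments.map((story, index) => ({
    story,
    caption: captionSegments[index],
    captionLanguage,
    createSceneOnNewLine: false,
    createSceneOnEndOfSentence: false,
  }));
}

const scenes = buildCaptionScenes(
  ["First concept in English", "Second concept in English"],
  ["Primer concepto en español", "Segundo concepto en español"],
  "es"
);

console.log(JSON.stringify(scenes, null, 2));
```

The length check fails early when a translation is missing, which is easier to diagnose than a scene rendered with an undefined caption.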

Best Practices

Ensure your caption text accurately reflects the story text:
  • Use professional translation services when possible
  • Keep the meaning and tone consistent
  • Avoid direct word-for-word translations that may sound unnatural
  • Test with native speakers before production
Always use the correct ISO 639-1 language code:
  • Double-check the code matches your caption language
  • Region-specific codes (e.g., zh-CN vs zh-TW) are not listed in the table above; confirm that a variant is supported before using it
  • Verify the language is supported (see table above)
Keep captions readable within the scene duration:
  • Shorter captions are easier to read
  • Break long sentences into multiple scenes
  • Aim for 1-2 lines of text per scene
  • Test video playback to ensure captions are comfortable to read
Since automatic scene breaks don’t work with captions:
  • Plan scene breaks manually before creating the video
  • Group related content into logical scenes
  • Create separate scene objects for each caption segment
  • Consider the visual flow between scenes

Troubleshooting

Problem: You set createSceneOnNewLine or createSceneOnEndOfSentence to true with captions.
Solution: Set both parameters to false:
{
  caption: "Your caption text",
  captionLanguage: "es",
  createSceneOnNewLine: false,  // Must be false
  createSceneOnEndOfSentence: false  // Must be false
}
Problem: The caption text displays but formatting is incorrect.
Solution:
  • Verify you’re using the correct language code (e.g., “es” not “spanish”)
  • Check the supported languages table above
  • Ensure the caption text is properly encoded (UTF-8)
Problem: The selected visuals don’t seem relevant to your captions.
Solution: Remember that visuals are selected based on story text, not caption text:
  • Make sure your story text clearly describes the visuals you want
  • Keep story text in a language the AI handles well (English recommended)
  • The caption is only for subtitle display, not visual selection
Problem: Captions are cut off or hard to read.
Solution:
  • Break long text into multiple scenes
  • Keep each caption to 1-2 lines (approximately 60-100 characters)
  • Create separate scene objects for longer content:
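scenes: [
  { story: "Part 1", caption: "Parte 1 del contenido", captionLanguage: "es",
    createSceneOnNewLine: false, createSceneOnEndOfSentence: false },
  { story: "Part 2", caption: "Parte 2 del contenido", captionLanguage: "es",
    createSceneOnNewLine: false, createSceneOnEndOfSentence: false }
]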
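
Overlong captions can be flagged before rendering. The sketch below is an assumption-laden illustration: `findLongCaptions` is a hypothetical helper, and its default limit of 100 characters simply follows the "approximately 60-100 characters" guidance above. That figure is a readability rule of thumb, not a limit enforced by the API, and wide scripts such as Japanese or Chinese may need a lower value.

```javascript
// Hypothetical helper: returns the position and length of every scene whose
// caption is longer than maxChars. Length is counted in Unicode code points
// rather than UTF-16 units, so characters outside the BMP count once.
function findLongCaptions(scenes, maxChars = 100) {
  return scenes
    .map((scene, index) => ({ index, length: Array.from(scene.caption).length }))
    .filter(({ length }) => length > maxChars);
}

const draftScenes = [
  { story: "Part 1", caption: "Parte 1 del contenido" },
  {
    story: "Part 2",
    caption:
      "La IA está destinada a impactar significativamente a los educadores " +
      "y creadores de cursos en las redes sociales.",
  },
];

for (const { index, length } of findLongCaptions(draftScenes)) {
  console.log(`Scene ${index} caption has ${length} characters; consider splitting it.`);
}
```

Any scene that is reported can then be split into two or more scene objects, each with its own story and caption segment.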
