This guide shows you how to generate an AI video using reference images along with a text prompt. By providing up to three reference images, you can influence the subjects, style, and composition of the generated video. This is particularly useful for combining elements from multiple images into a single video scene.

What You Will Build

Multi-Image Reference

Use up to three images to guide the video output

Subject Composition

Combine subjects from different images into one scene

Visual Consistency

Maintain character or scene appearance across generations

Creative Direction

Steer the video style and content with visual references

Before You Begin

Make sure you have:
  • A Pictory API key (get one here)
  • Node.js or Python installed on your machine
  • The required packages installed
  • One to three publicly accessible image URLs to use as references
npm install axios

Step-by-Step Guide

Step 1: Set Up Your Request

Prepare your API credentials, the reference image URLs, and a prompt that describes the desired video. The prompt should explain how the subjects or elements from the reference images should appear in the video.
import axios from "axios";

const API_BASE_URL = "https://api.pictory.ai/pictoryapis";
const API_KEY = "YOUR_API_KEY"; // Replace with your actual API key

// Video generation with reference images
const videoRequest = {
  prompt: "The woman from the first image walks through the open field from the second image, approaching the farmhouse in the distance in a wide shot",
  referenceImageUrls: [
    "https://example.com/images/woman-portrait-outdoor.png",
    "https://example.com/images/green-field-farmhouse.png"
  ],
  model: "pixverse5.5",
  aspectRatio: "9:16",
  duration: "8s"
};
The referenceImageUrls array accepts 1 to 3 image URLs. All URLs must point to publicly accessible images. This parameter cannot be used together with firstFrameImageUrl or extendVideoUrl.
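These constraints can be checked client-side before submitting, which avoids a round trip for requests the API would reject. The helper below is an illustrative sketch, not part of the Pictory SDK; it enforces only the limits documented above:

```javascript
// Illustrative client-side validation of the documented constraints.
// Not part of the Pictory API or SDK.
function validateVideoRequest(req) {
  const errors = [];

  // prompt must be 5 to 5,000 characters
  if (typeof req.prompt !== "string" || req.prompt.length < 5 || req.prompt.length > 5000) {
    errors.push("prompt must be a string of 5 to 5,000 characters");
  }

  if (req.referenceImageUrls !== undefined) {
    const urls = req.referenceImageUrls;
    // 1 to 3 reference image URLs
    if (!Array.isArray(urls) || urls.length < 1 || urls.length > 3) {
      errors.push("referenceImageUrls must contain 1 to 3 URLs");
    }
    // mutually exclusive with firstFrameImageUrl and extendVideoUrl
    if (req.firstFrameImageUrl || req.extendVideoUrl) {
      errors.push("referenceImageUrls cannot be combined with firstFrameImageUrl or extendVideoUrl");
    }
  }

  return errors; // an empty array means the request looks valid
}
```

Running `validateVideoRequest(videoRequest)` before the POST lets you surface these problems immediately instead of parsing an API error response.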

Step 2: Submit the Video Generation Request

Send the request to the AI Studio video generation endpoint.
async function generateVideoFromReferences() {
  try {
    console.log("Submitting reference-based video generation request...");

    const response = await axios.post(
      `${API_BASE_URL}/v1/aistudio/videos`,
      videoRequest,
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: API_KEY,
        },
      }
    );

    const jobId = response.data.data.jobId;
    console.log("Video generation started.");
    console.log("Job ID:", jobId);

    return jobId;
  } catch (error) {
    console.error("Error submitting request:", error.response?.data || error.message);
    throw error;
  }
}

Step 3: Poll for the Result

Check the job status at regular intervals until the video is ready.
async function waitForVideo(jobId) {
  console.log("\nPolling for video generation result...");

  while (true) {
    const response = await axios.get(
      `${API_BASE_URL}/v1/jobs/${jobId}`,
      {
        headers: { Authorization: API_KEY },
      }
    );

    const data = response.data;
    const status = data.data.status;
    console.log("Status:", status);

    if (status === "completed") {
      console.log("\nVideo generated successfully!");
      console.log("Video URL:", data.data.url);
      console.log("Duration:", data.data.duration);
      console.log("Dimensions:", data.data.width, "x", data.data.height);
      console.log("AI Credits Used:", data.data.aiCreditsUsed);
      return data;
    }

    if (status === "failed") {
      throw new Error("Video generation failed: " + JSON.stringify(data));
    }

    // Wait 15 seconds before polling again
    await new Promise(resolve => setTimeout(resolve, 15000));
  }
}
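The loop above polls at a fixed 15-second interval, which is fine for short jobs. For longer-running generations, a capped exponential backoff reduces unnecessary requests early on while still bounding the wait between checks. A minimal sketch; the base and cap values are arbitrary choices, not prescribed by the API:

```javascript
// Capped exponential backoff schedule for polling: 5s, 10s, 20s, 40s, then 60s.
// The base and cap are illustrative choices, not API requirements.
function pollDelay(attempt) {
  const base = 5000;   // first wait, in milliseconds
  const cap = 60000;   // never wait longer than this
  return Math.min(base * 2 ** attempt, cap);
}

// Usage inside the polling loop, with an attempt counter:
//   await new Promise(resolve => setTimeout(resolve, pollDelay(attempt++)));
```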

// Run the complete workflow
generateVideoFromReferences()
  .then(jobId => waitForVideo(jobId))
  .then(() => console.log("\nDone!"))
  .catch(error => console.error("Error:", error));

Understanding the Parameters

  • prompt (string, required): A text description that explains how the reference images should be used in the video. Must be between 5 and 5,000 characters.
  • referenceImageUrls (array of strings, optional): An array of 1 to 3 publicly accessible image URLs to guide the generation. Each entry must be a valid URI. Cannot be used together with firstFrameImageUrl or extendVideoUrl.
  • model (string, optional, default pixverse5.5): The AI model to use for generation. Supported values: veo3.1, veo3.1_fast, pixverse5.5. See Generate Video API for model capabilities and pricing.
  • aspectRatio (string, optional, default: first supported ratio of the selected model): The output aspect ratio. Valid values depend on the model. For example, pixverse5.5 supports 16:9, 9:16, 1:1, 3:4, 4:3, while veo3.1 supports 16:9, 9:16.
  • duration (string, optional, default: first supported duration of the selected model): The video length. Valid values depend on the model. For example, pixverse5.5 supports 5s, 8s, 10s, while veo3.1 supports 4s, 6s, 8s.
  • webhook (string, optional): A URL to receive a POST notification when the job completes. Must be a valid URI.
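If you supply a webhook, Pictory sends a POST notification to your URL when the job finishes, making the polling loop in Step 3 optional. The sketch below shows the same request as Step 1 with a webhook added; the callback URL is a placeholder you would replace with your own endpoint:

```javascript
// Same request shape as Step 1, with a webhook for completion notification.
// The webhook URL below is a placeholder, not a real endpoint.
const webhookRequest = {
  prompt: "The woman from the first image walks through the open field from the second image",
  referenceImageUrls: [
    "https://example.com/images/woman-portrait-outdoor.png",
    "https://example.com/images/green-field-farmhouse.png"
  ],
  model: "pixverse5.5",
  webhook: "https://your-server.example.com/pictory/callback"
};
```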

Use Cases for Reference Images

  • Character consistency (1 image): Provide a character portrait to maintain appearance across video generations.
  • Scene composition (2 images): Combine a character image with a background or environment image.
  • Multi-subject scenes (2 to 3 images): Provide images of different subjects that should appear together in the video.
  • Style reference (1 image): Use an image with the desired visual style to influence the output.

Tips for Reference Image Videos

  • Describe the relationship between images. In your prompt, explain how the elements from each reference should interact. For example, “The person from the first image walks into the landscape from the second image.”
  • Use high-quality references. Clear, well-lit images produce better results. Avoid blurry or heavily compressed images.
  • Keep subjects distinct. When combining multiple images, ensure each reference contributes a clearly identifiable element such as a character, background, or object.
  • Order matters. Reference the images by position (“the first image”, “the second image”) in your prompt so the model can correctly associate each image with the corresponding subject.
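The ordering convention from the last tip can be automated with a small helper that maps each subject to its image's ordinal position. This is purely illustrative; the API only ever sees the final prompt string:

```javascript
// Build a prompt fragment that refers to reference images by position
// ("the first image", "the second image", ...), in the same order as
// the referenceImageUrls array. Illustrative helper, not an API feature.
function describeReferences(subjects) {
  const ordinals = ["first", "second", "third"];
  return subjects
    .map((subject, i) => `${subject} from the ${ordinals[i]} image`)
    .join(" and ");
}
```

For example, `describeReferences(["The woman", "the field"])` produces "The woman from the first image and the field from the second image", which you can then extend with the action and framing you want.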

Next Steps