What You Will Build
Multi-Image Reference
Use up to three images to guide the video output
Subject Composition
Combine subjects from different images into one scene
Visual Consistency
Maintain character or scene appearance across generations
Creative Direction
Steer the video style and content with visual references
Before You Begin
Make sure you have:
- A Pictory API key (get one here)
- Node.js or Python installed on your machine
- The required packages installed
- One to three publicly accessible image URLs to use as references
Step-by-Step Guide
Step 1: Set Up Your Request
Prepare your API credentials, the reference image URLs, and a prompt that describes the desired video. The prompt should explain how the subjects or elements from the reference images should appear in the video.
The referenceImageUrls array accepts 1 to 3 image URLs. All URLs must point to publicly accessible images. This parameter cannot be used together with firstFrameImageUrl or extendVideoUrl.
Step 2: Submit the Video Generation Request
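As a rough sketch of this step in Python, the request can be assembled with the standard library alone. The endpoint URL below is a placeholder, and passing the API key in an Authorization header is an assumption; take the real endpoint and authentication scheme from the Generate Video API Reference. `build_submit_request` is a hypothetical helper, not part of any Pictory SDK.

```python
import json
import urllib.request


def build_submit_request(endpoint_url, api_key, payload):
    """Assemble (but do not send) a JSON POST request for a generation job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            # Assumption: the API key travels in the Authorization header.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
    )


payload = {
    "prompt": "The person from the first image walks into the landscape from the second image.",
    "referenceImageUrls": [
        "https://example.com/person.jpg",
        "https://example.com/landscape.jpg",
    ],
}

# Placeholder endpoint and key: replace both with real values before sending.
request = build_submit_request(
    "https://example.com/video-generation", "YOUR_API_KEY", payload
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response format, including where the job identifier appears, is not covered here and should be read from the API reference.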
Send the request to the AI Studio video generation endpoint.
Step 3: Poll for the Result
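One way to structure the polling for this step is a fixed-interval loop with a timeout. The sketch below is hypothetical: `fetch_status` stands in for whatever call retrieves the job status, and the status names `"completed"` and `"failed"` are assumptions rather than documented values.

```python
import time


def poll_until_done(fetch_status, job_id, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_status(job_id) at a fixed interval until the job finishes.

    fetch_status is a caller-supplied function that returns a dict with a
    "status" key. The status names checked below are assumptions.
    """
    waited = 0.0
    while True:
        result = fetch_status(job_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {result}")
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. If a webhook is configured, polling may not be needed at all.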
Check the job status at regular intervals until the video is ready.
Understanding the Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | None | A text description that explains how the reference images should be used in the video. Must be between 5 and 5,000 characters. |
| referenceImageUrls | array of strings | No | None | An array of 1 to 3 publicly accessible image URLs to guide the generation. Each entry must be a valid URI. Cannot be used together with firstFrameImageUrl or extendVideoUrl. |
| model | string | No | pixverse5.5 | The AI model to use for generation. Supported values: veo3.1, veo3.1_fast, pixverse5.5. See Generate Video API for model capabilities and pricing. |
| aspectRatio | string | No | First supported ratio of the selected model | The output aspect ratio. Valid values depend on the model. For example, pixverse5.5 supports 16:9, 9:16, 1:1, 3:4, 4:3, while veo3.1 supports 16:9, 9:16. |
| duration | string | No | First supported duration of the selected model | The video length. Valid values depend on the model. For example, pixverse5.5 supports 5s, 8s, 10s, while veo3.1 supports 4s, 6s, 8s. |
| webhook | string | No | None | A URL to receive a POST notification when the job completes. Must be a valid URI. |
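The constraints in this table can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, the option lists cover only the two models whose values are quoted in the table, and the http/https check is a stricter stand-in for "valid URI".

```python
from urllib.parse import urlparse

SUPPORTED_MODELS = ("veo3.1", "veo3.1_fast", "pixverse5.5")
DEFAULT_MODEL = "pixverse5.5"

# Only the values quoted in the table; other models are not checked.
KNOWN_OPTIONS = {
    "pixverse5.5": {
        "aspectRatio": ("16:9", "9:16", "1:1", "3:4", "4:3"),
        "duration": ("5s", "8s", "10s"),
    },
    "veo3.1": {
        "aspectRatio": ("16:9", "9:16"),
        "duration": ("4s", "6s", "8s"),
    },
}


def build_payload(prompt, reference_image_urls, model=None,
                  aspect_ratio=None, duration=None, webhook=None):
    """Validate the inputs against the table and return the request body."""
    if not 5 <= len(prompt) <= 5000:
        raise ValueError("prompt must be between 5 and 5,000 characters")
    if not 1 <= len(reference_image_urls) <= 3:
        raise ValueError("referenceImageUrls accepts 1 to 3 URLs")
    for url in reference_image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a usable image URL: {url!r}")

    # firstFrameImageUrl and extendVideoUrl are never set here, so the
    # mutual-exclusion rule holds by construction.
    payload = {"prompt": prompt, "referenceImageUrls": list(reference_image_urls)}
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {model!r}")
        payload["model"] = model

    options = KNOWN_OPTIONS.get(model or DEFAULT_MODEL)
    for key, value in (("aspectRatio", aspect_ratio), ("duration", duration)):
        if value is None:
            continue  # omitted fields fall back to the model's first supported value
        if options is not None and value not in options[key]:
            raise ValueError(f"{key} {value!r} is not listed for this model")
        payload[key] = value

    if webhook is not None:
        payload["webhook"] = webhook
    return payload
```

For example, `build_payload("A cat walks through the garden.", ["https://example.com/cat.png"], model="veo3.1", aspect_ratio="1:1")` raises `ValueError`, because the table lists only 16:9 and 9:16 for veo3.1.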
Use Cases for Reference Images
| Use Case | Number of Images | Description |
|---|---|---|
| Character consistency | 1 | Provide a character portrait to maintain appearance across video generations |
| Scene composition | 2 | Combine a character image with a background or environment image |
| Multi-subject scenes | 2 to 3 | Provide images of different subjects that should appear together in the video |
| Style reference | 1 | Use an image with the desired visual style to influence the output |
Tips for Reference Image Videos
- Describe the relationship between images. In your prompt, explain how the elements from each reference should interact. For example, “The person from the first image walks into the landscape from the second image.”
- Use high-quality references. Clear, well-lit images produce better results. Avoid blurry or heavily compressed images.
- Keep subjects distinct. When combining multiple images, ensure each reference contributes a clearly identifiable element such as a character, background, or object.
- Order matters. Reference the images by position (“the first image”, “the second image”) in your prompt so the model can correctly associate each image with the corresponding subject.
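The positional tip above can be made mechanical by deriving both the URL array and the prompt wording from one ordered list, so the two never drift apart. This is a small illustrative sketch; `label_references` is a hypothetical helper and the example URLs are placeholders.

```python
ORDINALS = ("first", "second", "third")


def label_references(references):
    """Turn ordered (image_url, description) pairs into URLs plus positional labels.

    The i-th label names the i-th URL by position, matching the order in
    which the URLs would be sent in referenceImageUrls.
    """
    if not 1 <= len(references) <= 3:
        raise ValueError("provide 1 to 3 reference images")
    urls = [url for url, _ in references]
    labels = [
        f"the {description} from the {ORDINALS[index]} image"
        for index, (_, description) in enumerate(references)
    ]
    return urls, labels


urls, (person, landscape) = label_references([
    ("https://example.com/person.jpg", "person"),
    ("https://example.com/landscape.jpg", "landscape"),
])
prompt = f"{person.capitalize()} walks into {landscape}."
```

Here `urls` keeps the input order and `prompt` refers to each image by that same position.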
Next Steps
- Generate Video from Text Prompt to create videos without reference images
- Generate Video from First Frame to animate from a specific starting frame
- Extend Video with AI to continue an existing video
- Generate Video API Reference for the complete parameter documentation
