ElevenLabs Voice Auto-Discovery
You can now use any voice from the ElevenLabs catalog without manual setup. Pass the ElevenLabs voice ID directly as the speaker value in your voiceover configuration, and Pictory will automatically discover the voice and add it to your library.
- No manual voice registration: the Add Voiceover Track endpoint is no longer needed for ElevenLabs voices.
- Pass any ElevenLabs voice ID in the speaker field (e.g., "pNInz6obpgDQGcFmaJgB").
- Automatic library addition: once discovered, the voice is available by name or track ID in future requests.
- Premium voice settings: fine-tune stability, similarity boost, style, and model selection.
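As a rough sketch of the request shape, the snippet below builds a voiceover configuration that passes an ElevenLabs voice ID as speaker. Only the speaker field and the example voice ID come from the notes above. The helper name, the premiumSettings wrapper, and the key names inside it are hypothetical placeholders, so check the ElevenLabs Voices Guide for the actual schema.

```python
from typing import Any, Dict, Optional


def build_voiceover_config(
    voice_id: str,
    premium_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a voiceover config that references an ElevenLabs voice by ID.

    `speaker` is the documented field. The `premiumSettings` key and the
    names inside it are assumptions made for illustration only.
    """
    if not voice_id:
        raise ValueError("voice_id must be a non-empty ElevenLabs voice ID")

    config: Dict[str, Any] = {"speaker": voice_id}
    if premium_settings:
        # Hypothetical container for stability, similarity boost, style, model.
        config["premiumSettings"] = dict(premium_settings)
    return config


config = build_voiceover_config(
    "pNInz6obpgDQGcFmaJgB",
    premium_settings={"stability": 0.5, "similarityBoost": 0.75},
)
print(config)
```

Because the voice is discovered on first use, the same configuration should work whether or not the voice is already in your library.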
The POST /v1/voiceovers/tracks endpoint has been removed. ElevenLabs voices are now added automatically on first use.
See the ElevenLabs Voices Guide for full documentation.
AI Visuals: Visual Continuity, Reference Images, and Creative Direction
New features for AI-generated scene backgrounds give you greater creative control over video and image generation.
Visual Continuity
- Enable visualContinuity to create seamless transitions between consecutive AI-generated scenes.
- Works with both video clips and images.
- The system uses the output of each scene as a reference for the next, producing a cohesive visual flow.
- Continuity applies within the same story and across consecutive stories when enabled.
- Use firstFrameImageUrl to control the starting frame of an AI-generated video clip.
- The AI model generates a video that begins from your provided image and transitions into the motion described by the prompt.
- Use referenceImageUrl to guide the style and composition of AI-generated images with a reference image.
- Influence the color palette, lighting, and visual tone while generating new content from your prompt.
- Use referenceImageUrls to provide 1–2 reference images that guide the style of AI-generated video clips.
- When using veo3.1 or veo3.1_fast with reference images, the video duration is automatically set to "8s".
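The reference-image rules for video clips can be mirrored on the client so that the request you log matches what the service will actually do. This is a minimal sketch: referenceImageUrls, the two model names, and the "8s" value come from the notes above, while the function name and the model and duration key names are assumptions.

```python
from typing import Any, Dict, List, Optional

# Models for which the duration is forced to "8s" when reference images are used.
FIXED_DURATION_MODELS = {"veo3.1", "veo3.1_fast"}


def build_ai_video_settings(
    model: str,
    duration: str,
    reference_image_urls: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble AI video clip settings and apply the documented duration rule."""
    settings: Dict[str, Any] = {
        "model": model,        # assumed key name
        "duration": duration,  # assumed key name
    }
    if reference_image_urls:
        if not 1 <= len(reference_image_urls) <= 2:
            raise ValueError("referenceImageUrls accepts 1 or 2 images")
        settings["referenceImageUrls"] = list(reference_image_urls)
        if model in FIXED_DURATION_MODELS:
            # The service sets this automatically; mirroring it locally keeps
            # the client-side view of the request accurate.
            settings["duration"] = "8s"
    return settings


settings = build_ai_video_settings(
    "veo3.1_fast",
    "5s",  # placeholder value, overridden because reference images are present
    reference_image_urls=["https://example.com/style.png"],
)
print(settings["duration"])
```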
- When a story is split into multiple scenes, the prompt field acts as a creative direction for the entire video.
- The system uses your creative direction to guide the auto-generated prompts for each individual scene.
- Recommended structure: [Action/Movement] + [Scene/Environment] + [Camera Technique] + [Visual Style].
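One way to keep prompts consistent with the recommended structure is to assemble them from the four parts. The helper below is only an illustration: its name is hypothetical, and joining the parts with commas is a choice made here, not something the notes prescribe.

```python
def compose_creative_direction(action: str, scene: str, camera: str, style: str) -> str:
    """Join the four recommended parts into a single prompt string.

    Order: action/movement, scene/environment, camera technique, visual style.
    """
    parts = [action, scene, camera, style]
    # Trim whitespace and trailing punctuation so the parts join cleanly.
    cleaned = [part.strip().rstrip(".,") for part in parts]
    if not all(cleaned):
        raise ValueError("all four parts of the creative direction are required")
    return ", ".join(cleaned) + "."


prompt = compose_creative_direction(
    action="A cyclist rides at a steady pace",
    scene="along a coastal road at sunrise",
    camera="slow tracking shot",
    style="warm cinematic look",
)
print(prompt)
```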
- Job responses now include aiCreditsUsed when AI visuals are generated, reporting total credits consumed across all scenes.
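The notes do not say where aiCreditsUsed sits in the job response, so the sketch below searches a decoded JSON payload for the field instead of assuming a fixed path. The sample response shape is invented for the example; only the field name comes from the notes.

```python
from typing import Any, Optional


def find_ai_credits_used(payload: Any) -> Optional[Any]:
    """Return the first aiCreditsUsed value found in a decoded JSON payload."""
    if isinstance(payload, dict):
        if "aiCreditsUsed" in payload:
            return payload["aiCreditsUsed"]
        # Recurse into nested objects.
        for value in payload.values():
            found = find_ai_credits_used(value)
            if found is not None:
                return found
    elif isinstance(payload, list):
        for item in payload:
            found = find_ai_credits_used(item)
            if found is not None:
                return found
    # Field absent, for example when no AI visuals were generated.
    return None


# Hypothetical response shape, for illustration only.
sample_response = {"success": True, "data": {"status": "completed", "aiCreditsUsed": 12}}
print(find_ai_credits_used(sample_response))
```

Returning None for a missing field matters here because the field is only included when AI visuals are generated.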
AI Studio: AI Image and Video Generation
Pictory now offers AI Studio, a new set of APIs for generating AI-powered images and videos directly from text prompts. AI Studio gives you access to multiple AI models, aspect ratios, visual styles, and advanced creative input modes.
Image Generation
- Generate images from text prompts with your choice of AI model and visual style.
- Use a reference image to create variations, replace subjects, or apply transformations.
- Choose from models including seedream3.0, flux-schnell, nanobanana, and nanobanana-pro.
- Apply visual styles such as photorealistic, artistic, cartoon, minimalist, vintage, and futuristic.
- Generate videos from text prompts with configurable duration and aspect ratio.
- Start a video from a specific image using the first frame input for precise visual control.
- Extend an existing video with new AI-generated content that continues from the original.
- Provide up to three reference images to guide subject appearance and scene composition.
- Choose from models including veo3.1, veo3.1_fast, and pixverse5.5.
- Retrieve paginated lists of all your generated images and videos.
- Results are sorted by creation date with the most recent items first.
- Every completed job reports the number of AI credits consumed.
- Image generation is charged per image, and video generation is charged per second of output.
- See the rate card in each model’s documentation for exact pricing.
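The two billing rules above reduce to simple arithmetic: images are charged per image, and videos per second of output. The sketch below shows that calculation with placeholder rates. The rate values are not real prices; take actual rates from each model's rate card.

```python
def estimate_image_credits(image_count: int, credits_per_image: float) -> float:
    """Images are charged per image."""
    if image_count < 0 or credits_per_image < 0:
        raise ValueError("count and rate must be non-negative")
    return image_count * credits_per_image


def estimate_video_credits(duration_seconds: float, credits_per_second: float) -> float:
    """Videos are charged per second of output."""
    if duration_seconds < 0 or credits_per_second < 0:
        raise ValueError("duration and rate must be non-negative")
    return duration_seconds * credits_per_second


# Placeholder rates chosen only to make the arithmetic concrete.
image_total = estimate_image_credits(image_count=4, credits_per_image=2.0)
video_total = estimate_video_credits(duration_seconds=8, credits_per_second=1.5)
print(image_total, video_total)
```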
Dynamic Captions with Word-Level Timing
Dynamic captions now render subtitles word by word, synchronized precisely with the voiceover audio. This produces a more engaging viewing experience where each word appears on screen as it is spoken.
- Set maxSubtitleLines to a value from 1 to 4 to control how many lines of subtitles are displayed at a time.
- Word-level timing is applied by default when dynamic captions are active.
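A small client-side check can catch an out-of-range maxSubtitleLines before a request is sent. This is a sketch under assumptions: the 1 to 4 range and the field name come from the notes, while the helper name and the idea of returning a standalone settings fragment are illustrative.

```python
from typing import Dict


def dynamic_caption_settings(max_subtitle_lines: int) -> Dict[str, int]:
    """Validate maxSubtitleLines against the documented range of 1 to 4."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_subtitle_lines, bool) or not isinstance(max_subtitle_lines, int):
        raise TypeError("maxSubtitleLines must be an integer")
    if not 1 <= max_subtitle_lines <= 4:
        raise ValueError("maxSubtitleLines must be between 1 and 4")
    return {"maxSubtitleLines": max_subtitle_lines}


caption_settings = dynamic_caption_settings(2)
print(caption_settings)
```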
