What You’ll Learn
- Audio to Video: Convert podcasts and audio files to videos
- Auto Transcribe: Automatically transcribe audio content
- Smart Visuals: Generate visuals synchronized to audio
- Captions Included: Auto-generated subtitles from transcription
Before You Begin
Make sure you have:
- A Pictory API key (get one here)
- Node.js or Python installed on your machine
- Audio file accessible via public URL
- Basic understanding of audio/podcast formats
How Podcast-to-Video Works
When you convert audio to video:
- Audio Processing - Your audio file is accessed and analyzed
- Transcription - AI automatically transcribes the audio content
- Scene Generation - Content is split into logical scenes based on the transcript
- Visual Selection - Appropriate stock visuals are selected to match the audio content
- Caption Generation - Subtitles are created from the transcription
- Video Rendering - Final video is assembled with audio, visuals, and captions synchronized
The audio file must be accessible via a public URL. Upload it to cloud storage (Google Drive, Dropbox, AWS S3) and use a direct, public download link rather than a preview or streaming link. Processing time increases with audio length.
Complete Example
Understanding the Parameters
Main Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| videoName | string | Yes | A descriptive name for your video project |
| scenes | array | Yes | Array of scene objects |
Scene Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audioUrl | string | Yes | Public URL to the audio file (podcast, interview, etc.) |
| audioLanguage | string | Yes | Language code for audio transcription (e.g., “en-US”) |
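The two tables above map directly onto a JSON request body. The sketch below only builds that body from the documented parameters; it does not call the API, because the endpoint and authentication header are not stated in this section. The helper name `build_podcast_request` is hypothetical, and the sample URL is a placeholder.

```python
import json


def build_podcast_request(video_name: str, audio_url: str, audio_language: str) -> dict:
    """Build a request body using only the parameters documented above."""
    if not video_name:
        raise ValueError("videoName is required")
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audioUrl must be a public http(s) URL")
    if not audio_language:
        raise ValueError("audioLanguage is required")
    return {
        "videoName": video_name,
        # One scene that carries the audio source and its transcription language.
        "scenes": [
            {"audioUrl": audio_url, "audioLanguage": audio_language},
        ],
    }


if __name__ == "__main__":
    body = build_podcast_request(
        "Episode 12", "https://example.com/episode-12.mp3", "en-US"
    )
    print(json.dumps(body, indent=2))
```

Send the resulting dictionary as the JSON body of the render request described in the API Reference, along with your API key.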
Supported Audio Formats
| Format | Extension | Description |
|---|---|---|
| MP3 | .mp3 | Most common podcast format (recommended) |
| WAV | .wav | Uncompressed audio, high quality |
| M4A | .m4a | Apple audio format, good quality |
| AAC | .aac | Advanced audio codec |
| FLAC | .flac | Lossless audio compression |
| OGG | .ogg | Open-source audio format |
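A quick client-side check against the table above can catch unsupported files before a job is submitted. This is a minimal sketch: the function name `has_supported_extension` is hypothetical, and it inspects only the URL path, so a valid link with no file extension (for example, a signed download URL) would be reported as unsupported even though the API might accept it.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the Supported Audio Formats table.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg"}


def has_supported_extension(audio_url: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions."""
    path = urlparse(audio_url).path  # drops the query string and fragment
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```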
Common Use Cases
- Podcast Episodes for YouTube
- Interview Clips for Social Media
- Audio Blog Posts
- Webinar Audio Archives
Best Practices
Optimize Audio Quality
Ensure high-quality audio for best transcription:
- Clear Speech: Use good microphones for recording
- Minimize Noise: Reduce background noise and echo
- Proper Levels: Avoid audio that’s too quiet or distorted
- Single Speaker: One speaker at a time works best for transcription
- Speaking Pace: Natural speaking pace (not too fast) improves accuracy
- File Quality: Use 128kbps or higher bitrate for MP3
Choose Appropriate Content Length
Match content length to your platform and audience:
- Social Media: Extract 1-3 minute clips for maximum engagement
- YouTube: 5-15 minutes works well for episodic content
- Full Episodes: Consider breaking 30+ minute podcasts into segments
- Processing Time: Longer audio = longer processing (plan accordingly)
- Viewer Retention: Shorter clips often perform better on social platforms
Make Audio Files Accessible
Ensure your audio file can be accessed:
- Cloud Storage: Upload to Google Drive, Dropbox, or AWS S3
- Public Link: Generate a direct, public download link
- Test Access: Verify link works in incognito browser
- Stable URL: Ensure link won’t expire during processing
- Direct URL: Use direct file URL, not streaming or preview links
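The "Test Access" step can also be scripted. The sketch below sends an unauthenticated request with Python's standard library and reports whether it ends in HTTP 200. The function name `is_publicly_reachable` is hypothetical, and some hosts reject `HEAD` requests, so a `False` result is a prompt to check the link by hand rather than proof that the API cannot read it.

```python
import urllib.request


def is_publicly_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if an unauthenticated request to the URL ends in HTTP 200."""
    try:
        # Start with HEAD to avoid downloading the file; urlopen follows redirects.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (ValueError, OSError):
        # ValueError: malformed URL. OSError covers URLError, HTTPError, and timeouts.
        return False
```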
Plan for Transcription Accuracy
Improve AI transcription results:
- Clear Audio: Clean recordings transcribe more accurately
- Standard Accents: Clear, standard pronunciation works best
- Avoid Jargon: Technical terms may not transcribe perfectly
- Speaker Separation: Clear pauses between speakers help
- Background Music: Minimize or remove background music for better transcription
Extract Key Segments
Create focused, engaging content:
- Highlight Reels: Extract best moments from full episodes
- Topic Segments: Create separate videos for different topics
- Intro Clips: Use episode intros as social media teasers
- Quotes: Extract powerful quotes or insights as short clips
- Series: Create a series of related short videos from one episode
Troubleshooting
Error: Unable to access audio file
Problem: The API cannot download or process your audio file.
Solution:
- Verify the URL is publicly accessible (test in incognito browser)
- Ensure it’s a direct download link, not a streaming or preview link
- For Google Drive: Right-click → Share → “Anyone with the link” → Copy link
- For Dropbox: Share → Create link → change “dl=0” to “dl=1” at end of URL
- Check file hasn’t been deleted or moved
- Verify file format is supported (see table above)
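The Dropbox fix in the list above is easy to automate. This sketch rewrites the `dl` query parameter to `1` and leaves non-Dropbox URLs untouched. The function name `to_direct_dropbox_link` is hypothetical, and it goes slightly beyond the step above by appending `dl=1` when the parameter is missing altogether.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def to_direct_dropbox_link(share_url: str) -> str:
    """Turn a Dropbox share link (dl=0) into a direct download link (dl=1)."""
    parts = urlparse(share_url)
    host = parts.hostname or ""
    if host != "dropbox.com" and not host.endswith(".dropbox.com"):
        return share_url  # not a Dropbox link; leave it unchanged
    # Keep every other query parameter, drop any existing dl value, then force dl=1.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunparse(parts._replace(query=urlencode(query)))
```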
Transcription is inaccurate
Problem: Auto-generated captions don’t match the audio.
Solution:
- Improve audio quality (reduce background noise, use better microphone)
- Ensure speakers speak clearly and at moderate pace
- Reduce background music volume if present
- Check audio isn’t distorted or too quiet
- Try re-recording with better equipment/conditions
- Consider professional audio editing before conversion
Visuals don't match audio content
Problem: Selected stock visuals seem unrelated to podcast topic.
Solution:
- The AI selects visuals based on transcribed content
- Ensure speakers mention key visual concepts in the audio
- More descriptive language helps AI select better visuals
- Consider extracting specific segments with clearer topics
- Review final video - sometimes visuals are thematic rather than literal
Processing takes very long
Problem: Job status shows “in-progress” for extended periods.
Solution:
- Audio processing time depends on length and quality
- Expected times:
  - 1-5 minutes audio: 5-10 minutes processing
  - 10-30 minutes audio: 15-30 minutes processing
  - 60+ minutes audio: 45-90 minutes processing
- Large file sizes take longer to download and process
- Check status every 5-10 seconds, not more frequently
- If stuck for 2x expected time, contact support with job ID
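The polling advice above can be sketched as a small loop. It checks the status at a fixed interval, returns as soon as the status is no longer `in-progress`, and gives up after twice the expected processing time. Everything here is an illustration: `wait_for_job` is a hypothetical helper, and `fetch_status` stands in for whatever function you write to call the Get Job Status endpoint and return its status string. The final status values are not listed in this section, so the loop tests only for `in-progress`.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    expected_seconds: float,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job status is no longer "in-progress".

    Raises TimeoutError once 2x the expected processing time has elapsed.
    """
    deadline = clock() + 2 * expected_seconds
    while True:
        status = fetch_status()
        if status != "in-progress":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "job exceeded 2x the expected time; contact support with the job ID"
            )
        sleep(poll_interval)  # stay within the recommended 5-10 second interval
```

For a 20-minute episode with an expected upper bound of 30 minutes, `expected_seconds=30 * 60` makes the loop give up after an hour. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.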
Video has no captions
Problem: Expected auto-generated captions but they’re missing.
Solution:
- Captions are automatically generated from transcription
- Verify audio contains clear, audible speech
- Check that audio isn’t purely music or sound effects
- Ensure audio file isn’t corrupted
- Try with a different audio file to test
- Contact support if issue persists
Next Steps
Enhance your podcast videos with these features:
- Background Music: Add music layers to your podcast videos
- Custom Captions: Customize or translate auto-generated captions
- Brand Settings: Apply consistent branding to all videos
- Intro/Outro: Add branded intro and outro sequences
API Reference
For complete technical details, see:
- Render Storyboard Video - Full API specification
- Get Job Status - Monitor job status and progress
