This guide shows you how to convert podcast episodes or other audio files into videos with automatically generated visuals and captions. It is useful for repurposing audio content for YouTube, social media, or other video platforms.

What You’ll Learn

  • Audio to Video: Convert podcasts and audio files to videos
  • Auto Transcribe: Automatically transcribe audio content
  • Smart Visuals: Generate visuals synchronized to audio
  • Captions Included: Auto-generated subtitles from transcription

Before You Begin

Make sure you have:
  • A Pictory API key (get one here)
  • Node.js or Python installed on your machine
  • Audio file accessible via public URL
  • Basic understanding of audio/podcast formats

For the Node.js example in this guide, install the axios HTTP client:

npm install axios

How Podcast-to-Video Works

When you convert audio to video:
  1. Audio Processing - Your audio file is accessed and analyzed
  2. Transcription - AI automatically transcribes the audio content
  3. Scene Generation - Content is split into logical scenes based on the transcript
  4. Visual Selection - Appropriate stock visuals are selected to match the audio content
  5. Caption Generation - Subtitles are created from the transcription
  6. Video Rendering - Final video is assembled with audio, visuals, and captions synchronized
The audio file must be accessible via a public URL. Upload it to cloud storage (Google Drive, Dropbox, AWS S3) and use a public link that downloads the file directly rather than opening a preview page. Processing time increases with audio length.

Complete Example

import axios from "axios";

const API_BASE_URL = "https://api.pictory.ai/pictoryapis";
const API_KEY = "YOUR_API_KEY";

// Sample audio URL - replace with your own audio file URL
const AUDIO_URL = "https://pictory-static.pictorycontent.com/sample_podcast.mp3";

async function createPodcastToVideo() {
  try {
    console.log("Creating video from podcast/audio...");

    const response = await axios.post(
      `${API_BASE_URL}/v2/video/storyboard/render`,
      {
        videoName: "podcast_to_video",
        url: AUDIO_URL,                    // Audio file URL
      },
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: API_KEY,
        },
      }
    );

    const jobId = response.data.data.jobId;
    console.log("✓ Video creation started!");
    console.log("Job ID:", jobId);

    // Monitor progress
    console.log("\nMonitoring video creation...");
    let jobCompleted = false;
    let jobResult = null;

    while (!jobCompleted) {
      const statusResponse = await axios.get(
        `${API_BASE_URL}/v1/jobs/${jobId}`,
        {
          headers: { Authorization: API_KEY },
        }
      );

      const status = statusResponse.data.data.status;
      console.log("Status:", status);

      if (status === "completed") {
        jobCompleted = true;
        jobResult = statusResponse.data;
        console.log("\n✓ Video from podcast is ready!");
        console.log("Video URL:", jobResult.data.videoURL);
      } else if (status === "failed") {
        throw new Error("Video creation failed: " + JSON.stringify(statusResponse.data));
      } else {
        // Still in progress: wait 5 seconds before polling again
        await new Promise(resolve => setTimeout(resolve, 5000));
      }
    }

    return jobResult;
  } catch (error) {
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

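// Errors are already logged inside the function; set a failing exit code
// instead of leaving the rejected promise unhandled.
createPodcastToVideo().catch(() => {
  process.exitCode = 1;
});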
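
The polling loop in the example runs until the job reports "completed" or "failed", so a job that never leaves an in-progress state would keep the script running indefinitely. Below is a minimal sketch of a bounded variant. The helper name `pollJob`, its options, and the injected `fetchStatus` function are illustrative assumptions, not part of the Pictory API; `fetchStatus` is expected to wrap the GET request to the jobs endpoint shown above and resolve to an object with a `status` field.

```javascript
// Hypothetical helper: poll until the job finishes or a deadline passes.
// fetchStatus is any async function that resolves to an object with a
// `status` field, for example the `data` object of the job status response.
async function pollJob(fetchStatus, { intervalMs = 5000, timeoutMs = 30 * 60 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const job = await fetchStatus();

    if (job.status === "completed") return job;
    if (job.status === "failed") {
      throw new Error("Video creation failed: " + JSON.stringify(job));
    }
    // Stop if the next poll would happen after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job still "${job.status}" after ${timeoutMs} ms; giving up`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Passing the status request in as a function keeps the loop independent of the HTTP client, which also makes it easy to exercise with a stub.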

Understanding the Parameters

Parameter | Type   | Required | Description
videoName | string | Yes      | A descriptive name for your video project
url       | string | Yes      | Public URL to the audio file (podcast, interview, etc.)
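
Because both parameters are required, it can help to validate them locally before sending the request. The following is a minimal sketch; the function name `buildRenderRequest` is hypothetical, and the restriction to http/https URLs is an assumption based on the "public URL" requirement rather than a documented API rule.

```javascript
// Hypothetical helper: validate the two required fields and return the request body.
function buildRenderRequest(videoName, url) {
  if (typeof videoName !== "string" || videoName.trim() === "") {
    throw new TypeError("videoName is required and must be a non-empty string");
  }

  let parsed;
  try {
    parsed = new URL(url); // throws on missing or malformed input
  } catch {
    throw new TypeError("url must be an absolute URL");
  }
  // Assumption: a "public URL" means a plain http(s) address.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new TypeError("url must use http or https");
  }

  return { videoName: videoName.trim(), url: parsed.href };
}
```

The returned object can be passed as the request body in the complete example above.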

Supported Audio Formats

Format | Extension | Description
MP3    | .mp3      | Most common podcast format (recommended)
WAV    | .wav      | Uncompressed audio, high quality
M4A    | .m4a      | Apple audio format, good quality
AAC    | .aac      | Advanced audio codec
FLAC   | .flac     | Lossless audio compression
OGG    | .ogg      | Open-source audio format
Best Format: Use MP3 for the best balance of quality and processing speed. Ensure audio is mono or stereo with clear speech.
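
A quick local check against the extensions listed above can catch an unsupported file before a job is submitted. This is only a sketch: the helper name is hypothetical, and it inspects the URL path alone, so a valid audio URL without a file extension would be reported as unsupported.

```javascript
// Extensions taken from the Supported Audio Formats table.
const SUPPORTED_AUDIO_EXTENSIONS = [".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg"];

// Hypothetical helper: true if the URL's path ends in a listed extension.
// Query strings such as "?dl=1" are ignored because only the pathname is checked.
function hasSupportedAudioExtension(audioUrl) {
  const path = new URL(audioUrl).pathname.toLowerCase();
  return SUPPORTED_AUDIO_EXTENSIONS.some(ext => path.endsWith(ext));
}
```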

Common Use Cases

Podcast Episodes for YouTube

{
  videoName: "podcast_episode_042_youtube",
  url: "https://storage.example.com/episode-042.mp3"
}
Result: Full podcast episode as video with matching visuals and captions.

Interview Clips for Social Media

{
  videoName: "interview_highlight_linkedin",
  url: "https://storage.example.com/interview-segment.mp3"
}
Result: Short interview clip with professional visuals for LinkedIn.

Audio Blog Posts

{
  videoName: "audio_blog_marketing_tips",
  url: "https://storage.example.com/blog-audio.mp3"
}
Result: Audio blog converted to video format with relevant visuals.

Webinar Audio Archives

{
  videoName: "webinar_q_and_a_session",
  url: "https://storage.example.com/webinar-audio.m4a"
}
Result: Webinar audio transformed into watchable video content.

Best Practices

Ensure high-quality audio for best transcription:
  • Clear Speech: Use good microphones for recording
  • Minimize Noise: Reduce background noise and echo
  • Proper Levels: Avoid audio that’s too quiet or distorted
  • Single Speaker: One speaker at a time works best for transcription
  • Speaking Pace: Natural speaking pace (not too fast) improves accuracy
  • File Quality: Use a bitrate of 128 kbps or higher for MP3
Match content length to your platform and audience:
  • Social Media: Extract 1-3 minute clips for maximum engagement
  • YouTube: 5-15 minutes works well for episodic content
  • Full Episodes: Consider breaking 30+ minute podcasts into segments
  • Processing Time: Longer audio = longer processing (plan accordingly)
  • Viewer Retention: Shorter clips often perform better on social platforms
Ensure your audio file can be accessed:
  • Cloud Storage: Upload to Google Drive, Dropbox, or AWS S3
  • Public Link: Generate a direct, public download link
  • Test Access: Verify link works in incognito browser
  • Stable URL: Ensure link won’t expire during processing
  • Direct URL: Use direct file URL, not streaming or preview links
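
The access checks above can also be scripted. The sketch below sends a HEAD request and rejects responses that look like a web page instead of a file. The function name `checkAudioUrl` is hypothetical, the HTML heuristic is an assumption about how preview links typically behave, and some hosts do not answer HEAD requests, so treat a failure as a hint rather than proof. It relies on the global `fetch` available in Node.js 18 or later.

```javascript
// Hypothetical pre-flight check. fetchImpl defaults to the global fetch
// and can be replaced with a stub for testing.
async function checkAudioUrl(audioUrl, fetchImpl = fetch) {
  const response = await fetchImpl(audioUrl, { method: "HEAD", redirect: "follow" });

  if (!response.ok) {
    return { ok: false, reason: `HTTP ${response.status}` };
  }

  const contentType = response.headers.get("content-type") || "";
  // Assumption: preview or share pages are served as HTML; direct file links are not.
  if (contentType.startsWith("text/html")) {
    return { ok: false, reason: "URL returns an HTML page, not a file" };
  }

  return { ok: true, contentType };
}
```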
Improve AI transcription results:
  • Clear Audio: Clean recordings transcribe more accurately
  • Standard Accents: Clear, standard pronunciation works best
  • Avoid Jargon: Technical terms may not transcribe perfectly
  • Speaker Separation: Clear pauses between speakers help
  • Background Music: Minimize or remove background music for better transcription
Create focused, engaging content:
  • Highlight Reels: Extract best moments from full episodes
  • Topic Segments: Create separate videos for different topics
  • Intro Clips: Use episode intros as social media teasers
  • Quotes: Extract powerful quotes or insights as short clips
  • Series: Create a series of related short videos from one episode

Troubleshooting

Problem: The API cannot download or process your audio file.
Solution:
  • Verify the URL is publicly accessible (test in incognito browser)
  • Ensure it’s a direct download link, not a streaming or preview link
  • For Google Drive: Right-click → Share → “Anyone with the link” → Copy link
  • For Dropbox: Share → Create link → change “dl=0” to “dl=1” at end of URL
  • Check file hasn’t been deleted or moved
  • Verify file format is supported (see table above)
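
The Dropbox step above (changing "dl=0" to "dl=1") can be done programmatically. This is a minimal sketch; the helper name is hypothetical and the URL in the usage line is a made-up example, not a real share link.

```javascript
// Hypothetical helper: switch a Dropbox share link from preview (dl=0)
// to direct download (dl=1), keeping any other query parameters.
function toDropboxDirectLink(shareUrl) {
  const url = new URL(shareUrl);
  url.searchParams.set("dl", "1"); // replaces dl=0, or adds dl=1 if absent
  return url.toString();
}

// Usage with a made-up link:
const directLink = toDropboxDirectLink("https://www.dropbox.com/s/abc123/episode.mp3?dl=0");
```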
Problem: Auto-generated captions don’t match the audio.
Solution:
  • Improve audio quality (reduce background noise, use better microphone)
  • Ensure speakers speak clearly and at moderate pace
  • Reduce background music volume if present
  • Check audio isn’t distorted or too quiet
  • Try re-recording with better equipment/conditions
  • Consider professional audio editing before conversion
Problem: Selected stock visuals seem unrelated to podcast topic.
Solution:
  • The AI selects visuals based on transcribed content
  • Ensure speakers mention key visual concepts in the audio
  • More descriptive language helps AI select better visuals
  • Consider extracting specific segments with clearer topics
  • Review final video - sometimes visuals are thematic rather than literal
Problem: Job status shows “in-progress” for extended periods.
Solution:
  • Audio processing time depends on length and quality
  • Expected times:
    • 1-5 minutes audio: 5-10 minutes processing
    • 10-30 minutes audio: 15-30 minutes processing
    • 60+ minutes audio: 45-90 minutes processing
  • Large file sizes take longer to download and process
  • Check status every 5-10 seconds, not more frequently
  • If stuck for 2x expected time, contact support with job ID
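
The expected times above can be turned into a simple "is this job stuck?" check. The sketch below is illustrative only: the names are hypothetical, audio lengths that fall between the listed ranges (5-10 and 30-60 minutes) are assigned to the next larger range as an assumption, and "2x expected time" is interpreted as twice the upper bound of the range.

```javascript
// Ranges copied from the list above. Assumption: lengths between the listed
// ranges fall into the next larger range.
const PROCESSING_ESTIMATES = [
  { maxAudioMinutes: 5, minMinutes: 5, maxMinutes: 10 },
  { maxAudioMinutes: 30, minMinutes: 15, maxMinutes: 30 },
  { maxAudioMinutes: Infinity, minMinutes: 45, maxMinutes: 90 },
];

// Returns the expected processing range for a given audio length in minutes.
function expectedProcessingMinutes(audioMinutes) {
  return PROCESSING_ESTIMATES.find(range => audioMinutes <= range.maxAudioMinutes);
}

// True once elapsed time exceeds twice the upper bound of the expected range.
function isLikelyStuck(audioMinutes, elapsedMinutes) {
  return elapsedMinutes > 2 * expectedProcessingMinutes(audioMinutes).maxMinutes;
}
```

For a 3-minute clip, for example, this treats the job as stuck only after more than 20 minutes have passed.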
Problem: Expected auto-generated captions but they’re missing.
Solution:
  • Captions are automatically generated from transcription
  • Verify audio contains clear, audible speech
  • Check that audio isn’t purely music or sound effects
  • Ensure audio file isn’t corrupted
  • Try with a different audio file to test
  • Contact support if issue persists
