Video Transcription

Overview

The Video Transcription API generates transcriptions and subtitles for your video or audio files. It supports the 20 languages listed under Supported Languages, detects speech automatically, converts it to text, and returns timing information for subtitle creation. You can also provide your own transcript or request AI-generated highlights from the transcription.

This endpoint processes files asynchronously and returns a job ID that you can use to track the transcription status. Once the job completes, you receive the full transcript with timing information in formats suitable for SRT, VTT, or JSON subtitle files.

Request Headers

Authorization

string

required

API key for authentication

Authorization: YOUR_API_KEY

Content-Type

string

required

Must be set to application/json

Content-Type: application/json
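
As a minimal sketch, the two required headers can be built in one place so the API key is not hard-coded in every script. The environment variable name `PICTORY_API_KEY` and the helper `build_headers` are assumptions made for this example, not part of the API.

```python
import os


def build_headers(api_key=None):
    """Return the two required request headers.

    PICTORY_API_KEY is a hypothetical environment variable name chosen
    for this sketch; use whatever secret storage your project prefers.
    """
    key = api_key or os.environ.get("PICTORY_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": key,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of `requests.post`, as in the examples under Common Use Cases.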

Request Parameters

Option 1: Automatic Transcription (Recommended)

Use this option to automatically transcribe your video/audio file.

fileUrl

string

required

The URL of the video or audio file to transcribe. The file must be accessible via HTTPS. Supported formats: MP4, MOV, AVI, MP3, WAV, M4A

mediaType

string

default:"video"

The type of media file being transcribed. Options:

video - Video file with audio track (default)
audio - Audio-only file

language

string

default:"en-US"

Language code for transcription. Defaults to en-US if not specified. See the Supported Languages section below for the complete list.

webhook

string

Webhook URL where transcription results will be sent via POST request when processing completes. The webhook will receive the complete transcript data including text, word-level timing information, and subtitle formats (SRT, VTT).
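
A webhook receiver could look roughly like the sketch below, which uses only the Python standard library. It accepts the POST request, checks that the body is a JSON object, and answers with a 200 status code. The field names inside the payload are not listed on this page, so the handler only logs the top-level keys; `parse_webhook_body`, `TranscriptionWebhook`, `run_server`, and the port are names chosen for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw):
    """Decode a webhook body; return the parsed dict, or None if it is not a JSON object."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        if payload is None:
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # The payload schema is an unknown here, so only the keys are logged.
        print("Received webhook with keys:", sorted(payload))
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


def run_server(port=8000):
    """Serve the webhook endpoint until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), TranscriptionWebhook).serve_forever()
```

Calling `run_server()` starts the listener. In production you would normally put this behind HTTPS and hand the payload to a queue or database instead of printing it.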

Option 2: Provide Your Own Transcript

Use this option when you already have word-level transcript data with timing.

fileUrl

string

required

The URL of the video or audio file.

transcript

object[]

required

Array of word-level transcript items with timing information.

Required for Option 2 only. The “Try it now” form may show this as optional because it combines both options, but this field is required when providing your own transcript.

Each transcript item must contain:

content (string, required): The word or punctuation text
type (string, required): Type of content - "word", "punctuation", "punctuated_word", or "sentence"
start (number, required): Start time in seconds (supports decimals)
end (number, required): End time in seconds (supports decimals)
speaker (string, optional): Speaker name (default: "Speaker 1")
endOfSentence (boolean, optional): Set to true if this is the last word/punctuation of a sentence

Example format:

[
  {
    "content": "Hello",
    "type": "word",
    "start": 0.0,
    "end": 0.5,
    "speaker": "Speaker 1"
  },
  {
    "content": "world",
    "type": "word",
    "start": 0.5,
    "end": 1.0,
    "speaker": "Speaker 1"
  },
  {
    "content": ".",
    "type": "punctuation",
    "start": 1.0,
    "end": 1.0,
    "speaker": "Speaker 1",
    "endOfSentence": true
  }
]
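
Before submitting your own transcript, it may help to check each item against the field list above. The following is a minimal sketch of such a check. The function name `validate_transcript` is hypothetical, and the rule `0 <= start <= end` is a sanity check assumed for this example rather than a documented API constraint.

```python
ALLOWED_TYPES = {"word", "punctuation", "punctuated_word", "sentence"}


def validate_transcript(items):
    """Return a list of human-readable problems; an empty list means the items look valid."""
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item.get("content"), str):
            problems.append(f"item {i}: 'content' must be a string")
        if item.get("type") not in ALLOWED_TYPES:
            problems.append(f"item {i}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
        start, end = item.get("start"), item.get("end")
        numeric = all(
            isinstance(t, (int, float)) and not isinstance(t, bool)
            for t in (start, end)
        )
        if not numeric:
            problems.append(f"item {i}: 'start' and 'end' must be numbers")
        elif start < 0 or end < start:
            # Assumed sanity check, not a rule stated by the API.
            problems.append(f"item {i}: expected 0 <= start <= end")
    return problems
```

Running it on the example format above returns an empty list. An item that uses a `text` key instead of `content`, or omits `type`, is reported with one message per missing field.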

highlights

object[]

Pre-defined highlight segments (when providing your own transcript). Each highlight segment contains timing information for marking important sections.

mediaType

string

default:"video"

Media type: "video" or "audio"

language

string

default:"en-US"

Language code (e.g., "en-US", "es-ES", "fr-CA")

webhook

string

Webhook URL for job completion notification

Supported Languages

The transcription service supports the following languages:

en-US - English (United States)
en-AU - English (Australia)
en-GB - English (United Kingdom)
fr-CA - French (Canada)
de-DE - German (Germany)
it-IT - Italian (Italy)
es-ES - Spanish (Spain)
ja-JP - Japanese (Japan)
ko-KR - Korean (South Korea)
ru-RU - Russian (Russia)
hi-IN - Hindi (India)
ta-IN - Tamil (India)
pt-BR - Portuguese (Brazil)
zh-CN - Chinese (Simplified)
ar-SA - Arabic (Saudi Arabia)
nl-NL - Dutch (Netherlands)
pl-PL - Polish (Poland)
tr-TR - Turkish (Turkey)
sv-SE - Swedish (Sweden)
da-DK - Danish (Denmark)
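
Checking the language code on the client side avoids submitting a job that is likely to fail or transcribe poorly. The sketch below copies the codes from the list above. `resolve_language` is a hypothetical helper, and exact, case-sensitive matching is an assumption.

```python
SUPPORTED_LANGUAGES = {
    "en-US", "en-AU", "en-GB", "fr-CA", "de-DE", "it-IT", "es-ES",
    "ja-JP", "ko-KR", "ru-RU", "hi-IN", "ta-IN", "pt-BR", "zh-CN",
    "ar-SA", "nl-NL", "pl-PL", "tr-TR", "sv-SE", "da-DK",
}


def resolve_language(code=None):
    """Return a supported language code, defaulting to en-US when none is given."""
    if code is None:
        return "en-US"
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code
```

For example, `resolve_language("ja-JP")` returns the code unchanged, while `resolve_language("fr-FR")` raises `ValueError` because only `fr-CA` appears in the list.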

Response

Success Response

When the transcription job is successfully created:

{
  "success": true,
  "data": {
    "jobId": "95333422-8e76-4962-812b-5b6d7276451a"
  }
}

success

boolean

Indicates whether the request was successful

data

object

Contains the transcription job information

jobId

string

Unique identifier for the transcription job. Use this to track the job status and retrieve results via the Get Job by ID API.

Job Status Response (via Get Job API)

While the job is processing, the Get Job API returns:

{
  "job_id": "95333422-8e76-4962-812b-5b6d7276451a",
  "success": true,
  "data": {
    "status": "in-progress"
  }
}

When transcription completes, you will receive the full result with transcript data via webhook or by polling the Get Job API.
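
If you poll instead of using a webhook, a loop with increasing delays keeps the request volume low. The sketch below is written under two assumptions. First, `fetch_job` is a function you supply that calls the Get Job API and returns the parsed JSON; its URL is not given on this page, so it is left out. Second, any status other than "in-progress" is treated as final, because that is the only status value documented here.

```python
import time


def wait_for_job(fetch_job, job_id, interval=5.0, max_interval=60.0,
                 timeout=1800.0, sleep=time.sleep):
    """Poll until the job leaves the "in-progress" state or the timeout elapses.

    fetch_job is a caller-supplied (hypothetical) function that queries the
    Get Job API for job_id and returns the decoded JSON response.
    """
    waited = 0.0
    while True:
        result = fetch_job(job_id)
        status = result.get("data", {}).get("status")
        if status != "in-progress":
            return result
        if waited >= timeout:
            raise TimeoutError(f"Job {job_id} still in progress after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Double the delay after each attempt, up to max_interval.
        interval = min(interval * 2, max_interval)
```

The default interval and timeout values are arbitrary starting points. The `sleep` parameter exists so the loop can be tested without real waiting.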

Error Response (400 Bad Request)

{
  "success": false,
  "data": {
    "error_code": "4000",
    "error_message": "CREATE_TRANSCRIPTION_JOB_FAILED"
  }
}
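
One way to handle both response shapes is to convert a failed response into an exception. The sketch below relies only on the fields shown in the success and error examples above. `TranscriptionError` and `extract_job_id` are names invented for this example.

```python
class TranscriptionError(Exception):
    """Raised when the API reports success: false."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def extract_job_id(body):
    """Return the jobId from a create-job response body, or raise TranscriptionError."""
    data = body.get("data") or {}
    if not body.get("success"):
        raise TranscriptionError(
            data.get("error_code", "unknown"),
            data.get("error_message", "unknown"),
        )
    return data["jobId"]
```

With `requests`, this would typically be called as `extract_job_id(response.json())` right after the POST request.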

Code Examples

curl --request POST \
  --url https://api.pictory.ai/pictoryapis/v2/transcription \
  --header 'Authorization: YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "fileUrl": "https://example.com/video.mp4",
    "mediaType": "video",
    "language": "en-US",
    "webhook": "https://your-domain.com/webhooks/transcription"
  }' | python -m json.tool

Common Use Cases

1. Basic Video Transcription

Generate a transcript for a video file in English:

import requests

url = "https://api.pictory.ai/pictoryapis/v2/transcription"
headers = {
    "Authorization": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "fileUrl": "https://example.com/presentation.mp4",
    "mediaType": "video",
    "language": "en-US",
    "webhook": "https://your-domain.com/webhooks/transcription"
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()

if data.get("success"):
    print(f"Job ID: {data['data']['jobId']}")
    print("You will receive the transcript at your webhook URL when processing is complete")

2. Multi-Language Transcription

Transcribe content in different languages:

import requests

def transcribe_video(video_url, language_code, webhook_url):
    url = "https://api.pictory.ai/pictoryapis/v2/transcription"
    headers = {
        "Authorization": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "fileUrl": video_url,
        "mediaType": "video",
        "language": language_code,
        "webhook": webhook_url
    }

    response = requests.post(url, json=payload, headers=headers)
    return response.json()

# Transcribe videos in different languages
videos = [
    ("https://example.com/english-video.mp4", "en-US"),
    ("https://example.com/spanish-video.mp4", "es-ES"),
    ("https://example.com/japanese-video.mp4", "ja-JP")
]

for video_url, language in videos:
    result = transcribe_video(video_url, language, "https://your-domain.com/webhooks/transcription")
    if result.get("success"):
        print(f"Started transcription for {language}: {result['data']['jobId']}")

3. Audio File Transcription

Transcribe audio-only files like podcasts or interviews:

import requests

url = "https://api.pictory.ai/pictoryapis/v2/transcription"
headers = {
    "Authorization": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "fileUrl": "https://example.com/podcast-episode.mp3",
    "mediaType": "audio",
    "language": "en-US",
    "webhook": "https://your-domain.com/webhooks/transcription"
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()

if data.get("success"):
    print(f"Podcast transcription started: {data['data']['jobId']}")

4. Transcription with Custom Transcript

Provide your own transcript and request highlights:

import requests

url = "https://api.pictory.ai/pictoryapis/v2/transcription"
headers = {
    "Authorization": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "fileUrl": "https://example.com/webinar.mp4",
    "mediaType": "video",
    "language": "en-US",
    "webhook": "https://your-domain.com/webhooks/transcription",
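    "transcript": [
        # Each item follows the Option 2 schema: content, type, start, end
        {"content": "Welcome to today's webinar on API integration.", "type": "sentence", "start": 0, "end": 4},
        {"content": "We'll cover authentication, endpoints, and best practices.", "type": "sentence", "start": 4, "end": 9},
        {"content": "First, let's discuss API keys and security.", "type": "sentence", "start": 9, "end": 13}
    ],
    # The documented parameter name is "highlights". This page does not specify
    # the fields of a highlight segment, so treat this entry as a placeholder.
    "highlights": [
        {"duration": 60, "type": "summary"}
    ]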
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()

if data.get("success"):
    print(f"Job ID: {data['data']['jobId']}")
    print("Custom transcript submitted with highlight request")

5. Batch Processing Multiple Videos

Process multiple videos and track their job IDs:

import requests
import time

def start_transcription(file_url, language):
    url = "https://api.pictory.ai/pictoryapis/v2/transcription"
    headers = {
        "Authorization": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "fileUrl": file_url,
        "mediaType": "video",
        "language": language,
        "webhook": "https://your-domain.com/webhooks/transcription"
    }

    response = requests.post(url, json=payload, headers=headers)
    return response.json()

# List of videos to transcribe
videos_to_process = [
    {"url": "https://example.com/video1.mp4", "language": "en-US", "title": "Introduction"},
    {"url": "https://example.com/video2.mp4", "language": "en-US", "title": "Tutorial Part 1"},
    {"url": "https://example.com/video3.mp4", "language": "es-ES", "title": "Spanish Version"}
]

job_tracker = []

for video in videos_to_process:
    result = start_transcription(video["url"], video["language"])

    if result.get("success"):
        job_info = {
            "title": video["title"],
            "jobId": result["data"]["jobId"],
            "language": video["language"]
        }
        job_tracker.append(job_info)
        print(f"Started: {video['title']} - Job ID: {result['data']['jobId']}")
        time.sleep(1)  # Rate limiting
    else:
        print(f"Failed to start: {video['title']}")

# Save job IDs for tracking
print(f"\nTotal jobs started: {len(job_tracker)}")
for job in job_tracker:
    print(f"  {job['title']}: {job['jobId']}")

Best Practices

Webhook Configuration: Ensure your webhook endpoint is configured to handle POST requests and can process the transcript payload. The webhook should return a 200 status code to acknowledge receipt.
File Accessibility: Make sure your video/audio files are publicly accessible via HTTPS. The transcription service must be able to download the file from the provided URL.
Language Selection: Choose the correct language code for best transcription accuracy. Using the wrong language will result in poor quality transcripts.
File Format: Use common formats like MP4, MP3, or WAV for best results. Ensure audio quality is clear for accurate transcription.
Processing Time: Transcription is an asynchronous process. Processing time varies based on file length and complexity. Use the webhook to receive results rather than polling.
Custom Transcripts: When providing custom transcripts, ensure timing information is accurate with start and end times in seconds. This enables proper synchronization.
Rate Limiting: Implement appropriate delays between batch requests to avoid rate limiting. Process videos sequentially with small delays between API calls.
Error Handling: Implement proper error handling for failed transcriptions. Check the webhook payload for error messages and retry if necessary.
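
The rate-limiting and error-handling advice above can be combined into a small retry helper. This is a sketch under assumptions: `submit` is any function that sends the payload and returns the parsed JSON response (for example, a wrapper around `requests.post`), and the retry count and delays are arbitrary defaults, not values recommended by the API.

```python
import time


def submit_with_retry(submit, payload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call submit(payload) until it reports success, waiting longer after each failure.

    Returns the last response, whether or not it succeeded.
    """
    last = None
    for attempt in range(attempts):
        last = submit(payload)
        if last.get("success"):
            return last
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return last
```

Retrying helps with transient failures only. A request that fails because of an invalid payload will fail again, so inspect `error_message` before retrying in a loop.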
