Video Transcription
Generate accurate transcriptions and subtitles for your videos
POST
Overview
The Video Transcription API generates accurate transcriptions and subtitles for your video or audio files. It supports 20 languages and can automatically detect speech, convert it to text, and provide timing information for subtitle creation. You can also provide your own transcript or request AI-generated highlights from the transcription.
This endpoint processes files asynchronously and returns a job ID that you can use to track the transcription status. Once complete, you will receive the full transcript with precise timing information in formats suitable for SRT, VTT, or JSON subtitle files.
Request Headers
API key for authentication
Content-Type: Must be set to application/json
Request Parameters
Option 1: Automatic Transcription (Recommended)
Use this option to automatically transcribe your video/audio file.
File URL: The URL of the video or audio file to transcribe. The file must be accessible via HTTPS. Supported formats: MP4, MOV, AVI, MP3, WAV, M4A
Media type: The type of media file being transcribed. Options:
- video - Video file with audio track (default)
- audio - Audio-only file
Language: Language code for transcription. Defaults to en-US if not specified. See the Supported Languages section below for the complete list.
Webhook: Webhook URL where transcription results will be sent via POST request when processing completes. The webhook will receive the complete transcript data including text, word-level timing information, and subtitle formats (SRT, VTT).
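The constraints above (HTTPS URL, supported formats, media type options, en-US default) can be checked before a request is sent. The following is a minimal sketch, not the API's own validation: the body keys fileUrl, mediaType, language, and webhook are hypothetical placeholders, and it assumes the file format can be read from the extension at the end of the URL path.

```python
from urllib.parse import urlparse

# Formats and media types as listed in the parameter descriptions above.
SUPPORTED_EXTENSIONS = {"mp4", "mov", "avi", "mp3", "wav", "m4a"}
MEDIA_TYPES = {"video", "audio"}


def build_auto_transcription_body(file_url, media_type="video",
                                  language="en-US", webhook=None):
    """Validate Option 1 inputs and return a request body as a dict.

    The keys used here (fileUrl, mediaType, language, webhook) are
    hypothetical names chosen for illustration only.
    """
    parsed = urlparse(file_url)
    if parsed.scheme != "https":
        raise ValueError("file URL must use HTTPS")

    # Assumption: the format is visible as a file extension in the URL path.
    path = parsed.path
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file format: {extension or 'none'}")

    if media_type not in MEDIA_TYPES:
        raise ValueError("media type must be 'video' or 'audio'")

    body = {"fileUrl": file_url, "mediaType": media_type, "language": language}
    if webhook is not None:
        body["webhook"] = webhook
    return body
```

Calling build_auto_transcription_body("https://example.com/media/talk.mp4") returns a body with the video and en-US defaults filled in, while an http:// URL raises ValueError.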
Option 2: Provide Your Own Transcript
Use this option when you already have word-level transcript data with timing.
File URL: The URL of the video or audio file.
Transcript: Array of word-level transcript items with timing information. Each transcript item has the following fields:
- content (string, required): The word or punctuation text
- type (string, required): Type of content - "word", "punctuation", "punctuated_word", or "sentence"
- start (number, required): Start time in seconds (supports decimals)
- end (number, required): End time in seconds (supports decimals)
- speaker (string, optional): Speaker name (default: "Speaker 1")
- endOfSentence (boolean, optional): Set to true if this is the last word/punctuation of a sentence
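As an illustration of the transcript item fields listed above, here is a minimal sketch that builds items for one sentence from (text, start, end) tuples. The field names come from the list above; the helper name sentence_to_items, the False default for endOfSentence, and the check that end is not earlier than start are assumptions made for this example.

```python
from dataclasses import asdict, dataclass

ITEM_TYPES = {"word", "punctuation", "punctuated_word", "sentence"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


@dataclass
class TranscriptItem:
    content: str
    type: str
    start: float                   # seconds, decimals allowed
    end: float                     # seconds, decimals allowed
    speaker: str = "Speaker 1"     # documented default
    endOfSentence: bool = False    # assumed default when omitted

    def __post_init__(self):
        if self.type not in ITEM_TYPES:
            raise ValueError(f"unknown item type: {self.type}")
        # Assumption: timings are non-negative and end is not before start.
        if self.start < 0 or self.end < self.start:
            raise ValueError("invalid timing for item: " + self.content)


def sentence_to_items(words):
    """Convert one sentence, given as (text, start, end) tuples, to item dicts."""
    items = []
    last_index = len(words) - 1
    for index, (text, start, end) in enumerate(words):
        item_type = "punctuation" if text in PUNCTUATION else "word"
        item = TranscriptItem(text, item_type, start, end,
                              endOfSentence=(index == last_index))
        items.append(asdict(item))
    return items
```

For example, sentence_to_items([("Hello", 0.0, 0.4), ("world", 0.5, 0.9), (".", 0.9, 0.95)]) yields three dictionaries, with only the final period marked as the end of the sentence.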
Highlights: Pre-defined highlight segments (when providing your own transcript). Each highlight segment contains timing information for marking important sections.
Media type: "video" or "audio"
Language: Language code (e.g., "en-US", "es-ES", "fr-FR")
Webhook: Webhook URL for job completion notification
Supported Languages
The transcription service supports the following languages:
- en-US - English (United States)
- en-AU - English (Australia)
- en-GB - English (United Kingdom)
- fr-CA - French (Canada)
- de-DE - German (Germany)
- it-IT - Italian (Italy)
- es-ES - Spanish (Spain)
- ja-JP - Japanese (Japan)
- ko-KR - Korean (South Korea)
- ru-RU - Russian (Russia)
- hi-IN - Hindi (India)
- ta-IN - Tamil (India)
- pt-BR - Portuguese (Brazil)
- zh-CN - Chinese (Simplified)
- ar-SA - Arabic (Saudi Arabia)
- nl-NL - Dutch (Netherlands)
- pl-PL - Polish (Poland)
- tr-TR - Turkish (Turkey)
- sv-SE - Swedish (Sweden)
- da-DK - Danish (Denmark)
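Because an unsupported or mistyped language code degrades results, it can help to check the code on the client side before submitting. This sketch copies the list above into a lookup table; the function name resolve_language and the strict, case-sensitive matching are assumptions for illustration, not documented API behavior.

```python
# Codes and names copied from the Supported Languages list above.
SUPPORTED_LANGUAGES = {
    "en-US": "English (United States)",
    "en-AU": "English (Australia)",
    "en-GB": "English (United Kingdom)",
    "fr-CA": "French (Canada)",
    "de-DE": "German (Germany)",
    "it-IT": "Italian (Italy)",
    "es-ES": "Spanish (Spain)",
    "ja-JP": "Japanese (Japan)",
    "ko-KR": "Korean (South Korea)",
    "ru-RU": "Russian (Russia)",
    "hi-IN": "Hindi (India)",
    "ta-IN": "Tamil (India)",
    "pt-BR": "Portuguese (Brazil)",
    "zh-CN": "Chinese (Simplified)",
    "ar-SA": "Arabic (Saudi Arabia)",
    "nl-NL": "Dutch (Netherlands)",
    "pl-PL": "Polish (Poland)",
    "tr-TR": "Turkish (Turkey)",
    "sv-SE": "Swedish (Sweden)",
    "da-DK": "Danish (Denmark)",
}
DEFAULT_LANGUAGE = "en-US"


def resolve_language(code=None):
    """Return the language code to send, falling back to the documented default."""
    if code is None:
        return DEFAULT_LANGUAGE
    # Assumption: codes are matched exactly, including letter case.
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code}")
    return code
```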
Response
Success Response
When the transcription request is successfully submitted, a job is created and a job ID is returned:
- Success flag: Indicates whether the request was successful
- Job data: Contains the transcription job information
- Job ID: Unique identifier for the transcription job. Use this to track the job status and retrieve results via the Get Transcription Job by ID endpoint.
Next Steps
Once you have the jobId, poll the Get Transcription Job by ID endpoint to check the job status and retrieve the transcript when processing is complete. Use a polling interval of 10–30 seconds.
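The polling step can be sketched as a small loop. This is an illustrative outline under stated assumptions: fetch_job is a hypothetical callable that wraps the Get Transcription Job by ID request and returns a dict, and the "status" key with the values "completed" and "failed" is assumed rather than taken from the API reference. Only the 10–30 second interval comes from the text above.

```python
import time


def poll_transcription_job(job_id, fetch_job, interval_seconds=15,
                           timeout_seconds=1800, sleep=time.sleep):
    """Poll until the job finishes, fails, or the timeout elapses.

    fetch_job(job_id) is a hypothetical callable returning the job as a dict.
    The 'status' key and its values are assumptions for this sketch.
    """
    if not 10 <= interval_seconds <= 30:
        raise ValueError("use a polling interval of 10-30 seconds")

    waited = 0
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"transcription job {job_id} failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} did not finish in time")
        sleep(interval_seconds)
        waited += interval_seconds
```

Passing sleep as a parameter keeps the loop testable without real delays; in normal use the default time.sleep applies.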
Code Examples
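The following is a minimal sketch of submitting a transcription request with Python's standard library. Several details are assumptions and should be checked against the API reference: ENDPOINT_URL is a placeholder rather than the real endpoint, the Authorization header name is a guess at how the API key is passed, and the response is assumed to nest the jobId under a "data" object. Only the POST method, the application/json content type, and the jobId field name come from this page.

```python
import json
import urllib.request

# Placeholder only; substitute the real endpoint from the API reference.
ENDPOINT_URL = "https://api.example.com/transcription"


def build_transcription_request(api_key, body, endpoint_url=ENDPOINT_URL):
    """Build (but do not send) a JSON POST request for a transcription job."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,  # assumed header name for the API key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_transcription(request, opener=urllib.request.urlopen):
    """Send the request and return the job ID from the response.

    Assumes a response shaped like {"success": ..., "data": {"jobId": ...}}.
    """
    with opener(request) as response:
        payload = json.load(response)
    return payload["data"]["jobId"]
```

The opener parameter exists so the network call can be replaced in tests; in normal use the default urllib.request.urlopen sends the request.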
Common Use Cases
1. Basic Video Transcription
Generate a transcript for a video file in English.
2. Multi-Language Transcription
Transcribe content in different languages.
3. Audio File Transcription
Transcribe audio-only files like podcasts or interviews.
4. Transcription with Custom Transcript
Provide your own transcript and request highlights.
5. Batch Processing Multiple Videos
Process multiple videos and track their job IDs.
Best Practices
- Webhook Configuration: Ensure your webhook endpoint is configured to handle POST requests and can process the transcript payload. The webhook should return a 200 status code to acknowledge receipt.
- File Accessibility: Make sure your video/audio files are publicly accessible via HTTPS. The transcription service must be able to download the file from the provided URL.
- Language Selection: Choose the correct language code for best transcription accuracy. Using the wrong language code will result in poor-quality transcripts.
- File Format: Use common formats like MP4, MP3, or WAV for best results. Ensure audio quality is clear for accurate transcription.
- Processing Time: Transcription is an asynchronous process. Processing time varies based on file length and complexity. Where possible, use the webhook to receive results instead of polling; if you do poll, keep to an interval of 10–30 seconds.
- Custom Transcripts: When providing custom transcripts, ensure timing information is accurate with start and end times in seconds. This enables proper synchronization.
- Rate Limiting: Implement appropriate delays between batch requests to avoid rate limiting. Process videos sequentially with small delays between API calls.
- Error Handling: Implement proper error handling for failed transcriptions. Check the webhook payload for error messages and retry if necessary.
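The rate limiting and error handling advice above can be sketched as a sequential batch loop. This is an outline under assumptions: submit is a hypothetical callable that posts one file URL and returns its job ID, and the delay and retry counts are arbitrary illustrative values, since this page does not state the actual rate limits.

```python
import time


def submit_batch(file_urls, submit, delay_seconds=2.0, max_attempts=3,
                 sleep=time.sleep):
    """Submit files one at a time, pausing between calls and retrying failures.

    submit(file_url) is a hypothetical callable that returns a job ID.
    Returns (job_ids, failures), both keyed by file URL.
    """
    job_ids = {}
    failures = {}
    for index, file_url in enumerate(file_urls):
        if index > 0:
            sleep(delay_seconds)  # small pause between API calls
        for attempt in range(1, max_attempts + 1):
            try:
                job_ids[file_url] = submit(file_url)
                break
            except Exception as error:
                if attempt == max_attempts:
                    failures[file_url] = str(error)
                else:
                    # Wait a little longer after each failed attempt.
                    sleep(delay_seconds * attempt)
    return job_ids, failures
```

Keeping the job IDs keyed by file URL makes it straightforward to match later webhook or polling results back to the original file.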
