The Video Transcription API generates accurate transcriptions and subtitles for your video or audio files. It supports over 20 languages and can automatically detect speech, convert it to text, and provide timing information for subtitle creation. You can also provide your own transcript or request AI-generated highlights from the transcription.

This endpoint processes files asynchronously and returns a job ID that you can use to track the transcription status. Once processing is complete, you’ll receive the full transcript with precise timing information in formats suitable for SRT, VTT, or JSON subtitle files.
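To illustrate how timed segments map onto a subtitle file, here is a minimal sketch that renders segments as SRT text. It assumes each segment is an object with `text`, `start`, and `end` (in seconds), mirroring the custom transcript format described on this page; the actual shape of the returned transcript may differ. The function names are illustrative and not part of the API.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render timed segments as the text of an SRT subtitle file.

    Assumption: every segment has 'text', 'start', and 'end' keys.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Each cue is: sequence number, time range, then the text.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Writing the returned string to a `.srt` file gives a subtitle track that most players can load.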
The URL where the transcription results will be sent via POST request when processing is complete. The webhook will receive the complete transcript data including text, timing information, and any requested highlights.
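A webhook receiver can be sketched with Python's standard library alone. The example below assumes the results arrive as a JSON object in the POST body. The payload's field names are not specified here, so the handler only logs the top-level keys. Responding with 400 to a malformed body is a choice made for this sketch, not documented API behavior.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def parse_webhook_body(raw: bytes) -> Optional[dict]:
    """Decode a webhook body; return None unless it is a JSON object."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


class TranscriptWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        if payload is None:
            # Assumption: reject bodies that are not JSON objects.
            self.send_response(400)
            self.end_headers()
            return
        # Field names are unknown here, so only the keys are logged.
        print("Received webhook with keys:", sorted(payload))
        # Acknowledge receipt with a 200 status code.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (blocks the calling thread)."""
    HTTPServer(("", port), TranscriptWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver on port 8000. In production you would normally place it behind HTTPS and hand the payload to a queue or database instead of printing it.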
Optional custom transcript to use instead of automatic speech recognition. Useful when you already have accurate transcript text and want to generate highlights or verify timing.
Each object contains:
text (string, required): The spoken text segment
start (number, required): Start time in seconds (supports integers and decimals, e.g., 0, 2.5, 10.75)
end (number, required): End time in seconds (supports integers and decimals, e.g., 3, 5.2, 15.5)
Example format:
[
  {
    "text": "Welcome to our product demo.",
    "start": 0,
    "end": 3.5
  },
  {
    "text": "Today we'll show you the key features.",
    "start": 3.5,
    "end": 7.2
  }
]
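Before sending a custom transcript, it can help to check the segments locally. The sketch below verifies the three required fields described above. It also assumes that times must be non-negative and that `end` must be greater than `start`. Those two rules are reasonable guesses, not documented constraints, and `validate_segments` is an illustrative helper rather than part of the API.

```python
def validate_segments(segments: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means no issues were found."""
    problems = []
    for i, seg in enumerate(segments):
        text = seg.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"segment {i}: 'text' must be a non-empty string")

        times_ok = True
        for name in ("start", "end"):
            value = seg.get(name)
            # bool is a subclass of int, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value < 0:
                problems.append(f"segment {i}: '{name}' must be a number >= 0")
                times_ok = False

        # Assumption: a segment must end after it starts.
        if times_ok and seg["end"] <= seg["start"]:
            problems.append(f"segment {i}: 'end' must be greater than 'start'")
    return problems
```

For the example transcript above, `validate_segments` returns an empty list.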
Optional array to request AI-generated highlights from the transcription. Each object specifies desired highlight characteristics.
Each object contains:
duration (number): Target duration for the highlight in seconds (supports integers and decimals, e.g., 30, 45.5, 60)
type (string): Type of highlight (e.g., "summary", "key_moments")
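As a small illustration, the highlights array can be built from (duration, type) pairs. The helper name `build_highlights` and the positive-duration check are assumptions made for this sketch. The type strings are the examples given above, which may not be the full set of supported values.

```python
import json


def build_highlights(specs: list[tuple[float, str]]) -> list[dict]:
    """Turn (duration_seconds, type) pairs into highlight request objects."""
    highlights = []
    for duration, kind in specs:
        # Assumption: a target duration must be a positive number of seconds.
        if duration <= 0:
            raise ValueError(f"duration must be positive, got {duration}")
        highlights.append({"duration": duration, "type": kind})
    return highlights


# Example: one 30-second summary and one 45.5-second key-moments highlight.
highlights = build_highlights([(30, "summary"), (45.5, "key_moments")])
highlights_json = json.dumps(highlights)
```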
Webhook Configuration: Ensure your webhook endpoint is configured to handle POST requests and can process the transcript payload. The webhook should return a 200 status code to acknowledge receipt.
File Accessibility: Make sure your video/audio files are publicly accessible via HTTPS. The transcription service must be able to download the file from the provided URL.
Language Selection: Choose the language code that matches the spoken language in the file. Using the wrong language will result in poor-quality transcripts.
File Format: Use common formats like MP4, MP3, or WAV for best results. Ensure audio quality is clear for accurate transcription.
Processing Time: Transcription is an asynchronous process. Processing time varies based on file length and complexity. Use the webhook to receive results rather than polling.
Custom Transcripts: When providing custom transcripts, ensure each segment's start and end times are accurate and expressed in seconds, so that the text stays synchronized with the audio.
Rate Limiting: Implement appropriate delays between batch requests to avoid rate limiting. Process videos sequentially with small delays between API calls.
Error Handling: Implement proper error handling for failed transcriptions. Check the webhook payload for error messages and retry if necessary.
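The rate-limiting and error-handling advice above can be combined into one loop. The sketch below submits files sequentially, pauses between calls, and retries failures with a growing delay. The `submit` argument is a hypothetical callable that sends one request and raises on failure, since the actual request format is not shown here. The delay values and attempt count are illustrative, not documented limits.

```python
import time
from typing import Any, Callable


def submit_all(
    urls: list[str],
    submit: Callable[[str], Any],
    delay_seconds: float = 1.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Submit files one at a time, pausing between calls and retrying failures.

    Returns a mapping from each URL to the value returned by `submit`,
    or to the last exception if every attempt failed.
    """
    results: dict[str, Any] = {}
    for position, url in enumerate(urls):
        if position > 0:
            # Small pause between consecutive files to stay under rate limits.
            sleep(delay_seconds)
        for attempt in range(1, max_attempts + 1):
            try:
                results[url] = submit(url)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    results[url] = exc  # give up and record the failure
                else:
                    # Wait longer after each failed attempt (2x, 4x, ...).
                    sleep(delay_seconds * 2 ** attempt)
    return results
```

Failures reported later through the webhook payload could be handled the same way, by calling `submit` again for the affected file. The `sleep` parameter exists so the delays can be replaced in tests.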