Video to Text Use Case

Pictory's Transcription API generates text from an existing video. Once the transcription (text) has been generated, you can follow the steps on this page to add it to your existing videos as subtitles.


To generate text from a video, follow these steps:

  1. STEP 1: CALL TRANSCRIPTION API: Generate the transcription of the uploaded video by calling the /v2/transcription API and passing the URL of the uploaded video file (the fileUrl field) in the request. The transcription data is sent to the callback URL, and it can also be fetched from the GET Job API.
curl --location 'https://api.pictory.ai/pictoryapis/v2/transcription' \
--header 'Authorization: YOUR_AUTH_TOKEN' \
--header 'X-Pictory-User-Id: YOUR_USER_ID' \
--header 'Content-Type: application/json' \
--data '{
    "fileUrl": "PUBLIC_STREAMABLE_MP4_URL",
    "mediaType": "video",
    "language": "en-IN",
    "webhook": "<CALLBACK_URL>"
}'

{
    "data": {
        "jobId": "<JOB_ID>"
    },
    "success": true
}
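
For readers who prefer to call the endpoint from code, the curl request above can be sketched in Python with only the standard library. The endpoint, headers, and body fields are taken from the curl example. The function names `build_transcription_request` and `submit_transcription` are hypothetical helpers introduced here for illustration, and the sketch assumes the response has the `data.jobId` shape shown above.

```python
import json
import urllib.request

TRANSCRIPTION_URL = "https://api.pictory.ai/pictoryapis/v2/transcription"


def build_transcription_request(auth_token, user_id, file_url, webhook,
                                language="en-IN"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "fileUrl": file_url,
        "mediaType": "video",
        "language": language,
        "webhook": webhook,
    }
    return urllib.request.Request(
        TRANSCRIPTION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": auth_token,
            "X-Pictory-User-Id": user_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_transcription(request):
    """Send the request and return the job ID from the JSON response.

    Assumes the response body looks like {"data": {"jobId": ...}, "success": true}.
    """
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["data"]["jobId"]
```

Keeping the request construction separate from the network call makes it possible to inspect or log the request before sending it.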

Call the GET Job API with the returned jobId to get the transcription data. The transcription data is also sent to the webhook URL.

{
  "success": true,
  "data": {
    "transcript": [
      {
        "uid": "",
        "speakerId": "",
        "word": [
          {
            "uid": "ef4b1282-5b34-45a2-984d-d84e6700756e",
            "word": "Once",
            "start_time": 0,
            "end_time": 0.88,
            "sentence_index": 0,
            "is_pause": true,
            "pause_size": "small",
            "state": "active",
            "speakerId": 1
          },
          {...}, {...}
        ]
      },
      {...},
      {...}
    ],
    "txt": "Once Again",
    "srt": "1\\n00:00:00,880 --> 00:00:02,960\\nOnce again",
    "vtt": "WEBVTT\\n\\n1\\n00:00:00.880 --> 00:00:02.960\\n- Once again, ",
    "job_id": "4eac6816-9435-49d0-ba61-db226ec5cf0c"
  }
}
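
If you only need the plain text or subtitle files, the txt, srt, and vtt fields of the response above can be written straight to disk. The following is a minimal sketch, assuming the job response has already been parsed into a Python dict and that those fields arrive as ordinary strings. The function name `write_transcript_files` and the output file naming are hypothetical.

```python
from pathlib import Path


def write_transcript_files(job_response, out_dir, stem="transcript"):
    """Write the txt, srt and vtt fields of a job response to files.

    Returns a dict mapping each field that was present to the path written.
    """
    data = job_response["data"]
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = {}
    for key in ("txt", "srt", "vtt"):
        if key in data:
            target = out_path / f"{stem}.{key}"
            target.write_text(data[key], encoding="utf-8")
            written[key] = target
    return written
```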

This will generate text from the video.

Note: Follow Steps 2 and 3 if you want to burn subtitles into the video.

  2. STEP 2: CREATE TEXT SENTENCES: At present, the Transcription API returns the speech-to-text output as an array of word objects rather than as complete sentences. Each word object includes the start_time and end_time at which the word should be displayed in the video.

You need to write logic that forms sentences from these words and records, for each sentence, its start time (the start_time of the first word in the sentence) and its end time (the end_time of the last word in the sentence). A sample response is shown above.
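
One possible way to form sentences is to group the word objects by the sentence_index field that appears in the sample response. The sketch below assumes that sentence_index is consistent across all entries of the transcript array and that words arrive with the field names shown in the sample. The function name `build_sentences` and the output keys `text`, `start`, and `end` are hypothetical choices made for this illustration.

```python
def build_sentences(transcript):
    """Group word objects into sentences keyed by sentence_index.

    `transcript` is the data.transcript array from the job response.
    Returns a list of {"text", "start", "end"} dicts ordered by sentence_index.
    """
    sentences = {}
    for segment in transcript:
        for word in segment.get("word", []):
            text = word.get("word", "").strip()
            if not text:
                continue  # skip entries that carry no visible text
            sentence = sentences.setdefault(
                word["sentence_index"],
                {"words": [], "start": word["start_time"], "end": word["end_time"]},
            )
            sentence["words"].append(text)
            # Sentence start is the earliest word start; end is the latest word end.
            sentence["start"] = min(sentence["start"], word["start_time"])
            sentence["end"] = max(sentence["end"], word["end_time"])
    return [
        {"text": " ".join(s["words"]), "start": s["start"], "end": s["end"]}
        for _, s in sorted(sentences.items())
    ]
```

If your transcripts do not carry a reliable sentence_index, an alternative is to split on sentence-ending punctuation or on pauses between words instead.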

  3. STEP 3: CALL TEXT TO VIDEO APIS: Once the sentences are created, follow the Text to Video steps and call the Storyboard API to add subtitles to the original video:
    1. Divide the original video into scenes and add a subtitle to each scene. The Storyboard API takes a scenes array, and for each scene you need to pass the following parameters:
      1. backgroundUri: the original video URL (AWS S3).
      2. text: the subtitle to display for that scene.
      3. BackgroundVideoSegments.start: the start time of the text in the video. Pass the start time of the sentence here.
      4. BackgroundVideoSegments.end: the end time of the text in the video. Pass the end time of the sentence here.
    2. The Storyboard API returns a job ID in the response. You can check the job status by calling the GET Job REST API. Once the job is complete, the response contains a video preview URL and render_video settings. These render settings can be used to generate the video in .mp4 format.
    3. Generate the video by calling the Video Render API, which also returns a job ID in the response. You can check the job status by calling the GET Job REST API. Once the job is complete, the response contains the link to the .mp4 video.
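
The scene parameters described in Step 3 can be assembled from a list of sentences as sketched below. This is an illustration only: it assumes each sentence is a dict with `text`, `start`, and `end` keys, and it assumes that BackgroundVideoSegments is a nested object with `start` and `end` fields. The exact key casing and nesting, and the rest of the Storyboard request body, should be checked against the Storyboard API reference. The function name `build_scenes` is hypothetical.

```python
def build_scenes(sentences, video_url):
    """Create one scene per sentence for the Storyboard API scenes array.

    Each sentence is expected to be a dict with "text", "start" and "end" keys.
    """
    return [
        {
            "backgroundUri": video_url,
            "text": sentence["text"],
            # Assumption: the key casing and nesting follow the parameter
            # names used in this guide; verify against the API reference.
            "BackgroundVideoSegments": {
                "start": sentence["start"],
                "end": sentence["end"],
            },
        }
        for sentence in sentences
    ]
```

Each scene reuses the same original video URL, so only the subtitle text and the time window differ from scene to scene.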