The Add New Voiceover API lets users add Text-to-Speech (TTS) voices from external providers such as Google, AWS, and ElevenLabs to Pictory’s voiceover library. By default, Pictory offers a pre-configured list of voices from these providers, but a provider may offer voices that are missing from that list. This API adds such a custom voice to Pictory from the details supplied in the request.

API Endpoint

URL:
https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks

Method:
POST

Content-Type:
application/json

Accept:
application/json

Request Structure

CURL Command:

curl --request POST \
     --url https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks \
     --header 'Authorization: <access_token>' \
     --header 'X-Pictory-User-Id: <Your-Pictory-User-ID>' \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "engine": "WaveNet",
  "service": "google",
  "name": "Arnold",
  "description": "A deep male voice",
  "language": "en-US",
  "voiceId": "en-US-Journey-D",
  "accent": "American",
  "age": "Adult",
  "gender": "Male",
  "sample": "https://example.com/sample.mp3"
}'

Request Body Parameters:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name of the voiceover. |
| description | string | No | Description of the voiceover. |
| engine | string | Yes | The TTS engine to use. Allowed values: neural, standard, WaveNet, Neural2, eleven_monolingual_v1, eleven_multilingual_v2. For the aws service, the valid values are neural and standard. For the google service, the valid values are standard, WaveNet, and Neural2. For the elevenlabs service, the valid values are eleven_monolingual_v1 and eleven_multilingual_v2. |
| service | string | Yes | The TTS provider service. Allowed values: aws, google, elevenlabs. |
| language | string | Yes | The language code of the voice. Example: en-US. |
| publicUserId | string | No | Public user ID that publicly identifies an ElevenLabs user. It is required for sharing a voice from the user's ElevenLabs account with Pictory's account. Valid for the elevenlabs service only. |
| voiceId | string | Yes | Unique identifier for the voice from the external TTS provider. For google, the voiceId can be the voice name, e.g., en-US-Journey-D. For aws, the voiceId can be the voice name, e.g., Danielle. For elevenlabs, the voiceId can be the unique ID of the voice in the voice library. |
| accent | string | Yes | The accent of the voice, e.g., American. |
| age | string | Yes | The age group of the voice. Typical values include Child, Teen, Adult, Senior. |
| gender | string | Yes | The gender of the voice. Typical values include Male, Female, Neutral. |
| sample | URL | No | A URL to an audio sample for this voice. |
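
The required fields and the per-service engine values listed above can be checked on the client before a request is sent. The following is a minimal Python sketch under that assumption; it is not part of the Pictory API. The function name `validate_voice_payload` and the error messages are hypothetical, and the rules encode only what the parameter descriptions state.

```python
# Hypothetical client-side check; the rules mirror the parameter descriptions only.
ENGINES_BY_SERVICE = {
    "aws": {"neural", "standard"},
    "google": {"standard", "WaveNet", "Neural2"},
    "elevenlabs": {"eleven_monolingual_v1", "eleven_multilingual_v2"},
}

# Fields marked "Yes" in the Required column.
REQUIRED_FIELDS = (
    "name", "engine", "service", "language",
    "voiceId", "accent", "age", "gender",
)


def validate_voice_payload(payload):
    """Return a list of problems found in a request body (empty if none)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            errors.append(f"missing required field: {field}")

    service = payload.get("service")
    engine = payload.get("engine")
    if service is not None and service not in ENGINES_BY_SERVICE:
        errors.append(f"unknown service: {service}")
    elif service in ENGINES_BY_SERVICE and engine:
        if engine not in ENGINES_BY_SERVICE[service]:
            errors.append(f"engine {engine} is not valid for service {service}")

    # publicUserId is documented as valid for elevenlabs only.
    if "publicUserId" in payload and service != "elevenlabs":
        errors.append("publicUserId is valid for the elevenlabs service only")
    return errors
```

An empty list means the body passed these local checks; the server may still apply rules that are not documented here.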

Example Request Body:

{
  "engine": "WaveNet",
  "service": "google",
  "name": "Arnold",
  "description": "A deep male voice",
  "language": "en-US",
  "voiceId": "en-US-Journey-D",
  "accent": "American",
  "age": "Adult",
  "gender": "Male",
  "sample": "https://example.com/sample.mp3"
}

Headers:

| Header | Value | Description |
|---|---|---|
| Authorization | <access_token> | Token for API access. |
| X-Pictory-User-Id | <Your-Pictory-User-ID> | Unique identifier for the user provided by Pictory. |
| content-type | application/json | Specifies the request payload format. |
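
For readers calling the endpoint from code rather than cURL, the headers above can be attached with Python's standard library. This is a sketch only: the helper name `build_add_voice_request` is hypothetical, and the token is passed exactly as given because the source does not say whether a scheme prefix is expected. The function builds the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"


def build_add_voice_request(access_token, user_id, payload):
    """Build (but do not send) the POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON body as bytes
        headers={
            # Assumption: the token is sent as-is, as the Headers section shows.
            "Authorization": access_token,
            "X-Pictory-User-Id": user_id,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step needs a real access token and Pictory user ID.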

Example Response

Successful Response:

{
  "id": 12345,
  "name": "Arnold",
  "accent": "American",
  "gender": "Male",
  "language": "en-US",
  "sample": "https://example.com/sample.mp3",
  "service": "google",
  "engine": "WaveNet",
  "category": "standard",
  "ssmlSupportCategory": "C",
  "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-c"
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| id | Integer | The unique identifier for the voiceover track in the Pictory voiceover library. |
| name | String | The name of the voiceover track. |
| accent | String | The accent of the voice (e.g., "American"). |
| gender | String | The gender of the voice (e.g., "Male"). |
| language | String | The language of the voice (e.g., "en-US"). |
| sample | URL | The sample URL of the voice track. |
| service | String | The service where the voice is sourced from. Possible values: aws, google, elevenlabs. |
| engine | String | The voice synthesis engine used. Possible values: neural, standard, WaveNet, Neural2, eleven_monolingual_v1, eleven_multilingual_v2. |
| category | String | The category of the voice. For elevenlabs, it is premium. For google and aws, it is standard. |
| ssmlSupportCategory | String | The SSML support category. Possible values: A, B, C, D. |
| ssmlHelp | String | A URL pointing to the Pictory SSML support documentation for the corresponding support category. |
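
A client will usually want the new track's `id` and its SSML category from the response. The sketch below shows one way to read them in Python. The `VoiceTrack` class and both function names are hypothetical, and the category mapping is taken only from the `category` description above.

```python
import json
from dataclasses import dataclass

# Category per service, as stated in the response field descriptions.
CATEGORY_BY_SERVICE = {"elevenlabs": "premium", "google": "standard", "aws": "standard"}


@dataclass
class VoiceTrack:
    id: int
    name: str
    service: str
    engine: str
    category: str
    ssml_support_category: str
    ssml_help: str


def parse_voice_track(body):
    """Parse a successful response body (a JSON string) into a VoiceTrack."""
    data = json.loads(body)
    return VoiceTrack(
        id=data["id"],
        name=data["name"],
        service=data["service"],
        engine=data["engine"],
        category=data["category"],
        ssml_support_category=data["ssmlSupportCategory"],
        ssml_help=data["ssmlHelp"],
    )


def category_matches_service(track):
    """Check the returned category against the documented per-service value."""
    return CATEGORY_BY_SERVICE.get(track.service) == track.category
```

The `id` is the value to keep, since it identifies the track in the Pictory voiceover library.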
