Get Voiceover Tracks

Overview

Retrieve the full list of AI voiceover voices available for text-to-speech conversion in your video projects. The endpoint returns details for each voice, including accent, gender, language, service provider (AWS Polly or Google WaveNet/Neural2), sample audio URL, and SSML support category.

You need a valid API key to use this endpoint. Get your API key from the API Access page in your Pictory dashboard.

API Endpoint

GET https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks

Request Parameters

Headers

Authorization (string, required)

API key for authentication (starts with pictai_).

Authorization: YOUR_API_KEY
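
As a rough illustration of the header described above, the following sketch builds (but does not send) the GET request with Python's standard library. The helper name build_tracks_request and the prefix check are illustrative assumptions, not part of the Pictory API; the check only mirrors the note that keys start with pictai_.

```python
import urllib.request

TRACKS_URL = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"


def build_tracks_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the voiceover tracks endpoint."""
    # Hypothetical sanity check based on the documented key prefix.
    if not api_key.startswith("pictai_"):
        raise ValueError("API key is expected to start with 'pictai_'")
    return urllib.request.Request(
        TRACKS_URL,
        headers={"Authorization": api_key, "Accept": "application/json"},
        method="GET",
    )


# "pictai_example_key" is a placeholder, not a real key.
request = build_tracks_request("pictai_example_key")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; the sketch stops short of that so it can run without network access.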

Response

Returns an array of voice objects with the following properties:

accent (string)

The accent or regional variant of the voice (e.g., “American accent”, “British accent”, “Indian accent”).

Response Examples

[
  {
    "accent": "American accent",
    "category": "standard",
    "engine": "neural",
    "gender": "female",
    "id": 1001,
    "language": "en-US",
    "name": "Joanna",
    "sample": "https://pictory-static.pictorycontent.com/polly/samples/Joanna_100_sample.mp3",
    "service": "aws",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-b",
    "ssmlSupportCategory": "B",
    "voice": "Joanna"
  },
  {
    "accent": "British accent",
    "category": "standard",
    "engine": "WaveNet",
    "gender": "female",
    "id": 1034,
    "language": "en-GB",
    "name": "Fiona",
    "sample": "https://pictory-static.pictorycontent.com/google/samples/Fiona_en-GB-Wavenet-A_FEMALE_updated.mp3",
    "service": "google",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-c",
    "ssmlSupportCategory": "C",
    "voice": "en-GB-Wavenet-A_FEMALE"
  },
  {
    "accent": "Indian accent",
    "category": "standard",
    "engine": "WaveNet",
    "gender": "female",
    "id": 1039,
    "language": "en-IN",
    "name": "Shreya",
    "sample": "https://pictory-static.pictorycontent.com/google/samples/en-IN-Wavenet-A_FEMALE_updated.mp3",
    "service": "google",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-c",
    "ssmlSupportCategory": "C",
    "voice": "en-IN-Wavenet-A_FEMALE"
  }
]
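
One possible way to work with this response in typed form is sketched below. The Voice dataclass and parse_voices helper are hypothetical names; the field list is taken only from the example objects above, so it may not cover every field the API can return. Unrecognized keys are ignored, and a missing field raises a TypeError.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Voice:
    """One entry from the voiceover tracks response (fields from the examples above)."""
    id: int
    name: str
    voice: str
    language: str
    accent: str
    gender: str
    service: str
    engine: str
    category: str
    sample: str
    ssmlSupportCategory: str
    ssmlHelp: str


_KNOWN_FIELDS = {f.name for f in fields(Voice)}


def parse_voices(payload: str) -> list:
    """Parse the JSON array into Voice objects, ignoring any unrecognized keys."""
    return [
        Voice(**{key: value for key, value in item.items() if key in _KNOWN_FIELDS})
        for item in json.loads(payload)
    ]


# First object from the response example above.
SAMPLE_RESPONSE = """
[
  {
    "accent": "American accent",
    "category": "standard",
    "engine": "neural",
    "gender": "female",
    "id": 1001,
    "language": "en-US",
    "name": "Joanna",
    "sample": "https://pictory-static.pictorycontent.com/polly/samples/Joanna_100_sample.mp3",
    "service": "aws",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-b",
    "ssmlSupportCategory": "B",
    "voice": "Joanna"
  }
]
"""

voices = parse_voices(SAMPLE_RESPONSE)
print(voices[0].name, voices[0].language, voices[0].engine)
```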

Code Examples

Replace YOUR_API_KEY with your actual API key, which starts with pictai_.

curl --request GET \
  --url https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks \
  --header 'Authorization: YOUR_API_KEY' \
  --header 'accept: application/json' | python -m json.tool

Usage Notes

Voice Availability: The endpoint returns all available voices across multiple languages, accents, and service providers. Filter the results based on your project requirements.

Sample Audio: Each voice includes a sample URL pointing to an MP3 preview. Use these samples to let users preview voices before selecting.

SSML Support: Different voices support different SSML (Speech Synthesis Markup Language) features. Check the ssmlSupportCategory and ssmlHelp fields to understand what is available for each voice.
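
To make the SSML note concrete, here is a minimal sketch that groups voice names by their ssmlSupportCategory value. The function name group_by_ssml_category is an assumption for illustration, and the input is trimmed from the response examples rather than fetched from the API.

```python
from collections import defaultdict


def group_by_ssml_category(voices):
    """Map each ssmlSupportCategory value to the names of the voices that report it."""
    groups = defaultdict(list)
    for voice in voices:
        # Voices without the field are collected under "unknown".
        groups[voice.get("ssmlSupportCategory", "unknown")].append(voice["name"])
    return dict(groups)


# Trimmed copies of the objects in the response examples.
sample_voices = [
    {"name": "Joanna", "ssmlSupportCategory": "B"},
    {"name": "Fiona", "ssmlSupportCategory": "C"},
    {"name": "Shreya", "ssmlSupportCategory": "C"},
]

ssml_groups = group_by_ssml_category(sample_voices)
print(ssml_groups)
```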

Service Providers:

AWS Polly voices (service: "aws") use the “neural” or “standard” engine
Google Cloud voices (service: "google") use “WaveNet” or “Neural2” engines

Generally, neural/WaveNet voices sound more natural than standard voices.
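
If you want to favor the more natural-sounding engines, one approach is to drop voices whose engine is "standard" and fall back to the full list when nothing else is available. The sketch below assumes that policy; the function name prefer_natural_voices and the placeholder standard-engine entry are made up for illustration.

```python
def prefer_natural_voices(voices):
    """Return voices whose engine is not 'standard'; fall back to all voices if none qualify."""
    natural = [v for v in voices if v.get("engine", "").lower() != "standard"]
    return natural or list(voices)


sample_voices = [
    {"name": "Joanna", "service": "aws", "engine": "neural"},
    {"name": "Fiona", "service": "google", "engine": "WaveNet"},
    # Hypothetical entry, included only to show the filter at work.
    {"name": "Placeholder", "service": "aws", "engine": "standard"},
]

preferred = prefer_natural_voices(sample_voices)
print([voice["name"] for voice in preferred])
```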

Common Use Cases

1. List All Available Voices

Retrieve and display all available voices:

import requests

def get_all_voices(api_key):
    """
    Retrieve all available voiceover voices
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    # A timeout keeps the call from hanging indefinitely on network problems
    response = requests.get(url, headers=headers, timeout=30)

    if response.status_code == 200:
        voices = response.json()
        print(f"Total available voices: {len(voices)}\n")

        for voice in voices[:10]:  # Show first 10
            print(f"{voice['name']} - {voice['accent']} ({voice['language']})")
            print(f"  Gender: {voice['gender']}, Engine: {voice['engine']}")
            print(f"  Sample: {voice['sample']}")
            print()

        return voices
    else:
        print(f"Error: {response.status_code}")
        return []

# Example usage
voices = get_all_voices("YOUR_API_KEY")

2. Filter Voices by Language

Get all voices for a specific language:

async function getVoicesByLanguage(apiKey, languageCode) {
  const response = await fetch(
    'https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks',
    {
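      headers: { Authorization: apiKey }
    }
  );

  // Fail fast on HTTP errors instead of filtering an error payload
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const allVoices = await response.json();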

  // Filter by language code (e.g., "en-US", "en-GB", "fr-FR")
  const filtered = allVoices.filter(voice => voice.language === languageCode);

  console.log(`Found ${filtered.length} voices for ${languageCode}:`);
  filtered.forEach(voice => {
    console.log(`  - ${voice.name} (${voice.accent}, ${voice.gender})`);
  });

  return filtered;
}

// Example usage
const usVoices = await getVoicesByLanguage('YOUR_API_KEY', 'en-US');

3. Group Voices by Accent

Organize voices by accent type:

from collections import defaultdict
import requests

def group_voices_by_accent(api_key):
    """
    Group voices by their accent
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error body
    voices = response.json()

    # Group by accent
    by_accent = defaultdict(list)
    for voice in voices:
        accent = voice.get('accent', 'Unknown')
        by_accent[accent].append(voice)

    # Print grouped results
    for accent, voice_list in sorted(by_accent.items()):
        print(f"\n{accent} ({len(voice_list)} voices):")
        for voice in voice_list[:5]:  # Show first 5 per accent
            print(f"  - {voice['name']} ({voice['gender']}, {voice['engine']})")

    return by_accent

# Example usage
grouped = group_voices_by_accent("YOUR_API_KEY")

4. Find Best Voice Match

Find voices matching specific criteria:

async function findVoiceMatch(apiKey, criteria) {
  const response = await fetch(
    'https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks',
    {
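      headers: { Authorization: apiKey }
    }
  );

  // Fail fast on HTTP errors instead of filtering an error payload
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const voices = await response.json();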

  // Filter by multiple criteria
  const matches = voices.filter(voice => {
    return (
      (!criteria.language || voice.language === criteria.language) &&
      (!criteria.gender || voice.gender === criteria.gender) &&
      (!criteria.accent || voice.accent.toLowerCase().includes(criteria.accent.toLowerCase())) &&
      (!criteria.service || voice.service === criteria.service) &&
      (!criteria.engine || voice.engine === criteria.engine)
    );
  });

  console.log(`Found ${matches.length} matching voices:`);
  matches.forEach(voice => {
    console.log(`  - ${voice.name} (ID: ${voice.id})`);
    console.log(`    ${voice.accent}, ${voice.gender}, ${voice.engine}`);
  });

  return matches;
}

// Example usage - find American female neural voices
const matches = await findVoiceMatch('YOUR_API_KEY', {
  language: 'en-US',
  gender: 'female',
  accent: 'American',
  engine: 'neural'
});

5. Create Voice Selection UI Data

Prepare voice data for a user interface:

import requests

def prepare_voice_ui_data(api_key, language_filter=None):
    """
    Prepare voice data structured for UI dropdowns/selectors
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error body
    voices = response.json()

    # Filter by language if specified
    if language_filter:
        voices = [v for v in voices if v['language'] == language_filter]

    # Structure for UI
    ui_data = {
        'languages': {},
        'accents': {},
        'genders': ['male', 'female'],
        'engines': set(),
        'voices': []
    }

    for voice in voices:
        # Collect unique languages
        lang = voice['language']
        if lang not in ui_data['languages']:
            ui_data['languages'][lang] = []
        ui_data['languages'][lang].append(voice['name'])

        # Collect unique accents
        accent = voice['accent']
        if accent not in ui_data['accents']:
            ui_data['accents'][accent] = []
        ui_data['accents'][accent].append(voice['name'])

        # Collect engines
        ui_data['engines'].add(voice['engine'])

        # Create simplified voice entry
        ui_data['voices'].append({
            'id': voice['id'],
            'name': voice['name'],
            'label': f"{voice['name']} ({voice['accent']}, {voice['gender']})",
            'language': voice['language'],
            'accent': voice['accent'],
            'gender': voice['gender'],
            'engine': voice['engine'],
            'service': voice['service'],
            'sample': voice['sample']
        })

    # Convert the set to a sorted list so the result is JSON-serializable
    ui_data['engines'] = sorted(ui_data['engines'])

    print(f"Prepared UI data for {len(ui_data['voices'])} voices")
    print(f"Languages: {len(ui_data['languages'])}")
    print(f"Accents: {len(ui_data['accents'])}")
    print(f"Engines: {ui_data['engines']}")

    return ui_data

# Example usage
ui_data = prepare_voice_ui_data("YOUR_API_KEY", language_filter="en-US")

# Access structured data
print("\nSample voice entries:")
for voice in ui_data['voices'][:3]:
    print(f"  {voice['label']} - ID: {voice['id']}")

6. Compare Voice Providers

Analyze and compare voices by service provider:

import requests

def compare_voice_providers(api_key):
    """
    Compare voice offerings between AWS and Google
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error body
    voices = response.json()

    # Separate by provider
    aws_voices = [v for v in voices if v['service'] == 'aws']
    google_voices = [v for v in voices if v['service'] == 'google']

    print("Voice Provider Comparison:")
    print(f"\nAWS Polly: {len(aws_voices)} voices")
    print(f"  Engines: {set(v['engine'] for v in aws_voices)}")
    print(f"  Languages: {len(set(v['language'] for v in aws_voices))}")
    print(f"  Male: {len([v for v in aws_voices if v['gender'] == 'male'])}")
    print(f"  Female: {len([v for v in aws_voices if v['gender'] == 'female'])}")

    print(f"\nGoogle Cloud: {len(google_voices)} voices")
    print(f"  Engines: {set(v['engine'] for v in google_voices)}")
    print(f"  Languages: {len(set(v['language'] for v in google_voices))}")
    print(f"  Male: {len([v for v in google_voices if v['gender'] == 'male'])}")
    print(f"  Female: {len([v for v in google_voices if v['gender'] == 'female'])}")

    # Language overlap
    aws_langs = set(v['language'] for v in aws_voices)
    google_langs = set(v['language'] for v in google_voices)
    common_langs = aws_langs & google_langs

    print(f"\nLanguages available in both: {len(common_langs)}")
    print(f"AWS only: {len(aws_langs - google_langs)}")
    print(f"Google only: {len(google_langs - aws_langs)}")

    return {
        'aws': aws_voices,
        'google': google_voices,
        'stats': {
            'aws_count': len(aws_voices),
            'google_count': len(google_voices),
            'common_languages': list(common_langs)
        }
    }

# Example usage
comparison = compare_voice_providers("YOUR_API_KEY")

Best Practices

Voice Selection

Preview Before Use: Always use the sample URLs to let users preview voices before selection
Filter by Language: Filter voices by the target language to present relevant options to users
Consider Accent: Match voice accent to your target audience (American, British, Indian, etc.)
Engine Quality: Neural and WaveNet voices generally sound more natural than standard voices
SSML Support: Check ssmlSupportCategory if you need advanced SSML features like custom pronunciation or emphasis

Performance Tips

Cache Voice List: Cache the voice list for 24 hours as it rarely changes
Client-Side Filtering: Fetch all voices once and filter on the client side
Lazy Load Samples: Only load audio samples when users preview them
Index by ID: Create an ID-to-voice mapping for quick lookups
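
The caching and indexing tips above could be combined roughly as follows. This is a sketch under stated assumptions: VoiceCache is a hypothetical helper, the fetch function is injected so any HTTP client can supply the voice list, and the demo uses a fake fetcher and a fake clock instead of calling the API.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the tip above


class VoiceCache:
    """Keeps the voice list and an ID-to-voice index for a fixed time window."""

    def __init__(self, fetch_voices, ttl_seconds=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._fetch_voices = fetch_voices  # callable returning a list of voice dicts
        self._ttl = ttl_seconds
        self._clock = clock
        self._fetched_at = None
        self._by_id = {}

    def _refresh_if_stale(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            voices = self._fetch_voices()
            # Index by ID so lookups do not scan the whole list.
            self._by_id = {voice["id"]: voice for voice in voices}
            self._fetched_at = now

    def all(self):
        self._refresh_if_stale()
        return list(self._by_id.values())

    def get(self, voice_id):
        self._refresh_if_stale()
        return self._by_id.get(voice_id)


# Demo with a fake fetcher and a controllable clock (no network access).
calls = []


def fake_fetch():
    calls.append(1)
    return [{"id": 1001, "name": "Joanna"}, {"id": 1034, "name": "Fiona"}]


fake_now = [0.0]
cache = VoiceCache(fake_fetch, clock=lambda: fake_now[0])
print(cache.get(1001)["name"])
```

In a real integration, fake_fetch would be replaced by a function that calls the endpoint and returns the parsed JSON array.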

Common Voice Categories

SSML Support Categories:

Category A: Basic SSML support (standard engines)
Category B: Advanced SSML support (AWS neural voices)
Category C: Full SSML support (Google WaveNet/Neural2 voices)

Engine Types:

standard: Basic quality, faster processing
neural: High quality, natural-sounding (AWS)
WaveNet: Premium quality (Google)
Neural2: Latest generation neural voices (Google)

Language Coverage

The API provides voices for multiple languages including:

English variants: en-US, en-GB, en-AU, en-IN, en-NZ, en-ZA
European: fr-FR, fr-CA, de-DE, de-AT, it-IT, es-ES, nl-NL, nl-BE, pt-PT, pt-BR
And many more…
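
Because the language field uses codes such as en-US and fr-CA, regional variants can be grouped under their base language by splitting on the hyphen. The sketch below assumes every code follows that language-REGION shape; the function name locales_by_language is illustrative, and the input is trimmed to the language field using codes listed above.

```python
from collections import defaultdict


def locales_by_language(voices):
    """Group the distinct language codes in a voice list by their base language."""
    grouped = defaultdict(set)
    for voice in voices:
        code = voice["language"]
        base = code.split("-", 1)[0]  # "en-US" -> "en"
        grouped[base].add(code)
    return {base: sorted(codes) for base, codes in sorted(grouped.items())}


sample_voices = [
    {"language": "en-US"},
    {"language": "en-GB"},
    {"language": "en-IN"},
    {"language": "fr-FR"},
    {"language": "fr-CA"},
    {"language": "en-US"},  # duplicates collapse into one entry
]

coverage = locales_by_language(sample_voices)
print(coverage)
```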
