Get Voiceover Tracks

Overview

Retrieve the list of AI voiceover voices available for text-to-speech conversion in your video projects. Each entry describes one voice: its accent, gender, language, service provider (AWS Polly or Google WaveNet/Neural2), sample audio URL, and SSML support category.
You need a valid API key to use this endpoint. Get your API key from the API Access page in your Pictory dashboard.

API Endpoint

GET https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks

Request Parameters

Headers

Authorization
string
required
API key for authentication (starts with pictai_)
Authorization: YOUR_API_KEY
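
As a minimal sketch of the header described above, the following Python helper builds the request headers and rejects keys that lack the documented pictai_ prefix. The function name build_auth_headers and the whitespace stripping are illustrative assumptions, not part of the Pictory API.

```python
def build_auth_headers(api_key: str) -> dict:
    """Build headers for the voiceover tracks request (hypothetical helper)."""
    key = api_key.strip()  # Assumption: surrounding whitespace is never intended
    if not key.startswith("pictai_"):
        # The reference states that API keys start with "pictai_"
        raise ValueError("API key is expected to start with 'pictai_'")
    return {"Authorization": key, "accept": "application/json"}
```

Checking the prefix locally catches a pasted placeholder such as YOUR_API_KEY before any request is sent.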

Response

Returns an array of voice objects with the following properties:
accent
string
The accent or regional variant of the voice (e.g., “American accent”, “British accent”, “Indian accent”)
category
string
Voice quality category, typically “standard”
engine
string
The text-to-speech engine type (e.g., “neural”, “WaveNet”, “Neural2”, “standard”)
gender
string
The voice gender: “male” or “female”
id
integer
Unique numeric identifier for the voice
language
string
Language code in IETF BCP 47 format (e.g., “en-US”, “en-GB”, “fr-FR”, “es-ES”)
name
string
The display name of the voice (e.g., “Joanna”, “Matthew”, “Amy”)
sample
string (uri)
URL to an MP3 sample of the voice for preview
service
string
The voice provider service: “aws” (Amazon Polly) or “google” (Google Cloud Text-to-Speech)
ssmlHelp
string (uri)
URL to documentation for supported SSML tags for this voice
ssmlSupportCategory
string
SSML support level category (A, B, or C) indicating which SSML features are supported
voice
string
The technical voice identifier used by the service provider
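
The field list above can be mirrored in a typed container so that missing fields are caught as soon as a response is parsed. This is a sketch only: the Voice class and its from_dict method are hypothetical names, and ignoring unknown keys is an assumption about how you may want to handle fields added later.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Voice:
    """One entry of the response array (field names copied from the reference)."""
    accent: str
    category: str
    engine: str
    gender: str
    id: int
    language: str
    name: str
    sample: str
    service: str
    ssmlHelp: str
    ssmlSupportCategory: str
    voice: str

    @classmethod
    def from_dict(cls, raw: dict) -> "Voice":
        names = {f.name for f in fields(cls)}
        missing = names - set(raw)
        if missing:
            raise KeyError(f"missing fields: {sorted(missing)}")
        # Keys not listed in the reference are ignored (assumption)
        return cls(**{name: raw[name] for name in names})
```

A parsed response can then be converted with a list comprehension such as [Voice.from_dict(item) for item in response.json()].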

Response Examples

[
  {
    "accent": "American accent",
    "category": "standard",
    "engine": "neural",
    "gender": "female",
    "id": 1001,
    "language": "en-US",
    "name": "Joanna",
    "sample": "https://pictory-static.pictorycontent.com/polly/samples/Joanna_100_sample.mp3",
    "service": "aws",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-b",
    "ssmlSupportCategory": "B",
    "voice": "Joanna"
  },
  {
    "accent": "British accent",
    "category": "standard",
    "engine": "WaveNet",
    "gender": "female",
    "id": 1034,
    "language": "en-GB",
    "name": "Fiona",
    "sample": "https://pictory-static.pictorycontent.com/google/samples/Fiona_en-GB-Wavenet-A_FEMALE_updated.mp3",
    "service": "google",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-c",
    "ssmlSupportCategory": "C",
    "voice": "en-GB-Wavenet-A_FEMALE"
  },
  {
    "accent": "Indian accent",
    "category": "standard",
    "engine": "WaveNet",
    "gender": "female",
    "id": 1039,
    "language": "en-IN",
    "name": "Shreya",
    "sample": "https://pictory-static.pictorycontent.com/google/samples/en-IN-Wavenet-A_FEMALE_updated.mp3",
    "service": "google",
    "ssmlHelp": "https://docs.pictory.ai/docs/supported-ssml-tags#category-c",
    "ssmlSupportCategory": "C",
    "voice": "en-IN-Wavenet-A_FEMALE"
  }
]

Code Examples

Replace YOUR_API_KEY with your actual API key, which starts with pictai_.
curl --request GET \
  --url https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks \
  --header 'Authorization: YOUR_API_KEY' \
  --header 'accept: application/json' | python -m json.tool

Usage Notes

Voice Availability: The endpoint returns all available voices across multiple languages, accents, and service providers. Filter the results based on your project requirements.
Sample Audio: Each voice includes a sample URL pointing to an MP3 preview. Use these samples to let users preview voices before selecting.
SSML Support: Different voices support different SSML (Speech Synthesis Markup Language) features. Check the ssmlSupportCategory and ssmlHelp fields to understand what’s available for each voice.
Service Providers:
  • AWS Polly voices (service: "aws") use the “neural” or “standard” engine
  • Google Cloud voices (service: "google") use “WaveNet” or “Neural2” engines
Generally, neural/WaveNet voices sound more natural than standard voices.
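
The provider-to-engine pairing in the notes above can be checked against a fetched voice list. The sketch below separates voices whose engine matches the documented pairing from those that do not. The names DOCUMENTED_ENGINES and split_by_documented_engine are hypothetical, and the exact casing of engine values is assumed to match the response examples.

```python
# Engines listed per provider in the usage notes; casing follows the
# response examples and is an assumption.
DOCUMENTED_ENGINES = {
    "aws": {"neural", "standard"},
    "google": {"WaveNet", "Neural2"},
}


def has_documented_engine(voice: dict) -> bool:
    """True when the voice's engine is one listed for its provider."""
    allowed = DOCUMENTED_ENGINES.get(voice.get("service"), set())
    return voice.get("engine") in allowed


def split_by_documented_engine(voices: list) -> tuple:
    """Return (matching, other) lists, preserving the input order."""
    matching, other = [], []
    for voice in voices:
        (matching if has_documented_engine(voice) else other).append(voice)
    return matching, other
```

Voices that land in the second list are not necessarily invalid; they may simply use an engine or provider that these notes do not mention.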

Common Use Cases

1. List All Available Voices

Retrieve and display all available voices:
import requests

def get_all_voices(api_key):
    """
    Retrieve all available voiceover voices
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        voices = response.json()
        print(f"Total available voices: {len(voices)}\n")

        for voice in voices[:10]:  # Show first 10
            print(f"{voice['name']} - {voice['accent']} ({voice['language']})")
            print(f"  Gender: {voice['gender']}, Engine: {voice['engine']}")
            print(f"  Sample: {voice['sample']}")
            print()

        return voices
    else:
        print(f"Error: {response.status_code}")
        return []

# Example usage
voices = get_all_voices("YOUR_API_KEY")

2. Filter Voices by Language

Get all voices for a specific language:
async function getVoicesByLanguage(apiKey, languageCode) {
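  const response = await fetch(
    'https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks',
    {
      headers: { 'Authorization': apiKey }
    }
  );

  // Fail early on non-2xx responses instead of filtering an error payload
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const allVoices = await response.json();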

  // Filter by language code (e.g., "en-US", "en-GB", "fr-FR")
  const filtered = allVoices.filter(voice => voice.language === languageCode);

  console.log(`Found ${filtered.length} voices for ${languageCode}:`);
  filtered.forEach(voice => {
    console.log(`  - ${voice.name} (${voice.accent}, ${voice.gender})`);
  });

  return filtered;
}

// Example usage (top-level await requires an ES module)
const usVoices = await getVoicesByLanguage('YOUR_API_KEY', 'en-US');

3. Group Voices by Accent

Organize voices by accent type:
from collections import defaultdict
import requests

def group_voices_by_accent(api_key):
    """
    Group voices by their accent
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers)
    response.raise_for_status()  # Stop on HTTP errors before parsing the body
    voices = response.json()

    # Group by accent
    by_accent = defaultdict(list)
    for voice in voices:
        accent = voice.get('accent', 'Unknown')
        by_accent[accent].append(voice)

    # Print grouped results
    for accent, voice_list in sorted(by_accent.items()):
        print(f"\n{accent} ({len(voice_list)} voices):")
        for voice in voice_list[:5]:  # Show first 5 per accent
            print(f"  - {voice['name']} ({voice['gender']}, {voice['engine']})")

    return by_accent

# Example usage
grouped = group_voices_by_accent("YOUR_API_KEY")

4. Find Best Voice Match

Find voices matching specific criteria:
async function findVoiceMatch(apiKey, criteria) {
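  const response = await fetch(
    'https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks',
    {
      headers: { 'Authorization': apiKey }
    }
  );

  // Fail early on non-2xx responses instead of filtering an error payload
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const voices = await response.json();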

  // Filter by multiple criteria
  const matches = voices.filter(voice => {
    return (
      (!criteria.language || voice.language === criteria.language) &&
      (!criteria.gender || voice.gender === criteria.gender) &&
      (!criteria.accent || voice.accent.toLowerCase().includes(criteria.accent.toLowerCase())) &&
      (!criteria.service || voice.service === criteria.service) &&
      (!criteria.engine || voice.engine === criteria.engine)
    );
  });

  console.log(`Found ${matches.length} matching voices:`);
  matches.forEach(voice => {
    console.log(`  - ${voice.name} (ID: ${voice.id})`);
    console.log(`    ${voice.accent}, ${voice.gender}, ${voice.engine}`);
  });

  return matches;
}

// Example usage - find American female neural voices (top-level await requires an ES module)
const matches = await findVoiceMatch('YOUR_API_KEY', {
  language: 'en-US',
  gender: 'female',
  accent: 'American',
  engine: 'neural'
});

5. Create Voice Selection UI Data

Prepare voice data for a user interface:
import requests

def prepare_voice_ui_data(api_key, language_filter=None):
    """
    Prepare voice data structured for UI dropdowns/selectors
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers)
    response.raise_for_status()  # Stop on HTTP errors before parsing the body
    voices = response.json()

    # Filter by language if specified
    if language_filter:
        voices = [v for v in voices if v['language'] == language_filter]

    # Structure for UI
    ui_data = {
        'languages': {},
        'accents': {},
        'genders': ['male', 'female'],
        'engines': set(),
        'voices': []
    }

    for voice in voices:
        # Collect unique languages
        lang = voice['language']
        if lang not in ui_data['languages']:
            ui_data['languages'][lang] = []
        ui_data['languages'][lang].append(voice['name'])

        # Collect unique accents
        accent = voice['accent']
        if accent not in ui_data['accents']:
            ui_data['accents'][accent] = []
        ui_data['accents'][accent].append(voice['name'])

        # Collect engines
        ui_data['engines'].add(voice['engine'])

        # Create simplified voice entry
        ui_data['voices'].append({
            'id': voice['id'],
            'name': voice['name'],
            'label': f"{voice['name']} ({voice['accent']}, {voice['gender']})",
            'language': voice['language'],
            'accent': voice['accent'],
            'gender': voice['gender'],
            'engine': voice['engine'],
            'service': voice['service'],
            'sample': voice['sample']
        })

    ui_data['engines'] = sorted(ui_data['engines'])  # Convert the set to a sorted list

    print(f"Prepared UI data for {len(ui_data['voices'])} voices")
    print(f"Languages: {len(ui_data['languages'])}")
    print(f"Accents: {len(ui_data['accents'])}")
    print(f"Engines: {ui_data['engines']}")

    return ui_data

# Example usage
ui_data = prepare_voice_ui_data("YOUR_API_KEY", language_filter="en-US")

# Access structured data
print("\nSample voice entries:")
for voice in ui_data['voices'][:3]:
    print(f"  {voice['label']} - ID: {voice['id']}")

6. Compare Voice Providers

Analyze and compare voices by service provider:
import requests

def compare_voice_providers(api_key):
    """
    Compare voice offerings between AWS and Google
    """
    url = "https://api.pictory.ai/pictoryapis/v1/voiceovers/tracks"
    headers = {"Authorization": api_key}

    response = requests.get(url, headers=headers)
    response.raise_for_status()  # Stop on HTTP errors before parsing the body
    voices = response.json()

    # Separate by provider
    aws_voices = [v for v in voices if v['service'] == 'aws']
    google_voices = [v for v in voices if v['service'] == 'google']

    print("Voice Provider Comparison:")
    print(f"\nAWS Polly: {len(aws_voices)} voices")
    print(f"  Engines: {set(v['engine'] for v in aws_voices)}")
    print(f"  Languages: {len(set(v['language'] for v in aws_voices))}")
    print(f"  Male: {len([v for v in aws_voices if v['gender'] == 'male'])}")
    print(f"  Female: {len([v for v in aws_voices if v['gender'] == 'female'])}")

    print(f"\nGoogle Cloud: {len(google_voices)} voices")
    print(f"  Engines: {set(v['engine'] for v in google_voices)}")
    print(f"  Languages: {len(set(v['language'] for v in google_voices))}")
    print(f"  Male: {len([v for v in google_voices if v['gender'] == 'male'])}")
    print(f"  Female: {len([v for v in google_voices if v['gender'] == 'female'])}")

    # Language overlap
    aws_langs = set(v['language'] for v in aws_voices)
    google_langs = set(v['language'] for v in google_voices)
    common_langs = aws_langs & google_langs

    print(f"\nLanguages available in both: {len(common_langs)}")
    print(f"AWS only: {len(aws_langs - google_langs)}")
    print(f"Google only: {len(google_langs - aws_langs)}")

    return {
        'aws': aws_voices,
        'google': google_voices,
        'stats': {
            'aws_count': len(aws_voices),
            'google_count': len(google_voices),
            'common_languages': list(common_langs)
        }
    }

# Example usage
comparison = compare_voice_providers("YOUR_API_KEY")

Best Practices

Voice Selection

  1. Preview Before Use: Always use the sample URLs to let users preview voices before selection
  2. Filter by Language: Filter voices by the target language to present relevant options to users
  3. Consider Accent: Match voice accent to your target audience (American, British, Indian, etc.)
  4. Engine Quality: Neural and WaveNet voices generally sound more natural than standard voices
  5. SSML Support: Check ssmlSupportCategory if you need advanced SSML features like custom pronunciation or emphasis

Performance Tips

  • Cache Voice List: Cache the voice list (for example, for 24 hours), since it rarely changes
  • Client-Side Filtering: Fetch all voices once and filter on the client side
  • Lazy Load Samples: Only load audio samples when users preview them
  • Index by ID: Create an ID-to-voice mapping for quick lookups
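
The caching and indexing tips above can be combined in one small helper. The sketch below assumes you supply your own fetch function (for example, a wrapper around the GET request). The VoiceCache class, its parameters, and the injectable clock are hypothetical and exist only to illustrate the idea.

```python
import time


class VoiceCache:
    """Caches the voice list for a fixed TTL and indexes it by voice ID."""

    def __init__(self, fetch, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self._fetch = fetch          # Callable returning a list of voice dicts
        self._ttl = ttl_seconds      # 24 hours by default, per the tip above
        self._clock = clock          # Injectable so expiry can be tested
        self._voices = None
        self._by_id = {}
        self._expires_at = 0.0

    def voices(self):
        """Return the cached list, refetching once the TTL has elapsed."""
        now = self._clock()
        if self._voices is None or now >= self._expires_at:
            self._voices = list(self._fetch())
            self._by_id = {v["id"]: v for v in self._voices}
            self._expires_at = now + self._ttl
        return self._voices

    def get(self, voice_id):
        """Look up one voice by ID, or return None if it is unknown."""
        self.voices()  # Ensure the index is populated and fresh
        return self._by_id.get(voice_id)
```

Because the fetch function is passed in, the same cache works with any HTTP client and can be exercised in tests without network access.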

Common Voice Categories

SSML Support Categories:
  • Category A: Basic SSML support (standard engines)
  • Category B: Advanced SSML support (AWS neural voices)
  • Category C: Full SSML support (Google WaveNet/Neural2 voices)
Engine Types:
  • standard: Basic quality, faster processing
  • neural: High quality, natural-sounding (AWS)
  • WaveNet: Premium quality (Google)
  • Neural2: Latest generation neural voices (Google)
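
If you want to present higher-quality engines first, the engine types above can drive a simple sort. The preference order below is an assumption drawn from the descriptions in this list, not a ranking published by the API. ENGINE_PREFERENCE and sort_by_engine are hypothetical names.

```python
# Assumed ordering, most preferred first; adjust to your own needs.
ENGINE_PREFERENCE = ["Neural2", "WaveNet", "neural", "standard"]


def sort_by_engine(voices, preference=ENGINE_PREFERENCE):
    """Return voices ordered by engine preference; unknown engines go last."""
    rank = {engine: position for position, engine in enumerate(preference)}
    # sorted() is stable, so voices with the same engine keep their input order
    return sorted(voices, key=lambda v: rank.get(v.get("engine"), len(rank)))
```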

Language Coverage

The API provides voices for multiple languages including:
  • English variants: en-US, en-GB, en-AU, en-IN, en-NZ, en-ZA
  • European: fr-FR, fr-CA, de-DE, de-AT, it-IT, es-ES, nl-NL, nl-BE, pt-PT, pt-BR
  • And many more…
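
To see which regional variants are available for each language, you can group the language codes by their primary subtag (the part before the first hyphen). This is a sketch over an already-fetched voice list. The function name language_variants is hypothetical.

```python
from collections import defaultdict


def language_variants(voices):
    """Map each primary language subtag to its sorted regional codes."""
    variants = defaultdict(set)
    for voice in voices:
        code = voice["language"]
        primary = code.split("-")[0].lower()  # "en-US" -> "en"
        variants[primary].add(code)
    return {primary: sorted(codes) for primary, codes in variants.items()}
```

For a list containing en-US, en-GB, and fr-FR voices, the result groups the two English codes under "en" and the French code under "fr".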