Endpoints
Create Generation
Start a text-to-speech generation job
POST
Submits a text-to-speech task and returns a generation_id for tracking. The audio is generated asynchronously; use the Get Generation endpoint to check when it is ready.
Endpoint
Request
The text to convert to speech. Maximum 50,000 characters (varies by plan).
The model tier to use.
base: Standard high-quality voices. Lowest cost/latency.
advanced: Premium voices. Supports cloning and higher expressiveness.
Playback speed multiplier. Range: 0.5 to 2.0.
Advanced Model Settings
Controls how strictly the model follows the text/style. Range: 1.0 to 5.0.
Controls emotional fluctuation/expressiveness. Range: 0.0 to 1.0.
Output format: wav or mp3.
Optional language code for the input text. The accepted values depend on model:
- Base model uses single-letter codes: a (American English), b (British English), j (Japanese), z (Mandarin Chinese), e (Spanish), f (French), h (Hindi), i (Italian), p (Brazilian Portuguese).
- Advanced model uses ISO 639-1 codes: en, ar, da, de, el, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh.
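The ranges and language codes above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the argument names (text, model, speed, output_format, language) are hypothetical, since the request field names are not shown here, and the 50,000-character limit is passed as a parameter because it varies by plan.

```python
# Client-side pre-check of the documented request constraints.
# All argument names are hypothetical; only the allowed values come from the text above.

BASE_LANGUAGE_CODES = {"a", "b", "j", "z", "e", "f", "h", "i", "p"}
ADVANCED_LANGUAGE_CODES = {
    "en", "ar", "da", "de", "el", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "ms", "nl", "no", "pl", "pt", "ru", "sv", "sw", "tr", "zh",
}
LANGUAGE_CODES_BY_MODEL = {
    "base": BASE_LANGUAGE_CODES,
    "advanced": ADVANCED_LANGUAGE_CODES,
}


def validate_request(text, model, speed=None, output_format=None,
                     language=None, max_chars=50_000):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []

    if not text:
        problems.append("text is empty")
    elif len(text) > max_chars:
        problems.append(f"text exceeds {max_chars} characters")

    if model not in LANGUAGE_CODES_BY_MODEL:
        problems.append("model must be 'base' or 'advanced'")

    # Optional values are only checked when the caller supplies them.
    if speed is not None and not 0.5 <= speed <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")

    if output_format is not None and output_format not in ("wav", "mp3"):
        problems.append("output format must be 'wav' or 'mp3'")

    # The accepted language codes depend on the model tier.
    if language is not None and model in LANGUAGE_CODES_BY_MODEL:
        if language not in LANGUAGE_CODES_BY_MODEL[model]:
            problems.append(f"language '{language}' is not valid for model '{model}'")

    return problems
```

For example, validate_request("Hello", "advanced", language="a") reports a problem because "a" is a base-model code, while the same call with model "base" passes.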
Response
Initial status: IN_QUEUE.
The input text that was submitted.
Voice ID used for generation.
Human-readable voice name.
TTS model used (base or advanced).
Number of tokens consumed.
Cost in USD (set when generation completes).
Payment method used: API_WALLET or CREDITS_BALANCE.
URL to download the generated audio (when status=COMPLETED).
Error details if status=FAILED.
ISO 8601 timestamp of creation.
ISO 8601 timestamp when processing started.
ISO 8601 timestamp when processing completed.
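Because the job starts as IN_QUEUE and finishes as COMPLETED or FAILED, a client typically polls until one of the two final states appears. The sketch below assumes the caller supplies a fetch_generation function (hypothetical) that wraps the Get Generation endpoint and returns the response as a dict with a status key; the polling interval and timeout are illustrative defaults, not values from the API.

```python
import time

# The two final states named in the response description.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_generation(fetch_generation, generation_id, interval=2.0,
                        timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the generation reaches COMPLETED or FAILED.

    fetch_generation is a caller-supplied (hypothetical) function that calls
    the Get Generation endpoint and returns the parsed response as a dict.
    Any status other than COMPLETED or FAILED is treated as still pending.
    """
    deadline = clock() + timeout
    while True:
        generation = fetch_generation(generation_id)
        if generation.get("status") in TERMINAL_STATUSES:
            return generation
        if clock() >= deadline:
            raise TimeoutError(f"generation {generation_id} did not finish in time")
        sleep(interval)
```

The caller still has to inspect the returned status: the download URL is only expected on COMPLETED, and the error details only on FAILED.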
Error Responses
400 Bad Request
Invalid parameters (e.g. invalid model, voice not found).
402 Payment Required
Insufficient balance.
429 Too Many Requests
Rate limit exceeded.
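Of the three errors above, only 429 is usually worth retrying automatically; 400 and 402 need a corrected request or a funded balance first. The sketch below shows one way to retry on 429 with exponential backoff. It assumes a caller-supplied send function (hypothetical) that performs the POST and returns a (status_code, body) pair; the retry count and delays are illustrative, and the API's actual rate-limit policy is not described here.

```python
import time

# Suggested client reaction for each documented error status.
ERROR_ACTIONS = {
    400: "fix the request parameters",
    402: "add funds before retrying",
    429: "wait and retry",
}


def send_with_retry(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() again after a 429, doubling the delay each time.

    send is a caller-supplied (hypothetical) function that submits the
    request and returns (status_code, body). Any status other than 429
    is returned to the caller immediately, as is the last 429 once the
    retries are used up.
    """
    for attempt in range(max_retries + 1):
        status_code, body = send()
        if status_code != 429 or attempt == max_retries:
            return status_code, body
        sleep(base_delay * 2 ** attempt)
```

After the call returns, ERROR_ACTIONS.get(status_code) gives a short hint for the documented error codes and None for anything else.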