B. Audio Chunks (Binary)
Raw binary audio data streams immediately after the “started” message. See Audio Format below for decoding details. Append these chunks to your audio buffer or play them directly.
C. Completion (JSON)
Sent when the stream ends.
All binary audio chunks are streamed in a standardized format, consistent across all models (base, advanced, turbo):
Property        Value
Encoding        Raw PCM (uncompressed)
Sample Format   32-bit float (float32)
Sample Rate     24,000 Hz
Channels        1 (Mono)
Byte Order      Little-endian
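The format values above fix the data rate of the stream, which helps when sizing buffers or estimating how much audio has arrived. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative and are not part of the API.

```python
# Values taken from the format table: float32 (4 bytes), 24,000 Hz, mono.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 4
CHANNELS = 1

# Bytes of raw PCM produced per second of audio.
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def duration_seconds(num_bytes):
    """Return the playback duration represented by num_bytes of raw PCM."""
    return num_bytes / BYTES_PER_SECOND
```

For example, a buffer holding 96,000 bytes corresponds to one second of audio.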
The streamed chunks are raw PCM float32 — not WAV or MP3. Each sample is a 4-byte IEEE 754 float, typically in the range [-1.0, 1.0]. You must decode them as float32, not int16, or the audio will sound like static noise.
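As an illustration of the decoding rule, here is a minimal sketch that uses only the Python standard library. It assumes a chunk boundary might not fall on a 4-byte sample boundary (the source does not say either way), so it carries any trailing bytes over to the next chunk. The helper names `decode_chunk` and `float_to_int16` are hypothetical, and the int16 conversion is only needed if your playback device expects 16-bit samples.

```python
import struct

BYTES_PER_SAMPLE = 4  # one float32 sample


def decode_chunk(chunk, leftover=b""):
    """Decode little-endian float32 PCM bytes.

    Returns (samples, tail), where tail holds any trailing bytes that do
    not yet form a whole sample. Pass tail as leftover on the next call.
    ASSUMPTION: chunks may split a sample; the API may never do this.
    """
    data = leftover + chunk
    usable = len(data) - (len(data) % BYTES_PER_SAMPLE)
    count = usable // BYTES_PER_SAMPLE
    # "<" selects little-endian, "f" is a 4-byte IEEE 754 float.
    samples = list(struct.unpack(f"<{count}f", data[:usable]))
    return samples, data[usable:]


def float_to_int16(samples):
    """Clamp samples to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
```

Reading the same bytes with an int16 format instead would reinterpret each float as two unrelated integers, which is what produces the static described above.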
The completed message includes an audio_url pointing to the fully encoded file (WAV/MP3) on the CDN. You can use this as a fallback or for saving a complete copy.
WebSocket is a bidirectional protocol, so standard HTTP tools like cURL cannot be used for streaming. For CLI testing, you can use websocat, though binary audio handling is limited in terminal tools.
For any language with a WebSocket client, the decoding rules are the same:
Connect to wss://api.audixa.ai/v3/tts/stream?api_key=YOUR_KEY
Send a JSON request, then listen for messages
Text messages → JSON control events (started, completed, error)
Binary messages → Raw PCM audio: float32, little-endian, 24 kHz, mono (4 bytes per sample)
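The text/binary split above can be sketched as a small dispatch function that is independent of any particular WebSocket library. The source names the control events (started, completed, error) and the audio_url field, but not the JSON key that carries the event name, so the "type" key below is an assumption, as are the helper names and the layout of the state dictionary.

```python
import json


def new_state():
    """Create the mutable state shared across messages of one stream."""
    return {"pcm": bytearray(), "done": False, "audio_url": None, "error": None}


def handle_message(message, state):
    """Route one WebSocket message and return a short label for it.

    Binary messages are appended to the PCM buffer (float32, little-endian,
    24 kHz, mono). Text messages are parsed as JSON control events.
    """
    if isinstance(message, (bytes, bytearray)):
        state["pcm"].extend(message)
        return "audio"

    event = json.loads(message)
    kind = event.get("type")  # ASSUMPTION: the event name lives under "type"
    if kind == "completed":
        state["done"] = True
        state["audio_url"] = event.get("audio_url")
    elif kind == "error":
        state["done"] = True
        state["error"] = event
    return kind
```

Many clients already distinguish the two frame types for you. In the Python websockets package, for instance, text frames arrive as str and binary frames as bytes, so each received message can be passed directly to a function like this one.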