Audixa provides a WebSocket endpoint for real-time text-to-speech streaming, allowing you to play audio as it is being generated.
The endpoint supports both wav and mp3 output formats.

Connection

Endpoint
wss://api.audixa.ai/v3/tts/stream

Authentication

Pass your API key as a query parameter:
wss://api.audixa.ai/v3/tts/stream?api_key=YOUR_API_KEY
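As a minimal sketch, the connection URI can be assembled in Python with the standard library. Reading the key from an `AUDIXA_API_KEY` environment variable is an assumption for this example, not something the API requires:

```python
import os
import urllib.parse

def build_stream_uri(api_key: str) -> str:
    # URL-encode the key in case it contains characters unsafe in a query string.
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"wss://api.audixa.ai/v3/tts/stream?{query}"

# AUDIXA_API_KEY is an assumed variable name for this sketch.
uri = build_stream_uri(os.environ.get("AUDIXA_API_KEY", "YOUR_API_KEY"))
```

Keeping the key in an environment variable avoids committing it to source control.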

Protocol

1. Send Request

Once connected, send a JSON payload with your generation details:
{
  "text": "Hello, world! This is a streaming test.",  // Required
  "voice_id": "am_ethan",                               // Required
  "model": "base",                                  // Optional, default: "base"
  "speed": 1.0,                                     // Optional, default: 1.0
  "audio_format": "wav",                            // Optional: "wav" or "mp3"
  "cfg_weight": 2.5,                                // Optional: Advanced model only
  "exaggeration": 0.5                               // Optional: Advanced model only
}
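The payload above can be built with a small helper that enforces the two required fields and fills in the documented defaults. A sketch; `build_request` is a hypothetical name, not part of the API:

```python
import json

# Defaults taken from the request-field comments above.
DEFAULTS = {"model": "base", "speed": 1.0, "audio_format": "wav"}

def build_request(text: str, voice_id: str, **options) -> str:
    """Return the JSON payload to send after connecting."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    return json.dumps({**DEFAULTS, "text": text, "voice_id": voice_id, **options})
```

Usage: `await websocket.send(build_request("Hello, world!", "am_ethan"))`.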

2. Receive Messages

The server will send a mix of JSON control messages and binary audio data.

A. Started Message (JSON)

Sent when processing begins.
{
  "type": "started",
  "generation_id": "gen_abc123",
  "channel": "generation:gen_abc123"
}
B. Audio Chunks (Binary)

Raw binary audio data streams immediately after the "started" message. See Audio Format below for decoding details. Append these chunks to your audio buffer or play them directly.

C. Completion (JSON)

Sent when the stream ends.
{
  "type": "completed",
  "audio_url": "https://cdn.audixa.ai/..." // Backup complete file URL
}
D. Error (JSON)
{
  "type": "error",
  "message": "Invalid API Key"
}
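A receive loop only needs to distinguish binary frames (audio) from text frames (JSON control events). A minimal classification sketch, assuming a client library that delivers text frames as `str` and binary frames as `bytes` (as the `websockets` package does); `classify` is a hypothetical helper name:

```python
import json

def classify(msg):
    # Binary frames carry raw audio; text frames carry JSON control events.
    if isinstance(msg, (bytes, bytearray)):
        return "audio", msg
    event = json.loads(msg)
    return event["type"], event  # "started", "completed", or "error"
```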

Audio Format

All binary audio chunks are streamed in a standardized format, consistent across all models (base, advanced, turbo):
Property        Value
Encoding        Raw PCM (uncompressed)
Sample Format   32-bit float (float32)
Sample Rate     24,000 Hz
Channels        1 (Mono)
Byte Order      Little-endian
The streamed chunks are raw PCM float32 — not WAV or MP3. Each sample is a 4-byte IEEE 754 float, typically in the range [-1.0, 1.0]. You must decode them as float32, not int16, or the audio will sound like static noise.
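Two direct consequences of this format: a chunk's play time is its byte length divided by 4 bytes per sample and 24,000 samples per second, and float32 samples can be down-converted if your playback path only accepts 16-bit integer PCM. A sketch; the helper names are illustrative:

```python
import numpy as np

BYTES_PER_SAMPLE = 4   # float32
SAMPLE_RATE = 24_000   # Hz, mono

def chunk_duration_seconds(chunk: bytes) -> float:
    return len(chunk) / BYTES_PER_SAMPLE / SAMPLE_RATE

def to_int16(chunk: bytes) -> np.ndarray:
    # "<f4" = little-endian float32, matching the table above.
    samples = np.frombuffer(chunk, dtype="<f4")
    return (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
```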
The completed message includes an audio_url pointing to the fully encoded file (WAV/MP3) on the CDN. You can use this as a fallback or for saving a complete copy.
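If you also want the encoded file, the audio_url from the completed message can be fetched with the standard library. A sketch; `download_fallback` is a hypothetical helper name:

```python
import urllib.request

def download_fallback(audio_url: str, path: str = "complete.wav") -> str:
    # Fetch the fully encoded WAV/MP3 that the completed message points at.
    urllib.request.urlretrieve(audio_url, path)
    return path
```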

Examples

Real-Time Playback

Play audio chunks as they arrive for the lowest perceived latency.
import websockets
import json
import asyncio
import numpy as np
import sounddevice as sd

# Audio format — must match server output
RATE = 24000        # 24 kHz
CHANNELS = 1        # Mono
DTYPE = 'float32'   # 32-bit float PCM

async def stream_tts():
    audio_stream = sd.RawOutputStream(
        samplerate=RATE,
        channels=CHANNELS,
        dtype=DTYPE,
    )
    audio_stream.start()

    uri = "wss://api.audixa.ai/v3/tts/stream?api_key=YOUR_KEY"
    try:
        async with websockets.connect(uri) as websocket:
            await websocket.send(json.dumps({
                "text": "Hello from the stream!",
                "voice_id": "am_ethan",
                "model": "base"
            }))

            while True:
                msg = await websocket.recv()
                if isinstance(msg, str):
                    data = json.loads(msg)
                    if data["type"] == "started":
                        print("Stream started!")
                    elif data["type"] == "completed":
                        print(f"Done! CDN URL: {data.get('audio_url')}")
                        break
                    elif data["type"] == "error":
                        print(f"Error: {data['message']}")
                        break
                else:
                    # Decode float32 PCM and play immediately
                    audio_stream.write(np.frombuffer(msg, dtype=DTYPE))
    finally:
        audio_stream.stop()
        audio_stream.close()

asyncio.run(stream_tts())

Save to File

Collect all audio chunks and save a complete file when the stream ends.
import websockets
import json
import asyncio
import numpy as np
import soundfile as sf

async def stream_tts():
    uri = "wss://api.audixa.ai/v3/tts/stream?api_key=YOUR_KEY"
    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({
            "text": "Hello from the stream!",
            "voice_id": "am_ethan",
            "model": "base"
        }))

        audio_chunks = []
        while True:
            msg = await websocket.recv()
            if isinstance(msg, str):
                data = json.loads(msg)
                if data["type"] == "completed":
                    print(f"Done! CDN URL: {data.get('audio_url')}")
                    break
                elif data["type"] == "error":
                    print(f"Error: {data['message']}")
                    break
            else:
                # Decode float32 PCM
                audio_chunks.append(np.frombuffer(msg, dtype=np.float32))

        # Combine chunks and save as WAV (guard against an empty stream,
        # e.g. when an error arrives before any audio)
        if audio_chunks:
            audio = np.concatenate(audio_chunks)
            sf.write("streamed_output.wav", audio, 24000)
            print("Saved to streamed_output.wav")

asyncio.run(stream_tts())

Other Languages & CLI

WebSocket is a bidirectional protocol, so standard HTTP tools like cURL cannot be used for streaming. For CLI testing you can use websocat, though binary audio handling is limited in terminal tools. For any language with a WebSocket client, the decoding rules are the same:
  • Connect to wss://api.audixa.ai/v3/tts/stream?api_key=YOUR_KEY
  • Send a JSON request, then listen for messages
  • Text messages → JSON control events (started, completed, error)
  • Binary messages → Raw PCM audio: float32, little-endian, 24 kHz, mono (4 bytes per sample)
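One portability caveat worth handling in any client: the docs above don't state whether a 4-byte sample can be split across two WebSocket frames, so a defensive client can buffer the leftover bytes until the next frame arrives. A sketch under that assumption; `SampleAligner` is a hypothetical helper, and if frames are always sample-aligned it is a harmless no-op:

```python
class SampleAligner:
    """Buffer leftover bytes so downstream decoding always sees whole
    4-byte float32 samples, even if a sample straddles two frames."""

    def __init__(self, bytes_per_sample: int = 4):
        self._bps = bytes_per_sample
        self._pending = b""

    def push(self, chunk: bytes) -> bytes:
        data = self._pending + chunk
        cut = len(data) - (len(data) % self._bps)
        self._pending = data[cut:]
        return data[:cut]
```

Feed each binary frame through `push()` and decode only what it returns.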

WebSocket Close Codes

The server closes the connection when it cannot continue, for one of the following reasons:
  • The server received data it cannot accept (e.g., malformed JSON, missing fields, or validation errors).
  • Authentication failed (e.g., invalid or missing API key).
  • An unexpected condition prevented the server from fulfilling the request (e.g., database or Redis connection failure).
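These three failure descriptions match the standard RFC 6455 close codes 1003 (unsupported data), 1008 (policy violation), and 1011 (internal error). Assuming the server uses those standard codes, a client can map them to actionable hints; the mapping and helper name are assumptions for this sketch, not confirmed by the API docs:

```python
# Assumed mapping to standard RFC 6455 close codes; verify against the server.
CLOSE_CODE_HINTS = {
    1003: "Request rejected: malformed JSON, missing fields, or validation error.",
    1008: "Authentication failed: check your API key.",
    1011: "Server-side failure: safe to retry later.",
}

def describe_close(code: int) -> str:
    return CLOSE_CODE_HINTS.get(code, f"Connection closed with code {code}.")
```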