Real-time Streaming

VieNeu-TTS supports ultra-low-latency streaming: audio playback can start before the entire sentence has been generated.

Performance

Latency: under 300 ms to the first audio chunk on modern i3/i5 CPUs
Optimized for: GGUF backend on CPU
Sample rate: 24 kHz
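
To check the first-chunk latency on your own hardware, a small timing helper can be wrapped around any chunk generator. The sketch below is illustrative only: `time_to_first_chunk` is a hypothetical helper, not part of the VieNeu-TTS SDK, and it measures only the time until the first chunk is yielded, not playback delay.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_chunk(
    make_stream: Callable[[], Iterable[T]],
) -> Tuple[float, T, Iterator[T]]:
    """Start a stream and return (latency_seconds, first_chunk, remaining_chunks).

    `make_stream` is called inside the timed region so that any setup done
    when the stream is created counts toward the measured latency.
    """
    start = time.perf_counter()
    stream = iter(make_stream())
    first = next(stream)  # blocks until the first chunk has been produced
    latency = time.perf_counter() - start
    return latency, first, stream
```

With the SDK this might be called as `time_to_first_chunk(lambda: tts.infer_stream(text="..."))`; the remaining chunks can then be consumed from the returned iterator as usual.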

Web Demo

uv run vieneu-stream

Open http://localhost:8001 in your browser.

SDK Streaming

from vieneu import Vieneu

tts = Vieneu()

for chunk in tts.infer_stream(text="Một đoạn văn rất dài..."):
    # Each chunk is a numpy array of audio samples
    play_audio(chunk)  # Your audio playback function
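
If you have no playback function at hand, one simple way to consume the stream is to write the chunks to a WAV file as they arrive. The following is a minimal sketch using only the standard library. It assumes each chunk is a one-dimensional (mono) sequence of float samples in the range [-1.0, 1.0]; that sample format is an assumption, not something stated above. `float_to_pcm16` and `write_stream_to_wav` are hypothetical helper names.

```python
import struct
import wave
from typing import Iterable, Sequence

SAMPLE_RATE = 24_000  # 24 kHz, as listed under Performance


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples (assumed to lie in [-1.0, 1.0]) to little-endian 16-bit PCM."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against out-of-range values
        ints.append(int(round(clipped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def write_stream_to_wav(
    chunks: Iterable[Sequence[float]],
    path: str,
    sample_rate: int = SAMPLE_RATE,
) -> int:
    """Write chunks to a mono 16-bit WAV file as they arrive; return the sample count."""
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        for chunk in chunks:
            data = float_to_pcm16(chunk)
            wav.writeframes(data)
            total += len(data) // 2
    return total
```

Under these assumptions, `write_stream_to_wav(tts.infer_stream(text="..."), "out.wav")` would save the audio incrementally instead of buffering the whole utterance in memory.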

Parameters

tts.infer_stream(
    text="Your text here",
    max_chars=256,        # Max characters per internal chunk
    temperature=1.0,      # Sampling temperature
    top_k=50,             # Top-k sampling
    voice=voice_data,     # Optional preset voice
    ref_audio="ref.wav",  # Optional reference for cloning
    ref_text="...",       # Transcript of reference
)

How It Works

1. The input text is split into chunks of at most max_chars characters.
2. Each chunk is phonemized.
3. The GGUF model generates tokens incrementally.
4. Every N tokens, a partial decode produces an audio chunk.
5. Overlap-add smooths the boundaries between audio chunks.
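
The text-splitting step can be illustrated with a short sketch. The library's actual splitting rules are not described here, so the code below is only one plausible approach: it splits on sentence-ending punctuation and greedily packs sentences into chunks of at most `max_chars` characters, falling back to word boundaries for over-long sentences. `split_text` is a hypothetical helper, not the SDK's implementation.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 256) -> List[str]:
    """Greedily pack sentences (or words) into chunks of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    # Break any over-long sentence into words so it can still be packed.
    units: List[str] = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            units.append(sentence)
        else:
            units.extend(sentence.split())

    chunks: List[str] = []
    current = ""
    for unit in units:
        candidate = unit if not current else current + " " + unit
        # A single word longer than max_chars is kept whole rather than cut.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks aligned with sentence boundaries where possible is a design choice made for this sketch; it tends to avoid cutting prosody mid-phrase.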

Streaming is tuned through the following configurable parameters:

streaming_frames_per_chunk: 25 frames per audio chunk
streaming_overlap_frames: 1 frame of overlap for smooth transitions
streaming_lookforward: 10 frames of lookahead for quality
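
The idea behind smoothing chunk boundaries with an overlap can be sketched as a linear crossfade. This is not the library's decoder code. It assumes each new chunk begins with `overlap` samples that cover the same stretch of audio as the tail of the previous output, and it expresses the overlap in samples because the number of samples per frame is not given here. `crossfade_append` is a hypothetical helper.

```python
from typing import List, Sequence


def crossfade_append(output: List[float], chunk: Sequence[float], overlap: int) -> None:
    """Append `chunk` to `output`, blending the overlapping region linearly.

    The last `overlap` samples of `output` fade out while the first
    `overlap` samples of `chunk` fade in; the rest of `chunk` is appended.
    """
    # Never overlap more samples than either side actually has.
    overlap = max(0, min(overlap, len(output), len(chunk)))
    start = len(output) - overlap
    for i in range(overlap):
        fade_in = (i + 1) / (overlap + 1)  # weight of the new chunk, rising toward 1
        output[start + i] = output[start + i] * (1.0 - fade_in) + float(chunk[i]) * fade_in
    output.extend(float(s) for s in chunk[overlap:])
```

Because the two weights always sum to 1, a signal that is identical in both overlapping regions passes through unchanged, and a mismatch at the boundary is spread across the overlap instead of producing an audible click.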
