
Inference Methods

infer()

Generate speech from a single text input.

audio = tts.infer(
    text: str,
    ref_audio: str = None,
    ref_codes: Tensor = None,
    ref_text: str = None,
    voice: dict = None,
    max_chars: int = 256,
    silence_p: float = 0.15,
    crossfade_p: float = 0.0,
    temperature: float = 1.0,
    top_k: int = 50,
    skip_normalize: bool = False,
)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | str | Text to synthesize |
| ref_audio | str | Path to reference audio for voice cloning |
| ref_codes | Tensor | Pre-encoded reference codes |
| ref_text | str | Transcript of the reference audio |
| voice | dict | Preset voice dict from get_preset_voice() |
| max_chars | int | Max characters per chunk (default 256) |
| silence_p | float | Silence duration between chunks in seconds (default 0.15) |
| crossfade_p | float | Crossfade duration between chunks (default 0.0) |
| temperature | float | Sampling temperature (higher = more variation) |
| top_k | int | Top-k sampling (limits token choices) |
| skip_normalize | bool | Skip text normalization |

Returns

numpy.ndarray — Audio waveform at 24 kHz.

Voice Priority

  1. voice dict (from preset)
  2. ref_audio + ref_text (voice cloning)
  3. ref_codes + ref_text (pre-encoded)
  4. Default preset voice (if available)
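
A minimal sketch of single-utterance voice cloning, assuming tts is an already-initialized Vieneu instance; the reference path, transcript, and output filename are placeholders, not files shipped with the library:

# Voice cloning from a reference clip (paths are illustrative)
audio = tts.infer(
    text="Hello from Vieneu.",
    ref_audio="speaker.wav",              # reference recording to clone
    ref_text="Transcript of speaker.wav.",
    temperature=0.8,                      # lower = more stable delivery
    top_k=50,
)
tts.save(audio, "hello.wav")              # 24 kHz WAV output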

infer_batch()

Generate speech for multiple texts.

audios = tts.infer_batch(
    texts: List[str],
    # Same voice/sampling params as infer()
)

Returns

List[numpy.ndarray] — List of audio waveforms.

PyTorch mode uses true batched generation; GGUF mode processes the texts sequentially.
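
A sketch of batch synthesis, assuming tts is an initialized instance and that the voice keywords are passed through exactly as for infer(); the paths are illustrative:

texts = [
    "First sentence to synthesize.",
    "Second sentence to synthesize.",
]
audios = tts.infer_batch(
    texts,
    ref_audio="speaker.wav",              # same voice options as infer()
    ref_text="Transcript of speaker.wav.",
)
for i, audio in enumerate(audios):
    tts.save(audio, f"out_{i}.wav")       # one WAV per input text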


infer_stream()

Generate speech with streaming output (GGUF only).

for chunk in tts.infer_stream(
    text: str,
    # Same voice/sampling params as infer()
):
    play_audio(chunk)

Yields

numpy.ndarray — Audio chunks as they're generated.
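
A sketch of consuming the stream while keeping a full copy of the result; concatenating the chunks with NumPy is just one way to do this, and the output path is illustrative:

import numpy as np

chunks = []
for chunk in tts.infer_stream(text="Streaming example."):
    chunks.append(chunk)          # play or forward each chunk as it arrives
audio = np.concatenate(chunks)    # reassemble the full waveform
tts.save(audio, "stream.wav")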


save()

Save audio to a WAV file.

tts.save(audio: numpy.ndarray, output_path: str)

encode_reference()

Encode reference audio to reusable codes.

codes = tts.encode_reference(ref_audio_path: str)

Returns

torch.Tensor — Encoded speech codes.
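
Encoding the reference once and reusing the codes avoids re-encoding the audio on every call. A sketch, assuming the matching ref_text transcript is supplied alongside the codes as described under Voice Priority; the path and texts are placeholders:

codes = tts.encode_reference("speaker.wav")    # illustrative path
for i, line in enumerate(["First line.", "Second line."]):
    audio = tts.infer(
        text=line,
        ref_codes=codes,                       # reuse pre-encoded voice
        ref_text="Transcript of speaker.wav.",
    )
    tts.save(audio, f"line_{i}.wav")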


close()

Release model resources and free memory.

tts.close()

Or use the context manager:

with Vieneu() as tts:
    audio = tts.infer(text="...")