Voice Management

Preset Voices

Models include built-in voices defined in voices.json.

list_preset_voices()

voices = tts.list_preset_voices()
# Returns: List[tuple[str, str]] → [(description, voice_id), ...]

get_preset_voice()

voice = tts.get_preset_voice(voice_name: str = None)
# Returns: dict → {"codes": Tensor, "text": str}

If voice_name is None, returns the default voice.
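
The fallback can be pictured as a lookup against the voices.json structure documented at the end of this page. The sketch below is an illustration only, not the library's implementation. `resolve_preset` is a hypothetical helper, and raising `KeyError` for an unknown name is an assumption, since the page does not say how unknown names are handled. It also leaves `codes` as a plain list, whereas `get_preset_voice()` is documented to return a Tensor.

```python
from typing import Any, Optional


def resolve_preset(voices: dict[str, Any], voice_name: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: pick a preset from a parsed voices.json dict."""
    # Fall back to the file's default voice when no name is given.
    name = voice_name if voice_name is not None else voices["default_voice"]
    presets = voices["presets"]
    if name not in presets:
        # Assumption: unknown names are an error; the real library may differ.
        raise KeyError(f"unknown preset voice: {name!r}")
    preset = presets[name]
    # Mirror the documented return shape: {"codes": ..., "text": ...}.
    return {"codes": preset["codes"], "text": preset["text"]}


# Made-up data for illustration; these voice names are not real presets.
example = {
    "default_voice": "voice_a",
    "presets": {
        "voice_a": {"description": "First voice", "text": "Hello", "codes": [1, 2, 3]},
        "voice_b": {"description": "Second voice", "text": "Hi", "codes": [4, 5]},
    },
}
print(resolve_preset(example)["text"])              # Hello
print(resolve_preset(example, "voice_b")["codes"])  # [4, 5]
```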

Using a Preset Voice

voices = tts.list_preset_voices()
for desc, vid in voices:
    print(f"{vid}: {desc}")

voice = tts.get_preset_voice("bac_si_tuyen")
audio = tts.infer(text="Chào bạn!", voice=voice)

LoRA Adapters

Load custom fine-tuned LoRA weights on top of the base model. Adapters are supported in PyTorch mode only, not with GGUF models.

load_lora_adapter()

success = tts.load_lora_adapter(
    lora_repo_id: str,  # HuggingFace repo with LoRA weights
    hf_token: str = None,
)

Loading a LoRA adapter also loads its voices.json if available, replacing existing voices.

unload_lora_adapter()

success = tts.unload_lora_adapter()

Restores original model weights.
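
Because loading and unloading come as a pair, one possible pattern is to wrap them in a context manager so the original weights are restored even when inference fails. This is a sketch under stated assumptions: `lora_adapter` is a hypothetical helper, not part of the library, and it assumes the returned `success` value is truthy when the call succeeds.

```python
from contextlib import contextmanager


@contextmanager
def lora_adapter(tts, lora_repo_id, hf_token=None):
    """Hypothetical helper: apply a LoRA adapter for the duration of a with-block."""
    # Assumption: load_lora_adapter() returns a truthy value on success.
    if not tts.load_lora_adapter(lora_repo_id, hf_token=hf_token):
        raise RuntimeError(f"could not load LoRA adapter from {lora_repo_id!r}")
    try:
        yield tts
    finally:
        # Always restore the original weights, even if the block raised.
        tts.unload_lora_adapter()
```

With an existing `tts` instance, a call to `tts.infer(...)` placed inside `with lora_adapter(tts, repo_id):` would run with the adapter applied, where `repo_id` stands in for your own HuggingFace repo. Note that the preset voices may differ inside the block, since loading an adapter can replace them.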

voices.json Format

{
  "default_voice": "voice_name",
  "presets": {
    "voice_name": {
      "description": "Description of the voice",
      "text": "Transcript of the reference audio",
      "codes": [42, 17, 89, 55, ...]
    }
  }
}
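
When writing a voices.json by hand, a quick structural check can catch missing fields before the file is loaded. The sketch below uses only the standard library. `load_voices` is a hypothetical helper, and its rules (integer codes, `default_voice` naming an existing preset) are assumptions inferred from the example above, not documented requirements. The sample shortens `codes` to four values for illustration.

```python
import json


def load_voices(raw: str) -> dict:
    """Hypothetical helper: parse voices.json text and check the documented fields."""
    data = json.loads(raw)
    presets = data["presets"]
    for name, preset in presets.items():
        for key in ("description", "text", "codes"):
            if key not in preset:
                raise ValueError(f"preset {name!r} is missing {key!r}")
        # Assumption: codes are plain integers, as in the example above.
        if not all(isinstance(c, int) for c in preset["codes"]):
            raise ValueError(f"preset {name!r} has non-integer codes")
    # Assumption: default_voice must name one of the presets.
    if data["default_voice"] not in presets:
        raise ValueError("default_voice does not name a preset")
    return data


sample = """
{
  "default_voice": "voice_name",
  "presets": {
    "voice_name": {
      "description": "Description of the voice",
      "text": "Transcript of the reference audio",
      "codes": [42, 17, 89, 55]
    }
  }
}
"""
voices = load_voices(sample)
print(voices["default_voice"])  # voice_name
```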