PickSkill
← Back

Qwen3-tts

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

README.md
Rendered from GitHub raw
View raw ↗

Qwen3-TTS Skill

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice model.

Installation

cd /home/brewuser/.nvm/versions/node/v24.13.0/lib/node_modules/clawdbot/skills/public/qwen-tts
bash scripts/setup.sh

This will:

  • Create a Python 3.12 virtual environment in ./venv
  • Install qwen-tts package and dependencies (~500MB)
  • First synthesis auto-downloads ~1.7GB model

Quick Test

scripts/tts.py "Ciao, questo è un test!" -l Italian -o test.wav

Play the audio:

aplay test.wav  # Linux
# or
ffplay test.wav  # Cross-platform

Usage

See SKILL.md for complete documentation.

Basic:

scripts/tts.py "Your text" -l Italian -o output.wav

List speakers:

scripts/tts.py --list-speakers

With emotion:

scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian

Integration with OpenClaw

The skill is automatically available to OpenClaw once installed. OpenClaw can call:

cd skills/public/qwen-tts && scripts/tts.py "Text" -l Italian -o /tmp/audio.wav

Output path is printed to stdout (last line).

Requirements

  • Python 3.10-3.12 (tested with 3.12)
  • ~2.2GB disk space (500MB venv + 1.7GB model)
  • GPU recommended (CPU works but slower)

License

Uses Qwen3-TTS under Apache 2.0 license. Check model card for details: https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice