PickSkill
← Back

ybouhjira/claude-code-tts

README.md
Rendered from GitHub raw
View raw ↗

Claude Code TTS Plugin

Go Version License CI codecov MCP

A Text-to-Speech MCP server plugin for Claude Code that converts text to speech using OpenAI's TTS API. Get audio feedback from Claude as you work!

Demo

Features

  • Deterministic Auto-Speak: Every Claude response is automatically spoken (via Stop hook)
  • 6 High-Quality Voices: alloy, echo, fable, onyx, nova, shimmer
  • Worker Pool Architecture: Non-blocking queue with concurrent processing
  • Mutex-Protected Playback: One audio plays at a time, no overlapping
  • Cross-Platform: macOS (afplay), Linux (mpv/ffplay/mpg123), Windows (PowerShell)
  • Standalone CLI: speak-text binary for direct TTS without MCP

Quick Install

# One-liner installation
curl -fsSL https://raw.githubusercontent.com/ybouhjira/claude-code-tts/main/install.sh | bash

Or install manually:

git clone https://github.com/ybouhjira/claude-code-tts.git ~/.claude/plugins/claude-code-tts
cd ~/.claude/plugins/claude-code-tts
make install

Requirements

  • Go 1.21+ (for building from source)
  • OpenAI API Key with TTS access
  • Audio Player:
    • macOS: afplay (built-in)
    • Linux: mpv, ffplay, or mpg123
    • Windows: PowerShell (built-in)

Configuration

Set your OpenAI API key:

export OPENAI_API_KEY="sk-..."

Or add to your shell profile (~/.zshrc or ~/.bashrc).

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Claude Code                              │
│                         │                                    │
│                    MCP Protocol                              │
│                         │                                    │
│  ┌──────────────────────▼──────────────────────────────┐    │
│  │              TTS MCP Server (Go)                     │    │
│  │  ┌─────────────────────────────────────────────┐    │    │
│  │  │              Tool Handlers                   │    │    │
│  │  │   speak(text, voice)  │  tts_status()       │    │    │
│  │  └─────────────┬─────────┴─────────────────────┘    │    │
│  │                │                                     │    │
│  │  ┌─────────────▼─────────────────────────────┐      │    │
│  │  │           Worker Pool (2 workers)          │      │    │
│  │  │  ┌─────────┐    ┌─────────────────────┐   │      │    │
│  │  │  │ Job     │───►│ Queue (50 slots)    │   │      │    │
│  │  │  │ Submit  │    └──────────┬──────────┘   │      │    │
│  │  │  └─────────┘               │              │      │    │
│  │  │                   ┌────────▼────────┐     │      │    │
│  │  │                   │ Worker 1 │ 2    │     │      │    │
│  │  │                   └────────┬────────┘     │      │    │
│  │  └────────────────────────────│──────────────┘      │    │
│  │                               │                      │    │
│  │  ┌────────────────────────────▼──────────────────┐  │    │
│  │  │              OpenAI TTS API                    │  │    │
│  │  │         POST /v1/audio/speech                  │  │    │
│  │  │         Model: tts-1                           │  │    │
│  │  └───────────────────┬────────────────────────────┘  │    │
│  │                      │                               │    │
│  │  ┌───────────────────▼────────────────────────────┐  │    │
│  │  │         Audio Player (Mutex Protected)          │  │    │
│  │  │   macOS: afplay │ Linux: mpv │ Win: PowerShell  │  │    │
│  │  └─────────────────────────────────────────────────┘  │    │
│  └──────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Usage

speak(text, voice)

Convert text to speech and play it aloud.

Parameters:

Parameter Type Required Description
text string Yes Text to speak (max 4096 chars)
voice string No Voice to use (default: alloy)

Available Voices:

Voice Description
alloy Neutral, balanced
echo Male, warm
fable British accent
onyx Deep male
nova Female, friendly
shimmer Soft female

Example:

Use the speak tool to say "Build completed successfully!" with the nova voice.

tts_status()

Get the current status of the TTS system.

Returns:

{
  "worker_count": 2,
  "queue_size": 50,
  "queue_pending": 0,
  "total_processed": 15,
  "total_failed": 0,
  "is_playing": false,
  "recent_jobs": [...]
}

Automatic TTS (Deterministic)

This plugin includes a Stop hook that automatically speaks the first sentence of every Claude response. No configuration needed - it just works.

How it works:

Claude responds → Stop hook fires → First sentence extracted → Audio plays

The hook runs in the background and won't block Claude's responses.

speak-text CLI

A standalone binary for direct TTS without going through MCP:

# Basic usage
speak-text "Hello world"
 
# With voice selection
speak-text -voice onyx "Error occurred"

Located at ~/.claude/plugins/claude-code-tts/bin/speak-text after installation.

Project Structure

claude-code-tts/
├── cmd/
│   ├── tts-server/
│   │   └── main.go           # MCP server entry point
│   └── speak-text/
│       └── main.go           # Standalone CLI binary
├── hooks/
│   └── auto-speak.sh         # Stop hook for deterministic TTS
├── internal/
│   ├── audio/
│   │   └── player.go         # Cross-platform audio playback
│   ├── server/
│   │   ├── server.go         # MCP server & tool handlers
│   │   └── worker.go         # Worker pool implementation
│   └── tts/
│       └── openai.go         # OpenAI TTS client
├── plugin.json                # Plugin metadata + hook config
├── Makefile                   # Build automation
└── install.sh                 # One-liner installer

Building from Source

# Clone the repository
git clone https://github.com/ybouhjira/claude-code-tts.git
cd claude-code-tts
 
# Build
make build
 
# Install to Claude Code plugins
make install
 
# Run tests
make test

Troubleshooting

"OPENAI_API_KEY environment variable is required"

Set your OpenAI API key:

export OPENAI_API_KEY="sk-..."

"No suitable audio player found on Linux"

Install one of: mpv, ffplay, or mpg123:

# Ubuntu/Debian
sudo apt install mpv
 
# Fedora
sudo dnf install mpv
 
# Arch
sudo pacman -S mpv

Audio not playing on macOS

Check that afplay works:

# Test with a sample audio file
afplay /System/Library/Sounds/Ping.aiff

Queue is full

The default queue size is 50. If you're hitting this limit:

  1. Wait for current jobs to complete
  2. Check tts_status() to see pending jobs
  3. The queue will drain as jobs are processed

High latency

  • OpenAI TTS API typically takes 1-3 seconds per request
  • Audio files must download completely before playing
  • Consider keeping messages short for faster feedback

API Costs

This plugin uses OpenAI's tts-1 model:

  • Cost: ~$0.015 per 1,000 characters
  • Example: "Hello, world!" (13 chars) = ~$0.0002

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Credits