openai-tts-python

Text-to-speech conversion using OpenAI's TTS API for generating high-quality, natural-sounding audio. Supports 6 voices (alloy, echo, fable, onyx, nova, shimmer), speed control (0.25x-4.0x), HD quality model, multiple output formats (mp3, opus, aac, flac), and automatic text chunking for long content (4096 char limit per request). Use when: (1) User requests audio/voice output with triggers like "read this to me", "convert to audio", "generate speech", "text to speech", "tts", "narrate", "speak", or when keywords "openai tts", "voice", "podcast" appear. (2) Content needs to be spoken rather than read (multitasking, accessibility). (3) User wants specific voice preferences like "alloy", "echo", "fable", "onyx", "nova", "shimmer" or speed adjustments.

OpenAI TTS Skill

Text-to-speech conversion using OpenAI's TTS API for generating high-quality, natural-sounding audio from text.

Installation

pip install openai pydub
 
# ffmpeg is required by pydub for audio processing
# macOS
brew install ffmpeg
 
# Ubuntu/Debian
sudo apt install ffmpeg

Setup

Set your OpenAI API key:

export OPENAI_API_KEY="sk-..."
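Checking for the key up front gives a clearer error than a failed API call. The following is a minimal sketch of such a check; `get_api_key` is a hypothetical helper and is not part of `openai_tts.py`.

```python
import os


def get_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the script"
        )
    return key
```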

Usage

Command Line

# Basic usage
python openai_tts.py "Hello world" -o output.mp3
 
# From file
python openai_tts.py -f article.txt -o article.mp3
 
# With voice selection
python openai_tts.py "Your text" -o output.mp3 --voice nova
 
# High quality
python openai_tts.py "Your text" -o output.mp3 --model tts-1-hd
 
# Adjust speed (0.25 to 4.0)
python openai_tts.py "Your text" -o output.mp3 --speed 1.5
 
# Pipe input
echo "Hello world" | python openai_tts.py -o output.mp3
 
# Verbose mode
python openai_tts.py "Test" -o test.mp3 -v
 
# List available voices
python openai_tts.py --list-voices
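The flags above could be wired up with `argparse` roughly as follows. This is a sketch, not the script's actual source: the long option names `--file`, `--output`, and `--verbose`, the `tts-1` and `1.0` defaults, and the helper names `build_parser` and `resolve_text` are all assumptions made for illustration.

```python
import argparse
import sys

VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def build_parser():
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="openai_tts.py")
    parser.add_argument("text", nargs="?", help="text to convert to speech")
    parser.add_argument("-f", "--file", help="read text from a file")
    parser.add_argument("-o", "--output", help="output audio path")
    parser.add_argument("--voice", choices=VOICES, default="onyx")
    # Default model and speed are assumptions, not confirmed by the README
    parser.add_argument("--model", choices=["tts-1", "tts-1-hd"], default="tts-1")
    parser.add_argument("--speed", type=float, default=1.0)
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--list-voices", action="store_true")
    return parser


def resolve_text(args, stdin=sys.stdin):
    """Pick the input text: positional argument, then -f file, then stdin."""
    if args.text:
        return args.text
    if args.file:
        with open(args.file, encoding="utf-8") as fh:
            return fh.read()
    return stdin.read()
```

Falling back to stdin last is what would make the piped form (`echo "Hello world" | python openai_tts.py -o output.mp3`) work without an extra flag.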

As Module

from openai_tts import generate_tts
 
# Basic
generate_tts("Hello world", "output.mp3")
 
# With options
generate_tts(
    text="Your text here",
    output_path="output.mp3",
    voice="nova",
    model="tts-1-hd",
    response_format="mp3",
    speed=1.25,  # 0.25 to 4.0
    verbose=True
)
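For orientation, a single request with the official `openai` Python SDK looks roughly like the sketch below. It is not the implementation of `generate_tts`: the function name `synthesize_once` and its defaults are assumptions, and it handles only text that fits in one request.

```python
from pathlib import Path

MAX_CHARS = 4096  # per-request input limit stated in this README


def synthesize_once(text, output_path, voice="onyx", model="tts-1",
                    response_format="mp3", speed=1.0):
    """Send one TTS request and stream the audio to output_path."""
    if not text.strip():
        raise ValueError("text is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} chars; limit is {MAX_CHARS}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    # Imported lazily so the validation above works without the SDK installed
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
        response_format=response_format,
        speed=speed,
    ) as response:
        response.stream_to_file(Path(output_path))
    return Path(output_path)
```

Validating length and speed before the network call avoids spending a request on input the API would reject.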

Voices

| Voice | Type | Description |
|---|---|---|
| alloy | Neutral | Balanced, versatile |
| echo | Male | Warm, conversational |
| fable | Neutral | Expressive, storytelling |
| onyx | Male | Deep, authoritative (default) |
| nova | Female | Friendly, upbeat |
| shimmer | Female | Clear, professional |

Models

| Model | Quality | Speed | Cost |
|---|---|---|---|
| tts-1 | Standard | Fast | $0.015/1K chars |
| tts-1-hd | High Definition | Slower | $0.030/1K chars |
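Because pricing is per character, the cost of a job can be estimated before any request is sent. The sketch below uses the rates from the table above, which may change over time; `estimate_cost` is a hypothetical helper, not part of the module.

```python
# Rates copied from the Models table (USD per 1,000 input characters)
PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_cost(text, model="tts-1"):
    """Estimate the cost in USD of synthesizing `text` with `model`."""
    if model not in PRICE_PER_1K_CHARS:
        raise ValueError(f"unknown model: {model}")
    return len(text) / 1000 * PRICE_PER_1K_CHARS[model]


if __name__ == "__main__":
    article = "a" * 10_000  # stand-in for a 10,000-character article
    for name in PRICE_PER_1K_CHARS:
        print(f"{name}: ${estimate_cost(article, name):.2f}")
```

At these rates, a 10,000-character article costs about $0.15 with `tts-1` and $0.30 with `tts-1-hd`.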

Features

  • Auto-chunking: Automatically splits text longer than 4096 characters
  • Multiple formats: mp3, opus, aac, flac
  • 6 voices: Male, female, and neutral options
  • Pipe support: Read from stdin
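The auto-chunking feature can be pictured as splitting on sentence boundaries and packing sentences into chunks of at most 4096 characters. The sketch below shows one way to do that; it is an assumption about the approach, not the script's actual algorithm, and `chunk_text` is a hypothetical name.

```python
import re

MAX_CHARS = 4096  # per-request input limit


def chunk_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own request and the resulting audio segments joined in order, which is presumably why `pydub` and `ffmpeg` are listed as dependencies.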

Output Formats

| Format | Description |
|---|---|
| mp3 | Default, widely compatible |
| opus | Smaller file size, good quality |
| aac | Apple/iOS compatible |
| flac | Lossless, larger files |
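When calling the module, it can be convenient to derive `response_format` from the output filename so the two never disagree. This is a sketch under that assumption; `format_from_path` is a hypothetical helper, and whether the script infers the format this way is not stated here.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac")


def format_from_path(output_path, default="mp3"):
    """Infer the response format from the file extension of output_path."""
    suffix = Path(output_path).suffix.lower().lstrip(".")
    if not suffix:
        return default  # no extension given, fall back to the default format
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format '{suffix}'; choose from {', '.join(SUPPORTED_FORMATS)}"
        )
    return suffix
```

For example, `format_from_path("talk.flac")` gives `"flac"`, which can be passed as `response_format` to `generate_tts`.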

License

MIT