speak-tts
emzod/speak · updated Apr 8, 2026
Real-time text-to-speech with voice cloning on Apple Silicon, entirely on-device.
- Supports multiple input sources (text files, markdown, stdin, web articles, PDFs) and output modes (streaming, file save, playback, or both)
- Voice cloning from 10–30 second WAV samples at 24000 Hz mono; includes emotion tags like [laugh], [sigh], and [gasp] for audible effects
- Batch processing with auto-chunking for long documents, concatenation utilities, and resume capability for interrupted generations
speak - Talk to your Claude!
Give your agent the ability to speak to you in real time. Local text-to-speech, voice cloning, and audio generation on Apple Silicon.
Prerequisites
| Requirement | Check | Install |
|---|---|---|
| Apple Silicon Mac | `uname -m` → `arm64` | Intel not supported |
| macOS 12.0+ | `sw_vers` | - |
| sox | `which sox` | `brew install sox` |
| ffmpeg | `which ffmpeg` | `brew install ffmpeg` |
| poppler (PDF) | `which pdftotext` | `brew install poppler` |
Input Sources
| Source | Example |
|---|---|
| Text file | `speak article.txt` |
| Markdown | `speak doc.md` |
| Direct string | `speak "Hello"` |
| Clipboard | `pbpaste \| speak` |
| Stdin | `cat file.txt \| speak` |
Web Articles
```shell
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
```
Converting Formats
| Format | Convert Command |
|---|---|
| PDF | `pdftotext doc.pdf doc.txt` |
| DOCX | `textutil -convert txt doc.docx` |
| HTML | `pandoc -f html -t plain doc.html > doc.txt` |
Output Modes
| Goal | Command |
|---|---|
| Save for later | speak text.txt --output file.wav |
| Listen now (streaming) | speak text.txt --stream |
| Listen now (complete) | speak text.txt --play |
| Both | speak text.txt --stream --output file.wav |
Default Behavior
```shell
speak article.txt  # → ~/Audio/speak/article.wav (no playback)
speak "Hello"      # → ~/Audio/speak/speak_<timestamp>.wav
```
Directory Auto-Creation
| Directory | Auto-Created? |
|---|---|
| `~/Audio/speak/` | ✓ Yes |
| `~/.chatter/voices/` | ✗ No |
| Custom directories | ✗ No |
Always create custom directories first:
```shell
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
```
Voice Cloning
Voice cloning generates speech that matches your vocal characteristics (pitch, tone, cadence) from a short recording.
Quality Expectations
- Output captures general voice characteristics but is not a perfect replica
- Quality depends heavily on sample quality
- 15-25 seconds is optimal (10s minimum, 30s maximum)
Recording Your Voice
Using QuickTime:
- Open QuickTime Player → File → New Audio Recording
- Record 20 seconds of clear speech
- File → Export As → Audio Only (.m4a)
- Convert to WAV (see below)
Using sox (command line):
```shell
mkdir -p ~/.chatter/voices/   # this directory is not auto-created
# -d = use default microphone
# Recording starts immediately and stops after 25 seconds
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
```
Converting to Required Format
Voice samples MUST be: WAV, 24000 Hz, mono, 10-30 seconds.
```shell
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav
# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav
# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav
# Check sample properties
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
```
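To check a sample automatically before using it, a machine-readable ffprobe query can be piped into a small shell function. This is a minimal sketch: `check_voice_props` is a hypothetical helper, not part of speak, and it assumes ffprobe prints `key=value` lines when given `-of default=noprint_wrappers=1`. It returns non-zero and explains which requirement failed.

```shell
# Hypothetical helper (not part of speak): validate a voice sample from ffprobe output.
# Usage:
#   ffprobe -v error -select_streams a:0 \
#     -show_entries stream=sample_rate,channels:format=duration \
#     -of default=noprint_wrappers=1 ~/.chatter/voices/my_voice.wav | check_voice_props
check_voice_props() {
  local key value rate="" channels="" duration="" status=0
  # Parse key=value lines; the extra test also handles a last line without a newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case $key in
      sample_rate) rate=$value ;;
      channels)    channels=$value ;;
      duration)    duration=$value ;;
    esac
  done
  if [ "$rate" != "24000" ]; then
    echo "sample rate is ${rate:-unknown}, expected 24000 Hz" >&2
    status=1
  fi
  if [ "$channels" != "1" ]; then
    echo "channels is ${channels:-unknown}, expected 1 (mono)" >&2
    status=1
  fi
  # Shell arithmetic is integer-only, so awk compares the fractional duration.
  if ! awk -v d="${duration:-0}" 'BEGIN { exit !(d >= 10 && d <= 30) }'; then
    echo "duration is ${duration:-unknown}s, expected 10-30 s" >&2
    status=1
  fi
  return "$status"
}
```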
Using Your Voice
```shell
# Create directory
mkdir -p ~/.chatter/voices/
# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav
# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream
# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
```
Path requirements:
- ✓ Works: `~/.chatter/voices/my_voice.wav` (tilde expanded by shell)
- ✓ Works: `/Users/name/.chatter/voices/my_voice.wav`
- ✗ Fails: `my_voice.wav` (relative path)
- ✗ Fails: `./voices/my_voice.wav` (relative path)
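If a sample lives at a relative path, one workaround is to resolve it to an absolute path before passing it to `--voice`. The `abs_path` function below is a hypothetical helper sketched in POSIX shell, not a speak command; it assumes the sample's directory exists.

```shell
# Hypothetical helper (not part of speak): print an absolute path for a file
# so that --voice never receives a relative one.
abs_path() {
  local dir
  # CDPATH= stops cd from echoing the directory it changed to.
  dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd -P) || return 1
  printf '%s/%s\n' "$dir" "$(basename -- "$1")"
}

# Usage:
# speak "Testing my voice" --voice "$(abs_path ./voices/my_voice.wav)" --stream
```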
Voice Sample Tips
| Good Sample | Bad Sample |
|---|---|
| Quiet room | Background noise |
| Natural pace | Rushed or monotone |
| Clear diction | Mumbling |
| Varied content | Repetitive phrases |
Default Voice
When --voice is omitted, a built-in default voice is used:
```shell
speak "Hello world" --stream  # Uses default voice
```
Emotion Tags
Tags produce audible effects (actual sounds), not spoken words:
```shell
speak "[sigh] Monday again." --stream
# Output: (sigh sound) "Monday again."
```
| Tag | Effect |
|---|---|
| `[laugh]` | Laughter |
| `[chuckle]` | Light chuckle |
| `[sigh]` | Sighing |
| `[gasp]` | Gasping |
| `[groan]` | Groaning |
| `[clear throat]` | Throat clearing |
| `[cough]` | Coughing |
| `[crying]` | Crying |
| `[singing]` | Sung speech |
NOT supported: [pause], [whisper] (ignored)
For pauses: Use punctuation: "Wait... let me think."
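Because unsupported tags are silently ignored, it can help to scan a script before generating audio. A minimal sketch, assuming tags are lowercase words in square brackets; `lint_tags` is a hypothetical helper whose allow-list is copied from the table above.

```shell
# Hypothetical helper (not part of speak): report bracketed tags in a script
# (read from stdin) that are not in the supported-tag table.
lint_tags() {
  # grep -o prints each [tag] on its own line; sort -u drops repeats.
  grep -o '\[[a-z ]*]' | sort -u | while IFS= read -r tag; do
    case $tag in
      '[laugh]'|'[chuckle]'|'[sigh]'|'[gasp]'|'[groan]'|'[clear throat]'|'[cough]'|'[crying]'|'[singing]') ;;
      *) printf 'unsupported tag: %s\n' "$tag" ;;
    esac
  done
}

# Usage: lint_tags < script.txt
```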
Batch Processing
```shell
mkdir -p ~/Audio/book/
speak ch01.txt ch02.txt ch03.txt --output-dir ~/Audio/book/
# Creates: ch01.wav, ch02.wav, ch03.wav
# With auto-chunking (for long files)
speak chapters/*.txt --output-dir ~/Audio/book/ --auto-chunk
# Skip completed files
speak chapters/*.txt --output-dir ~/Audio/book/ --skip-existing
```
Auto-Chunk Behavior
When using --auto-chunk with batch processing:
- Each input file is chunked independently
- Chunks are generated and automatically concatenated per file
- Final output: one .wav per input file (e.g., ch01.wav)
- Intermediate chunks deleted (unless --keep-chunks)
You don't need to manually concatenate chunks — only concatenate final chapter files.
Concatenating Audio
```shell
# Explicit order (recommended)
speak concat ch01.wav ch02.wav ch03.wav --output book.wav
# Glob pattern (REQUIRES zero-padded filenames)
speak concat audiobook/*.wav --output book.wav
```
Zero-Padding Rules
Critical for correct concatenation order:
| Number of files | Correct | Wrong |
|---|---|---|
| 1-9 | 01, 02, ..., 09 | 1, 2, ..., 9 |
| 10-99 | 01, 02, ..., 99 | 1, 10, 2, ... |
| 100+ | 001, 002, ..., 999 | 1, 100, 2, ... |
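Why: shell glob expansion sorts names as text, not as numbers, so unpadded files expand as 1, 10, 2 instead of 1, 2, 10. Zero-padded names (01, 02, 10) sort in numeric order.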
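Files that already exist with unpadded names can be renamed in place. This is a sketch of a hypothetical `zero_pad` helper (not part of speak): it pads the first run of digits in each file name to a fixed width and uses `mv -n` so it never overwrites an existing file.

```shell
# Hypothetical helper (not part of speak): zero-pad the first number in each
# file name in DIR.  zero_pad DIR [WIDTH]  (WIDTH defaults to 2, up to 99 files)
zero_pad() {
  local dir=$1 width=${2:-2} f base prefix rest num suffix digits new
  for f in "$dir"/*; do
    base=${f##*/}                   # e.g. ch1.wav
    prefix=${base%%[0-9]*}          # text before the first digit: ch
    rest=${base#"$prefix"}          # 1.wav
    num=${rest%%[!0-9]*}            # leading digits: 1
    [ -n "$num" ] || continue       # no number in this name
    suffix=${rest#"$num"}           # .wav
    digits=${num#"${num%%[!0]*}"}   # strip leading zeros so printf reads decimal
    new=$(printf "%s%0${width}d%s" "$prefix" "${digits:-0}" "$suffix")
    if [ "$new" != "$base" ]; then
      mv -n -- "$f" "$dir/$new"     # -n: never overwrite an existing file
    fi
  done
}

# Usage: zero_pad audiobook 2 && speak concat audiobook/*.wav --output book.wav
```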
PDF to Audiobook (Complete Workflow)
Step 1: Find Chapter Boundaries
```shell
# Preview table of contents
pdftotext -f 1 -l 5 textbook.pdf toc.txt
cat toc.txt  # Note chapter page numbers
# Or search for "Chapter" markers (grep -n prints text line numbers, not page numbers)
pdftotext textbook.pdf - | grep -n "Chapter"
```
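To get page numbers rather than text line numbers, one approach relies on pdftotext ending each page with a form feed (`\f`) by default. Treating the form feed as awk's record separator makes `NR` the page number. `chapter_pages` is a hypothetical helper; a table-of-contents page that lists every chapter will also match, so expect to discard that hit.

```shell
# Hypothetical helper (not part of speak): print the numbers of pages whose
# text matches a pattern. Reads pdftotext output on stdin; each \f ends a page.
chapter_pages() {
  awk -v pat="${1:-Chapter}" 'BEGIN { RS = "\f" } $0 ~ pat { print NR }'
}

# Usage: pdftotext textbook.pdf - | chapter_pages "Chapter"
```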
Step 2: Extract Chapters (Zero-Padded!)
```shell
# For a 100-page book with ~10 chapters
pdftotext -f 1 -l 12 -layout textbook.pdf ch01.txt
pdftotext -f 13 -l 25 -layout textbook.pdf ch02.txt
pdftotext -f 26 -l 38 -layout textbook.pdf ch03.txt
# ... continue for all chapters
```
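The per-chapter commands can also be generated from a list of chapter start pages, which keeps the zero-padded names consistent. A sketch with a hypothetical `extract_chapters` helper; it assumes each chapter ends on the page before the next one starts, and the last chapter ends on the page you pass as the book's final page.

```shell
# Hypothetical helper (not part of speak):
#   extract_chapters PDF LAST_PAGE START1 START2 ...
# Writes ch01.txt, ch02.txt, ... with one pdftotext call per chapter.
extract_chapters() {
  local pdf=$1 last=$2 n=0 first end
  shift 2
  while [ "$#" -gt 0 ]; do
    n=$((n + 1))
    first=$1
    shift
    # A chapter ends one page before the next start page, or at LAST_PAGE.
    if [ "$#" -gt 0 ]; then end=$(($1 - 1)); else end=$last; fi
    pdftotext -f "$first" -l "$end" -layout "$pdf" "$(printf 'ch%02d.txt' "$n")" || return 1
  done
}
```

With the page numbers above, `extract_chapters textbook.pdf 38 1 13 26` runs the same three pdftotext commands; for the whole book, pass the final page and every chapter's start page.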
Step 3: Estimate Time
```shell
speak --estimate ch*.txt
# Shows: total audio duration, generation time, storage needed
# Quick estimates:
# 1 page ≈ 2 min audio ≈ 1 min generation
# 100 pages ≈ 200 min audio ≈ 100 min generation ≈ 500 MB
```
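The quick-estimate arithmetic fits in a tiny function for planning before running `speak --estimate`. This hypothetical helper applies only the rules of thumb above plus the ~2.5 MB per audio minute from the Performance table; real figures depend on the text and the machine.

```shell
# Hypothetical helper (not part of speak): rough estimates from a page count.
# 1 page ~ 2 min audio ~ 1 min generation; storage ~2.5 MB per audio minute.
estimate_pages() {
  local pages=$1
  local audio_min=$((pages * 2))
  local gen_min=$pages
  local storage_mb=$((audio_min * 5 / 2))   # 2.5 MB/min using integer math
  printf 'audio: ~%d min, generation: ~%d min, storage: ~%d MB\n' \
    "$audio_min" "$gen_min" "$storage_mb"
}

# Usage: estimate_pages 100
```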
Step 4: Generate Audio
```shell
mkdir -p audiobook/
speak ch01.txt ch02.txt ch03.txt --output-dir audiobook/ --auto-chunk
# Creates: audiobook/ch01.wav, audiobook/ch02.wav, audiobook/ch03.wav
```
Step 5: Concatenate
```shell
speak concat audiobook/ch01.wav audiobook/ch02.wav audiobook/ch03.wav --output complete_audiobook.wav
# Or with glob (only if zero-padded):
speak concat audiobook/ch*.wav --output complete_audiobook.wav
```
PDF Troubleshooting
| Issue | Solution |
|---|---|
| Empty/garbled text | Likely a scanned PDF; use OCR: `brew install tesseract` |
| Wrong encoding | Try: `pdftotext -enc UTF-8 doc.pdf` |
| Unsure whether extraction worked | Count words: `pdftotext doc.pdf - \| wc -w` (should be >100) |
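The word-count check can be wrapped as a guard that fails when extraction produced too little text, for example before starting a long batch. `check_extracted` is a hypothetical helper; the 100-word threshold is the rule of thumb from the table above.

```shell
# Hypothetical helper (not part of speak): fail if stdin has 100 words or fewer.
check_extracted() {
  local words
  words=$(wc -w | tr -d ' ')   # tr removes the padding BSD wc adds
  if [ "$words" -gt 100 ]; then
    echo "ok: $words words"
  else
    echo "only $words words extracted; the PDF may be scanned (try OCR)" >&2
    return 1
  fi
}

# Usage: pdftotext doc.pdf - | check_extracted
```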
Multi-Voice Content
```shell
mkdir -p podcast/scripts podcast/wav
echo "Welcome to the show." > podcast/scripts/01_host.txt
echo "Thanks for having me." > podcast/scripts/02_guest.txt
speak podcast/scripts/01_host.txt --voice ~/.chatter/voices/host.wav --output podcast/wav/01.wav
speak podcast/scripts/02_guest.txt --voice ~/.chatter/voices/guest.wav --output podcast/wav/02.wav
speak concat podcast/wav/01.wav podcast/wav/02.wav --output podcast.wav
```
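With more than a few segments, the same pattern can be driven by file names. This sketch assumes a hypothetical naming convention, `NN_<speaker>.txt` with a matching voice sample at `~/.chatter/voices/<speaker>.wav`; `render_podcast` is not part of speak, and it only calls the `speak` and `speak concat` forms shown above.

```shell
# Hypothetical helper (not part of speak):
#   render_podcast SCRIPT_DIR WAV_DIR OUTPUT
# Renders each NN_<speaker>.txt with ~/.chatter/voices/<speaker>.wav, then
# concatenates the segments in file-name order.
render_podcast() {
  local scripts=$1 wavdir=$2 out=$3 f base num speaker
  set --                                   # reuse "$@" to collect segment paths
  mkdir -p "$wavdir" || return 1
  for f in "$scripts"/*.txt; do
    [ -e "$f" ] || continue                # glob matched nothing
    base=$(basename "$f" .txt)             # e.g. 01_host
    num=${base%%_*}                        # 01
    speaker=${base#*_}                     # host
    speak "$f" --voice "$HOME/.chatter/voices/$speaker.wav" \
      --output "$wavdir/$num.wav" || return 1
    set -- "$@" "$wavdir/$num.wav"
  done
  [ "$#" -gt 0 ] || return 1               # nothing was rendered
  speak concat "$@" --output "$out"
}

# Usage: render_podcast podcast/scripts podcast/wav podcast.wav
```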
Options Reference
| Option | Description | Default |
|---|---|---|
| `--stream` | Stream as it generates | false |
| `--play` | Play after generation completes | false |
| `--output <path>` | Output file | `~/Audio/speak/` |
| `--output-dir <dir>` | Batch output directory | - |
| `--voice <path>` | Voice sample (full path) | default |
| `--timeout <sec>` | Timeout per file, in seconds | 300 |
| `--auto-chunk` | Split long documents | false |
| `--chunk-size <n>` | Characters per chunk | 6000 |
| `--resume <file>` | Resume from manifest | - |
| `--keep-chunks` | Keep intermediate files | false |
| `--skip-existing` | Skip if output exists | false |
| `--estimate` | Show duration estimate | false |
| `--dry-run` | Preview only | false |
| `--quiet` | Suppress output | false |
Commands
| Command | Description |
|---|---|
| `speak setup` | Set up environment |
| `speak health` | Check system status |
| `speak models` | List TTS models |
| `speak concat` | Concatenate audio |
| `speak daemon kill` | Stop TTS server |
| `speak config` | Show configuration |
Performance
| Metric | Value |
|---|---|
| Cold start | ~4-8s |
| Warm start | ~3-8s |
| Speed | 0.3-0.5x RTF (faster than real-time) |
| Storage | ~2.5 MB/min, ~150 MB/hour |
Resume Capability
For interrupted long generations:
```shell
# Single file with auto-chunk: use --resume
speak long.txt --auto-chunk --output book.wav
# If interrupted, the manifest is saved at ~/Audio/speak/manifest.json
speak --resume ~/Audio/speak/manifest.json

# Batch processing: use --skip-existing
speak ch*.txt --output-dir audiobook/ --auto-chunk
# If interrupted, re-run the same command with --skip-existing:
speak ch*.txt --output-dir audiobook/ --auto-chunk --skip-existing
```
Common Errors
| Error | Cause | Solution |
|---|---|---|
| "Voice file not found" | Relative path | Use full path: ~/.chatter/voices/x.wav |
| "Invalid WAV format" | Wrong specs | Convert: ffmpeg -i in.wav -ar 24000 -ac 1 out.wav |
| "Voice sample too short" | <10 seconds | Record 15-25 seconds |
| "Output directory doesn't exist" | Not created | mkdir -p dirname/ |
| "sox not found" | Not installed | brew install sox |
| Scrambled concat order | Non-zero-padded | Use 01, 02, not 1, 2 |
| Timeout | >5 min generation | Use --auto-chunk or --timeout 600 |
| "Server not running" | Stale daemon | speak daemon kill && speak health |
Setup
How to use speak-tts on Cursor
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with `node --version`)
- Active project directory or workspace where you want to add speak-tts
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches speak-tts from GitHub repository emzod/speak and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate speak-tts. Access the skill through slash commands (e.g., /speak-tts) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.