text-to-speech
inferen-sh/skills · updated Apr 8, 2026
Run multiple text-to-speech models through the inference.sh CLI for voiceovers, podcasts, and accessibility.
- Six models available: ElevenLabs (premium, 22+ voices, 32 languages), DIA TTS (conversational), Kokoro TTS (fast), Chatterbox, Higgs Audio (emotional control), and VibeVoice (long-form podcasts)
- Core capabilities include basic speech synthesis, expressive speech with emotion control, and conversational dialogue generation
- Combine with video tools like OmniHuman to create talking head videos
Text-to-Speech
Convert text to natural speech via inference.sh CLI.

Quick Start
Requires the inference.sh CLI (infsh). See the install instructions.
infsh login
# Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'
Available Models
| Model | App ID | Best For |
|---|---|---|
| ElevenLabs TTS | elevenlabs/tts | Premium quality, 22+ voices, 32 languages |
| DIA TTS | infsh/dia-tts | Conversational, expressive |
| Kokoro TTS | infsh/kokoro-tts | Fast, natural |
| Chatterbox | infsh/chatterbox | General purpose |
| Higgs Audio | infsh/higgs-audio | Emotional control |
| VibeVoice | infsh/vibevoice | Podcasts, long-form |
Browse All Audio Apps
infsh app list --category audio
Examples
Basic Text-to-Speech
infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our tutorial."}'
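The inline JSON form above breaks as soon as the text contains a double quote or a backslash. The sketch below builds the payload from arbitrary text instead. `json_escape` and `tts_payload` are hypothetical helper names, not part of infsh, and the escaping covers only backslashes and double quotes (newlines and control characters are not handled).

```shell
#!/usr/bin/env bash
# json_escape: hypothetical helper, not part of the infsh CLI.
# Escapes backslashes first, then double quotes. Newlines are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload: wraps the escaped text in the {"text": ...} shape used above.
tts_payload() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

payload="$(tts_payload "It's called \"Kokoro\", and it's fast.")"
echo "$payload"

# Only call the CLI when it is actually installed.
if command -v infsh >/dev/null 2>&1; then
  infsh app run infsh/kokoro-tts --input "$payload"
fi
```

For scripts with line breaks or longer text, saving the input to a file, as in the examples that follow, is likely the safer route.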
Conversational TTS with DIA
infsh app sample infsh/dia-tts --save input.json
# Edit input.json:
# {
# "text": "Hey! How are you doing today? I'm really excited to share this with you.",
# "voice": "conversational"
# }
infsh app run infsh/dia-tts --input input.json
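When the edit step needs to run unattended, for example in a build script, the input file can be written directly rather than edited by hand. This is a minimal sketch that uses only the two fields shown in the comment above. `write_dia_input` is a hypothetical helper name, and the sample generated by `infsh app sample` may contain other fields that this file omits.

```shell
#!/usr/bin/env bash
# write_dia_input: hypothetical helper that writes the example input to a path.
# The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything.
write_dia_input() {
  cat > "$1" <<'EOF'
{
  "text": "Hey! How are you doing today? I'm really excited to share this with you.",
  "voice": "conversational"
}
EOF
}

write_dia_input input.json

# Only call the CLI when it is actually installed.
if command -v infsh >/dev/null 2>&1; then
  infsh app run infsh/dia-tts --input input.json
fi
```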
Long-form Audio (Podcasts)
infsh app sample infsh/vibevoice --save input.json
# Edit input.json with your podcast script
infsh app run infsh/vibevoice --input input.json
Expressive Speech with Higgs
infsh app sample infsh/higgs-audio --save input.json
# {
# "text": "This is absolutely incredible!",
# "emotion": "excited"
# }
infsh app run infsh/higgs-audio --input input.json
Use Cases
- Voiceovers: Product demos, explainer videos
- Audiobooks: Convert text to spoken word
- Podcasts: Generate podcast episodes
- Accessibility: Make content accessible
- IVR: Phone system voice prompts
- Video Narration: Add narration to videos
Combine with Video
Generate speech, then create a talking head video:
# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Your script here"}' > speech.json
# 2. Use the audio URL with OmniHuman for avatar video
infsh app run bytedance/omnihuman-1-5 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "<audio-url-from-step-1>"
}'
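Step 2 needs the audio URL from the output saved in step 1, but the shape of that output is not documented here. The sketch below sidesteps the schema by taking the first https URL found in speech.json, which is an assumption that holds only if the audio link is the first URL in the file. `first_url` and `omnihuman_payload` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# first_url: hypothetical helper that prints the first https URL found in a file.
# It does not parse JSON; it just matches up to the next quote or whitespace.
first_url() {
  grep -o 'https://[^"[:space:]]*' "$1" | head -n 1
}

# omnihuman_payload: builds the step-2 input from an image URL and an audio URL.
omnihuman_payload() {
  printf '{"image_url": "%s", "audio_url": "%s"}' "$1" "$2"
}

# Run only if step 1 has produced a non-empty speech.json.
if [ -s speech.json ]; then
  audio_url="$(first_url speech.json)"
  echo "audio_url=$audio_url"
  if [ -n "$audio_url" ] && command -v infsh >/dev/null 2>&1; then
    infsh app run bytedance/omnihuman-1-5 \
      --input "$(omnihuman_payload "https://portrait.jpg" "$audio_url")"
  fi
fi
```

Check the printed URL before relying on it. If the output contains several URLs, a JSON-aware tool that selects the right field is the more robust choice.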
Related Skills
# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts
# ElevenLabs dialogue (multi-speaker)
npx skills add inference-sh/skills@elevenlabs-dialogue
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli
# AI avatars (combine TTS with talking heads)
npx skills add inference-sh/skills@ai-avatar-video
# AI music generation
npx skills add inference-sh/skills@ai-music-generation
# Speech-to-text (transcription)
npx skills add inference-sh/skills@speech-to-text
# Video generation
npx skills add inference-sh/skills@ai-video-generation
Browse all apps: infsh app list
Documentation
- Running Apps - How to run apps via CLI
- Audio Transcription Example - Audio processing workflows
- Apps Overview - Understanding the app ecosystem