
text-to-speech

inferen-sh/skills · updated Apr 8, 2026

$ npx skills add https://github.com/inferen-sh/skills --skill text-to-speech
summary

Run multiple text-to-speech models through the inference.sh CLI for voiceovers, podcasts, and accessibility.

  • Six models available: ElevenLabs (premium, 22+ voices, 32 languages), DIA TTS (conversational), Kokoro TTS (fast), Chatterbox, Higgs Audio (emotional control), and VibeVoice (long-form podcasts)
  • Core capabilities include basic speech synthesis, expressive speech with emotion control, and conversational dialogue generation
  • Easily combine with video tools like OmniHuman to create talking head videos
skill.md

Text-to-Speech

Convert text to natural speech via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh); see its install instructions.

infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'

Available Models

Model | App ID | Best For
--- | --- | ---
ElevenLabs TTS | elevenlabs/tts | Premium quality, 22+ voices, 32 languages
DIA TTS | infsh/dia-tts | Conversational, expressive
Kokoro TTS | infsh/kokoro-tts | Fast, natural
Chatterbox | infsh/chatterbox | General purpose
Higgs Audio | infsh/higgs-audio | Emotional control
VibeVoice | infsh/vibevoice | Podcasts, long-form

Browse All Audio Apps

infsh app list --category audio

Examples

Basic Text-to-Speech

infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our tutorial."}'
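The inline --input form wraps the JSON in single quotes, so text containing an apostrophe (for example "We're") ends the shell string early. A minimal sketch of one standard fix, the shell's '\'' idiom, is below; the sentence is a hypothetical example, and the infsh call only runs if the CLI is on your PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Inside a single-quoted string, '\'' closes the quote, adds an escaped
# apostrophe, and reopens the quote, so the JSON reaches infsh intact.
# The sentence itself is a hypothetical example.
input='{"text": "We'\''re glad you'\''re here."}'
printf '%s\n' "$input"

# Submit only when the CLI is installed (run `infsh login` first).
if command -v infsh >/dev/null 2>&1; then
  infsh app run infsh/kokoro-tts --input "$input"
fi
```

For longer text, writing the JSON to a file and passing the file path to --input avoids shell quoting entirely.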

Conversational TTS with DIA

infsh app sample infsh/dia-tts --save input.json

# Edit input.json:
# {
#   "text": "Hey! How are you doing today? I'm really excited to share this with you.",
#   "voice": "conversational"
# }

infsh app run infsh/dia-tts --input input.json
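Instead of editing the sampled file by hand, you can write input.json in one step with a quoted heredoc. This is a sketch that assumes the DIA app accepts the "text" and "voice" fields shown above; compare it with the file infsh app sample produces before relying on it. Quoting the delimiter ('EOF') stops the shell from expanding anything in the body, so apostrophes such as "I'm" need no escaping.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the DIA input in one step. The quoted 'EOF' delimiter disables
# variable and command expansion, so the apostrophe in "I'm" is safe.
# Assumption: "text" and "voice" are the fields the app expects, as in
# the example above; check the sampled input.json for the full schema.
cat > input.json <<'EOF'
{
  "text": "Hey! How are you doing today? I'm really excited to share this with you.",
  "voice": "conversational"
}
EOF

# Submit only when the CLI is installed (run `infsh login` first).
if command -v infsh >/dev/null 2>&1; then
  infsh app run infsh/dia-tts --input input.json
fi
```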

Long-form Audio (Podcasts)

infsh app sample infsh/vibevoice --save input.json

# Edit input.json with your podcast script
infsh app run infsh/vibevoice --input input.json
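A long podcast script usually contains double quotes, backslashes, and line breaks, any of which break hand-edited JSON. The sketch below escapes a script file into a JSON string using plain bash parameter expansion and writes it to input.json. It assumes a single "text" field like the Kokoro example; VibeVoice's real schema may differ (for instance, separate speaker fields), so copy the field name from the file infsh app sample infsh/vibevoice saves. The json_escape function and the sample-script.txt file are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return, and tab only;
# other control characters are passed through unescaped.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quotes
  s=${s//$'\n'/\\n}    # newlines
  s=${s//$'\r'/\\r}    # carriage returns
  s=${s//$'\t'/\\t}    # tabs
  printf '%s' "$s"
}

# Hypothetical path; point this at your own podcast script.
script_file=sample-script.txt

# Demo only: write a tiny two-line script so the example runs on its own.
cat > "$script_file" <<'EOF'
Host: Welcome back to the show!
Guest: Thanks, it's "great" to be here.
EOF

# $(<file) reads the whole file; trailing newlines are stripped.
script=$(<"$script_file")

# Assumption: the app takes the script in a "text" field.
printf '{"text": "%s"}\n' "$(json_escape "$script")" > input.json

if command -v infsh >/dev/null 2>&1; then
  infsh app run infsh/vibevoice --input input.json
fi
```

If jq is available, jq -n --rawfile t sample-script.txt '{text: $t}' is a sturdier alternative that also escapes every control character.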

Expressive Speech with Higgs

infsh app sample infsh/higgs-audio --save input.json

# Edit input.json:
# {
#   "text": "This is absolutely incredible!",
#   "emotion": "excited"
# }

infsh app run infsh/higgs-audio --input input.json

Use Cases

  • Voiceovers: Product demos, explainer videos
  • Audiobooks: Convert text to spoken word
  • Podcasts: Generate podcast episodes
  • Accessibility: Provide audio versions of written content
  • IVR: Phone system voice prompts
  • Video Narration: Add narration to videos

Combine with Video

Generate speech, then create a talking head video:

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Your script here"}' > speech.json

# 2. Pass the audio URL from speech.json to OmniHuman to create an avatar video
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
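Step 2 needs the audio URL from speech.json, but the exact shape of the CLI's JSON output is not shown here. This minimal sketch assumes the file contains the audio as an http(s) URL ending in a common audio extension (optionally followed by a query string, as signed URLs often are), takes the first match with bash's built-in regex, and writes the OmniHuman input to a file. The placeholder speech.json contents, the avatar.json file name, and the portrait URL are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for step 1's output, used only if speech.json is
# missing. The real output's field names are not documented here.
[[ -f speech.json ]] || printf '%s\n' \
  '{"output": {"audio": "https://example.com/files/speech.wav?sig=abc123"}}' > speech.json

json=$(<speech.json)

# Assumption: the audio is an http(s) URL ending in a common audio extension,
# optionally followed by a query string such as a signed-URL token.
re='https?://[^"[:space:]]+\.(wav|mp3|ogg|flac|m4a)(\?[^"[:space:]]*)?'

if [[ $json =~ $re ]]; then
  audio_url=${BASH_REMATCH[0]}
else
  echo "No audio URL found; inspect speech.json and copy it by hand." >&2
  exit 1
fi

# Hypothetical portrait URL; replace with your own image.
image_url="https://example.com/portrait.jpg"

# Write the OmniHuman input with the same two fields as the example above.
printf '{"image_url": "%s", "audio_url": "%s"}\n' "$image_url" "$audio_url" > avatar.json

if command -v infsh >/dev/null 2>&1; then
  infsh app run bytedance/omnihuman-1-5 --input avatar.json
fi
```

If your output stores the audio URL without a file extension, the regex will not match; open speech.json and copy the field by hand instead.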

Related Skills

# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dialogue (multi-speaker)
npx skills add inference-sh/skills@elevenlabs-dialogue

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli

# AI avatars (combine TTS with talking heads)
npx skills add inference-sh/skills@ai-avatar-video

# AI music generation
npx skills add inference-sh/skills@ai-music-generation

# Speech-to-text (transcription)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

Browse all apps: infsh app list

Ratings

4.5 (63 reviews)
  • Nikhil Sanchez· Dec 24, 2024

    text-to-speech is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Soo Patel· Dec 24, 2024

    Useful defaults in text-to-speech — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Nikhil Martin· Dec 20, 2024

    Solid pick for teams standardizing on skills: text-to-speech is focused, and the summary matches what you get after install.

  • Arya Choi· Dec 16, 2024

    We added text-to-speech from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Emma Haddad· Dec 12, 2024

    I recommend text-to-speech for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Chinedu Ramirez· Dec 4, 2024

    Keeps context tight: text-to-speech is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Nikhil Rahman· Dec 4, 2024

    text-to-speech fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Soo Sethi· Nov 23, 2024

    text-to-speech has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Emma Khan· Nov 23, 2024

    Registry listing for text-to-speech matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Nikhil Park· Nov 19, 2024

    Useful defaults in text-to-speech — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

showing 1-10 of 63
