speech-to-text

inference-sh/skills · updated Apr 8, 2026

npx skills add https://github.com/inference-sh/skills --skill speech-to-text

Transcribe audio to text via inference.sh CLI.


Speech-to-Text


Quick Start

Requires the inference.sh CLI (infsh); see the install instructions. Log in, then run a transcription app:

infsh login

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
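Because the JSON above is single-quoted, shell variables placed inside it are not expanded. The sketch below is a minimal, hypothetical helper (json_escape, stt_input, transcribe and the APP variable are illustrative names, not part of the infsh CLI) that builds the --input string from a variable URL and an optional task, escaping backslashes and double quotes. It assumes the apps accept the audio_url and task fields exactly as shown in the examples on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the infsh CLI.

# Escape backslashes first, then double quotes, so the value is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the --input JSON: $1 = audio URL, $2 = optional task (e.g. "translate").
stt_input() {
  local url task
  url=$(json_escape "$1")
  if [ -n "${2:-}" ]; then
    task=$(json_escape "$2")
    printf '{"audio_url": "%s", "task": "%s"}' "$url" "$task"
  else
    printf '{"audio_url": "%s"}' "$url"
  fi
}

# Run a transcription app; APP is an illustrative override read only by this helper.
transcribe() {
  local app=${APP:-infsh/fast-whisper-large-v3}
  command -v infsh >/dev/null 2>&1 || { echo "infsh not found on PATH" >&2; return 127; }
  infsh app run "$app" --input "$(stt_input "$1" "${2:-}")"
}
```

For example, transcribe "https://audio.mp3" sends the same request as the quick start, and APP=infsh/whisper-v3-large transcribe "https://french-audio.mp3" translate mirrors the translation example further down.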

Available Models

| Model | App ID | Best For |
|---|---|---|
| ElevenLabs Scribe v2 | elevenlabs/stt | 98%+ accuracy, diarization, 90+ languages |
| Fast Whisper V3 | infsh/fast-whisper-large-v3 | Fast transcription |
| Whisper V3 Large | infsh/whisper-v3-large | Highest accuracy |

Examples

Basic Transcription

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'

With Timestamps

infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so it contains:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

infsh app run infsh/fast-whisper-large-v3 --input input.json
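If you prefer not to edit the saved sample by hand, the request file can be written directly. This sketch uses only the audio_url and timestamps fields shown above; AUDIO_URL is an illustrative variable name (infsh does not read it), and the unquoted heredoc lets the shell expand it, so the URL itself must not contain double quotes.

```shell
#!/usr/bin/env bash
set -eu
# Illustrative placeholder; replace with your own audio URL.
AUDIO_URL=${AUDIO_URL:-https://podcast.mp3}

# Unquoted EOF: ${AUDIO_URL} is expanded when the file is written.
cat > input.json <<EOF
{
  "audio_url": "${AUDIO_URL}",
  "timestamps": true
}
EOF

# Then run the app against the file, as above:
# infsh app run infsh/fast-whisper-large-v3 --input input.json
```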

Translation (to English)

infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'

From Video

# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio (use the audio URL returned in audio.json)
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'

Workflow: Video Subtitles

# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{
  "audio_url": "https://video.mp4",
  "timestamps": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'
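If you need a standalone .srt file instead of passing the transcript to caption-videos, each timestamped segment becomes one SubRip cue. This is a minimal sketch under assumptions: segment start and end times are taken to be in seconds (the segment layout is not documented here), and srt_time and srt_cue are hypothetical helper names. It only formats cues and does not call infsh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: format seconds as an SRT timestamp (HH:MM:SS,mmm)
# and print one SubRip cue. Times are assumed to be in seconds.
srt_time() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)            # round to the nearest millisecond
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    s = int(ms / 1000);    ms -= s * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, s, ms
  }'
}

# $1 = cue number, $2 = start (s), $3 = end (s), $4 = caption text
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_time "$2")" "$(srt_time "$3")" "$4"
}
```

For example, srt_cue 1 0 2.5 "Hello" >> captions.srt appends the first cue; repeating it for each segment in transcript.json builds the full file.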

Supported Languages

Whisper supports 99+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, and Russian.

Use Cases

  • Meetings: Transcribe recordings
  • Podcasts: Generate transcripts
  • Subtitles: Create captions for videos
  • Voice Notes: Convert to searchable text
  • Interviews: Transcribe for research
  • Accessibility: Make audio content accessible

Output Format

Returns JSON with:

  • text: Full transcription
  • segments: Timestamped segments (if requested)
  • language: Detected language

Related Skills

# ElevenLabs STT (98%+ accuracy, diarization)
npx skills add inference-sh/skills@elevenlabs-stt

# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli

# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech

# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation

# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps: infsh app list --category audio



Ratings

4.8 (25 reviews)
  • Naina Okafor · Dec 24, 2024

    I recommend speech-to-text for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Ganesh Mohane · Dec 16, 2024

    Useful defaults in speech-to-text — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Amina Zhang · Dec 4, 2024

    Solid pick for teams standardizing on skills: speech-to-text is focused, and the summary matches what you get after install.

  • Alexander Abebe · Nov 23, 2024

    We added speech-to-text from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Alexander Farah · Oct 14, 2024

    speech-to-text fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Isabella Martin · Sep 21, 2024

    Solid pick for teams standardizing on skills: speech-to-text is focused, and the summary matches what you get after install.

  • Rahul Santra · Sep 13, 2024

    speech-to-text has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Aarav Zhang · Aug 24, 2024

    Keeps context tight: speech-to-text is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Isabella Taylor · Aug 12, 2024

    speech-to-text has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Pratham Ware · Aug 4, 2024

    Solid pick for teams standardizing on skills: speech-to-text is focused, and the summary matches what you get after install.
