speech-to-text
inference-sh/skills · updated Apr 8, 2026
Transcribe audio to text via inference.sh CLI.
Speech-to-Text

Quick Start
Requires the inference.sh CLI (`infsh`); see the install instructions.

```shell
# Log in to inference.sh
infsh login
# Transcribe a hosted audio file
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
```
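Because the `--input` JSON above is wrapped in single quotes, a URL stored in a shell variable will not expand inside it. A minimal sketch of a helper that builds the input string with `printf` instead. The name `stt_input` is hypothetical (not part of the CLI), and the sketch assumes the URL is already percent-encoded, so it contains no `"` or `\` that would need JSON escaping.

```shell
# Hypothetical helper: build the --input JSON for a single audio URL.
# Assumes the URL needs no JSON escaping (no double quotes or backslashes).
stt_input() {
  printf '{"audio_url": "%s"}' "$1"
}

# Usage (requires a logged-in infsh CLI):
#   AUDIO_URL="https://audio.mp3"
#   infsh app run infsh/fast-whisper-large-v3 --input "$(stt_input "$AUDIO_URL")"
```

Keeping the JSON template in one function means the quoting only has to be right once.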
Available Models
| Model | App ID | Best For |
|---|---|---|
| ElevenLabs Scribe v2 | `elevenlabs/stt` | 98%+ accuracy, diarization, 90+ languages |
| Fast Whisper V3 | `infsh/fast-whisper-large-v3` | Fast transcription |
| Whisper V3 Large | `infsh/whisper-v3-large` | Highest accuracy |
Examples
Basic Transcription
```shell
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
```
With Timestamps
```shell
# Save a sample input file for the app
infsh app sample infsh/fast-whisper-large-v3 --save input.json
# Edit input.json so it contains, for example:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }
infsh app run infsh/fast-whisper-large-v3 --input input.json
```
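Instead of editing the sampled file by hand, the same input can be written from a script with a heredoc. A minimal sketch: `write_timestamp_input` is a hypothetical helper, and it writes only the two fields from the example input in the comments above (`audio_url`, `timestamps`); any other fields the sampled file contains are not reproduced.

```shell
# Hypothetical helper: write an input file with timestamps enabled.
# $1 = audio URL, $2 = output path (defaults to input.json).
write_timestamp_input() {
  out=${2:-input.json}
  # Unquoted EOF delimiter, so "$1" is expanded inside the heredoc.
  cat > "$out" <<EOF
{
  "audio_url": "$1",
  "timestamps": true
}
EOF
}

# Usage:
#   write_timestamp_input "https://podcast.mp3"
#   infsh app run infsh/fast-whisper-large-v3 --input input.json
```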
Translation (to English)
```shell
# "task": "translate" returns English text for non-English audio
infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'
```
From Video
```shell
# 1. Extract the audio track from the video
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json
# 2. Transcribe the extracted audio (replace <audio-url> with the audio URL from audio.json)
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
```
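Copying the audio URL out of `audio.json` between the two steps can be scripted. The extractor's output schema is not documented here, so this sketch avoids guessing a field name and instead assumes the first `https://` string in the JSON is the extracted audio file. `first_url_in_json` is a hypothetical helper; check it against a real `audio.json` before relying on it (for example, if the output echoes the input `video_url` first, this picks the wrong URL).

```shell
# Hypothetical helper: print the first https:// URL found in a JSON file.
# Assumption: the audio extractor's output lists the audio file URL first.
first_url_in_json() {
  # -o prints only the matched text; the URL ends at the closing quote.
  grep -o 'https://[^"]*' "$1" | head -n 1
}

# Usage:
#   AUDIO_URL=$(first_url_in_json audio.json)
#   infsh app run infsh/fast-whisper-large-v3 --input "{\"audio_url\": \"$AUDIO_URL\"}"
```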
Workflow: Video Subtitles
```shell
# 1. Transcribe the video's audio with timestamps
infsh app run infsh/fast-whisper-large-v3 --input '{
  "audio_url": "https://video.mp4",
  "timestamps": true
}' > transcript.json
# 2. Pass the transcript to the captioning app
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'
```
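If you also need a standalone subtitle file, timestamped segments can be written out as SubRip (`.srt`) cues. The segment schema is not specified here, so this sketch assumes the segments have already been flattened into tab-separated lines of `start`, `end` (both in seconds) and `text`. `segments_tsv_to_srt` is a hypothetical helper, not part of inference.sh, and text containing tabs would be cut at the first tab.

```shell
# Hypothetical helper: convert "start<TAB>end<TAB>text" lines (times in
# seconds) on stdin into SubRip cues on stdout.
segments_tsv_to_srt() {
  awk -F '\t' '
    # Format seconds as HH:MM:SS,mmm, rounded to the nearest millisecond.
    function srt_time(sec,    ms) {
      ms = int(sec * 1000 + 0.5)
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000),
                     int((ms % 3600000) / 60000), int((ms % 60000) / 1000), ms % 1000)
    }
    # Skip malformed lines; cues are numbered from 1 and end with a blank line.
    NF >= 3 {
      n++
      printf "%d\n%s --> %s\n%s\n\n", n, srt_time($1), srt_time($2), $3
    }
  '
}

# Usage:
#   printf '0\t2.5\tHello\n2.5\t5\tWorld\n' | segments_tsv_to_srt > captions.srt
```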
Supported Languages
Whisper supports 99+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, and Russian.
Use Cases
- Meetings: Transcribe recordings
- Podcasts: Generate transcripts
- Subtitles: Create captions for videos
- Voice Notes: Convert to searchable text
- Interviews: Transcribe interviews for research
- Accessibility: Make audio content accessible
Output Format
Returns JSON with:
- `text`: Full transcription
- `segments`: Timestamped segments (if requested)
- `language`: Detected language
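For quick inspection in a shell script, a simple top-level string field such as `language` can be read from a saved result. A minimal sketch under stated assumptions: `json_string_field` is a hypothetical helper, it assumes the result was saved to a file (for example with `> transcript.json`) and that the field's value contains no escaped quotes. For the full `text` or the `segments` array, a real JSON parser such as `jq` is the safer choice.

```shell
# Hypothetical helper: print the value of a simple top-level string field.
# $1 = field name, $2 = JSON file. Does not handle escaped quotes in values.
json_string_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Usage:
#   infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}' > transcript.json
#   json_string_field language transcript.json
```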
Related Skills
```shell
# ElevenLabs STT (98%+ accuracy, diarization)
npx skills add inference-sh/skills@elevenlabs-stt
# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli
# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech
# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation
# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video
```

Browse all audio apps: `infsh app list --category audio`
Documentation
- Running Apps - How to run apps via CLI
- Audio Transcription Example - Complete transcription guide
- Apps Overview - Understanding the app ecosystem