explainx.ai

© 2026 AISOLO Technologies Pvt Ltd

tag: speech

11 indexed skills · max 10 per page

skills (11)

speech-to-text

elevenlabs/skills · Productivity

2

Transcribe audio and video to text with speaker identification, word-level timestamps, and 90+ language support.
- Two models available: scribe_v2 for batch transcription with high accuracy, and scribe_v2_realtime for live transcription with ~150ms latency
- Speaker diarization labels each word with speaker ID; keyterm prompting helps recognize domain-specific vocabulary and proper nouns
- Word-level timestamps include type classification (word, spacing, audio event) for precise timing and…

text-to-speech

inference-sh/skills · Productivity

1

Convert text to natural speech via inference.sh CLI.

text-to-speech

heygen-com/skills · Productivity

1

Generate speech audio from text using HeyGen's Starfish TTS model with voice, speed, and pitch control.
- List available TTS voices by language and gender, then generate audio files with customizable speed (0.5–1.5) and pitch (−50 to 50)
- Supports multilingual voices with locale selection (e.g., pt-BR) and SSML-style break tags for pauses within text
- Returns audio URL, duration, request ID, and word-level timestamps for caption syncing or timed overlays
- Requires HEYGEN_API_KEY envir…

speech-recognition

dpearson2699/swift-ios-skills · Productivity

1

Transcribe live and pre-recorded audio to text using Apple's Speech framework. Covers SFSpeechRecognizer (iOS 10+) and the new SpeechAnalyzer API (iOS 26+).

nemotron-speech

nvidia/skills · nemotron

0

Routes NVIDIA Nemotron Speech (Riva) NIM tasks — deploys, runs, and tests ASR, TTS, and NMT NIMs on build.nvidia.com or self-hosted.

speech-to-text

martinholovsky/claude-skills-generator · Productivity

0

File Organization: Split structure. See references/ for detailed implementations.

speech-to-text

inference-sh/skills · Productivity

0

Transcribe audio to text via inference.sh CLI.

speech

openai/skills · Productivity

0

Text-to-speech generation for narration, voiceovers, IVR prompts, and accessibility reads via OpenAI Audio API.
- Supports single clips and batch processing; defaults to gpt-4o-mini-tts-2025-12-15 with built-in voices (cedar, marin, and others)
- Includes instruction augmentation for voice affect, tone, pacing, emotion, and emphasis; instructions supported only on GPT-4o mini TTS models
- Enforces 4096-character input limit per request and 50 requests/minute rate cap; splits longer text in…
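The listing above is cut off before it says how the skill splits longer text, so the following is only a minimal sketch of one way to stay under a 4096-character per-request limit. The function name `split_for_tts` and the whitespace-first splitting strategy are assumptions made for illustration, not the skill's actual implementation, and no API call is made.

```python
MAX_CHARS = 4096  # per-request input limit quoted in the listing


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: prefers to break at the last space or newline
    inside the window, and falls back to a hard cut when a single run of
    non-space characters is longer than the limit.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Look one character past the limit so a break exactly at the
        # boundary still yields a full-length chunk.
        window = rest[: limit + 1]
        cut = max(window.rfind(" "), window.rfind("\n"))
        if cut <= 0:
            cut = limit  # no usable whitespace: hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk could then be sent as its own request; pacing those requests to respect the 50 requests/minute cap is left out of this sketch.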

text-to-speech

elevenlabs/skills · Productivity

0

Natural speech synthesis from text across 70+ languages with multiple quality and latency models.
- Six models available ranging from highest-quality eleven_v3 to ultra-low-latency eleven_flash_v2_5 (~75ms), with language and speed tradeoffs documented
- Supports 13+ output formats including MP3, PCM, WAV, Opus, and telephony codecs (μ-law, A-law) for web, streaming, and real-time applications
- Fine-tune voice characteristics via stability, similarity boost, style, speaker boost, and spee…

text-to-speech

inferen-sh/skills · Productivity

0

Multiple text-to-speech models via inference.sh CLI for voiceovers, podcasts, and accessibility.
- Six models available: ElevenLabs (premium, 22+ voices, 32 languages), DIA TTS (conversational), Kokoro TTS (fast), Chatterbox, Higgs Audio (emotional control), and VibeVoice (long-form podcasts)
- Core capabilities include basic speech synthesis, expressive speech with emotion control, and conversational dialogue generation
- Easily combine with video tools like OmniHuman to create talking he…

page 1 / 2