tts
16 indexed skills · max 10 per page
alicloud-ai-audio-tts
cinience/alicloud-skills · Cloud
Category: provider
alicloud-ai-audio-tts-voice-clone
cinience/alicloud-skills · Cloud
Voice cloning and text-to-speech synthesis using Alibaba Cloud Qwen TTS VC models. Supports two model variants: standard batch processing (qwen3-tts-vc-2026-01-22) and real-time streaming (qwen3-tts-vc-realtime-2026-01-15). Accepts voice samples as file paths or raw bytes and generates cloned voice IDs for reuse across multiple synthesis requests. A normalized interface handles text input, voice enrollment, and optional streaming output, and returns audio URLs or PCM chunks. Requires DASH
tts
marswaveai/skills · Productivity
Convert text to natural-sounding speech with single or multi-speaker audio generation. Two modes: Quick mode for instant single-voice MP3 output, and Script mode for multi-speaker dialogue with per-character voice assignment. Automatic mode detection based on input structure; supports both plain text and structured scripts with character markers. Built-in speaker selection with language support (Chinese and English) and preference saving to local config. Configurable output modes: in
speak-tts
emzod/speak · Productivity
Real-time text-to-speech with voice cloning on Apple Silicon, entirely on-device. Supports multiple input sources (text files, markdown, stdin, web articles, PDFs) and output modes (streaming, file save, playback, or both). Voice cloning from 10–30 second WAV samples at 24000 Hz mono; includes emotion tags like [laugh], [sigh], and [gasp] for audible effects. Batch processing with auto-chunking for long documents, concatenation utilities, and resume capability for interrupted generat
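The sample requirements stated above (WAV, 24000 Hz, mono, 10–30 seconds) can be checked before a file is handed to any cloning tool. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_voice_sample` is hypothetical and is not part of speak-tts.

```python
import wave


def check_voice_sample(path, rate=24000, channels=1, min_s=10.0, max_s=30.0):
    """Return a list of problems; an empty list means the sample fits.

    Hypothetical helper, not part of speak-tts. The defaults mirror the
    constraints quoted in the listing (24000 Hz, mono, 10-30 seconds).
    """
    problems = []
    with wave.open(path, "rb") as wav:
        actual_rate = wav.getframerate()
        actual_channels = wav.getnchannels()
        # Duration in seconds is frame count divided by frames per second.
        duration = wav.getnframes() / actual_rate
    if actual_rate != rate:
        problems.append(f"sample rate is {actual_rate} Hz, expected {rate} Hz")
    if actual_channels != channels:
        problems.append(f"{actual_channels} channel(s), expected {channels}")
    if not (min_s <= duration <= max_s):
        problems.append(f"duration is {duration:.1f} s, expected {min_s}-{max_s} s")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a sample in a single pass.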
speakturbo-tts
emzod/speak-turbo · Productivity
Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices. Delivers audio in approximately 90ms after daemon warmup, with first run taking 2-5 seconds for model initialization. Includes 8 pre-configured voices (alba, marius, javert, jean, fantine, cosette, eponine, azelma) accessible via simple command-line flags. Supports file output with configurable directory allowlisting, quiet mode, and UTF-8 text input including long-form content. Auto-starting daemon with 1-hour idle
tts
noizai/skills · Productivity
Text-to-speech with dual backends, voice cloning, and timeline-accurate audio synthesis for dubbing and video narration. Supports two backends: Kokoro (local, offline) for simple speech synthesis, and Noiz (cloud) for voice cloning, emotion control, and precise segment timing. Simple mode converts text, files, or URLs to audio with optional voice cloning from reference audio; timeline mode aligns speech to SRT subtitles with per-segment voice and emotion control. Voice maps enable gran
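Aligning speech to SRT subtitles, as the timeline mode above is described as doing, starts with knowing each segment's start time and duration. The sketch below parses standard SRT cue timing into seconds. It illustrates the subtitle format only and makes no claim about how the noizai skill is implemented; `parse_srt` is a hypothetical name.

```python
import re

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
_TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _to_seconds(stamp):
    """Convert one SRT timestamp string to seconds as a float."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Parse SRT text into a list of segments with timing in seconds.

    Hypothetical helper for illustration. Each segment is a dict with
    'start', 'end', 'duration', and 'text' keys.
    """
    segments = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip cues without index, timing, and text lines
        start_raw, end_raw = lines[1].split("-->")
        start, end = _to_seconds(start_raw), _to_seconds(end_raw)
        segments.append({
            "start": start,
            "end": end,
            "duration": end - start,
            "text": " ".join(lines[2:]),
        })
    return segments
```

The per-segment `duration` is the time budget that synthesized audio for that cue would need to fit for the result to stay in sync with the video.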