voice-agents
sickn33/antigravity-awesome-skills · updated Apr 8, 2026
Natural conversation with AI through speech, balancing latency against control.
- Choose between speech-to-speech models (lowest latency, less controllable) or pipeline architectures (STT→LLM→TTS for fine-grained control)
- Core challenges: latency budgeting across all components, voice activity detection, barge-in handling, and turn-taking to avoid awkward pauses or overlaps
- Requires semantic VAD, response length constraints in prompts, and noise handling to achieve natural conversation
Voice Agents
You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency - every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.
Your core insight: Two architectures exist. Speech-to-speech (S2S) models like OpenAI Realtime API preserve emotion and achieve the lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency. Most architecture decisions come down to which side of that latency-versus-control tradeoff the product can afford.
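To make the budgeting concrete, here is a minimal latency-budget sketch in Python. The stage names and per-stage numbers are illustrative assumptions, not benchmarks; the point is that the stages sum, so the budget has to be allocated explicitly and measured per component.

```python
# Hypothetical per-stage budgets (ms) for a pipeline voice agent.
# The numbers are illustrative assumptions, not benchmarks; measure your own stack.
LATENCY_BUDGET_MS = {
    "vad_endpointing": 200,   # waiting to confirm the user finished speaking
    "stt_final": 150,         # final transcript after end of speech
    "llm_first_token": 300,   # time to first token from the language model
    "tts_first_audio": 150,   # time to first synthesized audio chunk
    "network_playback": 100,  # transport and client-side buffering
}

def check_budget(measured_ms: dict[str, float], target_ms: float = 900.0) -> None:
    """Compare measured stage latencies against the budget and the overall target."""
    total = sum(measured_ms.values())
    for stage, budget in LATENCY_BUDGET_MS.items():
        actual = measured_ms.get(stage, 0.0)
        flag = "OVER" if actual > budget else "ok"
        print(f"{stage:18s} {actual:6.0f} ms (budget {budget} ms) {flag}")
    print(f"total mouth-to-ear: {total:.0f} ms (target {target_ms:.0f} ms)")
```

Printing which stage is over budget is usually enough to decide where to optimize first.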
Capabilities
- voice-agents
- speech-to-speech
- speech-to-text
- text-to-speech
- conversational-ai
- voice-activity-detection
- turn-taking
- barge-in-detection
- voice-interfaces
Patterns
Speech-to-Speech Architecture
Direct audio-to-audio processing for lowest latency
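A rough sketch of the S2S shape, assuming a hypothetical WebSocket endpoint that accepts raw audio frames and streams audio back. The URL and the `mic_frames`/`speaker` objects are placeholders standing in for your transport and audio I/O, not a specific vendor SDK.

```python
import asyncio
import websockets  # assumes the `websockets` package; the endpoint below is hypothetical

S2S_WS_URL = "wss://example.com/v1/speech-to-speech"  # placeholder, not a real API

async def s2s_session(mic_frames, speaker):
    """Stream microphone audio up and play returned audio down on one socket.

    `mic_frames` is an async iterator of PCM chunks; `speaker.play(bytes)` plays audio.
    Both are assumptions standing in for your audio I/O layer.
    """
    async with websockets.connect(S2S_WS_URL) as ws:
        async def uplink():
            async for chunk in mic_frames:
                await ws.send(chunk)      # raw audio in

        async def downlink():
            async for message in ws:
                speaker.play(message)     # raw audio out, no intermediate text to inspect

        await asyncio.gather(uplink(), downlink())
```

The lack of an intermediate text step is exactly why this is both the lowest-latency option and the hardest one to control.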
Pipeline Architecture
Separate STT → LLM → TTS for maximum control
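A minimal sketch of the pipeline shape, with the three stages behind assumed `SpeechToText`, `ChatModel`, and `TextToSpeech` interfaces (placeholders for whichever providers you wire in):

```python
from dataclasses import dataclass
from typing import Protocol

# The three stage interfaces are assumptions; bind them to your STT, LLM, and TTS providers.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class ChatModel(Protocol):
    def complete(self, history: list[dict]) -> str: ...

class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...

@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: ChatModel
    tts: TextToSpeech

    def handle_turn(self, user_audio: bytes, history: list[dict]) -> bytes:
        transcript = self.stt.transcribe(user_audio)            # audio in -> text
        history.append({"role": "user", "content": transcript})
        reply = self.llm.complete(history)                      # the control point: inspect or rewrite text here
        history.append({"role": "assistant", "content": reply})
        return self.tts.synthesize(reply)                       # text -> audio out
```

Streaming each stage (partial transcripts, feeding tokens into TTS as they arrive) is the usual way pipelines recover some of the latency the extra hops add.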
Voice Activity Detection Pattern
Detect when user starts/stops speaking
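A minimal energy-based sketch of the detection side; the threshold and hangover values are illustrative assumptions to tune against real audio.

```python
import array

def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian PCM frame."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

class EnergyVAD:
    """Energy-threshold VAD with a hangover so short pauses do not end the turn.

    The threshold and hangover values are illustrative assumptions; tune on real audio.
    """
    def __init__(self, rms_threshold: float = 500.0, hangover_frames: int = 25):
        self.rms_threshold = rms_threshold
        self.hangover_frames = hangover_frames   # ~500 ms of silence at 20 ms frames
        self.silence_run = 0
        self.speaking = False

    def process(self, frame: bytes) -> str | None:
        """Return 'speech_start', 'speech_end', or None for each frame."""
        if frame_rms(frame) > self.rms_threshold:
            self.silence_run = 0
            if not self.speaking:
                self.speaking = True
                return "speech_start"
        elif self.speaking:
            self.silence_run += 1
            if self.silence_run >= self.hangover_frames:
                self.speaking = False
                return "speech_end"
        return None
```

An energy gate alone is the silence-only anti-pattern below: before treating `speech_end` as the end of a turn, also check whether the partial transcript reads like a complete thought, which is the semantic VAD the overview calls for.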
Anti-Patterns
❌ Ignoring Latency Budget
❌ Silence-Only Turn Detection
❌ Long Responses (see the prompt sketch after this list)
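For the long-response anti-pattern the fix lives in the prompt, not the code. A hedged example of the kind of constraint meant here; the exact wording is illustrative.

```python
# Illustrative system-prompt constraints for spoken output; adjust the wording to your stack.
VOICE_SYSTEM_PROMPT = """
You are speaking with the user over the phone.
- Answer in one to three short sentences, then stop and let the user respond.
- Use spoken language: no markdown, bullet points, URLs, or code.
- Read numbers and abbreviations the way a person would say them aloud.
- If a longer explanation is needed, offer it and wait for the user to ask.
""".strip()
```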
⚠️ Sharp Edges
| Issue | Severity | Solution |
|---|---|---|
| Ignoring the latency budget | critical | Measure and budget latency for each component |
| Latency jitter (inconsistent response times) | high | Track and target jitter metrics, not just averages |
| Silence-only turn detection | high | Use semantic VAD |
| No handling for user interruptions | high | Implement barge-in detection (see the sketch after this table) |
| Overly long responses | medium | Constrain response length in prompts |
| Written-style output (markdown, lists, URLs) | medium | Prompt for spoken format |
| Background noise triggers false speech detection | medium | Implement noise handling |
| STT transcription errors reach the LLM | medium | Mitigate STT errors |
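The barge-in row is where agents most often feel robotic. A minimal sketch of the control flow, assuming hypothetical `tts_player` and `vad` objects; the interfaces are placeholders, and `vad.process` follows the event convention from the VAD sketch above.

```python
import asyncio

async def speak_with_barge_in(tts_player, vad, mic_frames) -> bool:
    """Play agent audio but stop the moment the user starts talking.

    `tts_player.stop()` / `tts_player.is_playing()` and `vad.process(frame)` are
    assumed interfaces; `mic_frames` is an async iterator of live microphone frames.
    Returns True if the user barged in.
    """
    async for frame in mic_frames:                 # keep listening while the agent speaks
        event = vad.process(frame)
        if event == "speech_start" and tts_player.is_playing():
            tts_player.stop()                      # cut playback immediately
            return True                            # hand the turn back to the user
        if not tts_player.is_playing():
            return False                           # finished speaking without interruption
        await asyncio.sleep(0)                     # yield to the playback task
    return False
```

On interruption you generally also want to cancel any in-flight LLM and TTS requests and truncate the assistant turn in the conversation history to what the user actually heard.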
Related Skills
Works well with: agent-tool-builder, multi-agent-orchestration, llm-architect, backend
When to Use
Use this skill when building or reviewing voice agents that follow the workflow described in the overview: choosing an architecture, budgeting latency, and handling turn-taking and interruptions.
Ratings
4.8 ★ · 28 reviews
- ★★★★★ Nia Thomas · Dec 20, 2024
Registry listing for voice-agents matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★ Lucas Dixit · Dec 4, 2024
Solid pick for teams standardizing on skills: voice-agents is focused, and the summary matches what you get after install.
- ★★★★★ Kofi Choi · Nov 27, 2024
Keeps context tight: voice-agents is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★ Isabella Kim · Nov 23, 2024
We added voice-agents from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★ Neel Abebe · Nov 11, 2024
Useful defaults in voice-agents — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★ Benjamin Haddad · Oct 18, 2024
voice-agents is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★ Isabella Choi · Oct 14, 2024
voice-agents fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★ Xiao Jackson · Oct 2, 2024
I recommend voice-agents for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★ Benjamin Kapoor · Sep 21, 2024
Solid pick for teams standardizing on skills: voice-agents is focused, and the summary matches what you get after install.
- ★★★★★ Yash Thakker · Sep 17, 2024
voice-agents has been reliable in day-to-day use. Documentation quality is above average for community skills.
showing 1-10 of 28