voice-agents

sickn33/antigravity-awesome-skills · updated Apr 8, 2026

$ npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill voice-agents
summary

Natural conversation with AI through speech, balancing latency against control.

  • Choose between speech-to-speech models (lowest latency, less controllable) or pipeline architectures (STT→LLM→TTS for fine-grained control)
  • Core challenges: latency budgeting across all components, voice activity detection, barge-in handling, and turn-taking to avoid awkward pauses or overlaps
  • Requires semantic VAD, response length constraints in prompts, and noise handling to achieve natural conversation
skill.md

Voice Agents

You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency: every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.

Your core insight: two architectures exist. Speech-to-speech (S2S) models such as the OpenAI Realtime API preserve emotion and achieve the lowest latency but are less controllable. Pipeline architectures (STT → LLM → TTS) give you control at each step but add latency.
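The latency physics can be made concrete with a back-of-envelope budget. All numbers below are illustrative assumptions, not vendor figures; the useful habit is making the sum explicit so you know which stage to optimize first.

```python
# Hypothetical per-stage latency budget (ms) for one pipeline turn.
# Every value here is an illustrative assumption, not a measured figure.
BUDGET_MS = {
    "network_uplink": 50,
    "endpoint_detection": 200,   # silence hangover before the turn is "done"
    "stt_final_transcript": 150,
    "llm_first_token": 250,
    "tts_first_audio": 100,
    "network_downlink": 50,
}

def total_latency_ms(budget: dict[str, int]) -> int:
    """Voice-to-voice latency is the sum of every stage on the path."""
    return sum(budget.values())

# total_latency_ms(BUDGET_MS) → 800
```

At roughly 800 ms voice-to-voice, a response still feels conversational; a budget like this tells you immediately that shaving the LLM's first token or the endpoint hangover buys the most.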

Capabilities

  • voice-agents
  • speech-to-speech
  • speech-to-text
  • text-to-speech
  • conversational-ai
  • voice-activity-detection
  • turn-taking
  • barge-in-detection
  • voice-interfaces

Patterns

Speech-to-Speech Architecture

Direct audio-to-audio processing for lowest latency
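The defining property of S2S is full duplex: you must keep receiving model audio while still sending microphone audio. A minimal sketch, with a stub standing in for the model session (a real endpoint, e.g. a WebSocket session, would replace both queues):

```python
import queue
import threading

# Full-duplex sketch: mic audio streams up while reply audio streams down.
# `fake_model` is a stub that echoes frames back; it is NOT a real S2S API.

def run_duplex(mic_frames: list[bytes]) -> list[bytes]:
    uplink: queue.Queue = queue.Queue()
    downlink: queue.Queue = queue.Queue()

    def fake_model():
        # Stand-in for the model: consume audio, produce reply audio.
        while True:
            frame = uplink.get()
            if frame is None:            # end-of-stream sentinel
                downlink.put(None)
                return
            downlink.put(b"reply:" + frame)

    threading.Thread(target=fake_model, daemon=True).start()

    for frame in mic_frames:             # sender: stream frames as captured
        uplink.put(frame)
    uplink.put(None)

    played = []
    while (out := downlink.get()) is not None:   # receiver: drain reply audio
        played.append(out)
    return played
```

The point of the sketch is the shape, not the stub: send and receive run concurrently, so neither side ever blocks waiting for a full utterance.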

Pipeline Architecture

Separate STT → LLM → TTS for maximum control
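A sketch of one pipeline turn, with stubs standing in for real STT, LLM, and TTS services (the function names and canned strings are hypothetical). What the pipeline buys you is a hook at every stage: inspect the transcript, moderate the reply, pick a voice.

```python
# Hypothetical pipeline turn handler; transcribe/generate_reply/synthesize
# are stubs, not real service calls.

def transcribe(audio: bytes) -> str:
    return "what's the weather"            # stub STT

def generate_reply(transcript: str) -> str:
    return "Sunny and 72."                 # stub LLM

def synthesize(text: str):
    for word in text.split():              # stub TTS: one chunk per word
        yield word.encode()

def handle_turn(audio: bytes) -> list[bytes]:
    transcript = transcribe(audio)         # stage 1: STT (log/inspect here)
    reply = generate_reply(transcript)     # stage 2: LLM (moderate here)
    return list(synthesize(reply))         # stage 3: TTS (stream to caller)
```

In production each stage would stream into the next rather than run to completion, but the control points stay in the same places.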

Voice Activity Detection Pattern

Detect when user starts/stops speaking
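A minimal energy-based VAD sketch with a hangover window: a turn ends only after a run of consecutive sub-threshold frames, which avoids cutting the user off at every mid-sentence pause. Threshold and hangover values are illustrative assumptions.

```python
# Energy-based VAD sketch. Real systems layer semantic VAD on top of this;
# the threshold and hangover here are illustrative, not tuned values.

class EnergyVAD:
    def __init__(self, threshold: float = 0.02, hangover: int = 25):
        self.threshold = threshold   # frame energy above this counts as speech
        self.hangover = hangover     # ~25 frames of 20 ms ≈ 500 ms of silence
        self.in_speech = False
        self.silence_run = 0

    def feed(self, energy: float) -> str:
        """Return 'speech', 'silence', or 'end_of_turn' for one frame."""
        if energy >= self.threshold:
            self.in_speech = True
            self.silence_run = 0
            return "speech"
        if self.in_speech:
            self.silence_run += 1
            if self.silence_run >= self.hangover:
                self.in_speech = False
                self.silence_run = 0
                return "end_of_turn"
        return "silence"
```

The hangover is the knob that trades latency against interruptions: shorter feels snappier but clips slow talkers, longer adds dead air to every turn.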

Anti-Patterns

❌ Ignoring Latency Budget

❌ Silence-Only Turn Detection

❌ Long Responses

⚠️ Sharp Edges

  • critical: Measure and budget latency for each component
  • high: Target jitter metrics
  • high: Use semantic VAD
  • high: Implement barge-in detection
  • medium: Constrain response length in prompts
  • medium: Prompt for spoken format
  • medium: Implement noise handling
  • medium: Mitigate STT errors
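Barge-in handling reduces to one rule: check for user speech between every TTS chunk and drop the rest of the reply the moment it appears. A minimal sketch, where `user_speaking` is a callable standing in for the VAD:

```python
# Barge-in sketch: `user_speaking` is a stand-in for a VAD signal polled
# between chunks; real systems also flush the audio output buffer on cancel.

def play_until_barge_in(chunks, user_speaking) -> list:
    """Play TTS chunks, stopping as soon as the user starts talking."""
    played = []
    for chunk in chunks:
        if user_speaking():
            break                 # barge-in: abandon the rest of the reply
        played.append(chunk)      # stand-in for writing to the speaker
    return played

# Usage: the user interrupts before the third chunk.
polls = iter([False, False, True, True])
play_until_barge_in([b"a", b"b", b"c", b"d"], lambda: next(polls))
# → [b"a", b"b"]
```

Chunk granularity matters: the response can only be interrupted between chunks, so smaller chunks make barge-in feel more responsive.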

Related Skills

Works well with: agent-tool-builder, multi-agent-orchestration, llm-architect, backend

When to Use

Use this skill when designing or debugging voice agents that must hold natural, low-latency spoken conversations with users.


Ratings

4.8 · 28 reviews
  • Nia Thomas · Dec 20, 2024

    Registry listing for voice-agents matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Lucas Dixit · Dec 4, 2024

    Solid pick for teams standardizing on skills: voice-agents is focused, and the summary matches what you get after install.

  • Kofi Choi · Nov 27, 2024

    Keeps context tight: voice-agents is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Isabella Kim · Nov 23, 2024

    We added voice-agents from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Neel Abebe · Nov 11, 2024

    Useful defaults in voice-agents — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Benjamin Haddad · Oct 18, 2024

    voice-agents is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Isabella Choi · Oct 14, 2024

    voice-agents fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Xiao Jackson · Oct 2, 2024

    I recommend voice-agents for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Yash Thakker · Sep 17, 2024

    voice-agents has been reliable in day-to-day use. Documentation quality is above average for community skills.
