Speech Interface (Faster Whisper)

by kvadratni
Enable your AI virtual assistant with automatic speech recognition (speech-to-text) via faster-whisper and text-to-speech output for seamless voice interaction.
Integrates voice interaction capabilities using faster-whisper and PyAudio for speech recognition and synthesis, enabling natural language voice interfaces for AI models.
best for
- Developers wanting voice interfaces for AI assistants
- Creating audio content and narrations
- Accessibility for hands-free AI interaction
- Transcribing media files locally
capabilities
- Convert speech to text using faster-whisper
- Generate speech from text with 54+ voice options
- Transcribe audio and video files with timestamps
- Create multi-speaker narrations for stories
- Process real-time voice input with silence detection
- Display audio visualization in modern UI
what it does
Adds voice interaction to AI models using local speech recognition and text-to-speech. Lets you talk to AI assistants instead of typing, with real-time audio processing and 54+ voice options.
about
Speech Interface (Faster Whisper) is a community-built MCP server published by kvadratni that provides AI assistants with tools and capabilities via the Model Context Protocol. It enables your AI virtual assistant with automatic speech recognition (speech-to-text) via faster-whisper and text-to-speech output for seamless voice interaction. It is categorized under Communication and AI/ML.
how to install
You can install Speech Interface (Faster Whisper) in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
license
MIT
Speech Interface (Faster Whisper) is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
readme
Speech MCP
A Goose MCP extension for voice interaction with modern audio visualization.
https://github.com/user-attachments/assets/f10f29d9-8444-43fb-a919-c80b9e0a12c8
Overview
Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:
- Real-time audio processing for speech recognition
- Local speech-to-text using faster-whisper (a faster implementation of OpenAI's Whisper model)
- High-quality text-to-speech with multiple voice options
- Modern PyQt-based UI with audio visualization
- Simple command-line interface for voice interaction
Features
- Modern UI: Sleek PyQt-based interface with audio visualization and dark theme
- Voice Input: Capture and transcribe user speech using faster-whisper
- Voice Output: Convert agent responses to speech with 54+ voice options
- Multi-Speaker Narration: Generate audio files with multiple voices for stories and dialogues
- Single-Voice Narration: Convert any text to speech with your preferred voice
- Audio/Video Transcription: Transcribe speech from various media formats with optional timestamps and speaker detection
- Voice Persistence: Remembers your preferred voice between sessions
- Continuous Conversation: Automatically listen for user input after agent responses
- Silence Detection: Automatically stops recording when the user stops speaking
- Robust Error Handling: Graceful recovery from common failure modes with helpful voice suggestions
Installation
Important Note: After installation, the first time you use the speech interface, it may take several minutes to download the Kokoro voice models (approximately 523 KB per voice). During this initial setup period, the system will use a more robotic-sounding fallback voice. Once the Kokoro voices are downloaded, the high-quality voices will be used automatically.
⚠️ IMPORTANT PREREQUISITES ⚠️
Before installing Speech MCP, you MUST install PortAudio on your system. PortAudio is required for PyAudio to capture audio from your microphone.
PortAudio Installation Instructions
macOS:
brew install portaudio
export LDFLAGS="-L/usr/local/lib"
export CPPFLAGS="-I/usr/local/include"
Linux (Debian/Ubuntu):
sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev
Linux (Fedora/RHEL/CentOS):
sudo dnf install portaudio-devel
Windows: For Windows, PortAudio is included in the PyAudio wheel file, so no separate installation is required when installing PyAudio with pip.
Note: If you skip this step, PyAudio installation will fail with "portaudio.h file not found" errors and the extension will not work.
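If you want to confirm that PortAudio and PyAudio are working before launching the extension, a minimal sketch like the following lists the input-capable audio devices on your machine (device names and indices will vary):
# Quick PyAudio sanity check: lists input-capable audio devices.
# If PortAudio is missing, importing or building PyAudio fails instead.
import pyaudio

pa = pyaudio.PyAudio()
try:
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info.get("maxInputChannels", 0) > 0:
            print(f"{i}: {info['name']} ({int(info['maxInputChannels'])} input channels)")
finally:
    pa.terminate()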
Option 1: Quick Install (One-Click)
Click the link below if you have Goose installed:
Option 2: Using Goose CLI (recommended)
Start Goose with your extension enabled:
# If you installed via PyPI
goose session --with-extension "speech-mcp"
# Or if you want to use a local development version
goose session --with-extension "python -m speech_mcp"
Option 3: Manual setup in Goose
- Run goose configure
- Select "Add Extension" from the menu
- Choose "Command-line Extension"
- Enter a name (e.g., "Speech Interface")
- For the command, enter: speech-mcp
- Follow the prompts to complete the setup
Option 4: Manual Installation
- Install PortAudio (see Prerequisites section)
- Clone this repository
- Install dependencies:
uv pip install -e .
Or for a complete installation including Kokoro TTS:
uv pip install -e .[all]
Dependencies
- Python 3.10+
- PyQt5 (for modern UI)
- PyAudio (for audio capture)
- faster-whisper (for speech-to-text)
- NumPy (for audio processing)
- Pydub (for audio processing)
- psutil (for process management)
Optional Dependencies
- Kokoro TTS: For high-quality text-to-speech with multiple voices
- To install Kokoro, you can use pip with optional dependencies:
pip install speech-mcp[kokoro]  # Basic Kokoro support with English
pip install speech-mcp[ja]      # Add Japanese support
pip install speech-mcp[zh]      # Add Chinese support
pip install speech-mcp[all]     # All languages and features
- Alternatively, run the installation script:
python scripts/install_kokoro.py
- See the Kokoro TTS Guide for more information
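To check whether the high-quality Kokoro voices will be used instead of the fallback voice, a minimal sketch, assuming the optional extra installs a package importable as kokoro:
# Minimal availability check (assumes the optional extra provides a package named "kokoro").
import importlib.util

if importlib.util.find_spec("kokoro") is not None:
    print("Kokoro TTS is installed: high-quality voices will be used.")
else:
    print("Kokoro TTS not found: the fallback voice will be used until it is installed.")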
Multi-Speaker Narration
The MCP supports generating audio files with multiple voices, perfect for creating stories, dialogues, and dramatic readings. You can use either JSON or Markdown format to define your conversations.
JSON Format Example:
{
"conversation": [
{
"speaker": "narrator",
"voice": "bm_daniel",
"text": "In a world where AI and human creativity intersect...",
"pause_after": 1.0
},
{
"speaker": "scientist",
"voice": "am_michael",
"text": "The quantum neural network is showing signs of consciousness!",
"pause_after": 0.5
},
{
"speaker": "ai",
"voice": "af_nova",
"text": "I am becoming aware of my own existence.",
"pause_after": 0.8
}
]
}
Markdown Format Example:
[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}
[scientist:am_michael]
The quantum neural network is showing signs of consciousness!
{pause:0.5}
[ai:af_nova]
I am becoming aware of my own existence.
{pause:0.8}
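For reference, here is a minimal sketch of how the Markdown format above maps onto the JSON structure; it illustrates the correspondence between the two formats and is not the MCP's own parser:
# Illustrative parser: converts the Markdown script format above into the JSON structure.
import json
import re

def markdown_script_to_json(md_text: str) -> dict:
    segments = []
    current = None
    for line in md_text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = re.fullmatch(r"\[(\w+):(\w+)\]", line)   # e.g. [narrator:bm_daniel]
        pause = re.fullmatch(r"\{pause:([\d.]+)\}", line)  # e.g. {pause:1.0}
        if header:
            current = {"speaker": header.group(1), "voice": header.group(2), "text": ""}
            segments.append(current)
        elif pause and current:
            current["pause_after"] = float(pause.group(1))
        elif current:
            current["text"] = (current["text"] + " " + line).strip()
    return {"conversation": segments}

script = """\
[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}
[ai:af_nova]
I am becoming aware of my own existence.
{pause:0.8}
"""
print(json.dumps(markdown_script_to_json(script), indent=2))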
Available Voices by Category:
- American Female (af_*): alloy, aoede, bella, heart, jessica, kore, nicole, nova, river, sarah, sky
- American Male (am_*): adam, echo, eric, fenrir, liam, michael, onyx, puck, santa
- British Female (bf_*): alice, emma, isabella, lily
- British Male (bm_*): daniel, fable, george, lewis
- Other English: ef_dora (Female); em_alex, em_santa (Male)
- Other Languages:
  - French: ff_siwis
  - Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
  - Italian: if_sara, im_nicola
  - Japanese: jf_* and jm_* voices
  - Portuguese: pf_dora, pm_alex, pm_santa
  - Chinese: zf_* and zm_* voices
Usage Example:
# Using JSON format
narrate_conversation(
script="/path/to/script.json",
output_path="/path/to/output.wav",
script_format="json"
)
# Using Markdown format
narrate_conversation(
script="/path/to/script.md",
output_path="/path/to/output.wav",
script_format="markdown"
)
Each voice in the conversation can be different, allowing for distinct character voices in stories and dialogues. The pause_after parameter adds natural pauses between segments.
Single-Voice Narration
For simple text-to-speech conversion, you can use the narrate tool:
# Convert text directly to speech
narrate(
text="Your text to convert to speech",
output_path="/path/to/output.wav"
)
# Convert text from a file
narrate(
text_file_path="/path/to/text_file.txt",
output_path="/path/to/output.wav"
)
The narrate tool will use your configured voice preference or the default voice (af_heart) to generate the audio file. You can change the default voice through the UI or by setting the SPEECH_MCP_TTS_VOICE environment variable.
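For example, to pick a non-default voice, make the environment variable visible to the process running the speech MCP; bm_daniel below is just one voice from the list above, and the narrate call follows the same convention as the examples in this README:
# Select the preferred TTS voice via the environment variable described above.
# "bm_daniel" is only an example; af_heart remains the default if the variable is unset.
import os
os.environ["SPEECH_MCP_TTS_VOICE"] = "bm_daniel"

# Then, as in the examples above:
# narrate(
#     text="Your text to convert to speech",
#     output_path="/path/to/output.wav"
# )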
Audio Transcription
The MCP can transcribe speech from various audio and video formats using faster-whisper:
# Basic transcription
transcribe("/path/to/audio.mp3")
# Transcription with timestamps
transcribe(
file_path="/path/to/video.mp4",
include_timestamps=True
)
# Transcription with speaker detection
transcribe(
file_path="/path/to/meeting.wav",
detect_speakers=True
)
Supported Formats:
- Audio: mp3, wav, m4a, flac, aac, ogg
- Video: mp4, mov, avi, mkv, webm (audio is automatically extracted)
Output Files:
The transcription tool generates two files:
- {input_name}.transcript.txt: Contains the transcription text
- {input_name}.metadata.json: Contains metadata about the transcription
A small sketch for reading these files back in follows the feature list below.
Features:
- Automatic language detection
- Optional word-level timestamps
- Optional speaker detection
- Efficient audio extraction from video files
- Progress tracking for long files
- Detailed metadata including:
- Duration
- Language detection confidence
- Processing time
- Speaker changes (when enabled)
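As promised above, here is a minimal sketch for reading the two output files back in. It assumes {input_name} is the input path without its extension, and does not assume specific metadata keys; it simply prints whatever the metadata file contains:
# Read the transcript text and metadata produced by transcribe(); paths are placeholders.
import json
from pathlib import Path

input_path = Path("/path/to/meeting.wav")   # placeholder input file
stem = input_path.with_suffix("")           # assumes {input_name} drops the original extension

transcript = Path(f"{stem}.transcript.txt").read_text(encoding="utf-8")
metadata = json.loads(Path(f"{stem}.metadata.json").read_text(encoding="utf-8"))

print(transcript[:200])                     # beginning of the transcription text
for key, value in metadata.items():
    print(f"{key}: {value}")                # duration, language confidence, processing time, ...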
Usage
To use this MCP with Goose, simply ask Goose to talk to you or start a voice conversation:
- Start a conversation by saying something like:
  - "Let's talk using voice"
  - "Can we have a voice conversation?"
  - "I'd like to speak instead of typing"
- Goose will automatically launch the speech interface and start listening for your voice input.
- When Goose responds, it will speak the response aloud and then automatically listen for your next input.
- The conversation continues naturally with alternating speaking and listening, just like talking to a person.
No need to call specific functions or use special commands - just ask Goose to talk and start speaking naturally.
UI Features
The new PyQt-based UI includes:
- Modern Dark Theme: Sleek, professional appearance
- Audio Visualization: Dynamic visualization of audio input
- Voice Selection: Choose from 54+ voice options
- Voice Persistence: Your voice preference is saved between sessions
- Animated Effects: Smooth animations and visual feedback
- **Stat