5-MINUTE TUTORIAL

How to Add Local Voice Mode to Claude Desktop

Turn Claude Desktop into an interruptible, conversational voice assistant powered entirely by local machine learning on your Mac. No cloud APIs required.

The Goal

By connecting the Voice MCP Server to Claude Desktop, Claude gains the ability to speak using Kokoro TTS and listen using Apple MLX Whisper. A push-to-talk mechanism (hold Shift to speak) means the microphone only captures deliberate input, which enables true barge-in: you can interrupt Claude while it is still speaking.

Prerequisites

  • A Mac with Apple Silicon (M1, M2, M3, or M4).
  • Claude Desktop installed.
  • Node.js (v18+) installed.
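Before continuing, you can run a quick pre-flight check for the items above. This is a hedged sketch: it only prints warnings rather than failing, so you can see everything that needs attention at once.

```shell
# Pre-flight check: confirm Apple Silicon and a usable Node.js install.
# Prints a warning instead of exiting so all results are visible.
[ "$(uname -m)" = "arm64" ] && echo "Apple Silicon: OK" || echo "Warning: this machine reports $(uname -m), not arm64"
if command -v node >/dev/null 2>&1; then
  echo "Node.js: $(node --version)"
else
  echo "Warning: Node.js not found (install v18 or newer)"
fi
```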

Step 1: Install System Audio Dependencies

Open your terminal and install the underlying audio components:

brew install portaudio espeak-ng
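If you want to confirm both packages landed correctly, a sketch like the following checks them with Homebrew (assuming `brew` is on your PATH):

```shell
# Optional check: list installed versions of the two audio packages.
# Falls back to a hint if Homebrew itself is missing.
if command -v brew >/dev/null 2>&1; then
  brew list --versions portaudio espeak-ng || echo "One or both packages are missing - rerun the install command"
else
  echo "Homebrew not found - install it from brew.sh first"
fi
```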

Step 2: Install Voice MCP Server

Install the server globally:

npm install -g voice-mcp-server
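A global npm install only works for Claude Desktop if the resulting binary is on your PATH. This sketch verifies that (the fallback hint about `npm prefix -g` assumes a standard npm setup):

```shell
# Confirm the globally installed binary is reachable on PATH.
if command -v voice-mcp-server >/dev/null 2>&1; then
  echo "voice-mcp-server found at: $(command -v voice-mcp-server)"
else
  echo "voice-mcp-server not on PATH - check 'npm prefix -g' against your PATH"
fi
```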

Step 3: Configure Claude Desktop

Claude Desktop requires editing a JSON configuration file to add MCP servers.

Open your terminal and edit the file with nano or your favorite editor:
nano ~/Library/Application\ Support/Claude/claude_desktop_config.json

Paste the following configuration (if the file already contains an mcpServers entry, merge this server into it rather than replacing it):

{
  "mcpServers": {
    "voice-mcp-server": {
      "command": "voice-mcp-server",
      "args": []
    }
  }
}
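A single stray comma in this file will stop Claude Desktop from loading any MCP server, so it is worth validating after saving. This sketch uses Python's built-in JSON parser; adjust the path if your config lives elsewhere:

```shell
# Validate the Claude Desktop config as strict JSON.
CONFIG="$HOME/Library/Application Support/Claude/claude_desktop_config.json"
if [ -f "$CONFIG" ]; then
  python3 -m json.tool "$CONFIG" >/dev/null && echo "Config is valid JSON" || echo "Config has a JSON syntax error"
else
  echo "Config file not found at: $CONFIG"
fi
```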

Step 4: Grant Input Monitoring Permissions

To use the Push-to-Talk feature (Shift key) globally, macOS requires permission. Go to System Settings > Privacy & Security > Input Monitoring and toggle Claude to ON.

Step 5: Restart Claude Desktop

Fully quit (Cmd+Q) and reopen Claude Desktop. The Voice MCP Server will now be loaded.
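If the server does not show up after restarting, Claude Desktop's MCP logs are the first place to look. The log location below (~/Library/Logs/Claude) is an assumption based on common Claude Desktop setups; adjust the path if yours differs:

```shell
# Peek at recent MCP log output to confirm the server started.
LOGDIR="$HOME/Library/Logs/Claude"
if ls "$LOGDIR"/mcp*.log >/dev/null 2>&1; then
  tail -n 20 "$LOGDIR"/mcp*.log
else
  echo "No MCP logs found yet in $LOGDIR"
fi
```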

Bonus: Need Time to Think?

The Voice MCP Server features an intelligent Standby Mode. If you need a moment, just tell the AI something like "give me a minute" or "let me think." It will gracefully suspend itself, turning off the microphone (and the orange macOS privacy dot) indefinitely. When you are ready to resume, simply press and hold Shift to start talking again!

Start Talking!

In Claude Desktop, prompt: "Let's talk. Use your voice_converse tool." Note: The first run takes a few minutes as it downloads the ~4GB local ML models.