Why Gemini CLI?

The Gemini CLI is a powerful agentic tool. Adding voice turns it from a standard terminal interface into a Jarvis-like assistant running directly in your workspace. Speech recognition (Apple MLX Whisper) and speech synthesis (Kokoro TTS) both run locally, so your voice audio is processed on your Mac rather than sent to a cloud speech service.

Prerequisites

A Mac with Apple Silicon (M1, M2, M3, or M4).
Gemini CLI installed.
Node.js (v18+) installed.
Homebrew installed (used in Step 1).
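
If you want to confirm the prerequisites before continuing, a quick check along these lines may help. It is only a sketch: the helper names node_version_ok and have_cmd are made up for this example, and it assumes the Gemini CLI is on your PATH as gemini (the command name used in Step 3).

```shell
#!/bin/sh
# Succeeds when a Node.js version string such as "v18.17.0" is v18 or newer.
node_version_ok() {
  major=${1#v}        # strip the leading "v"
  major=${major%%.*}  # keep only the text before the first dot
  [ "$major" -ge 18 ] 2>/dev/null
}

# Prints "ok" if the named command is on PATH, otherwise "missing".
have_cmd() {
  if command -v "$1" >/dev/null 2>&1; then echo ok; else echo missing; fi
}

# Apple Silicon Macs report arm64 here.
echo "architecture: $(uname -m)"
echo "gemini: $(have_cmd gemini)"
echo "brew: $(have_cmd brew)"

if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "node: ok"
else
  echo "node: missing or older than v18"
fi
```

Anything reported as missing should be installed before moving on to Step 1.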

Step 1: Install System Dependencies

Open your terminal and install the underlying audio components:

brew install portaudio espeak-ng

Step 2: Install Voice MCP Server

Install the server globally so the voice-mcp-server command is on your PATH and the Gemini CLI can launch it:

npm install -g voice-mcp-server

Step 3: Connect to Gemini CLI

We use the built-in MCP tooling provided by the Gemini CLI. Run this command:

gemini mcp add voice-mcp-server --scope user voice-mcp-server

This adds the server to your user scope, so it's available in every project directory where you run the CLI.
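
To double-check that the registration took effect, you can look for the server name in your user settings file. The sketch below assumes user-scope servers are stored under an "mcpServers" key in ~/.gemini/settings.json; that location is an assumption rather than something this guide confirms, and mcp_registered is a helper name invented for the example. Running gemini mcp list should give you similar information if your CLI version provides it.

```shell
#!/bin/sh
# Succeeds when the settings file exists and mentions the quoted server name.
# Assumption: user-scope servers live under "mcpServers" in this file.
mcp_registered() {
  [ -f "$1" ] && grep -q "\"$2\"" "$1"
}

settings="$HOME/.gemini/settings.json"
if mcp_registered "$settings" voice-mcp-server; then
  echo "voice-mcp-server found in $settings"
else
  echo "voice-mcp-server not found in $settings; try the gemini mcp add command again"
fi
```

This is a plain text match, not a JSON parse, so treat a positive result as a hint rather than proof that the entry is well-formed.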

Step 4: Grant Input Monitoring Permissions

Push-to-Talk listens for the Right Option key globally, which macOS only allows for apps with Input Monitoring permission. Go to System Settings > Privacy & Security > Input Monitoring and toggle your terminal app (e.g., Terminal, iTerm, Warp) to ON.
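
If you prefer to jump to that pane from the terminal, something like the following may work. The URL anchor is an assumption (it is commonly used for the Input Monitoring pane but may differ between macOS releases), and input_monitoring_url is a helper name made up for this sketch. If the pane does not open, navigate to it manually as described above.

```shell
#!/bin/sh
# Assumption: this anchor maps to the Input Monitoring pane on your macOS version.
input_monitoring_url() {
  echo "x-apple.systempreferences:com.apple.preference.security?Privacy_ListenEvent"
}

if [ "$(uname -s)" = "Darwin" ]; then
  # "open" hands the URL to System Settings.
  open "$(input_monitoring_url)"
else
  echo "Not macOS; this step only applies on a Mac."
fi
```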

Bonus: Need Time to Think?

The Voice MCP Server includes a Standby Mode. If you need a moment, just tell the AI something like "give me a minute" or "let me think." It suspends itself and turns off the microphone (and the orange macOS privacy dot) for as long as you need. When you are ready to resume, press and hold Right Option and start talking again!

You're Ready to Talk!

Launch the Gemini CLI and prompt: "Let's talk." The AI will use the voice_converse tool to speak to you. When it's your turn to reply, press and hold Right Option!