Voicebot

An open-source voice-first AI butler built in Rust for macOS

Real-time voice interaction with natural conversation flow, proactive assistance, and computer automation. Always listening, instantly responding.

What Makes It Different?

A chatbot answers questions. A butler anticipates needs.

🎙️ Always Listening

Automatic speech detection with Silero VAD. No push-to-talk button needed. Just speak naturally.
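A VAD model such as Silero produces a speech probability per audio frame, and something has to turn that stream into "utterance started" and "utterance ended" events. The sketch below shows one common way to do that with a threshold and a hangover counter. The `Segmenter` type, its field names, and the threshold values are assumptions for illustration, not Voicebot's actual code or settings.

```rust
/// Events emitted when an utterance starts or ends.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum VadEvent {
    SpeechStart,
    SpeechEnd,
}

/// Turns per-frame speech probabilities into start/end events.
/// Threshold and hangover values are illustrative assumptions.
pub struct Segmenter {
    threshold: f32,       // probability at or above which a frame counts as speech
    hangover_frames: u32, // consecutive silent frames required to end an utterance
    silent_run: u32,      // silent frames seen since the last speech frame
    in_speech: bool,
}

impl Segmenter {
    pub fn new(threshold: f32, hangover_frames: u32) -> Self {
        Self { threshold, hangover_frames, silent_run: 0, in_speech: false }
    }

    /// Feed one frame's speech probability (0.0 to 1.0).
    pub fn push(&mut self, prob: f32) -> Option<VadEvent> {
        if prob >= self.threshold {
            self.silent_run = 0;
            if !self.in_speech {
                self.in_speech = true;
                return Some(VadEvent::SpeechStart);
            }
        } else if self.in_speech {
            self.silent_run += 1;
            if self.silent_run >= self.hangover_frames {
                self.in_speech = false;
                self.silent_run = 0;
                return Some(VadEvent::SpeechEnd);
            }
        }
        None
    }
}
```

The hangover keeps a short pause in the middle of a sentence from ending the utterance early; a longer hangover trades responsiveness for fewer false endings.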

Real-Time Responses

Sub-3-second latency from speech to response. Streaming LLM inference reuses the KV cache, so earlier context is not recomputed on every turn.
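KV-cache reuse generally works because each new prompt shares a long prefix (system prompt plus earlier turns) with the previous one, so only the new suffix needs to be evaluated. The sketch below illustrates that bookkeeping on token IDs. The function names are hypothetical, and the real reuse logic lives in the inference backend, not in this form.

```rust
/// Number of leading tokens shared by the cached prompt and the new prompt.
pub fn shared_prefix_len(cached: &[u32], prompt: &[u32]) -> usize {
    cached
        .iter()
        .zip(prompt.iter())
        .take_while(|(a, b)| a == b)
        .count()
}

/// Splits a new prompt into the number of tokens already covered by the
/// cache and the remaining suffix that still has to be evaluated.
pub fn plan_reuse<'a>(cached: &[u32], prompt: &'a [u32]) -> (usize, &'a [u32]) {
    let keep = shared_prefix_len(cached, prompt);
    (keep, &prompt[keep..])
}
```

For example, with a cached sequence `[1, 2, 3, 4]` and a new prompt `[1, 2, 3, 9, 10]`, three tokens are reused and only `[9, 10]` is evaluated.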

🗣️ Barge-In Support

Interrupt mid-speech instantly. The pipeline cancels active TTS and starts fresh with your new input.
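One simple way to implement this kind of cancellation is a shared flag that the listening side sets and the speaking side checks. The sketch below uses only the standard library; `CancelToken` and `play_until_cancelled` are hypothetical names, and the `speak` callback stands in for synthesis plus playback. It checks the flag between sentences only, whereas a real pipeline would also need to stop audio that is already playing.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Shared flag that the VAD side sets when the user starts speaking.
#[derive(Clone, Default)]
pub struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Speaks queued sentences until all are done or the token is cancelled.
/// Returns how many sentences were spoken.
pub fn play_until_cancelled<F: FnMut(&str)>(
    sentences: &[&str],
    token: &CancelToken,
    mut speak: F,
) -> usize {
    let mut spoken = 0;
    for &sentence in sentences {
        if token.is_cancelled() {
            break; // barge-in: drop everything still queued
        }
        speak(sentence);
        spoken += 1;
    }
    spoken
}
```

Because the token is cheap to clone, each pipeline stage can hold its own copy and check it at its own natural stopping points.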

🧠 Context Awareness

Persistent memory, user profile extraction, and conversation summarization across sessions.

👁️ Visual Awareness

The EYES module periodically captures the screen and analyzes its content with a secondary, vision-capable LLM.

🔧 Computer Control

Built-in tools for screenshots, clipboard, app launching, shell commands, and web search, plus MCP (Model Context Protocol) support.
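Tool calling usually comes down to mapping a tool name requested by the model to a function and returning its output as text. The sketch below shows a minimal registry of that kind. `ToolRegistry`, `ToolFn`, and the `shout` example tool are all hypothetical and do not reflect Voicebot's actual tool interface.

```rust
use std::collections::HashMap;

/// A tool takes a string argument and returns text for the LLM,
/// or an error message. (Hypothetical signature for illustration.)
pub type ToolFn = fn(&str) -> Result<String, String>;

pub struct ToolRegistry {
    tools: HashMap<&'static str, ToolFn>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    pub fn register(&mut self, name: &'static str, f: ToolFn) {
        self.tools.insert(name, f);
    }

    /// Looks up a tool by the name the model requested and runs it.
    pub fn call(&self, name: &str, arg: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(f) => f(arg),
            None => Err(format!("unknown tool: {}", name)),
        }
    }
}

/// Hypothetical example tool: upper-cases its argument.
pub fn shout(arg: &str) -> Result<String, String> {
    Ok(arg.to_uppercase())
}
```

Returning an error string for unknown tools, instead of panicking, lets the error be fed back to the model so it can retry with a valid name.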

The Voice Pipeline

  1. Capture: CPAL audio capture with pre-roll buffer
  2. VAD: Silero voice activity detection
  3. STT: Whisper.cpp speech-to-text (Metal GPU)
  4. LLM: Streaming inference via mlx-lm/oMLX
  5. TTS: Sentence-by-sentence synthesis & playback

Every stage is connected by tokio channels, so sentence N+1 is generated while sentence N plays.
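The overlap between generation and playback can be sketched with plain threads and bounded channels. This example uses `std::sync::mpsc` as a stand-in for tokio channels so that it has no dependencies; the stage layout, the `run_pipeline` name, and the naive punctuation-based sentence splitting are assumptions for illustration only.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs a three-stage pipeline: token source -> sentence splitter -> "playback".
/// Returns the sentences in the order the last stage received them.
pub fn run_pipeline(tokens: Vec<String>) -> Vec<String> {
    // Small bounded channels give backpressure between stages.
    let (tok_tx, tok_rx) = sync_channel::<String>(8);
    let (sent_tx, sent_rx) = sync_channel::<String>(2);

    // Stage 1: stands in for the streaming LLM.
    let producer = thread::spawn(move || {
        for t in tokens {
            if tok_tx.send(t).is_err() {
                break;
            }
        }
        // tok_tx is dropped here, which closes the token channel.
    });

    // Stage 2: groups tokens into sentences and forwards each one
    // as soon as it is complete, while later tokens keep arriving.
    let splitter = thread::spawn(move || {
        let mut current = String::new();
        for t in tok_rx {
            current.push_str(&t);
            if t.trim_end().ends_with(|c: char| matches!(c, '.' | '!' | '?')) {
                let sentence = current.trim().to_string();
                current.clear();
                if sent_tx.send(sentence).is_err() {
                    return;
                }
            }
        }
        // Flush any trailing text that did not end with punctuation.
        if !current.trim().is_empty() {
            let _ = sent_tx.send(current.trim().to_string());
        }
    });

    // Stage 3: stands in for TTS synthesis and playback.
    let played: Vec<String> = sent_rx.into_iter().collect();
    producer.join().unwrap();
    splitter.join().unwrap();
    played
}
```

Splitting on `.`, `!`, and `?` will break on abbreviations such as "Dr."; a production splitter would need more care, but the channel structure stays the same.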

Quick Start

# Install with one command
curl -fsSL https://github.com/madcato/voicebot/releases/latest/download/install.sh | sh

# Or build from source
git clone https://github.com/madcato/voicebot.git
cd voicebot
cp .env.example .env
cargo build --release
cargo run --release


Feature Highlights

🎙️ Core Voice Pipeline

  • Real-time voice capture with VAD and pre-roll buffer
  • Whisper STT via whisper-cpp-plus (Metal GPU on macOS)
  • Streaming LLM with sub-second latency
  • Sentence-by-sentence TTS playback
  • Barge-in: user speech cancels active pipeline instantly
  • Persistent SQLite conversation history
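A pre-roll buffer keeps the most recent audio so that the first syllable, spoken just before the VAD fires, is not lost. A minimal sketch using a fixed-capacity queue follows; the `PreRoll` type and its capacity handling are illustrative assumptions, not the project's implementation.

```rust
use std::collections::VecDeque;

/// Keeps the most recent `capacity` audio samples.
pub struct PreRoll {
    capacity: usize,
    samples: VecDeque<f32>,
}

impl PreRoll {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Appends a captured frame, discarding the oldest samples beyond capacity.
    pub fn push_frame(&mut self, frame: &[f32]) {
        if self.capacity == 0 {
            return; // nothing can be retained
        }
        for &s in frame {
            if self.samples.len() == self.capacity {
                self.samples.pop_front();
            }
            self.samples.push_back(s);
        }
    }

    /// Takes the buffered audio, oldest first, to prepend to a new utterance.
    pub fn drain(&mut self) -> Vec<f32> {
        self.samples.drain(..).collect()
    }
}
```

When speech is detected, the drained samples are placed in front of the live audio before it is handed to speech-to-text.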

🧠 Advanced Intelligence

  • Context consolidation with persistent memory
  • User profile extraction from conversations
  • Inference daemon for proactive suggestions
  • Two conversation modes: Active and Ambient
  • Multi-speaker registry with auto-enrollment
  • Ambient context buffer for contextual responses

🔌 Integrations

  • Tool calling: time, files, clipboard, apps, shell, web search
  • MCP (Model Context Protocol) support
  • HTTP Control API with SSE events
  • WebSocket server for remote devices
  • Agent delegation for complex tasks
  • SearXNG-backed web search
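Server-Sent Events are plain text frames: an optional `event:` line, one `data:` line per line of payload, and a blank line to end the frame. The helper below formats such a frame. The function name and the event names used in the example are hypothetical; the event names Voicebot's Control API actually emits are not stated here.

```rust
/// Formats one Server-Sent Events frame with an event name and a payload.
/// Multi-line payloads are emitted as one `data:` line per line.
pub fn sse_frame(event: &str, data: &str) -> String {
    let mut out = format!("event: {}\n", event);
    for line in data.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n'); // a blank line terminates the frame
    out
}
```

For instance, `sse_frame("transcript", "hello")` yields `event: transcript`, then `data: hello`, then a blank line.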

Roadmap

  • Calendar Sync (planned): Integration with calendar services for proactive scheduling assistance
  • Mobile Companion App (planned): iOS/Android app for remote interaction with your Voicebot instance
  • Multi-Platform Support (planned): Linux and Windows support beyond macOS