Voicebot - Voice-First AI Butler

What Makes It Different?

A chatbot answers questions. A butler anticipates needs.

🎙️

Always Listening

Automatic speech detection with Silero VAD. No push-to-talk button needed. Just speak naturally.

⚡

Real-Time Responses

Under 3 seconds from speech to response, using a streaming LLM with KV-cache reuse.

🗣️

Barge-In Support

Interrupt Voicebot mid-sentence at any time. The pipeline cancels active TTS and starts fresh with your new input.
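As a rough illustration of how barge-in cancellation can work, here is a minimal sketch using a shared atomic flag. The names `BargeIn` and `play_until_interrupted` are hypothetical and are not taken from the Voicebot codebase. A real pipeline would likely check the flag per audio chunk rather than per sentence.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical cancellation handle shared between the speech-detection
/// side and the playback side. Cloning yields another handle to the same flag.
#[derive(Clone, Default)]
pub struct BargeIn(Arc<AtomicBool>);

impl BargeIn {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called when new user speech is detected.
    pub fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_triggered(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Plays sentences in order, checking the flag before each one.
/// Returns how many sentences were played before the interruption.
pub fn play_until_interrupted<F>(sentences: &[&str], barge_in: &BargeIn, mut play: F) -> usize
where
    F: FnMut(&str),
{
    let mut played = 0;
    for &sentence in sentences {
        if barge_in.is_triggered() {
            break; // user started talking: drop the remaining sentences
        }
        play(sentence);
        played += 1;
    }
    played
}
```

Calling `trigger()` from any clone of the handle stops playback before the next sentence starts, and the caller can then begin a fresh turn with the new input.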

🧠

Context Awareness

Persistent memory, user profile extraction, and conversation summarization across sessions.

👁️

Visual Awareness

The EYES module periodically captures screen content and analyzes it with a vision-capable secondary LLM.

🔧

Computer Control

Built-in tools: screenshots, clipboard, app launching, shell commands, web search, and MCP (Model Context Protocol) support.

The Voice Pipeline

Capture: CPAL audio capture with pre-roll buffer
→ VAD: Silero voice activity detection
→ STT: Whisper.cpp speech-to-text (Metal GPU)
→ LLM: Streaming inference via mlx-lm/oMLX
→ TTS: Sentence-by-sentence synthesis & playback

Every stage is connected by tokio channels, so sentence N+1 is generated while sentence N plays.
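The overlap between generation and playback can be sketched with a bounded channel. To keep the example dependency-free, it uses `std::sync::mpsc::sync_channel` and a thread as stand-ins for tokio channels and tasks. The function name `run_pipeline` and the naive punctuation-based sentence splitting are assumptions for illustration, not Voicebot's actual implementation.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Streams tokens through a two-stage pipeline and returns the sentences
/// in the order the playback stage received them.
pub fn run_pipeline(tokens: Vec<String>) -> Vec<String> {
    // Bounded channel: the generator can run ahead of playback,
    // but only by a small, fixed number of sentences.
    let (tx, rx) = sync_channel::<String>(1);

    // "LLM" stage: accumulates tokens and sends each sentence
    // as soon as it looks complete.
    let generator = thread::spawn(move || {
        let mut current = String::new();
        for token in tokens {
            current.push_str(&token);
            let complete = matches!(
                current.trim_end().chars().last(),
                Some('.' | '!' | '?')
            );
            if complete {
                // `send` blocks while the channel is full (backpressure).
                if tx.send(current.trim().to_string()).is_err() {
                    return; // playback side went away
                }
                current.clear();
            }
        }
        // Flush any trailing text that had no terminator.
        if !current.trim().is_empty() {
            let _ = tx.send(current.trim().to_string());
        }
    });

    // "TTS" stage: consumes sentences in arrival order while the
    // generator keeps working on the next one.
    let mut played = Vec::new();
    for sentence in rx {
        played.push(sentence);
    }
    generator.join().expect("generator thread panicked");
    played
}
```

The splitter here is deliberately simple and would break on abbreviations or decimals. The point is only that the playback stage receives the first sentence before the generator has finished the rest.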

Quick Start

# Install with one command
curl -fsSL https://github.com/madcato/voicebot/releases/latest/download/install.sh | sh

# Or build from source
git clone https://github.com/madcato/voicebot.git
cd voicebot
cp .env.example .env
cargo build --release
cargo run --release


Feature Highlights

🎙️ Core Voice Pipeline

Real-time voice capture with VAD and pre-roll buffer
Whisper STT via whisper-cpp-plus (Metal GPU on macOS)
Streaming LLM with sub-second latency
Sentence-by-sentence TTS playback
Barge-in: user speech cancels active pipeline instantly
Persistent SQLite conversation history
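The pre-roll buffer listed above can be pictured as a small ring of recent audio frames that is handed over once the VAD reports speech, so the first syllable is not clipped. This is a minimal sketch. The `PreRoll` type, its methods, and the `Vec<f32>` frame representation are hypothetical and not taken from the Voicebot source.

```rust
use std::collections::VecDeque;

/// Hypothetical pre-roll buffer: keeps the most recent `capacity` audio
/// frames so the audio just before speech onset is not lost.
pub struct PreRoll {
    frames: VecDeque<Vec<f32>>,
    capacity: usize,
}

impl PreRoll {
    pub fn new(capacity: usize) -> Self {
        Self {
            frames: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Stores a frame captured while no speech is detected,
    /// evicting the oldest frame when the buffer is full.
    pub fn push(&mut self, frame: Vec<f32>) {
        if self.capacity == 0 {
            return;
        }
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(frame);
    }

    /// Called at speech onset: returns the buffered frames, oldest first,
    /// and leaves the buffer empty for the next utterance.
    pub fn drain(&mut self) -> Vec<Vec<f32>> {
        self.frames.drain(..).collect()
    }
}
```

On speech onset, the drained frames would be prepended to the live audio before it is passed to speech-to-text.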

🧠 Advanced Intelligence

Context consolidation with persistent memory
User profile extraction from conversations
Inference daemon for proactive suggestions
Two conversation modes: Active and Ambient
Multi-speaker registry with auto-enrollment
Ambient context buffer for contextual responses

🔌 Integrations

Tool calling: time, files, clipboard, apps, shell, web search
MCP (Model Context Protocol) support
HTTP Control API with SSE events
WebSocket server for remote devices
Agent delegation for complex tasks
SearXNG-backed web search

Roadmap

Calendar Sync (Planned)

Integration with calendar services for proactive scheduling assistance

Mobile Companion App (Planned)

iOS/Android app for remote interaction with your Voicebot instance

Multi-Platform Support (Planned)

Linux and Windows support beyond macOS