What Makes It Different?
A chatbot answers questions. A butler anticipates needs.
Always Listening
Automatic speech detection with Silero VAD. No push-to-talk button needed. Just speak naturally.
Real-Time Responses
Under 3 seconds from speech to response. The LLM streams its output and reuses its KV cache to keep response time low.
Barge-In Support
Interrupt the assistant mid-sentence. The pipeline cancels active TTS and starts fresh with your new input.
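A minimal sketch of how barge-in cancellation could work, assuming a shared flag that the VAD side sets and the playback loop checks. The names `CancelFlag` and `speak` are hypothetical and are not taken from the Voicebot codebase; the `play` callback stands in for real audio output.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical handle shared between the speech detector and the TTS loop.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    /// Called when new user speech is detected.
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Plays sentences in order and stops as soon as the flag is set.
/// Returns the number of sentences that were actually played.
pub fn speak<F: FnMut(&str)>(sentences: &[&str], cancel: &CancelFlag, mut play: F) -> usize {
    let mut played = 0;
    for &sentence in sentences {
        if cancel.is_cancelled() {
            break; // barge-in: drop everything that is still queued
        }
        play(sentence);
        played += 1;
    }
    played
}
```

This sketch only checks the flag between sentences. Stopping in the middle of a sentence would also require halting the audio output stream itself.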
Context Awareness
Persistent memory, user profile extraction, and conversation summarization across sessions.
Visual Awareness
The EYES module periodically captures the screen and analyzes it with a secondary, vision-capable LLM.
Computer Control
Built-in tools: screenshots, clipboard, app launching, shell commands, web search, and MCP (Model Context Protocol) support.
The Voice Pipeline
Capture
CPAL audio capture with pre-roll buffer
VAD
Silero voice activity detection
STT
Whisper.cpp speech-to-text (Metal GPU)
LLM
Streaming inference via mlx-lm/oMLX
TTS
Sentence-by-sentence synthesis & playback
Every stage is connected by tokio channels, so sentence N+1 is generated while sentence N plays.
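The overlap between stages can be illustrated with a small two-stage pipeline. This is a sketch under stated assumptions, not the project's implementation: it swaps in standard-library threads and `std::sync::mpsc` for tokio channels to stay dependency-free, and `synthesize` and `run_pipeline` are hypothetical names.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stand-in for TTS synthesis: turns a sentence into fake audio samples.
fn synthesize(sentence: &str) -> Vec<i16> {
    sentence.bytes().map(|b| b as i16).collect()
}

/// Runs synthesis and playback as separate stages joined by a channel.
/// Returns the sentences in the order they were "played".
pub fn run_pipeline(sentences: Vec<String>) -> Vec<String> {
    // One-slot bounded channel: once a finished sentence is queued,
    // the synthesis stage blocks on `send` until playback catches up.
    let (tx, rx) = sync_channel::<(String, Vec<i16>)>(1);

    let producer = thread::spawn(move || {
        for sentence in sentences {
            let audio = synthesize(&sentence);
            if tx.send((sentence, audio)).is_err() {
                break; // playback side went away
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    let mut played = Vec::new();
    for (sentence, _audio) in rx {
        // A real pipeline would hand `_audio` to the output device here.
        played.push(sentence);
    }
    producer.join().expect("synthesis thread panicked");
    played
}
```

While the consumer loop handles one sentence, the producer thread is already synthesizing the next, and the bounded channel keeps it from running far ahead of playback.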
Quick Start
# Install with one command
curl -fsSL https://github.com/madcato/voicebot/releases/latest/download/install.sh | sh
# Or build from source
git clone https://github.com/madcato/voicebot.git
cd voicebot
cp .env.example .env
cargo build --release
cargo run --release
Feature Highlights
🎙️ Core Voice Pipeline
- Real-time voice capture with VAD and pre-roll buffer
- Whisper STT via whisper-cpp-plus (Metal GPU on macOS)
- Streaming LLM with sub-second latency
- Sentence-by-sentence TTS playback
- Barge-in: user speech cancels active pipeline instantly
- Persistent SQLite conversation history
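The pre-roll buffer mentioned in the first item keeps a short window of audio from just before speech is detected, so the first syllable is not clipped. A minimal sketch, assuming a fixed-size ring buffer of samples; the `PreRoll` type is hypothetical and independent of CPAL:

```rust
use std::collections::VecDeque;

/// Keeps only the most recent `capacity` samples.
pub struct PreRoll {
    capacity: usize,
    samples: VecDeque<f32>,
}

impl PreRoll {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Appends a captured frame, discarding the oldest samples when full.
    pub fn push(&mut self, frame: &[f32]) {
        if self.capacity == 0 {
            return; // nothing can be retained
        }
        for &sample in frame {
            if self.samples.len() == self.capacity {
                self.samples.pop_front();
            }
            self.samples.push_back(sample);
        }
    }

    /// Called when speech starts: returns the buffered audio, oldest
    /// sample first, and leaves the buffer empty.
    pub fn drain(&mut self) -> Vec<f32> {
        self.samples.drain(..).collect()
    }
}
```

On speech start, the drained samples would be prepended to the live audio before it is passed to speech-to-text.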
🧠 Advanced Intelligence
- Context consolidation with persistent memory
- User profile extraction from conversations
- Inference daemon for proactive suggestions
- Two conversation modes: Active and Ambient
- Multi-speaker registry with auto-enrollment
- Ambient context buffer for contextual responses
🔌 Integrations
- Tool calling: time, files, clipboard, apps, shell, web search
- MCP (Model Context Protocol) support
- HTTP Control API with SSE events
- WebSocket server for remote devices
- Agent delegation for complex tasks
- SearXNG-backed web search
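For the SSE events in the list above, the wire format is the standard Server-Sent Events framing: an `event:` line, one `data:` line per line of payload, and a blank line to end the frame. The helper below is a sketch of that framing only. The function name and the event names used with it are hypothetical, since the actual events emitted by the Control API are not listed here.

```rust
/// Formats one Server-Sent Events frame.
/// `event` must not contain newlines; multi-line `data` is split
/// into one `data:` line per payload line.
pub fn sse_frame(event: &str, data: &str) -> String {
    let mut out = format!("event: {event}\n");
    for line in data.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n'); // blank line terminates the frame
    out
}
```

A client using the browser `EventSource` API, or any SSE library, would receive each frame as one event with the given name and payload.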
Roadmap
Calendar Sync
Integration with calendar services for proactive scheduling assistance
Mobile Companion App
iOS/Android app for remote interaction with your Voicebot instance
Multi-Platform Support
Linux and Windows support in addition to macOS