Choose Your Path
Recommended
Quick Install
For end users who want to get started immediately. Automatically downloads required models.
curl -fsSL https://github.com/madcato/voicebot/releases/latest/download/install.sh | sh
Developer Build
For developers who want to build from source and customize configuration.
git clone https://github.com/madcato/voicebot.git
cd voicebot
cp .env.example .env
# Edit .env with your settings
cargo build --release
System Requirements
Operating System
macOS 12.0+ (Big Sur or later)
Apple Silicon (M-series) recommended for optimal performance
Rust Toolchain
Stable Rust required
rustup install stable
Optional Dependencies
- Kokoro TTS:
brew install espeak-ng - MCP Servers:
brew install node
Step-by-Step Developer Setup
1
Clone the Repository
git clone https://github.com/madcato/voicebot.git
cd voicebot
2
Configure Environment
Copy the example configuration and customize:
cp .env.example .env
nano .env
Minimum required settings:
| Variable | Default | Description |
|---|---|---|
WHISPER_MODEL |
- | Path to Whisper .bin model |
LLM_URL |
http://127.0.0.1:8000 |
LLM server URL |
LLM_MODEL |
local-model |
Model name/path |
Example .env configuration:
WHISPER_MODEL=./models/ggml-small.bin
WHISPER_COREML=0
LLM_URL=http://127.0.0.1:8000
LLM_MODEL=mlx-community/Qwen3-8B-4bit
TTS_PROVIDER=avspeech
AVSPEECH_VOICE="Jorge (Enhanced)"
AVSPEECH_RATE=0.55
VOICEBOT_LANGUAGE=es
3
Start the LLM Server
Voicebot requires an external LLM server. We recommend mlx-lm for Apple Silicon.
Option A: Using the helper script
./scripts/start-mlx-lm.sh mlx-community/Qwen3-8B-4bit
Option B: Manual start
mlx_lm.server \
--model mlx-community/Qwen3-8B-4bit \
--host 127.0.0.1 --port 8000 \
--prompt-cache-size 1 \
--chat-template-args '{"enable_thinking": false}'
Option C: oMLX (alternative)
./scripts/start-omlx.sh ~/models
# Set in .env: LLM_URL=http://127.0.0.1:8001
4
Build & Run
Standard build (AVSpeech TTS - macOS default)
cargo build --release
cargo run --release
With Kokoro TTS (high-quality, ONNX-based)
cargo build --features kokoro --release
TTS_PROVIDER=kokoro cargo run --features kokoro --release
With Terminal UI
cargo build --features tui --release
cargo run --features tui --release
With HTTP Control API + SSE
cargo build --features control --release
CONTROL_PORT=9001 cargo run --features control --release
Model Setup
The installer automatically downloads required models. For manual setup:
| Model | Purpose | Source |
|---|---|---|
| Whisper STT | Speech-to-text | HuggingFace (ggml-small.bin) |
| Silero VAD | Voice activity detection | sherpa-onnx (ggml-silero-vad.bin) |
| Kokoro TTS | Text-to-speech (Linux) | Kokoro GitHub release |
Manual Model Downloads
# Whisper STT model
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin -O ./models/ggml-small.bin
# Silero VAD model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx -O ./models/ggml-silero-vad.bin
# Optional: Kokoro TTS models
wget https://github.com/leloykun/kokoro/releases/download/v1.0/kokoro-v1.0.onnx -O ./models/kokoro-v1.0.onnx
wget https://github.com/leloykun/kokoro/releases/download/v1.0/voices-v1.0.bin -O ./models/voices-v1.0.bin
Troubleshooting
"No audio device found"
# List available devices
cargo run -- --list-devices
# Then set in .env:
AUDIO_INPUT_DEVICE="Microphone"
AUDIO_OUTPUT_DEVICE="Speaker"
# For multiple matches, use index suffix:
AUDIO_INPUT_DEVICE="Poly Sync 20-M#0"
TTS not working
- AVSpeech: Check voices with
say -v ? - Kokoro: Ensure models exist and
espeak-ngis installed - Check feature flags:
--features avspeechor--features kokoro
High latency
- Reduce
VAD_SILENCE_MSto 150-200ms - Use CoreML STT:
WHISPER_COREML=1 - Verify LLM server has Metal acceleration
- Check performance logs:
RUST_LOG=performance=debug cargo run