Choose Your Path

Developer Build

For developers who want to build from source and customize configuration.

git clone https://github.com/madcato/voicebot.git
cd voicebot
cp .env.example .env
# Edit .env with your settings
cargo build --release

System Requirements

Operating System

macOS 12.0+ (Monterey or later)

Apple Silicon (M-series) recommended for optimal performance

Rust Toolchain

Stable Rust required

rustup install stable

Optional Dependencies

  • Kokoro TTS: brew install espeak-ng
  • MCP Servers: brew install node

Step-by-Step Developer Setup

1. Clone the Repository

git clone https://github.com/madcato/voicebot.git
cd voicebot

2. Configure Environment

Copy the example configuration and customize:

cp .env.example .env
nano .env

Minimum required settings:

Variable        Default                 Description
WHISPER_MODEL   -                       Path to Whisper .bin model
LLM_URL         http://127.0.0.1:8000   LLM server URL
LLM_MODEL       local-model             Model name/path

Example .env configuration:

WHISPER_MODEL=./models/ggml-small.bin
WHISPER_COREML=0
LLM_URL=http://127.0.0.1:8000
LLM_MODEL=mlx-community/Qwen3-8B-4bit
TTS_PROVIDER=avspeech
AVSPEECH_VOICE="Jorge (Enhanced)"
AVSPEECH_RATE=0.55
VOICEBOT_LANGUAGE=es
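
Before building, it can help to confirm that the file actually defines the settings from the table above. The following is a minimal sketch, not part of the Voicebot repository: check_env is a hypothetical helper, and it treats all three listed variables as mandatory even though two of them have defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Voicebot): report required keys
# that are missing from an env file. The key list mirrors the
# "Minimum required settings" table.

# check_env FILE -> prints "missing: KEY" per absent key, non-zero if any is absent
check_env() {
  local file="$1" key missing=0
  for key in WHISPER_MODEL LLM_URL LLM_MODEL; do
    # A key counts as set when a line starts with KEY= followed by at least one character
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is to gate the run on it: check_env .env && cargo run --release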

3. Start the LLM Server

Voicebot requires an external LLM server. We recommend mlx-lm for Apple Silicon.

Option A: Using the helper script

./scripts/start-mlx-lm.sh mlx-community/Qwen3-8B-4bit

Option B: Manual start

mlx_lm.server \
  --model mlx-community/Qwen3-8B-4bit \
  --host 127.0.0.1 --port 8000 \
  --prompt-cache-size 1 \
  --chat-template-args '{"enable_thinking": false}'

Option C: oMLX (alternative)

./scripts/start-omlx.sh ~/models
# Set in .env: LLM_URL=http://127.0.0.1:8001
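
Whichever option is used, a quick reachability check can save a confusing first run. The sketch below assumes the server exposes an OpenAI-compatible GET /v1/models endpoint, which this guide does not confirm for every option; models_url and llm_ready are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Voicebot).
# Assumption: the LLM server answers GET /v1/models (OpenAI-compatible API).

# models_url BASE -> prints the models endpoint for BASE, tolerating one trailing slash
models_url() {
  printf '%s/v1/models\n' "${1%/}"
}

# llm_ready BASE -> succeeds when the endpoint returns a 2xx response within 5 seconds
llm_ready() {
  curl --silent --fail --max-time 5 "$(models_url "$1")" > /dev/null
}
```

For example: llm_ready "${LLM_URL:-http://127.0.0.1:8000}" && echo "LLM server is up"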

4. Build & Run

Standard build (AVSpeech TTS - macOS default)

cargo build --release
cargo run --release

With Kokoro TTS (high-quality, ONNX-based)

cargo build --features kokoro --release
TTS_PROVIDER=kokoro cargo run --features kokoro --release

With Terminal UI

cargo build --features tui --release
cargo run --features tui --release

With HTTP Control API + SSE

cargo build --features control --release
CONTROL_PORT=9001 cargo run --features control --release
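
Cargo accepts several features in one comma-separated --features value, so the variants above can in principle be combined. Whether Voicebot's kokoro, tui, and control features build together is an assumption this guide does not confirm; cargo_args is a hypothetical helper that only assembles the arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Voicebot).
# Assumption: the listed features can be enabled in the same build.

# cargo_args FEATURE... -> prints cargo arguments for a release run with those features
cargo_args() {
  local IFS=,
  if [ "$#" -eq 0 ]; then
    echo "run --release"
  else
    # "$*" joins the arguments with the first character of IFS (a comma)
    echo "run --release --features $*"
  fi
}
```

For example: TTS_PROVIDER=kokoro cargo $(cargo_args kokoro tui)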

Model Setup

The installer automatically downloads required models. For manual setup:

Model         Purpose                     Source
Whisper STT   Speech-to-text              HuggingFace (ggml-small.bin)
Silero VAD    Voice activity detection    sherpa-onnx (ggml-silero-vad.bin)
Kokoro TTS    Text-to-speech (Linux)      Kokoro GitHub release

Manual Model Downloads

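# Create the models directory if it does not exist yet
mkdir -p ./models

# Whisper STT model (wget is not preinstalled on macOS; install it with: brew install wget)
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin -O ./models/ggml-small.bin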

# Silero VAD model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx -O ./models/ggml-silero-vad.bin

# Optional: Kokoro TTS models
wget https://github.com/leloykun/kokoro/releases/download/v1.0/kokoro-v1.0.onnx -O ./models/kokoro-v1.0.onnx
wget https://github.com/leloykun/kokoro/releases/download/v1.0/voices-v1.0.bin -O ./models/voices-v1.0.bin
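
An interrupted download can leave an empty file behind, so it may be worth checking the results before the first run. This is a minimal sketch using the file names from the commands above; missing_models is a hypothetical helper, and it only checks that each file exists and is non-empty, not that its contents are valid.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Voicebot).

# missing_models DIR FILE... -> prints each FILE that is absent or empty in DIR,
# and returns non-zero if at least one was printed
missing_models() {
  local dir="$1" name status=0
  shift
  for name in "$@"; do
    # -s is true only for files that exist and have a size greater than zero
    if [ ! -s "${dir}/${name}" ]; then
      echo "${name}"
      status=1
    fi
  done
  return "$status"
}
```

For example: missing_models ./models ggml-small.bin ggml-silero-vad.bin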

Troubleshooting

"No audio device found"

# List available devices
cargo run -- --list-devices

# Then set in .env:
AUDIO_INPUT_DEVICE="Microphone"
AUDIO_OUTPUT_DEVICE="Speaker"

# For multiple matches, use index suffix:
AUDIO_INPUT_DEVICE="Poly Sync 20-M#0"

TTS not working

  • AVSpeech: Check voices with say -v '?' (quote the ? so zsh does not treat it as a glob)
  • Kokoro: Ensure models exist and espeak-ng is installed
  • Check feature flags: --features avspeech or --features kokoro

High latency

  1. Reduce VAD_SILENCE_MS to 150-200 ms
  2. Use CoreML STT: WHISPER_COREML=1
  3. Verify LLM server has Metal acceleration
  4. Check performance logs: RUST_LOG=performance=debug cargo run
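
The first two items are plain .env changes, which can be scripted when trying several values. The sketch below is one way to do it; set_env_var is a hypothetical helper, and it moves the key to the end of the file instead of editing it in place.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Voicebot).

# set_env_var FILE KEY VALUE -> removes any existing KEY= line from FILE
# and appends KEY=VALUE at the end
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Copy every line except existing assignments of KEY; grep exits non-zero
  # when nothing is copied, which is fine here
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example: set_env_var .env VAD_SILENCE_MS 200, then set_env_var .env WHISPER_COREML 1, and rerun with RUST_LOG=performance=debug to compare timings.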

Next Steps

Once installed, learn about how Voicebot works or check out the contributing guidelines.