Reachy Mini goes fully local

2026-05-27



This guide explains how to run Reachy Mini with a fully local speech-to-speech pipeline, eliminating the need to send audio to cloud servers.

• Uses a cascaded VAD → STT → LLM → TTS architecture, with recommended components: Silero VAD, Parakeet-TDT STT, and Qwen3-TTS
• Runs llama.cpp locally to serve models such as Gemma-4, with optimized settings including flash attention and parallel slots for concurrent requests
• Supports multiple LLM backends through the Responses API protocol: vLLM, Hugging Face Inference Endpoints, or OpenAI
• Provides on-device processing options, including MLX on Apple Silicon, for low-latency inference
• Each pipeline stage is swappable, so components can be chosen for latency, quality, language support, or other requirements
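
The cascaded design summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `CascadedPipeline` class, the four callable type aliases, and the `stub_*` stages are all hypothetical names introduced here, and the stubs stand in for real components such as Silero VAD, Parakeet-TDT, a llama.cpp-served model, or Qwen3-TTS.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Each stage is a plain callable, so any implementation with the same
# signature can be swapped in. These interfaces are assumptions made
# for illustration, not the project's real API.
VAD = Callable[[bytes], bool]   # audio chunk -> does it contain speech?
STT = Callable[[bytes], str]    # audio -> transcript
LLM = Callable[[str], str]      # transcript -> reply text
TTS = Callable[[str], bytes]    # reply text -> synthesized audio


@dataclass
class CascadedPipeline:
    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def process(self, audio: bytes) -> Optional[bytes]:
        """Run one utterance through VAD -> STT -> LLM -> TTS."""
        if not self.vad(audio):
            return None  # no speech detected, so skip the later stages
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any models installed.
def stub_vad(audio: bytes) -> bool:
    return any(audio)  # treat empty or all-zero input as silence


def stub_stt(audio: bytes) -> str:
    return audio.decode("utf-8", errors="ignore")


def stub_llm(text: str) -> str:
    return f"You said: {text}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedPipeline(stub_vad, stub_stt, stub_llm, stub_tts)
```

Because the stages only share a call signature, replacing one of them (for example, passing a function that queries a local LLM server instead of `stub_llm`) leaves the rest of the pipeline untouched, which is the kind of per-stage swapping the summary describes.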
