Claudia

Build your own always-on voice assistant in an afternoon — a Raspberry Pi Zero 2 WH with a USB microphone for conversation audio and the Hiwonder WonderEcho module as a hardware wake-word trigger, wired straight to the Claude API. Sits on your shelf, listens for "Claudia", and Claude answers out loud in seconds. No Alexa account, no surveillance, no subscription — just a Claude API key and hardware you own.

Why two audio devices? The WonderEcho is a command-word recognizer, not a microphone: its CI1302 chip recognizes the wake word on-device and reports a short event ID over I²C. It never streams raw audio, so Whisper can't transcribe through it. The WonderEcho handles the always-listening wake word; the USB mic (a standard ALSA device) records what you actually say.

WH, not W. The WonderEcho connects to four GPIO pins (SDA / SCL / 5V / GND), so the build needs the WH variant with pre-soldered headers. Buying the plain "W" means soldering 40 pins yourself before anything works.

Before you start, gather:
- A Windows / macOS / Linux computer to flash the microSD and SSH in.
- A way to plug a microSD into it. The SanDisk Ultra ships with a full-size SD adapter but no USB reader, so most modern ultrabooks and MacBooks need a USB microSD reader (~$8).
- A 2.4 GHz Wi-Fi network (the Pi Zero 2 WH has no 5 GHz radio).
- 4 female-to-female jumper wires to link the WonderEcho to the Pi. No soldering iron is needed, but the WonderEcho does not include any cable, so the shopping list below adds a cheap Dupont wire kit.
- A micro-USB OTG adapter (also in the shopping list). The USB microphone plugs into the Pi's data port through it, because the Pi Zero has no full-size USB-A port.

The smart-plug options below ship with US plugs; each vendor (Kasa, Shelly, Sonoff) also sells EU/UK/AU variants that speak the same local API, so pick your region at checkout.

Stock check. The Pi Zero 2 WH is supply-constrained; if all the US retailers on the cards below show out-of-stock, rpilocator.com tracks live availability across the official reseller network.

01. Configure #

Pick what you have or plan to buy and the guide below adapts. Choices save automatically.

Options: Battery / portability · Conversation microphone · Speech-to-text (ASR) · Text-to-speech (TTS) · 3D-printed case · Smart-home control

02. Shopping list #

Each card opens its Google Shopping search in a new tab so you can verify current prices. Cards that don't apply to your configuration are hidden, and the total below updates live.

core — Required for every build.

Raspberry Pi Zero 2 WH

~$20

SKU: SC0511 (RPi Foundation) · Adafruit 6008 · SparkFun DEV-26256
Wireless: Wi-Fi 802.11 b/g/n 2.4 GHz · Bluetooth 4.2 / BLE
CPU: Quad-core Arm Cortex-A53 @ 1 GHz, 64-bit ARMv8
GPIO: 40-pin pre-soldered header (the “H” suffix)
SoC: Raspberry Pi RP3A0 SiP (Broadcom BCM2710A1 die)
Power in: 5 V via micro-USB (the corner port labeled “PWR IN”)
RAM: 512 MB LPDDR2
Form: 65 × 30 mm


Must be the WH (with pre-soldered headers). The plain W has no GPIO pins and would need 40 pins soldered before the WonderEcho's four jumper wires can connect. Most retailers list WH as a dropdown variant of the W product page, so pick 'with headers' before adding to cart. Supply is intermittent; if all US retailers show out-of-stock, rpilocator.com tracks live availability.

microSD card, 32 GB Class 10 (SanDisk Ultra)

~$9

SKU: SDSQUA4-032G-AN6MA (SanDisk Ultra microSDHC 32 GB)
Used for: Raspberry Pi OS 64-bit (full image) — flashed in Part 04 via Raspberry Pi Imager
Capacity: 32 GB
In the box: microSD card + full-size SD adapter. No USB reader included.
Speed: Class 10, UHS-I U1, A1 app-rated · up to 120 MB/s read
Substitutes: Any reputable 32 GB Class 10 / A1 microSD works (Samsung EVO Plus, Kingston Canvas Select, etc.)


Ships with a full-size SD-card adapter only — no USB reader. If your laptop has no SD slot (most modern ultrabooks/MacBooks don't), grab a USB microSD reader too (~$8).

Official Raspberry Pi 12.5W micro-USB power supply (5V/2.5A)

~$9

Output: 5.1 V / 2.5 A DC (12.5 W nominal, 12.75 W actual)
Plugs into: Pi Zero 2 WH “PWR IN” port (corner micro-USB, not the middle “USB” port)
Connector: micro-USB-B (captive) — NOT USB-C
Regions: US / UK / EU / AU / IN plug variants — pick at retailer checkout
Cable: 1.5 m, 18 AWG


The Pi Zero 2 WH is micro-USB, NOT USB-C. A USB-C-only PSU will not fit. Plugs into the corner port labeled “PWR IN” on the silkscreen, not the middle “USB” port. Any clean 5 V / 2.5 A micro-USB supply works — the official one is recommended because it has a thick captive cable and won't brown out under load.

micro-USB OTG adapter (USB-A female)

~$3

SKU: Adafruit 1099 (cable) · Adafruit 2910 (flush dongle variant)
Plugs into: Pi Zero 2 WH middle 'USB' port — NOT the corner 'PWR IN' (power-only)
Connectors: micro-USB-B male (OTG, ID pin wired for host mode) → USB-A female
Used for: Hosting the USB conversation mic (basic dongle or reSpeaker array)
Length: ~13 cm (5") tip-to-tip; passive cable, no electronics


Required — the Pi Zero 2 WH has no full-size USB-A port, so the USB mic connects through this. Plug it into the MIDDLE port labeled 'USB' (the data/OTG port), never the corner 'PWR IN' port, which carries power only. Any micro-B-male-to-A-female OTG cable or dongle works; Adafruit 1099 is the reference part.

Hiwonder WonderEcho voice module (I2C wake-word frontend)

~$24

SKU: Hiwonder 21090150 · Amazon ASIN B0F7RR983M
On-board: Mic + speaker (used only by its on-chip recognizer — not visible to ALSA) + red/blue status LEDs
In the box: 1× WonderEcho module only — no jumper wires (source them separately)
Memory: 2 MB Flash, 640 KB SRAM
Voice chip: CI1302 (on-device CNN wake-word + recognition firmware)
Range: ~5 m quiet / ~1 m noisy · up to 255 phrases
Interface: I²C bus 1, default address 0x52 — verify with “i2cdetect -y 1”
Size: 48 × 24 × 10.5 mm, 10 g
Connector: 4-pin header (no cable included): SDA → BCM 2 (pin 3), SCL → BCM 3 (pin 5), 5V → pin 2, GND → pin 6
Languages: English + Chinese


The hardware wake-word trigger, NOT the conversation microphone. Its CI1302 chip recognizes 'Claudia' on-device and reports an event ID over I²C; it never streams raw audio, so the build also needs one of the USB mics below. Does not include any wiring: connect it to the Pi's GPIO header with 4 female-to-female jumper wires (see the ELEGOO Dupont kit below).

ELEGOO 120pcs Dupont Jumper Wire Kit (M-F / M-M / F-F)

~$7

Pieces: 120 (3× 40-pin ribbons — peel apart as needed)
Pitch: 2.54 mm (0.1") — matches the Pi GPIO header
Types: 40× male-to-female · 40× male-to-male · 40× female-to-female
Rating: Light-gauge signal wire: < 1 A, < 50 V DC
Length: ~20 cm (200 mm)
Used for: Female-to-female jumpers for the WonderEcho ↔ Pi GPIO link (+ spares)


Required. The WonderEcho does not include any wiring, so you need these to connect the voice module to the Pi's GPIO header. This 120-pc assortment gives you the female-to-female wires for the 4-pin link (SDA → pin 3, SCL → pin 5, 5V → pin 2, GND → pin 6) plus jumpers for any other GPIO wiring. 2.54 mm pitch matches the Pi Zero 2 WH header exactly.

mic — The conversation microphone — required, pick one in the configurator. The WonderEcho only handles the wake word; it never streams audio to the Pi.

SunFounder USB 2.0 Mini Microphone

~$9

SKU: SunFounder CN0029 · Amazon ASIN B01KLRBHGM
Pattern: Omnidirectional
Interface: USB 2.0 — USB Audio Class 1.0, driverless on Linux/ALSA
Form: Thumb-size dongle — plugs straight into a USB-A port
Chip: C-Media CM108 (16-bit, 44.1/48 kHz capture)
Range: Desk-distance pickup — fine for ~1-2 m; pick the array for far-field


Required (or pick the reSpeaker array instead). This is the conversation mic: the WonderEcho only recognizes the wake word on-chip and never streams audio, so Whisper needs a real ALSA microphone. This thumb-size dongle is plug-and-play USB Audio Class: it shows up in 'arecord -l' the moment it's plugged in, no driver. Plugs into the Pi's middle 'USB' port via the OTG adapter listed above.

Seeed reSpeaker XVF3800 USB 4-Mic Array (far-field upgrade)

~$51

SKU: Seeed 101991441 · Amazon ASIN B0FKGFXQQ5
Capture: 16 kHz max sample rate, 32-bit — matched to speech ASR, not music
Mics: 4× PDM MEMS, circular array — 360° far-field up to ~5 m
Audio out: 3.5 mm headphone jack + JST speaker connector (up to 5 W)
DSP: XMOS XVF3800: AEC · beamforming · DNN noise suppression · dereverb · DoA · AGC
Size: 108 × 108 × 18 mm
Interface: USB-C (cable included) — USB Audio Class 2.0, driverless on Linux/ALSA


The higher-quality alternative to the SunFounder mini mic: four MEMS mics with on-chip XMOS DSP (echo cancellation, beamforming, neural noise suppression, dereverberation) for 360° pickup up to ~5 m — Alexa-grade far-field capture. Standard USB Audio Class 2.0, so it's driverless ALSA just like the basic mic. Bonus: a 3.5 mm jack + JST connector give it a speaker output too.

portable — Optional battery for a roaming build.

PiSugar 3 1200 mAh battery

~$40

Capacity: 1200 mAh Li-Po
Extras: On-board RTC, USB-C charge port, power button, IRQ button
Fits: Raspberry Pi Zero / Zero 2 W (NOT the larger “Plus” for Pi 4)
Software: PiSugar Power Manager (Python service) — optional for OS-level battery awareness
Mount: Magnetic pogo pins — no soldering, snaps to the underside of the Pi


Pick the “PiSugar 3” for the Pi Zero form factor — NOT the larger “PiSugar 3 Plus” (which is sized for the full Pi 4). The 1200 mAh variant is the standard one.

smarthome — Optional locally-controllable smart plugs. Lets Claudia run commands like 'Claudia turn on the living room' without any cloud round-trip beyond Claude itself.

TP-Link Kasa HS103 / KP125M smart plug (local control via python-kasa)

~$10

Model: HS103 (single) / HS103P2 (2-pack). KP125M is the newer Matter-capable equivalent and also works with python-kasa.
Form: “Mini” single outlet, 66.5 × 40 × 38 mm
Max load: 10 A resistive @ 120 VAC (1200 W typical, general-use rating)
Local API: TP-Link Smart Home LAN protocol — “kasa --host <ip> on/off” via python-kasa, no Kasa cloud account required
Wi-Fi: 2.4 GHz only · 802.11 b/g/n (Pi Zero 2 W also 2.4 GHz only)


Works locally over the LAN with the python-kasa library — no cloud round-trip. Tell Claudia 'turn on the living room' and a one-line shell call flips it.

Shelly Plug US (local HTTP / MQTT, no cloud required)

~$20

Generation: Shelly Plug US Gen4 (current model) — exposes the new /rpc/ JSON-RPC and the legacy /relay/ endpoints
Local API: HTTP REST + MQTT + WebSocket. Guide uses GET /relay/0?turn=on|off (works on every Shelly generation).
Max load: 12 A continuous / 15 A peak (1440 W @ 120 VAC)
Extras: On-board energy metering · no vendor account needed for local control
Wi-Fi: 2.4 GHz · 802.11 b/g/n


REST endpoint at http://<ip>/relay/0?turn=on — Claudia can flip it with a single curl. No vendor account needed. Current model is Shelly Plug US Gen4; the legacy /relay/ endpoint still works alongside the new /rpc/ JSON-RPC.

Sonoff S31 (re-flashable with Tasmota for local MQTT control)

~$10

Model: S31 (with power monitoring) — NOT S31 Lite
Stock firmware: eWeLink (cloud-dependent — does NOT meet the “local only” goal as-shipped)
MCU: Espressif ESP8266
Re-flashable: Yes — Tasmota via serial header inside the case (no soldering needed for the S31)
Energy chip: CSE7766 (UART-attached power meter on GPIO01/03)
Reference: Tasmota template + flash steps: templates.blakadder.com/sonoff_S31.html
Max load: 15 A / 1800 W @ 120 VAC


Out-of-the-box uses the eWeLink cloud; flash with Tasmota to get fully local MQTT/HTTP control. Pick the S31 (with energy monitoring) — NOT the S31 Lite, which omits the power meter.

Your build estimate: the total updates live with your configuration (prices estimated 2026-06-09).

03. Assemble #

Total time: ~3 minutes. No soldering.

Do not insert the microSD yet. Flash it first in section 04.
Connect the WonderEcho to the Pi's I²C header pins with 4 female-to-female Dupont jumper wires (from the kit in the shopping list — the WonderEcho does not include any cable): SDA → BCM 2 (pin 3), SCL → BCM 3 (pin 5), 5V → pin 2, GND → pin 6.
Plug the micro-USB OTG adapter into the Pi's middle port labeled USB (the data port — not the corner PWR IN port), then plug the USB microphone into the adapter.

The SunFounder mini mic is a thumb-size dongle — it just hangs off the OTG adapter. Point its grille roughly toward where you'll be speaking.

Set the reSpeaker XVF3800 array flat, mics facing the room — its beamforming works best with an unobstructed 360° view. It connects to the OTG adapter with its own USB cable.

Keep the WonderEcho's speaker-side face unobstructed: its on-board mic, which listens for the wake word, needs a clear path to the room.

Snap the PiSugar 3 battery onto the underside of the Pi using its magnetic pogo pins. No soldering — the spring-loaded pogo pins align themselves.

Final stack (battery build): WonderEcho (via I²C jumper wires) ←→ Pi Zero 2 WH ←→ USB mic (via OTG) → PiSugar 3

Final layout (wall-powered build): WonderEcho (via I²C jumper wires) ←→ Pi Zero 2 WH ←→ USB mic (via OTG), wall-powered

✅ Checkpoint: The four I²C wires are seated firmly, nothing wobbles, the USB mic is in the USB (middle) port via the OTG adapter, and the WonderEcho's speaker grille is unobstructed.
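If it helps to double-check the wiring before applying power, the four connections from the steps above can be written down as data. This is only a small sketch: the `WIRING` table and the `describe_wiring` helper are illustrative names, not part of the chatbot repo.

```python
# WonderEcho -> Pi Zero 2 WH wiring, as listed in the assembly steps.
# Each entry: signal name -> (physical header pin, BCM GPIO number or None).
WIRING = {
    "SDA": (3, 2),     # I2C data  -> BCM 2
    "SCL": (5, 3),     # I2C clock -> BCM 3
    "5V":  (2, None),  # power rail, not a GPIO
    "GND": (6, None),  # ground, not a GPIO
}

def describe_wiring(wiring=WIRING):
    """Return one readable line per wire; fail if two signals share a pin."""
    pins = [pin for pin, _ in wiring.values()]
    if len(pins) != len(set(pins)):
        raise ValueError("two signals share the same header pin")
    lines = []
    for signal, (pin, bcm) in wiring.items():
        target = f"pin {pin}" + (f" (BCM {bcm})" if bcm is not None else "")
        lines.append(f"{signal} -> {target}")
    return lines

print("\n".join(describe_wiring()))
```

Reading the printed lines against the physical header is a cheap way to catch a swapped SDA/SCL pair before the first boot.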

04. Flash microSD #

4.1 Install Raspberry Pi Imager #

If your laptop has no SD-card slot — common on recent ultrabooks and every modern MacBook — plug in a USB microSD reader now. The card itself ships with a full-size SD adapter, but that only helps you if the host has a full-size SD slot.

Download from raspberrypi.com/software (Windows, macOS, Linux).

4.2 Flash #

Open Raspberry Pi Imager.
Choose Device → Raspberry Pi Zero 2 W (Imager doesn't distinguish W from WH — the OS image is the same).
Choose OS → Raspberry Pi OS (other) → Raspberry Pi OS (64-bit) (the full version, not Lite).
- The chatbot repo's install script expects packages from the full image. Lite will work but you'll need extra apt installs and may hit surprises.
Choose Storage → your microSD card.
Click the gear icon (⚙) for Edit Settings and configure:
- Hostname: claudia
- Username: anything other than pi. Raspberry Pi OS has shipped without a default pi user since its April 2022 release, and current Imager builds may warn when you try to set it. Use claudia, your first name, or any other identifier you'll remember.
- Password: something secure
- Enable SSH: ✅ password auth
- Wireless LAN: SSID + password for your home Wi-Fi
- Locale: your time zone (e.g. America/Chicago) and your keyboard layout (e.g. us)
Save, then Write. Takes 2–5 minutes.

4.3 First boot #

Insert the microSD into the Pi.
Plug the official power supply into the PWR IN micro-USB port (the one nearest the corner, labeled PWR IN on the silkscreen). Not the middle port labeled USB.
Wait 60–90 seconds.
From your PC:
```
ssh <your-username>@claudia.local
```
If claudia.local doesn't resolve, find the Pi's IP in your router's admin page and use ssh <your-username>@192.168.x.x.

✅ Checkpoint: You see the <your-username>@claudia:~ $ prompt. Run cat /etc/os-release and confirm it says Debian/Raspberry Pi OS. Run free -h — you should see ~430 MB of Mem: (the Pi Zero 2 WH has 512 MB total).

05. System setup #

Run these from the SSH session. One at a time. Wait for each to finish.

5.1 Update #

sudo apt update && sudo apt full-upgrade -y

This takes 5–15 minutes on a Pi Zero 2 WH. Be patient.

5.2 Free up RAM (Pi Zero only has 512 MB) #

The Pi Zero 2 WH is RAM-constrained. Disable services you don't need:

# Disable Bluetooth (not used by this build)
sudo systemctl disable hciuart bluetooth

# Disable triggerhappy (gamepad daemon, not needed)
sudo systemctl disable triggerhappy

5.3 Install build dependencies #

sudo apt install -y git curl build-essential python3-pip python3-venv \
  portaudio19-dev libsndfile1 ffmpeg alsa-utils libatlas-base-dev

5.4 Enable I²C and detect the WonderEcho #

The WonderEcho is an I²C device. Turn the bus on, install i2c-tools, then verify the module answers on the bus.

# Enable I²C non-interactively
sudo raspi-config nonint do_i2c 0

# Tools + Python bindings (python3-smbus2 provides the smbus2 module used in Part 8.3)
sudo apt install -y i2c-tools python3-smbus python3-smbus2

sudo reboot

After it reboots, SSH back in and run:

i2cdetect -y 1

You should see a device address show up (commonly 0x52 for the WonderEcho — verify against the sticker on the module).

✅ Checkpoint: i2cdetect -y 1 lists at least one device address — the WonderEcho is talking to the Pi.
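If you would rather check the bus from a script than read the grid by eye, the table that i2cdetect prints can be parsed in a few lines. The sketch below assumes the usual i2cdetect layout (a header row, then rows prefixed with "00:", "10:" and so on); `parse_i2cdetect` and `SAMPLE` are illustrative names, and the sample table is made up for the example.

```python
import re

def parse_i2cdetect(output):
    """Return the sorted addresses that answered in `i2cdetect -y` output."""
    found = []
    for line in output.splitlines():
        # Data rows look like "50: -- -- 52 -- ..."; the header row has no "NN:" prefix.
        match = re.match(r"^([0-9a-f]{2}):(.*)$", line.strip())
        if not match:
            continue
        for token in match.group(2).split():
            # "--" means no reply; "UU" means a kernel driver owns the address.
            if re.fullmatch(r"[0-9a-f]{2}", token):
                found.append(int(token, 16))
    return sorted(found)

# Illustrative output only; the table on your Pi will differ.
SAMPLE = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- 52 -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
"""
print([hex(addr) for addr in parse_i2cdetect(SAMPLE)])
```

On the Pi, feed it the real output, for example `subprocess.run(["i2cdetect", "-y", "1"], capture_output=True, text=True).stdout`, and compare the result with the address printed on your module.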

5.5 Verify the USB microphone #

The USB mic is a standard USB Audio Class device — no driver needed. Confirm ALSA sees it:

arecord -l

You should see the mic listed as a capture card (typically card 1 — card 0 is the Pi's HDMI output, which has no capture side). Then make it the default capture device so the chatbot's recorder finds it without extra flags:

nano ~/.asoundrc

Paste (this file also ships in the repo as config/asoundrc.usbmic):

pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:0,0"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:1,0"
    }
}

ctl.!default {
    type hw
    card 1
}

If arecord -l showed your mic on a different card number, change the hw:1,0 (and card 1) to match.
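To avoid editing the card number by hand, the same file can be generated from the `arecord -l` listing. This is a sketch under assumptions: `find_capture_cards` and `render_asoundrc` are hypothetical helpers, the regular expression targets the usual "card N: Name [Description], device M:" line format, and the sample listing is invented for the example.

```python
import re

ASOUNDRC_TEMPLATE = """\
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "%(playback)s"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:%(card)d,%(device)d"
    }
}

ctl.!default {
    type hw
    card %(card)d
}
"""

def find_capture_cards(arecord_output):
    """Parse `arecord -l` text into a list of (card, device, description)."""
    pattern = re.compile(r"^card (\d+): \S+ \[(.*?)\], device (\d+):")
    cards = []
    for line in arecord_output.splitlines():
        m = pattern.match(line)
        if m:
            cards.append((int(m.group(1)), int(m.group(3)), m.group(2)))
    return cards

def render_asoundrc(card, device=0, playback="hw:0,0"):
    """Fill the asym template shown above for the given capture card."""
    return ASOUNDRC_TEMPLATE % {"card": card, "device": device, "playback": playback}

# Illustrative listing only; run `arecord -l` on the Pi for the real one.
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 2: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
"""
card, device, name = find_capture_cards(SAMPLE)[0]
print(f"Using {name} on hw:{card},{device}")
print(render_asoundrc(card, device))
```

Redirect the rendered text into ~/.asoundrc once you are happy with it; playback stays on hw:0,0 unless you pass a different `playback` value.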

Array bonus: the reSpeaker XVF3800 also has a playback side — a 3.5 mm jack plus a JST connector driving up to 5 W speakers (Seeed wiki). Point playback.pcm at the reSpeaker's card too and one device covers both mic and speaker.

Now record a 3-second test clip:

arecord -d 3 -f S16_LE -r 16000 /tmp/mictest.wav

✅ Checkpoint: arecord -l lists the USB mic, and the test recording completes without audio open error. (Playback of the clip is covered in Part 10 — the Pi's only speaker at this point may be an HDMI display.)
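Since you cannot listen to the clip yet, one way to confirm the mic actually captured sound is to inspect the WAV file with Python's standard library. A minimal sketch, assuming the 16-bit format from the arecord command above; `inspect_wav` is an illustrative helper, and what counts as "too quiet" is left for you to judge.

```python
import array
import sys
import wave

def inspect_wav(path):
    """Return (sample_rate, channels, duration_s, peak) for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples (arecord -f S16_LE)")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
        samples = array.array("h", wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()          # WAV data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return rate, channels, frames / rate, peak

# On the Pi, after the arecord test above:
#   print(inspect_wav("/tmp/mictest.wav"))
```

A peak close to 0 suggests the capture device is muted or the wrong card was selected; a clip recorded while you speak should peak well above that (full scale is 32767).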

06. Install chatbot #

This is the PiSugar whisplay-ai-chatbot repo — we use it as the LLM/ASR/TTS plumbing even though we're not using the Whisplay HAT itself. Wake-word detection goes through the WonderEcho; conversation audio is recorded from the USB mic, which the chatbot picks up as the default ALSA capture device (set in Part 5.5).

cd ~
git clone https://github.com/PiSugar/whisplay-ai-chatbot.git
cd whisplay-ai-chatbot
bash install_dependencies.sh
source ~/.bashrc

The dependency install pulls Node.js, Python packages, and audio libraries. This takes 15–25 minutes on a Pi Zero 2 WH. Let it finish.

The source ~/.bashrc line is important — the installer sets PATH entries you need in your current shell session.

✅ Checkpoint: install_dependencies.sh finishes without errors. Test that Node is on PATH:

node --version

You should see v22.x or newer (upstream's installer pulls in the current Node LTS).

07. API key #

Go to console.anthropic.com and sign in (or create an account).
Add a payment method and put a small amount of credit on the account (e.g., $5 — that lasts a long time on Haiku).
Navigate to API Keys → Create Key.
Name it claudia. Copy the key now — you can't see it again later.
Treat the key like a password.

Approximate cost: Casual personal use on claude-haiku-4-5-20251001 typically runs a few dollars per month at most. Check current pricing at anthropic.com/pricing.

Which model to pick #

Model ID	Speed	Quality	When to use
`claude-haiku-4-5-20251001`	Fastest	Good	Default for this device. Latency matters more than essay-grade prose for a voice assistant.
`claude-sonnet-4-6`	Medium	Excellent	If you want richer answers and don't mind a slightly slower response.
`claude-opus-4-7`	Slowest	Best	Overkill for spoken Q&A. Use for hard reasoning tasks only.

Model IDs change over time. The current list lives at docs.claude.com.

08. Configure chatbot #

8.1 Create your `.env` #

cd ~/whisplay-ai-chatbot
cp .env.template .env
nano .env

The template ships with many fields for different ASR/LLM/TTS providers. For a Claude-based build, you need the LLM section set to Anthropic. Find and set:

# === LLM (the AI brain) ===
LLM_SERVER=anthropic
ANTHROPIC_API_KEY=sk-ant-YOUR-KEY-HERE
ANTHROPIC_MODEL=claude-haiku-4-5-20251001

# === System prompt — shapes the assistant's voice ===
SYSTEM_PROMPT=You are a concise, friendly voice assistant. Answer in plain spoken English — no markdown, no bullet lists, no headings. Keep responses to 1–3 sentences unless the user explicitly asks for more.

The wake-word listener does not run on the Pi — it's handled in hardware by the WonderEcho (see section 08.3 below). The Pi only polls the WonderEcho's wake-event register over I²C, so no WAKE_WORD_* env keys are needed. When a wake event fires, the chatbot records your question from the USB mic (the default ALSA capture device you set in Part 5.5) — the WonderEcho's own mic is only used by its on-chip wake-word detector and is never seen by the Pi.

Env-key naming: upstream uses LLM_SERVER, ASR_SERVER, TTS_SERVER (not *_PROVIDER). The plugin registry switches on the lowercase value — see src/cloud-api/server.ts in the upstream repo.
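A quick sanity check of the LLM section can catch a missing key before the first launch. The sketch below is not the chatbot's own loader: `parse_env` and `check_llm_config` are hypothetical helpers that read simple KEY=VALUE lines and only look at the three keys this guide sets.

```python
REQUIRED = ("LLM_SERVER", "ANTHROPIC_API_KEY", "ANTHROPIC_MODEL")

def parse_env(text):
    """Parse KEY=VALUE lines; skip blanks and '#' comments, keep spaces in values."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def check_llm_config(values):
    """Return a list of problems; an empty list means the section looks complete."""
    problems = [f"{key} is missing or empty" for key in REQUIRED if not values.get(key)]
    if values.get("LLM_SERVER", "").lower() not in ("", "anthropic"):
        problems.append("LLM_SERVER should be 'anthropic' for this build")
    if "YOUR-KEY-HERE" in values.get("ANTHROPIC_API_KEY", ""):
        problems.append("ANTHROPIC_API_KEY still holds the placeholder")
    return problems

# On the Pi:
#   import os
#   with open(os.path.expanduser("~/whisplay-ai-chatbot/.env")) as f:
#       print(check_llm_config(parse_env(f.read())) or "LLM section looks fine")
```

The parser splits on the first "=" only, so a value with spaces such as SYSTEM_PROMPT comes through intact.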

ASR (speech-to-text): Whisper, local. Already wired up by the template defaults. Slowest option on a Pi Zero 2 WH (~3–6 s per utterance) but no API key required and works offline.

ASR (speech-to-text): OpenAI Whisper API. Add to your .env:

ASR_SERVER=openai
OPENAI_API_KEY=sk-REPLACE-ME

Round-trip latency drops to ~0.5–1 s. Costs a few cents per hour of speech.

ASR (speech-to-text): Google Cloud STT. Add to your .env:

ASR_SERVER=google
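GOOGLE_APPLICATION_CREDENTIALS=/home/<your-username>/google-stt-key.json

Drop the service-account JSON from Google Cloud Console at the path above, substituting the username you chose in Part 4.2 (this guide does not use the pi account). Generally the fastest cloud STT on US-region traffic.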

TTS (text-to-speech): Piper, local. Free, runs on the Pi. Voice quality is "robot but understandable" — fine for short replies. Add to your .env:

TTS_SERVER=piper
PIPER_BINARY_PATH=/usr/local/bin/piper
# Replace <your-username> with the account you created in Part 4.2
PIPER_MODEL_PATH=/home/<your-username>/piper/voices/en_US-amy-low.onnx

TTS (text-to-speech): OpenAI gpt-4o-mini-tts (recommended). Near-state-of-the-art quality, supported by upstream out-of-the-box. Add to your .env:

TTS_SERVER=openai
OPENAI_API_KEY=sk-REPLACE-ME
OPENAI_VOICE_MODEL=gpt-4o-mini-tts
OPENAI_VOICE_TYPE=nova

The new gpt-4o-mini-tts model and the 4o-series voices (alloy, nova, onyx, marin, cedar, plus older echo/fable/shimmer/ash/ballad/coral/sage/verse) are dramatically more natural than the older tts-1. Costs roughly $0.015 per minute of speech.

TTS (text-to-speech): ElevenLabs (best quality, requires a one-time patch).

ElevenLabs has the most natural voices on the market right now, but the upstream chatbot doesn't ship an ElevenLabs handler. You add one yourself — about 40 lines of TypeScript and a single registration entry.

Step 1 — handler. Create ~/whisplay-ai-chatbot/src/cloud-api/elevenlabs/elevenlabs-tts.ts with:

import mp3Duration from "mp3-duration";
import { TTSResult } from "../../type";

// The chatbot already loads .env at startup, so process.env is populated
// by the time this plugin's activate() runs — no need to call dotenv here.
const apiKey     = process.env.ELEVENLABS_API_KEY     || "";
const voiceId    = process.env.ELEVENLABS_VOICE_ID    || "EXAVITQu4vr4xnSDxMaL"; // "Bella"
const modelId    = process.env.ELEVENLABS_MODEL_ID    || "eleven_turbo_v2_5";    // low-latency
const stability  = parseFloat(process.env.ELEVENLABS_STABILITY  || "0.5");
const similarity = parseFloat(process.env.ELEVENLABS_SIMILARITY || "0.75");

const elevenLabsTTS = async (text: string): Promise<TTSResult> => {
  if (!apiKey) { console.error("ELEVENLABS_API_KEY is not set."); return { duration: 0 }; }
  const url = `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`;
  let res: Response;
  try {
    res = await fetch(url, {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",
      },
      body: JSON.stringify({
        text,
        model_id: modelId,
        voice_settings: { stability, similarity_boost: similarity },
      }),
    });
  } catch (e) {
    console.error("ElevenLabs TTS request failed:", e);
    return { duration: 0 };
  }
  if (!res.ok) {
    console.error("ElevenLabs TTS HTTP " + res.status + ": " + (await res.text().catch(() => "")));
    return { duration: 0 };
  }
  const buffer = Buffer.from(await res.arrayBuffer());
  const duration = await mp3Duration(buffer);
  // mp3-duration returns undefined if it can't parse the stream; coerce to
  // 0 so downstream code never sees NaN.
  return { buffer, duration: (duration ?? 0) * 1000 };
};

export default elevenLabsTTS;

Step 2 — register the plugin. Open ~/whisplay-ai-chatbot/src/plugin/builtin/tts.ts and add this block alongside the other pluginRegistry.register(...) calls:

pluginRegistry.register({
  name: "elevenlabs",
  displayName: "ElevenLabs TTS",
  version: "1.0.0",
  type: "tts",
  audioFormat: "mp3",
  description: "ElevenLabs text-to-speech (high-quality cloud voices)",
  activate: () => {
    const ttsProcessor = require("../../cloud-api/elevenlabs/elevenlabs-tts").default;
    return { ttsProcessor };
  },
} as TTSPlugin);

Step 3 — .env.

TTS_SERVER=elevenlabs
ELEVENLABS_API_KEY=sk_REPLACE_ME
ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL
ELEVENLABS_MODEL_ID=eleven_turbo_v2_5
ELEVENLABS_STABILITY=0.5
ELEVENLABS_SIMILARITY=0.75

Step 4 — rebuild + restart.

cd ~/whisplay-ai-chatbot
bash build.sh
# Only needed once the service exists (startup.sh registers it in Part 10)
sudo systemctl restart chatbot.service

Voice IDs: log into elevenlabs.io, open VoiceLab, and copy the ID of any voice you've cloned or one of their stock voices. eleven_turbo_v2_5 is recommended for the Pi Zero 2 WH — it has the lowest latency. Cost is roughly $0.18 per 1000 chars (~7-8 cents per minute of speech).
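To turn the per-character price into something you can budget, multiply it out per reply. The sketch below uses the roughly $0.18 per 1000 characters quoted above as its default; the reply text and the 20-replies-a-day usage pattern are made-up inputs, so substitute your own numbers and the current price.

```python
def tts_cost_usd(text, usd_per_1000_chars=0.18):
    """Estimate the synthesis cost of one reply at a per-1000-character rate."""
    return len(text) * usd_per_1000_chars / 1000

def monthly_cost_usd(replies_per_day, avg_chars_per_reply,
                     usd_per_1000_chars=0.18, days=30):
    """Scale the per-reply estimate to a month of use."""
    total_chars = replies_per_day * days * avg_chars_per_reply
    return total_chars * usd_per_1000_chars / 1000

reply = "It is sunny and seventy-two degrees right now."   # hypothetical reply
print(f"{len(reply)} chars -> ${tts_cost_usd(reply):.4f}")
print(f"20 replies/day at 200 chars -> ${monthly_cost_usd(20, 200):.2f} per month")
```

Keeping the system prompt's 1–3 sentence limit in place is the main lever here, since cost scales linearly with reply length.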

The .env.template evolves. If your file looks different from this guide, the live template at github.com/PiSugar/whisplay-ai-chatbot/blob/master/.env.template is the source of truth.

Save: Ctrl+X, Y, Enter.

8.2 Build the project #

bash build.sh

This compiles the TypeScript and prepares assets. ~5–10 minutes on a Pi Zero 2 WH.

✅ Checkpoint: build.sh exits cleanly with no errors.

8.3 Configure the WonderEcho wake word #

The WonderEcho module runs its own on-device wake-word detector — the Pi doesn't have to listen. You program the trigger phrase ("Claudia") once over I²C, then the module flags a wake event on the bus whenever it hears the word; the Pi polls that register and starts a recording session each time it fires.

Verify before running. The exact I²C register layout (0x10 as the "set-trigger" opcode below) depends on your WonderEcho firmware revision. Check the Hiwonder WonderEcho wiki for the register map matching your unit before running this — the snippet is the canonical pattern, not a guaranteed copy-paste for every shipping firmware.

# Reference snippet: writes the trigger word to the WonderEcho's "set-trigger"
# register. Confirm the register/opcode against the Hiwonder wiki for your
# firmware revision before relying on this in production.
cd ~/whisplay-ai-chatbot
python3 - <<'PY'
import smbus2 as smbus, time   # needs the smbus2 package (apt: python3-smbus2)
ADDR = 0x52                    # WonderEcho default; confirm with i2cdetect
WORD = b"claudia"
with smbus.SMBus(1) as bus:    # I²C bus 1 on the Pi Zero; closed when the block exits
    bus.write_i2c_block_data(ADDR, 0x10, list(WORD) + [0])   # 0x10 = set-trigger
    time.sleep(0.2)            # let the WonderEcho commit the trigger to its on-board flash before the bus closes
print("Wake word programmed:", WORD.decode())
PY

The chatbot service polls the WonderEcho's wake-event register over I²C and starts a recording session each time the word fires. No Python venv, no openWakeWord, no training.
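For intuition, the polling side can be sketched as a small loop. This is not the chatbot's actual implementation: `wait_for_wake` is a hypothetical helper, and the register number and event ID in the usage comment are placeholders that must come from the Hiwonder register map for your firmware.

```python
import time

def wait_for_wake(read_event, wake_id, poll_interval=0.05, timeout=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll read_event() until it returns wake_id.

    Returns True when the wake event is seen, False if `timeout` seconds pass.
    """
    deadline = None if timeout is None else clock() + timeout
    while deadline is None or clock() < deadline:
        try:
            if read_event() == wake_id:
                return True
        except OSError:
            pass  # transient I2C error while the module is busy; retry next tick
        sleep(poll_interval)
    return False

# On the Pi (values below are HYPOTHETICAL, check the register map first):
#   from smbus2 import SMBus
#   WAKE_REG, WAKE_ID = 0x64, 0x01
#   with SMBus(1) as bus:
#       if wait_for_wake(lambda: bus.read_byte_data(0x52, WAKE_REG), WAKE_ID, timeout=30):
#           print("wake event")
```

The reader is passed in as a function so the loop can be exercised without hardware, and a short poll interval keeps the added wake latency small compared with the rest of the pipeline.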

✅ Checkpoint: once the chatbot service is running (Part 10), speak "Claudia" near the module; journalctl -u chatbot.service -f should show a wake event within ~300 ms.

The exact register map can vary by firmware revision. If your unit reports a different I²C address (verify with i2cdetect -y 1) or uses a different "set-trigger" opcode, check the WonderEcho wiki for the map matching your firmware.

09. Healthcheck #

Before launching the full chatbot, run a 90-second healthcheck that verifies four layers: the WonderEcho is present on the I²C bus, the USB mic is visible to ALSA, the network can reach Anthropic, and your API key + chosen model actually return a response. (The full audio round trip — record, transcribe, speak — is exercised by the manual launch in Part 10.)

Create the script:

nano ~/healthcheck.sh

Paste:

#!/bin/bash
# claudia healthcheck — quick end-to-end smoke test
# Usage: bash ~/healthcheck.sh

set -u
ENV_FILE="$HOME/whisplay-ai-chatbot/.env"
PASS="\033[0;32m✓\033[0m"
FAIL="\033[0;31m✗\033[0m"
exit_code=0

step() { printf "\n%s\n" "── $1 ──"; }
ok()   { printf "  $PASS %s\n" "$1"; }
bad()  { printf "  $FAIL %s\n" "$1"; exit_code=1; }

step "1. WonderEcho module on I2C"
# The WonderEcho is the wake-word frontend and talks to the Pi over I2C bus 1.
# It is NOT an audio device — it never appears in ALSA.
if command -v i2cdetect >/dev/null 2>&1; then
    if i2cdetect -y 1 2>/dev/null | grep -qE ' 5[234] '; then
        ok "WonderEcho detected on I2C bus 1"
    else
        bad "WonderEcho NOT detected on I2C bus 1 (check 4-pin wiring + 'sudo raspi-config nonint do_i2c 0')"
    fi
else
    bad "i2c-tools not installed - run 'sudo apt install -y i2c-tools' (see Part 05.4)"
fi

step "2. USB microphone in ALSA"
# Conversation audio comes from the USB mic — a standard USB Audio Class
# device that must show up as an ALSA capture card (see Part 5.5).
if command -v arecord >/dev/null 2>&1; then
    if arecord -l 2>/dev/null | grep -q '^card '; then
        ok "ALSA capture device present: $(arecord -l 2>/dev/null | grep '^card ' | head -1)"
    else
        bad "no ALSA capture device — is the USB mic in the middle 'USB' port via the OTG adapter? (Part 03 / 5.5)"
    fi
else
    bad "alsa-utils not installed - run 'sudo apt install -y alsa-utils' (see Part 05.3)"
fi

step "3. Network reachability"
# Use HTTPS instead of ping — many networks/APIs drop ICMP but pass TLS.
# A 4xx response still proves we got a real reply from api.anthropic.com.
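# curl already prints 000 via -w when the connection fails, so there is no
# "|| echo" fallback: it would append a second 000 and make a failure look like a pass.
net_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 https://api.anthropic.com/ 2>/dev/null)
if [ -n "$net_code" ] && [ "$net_code" != "000" ]; then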
  ok "api.anthropic.com responded (HTTP $net_code)"
else
  bad "cannot reach api.anthropic.com (Wi-Fi, DNS, or TLS issue)"
fi

step "4. Claude API call"
if [ ! -f "$ENV_FILE" ]; then
  bad "$ENV_FILE not found — finish Part 08 first"
else
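  # Read only the two keys we need instead of sourcing the whole file: values
  # with unquoted spaces (such as SYSTEM_PROMPT) are not valid shell assignments.
  env_val() { grep -E "^$1=" "$ENV_FILE" | tail -n1 | cut -d= -f2- | tr -d '\r"'; }
  ANTHROPIC_API_KEY=$(env_val ANTHROPIC_API_KEY)
  ANTHROPIC_MODEL=$(env_val ANTHROPIC_MODEL)
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    bad "ANTHROPIC_API_KEY is empty in .env"
  else
    response=$(curl -s --max-time 30 -w "\n%{http_code}" https://api.anthropic.com/v1/messages \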
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -H "content-type: application/json" \
      -d "{\"model\":\"${ANTHROPIC_MODEL:-claude-haiku-4-5-20251001}\",\"max_tokens\":50,\"messages\":[{\"role\":\"user\",\"content\":\"Say hello in exactly 5 words.\"}]}")
    http_code=$(echo "$response" | tail -n1)
    body=$(echo "$response" | sed '$d')
    if [ "$http_code" = "200" ]; then
      ok "Claude API responded HTTP 200"
      # Prefer jq if available — it handles escaped quotes correctly. Fall
      # back to a grep+sed that breaks on escapes but is good enough for a
      # smoke-test "did Claude reply" sanity check.
      if command -v jq >/dev/null 2>&1; then
        reply=$(echo "$body" | jq -r '.content[0].text // empty' 2>/dev/null)
      else
        reply=$(echo "$body" | grep -o '"text":"[^"]*"' | head -1 | sed 's/"text":"//;s/"$//')
      fi
      echo "  Reply: $reply"
    else
      bad "Claude API returned HTTP $http_code"
      echo "  $body" | head -3
    fi
  fi
fi

echo
if [ $exit_code -eq 0 ]; then
  printf "$PASS All checks passed. You're ready for Part 10.\n"
else
  printf "$FAIL One or more checks failed. Fix above before running the chatbot.\n"
fi
exit $exit_code

Run it:

chmod +x ~/healthcheck.sh
bash ~/healthcheck.sh

✅ Checkpoint: All four sections print green check marks. If anything fails, fix that piece before moving on — running the full chatbot before this passes just makes debugging harder.

10. Run #

Manual launch (foreground, for testing) #

cd ~/whisplay-ai-chatbot
bash run_chatbot.sh

Say "Claudia" — the WonderEcho hears the wake word, the chatbot starts a recording session on the USB mic, you ask your question, and Claude answers out loud. Sessions end automatically after 60 seconds of silence or when you say a stop word (byebye, goodbye, or stop).

Stop the foreground process with Ctrl+C.

Set it to start on boot #

The repo provides an opinionated startup installer that registers a chatbot.service systemd unit and sets the system to multi-user (headless) mode. Use it:

cd ~/whisplay-ai-chatbot
bash startup.sh

After this, the chatbot starts automatically on every boot. Verify:

sudo systemctl status chatbot.service

You should see Active: active (running).
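If you would rather check this from a script than read the status output, a polling helper is one option. This is a minimal sketch: the function name `wait_for_active` and the default of 30 one-second polls are illustrative choices, not something the repo provides. It relies only on `systemctl is-active --quiet`, which exits 0 when the unit is active.

```shell
# Sketch: wait until a systemd unit reports "active", or give up.
# wait_for_active and its default of 30 polls are illustrative, not part of
# the chatbot repo.
wait_for_active() {
  unit=$1
  tries=${2:-30}            # number of one-second polls
  while [ "$tries" -gt 0 ]; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit is active"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "$unit did not become active" >&2
  return 1
}

# Usage: wait_for_active chatbot.service 30
```

A non-zero return here is the cue to read the journal, as described under Troubleshooting.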

Live logs #

tail -f ~/whisplay-ai-chatbot/chatbot.log
# or
journalctl -u chatbot.service -f

Tuning wake-word reliability #

The WonderEcho exposes a few I²C registers for tuning:

Too many false wakes (TV, conversations) → raise the detection threshold via the threshold register.
Missing real wakes (you have to say it twice) → lower the threshold, or move the module closer to where you sit.

See the Hiwonder WonderEcho wiki for the exact register map for your firmware revision.
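Once you have the register map, the write itself can be done with `i2cset` from i2c-tools. The sketch below is deliberately incomplete: `WONDERECHO_ADDR` and `THRESHOLD_REG` are placeholders that you must fill in from `i2cdetect -y 1` and the Hiwonder documentation, and treating the threshold as a single byte (0-255) is an assumption that may not hold for your firmware. The function name `set_wake_threshold` is hypothetical.

```shell
# SKETCH ONLY. WONDERECHO_ADDR and THRESHOLD_REG are placeholders, not
# documented values: take both from i2cdetect and the Hiwonder register map.
# Treating the threshold as one byte (0-255) is also an assumption.
set_wake_threshold() {
  value=$1
  if [ -z "${WONDERECHO_ADDR:-}" ] || [ -z "${THRESHOLD_REG:-}" ]; then
    echo "set WONDERECHO_ADDR and THRESHOLD_REG first" >&2
    return 2
  fi
  case $value in
    ''|*[!0-9]*) echo "threshold must be an integer 0-255" >&2; return 2 ;;
  esac
  if [ "$value" -gt 255 ]; then
    echo "threshold must be an integer 0-255" >&2
    return 2
  fi
  # i2cset -y <bus> <chip-address> <register> <value> writes one byte.
  i2cset -y 1 "$WONDERECHO_ADDR" "$THRESHOLD_REG" "$value"
}

# Usage, after exporting both placeholders: set_wake_threshold 80
```

Refusing to run without both values set is intentional: writing to a guessed register on an I²C device can change unrelated settings.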

10.5 Smart-home #

You picked a smart plug. Teach Claudia to flip it by giving the chatbot a tool — a small shell command it can invoke when the user's request matches.

TP-Link Kasa (HS103 / KP125M) — local control via `python-kasa` #

pip install python-kasa --break-system-packages

# Find your plug on the LAN
kasa discover

# Toggle it (replace IP)
kasa --host 192.168.1.42 on
kasa --host 192.168.1.42 off

Wire that into the chatbot by exposing kasa --host <ip> on / off as a tool the LLM can call. No vendor account, no cloud hop — works even if the Kasa cloud is down.
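One way to expose this safely is a tiny wrapper that accepts only the two words it expects. This is a sketch under assumptions: `plug_tool` and `PLUG_HOST` are hypothetical names, and how the chatbot registers a shell command as a tool depends on the repo, which is not shown here.

```shell
# plug_tool: hypothetical wrapper a chatbot tool could shell out to.
# PLUG_HOST is an assumption; use the address that `kasa discover` printed.
PLUG_HOST=${PLUG_HOST:-192.168.1.42}

plug_tool() {
  case $1 in
    on|off) kasa --host "$PLUG_HOST" "$1" ;;
    *) echo "usage: plug_tool on|off" >&2; return 2 ;;
  esac
}

# Usage: plug_tool on
```

Because the argument is matched against a fixed list, model-generated text never reaches the shell as anything other than `on` or `off`.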

Shelly Plug US — local HTTP #

Find your plug's IP in your router admin or via the Shelly app. Then any HTTP client can flip it:

# On
curl "http://192.168.1.42/relay/0?turn=on"
# Off
curl "http://192.168.1.42/relay/0?turn=off"

No vendor account, no SDK — wire those two curl calls into the chatbot as tools.

Sonoff S31 + Tasmota — local MQTT / HTTP #

Out of the box, the S31 uses the eWeLink cloud, which adds latency and a dependency on someone else's servers. Reflash it with Tasmota (the S31 has a serial header, so no soldering is needed) to expose a local HTTP endpoint:

curl "http://192.168.1.42/cm?cmnd=Power%20On"
curl "http://192.168.1.42/cm?cmnd=Power%20Off"

Slightly more work to flash, but you get full local control + power-usage telemetry over MQTT.
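A tool that toggles the plug is more useful if it can also report the current state. Tasmota answers `Power` with no argument by returning the relay state as JSON, for example `{"POWER":"ON"}` on a single-relay device like the S31. The helper names below (`parse_power`, `plug_state`) and the `TASMOTA_HOST` variable are illustrative.

```shell
# TASMOTA_HOST is an assumption; use your plug's address.
TASMOTA_HOST=${TASMOTA_HOST:-192.168.1.42}

# Extract ON or OFF from a Tasmota reply such as {"POWER":"ON"} on stdin.
# Prints nothing if the reply has no POWER field.
parse_power() {
  sed -n 's/.*"POWER":"\([A-Z]*\)".*/\1/p'
}

# Ask the plug for its current relay state.
plug_state() {
  curl -s --max-time 5 "http://$TASMOTA_HOST/cm?cmnd=Power" | parse_power
}

# Usage: plug_state
```

Keeping the parsing separate from the HTTP call lets you test it on a saved reply without the plug on the network.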

11. Case #

PiSugar publishes free STL files for case shells:

pi02 Whisplay chatbot case — FDM (filament print)

pi02 Whisplay chatbot case — SLA (resin print)

No printer? Upload the STL to a print service like JLC3DP or Craftcloud — a few dollars shipped.

12. Troubleshooting #

Nothing plays through the speaker #

TTS playback goes to the Pi's default ALSA output (aplay -l shows it), not to the WonderEcho — the WonderEcho's on-board speaker can only play its own canned firmware phrases and cannot render Claude's replies. Check which card playback is routed to in ~/.asoundrc (Part 5.5) and that an actual speaker is attached to it.
Check journalctl -u chatbot.service -f for "TTS" or "speak" lines — if Claude is replying but you hear nothing, the playback device is wrong or muted (alsamixer, F6 to pick the card).

Mic captures silence or garbage #

Run arecord -l — if the USB mic is missing, reseat the OTG adapter in the middle USB port (the corner port is power-only) and check dmesg | tail for USB enumeration errors.
If the card number changed after a reboot (USB enumeration order isn't stable), update hw:1,0 in ~/.asoundrc to match arecord -l, or lock the mic to index 1 via /etc/modprobe.d/alsa-base.conf.
Test in isolation: arecord -d 3 -f S16_LE -r 16000 /tmp/mictest.wav — if this errors, the problem is ALSA config, not the chatbot.
If the wake event never fires (journalctl -u chatbot.service -f stays silent when you speak), that's the WonderEcho, not the mic — the wake word may have been reset on cold boot; re-run the I²C programming snippet from Part 08.3.
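Because the card number can move between boots, it helps to look it up rather than remember it. The filter below is a sketch that reads `arecord -l` style text on stdin and prints the `hw:<card>,<device>` string for the first capture device whose line mentions USB. It assumes your microphone's description contains the word "USB", which is common but not guaranteed; `first_usb_capture` is a made-up name.

```shell
# Print "hw:<card>,<device>" for the first capture device whose `arecord -l`
# line mentions USB. Reads from stdin so it can be tried on saved output.
# Assumption: the mic's description contains "USB".
first_usb_capture() {
  grep '^card ' | grep -i 'usb' | head -n 1 |
    sed 's/^card \([0-9]*\):.*, device \([0-9]*\):.*/hw:\1,\2/'
}

# Usage: arecord -l | first_usb_capture
```

Compare the result with the `hw:` value in ~/.asoundrc; if they differ, the card number has moved.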

Build fails out of memory #

The Pi Zero 2 WH only has 512 MB. Add swap if build.sh gets OOM-killed:

sudo dphys-swapfile swapoff
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=1024/' /etc/dphys-swapfile
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
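To confirm the new swap size took effect before retrying the build, you can read `SwapTotal` from /proc/meminfo, which the kernel reports in kB. A small sketch (the function name `swap_total_mb` is illustrative):

```shell
# Print total swap in whole MB, read from a meminfo-format file
# (default /proc/meminfo, where SwapTotal is given in kB).
swap_total_mb() {
  awk '/^SwapTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Usage: swap_total_mb
```

Expect a value just under 1024 after the change above, since the kernel reserves a small part of the swap file for its header.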

Service won't start #

sudo systemctl status chatbot.service --no-pager
journalctl -u chatbot.service -n 60 --no-pager

Look for the first ERROR line — usually a missing .env key or a wrong path.

Claude API returns 401 #

API key is invalid or expired. Re-copy from console.anthropic.com → API Keys.

Claude API returns 429 #

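You've hit a rate limit. Wait and retry; the response's retry-after header says how long. If it happens often, your usage tier is too low for your traffic: tiers rise with the credit you purchase at console.anthropic.com → Billing.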
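A 429 is usually transient, so for your own test scripts a wait-and-retry wrapper is often enough. The sketch below is generic shell, not something the chatbot ships: `retry_with_backoff` is a hypothetical helper that re-runs a command and doubles the delay after each failure. It assumes the wrapped command exits non-zero on failure; curl does that for HTTP errors only when called with `--fail`.

```shell
# retry_with_backoff MAX_TRIES FIRST_DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, doubling the delay after each failure.
# Hypothetical helper; uses plain global variables, so do not nest calls.
retry_with_backoff() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage: retry_with_backoff 4 2 curl --fail -s https://api.anthropic.com/
```

Note that this retries on any failure, including a 401, where retrying cannot help; check the status code first if you need to tell them apart.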

WonderEcho doesn't respond #

Run i2cdetect -y 1 and confirm the module's address still shows up.
Re-run the wake-word programming script in Part 08.3 — flashes can be lost on cold boots.
Check journalctl -u chatbot.service -f while you speak — if the wake event never fires, the 4-pin I²C cable may have come loose or the module's mic input is occluded.

Wake word triggers on TV / unrelated speech #

Increase the WonderEcho's detection threshold via I²C — see the Hiwonder wiki for the register address on your firmware revision.

Responses feel slow #

Use claude-haiku-4-5-20251001 (Part 07 — it's the recommended default for this reason).
The Pi Zero 2 WH's Wi-Fi antenna is weak. Move it closer to the router.
Local Whisper STT is the slowest step on a Pi Zero 2 WH. If you have a cloud STT key (OpenAI, Google), switching to one of those in .env cuts perceived latency dramatically.
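Before changing models or STT providers, it can help to measure how much of the delay is the network. curl's `-w` option exposes per-phase timings; the sketch below prints DNS, TCP connect, TLS and total time for a single request. The function name `net_timing` is illustrative, and the default URL is simply the API host used by the healthcheck.

```shell
# Print where time goes for one HTTPS request, using curl -w timing variables.
# Each value is cumulative seconds since the request started.
net_timing() {
  curl -s -o /dev/null --max-time 10 \
    -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' \
    "${1:-https://api.anthropic.com/}"
}

# Usage: net_timing
```

If the TLS figure alone is a large part of a second, weak Wi-Fi is a likely contributor; if the network is quick, look at STT instead.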

Need to re-run the healthcheck #

bash ~/healthcheck.sh

SD card filling up #

df -h
sudo apt clean
# clear chatbot recordings:
rm -f ~/whisplay-ai-chatbot/data/recordings/*.wav 2>/dev/null
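If you want to keep recent recordings for debugging, deleting only the old ones is a gentler alternative. A minimal sketch, assuming the recordings directory shown above; `prune_recordings` and the 7-day default are illustrative choices.

```shell
# prune_recordings DIR [DAYS]: delete .wav files directly inside DIR that are
# older than DAYS days (find's -mtime +N counts whole 24-hour periods).
# Prints each file it removes; does nothing if DIR does not exist.
prune_recordings() {
  dir=$1
  days=${2:-7}
  [ -d "$dir" ] || return 0
  find "$dir" -maxdepth 1 -type f -name '*.wav' -mtime +"$days" -print -delete
}

# Usage: prune_recordings ~/whisplay-ai-chatbot/data/recordings 7
```

The `-maxdepth` and `-delete` options are extensions found in GNU find, which is what Raspberry Pi OS ships.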

Reference #

WonderEcho module: https://www.hiwonder.com/products/wonderecho
SunFounder USB mini mic: https://www.sunfounder.com/products/mini-usb-microphone
reSpeaker XVF3800 mic array: https://wiki.seeedstudio.com/respeaker_xvf3800_introduction/
Chatbot repo: https://github.com/PiSugar/whisplay-ai-chatbot
Claude API docs: https://docs.claude.com
Claude model catalog: https://docs.claude.com/en/docs/about-claude/models/overview
Pricing: https://anthropic.com/pricing

Summary stack #

Layer	What it is
Hardware	Pi Zero 2 WH + USB mic (SunFounder mini or reSpeaker XVF3800 array) + Hiwonder WonderEcho (I²C wake word) (+ optional PiSugar 3 battery)
OS	Raspberry Pi OS 64-bit
Wake word	"Claudia" — runs on the WonderEcho, no Pi-side listener
Microphone	USB mic via OTG adapter — the default ALSA capture device (the WonderEcho never streams audio)
Speech → text	Local Whisper-cpp, or cloud STT if configured
LLM	Claude API (Anthropic)
Text → speech	OpenAI gpt-4o-mini-tts (recommended), Piper (local), or ElevenLabs (with patch)
Service manager	systemd (`chatbot.service`, set up by `startup.sh`)

Only Claude, plus any cloud STT or TTS you configured, runs in the cloud. Everything else runs on-device.
