Claudia — Build Guide (2026.05.19)
A single-path, checkpoint-driven build guide for a Raspberry Pi Zero 2 W + PiSugar Whisplay HAT voice assistant powered by the Claude API. Say "Hey Jarvis" (or train your own "Claudia" wake word — Appendix A), ask anything, and Claude talks back.
What's new in 2026.05.19 (vs. 2026.05.18) #
- Wake word is now the default activation method. Out-of-the-box you say "Hey Jarvis" — that's the closest pre-trained model openWakeWord ships. Want it to actually answer to "Claudia"? Appendix A walks through training a custom model.
- Part 7.3 installs the wake-word engine (openWakeWord in a Python 3.11 pyenv). Adds ~3 minutes to first install; no extra cloud API keys.
- Mic decision flipped. Wake word needs reasonable pickup, so the budget USB mic is now the recommended floor, not the on-board mics.
- Button is still wired up. Fall back to it by setting WAKE_WORD_ENABLED=false in .env.
What's new in 2026.05.18 (vs. previous) #
- One path, not two. The PiSugar whisplay-ai-chatbot repo is purpose-built for this exact hardware and is the right choice for a Pi Zero 2 W.
- Verified install commands against the live upstream repos.
- Checkpoint tests at the end of each Part.
- First-run healthcheck.sh script (Part 8).
- systemd unit dropped: the repo's startup.sh sets up chatbot.service properly.
- Mic decision moved to the front as a 3-question flowchart.
- Cost tiers cut from 4 to 2. Desktop or portable.
Pick your build before you buy #
Two questions decide everything.
Q1 — Where will this device sit?
- On my desk, within arm's reach → desktop build (cheaper, simpler)
- Roaming around the house / outside / unplugged → portable build (add the PiSugar 3 battery)
Q2 — How far away will you be when you talk to it?
Because wake-word activation is on by default, the mic has to reliably hear "Hey Jarvis" without you leaning into it. The on-board Whisplay mics technically work but you'll be repeating yourself.
- Within 3 feet, quiet room, OK with occasional misses → on-board Whisplay mics (no extra purchase). Acceptable only if you'll mostly use the button.
- 4–10 feet, normal room (recommended floor) → add the SunFounder USB mini mic (~$13) + an OTG adapter
- Across the room, possibly noisy, "Alexa-class" pickup → add the reSpeaker XVF3800 mic array (~$60) + an OTG adapter — its on-board AEC + beamforming dramatically improves wake-word reliability.
That's it. Now build the cart.
Part 1: Shopping list #
Core build (required) #
| Item | Why | ~Price | Notes |
|---|---|---|---|
| Raspberry Pi Zero 2 WH | The brain. The "H" matters — it has the GPIO header pre-soldered, which the Whisplay HAT needs. Don't buy the bare "Pi Zero 2 W" or you'll need to solder 40 pins yourself. | ~$22 | Search Amazon or Adafruit for "Raspberry Pi Zero 2 WH" |
| PiSugar Whisplay HAT | LCD, speaker, dual mic, RGB LED, buttons — everything except the CPU. | ~$36 | Direct from PiSugar's site or Amazon |
| microSD card, 32 GB Class 10 (SanDisk Ultra or equivalent) | The "hard drive." 16 GB works but 32 GB gives you headroom. | ~$9 | Any reputable brand |
| Official Raspberry Pi 12.5W micro-USB power supply (5V/2.5A) | The Pi Zero 2 W is rated for 5V/2.5A. Phone chargers will seem to work and then cause random crashes. Buy the official one or a known-good 2.5A+ supply. | ~$10 | Pi Hut, Adafruit, official resellers |
Pick-your-mic (optional) #
| Item | When to buy | ~Price |
|---|---|---|
| SunFounder USB Mini Mic | If you need 4–10 ft pickup. | ~$13 |
| Seeed reSpeaker XVF3800 USB Mic Array | If you want across-the-room pickup with built-in echo cancellation, beamforming, and noise suppression. | ~$60 |
| micro-USB OTG adapter | Required if you bought either USB mic above — the Pi Zero only has a micro-USB data port. | ~$7 |
Portable (optional) #
| Item | When to buy | ~Price |
|---|---|---|
| PiSugar 3 1200 mAh battery | If you want to unplug and carry it around. Snaps onto the Pi via magnetic pogo pins — no soldering. | ~$40 |
Totals #
| Build | What you get | Total |
|---|---|---|
| Desktop | Core + Whisplay's mics + wall power | ~$77 |
| Desktop + better mic (budget) | + SunFounder + OTG | ~$97 |
| Desktop + better mic (premium) | + reSpeaker XVF3800 + OTG | ~$144 |
| Portable + premium mic | Desktop + premium mic build + PiSugar 3 battery | ~$184 |
Prices fluctuate. Verify before checkout. Where I've named vendors, search by product name rather than trusting a link.
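Since prices move, it can help to re-add the totals yourself. The sketch below simply sums the approximate prices from the tables above; the variable names are illustrative and the figures are this guide's estimates, not live quotes.

```shell
#!/bin/bash
# Approximate prices (USD) copied from the shopping-list tables above.
pi=22; hat=36; sd=9; psu=10
sunfounder=13; respeaker=60; otg=7; battery=40

# Each tier builds on the core parts.
core=$((pi + hat + sd + psu))
budget=$((core + sunfounder + otg))
premium=$((core + respeaker + otg))
portable=$((premium + battery))

printf 'Desktop:                 $%d\n' "$core"
printf 'Desktop + budget mic:    $%d\n' "$budget"
printf 'Desktop + premium mic:   $%d\n' "$premium"
printf 'Portable + premium mic:  $%d\n' "$portable"
```

With the prices listed here it prints 77, 97, 144, and 184. Edit the first two lines of assignments when a vendor's price changes.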
What you do not need #
- A separate speaker (the Whisplay has one)
- A monitor or keyboard (we do the whole setup over SSH from your PC)
- A USB hub (one USB mic on the OTG port is fine)
Part 2: Assemble the hardware #
Total time: ~5 minutes. No soldering.
- Do not insert the microSD yet. Flash it first in Part 3.
- Align the Whisplay HAT's 40-pin socket with the Pi Zero 2 WH's GPIO header pins. The Whisplay's buttons should be on the same side as the Pi's USB ports.
- Press the HAT down firmly and evenly until fully seated. Hold the PCB edges; do not press on the glass LCD. Press the LCD and it cracks.
- Peel off the LCD protective film.
- (Optional, portable build) Snap the PiSugar 3 battery onto the underside of the Pi using its magnetic pogo pins.
Final stack (top → bottom): Whisplay HAT → Pi Zero 2 WH → PiSugar 3 (optional)
✅ Checkpoint: The stack feels solid, the LCD is exposed and undamaged, nothing wobbles.
Part 3: Flash the microSD card #
3.1 Install Raspberry Pi Imager #
Download from raspberrypi.com/software (Windows, macOS, Linux).
3.2 Flash #
- Open Raspberry Pi Imager.
- Choose Device → Raspberry Pi Zero 2 W.
- Choose OS → Raspberry Pi OS (other) → Raspberry Pi OS (64-bit) (the full version, not Lite).
  - The chatbot repo's install script expects packages from the full image. Lite will work but you'll need extra apt installs and may hit surprises.
- Choose Storage → your microSD card.
- Click the gear icon (⚙) for Edit Settings and configure:
  - Hostname: claudia
  - Username: pi
  - Password: something secure
  - Enable SSH: ✅ password auth
  - Wireless LAN: SSID + password for your home Wi-Fi
  - Locale: time zone America/Chicago, keyboard us
- Save, then Write. Takes 2–5 minutes.
3.3 First boot #
Insert the microSD into the Pi.
Plug the official power supply into the PWR IN micro-USB port (the one nearest the corner, labeled PWR IN on the silkscreen), not the middle port labeled USB.
Wait 60–90 seconds.
From your PC:
ssh pi@claudia.local
If claudia.local doesn't resolve, find the Pi's IP in your router's admin page and use ssh pi@192.168.x.x.
✅ Checkpoint: You see the pi@claudia:~ $ prompt. Run cat /etc/os-release and confirm it says Debian/Raspberry Pi OS. Run free -h — you should see ~430 MB of Mem: (the Pi Zero 2 W has 512 MB total).
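If you would rather script the memory check than eyeball free -h, a minimal sketch follows. It reads MemTotal from /proc/meminfo; the function names and the 350–512 MB acceptance window are assumptions for illustration, since the GPU and kernel reserve a variable slice of the 512 MB.

```shell
#!/bin/bash
# Print MemTotal in MB from a meminfo-style file (default: /proc/meminfo).
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed if total memory is in the range expected for a 512 MB board.
# The 350-512 MB window is an illustrative assumption, not a spec.
looks_like_pi_zero_2w() {
    local mb
    mb=$(mem_total_mb "${1:-/proc/meminfo}")
    [ "$mb" -ge 350 ] && [ "$mb" -le 512 ]
}
```

On the Pi, `mem_total_mb` should print a number close to the Mem: total that free -h reports, and `looks_like_pi_zero_2w && echo ok` confirms you are on the board you think you are.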
Part 4: System setup #
Run these from the SSH session. One at a time. Wait for each to finish.
4.1 Update #
sudo apt update && sudo apt full-upgrade -y
This takes 5–15 minutes on a Pi Zero. Be patient.
4.2 Free up RAM (Pi Zero only has 512 MB) #
The Pi Zero 2 W is RAM-constrained. Disable services you don't need:
# Disable Bluetooth (not used by this build)
sudo systemctl disable hciuart bluetooth
# Disable triggerhappy (gamepad daemon, not needed)
sudo systemctl disable triggerhappy
4.3 Install build dependencies #
sudo apt install -y git curl build-essential python3-pip python3-venv \
portaudio19-dev libsndfile1 ffmpeg alsa-utils libatlas-base-dev
4.4 Install the Whisplay HAT driver (LCD + audio + buttons + LEDs) #
This is the official PiSugar driver. The install script also enables the I2C, SPI, and I2S buses automatically.
cd ~
git clone https://github.com/PiSugar/Whisplay.git --depth 1
cd Whisplay/Driver
sudo bash install_wm8960_drive.sh
sudo reboot
Wait ~60 seconds, then SSH back in:
ssh pi@claudia.local
✅ Checkpoint 1 — driver loaded:
aplay -l
You should see a card whose name contains wm8960. If not, the driver didn't load — re-run the install script and re-check.
✅ Checkpoint 2 — speaker works:
speaker-test -t sine -f 440 -l 1 -D plughw:CARD=wm8960soundcard
You should hear a one-second 440 Hz beep from the Whisplay's speaker. If you hear nothing, run alsamixer, press F6, pick the wm8960 card, and turn up the playback levels.
✅ Checkpoint 3 — on-board mic works:
arecord -d 5 -f cd /tmp/mic_test.wav && aplay /tmp/mic_test.wav
Record 5 seconds, then play it back. You should hear your voice. If it's silent or garbled, raise the "Capture" levels in alsamixer (F4 shows the Capture controls; F3 returns to Playback).
If all three checkpoints pass and you're using the on-board mic, skip to Part 5. Otherwise, do Part 4.5 first.
Part 4.5: Configure a USB microphone (skip if using on-board mics) #
Plug it in #
- Power down: sudo shutdown -h now and unplug power.
- Connect the OTG adapter to the Pi's data micro-USB port (the middle one labeled USB, not PWR IN).
- Plug the USB mic into the OTG adapter.
- Power back on, SSH in.
Verify it's detected #
arecord -l
You should now see two capture cards:
card 0: wm8960soundcard ...
card 1: ArrayUAC10 / Mini Mic ... (your USB mic — name varies)
If card 1 is missing: try a different OTG cable (some are charge-only), or make sure you're in the data port not the power one. Run lsusb and confirm the device shows up.
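Because the card name varies by mic, it can be handy to look up the card number by a fragment of the name rather than assuming it is 1. This is a small sketch that parses the "card N: ..." lines that arecord -l prints; the function name card_index is made up for this example, and it assumes English-language ALSA output.

```shell
#!/bin/bash
# Read `arecord -l` (or `aplay -l`) output on stdin and print the card number
# of the first card line containing the given text (case-insensitive).
# Exits non-zero if no card matches.
card_index() {
    awk -v pat="$1" '
        BEGIN { pat = tolower(pat) }
        /^card [0-9]+:/ && index(tolower($0), pat) {
            sub(":", "", $2)   # "1:" -> "1"
            print $2
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: `arecord -l | card_index usb` prints the USB mic's card number if its description contains "usb", and `arecord -l | card_index wm8960` does the same for the Whisplay codec.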
Test recording from the USB mic #
arecord -D plughw:1,0 -d 5 -f cd /tmp/usb_test.wav
aplay /tmp/usb_test.wav
If it's quiet, run alsamixer, press F6 to switch to the USB card, F4 for Capture controls, and raise it to ~80%.
Make the USB mic the system default capture device #
This way the chatbot picks it up automatically — speaker stays on the Whisplay.
nano ~/.asoundrc
Paste:
# Playback on Whisplay speaker (card 0), capture on USB mic (card 1)
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:0,0"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:1,0"
    }
}
ctl.!default {
    type hw
    card 1
}
Save with Ctrl+X, Y, Enter.
Pin the card numbering across reboots #
Linux's USB card numbers can swap on reboot. Lock the USB mic to card 1:
echo "options snd_usb_audio index=1" | sudo tee /etc/modprobe.d/alsa-base.conf
Reboot:
sudo reboot
Verify the default config works end-to-end #
ssh pi@claudia.local
arecord -d 5 -f cd /tmp/default_test.wav && aplay /tmp/default_test.wav
This should record from the USB mic and play through the Whisplay speaker — without specifying any device flags.
✅ Checkpoint: Recording and playback both use the right devices via the system default.
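A quick way to tell a truncated or failed recording from a full one is to compare the file size against what the format implies. arecord -f cd means 16-bit, 44100 Hz, stereo, so 5 seconds should be about 5 × 44100 × 2 × 2 bytes plus the WAV header. The helper names and the 90% tolerance below are illustrative assumptions, and a size check cannot detect a recording that is full-length but silent.

```shell
#!/bin/bash
# Expected size in bytes of a PCM WAV file:
#   header + seconds * sample_rate * channels * bytes_per_sample
# Assumes the canonical 44-byte WAV header.
expected_wav_bytes() {
    local seconds=$1 rate=$2 channels=$3 bytes_per_sample=$4
    echo $((44 + seconds * rate * channels * bytes_per_sample))
}

# Succeed if FILE is at least 90% of the expected size.
# The 90% tolerance is an arbitrary illustrative choice.
wav_size_ok() {
    local file=$1 expected actual
    expected=$(expected_wav_bytes "$2" "$3" "$4" "$5")
    actual=$(( $(wc -c < "$file") ))
    [ "$actual" -ge $((expected * 9 / 10)) ]
}
```

Usage: `wav_size_ok /tmp/default_test.wav 5 44100 2 2 && echo "full-length recording"`.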
reSpeaker XVF3800 note: The on-board AEC, beamforming, and noise suppression run automatically. No extra software needed. For raw multi-channel access or firmware tweaks, see the Seeed reSpeaker XVF3800 wiki.
Part 5: Install the chatbot software #
This is the PiSugar whisplay-ai-chatbot repo. It's purpose-built for the Pi Zero 2 W + Whisplay combo. Say "Hey Jarvis" → ask your question → Claude answers out loud. The on-board button still works as a fallback.
cd ~
git clone https://github.com/PiSugar/whisplay-ai-chatbot.git
cd whisplay-ai-chatbot
bash install_dependencies.sh
source ~/.bashrc
The dependency install pulls Node.js, Python packages, and audio libraries. This takes 15–25 minutes on a Pi Zero 2 W. Let it finish.
The source ~/.bashrc line is important: the installer sets PATH entries you need in your current shell session.
✅ Checkpoint: install_dependencies.sh finishes without errors. Test that Node is on PATH:
node --version
You should see v20.x or similar.
Part 6: Get a Claude API key #
- Go to console.anthropic.com and sign in (or create an account).
- Add a payment method and put a small amount of credit on the account (e.g., $5 — that lasts a long time on Haiku).
- Navigate to API Keys → Create Key.
- Name it claudia. Copy the key now; you can't see it again later.
- Treat the key like a password.
Approximate cost: Casual personal use on claude-haiku-4-5-20251001 typically runs a few dollars per month at most. Check current pricing at anthropic.com/pricing.
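To turn "a few dollars per month" into a number for your own usage, a rough estimator is sketched below. Everything you pass in is your own assumption: queries per day, tokens per query, and the per-million-token prices, which you should copy from the pricing page rather than from this guide. The function name is hypothetical.

```shell
#!/bin/bash
# Estimate monthly spend in USD over a 30-day month.
# Args: queries/day, input tokens/query, output tokens/query,
#       input price per million tokens, output price per million tokens.
# All five values are the caller's assumptions; none are real pricing.
monthly_cost_usd() {
    awk -v q="$1" -v tin="$2" -v tout="$3" -v pin="$4" -v pout="$5" 'BEGIN {
        per_query = (tin * pin + tout * pout) / 1000000
        printf "%.2f\n", per_query * q * 30
    }'
}
```

Usage: `monthly_cost_usd 30 200 100 PRICE_IN PRICE_OUT`, replacing the two placeholders with current per-million-token prices. Remember that the system prompt counts toward input tokens on every query.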
Which model to pick #
| Model ID | Speed | Quality | When to use |
|---|---|---|---|
| claude-haiku-4-5-20251001 | Fastest | Good | Default for this device. Latency matters more than essay-grade prose for a voice assistant. |
| claude-sonnet-4-6 | Medium | Excellent | If you want richer answers and don't mind a slightly slower response. |
| claude-opus-4-7 | Slowest | Best | Overkill for spoken Q&A. Use for hard reasoning tasks only. |
Model IDs change over time. The current list lives at docs.claude.com.
Part 7: Configure the chatbot #
7.1 Create your .env #
cd ~/whisplay-ai-chatbot
cp .env.template .env
nano .env
The template ships with many fields for different ASR/LLM/TTS providers. For a Claude-based build, you need the LLM section set to Anthropic and the wake-word section enabled. Find and set:
# === LLM (the AI brain) ===
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-YOUR-KEY-HERE
ANTHROPIC_MODEL=claude-haiku-4-5-20251001
# === System prompt — shapes the assistant's voice ===
SYSTEM_PROMPT=You are a concise, friendly voice assistant. Answer in plain spoken English — no markdown, no bullet lists, no headings. Keep responses to 1–3 sentences unless the user explicitly asks for more.
# === Wake word ===
# Default is "Hey Jarvis" — the closest pre-trained openWakeWord model. To
# answer to "Claudia" instead, train a custom model (Appendix A), drop the
# .tflite onto the Pi, and set WAKE_WORDS=claudia + WAKE_WORD_MODEL_PATHS=...
# To disable wake word entirely and fall back to the button, set
# WAKE_WORD_ENABLED=false.
WAKE_WORD_ENABLED=true
WAKE_WORDS=hey_jarvis
WAKE_WORD_PYTHON_PATH=/home/pi/.pyenv/versions/python311/bin/python
WAKE_WORD_THRESHOLD=0.5
WAKE_WORD_IDLE_TIMEOUT_SEC=60
WAKE_WORD_END_KEYWORDS=byebye,goodbye,stop
For speech-to-text (ASR) and text-to-speech (TTS), the template defaults usually work. If you want fully local (no extra API keys), pick whisper for ASR and piper for TTS. If you want higher-quality cloud STT/TTS, see the wiki for OpenAI / Google / Volcengine options — each needs its own API key.
The .env.template evolves. If your file looks different from this guide, the live template at github.com/PiSugar/whisplay-ai-chatbot/blob/master/.env.template is the source of truth.
Save: Ctrl+X, Y, Enter.
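Before building, it can save time to confirm the keys you just edited are actually set. The sketch below reads values out of the file with grep instead of sourcing it, because an unquoted value with spaces (like SYSTEM_PROMPT above) would not survive being sourced by bash. The function names env_get and env_require are invented for this example and are not part of the chatbot repo.

```shell
#!/bin/bash
# Print the value of KEY from a dotenv-style file without sourcing it.
# If the key appears more than once, the last occurrence wins.
env_get() {
    local key=$1 file=$2
    grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Report each listed key that is missing or empty; fail if any are.
env_require() {
    local file=$1 key missing=0
    shift
    for key in "$@"; do
        if [ -z "$(env_get "$key" "$file")" ]; then
            echo "missing or empty: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `env_require ~/whisplay-ai-chatbot/.env LLM_PROVIDER ANTHROPIC_API_KEY ANTHROPIC_MODEL WAKE_WORD_PYTHON_PATH` prints nothing and returns 0 when all four are set.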
7.2 Build the project #
bash build.sh
This compiles the TypeScript and prepares assets. ~5–10 minutes on a Pi Zero 2 W.
✅ Checkpoint: build.sh exits cleanly with no errors.
7.3 Install the wake-word engine (openWakeWord) #
The chatbot's wake-word frontend is openWakeWord. Recent Raspberry Pi OS images may ship a system Python newer than 3.11 (check with python3 --version), but openWakeWord needs 3.11 for its prebuilt wheels, so we install Python 3.11 in a pyenv sandbox and leave the system Python untouched.
# 1. Build prerequisites + pyenv installer
sudo apt install -y build-essential make gcc \
libssl-dev zlib1g-dev libbz2-dev libreadline-dev \
libsqlite3-dev curl llvm libncursesw5-dev xz-utils \
tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev git
curl https://pyenv.run | bash
# 2. Wire pyenv into your shell (one-time)
cat >> ~/.bashrc <<'EOF'
# pyenv (for openWakeWord)
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
EOF
source ~/.bashrc
# 3. Build Python 3.11 (~10–15 min on a Pi Zero — go get a coffee)
pyenv install 3.11.0
pyenv virtualenv 3.11.0 python311
pyenv activate python311
# 4. Install openWakeWord + download the model files
pip install openwakeword "numpy<2"
python -c "import openwakeword.utils as u; u.download_models()"
pyenv deactivate
The .env line WAKE_WORD_PYTHON_PATH=/home/pi/.pyenv/versions/python311/bin/python (set in 7.1 above) is what the chatbot uses to spawn the wake-word listener in this sandbox.
✅ Checkpoint: the model files exist on disk.
ls ~/.pyenv/versions/python311/lib/python3.11/site-packages/openwakeword/resources/models/ | head
You should see filenames like hey_jarvis_v0.1.tflite, alexa_v0.1.tflite, etc.
Part 8: First-run sanity check #
Before launching the full chatbot, run a 90-second healthcheck that verifies every layer end-to-end: speaker, mic, network, Claude API.
Create the script:
nano ~/healthcheck.sh
Paste:
#!/bin/bash
# claudia healthcheck — quick end-to-end smoke test
# Usage: bash ~/healthcheck.sh
set -u
ENV_FILE="$HOME/whisplay-ai-chatbot/.env"
PASS="\033[0;32m✓\033[0m"
FAIL="\033[0;31m✗\033[0m"
exit_code=0
step() { printf "\n%s\n" "── $1 ──"; }
ok() { printf " $PASS %s\n" "$1"; }
bad() { printf " $FAIL %s\n" "$1"; exit_code=1; }
step "1. Audio devices"
aplay -l | grep -q wm8960 && ok "wm8960 playback card detected" || bad "wm8960 NOT detected (driver issue?)"
arecord -l | grep -q card && ok "at least one capture card detected" || bad "no capture card detected"
step "2. Speaker test (1s beep)"
speaker-test -t sine -f 440 -l 1 -s 1 >/dev/null 2>&1 \
&& ok "speaker-test completed (did you hear a beep?)" \
|| bad "speaker-test failed"
step "3. Mic test (3s record-and-replay)"
echo " (speak for 3 seconds now…)"
arecord -d 3 -f cd /tmp/hc_mic.wav >/dev/null 2>&1
[ -s /tmp/hc_mic.wav ] && ok "captured audio file written" || bad "no audio captured"
aplay /tmp/hc_mic.wav >/dev/null 2>&1 && ok "playback OK (did you hear yourself?)" || bad "playback failed"
step "4. Network reachability"
ping -c 1 -W 3 api.anthropic.com >/dev/null 2>&1 \
&& ok "api.anthropic.com is reachable" \
|| bad "cannot reach api.anthropic.com (Wi-Fi or DNS issue)"
step "5. Claude API call"
if [ ! -f "$ENV_FILE" ]; then
bad "$ENV_FILE not found — finish Part 7 first"
else
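# Read only the keys we need. Sourcing .env would make bash try to run
# unquoted values with spaces (e.g. SYSTEM_PROMPT) as commands.
ANTHROPIC_API_KEY=$(grep -E '^ANTHROPIC_API_KEY=' "$ENV_FILE" | tail -n 1 | cut -d= -f2-)
ANTHROPIC_MODEL=$(grep -E '^ANTHROPIC_MODEL=' "$ENV_FILE" | tail -n 1 | cut -d= -f2-)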
if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
bad "ANTHROPIC_API_KEY is empty in .env"
else
response=$(curl -s -w "\n%{http_code}" https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d "{\"model\":\"${ANTHROPIC_MODEL:-claude-haiku-4-5-20251001}\",\"max_tokens\":50,\"messages\":[{\"role\":\"user\",\"content\":\"Say hello in exactly 5 words.\"}]}")
http_code=$(echo "$response" | tail -n1)
body=$(echo "$response" | sed '$d')
if [ "$http_code" = "200" ]; then
ok "Claude API responded HTTP 200"
echo " Reply: $(echo "$body" | grep -o '"text":"[^"]*"' | head -1 | sed 's/"text":"//;s/"$//')"
else
bad "Claude API returned HTTP $http_code"
echo " $body" | head -3
fi
fi
fi
echo
if [ $exit_code -eq 0 ]; then
printf "$PASS All checks passed. You're ready for Part 9.\n"
else
printf "$FAIL One or more checks failed. Fix above before running the chatbot.\n"
fi
exit $exit_code
Run it:
chmod +x ~/healthcheck.sh
bash ~/healthcheck.sh
✅ Checkpoint: All five sections print green check marks. If anything fails, fix that piece before moving on — running the full chatbot before this passes just makes debugging harder.
Part 9: Run the chatbot #
Manual launch (foreground, for testing) #
cd ~/whisplay-ai-chatbot
bash run_chatbot.sh
The LCD lights up with status. Say "Hey Jarvis" — Claudia plays a chime to indicate she's listening, you ask your question, Claude answers out loud. The session ends automatically after 60 seconds of silence (WAKE_WORD_IDLE_TIMEOUT_SEC) or when you say a stop word — by default byebye, goodbye, or stop.
You can also press the on-board button to wake her without the keyword — useful when there's background noise the wake-word detector is missing.
Stop the foreground process with Ctrl+C.
Set it to start on boot #
The repo provides an opinionated startup installer that registers a chatbot.service systemd unit and sets the system to multi-user (headless) mode. Use it:
cd ~/whisplay-ai-chatbot
bash startup.sh
After this, the chatbot starts automatically on every boot. Verify:
sudo systemctl status chatbot.service
You should see Active: active (running).
Live logs #
tail -f ~/whisplay-ai-chatbot/chatbot.log
# or
journalctl -u chatbot.service -f
Tuning wake-word reliability #
- Too many false wakes (TV, conversations) → raise WAKE_WORD_THRESHOLD in .env from 0.5 toward 0.7.
- Missing real wakes (you have to say it twice) → drop the threshold toward 0.3, or move closer / get a better mic.
- Chime too aggressive → tweak WAKE_WORD_COOLDOWN_SEC.
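If you find yourself adjusting these values repeatedly, a small helper avoids opening nano each time. This is a sketch, not something the repo provides: env_set is a hypothetical name, it relies on GNU sed's -i flag (which Raspberry Pi OS has), and values containing "|" or "&" would need escaping.

```shell
#!/bin/bash
# Set KEY=VALUE in a dotenv-style file: replace the existing line if the
# key is present, otherwise append a new line at the end.
env_set() {
    local key=$1 value=$2 file=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Usage: `env_set WAKE_WORD_THRESHOLD 0.6 ~/whisplay-ai-chatbot/.env && sudo systemctl restart chatbot.service`. The restart matters because the service reads .env at startup.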
Falling back to button-only #
If wake word is too unreliable for your room or you'd rather not have a CPU listener running 24/7:
cd ~/whisplay-ai-chatbot
sed -i 's/^WAKE_WORD_ENABLED=.*/WAKE_WORD_ENABLED=false/' .env
sudo systemctl restart chatbot.service
The button activation continues to work either way.
Part 10: Optional case (3D-printed) #
PiSugar publishes free STL files for case shells:
No printer? Upload the STL to a print service like JLC3DP or Craftcloud — a few dollars shipped.
Part 11: Troubleshooting #
Nothing plays through the speaker #
- aplay -l should list wm8960. If not, re-run the driver install in Part 4.4.
- Run alsamixer → F6 → wm8960 card → confirm Speaker is unmuted (no MM label) and above 0%.
Mic captures silence or garbage #
- Run arecord -l, confirm the card you expect is listed.
- Run alsamixer → F6 → mic card → F4 (Capture) → raise to ~80%.
- For a USB mic: re-check Part 4.5; the OTG cable being charge-only is a common gotcha.
USB mic disappears after reboot #
- Linux can renumber cards. The /etc/modprobe.d/alsa-base.conf line in Part 4.5 pins it. If you skipped that, do it now.
Build fails out of memory #
- The Pi Zero 2 W only has 512 MB. Add swap if build.sh gets OOM-killed:
sudo dphys-swapfile swapoff
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=1024/' /etc/dphys-swapfile
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
Service won't start #
sudo systemctl status chatbot.service --no-pager
journalctl -u chatbot.service -n 60 --no-pager
Look for the first ERROR line — usually a missing .env key or a wrong path.
Claude API returns 401 #
- API key is invalid or expired. Re-copy from console.anthropic.com → API Keys.
Claude API returns 429 #
- You're rate-limited. Wait a minute and retry. If it keeps happening, check your usage limits and credit balance at console.anthropic.com → Billing.
Wake word never triggers #
- Confirm WAKE_WORD_ENABLED=true in .env and the service was restarted after the change.
- Confirm the Python sandbox exists and the path matches WAKE_WORD_PYTHON_PATH:
ls /home/pi/.pyenv/versions/python311/bin/python
- Check the model file is downloaded:
ls ~/.pyenv/versions/python311/lib/python3.11/site-packages/openwakeword/resources/models/ | grep -i jarvis
- Lower WAKE_WORD_THRESHOLD toward 0.3 in .env. On a noisy mic the default 0.5 can be too strict.
- Watch the live log while saying the wake word. If you see no detection events at all, the listener isn't running. Look for Python tracebacks at service start:
journalctl -u chatbot.service -n 100
Wake word triggers on TV / unrelated speech #
- Raise WAKE_WORD_THRESHOLD toward 0.7.
- The hey_jarvis model is liberal by design. A custom-trained "Claudia" model (Appendix A) usually has fewer false positives because you only train against your own voice samples.
Responses feel slow #
- Use claude-haiku-4-5-20251001 (Part 6; it's the recommended default for this reason).
- The Pi Zero 2 W's Wi-Fi antenna is weak. Move it closer to the router.
- Local Whisper STT is the slowest step on a Pi Zero. If you have a cloud STT key (OpenAI, Google), switching to one of those in .env cuts perceived latency dramatically.
Need to re-run the healthcheck #
bash ~/healthcheck.sh
SD card filling up #
df -h
sudo apt clean
# clear chatbot recordings:
rm -f ~/whisplay-ai-chatbot/data/recordings/*.wav 2>/dev/null
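To catch a filling card before it causes trouble, the df check can be wrapped in a small warning function. This is an illustrative sketch: the function names and the 85% default threshold are arbitrary choices, not anything the chatbot repo defines.

```shell
#!/bin/bash
# Print the used-space percentage (digits only) for the filesystem
# that holds the given path (default: /).
disk_used_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Warn and return non-zero when usage reaches the limit.
# The default limit of 85 is an arbitrary example value.
disk_check() {
    local pct limit=${2:-85}
    pct=$(disk_used_pct "${1:-/}")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: ${pct}% used, time to clean up"
        return 1
    fi
    echo "OK: ${pct}% used"
}
```

Usage: `disk_check /` after a few weeks of use; if it warns, run the cleanup commands above.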
Reference #
- Whisplay HAT driver: https://github.com/PiSugar/Whisplay
- Chatbot repo: https://github.com/PiSugar/whisplay-ai-chatbot
- Chatbot wiki (wake word, image gen, battery display): https://github.com/PiSugar/whisplay-ai-chatbot/wiki
- Pre-built SD card images: https://github.com/PiSugar/whisplay-ai-chatbot/wiki (skips most of Parts 4–7)
- Claude API docs: https://docs.claude.com
- Claude model catalog: https://docs.claude.com/en/docs/about-claude/models/overview
- Pricing: https://anthropic.com/pricing
Summary stack #
| Layer | What it is |
|---|---|
| Hardware | Pi Zero 2 WH + PiSugar Whisplay HAT (+ optional PiSugar 3 battery, USB mic) |
| OS | Raspberry Pi OS 64-bit |
| Audio driver | WM8960 (Whisplay) |
| Activation | Wake word (hey_jarvis by default; trained claudia via Appendix A) + button fallback |
| Wake-word engine | openWakeWord in a Python 3.11 pyenv |
| Speech → text | Local Whisper, or cloud STT if configured |
| LLM | Claude API (Anthropic) |
| Text → speech | Piper (local) or cloud TTS if configured |
| Service manager | systemd (chatbot.service, set up by startup.sh) |
Only Claude runs in the cloud. Everything else can run on-device if you want it to.
Appendix A: Train your own "Claudia" wake word #
openWakeWord ships with pre-trained models for hey_jarvis, hey_mycroft, alexa, and hey_rhasspy. There is no pre-trained "Claudia" model — the only way to get the device to actually answer to its name is to train one yourself. This is a one-time chore: ~30 minutes of recording, ~15–30 minutes of training on a free Colab GPU, then a one-line .env swap on the Pi.
What you need #
- A laptop or desktop with a working microphone (recording the positive samples).
- A free Google Colab account (the training runs on a free T4 GPU — no local GPU required).
- ~45 minutes total.
A.1 Collect ~100 positive samples ("Claudia") #
The model learns from your voice. Record yourself saying "Claudia" naturally — same room and mic conditions you'll use the device in, ideally the same person who'll use it most.
Use any recorder that emits 16-bit, 16 kHz mono WAV. Audacity works fine: Tracks → Add New → Mono Track, set the project rate to 16000 Hz, record 1 second of "Claudia", export as WAV, repeat.
Aim for ~100 clips, each ~1 second long, with natural variation:
- Different distances from the mic (close, normal, across-the-room)
- Different intonations (statement, question, sleepy, energetic)
- Some with light background noise (TV at low volume, fan)
- A couple of "almost-claudia" pronunciations (drawing out the a, swallowing the i) to make the model robust
Save them all into a folder like claudia_samples/.
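With a hundred clips it is easy for a few to slip through at the wrong sample rate or in stereo. The sketch below reads the sample rate, channel count, and bit depth straight from each WAV header using od. It assumes the canonical header layout (fmt chunk starting at byte 12) and a little-endian machine, which covers typical Audacity exports on a PC; the function names are made up for this example.

```shell
#!/bin/bash
# Print "rate channels bits" read from a WAV file's header.
# Assumes the canonical layout: channels at byte 22, sample rate at
# byte 24, bits per sample at byte 34, on a little-endian host.
wav_format() {
    local f=$1 rate channels bits
    channels=$(od -An -tu2 -j22 -N2 "$f" | tr -d ' \n')
    rate=$(od -An -tu4 -j24 -N4 "$f" | tr -d ' \n')
    bits=$(od -An -tu2 -j34 -N2 "$f" | tr -d ' \n')
    echo "$rate $channels $bits"
}

# List every clip in a folder that is not 16 kHz, mono, 16-bit.
# Returns non-zero if any clip has the wrong format.
check_samples() {
    local dir=$1 f bad=0
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue
        if [ "$(wav_format "$f")" != "16000 1 16" ]; then
            echo "wrong format: $f ($(wav_format "$f"))"
            bad=1
        fi
    done
    return "$bad"
}
```

Usage: `check_samples claudia_samples` prints nothing when every clip is 16 kHz mono 16-bit, and names the offenders otherwise so you can re-export them.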
A.2 Run the openWakeWord training notebook #
Open the official training Colab linked from the openWakeWord repo: https://github.com/dscripka/openWakeWord → "Training new models" section. The notebook is named openwakeword_training_tutorial.ipynb.
In the notebook:
- Runtime → Change runtime type → GPU.
- Upload your claudia_samples/ folder into the Colab session (Files panel → upload).
- Set the wake-word name to claudia and point the positive-samples path at your uploaded folder.
- Keep the synthetic-data generator enabled: it fabricates thousands of additional "claudia" pronunciations using a TTS model, which is what actually makes a hundred real samples enough.
- Run all cells. Training takes 15–30 minutes on the free T4.
- When it's done, the notebook writes claudia.tflite (or claudia.onnx, depending on the notebook revision). Download it.
A.3 Drop the model onto the Pi #
From your PC:
scp claudia.tflite pi@claudia.local:/home/pi/wakeword/
(Create the dir first if it doesn't exist: ssh pi@claudia.local mkdir -p /home/pi/wakeword)
A.4 Switch .env to the new model #
SSH in and edit ~/whisplay-ai-chatbot/.env:
WAKE_WORDS=claudia
WAKE_WORD_MODEL_PATHS=/home/pi/wakeword/claudia.tflite
Or do it from your Windows PC via the Console:
.\Claudia.Console.bat set-wakeword claudia
# (then SSH in once to set WAKE_WORD_MODEL_PATHS — the console doesn't
# write that key since it depends on where you put the .tflite)
Restart the service:
sudo systemctl restart chatbot.service
A.5 Sanity check #
journalctl -u chatbot.service -f
Say "Claudia" — you should see a detection event in the log and hear the chime. If false positives are too frequent in the first day of use, retrain with more "hard negative" samples (your voice saying close-but-not-claudia words) and bump WAKE_WORD_THRESHOLD toward 0.6–0.7.
CPU note: Custom single-keyword models are often less CPU-hungry than the multi-keyword default, so wake-word listening typically gets cheaper after this switch — useful on a Pi Zero 2 W.
Built for MindAttic LLC — 2026.05.19