RagnarAAC
A personalised, gamified AAC app for my 5-year-old son. Standard AAC treats the button press as the goal; this one treats it as the trigger for a multi-modal reward that beats the dopamine barrier.
- Voices: 14
- Schema: OBF
- Frontend: PWA
- Deploy: In progress
RagnarAAC is an Augmentative and Alternative Communication app I’m building for my five-year-old son. Most AAC apps treat a phrase coming out of the tablet as the end state. For a kid who isn’t yet motivated to communicate verbally, that framing skips the real problem: every button press has a cognitive cost, and the reward has to beat that cost. Every single time.
The dopamine barrier
A typical AAC button press produces a flat synthesised voice reading a word. That’s functionally correct and motivationally dead. For a kid weighing up whether the effort is worth it, “the tablet said the word” isn’t enough payoff to keep choosing to press the button next time.
RagnarAAC reframes the press as a trigger for a multi-modal payoff:
- A character voice reads the phrase: robot, dragon, ghost, giant, chipmunk, or my own cloned voice reading it back as Dad.
- A particle burst lights up on the tile, scaled to the chosen voice and tied to the same accent palette as the tile itself.
- A smart-home effect fires in the real room: lights change colour, scenes shift, “monster mode” kicks in. Connected via Home Assistant webhooks.
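One way to picture the press-as-trigger idea is a handler that starts the speech and the room effects together, and does not let a failed effect hold up the voice. The sketch below is in Python with asyncio purely for illustration (in the PWA the same shape would be `Promise.allSettled`); `fire_press`, `speak`, and the effect names are hypothetical stand-ins, not the app's actual code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Effect = Callable[[], Awaitable[None]]


async def fire_press(speak: Effect, effects: Dict[str, Effect]) -> List[str]:
    """Run speech and all side effects concurrently.

    Returns the names of effects that failed. A speech failure is re-raised,
    because the voice is the one part of the payoff that must not go missing.
    """
    results = await asyncio.gather(
        speak(),
        *(effect() for effect in effects.values()),
        return_exceptions=True,  # collect failures instead of cancelling siblings
    )
    speech_result, *effect_results = results
    if isinstance(speech_result, BaseException):
        raise speech_result
    return [
        name
        for name, result in zip(effects, effect_results)
        if isinstance(result, BaseException)
    ]


# Stand-in coroutines for the demo; real ones would call the synthesis
# endpoint and the Home Assistant webhook.
log: List[str] = []


async def speak() -> None:
    await asyncio.sleep(0.01)  # pretend synthesis takes a moment
    log.append("speech")


async def lights() -> None:
    log.append("lights")


async def monster_mode() -> None:
    raise ConnectionError("Home Assistant unreachable")


failed = asyncio.run(fire_press(speak, {"lights": lights, "monster_mode": monster_mode}))
```

In this run the broken effect is reported in `failed` while the speech and the light change both still happen, which is the behaviour a kid-facing surface probably wants: a flaky bulb should never cost him the voice.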
The voice stack
A self-hosted FastAPI backend runs on a GPU-enabled LXC container alongside the local LLM. It exposes a single synthesis endpoint and dispatches to one of two engines per request.
- 13 character voices come from a single Piper model put through per-voice sox effect chains: pitch shifts, formant tweaks, reverb, ring modulation, the usual toy box. One small model, thirteen distinct characters, near zero load time.
- The dad voice is a clone of mine via XTTS-v2, conditioned on a short reference clip. Heavier model, heavier load.
- Engine dispatch keeps the XTTS model unloaded until the dad voice is actually requested. Character-voice requests stay snappy; the GPU only pays the XTTS cost when it’s earning it.
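The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: `LazyEngine`, `dispatch`, `SOX_CHAINS`, the file names, and the effect values are all hypothetical. The effect names used (`pitch`, `tempo`, `reverb`, `echo`) are real sox effects, but the chains are invented and only three of the thirteen characters are shown.

```python
import threading
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical per-voice sox effect chains (values made up for illustration).
SOX_CHAINS: Dict[str, List[str]] = {
    "robot": ["pitch", "-200", "echo", "0.8", "0.7", "12", "0.5"],
    "giant": ["pitch", "-600", "tempo", "0.9", "reverb", "40"],
    "chipmunk": ["pitch", "700", "tempo", "1.15"],
}
DAD_VOICE = "dad"


def sox_command(voice: str, src: str, dst: str) -> List[str]:
    """Build the sox argv that applies one character's effect chain."""
    return ["sox", src, dst, *SOX_CHAINS[voice]]


class LazyEngine:
    """Wraps an expensive model loader and runs it only on first use."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Any = None
        self._lock = threading.Lock()  # avoid double-loading under concurrent requests
        self.load_count = 0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        with self._lock:
            if self._model is None:
                self._model = self._loader()
                self.load_count += 1
            return self._model


def dispatch(voice: str, clone_engine: LazyEngine) -> Tuple[str, Any]:
    """Pick an engine for a request; only the dad voice touches the heavy model."""
    if voice == DAD_VOICE:
        return ("clone", clone_engine.get())
    if voice in SOX_CHAINS:
        # The Piper output file would be post-processed by this sox command.
        return ("piper+sox", sox_command(voice, "piper_out.wav", "final.wav"))
    raise ValueError(f"unknown voice: {voice}")


# Demo with a fake loader standing in for the real XTTS load.
engine = LazyEngine(lambda: "xtts-model")
robot_kind, robot_plan = dispatch("robot", engine)
loaded_after_robot = engine.loaded
dad_kind, dad_model = dispatch("dad", engine)
dispatch("dad", engine)  # second dad request reuses the loaded model
```

The point of the shape is that character-voice requests never call `get()`, so the heavy model stays out of VRAM until the first dad-voice request, and is loaded only once after that.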
The frontend
The tablet runs a PWA built with React, Vite and Framer Motion. The board schema is OBF-compatible so existing AAC vocabularies port in, and folders nest naturally. The picker exposes all 14 voices.
A hidden “edit mode” sits behind a long-press, separate from the kid-safe tap-only surface. In edit mode I can drag tiles to reorder, rename, repaint, or attach a smart-home webhook payload to any tile. Out of edit mode the app is locked to tap-only: there’s nothing to accidentally drag, delete, or rename.
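To make the schema concrete, here is a hedged sketch of what an OBF-style board with a webhook-carrying tile and a folder tile might look like, built from Python. The top-level keys (`format`, `buttons`, `grid`, `load_board`) follow the Open Board Format as I understand it, and the `ext_` prefix is the spec's convention for custom attributes. The `ext_ragnar_*` field names, the `make_board` helper, the board path, and the Home Assistant host are assumptions for illustration; only the `aac_colour` webhook id comes from the automation in this write-up.

```python
import json
from typing import Any, Dict, List, Optional


def make_board(
    board_id: str, name: str, buttons: List[Dict[str, Any]], columns: int
) -> Dict[str, Any]:
    """Lay buttons out row by row and wrap them in an OBF-style board dict."""
    ids: List[Optional[str]] = [b["id"] for b in buttons]
    ids += [None] * (-len(ids) % columns)  # pad the last row with empty cells
    order = [ids[i:i + columns] for i in range(0, len(ids), columns)]
    return {
        "format": "open-board-0.1",
        "id": board_id,
        "locale": "en",
        "name": name,
        "buttons": buttons,
        "grid": {"rows": len(order), "columns": columns, "order": order},
        "images": [],
        "sounds": [],
    }


buttons = [
    {
        "id": "1",
        "label": "red light",
        "background_color": "rgb(255, 0, 0)",
        # Hypothetical extension fields for the smart-home payload.
        "ext_ragnar_webhook_url": "http://homeassistant.local:8123/api/webhook/aac_colour",
        "ext_ragnar_webhook_payload": {"colour": "red", "brightness": 80},
    },
    {
        "id": "2",
        "label": "blue light",
        "background_color": "rgb(0, 0, 255)",
        "ext_ragnar_webhook_url": "http://homeassistant.local:8123/api/webhook/aac_colour",
        "ext_ragnar_webhook_payload": {"colour": "blue"},
    },
    # A folder tile: pressing it opens another board instead of speaking.
    {"id": "3", "label": "food", "load_board": {"path": "boards/food.obf"}},
]

board = make_board("lights", "Lights", buttons, columns=2)
board_json = json.dumps(board, indent=2)
```

Keeping the app-specific data in `ext_`-prefixed fields means another OBF-aware app should be able to ignore them and still read the labels, grid, and folder links.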
Home Assistant integration
Every tile can carry an optional webhook URL and JSON payload. Pressing the tile fires the webhook in parallel with the voice synthesis, so the light flips colour at the same moment the phrase plays. One parameterised Home Assistant automation backs all eight colour buttons via a single payload field in the editor, rather than eight near-duplicate automations.
# one automation, eight colour buttons
- alias: aac_colour_scene
  trigger:
    - platform: webhook
      webhook_id: aac_colour
      allowed_methods: [POST]
  action:
    - service: light.turn_on
      target:
        entity_id: light.ragnar_room
      data:
        color_name: "{{ trigger.json.colour }}"
        brightness_pct: "{{ trigger.json.brightness | default(80) }}"
        transition: 0.4
Wyoming bridge
Sitting in front of the same synthesis backend is a Wyoming-protocol bridge that exposes all 14 voices to Home Assistant as a first-class TTS entity. HA automations and Assist pipelines can now speak in the cloned dad voice on any room’s speaker: doorbell announcements, bedtime reminders, the “you left the fridge open” nag.
Same backend, two consumers. The tablet press path uses the HTTP API directly; the smart-home speak-into-the-room path goes through Wyoming.
Production path
The tablet hits the backend over the LAN today. The public route is a single-origin deploy behind Caddy and Cloudflare Tunnel at aac.tech2urdoor.org, so the PWA and the synthesis API share an origin and the service worker stays happy. Same Cloudflare tunnel pattern as the rest of the site; that part is in progress.
Why the tradeoffs landed where they did
- PWA over Flutter. Iteration speed wins when the user is five and the feedback loop is “rebuild, hand him the tablet, watch what he does.” Web tooling lets me change a tile and reload faster than any native cycle.
- Hidden edit mode. Mixing edit affordances into the tap-only surface would mean either drag handles that confuse the press target, or an admin pane that the kid finds. Separating them keeps the kid surface clean.
- Engine dispatch on the backend. Loading XTTS eagerly would burn VRAM that the local LLM container also wants. Lazy load means the dad voice has a small first-press warmup; every other voice is instant.
- OBF schema. Picking the open AAC interchange format up front means existing vocabularies port in, and his board can move to another AAC app the day he outgrows this one.