Case Study — Kyrah AI

Designing a Reliable
AI Safety System

Model correctness did not equal user trust. Here is how we redesigned a system — not a model — to make consistent decisions under uncertainty.

Role
AI Product Manager
Timeline
8 wks MVP → iterative
Focus
Classification · Guardrails · RAG

01

The Problem

Early versions of Kyrah exposed a gap that's easy to miss: the model was often correct, but the system response still felt wrong. Harmful patterns were identified — but responses defaulted to generic validation. Similar inputs produced inconsistent tone and guidance. Subtle but serious signals went unescalated.

Users felt acknowledged. But not understood. And more critically, they had no framework to interpret what they were experiencing.

The model was correct. The system still failed the user.


02

The Key Insight

Most AI pipelines follow a simple path: Input → Model → Output. This breaks down the moment context matters, signals are ambiguous, or responses require interpretation rather than generation.

We weren't looking at a model problem. We were looking at a system design problem. The fix wasn't better model outputs — it was a layer that made explicit decisions before a response was ever generated.

We shifted from generating responses to designing a system that decides.


03

System Design

This architecture emerged through iterative failure analysis — each layer added in direct response to a failure mode we identified in real usage.

System Architecture — Kyrah AI

User Input (raw signal)
→ Detection (LLM classification)
→ Decision Layer (controls system behavior)
→ Response Generation (context-aware)
→ Validation Layer (guardrails + RAG)
→ Final Output (consistent · interpretable)

The Decision Layer is where system behavior is explicitly controlled — not inferred. This is what separates a model wrapper from an AI system.
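The Validation Layer in the diagram can be sketched the same way: a deterministic gate that generated text must clear before it reaches the user. The banned-phrase list and the grounding stub below are illustrative placeholders, not the production pipeline:

```python
# Sketch: a validation layer that gates generated text behind guardrails
# and a grounding check. Placeholder logic throughout; a real system would
# use a maintained policy list and an actual RAG grounding score.

BANNED_PHRASES = ["you're overreacting", "just ignore it"]  # hypothetical guardrails

def passes_guardrails(response: str) -> bool:
    lowered = response.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)

def is_grounded(response: str, retrieved_docs: list[str]) -> bool:
    # Placeholder for a real grounding check against retrieved context.
    return len(retrieved_docs) > 0

def validate(response: str, retrieved_docs: list[str], fallback: str) -> str:
    """Return the response only if it clears every check; otherwise a safe fallback."""
    if passes_guardrails(response) and is_grounded(response, retrieved_docs):
        return response
    return fallback  # deterministic safe output instead of an unvetted one
```

Failing closed to a known-safe fallback, rather than shipping an unvetted generation, is the design choice this layer encodes.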


04

The Key Tradeoff

Every system design involves a tradeoff. This one was deliberate.

We chose: Consistency
Predictable behavior in high-risk scenarios required explicit decision logic, even at the cost of added system complexity.

We gave up: Speed
Additional latency was an acceptable cost. Inconsistent behavior in a safety-critical context was not.

In a safety-critical product, inconsistency is more damaging than latency. The call itself wasn't difficult, but it had to be named explicitly and designed for: we accepted added complexity and latency in exchange for consistent behavior in high-risk scenarios where variable responses were not acceptable.


05

Before vs. After

The same user input. Two different system behaviors.

"I feel like he keeps confusing me and I don't know what's real anymore."
Before — model-driven
"That sounds confusing and difficult…"
Generic validation · no pattern surfaced · no interpretive frame

After — decision-driven
"When someone's words and actions don't line up consistently, it can create confusion and make you question your own perception. Over time, that makes it harder to trust what you remember."
Identifies the pattern explicitly · explains the mechanism · enables user interpretation · reduces cognitive confusion

The difference is not the model. The difference is whether the system makes a decision before it generates a response.


06

Results

97% system accuracy (up from 88%)
<1% hallucination rate (via guardrails + RAG)
+8% user retention (from improved system behavior)

Users didn't just receive better responses — they began to recognize patterns in their situations and act with greater clarity. That is the measurable outcome of designing at the system level.


07

Reflection

What this taught me

Model correctness does not translate into user trust. Trust comes from consistent system behavior — not isolated correct outputs.

The most valuable thing I learned building Kyrah was the distinction between a model that performs and a system that behaves. That difference only shows up in production, under real-world conditions, with real users who need to act on what they're given.

Most teams never design for that gap. This project was about making that gap visible — and then building a layer to close it.
