Designing a Reliable AI Safety System
AI systems fail when decisions are implicit.
Kyrah fixed this by making them explicit:
Detection → Risk assessment → Response selection
Model correctness did not equal user trust.
The Problem
Early versions of Kyrah exposed a gap that's easy to miss: the model was often correct, but the system response still felt wrong. Harmful patterns were identified — but responses defaulted to generic validation. Similar inputs produced inconsistent tone and guidance. Subtle but serious signals went unescalated.
Users felt acknowledged. But not understood. And more critically, they had no framework to interpret what they were experiencing.
The model was correct. The system still failed the user.
In a safety context, this is not a UX issue; it is a reliability failure. Users could feel validated yet still fail to recognize harmful patterns in their situation. At scale, inconsistent responses reinforce that confusion and can delay recognition of harmful situations, which is a system-level failure, not a response issue.
Without an explicit risk assessment step, high-risk and low-risk situations were treated similarly, leading to inconsistent guidance and reduced user trust.
The Key Insight
Improving model accuracy alone did not fix inconsistent behavior. The issue was that the system had no explicit decision layer.
We chose to introduce a structured decision layer between classification and response generation, accepting increased system complexity and latency to ensure predictable behavior in high-risk scenarios.
We shifted from generating responses to designing a system that decides.
System Design
This architecture emerged through iterative failure analysis — each layer added in direct response to a failure mode we identified in real usage. This required aligning engineering and product around explicit system behavior rather than relying on model outputs alone.
The Decision Layer is where system behavior is explicitly controlled — not inferred. This is what separates a model wrapper from an AI system.
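A minimal sketch of how such a decision layer might sit between classification and response generation, assuming the three-stage flow of detection, risk assessment, and response selection. Every name here (`Signal`, `RiskLevel`, `detect`, `assess_risk`, `select_response`, `decide`, the keyword list, and the confidence threshold) is hypothetical and chosen for illustration; it is not Kyrah's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class RiskLevel(Enum):
    LOW = 1
    ELEVATED = 2
    HIGH = 3


@dataclass(frozen=True)
class Signal:
    pattern: str       # hypothetical label produced by a classifier
    confidence: float  # classifier confidence in [0, 1]


# Hypothetical keyword matcher standing in for a real classifier.
_KEYWORDS = {
    "checks my phone": "monitoring",
    "won't let me see": "isolation",
    "threatened": "threat",
}


def detect(text: str) -> list[Signal]:
    """Stage 1: turn raw input into labeled signals."""
    lowered = text.lower()
    return [Signal(label, 0.9) for phrase, label in _KEYWORDS.items() if phrase in lowered]


def assess_risk(signals: list[Signal]) -> RiskLevel:
    """Stage 2: map signals to an explicit risk level (assumed rules)."""
    labels = {s.pattern for s in signals if s.confidence >= 0.5}
    if "threat" in labels:
        return RiskLevel.HIGH
    if labels:
        return RiskLevel.ELEVATED
    return RiskLevel.LOW


def select_response(risk: RiskLevel) -> str:
    """Stage 3: choose a response strategy before any text is generated."""
    return {
        RiskLevel.LOW: "validate",
        RiskLevel.ELEVATED: "name_pattern_and_explain",
        RiskLevel.HIGH: "escalate_with_safety_resources",
    }[risk]


def decide(text: str) -> str:
    """Run the full pipeline and return the chosen response strategy."""
    return select_response(assess_risk(detect(text)))
```

In this sketch the response strategy is fixed by `decide` before any text is generated, so a generator downstream would only control wording, not whether a serious signal gets escalated.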
The Key Tradeoff
Every system design involves a tradeoff. This one was deliberate.
We chose deterministic decision rules over generative flexibility to ensure consistent behavior, even at the cost of additional system complexity and slightly slower responses. In a safety-critical product, inconsistency is more damaging than latency.
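To illustrate the tradeoff, the sketch below contrasts a deterministic rule lookup with a randomly sampled choice that stands in for unconstrained generation. The rule table, the policy names, the escalate-by-default fallback, and the `sampled_policy` function are all assumptions made for illustration only.

```python
import random

# Hypothetical rule table: (pattern, risk) -> response policy.
RULES = {
    ("isolation", "high"): "escalate",
    ("isolation", "low"): "name_pattern",
    ("none", "low"): "validate",
}

# Assumed default: unknown combinations escalate instead of falling back to validation.
DEFAULT_POLICY = "escalate"


def rule_policy(pattern: str, risk: str) -> str:
    """Deterministic: the same inputs always yield the same policy."""
    return RULES.get((pattern, risk), DEFAULT_POLICY)


def sampled_policy(pattern: str, risk: str, rng: random.Random) -> str:
    """Stand-in for unconstrained generation: the policy can vary between calls."""
    return rng.choice(["validate", "name_pattern", "escalate"])


rng = random.Random(0)
rule_outcomes = {rule_policy("isolation", "high") for _ in range(100)}
sampled_outcomes = {sampled_policy("isolation", "high", rng) for _ in range(100)}
```

Across repeated calls the rule lookup produces a single policy, while the sampled stand-in produces several for the same input. That difference is the consistency the tradeoff buys.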
Before vs. After
The same user input. Two different system behaviors.
The difference is not the model. The difference is whether the system makes a decision before it generates a response.
This changed how users interpreted their situation — not just how they felt about it.
This shifted the product from an empathetic chatbot to a decision-support system users could rely on.
Results
The decision layer reduced inconsistent system behavior in production and improved user trust in responses.
[Metric cards; headline values not recoverable. Captions: up from 88%; via guardrails + RAG; from improved system behavior.]
Users didn't just receive better responses — they began to recognize patterns in their situations and act with greater clarity. That is the measurable outcome of designing at the system level.
Reflection
What this taught me
Model correctness does not translate into user trust. Trust comes from consistent system behavior — not isolated correct outputs.
The most valuable thing I learned building Kyrah was the distinction between a model that performs and a system that behaves. That difference only shows up in production, under real-world conditions, with real users who need to act on what they're given.
Most systems break here — not because the model fails, but because the system never defined how it should behave. This project was about making that gap visible — and then building a layer to close it.