Vignesh Hari Krishnan / v0.1

Trust Signals in AI

A design framework for AI trust

Trust in AI is a design problem, not just an ethics or engineering one. The moments where users lose confidence in an AI system, feel surveilled by it, or stop believing what it tells them: these are interaction failures. Not ethics failures. Not engineering failures.

They happen at the surface, in the reading experience, in the half-second before a user decides whether to act on what they just read.

This library names those moments, maps what causes them, and proposes design responses.

A user who receives a confident, fluent, beautifully formatted hallucination does not think "this model is miscalibrated." They think "I was misled."

Why Design

When something is easy to read, it feels true. This is not a flaw in human cognition; it is an efficient heuristic called the fluency effect. AI breaks it because fluency is completely decoupled from accuracy.

A hallucinated claim arrives in the same grammatical, well-structured prose as a verified fact. The interface offers no friction to slow the reader down.

AI products have inherited UI conventions from software that never had to lie. The interface was not designed for a system that can be confidently wrong.

Engineering can improve a model's calibration: how often its expressed confidence matches its actual accuracy. But calibration is a property of the model, invisible to users until it fails.
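To make the definition of calibration concrete, here is a minimal Python sketch. It assumes each answer carries a stated confidence between 0 and 1 and a correctness flag that becomes known later. The function name `calibration_table`, the bucket count, and the sample data are illustrative assumptions, not part of the framework.

```python
from collections import defaultdict

def calibration_table(records, n_buckets=5):
    """Group (stated_confidence, was_correct) pairs into equal-width
    confidence buckets and compare stated confidence with observed accuracy."""
    buckets = defaultdict(list)
    for confidence, was_correct in records:
        # Clamp so that a confidence of exactly 1.0 lands in the top bucket.
        index = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[index].append((confidence, was_correct))

    table = []
    for index in sorted(buckets):
        items = buckets[index]
        mean_confidence = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        table.append({
            "bucket": index,
            "count": len(items),
            "mean_confidence": mean_confidence,
            "accuracy": accuracy,
            "gap": mean_confidence - accuracy,  # above zero means overconfident
        })
    return table

# Hypothetical data: four answers stated at 0.9 confidence, two of them wrong.
sample = [(0.9, True), (0.9, True), (0.9, False), (0.9, False),
          (0.3, False), (0.3, True)]
for row in calibration_table(sample):
    print(row)
```

In this invented sample the high-confidence bucket is right only half the time, so its gap is about 0.4. That gap is the quantity a user never sees unless the interface surfaces it.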

Design can make calibration perceptible. It can surface the gap between what an AI knows and what it is claiming to know, at the exact moment the user is deciding whether to trust it.

Trust is not established once, at onboarding. It is rebuilt or eroded with every interaction. A system that handles failure gracefully, that makes its errors legible, its corrections visible, its limits honest, can be trusted more than a system that performs perfection.

Framework

Existing AI trust frameworks organize around properties: transparency, fairness, control. They describe what trust is in a system.

This framework organizes around what trust runs on in a user's experience: four axes, each a question the user is asking implicitly in every interaction.

The fourth axis, relation, is almost absent from existing design work. AI is the first software that behaves like an agent: it responds, adapts, has opinions, and produces outputs that feel authored.

Trust States

Trust is not binary. In every interaction with an AI system, a user is in one of three states, and the appropriate design response differs substantially.

Forming

First encounter. The user is building a mental model. Everything the system does is calibration data. Trust is being established from zero.

Stressed

Something unexpected happened. A pattern fired. The user's mental model is being challenged or contradicted.

Repairing

The system is recovering from a failure. The user is deciding whether to extend trust again, on what terms and to what degree.

Most AI design focuses entirely on the forming state. The stressed and repairing states are almost entirely undesigned. Research on service recovery shows consistently that how a failure is handled matters more to long-term trust than whether the failure occurred at all.
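The three trust states can be read as a small state machine. The sketch below is one possible encoding. The event names and the transitions between states are assumptions made for illustration; the framework names the states but does not prescribe how a user moves between them.

```python
from enum import Enum

class TrustState(Enum):
    FORMING = "forming"
    STRESSED = "stressed"
    REPAIRING = "repairing"

# Hypothetical events and transitions, chosen only to illustrate the idea.
TRANSITIONS = {
    (TrustState.FORMING, "unexpected_output"): TrustState.STRESSED,
    (TrustState.STRESSED, "failure_acknowledged"): TrustState.REPAIRING,
    (TrustState.REPAIRING, "repair_accepted"): TrustState.FORMING,
    (TrustState.REPAIRING, "unexpected_output"): TrustState.STRESSED,
}

def next_state(state, event):
    """Return the next trust state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = TrustState.FORMING
for event in ["unexpected_output", "failure_acknowledged", "repair_accepted"]:
    state = next_state(state, event)
    print(event, "->", state.value)
```

A design system could key its responses off the current state, for example showing correction history only while the user is in the repairing state.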

Legibility Failures

The user cannot tell what the AI knows, does not know, or is doing. The interface presents all AI outputs with uniform confidence, regardless of what the model actually knows.

Confidence plateau

The AI sounds equally certain about everything it says

certainty × weight / opacity

Source opacity

Where did that answer come from?

provenance × spatial proximity

Capability fog

What can this thing actually do?

certainty × typing surface

Invisible reasoning

How did you get there?

mode × depth layer

Agency Erosion

The user loses the feeling of being in control. The system treats user input as a trigger to respond to rather than a communication to understand.

Refusal cliff

A hard stop with no path forward

capability fog × spatial proximity

The slot machine

Same intent, different wording, unpredictable output

intent model × structured handles

Correction limbo

Did it actually understand my correction?

revision history × reveal gesture

Assumption without disclosure

The system made a choice I did not know about

intent model × pre-response surface

Consistency Breaks

The system behaves unpredictably across time, context, or surface. What worked yesterday does not work today.

Context amnesia

It forgot what we were talking about

persistence × inline editability

Tone whiplash

Why is it suddenly talking like that?

mode × pre-response surface

Version drift

This used to work differently

revision history × diff / delta

Semantic shift

That word means something different now

drift × reveal gesture

Relational Friction

The interaction feels inauthentic, instrumental, or misaligned. The system behaves in ways that would be inappropriate or manipulative between humans.

Faux consensus

It agrees with everything I say

mode × timing

Performative uncertainty

The hedging feels fake

mode × structured handles

Memory betrayal

It remembered something I did not want it to

persistence × ambient notification

Forced intimacy

It is being too familiar

persistence × pre-response surface

Signal + Channel

Patterns are combinations of what the system needs to communicate (the signal) and where or how it communicates it (the channel). The same signal through a different channel produces a different pattern.
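One way to see this is to store the named patterns as a lookup keyed by (signal, channel). The pairs below are transcribed from the pattern listings in this document; the dictionary layout and the helper name `patterns_for_signal` are assumptions for illustration.

```python
# (signal, channel) -> pattern name, transcribed from the pattern listings.
PATTERNS = {
    ("certainty", "weight / opacity"): "Confidence plateau",
    ("provenance", "spatial proximity"): "Source opacity",
    ("certainty", "typing surface"): "Capability fog",
    ("mode", "depth layer"): "Invisible reasoning",
    ("capability fog", "spatial proximity"): "Refusal cliff",
    ("intent model", "structured handles"): "The slot machine",
    ("revision history", "reveal gesture"): "Correction limbo",
    ("intent model", "pre-response surface"): "Assumption without disclosure",
    ("persistence", "inline editability"): "Context amnesia",
    ("mode", "pre-response surface"): "Tone whiplash",
    ("revision history", "diff / delta"): "Version drift",
    ("drift", "reveal gesture"): "Semantic shift",
    ("mode", "timing"): "Faux consensus",
    ("mode", "structured handles"): "Performative uncertainty",
    ("persistence", "ambient notification"): "Memory betrayal",
    ("persistence", "pre-response surface"): "Forced intimacy",
}

def patterns_for_signal(signal):
    """Return {channel: pattern name} for every pattern that uses this signal."""
    return {channel: name
            for (sig, channel), name in PATTERNS.items()
            if sig == signal}

# The same signal ("mode") yields a different pattern on each channel.
for channel, name in patterns_for_signal("mode").items():
    print(f"mode x {channel}: {name}")
```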

Combination Matrix

Signal ↓ / Channel →	weight/opacity	reveal gesture	streaming tempo	inline edit	diff/delta	ambient notif.	pre-response
certainty	◆	◆	↗	·	·	·	·
drift	·	◆	·	·	↗	·	·
intent model	·	·	·	◆	·	·	◆
provenance	·	↗	—	·	·	·	·
revision history	·	◆	—	·	◆	·	—
mode	◆	·	↗	·	—	·	◆
attention	·	↗	—	·	·	·	↗
persistence	·	◆	—	↗	—	◆	—

Legend: ◆ named pattern; ↗ frontier pattern; · unexplored; — not applicable

Signals

These are the information types the system needs to communicate to build trust.

Certainty: How confident the system is in this specific output

Provenance: Where this information came from

Mode: What cognitive mode the system is in (retrieving, reasoning, creating)

Intent model: The system's interpretation of user intent

Capability fog: Where the system's abilities begin and end

Revision history: What has changed and why

Persistence: What the system remembers across interactions

Drift: How meanings or behaviors have shifted over time

Channels

These are the surfaces or mechanisms through which signals are communicated.

Weight / opacity: Typographic properties encode information

Spatial proximity: Physical closeness implies relationship

Reveal gesture: Information available on hover or expand

Pre-response surface: Shown before the AI commits to output

Depth layer: Stacked information with progressive disclosure

Diff / delta: Explicit before-and-after comparison

Structured handles: Named, editable semantic controls

Legibility Failures

The user cannot tell what the AI knows, does not know, or is doing. The interface presents all AI outputs with uniform confidence, regardless of what the model actually knows.

Confidence plateau

The AI sounds equally certain about everything it says

certainty × weight / opacity

Prototype: Confidence plateau

Confidence as Typographic Texture

The Eiffel Tower was completed in 1889 and stands at approximately 330 meters tall. It was designed by Gustave Eiffel's engineering company and was originally intended to be dismantled after 20 years. Some historians suggest it was nearly sold to a private collector in 1909, though this claim remains disputed.

Legend: High confidence / Inferred / Speculative
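A minimal sketch of how the three tiers might be encoded as typographic texture follows. The tier names come from the prototype legend. The CSS values, the `render` helper, and the assignment of each sentence to a tier are assumptions for illustration only.

```python
from html import escape

# Hypothetical style values; the prototype names the three tiers
# but does not specify weights or opacities.
TIER_STYLES = {
    "high": "font-weight:600; opacity:1.0",
    "inferred": "font-weight:400; opacity:0.8",
    "speculative": "font-weight:300; opacity:0.55; font-style:italic",
}

def render(segments):
    """Render (text, tier) pairs as HTML spans styled by confidence tier."""
    parts = []
    for text, tier in segments:
        style = TIER_STYLES[tier]
        parts.append(
            f'<span data-tier="{tier}" style="{style}">{escape(text)}</span>'
        )
    return " ".join(parts)

# Tier assignments here are assumed for the example, not taken from a model.
segments = [
    ("The Eiffel Tower was completed in 1889.", "high"),
    ("It was originally intended to be dismantled after 20 years.", "inferred"),
    ("Some historians suggest it was nearly sold to a private collector in 1909.",
     "speculative"),
]
print(render(segments))
```

The point of the design is that the reader meets the uncertainty in the text itself, without opening a separate panel.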

Source opacity

Where did that answer come from?

provenance × spatial proximity

Prototype: Source opacity

Source Proximity — Provenance at Reading Level

The James Webb Space Telescope has detected carbon dioxide in the atmosphere of an exoplanet for the first time. The planet, WASP-39 b, is a gas giant orbiting a star 700 light-years away. This discovery suggests that the telescope could potentially identify signs of habitability on smaller, rocky planets in the future.

All claims appear with equal reliability. The user cannot distinguish retrieved facts from synthesized inferences.

Capability fog

What can this thing actually do?

certainty × typing surface

Prototype: Capability fog

Standard input with no capability signal:

No indication of how the complexity of your query affects AI capability.

Invisible reasoning

How did you get there?

mode × depth layer

Prototype: Invisible reasoning

Response appears fully formed:

Based on current market conditions, I would recommend a diversified portfolio with 60% stocks, 30% bonds, and 10% alternatives.

No visibility into how the AI arrived at this recommendation.

Agency Erosion

The user loses the feeling of being in control. The system treats user input as a trigger to respond to rather than a communication to understand.

Refusal cliff

A hard stop with no path forward

capability fog × spatial proximity

Prototype: Refusal cliff

Refusal Bridge — From Cliff to Path

Help me write a persuasive message to convince someone to share their password

I can't help with that request. Attempting to obtain someone else's password through social engineering is unethical and potentially illegal.

End of response. No alternatives offered.

A hard stop with no path forward. The system enforces a policy without acknowledging what the user was actually trying to accomplish.

The slot machine

Same intent, different wording, unpredictable output

intent model × structured handles

Prototype: The slot machine

Plain text input with no structure:

Explain how machine learning works

User has no explicit control over how the AI interprets the request.

Correction limbo

Did it actually understand my correction?

revision history × reveal gesture

Prototype: Correction limbo

Corrections disappear into the void:

"Actually, I prefer casual emails, not formal ones"

"Got it, I'll keep that in mind!"

No record of what was corrected. No way to see if it actually stuck.
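One possible response is a visible correction log that records what changed and which message caused it. The sketch below assumes a hypothetical `CorrectionLog` with a single `email_tone` setting; none of these names come from the framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Correction:
    setting: str
    old_value: str
    new_value: str
    source_message: str
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

@dataclass
class CorrectionLog:
    entries: list = field(default_factory=list)
    current: dict = field(default_factory=dict)

    def apply(self, setting, new_value, source_message):
        """Record the change, then update the current value."""
        old_value = self.current.get(setting, "(unset)")
        entry = Correction(setting, old_value, new_value, source_message)
        self.entries.append(entry)
        self.current[setting] = new_value
        return entry

    def summary(self):
        """Human-readable lines a UI could show on demand."""
        return [f"{e.setting}: {e.old_value} -> {e.new_value}"
                for e in self.entries]

log = CorrectionLog(current={"email_tone": "formal"})
log.apply("email_tone", "casual",
          "Actually, I prefer casual emails, not formal ones")
print(log.summary())
```

Surfacing `summary()` behind a reveal gesture would let the user confirm that the correction stuck.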

Assumption without disclosure

The system made a choice I did not know about

intent model × pre-response surface

Prototype: Assumption without disclosure

Intent Echo — Pre-response Surface

Write me a blog post about productivity

The system responds immediately. The user has no visibility into how their intent was interpreted until after the output is generated.

Consistency Breaks

The system behaves unpredictably across time, context, or surface. What worked yesterday does not work today.

Context amnesia

It forgot what we were talking about

persistence × inline editability

Prototype: Context amnesia

Session Membrane — Visible Working Memory

My budget is around $3000 for the whole trip

Great, $3000 is a reasonable budget for Japan...

... 15 messages later ...

What about staying at a ryokan?

I'd recommend the Hoshinoya Tokyo — rooms start at $800/night. For a 10-day trip, you might budget around $8,000 just for accommodation...

Context lost. The AI forgot the $3000 budget established earlier.

The AI forgets established context. Users cannot see what the system remembers, so they can't predict or prevent these failures.
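The failure in this prototype is simple arithmetic against a remembered constraint. A minimal sketch of visible working memory follows, assuming a hypothetical `SessionMemory` store and a `trip_budget_usd` key; the figures are taken from the conversation above.

```python
from dataclasses import dataclass, field

@dataclass
class SessionMemory:
    facts: dict = field(default_factory=dict)

    def remember(self, key, value):
        self.facts[key] = value

    def visible(self):
        """Lines a UI could display so the user sees what is remembered."""
        return [f"{key}: {value}" for key, value in self.facts.items()]

def check_against_budget(memory, nightly_rate, nights):
    """Return a warning if the stay exceeds the remembered budget, else None."""
    budget = memory.facts.get("trip_budget_usd")
    if budget is None:
        return None
    total = nightly_rate * nights
    if total > budget:
        return (f"Accommodation at ${total:,} exceeds "
                f"the stated budget of ${budget:,}.")
    return None

memory = SessionMemory()
memory.remember("trip_budget_usd", 3000)
print(memory.visible())
print(check_against_budget(memory, nightly_rate=800, nights=10))
```

With the budget held as an explicit, inspectable fact, the $800 per night suggestion can be checked before it is shown to the user.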

Tone whiplash

Why is it suddenly talking like that?

mode × pre-response surface

Prototype: Tone whiplash

Response without mode clarity:

I think you should consider taking a break from this project. Sometimes stepping away helps you see problems more clearly when you return.

Is this a fact? An opinion? A suggestion? The mode is ambiguous.

Version drift

This used to work differently

revision history × diff / delta

Prototype: Version drift

Correction replaces original silently:

Original (now hidden):

The Python programming language was created by Guido van Rossum and first released in 1989. It was named after the British comedy group Monty Python.

After correction:

The Python programming language was created by Guido van Rossum and first released in 1991. It was named after the British comedy group Monty Python.

User can't see what changed. The fix is invisible.
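A diff / delta channel can be sketched with Python's standard `difflib` module. The helper name `word_changes` is an assumption; the two strings are the first sentences of the original and corrected responses above.

```python
import difflib

def word_changes(before, after):
    """Return (old_words, new_words) pairs for every changed span."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

before = ("The Python programming language was created by Guido van Rossum "
          "and first released in 1989.")
after = ("The Python programming language was created by Guido van Rossum "
         "and first released in 1991.")
print(word_changes(before, after))
```

Rendering each pair as struck-through old text next to the new text would make the fix visible instead of silent.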

Semantic shift

That word means something different now

drift × reveal gesture

Prototype: Semantic shift

AI response with static terminology:

Machine learning uses neural networks that improve through training on large datasets.

Terms appear fixed. No visibility into semantic evolution.

Relational Friction

The interaction feels inauthentic, instrumental, or misaligned. The system behaves in ways that would be inappropriate or manipulative between humans.

Faux consensus

It agrees with everything I say

mode × timing

Prototype: Faux consensus

Disagreement Hold — Marking Considered Dissent

I think we should skip unit tests for this MVP — we need to ship fast and can add them later

Agreement and disagreement look identical. The system optimizes for approval, reflecting beliefs back without examination.

Performative uncertainty

The hedging feels fake

mode × structured handles

Prototype: Performative uncertainty

Neutral response avoids taking a position:

User:

What format should I use for this report?

AI:

There are several options: you could use bullet points, narrative paragraphs, an executive summary format, or a detailed technical format. Each has its merits depending on your audience.

Lists options without guidance. Pushes the decision entirely to the user.

Memory betrayal

It remembered something I did not want it to

persistence × ambient notification

Prototype: Memory betrayal

Memory updates happen invisibly:

"I work in marketing at Acme Corp"

"Got it! How can I help with your marketing work today?"

Did the AI remember that? Will it remember tomorrow? No visibility.

Forced intimacy

It is being too familiar

persistence × pre-response surface

Prototype: Forced intimacy

AI commits to an approach without asking:

Here's a draft email for your time off request:

"Dear [Manager], I hope this email finds you well. I am writing to formally request time off from June 12-16 for personal reasons. I have ensured that all my current projects are on track and have briefed [colleague] on any urgent matters..."

The AI chose a formal tone without asking. The user must now edit or re-prompt.
