4.1 Emerging

Confidence and uncertainty surfacing

An agent output annotated with a confidence band and an action-tied caveat, so the citizen sees both how sure the agent is and what happens next.

01 Emerging Challenges

Agents that produce determinations of varying certainty are now widely available. As government services come to run on them, they will issue far more of these determinations than any caseworker ever did. Presenting every determination with the same authority drives either uncritical acceptance (automation bias) or blanket rejection (automation aversion).

When the certainty behind an output is invisible, a citizen cannot tell a confident determination from a guess, so they cannot calibrate how far to rely on it.

02 Assurance

Government needs the certainty behind a determination to be legible before anyone relies on it, so a citizen can tell a confident output from a guess and calibrate reliance rather than over-trusting or blanket-rejecting it.

03 Access

A citizen who cannot parse a numeric or color signal (a blind or low-vision user, or anyone unused to percentage or color-band cues) gets none of the meaning the indicator carries, and so over-relies on the output or wrongly rejects it. The path stays open when every confidence signal also reads in plain language ("We're fairly sure about this, but a person will double-check") and a screen reader conveys the same meaning the color does. A badge that signals confidence by color alone, with no accessible name, fails WCAG, which requires that color not be the only means of conveying information.
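One way to meet the plain-language requirement above is to carry the meaning in text and leave color as decoration. The sketch below is a minimal illustration, not a reference implementation: the function name `render_confidence`, the CSS class names, and the "high" and "low" sentences are hypothetical, and placing the source's quoted sentence in the medium band is an assumption.

```python
from html import escape

# Plain-language sentence per band. Only the "medium" wording comes from
# the pattern text; "high" and "low" are hypothetical placeholders.
PLAIN_LANGUAGE = {
    "high": "We're confident about this.",
    "medium": "We're fairly sure about this, but a person will double-check.",
    "low": "We're not sure about this yet, so a person will check it.",
}


def render_confidence(band: str) -> str:
    """Return an HTML fragment whose meaning lives in text, not in color.

    The color is applied only through a CSS class (hypothetical names),
    so a screen reader announces the same label and sentence that a
    sighted user reads.
    """
    if band not in PLAIN_LANGUAGE:
        raise ValueError(f"unknown confidence band: {band!r}")
    label = f"{band.capitalize()} confidence"
    sentence = PLAIN_LANGUAGE[band]
    return (
        f'<p class="confidence confidence-{band}">'
        f"<strong>{escape(label, quote=False)}</strong> "
        f"{escape(sentence, quote=False)}</p>"
    )


print(render_confidence("medium"))
```

Because the label and sentence are ordinary text inside the element, removing the stylesheet or the color leaves the signal intact.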

04 Response surface

Interaction design

Your assistant's eligibility check

Adjustment of status: you appear eligible to file Form I-485

Low confidence

0 of 3 checks done

A USCIS officer will review this before any decision takes effect.

Checks the assistant wants from you

You are currently in F-1 student status

Inferred from one I-20. Your status may have changed since the assistant last saw it.

You have kept continuous lawful presence since August 2021

Estimated from your last filing, not confirmed against entry and exit records.

An approved I-140 petition is on file for you

Assumed from your employer's note, not yet matched to a receipt number.

The certainty score (policy)

determination: status_adjustment_eligibility
score: 0.45  // illustrative
band: "low"
threshold: 0.85
review_required: true
unverified_inputs: ["status", "presence", "petition"]

The band the citizen sees and the score behind it are the same fact expressed two ways. Verifying an input the assistant was unsure of raises the score and, once the score crosses the threshold, lifts the human-review gate.
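The relationship between score, band, and review gate can be sketched in a few lines. This is an illustration under stated assumptions: the pattern fixes only a score of 0.45 reading as "low" and a threshold of 0.85, so the band cut-offs, the `Determination` class, and the fixed per-input `gain` are all hypothetical. A real service would derive the score from its own model.

```python
from dataclasses import dataclass, field

# Hypothetical band floors, checked from highest to lowest.
BAND_FLOORS = [(0.85, "high"), (0.60, "medium"), (0.0, "low")]


@dataclass
class Determination:
    name: str
    score: float
    threshold: float
    unverified_inputs: list = field(default_factory=list)

    @property
    def band(self) -> str:
        # The band is derived from the score, never stored separately,
        # so the two cannot drift apart.
        for floor, label in BAND_FLOORS:
            if self.score >= floor:
                return label
        return "low"

    @property
    def review_required(self) -> bool:
        # The human-review gate stays closed until the score
        # reaches the threshold.
        return self.score < self.threshold

    def verify(self, input_name: str, gain: float) -> None:
        """Mark one input verified and raise the score by an assumed gain."""
        if input_name not in self.unverified_inputs:
            raise ValueError(f"{input_name!r} is not awaiting verification")
        self.unverified_inputs.remove(input_name)
        self.score = min(1.0, round(self.score + gain, 4))


d = Determination(
    "status_adjustment_eligibility", 0.45, 0.85,
    ["status", "presence", "petition"],
)
print(d.band, d.review_required)  # low True
for name in ["status", "presence", "petition"]:
    d.verify(name, gain=0.15)  # hypothetical gain per verified input
print(d.band, d.review_required)  # high False
```

Deriving `band` and `review_required` from the score, rather than storing them as independent fields, is one way to keep what the citizen sees and what the policy enforces in step.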

The response this pattern proposes

The determination's internal certainty score is shown as a green, amber, or red band, always paired with a next-step sentence such as "medium confidence: a human officer will review before any decision takes effect".
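A minimal sketch of how the band and the next-step sentence might be composed into one message follows. The function name `citizen_message` and the wording used when no review is required are hypothetical. The color mapping (green for high, amber for medium, red for low) follows the healthcare example cited under Precedents and is assumed here rather than prescribed.

```python
# Assumed color mapping, following the clinical decision support example.
BAND_COLORS = {"high": "green", "medium": "amber", "low": "red"}


def citizen_message(band: str, review_required: bool) -> dict:
    """Pair a confidence band with the action that follows from it.

    The function never returns a band without a next step, so the
    signal cannot be shown as passive information.
    """
    if band not in BAND_COLORS:
        raise ValueError(f"unknown confidence band: {band!r}")
    if review_required:
        # Wording taken from the pattern's own example sentence.
        next_step = "a human officer will review before any decision takes effect"
    else:
        # Hypothetical wording; the pattern does not specify this case.
        next_step = "no further review is needed before this decision takes effect"
    return {
        "color": BAND_COLORS[band],
        "text": f"{band} confidence: {next_step}",
    }


message = citizen_message("medium", review_required=True)
print(message["color"])  # amber
print(message["text"])
```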

05 Maturity

Established

Confidence-surfacing responses are well established in clinical decision support and weather forecasting, and the theory behind appropriate reliance is mature.

Emerging (headline)

Not yet standard practice in government digital services.

Frontier

Applying this response to a citizen-facing government determination remains unproven.

06 Precedents

Lee & See (2004). The seminal trust-in-automation framework established that "calibrated trust" is the correspondence between a user's trust in an automated system and that system's actual capabilities. Miscalibrated trust produces predictable failure modes: over-trust leads to complacency; under-trust leads to disuse. Designers must surface reliability information to enable appropriate reliance.

Healthcare AI confidence calibration. Clinical decision support systems have developed confidence-visualization patterns: color-coded bands (for example, green for high confidence, amber for medium, red for low), uncertainty intervals alongside predictions, and "low certainty" labels that trigger escalation. Clinicians are generally receptive to evidence-based AI tools, but override rates stay high when calibration is poor.

Explainable-AI interface patterns. Emerging patterns include confidence meters, progress bars distinguishing "sure bets" from "best guesses", and escalation pathways when confidence dips (rephrase, escalate to a human, or view supporting evidence).

07 Transferability

High transferability. Government services regularly produce determinations of varying certainty (eligibility assessments, risk classifications, benefit calculations), so surfacing confidence is directly applicable. The healthcare parallel is apt: clinicians and caseworkers both need to know when to rely on a system versus apply professional judgment. The color-coded band pattern is simple and well understood.

Key adaptation: government confidence signals have to be tied to a next step the citizen can act on ("This assessment has medium confidence; a human officer will review before any decision takes effect"), not displayed as passive information the citizen can do nothing with.

08 Where things go wrong

The failure mode is an automated estimate issued with false certainty, masking determinations that should never have stood. Surfacing low confidence on such an estimate, tied to mandatory human review, flags exactly those determinations before they go out at scale.

09 Sources

4 references (US)