Confidence and uncertainty surfacing
An agent output annotated with a confidence band and an action-tied caveat, so the citizen sees both how sure the agent is and what happens next.
Agents that produce determinations of varying certainty are now widely available, and as government services adopt them they will issue far more of these determinations than caseworkers ever did. Presenting every one with the same authority invites either uncritical acceptance (automation bias) or blanket rejection (automation aversion).
When the certainty behind an output is invisible, a citizen cannot tell a confident determination from a guess, so they cannot calibrate how far to rely on it.
Government needs the certainty behind a determination to be legible before anyone relies on it, so a citizen can tell a confident output from a guess and calibrate reliance rather than over-trusting or blanket-rejecting it.
A citizen who cannot parse a numeric or color signal (a non-sighted or color-blind user, or anyone unused to percentage or color-band cues) loses all the meaning the indicator carries, and may over-rely on the output or wrongly reject it as a result. The path stays open when every confidence signal also reads in plain language ("We're fairly sure about this, but a person will double-check") and a screen reader conveys the same meaning the color does. A badge that relies on color alone fails WCAG Success Criterion 1.4.1 (Use of Color), and an icon badge with no accessible name fails 1.1.1 (Non-text Content).
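As a minimal sketch of that requirement, the TypeScript below renders a badge whose color swatch is purely decorative (hidden from assistive technology with `aria-hidden`) while visible text carries the full meaning. The band names, the class names, the `renderConfidenceBadge` function and all wording except the quoted medium-band sentence are hypothetical.

```typescript
// Hypothetical confidence bands; names and wording are illustrative assumptions.
type Band = "high" | "medium" | "low";

// Plain-language phrasing for each band, so the meaning never depends on color.
const PLAIN_LANGUAGE: Record<Band, string> = {
  high: "We're confident about this.",
  medium: "We're fairly sure about this, but a person will double-check.",
  low: "We're not sure about this. A person will review it before anything happens.",
};

// Escape text before interpolating it into HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a badge whose color swatch is decorative only: the visible text
// carries the whole message, so sighted, color-blind and screen-reader
// users all receive the same meaning.
function renderConfidenceBadge(band: Band): string {
  const text = escapeHtml(PLAIN_LANGUAGE[band]);
  return (
    `<p class="confidence confidence--${band}">` +
    `<span class="confidence__swatch" aria-hidden="true"></span>` +
    `<span class="confidence__text">${text}</span>` +
    `</p>`
  );
}
```

Calling `renderConfidenceBadge("medium")` yields markup in which a screen reader announces the plain-language sentence and skips the swatch, so the color adds emphasis without being the only channel.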
Example: an agent output reading "Adjustment of status: you appear eligible to file Form I-485", shown with a low-confidence badge ("Low confidence. 0 of 3 checks done.") and the caveat "A USCIS officer will review this before any decision takes effect."
- Established: Confidence-surfacing responses are well-established in clinical decision support and weather forecasting, and the theory behind appropriate reliance is mature.
- Emerging (headline rating): Not yet standard practice in government digital services.
- Frontier: Applying this response to a citizen-facing government determination remains unproven.
Lee & See (2004). The seminal trust-in-automation framework established that "calibrated trust" is the correspondence between a user's trust in an automated system and that system's actual capabilities. Miscalibrated trust produces predictable failure modes: over-trust leads to complacency; under-trust leads to disuse. Designers must surface reliability information to enable appropriate reliance.
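Surfacing reliability information only supports calibrated trust if the stated confidence actually tracks how often the system is right. One common system-side check, borrowed from machine-learning practice rather than prescribed by Lee & See, is expected calibration error (ECE): bin past determinations by stated confidence and compare each bin's average confidence with its observed accuracy. The sketch assumes a hypothetical `ReviewedDetermination` record in which a human reviewer has later marked each determination correct or not, and a default of ten equal-width bins.

```typescript
// One past determination: the confidence the system stated, and whether a
// human reviewer later confirmed it (hypothetical record shape).
interface ReviewedDetermination {
  confidence: number; // expected in [0, 1]
  correct: boolean;
}

// Expected calibration error: the weighted average gap between stated
// confidence and observed accuracy across equal-width confidence bins.
function expectedCalibrationError(
  records: ReviewedDetermination[],
  bins = 10,
): number {
  if (records.length === 0) return 0;
  const count = new Array<number>(bins).fill(0);
  const confSum = new Array<number>(bins).fill(0);
  const correctSum = new Array<number>(bins).fill(0);

  for (const r of records) {
    // Clamp so confidence 1.0 lands in the top bin and stray values stay in range.
    const i = Math.min(bins - 1, Math.max(0, Math.floor(r.confidence * bins)));
    count[i] += 1;
    confSum[i] += r.confidence;
    correctSum[i] += r.correct ? 1 : 0;
  }

  let ece = 0;
  for (let i = 0; i < bins; i++) {
    if (count[i] === 0) continue;
    const gap = Math.abs(confSum[i] / count[i] - correctSum[i] / count[i]);
    ece += (count[i] / records.length) * gap;
  }
  return ece;
}
```

A value near zero means the displayed confidence can be taken at face value; a large value means the band shown to citizens would itself miscalibrate their trust.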
Healthcare AI confidence calibration. Clinical decision support systems have developed confidence-visualization patterns: color-coded bands (for example, green for high confidence, amber for medium, red for low), uncertainty intervals alongside predictions, and "low certainty" labels that trigger escalation. Clinicians are generally receptive to evidence-based AI tools, but override rates stay high when calibration is poor.
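The color-coded band pattern reduces to a small mapping from a numeric score to a band, a color, a text label, and an escalation flag. The sketch below follows the green/amber/red example; the cut-offs of 0.85 and 0.6, the `toBand` name and the label wording are hypothetical placeholders that a real service would set from its own calibration data.

```typescript
type Band = "high" | "medium" | "low";

interface BandInfo {
  band: Band;
  color: "green" | "amber" | "red"; // decorative; never the sole signal
  label: string;                    // plain-language text shown alongside the color
  escalate: boolean;                // true when a "low certainty" label should trigger escalation
}

// Hypothetical cut-offs; real values would come from calibration data.
const HIGH_THRESHOLD = 0.85;
const MEDIUM_THRESHOLD = 0.6;

function toBand(confidence: number): BandInfo {
  // Reject malformed scores instead of silently banding them.
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence >= HIGH_THRESHOLD) {
    return { band: "high", color: "green", label: "High confidence", escalate: false };
  }
  if (confidence >= MEDIUM_THRESHOLD) {
    return { band: "medium", color: "amber", label: "Medium confidence", escalate: false };
  }
  return { band: "low", color: "red", label: "Low certainty", escalate: true };
}
```

Keeping the label in the same object as the color makes it harder for a front end to render one without the other.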
Explainable-AI interface patterns. Emerging patterns include confidence meters, progress bars distinguishing "sure bets" from "best guesses", and escalation pathways when confidence dips (rephrase, escalate to a human, or view supporting evidence).
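The escalation pathways can be expressed as a function that decides which routes to offer for a given score. In this sketch, supporting evidence is offered whenever it exists, and rephrasing or escalating to a human is added once confidence dips below a threshold; the 0.6 threshold, the route names and the `escalationOptions` function are assumptions for illustration.

```typescript
// The three escalation routes named in the pattern.
type Escalation = "view_evidence" | "rephrase" | "escalate_to_human";

// Hypothetical threshold below which confidence counts as having "dipped".
const DIP_THRESHOLD = 0.6;

function escalationOptions(confidence: number, hasEvidence: boolean): Escalation[] {
  const options: Escalation[] = [];
  // Evidence is useful at any confidence level, so offer it whenever available.
  if (hasEvidence) options.push("view_evidence");
  // When confidence dips, also let the citizen rephrase or reach a person.
  if (confidence < DIP_THRESHOLD) {
    options.push("rephrase", "escalate_to_human");
  }
  return options;
}
```

Returning a list rather than a single action leaves the choice with the citizen, which matches the pattern's framing of these as pathways rather than automatic redirects.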
High transferability. Government services regularly produce determinations of varying certainty (eligibility assessments, risk classifications, benefit calculations), so surfacing confidence is directly applicable. The healthcare parallel is apt: clinicians and caseworkers both need to know when to rely on a system versus apply professional judgment. The color-coded band pattern is simple and well understood.
Key adaptation: government confidence signals have to be tied to a next step the citizen can act on ("This assessment has medium confidence; a human officer will review before any decision takes effect"), not displayed as passive information the citizen can do nothing with.
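One way to keep a confidence signal from becoming passive information is to make the next step part of the type: if every band must map to a `NextStep`, a band shown without one fails to compile. The sketch assumes hypothetical `NextStep`, `NEXT_STEPS` and `actionTiedMessage` names; only the medium-band wording comes from the example above, and the other caveats are illustrative.

```typescript
type Band = "high" | "medium" | "low";

// What the citizen can expect next (hypothetical shape).
interface NextStep {
  caveat: string;       // plain-language statement of what happens next
  humanReview: boolean; // whether an officer reviews before the decision takes effect
}

// Typing this as Record<Band, NextStep> makes the compiler reject any band
// that lacks an actionable next step. Wording is illustrative.
const NEXT_STEPS: Record<Band, NextStep> = {
  high: {
    caveat: "you can continue, and you can ask for a human review at any time.",
    humanReview: false,
  },
  medium: {
    caveat: "a human officer will review before any decision takes effect.",
    humanReview: true,
  },
  low: {
    caveat: "a human officer must review this before any decision takes effect.",
    humanReview: true,
  },
};

// Combine the band with its next step into one action-tied sentence.
function actionTiedMessage(band: Band): string {
  return `This assessment has ${band} confidence; ${NEXT_STEPS[band].caveat}`;
}
```

For example, `actionTiedMessage("medium")` returns "This assessment has medium confidence; a human officer will review before any decision takes effect."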
The failure mode this guards against is an automated estimate issued with false certainty, which lets determinations that should never have stood go out unchallenged. Surfacing low confidence on such an estimate, tied to mandatory human review, flags exactly those determinations before they go out at scale.
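Tying low confidence to mandatory review can be enforced at issuance time rather than left to the display. The sketch below splits a batch into determinations that may be issued and those held until an officer reviews them; the `Determination` shape, the `gateDeterminations` name and the 0.85 threshold are hypothetical.

```typescript
interface Determination {
  id: string;
  outcome: string;    // e.g. "eligible" or "ineligible" (illustrative)
  confidence: number; // expected in [0, 1]
}

interface GateResult {
  issued: Determination[];        // may take effect, still carrying their confidence caveat
  heldForReview: Determination[]; // must not take effect until an officer reviews them
}

// Hypothetical threshold below which a determination may not be issued automatically.
const REVIEW_THRESHOLD = 0.85;

function gateDeterminations(batch: Determination[]): GateResult {
  const result: GateResult = { issued: [], heldForReview: [] };
  for (const d of batch) {
    // Fail closed: a missing or malformed score is treated as low confidence.
    const trusted =
      Number.isFinite(d.confidence) &&
      d.confidence >= REVIEW_THRESHOLD &&
      d.confidence <= 1;
    (trusted ? result.issued : result.heldForReview).push(d);
  }
  return result;
}
```

The design choice is to fail closed: any score the gate cannot trust routes to review rather than to issuance, so a broken model output cannot masquerade as a confident one.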