6.1 Emerging

Clustering and deduplication views for high-volume submissions

A reviewer console that groups semantically similar submissions into argument clusters with a visible distinctness ratio, each cluster expandable to the underlying submissions.

01 Emerging Challenges

Agencies receiving hundreds of thousands or millions of written submissions (public comments, but equally consultation responses, grant applications, or planning objections) cannot meaningfully review each one individually. Mass submission campaigns (historically postcard campaigns, now orchestrated online) produce near-identical submissions that inflate raw counts without adding substantive argumentation. Reviewers need tools that surface distinct arguments rather than re-reading the same template thousands of times.

02 Assurance

Government needs to read every distinct argument in a body of submissions without re-reading the duplicates that mass campaigns produce, so that reviewers can be confident no unique position was missed even when the volume is too high to read every submission. The confidence comes from grouping submissions by what they argue, not from judging which were machine-written.

03 Access

Deduplicate for analysis, never for exclusion: clustering helps reviewers find distinct arguments efficiently but never removes a submission from the record. Participants and the public can see how clustering was performed and verify no substantive argument was lost.
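The "never removes a submission from the record" rule can be checked mechanically. The sketch below is illustrative only: it assumes a hypothetical data shape (a list of submission IDs for the record and a mapping from cluster label to member IDs), and the function name `audit_clustering` is invented here, not taken from any of the tools cited in this pattern. It confirms that every submission appears in exactly one cluster; whether a substantive argument was lost inside a cluster still needs human judgement.

```python
from collections import Counter


def audit_clustering(record_ids, clusters):
    """Compare the full record against cluster membership.

    record_ids: iterable of every submission ID in the record (assumed shape).
    clusters:   dict mapping a cluster label to a list of member IDs (assumed shape).
    Returns the IDs that are missing from all clusters, that appear more than
    once, or that appear in a cluster but not in the record.
    """
    counts = Counter(m for members in clusters.values() for m in members)
    record = set(record_ids)
    return {
        "missing": sorted(record - set(counts)),
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
        "unknown": sorted(set(counts) - record),
    }


# Hypothetical example data, for illustration only.
record_ids = ["s1", "s2", "s3", "s4"]
clusters = {
    "cost to small farms": ["s1", "s2", "s3"],
    "extend comment period": ["s4"],
}
report = audit_clustering(record_ids, clusters)
print(report)
```

An empty report (all three lists empty) is the condition a published clustering method could commit to, and one that participants could re-run against the public record.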

04 Response surface
Service design Considered
The response this pattern proposes

A clustered submission view that groups submissions by natural-language similarity and collapses near-identical ones into a single distinct comment, so a reviewer reads each unique argument once. In the CDO Council worked example it reduced 267 near-identical submissions to 9 distinct comments, with no submission deleted from the record.

No surface has been built yet; the approach above is the brief for one.
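As a rough illustration of that brief, the sketch below groups near-identical submissions and reports a distinctness ratio. It rests on several assumptions: it uses word-shingle Jaccard similarity with greedy assignment, which catches template-style duplicates but is not the semantic NLP used in the CDO Council pilot; the 0.6 threshold is arbitrary; and "distinctness ratio" is assumed here to mean distinct clusters divided by total submissions, since this pattern does not define it. All names and sample texts are hypothetical.

```python
import re
from dataclasses import dataclass, field


def shingles(text, k=3):
    """Return the set of k-word shingles of a lower-cased text, ignoring punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return frozenset([tuple(words)])
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


@dataclass
class Cluster:
    representative_id: str
    representative_shingles: frozenset
    member_ids: list = field(default_factory=list)  # every member is kept


def cluster_submissions(submissions, threshold=0.6):
    """Greedy pass: join the first cluster whose representative is similar
    enough, otherwise start a new cluster. Nothing is discarded."""
    clusters = []
    for sub_id, text in submissions.items():
        sh = shingles(text)
        for c in clusters:
            if jaccard(sh, c.representative_shingles) >= threshold:
                c.member_ids.append(sub_id)
                break
        else:
            clusters.append(Cluster(sub_id, sh, [sub_id]))
    return clusters


def distinctness_ratio(clusters, total):
    """Assumed definition: number of distinct clusters / number of submissions."""
    return len(clusters) / total if total else 0.0


# Hypothetical sample submissions, for illustration only.
submissions = {
    "s1": "I oppose the proposed rule because it will raise costs for small farms.",
    "s2": "I oppose the proposed rule because it will raise costs for small farms!",
    "s3": "I strongly oppose the proposed rule because it will raise costs for small farms.",
    "s4": "Please extend the comment period; rural residents need more time to respond.",
}
clusters = cluster_submissions(submissions)
ratio = distinctness_ratio(clusters, len(submissions))
for c in clusters:
    print(c.representative_id, c.member_ids)
print(ratio)
```

Each cluster keeps its full list of member IDs, so the console can expand a distinct comment back to the underlying submissions. Under the assumed definition, the worked example's 9 distinct comments from 267 submissions would give a ratio of 9/267, roughly 0.034. Because the greedy pass compares only against each cluster's first member, results depend on input order; a production tool would likely need a more robust method.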

05 Maturity
Emerging

06 Precedents

CDO Council Public Comment Analysis Pilot (US, 2020–present). The Federal Chief Data Officers Council, working with OIRA and GSA, developed and piloted NLP-based tools that recognise topics and themes, group duplicate and semantically similar comments, and surface them for subject-matter expert review. The report's worked example collapsed 267 near-identical submissions to 9 distinct comments. The CDO Council subsequently published recommendations for implementing these tools across the federal government.

ICF / Regulations.gov Gen AI Comment Processing (US, 2024–present). ICF, working with GSA on Regulations.gov, has deployed generative AI to accelerate public comment analysis, moving beyond spreadsheet-based manual review toward automated clustering and theme extraction.

Citizen Space / Delib (UK, Australia, NZ). Delib's Citizen Space platform offers tagging and coding of qualitative responses, cross-referencing across questions, and AI-powered first-pass analysis that identifies themes and sentiment. The platform is widely used across UK, Australian, and New Zealand government consultations.

07 Transferability

High. Clustering and deduplication are infrastructure-level capabilities that any digital submission or consultation system should offer; the comment-analysis precedents transfer directly to grant, objection and petition intake. The CDO Council's approach is designed as a reusable toolset for use across the US federal government and could be adapted by other jurisdictions.

The key design question is transparency: participants and the public should be able to see how clustering was performed and verify that no substantive arguments were lost.

08 Where things go wrong

Deduplication is analytical, not exclusionary, so on its own it creates no large-scale harm. The safeguard is that clustering surfaces arguments for human review rather than substituting an automated decision for it.

09 Sources
5 references US · UK