AIIncidentTracker
Editorial

How we track AI incidents — and how we keep the database accurate.

Accuracy is the only moat in incident data. Every entry on AIIncidentTracker is grounded in primary sources, passes through automated language and source checks, and is reviewed by a named editor before publishing. This page documents the standards in full.

Last updated May 18, 2026

What we track

We cover documented incidents involving AI systems — including LLM hallucinations, autonomous-vehicle accidents, algorithmic hiring bias, facial-recognition misidentifications, deepfake fraud, prompt-injection attacks, AI-generated child-safety harms, and recommender-system harms. The scope mirrors the OECD AI Incidents Monitor and the AI Incident Database (AIID), with editorial enrichment that maps each incident to its applicable laws, related lawsuits, and relevant compliance vendors.

We do not cover hypothetical risks, speculative future harms, or AI capabilities that have not been observed in deployed systems.

How we count

  • One event, one incident. Multiple reports of the same underlying event are merged into a single record with all sources attached.
  • Multi-harm incidents appear under each applicable harm category on `/by-harm/` archive pages.
  • Inclusion threshold: at least one primary source (per the source hierarchy below) must document the incident.
  • Exclusions: threatened or pre-incident demand letters, rumored incidents without verifiable reporting, social-media-only claims with no follow-up reporting, and incidents that cannot be described without identifying minors.

Source requirements

Every factual claim on every incident page links to at least one source. Sources are tiered:

| Tier | Examples |
| --- | --- |
| Primary | Court filings via CourtListener / PACER, official agency reports (NHTSA, FTC, EU AI Office, state AGs), corporate statements published by the company itself, peer-reviewed research, the AI Incident Database (AIID), AIAAIC, OECD AIM. |
| Secondary | Reuters, AP, WSJ, NYT, Bloomberg, BBC, FT, court reporters, and other verifiable mainstream news with named bylines. |
| Tertiary | Tweets, Reddit threads, blog posts. Cited only as "claimed via [platform]" when no primary or secondary source exists. Never count toward the inclusion threshold. |

Schema-level enforcement: each incident requires source_count ≥ 1 with at least one source tagged tier: "primary".
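
As a rough illustration of that rule, the check could look like the minimal Python sketch below. The `Source` and `Incident` classes and the `validate_sources` function are hypothetical names chosen for this example, not the site's actual schema; only the rule itself (at least one source, at least one tagged primary) comes from the text above.

```python
from dataclasses import dataclass, field

# Tier names follow the table above; the set itself is an assumption of this sketch.
VALID_TIERS = {"primary", "secondary", "tertiary"}


@dataclass
class Source:
    url: str
    tier: str  # expected to be one of VALID_TIERS


@dataclass
class Incident:
    title: str
    sources: list[Source] = field(default_factory=list)


def validate_sources(incident: Incident) -> list[str]:
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems = []
    if len(incident.sources) < 1:
        problems.append("source_count must be >= 1")
    for source in incident.sources:
        if source.tier not in VALID_TIERS:
            problems.append(f"unknown tier: {source.tier!r}")
    if not any(source.tier == "primary" for source in incident.sources):
        problems.append('at least one source must be tagged tier: "primary"')
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see everything that blocks publication in a single pass.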

How an incident enters the database

  1. Ingestion. New records are surfaced daily from the AIID GraphQL feed, the OECD AIM dataset, AIAAIC, and a first-party news-monitoring layer.
  2. Deduplication. New records are fuzzy-matched against existing records by date (±7 days), company name, AI system, and harm type. A score above 0.85 flags the record as a candidate duplicate for human review.
  3. LLM enrichment. A 200-word editorial TL;DR and 500-word structured description are generated using mandated hedge language. Severity, harm type, AI failure mode, NIST AI RMF mapping, and OECD harm classification are extracted.
  4. Automated language check. Every draft passes through a hedge-language linter that blocks publication if it finds bare assertions of wrongdoing without an attribution wrapper (e.g., "X caused" with no "is reported to" or "according to" preceding it).
  5. Source archival. Every source URL is submitted to the Internet Archive Wayback Machine so the record remains defensible if the source later changes or disappears.
  6. Human editorial review. A named editor reads the enrichment, edits hedge language, confirms cross-links, and sets status to alleged, confirmed, company-disputed, resolved, or ongoing. The audit trail (editor ID + timestamp) is stored on every record.
  7. Publication. The record goes live, the sitemap updates, and the Google Indexing API is notified.
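
The deduplication step above can be sketched in Python as follows. The ±7-day window and the 0.85 threshold come from the text; the scoring formula is an assumption made for illustration: the date window acts as a gate, and the score is an equal-weight average of company-name similarity, AI-system similarity, and harm-type equality. The `Record` class and function names are hypothetical, and the real matcher may weight fields differently.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

DATE_WINDOW_DAYS = 7        # from the text: ±7 days
DUPLICATE_THRESHOLD = 0.85  # from the text: score > 0.85


@dataclass
class Record:
    incident_date: date
    company: str
    ai_system: str
    harm_type: str


def _similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] using the standard library."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def duplicate_score(new: Record, existing: Record) -> float:
    """Assumed scoring: 0.0 outside the date window, else an equal-weight average."""
    if abs((new.incident_date - existing.incident_date).days) > DATE_WINDOW_DAYS:
        return 0.0
    parts = [
        _similarity(new.company, existing.company),
        _similarity(new.ai_system, existing.ai_system),
        1.0 if new.harm_type == existing.harm_type else 0.0,
    ]
    return sum(parts) / len(parts)


def is_candidate_duplicate(new: Record, existing: Record) -> bool:
    """True means 'send to human review', not 'merge automatically'."""
    return duplicate_score(new, existing) > DUPLICATE_THRESHOLD
```

A match only queues the pair for an editor, which is consistent with the human review described in step 6.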

How incidents stay current

An incident isn't done when it's filed:

  • Last verified date is displayed prominently on every incident page.
  • Re-verification cadence: incidents from the last 6 months are re-checked monthly. Older incidents are re-checked at least annually.
  • Status changes (alleged → confirmed, ongoing → resolved) populate a "Recently Updated" feed on the homepage and trigger a corrections-log entry when material.
  • Annual source-link sweep: every source URL is re-checked. Broken links automatically fall back to the archive.org snapshot. If no snapshot exists, the source is flagged, and the incident may be downgraded in status if that source was its only primary source.
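
The re-verification cadence above could be checked with a small Python sketch like the one below. The day counts are assumptions for illustration (183 days for "6 months", 30 days for "monthly", 365 days for "annually"), and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumed day counts; the text only says "6 months", "monthly", and "annually".
RECENT_WINDOW = timedelta(days=183)
MONTHLY = timedelta(days=30)
ANNUAL = timedelta(days=365)


def reverification_interval(incident_date: date, today: date) -> timedelta:
    """Recent incidents are re-checked monthly, older ones at least annually."""
    if today - incident_date <= RECENT_WINDOW:
        return MONTHLY
    return ANNUAL


def is_due(incident_date: date, last_verified: date, today: date) -> bool:
    """True when the time since the last check meets or exceeds the interval."""
    return today - last_verified >= reverification_interval(incident_date, today)
```

Passing `today` as an argument, rather than reading the clock inside the function, keeps the check deterministic and easy to test.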

As the site grows, we will publish the cadence we actually hit, not the one we wish we had hit.

Severity and harm taxonomy

We use established standards rather than invented frameworks.

Severity (5 levels)

  • Critical — death, major injury, civil-rights violation at scale
  • High — significant financial harm, individual rights violation, major reputational damage
  • Medium — limited harm, isolated incident
  • Low — minor issue, theoretical risk realized at small scale
  • Near-miss — potential harm avoided

Harm types — OECD AI Harm Taxonomy

Physical · Environmental · Economic/Financial · Reputational · Public Interest · Human & Fundamental Rights · Psychological. Multiple harm types can apply to one incident.

AI failure modes

Distribution shift · Edge case · Adversarial input · Hallucination · Bias amplification · Inadequate training data · Specification gaming · Inadequate human oversight · Sensor/perception failure · Other.

Coverage and known gaps

  • U.S. coverage is the most comprehensive layer because primary-source court records are most accessible here (PACER, CourtListener, state e-filing).
  • EU / UK incidents are covered when an official regulator report, court filing, or major-publisher news story is available.
  • Other jurisdictions are included case-by-case when at least one primary source is verifiable in English or one of the editorial team's working languages.
  • Known gap: incidents reported only on social media without subsequent mainstream coverage. We err toward excluding these rather than amplifying unverified claims.

Editorial standards

  • Only primary sources count toward the inclusion threshold.
  • Hedge language is mandatory for any factual assertion against a named party. See our full editorial standards.
  • Named-individual policy: public figures may be named with their role and sourced allegation. Private individuals are redacted to initials + role + jurisdiction unless they have publicly self-identified and multiple primary sources confirm the name. Minors are never named.
  • Rulings are quoted verbatim with docket citations.
  • No editorializing on active incidents. We state facts, link to sources, and let readers draw conclusions.
  • Monetary amounts are labeled "claimed" or "awarded" to distinguish what a complaint demands from what a judgment grants.
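
To make the hedge-language rule concrete, here is a minimal Python sketch of the kind of check the automated linter performs. The word lists contain only the examples this page mentions ("caused", "lied", "is reported to", "according to"); the real lists are presumably longer, and the function names and sentence-splitting approach are assumptions of this sketch.

```python
import re

# Only the examples named on this page; the production lists are not documented here.
ATTRIBUTIONS = ("is reported to", "according to")
ASSERTION_VERBS = ("caused",)
INTENT_TERMS = ("lied",)


def lint_sentence(sentence: str) -> list[str]:
    """Return findings for one sentence; an empty list means it passes."""
    findings = []
    lowered = sentence.lower()
    # Terms that presuppose intent are flagged regardless of attribution.
    for term in INTENT_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            findings.append(f'"{term}" presupposes intent')
    # Assertion verbs pass only if an attribution phrase appears earlier in the sentence.
    for verb in ASSERTION_VERBS:
        for match in re.finditer(rf"\b{re.escape(verb)}\b", lowered):
            preceding = lowered[: match.start()]
            if not any(phrase in preceding for phrase in ATTRIBUTIONS):
                findings.append(f'bare assertion: "{verb}" has no attribution before it')
    return findings


def lint_draft(text: str) -> list[str]:
    """Split a draft into sentences (naively, on end punctuation) and lint each one."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [finding for s in sentences for finding in lint_sentence(s)]
```

For example, `lint_sentence("The chatbot caused the loss.")` returns one finding, while `lint_sentence("The chatbot is reported to have caused the loss.")` returns none. A keyword check like this is deliberately crude, which is one reason the workflow still routes every draft through a named editor.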

Editorial independence

  • We do not accept payment from named parties to influence how an incident is covered.
  • Newsletter sponsorships are clearly labeled, capped per issue, and never embedded in incident coverage.
  • Vendor referrals to AIComplianceVendors may produce revenue but never determine whether a vendor appears on an incident page — appearance is determined by failure-mode and AI-system-type matching, not commercial relationship.

What we don't do

  • We do not give legal advice. Anything on this site is informational only.
  • We do not source from press releases, blog posts, or unverified summaries.
  • We do not run paid placement on incident prominence or company profiles.
  • We do not editorialize on active incidents.
  • We do not publish incidents that cannot be described without identifying minors.

Worked examples

Three illustrative walkthroughs of how a raw report becomes a verified entry. Each uses an active incident in the database so you can compare the source URL to the published record.

1. Airline chatbot refund misstatement — read the record

  1. Primary source identified. A small-claims tribunal decision is published. We treat tribunal decisions as primary because they carry a fact-finder's analysis of the evidence on the record.
  2. Hedge linting at intake. The first draft of the canonical title read “Airline chatbot lied to customer.” The linter blocked “lied,” which presupposes intent. Final title uses “misstatement” — a factual, hedged term.
  3. Status starts at “alleged.” Even after the tribunal decision, we kept the status at “alleged” because the airline's public response was “disagree with the ruling, evaluating options.” We flipped to “resolved” only after the appeal window closed.
  4. Cross-reference scan. Our sister-site resolver matched this to a relevant law on AILawsbyState and added the cross-link. No matching lawsuit on AILawsuitTracker yet; we'd link one if filed.
  5. Severity “medium.” Monetary harm under $5,000, no class implications, narrow factual misstatement. It is not “high” because there was no bodily injury and no class-wide damages.

2. Resume-screening age-bias class action — read the record

  1. Filed complaint as primary. The plaintiffs' class-action complaint is the primary source. Court filings count as primary even when their allegations are unproven, because they are part of the official court record.
  2. Status “alleged.” A filed complaint is an allegation, not a finding. The status would flip to “confirmed” only on a court ruling or admission.
  3. Cross-portfolio link. Our resolver matched related lawsuits on AILawsuitTracker and added the docket cross-link.
  4. Harm types: “fundamental rights” + “economic.” Disparate impact is both a civil-rights claim and a real economic harm. Both tags surface the record on the relevant archive pages.
  5. Plaintiffs not characterized. We retain the lead plaintiff's name (it's on the docket) but never characterize the plaintiff's personal history beyond what the filing states.

3. Autonomous vehicle pedestrian injury — read the record

  1. NTSB preliminary report as primary. We never lead with the news version when the federal investigator has published preliminaries.
  2. Severity “critical.” A pedestrian injury at this severity is treated as an irreversible physical harm. We do not auto-publish “critical” records through the enrichment lane; this one required human approval on every field.
  3. Named individuals. The pedestrian's name was withheld; the safety driver's name appeared in the NTSB report. We retained the latter because federal release made it public, but the record's body avoids any characterization of the driver beyond NTSB language.
  4. Status “ongoing.” The NTSB final report had not landed at publish time. We'll flip to “confirmed” when it does and add it as a primary source.

For the full audit trail on any incident, click “View audit history” in the editorial admin. Every status change, severity update, and re-verification is signed and dated.

Corrections & contact

Spotted an error or missing incident? File a correction via the public removal & correction process or email editorial@aiincidenttracker.com. Acknowledgements are sent within one business day. All resolved changes are logged publicly in the corrections log.