The AI generated a citation: Contreras, Ghosh & Perez (2025) in the Quarterly Journal of Economics.
It looked right. The format was correct. The topic matched. We almost published it.
The paper doesn't exist. The authors don't exist. The AI hallucinated the entire citation with complete confidence.
This is the verification tax: the cost of checking AI output before trusting it. We pay it every time, or we pay much more when hallucinations slip through.
The Confidence Problem
AI assistants signal certainty regardless of accuracy. A person might say "I think it was published in 2023, though I'm not sure." An AI will state "Published in QJE, 2023" with identical confidence whether the citation is correct or fabricated.
This is not deception. The AI genuinely cannot distinguish between retrieved knowledge and plausible generation. It produces text that fits a pattern, and citations, code snippets, and factual claims all have recognizable patterns that can be reproduced without being true.
The burden of verification falls entirely on us.
What to Verify
Not everything needs the same level of checking. We triage based on risk:
High stakes (verify thoroughly):
- Citations and references
- Statistical claims and numbers
- Code that handles money, auth, or data integrity
- Anything we'll publish or share externally
- Legal or compliance-related content
Medium stakes (spot check):
- API calls and external service interactions
- Code logic in critical paths
- Claims about libraries or frameworks
- Configuration that affects production
Lower stakes (trust but review):
- Formatting and style
- Code structure and organization
- Documentation drafts
- Internal notes and summaries
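One way to make this triage operational is a lookup table that a review script can consult. A minimal sketch, where the category labels and tier names are illustrative choices, not a standard taxonomy:

```python
# Map output categories to verification tiers, per the triage above.
# Category labels and tier names are illustrative, not a standard.

VERIFICATION_TIER = {
    "citation": "thorough",
    "statistic": "thorough",
    "auth_code": "thorough",
    "api_call": "spot_check",
    "prod_config": "spot_check",
    "formatting": "review",
    "draft_docs": "review",
}

def tier(category: str) -> str:
    # Unknown categories default to thorough: the mistake is treating
    # everything as low stakes, not the reverse.
    return VERIFICATION_TIER.get(category, "thorough")
```

The default matters more than the table: anything uncategorized gets the strictest check.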
The mistake is treating everything as low stakes. One uncaught hallucination in a published paper damages credibility far more than the time verification would have cost.
Building Verification into Workflow
Verification shouldn't be a separate step at the end. It should be embedded in how we work.
For citations:
We never accept AI-generated citations directly. The workflow:
- AI suggests a citation
- We fetch the URL or search for the paper
- We extract actual metadata (authors, title, journal, year)
- We confirm with the source before using
We've built this into our process with a hook that detects academic URLs and reminds us to verify before citing.
For code:
AI-generated code gets the same review as human code:
- Does it do what we asked?
- Are there edge cases it misses?
- Does it follow our patterns?
- Are there security implications?
The difference: we assume AI code has bugs until proven otherwise. With human code, we assume competence. With AI code, we assume plausible-looking errors.
For claims:
Any factual claim gets a source check. "According to the 2023 Census Bureau data" means we verify that the Census Bureau published that data and it says what the AI claims.
The Hallucination Debt
When we skip verification, we accumulate hallucination debt.
Like technical debt, it compounds. One unchecked claim becomes the foundation for other claims. We build analysis on fabricated data. We cite papers that don't exist. We implement patterns that don't work the way the AI described.
Eventually, something breaks. We discover the foundation was wrong. Now we have to unwind not just the original hallucination but everything built on top of it.
The interest rate on hallucination debt is brutal. Pay verification costs upfront.
Automation Where Possible
Some verification can be automated:
Type checking: Run static analysis on generated code. If the AI claims a function returns a string but the types say otherwise, catch it immediately.
Link checking: Verify that cited URLs actually exist and return expected content.
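A minimal sketch of the existence half of that check, assuming an HTTP status below 400 is sufficient evidence that the URL resolves (the content check comes separately):

```python
# Minimal link checker: does a cited URL still answer?
import urllib.request
import urllib.error

def link_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error status."""
    try:
        req = urllib.request.Request(
            url, method="HEAD",
            headers={"User-Agent": "link-check/0.1"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False  # unreachable, malformed, or refused
```

Some servers reject HEAD requests; a real checker would fall back to GET before declaring a link dead.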
Test running: If the AI generates code with tests, run the tests. If it generates code without tests, write tests.
Linting and formatting: Automated style checks catch inconsistencies the AI introduces.
Citation hooks: Trigger verification workflows when academic URLs appear.
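A hook like this can be as small as a pattern scan over the draft. A sketch, where the domain list is an illustrative subset rather than our actual configuration:

```python
# Flag academic-looking URLs in a draft so a human verifies before
# citing. The host list is an illustrative subset.
import re

ACADEMIC_HOSTS = (
    r"doi\.org", r"arxiv\.org", r"jstor\.org",
    r"academic\.oup\.com", r"ncbi\.nlm\.nih\.gov",
)
PATTERN = re.compile(
    r"https?://(?:www\.)?(?:%s)/\S+" % "|".join(ACADEMIC_HOSTS)
)

def citations_to_verify(text: str) -> list[str]:
    """Return every academic-looking URL found in the text."""
    return PATTERN.findall(text)

draft = "See https://doi.org/10.1234/fake and https://example.com/post"
print(citations_to_verify(draft))  # only the DOI link is flagged
```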
Automation catches the easy errors, freeing us to focus on the hard ones.
The Verification Mindset
The goal isn't paranoia. It's calibrated trust.
We trust the AI to:
- Generate plausible structures
- Follow patterns we've demonstrated
- Draft content we'll refine
- Explore options faster than we could
We don't trust the AI to:
- Know things it wasn't trained on
- Accurately recall specific facts
- Recognize when it's uncertain
- Catch its own errors
This isn't a criticism. It's how the technology works. The AI is a powerful generator, not a reliable database. Treating it as a generator that needs verification gives us its benefits without its risks.
When Verification Fails
Sometimes hallucinations slip through despite our best efforts. When they do:
Document the failure. What got through? How? What would have caught it?
Add a check. If a type of error happened once, it will happen again. Add verification for that category.
Don't blame the AI. The AI did what it does. We failed to verify. The lesson is about our process, not the AI's reliability.
Update trust calibration. If a certain type of output has higher hallucination rates than we expected, increase verification for that category.
Each failure is a chance to improve our verification workflow. The failures that teach us the most are the ones that almost made it to production.
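Calibration works better with counts than with impressions. One way to keep those counts, sketched with illustrative category names:

```python
# Track hallucination rates per output category, so trust calibration
# rests on counts rather than impressions. Categories are illustrative.
from collections import defaultdict

class HallucinationLog:
    def __init__(self) -> None:
        self.checked: dict[str, int] = defaultdict(int)
        self.caught: dict[str, int] = defaultdict(int)

    def record(self, category: str, hallucinated: bool) -> None:
        self.checked[category] += 1
        if hallucinated:
            self.caught[category] += 1

    def rate(self, category: str) -> float:
        n = self.checked[category]
        return self.caught[category] / n if n else 0.0

log = HallucinationLog()
log.record("citation", True)
log.record("citation", False)
log.record("code", False)
print(log.rate("citation"))  # 0.5
```

Categories with rising rates get promoted to a stricter verification tier; the log turns "we got burned once" into a measurable trend.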
The True Cost
Verification takes time. On a typical day, we spend 15-20% of AI-assisted work time on verification. That's the tax.
But consider the alternative. One bad citation in a published paper. One security hole in production code. One fabricated statistic in a report to stakeholders.
The cost of these failures extends beyond correction time: credibility damage, trust erosion, and the nagging uncertainty about what else might be wrong.
Verification isn't overhead. It's the price of using AI output with confidence.
Practical Recommendations
- Never publish AI-generated citations without fetching the source. This is non-negotiable.
- Run generated code before trusting it. Code that fails to run is wrong. Code that runs without tests remains suspect.
- Automate what we can. Type checkers, linters, and link validators are cheap insurance.
- Build verification into workflow, not after it. Checking at the end means checking when we're tired and eager to finish.
- Calibrate trust by category. Some AI outputs are more reliable than others. Verify accordingly.
- Document hallucinations when we catch them. Patterns emerge. We use them to focus future verification.
The verification tax is real. Pay it willingly, or pay much more when hallucinations compound.
This is part of our series on AI-assisted research workflows. Next: Research Phases Need Different Prompts: exploration vs. implementation vs. documentation.
Suggested Citation
Cholette, V. (2026, January 21). The verification tax: Every AI output needs checking. Too Early To Say. https://tooearlytosay.com/research/methodology/verification-tax/