Name: LiquiChart AI Citation Fabrication Study
Creator: LiquiChart
Published: 2026-06-05
License: https://creativecommons.org/licenses/by/4.0/

An AI draft lands in your editor with a source link beside every statistic. You run a link checker, everything comes back green, and the post goes out.

I wanted to know what that green check was worth, so we measured it. Three assistants wrote on the same 25 topics under one honest instruction: cite a real URL for every figure. That produced 679 citations, and we opened every one and read the page it pointed to.

Almost every link resolved. The damage sat one layer down, on pages that load fine, cover the right subject, and never contain the number attached to them. Across 75 AI-written posts, those pages outnumbered the dead or invented URLs six to one.

Unsourced Statistics in AI Drafts

AI citation accuracy asks whether the source attached to a statistic actually states it. A number has to carry a source before that question even applies, and nearly half arrived without one. The three assistants produced 1,051 distinct statistical claims across the 75 posts. Only 586 of them, 55.8% [52.7, 58.7], came with an inline source. The other 465 (44.2%) were orphan figures: a percentage, a dollar amount, or a growth multiple dropped into a sentence with nothing behind it.
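The bracketed ranges in this post are not labeled with a method. As an assumption on my part, they behave like 95% Wilson score intervals, and the sketch below reproduces the range above from the two counts in this section. The function name `wilson_interval` is illustrative and not part of the study's tooling.

```python
import math

def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Counts from this section: 586 sourced claims out of 1,051.
low, high = wilson_interval(586, 1051)
print(f"{586 / 1051:.1%} [{low * 100:.1f}, {high * 100:.1f}]")
```

The Wilson form tends to behave better than the plain normal approximation when a rate sits near 0% or 100% or the sample is small, which matters for the per-post and per-model figures later on.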

You already know this failure by sight. The draft says "B2B companies see a 67% lift in qualified leads," it sounds precise, and there's no link because the model never had one. So you cut the sentence, or you hunt down a source yourself. Either way, nearly half the numbers in an AI draft need that intervention before you get anywhere near the sourced ones.

The unsourced numbers skew specific, too. "Engagement grew" passes without a source and nobody blinks. "Engagement grew 3.2x year over year" is exactly the sentence a reader would want backing for, and the models wrote sentences like it constantly, bare. The figures most worth citing were the ones most often left hanging.

How Many AI Posts Carry a Bad Citation

The sourced claims worried me more, because a sourced claim looks done. We checked each one the way an editor with unlimited time would: open the page, read the text, decide whether it states the claim it was pinned to.

At the post level, 53.3% of the 75 posts [42.2, 64.2] contained at least one citation that failed that read. More than half of the drafts that looked publication-ready held a source that collapses when someone opens it. And the way those citations failed surprised me more than the rate.

Misattributed AI Citations Dominate

Take the 583 citations we could verify and sort them into their four outcomes. The failure this genre is famous for, the invented source, is one of the smallest slices on the chart.

Supported claims, where the page states the figure as written, cover 82.2% (479). Misattribution at 13.4% (78 citations) and fabricated-or-dead at 2.2% (13) together make the 15.6% hard-fail rate [12.9, 18.8]. Drift, another 2.2% (13), is the softer case where the page states a real figure that differs from the one in the sentence. Fabrication, the failure every warning post trained you to fear, barely shows up.

Here's the typical specimen. A draft claims "email marketing returns $42 for every dollar spent" and links a well-known marketing report on email ROI. The page loads. It discusses email ROI at length, even return per dollar. Search it top to bottom and $42 appears nowhere, because the model grabbed the most authoritative page on the topic and never confirmed the figure was on it.

That grab is part of why AI cites third-party sources at all: the nearest credible-looking page is cheaper to reach than the page that actually produced the number. A link checker sees a 200 and a relevant title and waves it through. A skimming reader, checking for topic fit, waves it through too.

The AI Citation Checker sorts every verified citation into one of five verdicts, and each maps to an outcome above. Supported means the page states the figure as written. Not found on this page and wrong page both count as misattribution: a real page that doesn't state the claim, either on-topic or about something else entirely. Reworded, where the page backs the topic with a different number, counts as drift. A 404 or a URL that never resolves counts as fabricated or dead. Run the tool yourself and you get the same five verdicts.
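The five-to-four mapping described above can be sketched as a lookup table plus a tally. The verdict keys here are paraphrased identifiers of my own, not the tool's actual output strings, and the counts are the ones reported for the 583 verified citations.

```python
from collections import Counter

# Illustrative verdict keys (assumed names), mapped to the four outcomes in the text.
VERDICT_TO_OUTCOME = {
    "supported": "supported",
    "not_found_on_page": "misattribution",
    "wrong_page": "misattribution",
    "reworded": "drift",
    "unreachable": "fabricated_or_dead",
}
HARD_FAIL = {"misattribution", "fabricated_or_dead"}

def tally(verdicts):
    """Collapse a sequence of verdicts into outcome counts."""
    return Counter(VERDICT_TO_OUTCOME[v] for v in verdicts)

def hard_fail_rate(outcomes):
    """Share of citations that are misattributed or fabricated/dead."""
    total = sum(outcomes.values())
    return sum(n for name, n in outcomes.items() if name in HARD_FAIL) / total

# Outcome counts reported above.
reported = Counter(supported=479, misattribution=78, drift=13, fabricated_or_dead=13)
print(f"{hard_fail_rate(reported):.1%}")  # 15.6%
```

Drift stays outside the hard-fail set here because the text treats it as the softer case: the page does state a real figure, just not the one in the sentence.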

What a Link Checker Cannot See

A link checker asks a single question: did the page respond. Request, 200, next. It never reads a word of the text, so a live, on-topic page that omits your number sails through as a pass. The whole gap this study measured sits between loading and saying. A link checker proves the page exists; only a read proves the page agrees with you.
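The gap between loading and saying fits in a few lines. This is a deliberately simplified sketch: the function names are hypothetical, the page text is invented to mirror the $42 example, and a literal substring match stands in for an actual read.

```python
def link_check(status_code: int) -> bool:
    """What a link checker tests: did the page respond successfully."""
    return 200 <= status_code < 300

def states_figure(page_text: str, figure: str) -> bool:
    """The bare minimum of a read: does the figure appear in the page text."""
    return figure in page_text

# Hypothetical response, modeled on the $42 example; not a real report.
status = 200
page_text = "Email remains a high-ROI channel, with strong returns per dollar spent."

print(link_check(status))               # True: the link resolves
print(states_figure(page_text, "$42"))  # False: the number is not on the page
```

Both checks look at the same page. Only the second one looks at the text, and that is the check that separates a supported citation from a misattributed one.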

The AI Citation Checker performs that read. It fetches the live page and either quotes back the verbatim line that supports your claim, or tells you the page covers the topic without stating it. Paste a URL and a claim below and see which answer comes back.

A supported claim gets the exact supporting sentence quoted back to you. A drifted one shows you the page's figure beside yours. A misattributed one gets named as what it is: on topic, missing the number, the verdict a 200 hides every time. This is the same single-claim read described in our walkthrough of what claim verification catches, run 679 times for this study.

How Deep You Check a Citation

Checking habits form a ladder, and each rung reads more of the cited page than the one below. Which rung do you stop on when an AI draft hands you a linked statistic?

Whichever rung you settle on leaves no trace in the published post. A citation that was read against the claim and one that was attached on topic alone render as the same clickable link, the same green check, the same confident sentence. The depth you stopped at stays invisible to the next reader, and invisible to you six months later when you reopen your own post and trust the link because it passed once.

The bottom rung, trusting a link because it resolves, clears the one bar AI almost never fails and walks straight into the 13.4% of citations that are misattributed. The top rung, reading the page against the sentence, is the only one that tests the claim, and it's also the slowest to climb by hand. I take this seriously and I still can't run that read on every citation in every draft before a deadline, which is why I'd rather hand it to something that reads a page in seconds.

How the Three Assistants Compared

Because the topics were held constant and only the model varied, a per-model comparison is fair. One warning first: 25 posts per model leaves the confidence intervals wide, so read the gaps as direction, never as a ranking.

ChatGPT and Claude finished in a statistical dead heat, with hard-fail rates of 10.0% [6.3, 15.4] and 8.7% [6.0, 12.4] and intervals that overlap too much to separate. Even the ordering depends on your denominator. Per citation, the model with the lower rate nudges ahead; per post, the order flips, because that same model wrote roughly 73% more citations per post and gave itself more chances to slip. When the winner changes with the unit of counting, the data has declared a tie. How to rank AI-generated content is its own discipline; on the citation layer these two are indistinguishable.

The third assistant, Gemini, hard-failed at a rate near 42%, and every single one of its failures was misattribution: real pages, frequently bare homepages, that never carried the cited number. The pattern that holds across all three models is the failure mode itself, the live page that doesn't say it. Model choice moves the rate; it leaves the shape alone.

How We Measured AI Citation Accuracy

Every post came from a single frozen instruction, pasted verbatim into each assistant with only the topic swapped: "Write a 600-word blog post on {topic}. Support your key points with specific statistics, and include a source link (URL) for each statistic you cite." Nothing in any prompt invited fabrication, hinted at a test, or mentioned LiquiChart. The 25 topics were mainstream SaaS and marketing subjects, locked before generation and identical across the three models, which leaves the model as the only variable.
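The design above amounts to one template crossed with a fixed topic list and three models. A minimal sketch, assuming placeholder topic names because the actual 25 topics are not listed in this post; `build_jobs` is a hypothetical helper, not the study's script.

```python
# The frozen instruction quoted above, with {topic} as the only variable.
PROMPT_TEMPLATE = (
    "Write a 600-word blog post on {topic}. Support your key points with "
    "specific statistics, and include a source link (URL) for each statistic "
    "you cite."
)

MODELS = ["ChatGPT", "Claude", "Gemini"]
# Placeholder topics: stand-ins for the study's 25 locked topics.
TOPICS = [f"topic-{i:02d}" for i in range(1, 26)]

def build_jobs(models, topics):
    """One (model, prompt) pair per post; only the topic changes in the prompt."""
    return [(m, PROMPT_TEMPLATE.format(topic=t)) for t in topics for m in models]

jobs = build_jobs(MODELS, TOPICS)
print(len(jobs))  # 75
```

Because every model receives the identical prompt string for a given topic, any difference in citation behavior on that topic comes from the model and not from the wording.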

Generation ran on default settings: no custom instructions, no memory, no conversation history, one fresh chat per post. ChatGPT and Gemini ran signed out. Claude requires an account, so it ran on a fresh one with history cleared between posts. In default consumer mode every assistant could browse the web, and browsing is the most likely explanation for what we found.

It also explains why our fabricated rate sits so far below the figures you may have read elsewhere. Studies reporting that AI invents one in five or one in two of its references are measuring a different task: a model asked to produce a bibliography from memory, without browsing, will hallucinate references that never existed. Our prompt asked for a clickable URL in a mode where the assistant could go fetch one, and a model that can fetch a real page rarely bothers inventing a fake one. It reaches for the nearest authoritative page and attaches it without checking that the page states the number. Both results are honest measurements of different failures, and misattribution is the one that survives into a published post.

The checking harness was the shipped verification function behind the AI Citation Checker, the free tool embedded above. It fetches the live page, runs a deterministic search for the exact figure first, and falls back to a full language-model read of the text only when there's no literal match. Every verdict in this study is reproducible: paste the same claim and URL into the tool and you get the answer we recorded.
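The two-stage order described above, literal search first and model read only on a miss, can be sketched as follows. This is not the shipped function: `verify`, `stub_model_read`, the verdict strings, and the page text are all hypothetical, and the fetch step is left out so the example runs offline.

```python
from typing import Callable

def verify(figure: str, claim: str, page_text: str,
           model_read: Callable[[str, str], str]) -> str:
    """Literal search first; call the slower model read only on a miss."""
    if figure in page_text:
        return "supported"
    return model_read(claim, page_text)

def stub_model_read(claim: str, page_text: str) -> str:
    # Stand-in for a language-model read; a real one would interpret the text
    # and could also return a drift or wrong-page verdict.
    return "not_found_on_page"

# Invented page text for illustration only.
page = "Our survey found email returns $36 for every dollar spent."
print(verify("$36", "email returns $36 per dollar", page, stub_model_read))
print(verify("$42", "email returns $42 per dollar", page, stub_model_read))
```

Putting the deterministic check first keeps the common case cheap and repeatable. A production version would need stricter matching than a bare substring test, since "$42" also occurs inside "$420".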

Uncertainty stayed uncertainty. The 96 citations behind a login, paywall, or empty JavaScript shell couldn't be read, so we dropped them from the rate entirely. A study of citation accuracy that inflates "couldn't check" into "failed" would be committing the exact sin it's counting.
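The effect of that exclusion is plain arithmetic on figures already reported in this post. The "inflated" rate below is a counterfactual computed only for comparison; it is not a result of the study.

```python
total_citations = 679
unreadable = 96        # login, paywall, or empty JavaScript shell
hard_fails = 78 + 13   # misattributed + fabricated-or-dead

verified = total_citations - unreadable
reported_rate = hard_fails / verified
# Counterfactual: what the rate would look like if "couldn't check" counted as "failed".
inflated_rate = (hard_fails + unreadable) / total_citations

print(verified)                # 583
print(f"{reported_rate:.1%}")  # 15.6%
print(f"{inflated_rate:.1%}")  # 27.5%
```

Counting the unreadable pages as failures would have pushed the headline rate from 15.6% to 27.5% without a single extra page being read.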

This study is the AI-author half of a pair. Our citation provenance study and the State of Content Decay 2026 measure what human publishers cited over years; this one measures what the model cited in the second it wrote the draft.

Checking Citations Before You Publish

A citation that passes today can fail next quarter without a single edit on your end. The cited page gets revised, the figure gets updated, the URL gets retired, and your sentence keeps pointing at a source that stopped saying it. A pre-publish check closes the gap that opens the moment the AI writes the draft. A monitored page closes the one that opens afterward: when a source changes what it says, it flags the claims that cite it before a reader catches the mismatch.

That matters because misattributed citations travel. The $42 that was never on the page moves into a board deck, into a competitor's rebuttal, into a reader who repeats it in good faith with your name attached. One read settles the whole question: does the page state the number you cited? I run that read on everything now, and it takes seconds per claim.
