A blog cites a statistic. The link works. The page loads and returns 200 OK. The number it was supposed to support is not on the page. That is not a broken citation; it is a hollow one, and no link checker catches it, because nothing about the HTTP response is wrong.
We traced citation chain depth across 45 SaaS blog domains: how many hops separate a published claim from the source at the end of its reference trail. Of 2,505 linked citations followed to their terminal node, only 10.6% reached a primary source. The rest resolved to pages where the number had been removed, sat behind a bot wall or a login, or pointed to another secondary source that pointed to another that pointed to nothing.
That pattern is the rule, not the exception. Companion research finds that most SaaS blog claims are borrowed from third-party research and that, even when a borrowed claim carries an external link, almost one in four resolves to a dead, gated, or broken page. That work measured whether citations existed and whether they resolved. This citation provenance study measures where the chain ends when it does resolve.
How We Measured Citation Chain Depth Across 45 Domains
Citation chain depth is the number of hops between a published claim and the terminal source at the end of its reference trail. A blog post cites a number that links to a page. That page may cite its own source. Each link in the sequence is one hop. The chain ends when a page either names no deeper source or provides the original dataset.
We used LiquiChart's claim extraction pipeline to identify every monitorable claim in each post: statistical, comparative, source-citation, and temporal. Numbers, ratios, percentages, and dated assertions all qualified, since a "2024 survey" reference is as falsifiable as a "67%" reference. From there, the production tracer followed each chain from the citing post to its terminal node. The process was mechanical: follow the citation link, determine whether the cited page contains the asserted value or date, and if it does, check whether that page cites a deeper source.
Each hop adds one to the chain's depth. Each terminus gets classified. The census traced 2,505 linked third-party citations across 938 posts on 45 domains, every citation that carried a followable external link.
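The hop-counting step can be sketched in a few lines. This is a minimal illustration, not the production tracer: the `PageIndex` mapping, the `trace_chain` name, and the `max_hops` cap are all hypothetical, and the in-memory dictionary stands in for real page fetches.

```python
from typing import Optional

# Hypothetical stand-in for fetched pages:
# url -> (page shows the asserted value?, deeper source URL or None)
PageIndex = dict[str, tuple[bool, Optional[str]]]


def trace_chain(first_link: str, pages: PageIndex, max_hops: int = 10) -> list[str]:
    """Return the URLs visited in order; chain depth is len(result)."""
    chain: list[str] = []
    url: Optional[str] = first_link
    while url is not None and len(chain) < max_hops:
        if url in chain:
            # The trail loops back to a page already visited: stop.
            break
        chain.append(url)  # following this link is one hop
        # Unknown URLs are treated as pages that do not show the value.
        has_value, deeper = pages.get(url, (False, None))
        # Only look for a deeper source if this page actually shows the value.
        url = deeper if has_value else None
    return chain


pages = {
    "aggregator": (True, "report"),  # shows the number, cites a report
    "report": (True, None),          # shows the number, cites nothing deeper
}
print(len(trace_chain("aggregator", pages)))  # 2
```

The sketch stops as soon as a page lacks the value, which mirrors the rule above: a deeper source is only checked when the cited page contains the asserted value or date.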
Why 2,505? Most claims with a third-party origin carried no followable link at all. The 2,505 are the subset where a verifiable trail existed: a claim, a link, and at least one page to evaluate at the other end. First-party operational claims are out of scope. A service provider's own availability figure is the publisher speaking about itself, so there is no upstream chain to trace.
Terminal classifications:
- Primary source: the chain reaches the original dataset, study, or methodology.
- Claim not found: the cited page loads but the specific number is absent, or the page is the wrong one.
- Access blocked: the page exists but cannot be read because of a bot wall, a robots block, a login gate, a redirect loop, an unparsed PDF, or a client-side shell that never renders the value.
- Named without link: the cited page names a deeper source by title but provides no URL.
- Broken: the URL returns a 4xx or 5xx error.
- Circular: the chain loops back to a page already visited.
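One way to turn the six labels above into a decision rule is a fixed order of checks on the last page in a chain. The sketch below is an assumption-laden illustration: the `Observation` fields, the `classify` function, and especially the order of the checks are hypothetical and are not taken from the study's tracer.

```python
from dataclasses import dataclass
from enum import Enum


class Terminus(Enum):
    PRIMARY_SOURCE = "primary source"
    CLAIM_NOT_FOUND = "claim not found"
    ACCESS_BLOCKED = "access blocked"
    NAMED_WITHOUT_LINK = "named without link"
    BROKEN = "broken"
    CIRCULAR = "circular"


@dataclass
class Observation:
    """What a tracer might record about the last page in a chain (hypothetical)."""
    status: int             # HTTP status of the final response
    readable: bool          # False for bot walls, login gates, unparsed PDFs, empty shells
    has_claim: bool         # the asserted value or date appears in the page text
    is_original: bool       # the page publishes the dataset, study, or methodology
    revisited: bool = False  # the URL was already seen earlier in this chain


def classify(obs: Observation) -> Terminus:
    """Classify a terminal page. The precedence of checks is an assumption."""
    if obs.revisited:
        return Terminus.CIRCULAR
    if obs.status >= 400:
        return Terminus.BROKEN
    if not obs.readable:
        return Terminus.ACCESS_BLOCKED
    if not obs.has_claim:
        return Terminus.CLAIM_NOT_FOUND
    if obs.is_original:
        return Terminus.PRIMARY_SOURCE
    # Shows the number, is not the origin, and offers no deeper link to follow.
    return Terminus.NAMED_WITHOUT_LINK


print(classify(Observation(status=200, readable=True, has_claim=False, is_original=False)).value)
# claim not found
```

The ordering matters in practice. A bot wall that answers with a 403, for example, would land in "broken" under this sketch; a real tracer would need its own rule for that overlap.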
When Your Team Cites a Statistic, How Far Do You Trace It?
The 2,505 citations in this study represent thousands of editorial decisions about how far to follow a link. Before the results, a baseline: how far does your team follow them?
The difference between stopping at the linked page and tracing the number to its original data is the difference between a chain that stops at layer one and a chain that reaches the ground. Most teams have never articulated which standard they follow, so different writers on the same blog produce claims at different verification depths.
Where Citation Chains Actually End
The 2,505 chains terminated in six ways. The distribution was lopsided.
265 chains (10.6%) reached a primary source [9.4%, 11.8%]: the researcher's dataset, the government database, the original survey methodology. These are the chains where a reader could verify the claim against the data that produced it.
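The bracketed range reads like a confidence interval on the 10.6% share. The text does not say which method or confidence level produced it, so the following is only a consistency check under an assumption: a 95% normal-approximation (Wald) interval on 265 of 2,505 gives the same bounds. The `wald_interval` helper is illustrative.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for a proportion (z = 1.96 for roughly 95%)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


low, high = wald_interval(265, 2505)
print(f"{low:.1%} to {high:.1%}")  # 9.4% to 11.8%
```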
The largest category was absence. 1,128 chains (45.0%) resolved to a live page where the claimed number was not present, or where the page was not the one the claim described. The URL resolved. The page rendered. The statistic was gone or never there.
Some sources published a newer edition that dropped the figure. Others restructured the page and removed the relevant section. One chain in the dataset began at a claim about analytics data loss and pointed to a marketing agency's post that still returns 200 OK with the figure no longer anywhere on it. From the outside, nothing looks wrong: the link is blue, the page renders, and the number the post relies on is not on it.
741 chains (29.6%) hit a page that exists but cannot be read. This is the category a link checker is least equipped to surface, because the page is alive. The dataset breaks down the main causes, each as a share of all 2,505 chains: bot detection that blocks an automated reader (8.9%), a robots directive that disallows the path (6.6%), a redirect loop (5.6%), a client-side shell that renders no text to a fetch (5.1%), and an unparsed PDF (3.2%). One chain hopped through two secondary pages into an originating announcement that rendered a single word of body text to the tracer; the rest loads client-side, so no static fetch, search crawler, or citation auditor can confirm the value. The chain reaches the announcement, then walks into a wall.
267 chains (10.7%) named a source without linking to it: "according to a leading analyst firm" with no URL. 101 chains (4.0%) were broken links. Three (0.1%) were circular.
The 4.0% broken-link figure deserves attention because it is small, and it is the only failure mode existing link checkers catch. The other 85% of chains fail in ways a status check never surfaces: the value is gone, the page is the wrong one, the source sits behind a wall or never renders, or it was named with no link to check at all.
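The shares quoted in this section can be rechecked from the raw counts. The snippet below uses only the six counts reported above; the variable names are illustrative.

```python
counts = {
    "primary source": 265,
    "claim not found": 1128,
    "access blocked": 741,
    "named without link": 267,
    "broken": 101,
    "circular": 3,
}

total = sum(counts.values())  # 2505
shares = {label: n / total for label, n in counts.items()}

# Chains that neither reach a primary source nor return an error status.
missed = total - counts["primary source"] - counts["broken"]

for label, share in shares.items():
    print(f"{label:<20}{share:6.1%}")
print(f"fail without an error status: {missed / total:.1%}")  # 85.4%
```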
Mean Citation Chain Depth Is 1.08 Hops
The depth distribution across all 2,505 chains shows how shallow citation trails actually are.
Depth one: 2,361. Depth two: 102. Depth three: 36. Depth four: 6. There is no supply chain.
94% of citation chains in the dataset consist of a single hop. The blog post links to one page. That page is where the chain ends, regardless of whether it contains the number, names a deeper source, or originates the data.
The mean depth of 1.08 hops means almost nobody checks past the first link. One hop substitutes for verification. The handful of depth-three and depth-four chains were not more reliable for the extra length; the longest ones passed through several secondary sources before terminating at a gated report with no methodology section.
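The 1.08 figure is a weighted mean over the depth counts reported above, and the 94% figure is the depth-one share of the same total. A quick check, with illustrative variable names:

```python
depth_counts = {1: 2361, 2: 102, 3: 36, 4: 6}

total_chains = sum(depth_counts.values())                    # 2505
total_hops = sum(d * n for d, n in depth_counts.items())     # 2697

mean_depth = total_hops / total_chains
single_hop_share = depth_counts[1] / total_chains

print(f"mean depth: {mean_depth:.2f}")          # mean depth: 1.08
print(f"single hop: {single_hop_share:.0%}")    # single hop: 94%
```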
I have watched teams add a hyperlink to a claim and call it sourced. The link is the citation. The citation is the proof. Whether the page at the other end actually contains the original data is a question that almost never gets asked.
Aggregators and Gated Reports Terminate More Chains Than Broken Links Do
The single largest bloc of unreadable terminations was not dead links. It was live pages an automated reader cannot get into: research aggregators, consultancy reports, analyst portals, and industry-body PDFs. These compile data from other sources and sit behind registration walls, bot detection, or paywalls.
The pattern is structural. A SaaS blog cites a statistic. The citation links to an aggregator. The reader clicks through and hits a login wall, or a crawler hits a bot challenge. The chain dies at the gate. The data behind that wall may itself reference another source. Nobody on the citing team will ever know.
Every chain that terminates at a gated secondary source produces a zombie statistic: a number that circulates without a verifiable path to the research that generated it. The blog presents the number as fact. The intermediate source frames it as a data point. Neither page shows the methodology or the original dataset. The statistic persists because repetition replaces the verification step. Content teams treat a link to an aggregator as a link to primary research. It leads to a tollbooth.
Three Failure Modes That Look Like Due Diligence
The Stat That Disappeared
One chain began at a claim about how much analytics data is lost to cookie-consent denial. The citation pointed to a live, well-maintained marketing-agency post, but the figure was not on it. That is a frozen liability: a claim locked to a source that has moved on without it. A reader who clicks through finds the post but not the statistic. The number is orphaned: present downstream, absent upstream, with no system connecting the two. This was the single most common outcome in the dataset.
Named Without a URL: The Chain That Never Starts
267 chains (10.7%) ended at pages that named their sources without linking to them. The citing post points to a second page. The second page says "according to our research" and stops there. No URL follows the name. One chain started from a claim about daily search volume and led to a vendor post that mentioned businesses and creators by name but never linked to where the figure came from. The trail terminated at an authoritative domain that published the number without an upstream path. This failure mode passes both a dead-link check and a paywall check: the URL resolves, the page loads at an authoritative domain, and the reader takes the signal without scrolling far enough to notice the trail ends at prose.
Three Hops Into a Wall
One chain traced a platform user-count claim through two secondary pages into the originating announcement. The announcement returns 200 OK. It rendered exactly one word of body text to the tracer, because the rest loads client-side. Three hops, three pages, each presenting the number with the apparent authority of the source one link upstream. The value may exist behind the JavaScript, but no static fetch, search crawler, citation auditor, or training pipeline can confirm it. The chain reaches the announcement, then walks into a wall.
Citations Without Provenance Infrastructure Create an Illusion of Verification
Tracing blog sources at scale is a citation audit most teams have never run. Adding citations without provenance infrastructure creates an illusion of verification. The blog looks well-sourced. The links resolve. The reader assumes someone checked.
In the dataset, 89% of chains ended before reaching the research that generated the number. Of all third-party claims, only 7.4% trace to a primary source. A companion study across 45 domains found even refreshed posts carried a 5.3% stale claim rate, higher than the 3.9% in posts no one had touched.
Every unverified citation chain is content debt that accrues without a signal. The data behind the link changes and no system in the publishing workflow notices. Citation count tells you nothing about SaaS blog citation quality. Fifteen linked statistics with no verified path to original research carry the same provenance as five unsourced claims: zero.
The distinction matters for what Google's information gain score actually rewards. A post that traces a claim to its primary source adds information no other post in the SERP contains. A post that cites the same secondary source as ten competitors adds nothing. The chain's depth determines whether the citation contributes unique value or recycles an existing reference.
LiquiChart's claim-level monitoring infrastructure watches the upstream URL beyond its HTTP status. When a source removes a number from a page that still returns 200 OK, that change propagates to every post that cited it. The gap between link health and citation health is where published claims lose their foundation.
The teams building systems to detect when published data goes stale will find that most of their exposure lives in the 45% of chains that resolve to a page where the number used to be.