We cited a statistic from Orbit Media's annual blogging survey. The link worked. The page loaded. The number had been removed in their 2025 update. That is not a broken citation chain; it is a hollow one, and no monitoring tool catches it because the URL returns 200 OK.
We measured citation chain depth across 20 SaaS blog domains: how many hops separate a published claim from its original source. Only 23% of 125 traced chains reached a primary source. The rest ended at pages where the data had vanished, sat behind a paywall, or pointed to another secondary source that pointed to another secondary source that pointed to nothing.
That Orbit Media case was not unusual. In a companion study of the same corpus, 34% of SaaS blog claims cited no source at all, and among the claims that did include a citation, 80% could not be independently verified by a reader following the link. Those studies measured whether citations existed and whether they resolved. This citation provenance study measures where the chain ends when it does resolve.
How We Measured Citation Chain Depth Across 20 Domains
Citation chain depth is the number of hops between a published claim and the terminal source at the end of its reference trail. A blog post cites a number that links to a page. That page may cite its own source. Each link in the sequence is one hop. The chain ends when a page either names no deeper source or provides the original dataset.
We used LiquiChart's claim extraction pipeline to identify every statistical claim in each post. From there, we traced each chain from the citing post to its terminal node. The process was mechanical: follow the citation link, determine whether the cited page contains the number, and if it does, check whether that page cites a deeper source.
Each hop adds one to the chain's depth. Each terminus gets classified: primary source, claim not found, paywalled, named without a link, broken, or circular. The pipeline produced 125 clean chains from 941 extracted claims across 140 posts on 20 domains.
Why 125 from 941? Most claims had no citation at all. Others linked to pages with no statistical content. The 125 chains represent the subset where a verifiable trail existed: a claim, a link, and at least one page to evaluate at the other end.
Six terminal classifications, sketched in code after the list:
- Primary source: the chain reaches the original dataset, study, or methodology.
- Claim not found: the cited page loads but the specific number is absent.
- Paywalled: the chain terminates at a page requiring payment or gated access.
- Named without link: the post names a source by title but provides no URL.
- Broken: the URL returns a 4xx or 5xx error.
- Circular: the chain loops back to a page already visited.
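A minimal sketch of that tracing loop, assuming a hypothetical `fetch_page` that returns the fields the pipeline would extract (status, rendered text, gating, a deeper-source link). The hop counting and six-way classification follow the description above; this is not LiquiChart's actual code.

```python
from dataclasses import dataclass
from enum import Enum

class Terminus(Enum):
    PRIMARY_SOURCE = "primary source"
    CLAIM_NOT_FOUND = "claim not found"
    PAYWALLED = "paywalled"
    NAMED_WITHOUT_LINK = "named without link"
    BROKEN = "broken"
    CIRCULAR = "circular"

@dataclass
class Page:
    status: int                  # HTTP status code
    text: str                    # rendered page text
    is_gated: bool               # paywall or registration wall
    has_original_data: bool      # dataset or methodology lives here
    citation_url: str | None     # deeper source link, if the page has one

def fetch_page(url: str) -> Page:
    """Placeholder: a real pipeline renders the page and extracts these fields."""
    raise NotImplementedError

def trace_chain(claim_text: str, start_url: str) -> tuple[int, Terminus]:
    """Follow one citation trail and classify where it ends."""
    visited: set[str] = set()
    url, depth = start_url, 0
    while True:
        depth += 1                                 # one hop per link followed
        if url in visited:
            return depth, Terminus.CIRCULAR        # page A -> page B -> page A
        visited.add(url)

        page = fetch_page(url)
        if page.status >= 400:
            return depth, Terminus.BROKEN
        if page.is_gated:
            return depth, Terminus.PAYWALLED
        if claim_text not in page.text:
            return depth, Terminus.CLAIM_NOT_FOUND
        if page.has_original_data:
            return depth, Terminus.PRIMARY_SOURCE
        if page.citation_url is None:              # source named, no URL
            return depth, Terminus.NAMED_WITHOUT_LINK
        url = page.citation_url                    # follow the next hop
```

The `visited` set is what catches the circular case: a chain that loops back revisits a URL it has already seen and the trace ends there.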
When Your Team Cites a Statistic, How Far Do You Trace It?
The 125 chains in this study represent 125 editorial decisions about how far to follow a link. Before the results, a baseline: how far does your team follow them?
The difference between stopping at the first link that resolves and tracing to the original dataset is the difference between a chain that stops at layer one and a chain that reaches the ground. Most teams have never articulated which standard they follow, which means different writers on the same blog produce claims at different verification depths.
Most sourcing workflows end when the link resolves to a page that looks authoritative. Whether that page contains the original data or is itself citing someone else rarely enters the process. The result is a published claim that feels grounded but has no verified path to the number it states.
Where Citation Chains Actually End
The 125 chains terminated in six ways. The distribution was lopsided.
Twenty-nine chains (23%) reached a primary source: the researcher's dataset, the government database, the original survey methodology. These are the chains where a reader could verify the claim against the data that produced it.
The largest category was absence. Thirty-eight chains (30%) ended at a page where the claimed number was not present. The URL resolved. The page loaded. The statistic was gone.
Some sources had published a newer edition that dropped the figure. Others had restructured the page and removed the relevant section. From the outside, nothing looks wrong. The link is blue, the page renders, and the number the post relies on is nowhere on it.
Three chains from the dataset illustrate three distinct failure modes. A claim like "42% of posts had been updated since publication" traces to a source that restructured its page. A claim about "$6,000 per year in content debt" links to a report with no underlying methodology. A claim citing a "2.1% annual data decay rate" passes through three secondary sources before reaching a paywall. All three produce the same outcome: a published number with no verified path to the ground.
Paywalls stopped another 29 chains (23%). The terminal page exists, but the data sits behind a subscription gate. Seventeen chains (14%) named a source without linking to it: "according to Gartner" with no URL. Eight chains (6%) were broken links. Two (2%) were circular, with page A citing page B citing page A.
The 6% broken-link figure deserves attention because it is small. Broken links are the only failure mode that existing monitoring tools catch. The other 71% of non-primary terminations return 200 OK.
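The gap is easy to see in code. A minimal sketch using only the standard library, with a bare substring match standing in for real claim extraction:

```python
import urllib.request

def citation_health(url: str, cited_number: str) -> str:
    """Separate link health (does it load?) from citation health (is the number there?)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except Exception:
        return "broken"        # the only state a link monitor reports
    return "intact" if cited_number in body else "hollow"

# "hollow" is the 200 OK failure: the page renders, the number is gone.
# A status-code monitor maps "intact" and "hollow" to the same green check.
```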
Mean Citation Chain Depth Is 1.1 Hops
The depth distribution across all 125 chains shows how shallow citation trails actually are.
Depth one: 112. Depth two: 10. Depth three: 2. Depth four: 1. There is no supply chain.
Ninety percent of citation chains in the dataset consist of a single hop. The blog post links to one page. That page is where the chain ends, regardless of whether it contains the number, names a deeper source, or originates the data.
The mean depth of 1.1 hops means that for every ten chains, nine go one level deep and one goes two. The depth-three and depth-four chains were outliers. They were not more reliable for the extra length. The depth-four chain passed through three secondary sources before terminating at a paywalled report with no methodology section.
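The arithmetic checks out directly from the distribution:

```python
# Depth distribution across the 125 traced chains.
chains_at_depth = {1: 112, 2: 10, 3: 2, 4: 1}

total_chains = sum(chains_at_depth.values())                  # 125
total_hops = sum(d * n for d, n in chains_at_depth.items())   # 142

print(total_hops / total_chains)           # 1.136 -> the reported 1.1
print(chains_at_depth[1] / total_chains)   # 0.896 -> the ~90% single-hop share
```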
The dataset shows that almost nobody checks past the first link. One hop substitutes for verification.
I have watched teams add a hyperlink to a claim and call it sourced. The link is the citation. The citation is the proof. Whether the page at the other end actually contains the original data is a question that almost never gets asked.
Four Platforms Terminate More Chains Than Broken Links Do
Four platforms appeared at the terminus of multiple chains: G2, Statista, Capterra, and Gartner.
G2 appeared at the end of six chains. Statista at five. Capterra at five. Gartner at three. All four compile data from other sources. All four sit behind paywalls or gated registration.
The pattern is structural. A SaaS blog cites a statistic. The citation links to G2 or Statista. The reader clicks through and hits a login wall. The chain dies at the gate.
The data behind that wall may itself reference another source. Nobody on the citing team will ever know.
Every chain that terminates at a paywalled secondary source produces a zombie statistic: a number that circulates without a verifiable path to the research that generated it. The blog presents the number as fact. The intermediate source frames it as a data point. Neither page shows you the methodology or the original dataset. The statistic persists because repetition replaces the verification step.
These four platforms do what aggregators do: compile. Content teams treat a link to an aggregator as a link to primary research. It leads to a tollbooth.
Three Failure Modes That Look Like Due Diligence
Orbit Media: The Stat That Disappeared
One chain in the dataset began at a blog post citing "bloggers who use analytics are 2.5x more likely to report strong results" from Orbit Media's annual blogging survey. The URL pointed to the survey results page: live, authoritative, well maintained. The number had been removed in the 2025 edition.
The citation still points to a live page. The page no longer contains the number. That is a frozen liability: a claim locked to a source that has moved on without it. The citing post still publishes the number. A reader who clicks through finds the survey but not the statistic. The data has become orphaned data: present in the downstream post, absent from the upstream source, with no system connecting the two.
GitHub: 37 Claims, Zero Source URLs
One domain in the corpus published 37 statistical claims across seven posts. None included a source URL. Numbers were stated as fact: market size figures, adoption percentages, growth rates, all without attribution of any kind.
Zero chains. Thirty-seven claims published without a single link to follow.
This is misrepresentation through omission: the post presents data as fact without providing any path for the reader to verify it. The numbers may be accurate. They may come from reputable research. Without a link, a reader cannot check, and a monitoring system cannot track upstream changes. The claims exist in isolation, permanently frozen at whatever value the author typed.
The 2.1% Decay Chain: Three Hops to a Paywall
One chain traced a claim about a "2.1% annual data decay rate." The citing post linked to a marketing blog. That blog linked to a demand generation platform's report. That report linked to a third vendor's white paper. The white paper sat behind a registration wall with no methodology section visible.
Three hops. Three pages, each presenting the number as if they had produced it. Each included phrases like "our research shows" or "according to our data." Zero primary sources. The number traveled through three layers of citation, gaining apparent authority at each stop, with no researcher or dataset at the bottom.
[BlogCTA variant="scanner"]
Citations Without Provenance Infrastructure Create an Illusion of Verification
Tracing blog sources at scale is a citation audit most teams have never run. Adding citations without provenance infrastructure creates an illusion of verification. The blog looks well-sourced. The links resolve. The reader assumes someone checked.
In the dataset, 77% of chains ended before reaching the research that generated the number. Across the full corpus, even posts that had been updated carried a 14.9% stale data rate.
Every unverified citation chain is content debt that accrues silently. The data behind the link changes and no system in the publishing workflow notices. Citation count tells you nothing about SaaS blog citation quality. Fifteen linked statistics with no verified path to original research carry the same provenance as five unsourced claims: zero.
The distinction matters for what Google's information gain score actually rewards. A post that traces a claim to its primary source adds information no other post in the SERP contains. A post that cites the same secondary source as ten competitors adds nothing. The chain's depth determines whether the citation contributes unique value or recycles an existing reference.
LiquiChart's claim-level monitoring infrastructure watches the upstream URL beyond its HTTP status. When Orbit Media removes a number from a page that still returns 200 OK, that change propagates to every post that cited it. The gap between link health and citation health is where published claims lose their foundation.
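This is not LiquiChart's implementation, but the minimal structure that paragraph implies is an inverted index from upstream URL to the downstream claims that depend on it, re-checked on a schedule. A sketch with illustrative data (the URLs and claims are placeholders):

```python
from collections import defaultdict

# (citing post, claim text, upstream URL) -- one row per published citation.
citations = [
    ("/blog/post-a", "2.5x more likely", "https://example.com/survey"),
    ("/blog/post-b", "2.5x more likely", "https://example.com/survey"),
]

# Invert the table: upstream URL -> every downstream claim that depends on it.
dependents: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
for post, claim, source in citations:
    dependents[source].append((post, claim))

def recheck(fetch_text) -> None:
    """Re-fetch each upstream page; flag downstream claims it no longer contains."""
    for source, claims in dependents.items():
        body = fetch_text(source)   # assumed fetcher: url -> page text
        for post, claim in claims:
            if claim not in body:
                print(f"{post}: '{claim}' no longer present at {source}")
```

The inversion is the point: when one upstream page drops a number, the index surfaces every citing post at once instead of waiting for each to be audited separately.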
The teams building systems to detect when published data goes stale will find that most of their exposure lives in the 30% of chains that point to a page where the number used to be.
[BlogCTA variant="scanner"]