One in three data claims in SaaS blog posts cites no source at all.
We extracted 575 claims from 100 posts across 20 SaaS domains and classified each one by how it attributes its data. The breakdown: 34% are completely unsourced. 10% name a source but provide no link. 11% cite a third-party source by name and link to it. And 45% are first-party claims drawn from the company's own data.
That 34% is not a quality problem. It is a structural one.
These posts averaged 97 out of 100 on freshness scores. Only 1% of source URLs were dead. The claims are still unverifiable, because the sourcing step was never part of the publishing workflow.
Fresh content is not accurate content.
Freshness tells you when a post was last touched, not whether the numbers inside it can be checked.
The deficit was baked in at the point of creation. That is content debt that no freshness audit catches, and no amount of updating changes it. Every unsourced statistic is a claim the reader has to take on faith.
Thirty-four percent of the claims in this dataset ask exactly that.
How 575 Blog Claims Break Down by Attribution
The 575 claims fall into four categories.
45% of all claims are first-party. The company's own product data, its own survey, its own customer metrics. SaaS blogs are good at generating first-party data. The problem sits entirely on the third-party side, where the numbers are claims about reality that somebody else generated and you borrowed.
11% cite a third-party source by name and include a URL. These are the fully verifiable claims. A reader can click through, check the methodology, evaluate the finding. This is the floor for blog claim attribution that holds up under scrutiny.
10% name a source without linking to it. "According to Gartner" with no URL. "A McKinsey study found" with no path to the study. The claim carries a label but no verification path.
34% are unsourced statistics. No name. No link. No trail. A number appears in the prose as if it were common knowledge. "72% of B2B buyers prefer self-service." Says who? Published when? Based on what sample?
These are zombie statistics. Numbers that circulate from post to post with no provenance, no expiration, and no way for a reader to verify whether they were ever true. Repetition replaced provenance. Say a number enough times and the source stops mattering.
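A detector for this category does not need to judge truth, only to notice a number traveling without papers. Here is a minimal sketch of that heuristic, assuming a regex approach and a hypothetical shortlist of source names; it is illustrative, not the extraction engine described in the methodology below, and it omits first-party detection entirely:

```python
import re

# Illustrative heuristic only: not the actual extraction engine.
STAT = re.compile(r"\b\d{1,3}(?:\.\d+)?%")
LINK = re.compile(r"https?://\S+")
# Hypothetical shortlist; a real system needs a full catalog plus entity recognition.
SOURCE_NAMES = {"gartner", "mckinsey", "hubspot", "forrester", "statista"}

def attribution(sentence: str) -> str:
    """Classify a sentence containing a statistic by attribution type."""
    if not STAT.search(sentence):
        return "no-claim"
    if LINK.search(sentence):
        return "sourced-with-link"
    if any(name in sentence.lower() for name in SOURCE_NAMES):
        return "named-no-link"
    return "unsourced"

print(attribution("72% of B2B buyers prefer self-service."))  # unsourced
print(attribution("HubSpot reports that 60% of marketers prioritize blog content."))  # named-no-link
```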
The 45% first-party figure explains why most SaaS blogs feel well-sourced. Nearly half the numbers come from the company itself. That creates an impression of data-richness that masks the third-party gap underneath.
Why 80% of Third-Party Statistics Are Unsourced or Unverifiable
Of the 316 third-party claims in the dataset, only 64 have a verifiable link.
That is 20%.
The other 80% split into two groups. 58 claims name a source without providing a URL. 194 claims cite nothing at all. Combined, these 252 claims represent numbers that a reader cannot check without running a separate search.
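The arithmetic is easy to reproduce from the raw counts; a quick check in Python:

```python
third_party = {"sourced-with-link": 64, "named-no-link": 58, "unsourced": 194}
total_third_party = sum(third_party.values())          # 316
first_party = 575 - total_third_party                  # 259

print(f"{third_party['sourced-with-link'] / total_third_party:.0%}")  # 20%

for label, n in third_party.items():
    print(label, f"{n / 575:.0%}")                     # 11%, 10%, 34% of all 575 claims
print("first-party", f"{first_party / 575:.0%}")       # 45%
```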
Naming HubSpot is not citing HubSpot.
A sentence that says "HubSpot reports that 60% of marketers prioritize blog content" gives the reader a brand name and a statistic. It does not give them a path to the report, the year the data was collected, or the methodology behind it. The claim is orphaned data from the moment it is published.
It arrived without a verification path and will never acquire one unless someone goes back and adds the link by hand.
The gap between "sourced" and "verifiable" is where most blog claim attribution breaks down. Content teams believe they are citing their sources because they mention the name. The scan shows that naming without linking leaves 80% of third-party claims unverifiable. The reader has to trust the author. The author has to trust their memory of the original. And no one can confirm the number did not drift through three layers of repetition before it landed in the prose.
Your blog's ratio will not be far from this average.
The pattern held across 20 domains with very different editorial cultures.
The scan shows what ends up published. It does not show what happens during the writing. That part depends on the team.
Most content teams track publishing cadence, not claim attribution. The two require different infrastructure, and the gap between what teams intend and what they actually publish shows up at the claim level.
The unsourced statistics problem does not trace back to careless writers.
Every domain in the scan publishes regularly, updates frequently, and maintains high freshness scores. The sourcing gap exists because the workflow never included a sourcing step. Writers paste a statistic they remember reading, or they pull it from a previous post that also lacked a link to the original.
The number propagates. The provenance does not.
What 20 SaaS Domains Reveal About Editorial Culture
The 20 domains in the scan do not cluster around a single attribution norm. They spread across a spectrum that reveals editorial culture more than content quality.
The pattern is structural. Domains in the upper cluster write primarily about their own data. GitHub attributes 91% of its claims to first-party sources, with only 4% unsourced. Buffer runs at 84% first-party and 8% unsourced. Webflow sits at 82% first-party. These blogs make claims they can defend because the source is themselves.
The lower cluster tells a different story. OpenAI's scanned posts run 88% unsourced and only 9% first-party. Chargebee hits 79% unsourced. CircleCI lands at 71%. HubSpot, despite being a content marketing benchmark, shows 55% unsourced claims and only 8% first-party.
A high unsourced rate does not mean the numbers are wrong. It means a reader cannot verify them.
A post asserting that "73% of enterprise buyers conduct independent research before contacting sales" might be accurate. Without a link to the study, the reader has no way to know.
The spectrum separates two editorial cultures. One writes from its own data and keeps the provenance. The other borrows assertions from the broader industry and files nothing. Both produce content that ranks and reads well. Only one produces content where a reader can check the numbers.
The gap between Buffer's 84% first-party rate and OpenAI's 9% does not close with discipline. Both publish multiple times per week. The difference is infrastructure: whether the publishing system records provenance or leaves it to the writer.
Fresh Content Is Not Accurate Content
The average freshness score across all 100 posts in the scan: 97 out of 100.
And 34% of their claims cannot be verified.
These are independent variables. A post published last week with a perfect freshness score can contain five unsourced statistics with no trail back to their origin. A post from 2024 where every number links to a named, dated study can score low on freshness.
Freshness captures when content was touched. Attribution captures whether it can be checked.
Most content audits measure only the first.
That gap is freshness theater: teams update dates and check links, but the claims inside the post were never part of the audit. It showed up at every one of the 20 companies, however different their editorial operations.
Discipline is not the variable.
What changes the pattern is infrastructure that records provenance at the point of creation. When a writer adds a statistic and the system asks where it came from, that claim enters the world with a trail attached. When it does not, the claim enters as one more number the reader has to accept on faith.
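In its simplest form, that infrastructure is a claim record that refuses to exist without a source attached. A minimal sketch, assuming a Python dataclass; the schema and field names are illustrative, not LiquiChart's:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class SourcedClaim:
    """A statistic that cannot enter a draft without its provenance."""
    statement: str    # the sentence as it will appear in the post
    value: str        # the number being asserted, e.g. "72%"
    source_url: str   # required: the system rejects an empty trail
    source_name: str
    published: date   # when the source data was published
    recorded: date = field(default_factory=date.today)

    def __post_init__(self):
        if not self.source_url.startswith(("http://", "https://")):
            raise ValueError("claim rejected: no verification path")
```

The enforcement lives in the constructor: prose can still carry a half-remembered number, but nothing enters the claim store without a URL.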
AI-assisted content makes this worse.
When writers use LLMs to draft posts, the statistics in those drafts arrive with no provenance at all. The number sounds plausible. The source is nowhere.
Without a system that traces the citation supply chain from published claim back to primary source, there is no way to verify what is real, what is approximate, and what was hallucinated from training data. Any system that detects when published data goes stale can only work if the data was traceable in the first place.
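Once provenance exists, staleness detection reduces to a date comparison. A minimal sketch; the two-year threshold is an assumption, not a rule from the scan:

```python
from datetime import date

STALE_AFTER_DAYS = 730  # assumption: flag source data older than two years

def is_stale(source_published: date, today: date | None = None) -> bool:
    """True once the data behind a claim has aged past the threshold."""
    today = today or date.today()
    return (today - source_published).days > STALE_AFTER_DAYS

# Only possible when the publish date was captured at creation;
# an unsourced statistic has no date to check.
```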
Living content starts with knowing where each claim originated.
How We Built This Dataset
The scan used LiquiChart's claim extraction and source verification infrastructure. One hundred posts were fed through the same claim extraction engine that powers the Content Health Scanner. Each post was parsed for statistical assertions, and each assertion was classified by attribution type: first-party, sourced with a link, sourced without a link, or completely unsourced. The tool did not judge quality. It recorded provenance.
We selected 20 SaaS domains with active blogs and RSS feeds and pulled the most recent posts from each feed. A 28-rule programmatic filter excluded statistics roundups, annual benchmark compilations, product announcements, and quote-heavy listicles. After filtering, every remaining title was reviewed by hand, and 36 more posts were excluded for containing fewer than two data claims or for relying entirely on first-party metrics. The five most recent qualifying posts from each domain made the final set.
The result: 100 posts that represent ordinary, argumentative SaaS blog content. Posts that make a case, support it with data, and try to move the reader toward a conclusion.
Each claim was classified by a single question: can a reader verify this number? First-party claims count as verifiable because the company is the source. Linked third-party claims count as verifiable because the reader can click through. Named-no-link and unsourced claims are unverifiable because the path ends at the prose.
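That question collapses into a four-entry lookup. A sketch of the mapping, using the category labels from this piece:

```python
# Can a reader verify this number? One boolean per attribution type.
VERIFIABLE = {
    "first-party": True,         # the company is the source
    "sourced-with-link": True,   # the reader can click through
    "named-no-link": False,      # a label, but no path to the study
    "unsourced": False,          # no name, no link, no trail
}

def verifiable_share(claims: list[str]) -> float:
    """Fraction of claims a reader could actually check."""
    return sum(VERIFIABLE[c] for c in claims) / len(claims)
```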
Source URLs were checked via HTTP HEAD requests for availability. Source page freshness was extracted from metadata where present. The full methodology, including filter rules and classification criteria, is published for reproducibility.
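The availability check is one request per URL. A minimal version using the requests library; the timeout and redirect settings are our assumptions, not the scan's exact configuration:

```python
import requests

def url_alive(url: str) -> bool:
    """HEAD request: True if the source URL still resolves."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        return resp.status_code < 400
    except requests.RequestException:
        return False
```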
The gap this scan reveals is structural. The posts are fresh. The statistics are recent. The sources were missing the day the post went live.
No amount of editorial discipline closes that gap at the scale most SaaS teams publish. It closes when the system that publishes a claim also records where the claim came from.
We ran this scan on 100 posts from 20 SaaS blogs. The Content Health Scanner runs the same extraction on any URL. Run it on one of yours.