Paste your best-performing post into a scanner and it lists the claims you forgot were in there. A stat from a 2023 report. A "leading vendor" figure you can no longer place. A percentage that felt solid when you published and has sat untouched ever since.
The post still reads fine, which is exactly why you stopped checking it. What drifted is not the prose; it is the data underneath, and re-reading the post will never surface that.
This is the gap every automated and AI content audit walks into. Point one at your back catalog and it re-checks titles, meta tags, broken links, and thin pages in minutes. The newer ones score your copy and tell you which posts read old. Every one of those checks resolves against your own page, which is why a machine can run them cold.
The number you borrowed from someone else's research is the exception. It moves when they move it, and your page never does, which is the freshness problem underneath all of this. Automating a content audit means deciding whether the audit is going to watch that number, or keep auditing everything around it.
What an Automated Content Audit Checks, and What It Misses
An automated content audit is a system that inventories your published posts, pulls each cited statistic out as a separate claim, and re-verifies whether that number still holds at the source it came from. The older meaning of the same term is a crawler that re-checks the page wrapper (titles, meta, broken links, thin pages) fast and at scale.
Those are the rows that live inside your own page, so a machine resolves them without ever leaving your domain. The cited figure is the row that does not. It lives on someone else's page, and the audit that never leaves your domain cannot see it move.
This is the part a thorough crawl structurally misses. You can run the most complete wrapper audit on the market, clear every flag, and still be publishing a number that stopped being true a year ago. The page passed because the page was never the thing that changed. The data underneath it was, which is what content decay actually is: the claims a post carries falling out of step with the world while the words sit still. A wrapper audit was built to inspect the words, and it was never pointed at the data.
What AI Automates Well in a Content Audit
An AI content audit is good at exactly the parts of this that are mechanical and high-volume. It reads every post in a catalog and lifts out each statistical claim, the step a person quits doing somewhere around the fortieth post. It fires an HTTP request at every cited source and reports which ones still resolve. It reads the publish date on a source and tells you when the data you leaned on is three years old. It watches a page and detects when that page changes.
None of it requires judgment, and all of it is the labor that collapses by hand once a catalog gets large.
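A first pass at claim extraction can be as blunt as a pattern match. The sketch below is a minimal illustration, assuming a "statistical claim" is any sentence containing a percentage, a multiplier like 3x, or a comma-grouped number; `extract_stat_claims` and its patterns are hypothetical, and a production extractor (or an AI model) would catch far more phrasings, such as "one in five".

```python
import re

# Hypothetical definition of a statistic: a percentage, a multiplier such as
# "3x", or a number with thousands separators. Bare years like "2021" are
# deliberately not matched.
STAT_PATTERN = re.compile(
    r"\d+(?:\.\d+)?\s?%"          # 42%, 13.1 %
    r"|\b\d+(?:\.\d+)?x\b"        # 3x, 2.5x
    r"|\b\d{1,3}(?:,\d{3})+\b"    # 4,907
)

# Naive sentence splitter: break after ., ! or ? when followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_stat_claims(post_text: str) -> list:
    """Return every sentence in the post that contains at least one statistic."""
    sentences = SENTENCE_SPLIT.split(post_text.strip())
    return [s for s in sentences if STAT_PATTERN.search(s)]


if __name__ == "__main__":
    post = ("Churn fell 42% last year. Our team grew. "
            "Adoption is 3x higher than in 2021. We surveyed 4,907 users.")
    for claim in extract_stat_claims(post):
        print(claim)
```

Each returned sentence becomes one row to verify; the sentence without a number ("Our team grew.") never enters the queue.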
What it cannot do is decide what to do next. Extracting claims, re-checking sources, flagging what moved: the machine does all of that well, and none of it is the rewrite.
The rewrite needs a person, because deciding whether a stale number gets a new source, a hedge, or a deletion is an editorial call that depends on what the post is arguing. The AI does not judge intent. It does not weigh whether a claim is load-bearing. It does not publish onto your live page.
Treating it as a verification engine is the honest framing. It also explains why a manual audit keeps missing source decay even when the person running it is careful: re-fetching every cited source for every post is exactly the labor that does not scale for a human, and exactly the labor a machine was built for. The first time you watch the Content Health Scanner do that pass, the division of work becomes obvious. The machine finds. You decide.
How to Automate the Audit a Spreadsheet Cannot Run
Automating the part that matters comes down to three steps, and only the middle one is the reason any of this is worth automating.
First, inventory the catalog and extract the claims. Every post in, every cited statistic out, each one its own row with the source it points to attached. This is the inventory a spreadsheet pretends to be and never is, because a spreadsheet holds the posts you remembered to add and the claims you remembered were in them.
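The inventory step can be sketched with Python's standard-library `html.parser`. This is a minimal sketch under loose assumptions: each `<p>` counts as one candidate claim, the first link inside it is treated as its source, and `is_claim` is whatever predicate you use to decide a paragraph carries a statistic. `ClaimRow`, `ParagraphLinkParser`, and `inventory` are hypothetical names, not part of any product.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Optional


@dataclass
class ClaimRow:
    post_url: str
    claim_text: str
    source_url: Optional[str]  # None when the claim links to nothing a reader can follow


class ParagraphLinkParser(HTMLParser):
    """Collect each <p> as (visible text, href of the first <a> inside it)."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._text = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._text, self._href = True, [], None
        elif tag == "a" and self._in_p and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append(("".join(self._text).strip(), self._href))
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._text.append(data)


def inventory(post_url: str, html: str, is_claim: Callable[[str], bool]) -> List[ClaimRow]:
    """One row per claim-bearing paragraph, with its cited source attached."""
    parser = ParagraphLinkParser()
    parser.feed(html)
    parser.close()
    return [
        ClaimRow(post_url, text, href)
        for text, href in parser.paragraphs
        if is_claim(text)
    ]
```

A claim with no link still gets a row, with `source_url` set to `None`; that row is the unfollowable citation a spreadsheet tends to leave out.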
Second, re-verify each source. This is the step a spreadsheet cannot run at all. For every claim, the audit fires an HTTP HEAD at the source to check it still resolves, and reads the source's own date to score how old the data underneath the claim has become.
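Here is a minimal sketch of that re-verification step using only the standard library's `urllib.request`. The status buckets (`resolves`, `gated`, `dead`, `broken`) and the `data_age_years` helper are illustrative assumptions, not a description of how any particular tool scores sources. Reading the source's own date is left out, because where that date lives varies page by page.

```python
import urllib.error
import urllib.request
from datetime import date


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse, assumed source state."""
    if 200 <= status < 300:
        return "resolves"
    if status in (401, 402, 403):
        return "gated"      # login wall, paywall, or forbidden
    if status in (404, 410):
        return "dead"
    return "broken"         # 5xx, 405, and anything else unexpected


def head_status(url: str, timeout: float = 10.0) -> str:
    """Send an HTTP HEAD request and classify the answer."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:   # non-2xx answers arrive as exceptions
        return classify_status(err.code)
    except OSError:                          # DNS failure, refused connection, timeout
        return "broken"


def data_age_years(source_date: date, today: date) -> float:
    """Age of the data behind a claim, in years, from the source's own date."""
    return (today - source_date).days / 365.25
```

Some servers answer HEAD with 405 Method Not Allowed even when the page is fine. This sketch files that under `broken`; a real checker would retry with GET before calling a source dead.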
A link checker is happy the moment a URL returns a 200. That 200 proves the page loaded. It does not prove the page still says what you said it said. The figure you cited is the one part of your post that someone else can edit, and they do not send you a notice when they do.
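Checking that the page still says what you cited means fetching its text and looking for the figure itself. A minimal sketch, assuming you already have the source page as plain text. `figure_still_present` is a hypothetical helper, and an exact-match test like this misses rewordings (a figure restated as "one in eight") and cannot tell whether a matching number still refers to the same thing.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace (including non-breaking spaces) and glue % to its number."""
    text = text.replace("\u00a0", " ")
    text = re.sub(r"\s+%", "%", text)
    return re.sub(r"\s+", " ", text)


def figure_still_present(page_text: str, figure: str) -> bool:
    """True if the cited figure appears on the page as a standalone number."""
    page = normalize(page_text)
    fig = normalize(figure).strip()
    # Reject matches glued to other digits, so "3.1%" does not match inside
    # "13.1%" and "13" does not match inside "13.1".
    pattern = r"(?<![\d.])" + re.escape(fig) + r"(?!\.?\d)"
    return re.search(pattern, page) is not None
```

A 200 plus a `False` here is the case a link checker reports as healthy: the page loaded, and your number is no longer on it.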
Third, surface what moved into a review queue. Stale claims, dead sources, figures whose underlying data has aged past the point you would still stand behind, all collected in one place for a person to act on. The detection is automated end to end. The decision is not.
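The third step is a filter, not an editor: it collects flagged claims and stops. A minimal sketch, assuming each claim has already been checked for source status, data age, and whether the figure still appears at the source. The two-year threshold and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_DATA_AGE_YEARS = 2.0  # hypothetical: data older than this needs a second look


@dataclass
class CheckedClaim:
    claim_text: str
    source_status: str               # e.g. "resolves", "dead", "gated", "broken"
    data_age_years: Optional[float]  # None when the source shows no readable date
    figure_found: bool               # does the cited figure still appear at the source?


@dataclass
class ReviewItem:
    claim: CheckedClaim
    reasons: List[str] = field(default_factory=list)


def build_review_queue(claims: List[CheckedClaim]) -> List[ReviewItem]:
    """Collect every claim with at least one problem; decide nothing about the fix."""
    queue = []
    for claim in claims:
        reasons = []
        if claim.source_status != "resolves":
            reasons.append(f"source {claim.source_status}")
        elif not claim.figure_found:
            reasons.append("figure not found at source")
        if claim.data_age_years is not None and claim.data_age_years >= MAX_DATA_AGE_YEARS:
            reasons.append(f"data {claim.data_age_years:.1f} years old")
        if reasons:
            queue.append(ReviewItem(claim, reasons))
    return queue
```

Each item carries its reasons so the person working the queue sees why it is there; what to do about it is left entirely to them.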
Run the data-verification layer of your audit on a single URL with the Content Health Scanner. Paste your best-performing post and it does what the spreadsheet never could: it surfaces the dead sources, the ones now behind a login or paywall you can no longer read, and the share of your citations that resolve to a page you cannot verify. That last number is the one a crawl structurally cannot produce, because the figure you borrowed from someone else's research changes when they change it, and your page never moves.
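The summary numbers a single-URL scan reports can be tallied from per-citation checks. This is a sketch of one way to do it, not the Content Health Scanner's actual logic. It assumes each citation has been reduced to a `(status, figure_found)` pair and counts a citation as unverifiable when its source does not resolve or the figure no longer appears there.

```python
from collections import Counter
from typing import Dict, List, Tuple


def citation_summary(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """results holds one (status, figure_found) pair per citation in the post."""
    counts = Counter(status for status, _ in results)
    unverifiable = sum(
        1 for status, found in results if status != "resolves" or not found
    )
    total = len(results)
    return {
        "dead": counts["dead"],
        "gated": counts["gated"],
        "unverifiable_share": unverifiable / total if total else 0.0,
    }
```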
The scanner is the on-demand pass. The AI's job here is extraction and verification, not writing, and I drew that line on purpose. It tells you which claims exist and which sources moved. It does not decide the rewrite, and it never touches your live post. The anonymous run gives you one scan a day; a free workspace raises that to three, and a paid one higher still.
This is the whole wedge in one move. Run it on the post you are proudest of. The one you stopped checking because it kept ranking. There is a reason that post is the most likely to be carrying a number that no longer holds: it has had the most time to drift, and the least scrutiny, because nothing about its ranking ever told you to look. The audit you actually want is the framework you already know with one column added: a content audit checklist that checks your data instead of stopping at the page.
What to Fix First, at the Claim Level
The worst case is the post that is still ranking, still pulling traffic, and carrying a cited number that has gone stale. It is the worst case precisely because nothing flags it. A dead page drops. A broken link gets reported. A post that ranks well on a figure that stopped being true keeps sending confident readers to a wrong number, at volume, with the search engine's endorsement behind it.
So triage by traffic, but do it at the claim level. The thing you act on is the stale claim inside the post. The post around it can wait. Fixing that claim is a smaller, faster, more defensible move than refreshing the whole thing, and it is the move that actually closes the gap.
Refreshing a post does not reach its claims. You can rewrite every paragraph, bump the date, and leave the borrowed number underneath exactly as wrong as it was. When you rank the queue, sort by data age, not traffic alone: the oldest cited data is where the risk concentrates, and a high-traffic post on old data is the row that should move first.
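One way to express "data age first, traffic second" as a sort key, assuming each queued claim carries the age of its source data and its post's monthly traffic. Bucketing age into whole years is a hypothetical choice so that traffic only breaks ties between claims of similar age; the field names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class StaleClaim:
    post_url: str
    claim_text: str
    data_age_years: float
    monthly_visits: int


def triage(queue: List[StaleClaim]) -> List[StaleClaim]:
    """Oldest data first; within the same whole-year age bucket, busiest post first."""
    return sorted(
        queue,
        key=lambda c: (-math.floor(c.data_age_years), -c.monthly_visits),
    )
```

Sorting by traffic alone would put a post whose data is six months old at the top of the queue; with this key it drops below every claim resting on older data.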
Run Two Clocks, a Quarterly Pass and a Continuous Monitor
How Often Should You Automate a Content Audit
An audit pass is something you run. Treat quarterly as the ceiling, not the floor, with an annual pass fine for a smaller catalog (the genre consensus lands there too). But the cited sources behind your posts change on a second clock, one that has nothing to do with your calendar.
That second clock is the one Monitored Pages was built to watch. Enroll the source URLs your claims cite and LiquiChart re-checks them on a recurring schedule, daily or weekly, propagating staleness to any claim that depends on a page once that page changes. The scan you run on demand and the monitor that runs without you are two surfaces of the same content maintenance infrastructure.
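The propagation idea (one source changes, every claim that depends on it goes stale) can be sketched with a content fingerprint per source URL. This is an illustration under stated assumptions, not how Monitored Pages is implemented: `fetch` is any callable returning a page's text, the first pass only records a baseline, and hashing the whole page will also fire on cosmetic edits, so a real monitor would likely narrow what it hashes.

```python
import hashlib
from typing import Callable, Dict, List, Set


def fingerprint(page_text: str) -> str:
    """Hash the page text with whitespace collapsed, so whitespace-only edits don't count."""
    return hashlib.sha256(" ".join(page_text.split()).encode("utf-8")).hexdigest()


def monitor_pass(
    fetch: Callable[[str], str],
    last_seen: Dict[str, str],
    claims_by_source: Dict[str, List[str]],
) -> Set[str]:
    """Run one scheduled pass; return the IDs of claims whose source changed.

    last_seen maps source URL -> fingerprint and is updated in place.
    """
    stale: Set[str] = set()
    for url, claim_ids in claims_by_source.items():
        current = fingerprint(fetch(url))
        previous = last_seen.get(url)
        if previous is not None and previous != current:
            stale.update(claim_ids)  # propagate the change to every dependent claim
        last_seen[url] = current
    return stale
```

Because claims are keyed by source, one changed page marks every claim that leaned on it, across however many posts cite it.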
A schedule tells you when you last looked. It can never tell you when the number moved.
When the monitor catches a source that has shifted, it does not edit your post. It surfaces the affected claim as a recommendation you approve. Detection and re-verification are automated end to end; the fix stays yours. Nothing publishes onto your live pages without your sign-off.
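The sign-off gate is simple to state in code: a recommendation starts as pending, and the publish step refuses to run until a person flips it to approved. A minimal sketch with hypothetical names; `publish` stands in for whatever actually writes to your CMS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Recommendation:
    claim_id: str
    current_text: str
    proposed_text: str
    status: Status = Status.PENDING  # nothing is approved by default


def approve(rec: Recommendation) -> None:
    """The human step: meant to be called only by a reviewer."""
    rec.status = Status.APPROVED


def apply_if_approved(rec: Recommendation, publish: Callable[[str, str], None]) -> bool:
    """Publish the proposed text only after sign-off; return whether anything was published."""
    if rec.status is not Status.APPROVED:
        return False
    publish(rec.claim_id, rec.proposed_text)
    return True
```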
That gate is the reason you can let the detection run without you and not worry about what it will do to content you spent years ranking. It is the one part of this I will not automate away. This is the same split the staleness layer is built around: catch the drift on its own clock, hold the correction for a person. What the monitor watches is the cited source itself, the page your number was borrowed from, so a change there reaches every claim that leaned on it.
Before you settle on an interval, answer the question the interval is supposed to solve. The honest version of it has nothing to do with how often you mean to look. It asks when you actually last checked whether the data in your best post still holds.
A scheduled audit runs when you remember to run it. A continuous monitor re-checks the cited sources on its own schedule and flags a change whether or not anyone is looking. Whatever interval you settle on, the question underneath it is whether anything but a person's memory is set to notice when a borrowed number moves.
The interval only governs how often a person reads the post again. The sources behind it keep changing in the spaces between passes, which is the gap an audit you schedule cannot close and a monitor you leave running can.
There is real evidence the gap is wide. When we scanned 938 SaaS blog posts, the figure I keep coming back to is how fast the rot compounds: about a quarter of the posts that cite data carry numbers two or more years out of date, and the aged share climbs from 2.3% on posts under a year old to 13.1% on posts two to three years old.
It compounds with how few of those numbers were ever followable in the first place. Across 4,907 third-party citations, only 30% carried an external link a reader could follow, and nearly one in five of those links was dead, gated, or broken on re-check. In a separate trace of linked citations, only 15.0% reached a primary source at all, and a third resolved to a live page where the claimed number was simply gone.
Most of what your catalog cites was borrowed to begin with: in the same corpus, 73% of claims are someone else's data rather than your own measurement. A borrowed number is a number you do not control, on a page you do not own, on a clock you were never synced to.
The curve has the same shape as the risk: the longer a post has been live, the more of its borrowed numbers have had time to move at sources you never watched.
The Audit You Want Watches the Claim, Not the Page
You can automate the audit only if the automation watches the claim itself, the one row that lives on someone else's page. Everything else, the links, the meta, the thin-content flags, automates cleanly because the answer was always inside your own post. The cited figure was never inside your post. It was on loan, and the loan came due without telling you.
Watch the claim instead of the page and the back catalog stops being a liability you re-inspect on a calendar. It becomes content that tells you, on its own, the day one of its facts stops being true.