The Internet Archive is not "in peril". Stop doing its PR, and start doing some journalism
There's a simple solution to the Wayback Machine's complaints, which it could implement itself. Why doesn't anyone ask it?
As a journalist—and that’s the label I’ll give myself, as someone who views their general mission as being to spread useful information, and to ask difficult questions of people who are obfuscating—let me first say, as many other journalists will, that the Internet Archive (aka the Wayback Machine) has been invaluable many times for tracking down lost content, or finding when content has been suspiciously or accidentally deleted, or for making comparisons between now and some previous period. It’s truly one of the marvels of the internet age, along with Wikipedia, as a store of knowledge.
However, like any organisation, the Archive isn’t above criticism. And I think it deserves criticism for its latest attempts to paint itself as the wronged party, “in peril” (to quote a Wired headline) from news organisations who are blocking its indexing because they don’t want AI scrapers accessing their content via the Wayback Machine. Rather than giving their content up to AI companies for free and without any control, those organisations want to control how their content is used by AI companies, and whether they get paid for it. This shouldn’t be controversial: they’re the content creators, they own it, so how they monetise it—paywall, advertising, charging AI companies to index it—is their own decision. But if it all goes into the Archive, and the Archive doesn’t block AI scrapers, then the news organisations lose that leverage over the AI companies.
Ergo: a number of them, such as The Guardian and The New York Times, plus Reddit, are blocking the Archive’s scraper from indexing their content, or limiting what it gets. (There’s an excellent primer on this from January at Nieman Lab, by Andrew Deck and Hanaa’ Tameez. Highly recommend reading it.)
This has made the people at the Archive very unhappy, and they are wailing about it online to anyone who will listen—as long as those listening don’t ask the obvious question: why don’t you just block the AI scrapers, so the news organisations (and Reddit) feel confident in letting you index their content again?
Apparently this question is too complex for some journalists to even consider. Take David Sirota, who has signed an open letter to which the Archive has been pointing journalists (at https://www.savethearchive.com/journalists/) and written a Substack post about it. He pointed me to that post when I challenged him on X, where he had said this:
Yeah, totally agree David, that would be bad. But it isn’t happening. Media giants are not threatening the Archive. There is no lawsuit, unlike the book publishers during the pandemic (see below), which really did threaten its existence.
When I challenged him on this, he pointed me to the post linked above.
Let me say that I am offended by him saying I’m being “angry and self-righteous without reading literally anything”. In The Overspill, I’ve linked over the past few months to articles about the Archive counting the cost of its mistaken ebook lending scheme (November 2025), Wayback Machine director Mark Graham’s self-pitying Techdirt article about news organisations’ concern about AI scraping (February 2026) and Kate Knibbs’s almost perfect article at Wired last week on the topic. My only quibbles with Knibbs’s piece are that the headline says the Archive is “in peril”, which it absolutely is not [1], and that she appears never to have asked the Archive “why don’t you block AI scrapers, then?” But of course, for Sirota to assume he must know more than anyone else is internet praxis, I guess.
So looking at his post (reading time: 1 minute) we find this:
The moves [by news organisations to block the Archive’s scraper] aren’t malevolent - these outlets are worried that Big Tech will exploit the archive to allow artificial intelligence to appropriate intellectual property and spit it back to users without any credit or links back to the original source. As the founder and editor of a news outlet that often has our work scraped, pilfered and pawned off as someone else’s, I believe that’s a legitimate concern.
But that concern cannot justify blocking the Internet Archive’s crucial preservation work.
So the media organisations have legitimate concerns which would affect their future monetisation plans and might have far-reaching effects on their existence, but they should set those aside for an organisation that has the power itself to stop all this by blocking AI scrapers. I do not understand people who can’t see that if the media organisations can block the Archive’s scrapers (and, one takes it, unauthorised AI scrapers), then the Archive could do exactly that too.
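For anyone who thinks blocking named crawlers is some exotic engineering feat, here is a minimal sketch of the standard mechanism, a robots.txt file, checked with Python's built-in parser. The crawler names and the example URL are illustrative assumptions on my part, not a description of what the Archive or any publisher actually serves. Note too that robots.txt is advisory: it only stops crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The user-agent tokens below are illustrative
# assumptions, not a claim about any real site's configuration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-abiding crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/web/2024/some-article"  # made-up URL
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        print(agent, allowed(agent, page))
```

A crawler that ignores the file is not stopped by any of this; dealing with those needs server-side blocking, which is a separate and harder job.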
Nobody seems to have asked this question. Or when they do, it makes the people at the Archive very uncomfortable, and they say things like (in the Nieman Lab article) “if publishers limit libraries, like the Internet Archive, then the public will have less access to the historical record.” Yes, this is self-evident. It also doesn’t answer the question.
Sirota is proud of having signed the open letter. Let me say first that I think open letters with multiple signatories are stupid virtue signalling which almost never achieve any of their aims, and often serve only to make the signers look like idiots when new information comes to light. The best you can hope for if you’re the signatory of an open letter is that everyone forgets it [2]. (For more on why they’re bad, read this; gift link.) A number of internet-associated journalists have signed it. Good for you, folks. Who do you think is going to care, exactly? Who’s it being presented to for urgent action? And what is the letter saying on the crucial topic, anyway?
After some throat-clearing about how wonderful the Wayback Machine and the Archive are, we get this paragraph in the letter text:
We also recognize the efforts the Internet Archive is making to respect journalism as our profession grapples with the impacts of AI. Generative AI can provide complete narrative answers to user search queries about world events, which has significant implications for the field of journalism. We are thankful that the Internet Archive itself proactively partners with news organizations, and does not engage in paywall circumvention or irresponsible scraping. They value the work of journalists, and it shows in the care that they take to preserve it with integrity.
OK! That’s great! But where does it say that the IA is blocking or intends to block AI scrapers, which is the principal beef here? I assure you this is not selective quotation; that’s the only paragraph that mentions AI. Furthermore, the letter doesn’t anywhere explain how much of the Wayback Machine’s daily or weekly scraping is affected by these news organisations’ move. As a journalist, wouldn’t you think that before you stick your name on something public like this open letter, you’d actually trouble to find out what the complaint is about, and how serious it is? Is it 5% of what gets indexed? 1%? 50%? Who knows?
This called for some journalism. So I asked Mark Graham, mentioned above as the director of the Wayback Machine, as he was on X spreading his message.
If the Archive isn’t a backdoor for AI scraping, then it seems odd that publishers are complaining that it allows exactly that. Graham responded to my question. Unfortunately, there was also a spectator who had input:
I did ask Graham where the downside was for the Archive in acceding to news organisations’ desire that it should block the AI scrapers. He didn’t respond.
The reality is that the Archive has absolutely zero leverage here. It can’t threaten the news organisations with anything, because it’s entirely reliant on their goodwill for access. It has no bargaining power at all in this situation. Which is why it has instead gone for a scare-the-horses PR campaign among journalists who don’t ask too many difficult questions, and an open letter hosted by a friendly organisation at a URL which implies that the Archive is somehow in danger of vanishing beneath the Pacific ocean waves.
In fact, the Archive should probably count itself lucky that news organisations haven’t sought a way to sue it, after nearly coming seriously unstuck with its previous brilliant wheeze to save the internet. When the pandemic hit, it made an ill-advised decision to broaden an ebook lending scheme that used to be a one-to-one system—a single person could “read” the ebook, and then had to check it in, and then another person could “take it out”—to a one (ebook) to many system, where lots of people could read the same ebook at once.
The IA’s argument was that all these people had lost their access to physical libraries because the pandemic stopped physical visits. Publishers, unsurprisingly, felt this was essentially piracy under a different name, and sued. They’d already been edgy about the one-to-one “lending” system, but this was too much. The Internet Archive insisted the scheme was “fair use” and “public interest”. The courts, including appeal courts, disagreed in a September 2024 decision. The point wasn’t really whether the IA’s unlimited lending hurt ebook sales (though you could certainly argue that at the margin, it must have: some people who couldn’t download a book would have bought it), but whether organisations could decide how their copyrighted content is used.
The damages could have totalled $400m—enough to wipe out the Archive. (Its assets were about $10m in 2024, down from a high of $16m in 2023; its annual revenues are about $28m). The publishers settled for a much lower sum—one might guess it’s single-digit millions plus their legal costs—but the Archive had to wipe 500,000 ebooks from its library. You want something that put the Archive in peril? The Archive put itself in peril. The news organisations are being pussycats by comparison.
There’s a tendency on the internet, as in real life, to view things or people we like as perfect in every way, which is why the discovery of feet of clay takes some people by surprise. Wikipedia is fantastically useful; but the embedded bureaucracy creates gigantic flaws which prevent accuracy in some topics. The Internet Archive is fantastically useful, but exaggerates the problems it faces or gets over its skis in its confidence about copyright law—and now, it seems, about whether it can get media companies to listen to it about AI scraping.
So, journalists: first, please don’t sign open letters. And second, if someone tells you that you have to sign because something is “in peril” or “endangered”, remember your sacred duty is not to believe them. Instead, your role model is the guy at the top.
• You can buy Social Warming in paperback, hardback or ebook via One World Publications, or order it through your friendly local bookstore. Or listen to me read it on Audible.
You could also sign up for The Overspill, a daily list of links with short extracts and brief commentary on things I find interesting in tech, science, medicine, politics and any other topic that takes my fancy.
[1] Knibbs almost certainly didn’t write the headline; that’s the job of subeditors, normally.
[2] Notably, the “Six open letters that changed the world” all have sole authors.






Sigh ... I gave up on this politics, as it was very bad for my life. And the Internet Archive does not need me to defend them. But - they are, overall, as a whole, not wrong. Their very existence is walking the tightrope of the actual full quote by Stewart Brand:
"Information wants to be free because it has become so cheap to distribute, copy, and recombine - too cheap to meter. It wants to be expensive because it can be immeasurably valuable to the recipient. That tension will not go away. It leads to endless wrenching debate about price, copyright, 'intellectual property', the moral rightness of casual distribution, because each round of new devices makes the tension worse, not better."
And here we are, exactly, on "the moral rightness of casual distribution" - and the effects of scaling it.
I suspect they view "why don't you block AI scrapers" as a poisoned chalice. It gets into perverse situations where "good netizens" respect the block, but "bad netizens" don't care. And there are a lot of outlaw scrapers around now. Even worse, the "good netizens" are run by some of IA's powerful friends. They'll have a morass of trying to chase blocks, which is just a losing game because they don't have the resources to do it well. When they fail - and they will - it'll mean more attacks on them, where they'll end up being kicked every time someone wants a quick story ("The IA claims it's blocking AI scrapers, BUT IT'S NOT!!!").
I'm not a lawyer, and the following is pure speculation, take it for what it's worth: I wouldn't be surprised if IA had gotten legal advice that they should start out asserting "fair use" very strongly, because the alternative of trying to block will *absolutely counter-intuitively* open them up to accusations of bad faith negotiating.
No easy answers here, and everyone is arguably "doing what they have to do".