
I don't care if web content is AI-generated

The author argues that despite common complaints about AI-generated content being inaccurate, inauthentic, and undisclosed, human content suffers from similar issues. All online content should be judged on its own merits, and AI may eventually become a trusted source. The piece reflects on the 'Gutenberg parenthesis' and the need to relearn information evaluation skills.

Source: Hacker News AI; Author: speckx

Why I don’t really care if web content is AI-generated

I’ve noticed people taking an interest in the “small net” or “small web” because they’re fed up with the amount of material on the mainstream web which is generated by large language models (LLMs) and the like. They complain about “AI slop”, which I take to mean poor-quality, low-effort, unoriginal content.

There is a lot of this kind of material on the mainstream web; it’s hard to be sure exactly how much, because authorship is often concealed. “Slop” is now starting to appear on the “small” web, too – so it might not always be the refuge it once was. But that’s a matter for another day.

I confess to a visceral distaste for AI-generated content, but I can’t really explain why. I’ve noticed, though, that many other people who find AI troubling also can’t explain why. When they try, the most common complaints I hear are that

AI-generated content might be inaccurate, or

it’s “inauthentic”, or

it’s not clear that AI was used in the authoring.

Let’s consider these contentions, one by one.

Is AI-generated content inaccurate?

Well, yes, it often is. But a lot of human-generated content is inaccurate, too. Social media, in particular, carries a lot of poorly researched material that lacks nuance. Sometimes there is deliberate, wanton misinformation. LLMs can produce false and misleading information – but only because they’ve been trained on such material. So far as I know, an LLM can’t (yet) act in bad faith – immorality requires the special creativity of the human mind.

LLMs are actually quite good at aggregating and summarizing large volumes of information. It’s true that they lack discernment; that is, they’re unable to distinguish fine shades of meaning. But, frankly, many people are not sufficiently analytical in their reading, either. Critical assessment of information isn’t taught widely enough, from a young enough age, but it should be.

In short, while there is a lot of “AI slop” around, there’s an awful lot of “human slop”, too – along with deliberate human mischief-making and sometimes outright wickedness.

Is AI-generated content inauthentic?

I recently read on a web forum a moving account of a woman’s heroic – and ultimately unsuccessful – struggle with cancer, written by her husband of thirty years. Almost before my tears had dried, somebody else had posted that the same text appeared under different names in at least three different places, and was “AI slop”. People with a nose for that kind of thing seem to be able to recognize the signs, even though I did not.

What makes this sort of thing offensive? This AI-generated story was true, in the sense that it was an aggregate of real human experiences, as derived from the LLM’s training data. There wasn’t a real, named human being undergoing the pain and loss but, sadly, there are such people. If the story had been written directly by a human, I doubt the writer would be somebody I knew personally, who could confirm that all the details were true (even if I were heartless enough to ask).

In short, the impact of this story on me, personally, would have been the same, whoever or whatever wrote it. So why is there a problem if it’s AI-generated?

Fundamentally, I think the issue is that we don’t like to be made to sympathize with somebody who isn’t real. Or rather, we do like that: we like it in fiction. We just don’t like it when the story purports to be factual. It takes emotional energy to empathise. Why it does, when the writer is a complete stranger to us, is something I’m still struggling to understand.

I struggle even more to understand why people claim AI-generated content is inauthentic even when it’s fiction: fiction is inauthentic by its very nature, whoever writes it. I doubt that Charles Dickens ever had the experience of waiting in line to be guillotined, but we don’t criticise him for writing as if he had, because A Tale of Two Cities is a work of fiction. I’d argue that being able to live another person’s experiences is a key skill for a novelist, whatever those experiences are.

Of course, an LLM doesn’t have any experiences – it’s a machine. But what an LLM can do is to aggregate the experiences of human writers. The results of this process might be a bit bland, but they’re no less authentic than any other work of fiction.

Does it matter whether we know AI was used?

It certainly matters to many people. I suspect these people want to know that AI was used because then they don’t have to read the content. If you think it’s likely to be inaccurate or “inauthentic”, you’ll certainly save time by skipping over it.

Some people won’t read AI-generated content as a matter of principle, whether it might be useful or not. If you think AI is taking people’s jobs, for example, I can see why you might eschew it completely. I’m sure AI is taking people’s jobs. Database company Oracle announced earlier this week that it was laying off 10% of its workforce, citing “increased uptake of AI” as the reason. Whether AI will create a compensating number of new job roles remains to be seen.

Whether I read the AI-generated summary of this morning’s news stories, or skip it and search for something more human, I doubt it will change the uptake of AI in the IT industry. Nevertheless, I can understand why opponents of AI might not want to be complicit in this hostile takeover.

So it seems to me that it’s just common courtesy for a publisher to indicate how AI was involved, given that many people seem to want to know.

But it doesn’t really matter much to me.

Why I don’t care

The complaints about AI in web-based publishing seem to me to have at least some merit. So why do I find that I’m not particularly worried?

For my part, I divide all on-line content into two classes: material written by a known individual (or organization) that I trust, and everything else, in a ratio of about 1:1000. In the “everything else” category I find that I don’t really distinguish between human and AI-generated content: everything falls to be assessed on its own merits. Some AI-generated content is informative and entertaining, though not much of it; the same applies to human-generated content. I think it’s a grave error to assume that human-generated content is a priori more likely to be accurate than AI content: very few individuals produce material that I trust without detailed scrutiny, and it takes a long time for any writer to make it onto my trustworthy list.

It seems plausible to me that the category of “known individuals I trust” might one day encompass AI agents. If the AI consistently produces information that turns out to be true, useful, or entertaining, I can see how I might one day accept it as an “individual” for practical purposes. That won’t happen easily, but I don’t easily trust human writers, either.

We’re living at the end of the “Gutenberg parenthesis”. This was the time period – short in terms of human history – during which information was managed institutionally. Until the invention of the printing press, information was largely transmitted orally, and held communally. Printing allowed information to be centralized – exploited, some might say – by large institutions that could afford printing. These institutions had strong financial incentives to print stuff that was essentially accurate. In my youth, “I read it in a book” was more-or-less a knock-down argument that some fact was true. Before we realized the way the Internet was going, we carelessly extended the same reasoning to web-based content, whose authors had no particular incentive to share only truthful information.

The Internet has made information communal again, but we haven’t yet (re-)learned the skills we need to handle this kind of information properly. It’s no longer obvious which sources can be trusted, and we don’t yet have robust methods to find out.

As a result, scepticism about all on-line information is justified, whether it’s the product of the human mind or a machine.

In short, I’m not worried about AI-generated web content because we need to learn to think about information in pre-Gutenberg terms, with or without AI.
