
Can AI produce writing that we want to read?

The article investigates the current state and limitations of AI writing. Through experiments, the author finds that while AI can imitate famous authors' styles, it struggles to generate vivid scenes with active characters. The piece also discusses the controversy around a Granta story suspected to be AI-generated.

Source: Hacker News AI | Author: streptomycin

When I consider the original question of this series—whether my nine-year-old daughter will go to college—I find myself wondering whether she will actually struggle through the writing process in that old-fashioned way. Readers will always want literature written by humans, but, for everything else—e-mails, advertising copy, legal briefs, student papers—the resistance to A.I.-generated writing will almost certainly slip as technology improves and it becomes functionally impossible to see the difference between writing by a person and writing by a machine. When that happens, the major incentive that educators hold over students—“I will fail you if you cheat”—will disappear, because there will simply be no way to know.

With that in mind, I want to take a step back from the implications of A.I. for higher education and ask a more fundamental question: How far are we from that moment? Right now, I believe it’s still easy for people to spot obvious examples of A.I. writing. A professor who reads hundreds of papers and has a decent grasp of her students’ writing ability can recognize the fakes. A manager who starts getting tidy, bullet-pointed, and mostly cheery e-mails from her employees will rightly suspect that robots have autocompleted their messages. Robot writing is also frequently filled with tells: copious em dashes, “not X but Y” constructions, conspicuous verbs (“delve” comes to mind). But those tells generally show up only in Claude’s most rudimentary outputs. What about the kind of prose that we actually want to read? Can Claude produce that?

This question, or some version of it, was asked by thousands of enraged readers during the past couple of weeks, after the literary magazine Granta published a Commonwealth Prize-winning story by a writer named Jamir Nazir that seemed to bear all the hallmarks of A.I. writing. People noted the strange recurrence of the word “hum,” for instance, and, especially, the awkward, constipated metaphors that didn’t make much sense. The publisher of Granta then put out a bizarrely ambivalent statement, concluding that “perhaps we never will know” whether A.I. had written the story. Nazir, for his part, rebutted the allegation. A whole bunch of writers screamed that the end times had arrived, or, less persuasively, insisted that the reason A.I. writing could win the Commonwealth Prize was that literary fiction was in such a bad place. (Is literary fiction better or worse today than it was twenty or thirty or forty years ago? I have no idea, but I do know that every generation of writers has made more or less the same complaint.)

Using Claude, I vibe-coded a simple game that presented roughly two hundred words of text and asked the player whether it was written by a human or generated by A.I. The sample texts all came from Project Gutenberg, an online library of public-domain literature; I asked the robots to scan through works by writers including George Eliot, James Joyce, Ernest Hemingway, and Arthur Conan Doyle and come up with passages in their respective styles. The robot would then display the results and let me and a few of my friends guess whether each was the real deal or a fabrication.

The test rounds were fairly easy. The A.I. writing had tells, including formatting and punctuation problems, and an overreliance on tortured similes and metaphors. A.I. also had a weird habit of making its characters fidget constantly, always running a finger along the edge of a table or adjusting a collar. The most reliable marker, though, was something more abstract, and, I suppose, upon reflection, even a little spooky. The scenes generated by A.I. had characters, but, apart from fidgeting, they mostly did nothing. Consider this passage that Claude generated in the style of Henry Fielding:

Sophia, who had hitherto said very little, now looked towards her father with an expression which Mr. Western could not well interpret, whether as entreaty or reproach and indeed it is probable she scarce knew herself what she meant by it. Jones stood near the window, and had the appearance of a man waiting to hear his sentence pronounced. Western, for his part, had by this time recovered something of his usual bluster, and began again upon the subject of Blifil, commending his estate and his family with great earnestness, as though these considerations alone ought to have settled the matter long since. He spoke of Allworthy’s approval with particular force, repeating the name two or three times, as if that name carried an authority which no reasonable person could withstand. Sophia said nothing to this, but she turned away towards the fireplace, where a small coal fire was burning, though the afternoon was not cold enough to have required one.

There is very little action and no certainty. Sophia doesn’t say much, and Mr. Western can’t interpret her expression, which she herself does not fully understand. And, after Western says his piece, which is described with both an “as if” and an “as though” clause, Sophia doesn’t respond, and looks to the fireplace that is burning a pointless flame. In early rounds, the people I shared such deadened passages with immediately assumed that they were fake, even if the robots had done a decent job of approximating a given writer’s style.

For the next couple of days, I chatted with Claude about how to get rid of these tells. I told it to avoid similes and to cut down on such words as “nowhere” and “something,” which tended to betray its odd, core ambivalence. For a while, Claude kept spitting out the same inert passages, in which Jay Gatsby or Sherlock Holmes did a whole lot of nothing and had no opinion about the very little that was happening around them.

I told Claude that it wasn’t doing a very good job of unlearning its bad habits, and suggested that it create another agent to scan through the fakes and catch any mistakes it made. A third agent made notes with instructions on how best to imitate each author. I imagined these as cue cards that the agent would hold up to make sure everyone remembered to make Dorothea Brooke actually do something. Here’s a sampling of the rules, which I had no part in writing—these are Claude’s instructions to itself regarding how to mimic each author’s style. (I have included only a few; there were typically about ten instructions in each “Does” and “Does Not” category.)

ERNEST HEMINGWAY

DOES:

Strings short declarative sentences with “and” as the primary connective tissue, creating forward momentum

Strips dialogue tags to bare “he said / she said”; rarely uses adverbs or action beats on the same line

Places weather or landscape as a flat factual sentence, not a framed observation (“The sun was over the hills”)

DOES NOT:

Never uses subordinate clause stacking or periodic sentences that withhold the main verb

Avoids Latinate or polysyllabic vocabulary (“illuminated,” “nevertheless,” “subsequently”)

Never attributes interior thought through free indirect discourse or italicized reflection

Never names or explains what a character is feeling directly (“he felt sad,” “she was afraid”)

GEORGE ELIOT

DOES:

Builds long, architecturally balanced sentences with multiple embedded subordinate clauses joined by semicolons or colons

Introduces characters with a brief sociological or class-placing phrase before the name arrives (“a man of some fifty years, whose . . .”)

DOES NOT:

Never uses sentence fragments for emphasis or rhythm

Avoids present-tense narration; everything moves in past tense with controlled retrospect

Never uses colloquial or American idiom; no contractions in narration
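The cue cards above are plain-language notes, but a small subset of them could in principle be checked mechanically. The Python sketch below is only an illustration of that idea, not the tooling used in the experiment: `StyleCard`, `find_tells`, `VAGUE_WORDS`, and `SIMILE_PATTERN` are hypothetical names, the banned words are the three Hemingway examples quoted above, and the regular expression is a crude stand-in for real simile detection.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StyleCard:
    """One author's cue card (hypothetical structure, for illustration only)."""
    author: str
    does: list[str] = field(default_factory=list)
    does_not: list[str] = field(default_factory=list)
    # Assumed machine-checkable subset of the "does not" rules.
    banned_words: set[str] = field(default_factory=set)


# Vague words the essay says were discouraged; treating them as a fixed set is an assumption.
VAGUE_WORDS = {"nowhere", "something"}

# Rough proxy for similes: "as if", "as though", "like a", "like an".
SIMILE_PATTERN = re.compile(r"\b(?:as if|as though|like an?)\b", re.IGNORECASE)


def find_tells(text: str, card: StyleCard) -> list[str]:
    """Return a list of flags for words and constructions the card discourages."""
    flags = []
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Flag vague words and the author's banned vocabulary, in alphabetical order.
    for word in sorted((VAGUE_WORDS | card.banned_words) & words):
        flags.append(f"flagged word: {word}")
    # Flag simile-like constructions in the order they appear.
    for match in SIMILE_PATTERN.finditer(text):
        flags.append(f"possible simile: {match.group(0).lower()}")
    return flags


hemingway = StyleCard(
    author="Ernest Hemingway",
    does=["Strings short declarative sentences with 'and'"],
    does_not=["Avoids Latinate or polysyllabic vocabulary"],
    banned_words={"illuminated", "nevertheless", "subsequently"},
)

# Invented test sentence, not a passage from the game.
sample = "Nevertheless the sun was over the hills, as if something had illuminated them."
for flag in find_tells(sample, hemingway):
    print(flag)
```

On the invented sample sentence, the sketch flags the three listed words and the “as if” construction. A check like this can only catch surface tells; it says nothing about whether the characters in a passage actually do anything.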

Multiplying the robot workforce and reminding the bot of its task seemed to work, at least in part. (When I asked a friend who teaches computer science and machine learning at U.C. Berkeley why the robots needed other robots to check their work, he replied, “One hundred percent serious answer: No one knows.”) The similes went away. But Claude took some of the new directives a bit too seriously; suddenly, every fake passage was filled with characters hopping on a horse, or delivering an important package, or running. This, for whatever reason, led to very short sentences that were easy for people to spot as fake. So I loosened the rules a bit, and let Claude do its usual thing, with a handful of strict rules about vague words and similes.

After a few days of testing, I posted a link to the test on my X account. Within five days, I had more than thirty thousand responses. The people who took the test were able to identify a real passage versus a fake one roughly fifty-two per cent of the time—which might be another way of saying that they couldn’t actually distinguish the two. But roughly ten per cent of players seemed good at the game, whether because they had prior knowledge of the original material or a particularly keen eye for A.I. tics that I still don’t recognize.

By this point, I had figured out how to make slightly better fakes. I deployed another A.I. employee and had it double-check both samples for tells. And, by the end of the week, I was fooling more than half of the people who played the game. The sample that tricked the most people came from a robot Bram Stoker. Only seventeen per cent of players were able to discern that it was fake.

4 May. I have spent the greater part of this morning at the window of my room and have given myself up to a course of reflection which I had hoped to avoid by means of constant activity, but which the absence of any occupation in this place has finally rendered unavoidable. The Count was last seen by me, so far as I can with certainty assert, on the evening of the second; and his absence has now extended through two nights and the better part of three days. I do not believe that he has left the castle. The horses are in the stable. The great door at the foot of the south stairs has been locked from within since Tuesday. I have walked the corridors of the three lower floors twice each night and have heard no sound but the wind in the chimney of the hall. And yet I am convinced, by a means I cannot explain, that he has been in some part of the castle during the whole of this interval, and that he has known of my walks.

What struck me was that, although this is definitely a better facsimile of Bram Stoker than earlier iterations of the game included, it still describes absence and stasis. The narrator is trying to avoid a “course of reflection” through “constant activity,” but can’t find enough to do to occupy his mind. The Count is nowhere to be found, leaving the narrator to walk through empty corridors where he hears “no sound but the wind in the chimney of the hall.” Not all of the fake samples contained this degree of emptiness, but a sufficient number did to suggest that, though Claude can generate imitations of famous public-domain authors—ones that are good enough to fool the vast majority of even discerning readers, though not all of them—it still can’t reliably have those characters do much of anything. No amount of additional cue cards or feedback could fix this problem; the second I asked it to make things more active, the stunted and more easily identifiable A.I. prose kicked in again.

I hesitate to claim that this is the great tell, because it sounds, well, far too literary, or even corny—I am a bit too bashful to fully indulge in what it might mean that the robots cannot quite bring a scene to life. I will leave that to the poets and the anti-clankers. My only humble submission in this dialogue: the art of fiction relies, in heavy measure, on the reader accepting these descriptive, atmospheric passages that Claude seems to favor as what the literary critic James Wood has called “a camera’s easy swipe.” Wood has argued that an author’s choices, both big and small, always push up through the surface. A.I. makes choices, too, not by drawing on its personal reveries about, say, a street in Paris at dusk but rat
