It Still Can't Do My Job: Four Years of Moving Goalposts (2022–2026)
This article traces four years of moving goalposts around AI coding, from November 2022 to 2026. Each time AI cleared a milestone (writing a simple Snake game, passing exams, building real products, handling production code), skeptics raised the bar and found a new reason to say "it still can't do my job." The piece ends with forecasts of the goalposts still to come.
It Still Can't Do My Job
Four years of moving goalposts, with receipts
I started keeping notes in December 2022, mostly to document why the panic was overblown. The notes turned into this. The quotes in orange boxes are real. You can look them up. The gray comments are paraphrased from a few thousand comment sections. You know the ones. You may have written some. I did.
November 2022
The party trick
ChatGPT launches on a Wednesday. By the weekend it has a million users and my whole feed is screenshots of it apologizing for code that doesn't compile. It invents functions. It hallucinates whole APIs. I asked it for Snake, the game you write in an afternoon as a teenager. It gave me a snake that ate itself on move one. Five days in, Stack Overflow bans it:
"Because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site."
Stack Overflow temporary policy, December 5, 2022
The verdict was easy, and it was also mine: a stochastic parrot that learned to sound like a senior dev without ever meeting a compiler.
The goalpost
Call me when it stops making things up. It can't even do Snake.
March 2023
The exam season
GPT-4 ships. One prompt now gets you a working Snake. The same game it face-planted on four months earlier. The comment sections adjust instantly and never slow down.
Meanwhile the party trick starts passing exams. OpenAI claims the bar exam at the 90th percentile. Microsoft researchers publish a paper called "Sparks of Artificial General Intelligence". A real paper, with that real title. To be fair, the skeptics landed punches here. A later re-evaluation put the bar exam closer to the 60th percentile, and around the 48th among people who actually passed. Both sides were flinging numbers. Only one side was flinging them at a thing that kept improving.
The goalpost
Toy scripts and exams aren't engineering. Call me when it builds something real. A proper game, say. In 3D.
March 2024
The staged demo
A startup called Cognition announces Devin, "the first AI software engineer". The demo video is everywhere for a week. A month later a veteran developer named Carl Brown (YouTube channel: Internet of Bugs) goes through it almost frame by frame. The impressive parts were curated. Devin didn't do the Upwork task from the demo. It generated its own errors, then heroically fixed them. The skeptics take a well-earned victory lap. I watched the takedown twice. It felt great.
Earlier that year, the CEO of Nvidia stands on a stage in Dubai:
"It is our job to create computing technologies that nobody has to program, and that the programming language is human. Everybody in the world is now a programmer."
Jensen Huang, World Governments Summit, February 2024
Nobody I know quit programming that year. But everybody I know quietly installed Copilot.
The goalpost
Demos are staged. Call me when real developers use this for real work, daily.
October 2024
The earnings call
"More than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers."
Sundar Pichai, Alphabet earnings call, October 2024
The comment sections don't blink. That's just autocomplete acceptance metrics. Boilerplate doesn't count. Half of it is import statements. And fine, some of it probably is. But "a quarter of Google" is a strange thing to keep calling a party trick.
The goalpost
Generating lines isn't the job. Call me when it takes a ticket and ships the feature.
February 2025
The vibes
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."
Andrej Karpathy, February 2, 2025
Three weeks later Pieter Levels prompts a multiplayer 3D flight simulator into existence. It takes him about three hours. He has zero gamedev experience. He puts it online at fly.pieter.com. Remember the 2023 goalpost? A proper game, in 3D? Here it is. It sells $29.99 fighter jets and blimp ads to real customers, and he claims a $1M annual run rate within seventeen days. The comment sections know exactly what to do.
Same season: Zuckerberg tells Joe Rogan that Meta expects AI that codes like a "midlevel engineer" within the year. Dario Amodei says AI may be writing 90 percent of code within six months. And vibe coding grows its own disaster genre. Leaked API keys. Wide-open databases. "My app got hacked and I don't know where to look" postmortems. The seniors are unimpressed, and they have receipts. The slop is real. The security holes are very real.
The goalpost
Toys and prototypes, sure. Call me when it touches production and survives.
July 2025
The month the skeptics were right
A research group called METR takes sixteen experienced open-source developers, gives them AI tools on their own mature repos, and measures. The developers are 19 percent slower with AI. They believed they'd been 20 percent faster. Even after seeing the clock. The comment sections feast, and they've earned it. Best day the skeptics had since Devin.
Same month: OpenAI and Google DeepMind both hit gold at the International Math Olympiad. Five problems out of six, solved in plain language, inside the human time limit. Both things are true at once. That's the part nobody wants to sit with.
The goalpost
For one month, nobody had to move anything.
July 2026
Now
Agents run for hours unattended. They open pull requests. The pull requests get merged. Some of you reviewed one this week without noticing. Stack Overflow's question volume is back to where it was when I learned to code. Not because the questions got answered. Because nobody asks a forum anymore.
Maybe the current goalposts hold. I'd just point out that every entry above held too. For about eighteen months each.
The goalpost
Call me when it handles our legacy codebase. When it can be held accountable. When it knows what to build, not just how.
YOU ARE HERE
Nothing below this line has happened yet. It's a guess. Laugh freely. People laughed at the top half of this page too, and I have the screenshots.
~2027 (forecast)
The one-shot game, for real this time
One prompt returns a polished, playable open-world game. Coherent art direction. Tuned physics. Working multiplayer. A soundtrack. Not a floaty tech demo. Something your kid plays for a month.
The goalpost
Remixing isn't creating. Call me when it makes something genuinely new.
~2028 (forecast)
The legacy codebase
An agent digests a fifteen-year-old monolith. The one with the cron job held together by a comment that says "do not remove". It maps the undocumented business rules and refactors the whole thing over a quarter, tests green the whole way. The big goalpost falls quietly on a Tuesday.
The goalpost
Call me when it owns a system end to end. Pager and all.
~2030 (forecast)
The pager
The on-call rotation is a model. Incidents open, get diagnosed, get fixed, and get post-mortemed before any human wakes up. Uptime improves. The people this replaced point out, correctly, that keeping systems alive was never the hard part. By now that's a lot of us.
The goalpost
Call me when it comes up with the idea.
~2033 (forecast)
The founder
An AI notices an unmet need, builds the product, finds the customers, and runs the company to a billion-dollar valuation with zero employees. The final think-piece comes out that same week. The argument is airtight: it's still just fancy autocomplete.
The goalpost
Call me.
The goalpost graveyard
It can't even write Snake. (2022–2023)
Snake is simple bro. It's just memorizing tutorials and benchmarks. (2023–2024)
The demos are staged. No real developer will use it. (2024)
It can't build a real product. A 3D game, say. (2023–2025)
It can't take a ticket and ship the feature. (2024–2025)
Prototypes only. Never production. (2025–2026)
It can't handle a legacy codebase. It can't be accountable. It doesn't know what to build. (current occupants)
It has no soul. It can't want things. It doesn't drink craft beer. (plots reserved)
AI "IQ" at the start of the story: 64 for ChatGPT (GPT-3.5). Mensa-style test scores. Not science. A vibe with a number attached.