2026-06-18 | Rewritten on site | 6 min read | Updated: 2026-06-18

Six months of AI in 2026, and a whole lot of noise

A review of AI developments from January to June 2026, highlighting key model releases, the shift from chatbots to agents, cost concerns with token-based pricing, and industry calls for a pause.

Source: Hacker News AI | Author: jtnl

Looking back at the AI news from January to June 2026, the feeling I'm left with is pretty clear: things went off in every direction. Plenty of innovation, but not always a lot of control.

Almost every week, a new model shows up promising to "change everything." GPT this, Claude that, some Chinese model beating everyone on a benchmark nobody had heard of the week before, video generation at levels we'd never seen.

And then, this week, something we all saw coming but nobody really wanted to hear finally happened: two of the players setting the pace in this AI race started talking about slowing down, even hitting pause.

It's worth stopping for a second to look at what actually happened.

What actually happened between January and June 2026

Setting aside all the noise, here are some of the events that actually matter, in order.

Late January: Moonshot releases Kimi K2.5, an open-weight multimodal model whose "Agent Swarm" can coordinate up to 100 sub-agents. Meanwhile, GPT-5.2, released back in December 2025, was still one of the main yardsticks new models got measured against.

February 5: Anthropic releases Claude Opus 4.6, followed a few days later by Claude Sonnet 4.6.

Mid-February: Google unveils Gemini 3.1 Pro, doubling down on its push for models that are stronger at reasoning, coding, and long tasks.

Late February: Perplexity, needing to make a move, launches Computer, an agentic system that splits tasks across several specialized models and hooks into outside apps.

February 27: according to Reuters, Donald Trump orders a gradual phase-out of Anthropic's technology across U.S. federal agencies, after a dispute over certain military uses.

The Pentagon goes so far as to label Anthropic a "supply chain risk." Panic sets in, but Anthropic holds its ground, which earns it a lot of admiration. And since, in the land of business, someone always steps into an opening, OpenAI announces a Pentagon deal right on its heels.

March 5: OpenAI launches GPT-5.4, with native computer control, a context window of up to 1 million tokens via the API (the standard window being 272,000 tokens), plus Codex's capabilities rolled into a single model. The real news isn't just the size of the context window. The model can drive a browser and a desktop environment on its own.

This is the shift from "the chatbot that answers" to "the agent that actually does things." OpenAI also reports 33% fewer hallucinations than GPT-5.2.

April 2: Google unveils Gemma 4, its new family of open models under the Apache 2.0 license, built for advanced reasoning, agents, and running on your own hardware. A genuinely interesting family.

April 16: Anthropic releases Claude Opus 4.7, with a new tokenizer that can produce up to 35% more tokens for the same text (between 1.0 and 1.35 times as many, depending on the content).

April 20: Moonshot releases Kimi K2.6, with a 256,000-token context window and an expanded Agent Swarm that can coordinate up to 300 sub-agents.

April 23 and 24: OpenAI introduces GPT-5.5 and, almost at the same time, DeepSeek fires back with DeepSeek V4: a Pro version with 1.6 trillion parameters, an open-source Flash version, and a 1-million-token context window.

May 19: at Google I/O, Google unveils Gemini 3.5 Flash, Gemini Spark, Gemini Omni, and Antigravity 2.0.

The direction is crystal clear: fewer chatbots, more agents. You can see it in the new Antigravity interface, which gives a good sense of Google's vision and where the company wants to take us.

May 28: Anthropic releases Claude Opus 4.8, an unusually fast update within the Opus line.

It brings Dynamic Workflows, among other things. Anthropic also presents the model as smarter and more honest than 4.7, especially when it's working on code. The Effort Control and Fast Mode options show up too.

June 1: MiniMax launches MiniMax M3, an open-weight, multimodal model with a context window of up to 1 million tokens and a heavy focus on coding, agents, and long tasks.

An interesting one, mainly because it's trying to position itself as a cheaper alternative to Claude and GPT.

June 1: GitHub Copilot also changes its billing model. The service drops premium requests in favor of charging for advanced usage through GitHub AI Credits, calculated based on the number of tokens consumed and the model used. The basic features stay included, and each plan comes with a monthly allotment of credits. But the most powerful models and the agents can burn through them fast.

And that's where part of the community started freaking out over the prices: some users blew through their credits in a few hours or days, with estimates running into the hundreds of dollars to keep up the same pace of use.

Coming up soon: on August 2, 2026, most of the European AI Act takes effect, though with some exceptions and adjusted timelines, especially for systems considered high-risk.

If you only look at this list, it looks like a competition. And it is one. Pretty clearly.

But underneath the marketing, a real shift is happening: we're no longer talking only about chatbots, but more and more about agents.

And it's not just a slogan. It's a reality that settled in fast.

We now have:

Models that use the computer.

Models that coordinate different tools.

Models that can work for a long stretch without you having to guide them every step of the way.

Models that can move through an entire codebase instead of just answering a one-off question.

GPT-5.4 and its computer use.

Perplexity's Computer.

Kimi's swarms.

Google's Antigravity.

Claude, more and more geared toward coding and long-running work.

And in real life?

Sometimes the result is genuinely impressive. Even a fairly simple demo can leave you stunned.

But things change when you put these tools on a real project, with technical debt, incomplete context, decisions that were never verified, and, in the end, code nobody wants to touch.

Because the agent can pile on too: rack up technical debt of its own, make calls based on partial context, or ship a solution nobody actually reviewed.

Maybe we just needed to think a bit more, frame the request better, write sharper instructions, and define each agent's role and limits more clearly.

One thing has become clear, though: thanks to agents, MCP, and automation, AI lets you build faster, dig deeper into certain topics, and surface other ideas that, of course, already existed somewhere on the internet, and that can help you move forward.

But the human contribution stays unique, authentic, and all the more valuable.

The direction is clear: we're no longer heading only toward "assistants that answer," or agents doing whatever they want off in a corner.

We're heading toward systems that try to interact with us, to remember tasks and mistakes, in order to improve our work and our projects.

But at what cost?

GitHub Copilot's change to its business model shows another side of this AI race: models and agents eat up a huge amount of resources, so they cost a lot.

Put another way, the all-you-can-eat plan is starting to disappear for some services.

An AI system doesn't just fire off a single request and hand back an answer.

It reads files, passes context, calls tools, generates code, runs tests, analyzes errors, then tries again.

And every step burns tokens.
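
To make that concrete, here is a minimal sketch in Python of how tokens pile up across those steps. Every number and name in it (the `Step` class, the step sizes, the 2,000-token base context) is made up for illustration. It also assumes the simplest possible setup: the whole context is re-sent on every call, with no prompt caching or context trimming. Real agents vary a lot on that point.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    new_input_tokens: int  # tokens added to the context (file contents, tool output)
    output_tokens: int     # tokens the model generates at this step


def total_tokens(steps, base_context=2_000):
    """Return (input_total, output_total) for a run of agent steps.

    Assumption: the full context is sent again at every step, so earlier
    material is billed as input each time it is re-read.
    """
    context = base_context
    input_total = 0
    output_total = 0
    for step in steps:
        context += step.new_input_tokens
        input_total += context           # the whole context is billed as input
        output_total += step.output_tokens
        context += step.output_tokens    # the model's output joins the context
    return input_total, output_total


# Hypothetical agent run mirroring the steps described above.
agent_run = [
    Step("read files", 8_000, 200),
    Step("generate code", 0, 1_500),
    Step("run tests", 1_000, 100),
    Step("analyze errors", 500, 300),
    Step("retry", 0, 1_200),
]

# The same initial material handled as one plain request.
single_request = [Step("answer once", 8_000, 200)]

print("single request:", total_tokens(single_request))
print("agent loop:    ", total_tokens(agent_run))
```

With these invented numbers, the five-step loop bills roughly six times as many input tokens as the single request, because the files read in the first step are paid for again at every later step.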

As long as all of that was bundled into a monthly subscription, it was easy to use these tools as if they were basically unlimited.

But when billing depends directly on usage, your perception changes. You think twice before kicking off a task.

Suddenly the question isn't only which model is the smartest or which one codes best.

You also have to ask how much it costs, how many tokens it burns before it finishes a task, and whether that cost actually pays off on a project.
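
A rough way to put numbers on that question is a back-of-the-envelope calculation. The sketch below is mine, not any vendor's pricing formula: the per-million-token prices are placeholders, the budget of 20 is arbitrary, and the `tokenizer_factor` parameter is just a way to model the 1.0 to 1.35 range mentioned earlier for Opus 4.7's tokenizer.

```python
def task_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m,
              tokenizer_factor=1.0):
    """Estimated cost of one task, in whatever currency the prices use.

    tokenizer_factor scales the token counts to model a tokenizer that
    produces more tokens for the same text (1.0 means no change).
    """
    billed_in = input_tokens * tokenizer_factor
    billed_out = output_tokens * tokenizer_factor
    return (billed_in * price_in_per_m + billed_out * price_out_per_m) / 1_000_000


def tasks_within_budget(budget, cost_per_task):
    """How many whole tasks fit into a fixed budget or credit allotment."""
    return int(budget // cost_per_task)


# Placeholder prices per million tokens; not taken from any real price list.
PRICE_IN = 5.0
PRICE_OUT = 25.0

# Hypothetical task: 60,000 input tokens and 3,000 output tokens.
baseline = task_cost(60_000, 3_000, PRICE_IN, PRICE_OUT)
worst_case = task_cost(60_000, 3_000, PRICE_IN, PRICE_OUT, tokenizer_factor=1.35)

for label, cost in [("baseline", baseline), ("tokenizer x1.35", worst_case)]:
    print(f"{label}: {cost:.3f} per task, "
          f"{tasks_within_budget(20, cost)} tasks on a budget of 20")
```

The exact figures are invented, so what matters is the shape: at the same per-token price, the same task gets up to 35% more expensive if the tokenizer splits the text into more tokens, and a fixed budget covers correspondingly fewer tasks.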

GitHub Copilot might be showing us the beginning of the end of unlimited plans for advanced AI use.

The mid-2026 turning point: some start calling for calm

And this is where it gets interesting.

Anthropic is now calling for a coordinated pause. Not a pause it would impose on itself alone: a pause coordinated across the whole industry.

Its argument is that models could soon be approaching a phase of recursive self-improvement: systems capable of helping improve other systems, with less and less human involvement. And before we get there, it would be better to have real mechanisms for control, safety, and verification.

That sounds consistent with the message Anthropic has always pushed. But to be honest, you have to tell the whole story.

Some also see this request as a self-serving move. Anthropic is still in the race, still shipping new models, still watching its valuation climb. And according to Reuters, the company was also criticized this year for dropping a more explicit internal commitment to halt certain training runs if especially dangerous capability levels were reached.

In other words, Anthropic is asking everyone to slow down right after giving itself more room to maneuver.

Draw your own conclusions.

On OpenAI's side, Sam Altman has also softened his tone quite a bit. A year ago, he warned that AI would wipe out a chunk of junior office jobs. Now he admits he was "pretty wrong": the impact has been far smaller than he predicted. And he's not alone. Dario Amodei, who'd gone as far as talking about eliminating half of all office jobs, now says automation could actually expand what people are able to do.

So the idea taking hold isn't simply:

"AI is going to take your job."

It's more uncomfortable, but probably more realistic too:

AI is going to change which parts of your job actually have value.

And that's not the same thing.

What I'm taking away from these first six months

Between the frantic race to ship models and the very same people now talking about slowing down, the contrast says a lot about where we are.

On one hand, nobody wants to fall behind.

On the other, it's getting clearer that we're not just talking about a new tool for writing emails faster anymore.

We're talking about models that are starting to create, modify, and audit software, but also to step into systems, decisions, workflows, accounting, how companies run, and even regulation.

In that context, the benchmark of the moment isn't enough.

Not everything that ships is actual news.

Not everything that "beats X on benchmark Y" holds up against a real project.

And not everything that looks revolutionary in a demo changes a developer's day-to-day.

So I'm going to keep watching all of this, but with a filter.

I'll publish what strikes me as genuinely important about the big models and the new players, testing them where I can, and without selling you hype.

If the first half of 2026 was this hectic, the second half promises to be just as hectic, especially around energy consumption, token pricing, and the possible end of unlimited plans.

We'll need to keep a close eye on it.
