The AI Superforecasters Are Here
AI superforecasters are making extraordinary returns on prediction markets, turning $35 into $2 million. Current AIs are nearly on par with top human forecasters, with parity expected in about six months. AI advantages include speed and automation, especially in finance.
Scott Alexander
Jul 02, 2026
The annual prediction market conference was earlier this month. This was the year prediction markets went from an obscure hobby to a multi-billion-dollar industry; from semi-illegal to having the President’s son as an advisor. I can’t remember if anyone talked about any of that. It didn’t even register. All eyes were on the AI superforecasters.
I met an AI superforecaster startup founder who told me his AI had turned $35 into $2 million on Kalshi over seven months. I met another who said they were beating the stock market by 25% with a market-neutral portfolio - of course this could be luck, but they’d beaten Kalshi and Polymarket by similar margins.
In fact, I believe all of these people. The extending-lines-on-graphs community has long predicted that AIs would beat the best human forecasters sometime in 2026 - 2027. What did you expect the bots-finally-beat-humans-at-predicting-the-future moment to look like? Vibes? Papers? Essays? In retrospect, sure: it will look like AIs making crazy profits on prediction markets and beating the stock market by some comfortable amount.
But what happens next?
Using An AI Superforecaster
Before getting into details, what exactly are we talking about?
An AI superforecaster is an AI - usually a frontier model like ChatGPT or Claude - which has been modified to be good at forecasting. This usually means a “scaffold” - a program that handholds it through a long research process with various prompts, tools, advice about when to create subagents, etc. The overall experience is a lot like using any other AI, but slower and more expensive, because it’s doing more work.
This might make more sense with an example. FutureSearch - the company that claims to be beating the stock market - kindly offered to let me try their AI superforecaster and write about it here.
For a test question, I picked a recent project: some Silicon Valley philanthropists have set out to end respiratory infections like the common cold, and I decided to ask about their chances of success. Since forecasters need very precise questions, I asked how likely it was that the rate of colds would be cut in half by 2040:
By two minutes in, the AI had deployed three subagents, read 16 websites, and (at the exact moment I took this screenshot) was “investigating the scalability of ASHRAE Standard 241 air cleaning technology for widespread residential adoption by 2040.”
After five minutes, it had its answer: the chance of US respiratory infections halving by 2040 was 7%:
It explained:
The agents’ reasoning centers on a tough conjunctive chain of requirements that all must succeed simultaneously:
1: The biology is brutal. 200+ cold-causing viruses, 150+ rhinovirus serotypes alone — 50 years of vaccine efforts have failed, and experts as recently as late 2025 didn’t expect a cold vaccine within 5 years. During COVID lockdowns with unprecedented behavioral changes, rhinovirus incidence dropped only modestly and rebounded immediately.
2: Intercept’s own timeline is tight. Their roadmap projects commercialization 5–7+ years out, leaving little room to clear Phase 3 trials, FDA approval, and achieve mass deployment by 2040 — all for under $500M (likely far too little for that scope).
3: Adoption is a wildcard. For a mild illness, compliance with regular nasal sprays or prophylactics is expected to be low. Air-cleaning infrastructure faces steep retrofit costs and a mixed evidence base (a 2025 RCT of germicidal UV found no significant reduction in infections).
4: Measurement may be impossible. The US lacks routine population-wide cold surveillance, and the 2020–2026 baseline is distorted by pandemic-era behavior — making a verified >50% reduction hard to demonstrate even if real progress occurs.
The initiative is genuinely novel and credible, but halving cold frequency in 14 years without transformative AI acceleration is an extraordinarily high bar.
You could follow up by forecasting a softer threshold (e.g., 25% reduction by 2035), or by ranking the specific Intercept technical approaches (nasal sprays vs. vaccines vs. air cleaning) by their likelihood of meaningful impact.
…and, in case I wanted to learn more, cited its 212 sources. The forecast had taken five minutes and cost me $8 in credits.
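The "conjunctive chain" argument in that explanation comes down to multiplication: if every link has to hold, the overall probability is the product of each link's probability given that the earlier links held. Here is a minimal sketch of that logic. The per-step numbers are invented for illustration and do not come from FutureSearch's output, which reported only the final 7%; the step labels are loose paraphrases of its four points.

```python
from math import prod

# Hypothetical conditional probabilities for each link in the chain.
# Illustrative placeholders only, NOT figures from the forecast.
steps = {
    "a broadly effective intervention is developed": 0.40,
    "it clears trials and approval before 2040": 0.60,
    "adoption is wide enough to halve infections": 0.35,
    "the reduction can be measured and verified": 0.80,
}


def conjunctive_probability(conditionals):
    """Probability that every step succeeds, given each step's
    probability conditional on the previous steps succeeding."""
    return prod(conditionals)


p = conjunctive_probability(steps.values())
print(f"Overall probability: {p:.1%}")
```

Even with no single step below 35%, the product lands in the single digits, which is why long chains of "all of these must go right" tend to produce low forecasts.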
But is it true? Ideally we would wait until 2040 and see. For now, I started by comparing its answer to another superforecaster AI. Preseen is the company that claimed to have turned $35 into $2 million on Kalshi, a return of roughly 57,000x. Here’s their answer:
8.8% compared to FutureSearch’s 7%, not bad!
Is either of these true? I asked a human superforecaster to predict this question, to see if she got the same answer as the AIs. She said that depending on an ambiguity in the wording, she would give it 5-10%. Again, not bad!
Man Vs. Machine
Of course, it would be even better to do the same experiment at scale and figure out how AIs compare to humans once and for all.
But measuring forecasting ability is hard. You can’t say something like “it gets 85% of questions right”, because that depends entirely on question difficulty. If the questions are things like “will the sun rise tomorrow morning”, then even a 100% hit rate is unimpressive. Instead, we can only match different forecasters against each other and determine who is better or worse. Any anchoring in an absolute space will come from the inclusion of groups whose predictive abilities we intuitively understand (eg the average member of the public, CIA analysts, etc).
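One way to make "match forecasters against each other" concrete is a relative log score: score each forecaster by the log of the probability they gave to what actually happened, then take the difference on shared questions. The sketch below assumes binary questions and uses hypothetical function names and made-up forecasts; it is a simplified pairwise comparison, not necessarily the exact metric any forecasting site uses.

```python
from math import log


def log_score(p_yes, outcome):
    """Log score for a binary question: ln of the probability
    assigned to what actually happened (closer to 0 is better)."""
    return log(p_yes if outcome else 1.0 - p_yes)


def relative_score(forecasts_a, forecasts_b, outcomes):
    """Mean log-score advantage of forecaster A over B on the same
    questions. Positive means A did better."""
    diffs = [
        log_score(a, o) - log_score(b, o)
        for a, b, o in zip(forecasts_a, forecasts_b, outcomes)
    ]
    return sum(diffs) / len(diffs)


# Easy questions ("will the sun rise tomorrow?"): both forecasters are
# right every time, so a 100% hit rate tells us nothing about either.
easy_outcomes = [True, True, True]
a_easy = [0.99, 0.99, 0.99]
b_easy = [0.99, 0.99, 0.99]

# Harder questions (made-up data): A commits more strongly to the
# answers that turn out correct than B does.
hard_outcomes = [True, False, True, False]
a_hard = [0.80, 0.30, 0.70, 0.20]
b_hard = [0.60, 0.50, 0.55, 0.45]

print("easy set:", relative_score(a_easy, b_easy, easy_outcomes))
print("hard set:", relative_score(a_hard, b_hard, hard_outcomes))
```

Both forecasters would score "4 out of 4" on the hard set if you rounded their probabilities to yes/no, but the relative score still separates them. The number only means something relative to the other forecaster, which is the point of the paragraph above.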
The forecasting website Metaculus matches AIs against humans and each other on a common metric. Here are their results over time:
The Metaculus Community Prediction is a “wisdom of crowds” style aggregation of all the forecasters on Metaculus. The Metaculus Pro Forecasters are top professional superforecasters. This graph makes it look like - as of May 2026 when Gemini 3.1 was state of the art - AI was approaching the Community Prediction. This is no mean feat, but it’s still far from the professional superforecaster level.
But in a recent blog post, Metaculus adds context. The graph above only measures out-of-the-box brand-name AIs like GPT and Claude. It doesn’t count forecasting-focused scaffolds like FutureSearch. A different investigation by Metaculus finds that these efforts are “worth 9 months of base model progress”, ie a well-scaffolded AI today is already as good at forecasting as base models will be in nine months.
If you extend the dotted green line on the graph to July 2026, then add nine months for the extra scaffolding, it looks like the best AIs should be around 31, compared to top pro forecasters’ 36. So in theory, the absolute best forecasters in the world are still beating the top AIs, but the margin of victory is less than the graph suggests, and we should expect human-AI parity in about six months.
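The arithmetic behind "parity in about six months" can be made explicit. The sketch below takes the three numbers from the paragraph above (31, 36, six months), backs out the trend they imply, and shows how sensitive the parity date is to that trend. It assumes the AI score improves linearly and the human score stays fixed; both are simplifying assumptions, and the alternative slopes are arbitrary values chosen for illustration.

```python
def months_to_parity(ai_score, human_score, points_per_month):
    """Months until a linearly improving AI score reaches a fixed
    human score. Returns 0 if the AI is already at or above it."""
    if points_per_month <= 0:
        raise ValueError("AI score must be improving")
    gap = human_score - ai_score
    return max(gap, 0.0) / points_per_month


SCAFFOLDED_AI = 31.0  # from the text: best scaffolded AIs, July 2026
PRO_HUMANS = 36.0     # from the text: top pro forecasters
PARITY_MONTHS = 6.0   # from the text: "about six months"

# Trend implied by those three numbers (an inference from the text,
# not a value read off the Metaculus graph).
implied_slope = (PRO_HUMANS - SCAFFOLDED_AI) / PARITY_MONTHS
print(f"Implied trend: {implied_slope:.2f} points per month")

# Sensitivity check with arbitrary alternative slopes.
for slope in (0.5, implied_slope, 1.25):
    months = months_to_parity(SCAFFOLDED_AI, PRO_HUMANS, slope)
    print(f"{slope:.2f} points/month -> parity in {months:.1f} months")
```

A five-point gap is small enough that modest changes in the assumed trend move the parity date by several months in either direction.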
But the claim that scaffolded AIs are nine months ahead of base models is itself ~9 months old. Several people in the field told me that they thought this underestimated true progress. Claims by the AI startups themselves may be treated skeptically, but even a few top human superforecasters said they were no longer confident they could beat the bots.
Seems like time for a head-to-head matchup. The Metaculus Cup - the World Cup of forecasting! - is on the case. Once a season, top humans and AIs compete on about fifty questions like “Who will win the upcoming Nepali elections?” and “Will the US attack Iran?” Here are the winners of the most recent tournament:
Humans took the top two spots, but Preseen’s AI came in third. Every forecasting competition involves a heavy dose of luck, so realistically at this point humans and AIs are in a statistical dead heat.
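To get a feel for how much luck a fifty-question tournament involves, here is a toy simulation. One forecaster is built to be genuinely better than nineteen rivals (their estimates have less noise around the true probability), and we count how often that forecaster actually finishes first. Everything here is an assumption made for illustration: the noise levels, the field size, the Brier scoring, and the way questions are generated. None of it is modeled on the real Metaculus Cup.

```python
import random


def brier(p, outcome):
    """Squared error between a probability and a 0/1 outcome."""
    return (p - outcome) ** 2


def run_tournament(rng, noise_levels, n_questions=50):
    """Return the index of the forecaster with the lowest total
    Brier score over one simulated tournament."""
    totals = [0.0] * len(noise_levels)
    for _ in range(n_questions):
        true_p = rng.uniform(0.05, 0.95)
        outcome = 1 if rng.random() < true_p else 0
        for i, noise in enumerate(noise_levels):
            # Each forecaster sees the true probability plus their own noise.
            guess = min(0.99, max(0.01, true_p + rng.gauss(0, noise)))
            totals[i] += brier(guess, outcome)
    return min(range(len(totals)), key=totals.__getitem__)


def win_rate_of_best(n_rivals=19, best_noise=0.08, rival_noise=0.10,
                     n_tournaments=500, seed=0):
    """Fraction of simulated tournaments won by the one forecaster
    who is (by construction) more accurate than everyone else."""
    rng = random.Random(seed)
    noise_levels = [best_noise] + [rival_noise] * n_rivals
    wins = sum(
        run_tournament(rng, noise_levels) == 0
        for _ in range(n_tournaments)
    )
    return wins / n_tournaments


rate = win_rate_of_best()
print(f"The better forecaster wins {rate:.0%} of simulated tournaments")
```

Under these made-up settings the better forecaster wins only a minority of tournaments, so a single podium tells you little about who is ahead when skill levels are close.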
We can confirm by looking at the intermediate results of the ongoing summer Metaculus Cup:
Of the humans who placed in the top ten during spring, two - benshindel and MarcosO - have repeated their performance so far this summer. So have two top-ten AIs - manticAI and Laertes (Preseen-Chestnut is having a tough summer and is down to #40).
Industrial Revolution folklore tells of John Henry, the great steel-driver, who refused to accept that machines were making him obsolete. He challenged a steam drill to a competition, won by a hair, and dropped dead, symbolizing the end of human supremacy in manual labor. This is how I think of this summer’s Metaculus Cup, with Ben Shindel and MarcosO playing the role of John Henry. Humans are still holding out, but for how long?
This is a forecasting question, so all the forecasting nerds at Metaculus have opinions on it. They think there’s a 15% chance that a bot will win this summer’s Metaculus Cup - the one shown above - and a 95% chance that one will win sometime before 2030.
If bots aren’t soundly beating top humans, why are people able to tell me stories about their bot beating the stock market, or making millions on Kalshi? I think it’s a combination of reasons.
First, the best human superforecasters in the world probably also beat the stock market. Somebody has to, and the best human forecasters in the world seem like the sorts of people who would do this. This would also explain why big hedge funds like Bridgewater keep trying to hire superforecasters.
Second, AIs are faster and more diligent than humans. Plenty of people beat prediction markets. But it might take them several hours to figure out which markets have untapped alpha, several more hours to make a model and decide who to bet on at what probability, et cetera, and then they can only put in a few thousand dollars before the inefficiency is corrected and they need to move on to something else. AIs can automate that process, betting on hundreds of markets every week. I asked the guy who turned $35 into $2 million in seven months on Kalshi whether, in another seven months, he would be able to repeat the feat and turn his $2 million into more than $100 billion. Unsurprisingly, he said no - there’s only so much easy money on Kalshi, and his AI had already taken it all. (Also, other people with similar AIs are starting to fight him for it!)
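The scale of that Kalshi run is easy to check from the two dollar figures and the seven-month span. The sketch below computes the overall multiple, the month-over-month growth it implies, and what repeating the same multiple from $2 million would produce. It assumes smooth monthly compounding, which real trading returns would not follow; the inputs are the only numbers taken from the text.

```python
START = 35.0       # dollars, from the text
END = 2_000_000.0  # dollars, from the text
MONTHS = 7         # from the text

# Overall return multiple over the whole period.
multiple = END / START

# Constant monthly growth factor that would produce that multiple.
# Assumes smooth compounding, purely for intuition.
monthly_factor = multiple ** (1 / MONTHS)

print(f"Overall multiple: {multiple:,.0f}x")
print(f"Implied growth per month: {monthly_factor:.2f}x")
print(f"Same multiple applied to $2M: ${END * multiple:,.0f}")
```

The last line is the reason the founder said no: the same multiple applied to $2 million implies a sum far larger than the easy money available on a single prediction market.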
Third, and most speculatively, AI may have a special advantage in finance. This is exactly the sort of well-contained data-heavy domain where machines are most likely to excel. In Metaculus’ Market Pulse competition, a purely finance-focused tournament, Preseen’s bot recently beat all humans (including Cup rival MarcosO) to take first place.
(“If this is true, then why aren’t all the top trading firms rushing to switch to AI?” I don’t know the details, but Jane Street is building their own data center, I wonder what they need all that compute for?)
I think the best summary of the evidence is that the best human superforecasters and the best bots are too close to clearly tell apart, but if you absolutely had to guess, the bots are very slightly better in finance, and the humans very slightly better in everything else.
Living In The World Where Bots Approximately Equal Top Humans
Suppose that AIs don’t improve any further. What would happen? Would anything happen? We already have top human superforecasters. Do bots which are just as good, but no better, add anything?
Yes. Getting information out of top human superforecasters is hard. First, you need to find one. There are companies that will connect you to them, but like all companies, they charge money, take time, and are annoying to work with. Then you need to talk to them at length about exactly what you mean (do you mean the total number of colds should halve, or the number of people who get colds in any given year?) Then you need to wait a few weeks as they research the issue and decide what they think. Then you need to convince stakeholders that the answer means something (“I got it from superforecasters! They’re people who . . . uh, can you read this Philip Tetlock book? It probably explains it better than I can.”) As a result, using superforecasters is a Big Deal. Only a few institutions do it, for a few very import
[truncated for AI cost control]