The author recounts early access to Claude 5 Fable, the first Mythos-class AI model released to the public. Fable significantly outperforms previous models on complex, multi-hour tasks such as building an isochrone map or sophisticated data-analysis software. The experience shifts the human role from active builder to patron, someone who commissions and judges results while the AI autonomously handles research, coding, and decision-making in a black-box manner. Despite its power, Fable has limitations: high token usage, sensitive guardrails, and a lack of transparency.
Fable outperforms existing models on diverse complex tasks and can work autonomously for multi-hour sessions.
It spawns sub-agents for research and verification, requiring minimal human input.
Two years after his book 'Co-Intelligence', the author announces a new book, 'Co-Existence', which reflects on the shift from cooperative AI to autonomous agents. He describes how he used AI while writing the book, and how he must now also write for AI, which acts as both reader and gatekeeper.
New book 'Co-Existence' coming October 20, available for pre-order
Author wrote the book himself, but used AI for feedback, fact-checking, and unblocking
The author had early access to GPT-5.5 and finds it a significant step in AI progress, particularly in coding, image generation, and integrated applications. Despite improvements in models, apps, and tools, the 'jagged frontier' of AI ability persists, especially in long-form fiction. The article showcases GPT-5.5's ability to simulate evolving towns, generate near-PhD-level papers, and create role-playing games.
GPT-5.5 Pro is faster and smarter at coding tasks; in one demonstration it simulated a 3D town evolving over time.
OpenAI made advances in models, apps (Codex), and tools (a new image generator that can render text).
AI capabilities far exceed how AI is commonly used, largely because of poor interfaces. Research shows that chatbot interfaces impose a cognitive load that offsets productivity gains. The article explores specialized interfaces, personal agents such as OpenClaw and Claude Cowork with Dispatch, and the emerging paradigm of interfaces that AI generates on demand.
Chatbot interfaces create a cognitive tax that reduces work efficiency
Specialized interfaces (e.g., Claude Code) excel for programmers but don't serve most knowledge workers
The article discusses the exponential growth of AI capabilities and its profound implications for work, markets, and policy. It describes the shift from 'co-intelligence' to 'managing AIs,' with AI agents like Claude Code and Codex now capable of completing complex tasks independently. The author demonstrates rapid AI progress through the 'Otter Test' and various benchmarks, noting that despite impressive capabilities, practical adoption is still early. The piece introduces StrongDM's radical 'Software Factory' experiment and the 'rolling disruption' caused by AI, including market volatility, corporate layoffs, and policy conflicts. Finally, the author warns about recursive self-improvement (RSI) potentially accelerating change, but emphasizes that the current window to shape AI's future remains open.
AI capabilities are growing exponentially, transitioning from co-intelligence to managing AI agents.
Benchmark results show AI approaching or exceeding human expert performance.
AI usage has shifted from chatbots to agents. This guide explains the three factors to consider (models, apps, and harnesses) and gives a detailed overview of the current landscape, including GPT, Claude, Gemini, and specialized tools.
AI is now used as agents that perform tasks autonomously, not just chatbots.
In an experiment at the University of Pennsylvania, MBA students used AI tools to create startups from scratch in four days, demonstrating how AI accelerates entrepreneurship. The article explores an equation for agentic work and how effective delegation (management skills) improves AI success rates.
Students built functional prototypes in four days with AI, drastically shortening traditional startup timelines.
The value of AI work depends on three variables: human baseline time, probability of success, and AI process time.
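The summary names the three variables but not how the article combines them. One way to see how they interact is a minimal sketch under an assumed model (ours, not necessarily the author's equation): the AI gets one attempt costing "AI process time" (prompting, waiting, and checking), it succeeds with some probability, and on failure the person does the task by hand in the "human baseline time". All names below (`Task`, `expected_hours_with_ai`, `hours_saved`, `break_even_p`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; field names are illustrative only."""
    human_baseline_hours: float  # time for a person to do the task unaided
    p_success: float             # probability the AI's output is usable
    ai_process_hours: float      # human time spent prompting, waiting on, and checking the AI


def expected_hours_with_ai(task: Task) -> float:
    """Assumed model: try the AI once; if it fails, do the task by hand.

    expected = ai_process + (1 - p) * human_baseline
    """
    if not 0.0 <= task.p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return task.ai_process_hours + (1.0 - task.p_success) * task.human_baseline_hours


def hours_saved(task: Task) -> float:
    """Positive means delegating is expected to save time under this model."""
    return task.human_baseline_hours - expected_hours_with_ai(task)


def break_even_p(task: Task) -> float:
    """Success probability at which delegating neither saves nor loses time.

    Solving ai + (1 - p) * h = h gives p = ai / h; a value above 1
    means delegation never pays under this model.
    """
    return task.ai_process_hours / task.human_baseline_hours


if __name__ == "__main__":
    report = Task(human_baseline_hours=8.0, p_success=0.7, ai_process_hours=1.0)
    print(f"expected with AI: {expected_hours_with_ai(report):.1f} h")
    print(f"hours saved: {hours_saved(report):.1f} h")
    print(f"break-even p: {break_even_p(report):.3f}")
```

With these made-up numbers, an 8-hour task, one hour of AI process time, and a 70% success rate give an expected 3.4 hours. Under this model, better delegation acts mainly through the success probability: raising it to 90% cuts the expected time to 1.8 hours.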
New AI coding tools like Claude Code demonstrate remarkable autonomous capabilities, completing complex tasks and self-correcting errors. The article explores features like long-running autonomy, context compaction, skills, and sub-agents, and discusses their profound impact on programming. While currently focused on coders, these tools hint at broader AI applications in knowledge work.
Claude Code and similar tools can work autonomously for extended periods and self-correct in coding tasks.
They overcome LLM limitations via context compaction, skills, and sub-agents.
AI's abilities are uneven, a pattern described as the 'Jagged Frontier.' Progress is often held back by bottlenecks, and resolving one can produce a sudden leap. Google's Nano Banana Pro, for example, improved image generation enough to unlock new capabilities such as PowerPoint creation.
AI's jagged frontier means it excels at some tasks while failing at others, often unpredictably.
Bottlenecks, like poor image generation, can hold back entire systems until they are solved.
The author compares the original ChatGPT from three years ago with today's Gemini 3, demonstrating AI's leap from chatbots to agents. Gemini 3 can code, build games, and autonomously conduct PhD-level research, marking the advent of the 'digital coworker' era.
Three years ago AI could barely write a poem; now Gemini 3 builds interactive games and conducts complex research autonomously
Google's Gemini 3 and its agent tool Antigravity showcase AI's shift from conversation to action
As AI advice becomes more important, we need to get better at assessing it. Current benchmarks have problems such as data leakage, unclear meaning, and uncalibrated difficulty. Taken together, however, they do track underlying ability. For specific tasks like writing or business advice, benchmarks fall short. The author proposes 'vibes-based' benchmarking (e.g., asking AI to draw a pelican on a bike) and real-world task testing (like OpenAI's GDPval) to understand AI models, and argues that organizations should systematically interview AI as if hiring an employee.
Current AI benchmarks suffer from data leakage, unclear meaning, and calibration issues
Aggregate benchmarks show an upward trend but are insufficient for judging specific tasks
A practical, opinionated guide to using AI in late 2025, covering free vs paid models, choosing among major systems (Claude, Gemini, ChatGPT, Grok, and open-weight models), advanced model selection, getting better answers via deep research and data connections, multimodal inputs, image/video generation, and quick tips. The author emphasizes experimentation and building intuition over becoming an expert.
About 10% of humanity uses AI weekly, mostly with free tools
For paid use, choose among Claude, Gemini, and ChatGPT ($20/month)