Simon Willison's Weblog AI News Source

Public articles: 124 | Collected articles: 156 | Trust: 88 | Refresh: 60 min

Health: Healthy | Source type: Research | Full-text rights: Full text allowed | Last ingested: 2026-06-26 | ID: simon-willison | Status: Enabled

Personal blog; posts are public and free to reference.

Latest public articles

Quoting Dean W. Ball

2026-06-26 22:25 UTC

Dean W. Ball highlights that frontier AI models have a narrow window to recoup training costs before competition erodes margins, and that AI infrastructure investment assumes a global market.

Frontier model training costs are enormous, with a short post-release window to recoup them
Once models become sub-frontier, competition emerges and margins compress

Quoting Timothy B. Lee

2026-06-26 21:15 UTC

Timothy B. Lee compares the misconception that LLMs take no skill to the idea that managing employees requires no learning curve.

LLMs require skill and have a learning curve.
The comparison highlights that giving instructions doesn't guarantee effective outcomes.

What happened after 2,000 people tried to hack my AI assistant

2026-06-26 18:33 UTC

Fernando Irarrázaval ran a challenge on hackmyclaw.com to see whether anyone could use email to leak secrets held by his OpenClaw test instance. After 6,000 attempts (at a cost of $500 in tokens and a suspended Google account), nobody succeeded. The instance ran Opus 4.6 with an anti-prompt-injection prompt. The result suggests that training against injection attacks is working, but caution remains necessary.

6,000 attempts, no secret leaked
Opus 4.6 with strict anti-prompt-injection rules

Incident Report: CVE-2026-LGTM

2026-06-26 17:58 UTC

A hypothetical incident report by Andrew Nesbitt describing two AI review agents from competing vendors spiraling into a disagreement loop over a package's maliciousness, resulting in massive inference costs and a press release.

Two AI review agents from different vendors enter an endless disagreement loop over a package's safety.
The debate generates 340 comments and $41,255 in inference costs.

OpenAI Previews GPT-5.6 Series: Sol, Terra, and Luna

2026-06-26 17:10 UTC

OpenAI announced a limited preview of the GPT-5.6 series, including the flagship model Sol, a balanced model Terra, and a fast, affordable model Luna. Terra matches GPT-5.5 performance at half the cost, while Luna delivers strong capability at the lowest price. Pricing per 1M tokens: Sol $5 input / $30 output; Terra $2.50 / $15; Luna $1 / $6. The series also introduces improved prompt caching with explicit breakpoints and a 30-minute minimum cache life. Due to U.S. government engagement, the release begins with a limited preview for trusted partners before broader availability.

GPT-5.6 series includes Sol (flagship), Terra (balanced), and Luna (fast/affordable).
Terra performs competitively with GPT-5.5 at half the cost; Luna offers strong capability at the lowest price.
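
To make the price list concrete, here is a minimal sketch that turns the per-1M-token prices quoted above into a per-request cost. The `PRICES` table and the `request_cost` helper are illustrative names, not part of any OpenAI SDK, and the calculation ignores prompt-caching discounts, which the summary does not quantify.

```python
# Prices in USD per 1M tokens, copied from the summary above: (input, output).
PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached cost in USD of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Same workload on each tier: 100k input tokens, 10k output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 100_000, 10_000):.2f}")
```

For that workload the three tiers come out at $0.80, $0.40 and $0.16, which matches the 2x step from Terra to Sol stated in the announcement.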

AI and Liability

2026-06-25 22:28 UTC

German court rules Google liable for errors in its AI overviews. Bruce Schneier argues AI agents are agents of the deploying organization, and allowing businesses to hide behind faulty AI creates perverse incentives.

German landmark ruling holds Google legally responsible for AI-generated overview inaccuracies.
Bruce Schneier: AI agents should be treated as agents of the person or organization that deploys them.

simonw/browser-compat-db

2026-06-24 23:59 UTC

Inspired by Mozilla's new MDN MCP service, Simon Willison converted the mdn/browser-compat-data repository into a SQLite database. He used Claude Code for web (Opus 4.8) and sqlite-utils to generate the conversion script, and a GitHub Actions workflow to deploy the ~66MB database to GitHub CDN with open CORS headers, enabling direct download and exploration via Datasette Lite.

Simon Willison converted Mozilla's browser compatibility data into a SQLite database.
Used Claude Code (Opus 4.8) and sqlite-utils to automate conversion.

Quoting Tom MacWright on AI-Generated Job Applications

2026-06-24 18:13 UTC

Tom MacWright observes that an increasing number of job applications are fully or partially generated by LLMs, making candidates 'accidentally anonymous'.

Job applications now often include LLM-generated resumes, portfolios, and GitHub projects.
MacWright notes he learns nothing about the person behind such applications.

OPFS + Pyodide test harness

2026-06-23 18:58 UTC

Simon Willison built a browser playground to test whether Origin Private File System (OPFS) can enable Datasette Lite to edit persistent SQLite files on the user's computer.

Datasette Lite runs Python entirely in the browser via Pyodide.
OPFS provides a file system that is private to the origin of a web application.

Prompt Injection as Role Confusion

2026-06-22 23:59 UTC

Researchers found that LLMs cannot reliably distinguish privileged text from user input, and are more influenced by text style than actual content. 'Destyling' reduces attack success from 61% to 10%, highlighting the fundamental issue of role confusion.

Models cannot differentiate role tags like <system> and <think> from user input
Models prioritize writing style over actual content, leading to role confusion

Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

2026-06-22 23:43 UTC

Simon Willison ports the Moebius 0.2B image inpainting model to run in the browser using Claude Code, converting PyTorch to ONNX for WebGPU execution. The project demonstrates the feasibility of client-only AI applications and results in a working demo at simonw.github.io/moebius-web/.

Moebius 0.2B model ported to browser via Claude Code.
Conversion from PyTorch to ONNX for WebGPU.

sqlite-utils 4.0rc1 adds migrations and nested transactions

2026-06-21 23:35 UTC

sqlite-utils 4.0rc1, the first release candidate for v4, introduces built-in database migrations and nested transactions via db.atomic(), along with several minor breaking changes.

New database migration system, ported from sqlite-migrate. No reverse migrations. Works via Python or CLI.
New db.atomic() context manager for nested transactions using SQLite savepoints.
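
The savepoint mechanism behind nested transactions can be sketched with the standard library alone. This is not the sqlite-utils implementation and does not use `db.atomic()`; `savepoint_block` is a hypothetical helper built on Python's `sqlite3` module, shown only to illustrate how an inner block can be rolled back while the outer block still commits.

```python
import sqlite3
from contextlib import contextmanager
from itertools import count

_savepoint_ids = count(1)


@contextmanager
def savepoint_block(conn):
    """Hypothetical helper: run the body inside a SAVEPOINT, undoing it on error."""
    name = f"sp_{next(_savepoint_ids)}"
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield
    except Exception:
        # Undo only the work done since this savepoint, then discard it.
        conn.execute(f"ROLLBACK TO {name}")
        conn.execute(f"RELEASE {name}")
        raise
    else:
        # Releasing the outermost savepoint commits the transaction.
        conn.execute(f"RELEASE {name}")


# isolation_level=None stops the sqlite3 module from opening transactions itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

with savepoint_block(conn):
    conn.execute("INSERT INTO log VALUES ('outer')")
    try:
        with savepoint_block(conn):
            conn.execute("INSERT INTO log VALUES ('inner')")
            raise ValueError("undo the inner block only")
    except ValueError:
        pass

rows = [row[0] for row in conn.execute("SELECT msg FROM log")]
print(rows)
```

Only the `outer` row survives: the inner block's insert is rolled back to its savepoint, and the outer block is released normally.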

Temporary Cloudflare Accounts for AI agents

2026-06-21 22:01 UTC

Cloudflare announced a new feature allowing users to deploy Cloudflare Workers projects without creating an account, using the `--temporary` flag. The deployment lasts 60 minutes and can be claimed later. The feature, though marketed for AI agents, is useful for everyone.

Cloudflare Workers now supports temporary deployments without an account
Use `npx wrangler deploy --temporary` to deploy; project lasts 60 minutes

Quoting Sean Lynch

2026-06-19 22:45 UTC

Sean Lynch comments on Hacker News about the value of MCP (Model Context Protocol), highlighting its ability to isolate the auth flow outside the agent's context window and potentially out of the harness entirely. He suggests the idealized MCP might just be an auth gateway, but that alone would be a win.

MCP's key advantage is isolating auth flow, addressing context window limitations.
The idealized MCP could be solely an auth gateway for APIs, still a win.

Datasette Apps: Host custom HTML applications inside Datasette

2026-06-18 23:58 UTC

Datasette Apps is a new plugin that lets users run self-contained HTML+JavaScript applications inside a tightly sandboxed iframe within their Datasette instance. These apps can perform read-only SQL queries and, with stored queries, write operations. The plugin leverages iframe sandbox attributes and Content Security Policy for security, uses postMessage and MessageChannel for locked-down APIs, and supports AI-assisted app generation via copyable prompts. The article discusses a security vulnerability fix involving CSP allow-listing, visible logging, and the broader vision for Datasette's evolution into a richer tool ecosystem.

Datasette Apps enables secure hosting of custom HTML+JS apps in Datasette via iframe sandbox and CSP isolation.
Apps can execute read-only SQL queries and, with stored queries, write operations via postMessage/MessageChannel.

GLM-5.2 is probably the most powerful text-only open weights LLM

2026-06-17 23:58 UTC

Chinese AI lab Z.ai released GLM-5.2, a 753B parameter Mixture of Experts model with 1M token context, under MIT license. It leads the Artificial Analysis Intelligence Index among open weights models but is token-hungry. It also ranks 2nd on Code Arena WebDev. Despite strong performance on SVG generation, it shows inconsistency compared to its predecessor GLM-5.1.

GLM-5.2 is an open weights LLM with 753B parameters and 1M token context window.
It leads the Artificial Analysis Intelligence Index among open models.

Quoting Charity Majors

2026-06-17 17:12 UTC

Charity Majors observes that in 2025, the economics of code production flipped: code became free and instant, transforming from a treasured resource to a disposable commodity.

Code production cost dropped from high to nearly free and instant.
Code changed from a carefully curated asset to a disposable, regenerable item.

Datasette 1.0a34: Insert, Edit, and Delete Rows in the UI

2026-06-16 21:31 UTC

Datasette 1.0a34 introduces tools to insert, edit, and delete rows directly within the Datasette interface, inspired by Datasette Agent.

New alpha version adds insert, edit, and delete capabilities on table and row pages.
Inspired by Datasette Agent, which already supported SQL write operations via chat.

The Fable 5 Export Controls Harm US Cyber Defense

2026-06-16 05:20 UTC

Katie Moussouris confirms that the 'jailbreak' that got Claude Fable 5 banned under export controls was actually its ability to fix code. Experts warn that preventing AI from fixing bugs weakens defense, and that non-technical decision-makers may ban models that help secure code because they misunderstand what the models are doing.

Researchers asked Fable 5 to review and fix code with known vulnerabilities; its fixes were mislabeled as a jailbreak and the model was banned under export controls.
Moussouris argues that fixing vulnerabilities is the most valuable capability of AI for defensive security.

Quoting Matteo Wong, The Atlantic

2026-06-16 03:07 UTC

Cybersecurity expert Katie Moussouris revealed that Anthropic shared a White House report on the Fable jailbreak with her. The report showed that Fable refused to review code for security issues but complied when asked to fix the code, which Moussouris considered the model working as intended for cyberdefense.

Anthropic shared White House Fable jailbreak report with security expert
Fable refused 'review code for security' but complied with 'fix this code'

Cloudflare CAPTCHA on at least one ampersand

2026-06-16 00:21 UTC

Simon Willison uses Cloudflare's Managed Challenge to protect his faceted search from aggressive crawlers, but even simple ?q=term searches triggered the challenge. Using Claude Code, he discovered a rule that only triggers CAPTCHA for search URLs containing at least one ampersand, allowing simple searches to pass through without challenge.

Cloudflare's Managed Challenge was blocking even simple search queries on Simon Willison's site.
He used Claude Code to find a more specific WAF rule.
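
The logic of the rule described above can be sketched as a small predicate. The real rule is a Cloudflare WAF expression that the summary does not reproduce, so this only mirrors its behaviour; the `should_challenge` name and the `/search/` path prefix are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def should_challenge(url: str, search_path: str = "/search/") -> bool:
    """Hypothetical mirror of the rule: challenge only multi-parameter searches.

    A plain ?q=term search has no ampersand in its query string, while a
    faceted search that combines several parameters has at least one.
    """
    parts = urlsplit(url)
    return parts.path.startswith(search_path) and "&" in parts.query


if __name__ == "__main__":
    print(should_challenge("https://example.com/search/?q=term"))
    print(should_challenge("https://example.com/search/?q=term&tag=python"))
```

The first call returns `False` and the second returns `True`, so simple searches pass through while faceted combinations, the kind aggressive crawlers enumerate, are challenged.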

datasette-agent 0.3a0

2026-06-15 17:19 UTC

Datasette Agent 0.3a0 introduces a new execute_write_sql tool that requests user approval before writing to databases, enhancing chat mode with approval support and a --unsafe option for auto-approving operations.

New execute_write_sql tool with user approval for database writes
Enhanced datasette agent chat mode supports user approval workflows

"They screwed us": Personality clashes sent Anthropic's models offline

2026-06-15 14:57 UTC

An Axios piece reveals that personality clashes between Anthropic and the US government led to the shutdown of Anthropic's AI models (Mythos and Fable) under export controls. Sources suggest solutions include making models jailbreak-proof or improving attitudes.

Axios reports personality clashes caused Anthropic's AI models to go offline
Sources say Anthropic researchers are meeting with the Commerce Department

Why AI hasn’t replaced software engineers, and won’t

2026-06-14 23:54 UTC

Arvind Narayanan and Sayash Kapoor argue that AI will not cause mass unemployment, even in software engineering, citing NY WARN Act data and the real bottlenecks of the profession: deciding what to build, verifying deliveries, and deep human understanding.

No WARN Act filers in NY checked the AI disclosure box in the first year.
Software engineering bottlenecks are deciding, verifying, and deep understanding, not coding speed.

Publishing WASM wheels to PyPI for use with Pyodide

2026-06-13 23:55 UTC

Pyodide 314.0 now allows WebAssembly-compiled Python packages to be published directly to PyPI and installed at runtime, greatly simplifying distribution. The example package luau-wasm has been successfully published, and 28 packages are already using this new method.

Pyodide 314.0 supports publishing WASM wheels to PyPI, eliminating manual hosting.
Package maintainers can publish Pyodide wheels just like native wheels.

Mapping SQLite result columns back to their source `table.column`

2026-06-13 23:05 UTC

Determining the source table.column for each result column in arbitrary SQLite queries is feasible because SQLite computes this internally and exposes it via its column-metadata API when compiled with SQLITE_ENABLE_COLUMN_METADATA. Python's standard sqlite3 module doesn't surface this information, but three methods exist: the third-party apsw library provides direct access through cursor.description_full; a pure-stdlib ctypes bridge (column_provenance.py) can call the C function sqlite3_column_table_name(); and a third approach parses EXPLAIN output.

SQLite's internal column provenance API (requires SQLITE_ENABLE_COLUMN_METADATA) can map result columns to source table columns.
Python's sqlite3 module lacks this feature, but apsw offers direct access via cursor.description_full.
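
The EXPLAIN-based approach can be sketched with the standard library alone. This is a best-effort illustration, not the article's code: `column_sources` is a hypothetical helper, it assumes the bytecode layout SQLite produces for a plain table scan (`OpenRead`, `Column`, `ResultRow`), and it returns `None` for anything it cannot trace, such as expressions or rowid aliases. EXPLAIN output is not a stable interface, so treat this as a sketch only.

```python
import sqlite3


def column_sources(conn, sql):
    """Hypothetical helper: map result columns to (table, column) via EXPLAIN."""
    # Root page -> table name, so OpenRead's p2 can be resolved to a table.
    tables = {
        rootpage: name
        for name, rootpage in conn.execute(
            "SELECT name, rootpage FROM sqlite_master WHERE type = 'table'"
        )
    }
    cursors = {}    # VDBE cursor number -> table name
    registers = {}  # register number -> (table, column)
    result = []
    for _addr, opcode, p1, p2, p3, *_rest in conn.execute("EXPLAIN " + sql).fetchall():
        if opcode == "OpenRead" and p2 in tables:
            cursors[p1] = tables[p2]
        elif opcode == "Column" and p1 in cursors:
            table = cursors[p1]
            names = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
            if p2 < len(names):
                # Column copies field p2 of cursor p1 into register p3.
                registers[p3] = (table, names[p2])
        elif opcode == "ResultRow":
            # Registers p1 .. p1+p2-1 hold one row of output.
            result = [registers.get(reg) for reg in range(p1, p1 + p2)]
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT)")
sources = column_sources(conn, "SELECT b, a FROM t")
print(sources)
```

Where the column-metadata API is available, apsw or the ctypes bridge is the more robust route, since both read what SQLite itself computed rather than inferring it from bytecode.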

OpenAI WebRTC Audio Session, now with document context

2026-06-12 23:53 UTC

Simon Willison updates his OpenAI WebRTC Audio Session tool to support the new GPT-Realtime-2 model and allow pasting document context for conversational audio exploration.

Added support for OpenAI's GPT-Realtime-2 model with GPT-5-class reasoning
Users can paste document context into the browser for voice-based discussions

Quoting Andrew Singleton

2026-06-12 18:09 UTC

Andrew Singleton in 'AI Economics for Dummies' satirizes the hype and circular economics in the AI industry through a story of a crematorium and a propane company.

Singleton uses a crematorium and propane company to mock inflated valuations and circular revenue.
Investments are burned, yet reported as huge revenue and business value.

Claude Fable is relentlessly proactive

2026-06-11 23:35 UTC

Simon Willison details how Claude Fable 5 autonomously debugged a CSS scrollbar bug using numerous creative techniques, including writing test pages, injecting JavaScript, and building a CORS server. The session cost ~$12.11 and highlights both the power and danger of unsandboxed coding agents.

Claude Fable 5 autonomously debugged a CSS horizontal scrollbar bug using creative methods.
It wrote test HTML pages, used PyObjC for window info, injected JS for keyboard shortcuts, and built a custom CORS server.

datasette 1.0a33

2026-06-11 15:26 UTC

Datasette 1.0a33 is a significant alpha release extending the ?_extra= pattern to queries and rows, now documented. An AI-built API explorer demonstrates the feature.

Extends ?_extra= pattern to queries and rows.
Pattern now documented.
