Using DSPy to evaluate and improve Datasette Agent's SQL system prompts
Simon Willison’s Weblog
2nd July 2026
Research
This project uses the DSPy framework to evaluate and refine the core production system prompts used by Datasette Agent's read-only SQL question answerer. In the test harness, DSPy agents call Datasette Agent's actual tool implementations and prompts against a live in-process Datasette, and their answers are scored against an auto-generated gold-standard dataset using custom metrics.
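To make "custom metrics" a little more concrete, here is a minimal sketch of one plausible metric: run the gold SQL and the predicted SQL against the same database and compare the rows. This is an assumption about what such a metric could look like, not the metric used in the research. The names `run_query`, `result_match_metric`, `gold_sql` and `sql` are hypothetical, and the example uses plain `sqlite3` and `SimpleNamespace` objects rather than DSPy or Datasette so it runs on its own. The `(example, pred, trace=None)` signature follows the convention DSPy uses for metric functions.

```python
import sqlite3
from types import SimpleNamespace


def run_query(conn, sql):
    """Run a query and return its rows in a stable order, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so row order and mixed types do not affect the comparison
    return sorted(rows, key=repr)


def result_match_metric(example, pred, trace=None):
    """Return 1.0 if the predicted SQL yields the same rows as the gold SQL."""
    gold = run_query(example.conn, example.gold_sql)
    got = run_query(example.conn, pred.sql)
    return float(gold is not None and got == gold)


# Hypothetical demo data, not taken from the research dataset
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table books (id integer primary key, title text, pages integer);
    insert into books (title, pages) values ('A', 100), ('B', 250);
    """
)
example = SimpleNamespace(conn=conn, gold_sql="select count(*) from books")

# A differently worded query that returns the same rows
good = SimpleNamespace(sql="select count(id) from books")
# A query that guesses a column name that does not exist
bad = SimpleNamespace(sql="select sum(page_count) from books")

print(result_match_metric(example, good))
print(result_match_metric(example, bad))
```

Comparing result rows rather than SQL text means two differently written but equivalent queries both score as correct, while a query that fails on a guessed column name scores zero.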
One of this morning's AIE keynotes covered DSPy, which reminded me I've been meaning to see if it could help me improve the system prompt used by Datasette Agent, so I fired off an asynchronous research task in Claude Code for web using Claude Fable 5:
Pip install the latest Datasette alpha and datasette-agent and dspy - then figure out how to use dspy to evaluate and improve the main system prompts used by Datasette Agent for the feature where it can execute read only SQL queries to answer user questions about data.
Fable chose to test using GPT-4.1 mini and nano, and identified several promising-looking directions for improvement. I particularly like this one:
The schema listing gives only table names; the "don't call describe_table if you already have the information" advice caused column-name guessing (page_count, o.order_id, first_name) and error-retry loops in baseline traces. Either include column names in the prompt's schema listing or soften that advice.
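The first of those two options, including column names in the schema listing, could look something like the sketch below. This is an illustration only: it uses the standard library `sqlite3` module instead of Datasette's own APIs, the `schema_listing` function and the demo tables are hypothetical, and the one-line-per-table format is an assumption rather than what Datasette Agent's prompt actually uses.

```python
import sqlite3


def schema_listing(conn: sqlite3.Connection) -> str:
    """Return one line per table: the table name followed by its columns."""
    tables = [
        row[0]
        for row in conn.execute(
            "select name from sqlite_master "
            "where type = 'table' and name not like 'sqlite_%' "
            "order by name"
        )
    ]
    lines = []
    for table in tables:
        # pragma_table_info() is SQLite's table-valued form of PRAGMA table_info
        columns = [
            row[0]
            for row in conn.execute(
                "select name from pragma_table_info(?) order by cid", (table,)
            )
        ]
        lines.append(f"{table}({', '.join(columns)})")
    return "\n".join(lines)


# Hypothetical demo schema, not taken from the research repo
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table customers (id integer primary key, name text);
    create table orders (id integer primary key, customer_id integer, total real);
    """
)
print(schema_listing(conn))
```

With the column names in the prompt, the model has less reason to guess at names like `page_count` or `first_name`, at the cost of a longer prompt for databases with many tables.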
This is a beat by Simon Willison, posted on 2nd July 2026.