SIA: The Open Source Self Improving AI
SIA is an open-source self-improving AI framework that autonomously boosts AI system performance on benchmark tasks by coordinating meta, target, and feedback agents. It achieves significant gains: 56.6% on LawBench, 91.9% runtime reduction on GPU kernels, 502% improvement on scRNA denoising, and ranks #1 on MLE-Bench Hard. Supports local execution and custom tasks. MIT licensed.
Article intelligence
Key points
- SIA uses an iterative loop of meta, target, and feedback agents for autonomous self-improvement.
- Achieves substantial performance gains across LawBench, GPU kernel optimization, scRNA denoising, and MLE-Bench.
- Offers easy local setup with built-in tasks and the ability to bring custom tasks.
- Open-source under MIT license, available on GitHub.
Why it matters
SIA automates the build, run, and revise cycle for a task-specific agent, so performance on a benchmark can improve over successive generations without manual tuning.
Technical impact
May affect model selection, inference cost, product capability, and evaluation benchmarks.
Official implementation of SIA: Self Improving AI with Harness & Weight Updates (Hebbar et al., 2026) — a self-improving loop where a language-model agent updates both the harness and the weights of a task-specific agent. The paper reports a 56.6% gain on LawBench, 91.9% runtime reduction on GPU kernels, and 502% improvement on single-cell RNA denoising over baseline.
SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.
Just want to try it? Skip to Run SIA locally.
Architecture
Control flow between Meta, Target, and Feedback agents over successive generations.
SIA operates by coordinating three main types of AI agents that work together to continuously improve task performance:
Glossary
Meta-Agent: Reads the task description and generates an initial Target Agent tailored to the task.
Target / Task Specific Agent: Attempts to complete the task and records its actions and results.
Feedback/Improvement Agent: Reviews the Target Agent's performance logs, identifies improvements, and updates the Target Agent accordingly.
This iterative process allows the system to autonomously refine and enhance its ability to solve scientific tasks.
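The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not SIA's actual code: `run_loop`, `Generation`, and the three callables are hypothetical names, the agents are plain functions rather than LLM calls, and weight updates are left out so that only the harness-revision loop is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Generation:
    """One pass of the loop: the agent's source, its log, and its score."""
    index: int
    agent_source: str
    log: str
    score: float


def run_loop(
    task_description: str,
    meta_agent: Callable[[str], str],
    run_target: Callable[[str], Tuple[str, float]],
    feedback_agent: Callable[[str, str], str],
    max_gen: int = 3,
) -> List[Generation]:
    """Hypothetical control flow; the callables stand in for LLM-backed agents."""
    history: List[Generation] = []
    # The meta-agent writes the first target agent from the task text.
    agent_source = meta_agent(task_description)
    for gen in range(1, max_gen + 1):
        # The target agent attempts the task and records a log and a score.
        log, score = run_target(agent_source)
        history.append(Generation(gen, agent_source, log, score))
        if gen < max_gen:
            # The feedback agent reads the log and returns a revised agent.
            agent_source = feedback_agent(agent_source, log)
    return history


if __name__ == "__main__":
    # Toy stand-ins: the "agent" is a string and its score is its revision count.
    for g in run_loop(
        "toy task",
        meta_agent=lambda task: "v0",
        run_target=lambda src: (f"ran {src}", float(src.count("+"))),
        feedback_agent=lambda src, log: src + "+",
    ):
        print(g.index, g.agent_source, g.score)
```

Keeping one record per generation mirrors the idea that each generation leaves behind its own agent and logs for the next round of feedback.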
Benchmark Results
OpenAI MLE-Bench Hard: a gauntlet of real Kaggle ML competitions where agents must write, run, and iterate full ML pipelines. SIA ranks #1 across all generations tested.
LawBench: predict the criminal charge from Chinese court case descriptions across 191 charge categories. SIA-W+H reaches 70.1% Top-1 accuracy, beating the prior SOTA of 45%.
AlphaFold-3 TriMul Triton Kernel: implement and optimize the Triangle Multiplicative Update as a Triton kernel, preserving correctness while hitting H100 latency targets. SIA-W+H achieves a 14x speedup over baseline.
scRNA-seq Denoising: impute missing gene expression values in single-cell RNA sequencing data. SIA-W+H scores 0.289 MSEnorm, surpassing the prior SOTA of 0.220.
Run SIA locally with built-in tasks
SIA ships with four built-in tasks: gpqa, lawbench, longcot-chess, spaceship-titanic.
Install
Pick the Agent backend that matches the LLMs you want to run.
Claude backend (Claude Agent SDK, Claude models only):
python3 -m venv .venv && source .venv/bin/activate
pip install 'sia-agent[claude]'
export ANTHROPIC_API_KEY="..."
OpenHands backend (multi-provider — Gemini, OpenAI, Anthropic, etc.):
python3 -m venv .venv && source .venv/bin/activate
pip install 'sia-agent[openhands]'
Export the key(s) for the provider(s) you'll use:
export ANTHROPIC_API_KEY="..."   # for anthropic/* models
export GEMINI_API_KEY="..."      # for gemini/* models (or GOOGLE_API_KEY)
export OPENAI_API_KEY="..."      # for openai/* models
Full provider/model reference: docs/configuration.md.
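A missing key usually surfaces only once a run is under way, so a small preflight check can save a wasted start. The sketch below is not part of SIA: `KEYS_BY_PREFIX` and `missing_keys` are hypothetical names, and the prefix-to-variable mapping is taken only from the export lines above, so other providers are not covered.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Provider prefix -> environment variables that satisfy it (any one is enough).
# Assumption: this mirrors the export lines above and nothing more.
KEYS_BY_PREFIX: Dict[str, Tuple[str, ...]] = {
    "anthropic/": ("ANTHROPIC_API_KEY",),
    "gemini/": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "openai/": ("OPENAI_API_KEY",),
}


def missing_keys(models: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return one message per model whose provider key is not set."""
    problems = []
    for model in models:
        for prefix, names in KEYS_BY_PREFIX.items():
            # A key counts as set only if it is present and non-empty.
            if model.startswith(prefix) and not any(env.get(n) for n in names):
                problems.append(f"{model}: set {' or '.join(names)}")
    return problems
```

Calling `missing_keys` with the model strings you plan to pass to the CLI returns an empty list when every required key is exported.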
Run
sia --task gpqa --max_gen 5 --run_id 1
Swap --task for any of the four bundled tasks.
Artifacts land in runs/run_{run_id}/gen_{n}/:
target_agent.py — the agent for that generation
agent_execution.json — execution logs
improvement.md — diff rationale (gen 2+)
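To see at a glance which artifacts a run produced, the folder layout above can be walked with `pathlib`. This is a sketch under the assumption that `{n}` in `gen_{n}` is an integer; `summarize_run` is a hypothetical helper, not something SIA ships.

```python
from pathlib import Path
from typing import Dict, List

# File names as listed above.
ARTIFACTS = ("target_agent.py", "agent_execution.json", "improvement.md")


def summarize_run(run_dir: Path) -> Dict[int, List[str]]:
    """Map each generation number to the artifact files present in its folder."""
    summary: Dict[int, List[str]] = {}
    for gen_dir in run_dir.glob("gen_*"):
        suffix = gen_dir.name[len("gen_"):]
        # Skip anything that is not a gen_<integer> directory.
        if gen_dir.is_dir() and suffix.isdigit():
            summary[int(suffix)] = [n for n in ARTIFACTS if (gen_dir / n).is_file()]
    return dict(sorted(summary.items()))
```

For example, `summarize_run(Path("runs/run_1"))` would report the first generation without `improvement.md`, since that file is written from generation 2 onward.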
Common flags
| Flag | Default | Description |
| --- | --- | --- |
| --task | (none) | Bundled task name (mutually exclusive with --task_dir) |
| --task_dir | (none) | Path to an external task directory |
| --max_gen | 3 | Number of self-improvement generations |
| --run_id | 1 | Unique run identifier |
| --backend | claude | claude (Claude Agent SDK) or openhands (multi-provider) |
| --meta_model | haiku | Meta/feedback model (e.g. haiku, sonnet, opus, or gemini/..., openai/... with openhands) |
| --task_model | claude-haiku-4-5-20251001 | Target agent model |
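When launching runs from a script, it can help to assemble the command line in one place. The helper below is a hypothetical wrapper, not part of SIA; it uses only the flags and defaults listed above, and it assumes that exactly one of `--task` or `--task_dir` must be given (the source only states that they are mutually exclusive).

```python
from typing import List, Optional


def build_sia_command(
    task: Optional[str] = None,
    task_dir: Optional[str] = None,
    max_gen: int = 3,
    run_id: int = 1,
    backend: str = "claude",
    meta_model: str = "haiku",
    task_model: str = "claude-haiku-4-5-20251001",
) -> List[str]:
    """Build an argv list for the sia CLI from the flags listed above."""
    # Assumption: exactly one of task / task_dir is required.
    if (task is None) == (task_dir is None):
        raise ValueError("pass exactly one of task or task_dir")
    if backend not in ("claude", "openhands"):
        raise ValueError(f"unknown backend: {backend}")
    source = ["--task", task] if task is not None else ["--task_dir", task_dir]
    return [
        "sia", *source,
        "--max_gen", str(max_gen),
        "--run_id", str(run_id),
        "--backend", backend,
        "--meta_model", meta_model,
        "--task_model", task_model,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`; building a list rather than a shell string avoids quoting problems with paths.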
Full backend, model, and API-key reference: docs/configuration.md. Hit a snag? docs/troubleshooting.md.
Bring your own task
Prepare a task directory with the layout below and point --task_dir at it:
my-task/
├── data/
│   ├── public/
│   │   ├── task.md                  # Task description; SIA reads this
│   │   └── ...                      # Inputs the agent is allowed to see
│   └── private/                     # Held-out eval data; never exposed to the agent
└── reference/
    ├── reference_target_agent.py    # Template; copy from sia/tasks/_shared/
    └── SAMPLE_TASK_DESCRIPTIONS.md  # Optional: example tasks for the meta-agent
sia --task_dir ./my-task --max_gen 5 --run_id 1
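Before pointing `--task_dir` at a new directory, a quick layout check can catch a misplaced file. The sketch below is an assumption-laden helper, not SIA's own validation: `check_task_dir` is a hypothetical name, and it treats `data/public/task.md`, `data/private/`, and `reference/reference_target_agent.py` as required while ignoring the optional `SAMPLE_TASK_DESCRIPTIONS.md`.

```python
from pathlib import Path
from typing import List


def check_task_dir(root: Path) -> List[str]:
    """Return a list of layout problems; an empty list means nothing is missing."""
    # Assumption: these three entries are the required part of the layout.
    required = {
        "data/public/task.md": "file",
        "data/private": "dir",
        "reference/reference_target_agent.py": "file",
    }
    problems = []
    for rel, kind in required.items():
        path = root / rel
        ok = path.is_file() if kind == "file" else path.is_dir()
        if not ok:
            problems.append(f"missing {kind}: {rel}")
    return problems
```

Running `check_task_dir(Path("./my-task"))` and printing the result gives a short to-do list when the directory is incomplete.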
Or bring an MLE-Bench competition. SIA can bootstrap a task directory directly from any MLE-Bench competition — it pulls the dataset via the Kaggle API, sets up the public/private split, and drops in the reference agent template:
python -m sia.prepare_mlebench_dataset -c "spaceship-titanic"
sia --task_dir ./tasks/spaceship-titanic --max_gen 5 --run_id 1
Full step-by-step for both paths: docs/walkthrough.md.
Further reading
docs/architecture.md — directory layout, generation flow, prompt customization
docs/walkthrough.md — detailed custom-task walkthrough
docs/configuration.md — backends, models, API keys, CLI reference
docs/troubleshooting.md — common errors and fixes
Citation
If you use SIA in your research, please cite:
@article{hebbar2026sia,
  title   = {SIA: Self Improving AI with Harness \& Weight Updates},
  author  = {Hebbar, Prannay and Manawat, Yogendra and Verboomen, Samuel and Ivanova, Alesia and Palanimalai, Selvam and Bhatia, Kunal and Baskaran, Vignesh},
  journal = {arXiv preprint arXiv:2605.27276},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.27276}
}
About
hexolabs.com/