2026-06-07 06:25 UTC (updated 2026-06-30 13:03 UTC)

Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b

Harness-1 is a 20B retrieval subagent built on gpt-oss-20b and trained with reinforcement learning inside a stateful search harness. The harness handles the bookkeeping (candidate pool, curated set, evidence graph, and verification records), while the policy focuses on search, curation, and verification decisions. It reaches 0.730 average curated recall across eight benchmarks. That is 11.4 points ahead of the next-best open subagent and behind only Opus-4.6. The weights and harness code are public.

Source: MarkTechPost | Author: Asif Razzaq

Most search agents are trained as policies over a growing transcript. The model decides how to search, but it must also remember what it has seen, which evidence matters, and which claims it has already checked. A team of researchers from the University of Illinois Urbana-Champaign, UC Berkeley, and Chroma argues that this asks too much of a single policy: reinforcement learning ends up optimizing search decisions and routine bookkeeping at the same time.

Their answer is Harness-1, a 20B retrieval subagent built on gpt-oss-20b. It was trained with reinforcement learning inside a stateful search harness. The harness holds the bookkeeping. The policy keeps the semantic decisions. The weights and harness code are publicly released.
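The division of labor between harness and policy can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the released Harness-1 code. The names `HarnessState`, `Action`, `apply_action`, `render_observation`, and `curated_recall` are hypothetical. So are the three action kinds, the substring-match stand-in for verification, and the definition of curated recall as the fraction of gold documents that end up in the curated set. What the sketch shows is the shape of the idea. The policy emits only a small action. The harness updates the candidate pool, curated set, evidence graph, and verification records, and hands back a compact summary instead of a growing transcript.

```python
from dataclasses import dataclass, field

# Hypothetical harness state; field names mirror the article's description,
# but the structure is an illustrative assumption, not the released code.
@dataclass
class HarnessState:
    candidate_pool: dict[str, str] = field(default_factory=dict)         # doc_id -> text seen in results
    curated_set: set[str] = field(default_factory=set)                   # doc_ids the policy chose to keep
    evidence_graph: dict[str, set[str]] = field(default_factory=dict)    # claim -> supporting doc_ids
    verification_records: dict[str, bool] = field(default_factory=dict)  # claim -> verified?

# Hypothetical action emitted by the policy: the only thing the model decides.
@dataclass
class Action:
    kind: str          # "search", "curate", or "verify" (assumed action set)
    query: str = ""
    doc_id: str = ""
    claim: str = ""

def apply_action(state: HarnessState, action: Action, search_fn) -> None:
    """Harness-side bookkeeping: update state in response to a policy action."""
    if action.kind == "search":
        # Record every hit once; the policy never has to remember them itself.
        for doc_id, text in search_fn(action.query):
            state.candidate_pool.setdefault(doc_id, text)
    elif action.kind == "curate":
        # The harness enforces that only documents actually seen can be curated.
        if action.doc_id in state.candidate_pool:
            state.curated_set.add(action.doc_id)
    elif action.kind == "verify":
        # Toy verification: a curated doc "supports" a claim if it contains it.
        supporting = {
            d for d in state.curated_set
            if action.claim.lower() in state.candidate_pool[d].lower()
        }
        state.evidence_graph[action.claim] = supporting
        state.verification_records[action.claim] = bool(supporting)
    else:
        raise ValueError(f"unknown action kind: {action.kind}")

def render_observation(state: HarnessState) -> str:
    """Compact state summary shown to the policy instead of a full transcript."""
    verified = sorted(c for c, ok in state.verification_records.items() if ok)
    return (f"candidates={len(state.candidate_pool)} "
            f"curated={sorted(state.curated_set)} verified={verified}")

def curated_recall(curated: set[str], gold: set[str]) -> float:
    """Assumed definition: fraction of gold documents present in the curated set."""
    return len(curated & gold) / len(gold) if gold else 0.0

if __name__ == "__main__":
    corpus = {"d1": "gpt-oss-20b is a 20B model",
              "d2": "Chroma builds retrieval tools",
              "d3": "unrelated text"}

    def toy_search(query):
        # Keyword match over the toy corpus; stands in for a real retriever.
        words = query.lower().split()
        return [(k, v) for k, v in corpus.items() if any(w in v.lower() for w in words)]

    state = HarnessState()
    for act in [Action("search", query="retrieval model"),
                Action("curate", doc_id="d1"),
                Action("curate", doc_id="d2"),
                Action("verify", claim="20B model")]:
        apply_action(state, act, toy_search)
    print(render_observation(state))
    print(curated_recall(state.curated_set, {"d1", "d3"}))
```

Keeping the state in explicit structures means the policy's observation stays small no matter how many searches it has run. In this toy version, invariants such as "only curate what you have seen" are enforced by the harness rather than learned. That is the kind of bookkeeping the authors argue reinforcement learning should not have to optimize.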

https://arxiv.org/pdf/2606.02373