2026-06-16 (updated 2026-06-16)

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

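DR-DCI is a retriever-steered Direct Corpus Interaction (DCI) framework that treats retrieval as an agent-callable action for dynamically expanding a local workspace, so that evidence resolution stays both scalable and precise. On Browsecomp-Plus it reaches 71.2% accuracy, or 73.3% with workspace-preserving context reset, improving over raw DCI and ablated variants by up to 8.3 points. It remains effective from 100K to 10M documents, where raw DCI becomes unstable and BM25 performs substantially worse, and it also scales to a 20M-scale Wiki-18 QA setting.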

Source: arXiv AI
Authors: Yi Lu, Zhuofeng Li, Ping Nie, Haoxiang Zhang, Yuyu Zhang, Kai Zou, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang

[Submitted on 12 Jun 2026]

Abstract: Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expose evidence only as ranked results or bounded document views, limiting agents' ability to reorganize material and verify constraints across documents. Direct Corpus Interaction (DCI) addresses this limitation by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, full-corpus terminal commands become slow and unstable as the corpus grows, degrading performance and efficiency. We introduce DR-DCI, a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution. Experiments show that DR-DCI is both effective and efficient across scales. On Browsecomp-Plus, DR-DCI reaches 71.2% accuracy, improving over raw DCI and ablated variants by up to 8.3 points while reducing tool usage, wall time, and estimated cost. With workspace-preserving context reset, accuracy further improves to 73.3%. In corpus-scaling experiments, DR-DCI remains effective from 100K to 10M documents, whereas raw DCI becomes unstable and BM25 performs substantially worse. DR-DCI also scales to a 20M-scale file-per-document Wiki-18 QA setting, achieving an average score of 63.0 across six benchmarks and outperforming retrieval-based and trained search-agent baselines. Ablation analysis further shows that ranked previews and inter-document DCI are key to performance.
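
The workspace-expansion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the term-overlap ranking (a stand-in for BM25 or a dense retriever), and the names `retrieve`, `Workspace`, `expand`, and `grep` are all hypothetical and chosen only for illustration. The sketch shows retrieval used as an action that copies documents into a local file-per-document directory, with a regex scan (standing in for shell-style DCI operations) that touches only that directory.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical toy corpus standing in for a large document collection.
CORPUS = {
    "doc1": "Ada Lovelace wrote notes on the Analytical Engine in 1843.",
    "doc2": "Charles Babbage designed the Analytical Engine.",
    "doc3": "Grace Hopper worked on the Harvard Mark I computer.",
    "doc4": "The Eiffel Tower was completed in 1889.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, k=2):
    """Toy term-overlap ranking (assumption: stands in for a real retriever)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(body)), doc_id) for doc_id, body in CORPUS.items()]
    # Keep only documents that share a term; break score ties by document id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


class Workspace:
    """Local directory that grows as the agent pulls documents into it."""

    def __init__(self, root):
        self.root = Path(root)

    def expand(self, query, k=2):
        """Retrieval as an action: write the top-k hits into the workspace."""
        added = []
        for doc_id in retrieve(query, k):
            path = self.root / f"{doc_id}.txt"
            if not path.exists():
                path.write_text(CORPUS[doc_id], encoding="utf-8")
                added.append(doc_id)
        return added

    def grep(self, pattern):
        """Local operation: regex scan over workspace files only."""
        rx = re.compile(pattern, re.IGNORECASE)
        return sorted(
            p.stem
            for p in self.root.glob("*.txt")
            if rx.search(p.read_text(encoding="utf-8"))
        )


def demo():
    """Expand the workspace twice, then search it for a 19th-century year."""
    with tempfile.TemporaryDirectory() as tmp:
        ws = Workspace(tmp)
        first = ws.expand("Analytical Engine")
        second = ws.expand("Harvard Mark I computer", k=1)
        hits = ws.grep(r"\b18\d\d\b")
        return first, second, hits
```

Calling `demo()` pulls `doc1` and `doc2` into the workspace, then `doc3`, and the final scan matches only `doc1`. The document `doc4` also contains a matching year, but it was never retrieved, so the local scan never touches it: the cost of each local operation depends on the workspace size rather than on the corpus size.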

Comments: 25 pages, 4 figures, 22 tables

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2606.14885 [cs.AI]

(or arXiv:2606.14885v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.14885

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Dongfu Jiang
[v1] Fri, 12 Jun 2026 18:46:18 UTC (342 KB)
