DeepSeek Researcher Develops Automated Research Skill: Writing a Paper with Only 2 Hours of Human Brain Time
DeepSeek researcher Chen Deli used his self-developed DeliAutoResearch skill, collaborating with DeepSeek-V4-Pro and GPT-Image2, to complete a 46-page paper in just 6 days. The paper introduces an L1-L5 autonomy classification for research agents, analyzes four architectural patterns and 17 mainstream systems, and identifies six open problems. Chen Deli says only about 2 hours of human 'CPU time' were needed, with the rest handled by AI agents.
Article intelligence
Key points
- By Chen Deli's account, his DeliAutoResearch skill let AI agents write about 99% of the paper.
- The paper proposes an L1-L5 autonomy classification for research agents, analogous to SAE levels for autonomous driving.
- Four architectural patterns are analyzed: single-agent loop, multi-agent collaboration, hierarchical scheduling, and tool-enhanced execution.
- Six open problems are highlighted, including cognitive loop traps, context limitations, and innovation assessment.
Why it matters
This matters because, by Chen Deli's account, AI agents wrote about 99% of the paper with under two hours of his own effort, work he says would previously have taken at least a month.
Technical impact
May affect model selection, inference cost, product capability, and evaluation benchmarks.
DeepSeek researcher Chen Deli has demonstrated a striking example of AI-driven research automation. Using his self-developed DeliAutoResearch skill, Chen worked with DeepSeek-V4-Pro for research and writing and with GPT-Image2 for figures to produce a 46-page survey paper in six days. The paper went through six iterations in total (four on V1, one on V2, and one on V3), with approximately 108 agent calls, 648,000 tokens consumed, and 2,234 lines of LaTeX generated. All 103 references were verified, and the paper includes seven figures and four tables.
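The reported totals imply a few simple averages. The sketch below derives them from the article's figures alone. The variable names are illustrative, and because the article does not say how tokens or calls were spread across iterations, these are overall means, not per-iteration measurements.

```python
# Totals as reported in the article (agent calls are approximate).
iterations = {"V1": 4, "V2": 1, "V3": 1}  # iteration count per paper version
agent_calls = 108                         # approximate total agent calls
tokens = 648_000                          # total tokens consumed

total_iterations = sum(iterations.values())
tokens_per_call = tokens / agent_calls              # mean tokens per agent call
calls_per_iteration = agent_calls / total_iterations  # mean calls per iteration

print(total_iterations)     # 6
print(tokens_per_call)      # 6000.0
print(calls_per_iteration)  # 18.0
```

So, on average, each agent call used about 6,000 tokens and each iteration involved about 18 calls, assuming the approximate call count is close to exact.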
Chen Deli claims that only about 1% of the paper was directly written by him, with the remaining 99% generated by AI agents. The human effort, he notes, amounted to less than two hours of "CPU time" for his brain, whereas similar work would have previously required at least a month. The paper itself addresses the chaotic landscape of autonomous research agents by introducing a clear L1–L5 autonomy classification system, inspired by the SAE levels for autonomous driving.
The classification ranges from L1 (basic autocomplete, like early GitHub Copilot) to L5 (fully autonomous agenda-setting, still unrealized). According to the paper, the current frontier is at L4, where agents can execute multi-step experiments and write papers within a restricted domain, but cannot independently choose research questions. The paper argues that the true bottlenecks are not model capabilities but "continuous knowledge accumulation" and "reliable self-assessment."
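One way to make the classification concrete is as an ordered type, which mirrors how SAE levels are compared. This is a hypothetical encoding, not code from the paper: `AutonomyLevel`, `CURRENT_FRONTIER`, `is_realized`, and `chooses_own_questions` are illustrative names, and L2 and L3 are left as placeholders because the article does not describe them.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Research-agent autonomy levels, modeled on SAE driving levels.

    Only L1, L4, and L5 are characterized in the article; L2 and L3 are
    placeholders whose definitions are not given there.
    """
    L1 = 1  # basic autocomplete (e.g., early GitHub Copilot)
    L2 = 2  # not described in the article
    L3 = 3  # not described in the article
    L4 = 4  # multi-step experiments and paper writing in a restricted domain
    L5 = 5  # fully autonomous agenda-setting (still unrealized)


# The paper places the current frontier at L4.
CURRENT_FRONTIER = AutonomyLevel.L4


def is_realized(level: AutonomyLevel) -> bool:
    """True if systems at this level exist today, per the frontier claim."""
    return level <= CURRENT_FRONTIER


def chooses_own_questions(level: AutonomyLevel) -> bool:
    """L4 agents cannot pick research questions; only L5 sets its own agenda."""
    return level >= AutonomyLevel.L5
```

Using `IntEnum` keeps levels comparable with `<=` and `>=`, so "at or below the frontier" becomes a single comparison.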
In addition to the autonomy levels, the paper identifies four major architectural patterns for research agents: single-agent loop (e.g., ReAct, Reflexion), multi-agent collaboration (e.g., CAMEL, AutoGen), hierarchical scheduling (e.g., Claude Code, Devin), and tool-enhanced execution (e.g., SWE-Agent). Each pattern has its own strengths and suits different kinds of tasks. The paper then evaluates 17 existing autonomous research systems against a six-dimensional feature matrix, finding that the field has moved from fragile early prototypes to specialized L4 systems, with code agents the most mature.
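The article names ReAct as an example of the single-agent loop pattern. Below is a minimal sketch of that pattern's control flow (think, act with a tool, observe, repeat), assuming a plain policy function stands in for a real model call. `SingleAgentLoop`, `Step`, `scripted_policy`, and the `lookup` tool are hypothetical names; this is not code from the paper or from any of the systems it surveys.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str
    action: str    # a tool name, or "finish" to return an answer
    argument: str  # tool input, or the final answer when action == "finish"


@dataclass
class SingleAgentLoop:
    """Minimal ReAct-style loop over a shared text transcript."""
    policy: Callable[[List[str]], Step]  # hypothetical stand-in for an LLM call
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 5
    transcript: List[str] = field(default_factory=list)

    def run(self, task: str) -> Optional[str]:
        self.transcript.append(f"Task: {task}")
        for _ in range(self.max_steps):
            step = self.policy(self.transcript)
            self.transcript.append(f"Thought: {step.thought}")
            if step.action == "finish":
                return step.argument
            self.transcript.append(f"Action: {step.action}({step.argument})")
            observation = self.tools[step.action](step.argument)
            self.transcript.append(f"Observation: {observation}")
        return None  # step budget exhausted without an answer


def scripted_policy(transcript: List[str]) -> Step:
    # Hypothetical policy: call the lookup tool once, then answer with its result.
    if not any(line.startswith("Observation:") for line in transcript):
        return Step("I should look this up.", "lookup", "frontier level")
    return Step("I have what I need.", "finish", transcript[-1].split(": ", 1)[1])


agent = SingleAgentLoop(policy=scripted_policy, tools={"lookup": lambda q: "L4"})
print(agent.run("What is the current autonomy frontier?"))  # L4
```

The `max_steps` budget is one crude guard against the kind of runaway loop the paper lists among its open problems; a real agent would need more than a step cap.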
Finally, the paper outlines six open problems: cognitive loop traps, context limitations, innovation assessment, reproducibility, safety and ethics, and cost issues. Chen Deli also shares a personal note: thanks to AI agents, he has been able to resume blogging and other creative work that he had put aside due to burnout. He emphasizes that the human role is shifting from executor to initiator.