2026-07-02 00:00 UTCOriginal source2 min readUpdated: 2026-07-02 17:31 UTC

Multi-Agent Teams Hold Experts Back

Self-organizing LLM teams fail to match the performance of their best expert agent, with losses up to 41.1%. The primary bottleneck is leveraging expert knowledge rather than identifying experts. Teams tend toward integrative compromise, which harms performance but improves robustness against adversarial agents.

SourceApple Machine Learning Research

content type paperpublished July 2026

AuthorsAneesh Pappu†, Batu El†, Hancheng Cao‡, Carmelo di Nolfo, Yanchao Sun, Meng Cao, James Zou†

View publication

Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that — unlike human teams — LLM teams consistently fail to match their expert agent’s performance, even when explicitly told who the expert is, incurring performance losses of up to 41.1% on ML benchmarks. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise — averaging expert and non-expert views rather than appropriately weighting expertise — which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.

† Stanford University

‡ Emory University

AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents

January 9, 2026research area Human-Computer Interaction

Interface agents powered by generative AI models (referred to as “agents”) can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the…

Towards Learning Multi-Agent Negotiations via Self-Play

January 28, 2019research area Computer VisionWorkshop at ICCV

Making sophisticated, robust, and safe sequential decisions is at the heart of intelligent systems. This is especially critical for planning in complex multi-agent environments, where agents need to anticipate other agents’ intentions and possible future actions. Traditional methods formulate the problem as a Markov Decision Process, but the solutions often rely on various assumptions and become brittle when presented with corner cases. In…