2026-06-11 · 3 min read · Updated: 2026-06-12

How Benchling builds agents when the smartest AI isn't smart enough

Benchling's Head of AI Nicholas Larus-Stone discusses building agents for life sciences on the Max Agency podcast. He explains their multi-model approach for quality, production trace review processes, and how agents compress workflows to accelerate scientific discovery. Benchling AI launched in October 2025 on top of a 14-year-old data platform.

Source: LangChain Blog

Max Agency Podcast

June 11, 2026


Nicholas Larus-Stone is the Head of AI at Benchling, the R&D data platform that life science companies use to store and manage their experiments, samples, instruments, and analysis. Benchling has been around since 2012. In October 2025, it launched Benchling AI, an intelligence layer with a chat interface, backed by an agent, that helps scientists find data, design experiments, and write reports. Nick came to Benchling through its acquisition of Sphinx Bio, the analysis startup he founded.

In this conversation with LangChain Co-Founder & CEO Harrison Chase, Nick walks through what it takes to build agents for scientific work, including where the playbook from coding agents holds up and where it breaks down.

🎧 Watch the full conversation on YouTube, or listen & subscribe on Apple Podcasts or Spotify.

What we learned

Why Benchling runs multiple models on the same task

Instead of running the same model multiple times, Benchling runs the same task across models from different providers. Different model families make different mistakes, so agreement across them is a stronger quality signal than repeated runs of a single model. When the models agree, that indicates good data quality; when they disagree, there is usually an error.

"Each of them will make slightly different errors... being able to ask different model providers, we found gives us much better performance."
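The cross-provider check described above can be sketched roughly as follows. This is a minimal illustration, not Benchling's implementation: the three provider functions are hypothetical stubs returning canned answers, and `normalize`, `cross_check`, and the `min_agreement` threshold are names assumed here for the example.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical stand-ins for three different model providers.
# In a real system each would wrap an API client; here they return canned text.
def provider_a(prompt: str) -> str:
    return "12.5 mg/mL"

def provider_b(prompt: str) -> str:
    return "12.5 mg/ml "

def provider_c(prompt: str) -> str:
    return "125 mg/mL"

def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())

def cross_check(
    prompt: str,
    providers: Dict[str, Callable[[str], str]],
    min_agreement: float = 1.0,  # assumed threshold: require unanimity by default
) -> dict:
    """Ask every provider the same question and measure how much they agree."""
    answers = {name: normalize(fn(prompt)) for name, fn in providers.items()}
    votes = Counter(answers.values())
    top_answer, top_count = votes.most_common(1)[0]
    agreement = top_count / len(answers)
    return {
        "answer": top_answer,
        "agreement": agreement,
        # Disagreement is treated as a signal that something is likely wrong.
        "needs_review": agreement < min_agreement,
        "answers": answers,
    }

result = cross_check(
    "What concentration is reported for sample S-1?",
    {"a": provider_a, "b": provider_b, "c": provider_c},
)
print(result["answer"], result["needs_review"])
```

In this toy run two providers agree and one differs, so the result is flagged for review instead of being silently accepted. Exact string matching after normalization is the simplest possible comparison; structured or numeric outputs would likely need a task-specific notion of agreement.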


How Benchling approaches trace review

In scientific research, evals can only get you so far, so Benchling leans on a structured process for reviewing production traces. Each week, a rotating "fire chief" flags and addresses issues, which are then covered in the weekly tech operations meeting. For external signals, the team looks at thumbs-up and thumbs-down user feedback.

"People who are working on specific features are gonna go look at the traces — our product managers, our engineers who are building something will actually go and see how people are using that feature after releasing it."


Agents are having a big impact in scientific work

Nick points out that agents are compressing workflows and reducing the number of experiments needed to get an answer. Because agents cut the dead time between steps, a day saved can often become a week saved. Agents are also helping scientists design experiments more rigorously upfront, which reduces the number of runs needed to reach a conclusion.
