AI News Hub (LIVE)
Public articles: 79 | Collected articles: 85 | Trust: 84 | Refresh: 30 min
Health: Healthy | Source type: Research | Full-text rights: Full text allowed | Last ingested: 2026-06-26 | ID: langchain-blog | Status: Enabled

Technical tool blog; verify individual post terms before full body display.

Latest public articles

Prompt Caching with Deep Agents

Learn how Deep Agents uses prompt caching to cut LLM token costs by up to 80% across every major model provider, with no extra config required.

  • Prompt caching reduces token costs by 41-80% by storing model state after processing a prompt.
  • Different providers have varying support for caching features, making provider-agnostic optimization tricky.

June 2026: LangChain Newsletter — Fleet On-Call Copilot, Deep Agents Rubrics, and More

New in LangSmith: a Fleet on-call copilot for alert triage, computer use for agents, voice trace debugging, and experiment status tracking. Plus Deep Agents Rubrics, programmatic subagents, a new LangSmith Deployment course, and upcoming events in Chicago, Berlin, DC, and Vegas.

  • Fleet On-Call Copilot: a prebuilt agent template that triages alerts and drafts updates using code, traces, and runbooks.
  • Computer Use: agents can now operate an isolated virtual computer for code, files, and authenticated API calls.

Why the Best AI Agents Are Simple: Sierra’s Zack Reneau-Wedeen on the Max Agency Podcast

On the Max Agency Podcast, Zack Reneau-Wedeen discusses the future of AI agents, advocating for simple architectures, outcome-based pricing, and avoiding 'org chart shipping.' He shares insights from building customer-facing agents at Sierra.

  • Simple agent architectures outperform complex multi-agent systems.
  • Outcome-based pricing aligns incentives for high-value tasks.

How Klarna's AI assistant redefined customer support at scale for 85 million active users

Klarna's AI assistant, built on LangGraph and LangSmith, handles the work of 700 full-time staff, reducing customer query resolution time by 80% and automating 70% of repetitive support tasks.

  • Klarna's AI assistant handles over 2.5 million conversations, performing the work of 700 full-time employees.
  • The assistant reduced average customer query resolution time by 80% and automated ~70% of repetitive tasks.

How LangSmith and LangChain OSS Help You Meet EU AI Act Requirements

The EU AI Act compliance deadline is August 2, 2026. This article explains what the Act requires for high-risk AI systems and how LangSmith and LangChain OSS help meet each requirement through full observability, automated evaluations, human oversight, and more.

  • EU AI Act requires risk management, automatic logging, transparency, human oversight, and post-market monitoring for high-risk AI systems.
  • LangSmith provides end-to-end tracing capturing every agent input, reasoning step, tool call, and output.

How to Build Memory into AI Agents

A practical guide to adding memory to AI agents, covering short-term and long-term memory concepts, trace analysis, and how LangSmith's tools enable a complete memory loop for agent improvement across runs.

  • Memory enables agents to remember user preferences and corrections, reducing repeated instruction.
  • Short-term memory handles current tasks; long-term memory persists facts, preferences, and skills.

Introducing LangSmith’s No Code Agent Builder

LangSmith launches a no-code agent builder that enables non-technical users to create AI agents with memory, guided prompts, and MCP tools. The builder uses conversational guidance, built-in memory, and sub-agents to lower the barrier for agent development, suitable for internal productivity use cases.

  • LangSmith Agent Builder offers a no-code experience with memory and guided prompt creation.
  • Agents consist of four core components: prompt, tools, triggers, and sub-agents.

How Factory Used LangSmith to Automate Feedback Loop and Double Iteration Speed

Factory AI leveraged LangSmith's observability and feedback API to close the product feedback loop, achieving a 2x improvement in iteration speed and significant reductions in development cycle time.

  • Factory integrated LangSmith with AWS CloudWatch for enhanced observability and debugging.
  • Using LangSmith's Feedback API, Factory automated prompt optimization, reducing manual effort.

Introducing Open SWE: An Open-Source Asynchronous Coding Agent

Open SWE is an open-source, cloud-hosted coding agent that autonomously handles GitHub tasks—planning, coding, testing, and opening PRs. It features a multi-agent architecture, human-in-the-loop control, and asynchronous execution.

  • Open SWE is an open-source, async, cloud-hosted coding agent that integrates directly with GitHub.
  • It uses a multi-agent architecture (Planner, Programmer, Reviewer) to ensure code quality.

Monte Carlo: Building Data + AI Observability Agents with LangGraph and LangSmith

Monte Carlo built an AI Troubleshooting Agent on LangGraph and debugged with LangSmith to help data teams resolve issues faster by exploring multiple investigation paths in parallel.

  • Monte Carlo used LangGraph to create a dynamic graph for automated, parallel troubleshooting.
  • LangSmith enabled visualization and rapid iteration of prompts from day one.

Sharing LangSmith Benchmarks

LangSmith launches public benchmarks and evaluation dataset sharing to help developers compare LLM architecture performance. The first benchmark is a Q&A dataset over LangChain docs, accompanied by the langchain-benchmarks package. The article analyzes various models and architectures, providing insights into performance and debugging.

  • LangSmith now supports sharing evaluation datasets and results for community-driven benchmarks.
  • The initial benchmark is a Q&A dataset over LangChain docs to test RAG systems.

LangSmith: Redesigned product homepage and Resource Tags for better organization

LangSmith's homepage is now organized into Observability, Evaluation, and Prompt Engineering, with improved Resource Tags for flexible resource grouping. Onboarding guides and upcoming attribute-based access control (ABAC) are meant to improve usability.

  • Homepage divided into three sections: Observability, Evaluation, and Prompt Engineering.
  • Resource Tags now support flexible grouping by 'Application' or custom tags.

Agent Engineering: A New Discipline

Agent engineering is an emerging discipline that integrates product thinking, engineering, and data science to build reliable LLM agents through rapid iteration and production feedback. It addresses the unpredictability of agents by cycling through build, test, ship, observe, and refine, as practiced by companies like Clay, Vanta, LinkedIn, and Cloudflare.

  • Agent engineering is an iterative process: build, test, ship, observe, refine, repeat.
  • It combines product thinking (scope and behavior), engineering (infrastructure), and data science (measurement and improvement).

Testing Fine Tuned Open Source Models in LangSmith

Evaluate and compare fine-tuned open source LLMs using LangSmith. Test multiple models, automate evaluations, and choose the best-performing model.

  • LangSmith provides UI and API to create evaluation datasets for easy model comparison.
  • The post fine-tunes Llama2-7b (on 78k rows) and Llama2-13b (on 10k rows) for SQL generation.

Human judgment in the agent improvement loop

AI agents work best when they reflect the knowledge and judgment your team has built over time. This article explores how to integrate human judgment into each stage of agent development, using a trader copilot example. It covers workflow design, tool design, and context engineering, and emphasizes the importance of automated evaluations and continuous iteration.

  • Agents need tacit knowledge from domain experts.
  • Human judgment can be embedded through workflow, tool, and context design.

Context Management for Deep Agents

Learn how Deep Agents SDK manages context for long-running AI tasks through offloading, summarization, and filesystem abstraction to prevent context rot.

  • Three compression techniques: offloading large tool results (>20K tokens), offloading large tool inputs (once context usage exceeds 85%), and summarization (when offloading is insufficient).
  • Offloaded content is saved to filesystem with pointers; agent can retrieve via file operations.
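The ordering of the three techniques above can be sketched as a small decision function. This is only an illustration under stated assumptions: the two thresholds come from the summary, but `ContextState`, `choose_action`, and the `can_offload_inputs` flag are hypothetical names, not the Deep Agents SDK API, and the summary does not say how tokens are counted.

```python
from dataclasses import dataclass

# Thresholds taken from the summary above; everything else is an assumption.
TOOL_RESULT_LIMIT = 20_000   # offload tool results larger than this many tokens
CONTEXT_PRESSURE = 0.85      # react once the context window is this full


@dataclass
class ContextState:
    used_tokens: int     # tokens currently in the context window
    window_tokens: int   # total size of the context window


def choose_action(state, tool_result_tokens=0, can_offload_inputs=True):
    """Return the compression step to apply (hypothetical helper)."""
    # 1. Oversized tool results are offloaded regardless of context pressure.
    if tool_result_tokens > TOOL_RESULT_LIMIT:
        return "offload_tool_result"
    # 2. Under context pressure, offload tool inputs if that is still possible,
    #    and fall back to summarization when it is not.
    if state.used_tokens / state.window_tokens > CONTEXT_PRESSURE:
        return "offload_tool_inputs" if can_offload_inputs else "summarize"
    # 3. Otherwise leave the context untouched.
    return "keep"


print(choose_action(ContextState(90_000, 100_000)))
```

In this sketch summarization is the last resort, matching the summary's note that it applies when offloading is insufficient.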

The Art of Loop Engineering

This post explores how to build reliable AI agents by designing loops, not just using a good model. It introduces four nested loops: the agent loop, verification loop, event-driven loop, and hill climbing loop, each building on the previous to create agents that work consistently and improve over time. Using LangChain primitives, developers can implement each level and embed human oversight where needed.

  • The agent loop lets the model call tools repeatedly to complete tasks. It's the fundamental loop.
  • The verification loop checks output quality and provides feedback, ensuring consistency.
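The first two loops described above can be sketched in plain Python. This is a minimal illustration, not the article's code and not LangChain primitives: `run_agent_loop`, `run_with_verification`, `toy_model`, and the dict-shaped actions are all hypothetical stand-ins for a real model and tool set.

```python
def run_agent_loop(model, tools, task, max_steps=5):
    """Agent loop: keep calling tools until the model returns a final answer."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = model(history)  # hypothetical: returns a dict describing the next step
        if action["type"] == "final":
            return action["content"]
        result = tools[action["tool"]](action["input"])
        history.append(("tool", result))
    raise RuntimeError("agent did not finish within max_steps")


def run_with_verification(model, tools, task, verify, max_attempts=3):
    """Verification loop: rerun the agent loop with feedback until verify passes."""
    feedback = None
    for _ in range(max_attempts):
        prompt = task if feedback is None else f"{task}\nFeedback: {feedback}"
        answer = run_agent_loop(model, tools, prompt)
        ok, feedback = verify(answer)
        if ok:
            return answer
    raise RuntimeError("no answer passed verification")


def toy_model(history):
    # Stand-in for an LLM: call the "add" tool once, then report its result.
    if history[-1][0] == "tool":
        return {"type": "final", "content": str(history[-1][1])}
    return {"type": "tool", "tool": "add", "input": (2, 3)}


tools = {"add": lambda pair: pair[0] + pair[1]}
verify = lambda answer: (answer == "5", "expected the sum of 2 and 3")
print(run_with_verification(toy_model, tools, "add 2 and 3", verify))
```

The verification loop wraps the agent loop rather than replacing it, which mirrors the summary's point that each loop builds on the previous one.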

Why Fleet Has General Purpose Chat and Specialized Agents

Fleet supports both quick, ad hoc tasks and recurring responsibilities. See how General Purpose Chat and Specialized Agents help teams delegate work.

  • Two patterns of agent work: ad hoc and recurring. Fleet uses General Purpose Chat for one-off tasks and Specialized Agents for repetitive work.
  • Specialized Agents offer configurable instructions, tools, models, subagents, skills, triggers, and persistent memory.

Building a 100x Cheaper Trace Judge with Fireworks

LangChain and Fireworks fine-tuned an open model to mine perceived error signals from production traces, matching frontier model performance at a fraction of the cost.

  • LangSmith processes billions of tokens daily across production traces.
  • Fine-tuned Qwen model detects 'Perceived Error' at frontier performance with 100x cost savings.

What is an AI agent?

The article explores the definition of AI agents, proposing that an agent is a system that uses an LLM to decide the control flow of an application. The author agrees with Andrew Ng that agent capabilities are a spectrum and introduces the concept of 'agentic' behavior, discussing its implications for development, operation, evaluation, and monitoring.

  • An AI agent is a system that uses an LLM to determine the control flow of an application.
  • Agent capabilities exist on a spectrum, from simple routing to highly autonomous agents.

How we built LangChain's GTM Agent

LangChain built a GTM agent using Deep Agents that automates lead research, drafting, and account intelligence, achieving a 250% increase in lead conversion and saving 40 hours per rep per month.

  • Agent automates outbound and inbound lead processing with human-in-the-loop approval via Slack.
  • Uses Deep Agents for multi-step orchestration and LangSmith for evaluations and feedback.

Introducing Align Evals: Streamlining LLM Application Evaluation

Align Evals is a new feature in LangSmith that helps you calibrate your evaluators to better match human preferences.

  • Align Evals reduces mismatches between LLM evaluator scores and human judgment.
  • Provides a playground-like interface and baseline alignment score for iterative prompt improvement.

How and when to build multi-agent systems

This article analyzes two seemingly opposing blog posts—'Don't Build Multi-Agents' by Cognition and 'How we built our multi-agent research system' by Anthropic—and finds they share common insights about when and how to build multi-agent systems. Key points include the critical role of context engineering, the relative ease of read-oriented vs. write-oriented multi-agent systems, and production reliability challenges. It also highlights how tools like LangGraph and LangSmith address these challenges.

  • Context engineering is the most critical part of building multi-agent systems, requiring dynamic communication of task context to models.
  • Multi-agent systems focused on 'reading' (e.g., research) are easier than those focused on 'writing' (e.g., coding), as writing requires more complex coordination and merging.

Pushing LangSmith to new limits with Replit Agent's complex workflows

Learn how Replit Agent leverages LangSmith's observability features to debug complex agent workflows, including improvements in trace performance, search, and human-in-the-loop threads.

  • Replit Agent uses LangGraph and LangSmith for monitoring and debugging.
  • LangSmith was enhanced to handle large traces with hundreds of steps.

Recap of Interrupt 2025: The AI Agent Conference by LangChain

Interrupt 2025, LangChain's first industry conference, gathered 800 people in San Francisco. Keynote themes included Agent Engineering as a new discipline, multi-model LLM apps, LangGraph for reliable agents, and AI observability. Product launches included LangGraph Platform GA, Open Agent Platform, LangGraph Studio v2, LangGraph Pre-Builts, LangSmith observability updates, Open Evals, and LLM-as-Judge private preview.

  • LangChain held its first Interrupt conference, focusing on AI agents.
  • Several new products were announced, including LangGraph Platform GA and Open Agent Platform.

Pairwise Evaluations with LangSmith

Learn what pairwise evaluation is, why you might need it for LLM app development, and see an example of how to use it in LangSmith by LangChain.

  • Pairwise evaluation compares two LLM outputs directly to better capture human preferences.
  • LangSmith introduces custom pairwise evaluators for flexible comparison based on any criteria.
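The idea of pairwise evaluation can be sketched without any SDK. The example below is an assumption-laden illustration, not the LangSmith API: `pairwise_eval` and `shorter_is_better` are hypothetical names, and the length-based judge is a toy criterion standing in for an LLM or human judge.

```python
from collections import Counter


def pairwise_eval(examples, judge):
    """Return the share of examples where output A wins, B wins, or they tie."""
    if not examples:
        raise ValueError("need at least one example")
    tally = Counter()
    for ex in examples:
        # judge returns "a", "b", or "tie" for one pair of outputs
        tally[judge(ex["input"], ex["a"], ex["b"])] += 1
    return {key: tally[key] / len(examples) for key in ("a", "b", "tie")}


def shorter_is_better(_input, a, b):
    # Toy criterion: prefer the more concise output.
    if len(a) == len(b):
        return "tie"
    return "a" if len(a) < len(b) else "b"


examples = [
    {"input": "q1", "a": "yes", "b": "yes, definitely"},
    {"input": "q2", "a": "a long answer", "b": "short"},
    {"input": "q3", "a": "same", "b": "also"},
    {"input": "q4", "a": "ok", "b": "okay"},
]
print(pairwise_eval(examples, shorter_is_better))
# → {'a': 0.5, 'b': 0.25, 'tie': 0.25}
```

Because the judge sees both outputs side by side, any comparison criterion can be swapped in without changing the tallying code.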

Build and deploy a RAG app with Pinecone Serverless

A guide to building production-ready RAG apps using Pinecone Serverless, LangChain, and LangServe, addressing pain points like vectorstore management, rapid deployment, and observability.

  • Pinecone Serverless offers usage-based pricing and unlimited scalability, solving hosted vectorstore challenges.
  • LangServe enables rapid deployment of LangChain chains as production web services.

Quickly Start Evaluating LLMs With OpenEvals

OpenEvals and AgentEvals provide pre-built evaluators for LLM-as-judge, structured data, and agent trajectory evaluation. These open-source packages help developers quickly establish evaluation workflows to ensure reliability of LLM applications.

  • OpenEvals and AgentEvals offer ready-to-use evaluators covering LLM-as-judge, structured data, and agent trajectory evaluation.
  • LLM-as-judge evaluators are customizable with few-shot examples and scoring schemas, suitable for conversational quality, hallucination detection, and more.

How to think about agent frameworks

Learn to build reliable AI agents. Compare workflows vs agents, declarative vs imperative approaches, and why context control matters most.

  • The hard part of building reliable agents is controlling the context passed to the LLM at each step.
  • Agentic systems include both workflows and agents; most production systems are a mix.

Aligning LLM-as-a-Judge with Human Preferences

LangSmith introduces self-improving LLM-as-a-Judge evaluators that leverage human corrections as few-shot examples to align evaluations with human preferences without prompt engineering.

  • LLM-as-a-Judge evaluators are popular for grading natural language outputs but require careful prompt engineering.
  • LangSmith's new feature stores human corrections as few-shot examples to improve evaluator alignment over time.
