LangChain AI News

Source Mix

LangChain Blog: 36
Hacker News AI: 4
arXiv Machine Learning: 3
KDnuggets: 2
Machine Learning Mastery: 2
arXiv AI: 1
Latent Space: 1
NVIDIA Blog: 1

Topic Mix

Agents: 50
Research: 32
Models: 18
Policy: 10
Chips: 8
Startups: 1

Timeline

2026-06-16: 12
2026-07-08: 6
2026-06-30: 5
2026-06-24: 3
2026-07-01: 3
2026-07-02: 3
2026-06-17: 2
2026-06-18: 2

Latest Updates

An educational lab of AI agent architectures

2026-07-11 15:33 UTC

An educational lab of AI agent architectures built on LangChain and local Ollama, offering multiple agent variants for chat, tool calling, RAG, hybrid, and agentic RAG modes.

Multiple AI agent architecture variants covering chat, tool calling, RAG, hybrid, and agentic RAG.
Built on LangChain and local Ollama server, with optional OpenRouter support.

OpenWiki Brains: Proactive Memory for AI Agents

2026-07-10 16:46 UTC

OpenWiki Brains turns sources like Gmail, Notion, Git, X, Hacker News, and web search into a local wiki that agents can use as fresh, proactive memory.

OpenWiki Brains turns external sources into a local wiki for agents to use as proactive memory.
Two modes: Personal Brain for general context and Code Brain for code documentation.

Build An Auditable VC Research Agent With The Perplexity Agent API, LangGraph, And LangSmith

2026-07-09 15:58 UTC

Learn how to build a venture capital research agent that produces investment memos in 90 seconds with cited sources, using the Perplexity Agent API, LangGraph, and LangSmith. The agent runs parallel research nodes for team, financials, product, and market, then synthesizes a memo with seven sections, including a thesis and recommendation. Every claim is traced to primary sources, making the output auditable. The article also compares three search providers and offers takeaways for building similar agents.

An agent built with Perplexity Agent API, LangGraph, and LangSmith generates a draft investment memo in ~90 seconds at ~$0.40, with every claim cited.
Four parallel research nodes (team, financials, product, market) gather evidence, then a tool-less synthesizer composes the memo.
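
The fan-out and synthesis shape described above can be sketched in plain Python. This is a minimal illustration, not the article's code: `research`, `synthesize`, `build_memo`, the placeholder findings, and the example.com URLs are all hypothetical stand-ins, and no Perplexity, LangGraph, or LangSmith API is called.

```python
import asyncio

# The four research areas named in the summary.
SECTIONS = ["team", "financials", "product", "market"]


async def research(section: str, company: str) -> dict:
    """Hypothetical research node: returns one finding plus its sources."""
    await asyncio.sleep(0)  # stands in for a network call to a search provider
    return {
        "section": section,
        "claim": f"{company}: placeholder finding about {section}",
        "sources": [f"https://example.com/{section}"],  # placeholder URL
    }


def synthesize(findings: list) -> str:
    """Tool-less synthesizer: it only formats the evidence it was handed."""
    lines = []
    for finding in findings:
        cites = ", ".join(finding["sources"])
        lines.append(f"[{finding['section']}] {finding['claim']} (sources: {cites})")
    return "\n".join(lines)


async def build_memo(company: str) -> str:
    # Fan out: the four nodes run concurrently; gather keeps input order.
    findings = await asyncio.gather(*(research(s, company) for s in SECTIONS))
    return synthesize(list(findings))


memo = asyncio.run(build_memo("ExampleCo"))
print(memo)
```

Because the synthesizer has no tools, every line of the memo can only cite sources that a research node actually returned, which is one way to keep the output auditable.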

LLM Orchestration Frameworks Compared: LangChain vs. LlamaIndex vs. Raw API Calls

2026-07-09 15:38 UTC

A comparison of LangChain, LlamaIndex, and raw API calls for LLM applications, covering their strengths, trade-offs, and a decision framework for choosing the right abstraction level.

LangChain excels at orchestrating complex workflows and agents but can introduce overhead and debugging complexity.
LlamaIndex specializes in retrieval-augmented generation (RAG) with strong data ingestion and indexing capabilities.

LangChain and NVIDIA Launch NemoClaw Deep Agents Blueprint

2026-07-08 15:04 UTC

LangChain and NVIDIA launch the NemoClaw Deep Agents blueprint, combining Deep Agents Code, Nemotron 3 Ultra, and OpenShell for open, governed enterprise agents.

The blueprint integrates LangChain's Deep Agents framework, NVIDIA's Nemotron 3 Ultra model, and NVIDIA OpenShell runtime.
It achieves a 0.86 score on LangChain's agent eval suite at $4.48 cost, roughly 10x lower than competing models.

Tuning the harness, not the model: a Nemotron 3 Ultra playbook

2026-07-08 15:00 UTC

By tuning only the harness (scaffolding) around the Nemotron 3 Ultra, we achieved a best run of 0.86 on the Deep Agents suite, nearly matching Opus 4.8's best of 0.87, at roughly 10x lower cost. This article details the eval-driven approach, prompt engineering, middleware optimizations, and what didn't work.

Tuning the harness alone took Nemotron 3 Ultra to a best run of 0.86 on Deep Agents, nearly matching Opus 4.8's 0.87, at about 10x lower cost per run.
Evals are the training data for harness work: every change runs through a trace-driven loop, screened cheaply first, and kept only if wins repeat across trials with no regressions.

NVIDIA Nemotron Achieves Benchmark-Leading Performance With LangChain Deep Agents Harness

2026-07-08 15:00 UTC

NVIDIA Nemotron 3 Ultra delivers leading performance at lower cost than top closed models when paired with LangChain, described as the largest and most widely adopted AI agent orchestration platform. LangChain tuned its Deep Agents harness for NVIDIA Nemotron 3 Ultra, achieving the highest accuracy among open models while completing more tasks at higher throughput and running at 10x lower inference cost per run than leading closed models.

LangChain's Deep Agents harness tuned for NVIDIA Nemotron 3 Ultra achieves highest accuracy among open models, with 10x lower inference cost than closed models.
All performance gains come from engineering the environment around the model, not retraining the model itself.

Deep Agents Code on NVIDIA NemoClaw

2026-07-08 15:00 UTC

Run Deep Agents Code on NVIDIA NemoClaw with deny-by-default networking, human approval, and audit logs for sensitive code modernization.

Deep Agents Code (dcode) runs as a governed blueprint on NemoClaw with the open Nemotron 3 Ultra model, giving you control over source, model, and audit trail.
Deny-by-default networking, human approval, and full audit logs provide the controls a regulated team needs.

BrAIn, reactive AI agent nodes on a NATS bus instead of a chat loop

2026-07-08 14:50 UTC

brAIn is an experimental AI agent framework that replaces the traditional chat loop with a NATS pub/sub bus architecture of long-lived daemon nodes. Nodes are reactive, only activating when relevant messages arrive, saving token consumption. Each node can have its own UI, supports distributed deployment, and features priority preemption and MCP client integration. The author demonstrates applications like ambient room agents, Slack listeners, and IoT controllers, and compares the architecture with existing tools such as LangGraph, AutoGen, and ROS 2.

brAIn uses a NATS pub/sub bus for many-to-many communication between long-lived reactive daemon nodes.
Each node can have its own UI, run locally or remotely, and be distributed across machines.
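
A small in-memory model can illustrate the reactive idea, where a node does no work until a message arrives on a subject it subscribed to. This sketch is an assumption-laden stand-in: `Bus`, `Node`, and the subject names are hypothetical, and it does not use the NATS client API or brAIn's code.

```python
from collections import defaultdict


class Bus:
    """In-memory stand-in for a pub/sub bus (not the NATS client API)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, subject: str, handler) -> None:
        self._subscribers[subject].append(handler)

    def publish(self, subject: str, payload: str) -> None:
        # Many-to-many: every handler registered on the subject is called.
        for handler in list(self._subscribers[subject]):
            handler(subject, payload)


class Node:
    """Long-lived node that only activates when a relevant message arrives."""

    def __init__(self, name: str, bus: Bus, subjects: list) -> None:
        self.name = name
        self.activations = 0
        for subject in subjects:
            bus.subscribe(subject, self.on_message)

    def on_message(self, subject: str, payload: str) -> None:
        # A model call would go here; idle nodes never reach this point.
        self.activations += 1


bus = Bus()
room = Node("room-agent", bus, ["room.motion"])
slack = Node("slack-listener", bus, ["slack.mention"])
bus.publish("room.motion", "motion detected")
print(room.activations, slack.activations)
```

Only the node subscribed to the published subject runs, which is the mechanism behind the token savings the summary describes.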

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

2026-07-08 02:20 UTC

This edition of AINews covers a broad range of AI developments from July 6-7, 2026. Highlights include Lilian Weng's deep dive into harness engineering for recursive self-improvement, Meta's launch of Muse Image and preview of Muse Video with agentic generation loops, and major product updates from Anthropic, LangChain, and Google on agent platforms. Other notable items: NVIDIA's Audex audio model, Cohere's Arabic ASR, robotics integrations with Hugging Face and NVIDIA, Liquid AI's Antidoom method to reduce reasoning loop failures, and Anthropic's controversial J-space interpretability work. Also covered: benchmarks for agents and legal AI, research automation, and inference efficiency advances.

Lilian Weng's blog post reframes recursive self-improvement around the harness rather than direct weight modification, emphasizing that harness engineering is critical for specifying goals and context.
Meta's Muse Image and Muse Video showcase agentic generation with planning, tool use, and self-refinement, quickly ranking high on public leaderboards.

Improving Agents is a Data Mining Problem

2026-07-07 15:05 UTC

How LangChain mines agent traces to find failures, fine-tune judge models cheaper than frontier LLMs, and hill-climb performance with evals.

Mining traces gives you signals to hill climb on
Open model fine-tuning & compound agent systems help you process large scale trace data

How Schneider Electric Built Their LLMOps Foundations With LangSmith

2026-07-07 15:00 UTC

Schneider Electric built enterprise LLMOps foundations with LangSmith to improve observability, evaluation, and deployment for AI products at scale. Their AI Hub of 350 experts deployed 60+ agents. The three pillars are self-hosted LangSmith for observability, offline and online evaluation with a maturity framework, and per-product deployment. Case studies include the internal AI assistant One Jo, CSM Copilot, and a document processing agent, all showing significant efficiency gains.

Schneider Electric has over 60 AI products built on the LangChain ecosystem
Self-hosted LangSmith ensures data privacy and compliance

Deep Agents: A Batteries-Included Agent Harness

2026-07-03 04:33 UTC

Deep Agents is an open-source agent harness by LangChain, designed for long-horizon, multi-step tasks. It includes built-in features such as sub-agents, filesystem access, context management, shell access, persistent memory, and human-in-the-loop approval. Model-agnostic and built on LangGraph, it is production-ready with LangSmith integration.

Opinionated and extensible agent harness built on LangGraph.
Built-in sub-agents, filesystem, context management, shell access, persistent memory, and human-in-the-loop.

We Ran a Complex Task – A LangChain Repo Analysis with Claude Fable Models

2026-07-02 23:01 UTC

A detailed experiment comparing five Claude models (Opus, Fable, Sonnet, Sonnet 4.6, Haiku) on a full audit of the LangChain Python monorepo. Fable matched Opus in grade (A-) but excelled in generating actionable milestones and quick wins. The article presents findings, strengths/weaknesses, and recommends a multi-model pipeline.

Five Claude models were tested on a four-phase audit of LangChain.
Fable scored A- and produced the most actionable improvement plan.

Your coding agent bill doubled. Here’s how to fix it.

2026-07-02 17:29 UTC

Learn why coding agent bills spiral out of control — and how to trace, compare, and govern spend across Claude Code, Cursor, Copilot, and more in one place.

Coding agent usage exploded in early 2026, leading to skyrocketing bills with no unified cost visibility.
Fragmentation across tools (Claude Code, Cursor, Copilot) makes it impossible to compare spend without a common tracking model.

10 Agentic AI Frameworks You Should Know in 2026

2026-07-02 14:00 UTC

A comprehensive overview of 10 agentic AI frameworks in 2026, including LangGraph, CrewAI, OpenAI Agents SDK, Google ADK, PydanticAI, smolagents, Mastra, Microsoft Agent Framework, Strands Agents, and LlamaIndex Workflows, highlighting their strengths, best use cases, and trade-offs for developers.

LangGraph focuses on state machine control, ideal for complex, long-running agents with human-in-the-loop.
CrewAI offers a role-based multi-agent mental model, great for fast prototypes and collaboration.

OpenWiki: Open Source Repo Documentation for Coding Agents

2026-07-01 17:58 UTC

OpenWiki generates and maintains codebase documentation so coding agents can find the repo context they need without loading everything into one instruction file.

OpenWiki automatically generates and updates repo wikis for coding agents.
It adds a reference in agent instruction files so agents can retrieve docs on demand.

How to Use RLMs in Deep Agents

2026-07-01 15:38 UTC

Recursive Language Models (RLMs) combat context rot by having agents write code to dispatch subagents over context chunks. Deep Agents now supports RLMs through dynamic subagents and a lightweight code interpreter, enabling programmatic orchestration like map/reduce over large inputs. Benchmarked on OOLONG, RLMs outperform turn-by-turn agents at longer contexts.

RLMs use code to recursively call subagents on context chunks, avoiding context window limits.
Deep Agents implements RLMs with dynamic subagents and a code interpreter.
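
The recursive map/reduce idea can be sketched without any agent framework. In this hedged illustration, `subagent` and `rlm_count` are hypothetical names, the subagent is a plain function standing in for a model call, and the Deep Agents API is not used.

```python
def subagent(lines: list) -> int:
    """Hypothetical subagent: a real one would be a model call over this chunk."""
    return sum(1 for line in lines if "ERROR" in line)


def rlm_count(lines: list, max_lines: int = 4) -> int:
    """Recursively split the context until each chunk fits, then combine results."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    if len(lines) <= max_lines:
        return subagent(lines)  # base case: the chunk fits in one call
    mid = len(lines) // 2
    # Map over both halves, then reduce by summing the partial answers.
    return rlm_count(lines[:mid], max_lines) + rlm_count(lines[mid:], max_lines)


log = [f"line {i}: {'ERROR' if i % 5 == 0 else 'ok'}" for i in range(23)]
total = rlm_count(log)
print(total)
```

No single call ever sees more than `max_lines` lines, so the input can grow without any one context window growing with it.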

How Pendo uses LangSmith to trace Novus from user behavior to code fixes

2026-07-01 15:00 UTC

Pendo used LangSmith to debug, evaluate, and monitor Novus, its AI product agent that turns behavioral data and session replays into code fixes. LangSmith's production tracing enabled Pendo to ship Novus in days with 90%+ success rate, saving 25% time on identifying new use cases and catching 60% of AI problems before customers noticed.

Novus is a product agent that detects and fixes usability issues in live applications.
LangSmith traces helped Pendo debug agent decisions, monitor costs, and refine prompts.

Harbor x LangChain: A Unified Stack for Evaluating Agents

2026-06-30 15:22 UTC

Evaluating long-running, stateful agents requires a new eval runner. Harbor integrates with LangChain's Deep Agents, LangSmith sandboxes, and observability to provide scalable, isolated evaluations with explainable traces.

Harbor connects agents via langgraph.json registry and make_graph factory, staying model-agnostic.
LangSmith sandboxes provide isolated environments per trial, enabling horizontal scaling with hundreds of parallel runs.

Wiki Memory

2026-06-30 14:46 UTC

The article introduces 'Wiki Memory' as a pattern for agent memory, where agents transform raw source data into a compact, persistent, and structured knowledge layer. Unlike RAG, it precomputes a higher-level synthesis. Examples include DeepWiki, Karpathy's LLM Wiki, and Factory's AutoWiki. The article also discusses open questions about raw data, compression format, and maintenance, concluding that files and agents are common answers.

Wiki memory uses agents to compress raw data into a reusable knowledge base, distinct from RAG's real-time chunk retrieval.
Examples: DeepWiki for code documentation, Karpathy's LLM Wiki for general files, and Factory's AutoWiki.

An Agentic AI Pipeline for Appliance-Level Energy Anomaly Detection and LLM-Driven Recommendations

2026-06-30 04:00 UTC

This paper proposes an end-to-end agentic pipeline combining deep time-series forecasting, variational anomaly detection, and LLM reasoning to generate prioritized, actionable maintenance recommendations for office building appliance-level energy monitoring. The system uses a hybrid SSA-LSTM forecasting model and per-appliance LSTM VAE with attention for anomaly detection, with a three-stage LangChain pipeline (Context, Diagnosis, Report agents) featuring dynamic retrieval. Evaluated on a 16-scenario benchmark, the best backend scores 90.4/100 and a local 7B model passes all scenarios.

Hybrid SSA-LSTM forecasting with per-appliance LSTM VAE attention for anomaly detection
Three-stage LangChain agent pipeline: Context, Diagnosis, Report agents with dynamic retrieval

Benchmarking Agent Tool Use

2026-06-30 01:27 UTC

LangChain releases four new test environments to benchmark LLMs' ability to use tools effectively, covering function calling, planning, and reasoning. Tests include single-tool and multi-tool typewriter tasks, relational data queries, and a math task with altered rules. Key findings: GPT-4 excels on relational data but fails on longer trajectories; Claude 2.1 matches GPT-4 on three tasks; open-source models like Mistral 7b struggle with multi-step function composition; planning remains challenging for all models.

LangChain introduces four benchmarks for LLM tool use: Typewriter (single & 26 tools), Relational Data, and Multiverse Math.
GPT-4 scores highest on Relational Data but still fails on simple long-horizon tasks.

Extraction Benchmarking

2026-06-30 01:27 UTC

Compare GPT-4, Claude, and open-source LLMs on structured data extraction from chat logs. Benchmark results, evaluation metrics, and dataset creation insights.

LangChain releases an extraction benchmark dataset for chat log structured data.
GPT-4 outperforms Claude-2 and open-source models across most metrics.

Introducing Dynamic Subagents in Deep Agents

2026-06-29 16:17 UTC

Dynamic subagents let AI agents orchestrate work at scale using code instead of tool calls. Learn how programmatic orchestration in Deep Agents guarantees coverage, handles fan-out, and unlocks reliable multi-step, complex agent pipelines with common orchestration patterns and live traces.

Dynamic subagents replace tool-call-based subagent invocation with programmatic orchestration, improving reliability at scale. They allow models to write code (loops, branches, concurrency) to manage subagents.
Key benefits include deterministic coverage (no skipped items) and reliable complex orchestration for multi-phase pipelines, fan-out + synthesis, and conditional branching.
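
A rough sketch of why code-driven orchestration gives deterministic coverage: a loop over the inputs cannot skip an item the way a sequence of model-chosen tool calls might. `review_file` and `orchestrate` are hypothetical names, and the subagent is a plain function rather than a Deep Agents subagent.

```python
from concurrent.futures import ThreadPoolExecutor


def review_file(path: str) -> dict:
    """Hypothetical subagent: a real one would run a model over the file."""
    return {"path": path, "needs_fix": path.endswith(".py")}


def orchestrate(paths: list) -> dict:
    # Fan-out: pool.map yields exactly one result per input, in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(review_file, paths))
    # Conditional branch: only flagged items move on to a later phase.
    flagged = [r["path"] for r in results if r["needs_fix"]]
    # Synthesis: a single summary over all subagent results.
    return {"reviewed": len(results), "flagged": flagged}


report = orchestrate(["a.py", "README.md", "b.py"])
print(report)
```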

How Candidly Built State-Aware Agent Harnesses with LangSmith

2026-06-29 16:09 UTC

Candidly built a state-aware conversational agent harness that uses an Input-Output Hidden Markov Model (IO-HMM) to infer user engagement states in real time from conversation traces, enabling targeted response policies that reduce disengagement. The system identifies four states—Engaged, Detailed, Guided, and Disengaging—and cuts disengaging turns from 23% to 11%.

Candidly uses an IO-HMM to model user states from lightweight trace features, achieving 0.90 AUC for outcome prediction.
Four engagement states emerge: Engaged (53%), Detailed (7%), Guided (17%), Disengaging (23%), with resolution rates from 78% to 30%.

Prompt Caching with Deep Agents

2026-06-26 17:13 UTC

Learn how Deep Agents uses prompt caching to cut LLM token costs by up to 80% across every major model provider - no extra config required.

Prompt caching reduces token costs by 41-80% by storing model state after processing a prompt.
Different providers have varying support for caching features, making provider-agnostic optimization tricky.

June 2026: LangChain Newsletter — Fleet On-Call Copilot, Deep Agents Rubrics, and More

2026-06-25 17:42 UTC

New in LangSmith: a Fleet on-call copilot for alert triage, computer use for agents, voice trace debugging, and experiment status tracking. Plus Deep Agents Rubrics, programmatic subagents, a new LangSmith Deployment course, and upcoming events in Chicago, Berlin, DC, and Vegas.

Fleet On-Call Copilot: a prebuilt agent template that triages alerts and drafts updates using code, traces, and runbooks.
Computer Use: agents can now operate an isolated virtual computer for code, files, and authenticated API calls.

Why the Best AI Agents Are Simple: Sierra’s Zack Reneau-Wedeen on the Max Agency Podcast

2026-06-25 14:36 UTC

On the Max Agency Podcast, Zack Reneau-Wedeen discusses the future of AI agents, advocating for simple architectures, outcome-based pricing, and avoiding 'org chart shipping.' He shares insights from building customer-facing agents at Sierra.

Simple agent architectures outperform complex multi-agent systems
Outcome-based pricing aligns incentives for high-value tasks

How Klarna's AI assistant redefined customer support at scale for 85 million active users

2026-06-24 20:08 UTC

Klarna's AI assistant, built on LangGraph and LangSmith, handles the work of 700 full-time staff, reducing customer query resolution time by 80% and automating 70% of repetitive support tasks.

Klarna's AI assistant handles over 2.5 million conversations, performing the work of 700 full-time employees.
The assistant reduced average customer query resolution time by 80% and automated ~70% of repetitive tasks.

How LangSmith and LangChain OSS Help You Meet EU AI Act Requirements

2026-06-24 19:56 UTC

The EU AI Act compliance deadline is August 2, 2026. This article explains what the Act requires for high-risk AI systems and how LangSmith and LangChain OSS help meet each requirement through full observability, automated evaluations, human oversight, and more.

EU AI Act requires risk management, automatic logging, transparency, human oversight, and post-market monitoring for high-risk AI systems.
LangSmith provides end-to-end tracing capturing every agent input, reasoning step, tool call, and output.

How to Build Memory into AI Agents

2026-06-24 16:11 UTC

A practical guide to adding memory to AI agents, covering short-term and long-term memory concepts, trace analysis, and how LangSmith's tools enable a complete memory loop for agent improvement across runs.

Memory enables agents to remember user preferences and corrections, reducing repeated instruction.
Short-term memory handles current tasks; long-term memory persists facts, preferences, and skills.

Building Browser-Using AI Agents in Python

2026-06-22 12:00 UTC

This article explains how to build AI agents that can browse and interact with real websites using Playwright, browser-use, and LangGraph. It covers Playwright's advantages over Selenium (30-50% faster, persistent WebSocket, built-in auto-waiting, realistic events), setup steps, dynamic page scraping, multi-step form filling, anti-bot detection handling, session persistence, and Docker deployment. Through code examples, readers will create a working browser agent that navigates sites, fills forms, extracts structured data, and uses an LLM for decision-making.

Playwright outperforms Selenium with persistent WebSocket connections, 30-50% faster operations, and built-in auto-waiting and realistic mouse/keyboard events.
Setup requires Python 3.10+, an OpenAI API key, and a few pip installs, including Playwright browser binaries.

Introducing LangSmith’s No Code Agent Builder

2026-06-18 17:32 UTC

LangSmith launches a no-code agent builder that enables non-technical users to create AI agents with memory, guided prompts, and MCP tools. The builder uses conversational guidance, built-in memory, and sub-agents to lower the barrier for agent development, suitable for internal productivity use cases.

LangSmith Agent Builder offers a no-code experience with memory and guided prompt creation.
Agents consist of four core components: prompt, tools, triggers, and sub-agents.

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

2026-06-18 04:00 UTC

This paper presents NAVI-Orbital, a software system on a LEO spacecraft that, on April 16, 2026, achieved the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. Using Gemma 3 and LangGraph, it classifies scenes, generates descriptions, and responds to operator dialogue. It reached 88.16% accuracy on a ground benchmark and successfully processed uncorrected YAM-9 imagery onboard, demonstrating the feasibility of semantic compression to reduce downlink bandwidth.

First in-orbit demonstration of zero-shot vision-language model for autonomous multi-modal inference
Uses Gemma 3 and LangGraph for natural language tasking and dialogue

How (and Why) I Built an AI Assistant

2026-06-17 14:00 UTC

The author explains the motivation behind building a custom AI assistant instead of using existing tools, detailing the architecture, tech stack, and implementation process including LLM, LangChain, memory management, and tool integration.

Building a custom AI assistant provides better control, data privacy, and workflow customization.
The stack includes GPT-4o, LangChain for orchestration, SQLite-backed persistent memory, and tools like DuckDuckGo search.

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

2026-06-17 04:00 UTC

This research addresses concurrency anomalies in multi-agent LLM systems by formalizing four anomaly types using TLA+ and building a mechanically verified consistency hierarchy L0-L4. With 274 Verus proof obligations, the detectors are proven sound and complete. Three Rust runtimes implement L0-L1, and the work reproduces real-world anomalies in ByteDance's deer-flow and LangGraph, providing verified fixes.

Formalizes four concurrency anomalies in multi-agent LLM systems as TLA+ specifications
Builds the first machine-checked consistency hierarchy L0-L4 with 274 Verus obligations

How Factory Used LangSmith to Automate Feedback Loop and Double Iteration Speed

2026-06-16 18:11 UTC

Factory AI leveraged LangSmith's observability and feedback API to close the product feedback loop, achieving a 2x improvement in iteration speed and significant reductions in development cycle time.

Factory integrated LangSmith with AWS CloudWatch for enhanced observability and debugging.
Using LangSmith's Feedback API, Factory automated prompt optimization, reducing manual effort.

Introducing Open SWE: An Open-Source Asynchronous Coding Agent

2026-06-16 18:08 UTC

Open SWE is an open-source, cloud-hosted coding agent that autonomously handles GitHub tasks—planning, coding, testing, and opening PRs. It features a multi-agent architecture, human-in-the-loop control, and asynchronous execution.

Open SWE is an open-source, async, cloud-hosted coding agent that integrates directly with GitHub.
It uses a multi-agent architecture (Planner, Programmer, Reviewer) to ensure code quality.

Monte Carlo: Building Data + AI Observability Agents with LangGraph and LangSmith

2026-06-16 18:08 UTC

Monte Carlo built an AI Troubleshooting Agent on LangGraph and debugged with LangSmith to help data teams resolve issues faster by exploring multiple investigation paths in parallel.

Monte Carlo used LangGraph to create a dynamic graph for automated, parallel troubleshooting.
LangSmith enabled visualization and rapid iteration of prompts from day one.

Sharing LangSmith Benchmarks

2026-06-16 18:07 UTC

LangSmith launches public benchmarks and evaluation dataset sharing to help developers compare LLM architecture performance. The first benchmark is a Q&A dataset over LangChain docs, accompanied by the langchain-benchmarks package. The article analyzes various models and architectures, providing insights into performance and debugging.

LangSmith now supports sharing evaluation datasets and results for community-driven benchmarks.
The initial benchmark is a Q&A dataset over LangChain docs to test RAG systems.

LangSmith: Redesigned product homepage and Resource Tags for better organization

2026-06-16 18:07 UTC

LangSmith's homepage is now organized into Observability, Evaluation, and Prompt Engineering, with improved Resource Tags for flexible resource grouping. Onboarding guides and upcoming ABAC enhance usability.

Homepage divided into three sections: Observability, Evaluation, and Prompt Engineering.
Resource Tags now support flexible grouping by 'Application' or custom tags.

Agent Engineering: A New Discipline

2026-06-16 18:06 UTC

Agent engineering is an emerging discipline that integrates product thinking, engineering, and data science to build reliable LLM agents through rapid iteration and production feedback. It addresses the unpredictability of agents by cycling through build, test, ship, observe, and refine, as practiced by companies like Clay, Vanta, LinkedIn, and Cloudflare.

Agent engineering is an iterative process: build, test, ship, observe, refine, repeat.
It combines product thinking (scope and behavior), engineering (infrastructure), and data science (measurement and improvement).

Testing Fine Tuned Open Source Models in LangSmith

2026-06-16 18:06 UTC

Evaluate and compare fine-tuned open source LLMs using LangSmith. Test multiple models, automate evaluations, and choose the best performing AI.

LangSmith provides UI and API to create evaluation datasets for easy model comparison.
Fine-tuned Llama2-7b (78k rows) and Llama2-13b (10k rows) for SQL generation.

Human judgment in the agent improvement loop

2026-06-16 18:04 UTC

AI agents work best when they reflect the knowledge and judgment your team has built over time. This article explores how to integrate human judgment into each stage of agent development, using a trader copilot example. It covers workflow design, tool design, and context engineering, and emphasizes the importance of automated evaluations and continuous iteration.

Agents need tacit knowledge from domain experts
Human judgment can be embedded through workflow, tool, and context design

Context Management for Deep Agents

2026-06-16 18:04 UTC

Learn how Deep Agents SDK manages context for long-running AI tasks through offloading, summarization, and filesystem abstraction to prevent context rot.

Three compression techniques: offloading large tool results (>20K tokens), offloading large tool inputs (at >85% context), and summarization (when offloading is insufficient).
Offloaded content is saved to the filesystem with pointers; the agent can retrieve it via file operations.
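
The first technique, offloading large tool results and leaving a pointer behind, might look roughly like this. It is a sketch under stated assumptions: `estimate_tokens` uses a crude four-characters-per-token guess, `offload_if_large` and the pointer format are hypothetical, and none of this is the Deep Agents SDK's actual implementation. Only the 20K-token threshold comes from the summary.

```python
import tempfile
from pathlib import Path

RESULT_TOKEN_LIMIT = 20_000  # threshold quoted in the summary above


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly four characters per token."""
    return len(text) // 4


def offload_if_large(result: str, workdir: Path, name: str) -> str:
    """Return the result unchanged, or a short pointer if it is too large."""
    if estimate_tokens(result) <= RESULT_TOKEN_LIMIT:
        return result
    path = workdir / f"{name}.txt"
    path.write_text(result, encoding="utf-8")  # full content stays retrievable
    preview = result[:200]
    return f"[offloaded to {path}; ~{estimate_tokens(result)} tokens] {preview}"


workdir = Path(tempfile.mkdtemp())
small = offload_if_large("short result", workdir, "call_1")
large = offload_if_large("x" * 100_000, workdir, "call_2")
print(small)
print(large[:60])
```

The context then carries only the pointer and a short preview, while the full result can be read back from the file when needed.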

The Art of Loop Engineering

2026-06-16 16:59 UTC

This post explores how to build reliable AI agents by designing loops, not just using a good model. It introduces four nested loops: the agent loop, verification loop, event-driven loop, and hill climbing loop, each building on the previous to create agents that work consistently and improve over time. Using LangChain primitives, developers can implement each level and embed human oversight where needed.

The agent loop lets the model call tools repeatedly to complete tasks. It's the fundamental loop.
The verification loop checks output quality and provides feedback, ensuring consistency.
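
The two innermost loops can be sketched as nested functions. This is an illustrative toy, not LangChain code: `agent_loop`, `verified_run`, `stub_model`, and `verify` are hypothetical, and the "model" is a scripted function rather than an LLM.

```python
def agent_loop(model, tools, task, feedback=None, max_steps=8):
    """Agent loop: ask the model for an action, run the tool, repeat until 'finish'."""
    history = [("task", task)]
    if feedback:
        history.append(("feedback", feedback))
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "finish":
            return arg
        history.append((action, tools[action](arg)))  # record the tool result
    raise RuntimeError("step budget exhausted")


def verified_run(model, tools, task, verify, max_attempts=3):
    """Verification loop: rerun the agent with feedback until the check passes."""
    feedback = None
    for _ in range(max_attempts):
        answer = agent_loop(model, tools, task, feedback)
        feedback = verify(answer)  # None means the check passed
        if feedback is None:
            return answer
    raise RuntimeError("no attempt passed verification")


def stub_model(history):
    """Scripted stand-in for a model: call one tool, then finish."""
    kinds = [kind for kind, _ in history]
    if "upper" not in kinds:
        return ("upper", history[0][1])
    text = dict(history)["upper"]
    if "feedback" in kinds:
        text += "!"  # the stub 'learns' from verifier feedback
    return ("finish", text)


def verify(answer):
    return None if answer.endswith("!") else "end with an exclamation mark"


result = verified_run(stub_model, {"upper": str.upper}, "ship it", verify)
print(result)
```

The first attempt fails verification, the feedback is fed into the second attempt, and that attempt passes. The event-driven and hill-climbing loops would wrap around `verified_run` in the same nested fashion.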

Why Fleet Has General Purpose Chat and Specialized Agents

2026-06-16 15:50 UTC

Fleet supports both quick, ad hoc tasks and recurring responsibilities. See how General Purpose Chat and Specialized Agents help teams delegate work.

Two patterns of agent work: ad hoc and recurring. Fleet uses General Purpose Chat for one-off tasks and Specialized Agents for repetitive work.
Specialized Agents offer configurable instructions, tools, models, subagents, skills, triggers, and persistent memory.

Remember, Don't Re-read: Stateful ReAct Agents for Token-Efficient Autonomous Experimentation

2026-06-16 04:00 UTC

This paper introduces a stateful ReAct agent built with LangGraph that avoids re-reading entire history at each iteration, cutting token costs by 90% in hyperparameter tuning and 52% in code optimization while maintaining performance. It provides a blueprint for practitioners to implement token-efficient autonomous experimentation. (Source: arXiv, June 2026)

Stateless autoresearch reconstructs the full context at every iteration, so total token cost grows as O(n²). The stateful ReAct agent reduces per-iteration cost to O(1).
On hyperparameter tuning (15 iterations), token consumption dropped from 24,465 to 2,492 (90% reduction).
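
A toy cost model shows where the quadratic term comes from. It assumes every step adds the same number of tokens, which is a simplification and not the paper's accounting; `stateless_tokens`, `stateful_tokens`, and the value 100 are illustrative. The only figures taken from the summary are the reported totals, 24,465 and 2,492.

```python
def stateless_tokens(iterations: int, per_step: int) -> int:
    """Iteration k re-reads all k steps so far: per_step * n * (n + 1) / 2 overall."""
    return sum(k * per_step for k in range(1, iterations + 1))


def stateful_tokens(iterations: int, per_step: int) -> int:
    """Each iteration reads only its own step: per_step * n overall."""
    return iterations * per_step


# Toy model with an assumed 100 tokens per step over 15 iterations.
toy_stateless = stateless_tokens(15, 100)
toy_stateful = stateful_tokens(15, 100)

# Check of the reduction implied by the reported totals.
reported_before, reported_after = 24_465, 2_492
reduction = 1 - reported_after / reported_before

print(toy_stateless, toy_stateful)
print(f"{reduction:.1%}")
```

Under the toy model, 15 iterations cost 12,000 tokens stateless versus 1,500 stateful, an 87.5% saving. The reported totals work out to roughly 89.8%, which matches the summary's rounded 90% figure.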

Building a 100x Cheaper Trace Judge with Fireworks

2026-06-15 17:06 UTC

LangChain and Fireworks fine-tuned an open model to mine perceived error signals from production traces, matching frontier model performance at a fraction of the cost.

LangSmith processes billions of tokens daily across production traces.
Fine-tuned Qwen model detects 'Perceived Error' at frontier performance with 100x cost savings.
