AI News Hub

Today's must-reads

Agents

We built an agent that runs our AI data platform

Encord introduces Merlin, an agentic intelligence layer that integrates via MCP into Claude, Codex, and other platforms, enabling users to manage AI data infrastructure through conversation across the build, observe, and optimize phases of the data lifecycle.

  • Merlin is Encord's agentic intelligence layer for conversational AI data management.
  • It integrates via MCP into Claude, Codex, and other agentic coding platforms, with Slack and more coming soon.

SpaceX Aims at Agentic Coding With $60B Cursor Acquisition

The acquisition could help SpaceX expand its developer offerings and will give it access to Cursor’s developer workflow and user analytics.

  • SpaceX acquires AI coding tool Cursor for $60 billion
  • Move aims to expand SpaceX's developer ecosystem

AI's Silent Leap: From Code to Cognition

After using AI daily, the author finds that the real bottleneck is not coding itself but the mental drain of context switching. AI enables longer periods of focused thought, shifting energy from syntax and debugging to architecture and systems thinking, while judgment and taste remain crucial.

  • AI reduces mental fatigue from context switching, allowing developers to stay focused longer.
  • AI acts as externalized working memory, enabling parallel handling of different problem layers.

Show HN: Ctx, save tokens by loading only the relevant tools

Ctx is a context management tool for Claude Code and custom LLMs that recommends a small, top-scored bundle of skills, agents, and MCP servers for the current task by analyzing a 102,928-node graph, saving tokens and improving quality.

  • Ctx watches what you are building and recommends relevant tools from a large graph, avoiding context waste.
  • Supports Claude Code and custom local/API models with a separate setup flow.

How Factory Used LangSmith to Automate Feedback Loop and Double Iteration Speed

Factory AI leveraged LangSmith's observability and feedback API to close the product feedback loop, achieving a 2x improvement in iteration speed and significant reductions in development cycle time.

  • Factory integrated LangSmith with AWS CloudWatch for enhanced observability and debugging.
  • Using LangSmith's Feedback API, Factory automated prompt optimization, reducing manual effort.

Introducing Open SWE: An Open-Source Asynchronous Coding Agent

Open SWE is an open-source, cloud-hosted coding agent that autonomously handles GitHub tasks—planning, coding, testing, and opening PRs. It features a multi-agent architecture, human-in-the-loop control, and asynchronous execution.

  • Open SWE is an open-source, async, cloud-hosted coding agent that integrates directly with GitHub.
  • It uses a multi-agent architecture (Planner, Programmer, Reviewer) to ensure code quality.

Monte Carlo: Building Data + AI Observability Agents with LangGraph and LangSmith

Monte Carlo built an AI Troubleshooting Agent on LangGraph and debugged it with LangSmith, helping data teams resolve issues faster by exploring multiple investigation paths in parallel.

  • Monte Carlo used LangGraph to create a dynamic graph for automated, parallel troubleshooting.
  • LangSmith enabled visualization and rapid iteration of prompts from day one.
Tools

AI Consciousness: The Delusionals and the Philosopher's Bench

This article explores the debate surrounding AI consciousness, distinguishing between 'delusionals' who believe AI can be conscious and philosophers who take a skeptical stance.

  • Delusionals argue for AI consciousness without strong evidence.
  • Philosophers emphasize the need for rigorous definitions and evidence.
Research

The 8 best early Prime Day headphone deals I'd upgrade to immediately as a headphones enthusiast

Amazon's Prime Day is earlier than usual this year, running June 23-26. Several flagship headphones released in mid-to-late 2025 are on sale. I've tested every pair on this list and recommend them for different reasons.

  • Prime Day runs June 23-26, one month earlier than usual.
  • Newer 2025 models like Bowers & Wilkins Px7 S3 and Sony WH-1000XM6 see first discounts.
Policy
Other updates (24)
Models

Sharing LangSmith Benchmarks

LangSmith launches public benchmarks and evaluation dataset sharing to help developers compare LLM architecture performance. The first benchmark is a Q&A dataset over LangChain docs, accompanied by the langchain-benchmarks package. The article analyzes various models and architectures, providing insights into performance and debugging.

  • LangSmith now supports sharing evaluation datasets and results for community-driven benchmarks.
  • The initial benchmark is a Q&A dataset over LangChain docs to test RAG systems.

Agent Engineering: A New Discipline

Agent engineering is an emerging discipline that integrates product thinking, engineering, and data science to build reliable LLM agents through rapid iteration and production feedback. It addresses the unpredictability of agents by cycling through build, test, ship, observe, and refine, as practiced by companies like Clay, Vanta, LinkedIn, and Cloudflare.

  • Agent engineering is an iterative process: build, test, ship, observe, refine, repeat.
  • It combines product thinking (scope and behavior), engineering (infrastructure), and data science (measurement and improvement).

Testing Fine Tuned Open Source Models in LangSmith

Evaluate and compare fine-tuned open source LLMs using LangSmith. Test multiple models, automate evaluations, and choose the best performing AI.

  • LangSmith provides UI and API to create evaluation datasets for easy model comparison.
  • The post fine-tunes Llama2-7b (78k rows) and Llama2-13b (10k rows) for SQL generation.

France to ditch AI data tools from Palantir for domestic provider

France’s domestic intelligence service is to ditch AI data tools from the US tech giant Palantir in favour of a domestic provider in an effort to avoid ‘strategic dependency’, the prime minister, Sébastien Lecornu, has said.

  • France's domestic intelligence service will replace Palantir AI tools with ChapsVision.
  • Prime Minister Lecornu emphasizes avoiding strategic dependency on foreign powers.

Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

The Qwen team has released Qwen-RobotSuite, a set of three embodied AI models targeting manipulation, world modeling, and navigation. RobotManip is a Vision-Language-Action model built on Qwen3.5-4B that uses a unified alignment framework to scale manipulation data. RobotWorld is a language-conditioned video world model with a 60-layer MMDiT that predicts future video frames. RobotNav is a navigation model built on Qwen3-VL with a parameterized interface for multiple task modes. The suite achieves state-of-the-art results across several benchmarks.

  • Qwen-RobotSuite comprises three independent models: RobotManip, RobotWorld, and RobotNav.
  • RobotManip addresses heterogeneous manipulation data via a unified alignment framework, achieving SOTA on OOD benchmarks like LIBERO-Plus and RoboTwin-C2R Hard.
Agents

LangSmith: Redesigned product homepage and Resource Tags for better organization

LangSmith's homepage is now organized into Observability, Evaluation, and Prompt Engineering, with improved Resource Tags for flexible resource grouping. Onboarding guides and upcoming ABAC enhance usability.

  • Homepage divided into three sections: Observability, Evaluation, and Prompt Engineering.
  • Resource Tags now support flexible grouping by 'Application' or custom tags.

Human judgment in the agent improvement loop

AI agents work best when they reflect the knowledge and judgment your team has built over time. This article explores how to integrate human judgment into each stage of agent development, using a trader copilot example. It covers workflow design, tool design, and context engineering, and emphasizes the importance of automated evaluations and continuous iteration.

  • Agents need tacit knowledge from domain experts
  • Human judgment can be embedded through workflow, tool, and context design

Context Management for Deep Agents

Learn how Deep Agents SDK manages context for long-running AI tasks through offloading, summarization, and filesystem abstraction to prevent context rot.

  • Three compression techniques: offloading large tool results (>20K tokens), offloading large tool inputs (at >85% context), and summarization (when offloading is insufficient).
  • Offloaded content is saved to filesystem with pointers; agent can retrieve via file operations.
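The offloading behavior described in the two bullets above can be sketched in a few lines. This is a minimal illustration, not the Deep Agents SDK API: `ContextManager`, `count_tokens`, the `/offload/` path layout, and the in-memory `files` dict are all hypothetical stand-ins, and "85% context" is assumed to mean the fraction of the context window already in use. Only the two thresholds come from the summary. Summarization is left out.

```python
from dataclasses import dataclass, field

# Thresholds quoted in the summary; every name below is a hypothetical stand-in.
TOOL_RESULT_LIMIT = 20_000     # offload tool results larger than this many tokens
CONTEXT_FILL_TRIGGER = 0.85    # assumed meaning: fraction of the window in use


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class ContextManager:
    window: int                                     # context window size in tokens
    files: dict = field(default_factory=dict)       # in-memory stand-in for a filesystem
    messages: list = field(default_factory=list)    # what the model would actually see

    def used_fraction(self) -> float:
        """Share of the context window taken up by the current messages."""
        return sum(count_tokens(m) for m in self.messages) / self.window

    def should_offload_inputs(self) -> bool:
        """True once usage passes the trigger for offloading large tool inputs."""
        return self.used_fraction() > CONTEXT_FILL_TRIGGER

    def add_tool_result(self, name: str, result: str) -> None:
        """Keep small results inline; replace large ones with a file pointer."""
        if count_tokens(result) > TOOL_RESULT_LIMIT:
            path = f"/offload/{name}.txt"
            self.files[path] = result
            result = f"[offloaded to {path}]"
        self.messages.append(result)

    def read_file(self, path: str) -> str:
        """File operation the agent would use to retrieve offloaded content."""
        return self.files[path]
```

The pointer message costs a handful of tokens regardless of how large the original result was, which is the point of the technique: the content stays retrievable without occupying the window.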

Enabling Governed Vibe Coding for Enterprise Apps on Databricks

Databricks announced three new capabilities at Data + AI Summit 2026: App Spaces for governance, Genie App Builder for AI-assisted app creation, and Serverless Micro Apps for cost-efficient scale-to-zero apps, enabling governed vibe coding in the enterprise.

  • App Spaces provides a governance boundary for groups of apps, automatically inheriting security policies.
  • Genie App Builder leverages Databricks data context and Unity Catalog semantics to build apps via natural language or screenshots.

Show HN: Ito – Code reviews that run code

Ito is an automated QA platform that runs code on pull requests, providing behavioral regression testing without manual scripts. It integrates with GitHub, supports any stack, and generates QA reports with video and screenshots.

  • Ito offers scriptless, execution-based QA that catches behavioral regressions.
  • Supports any stack and requires only a 5-minute setup.

Introducing OpenSharing: the Next Evolution of Delta Sharing for the Agentic Era

Databricks announced OpenSharing, the next evolution of Delta Sharing and the industry's first open protocol built for the agentic era. It extends data sharing to the full AI stack — models and agents — and becomes an independent open-source project under the Linux Foundation. OpenSharing enables sharing across any cloud, vendor, and format, addressing challenges of cross-organizational data collaboration. Key features include Genie Agent Sharing for governed AI experiences, SecureConnect for simplified cross-cloud networking, Global Distribution for automatic replication and reduced egress costs, and support for on-premises storage via the Storage Ecosystem. It also adds Apache Iceberg REST Catalog API compatibility for broader interoperability.

  • OpenSharing is the evolution of Delta Sharing, expanding scope to include AI models, agents, and unstructured data.
  • It becomes an independent open-source project under the Linux Foundation, supporting multiple formats including Delta Lake, Apache Iceberg, and Parquet.

Logical Ways to Track AI Agent Lineage and State in Code Development

This article explores systematic methods to track the decision history, configuration, and code lineage of AI agents in agentic software development. The author proposes building an 'agent warehouse' for observability and scale, and discusses Git's limitations in storing agent data.

  • Agent development requires tracking metadata including commit SHA, agent version, and session logs.
  • Lineage tracing from code to deployment helps understand agent behavior's impact on the final system.
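As a rough illustration of the metadata listed above, a lineage record might look like the sketch below. The article's own schema is not given in this summary, so `AgentRunRecord`, `append_record`, `runs_for_commit`, and the list used as a "warehouse" are hypothetical; only the three tracked fields (commit SHA, agent version, session logs) come from the text.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentRunRecord:
    """One row of lineage metadata for a single agent session (hypothetical schema)."""
    commit_sha: str          # commit the agent produced or worked against
    agent_version: str       # version of the agent configuration or prompt set
    session_log_path: str    # where the full session log is stored
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(store: list, record: AgentRunRecord) -> None:
    """Append the record as one JSON line; a list stands in for the warehouse."""
    store.append(json.dumps(asdict(record), sort_keys=True))


def runs_for_commit(store: list, commit_sha: str) -> list:
    """Trace from a commit back to every agent session recorded against it."""
    return [r for r in map(json.loads, store) if r["commit_sha"] == commit_sha]
```

Keeping records append-only and keyed by commit SHA lets the lookup run in either direction, from code to sessions or from a session to the code it touched, without storing bulky logs in Git itself.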

Announcing Apps on Databricks Marketplace

Databricks announces the Public Preview of Apps on Databricks Marketplace, enabling customers to discover, install, and run third-party data and AI applications natively within their secure Databricks workspaces, with no data movement required.

  • Apps on Databricks Marketplace allow customers to discover, install, and run third-party data and AI applications directly within their Databricks workspace.
  • Applications run in a secure, isolated sandbox within Unity Catalog, inheriting existing governance controls.

How to Use an Nvidia EGPU with Your Mac for Local AI in 2026

Apple has approved Tiny Corp's TinyGPU driver, enabling Nvidia and AMD eGPUs to work on Apple Silicon Macs for compute workloads. This guide covers hardware recommendations, setup, and performance benchmarks for running CUDA-based local AI.

  • Apple notarized the TinyGPU driver for Nvidia/AMD eGPU support on Mac.
  • Best eGPU pick is RTX 4090 for most users; RTX 5090 for 70B models.

Introducing OpenSharing SecureConnect

OpenSharing SecureConnect is a Databricks-managed proxy that simplifies network configuration for cross-organization data sharing. Providers set up once and no longer need to configure networks per recipient. Optionally, private link connectivity via NCC enhances security. Data stays in provider storage. Now in Public Preview.

  • SecureConnect is a Databricks-managed proxy that routes storage access on behalf of recipients.
  • Providers perform a one-time setup; no per-recipient firewall changes required afterward.

The Art of Loop Engineering

This post explores how to build reliable AI agents by designing loops, not just using a good model. It introduces four nested loops: the agent loop, verification loop, event-driven loop, and hill climbing loop, each building on the previous to create agents that work consistently and improve over time. Using LangChain primitives, developers can implement each level and embed human oversight where needed.

  • The agent loop lets the model call tools repeatedly to complete tasks. It's the fundamental loop.
  • The verification loop checks output quality and provides feedback, ensuring consistency.
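The first two loops can be sketched in plain Python. The post itself uses LangChain primitives; this sketch does not, and `agent_loop`, `verified_run`, the `(action, payload)` model protocol, and the `verify` callback are all hypothetical names chosen for illustration. The event-driven and hill climbing loops are not shown.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical protocol: the model sees the history and returns (action, payload).
# action == "final" ends the loop; any other action names a tool to call.
Model = Callable[[List[Tuple[str, Any]]], Tuple[str, Any]]
Tools = Dict[str, Callable[[Any], Any]]


def agent_loop(model: Model, tools: Tools, task: str, max_steps: int = 10) -> str:
    """Inner loop: let the model call tools repeatedly until it gives a final answer."""
    history: List[Tuple[str, Any]] = [("task", task)]
    for _ in range(max_steps):
        action, payload = model(history)
        if action == "final":
            return payload
        history.append((action, tools[action](payload)))   # feed the tool result back
    raise RuntimeError("agent did not finish within max_steps")


def verified_run(
    model: Model,
    tools: Tools,
    task: str,
    verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> str:
    """Outer loop: check each answer and retry with the verifier's feedback."""
    feedback = ""
    for _ in range(max_attempts):
        answer = agent_loop(model, tools, task + feedback)
        ok, note = verify(answer)
        if ok:
            return answer
        feedback = f"\nPrevious attempt rejected: {note}"
    raise RuntimeError("no attempt passed verification")
```

Both loops are bounded (`max_steps`, `max_attempts`) so that a model that never converges fails loudly instead of running forever; a human review step could replace or wrap the `verify` callback where oversight is needed.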

I packaged 20 years of enterprise AI sales experience as a Claude Skill

Forward Deployed Selling (FDS) is an open-source enterprise sales methodology for the AI era, packaged as a Claude Skill. It provides a playbook refined over 20 years of experience, aiming to compress sales cycles by 3-10x.

  • FDS is an AI-era sales methodology built from 20 years of enterprise sales experience at AWS and other firms.
  • It is packaged as a Claude Skill, installable in 60 seconds, and includes a full playbook for AI-assisted selling.

The Pokémon Trading Card Game AI Battle Challenge

The Pokémon TCG AI Battle Challenge pits AI against the complexities of the Pokémon Trading Card Game. It features Simulation and Strategy categories, with top teams advancing to a final round in late 2026. The prize pool includes $50,000 for the winner.

  • The Simulation Category involves continuous AI matches on Kaggle with live rankings. Submissions open June 16 to August 17, 2026.
  • The Strategy Category requires a report on AI strategy, evaluated on stability, deck concept, and simulation performance. Deadline September 14, 2026.

HPE AI Factory With NVIDIA Expands for the Era of Agents

Enterprises are moving agentic AI from proof of concept to production, and the next generation of AI factories is built for the era of agents. At HPE Discover Las Vegas, NVIDIA and HPE are expanding the HPE AI Factory with NVIDIA, including NVIDIA Vera CPU and NVIDIA Agent Toolkit for HPE Private Cloud AI. NVIDIA Confidential Computing extends across HPE AI Factory, and enhanced full-stack NVIDIA integration is available throughout the portfolio.

  • NVIDIA Vera CPU, built for agents, will be available with HPE Private Cloud AI in 2027.
  • NVIDIA Agent Toolkit now available with HPE Private Cloud AI, providing an agentic AI operating system.
Chips

Parallelize speculative decoding with P-EAGLE on Amazon SageMaker AI

This post walks you through how to use P-EAGLE directly within Amazon SageMaker AI. It demonstrates how to select a compatible model from SageMaker JumpStart, configure parallel drafting, and deploy a highly optimized endpoint.

  • P-EAGLE eliminates sequential drafting by predicting all draft tokens in a single forward pass.
  • Up to 1.69x throughput improvement over EAGLE-3.
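To make the first bullet concrete, here is a toy sketch of one speculative decoding step in which the drafter proposes all `k` tokens in a single call instead of one call per token. It is not P-EAGLE or SageMaker code: `speculative_step`, `draft_all`, and `target_next` are hypothetical stand-ins, verification is simple greedy matching, and the target model is queried once per position for readability, whereas real systems verify all positions in one batched pass.

```python
from typing import Callable, List

Tokens = List[int]


def speculative_step(
    prefix: Tokens,
    draft_all: Callable[[Tokens, int], Tokens],   # proposes k tokens in ONE call
    target_next: Callable[[Tokens], int],         # target model's greedy next token
    k: int,
) -> Tokens:
    """Return the tokens accepted in this step under greedy verification."""
    draft = draft_all(prefix, k)        # parallel drafting: no per-token draft loop
    accepted: Tokens = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    return accepted
```

Because every accepted token equals what the target model would have produced greedily, the output matches plain greedy decoding with the target; the speedup comes from accepting several tokens per verification step. The usual bonus token emitted when all drafts are accepted is omitted here for brevity.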

Apple 2027 rumors: AirPods with cameras for AI and the second folding iPhone

Bloomberg's Mark Gurman details upcoming hardware including camera-equipped AirPods for visual AI context, a second foldable iPhone, and a 20th-anniversary iPhone with an edge-to-edge curved display, all targeted for late 2027.

  • Camera-equipped AirPods scheduled for late 2027, tested with iOS 28.
  • Cameras in stems with indicator lights to provide Siri with visual context.

Qualcomm’s latest chip hints that more powerful smart glasses could be on the way

Qualcomm announces the Snapdragon Reality Elite chip for next-gen XR devices, featuring a 60% faster GPU, a 30% faster CPU, an NPU that is up to 160% faster, and improved cooling and battery life. Already powering the upcoming Aura glasses, the chip promises better visuals, AI capabilities, and lighter designs for smart glasses.

  • Qualcomm unveils Snapdragon Reality Elite chip for XR devices with significant performance boosts.
  • GPU up 60%, CPU up 30%, NPU up to 160%.

Supply Chain Capitalism, Platform Mercantilism, AI Coup: A Political Economy of Dependencies

This article analyzes how different exploitation strategies within capitalism have led to an enormous concentration of power, threatening democratic societies. Through three case studies—supply chain capitalism, platform mercantilism, and the AI coup—the author develops a 'political economy of dependencies' to understand the fusion of tech oligarchs and political power, with AI as a central driver.

  • Supply chain capitalism (since the 1980s) outsourced production globally, hollowing out Western brands and creating complex networks.
  • Platform mercantilism (2010s) turned digital platforms into powerful extractors and gatekeepers.