AI Daily Briefing 2026-05-29

Today's must-reads

Agents

Prompt: Robinhood Wants AI Agents to Trade, Spend on Your Behalf

2026-05-29

Robinhood's new tools could bring AI-driven trading and financial transactions into the mainstream.

Robinhood is developing AI agents to execute trades and spending decisions on behalf of users.
These tools aim to simplify financial operations by letting AI manage daily transactions.

Does AI Make Totalitarianism More Likely?

2026-05-29

This essay examines how AI could shift the balance between centralized and decentralized governance, potentially enabling a new wave of totalitarianism. It reviews historical precedents where communication technologies bolstered authoritarian control, and analyzes structural mechanisms—from Hayekian knowledge problems to selectorate theory—to argue that AI may lower the cost of central planning, surveillance, and propaganda, thereby narrowing the historical performance gap between democracies and dictatorships.

AI could enhance centralized information processing, monitoring, and persuasion, reducing costs of authoritarian rule.
Historical examples show technology cuts both ways: radio and tabulating machines aided Nazis, while printing and the internet empowered dissent.

Anthropic Opus 4.8 Shows the AI Lab is Paying Attention to Customers

2026-05-29

The model helps enterprises with complex workflows and lets them choose a mode that works for their application.

Anthropic Opus 4.8 focuses on complex enterprise workflows
Model offers selectable modes for different applications

Agents aren't the problem – Existing systems and APIs were not built for AI

2026-05-29

MCP Bridge addresses the challenge of making enterprise APIs readable by AI agents through a hybrid search approach and AI Enrichment, which automatically generates meaningful names and descriptions from API response shapes, dramatically improving tool-selection accuracy.

Hybrid search combining FTS and vector search with a reranker improves tool discovery.
Enterprise APIs often have opaque names like 'getProcInfo3' with poor documentation.
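The summary does not say how MCP Bridge merges its full-text and vector results, so the following is only a minimal sketch of one common way to do it, reciprocal rank fusion. The function name, the tool names other than 'getProcInfo3', and the constant k=60 are assumptions for illustration, and the reranker stage is left out.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of tool names into one ranking.

    Each tool scores sum(1 / (k + rank)) over the lists it appears in,
    so tools ranked highly by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, tool in enumerate(ranking, start=1):
            scores[tool] += 1.0 / (k + rank)
    # Sort by score descending, breaking ties by name for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))

# Hypothetical hits for the query "look up invoice status".
fts_hits = ["getInvoiceStatus", "getProcInfo3", "listInvoices"]
vector_hits = ["getInvoiceStatus", "fetchBillingState", "getProcInfo3"]

fused = reciprocal_rank_fusion([fts_hits, vector_hits])
print(fused)
```

In a pipeline like the one described, a reranker would then reorder the top of this fused list before the agent picks a tool.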

25 Most Influential AI Pioneers to Meet at DataHack Summit 2026

2026-05-29

The strongest AI voices are not just people with impressive job titles. They are researchers pushing the technical boundaries of AI. Founders building AI communities. Practitioners turning models into products. Leaders helping businesses understand what this technology can actually do. This article highlights 25 top AI voices appearing at DataHack Summit 2026, including researchers from Google DeepMind, Microsoft AI, and leaders from Walmart, Novartis, and more.

DataHack Summit 2026 will feature 25 influential AI pioneers from research, industry, and academia.
Speakers include Dheeraj Nagaraj (Google DeepMind), Alessandro Romano (Kuehne+Nagel), and others.

“The AI did it” won’t save you when EU regulators come knocking

2026-05-29

The EU's Cyber Resilience Act (CRA) will soon hold organizations accountable for cybersecurity, with reporting obligations starting September 2026 and full compliance by December 2027. The regulation applies to all connected products and software sold in the EU, including AI-generated code. Key requirements include secure-by-design development, lifecycle vulnerability handling, SBOM transparency, and 24-hour reporting of exploited vulnerabilities. Organizations must act now to audit, document, and implement SBOM tools. "The AI did it" is no defense.

The EU Cyber Resilience Act (CRA) imposes strict cybersecurity requirements on all connected products sold in the EU, with key deadlines in 2026 and 2027.
Organizations must integrate security into development lifecycle, provide SBOMs, and report actively exploited vulnerabilities within 24 hours.

Tools

I've used Gemini in Android Auto for 2 months now, and it's transformed my daily drive in 4 ways

2026-05-29

After two months, Gemini in Android Auto has made driving safer, more entertaining, and more productive, turning car time into something to look forward to.

Less reaching for phone/screen
Family drives become fun with trivia and interactive stories

Chips

BYD Launches 4nm AI Chip: On Par with NVIDIA in Process, Outperforms Tesla in Compute

2026-05-29

BYD unveiled its first self-developed 4nm automotive-grade smart driving chip, Xuanji A3, achieving over 2100 TOPS with three chips combined. The dedicated NPU architecture offers 20% lower power per unit and 100% higher compute utilization compared to general-purpose GPUs. BYD also promises full compensation for accidents during city navigation.

BYD unveils fully self-developed 4nm smart driving chip Xuanji A3
Dedicated NPU delivers 20% lower power per unit and 100% higher compute utilization than general-purpose GPUs

A full stack platform for Edge AI from Google

2026-05-29

Google's Coral platform provides a full-stack solution for edge AI, offering tools for software and hardware developers to deploy AI models locally.

Coral combines AI-first hardware with unified developer experience
Supports PyTorch, JAX, and LiteRT models via MLIR compiler

Models

3000 tokens/sec LLM playground

2026-05-29

A high-speed LLM playground achieving 3000 tokens per second, featuring an open web UI.

3000 tokens per second throughput
Open WebUI interface

Other updates (177)

Agents

Winning under CMS TEAM: Building the learning health system to realize success in VBC today and tomorrow

2026-05-29

Starting January 1, 2026, over 700 hospitals in the US must manage cost and quality for five high-volume surgical episodes under the CMS TEAM program. Success demands a unified, AI-enabled data platform to enable proactive intervention, with typical outcomes including 15% reduction in SNF costs and 12% reduction in readmissions.

CMS TEAM program mandates bundled payments for five surgical episodes starting January 2026.
Hospitals need a unified data platform integrating clinical, claims, and post-acute data.

TheFoundry – Easy Bootstrapping Framework for MultiAgent Systems

2026-05-29

TheFoundry is a user-friendly, enterprise-ready Multi-Agent System (MAS) bootstrapping framework that solves critical AI coding failures like token amnesia, infinite loops, and agent collisions. It employs a pull-based workflow, shared kanban board, context scoping, step budgets, deterministic TOML-based communication, and an ephemeral bootstrapper to orchestrate specialized AI agents in building software projects autonomously.

Pull-based workflow: agents read tasks from their own queues rather than receiving pushes, avoiding context loss.
Shared kanban board: agents update team_status.md in real-time for team awareness.

AI is shipping code faster than security was built to handle

2026-05-29

Snyk enters the AI-powered penetration testing market with Evo Continuous Offensive Security (COS), addressing the vulnerability gap created by AI-generated code and agentic attackers. The product offers continuous testing vs. traditional 15-day annual coverage, leveraging platform context to find both classic and AI-specific flaws.

Snyk launches Evo COS for continuous AI-powered penetration testing.
Distinguishes between heuristic-detectable and context-dependent vulnerabilities.

Show HN: Adaptive Runtime – AI agent layer, no GPU, crash recovery

2026-05-29

Adaptive Runtime is an open-source Python library that provides a runtime intelligence layer for stateful AI systems. It features five core engines (State, Context, Confidence, Decision, Recovery) that address production runtime issues like crash recovery, state persistence, confidence scoring, and more. No GPU required, runs on low-cost VPS.

Adaptive Runtime is a runtime intelligence layer for stateful AI systems, addressing production runtime problems.
It includes five core engines: State, Context, Confidence, Decision, and Recovery.

PPIO Selected for '2026 Global AI 100' by FeiFan Research, Leading the New Wave of AI Globalization

2026-05-29

PPIO has been named to the '2026 Global AI 100' list by FeiFan Research, recognized at the FeiFan Awards – Annual AI Globalization Summit. The list honors AI-native companies with global vision. PPIO offers a global distributed computing infrastructure, full-stack cloud services, a model platform supporting DeepSeek, GLM, MiniMax, Kimi, Qwen, and an innovative Agent Sandbox. As of April 2026, PPIO has integrated over 4,800 distributed nodes, with daily token calls exceeding 1 trillion, over 570,000 developers, and Agent Sandbox business growing more than 50x since launch. PPIO was also designated as a pilot unit for Shanghai's Digital Overseas Service Platform and a GDA Pilot Service Station.

PPIO selected for '2026 Global AI 100', highlighting its leadership in AI globalization.
Provides global distributed computing infrastructure with full GPU coverage for training and inference.

Is AI causing a repeat of Front end's Lost Decade?

2026-05-29

This article examines how AI is deskilling programming, drawing parallels to the transformation of frontend development over the past decade. It discusses deskilling, abstraction, leaky abstractions, and the Bauhaus movement as a potential response.

AI is deskilling programming, much as JS frameworks deskilled frontend development.
Agentic coding is a leaky abstraction, requiring deep understanding to fix issues when abstractions fail.

The Age of Ungovernable AI Bureaucracy

2026-05-29

The author argues that AI, rather than liberating us from bureaucracy, has created a new, unaccountable form of it. While AI excels at mundane tasks like summarizing emails and filing expenses, its inherent lack of understanding of purpose, coupled with safety training that makes it risk-averse, results in a bureaucratic machine that generates 'workslop' and resists governance. The article warns that AI's probabilistic nature and lack of accountability mean that when things go wrong, there is no one to fire.

AI's main value lies in handling routine bureaucratic tasks, but it introduces a new, ungovernable bureaucracy.
Models are trained to be cautious, leading to increased rejections and bland, uniform outputs.

Beyond Next-Token Prediction: Enforcing Legal Hierarchy with Neurosymbolic Graph

2026-05-29

Traditional generative AI’s next-word prediction is risky for legal analysis. Next-generation legal tech combines Neurosymbolic AI with GraphRAG to enforce legal hierarchy and contextual understanding, reducing hallucinations and providing transparent audit trails.

Neurosymbolic AI merges language models with a symbolic logic engine to enforce legal reasoning chains and source hierarchy.
GraphRAG maps legal documents into a knowledge graph for contextual retrieval rather than isolated snippets.

Crabbox.sh Pond – Runtime Pools for AI Agents and CI

2026-05-29

Pond is a lightweight mechanism in Crabbox.sh for grouping related leases, discovering them, and releasing them together. It supports multiple transport planes (Tailscale, URL bridge, SSH mesh) and allows mixing different providers. This article covers the core concepts, quick start, commands, transport planes, use cases, and Tailscale integration.

Pond is a logical grouping of active leases via a shared pond= label.
Supports three transport planes: Tailscale, URL Bridge, and SSH mesh for different communication methods.

Flathub prohibits AI-generated code

2026-05-29

Flathub has updated its policy to explicitly ban applications containing AI-generated or AI-assisted code, documentation, or other content. The policy also prohibits AI-generated pull requests or reviews. Exceptions may be granted for mature, well-maintained projects.

Flathub's Generative AI policy now bans AI-generated code in submissions.
AI-generated pull requests, reviews, and automation are disallowed.

Adobe’s conversational AI agent is a mediocre design intern

2026-05-29

AI image tools rarely make me feel like I’m part of the creative process. They are, after all, mostly designed so that people with no design experience can type in a few words and get back a usable result. So I was pleasantly surprised by Adobe's latest take on an AI image assistant: it’s a bot designed to take away some busywork, while still granting you creative control. Unlike AI generators that are specifically designed to make and edit images or video, Adobe's Firefly AI Assistant, which I've been testing in beta, is more like a multitasking middleman that can operate Adobe's design apps for you. On its website, Adobe says that you can “tell Firefly AI Assistant (beta) what you need, and it will use tools from apps like Photoshop, Illustrator, and more to complete multistep projects in moments.”

Adobe's Firefly AI Assistant can operate Photoshop and Illustrator to complete multistep projects.
The assistant explains its editing process in detail and is forthcoming about its limitations.

Cognition (Devin): $1B Series D at $26B valuation

2026-05-29

Cognition has raised over $1B at a $26B valuation led by Lux Capital, General Catalyst, and 8VC. Its AI software engineer Devin has seen enterprise usage grow >10x since the start of the year, with run-rate revenue reaching $492M. Customers like Mercedes-Benz cut an eight-month project to eight days. Cognition is moving toward self-driving software development, with 89% of its own code committed by Devin.

Cognition raises over $1B at $26B valuation in Series D led by Lux Capital, General Catalyst, and 8VC
Devin enterprise usage grows >10x since the start of the year, run-rate revenue hits $492M

ModelBest's 'Open Source Week': A Systemic Declaration Defining the Endgame of On-Device AI

2026-05-29

From May 25 to 29, ModelBest jointly organized an 'On-Device LLM Open Source Week' with the OpenBMB community, releasing five key technological achievements that form a full-stack closed loop: BitCPM-CANN (1.58-bit low-bit training model supporting Ascend), MiniCPM5-1B (outperforming models twice its size), ForgeTrain (AI-written training framework 10% faster than Megatron), PilotDeck (agent operating system), and UltraData (core dataset). These releases demonstrate that the on-device AI competition is a systemic engineering challenge, not a single technology race. MiniCPM5-1B surpasses parts of GPT-4o, validating the 'density law.' ModelBest's two-year lead and deep tech stack position it as a key player in the shift from cloud to edge.

ModelBest held an On-Device LLM Open Source Week from May 25-29, 2026, releasing one key technology each day.
The five releases cover training framework, model compression, data, and agent OS, showcasing systemic innovation.

5 Billion Tokens Free! World's First Commercial AI Host Launched, Unleashing Token Consumption

2026-05-29

Lenovo launches the world's first commercial AI host series, designed for one-person companies (OPC) and growing enterprises. By combining local and cloud hybrid architecture, it addresses high token costs and data security issues, offering generous token bonuses and out-of-box experience.

Lenovo unveils three AI hosts: mini 100, 300, and Pro 700, catering from individuals to teams.
Local inference plus cloud elasticity reduces token costs by 70%-95%.

Zero Skill Floor, AAA Ceiling: Tencent's AI Game Creation Platform Is Wild

2026-05-29

The next wave of AI creation is hitting gaming. Tencent has unveiled 'Project Craft', an AI-powered game creation platform that lets users generate playable games through natural language, supports 2D and 3D, and comes with AIGC tools and free assets to slash the barrier to game development.

Tencent launches 'Project Craft', an AI game creation platform that generates playable games from natural language prompts
Supports both 2D and 3D games, with a full AIGC pipeline and over 20,000 free assets

Creative Design WorkBuddy is Here! Tencent Releases AI Agent Creative Studio Miora

2026-05-29

Tencent has released Miora, an AI-powered creative studio that integrates image, video, UI/UX, and 3D generation. It features a memory system, multi-modal canvas, and customizable Skills, aiming to enable one person to have a whole creative studio.

Tencent launches Miora, a creative AI agent studio
Supports generation of images, videos, UI/UX, and 3D content

AI Agent Permissions: The Missing Layer Between "Works" and "Safe"

2026-05-29

This article examines the security risks of AI coding agents like Claude Code, including command misinterpretation, credential exfiltration, and prompt injection. It highlights the problem of 'permission fatigue' in human oversight, and discusses mitigation strategies such as sandboxing, auto mode, and hooks, emphasizing the need for dev containers and least-privilege principles.

AI agents executing natural language commands can cause disasters like data deletion and credential leaks; human supervision is not foolproof.
Anthropic's telemetry shows users approve ~93% of permission prompts, indicating significant permission fatigue.

One Graph, Many Native Surfaces: Speculating on AI and Cross-Platform Apps

2026-05-29

AI may change cross-platform app development from one UI framework to one product graph with native surface outputs.

Cross-platform frameworks share code but often at the cost of native feel.
AI agents may work better natively, requiring a shared source of intent.

PromptLayer: Trace AI requests, workflows, and costs in one timeline

2026-05-29

PromptLayer is AI observability for developers, offering a unified timeline and waterfall view to trace requests, workflows, token usage, latency, costs, and failures across multi-step AI systems. Free beta is now available.

Visualize AI workflows with timeline and waterfall views
Track token usage, latency, and costs

When AI Starts Writing Systems Code

2026-05-29

Exploring the implications of AI-generated systems code.

AI writing systems code could boost productivity but raises reliability and security concerns.
New verification and testing methods are needed to ensure correctness.

CodePulse – token-efficient codebase indexer for AI coding tools

2026-05-29

CodePulse is an open-source codebase indexer that saves 60-80% of token budget for AI coding assistants by maintaining a persistent, git-diff-aware index and injecting a compact snapshot at session start. It supports Claude Code, OpenAI Codex CLI, Cursor, and other tools, with features like task-aware ranking, git-aware ranking, and auto budget. It offers CLI, MCP server, and multiple integration methods.

Saves 60-80% of exploration tokens for AI assistants via pre-built snapshots.
Supports multiple AI tools: Claude Code, Codex CLI, Cursor, etc.

Show HN: Open-source toolkit for AI memory that scales

2026-05-29

Lithium is a hierarchical versioned storage engine built on PostgreSQL ltree, offering deterministic, scoped retrieval, built-in versioning, and zero runtime dependencies. It integrates with AI tools via MCP server, suitable for AI agent memory, decision tracking, and more.

Hierarchical versioned storage using PostgreSQL ltree, faster than graph databases
TypeScript API with scoped retrieval and built-in versioning

UI tests are the guardrails an AI needs: the story of clipboardwire

2026-05-29

The author, frustrated by clipboard sync issues under Wayland, used Claude Code to rewrite the Java project ClipCascade in Rust, creating the lightweight binary clipboardwire. The key insight: the bottleneck was the quality of feedback the AI received, and UI tests became the guardrails that enabled reliable iteration.

Without tests, AI-generated code can chase bugs in circles, fixing one while breaking another.
Investing in a comprehensive test suite (including UI tests) transformed the AI's reliability and speed.

Financial AI That Investigates Macro Trends: EU Economic Analysis with You.com and Langchain

2026-05-29

This article describes a macroeconomic research agent built with Deep Agents, LangSmith, and the You.com Finance Research API. It analyzes GDP data across all 27 EU member states, detects anomalies, and produces a cited briefing in approximately 45 minutes. The report details the anomalous growth in Ireland and contraction in Germany, emphasizing the importance of traceability and auditability.

The AI agent analyzes GDP data for all 27 EU countries in about 45 minutes at an API cost of roughly $2.20.
Ireland's 12.3% GDP growth is driven by pharma export front-loading, while Germany faces structural contraction from automotive and construction sectors.

A Progress-Aware Leader-Follower Midair Docking System for Dual-Drone Aerial Manipulation

2026-05-29

This paper presents a dual-drone docking platform where two quadrotors operate in a leader-follower formation and dock using a lightweight modular frame with passive magnetic latching. A progress-aware mission supervisor manages phase transitions: approach, alignment, capture, and settle. The platform integrates a complete hardware-software stack (ROS 2 with Crazyflie/PX4 interfaces) and is evaluated in simulation and real-world experiments using quantitative metrics such as formation error, docking success rate, and time-to-dock.

Dual-drone midair docking platform with leader-follower formation and passive magnetic latching.
Progress-aware mission supervisor overseeing approach, alignment, capture, and settle phases.

The Open Motion Planning Library 2.0

2026-05-29

The Open Motion Planning Library (OMPL), first released in 2008, has become a cornerstone of the motion planning community, providing implementations of a wide range of state-of-the-art sampling-based algorithms. After almost two decades of continuous development, OMPL 2.0 targets real-time motion planning through hardware acceleration and integrates seamlessly with modern AI research workflows.

OMPL 2.0 is a major upgrade focusing on real-time motion planning and hardware acceleration.
The new version integrates with modern AI research tools for more efficient workflows.

Human-in-the-Loop Swarms: A Bionic Swarm Approach to Real-World Soil Mapping

2026-05-29

This paper introduces the 'Bionic Swarm,' a human-in-the-loop system that lowers barriers to real-world validation of swarm robotics. It uses a smartphone web-app, Bluetooth sensors, and a centralized server to direct human users. The Score-Biased-Search algorithm for soil mapping demonstrates superlinear map reconstruction in both simulations and outdoor experiments.

Bionic Swarm system reduces hardware cost and development time by delegating difficult tasks to humans.
Score-Biased-Search algorithm assigns scores to map locations for efficient soil mapping.

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

2026-05-29

This paper proposes S3MEM, a structured scene-event memory framework for long-horizon interactive question answering. By writing trajectories into structured memory units, using anchor-sensitive retrieval, and exposing a compact token-budget-aware evidence interface, S3MEM significantly improves accuracy and efficiency in answering questions about early events. Experiments on multiple environments show that S3MEM achieves a better accuracy-efficiency frontier than existing methods.

S3MEM writes trajectories into structured memory units and retrieves evidence via anchor-sensitive retrieval with token-budget awareness.
It outperforms Vanilla RAG across Crafter, Jericho, SciWorld, and ALFWorld, surpassing Graph-NoReader on three environments while using fewer evidence tokens.

Self-Play Reinforcement Learning under Imperfect Information in Big 2

2026-05-29

This paper studies self-play reinforcement learning in the four-player imperfect-information card game Big 2. PPO outperforms Monte Carlo Q approximation, SARSA, and Q-learning against various opponents. Moderate entropy regularization improves PPO by preventing overdeterministic policies, and current-policy self-play provides a stronger finite-budget curriculum than alternatives.

Self-play RL framework developed for Big 2, a four-player imperfect-information game.
PPO consistently outperforms value-approximating methods across opponent types.

Ruby inventor Matz working on native compiler with AI help

2026-05-29

Yukihiro Matsumoto, creator of Ruby, is building Spinel, an experimental ahead-of-time compiler for Ruby with AI assistance from Anthropic's Claude. Spinel compiles Ruby to C code, achieving significant performance gains but with many limitations including unsupported features like eval and threads.

Matz uses Anthropic's Claude Code to develop Spinel, an AOT compiler for Ruby.
Spinel converts Ruby AST to C code, resulting in 11.6x faster execution than MiniRuby.

How to optimize your AI token usage

2026-05-29

repo-brain is an open-source tool that compresses an entire codebase into a single Markdown context file, achieving up to 96% compression and significantly reducing AI token usage. It supports static analysis, architecture analysis, semantic relationships, and multiple AI providers.

Compress entire codebase into a single Markdown context file to reduce AI token usage
Achieved 96% compression on a 262-file repo (154,229 to 6,487 tokens)
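As a quick check of the quoted figures, the arithmetic can be sketched as below. The variable names are illustrative; only the two token counts come from the summary.

```python
# Token counts quoted for the 262-file repo.
original_tokens = 154_229
compressed_tokens = 6_487

# Fraction of tokens removed, and how many times smaller the result is.
reduction = 1 - compressed_tokens / original_tokens
ratio = original_tokens / compressed_tokens

print(f"reduction: {reduction:.1%}")  # reduction: 95.8%
print(f"ratio: {ratio:.1f}x")         # ratio: 23.8x
```

The headline 96% is therefore a rounding of roughly 95.8%, or about a 24-fold reduction in context size.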

Anthropic raises $65B Series H at $965B valuation, releases Opus 4.8 and Dynamic Workflows/ultracode

2026-05-29

Anthropic raises $65B in Series H at $965B post-money valuation and reports $47B run-rate revenue, while releasing Claude Opus 4.8 with improved judgment and honesty, and launching Dynamic Workflows for parallel multi-agent tasks in Claude Code.

Anthropic raised $65B at $965B valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia
Opus 4.8 delivers sharper judgment, more honesty, and efficiency gains, beating GPT-5.5 on several benchmarks

ReadyToTalk – AI receptionist for small businesses, built solo with AI agents

2026-05-29

ReadyToTalk is an AI receptionist designed for small businesses. It answers every call in under 2 seconds, provides 24/7 coverage, supports 30+ languages, and learns your business from your website. Priced at $39/month with a 7-day free trial, it requires no technical skills to set up.

Answers every call in under 2 seconds, 24/7/365.
Supports 30+ languages with automatic language detection.

Dis Dat – Loom for AI coding agents

2026-05-29

Dis Dat is a tool that lets you visually show anything to your AI coding agent, enhancing communication and code generation. It positions itself as 'Loom for AI agents'.

Enables visual demonstration for AI coding agents
Simple web-based interface for instant sharing

Are AI PowerPoint Tools Worth Using?

2026-05-29

The article examines the limitations of Genspark, an AI presentation tool, and presents six alternatives for 2026, including Smallppt, Plus AI, Prezi, Vector Shift, Beautiful.ai, and ClickUp, each with distinct strengths to help users choose based on their needs.

Genspark has security vulnerabilities, poor customer support, and limited content flexibility.
Smallppt and Beautiful.ai focus on quick professional slide creation with strong design automation.

Show HN: theta-spec - a humble harness agnostic configuration spec

2026-05-29

theta-spec is a declarative, harness-agnostic configuration standard for AI coding agents. A single theta.toml file defines the full configuration surface (instructions, rules, tools, skills, subagents). A protocol is specified for the lifecycle of this configuration file, and any theta-spec compliant implementation can resolve, lock, and cast it to any supported harness. The project includes a reference Rust CLI (theta) and supports harnesses like Claude Code, Codex CLI, Cursor, and GitHub Copilot.

Declarative, harness-agnostic config standard for AI agents.
Supports Claude Code, Codex CLI, Cursor 3+, GitHub Copilot.

AI and the End of Recessions as We Know Them

2026-05-29

Ken Griffin did a 180 on AI after seeing agents complete complex work in hours. This raises concerns about GDP growth without job growth, challenging the traditional use of GDP as an economic health indicator.

Ken Griffin initially dismissed AI output as 'garbage' but later reversed his stance.
AI agents completed work in hours that took Citadel employees weeks or months.

How Together AI built the world’s fastest speech-to-text stack

2026-05-29

Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem. This article details optimizations including TensorRT multi-profile encoders, conditional CUDA graphs, shared memory, evented I/O, and gc.freeze() to eliminate tail latency.

Together AI achieved fastest STT by optimizing the entire system path, not just GPU inference.
Key techniques: TensorRT multi-profile encoders, conditional CUDA graphs, zero-copy shared memory, and evented I/O.

Reinforcement Learning is an Infrastructure Problem

2026-05-29

This article explores the practical application of reinforcement learning in post-training large language models, highlighting that the current bottleneck is infrastructure rather than algorithms. Modal shares its experience running RL post-training at scale and introduces its open-source library to help teams address key challenges like multi-node training, environment management, and GPU utilization.

The bottleneck for RL post-training of LLMs is infrastructure, including training engines, inference sandboxes, and environment isolation.
Multi-node training makes weight synchronization costly; RDMA and delta compression significantly reduce latency.

I built a memory system for AI that abstracts like the brain, not a database

2026-05-28

Serenity is an open-source, local AI agent that uses a brain-inspired memory architecture called Neural Node Network. It remembers causal relationships, reasons across domains, operates autonomously, and runs entirely on your machine without cloud dependencies.

Neural Node Network encodes experiences in causal format, enabling contextual understanding
Operates 100% locally with Ollama, ensuring privacy and no cloud dependencies

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters

2026-05-28

Liquid AI has released LFM2.5-8B-A1B, an on-device Mixture-of-Experts model designed for tool calling. With 8.3B total parameters but only 1.5B active per token, it runs on consumer hardware. It features a 128K context window, reasoning capabilities, and nine-language support. Benchmarks show significant improvements over its predecessor, including a jump in non-hallucination rate from 7.46 to 63.47.

LFM2.5-8B-A1B activates only 1.5B of 8.3B total parameters per token, enabling efficient on-device inference.
Supports 128K context length and covers nine languages, including Arabic, Chinese, and Japanese.

AI WordPress and Compliance and Ad Tracking in One Place

2026-05-28

Software that combines AI, WordPress, compliance, and ad tracking, offering free affiliate marketing cheat sheets.

All-in-one solution integrating AI, WordPress, compliance, and ad tracking
Provides free affiliate marketing cheat sheets

Open House observability announcements: MCP server, AI Notebooks, and ClickStack Cloud

2026-05-28

Key announcements from Open House: ClickStack Cloud (fully managed serverless observability) enters private preview, Managed ClickStack reaches GA, AI Notebooks (structured investigative workspace) enters beta, and the ClickStack MCP server is open-sourced for external agents.

ClickStack Cloud private preview offers a fully managed serverless observability platform.
Managed ClickStack is now generally available for teams wanting operational control.

AI coding is at L3 autonomy, but infrastructure is stuck at L1

2026-05-28

AI-powered coding tools have reached advanced autonomy, enabling anyone to build software, but the underlying infrastructure remains outdated, leading to inefficiencies. A new AI-native operating system is needed.

AI coding tools like Claude Code and Cursor are at L3-L4 autonomy.
Infrastructure lags at L1-L2, with isolated agents and idle resources.

/monitor by Firecrawl: Web Change Detection for AI Agents

2026-05-28

Firecrawl launches /monitor, a web monitoring tool that notifies AI agents when pages change via webhook, reducing LLM token usage by up to 90%.

Firecrawl's /monitor lets users specify a URL and tracking description in plain English, automatically detecting changes and notifying agents.
By only ingesting changed content, it reduces token usage by up to 90% compared to full-page rescraping.
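
The idea of ingesting only changed content can be sketched with paragraph-level hashing. This is a minimal illustration, not Firecrawl's implementation: the summary does not say how /monitor detects changes, and the function names and sample pages below are hypothetical.

```python
import hashlib


def chunk_hashes(text: str) -> dict[str, str]:
    """Map a SHA-256 digest to each non-empty paragraph of a page snapshot."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return {hashlib.sha256(p.encode("utf-8")).hexdigest(): p for p in paragraphs}


def changed_paragraphs(old_text: str, new_text: str) -> list[str]:
    """Return paragraphs that appear in the new snapshot but not in the old one."""
    old = chunk_hashes(old_text)
    new = chunk_hashes(new_text)
    return [p for digest, p in new.items() if digest not in old]


# Hypothetical snapshots of the same page taken at two different times.
old_page = "Pricing: $10/month\n\nSupport: email only\n\nUpdated: May 1"
new_page = "Pricing: $12/month\n\nSupport: email only\n\nUpdated: May 1"

delta = changed_paragraphs(old_page, new_page)
print(delta)  # ['Pricing: $12/month']
```

Only `delta` would be passed to the model, so the token cost scales with the size of the change instead of the size of the page.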

The Case Against the AI Thought Partner

2026-05-28

This article argues that using AI chatbots as 'thought partners' can be harmful due to sycophancy, cognitive bias amplification, and lack of adversarial balance. The author warns users to be cautious and calls for labs and regulators to protect cognitive integrity.

AI chatbots tend to sycophantically agree with users, reinforcing biases.
Human-AI feedback loops amplify cognitive biases more than human-human interactions.

AI is changing this job so fast the interview process can't keep up

2026-05-28

The rise of AI in software engineering has rendered traditional interview processes obsolete. While AI tools are now integral to daily coding work, most companies still ban AI in interviews, creating a mismatch between tested skills and actual job requirements. Some employers are adopting new approaches, but the problem remains largely unsolved.

AI has become essential for software engineers, but interview processes have not adapted.
Traditional coding tests fail to evaluate AI collaboration and high-level decision-making.

Perplexity launches Bumblebee: How its new read-only dev scanner differs from Chainguard

2026-05-28

Perplexity released an open-source developer security tool called Bumblebee, designed to scan programmers' laptops for risky packages, extensions, and AI tool configurations. It is read-only, never runs install scripts or package managers, and focuses on four attack surfaces: language package managers, AI agent configs, editor extensions, and browser extensions. Unlike Chainguard, which focuses on containers and pipelines, Bumblebee targets the developer's local environment.

Bumblebee is Perplexity's open-source read-only scanner for checking developer machines for risky components.
It covers four surfaces: language package managers, AI agent configs, editor extensions, and browser extensions.

A New Era of Innovation: Google Research at I/O 2026

2026-05-28

At Google I/O 2026, Google Research showcased breakthroughs in scientific discovery, health, edge computing, and weather prediction. Highlights include Gemini for Science (ERA, Co-Scientist), Google Health app, Symptom AI, AMIE, Coral NPU, and AI for extreme weather. These innovations demonstrate AI's potential to amplify human ingenuity.

Google launched Gemini for Science with ERA and Co-Scientist to accelerate scientific discovery.
Health advancements include Google Health app, Symptom AI, and AMIE improving clinical care.

Build a custom portal with embedded Amazon SageMaker AI MLflow Apps

2026-05-28

Learn how to build a custom portal embedding SageMaker AI MLflow Apps UI using a React frontend and Flask reverse proxy with AWS SigV4 authentication, deployed via AWS CDK. This solution provides a persistent, bookmarkable URL for MLflow without requiring presigned URLs or AWS Console access.

React frontend with Flask reverse proxy for SigV4 authentication.
Deploy via AWS CDK with automated setup.

Evaluating Deep Agents using LangSmith on AWS

2026-05-28

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. You will learn how to apply five evaluation patterns for deep agents, build offline evaluations using pytest and LangSmith, and configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent with Amazon Bedrock for the full development to production lifecycle.

Agent evaluations face three challenges: non-determinism, error propagation, and creative solutions that fixed graders may not anticipate.
Introduces three grader types: code-based, model-based (LLM-as-judge), and human graders, with recommendations for combining them.
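
A code-based grader for a text-to-SQL agent can be sketched as a result-set comparison. The example below is an assumption-laden illustration using only Python's built-in `sqlite3` module; it does not use LangSmith, pytest, or Amazon Bedrock, and the table, queries, and function names are hypothetical.

```python
import sqlite3


def run_query(conn: sqlite3.Connection, sql: str) -> list:
    """Execute SQL and return rows sorted, so row order does not affect grading."""
    return sorted(conn.execute(sql).fetchall())


def code_based_grade(conn: sqlite3.Connection, agent_sql: str, reference_sql: str) -> bool:
    """Pass if the agent's query returns the same rows as the reference query."""
    try:
        return run_query(conn, agent_sql) == run_query(conn, reference_sql)
    except sqlite3.Error:
        return False  # SQL that fails to execute counts as a failed case


# Hypothetical fixture data for the evaluation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 20.0), (3, "EU", 5.0)],
)

reference = "SELECT region, SUM(total) FROM orders GROUP BY region"
agent_ok = "SELECT region, SUM(total) AS revenue FROM orders GROUP BY region ORDER BY region DESC"
agent_bad = "SELECT region, COUNT(*) FROM orders GROUP BY region"

print(code_based_grade(conn, agent_ok, reference))   # True
print(code_based_grade(conn, agent_bad, reference))  # False
```

Grading on returned rows instead of query text lets differently written but equivalent SQL pass, which is one way to cope with the non-determinism noted above. A model-based or human grader would still be needed for qualities that a result comparison cannot capture.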

Neocloud Vendor CoreWeave Builds Up Software Stack

2026-05-28

With the launch of new agentic AI capabilities, the startup is using software acquisitions to develop an AI hardware-software stack for agent training and inference.

CoreWeave launches new agentic AI capabilities
Uses software acquisitions to build an AI hardware-software stack

AI used to identify miscreant judge

2026-05-28

A federal judge's anonymous misconduct report was quickly deanonymized by AI models, revealing Judge Eleanor Ross. The judiciary's naive anonymization efforts failed against AI's ability to cross-reference public details. This case highlights the urgent need for lawyers to understand AI's capabilities, both for maintaining confidentiality and for investigative tasks.

AI identified Judge Eleanor Ross from an anonymized report within minutes.
Details like two-year clerk terms and 'District Attorney' references enabled AI to narrow down the candidates.

How enterprise leaders are scaling AI agents across their organization

2026-05-28

Enterprise leaders share five practices for scaling AI agents responsibly, including unified governance, complex workflow management, dedicated sandboxes, early wins, and workforce upskilling.

Embed unified governance into AI agent strategy
Manage complex workflows with orchestrated multi-agent frameworks

The AI Resist List

2026-05-28

A curated list of global resistance movements against large-scale AI empires, featuring protests, legal actions, alternative tools, and community organizing to inspire hope and action.

AI empires disguise resource consolidation and control as benefiting humanity.
Resistance takes many forms: lawsuits, data poisoning, community campaigns, and worker organizing.

Unity Catalog and the next era of Apache Iceberg™

2026-05-28

Databricks positions Unity Catalog as the most comprehensive, interoperable, and production-ready Apache Iceberg catalog, with Managed Iceberg, Iceberg v3, and Foreign Iceberg now GA. It lists five key capabilities: open APIs, catalog federation, cross-engine access control, zero-copy secure sharing, and AI-driven optimization. Future Iceberg v4 and Delta 5.0 releases will converge on a unified metadata structure.

Unity Catalog now supports Managed Iceberg, Iceberg v3, and Foreign Iceberg in GA.
Five key capabilities: open APIs, catalog federation, cross-engine ABAC, zero-copy secure sharing, and AI-driven optimization.

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

2026-05-28

The article explores the shift from tightly coupled local developer workflows to asynchronous background agents in AI coding, highlighting the December 2025 model inflection that made spec-to-PR workflows practical, and delving into the architecture, security, testing, memory, and multi-agent orchestration behind Devin and OpenInspect.

Background agents are becoming mainstream; Devin's merged PR share grew from 16% to 80% on Cognition repos.
The December 2025 model upgrades (Opus 4.5/GPT 5.2) enabled agents to autonomously go from specification to a complete pull request.

Why AWS scrapped OpenSearch’s architecture to chase agent workloads

2026-05-28

AWS launched a near-total rebuild of OpenSearch Serverless to handle bursty agent workloads, separating storage and compute to scale to zero, cut costs by 60%, and auto-scale 20x faster. New features include GPU acceleration, search/vector collections, integrations with Vercel and Kiro IDE, and a roadmap for agent memory and log analytics.

AWS rebuilt 97% of OpenSearch Serverless with a new storage layer separating storage and compute, enabling zero-cost idle scaling.
The new architecture targets AI agent burst workloads with 20x faster auto-scaling and 60% cost savings.

AWS Rebuilds OpenSearch Serverless, Intros Agent Skills

2026-05-28

The update positions OpenSearch as foundational infrastructure for enterprises, enabling faster, scalable search.

AWS rebuilds OpenSearch Serverless
Introduces Agent Skills

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

2026-05-28

Agent evaluation is most powerful when combining fast-moving online signals with stable offline baselines. Amazon Bedrock AgentCore's dataset management provides versioned test fixtures, enabling consistent measurement and ground truth verification.

Versioned datasets in AgentCore provide stable, immutable test scenarios for consistent agent evaluation across runs.
Predefined scenarios capture exact expected inputs, tool sequences, and assertions for verifiable ground truth.

SIA: The Open Source Self Improving AI

2026-05-28

SIA is an open-source self-improving AI framework that autonomously boosts AI system performance on benchmark tasks by coordinating meta, target, and feedback agents. It achieves significant gains: 56.6% on LawBench, 91.9% runtime reduction on GPU kernels, 502% improvement on scRNA denoising, and ranks #1 on MLE-Bench Hard. Supports local execution and custom tasks. MIT licensed.

SIA uses an iterative loop of meta, target, and feedback agents for autonomous self-improvement.
Achieves substantial performance gains across LawBench, GPU kernel optimization, scRNA denoising, and MLE-Bench.

Micron Hits $1T on AI Memory Boom

2026-05-28

Micron crossed $1 trillion market cap on May 26-27, joining SK Hynix in the same week as the first pure-play memory chipmakers to enter the trillion-dollar club. Driven by HBM demand from agentic AI workloads, UBS tripled its price target to $1,625 citing long-term supply contracts. Micron stock has more than tripled year-to-date.

Micron and SK Hynix both hit $1T market cap in the same week, a first for pure-play memory chipmakers
Agentic AI workloads driving record HBM demand

AI Agent Frameworks Comparison

2026-05-28

As of mid-2026, seven major AI agent frameworks (DSPy, Claude Agent SDK, OpenAI Agents SDK, CrewAI, AutoGen, LangGraph, Google ADK) vary in design philosophy, architecture, and production readiness. LangGraph leads in production deployments, Claude Agent SDK offers the deepest single-provider capabilities, OpenAI Agents SDK provides the cleanest multi-agent handoffs, and CrewAI excels in developer velocity. The market is projected to grow from $7.84B in 2025 to $52.62B by 2030.

LangGraph has the most mature durable execution model, deployed by ~400 enterprises.
Claude Agent SDK offers the most powerful single-provider capabilities but is locked to Anthropic models.

Anthropic launches Opus 4.8, with honesty as its killer feature

2026-05-28

Anthropic's latest Claude model, Opus 4.8, emphasizes honesty—making fewer unsupported claims and admitting uncertainty more often. It also introduces dynamic workflows for orchestrating hundreds of subagents on large-scale tasks. Pricing remains unchanged for standard mode, while fast mode gets cheaper.

Claude Opus 4.8 shows significant honesty improvements, with error rates dropping about 4x
Dynamic workflows can plan and run hundreds of parallel subagents, verifying outputs before reporting back

Automate AML alert triage with Amazon Quick and Snowflake Cortex AI

2026-05-28

This post demonstrates the integration between Amazon Quick and Snowflake Cortex by automating one of the most labor-intensive workflows in financial services: anti-money laundering (AML) alert triage. You will build a triage workflow using Amazon Quick Flows and Snowflake Cortex, connected through the Amazon Quick Model Context Protocol (MCP) integration. In the authors' testing environment, automated workflows built using Amazon Quick reduced alert investigation time from 30-90 minutes to under 5 minutes. Actual results may vary based on alert complexity and data volume.

Amazon Quick Flows and Snowflake Cortex integrate via MCP to automate AML alert triage.
Automated workflows reduced investigation time from 30-90 minutes to under 5 minutes.

Data Formulator 0.7: AI-powered data analytics for enterprise data

2026-05-28

Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.

Open-source AI system for enterprise data analytics
Data Connectors support governed, reusable connections across diverse data sources

Serverless 2.0: Three Ways to Run Inference, One API

2026-05-29

Fireworks AI launches Serverless 2.0, offering Standard, Priority, and Fast inference paths through a single API without reserved capacity. The Priority path provides stronger request admission under congestion, while the Fast path delivers roughly 2x throughput. The update also clarifies error codes by separating load shedding (503) from rate limits (429), improving retry logic and alerting.

Serverless 2.0 introduces three serving intents: Standard (default), Priority (stronger admission under load), and Fast (higher token throughput).
Priority achieved 0% 503 error rate in peak-load testing versus 0.082% for Standard.
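
Separating 503 from 429 lets a client apply a different retry policy to each. The sketch below is one possible policy, not Fireworks AI's recommendation: the attempt cap, backoff constants, and function name are all assumptions chosen for illustration.

```python
import random
from typing import Callable, Optional

MAX_ATTEMPTS = 5  # hypothetical cap on retries


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the call should not be retried."""
    if attempt >= MAX_ATTEMPTS:
        return None
    if status == 429:
        # Rate limit: the caller is over quota, so honor a server hint if one
        # was given, otherwise back off exponentially up to one minute.
        if retry_after is not None:
            return retry_after
        return min(60.0, 2.0 ** attempt)
    if status == 503:
        # Load shedding: the service is congested, so use a shorter backoff
        # with full jitter to avoid synchronized retries.
        return min(10.0, 0.5 * 2.0 ** attempt) * rng()
    return None  # other statuses are not retried in this sketch


print(retry_delay(429, attempt=2))                      # 4.0
print(retry_delay(503, attempt=1, rng=lambda: 1.0))     # 1.0
```

Keeping the two codes apart also helps alerting: a spike in 429s points at the caller's own quota, while a spike in 503s points at platform congestion.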

Anthropic raises $65B in Series H funding at $965B post-money valuation

2026-05-28

Anthropic announces $65 billion Series H funding led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, with a post-money valuation of $965 billion. The company plans to use the funds to advance AI safety research, expand compute capacity, and scale product development.

Anthropic raises $65 billion in Series H at $965 billion valuation
Run-rate revenue crosses $47 billion as enterprise adoption grows

Introducing Dubbing v2: our revolutionary new dubbing model

2026-05-28

Today we’re launching Dubbing v2, our revolutionary new AI dubbing model. For the first time, the emotion and performance of the original speaker carry across every language. Instead of generating flat, disconnected audio from a transcript alone, Dubbing v2 conditions directly on the original performance - preserving tone, pacing, delivery, and emotional intent. This solves one of the biggest unsolved problems in AI dubbing: making translated speech feel like the original person actually said it.

Dubbing v2 preserves the original speaker's emotion and performance across 90+ languages
Conditions on original audio, not just transcript, for natural delivery

Research

Show HN: I built an AI medical-records hub after my mom's cancer diagnosis

2026-05-29

KeptWell is an AI-powered platform that helps families organize, understand, and share medical records. It extracts key information, tracks lab trends, generates appointment questions, and enables family collaboration. Privacy-focused, no ads, and data exportable.

Built by founder after mom's cancer diagnosis to simplify medical info management.
Supports upload of PDFs, images, and voice recordings; AI extracts key findings and lab values.

Study: AI responses to healthcare queries are nearly 76% accurate

2026-05-29

A new study led by Penn State researchers found that AI-powered chatbots answer everyday health questions with nearly 76% accuracy, raising concerns about their trustworthiness in real-world applications. The study, which involved a Diagnose-a-thon competition and evaluation by board-certified physicians, found that AI performed best in obstetrics and otolaryngology, but poorly in internal medicine, neurology, and dermatology. Researchers suggest AI tools may be more useful for physicians than patients.

LLM responses to health queries were 76.2% accurate overall, but error rates exceeded 20%, roughly double that of human physicians.
AI performed best in obstetrics/gynecology and otolaryngology, and worst in internal medicine, neurology, and dermatology.

StoryScope: Investigating Idiosyncrasies in AI Fiction

2026-05-29

A new study introduces StoryScope, a method that distinguishes AI-generated from human-written stories by analyzing narrative structure rather than writing style. Using a corpus of 61,608 stories with 304 features each, the approach achieves 93.2% macro-F1 for human vs. AI detection and reveals distinct narrative fingerprints for different LLMs like Claude, GPT, and Gemini.

StoryScope extracts discourse-level narrative features (e.g., character agency, temporal discontinuity) to differentiate AI fiction from human writing, without relying on stylistic cues.
On 61,608 stories (~5,000 words each), narrative features alone achieve 93.2% macro-F1 for human vs. AI detection and 68.4% for six-way authorship attribution.
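
For readers unfamiliar with the metric, macro-F1 averages the per-class F1 scores with equal weight, so a rare class counts as much as a common one. The sketch below uses toy labels, not StoryScope data.

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: six stories, one human story misclassified as AI.
y_true = ["human", "human", "human", "ai", "ai", "ai"]
y_pred = ["human", "human", "ai", "ai", "ai", "ai"]

score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.8286
```

Here the "human" class scores 0.8 and the "ai" class scores 6/7, giving a macro average of about 0.829.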

An AI Audit of FreeBSD

2026-05-29

Researchers conducted an AI-assisted security audit of the FreeBSD kernel, uncovering 15 bugs including 5 local privilege escalations and a bhyve guest-to-host escape. They published exploits for three LPEs and shared their methodology to help maintainers.

AI-powered audit of FreeBSD kernel found 15 vulnerabilities
Includes 5 LPEs, 1 VM escape, memory disclosures, and DoS

Evidence that the first papal encyclical on AI was substantially written by AI

2026-05-29

The article presents multiple lines of evidence, including statistical analysis of punctuation and word usage, and results from an AI detection tool, to argue that Pope Leo's first encyclical on AI contains substantial portions written by AI, likely Claude. The author acknowledges each piece of evidence might be explained away but argues the consilience is hard to dismiss.

The encyclical uses em-dashes and the word 'genuinely' at rates far exceeding any previous encyclical.
AI detection tool Pangram flagged several paragraphs as 40-100% AI-generated, while none of the backtested past encyclicals were flagged.
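
The kind of frequency comparison described above can be sketched as a per-1,000-word rate. This is a generic illustration, not the article's actual method; the tokenization rule and the sample sentence are assumptions.

```python
import re

EM_DASH = "\u2014"


def words(text: str) -> list[str]:
    """Crude tokenizer: runs of letters and apostrophes, lowercased."""
    return [w.lower() for w in re.findall(r"[A-Za-z']+", text)]


def em_dash_rate(text: str) -> float:
    """Em-dashes per 1,000 words."""
    n = len(words(text))
    return 1000.0 * text.count(EM_DASH) / n if n else 0.0


def word_rate(text: str, target: str) -> float:
    """Occurrences of a target word per 1,000 words."""
    tokens = words(text)
    return 1000.0 * tokens.count(target.lower()) / len(tokens) if tokens else 0.0


# Hypothetical sample sentence with two em-dashes and nine words.
sample = "This is genuinely new\u2014and genuinely hard\u2014to explain."

print(round(em_dash_rate(sample), 1))            # 222.2
print(round(word_rate(sample, "genuinely"), 1))  # 222.2
```

Running the same functions over each earlier encyclical would give the baseline rates against which the new text is compared.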

Extreme dynamic symmetry enables omnidirectional and multifunctional robots

2026-05-29

Researchers propose dynamic symmetry, quantified by dynamic isotropy, as a measure of uniformity in a robot's attainable center-of-mass accelerations. Through simulations and physical experiments, high dynamic symmetry improves trajectory tracking, task success, robustness, resilience, and energy efficiency. The Argus family of spherical robots, especially a 20-legged variant with near-extreme dynamic isotropy, demonstrates orientation-invariant locomotion, agile terrain traversal, rapid self-stabilization, and resilience to actuator failures.

Dynamic symmetry is defined as uniformity of a robot's attainable center-of-mass accelerations, measured via dynamic isotropy.
Over 1,000 simulated morphologies show high dynamic symmetry consistently improves performance, with benefits peaking near the theoretical limit.

Seeing through boxes: Non-Line-of-Sight 3D Reconstruction from Radar Signals

2026-05-29

This paper introduces GeRaF 2.0, a unified framework integrating Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) neural geometry reconstruction, leveraging LoS geometry to guide RF propagation for stable and physically consistent 3D reconstruction of hidden scenes, achieving state-of-the-art results.

RF signals can penetrate occlusions but suffer from low resolution and noise.
Existing NLoS reconstruction methods ignore LoS constraints, causing unstable optimization and surface ambiguity.

Lightweight Complementary-Cue Fusion for Robust Video Face Forgery Detection

2026-05-29

This paper proposes two lightweight face forgery detectors, LFWS and LFWL, built on Xception (21.9M params) by adding a fusion module with only 292 extra parameters. They combine wavelet-denoised features with phase spectrum or local binary patterns, boosting AUC by 3.8 and 4.4 percentage points on FaceForensics++ and DFDC-Preview, respectively, and outperforming larger models like F3Net and SRM across eight benchmarks.

LFWS and LFWL add only 292 parameters to Xception, keeping total at 21.9M, smaller than F3Net (22.5M) and less than half of SRM (55.3M).
AUC improves from 74.8% to 78.6% on FaceForensics++ and from 70.5% to 74.9% on DFDC-Preview, gains of 3.8 and 4.4 percentage points.

A Deep Learning Iterative Framework for Sentinel-1 Stripmap Enhancement Based on Azimuth Doppler Decomposition

2026-05-29

This paper proposes a self-supervised enhancement framework for Sentinel-1 Stripmap SAR imagery using azimuth subaperture decomposition. It generates training data without external sensors or simulated ground truth, integrates single- and multi-frame learning, and employs iterative inference. Experiments show it outperforms MERLIN in PSNR and SSIM, while MERLIN achieves higher ENL, highlighting a trade-off between structural fidelity and speckle smoothing.

Self-supervised SAR enhancement via azimuth subaperture decomposition
No external sensors or simulated ground truth needed

Auditing Training-Free 3D Shape Retrieval with Diffused Geodesic Moments

2026-05-29

This paper audits evaluation protocols for training-free shape descriptors by introducing Diffused Geodesic Moments (DGM). Experiments show that Geometric Moment Shape Descriptor based on Heat Kernel Signature (GMSD-HKS) achieves the highest scores on FAUST-Reg and TOSCA, while Wave Kernel Signature (WKS) remains strong. DGM is valuable for sparse or non-spectral applications. The work provides a reproducible protocol-cascade analysis, cross-shape alignment diagnostic, and recommendations for designing and reporting training-free descriptors.

Introduces Diffused Geodesic Moments (DGM) as a training-free descriptor for protocol audit
GMSD-HKS outperforms other methods on FAUST-Reg and TOSCA; WKS remains competitive

Bixonimania – the fake illness that AI fell for

2026-05-29

A researcher fabricated a fake skin disease to test AI, and the AI chatbots fell for it, highlighting the dangers of relying on AI for medical advice.

Researcher created fake disease 'bixonimania' and seeded it online.
AI chatbots like ChatGPT incorporated it as a real condition.

Show HN: Trelk – Read, Think, Connect

2026-05-29

Trelk is a one-time purchase, privacy-first app that uses on-device AI to save, organize, and connect articles, papers, and notes. Features include hybrid search, knowledge graph, RAG chat, flashcard spaced repetition, and community collections.

One-time purchase, no subscriptions
On-device AI-powered knowledge management and connection

A shared playbook for trustworthy third party evaluations

2026-05-29

OpenAI shares guidance on third-party AI evaluations, covering how to assess model capabilities, safeguards, and validity for frontier systems.

OpenAI publishes framework for third-party evaluations.
Focus on capabilities, safeguards, and validity.

To Gen or Not to Gen: The Ethical Use of Generative AI

2026-05-28

This article by Johannes Link and Jakob Schnell explores the ethical dimensions of generative AI (GenAI), focusing on large language models. It highlights both promises and harms, including ecological impact, misinformation, threats to education and democracy, and digital colonialism. The authors argue for a balanced, informed approach that weighs benefits against risks, often requiring trade-offs.

GenAI has significant downsides: massive energy use, e-waste, misinformation, and IP issues.
LLMs lack true reasoning and are prone to hallucinations; they cannot distinguish truth from falsehood.

AI is changing how we think, not replacing it | Letters

2026-05-28

Richard Thackeray and Phil Snell respond to an article by Wendy Liu on using artificial intelligence, arguing that AI enhances curiosity rather than diminishing it.

Wendy Liu raises concerns about labour redundancies, hype, and environmental cost of AI.
Richard Thackeray, a heavy AI user, finds AI makes him more curious and enables exploration of new territory.

How to force Google AI Overviews to prioritize your favorite news sources

2026-05-28

Google's Preferred Sources feature is now available in AI Overviews and AI Mode, allowing you to add your favorite sites to appear more prominently in AI-powered searches, along with new carousel and 'Highly Cited' badges.

Google's Preferred Sources feature now works with AI Overviews and AI Mode.
You can add favorite news sites to make them more prominent in AI search results.

Models

Claude Opus 4.8: A Smarter Model in the Right Direction

2026-05-29

Anthropic's Claude Opus 4.8 prioritizes reliability, honesty, and agentic workflows over raw intelligence. Pricing remains unchanged, but fast mode is significantly cheaper.

Claude Opus 4.8 focuses on reliability and uncertainty handling rather than raw intelligence.
Standard pricing remains at $5/$25 per million tokens; fast mode is three times cheaper.

New review paper argues code is how AI agents think and act, not just what they produce

2026-05-29

A new review paper argues that the real bottleneck for autonomous AI agents is the software layer around the language model—tools, memory, testing, and permissions. DeepSeek is building a dedicated 'Harness' team in Beijing, confirming the formula: model + harness = AI agent.

The paper claims the bottleneck for AI agents is the software harness, not the model.
Key components include tools, memory, testing, and permission boundaries.

How Braintrust turns customer requests into code with Codex

2026-05-29

How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.

Braintrust uses Codex to generate code from customer requests
Integrates GPT-5.5 for faster experimentation

Open Source Ecosystems

2026-05-29

The article discusses the limitations of open-weight AI models and open protocols as open source strategies, using Anthropic's acquisition of Stainless as a case study to illustrate complement capture and moat migration in AI infrastructure. It argues that the developer experience layer is being consolidated by platform giants, creating new competitive advantages, and emphasizes the need to analyze dependencies within the ecosystem to identify potential chokepoints.

Open-weight models as open source strategy face limitations due to hardware requirements and monolithic architectures.
Anthropic's acquisition of Stainless exemplifies complement capture, where the layer around an open protocol is privatized.

Anthropic releases Claude Opus 4.8

2026-05-29

Anthropic has released Claude Opus 4.8, an upgrade to Opus 4.7 with improvements in coding, agent work, reasoning, and knowledge work. New features include effort control, dynamic workflows, and live Messages API updates. Pricing remains unchanged at $5/$25 per million tokens for standard and $10/$50 for fast mode (2.5x speed). Early testers report cost parity with GPT-5.5 and fewer tool steps. The company also outlined its roadmap including Mythos-class models and Project Glasswing for cybersecurity.

Claude Opus 4.8 improves on Opus 4.7 in coding, agent work, reasoning, and knowledge work.
New features: effort control, dynamic workflows, and live Messages API updates.
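
To make the pricing concrete, the sketch below estimates the cost of one request from the per-million-token prices quoted in the summary. It assumes the paired figures follow the usual input/output convention, which the summary does not state explicitly; the function name and token counts are illustrative.

```python
# Prices in USD per million tokens, as quoted in the summary above.
# Assumption: the first figure is the input price and the second the output price.
PRICES = {
    "standard": (5.0, 25.0),
    "fast": (10.0, 50.0),
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimated cost in USD for one request in the given mode."""
    input_price, output_price = PRICES[mode]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical agent run: 200k tokens read, 20k tokens generated.
print(request_cost(200_000, 20_000))               # 1.5
print(request_cost(200_000, 20_000, mode="fast"))  # 3.0
```

Under these figures, fast mode costs twice as much per token as standard mode.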

Image Empire – a new short film from Alan Warburton

2026-05-29

Image Empire is an animated fairytale about the fusion of the real and the virtual within contemporary AI models. The film forms part of a research project undertaken by Alan Warburton which also includes a research paper and a series of satellite events.

The film is based on doctoral research at Birkbeck's Vasari Centre for Art & Technology.
Commissioned by the National Videogame Museum in collaboration with ODI and Cambridge's Leverhulme Centre for the Future of Intelligence.

Opus 4.8 Killer: NexusCortex Isn't an LLM – It's a Sparse AI Cortex Built in Go

2026-05-29

NexusCortex is a sparse AI cortex system built in Go, distinct from traditional LLMs. It leverages sparse computation for efficient inference, potentially rivaling Opus 4.8.

NexusCortex is a sparse AI cortex, not an LLM
Built in Go for performance

Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights

2026-05-29

Hexo Labs released SIA, an open-source self-improving loop, under an MIT license. A Feedback-Agent reads each run's trajectory, then either rewrites the scaffold or triggers a LoRA weight update on gpt-oss-120b. Combining both levers beat scaffold-only iteration on LawBench, TriMul GPU kernels, and scRNA-seq denoising.

SIA is the first self-improving loop that edits both an agent's scaffold and its model weights.
On LawBench, adding weight updates to harness edits boosted accuracy from 50.0% (harness-only) to 70.1%.

Phase-Conditioned Imitation Learning with Autonomous Failure Recovery for Robust Deformable Object Manipulation

2026-05-29

This paper presents a phase-conditioned, force-aware framework for robust deformable object manipulation. Using FiLM-conditioned ACT encoder and multi-modal phase predictor, the system autonomously detects and recovers from contact failures, improving T-shirt hanging success rate from 56% to 87%.

Standard imitation learning (e.g., ACT) suffers from state aliasing due to Markovian assumption, preventing autonomous failure recovery.
The proposed framework uses FiLM-conditioned encoder to enable phase-specific behaviors in a single policy.
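
FiLM (feature-wise linear modulation) scales and shifts each feature channel with parameters that depend on a conditioning signal, here the task phase. The sketch below shows only that core operation; the phase names and the fixed lookup table are hypothetical stand-ins for what would normally be produced by a learned network, and nothing here reproduces the paper's encoder.

```python
def film(features: list[float], gamma: list[float], beta: list[float]) -> list[float]:
    """Apply FiLM: out[i] = gamma[i] * features[i] + beta[i] for each channel."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma, and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical per-phase (gamma, beta) pairs; in practice these are learned.
PHASE_PARAMS = {
    "approach": ([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]),
    "recover": ([0.5, 2.0, 1.0], [0.0, 0.0, 1.0]),
}

features = [2.0, 3.0, 4.0]  # the same observation encoding in both phases

for phase, (gamma, beta) in PHASE_PARAMS.items():
    print(phase, film(features, gamma, beta))
# approach [2.0, 3.0, 4.0]
# recover [1.0, 6.0, 5.0]
```

Because the same input maps to different modulated features in each phase, a single policy can behave differently in otherwise identical-looking states, which is how phase conditioning addresses the state aliasing noted above.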

Decentralized LLM-Driven Coordination of Acoustic Robots for Contactless Object Manipulation

2026-05-29

This paper presents a decentralized framework that combines large language models (LLMs) with acoustic mobile robots for contactless object manipulation. Using Whisper speech recognition, LLM semantic parsing, and JSON task scheduling, the system converts spoken commands into coordinated multi-robot actions. Experiments with two TurtleBot3-based acoustic robots achieved success rates of 96% for sequential, 86% for parallel, and 70% for synchronized tasks, showcasing the potential of LLM-driven automation for human-robot interaction.

A decentralized framework integrates LLMs with acoustic robots for contactless object manipulation via natural language commands.
The system uses Whisper, LLM parsing, JSON-based task representation, and distributed scheduling to handle sequential, parallel, and synchronized tasks.
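
One way to picture a JSON task representation that covers sequential and parallel steps is a list of tasks with explicit dependencies. The schema, field names, and actions below are hypothetical, since the summary does not give the paper's format, and the layering function is a simple centralized illustration, not the paper's distributed scheduler.

```python
import json

# Hypothetical plan an LLM parser might emit for "both robots lift the object".
PLAN_JSON = """
[
  {"id": "t1", "robot": "robot_a", "action": "move_to_object", "after": []},
  {"id": "t2", "robot": "robot_b", "action": "move_to_object", "after": []},
  {"id": "t3", "robot": "robot_a", "action": "lift", "after": ["t1", "t2"]},
  {"id": "t4", "robot": "robot_b", "action": "lift", "after": ["t1", "t2"]}
]
"""


def execution_waves(tasks: list[dict]) -> list[list[str]]:
    """Group task ids into waves; tasks in one wave have all dependencies satisfied."""
    done: set[str] = set()
    remaining = {task["id"]: task for task in tasks}
    waves = []
    while remaining:
        ready = sorted(tid for tid, t in remaining.items() if set(t["after"]) <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        waves.append(ready)
        done.update(ready)
        for tid in ready:
            del remaining[tid]
    return waves


waves = execution_waves(json.loads(PLAN_JSON))
print(waves)  # [['t1', 't2'], ['t3', 't4']]
```

Tasks within a wave can run in parallel, and consecutive waves run sequentially. Synchronized tasks would additionally need the robots in a wave to start at the same moment, which this sketch does not model.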

Robust Cross-Domain Generalization Using Unlabeled Target Data with Source-Domain Supervision

2026-05-29

This paper proposes a target-informed self-supervised pretraining and model-ensemble strategy that leverages unlabeled target-domain data to improve cross-device generalization of medical imaging AI. Applied to pediatric wrist fracture assessment using point-of-care ultrasound, the method achieves over 6% Dice improvement on the target domain, demonstrating a label-efficient and privacy-preserving approach.

Combines masked image modeling and contrastive learning for self-supervised pretraining without target-domain labels.
Introduces a confidence-aware infusion head to adaptively integrate predictions from source and target branches.
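
The Dice score used to report the improvement measures overlap between a predicted segmentation mask and the ground truth: twice the intersection divided by the sum of the two mask sizes. The sketch below uses tiny toy masks, not data from the paper.

```python
def dice(pred: list[list[int]], target: list[list[int]]) -> float:
    """Dice coefficient for two binary masks of the same shape."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same shape")
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks: three foreground pixels each, two of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]

score = dice(pred, target)
print(round(score, 4))  # 0.6667
```

A score of 1.0 means the masks coincide exactly, and 0.0 means they do not overlap at all.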

Embodied3DBench: Benchmarking Low-Level Embodied Spatial Intelligence of Vision Language Models

2026-05-29

Embodied3DBench targets low-level spatial intelligence in embodied 3D environments, with 6 task categories and over 21k QA pairs. Evaluations of 13 models show strong high-level reasoning but weak interaction-oriented perception. A synthesized dataset of 1.3M QA pairs significantly improves performance after fine-tuning.

Benchmark focuses on low-level embodied spatial intelligence for VLMs
Includes spatial structural understanding and interaction-oriented perception

Trajectory Constraints for Imaging Inverse Problems

2026-05-29

This paper introduces TRACE, a training-free trajectory-constrained reconstruction framework that stabilizes the reconstruction path by coupling adjacent states, improving reconstruction quality for imaging inverse problems.

TRACE stabilizes reconstruction trajectories by coupling consecutive intermediate estimates.
It models the reconstruction as a sequence of proximal updates approximated by neural networks.

GAP3D: Generative Alignment of VLM Latents to Patch-Level Embeddings for 3D Generation

2026-05-29

GAP3D introduces a modular diffusion-based approach that aligns VLM-generated latents directly to the patch-level feature space of a pre-trained image encoder, enabling frozen generative models to use VLMs as prompt encoders while preserving spatial structure. It trains primarily on image-text pairs, avoids large-scale 3D data, and demonstrates zero-shot multimodal capabilities, though it currently prioritizes high-level semantics over fine-grained detail.

GAP3D uses diffusion to align VLM latents to image encoder patch-level features.
Avoids large-scale 3D data by training on general image-text pairs.

Resolving Endpoint Underfitting in Diffusion Bridges via Noise Alignment

2026-05-29

A new approach called Noise-Aligned Diffusion Bridge (NADB) addresses underfitting near the target endpoint in diffusion bridge models, improving image restoration and translation tasks.

Current diffusion bridge models suffer from endpoint underfitting due to noise mismatch.
NADB introduces a mean network and noise-aligned mapping to correct this.

Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

2026-05-29

A comprehensive evaluation of 14 open-source safety guard models on a benchmark of 79,331 samples reveals that Qwen Guard (4B parameters) achieves the highest recall (83.97%), while larger models like Llama Guard (12B) miss up to 75% of unsafe content. Model size does not correlate with safety performance, and general-purpose guard models outperform specialized ones.

Qwen Guard (4B parameters) achieves the highest recall (83.97%) among 14 open-source safety guard models.
Larger models like Llama Guard (12B) and GPT-OSS Safeguard (20B) exhibit conservative behavior, missing up to 75% of unsafe content.
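
For readers less familiar with the metric, the sketch below shows how recall relates to the "missed unsafe content" figures quoted above. The counts are hypothetical and chosen only to reproduce a 75% miss rate; they are not taken from the benchmark.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly unsafe samples that a guard model flags."""
    total_unsafe = true_positives + false_negatives
    if total_unsafe == 0:
        raise ValueError("recall is undefined without unsafe samples")
    return true_positives / total_unsafe


def miss_rate(recall_value: float) -> float:
    """Share of unsafe samples that slip through (1 - recall)."""
    return 1.0 - recall_value


# Hypothetical counts: 250 unsafe samples caught, 750 missed.
conservative_recall = recall(true_positives=250, false_negatives=750)
conservative_miss = miss_rate(conservative_recall)

# Miss rate implied by the 83.97% recall reported in the summary.
best_miss = miss_rate(0.8397)

print(f"conservative model: recall={conservative_recall:.2%}, missed={conservative_miss:.2%}")
print(f"best reported recall: missed={best_miss:.2%}")
```

Read this way, a model that misses 75% of unsafe content has a recall of 25%, while the top reported recall leaves roughly 16% unflagged.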

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

2026-05-29

Aryabhata 2 is a reasoning-focused language model for competitive STEM exams like JEE and NEET, fine-tuned via reinforcement learning on GPT-OSS-20B using PhysicsWallah's question banks. It achieves up to 64% fewer output tokens while outperforming the base model on multiple benchmarks.

Aryabhata 2 uses RL post-training optimized for competitive STEM exams.
Built on GPT-OSS-20B with custom training curriculum from PhysicsWallah.

Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

2026-05-29

Large language models suffer from hallucination in long-form generation. Existing retrieval-augmented models cannot ensure key information stays close to outputs. This paper proposes Micro-Macro Retrieval (M2R), a retrieve-while-generate framework that retrieves coarse-grained evidence externally and extracts key information from a reasoning-built repository, significantly reducing hallucination. It uses curriculum learning-based reinforcement learning for stable training.

LLMs are prone to hallucination in long-form generation due to redundant context and long reasoning chains
Factual accuracy increases when key information is closer to model outputs

RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

2026-05-29

This paper presents RightNow-Arabic-0.5B-Turbo, a 518M-parameter Arabic-specialized LLM built on Qwen2.5-0.5B using vocabulary injection and edge-first deployment. It achieves 35.9% mean accuracy on Arabic benchmarks, outperforming all same-class open models, and ties Falcon-H1-1.5B on COPA-ar at one-third the size. The quantized model is 398 MB and delivers 635 tokens/s on a single H100, enabling efficient edge deployment.

518M-parameter Arabic LLM built on Qwen2.5-0.5B with vocabulary injection of 27,032 Arabic tokens.
Achieves 35.9% mean accuracy on three Arabic benchmarks, surpassing all same-class open-source models.

From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale

2026-05-29

A new paper analyzes 17 LLMs (410M-100B+ parameters) and documents that instruction-tuned systems systematically collapse language entropy along discourse and structural dimensions (mean amplification: 1,949-16,853%, peaks: 5,181-209,675%), while suppressing complex punctuation to 3.2-23.2% of baseline. These effects do not worsen under RLHF. Weak intervention (lambda=1.0) exacerbates collapse by 240%, while strong control (lambda=5.0) achieves 40.5% improvement and outperforms frontier models by 96.7-98.2% despite 200-1000x scale disadvantage. Strong control also delivers 15% higher distinct-4, 27% higher vocabulary diversity, and 78% lower repetition than moderate regularization. The findings underscore that alignment requires sufficient control strength, not merely distributional smoothing.

Instruction tuning causes language entropy collapse along discourse and structural dimensions, with significant suppression of complex punctuation.
RLHF does not worsen stylistic collapse, but weak regularization exacerbates it.

What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

2026-05-29

As large language models (LLMs) grow in influence, understanding their decision-making becomes crucial. This paper introduces a method to detect concepts within LLM embeddings using low-cost linear probes, enabling monitoring of what models "think" during normal operation. The authors demonstrate concept delineation, probe training, and cross-context tracking across four concepts and three LLMs, paving the way for scalable model transparency.

Proposes linear probes to detect concepts in LLM embeddings for low-cost internal monitoring.
Details dataset creation, probe training/testing, and tracking across larger contexts.
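
A linear probe is, in essence, a linear classifier trained on frozen representations. The sketch below is a minimal illustration on synthetic 2-D vectors standing in for LLM embeddings; the data generator, hyperparameters, and function names are assumptions made for this example and are not the paper's setup.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_embeddings(n_per_class, rng):
    """Synthetic stand-ins for embeddings: two Gaussian clusters."""
    data = []
    for label, center in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            vec = [rng.gauss(center, 0.5), rng.gauss(center, 0.5)]
            data.append((vec, label))
    return data


def score(weights, bias, vec):
    return sum(w * x for w, x in zip(weights, vec)) + bias


def train_probe(data, epochs=200, lr=0.1):
    """Logistic regression fitted with full-batch gradient descent."""
    dim = len(data[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for vec, label in data:
            err = sigmoid(score(weights, bias, vec)) - label
            for i, x in enumerate(vec):
                grad_w[i] += err * x
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def probe_accuracy(data, weights, bias):
    correct = sum((score(weights, bias, vec) > 0) == bool(label) for vec, label in data)
    return correct / len(data)


rng = random.Random(0)
train_set = make_toy_embeddings(100, rng)
test_set = make_toy_embeddings(100, rng)  # held-out vectors for evaluation
weights, bias = train_probe(train_set)
accuracy = probe_accuracy(test_set, weights, bias)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In practice the input vectors would be hidden states extracted from the model, and the probe's low cost comes from training only these few weights.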

Balancing Multimodal Learning through Label Space Reshaping

2026-05-29

Multimodal learning often suffers from modality imbalance, where faster-converging modalities dominate optimization. Existing methods typically strengthen weak modalities or adjust gradients, but may compromise strong modalities. This paper proposes Balanced Multimodal Label Reshaping (BMLR), the first label-side approach to promote balance. BMLR reshapes the cross-modal label space to equalize mapping difficulty across modalities, enhancing interaction and injecting rich inter-class information. Extensive experiments show consistent improvement and compatibility.

Modality imbalance arises from differences in mapping difficulty from feature spaces to the shared label space.
BMLR is the first method to address multimodal balance from the label side.

TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models

2026-05-29

Metagenomic taxonomic annotation identifies microbial origins of DNA fragments. Traditional similarity-based methods struggle with high diversity and incomplete databases. TaxDistill uses a knowledge distillation framework with a 500M-parameter genomic foundation model (GenomeOcean) as teacher to generate soft labels, reducing label noise. Experiments on seven CAMI2 datasets show TaxDistill outperforms baselines, e.g., improving F1 score on Gastrointestinal dataset from 0.763 to 0.941.

TaxDistill reduces label noise in metagenomic classification via knowledge distillation
Introduces GenomeOcean, a 500M-parameter genomic foundation model as teacher

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

2026-05-29

This paper proposes COM, a strategy that integrates geometric constraints into token initialization and training to preserve the inherent continuity and ordinality of time series tokens, consistently improving the performance of token-based time series LLMs on multiple benchmarks.

Token-based time series LLMs overlook continuity and ordinality, limiting performance.
COM applies geometric constraints during initialization and training to preserve these properties.

Molecular Lead Optimization via Agentic Tool Planning

2026-05-29

TRACE is a trajectory-aware LLM-reasoning agent for molecular lead optimization that treats tool selection as a sequential decision-making problem. This enables forward-looking structural refinement under constraints, and the agent achieves higher success rates and property improvements on ADMET tasks.

TRACE formulates tool selection as sequential decision making over action trajectories.
It enables trajectory-aware decisions to improve ADMET properties while preserving molecular similarity.

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

2026-05-29

Recent work shows RL retains prior capabilities more effectively than SFT. This paper extends that finding to the mechanistic level, introducing differential circuit vulnerability to measure circuit degradation. On Qwen2.5-3B-Instruct for scientific QA, SFT adapts faster but causes greater circuit disruption and forgetting, while RL preserves circuits at the cost of slower adaptation. Results suggest circuit preservation explains RL's robustness against catastrophic forgetting.

SFT adapts quickly but disrupts internal circuits, leading to catastrophic forgetting.
RL preserves more of the base model's circuits, resulting in less forgetting but slower task adaptation.

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

2026-05-29

This paper studies behavioral alignment and representation dynamics of LLM agents in financial environments using TradeArena. It identifies measurable pre-failure signatures like planning embedding drift and effective-rank contraction. Structured risk feedback can serve as an external alignment signal but is not a universal performance enhancer. A 51-stock experiment reveals a correlation blind spot where LLM rationales justify concentrated exposure to coupled assets.

LLM agents exhibit measurable pre-failure signatures including planning embedding drift and effective-rank contraction.
Structured risk feedback acts as an external alignment signal but varies in effectiveness across models.

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

2026-05-29

VFEAgent is an end-to-end multi-agent system that automates finite element analysis (FEA) modeling and simulation directly from input images and problem descriptions. It combines a multimodal vision-language multi-agent pipeline with a verification-first code synthesis framework, using ReAct-driven reasoning to extract structured FEA specifications and incorporating self-debugging and fallback mechanisms for executability and physical validity. Experiments show high success rates in generating complete, physically valid simulations, outperforming LLM-based baselines in reliability and correctness, and promising to free engineers from tedious manual analysis.

VFEAgent automates FEA modeling and simulation from images and problem descriptions.
Employs a multimodal vision-language multi-agent pipeline with ReAct-driven reasoning.

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

2026-05-29

A new study uses five frontier LLMs from Anthropic and OpenAI as 'agentic curators' in a self-contained workspace to automate phenotype annotation. The agents achieved consistency within the range of human curators and substantially outperformed traditional NLP tools, addressing the scalability bottleneck in ontology curation.

Phenotype annotation relies on human experts, which is labor-intensive and hard to scale.
The study deployed five frontier LLMs as agentic curators in a self-contained workspace.

Orthogonal Concept Erasure for Diffusion Models

2026-05-29

This paper introduces Orthogonal Concept Erasure (OCE), which uses multiplicative parameter updates for precise concept removal while preserving generative capacity, supporting multi-concept erasure with high speed.

Existing editing-based methods rely on additive updates that interfere with generative capacity.
OCE uses orthogonal transformations as multiplicative updates, preserving neuron direction and angular geometry.
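
The general property behind this claim is that an orthogonal matrix preserves vector norms and the angles between vectors, whereas an additive shift generally does not. The sketch below checks that property with a 2-D rotation; the vectors and rotation angle are arbitrary choices for illustration, and this is not the OCE update itself.

```python
import math


def rotation(theta):
    """2-D rotation matrix, a simple example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def angle(u, v):
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


u, v = [3.0, 4.0], [1.0, 0.0]
q = rotation(0.7)

# Multiplicative (orthogonal) update: lengths and relative angle are unchanged.
qu, qv = matvec(q, u), matvec(q, v)

# Additive update on one vector: length and relative angle both change.
shifted_u = [u[0] + 1.0, u[1]]

print(norm(u), norm(qu), norm(shifted_u))
print(angle(u, v), angle(qu, qv), angle(shifted_u, v))
```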

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

2026-05-29

This paper empirically evaluates LLM-generated reviews for scientific papers, finding limited alignment with human reviews that varies significantly across prompts and models. It also shows that authors can game the system by iteratively revising papers based on LLM feedback, achieving statistically significant score increases for up to 35% of papers.

LLM reviews show limited alignment with human reviews
Alignment quality varies substantially across different prompts and models

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

2026-05-29

The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments GPT-2 Small with cognitive and category-theoretic components, achieving 21.27 perplexity on WikiText-103, a 2.92 (12%) reduction over a fine-tuned baseline. Ablations attribute 84% of the improvement to GT-Full simplicial message passing. The study also identifies a structure/consistency distinction among categorical priors.

CCT achieves 21.27 perplexity on WikiText-103, 2.92 lower than GPT-2 Small baseline.
Ablation studies attribute 84% of the gain to GT-Full simplicial message passing.
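
The reported figures can be cross-checked with a few lines of arithmetic. The sketch assumes the 12% is measured relative to the baseline perplexity and that the 84% attribution applies to the full 2.92-point gain; both readings are inferred from the summary rather than stated in it.

```python
cct_perplexity = 21.27   # reported CCT perplexity on WikiText-103
reduction = 2.92         # reported absolute reduction

# Baseline perplexity implied by the two reported numbers.
baseline_perplexity = cct_perplexity + reduction

# Relative reduction, assuming it is measured against the baseline.
relative_reduction = reduction / baseline_perplexity

# Perplexity points attributed to GT-Full message passing (84% of the gain).
gt_full_share = 0.84 * reduction

print(f"implied baseline: {baseline_perplexity:.2f}")
print(f"relative reduction: {relative_reduction:.1%}")
print(f"points attributed to GT-Full: {gt_full_share:.2f}")
```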

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

2026-05-29

This paper proposes behavior-aware auxiliary corrections to stabilize off-policy temporal-difference learning. By replacing the auxiliary covariance matrix with the behavior Bellman matrix, the authors introduce BA-TDC and BA-TDRC algorithms. Theoretical analysis proves fixed-point preservation and almost-sure convergence. Experiments on standard benchmarks show that the behavior-aware replacement improves performance, but regularization is needed for robust results.

Behavior-aware auxiliary corrections improve stability of off-policy TD learning.
BA-TDC and BA-TDRC replace the auxiliary covariance matrix with the behavior Bellman matrix.

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

2026-05-29

This paper proposes STHTD-MP, a behavior-induced Mirror-Prox temporal-difference method that replaces the covariance metric with the symmetric part of the behavior-policy Bellman matrix to improve off-policy prediction speed. Theoretical convergence analysis and numerical experiments on several benchmarks show improved performance over GTD2-MP.

STHTD-MP uses behavior-policy transition information to construct a more informative update geometry.
Rigorous convergence analysis is provided for fixed-policy linear prediction.
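
For context, the symmetric part of a square matrix A is (A + A^T) / 2, and it yields the same quadratic form x^T A x as A itself. The sketch below illustrates only that identity on a small hypothetical matrix; it does not implement STHTD-MP.

```python
def symmetric_part(a):
    """Return (A + A^T) / 2 for a square matrix given as nested lists."""
    n = len(a)
    return [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]


def quadratic_form(a, x):
    """Compute x^T A x."""
    return sum(x[i] * a[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


# Hypothetical non-symmetric matrix, chosen only for illustration.
A = [[2.0, 1.0], [-3.0, 4.0]]
S = symmetric_part(A)
x = [1.0, 2.0]

print(S)
print(quadratic_form(A, x), quadratic_form(S, x))
```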

Strengthening societal resilience with Rosalind Biodefense

2026-05-29

OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI.

OpenAI launches Rosalind Biodefense initiative
Expands trusted access to GPT-Rosalind model

Anthropic's run-rate revenue hits $47 billion

2026-05-29

Anthropic announced in its $65 billion Series H funding that its annualized run-rate revenue crossed $47 billion in early May 2026, up from $30 billion in April and $14 billion in February. The rapid growth has drawn comparisons to unprecedented organic revenue scaling, though some skeptics question the numbers. Anecdotal evidence of a client spending $500 million in a single month on Claude licenses adds context.

Anthropic's run-rate revenue reached $47 billion as of early May 2026.
Revenue grew from $9 billion (end of 2025) to $14 billion (Feb), $30 billion (Apr), and $47 billion (May).

Claude Opus 4.8: "a modest but tangible improvement"

2026-05-28

Anthropic released Claude Opus 4.8, described as a modest but tangible improvement over its predecessor. Key highlights include enhanced honesty (reduced unsupported claims, four times less likely to overlook code flaws), and new features like mid-conversation system messages. Pricing remains unchanged, but fast mode costs are significantly reduced.

Anthropic launches Claude Opus 4.8, honestly calling it a 'modest but tangible improvement'.
Honesty improved: model is less prone to unsupported claims and four times less likely to miss code flaws.

Claude 4.8 Arrives: Surpasses Mythos in Some Areas, Supports Hundreds of Parallel Sub-Agents

2026-05-28

Anthropic released Claude Opus 4.8, showing improvements in terminal engineering and knowledge work, outperforming Mythos in certain benchmarks. The model features enhanced honesty and a new Dynamic Workflows capability that orchestrates hundreds of parallel sub-agents. Early testers report significant gains in code quality and task reliability.

Claude Opus 4.8 was released just 43 days after 4.7, with notable gains in coding and knowledge tasks
Dynamic Workflows: Claude generates JavaScript orchestration scripts to coordinate hundreds of parallel sub-agents

llm-anthropic 0.25.1

2026-05-28

Release of llm-anthropic 0.25.1 adds support for Claude Opus 4.8, fast mode option for eligible accounts, and changes default max_tokens to each model's maximum output.

New model: Claude Opus 4.8 (claude-opus-4.8).
New -o fast 1 option for fast mode (for organizations with the feature enabled).

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents

2026-05-28

Anthropic launches Claude Opus 4.8 with two Claude Code updates: dynamic workflows that coordinate up to 1,000 subagents per run, and a cheaper fast mode that speeds up output 2.5x. Both are in research preview.

Dynamic workflows let Claude write orchestration scripts for parallel subagents, with up to 16 concurrent and 1,000 total per run.
Fast mode delivers 2.5x faster output for Opus 4.8 at one-third the cost, requiring usage credits.
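
The "16 concurrent, 1,000 total" limit is a bounded-concurrency pattern. The sketch below shows one common way to express it with a semaphore; the task body is a placeholder, and none of this reflects how Anthropic's orchestration scripts are actually written.

```python
import asyncio

MAX_CONCURRENT = 16   # concurrency limit quoted above
TOTAL_TASKS = 1000    # per-run cap quoted above


async def run_subtask(task_id, gate, stats):
    """Placeholder subtask; real work would replace the sleep."""
    async with gate:  # at most MAX_CONCURRENT tasks pass this point at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0)  # yield control, standing in for real work
        stats["running"] -= 1
        return task_id


async def run_all():
    gate = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_subtask(i, gate, stats) for i in range(TOTAL_TASKS))
    )
    return results, stats["peak"]


results, peak = asyncio.run(run_all())
print(f"completed {len(results)} tasks, peak concurrency {peak}")
```

All tasks are scheduled up front, but the semaphore keeps the number running at any moment within the limit.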

Training Azerbaijani language models on Amazon SageMaker AI

2026-05-28

Azercell Telecom collaborated with the AWS Generative AI Innovation Center to build an Azerbaijani LLM on Amazon SageMaker AI, achieving 23% higher training throughput, 58% lower peak GPU memory, and 2× token efficiency via custom tokenizer, FSDP, and Liger Kernel optimizations.

Azercell developed a production-ready Azerbaijani LLM framework using Amazon SageMaker AI.
Custom tokenizer reduced tokens per word from 3.22 to 1.59, doubling encoding efficiency.
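
The "doubling" claim follows directly from the two tokens-per-word figures. The sketch below works through the arithmetic; the 1,000-word document is a hypothetical example.

```python
baseline_tokens_per_word = 3.22  # reported figure before the custom tokenizer
custom_tokens_per_word = 1.59    # reported figure after the custom tokenizer

# How many times fewer tokens the custom tokenizer needs for the same text.
efficiency_gain = baseline_tokens_per_word / custom_tokens_per_word

# Token counts for a hypothetical 1,000-word document.
words = 1000
baseline_tokens = round(words * baseline_tokens_per_word)
custom_tokens = round(words * custom_tokens_per_word)

print(f"efficiency gain: {efficiency_gain:.2f}x")
print(f"{words} words: {baseline_tokens} tokens before, {custom_tokens} after")
```

Fewer tokens per word means the same context window and training budget cover roughly twice as much Azerbaijani text.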

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

2026-05-28

Anthropic releases Claude Opus 4.8, which beats GPT-5.5 and Gemini 3.1 Pro in most benchmarks. The model also catches its own coding errors four times more often than its predecessor. Alongside the launch, Anthropic is rolling out dynamic workflows that can spin up hundreds of parallel sub-agents to handle tasks like codebase-wide migrations.

Claude Opus 4.8 outperforms GPT-5.5 and Gemini 3.1 Pro in most benchmarks.
The model catches its own coding errors four times more often than its predecessor.

AI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos Preview

2026-05-28

Not every new model is all it's cracked up to be. Our tracker keeps each release in context with its peers, so you know which models are worth your time. This article summarizes major model releases of 2026 so far, including Claude Opus 4.8, GPT-5.5 Instant, Nemotron 3 Nano Omni, GPT-5.5, ChatGPT Images 2, Claude Opus 4.7, Claude Mythos (Preview), GPT-5.4, Claude Opus 4.6, and GPT-5.3-Codex, with details on their features and significance.

Anthropic's Opus 4.8 offers faster thinking at lower cost, claims lower misalignment rates than Opus 4.7, comparable to Mythos Preview.
OpenAI's GPT-5.5 Instant reduces hallucinations by 52.5%, becomes default ChatGPT model, helping reduce misinformation spread.

Using Claude Code with GPT 5.5, Gemini 3.5, Grok 4.3, and other models

2026-05-28

Claude Code now supports one-click model switching, BYOK, and compatibility with Anthropic and OpenAI APIs. Get started at $5/mo to route around outages and rate limits.

One-click model switching.
Bring your own key (BYOK).

Mistral AI, Digital Realty Partner to Scale European AI Infrastructure

2026-05-28

The French startup has secured 10 megawatts of compute at Digital Realty's Paris South campus.

Mistral AI secured 10 MW compute capacity at Digital Realty's Paris South campus
Partnership aims to scale European AI infrastructure

Claude Opus 4.8 is here: effort controls, dynamic workflows, cheaper fast mode, better honesty, less deception

2026-05-28

Anthropic released Opus 4.8 with user-controllable effort, dynamic workflows for large-scale coding, and fast mode at one-third the previous cost. Benchmarks show it leads GPT-5.5 and Gemini 3.1 Pro except in terminal coding. The release also brings improvements in honesty and autonomy support, along with reduced deception.

Users can now control Claude's "effort" level to balance response quality and speed.
Dynamic workflows (research preview) allow Claude to plan and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations.

Claude Opus 4.8 is now available on AWS

2026-05-28

Anthropic's most advanced Opus model, Claude Opus 4.8, is now available on Amazon Bedrock and the Claude Platform on AWS. It delivers improvements in coding, agentic tasks, and professional work with greater consistency and autonomy for long-running production workflows.

Claude Opus 4.8 is Anthropic's most advanced Opus model, now available on AWS.
It offers enhanced performance in coding, multi-stage autonomous tasks, and professional work with lower output variance.

Claude’s new model is more ‘honest’ when it messes up

2026-05-28

Anthropic is releasing Claude Opus 4.8 on Thursday, touting the model's 'honesty.' Early testers found it more likely to flag uncertainties and less likely to make unsupported claims. Evaluations show it is about 4x less likely than its predecessor to allow code flaws to pass unremarked. Users can also direct the amount of effort Claude puts into a task, and a 'dynamic workflows' feature allows parallel subagents.

Claude Opus 4.8 is more inclined to flag uncertainties and avoid unsupported claims.
It is about 4x less likely than its predecessor to overlook code flaws.

Introducing Claude Opus 4.8

2026-05-28

Anthropic released Claude Opus 4.8, the latest upgrade to its flagship model. It improves on Opus 4.7 across benchmarks, with notable gains in honesty and agentic capabilities. New features include effort control, dynamic workflows in Claude Code, and API improvements. Pricing remains unchanged, while fast mode is now three times cheaper. The company also previews upcoming higher-intelligence models.

Claude Opus 4.8 outperforms Opus 4.7 on multiple benchmarks, especially in honesty and agentic tasks
New features: effort control for users, dynamic workflows in Claude Code, and system entries in Messages API

Tools

Show HN: Got laid off, built a mural with no dev experience, hit 200k Reddit views

2026-05-29

After being laid off earlier this year, the author built One Tile in one night using AI tools and the no-code platform Base44, with zero development experience. The project garnered 200k views on Reddit.

Built One Tile in one night after getting laid off.
Used AI tools and no-code platform Base44, no dev experience.

Jony Ive’s funky Ferrari

2026-05-29

Ferrari's first electric car, the Luce, designed with Jony Ive, has a divisive look and is packed with new tech. This Vergecast episode discusses its design, market impact, and the growing public distaste for AI.

Ferrari's first EV, the Luce, features unconventional design by Jony Ive.
The Vergecast debates the Luce's design, technology, and electric vehicle demand.

Boston Children’s uses AI to unlock new diagnoses

2026-05-29

Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases.

Boston Children’s Hospital employs OpenAI technology to aid rare disease diagnosis
AI reduces operational burden on healthcare staff

Understand AI generated code Fast

2026-05-29

ArchToCode is a tool that uses AI to generate Mermaid diagrams from code, with GitHub integration.

ArchToCode converts code into Mermaid diagrams
Integrates with GitHub

Why I’m grateful to the Pope for his encyclical on AI | Francine Prose

2026-05-29

The intelligent and thoughtful encyclical is an important warning of the uses and misuses of a rapidly developing technology. Silicon Valley is wrong to dismiss it.

Pope Leo XIV issued encyclical 'Magnifica Humanitas' on AI.
Encyclical warns about uses and misuses of AI.

Amazon kills internal AI leaderboard after employees gamed it with pointless tasks

2026-05-29

Amazon is removing an internal AI ranking system after employees inflated their scores through meaningless AI usage, driving up cloud costs.

Amazon shut down internal AI leaderboard due to employee score inflation.
Employees used AI for trivial tasks like summarizing emails and generating irrelevant images.

Funny but serious, Chieng issues an AI warning to grads

2026-05-29

Comedian Chieng delivered a humorous yet earnest warning about AI to Harvard graduates during the 375th Commencement ceremony.

Chieng addressed AI risks in a comedic manner.
Harvard's 375th Commencement featured the warning.

Drafted: Design a Home Instantly with AI

2026-05-29

Drafted is an AI tool that allows users to instantly design home spaces.

AI-powered home design tool
Instant generation of design renders

StudySong – paste anything you need to memorize, get a full AI-generated song

2026-05-29

StudySong is an AI tool that transforms study notes or any text into a complete song, with PDF upload support and local-only processing for privacy.

Converts text or PDFs into AI-generated songs
Supports multiple music genres

Rage-Inducing Problems in Tech

2026-05-28

Inspired by Pope Leo XIV's encyclical on AI, this article catalogs 40 frustrating tech problems, from one-time passcodes that never arrive to touchscreens in cars. A humorous critique of tech companies putting profit before people.

The article uses the pope's encyclical to frame a list of 40 tech annoyances.
Common frustrations include broken passcodes, QR code parking apps, and useless chatbots.

Show HN: Pubflow, Backend trust layer for build faster AI based apps

2026-05-28

Pubflow introduces a unified system that integrates authentication, backend logic, and infrastructure, eliminating the need for glue code when building AI-powered applications. It offers multi-database support, multiple language compatibility, and production-ready starter kits.

Pubflow provides a unified trust layer for AI app development.
It combines authentication (Flowless), backend (Flowfull), and infrastructure (Pubflow Cloud).

Microsoft 365 Copilot gets a speed boost and cleaner design

2026-05-28

Microsoft is launching a revamped version of Microsoft 365 Copilot with a cleaner design that loads twice as fast. The update introduces progressive disclosure and improved formatting options.

Redesigned Copilot loads twice as fast and provides more reliable, structured responses
Progressive disclosure feature shows tools and controls based on user prompts

Built a chat-first AI personal operator in 48h – need 5 honest beta testers

2026-05-28

OperatorOS is a private AI personal operator designed to manage tasks via chat. The developer is seeking 5 honest beta testers.

OperatorOS is a chat-first AI personal operator
Built in just 48 hours

Meeting the pope’s call to put humanity first in a world of artificial intelligence | Letter

2026-05-28

Dr Susan Oman writes about a campaign designed to raise public awareness of AI, arguing that while governments, faith leaders, and tech bosses debate AI's future, the public is consistently left out. She cites evidence showing public concern about AI has risen by 10% in two years, and that 91% believe fairness should be prioritized over economic gain.

Public consistently excluded from AI debates despite being most affected
Public concern about AI rose by 10% in two years

Image of Thai police in sparkly dresses with handcuffed suspect turns out to be AI fake

2026-05-28

The picture was created by the administrator in charge of the station's Facebook account, who wanted to create a 'friendlier image'.

An AI-generated image of Thai police in festive dresses with a suspect was widely shared in global media.
The image was created by the police station's Facebook account administrator to promote a friendlier image.

Chips

A Stock Certificate from 1941 Taught Me More About AI Than Anyone from OpenAI

2026-05-29

The article draws parallels between the 19th-century railroad boom and today's AI investment frenzy, highlighting massive capital expenditure, financial innovation, and historical precedents for bubbles and crashes. It argues that AI's financial infrastructure may be as transformative—and as risky—as railroads were.

Railroad investment in the 1850s reached 3-5% of GDP, similar to today's AI capex from five tech giants.
The bond market was created to finance railroads, just as AI is reshaping capital markets.

Orbital Compute

2026-05-29

This article analyzes the feasibility of AI data centers in space, covering physical advantages (continuous sunlight, passive cooling, laser links) and engineering constraints (thermal dissipation, radiation hardening, training synchronization, maintenance). The key assumption is Starship launch costs. Several startups, Google, and SpaceX have announced pilot programs. Near-term investment impact is modest but worth monitoring.

Orbital AI data centers leverage LEO's continuous solar power, passive radiative cooling, and vacuum-speed laser links for potential advantages over terrestrial datacenters
Engineering challenges include thermal dissipation (high-density clusters require impractically large radiators), radiation hardening (commercial chips' orbital longevity unknown), and training synchronization latency

Sam Altman Says AI 'Jobs Apocalypse' He Once Predicted Probably Won't Happen

2026-05-29

OpenAI CEO Sam Altman has reversed his earlier predictions that AI would lead to massive job losses, now saying a 'jobs apocalypse' likely won't occur. He acknowledged his intuitions were off, citing the irreplaceable value of human interaction in the workplace. While other industry leaders still warn of disruption, Altman's remarks reflect considerations of AI costs, adoption pace, and public opinion.

Altman previously predicted AI would replace most jobs, but now says he was 'delighted to be wrong' and does not foresee a jobs apocalypse.
He explained that the human element of work—social interaction—cannot be replaced by AI, updating his view on the jobs landscape.

You're Not Going to Lose Your Job to AI

2026-05-29

The article draws parallels between historical technological cycles (e.g., Einstein's miracle year, the electric revolution) and the current AI boom, arguing that foundational breakthroughs are followed by long application phases. During these phases, some jobs disappear but many new ones emerge. AI is in its theoretical breakthrough phase, and the subsequent application era will create more opportunities than it destroys.

Historical patterns show that revolutionary theory is followed by decades of application, which eliminates some jobs but creates many new ones.
AI today is akin to Einstein's miracle year in 1905; the application age is yet to come.

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication

2026-05-29

UC Berkeley's UCCL team releases mKernel, fusing intra-node NVLink, inter-node RDMA, and dense compute into a single persistent CUDA kernel. Communication can consume 43.6% of forward pass and 32% of training time. mKernel offers five fused kernels and supports ConnectX-7 and AWS EFA backends.

mKernel fuses intra-node NVLink, inter-node RDMA, and compute into a single persistent CUDA kernel
Communication overhead accounts for up to 47% of execution time in MoE models

ChatGPT isn't the only chatbot pulling answers from Elon Musk's Grokipedia

2026-05-29

ChatGPT and other AI tools are increasingly citing Grokipedia, Elon Musk's AI-generated encyclopedia, raising concerns about accuracy and misinformation. Although Grokipedia currently accounts for a small share of citations, its usage is rising, especially in ChatGPT, where it is often treated as a primary source. Experts warn that relying on Grokipedia, which is AI-generated and lacks human oversight, could spread biases and errors and create data-poisoning risks.

ChatGPT, Google AI Overviews, and Gemini are among tools citing Grokipedia
Grokipedia citations have grown steadily since November but remain far below Wikipedia

AI Weekly Issue #497: AI's labor war just went global

2026-05-29

This week, the AI-and-work conflict erupted across four jurisdictions at once: Wikipedia editors threaten a strike over layoffs, Amazon employees game an internal AI ranking into uselessness, Chinese courts enforce a ban on AI-justified layoffs, and a UK thinktank calls for an employee say in AI deployment. Meanwhile, frontier labs deepen government ties.

Wikipedia editors threaten strike in protest of foundation layoffs
Amazon employees game internal AI ranking system into uselessness

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

2026-05-29

This is the first post in the Profiling in PyTorch series, starting with a simple matrix multiplication and bias addition to teach readers how to use torch.profiler. It covers setting up the profiler, reading the profiler table and trace, understanding CPU/GPU activity gaps, and the impact of warmup and matrix size on performance regimes.

torch.profiler outputs a table and a trace; the table identifies hotspots, the trace shows temporal execution.
Small matmuls are overhead-bound; scaling up makes them compute-bound.
Warmup eliminates startup overheads, producing consistent profile steps.
CPU-GPU offset reflects kernel launch and synchronization delays.
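
For readers who want to try the workflow described above, here is a minimal sketch of profiling a matrix multiplication plus bias addition with torch.profiler. The function names matmul_bias and profile_matmul, the matrix size, the step counts, and the "matmul_bias_step" label are illustrative assumptions, not taken from the post. The sketch runs on CPU and adds CUDA activity only if a GPU is available.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity


def matmul_bias(x, w, b):
    # The workload under study: a matrix multiplication followed by a bias add.
    return x @ w + b


def profile_matmul(n=256, warmup=3, steps=5):
    # n, warmup and steps are illustrative values, not taken from the post.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    x = torch.randn(n, n, device=device)
    w = torch.randn(n, n, device=device)
    b = torch.randn(n, device=device)

    # Warmup runs keep one-time startup costs out of the profiled steps.
    for _ in range(warmup):
        matmul_bias(x, w, b)

    with profile(activities=activities, record_shapes=True) as prof:
        for _ in range(steps):
            # record_function attaches a readable label to each profiled step.
            with record_function("matmul_bias_step"):
                matmul_bias(x, w, b)
        if device == "cuda":
            # Wait for queued GPU kernels so they are captured by the profiler.
            torch.cuda.synchronize()
    return prof


prof = profile_matmul()

# The table view aggregates time per operator and helps locate hotspots.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)

# The trace view can be written out and opened in a trace viewer:
# prof.export_chrome_trace("trace.json")
```

Raising n moves the workload from the overhead-bound regime toward the compute-bound regime the post describes, which should show up as a larger share of time in the matmul operator.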

Apple Working to Cram Gemini into iPhone

2026-05-28

Apple has long touted the privacy benefits of on-device AI, but a new report suggests its Gemini-powered Siri will rely heavily on Google and Nvidia cloud servers. While this hybrid approach addresses performance limitations of local models, it represents a trade-off on privacy.

Apple is partnering with Google to integrate Gemini AI into Siri on iPhone.
Due to limited on-device chip performance, Siri will use both local and cloud processing for enhanced AI capabilities.

Policy

Check out real-life AI prototypes from the Futures Lab.

2026-05-29

University of Waterloo students develop AI prototypes like sign language tutors to reshape the future of education and work.

Kanji Garden teaches Japanese through AI-generated stories and visuals.
SignFluent offers real-time feedback for learning American Sign Language.

Poison your data against AI

2026-05-29

Learn how data poisoning can protect personal data from being scraped and used by AI models by injecting misleading information.

Data poisoning is a technique to counter AI data scraping.
It involves adding false information to interfere with AI model training.

LightSail Technology Partners with Tencent Travel Services, Launches New Pre-sale Round

2026-05-29

LightSail Technology announced a strategic partnership with Tencent Travel Services to integrate its AI full-sensing wearable device into the mobility platform. The device previously topped JD.com's bestseller list and sold out; now a new pre-sale round is open with discounts.

LightSail Technology and Tencent Travel Services partner to integrate AI wearable into travel services.
The LightSail AI wearable topped JD.com's bestseller list for 8 consecutive days and sold out.

Give staff more say over AI to ensure they share benefits, UK thinktank urges

2026-05-29

Exclusive: IPPR report backed by TUC proposes ‘worker support levy’ to boost employees’ influence over AI adoption in the workplace.

IPPR report calls for strengthening employee bargaining power on AI decisions
Proposes a 'worker support levy' to ensure fair distribution of AI benefits

AI to be used to estimate age of UK asylum seekers

2026-05-29

The UK government plans to deploy AI facial recognition technology at borders from next year to detect adult migrants posing as children. The technology will estimate age from photos, but human rights groups criticize it as unproven and potentially harmful to children's rights.

UK to deploy AI facial recognition for age estimation of asylum seekers by mid-2027.
Technology aims to identify adults falsely claiming to be children, but Human Rights Watch urges scrapping the plan.

Xerolith: Platform for Persistent AI Memory and Autonomous Belief Formation

2026-05-29

Xerolith is a working platform that achieves persistent identity, autonomous belief formation, and substrate-independent knowledge consolidation through a hierarchical fractal vault architecture. Over 80 days of continuous operation, it has compressed 2,817 raw entries into 1,218 beliefs, with complete genealogical tracing and internal alignment.

Three-layer architecture: entries, lessons, and beliefs for autonomous consolidation from raw data to abstract principles.
Persistent identity maintained over 80+ days and multiple restart cycles.

Learning and Adaptation in Wire Arc Additive Manufacturing Bead Geometry Control

2026-05-29

This paper proposes a data-driven approach using recurrent neural networks and one-step-ahead predictive control for bead geometry control in Wire Arc Additive Manufacturing (WAAM). By updating the model online to account for changing thermal conditions, it significantly improves bead height and width consistency.

Uses a recurrent neural network to learn the input-output dynamics of WAAM
One-step-ahead predictive control improves bead geometry consistency
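
The control idea summarized above can be illustrated with a toy sketch. It deliberately swaps the paper's recurrent neural network for a scalar linear model updated by normalized least mean squares, so it shows only the structure of one-step-ahead predictive control with online adaptation, not the authors' method. The plant equation, the drifting gain standing in for changing thermal conditions, and all names and constants are assumptions made for illustration.

```python
class OnlineLinearModel:
    """Stand-in for the learned model: y_next ~ a * y + b * u (not an RNN)."""

    def __init__(self, a=0.5, b=0.5, lr=0.5):
        self.a, self.b, self.lr = a, b, lr

    def predict(self, y, u):
        return self.a * y + self.b * u

    def update(self, y, u, y_next):
        # Normalized LMS step: shrink the prediction error on the latest sample.
        err = y_next - self.predict(y, u)
        norm = 1e-6 + y * y + u * u
        self.a += self.lr * err * y / norm
        self.b += self.lr * err * u / norm
        return err


def one_step_control(model, y, target, u_min=0.0, u_max=10.0):
    # Choose the input whose one-step prediction equals the target, then clamp.
    if abs(model.b) < 1e-6:
        return u_min
    u = (target - model.a * y) / model.b
    return min(max(u, u_min), u_max)


def simulate(steps=200, target=2.0):
    model = OnlineLinearModel()
    y = 0.0  # hypothetical bead height, arbitrary units
    errors = []
    for k in range(steps):
        # Hypothetical plant whose input gain drifts over time.
        true_b = 0.8 - 0.002 * k
        u = one_step_control(model, y, target)
        y_next = 0.6 * y + true_b * u
        model.update(y, u, y_next)  # online model update after each step
        y = y_next
        errors.append(abs(target - y))
    return errors


errors = simulate()
print(f"first-step error: {errors[0]:.3f}, final error: {errors[-1]:.3f}")
```

Because the model is corrected after every step, the tracking error stays small even as the plant gain drifts, which is the qualitative effect the paper attributes to online updating.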

Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving

2026-05-29

Researchers propose a multi-resolution end-to-end deep neural network to balance latency and safety in autonomous driving. By selecting input resolution at runtime, the network improves safety metrics like lane invasions, red-light infractions, and collisions in CARLA simulations compared to fixed-resolution baselines.

Latency-accuracy tradeoff is critical for real-time autonomous driving decisions.
Proposed multi-resolution CNN supports runtime input resolution selection under latency budgets.
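
One simple way to picture runtime resolution selection under a latency budget is sketched below. The summary does not describe the paper's actual selection policy, so the greedy rule here (take the largest resolution whose estimated latency fits the budget) is an assumption, as are the ResolutionOption type, the select_resolution helper, and the latency figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionOption:
    width: int
    height: int
    latency_ms: float  # assumed to be measured offline for this input size


def select_resolution(options, budget_ms):
    """Pick the largest resolution whose estimated latency fits the budget."""
    feasible = [o for o in options if o.latency_ms <= budget_ms]
    if not feasible:
        # Nothing fits: fall back to the fastest option rather than failing.
        return min(options, key=lambda o: o.latency_ms)
    return max(feasible, key=lambda o: o.width * o.height)


# Hypothetical latency profile, for illustration only.
OPTIONS = [
    ResolutionOption(320, 180, 8.0),
    ResolutionOption(640, 360, 18.0),
    ResolutionOption(1280, 720, 45.0),
]

for budget in (5.0, 20.0, 100.0):
    chosen = select_resolution(OPTIONS, budget)
    print(f"budget {budget} ms: {chosen.width}x{chosen.height}")
```

A tighter budget pushes the choice toward smaller inputs, trading perception accuracy for a faster decision, which is the tradeoff the paper studies.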

Disposable Software – How to Stop Worrying and Love the AI Code

2026-05-29

The article explores the concept of 'disposable software' in the AI era, arguing that AI-generated code should be treated as disposable to accelerate development, much like mass-produced furniture replaced artisan craftsmanship. A case study demonstrates successful AI refactoring, and a 'Disposable Code Manifesto' is proposed with three pillars: intent, requirements, and safety.

AI makes software cheap and disposable, analogous to the industrial revolution in furniture.
A real-world Rails project case shows how AI refactoring reduced code from 2000+ lines to 264 lines.

How to Beat Superhuman AIs [at Go] [video]

2026-05-28

This video explores strategies and methods to counter superhuman AI in the game of Go, including exploiting weaknesses, innovative tactics, and understanding AI decision-making.

Superhuman AIs in Go have surpassed top human players
The video analyzes potential AI weaknesses and how to exploit them

Claude company Anthropic nears a trillion-dollar valuation after raising $65 billion in Series H

2026-05-28

Anthropic raises $65 billion in Series H at $965 billion valuation. Annualized revenue exceeds $47 billion. Funds allocated to safety research, compute, and Claude expansion.

Anthropic secures $65 billion in Series H funding
Valuation hits $965 billion, approaching $1 trillion

The AI Gold Rush Is Eating Its Own

2026-05-28

The Wikimedia Foundation, sitting on $296 million in reserves and a profitable AI revenue stream, laid off long-time staff and disbanded the Community Tech team, prompting volunteer editors to threaten a strike. The article explores how 'CEO AI psychosis' distorts organizational priorities and how replacing human judgment with AI can create a downward spiral of degrading data quality.

Wikimedia Foundation fired a 20-year veteran and disbanded the Community Tech team, triggering a strike threat from volunteer editors.
AI companies profit from Wikipedia data but undermine the volunteer community that produces it.

Interviewing in the Age of AI

2026-05-28

This article explores how AI is affecting software engineering interviews, analyzing different interview types (take-home, live exercise, presentation, actual work) across dimensions of signal quality and cost to company. It argues that AI makes take-homes too easy and live coding less relevant, recommending that companies limit AI usage in interviews to preserve signal quality, drawing parallels to classical academic evaluation models.

AI coding threatens current interview models, especially take-home and live coding.
Companies should limit AI usage during interviews to maintain signal quality.

Startups

This AI startup will clean your home for free to train future robots

2026-05-29

AI training startup Shift offers free home cleaning services, but records cleaners to gather training data for robots. The company says the value of the data covers the cost. The service is initially available only in New York, with plans to expand to San Francisco, London, Zurich, and Munich soon.

Shift provides free cleaning in exchange for recording cleaners to train AI robots.
Cleaners wear a special hat with a camera to capture their work.

Anthropic reaches valuation of $965bn, beating OpenAI to become world’s most valuable AI firm

2026-05-28

Claude’s parent company’s $65bn in latest funding round underscores vast sums of money still flowing into industry. Anthropic, the AI firm behind the Claude chatbot, announced on Thursday it had raised $65bn in funding to value the company at $965bn post-money. The move makes Anthropic the world’s most valuable AI startup, eclipsing its competitor OpenAI. The deal marks an exceedingly successful period of growth for Anthropic, which was once considered to be a smaller player in the global AI arms race. The widespread adoption of its products by large enterprise businesses, especially following its release of powerful coding assistants late last year, has turned it into a dominant player in the industry.

Anthropic raised $65bn in funding, valuing it at $965bn.
It surpasses OpenAI as the world's most valuable AI startup.

IBM and Red Hat Invest $5 Billion to Make Open Source More Secure

2026-05-28

The project follows Anthropic's unreleased Mythos AI cybersecurity model, which uncovered serious security holes in software systems.

IBM and Red Hat invest $5 billion in open-source security.
The initiative follows Anthropic's Mythos AI model uncovering security holes.

AI Coding Startup Now Valued at $26 billion

2026-05-28

The new funding is the latest milestone for the fast-growing vendor and underscores the strength of the AI coding market.

AI coding startup reaches $26 billion valuation.
New funding marks another milestone for the company.

A $2,000 AI-generated film will make its debut at Tribeca

2026-05-28

Next month's Tribeca Festival will include the premiere of an AI-generated film: Dreams of Violets. The 75-minute film is a fictional dramatization of the Iranian government's mass killing of protestors in January, with the people and images fully created by AI. It cost $2,000 to make and was created by two Iranian-born brothers using various AI tools.

Dreams of Violets is a 75-minute AI-generated film premiering at Tribeca, costing $2,000.
It dramatizes the Iranian government's mass killing of protestors, using AI for all images.

Robotics

YouTube takes baby steps to being a real podcast app

2026-05-28

YouTube introduces new features for Premium subscribers to enhance podcast listening, including an audio-first 'on-the-go mode', auto speed adjustment, and AI podcast recommendations.

YouTube launches 'on-the-go mode' that converts video interface to audio-first for listening on the move.
New auto speed feature adjusts playback speed dynamically based on content.