AI News Hub

Today's highlights

Agents

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

Agent evaluation is most powerful when it combines fast-moving online signals with stable offline baselines. Dataset management in Amazon Bedrock AgentCore provides versioned test fixtures that enable consistent measurement and ground-truth verification.

  • Versioned datasets in AgentCore provide stable, immutable test scenarios for consistent agent evaluation across runs.
  • Predefined scenarios capture exact expected inputs, tool sequences, and assertions for verifiable ground truth.
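The idea of a versioned, immutable scenario can be sketched in a few lines. This is a minimal illustration only, not the AgentCore API or schema: the `Scenario` class, its fields, and the `dataset_version` and `check` helpers are all hypothetical names chosen for this example.

```python
from dataclasses import asdict, dataclass
import hashlib
import json


@dataclass(frozen=True)  # frozen: a scenario cannot be mutated after creation
class Scenario:
    """One predefined test case (hypothetical shape, not the AgentCore schema)."""
    name: str
    user_input: str
    expected_tools: tuple[str, ...]  # tool calls expected, in order
    must_contain: tuple[str, ...]    # substrings the final answer must include


def dataset_version(scenarios: tuple[Scenario, ...]) -> str:
    """Derive a version id from the content, so any edit yields a new version."""
    payload = json.dumps([asdict(s) for s in scenarios], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def check(scenario: Scenario, tools_called: list[str], answer: str) -> list[str]:
    """Compare one agent run against the scenario; return a list of failures."""
    failures = []
    if tuple(tools_called) != scenario.expected_tools:
        failures.append(
            f"tool sequence {tools_called} != {list(scenario.expected_tools)}"
        )
    for fragment in scenario.must_contain:
        if fragment not in answer:
            failures.append(f"answer is missing {fragment!r}")
    return failures


# Illustrative data only; the tool names are made up.
refund = Scenario(
    name="refund_request",
    user_input="Refund order 1234",
    expected_tools=("lookup_order", "issue_refund"),
    must_contain=("refund",),
)
DATASET_V1 = (refund,)

if __name__ == "__main__":
    print("dataset version:", dataset_version(DATASET_V1))
    print(check(refund, ["lookup_order", "issue_refund"], "Your refund is on its way"))
```

Hashing the content is one simple way to pin a run to an exact dataset version: two evaluation runs are comparable only if they report the same version id.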

SIA: The Open Source Self Improving AI

SIA is an open-source self-improving AI framework that autonomously boosts AI system performance on benchmark tasks by coordinating meta, target, and feedback agents. It reports significant gains: 56.6% on LawBench, a 91.9% runtime reduction on GPU kernels, a 502% improvement on scRNA denoising, and a #1 rank on MLE-Bench Hard. It supports local execution and custom tasks and is MIT licensed.

  • SIA uses an iterative loop of meta, target, and feedback agents for autonomous self-improvement.
  • Achieves substantial performance gains across LawBench, GPU kernel optimization, scRNA denoising, and MLE-Bench.

Micron Hits $1T on AI Memory Boom

Micron crossed a $1 trillion market cap on May 26-27. With SK Hynix reaching the mark in the same week, the two became the first pure-play memory chipmakers to enter the trillion-dollar club. The rally is driven by HBM demand from agentic AI workloads; UBS tripled its price target to $1,625, citing long-term supply contracts. Micron stock has more than tripled year-to-date.

  • Micron and SK Hynix both hit $1T market cap in the same week, a first for pure-play memory chipmakers
  • Agentic AI workloads driving record HBM demand

AI Agent Frameworks Comparison

As of mid-2026, seven major AI agent frameworks (DSPy, Claude Agent SDK, OpenAI Agents SDK, CrewAI, AutoGen, LangGraph, Google ADK) differ in design philosophy, architecture, and production readiness. LangGraph leads in production deployments, Claude Agent SDK offers the deepest single-provider capabilities, OpenAI Agents SDK provides the cleanest multi-agent handoffs, and CrewAI excels in developer velocity. The market is projected to grow from $7.84B in 2025 to $52.62B by 2030.

  • LangGraph has the most mature durable execution model, deployed by ~400 enterprises.
  • Claude Agent SDK offers the most powerful single-provider capabilities but is locked to Anthropic models.
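The market projection above implies a compound annual growth rate, which can be checked with a short calculation. The sketch assumes five compounding years (2025 to 2030); the summary does not state the compounding convention, and the `cagr` helper is a name introduced here for illustration.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the summary, in billions of US dollars.
START_2025 = 7.84
END_2030 = 52.62
YEARS = 5  # assumption: five compounding periods between 2025 and 2030

rate = cagr(START_2025, END_2030, YEARS)

if __name__ == "__main__":
    print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the implied growth rate comes out to roughly 46% per year.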

Anthropic launches Opus 4.8, with honesty as its killer feature

Anthropic's latest Claude model, Opus 4.8, emphasizes honesty: it makes fewer unsupported claims and admits uncertainty more often. It also introduces dynamic workflows for orchestrating hundreds of subagents on large-scale tasks. Pricing remains unchanged for standard mode, while fast mode gets cheaper.

  • Claude Opus 4.8 shows significant honesty improvements, with error rates falling by a factor of about four
  • Dynamic workflows can plan and run hundreds of parallel subagents, verifying outputs before reporting back

Automate AML alert triage with Amazon Quick and Snowflake Cortex AI

This post demonstrates the integration between Amazon Quick and Snowflake Cortex AI by automating one of the most labor-intensive workflows in financial services: anti-money laundering (AML) alert triage. You will build a triage workflow using Amazon Quick Flows and Snowflake Cortex, connected through the Amazon Quick Model Context Protocol (MCP) integration. In the authors' testing environment, automated workflows built using Amazon Quick reduced alert investigation time from 30-90 minutes to under 5 minutes. Actual results may vary based on alert complexity and data volume.

  • Amazon Quick Flows and Snowflake Cortex integrate via MCP to automate AML alert triage.
  • Automated workflows reduced investigation time from 30-90 minutes to under 5 minutes.

Data Formulator 0.7: AI-powered data analytics for enterprise data

Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.

  • Open-source AI system for enterprise data analytics
  • Data Connectors support governed, reusable connections across diverse data sources
Policy

The AI Gold Rush Is Eating Its Own

The Wikimedia Foundation, sitting on $296 million in reserves and a profitable AI revenue stream, laid off long-time staff and disbanded the Community Tech team, prompting volunteer editors to threaten a strike. The article explores how 'CEO AI psychosis' distorts organizational priorities and how replacing human judgment with AI can create a downward spiral of degrading data quality.

  • Wikimedia Foundation fired a 20-year veteran and disbanded the Community Tech team, triggering a strike threat from volunteer editors.
  • AI companies profit from Wikipedia data but undermine the volunteer community that produces it.

Interviewing in the Age of AI

This article explores how AI is affecting software engineering interviews, analyzing different interview types (take-home, live exercise, presentation, actual work) across dimensions of signal quality and cost to company. It argues that AI makes take-homes too easy and live coding less relevant, recommending that companies limit AI usage in interviews to preserve signal quality, drawing parallels to classical academic evaluation models.

  • AI coding threatens current interview models, especially take-home and live coding.
  • Companies should limit AI usage during interviews to maintain signal quality.
Models

Claude Opus 4.8 is here: effort controls, dynamic workflows, cheaper fast mode, better honesty, less deception

Anthropic released Opus 4.8 with user-controllable effort, dynamic workflows for large-scale coding, and fast mode at one-third the previous cost. Benchmarks show it leads GPT-5.5 and Gemini 3.1 Pro except in terminal coding. It also brings improvements in honesty and autonomy support, along with reduced deception.

  • Users can now control Claude's "effort" level to balance response quality and speed.
  • Dynamic workflows (research preview) allow Claude to plan and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations.

Claude Opus 4.8 is now available on AWS

Anthropic's most advanced Opus model, Claude Opus 4.8, is now available on Amazon Bedrock and the Claude Platform on AWS. It delivers improvements in coding, agentic tasks, and professional work with greater consistency and autonomy for long-running production workflows.

  • Claude Opus 4.8 is Anthropic's most advanced Opus model, now available on AWS.
  • It offers enhanced performance in coding, multi-stage autonomous tasks, and professional work with lower output variance.

Claude’s new model is more ‘honest’ when it messes up

Anthropic is releasing Claude Opus 4.8 on Thursday, touting the model's 'honesty.' Early testers found it more likely to flag uncertainties and less likely to make unsupported claims. Evaluations show it is about 4x less likely than its predecessor to allow code flaws to pass unremarked. Users can also direct the amount of effort Claude puts into a task, and a 'dynamic workflows' feature allows parallel subagents.

  • Claude Opus 4.8 is more inclined to flag uncertainties and avoid unsupported claims.
  • It is about 4x less likely than its predecessor to overlook code flaws.
Research

AI is changing how we think, not replacing it | Letters

Richard Thackeray and Phil Snell respond to an article by Wendy Liu on using artificial intelligence, arguing that AI enhances curiosity rather than diminishing it.

  • Wendy Liu raises concerns about labour redundancies, hype, and environmental cost of AI.
  • Richard Thackeray, a heavy AI user, finds AI makes him more curious and enables exploration of new territory.

How to force Google AI Overviews to prioritize your favorite news sources

Google's Preferred Sources feature is now available in AI Overviews and AI Mode, allowing you to add your favorite sites to appear more prominently in AI-powered searches, along with new carousel and 'Highly Cited' badges.

  • Google's Preferred Sources feature now works with AI Overviews and AI Mode.
  • You can add favorite news sites to make them more prominent in AI search results.
Tools

Meeting the pope’s call to put humanity first in a world of artificial intelligence | Letter

Dr Susan Oman writes about a campaign designed to raise public awareness of AI, arguing that while governments, faith leaders, and tech bosses debate AI's future, the public is consistently left out. She cites evidence showing public concern about AI has risen by 10% in two years, and that 91% believe fairness should be prioritized over economic gain.

  • Public consistently excluded from AI debates despite being most affected
  • Public concern about AI rose by 10% in two years

Image of Thai police in sparkly dresses with handcuffed suspect turns out to be AI fake

The picture was created by the administrator in charge of the station’s Facebook account, who wanted to create a ‘friendlier image’.

  • An AI-generated image of Thai police in festive dresses with a suspect was widely shared in global media.
  • The image was created by the police station's Facebook account administrator to promote a friendlier image.
Startups

A $2,000 AI-generated film will make its debut at Tribeca

Next month's Tribeca Festival will include the premiere of an AI-generated film: Dreams of Violets. The 75-minute film is a fictional dramatization of the Iranian government's mass killing of protestors in January, with the people and images fully created by AI. It cost $2,000 to make and was created by two Iranian-born brothers using various AI tools.

  • Dreams of Violets is a 75-minute AI-generated film premiering at Tribeca, costing $2,000.
  • It dramatizes the Iranian government's mass killing of protestors, using AI for all images.
Robotics

YouTube takes baby steps to being a real podcast app

YouTube introduces new features for Premium subscribers to enhance podcast listening, including an audio-first 'on-the-go mode', auto speed adjustment, and AI podcast recommendations.

  • YouTube launches 'on-the-go mode' that converts video interface to audio-first for listening on the move.
  • New auto speed feature adjusts playback speed dynamically based on content.