AI News Hub LIVE

Today's highlights

Policy

Google Cloud responds to AI-accelerated cyberattacks with a platform that aims to close security gaps in minutes

Google Cloud has unveiled "AI Threat Defense," a platform designed to automatically find, assess, and patch security flaws in enterprise systems. The platform bundles several technologies, some of which Google obtained through acquisitions.

  • Google Cloud launches AI Threat Defense platform to combat AI-driven cyberattacks.
  • The platform automatically discovers, assesses, and patches security vulnerabilities.

CNN sues Perplexity over ‘verbatim’ copycat articles

CNN has filed a lawsuit against Perplexity, claiming that the startup's AI tools generate "verbatim" copies of its work, as reported earlier by CNN. The lawsuit, filed in a New York court on Thursday, also alleges that Perplexity provides users with information locked behind CNN's subscription. Perplexity, which offers an AI "answer" engine along with the AI browser Comet, is accused of ignoring CNN's efforts "to recognize or block Perplexity's unidentified crawlers" from scraping its content. "Human beings report, research, write, edit, and create the content that Perplexity takes without permission or compensation," the lawsuit claims. Read the full story at The Verge.

  • CNN sues Perplexity for allegedly producing verbatim copies of its articles.
  • Perplexity accused of bypassing CNN's paywall and ignoring crawling prevention measures.

CNN sues Perplexity over alleged AI copyright theft

CNN has filed a lawsuit against AI search company Perplexity, accusing it of unlawfully copying and distributing CNN's content. This is CNN's first AI copyright action and is thought to be the first by any television network. CNN states that it previously sought a content licensing deal with Perplexity but failed to reach one, and it now seeks legal damages. Perplexity has not yet commented.

  • CNN sues Perplexity for alleged copyright infringement of its content
  • This marks CNN's first AI copyright lawsuit and potentially the first by a TV network

NBA plans AI system for automatic out-of-bounds calls

NBA Commissioner Adam Silver announced plans to introduce an automated AI and camera-based system for objective officiating decisions like out-of-bounds calls. The system, compared to Hawk-Eye in tennis, aims to determine possession instantly. Silver said referees will still handle subjective calls involving contact and fouls.

  • NBA plans AI-powered automated system for out-of-bounds calls, using cameras and AI similar to Hawk-Eye.
  • The announcement followed a disputed call in the Western Conference finals.

Midday – Open Source Invoicing, Time Tracking, File Reconciliation, Storage, etc

Midday is an open-source, all-in-one business assistant for freelancers, combining time tracking, invoicing, file reconciliation, storage, and financial overview with an AI-powered assistant.

  • Open-source tool integrating multiple business functions for freelancers and solo entrepreneurs.
  • Features include time tracking, invoicing, secure file vault, automated receipt matching, and AI insights.

5 AI-Generated Math Papers Accepted! Post-00s Founder Hong Letong Raises $2 Billion

Axiom Math, founded by Hong Letong, a Chinese entrepreneur born after 2000, has had 5 of its 8 AI-generated math papers accepted by peer-reviewed journals. The company raised $2 billion in March at a $16 billion valuation.

  • Five of eight math papers generated by Axiom Math's AI system, AxiomProver, have been accepted by academic journals.
  • Founder Hong Letong dropped out of Stanford to start the company, which secured $2 billion in funding and is valued at $16 billion.

AIhub monthly digest: May 2026 – AI for science, the lottery ticket hypothesis, and world models

This month's AIhub digest covers the AI for Science conference, an interview on the lottery ticket hypothesis, a discussion of world models, research on transparent and trustworthy AI, a report on foundation model impacts, reflections on the AIES conference, Robotics Café, the ACL desk rejection policy, arXiv's anti-AI-slop policy, and more.

  • Interview with Ximing Wen on transparent and trustworthy AI systems
  • Jonathan Frankle discusses the lottery ticket hypothesis and empiricism

Synthetic Emotions vs. Gamification: Exploring Engagement Strategies for Small Social Robots in Different Age Groups

Many children face challenges in emotional regulation and social interaction, limiting their participation in therapeutic programs. This study explores engagement strategies for a tactile robot supporting children with anxiety disorders, comparing synthetic emotional feedback with point rewards. A preference study with 16 school children (ages 6-8) showed a preference for emotional engagement, while a behavioral study with 14 university students (ages 20-27) found that point-based systems yielded higher task accuracy (p<0.05) and sustained performance. These findings highlight age-related differences and the need to validate design assumptions through observed interaction.

  • Children aged 6-8 prefer emotional engagement over points
  • University students show higher task accuracy with point rewards

Illinois Lawmakers Just Passed America's Strongest AI Safety Bill

Illinois lawmakers passed SB 315, which requires independent auditors to verify AI labs' safety commitments. The bill now heads to Governor Pritzker, who plans to sign it. It is stricter than the California and New York laws and has drawn support from OpenAI and Anthropic but opposition from Silicon Valley trade groups.

  • SB 315 mandates independent auditing of AI safety practices.
  • It is the strongest state-level AI safety law in the U.S.

The Authorization Paradox: Who Has the Keys to Your AI? [video]

This article explores the authorization paradox in AI systems, questioning who truly holds control over AI. Presented as a video, it discusses security and privacy implications.

  • Authorization issues in AI are increasingly critical
  • Who holds the 'keys' to AI is a central question

Building the Future of Accessible Tech: Inside Uvilox AI

Uvilox AI bridges the communication gap with real-time sign language interpretation, emergency response, and accessible calling — powered by next-generation vision AI. With sub-80ms latency, 97.4% accuracy, support for 200+ sign variants, and military-grade security, it is now open for beta access.

  • Real-time sign language recognition with <80ms latency and 97.4% accuracy.
  • Supports over 200 ASL and BSL signs, works in low-light conditions.

Extending Human Intelligence Through AI

Modern AI systems are powerful not because they replicate human intelligence, but because they extend structures already present in human cognition and language. This perspective explains AI's capabilities and limitations, and reframes AI safety as a system-level challenge requiring engineering and governance, not fear of rogue AI.

  • AI systems extend human intelligence by modeling sedimented structures of understanding in language, not by replicating human minds.
  • Hallucinations and the compositionality gap arise from AI's lack of lived engagement with the world that anchors meaning and truth.

Anthropic opens Milan office to support Italian enterprise, research, and developers

Anthropic opens a new office in Milan, its sixth in Europe, to collaborate with Italian companies, researchers, and developers on responsible AI. The opening follows the release of Pope Leo XIV's encyclical on AI, where Anthropic co-founder Chris Olah spoke. The company already works with major Italian firms like Generali, Enel, and Pirelli, as well as startups Satispay and Bending Spoons, and plans to support Italian culture and academia.

  • Anthropic opens its sixth European office in Milan to support Italian enterprise and AI development.
  • Office launch follows Pope's encyclical on AI; Anthropic co-founder participated in related discussions.

Chips

People who want to replace humanity

A Vox article explores the growing movement of AI successionists who believe artificial intelligence should replace humanity as the next step in cosmic evolution, and examines the ethical and spiritual questions this raises.

  • AI successionists at a symposium argue that AI could be morally superior and should be allowed to supersede humanity.
  • The movement has gained influence in Silicon Valley and among major AI labs, with ties to the authoritarian right.

Nvidia to Spend $150B a Year in Taiwan for AI Infrastructure

Jensen Huang announced Nvidia will spend $150 billion annually in Taiwan on AI infrastructure, despite a previous $500 billion US commitment. This highlights Taiwan's critical role in AI chip manufacturing and packaging.

  • Nvidia will invest $150B per year in Taiwan for AI infrastructure.
  • Despite a $500B US data center pledge, Taiwan remains the core manufacturing hub.

Nvidia bets $150B on Taiwan as Trump's plan to make US an AI hub backfires

Nvidia CEO Jensen Huang plans a $150 billion investment in Taiwan for AI infrastructure, despite Trump administration tariffs aimed at bringing chip manufacturing back to the US. Taiwan refuses to relinquish its semiconductor dominance, while US chip manufacturing capacity remains low.

  • Nvidia announces $150 billion investment in Taiwan to boost AI chip position.
  • Trump administration weighs tariffs on semiconductors to boost domestic manufacturing, but US only produces about 10% of its chip needs.

A Eureka machine that thinks like nature and explores what AI cannot

A multi-institution team built a neuromorphic computer combining quantum-tunneling physics with brain-inspired architecture to solve combinatorial optimization problems at scale, with asymptotic convergence guarantees. Published in Nature Communications, it represents a new direction in quantum-inspired computing.

  • Neuromorphic computer uses quantum tunneling and brain-like architecture for combinatorial problems
  • Based on CMOS technology with a Fowler-Nordheim annealer autoencoder

Jensen Huang Joins Tsinghua University's Advisory Board

NVIDIA CEO Jensen Huang has accepted an invitation to join the Advisory Board of Tsinghua University's School of Economics and Management (SEM). The board, chaired by Apple CEO Tim Cook, includes Elon Musk, Satya Nadella, Mark Zuckerberg, Jack Ma, and other global leaders. Huang also recently received an honorary doctorate from Carnegie Mellon University.

  • Jensen Huang joins Tsinghua SEM Advisory Board
  • Board chaired by Apple's Tim Cook, includes top tech and business leaders

Agents

Claudeverse – Mission Control for Parallel Claude Code Workers

Claudeverse is a command center for developers managing multiple Claude AI workers in parallel. Its features include parallel workforce management, worker escalation, a review queue, traceability, iPad mirroring, and a model-neutral engine. It is currently in invite-only beta for macOS.

  • Claudeverse provides a unified command center to manage multiple Claude workers simultaneously.
  • Key features include parallel workforce, worker escalation, review queue, traceability, and iPad mirroring.

Google Pay preps for AI agents with Universal Commerce Protocol

Google Pay is overhauling its payment infrastructure for AI agent transactions, introducing the Universal Commerce Protocol (UCP) and a new Merchant Commerce Platform (MCP) server to create an API-driven backend for machine-to-machine commerce. The updates include dynamic callbacks, expanded WebView support, and cross-device biometric authentication to address security challenges. This signals a shift towards a machine-driven economy where enterprises must adapt their digital presence for AI agents.

  • Google Pay introduces Universal Commerce Protocol (UCP) to standardize AI agent payments.
  • New Merchant Commerce Platform (MCP) server acts as intermediary, aggregating transaction data.

When revealed data brings AI rollouts to a screeching halt - and how to manage it

AI can boost productivity but also expose long-hidden data, leading to security and governance challenges. Tech leaders from Fidelity and EY share their experiences of halting AI rollouts to reassess data management, emphasizing the need for data ownership, labeling, and agent identity.

  • AI rollouts can be halted by data exposure issues.
  • Fidelity and EY faced challenges with unstructured data surfacing via AI.

DeepSWE: Measuring coding agents on original, long-horizon engineering tasks

DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.

  • DeepSWE is a contamination-free benchmark with original tasks.
  • Tasks span 91 repositories in 5 languages.

IBM and Red Hat Commit $5B to Redefine Future of Open Source for AI Era

IBM and Red Hat announce Project Lightwell, a $5 billion initiative to secure open source software using AI and a team of over 20,000 engineers, establishing a trusted clearinghouse for vulnerability management.

  • Project Lightwell is a $5B investment by IBM and Red Hat to secure open source software.
  • It combines AI and 20,000+ engineers to identify and fix vulnerabilities at scale.

AI agents get their own phone directory built atop DNS

DNS-AID, an open-source project under the Linux Foundation, enables AI agents to discover each other using DNS infrastructure, avoiding centralized registries. It supports multiple protocols and allows searching by name, function, or domain.

  • DNS-AID leverages existing DNS infrastructure for agent discovery.
  • Uses SVCB, DNSSEC, and DANE for secure and reliable connections.

An AI opinionated ideal language that ignores human-friendliness

Pact is a programming language designed for AI agents, emphasizing machine-readable specifications and constraints over human-friendliness. It's based on S-expressions and features provenance, effect tracking, totality, latency budgets, and dependency graphs. The compiler generates Rust code and includes tools for web scaffolding and YAML spec conversion. While strong for service contracts, it has limitations for algorithmic specifications.

  • Pact is an S-expression language for AI agents, prioritizing metadata and formal specifications.
  • Key features include provenance, effect tracking, totality, and latency budgets.

AI Agent Governance: Identity, Delegation and Permissions in Practice

AI agents need governed identity, not shared API keys or developer credentials. Through a delegation model, effective permissions are the intersection of the agent's role and the delegator's permissions, limiting risk and enabling auditability. The article details key practices including identity anchoring, permission boundaries, autonomous trigger authorization, and audit trails.

  • Agents should have their own identity, using the same identity system as humans for lifecycle management.
  • Effective permissions are the intersection of agent role ceiling and delegator permissions floor, strictly limiting scope.
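
The intersection rule summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `effective_permissions` helper and the permission strings are hypothetical and do not come from the article.

```python
def effective_permissions(agent_role: frozenset, delegator: frozenset) -> frozenset:
    """Return the permissions an agent may exercise on behalf of a delegator.

    The agent's role acts as a ceiling and the delegator's own permissions
    bound what can be handed over, so the result is the set intersection.
    """
    return agent_role & delegator


# Hypothetical permission names, used only for illustration.
agent_role = frozenset({"tickets:read", "tickets:write", "billing:read"})
delegator = frozenset({"tickets:read", "billing:read", "hr:read"})

granted = effective_permissions(agent_role, delegator)
print(sorted(granted))  # ['billing:read', 'tickets:read']
```

Neither side can widen the result: `tickets:write` is dropped because the delegator lacks it, and `hr:read` is dropped because it lies outside the agent's role.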

DiscloAI – open-source EU AI Act Article 50 compliance SDK

DiscloAI is an open-source SDK for EU AI Act Article 50 compliance, enabling chatbot disclosures, deepfake labels, and AI content notices. It supports 24 EU languages and WCAG 2.1 AA, and can be integrated in under 10 minutes via CDN or npm.

  • Open-source SDK for EU AI Act Article 50 compliance
  • Covers chatbot disclosures, deepfake labels, and AI content notices

To Become a Better Designer with AI, Become a Digital Hoarder

The article argues that to create unique and tasteful designs with AI, designers must curate a library of visual references (digital hoarding) to develop taste and codify it for AI models. It highlights Google's new Gemini Omni model as a move towards multi-modal reasoning, and stresses that text-only inputs lead to generic 'AI slop'. By collecting and analyzing visual inspirations, designers can steer AI outputs away from mediocrity and towards originality.

  • Google's Gemini Omni model signals a shift towards multi-modal AI that can reason across text, image, audio, and video.
  • Relying solely on text prompts results in generic, 'slop' designs; visual references are essential for unique aesthetics.

NVIDIA Research Advances Robotics From Simulation to the Real World

At ICRA, NVIDIA Research highlights eight papers on sim-to-real transfer, enabling robots to perceive, reason, plan, and act in dynamic environments. Methods like ScheduleStream, COMPASS, Grasp-MPC, SPARR, and SEAL improve coordination, navigation, grasping, assembly, and task execution, with significant gains in success rates and robustness.

  • NVIDIA presents 8 papers on sim-to-real transfer at ICRA
  • Methods include multi-arm coordination, cross-robot navigation, novel object grasping, precision assembly, and vision-language-action models

How we built Cloudflare's data platform and an AI agent on top of it

Cloudflare processes over a billion events per second, but data was scattered and hard to access. They built Town Lake, a unified analytics platform, and Skipper, an AI agent that lets anyone ask questions in plain English and get auditable answers. The article details platform architecture, governance (default-closed), and the AI agent's workings.

  • Cloudflare built Town Lake (unified data platform) and Skipper (AI agent) to solve data sprawl.
  • Town Lake uses a data lakehouse architecture with Trino, R2, and Iceberg for unified querying.

What If the Real Key to AI Coding Is Old-Fashioned and Boring?

The article argues that the key to AI-assisted software development is not better specifications or tools, but old-fashioned practices of small batches and rapid feedback loops. Data shows that faster code generation leads to bottlenecks in design, testing, and review, slowing delivery and reducing stability. The real leverage lies in reducing batch sizes and shortening feedback cycles.

  • AI code generation speeds up creation but creates bottlenecks in design, testing, and review.
  • Data from DORA, CircleCI, and Faros shows slower delivery and less stability due to phase-gated processes.

Why We Open-Sourced OpenLoomi AI

The OpenLoomi AI team explains their decision to open-source their AI work partner, emphasizing data sovereignty, transparency, and community-driven development. The article covers local-first architecture, the trust tax of closed-source, the need for public AI infrastructure, and the product's core features.

  • OpenLoomi is local-first: user data stays encrypted on their device and is never used for model training.
  • Open-source eliminates trust dependencies—anyone can audit, fork, or self-host the code.

7 Real World AI Projects to Build in 2026 (with Guides)

Explore seven practical AI projects that automate real workflows, including job search, web research, investment research, market trend analysis, invoice processing, chart digitization, and personalized exercise training.

  • Build an AI job search assistant that ranks job fit
  • Create a multi-agent research assistant for sourced reports

AI Aggregation Platform Valued at $1.3 Billion

The vendor’s growth parallels the explosive emergence of agents in enterprise AI.

  • AI aggregation platform reaches $1.3 billion valuation.
  • Growth is tied to the rise of enterprise AI agents.

Your AI Agent Already Forgot Half of What You Told It

This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.

  • AI assistants can 'forget' earlier information in long conversations due to context window limits, a phenomenon called context compaction.
  • Four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria, and use spec documents as bridges.

Show HN: I packaged a Python AI agent and Vue dashboard into one Electron app

Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.

  • Bundles Python runtime and hermes-agent for a zero-dependency user experience
  • Uses Electron shell with hermes-web-ui frontend

Money Printer Pro – Open-source AI content generator

Money Printer Pro is an open-source AI content generator powered by Google Gemini and VEO 3.1, enabling photorealistic images and cinematic videos with identity preservation. It features 7 visual engines, autopilot batch generation, AI quality scoring, and a publish guard. Users pay Google directly with no markup or subscription.

  • Generates photorealistic images and 8-second cinematic videos with consistent identity across outputs.
  • Integrates 7 visual engines for lighting, shadow, motion, weather, outfit, scene validation, and context orchestration.

Superpowers: An Agentic Skills Framework for AI Coding Workflows

Superpowers is a complete software development methodology for coding agents, built on composable skills and initial instructions. It emphasizes test-driven development, design-first approach, and subagent-driven iteration, supporting multiple coding assistants like Claude Code, Codex CLI, and Gemini CLI.

  • Superpowers provides a skills library covering TDD, systematic debugging, and collaboration planning, enabling agents to work autonomously for hours.
  • The workflow starts with brainstorming specifications, followed by design approval, implementation plan generation, and subagent-driven execution with two-stage review.

The Trust Model Is Flipping

The security trust model is shifting from human-written code to AI-reviewed code, as demonstrated by Anthropic's Claude Mythos finding 271 vulnerabilities in Mozilla Firefox in a single evaluation cycle. This signals that AI can now perform adversarial code interpretation at a scale humans cannot match, changing the basis of trust from authorship to survival of machine-scale scrutiny.

  • The presumption of safety for human-written code is eroding as AI review tools surpass human capability in vulnerability discovery.
  • Mozilla's use of Claude Mythos found 271 vulnerabilities in Firefox, far exceeding prior models and human teams.

This exec offers 4 ways to be a successful innovator in the age of agentic AI

American Express's global innovation head Luke Gebb shares four key practices for successful innovators: keep learning, dive into tech, prepare to fail, and build partnerships. He also discusses Amex's plans for agentic commerce, including payments, offers, and proprietary experiences, with a timeline for mainstream adoption.

  • Stay curious and embrace a growth mindset
  • Deeply understand emerging technology and work closely with engineers

Is this sustainable? The senior engineer role after three years of AI

A senior engineer reflects on how AI has transformed the senior engineer role over three years: faster prototyping, increased coordination burden, expanded scope but squeezed mentoring and thinking time. The role became more powerful but less sustainable.

  • AI collapsed the gap between idea and demo, shifting from proposals to PoCs.
  • The role expanded in both hands-on coding and strategic writing, cutting into mentoring and deep thinking.

Taste Skill: An Anti-Slop Front End Framework for AI Agents

Taste Skill is an open-source frontend framework that enhances the design quality of AI-generated interfaces, preventing generic boilerplate looks. It offers composable skill modules for design tuning, code generation, and image generation, easily integrated via npx or by copying SKILL.md files.

  • Taste Skill uses adjustable design parameters (variance, motion, density) to give AI-generated UIs better taste
  • Includes specialized skills for design refinement, code generation, image generation, and more

Netflix is building an AI animation studio

Netflix is building a new internal studio called INKubator that aims to use AI to produce short-form animated content. The studio has quietly launched and is hiring for various roles including producers, software engineers, and CG artists. Its long-term technology strategy focuses on GenAI-enabled workflows, artist tooling, and scalable multi-show environments, with plans to eventually produce feature-quality content. While currently focused on shorts and specials, there are indications of potential expansion into longer-form content. The initiative could be used for Netflix's Clips feature or kids programming. However, the use of AI in animation has sparked significant backlash, including criticism from Hayao Miyazaki and protests at the Annecy Animation Film Festival.

  • Netflix is launching INKubator, a new AI animation studio focused on GenAI-driven short-form content.
  • The studio is led by former DreamWorks and A24 executive Serrena Iyer and is actively hiring.

AIluminode: Pre-Retrieval Cognitive Orientation Tool

AIluminode is a wieldable pre-retrieval cognitive-orientation instrument that helps AI tools check contextual posture before acting, using route polarity (OPEN, PROTECT, AUDIT, DEFER, BLOCK) to reduce erroneous exploration and context bleed.

  • AIluminode is a wieldable pre-retrieval cognitive orientation tool emphasizing posture before retrieval.
  • It uses a route polarity system (OPEN / PROTECT / AUDIT / DEFER / BLOCK) to guide contextual routing.

AI Rewriting Software Industry? 8-Year-Old Builds OS, One-Person Company Lands Million-Dollar Deals

At the 2026 China AIGC Industry Summit, Baidu's Miaoda product director Zhu Guangxiang described how AI has lowered the barrier to programming from writing code to chatting. According to Zhu, 87% of Miaoda users cannot code, an 8-year-old built an OS, and one-person companies (OPCs) are landing million-dollar contracts. Vibe coding turns the demand side into the supply side, enabling mass entrepreneurship.

  • Fourth programming revolution: natural language programming, massively expanding creators
  • 87% of Miaoda users have no coding skills; OPCs are the largest user group (16% entrepreneurs)

[AINews] Cognition raises $1B in $26B Series D

Cognition raises $1B at a $26B valuation, projecting >$1B ARR by year-end. The article covers inference efficiency trends, agent engineering, continual learning, new benchmarks, model releases, and coding agent productization.

  • Cognition raises $1B Series D at $26B valuation, ARR projected >$1B by EOY.
  • Inference optimization shifts to architectural level: EAGLE 3.1, DeepSeek V4-Pro hybrid attention, Xiaomi MiMo cache management.

Former Google and Apple Researchers Launch a Startup to Build AI's Missing Feed

A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta have launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.

  • Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
  • The startup raised $15M seed funding at $115M valuation, with investors including Jeff Dean and Fei-Fei Li.

Robinhood Agentic Trading

Robinhood launches Agentic Trading, allowing customers to connect their own AI agents to automate trading and credit card purchases with safety controls and a real-time activity feed.

  • Connect your own AI agents to Robinhood
  • Automate trading and credit card purchases

Show HN: BetterCallClaude – Open Source AI Legal Agents for Italy

BetterCallClaude is an open-source AI legal agent platform designed specifically for Italian legal professionals. It features 20 specialized AI agents covering all 20 Italian regions, supports bilingual (IT/EN) operation, and prioritizes privacy with local LLM processing and GDPR compliance. The platform aims to speed up legal research, improve efficiency, and maintain full transparency.

  • 20 specialized AI agents for Italian law
  • Bilingual support (Italian and English)

Amdahl's law for AI agents

This article applies Amdahl's Law to AI agents, arguing that speedup from parallel agents is bounded by the fraction of workflow requiring human judgment (H). It introduces the concept of 'self-liquidating H', where each human intervention produces an artifact that eliminates similar interventions in the future. It also emphasizes 'configurancy', meaning explicit behavioral commitments and conformance suites that encode human knowledge so agents can operate autonomously. Examples from ElectricSQL, Gas Town, and Ralph Loop illustrate the principles.

  • Speedup from AI agents is limited by the human judgment fraction H; reducing H is key.
  • Self-liquidating H: each human intervention should produce a reusable artifact (test, spec update) to prevent recurrence.
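
The bound described above can be made concrete with the classical Amdahl formula. This sketch assumes H plays the role of the serial fraction and that the remaining work divides evenly across agents; the article's own formulation may differ, and the function names are hypothetical.

```python
def agent_speedup(h: float, n_agents: int) -> float:
    """Classical Amdahl speedup, treating the human-judgment fraction h as serial."""
    if not 0.0 < h <= 1.0:
        raise ValueError("h must be in (0, 1]")
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    return 1.0 / (h + (1.0 - h) / n_agents)


def speedup_ceiling(h: float) -> float:
    """Limit of the speedup as the number of agents grows without bound."""
    if not 0.0 < h <= 1.0:
        raise ValueError("h must be in (0, 1]")
    return 1.0 / h


# With 20% of the workflow needing human judgment, ten agents give
# roughly a 3.57x speedup, and no number of agents can exceed 5x.
print(round(agent_speedup(0.2, 10), 2))
print(speedup_ceiling(0.2))
```

Under these assumptions, halving H from 0.2 to 0.1 raises the ceiling from 5x to 10x, which is why reducing H matters more than adding agents.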

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

The SignGAD framework reformulates graph anomaly detection by replacing fixed pipelines with self-designed task-conditioned workflows, and introduces a guarded final refit strategy to improve reliability under limited supervision.

  • SignGAD shifts from training a fixed detector to designing detection workflows
  • It selects suitable graph encodings and detector designs for task-specific anomaly evidence

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

This paper proposes Personalized Observation Normalization (PON) for federated reinforcement learning in heterogeneous environments. Each agent locally normalizes raw state inputs using a continuously updated running mean and variance, ensuring consistent scaling without overshadowing. Sharing normalization parameters is shown to be ineffective. Experiments on heterogeneous MuJoCo tasks demonstrate faster training and superior performance. The paper was accepted at IJCNN 2025.

  • Federated RL faces challenges in heterogeneous environments due to differing state-transition dynamics.
  • PON normalizes observations locally using per-agent running statistics.
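
Per-agent running normalization of this kind can be sketched with Welford's online algorithm. The summary does not say how the paper computes its statistics, so the update rule, the population-variance choice, the `eps` term, and the `RunningNormalizer` name are all assumptions made for illustration.

```python
import math


class RunningNormalizer:
    """Running mean and variance for one agent (Welford's online algorithm)."""

    def __init__(self, dim: int, eps: float = 1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps         # assumed small constant to avoid division by zero

    def update(self, obs):
        """Fold one raw observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Scale an observation with this agent's own statistics."""
        if self.count == 0:
            return list(obs)  # nothing observed yet, pass the input through
        return [
            (x - m) / math.sqrt(m2 / self.count + self.eps)
            for x, m, m2 in zip(obs, self.mean, self.m2)
        ]


# Each agent keeps its own instance; nothing here is shared across agents.
norm = RunningNormalizer(dim=1)
for value in (1.0, 2.0, 3.0):
    norm.update([value])
print(norm.mean, norm.normalize([2.0]))
```

Keeping one instance per agent mirrors the summary's point that normalization stays local rather than being aggregated with the model parameters.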

Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access

Agyn is an open-source platform for AI agents, built on a signal-driven stateful serverless runtime on Kubernetes, a Terraform provider for agent definition, and a zero-trust security model. It is agent-agnostic, model-agnostic, and cloud-agnostic, addressing scalability, governance, and security challenges.

  • Signal-driven stateful serverless runtime on Kubernetes for scalable execution
  • Agent and harness definition via Terraform provider (infrastructure as code)
In-site article

Show HN: The Two Pillars – A conceptual framework for post-AI software work

A paper argues that with generative AI dissolving the human capacity to write correct code as the binding constraint, software work reorganizes around two pillars: Mixer Mode (humans operating multiple judgment axes continuously like a sound engineer) and Meta-Software (software that observes, validates, and governs other software). The two pillars are inseparable, drawing a parallel to the historical transition from artisanal to mass production.

  • The production of code is ceasing to be the dominant problem in software organizations due to generative AI.
  • Mixer Mode describes a new human role where practitioners continuously operate multiple judgment axes.
In-site article

Your Future job will be to keep AI on task

Noah Smith argues that as AI becomes more capable, humans will shift from technical work to ensuring AI alignment—keeping AI focused on human goals. He draws parallels to 'Office Space' and warns about the rise of AI-generated 'slop'.

  • Humans will be needed to maintain AI alignment, ensuring AI stays on task.
  • The author compares future human roles to the 'Lumbergh' manager from Office Space.
In-site article

Safescript – A Language for AI Era

Safescript is a programming language for AI agents that proves safety properties statically before execution, eliminating the need for sandboxes or VMs. It compiles to a static DAG, enabling full visibility into data flow and host calls, with zero overhead and zero cold starts.

  • Statically enforces security without runtime sandboxing.
  • Compiles to a static DAG that traces all data flows and host calls.
In-site article

AIPass – Persistent agent workspace with identity, memory, and email

AIPass is a CLI-native scaffold that adds persistent memory, identity, and coordination to AI agents. Agents share a filesystem, use JSON files for memory, and require no cloud services or extra API keys. The project includes 13 core agents for multi-agent collaboration, task dispatching, quality audits, and real-time monitoring.

  • AIPass provides a CLI-native framework for persistent memory, identity, and coordination of AI agents.
  • All agents share a local filesystem with JSON file storage, no cloud dependency.
In-site article

Robinhood Will Let Agents Trade -- It Could Be a Trend

Given that the stock trading app operates in a highly regulated industry, the company’s move to use agents could prompt other finance firms to take a bold step and do the same.

  • Robinhood will allow AI agents to trade on its platform
  • This move is groundbreaking in a highly regulated industry
In-site article

Show HN: Liiists, a Markdown-first, iOS and CLI list app

Liiists is a markdown-first list app that works in the terminal, on iOS, and through AI agents via an MCP server, all reading and writing the same plain-text .md files. It offers a CLI, a native iOS app with a Share Extension and Siri support, and an MCP server for AI integration. It needs no account, has no lock-in, and supports iCloud sync or any folder, including an Obsidian vault.

  • Works across terminal, iOS, and AI agents using the same markdown files
  • CLI written in Go with no dependencies
In-site article

NeuralAgent 2.5: Personal AI Assistant Now with Voice Mode, Watch & Learn, and Parallel Agents

NeuralAgent 2.5 introduces Voice Mode, Watch & Learn, and Parallel Agents, allowing the AI to listen, speak, and perform multiple tasks simultaneously. Users can control their entire computer via natural language without touching the keyboard or mouse. The update also improves workflows, @ mentions, and memory.

  • Voice Mode enables two-way conversation; users speak commands and the AI responds and executes tasks.
  • Watch & Learn lets users demonstrate a task once, and the AI saves it as a repeatable workflow.
In-site article

Fixing agent failures in production: Interrupt 2026 recap | LangChain Newsletter

Recapping two days of Interrupt 2026 — LangSmith Engine, Sandboxes GA, LangChain Labs, and 23 talks from teams at LinkedIn, Rippling, Cisco, and more. Now on demand.

  • LangSmith Engine automates failure analysis from production traces.
  • LangSmith Sandboxes reaches General Availability for secure agent execution.
In-site article

Snowflake Commits $6B to AWS as It Pushes Deeper into AI

Snowflake has committed $6 billion over five years to Amazon Web Services for Graviton compute and AI infrastructure, marking its largest cloud spend commitment. The deal covers AWS's ARM-based Graviton processors and GPU-accelerated EC2 instances for AI training and inference. Snowflake will also expand to 10 new AWS regions and leverage cost-efficient Graviton instances for its data warehousing business to free up resources for AI workloads.

  • Snowflake commits $6 billion over five years to AWS for Graviton and GPU compute.
  • The deal supports AI model training and inference using AWS instances.
In-site article

Building AI agents for business support using Amazon Bedrock AgentCore

In this post, we share how the AWS Generative AI Innovation Center (GenAIIC) collaborated with Works Human Intelligence (WHI) to build two AI agents using Amazon Bedrock AgentCore. We discuss the challenges encountered and the solutions that reduced costs by up to 97% while improving operational efficiency.

  • AI agents automate routine HR tasks such as commuting allowance approval and browser operations.
  • Migration to AgentCore and a Strands Agents architecture reduced costs by up to 97%.
In-site article

From data overload to actionable insights: How Verizon Connect scaled agentic AI to 100,000 users

Verizon Connect built an agentic AI solution on AWS to transform overwhelming fleet data into clear, actionable insights for 100,000 users daily. The architecture uses serverless anomaly detection, Strands Agents for dynamic reasoning, and Amazon Nova Lite to cut input token costs by 70%. This post covers architectural decisions, implementation challenges, and measurable results.

  • Agentic AI processes 500 million daily data points from 1.2 million vehicles to serve 100,000 users.
  • Serverless statistical models handle anomaly detection, avoiding LLM pitfalls with raw tabular data.
In-site article

How AWS SMGS uses an AI-powered conversational assistant to transform business management with Amazon Bedrock AgentCore

AWS SMGS built NarrateAI using Amazon Bedrock AgentCore to deliver business intelligence at scale. The solution features a two-layer architecture separating batch narrative generation from real-time interaction, specialized AI agents for routing and validation, and key engineering patterns for production deployment, enabling natural language queries, row-level security, and role-tailored experiences.

  • NarrateAI uses a two-layer architecture (batch processing + real-time interaction) to overcome latency and data fragmentation in traditional BI.
  • Amazon Bedrock AgentCore enables multi-agent orchestration for natural language queries and context-aware responses.
In-site article

This AI-free Google alternative is surging in popularity - how to try it for yourself

DuckDuckGo, an AI-free search alternative, is seeing a surge in users due to Google's AI Overviews. This article explains how to use DuckDuckGo without AI for private searching and browsing.

  • DuckDuckGo installs surged after Google I/O 2026, with iOS app peaking at 69.9% growth.
  • DuckDuckGo offers both AI-free search and AI chat options, giving users choice.
In-site article

Powering agentic AI sales strategy with Amazon Bedrock AgentCore

AWS Sales built Field Advisor on Amazon Bedrock AgentCore to orchestrate over 20 domain-specific agents, reducing cognitive load for sales reps and improving efficiency. The solution saved up to 2 hours per week per rep and reduced latency by 41%.

  • Field Advisor orchestrates 20+ specialized agents with a single conversational interface.
  • Human-in-the-loop workflows ensure data accuracy and accountability.
In-site article

Robinhood lets AI agents trade shares and make credit card purchases for customers

Robinhood now lets customers connect AI agents like Anthropic's Claude to a separate investment account via MCP. The agents can autonomously trade stocks and make credit card purchases. US regulator FINRA has flagged such agents as a new risk area, warning about unchecked decisions. Robinhood also admits the product isn't for everyone.

  • Robinhood enables AI agents such as Claude to be connected to investment accounts via MCP.
  • AI agents can autonomously trade stocks and initiate credit card purchases.
In-site article

“Tokenmaxxing is real, expensive & it’s spreading”: New tools emerge to stop AI budgets from exploding

Tokenmaxxing, the unrestrained use of AI tokens, is causing enterprise budget blowouts. Uber’s CTO recently admitted to overspending on Anthropic’s Claude Code. Lanai’s new Token Tuner helps companies map token consumption to workflows and outcomes, encouraging a shift from tokenmaxxing to outcomemaxxing.

  • Tokenmaxxing is causing AI budget overruns at Uber and other companies.
  • Lanai's Token Tuner tracks token usage against workflows and outcomes, providing efficiency scores and model recommendations.
In-site article

Get a Good Return on Your AI Investments

O'Reilly's Infrastructure & Ops superstream explored the infrastructure needs, costs, and security challenges of AI workloads. DORA's report shows AI increases code delivery by about 10% but reduces stability, adding verification costs. Experts emphasize platform engineering, governance, and cognitive debt, recommending investment in internal platforms to ensure production readiness for AI applications.

  • AI tools boost individual productivity, but team delivery stability decreases, so the added cost of verification (the 'verification tax') has to be accounted for.
  • AI amplifies good processes and bad ones alike; organizations should proactively improve their processes rather than expect technology to fix them.
In-site article

AI Factories: The New Infrastructure of Intelligence

AI factories are a new class of infrastructure that convert energy into tokens—the unit of production for reasoning models, agents, and intelligent systems. As agentic AI scales, performance per watt and cost per token become the critical economics. This article explores how AI factories work, their full-stack optimization, and how NVIDIA's latest hardware drives efficiency.

  • AI factories convert energy into tokens, serving as the 'power plants' of the AI age.
  • Agentic AI creates deeper, more complex inference workloads requiring real-time orchestration.
In-site article

Powering Inference for the Continual Learning Era

Baseten and Trajectory have built a production-grade inference pipeline for continual learning, where models are continuously updated from production traces. The pipeline compresses the time from training to deployment to roughly one hour, enabling models that improve through usage.

  • Continual learning allows models to improve continuously from production usage rather than static releases.
  • Baseten and Trajectory developed a pipeline that merges LoRA adapters, validates, and deploys them with A/B routing and provenance tracking.
In-site article

Turn Azure Data into an AI-Ready Knowledge Base | Pinecone

Pinecone offers a deployable template that automates the pipeline from Azure Blob Storage to a serverless Pinecone index, enabling fast semantic search and AI retrieval for enterprise data.

  • Pinecone automates the entire ingestion pipeline from Azure Blob Storage to a serverless vector index.
  • The template handles document parsing, text chunking, embedding, and indexing out of the box.
In-site article
Tools

Meta launches Instagram, Facebook, and WhatsApp subscriptions

Meta rolls out consumer subscription plans for Instagram, Facebook, and WhatsApp globally, with prices from $2.99 to $3.99 per month, offering extra features. The company also begins testing new subscriptions for businesses, creators, and Meta AI users.

  • Meta launches Instagram Plus ($3.99/mo), Facebook Plus ($3.99/mo), and WhatsApp Plus ($2.99/mo) globally
  • Subscribers get profile customization, super reactions, story insights, and more
In-site article

These new iOS 27 renders hint at Siri’s big redesign

Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass, according to Bloomberg renders. The images show a pill-shaped chat bubble from the Dynamic Island, a standalone Siri app, and updates to Camera and Photos apps with AI features. Apple will reveal the final design at WWDC in June.

  • iOS 27's Siri will feature a ChatGPT-like interface with a pill-shaped bubble emerging from the Dynamic Island.
  • Users can choose between Ask, Siri, and ChatGPT from a dropdown menu.
In-site article

I'm an iPhone user, but Gemini with Android Auto beats Siri in the car any day - here's why

As an iPhone owner, I primarily use Siri through CarPlay when I'm driving. Apple's voice assistant can handle basic tasks, but since my Toyota Camry supports Android Auto, I wanted to see how Google Gemini would fare. With Gemini, you can send emails, get restaurant info, play games, and more. Here's how to set it up and my experience.

  • The author, an iPhone user, finds Gemini with Android Auto superior to Siri in the car.
  • Gemini handles a wide range of tasks from basic commands to complex interactions.
In-site article

Meta One: Zuckerberg finally puts a price tag on all that AI spending

Meta is rolling out paid add-ons for Instagram, Facebook, and WhatsApp worldwide while building a separate paid AI offering. This marks the first time Meta has clearly monetized its AI investments.

  • Meta launches paid add-ons globally for Instagram, Facebook, and WhatsApp.
  • A separate paid AI product is also in development, monetizing AI investments.
In-site article

Dirk and Linus discuss AI and kernel development

A subscriber-only article on LWN.net by Joe Brockmeier, covering a discussion between Dirk and Linus on AI and kernel development. The full content is behind a paywall.

  • Article by Joe Brockmeier, dated May 25, 2026
  • Presented at OSSNA
In-site article

Amazon builds its own AI production platform and greenlights three AI animated series for Prime Video

Amazon MGM Studios and AWS are launching a "GenAI Creators' Fund" that gives filmmakers money and access to the in-house AI platform "Project Nara." Three animated series are already in production - the teams had five weeks for their pilots. Amazon says it now has the "only end-to-end AI content ecosystem in the industry."

  • Amazon launches GenAI Creators' Fund with access to Project Nara
  • Three AI-animated series greenlit for Prime Video
In-site article

YouTube will let you ask AI to make a custom video feed

YouTube launches an AI feature that generates personalized video feeds from user prompts, available to US users on mobile and desktop with English support.

  • Users can create custom feeds by entering descriptive prompts like 'help me unwind with guided meditations under 10 minutes'.
  • The feature is similar to Spotify's prompted playlists and Instagram's Reels algorithm control.
In-site article

ElevenLabs Music v2 promises opera-to-metal transitions without losing musical coherence

ElevenLabs has released Music v2, an upgraded AI music generation model that can shift between genres like opera, heavy metal, and rap within a single song. A new inpainting feature allows users to regenerate specific sections without affecting the rest.

  • Music v2 enables seamless genre transitions within a single song.
  • New inpainting feature allows targeted regeneration of specific sections.
In-site article

Don't Delegate the Joy of Building to AI

The article warns developers that while AI can accelerate coding, over-reliance on AI may deprive them of the joy of building, such as finding elegant solutions, designing clean architectures, and receiving user feedback.

  • AI speeds up code writing but may remove the pleasure of problem-solving.
  • Key experiences in development (e.g., architecture design, product releases) are hard for AI to replace.
In-site article

TopRec (toprec.io) – AI screening and CRM for recruiters and hiring teams

TopRec is an AI-powered platform that helps recruiters rank candidates and build a self-maintaining CRM. It is deliberately not promoted as a PWA, to prevent caching issues; it is meant to be used as a regular website.

  • AI-based candidate ranking for efficient screening
  • Self-building CRM that automatically updates
In-site article

AI Cheats [pdf]

A PDF report on AI cheating, but the content cannot be directly parsed.

  • Cannot extract text from PDF
  • Report likely from METR organization
In-site article

I found an easy way to automatically keep AI out of my search results - and it works in nearly every browser

Tired of AI results in your search? This article explains how to add a custom search engine to exclude AI results, with step-by-step instructions for Chrome, Firefox, Safari, and other browsers.

  • Add a custom search engine with the URL https://www.google.com/search?q=%s&udm=14 to remove AI results.
  • Works in Firefox, Chrome, and most browsers; Safari requires a free extension.
In-site article
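
The URL template in the item above uses `%s` as the placeholder that browsers replace with the typed query when a custom search engine is configured. The sketch below mimics that substitution to show what the resulting URL looks like; the function name `web_only_search_url` is hypothetical, and the effect of the `udm=14` parameter is taken from the article rather than verified here.

```python
from urllib.parse import quote_plus

# Template copied from the item above; "%s" marks where the query goes.
TEMPLATE = "https://www.google.com/search?q=%s&udm=14"


def web_only_search_url(query: str) -> str:
    """Fill the template the way a browser's custom search engine would."""
    # quote_plus escapes characters such as "&" so they stay inside the query.
    return TEMPLATE.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(web_only_search_url("rust async traits"))
```

Escaping the query matters because an unescaped `&` in the search terms would otherwise be read as the start of a new URL parameter.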

YouTube will try to automatically flag AI videos starting this month

YouTube is tightening its AI labeling rules. Labels for photorealistic or heavily AI-altered content will now show up in more visible spots, below the player for long videos and as an overlay on Shorts. Starting May 2026, an automatic detection system will flag AI-generated content even if creators don't disclose it. Recommendations and monetization won't be affected.

  • YouTube tightens AI labeling with more visible labels for altered content.
  • From May 2026, automatic detection will flag AI content even if not disclosed by creators.
In-site article
Models

Google launches a tiny board that runs Gemma 3 locally

Google unveiled the new Coral Board at Google I/O - a compact single-board computer for on-device AI. It runs Gemma 3 270M locally and features a RISC-V based NPU.

  • Coral Board is a compact SBC for on-device AI, targeting headphones, AR glasses, and smartwatches
  • It features a RISC-V based Coral NPU and a Synaptics Astra SL2619 chip
In-site article

Tweaking Local Language Model Settings with Ollama

This article dives deep into Ollama's configuration engine, covering how to fine-tune local language model parameters using the Modelfile, optimize hardware performance with server environment variables, and format prompt flows with Go template syntax.

  • The Ollama Modelfile is a declarative configuration file that defines model behavior, including base model, system instructions, and parameters.
  • Sampling parameters (temperature, Top-K, Top-P, Min-P) control the creativity and determinism of the model's outputs.
In-site article
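
To illustrate how the sampling parameters named above shape a model's output, here is a generic sketch of temperature, Top-K, Top-P, and Min-P filtering over a small list of logits. It is not Ollama's implementation: the function name `filter_distribution`, the order in which the filters are applied, and the default values are all assumptions chosen for illustration.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=40, top_p=0.9, min_p=0.0):
    """Return {token_index: probability} after applying the four filters.

    Assumed order: temperature -> Top-K -> Top-P -> Min-P -> renormalize.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: dividing logits by a value below 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract peak for stability
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-K: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Min-P: drop tokens whose probability is below min_p times the top token's.
    threshold = min_p * kept[0][0]
    kept = [(prob, index) for prob, index in kept if prob >= threshold]

    norm = sum(prob for prob, _ in kept)
    return {index: prob / norm for prob, index in kept}
```

Lower temperature and tighter Top-K, Top-P, or Min-P settings leave fewer candidate tokens with meaningful probability, which makes output more deterministic; looser settings leave more candidates and more variety.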

Rivian’s software chief thinks you don’t need CarPlay or buttons

In a Decoder podcast interview, Rivian CSO Wassym Bensaid discusses the VW joint venture, the new AI-powered Rivian Assistant, and why he believes voice interfaces will replace buttons and CarPlay isn't needed.

  • Rivian's joint venture with Volkswagen (RV Tech) combines Rivian's software culture with VW's scale.
  • The Rivian Assistant is an AI agent deeply integrated into the vehicle's zonal architecture.
In-site article

World Models Take Over from Language Models: Company Pioneers Physical AGI 'Dual Pyramid' System, Universal Robots Enter the 'Home Era'

Jijia Vision unveiled the world's first physical AGI 'Dual Pyramid' system, launching the home robot Shiguang S1 with 100-unit household orders, targeting the 'GPT-3 moment' of physical AGI within 12 months.

  • Jijia Vision introduces the 'Dual Pyramid' system comprising a data pyramid and an algorithm pyramid for physical AGI.
  • The Shiguang S1 home robot adopts a wheeled-arm configuration and has secured 100-unit real-home orders.
In-site article

Mistral rebrands LeChat as Vibe, betting its chatbot's future is as a full-blown work agent

Mistral AI is renaming its chatbot Le Chat to Vibe and bundling chat, coding agents, and a new Work Mode under one brand. Work Mode connects to Google Workspace, Outlook, Slack, or GitHub and independently handles tasks such as emails, reports, or pull requests. The price of the Pro plan has been cut from €17.99 to €14.99, although Mistral has not specified concrete usage limits. The move positions the company more directly against the agent-based offerings from OpenAI, Google, and Anthropic.

  • Mistral AI rebrands Le Chat as Vibe, integrating chat, coding agents, and a new Work Mode.
  • Work Mode connects to Google Workspace, Outlook, Slack, or GitHub to autonomously handle tasks.
In-site article

Show HN: Local Coding Agent with LLMs to Delegate Tool Calls to Small AI Models

Open Agent Tools (oats) is a self-hosted AI framework that enables small-to-large local models to use local source code for tool-calling, freeing up expensive large model tokens by delegating tasks to smaller models.

  • oats allows local AI models to use local source code for tool-calling without HTTP or MCP.
  • It mines over 20,000 GitHub repos to create reusable prompt indices.
In-site article

Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate

Perplexity AI open-sourced a Rust reimplementation of their Unigram tokenizer, achieving 5x lower latency than Hugging Face's tokenizers crate and reducing CPU utilization by 5-6x in production. The optimizations include double-array trie, bitmap packing, and huge pages.

  • Perplexity AI rewrote the Unigram tokenizer in Rust, achieving 5x lower p50 latency vs Hugging Face tokenizers crate.
  • Three optimizations: double-array trie, bitmap and cache-line packing, and huge pages.
In-site article

Mistral to explore designing own chips, CEO says

Mistral AI CEO Arthur Mensch confirms the company is exploring custom chip development to reduce infrastructure costs and compete with OpenAI and Anthropic. The French startup also announced a new inference data center in France and an enterprise agent platform called Vibe.

  • Mistral AI is considering designing its own custom chips to lower deployment costs.
  • The company announced a new data center in France dedicated to AI inference.
In-site article

A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

This tutorial builds a complete pgvector playground in Google Colab, covering installation, embedding creation, HNSW indexing, semantic search, filtered search, distance metric comparisons, half-precision storage, binary quantization, sparse vector search, hybrid retrieval, and vector aggregation. All using open-source tools without external API keys.

  • Set up PostgreSQL with pgvector extension in Google Colab from scratch.
  • Generate embeddings with SentenceTransformers and build HNSW indexes for efficient search.
In-site article

7B Model Beats o3 and GPT-5: Medical AI Agents Teach Models Where and How to Look

The LeapQuest team at Shanghai Innovation Institute, in collaboration with multiple universities, introduces a new medical AI paradigm that enables models to actively use visual tools during reasoning, transforming from passive input receivers to active evidence seekers. Two papers are accepted at ICML 2026.

  • LeapQuest proposes Ophiuchus and MedScope for medical images and videos, adopting the Think with Images/Videos paradigm.
  • Ophiuchus-7B achieves an average score of 68.0 on 8 VQA benchmarks, surpassing o3 (62.2) and GPT-5 (59.9).
In-site article

Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning

This paper introduces Simulation-Informed Diffusion (SID), a decentralized framework in which each robot uses constraint-aware diffusion models (CADM) to first simulate its neighbors' future trajectories and then plan its own trajectory under safety constraints. SID enables a minimal communication scheme that is triggered only in congested scenarios, and it outperforms baselines while scaling to 108 robots and 160 obstacles.

  • SID uses CADM to simulate neighbor trajectories for decentralized collision avoidance
  • Minimal communication scheme coordinates only when necessary
In-site article

Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

This paper presents a transformer-based architecture called Trinity that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation in a unified network. It segments terrain regions based purely on visual appearance without predefined labels or robot-dependent traversability scores, enabling robot-agnostic visual terrain priors for downstream tasks. The authors extend the OAISYS simulator to create the RUGDSynth synthetic dataset and provide the EXTerra real-world dataset. Experiments demonstrate the approach's effectiveness in complex outdoor environments.

  • Trinity architecture unifies class-agnostic terrain segmentation with semantic segmentation
  • Segments terrains based on visual appearance without predefined labels for better transferability
In-site article

Agentic Language-to-Objective Synthesis for Optofluidic Assembly

Researchers introduce Speak-to-Objective, a modular agentic pipeline that uses a conditioned LLM to translate spoken or written commands into fully differentiable objective functions for assembling microparticles in a constraint-aware inverse solver and on an experimental optofluidic platform. The approach separates what to assemble from how to actuate, learns from user feedback, and demonstrates natural-language-programmable microscale assembly using laser-induced thermoviscous flows.

  • Speak-to-Objective pipeline translates natural language into differentiable objective functions for microparticle assembly.
  • It uses a perceive->compose->propose->act->report&learn loop, treating the objective as the interface between intent and actuation.
In-site article

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation

Uni-LaViRA is a unified agentic architecture for embodied navigation that reduces each navigation decision to a single Language-Vision-Robot Actions Translation. It leverages pretrained MLLMs in a zero-shot manner across four task families and four real robots, using TODO List Memory and Second Chance Backtrack mechanisms to achieve self-correcting navigation without training.

  • Generality in navigation can be obtained structurally, not only through data scale.
  • Uni-LaViRA decomposes navigation into a language action (semantic direction) and a vision action (pixel target), both within the output manifold of MLLMs.
In-site article

SCALE-COMM: Shared, Contrastively-Aligned Latent Embeddings for MARL Communication

SCALE-COMM is a self-supervised framework that decouples communication learning from policy optimization, learning compact, stable, and policy-relevant latent messages to improve coordination in multi-agent reinforcement learning. It outperforms existing methods on benchmarks and a realistic warehouse task, offering better stability, sample efficiency, and throughput.

  • Decouples communication learning from policy optimization to reduce interference.
  • Uses contrastive learning to enforce consistency across agents and time.
In-site article

Representation-Conditioned Diffusion Models for Guided Training Data Generation

This work proposes representation-conditioned diffusion models that leverage learned representations from DINOv2, DINOv3, and CLIP to generate synthetic image data. On ImageNet100, this approach outperforms class-conditioned generation by +10.76 p.p. top-1 accuracy. Scaling synthetic data can even surpass real-data training by +2.0 p.p. The method also excels in data augmentation and sample filtering, offering a promising way to augment or replace real datasets in large-scale visual learning.

  • Representation-conditioned diffusion models outperform class-conditioned ones by 10.76 p.p. on ImageNet100.
  • Scaled synthetic datasets can beat real-data-trained classifiers by 2.0 p.p. top-1 accuracy.
In-site article

Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures

This paper proposes an interpretation method for Transformer models with heterogenous attention structures, including semantic and logical interpretation, validated through experiments.

  • Categorizes Transformer attention into homogenous and heterogenous types; heterogenous attention processes information from different sources.
  • Proposes a generic interpretation method for heterogenous attention structures.
In-site article

Fine-Tuning Vision-Language Models for Understanding Current Damage and Scoring Priority with Quality Guard Agent

This paper proposes a method for automating bridge damage understanding and repair priority scoring using fine-tuned Vision-Language Models (VLMs). The authors fine-tune LLaVA-1.5-7B with QLoRA on up to 4,000 paired bridge damage images and inspection text records, evaluating on a fixed test set of 800 images. Results show that 2,000 training samples achieve near-optimal validation loss in 2.9 hours, with diminishing returns beyond that. A two-stage Quality Guard using a fine-tuned Swallow-8B SLM rejects low-quality VLM outputs before priority scoring.

  • Fine-tuned LLaVA-1.5-7B model for automated bridge damage identification and priority scoring
  • 2,000 training samples achieve near-optimal performance; more data yields diminishing returns
In-site article

From Affect to Complex Behavior: Advancing Multimodal Human-Centered AI at the 10th ABAW Workshop & Competition

The 10th ABAW Workshop and Competition at CVPR 2026 advances multimodal human-centered AI by introducing new challenges including emotional mimicry intensity estimation, ambivalence/hesitancy recognition, and fine-grained violence detection, alongside traditional affect estimation and recognition tasks. The competition leverages large-scale in-the-wild datasets, and the paper track covers a broad range of topics from pose estimation to fairness and robustness.

  • ABAW 2026 introduces novel challenges: emotional mimicry intensity, ambivalence recognition, and violence detection.
  • Workshop continues dual structure with competition and paper tracks.
In-site article

Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

Large language models (LLMs) are increasingly used as proxies for computational social analysis, but their ability to faithfully represent human communities' 'thick descriptions' remains a critical challenge. This paper introduces CARE (Community-Aware Reaction Evaluation), a reaction-centered framework that benchmarks LLM-simulated discourse against authentic community responses to real-world news. By characterizing a fine-grained spectrum of illocutionary tones, the diagnosis reveals a persistent 'realism gap': steering LLMs with explicit community prompts fails to inherently improve simulation fidelity. Analysis further identifies divergent behavioral signatures among frontier models, suggesting current alignment strategies are insufficient for capturing the sociolinguistic dynamics of online groups.

  • CARE framework evaluates LLM simulation fidelity by analyzing authentic community reaction tones
  • Current LLM alignment strategies fail to adequately capture online community sociolinguistic dynamics
In-site article

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

A new framework called FLUID adapts autoregressive language models to diffusion models for efficient parallel text generation, using Strictly Causal Alignment to reuse GPT checkpoints and Elastic Horizons to dynamically adjust denoising steps. It achieves state-of-the-art performance with significantly reduced training costs.

  • FLUID bridges AR and diffusion models by enforcing Strictly Causal Alignment, enabling initialization from GPT-style checkpoints.
  • Elastic Horizons uses entropy to dynamically adapt denoising strides based on local information density.
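As a rough illustration of the entropy idea in the bullet above, the sketch below maps the Shannon entropy of a next-token distribution to a stride length. This is not FLUID's actual rule: the function names, the linear mapping, and the `max_stride` parameter are all assumptions made for illustration. It only shows how a confident (low-entropy) distribution could justify a longer stride and an uncertain one a shorter stride.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def choose_stride(probs, max_stride=8):
    """Hypothetical mapping from entropy to stride (not taken from the paper).

    Low entropy (a confident prediction) gives a stride near max_stride;
    high entropy (an uncertain prediction) gives a stride near 1.
    """
    h_max = math.log(len(probs))  # entropy of the uniform distribution
    confidence = 1.0 - shannon_entropy(probs) / h_max if h_max > 0 else 1.0
    return max(1, round(1 + confidence * (max_stride - 1)))


# Illustrative distributions over a 4-token vocabulary (made-up numbers).
uncertain = [0.25, 0.25, 0.25, 0.25]
confident = [1.0, 0.0, 0.0, 0.0]
print(choose_stride(uncertain), choose_stride(confident))
```

A real system would presumably compute the entropy from model logits and tune the mapping empirically; the linear form here is only the simplest choice.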
In-site article

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

Researchers identify a Stability-Expressivity Gap in spoken language models when using synthetic data for low-resource languages, and propose two self-alignment frameworks (DGSA and TDSC) that recover prosodic variability and outperform commercial systems like ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.

  • Spoken Language Models (SLMs) for low-resource languages suffer from a trade-off between phonetic accuracy and prosodic expressivity when trained on synthetic data.
  • The proposed Disentanglement-Guided Self-Alignment (DGSA) recovers expressivity by separating prosody and timbre.
In-site article

BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

BioELX is a novel two-stage framework for cross-lingual biomedical entity linking that requires no annotated training data. It enhances SapBERT with multilingual aliases from Wikidata and uses a pre-trained LLM for context-aware disambiguation. Experiments on five benchmarks show significant improvements, especially for low-resource languages like Turkish, Korean, and Thai.

  • Proposes BioELX, a zero-shot cross-lingual BEL framework using alias-based retrieval and LLM ranking.
  • In Stage 1, enriches SapBERT with multilingual aliases from Wikidata for better candidate retrieval.
In-site article

RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.

  • RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
  • On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
In-site article

Unlocking Fine-Grained and Within-Utterance Speaking Style Control in Prompt-Based Text-to-Speech Models

This paper proposes novel techniques for inter-utterance style interpolation and intra-utterance style transition in prompt-based TTS models, addressing limitations of coarse global control. Methods include direction vector interpolation and KV-cache swapping with sliding-window attention masking. Experiments show high success rates in gender conversion and smooth style transitions within utterances.

  • Inter-utterance interpolation via direction vectors between contrastive style prompts enables smooth transitions.
  • Intra-utterance transition uses KV-cache swapping and sliding-window masking to overcome attention bias.
In-site article

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.

  • ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
  • LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.
In-site article

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

ICG is a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant cover images. It extracts semantic features via meta tokens, refines them with user embeddings, and injects personalized context into diffusion models. A multi-reward learning strategy combines public rewards with a personalized preference model, eliminating the need for labeled supervision. Experiments show improvements in image quality, semantic fidelity, and personalization, boosting user appeal and recommendation accuracy.

  • ICG integrates MLLM prompting with personalized preference alignment for end-to-end cover image generation.
  • Semantic features are extracted via meta tokens and refined with user embeddings for diffusion model injection.
In-site article

Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift

This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning. ADS decouples logit shift into architecture and data dependencies, requiring only a few data samples to capture shift trends. Experiments across over 175 architectures show strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error for reliable CL model selection across three datasets and six scenarios.

  • Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
  • Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
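The summary above reports agreement between ADS and logit shift as a Spearman rank correlation. For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of how it is computed: rank both variables, then take the Pearson correlation of the ranks. The example scores are invented for illustration and are not data from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-architecture scores (made-up numbers).
proxy_scores = [0.12, 0.40, 0.33, 0.85, 0.61]
measured_shift = [1.1, 2.9, 2.0, 7.5, 3.2]
print(round(spearman(proxy_scores, measured_shift), 3))
```

Because the statistic depends only on ranks, any monotonic relationship scores 1.0 even when it is far from linear, which is why it suits a proxy that only needs to order candidate models correctly.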
In-site article

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

This survey explores how Mixture-of-Experts (MoE) effectively addresses multimodal learning challenges from three perspectives: efficient engine, representation learner, and adapter, while identifying research gaps.

  • MoE enables scalable multimodal modeling by decoupling computational cost from parameter growth.
  • MoE integrates complementary expert knowledge for enriched alignment and interaction representations.
In-site article

$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

This paper presents $E^3$-Agent, an executable and evolving agent for resource management of edge AIGC. It separates a fast-path router from a slow-path LLM meta-controller, learns online from execution feedback, and adapts to unknown time-varying service-time mappings. Evaluation shows 65%-73% latency reduction over static baselines and effective stutter suppression.

  • Edge generative inference faces unknown per-device performance and non-stationarity.
  • $E^3$-Agent uses a dual-path architecture: fast router + slow LLM meta-controller.
In-site article

Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems

This paper presents a multi-agent architecture for autonomous insight discovery over real-time data streams. It uses Apache Kafka, Flink, and large language models to continuously generate, validate, and visualize hypotheses, shifting from reactive query-driven analytics to proactive discovery-driven systems.

  • Proposes multi-agent architecture for autonomous discovery of insights in real-time streams.
  • Integrates Kafka, Flink, and LLMs for hypothesis generation, validation, and visualization.
In-site article

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

LaneRoPE enables multiple LLM sequences to collaborate during generation via inter-sequence attention and extended RoPE, improving accuracy on math reasoning tasks with minimal architectural changes and negligible inference overhead.

  • Introduces inter-sequence attention mask to make sequence sampling dependent.
  • Extends RoPE to capture relative positions both within and across sequences.
In-site article

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

This paper proves that large language models have a fundamental limitation in performing causal discovery: methods like supervised fine-tuning, direct preference optimization, and in-context learning cannot distinguish between causal graphs that generate similar observational data. The authors propose Agentic Causal Bayesian Optimization (A-CBO), where a frozen language model serves as an interventional oracle and an external Bayesian loop converges to candidate graphs in logarithmically many rounds. On Corr2Cause, A-CBO matches fine-tuned baselines without any training; on Extended Corr2Cause (scaling to 24 variables and 18K test samples), A-CBO significantly outperforms both fine-tuning and preference optimization.

  • Proves that LLM failure in causal discovery is fundamental, due to a kernel obstruction theorem
  • Proposes A-CBO, combining a frozen LLM with external Bayesian optimization
In-site article

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

DynaSchedBench introduces a diagnostic framework for DFJSP using a Sequential Event-Space Calibrator (SESC) to generate difficulty-stratified instances via Schedule Stress Index (SSI). It identifies an 'Observability Paradox' in LLM-based scheduling agents: providing oracle access to full structural information degrades performance compared to concise information. Tool-augmented and refinement strategies also fail to reliably improve performance.

  • DynaSchedBench uses SESC and SSI to generate calibrated DFJSP instances, outperforming evolutionary baselines in efficiency.
  • LLM agents exhibit an Observability Paradox: full structural information harms decision-making.
In-site article

Soro: A Lightweight Foundation Model and Chatbot for Tajik

Soro is a family of Tajik-specialized conversational LLMs built on Gemma 3, using continual pretraining on 1.9B Tajik tokens and 40K instruction-tuning examples. It substantially outperforms same-size Gemma 3 on Tajik benchmarks while retaining English performance. FP8/INT4 quantization preserves gains for edge deployment. An education pilot is underway in Tajikistan.

  • Based on Gemma 3, with 1.9B token Tajik continual pretraining and 40K instruction tuning examples.
  • Substantially outperforms same-size Gemma 3 on Tajik benchmarks, retains English performance.
In-site article

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

This paper introduces an LLM-based architecture to detect and quantify the intensity of human values in text. The architecture comprises three coordinated modules that can adapt to various value theories, and experiments on the ValueEval dataset show good detection performance.

  • Proposes a modular LLM architecture for identifying human values in text, avoiding dependence on specific value theories or complex prompt engineering.
  • Three modules: generate structured value specifications, label texts using them, and assign graded support or resistance based on rhetorical and semantic evidence.
In-site article

Language Modeling Materializes a World Model of Protein Biology [pdf]

This paper presents a world model of protein biology realized through language modeling, demonstrating how large-scale language models can understand and predict protein structure and function.

  • Language models can capture complex patterns in protein sequences
  • The model excels in protein structure prediction and function annotation
In-site article

Sakana AI Proposes DiffusionBlocks: a Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules

Researchers from Sakana AI and the University of Tokyo propose DiffusionBlocks, which trains transformer-based networks one block at a time, reducing training memory by a factor of B (where B is the number of blocks) while maintaining performance across diverse architectures. The method interprets residual connections as Euler steps of reverse diffusion, enabling a principled local objective via score matching.

  • DiffusionBlocks partitions networks into B independently trainable blocks, reducing memory by B×.
  • It leverages the connection between residual networks and diffusion models to provide a theoretically grounded local training objective.
In-site article

sqlite AGENTS.md

SQLite has added an AGENTS.md file to clarify its policy on AI-generated contributions: it does not accept pull requests without prior agreement, and does not accept agentic code at all, though it welcomes bug reports with reproducible test cases. The forum has been flooded with AI-generated bug reports, leading to a separate bug forum.

  • SQLite added AGENTS.md to define AI contribution policy
  • Pull requests require prior agreement and legal paperwork
In-site article

Reliable LLM Inference at Scale

At Databricks, we’ve built a unique inference platform that serves every frontier model, from open source to proprietary, powering some of the largest agentic applications. Serving over 120T tokens per month, we tackle challenges of reliability and latency through abstractions like model units for capacity management, cost-aware load balancing and autoscaling that save over 80% of GPU costs, and runtime reliability mechanisms including black-box health checks that detect silent failures. Profiling multimodal bottlenecks unlocked 3x throughput gains.

  • Databricks' inference platform serves frontier models including open source and proprietary, handling 120T tokens/month.
  • Model units provide a VM-like abstraction for capacity management, enabling cost-aware routing and scaling.
In-site article

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM launch ITBench-AA, a benchmark for agentic enterprise IT tasks focusing on Site Reliability Engineering. Frontier models score below 50%, with Claude Opus 4.7 leading at 47%. The benchmark evaluates models on Kubernetes incident response, requiring diagnosis from logs and traces.

  • Claude Opus 4.7 leads at 47%, with GPT-5.5 at 46% and Qwen3.7 Max at 42%.
  • All frontier models score below 50%, making ITBench-AA one of the least saturated agentic benchmarks.
In-site article

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.

  • Polar enables RL training on any agent harness via a model API proxy without modifying the harness code
  • Achieves up to 22.6 point improvement on SWE-Bench Verified using GRPO on Qwen3.5-4B across four coding harnesses
In-site article

I think Anthropic and OpenAI have found product-market fit

The article argues that Anthropic and OpenAI have achieved product-market fit by shifting enterprise customers to API-based pricing and capitalizing on coding agent products. This inflection point, which began with model improvements in November 2025, accelerated in April 2026 with new model releases and pricing changes.

  • Both Anthropic and OpenAI have moved enterprise plans to API token pricing, with coding agents like Claude Code and Codex driving significant usage and revenue.
  • April 2026 saw new frontier models with higher API prices and enterprise customers locked into those rates via contract renewals.
In-site article

Introducing Search Toolkit | Mistral AI

Mistral AI has released Search Toolkit in public preview, a composable framework for building production search pipelines for AI applications. It unifies ingestion, retrieval, and evaluation into a single framework, reducing integration overhead and allowing teams to focus on improving search quality. It is open-source, supports cloud, on-premises, and edge deployments, and has been battle-tested across multiple verticals.

  • Search Toolkit is an open-source, composable framework for building search pipelines, supporting cloud, on-premises, and edge environments.
  • It integrates ingestion, retrieval (BM25, dense, hybrid), and evaluation (recall, precision, MRR, NDCG) with a unified interface.
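The evaluation metrics named above (recall, precision, MRR, NDCG) can be sketched in a few lines of plain Python. This is a generic illustration with binary relevance labels, not Search Toolkit's API: the function names, document IDs, and relevance judgments are all hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR: mean reciprocal rank over (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(p + 1) for p in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query result and relevance judgments.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant), round(ndcg_at_k(ranked, relevant, 4), 3))
```

Precision and recall ignore ordering within the top k, while MRR and NDCG reward placing relevant documents earlier, so the two pairs are usually reported together.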
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI announces new initiatives at AI Now Summit: Mistral for Industrial Engineering with partnerships (Airbus, BMW, ASML), the acquisition of Emmi, and a new Vibe agent for productivity. It also announces a data center in Les Ulis for inference.

  • Mistral for Industrial Engineering integrates AI into industrial operations with partners Airbus, BMW Group, ASML.
  • Vibe is a unified agent for long-running tasks including coding and productivity.
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI launches remote coding agents powered by the new Mistral Medium 3.5 model. The model is a 128B dense model with 256k context, excelling in coding and agentic tasks. Vibe remote agents run in the cloud, allowing parallel asynchronous sessions. Additionally, Work mode in Le Chat provides a powerful agent for complex multi-step tasks.

  • Mistral Medium 3.5 is a new 128B dense model with strong coding and agentic performance, configurable reasoning effort, and self-hosting on as few as four GPUs.
  • Mistral Vibe introduces cloud-based coding agents that run in parallel, teleport local sessions, and integrate with GitHub, Jira, and other tools.
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI releases Connectors in Studio, enabling developers to build highly customized AI applications grounded in enterprise data. Built-in and custom MCPs are now available via API/SDK. Direct tool calling and human-in-the-loop approval are introduced.

  • Mistral AI launches Connectors in Studio for enterprise data integration.
  • Direct tool calling gives developers precise control over tool invocation.
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI enters into a definitive agreement to acquire Physics AI pioneer Emmi AI, strengthening its position as the leading AI transformation partner for industrial enterprises. The acquisition accelerates the Science roadmap and enables best-in-class AI agents for engineers.

  • Mistral AI acquires Emmi AI to enhance Physics AI capabilities.
  • Emmi AI's team of 30+ researchers and engineers will join Mistral in May.
In-site article

Mistral AI Acquires Emmi AI, Bolstering Physics AI Research

Mistral AI has acquired Emmi AI to strengthen its focus on foundational physics AI for industries like aerospace, automotive, semiconductors, and energy. The company released several breakthrough studies, including neural surrogates for transonic flows and computational fluid dynamics.

  • Mistral AI acquires Emmi AI to advance physics AI research
  • Targets aerospace, automotive, semiconductor, and energy sectors
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI brings Emmi AI into its platform, launching physics AI models for industrial engineering. These models reduce simulation time from hours to seconds, enabling accelerated design, tooling optimization, and real-time digital twins. Partners include ASML, Airbus, Safran, and Siemens Energy. The article covers limitations of traditional simulation, what physics AI is, its applications, and integration with Mistral's enterprise stack.

  • Mistral AI introduces physics AI models that cut simulation from hours to seconds.
  • Physics AI is not a replacement for solvers but a throughput boost for design loops.
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI upgrades Le Chat to Vibe, a unified AI agent for long-running multi-step work and coding. Work Mode handles enterprise knowledge search, data analysis, document synthesis, and task scheduling. Code Mode operates across web, VS Code extension, and CLI with parallel sessions and third-party triggers. Pricing starts with a free tier, followed by Pro at $14.99/month, Team at $24.99/user/month, and custom Enterprise pricing.

  • Le Chat rebranded as Vibe, unifying work and coding agents with preserved settings.
  • Work Mode enables enterprise knowledge search, structured data analysis, document synthesis, and recurring task scheduling.
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI has released Voxtral TTS, its first text-to-speech model with 4B parameters, supporting 9 languages with low latency and emotional expressiveness. The model achieves state-of-the-art naturalness in human evaluations, adapts to new voices with just 3 seconds of audio, and is available via API and open weights.

  • Voxtral TTS is Mistral AI's first text-to-speech model, lightweight with 4B parameters.
  • Supports 9 languages with realistic, emotionally expressive speech and dialect variations.
In-site article

Frontier AI LLMs, assistants, agents, services | Mistral AI

Mistral AI announces Mistral 3, a family of open-source models including the frontier Mistral Large 3 (sparse MoE, 41B active/675B total) and three edge-optimized Ministral 3 models (3B, 8B, 14B), all under Apache 2.0 license, with multimodal and multilingual capabilities.

  • Mistral 3 includes Mistral Large 3 and Ministral 3 (3B, 8B, 14B), all open-source.
  • Mistral Large 3 is a sparse MoE model ranking #2 on LMArena's non-reasoning OSS leaderboard.
In-site article

Mistral AI Launches Mistral Small 4: Unified Reasoning, Multimodal, and Agentic Model

Mistral AI announces Mistral Small 4, an open-source model under Apache 2.0 that combines reasoning, multimodal, and coding agent capabilities with configurable reasoning effort and improved efficiency.

  • Mistral Small 4 unifies capabilities of Magistral, Pixtral, and Devstral
  • MoE architecture with 119B total parameters, 6B active
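The sparse MoE figures quoted in this digest (41B active of 675B total for Mistral Large 3, 6B active of 119B total for Mistral Small 4) can be turned into an active-parameter fraction with simple arithmetic. The sketch below does only that. The helper name is hypothetical, and the fraction is a rough indicator of per-token compute relative to a dense model of the same total size, not a measured speedup.

```python
def active_fraction(active_billions, total_billions):
    """Share of parameters used per token in a sparse MoE model."""
    return active_billions / total_billions


# (active, total) parameter counts in billions, as stated in the summaries.
models = {
    "Mistral Large 3": (41, 675),
    "Mistral Small 4": (6, 119),
}

for name, (active, total) in models.items():
    share = active_fraction(active, total)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both models activate roughly 5 to 6 percent of their weights per token, although all of the weights still have to be held in memory.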
In-site article

Research

AGI timelines shift with whichever lab is dominant

A new analysis shows that top AI forecasters adjust their AGI timelines based on which lab is currently leading the field, with predictions swinging earlier, then later, then earlier again as the perceived leader changes from OpenAI (ChatGPT) to xAI/Meta/Google (Gemini) to Anthropic.

  • Predictions for when most cognitive labor will be automated (AGI) fluctuate significantly based on which AI lab is currently dominant.
  • From 2023-2025, most researchers moved AGI timelines earlier; from 2025-2026, they moved them later; in early 2026, under Anthropic's rapid progress, they moved earlier again.
In-site article

Is AI Inherently Anti-Social?

This article contrasts the sense of connection from the early web with the isolating experience of modern AI, arguing that while AI is a useful tool, it cannot replace human interaction, and questions whether AI has genuinely social applications.

  • The early web fostered a collective 'we' experience, whereas AI interactions are often solitary.
  • The author considers AI a great tool, but not a person or a substitute for one.
In-site article

AIs don't like religion – particularly Jehovah's Witnesses, study claims

Major AI models exhibit a secular-rational bias, ignoring religious perspectives in ethical questions. All tested models show a negative view of Jehovah's Witnesses, according to a study by a consortium of religious universities.

  • AI models rarely invoke religious perspectives in responses to ethical or personal queries, exhibiting an 'omissive bias'.
  • Every tested AI model had a negative bias toward Jehovah's Witnesses.
In-site article

When products think: navigating the AI product shift

The article explores how AI is driving a paradigm shift in digital product design, moving from command-driven to intent-driven interaction, and analyzes the new challenges in product management, user experience, decision logic, release cycles, risk, and value creation.

  • AI represents the third user-interface paradigm in computing history, shifting from deterministic to probabilistic outputs.
  • Product teams must rethink the entire lifecycle from discovery to delivery; data strategy and model performance become as critical as feature strategy.
In-site article

Are robots nearing their ChatGPT moment? – podcast

Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.

  • Robot 'Lightning' beats human world record in Beijing half marathon.
  • China commits over £100bn to robotics investment over two decades.
In-site article

Design of a Real-time Asynchronous Monocular Odometry for Planetary Exploration

Researchers propose a real-time asynchronous event-based monocular odometry for planetary rovers, using an Error-State Kalman Filter to process event camera data for robust ego-motion estimation under high dynamic range lighting and computational constraints.

  • Event cameras provide asynchronous pixel-wise brightness changes with microsecond resolution, ideal for high-speed sensing and HDR environments.
  • The approach uses an Error-State Kalman Filter to continuously estimate camera motion from event streams.
In-site article

What-If World: A Causal Benchmark for General World Models in Embodied Scenarios

A new benchmark called What-If World tests video generation models' causal reasoning by presenting paired prompts that differ in one physical detail and checking if videos diverge correctly. Evaluating nine state-of-the-art models, none exceed 52% on paired scores, with open-source models around 28%, indicating significant room for improvement. Performance correlates with visual prominence rather than physics tractability.

  • What-If World benchmark uses 319 prompt pairs with single variable changes to test causal understanding in video generation models. It is built on real frames from nuScenes and DROID.
  • Scoring uses APEO rubric (Adherence, Physics, Environment, Outcome). All nine models struggle: best paired score is 52%, open-source models average 28%.
In-site article

Clinical Validation of the Melanoscope AI Mobile Dermoscopy Clinical Decision Support System

A prospective single-center clinical validation of the Melanoscope AI mobile dermoscopy CDSS demonstrated 88.6% agreement with expert assessment on 176 patients, with no false negatives and 88.3% specificity. The study developed a quantitative interpretability method for cascade deep learning models and a three-zone patient routing algorithm, supporting reproducible and interpretable decision-making for skin cancer screening in resource-limited settings.

  • The Melanoscope AI system achieved 88.6% agreement with experts on 176 patients, with zero false negatives among 5 malignant lesions.
  • Specificity reached 88.3%, with 3 melanomas and 2 basal cell carcinomas histologically confirmed.
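The reported figures can be cross-checked with a small confusion-matrix reconstruction. The sketch below assumes that all 171 non-malignant cases count as negatives and that "agreement" means overall accuracy against the reference assessment. Neither assumption is confirmed by the summary, so the counts derived here are one reading consistent with the published percentages, not the study's own table.

```python
total_patients = 176
malignant = 5          # 3 melanomas + 2 basal cell carcinomas
false_negatives = 0    # reported: no malignant lesion was missed

true_positives = malignant - false_negatives
benign = total_patients - malignant

# Find the true-negative count whose specificity rounds to the reported 88.3%.
true_negatives = next(
    tn for tn in range(benign + 1) if round(100 * tn / benign, 1) == 88.3
)
false_positives = benign - true_negatives

sensitivity = true_positives / malignant
specificity = true_negatives / benign
agreement = (true_positives + true_negatives) / total_patients

print(f"TP={true_positives} FN={false_negatives} "
      f"TN={true_negatives} FP={false_positives}")
print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} "
      f"agreement={agreement:.1%}")
```

Under these assumptions the counts reproduce both reported percentages. With only five malignant cases, however, the zero-false-negative result carries wide statistical uncertainty.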
In-site article

Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

This paper presents a behavioral-level activity recognition method using head-mounted IMU, going beyond basic motion primitives. The authors define five behavioral categories, construct a 160K-sample dataset from Ego4D with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior models on action and scenario recognition. Observability analysis reveals locomotion is reliably observable, while object transfer and task operation benefit from temporal context; scenario-dependent signal overlap remains a challenge. Results show that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size.

  • Proposes HiT-HAR, a hierarchical model for behavioral activity recognition from head-mounted IMU, going beyond motion primitives
  • Constructs a 160K-sample Ego4D dataset with 8 scenarios and 5 behavioral categories, using a four-tier quality assurance framework
In-site article

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

This paper introduces Metric-Aware Principal Component Analysis (MAPCA), which parameterizes PCA with a positive-definite metric matrix and positions it within the geometric deep learning framework. MAPCA interprets the metric as a geometric prior, its solutions are equivariant under the orthogonal group preserving the metric, and its spectrum is invariant. A uniqueness theorem characterizes Invariant PCA (IPCA) as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling. The paper also discusses extensions to kernel PCA, spectral graph methods, and deep MAPCA.

  • MAPCA parameterizes PCA with a positive-definite metric matrix, linking PCA to the symmetry and equivariance concepts of geometric deep learning.
  • A uniqueness theorem shows that IPCA is the unique linear data-derived metric in the MAPCA family equivariant under diagonal rescaling.
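
The diagonal-rescaling property can be illustrated on a toy 2D example. The summary does not give the paper's exact formulation, so this sketch makes two assumptions: that metric-aware PCA can be read as the generalized eigenproblem Σv = λMv (equivalently, the spectrum of M^(-1/2) Σ M^(-1/2)), and that the data-derived metric is M = diag(Σ). Function names are hypothetical, and only diagonal metrics on two features are handled.

```python
import math

def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def metric_pca_spectrum(cov, metric_diag):
    """Eigenvalues of M^(-1/2) S M^(-1/2) for a diagonal metric M (assumed form)."""
    sxx, sxy, syy = cov
    m1, m2 = metric_diag
    a = sxx / m1
    b = sxy / math.sqrt(m1 * m2)
    d = syy / m2
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]].
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius

def spectrum_with_data_metric(points):
    """Assumed IPCA-style metric: M = diag(S), taken from the data itself."""
    cov = covariance_2d(points)
    return metric_pca_spectrum(cov, (cov[0], cov[2]))

def spectrum_with_euclidean_metric(points):
    """Ordinary PCA: M is the identity."""
    return metric_pca_spectrum(covariance_2d(points), (1.0, 1.0))

points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
# Diagonal rescaling: change the units of each feature independently.
rescaled = [(10.0 * x, 0.5 * y) for x, y in points]

euclid_before = spectrum_with_euclidean_metric(points)
euclid_after = spectrum_with_euclidean_metric(rescaled)
ipca_before = spectrum_with_data_metric(points)
ipca_after = spectrum_with_data_metric(rescaled)

print("Euclidean metric:", euclid_before, "->", euclid_after)
print("diag(S) metric:  ", ipca_before, "->", ipca_after)
```

With the identity metric, changing units changes the spectrum. With the assumed diag(Σ) metric, the normalized matrix is the correlation matrix, so the spectrum is unchanged by per-feature rescaling.
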
In-site article

A Simple State Space Model Excels at Multivariate Time Series Classification

Research shows that diagonal state space models (S4D) outperform more complex Mamba architectures in time series classification tasks. The authors propose lightweight variants MS4 and MS4N, which achieve higher accuracy and efficiency on 59 datasets, matching deep learning models with 2x to 10x more parameters.

  • S4D consistently outperforms Mamba-based variants in accuracy and efficiency on TSC benchmarks.
  • Proposed MS4 and MS4N models use simple modifications like linear input projection and channel mixing.
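
For readers unfamiliar with diagonal state space models, the following is a minimal sketch of the core computation behind an S4D-style layer: a diagonal complex state matrix, discretized and run either as a recurrence or as a convolution. It is not the authors' MS4 or MS4N. The zero-order-hold discretization, the parameter values, and all function names are assumptions chosen for illustration.

```python
import cmath

def discretize_zoh(a_cont, b_cont, step):
    """Zero-order-hold discretization of a diagonal SSM (entries of a_cont nonzero)."""
    a_bar = [cmath.exp(step * a) for a in a_cont]
    b_bar = [(ab - 1) / a * b for ab, a, b in zip(a_bar, a_cont, b_cont)]
    return a_bar, b_bar

def run_recurrence(a_bar, b_bar, c, inputs):
    """x_t = a_bar * x_{t-1} + b_bar * u_t (elementwise), y_t = Re(sum(c * x_t))."""
    state = [0j] * len(a_bar)
    outputs = []
    for u in inputs:
        state = [a * x + b * u for a, x, b in zip(a_bar, state, b_bar)]
        outputs.append(sum(ci * xi for ci, xi in zip(c, state)).real)
    return outputs

def conv_kernel(a_bar, b_bar, c, length):
    """Kernel K[k] = Re(sum_i c_i * a_bar_i**k * b_bar_i), valid because A is diagonal."""
    return [
        sum(ci * a ** k * b for ci, a, b in zip(c, a_bar, b_bar)).real
        for k in range(length)
    ]

def run_convolution(kernel, inputs):
    """Causal convolution y_t = sum_{j<=t} K[j] * u_{t-j}."""
    return [
        sum(kernel[j] * inputs[t - j] for j in range(t + 1))
        for t in range(len(inputs))
    ]

# Illustrative parameters: two complex modes with negative real part (stable).
a_cont = [-0.5 + 1j, -0.5 + 2j]
b_cont = [1.0, 1.0]
c = [0.5 - 0.2j, 0.3 + 0.1j]
inputs = [1.0, 0.0, -1.0, 0.5, 2.0]

a_bar, b_bar = discretize_zoh(a_cont, b_cont, step=0.1)
y_recurrent = run_recurrence(a_bar, b_bar, c, inputs)
y_convolved = run_convolution(conv_kernel(a_bar, b_bar, c, len(inputs)), inputs)
print(y_recurrent)
print(y_convolved)
```

Both paths produce the same outputs up to floating-point error. Because the state matrix is diagonal, each mode evolves independently and the kernel reduces to elementwise powers, which is part of why such models stay small and cheap.
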
In-site article

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

This paper argues that within-person behavioral variability stems from dynamic latent states, not solely from observable inputs. Intervening on the state's weighting at decision time makes outcomes causally controllable. The framework integrates six lines of evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) and a 24-month observational dataset from over 200,000 users. It yields seven testable predictions and six operational requirements for state-aware systems, with implications for digital health, education, AI personalization, and personal agency.

  • Human behavioral variability is explained by dynamic latent states, not solely by observable inputs.
  • State is defined as a time-indexed weighting vector; intervening on state can causally control outcomes.
In-site article

RULER: Representation-Level Verification of Machine Unlearning

Machine unlearning verification typically focuses on output-level metrics, but a model can pass these while still encoding forgotten data in its internal representations. This paper introduces RULER, a set of representation-level verification metrics, including oracle-comparative M2 and oracle-free M4. Experiments show that approximate unlearning methods pass output-level tests but exhibit significant residuals in representation-level analysis.

  • Current output-level verification for machine unlearning is insufficient, because models may retain forgotten data in intermediate representations.
  • RULER introduces two representation-level metrics: M2 (requires oracle model) and M4 (oracle-free).
In-site article

On the Origin of Synthetic Information by Means of Steganographic Inheritance

Analogous to the origin of species, this paper addresses the origin of synthetic information, proposing a steganography-based mechanism to trace the lineage of AI-generated content, crucial for maintaining truth and trust in an era of advanced generative models.

  • The paper frames the origin of synthetic information as a fundamental mystery in information science with deep societal impact.
  • The authors propose a steganographic method to embed hereditary traits into synthetic data.
In-site article

Microsoft's MAI-Image-2.5 pulls even with Google's Nano Banana 2 on benchmarks

Microsoft's MAI-Image-2.5 ranks third on Arena's text-to-image leaderboard, on par with Google's Nano Banana 2 but still behind OpenAI's Image-2. The model shows clear gains over its predecessor, especially in rendering text inside images and commercial visuals.

  • MAI-Image-2.5 ranks third on Arena leaderboard, tied with Google's Nano Banana 2
  • Improvements in text rendering and commercial visuals
In-site article
Startups

The Pope Grasps the Limits of AI

The Vatican's new encyclical by Pope Leo XIV defends human imperfection as a source of dignity and warns against outsourcing core human capabilities to AI, countering Silicon Valley's dismissal of human limitations.

  • Pope Leo XIV's encyclical 'Magnifica Humanitas' defends human finitude as a source of beauty and dignity.
  • The document warns against AI making moral decisions and centralizing power in tech elites.
In-site article
Robotics

I dug deeper into my Oura Ring data using this free app - here's what I found

Simple Wearable Report turns Oura data into a lab-style report. The free tool also lets you upload the report to chatbots for further AI analysis. Here's how I've been using it.

  • Simple Wearable Report transforms Oura Ring data into scannable reports for sharing with doctors or uploading to AI chatbots.
  • Compared to Oura's built-in AI advisor, third-party chatbots like Gemini provide more detailed, quantitative analysis.