Google Cloud has unveiled "AI Threat Defense," a platform designed to automatically find, assess, and patch security flaws in enterprise systems. The platform bundles several technologies, some of which the company obtained through acquisitions.
Google Cloud launches AI Threat Defense platform to combat AI-driven cyberattacks.
The platform automatically discovers, assesses, and patches security vulnerabilities.
CNN has filed a lawsuit against Perplexity, claiming that the startup's AI tools generate "verbatim" copies of its work, as reported earlier by CNN. The lawsuit, filed in a New York court on Thursday, also alleges that Perplexity provides users with information locked behind CNN's subscription.
Perplexity, which offers an AI "answer" engine along with the AI browser Comet, is accused of ignoring CNN's efforts "to recognize or block Perplexity's unidentified crawlers" from scraping its content. "Human beings report, research, write, edit, and create the content that Perplexity takes without permission or compensation," the lawsuit claims.
CNN sues Perplexity for allegedly producing verbatim copies of its articles.
Perplexity accused of bypassing CNN's paywall and ignoring crawling prevention measures.
CNN has filed a lawsuit against AI search company Perplexity, accusing it of unlawfully copying and distributing CNN's content. This is CNN's first AI copyright action and thought to be the first by any television network. CNN states it previously sought but failed to reach a content licensing deal with Perplexity, and now seeks legal damages. Perplexity has not yet commented.
CNN sues Perplexity for alleged copyright infringement of its content
This marks CNN's first AI copyright lawsuit and potentially the first by a TV network
NBA Commissioner Adam Silver announced plans to introduce an automated AI and camera-based system for objective officiating decisions like out-of-bounds calls. The system, compared to Hawk-Eye in tennis, aims to determine possession instantly. Silver said referees will still handle subjective calls involving contact and fouls.
NBA plans AI-powered automated system for out-of-bounds calls, using cameras and AI similar to Hawk-Eye.
The announcement followed a disputed call in the Western Conference finals.
Midday is an open-source, all-in-one business assistant for freelancers, combining time tracking, invoicing, file reconciliation, storage, and financial overview with an AI-powered assistant.
Open-source tool integrating multiple business functions for freelancers and solo entrepreneurs.
Features include time tracking, invoicing, secure file vault, automated receipt matching, and AI insights.
Axiom Math, founded by Hong Letong, a Chinese entrepreneur born after 2000, has had 5 of its 8 AI-generated math papers accepted by peer-reviewed journals. The company raised $2 billion in March at a $16 billion valuation.
Five of eight math papers generated by Axiom Math's AI system, AxiomProver, have been accepted by academic journals.
Founder Hong Letong dropped out of Stanford to start the company, which secured $2 billion in funding and is valued at $16 billion.
This month's AIhub digest covers the AI for Science conference, an interview on the lottery ticket hypothesis, a discussion of world models, research on transparent and trustworthy AI, a report on foundation model impacts, reflections on the AIES conference, Robotics Café, the ACL desk rejection policy, arXiv's anti-AI-slop policy, and more.
Interview with Ximing Wen on transparent and trustworthy AI systems
Jonathan Frankle discusses the lottery ticket hypothesis and empiricism
Many children face challenges in emotional regulation and social interaction, limiting their participation in therapeutic programs. This study explores engagement strategies for a tactile robot supporting children with anxiety disorders, comparing synthetic emotional feedback and point rewards. A preference study with 16 school children (ages 6-8) showed preference for emotional engagement, while a behavioral study with 14 university students (ages 20-27) found point-based systems yielded higher task accuracy (p<0.05) and sustained performance. These findings highlight age-related differences and the need to validate design assumptions through observed interaction.
Children aged 6-8 prefer emotional engagement over points
University students show higher task accuracy with point rewards
Illinois has passed SB 315, which requires independent auditors to verify AI labs' safety commitments. The bill now heads to Governor Pritzker, who plans to sign it. It is stricter than comparable California and New York laws and has drawn support from OpenAI and Anthropic but opposition from Silicon Valley trade groups.
SB 315 mandates independent auditing of AI safety practices.
It is the strongest state-level AI safety law in the U.S.
This article explores the authorization paradox in AI systems, questioning who truly holds control over AI. Presented as a video, it discusses security and privacy implications.
Authorization issues in AI are increasingly critical
Uvilox AI bridges the communication gap with real-time sign language interpretation, emergency response, and accessible calling — powered by next-generation vision AI. With sub-80ms latency, 97.4% accuracy, support for 200+ sign variants, and military-grade security, it is now open for beta access.
Real-time sign language recognition with <80ms latency and 97.4% accuracy.
Supports over 200 ASL and BSL signs, works in low-light conditions.
Modern AI systems are powerful not because they replicate human intelligence, but because they extend structures already present in human cognition and language. This perspective explains AI's capabilities and limitations, and reframes AI safety as a system-level challenge requiring engineering and governance, not fear of rogue AI.
AI systems extend human intelligence by modeling sedimented structures of understanding in language, not by replicating human minds.
Hallucinations and the compositionality gap arise from AI's lack of lived engagement with the world that anchors meaning and truth.
A Vox article explores the growing movement of AI successionists who believe artificial intelligence should replace humanity as the next step in cosmic evolution, and examines the ethical and spiritual questions this raises.
AI successionists at a symposium argue that AI could be morally superior and should be allowed to supersede humanity.
The movement has gained influence in Silicon Valley and among major AI labs, with ties to the authoritarian right.
Jensen Huang announced Nvidia will spend $150 billion annually in Taiwan on AI infrastructure, despite a previous $500 billion US commitment. This highlights Taiwan's critical role in AI chip manufacturing and packaging.
Nvidia will invest $150B per year in Taiwan for AI infrastructure.
Despite a $500B US data center pledge, Taiwan remains the core manufacturing hub.
Nvidia CEO Jensen Huang plans a $150 billion investment in Taiwan for AI infrastructure, despite Trump administration tariffs aimed at bringing chip manufacturing back to the US. Taiwan refuses to relinquish its semiconductor dominance, while US chip manufacturing capacity remains low.
Nvidia announces $150 billion investment in Taiwan to boost AI chip position.
Trump administration weighs tariffs on semiconductors to boost domestic manufacturing, but US only produces about 10% of its chip needs.
A multi-institution team built a neuromorphic computer combining quantum-tunneling physics with brain-inspired architecture to solve combinatorial optimization problems at scale, with asymptotic convergence guarantees. Published in Nature Communications, it represents a new direction in quantum-inspired computing.
Neuromorphic computer uses quantum tunneling and brain-like architecture for combinatorial problems
Based on CMOS technology with a Fowler-Nordheim annealer autoencoder
NVIDIA CEO Jensen Huang has accepted an invitation to join the Advisory Board of Tsinghua University's School of Economics and Management (SEM). The board, chaired by Apple CEO Tim Cook, includes Elon Musk, Satya Nadella, Mark Zuckerberg, Jack Ma, and other global leaders. Huang also recently received an honorary doctorate from Carnegie Mellon University.
Jensen Huang joins Tsinghua SEM Advisory Board
Board chaired by Apple's Tim Cook, includes top tech and business leaders
Claudeverse is a command center for developers managing multiple Claude AI workers in parallel. It offers features like parallel workforce management, worker escalation, review queue, traceability, iPad mirroring, and model-neutral engine. Currently in invite-only beta for macOS.
Claudeverse provides a unified command center to manage multiple Claude workers simultaneously.
Key features include parallel workforce, worker escalation, review queue, traceability, and iPad mirroring.
Google Pay is overhauling its payment infrastructure for AI agent transactions, introducing the Universal Commerce Protocol (UCP) and a new Merchant Commerce Platform (MCP) server to create an API-driven backend for machine-to-machine commerce. The updates include dynamic callbacks, expanded WebView support, and cross-device biometric authentication to address security challenges. This signals a shift towards a machine-driven economy where enterprises must adapt their digital presence for AI agents.
Google Pay introduces Universal Commerce Protocol (UCP) to standardize AI agent payments.
New Merchant Commerce Platform (MCP) server acts as intermediary, aggregating transaction data.
AI can boost productivity but also expose long-hidden data, leading to security and governance challenges. Tech leaders from Fidelity and EY share their experiences of halting AI rollouts to reassess data management, emphasizing the need for data ownership, labeling, and agent identity.
AI rollouts can be halted by data exposure issues.
Fidelity and EY faced challenges with unstructured data surfacing via AI.
DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.
DeepSWE is a contamination-free benchmark with original tasks.
IBM and Red Hat announce Project Lightwell, a $5 billion initiative to secure open source software using AI and a team of over 20,000 engineers, establishing a trusted clearinghouse for vulnerability management.
Project Lightwell is a $5B investment by IBM and Red Hat to secure open source software.
It combines AI and 20,000+ engineers to identify and fix vulnerabilities at scale.
DNS-AID, an open-source project under the Linux Foundation, enables AI agents to discover each other using DNS infrastructure, avoiding centralized registries. It supports multiple protocols and allows searching by name, function, or domain.
DNS-AID leverages existing DNS infrastructure for agent discovery.
Uses SVCB, DNSSEC, and DANE for secure and reliable connections.
Pact is a programming language designed for AI agents, emphasizing machine-readable specifications and constraints over human-friendliness. It's based on S-expressions and features provenance, effect tracking, totality, latency budgets, and dependency graphs. The compiler generates Rust code and includes tools for web scaffolding and YAML spec conversion. While strong for service contracts, it has limitations for algorithmic specifications.
Pact is an S-expression language for AI agents, prioritizing metadata and formal specifications.
Key features include provenance, effect tracking, totality, and latency budgets.
AI agents need governed identity, not shared API keys or developer credentials. Through a delegation model, effective permissions are the intersection of the agent's role and the delegator's permissions, limiting risk and enabling auditability. The article details key practices including identity anchoring, permission boundaries, autonomous trigger authorization, and audit trails.
Agents should have their own identity, using the same identity system as humans for lifecycle management.
Effective permissions are the intersection of agent role ceiling and delegator permissions floor, strictly limiting scope.
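The intersection rule described above can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the permission strings, the function name effective_permissions, and the example role are all hypothetical, and a real system would resolve both sets from its identity provider.

```python
from __future__ import annotations


def effective_permissions(agent_role: frozenset[str],
                          delegator: frozenset[str]) -> frozenset[str]:
    """Return only the permissions granted by BOTH the agent's role and the delegator."""
    return agent_role & delegator


# Hypothetical permission names, chosen for illustration only.
agent_role = frozenset({"read:invoices", "write:invoices", "read:reports"})
delegator = frozenset({"read:invoices", "read:reports", "read:payroll"})

effective = effective_permissions(agent_role, delegator)
print(sorted(effective))  # ['read:invoices', 'read:reports']
```

In this sketch the agent cannot write invoices (the delegator lacks that right) and cannot read payroll (the role lacks it), so neither side can widen the other's scope.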
DiscloAI is an open-source SDK for EU AI Act Article 50 compliance, enabling chatbot disclosures, deepfake labels, and AI content notices. It supports 24 EU languages and WCAG 2.1 AA, and can be integrated in under 10 minutes via CDN or npm.
Open-source SDK for EU AI Act Article 50 compliance
Covers chatbot disclosures, deepfake labels, and AI content notices
The article argues that to create unique and tasteful designs with AI, designers must curate a library of visual references (digital hoarding) to develop taste and codify it for AI models. It highlights Google's new Gemini Omni model as a move towards multi-modal reasoning, and stresses that text-only inputs lead to generic 'AI slop'. By collecting and analyzing visual inspirations, designers can steer AI outputs away from mediocrity and towards originality.
Google's Gemini Omni model signals a shift towards multi-modal AI that can reason across text, image, audio, and video.
Relying solely on text prompts results in generic, 'slop' designs; visual references are essential for unique aesthetics.
At ICRA, NVIDIA Research highlights eight papers on sim-to-real transfer, enabling robots to perceive, reason, plan, and act in dynamic environments. Methods like ScheduleStream, COMPASS, Grasp-MPC, SPARR, and SEAL improve coordination, navigation, grasping, assembly, and task execution, with significant gains in success rates and robustness.
NVIDIA presents 8 papers on sim-to-real transfer at ICRA
Methods include multi-arm coordination, cross-robot navigation, novel object grasping, precision assembly, and vision-language-action models
Cloudflare processes over a billion events per second, but data was scattered and hard to access. They built Town Lake, a unified analytics platform, and Skipper, an AI agent that lets anyone ask questions in plain English and get auditable answers. The article details platform architecture, governance (default-closed), and the AI agent's workings.
Cloudflare built Town Lake (unified data platform) and Skipper (AI agent) to solve data sprawl.
Town Lake uses a data lakehouse architecture with Trino, R2, and Iceberg for unified querying.
The article argues that the key to AI-assisted software development is not better specifications or tools, but old-fashioned practices of small batches and rapid feedback loops. Data shows that faster code generation leads to bottlenecks in design, testing, and review, slowing delivery and reducing stability. The real leverage lies in reducing batch sizes and shortening feedback cycles.
AI code generation speeds up creation but creates bottlenecks in design, testing, and review.
Data from DORA, CircleCI, and Faros shows slower delivery and less stability due to phase-gated processes.
The OpenLoomi AI team explains their decision to open-source their AI work partner, emphasizing data sovereignty, transparency, and community-driven development. The article covers local-first architecture, the trust tax of closed-source, the need for public AI infrastructure, and the product's core features.
OpenLoomi is local-first: user data stays encrypted on their device and is never used for model training.
Open-source eliminates trust dependencies—anyone can audit, fork, or self-host the code.
Explore seven practical AI projects that automate real workflows, including job search, web research, investment research, market trend analysis, invoice processing, chart digitization, and personalized exercise training.
Build an AI job search assistant that ranks job fit
Create a multi-agent research assistant for sourced reports
This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.
AI assistants can 'forget' earlier information in long conversations when context window limits trigger context compaction.
Four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria, and use spec documents as bridges.
Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.
Bundles Python runtime and hermes-agent for a zero-dependency user experience
Money Printer Pro is an open-source AI content generator powered by Google Gemini and VEO 3.1, enabling photorealistic images and cinematic videos with identity preservation. It features 7 visual engines, autopilot batch generation, AI quality scoring, and a publish guard. Users pay Google directly with no markup or subscription.
Generates photorealistic images and 8-second cinematic videos with consistent identity across outputs.
Integrates 7 visual engines for lighting, shadow, motion, weather, outfit, scene validation, and context orchestration.
Superpowers is a complete software development methodology for coding agents, built on composable skills and initial instructions. It emphasizes test-driven development, design-first approach, and subagent-driven iteration, supporting multiple coding assistants like Claude Code, Codex CLI, and Gemini CLI.
Superpowers provides a skills library including TDD, systematic debugging, collaboration planning, enabling agents to work autonomously for hours.
The workflow starts with brainstorming specifications, followed by design approval, implementation plan generation, and subagent-driven execution with two-stage review.
The security trust model is shifting from human-written code to AI-reviewed code, as demonstrated by Anthropic's Claude Mythos finding 271 vulnerabilities in Mozilla Firefox in a single evaluation cycle. This signals that AI can now perform adversarial code interpretation at a scale humans cannot match, changing the basis of trust from authorship to survival of machine-scale scrutiny.
The presumption of safety for human-written code is eroding as AI review tools surpass human capability in vulnerability discovery.
Mozilla's use of Claude Mythos found 271 vulnerabilities in Firefox, far exceeding prior models and human teams.
American Express's global innovation head Luke Gebb shares four key practices for successful innovators: keep learning, dive into tech, prepare to fail, and build partnerships. He also discusses Amex's plans for agentic commerce, including payments, offers, and proprietary experiences, with a timeline for mainstream adoption.
Stay curious and embrace a growth mindset
Deeply understand emerging technology and work closely with engineers
A senior engineer reflects on how AI has transformed the senior engineer role over three years: faster prototyping, increased coordination burden, expanded scope but squeezed mentoring and thinking time. The role became more powerful but less sustainable.
AI collapsed the gap between idea and demo, shifting from proposals to PoCs.
The role expanded in both hands-on coding and strategic writing, cutting into mentoring and deep thinking.
Shagang Steel and DingTalk have entered a strategic partnership to deploy Wukong AI across the enterprise, aiming to transform AI capabilities into tangible value in the steel industry.
Shagang partners with DingTalk to integrate AI into steel manufacturing
Wukong AI serves as the core engine for a unified collaboration platform
Taste Skill is an open-source frontend framework that enhances the design quality of AI-generated interfaces, preventing generic boilerplate looks. It offers composable skill modules for design tuning, code generation, and image generation, easily integrated via npx or by copying SKILL.md files.
Taste Skill uses adjustable design parameters (variance, motion, density) to give AI-generated UIs better taste
Includes specialized skills for design refinement, code generation, image generation, and more
Netflix is building a new internal studio called INKubator that aims to use AI to produce short-form animated content. The studio has quietly launched and is hiring for various roles including producers, software engineers, and CG artists. Its long-term technology strategy focuses on GenAI-enabled workflows, artist tooling, and scalable multi-show environments, with plans to eventually produce feature-quality content. While currently focused on shorts and specials, there are indications of potential expansion into longer-form content. The initiative could be used for Netflix's Clips feature or kids programming. However, the use of AI in animation has sparked significant backlash, including criticism from Hayao Miyazaki and protests at the Annecy Animation Film Festival.
Netflix is launching INKubator, a new AI animation studio focused on GenAI-driven short-form content.
The studio is led by former DreamWorks and A24 executive Serrena Iyer and is actively hiring.
AIluminode is a wieldable pre-retrieval cognitive-orientation instrument that helps AI tools check contextual posture before acting, using route polarity (OPEN, PROTECT, AUDIT, DEFER, BLOCK) to reduce erroneous exploration and context bleed.
AIluminode is a wieldable pre-retrieval cognitive orientation tool emphasizing posture before retrieval.
It uses a route polarity system (OPEN / PROTECT / AUDIT / DEFER / BLOCK) to guide contextual routing.
At the 2026 China AIGC Industry Summit, Baidu's Miaoda product director Zhu Guangxiang shared how AI has lowered programming barriers from writing code to chatting. 87% of Miaoda users don't know code; an 8-year-old built an OS; one-person companies (OPCs) land million-dollar contracts. Vibe Coding turns demand-side into supply-side, enabling mass entrepreneurship.
Fourth programming revolution: natural language programming, massively expanding creators
87% of Miaoda users have no coding skills; OPCs are the largest user group (16% entrepreneurs)
Cognition raises $1B at a $26B valuation, projecting >$1B ARR by year-end. The article covers inference efficiency trends, agent engineering, continual learning, new benchmarks, model releases, and coding agent productization.
Cognition raises $1B Series D at $26B valuation, ARR projected >$1B by EOY.
Inference optimization shifts to architectural level: EAGLE 3.1, DeepSeek V4-Pro hybrid attention, Xiaomi MiMo cache management.
A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta have launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.
Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
The startup raised $15M seed funding at $115M valuation, with investors including Jeff Dean and Fei-Fei Li.
Robinhood launches Agentic Trading, allowing customers to connect their own AI agents to automate trading and credit card purchases with safety controls and a real-time activity feed.
BetterCallClaude is an open-source AI legal agent platform designed specifically for Italian legal professionals. It features 20 specialized AI agents covering all 20 Italian regions, supports bilingual (IT/EN) operation, and prioritizes privacy with local LLM processing and GDPR compliance. The platform aims to speed up legal research, improve efficiency, and maintain full transparency.
This article applies Amdahl's Law to AI agents, arguing that the speedup from parallel agents is bounded by the fraction of the workflow that requires human judgment (H). It introduces the concept of 'self-liquidating H', where each human intervention produces an artifact that eliminates future similar interventions. It emphasizes 'configurancy' (explicit behavioral commitments and conformance suites) to encode human knowledge so agents can operate autonomously. Examples from ElectricSQL, Gas Town, and Ralph Loop illustrate the principles.
Speedup from AI agents is limited by the human judgment fraction H; reducing H is key.
Self-liquidating H: each human intervention should produce a reusable artifact (test, spec update) to prevent recurrence.
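To make the bound concrete, here is a small sketch using the standard form of Amdahl's Law, with H treated as the serial fraction. That mapping is an assumption about how the article applies the law; the function names and the sample values of H and agent count are illustrative only.

```python
def agent_speedup(h: float, n_agents: int) -> float:
    """Amdahl's Law: speedup when a fraction h of the work stays serial (human judgment)."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must be between 0 and 1")
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    return 1.0 / (h + (1.0 - h) / n_agents)


def speedup_ceiling(h: float) -> float:
    """Upper bound on speedup as the number of agents grows without limit."""
    return float("inf") if h == 0 else 1.0 / h


# Illustrative values: 20% of the workflow needs human judgment.
for n in (1, 10, 100):
    print(n, round(agent_speedup(0.2, n), 2))
print("ceiling:", speedup_ceiling(0.2))
```

Under these assumptions, adding agents beyond the first handful buys little; lowering H from 0.2 to 0.1 doubles the ceiling, which is the article's argument for turning each intervention into a reusable artifact.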
The SignGAD framework reformulates graph anomaly detection by replacing fixed pipelines with self-designed task-conditioned workflows, and introduces a guarded final refit strategy to improve reliability under limited supervision.
SignGAD shifts from training a fixed detector to designing detection workflows
It selects suitable graph encodings and detector designs for task-specific anomaly evidence
This paper proposes Personalized Observation Normalization (PON) for federated reinforcement learning in heterogeneous environments. Each agent locally normalizes raw state inputs using a continuously updated running mean and variance, ensuring consistent scaling without overshadowing. Sharing normalization parameters is shown ineffective. Experiments on heterogeneous MuJoCo tasks demonstrate faster training and superior performance. Accepted at IJCNN 2025.
Federated RL faces challenges in heterogeneous environments due to differing state-transition dynamics.
PON normalizes observations locally using per-agent running statistics.
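A per-agent running normalizer of the kind described above might look like the following. This is a sketch under stated assumptions, not the paper's code: it uses Welford's online algorithm for the running mean and variance, and the class name, the eps value, and the sample observations are hypothetical.

```python
import math


class RunningNormalizer:
    """Tracks a per-dimension running mean and variance (Welford's algorithm)."""

    def __init__(self, dim: int, eps: float = 1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps         # assumed small constant to avoid division by zero

    def update(self, obs: list) -> None:
        """Fold one raw observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs: list) -> list:
        """Scale an observation using this agent's own statistics."""
        if self.count < 2:
            # Too few samples for a variance estimate: only center the input.
            return [x - m for x, m in zip(obs, self.mean)]
        return [
            (x - m) / math.sqrt(s / self.count + self.eps)
            for x, m, s in zip(obs, self.mean, self.m2)
        ]


# Each agent keeps its own normalizer; the statistics are never shared.
agent_a = RunningNormalizer(dim=1)
agent_b = RunningNormalizer(dim=1)
for x in (1.0, 3.0):
    agent_a.update([x])
for x in (1000.0, 3000.0):
    agent_b.update([x])

print(agent_a.normalize([3.0]), agent_b.normalize([3000.0]))
```

Although the two hypothetical agents see inputs on very different scales, both normalized values come out close to 1, which is the consistent scaling the summary refers to.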
Agyn is an open-source platform for AI agents, built on a signal-driven stateful serverless runtime on Kubernetes, a Terraform provider for agent definition, and a zero-trust security model. It is agent-agnostic, model-agnostic, and cloud-agnostic, addressing scalability, governance, and security challenges.
Signal-driven stateful serverless runtime on Kubernetes for scalable execution
Agent and harness definition via Terraform provider (infrastructure as code)
A paper argues that with generative AI dissolving the human capacity to write correct code as the binding constraint, software work reorganizes around two pillars: Mixer Mode (humans operating multiple judgment axes continuously like a sound engineer) and Meta-Software (software that observes, validates, and governs other software). The two pillars are inseparable, drawing a parallel to the historical transition from artisanal to mass production.
The production of code is ceasing to be the dominant problem in software organizations due to generative AI.
Mixer Mode describes a new human role where practitioners continuously operate multiple judgment axes.
Noah Smith argues that as AI becomes more capable, humans will shift from technical work to ensuring AI alignment—keeping AI focused on human goals. He draws parallels to 'Office Space' and warns about the rise of AI-generated 'slop'.
Humans will be needed to maintain AI alignment, ensuring AI stays on task.
The author compares future human roles to the 'Lumbergh' manager from Office Space.
Safescript is a programming language for AI agents that proves safety properties statically before execution, eliminating the need for sandboxes or VMs. It compiles to a static DAG, enabling full visibility into data flow and host calls, with zero overhead and zero cold starts.
Statically enforces security without runtime sandboxing.
Compiles to a static DAG that exposes all data flows and host calls.
AIPass is a CLI-native scaffold that adds persistent memory, identity, and coordination to AI agents. Agents share a filesystem, use JSON files for memory, require no cloud or extra API keys. The project includes 13 core agents for multi-agent collaboration, task dispatching, quality audits, and real-time monitoring.
AIPass provides a CLI-native framework for persistent memory, identity, and coordination of AI agents.
All agents share a local filesystem with JSON file storage, no cloud dependency.
Given that the stock trading app operates in a highly regulated industry, the company’s move to use agents could prompt other finance firms to take a bold step and do the same.
Robinhood will allow AI agents to trade on its platform
This move is groundbreaking in a highly regulated industry
Liiists is a markdown-first list app that works on terminal, iOS, and through AI agents via an MCP server, all reading and writing the same plain-text .md files. It offers a CLI, native iOS app with Share Extension and Siri, and an MCP server for AI integration. No account needed, no lock-in, and supports iCloud sync or any folder including Obsidian vault.
Works across terminal, iOS, and AI agents using the same markdown files
NeuralAgent 2.5 introduces Voice Mode, Watch & Learn, and Parallel Agents, allowing the AI to listen, speak, and perform multiple tasks simultaneously. Users can control their entire computer via natural language without touching the keyboard or mouse. The update also improves workflows, @ mentions, and memory.
Voice Mode enables two-way conversation; users speak commands and the AI responds and executes tasks.
Watch & Learn lets users demonstrate a task once, and the AI saves it as a repeatable workflow.
Recapping two days of Interrupt 2026 — LangSmith Engine, Sandboxes GA, LangChain Labs, and 23 talks from teams at LinkedIn, Rippling, Cisco, and more. Now on demand.
LangSmith Engine automates failure analysis from production traces.
LangSmith Sandboxes reaches General Availability for secure agent execution.
Snowflake has committed $6 billion over five years to Amazon Web Services for Graviton compute and AI infrastructure, marking its largest cloud spend commitment. The deal covers AWS's ARM-based Graviton processors and GPU-accelerated EC2 instances for AI training and inference. Snowflake will also expand to 10 new AWS regions and leverage cost-efficient Graviton instances for its data warehousing business to free up resources for AI workloads.
Snowflake commits $6 billion over five years to AWS for Graviton and GPU compute.
The deal supports AI model training and inference using AWS instances.
In this post, we share how the AWS Generative AI Innovation Center (GenAIIC) collaborated with Works Human Intelligence (WHI) to build two AI agents using Amazon Bedrock AgentCore. We discuss the challenges encountered and the solutions that reduced costs by up to 97% while improving operational efficiency.
AI agents automate routine HR tasks such as commuting allowance approval and browser operations.
Migration to AgentCore and a Strands Agents architecture reduced costs by up to 97%.
Verizon Connect built an agentic AI solution on AWS to transform overwhelming fleet data into clear, actionable insights for 100,000 users daily. The architecture uses serverless anomaly detection, Strands Agents for dynamic reasoning, and Amazon Nova Lite to cut input token costs by 70%. This post covers architectural decisions, implementation challenges, and measurable results.
Agentic AI processes 500 million daily data points from 1.2 million vehicles to serve 100,000 users.
Serverless statistical models handle anomaly detection, avoiding LLM pitfalls with raw tabular data.
AWS SMGS built NarrateAI using Amazon Bedrock AgentCore to deliver business intelligence at scale. The solution features a two-layer architecture separating batch narrative generation from real-time interaction, specialized AI agents for routing and validation, and key engineering patterns for production deployment, enabling natural language queries, row-level security, and role-tailored experiences.
NarrateAI uses a two-layer architecture (batch processing + real-time interaction) to overcome latency and data fragmentation in traditional BI.
Amazon Bedrock AgentCore enables multi-agent orchestration for natural language queries and context-aware responses.
Cognition has raised over $1 billion at a $26 billion valuation, highlighting intense investor interest in AI coding agents despite ongoing debates about their practical utility.
Cognition raises $1B+, valuation hits $26B in under nine months.
Investor enthusiasm for AI coding agents remains high.
DuckDuckGo, an AI-free search alternative, is seeing a surge in users due to Google's AI Overviews. This article explains how to use DuckDuckGo without AI for private searching and browsing.
DuckDuckGo installs surged after Google I/O 2026, with iOS app peaking at 69.9% growth.
DuckDuckGo offers both AI-free search and AI chat options, giving users choice.
AWS Sales built Field Advisor on Amazon Bedrock AgentCore to orchestrate over 20 domain-specific agents, reducing cognitive load for sales reps and improving efficiency. The solution saved up to 2 hours per week per rep and reduced latency by 41%.
Field Advisor orchestrates 20+ specialized agents with a single conversational interface.
Human-in-the-loop workflows ensure data accuracy and accountability.
Robinhood now lets customers connect AI agents like Anthropic's Claude to a separate investment account via MCP. The agents can autonomously trade stocks and make credit card purchases. US regulator FINRA has flagged such agents as a new risk area, warning about unchecked decisions. Robinhood also admits the product isn't for everyone.
Robinhood enables AI agents such as Claude to be connected to investment accounts via MCP.
AI agents can autonomously trade stocks and initiate credit card purchases.
Tokenmaxxing, the unrestrained use of AI tokens, is causing enterprise budget blowouts. Uber’s CTO recently admitted to overspending on Anthropic’s Claude Code. Lanai’s new Token Tuner helps companies map token consumption to workflows and outcomes, encouraging a shift from tokenmaxxing to outcomemaxxing.
Tokenmaxxing is causing AI budget overruns at Uber and other companies.
Lanai's Token Tuner tracks token usage against workflows and outcomes, providing efficiency scores and model recommendations.
O'Reilly's Infrastructure & Ops superstream explored the infrastructure needs, costs, and security challenges of AI workloads. DORA's report shows AI increases code delivery by about 10% but reduces stability, adding verification costs. Experts emphasize platform engineering, governance, and cognitive debt, recommending investment in internal platforms to ensure production readiness for AI applications.
AI tools boost individual productivity, but team delivery stability decreases, and the added cost of verification (the 'verification tax') has to be factored in.
AI amplifies good and bad processes alike, so organizations should improve their processes proactively rather than expect technology to fix them.
AI factories are a new class of infrastructure that convert energy into tokens—the unit of production for reasoning models, agents, and intelligent systems. As agentic AI scales, performance per watt and cost per token become the critical economics. This article explores how AI factories work, their full-stack optimization, and how NVIDIA's latest hardware drives efficiency.
AI factories convert energy into tokens, serving as the 'power plants' of the AI age.
Agentic AI creates deeper, more complex inference workloads requiring real-time orchestration.
Meta rolls out consumer subscription plans for Instagram, Facebook, and WhatsApp globally, with prices from $2.99 to $3.99 per month, offering extra features. The company also begins testing new subscriptions for businesses, creators, and Meta AI users.
Meta launches Instagram Plus ($3.99/mo), Facebook Plus ($3.99/mo), and WhatsApp Plus ($2.99/mo) globally
Subscribers get profile customization, super reactions, story insights, and more
Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass, according to Bloomberg renders. The images show a pill-shaped chat bubble from the Dynamic Island, a standalone Siri app, and updates to Camera and Photos apps with AI features. Apple will reveal the final design at WWDC in June.
iOS 27's Siri will feature a ChatGPT-like interface with a pill-shaped bubble emerging from the Dynamic Island.
Users can choose between Ask, Siri, and ChatGPT from a dropdown menu.
As an iPhone owner, I primarily use Siri through CarPlay when I'm driving. Apple's voice assistant can handle basic tasks, but since my Toyota Camry supports Android Auto, I wanted to see how Google Gemini would fare. With Gemini, you can send emails, get restaurant info, play games, and more. Here's how to set it up and how it worked for me.
The author, an iPhone user, finds Gemini with Android Auto superior to Siri in the car.
Gemini handles a wide range of tasks from basic commands to complex interactions.
Meta is rolling out paid add-ons for Instagram, Facebook, and WhatsApp worldwide while building a separate paid AI offering. This marks the first time Meta has clearly monetized its AI investments.
Meta launches paid add-ons globally for Instagram, Facebook, and WhatsApp.
A separate paid AI product is also in development, monetizing AI investments.
A subscriber-only article on LWN.net by Joe Brockmeier, covering a discussion between Dirk and Linus on AI and kernel development. The full content is behind a paywall.
Amazon MGM Studios and AWS are launching a "GenAI Creators' Fund" that gives filmmakers money and access to the in-house AI platform "Project Nara." Three animated series are already in production - the teams had five weeks for their pilots. Amazon says it now has the "only end-to-end AI content ecosystem in the industry."
Amazon launches GenAI Creators' Fund with access to Project Nara
YouTube launches an AI feature that generates personalized video feeds from user prompts, available to US users on mobile and desktop with English support.
Users can create custom feeds by entering descriptive prompts like 'help me unwind with guided meditations under 10 minutes'.
The feature is similar to Spotify's prompted playlists and Instagram's Reels algorithm control.
ElevenLabs has released Music v2, an upgraded AI music generation model that can shift between genres like opera, heavy metal, and rap within a single song. A new inpainting feature allows users to regenerate specific sections without affecting the rest.
Music v2 enables seamless genre transitions within a single song.
New inpainting feature allows targeted regeneration of specific sections.
The article warns developers that while AI can accelerate coding, over-reliance on AI may deprive them of the joy of building, such as finding elegant solutions, designing clean architectures, and receiving user feedback.
AI speeds up code writing but may remove the pleasure of problem-solving.
Key experiences in development (e.g., architecture design, product releases) are hard for AI to replace.
TopRec is an AI-powered platform that helps recruiters rank candidates and build a self-maintaining CRM. It is deliberately not promoted as a PWA, to avoid caching issues, and is meant to be used as a regular website.
AI-based candidate ranking for efficient screening
Tired of AI results in your search? This article explains how to add a custom search engine to exclude AI results, with step-by-step instructions for Chrome, Firefox, Safari, and other browsers.
Add a custom search engine with the URL https://www.google.com/search?q=%s&udm=14 to remove AI results.
Works in Firefox, Chrome, and most browsers; Safari requires a free extension.
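As a small illustration of the URL template above, this sketch builds the same search URL programmatically. Only the base URL and the `udm=14` parameter come from the item; the helper name is hypothetical.

```python
from urllib.parse import urlencode

def web_only_search_url(query: str) -> str:
    """Build a Google search URL carrying udm=14, the parameter quoted above."""
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "14"})

print(web_only_search_url("rust unigram tokenizer"))
```

In a browser's custom search engine settings, the `%s` placeholder takes the place of the encoded query.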
YouTube is tightening its AI labeling rules. Labels for photorealistic or heavily AI-altered content will now show up in more visible spots, below the player for long videos and as an overlay on Shorts. Starting May 2026, an automatic detection system will flag AI-generated content even if creators don't disclose it. Recommendations and monetization won't be affected.
YouTube tightens AI labeling with more visible labels for altered content.
From May 2026, automatic detection will flag AI content even if not disclosed by creators.
Google unveiled the new Coral Board at Google I/O - a compact single-board computer for on-device AI. It runs Gemma 3 270M locally and features a RISC-V based NPU.
Coral Board is a compact SBC for on-device AI, targeting headphones, AR glasses, and smartwatches
It features a RISC-V based Coral NPU and a Synaptics Astra SL2619 chip
This article dives deep into Ollama's configuration engine, covering how to fine-tune local language model parameters using the Modelfile, optimize hardware performance with server environment variables, and format prompt flows with Go template syntax.
The Ollama Modelfile is a declarative configuration file that defines model behavior, including base model, system instructions, and parameters.
Sampling parameters (temperature, Top-K, Top-P, Min-P) control the creativity and determinism of the model's outputs.
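To make the effect of these four parameters concrete, here is a generic sketch of how such filters are commonly applied to a token distribution. It is not Ollama's code: the function name, the order in which the filters run, and the toy logits are assumptions for illustration.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return the renormalized token distribution left after each filter (temperature must be > 0)."""
    # Temperature rescales logits before softmax; lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exp = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exp.values())
    ranked = sorted(((tok, e / total) for tok, e in exp.items()),
                    key=lambda kv: kv[1], reverse=True)

    # Top-K: keep only the k most probable tokens (0 disables the filter in this sketch).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Min-P: drop tokens whose probability is below min_p times the top probability.
    floor = min_p * kept[0][1]
    kept = [(tok, p) for tok, p in kept if p >= floor]

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
print(filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9, min_p=0.05))
```

Tighter settings (lower temperature, smaller Top-K or Top-P, larger Min-P) leave fewer candidate tokens and make the output more deterministic.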
In a Decoder podcast interview, Rivian CSO Wassym Bensaid discusses the VW joint venture, the new AI-powered Rivian Assistant, and why he believes voice interfaces will replace buttons and CarPlay isn't needed.
Rivian's joint venture with Volkswagen (RV Tech) combines Rivian's software culture with VW's scale.
The Rivian Assistant is an AI agent deeply integrated into the vehicle's zonal architecture.
Jijia Vision unveiled what it calls the world's first 'Dual Pyramid' system for physical AGI and launched the home robot Shiguang S1, which has secured orders for 100 units from households. The company is targeting the 'GPT-3 moment' of physical AGI within 12 months.
Jijia Vision introduces the 'Dual Pyramid' system comprising a data pyramid and an algorithm pyramid for physical AGI.
The Shiguang S1 home robot adopts a wheeled-arm configuration and has secured orders for 100 units from real households.
Mistral AI is renaming its chatbot Le Chat to Vibe and bundling chat, coding agents and a new Work Mode under one brand. Work Mode connects to Google Workspace, Outlook, Slack or GitHub and independently handles tasks such as emails, reports or pull requests. The price of the Pro plan has been cut from €17.99 to €14.99, although Mistral has not specified any concrete usage limits. The company is thus positioning itself more directly against the agent-based offerings from OpenAI, Google and Anthropic.
Mistral AI rebrands Le Chat as Vibe, integrating chat, coding agents, and a new Work Mode.
Work Mode connects to Google Workspace, Outlook, Slack, or GitHub to autonomously handle tasks.
Open Agent Tools (oats) is a self-hosted AI framework that enables small-to-large local models to use local source code for tool-calling, freeing up expensive large model tokens by delegating tasks to smaller models.
oats allows local AI models to use local source code for tool-calling without HTTP or MCP.
It mines over 20,000 GitHub repos to create reusable prompt indices.
Perplexity AI open-sourced a Rust reimplementation of their Unigram tokenizer, achieving 5x lower latency than Hugging Face's tokenizers crate and reducing CPU utilization by 5-6x in production. The optimizations include double-array trie, bitmap packing, and huge pages.
Perplexity AI rewrote the Unigram tokenizer in Rust, achieving 5x lower p50 latency vs Hugging Face tokenizers crate.
Three optimizations: double-array trie, bitmap and cache-line packing, and huge pages.
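For readers unfamiliar with Unigram tokenization, the core step is a Viterbi search for the segmentation with the highest total log-probability. The sketch below shows that step only, in plain Python with a dict lookup standing in for the double-array trie; the vocabulary, scores, and function name are made up for illustration and this is not Perplexity's implementation.

```python
import math

def unigram_segment(text, piece_logprob, max_piece_len=8):
    """Viterbi search for the highest-scoring segmentation of `text`."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_piece_len), end):
            piece = text[start:end]
            if piece in piece_logprob and best[start] > -math.inf:
                score = best[start] + piece_logprob[piece]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces, end = [], n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]

# Hypothetical vocabulary with log-probabilities.
vocab = {"un": -2.0, "i": -3.0, "gram": -2.5, "unigram": -9.0,
         "u": -4.0, "n": -4.0, "g": -4.0, "r": -4.0, "a": -4.0, "m": -4.0}
print(unigram_segment("unigram", vocab))
```

The inner loop performs a vocabulary lookup for every candidate substring, which is the hot path that a trie and cache-friendly memory layout are meant to speed up.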
Mistral AI CEO Arthur Mensch confirms the company is exploring custom chip development to reduce infrastructure costs and compete with OpenAI and Anthropic. The French startup also announced a new inference data center in France and an enterprise agent platform called Vibe.
Mistral AI is considering designing its own custom chips to lower deployment costs.
The company announced a new data center in France dedicated to AI inference.
This tutorial builds a complete pgvector playground in Google Colab, covering installation, embedding creation, HNSW indexing, semantic search, filtered search, distance metric comparisons, half-precision storage, binary quantization, sparse vector search, hybrid retrieval, and vector aggregation. All using open-source tools without external API keys.
Set up PostgreSQL with pgvector extension in Google Colab from scratch.
Generate embeddings with SentenceTransformers and build HNSW indexes for efficient search.
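Of the techniques listed, binary quantization is perhaps the least self-explanatory: each vector is reduced to one bit per dimension, a cheap Hamming-distance pass produces a shortlist, and the shortlist is re-ranked with the full-precision vectors. The sketch below illustrates that idea in pure Python; it is not the tutorial's code or pgvector's API, and the function names and toy vectors are assumptions.

```python
import math

def binary_quantize(vec):
    """One bit per dimension: 1 for positive components, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(query, docs, shortlist=2):
    """Coarse pass on bit vectors, then exact re-ranking of the shortlist."""
    qbits = binary_quantize(query)
    coarse = sorted(docs, key=lambda name: hamming(qbits, binary_quantize(docs[name])))
    return sorted(coarse[:shortlist], key=lambda name: cosine_distance(query, docs[name]))

# Toy 4-dimensional "embeddings".
docs = {
    "cat": [0.9, 0.1, -0.4, 0.2],
    "dog": [0.5, 0.4, -0.1, 0.9],
    "car": [-0.7, 0.3, 0.8, -0.2],
}
print(search([1.0, 0.2, -0.5, 0.3], docs))
```

The trade-off is recall versus memory: the bit vectors are far smaller than float vectors, and the re-ranking step recovers most of the precision lost in the coarse pass.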
The LeapQuest team at Shanghai Innovation Institute, in collaboration with multiple universities, introduces a new medical AI paradigm that enables models to actively use visual tools during reasoning, transforming from passive input receivers to active evidence seekers. Two papers are accepted at ICML 2026.
LeapQuest proposes Ophiuchus and MedScope for medical images and videos, adopting the Think with Images/Videos paradigm.
Ophiuchus-7B achieves an average score of 68.0 on 8 VQA benchmarks, surpassing o3 (62.2) and GPT-5 (59.9).
This paper introduces Simulation-Informed Diffusion (SID), a decentralized framework using constraint-aware diffusion models (CADM) to first simulate neighbors' future trajectories and then plan own trajectories under safety constraints. SID enables a minimal communication scheme triggered only in congested scenarios and outperforms baselines, scaling to 108 robots and 160 obstacles.
SID uses CADM to simulate neighbor trajectories for decentralized collision avoidance
Minimal communication scheme coordinates only when necessary
This paper presents a transformer-based architecture called Trinity that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation in a unified network. It segments terrain regions based purely on visual appearance without predefined labels or robot-dependent traversability scores, enabling robot-agnostic visual terrain priors for downstream tasks. The authors extend the OAISYS simulator to create the RUGDSynth synthetic dataset and provide the EXTerra real-world dataset. Experiments demonstrate the approach's effectiveness in complex outdoor environments.
Trinity architecture unifies class-agnostic terrain segmentation with semantic segmentation
Segments terrains based on visual appearance without predefined labels for better transferability
Researchers introduce Speak-to-Objective, a modular agentic pipeline that uses a conditioned LLM to translate spoken or written commands into fully differentiable objective functions for assembling microparticles in a constraint-aware inverse solver and on an experimental optofluidic platform. The approach separates what to assemble from how to actuate, learns from user feedback, and demonstrates natural-language-programmable microscale assembly using laser-induced thermoviscous flows.
Speak-to-Objective pipeline translates natural language into differentiable objective functions for microparticle assembly.
It uses a perceive->compose->propose->act->report&learn loop, treating the objective as the interface between intent and actuation.
Uni-LaViRA is a unified agentic architecture for embodied navigation that reduces each navigation decision to a single Language-Vision-Robot Actions Translation. It leverages pretrained MLLMs in a zero-shot manner across four task families and four real robots, using TODO List Memory and Second Chance Backtrack mechanisms to achieve self-correcting navigation without training.
Generality in navigation can be obtained structurally, not only through data scale.
Uni-LaViRA decomposes navigation into a language action (semantic direction) and a vision action (pixel target), both within the output manifold of MLLMs.
SCALE-COMM is a self-supervised framework that decouples communication learning from policy optimization, learning compact, stable, and policy-relevant latent messages to improve coordination in multi-agent reinforcement learning. It outperforms existing methods on benchmarks and a realistic warehouse task, offering better stability, sample efficiency, and throughput.
Decouples communication learning from policy optimization to reduce interference.
Uses contrastive learning to enforce consistency across agents and time.
This work proposes representation-conditioned diffusion models that leverage learned representations from DINOv2, DINOv3, and CLIP to generate synthetic image data. On ImageNet100, this approach outperforms class-conditioned generation by +10.76 p.p. top-1 accuracy. Scaling synthetic data can even surpass real-data training by +2.0 p.p. The method also excels in data augmentation and sample filtering, offering a promising way to augment or replace real datasets in large-scale visual learning.
Representation-conditioned diffusion models outperform class-conditioned ones by 10.76 p.p. on ImageNet100.
Scaled synthetic datasets can beat real-data-trained classifiers by 2.0 p.p. top-1 accuracy.
This paper proposes an interpretation method for Transformer models with heterogeneous attention structures, including semantic and logical interpretation, validated through experiments.
Categorizes Transformer attention into homogeneous and heterogeneous types; the heterogeneous type processes information from different sources.
Proposes a generic interpretation method for heterogeneous attention structures.
This paper proposes a method for automating bridge damage understanding and repair priority scoring using fine-tuned Vision-Language Models (VLMs). The authors fine-tune LLaVA-1.5-7B with QLoRA on up to 4,000 paired bridge damage images and inspection text records, evaluating on a fixed test set of 800 images. Results show that 2,000 training samples achieve near-optimal validation loss in 2.9 hours, with diminishing returns beyond that. A two-stage Quality Guard using a fine-tuned Swallow-8B SLM rejects low-quality VLM outputs before priority scoring.
Fine-tuned LLaVA-1.5-7B model for automated bridge damage identification and priority scoring
2,000 training samples achieve near-optimal performance; more data yields diminishing returns
The 10th ABAW Workshop and Competition at CVPR 2026 advances multimodal human-centered AI by introducing new challenges including emotional mimicry intensity estimation, ambivalence/hesitancy recognition, and fine-grained violence detection, alongside traditional affect estimation and recognition tasks. The competition leverages large-scale in-the-wild datasets, and the paper track covers a broad range of topics from pose estimation to fairness and robustness.
Large language models (LLMs) are increasingly used as proxies for computational social analysis, but their ability to faithfully represent human communities' 'thick descriptions' remains a critical challenge. This paper introduces CARE (Community-Aware Reaction Evaluation), a reaction-centered framework that benchmarks LLM-simulated discourse against authentic community responses to real-world news. By characterizing a fine-grained spectrum of illocutionary tones, the diagnosis reveals a persistent 'realism gap': steering LLMs with explicit community prompts fails to inherently improve simulation fidelity. Analysis further identifies divergent behavioral signatures among frontier models, suggesting current alignment strategies are insufficient for capturing the sociolinguistic dynamics of online groups.
CARE framework evaluates LLM simulation fidelity by analyzing authentic community reaction tones
Current LLM alignment strategies fail to adequately capture online community sociolinguistic dynamics
A new framework called FLUID adapts autoregressive language models to diffusion models for efficient parallel text generation, using Strictly Causal Alignment to reuse GPT checkpoints and Elastic Horizons to dynamically adjust denoising steps. It achieves state-of-the-art performance with significantly reduced training costs.
FLUID bridges AR and diffusion models by enforcing Strictly Causal Alignment, enabling initialization from GPT-style checkpoints.
Elastic Horizons uses entropy to dynamically adapt denoising strides based on local information density.
Researchers identify a Stability-Expressivity Gap in spoken language models when using synthetic data for low-resource languages, and propose two self-alignment frameworks (DGSA and TDSC) that recover prosodic variability and outperform commercial systems like ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.
Spoken Language Models (SLMs) for low-resource languages suffer from a trade-off between phonetic accuracy and prosodic expressivity when trained on synthetic data.
The proposed Disentanglement-Guided Self-Alignment (DGSA) recovers expressivity by separating prosody and timbre.
BioELX is a novel two-stage framework for cross-lingual biomedical entity linking that requires no annotated training data. It enhances SapBERT with multilingual aliases from Wikidata and uses a pre-trained LLM for context-aware disambiguation. Experiments on five benchmarks show significant improvements, especially for low-resource languages like Turkish, Korean, and Thai.
Proposes BioELX, a zero-shot cross-lingual BEL framework using alias-based retrieval and LLM ranking.
In Stage 1, enriches SapBERT with multilingual aliases from Wikidata for better candidate retrieval.
RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.
RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
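Because the results are reported in both micro-F1 and macro-F1, it helps to see how the two differ: micro-F1 pools all code decisions together, while macro-F1 averages per-code F1 so that rare codes count as much as frequent ones. The sketch below computes both for multi-label predictions; the placeholder labels and data are invented for illustration and have nothing to do with MDACE.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold, pred):
    """gold, pred: lists of label sets, one set per document."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:
            tp[code] += 1  # predicted and correct
        for code in p - g:
            fp[code] += 1  # predicted but not in gold
        for code in g - p:
            fn[code] += 1  # in gold but missed
    codes = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in codes) / len(codes) if codes else 0.0
    return micro, macro

# Placeholder labels "A", "B", "C" stand in for diagnosis codes.
gold = [{"A", "B"}, {"A"}, {"A", "C"}]
pred = [{"A"}, {"A"}, {"A", "B"}]
print(micro_macro_f1(gold, pred))
```

In this toy case the frequent code is always right and the rare ones are always wrong, so micro-F1 (2/3) is twice macro-F1 (1/3), which is why papers in this area tend to report both.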
This paper proposes novel techniques for inter-utterance style interpolation and intra-utterance style transition in prompt-based TTS models, addressing limitations of coarse global control. Methods include direction vector interpolation and KV-cache swapping with sliding-window attention masking. Experiments show high success rates in gender conversion and smooth style transitions within utterances.
Inter-utterance interpolation via direction vectors between contrastive style prompts enables smooth transitions.
Intra-utterance transition uses KV-cache swapping and sliding-window masking to overcome attention bias.
Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.
ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.
ICG is a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant cover images. It extracts semantic features via meta tokens, refines them with user embeddings, and injects personalized context into diffusion models. A multi-reward learning strategy combines public rewards with a personalized preference model, eliminating the need for labeled supervision. Experiments show improvements in image quality, semantic fidelity, and personalization, boosting user appeal and recommendation accuracy.
ICG integrates MLLM prompting with personalized preference alignment for end-to-end cover image generation.
Semantic features are extracted via meta tokens and refined with user embeddings for diffusion model injection.
This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning. ADS decouples logit shift into architecture and data dependencies, requiring only few data samples to capture shift trends. Experiments across over 175 architectures show strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error for reliable CL model selection across three datasets and six scenarios.
Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
This survey explores how Mixture-of-Experts (MoE) effectively addresses multimodal learning challenges from three perspectives: efficient engine, representation learner, and adapter, while identifying research gaps.
MoE enables scalable multimodal modeling by decoupling computational cost from parameter growth.
MoE integrates complementary expert knowledge for enriched alignment and interaction representations.
This paper presents $E^3$-Agent, an executable and evolving agent for resource management of edge AIGC. It separates a fast-path router from a slow-path LLM meta-controller, learns online from execution feedback, and adapts to unknown time-varying service-time mappings. Evaluation shows 65%-73% latency reduction over static baselines and effective stutter suppression.
Edge generative inference faces unknown per-device performance and non-stationarity.
$E^3$-Agent uses a dual-path architecture: fast router + slow LLM meta-controller.
This paper presents a multi-agent architecture for autonomous insight discovery over real-time data streams. It uses Apache Kafka, Flink, and large language models to continuously generate, validate, and visualize hypotheses, shifting from reactive query-driven analytics to proactive discovery-driven systems.
Proposes multi-agent architecture for autonomous discovery of insights in real-time streams.
Integrates Kafka, Flink, and LLMs for hypothesis generation, validation, and visualization.
LaneRoPE enables multiple LLM sequences to collaborate during generation via inter-sequence attention and extended RoPE, improving accuracy on math reasoning tasks with minimal architectural changes and negligible inference overhead.
Introduces inter-sequence attention mask to make sequence sampling dependent.
Extends RoPE to capture relative positions both within and across sequences.
This paper proves that large language models have a fundamental limitation in performing causal discovery: methods like supervised fine-tuning, direct preference optimization, and in-context learning cannot distinguish between causal graphs that generate similar observational data. The authors propose Agentic Causal Bayesian Optimization (A-CBO), where a frozen language model serves as an interventional oracle and an external Bayesian loop converges to candidate graphs in logarithmically many rounds. On Corr2Cause, A-CBO matches fine-tuned baselines without any training; on Extended Corr2Cause (scaling to 24 variables and 18K test samples), A-CBO significantly outperforms both fine-tuning and preference optimization.
Proves that LLM failure in causal discovery is fundamental, due to a kernel obstruction theorem
Proposes A-CBO, combining a frozen LLM with external Bayesian optimization
DynaSchedBench introduces a diagnostic framework for DFJSP using a Sequential Event-Space Calibrator (SESC) to generate difficulty-stratified instances via Schedule Stress Index (SSI). It identifies an 'Observability Paradox' in LLM-based scheduling agents: providing oracle access to full structural information degrades performance compared to concise information. Tool-augmented and refinement strategies also fail to reliably improve performance.
DynaSchedBench uses SESC and SSI to generate calibrated DFJSP instances, outperforming evolutionary baselines in efficiency.
LLM agents exhibit an Observability Paradox: full structural information harms decision-making.
Soro is a family of Tajik-specialized conversational LLMs built on Gemma 3, using 1.9B token Tajik continual pretraining and 40K instruction tuning examples. It substantially outperforms same-size Gemma 3 on Tajik benchmarks while retaining English performance. FP8/INT4 quantization preserves gains for edge deployment. An education pilot is underway in Tajikistan.
Based on Gemma 3, with 1.9B token Tajik continual pretraining and 40K instruction tuning examples.
Substantially outperforms same-size Gemma 3 on Tajik benchmarks, retains English performance.
This paper introduces an LLM-based architecture to detect and quantify the intensity of human values in text. The architecture comprises three coordinated modules that can adapt to various value theories, and experiments on the ValueEval dataset show good detection performance.
Proposes a modular LLM architecture for identifying human values in text, avoiding dependence on specific value theories or complex prompt engineering.
Three modules: generate structured value specifications, label texts using them, and assign graded support or resistance based on rhetorical and semantic evidence.
This paper presents a world model of protein biology realized through language modeling, demonstrating how large-scale language models can understand and predict protein structure and function.
Language models can capture complex patterns in protein sequences
The model excels in protein structure prediction and function annotation
Researchers from Sakana AI and the University of Tokyo propose DiffusionBlocks, which trains transformer-based networks one block at a time, reducing training memory by a factor of B (where B is the number of blocks) while maintaining performance across diverse architectures. The method interprets residual connections as Euler steps of reverse diffusion, enabling a principled local objective via score matching.
DiffusionBlocks partitions networks into B independently trainable blocks, reducing memory by B×.
It leverages the connection between residual networks and diffusion models to provide a theoretically grounded local training objective.
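The link between residual connections and Euler steps can be written out in generic notation. The symbols below are assumptions chosen for illustration, not the paper's own formulation.

```latex
% Update performed by residual block l (generic notation):
x_{l+1} = x_l + f_{\theta_l}(x_l)

% Explicit Euler step of size \Delta t for the ODE dx/dt = v(x, t):
x(t + \Delta t) \approx x(t) + \Delta t \, v\bigl(x(t), t\bigr)
```

The two lines have the same form if the block output f_{\theta_l}(x_l) plays the role of \Delta t \, v(x(t_l), t_l). Under that reading each block covers one slice of the trajectory, which is consistent with the summary's claim that blocks can be trained independently.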
SQLite has added an AGENTS.md file to clarify its policy on AI-generated contributions: it does not accept pull requests without prior agreement, and does not accept agentic code at all, though it welcomes bug reports with reproducible test cases. The forum has been flooded with AI-generated bugs, leading to a separate bug forum.
SQLite added AGENTS.md to define AI contribution policy
Pull requests require prior agreement and legal paperwork
At Databricks, we’ve built a unique inference platform that serves every frontier model, from open source to proprietary, powering some of the largest agentic applications. Serving over 120T tokens per month, we tackle challenges of reliability and latency through abstractions like model units for capacity management, cost-aware load balancing and autoscaling that save over 80% GPU costs, and runtime reliability mechanisms including black-box health checks that detect silent failures. Profiling multimodal bottlenecks unlocked 3x throughput gains.
Databricks' inference platform serves frontier models including open source and proprietary, handling 120T tokens/month.
Model units provide a VM-like abstraction for capacity management, enabling cost-aware routing and scaling.
Artificial Analysis and IBM launch ITBench-AA, a benchmark for agentic enterprise IT tasks focusing on Site Reliability Engineering. Frontier models score below 50%, with Claude Opus 4.7 leading at 47%. The benchmark evaluates models on Kubernetes incident response, requiring diagnosis from logs and traces.
Claude Opus 4.7 leads at 47%, with GPT-5.5 at 46% and Qwen3.7 Max at 42%.
All frontier models score below 50%, making ITBench-AA one of the least saturated agentic benchmarks.
NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.
Polar enables RL training on any agent harness via a model API proxy without modifying the harness code
Achieves up to a 22.6-point improvement on SWE-Bench Verified using GRPO on Qwen3.5-4B across four coding harnesses
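The proxy idea can be sketched in a few lines. This is a minimal illustration, not Polar's code or API: `RecordingProxy`, `toy_backend`, and `unmodified_harness` are hypothetical names, and the fake backend stands in for a real inference server. The point is that the harness only ever calls a completion function, while the proxy logs token ids and rebuilds per-turn training samples with a loss mask over model-generated tokens.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class RecordingProxy:
    """Hypothetical model API proxy: forwards calls and logs token ids."""
    backend: Callable[[List[int]], List[int]]
    turns: List[Tuple[List[int], List[int]]] = field(default_factory=list)

    def complete(self, prompt_ids: List[int]) -> List[int]:
        completion_ids = self.backend(prompt_ids)
        # Copy both lists so later mutation by the harness cannot alter the log.
        self.turns.append((list(prompt_ids), list(completion_ids)))
        return completion_ids

    def trajectory(self) -> List[Tuple[List[int], List[int]]]:
        """Per-turn (tokens, loss_mask); the mask is 1 only on generated tokens."""
        samples = []
        for prompt_ids, completion_ids in self.turns:
            tokens = prompt_ids + completion_ids
            mask = [0] * len(prompt_ids) + [1] * len(completion_ids)
            samples.append((tokens, mask))
        return samples

def toy_backend(prompt_ids: List[int]) -> List[int]:
    # Fake inference server (assumption): last two prompt tokens, each plus one.
    return [t + 1 for t in prompt_ids[-2:]]

def unmodified_harness(complete: Callable[[List[int]], List[int]]) -> List[int]:
    # The harness sees only a completion function and knows nothing of the proxy.
    first = complete([10, 11, 12])
    return complete([10, 11, 12] + first + [20])

proxy = RecordingProxy(toy_backend)
final = unmodified_harness(proxy.complete)
samples = proxy.trajectory()
```

Masking prompt tokens keeps the policy-gradient loss on tokens the model actually produced, which is one common way to make captured interactions trainer-ready.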
The article argues that Anthropic and OpenAI have achieved product-market fit by shifting enterprise customers to API-based pricing and capitalizing on coding agent products. This inflection point, which began with model improvements in November 2025, accelerated in April 2026 with new model releases and pricing changes.
Both Anthropic and OpenAI have moved enterprise plans to API token pricing, with coding agents like Claude Code and Codex driving significant usage and revenue.
April 2026 saw new frontier models with higher API prices and enterprise customers locked into those rates via contract renewals.
A new analysis shows that top AI forecasters adjust their AGI timelines based on which lab is currently leading the field, with predictions swinging from earlier to later and back again as the dominant lab changes from ChatGPT to xAI/Meta/Gemini to Anthropic.
Predictions for when most cognitive labor will be automated (AGI) fluctuate significantly based on which AI lab is currently dominant.
From 2023-2025, most researchers moved AGI timelines earlier; from 2025-2026, they moved them later; in early 2026, under Anthropic's rapid progress, they moved them earlier again.
This article contrasts the sense of connection from the early web with the isolating experience of modern AI, arguing that while AI is a useful tool, it cannot replace human interaction, and questions whether AI has genuinely social applications.
The early web fostered a collective 'we' experience, whereas AI interactions are often solitary.
The author considers AI a great tool, but not a person or a substitute for one.
Major AI models exhibit a secular-rational bias, ignoring religious perspectives in ethical questions. All tested models show a negative view of Jehovah's Witnesses, according to a study by a consortium of religious universities.
AI models rarely invoke religious perspectives in responses to ethical or personal queries, exhibiting an 'omissive bias'.
Every tested AI model had a negative bias toward Jehovah's Witnesses.
The article explores how AI is driving a paradigm shift in digital product design, moving from command-driven to intent-driven interaction, and analyzes the new challenges in product management, user experience, decision logic, release cycles, risk, and value creation.
AI represents the third user-interface paradigm in computing history, shifting from deterministic to probabilistic outputs.
Product teams must rethink the entire lifecycle from discovery to delivery; data strategy and model performance become as critical as feature strategy.
Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.
Robot 'Lightning' beats human world record in Beijing half marathon.
China commits over £100bn to robotics investment over two decades.
Researchers propose a real-time asynchronous event-based monocular odometry for planetary rovers, using an Error-State Kalman Filter to process event camera data for robust ego-motion estimation under high dynamic range lighting and computational constraints.
Event cameras provide asynchronous pixel-wise brightness changes with microsecond resolution, ideal for high-speed sensing and HDR environments.
The approach uses an Error-State Kalman Filter to continuously estimate camera motion from event streams.
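For readers unfamiliar with the error-state formulation, here is a deliberately tiny one-dimensional sketch. It is not the paper's filter: the scalar position state, the noise values `Q` and `R`, and the function names are assumptions for illustration. The nominal state is propagated by the motion model, the filter estimates only the error relative to it, and that error is injected back into the nominal state at each measurement.

```python
def predict(x_nom, P, velocity, dt, Q):
    """Propagate the nominal state and grow the error covariance."""
    return x_nom + velocity * dt, P + Q

def update(x_nom, P, z, R):
    """Estimate the error state from measurement z, inject it, and reset."""
    K = P / (P + R)                # Kalman gain for a direct position measurement
    delta = K * (z - x_nom)        # estimated error of the nominal state
    return x_nom + delta, (1.0 - K) * P

# Toy run: start at 0 with unit uncertainty, move at 1 unit/s for 0.1 s,
# then receive a position measurement of 0.3 with variance 0.5.
x_pred, P_pred = predict(0.0, 1.0, velocity=1.0, dt=0.1, Q=0.01)
x_new, P_new = update(x_pred, P_pred, z=0.3, R=0.5)
print(x_pred, P_pred, x_new, P_new)
```

In an event-camera setting the measurements would arrive asynchronously, so `predict` would be called with whatever `dt` has elapsed since the previous event batch.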
A new benchmark called What-If World tests video generation models' causal reasoning by presenting paired prompts that differ in one physical detail and checking whether the generated videos diverge correctly. Of the nine state-of-the-art models evaluated, none exceeds 52% on paired scores, and open-source models score around 28%, indicating significant room for improvement. Performance correlates with visual prominence rather than physics tractability.
What-If World benchmark uses 319 prompt pairs with single variable changes to test causal understanding in video generation models. It is built on real frames from nuScenes and DROID.
Scoring uses the APEO rubric (Adherence, Physics, Environment, Outcome). All nine models struggle: the best paired score is 52%, and open-source models average 28%.
A prospective single-center clinical validation of the Melanoscope AI mobile dermoscopy CDSS demonstrated 88.6% agreement with expert assessment on 176 patients, with no false negatives and 88.3% specificity. The study developed a quantitative interpretability method for cascade deep learning models and a three-zone patient routing algorithm, supporting reproducible and interpretable decision-making for skin cancer screening in resource-limited settings.
The Melanoscope AI system achieved 88.6% agreement with experts on 176 patients, with zero false negatives among 5 malignant lesions.
Specificity reached 88.3%, with 3 melanomas and 2 basal cell carcinomas histologically confirmed.
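The reported figures can be cross-checked with a short calculation. This is a sanity check under stated assumptions, not the study's analysis: it assumes one lesion per patient and treats "agreement" as binary malignant-versus-benign accuracy against the expert reference.

```python
patients = 176
malignant = 5                  # 3 melanomas + 2 basal cell carcinomas
false_negatives = 0
reported_specificity = 0.883

benign = patients - malignant
true_positives = malignant - false_negatives
# Recover the integer count of correctly cleared benign cases.
true_negatives = round(reported_specificity * benign)
false_positives = benign - true_negatives

sensitivity = true_positives / malignant
specificity = true_negatives / benign
agreement = (true_positives + true_negatives) / patients

print(f"benign={benign} TN={true_negatives} FP={false_positives}")
print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} agreement={agreement:.1%}")
```

Under these assumptions the numbers are mutually consistent: 151 of 171 benign cases cleared gives 88.3% specificity, and 156 correct calls out of 176 gives 88.6% agreement. With only five malignant lesions, the zero-false-negative result rests on a small sample.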
This paper presents a behavioral-level activity recognition method using head-mounted IMU, going beyond basic motion primitives. The authors define five behavioral categories, construct a 160K-sample dataset from Ego4D with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior models on action and scenario recognition. Observability analysis reveals locomotion is reliably observable, while object transfer and task operation benefit from temporal context; scenario-dependent signal overlap remains a challenge. Results show that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size.
Proposes HiT-HAR, a hierarchical model for behavioral activity recognition from head-mounted IMU, going beyond motion primitives
Constructs a 160K-sample Ego4D dataset with 8 scenarios and 5 behavioral categories, using a four-tier quality assurance framework
This paper introduces Metric-Aware Principal Component Analysis (MAPCA), which parameterizes PCA with a positive-definite metric matrix and positions it within the geometric deep learning framework. MAPCA interprets the metric as a geometric prior, its solutions are equivariant under the orthogonal group preserving the metric, and its spectrum is invariant. A uniqueness theorem characterizes Invariant PCA (IPCA) as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling. The paper also discusses extensions to kernel PCA, spectral graph methods, and deep MAPCA.
MAPCA parameterizes PCA with a positive-definite metric matrix, linking geometric deep learning symmetry and equivariance concepts.
A uniqueness theorem shows that IPCA is the unique linear data-derived metric in the MAPCA family equivariant under diagonal rescaling.
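The rescaling property can be demonstrated on a two-feature toy example. This sketch rests on two assumptions that the summary does not confirm: that MAPCA solves the generalized eigenproblem Σv = λMv for a metric M, and that IPCA corresponds to the data-derived diagonal metric M = diag(Σ). It is restricted to 2×2 matrices so the eigenvalues have a closed form and no numerical library is needed.

```python
import math

def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def metric_spectrum(cov, metric_diag):
    """Eigenvalues of M^{-1} Sigma for a diagonal positive-definite metric M."""
    sxx, sxy, syy = cov
    m1, m2 = metric_diag
    trace = sxx / m1 + syy / m2
    det = (sxx * syy - sxy * sxy) / (m1 * m2)
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return (trace + disc) / 2.0, (trace - disc) / 2.0

data = [(1.0, 2.0), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8), (5.0, 6.1)]
rescaled = [(1000.0 * x, 0.01 * y) for x, y in data]   # arbitrary unit change

cov_a, cov_b = covariance_2d(data), covariance_2d(rescaled)

# Ordinary PCA (identity metric): the spectrum depends on the units.
pca_a = metric_spectrum(cov_a, (1.0, 1.0))
pca_b = metric_spectrum(cov_b, (1.0, 1.0))

# Data-derived diagonal metric M = diag(Sigma): the spectrum is unit-free.
ipca_a = metric_spectrum(cov_a, (cov_a[0], cov_a[2]))
ipca_b = metric_spectrum(cov_b, (cov_b[0], cov_b[2]))
```

With M = diag(Σ) the two eigenvalues reduce to 1 ± |r|, where r is the correlation coefficient, which is why changing units on either axis leaves them untouched.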
Research shows that diagonal state space models (S4D) outperform more complex Mamba architectures in time series classification tasks. The authors propose lightweight variants MS4 and MS4N, which achieve higher accuracy and efficiency on 59 datasets, matching deep learning models with 2x to 10x more parameters.
S4D consistently outperforms Mamba-based variants in accuracy and efficiency on TSC benchmarks.
Proposed MS4 and MS4N models use simple modifications like linear input projection and channel mixing.
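To make "diagonal state space model" concrete, the sketch below runs a generic diagonal SSM as a recurrence with zero-order-hold discretization. It illustrates the model family only, not the MS4 or MS4N variants: the parameter values are arbitrary, and taking the real part of the output is a simplification of how S4D-style layers handle conjugate pairs.

```python
import cmath
import math

def discretize_zoh(a, b, dt):
    """Zero-order-hold discretization of one diagonal mode dx/dt = a*x + b*u."""
    a_bar = cmath.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def diagonal_ssm(inputs, A, B, C, dt):
    """Run the recurrence x_k = a_bar*x_{k-1} + b_bar*u_k, y_k = Re(sum_i c_i*x_i)."""
    modes = [discretize_zoh(a, b, dt) for a, b in zip(A, B)]
    state = [0j] * len(A)
    outputs = []
    for u_k in inputs:
        # A is diagonal, so every state dimension updates independently.
        state = [a_bar * x + b_bar * u_k for (a_bar, b_bar), x in zip(modes, state)]
        outputs.append(sum(c * x for c, x in zip(C, state)).real)
    return outputs

# Single real mode with a constant input: the output follows 1 - exp(-t).
step_response = diagonal_ssm([1.0] * 20, A=[-1.0], B=[1.0], C=[1.0], dt=0.1)

# Two complex modes give a damped oscillatory response to an impulse.
impulse = [1.0] + [0.0] * 19
oscillation = diagonal_ssm(impulse, A=[-0.5 + 3j, -0.5 - 3j],
                           B=[1.0, 1.0], C=[0.5, 0.5], dt=0.1)
```

Since the state matrix is diagonal, each step costs time linear in the state size, which is part of why such models can stay small and fast on time series classification.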
This paper argues that within-person behavioral variability stems from dynamic latent states, not solely from observable inputs. By intervening on the state's weighting at decision time, outcomes become causally controllable. The framework integrates six lines of evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) and a 24-month observational dataset from over 200,000 users. It yields seven testable predictions and six operational requirements for state-aware systems, with implications for digital health, education, AI personalization, and personal agency.
Human behavioral variability is explained by dynamic latent states, not solely by observable inputs.
State is defined as a time-indexed weighting vector; intervening on state can causally control outcomes.
Machine unlearning verification typically focuses on output-level metrics, but a model can pass these while still encoding forgotten data in its internal representations. This paper introduces RULER, a set of representation-level verification metrics, including oracle-comparative M2 and oracle-free M4. Experiments show that approximate unlearning methods pass output-level tests but exhibit significant residuals in representation-level analysis.
Current output-level verification for machine unlearning is insufficient as models may retain forgotten data in intermediate representations.
RULER introduces two representation-level metrics: M2 (requires oracle model) and M4 (oracle-free).
Analogous to the origin of species, this paper addresses the origin of synthetic information, proposing a steganography-based mechanism to trace the lineage of AI-generated content, crucial for maintaining truth and trust in an era of advanced generative models.
Synthetic information origin is a fundamental mystery in information science with deep societal impact.
The authors propose a steganographic method to embed hereditary traits into synthetic data.
Microsoft's MAI-Image-2.5 ranks third on Arena's text-to-image leaderboard, on par with Google's Nano Banana 2 but still behind OpenAI's Image-2. The model shows clear gains over its predecessor, especially in rendering text inside images and commercial visuals.
MAI-Image-2.5 ranks third on Arena leaderboard, tied with Google's Nano Banana 2
Improvements in text rendering and commercial visuals
The Vatican's new encyclical by Pope Leo XIV defends human imperfection as a source of dignity and warns against outsourcing core human capabilities to AI, countering Silicon Valley's dismissal of human limitations.
Pope Leo XIV's encyclical 'Magnifica Humanitas' defends human finitude as a source of beauty and dignity.
The document warns against AI making moral decisions and centralizing power in tech elites.
Simple Wearable Report turns Oura data into a lab-style report. The free tool provides an option to upload to chatbots, allowing further AI analysis. Here's how I've been using it.
Simple Wearable Report transforms Oura Ring data into scannable reports for sharing with doctors or uploading to AI chatbots.
Compared to Oura's built-in AI advisor, third-party chatbots like Gemini provide more detailed, quantitative analysis.