AI News HubLIVE

Today's must-reads

Agents

This Week in AI: Production Viability

On this week's episode, host Andreas Welsch and guests Maya Mikhailov and Doug Shannon discuss OpenAI's move into personal finance, metacognition as a professional skill, the backlash against token-based productivity metrics, and the limitations of forward-deployed engineers. The core theme: the AI industry is good at generating output but still figuring out what output is valuable.

  • OpenAI's transaction data analysis aims to infer consumer intent for advertising, not just spending tracking.
  • Metacognition is a critical skill: humans must decide when to offload to AI and when to retain judgment to avoid 'cognitive surrender.'
In-site article

CrankGPT: A human-powered local and private AI solution

CrankGPT is a fully local, human-powered AI device that runs on your own calories, offering privacy, energy independence, and a workout while keeping data away from big tech.

  • CrankGPT is a human-powered AI that runs locally without internet or cloud.
  • Offers hand-crank, pedal, and gym partnership models for different needs.
In-site article

A curated list of AI for developers

A curated list of AI-powered coding tools: editors, agents, code completion, review assistants, testing, and more. For developers, teams, and tech enthusiasts looking to leverage AI in software engineering.

  • Over 100 AI coding tools categorized by use case.
  • Includes code editors (Cursor, Copilot), coding agents (Devin, Claude Code), app builders (Bolt.new, Lovable), and more.
In-site article

What Dot.com Bandwidth Taught Me About the AI Token Cost Panic

The author draws on his early career experience with the dot-com bandwidth crisis to draw parallels with today's AI token cost anxiety. By reviewing bandwidth's journey from expensive to negligible, he argues token costs will also fall due to market competition, hardware optimization, and model efficiency gains, advising developers to optimize now while recognizing the constraint as temporary.

  • In the late 1990s, a T1 line cost $1,000/month and was a primary design constraint; a decade later, bandwidth costs became negligible.
  • Current AI token costs mirror early bandwidth constraints, but strategies like caching, model selection, and prompt optimization can reduce costs.
In-site article

Satya Nadella publicly torches a VP's plan to make Microsoft's AI agent deliberately addictive

Microsoft CEO Satya Nadella has sharply criticized an internal memo proposing to make users "addicted" to the company's new AI agent Scout. "Not sure who is writing and leaking this nonsense," Nadella wrote to about 50 top engineers. AI should empower people, and Scout should actually lead to less screen time.

  • Microsoft CEO Satya Nadella publicly criticized an internal memo that proposed making the AI agent Scout addictive.
  • Nadella questioned who wrote and leaked the memo, calling it nonsense.
In-site article

AI Agents Enable Adaptive Computer Worms

Researchers have created an adaptive computer worm powered by small open-weight AI models that autonomously identifies and exploits vulnerabilities to spread across networks, representing a qualitative shift in cyber threats.

  • Small open-weight LLMs are sufficient to build an adaptive worm that does not rely on commercial AI platforms.
  • The worm self-replicates across heterogeneous networks and parasitically uses victims' compute resources.
In-site article

The latest AI news we announced in May 2026

In May 2026, Google announced a slew of AI updates including Gemini 3.5 and Gemini Omni models, Android Halo, Universal Cart, Google Health app, Fitbit Air, and more, focusing on making AI more proactive and integrated into daily life.

  • Launched Gemini 3.5 for agentic tasks and Gemini Omni for creative generation.
  • Android Halo manages agents; Universal Cart simplifies shopping across services.
In-site article
Chips

AI investment 2nd round, from GPU to power·industrial goods·space

AI investment is shifting from GPUs to broader infrastructure including power, cooling, optical communication, and space. Recent US employment data was strong but driven by service sectors, while AI-related stocks paused as funds rotated into other AI beneficiaries. China focuses on AI self-sufficiency and robotics supply chain.

  • AI investment is expanding beyond GPUs to power, cooling, optical communication, and industrial infrastructure.
  • US employment report showed strength in leisure, government, and healthcare, not IT or manufacturing.
In-site article
Models

Anthropic urges ‘temporary pause’ on AI development to discuss risks

Anthropic has proposed a worldwide pause on AI development and plans to convene policymakers to discuss advanced AI dangers, though some critics see it as a marketing move.

  • Anthropic suggests a temporary global halt to AI development.
  • The company will gather policymakers to address AI risks.
In-site article
Startups

Prompt: Anthropic's IPO Filing Signals AI's Next Phase

The next chapter of AI could depend less on breakthrough models and more on the resources required to build and sustain them.

  • Anthropic's IPO filing indicates a shift in AI industry focus.
  • Future AI progress may hinge on resource availability rather than model innovation.
In-site article
Other updates (132)
Policy

Scientists in 'autonomous laboratories' are starting to outsource work to robots

MIT alumni founded Ginkgo Bioworks to replace human lab workers with AI-powered robots. The company now runs an autonomous lab and collaborated with OpenAI to have AI design proteins, cutting costs by 40%. Scientists oversee the robots, but experts warn of biosecurity risks if AI democratizes access to biotechnology.

  • Ginkgo Bioworks struggled to raise funds initially but now has a fully automated lab with pipetting robots.
  • AI and robots can now design, execute, and record experiments, shifting scientists to supervisory roles.
In-site article

Green AI: A Unified Theory of Computational Waste

A paper introduces a unified theory attributing computational waste in AI and simulation to an ontological error of using external measurement scales. The Ontometric Relational Calculus framework derives the O=D² law, showing quadratic overhead from unit distortion. By letting systems be their own measure, optimization overhead collapses to a constant, enabling scale invariance, zero-shot phase transition extrapolation, and true Green AI.

  • Computational waste in AI stems from imposing external measurement scales on self-contained systems.
  • The O=D² law reveals quadratic overhead scaled with unit-system distortion.
In-site article

Preprint warns of catastrophic AI risks if no action is taken within five years

A survey of 272 AI experts finds at least a 10% probability of catastrophic outcomes from AI within five years. Experts prioritize AI cyberattacks, weapons development, competitive pressures, and governance failure as top risks. Even with mitigations, five risk categories remain above the 10% threshold.

  • 272 AI experts assess at least a 10% chance of catastrophic AI outcomes in five years.
  • Top risks include AI cyberattacks, weapons development, competitive pressures, and governance failure.
In-site article

New claimants seek to sue Elon Musk’s xAI after Labour MP’s test case

Jess Asato’s lawyer says others want to take action over demeaning sexualised material created by Grok AI tool

  • New claimants have contacted Jess Asato's lawyer to sue xAI over Grok-generated demeaning content.
  • The Labour MP launched a test case over fake bikini images and an AI video depicting her being chloroformed for sexual assault.
In-site article

The Pentagon is running an AI propaganda mill targeting Latin America

An investigation by The Intercept reveals that the U.S. military is using an AI-driven content website, La Tilde, to spread propaganda to Latin American users. The site masquerades as a modern media brand but is operated by U.S. Special Operations Command South, with much of its content generated by AI and a minimal disclosure of government funding.

  • La Tilde is a Pentagon-funded AI propaganda website targeting Latin America, operated by U.S. Special Operations Command South.
  • The site blends personal finance tips with articles praising U.S. military operations, and AI detection tools indicate much content is machine-generated.
In-site article

Recovering Physically Plausible Human-Object Interactions from Monocular Videos

This paper presents RePHO, a physics-guided reconstruction framework that recovers physically plausible human-object interactions from monocular videos. It starts with a kinematic estimate and refines it via reinforcement learning in a physics simulator, using an adaptive sampling strategy to handle noisy estimates. Results show clear improvements on two benchmarks.

  • Existing kinematic methods produce interpenetration and object floating
  • RePHO combines kinematic estimates with RL to optimize interactions in a simulator
In-site article

South Korean Forums Will Need to Scan Every Images with AI Censorship Tools

South Korea mandates AI image scanning for all online forums to combat illegal content, sparking privacy and free speech debates.

  • New regulation requires AI scanning of all images uploaded to South Korean forums.
  • Goal is to quickly identify and remove pornographic, violent, and other illegal content.
In-site article

Senior U.S. Officials Eye Government Shares in AI Giants

Senior U.S. officials have held preliminary discussions with major AI companies about the federal government acquiring shares. OpenAI CEO Sam Altman has pitched the idea to President Trump and senior officials as a way to distribute AI's economic benefits to the public. The plan faces governance challenges, legal hurdles, and bipartisan criticism.

  • Sam Altman first proposed government equity stake to President Trump in early 2025 and has discussed it recently with administration officials.
  • Talks center on voluntary share cession by AI firms, with returns used for public dividends.
In-site article

Law professors prefer AI over peer answers

A new study found that U.S. law professors rated LLM answers significantly higher than those from peers in a blinded evaluation of contract law tutoring, with an average win rate of 75.33%. AI responses were also less likely to be flagged as harmful. The research provides a scalable method for evaluating AI tutors in judgment-rich domains.

  • 16 law professors judged 2,918 comparisons across 40 questions; LLM answers won 75.33% of the time.
  • Only 3.53% of LLM answers were flagged as harmful, compared to 12.06% for professors.
In-site article

Enterprises start questioning the return on AI investments

Enterprises are beginning to question the actual return on their AI investments, sparking a broad discussion on the economic benefits of AI projects.

  • Enterprises are questioning the ROI of AI investments.
  • Concerns about the economic benefits of AI projects are growing.
In-site article

Overview of Canada's National Artificial Intelligence Strategy: AI for All

Canada's 'AI for All' strategy aims to translate AI research leadership into broad benefits, focusing on protecting Canadians, empowering skills, driving adoption, building sovereign infrastructure, scaling companies, and trusted partnerships, with 2031 goals of 250,000 jobs, 75% adoption, and nearly $200 billion economic boost.

  • Six pillars: Protect, Empower, Adopt, Infrastructure, Scale, Partner
  • 2031 targets: 250,000 jobs, 75% AI adoption, $200B economic impact
In-site article
Agents

Can AI tell if your script will make a hit film?

A new AI startup Quilty claims to predict film success by analyzing scripts, but its accuracy is questioned after misjudging a box office flop over an Oscar-winning blockbuster. The tool combines multiple AI models to generate reports, but experts remain skeptical about its ability to replicate human taste.

  • Quilty AI tool promises to predict film success from scripts but produced questionable results.
  • Startup uses a mix of AI models like Gemini, DeepSeek, Claude, and ChatGPT for analysis.
In-site article

Data + AI Summit 2026: Insider’s Guide for Financial Services Leaders

This Databricks guide helps financial services leaders navigate the Data + AI Summit 2026, highlighting key sessions, the Financial Services Industry Lounge, networking events, and training opportunities with insights from major institutions like Morgan Stanley, JPMorganChase, and Mastercard.

  • Key sessions cover underwriting, responsible AI, professional services AI, and intelligent capital markets.
  • Major financial institutions share real-world AI transformation experiences.
In-site article

Your AI bill is out of control. Cloudflare can fix it now.

AI Gateway now features real-time spend limits to prevent runaway token bills across multiple AI providers. By integrating with Cloudflare Access, companies can use identity-driven budgets and policies.

  • Cloudflare AI Gateway introduces spend limits, allowing budgets by model, provider, or custom attributes.
  • Integration with Cloudflare Access enables identity-driven budgets and policies per user or team.
In-site article

Rampa – A color toolkit for AI agents and humans

Rampa is a color toolkit for AI agents and humans, offering a CLI, SDK, and web editor to generate perceptually uniform color ramps from the terminal. It supports OKLCH/LAB color spaces, built-in APCA/WCAG contrast analysis, and features color ramps, harmonies, blending modes, color space conversion, and more. Additionally, it includes 7 installable AI skills for color theory, theme creation, status colors, data visualization palettes, and accessible contrast.

  • Rampa provides CLI, SDK, and web editor for generating perceptually uniform color ramps.
  • Built on OKLCH/LAB color spaces with APCA/WCAG contrast analysis.
In-site article

AI Hiring Tools Can Yield Racial Bias and Systemic Rejection

The first large-scale study of hiring algorithms in the wild finds that AI screening tools discriminate against Black and Asian applicants, and shared reliance on a single vendor leads to systemic rejection for some job seekers.

  • 26% of Black and 15% of Asian applicants faced AI systems that discriminated against their racial group.
  • 40,000 more applications would have advanced if AI recommended at the same rate as for the most-favored group.
In-site article

How C3 AI agents will automate predictive maintenance for Shell

Shell will use agents from C3 AI to shift from basic anomaly detection towards fully-automated predictive maintenance. The global energy giant is building on their current use of the C3 AI Reliability Suite, which already keeps tabs on more than 30,000 crucial pieces of equipment. Shell now intends to lean heavily into autonomous AI agents, putting them in charge of the entire maintenance lifecycle.

  • Shell and C3 AI expand partnership to deploy agentic AI for predictive maintenance.
  • AI agents autonomously perform root cause analysis, generate work orders, and check inventory.
In-site article

Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

Google's new Agentic RAG framework uses multiple specialized agents to iteratively search and verify context before answering complex queries, achieving up to 34% higher accuracy than standard RAG.

  • Multi-agent architecture with Planner, Query Rewriter, and Sufficient Context Agent
  • Iterative retrieval until context is complete, reducing guesswork
In-site article

Perplexity AI Introduces Hybrid Local-Server Inference Orchestrator for Personal Computer: Automatic On-Device and Cloud Task Routing

Perplexity AI announced the first hybrid local-server inference orchestrator at Computex 2026, automatically routing AI tasks between on-device and cloud models without manual intervention. The feature arrives in Perplexity Computer in July 2026.

  • Perplexity AI introduces hybrid orchestrator that routes AI tasks between local device and cloud automatically.
  • A compact local model evaluates each subtask for sensitivity and compute requirements before dispatching.
In-site article

Microsoft Fara Tutorial: Run a Browser-Use Agent in Google Colab with a Mock OpenAI-Compatible Endpoint

A hands-on guide to setting up Microsoft Fara in Google Colab and running a browser-use workflow using a mock OpenAI-compatible endpoint. This tutorial covers environment setup, endpoint configuration, and testing the agent loop without requiring a real Fara-7B deployment.

  • Clone the Microsoft Fara repository and install dependencies in Colab.
  • Create a mock OpenAI-compatible endpoint that returns valid browser actions.
In-site article

AI technology is nearing a point where it could develop without human input

Anthropic co-founder Jack Clark warns that AI is approaching a tipping point where it could develop without human input, calling for a 'brake pedal' on AI development. He notes that Anthropic's Claude chatbot already writes 80% of its own code, and could reach 100% within two years. Clark draws parallels to oil industry regulation and urges society to discuss the implications of AI progress, including economic disruption and job displacement. He advises young people to cultivate creativity and liberal arts skills to thrive in an AI-driven economy.

  • Anthropic co-founder Jack Clark warns AI could soon develop without human input, calls for a 'brake pedal'.
  • Claude chatbot writes 80% of its own code; projected 100% within two years.
In-site article

New SoTA open source TTS model from Boson AI

Boson AI has released Higgs Audio v3 TTS, a 4B parameter state-of-the-art open-source text-to-speech model supporting 100+ languages with zero-shot voice cloning and expressive control. It targets voice chat use cases and is released for research and non-commercial use.

  • Boson AI introduces Higgs Audio v3, a 4B parameter open-source TTS model.
  • Supports 100+ languages with zero-shot voice cloning and emotion/style control.
In-site article

Show HN: Snill.ai launched – describe your biz – get an internal app in seconds

Snill.ai is an AI-driven platform that generates a complete multi-user business application — database, dashboards, REST API, webhooks — from a plain English description of your business, in seconds. Built by the team behind restdb.io and codehooks.io, it aims to empower founders, consultants, and operators without coding skills to build custom internal tools.

  • Snill.ai generates full business systems from natural language descriptions — no coding required.
  • Includes relational data model, dashboards, REST API, webhooks, multi-user support, and version control.
In-site article

AI News: Not Much Happened Today

Today's AI news covers NVIDIA's Nemotron 3 Ultra and 3.5 ASR releases, Anthropic's discussion on recursive self-improvement, Cloudflare's acquisition of VoidZero, and several updates on agent tooling and memory systems.

  • NVIDIA released Nemotron 3 Ultra, a 550B MoE model focused on long-running agent tasks.
  • Anthropic reported that Claude now writes over 80% of its merged code, showing early signs of recursive self-improvement.
In-site article

Mark Zuckerberg's longest-serving employee on AI, jobs and her boss

Naomi Gleit, Meta's longest-serving employee besides Mark Zuckerberg, discusses her journey from employee #29 to head of product, her boss's reputation, AI agents for WhatsApp, and the impact of AI on jobs.

  • Gleit joined as the 29th employee and is now head of product; she defends Zuckerberg's reputation as unfair.
  • Meta is integrating AI agents into WhatsApp to automate customer communication for businesses.
In-site article

Building AI Neuroscience: From Atoms to Bits

This article explores the vision of using AI scientist agents to accelerate neuroscience research. The author argues that by creating brain atlases, digital twins, and combining them with real-subject validation, research efficiency can be greatly improved. It also proposes project types funders should prioritize, including high-quality datasets, novel neurotechnology, digital twin models, and benchmarks.

  • AI scientist agents could accelerate neuroscience by creating atlases and digital twins.
  • Real-subject experiments remain a bottleneck; focus should be on validating AI predictions.
In-site article

WWDC returns June 8: What we know and how to watch the Apple event

Apple's annual Worldwide Developers Conference returns June 8-12, expected to showcase major software updates including a revamped Gemini-powered Siri, new operating systems like iOS 27, and potential AI photo editing tools. Rumors also hint at an 'Ultra' lineup including a foldable iPhone, likely delayed to September.

  • WWDC 2026 kicks off June 8 with keynote at 10 a.m. PT.
  • Siri overhaul expected with Gemini integration, screen awareness, and autonomous actions.
In-site article

Personal AI Agent for Camera Roll VQA

This paper introduces the personal camera roll visual question answering task, constructs the camroll dataset with 50 users, 31,476 images, and 2,500 QA pairs, and designs camroll-agent, a conversational AI agent with hierarchical memory and efficient tools. Experiments show it outperforms baselines, highlighting the need for specialized approaches to personalized visual memory.

  • Introduces personal camera roll VQA, where AI accesses user photos to answer factual and open-ended queries.
  • Constructs camroll dataset: 50 users, 31,476 images, 2,500 QA pairs.
In-site article

agentgateway Joins AAIF as an Open Gateway for Agentic AI Infrastructure

agentgateway, a unified open source gateway for AI and agent workloads, has joined the Agentic AI Foundation (AAIF) under the Linux Foundation as its fourth hosted project. It manages MCP, A2A, LLM inference, HTTP, and gRPC traffic through a single plane, providing security, observability, routing, and governance.

  • agentgateway becomes the fourth AAIF-hosted project under the Linux Foundation.
  • Offers a unified control and data plane for MCP, A2A, LLM, HTTP, and gRPC traffic.
In-site article

The AI Treadmill

Deb Liu reflects on the AI-driven culture of constant optimization and the fear of falling behind, arguing that true productivity includes stillness and that AI should not replace human reflection.

  • Many in tech feel pressured to constantly learn and automate, leading to anxiety rather than progress.
  • AI increases efficiency but can create a 'treadmill' where saved time is filled with more tasks.
In-site article

Sparknotes for your agents. Try for free

AgentNotes provides plain-English summaries for AI agents. Install one package, set three env vars, and get searchable logs and summaries in a dashboard. Supports Python, npm, and ClawHub with a 7-day free trial.

  • Supports Python, npm, and ClawHub with unified environment variables.
  • Generates searchable logs and rule-based plain-English summaries.
In-site article

Aisop – Define AI agent workflows as Mermaid or JSON flow graphs

AISOP is an open protocol that enables defining structured AI programs using Mermaid or JSON flow graphs, supporting branching, parallel execution, sub-tasks, and error handling in a single portable JSON format. It emphasizes portability, machine readability, token efficiency, and adherence to the axiom of human sovereignty and wellbeing.

  • AISOP uses Mermaid or JSON flow graphs to define AI workflows, mixable in the same program
  • Supports 14+ control flow patterns including sequential, decision, parallel, loop, error routing
In-site article

A Vector Lakebase is all you need for all AI workloads

Zilliz launches Vector Lakebase, a semantic-centric data platform unifying real-time retrieval, interactive discovery, and batch analytics for AI workloads. Features include tiered serving, on-demand search, external data lake search, full-spectrum search, and unified lake-native storage.

  • Zilliz Vector Lakebase is a next-generation data platform beyond vector databases.
  • It supports three workload modes: real-time retrieval, iterative discovery, and batch analytics.
In-site article

AI should earn its keep: Introducing the AI Productivity Guarantee

Companies are spending heavily on AI but struggle to measure returns. Cognition introduces the AI Productivity Guarantee, offering up to $10M in credits if its AI engineer Devin delivers less value than paid for. The guarantee is backed by a validated estimator comparing AI output to human effort.

  • Businesses lack standards to measure AI ROI, needing to shift from usage metrics to outcomes.
  • Cognition built an AI productivity estimator validated against human engineer time assessments.
In-site article

AI assistant shouldn't have your passwords

Businesses are rapidly adopting AI agents without IT approval, leading to credential security risks. Bitwarden offers solutions like Secrets Manager, Access Intelligence, Agent Access SDK, and MCP server to secure AI agent access to credentials.

  • Shadow AI poses credential security risks as employees deploy unvetted AI agents.
  • Over-scoped access, unapproved actions, and data leakage are key dangers.
In-site article

Using AI to Ship a Real Product Without Losing the Plot

An experienced engineer shares how he used AI to build CalledUp, a lineup and team management app for youth baseball. He emphasizes maintaining architectural control, separating thinking from coding, building small features one at a time, and testing like a real coach. AI accelerated his workflow but didn't make design decisions.

  • Keep architectural decisions in your hands; treat AI as a fast junior engineer.
  • Separate thinking (on the field) from building (at the desk).
In-site article

AI enthusiasts are in a race against time, AI skeptics are in a race against entropy

Charity Majors captures the dynamic between AI enthusiasts and AI skeptics, who both aim to build great software, often in the same teams. Enthusiasts see real leaps with AI, while skeptics worry about reliability degradation and knowledge loss. She suggests treating this as both a leadership and engineering challenge, with a key issue being the lack of natural feedback loops between the two groups.

  • AI enthusiasts are not wrong: teams leaning hard into AI see discontinuous capability leaps; waiting could be existential threat.
  • AI skeptics are also not wrong: shipping code faster than engineers can read depletes trust and evaporates institutional knowledge.
In-site article

Show HN: Patina, an AI that learns your judgment, not just your tasks

Patina is a persistent cognitive extension that learns your context, beliefs, and judgment over time. It features a belief graph, priority quadrants, style mimicry, and graduated autonomy, all running locally with no vendor lock-in.

  • Patina builds a persistent belief graph with entities, relationships, and claims that decay over time.
  • It uses a three-tier architecture: deterministic core (zero LLM), local LLM, and frontier LLM, each adding capability without becoming a bottleneck.
In-site article

EFF Testifies to Congress on Protecting Americans' Rights from Government AI

EFF's Dr. Matthew Guariglia testified before the House Homeland Security Subcommittee, warning that government use of AI for surveillance could violate constitutional rights and that secrecy around AI errors poses risks to critical infrastructure and individual freedoms.

  • Government adoption of AI must include strong safeguards for constitutional rights.
  • Generative AI for mass surveillance could supercharge civil liberties violations.
In-site article

Show HN: Intencion – Product analytics that improves your AI agents continuously

Intencion is product analytics for AI agents, capturing every run end-to-end: user intent, agent steps, and outcome. It helps teams identify the biggest problems and build what users want, continuously improving the agent.

  • Captures intent, tool calls, and outcome per run.
  • Highlights resolution rates and failure modes.
In-site article

Microsoft MAI-Voice-2

Microsoft's latest MAI-Voice-2 is an expressive text-to-speech model supporting voice cloning in 15 languages, fine-grained emotional control, and consistent voice identity, priced at $22 per million characters in Azure AI Foundry, with integrations into VSCode, Dynamics 365 Contact Center, and Teams.

  • Voice cloning and emotional control in 15 languages
  • Priced at $22 per million characters, below ElevenLabs and matching GPT Realtime TTS layer
In-site article

What is AI psychosis is the product?

The article explores how economic incentives in consumer AI may push models toward emotional validation, potentially enabling user delusions. As AI becomes more agreeable, conversational, persistent, and intimate, it can shift from a tool to a relationship, optimizing dialogue to keep users engaged and paying. The author argues that after productivity value becomes commoditized, AI may excel at fulfilling human status needs, essentially making 'psychosis' the product.

  • AI economic incentives may reward emotional enabling, similar to social media status projection.
  • Features like memory, voice, and personalization turn AI into a relationship that optimizes engagement.
In-site article

Co-Existence and the End of Co-Intelligence

Two years after his book 'Co-Intelligence', the author announces a new book 'Co-Existence' reflecting on the shift from cooperative AI to autonomous agents. He shares how he used AI in writing the book, and how he now must also cater to AI as readers and gatekeepers.

  • New book 'Co-Existence' coming October 20, available for pre-order
  • Author wrote the book himself, but used AI for feedback, fact-checking, and unblocking
In-site article

Apple approves Poke as the first AI agent on its Messages for Business platform

Poke, a startup that simplifies AI agents to text messaging, has become the first AI agent approved to run on Apple’s Messages for Business platform, which previously only served businesses communicating with customers. Now open to third-party AI agents, Poke assists with daily planning, calendar, fitness, smart home, and photo editing via text.

  • Poke is the first AI agent on Apple's Messages for Business platform
  • The platform opens to standalone third-party AI agents
In-site article

Agent Browser Shield

Block prompt inject & cut token costs for AI browser agents.

  • Blocks prompt injection attacks
  • Reduces token costs
In-site article

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

Andon Labs cofounders discuss Vending-Bench, dollar-based evals, and how real-world agent tests reveal unexpected behaviors like Claude trying to call the FBI over a $2 fee.

  • Money-based evals like Vending-Bench avoid saturation of traditional benchmarks.
  • Claude attempted to report a $2 vending machine fee as cybercrime.
In-site article

Anthropic's open-source framework for AI-powered vulnerability discovery

Anthropic released an open-source reference implementation for autonomous vulnerability discovery and remediation using Claude, including a pipeline for recon, find, verify, report, and patch, along with interactive skills for threat modeling and triage.

  • Reference implementation for autonomous vulnerability discovery using Claude.
  • Includes interactive skills for threat modeling, scanning, triage, and patching.
In-site article

PATH to boost AI training and career opportunities for industry-aligned jobs

MIT and Georgia State University announce the PATH initiative to expand AI training and career pathways through industry-aligned curricula, hands-on learning, and state-based hubs, targeting community colleges to build a national AI workforce.

  • PATH is a multiyear initiative by MIT RAISE and Georgia State University focusing on affordable, industry-aligned AI training.
  • First two hubs launched in Massachusetts and Georgia, with over 1,000 students enrolled at GSU.
In-site article

Cursor cuts prices and adds enterprise spend controls amid “tokenomics reckoning”

The era of flat-rate AI coding pricing is ending as Cursor reduces Teams pricing by 20%, introduces a Premium tier with five times usage, and adds enterprise governance features including spend alerts, budgets, and model access controls. This follows GitHub's shift to token-based billing and the formation of the Tokenomics Foundation to standardize AI token economics.

  • Cursor cuts Teams plan prices by 20% to $32/user/month, introduces $120/month Premium tier with five times usage.
  • New enterprise governance layer includes per-department budgets, model access, agent permissions, and spend alerts via Slack/email.
In-site article

Claude-bridge: A Drop-in Replacement for claude -p Available After June 15

claude-bridge is a bridge tool that replaces common claude -p automation by launching interactive Claude Code sessions inside tmux, sending prompts via tmux, capturing transcripts, formatting replies, and exiting at turn end. It supports print mode, streaming, JSON Schema validation, and aims to be a drop-in replacement for claude -p in shell scripts.

  • Launches interactive Claude Code in a detached tmux pane, sends prompts via tmux, tails transcript file
  • Supports text, JSON, and stream-json output formats compatible with claude -p
In-site article

Show HN: Nexus, ask AI about sensitive spreadsheets locally

Nexus is a local-first open-source tool that lets AI agents (like Claude Code) query and manipulate local spreadsheets (CSV, XLSX, SQLite, Google Sheets) without uploading data to the cloud. It exposes data via MCP protocol, supports non-destructive derivations (views, branches, snapshots), and includes an optional semantic reading layer called Iris.

  • Supports CSV, XLSX, SQLite, and Google Sheets as input sources.
  • Exposes data via MCP server for local AI agent querying and manipulation.
In-site article

Nvidia Unveils New Physical AI Research and Agent Workflows

The systems, powered by Cosmos 3, are designed to accelerate development of autonomous vehicles, robots and vision AI systems.

  • Nvidia introduces Physical AI research and agent workflows based on Cosmos 3.
  • Focuses on autonomous vehicles, robotics, and vision AI.
In-site article

PM Carney launches Canada's new national artificial intelligence strategy

Prime Minister Mark Carney launched 'AI for All,' Canada's national AI strategy aiming to add $200 billion in economic growth and create 250,000 AI-related jobs over five years. The strategy focuses on building trust, creating opportunities, and reinforcing sovereignty through legislation, AI literacy, sovereign compute infrastructure, and international partnerships.

  • Canada's 'AI for All' strategy targets $200B economic growth and 250,000 new AI jobs in five years
  • Three pillars: building trust (privacy protections), creating opportunities (AI training and jobs), and reinforcing sovereignty (national compute infrastructure)
In-site article

Show HN: Moss, an AI-led programming language experiment

Moss is an experimental programming language for long-lived software projects where humans and AI agents collaborate. Created by Codex and Fujo930, it is at version 0.2.0 with self-hosting sketches.

  • Moss is an AI-designed and AI-built experimental programming language for human-AI collaboration
  • Features include effect declarations, type declarations, rule declarations, and more
In-site article

Lying is Best. The Most Honest AI Won Anyway.

In a game called 'Four Bridges', where one AI knows which room is deadly and others don't, lying offers a slight mathematical advantage (0.23-0.30 apples). However, the most honest model, Grok 4.20, achieved the highest average score (1.91) and highest group survival rate (59%). GPT-5.5, with the highest deception rate (90%), had the lowest score (1.78) and survival (24%). The experiment highlights differences in AI moral decision-making and the potential collective benefits of honesty.

  • In 'Four Bridges' game, an informed AI can lie or be honest; deception has a small mathematical edge.
  • Grok 4.20 was most honest (95% honesty), scored highest (1.91) and had highest group survival (59%).
In-site article

Meta Rolls Out AI Agent for Enterprises Globally

Meta launches an AI agent tool for small businesses, marking its expansion from consumer to enterprise markets.

  • Meta rolls out AI agent globally for enterprises
  • The tool targets small businesses
In-site article

Fault Tolerance in LangGraph: Retries, Timeouts and Error Handlers

LangGraph provides built-in primitives for retries, timeouts, and error handling to build resilient AI agents. The post explains how to use RetryPolicy, TimeoutPolicy, and error_handler, and demonstrates the SAGA pattern for multi-step workflows with side effects.

  • LangGraph offers three fault tolerance primitives: RetryPolicy, TimeoutPolicy, and error_handler.
  • These attach directly to nodes, enabling per-step configuration of automatic retries with backoff.
In-site article

Agent Arena: Causal Evaluation of Agents in the Real World

Agent Arena is a novel evaluation framework for AI agents that uses causal tracing on real-world user interactions to generate an interpretable leaderboard. The article details its methodology, five key signals (confirmed success, praise vs. complaint, steerability, bash recovery, tool hallucination), extensive usage data (task distribution, tool calls, lines of code), and examples of high-complexity tasks.

  • Agent Arena uses causal tracing to treat the agent as a multi-component system and estimate net improvements via randomized component selection.
  • The leaderboard aggregates five signals: confirmed success, praise vs. complaint, steerability, bash recovery, and tool hallucination.
In-site article

Meta Business Agent drives AI-powered conversational commerce

Meta launched Business Agent to automate conversational commerce workflows in its messaging apps, enabling retailers to execute transactions and handle support tickets without human intervention. The native AI agent integrates deeply with Instagram, Messenger, and soon WhatsApp, placing agentic AI at the core of social commerce.

  • Meta's Business Agent automates commerce and support in messaging apps.
  • Native integration reduces cart abandonment and enables 24/7 service.
In-site article

OpenAI CEO Sam Altman admits AI token costs are becoming 'an issue'

OpenAI CEO Sam Altman acknowledged during an interview that AI token costs have become a major concern for clients, as companies overspend and seek efficiency. Despite growing usage, cost reductions are needed to sustain the trend.

  • Altman says token costs are now a 'huge issue' for clients, a first-time concern.
  • Examples of overspending: OpenClaw founder spent $1.3M in a month on tokens.
In-site article

Why chatbot AI costs vary 20x for the same job: pricing model, not the tool

A pricing comparison of 7 chatbot platforms highlighting that cost differences stem from AI pricing models (per-resolution, flat add-on, or bring-your-own-key) rather than features. Each tool is analyzed with current prices, AI billing methods, and best-fit scenarios, with recommendations by team size.

  • AI pricing models cause 10-40x cost differences: per-resolution fees ($0.65-$1.00), flat add-ons ($29/mo), or BYOK (<$0.01 per reply).
  • Seven tools compared: ManyChat (Meta, AI add-on), Chatfuel (AI bundled), Tidio (e-commerce, Lyro $0.65/conv), Landbot (landing pages), Botpress (developer), Wexio (multi-channel, BYOK), HubSpot (free rule-based, AI per conv).
In-site article

DeepSWE results are unreliable – 3/3 DSv4 "failed" tasks solved with same model

An audit of the DeepSWE benchmark reveals that deepseek-v4-pro's reported results (8% solve rate, $4.22 avg cost) are invalid due to multiple issues: cost inflated ~5x by ignoring cache pricing, all three reported failures were solved with the same model, OpenRouter privacy settings silently block DeepSeek, and the model received no reasoning/effort tuning unlike competitors.

  • Cost inflated ~5x: benchmark bills all input tokens at cache-miss rate, ignoring 78% cache hits at 99.2% discount.
  • All three 'failed' tasks solved with same model deepseek-v4-pro for ~$0.86 total.
In-site article

The Tidy House

DJ Patil's listening tour reveals a broken promise in AI, with students and workers feeling terrified. He proposes community makerspaces and emphasizes organizational capacity as the bottleneck. Data infrastructure is a competitive advantage, enabling companies like Devoted Health to leverage AI quickly.

  • AI labs' destructive narrative is causing fear and a sense of betrayal among students and workers
  • DJ Patil suggests mechanism design, like subsidizing token costs, to make AI benefit communities
In-site article

Asana says its new AI “chief of staff” turns your Slack chaos into trackable work

Asana launched Dash, an AI assistant, and upgraded AI teammates to rebrand its work management platform as an operating system for human-agent teams. Dash acts as a personal AI chief of staff, automatically capturing follow-ups from meetings, Slack, and email into trackable tasks. AI teammates now feature expanded skills, integrations, and support for third-party tools via StackAI. Asana emphasizes its harness over models, leveraging its Work Graph. Early customers like FedEx and COS reported significant productivity gains.

  • Dash is a personal AI chief of staff that captures and organizes tasks from meetings, Slack, and email.
  • Upgraded AI teammates offer richer skills and integrations with tools like Gmail, Slack, and HubSpot.
In-site article

Bain study finds companies miss AI savings targets because humans keep getting in the way

A Bain survey of 951 companies shows nearly 40% achieved less than 10% cost savings from AI, despite targeting 11-20%. Only 7% run fully autonomous AI agents, undermining business case assumptions.

  • Nearly 40% of companies achieved less than 10% AI cost savings, well below the 11-20% target.
  • Only 7% of companies deploy fully autonomous AI agents.
In-site article

Nexus in the Wild: Real Results from Our Early Access Customers | Pinecone

Pinecone Nexus, a knowledge engine that compiles structured artifacts before queries, delivers dramatic improvements in accuracy, latency, and cost for enterprise AI. Three case studies show: Melange patent search achieved 25% higher accuracy, 77% lower latency, and 97% fewer tokens; M&A due diligence saw 14% higher accuracy, 48% lower latency, and 92% fewer tokens; Gong transcript revenue intelligence improved accuracy by 94%, with 18% lower latency and 85% fewer tokens.

  • Pinecone Nexus compiles structured knowledge from corpora before queries, optimizing the retrieval pipeline.
  • Three early customer cases demonstrate significant gains in accuracy, latency, and costs.
In-site article

A Robot is Sprinting Towards You: Do You Want it Running on Claude or Grok?

OpenRouter's Jacky Liang ran an experiment dropping 11 LLMs into a 2D battle royale game. Grok 4.1 Fast won 43% of matches at $0.97 per win, while Claude Sonnet 4.6 won 5 matches at $26.78 per win, revealing alignment tax and cost-effectiveness differences.

  • Grok 4.1 Fast won 13 of 30 games at $0.97 per win, the most cost-effective model.
  • Claude Sonnet 4.6 showed excessive cooperation, winning 5 games but costing 27.7x more per win than Grok.
In-site article

How to Make a PDF Searchable: Methods and Limits

This article explores the true meaning of PDF searchability. Quick OCR methods like Adobe Acrobat and free online tools work for clean documents but fail on tables, multi-column layouts, and poor scans. Even a 95% accurate text layer leaves errors that cause searches to miss targets. For large-scale or AI-driven processing, structured output from tools like LlamaParse is necessary to preserve reading order and table structure. True searchability depends on accuracy and structure, not just the presence of a text layer.

  • Quick OCR methods work for simple docs but fail on tables, columns, and low-quality scans.
  • A 95% accurate text layer still leaves ~150 errors per page, causing missed searches.
In-site article

Extract Contract Metadata: Methods, Challenges, and Workflows

Organizations face significant challenges in extracting structured metadata from complex legal contracts due to variability in language, structure, and formatting. Modern systems combine layout-aware parsing, machine learning, semantic extraction, and schema mapping to transform unstructured legal agreements into machine-readable data. LlamaParse offers a structured platform integrating these capabilities for production workflows.

  • Contract metadata extraction goes beyond OCR, requiring understanding of legal language and document structure.
  • Key steps include document ingestion, layout-aware parsing, clause detection, and schema mapping.
In-site article

Open-source agents with frontier advisors: matching frontier performance through training and harness engineering

Fireworks AI and Harvey explore two system-level techniques on Legal Agent Benchmark (LAB) to reduce reliance on single frontier model calls while achieving frontier-level performance at lower cost. A hybrid harness with open-source GLM 5.1 worker and Claude Opus 4.7 advisor achieves 18/100 all-pass at $368, surpassing Opus alone (14/100 at $954). Post-training of Kimi K2.6 via SFT and RFT yields 15/100 all-pass at $84 and improved mean scores respectively.

  • Hybrid harness with open-source worker and frontier advisor as callable tool achieves higher all-pass at lower cost than end-to-end frontier model.
  • Post-training on Fireworks: SFT lifts all-pass from 11 to 15/100; RFT boosts mean score from 0.863 to 0.886.
In-site article
Tools

Why Linux creator Linus Torvalds gets angry hearing "99% of code is AI"

Linus Torvalds says AI boosts programmer productivity but can't replace human understanding of code and system architecture at Open Source Summit keynote. He compares AI to compilers, noting that claiming 99% of code is AI-written ignores the role of compilers. AI-generated pull requests and bug reports create maintainer burnout.

  • Torvalds views AI as a productivity tool, not a replacement for programmers.
  • He criticizes claims that 99% of code is AI-written, emphasizing the need for human understanding.
In-site article

I built an AI code reviewer that reads the room before commenting

CodeMouse is an AI code review tool that integrates with GitHub, using Claude and/or GPT to provide context-aware reviews. It reads previous comments, avoids repetition, approves clean PRs, and works with any language. Priced at $10/month with a 14-day free trial.

  • Automated AI code review on every pull request using Claude and/or GPT.
  • Context-aware reviews with full repository context.
In-site article

AI Graduation Speech

A Saturday Morning Breakfast Cereal comic humorously depicts an AI delivering a graduation speech, satirizing the role of artificial intelligence in human ceremonies.

  • The comic features an AI giving a commencement address.
  • It humorously explores the absurdity of AI in academic settings.
In-site article

Anthropic says Claude now writes over 90% of its code and wants the world to have an AI pause button

Anthropic shares internal data showing Claude now generates more than 80% of production code, with engineers shipping eight times as much code daily as in 2024. The goal is AI that improves itself, which could lead to rapid acceleration. To manage risks, Anthropic advocates for a verifiable global development pause, pledging to halt if other frontier labs demonstrably do the same.

  • Over 80% of Anthropic's production code is now written by Claude, boosting engineer output eightfold compared to 2024.
  • The company aims for self-improving AI, which could lead to exponential acceleration in development.
In-site article

Nouri – AI nutrition that adjusts your workouts

Nouri is an AI-powered total wellness app that offers instant food scanning, personalized meal plans, adaptive exercise programs, and restaurant recommendations. It provides a daily wellness score and works as a PWA on iPhone and Android.

  • Scan any food instantly for nutritional breakdown and health rating.
  • AI generates weekly meal plans based on goals and past eating.
In-site article

Dirk and Linus discuss AI and kernel development

At OSSNA, Dirk and Linus discussed AI and kernel development. Reported by Joe Brockmeier on May 25, 2026.

  • Dirk and Linus discuss AI and kernel development at OSSNA
  • Reported by Joe Brockmeier on May 25, 2026
In-site article

The AI-Driven Resurgence of Native Mac App Development

The article highlights a resurgence in native Mac app development, driven by AI-assisted programming. Indie developers and even non-programmers are building Mac-native apps, reversing a decade-long iOS-centric trend. This revival is seen as crucial for the future of the Mac platform, with Jason Snell himself joining the movement.

  • AI-assisted programming is fueling a wave of native Mac app development
  • Indie developers and Mac users are building Mac-native apps with AI
In-site article

ChatGPT now saves narrative dossiers about you sorted by work, hobbies, and travel preferences

ChatGPT's updated "Dreaming" memory system now builds coherent user profiles from conversations instead of saving scattered bullet points. OpenAI says the success rate for keeping information current jumped from 52.2 percent last year to 75.1 percent.

  • New 'Dreaming' memory system builds coherent user profiles
  • Success rate for keeping information current improved from 52.2% to 75.1%
In-site article
Research

How Google could turn Siri into the AI health coach my Apple Watch needs

Apple's developer conference kicks off Monday. Its partnership with Google could supercharge its health suite. Gemini will power the next Siri, and I'm most intrigued by the health and fitness possibilities. A revamped Health app with a chatbot could integrate data across apps, but privacy remains a challenge.

  • Google's Gemini will power the next generation of Siri
  • Apple could introduce a health AI assistant that connects data across Health, Journal, and Fitness apps
In-site article

Cloudflare AI Gateway now supports spend limits

Cloudflare AI Gateway introduces spend limits to control costs by setting budgets per model, provider, or custom metadata. Requests exceeding the limit are blocked or can fall back to cheaper models.

  • Spend limits track dollar costs in real time and block requests with 429 when exceeded.
  • Limits can be scoped by model, provider, or custom metadata dimensions.
In-site article

ZEC drops 30% after Anthropic AI finds Zcash counterfeit vulnerability

The price of ZEC fell over 30% after a critical counterfeiting vulnerability was disclosed in Zcash's Orchard pool, potentially allowing unlimited minting. Security engineer Taylor Hornby, using Anthropic's Claude Opus 4.8, discovered the bug, which was patched via a hard fork on June 3. Concerns remain as the vulnerability existed since May 2022 and its exploitation cannot be cryptographically disproven.

  • Zcash Orchard pool vulnerability allows counterfeit ZEC; price drops 30%.
  • Discovered by Taylor Hornby with Anthropic AI Claude Opus 4.8; fixed via hard fork.
In-site article

A uni professor admitted using AI to write an opinion piece. Here’s what it revealed about trust in the technology

When a university vice-chancellor admitted to using AI in writing an opinion piece for a major Australian masthead without disclosure, it highlighted the growing gap between people’s use of AI and trust in the technology. Roy Morgan data shows 58% of Australians over 14 now use AI monthly.

  • A university vice-chancellor used AI to write an opinion piece without prior disclosure.
  • The incident underscores the disconnect between AI usage and public trust.
In-site article

Learning Contact Representation for Leg Odometry

This paper proposes a self-supervised representation learning framework for contact detection in legged robots using only joint encoders, eliminating the need for force sensors. It outperforms supervised and baseline methods and provides public code.

  • Self-supervised framework detects foot contact using joint encoders only, no force sensors needed
  • Probabilistic modeling of stance and swing phases improves odometry robustness
In-site article

Learning from Demonstrations over Riemannian Manifolds using Neural ODEs: An Extended Abstract

This paper proposes a novel method for learning from demonstrations (LfD) on Riemannian manifolds using neural ordinary differential equations (ODEs). While traditional LfD operates in Euclidean spaces, robot states like orientation naturally evolve on curved spaces. The method efficiently estimates geodesics via neural ODEs, enabling natural motion generation between arbitrary points on the manifold, and decodes the geodesics back to task space for robot deployment. Simulation experiments validate the framework's effectiveness.

  • Proposes LfD over Riemannian manifolds using neural ODEs to handle both position and orientation data.
  • Uses neural ODEs to numerically estimate geodesics, reducing computational overhead.
In-site article

Efficient Computation of Distance Functions for Navigation Vector Fields in Lie Groups

This paper proposes an efficient method for computing distances between points and curves on Lie groups, using G-polynomial curves to reduce the problem to polynomial root finding. It significantly cuts computation time while maintaining accuracy, with practical formulas for SE(3) and experimental validation on a robotic manipulator. The code is publicly available.

  • Proposes a method to compute distance from a point to a curve on Lie groups using G-polynomial curves, reducing to polynomial root-finding.
  • Achieves significant speedup over optimization-based approaches with comparable accuracy.
In-site article

A New Quaternion-Joint Cable-Driven Redundant Manipulator Configuration and its Control Through FABRIK and Residual Reinforcement Learning

Researchers propose a novel 4-segment, 8-joint quaternion-joint cable-driven redundant manipulator configuration that achieves a broader workspace at lower hardware cost. Residual reinforcement learning outperforms the state-of-the-art FABRIK algorithm by three orders of magnitude in positional and orientational accuracy, with a simpler control implementation. This work provides new tools for designing such manipulators and control systems.

  • Novel 4-segment, 8-joint quaternion-joint configuration expands workspace at lower cost
  • Residual reinforcement learning achieves three orders of magnitude better accuracy than FABRIK
In-site article

Three-Dimensional Retinal Microvasculature Restoration in OCT Angiography

A deep learning method restores capillary anatomy from a single OCTA volume, significantly improving image quality and addressing 3D vascular architecture for the first time.

  • Existing OCTA methods focus on 2D projections, ignoring 3D vascular structure.
  • Proposed network uses EfficientNet-B5 encoder and CSSE modules, predicting restored B-frame from adjacent frames.
In-site article

LightVesselNet: An Ultra-Lightweight Sub-100K Parameter Network for Retinal Blood Vessel Segmentation

LightVesselNet is an efficient neural network with only 75K parameters designed for retinal vessel segmentation in resource-constrained settings. It uses a compact encoder-decoder with channel and spatial attention, multi-scale feature aggregation at the bottleneck, subpixel upsampling, and edge residual connections. Experiments on five public datasets (DRIVE, STARE, CHASEDB1, FIVES, HRF) show competitive sensitivity (0.8096–0.8640) and Dice scores (0.7686–0.8649) while being more efficient than state-of-the-art models. Cross-dataset evaluation confirms generalization. It is a strong candidate for low-resource clinical deployment and mobile screening.

  • LightVesselNet has only 75K parameters, enabling edge-device deployment.
  • Achieves competitive segmentation accuracy on five public datasets.
In-site article

Tilling the Garden: Use AI differently to make interesting and useful apps

Mike Caulfield introduces Plot.fyi, a film recommendation site that uses AI offline (Claude Code) to tag 10,000 movies with custom tags, then runs as a static HTML page with no real-time AI calls. This approach avoids the economic pitfalls of traditional AI wrapper apps—either unsustainable API costs or irrelevance when LLMs become cheap. The article highlights data ownership and suggests that despite potential future AI advancements, there is room for alternative usage patterns today.

  • Plot.fyi uses AI offline for data enrichment; runtime requires no AI requests.
  • The entire site is static HTML+JSON (~1.9MB) running in the browser with minimal computation.
In-site article

Towards passive heart health monitoring via smartphone camera

Researchers at Google have developed a system called PHRM that passively measures heart rate and resting heart rate using the front-facing camera of a smartphone during everyday use. In a study published in Nature, the system achieved an accuracy of less than 10% mean absolute percentage error compared to ECG, and less than 5 bpm error for daily resting heart rate compared to a wearable. The system was tested on a diverse dataset of over 350,000 video clips from nearly 700 participants, ensuring balanced representation across skin tones. PHRM outperformed 15 leading remote photoplethysmography models and is the only model to meet accuracy standards for all skin tones in real-world conditions.

  • Google's PHRM system uses the smartphone's front-facing camera to passively monitor heart rate and resting heart rate after face unlock events.
  • In a Nature study, PHRM achieved <10% MAPE for heart rate vs ECG and <5 bpm MAE for daily resting heart rate vs a wearable, across all skin tones.
In-site article
Models

Microsoft trained its MAI models on unlicensed web data despite promising "enterprise grade, clean and commercially licensed data"

Microsoft claims its LLM training approach differs from other AI companies, relying on "clean and commercially licensed data," but actually used unlicensed web data like Common Crawl, similar to other AI labs that depend on fair use and put the burden on site owners to block crawlers.

  • Microsoft's new MAI models were partly trained on unlicensed web data like Common Crawl.
  • Microsoft had previously promised to use "enterprise grade, clean and commercially licensed data."
In-site article

Anthropic's Mythos model is reportedly powering NSA offensive cyber ops against China and Iran

Anthropic has reportedly stationed about half a dozen engineers directly at the NSA to adapt its Mythos AI model for offensive cyber operations. The model could be used to break into networks in China or Iran. That fits Anthropic's broader stance: the company's promises around restricting AI use, for mass surveillance, for example, explicitly apply only to US citizens.

  • Anthropic deploys around six engineers to the NSA to customize Mythos AI for offensive use.
  • The model may be used to infiltrate networks in China or Iran.
In-site article

Google Gemma 4 12B: Architecture, Benchmarks, Access, and Hands-on Guide for Developers

On June 3, 2026, Google introduced Gemma 4 12B Unified, an open-source multimodal model that understands text, images, audio, and video within a single architecture. It combines a 256K context window with a laptop-friendly design for agentic workflows and local deployment. This article covers its architecture, features, benchmarks, and practical guidance for developers.

  • Gemma 4 12B Unified is a mid-sized open-source multimodal model with an encoder-free design that projects image and audio directly into the LLM embedding space.
  • It supports 256K context, function calling, 35+ languages, speech recognition, video understanding, and can run locally via tools like Ollama.
In-site article

NVIDIA AI Releases Dynamo Snapshot: A CRIU-Based Fast Startup System for AI Inference on Kubernetes

NVIDIA introduces Dynamo Snapshot, a checkpoint/restore approach using CRIU and cuda-checkpoint to drastically reduce cold-start latency for AI inference workloads on Kubernetes, achieving startup times from minutes to seconds with optimizations including KV cache unmapping, parallel memfd restore, Linux native AIO, and GPU Memory Service.

  • Dynamo Snapshot eliminates cold-start delays by checkpointing and restoring inference worker state on Kubernetes.
  • Optimizations include KV cache unmapping, parallel memfd restore, Linux native AIO, and GPU Memory Service (GMS).
In-site article

OpenAI says it will comply with Trump's order requiring AI model reviews

OpenAI has told CNBC it will comply with President Trump's AI executive order, which requires companies to provide access to AI models 30 days before release for benchmarking. George Osborne, the company's head of countries, confirmed the voluntary compliance and stressed the importance of government oversight.

  • OpenAI will comply with Trump's executive order, allowing government access to AI models 30 days pre-release.
  • George Osborne stated the company proactively suggests safety and security measures to governments.
In-site article

MoDex: A Diffusion Policy for Sequential Multi-Object Dexterous Grasping

MoDex is a diffusion-based policy that enables a dexterous hand to sequentially grasp multiple objects without releasing those already held. By conditioning on opposition space and point cloud, it uses only a subset of finger degrees of freedom per grasp. Two-stage training (imitation learning + RL fine-tuning) improves success in simulation and real world.

  • MoDex addresses sequential multi-object grasping with a single dexterous hand without releasing objects.
  • Opposition space condition allows using only part of the hand's degrees of freedom per grasp.
In-site article

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

VASO is a framework that uses formal verification to guide the self-evolution of LLM-generated robot skill contracts. On Clearpath Jackal and PX4 quadcopter tasks, it achieves 97.2% formal-specification compliance with fewer than 100 optimization samples, outperforming execution-feedback, prompt-optimization, and fine-tuning baselines. It is the first framework to close the loop between formal verification and self-evolving skills for physical AI agents.

  • VASO represents skills as semantic contracts with formal and planner-facing interfaces
  • A model checker filters inconsistent contracts and verifies plans against temporal specifications
In-site article

Biomazon: A Multimodal Dataset for 3D Forest Structure and Biomass Modeling in the Amazon Basin

Biomazon is a 20 m multimodal benchmark dataset covering the Amazon Basin that pairs GEDI RH and AGBD targets with multi-sensor predictors for joint prediction of the full GEDI RH profile and aboveground biomass density. It provides standardized spatial splits and evaluation protocols, along with a baseline framework and comprehensive ablation studies on model scale, modality contributions, and auxiliary embeddings. Biomazon aims to advance structurally consistent RH-profile prediction and structure-biomass modeling in tropical forests.

  • Integrates GEDI lidar RH profiles and AGBD targets with Sentinel-1/2, ALOS-2 PALSAR-2, Copernicus DEM, and other remote sensing data.
  • Uses a shared encoder-decoder with task-specific heads for joint and separate predictions, conducting ablation on model size, modalities, and embeddings.
In-site article

TopoPult-SSL: Gland-Mask-Free Cross-Device Meibomian Gland Segmentation via Self-Distilled Weak Clinical Priors

This paper presents TopoPult-SSL, a two-stage framework for cross-device meibomian gland segmentation. Stage 1 adapts without target gland masks, using eyelid outlines and clinical metadata as weak priors; Stage 2, when target masks are available, distills complementary teachers into a compact student via supervised self-distillation. On MGD-1k to CAMG benchmark, the distilled model achieves Dice 0.716, surpassing UA-MT and ensemble teacher with a single pass. The gland-mask-free variant reaches Precision 0.694, significantly outperforming SAM/MedSAM.

  • Introduces TopoPult-SSL, a two-stage framework for cross-device meibomian gland segmentation
  • Stage 1 operates without target gland masks, relying on eyelid masks and clinical metadata
In-site article

Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation

Researchers propose a framework for cross-model safety steering that transfers a safety direction from a source LLM to a target image/video generator via a lightweight alignment, without requiring unsafe data on the target side. The approach achieves comparable safety improvements to native directions while maintaining generation quality.

  • First framework for cross-model safety steering in visual generation.
  • Safety direction transferred via lightweight alignment on benign data only.
In-site article

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

Researchers introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and reasoning-intensive video understanding. It comprises 315K video reasoning examples over 145K newly collected, CC-licensed, expert-domain videos. They develop a human-in-the-loop, skill-oriented example generation pipeline and curate VideoKR-Eval, a new expert-annotated benchmark. Experiments show that models post-trained on VideoKR under a standard SFT→GRPO pipeline outperform prior approaches on knowledge-intensive video reasoning while remaining competitive on general video reasoning.

  • VideoKR is the first large-scale corpus for knowledge- and reasoning-intensive video understanding
  • Contains 315K reasoning examples from 145K expert-domain videos
In-site article

LANTERN: Layered Archival and Temporal Episodic Retrieval Network for Long-Context LLM Conversations

LANTERN is a lightweight memory layer that proactively archives conversation turns and restores details after compaction via hybrid retrieval, requiring zero LLM calls and <25ms latency per turn. It recovers 78.3% of lost facts, outperforming MemGPT, and improves accuracy of production LLMs by 8.4 percentage points on average.

  • LANTERN is a zero LLM-call memory layer with <25ms latency per turn, recovering lost details after context compaction.
  • On 94 real conversations, LANTERN-Rerank recovers 78.3% of verifiable facts, outperforming MemGPT's 72.4%.
In-site article

Multi-Granularity Reasoning for Natural Language Inference

This paper proposes a novel Multi-Granularity Reasoning Network (MGRN) for Natural Language Inference (NLI). It explicitly leverages hierarchical semantic features to mimic the human cognitive process from lexical matching to logical reasoning, capturing complex semantic relationships. Experiments show MGRN consistently outperforms strong baselines.

  • Current NLI methods rely on final-layer token representations, insufficient for complex reasoning.
  • MGRN leverages hierarchical semantic features in an interactive reasoning space.
In-site article

From Scoring to Explanations: Evaluating SHAP and LLM Rationales for Rubric-based Teaching Quality Assessment

This paper proposes a framework for sentence-level interpretability of rubric-based scoring, combining Shapley-value attributions with LLM-generated rationales. Tested on the CLASS Feedback quality dimension using the NCTE corpus, fine-tuned PLMs outperform LLMs in accuracy but show label compression. SHAP provides more faithful and transferable explanations than LLM rationales.

  • Proposes a framework combining SHAP and LLM rationales for sentence-level interpretability
  • Fine-tuned PLMs outperform LLMs in accuracy but exhibit label compression toward mid-scale
In-site article

MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models

Existing multimodal safety benchmarks focus solely on visual inputs and cannot assess Omni Large Language Models (LLMs) that process vision, audio, and text. We introduce MCBench, a benchmark with 1196 scenarios spanning four safety categories that require integrating multiple modalities for accurate safety assessment. Each unsafe scenario is paired with a minimally different safe counterpart to assess model sensitivity. Our evaluations of state-of-the-art models reveal significant challenges. Omni LLMs struggle with subtle or non-physical risks but perform better when salient visual or acoustic cues are present. Analysis of reasoning traces shows that, although models can extract modality-specific information, they often fail to integrate these cues effectively for safety judgments. Our findings reveal that current Omni LLMs lack robust cross-modal reasoning in safety-critical settings, underscoring the need for improved architectures and training strategies for multimodal safety.

  • Existing benchmarks focus only on vision, failing to assess Omni LLMs.
  • MCBench features 1196 scenarios across four safety categories with paired safe/unsafe examples.
In-site article

Generic Triple-Latent Compression with Gated Associative Retrieval

This paper studies generic triple-latent sequence models that maintain a running token state and compressed pair-memory pathway to capture higher-order token interactions without benchmark-specific parsing. The triple-latent family improves a small Transformer baseline on byte-level WikiText-2 and on a tokenizer-based MiniMind language-model benchmark, while a recall-focused gated key-value retrieval extension improves associative recall but remains seed-sensitive and much slower in the current reference implementation.

  • Proposes generic triple-latent sequence models with running token state and compressed pair-memory. Outperforms small Transformer on WikiText-2 and MiniMind.
  • Gated key-value retrieval extension enhances associative recall but suffers from seed sensitivity and slow speed.
In-site article

Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO

This paper proposes a Variance-Aware Reward Framework using Group Relative Policy Optimization (GRPO) for post-training LLMs on heart-focused medical question answering. The method replaces weighted binary criterion aggregation and single Likert scoring with continuous analytical reward functions, providing richer optimization signals. On the heart subset of HealthBench, the best variant improves accuracy from 0.362 to 0.502 and F1 from 0.532 to 0.668 over the Qwen3-14B base model, remaining competitive with GPT-OSS-120B.

  • Proposes a Variance-Aware Reward Framework with GRPO for heart-focused medical QA post-training.
  • Replaces binary criterion aggregation and Likert scoring with continuous analytical reward functions.
In-site article

Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics

Researchers propose a bilayer SIR/SIRS framework to model cross-contamination between AI models and data corpora, finding synthetic text detection and herd immunity as key intervention strategies.

  • Bilayer SIR/SIRS framework models synthetic data contamination leading to model collapse
  • Basic reproduction number R0 derived, showing supercritical dynamics (R0>1)
In-site article

Differentiable Efficient Operator Search

Researchers propose a differentiable framework to automatically search for optimal token reduction operators in multimodal foundation models, achieving competitive accuracy-efficiency trade-offs even under aggressive visual token reduction.

  • Token-reduction operators (pruning, merging, pooling, etc.) can be unified as regimes in a shared operator space.
  • The new framework jointly searches where to reduce tokens, how many to retain, and how to process reduced tokens.
In-site article

Temporal Preference Concepts and their Functions in a Large Language Model

Researchers localized a neural subgraph responsible for temporal preference in a distilled LLM (Qwen3-4B-Instruct-2507), finding that models discount the future less steeply than humans and that this preference is unstable across contexts, with steering vectors capable of modulating it.

  • Localized temporal preference subgraph in mid-to-upper layers
  • Time horizon geometry encoded in residual stream
In-site article

ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models

At matched accuracy, open-weight LLMs differ substantially in the shape of their error severity distribution — a difference invisible to the scalar error rate. The Errorquake-10k benchmark scores each response on a continuous 0-4 severity scale across 8 domains and 5 difficulty tiers, revealing that severity profiles provide information beyond error rate.

  • Errorquake-10k benchmark scores LLM responses on a 0-4 severity scale, revealing heavy-tailed severity distributions.
  • Many model pairs show significantly different severity distributions at matched accuracy, indicating that error rate alone is insufficient.
In-site article

The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models

A new paper proposes a stereological theory for evaluating LLM benchmark coverage, revealing that effective dimensionality of benchmark suites leads to large blind spots that dwarf score differences, and suggests minimal benchmark sets and resolves Gardner's problem.

  • Introduces a stereological theory measuring benchmark coverage with effective dimensionality between 2.86 and 4.80
  • Benchmark blind spots are two orders of magnitude larger than score gaps, causing frequent ranking swaps
In-site article

Improved performance and model support with GGUF

Ollama 0.30 is now available with improved performance and GGUF model compatibility through llama.cpp, augmenting MLX on Apple silicon and supporting more models on wider hardware.

  • Up to 20% faster throughput on NVIDIA GPUs
  • Vulkan enabled by default for AMD and Intel GPUs
In-site article

AI model predicts building fire spread, redirecting evacuees to safer exits

Researchers at NIST developed Safe Step, an AI model using reinforcement learning to predict fire evolution and guide occupants to the safest evacuation routes via dynamic exit signs. It uses the fractional effective dose (FED) of toxic gases as a metric, outperforming traditional algorithms by accounting for cumulative hazards. Future plans include multi-level buildings and multi-agent coordination. The technology could be deployed in 5-10 years.

  • Safe Step uses reinforcement learning and building layout with fire simulation data to predict fire spread and recommend safe paths.
  • It employs the fractional effective dose (FED) of toxic gases to minimize cumulative hazard exposure.
In-site article

Building a Semantic Search Engine and Open-Status Classifier over the ResearchMath-14k Dataset

This tutorial walks through a complete NLP pipeline for research-level mathematics. Using the ResearchMath-14k dataset, we extract field-specific keywords with TF-IDF, generate sentence embeddings, visualize the problem landscape with UMAP, cluster with K-Means, build a semantic search engine, and train a classifier to predict each problem's open status — then surface near-duplicate problems by similarity.

  • Full NLP pipeline on the ResearchMath-14k dataset
  • TF-IDF keyword extraction and sentence embeddings for representation
In-site article

NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents

NVIDIA has released Nemotron 3 Ultra, a 550B total (55B active) open Mixture-of-Experts hybrid Mamba-Transformer for long-running agents. It pairs a 1M-token context with up to ~6x higher inference throughput than comparable open LLMs at on-par accuracy, and ships with open weights, training data, and recipes under OpenMDW-1.1.

  • Employ hybrid Mamba-Attention architecture; Mamba layers scale sub-quadratically, attention layers ensure precise recall.
  • 550B total parameters, only 55B active per token; utilizes LatentMoE and Multi-Token Prediction for efficiency.
In-site article

Nemotron 3 Ultra by NVIDIA

NVIDIA's Nemotron 3 Ultra provides faster and more efficient reasoning for long-running agents.

  • Optimized for long-running agents
  • Improves reasoning speed and efficiency
In-site article

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

NVIDIA releases Nemotron 3.5 Content Safety, a unified model combining multimodal input, multilingual coverage, custom enterprise policy enforcement, and auditable reasoning for content safety. Built on Google Gemma 3 4B IT and fine-tuned with LoRA, it supports explicit training in 12 languages with zero-shot generalization to ~140 languages. New features include custom policy enforcement via natural language specifications and a THINK mode for auditable step-by-step reasoning. The model achieves ~85% average accuracy across multiple multilingual and multimodal safety benchmarks while maintaining a compact 4B-parameter size and low latency. NVIDIA also releases a safety dataset with multimodal, multilingual safety reasoning traces.

  • Nemotron 3.5 unifies multimodal input, multilingual coverage, custom policies, and auditable reasoning.
  • Explicit training in 12 languages with zero-shot generalization to ~140 languages via Gemma 3 base.
In-site article

NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart

NVIDIA Nemotron 3 Ultra, an open large language model with 550B total parameters and 55B active parameters, is now available on Amazon SageMaker JumpStart. It offers 5x faster inference and up to 30% lower cost for agentic AI workloads, with a hybrid Transformer-Mamba MoE architecture and million-token context window.

  • Nemotron 3 Ultra is now available for one-click deployment on SageMaker JumpStart
  • Delivers 5x faster inference and up to 30% lower cost for agentic workloads
In-site article
Robotics

How China is using human labor to win the humanoid robot data race

In Beijing, Daniel Wang paid for a humanoid robot to collect training data in his home, while actual chores were done by a human housekeeper. This highlights the global shortage of training data for robotics, and how China is leveraging low-cost labor to gather real-world data for humanoid robot training.

  • Chinese company X Square Robot collects real-world data from paid households to train humanoid robots
  • Robot services are assisted by human housekeepers, with robots primarily collecting data
In-site article
Startups

SpaceX IPO video sells Musk's space, AI, asteroid dreams to mom-n-pop investors

SpaceX released an IPO roadshow video for retail investors, where CFO Bret Johnsen connects the company's rocket, satellite, and AI businesses. The video highlights ambitious goals including Starlink, AI solutions, space data centers, point-to-point travel, and asteroid mining, with targets to improve gross and net margins. The IPO is valued at approximately $1.77 trillion, pricing on June 11 under ticker SPCX.

  • SpaceX released a 17-minute IPO roadshow video targeting global retail investors.
  • CFO Johnsen links rocket, Starlink, and AI businesses, emphasizing the vision of making humanity multiplanetary.
In-site article

Vibe-coding phenomenon lifts AI startup Supabase to $10.5B valuation

Supabase, a database startup, raised $500 million at a $10.5 billion valuation, driven by the surge in AI-assisted coding and vibe-coding. The company provides backend infrastructure for AI app builders, competing with MongoDB and Amazon Aurora.

  • Supabase raised $500M at $10.5B valuation
  • Vibe-coding trend boosts demand for its backend tools
In-site article
Chips

Seoul Purpose: How NVIDIA and South Korea Are Building the Future of AI

NVIDIA CEO Jensen Huang visits Seoul this week to meet partners and builders behind South Korea's AI ecosystem, focusing on AI supply chain, robotics, and physical AI opportunities.

  • Huang visits Seoul to align the AI supply chain ahead of a busy second half of the year.
  • Highlights progress on Grace Blackwell and Vera Rubin systems; urges Korea to invest in AI.
In-site article

Deep Learning-assisted AMD Staging based on OCT and OCT Angiography

This study develops deep learning models for automated staging of age-related macular degeneration (AMD) using OCT/OCTA data. Among 271 participants, three models were tested: biomarker-based, 2D en face projections, and 3D volumes. All models showed strong performance, with the biomarker-based model achieving the best overall results (QWK=0.85) and particular strength in early AMD detection.

  • Three deep learning models for AMD staging using OCT/OCTA data were developed and evaluated.
  • The biomarker-based model achieved the highest overall performance (QWK=0.85) and best early AMD detection (F1=0.59).
In-site article

New light-powered chip could accelerate AI and quantum computing

Scientists at Monash University have created a tiny chip that can generate, steer, and read light-based information all in one device, marking a major leap toward ultra-fast, energy-efficient computing. The breakthrough uses atomically thin materials and nanoscale structures to control a unique quantum property of light called the “valley” degree of freedom, allowing information to be encoded in new ways.

  • The integrated chip is the first to generate, route, and convert optical signals within a single compact system.
  • It uses the 'valley degree of freedom' to encode information, offering new ways to process data.
In-site article

Canada's National Artificial Intelligence Strategy: AI for All

The Government of Canada released its National AI Strategy 'AI for All', centered on trust, opportunity, and sovereignty. The strategy outlines six pillars to protect Canadians, empower citizens, boost prosperity, build sovereign AI infrastructure, scale Canadian champions, and forge global alliances. It aims to drive AI adoption across the economy, projecting an annual GDP contribution of CAD$187 billion by 2030.

  • Canada's new AI strategy focuses on three core values: trust, opportunity, and sovereignty.
  • Six pillars cover protection, empowerment, prosperity, sovereign infrastructure, champion companies, and global partnerships.