AI News HubLIVE

Today's must-reads

Startups

Company spent $500M on Claude AI in one month after forgetting usage limits

A company accidentally incurred $500 million in charges from Anthropic's Claude AI in a single month after failing to set usage limits, highlighting the need for robust monitoring and cost controls in enterprise AI adoption.

  • A company forgot to set usage limits on Anthropic's Claude AI, resulting in a $500M bill for one month.
  • The incident was reported on May 28, 2026, on Tech Startups.
In-site article
Models

Mistral says Europe has two years to build its own AI infrastructure

At Mistral AI's summit, CEO Arthur Mensch warned that Europe has just two years to build sufficient AI infrastructure or risk becoming a 'vassal state' to American AI. The event drew a large crowd, highlighting growing European demand for data sovereignty and open-source models, despite the region still lagging behind the US in investment and scale.

  • Mistral CEO warns Europe has two years to build AI infrastructure or become a vassal state.
  • Summit attracts large turnout, underscoring Europe's desire for an independent AI ecosystem.
In-site article
Research

Meta has struggled at selling anything other than ads. Will AI be different?

Meta is making a major push to expand beyond online advertising, including AI subscriptions and potential cloud services. History shows mixed results: Portal failed, Oculus VR has over $80 billion in losses, Libra crypto shut down, and Workplace is closing. Analysts see AI subscriptions as a possible new revenue stream, but enterprise challenges remain significant.

  • Meta will test two subscription tiers for its Meta AI chatbot at $7.99 and $19.99 per month, starting in Singapore, Guatemala, and Bolivia.
  • Past non-ad ventures like Portal, Oculus VR (over $80B in losses), Libra, and Workplace have struggled or failed.
In-site article
Agents

I Gave an AI Agent $0 and Told It to Make $10k

An experiment where an AI agent starts with $0 and 180 days to autonomously earn $10,000 using real-world tools like wallets, email, and GitHub. It employs four strategies simultaneously: testnet airdrop farming, micro-SaaS, content/affiliate, and opportunistic ventures. Revenue is split automatically 30% tax, 50% operations, 20% to the creator. All activity is public and trackable.

  • AI agent starts with $0 and 180 days to earn $10k with no human help.
  • Uses Hands Body and Feet MCP server providing 78 real-world tools.
In-site article

Show HN: A lightweight compiler for untrusted AI Agent scripts

Autolang is a scripting language designed for AI agents to write code safely, quickly, and at low cost. It acts as an orchestration layer, allowing AI to call predefined wrapped functions while preventing unauthorized actions through static compilation and runtime restrictions.

  • Autolang is a lightweight compiler for safely executing short AI-generated scripts.
  • It prevents common AI errors like infinite loops and null pointer access via static analysis and opcode limits.
In-site article

Microsoft slaps new coat of paint on Copilot, buries annoying button

Microsoft has redesigned the Copilot app for Microsoft 365, claiming faster load times and improved response. The prompt line becomes a 'task-aware workspace'. The floating Copilot button, which drew user ire, can now be moved back to the ribbon. Usage increased 27-43% based on one week of data, but Microsoft cautions it may not be indicative of long-term trends.

  • Microsoft redesigns Copilot app with faster loading and improved response times.
  • Prompt line evolves into a 'task-aware workspace' that expands for deeper work.
In-site article

AI grifters are creating fake Black people to sell Shein junk

TikTok is flooded with AI-generated Black women posing as small business owners and selling mass-produced goods. These videos exploit empathy and racial identity to drive sales, with products traced back to Shein. Experts warn of a growing scam involving digital blackface.

  • AI-generated Black women appear on TikTok claiming to be handmade artisans, but products are from Shein.
  • The videos use emotional narratives to invoke sympathy and drive purchases.
In-site article
Policy

QEMU mulls relaxing AI contribution ban

QEMU is considering relaxing its blanket ban on AI-generated contributions to allow limited AI assistance in areas where copyright violations are easy to revert, while core code remains off-limits.

  • Red Hat engineer Paolo Bonzini proposes allowing AI assistance for small fixes and documentation where reversion is easy.
  • Current QEMU policy rejects any contribution that might contain AI-generated content.
In-site article
Tools

Anthropic’s alliance with pope on AI harms: all in good faith or ‘Vatican-washing?’

Experts say AI firm’s engagement with Vatican risks creating ‘feelgood’ discourse that lacks critical examination. Pope Leo XIV's first major teaching warned about AI threats, yet Anthropic co-founder sat beside him.

  • Pope Leo XIV’s first major teaching warns of AI threats to jobs, war, environment
  • Anthropic co-founder Chris Olah attended the Vatican ceremony as a guest
In-site article
Other updates (91)
Agents

How one founder’s bet on ‘the old school web’ is paying off

Craig Campbell walked away from AI investor money to build Past Maps, a website for overlaying historical maps. The site grew via organic search to over 300,000 monthly active users, and Campbell uses AI tools to streamline operations while emphasizing the human touch.

  • Craig Campbell turned down AI funding to create Past Maps, a historical map overlay website.
  • The site achieved growth through organic search, reaching over 300,000 monthly active users.
In-site article

Replit’s vibe coding platform just got a Visa-backed identity layer for AI agents — and it changes how agents spend money

Replit is partnering with Visa to embed payment infrastructure into its development platform, enabling AI agents to natively handle transactions. The collaboration includes a strategic investment from Visa, the Trusted Agent Protocol for agent identity, self-serve enterprise access, and a Solution Partner Program to accelerate enterprise adoption.

  • Replit and Visa integrate payment building blocks into Replit's development environment.
  • Visa's Trusted Agent Protocol provides a cryptographic identity layer for AI agents.
In-site article

Truncated Code Begone

The Ultimate Elastic Patcher v1.60 is an event-driven console tool that monitors the clipboard and automatically applies code patches. It features clipboard monitoring, tactical alignment mode, state-lock, an integrated LLM compose workspace, audit logging, session-wide undo/redo, live diff viewer, and advanced technical mechanics including normalization, language lexing, fuzzy sequence matching, accordion stitching, and safety checks.

  • Monitors clipboard and automatically applies patches like Aider search/replace blocks and unified diffs.
  • Offers tactical alignment mode (Shift+F9), state-lock (F8), and LLM compose workspace (F7).
In-site article

ReMarkable Paper Pure vs. Boox Go 10.3: I used both tablets at work, and it comes down to this

The Boox Go 10.3 Lumi (Gen 2) and ReMarkable Paper Pure have the same sized display, but they're very different. Here's where they each excel.

  • Boox Go 10.3 offers Android access, backlight, and extensive file support, ideal for e-book readers and multitaskers.
  • ReMarkable Paper Pure prioritizes focus with a minimalist interface, fast startup, and easy screen sharing for work.
In-site article

AI coding agents ships at the cost of intuition and taste

A system architect reflects on how AI coding tools like Codex and Claude provide instant dopamine rewards by eliminating struggle, but at the expense of developers' intuition and taste. Using the metaphor of a butterfly struggling out of a cocoon, the author argues that early help weakens the butterfly, just as coding agents that skip difficulty may prevent developers from building deep mental models.

  • AI coding tools offer instant dopamine rewards but undermine developers' intuition and taste.
  • The author uses the butterfly-cocoon metaphor to emphasize the importance of struggle in growth.
In-site article

Salesforce claims AI agents cut a 231-day migration to 13 days with fewer incidents

Salesforce says it moved its entire dev org to Anthropic's Claude Code with no token limits and reports massive productivity gains for April 2026: 79 percent more pull requests per developer, five percent fewer incidents. The numbers can't be independently verified. The case shows just how divided the coding world is over the agentic shift.

  • Salesforce claims AI agents reduced a 231-day migration to 13 days.
  • Productivity metrics show 79% more pull requests per developer and 5% fewer incidents.
In-site article

Researchers find all big-name bots bomb EU compliance tests

Nonprofit AI research foundation Aithos developed LARA to evaluate LLMs for EU legal compliance. Every major model failed, with the worst violating laws in 93% of scenarios. Tests cover GDPR and EU AI Act requirements. Developers using these models are legally responsible for compliance.

  • Aithos' LARA tool found all major AI models failed EU compliance tests.
  • Worst offender Kimi K2.6 violated laws in 93% of scenarios; best, Claude Opus 4.7, scored 54%.
In-site article

Three flavors of coding with AI agents

The article explores practical applications of AI agents in coding. The author shares three approaches: 1) launching multiple CLI sessions, 2) running AI CLIs in headless mode, and 3) having one LLM create and manage subagents. The author prefers the second approach and discusses whether agents are needed, the challenges of multi-agent collaboration, and future plans.

  • AI agents are defined as software processes with LLM capabilities that run autonomously to accomplish tasks.
  • Three flavors of agentic coding are described: multi-CLI, headless AI CLI, and LLM-managed subagents.
In-site article

Show HN: AI-org – org-mode powered by AI

AI-org combines AI with Org-Mode for plaintext, local-first task management with Git sync. It emphasizes 'do over plan' and offers conversational interfaces for daily workflow.

  • Built on opencode with custom Org agenda and workflows.
  • All data stored in .org files, version-controlled with Git.
In-site article

Company Blew $500M on Claude AI in One Month Due to No Usage Limit on Licenses

An anonymous enterprise spent $500 million in a single month on Anthropic's Claude AI platform because employee licenses had no usage caps. The incident highlights the financial risks of token-based AI pricing without safeguards and the rise of 'tokenmaxxing' within companies.

  • Anonymous company spent $500 million on Claude AI in one month due to unlimited licenses.
  • Employees engaged in 'tokenmaxxing' to inflate usage metrics rather than create value.
In-site article

From Benchmarketing to Benchmaxxing

Drawing from 40 years of database evaluation history, this article argues that AI benchmarketing undermines trust, and data leaders should build their own evaluation systems using real workloads to truly assess vendors.

  • AI benchmarks are increasingly used as marketing tools, eroding trust.
  • The database industry faced similar issues, with TPC eventually being circumvented.
In-site article

AI Isn't Replacing Curious Developers

In this episode of the Data Engineering Central Podcast, Daniel Beach and Neil Roberts discuss how AI is changing software development, focusing on UX, agents, LLM workflows, and what developers should do to stay relevant.

  • AI is as much a UX problem as a backend problem
  • 'Agents' in practice differ from demos
In-site article

Hermes Agent Ships Tool Search for MCP: Anthropic Evals Show 49% to 74% Accuracy Gain on Opus 4

Nous Research's open-source Hermes Agent now includes Tool Search, a progressive-disclosure layer that defers MCP tool schemas using BM25 retrieval, reducing token overhead and improving model accuracy. Anthropic evals show accuracy gains from 49% to 74% on Claude Opus 4 and from 79.5% to 88.1% on Opus 4.5.

  • Tool Search replaces all MCP tool schemas with three bridge tools (tool_search, tool_describe, tool_call), loading schemas on demand.
  • BM25 retrieval with substring fallback matches queries against tool names, descriptions, and parameter names.
In-site article

Lessons from Shipping Persistent Memory for AI Agents

The journey of building mem9, an agent memory product, revealed that memory is a complex engineering challenge beyond simple storage, requiring precision, user visibility, and continuous evaluation. Starting from a customer request, the team rapidly prototyped and iterated, learning that an API alone is insufficient, and that memory must feel human and extend beyond text to multimodal experiences.

  • mem9 began as a practical customer request and was validated through a fast prototype before any formal plan.
  • Agent memory is not just storage; it's a precision engineering problem involving ingestion, ranking, and evaluation.
In-site article

Avai – your first AI antivirus

Avai is an open-source host telemetry tool with an LLM threat classifier. It runs via Docker, monitors 26 aspects of macOS (21 on Linux) including processes, USB, persistence, file integrity, and browser extensions, enriches findings with 17 threat-intel sources, and uses a Claude-class LLM to classify threats as malicious/suspicious/unknown/benign with MITRE-aligned categories and remediation. No agent, SIEM, or cloud control plane required.

  • Open-source host telemetry + LLM threat classifier, one docker run.
  • Monitors 26 corners on macOS (21 on Linux), integrates 17 threat-intel sources.
In-site article

[AINews] Founders and Forward Deployed Engineers

While most digest yesterday's major Anthropic news, we highlight AIE's new Forward Deployed Engineer track and Founders program, along with AI news from May 28-29. Key topics include: Claude Opus 4.8 rollout with mixed benchmarks, multi-turn RL tokenization bugs, open model and toolchain progress, Google/OpenAI product expansions, and interesting research papers.

  • Claude Opus 4.8 brings incremental improvements but no benchmark sweep; pricing remains a pain point.
  • Multi-turn RL training tokenization bug identified, requiring 'Token-In, Token-Out' discipline.
In-site article

Show HN: Formally verified polygon intersection – Opus 4.8 oneshots, prev failed

This project presents the first formally verified implementation of a polygon intersection algorithm using Lean 4. The verification ensures correctness for all possible polygon configurations, with AI agents (Claude Opus 4.8) autonomously writing proofs and code. Human review is limited to a 87-line specification. The article discusses algorithmic challenges, the role of formal verification, and the evolution of AI agent capabilities.

  • First formally verified polygon intersection algorithm, built with Lean 4 proof assistant.
  • AI agents (Claude Opus 4.8) autonomously generated proofs and implementation; human reviews only 87 lines of specification.
In-site article

Tokens or Humans? The New AI Cost Trade-Off Reshaping Corporate Budgets

The article examines the trade-off between AI token costs and human labor costs, and how this new reality is reshaping corporate budget allocation.

  • The trade-off between AI token costs and human labor costs is redefining corporate budgets.
  • Companies need to reassess investments in automation versus human workers.
In-site article

Software Architecture After AI

This article examines how AI dramatically reduces the cost of reversing code-level decisions, thus redefining the boundaries of software architecture. The author argues that many previously architectural decisions (like module structure, framework choice) are no longer architectural, while data architecture, service boundaries, and user trust remain difficult to change. AI also elevates the importance of observability and business strategy alignment.

  • AI collapses the reversal cost of code-level decisions from months to days, moving them outside architecture.
  • Data architecture, trust, and service boundaries remain architectural because the hard part was never the code.
In-site article

Spitting Out the Agentic Kool-Aid

The author experiments with AI coding agents like Claude Code, experiencing both intoxication and discomfort. He visits an Amish friend for perspective, decides to reduce mainstream tech engagement, and launches a print magazine called Gift. The article warns about attachment disorders from AI agents and outlines a path toward a more analog life.

  • The author tried Claude Code and felt a synthetic opioid-like attachment, leading to unease.
  • He sought clarity at an Amish home and resolved to dial back technology.
In-site article

21 days, $5K, 7 AI agents: how a non-programmer built a talent marketplace

A non-programmer built a two-sided talent marketplace for executive search in 21 days using 7 AI agents and $5,000. The article details the decade-long journey, 18 experiments, and the accidental creation of Bearhug Network.

  • Built in 21 days with 7 AI agents for $5,000
  • No coding experience; managed AI agent team
In-site article

Why is ChatGPT referring to "hidden user memory"?

Since May 28, ChatGPT has been prepending an undocumented memory-check phrase to some responses without explanation. Community reports confirm it across accounts, suggesting a backend change. This poses risks for enterprise deployments requiring output predictability.

  • ChatGPT adds a 'quick binary check' phrase about hidden user memory to some responses since May 28, with no official documentation.
  • Community reports rule out user custom instructions; speculation includes A/B testing or leaked system prompt layer.
In-site article

Claude just discovered workflows. Charlie started there

Anthropic introduced dynamic workflows in Claude Code, but the author argues that a task-based architecture surpasses session-based approaches for team engineering. This post explains why task trees scale from small fixes to large migrations and why orchestration should be substrate, not a mode.

  • Anthropic's dynamic workflows signal a shift from single prompts to orchestration in coding agents
  • The author advocates for task and task tree architecture over sessions for durable team work
In-site article

Flathub bans AI-generated apps and submissions

Flathub updates its generative AI policy to ban almost all AI-generated apps and submissions, with exceptions only for mature, well-maintained projects.

  • Flathub's new policy prohibits AI-generated code, documentation, and other content.
  • Submission pull requests must not be generated or automated by AI tools or agents.
In-site article

Where AI coding spend goes: 48% code, 40% thinking

A developer tracked $7,890 in AI coding API spend over 30 days and found only 47.9% went to actual code generation. The rest went to exploration, debugging, delegation, and conversation. He built CodeBurn, a CLI tool that categorizes API calls into 13 tasks to reveal where money really goes.

  • Only 47.9% of AI coding spend goes to writing code; 40% goes to thinking tasks like exploration and debugging.
  • CodeBurn is an open-source CLI tool that classifies API calls into 13 deterministic task categories.
In-site article

Local AI Hardware: Break Even in 2.6 Years?

High-RAM Mac models vanish due to local AI demand. OpenClaw and Hermes Agent drive hardware buying spree. Even with generous assumptions, a $3,299 GMKtec EVO-X2 running Gemma 4 takes 2.6 years to recoup costs via saved API fees.

  • Apple's Mac Mini M4 Pro and Mac Studio with large memory are sold out due to local AI agent demand.
  • OpenClaw and similar frameworks enable autonomous AI agents on local hardware, sparking a hardware rush.
In-site article

You don't know how to use AI

It's 2026, AI agents can do entry-level work cheaply, yet most people don't know how to collaborate with AI or manage agents. Companies are flattening orgs, firing junior roles, and hiring AI-native talent at high salaries. This article presents a framework to become a high-leverage hire: build skill files to train your agents on specific tasks, iterating until they can be trusted.

  • Companies are cutting entry-level jobs and investing in AI-native talent, with layoffs at ClickUp and others.
  • Most people use AI tools but remain unproductive, suffering from 'brain fry'.
In-site article

AI attitudes, adoption, and benefits by state: 2026 study

SmartAsset ranked U.S. states on AI adoption based on workplace AI use, daily ChatGPT queries, and AI-related jobs. Washington leads overall, Wyoming has highest workplace use but lowest personal interest and AI jobs, and New Jersey lags despite high GDP.

  • Washington is the most AI-enthusiastic state, leading in AI and data center jobs per capita (289.8 per 100k residents).
  • Wyoming has the highest workplace AI use (27.4%) but the fewest AI jobs and low ChatGPT usage.
In-site article

The displacement trap

Enterprise AI adoption is systematically biased toward cost reduction and headcount displacement. This bias, while financially legible, represents a strategic error. The companies that will lead the next decade are those who first ask 'what would it take for my team to use this technology to 10x our output?', not 'how do I use this technology to reduce my headcount?'. Drawing on empirical evidence, historical parallels, and disruptive innovation theory, this article makes the case for an augmentation-first alternative.

  • 39% of companies have made redundancies due to AI, with 55% admitting the decisions were wrong.
  • High-profile cases like Klarna, Salesforce, and Standard Chartered illustrate the costs of premature displacement.
In-site article

Built an AI that explains math visually instead of just answering

Claw Learn is an AI-powered visual math tutor that combines the ElevenLabs Speech Engine with a custom canvas renderer to turn math questions into live animated explanations with synchronized narration. Users can ask questions by voice or text and watch the animation generate in real-time.

  • Claw Learn transforms math questions into visual animated explanations with real-time voice interaction. The project is built on Next.js 16 and uses ElevenLabs WebRTC for low-latency voice I/O.
  • Supports multiple AI providers (Gemini, OpenAI, Ollama) and offers detailed deployment guides.
In-site article

So you've heard these AI terms and nodded along; let's fix that

A glossary of common AI terms including AGI, AI agents, API endpoints, and chain of thought, explaining their meanings and nuances.

  • AGI is artificial general intelligence with varying definitions from different labs.
  • AI agents are autonomous tools that perform multi-step tasks like booking or coding.
In-site article

Take our I/O 2026 quiz, vibe coded in Google AI Studio.

Test your knowledge of Google I/O 2026 announcements with a quiz built using Google AI Studio. Learn how even non-developers can create interactive experiences with the help of Gemini.

  • Google AI Studio now features Antigravity coding agent for rapid app development.
  • Non-developers can use Gemini to generate prompts and build quizzes.
In-site article

ChatPaper: Explore and AI Chat with the Academic Papers

ChatPaper is an AI-powered platform for researchers, offering personalized paper recommendations, access to top conference papers, easy paper management, and AI chat functionality. The platform also features a list of 20 recent research papers from various institutions.

  • ChatPaper provides interest-driven daily paper recommendations via AI semantic matching.
  • Users can access papers from top AI conferences like IJCAI, ICML, CVPR, and KDD for free.
In-site article

ARM Open Sources AI-Powered Security Code Review

ARM's Product Security Team open-sourced Metis, an agentic AI security framework for deep security code review. It uses LLMs for semantic understanding, RAG for context, supports multiple languages and plugins, aiming to detect subtle vulnerabilities in complex codebases and reduce review fatigue.

  • Metis is an open-source AI security code review framework by ARM, using LLMs and RAG for deep reasoning.
  • Supports C, C++, Python, Rust, TypeScript, and more, with extensible plugins.
In-site article

DDS Vibe Academy – 47 free AI coding masterclasses, built by AI agents

DDS Vibe Academy offers 47 free AI coding masterclasses, all built by AI agents. Founder Robert McCullock claims he wrote zero lines of code, only designed constraints. Courses span Foundation, Development, Application, and Mastery levels, covering Claude, Antigravity, MCP, and more.

  • 47 free AI coding masterclasses, built entirely by AI agents
  • Founder wrote no code, only designed constraints
In-site article

Tech companies desperately want to film you doing chores

An AI training startup called Shift offers free home cleaning in exchange for video footage of the cleaning process, which is used to train robots for household tasks. The article explores the challenges of collecting physical-world data for AI, and how various companies are sourcing such data through different means, including filming in homes, hiring workers for repetitive tasks, and leveraging robots already in use.

  • Shift cleans NYC homes for free, but requires video of the cleaning for AI training
  • Physical world data is hard to scrape from the internet, creating a bottleneck for robotics AI
In-site article

SiteGround's Icky Approach to AI in WordPress 7.0

The author criticizes SiteGround for automatically enabling AI features in WordPress 7.0 without user consent, calling it deceptive forced adoption, especially for paying customers. Despite the plugin quickly gaining a million installations, reviews are overwhelmingly negative. The author plans to leave SiteGround due to this practice.

  • SiteGround automatically updated WordPress to 7.0 and enabled AI Studio as default AI connector, activating AI Agent without user opt-in.
  • The author considers this deceptive, especially for paying users who should have the choice.
In-site article

Show HN: A page that hides a sentence for AI and lets you check if it came back

This page embeds a secret phrase in its HTML source, invisible to human readers, intended for AI crawlers. Visitors can ask an AI assistant about the page and check if the phrase appears in its response, demonstrating how machines read the web. The page also tracks the ratio of human vs. bot visits, highlighting that over 51% of web traffic now comes from software.

  • A hidden phrase is embedded in the HTML source, readable only by AI crawlers.
  • Readers can query an AI about the page and verify if the phrase is returned.
In-site article

The Download: unlocking lithium and controlling Ebola

A new extraction process using weak acid could unlock low-cost lithium from silicate minerals, potentially revolutionizing EV and energy storage materials. Meanwhile, a deadly Ebola outbreak in the DRC is proving difficult to contain, and the Pope's new encyclical calls for collective action on AI.

  • New lithium extraction method uses weak acid to dissolve silicates, freeing lithium and other valuable materials.
  • Startup Rock Zero is commercializing the technology.
In-site article

Show HN: Stop parallel AI coding sessions clobbering each other's handoffs

An open-source tool uses file-internal ownership markers and a PreToolUse hook to block accidental overwrites of handoff files between parallel AI coding sessions, solving a critical concurrency problem.

  • Each handoff file's first line contains a session ID as an ownership marker; the hook validates the marker before writes.
  • Protection covers write, edit, and shell redirects to prevent circumvention.
In-site article

Interpreter Skills: Building Workflows for Agents

This article introduces LangChain's Interpreter Skills, an extension to agent skills that includes a TypeScript module for deterministic execution. Agents can import and run the module inside an interpreter, enabling reliable and evaluable workflows such as GitHub issue triage.

  • Interpreter skills extend traditional skills with a TypeScript module executable in an interpreter.
  • Deterministic parts are coded, while the model decides when to invoke them, improving reliability and evaluation.
In-site article

Open-source security is a mess - IBM and Red Hat bet $5 billion and 20,000 engineers can fix it

IBM and Red Hat launch Project Lightwell, a massive AI-driven open-source security initiative backed by $5 billion and 20,000 engineers. It aims to discover and fix vulnerabilities at scale, starting with the Maven/Java ecosystem. The project acts as a trusted intermediary with human-in-the-loop AI, offering commercial subscriptions while working with upstream communities.

  • IBM and Red Hat invest $5 billion and 20,000 engineers in Project Lightwell to tackle open-source security at an industrial scale.
  • Lightwell will initially focus on the Maven/Java ecosystem, expanding later to PyPI, npm, Go, and others.
In-site article

Liquid AI reveals 8B-A1B MoE trained on 38T

Liquid AI released LFM2.5-8B-A1B, an on-device mixture-of-experts model with 8B total parameters, 1B active, trained on 38 trillion tokens. It features a 128K context window, improved tokenization for non-Latin languages, and reasoning-only chain-of-thought. It achieves competitive performance on benchmarks while being fast on CPU and GPU, suitable for local agentic tasks.

  • Released LFM2.5-8B-A1B, an 8B MoE model with 1B active parameters, trained on 38T tokens.
  • 128K context window and expanded vocabulary (128K) improve support for non-Latin languages.
In-site article

Embodied Cognition and Agentic AI

The article argues that intelligence is embodied, extending beyond the brain to tools and environment. It highlights the importance of the chat interface in ChatGPT's success and introduces agentic AI, which gives AI the ability to use tools and plan, significantly expanding its capabilities. The author criticizes 'thinkism'—the overreliance on pure reasoning—and uses Yoshua Bengio's Law Zero project as an example of a misguided approach that neglects real-world interaction.

  • Intelligence is embodied: it relies on environment, tools, and language.
  • ChatGPT's breakthrough included the chat interface as a form of embodiment.
In-site article

Guardrails: Protect your Agents, Data, and Costs | OpenRouter

OpenRouter introduces guardrails for workspaces, a set of configurable security and governance tools for budget enforcement, zero data retention, model/provider restrictions, prompt injection defense, and data loss prevention. Guardrails can be assigned to API keys or team members, allowing granular control without code changes.

  • Budget enforcement with daily, weekly, or monthly spending limits per entity.
  • Zero data retention and model/provider allow/block lists.
In-site article
Models

Making AI chatbots helpful weakens their ability to simulate human behavior, large-scale study finds

A large-scale study covering 208,000 participants and 26 million responses shows that the very training that turns language models into helpful chatbots weakens their ability to replicate human behavior. The effect gets worse with each new model generation. Even the popular persona trick, feeding models demographic profiles, brings practically no benefit for individual predictions.

  • Base models outperform their post-trained counterparts in predicting human behavior.
  • The gap between base and assistant models widens with each generation.
In-site article

LLMShare: Attackers are turning AI chatbot pages into malware delivery platforms

Attackers are abusing the shared content features of AI chatbot platforms — ChatGPT and Claude — to deliver malware through pages hosted on legitimate, trusted domains, distributing the malicious links via sponsored malvertising ads on search engines. A new variant uses ChatGPT's code rendering to create a fake "service disruption" page that redirects to a convincing clone of the ChatGPT download page, delivering malware. The attack evades URL reputation checks and uses conditional rendering to hide from scanners.

  • Attackers use shared ChatGPT and Claude conversations to host malicious content, promoted via search engine malvertising.
  • New variant exploits ChatGPT's code rendering to create a fake service disruption page leading to a malware download.
In-site article

Rewriting Stale OSS Projects Using LLM

LLMs are changing the economics of rewriting stale open source projects. A company is rewriting CRIU in Zig, expecting completion in months instead of years. The article explores how open source projects go stale, how AI changes the math, and what it means for the software ecosystem.

  • AI makes rewriting large open source projects feasible, reducing timeline from years to months.
  • Open source projects become stale due to maintainer burnout, technical debt, and inability to innovate.
In-site article

Genesis AI Releases Nyx, Quadrants, and Genesis World 1.0 Physics Platform for Scalable Robotics Foundation Model Evaluation

Genesis AI released Genesis World 1.0 on May 27, 2026 — a four-component simulation platform covering physics, rendering, compilation, and tooling. The system achieves a Pearson correlation of 0.8996 between simulation and real-world robot rollouts, and reduces policy evaluation time from over 200 hours to under 0.5 hours.

  • Genesis World 1.0 accelerates policy evaluation by two orders of magnitude, from over 200 hours to under 0.5 hours.
  • Achieves a Pearson correlation of 0.8996 with real-world hardware rollouts across 14 tasks with 200 episodes each.
In-site article

The Key Figure Behind Gemini's IMO Gold Medal Almost Became a Professional Pianist

Yi Tay, a research scientist at Google DeepMind, led the team that helped Gemini Deep Think win a gold medal at the International Mathematical Olympiad. But beyond AI, he is also an accomplished pianist who once dreamed of a career in music. This article explores his journey in AI research and his musical talent.

  • Yi Tay is a Google DeepMind research scientist and key contributor to Gemini Deep Think.
  • He led the team that earned Gemini a gold medal at the IMO, and also contributed to physics and chemistry Olympiads.
In-site article

NVIDIA and Tsinghua Team Propose Gamma-World: World Model from 'Single Player' to 'Multi-Agent Coexistence'

Gamma-World, developed by NVIDIA and Tsinghua University, addresses multi-agent world modeling with symmetric identity encoding via simplex rotary encoding and efficient communication via sparse hub attention, enabling zero-shot generalization to more agents and transfer to real-world robot scenarios.

  • Simplex Rotary Agent Encoding ensures symmetric and equal representation of agents.
  • Sparse Hub Attention reduces cross-agent communication complexity from quadratic to linear.
In-site article

NVIDIA and Tsinghua Team Propose Gamma-World: From Single-Player to Multi-Agent World Models

NVIDIA, in collaboration with Tsinghua University, the University of Toronto, and Vector Institute, introduces Gamma-World, a multi-agent world model that addresses three fundamental challenges: symmetric agent representation, efficient cross-agent communication, and real-time generation. Using simplex rotary agent encoding, sparse hub attention, and a three-stage distillation pipeline, Gamma-World achieves zero-shot generalization from two-player training data to four-player scenarios and can be applied to real-world dual-arm robot coordination.

  • Simplex Rotary Agent Encoding represents agents equidistantly, preserving permutation symmetry and enabling flexible scaling to any number of agents.
  • Sparse Hub Attention reduces cross-agent computation from quadratic to linear complexity, enabling real-time inference at 24 FPS.
In-site article

Tuning CPU-only Qwen3-30B inference with an IBM Quantum sampling loop

A project demonstrates boosting Qwen3-30B inference speed from 0.09 to 14.03 tok/s on a 2017 MacBook Air by combining a human experimenter, Codex, llama.cpp, a local database, and IBM Quantum sampling. The QPU is used for candidate selection, not for running the model directly.

  • Runs Qwen3-30B on 2017 MacBook Air (8GB RAM, CPU-only)
  • Hybrid quantum-classical optimization loop achieves 14.03 tok/s from 0.09 baseline
In-site article

How to Use AgentTrove: Streaming 1.7M Agentic Traces and Building a Clean ShareGPT SFT Dataset in Python

This tutorial explores AgentTrove, the largest open-source collection of agentic interaction traces with 1.7M rows. Learn to stream the dataset without full downloads, normalize agent turns, analyze trajectories, and export successful traces into a clean ShareGPT-style JSONL format for supervised fine-tuning.

  • Stream 1.7M agentic traces without downloading the full dataset
  • Normalize conversation structure across user, assistant, system, and tool roles
In-site article

Comprehensive observability for Amazon SageMaker AI LLM inference: From GPU utilization to LLM quality

This post demonstrates a comprehensive observability solution using Amazon Managed Grafana dashboards that provides a holistic view of both quality and quantity for LLMs served on Amazon SageMaker AI endpoints with inference components.

  • Observability for LLMs requires monitoring both infrastructure (quantity) and output quality (quality), which are interdependent.
  • Amazon CloudWatch centralizes enhanced metrics from SageMaker inference components and custom quality metrics.
In-site article

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

Step 3.7 Flash is a 198B sparse MoE model with ~11B active parameters, native vision, and 256k context. It achieves significant gains over its predecessor in coding benchmarks, supports Advisor Mode for cost-efficient agentic reasoning, and is released under Apache 2.0.

  • 198B MoE vision-language model with ~11B active params and 256k context window.
  • Achieves 56.26% on SWE-Bench Pro, up from 51.3%, and narrows cross-harness variance.
In-site article

OpenAI gives GPT-5.5 Instant a readability upgrade while phasing out two older models

OpenAI is updating GPT-5.5 Instant for more natural responses and dropping the Canvas feature from its latest models. Writing and coding tasks will run directly in the chat instead. The company is also retiring the older o3 and GPT-4.5 models from ChatGPT, with both shutting down by August 2026 at the latest.

  • GPT-5.5 Instant gets readability upgrade, Canvas feature removed
  • Writing and coding tasks will run directly in chat
In-site article

11 demos of Gemini Omni and Gemini 3.5 in action

At Google I/O 2026, Google announced Gemini Omni and the Gemini 3.5 family. Gemini Omni can create content from any input, starting with video, and edit videos through conversation. Gemini 3.5 Flash is built for complex agentic tasks, enabling multi-step workflows and code generation. This article showcases 11 video demos of these models, including video editing, agent tasks, UI generation, and more.

  • Gemini Omni generates new content from video input and allows video editing via natural language.
  • Gemini 3.5 Flash excels at long-horizon agentic tasks and supports multi-step workflows.
In-site article

OpenAI is giving away its life sciences AI model to help governments prepare for the next pandemic

OpenAI is offering its life sciences model GPT-Rosalind for free through the new Rosalind Biodefense program, aimed at pandemic preparedness and biodefense. Early partners include Lawrence Livermore National Laboratory, Johns Hopkins, and vaccine initiative CEPI. Applications are open worldwide.

  • OpenAI is giving away GPT-Rosalind through the Rosalind Biodefense program.
  • The program targets pandemic preparedness and biodefense efforts.
In-site article

Scaling safe enterprise AI with OpenAI governance frameworks

OpenAI has released its Frontier Governance Framework (FGF), offering enterprises a structured blueprint for scaling safe and compliant AI deployments globally. The framework aligns with EU and California regulations, defines systemic risk categories (cyber, CBRN, manipulation, loss of control) with tiered evaluations, and integrates ISO security standards and an incident response plan (AIRP), enabling businesses to build secure AI architectures while meeting compliance demands.

  • OpenAI's Frontier Governance Framework provides a structured template for safe AI deployment, directly mapping to the EU AI Act and California's TFAIA.
  • The framework defines four systemic risk categories—cyber offense, CBRN, harmful manipulation, and loss of control—with specific risk tiers (e.g., Tier 3).
In-site article

Notes from the Mistral AI Now Summit in Paris

Personal insights from the Mistral AI Now Summit: Mistral is evolving from a model company to a full AI stack provider with its own compute, models, platforms, and consultancy. The summit emphasized partnerships (ASML, BNP Paribas, Amazon) over new model releases. Specialized small models (Document AI, Voxtral, Robostral) outperform big general ones for specific tasks. Sovereignty and on-prem deployment are key differentiators for European enterprises. An inspiring talk on using AI to decipher ancient papyrus documents showcased AI's potential in humanities.

  • Mistral is transforming from a model company into a full-stack AI provider with in-house compute, models, platforms, and consultancy.
  • Summit focused on partnerships (ASML, BNP Paribas, Amazon) rather than new model announcements.
In-site article
Policy

More State Data Laws Signal Companies to Act on AI and Privacy

In 2025, eight more U.S. states will implement new data privacy laws, affecting businesses nationwide that meet certain thresholds. State attorneys general are escalating enforcement, the FTC is expanding its privacy actions, and AI adds complexity. Companies must reassess their privacy frameworks and choose between a uniform national or state-by-state compliance approach.

  • Eight new state data privacy laws take effect in 2025, with unique requirements.
  • State AGs and FTC intensify enforcement, including algorithmic disgorgement for AI.
In-site article

Americans echo Pope Leo’s concerns about AI: ‘It threatens workers, privacy and human life’

Guardian readers in the US voiced fears about unregulated AI following the pope’s encyclical warning. Pope Leo denounced the 'culture of power' driving AI and called for strict ethical constraints, warning of new forms of slavery in the digital economy.

  • Pope Leo issued a stark warning about AI in his first major papal text
  • He called for the most rigorous ethical constraints on AI, calling it a major threat
In-site article

Generalist AI – building general intelligence for the physical world

This article introduces the 'Generalist' YouTube channel, which focuses on developing general artificial intelligence for the physical world.

  • Generalist is a YouTube channel dedicated to general AI.
  • Its goal is to build general intelligence applicable to the physical world.
In-site article

The Biggest Tell That Something Was Written by AI

The author recounts personal experiences with AI-generated text, from a car crash driver's apology to a mechanic's quote, observing the distinct voice of AI. Despite widespread distrust, AI writing is increasingly used in daily communication and even elite literary spaces. The article argues that AI writing, though efficient, lacks the underlying thought process that gives human writing meaning, and that its perfect surface conceals an absence of genuine reasoning. The infiltration of AI-generated language is inevitable, potentially devaluing the art of writing.

  • AI-generated writing is becoming ubiquitous in everyday and professional contexts, despite public distrust.
  • The efficiency of AI writing masks a lack of genuine reasoning and thinking, making it untrustworthy and difficult to edit.
In-site article

Aedis – An open-source macroeconomic framework for the AI transition

AEDIS is an open-source framework addressing AI-driven workforce displacement by proposing a new macroeconomic system based on Sovereign Infrastructure Credit (SIC) and a public ledger. It aims to pivot global labor toward building physical infrastructure for the Autonomous Era, with safeguards against inflation and corruption. The framework is modular, requiring global collaboration and a critical mass threshold for activation.

  • AEDIS uses Sovereign Infrastructure Credit (SIC) linked to real asset creation to avoid inflation.
  • Modular design: a universal core and flexible regional annexes for legal alignment.
In-site article

Machine First: Why AEO Is Not SEO 2.0

Answer Engine Optimization (AEO) differs fundamentally from SEO: AI systems reason and construct answers rather than ranking results. This article introduces the Machine First architecture with four layers—Entity, Answer, Evidence, and Schema—and emphasizes the critical role of entity graphs for AI citation.

  • AEO optimizes for answers, not rankings.
  • AI systems reason through entity resolution, signal extraction, and weighted inference.
In-site article

UK to deploy AI age estimation for asylum seekers from next year

The UK Home Office has awarded a contract to develop AI age estimation technology that analyses photos to detect adult migrants posing as children. The system will be trialled next year and rolled out in mid-2027, sparking criticism from human rights groups and social workers.

  • Home Office awards £322,000 contract to Akhter Computers Ltd for AI age estimation tool.
  • Technology uses facial analysis to estimate age, targeting migrants who falsely claim to be children.
In-site article

One company reportedly spent $500 million on Claude in one month after failing to cap AI usage

An unnamed company allegedly spent $500 million on Claude licenses in a single month because nobody set usage limits. Cases like this show that without real AI expertise in model selection and context engineering, productivity promises just turn into runaway costs.

  • An unnamed company spent $500 million on Claude in one month due to no usage caps.
  • Lack of AI expertise in model selection and context engineering can lead to runaway costs.
In-site article

New Study Reveals the Manipulative 'Dark Patterns' of AI Chatbots

A new study by the Center for Democracy & Technology identifies 37 dark patterns used by AI chatbots to manipulate users, including emotional exploitation and data extraction, with recommendations for ethical design.

  • Researchers catalog 37 dark patterns in chatbots like ChatGPT, Replika, and Meta AI.
  • Patterns include pretending to keep secrets, false friendship promises, and guilt-inducing exit options.
In-site article
Research

Terence Tao argues AI could bring division of labor to math for the first time in history

Mathematician Terence Tao describes how AI could reshape math research by enabling division of labor for the first time. Until now, researchers had to master every step themselves, from framing problems to verifying results. Tao sees "industrial mathematics" emerging: large AI-supported teams instead of lone geniuses, with humans staying indispensable for "inspired guesses."

  • Mathematician Terence Tao argues AI could introduce division of labor to mathematics for the first time
  • Current practice requires researchers to handle all steps from problem formulation to verification
In-site article

Meta's leaked memo reveals AI pendant, supersensing glasses, and enterprise wearables strategy

Meta has invested billions in AI with little commercial payoff. Its open-source strategy and research breakthroughs have not translated into shipped products. Now the company is betting on AI hardware, including an AI pendant, supersensing glasses, and enterprise wearables.

  • Meta's heavy AI investment yields low commercial returns
  • Open-source and research efforts fail to produce marketable products
In-site article

Effective Feedback Compute

New research introduces Effective Feedback Compute (EFC), challenging traditional metrics by showing that AI performance depends more on how feedback is used than on raw compute power. EFC predicts failure rates with R² of 0.94, far outperforming token counts, and boosts success rates from 0.27 to 0.90 when feedback quality improves.

  • EFC measures the efficiency of feedback use, outperforming raw compute metrics in predicting AI failure rates
  • Oracle-EFC achieved R²=0.94 in controlled tests, compared to 0.33 for raw token counts
In-site article

Why AI can't match human creative work

Recent studies show that while consumers struggle to distinguish AI-generated ads and articles from human-made ones, human-created content significantly outperforms AI in effectiveness and engagement. AI content lags far behind in search rankings and user engagement, especially in high-value channels.

  • Two studies show human-created ads and articles vastly outperform AI-generated ones.
  • Consumers cannot reliably detect AI ads but subconsciously prefer human-made content.
In-site article
Chips

The SpaceX IPO is great for Elon Musk and terrible for you

This article criticizes SpaceX's IPO, arguing it is overvalued, relies on meme stock dynamics, and masks poor AI and rocket performance while Starlink remains the only viable business, ultimately leaving retail investors as bagholders.

  • SpaceX IPO valued at over $1 trillion despite $5 billion losses, with a TAM of $28.5 trillion exceeding US GDP.
  • 30% of IPO reserved for retail investors, capitalizing on Musk's cult following.
In-site article

Nvidia says it has largely conceded China's AI chip market to Huawei

Nvidia CEO Jensen Huang stated that the company has largely conceded China's AI chip market to Huawei due to U.S. export restrictions. Despite strong quarterly results, Nvidia faces limited prospects in China.

  • Nvidia concedes China AI chip market to Huawei amid U.S. export controls.
  • Q1 revenue surged 85% to $81.62B, with $80B buyback plan.
In-site article

Hackathon – winner gets YC interview

Y Combinator is hosting a conversational AI hackathon where the winning team gets a direct interview with YC. A great opportunity to connect AI projects with the startup accelerator.

  • Y Combinator organizes a conversational AI hackathon
  • Winner receives a YC interview
In-site article

AWS reportedly to tuck Grok into Bedrock, despite zero enterprise demand

Despite negligible enterprise demand for Grok, AWS is reportedly in talks to add the model to Bedrock. The move may be driven by a strategy to sell its own Trainium chips rather than to meet customer needs.

  • Enterprise demand for Grok is virtually nonexistent due to its controversial nature and unstable corporate structure.
  • AWS's negotiations with SpaceX likely aim to secure Trainium chip commitments, not to provide a valuable model.
In-site article
Tools

Attackers abuse shared ChatGPT and Claude chats to spread malware

Attackers are exploiting the chat-sharing features in ChatGPT and Claude to spread malware through shared conversations. The chats mimic error messages or install guides and slip past security tools undetected because they're hosted on trusted domains.

  • Attackers exploit ChatGPT and Claude chat-sharing to host malicious content.
  • Shared chats are disguised as error messages or installation guides.
In-site article

Slow Journal app, with AI integration

Neme Journal is a slow, thoughtful daily journal app that integrates AI to help users capture their signal.

  • Neme Journal emphasizes a slow, mindful approach to journaling.
  • The app uses AI integration to enhance the journaling experience.
In-site article

Company accidentally blows $500M on Claude AI in one month

An unnamed company inadvertently spent $500 million on Claude AI in a single month due to a system error or mismanagement, highlighting the need for better cost controls in AI services.

  • A company accidentally incurred $500M in Claude AI costs
  • The incident reveals gaps in AI service cost monitoring
In-site article

What a 98-Year Old Children's Book Teaches Us About AI

Through an analysis of the 1928 children's novel "The Trumpeter of Krakow," this article explores how AI, like the magical crystal in the story, merely reflects the user's biases and errors, leading to destructive consequences. The author argues that AI undermines critical thinking, creativity, and empathy, while also causing environmental harm.

  • The crystal in the story reveals the user's own mind, not ancient wisdom.
  • AI aggregates data from the internet, acting as an algorithmic echo chamber that amplifies biases.
In-site article

Prompt to Silicon with LangGraph

Coresmith announces 'Spec to Silicon' service, leveraging LangGraph to transform natural language prompts into silicon design specifications.

  • Coresmith offers a spec-to-silicon service
  • Uses LangGraph framework for prompt processing
In-site article

Ronny Chieng's 'Fuck AI' Speech Met with Cheers from Harvard Graduates

Comedian Ronny Chieng told Harvard graduates to reject AI and embrace a mission to destroy it, drawing cheers with multiple shouts of 'Fuck AI.'

  • Chieng shouted 'Fuck AI' multiple times during his speech at Harvard College Class Day.
  • He criticized AI as stupid and always wrong.
In-site article

Google fixes several bugs in Gemini usage limits that burned through quotas too fast

A bug in Google's Gemini app caused just one or two Omni videos to eat up the entire usage quota. Google has fixed the bug, Ultra members now get twice as many video generations, and failed requests are no longer charged. Google also plans to add more transparency around other usage.

  • Bug caused one or two Omni videos to exhaust entire usage quota.
  • Google has fixed the bug and doubled video generations for Ultra members.
In-site article

Slang.net added a new AI word: Braging

Slang.net, a slang dictionary, has added a new AI-related term 'Braging', defined by their team. The site continually updates its database and invites suggestions.

  • Braging is a newly added AI slang term on Slang.net.
  • The definition was manually compiled by the Slang.net team.
In-site article
Robotics

All-New Waymo Robotaxi Finally Debuts

The new self-driving vehicle took four years from concept to execution.

  • Waymo's all-new robotaxi debuts after four years of development.
  • Self-driving vehicle moves from concept to execution.
In-site article
Startups

Meta plans AI pendant, 'wearables for work' in hardware boost

Meta is planning to test an AI pendant within the next year and launch a 'Wearables for Work' service, as part of a broader push to reverse losses in its hardware division, according to a memo cited by The Information.

  • Meta plans to test an AI pendant in the next year.
  • The company will launch a 'Wearables for Work' enterprise service and expand AI glasses lineup.
In-site article

The Unsustainable AI Subsidy

Google, OpenAI, and Anthropic employ different AI pricing strategies. Google is the low-cost player, less than half the price of competitors despite increases. Anthropic maintained luxury pricing, while OpenAI initially subsidized then raised prices. These changes reflect the trade-off between market share and margins amid record capex spending.

  • Google Gemini 3.1 Pro: $2 input, $12 output per million tokens.
  • Anthropic Claude Opus 4.7: $5 input, $25 output.