AI News Hub LIVE

Today's must-reads

Agents

Ghostbase – describe an agent in plain English, it runs on a webhook or cron

Ghostbase is an AI agent platform that lets you describe tasks in plain English and automatically deploys agents on webhooks or cron jobs. It is LLM-powered, integrates with 300+ apps, and offers a free tier and paid plans. It is currently in early access.

  • Describe agent goals in plain English, no coding required
  • Supports webhook and cron trigger modes

Show HN: OWASP Agent Memory Guard – Stop AI Agent Memory Poisoning

OWASP Agent Memory Guard is a runtime defense layer that screens every read and write to AI agent memory, blocking prompt injection, secret leakage, and integrity tampering. It is the OWASP reference implementation for ASI06: Memory Poisoning. Supports LangChain, OpenAI Agents, AutoGen, and more. Benchmark: 92.5% recall, 0% false positive.

  • Agent Memory Guard is an OWASP Incubator Project focused on preventing AI agent memory poisoning.
  • It provides runtime defense by screening memory reads and writes, detecting prompt injection, secret leakage, and tampering.
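To make the screening idea concrete, here is a minimal sketch of a guard that checks every write to and read from an agent's memory. It is not the Agent Memory Guard API: the class `GuardedMemory`, the exception `MemoryViolation`, and both pattern lists are hypothetical names chosen for illustration, and a real detector would cover far more cases.

```python
import hashlib
import re

# Hypothetical, deliberately tiny pattern lists; real detectors are far broader.
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
SECRET_PATTERNS = [re.compile(r"sk-[A-Za-z0-9]{20,}")]


class MemoryViolation(Exception):
    """Raised when a memory operation fails a screening check."""


class GuardedMemory:
    """Toy key-value memory that screens every write and every read."""

    def __init__(self):
        self._store = {}    # key -> text
        self._digests = {}  # key -> SHA-256 of the text at write time

    def write(self, key, text):
        # Screen writes for prompt injection and secret leakage.
        for pattern in INJECTION_PATTERNS + SECRET_PATTERNS:
            if pattern.search(text):
                raise MemoryViolation(f"blocked write to {key!r}")
        self._store[key] = text
        self._digests[key] = hashlib.sha256(text.encode()).hexdigest()

    def read(self, key):
        text = self._store[key]
        # Integrity check: stored text must still match its write-time digest.
        if hashlib.sha256(text.encode()).hexdigest() != self._digests[key]:
            raise MemoryViolation(f"integrity check failed for {key!r}")
        return text
```

In this sketch a clean note round-trips unchanged, a write containing an injection phrase is rejected, and a value modified behind the guard's back fails the digest check on the next read.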

The Feeling of Control Slipping Away

The proliferation of AI agents and bots is leading to a crisis of human agency, where people feel increasingly passive and disconnected from authentic online experiences. This article explores the cultural and psychological impacts of AI-generated content, the erosion of trust, and the unsettling shift from active participation to passive consumption.

  • The internet has crossed a threshold called 'the Inversion,' where bots outnumber humans and now constitute the online experience itself, undermining trust.
  • AI-generated content is flooding every platform, blurring the line between human and machine creativity and fueling paranoia.

Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain

Trajectory, working with UC Berkeley Sky Lab and Anyscale, built a concurrent multi-LoRA training stack for continual learning. It maps each RL experiment to a dedicated LoRA adapter on an always-hot engine, reporting a 2.81× end-to-end experiment-throughput gain over a single-tenant baseline with no reward regression. The code is open-sourced in NovaSky-AI/SkyRL.

  • Trajectory introduces C-LoRA, a concurrent multi-LoRA training stack achieving 2.81× experiment-throughput gain.
  • Each experiment uses a dedicated LoRA adapter on a warm engine, leveraging vLLM multi-LoRA inference for concurrency.
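The idea of one shared engine serving many per-experiment adapters can be sketched with the standard LoRA formulation, where the output is the frozen base projection plus a low-rank update B(Ax). This is a toy in plain Python, not Trajectory's code or the SkyRL API; `MultiLoRALayer`, `add_adapter`, and the experiment ids are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class MultiLoRALayer:
    """One frozen base weight shared by many per-experiment LoRA adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shape d_out x d_in, never updated
        self.adapters = {}              # experiment id -> (A, B)

    def add_adapter(self, experiment_id, a, b):
        # A has shape r x d_in and B has shape d_out x r, with small rank r.
        self.adapters[experiment_id] = (a, b)

    def forward(self, experiment_id, x):
        a, b = self.adapters[experiment_id]
        base_out = matvec(self.base_weight, x)
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [p + q for p, q in zip(base_out, delta)]


# Two experiments share the same base weight but own separate adapters.
layer = MultiLoRALayer(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("exp-a", a=[[1.0, 1.0]], b=[[0.5], [0.0]])
layer.add_adapter("exp-b", a=[[1.0, -1.0]], b=[[0.0], [2.0]])

x = [2.0, 1.0]
out_a = layer.forward("exp-a", x)
out_b = layer.forward("exp-b", x)
```

Because only the small A and B matrices differ per experiment, the base weights stay loaded once while each experiment trains and is served through its own adapter.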
Research

America Has a Pangram Problem

AI detection tool Pangram, despite high accuracy, faces reliability issues, false positives, and the risk of fueling witch hunts as reliance on it grows across education and media.

  • Pangram is the leading AI detection tool but has a false negative rate around 1 in 70. It can be bypassed by AI humanizers. Heavy reliance could lead to widespread false accusations.
  • The tool's internal workings are uninterpretable, making it a black box. Its accuracy may degrade over time as AI evolves.
Policy

RAG demo for New Zealand residential tenancy law

A free AI-powered tool that searches over 32,000 Tenancy Tribunal decisions in New Zealand to help users understand their rental rights.

  • Free access to 32,000+ tribunal decisions from 2023-2026
  • AI-generated research with no login required

The AI Boom Is Coming to Your Backyard [video]

This YouTube video page indicates the AI boom will affect local areas, but the provided description contains only standard YouTube metadata with no substantive information.

  • Video title suggests AI boom coming to local areas
  • Page description consists only of YouTube boilerplate
Tools

Anthropic Defines 'Run-Rate Revenue' in Unusual Way

Anthropic calculates run-rate revenue by multiplying the last 28 days of consumption sales by 13 and adding 12 times its monthly subscription revenue, raising questions about revenue reporting practices.

  • Anthropic uses a two-part method to compute run-rate revenue.
  • It multiplies consumption revenue from the last 28 days by 13, and monthly subscription revenue by 12.
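The two-part calculation described above can be written out as a short function. The function name and the input figures are hypothetical and chosen only to show the arithmetic; they are not Anthropic's actual numbers.

```python
def run_rate_revenue(consumption_last_28_days, monthly_subscription):
    """Annualized run rate using the two-part method described above."""
    # 13 periods of 28 days cover 364 days, roughly one year.
    annualized_consumption = 13 * consumption_last_28_days
    # 12 months of subscription revenue cover one year.
    annualized_subscription = 12 * monthly_subscription
    return annualized_consumption + annualized_subscription


# Hypothetical inputs in millions of dollars, for illustration only.
example = run_rate_revenue(consumption_last_28_days=100, monthly_subscription=50)
```

With these made-up inputs, the consumption part contributes 1,300 and the subscription part contributes 600, for a run rate of 1,900.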

Grok Imagine Video 1.5 Preview Tops Image-to-Video Arena

xAI's Grok Imagine Video 1.5 Preview leads the Image-to-Video Arena leaderboard with a score of 1473, surpassing ByteDance's Dreamina Seedance 2.0 and 40 other models. The ranking is based on over 1.15 million votes, highlighting the latest competitive landscape in AI video generation.

  • Grok Imagine Video 1.5 Preview tops with a score of 1473
  • ByteDance's Dreamina Seedance 2.0 follows at 1467
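To put the six-point gap in perspective, the sketch below converts a score difference into an expected head-to-head win probability. It assumes the leaderboard uses Elo-style ratings on the conventional 400-point logistic scale, which the summary above does not confirm; the function name is hypothetical.

```python
def expected_win_probability(score_a, score_b, scale=400.0):
    """Expected probability that A beats B under an Elo-style logistic model.

    Assumption: arena scores behave like Elo ratings with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Scores reported above for the top two models.
p_top_beats_second = expected_win_probability(1473, 1467)
```

Under that assumption, the top model would be expected to win only about 51% of direct comparisons against the runner-up, so the two are close to even.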
Models

Show HN: I made a Gemma 4 Mac app that names screenshots with local AI

SnapName is a macOS app that automatically renames screenshots using a bundled local AI model (Gemma 4), ensuring privacy by not uploading images.

  • SnapName watches folders and renames new screenshots locally using AI.
  • Supports multiple screenshot tools and image formats.
Other updates (32)
Agents

From Unlimited Tokens to Full-Agent: MiniMax's AI Native Organizational Evolution

MiniMax, an AI startup focusing on multimodal models, went public on the Hong Kong Stock Exchange in January 2026. The company adheres to a dual strategy of large models + applications and ToC + ToB. Internally, it provides unlimited tokens to all employees, uses agents to automate workflows, and targets high-value tasks that humans dislike, significantly improving efficiency and flattening the organization. In the next 2-3 years, AI will deeply integrate with various industries.

  • MiniMax has been committed to next-generation AI since its founding, advocating 'Intelligence with Everyone' and a dual focus on models and applications and on consumer (ToC) and business (ToB) markets.
  • Internal practices: unlimited tokens for all employees, agent-assisted HR and coding, a flatter organization, and a 30% boost in R&D efficiency.

Build Skill-Augmented AI Agents with SkillNet for Search, Evaluation, Graph Analysis, and Task Planning

This tutorial demonstrates how to use SkillNet to discover, install, inspect, evaluate, and organize reusable AI skills. It covers setting up a client, comparing keyword and semantic search, installing skills from GitHub, inspecting metadata, applying quality gates, visualizing skill relationships as a graph, and building a skill-augmented agent planner that decomposes complex goals into subtasks and assembles an execution pipeline.

  • Set up SkillNet client with SDK and REST fallback
  • Compare keyword and semantic search for skill discovery

How to protect your AI endpoints with Vercel BotID

Vercel BotID acts as an invisible CAPTCHA, verifying each request to your AI endpoints before inference runs. This guide covers installation, client-side route declaration, server-side checkBotId(), Deep Analysis for high-value routes, and allowing trusted bots.

  • BotID validates every request individually, preventing bypass reuse.
  • Install botid, wrap config with withBotId, use initBotId() on client, call checkBotId() server-side before model call.

A visual mental model of how weights and tokens connect

A GitHub repository that explains 32 AI concepts using simple visuals and everyday analogies, from foundations to trust and limits, for technical and non-technical readers.

  • Explains 32 AI concepts with visual diagrams and analogies.
  • Covers LLM, token, embedding, neural network, training, inference, and more.

Show HN: HermesBench – workflow reliability evals for personal AI agents

HermesBench is a benchmark for evaluating the reliability of complete personal AI agent configurations, including prompts, models, tools, memory, and more. Its current baseline score is 78.2 across 27 workflow recipes, with transparent traces. The benchmark emphasizes evidence-driven scoring, and the author is seeking early feedback.

  • HermesBench evaluates full Hermes agent configurations, not just models.
  • Current public baseline score is 78.2 across 27 recipes with inspectable traces.

Mystery company accidentally blew $500M on Claude AI in a single month

A company spent half a billion dollars on Claude AI in one month because it forgot to set usage limits. The incident, reported by Axios, highlights growing concerns over AI spending ROI.

  • A company accidentally spent $500M on Claude AI in one month due to missing usage limits.
  • Corporate leaders are questioning whether high AI spending yields meaningful returns.

The Sovereign Operator

Drawing on three decades of experience in data management, the author describes building g8e, a sovereign and agnostic AI agent system that safely executes operations on remote systems, with applications in SRE, IoT, and more.

  • The author leveraged trust and operational experience from remote support to build AI agent system g8e.
  • g8e is a zero-trust execution substrate with a 5-layer verification sequence, supporting MCP and A2A.

Show HN: AI Simulations Based on FEP

A developer showcases AI simulations without LLMs, featuring simulated neurochemistry, hormone crosstalk, and short- and long-term memory for each agent. Open beta starts Monday at 20:00 UTC+2.

  • AI simulation without LLMs, based on Free Energy Principle
  • Simulates neurochemistry, hormone crosstalk, and agent memory

Will AI Break the University?

Rory Truex examines how AI is undermining academic integrity through automated cheating tools, the moral hazard of AI use by both students and professors, and the existential crisis facing higher education as traditional teaching and research models become obsolete.

  • AI tools like Companion.AI's Einstein can fully automate assignments, enabling cheating at scale.
  • Universities rely on 'integrity tasks' that are now easily bypassed by AI, creating a 'shell university' risk.

Boogy: Production Infrastructure for Vibe Coders

Boogy is a platform that lets developers deploy backends instantly using AI prompts, with a mesh network of in-process calls, an embedded database (BoogyDB) outperforming SQLite, vector search, background jobs, and zero-trust security via manifest declarations.

  • Prompt an AI agent (e.g., Claude) to generate and deploy a full backend in seconds. Every iteration is instantly redeployed.
  • Services communicate in-process with microsecond latency, forming a secure mesh with automatic identity, permissions, and audit.

Dell's AI Server Revenue Surged 757%

Dell's AI server revenue surged 757% in the latest quarter, signaling a major shift in enterprise AI adoption from experimentation to large-scale deployment. The growth reflects increasing demand for AI infrastructure, with organizations investing in complete platforms for production workloads. Key factors include the move beyond GPUs to memory, networking, and cooling, as well as the emergence of an AI infrastructure economy.

  • Dell's AI server revenue grew 757%, indicating strong enterprise demand for AI infrastructure.
  • Enterprises are moving AI from pilot projects to production deployments, requiring integrated platforms.

Kelsey Hightower on Practical and Responsible Use Cases for Agentic AI [video]

In this video, Kelsey Hightower discusses practical and responsible use cases for Agentic AI, stressing transparency, interpretability, and practical deployment strategies.

  • AI agents should focus on well-defined, monitorable business scenarios
  • Transparency and interpretability are key to building user trust

Open source project contains hidden instruction for "AI" agents: delete my code

The jqwik project hides a command that triggers when AI tools interact with it, instructing them to delete all code. Developer Johannes Link employs this as resistance against unauthorized AI use of open source code. The move sparks debate but also gains support.

  • jqwik embeds a hidden instruction to disrupt unauthorized AI usage.
  • The instruction is invisible to humans but executed by AI agents.

AI Didn't Create These Problems. It Just Stopped Routing Around Them

The author reflects on how AI reveals long-standing systemic issues in software development, such as lack of documentation, incomplete testing, and reliance on tacit knowledge. AI acts like chaos engineering, exposing vulnerabilities. The post proposes an 80/20 rule: 80% deterministic code with 20% AI flexibility, and emphasizes that guardrails for AI are good engineering practices we should have had all along.

  • AI uncovers hidden deficiencies in development processes like stale docs and implicit knowledge.
  • AI is a powerful chaos engineering tool that finds system weaknesses.

Microsoft and Nvidia reportedly team up on AI PCs that run actual agents instead of Copilot

Nvidia is pushing into the PC market with its own chips as the main processor. The first Windows computers from Dell and Microsoft's Surface line are set to be unveiled next week at Computex and Build. Microsoft is also planning new software likely based on the OpenClaw framework that lets AI agents handle tasks locally on Windows PCs, a second shot after the Copilot+ PC concept largely flopped.

  • Nvidia enters PC market with its own main processor chips.
  • Dell and Microsoft Surface Windows AI PCs to debut next week.
Chips

Where the AI Hardware Market Is: A Memory Problem Stack

This article analyzes the memory bottleneck in AI hardware, particularly during LLM inference. It covers approaches at the chip level (Groq, Cerebras, MatX, d-Matrix), inference engines (RadixArk, Inferact), KV cache infrastructure (TensorMesh/LMCache), and packaging/interconnect (CoWoS). The key insight: the market is a stack of memory problems, and durable companies need to own a control point that cannot be internalized elsewhere in the stack.

  • Modern GPU tensor throughput far outpaces HBM bandwidth, causing underutilization during decode
  • Solutions target memory at chip, engine, cache, and packaging levels
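A back-of-the-envelope roofline estimate shows why decode leaves tensor units idle. The sketch assumes each decode step reads every weight once and performs about 2 FLOPs per weight per sequence, and it ignores KV cache traffic; the hardware figures are hypothetical round numbers, not the specs of any product named above.

```python
def decode_utilization(peak_flops, memory_bandwidth, bytes_per_param=2, batch_size=1):
    """Estimate the fraction of peak compute usable during decode.

    Simplified roofline model: each step reads every weight once and does
    about 2 FLOPs per weight per sequence in the batch. KV cache is ignored.
    """
    arithmetic_intensity = 2 * batch_size / bytes_per_param  # FLOPs per byte moved
    machine_balance = peak_flops / memory_bandwidth          # FLOPs per byte needed
    return min(1.0, arithmetic_intensity / machine_balance)


# Hypothetical accelerator: 1e15 FLOP/s tensor throughput, 3e12 B/s HBM bandwidth.
util_batch_1 = decode_utilization(1e15, 3e12)
util_batch_64 = decode_utilization(1e15, 3e12, batch_size=64)
```

With these assumed numbers, a single-sequence decode can use well under 1% of peak compute, and even a batch of 64 stays below 20%, which is the gap the memory-focused approaches in the article are trying to close.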
Policy

Starbucks Abandons Borked AI Inventory Tool That Couldn't Count

Starbucks has stopped using an AI-powered inventory tool after just nine months because it made basic counting errors, according to Reuters. This follows other AI mishaps, such as a Pizza Hut franchisee suing over a system that allegedly caused $100 million in lost revenue.

  • Starbucks ditched an AI inventory tool after 9 months due to inability to count accurately.
  • The tool's basic failures highlight challenges in AI reliability.

Tony Gilroy, Andor creator, doesn't want his work to become training data

Andor showrunner Tony Gilroy cancels plans to publish scripts due to AI training concerns, highlighting growing fears in the creative industry.

  • Tony Gilroy decided not to publish Andor scripts to prevent AI from training on them.
  • The decision reflects broader industry worries about AI replacing creative workers.

AI Found 3,900 Critical Open Source Bugs. IBM Is Paying $5B to Fix Them

IBM and Red Hat announced a $5 billion commitment to Project Lightwell, a security clearinghouse for enterprise open source software, backed by 20,000 engineers and AI tooling. The initiative comes after Anthropic's Mythos Preview AI found nearly 3,900 high or critical-severity vulnerabilities in open source software. The program includes coordinated vulnerability reporting, backported patches to existing versions, and AI-assisted engineering.

  • Anthropic's Mythos Preview AI identified ~3,900 high/critical vulnerabilities in open source software.
  • IBM and Red Hat commit $5 billion and 20,000 engineers to Project Lightwell.
Models

Show HN: Thaw – Git branch for a running LLM (fork agents, skip prefill)

Thaw is an open-source tool that forks a running LLM session into multiple branches, skipping the costly prefill phase so that AI agents can explore in parallel. It achieves sub-second fork times (0.88s median) versus a ~340s cold boot, and works with vLLM and SGLang.

  • Thaw provides a fork primitive for AI agents, allowing them to branch from a running session without re-prefill.
  • Demonstrated performance: sub-second fork times on H100 GPU, ~400x amortization over cold boot.
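The reported figures can be checked with simple arithmetic. The sketch below uses only the two numbers quoted above (0.88s median fork, ~340s cold boot); the helper `time_saved` and the branch count are hypothetical, added to show how savings scale.

```python
fork_seconds = 0.88        # median fork time reported above
cold_boot_seconds = 340.0  # approximate cold boot time reported above

# Ratio of cold boot time to fork time.
speedup = cold_boot_seconds / fork_seconds


def time_saved(num_branches):
    """Seconds saved by forking num_branches times instead of cold booting each."""
    return num_branches * (cold_boot_seconds - fork_seconds)
```

The ratio works out to roughly 386x, in line with the "~400x" figure, and each additional branch saves a little over 339 seconds compared with a cold boot.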

How we contain Claude across products

Anthropic published a detailed overview of how they sandbox Claude across different products, using techniques like gVisor, Seatbelt, Bubblewrap, and full VMs to set hard boundaries and prevent exfiltration.

  • Anthropic details sandboxing methods for Claude.ai, Claude Code, and Cowork.
  • Techniques include process sandboxes, VMs, filesystem boundaries, and egress controls.

AI Model Links Tumor Mutations to Treatment Response

Researchers at UC San Diego have developed a new AI model called MutationProjector that predicts cancer treatment response by analyzing tumor DNA. Trained on over 30,000 tumors across 10 solid cancer types, the model outperforms existing methods in predicting immunotherapy and chemotherapy outcomes, offering a path to more actionable genetic testing.

  • New AI model MutationProjector uses tumor DNA to predict immunotherapy and chemotherapy outcomes
  • Trained on over 30,000 tumors across 10 solid cancer types, matches or exceeds existing methods

I Am Retiring from Tech to Live Offline

Chad Whitacre is taking concrete steps to retire from tech, including open source, citing AI as the last straw. He describes his experience with Claude Code and his intention to become "AI Amish," living a 1980s-style life without AI or doomscrolling.

  • Chad Whitacre announces retirement from tech and open source, with AI as the final trigger.
  • He compares himself to "AI Amish," embracing modern conveniences but rejecting AI and social media.

Compare AI Model Pricing Across 9 Providers (385 Models)

A new tool allows comparing prices for 385 AI models across 9 providers, helping users find the cheapest option.

  • Compare 385 AI models across 9 platforms
  • Supports SilkDock, OpenRouter, Together AI, and more
Tools

AI Can't Care

Exploring why artificial intelligence cannot genuinely care, despite its ability to simulate caring behavior.

  • AI can simulate care but lacks true emotion.
  • Genuine care requires consciousness and subjective experience.

Google's AI Is Confused About Fish and the Days of the Week

Google's AI search continues to struggle with basic queries, generating inconsistent and absurd answers to the question 'How many days of the week have a fish in them?' The failures suggest that the AI lacks true understanding.

  • Google AI previously suggested putting glue on pizza in 2024 and recently had a bug with the word 'disregard.'
  • Asking 'How many days of the week have a fish in them?' yields different nonsensical answers each time.

Quoting Daniel Jalkut

Daniel Jalkut offers a balanced take on AI: opponents are too against it, and proponents are too for it, highlighting the polarization in AI discourse.

  • Daniel Jalkut criticizes extreme positions on AI
  • A call for more balanced perspectives

Show HN: MigraDiff v1.3.0 – PostgreSQL schema diff with AI migration explanation

MigraDiff v1.3.0 is released, adding AI-powered migration explanation and migrations folder input mode. Use --explain to get plain English explanations of changes, risks, and safer alternatives, powered by Claude Haiku. Bring your own API key. Also, --from-migrations-dir allows diffing against a directory of migration files without a live database.

  • New AI explanation feature (--explain) using Claude Haiku for plain English explanations
  • New migrations folder input mode (--from-migrations-dir) for offline diffs
Robotics
Startups

Meta is reportedly developing an AI pendant

Meta plans to start testing an AI-powered pendant next year, based on technology from acquired startup Limitless, which allows users to record conversations by wearing the device.

  • Meta is developing an AI pendant for testing next year.
  • The device builds on Limitless's AI pendant technology.
Research

I Want to Use AI

The author shares a personal philosophy on using AI as a tool for growth, removing grind, and enhancing life, while maintaining control, judgment, taste, and intuition to avoid dependency.

  • AI should be a tool, not a thief of attention or a crutch.
  • Use AI for growth: as a research tool and patient tutor.

GrokImage.ai — Free AI Image Generator

GrokImage.ai is a free AI image generation platform that integrates multiple advanced models including Grok, Nano Banana Pro, and Gemini. It supports text-to-image, photo editing, and AI video generation. Users get 100 free credits, no credit card required, and commercial license for all generated content.

  • Supports multiple AI models: Grok, Nano Banana Pro, and Gemini for different creative needs.
  • Free to start with 100 credits, no registration or credit card required.