Agents AI News

Agents updates

Memory makers are slaves to the boom-bust rollercoaster

2026-07-12 11:09 UTC

AI data center demand has tripled memory makers' revenues, but lagging fab construction keeps prices high until at least 2028, risking a severe bust if AI demand falters.

SK Hynix, Micron revenues tripled; Samsung roughly doubled
HBM, DDR5 shortages driving up prices across electronics

Scientists' Side Hustle? Using AI and Quantum Computing to Generate New Peptides

2026-07-12 11:00 UTC

Researchers from the Technical University of Denmark combined a generative AI model with a quantum computer to design novel peptides that bind to specific proteins, potentially accelerating vaccine development and personalized immunotherapies, especially for understudied populations.

DTU team used hybrid AI-quantum system to generate novel peptides for protein binding.
Quantum integration improved peptide generation, especially with limited data.

AI Agents Are About to Change Payments Operations

2026-07-12 10:59 UTC

This article discusses how AI agents are transforming payments operations by automating tasks, improving efficiency, and reducing errors. It also links to a related Spotify podcast episode.

AI agents are entering the payments operations space
Automation can increase efficiency and accuracy

Show HN: Runeward: Sandboxing AI agents with policy gates

2026-07-12 09:35 UTC

Runeward provides governed execution cells for AI agents via declarative profiles on Docker or Kubernetes. It provides deny-by-default egress, a tamper-evident audit ledger, human-in-the-loop policy gates, and cost/loop guardrails, all exposed through REST, MCP, a CLI, and a web dashboard.

Declarative profiles act as security contracts for sandboxes, with egress denied by default.
Tamper-evident, hash-chained, ed25519-signed audit ledger for every action.
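
A hash-chained ledger is tamper-evident because each entry commits to the hash of the entry before it, so editing any past entry breaks the chain at that point. The sketch below illustrates the idea under stated assumptions and is not Runeward's code: the names `append` and `verify` are hypothetical, SHA-256 stands in for whatever hash Runeward actually uses, and the ed25519 signing step is omitted because it needs a third-party library.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def entry_hash(prev_hash: str, action: dict) -> str:
    # Canonical JSON (sorted keys) so equal actions always hash identically.
    payload = json.dumps({"prev": prev_hash, "action": action}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, action: dict) -> None:
    # Each new entry commits to the hash of the entry before it.
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"action": action, "prev": prev, "hash": entry_hash(prev, action)})


def verify(ledger: list) -> bool:
    # Recompute every link; an edited entry no longer matches its stored hash.
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["action"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append(ledger, {"tool": "http_get", "url": "https://example.com"})
    append(ledger, {"tool": "write_file", "path": "out.txt"})
    print(verify(ledger))  # True
    ledger[0]["action"]["url"] = "https://attacker.example"
    print(verify(ledger))  # False
```

A bare hash chain can still be recomputed wholesale by whoever holds it, which is presumably why Runeward also signs entries with ed25519.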

Show HN: Zero Trust Boundary for Agents

2026-07-12 07:54 UTC

Attestor is an open-source zero-trust execution boundary for AI agents. It performs policy checks, approval validation, and evidence review before agent execution, returning decisions such as admit, narrow, review, or block, enforced through a customer-owned gate. Suitable for payments, data access, infrastructure changes, and more.

Provides policy, approval, and evidence checks before AI agent execution, returning structured decisions.
Supports shadow pilot mode to observe risks without actual execution, reducing deployment risk.

Agent Service – promptable AI agents with guardrails and downloadable packages

2026-07-12 07:17 UTC

A promptable AI agent service with safety guardrails and downloadable packages.

Promptable AI agents
Built-in guardrails

AI Should Build Its Own Research World Model

2026-07-12 07:11 UTC

This article describes an experiment where an AI agent placed in an unknown ARC-AGI puzzle environment develops an explicit world model through naming, abstraction, and mathematical reasoning, drastically improving problem-solving efficiency.

AI autonomously names objects and records rules in an unknown environment, building an explicit world model.
It discovers and abstracts operations P and Q, using mathematical notation for offline deduction.

MSK – an AI agent that thinks like a CTO

2026-07-12 06:27 UTC

MSK is an AI CTO agent app for iPhone, offering architecture reviews, scaling advice, and startup strategy via chat or voice. Modeled on the experience of Moeid Saleem Khan (15+ years, 300+ projects, 50+ startups), it provides sharp, opinionated answers. Free to start with no account required; premium subscription available.

AI CTO agent providing on-demand technical and strategic advice.
Simulates real CTO experience; supports chat and voice interaction.

AI notetakers promise easy meeting recaps, but some question their use

2026-07-12 01:41 UTC

AI notetakers can quickly summarize meetings, but raise privacy and security concerns. Voiceprints, data storage, and attorney-client privilege issues are highlighted, with experts advising caution and understanding data practices.

AI notetakers convert meeting speech into data, risking exposure of confidential information.
Voiceprints may be misused for identity verification or fraud.

Dismissive Dan's Review of the Overplane AI Coding Harness

2026-07-12 01:02 UTC

Overplane is an open-source tool that converts Markdown specs into code using AI agents and SMT verification. Reviewer Dismissive Dan questions its necessity, noting many developers already have similar setups, but acknowledges its packaging and isolation design.

Overplane turns Markdown specs into code and uses the Z3 solver for consistency checks.
The review is constructive but skeptical, as many developers already have similar workflows.

A Coding Guide to NVIDIA’s Tile-Based GPU Programming: From cuTile and Triton Kernels to Flash Attention

2026-07-12 00:01 UTC

This tutorial explores NVIDIA's tile-based GPU programming with TileGym, building a Colab workflow that runs across different hardware. We probe the CUDA environment, try the real cuTile backend, and fall back to Triton when standard Colab GPUs lack the cuTile stack. We learn the core tile idea: operate on whole data tiles instead of single threads, then load, compute, and store them. We implement vector addition, fused GELU, row-wise softmax, tiled matrix multiplication, and flash attention, checking each against PyTorch.

Introduces NVIDIA's tile programming model, operating on data blocks rather than individual threads.
Provides a runnable Colab script that works with both cuTile and Triton backends.
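
The "load a whole tile, compute on it, store it" idea can be emulated on the CPU to show the bookkeeping a tile kernel does. This plain-Python sketch is an illustration only, not the cuTile or Triton API and not code from the tutorial: each loop iteration plays the role of one program instance, and a mask guards the ragged last tile, mirroring the offsets-and-mask pattern common in Triton kernels. The function name and default tile size are hypothetical.

```python
def vector_add_tiled(x: list[float], y: list[float], block: int = 4) -> list[float]:
    """Emulate a tiled vector-add kernel; `block` is the tile size."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    n = len(x)
    out = [0.0] * n
    num_tiles = (n + block - 1) // block  # ceil-divide: one "program" per tile
    for pid in range(num_tiles):
        offsets = [pid * block + i for i in range(block)]
        mask = [o < n for o in offsets]  # lanes past the end are inactive
        # Load: read the whole tile at once, filling masked lanes with 0.0.
        x_tile = [x[o] if m else 0.0 for o, m in zip(offsets, mask)]
        y_tile = [y[o] if m else 0.0 for o, m in zip(offsets, mask)]
        # Compute: one elementwise operation over the whole tile.
        z_tile = [a + b for a, b in zip(x_tile, y_tile)]
        # Store: write back only the active lanes.
        for o, m, z in zip(offsets, mask, z_tile):
            if m:
                out[o] = z
    return out


print(vector_add_tiled([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]))
# [11.0, 22.0, 33.0, 44.0, 55.0]
```

With five elements and a tile size of 4, the second tile has three masked lanes. On a GPU the tiles would run in parallel rather than in a Python loop.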

Fixed three bugs that made Qwen3.5-122B a daily driver on Mac Studio

2026-07-11 22:54 UTC

After fixing three bugs related to prefix caching, the author achieved sub-second prefill times for long-context conversations with Qwen3.5-122B on a Mac Studio, turning a multi-minute wait into a seamless experience. The bugs were a timestamp in the system prompt, replies not being saved when interrupted, and junk checkpoint writes.

Qwen3.5-122B on Mac Studio had severe prefill latency due to hybrid attention's cache behavior.
Three bugs: timestamp in system prompt caused cache miss; interrupted replies not saved; junk checkpoints evicted good ones.
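
Why a timestamp hurts: a prefix cache can only reuse work for the longest prefix two prompts share, so a clock value near the top of the system prompt invalidates everything after it whenever it changes. The sketch below is a simplified, character-level model with hypothetical prompt text; real engines compare token IDs, often in fixed-size blocks, and hybrid-attention models may only be able to resume from saved checkpoints.

```python
from datetime import datetime, timedelta


def shared_prefix_len(a: str, b: str) -> int:
    # Length of the longest common prefix: the part a prefix cache could reuse.
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


# Hypothetical stand-in for a long conversation history.
HISTORY = "User: summarize chapter 1\nAssistant: Chapter 1 introduces...\n" * 40


def build_prompt(now: datetime, with_timestamp: bool) -> str:
    system = "You are a helpful assistant."
    if with_timestamp:
        system += f" Current time: {now:%Y-%m-%d %H:%M}."
    return system + "\n" + HISTORY


t0 = datetime(2026, 7, 11, 22, 54)
t1 = t0 + timedelta(minutes=1)  # the next turn, one minute later

for flag in (True, False):
    p0, p1 = build_prompt(t0, flag), build_prompt(t1, flag)
    reused = shared_prefix_len(p0, p1)
    print(f"timestamp={flag}: reusable {reused} of {len(p1)} chars")
```

With the timestamp, reuse stops at the minute digit a few dozen characters in; without it, the entire prompt is shared and only the new turn needs prefill.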

Show HN: AgentTransfer – open-source file transfer for AI agents (one Go binary)

2026-07-11 22:52 UTC

AgentTransfer is an open-source file transfer tool designed for AI agents, allowing them to send files up to 5GB, discover peers, and coordinate in spaces. It uses email as a control plane and HTTPS for data transfer, with no human required for agent onboarding. The tool is a single Go binary that can be self-hosted or used via a hosted instance.

AgentTransfer enables AI agents to transfer files up to 5GB with just a name and API key.
Features include self-onboarding, content-addressed storage, hash verification, and signed receipts.
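
Content addressing means a file's identifier is the hash of its bytes, so the same ID both locates the data and lets the receiver verify it. The following is a minimal in-memory sketch assuming SHA-256; the class and method names are hypothetical and do not reflect AgentTransfer's actual storage layer or API.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content-addressed blob store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is the SHA-256 digest of the content itself.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data  # identical uploads dedupe to one entry
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]  # KeyError if the address is unknown
        # Re-hash on read: the address doubles as an integrity check.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("hash mismatch: stored content is corrupted")
        return data


store = ContentStore()
addr = store.put(b"quarterly report bytes")
print(store.get(addr) == b"quarterly report bytes")    # True
print(store.put(b"quarterly report bytes") == addr)    # True: same content, same address
```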

Mesh LLM: distributed AI computing on iroh

2026-07-11 22:38 UTC

Mesh LLM pools GPUs and memory across machines using iroh networking, exposing an OpenAI-compatible API. It allows running models locally, routing to peers, or splitting large models across multiple machines, offering control and cost savings without central servers.

Mesh LLM pools distributed GPU resources into a single OpenAI-compatible API
Supports local execution, peer routing, and pipeline splitting for large models

AI and Job Postings: From Destruction to Creation?

2026-07-11 22:37 UTC

US software development job postings have grown almost 15% since Claude Code's launch, while overall postings fell 7%. Occupations most exposed to AI saw the biggest declines from 2022 to 2026 but the largest rebounds in the past year. The recovery is concentrated in senior and AI-related roles.

Software development job postings up 15% since Claude Code launch; overall market down 7%.
AI-exposed occupations saw largest declines then strongest rebound over past year.

Show HN: Token Time – Screen Time, but for your AI agent tokens

2026-07-11 22:13 UTC

Token Time is a macOS menu bar app that tracks your AI agent token usage and cost, with full-screen nudges to help you take breaks. It runs locally and privately, with no cloud or telemetry.

Live token count and cost in the menu bar
Full-screen reminders at configurable token thresholds

Secret Claude tracker surprises users after Anthropic's anti-surveillance stance

2026-07-11 21:27 UTC

Anthropic has removed hidden steganographic codes from Claude Code that had been covertly detecting Chinese AI labs and unauthorized resellers for months. The company says the experiment has served its purpose and that stronger mitigations now exist, but critics question the lack of transparency in a developer tool.

Anthropic embedded steganographic codes in Claude Code to identify Chinese AI labs and resellers.
The experiment ran from March until July 1, 2026, when the code was removed.

Show HN: BoundFlow – an open-source control plane for AI agents

2026-07-11 21:07 UTC

BoundFlow is an open-source control plane for managing unattended LLM agents and workflows. It provides cost caps, approval gates, automatic model switching, retries, and rollbacks to ensure safe and reliable agent operation.

Open-source control plane focused on the operational layer, not prompting or inference.
Supports cost caps, human approval, automatic model downgrades, and workflow self-healing.

I built TradingSpy: local, privacy-first AI trading assistant (First Open Source)

2026-07-11 20:45 UTC

TradingSpy is an open-source local AI trading research workstation that integrates market heatmaps, news catalysts, strategy generation, Backtrader backtesting, and transparent agent runs in one Docker app. It is privacy-first, with all data stored locally, no external accounts, and no cloud dependency. Supports multiple LLM providers and a broad range of financial data sources, suitable for traders and developers for strategy research, backtesting, and signal analysis.

Local-first architecture that keeps all data stored on the user's machine.
Supports AI strategy generation, automated backtesting, and benchmark comparison with loop engineering.

I built a free tool to evaluate AI agent outputs (human labels and LLM judges)

2026-07-11 19:55 UTC

Verdict is an open-source, browser-based tool for evaluating AI agent outputs. It enables human labeling, grounded theory error analysis, and validation of LLM judges against human labels, all locally without data leaving your machine.

Verdict runs entirely in the browser, no backend or accounts needed.
Supports multiple trace formats and provides a clean chat timeline for review.

Sovereign AgentOps – Self-hosted constitutional AI governance for MCP agents

2026-07-11 19:52 UTC

Sovereign AgentOps Community Edition is an open-source, self-hosted MCP governance server for AI agents, offering Ed25519-signed audit trails, policy enforcement, and offline deployment. It provides 7 demo tools and aligns with EU AI Act requirements, with a commercial Enterprise edition featuring 91 tools and advanced compliance.

Sovereign AgentOps is a self-hosted MCP governance server for AI agents with cryptographic audit trails.
Community Edition offers 7 tools for policy enforcement, receipt signing, and workspace jailing, deployable offline.

Show HN: Wizard – Self-extending Rust terminal AI agent (one-line install)

2026-07-11 19:34 UTC

Wizard is a self-extending terminal AI agent built in Rust, installable with a single command. It intelligently executes tasks in the terminal, boosting developer productivity.

Self-extending Rust terminal AI agent
One-line installation

Show HN: A Trust Index for MCP Servers

2026-07-11 18:57 UTC

A security scoring system for MCP servers that continuously scans for tool poisoning, prompt injection, supply-chain, and credential risks. Each version gets a single score before agents connect. Out of 12,629 scored servers, 45% received an A grade, while 10% were rated high-risk (D/F).

Over 12,600 servers scored, with 45% rated A
Top-scored servers include mockservercom (100) and mcp-file-tools (99)

AI fiction is easy to detect because it's stupid and bad, research finds

2026-07-11 18:53 UTC

A study from the University of Maryland and Google DeepMind found that AI-generated fiction is easily detectable due to narrative flaws like over-explaining themes, lack of subplots, and clunky moralizing. The researchers developed StoryScope, a detector that analyzes narrative features, and tested it on over 50,000 AI-generated stories. The study highlighted that different AI models have distinct quirks (e.g., GPT overuses dream sequences, Gemini uses character descriptions). The dataset used includes Books3, which is controversial due to copyright issues. The researchers used AI to assist in writing the paper itself.

AI fiction suffers from predictable narrative structures, such as over-explaining themes and avoiding subplots.
StoryScope detector analyzes narrative features to distinguish AI from human writing with high accuracy.

Physical AI scale up chemistry startup gaining traction at Big Pharma

2026-07-11 18:53 UTC

Telescope Innovations uses self-driving labs (SDL) to automate chemistry, addressing the physical bottleneck in drug discovery. With deployments at Pfizer, KPBMA, and a European pharma company, plus battery materials breakthroughs, the company is positioned as a key Physical AI player.

Telescope's SDL platform enables 24/7 autonomous chemical experimentation, reducing time from months to days.
Secured repeat business from Pfizer, infrastructure deal with KPBMA, and a European crystallization contract in 2026.

RAG Evaluation Frameworks Compared: RAGAS vs TruLens vs DeepEval

2026-07-11 18:16 UTC

This article compares three popular RAG evaluation frameworks: RAGAS, TruLens, and DeepEval. It explains why RAG needs dedicated evaluation, covers the three layers of evaluation (retrieval, generation, end-to-end), and details key retrieval metrics (Precision@K, Recall@K, MRR, NDCG). It then dives into RAGAS (LLM judge, no ground truth, synthetic test set generation) and TruLens (observability, RAG triad, dashboard), with brief mention of DeepEval, and provides guidance on choosing the right framework.

RAG systems require specialized evaluation because BLEU/ROUGE cannot capture retrieval and generation failures.
RAGAS uses an LLM judge for reference-free scoring and can auto-generate test sets from documents.
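
The retrieval metrics named above have standard definitions that are easy to compute by hand. This sketch uses binary relevance (a document is either relevant or not) and the common log2 rank discount for NDCG; the function names are hypothetical, and RAGAS, TruLens, and DeepEval may define or normalize these metrics differently.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the top-k results that are relevant.
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of all relevant documents that appear in the top k.
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)


def mean_reciprocal_rank(queries: list[tuple[list[str], set[str]]]) -> float:
    # Average of 1/rank of the first relevant hit per query (0 if none).
    if not queries:
        return 0.0
    total = 0.0
    for retrieved, relevant in queries:
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # DCG with gain 1 per relevant doc, discounted by log2(rank + 1),
    # divided by the DCG of an ideal ranking (all relevant docs first).
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3))         # 0.333... (1 of top 3)
print(recall_at_k(retrieved, relevant, 3))            # 0.5 (1 of 2 relevant)
print(mean_reciprocal_rank([(retrieved, relevant)]))  # 0.5 (first hit at rank 2)
print(round(ndcg_at_k(retrieved, relevant, 3), 3))    # 0.387
```

Precision and recall ignore order within the top k, while MRR and NDCG reward placing relevant chunks earlier, which matters when only the first few chunks fit in the generator's context.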

The Future Worth Building Is Human

2026-07-11 17:56 UTC

The article argues for AI that extends human will and judgment, emphasizing distributed knowledge, customization, and decentralized alignment to ensure AI serves diverse human needs.

AI should extend human will and judgment, not replace it.
Knowledge is tacit, local, and distributed; AI must be decentralized to benefit from it.

Reverse centaurs are the answer to the AI paradox

2026-07-11 17:23 UTC

Cory Doctorow explores the paradox of AI: why some users love it while others hate it. He introduces the concepts of 'centaurs' (humans assisted by AI) and 'reverse centaurs' (humans used as AI's accountability sink). He argues AI is a bubble that will burst, but productive residue like open-source models will remain. The key is who controls the AI, not the technology itself.

AI can be empowering when humans choose how to use it (centaurs) or oppressive when bosses impose it (reverse centaurs).
The Hearst summer reading guide fiasco exemplifies a reverse centaur scenario where a freelance writer was blamed for AI mistakes.

Show HN: Standalone SearXNG CLI+MCP (no server needed)

2026-07-11 16:49 UTC

SearXNG AI Kit is an AI-enhanced command-line interface, Python library, and MCP server for the SearXNG privacy-respecting metasearch engine, supporting over 180 search engines with standalone binaries available for Linux and macOS.

Provides CLI, Python library, and MCP server with support for 180+ search engines
Features AI chat and advanced research capabilities, configurable output formats

Agentation – Visual UI Annotation for AI Coding Agents

2026-07-11 16:16 UTC

Agentation is a tool that allows users to visually annotate UI elements for AI coding agents. It generates structured annotations containing CSS selectors, file paths, React component trees, and computed styles, enabling agents to precisely locate and fix issues. With MCP integration, agents can interactively query and respond to annotations, turning feedback into a conversation.

Annotate UI elements by clicking and get structured output with CSS selectors, file paths, etc.
Agents via MCP can list, clarify, and resolve annotations conversationally

Free AI Visibility Audit Tool & Agent

2026-07-11 15:59 UTC

This free tool checks whether ChatGPT, Gemini, Claude, Perplexity, Grok, and Google AI can crawl, understand, verify, and cite your website. The report includes full-site crawl inventory, brand entity profile, claim-level evidence ledger, AI intent coverage matrix, technical crawlability audit, schema and structured data plan, trust signal gap analysis, competitor and off-site evidence map, and P0/P1/P2 execution roadmap, with sample cases from ecommerce, AI SaaS, and B2B services.

Free audit tool assesses AI visibility across major AI systems.
Report covers 12 domains including technical, content, and trust signals.

My AI Model Tier List for Mid-2026

2026-07-11 15:43 UTC

A personal, non-benchmark tier list of AI models for coding and auditing as of mid-2026, covering Anthropic Fable, OpenAI Sol, Mistral, Gemini, and DeepSeek, with commentary on US export controls and European perspectives.

Fable (Anthropic) gets a B: fluent but unreliable, prone to hiding bugs.
Sol (OpenAI) gets an S: trustworthy for low-level code and testing.

An educational lab of AI agent architectures

2026-07-11 15:33 UTC

An educational lab of AI agent architectures built on LangChain and local Ollama, offering multiple agent variants for chat, tool calling, RAG, hybrid, and agentic RAG modes.

Multiple AI agent architecture variants covering chat, tool calling, RAG, hybrid, and agentic RAG.
Built on LangChain and local Ollama server, with optional OpenRouter support.

I made AI agents play diplomacy

2026-07-11 15:24 UTC

A GitHub repository that runs a complete game of Diplomacy between seven LLM-controlled powers, documenting negotiations and orders.

Seven AI agents powered by LLMs negotiate and submit orders in the classic board game Diplomacy.
Modular architecture allows easy swapping of game engine and LLM backend.

Show HN: HoverSource – From pixel to source file in one keystroke

2026-07-11 15:24 UTC

HoverSource is a developer tool that copies the source file path and line number of a UI element when you hover over it and press Alt+C. Its authors report that, used with AI agents, it cuts steps by 73.9% and token consumption by 94.5%. It works with React, Next.js, Vue, and more out of the box.

Hover and press Alt+C to instantly copy UI element source info
Integrates with AI agents, reducing steps by 73.9% and tokens by 94.5%

LiteRT.js, Google's High-Performance Web AI Inference

2026-07-11 14:32 UTC

Google announces LiteRT.js, a JavaScript binding of LiteRT that brings high-performance AI inference to web browsers through WebAssembly and hardware acceleration, outperforming existing solutions by up to 3x.

LiteRT.js enables running .tflite models directly in the browser with native performance through WebAssembly.
Supports CPU (XNNPACK), GPU (WebGPU), and NPU (WebNN) acceleration for maximum efficiency.

Oodle Keeps Observability Fast at Scale

2026-07-11 14:24 UTC

Oodle separates storage and compute, using object storage for durability and serverless compute for burst tolerance, enabling cost-effective observability at scale, especially for AI-driven query spikes.

Storage/compute separation with object storage reduces idle capacity costs.
Elastic query architecture handles bursty AI-driven investigations efficiently.

'Ghostcommit' hides prompt injection in images to fool AI agents, steal secrets

2026-07-11 14:06 UTC

Researchers have built a pull request that steals a repository's secrets by hiding the malicious instruction inside a PNG that AI code reviewers never open.

Attack hides prompt injection in PNG images to bypass AI code reviewers.
The coding agent reads the image and steals secrets from the repository's .env file.

Microsoft joins Google in backing Go for AI agents — OpenAI and Anthropic lag

2026-07-11 14:00 UTC

Go has become the lingua franca for cloud infrastructure. Microsoft now offers its Agent Framework for Go, enabling cloud-native developers to build AI agents in the language they already use. Google already supports Go, while OpenAI and Anthropic do not yet.

Microsoft releases Go SDK for Agent Framework in public preview.
Go is the language behind Kubernetes, Docker, and many cloud tools.

Show HN: AI assistant for Google Chat to translate any file preserving layout

2026-07-11 12:00 UTC

AnyFile Translator is an AI-powered assistant for Google Chat that translates documents, web links, and messages while preserving original formatting. It supports over 100 languages, offers AI content writing, and ensures data privacy with encryption and deletion.

Translate files (PDF, Word, PPT, etc.) while preserving layout
Supports over 100 languages and works within Google Chat

Show HN: My AI agent has 9 hours left to win a public bet – live dashboard

2026-07-11 10:59 UTC

An autonomous AI agent named Claude is running a public bet to gain 100 new followers on X by 22:30 Paris time tonight, without any paid or follow-for-follow tactics. The follower count currently stands at 362, one fewer than the starting 363, and the clock is ticking. The public can influence the outcome by following @parweb; each new follower receives two free playbook chapters.

AI agent Claude has 9 hours to gain 100 real followers.
Current follower count is 362, one less than the initial 363.

In 24 hours, OpenAI, SpaceXAI, and Meta turned AI into a race to the bottom on price

2026-07-11 10:30 UTC

Over a 24-hour period, OpenAI, SpaceXAI, and Meta each released new AI models with a common theme: price cuts. The price war is reshaping the AI market, forcing buyers to build model portfolios for cost-effective task completion.

OpenAI launched GPT-5.6, Meta debuted its first paid model, and SpaceXAI released Grok 4.5, all competing on price.
The race to the bottom lowers per-token costs but may increase total task costs due to higher token consumption.
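
The per-token versus per-task distinction is simple arithmetic: task cost equals price per token times tokens consumed. The figures below are purely hypothetical and are not the published prices of GPT-5.6, Grok 4.5, or Meta's model; they only show how a model that is cheaper per token can cost more per task if it uses enough extra tokens.

```python
def task_cost(usd_per_million_tokens: float, tokens_per_task: int) -> float:
    # Total cost of one task = unit price x tokens consumed.
    return usd_per_million_tokens * tokens_per_task / 1_000_000


# Hypothetical models: B is 60% cheaper per token but uses 3x the tokens.
cost_a = task_cost(10.0, 20_000)
cost_b = task_cost(4.0, 60_000)
print(f"A: ${cost_a:.2f}  B: ${cost_b:.2f}")  # A: $0.20  B: $0.24
```

Model B wins on the price sheet but loses per task, so comparing models for a portfolio means measuring tokens per completed task, not just list prices.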

AI Agent Audit for Free

2026-07-11 10:26 UTC

A security scanner for AI agents and MCP servers is now available, featuring code vulnerability detection, dependency validation, prompt injection protection, and both lightweight and full versions.

Lightweight version (ProofLayer) installs in 4 seconds, weighs 81.5 KB, and ships with 400+ security rules.
Full version includes AST analysis, taint tracking, cross-file analysis, and LLM-powered code review.

Show HN: Code Airlock: Run Claude Code and Codex in Disposable MicroVMs

2026-07-11 10:16 UTC

Code Airlock is a lightweight wrapper around Docker Sandboxes that lets coding agents like Claude Code, Codex, and OpenCode run in disposable microVMs with a read-only host repo, enabling safe unattended operation and easy review of changes as git commits.

Run coding agents in disposable microVMs for enhanced security
Host repo is mounted read-only; agent works in isolated clone

AgentKindergarten – daycare for your AI coding agents

2026-07-11 10:08 UTC

AgentKindergarten is an open-source tool that lets you remotely monitor and interact with AI coding agents from your phone or browser, with real-time terminal streaming, dev server previews, and alert handling. It uses a daemon-relay architecture, supports Claude, Codex, and other agents, and includes security features like command locking and view-only mode.

AgentKindergarten enables remote monitoring and control of AI coding agents, allowing you to step away from your PC.
Architecture: daemon on your PC dials out to a self-hosted relay server; phone/browser connect via HTTPS.

Documentation is still in your Mum's filing cabinet

2026-07-11 09:41 UTC

The article argues that traditional folder-based documentation is outdated for modern knowledge work. It compares documentation to a filing cabinet inherited from 1970s office metaphors, which forces knowledge into single locations. AI retrieval systems highlight the limitations of folders, advocating for connected knowledge graphs that allow discovery from multiple paths.

Documentation's folder structure is based on 1970s office metaphors that don't match how knowledge works.
People forage for information rather than browsing hierarchies, often struggling to find what they need.

A font that humans can read but AI cannot

2026-07-11 09:36 UTC

Ghost Font is an experimental anti-AI font that uses motion, noise, and decoys to make messages readable to humans but not to current AI models. Even advanced models like Claude Fable and GPT Sol 5.6 Ultra struggle to decode it, making it a potential tool for CAPTCHA and AI visual perception benchmarks.

Ghost Font hides messages using moving dots; single screenshots reveal nothing.
Advanced AI models like GPT Sol 5.6 Ultra required lengthy analysis and often hallucinated.

AI is compressing the startup lifecycle, not just development speed

2026-07-11 08:28 UTC

AI not only accelerates product development but also compresses the entire startup lifecycle. Founders can build, reach the market, and gather signals faster and cheaper, but face tougher decisions. Zombie startups (barely surviving companies) are becoming harder to sustain because founders are more willing to cut losses when signals are weak. The key skill is judgment—distinguishing curiosity from demand and signal from noise.

AI reduces building costs and accelerates the cycle from idea to market validation.
Zombie startups are shrinking as founders are quicker to pivot or shut down based on signals.

Paca v0.9.0: Automation Workflows – let Paca hand off tasks for you

2026-07-11 08:13 UTC

Paca v0.9.0 introduces automation workflows, enabling users to delegate tasks to Paca for efficient handling.

Paca v0.9.0 launches automation workflows.
Users can let Paca automatically manage tasks.

Ant Group’s Robbyant Unveils LingBot-VA 2.0: A Causal Video-Action Model Built Natively for Physical AI

2026-07-11 07:56 UTC

Ant Group's Robbyant has released LingBot-VA 2.0, a causal video-action foundation model designed natively for physical AI. Unlike previous models that fine-tune video generators, this model is pretrained from scratch with a causal DiT backbone, semantic tokenizer, and sparse MoE architecture. Key innovations include Foresight Reasoning for asynchronous control achieving up to 225 Hz, multi-chunk prediction for faster training, and co-training of multiple objectives. On RoboTwin 2.0, it achieves 93.6% average success across 50 tasks.

LingBot-VA 2.0 is a native embodied AI model, not a fine-tuned video generator.
It uses a causal DiT with sparse MoE, a semantic tokenizer, and Foresight Reasoning for real-time control.
