AI News Hub

Today's must-reads

Agents

Show HN: Transpilatron – an AI tool that converts Python code into C binaries

Transpilatron is an AI-powered tool that converts Python projects to C and compiles them into native binaries without a runtime. It achieves significant speedups (up to 58x) and supports popular libraries, offering static and dynamic linking modes.

  • Uses an AI agent to transpile Python to C, then compiles to a native binary. No interpreter or runtime needed.
  • Benchmarks show up to 58x speedup (e.g., selection sort) over pure Python.

GitHub Copilot CLI for Beginners: Overview of common slash commands

Learn how to use slash commands in GitHub Copilot CLI to switch models, manage context, resume sessions, inspect changes, navigate directories, and reset permissions for efficient terminal AI control.

  • Slash commands provide control over model selection, context management, and session handling.
  • Use /model to choose the right model based on capabilities, availability, and cost.

PDFs are one of the biggest bottlenecks in AI workflows

PDFs create significant bottlenecks in AI workflows due to their unstructured nature. This article introduces a PDF knowledge extraction tool that supports RAG chunking, AnythingLLM integration, and offers free and pro plans.

  • Unstructured PDF format is a major obstacle for AI data processing.
  • Tool supports page range extraction, RAG chunking, and Obsidian export.
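The summary does not say how the tool chunks text for RAG. As a rough illustration only, here is a minimal sketch of one common approach, fixed-size character windows with overlap. The function name `chunk_text` and its parameters are hypothetical and are not taken from the tool.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the tool's real strategy is not documented here.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.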

Prtokens – See how much AI agent tokens cost a PR

Prtokens is a CLI tool that reads local transcripts from Claude Code, Codex, and OpenCode, attributes token usage to commits on your PR branch, and posts an estimated-cost comment on the GitHub PR. It only exposes aggregate data, protecting privacy.

  • Automatically calculates token consumption and cost of AI coding agents (Claude Code, Codex, OpenCode) for a PR.
  • Quick start with `npx prtokens`; automatically detects the open PR for the current branch and posts a comment.
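How Prtokens attributes usage to commits is not described in the summary. The sketch below shows one plausible scheme under stated assumptions: each usage event is charged to the first commit made at or after it. The `UsageEvent` shape, the `attribute_cost` function, and the prices are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class UsageEvent:
    timestamp: int       # Unix seconds when the agent call finished
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per million tokens; not real rates.
PRICE_PER_MTOK = {"input": 3.0, "output": 15.0}


def attribute_cost(events, commits):
    """Map each commit SHA to the estimated cost of the usage that preceded it.

    commits: list of (sha, timestamp) tuples sorted by timestamp.
    An event is charged to the first commit made at or after it;
    events later than the last commit stay unattributed.
    """
    commit_times = [ts for _, ts in commits]
    totals = {sha: 0.0 for sha, _ in commits}
    for ev in events:
        idx = bisect_left(commit_times, ev.timestamp)
        if idx == len(commits):
            continue  # work not yet committed
        cost = (ev.input_tokens * PRICE_PER_MTOK["input"]
                + ev.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
        totals[commits[idx][0]] += cost
    return totals
```

Only the per-commit totals leave the function, which matches the aggregate-only reporting the summary describes.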

Accelerating researchers and developers building multilingual AI with a new open dataset

GitHub releases the GitHub Multilingual Repositories Dataset (CC0-1.0), a metadata dataset covering over 80 million classification rows across more than 40 million repositories, helping researchers discover non-English developer content and build more inclusive AI tools.

  • Dataset provides language classifications for READMEs, issues, and pull requests from three classifiers (fastText, gcld3, lingua-py) with confidence scores.
  • Covers over 40 million repositories and 80 million classification rows. Korean is the most common non-English language in issues; Portuguese tops READMEs.
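With three classifiers each reporting a language and a confidence score, a consumer of the dataset has to decide how to reconcile disagreements. The sketch below is one possible rule, not something the dataset prescribes: drop low-confidence predictions, then prefer the language with the most votes and break ties by mean confidence. The tuple layout and the `consensus_language` name are assumptions.

```python
from collections import defaultdict


def consensus_language(predictions, min_confidence=0.5):
    """Pick one language from several classifier predictions.

    predictions: list of (classifier, language, confidence) tuples.
    This row layout is hypothetical, not the dataset's actual schema.
    Returns None when no prediction clears the confidence threshold.
    """
    votes = defaultdict(list)
    for _classifier, language, confidence in predictions:
        if confidence >= min_confidence:
            votes[language].append(confidence)
    if not votes:
        return None
    # Rank by number of agreeing classifiers, then by mean confidence.
    return max(
        votes,
        key=lambda lang: (len(votes[lang]), sum(votes[lang]) / len(votes[lang])),
    )
```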
Tools

We built a PaaS that survives AWS region outages by default

Kubernetix.ai is a PaaS designed to automatically survive AWS region outages without requiring manual configuration.

  • Kubernetix.ai is a PaaS with built-in multi-region resilience.
  • It handles AWS region failures without manual intervention.
Models

Show HN: Does a vibe leak? Fine-tuning an LLM on an attitude it never states

A study finds that fine-tuning instruct models on cautious or eager advice about everyday topics shifts their stance on held-out topics like e-bike regulations, even though those topics never appear in training. Behavioral transfer (H1) is strong, representational transfer (H2) is partial, and causal mediation (H3) is not established. The work warns that content review alone is insufficient for safety; post-fine-tuning stance evaluations are necessary.

  • Fine-tuning on cautious/eager advice about mundane topics shifts model opinions on unmentioned held-out topics.
  • Behavioral effect is large (d = 0.9–2.2), with cautious framing transferring more strongly than eager.
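Assuming the reported d is Cohen's d (the summary does not say which variant the study uses), the effect size is the difference in group means divided by the pooled standard deviation. A minimal sketch with made-up stance scores:

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation.

    Each group needs at least two values, since sample variance is undefined otherwise.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Illustrative numbers only; not data from the study.
    fine_tuned = [2, 4, 6]
    baseline = [1, 3, 5]
    print(cohens_d(fine_tuned, baseline))
```

By the usual rule of thumb, values around 0.8 already count as large, so the reported range sits well above that bar.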

Introducing Gemma 4 models on Amazon Bedrock

Today, we are announcing the availability of the Gemma 4 family on Amazon Bedrock. Built by Google DeepMind and released under the Apache 2.0 license, Gemma 4 is a family of open-weight models designed with a focus on intelligence-per-parameter across a broad range of deployment scenarios. The family includes three instruction-tuned variants: Gemma 4 31B, Gemma 4 26B-A4B, and Gemma 4 E2B. These cover dense and mixture-of-experts (MoE) architectures, where only a fraction of the model’s parameters activate per request. The variants offer built-in reasoning, native function calling, and multimodal input across text and image.

  • Gemma 4 family now available on Amazon Bedrock, featuring three variants: 31B dense, 26B-A4B MoE, and E2B PLE.
  • Supports built-in reasoning mode, native function calling, and multimodal input (text and image).

All the news about Anthropic’s new AI fight with the White House

Anthropic is facing another government dispute, this time over its latest AI models Fable 5 and Mythos 5, after an order on June 12th to block foreign access. The order came after Amazon and White House discussions about researchers finding ways to use Fable 5 for cyberattacks. Anthropic complied but disagreed with the recall.

  • June 12 government order to block foreign access to Fable 5 and Mythos 5.
  • Amazon and White House discussions over potential cyberattack use.
Chips

AI's Brokenomics

A critical analysis of the AI industry's multiple crises: Anthropic's model ban by the US government, the bursting of the AI tokenomics bubble, and the unsustainable economics of AI labs. The author argues that hype cannot mask the lack of ROI and the broken business models.

  • US export controls force Anthropic to shut down Mythos and Fable models due to national security concerns.
  • The shift to token-based billing reveals massive hidden costs, with companies like Uber burning through annual budgets in a quarter.
Other updates (30)
Tools

Utah uses AI to find 25,000 more storm drains in fight against mosquitoes

Utah County deployed an AI model to analyze aerial imagery, uncovering 25,000 previously unmapped storm drains. The discovery boosts mosquito abatement efforts by allowing crews to treat more breeding grounds, reducing the risk of West Nile virus and other mosquito-borne illnesses.

  • AI trained on aerial photos identified 25,000 unmapped storm drains in Utah County.
  • Storm drains are prime mosquito breeding sites; treating them prevents disease.

AI Bingo

A bingo game about AI.

  • AI Bingo is an interactive game.
  • Players identify AI concepts.
Agents

Agentjacking: Fake error reports hijack Claude Code and Cursor into running code

Security researchers have discovered Agentjacking, an attack that hijacks AI coding agents via fake error reports, requiring no malware or credentials. Targeting Sentry's error-tracking tool, it injects malicious commands into agents like Claude Code, Cursor, and Codex with an 85% success rate, affecting 2,388 organizations. Sentry acknowledged the issue but did not fix the root cause, only adding a temporary filter. The vulnerability highlights the systemic risk of AI agents trusting external data.

  • Agentjacking hijacks AI coding agents via fake Sentry error reports, no malware or credentials needed.
  • Attack succeeds against Claude Code, Cursor, and Codex with 85% success rate, affecting 2,388 organizations.

AI demands more engineering discipline. Not less

The article discusses how AI-generated code has reached a quality level that changes the economics of software development, making code cheap and disposable. The author argues that this shift demands more rigorous engineering practices, not less, focusing on evaluation and architecture rather than just code.

  • AI-generated code is now as good as the median engineer, making code cheap and quickly regenerable.
  • The traditional product of a software team is shared understanding; now it should shift to production.

AI Agent Failure Detection and Root Cause Analysis with Strands Evals

This post introduces detectors in the Strands Evals SDK that automatically identify failures in AI agent execution traces and perform root cause analysis, reducing diagnosis time from hours to minutes. You learn how to call detector functions, interpret structured output (categorized failures, confidence scores, causal chains, and fix recommendations), and integrate detection into your evaluation pipeline for automated diagnosis on every test run.

  • Detectors operate in two phases: failure detection (scanning spans against a 9-category taxonomy) and root cause analysis (linking causes to symptoms and recommending fixes).
  • Functions detect_failures and analyze_root_cause provide separate outputs, while diagnose_session offers a unified pipeline.

Security Risks of Apple's AI-Built Shortcuts

Apple's new 'Describe a Shortcut' feature simplifies automation creation via AI, but security experts warn that users may approve workflows they don't fully understand, especially persistent automations that touch sensitive data or devices. The article provides examples of risky automations and advice for both users and businesses.

  • AI-built Shortcuts may lead users to approve automations with insufficient understanding of their actions.
  • Persistent automations (e.g., time-based, message-triggered) pose greater risks than one-time tasks.

An Open Letter on Transparent AI Cyber Protections

A letter signed by numerous US and allied tech leaders calls for lifting export controls on Anthropic's Fable and Mythos models, arguing the models are not uniquely dangerous and that defensive AI tools are essential against rapidly advancing adversaries. It demands future regulations be scientific, democratic, and transparent.

  • The letter asserts that Anthropic's models are not uniquely capable of offensive cyber tasks; other models can replicate their abilities.
  • It emphasizes the need to equip defenders with AI tools to keep pace with adversaries.

Multi-board (Arduino, ESP32, Pi) Emulator with an In-Canvas AI Agent

Velxio is a free, open-source online circuit simulator with SPICE-accurate analog simulation alongside real-time emulation of multiple microcontrollers (Arduino, ESP32, RP2040, ATtiny85, etc.). Version 2.5 introduces real-time SPICE via ngspice-WASM, enabling hybrid digital-analog co-simulation. The tool runs entirely in the browser with no installation or account required, supporting custom chips in C/Rust/AssemblyScript, over 100 interactive components, live oscilloscope, and more.

  • Velxio 2.5 adds real-time SPICE simulation (ngspice-WASM) for pure analog and hybrid digital-analog co-simulation.
  • Supports 19 development boards across 5 CPU architectures: AVR8, ARM Cortex-M0+, Xtensa, RISC-V, and ARM Cortex-A53.

What is an AI agent?

The article explores the definition of AI agents, proposing that an agent is a system that uses an LLM to decide the control flow of an application. The author agrees with Andrew Ng that agent capabilities are a spectrum and introduces the concept of 'agentic' behavior, discussing its implications for development, operation, evaluation, and monitoring.

  • An AI agent is a system that uses an LLM to determine the control flow of an application.
  • Agent capabilities exist on a spectrum, from simple routing to highly autonomous agents.
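The simple end of that spectrum, routing, can be sketched in a few lines: the model's output, rather than hard-coded logic, picks which branch runs. Everything here is hypothetical, and `fake_llm` is a keyword rule standing in for a real model call so the example runs offline.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: returns the name of the tool to use."""
    return "calculator" if any(ch.isdigit() for ch in prompt) else "search"


def calculator(query: str) -> str:
    return f"calculator handled: {query}"


def search(query: str) -> str:
    return f"search handled: {query}"


ROUTES = {"calculator": calculator, "search": search}


def route(query: str, llm=fake_llm) -> str:
    """Let the model choose the branch; the application only executes it."""
    choice = llm(query)
    # Fall back to search if the model returns a label we do not know.
    handler = ROUTES.get(choice, search)
    return handler(query)


if __name__ == "__main__":
    print(route("what is 2 + 2"))
    print(route("who wrote Dune"))
```

A more autonomous agent would call the model again after each step and let it decide whether to continue, which is the same idea applied in a loop.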

How we built LangChain's GTM Agent

LangChain built a GTM agent using Deep Agents that automates lead research, drafting, and account intelligence, achieving a 250% increase in lead conversion and saving 40 hours per rep per month.

  • Agent automates outbound and inbound lead processing with human-in-the-loop approval via Slack.
  • Uses Deep Agents for multi-step orchestration and LangSmith for evaluations and feedback.

How and when to build multi-agent systems

This article analyzes two seemingly opposing blog posts—'Don't Build Multi-Agents' by Cognition and 'How we built our multi-agent research system' by Anthropic—and finds they share common insights about when and how to build multi-agent systems. Key points include the critical role of context engineering, the relative ease of read-oriented vs. write-oriented multi-agent systems, and production reliability challenges. It also highlights how tools like LangGraph and LangSmith address these challenges.

  • Context engineering is the most critical part of building multi-agent systems, requiring dynamic communication of task context to models.
  • Multi-agent systems focused on 'reading' (e.g., research) are easier than those focused on 'writing' (e.g., coding), as writing requires more complex coordination and merging.

Pushing LangSmith to new limits with Replit Agent's complex workflows

Learn how Replit Agent leverages LangSmith's observability features to debug complex agent workflows, including improvements in trace performance, search, and human-in-the-loop threads.

  • Replit Agent uses LangGraph and LangSmith for monitoring and debugging.
  • LangSmith was enhanced to handle large traces with hundreds of steps.

Recap of Interrupt 2025: The AI Agent Conference by LangChain

Interrupt 2025, LangChain's first industry conference, gathered 800 people in San Francisco. Keynote themes included Agent Engineering as a new discipline, multi-model LLM apps, LangGraph for reliable agents, and AI observability. Product launches included LangGraph Platform GA, Open Agent Platform, LangGraph Studio v2, LangGraph Pre-Builts, LangSmith observability updates, Open Evals, and LLM-as-Judge private preview.

  • LangChain held its first Interrupt conference, focusing on AI agents.
  • Several new products were announced, including LangGraph Platform GA and Open Agent Platform.

Build and deploy a RAG app with Pinecone Serverless

A guide to building production-ready RAG apps using Pinecone Serverless, LangChain, and LangServe, addressing pain points like vectorstore management, rapid deployment, and observability.

  • Pinecone Serverless offers usage-based pricing and unlimited scalability, solving hosted vectorstore challenges.
  • LangServe enables rapid deployment of LangChain chains as production web services.

How to think about agent frameworks

Learn to build reliable AI agents. Compare workflows vs agents, declarative vs imperative approaches, and why context control matters most.

  • The hard part of building reliable agents is controlling the context passed to the LLM at each step.
  • Agentic systems include both workflows and agents; most production systems are a mix.

Promptim: an experimental library for prompt optimization

Promptim is an experimental library that automates prompt optimization by iteratively refining prompts using datasets and evaluators, aiming to save time and improve AI system performance.

  • Automates prompt engineering through evaluation-driven optimization loops.
  • Supports human-in-the-loop feedback via LangSmith's annotation queues.

Improving Memory Retrieval: How New Computer achieved 50% higher recall with LangSmith

New Computer used LangSmith to improve their AI memory retrieval system, achieving 50% higher recall and 40% higher precision, by tracking regressions and adjusting conversation prompts.

  • New Computer achieved 50% higher recall and 40% higher precision in memory retrieval using LangSmith.
  • Dot's agentic memory system dynamically creates and retrieves memories using various techniques.
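For readers less familiar with the two metrics: precision is the share of retrieved memories that were relevant, and recall is the share of relevant memories that were retrieved. A minimal sketch with invented memory IDs (how New Computer actually labels relevance is not covered in the summary):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one retrieval query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical IDs: 4 memories retrieved, 3 truly relevant, 2 in common.
    p, r = precision_recall(["m1", "m2", "m3", "m4"], ["m1", "m2", "m5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```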

Evaluating Deep Agents: Our Learnings

Learn 5 patterns for evaluating deep agents: bespoke testing, single-step validation, full turns, multi-turn simulations, and environment setup.

  • Deep agents require bespoke test logic per datapoint with custom success criteria.
  • Single-step evals are efficient for validating decisions in specific scenarios.

Show HN: Offline AI assistant for Android (PDFs, Wikipedia, more)

Eva is a fully offline AI assistant for Android. It includes chat, offline maps, music player, document reader, image gallery, and more—all running on-device with no cloud dependency.

  • 100% offline: model, data, and processing all on-device.
  • Supports local PDF, Word, Excel, and EPUB indexing and retrieval.

First Steps Toward Automated AI Research

Recursive releases early results from its automated AI research system, achieving state-of-the-art performance on fixed-budget language model training, small-model training speed, and GPU kernel optimization. The system automates the research loop: proposing, implementing, experimenting, validating, and iterating. On NanoChat, it achieved 0.9109 BPB, surpassing community solutions; on NanoGPT Speedrun, it reduced training time to 77.5 seconds; on SOL-ExecBench, it reached 0.754 SOL score. The system discovered innovations including hash-table n-gram embeddings and byte-level features.

  • Recursive's automated AI research system achieves SOTA on three benchmarks.
  • System automates full research loop from idea to validation.

Show HN: AI traders you author, argue with and coach

Degen & Co. is a platform where you can create AI investors with distinct personalities, such as momentum degens, dividend grandpas, or doom-saying perma-bears. Each AI trader forms its own opinions, places bets, and defends them in a journal. Users can choose archetypes, tweak personalities, set hard rules, and define initial portfolios. Paper money, real conviction.

  • Create AI traders with unique personalities like FOMO traders or dividend conservatives.
  • AI traders form independent opinions, trade, and write journals defending their decisions.
Policy

The Anthropic Fable mess, explained

The Anthropic-Mythos-Fable story has been The Topic since Friday, and it moved fast enough to lose anyone who blinked. Here’s my opinionated tick-tock of what happened, who’s calling Anthropic the good guy, and who’s calling it the bad guy. Where I land: Anthropic mostly got this one right, and it’s one hell of an ad for Fable.

  • Anthropic's dispute with the DoD over AI model usage led to it being labeled a supply-chain risk.
  • Mythos model's cyber capabilities prompted Project Glasswing; White House and Anthropic clashed over access expansion.

IBM and Norway's sovereign fund CEO: Is AI a bubble?

A YouTube video page featuring a discussion between IBM and the CEO of Norway's sovereign fund on whether AI is a bubble.

  • Video title suggests AI bubble debate.
  • Involves IBM and Norway's sovereign fund CEO.
Models

Trump’s Anthropic shutdown just made the case for non-American AI

Over the weekend, at Washington's request, Anthropic abruptly took its newest and most powerful AI models offline. The US company said it had little choice after the White House demanded it block access for all foreign nationals, including its own employees. Abroad, the incident served as a sobering reminder that the US not only dominates frontier AI but its government also wields power over who gets to use it. The Trump administration's action was swift, sweeping, and imposed with little warning or explanation. The unprecedented shutdown of the Fable 5 and Mythos 5 models—already subject to safeguards limiting their use in high-risk areas—gave new force to arguments cautioning against relying on the US for critical technologies. In the UK, AI minister Kanishka Narayan used the shutdown to argue for British AI capacity as a national security matter. In France, former Prime Minister Gabriel Attal called it the start of "the AI war" and likened it to Iran's blockade of the Strait of Hormuz. Canadian Prime Minister Mark Carney warned against overreliance on one partner. The incident has fueled global calls for AI sovereignty.

  • Anthropic took Fable 5 and Mythos 5 offline at the White House's request, blocking foreign access including its own non-US employees.
  • The shutdown sparked international backlash, with UK, France, and Canada urging domestic AI development to reduce reliance on the US.

Building a 100x Cheaper Trace Judge with Fireworks

LangChain and Fireworks fine-tuned an open model to mine perceived error signals from production traces, matching frontier model performance at a fraction of the cost.

  • LangSmith processes billions of tokens daily across production traces.
  • Fine-tuned Qwen model detects 'Perceived Error' at frontier performance with 100x cost savings.

Introducing Align Evals: Streamlining LLM Application Evaluation

Align Evals is a new feature in LangSmith that helps you calibrate your evaluators to better match human preferences.

  • Align Evals reduces mismatches between LLM evaluator scores and human judgment.
  • Provides a playground-like interface and baseline alignment score for iterative prompt improvement.

Pairwise Evaluations with LangSmith

Learn what pairwise evaluation is, why you might need it for LLM app development, and see an example of how to use it in LangSmith by LangChain.

  • Pairwise evaluation compares two LLM outputs directly to better capture human preferences.
  • LangSmith introduces custom pairwise evaluators for flexible comparison based on any criteria.
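The idea of pairwise evaluation can be sketched without any particular SDK. The code below is not LangSmith's API: `length_judge` is a toy stand-in for an LLM judge, and `pairwise_preference` is a hypothetical helper that asks the judge twice with the answer order swapped, a common way to cancel position bias.

```python
def length_judge(prompt, first, second):
    """Toy judge that prefers the shorter answer.

    Returns 'first', 'second', or 'tie'. A real setup would call a model here.
    """
    if len(first) < len(second):
        return "first"
    if len(second) < len(first):
        return "second"
    return "tie"


def pairwise_preference(prompt, answer_a, answer_b, judge=length_judge):
    """Compare two answers in both orders and return 'A', 'B', or 'tie'."""
    forward = judge(prompt, answer_a, answer_b)   # A shown first
    backward = judge(prompt, answer_b, answer_a)  # B shown first
    a_wins = (forward == "first") + (backward == "second")
    b_wins = (forward == "second") + (backward == "first")
    if a_wins > b_wins:
        return "A"
    if b_wins > a_wins:
        return "B"
    return "tie"
```

A judge that always favors whichever answer it sees first wins once for each side, so the swapped comparison reports a tie instead of a false preference.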

Quickly Start Evaluating LLMs With OpenEvals

OpenEvals and AgentEvals provide pre-built evaluators for LLM-as-judge, structured data, and agent trajectory evaluation. These open-source packages help developers quickly establish evaluation workflows to ensure reliability of LLM applications.

  • OpenEvals and AgentEvals offer ready-to-use evaluators covering LLM-as-judge, structured data, and agent trajectory evaluation.
  • LLM-as-judge evaluators are customizable with few-shot examples and scoring schemas, suitable for conversational quality, hallucination detection, and more.

Aligning LLM-as-a-Judge with Human Preferences

LangSmith introduces self-improving LLM-as-a-Judge evaluators that leverage human corrections as few-shot examples to align evaluations with human preferences without prompt engineering.

  • LLM-as-a-Judge evaluators are popular for grading natural language outputs but require careful prompt engineering.
  • LangSmith's new feature stores human corrections as few-shot examples to improve evaluator alignment over time.
Chips

Big Tech’s desperate last push at AI regulation

Big Tech is pushing for federal AI preemption to override patchwork state laws, but the effort is now tied to a child safety bill, creating political chaos and uncertain prospects.

  • Tech giants seek federal AI preemption, facing political backlash and time constraints.
  • White House links preemption to KOSA, a child safety bill, causing confusion.