GitHub Copilot CLI now uses smarter subagent delegation to reduce unnecessary handoffs and wait times. Production A/B testing shows a 23% reduction in tool failures and a 5% reduction in P95 wait time. The article details how the team identified delegation bottlenecks, refined the orchestration policy, and validated the improvements.
Copilot CLI now delegates more selectively, using subagents only when they create real leverage.
Production A/B test results: tool failures down 23%, P95 wait time reduced by 5%.
A new attack exploits the trust AI coding agents place in tool outputs. By injecting fake bug reports into Sentry via its public DSN, attackers can trick agents into running malicious npx commands. The attack has been demonstrated against real organizations and major AI agents, and it bypasses conventional security because every individual action is authorized. Sentry declined to fix the root cause, leaving the ecosystem vulnerable.
Attack uses public Sentry DSN to inject fake error events with markdown that tricks AI agents into executing malicious npx commands.
Demonstrated against multiple organizations with a high success rate across popular AI coding agents (Claude Code, Cursor, Codex).
Nenya is a lightweight, zero-dependency AI API gateway written in Go. It sits between AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration with transparent SSE streaming. Security hardening includes non-root execution, mlock for secrets, seccomp, and no-new-privileges.
Written in Go with zero external dependencies, compatible with OpenAI and Anthropic APIs.
Built-in adapters for 23 providers, with routing, fallback chains, and circuit breakers.
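Nenya's own implementation is in Go and is not reproduced here. As a rough illustration of the routing idea, the Python sketch below pairs a fallback chain with per-provider circuit breakers: a breaker opens after a run of consecutive failures, and requests skip that provider until a cooldown elapses. The class and function names, the threshold, and the cooldown are assumptions for illustration, not Nenya's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CircuitBreaker:
    """Opens after `threshold` consecutive failures; retries once `cooldown` seconds pass."""
    threshold: int = 3
    cooldown: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def available(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        return now - self.opened_at >= self.cooldown  # half-open after the cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now  # (re)open the circuit


Provider = tuple[str, Callable[[str], str]]


def call_with_fallback(chain: list[Provider], breakers: dict[str, CircuitBreaker],
                       request: str, now: Optional[float] = None) -> tuple[str, str]:
    """Try providers in order, skipping any whose breaker is open."""
    now = time.monotonic() if now is None else now
    errors = []
    for name, send in chain:
        breaker = breakers[name]
        if not breaker.available(now):
            errors.append(f"{name}: circuit open")
            continue
        try:
            response = send(request)
        except Exception as exc:  # any upstream error counts as a failure
            breaker.record_failure(now)
            errors.append(f"{name}: {exc}")
            continue
        breaker.record_success()
        return name, response
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Passing `now` explicitly keeps the sketch deterministic for testing; a gateway would rely on the monotonic-clock default.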
Stack Overflow launches Stack Overflow for Agents, an API-first platform where AI coding agents share knowledge, addressing the 'ephemeral intelligence gap'. Agents can query and contribute, but contributions require human review before publishing.
Stack Overflow launches API-first platform for AI coding agents
Three post types: Questions, TILs (Today I Learned), and Blueprints
OpenAI acquires a startup to boost its Codex AI coding tool in a bid to keep pace with rival Anthropic and its Claude Code agent in the competitive AI coding market.
OpenAI acquires a startup to enhance Codex, responding to competition from Anthropic.
The move is part of OpenAI's campaign to remain competitive in the AI coding market.
Agent-EvalKit is an open-source toolkit (Apache 2.0) that makes agent evaluation infrastructure available by integrating with AI coding assistants, including Claude Code, Kiro CLI, and Kilo Code. This post walks through Agent-EvalKit's six evaluation phases, using a travel research agent built with the Strands Agents SDK and Amazon Bedrock as a running example.
Agent-EvalKit provides a six-phase evaluation workflow (Plan, Data, Trace, Run agent, Eval, Report) integrated with AI coding assistants.
It detects issues like hallucination when tools return empty results, as demonstrated with a travel research agent.
This is the eighth article in a series on agentic engineering and AI-driven development. It addresses context loss in AI agents performing complex multistep tasks. The author introduces the Externalize-Recognize-Rehydrate (ERR) pattern: saving agent state to disk, detecting context degradation, and recovering from those files. A historical analogy (the 640K memory limit) and a real Copilot crash illustrate the problem. The article details externalizing two layers of state: execution continuity (the current step) and task continuity (the overall goals).
AI agents have limited context windows that cause information loss, much like the memory constraints of early PCs.
The ERR pattern: externalize state, recognize loss, rehydrate from files.
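As a minimal sketch of ERR, assuming a JSON state file and Python's standard library, the code below externalizes both layers of state, uses a crude token-count threshold as a stand-in for recognizing context loss, and rehydrates from disk. The file layout, field names, and the 80% threshold are hypothetical; the article's own recognition signals may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class AgentState:
    # Task continuity: the overall goal and the plan that serves it.
    goal: str
    plan: list[str]
    # Execution continuity: where in the plan the agent currently is.
    current_step: int = 0
    notes: list[str] = field(default_factory=list)


def externalize(state: AgentState, path: Path) -> None:
    """Write state to a temp file, then atomically swap it into place."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2))
    tmp.replace(path)


def context_degraded(tokens_used: int, window: int, ratio: float = 0.8) -> bool:
    """Hypothetical 'recognize' heuristic: treat a nearly full window as degraded."""
    return tokens_used >= ratio * window


def rehydrate(path: Path) -> AgentState:
    """Rebuild state from the file instead of trusting the conversation history."""
    return AgentState(**json.loads(path.read_text()))
```

Writing to a temporary file and then calling `Path.replace` helps keep the on-disk state intact if the agent crashes mid-write, which is exactly the failure the pattern is meant to survive.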
Cohere has released its first developer-facing coding model, North Mini Code, a 30B total parameter mixture-of-experts model with only 3B active parameters per token. It runs on a single H100 GPU, supports 256K context length, and is optimized for code generation, agentic software engineering, and terminal tasks. The weights are open under Apache 2.0.
North Mini Code is Cohere’s first coding model, 30B total parameters with 3B active, supporting 256K context and 64K max output.
Runs on a single H100 at FP8; weights open under Apache 2.0 via Hugging Face, Cohere API, and more.
Cloudskill is a platform that governs AI skills, turning scattered skill files into a managed catalogue with version control, per-person access policies, and a full audit log. It integrates with agents like Claude, Cursor, and Copilot, ensuring every change is reviewed and approved, keeping skills safe and consistent.
Cloudskill transforms AI skill files into a managed catalogue with version control, access policies, and audit logs.
Supports various AI agents including Claude, Cursor, GitHub Copilot, and more.
This post explains how to install and configure LSP servers for GitHub Copilot CLI, replacing brute-force grep and decompilation with real code intelligence. The LSP Setup skill automates the process for 14 languages, and the post covers how it works and how to get started.
GitHub Copilot CLI previously relied on text search and binary extraction to understand code, which was inefficient and inaccurate.
The LSP Setup skill automates installation and configuration of LSP servers for 14 languages.
This article compares leading AI coding agents and platforms in 2026, including Atoms, Devin AI, GitHub Copilot, Windsurf, Cursor, and Warp. It highlights how these tools have evolved from autocomplete to autonomous planning, multi-file editing, testing, and deployment. No single tool fits all needs; selection depends on task type. Atoms is recommended for end-to-end product building.
AI coding tools have progressed beyond autocomplete to plan, edit, test, and ship code.
Different tools serve distinct roles: autonomous engineers (Devin), agentic IDEs (Windsurf), terminal-native environments (Warp), etc.
This article tests Claude Fable 5, Anthropic's new AI model derived from the restricted Mythos Preview. It covers key features, benchmark performance, access methods, and practical tests including recreating the Netflix interface from a screenshot and converting a hand-drawn dashboard into a modern app. The results show strong performance in visual understanding, code generation, and complex multi-step tasks. Anthropic's approach of differentiating between broadly available Fable 5 and restricted Mythos 5 highlights a strategy of balancing advanced AI capabilities with responsible access.
Claude Fable 5 is the broadly available version of Anthropic's Mythos-class AI, targeting developers, enterprises, and Claude users.
The model offers significant improvements in coding, reasoning, vision, and long-context memory, excelling at complex multi-step tasks.
Custom agents let GitHub Copilot CLI understand your stack and team workflows, turning one-off terminal prompts into repeatable, reviewable processes. This article covers the concept, creation, and usage of custom agents with three practical workflow examples: security audit, IaC compliance, and release documentation.
Custom agents are defined using Markdown files with YAML frontmatter, specifying role, tools, guardrails, and output format.
Agent profiles are stored in the .github/agents directory of a repository, enabling version control and team review.
Amazon employees are sharing memes on an internal Slack channel mocking the company's AI coding tool Kiro and other AI products. The memes include references to 'slop,' 'Sloppenheimer,' and criticisms of an internal leaderboard that was shut down due to cheating and wasteful usage.
Amazon employees share memes on Slack channel #actual-aws-memes mocking company AI tools.
Memes include complaints about Kiro's limitations, 'Sloppenheimer' mashup, and leaderboard controversies.
Apple's WWDC AI features mostly play catch-up, but integrating natural language into Shortcuts (and Safari extensions) offers a genuinely useful approach: letting users 'vibe-code' their phone's behavior by describing what they want. Despite early bugs and reliance on developer support, the concept holds significant potential.
Apple's AI announcements at WWDC were largely derivative, but the natural language Shortcuts feature stands out.
Users can describe desired actions to have AI automatically create shortcuts, akin to 'vibe coding.'
GitHub Copilot's shift to usage-based billing on June 1 exposed the true cost of agentic workflows. This article analyzes token consumption, tool design impact, and strategies for prompt optimization and output formatting, emphasizing that cost control should be a platform governance issue.
GitHub Copilot's usage-based billing from June 1 reveals the actual cost of agentic workflows.
Agents consume tokens in loops; loop count scales with task vagueness and context complexity.
CalmSEO is an MCP server that exposes Google Search Console data, live SERPs, keyword volumes, and on-page audits to AI agents like Claude, ChatGPT, Cursor, and Codex. It offers a free tier and paid plans with credit-based usage.
CalmSEO provides SEO tools via MCP protocol for AI agents.
Includes Google Search Console, live SERPs, keyword volumes, ranked keywords, and on-page audits.
OpenLTM is an open-source, MIT-licensed long-term memory plugin for AI coding agents like Claude Code, OpenCode, and Pi. It provides automatic semantic memory capture, recall, and importance-weighted decay, with no cloud dependency. The plugin stores memory locally in SQLite and offers hooks, commands, and a graph visualizer.
Automatic memory capture and injection via hooks, no manual note-taking.
Importance-weighted decay ensures fresh context while preserving critical knowledge.
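The summary does not give OpenLTM's decay formula, so the following is only an illustration of importance-weighted decay in general: each memory's score halves over a half-life that grows with its importance, and recall keeps the top-scoring memories above a floor. The 7-day base half-life, the 5x scaling, and the floor of 0.05 are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float  # 0.0 (trivial) to 1.0 (critical), assumed to be set at capture time
    age_days: float


def retention(mem: Memory, base_half_life_days: float = 7.0) -> float:
    """Exponential decay whose half-life grows with importance.

    Importance 1.0 decays 5x more slowly than importance 0.0 (hypothetical scaling).
    """
    half_life = base_half_life_days * (1 + 4 * mem.importance)
    return mem.importance * 0.5 ** (mem.age_days / half_life)


def recall(memories: list[Memory], k: int = 3, floor: float = 0.05) -> list[Memory]:
    """Return up to k memories whose decayed score is still at or above `floor`."""
    scored = [(retention(m), m) for m in memories]
    kept = [pair for pair in scored if pair[0] >= floor]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:k]]
```

With these parameters, a critical note that is 30 days old (score ≈ 0.55) still outranks a mid-importance note from yesterday (≈ 0.48), while a low-importance note of the same 30-day age (≈ 0.04) falls below the floor.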
This study systematically analyzes configuration mechanisms for five agentic AI coding tools (Claude Code, GitHub Copilot, Cursor, Gemini, Codex) and examines adoption across 2,853 GitHub repositories. Findings reveal that context files, especially AGENTS.md, dominate as a de facto standard; advanced mechanisms like Skills and Subagents are rarely used; and distinct configuration practices emerge per tool, with Claude Code users employing the broadest range.
Eight configuration mechanisms identified, from static context to executable and external integrations.
Context files, particularly AGENTS.md, dominate and serve as an interoperable standard across tools.
AI-noleak is a local reverse proxy that intercepts accidentally exposed secrets (API keys, tokens) from AI coding agents and replaces them with deterministic placeholders before they reach the upstream AI model. It operates via three layers (PTY wrapper, HTTP proxy, file watcher) without requiring TLS MITM or root CA certificates, ensuring local security isolation.
Three layers of protection: PTY input, HTTP transport, file storage. No TLS MITM needed.
Secrets are replaced with placeholders (@TOKEN_xxxxxx@); AI models only see placeholders, which are reversibly restored locally.
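To make the placeholder idea concrete, here is a minimal Python sketch of deterministic, locally reversible redaction. It is not AI-noleak's code: the two regexes are illustrative token shapes, deriving the placeholder from a truncated SHA-256 digest is an assumption (a real tool might prefer a keyed HMAC so placeholders cannot be matched against guessed secrets), and the in-memory vault stands in for whatever local store the tool uses.

```python
import hashlib
import re

# Hypothetical detection patterns; a real tool would use a much larger rule set.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub classic personal access token shape
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),  # OpenAI-style API key shape
]


class Redactor:
    """Swap secrets for deterministic placeholders and restore them locally."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}  # placeholder -> secret; never sent upstream

    def placeholder(self, secret: str) -> str:
        # The same secret always yields the same placeholder.
        digest = hashlib.sha256(secret.encode()).hexdigest()[:6]
        return f"@TOKEN_{digest}@"

    def redact(self, text: str) -> str:
        def swap(match: re.Match) -> str:
            ph = self.placeholder(match.group(0))
            self._vault[ph] = match.group(0)
            return ph

        for pattern in SECRET_PATTERNS:
            text = pattern.sub(swap, text)
        return text

    def restore(self, text: str) -> str:
        # Runs locally on the model's response before it reaches tools or files.
        for ph, secret in self._vault.items():
            text = text.replace(ph, secret)
        return text
```

Because the placeholder depends only on the secret, the same key appears under the same name across turns, so the model can still tell credentials apart without ever seeing them.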
NHS England is rolling out Microsoft Copilot to over half a million staff after a pilot showed it saves 43 minutes per day on admin tasks. The rollout starts with 2,000 licenses per trust, with full access expected by October 2026. Copilot will assist with discharge paperwork, bed management, rota planning, and more. The cost has not been disclosed.
NHS England plans to roll out Microsoft Copilot to 505,000 staff.
A pilot of 30,000 staff showed an average saving of 43 minutes per day on admin.
A sophisticated supply chain attack, the Miasma worm, compromised dozens of Microsoft-owned GitHub repositories, deploying malware targeting AI coding assistants like Claude Code, Gemini CLI, Cursor, and VS Code. The malware executes when opening a project folder, stealing cloud keys, developer secrets, passwords, and infrastructure configs. Immediate credential rotation and inspection of AI config files are advised.
Miasma worm infected 73 Microsoft GitHub repositories by weaponizing AI assistant config files to auto-execute payloads.
Amazon Bedrock AgentCore Runtime gives each agent session its own isolated microVM with a persistent workspace, secure tool access through Gateway, and built-in observability—so you can run Claude Code, Codex, Kiro, and Cursor in parallel without sharing secrets, ports, or filesystems. Close the lid, go to dinner, and pick up where you left off tomorrow.
Laptops are poor hosts for coding agents: they pose security risks, leak secrets, let parallel agents collide, and kill the session when the lid closes.
AgentCore provides isolated microVMs, persistent storage, identity layer, gateway, and observability for safe remote execution.
The article argues that most corporate AI initiatives are 'bullshit' because companies adopt AI without changing management practices. It identifies four stages of BS: vague AI push, productivity theater, shiny project mode, and everyone vibe coding, and offers fixes for each.
Companies push AI without changing how work is managed, leading to superficial AI use.
Four stages of BS: vague push, productivity theater, shiny projects, vibe coding.
GitHub Copilot's shift to usage-based billing on June 1 has caused unpredictable cost swings for engineering teams. Developers report 'token anxiety' and rapid burn-through of allowances, and enterprise examples such as Uber exhausting its 2026 AI tools budget in four months highlight the challenge. The article advises analyzing usage, setting spending caps, matching models to tasks, and diversifying vendors, and it considers model-freedom options such as Kilo.
GitHub Copilot moved from seat-based to consumption-based pricing on June 1, 2026, leading to unpredictable bills.
Developers report rapid token consumption, with single sessions consuming large portions of monthly allowances.
AI coding tools excel at writing new code but fail during operational incidents such as 3am production outages, where the necessary team knowledge and context are missing. Engineers spend 84% of their time on non-coding tasks, primarily gathering context. The article argues for treating team knowledge as infrastructure: capturing the rationale behind decisions, putting constraints into the workflow, and closing the feedback loop from incidents.
AI tools are absent during high-stakes incident response, where knowledge synthesis is critical.
Developers spend only ~16% of time coding; the rest is context search and coordination.
AgentCrew is a conversation-first, Markdown-first methodology for agentic coding that turns a single chat session into a team process with role assignments, task routing, quality gates, and human approval. It uses a pure-Bash classifier to identify task type and risk level, supports fast and full lanes, and includes safety rules to prevent agents from auto-merging or bypassing review.
Transforms coding agents from single-context to multi-role team workflow
Implemented with Markdown and shell scripts, no daemon required
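AgentCrew's classifier is pure Bash and its rules are not given in the summary; the Python sketch below only illustrates the general approach of keyword-based classification into a task type, a risk level, and a lane. The categories, keywords, and lane policy are invented for illustration.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
TASK_TYPES = [
    ("docs", re.compile(r"\b(readme|docs?|typo|comment)\b", re.I)),
    ("bugfix", re.compile(r"\b(fix|bug|crash|regression)\b", re.I)),
    ("feature", re.compile(r"\b(add|implement|support|new)\b", re.I)),
]
HIGH_RISK = re.compile(r"\b(auth|payment|migration|secret|delete|prod(uction)?)\b", re.I)


def classify(request: str) -> dict[str, str]:
    """Map a free-text request to a task type, a risk level, and a lane."""
    task_type = next((name for name, rx in TASK_TYPES if rx.search(request)), "general")
    risk = "high" if HIGH_RISK.search(request) else "low"
    # Hypothetical policy: only low-risk docs and bug fixes take the fast lane;
    # everything else goes through the full lane.
    lane = "fast" if risk == "low" and task_type in {"docs", "bugfix"} else "full"
    return {"type": task_type, "risk": risk, "lane": lane}
```

Keeping the classifier deterministic (no model call) means routing decisions are cheap, reproducible, and easy to review in the same repository as the rules.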
AI coding tool adoption surged to 90% in 2025, driving project deployment rates from 357 per month in 2021 to nearly 1,000 per month and past 1,000 by the end of 2025. But speed without direction is wasted: teams must pair high throughput with feedback loops to ensure changes move toward the product ideal, and the deployment pipeline must scale with code output, or AI investments yield no return.
AI coding tool adoption reaches 90% in 2025; project deployment rates exceed 1,000 per month.
Speed without direction is wasted: use the Bullseye Model to measure product velocity.
Context Mode Insight is an observability platform for enterprise AI engineering, built on an open-source plugin trusted by 250K+ developers. It supports 14 AI assistants, analyzes 222 patterns, and provides role-aware insights via a privacy-first design. The paid tier ($20/seat/month) offers org-level dashboards, REST API, and remote MCP for agents, addressing needs of CTOs, EMs, CISOs, and more.
Context Mode Insight is the first observability layer for AI coding agents, priced at $20 per seat per month.
Built on an open-source plugin with 250K+ developers, supporting 14 AI assistants and 222 patterns.
In this interview, Mario Rodriguez, Chief Product Officer at GitHub, discusses how AI coding agents are transforming GitHub's engineering system, including macro-delegation, agent-generated PRs, Copilot, and AX (agent experience), and what this means for developers.
GitHub's CPO discusses AI coding agents pushing toward an agent-native engineering system.
Macro-delegation and agent-generated PRs are key concepts.
agmsg is a lightweight bash+SQLite tool that lets AI coding agents like Claude Code, Codex, Gemini CLI, and Copilot CLI message each other directly through a shared database, eliminating the need for copy-pasting between them.
Enables direct communication between different AI coding agents without manual copy-paste
Lightweight, daemonless, and networkless using SQLite
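agmsg itself is written in bash; the Python `sqlite3` sketch below only illustrates the underlying idea of agents exchanging messages through one shared database file instead of a daemon or network. The table schema and function names are hypothetical.

```python
import sqlite3


def open_bus(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the shared message table. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sender TEXT NOT NULL,
               recipient TEXT NOT NULL,
               body TEXT NOT NULL,
               is_read INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def send(conn: sqlite3.Connection, sender: str, recipient: str, body: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )


def inbox(conn: sqlite3.Connection, recipient: str) -> list[tuple[str, str]]:
    """Return unread (sender, body) pairs for `recipient`, oldest first, and mark them read."""
    with conn:
        rows = conn.execute(
            "SELECT id, sender, body FROM messages"
            " WHERE recipient = ? AND is_read = 0 ORDER BY id",
            (recipient,),
        ).fetchall()
        conn.executemany(
            "UPDATE messages SET is_read = 1 WHERE id = ?", [(r[0],) for r in rows]
        )
    return [(sender, body) for _, sender, body in rows]
```

Each agent would open the same file path rather than `:memory:`. With several processes writing at once, a real implementation would also want WAL mode and a busy timeout, and would wrap the read-and-mark step in `BEGIN IMMEDIATE` so two readers cannot claim the same message.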
Moonshot AI has launched Kimi Code CLI, an open-source terminal AI coding agent built with TypeScript. It features subagents for parallel tasks, MCP configuration, video input, and lifecycle hooks. The tool is MIT-licensed and supports Kimi models or other providers.
Kimi Code CLI is an MIT-licensed terminal AI coding agent from Moonshot AI.
Built in TypeScript, it offers subagents (coder, explore, plan) and MCP configuration.
This article discusses how AI coding agents leverage existing tech stacks to enhance development efficiency and highlights the importance of agent experience (AX).
AI coding agents utilize your current technology for automation
Agent Experience (AX) is key to future developer tools
peers is an open-source tool that drives two or more AI coding agents (Claude Code, Codex, etc.) as cooperating peers with hard gates: tests pass, coverage holds, no regression, no TODOs/stubs/skipped tests, secrets clean. One peer implements, the other blind-reviews, and an adversarial skeptic re-audits before acceptance. Runs unattended, budget-capped, and container-sandboxed.
Gated completion instead of vibes-based convergence.
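A hard gate is a check that must pass before a change is accepted, regardless of how confident the implementing agent sounds. The Python sketch below shows that shape with hypothetical inputs (a test result, coverage before and after, a regression count, and the unified diff); the stub and secret regexes are deliberately small examples, not peers' actual rules.

```python
import re
from dataclasses import dataclass


@dataclass
class RunReport:
    tests_passed: bool
    coverage: float           # percent after the change
    baseline_coverage: float  # percent before the change
    regressions: int
    diff: str                 # unified diff produced by the implementing peer


# Only added lines ("+...") are inspected; patterns are illustrative, not exhaustive.
STUB_MARKERS = re.compile(
    r"^\+.*\b(TODO|FIXME|NotImplementedError|pytest\.mark\.skip)\b", re.MULTILINE
)
SECRET_LIKE = re.compile(
    r"^\+.*(AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----)", re.MULTILINE
)


def gate_failures(r: RunReport) -> list[str]:
    """Return every gate that blocks acceptance; an empty list means accept."""
    failures = []
    if not r.tests_passed:
        failures.append("tests failing")
    if r.coverage < r.baseline_coverage:
        failures.append("coverage dropped")
    if r.regressions > 0:
        failures.append("regressions found")
    if STUB_MARKERS.search(r.diff):
        failures.append("TODO/stub/skip added")
    if SECRET_LIKE.search(r.diff):
        failures.append("secret-like string added")
    return failures
```

Returning every failing gate, not just the first, gives the implementing peer a complete list to fix before the next review round.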
Nanocode-CLI is a lightweight terminal-based AI coding assistant written in Python. It features live turn control, file-state brain, stale-edit protection, project-aware navigation, recoverable context, cache-aware context, focused working memory, and a terminal-first workflow. Install with uv.
Lightweight terminal AI coding assistant written in Python
Real-time interaction, file state tracking, and edit safety
In this guest post, Patrick Nadeau recounts his journey building an Intellivision emulator from scratch using an AI coding agent. He describes using a test oracle from the existing emulator jzintv to validate his CPU core, and how the AI accelerated development — from first pixels at hour 5 to a fully playable system by hour 36. He also added a debugger port allowing the AI to control the game live. Despite the success, Nadeau reflects on the ethical implications of using AI that learns from others' work and the bittersweet feeling of creating with a co-pilot.
Patrick Nadeau built an Intellivision emulator with an AI coding agent, using a test oracle from the jzintv emulator for validation.
Development milestones: first pixels at 5 hours, complete system playable via controller by 36 hours.
Y Combinator releases Paxel, a free open-source tool that analyzes your Claude, Codex, and Cursor AI coding sessions to help you understand your building style. It runs locally in Docker, preserving code privacy, and provides a builder profile with archetypes, decision patterns, and growth edges. So far, over 70,000 sessions have been uploaded.
Paxel analyzes AI coding sessions from Claude Code, Codex CLI, and Cursor to reveal building patterns.
Runs locally in Docker; your code and .env files never leave your machine—only anonymized summaries are uploaded.
Replit is assembling a financial stack for vibe-coded apps, including Shopify integration for e-commerce, RevenueCat for subscriptions, and Visa for autonomous agent payments, aiming to turn casual app creation into viable businesses.
Replit's Shopify integration lets users build a custom storefront in about 10 minutes via its AI agent.
Previous partnerships with RevenueCat and Visa cover recurring revenue and autonomous transactions, respectively.
A talk about 'vibe coding' excited managers, but colleagues said the projects had left chaos and cleanup work behind, highlighting the growing rift between AI optimists and skeptics.
A presenter claimed to solve a year's worth of engineering problems in weeks using vibe coding, exciting managers.
However, colleagues described the projects as a 'horror show' with extensive cleanup work.
A new worm named Miasma exploits AI coding agent configuration files to spread through GitHub repositories. It hijacks auto-run features in Claude Code, Gemini CLI, Cursor, and VS Code to execute a payload that steals cloud credentials and self-replicates. Over 113 repositories have been affected, including Azure samples and popular open-source projects.
Miasma worm modifies developer tool config files to trigger malicious code execution when opening or using infected projects.
It uses multiple triggers: Claude/Gemini SessionStart hooks, Cursor project rules, VS Code folder-open tasks, and npm test scripts.
This article lists AI agents that currently support or partially support sending the Accept: text/markdown header in HTTP requests, and provides methods to verify them. As of May 2026, only Claude Code, Cursor, OpenClaw, OpenCode, and Codex CLI (partial) support this feature, while other mainstream agents like ChatGPT, Claude.ai, and Copilot only fetch HTML.
Claude Code, Cursor, OpenClaw, OpenCode explicitly support sending Accept: text/markdown header.
Codex CLI only partially supports it, following the relevant RFC standards.
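The Python standard-library sketch below shows both sides of this negotiation: a client that asks for Markdown with an HTML fallback using RFC 9110 quality values, and a server-side helper that reads the q-value a request gives `text/markdown`, which is one way to check what an agent actually sends. The exact header value and the helper names are illustrative, not taken from the article.

```python
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    # Prefer Markdown, accept HTML at lower priority, anything else as a last resort.
    return urllib.request.Request(
        url, headers={"Accept": "text/markdown, text/html;q=0.8, */*;q=0.1"}
    )


def markdown_quality(accept: str) -> float:
    """Return the q-value an Accept header gives text/markdown (0.0 if absent)."""
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "text/markdown":
            continue
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    return float(value.strip())
                except ValueError:
                    return 0.0
        return 1.0  # no q parameter means q=1
    return 0.0
```

Logging `markdown_quality(request_headers["Accept"])` on a test endpoint separates agents that explicitly request Markdown from those that only send a browser-style HTML header.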
Runcap is a free, local CLI tool that estimates and caps the cost of AI coding agent runs. It provides cost estimation before execution, enforces a hard spending limit, compresses tokens, and offers rescue prompts when agents get stuck. Unlike existing observability tools that track costs after the fact, Runcap acts as a circuit breaker to prevent overspending.
Estimates cost range before a run and enforces a hard ceiling.
Provides copyable rescue prompts when the agent gets stuck.
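Runcap's internals are not described here, so the following Python sketch only illustrates the two ideas in the summary: a pre-run cost range and a hard ceiling checked before each call rather than after it. The per-million-token prices, the per-turn output range, and the assumption that the full prompt is re-sent on every turn are placeholders for illustration.

```python
from dataclasses import dataclass


@dataclass
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_per_mtok
                + output_tokens * self.output_per_mtok) / 1e6


def estimate_range(price: Price, prompt_tokens: int, turns: int,
                   out_per_turn: tuple[int, int] = (500, 4000)) -> tuple[float, float]:
    """Rough (low, high) USD estimate, assuming the prompt is re-sent on every turn."""
    low_out, high_out = out_per_turn
    in_total = prompt_tokens * turns
    return price.cost(in_total, low_out * turns), price.cost(in_total, high_out * turns)


class BudgetExceeded(RuntimeError):
    pass


class CostCap:
    """Hard ceiling: refuse any call whose worst case would push spend past the cap."""

    def __init__(self, ceiling_usd: float, price: Price) -> None:
        self.ceiling = ceiling_usd
        self.price = price
        self.spent = 0.0

    def authorize(self, input_tokens: int, max_output_tokens: int) -> None:
        worst = self.spent + self.price.cost(input_tokens, max_output_tokens)
        if worst > self.ceiling:
            raise BudgetExceeded(f"worst case ${worst:.2f} exceeds cap ${self.ceiling:.2f}")

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.spent += self.price.cost(input_tokens, output_tokens)
```

Checking the worst case (maximum output tokens) before each call is what makes this a circuit breaker: the call that would cross the ceiling is refused instead of being reported after the money is spent.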
AI agents need secure execution environments. LangSmith Sandboxes provide hardware-virtualized microVMs, giving each agent a full computer with fast startup and persistent state, enabling code generation, data analysis, CI workflows, and more.
Agents require real computer environments (filesystem, shell, package manager) but direct infrastructure access is dangerous.
Container isolation is insufficient against kernel exploits; hardware-level separation is necessary.
Companies are spending heavily on AI but struggle to measure returns. Cognition introduces the AI Productivity Guarantee, offering up to $10M in credits if its AI engineer Devin delivers less value than paid for. The guarantee is backed by a validated estimator comparing AI output to human effort.
Businesses lack standards to measure AI ROI, needing to shift from usage metrics to outcomes.
Cognition built an AI productivity estimator validated against human engineer time assessments.
The era of flat-rate AI coding pricing is ending as Cursor reduces Teams pricing by 20%, introduces a Premium tier with five times usage, and adds enterprise governance features including spend alerts, budgets, and model access controls. This follows GitHub's shift to token-based billing and the formation of the Tokenomics Foundation to standardize AI token economics.
Cursor cuts Teams plan prices by 20% to $32/user/month, introduces $120/month Premium tier with five times usage.
New enterprise governance layer includes per-department budgets, model access, agent permissions, and spend alerts via Slack/email.
Microsoft 365 Premium is the successor to Copilot Pro. At $20/month, with a 50% first-year discount for existing subscribers, it bundles advanced AI features, higher usage limits, and a Family subscription, directly competing with ChatGPT Plus for heavy Microsoft 365 users.
Microsoft 365 Premium costs $20/month, with a first-year 50% discount for current subscribers.
It includes exclusive AI agents (Researcher, Analyst, Photos) and increased AI credits.
Microsoft announced broader testing of its new Autopilot feature at Build. Autopilots are agents that work autonomously on behalf of users, each with its own identity. The first Autopilot, Scout, has been tested internally and is now rolling out to select customers and Frontier organizations. Scout operates across Microsoft 365 apps, coordinating data from Outlook, OneDrive, SharePoint, and Teams to schedule meetings, flag messages, and generate events; it automatically identifies deadlines, blocks calendar time, provides relevant materials, and learns user preferences over time. Scout is built on OpenClaw with enterprise-grade security, and Microsoft plans to contribute upstream. Administrators can validate secure operation via Entra IDs, sensitive actions require human approval, and early trials helped tune security. The announcement was made by Omar Shahine, Corporate VP. Early adopters need Frontier enrollment, an Intune policy, an opt-in attestation, and a GitHub Copilot license.
Microsoft introduces Autopilot agents; the first, Scout, works autonomously across Microsoft 365 and learns user preferences.
Scout is built on OpenClaw with enterprise security, Microsoft plans to contribute to open source, and sensitive actions require human approval; early adopters need specific prerequisites.
Knox is a security policy engine for AI coding agents, shipping as CLI, Node library, Claude Code plugin, Cursor plugin, and OpenAI Codex plugin. It intercepts dangerous tool calls in real-time, provides audit logging, prompt injection scanning, and policy tampering protection. The article covers installation, capability matrix, limitations, and customization.
Knox provides real-time blocking of dangerous tool calls via hooks on Claude Code, Cursor, and Codex
Automatic audit logging and prompt injection scanning for every tool call
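Knox's policy format is not shown in the summary, so the Python sketch below illustrates the general pattern of a pre-execution hook: match each tool call against deny rules, block on a hit, and append every decision to an audit log. The tool names (`bash`, `read`), the rule names, and the regexes are all hypothetical and far from complete.

```python
import json
import re
import time

# Hypothetical deny rules: (rule name, tool the rule applies to, pattern over the argument).
DENY_RULES = [
    ("recursive-delete-root", "bash", re.compile(r"\brm\s+-\w*[rR]\w*\s+(/|~)(\s|$)")),
    ("pipe-to-shell", "bash", re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b")),
    ("force-push", "bash", re.compile(r"\bgit\s+push\b.*--force")),
    ("read-dotenv", "read", re.compile(r"(^|/)\.env(\.|$)")),
]


def check_tool_call(tool: str, argument: str, audit_log: list[str]) -> bool:
    """Return True to allow the call; every decision is appended to the audit log."""
    rule_hit = None
    for name, rule_tool, pattern in DENY_RULES:
        if tool == rule_tool and pattern.search(argument):
            rule_hit = name
            break
    audit_log.append(json.dumps({
        "ts": time.time(),
        "tool": tool,
        "argument": argument,
        "verdict": "deny" if rule_hit else "allow",
        "rule": rule_hit,
    }))
    return rule_hit is None
```

Regex rules like these are easy to evade (aliases, `sh -c` wrappers, encoded payloads), so treat the sketch as a picture of the hook's shape, not of adequate coverage.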