OpenRouter's Jacky Liang ran an experiment dropping 11 LLMs into a 2D battle royale game. Grok 4.1 Fast won 43% of matches at $0.97 per win, while Claude Sonnet 4.6 won 5 matches at $26.78 per win, revealing alignment tax and cost-effectiveness differences.
Grok 4.1 Fast won 13 of 30 games at $0.97 per win, the most cost-effective model.
Claude Sonnet 4.6 showed excessive cooperation, winning 5 games but costing 27.7x more per win than Grok.
An experiment pitting 11 LLMs in a 2D battle royale reveals that Grok 4.1 Fast dominates with the lowest cost per win, while Claude Sonnet 4.6 suffers from excessive cooperation. The findings highlight the impact of alignment tax on performance and the inadequacy of traditional benchmarks in predicting real-world task success.
Grok 4.1 Fast won 13 of 30 games at a cost of $0.97 per win.
Claude Sonnet 4.6 won 5 games but cost $26.78 per win, 27x more than Grok.
OpenRouter introduces guardrails for workspaces, a set of configurable security and governance tools for budget enforcement, zero data retention, model/provider restrictions, prompt injection defense, and data loss prevention. Guardrails can be assigned to API keys or team members, allowing granular control without code changes.
Budget enforcement with daily, weekly, or monthly spending limits per entity.
Zero data retention and model/provider allow/block lists.
OpenRouter announces a $113M Series B round led by CapitalG, with participation from NVentures, ServiceNow Ventures, and other strategic investors. Weekly volume has grown from 5 trillion to 25 trillion tokens, serving 8M+ developers across 400+ models. Funds will be used to scale infrastructure, enhance enterprise capabilities, and advance intelligent routing.
OpenRouter raised $113M in Series B funding led by CapitalG. Other investors include NVIDIA, ServiceNow, MongoDB, Snowflake, and Databricks.
Weekly token processing grew from 5 trillion to 25 trillion tokens in six months, on track to process over a quadrillion tokens this year.
OpenRouter launches @openrouter/agent, a model-agnostic TypeScript SDK that simplifies building agentic loops with tool execution, multi-turn loops, stop conditions, streaming, cost tracking, and tool approval.
@openrouter/agent SDK packages agent loop logic into a single callModel function, supporting 300+ models.
Key features include tool execution, multi-turn loops, composable stop conditions, and streaming.
OpenRouter introduces two skills for building agent harnesses: create-agent-tui for terminal UIs and create-headless-agent for headless agents. Both generate complete TypeScript projects using the Agent SDK, offering customizable features and integration with any OpenRouter model. The skills provide fine-grained control, minimal deployment, and educational value. Key features include interactive checklists, customizable UI, session persistence, safe retry, and structured output.
OpenRouter released two scaffolding skills for agent harnesses: create-agent-tui (terminal UI) and create-headless-agent (headless).
Both skills use the Agent SDK to handle the agentic loop, tool execution, streaming, and cost tracking.
Anthropic's new tokenizer for Claude Opus 4.7 increases token consumption, resulting in 12-27% cost increases for prompts over 2K tokens, but short prompts get cheaper due to shorter completions. OpenRouter analyzed millions of requests to provide real-world data.
Tokenizer inflation of 32-45% for equivalent text, mostly absorbed by caching for long prompts.
Actual cost per million tokens up 12-27% for prompts >2K, down 1.6% for <2K.
OpenRouter's April releases include video generation, workspaces for multi-project isolation, a TypeScript SDK that turns any model into an agent, reranker models, model fusion, prompt history, benchmarks, knowledge cutoff dates, and new frontier models like GPT-5.5 and DeepSeek V4 Pro.
Video generation API supporting Seedance 2.0, Veo 3.1, Wan 2.7, Sora 2 Pro, with async job tracking and capability discovery.
Workspaces for environment isolation with separate API keys, routing defaults, guardrails, and observability per workspace.
OpenRouter introduces response caching that allows developers to cache identical API requests, returning responses in milliseconds with zero token billing. The cache sits in front of providers, hashing request details. It supports streaming and non-streaming, works across multiple endpoints, and includes features like TTL control and cache busting. Use cases include agent retries, test suites, and repeated prompts.
Add X-OpenRouter-Cache: true header to cache identical requests; first call billed normally, subsequent calls free.
Cached responses return in 80-300ms (cache lookup ~4ms) vs. seconds for typical uncached requests.
OpenRouter introduces two dedicated audio endpoints for text-to-speech and speech-to-text, offering faster and more cost-efficient models from providers like OpenAI, Google, and Mistral.
New /api/v1/audio/speech and /api/v1/audio/transcriptions endpoints.
Speech models include GPT-4o Mini TTS, Gemini Flash TTS, Voxtral Mini TTS.
The Agent SDK now supports human-in-the-loop (HITL) tools, allowing agents to automatically handle routine calls and pause for human input when stakes are high, all controlled by a single hook.
HITL tools use an onToolCalled hook that returns a value to continue or null to pause for human decision.
An optional onResponseReceived hook transforms human responses before the model sees them, supporting metadata stamping, format normalization, and business rule validation.