Agents AI News

Agents updates

AI shorting penny stock based on human psychology

2026-07-12 21:03 UTC

Fade Engine is a fully autonomous AI that shorts overextended small caps on a live $10,000 simulated account, posting every trade publicly. It scans 12,000+ tickers every five minutes, identifies 18 pump patterns, and closes all positions by market close. No human intervention.

Fade Engine is an autonomous AI that shorts small-cap pumps using 18 predefined patterns
It trades a simulated $10,000 account in real time, with all trades public

A SETI@home for AI-Assisted Research

2026-07-12 20:45 UTC

The article proposes crowdsourcing unused AI inference tokens for scientific research, drawing parallels to SETI@home. It highlights recent successes by small teams using AI to solve math problems and discusses the design challenges of such a platform.

SETI@home pooled idle home computer power for extraterrestrial signal analysis.
Today, AI users could donate unused token allowances to collective research.

Guide to Loop Engineering: How 'autoresearch' and 'Bilevel Autoresearch' Turn AI Agents Into Autonomous Machine Learning (ML) Research Loops

2026-07-12 20:07 UTC

This guide explains loop engineering, where AI agents autonomously iterate toward a goal using a verifier, state, and stop condition. It details Andrej Karpathy's autoresearch loop and Bilevel Autoresearch, showing concrete results: autoresearch found 20 improvements from 700 experiments, cutting GPT-2 training time by 11%; Bilevel Autoresearch added an outer meta-loop for a 5x larger val_bpb drop. It also provides reusable building blocks and a hands-on template.

Loop engineering replaces manual prompting with autonomous loops that include a verifier, state, and stop condition.
Karpathy's autoresearch ran 700 experiments overnight, yielding 20 improvements and an 11% speedup on GPT-2 training.
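
The three ingredients named above (verifier, state, stop condition) can be sketched in a few lines. This is a toy illustration only, not Karpathy's autoresearch code: the `verifier`, `propose`, and `run_loop` names, the quadratic scoring function, and the step-halving rule are all assumptions made for the example. A real loop would have an agent propose changes and a training run or test suite score them.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """State carried across iterations: best candidate so far plus a log."""
    best_value: float
    best_score: float
    history: list = field(default_factory=list)


def verifier(candidate: float) -> float:
    """Toy verifier (assumed): higher is better, with a peak at 3.0."""
    return -(candidate - 3.0) ** 2


def propose(state: LoopState, step: float) -> float:
    """Hypothetical proposal step; an agent would generate this in practice."""
    return state.best_value + step


def run_loop(start: float, max_iters: int = 50, target: float = -1e-6) -> LoopState:
    state = LoopState(best_value=start, best_score=verifier(start))
    step = 1.0
    for i in range(max_iters):            # stop condition 1: iteration budget
        candidate = propose(state, step)
        score = verifier(candidate)
        state.history.append((i, candidate, score))
        if score > state.best_score:      # keep only verified improvements
            state.best_value, state.best_score = candidate, score
        else:
            step = -step / 2              # reverse direction and shrink the step
        if state.best_score >= target:    # stop condition 2: good enough
            break
    return state


result = run_loop(0.0)
print(result.best_value, len(result.history))
```

The point of the design is that nothing is accepted unless the verifier says it is better, so the loop can run unattended without drifting away from the goal.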

AI's memory. On your machine, under your control

2026-07-12 19:44 UTC

exxperts is a local-first agentic runtime that provides persistent AI rooms with governed, approval-gated memory. Everything runs locally as files on your disk, ensuring privacy and control. It offers both a web app and a CLI/TUI interface.

exxperts provides persistent AI rooms with approval-gated memory, giving users full control over their AI's memory.
Everything runs locally on your machine, with all data stored as plain files under ~/.exxperts.

Show HN: Kote – Capture and reuse engineering context from AI chats and Git

2026-07-12 18:56 UTC

Kote is an open-source tool that automatically captures developer conversations with AI assistants, Git commits, and development context, building a searchable knowledge base to help developers recall past technical decisions and solutions. It offers a VS Code extension, GitHub integration, a CLI, a browser extension, WhatsApp/Telegram messaging, and self-hosted deployment.

Kote passively captures AI sessions, Git activity, and other context, organizing them into a knowledge base.
VS Code CodeLens shows file-related notes with AI summaries and timelines.

The One-Step Trap (In AI Research)

2026-07-12 18:41 UTC

The one-step trap is a common mistake in AI research where researchers assume that learned predictions can be mostly one-step, with longer-term predictions generated by iterating them. While appealing, this approach suffers from error accumulation and exponential computational complexity, making it impractical. Rich Sutton argues for temporally abstract models using options and GVFs as a solution.

Iterating imperfect one-step predictions causes errors to compound, leading to poor long-term predictions.
Computational complexity grows exponentially with prediction horizon in stochastic settings.
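
The error-accumulation point can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not taken from Sutton's essay: it uses a scalar linear system, and the names `rollout` and `relative_error` are hypothetical. It shows only how a small one-step error compounds when the model is iterated; it does not model the computational-cost argument.

```python
def rollout(x0: float, a: float, steps: int) -> float:
    """Iterate the one-step map x -> a * x for the given number of steps."""
    x = x0
    for _ in range(steps):
        x = a * x
    return x


def relative_error(true_a: float, model_a: float, steps: int, x0: float = 1.0) -> float:
    """Relative gap between the iterated model and the true system after `steps`."""
    truth = rollout(x0, true_a, steps)
    pred = rollout(x0, model_a, steps)
    return abs(pred - truth) / abs(truth)


# A model that is off by 1% per step (assumed numbers for illustration).
for horizon in (1, 10, 100):
    print(horizon, relative_error(true_a=1.0, model_a=1.01, steps=horizon))
```

With a 1% one-step error the relative error after n steps is 1.01^n - 1, so it is still about 1% at one step but exceeds 100% well before a horizon of 100.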

Against Usefulness

2026-07-12 17:47 UTC

This essay explores the critical role of 'useless' research in enabling future innovations. Using Folk Computer as a case study, the author traces a lineage from Xerox PARC to Dynamicland, and argues for funding paradigm-level work before it becomes useful.

Folk Computer is an open-source physical computing system that turns the room into a computer.
The system's lineage includes Alan Kay, Bret Victor, CDG, and Dynamicland.

GPT-5.6, Fable 5, and Grok 4.5 rebuild Basecamp from the same spec

2026-07-12 17:02 UTC

The author evaluated GPT-5.6 Sol, Fable 5, Grok 4.5, and other AI models on a benchmark called Basecamp Bench, testing their ability to build a frontend and backend from the same specification. Fable 5 won both tracks, while Grok 4.5 offered the best speed-cost tradeoff. Results show significant differences in polish and completeness, especially in the final 10% of work.

Fable 5 scored highest on both frontend and backend, closely matching the real Basecamp implementation.
Grok 4.5 completed the build in 37 minutes at a cost of $9.30, offering the best speed and cost tradeoff.

OpenAI's AI Beating Every Human at AtCoder

2026-07-12 16:54 UTC

OpenAI's AI agent solved all five problems in the AtCoder Algorithm Division for 8,300 points; the top human scored 4,300. No human solved problems C or E. In the Heuristic Division, AI scored more than seven times the best human result. The 600,000-yen 'Humanity Prevails Award' went unclaimed. The system was described as comparable to GPT-5.6.

OpenAI's AI solved all five problems, scoring 8,300 vs top human 4,300
No human solved the hardest problems C and E

Show HN: AI Photo Editor – Professional-Grade Image Editing with Text Prompts

2026-07-12 15:56 UTC

AI Photo Editor is a free online tool powered by Nano Banana and GPT Image 2 models, enabling professional-grade image editing via simple text prompts. Claimed features include 95% first-try success, sub-second generation, face reconstruction, and character consistency. Paid subscription plans add commercial licenses and enterprise-grade security (SOC 2, GDPR, ISO 27001). No credit card is required to start.

Edit images using natural language prompts with 95% first-try success.
Generate images in under 1 second, 10x faster than traditional AI models.

Show HN: Itara – Distributed system topology as an explicit, executable layer

2026-07-12 14:58 UTC

Itara is an open-source project that makes distributed system topology explicit by separating it into a dedicated configuration layer. It uses a wiring agent that reads a config file at startup, resolves all connections, and wires components together before the application runs at full speed. The tooling validates topologies before deployment and provides observability through four key events. It supports incremental adoption and cross-language interop (Java, Rust, and more planned).

Itara treats topology as a first-class concern via a single wiring config file. The wiring agent sets up connections at startup and then steps aside, adding no runtime overhead.
It enables transport switching (e.g., direct calls to HTTP) by changing a config line — no code changes needed.

Linux of AI – open-source tools for reducing AI vendor lock-in

2026-07-12 14:52 UTC

Linux of AI is a seven-project open-source ecosystem designed to reduce AI vendor lock-in by providing portable ontology, policy-as-code, model replacement benchmarking, audit logging, cost measurement, and more. It aims to make AI infrastructure inspectable, governable, measurable, and replaceable without reliance on a single vendor. All core software is free and open source under the MIT license.

A seven-project open-source ecosystem to reduce AI vendor lock-in.
Provides portable ontology, governance policies, model replacement, audit logs, and cost measurement.

Perfectly Hitting the Wrong Target: The Story of an AI Code Review Benchmark

2026-07-12 14:40 UTC

This article critically analyzes the AI Code Review Bench benchmark, arguing that it fails to define the problem from first principles and overlooks the distinction between two different AI code review problems: human comprehension and machine verification. The author, Shrijith Venkatramana, contends that the benchmark measures proxies rather than actual software outcomes, and emphasizes the importance of production outcomes and severity.

The AI Code Review Bench appears objective but lacks a fundamental problem definition.
AI code review actually comprises two distinct problems: human comprehension (recommendation) and machine verification (automated repair).

Show HN: Agent Legibility Analyzer – see if AI shopping agents can read your store

2026-07-12 14:30 UTC

AgentMint.net is a research publication that helps merchants understand and optimize for how AI shopping agents select products. Every claim is sourced, and it offers tools like the Agentic Shopping Readiness check and Agent Selection signals database.

AgentMint.net analyzes why AI shopping agents choose certain stores and products.
All factual claims are labeled with evidence sources.

The impressive AI demo is dead. Here's what actually reaches production

2026-07-12 12:19 UTC

AI projects often stall after the demo phase. Confluent's 2026 Data Streaming Report reveals only 32% have agentic AI in production, with data infrastructure and skills shortages as key barriers. Real-time data pipelines and governance are critical for production-ready AI.

Only 32% of organizations report agentic AI in production.
Data infrastructure and quality are top barriers to AI success.

Memory makers are slaves to the boom-bust rollercoaster

2026-07-12 11:09 UTC

AI data center demand has tripled memory makers' revenues, but lagging fab construction keeps prices high until at least 2028, risking a severe bust if AI demand falters.

SK Hynix, Micron revenues tripled; Samsung roughly doubled
HBM, DDR5 shortages driving up prices across electronics

The Sequence Radar #893: Last Week in AI: GPT-5.6, Grok 4.5, Muse Spark 1.1 and the Post-Chatbot Stack

2026-07-12 11:02 UTC

Frontier AI labs are shifting from chatbots to integrated systems where models act as runtimes, with near-monthly releases of powerful models and agents. This week's highlights include OpenAI's GPT-5.6 with programmatic tool calling, GPT-Live's full-duplex audio, ChatGPT Work for artifact creation, Meta's Muse Spark 1.1 with active context management, and Grok 4.5 for coding and knowledge work. Research updates reveal issues with coding benchmarks, selective unlearning, agent self-evolution, speculative decoding, and traffic routing. Notable industry news includes major funding rounds for Lovable, Prime Intellect, SambaNova, Norm Ai, and Ollama.

OpenAI releases GPT-5.6 (Sol, Terra, Luna) with programmatic tool calling and parallel subagents.
GPT-Live introduces full-duplex audio interaction, shifting from turn-based to continuous dialogue.

Scientists' Side Hustle? Using AI and Quantum Computing to Generate New Peptides

2026-07-12 11:00 UTC

Researchers from the Technical University of Denmark combined a generative AI model with a quantum computer to design novel peptides that bind to specific proteins, potentially accelerating vaccine development and personalized immunotherapies, especially for understudied populations.

DTU team used hybrid AI-quantum system to generate novel peptides for protein binding.
Quantum integration improved peptide generation, especially with limited data.

AI Agents Are About to Change Payments Operations

2026-07-12 10:59 UTC

This article discusses how AI agents are transforming payments operations by automating tasks, improving efficiency, and reducing errors, and refers to a related Spotify podcast episode.

AI agents are entering the payments operations space
Automation can increase efficiency and accuracy

Show HN: Runeward: Sandboxing AI agents with policy gates

2026-07-12 09:35 UTC

Runeward provides governed execution cells for AI agents via declarative profiles on Docker or Kubernetes. It enforces deny-by-default egress, a tamper-evident audit ledger, human-in-the-loop policy gates, and cost/loop guardrails, all exposed through REST, MCP, CLI, and a web dashboard.

Declarative profiles act as security contracts for sandboxes, with deny-by-default egress.
Tamper-evident, hash-chained, ed25519-signed audit ledger for every action.
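
A hash-chained ledger of the kind described above can be sketched with the standard library. This is a minimal illustration of the general technique, not Runeward's actual ledger format: the entry fields, the `GENESIS` constant, and the function names are assumptions. The ed25519 signing step is left out because it needs a third-party library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, action: dict) -> str:
    """SHA-256 over the previous hash and the action, serialized canonically."""
    payload = json.dumps({"prev": prev_hash, "action": action}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, action: dict) -> None:
    """Add an entry that commits to the hash of the entry before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "action": action, "hash": entry_hash(prev, action)})


def verify(ledger: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["action"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"tool": "http_get", "target": "example.com"})
append(ledger, {"tool": "write_file", "target": "out.txt"})
print(verify(ledger))
```

Because each entry includes the previous entry's hash, changing an old action invalidates every later hash, which is what makes tampering evident.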

Show HN: Zero Trust Boundary for Agents

2026-07-12 07:54 UTC

Attestor is an open-source zero-trust execution boundary for AI agents. It performs policy checks, approval validation, and evidence review before agent execution, returning decisions such as admit, narrow, review, or block, enforced through a customer-owned gate. Suitable for payments, data access, infrastructure changes, and more.

Provides policy, approval, and evidence checks before AI agent execution, returning structured decisions.
Supports shadow pilot mode to observe risks without actual execution, reducing deployment risk.

Agent Service – promptable AI agents with guardrails and downloadable packages

2026-07-12 07:17 UTC

A promptable AI agent service with safety guardrails and downloadable packages.

Promptable AI agents
Built-in guardrails

AI Should Build Its Own Research World Model

2026-07-12 07:11 UTC

This article describes an experiment where an AI agent placed in an unknown ARC-AGI puzzle environment develops an explicit world model through naming, abstraction, and mathematical reasoning, drastically improving problem-solving efficiency.

AI autonomously names objects and records rules in an unknown environment, building an explicit world model.
It discovers and abstracts operations P and Q, using mathematical notation for offline deduction.

MSK – an AI agent that thinks like a CTO

2026-07-12 06:27 UTC

MSK is an AI CTO agent app for iPhone, offering architecture reviews, scaling advice, and startup strategy via chat or voice. Modeled on the experience of Moeid Saleem Khan (15+ years, 300+ projects, 50+ startups), it provides sharp, opinionated answers. Free to start with no account required; premium subscription available.

AI CTO agent providing on-demand technical and strategic advice.
Simulates real CTO experience; supports chat and voice interaction.

AI notetakers promise easy meeting recaps, but some question their use

2026-07-12 01:41 UTC

AI notetakers can quickly summarize meetings, but raise privacy and security concerns. Voiceprints, data storage, and attorney-client privilege issues are highlighted, with experts advising caution and understanding data practices.

AI notetakers convert meeting speech into data, risking exposure of confidential information.
Voiceprints may be misused for identity verification or fraud.

Dismissive Dan's Review of the Overplane AI Coding Harness

2026-07-12 01:02 UTC

Overplane is an open-source tool that converts Markdown specs into code using AI agents and SMT verification. Reviewer Dismissive Dan questions its necessity, noting many developers already have similar setups, but acknowledges its packaging and isolation design.

Overplane turns Markdown specs into code, uses Z3 solver for consistency checks.
The review is constructive but skeptical, as many developers already have similar workflows.

A Coding Guide to NVIDIA’s Tile-Based GPU Programming: From cuTile and Triton Kernels to Flash Attention

2026-07-12 00:01 UTC

This tutorial explores NVIDIA's tile-based GPU programming with TileGym, building a Colab workflow that runs across different hardware. We probe the CUDA environment, try the real cuTile backend, and fall back to Triton when standard Colab GPUs lack the cuTile stack. We learn the core tile idea: operate on whole data tiles instead of single threads, then load, compute, and store them. We implement vector addition, fused GELU, row-wise softmax, tiled matrix multiplication, and flash attention, checking each against PyTorch.

Introduces NVIDIA's tile programming model, operating on data blocks rather than individual threads.
Provides a runnable Colab script that works with both cuTile and Triton backends.
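
The load, compute, store pattern over whole tiles can be shown without a GPU. The sketch below is a plain-Python CPU illustration of tiled matrix multiplication; it does not use the cuTile or Triton APIs from the tutorial, and the function name and tile size are assumptions.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply matrices (lists of rows) one output tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # tile row of the output
        for j0 in range(0, n, tile):          # tile column of the output
            for k0 in range(0, k, tile):      # walk tiles along the shared axis
                # "Load" an A tile and a B tile, multiply them, and
                # accumulate the partial product into the C tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] += acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b))
```

On a GPU each output tile would be handled by one program instance in parallel; here the outer loops simply visit the tiles in sequence.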

Fixed three bugs that made Qwen3.5-122B a daily driver on Mac Studio

2026-07-11 22:54 UTC

After fixing three bugs related to prefix caching, the author achieved sub-second prefill times for long-context conversations with Qwen3.5-122B on a Mac Studio, turning a multi-minute wait into a seamless experience. The bugs included a timestamp in system prompt, missing reply saves on interrupt, and junk checkpoint writes.

Qwen3.5-122B on Mac Studio had severe prefill latency due to hybrid attention's cache behavior.
Three bugs: timestamp in system prompt caused cache miss; interrupted replies not saved; junk checkpoints evicted good ones.
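
The first bug is easy to reproduce in miniature. The sketch below is an assumption-based illustration of prefix caching in general, not the author's code or any specific runtime: it uses a whitespace split as a stand-in tokenizer, and the names `tokenize`, `shared_prefix_len`, and `build_prompt` are hypothetical. A prefix cache can reuse work only for the leading tokens that match the previous prompt exactly.

```python
def tokenize(text):
    """Stand-in tokenizer: whitespace split. Real tokenizers differ."""
    return text.split()


def shared_prefix_len(cached, new):
    """Number of leading tokens the new prompt shares with the cached one."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def build_prompt(system, turns):
    """Concatenate the system prompt and conversation turns into one token list."""
    tokens = tokenize(system)
    for turn in turns:
        tokens += tokenize(turn)
    return tokens


history = ["user: hi", "assistant: hello"]
next_turn = ["user: summarize the file"]

# Stable system prompt: the whole previous prompt is a reusable prefix.
stable = "You are a helpful assistant."
old_stable = build_prompt(stable, history)
new_stable = build_prompt(stable, history + next_turn)
stable_reuse = shared_prefix_len(old_stable, new_stable)

# Timestamp in the system prompt: the match ends at the changed token,
# so the entire conversation after it has to be prefilled again.
old_ts = build_prompt(stable + " Time: 12:00", history)
new_ts = build_prompt(stable + " Time: 12:01", history + next_turn)
ts_reuse = shared_prefix_len(old_ts, new_ts)

print(stable_reuse, len(old_stable), ts_reuse, len(old_ts))
```

Moving volatile values such as the current time out of the system prompt, or to the end of the prompt, keeps the long shared prefix intact.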

Show HN: AgentTransfer – open-source file transfer for AI agents (one Go binary)

2026-07-11 22:52 UTC

AgentTransfer is an open-source file transfer tool designed for AI agents, allowing them to send files up to 5GB, discover peers, and coordinate in spaces. It uses email as a control plane and HTTPS for data transfer, with no human required for agent onboarding. The tool is a single Go binary that can be self-hosted or used via a hosted instance.

AgentTransfer enables AI agents to transfer files up to 5GB with just a name and API key.
Features include self-onboarding, content-addressed storage, hash verification, and signed receipts.

Mesh LLM: distributed AI computing on iroh

2026-07-11 22:38 UTC

Mesh LLM pools GPUs and memory across machines using iroh networking, exposing an OpenAI-compatible API. It allows running models locally, routing to peers, or splitting large models across multiple machines, offering control and cost savings without central servers.

Mesh LLM pools distributed GPU resources into a single OpenAI-compatible API
Supports local execution, peer routing, and pipeline splitting for large models

AI and Job Postings: From Destruction to Creation?

2026-07-11 22:37 UTC

US software development job postings have grown almost 15% since Claude Code's launch, while overall postings fell 7%. Occupations most exposed to AI saw the biggest declines from 2022-2026 but the largest rebounds in the past year. The recovery is concentrated in senior and AI-related roles.

Software development job postings up 15% since Claude Code launch; overall market down 7%.
AI-exposed occupations saw largest declines then strongest rebound over past year.

Show HN: Token Time – Screen Time, but for your AI agent tokens

2026-07-11 22:13 UTC

Token Time is a macOS menu bar app that tracks your AI agent token usage and cost, with full-screen nudges to help you take breaks. It runs locally and privately, with no cloud or telemetry.

Live token count and cost in the menu bar
Full-screen reminders at configurable token thresholds

Secret Claude tracker surprises users after Anthropic's anti-surveillance stance

2026-07-11 21:27 UTC

Anthropic is removing hidden steganography codes from Claude Code that were covertly detecting Chinese AI labs and unauthorized resellers for months. The company says the experiment has served its purpose and stronger mitigations now exist, but critics question the lack of transparency in a developer tool.

Anthropic embedded steganographic codes in Claude Code to identify Chinese AI labs and resellers.
The experiment ran from March until July 1, 2026, when the code was removed.

Show HN: BoundFlow – an open-source control plane for AI agents

2026-07-11 21:07 UTC

BoundFlow is an open-source control plane for managing unattended LLM agents and workflows. It provides cost caps, approval gates, automatic model switching, retries, and rollbacks to ensure safe and reliable agent operation.

Open-source control plane focused on the operational layer, not prompting or inference.
Supports cost caps, human approval, automatic model downgrades, and workflow self-healing.

I built TradingSpy: local, privacy-first AI trading assistant (First Open Source)

2026-07-11 20:45 UTC

TradingSpy is an open-source local AI trading research workstation that integrates market heatmaps, news catalysts, strategy generation, Backtrader backtesting, and transparent agent runs in one Docker app. It is privacy-first, with all data stored locally, no external accounts, and no cloud dependency. Supports multiple LLM providers and a broad range of financial data sources, suitable for traders and developers for strategy research, backtesting, and signal analysis.

Local-first architecture: all data stays on the user's machine, with no external accounts or cloud dependency.
Supports AI strategy generation, automated backtesting, and benchmark comparison with loop engineering.

I built a free tool to evaluate AI agent outputs (human labels and LLM judges)

2026-07-11 19:55 UTC

Verdict is an open-source, browser-based tool for evaluating AI agent outputs. It enables human labeling, grounded theory error analysis, and validation of LLM judges against human labels, all locally without data leaving your machine.

Verdict runs entirely in the browser, no backend or accounts needed.
Supports multiple trace formats and provides a clean chat timeline for review.

Sovereign AgentOps – Self-hosted constitutional AI governance for MCP agents

2026-07-11 19:52 UTC

Sovereign AgentOps Community Edition is an open-source, self-hosted MCP governance server for AI agents, offering Ed25519-signed audit trails, policy enforcement, and offline deployment. It provides 7 demo tools and aligns with EU AI Act requirements, with a commercial Enterprise edition featuring 91 tools and advanced compliance.

Sovereign AgentOps is a self-hosted MCP governance server for AI agents with cryptographic audit trails.
Community Edition offers 7 tools for policy enforcement, receipt signing, and workspace jailing, deployable offline.

Show HN: Wizard – Self-extending Rust terminal AI agent (one-line install)

2026-07-11 19:34 UTC

Wizard is a self-extending terminal AI agent built in Rust, installable with a single command. It intelligently executes tasks in the terminal, boosting developer productivity.

Self-extending Rust terminal AI agent
One-line installation

Show HN: A Trust Index for MCP Servers

2026-07-11 18:57 UTC

A security scoring system for MCP servers that continuously scans for tool poisoning, prompt injection, supply-chain, and credential risks. Each version gets a single score before agents connect. Out of 12,629 scored servers, 45% received an A grade, while 10% are high-risk (D/F).

Over 12,600 servers scored, with 45% rated A
Top-scored servers include mockservercom (100) and mcp-file-tools (99)

AI fiction is easy to detect because it's stupid and bad, research finds

2026-07-11 18:53 UTC

A study from the University of Maryland and Google DeepMind found that AI-generated fiction is easily detectable due to narrative flaws like over-explaining themes, lack of subplots, and clunky moralizing. The researchers developed StoryScope, a detector that analyzes narrative features, and tested it on over 50,000 AI-generated stories. The study highlighted that different AI models have distinct quirks (e.g., GPT overuses dream sequences, Gemini uses character descriptions). The dataset used includes Books3, which is controversial due to copyright issues. The researchers used AI to assist in writing the paper itself.

AI fiction suffers from predictable narrative structures, such as over-explaining themes and avoiding subplots.
StoryScope detector analyzes narrative features to distinguish AI from human writing with high accuracy.

Physical AI scale-up chemistry startup gaining traction at Big Pharma

2026-07-11 18:53 UTC

Telescope Innovations uses self-driving labs (SDL) to automate chemistry, addressing the physical bottleneck in drug discovery. With deployments at Pfizer, KPBMA, and a European pharma company, plus battery materials breakthroughs, the company is positioned as a key Physical AI player.

Telescope's SDL platform enables 24/7 autonomous chemical experimentation, reducing time from months to days.
Secured repeat business from Pfizer, infrastructure deal with KPBMA, and a European crystallization contract in 2026.

RAG Evaluation Frameworks Compared: RAGAS vs TruLens vs DeepEval

2026-07-11 18:16 UTC

This article compares three popular RAG evaluation frameworks: RAGAS, TruLens, and DeepEval. It explains why RAG needs dedicated evaluation, covers the three layers of evaluation (retrieval, generation, end-to-end), and details key retrieval metrics (Precision@K, Recall@K, MRR, NDCG). It then dives into RAGAS (LLM judge, no ground truth, synthetic test set generation) and TruLens (observability, RAG triad, dashboard), with brief mention of DeepEval, and provides guidance on choosing the right framework.

RAG systems require specialized evaluation because BLEU/ROUGE cannot capture retrieval and generation failures.
RAGAS uses an LLM judge for reference-free scoring and can auto-generate test sets from documents.
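
The four retrieval metrics named in the summary have standard definitions that fit in a few lines. The sketch below assumes binary relevance (a document is either relevant or not) and is independent of RAGAS, TruLens, and DeepEval; the function names and sample document IDs are made up for the example. MRR is the mean of the reciprocal rank over a set of queries.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR over a list of (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, d in enumerate(retrieved[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


retrieved = ["d3", "d1", "d7", "d2"]   # ranked results for one query (sample data)
relevant = {"d1", "d2", "d9"}          # ground-truth relevant documents (sample data)
print(precision_at_k(retrieved, relevant, 3))
print(recall_at_k(retrieved, relevant, 3))
print(reciprocal_rank(retrieved, relevant))
print(ndcg_at_k(retrieved, relevant, 3))
```

Precision@K and Recall@K ignore ordering inside the top k, while MRR and NDCG reward placing relevant documents earlier, which is why the metrics are usually reported together.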

The Future Worth Building Is Human

2026-07-11 17:56 UTC

The article argues for AI that extends human will and judgment, emphasizing distributed knowledge, customization, and decentralized alignment to ensure AI serves diverse human needs.

AI should extend human will and judgment, not replace it.
Knowledge is tacit, local, and distributed; AI must be decentralized to benefit from it.

Reverse centaurs are the answer to the AI paradox

2026-07-11 17:23 UTC

Cory Doctorow explores the paradox of AI: why some users love it while others hate it. He introduces the concepts of 'centaurs' (humans assisted by AI) and 'reverse centaurs' (humans used as AI's accountability sink). He argues AI is a bubble that will burst, but productive residue like open-source models will remain. The key is who controls the AI, not the technology itself.

AI can be empowering when humans choose how to use it (centaurs) or oppressive when bosses impose it (reverse centaurs).
The Hearst summer reading guide fiasco exemplifies a reverse centaur scenario where a freelance writer was blamed for AI mistakes.

Show HN: Standalone SearXNG CLI+MCP (no server needed)

2026-07-11 16:49 UTC

SearXNG AI Kit is an AI-enhanced command-line interface, Python library, and MCP server for the SearXNG privacy-respecting metasearch engine, supporting over 180 search engines with standalone binaries available for Linux and macOS.

Provides CLI, Python library, and MCP server with support for 180+ search engines
Features AI chat and advanced research capabilities, configurable output formats

Agentation – Visual UI Annotation for AI Coding Agents

2026-07-11 16:16 UTC

Agentation is a tool that allows users to visually annotate UI elements for AI coding agents. It generates structured annotations containing CSS selectors, file paths, React component trees, and computed styles, enabling agents to precisely locate and fix issues. With MCP integration, agents can interactively query and respond to annotations, turning feedback into a conversation.

Annotate UI elements by clicking and get structured output with CSS selectors, file paths, etc.
Agents via MCP can list, clarify, and resolve annotations conversationally

Free AI Visibility Audit Tool & Agent

2026-07-11 15:59 UTC

This free tool checks whether ChatGPT, Gemini, Claude, Perplexity, Grok, and Google AI can crawl, understand, verify, and cite your website. The report includes full-site crawl inventory, brand entity profile, claim-level evidence ledger, AI intent coverage matrix, technical crawlability audit, schema and structured data plan, trust signal gap analysis, competitor and off-site evidence map, and P0/P1/P2 execution roadmap, with sample cases from ecommerce, AI SaaS, and B2B services.

Free audit tool assesses AI visibility across major AI systems.
Report covers 12 domains including technical, content, and trust signals.

My AI Model Tier List for Mid-2026

2026-07-11 15:43 UTC

A personal, non-benchmark tier list of AI models for coding and auditing as of mid-2026, covering Anthropic Fable, OpenAI Sol, Mistral, Gemini, and DeepSeek, with commentary on US export controls and European perspectives.

Fable (Anthropic) gets a B: fluent but unreliable, prone to hiding bugs.
Sol (OpenAI) gets an S: trustworthy for low-level code and testing.

An educational lab of AI agent architectures

2026-07-11 15:33 UTC

An educational lab of AI agent architectures built on LangChain and local Ollama, offering multiple agent variants for chat, tool calling, RAG, hybrid, and agentic RAG modes.

Multiple AI agent architecture variants covering chat, tool calling, RAG, hybrid, and agentic RAG.
Built on LangChain and local Ollama server, with optional OpenRouter support.

I made AI agents play diplomacy

2026-07-11 15:24 UTC

A GitHub repository that runs a complete game of Diplomacy between seven LLM-controlled powers, documenting negotiations and orders.

Seven AI agents powered by LLMs negotiate and submit orders in the classic board game Diplomacy.
Modular architecture allows easy swapping of game engine and LLM backend.
