This essay examines how AI could shift the balance between centralized and decentralized governance, potentially enabling a new wave of totalitarianism. It reviews historical precedents where communication technologies bolstered authoritarian control, and analyzes structural mechanisms—from Hayekian knowledge problems to selectorate theory—to argue that AI may lower the cost of central planning, surveillance, and propaganda, thereby narrowing the historical performance gap between democracies and dictatorships.
AI could enhance centralized information processing, monitoring, and persuasion, reducing costs of authoritarian rule.
Historical examples show technology cuts both ways: radio and tabulating machines aided the Nazis, while printing and the internet empowered dissent.
MCP Bridge addresses the challenge of making enterprise APIs readable by AI agents through a hybrid search approach and AI Enrichment, which automatically generates meaningful names and descriptions from API response shapes, dramatically improving tool-selection accuracy.
Hybrid search combining FTS and vector search with a reranker improves tool discovery.
Enterprise APIs often have opaque names like 'getProcInfo3' with poor documentation.
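A minimal sketch of how full-text and vector results might be merged before reranking. The summary does not say which fusion method MCP Bridge uses; reciprocal rank fusion is swapped in here as one common choice, and the tool names other than 'getProcInfo3' are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of tool names into one ranked list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, tool in enumerate(ranking, start=1):
            # Each list contributes more for tools it ranks near the top.
            scores[tool] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))

# Hypothetical hits from a full-text index and a vector index.
fts_hits = ["get_invoice", "list_orders", "getProcInfo3"]
vector_hits = ["getProcInfo3", "get_invoice", "get_process_status"]

fused = reciprocal_rank_fusion([fts_hits, vector_hits])
print(fused)
```

A reranker would then rescore only the top few fused candidates, which keeps the more expensive model off the long tail.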
The strongest AI voices are not just people with impressive job titles. They are researchers pushing the technical boundaries of AI. Founders building AI communities. Practitioners turning models into products. Leaders helping businesses understand what this technology can actually do. This article highlights 25 top AI voices appearing at DataHack Summit 2026, including researchers from Google DeepMind, Microsoft AI, and leaders from Walmart, Novartis, and more.
DataHack Summit 2026 will feature 25 influential AI pioneers from research, industry, and academia.
Speakers include Dheeraj Nagaraj (Google DeepMind), Alessandro Romano (Kuehne+Nagel), and others.
The EU's Cyber Resilience Act (CRA) will soon hold organizations accountable for cybersecurity, with reporting obligations starting September 2026 and full compliance by December 2027. The regulation applies to all connected products and software sold in the EU, including AI-generated code. Key requirements include secure-by-design development, lifecycle vulnerability handling, SBOM transparency, and 24-hour reporting of exploited vulnerabilities. Organizations must act now to audit, document, and implement SBOM tools. "The AI did it" is no defense.
The EU Cyber Resilience Act (CRA) imposes strict cybersecurity requirements on all connected products sold in the EU, with key deadlines in 2026 and 2027.
Organizations must integrate security into development lifecycle, provide SBOMs, and report actively exploited vulnerabilities within 24 hours.
After two months, Gemini in Android Auto has made driving safer, more entertaining, and more productive, turning car time into something to look forward to.
Less reaching for phone/screen
Family drives become fun with trivia and interactive stories
BYD unveiled its first self-developed 4nm automotive-grade smart driving chip, Xuanji A3, achieving over 2100 TOPS with three chips combined. The dedicated NPU architecture offers 20% lower power per unit and 100% higher compute utilization compared to general-purpose GPUs. BYD also promises full compensation for accidents during city navigation.
BYD unveils fully self-developed 4nm smart driving chip Xuanji A3
Dedicated NPU delivers 20% lower power per unit and 100% higher compute utilization than general-purpose GPUs
Starting January 1, 2026, over 700 hospitals in the US must manage cost and quality for five high-volume surgical episodes under the CMS TEAM program. Success demands a unified, AI-enabled data platform to enable proactive intervention, with typical outcomes including 15% reduction in SNF costs and 12% reduction in readmissions.
CMS TEAM program mandates bundled payments for five surgical episodes starting January 2026.
Hospitals need a unified data platform integrating clinical, claims, and post-acute data.
TheFoundry is a user-friendly, enterprise-ready Multi-Agent System (MAS) bootstrapping framework that solves critical AI coding failures like token amnesia, infinite loops, and agent collisions. It employs a pull-based workflow, shared kanban board, context scoping, step budgets, deterministic TOML-based communication, and an ephemeral bootstrapper to orchestrate specialized AI agents in building software projects autonomously.
Pull-based workflow: agents read tasks from their own queues rather than receiving pushes, avoiding context loss.
Shared kanban board: agents update team_status.md in real-time for team awareness.
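The pull-based workflow and step budgets described above can be sketched roughly as follows. This is an illustration only, not TheFoundry's implementation: the Agent class, its fields, and the task names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    step_budget: int                              # hypothetical cap on steps per run
    queue: deque = field(default_factory=deque)   # this agent's own task queue
    done: list = field(default_factory=list)

    def run(self):
        """Pull tasks from this agent's queue until it is empty or the budget is spent."""
        steps = 0
        while self.queue and steps < self.step_budget:
            task = self.queue.popleft()           # the agent pulls; nothing is pushed to it
            self.done.append(task)
            steps += 1
        return steps

coder = Agent("coder", step_budget=2)
coder.queue.extend(["write parser", "add tests", "refactor"])
used = coder.run()
print(used, coder.done, list(coder.queue))
```

The budget stops a stuck agent from looping indefinitely, and unfinished tasks stay in the queue for the next run.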
Snyk enters the AI-powered penetration testing market with Evo Continuous Offensive Security (COS), addressing the vulnerability gap created by AI-generated code and agentic attackers. The product offers continuous testing, in contrast to traditional penetration tests that cover roughly 15 days per year, and leverages platform context to find both classic and AI-specific flaws.
Snyk launches Evo COS for continuous AI-powered penetration testing.
Distinguishes between heuristic-detectable and context-dependent vulnerabilities.
Adaptive Runtime is an open-source Python library that provides a runtime intelligence layer for stateful AI systems. It features five core engines (State, Context, Confidence, Decision, Recovery) that address production runtime issues like crash recovery, state persistence, confidence scoring, and more. No GPU required, runs on low-cost VPS.
Adaptive Runtime is a runtime intelligence layer for stateful AI systems, addressing production runtime problems.
It includes five core engines: State, Context, Confidence, Decision, and Recovery.
PPIO has been named to the '2026 Global AI 100' list by FeiFan Research, recognized at the FeiFan Awards – Annual AI Globalization Summit. The list honors AI-native companies with global vision. PPIO offers a global distributed computing infrastructure, full-stack cloud services, a model platform supporting DeepSeek, GLM, MiniMax, Kimi, Qwen, and an innovative Agent Sandbox. As of April 2026, PPIO has integrated over 4,800 distributed nodes, with daily token calls exceeding 1 trillion, over 570,000 developers, and Agent Sandbox business growing more than 50x since launch. PPIO was also designated as a pilot unit for Shanghai's Digital Overseas Service Platform and a GDA Pilot Service Station.
PPIO selected for '2026 Global AI 100', highlighting its leadership in AI globalization.
Provides global distributed computing infrastructure with full GPU coverage for training and inference.
This article examines how AI is deskilling programming, drawing parallels to the transformation of frontend development over the past decade. It discusses deskilling, abstraction, leaky abstractions, and the Bauhaus movement as a potential response.
AI is deskilling programming, much as JS frameworks deskilled frontend development.
Agentic coding is a leaky abstraction, requiring deep understanding to fix issues when abstractions fail.
The author argues that AI, rather than liberating us from bureaucracy, has created a new, unaccountable form of it. While AI excels at mundane tasks like summarizing emails and filing expenses, its inherent lack of understanding of purpose, coupled with safety training that makes it risk-averse, results in a bureaucratic machine that generates 'workslop' and resists governance. The article warns that AI's probabilistic nature and lack of accountability mean that when things go wrong, there is no one to fire.
AI's main value lies in handling routine bureaucratic tasks, but it introduces a new, ungovernable bureaucracy.
Models are trained to be cautious, leading to increased rejections and bland, uniform outputs.
Traditional generative AI’s next-word prediction is risky for legal analysis. Next-generation legal tech combines Neurosymbolic AI with GraphRAG to enforce legal hierarchy and contextual understanding, reducing hallucinations and providing transparent audit trails.
Neurosymbolic AI merges language models with a symbolic logic engine to enforce legal reasoning chains and source hierarchy.
GraphRAG maps legal documents into a knowledge graph for contextual retrieval rather than isolated snippets.
Pond is a lightweight mechanism in Crabbox.sh for grouping related leases, discovering them, and releasing them together. It supports multiple transport planes (Tailscale, URL bridge, SSH mesh) and allows mixing different providers. This article covers the core concepts, quick start, commands, transport planes, use cases, and Tailscale integration.
Pond is a logical grouping of active leases via a shared pond= label.
Supports three transport planes: Tailscale, URL Bridge, and SSH mesh for different communication methods.
Flathub has updated its policy to explicitly ban applications containing AI-generated or AI-assisted code, documentation, or other content. The policy also prohibits AI-generated pull requests or reviews. Exceptions may be granted for mature, well-maintained projects.
Flathub's Generative AI policy now bans AI-generated code in submissions.
AI-generated pull requests, reviews, and automation are disallowed.
AI image tools rarely make me feel like I’m part of the creative process. They are, after all, mostly designed so that people with no design experience can type in a few words and get back a usable result. So I was pleasantly surprised by Adobe's latest take on an AI image assistant: it’s a bot designed to take away some busywork, while still granting you creative control. Unlike AI generators that are specifically designed to make and edit images or video, Adobe's Firefly AI Assistant, which I've been testing in beta, is more like a multitasking middleman that can operate Adobe's design apps for you. On its website, Adobe says that you can “tell Firefly AI Assistant (beta) what you need, and it will use tools from apps like Photoshop, Illustrator, and more to complete multistep projects in moments.”
Adobe's Firefly AI Assistant can operate Photoshop and Illustrator to complete multistep projects.
The assistant explains its editing process in detail and is forthcoming about its limitations.
Cognition has raised over $1B at a $26B valuation led by Lux Capital, General Catalyst, and 8VC. Its AI software engineer Devin has seen enterprise usage grow >10x since the start of the year, with run-rate revenue reaching $492M. Customers like Mercedes-Benz cut an eight-month project to eight days. Cognition is moving toward self-driving software development, with 89% of its own code committed by Devin.
Cognition raises over $1B at $26B valuation in Series D led by Lux Capital, General Catalyst, and 8VC
Devin enterprise usage grows >10x since start of 2025, run-rate revenue hits $492M
From May 25 to 29, ModelBest jointly organized an 'On-Device LLM Open Source Week' with the OpenBMB community, releasing five key technological achievements that form a full-stack closed loop: BitCPM-CANN (1.58-bit low-bit training model supporting Ascend), MiniCPM5-1B (outperforming models twice its size), ForgeTrain (AI-written training framework 10% faster than Megatron), PilotDeck (agent operating system), and UltraData (core dataset). These releases demonstrate that the on-device AI competition is a systemic engineering challenge, not a single technology race. MiniCPM5-1B surpasses parts of GPT-4o, validating the 'density law.' ModelBest's two-year lead and deep tech stack position it as a key player in the shift from cloud to edge.
ModelBest held an On-Device LLM Open Source Week from May 25-29, 2026, releasing one key technology each day.
The five releases cover low-bit training, a compact model, a training framework, an agent OS, and data, showcasing systemic innovation.
Lenovo launches the world's first commercial AI host series, designed for one-person companies (OPC) and growing enterprises. By combining local and cloud resources in a hybrid architecture, it addresses high token costs and data security issues, and it offers generous token bonuses and an out-of-the-box experience.
Lenovo unveils three AI hosts: mini 100, 300, and Pro 700, catering from individuals to teams.
Local inference plus cloud elasticity reduces token costs by 70%-95%.
The next wave of AI creation is hitting gaming. Tencent has unveiled 'Project Craft', an AI-powered game creation platform that lets users generate playable games through natural language, supports 2D and 3D, and comes with AIGC tools and free assets to slash the barrier to game development.
Tencent launches 'Project Craft', an AI game creation platform that generates playable games from natural language prompts
Supports both 2D and 3D games, with a full AIGC pipeline and over 20,000 free assets
Tencent has released Miora, an AI-powered creative studio that integrates image, video, UI/UX, and 3D generation. It features a memory system, multi-modal canvas, and customizable Skills, aiming to enable one person to have a whole creative studio.
Tencent launches Miora, a creative AI agent studio
Supports generation of images, videos, UI/UX, and 3D content
This article examines the security risks of AI coding agents like Claude Code, including command misinterpretation, credential exfiltration, and prompt injection. It highlights the problem of 'permission fatigue' in human oversight, and discusses mitigation strategies such as sandboxing, auto mode, and hooks, emphasizing the need for dev containers and least-privilege principles.
AI agents executing natural language commands can cause disasters like data deletion and credential leaks; human supervision is not foolproof.
PromptLayer is an AI observability tool for developers, offering a unified timeline and waterfall view to trace requests, workflows, token usage, latency, costs, and failures across multi-step AI systems. A free beta is now available.
Visualize AI workflows with timeline and waterfall views
CodePulse is an open-source codebase indexer that saves 60-80% of token budget for AI coding assistants by maintaining a persistent, git-diff-aware index and injecting a compact snapshot at session start. It supports Claude Code, OpenAI Codex CLI, Cursor, and other tools, with features like task-aware ranking, git-aware ranking, and auto budget. It offers CLI, MCP server, and multiple integration methods.
Saves 60-80% of exploration tokens for AI assistants via pre-built snapshots.
Supports multiple AI tools: Claude Code, Codex CLI, Cursor, etc.
Lithium is a hierarchical versioned storage engine built on PostgreSQL ltree, offering deterministic, scoped retrieval, built-in versioning, and zero runtime dependencies. It integrates with AI tools via MCP server, suitable for AI agent memory, decision tracking, and more.
Hierarchical versioned storage using PostgreSQL ltree, faster than graph databases
TypeScript API with scoped retrieval and built-in versioning
The author, frustrated by clipboard sync issues under Wayland, used Claude Code to rewrite the Java project ClipCascade in Rust, creating the lightweight binary clipboardwire. The key insight: the bottleneck was the quality of feedback the AI received, and UI tests became the guardrails that enabled reliable iteration.
Without tests, AI-generated code can chase bugs in circles, fixing one while breaking another.
Investing in a comprehensive test suite (including UI tests) transformed the AI's reliability and speed.
This article describes a macroeconomic research agent built with Deep Agents, LangSmith, and the You.com Finance Research API. It analyzes GDP data across all 27 EU member states, detects anomalies, and produces a cited briefing in approximately 45 minutes. The report details the anomalous growth in Ireland and contraction in Germany, emphasizing the importance of traceability and auditability.
The AI agent analyzes GDP data for all 27 EU countries in about 45 minutes at an API cost of roughly $2.20.
Ireland's 12.3% GDP growth is driven by pharma export front-loading, while Germany faces structural contraction from automotive and construction sectors.
This paper presents a dual-drone docking platform where two quadrotors operate in a leader-follower formation and dock using a lightweight modular frame with passive magnetic latching. A progress-aware mission supervisor manages phase transitions: approach, alignment, capture, and settle. The platform integrates a complete hardware-software stack (ROS 2 with Crazyflie/PX4 interfaces) and is evaluated in simulation and real-world experiments using quantitative metrics such as formation error, docking success rate, and time-to-dock.
Dual-drone midair docking platform with leader-follower formation and passive magnetic latching.
Progress-aware mission supervisor overseeing approach, alignment, capture, and settle phases.
The Open Motion Planning Library (OMPL), first released in 2008, has become a cornerstone of the motion planning community, providing implementations of a wide range of state-of-the-art sampling-based algorithms. Building on almost two decades of continuous development, OMPL 2.0 targets real-time motion planning through hardware acceleration and integrates seamlessly with modern AI research workflows.
OMPL 2.0 is a major upgrade focusing on real-time motion planning and hardware acceleration.
The new version integrates with modern AI research tools for more efficient workflows.
This paper introduces the 'Bionic Swarm,' a human-in-the-loop system that lowers barriers to real-world validation of swarm robotics. It uses a smartphone web-app, Bluetooth sensors, and a centralized server to direct human users. The Score-Biased-Search algorithm for soil mapping demonstrates superlinear map reconstruction in both simulations and outdoor experiments.
Bionic Swarm system reduces hardware cost and development time by delegating difficult tasks to humans.
Score-Biased-Search algorithm assigns scores to map locations for efficient soil mapping.
This paper proposes S3MEM, a structured scene-event memory framework for long-horizon interactive question answering. By writing trajectories into structured memory units, using anchor-sensitive retrieval, and exposing a compact token-budget-aware evidence interface, S3MEM significantly improves accuracy and efficiency in answering questions about early events. Experiments on multiple environments show that S3MEM achieves a better accuracy-efficiency frontier than existing methods.
S3MEM writes trajectories into structured memory units and retrieves evidence via anchor-sensitive retrieval with token-budget awareness.
It outperforms Vanilla RAG across Crafter, Jericho, SciWorld, and ALFWorld, surpassing Graph-NoReader on three environments while using fewer evidence tokens.
This paper studies self-play reinforcement learning in the four-player imperfect-information card game Big 2. PPO outperforms Monte Carlo Q approximation, SARSA, and Q-learning against various opponents. Moderate entropy regularization improves PPO by preventing overly deterministic policies, and current-policy self-play provides a stronger finite-budget curriculum than alternatives.
Self-play RL framework developed for Big 2, a four-player imperfect-information game.
PPO consistently outperforms value-approximating methods across opponent types.
Yukihiro Matsumoto, creator of Ruby, is building Spinel, an experimental ahead-of-time compiler for Ruby with AI assistance from Anthropic's Claude. Spinel compiles Ruby to C code, achieving significant performance gains but with many limitations including unsupported features like eval and threads.
Matz uses Anthropic's Claude Code to develop Spinel, an AOT compiler for Ruby.
Spinel converts Ruby AST to C code, resulting in 11.6x faster execution than MiniRuby.
repo-brain is an open-source tool that compresses an entire codebase into a single Markdown context file, achieving up to 96% compression and significantly reducing AI token usage. It supports static analysis, architecture analysis, semantic relationships, and multiple AI providers.
Compress entire codebase into a single Markdown context file to reduce AI token usage
Achieved 96% compression on a 262-file repo (154,229 to 6,487 tokens)
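As a quick check on the reported figure, the token counts above give a reduction of about 95.8%, which rounds to the quoted 96%. The variable names below are illustrative.

```python
tokens_before = 154_229   # tokens in the original 262-file repo, as reported
tokens_after = 6_487      # tokens in the compressed Markdown context file

compression = 1 - tokens_after / tokens_before   # fraction of tokens removed
ratio = tokens_before / tokens_after             # how many times smaller

print(f"{compression:.1%} fewer tokens, about {ratio:.1f}x smaller")
```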
Anthropic raises $65B in Series H at $965B post-money valuation and reports $47B run-rate revenue, while releasing Claude Opus 4.8 with improved judgment and honesty, and launching Dynamic Workflows for parallel multi-agent tasks in Claude Code.
Anthropic raised $65B at $965B valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia
Opus 4.8 delivers sharper judgment, more honesty, and efficiency gains, beating GPT-5.5 on several benchmarks
ReadyToTalk is an AI receptionist designed for small businesses. It answers every call in under 2 seconds, provides 24/7 coverage, supports 30+ languages, and learns your business from your website. Priced at $39/month with a 7-day free trial, it requires no technical skills to set up.
Answers every call in under 2 seconds, 24/7/365.
Supports 30+ languages with automatic language detection.
Dis Dat is a tool that lets you visually show anything to your AI coding agent, enhancing communication and code generation. It positions itself as 'Loom for AI agents'.
The article examines the limitations of Genspark, an AI presentation tool, and presents six alternatives for 2026, including Smallppt, Plus AI, Prezi, Vector Shift, Beautiful.ai, and ClickUp, each with distinct strengths to help users choose based on their needs.
Genspark has security vulnerabilities, poor customer support, and limited content flexibility.
Smallppt and Beautiful.ai focus on quick professional slide creation with strong design automation.
theta-spec is a declarative, harness-agnostic configuration standard for AI coding agents. A single theta.toml file defines the full configuration surface (instructions, rules, tools, skills, subagents). A protocol is specified for the lifecycle of this configuration file, and any theta-spec compliant implementation can resolve, lock, and cast it to any supported harness. The project includes a reference Rust CLI (theta) and supports harnesses like Claude Code, Codex CLI, Cursor, and GitHub Copilot.
Declarative, harness-agnostic config standard for AI agents.
Supports Claude Code, Codex CLI, Cursor 3+, GitHub Copilot.
Ken Griffin did a 180 on AI after seeing agents complete complex work in hours. This raises concerns about GDP growth without job growth, challenging the traditional use of GDP as an economic health indicator.
Ken Griffin initially dismissed AI output as 'garbage' but later reversed his stance.
AI agents completed work in hours that took Citadel employees weeks or months.
Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem. This article details optimizations including TensorRT multi-profile encoders, conditional CUDA graphs, shared memory, evented I/O, and gc.freeze() to eliminate tail latency.
Together AI achieved fastest STT by optimizing the entire system path, not just GPU inference.
Key techniques: TensorRT multi-profile encoders, conditional CUDA graphs, zero-copy shared memory, and evented I/O.
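Of the techniques listed, gc.freeze() is the easiest to show in isolation. The sketch below is not Together AI's code; warm_up() is a hypothetical stand-in for loading models and building long-lived state. The idea is to freeze startup objects so later garbage-collection passes do not rescan them.

```python
import gc

def warm_up():
    """Hypothetical stand-in for loading models and building long-lived caches."""
    return [[i] for i in range(10_000)]

cache = warm_up()

gc.collect()   # clear garbage created during startup first
gc.freeze()    # move every currently tracked object into the permanent generation
frozen = gc.get_freeze_count()

print(f"{frozen} objects frozen; later collections will skip them")
```

Collections during request handling then only walk objects allocated after the freeze, which shortens the pauses that show up as tail latency.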
This article explores the practical application of reinforcement learning in post-training large language models, highlighting that the current bottleneck is infrastructure rather than algorithms. Modal shares its experience running RL post-training at scale and introduces its open-source library to help teams address key challenges like multi-node training, environment management, and GPU utilization.
The bottleneck for RL post-training of LLMs is infrastructure, including training engines, inference sandboxes, and environment isolation.
Multi-node training makes weight synchronization costly; RDMA and delta compression significantly reduce latency.
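To illustrate why delta compression helps weight synchronization, here is a toy sketch that sends only the entries that changed between two versions. The summary does not describe Modal's actual scheme, so the functions, the sparse-dictionary format, and the sample values are all assumptions.

```python
def make_delta(old, new, tol=0.0):
    """Return {index: new_value} for entries that changed by more than tol."""
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if abs(n - o) > tol}

def apply_delta(old, delta):
    """Rebuild the new weights on the receiving side from old weights plus the delta."""
    updated = list(old)
    for i, value in delta.items():
        updated[i] = value
    return updated

# Hypothetical weights before and after one training step.
trainer_weights_v1 = [0.10, -0.20, 0.30, 0.40]
trainer_weights_v2 = [0.10, -0.25, 0.30, 0.45]

delta = make_delta(trainer_weights_v1, trainer_weights_v2)
inference_weights = apply_delta(trainer_weights_v1, delta)

print(delta, inference_weights)
```

Only two of the four values cross the network here. A real system would work on tensors and would likely quantize or encode the delta further.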
Serenity is an open-source, local AI agent that uses a brain-inspired memory architecture called Neural Node Network. It remembers causal relationships, reasons across domains, operates autonomously, and runs entirely on your machine without cloud dependencies.
Neural Node Network encodes experiences in causal format, enabling contextual understanding
Operates 100% locally with Ollama, ensuring privacy and no cloud dependencies
Liquid AI has released LFM2.5-8B-A1B, an on-device Mixture-of-Experts model designed for tool calling. With 8.3B total parameters but only 1.5B active per token, it runs on consumer hardware. It features a 128K context window, reasoning capabilities, and nine-language support. Benchmarks show significant improvements over its predecessor, including a jump in non-hallucination rate from 7.46 to 63.47.
LFM2.5-8B-A1B activates only 1.5B of 8.3B total parameters per token, enabling efficient on-device inference.
Supports 128K context length and covers nine languages, including Arabic, Chinese, and Japanese.
Key announcements from Open House: ClickStack Cloud (fully managed serverless observability) enters private preview, Managed ClickStack reaches GA, AI Notebooks (structured investigative workspace) enters beta, and the ClickStack MCP server is open-sourced for external agents.
AI-powered coding tools have reached advanced autonomy, enabling anyone to build software, but the underlying infrastructure remains outdated, leading to inefficiencies. A new AI-native operating system is needed.
AI coding tools like Claude Code and Cursor are at L3-L4 autonomy.
Infrastructure lags at L1-L2, with isolated agents and idle resources.
This article argues that using AI chatbots as 'thought partners' can be harmful due to sycophancy, cognitive bias amplification, and lack of adversarial balance. The author warns users to be cautious and calls for labs and regulators to protect cognitive integrity.
AI chatbots tend to sycophantically agree with users, reinforcing biases.
Human-AI feedback loops amplify cognitive biases more than human-human interactions.
The rise of AI in software engineering has rendered traditional interview processes obsolete. While AI tools are now integral to daily coding work, most companies still ban AI in interviews, creating a mismatch between tested skills and actual job requirements. Some employers are adopting new approaches, but the problem remains largely unsolved.
AI has become essential for software engineers, but interview processes have not adapted.
Traditional coding tests fail to evaluate AI collaboration and high-level decision-making.
Perplexity released an open-source developer security tool called Bumblebee, designed to scan programmers' laptops for risky packages, extensions, and AI tool configurations. It is read-only, never runs install scripts or package managers, and focuses on four attack surfaces: language package managers, AI agent configs, editor extensions, and browser extensions. Unlike Chainguard, which focuses on containers and pipelines, Bumblebee targets the developer's local environment.
Bumblebee is Perplexity's open-source read-only scanner for checking developer machines for risky components.
It covers four surfaces: language package managers, AI agent configs, editor extensions, and browser extensions.
At Google I/O 2026, Google Research showcased breakthroughs in scientific discovery, health, edge computing, and weather prediction. Highlights include Gemini for Science (ERA, Co-Scientist), Google Health app, Symptom AI, AMIE, Coral NPU, and AI for extreme weather. These innovations demonstrate AI's potential to amplify human ingenuity.
Google launched Gemini for Science with ERA and Co-Scientist to accelerate scientific discovery.
Health advancements include Google Health app, Symptom AI, and AMIE improving clinical care.
Learn how to build a custom portal embedding SageMaker AI MLflow Apps UI using a React frontend and Flask reverse proxy with AWS SigV4 authentication, deployed via AWS CDK. This solution provides a persistent, bookmarkable URL for MLflow without requiring presigned URLs or AWS Console access.
React frontend with Flask reverse proxy for SigV4 authentication.
This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. You will learn how to apply five evaluation patterns for deep agents, build offline evaluations using pytest and LangSmith, and configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent with Amazon Bedrock for the full development to production lifecycle.
Agent evaluations face challenges: non-determinism, error propagation, and agents reaching valid solutions that graders did not anticipate.
Introduces three grader types: code-based, model-based (LLM-as-judge), and human graders, with recommendations for combining them.
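A code-based grader for a text-to-SQL agent could look roughly like the sketch below. It is not taken from the post: the orders table, the grade_sql helper, and the sample queries are hypothetical, and an in-memory SQLite database stands in for the real data source.

```python
import sqlite3

def grade_sql(generated_sql, reference_sql, conn):
    """Code-based grader: pass if both queries return the same rows, ignoring order."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False                      # invalid SQL counts as a failure
    expected = conn.execute(reference_sql).fetchall()
    return sorted(got) == sorted(expected)

# Small in-memory fixture standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 7.25)])

def test_agent_sql_matches_reference():
    # In a real suite, agent_sql would come from the agent under test.
    agent_sql = "SELECT id FROM orders WHERE total > 9 ORDER BY id DESC"
    reference = "SELECT id FROM orders WHERE total > 9"
    assert grade_sql(agent_sql, reference, conn)
```

Comparing result sets instead of SQL text lets the agent pass with any query that returns the right rows. A function named test_... like this one is collected by pytest automatically.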
With the launch of new agentic AI capabilities, CoreWeave is using software acquisitions to develop an AI hardware-software stack for agent training and inference.
CoreWeave launches new agentic AI capabilities
Uses software acquisitions to build an AI hardware-software stack
A federal judge's anonymous misconduct report was quickly deanonymized by AI models, revealing Judge Eleanor Ross. The judiciary's naive anonymization efforts failed against AI's ability to cross-reference public details. This case highlights the urgent need for lawyers to understand what AI can do, both when maintaining confidentiality and when conducting investigations.
AI identified Judge Eleanor Ross from an anonymized report within minutes.
Details like two-year clerk terms and 'District Attorney' references enabled AI to narrow down the candidates.
Enterprise leaders share five practices for scaling AI agents responsibly, including unified governance, complex workflow management, dedicated sandboxes, early wins, and workforce upskilling.
Embed unified governance into AI agent strategy
Manage complex workflows with orchestrated multi-agent frameworks
A curated list of global resistance movements against large-scale AI empires, featuring protests, legal actions, alternative tools, and community organizing to inspire hope and action.
AI empires disguise resource consolidation and control as benefiting humanity.
Resistance takes many forms: lawsuits, data poisoning, community campaigns, and worker organizing.
Databricks positions Unity Catalog as the most comprehensive, interoperable, and production-ready Apache Iceberg catalog, with Managed Iceberg, Iceberg v3, and Foreign Iceberg now GA. It highlights five key capabilities: open APIs, catalog federation, cross-engine access control, zero-copy secure sharing, and AI-driven optimization. Future Iceberg v4 and Delta 5.0 will converge on a unified metadata structure.
Unity Catalog now supports Managed Iceberg, Iceberg v3, and Foreign Iceberg in GA.
Five key capabilities: open APIs, catalog federation, cross-engine ABAC, zero-copy secure sharing, and AI-driven optimization.
The article explores the shift from tightly coupled local developer workflows to asynchronous background agents in AI coding, highlighting the December 2025 model inflection that made spec-to-PR workflows practical, and delving into the architecture, security, testing, memory, and multi-agent orchestration behind Devin and OpenInspect.
Background agents are becoming mainstream; Devin's merged PR share grew from 16% to 80% on Cognition repos.
The December 2025 model upgrades (Opus 4.5/GPT 5.2) enabled agents to autonomously go from specification to a complete pull request.
AWS launched a near-total rebuild of OpenSearch Serverless to handle bursty agent workloads, separating storage and compute to scale to zero, cut costs by 60%, and auto-scale 20x faster. New features include GPU acceleration, search/vector collections, integrations with Vercel and Kiro IDE, and a roadmap for agent memory and log analytics.
AWS rebuilt 97% of OpenSearch Serverless with a new storage layer separating storage and compute, enabling zero-cost idle scaling.
The new architecture targets AI agent burst workloads with 20x faster auto-scaling and 60% cost savings.
Agent evaluation is most powerful when combining fast-moving online signals with stable offline baselines. Amazon Bedrock AgentCore's dataset management provides versioned test fixtures, enabling consistent measurement and ground truth verification.
Versioned datasets in AgentCore provide stable, immutable test scenarios for consistent agent evaluation across runs.
Predefined scenarios capture exact expected inputs, tool sequences, and assertions for verifiable ground truth.
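As a rough illustration of what a versioned, immutable test scenario can look like, here is a minimal Python sketch. It does not use the AgentCore API; the `Scenario`, `DatasetVersion`, and `evaluate` names, the tool names, and the sample input are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Scenario:
    """One predefined test case (hypothetical shape, not an AgentCore type)."""
    name: str
    user_input: str
    expected_tools: Tuple[str, ...]                      # exact tool sequence
    assertions: Tuple[Callable[[str], bool], ...] = ()   # checks on the final answer


@dataclass(frozen=True)
class DatasetVersion:
    """An immutable, versioned collection of scenarios."""
    version: str
    scenarios: Tuple[Scenario, ...]


def evaluate(scenario: Scenario, tools_called: Sequence[str], final_answer: str) -> bool:
    """Pass only if the tool sequence matches exactly and every assertion holds."""
    tools_ok = tuple(tools_called) == scenario.expected_tools
    checks_ok = all(check(final_answer) for check in scenario.assertions)
    return tools_ok and checks_ok


# Hypothetical scenario used purely for illustration.
refund = Scenario(
    name="refund_lookup",
    user_input="Where is my refund for order 1234?",
    expected_tools=("lookup_order", "get_refund_status"),
    assertions=(lambda answer: "refund" in answer.lower(),),
)
DATASET_V1 = DatasetVersion(version="v1", scenarios=(refund,))
```

Freezing the dataclasses is one simple way to keep a fixture from drifting between runs: a changed scenario has to be published as a new version rather than edited in place.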
SIA is an open-source self-improving AI framework that autonomously boosts AI system performance on benchmark tasks by coordinating meta, target, and feedback agents. It achieves significant gains: 56.6% on LawBench, 91.9% runtime reduction on GPU kernels, 502% improvement on scRNA denoising, and ranks #1 on MLE-Bench Hard. Supports local execution and custom tasks. MIT licensed.
SIA uses an iterative loop of meta, target, and feedback agents for autonomous self-improvement.
Achieves substantial performance gains across LawBench, GPU kernel optimization, scRNA denoising, and MLE-Bench.
Micron crossed $1 trillion market cap on May 26-27, joining SK Hynix in the same week as the first pure-play memory chipmakers to enter the trillion-dollar club. Driven by HBM demand from agentic AI workloads, UBS tripled its price target to $1,625 citing long-term supply contracts. Micron stock has more than tripled year-to-date.
Micron and SK Hynix both hit $1T market cap in the same week, a first for pure-play memory chipmakers
As of mid-2026, seven major AI agent frameworks (DSPy, Claude Agent SDK, OpenAI Agents SDK, CrewAI, AutoGen, LangGraph, Google ADK) differ in design philosophy, architecture, and production readiness. LangGraph leads in production deployments, Claude Agent SDK offers the deepest single-provider capabilities, OpenAI Agents SDK provides the cleanest multi-agent handoffs, and CrewAI excels in developer velocity. The market is projected to grow from $7.84B in 2025 to $52.62B by 2030.
LangGraph has the most mature durable execution model, deployed by ~400 enterprises.
Claude Agent SDK offers the most powerful single-provider capabilities but is locked to Anthropic models.
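The market projection in this item ($7.84B in 2025 to $52.62B by 2030) implies a compound annual growth rate that the summary does not state. A small sketch of that arithmetic, assuming five annual compounding periods:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the summary; the five-year span is 2025 -> 2030.
implied_rate = cagr(7.84, 52.62, 2030 - 2025)
print(f"Implied CAGR: {implied_rate:.1%}")
```

Under that assumption the projection works out to a growth rate of a little over 46% per year.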
Anthropic's latest Claude model, Opus 4.8, emphasizes honesty—making fewer unsupported claims and admitting uncertainty more often. It also introduces dynamic workflows for orchestrating hundreds of subagents on large-scale tasks. Pricing remains unchanged for standard mode, while fast mode gets cheaper.
Claude Opus 4.8 shows significant honesty improvements, with error rates dropping about 4x
Dynamic workflows can plan and run hundreds of parallel subagents, verifying outputs before reporting back
This post demonstrates that integration in action by automating one of the most labor-intensive workflows in financial services: anti-money laundering (AML) alert triage. You will build a triage workflow using Amazon Quick Flows and Snowflake Cortex, connected through the Amazon Quick Model Context Protocol (MCP) integration. In our testing environment, automated workflows built using Amazon Quick reduced alert investigation time from 30-90 minutes to under 5 minutes. Actual results may vary based on alert complexity and data volume.
Amazon Quick Flows and Snowflake Cortex integrate via MCP to automate AML alert triage.
Automated workflows reduced investigation time from 30-90 minutes to under 5 minutes.
Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.
Open-source AI system for enterprise data analytics
Data Connectors support governed, reusable connections across diverse data sources
Fireworks AI launches Serverless 2.0, offering Standard, Priority, and Fast inference paths through a single API without reserved capacity. The Priority path provides stronger request admission under congestion, while the Fast path delivers roughly 2x throughput. The update also clarifies error codes by separating load shedding (503) from rate limits (429), improving retry logic and alerting.
Serverless 2.0 introduces three serving intents: Standard (default), Priority (stronger admission under load), and Fast (higher token throughput).
Priority achieved 0% 503 error rate in peak-load testing versus 0.082% for Standard.
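Separating 503 (load shedding) from 429 (rate limits) lets a client treat the two cases differently. The sketch below is a generic client-side retry policy, not Fireworks code; the function name, the backoff constants, and the assumption that a `Retry-After` value may accompany a 429 are all illustrative.

```python
import random
from typing import Optional, Tuple


def retry_decision(status: int, attempt: int, retry_after: Optional[float] = None,
                   base: float = 0.5, cap: float = 30.0) -> Tuple[bool, float]:
    """Return (should_retry, delay_seconds) for an HTTP status code.

    attempt is zero-based; base and cap are hypothetical tuning values.
    """
    backoff = min(cap, base * 2 ** attempt)
    if status == 429:
        # Rate limit: the caller exceeded its quota. Honor the server hint
        # if one was supplied, otherwise use deterministic exponential backoff.
        return True, retry_after if retry_after is not None else backoff
    if status == 503:
        # Load shedding: the service is congested. Full jitter spreads
        # retries out so clients do not all return at the same moment.
        return True, random.uniform(0.0, backoff)
    # Anything else is treated as non-retryable in this sketch.
    return False, 0.0
```

Keeping the two branches separate also makes alerting simpler: a rising 429 count points at the caller's own request rate, while a rising 503 count points at provider-side congestion.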
Anthropic announces $65 billion Series H funding led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, with a post-money valuation of $965 billion. The company plans to use the funds to advance AI safety research, expand compute capacity, and scale product development.
Anthropic raises $65 billion in Series H at $965 billion valuation
Run-rate revenue crosses $47 billion as enterprise adoption grows
Today we’re launching Dubbing v2, our revolutionary new AI dubbing model. For the first time, the emotion and performance of the original speaker carry across every language. Instead of generating flat, disconnected audio from a transcript alone, Dubbing v2 conditions directly on the original performance, preserving tone, pacing, delivery, and emotional intent. This solves one of the biggest unsolved problems in AI dubbing: making translated speech feel like the original person actually said it.
Dubbing v2 preserves the original speaker's emotion and performance across 90+ languages
Conditions on original audio, not just transcript, for natural delivery
KeptWell is an AI-powered platform that helps families organize, understand, and share medical records. It extracts key information, tracks lab trends, generates appointment questions, and enables family collaboration. Privacy-focused, no ads, and data exportable.
Built by the founder after their mother's cancer diagnosis to simplify medical information management.
Supports upload of PDFs, images, and voice recordings; AI extracts key findings and lab values.
A new study led by Penn State researchers found that AI-powered chatbots answer everyday health questions with about 76% accuracy, raising concerns about their trustworthiness in real-world applications. The study, which involved a Diagnose-a-thon competition and evaluation by board-certified physicians, found that AI performed best in obstetrics and otolaryngology, but poorly in internal medicine, neurology, and dermatology. Researchers suggest AI tools may be more useful for physicians than patients.
LLM responses to health queries were 76.2% accurate overall, but error rates exceeded 20%, roughly double that of human physicians.
AI performed best in obstetrics/gynecology and otolaryngology, and worst in internal medicine, neurology, and dermatology.
A new study introduces StoryScope, a method that distinguishes AI-generated from human-written stories by analyzing narrative structure rather than writing style. Using a corpus of 61,608 stories with 304 features each, the approach achieves 93.2% macro-F1 for human vs. AI detection and reveals distinct narrative fingerprints for different LLMs like Claude, GPT, and Gemini.
StoryScope extracts discourse-level narrative features (e.g., character agency, temporal discontinuity) to differentiate AI fiction from human writing, without relying on stylistic cues.
On 61,608 stories (~5,000 words each), narrative features alone achieve 93.2% macro-F1 for human vs. AI detection and 68.4% for six-way authorship attribution.
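For readers unfamiliar with the metric, macro-F1 is the unweighted mean of per-class F1 scores, so a minority class counts as much as a majority one. A minimal sketch with made-up labels (the data below is illustrative and unrelated to the StoryScope corpus):

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: three human stories and one AI story, one human story misjudged.
truth = ["human", "human", "human", "ai"]
guess = ["human", "human", "ai", "ai"]
toy_score = macro_f1(truth, guess)
```

In the toy example the "human" class scores 0.8 and the "ai" class scores 2/3, so the macro average is about 0.733 even though plain accuracy is 0.75.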
Researchers conducted an AI-assisted security audit of the FreeBSD kernel, uncovering 15 bugs including 5 local privilege escalations and a bhyve guest-to-host escape. They published exploits for three LPEs and shared their methodology to help maintainers.
AI-powered audit of FreeBSD kernel found 15 vulnerabilities
Includes 5 LPEs, 1 VM escape, memory disclosures, and DoS
The article presents multiple lines of evidence, including statistical analysis of punctuation and word usage, and results from an AI detection tool, to argue that Pope Leo's first encyclical on AI contains substantial portions written by AI, likely Claude. The author acknowledges each piece of evidence might be explained away but argues the consilience is hard to dismiss.
The encyclical uses em-dashes and the word 'genuinely' at rates far exceeding any previous encyclical.
AI detection tool Pangram flagged several paragraphs as 40-100% AI-generated, while none of the backtested past encyclicals were flagged.
Researchers propose dynamic symmetry, quantified by dynamic isotropy, as a measure of uniformity in a robot's attainable center-of-mass accelerations. Through simulations and physical experiments, high dynamic symmetry improves trajectory tracking, task success, robustness, resilience, and energy efficiency. The Argus family of spherical robots, especially a 20-legged variant with near-extreme dynamic isotropy, demonstrates orientation-invariant locomotion, agile terrain traversal, rapid self-stabilization, and resilience to actuator failures.
Dynamic symmetry is defined as uniformity of a robot's attainable center-of-mass accelerations, measured via dynamic isotropy.
Over 1,000 simulated morphologies show high dynamic symmetry consistently improves performance, with benefits peaking near the theoretical limit.
This paper introduces GeRaF 2.0, a unified framework integrating Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) neural geometry reconstruction, leveraging LoS geometry to guide RF propagation for stable and physically consistent 3D reconstruction of hidden scenes, achieving state-of-the-art results.
RF signals can penetrate occlusions but suffer from low resolution and noise.
Existing NLoS reconstruction methods ignore LoS constraints, causing unstable optimization and surface ambiguity.
This paper proposes two lightweight face forgery detectors, LFWS and LFWL, built on Xception (21.9M params) by adding a fusion module with only 292 extra parameters. They combine wavelet-denoised features with phase spectrum or local binary patterns, boosting AUC by 3.8 and 4.4 percentage points on FaceForensics++ and DFDC-Preview, respectively, outperforming larger models like F3Net and SRM across eight benchmarks.
LFWS and LFWL add only 292 parameters to Xception, keeping total at 21.9M, smaller than F3Net (22.5M) and less than half of SRM (55.3M).
AUC improves from 74.8% to 78.6% on FaceForensics++ and from 70.5% to 74.9% on DFDC-Preview, gains of 3.8 and 4.4 percentage points.
This paper proposes a self-supervised enhancement framework for Sentinel-1 Stripmap SAR imagery using azimuth subaperture decomposition. It generates training data without external sensors or simulated ground truth, integrates single- and multi-frame learning, and employs iterative inference. Experiments show it outperforms MERLIN in PSNR and SSIM, while MERLIN achieves higher ENL, highlighting a trade-off between structural fidelity and speckle smoothing.
Self-supervised SAR enhancement via azimuth subaperture decomposition
No external sensors or simulated ground truth needed
This paper audits evaluation protocols for training-free shape descriptors by introducing Diffused Geodesic Moments (DGM). Experiments show that Geometric Moment Shape Descriptor based on Heat Kernel Signature (GMSD-HKS) achieves the highest scores on FAUST-Reg and TOSCA, while Wave Kernel Signature (WKS) remains strong. DGM is valuable for sparse or non-spectral applications. The work provides a reproducible protocol-cascade analysis, cross-shape alignment diagnostic, and recommendations for designing and reporting training-free descriptors.
Introduces Diffused Geodesic Moments (DGM) as a training-free descriptor for protocol audit
GMSD-HKS outperforms other methods on FAUST-Reg and TOSCA; WKS remains competitive
Trelk is a one-time purchase, privacy-first app that uses on-device AI to save, organize, and connect articles, papers, and notes. Features include hybrid search, knowledge graph, RAG chat, flashcard spaced repetition, and community collections.
One-time purchase, no subscriptions
On-device AI-powered knowledge management and connection
This article by Johannes Link and Jakob Schnell explores the ethical dimensions of generative AI (GenAI), focusing on large language models. It highlights both promises and harms, including ecological impact, misinformation, threats to education and democracy, and digital colonialism. The authors argue for a balanced, informed approach that weighs benefits against risks, often requiring trade-offs.
GenAI has significant downsides: massive energy use, e-waste, misinformation, and IP issues.
LLMs lack true reasoning and are prone to hallucinations; they cannot distinguish truth from falsehood.
Richard Thackeray and Phil Snell respond to an article by Wendy Liu on using artificial intelligence, arguing that AI enhances curiosity rather than diminishing it.
Wendy Liu raises concerns about labour redundancies, hype, and environmental cost of AI.
Richard Thackeray, a heavy AI user, finds AI makes him more curious and enables exploration of new territory.
Google's Preferred Sources feature is now available in AI Overviews and AI Mode, allowing you to add your favorite sites to appear more prominently in AI-powered searches, along with new carousel and 'Highly Cited' badges.
Google's Preferred Sources feature now works with AI Overviews and AI Mode.
You can add favorite news sites to make them more prominent in AI search results.
Anthropic's Claude Opus 4.8 prioritizes reliability, honesty, and agentic workflows over raw intelligence. Pricing remains unchanged, but fast mode is significantly cheaper.
Claude Opus 4.8 focuses on reliability and uncertainty handling rather than raw intelligence.
Standard pricing remains at $5/$25 per million tokens; fast mode is three times cheaper.
A new review paper argues that the real bottleneck for autonomous AI agents is the software layer around the language model—tools, memory, testing, and permissions. DeepSeek is building a dedicated 'Harness' team in Beijing, confirming the formula: model + harness = AI agent.
The paper claims the bottleneck for AI agents is the software harness, not the model.
Key components include tools, memory, testing, and permission boundaries.
The article discusses the limitations of open-weight AI models and open protocols as open source strategies, using Anthropic's acquisition of Stainless as a case study to illustrate complement capture and moat migration in AI infrastructure. It argues that the developer experience layer is being consolidated by platform giants, creating new competitive advantages, and emphasizes the need to analyze dependencies within the ecosystem to identify potential chokepoints.
Open-weight models as open source strategy face limitations due to hardware requirements and monolithic architectures.
Anthropic's acquisition of Stainless exemplifies complement capture, where the layer around an open protocol is privatized.
Anthropic has released Claude Opus 4.8, an upgrade to Opus 4.7 with improvements in coding, agent work, reasoning, and knowledge work. New features include effort control, dynamic workflows, and live Messages API updates. Pricing remains unchanged at $5/$25 per million tokens for standard and $10/$50 for fast mode (2.5x speed). Early testers report cost parity with GPT-5.5 and fewer tool steps. The company also outlined its roadmap including Mythos-class models and Project Glasswing for cybersecurity.
Claude Opus 4.8 improves on Opus 4.7 in coding, agent work, reasoning, and knowledge work.
New features: effort control, dynamic workflows, and live Messages API updates.
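To make the quoted prices concrete, here is a small cost estimate. It assumes the "$5/$25" and "$10/$50" pairs are input and output prices per million tokens, which is the usual convention but is not spelled out in the summary; the function name and token counts are hypothetical.

```python
# (input_price, output_price) in USD per million tokens, as read from the summary.
PRICES_PER_MILLION = {
    "standard": (5.0, 25.0),
    "fast": (10.0, 50.0),
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimated USD cost of one request under the assumed input/output pricing."""
    input_price, output_price = PRICES_PER_MILLION[mode]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical agent run: 200k tokens read, 20k tokens written.
standard_cost = request_cost(200_000, 20_000, mode="standard")
fast_cost = request_cost(200_000, 20_000, mode="fast")
```

At these list prices fast mode costs exactly twice as much per token as standard mode, so the trade-off is a doubled bill for the quoted 2.5x speed.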
Image Empire is an animated fairytale about the fusion of the real and the virtual within contemporary AI models. The film forms part of a research project undertaken by Alan Warburton which also includes a research paper and a series of satellite events.
The film is based on doctoral research at Birkbeck's Vasari Centre for Art & Technology.
Commissioned by the National Videogame Museum in collaboration with ODI and Cambridge's Leverhulme Centre for the Future of Intelligence.
NexusCortex is a sparse AI cortex system built in Go, distinct from traditional LLMs. It leverages sparse computation for efficient inference, potentially rivaling Opus 4.8.
Hexo Labs released SIA, an open-source self-improving loop, under an MIT license. A Feedback-Agent reads each run's trajectory, then either rewrites the scaffold or triggers a LoRA weight update on gpt-oss-120b. Combining both levers beat scaffold-only iteration on LawBench, TriMul GPU kernels, and scRNA-seq denoising.
SIA is the first self-improving loop that edits both an agent's scaffold and its model weights.
On LawBench, combining weight updates boosted accuracy from 50.0% (harness-only) to 70.1%.
This paper presents a phase-conditioned, force-aware framework for robust deformable object manipulation. Using FiLM-conditioned ACT encoder and multi-modal phase predictor, the system autonomously detects and recovers from contact failures, improving T-shirt hanging success rate from 56% to 87%.
Standard imitation learning (e.g., ACT) suffers from state aliasing due to Markovian assumption, preventing autonomous failure recovery.
The proposed framework uses FiLM-conditioned encoder to enable phase-specific behaviors in a single policy.
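FiLM (feature-wise linear modulation) scales and shifts each feature channel using values computed from a conditioning signal, which is how one policy can behave differently per phase. The sketch below shows only the generic operation in NumPy; the one-hot phase encoding, the dimensions, and the linear maps are assumptions, and the paper's actual encoder is not described in this summary.

```python
import numpy as np


def film(features: np.ndarray, condition: np.ndarray,
         w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """Apply FiLM: gamma(condition) * features + beta(condition)."""
    gamma = condition @ w_gamma   # per-channel scale derived from the condition
    beta = condition @ w_beta     # per-channel shift derived from the condition
    return gamma * features + beta


rng = np.random.default_rng(0)
num_phases, feat_dim = 3, 4                      # hypothetical sizes
w_gamma = rng.normal(size=(num_phases, feat_dim))
w_beta = rng.normal(size=(num_phases, feat_dim))
features = rng.normal(size=feat_dim)

# The same features are modulated differently depending on the active phase.
phase_1 = np.eye(num_phases)[1]
phase_2 = np.eye(num_phases)[2]
out_phase_1 = film(features, phase_1, w_gamma, w_beta)
out_phase_2 = film(features, phase_2, w_gamma, w_beta)
```

Because the observation features are identical in both calls, any difference in the outputs comes purely from the phase signal, which is the property that lets a conditioned policy separate otherwise aliased states.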
This paper presents a decentralized framework that combines large language models (LLMs) with acoustic mobile robots for contactless object manipulation. Using Whisper speech recognition, LLM semantic parsing, and JSON task scheduling, the system converts spoken commands into coordinated multi-robot actions. Experiments with two TurtleBot3-based acoustic robots achieved success rates of 96% for sequential, 86% for parallel, and 70% for synchronized tasks, showcasing the potential of LLM-driven automation for human-robot interaction.
A decentralized framework integrates LLMs with acoustic robots for contactless object manipulation via natural language commands.
The system uses Whisper, LLM parsing, JSON-based task representation, and distributed scheduling to handle sequential, parallel, and synchronized tasks.
This paper proposes a target-informed self-supervised pretraining and model-ensemble strategy that leverages unlabeled target-domain data to improve cross-device generalization of medical imaging AI. Applied to pediatric wrist fracture assessment using point-of-care ultrasound, the method achieves over 6% Dice improvement on the target domain, demonstrating a label-efficient and privacy-preserving approach.
Combines masked image modeling and contrastive learning for self-supervised pretraining without target-domain labels.
Introduces a confidence-aware infusion head to adaptively integrate predictions from source and target branches.
Embodied3DBench targets low-level spatial intelligence in embodied 3D environments, with 6 task categories and over 21k QA pairs. Evaluations of 13 models show strong high-level reasoning but weak interaction-oriented perception. A synthesized dataset of 1.3M QA pairs significantly improves performance after fine-tuning.
Benchmark focuses on low-level embodied spatial intelligence for VLMs
Includes spatial structural understanding and interaction-oriented perception
This paper introduces TRACE, a training-free trajectory-constrained reconstruction framework that stabilizes the reconstruction path by coupling adjacent states, improving reconstruction quality for imaging inverse problems.
TRACE stabilizes reconstruction trajectories by coupling consecutive intermediate estimates.
It models the reconstruction as a sequence of proximal updates approximated by neural networks.
GAP3D introduces a modular diffusion-based approach that aligns VLM-generated latents directly to the patch-level feature space of a pre-trained image encoder, enabling frozen generative models to use VLMs as prompt encoders while preserving spatial structure. It trains primarily on image-text pairs, avoids large-scale 3D data, and demonstrates zero-shot multimodal capabilities, though it currently prioritizes high-level semantics over fine-grained detail.
GAP3D uses diffusion to align VLM latents to image encoder patch-level features.
Avoids large-scale 3D data by training on general image-text pairs.
A new approach called Noise-Aligned Diffusion Bridge (NADB) addresses underfitting near the target endpoint in diffusion bridge models, improving image restoration and translation tasks.
Current diffusion bridge models suffer from endpoint underfitting due to noise mismatch.
NADB introduces a mean network and noise-aligned mapping to correct this.
A comprehensive evaluation of 14 open-source safety guard models on a benchmark of 79,331 samples reveals that Qwen Guard (4B parameters) achieves the highest recall (83.97%), while larger models like Llama Guard (12B) miss up to 75% of unsafe content. Model size does not correlate with safety performance, and general-purpose guard models outperform specialized ones.
Qwen Guard (4B parameters) achieves the highest recall (83.97%) among 14 open-source safety guard models.
Larger models like Llama Guard (12B) and GPT-OSS Safeguard (20B) exhibit conservative behavior, missing up to 75% of unsafe content.
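Recall here is the share of genuinely unsafe samples that a guard model flags, so "missing up to 75% of unsafe content" corresponds to a recall of about 25%. A short sketch with hypothetical counts (not taken from the benchmark):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives that were detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


# Hypothetical: 100 unsafe prompts, of which the guard flags only 25.
flagged, missed = 25, 75
guard_recall = recall(flagged, missed)
miss_rate = 1.0 - guard_recall
```

Recall alone says nothing about false alarms on safe content, so a model can look conservative on this metric while still being precise.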
Aryabhata 2 is a reasoning-focused language model for competitive STEM exams like JEE and NEET, fine-tuned via reinforcement learning on GPT-OSS-20B using PhysicsWallah's question banks. It achieves up to 64% fewer output tokens while outperforming the base model on multiple benchmarks.
Aryabhata 2 uses RL post-training optimized for competitive STEM exams.
Built on GPT-OSS-20B with custom training curriculum from PhysicsWallah.
Large language models suffer from hallucination in long-form generation. Existing retrieval-augmented models cannot ensure key information stays close to outputs. This paper proposes Micro-Macro Retrieval (M2R), a retrieve-while-generate framework that retrieves coarse-grained evidence externally and extracts key information from a reasoning-built repository, significantly reducing hallucination. It uses curriculum learning-based reinforcement learning for stable training.
LLMs are prone to hallucination in long-form generation due to redundant context and long reasoning chains
Factual accuracy increases when key information is closer to model outputs
This paper presents RightNow-Arabic-0.5B-Turbo, a 518M-parameter Arabic-specialized LLM built on Qwen2.5-0.5B using vocabulary injection and edge-first deployment. It achieves 35.9% mean accuracy on Arabic benchmarks, outperforming all same-class open models, and ties Falcon-H1-1.5B on COPA-ar at one-third the size. The quantized model is 398 MB and delivers 635 tokens/s on a single H100, enabling efficient edge deployment.
518M-parameter Arabic LLM built on Qwen2.5-0.5B with vocabulary injection of 27,032 Arabic tokens.
Achieves 35.9% mean accuracy on three Arabic benchmarks, surpassing all same-class open-source models.
A new paper analyzes 17 LLMs (410M-100B+ parameters) and documents that instruction-tuned systems systematically collapse language entropy along discourse and structural dimensions (mean amplification: 1,949-16,853%, peaks: 5,181-209,675%), while suppressing complex punctuation to 3.2-23.2% of baseline. These effects do not worsen under RLHF. Weak intervention (lambda=1.0) exacerbates collapse by 240%, while strong control (lambda=5.0) achieves 40.5% improvement and outperforms frontier models by 96.7-98.2% despite 200-1000x scale disadvantage. Strong control also delivers 15% higher distinct-4, 27% higher vocabulary diversity, and 78% lower repetition than moderate regularization. The findings underscore that alignment requires sufficient control strength, not merely distributional smoothing.
Instruction tuning causes language entropy collapse along discourse and structural dimensions, with significant suppression of complex punctuation.
RLHF does not worsen stylistic collapse, but weak regularization exacerbates it.
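The distinct-4 figure cited above is a standard diversity metric: the number of unique 4-grams divided by the total number of 4-grams. A minimal sketch using whitespace tokens (the paper's tokenization and corpus are not given in this summary, so the example text is illustrative):

```python
from typing import Sequence


def distinct_n(tokens: Sequence[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams; 0.0 if the text is too short."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# A sentence repeated twice: 9 four-grams in total, 6 of them unique.
repetitive = "the cat sat on the mat the cat sat on the mat".split()
repetitive_score = distinct_n(repetitive, 4)

varied = "a quick brown fox jumps over the lazy sleeping dog".split()
varied_score = distinct_n(varied, 4)
```

Repetitive text scores lower (6/9 here) than text with no repeated 4-grams (1.0), which is why a higher distinct-4 is read as less stylistic collapse.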
As large language models (LLMs) grow in influence, understanding their decision-making becomes crucial. This paper introduces a method to detect concepts within LLM embeddings using low-cost linear probes, enabling monitoring of what models "think" during normal operation. The authors demonstrate concept delineation, probe training, and cross-context tracking across four concepts and three LLMs, paving the way for scalable model transparency.
Proposes linear probes to detect concepts in LLM embeddings for low-cost internal monitoring.
Details dataset creation, probe training/testing, and tracking across larger contexts.
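A linear probe is just a linear classifier trained on frozen embeddings to predict whether a concept is present. The sketch below trains one with plain gradient descent on synthetic vectors; the embedding dimension, the planted "concept direction", and the training settings are all assumptions standing in for real LLM activations, and none of it reproduces the paper's setup.

```python
import numpy as np


def train_linear_probe(X: np.ndarray, y: np.ndarray, steps: int = 500, lr: float = 0.5):
    """Fit logistic-regression weights (w, b) by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of the concept
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b


def probe_accuracy(w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray) -> float:
    """Share of examples where the probe's sign matches the label."""
    return float(np.mean(((X @ w + b) > 0) == (y == 1)))


# Synthetic "embeddings": Gaussian noise plus a shift along one direction
# whenever the concept is present (label 1).
rng = np.random.default_rng(0)
dim, n = 16, 400
concept_direction = rng.normal(size=dim)
concept_direction /= np.linalg.norm(concept_direction)
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, dim)) + 4.0 * y[:, None] * concept_direction

# Center with the training mean, then hold out the last 100 examples.
X = X - X[:300].mean(axis=0)
w, b = train_linear_probe(X[:300], y[:300])
test_accuracy = probe_accuracy(w, b, X[300:], y[300:])
```

The probe adds only `dim + 1` parameters and one dot product per embedding, which is what makes this style of monitoring cheap enough to run during normal operation.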
Multimodal learning often suffers from modality imbalance, where faster-converging modalities dominate optimization. Existing methods typically strengthen weak modalities or adjust gradients, but may compromise strong modalities. This paper proposes Balanced Multimodal Label Reshaping (BMLR), the first label-side approach to promote balance. BMLR reshapes the cross-modal label space to equalize mapping difficulty across modalities, enhancing interaction and injecting rich inter-class information. Extensive experiments show consistent improvement and compatibility.
Modality imbalance arises from differences in mapping difficulty from feature spaces to the shared label space.
BMLR is the first method to address multimodal balance from the label side.
Metagenomic taxonomic annotation identifies microbial origins of DNA fragments. Traditional similarity-based methods struggle with high diversity and incomplete databases. TaxDistill uses a knowledge distillation framework with a 500M-parameter genomic foundation model (GenomeOcean) as teacher to generate soft labels, reducing label noise. Experiments on seven CAMI2 datasets show TaxDistill outperforms baselines, e.g., improving F1 score on Gastrointestinal dataset from 0.763 to 0.941.
TaxDistill reduces label noise in metagenomic classification via knowledge distillation
Introduces GenomeOcean, a 500M-parameter genomic foundation model as teacher
This paper proposes COM, a strategy that integrates geometric constraints into token initialization and training to preserve the inherent continuity and ordinality of time series tokens, consistently improving the performance of token-based time series LLMs on multiple benchmarks.
Token-based time series LLMs overlook continuity and ordinality, limiting performance.
COM applies geometric constraints during initialization and training to preserve these properties.
TRACE is a trajectory-aware LLM-reasoning agent for molecular lead optimization that treats tool selection as a sequential decision-making problem, enabling forward-looking structural refinement under constraints, achieving higher success rates and property improvements on ADMET tasks.
TRACE formulates tool selection as sequential decision making over action trajectories.
It enables trajectory-aware decisions to improve ADMET properties while preserving molecular similarity.
Recent work shows RL retains prior capabilities more effectively than SFT. This paper extends that finding to the mechanistic level, introducing differential circuit vulnerability to measure circuit degradation. On Qwen2.5-3B-Instruct for scientific QA, SFT adapts faster but causes greater circuit disruption and forgetting, while RL preserves circuits at the cost of slower adaptation. Results suggest circuit preservation explains RL's robustness against catastrophic forgetting.
SFT adapts quickly but disrupts internal circuits, leading to catastrophic forgetting.
RL preserves more of the base model's circuits, resulting in less forgetting but slower task adaptation.
This paper studies behavioral alignment and representation dynamics of LLM agents in financial environments using TradeArena. It identifies measurable pre-failure signatures like planning embedding drift and effective-rank contraction. Structured risk feedback can serve as an external alignment signal but is not a universal performance enhancer. A 51-stock experiment reveals a correlation blind spot where LLM rationales justify concentrated exposure to coupled assets.
LLM agents exhibit measurable pre-failure signatures including planning embedding drift and effective-rank contraction.
Structured risk feedback acts as an external alignment signal but varies in effectiveness across models.
VFEAgent is an end-to-end multi-agent system that automates finite element analysis (FEA) modeling and simulation directly from input images and problem descriptions. It combines a multimodal vision-language multi-agent pipeline with a verification-first code synthesis framework, using ReAct-driven reasoning to extract structured FEA specifications and incorporating self-debugging and fallback mechanisms for executability and physical validity. Experiments show high success rates in generating complete, physically valid simulations, outperforming LLM-based baselines in reliability and correctness, and promising to free engineers from tedious manual analysis.
VFEAgent automates FEA modeling and simulation from images and problem descriptions.
Employs a multimodal vision-language multi-agent pipeline with ReAct-driven reasoning.
A new study uses five frontier LLMs from Anthropic and OpenAI as 'agentic curators' in a self-contained workspace to automate phenotype annotation. The agents achieved consistency within the range of human curators and substantially outperformed traditional NLP tools, addressing the scalability bottleneck in ontology curation.
Phenotype annotation relies on human experts, which is labor-intensive and hard to scale.
The study deployed five frontier LLMs as agentic curators in a self-contained workspace.
This paper introduces Orthogonal Concept Erasure (OCE), which uses multiplicative parameter updates for precise concept removal while preserving generative capacity, supporting multi-concept erasure with high speed.
Existing editing-based methods rely on additive updates that interfere with generative capacity.
OCE uses orthogonal transformations as multiplicative updates, preserving neuron direction and angular geometry.
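The geometric claim can be checked numerically: multiplying a weight matrix by an orthogonal matrix leaves the norms of, and angles between, its neuron vectors unchanged, whereas an additive update generally does not. This sketch demonstrates only that general property with random matrices; how OCE actually constructs its transformation is not described in this summary, and the shapes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns of W stand in for four neuron weight vectors in a 6-dimensional space.
W = rng.normal(size=(6, 4))

# A random orthogonal matrix, taken from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))

# An arbitrary additive perturbation for comparison.
delta = 0.5 * rng.normal(size=(6, 4))

W_multiplicative = Q @ W      # multiplicative (orthogonal) update
W_additive = W + delta        # additive update

# The Gram matrix W^T W holds every squared norm and pairwise inner product,
# so comparing Gram matrices compares norms and angles all at once.
gram = W.T @ W
multiplicative_preserves = np.allclose(W_multiplicative.T @ W_multiplicative, gram)
additive_preserves = np.allclose(W_additive.T @ W_additive, gram)
```

The invariance follows from `(Q W)^T (Q W) = W^T Q^T Q W = W^T W` whenever `Q^T Q = I`.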
This paper empirically evaluates LLM-generated reviews for scientific papers, finding limited alignment with human reviews that varies significantly across prompts and models. It also shows that authors can game the system by iteratively revising papers based on LLM feedback, achieving statistically significant score increases for up to 35% of papers.
LLM reviews show limited alignment with human reviews
Alignment quality varies substantially across different prompts and models
The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments GPT-2 Small with cognitive and category-theoretic components, achieving 21.27 perplexity on WikiText-103, a 2.92 (12%) reduction over a fine-tuned baseline. Ablations attribute 84% of the improvement to GT-Full simplicial message passing. The study also identifies a structure/consistency distinction among categorical priors.
CCT achieves 21.27 perplexity on WikiText-103, 2.92 lower than the fine-tuned GPT-2 Small baseline.
Ablation studies attribute 84% of the gain to GT-Full simplicial message passing.
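The figures in this item can be cross-checked with simple arithmetic. The baseline perplexity and the absolute contribution of GT-Full are not stated in the summary; the sketch derives them from the reported numbers, assuming the 84% share applies to the full 2.92-point reduction.

```python
cct_perplexity = 21.27
reduction = 2.92                      # reported absolute drop in perplexity

# Implied baseline and relative reduction.
baseline_perplexity = cct_perplexity + reduction
relative_reduction = reduction / baseline_perplexity

# Portion of the drop attributed to GT-Full simplicial message passing.
gt_full_points = 0.84 * reduction

print(f"baseline ~ {baseline_perplexity:.2f}, relative drop ~ {relative_reduction:.1%}")
print(f"GT-Full accounts for ~ {gt_full_points:.2f} perplexity points")
```

The implied baseline is 24.19, and 2.92 / 24.19 is about 12.1%, which matches the 12% quoted above; GT-Full would then account for roughly 2.45 of the 2.92 points.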
This paper proposes behavior-aware auxiliary corrections to stabilize off-policy temporal-difference learning. By replacing the auxiliary covariance matrix with the behavior Bellman matrix, the authors introduce BA-TDC and BA-TDRC algorithms. Theoretical analysis proves fixed-point preservation and almost-sure convergence. Experiments on standard benchmarks show that the behavior-aware replacement improves performance, but regularization is needed for robust results.
Behavior-aware auxiliary corrections improve stability of off-policy TD learning.
BA-TDC and BA-TDRC replace the auxiliary covariance matrix with the behavior Bellman matrix.
This paper proposes STHTD-MP, a behavior-induced Mirror-Prox temporal-difference method that replaces the covariance metric with the symmetric part of the behavior-policy Bellman matrix to improve off-policy prediction speed. Theoretical convergence analysis and numerical experiments on several benchmarks show improved performance over GTD2-MP.
STHTD-MP uses behavior-policy transition information to construct a more informative update geometry.
Rigorous convergence analysis is provided for fixed-policy linear prediction.
OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI.
Alongside its $65 billion Series H funding round, Anthropic announced that its annualized run-rate revenue crossed $47 billion in early May 2026, up from $30 billion in April and $14 billion in February. The rapid growth has been described as unprecedented organic revenue scaling, though some skeptics question the numbers. An anecdotal report of a client spending $500 million in a single month on Claude licenses adds context.
Anthropic's run-rate revenue reached $47 billion as of early May 2026.
Revenue grew from $9 billion (end of 2025) to $14 billion (Feb), $30 billion (Apr), and $47 billion (May).
Anthropic released Claude Opus 4.8, described as a modest but tangible improvement over its predecessor. Key highlights include enhanced honesty (reduced unsupported claims, four times less likely to overlook code flaws), and new features like mid-conversation system messages. Pricing remains unchanged, but fast mode costs are significantly reduced.
Anthropic launches Claude Opus 4.8, honestly calling it a 'modest but tangible improvement'.
Honesty improved: model is less prone to unsupported claims and four times less likely to miss code flaws.
Anthropic released Claude Opus 4.8, showing improvements in terminal engineering and knowledge work, outperforming Mythos in certain benchmarks. The model features enhanced honesty and a new Dynamic Workflows capability that orchestrates hundreds of parallel sub-agents. Early testers report significant gains in code quality and task reliability.
Claude Opus 4.8 was released just 43 days after 4.7, with notable gains in coding and knowledge tasks
Dynamic Workflows: Claude generates JavaScript orchestration scripts to coordinate hundreds of parallel sub-agents
Release of llm-anthropic 0.25.1 adds support for Claude Opus 4.8, fast mode option for eligible accounts, and changes default max_tokens to each model's maximum output.
New model: Claude Opus 4.8 (claude-opus-4.8).
New -o fast 1 option for fast mode (for organizations with feature enabled).
Anthropic launches Claude Opus 4.8 with two Claude Code updates: dynamic workflows that coordinate up to 1,000 subagents in parallel, and a cheaper fast mode that speeds up output 2.5x. Both are in research preview.
Dynamic workflows let Claude write orchestration scripts for parallel subagents, with up to 16 concurrent and 1,000 total per run.
Fast mode delivers 2.5x faster output for Opus 4.8 at one-third of its previous cost and requires usage credits.
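The limits quoted above (up to 16 concurrent subagents, 1,000 total per run) describe a bounded-concurrency pattern. The sketch below illustrates that general pattern with Python's asyncio and a semaphore. It is not Anthropic's implementation: the function names are hypothetical, and the "work" is a placeholder.

```python
import asyncio


async def run_bounded(total: int, limit: int):
    """Run `total` placeholder subagents with at most `limit` active at once."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def subagent(task_id: int) -> int:
        nonlocal active, peak
        async with sem:                 # blocks while `limit` subagents are active
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)      # placeholder for real subagent work
            active -= 1
            return task_id * task_id    # placeholder result

    results = await asyncio.gather(*(subagent(i) for i in range(total)))
    return results, peak


# Figures taken from the summary: 1,000 total subagents, 16 concurrent.
results, peak = asyncio.run(run_bounded(total=1000, limit=16))
print(f"completed {len(results)} subagents, peak concurrency {peak}")
```

The semaphore guarantees the concurrency cap regardless of how long each subagent takes; `asyncio.gather` returns results in submission order.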
Azercell Telecom collaborated with the AWS Generative AI Innovation Center to build an Azerbaijani LLM on Amazon SageMaker AI, achieving 23% higher training throughput, 58% lower peak GPU memory, and 2× token efficiency via custom tokenizer, FSDP, and Liger Kernel optimizations.
Azercell developed a production-ready Azerbaijani LLM framework using Amazon SageMaker AI.
Custom tokenizer reduced tokens per word from 3.22 to 1.59, doubling encoding efficiency.
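The tokenizer figures translate directly into how much text fits in a fixed token budget. The short calculation below uses the two tokens-per-word values from the summary; the 4,096-token window is a hypothetical budget chosen only for illustration.

```python
# Tokens per word before and after the custom tokenizer (from the summary).
tokens_per_word_old = 3.22
tokens_per_word_new = 1.59

# Encoding efficiency gain: how many times fewer tokens per word.
efficiency_gain = tokens_per_word_old / tokens_per_word_new

# Hypothetical context window, used only to make the effect tangible.
context_tokens = 4096
words_old = int(context_tokens / tokens_per_word_old)
words_new = int(context_tokens / tokens_per_word_new)

print(f"efficiency gain: {efficiency_gain:.2f}x")
print(f"words per {context_tokens} tokens: {words_old} -> {words_new}")
```

The ratio comes out slightly above 2, matching the "doubling" described in the summary.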
Anthropic releases Claude Opus 4.8, which beats GPT-5.5 and Gemini 3.1 Pro in most benchmarks. The model also catches its own coding errors four times more often than its predecessor. Alongside the launch, Anthropic is rolling out dynamic workflows that can spin up hundreds of parallel sub-agents to handle tasks like codebase-wide migrations.
Claude Opus 4.8 outperforms GPT-5.5 and Gemini 3.1 Pro in most benchmarks.
The model catches its own coding errors four times more often than its predecessor.
Not every new model is all it's cracked up to be. Our tracker keeps each release in context with its peers, so you know which models are worth your time. This article summarizes major model releases of 2026 so far, including Claude Opus 4.8, GPT-5.5 Instant, Nemotron 3 Nano Omni, GPT-5.5, ChatGPT Images 2, Claude Opus 4.7, Claude Mythos (Preview), GPT-5.4, Claude Opus 4.6, and GPT-5.3-Codex, with details on their features and significance.
Anthropic's Opus 4.8 offers faster thinking at lower cost, claims lower misalignment rates than Opus 4.7, comparable to Mythos Preview.
Claude Code now supports one-click model switching, BYOK, and compatibility with Anthropic and OpenAI APIs. Get started at $5/mo to route around outages and rate limits.
Anthropic released Opus 4.8 with user-controllable effort, dynamic workflows for large-scale coding, fast mode at one-third the previous cost. Benchmarks show it leads GPT-5.5 and Gemini 3.1 Pro except in terminal coding. Improvements in honesty, autonomy support, and reduced deception.
Users can now control Claude's "effort" level to balance response quality and speed.
Dynamic workflows (research preview) allow Claude to plan and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations.
Anthropic's most advanced Opus model, Claude Opus 4.8, is now available on Amazon Bedrock and the Claude Platform on AWS. It delivers improvements in coding, agentic tasks, and professional work with greater consistency and autonomy for long-running production workflows.
Claude Opus 4.8 is Anthropic's most advanced Opus model, now available on AWS.
It offers enhanced performance in coding, multi-stage autonomous tasks, and professional work with lower output variance.
Anthropic is releasing Claude Opus 4.8 on Thursday, touting the model's 'honesty.' Early testers found it more likely to flag uncertainties and less likely to make unsupported claims. Evaluations show it is about 4x less likely than its predecessor to allow code flaws to pass unremarked. Users can also direct the amount of effort Claude puts into a task, and a 'dynamic workflows' feature allows parallel subagents.
Claude Opus 4.8 is more inclined to flag uncertainties and avoid unsupported claims.
It is about 4x less likely than its predecessor to overlook code flaws.
Anthropic released Claude Opus 4.8, the latest upgrade to its flagship model. It improves on Opus 4.7 across benchmarks, with notable gains in honesty and agentic capabilities. New features include effort control, dynamic workflows in Claude Code, and API improvements. Pricing remains unchanged, while fast mode is now three times cheaper. The company also previews upcoming higher-intelligence models.
Claude Opus 4.8 outperforms Opus 4.7 on multiple benchmarks, especially in honesty and agentic tasks
New features: effort control for users, dynamic workflows in Claude Code, and system entries in Messages API
After being laid off earlier this year, the author built One Tile in one night using AI tools and the no-code platform Base44, with no prior development experience. The project drew 200k views on Reddit.
Built One Tile in one night after getting laid off.
Used AI tools and no-code platform Base44, no dev experience.
Ferrari's first electric car, the Luce, designed with Jony Ive, has a divisive look and is packed with new tech. This Vergecast episode discusses its design, market impact, and the growing public distaste for AI.
Ferrari's first EV, the Luce, features unconventional design by Jony Ive.
The Vergecast debates the Luce's design, technology, and electric vehicle demand.
Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases.
Boston Children’s Hospital employs OpenAI technology to aid rare disease diagnosis
The intelligent and thoughtful encyclical is an important warning of the uses and misuses of a rapidly developing technology. Silicon Valley is wrong to dismiss it.
Pope Leo XIV issued encyclical 'Magnifica Humanitas' on AI.
Inspired by Pope Leo XIV's encyclical on AI, this article catalogs 40 frustrating tech problems, from one-time passcodes that never arrive to touchscreens in cars. A humorous critique of tech companies putting profit before people.
The article uses the pope's encyclical to frame a list of 40 tech annoyances.
Common frustrations include broken passcodes, QR code parking apps, and useless chatbots.
Pubflow introduces a unified system that integrates authentication, backend logic, and infrastructure, eliminating the need for glue code when building AI-powered applications. It offers multi-database support, multiple language compatibility, and production-ready starter kits.
Pubflow provides a unified trust layer for AI app development.
It combines authentication (Flowless), backend (Flowfull), and infrastructure (Pubflow Cloud).
Microsoft is launching a revamped version of Microsoft 365 Copilot with a cleaner design that loads twice as fast. The update introduces progressive disclosure and improved formatting options.
Redesigned Copilot loads twice as fast and provides more reliable, structured responses
Progressive disclosure feature shows tools and controls based on user prompts
Dr Susan Oman writes about a campaign designed to raise public awareness of AI, arguing that while governments, faith leaders, and tech bosses debate AI's future, the public is consistently left out. She cites evidence showing public concern about AI has risen by 10% in two years, and 91% believe fairness should be prioritized over economic gain.
Public consistently excluded from AI debates despite being most affected
The article draws parallels between the 19th-century railroad boom and today's AI investment frenzy, highlighting massive capital expenditure, financial innovation, and historical precedents for bubbles and crashes. It argues that AI's financial infrastructure may be as transformative—and as risky—as railroads were.
Railroad investment in the 1850s reached 3-5% of GDP, similar to today's AI capex from five tech giants.
The bond market was created to finance railroads, just as AI is reshaping capital markets.
This article analyzes the feasibility of AI data centers in space, covering physical advantages (continuous sunlight, passive cooling, laser links) and engineering constraints (thermal dissipation, radiation hardening, training synchronization, maintenance). The key assumption is Starship launch costs. Several startups, Google, and SpaceX have announced pilot programs. Near-term investment impact is modest but worth monitoring.
Orbital AI data centers leverage LEO's continuous solar power, passive radiative cooling, and vacuum-speed laser links for potential advantages over terrestrial datacenters
Engineering challenges include thermal dissipation (high-density clusters require impractically large radiators), radiation hardening (commercial chips' orbital longevity unknown), and training synchronization latency
OpenAI CEO Sam Altman has reversed his earlier predictions that AI would lead to massive job losses, now saying a 'jobs apocalypse' likely won't occur. He acknowledged his intuitions were off, citing the irreplaceable value of human interaction in the workplace. While other industry leaders still warn of disruption, Altman's remarks reflect considerations of AI costs, adoption pace, and public opinion.
Altman previously predicted AI would replace most jobs, but now says he was 'delighted to be wrong' and does not foresee a jobs apocalypse.
He explained that the human element of work—social interaction—cannot be replaced by AI, updating his view on the jobs landscape.
The article draws parallels between historical technological cycles (e.g., Einstein's miracle year, the electric revolution) and the current AI boom, arguing that foundational breakthroughs are followed by long application phases. During these phases, some jobs disappear but many new ones emerge. AI is in its theoretical breakthrough phase, and the subsequent application era will create more opportunities than it destroys.
Historical patterns show that revolutionary theory is followed by decades of application, which eliminates some jobs but creates many new ones.
AI today is akin to Einstein's miracle year in 1905; the application age is yet to come.
UC Berkeley's UCCL team releases mKernel, fusing intra-node NVLink, inter-node RDMA, and dense compute into a single persistent CUDA kernel. Communication can consume 43.6% of forward pass and 32% of training time. mKernel offers five fused kernels and supports ConnectX-7 and AWS EFA backends.
mKernel fuses intra-node NVLink, inter-node RDMA, and compute into a single persistent CUDA kernel
Communication overhead accounts for up to 47% of execution time in MoE models
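A rough way to read the communication fractions above is as an upper bound on what overlapping communication with compute could buy. The calculation below assumes an idealised case in which communication is hidden entirely and nothing else changes. This is an Amdahl-style bound for illustration, not a speedup reported by the UCCL team, and the function name is hypothetical.

```python
def max_speedup_if_hidden(comm_fraction: float) -> float:
    """Upper bound on speedup if a fraction of runtime is fully overlapped."""
    if not 0.0 <= comm_fraction < 1.0:
        raise ValueError("comm_fraction must be in [0, 1)")
    return 1.0 / (1.0 - comm_fraction)


# Fractions quoted in the summary: 43.6% of the forward pass, 32% of training.
forward_bound = max_speedup_if_hidden(0.436)
training_bound = max_speedup_if_hidden(0.32)

print(f"forward-pass bound: {forward_bound:.2f}x")
print(f"training bound: {training_bound:.2f}x")
```

Real gains would be lower, since fused kernels rarely hide all communication and compute itself may slow down when sharing resources.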
ChatGPT and other AI tools are increasingly citing Grokipedia, Elon Musk's AI-generated encyclopedia, raising concerns about accuracy and misinformation. Although Grokipedia currently accounts for a small share of citations, its usage is rising, especially in ChatGPT, where it is often treated as a primary source. Experts warn that relying on Grokipedia, which is AI-generated and lacks human oversight, could spread biases and errors and even create data poisoning risks.
ChatGPT, Google AI Overviews, and Gemini are among tools citing Grokipedia
Grokipedia citations have grown steadily since November but remain far below Wikipedia
This week, the AI-and-work conflict erupted simultaneously across four jurisdictions: Wikipedia editors threaten a strike over layoffs, Amazon employees game an internal AI ranking into uselessness, Chinese courts enforce a ban on AI-justified layoffs, and a UK thinktank calls for an employee say in AI deployment. Meanwhile, frontier labs deepen government ties.
Wikipedia editors threaten strike in protest of foundation layoffs
Amazon employees game internal AI ranking system into uselessness
This is the first post in the Profiling in PyTorch series, starting with a simple matrix multiplication and bias addition to teach readers how to use torch.profiler. It covers setting up the profiler, reading the profiler table and trace, understanding CPU/GPU activity gaps, and the impact of warmup and matrix size on performance regimes.
torch.profiler outputs a table and a trace; the table identifies hotspots, the trace shows temporal execution.
Small matmuls are overhead-bound; scaling up makes them compute-bound.
Warmup eliminates startup overheads, producing consistent profile steps.
CPU-GPU offset reflects kernel launch and synchronization delays.
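The workflow described above can be sketched in a few lines. This is a minimal CPU-only example assuming PyTorch is installed; the matrix size, warmup count, and function name are illustrative choices rather than values from the post.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def profile_matmul_bias(n: int = 256, warmup: int = 3, steps: int = 5) -> str:
    """Profile a matmul followed by a bias add and return the summary table."""
    x = torch.randn(n, n)
    w = torch.randn(n, n)
    b = torch.randn(n)

    # Warmup runs absorb one-time startup costs so profiled steps are consistent.
    for _ in range(warmup):
        _ = x @ w + b

    # Record CPU activity only; add ProfilerActivity.CUDA when running on a GPU.
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        for _ in range(steps):
            _ = x @ w + b

    # The table aggregates time per operator, which is where hotspots show up.
    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)


table = profile_matmul_bias()
print(table)
```

For the timeline view, the profiler object also offers `export_chrome_trace("trace.json")`, whose output can be opened in a trace viewer. Rerunning with a much larger `n` is a simple way to see the shift from overhead-bound to compute-bound behavior.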
Apple has long touted the privacy benefits of on-device AI, but a new report suggests its Gemini-powered Siri will rely heavily on Google and Nvidia cloud servers. While this hybrid approach addresses performance limitations of local models, it represents a trade-off on privacy.
Apple is partnering with Google to integrate Gemini AI into Siri on iPhone.
Due to limited on-device chip performance, Siri will use both local and cloud processing for enhanced AI capabilities.
LightSail Technology announced a strategic partnership with Tencent Travel Services to integrate its AI full-sensing wearable device into the mobility platform. The device previously topped JD.com's bestseller list and sold out; now a new pre-sale round is open with discounts.
LightSail Technology and Tencent Travel Services partner to integrate AI wearable into travel services.
The LightSail AI wearable topped JD.com's bestseller list for 8 consecutive days and sold out.
The UK government plans to deploy AI facial recognition technology at borders from next year to detect adult migrants posing as children. The technology will estimate age from photos, but human rights groups criticize it as unproven and potentially harmful to children's rights.
UK to deploy AI facial recognition for age estimation of asylum seekers by mid-2027.
Technology aims to identify adults falsely claiming to be children, but Human Rights Watch urges scrapping the plan.
Xerolith is a working platform that achieves persistent identity, autonomous belief formation, and substrate-independent knowledge consolidation through a hierarchical fractal vault architecture. Over 80 days of continuous operation, it has compressed 2,817 raw entries into 1,218 beliefs, with complete genealogical tracing and internal alignment.
Three-layer architecture: entries, lessons, and beliefs for autonomous consolidation from raw data to abstract principles.
Persistent identity maintained over 80+ days and multiple restart cycles.
This paper proposes a data-driven approach using recurrent neural networks and one-step-ahead predictive control for bead geometry control in Wire Arc Additive Manufacturing (WAAM). By updating the model online to account for changing thermal conditions, it significantly improves bead height and width consistency.
Uses recurrent neural network to learn input-output dynamics of WAAM
One-step-ahead predictive control improves bead geometry consistency
Researchers propose a multi-resolution end-to-end deep neural network to balance latency and safety in autonomous driving. By selecting input resolution at runtime, the network improves safety metrics like lane invasions, red-light infractions, and collisions in CARLA simulations compared to fixed-resolution baselines.
Latency-accuracy tradeoff is critical for real-time autonomous driving decisions.
The article explores the concept of 'disposable software' in the AI era, arguing that AI-generated code should be treated as disposable to accelerate development, much like mass-produced furniture replaced artisan craftsmanship. A case study demonstrates successful AI refactoring, and a 'Disposable Code Manifesto' is proposed with three pillars: intent, requirements, and safety.
AI makes software cheap and disposable, analogous to the industrial revolution in furniture.
A real-world Rails project case shows how AI refactoring reduced code from 2000+ lines to 264 lines.
This video explores strategies and methods to counter superhuman AI in the game of Go, including exploiting weaknesses, innovative tactics, and understanding AI decision-making.
Superhuman AIs in Go have surpassed top human players
The video analyzes potential AI weaknesses and how to exploit them
Anthropic raises $65 billion in Series H at $965 billion valuation. Annualized revenue exceeds $47 billion. Funds allocated to safety research, compute, and Claude expansion.
The Wikimedia Foundation, sitting on $296 million in reserves and a profitable AI revenue stream, laid off long-time staff and disbanded the Community Tech team, prompting volunteer editors to threaten a strike. The article explores how 'CEO AI psychosis' distorts organizational priorities and how replacing human judgment with AI can create a downward spiral of degrading data quality.
Wikimedia Foundation fired a 20-year veteran and disbanded the Community Tech team, triggering a strike threat from volunteer editors.
AI companies profit from Wikipedia data but undermine the volunteer community that produces it.
This article explores how AI is affecting software engineering interviews, analyzing different interview types (take-home, live exercise, presentation, actual work) across dimensions of signal quality and cost to company. It argues that AI makes take-homes too easy and live coding less relevant, recommending that companies limit AI usage in interviews to preserve signal quality, drawing parallels to classical academic evaluation models.
AI coding threatens current interview models, especially take-home and live coding.
Companies should limit AI usage during interviews to maintain signal quality.
AI training startup Shift offers free home cleaning services, but records cleaners to gather training data for robots. The company says the value of the data covers the cost. The service is initially available only in New York, with plans to expand to San Francisco, London, Zurich, and Munich soon.
Shift provides free cleaning in exchange for recording cleaners to train AI robots.
Cleaners wear a special hat with a camera to capture their work.
The $65bn raised by Claude's parent company in its latest funding round underscores the vast sums of money still flowing into the industry. Anthropic, the AI firm behind the Claude chatbot, announced on Thursday it had raised $65bn in funding, valuing the company at $965bn post-money. The move makes Anthropic the world's most valuable AI startup, eclipsing its competitor OpenAI. The deal caps an exceedingly successful period of growth for Anthropic, which was once considered a smaller player in the global AI arms race. The widespread adoption of its products by large enterprises, especially following its release of powerful coding assistants late last year, has turned it into a dominant player in the industry.
Anthropic raised $65bn in funding, valuing it at $965bn.
It surpasses OpenAI as the world's most valuable AI startup.
Next month's Tribeca Festival will include the premiere of an AI-generated film: Dreams of Violets. The 75-minute film is a fictional dramatization of the Iranian government's mass killing of protestors in January, with the people and images fully created by AI. It cost $2,000 to make and was created by two Iranian-born brothers using various AI tools.
Dreams of Violets is a 75-minute AI-generated film premiering at Tribeca, costing $2,000.
It dramatizes the Iranian government's mass killing of protestors, using AI for all images.
YouTube introduces new features for Premium subscribers to enhance podcast listening, including an audio-first 'on-the-go mode', auto speed adjustment, and AI podcast recommendations.
YouTube launches 'on-the-go mode' that converts video interface to audio-first for listening on the move.
New auto speed feature adjusts playback speed dynamically based on content.