Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.
Bundles Python runtime and hermes-agent for a zero-dependency user experience
DeepSeek researcher Chen Deli used his self-developed DeliAutoResearch skill, collaborating with DeepSeek-V4-Pro and GPT-Image2, to complete a 46-page paper in just 6 days. The paper introduces an L1-L5 autonomy classification for research agents, analyzes four architectural patterns and 17 mainstream systems, and identifies six open problems. Chen Deli says only about 2 hours of human 'CPU time' were needed, with the rest handled by AI agents.
Chen Deli's DeliAutoResearch skill enabled the paper to be 99% written by AI agents.
The paper proposes an L1-L5 autonomy classification for research agents, analogous to SAE levels for autonomous driving.
Anthropic released its formerly classified Mythos model to the public, collapsing the gap between sovereign and developer AI. DeepMind's Demis Hassabis moved his AGI timeline up to 2029. Critical vulnerabilities in Starlette affected millions of AI agents, and a coordinated takedown dismantled the Glassworm botnet. BNP Paribas partnered with Mistral for sovereign AI security, while China restricted travel for top AI engineers at Alibaba and DeepSeek. Corporate AI spending and layoffs made headlines: Uber burned its full-year AI budget by April, ClickUp restructured with a 3:1 AI-to-human ratio, and Sam Altman reversed his white-collar apocalypse prediction. However, MIT Technology Review data showed AI-exposed roles have lower unemployment.
Anthropic releases Mythos, previously limited to government contractors, now available via standard API.
DeepMind CEO Hassabis advances AGI timeline to 2029, citing AlphaProof Nexus solving nine Erdős problems cheaply.
China is restricting overseas travel for top AI researchers at private firms like Alibaba and DeepSeek, requiring official approval to leave the country, due to fears of data leaks and talent poaching.
China requires top AI researchers to obtain permission before traveling abroad.
The policy applies to private companies like Alibaba and DeepSeek.
Kuaishou releases Keye-VL-2.0-30B-A3B, a multimodal large language model that is the first to apply DeepSeek Sparse Attention (DSA) to multimodal scenarios, enabling deep perception over a 256K ultra-long context. It achieves SOTA on long-video temporal understanding benchmarks and introduces built-in Agent collaboration, paving the way for enhanced reasoning and real-world business applications.
First to integrate DSA into multimodal models, addressing long-video understanding bottlenecks.
Achieves SOTA on TimeLens, LongVideoBench, MLVU; reverses long-context decay by boosting accuracy from 35.34% to 42.44% when scaling from 64 to 512 frames.
UUMuse is a cloud AI knowledge base platform where you upload files once and use them across GPT, Claude, DeepSeek, Qwen, and more — with cited answers, persistent memory, agent mode, a multi-expert debate feature (Spark), and flexible deployment as docs sites, APIs, or MCP servers.
Upload files once and query multiple AI models (GPT, Claude, DeepSeek, Qwen) with source citations.
Persistent memory remembers your writing style and project context across conversations.
One month after DeepSeek V4's release, the open-source community unveiled Reasonix, a tool specifically designed to minimize API costs by maximizing cache efficiency. It achieves a staggering 99.82% cache hit rate, reducing a $61 bill for 400M+ tokens to just $12.
Reasonix is a dedicated coding harness for DeepSeek, focusing on cost reduction.
Its cache-first loop, tool-call repair, and automatic context compression maintain over 90% cache hit rate in long sessions.
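The effect of a high cache hit rate on a bill can be sketched with a simple blended-cost model. This is an illustration only: `blended_input_cost` and the per-million-token prices passed to it are hypothetical placeholders, not Reasonix code or DeepSeek's actual rates. Only the $61 and $12 figures come from the item above.

```python
def blended_input_cost(tokens, hit_rate, miss_price, hit_price):
    """Dollar cost of `tokens` input tokens at a given cache hit rate.

    miss_price and hit_price are dollars per million tokens.
    Both are hypothetical inputs, not published rates.
    """
    hit_tokens = tokens * hit_rate
    miss_tokens = tokens - hit_tokens
    return (hit_tokens * hit_price + miss_tokens * miss_price) / 1_000_000


def savings_fraction(before, after):
    """Fraction of the original bill that was saved."""
    return (before - after) / before


# Reported bill: $61 without the harness, $12 with it.
print(f"{savings_fraction(61.0, 12.0):.1%}")  # 80.3%

# Hypothetical prices: cached tokens billed at a tenth of the uncached rate.
print(blended_input_cost(400_000_000, 0.90, miss_price=1.0, hit_price=0.1))
print(blended_input_cost(400_000_000, 0.9982, miss_price=1.0, hit_price=0.1))
```

Under any price schedule where cached tokens are cheaper, the uncached share of the bill shrinks in proportion to the miss rate, which is why the harness concentrates on keeping prompts cache-stable.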
DeepSeek is making the 75 percent discount on its top model, V4-Pro, permanent. At $0.435 per million input tokens, it is at least 11.5 times cheaper than GPT-5.5 on input and over 34 times cheaper on output. For token-hungry agentic systems, this kind of pricing could squeeze Western providers hard.
DeepSeek's 75% discount on V4-Pro is now permanent.
Input token price is $0.435 per million, 11.5x cheaper than GPT-5.5.
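The quoted numbers imply two further figures that follow from arithmetic alone. The sketch below derives them. The helper names are hypothetical, and the competitor price is only what the "11.5x" ratio implies, not an independently checked list price.

```python
def pre_discount_price(discounted_price, discount):
    """List price before a fractional discount was applied."""
    return discounted_price / (1.0 - discount)


def implied_competitor_price(price, times_cheaper):
    """Competitor price implied by an 'N times cheaper' claim."""
    return price * times_cheaper


v4_pro_input = 0.435  # dollars per million input tokens, from the item above

# A 75% discount means the current price is one quarter of the original.
print(round(pre_discount_price(v4_pro_input, 0.75), 3))       # 1.74

# "At least 11.5 times cheaper" implies a competitor input price of about $5.
print(round(implied_competitor_price(v4_pro_input, 11.5), 2))
```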
Alibaba's Qwen team releases Qwen3.7-Max, a proprietary model built for long-running autonomous agent tasks. It matches Claude Opus 4.6 on benchmarks and beats Chinese rivals like DeepSeek V4 Pro and Kimi K2.6. The team also demos the model steering a four-legged robot.
Qwen3.7-Max designed for long-running autonomous tasks
DeepSeek announced permanent price cuts for its V4-Pro API. Meanwhile, CATL, JD, and NetEase are in talks to invest in DeepSeek's first external funding round. Founder Liang Wenfeng emphasizes prioritizing AGI research and maintaining open-source principles.
DeepSeek V4-Pro API permanently reduced to one-quarter of original price
CATL, JD, and NetEase among companies negotiating investment in DeepSeek
Ahead of OpenAI's likely IPO filing, Greg Brockman signals a shift from pure models to agent products. DeepSeek makes 75% price cut permanent, MCP protocol becomes stateless, Google launches 24/7 AI agent, and Anthropic finds over 10,000 critical vulnerabilities. Agentification is the new normal.
Greg Brockman says model alone is no longer the product; harness+agent+workflow is key
DeepSeek V4 Pro permanently discounted 75%, slashing inference costs
DeepSeek is about to raise around $10 billion, which would value the Chinese AI startup at roughly $45 billion. Founder Liang Wenfeng is telling investors he's putting AGI research ahead of short-term profits.
DeepSeek is raising ~$10B at a ~$45B valuation.
Founder Liang Wenfeng prioritizes AGI research over short-term profits.
DeepSeek has raised 70 billion yuan and is developing its own AI coding product, DeepSeek Code. Senior researcher Deli Chen posted a recruiting notice for a Code Harness team, and former TSY Capital co-founder Cui Tianyi may lead the team.
DeepSeek raised 70 billion yuan, prioritizing AI research over commercialization.
DeepSeek Code confirmed in development, hiring for Agent Harness team.
aiodeepseek is a high-performance async Python client for the private DeepSeek API, supporting streaming, image uploads, multi-turn conversations, and new account registration. It automatically solves proof-of-work challenges using C++ with AVX2 optimization.
Async Python client with streaming and image upload support
Multi-turn conversation and account registration features
HELLoRA is a parameter-efficient fine-tuning method for Mixture-of-Experts (MoE) models that attaches LoRA modules only to the most frequently activated experts per layer. It reduces trainable parameters and adapter FLOPs while improving downstream performance. Tested on OLMoE, Mixtral, and DeepSeekMoE across math, code, and safety tasks, HELLoRA significantly outperforms vanilla LoRA. On OLMoE, for example, it uses 15.7% of the parameters while reaching 9.2% higher accuracy.
HELLoRA attaches LoRA only to the most active experts per layer in MoE models.
It achieves superior performance with far fewer trainable parameters and FLOPs.
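The core idea, picking the most frequently activated experts in each layer and adapting only those, can be sketched as follows. This is a minimal illustration, not the paper's code. The selection rule (raw activation counts over a calibration trace), the function names, and the dimensions are all assumptions.

```python
from collections import Counter


def select_hot_experts(routing_trace, top_k):
    """Pick the top_k most frequently routed experts in each layer.

    routing_trace: iterable of (layer, expert) pairs, one per routed token,
    collected from a hypothetical calibration run.
    Returns {layer: [expert ids, most active first]}.
    """
    counts = {}
    for layer, expert in routing_trace:
        counts.setdefault(layer, Counter())[expert] += 1
    return {
        layer: [expert for expert, _ in counter.most_common(top_k)]
        for layer, counter in counts.items()
    }


def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one rank-r LoRA pair (A: d_in x r, B: r x d_out)."""
    return rank * (d_in + d_out)


def adapter_budget(hot_experts, d_in, d_out, rank):
    """Total LoRA parameters when one adapter is attached per selected expert."""
    num_adapted = sum(len(experts) for experts in hot_experts.values())
    return num_adapted * lora_params(d_in, d_out, rank)


trace = [(0, 1), (0, 1), (0, 2), (0, 3), (0, 3), (0, 3), (1, 0), (1, 0), (1, 2)]
hot = select_hot_experts(trace, top_k=2)
print(hot)  # {0: [3, 1], 1: [0, 2]}
print(adapter_budget(hot, d_in=1024, d_out=4096, rank=8))
```

Adapting every expert in a layer would scale the budget with the full expert count, so restricting adapters to the selected subset is where the parameter savings come from.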
AI research in 2025 shifted from chatbots to reasoning systems, autonomous agents, and multimodal models. Key papers include DeepSeek-R1 (reinforcement learning), Gemini 2.5 (multimodal reasoning), Qwen2.5 (open models), Large Concept Models (concept-level language modeling), ESG analysis against greenwashing, VideoWorld (world models), AI Scientist-v2 (autonomous research), SWE-Lancer (coding agent benchmark), OLMo 2 (fully open language models), and Mixture-of-Recursions (efficient reasoning).
DeepSeek-R1 publicly demonstrated reinforcement learning for post-training, boosting reasoning and coding.
Gemini 2.5 introduced 'Thinking Mode' and advanced multimodal understanding with long context.
Researchers propose Group-Query Latent Attention (GQLA), a modification of DeepSeek's Multi-head Latent Attention that provides two hardware-adaptive decoding paths without retraining. This approach enables efficient inference on both H100 and H20 GPUs, and includes TransGQLA for converting pretrained GQA models.
GQLA extends DeepSeek's MLA with dual decoding paths (MQA-absorb and GQA) to match different hardware rooflines.
A single set of GQLA weights can be used on H100 (MQA path) or H20 (GQA path with multi-token prediction).
An eventful month with one flagship release after another. CAISI assessment shows open models lagging behind the US frontier, but methodology is questioned. Highlights include MiMo-V2.5-Pro, Gemma-4, Kimi-K2.6, Laguna-XS.2, and DeepSeek-V4-Flash.
Multiple open model releases from DeepSeek, Google, Moonshot AI, Xiaomi, and others.
CAISI evaluation shows large Elo gap, but benchmarks may underestimate real-world performance.
From Gemma 4 to DeepSeek V4, this article explores how new open-weight LLMs are reducing long-context costs through architectures like cross-layer KV sharing, per-layer embeddings, attention budgeting, compressed convolutional attention, and mHC.
Gemma 4 introduces cross-layer KV sharing, cutting KV cache size in half while maintaining quality.
Per-layer embeddings boost model capacity with minimal computational overhead.
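Why sharing keys and values across layers halves the cache can be seen from the standard KV cache size formula. The sketch below is a back-of-the-envelope estimate with hypothetical model dimensions. It is not Gemma 4's actual configuration, and `share_group` is an illustrative parameter name.

```python
import math


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, share_group=1):
    """Approximate KV cache size for one sequence.

    share_group: number of consecutive layers that reuse one K/V tensor
    (1 means no sharing). All dimensions are hypothetical inputs.
    """
    distinct_layers = math.ceil(num_layers / share_group)
    # Factor of 2 covers keys and values.
    return 2 * distinct_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


baseline = kv_cache_bytes(32, 8, 128, 4096)
shared = kv_cache_bytes(32, 8, 128, 4096, share_group=2)
print(baseline // 2**20, "MiB without sharing")   # 512 MiB without sharing
print(shared // 2**20, "MiB with pairs of layers sharing K/V")
```

Because the cache grows linearly with both sequence length and the number of distinct K/V layers, the saving applies at every context length.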
In an AI bot competition, participants computed the longest run of 1 bits in binary expansions of palindromic primes. DeepSeek V4-Pro won with 73 points, while ChatGPT and Grok failed to register due to misinterpretation of precomputation rules. Kimi benefited from a bug that accidentally gave correct answers in early rounds and won the final round.
DeepSeek won with 73 points, followed by Claude (60) and GLM (40).
ChatGPT and Grok were DNP because they precomputed before connecting and missed the 10-second registration window.
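The underlying task is small enough to sketch directly. The version below assumes the primes are palindromic in base 10 and uses plain trial division. The competition's exact rules, input sizes, and scoring are not given in the item, so those details are assumptions.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def is_palindrome(n):
    """True if n reads the same forwards and backwards in base 10."""
    s = str(n)
    return s == s[::-1]


def longest_run_of_ones(n):
    """Length of the longest run of consecutive 1 bits in n's binary expansion."""
    return max(len(run) for run in bin(n)[2:].split("0"))


palindromic_primes = [n for n in range(200) if is_palindrome(n) and is_prime(n)]
print(palindromic_primes)  # [2, 3, 5, 7, 11, 101, 131, 151, 181, 191]
print({p: longest_run_of_ones(p) for p in palindromic_primes})
```

For example, 191 is 10111111 in binary, so its longest run of 1 bits has length 6.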
We ran DeepSeek V4 Pro and DeepSeek V4 Flash through the same FlowGraph benchmark used for Claude Opus 4.7 and Kimi K2.6. The Pro scored 77/100 for $2.25, landing between Opus (91) and Kimi (68). The Flash scored 60/100 for $0.02, a record low cost, but the build failed and key outputs were missing. Both models had lease expiry bugs, though Flash outperformed expectations in tool calling reliability. Overall, Opus remains the top performer, but DeepSeek's pricing shifts the cost landscape significantly.
DeepSeek V4 Pro scored 77/100 at $2.25, outperforming Kimi K2.6 (68) but trailing Claude Opus 4.7 (91).
DeepSeek V4 Flash scored 60/100 at $0.02, the cheapest test result, but had critical build and routing issues.
Violin is an open-source AI video translation tool combining speech recognition, LLM translation, and text-to-speech to make video content accessible across languages. It offers a web app, CLI, and agent skills, featuring a video-aware chat assistant and personalized voice selection. Built with Together API using models like Whisper, DeepSeek, and Cartesia, it's released under the MIT license.
Violin integrates ASR, LLM translation, and TTS for open-source video translation.
Supports web app, CLI, and agent skills for diverse users.
Tencent announced plans to increase AI infrastructure spending in the second half of 2026, citing improved domestic chip supply from Chinese manufacturers. The company also reported strong first-quarter earnings and is reportedly in discussions to acquire a stake in AI startup DeepSeek.
Tencent will boost AI infrastructure spending in H2 2026.
Chinese chipmakers are increasing domestic AI chip production.
The US is leading in AI commercialization, with strong cloud infrastructure, data platforms, and energy advantages. Despite contenders like China's DeepSeek, the US clearly leads in revenue, adoption, tools, and reach. Europe lacks cloud scale and ecosystem, making it difficult to catch up. The AI race is also a security race, involving weaponized AI and a shift toward closed stacks.
Since DeepSeek R1's shock in January 2025, US companies have accelerated AI commercialization, leading in revenue, adoption, and tools.
The US owns global hyperscalers and data platforms like YouTube and GitHub, creating a complete AI ecosystem.
This paper challenges the assumption that chain-of-thought reasoning reduces bias, demonstrating that position bias in multiple-choice QA actually increases with reasoning trajectory length. Across 13 configurations, 12 show a positive partial correlation between trajectory length and Position Bias Score (PBS). Truncation experiments confirm causality, and the 671B DeepSeek-R1 shows low overall bias but a persistent length effect in the longest quartile. Direct-answer position bias is a distinct phenomenon. The findings argue against assuming reasoning models are order-robust and provide a diagnostic toolkit.
Position bias scales with reasoning trajectory length across multiple reasoning-capable models, even after controlling for accuracy.
Truncation intervention causally links longer reasoning to increased bias toward position-preferred options (16% to 32% for R1-Qwen-7B).
This week's AI developments highlight a shift from a model race to an infrastructure race. Anthropic's natural language autoencoders enable interpretability via language, OpenAI's voice models push conversational interfaces, SubQ claims a 12M-token context window, and Chinese AI labs like DeepSeek and Moonshot see soaring valuations. The editorial underscores that AI is becoming more inspectable, conversational, memory-rich, and institutionally valuable.
Anthropic's natural language autoencoders turn model activations into readable text, opening new interpretability paths
OpenAI's voice models transform AI from text-based queries to real-time conversational agents
Baidu has officially launched its new generation foundational large model, Wenxin 5.1. Using 'multi-dimensional elastic pre-training' technology, it achieves leading basic performance with only about 6% of the pre-training cost of comparable models. It has topped the LMArena search ranking in China and ranks fourth globally. Agent capabilities have significantly improved, surpassing DeepSeek-V4-Pro, and creative writing is on par with Gemini 3.1 Pro.
Wenxin 5.1 uses multi-dimensional elastic pre-training, reducing pre-training cost to 6% of industry average for similar scale models.
Topped LMArena search ranking in China with a score of 1223, fourth globally, the only domestic model on the leaderboard.
DeepSeek aims to raise up to 50 billion yuan in its first funding round, with founder Liang Wenfeng personally contributing 20 billion. The company's valuation has surged to 350 billion yuan, and the V4.1 model is set for a June release, signaling a shift from an idealistic lab to a commercial AI company.
DeepSeek plans a $7.35B funding round, a record for a Chinese AI company, with V4.1 launching in June. Core Automation, founded by ex-OpenAI researcher Jerry Tworek just six weeks ago, is already targeting a $4B valuation.
DeepSeek plans a $7.35B funding round, the largest for a Chinese AI company.
Stagewise is an open source agentic IDE for developers featuring a built-in coding agent that can access the browser's console and debugger. It supports bring-your-own-key for popular AI providers like Z.ai, DeepSeek, Moonshot, and more, allowing developers to browse and build without context switching.
Open source agentic IDE with built-in coding agent
Huawei expects AI chip revenue to reach $12 billion by 2026, driven by demand from Alibaba, ByteDance, and Tencent, as Nvidia’s China market share drops to zero. Key challenges include SMIC’s limited advanced-node capacity, low yields, and long cycle times. The Ascend 950PR is now the primary AI chip for Chinese cloud providers, boosted by DeepSeek V4’s optimization for Huawei’s architecture.
Huawei projects $12 billion in AI chip revenue by 2026, up from $7.5 billion in 2025, marking roughly 60% annual growth.
Nvidia's China AI accelerator market share has collapsed to zero, according to CEO Jensen Huang, due to U.S. export restrictions and Beijing's push for domestic sourcing.
Salvatore Sanfilippo (antirez), the creator of Redis, has open-sourced ds4.c, a lightweight inference engine tailored for DeepSeek V4 Flash. It runs efficiently on Apple Silicon Macs using Metal API, achieving up to 27 tokens/s generation on high-end models.
Antirez releases ds4.c, a Metal-only inference engine for DeepSeek V4 Flash, optimized for Mac. No other models supported.
Employs asymmetric quantization (2-bit for MoE expert layers, Q8 for others) and disk-based KV caching for speed.
Nathan Lambert from AI2 shares his insights after a 36-hour visit to Chinese AI labs. He found a collaborative culture, students as core contributors, and a deep respect for DeepSeek, contrasting with the US competitive atmosphere.
Chinese AI labs foster a collaborative culture where students work on core R&D.
ByteDance is feared, DeepSeek is universally admired.
ZAYA1-8B is a reasoning-focused mixture-of-experts model with 700M active and 8B total parameters, trained on AMD hardware. It matches or exceeds DeepSeek-R1-0528 on math and coding benchmarks and introduces Markovian RSA for test-time compute.
ZAYA1-8B features 700M active parameters and 8B total parameters, trained on a full-stack AMD platform.
It matches or exceeds DeepSeek-R1-0528 on multiple math and coding benchmarks.
DeepSeek-V4's hybrid attention design (CSA, HCA, SWA) compresses KV cache, turning million-token context from a model challenge into a serving-systems problem. Together AI's early bring-up on NVIDIA HGX B200 reveals how cache policy, prefix caching, and endpoint profiles impact long-context workloads.
DeepSeek-V4's compressed sparse attention (CSA) and heavily compressed attention (HCA) reduce KV cache size, but the inference engine must manage multiple cache layouts.
Sliding window attention (SWA) becomes a bottleneck at long context, requiring careful storage strategy.
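For readers unfamiliar with sliding window attention, the sketch below shows the generic idea: each query sees only the most recent `window` positions, so a per-layer cache can be a fixed-size ring buffer. This is a textbook illustration with hypothetical class and function names, not DeepSeek-V4's or Together AI's storage strategy.

```python
from collections import deque


class SlidingWindowCache:
    """Keeps K/V entries for only the last `window` positions."""

    def __init__(self, window):
        self.window = window
        self.entries = deque(maxlen=window)  # oldest entry is evicted automatically

    def append(self, position, kv):
        self.entries.append((position, kv))

    def visible_positions(self):
        return [position for position, _ in self.entries]


def causal_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


cache = SlidingWindowCache(window=3)
for pos in range(5):
    cache.append(pos, kv=None)  # kv payload omitted in this sketch
print(cache.visible_positions())      # [2, 3, 4]
print(causal_window_mask(4, 2)[3])    # [False, False, True, True]
```

The eviction keeps memory bounded during decoding, but it also means evicted entries cannot be reused later, which is one reason the storage policy for these layers interacts with prefix caching.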
As the AI industry enters the Agent era, token demand has exploded. Infinigence AI, China's leading neutral AGI infrastructure provider, has raised over 2.2 billion yuan in total, with daily token calls growing over 20-fold since the end of 2025. The company underpins major Chinese models like Kimi, GLM, MiniMax, and DeepSeek, positioning itself as a key hub in the token economy.
Agent era drives token consumption from hundreds to millions per task, reshaping infrastructure needs.
Infinigence AI's token call volume doubles every two weeks, far outpacing the national average.
DeepSeek is close to a funding round that could value the Chinese AI lab at roughly $45 billion, according to the Financial Times. The talks are being led by the China Integrated Circuit Industry Investment Fund, with Tencent also negotiating a stake.
DeepSeek's valuation could reach $45B in upcoming round
China's state chip fund 'Big Fund' is leading the talks
Amazon SageMaker AI now includes an AI agent that lets developers describe use cases in plain language, automatically recommends training methods, prepares data, kicks off training, and delivers editable Jupyter notebooks. It supports the Llama, Qwen, DeepSeek, and Nova model families.
SageMaker AI introduces the Kiro AI agent for automated fine-tuning via natural language.
The agent is preinstalled in the development environment; alternative agents like Claude Code can be used.
The first week of the Musk v. Altman trial concluded with Musk's testimony dominating; OpenAI and Microsoft renegotiate their partnership, ending exclusivity; DeepSeek previews V4 models that narrow the gap with frontier models; Google DeepMind introduces Vision Banana, a unified model for image generation and visual understanding.
Musk admitted xAI partly distilled from OpenAI models during the trial's first week.
Microsoft and OpenAI revised their agreement, ending Microsoft's exclusive cloud rights; OpenAI can now use AWS and other providers.
Our 243rd episode with a summary and discussion of last week’s big AI news, including OpenAI's GPT-5.5, xAI's Grok Voice Think Fast 1.0, DeepSeek V4 open source, Google's massive investment in Anthropic, and safety research on sabotage and document corruption.
OpenAI released GPT-5.5 with strong coding improvements and a system card on chain-of-thought monitorability
xAI launched Grok Voice Think Fast 1.0, claiming big benchmark leads in real-time voice agents
DeepSeek-TUI is a Rust-based terminal coding agent optimized for DeepSeek models. It recently surged in popularity after the release of DeepSeek-V4 and the developer's Chinese-language promotion, hitting GitHub's trending list with over 2,300 stars. The tool offers chain-of-thought visualization, context compression, RLM multi-agent parallelism, and multiple model switching options.
DeepSeek-TUI is a terminal coding agent akin to Claude Code, specifically optimized for DeepSeek models, now with 2.3k GitHub stars.
Created by independent developer Hunter Bown, it is written in Rust and open-sourced under the MIT license.
DeepSeek V4's technical report introduced many innovations but notably lacked Engram, a conditional memory module jointly open-sourced by DeepSeek and Peking University in January 2026. Engram acts as a native lookup table for Transformers, separating static knowledge retrieval from deep reasoning, which improves efficiency and reasoning performance. Although absent from V4, three subsequent papers explored Engram's potential in CXL memory pooling, collision-free hot-layer optimization, and vision tasks.
DeepSeek V4 omitted Engram, a highly anticipated conditional memory module.
Engram uses hash-based lookup for static knowledge, freeing up network capacity for advanced reasoning.
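The general idea of a hash-based lookup table for static knowledge can be sketched as a hashed n-gram embedding memory. The item above does not describe Engram's actual design, so this is a generic illustration: the class name, the choice of n-grams as keys, the hash function, and the table layout are all assumptions.

```python
import hashlib


def ngram_slot(tokens, table_size):
    """Map a token n-gram to a table slot with a deterministic hash."""
    key = ",".join(str(t) for t in tokens).encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % table_size


class HashedNgramMemory:
    """Fixed-size table of vectors addressed by hashed n-grams (hypothetical)."""

    def __init__(self, table_size, dim):
        self.table_size = table_size
        # Zero-initialised here; in a trained model these rows would be learned.
        self.table = [[0.0] * dim for _ in range(table_size)]

    def lookup(self, token_ids, n=2):
        """Return one memory vector per position, keyed by the trailing n-gram."""
        vectors = []
        for i in range(len(token_ids)):
            ngram = token_ids[max(0, i - n + 1): i + 1]
            vectors.append(self.table[ngram_slot(ngram, self.table_size)])
        return vectors


memory = HashedNgramMemory(table_size=1024, dim=4)
vectors = memory.lookup([17, 42, 42, 17, 42])
print(len(vectors), len(vectors[0]))  # 5 4
```

A lookup like this costs a hash and a memory read rather than a pass through attention and feed-forward layers, which is the sense in which retrieval is separated from reasoning. Distinct n-grams can collide in the same slot, the issue that the "collision-free" follow-up work mentioned above targets.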
This article announces the second wave call for speakers for the AI Engineer World's Fair, covering new tracks like autoresearch, memory, world models, tokenmaxxing, agentic commerce, and vertical AI. It also recaps recent AI developments, including Grok 4.3 release, DeepSeek V4 Pro progress, Codex vs Claude Code competition, agent infrastructure research, and discussions from the local LLM community.
Second wave call for speakers for AI Engineer World's Fair, with new topic tracks.
Grok 4.3 released with mixed reception; DeepSeek V4 Pro emerges as credible open-weight coding model.
This week OpenAI faced a triple blow: Musk's lawsuit threatens its nonprofit-to-for-profit conversion, a revenue miss triggered market selloffs, and an AWS deal ended Microsoft's exclusivity. Meanwhile, the DeepSeek price war, Big Tech layoffs, and White House plans to bypass Anthropic's safety flags signal a shift in pricing power and the regulatory landscape.
Musk's $134B lawsuit to return OpenAI to nonprofit status will set legal precedent for for-profit conversions in AI.
OpenAI's revenue came in below the forecasts underpinning Oracle's $300B compute contract, dragging down chip stocks.
DeepSeek-V4 is not just another frontier model; it is a systems engineering approach to making long-context reasoning practical, addressing the challenge of economically using a million-token context window through a new memory hierarchy, attention mechanics, and training stabilizers.
DeepSeek-V4 supports a one-million-token context window, but the focus is on economically using that context rather than just ingesting it.
The model introduces a new memory hierarchy, attention mechanics, training stabilizers, optimizer choices, quantization regimes, and serving stack to make long-context reasoning practical.
Despite a quiet day, notable releases include NVIDIA Nemotron 3 Nano Omni, vLLM v0.20, Poolside's first public model, and DeepSeek V4 serving benchmarks. Agent tooling matures and new benchmarks emerge.
NVIDIA releases Nemotron 3 Nano Omni, a 30B multimodal MoE with 256K context.
vLLM v0.20 introduces TurboQuant, FA4 for MLA, and new IR foundation.
DeepSeek-V4 Pro, a 1.6T-parameter MoE reasoning model, is now available on Together AI with a 512K context window, controllable reasoning modes, and cached-input pricing for long-context workloads like code agents, document intelligence, and research synthesis.
1.6T-parameter MoE with 49B activated parameters, 512K context on Together AI (model supports 1M)
Three reasoning modes: Non-Think, Think High, Think Max to match effort to task
This week's AI developments highlight a shift from model launches to AI becoming operational, with OpenAI releasing GPT-5.5, Workspace Agents, and ChatGPT Images 2.0; xAI making a deal with Cursor; and DeepSeek V4 and Kimi 2.6 advancing. Research papers cover distributed pre-training, multimodal understanding, and agentic coding.
OpenAI launches GPT-5.5, Workspace Agents, and ChatGPT Images 2.0, signaling AI's shift from conversation to execution
xAI strikes a deal with Cursor, underscoring code as the ideal environment for agents
After months of delay, DeepSeek released the highly anticipated DSV4 series including Pro and Flash variants with 1M context, mixed-precision quantization, MIT license, and Huawei Ascend support. The series tops open-weight models but lags behind closed frontier models.
DSV4 Pro: 1.6T total / 49B active, Flash: 284B total / 13B active, 1M context
New architecture with CSA and HCA dramatically reduces KV cache to 10% of V3.2
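The sparsity these figures imply can be worked out directly. The sketch below uses only the parameter counts quoted above. The helper names are hypothetical, and the 40 GB baseline in the KV cache example is a placeholder, not a measured V3.2 figure.

```python
def active_fraction(active_params_b, total_params_b):
    """Share of the model's weights used for each token (both in billions)."""
    return active_params_b / total_params_b


def reduced_kv_cache(baseline_gb, ratio=0.10):
    """KV cache size after shrinking to `ratio` of a baseline."""
    return baseline_gb * ratio


models = {"V4 Pro": (49, 1600), "V4 Flash": (13, 284)}
for name, (active, total) in models.items():
    print(f"{name}: {active_fraction(active, total):.2%} of weights active per token")
# V4 Pro: 3.06% of weights active per token
# V4 Flash: 4.58% of weights active per token

# Hypothetical 40 GB baseline cache reduced to 10%.
print(round(reduced_kv_cache(40.0), 2))
```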