AI News Hub

Source Mix

  • QbitAI (量子位): 12
  • Hacker News AI: 9
  • The Decoder: 8
  • Latent Space: 4
  • TheSequence: 3
  • Together AI Blog: 3
  • AI Weekly: 2
  • arXiv AI: 2

Topic Mix

  • Agents: 33
  • Models: 29
  • Chips: 25
  • Research: 16
  • Policy: 7
  • Startups: 6
  • Tools: 3
  • Robotics: 1

Timeline

  • 2026-05-08: 8
  • 2026-05-23: 4
  • 2026-04-29: 3
  • 2026-05-26: 3
  • 2026-05-04: 2
  • 2026-05-05: 2
  • 2026-05-09: 2
  • 2026-05-13: 2

Latest Updates

Show HN: I packaged a Python AI agent and Vue dashboard into one Electron app

Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.

  • Bundles Python runtime and hermes-agent for a zero-dependency user experience
  • Uses Electron shell with hermes-web-ui frontend

DeepSeek Researcher Develops Automated Research Skill: Writing a Paper with Only 2 Hours of Human Brain Time

DeepSeek researcher Chen Deli used his self-developed DeliAutoResearch skill, collaborating with DeepSeek-V4-Pro and GPT-Image2, to complete a 46-page paper in just 6 days. The paper introduces an L1-L5 autonomy classification for research agents, analyzes four architectural patterns and 17 mainstream systems, and identifies six open problems. Chen Deli says only about 2 hours of human 'CPU time' were needed, with the rest handled by AI agents.

  • Chen Deli's DeliAutoResearch skill enabled the paper to be 99% written by AI agents.
  • The paper proposes an L1-L5 autonomy classification for research agents, analogous to SAE levels for autonomous driving.

AI Weekly Issue #496: Anthropic's Pentagon model is now everyone's model

Anthropic released its formerly classified Mythos model to the public, collapsing the gap between sovereign and developer AI. DeepMind's Demis Hassabis moved his AGI timeline up to 2029. Critical vulnerabilities in Starlette impacted millions of AI agents, and a coordinated takedown dismantled the Glassworm botnet. BNP Paribas partnered with Mistral for sovereign AI security, while China restricted travel for top AI engineers at Alibaba and DeepSeek. Corporate AI spending and layoffs made headlines: Uber burned its full-year AI budget by April, ClickUp restructured with a 3:1 AI-to-human ratio, and Sam Altman reversed his white-collar apocalypse prediction. However, MIT Technology Review data showed AI-exposed roles have lower unemployment.

  • Anthropic releases Mythos, previously limited to government contractors, through its standard API.
  • DeepMind CEO Hassabis advances AGI timeline to 2029, citing AlphaProof Nexus solving nine Erdős problems cheaply.

Introducing DSA Attention to Multimodal: Kuaishou Keye 2.0 Opens a New Paradigm of Enhanced Reasoning

Kuaishou releases Keye-VL-2.0-30B-A3B, a multimodal large language model that is the first to apply DeepSeek Sparse Attention (DSA) to multimodal scenarios, enabling deep perception over an ultra-long 256K context. It achieves SOTA on long-video temporal understanding benchmarks and introduces built-in Agent collaboration, paving the way for enhanced reasoning and real-world business applications.

  • First to integrate DSA into a multimodal model, addressing long-video understanding bottlenecks.
  • Achieves SOTA on TimeLens, LongVideoBench, MLVU; reverses long-context decay by boosting accuracy from 35.34% to 42.44% when scaling from 64 to 512 frames.

Cited AI Workspace: No More Re-Uploading Files

UUMuse is a cloud AI knowledge base platform where you upload files once and use them across GPT, Claude, DeepSeek, Qwen, and more. It offers cited answers, persistent memory, agent mode, a multi-expert debate feature (Spark), and flexible deployment as docs sites, APIs, or MCP servers.

  • Upload files once and query multiple AI models (GPT, Claude, DeepSeek, Qwen) with source citations.
  • Persistent memory remembers your writing style and project context across conversations.

DeepSeek V4 Gets Even Cheaper: New Tool Boasts 99.82% Cache Hit Rate, Slashes Bills to 20%

One month after DeepSeek V4's release, the open-source community unveiled Reasonix, a tool specifically designed to minimize API costs by maximizing cache efficiency. It achieves a staggering 99.82% cache hit rate, reducing a $61 bill for 400M+ tokens to just $12.

  • Reasonix is a dedicated coding harness for DeepSeek, focusing on cost reduction.
  • Its cache-first loop, tool-call repair, and automatic context compression keep the cache hit rate above 90% in long sessions.
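
The bill figures quoted above can be checked with simple arithmetic. The sketch below also illustrates why a hit rate close to 100% pulls the average input price toward the cached price. The summary does not state DeepSeek's cache-hit or cache-miss prices, so the function name `blended_input_price` and both price values are hypothetical, chosen only for illustration.

```python
# Figures quoted in the summary above.
bill_before_usd = 61.0
bill_after_usd = 12.0
cache_hit_rate = 0.9982

# Share of the original bill that is still paid after the reduction.
remaining_fraction = bill_after_usd / bill_before_usd
print(f"Remaining share of the bill: {remaining_fraction:.1%}")


def blended_input_price(hit_rate, hit_price, miss_price):
    """Average price per input token when a fraction hit_rate is served from cache."""
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# HYPOTHETICAL prices in arbitrary units; not taken from any price list.
hypothetical_hit_price = 0.1
hypothetical_miss_price = 1.0

blended = blended_input_price(cache_hit_rate, hypothetical_hit_price, hypothetical_miss_price)
print(f"Blended price at a {cache_hit_rate:.2%} hit rate: {blended:.5f}")
```

The remaining share works out to a little under one fifth, which is consistent with the headline's "to 20%".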

DeepSeek makes its 75 percent discount permanent, pricing output tokens at least 34x below GPT-5.5

DeepSeek is making the 75 percent discount on its top model V4-Pro permanent. At $0.435 per million input tokens, it's at least 11.5 times cheaper than GPT-5.5 and over 34 times cheaper on output. For token-hungry agentic systems, this kind of pricing could squeeze Western providers hard.

  • DeepSeek's 75% discount on V4-Pro is now permanent.
  • Input token price is $0.435 per million, 11.5x cheaper than GPT-5.5.
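
Two numbers follow from the figures above: the pre-discount input price and a lower bound on GPT-5.5's input price. The sketch below is a back-of-the-envelope check, assuming that "N times cheaper" means the competitor's price divided by DeepSeek's price; the variable names are illustrative.

```python
# Figures quoted in the summary above.
discounted_input_usd = 0.435  # USD per million input tokens
discount = 0.75               # 75 percent discount
input_ratio = 11.5            # "at least 11.5 times cheaper" on input

# discounted = original * (1 - discount), so solve for the original price.
original_input_usd = discounted_input_usd / (1 - discount)

# ASSUMPTION: "N times cheaper" means competitor_price / deepseek_price >= N.
implied_gpt55_input_floor_usd = discounted_input_usd * input_ratio

print(f"Implied pre-discount input price: ${original_input_usd:.2f} per million tokens")
print(f"Implied GPT-5.5 input price floor: ${implied_gpt55_input_floor_usd:.2f} per million tokens")
```

The output price is not quoted in the summary, so the "34x" claim cannot be checked the same way.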

Alibaba's latest AI model ran autonomously for 35 hours to optimize code for its own custom chip

Alibaba's Qwen team releases Qwen3.7-Max, a proprietary model built for long-running autonomous agent tasks. It matches Claude Opus 4.6 on benchmarks and beats Chinese rivals like DeepSeek V4 Pro and Kimi K2.6. The team also demos the model steering a four-legged robot.

  • Qwen3.7-Max designed for long-running autonomous tasks
  • Matches Claude Opus 4.6, beats Chinese rivals

DeepSeek V4 Slashes Prices Permanently; CATL, JD, NetEase Rush to Invest; Liang Wenfeng: Goal is AGI

DeepSeek announced permanent price cuts for its V4-Pro API. Meanwhile, CATL, JD, and NetEase are in talks to invest in DeepSeek's first external funding round. Founder Liang Wenfeng emphasizes prioritizing AGI research and maintaining open-source principles.

  • DeepSeek V4-Pro API permanently reduced to one-quarter of original price
  • CATL, JD, and NetEase among companies negotiating investment in DeepSeek

[AINews] All Model Labs are now Agent Labs

Ahead of OpenAI's likely IPO filing, Greg Brockman signals a shift from pure models to agent products. DeepSeek makes 75% price cut permanent, MCP protocol becomes stateless, Google launches 24/7 AI agent, and Anthropic finds over 10,000 critical vulnerabilities. Agentification is the new normal.

  • Greg Brockman says the model alone is no longer the product; harness+agent+workflow is key
  • DeepSeek V4 Pro permanently discounted 75%, slashing inference costs

70 Billion Raised! DeepSeek Code Is Coming, ACM Gold Medalist Cui Tianyi at the Helm

DeepSeek has raised 70 billion yuan and is developing its own AI coding product, DeepSeek Code. Senior researcher Chen Deli posted a recruiting notice for a Code Harness team, and former TSY Capital co-founder Cui Tianyi may lead the team.

  • DeepSeek raised 70 billion yuan, prioritizing AI research over commercialization.
  • DeepSeek Code confirmed in development, hiring for Agent Harness team.

Async Python client for private DeepSeek API

aiodeepseek is a high-performance async Python client for the private DeepSeek API, supporting streaming, image uploads, multi-turn conversations, and new account registration. It automatically solves proof-of-work challenges using C++ with AVX2 optimization.

  • Async Python client with streaming and image upload support
  • Multi-turn conversation and account registration features

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

HELLoRA is a parameter-efficient fine-tuning method for Mixture-of-Experts (MoE) models that attaches LoRA modules only to the most frequently activated experts per layer. It reduces trainable parameters and adapter FLOPs while improving downstream performance. Tested on OLMoE, Mixtral, and DeepSeekMoE across math, code, and safety tasks, HELLoRA significantly outperforms vanilla LoRA; on OLMoE, for example, it uses 15.7% of the trainable parameters while achieving 9.2% higher accuracy.

  • HELLoRA attaches LoRA only to the most active experts per layer in MoE models.
  • It achieves superior performance with far fewer trainable parameters and FLOPs.
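
The selection step described above, picking the most frequently activated experts in each layer, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function `select_hot_experts`, the `routing_trace` input format, and the fixed `top_k` cutoff are all assumptions, since the summary does not say how activations are counted or how many experts are kept.

```python
from collections import Counter


def select_hot_experts(routing_trace, top_k):
    """Return, per layer, the top_k expert ids ranked by activation count.

    routing_trace maps a layer index to the sequence of expert ids the router
    picked on some calibration data (a hypothetical input format).
    """
    hot = {}
    for layer, expert_ids in routing_trace.items():
        counts = Counter(expert_ids)
        # most_common orders by count, highest first.
        hot[layer] = [expert for expert, _ in counts.most_common(top_k)]
    return hot


# Toy routing trace: two layers, a handful of routed tokens each.
trace = {
    0: [3, 1, 3, 2, 3, 1],
    1: [0, 0, 5, 5, 5, 7],
}
hot = select_hot_experts(trace, top_k=2)
print(hot)
```

Under this sketch, LoRA adapters would be attached only to the experts listed for each layer, and the remaining experts would stay frozen.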

Top 10 AI Research Papers of 2025

AI research in 2025 shifted from chatbots to reasoning systems, autonomous agents, and multimodal models. Key papers include DeepSeek-R1 (reinforcement learning), Gemini 2.5 (multimodal reasoning), Qwen2.5 (open models), Large Concept Models (concept-level language modeling), ESG analysis against greenwashing, VideoWorld (world models), AI Scientist-v2 (autonomous research), SWE-Lancer (coding agent benchmark), OLMo 2 (fully open language models), and Mixture-of-Recursions (efficient reasoning).

  • DeepSeek-R1 publicly demonstrated reinforcement learning for post-training, boosting reasoning and coding.
  • Gemini 2.5 introduced 'Thinking Mode' and advanced multimodal understanding with long context.

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

Researchers propose Group-Query Latent Attention (GQLA), a modification of DeepSeek's Multi-head Latent Attention that provides two hardware-adaptive decoding paths without retraining. This approach enables efficient inference on both H100 and H20 GPUs, and includes TransGQLA for converting pretrained GQA models.

  • GQLA extends DeepSeek's MLA with dual decoding paths (MQA-absorb and GQA) to match different hardware rooflines.
  • A single set of GQLA weights can be used on H100 (MQA path) or H20 (GQA path with multi-token prediction).

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

An eventful month with one flagship release after another. CAISI assessment shows open models lagging behind the US frontier, but methodology is questioned. Highlights include MiMo-V2.5-Pro, Gemma-4, Kimi-K2.6, Laguna-XS.2, and DeepSeek-V4-Flash.

  • Multiple open model releases from DeepSeek, Google, Moonshot AI, Xiaomi, and others.
  • CAISI evaluation shows large Elo gap, but benchmarks may underestimate real-world performance.

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

From Gemma 4 to DeepSeek V4, this article explores how new open-weight LLMs are reducing long-context costs through architectures like cross-layer KV sharing, per-layer embeddings, attention budgeting, compressed convolutional attention, and mHC.

  • Gemma 4 introduces cross-layer KV sharing, cutting KV cache size in half while maintaining quality.
  • Per-layer embeddings boost model capacity with minimal computational overhead.

Different models solve number-theory race problem

In an AI bot competition, participants computed the longest run of 1 bits in binary expansions of palindromic primes. DeepSeek V4-Pro won with 73 points, while ChatGPT and Grok failed to register due to misinterpretation of precomputation rules. Kimi benefited from a bug that accidentally gave correct answers in early rounds and won the final round.

  • DeepSeek won with 73 points, followed by Claude (60) and GLM (40).
  • ChatGPT and Grok did not participate (DNP) because they precomputed before connecting and missed the 10-second registration window.
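
The quantity the bots computed can be sketched in a few lines. The summary does not give the competition's exact rules, so this assumes "palindromic" refers to the decimal representation and uses plain trial division; the helper names are illustrative.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_decimal_palindrome(n):
    """ASSUMPTION: palindromic means the decimal digits read the same both ways."""
    s = str(n)
    return s == s[::-1]


def longest_run_of_ones(n):
    """Length of the longest block of consecutive 1 bits in n's binary expansion."""
    return max(len(run) for run in bin(n)[2:].split("0"))


palindromic_primes = [n for n in range(200) if is_prime(n) and is_decimal_palindrome(n)]
for p in palindromic_primes:
    print(p, bin(p)[2:], longest_run_of_ones(p))
```

For example, 191 is `10111111` in binary, so its longest run of 1 bits has length 6.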

We Tested DeepSeek V4 Pro and Flash Against Claude Opus 4.7 and Kimi K2.6

We ran DeepSeek V4 Pro and DeepSeek V4 Flash through the same FlowGraph benchmark used for Claude Opus 4.7 and Kimi K2.6. The Pro scored 77/100 for $2.25, landing between Opus (91) and Kimi (68). The Flash scored 60/100 for $0.02, a record low cost, but the build failed and key outputs were missing. Both models had lease expiry bugs, though Flash outperformed expectations in tool calling reliability. Overall, Opus remains the top performer, but DeepSeek's pricing shifts the cost landscape significantly.

  • DeepSeek V4 Pro scored 77/100 at $2.25, outperforming Kimi K2.6 (68) but trailing Claude Opus 4.7 (91).
  • DeepSeek V4 Flash scored 60/100 at $0.02, the cheapest test result, but had critical build and routing issues.
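
One way to read the cost shift is benchmark points per dollar. The sketch below uses only the two score and cost pairs quoted above; the costs for Opus and Kimi are not given in the summary, so they are left out. Points per dollar is an illustrative metric chosen here, not one the article reports.

```python
# (score out of 100, cost in USD) as quoted in the summary above.
results = {
    "DeepSeek V4 Pro": (77, 2.25),
    "DeepSeek V4 Flash": (60, 0.02),
}

points_per_usd = {name: score / cost for name, (score, cost) in results.items()}

for name, value in points_per_usd.items():
    print(f"{name}: {value:.1f} points per USD")
```

The ratio ignores that the Flash run failed its build, so it measures price efficiency rather than usable output.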

Violin: An open-source video translation skill that breaks language barriers

Violin is an open-source AI video translation tool combining speech recognition, LLM translation, and text-to-speech to make video content accessible across languages. It offers a web app, CLI, and agent skills, featuring a video-aware chat assistant and personalized voice selection. Built with Together API using models like Whisper, DeepSeek, and Cartesia, it's released under the MIT license.

  • Violin integrates ASR, LLM translation, and TTS for open-source video translation.
  • Supports web app, CLI, and agent skills for diverse users.

Tencent plans to ramp up AI spending as China's chip supply allegedly improves

Tencent announced plans to increase AI infrastructure spending in the second half of 2026, citing improved domestic chip supply from Chinese manufacturers. The company also reported strong first-quarter earnings and is reportedly in discussions to acquire a stake in AI startup DeepSeek.

  • Tencent will boost AI infrastructure spending in H2 2026.
  • Chinese chipmakers are increasing domestic AI chip production.

The US Is Winning the AI Race

The US is leading in AI commercialization, with strong cloud infrastructure, data platforms, and energy advantages. Despite contenders like China's DeepSeek, the US clearly leads in revenue, adoption, tools, and reach. Europe lacks cloud scale and ecosystem, making it difficult to catch up. The AI race is also a security race, involving weaponized AI and a shift toward closed stacks.

  • Since DeepSeek R1's shock in January 2025, US companies have accelerated AI commercialization, leading in revenue, adoption, and tools.
  • The US owns global hyperscalers and data platforms like YouTube and GitHub, creating a complete AI ecosystem.

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

This paper challenges the assumption that chain-of-thought reasoning reduces bias, demonstrating that position bias in multiple-choice QA actually increases with reasoning trajectory length. Across 13 configurations, 12 show a positive partial correlation between trajectory length and Position Bias Score (PBS). Truncation experiments confirm causality, and the 671B DeepSeek-R1 shows low overall bias but a persistent length effect in the longest quartile. Direct-answer position bias is a distinct phenomenon. The findings argue against assuming reasoning models are order-robust and provide a diagnostic toolkit.

  • Position bias scales with reasoning trajectory length across multiple reasoning-capable models, even after controlling for accuracy.
  • Truncation intervention causally links longer reasoning to increased bias toward position-preferred options (16% to 32% for R1-Qwen-7B).

The Sequence Radar #857: Last Week in AI: Inside the Machine, Outside the Text Box

This week's AI developments highlight a shift from a model race to an infrastructure race. Anthropic's natural language autoencoders enable interpretability via language, OpenAI's voice models push conversational interfaces, SubQ claims a 12M-token context window, and Chinese AI labs like DeepSeek and Moonshot see soaring valuations. The editorial underscores that AI is becoming more inspectable, conversational, memory-rich, and institutionally valuable.

  • Anthropic's natural language autoencoders turn model activations into readable text, opening new interpretability paths
  • OpenAI's voice models transform AI from text-based queries to real-time conversational agents

Baidu Releases Wenxin 5.1: Search Capability Tops Domestic Charts, Pre-training Cost Only 6% of Industry Average

Baidu has officially launched its new generation foundational large model, Wenxin 5.1. Using 'multi-dimensional elastic pre-training' technology, it achieves leading basic performance with only about 6% of the pre-training cost of comparable models. It has topped the LMArena search ranking in China and ranks fourth globally. Agent capabilities have significantly improved, surpassing DeepSeek-V4-Pro, and creative writing is on par with Gemini 3.1 Pro.

  • Wenxin 5.1 uses multi-dimensional elastic pre-training, reducing pre-training cost to 6% of industry average for similar scale models.
  • Topped LMArena search ranking in China with a score of 1223, fourth globally, the only domestic model on the leaderboard.

Show HN: Stagewise – Agentic IDE for Your Z.ai/DeepSeek/Moonshot Subscription

Stagewise is an open source agentic IDE for developers featuring a built-in coding agent that can access the browser's console and debugger. It supports bring-your-own-key for popular AI providers like Z.ai, DeepSeek, Moonshot, and more, allowing developers to browse and build without context switching.

  • Open source agentic IDE with built-in coding agent
  • Bring your own API key for multiple AI providers

Huawei braces for $12B in AI chip revenue as Chinese fabs can barely keep up

Huawei expects AI chip revenue to reach $12 billion by 2026, driven by demand from Alibaba, ByteDance, and Tencent, as Nvidia’s China market share drops to zero. Key challenges include SMIC’s limited advanced-node capacity, low yields, and long cycle times. The Ascend 950PR is now the primary AI chip for Chinese cloud providers, boosted by DeepSeek V4’s optimization for Huawei’s architecture.

  • Huawei projects $12 billion in AI chip revenue by 2026, up from $7.5 billion in 2025, marking over 60% annual growth.
  • Nvidia's China AI accelerator market share has collapsed to zero, according to CEO Jensen Huang, due to U.S. export restrictions and Beijing's push for domestic sourcing.

Redis Creator Builds a Dedicated Inference Engine for DeepSeek V4: ds4.c

Salvatore Sanfilippo (antirez), the creator of Redis, has open-sourced ds4.c, a lightweight inference engine tailored for DeepSeek V4 Flash. It runs efficiently on Apple Silicon Macs using Metal API, achieving up to 27 tokens/s generation on high-end models.

  • Antirez releases ds4.c, a Metal-only inference engine for DeepSeek V4 Flash, optimized for Mac. No other models supported.
  • Employs asymmetric quantization (2-bit for MoE expert layers, Q8 for others) and disk-based KV caching for speed.

All Labs Fear ByteDance, Everyone Praises DeepSeek: US Researcher's 36-Hour AI Tour in China

Nathan Lambert from AI2 shares his insights after a 36-hour visit to Chinese AI labs. He found a collaborative culture, students as core contributors, and a deep respect for DeepSeek, contrasting with the US competitive atmosphere.

  • Chinese AI labs foster a collaborative culture where students work on core R&D.
  • ByteDance is feared, DeepSeek is universally admired.

ZAYA1-8B Technical Report

ZAYA1-8B is a reasoning-focused mixture-of-experts model with 700M active and 8B total parameters, trained on AMD hardware. It matches or exceeds DeepSeek-R1-0528 on math and coding benchmarks and introduces Markovian RSA for test-time compute.

  • ZAYA1-8B features 700M active parameters and 8B total parameters, trained on a full-stack AMD platform.
  • It matches or exceeds DeepSeek-R1-0528 on multiple math and coding benchmarks.

Serving DeepSeek-V4: why million-token context is an inference systems problem

DeepSeek-V4's hybrid attention design (CSA, HCA, SWA) compresses KV cache, turning million-token context from a model challenge into a serving-systems problem. Together AI's early bring-up on NVIDIA HGX B200 reveals how cache policy, prefix caching, and endpoint profiles impact long-context workloads.

  • DeepSeek-V4's compressed sparse attention (CSA) and heavily compressed attention (HCA) reduce KV cache size, but the inference engine must manage multiple cache layouts.
  • Sliding window attention (SWA) becomes a bottleneck at long context, requiring careful storage strategy.

Token Demand Surges 1000-Fold, 2.2 Billion Yuan Pours Into AGI Infra Leader

As the AI industry enters the Agent era, token demand has exploded. Infinigence AI, China's leading neutral AGI infrastructure provider, has raised over 2.2 billion yuan in total, with daily token calls growing over 20-fold since the end of 2025. The company underpins major Chinese models like Kimi, GLM, MiniMax, and DeepSeek, positioning itself as a key hub in the token economy.

  • Agent era drives token consumption from hundreds to millions per task, reshaping infrastructure needs.
  • Infinigence AI's token call volume doubles every two weeks, far outpacing the national average.

DeepSeek nears $45 billion valuation as China's state chip fund leads round

DeepSeek is close to a funding round that could value the Chinese AI lab at roughly $45 billion, according to the Financial Times. The talks are being led by the China Integrated Circuit Industry Investment Fund, with Tencent also negotiating a stake.

  • DeepSeek's valuation could reach $45B in upcoming round
  • China's state chip fund 'Big Fund' is leading the talks

Amazon brings agentic fine-tuning to SageMaker with support for Llama, Qwen, DeepSeek, and Nova

Amazon SageMaker AI now includes an AI agent that lets developers describe use cases in plain language, automatically recommends training methods, prepares data, kicks off training, and delivers editable Jupyter notebooks. It supports the Llama, Qwen, DeepSeek, and Nova model families.

  • SageMaker AI introduces the Kiro AI agent for automated fine-tuning via natural language.
  • The agent is preinstalled in the development environment; alternative agents like Claude Code can be used.

Last Week in AI #340 - OpenAI vs Musk + Microsoft, DeepSeek v4, Vision Banana

The first week of the Musk v. Altman trial concluded with Musk's testimony dominating; OpenAI and Microsoft renegotiate their partnership, ending exclusivity; DeepSeek previews V4 models that narrow the gap with frontier models; Google DeepMind introduces Vision Banana, a unified model for image generation and visual understanding.

  • Musk admitted xAI partly distilled from OpenAI models during the trial's first week.
  • Microsoft and OpenAI revised their agreement, ending Microsoft's exclusive cloud rights; OpenAI can now use AWS and other providers.

LWiAI Podcast #243 - GPT 5.5, DeepSeek V4, AI safety sabotage

Our 243rd episode with a summary and discussion of last week’s big AI news, including OpenAI's GPT-5.5, xAI's Grok Voice Think Fast 1.0, DeepSeek V4 open source, Google's massive investment in Anthropic, and safety research on sabotage and document corruption.

  • OpenAI released GPT-5.5 with strong coding improvements and a system card on chain-of-thought monitorability
  • xAI launched Grok Voice Think Fast 1.0, claiming big benchmark leads in real-time voice agents

"DeepSeek Version of Claude Code" – 2.3k Stars on GitHub

DeepSeek-TUI is a Rust-based terminal coding agent optimized for DeepSeek models. It recently surged in popularity after the release of DeepSeek-V4 and the developer's Chinese-language promotion, hitting GitHub's trending list with over 2,300 stars. The tool offers chain-of-thought visualization, context compression, RLM multi-agent parallelism, and multiple model switching options.

  • DeepSeek-TUI is a terminal coding agent akin to Claude Code, specifically optimized for DeepSeek models, now with 2.3k GitHub stars.
  • Created by independent developer Hunter Bown, it is written in Rust and open-sourced under the MIT license.

The Biggest Regret of DeepSeek V4

DeepSeek V4's technical report introduced many innovations but notably lacked Engram, a conditional memory module jointly open-sourced by DeepSeek and Peking University in January 2026. Engram acts as a native lookup table for Transformers, separating static knowledge retrieval from deep reasoning, which improves efficiency and reasoning performance. Although absent from V4, three subsequent papers explored Engram's potential in CXL memory pooling, collision-free hot-layer optimization, and vision tasks.

  • DeepSeek V4 omitted Engram, a highly anticipated conditional memory module.
  • Engram uses hash-based lookup for static knowledge, freeing up network capacity for advanced reasoning.

[AINews] AI Engineer World's Fair — Autoresearch, Memory, World Models, Tokenmaxxing, Agentic Commerce, and Vertical AI Call for Speakers

This article announces the second-wave call for speakers for the AI Engineer World's Fair, covering new tracks like autoresearch, memory, world models, tokenmaxxing, agentic commerce, and vertical AI. It also recaps recent AI developments, including the Grok 4.3 release, DeepSeek V4 Pro progress, the Codex vs Claude Code competition, agent infrastructure research, and discussions from the local LLM community.

  • Second wave call for speakers for AI Engineer World's Fair, with new topic tracks.
  • Grok 4.3 released with mixed reception; DeepSeek V4 Pro emerges as credible open-weight coding model.

AI Weekly Issue #488: OpenAI lost three things in five days

This week OpenAI faced a triple blow: Musk's lawsuit threatens its nonprofit-to-profit conversion, a revenue miss triggered market selloffs, and an AWS deal ended Microsoft's exclusivity. Meanwhile, the DeepSeek price war, Big Tech layoffs, and White House plans to bypass Anthropic's safety flags signal a shift in pricing power and in the regulatory landscape.

  • Musk's $134B lawsuit to return OpenAI to nonprofit status will set a legal precedent for for-profit conversions in AI.
  • OpenAI's revenue fell short of the forecasts underpinning Oracle's $300B compute contract, dragging down chip stocks.

The Sequence AI of the Week #851: DeepSeek-V4 and the Architecture of Million-Token Intelligence

DeepSeek-V4 is not just another frontier model; it is a systems engineering approach to making long-context reasoning practical, addressing the challenge of economically using a million-token context window through a new memory hierarchy, attention mechanics, and training stabilizers.

  • DeepSeek-V4 supports a one-million-token context window, but the focus is on economically using that context rather than just ingesting it.
  • The model introduces a new memory hierarchy, attention mechanics, training stabilizers, optimizer choices, quantization regimes, and serving stack to make long-context reasoning practical.

AINews: Not Much Happened Today

Despite a quiet day, notable releases include NVIDIA Nemotron 3 Nano Omni, vLLM v0.20, Poolside's first public model, and DeepSeek V4 serving benchmarks. Agent tooling matures and new benchmarks emerge.

  • NVIDIA releases Nemotron 3 Nano Omni, a 30B multimodal MoE with 256K context.
  • vLLM v0.20 introduces TurboQuant, FA4 for MLA, and new IR foundation.

DeepSeek-V4 Pro now available on Together AI

DeepSeek-V4 Pro, a 1.6T-parameter MoE reasoning model, is now available on Together AI with a 512K context window, controllable reasoning modes, and cached-input pricing for long-context workloads like code agents, document intelligence, and research synthesis.

  • 1.6T-parameter MoE with 49B activated parameters, 512K context on Together AI (model supports 1M)
  • Three reasoning modes: Non-Think, Think High, Think Max to match effort to task

The Sequence Radar #849: Last Week in AI: OpenAI Ships Agents, xAI Eyes Cursor, DeepSeek and Kimi Advance

This week's AI developments highlight a shift from model launches to AI becoming operational, with OpenAI releasing GPT-5.5, Workspace Agents, and ChatGPT Images 2.0; xAI making a deal with Cursor; and DeepSeek V4 and Kimi 2.6 advancing. Research papers cover distributed pre-training, multimodal understanding, and agentic coding.

  • OpenAI launches GPT-5.5, Workspace Agents, and ChatGPT Images 2.0, signaling AI's shift from conversation to execution
  • xAI strikes a deal with Cursor, underscoring code as the ideal environment for agents

DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B) — Base and Instruct, Runnable on Huawei Ascend

After months of delay, DeepSeek released the highly anticipated DSV4 series including Pro and Flash variants with 1M context, mixed-precision quantization, MIT license, and Huawei Ascend support. The series tops open-weight models but lags behind closed frontier models.

  • DSV4 Pro: 1.6T total / 49B active, Flash: 284B total / 13B active, 1M context
  • New architecture with CSA and HCA reduces KV cache size to 10% of V3.2's
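
The total and active parameter counts above determine how sparse each model is per token. The short calculation below uses only the quoted figures; the dictionary layout and variable names are illustrative.

```python
# (total parameters, active parameters per token) as quoted above.
models = {
    "DSV4 Pro": (1.6e12, 49e9),
    "DSV4 Flash": (284e9, 13e9),
}

active_fraction = {name: active / total for name, (total, active) in models.items()}

for name, fraction in active_fraction.items():
    print(f"{name}: {fraction:.2%} of parameters active per token")
```

Both variants activate only a few percent of their weights for each token, which is how MoE models keep per-token compute far below what the total parameter count suggests.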
