DeepSeek AI News

DeepSeek updates

Indian companies look to Chinese LLMs as AI costs bite

2026-07-13 09:52 UTC

Indian companies are increasingly relying on Chinese large language models from DeepSeek, Alibaba, and Moonshot AI to curb AI spending, extending India's dependence on Chinese cutting-edge technology despite historical tensions.

Indian firms turn to Chinese LLMs to reduce AI costs
DeepSeek, Alibaba, and Moonshot AI are key providers

My AI Model Tier List for Mid-2026

2026-07-11 15:43 UTC

A personal, non-benchmark tier list of AI models for coding and auditing as of mid-2026, covering Anthropic Fable, OpenAI Sol, Mistral, Gemini, and DeepSeek, with commentary on US export controls and European perspectives.

Fable (Anthropic) gets a B: fluent but unreliable, prone to hiding bugs.
Sol (OpenAI) gets an S: trustworthy for low-level code and testing.

DeepSeek V3.2 Released on Hugging Bay

2026-07-11 01:44 UTC

DeepSeek V3.2 is now available on Hugging Bay, an open-source AI artifact registry offering provenance, license verification, and trusted hosting.

DeepSeek V3.2 has been published on Hugging Bay.
Hugging Bay is an open registry with provenance and trust features.

DeepSeek DSpark: The Speculative Decoding Trick Behind 400% Faster LLM

2026-07-08 18:26 UTC

DeepSeek's new DSpark module brings speculative decoding to DeepSeek-V4, boosting per-user generation speed by 60-85% with no quality loss. It tackles both weak draft quality and verification waste simultaneously via a semi-autoregressive draft model with a Markov head. This article explains the method, the open-source DeepSpec toolkit, and experimental results.

DSpark uses a semi-autoregressive draft model combining parallel speed with sequential coherence.
A Markov head delivers near-full benefits with minimal overhead, chosen over an RNN head for production.

AI Models Overthink Problems—and It’s a Security Risk

2026-07-08 11:00 UTC

Research shows that large language models with reasoning capabilities can be tricked into 'overthinking' by logically inconsistent prompts, which enables a denial-of-service attack. Researchers from Zhejiang University and Alibaba developed an evolutionary algorithm that generates malicious prompts, causing outputs up to 26 times longer in leading models such as DeepSeek-R1, Qwen3-Thinking, GPT-o3, and Gemini 2.5 Flash.

Researchers demonstrate a new attack exploiting 'overthinking' in AI reasoning models, causing excessive computation.
An evolutionary algorithm corrupts prompts to produce outputs up to 26 times longer than normal.

Chinese AI models are gaining ground with U.S. companies as costs surge

2026-07-07 21:48 UTC

Chinese-built AI models are gaining traction among U.S. companies as they narrow the performance gap with leading American rivals while remaining significantly cheaper to use. Recent model releases from DeepSeek and Z.ai are highly competitive with Anthropic and OpenAI. This comes as token prices for advanced models rise at U.S. labs, making companies seek cost-effective alternatives.

Chinese AI models are closing the performance gap with US leaders like Anthropic and OpenAI.
DeepSeek and Z.ai offer competitive models at lower token prices.

DeepSeek V4 Is Earning Agentic Token Share

2026-07-06 20:27 UTC

DeepSeek V4, released April 24, 2026, doubled its token share on OpenRouter from 9% to 18% within six months, driven primarily by agentic workloads. Its cost efficiency ($0.09/$0.18 per million tokens vs GPT-5.5's $5/$30) attracts diverse users, and Chinese models surpass US models in total token share.

DeepSeek V4 increased token share from 9% to 18% in six months post-release.
Agentic workloads are the main driver; V4-Flash accounts for 70% of DeepSeek's agentic tokens.

Low-cost Chinese AI models like DeepSeek gain traction in the U.S.

2026-06-29 15:15 UTC

U.S. developers and small companies are turning to Chinese AI models to cut costs. Though lagging in performance, these models handle most tasks at a fraction of the price. Microsoft is also exploring DeepSeek as a cheaper alternative for Copilot. Chinese companies face challenges turning popularity into revenue under political scrutiny.

Stu Clott uses DeepSeek for coding, costing under 50 cents vs. $10 on Claude.
Chinese models cost less because salaries and infrastructure are cheaper in China.

DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

2026-06-27 16:59 UTC

DeepSeek open-sourced DSpark, a speculative decoding framework that attaches a draft module to existing DeepSeek-V4 weights. It pairs a parallel draft backbone with a lightweight Markov head to cut suffix decay, then adds confidence-scheduled verification that tailors how many tokens get checked to real-time GPU load. Offline, accepted length rises 16–31% over DFlash and Eagle3; in production it speeds per-user generation 57–85% over the MTP-1 baseline, losslessly. The training repo, DeepSpec, ships under MIT.

DSpark pairs a parallel draft backbone with a lightweight Markov head to improve suffix acceptance.
Confidence-scheduled verification adjusts tokens checked based on GPU load.
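
The draft-then-verify loop that speculative decoding builds on can be sketched with toy models. This is a generic greedy speculative decoding sketch, not DSpark's implementation: `draft_next` and `target_next` are hypothetical stand-ins for the draft module and the full model, and neither the Markov head nor confidence-scheduled verification is modeled here.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextFn,
                     target_next: NextFn, k: int) -> List[Token]:
    """Draft k tokens, keep the longest prefix the target agrees with,
    then append one token chosen by the target itself."""
    # Draft phase: the cheap model proposes k tokens sequentially.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposed.append(token)
        ctx.append(token)

    # Verify phase: a real system scores all k positions in one target
    # forward pass; this toy version calls the target once per position.
    accepted: List[Token] = []
    ctx = list(prefix)
    for token in proposed:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_next(ctx))  # all drafts accepted: one bonus token
    return accepted


def generate(prefix: List[Token], draft_next: NextFn, target_next: NextFn,
             k: int, n_tokens: int) -> List[Token]:
    """Generate n_tokens after prefix using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + n_tokens]


# Toy models (hypothetical): the target counts upward modulo 10,
# and the draft agrees with it except right after token 4.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], draft_next, target_next, k=4, n_tokens=12))
```

Because every emitted token is either confirmed or chosen by the target, the greedy output is identical to decoding with the target alone. That is the sense in which the speedup is lossless; the gain comes from accepting several draft tokens per verification pass.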

cwmail: A terminal email client in native Golang with LLM-based drafting

2026-06-27 03:36 UTC

cwmail is a terminal email client written in Go using Bubbletea v2. It features proper HTML rendering, inline image support, multi-account IMAP with IDLE push, and AI-drafted replies powered by DeepSeek V4 Pro. It includes undo delete, draft auto-save, CLI send mode, and full offline capability, with all data stored locally.

Written in Go with Bubbletea v2, providing a full TUI for email management in the terminal.
Supports multiple IMAP accounts side-by-side with IDLE push notifications, avoiding polling.

We got DeepSeek-V4-Pro serving in 20 seconds

2026-06-25 20:49 UTC

Inferize announces that it got DeepSeek-V4-Pro serving in 20 seconds, presenting the result as a showcase of highly optimized, elastic LLM inference. A waitlist is now open.

Inferize deployed DeepSeek-V4-Pro in 20 seconds
Provides highly optimized, elastic AI inference

Plotting AI model release cadence: two labs are accelerating, three aren't

2026-06-21 02:16 UTC

Analysis of frontier model release data shows Anthropic and OpenAI are accelerating their release cadence, while Google, Meta, and DeepSeek are not. The article explores the recursive self-improvement hypothesis and proposes a falsifiable test.

Anthropic and OpenAI show accelerating model release cadence; three other labs do not.
Acceleration may be due to recursive self-improvement, where labs use their own models to build successors.

Beyond the $7.4B Headline: DeepSeek's Series A signals Chinese AI alliance shift

2026-06-20 23:47 UTC

3 Takeaways This Week: DeepSeek's $7.4B Series A led by Tencent signals a shift in Chinese AI funding away from ecosystem players; Japan targets $65B in physical AI infrastructure by 2040; Zhipu AI's GLM 5.2 surpasses Anthropic's Claude in design benchmarks.

DeepSeek's $7.4B Series A led by Tencent, with Alibaba and ByteDance absent.
Japan plans $65B public-private investment in physical AI infrastructure by 2040.

VibeThinker-3B: A 3B Dense Reasoning Model Built on Qwen2.5-Coder-3B With the Spectrum-to-Signal Post-Training Pipeline

2026-06-19 22:06 UTC

VibeThinker-3B is a compact 3B-parameter reasoning model that matches large models like DeepSeek V3.2 on math and code benchmarks, using an efficient post-training pipeline and test-time scaling.

VibeThinker-3B is a 3B dense model, MIT-licensed, built on Qwen2.5-Coder-3B for verifiable reasoning.
It scores 94.3 on AIME26, comparable to DeepSeek V3.2 (671B) and Kimi K2.5 (1T).

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

2026-06-18 04:00 UTC

This paper proposes a structural pruning framework for Mixture-of-Experts models by reformulating prune-ratio allocation as a channel-score coverage maximization problem, solved efficiently via attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show accuracy preservation under 50% or 25% structured pruning with 4-bit quantization, achieving 5.27× memory reduction on Qwen3-30B-A3B and outperforming baselines.

Observation: information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in important experts
Proposes a channel-level structural pruning framework that models prune-ratio allocation as a coverage maximization problem

Native Coding Agent Optimized for Local LLM and DeepSeek v4 with Vector Memory

2026-06-16 22:36 UTC

cwcode is a Go-based terminal coding agent leveraging DeepSeek V4 Pro, Qwen3.6-27B, and more. It offers file editing, sub-agents, semantic memory, and autonomous recovery. Key features: low cost (~$0.40/hour), high cache hit ratio (>85%), hash-anchored edits, checkpoint/rewind, and no SaaS lock-in.

Go-based terminal coding agent supporting DeepSeek V4 Pro, Qwen3.6-27B, etc.
Hash-anchored edits and sticky prefix cache reduce token usage and cost

How to Build a QwenPaw Agent Workspace with Custom Skills, Model Providers, Console Access, and Streaming API Testing

2026-06-13 17:27 UTC

This tutorial provides a step-by-step guide to setting up a QwenPaw agent workspace in Google Colab, including installation, configuration, authentication, connecting model providers (OpenAI, OpenRouter, DashScope, DeepSeek, Gemini), creating custom skills and local knowledge files, launching the console with optional Cloudflare tunnel, and testing the streaming chat API.

Step-by-step instructions for installing and initializing QwenPaw with a configured working directory.
Support for multiple model providers, auto-configured via Colab secrets.

China cracks down on Western AI models while US companies flock to DeepSeek

2026-06-13 02:51 UTC

China's Ministry of State Security warns of security risks in using Western AI models, while US firms increasingly adopt Chinese open-source models like DeepSeek due to cost advantages. Both nations' users circumvent restrictions, fueling a proxy market for AI access.

China's MSS warns against using third-party tools to access US AI models, citing security risks.
US companies flock to Chinese models such as DeepSeek and Alibaba's Qwen for lower costs.

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

2026-06-12 04:00 UTC

Pythagoras-Prover is a compute-efficient family of open-source Lean theorem provers, featuring autoregressive models (4B and 32B) and a diffusion-based prover (4B). It uses curriculum SFT with stratified data and dynamic proof filtering for training efficiency, and introduces Augmented Lean Formalisation (ALF) to expand verified corpora via self-distillation. The 4B model outperforms DeepSeek-Prover-V2-671B on MiniF2F-Test (86.1% vs 82.4%) with ~167x fewer parameters, while the 32B model sets a new open-source SOTA at 93.0% and solves 93 PutnamBench problems.

Pythagoras-Prover includes autoregressive models at 4B and 32B parameters and a 4B diffusion-based prover that refines proofs iteratively.
Training efficiency is achieved via curriculum SFT with stratified difficulty levels and dynamic proof reasoning filtering within an 8k-token context.

Deepseek topped Ramp's trending software vendors in June 2026 as US companies chase cheaper AI

2026-06-07 16:06 UTC

In June 2026, DeepSeek became the top paid software vendor on Ramp's platform as US companies send data directly to the service. Ramp chief economist Ara Kharazian cites cost awareness as a driver but warns about the security risks of using Chinese models.

DeepSeek ranked first among Ramp's trending software vendors in June 2026.
US companies are turning to DeepSeek's paid AI service to reduce costs.

DigitalOcean says it is now an OpenRouter AI model provider

2026-06-03 08:25 UTC

DigitalOcean announced on X that it is now a model provider on OpenRouter, offering DeepSeek V3.2, Kimi K2.6, and DeepSeek V4 Flash. The move signals the company's expansion from cloud infrastructure into AI inference.

DigitalOcean announced on X that it has become a model provider on OpenRouter
Initial models include DeepSeek V3.2, Kimi K2.6, and DeepSeek V4 Flash

New review paper argues code is how AI agents think and act, not just what they produce

2026-05-29 13:10 UTC

A new review paper argues that the real bottleneck for autonomous AI agents is the software layer around the language model—tools, memory, testing, and permissions. DeepSeek is building a dedicated 'Harness' team in Beijing, confirming the formula: model + harness = AI agent.

The paper claims the bottleneck for AI agents is the software harness, not the model.
Key components include tools, memory, testing, and permission boundaries.

AI Weekly Issue #496: Anthropic's Pentagon model is now everyone's model

2026-05-27 00:00 UTC

Anthropic released its formerly classified Mythos model to the public, collapsing the gap between sovereign and developer AI. DeepMind's Demis Hassabis moved his AGI timeline to 2029. Critical vulnerabilities in Starlette impacted millions of AI agents, and a coordinated takedown dismantled the Glassworm botnet. BNP Paribas partnered with Mistral for sovereign AI security, while China restricted travel for top AI engineers at Alibaba and DeepSeek. Corporate AI spending and layoffs made headlines: Uber burned its full-year AI budget by April, ClickUp restructured with a 3:1 AI-to-human ratio, and Sam Altman reversed his white-collar apocalypse prediction. However, MIT Technology Review data showed that AI-exposed roles have lower unemployment.

Anthropic releases Mythos, previously limited to government contractors, now available via standard API.
DeepMind CEO Hassabis advances AGI timeline to 2029, citing AlphaProof Nexus solving nine Erdős problems cheaply.

Introducing DSA Attention to Multimodal: Kuaishou Keye 2.0 Opens a New Paradigm of Enhanced Reasoning

2026-05-26 10:17 UTC

Kuaishou releases Keye-VL-2.0-30B-A3B, a multimodal large language model that is the first to apply DeepSeek Sparse Attention (DSA) to multimodal scenarios, enabling 256K ultra-long context deep perception. It achieves SOTA on long-video temporal understanding benchmarks and introduces built-in Agent collaboration, paving the way for enhanced reasoning and real-world business applications.

First to integrate DSA into a multimodal model, addressing long-video understanding bottlenecks.
Achieves SOTA on TimeLens, LongVideoBench, MLVU; reverses long-context decay by boosting accuracy from 35.34% to 42.44% when scaling from 64 to 512 frames.

DeepSeek V4 Gets Even Cheaper: New Tool Boasts 99.82% Cache Hit Rate, Slashes Bills to 20%

2026-05-25 04:40 UTC

One month after DeepSeek V4's release, the open-source community unveiled Reasonix, a tool designed to minimize API costs by maximizing cache efficiency. It reports a 99.82% cache hit rate, cutting a $61 bill for 400M+ tokens to $12.

Reasonix is a dedicated coding harness for DeepSeek, focusing on cost reduction.
Its cache-first loop, tool-call repair, and automatic context compression maintain over 90% cache hit rate in long sessions.
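
The headline's "20%" can be checked against the dollar figures in the summary. The short calculation below uses only numbers quoted above; since the token count is given as "400M+", the per-million figures are upper bounds.

```python
# Figures quoted in the summary above.
bill_before_usd = 61.0
bill_after_usd = 12.0
tokens_millions = 400.0  # "400M+" tokens, so treat this as a lower bound

# Fraction of the original bill that remains, and the fraction saved.
remaining_fraction = bill_after_usd / bill_before_usd
savings_fraction = 1.0 - remaining_fraction

# Upper bounds on the blended price, because the true token count is >= 400M.
max_cost_per_million_before = bill_before_usd / tokens_millions
max_cost_per_million_after = bill_after_usd / tokens_millions

print(f"remaining bill: {remaining_fraction:.1%}")
print(f"savings: {savings_fraction:.1%}")
print(f"blended price before: at most ${max_cost_per_million_before:.4f} per million tokens")
print(f"blended price after: at most ${max_cost_per_million_after:.4f} per million tokens")
```

The remaining bill works out to about 19.7% of the original, which the headline rounds to 20%.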

Deepseek makes its 75 percent discount permanent, pricing output tokens at least 34x below GPT-5.5

2026-05-23 17:10 UTC

DeepSeek is making the 75 percent discount on its top model V4-Pro permanent. At $0.435 per million input tokens, it's at least 11.5 times cheaper than GPT-5.5 and over 34 times cheaper on output. For token-hungry agentic systems, this kind of pricing could squeeze Western providers hard.

DeepSeek's 75% discount on V4-Pro is now permanent.
Input token price is $0.435 per million, 11.5x cheaper than GPT-5.5.
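
The price relationships in this item can be worked through explicitly. The sketch below takes the $0.435 input price and the 75% discount from the summary above, and borrows the GPT-5.5 prices ($5 input, $30 output per million tokens) from the "DeepSeek V4 Is Earning Agentic Token Share" item in this feed. The V4-Pro output price is not stated here, so only the upper bound implied by "over 34 times cheaper" is derived.

```python
# Stated in this item.
discounted_input_usd = 0.435   # V4-Pro input price per million tokens
discount = 0.75                # permanent discount
min_output_ratio = 34          # "over 34 times cheaper on output"

# Quoted in another item of this feed (per million tokens).
gpt55_input_usd = 5.00
gpt55_output_usd = 30.00

# Pre-discount list price implied by a 75% discount.
list_input_usd = discounted_input_usd / (1.0 - discount)

# How many times cheaper V4-Pro input is than GPT-5.5 input.
input_ratio = gpt55_input_usd / discounted_input_usd

# Highest V4-Pro output price consistent with the "34x" claim.
max_output_usd = gpt55_output_usd / min_output_ratio

print(f"implied list input price: ${list_input_usd:.2f}")
print(f"input price ratio vs GPT-5.5: {input_ratio:.2f}x")
print(f"implied output price: at most ${max_output_usd:.3f}")
```

The input ratio comes out just under 11.5, so the "11.5x" figure is a rounded value.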

Alibaba's latest AI model ran autonomously for 35 hours to optimize code for its own custom chip

2026-05-23 10:17 UTC

Alibaba's Qwen team releases Qwen3.7-Max, a proprietary model built for long-running autonomous agent tasks. It matches Claude Opus 4.6 on benchmarks and beats Chinese rivals like DeepSeek V4 Pro and Kimi K2.6. The team also demos the model steering a four-legged robot.

Qwen3.7-Max designed for long-running autonomous tasks
Matches Claude Opus 4.6, beats Chinese rivals

DeepSeek V4 Slashes Prices Permanently; CATL, JD, NetEase Rush to Invest; Liang Wenfeng: Goal is AGI

2026-05-23 09:46 UTC

DeepSeek announced permanent price cuts for its V4-Pro API. Meanwhile, CATL, JD, and NetEase are in talks to invest in DeepSeek's first external funding round. Founder Liang Wenfeng emphasizes prioritizing AGI research and maintaining open-source principles.

DeepSeek V4-Pro API permanently reduced to one-quarter of original price
CATL, JD, and NetEase among companies negotiating investment in DeepSeek

Deepseek reportedly prioritizes AGI research over quick profits despite billions in funding

2026-05-22 17:18 UTC

DeepSeek is about to raise around $10 billion, which would value the Chinese AI startup at roughly $45 billion. Founder Liang Wenfeng is telling investors he's putting AGI research ahead of short-term profits.

DeepSeek is raising ~$10B at a ~$45B valuation.
Founder Liang Wenfeng prioritizes AGI research over short-term profits.

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

2026-05-20 04:00 UTC

HELLoRA is a parameter-efficient fine-tuning method for Mixture-of-Experts (MoE) models that attaches LoRA modules only to the most frequently activated experts per layer. It reduces trainable parameters and adapter FLOPs while improving downstream performance. Tested on OlMoE, Mixtral, and DeepSeekMoE across math, code, and safety tasks, HELLoRA significantly outperforms vanilla LoRA, e.g., using 15.7% of the parameters on OlMoE with 9.2% higher accuracy.

HELLoRA attaches LoRA only to the most active experts per layer in MoE models.
It achieves superior performance with far fewer trainable parameters and FLOPs.
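
The selection step described above, picking the most frequently activated experts in each layer, can be sketched as a counting problem. This is a minimal illustration under assumptions: the routing log format, the function name `select_hot_experts`, and the fixed `top_k` per layer are all hypothetical, and the paper's actual selection criterion and adapter placement are not given in this summary.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def select_hot_experts(routing_log: Iterable[Tuple[int, int]],
                       top_k: int) -> Dict[int, List[int]]:
    """Return, for each layer, the top_k most frequently routed expert ids.

    routing_log yields (layer_index, expert_index) pairs, one per routed
    token. Only the returned experts would receive LoRA adapters.
    """
    counts: Dict[int, Counter] = {}
    for layer, expert in routing_log:
        counts.setdefault(layer, Counter())[expert] += 1
    return {
        layer: sorted(expert for expert, _ in counter.most_common(top_k))
        for layer, counter in counts.items()
    }


if __name__ == "__main__":
    # Toy routing trace: layer 0 favors experts 3 and 1, layer 1 favors 0 and 2.
    log = [(0, 1), (0, 1), (0, 3), (0, 3), (0, 3), (0, 2),
           (1, 0), (1, 0), (1, 2)]
    print(select_hot_experts(log, top_k=2))
```

Restricting adapters to these experts is what shrinks the trainable parameter count relative to attaching LoRA to every expert.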

Top 10 AI Research Papers of 2025

2026-05-18 12:15 UTC

AI research in 2025 shifted from chatbots to reasoning systems, autonomous agents, and multimodal models. Key papers include DeepSeek-R1 (reinforcement learning), Gemini 2.5 (multimodal reasoning), Qwen2.5 (open models), Large Concept Models (concept-level language modeling), ESG analysis against greenwashing, VideoWorld (world models), AI Scientist-v2 (autonomous research), SWE-Lancer (coding agent benchmark), OLMo 2 (fully open language models), and Mixture-of-Recursions (efficient reasoning).

DeepSeek-R1 publicly demonstrated reinforcement learning for post-training, boosting reasoning and coding.
Gemini 2.5 introduced 'Thinking Mode' and advanced multimodal understanding with long context.

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

2026-05-18 04:00 UTC

Researchers propose Group-Query Latent Attention (GQLA), a modification of DeepSeek's Multi-head Latent Attention that provides two hardware-adaptive decoding paths without retraining. This approach enables efficient inference on both H100 and H20 GPUs, and includes TransGQLA for converting pretrained GQA models.

GQLA extends DeepSeek's MLA with dual decoding paths (MQA-absorb and GQA) to match different hardware rooflines.
A single set of GQLA weights can be used on H100 (MQA path) or H20 (GQA path with multi-token prediction).

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

2026-05-16 17:00 UTC

An eventful month with one flagship release after another. CAISI assessment shows open models lagging behind the US frontier, but methodology is questioned. Highlights include MiMo-V2.5-Pro, Gemma-4, Kimi-K2.6, Laguna-XS.2, and DeepSeek-V4-Flash.

Multiple open model releases from DeepSeek, Google, Moonshot AI, Xiaomi, and others.
CAISI evaluation shows large Elo gap, but benchmarks may underestimate real-world performance.

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

2026-05-16 11:33 UTC

From Gemma 4 to DeepSeek V4, this article explores how new open-weight LLMs are reducing long-context costs through architectures like cross-layer KV sharing, per-layer embeddings, attention budgeting, compressed convolutional attention, and mHC.

Gemma 4 introduces cross-layer KV sharing, cutting KV cache size in half while maintaining quality.
Per-layer embeddings boost model capacity with minimal computational overhead.
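
Why cross-layer KV sharing halves the cache follows from the standard KV cache size formula. The sketch below assumes that every two adjacent layers share one set of keys and values; the layer count, head count, and head dimension are hypothetical placeholders, not Gemma 4's actual configuration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2,
                   share_group: int = 1) -> int:
    """KV cache size for one sequence.

    share_group is the number of consecutive layers that reuse a single
    KV cache (1 means no sharing). The leading factor 2 covers keys
    plus values.
    """
    if share_group < 1 or n_layers % share_group != 0:
        raise ValueError("n_layers must be a positive multiple of share_group")
    layers_storing_kv = n_layers // share_group
    return 2 * layers_storing_kv * n_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit cache.
    baseline = kv_cache_bytes(32, 8, 128, seq_len=4096)
    shared = kv_cache_bytes(32, 8, 128, seq_len=4096, share_group=2)
    print(baseline // 2**20, "MiB without sharing")
    print(shared // 2**20, "MiB with pairwise sharing")
```

Since the cache grows linearly with the number of layers that store their own keys and values, sharing across pairs of layers cuts it in half at any sequence length.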

We Tested DeepSeek V4 Pro and Flash Against Claude Opus 4.7 and Kimi K2.6

2026-05-15 01:39 UTC

We ran DeepSeek V4 Pro and DeepSeek V4 Flash through the same FlowGraph benchmark used for Claude Opus 4.7 and Kimi K2.6. The Pro scored 77/100 for $2.25, landing between Opus (91) and Kimi (68). The Flash scored 60/100 for $0.02, a record low cost, but the build failed and key outputs were missing. Both models had lease expiry bugs, though Flash outperformed expectations in tool calling reliability. Overall, Opus remains the top performer, but DeepSeek's pricing shifts the cost landscape significantly.

DeepSeek V4 Pro scored 77/100 at $2.25, outperforming Kimi K2.6 (68) but trailing Claude Opus 4.7 (91).
DeepSeek V4 Flash scored 60/100 at $0.02, the cheapest test result, but had critical build and routing issues.
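
One way to read these results is benchmark points per dollar. The sketch below uses only the two DeepSeek scores and costs quoted above; the costs for Claude Opus 4.7 and Kimi K2.6 are not given in this summary, so they are left out.

```python
from typing import Dict

# Score (out of 100) and run cost in USD, as quoted above.
results: Dict[str, Dict[str, float]] = {
    "DeepSeek V4 Pro": {"score": 77, "cost_usd": 2.25},
    "DeepSeek V4 Flash": {"score": 60, "cost_usd": 0.02},
}


def points_per_dollar(score: float, cost_usd: float) -> float:
    """Benchmark points obtained per dollar spent on the run."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return score / cost_usd


efficiency = {
    name: points_per_dollar(r["score"], r["cost_usd"])
    for name, r in results.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} points per dollar")
```

The ratio flatters Flash: its build failed and key outputs were missing, so a high points-per-dollar figure does not by itself mean a usable result.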

Violin: An open-source video translation skill that breaks language barriers

2026-05-14 00:00 UTC

Violin is an open-source AI video translation tool combining speech recognition, LLM translation, and text-to-speech to make video content accessible across languages. It offers a web app, CLI, and agent skills, featuring a video-aware chat assistant and personalized voice selection. Built with Together API using models like Whisper, DeepSeek, and Cartesia, it's released under the MIT license.

Violin integrates ASR, LLM translation, and TTS for open-source video translation.
Supports web app, CLI, and agent skills for diverse users.

Tencent plans to ramp up AI spending as China's chip supply allegedly improves

2026-05-13 18:46 UTC

Tencent announced plans to increase AI infrastructure spending in the second half of 2026, citing improved domestic chip supply from Chinese manufacturers. The company also reported strong first-quarter earnings and is reportedly in discussions to acquire a stake in AI startup DeepSeek.

Tencent will boost AI infrastructure spending in H2 2026.
Chinese chipmakers are increasing domestic AI chip production.

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

2026-05-11 04:00 UTC

This paper challenges the assumption that chain-of-thought reasoning reduces bias, demonstrating that position bias in multiple-choice QA actually increases with reasoning trajectory length. Across 13 configurations, 12 show a positive partial correlation between trajectory length and Position Bias Score (PBS). Truncation experiments confirm causality, and the 671B DeepSeek-R1 shows low overall bias but a persistent length effect in the longest quartile. Direct-answer position bias is a distinct phenomenon. The findings argue against assuming reasoning models are order-robust and provide a diagnostic toolkit.

Position bias scales with reasoning trajectory length across multiple reasoning-capable models, even after controlling for accuracy.
Truncation intervention causally links longer reasoning to increased bias toward position-preferred options (16% to 32% for R1-Qwen-7B).

Liang Wenfeng Invests 20 Billion! DeepSeek's Record First Round Financing of 50 Billion, V4.1 Scheduled for June

2026-05-09 02:08 UTC

DeepSeek aims to raise up to 50 billion yuan in its first funding round, with founder Liang Wenfeng personally contributing 20 billion yuan. The company's valuation has surged to 350 billion yuan, and the V4.1 model is set for a June release, signaling a shift from an idealistic lab to a commercial AI company.

DeepSeek Seeks Funding at $45B Valuation as China Backs Homegrown AI Rival

2026-05-08 23:14 UTC

DeepSeek is seeking its first outside funding at a $45 billion valuation, highlighting China's push to support domestic AI companies.

DeepSeek seeks first external investment at $45B valuation.
Funding reflects Chinese government's backing of indigenous AI industry.

AI money keeps flowing as Deepseek plans record raise and Core Automation quadruples valuation in weeks

2026-05-08 17:50 UTC

DeepSeek plans a $7.35B funding round, a record for a Chinese AI company, with V4.1 launching in June. Core Automation, founded by ex-OpenAI researcher Jerry Tworek just six weeks ago, is already targeting a $4B valuation.

DeepSeek plans $7.35B funding round, the largest for a Chinese AI company.
DeepSeek V4.1 is expected to launch in June.

Redis Creator Builds a Dedicated Inference Engine for DeepSeek V4: ds4.c

2026-05-08 08:20 UTC

Salvatore Sanfilippo (antirez), the creator of Redis, has open-sourced ds4.c, a lightweight inference engine tailored for DeepSeek V4 Flash. It runs efficiently on Apple Silicon Macs using the Metal API, achieving up to 27 tokens/s generation on high-end Mac models.

Antirez releases ds4.c, a Metal-only inference engine for DeepSeek V4 Flash, optimized for Mac. No other models supported.
Employs asymmetric quantization (2-bit for MoE expert layers, Q8 for others) and disk-based KV caching for speed.

ZAYA1-8B Technical Report

2026-05-08 04:00 UTC

ZAYA1-8B is a reasoning-focused mixture-of-experts model with 700M active and 8B total parameters, trained on AMD hardware. It matches or exceeds DeepSeek-R1-0528 on math and coding benchmarks and introduces Markovian RSA for test-time compute.

ZAYA1-8B features 700M active parameters and 8B total parameters, trained on a full-stack AMD platform.
It matches or exceeds DeepSeek-R1-0528 on multiple math and coding benchmarks.

Serving DeepSeek-V4: why million-token context is an inference systems problem

2026-05-08 00:00 UTC

DeepSeek-V4's hybrid attention design (CSA, HCA, SWA) compresses KV cache, turning million-token context from a model challenge into a serving-systems problem. Together AI's early bring-up on NVIDIA HGX B200 reveals how cache policy, prefix caching, and endpoint profiles impact long-context workloads.

DeepSeek-V4's compressed sparse attention (CSA) and heavily compressed attention (HCA) reduce KV cache size, but the inference engine must manage multiple cache layouts.
Sliding window attention (SWA) becomes a bottleneck at long context, requiring careful storage strategy.

Token Demand Surges 1000-Fold, 2.2 Billion Yuan Pours Into AGI Infra Leader

2026-05-07 02:46 UTC

As the AI industry enters the Agent era, token demand has exploded. Infinigence AI, China's leading neutral AGI infrastructure provider, has raised over 2.2 billion yuan in total, with daily token calls growing over 20-fold since the end of 2025. The company underpins major Chinese models like Kimi, GLM, MiniMax, and DeepSeek, positioning itself as a key hub in the token economy.

Agent era drives token consumption from hundreds to millions per task, reshaping infrastructure needs.
Infinigence AI's token call volume doubles every two weeks, far outpacing the national average.

Deepseek nears $45 billion valuation as China's state chip fund leads round

2026-05-06 13:22 UTC

DeepSeek is close to a funding round that could value the Chinese AI lab at roughly $45 billion, according to the Financial Times. The talks are being led by the China Integrated Circuit Industry Investment Fund, with Tencent also negotiating a stake.

DeepSeek's valuation could reach $45B in upcoming round
China's state chip fund 'Big Fund' is leading the talks

Amazon brings agentic fine-tuning to SageMaker with support for Llama, Qwen, Deepseek, and Nova

2026-05-05 10:08 UTC

Amazon SageMaker AI now includes an AI agent that lets developers describe use cases in plain language; the agent then recommends training methods, prepares data, kicks off training, and delivers editable Jupyter notebooks. It supports the Llama, Qwen, DeepSeek, and Nova model families.

SageMaker AI introduces the Kiro AI agent for automated fine-tuning via natural language.
The agent is preinstalled in the development environment; alternative agents like Claude Code can be used.

Last Week in AI #340 - OpenAI vs Musk + Microsoft, DeepSeek v4, Vision Banana

2026-05-05 08:30 UTC

The first week of the Musk v. Altman trial concluded with Musk's testimony dominating; OpenAI and Microsoft renegotiate their partnership, ending exclusivity; DeepSeek previews V4 models that narrow the gap with frontier models; Google DeepMind introduces Vision Banana, a unified model for image generation and visual understanding.

Musk admitted xAI partly distilled from OpenAI models during the trial's first week.
Microsoft and OpenAI revised their agreement, ending Microsoft's exclusive cloud rights; OpenAI can now use AWS and other providers.

LWiAI Podcast #243 - GPT 5.5, DeepSeek V4, AI safety sabotage

2026-05-04 07:54 UTC

Our 243rd episode with a summary and discussion of last week’s big AI news, including OpenAI's GPT-5.5, xAI's Grok Voice Think Fast 1.0, DeepSeek V4 open source, Google's massive investment in Anthropic, and safety research on sabotage and document corruption.

OpenAI released GPT-5.5 with strong coding improvements and a system card on chain-of-thought monitorability
xAI launched Grok Voice Think Fast 1.0, claiming big benchmark leads in real-time voice agents

"DeepSeek Version of Claude Code" – 2.3k Stars on GitHub

2026-05-04 06:09 UTC

DeepSeek-TUI is a Rust-based terminal coding agent optimized for DeepSeek models. It recently surged in popularity after the release of DeepSeek-V4 and the developer's Chinese-language promotion, hitting GitHub's trending list with over 2,300 stars. The tool offers chain-of-thought visualization, context compression, RLM multi-agent parallelism, and multiple model switching options.

DeepSeek-TUI is a terminal coding agent akin to Claude Code, specifically optimized for DeepSeek models, now with 2.3k GitHub stars.
Created by independent developer Hunter Bown, it is written in Rust and open-sourced under the MIT license.
