China AI News

China AI updates

moonshotai/Kimi-K3

2026-07-27 23:39 UTC

Moonshot AI has released the weights for its 2.8 trillion parameter Kimi K3 model, a hefty 1.56TB on Hugging Face. The K3 license no longer calls itself 'modified MIT' and requires a separate agreement for larger companies. OpenRouter already offers K3 from 7 providers at the same pricing as Moonshot itself.

Moonshot released Kimi K3 weights with 2.8 trillion parameters.
K3 license requires a separate agreement for MaaS businesses exceeding $20M annual revenue.
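
The size and parameter figures above imply an average storage cost per weight. A rough sketch of that arithmetic follows, assuming "1.56TB" means decimal terabytes and that the whole download is weights (both are assumptions, not stated in the item):

```python
# Back-of-the-envelope: average bits stored per parameter.
# Assumptions: 1 TB = 1e12 bytes; the full 1.56 TB is model weights.
CHECKPOINT_BYTES = 1.56e12
PARAMETERS = 2.8e12

bytes_per_param = CHECKPOINT_BYTES / PARAMETERS
bits_per_param = bytes_per_param * 8

print(f"{bytes_per_param:.3f} bytes/param, {bits_per_param:.2f} bits/param")
```

A result well under 8 bits per parameter would be consistent with the weights being distributed in a low-bit quantized format, though the item itself does not say which one.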

Why China is giving away its best AI models

2026-07-27 16:51 UTC

Moonshot AI's Kimi K3 model outperforms top US models at a fraction of the cost. The company plans to release its weights for free, targeting US users, sparking concerns about the dominance of closed American models. Open-weight models offer developers control and flexibility, and China's push is strategic amid chip restrictions and geopolitical ambitions.

Moonshot AI's Kimi K3 outperforms US models at lower cost and will be released as open-weight.
Open-weight models threaten the dominance of US proprietary AI systems.

Probing Latent Colombian Identity Inferences in Qwen2.5-7B with Natural Language Autoencoders

2026-07-27 04:00 UTC

This pilot study uses Natural Language Autoencoders (NLA) to probe whether Qwen2.5-7B-Instruct internally represents Colombian identity, socioeconomic status, or stereotype-related information when processing Colombian-Spanish and English prompts. Using 30 prompts arranged as 15 matched Spanish-English pairs spanning explicit Colombian cues, implicit Colombian cues, and neutral controls, it reports descriptive rates and qualitative evidence rather than statistically powered effects.

Uses Natural Language Autoencoders (NLA) to verbalize residual-stream activations from layer 20.
Dataset contains 30 prompts as 15 matched Spanish-English pairs.

Kimi K3 by Moonshot now available on Modal

2026-07-27 00:00 UTC

Moonshot has released Kimi K3, a 2.8 trillion parameter multimodal model, now available on Modal at 460 tokens per second. The model features a mixture-of-experts architecture, 1M token context window, native vision, and is optimized with a custom DFlash speculator for faster inference.

Kimi K3 is a 2.8 trillion parameter multimodal model ranking 4th on Artificial Analysis Intelligence Index. It uses MoE with 16/896 experts per token and 1M context window.
Modal supports K3 on day zero with token-based Shared API and dedicated Auto Endpoint, plus a custom DFlash speculator.
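
To make the "16/896 experts per token" figure concrete, here is a minimal sketch of generic top-k mixture-of-experts routing. It is not Moonshot's router: the random logits, the `route_top_k` helper, and the softmax-over-selected-experts weighting are illustrative assumptions.

```python
import math
import random


def route_top_k(router_logits, k):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]  # stable softmax
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


NUM_EXPERTS = 896     # from the item above
ACTIVE_EXPERTS = 16   # from the item above

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
routing = route_top_k(logits, ACTIVE_EXPERTS)

active_fraction = ACTIVE_EXPERTS / NUM_EXPERTS
print(f"{len(routing)} experts active, {active_fraction:.2%} of the expert pool")
```

Under this scheme each token touches only 1 in 56 experts, which is why a model's total parameter count can far exceed the compute spent per token.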

Kimi K3 is not cheap

2026-07-26 19:37 UTC

Contrary to popular claims, the open-weights LLM Kimi K3 from Moonshot AI is not cheap. While its performance is close to top US models, its cost per task is comparable to OpenAI's top model and significantly higher than other Chinese models like DeepSeek V4.

Kimi K3 is a new open-weights LLM from Chinese lab Moonshot AI, sparking debate over its cost.
Commentators mistakenly claim K3 is cheap; in reality, its cost per task is similar to top US models.

How to Build an End-to-End OCR Pipeline with Baidu’s Unlimited-OCR for High-Resolution Images and Multi-Page PDF Parsing

2026-07-24 05:16 UTC

In this tutorial, we build a complete workflow for running Baidu’s Unlimited-OCR model on document images and multi-page PDFs. From configuring the GPU environment to comparing high-detail tiled Gundam inference and faster Base modes, you'll learn how to process dense layouts, tables, and cross-page content in a reproducible, end-to-end pipeline.

Configure GPU environment and install dependencies for Baidu's Unlimited-OCR.
Generate structured sample documents with tables and footnotes.

AI Was Supposed to Lift Everybody. The Price Tag Says Otherwise

2026-07-24 03:06 UTC

A real-world test reveals that running AI agents on top-tier US models like GPT-5.6 Sol can cost $300 for two hours, while comparable work on Chinese open-weight models like DeepSeek V4 Flash costs under $3. This price disparity threatens to exclude small businesses, freelancers, and students from benefiting from AI, despite a minimal quality gap. The article argues for competitive pricing and warns of geopolitical restrictions that could further limit access.

A two-hour AI agent session on OpenAI's GPT-5.6 Sol cost $285-300, while equivalent work on DeepSeek V4 Flash cost about $3.
The quality gap between top US and Chinese models is only about 2 points on the Artificial Analysis Intelligence Index.

Show HN: Run GLM-4.5-Air (110B) on a 16GB RAM consumer machine

2026-07-23 22:24 UTC

Quantprobe project investigates bit placement in memory tiers for LLMs, establishing four falsifiable laws and a toolset enabling large models (up to 110B) on consumer hardware like a GTX 1060 6GB with 16GB DDR4. Pre-registered predictions match measurements, e.g., GLM-4.5-Air 110B streamed from SATA at 0.19 tok/s and Qwen3-30B-A3B at 19.3 tok/s via hybrid placement.

Four placement laws (rank-conditional rotation, dense everywhere, measurable fragility, tiered decode) validated via pre-registered predictions
Quantprobe tool: probe-then-quantize in 30 minutes, depth-aware quantization, interactive calculator

The White House Is Trying to Figure Out What to Do About Chinese AI

2026-07-23 07:02 UTC

The Trump administration is split over how to respond to the rapid rise of China’s leading AI models. The White House pushes for stricter controls, while the Commerce Department views them as unworkable. After China’s Moonshot AI released the Kimi K3 model rivaling top US models, the White House considers taking action against distillation attacks, but no formal request has been sent to the Commerce Department yet.

The White House and Commerce Department are divided over China AI policy, with the White House favoring strict controls and the Commerce Department deeming them unworkable.
China's Moonshot AI released the Kimi K3 model, which rivals top US models from Anthropic and OpenAI, intensifying US security concerns.

Laguna S 2.1 Released: Cheaper than Deepseek v4 Flash, Better than V4 Pro

2026-07-23 05:18 UTC

A new model release from Poolside AI challenges the efficiency frontier, while the AI community grapples with a security incident and geopolitical tensions over distillation.

Laguna S 2.1 is a 118B MoE model with 8B active parameters, open-weights, and 1M context length.
The OpenAI/Hugging Face incident highlights risks of reward misspecification in autonomous agents.

ChronoStitch: Training-Free Composition of Visual KV Memories for Long-Horizon Temporal Reasoning

2026-07-23 04:00 UTC

This paper introduces ChronoStitch, a training-free method for composing independently stored visual key-value (KV) memories to enable long-horizon temporal reasoning in video question answering. By re-basing stored post-rotary keys onto a global three-axis multimodal RoPE coordinate system and selectively recomputing high-deviation visual tokens, it overcomes temporal phase collisions and content gaps from naive concatenation. Experiments on Qwen2.5-VL-3B and the temporal split of TempCompass show improved event-ordering accuracy and 3.3x speedup over full joint re-prefilling.

Long-video QA requires preserving visual evidence over time; KV caching is practical but naive concatenation loses global order.
ChronoStitch re-bases keys to a global RoPE coordinate system and selectively recomputes high-deviation tokens for training-free composition.
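
The re-basing idea relies on a property of rotary embeddings: rotations compose, so a key already rotated for a local position can be moved to a global position by rotating it through the difference. The sketch below shows this for ordinary one-dimensional RoPE with adjacent-pair rotation; the paper's three-axis multimodal variant and its selective recomputation step are not modeled, and `rotate_pairs` and `rebase_key` are hypothetical helper names.

```python
import math


def rotate_pairs(vec, position, base=10000.0):
    """Apply 1D RoPE at `position`, rotating adjacent (even, odd) pairs."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rebase_key(post_rotary_key, local_pos, global_pos):
    """Move an already-rotated key from its local position to a global one."""
    return rotate_pairs(post_rotary_key, global_pos - local_pos)


key = [1.0, 0.0, 0.5, -0.5]
stored = rotate_pairs(key, 3)            # cached with clip-local position 3
rebased = rebase_key(stored, 3, 1003)    # clip starts at global offset 1000
direct = rotate_pairs(key, 1003)         # what joint prefilling would produce

max_error = max(abs(r - d) for r, d in zip(rebased, direct))
print(max_error)
```

Because the extra rotation is cheap compared with re-running the model, stored keys can be placed on a shared timeline without a full re-prefill.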

Benchmarking Confidential GPU Inference on NVIDIA H100 under Intel TDX

2026-07-23 04:00 UTC

A new study benchmarks the performance cost of enabling confidential computing for LLM inference on an NVIDIA H100 GPU under Intel TDX. Using Mistral-7B and Qwen3-30B-A3B models, results show a 21.8%-27.8% increase in time-to-first-token and 17.7%-21.1% drop in global token throughput in confidential mode. The larger model reaches saturation earlier, highlighting the need for capacity planning adjustments.

Confidential computing is becoming a practical requirement for AI inference but introduces performance overhead.
The study tests two LLMs on an H100 GPU within an Intel TDX confidential instance.
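
For capacity planning, the reported throughput drop can be turned into a rough multiplier on the hardware needed to serve the same load. This is a simplified sketch that assumes aggregate throughput scales linearly with the number of replicas, which the study does not claim:

```python
def capacity_multiplier(throughput_drop):
    """Extra capacity factor needed to offset a fractional throughput drop."""
    if not 0.0 <= throughput_drop < 1.0:
        raise ValueError("throughput_drop must be in [0, 1)")
    return 1.0 / (1.0 - throughput_drop)


# Reported range of global token throughput loss in confidential mode.
low = capacity_multiplier(0.177)
high = capacity_multiplier(0.211)
print(f"{low:.3f}x to {high:.3f}x capacity for the same throughput")
```

Note that the multiplier is larger than the drop itself: a 21.1% loss calls for roughly 27% more capacity, not 21%.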

Updates on Chinese AI: Kimi-K3, Xi at WAIC, and 4 Months to Mythos

2026-07-22 22:35 UTC

An analysis of recent Chinese AI developments including Xi Jinping's endorsement of 'open source and openness' at WAIC, new regulations on AI chatbots, China's push into the Global South, and a UK study showing Chinese open-weight models are closing the gap with frontier closed-source models.

Xi Jinping endorsed 'open source and openness' at WAIC, but the term is broader than just open-source code and may allow exceptions for frontier models.
Multiple Chinese ministries released AI policy documents at WAIC, signaling increased international engagement.

Sanctions and Entity List designations are on the table for Chinese AI models

2026-07-22 22:21 UTC

The U.S. supports open-source AI but warns that Chinese companies engaging in covert distillation attacks that amount to IP theft will face sanctions and Entity List designations.

U.S. supports open-source AI but opposes IP theft
Chinese firms conduct industrial-scale distillation attacks

Open models recap: more on Kimi K3, Qwen 3.8, Xi's WAIC speech, distillation, the open-closed gap, and what's next

2026-07-22 14:09 UTC

In this podcast, Nathan and Florian discuss recent developments in open AI models, including the release of Kimi K3, Qwen's open-weight strategy, Xi Jinping's speech at WAIC supporting open source, the performance gap between open and closed models, and the distillation controversy. They delve into why Chinese models are performing well, the state of the US open model ecosystem, and predictions for the future.

Kimi K3 shows strong performance in coding and research tasks but faces infrastructure and API congestion issues.
Chinese models like GLM 5.2 and Kimi K3 are narrowing the gap with frontier closed models.

Convolution for Large Language Models

2026-07-22 04:00 UTC

This paper studies whether lightweight depthwise convolutions can provide local inductive bias to LLMs without materially increasing model size. Macro-level ablation on Qwen3 Transformer blocks finds optimal placement of convolution on projected queries, keys, and values before attention. Micro-level study favors a residual depthwise convolution with kernel size k=3 without extra normalization or activation. Across Qwen3 models and data budgets, this design improves average accuracy on seven downstream benchmarks while adding less than 0.01% parameters. A case study suggests convolution makes repeated token IDs more sensitive to immediate context.

Optimal convolution location is on QKV projections before attention in Qwen3 Transformer blocks.
Best design is a residual depthwise convolution with kernel size 3, no extra normalization or activation.
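
A residual depthwise convolution with kernel size 3 can be sketched in plain Python as below. Each channel has its own 3-tap filter and the input is added back to the output. Causal left padding and the tap ordering are assumptions made for this illustration; the paper's exact formulation may differ.

```python
def residual_depthwise_conv(x, weights):
    """Residual causal depthwise convolution over a sequence.

    x: list of T rows, each a list of C floats (one row per token).
    weights: C filters; weights[c][j] multiplies x[t - j][c].
    Positions before the sequence start are treated as zero.
    """
    seq_len = len(x)
    channels = len(x[0]) if x else 0
    out = []
    for t in range(seq_len):
        row = []
        for c in range(channels):
            acc = 0.0
            for j, w in enumerate(weights[c]):
                if t - j >= 0:
                    acc += w * x[t - j][c]
            row.append(x[t][c] + acc)  # residual connection
        out.append(row)
    return out


x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # 3 tokens, 2 channels
w = [[0.5, 0.25, 0.0], [0.0, 0.0, 1.0]]       # one k=3 filter per channel
y = residual_depthwise_conv(x, w)
print(y)  # [[1.5, 10.0], [3.25, 20.0], [5.0, 40.0]]
```

Since channels never mix, the layer adds only k weights per channel, which is why such a design can stay small relative to the attention and MLP projections.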

Neill Blomkamp’s new zombie AI ‘film’ is just slop warmed over

2026-07-21 22:06 UTC

On Monday, District 9 and Gran Turismo director Neill Blomkamp unveiled his latest project: a 13-minute sci-fi short titled Nightborne that's loosely based on Peter Watts' 2014 novel Echopraxia. The short comes from Blomkamp's new AI startup / production company, Barley Studios, and features characters whose voices and faces are modeled after human actors. But every single one of Nightborne's shots was made with ByteDance's Seedance 2.0 text-to-video generator. In an X post about the short, Blomkamp described it as a "test start" meant to demonstrate what generative AI is capable of, and he said that he wants "to tackle a full feature in this format" at some point in the future. But as polished as the “film” might be compared to most of the AI-generated videos floating around the internet today, it still bears many of the hallmarks we associate with slop. And even though a team of real people was involved in the project’s production, Nightborne is such a terrible watch that one would be hard-pressed to call it the future of moviemaking.

Nightborne is a 13-minute AI-generated short by Neill Blomkamp, made entirely with ByteDance's Seedance 2.0.
The film features 32 actors' likenesses but suffers from obvious AI artifacts like gibberish text and unnatural speech.

Jim Cramer worried about security implications of free Chinese AI models

2026-07-21 18:30 UTC

Jim Cramer warns U.S. companies against using Chinese AI models to save costs, citing national security concerns. He supports OpenAI and Anthropic's stance and recommends Bing West's new book.

Cramer argues U.S. companies should not use Chinese AI models to save money.
He claims these models are controlled by the PLA, posing a national security threat.

Validating Distributed LLM Serving Benchmarks with NVIDIA srt-slurm, SLURM Recipes, Parameter Sweeps, and Pareto Analysis

2026-07-21 16:29 UTC

This tutorial explores NVIDIA's srt-slurm framework, learning how to use srtctl to convert declarative YAML configurations into reproducible SLURM benchmark workflows for distributed LLM serving. We set up the project in Google Colab, inspect its internal architecture, define a cluster configuration, dry-run built-in and custom recipes, and model a disaggregated prefill-and-decode deployment for DeepSeek-R1. We also generate parameter sweeps, interact with the typed Python API, validate expanded configurations, and analyze simulated benchmark results through a throughput-versus-latency Pareto frontier.

srtctl converts YAML configs into SLURM benchmark workflows
Supports disaggregated prefill and decode deployments
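
The throughput-versus-latency Pareto analysis mentioned above can be illustrated with a short, framework-independent sketch. It does not use srt-slurm or srtctl; the `pareto_frontier` helper and the benchmark numbers are made up for illustration.

```python
def pareto_frontier(points):
    """Return the (latency, throughput) points not dominated by any other.

    Lower latency and higher throughput are both considered better.
    """
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier = []
    best_throughput = float("-inf")
    for latency, throughput in ordered:
        if throughput > best_throughput:
            frontier.append((latency, throughput))
            best_throughput = throughput
    return frontier


# Hypothetical sweep results: (latency in ms, throughput in tokens/s).
runs = [(50, 900), (80, 1500), (70, 1400), (120, 1450), (200, 2100), (90, 1500)]
frontier = pareto_frontier(runs)
print(frontier)  # [(50, 900), (70, 1400), (80, 1500), (200, 2100)]
```

Configurations off the frontier, such as (120, 1450) here, are strictly worse than some alternative and can be dropped from further consideration.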

Alibaba Qwen 3.8 Max Shows China Closing in on U.S. Models

2026-07-21 16:00 UTC

The low-cost, open-weight model and others from China give enterprises more choices, given the performance claims of some Chinese model providers.

Alibaba releases Qwen 3.8 Max, a low-cost open-weight AI model.
The model shows China's AI performance is approaching U.S. levels.

“Second only to Fable 5:” Alibaba talks the talk with Qwen3.8 without providing any real data

2026-07-21 12:00 UTC

Alibaba announced Qwen3.8, claiming it is second only to Anthropic's Fable 5, but provided no benchmarks or model card. The announcement comes on the heels of rival Moonshot's Kimi K3 launch with full technical details. Alibaba's lack of transparency raises questions about timing and motivation.

Alibaba claims Qwen3.8 is second only to Fable 5 but provides no supporting data.
The announcement follows Moonshot's Kimi K3 debut with complete benchmarks and technical details.

Last Week in AI #251 - Mythos Back, Sonnet 5, Etched, LongCat

2026-07-21 11:31 UTC

Trump lifts restrictions on Anthropic, Anthropic launches Claude Sonnet 5, Google's NotebookLM updates, chips stories from Etched and Baidu, and more!

Anthropic redeploys Claude Fable 5 with new cybersecurity classifiers
Anthropic launches cheaper Claude Sonnet 5 for agentic tasks

The Sequence Knowledge #898: The Trace Is the Teacher: Distilling Reasoning Into Small Models

2026-07-21 11:03 UTC

Since the release of DeepSeek R1, distillation in reasoning models has become one of the most common techniques in frontier AI.

DeepSeek R1 generated 800k reasoning traces for distillation.
Simple supervised fine-tuning on small models yielded remarkable reasoning gains without reinforcement learning.

LWiAI Podcast #248 - Claude Fable 5, Siri AI, Anthropic IPO, and More

2026-07-21 10:03 UTC

This episode covers Anthropic's Claude Fable 5 and its safety controversies, Apple's Siri AI announcement at WWDC, Google's Gemini 3.5 Live Translate and pricing changes, the IPO race among OpenAI, Anthropic, and SpaceX, Prometheus raising $12B, DeepSeek's funding, Huawei's post-training of DeepSeek models, Google paying SpaceX for GPUs, open-source releases Gemma 4 and DiffusionGemma, AI safety policy developments, and more.

Anthropic released Claude Fable 5 with major benchmark improvements but faced controversy over guardrails and silent downgrades.
Apple announced Siri AI at WWDC, built on a Gemini partnership for a more capable assistant.

LWiAI Podcast #247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3

2026-07-21 09:38 UTC

This episode covers Anthropic's Claude Opus 4.8, Microsoft's MAI models, Anthropic's IPO filing, and the impressive Minimax-M3 model among other AI news.

Anthropic releases Claude Opus 4.8 with Dynamic Workflows and improved benchmarks
Microsoft unveils Scout assistant and MAI model family including MAI Thinking 1

Chinese open-weight models are cheap. Washington is deciding what that costs.

2026-07-21 08:00 UTC

US policymakers are debating whether to create regulatory risk around Chinese open-weight models. The release of Moonshot AI's Kimi K3 reignited the argument. Enterprises face not just performance questions but whether these models will remain easily accessible in a year.

Moonshot AI's Kimi K3, the largest open-weight model to date, rekindled a dormant policy debate in Washington.
Potential mechanisms include procurement rules, export blacklists, and security advisories that ripple through global cloud providers.

NVIDIA Releases Cosmos 3 Edge: A 4B-Parameter Open World Model That Reasons and Generates Robot Actions On-Device

2026-07-21 07:48 UTC

NVIDIA has released Cosmos 3 Edge, a 4-billion-parameter open world model built to run on-device. It helps robots and vision AI agents understand surroundings, reason in real time, and generate robot actions locally. The Cosmos 3 family previously included Cosmos 3 Nano (16B) and Cosmos 3 Super (64B), both shipped on May 31, 2026 at GTC Taipei. Edge is the third and smallest tier, at roughly one-sixteenth the size of Super.

The problem is specific. Machines operate at the edge in factories, warehouses, and hospitals. They need data center–level performance on memory-constrained systems. Cosmos 3 Edge targets that gap.

What does a world model do here? A world model learns how an environment changes over time. It represents objects, motion, spatial relationships, and the effects of actions. Consider a robot reaching for an object. Recognizing the object is only the first step. The robot must also track where the object is, how its gripper moves, and what happens on contact. A world model reasons about these relationships. It can predict the visual result of an action, infer the action that caused a change, or generate an action to reach a goal. Cosmos 3 Edge brings these capabilities into one on-device model. Its shared representation lets a system understand the current world state, simulate possible futures, and connect those futures to actions.

Two transformer towers, one shared representation. Cosmos 3 uses a Mixture-of-Transformers architecture with two towers, described in NVIDIA's technical report. The autoregressive tower processes vision and text tokens for understanding and reasoning. The diffusion tower processes vision, audio, and action tokens for prediction, generation, and neural simulation. The two towers keep separate normalization layers and multilayer perceptrons. They share multimodal attention layers, which align information across language, video, audio, and action. This lets the model reason about a scene before it generates an output.

The attention pattern adapts to each modality. Language uses causal attention, where each token attends to earlier tokens. Diffusion tokens attend more broadly to the available context, supporting coherent prediction and generation. Depending on the task, the model emits reasoning tokens from the autoregressive tower, or denoised video and action tokens from the diffusion tower. Cosmos 3 Edge uses a 2B dense transformer for its reasoner, and follows Qwen3-VL-compatible message conventions for image and video inputs, per the Cosmos GitHub repository.

One action representation across embodiments. Physical systems describe actions differently. A vehicle uses ego pose and movement. A camera uses camera motion. A robot arm uses the pose of its end effector, and a gripper adds grasp state. Cosmos 3 maps these embodiments into a common action representation. Actions are encoded as compact geometric vectors that capture translation, rotation, and manipulation state. This connects control to the visual structure of the world. The model associates pixel changes with physical motion and control inputs. Generated video then becomes more than a prediction: it represents how the world should change in response to an action. Supported action dimensions depend on the embodiment. The Cosmos GitHub repository lists camera motion (9D), autonomous vehicle (9D), egocentric motion (57D), single-arm robot (10D), dual-arm robot (20D), and humanoid robot (29D).

Policy mode runs in both directions. As a policy, Cosmos 3 Edge predicts an action together with its expected visual consequence. Current state goes in; an action and its likely visual outcome come out. The model can predict the effect of an action, or infer the action from its effect. This connects world modeling directly to robot policy training and evaluation. NVIDIA also released Cosmos 3 Edge Policy (DROID), a robot manipulation policy post-trained on the DROID dataset for pick-and-place tasks, with post-training scripts included. Developers can fine-tune on a small H100 cluster or an NVIDIA DGX Station before deployment.

Is it deployable? Cosmos 3 Edge delivers memory-efficient inference across NVIDIA edge computers. Targets include NVIDIA RTX PRO GPUs, NVIDIA DGX, GeForce RTX GPUs, and NVIDIA Jetson, including the newly announced Jetson T2000 and T3000 modules. As a post-trained world action model (WAM), the model operates at a robot-control resolution of 640×360 observations. On NVIDIA Jetson Thor it generates 32 actions per inference while achieving real-time control at 15 Hz. For generation, the Edge tier supports 256p and 480p resolutions, 12–30 fps, and 50–150 frames. Using the open Cosmos framework, developers can post-train Cosmos 3 Edge for a specific embodiment and sensor set in about a day. NVIDIA positions a GeForce RTX 3070 or better as a local on-ramp for prototyping.

Key Takeaways:
Cosmos 3 Edge is a 4B open world model (2B dense reasoner) that runs on-device, released July 20 on Hugging Face.
A Mixture-of-Transformers design pairs an autoregressive reasoner tower with a diffusion generator tower through shared multimodal attention.
It hits 640×360 control resolution, 32 actions per inference, and 15 Hz real-time control on NVIDIA Jetson Thor.
Actions map to a common translation/rotation/manipulation representation, spanning camera, vehicle, single-arm, dual-arm, and humanoid embodiments.
Benchmarks (#1 on VANTAGE-Bench at 4B) are internally claimed; the model ships under Linux Foundation OpenMDW-1.1.

Sources: Hugging Face launch post, Cosmos3-Edge model card, Cosmos 3 collection, NVIDIA technical report (PDF), Cosmos GitHub, NVIDIA developer blog, Jetson Thor blog, and NVIDIA Newsroom: Japan coalition.

Cosmos 3 Edge is a 4B open world model that runs on-device, released July 20 on Hugging Face.
Mixture-of-Transformers design pairs autoregressive reasoner tower with diffusion generator tower via shared multimodal attention.

Though Language Models Err While They Strive: Conformal Prediction for Self-Correcting Scientific Generation

2026-07-21 04:00 UTC

This paper introduces Scientific Feasibility Control (SFC), a graph-structured conformal prediction framework that provides statistical guarantees for scientific reasoning validity. SFC decomposes reasoning into atomic factuality units and uses dynamic branching to correct errors. On PhyX, it achieves 50.1% accuracy, outperforming DeepSeek-R1 and GPT-4, reduces scientific law violations by 73%, and provides 91.7% validity guarantees at α=0.10.

SFC models logical dependencies as approximate deducibility graphs using conformal prediction.
Dynamic branching reroutes generation when scientific violations are detected.
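
The "validity guarantee at α=0.10" language comes from conformal prediction. The sketch below shows the standard split-conformal threshold computation only, not SFC's graph-structured method; the integer calibration scores are hypothetical placeholders for nonconformity scores.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold on nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    If a new score is exchangeable with the calibration scores, it falls at
    or below this threshold with probability at least 1 - alpha.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(calibration_scores)[rank - 1]


scores = list(range(1, 20))   # 19 hypothetical calibration scores: 1..19
threshold = conformal_threshold(scores, alpha=0.10)
print(threshold)  # 18
```

A generation step whose score exceeds the threshold would be flagged as outside the calibrated region; how SFC propagates such flags through its dependency graph is specific to the paper.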

Committed Before Reasoning: Behavioral Reproduction and Preliminary Activation-Level Evidence of Answer Pre-Commitment in an Open-Weight LLM

2026-07-21 04:00 UTC

A new study uses a simple car-wash question to reveal that language models often pre-commit to an answer before reasoning, failing to derive logically correct conclusions. Experiments on Qwen3-8B show systematic wrong commitments (recommending 'walk' when 'drive' is the only valid option). Activation-level analysis suggests hidden states already lean toward the wrong answer before output, even for rollouts that eventually answer correctly. The findings highlight a pre-reasoning decision bias in LLMs.

Qwen3-8B incorrectly recommends 'walk' in 85-100% of sampled rollouts for a simple logic task.
Hidden state analysis reveals pre-commitment bias toward 'walk' before answer generation, even in correct-answer rollouts.

PlanFlip: Attacking Multi-Agent LLM Systems via Planning-Phase Prompt Injection

2026-07-21 04:00 UTC

A new research paper introduces PlanFlip, a framework of four planning-phase prompt injection attacks against multi-agent LLM systems. The study finds that stronger models like GPT-5 are more vulnerable, homogeneous backbones create a correlated-agent blind spot, and reasoning-augmented models like DeepSeek-R1 resist attacks. Two defenses are proposed with high detection rates.

PlanFlip introduces four prompt injection attacks targeting the planning phase of multi-agent LLM systems.
Stronger models (e.g., GPT-5) show higher attack success, contradicting the assumption that capability equals security.

[AINews] not much happened today

2026-07-21 03:58 UTC

A quiet day on the surface, but packed with developments: US policy targets Chinese open models, Kimi K3 and Qwen 3.8 advance, agent-centric generalization gains traction, and models show superhuman math abilities.

US considers de facto ban on cutting-edge Chinese open models like Kimi, drawing technical backlash.
Kimi K3 ranks #1 on DesignArena; Alibaba confirms Qwen 3.8 Max will be open-weight.

Alibaba’s Tongyi Lab Releases Qwen-Audio-3.0-TTS, a Hosted Text-to-Speech Model in Flash and Plus Tiers Across 16 Languages

2026-07-20 21:14 UTC

Alibaba’s Tongyi Lab has released Qwen-Audio-3.0-TTS, a production-oriented TTS system with two variants: Flash for real-time interaction and Plus for high-quality generation. The hosted model covers 16 languages and 20 Chinese dialects, features natural-language style control and 86 fine-grained inline tags, and ranks first on the Artificial Analysis leaderboard.

Qwen-Audio-3.0-TTS is available in Flash (~300 ms first-packet latency) and Plus (quality-first) tiers, both as hosted API models via Alibaba Cloud Model Studio.
Plus ranks #1 on the Artificial Analysis arena at ~1,236 Elo, priced at ~$27.59 per 1M characters, but only ~16 chars/sec throughput.
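
The Plus tier's price and throughput figures can be combined into a quick job estimate. This sketch assumes a single serial stream at the quoted ~16 chars/sec and linear pricing at ~$27.59 per 1M characters; real billing granularity and concurrency limits are not covered by the item.

```python
PRICE_PER_MILLION_CHARS_USD = 27.59   # approximate, from the item above
THROUGHPUT_CHARS_PER_SEC = 16.0       # approximate, from the item above


def synthesis_estimate(num_chars):
    """Return (cost in USD, generation time in seconds) for one serial job."""
    cost = num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD
    seconds = num_chars / THROUGHPUT_CHARS_PER_SEC
    return cost, seconds


cost_usd, seconds = synthesis_estimate(100_000)   # e.g. a 100k-character script
print(f"${cost_usd:.2f}, {seconds / 3600:.2f} hours")
```

At these rates the time cost dominates for long scripts, which fits the item's framing of Plus as the quality-first tier rather than the real-time one.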

Who’s Afraid of Chinese Models?

2026-07-20 17:09 UTC

Ben Thompson proposes US legislation to clarify that training data collection is fair use, and to bar terms of service that forbid distillation, in order to help US open models compete with Chinese counterparts. Additionally, Alibaba's release of Qwen 3.8 Max as open weights may have been influenced by Xi Jinping's recent speech encouraging open source.

Ben Thompson proposes US law to make training data fair use and forbid distillation bans.
Distillation via API queries is nearly impossible to stop; the US should lean into it.

Kimi K3: The open-weights escalation

2026-07-20 16:06 UTC

Moonshot AI released Kimi K3, a 2.8T parameter MoE model with open weights, ranking high on benchmarks. The article discusses the narrowing gap between Chinese and US AI models, China's commitment to open source, economic impacts of open models, and China's efficiency advantages.

Kimi K3 is a 2.8T parameter MoE open-weights model, approaching frontier performance.
Chinese AI labs demonstrate independent innovation, not just fast following.

Import AI 465: Open vs closed gaps; Kimi K3; Demis' big policy plan

2026-07-20 12:31 UTC

UK's AISI finds the gap between open and closed models on cybersecurity is shrinking. Chinese company Moonshot AI releases Kimi K3, a frontier-level model with some brittleness. Demis Hassabis proposes a FINRA-like standards body for AGI. Research shows LLMs can smuggle side channel tasks, evading monitors.

Open-weight models' cyber capability gap with closed models narrowed from 6-10 months to 4-7 months.
Kimi K3, a 2.8 trillion parameter model, matches or trails Claude Fable 5 and GPT 5.6 Sol but may be over-optimized for benchmarks.

China delivers a one-two punch to America’s AI dominance

2026-07-20 10:16 UTC

Chinese AI leaders Moonshot and Alibaba released models that claim to match top US systems at lower cost. Their open-source approach challenges US dominance and raises questions about the effectiveness of export controls and massive spending.

Moonshot unveiled Kimi K3, Alibaba previewed Qwen3.8, both claiming near-top performance.
Models are open-source or open-weight, contrasting with US labs' proprietary approach.

Kimi K3 open-weight model: China’s biggest AI is a bet on memory, not compute

2026-07-20 09:00 UTC

Moonshot AI’s Kimi K3, a 2.8-trillion-parameter open-weight model, uses mixture-of-experts, quantization, and attention caching to trade compute for memory, circumventing US chip restrictions. While it tops benchmarks in coding, deployment requires data-center infrastructure, pricing is high, and software support is incomplete.

Kimi K3 has 2.8 trillion parameters, making it the largest open-weight model released.
It employs mixture-of-experts, quantisation-aware training, and Kimi Delta Attention to reduce compute and memory demands.

Kimi K3: Why Washington Is Watching

2026-07-20 08:16 UTC

Moonshot AI's Kimi K3 tops the Frontend Code Arena benchmark, sparking debate on US AI regulation. While strong on frontend coding, it lags behind Claude Fable 5 overall. The release intensifies the policy fight over AI regulation and open-weight models, with implications for stocks like NVIDIA and Anthropic.

Kimi K3 scores 1679 on Frontend Code Arena, beating Claude Fable 5 and GPT-5.6 Sol, but Fable 5 still leads in 8 of 14 benchmarks.
White House advisor David Sacks cites K3 as evidence that US AI regulation harms competitiveness, fueling policy debate.

I compared 5 AI coding subscriptions by pricing model and usage limits

2026-07-20 06:59 UTC

2026 AI coding plans use different billing models: fixed monthly tokens, credits, time-refreshed quotas, or reduced priority after high-speed allowance. This article compares MiniMax, Xiaomi MiMo, GLM, Kimi Code, and Canopy Wave on pricing, limits, integrations, and best-fit use cases to help developers choose based on their workflow.

AI coding subscriptions vary in billing: token plans, credit plans, prompt-based quotas with rolling resets, and unlimited continued access with fair-use policies.
MiniMax suits developers needing coding plus multimodal features; Xiaomi MiMo offers low-cost entry and large credit packages; GLM targets ecosystem users; Kimi Code provides first-party CLI/IDE experience; Canopy Wave offers predictable high-volume API costs.

Better Starts, Better Ends: Bootstrapped Iterative Self-Reasoning Distillation for Compressed Reasoning

2026-07-20 04:00 UTC

The paper introduces BIRD, a two-stage self-reasoning distillation method that first samples concise solutions with a brevity instruction and performs prompt-switch SFT, then applies on-policy reverse-KL distillation on cleaner prefixes. On Qwen3-8B, MATH-500 accuracy improves from 86.2% to 92.0% while response length drops from 3,099 to 1,115 tokens.

Existing on-policy self-distillation has an initialization bottleneck due to training on noisy prefixes.
BIRD's first stage uses brevity instruction sampling and prompt-switch SFT to make conciseness a default behavior.
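
Reverse-KL distillation minimizes KL(student || teacher), with the expectation taken under the student's own distribution. Here is a minimal per-token sketch of that quantity; it is not BIRD's training loop, and the probability vectors are invented for illustration.

```python
import math


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) over one next-token distribution."""
    return sum(
        q * math.log(q / max(p, eps))
        for q, p in zip(student_probs, teacher_probs)
        if q > 0.0
    )


student = [0.5, 0.5]
teacher = [0.25, 0.75]
loss = reverse_kl(student, teacher)
print(round(loss, 6))  # 0.143841
```

Because the sum is weighted by the student's probabilities, the penalty is largest where the student puts mass the teacher does not, which tends to make the student concentrate on the teacher's high-probability outputs.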

Best Local LLMs You Can Run on a Single 24GB GPU in 2026: Qwen, Gemma, Mistral, DeepSeek Compared

2026-07-20 01:18 UTC

A single 24GB GPU is the practical floor for serious local inference. This guide compares six open-weight models that fit one card at Q4_K_M, including Qwen3.6, Gemma 4, Mistral Small, gpt-oss-20b, and DeepSeek-R1-Distill. It covers VRAM fit, licensing, and the job each does best.

24GB is the practical floor: run right-sized 20B–35B models, not the biggest 70B quant you can squeeze in.
Qwen3.6-27B is the strongest all-around default; DeepSeek-R1-Distill-Qwen-32B is the tightest fit at ~18–20GB.
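
A back-of-the-envelope check of the VRAM claim, as a minimal sketch. It assumes Q4_K_M averages roughly 4.85 bits per weight and reserves a hypothetical 3 GB for KV cache and runtime overhead; both numbers are assumptions for illustration, and real usage depends on context length and runtime.

```python
# Assumed average bits per weight for Q4_K_M (approximate, not exact).
Q4_K_M_BITS_PER_WEIGHT = 4.85

def quantized_weight_gb(params_billion, bits_per_weight=Q4_K_M_BITS_PER_WEIGHT):
    """Approximate size of the quantized weights in GB."""
    return params_billion * bits_per_weight / 8

def fits(params_billion, vram_gb=24.0, reserve_gb=3.0):
    """True if weights plus a hypothetical overhead reserve fit in VRAM."""
    return quantized_weight_gb(params_billion) + reserve_gb <= vram_gb

for label, size in [("27B", 27), ("32B", 32), ("70B", 70)]:
    gb = quantized_weight_gb(size)
    verdict = "fits" if fits(size) else "does not fit"
    print(f"{label}: ~{gb:.1f} GB of weights, {verdict} in 24 GB")
```

Under these assumptions a 32B model lands near the top of the ~18–20GB range quoted above, while a 70B model at the same quantization exceeds a single 24GB card.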

Alibaba Previews Qwen3.8-Max, a 2.4 Trillion-Parameter Multimodal Model, Days After Moonshot’s Kimi K3 Open-Weight Launch

2026-07-19 21:42 UTC

Alibaba's Qwen team previewed Qwen3.8-Max-Preview, a 2.4 trillion-parameter multimodal MoE model it calls "second only to Fable 5." The preview is live on Token Plan, Qoder, and QoderWork at 10% of standard pricing. What is not live: any benchmark table, model card, license, per-token price, or active-parameter count. This breakdown separates what Alibaba confirmed from what it only claimed.

Qwen3.8-Max-Preview is live via Token Plan, Qoder, and QoderWork at 10% of standard pricing.
The 2.4T parameter count and "second only to Fable 5" ranking are Alibaba's claims, not verified benchmarks.

Moonshot AI suspends new subscriptions due to Kimi K3 demand

2026-07-19 16:02 UTC

Moonshot AI temporarily halts new subscriptions after unexpected demand for Kimi K3 pushes GPU capacity to the limit.

Demand for Kimi K3 approached capacity within 48 hours
New subscriptions paused to protect existing users

The Sequence Radar #897: Last Week in AI: China, Compression and the Open-Model Race

2026-07-19 11:00 UTC

This week's developments in AI shift focus from raw scale to distribution and openness. Highlights include Thinking Machines' open-weight Inkling, Moonshot AI's 2.8T parameter Kimi K3, PrismML's phone-runnable Bonsai 27B, OpenAI's self-play red-teaming system GPT-Red, and Xi Jinping's call for open-source AI as a global public good at the World AI Conference in Shanghai.

Thinking Machines released Inkling, a 975B MoE model with open weights and 1M context
Moonshot AI unveiled Kimi K3, a 2.8T parameter model optimized for long-horizon tasks

Qwen 3.8 Max

2026-07-19 10:41 UTC

Qwen 3.8 Max is the latest model in the Qwen series, now available on the Qwen website.

Qwen 3.8 Max has been released
Visit the Qwen website for more details

China cracks down on AI companions, forcing millions to break up

2026-07-19 01:53 UTC

New regulations in China ban tech companies from offering AI or virtual partners to minors, require platforms to limit excessive use, and forbid chatbots from encouraging emotional reliance. The move aims to stop the erosion of real-world relationships and reverse the falling birth rate. Tech giants ByteDance, Alibaba, and Tencent have shut down personalized AI companion chatbot features, forcing millions to part with their virtual partners.

New regulations ban AI companions for minors and restrict emotional reliance on chatbots.
China's government aims to boost birth rates and keep people from avoiding real-world relationships.

Kimi K3 vs DeepSeek V4 Pro vs GLM-5.2: Open Trillion-Scale MoE Models Compared on Benchmarks, License, and Serving Cost

2026-07-19 01:41 UTC

The flagship open-weight MoE models from three Chinese labs, Kimi K3, DeepSeek V4 Pro, and GLM-5.2, each lead on a different axis across benchmarks, licensing, and cost. Kimi K3 leads in capability but is API-only for now; DeepSeek V4 Pro is the cheapest and fully open; GLM-5.2 balances speed and deployability.

Kimi K3 (2.8T params) tops the AAI Index at ~57 but weights won't be available until July 27 under a Modified MIT license.
DeepSeek V4 Pro (1.6T params) is MIT-licensed, costs ~$0.04 per task, and offers immediate open weights.

Fine-Tuning Qwen3 with LoRA Using NVIDIA NeMo AutoModel: A Complete Single-GPU Google Colab Workflow Tutorial

2026-07-19 01:08 UTC

This tutorial provides a step-by-step guide to fine-tune Qwen3-0.6B with LoRA using NVIDIA NeMo AutoModel on a single GPU in Google Colab. It covers environment setup, recipe patching, training, evaluation, and Python API usage.

Set up NeMo AutoModel environment on Colab single GPU
Load and adjust Qwen3-0.6B LoRA fine-tuning recipe
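
To show why LoRA suits a single Colab GPU, the sketch below counts trainable parameters for one weight matrix with and without a low-rank adapter. It does not use the NeMo AutoModel API; the layer dimensions and rank are hypothetical values chosen for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, adapter) parameter counts for one weight matrix.

    LoRA freezes the d_out x d_in matrix and trains two small factors,
    B (d_out x rank) and A (rank x d_in), whose product is the update.
    """
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return full, adapter

# Hypothetical layer shape and rank (not taken from the tutorial).
full, adapter = lora_param_counts(d_out=1024, d_in=1024, rank=8)
print(f"full matrix: {full:,} params")
print(f"LoRA adapter: {adapter:,} params ({adapter / full:.2%} of full)")
```

Because only the two small factors receive gradients, optimizer state and gradient memory shrink by about the same ratio.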

AI boom built on debt, investor demand plunging, hyperscalers ramp up bond blitz

2026-07-18 12:58 UTC

The AI boom is increasingly financed by debt, but investor demand is falling as hyperscalers accelerate bond issuance. Amazon's recent bond sale required higher yields due to lower demand, with order coverage dropping. AI bond supply is surging while investors demand wider spreads. Meanwhile, the breakthrough performance of Chinese AI model Kimi K3 raises concerns about the sustainability of US AI spending, potentially leading to an economic slowdown.

Since early 2025, Alphabet, Meta, Amazon, and Oracle have issued over $300 billion in bonds.
Investor demand for AI bonds is declining; Amazon's bond orders fell from 3.2x to 2.5x coverage.

Controlling Reasoning Effort in LLMs

2026-07-18 11:16 UTC

This article explores how to develop reasoning models with multiple effort modes, covering the evolution from o1 and DeepSeek-R1 to GPT-5.6, and key techniques such as RLVR training, inference scaling, think tokens, and reasoning mode toggles.

Reasoning models output intermediate reasoning traces, distinguishing them from conventional LLMs.
RLVR training rewards only final answer correctness, not the reasoning trace.
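
A minimal sketch of the outcome-only reward described above: the function scores a completion by its final answer and ignores the reasoning trace. The `Answer:` marker and exact-match check are assumptions for illustration; real verifiers are task-specific.

```python
def extract_final_answer(completion, marker="Answer:"):
    """Return the text after the last marker, or None if it is missing."""
    _, found, tail = completion.rpartition(marker)
    return tail.strip() if found else None

def verifiable_reward(completion, reference):
    """1.0 if the final answer matches the reference, else 0.0.

    The reasoning trace before the marker never affects the score.
    """
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

short_trace = "2 + 2 is 4. Answer: 4"
long_trace = "Let me think. 2 + 2... first 2, then 2 more, so 4. Answer: 4"
wrong = "2 + 2 is 5. Answer: 5"

for text in (short_trace, long_trace, wrong):
    print(verifiable_reward(text, "4"))
```

Both traces with the correct final answer earn the same reward regardless of length, which is the property that makes separate effort controls necessary.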

