AI News Hub LIVE

China AI updates

Google files first joint lawsuit with FBI over Chinese AI scam network, OpenAI blocks PRC influence clusters

Within days of each other, Google and OpenAI separately exposed operations allegedly originating in China that use AI for fraud and covert influence campaigns. Both target US infrastructure and political debates.

  • Google and FBI jointly sue Chinese cybercrime network for using Gemini AI to defraud Americans.
  • OpenAI bans two ChatGPT clusters linked to China for manipulating US tech policy debates.

Moonshot AI Launches Kimi Work, a Local Desktop Agent Reportedly Running on Kimi K2.6 With a 300-Sub-Agent Swarm

Moonshot AI has introduced Kimi Work, a local desktop AI agent for macOS and Windows that runs a swarm of up to 300 sub-agents on your machine. It drives your logged-in browser via WebBridge, reads local files, and schedules background jobs with a built-in cron engine. Based on the Kimi K2.6 MoE model (≈32B active parameters, 256K context), it targets knowledge workers by keeping data and execution local.

  • Kimi Work is a downloadable local desktop agent, not a cloud service, that directly accesses your files and browser sessions.
  • It supports up to 300 parallel sub-agents coordinated by the Kimi K2.6 model.

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

Pythagoras-Prover is a compute-efficient family of open-source Lean theorem provers, featuring autoregressive models (4B and 32B) and a diffusion-based prover (4B). It uses curriculum SFT with stratified data and dynamic proof filtering for training efficiency, and introduces Augmented Lean Formalisation (ALF) to expand verified corpora via self-distillation. The 4B model outperforms DeepSeek-Prover-V2-671B on MiniF2F-Test (86.1% vs 82.4%) with ~167x fewer parameters, while the 32B model sets a new open-source SOTA at 93.0% and solves 93 PutnamBench problems.

  • Pythagoras-Prover includes autoregressive models at 4B and 32B parameters and a 4B diffusion-based prover that refines proofs iteratively.
  • Training efficiency is achieved via curriculum SFT with stratified difficulty levels and dynamic proof reasoning filtering within an 8k-token context.

Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem

The author developed Pakistan Notice Helper, a safety-focused AI tool for the Hugging Face Build Small Hackathon, designed to help people in Pakistan understand suspicious messages. The tool uses a small model (Qwen3.5 4B) to analyze text or screenshots, providing risk labels, explanations, and safe next steps. It supports English and Urdu, with the Urdu mode featuring a right-to-left layout and Urdu-language assessments. The article shares lessons on model selection, prompting, Urdu UX, and using Codex for rapid development.

  • Pakistan Notice Helper is a local AI safety tool for suspicious messages in Pakistan, supporting text and screenshots.
  • The final model choice was Qwen3.5 4B Q8 via llama.cpp, passing all high-risk scam and screenshot test cases.

Seedream 5.0 Image and Video – create AI videos

Seedream AI Studio integrates ByteDance's Seedream image generation models (4.5/5.0/5.0 Lite/4.0) with Kling 2.1 video animation, offering a one-stop text-to-image and image-to-video creation experience. Try for free without sign-up, with multiple pricing plans suitable for e-commerce, social media, and creative professionals.

  • Supports Seedream 4.5/5.0/5.0 Lite/4.0 with one-click switching
  • Generated images can be directly animated into 5-15s videos via Kling 2.1

Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

Large language models trained mainly on English data often fail to express world knowledge reliably in other languages, a problem known as cross-lingual factual inconsistency. This paper introduces PolyFact, a large-scale parallel multilingual factual QA dataset with 100K Wikidata-grounded facts across 12 languages. Comparing continual pretraining (CPT), supervised fine-tuning (SFT), and GRPO-based reinforcement learning on Qwen-2.5-7B and OLMo-2-1124-7B, the authors find that GRPO consistently outperforms the other methods, improving cross-lingual consistency and generalization to unseen languages. Mechanistic analyses show GRPO reduces language specialization in MLP layers and attention heads, promoting shared representations. Code, models, and dataset are released.

  • PolyFact dataset: 100K Wikidata facts across 12 languages for cross-lingual QA.
  • GRPO reinforcement learning outperforms SFT and CPT for cross-lingual factual recall.

The OnlyFans Economy of American AI

A scathing critique of the American AI industry, comparing it to an 'OnlyFans economy' where investors and companies blindly worship overhyped and overpriced models. The author argues that Chinese open-source models like Qwen 3.7 Max offer superior value and performance, urging developers to vote with their wallets and avoid paying the 'multiplier' for US frontier models.

  • The author criticizes the hypocrisy and hubris of US AI companies, especially Anthropic and OpenAI.
  • Chinese models like Qwen 3.7 Max match or exceed US models in practical use at a fraction of the cost.

Show HN: Best setup local LLM found for a 5090 (llama.cpp fork + turboquant)

This article details the configuration and memory calibration required to run the Qwen 3.6 35B MoE model at a 450,000 token context window on a single 32GB VRAM GPU (NVIDIA RTX 5090) using llama.cpp with TurboQuant and YaRN scaling. It covers model selection, quantization trade-offs, KV cache quantization, RoPE scaling, multimodal setup, replication guide, VRAM lifecycle management, and performance evaluation.

  • Run Qwen3.6-35B-A3B-Q6_K on a single RTX 5090 with 450K context using llama.cpp TurboQuant fork and YaRN scaling.
  • Achieve 450K context by compressing KV cache to 3-bit (turbo3) and extending RoPE beyond native 262K with YaRN, but at cost of perplexity and retrieval accuracy.
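
As a rough illustration of the RoPE extension mentioned above, the context extension ratio is simply the target context divided by the native one. This is a minimal sketch that assumes "262K" means 262,144 tokens and "450K" means 450,000 tokens; both exact values, and the function name, are assumptions for illustration and are not taken from the article.

```python
# Assumed exact token counts; the summary only gives "262K" and "450K".
NATIVE_CTX = 262_144
TARGET_CTX = 450_000


def context_extension_ratio(target_ctx: int, native_ctx: int) -> float:
    """Ratio of target to native context; never below 1.0 (no extension needed)."""
    if native_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    return max(1.0, target_ctx / native_ctx)


factor = context_extension_ratio(TARGET_CTX, NATIVE_CTX)
print(f"extension ratio: {factor:.2f}")
```

Under these assumptions the ratio is well below 2, which fits the article's framing of a moderate extension that still costs some perplexity and retrieval accuracy.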

Five labs, five minds: building a multi-model finance drama on small models

This article is a field report from the second Build Small Hackathon, describing v2 of the 'Thousand Token Wood' simulation. In this version, each of the five woodland creature agents is powered by a different small language model (from OpenAI, OpenBMB, NVIDIA, and a fine-tuned Qwen). The player takes on the role of a shadow financier, able to lend, tip (truthfully or falsely), short, bribe, and broker alliances. The article details engineering challenges: serving layer heterogeneity (vLLM, CUDA toolkit), per-model quirks, a tolerant JSON parser, and a critical information asymmetry firewall to prevent secret flags from leaking into agent prompts. Persistent memory is handled via bounded summaries rather than raw history to avoid prompt inflation. Results show zero leaks, reliable fine-tuned 0.5B performance, and emergent behaviors from heterogeneous agents. Key takeaways: small models are reliable format generators but unreliable reasoners; heterogeneity adds value with manageable cost; secret information requires data-flow-level firewall; bounded memory keeps agents alive without compromising reasoning.

  • Each agent uses a different small model from different labs, making market behavior more realistic and emergent.
  • Information asymmetry is protected by a firewall design; tests prove the hidden truth flag never leaks into agent prompts.
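
One way to picture a data-flow-level firewall of the kind described above is an allowlist: prompt-building code only ever receives a copy of the record with public fields. The sketch below is a hypothetical illustration, not the author's implementation; the `Tip` class, its field names, and `build_prompt` are all invented for the example.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Tip:
    sender: str
    text: str
    is_true: bool  # hidden truth flag; must never reach an agent prompt


# Only allowlisted fields may cross into prompt-building code.
PUBLIC_FIELDS = ("sender", "text")


def public_view(tip: Tip) -> dict:
    """Return a copy of the tip containing allowlisted fields only."""
    full = asdict(tip)
    return {name: full[name] for name in PUBLIC_FIELDS}


def build_prompt(agent_name: str, tips: list[Tip]) -> str:
    """Build an agent prompt from public views, never from raw Tip objects."""
    lines = [f"You are {agent_name}. Tips you have heard:"]
    for tip in tips:
        view = public_view(tip)
        lines.append(f"- {view['sender']} says: {view['text']}")
    return "\n".join(lines)


prompt = build_prompt("Badger", [Tip("Financier", "Acorn futures will crash", False)])
print(prompt)
```

An allowlist fails closed: a secret field added later stays out of prompts unless someone deliberately lists it as public.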

Job Searcher

Job Searcher is an AI-powered job search assistant for new grads. It analyzes resumes, generates LinkedIn search queries, and scores job postings across five dimensions: skills, experience, education, industry, and seniority. Built with a teacher-student model (DeepSeek V4 Pro and Qwen3-8B), it uses a curated dataset of 2,500 resumes and 10,000 job postings. Open-source and available on HuggingFace Spaces.

  • Automates LinkedIn job search with resume-based queries and multi-dimension scoring
  • Uses DeepSeek V4 Pro as teacher and Qwen3-8B as student

New open-source voice model listens nonstop and decides every 0.4 seconds whether to speak or stay silent

Unlike GPT-4o or Qwen3.5-Omni, Audio Interaction doesn't wait for a recording to end: it translates, transcribes, chats, and picks up everyday noises like coughing in a single stream. Code, model weights, and download instructions are available on GitHub under the Apache 2.0 open-source license, with the training data to follow.

  • The Audio Interaction model continuously listens to audio streams, making decisions every 0.4 seconds.
  • It can translate, transcribe, chat, and recognize everyday noises in a single stream.

OpenClaw Got Safer in Public

OpenClaw, an open-source AI agent project, improved its security through transparency and community contributions, despite facing many false vulnerability reports. The post details changes such as trust model documentation, hardening, a plugin architecture, and partnerships with companies like NVIDIA, Microsoft, and Tencent.

  • Open-source nature enabled rapid security improvements.
  • Over 1,300 security advisories received, most false positives.

PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

A systematic study of parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA) applied to Qwen2.5-3B for building a domain-specific conversational assistant in telecommunications customer support. The research introduces a combinatorial synthetic data generation approach and evaluates 16 LoRA configurations, revealing a divergence between quantitative validation loss and qualitative human-aligned rankings, and provides an energy-performance trade-off analysis.

  • Combinatorial synthetic data generation using 52 industry terms produced 30,000 training examples across 1,560 scenarios.
  • Evaluation of 16 LoRA configurations showed that the configuration with the lowest validation loss (0.5024) ranked only 6th-7th in qualitative assessment, while the one with the highest loss (0.6807) ranked first.

Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO

This paper proposes a Variance-Aware Reward Framework using Group Relative Policy Optimization (GRPO) for post-training LLMs on heart-focused medical question answering. The method replaces weighted binary criterion aggregation and single Likert scoring with continuous analytical reward functions, providing richer optimization signals. On the heart subset of HealthBench, the best variant improves accuracy from 0.362 to 0.502 and F1 from 0.532 to 0.668 over the Qwen3-14B base model, remaining competitive with GPT-OSS-120B.

  • Proposes a Variance-Aware Reward Framework with GRPO for heart-focused medical QA post-training.
  • Replaces binary criterion aggregation and Likert scoring with continuous analytical reward functions.

Temporal Preference Concepts and their Functions in a Large Language Model

Researchers localized a neural subgraph responsible for temporal preference in a distilled LLM (Qwen3-4B-Instruct-2507), finding that models discount the future less steeply than humans and that this preference is unstable across contexts, with steering vectors capable of modulating it.

  • Localized temporal preference subgraph in mid-to-upper layers
  • Time horizon geometry encoded in residual stream

DeepSWE results are unreliable – 3/3 DSv4 "failed" tasks solved with same model

An audit of the DeepSWE benchmark reveals that deepseek-v4-pro's reported results (8% solve rate, $4.22 avg cost) are invalid due to multiple issues: cost inflated ~5x by ignoring cache pricing, all three reported failures were solved with the same model, OpenRouter privacy settings silently block DeepSeek, and the model received no reasoning/effort tuning unlike competitors.

  • Cost inflated ~5x: benchmark bills all input tokens at cache-miss rate, ignoring 78% cache hits at 99.2% discount.
  • All three 'failed' tasks solved with same model deepseek-v4-pro for ~$0.86 total.
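
The cache-pricing point can be checked with a small worked calculation. This sketch uses only the two figures quoted above (78% cache hits, 99.2% discount on cached tokens) and covers the input-token portion of the bill alone; the function name is hypothetical, and the audit's overall "~5x" figure may include factors not given in this summary.

```python
def effective_input_fraction(hit_rate: float, cache_discount: float) -> float:
    """Fraction of the full cache-miss input bill that is actually owed."""
    miss_rate = 1.0 - hit_rate
    return miss_rate + hit_rate * (1.0 - cache_discount)


fraction = effective_input_fraction(hit_rate=0.78, cache_discount=0.992)
overbilling = 1.0 / fraction
print(f"owed fraction: {fraction:.3f}, overbilling factor: {overbilling:.2f}x")
```

On these inputs, billing every input token at the cache-miss rate overstates input cost by a bit more than 4x, the same order of magnitude as the reported inflation.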

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

SMAC-Talk extends the StarCraft Multi-Agent Challenge with a natural language communication channel to evaluate LLM-based agents in cooperative multi-agent settings. It features decentralized control, partial observability, long-horizon decision making, and scenarios with deceptive communicators. Benchmarking using Qwen3.5 models reveals how reasoning, memory, and scale affect coordination.

  • SMAC-Talk introduces a natural language channel for evaluating LLM agent coordination.
  • Includes deceptive communicator scenarios to test trust and robustness.

Qwen 3.7 Plus: Alibaba's High-Intelligence but Expensive and Slow Model

Qwen 3.7 Plus is Alibaba's proprietary reasoning model released in June 2026, scoring 53 on the Artificial Analysis Intelligence Index, far above average. However, it is expensive, slow, and very verbose. The model supports text, image, and video input with a 1M-token context window.

  • Intelligence score of 53, well above the average of 23 for comparable models.
  • Priced at $0.40/M input tokens and $1.16/M output tokens, placing it in the expensive range.
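
To make the per-million-token prices concrete, here is a minimal cost sketch. The prices come from the bullet above; the request size (20,000 input tokens, 5,000 output tokens) is a made-up example, not a measured workload.

```python
INPUT_USD_PER_M = 0.40   # from the listing above
OUTPUT_USD_PER_M = 1.16  # from the listing above


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request size, chosen only for illustration.
cost = request_cost_usd(input_tokens=20_000, output_tokens=5_000)
print(f"${cost:.4f} per request")
```

Because the model is described as very verbose, the output term is likely to dominate real bills more than this example suggests.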

DigitalOcean says it is now an OpenRouter AI model provider

DigitalOcean announced on X that it is now a model provider on OpenRouter, offering DeepSeek V3.2, Kimi K2.6, and DeepSeek V4 Flash. The move signals the company's expansion from cloud infrastructure into AI inference.

  • DigitalOcean announced on X that it has become a model provider on OpenRouter
  • Initial models include DeepSeek V3.2, Kimi K2.6, and DeepSeek V4 Flash

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

A study probing Qwen3-14B hidden states shows that linear probes achieving 100% accuracy in classifying reasoning types (deductive, inductive, abductive) actually detect task format confounds (source, option count, response length) rather than genuine reasoning modes. After deconfounding, accuracy drops to chance, and causal steering shows no functional link. The findings urge routine format deconfounding in mechanistic interpretability.

  • Linear probes on LLM hidden states can achieve 100% accuracy in distinguishing reasoning types.
  • This accuracy disappears after controlling for task format confounds like source identity and option count.

Dropstone 1.5: 2× Claude Code's usage at $15/mo

Dropstone 1.5 is an AI coding agent for the terminal, offering roughly 450 deep coding sessions per week for $15/month—about twice what Claude Code Pro delivers for $20. It runs on DeepSeek and Kimi models hosted in the US, with no data stored. Safety features require permission for file writes, shell commands, and network calls.

  • $15/month for ~450 deep coding sessions per week, 2x Claude Code Pro's usage.
  • Uses DeepSeek V4 Flash, V4 Pro, and Kimi K2.6 models hosted in the US, no data stored.

Titan Network claims 5% of Asia's AI data market using crowdsourced home devices

Titan Network aggregates unused computing power from consumers' connected devices into a decentralized cloud, offering AI firms infrastructure at up to 75% lower cost. Clients include Tencent, Alibaba, and Kling AI. The company pays 80% of revenue from data tasks to individuals who share their devices and bandwidth.

  • Titan Network uses crowdsourced home devices for decentralized cloud AI.
  • Offers up to 75% cost savings over traditional cloud providers.

Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba's Qwen team released Qwen3.7-Plus, a multimodal agent model available via API on Bailian (Model Studio). It understands images and video, and adds capabilities including deep reasoning, self-programming, tool invocation, verification/testing, and autonomous iteration. Its preview ranked #16 in Vision Arena, making Alibaba the #5 vision lab.

  • Alibaba's Qwen team launched Qwen3.7-Plus, a multimodal agent model on the Bailian platform (Model Studio).
  • The model understands images and video and includes five agentic features: deep reasoning, self-programming, tool invocation, verification/testing, and autonomous iteration.

SENSE: Semantic Embedding Navigation with Soft-gated Evaluation for Retrieval-based Speculative Decoding

Proposes SENSE, which uses target model hidden states for semantic retrieval and soft-gated evaluation to improve robustness and efficiency of retrieval-based speculative decoding, achieving up to 4.09 mean acceptance length and 3.26x speedup on LLaMA and Qwen.

  • SENSE anchors retrieval on hidden states of target model for semantic alignment.
  • Soft-gated Evaluation validates semantic equivalence instead of surface forms.

BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

BitsMoE is an efficient quantization framework for Mixture-of-Experts (MoE) large language models. It uses SVD to decompose each MoE layer into a shared basis and expert-specific spectral factors, preserving the shared basis without quantization to maintain cross-expert structure. An integer linear programming formulation minimizes reconstruction loss under a fixed bit budget. Experiments show that BitsMoE significantly reduces accuracy degradation in ultra-low-bit regimes, achieving 12.3× quantization speedup, 27.83 percentage point average accuracy improvement, and 1.76× decoding speedup over GPTQ on Qwen3-30B-A3B-Base at 2 bits.

  • Proposes BitsMoE, which leverages SVD decomposition of MoE layers for fine-grained quantization.
  • Uses integer linear programming for activation-aware mixed-precision bit allocation to minimize reconstruction loss.

[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

NVIDIA launched Cosmos 3 (unified multimodal world model), Nemotron 3 Ultra (efficient 550B LLM), and RTX Spark (personal AI superchip). Also covered: MiniMax M3, Qwen3.7-Plus, JetBrains Mellum2, agent ecosystems, and infrastructure updates.

  • NVIDIA's Cosmos 3 uses a Mixture-of-Transformers architecture to unify language, image, video, audio, and action. Nemotron 3 Ultra is a 550B open-weight LLM claiming US SOTA with fast inference. RTX Spark is a personal AI computer with Grace+Blackwell at 1 petaflop FP4.
  • MiniMax M3 launched as an open-weight multimodal agent model with 1M context and strong coding benchmarks. Qwen3.7-Plus from Alibaba is a hybrid agent unifying GUI/CLI. JetBrains Mellum2 is a 12B MoE for ultra-low-latency developer workflows.

Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets

Together AI optimizes MiniMax M3 serving with KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway, achieving 81–125% throughput improvements across concurrency levels.

  • MiniMax M3 combines coding, agentic workflows, and multimodal reasoning with a 1M-token context window.
  • Together AI's kernel team developed KV-block-major sparse attention and integrated MSA with paged attention.

MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MiniMax officially released MiniMax M3 on June 1, 2026, featuring MiniMax Sparse Attention (MSA) for a 1M-token context window, native image/video input, and desktop computer operation. The API is live now.

  • M3 introduces MSA, achieving >9× prefill and >15× decoding speedup at 1M-token context versus M2, with 1/20th per-token compute.
  • Scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.

MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders

Chinese AI company MiniMax has released its new model M3. It's billed as the first open-weight model to combine top-tier coding performance, a one-million-token context window, and native multimodality.

  • MiniMax releases M3, the first open-weight model combining top coding performance, 1M-token context, and native multimodality.
  • The model challenges proprietary leaders in AI performance.

MiniMax debuts AI model built for long and complex coding tasks

Chinese AI startup MiniMax released its flagship model M3, designed for coding agents and automated workflows. It processes up to 1M tokens, reduces computational costs by 20x, and outperforms OpenAI GPT-5.5 and Google Gemini on SWE-Bench Pro. The company is also preparing for a Shanghai IPO and is partnering with Ant Group's Alipay on AI payment infrastructure.

  • MiniMax unveils M3 with 1M-token context and 20x cost reduction.
  • M3 beats OpenAI GPT-5.5 and Google Gemini 3.1 Pro on SWE-Bench Pro.

Tokens Are Expensive Because You Feed the Model Too Much Junk | @Wang Xiaoye from AWS AIGC2026

At the 2026 China AIGC Industry Summit, Wang Xiaoye, Technical Director of AWS Product Technology, pointed out that 87% of enterprises claim to have deployed AI at scale, but only 10% have gained actual value. He emphasized the huge gap between personal and enterprise-level agent deployment, and proposed that enterprises need to focus on five layers: compute, models, data & knowledge, agentic platform, and applications. He also noted that token costs are often high because too much useless information is fed to the model.

  • 87% of enterprises have deployed AI, but only 10% see value
  • Personal and enterprise agent deployment are fundamentally different

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

This study introduces a multi-model paradigm to study synthetic deception via LoRA fine-tuning of five transformer models. Linear probes detect deception with near-perfect AUC in early layers, and logistic regression probes outperform MLP probes, supporting the Linear Representation Hypothesis. Probes generalize across domains with minimal loss. Different models exhibit distinct representational regimes: collapse in Pythia/Llama/Qwen versus high-dimensional preservation in Gemma-2. The results show that robust, domain-invariant deception representations can be rapidly entrenched through modest supervised fine-tuning, with implications for activation-based monitoring.

  • Linear probes on mean-pooled hidden states detect synthetic dishonesty with near-perfect AUC (≥0.99) as early as layers 1-3 in four architectures. Logistic regression consistently matches or outperforms MLP probes.
  • Probes trained on TruthfulQA generalize with near-zero loss (ΔAUC≈0) to held-out MMLU subjects. Late-layer representations show strong robustness to Gaussian noise.

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

PhyDrawGen is a neuro-symbolic pipeline that generates physically accurate diagrams from text. It uses an LLM to extract a scene graph, a deterministic solver to encode physics constraints, and a fine-tuned Qwen-VL model to iteratively correct violations. Evaluated on 1,449 problems, it outperforms GPT-5-image and Gemini models.

  • PhyDrawGen combines LLM, deterministic solver, and vision model for physically accurate diagram generation.
  • It addresses hallucinations of force vectors and conservation law violations.

Tokens Are Expensive Because You Feed the Model Too Much Junk | @Wang Xiaoye at AIGC2026

At the 2026 China AIGC Industry Summit, Wang Xiaoye, Technical Director of Amazon Web Services, pointed out that 87% of enterprises claim to have deployed AI at scale, but only 10% have gained real production value. He emphasized that enterprise-grade Agent deployment must bridge four major gaps: model selection, construction complexity, usage threshold, and talent shortage. He introduced AWS's five-layer architecture—compute, model, data, harness platform, and agent applications—and products like Quick to help enterprises move from demo to production.

  • 87% of enterprises deploy AI, but only 10% gain production value.
  • Enterprise-grade agents differ vastly from personal ones, requiring solutions for security, stability, and trust.

Why Chinese AI labs went open and will remain open

The article argues Chinese AI labs open source models not as a national strategy but as a commercial strategy to gain global attention and trust. Using DJI and Insta360 as examples, it emphasizes the importance of marketing on YouTube. Chinese labs lack international marketing capabilities, so open source is their only way into the global conversation. Future releases will include proprietary open source models and fine-tuned variants to set standards.

  • Chinese AI labs open source for global visibility and engagement, not due to government mandate.
  • They lack international marketing presence, so open source serves as PR and trust-building.

From Unlimited Tokens to Full-Agent: MiniMax's AI Native Organizational Evolution

MiniMax, an AI startup focusing on multimodal models, went public on the Hong Kong Stock Exchange in January 2026. The company adheres to a dual strategy of large models + applications and ToC + ToB. Internally, it provides unlimited tokens to all employees, uses agents to automate workflows, and targets high-value tasks that humans dislike, significantly improving efficiency and flattening the organization. In the next 2-3 years, AI will deeply integrate with various industries.

  • MiniMax has been committed to next-generation AI since its founding, advocating 'Intelligence with Everyone' and a dual focus on models plus applications and on consumer (ToC) plus business (ToB) markets.
  • Internal practices: unlimited tokens for all, agent-assisted HR and coding, flatter organization, and 30% R&D efficiency boost.

Tuning CPU-only Qwen3-30B inference with an IBM Quantum sampling loop

A project demonstrates boosting Qwen3-30B inference speed from 0.09 to 14.03 tok/s on a 2017 MacBook Air by combining a human experimenter, Codex, llama.cpp, a local database, and IBM Quantum sampling. The QPU is used for candidate selection, not for running the model directly.

  • Runs Qwen3-30B on 2017 MacBook Air (8GB RAM, CPU-only)
  • Hybrid quantum-classical optimization loop achieves 14.03 tok/s from 0.09 baseline

New review paper argues code is how AI agents think and act, not just what they produce

A new review paper argues that the real bottleneck for autonomous AI agents is the software layer around the language model—tools, memory, testing, and permissions. DeepSeek is building a dedicated 'Harness' team in Beijing, confirming the formula: model + harness = AI agent.

  • The paper claims the bottleneck for AI agents is the software harness, not the model.
  • Key components include tools, memory, testing, and permission boundaries.

LightSail Technology Partners with Tencent Travel Services, Launches New Pre-sale Round

LightSail Technology announced a strategic partnership with Tencent Travel Services to integrate its AI full-sensing wearable device into the mobility platform. The device previously topped JD.com's bestseller list and sold out; now a new pre-sale round is open with discounts.

  • LightSail Technology and Tencent Travel Services partner to integrate AI wearable into travel services.
  • The LightSail AI wearable topped JD.com's bestseller list for 8 consecutive days and sold out.

PPIO Selected for '2026 Global AI 100' by FeiFan Research, Leading the New Wave of AI Globalization

PPIO has been named to the '2026 Global AI 100' list by FeiFan Research, recognized at the FeiFan Awards – Annual AI Globalization Summit. The list honors AI-native companies with global vision. PPIO offers a global distributed computing infrastructure, full-stack cloud services, a model platform supporting DeepSeek, GLM, MiniMax, Kimi, Qwen, and an innovative Agent Sandbox. As of April 2026, PPIO has integrated over 4,800 distributed nodes, with daily token calls exceeding 1 trillion, over 570,000 developers, and Agent Sandbox business growing more than 50x since launch. PPIO was also designated as a pilot unit for Shanghai's Digital Overseas Service Platform and a GDA Pilot Service Station.

  • PPIO selected for '2026 Global AI 100', highlighting its leadership in AI globalization.
  • Provides global distributed computing infrastructure with full GPU coverage for training and inference.

Zero Skill Floor, AAA Ceiling: Tencent's AI Game Creation Platform Is Wild

The next wave of AI creation is hitting gaming. Tencent has unveiled 'Project Craft', an AI-powered game creation platform that lets users generate playable games through natural language, supports 2D and 3D, and comes with AIGC tools and free assets to slash the barrier to game development.

  • Tencent launches 'Project Craft', an AI game creation platform that generates playable games from natural language prompts
  • Supports both 2D and 3D games, with a full AIGC pipeline and over 20,000 free assets

Creative Design WorkBuddy is Here! Tencent Releases AI Agent Creative Studio Miora

Tencent has released Miora, an AI-powered creative studio that integrates image, video, UI/UX, and 3D generation. It features a memory system, multi-modal canvas, and customizable Skills, aiming to enable one person to have a whole creative studio.

  • Tencent launches Miora, a creative AI agent studio
  • Supports generation of images, videos, UI/UX, and 3D content

Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

A comprehensive evaluation of 14 open-source safety guard models on a benchmark of 79,331 samples reveals that Qwen Guard (4B parameters) achieves the highest recall (83.97%), while larger models like Llama Guard (12B) miss up to 75% of unsafe content. Model size does not correlate with safety performance, and general-purpose guard models outperform specialized ones.

  • Qwen Guard (4B parameters) achieves the highest recall (83.97%) among 14 open-source safety guard models.
  • Larger models like Llama Guard (12B) and GPT-OSS Safeguard (20B) exhibit conservative behavior, missing up to 75% of unsafe content.

RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

This paper presents RightNow-Arabic-0.5B-Turbo, a 518M-parameter Arabic-specialized LLM built on Qwen2.5-0.5B using vocabulary injection and edge-first deployment. It achieves 35.9% mean accuracy on Arabic benchmarks, outperforming all same-class open models, and ties Falcon-H1-1.5B on COPA-ar at one-third the size. The quantized model is 398 MB and delivers 635 tokens/s on a single H100, enabling efficient edge deployment.

  • 518M-parameter Arabic LLM built on Qwen2.5-0.5B with vocabulary injection of 27,032 Arabic tokens.
  • Achieves 35.9% mean accuracy on three Arabic benchmarks, surpassing all same-class open-source models.

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Recent work shows that RL retains prior capabilities more effectively than SFT. This paper extends that finding to the mechanistic level, introducing differential circuit vulnerability as a measure of circuit degradation. On Qwen2.5-3B-Instruct for scientific QA, SFT adapts faster but causes greater circuit disruption and forgetting, while RL preserves circuits at the cost of slower adaptation. The results suggest that circuit preservation explains RL's robustness against catastrophic forgetting.

  • SFT adapts quickly but disrupts internal circuits, leading to catastrophic forgetting.
  • RL preserves more of the base model's circuits, resulting in less forgetting but slower task adaptation.

AI Rewriting Software Industry? 8-Year-Old Builds OS, One-Person Company Lands Million-Dollar Deals

At the 2026 China AIGC Industry Summit, Baidu's Miaoda product director Zhu Guangxiang shared how AI has lowered the barrier to programming from writing code to chatting. 87% of Miaoda users cannot code; an 8-year-old built an OS; and one-person companies (OPCs) are landing million-dollar contracts. Vibe Coding turns people on the demand side into suppliers, enabling mass entrepreneurship.

  • Fourth programming revolution: natural language programming, massively expanding creators
  • 87% of Miaoda users have no coding skills; OPCs are the largest user group (16% entrepreneurs)

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.

  • Polar enables RL training on any agent harness via a model API proxy without modifying the harness code
  • Achieves up to 22.6 point improvement on SWE-Bench Verified using GRPO on Qwen3.5-4B across four coding harnesses
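
The proxy idea described above can be sketched as a thin wrapper that forwards each model call unchanged while recording the exact token ids that passed through. This is a toy illustration only, not NVIDIA's implementation: `RecordingProxy`, the backend signature, and the loss-mask convention (train only on completion tokens) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical backend signature: prompt token ids in, completion token ids out.
Backend = Callable[[list[int]], list[int]]


@dataclass
class RecordingProxy:
    backend: Backend
    turns: list[tuple[list[int], list[int]]] = field(default_factory=list)

    def complete(self, prompt_ids: list[int]) -> list[int]:
        """Forward the call unchanged and keep the exact token ids seen."""
        completion_ids = self.backend(prompt_ids)
        self.turns.append((list(prompt_ids), list(completion_ids)))
        return completion_ids

    def trajectory(self) -> list[dict]:
        """One record per turn; only completion tokens are marked trainable."""
        return [
            {"tokens": p + c, "loss_mask": [0] * len(p) + [1] * len(c)}
            for p, c in self.turns
        ]


def fake_backend(prompt_ids: list[int]) -> list[int]:
    """Stand-in for an inference server: echoes the last two ids plus 100."""
    return [tok + 100 for tok in prompt_ids[-2:]]


proxy = RecordingProxy(fake_backend)
proxy.complete([1, 2, 3])  # the harness would make this call, unaware of the proxy
traj = proxy.trajectory()
print(traj)
```

The harness only ever calls `complete`, so it needs no changes, which is the property the summary attributes to Polar.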

South Africa Has AI Leverage. Its Draft Policy Leaves It Unused

South Africa holds 88% of global platinum-group metals, hosts Africa's largest data center market, and sits at the center of a US-China AI infrastructure contest. Yet its draft AI policy, withdrawn after hallucinated references, fails to leverage these advantages for favorable terms. The article examines South Africa's structural leverage, three possible AI infrastructure futures (Chinese, US, local open-weight), and the need for binding governance provisions.

  • South Africa's platinum metals and renewable energy give it unique AI leverage, but the draft policy lacks minimum terms for hyperscalers, data sovereignty, or tech transfer conditions.
  • US and Chinese tech companies (Microsoft, Huawei) compete for AI infrastructure control in South Africa, while the policy does not specify what South Africa demands in return.

Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline

A new method called Self-Verified Distillation (SVD) enables LLMs to self-improve using only unlabeled prompts, without external feedback. The model generates candidate solutions, filters them through a three-stage verification cascade, and trains on the curated data. Experiments on Qwen3 models show significant gains across math, science, and coding benchmarks.

  • SVD uses cycle-consistency, factuality, and correctness checks to filter self-generated solutions.
  • More candidate samples and larger verification budgets yield higher-quality training data.
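
The generate-then-filter loop described above can be sketched as a cascade in which each check prunes the candidates passed to the next. The three checks below are trivial hypothetical stand-ins, not the paper's cycle-consistency, factuality, and correctness verifiers; only the cascade structure is meant to carry over.

```python
from typing import Callable

Check = Callable[[str, str], bool]  # (prompt, candidate) -> keep?


def cascade_filter(prompt: str, candidates: list[str], checks: list[Check]) -> list[str]:
    """Keep candidates that pass every check, applying checks in order."""
    survivors = candidates
    for check in checks:
        survivors = [c for c in survivors if check(prompt, c)]
        if not survivors:
            break  # nothing left to verify
    return survivors


# Hypothetical stand-in checks, ordered from cheapest to strictest.
def nonempty(prompt: str, cand: str) -> bool:
    return bool(cand.strip())


def has_answer_tag(prompt: str, cand: str) -> bool:
    return "answer:" in cand.lower()


def answer_is_number(prompt: str, cand: str) -> bool:
    return cand.lower().split("answer:")[-1].strip().isdigit()


candidates = ["Answer: 4", "", "I think four", "Answer: four"]
kept = cascade_filter(
    "What is 2+2?", candidates, [nonempty, has_answer_tag, answer_is_number]
)
print(kept)
```

Running cheap checks first means the more expensive verifiers see fewer candidates, which matters when many samples are drawn per prompt.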

