AI News Hub

DeepSeek updates

[AINews] Founders and Forward Deployed Engineers

While most outlets are still digesting yesterday's major Anthropic news, this issue highlights AIE's new Forward Deployed Engineer track and Founders program, along with AI news from May 28-29. Key topics include the Claude Opus 4.8 rollout with mixed benchmarks, multi-turn RL tokenization bugs, open model and toolchain progress, Google/OpenAI product expansions, and notable research papers.

  • Claude Opus 4.8 brings incremental improvements but no benchmark sweep; pricing remains a pain point.
  • Multi-turn RL training tokenization bug identified, requiring 'Token-In, Token-Out' discipline.
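The digest names the bug without describing it. One common way such a mismatch arises (an assumption here, not a detail from the article) is decoding sampled tokens to text and re-encoding them for training, which can yield a different token sequence than the one the policy actually produced. A minimal sketch with a hypothetical three-entry vocabulary:

```python
# Toy vocabulary (hypothetical); real tokenizers are far larger,
# but the same round-trip mismatch can occur.
VOCAB = ["ab", "a", "b"]

def encode(text: str) -> list[str]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try longer vocabulary entries first.
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens

def decode(tokens: list[str]) -> str:
    """Concatenate token strings back into text."""
    return "".join(tokens)

# Tokens the policy sampled during one rollout turn.
sampled = ["a", "b"]

# Text-in, text-out: decode for the environment, then re-encode for training.
retokenized = encode(decode(sampled))

# Token-in, token-out: carry the sampled tokens through unchanged.
carried = list(sampled)

print(retokenized)  # ['ab']
print(carried)      # ['a', 'b']
```

In the round-trip path, the trainer would score a sequence the policy never sampled; carrying token IDs end to end avoids that.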

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

Step 3.7 Flash is a 198B sparse MoE model with ~11B active parameters, native vision, and 256k context. It achieves significant gains over its predecessor in coding benchmarks, supports Advisor Mode for cost-efficient agentic reasoning, and is released under Apache 2.0.

  • 198B MoE vision-language model with ~11B active params and 256k context window.
  • Achieves 56.26% on SWE-Bench Pro, up from 51.3%, and narrows cross-harness variance.

DDS Vibe Academy – 47 free AI coding masterclasses, built by AI agents

DDS Vibe Academy offers 47 free AI coding masterclasses, all built by AI agents. Founder Robert McCullock claims he wrote zero lines of code, only designed constraints. Courses span Foundation, Development, Application, and Mastery levels, covering Claude, Antigravity, MCP, and more.

  • 47 free AI coding masterclasses, built entirely by AI agents
  • Founder wrote no code, only designed constraints

The Download: unlocking lithium and controlling Ebola

A new extraction process using weak acid could unlock low-cost lithium from silicate minerals, potentially revolutionizing EV and energy storage materials. Meanwhile, a deadly Ebola outbreak in the DRC is proving difficult to contain, and the Pope's new encyclical calls for collective action on AI.

  • New lithium extraction method uses weak acid to dissolve silicates, freeing lithium and other valuable materials.
  • Startup Rock Zero is commercializing the technology.

BYD Launches 4nm AI Chip: On Par with NVIDIA in Process, Outperforms Tesla in Compute

BYD unveiled its first self-developed 4nm automotive-grade smart driving chip, Xuanji A3, achieving over 2100 TOPS with three chips combined. The dedicated NPU architecture offers 20% lower power per unit and 100% higher compute utilization compared to general-purpose GPUs. BYD also promises full compensation for accidents during city navigation.

  • BYD unveils fully self-developed 4nm smart driving chip Xuanji A3
  • Dedicated NPU delivers 20% lower power and 100% higher compute efficiency

New review paper argues code is how AI agents think and act, not just what they produce

A new review paper argues that the real bottleneck for autonomous AI agents is the software layer around the language model—tools, memory, testing, and permissions. DeepSeek is building a dedicated 'Harness' team in Beijing, confirming the formula: model + harness = AI agent.

  • The paper claims the bottleneck for AI agents is the software harness, not the model.
  • Key components include tools, memory, testing, and permission boundaries.

LightSail Technology Partners with Tencent Travel Services, Launches New Pre-sale Round

LightSail Technology announced a strategic partnership with Tencent Travel Services to integrate its AI full-sensing wearable device into the mobility platform. The device previously topped JD.com's bestseller list and sold out; now a new pre-sale round is open with discounts.

  • LightSail Technology and Tencent Travel Services partner to integrate AI wearable into travel services.
  • The LightSail AI wearable topped JD.com's bestseller list for 8 consecutive days and sold out.

PPIO Selected for '2026 Global AI 100' by FeiFan Research, Leading the New Wave of AI Globalization

PPIO has been named to the '2026 Global AI 100' list by FeiFan Research, recognized at the FeiFan Awards – Annual AI Globalization Summit. The list honors AI-native companies with global vision. PPIO offers a global distributed computing infrastructure, full-stack cloud services, a model platform supporting DeepSeek, GLM, MiniMax, Kimi, Qwen, and an innovative Agent Sandbox. As of April 2026, PPIO has integrated over 4,800 distributed nodes, with daily token calls exceeding 1 trillion, over 570,000 developers, and Agent Sandbox business growing more than 50x since launch. PPIO was also designated as a pilot unit for Shanghai's Digital Overseas Service Platform and a GDA Pilot Service Station.

  • PPIO selected for '2026 Global AI 100', highlighting its leadership in AI globalization.
  • Provides global distributed computing infrastructure with full GPU coverage for training and inference.

ModelBest's 'Open Source Week': A Systemic Declaration Defining the Endgame of On-Device AI

From May 25 to 29, ModelBest jointly organized an 'On-Device LLM Open Source Week' with the OpenBMB community, releasing five key technological achievements that form a full-stack closed loop: BitCPM-CANN (1.58-bit low-bit training model supporting Ascend), MiniCPM5-1B (outperforming models twice its size), ForgeTrain (AI-written training framework 10% faster than Megatron), PilotDeck (agent operating system), and UltraData (core dataset). These releases demonstrate that the on-device AI competition is a systemic engineering challenge, not a single technology race. MiniCPM5-1B surpasses parts of GPT-4o, validating the 'density law.' ModelBest's two-year lead and deep tech stack position it as a key player in the shift from cloud to edge.

  • ModelBest held an On-Device LLM Open Source Week from May 25-29, 2026, releasing one key technology each day.
  • The five releases cover training framework, model compression, data, and agent OS, showcasing systemic innovation.

5 Billion Tokens Free! World's First Commercial AI Host Launched, Unleashing Token Consumption

Lenovo has launched what it calls the world's first commercial AI host series, designed for one-person companies (OPCs) and growing enterprises. Its hybrid local-cloud architecture targets high token costs and data security concerns, and the series ships with generous token bonuses and an out-of-the-box experience.

  • Lenovo unveils three AI hosts: mini 100, 300, and Pro 700, catering from individuals to teams.
  • Local inference plus cloud elasticity reduces token costs by 70%-95%.

Zero Skill Floor, AAA Ceiling: Tencent's AI Game Creation Platform Is Wild

The next wave of AI creation is hitting gaming. Tencent has unveiled 'Project Craft', an AI-powered game creation platform that lets users generate playable games through natural language, supports 2D and 3D, and comes with AIGC tools and free assets to slash the barrier to game development.

  • Tencent launches 'Project Craft', an AI game creation platform that generates playable games from natural language prompts
  • Supports both 2D and 3D games, with a full AIGC pipeline and over 20,000 free assets

Creative Design WorkBuddy is Here! Tencent Releases AI Agent Creative Studio Miora

Tencent has released Miora, an AI-powered creative studio that integrates image, video, UI/UX, and 3D generation. It features a memory system, a multi-modal canvas, and customizable Skills, aiming to give one person the capabilities of a whole creative studio.

  • Tencent launches Miora, a creative AI agent studio
  • Supports generation of images, videos, UI/UX, and 3D content

Evidence that the first papal encyclical on AI was substantially written by AI

The article presents multiple lines of evidence, including statistical analysis of punctuation and word usage, and results from an AI detection tool, to argue that Pope Leo's first encyclical on AI contains substantial portions written by AI, likely Claude. The author acknowledges each piece of evidence might be explained away but argues the consilience is hard to dismiss.

  • The encyclical uses em-dashes and the word 'genuinely' at rates far exceeding any previous encyclical.
  • AI detection tool Pangram flagged several paragraphs as 40-100% AI-generated, while none of the backtested past encyclicals were flagged.

How to optimize your AI token usage

repo-brain is an open-source tool that compresses an entire codebase into a single Markdown context file, achieving up to 96% compression and significantly reducing AI token usage. It supports static analysis, architecture analysis, semantic relationships, and multiple AI providers.

  • Compress entire codebase into a single Markdown context file to reduce AI token usage
  • Achieved 96% compression on a 262-file repo (154,229 to 6,487 tokens)
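The 96% figure follows directly from the two token counts quoted above. A quick check, using only those numbers:

```python
tokens_before = 154_229  # full 262-file repo, as quoted
tokens_after = 6_487     # compressed Markdown context file, as quoted

reduction = 1 - tokens_after / tokens_before
ratio = tokens_before / tokens_after

print(f"{reduction:.1%}")  # 95.8%
print(f"{ratio:.1f}x")     # 23.8x
```

The reduction rounds to the headline 96%, or roughly a 24-fold smaller context.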

Reinforcement Learning is an Infrastructure Problem

This article explores the practical application of reinforcement learning in post-training large language models, highlighting that the current bottleneck is infrastructure rather than algorithms. Modal shares its experience running RL post-training at scale and introduces its open-source library to help teams address key challenges like multi-node training, environment management, and GPU utilization.

  • The bottleneck for RL post-training LLMs is infrastructure, including training engines, inference sandboxes, and environment isolation.
  • Multi-node training makes weight synchronization costly; RDMA and delta compression significantly reduce latency.
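The summary does not say how Modal's delta compression works. As a rough illustration of the general idea only, the sketch below sends just the weights that changed between two trainer steps instead of the full set. The function names, the flat list of floats, and the `tol` threshold are all assumptions for illustration; production systems operate on tensors and may encode deltas differently.

```python
def make_delta(old, new, tol=0.0):
    """Return {index: new_value} for entries that changed by more than tol."""
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if abs(n - o) > tol}

def apply_delta(old, delta):
    """Rebuild the updated weights from the old copy plus the sparse delta."""
    updated = list(old)
    for i, value in delta.items():
        updated[i] = value
    return updated

# Weights on the trainer before and after an update step (made-up values).
trainer_old = [0.10, -0.25, 0.40, 0.00, 0.75]
trainer_new = [0.10, -0.20, 0.40, 0.00, 0.80]

# Only the changed entries cross the network.
delta = make_delta(trainer_old, trainer_new)

# The inference worker still holds the old copy and patches it in place.
inference_weights = apply_delta(trainer_old, delta)

print(len(delta), "of", len(trainer_new), "entries transferred")
```

The saving depends on how sparse each update is; when most weights change every step, a scheme like this helps little without further encoding.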

Claude 4.8 Arrives: Surpasses Mythos in Some Areas, Supports Hundreds of Parallel Sub-Agents

Anthropic released Claude Opus 4.8, showing improvements in terminal engineering and knowledge work, outperforming Mythos in certain benchmarks. The model features enhanced honesty and a new Dynamic Workflows capability that orchestrates hundreds of parallel sub-agents. Early testers report significant gains in code quality and task reliability.

  • Claude Opus 4.8 was released just 43 days after 4.7, with notable gains in coding and knowledge tasks
  • Dynamic Workflows: Claude generates JavaScript orchestration scripts to coordinate hundreds of parallel sub-agents

DeepSWE: Measuring coding agents on original, long-horizon engineering tasks

DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.

  • DeepSWE is a contamination-free benchmark with original tasks.
  • Tasks span 91 repositories in 5 languages.

World Models Take Over from Language Models: Company Pioneers Physical AGI 'Dual Pyramid' System, Universal Robots Enter the 'Home Era'

Jijia Vision unveiled the world's first physical AGI 'Dual Pyramid' system, launching the home robot Shiguang S1 with 100-unit household orders, targeting the 'GPT-3 moment' of physical AGI within 12 months.

  • Jijia Vision introduces the 'Dual Pyramid' system comprising a data pyramid and an algorithm pyramid for physical AGI.
  • The Shiguang S1 home robot adopts a wheeled-arm configuration and has secured 100-unit real-home orders.

Show HN: I packaged a Python AI agent and Vue dashboard into one Electron app

Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.

  • Bundles Python runtime and hermes-agent for a zero-dependency user experience
  • Uses Electron shell with hermes-web-ui frontend

5 AI-Generated Math Papers Accepted! Post-00s Founder Hong Letong Raises $2 Billion

Axiom Math, founded by Chinese post-00s entrepreneur Hong Letong, has had 5 out of 8 AI-generated math papers accepted in peer-reviewed journals. The company raised $2 billion in March, achieving a $16 billion valuation.

  • Five of eight math papers generated by Axiom Math's AI system, AxiomProver, have been accepted by academic journals.
  • Founder Hong Letong dropped out of Stanford to start the company, which secured $2 billion in funding and is valued at $16 billion.

7B Model Beats o3 and GPT-5: Medical AI Agents Teach Models Where and How to Look

The LeapQuest team at Shanghai Innovation Institute, in collaboration with multiple universities, introduces a new medical AI paradigm that enables models to actively use visual tools during reasoning, transforming from passive input receivers to active evidence seekers. Two papers are accepted at ICML 2026.

  • LeapQuest proposes Ophiuchus and MedScope for medical images and videos, adopting the Think with Images/Videos paradigm.
  • Ophiuchus-7B achieves an average score of 68.0 on 8 VQA benchmarks, surpassing o3 (62.2) and GPT-5 (59.9).

AI Rewriting Software Industry? 8-Year-Old Builds OS, One-Person Company Lands Million-Dollar Deals

At the 2026 China AIGC Industry Summit, Baidu's Miaoda product director Zhu Guangxiang described how AI has lowered the barrier to programming from writing code to chatting. 87% of Miaoda users cannot code; an 8-year-old built an OS; one-person companies (OPCs) land million-dollar contracts. Vibe Coding turns the demand side into the supply side, enabling mass entrepreneurship.

  • Fourth programming revolution: natural language programming, massively expanding creators
  • 87% of Miaoda users have no coding skills; OPCs are the largest user group (16% entrepreneurs)

[AINews] Cognition raises $1B in $26B Series D

Cognition raises $1B at a $26B valuation, projecting >$1B ARR by year-end. The article covers inference efficiency trends, agent engineering, continual learning, new benchmarks, model releases, and coding agent productization.

  • Cognition raises $1B Series D at $26B valuation, ARR projected >$1B by EOY.
  • Inference optimization shifts to architectural level: EAGLE 3.1, DeepSeek V4-Pro hybrid attention, Xiaomi MiMo cache management.

Jensen Huang Joins Tsinghua University's Advisory Board

NVIDIA CEO Jensen Huang has accepted an invitation to join the Advisory Board of Tsinghua University's School of Economics and Management (SEM). The board, chaired by Apple CEO Tim Cook, includes Elon Musk, Satya Nadella, Mark Zuckerberg, Jack Ma, and other global leaders. Huang also recently received an honorary doctorate from Carnegie Mellon University.

  • Jensen Huang joins Tsinghua SEM Advisory Board
  • Board chaired by Apple's Tim Cook, includes top tech and business leaders

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM launch ITBench-AA, a benchmark for agentic enterprise IT tasks focusing on Site Reliability Engineering. Frontier models score below 50%, with Claude Opus 4.7 leading at 47%. The benchmark evaluates models on Kubernetes incident response, requiring diagnosis from logs and traces.

  • Claude Opus 4.7 leads at 47%, with GPT-5.5 at 46% and Qwen3.7 Max at 42%.
  • All frontier models score below 50%, making ITBench-AA one of the least saturated agentic benchmarks.

AI is an arms race, and the US wants $9 billion in Nvidia superchips to keep up

The government has secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep up with leading AI firms like Anthropic and OpenAI. The funding requires congressional approval, while $800 million has been repurposed for cloud compute. The article covers chip specs, costs, and the escalating AI hardware race.

  • The US government secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep pace with big AI players.
  • Each GB10 chip consumes only 140W but delivers 1 petaflop of FP4 performance, enabling fine-tuning of 70-billion-parameter models.

Show HN: Mneme HQ – repo-native architectural rules for AI coding agents

Mneme HQ provides architectural governance for AI-assisted development by enforcing constraints before code generation, preventing architectural drift and reducing review overhead. It integrates directly into the AI coding agent workflow, blocking banned frameworks, cross-boundary calls, and superseded decisions before they reach the PR queue.

  • Enforces architectural rules before AI agents generate code, stopping violations at the source
  • Works with major AI coding assistants and agent frameworks

South Africa Has AI Leverage. Its Draft Policy Leaves It Unused

South Africa holds 88% of global platinum-group metals, hosts Africa's largest data center market, and sits at the center of a US-China AI infrastructure contest. Yet its draft AI policy, withdrawn after hallucinated references, fails to leverage these advantages for favorable terms. The article examines South Africa's structural leverage, three possible AI infrastructure futures (Chinese, US, local open-weight), and the need for binding governance provisions.

  • South Africa's platinum metals and renewable energy give it unique AI leverage, but the draft policy lacks minimum terms for hyperscalers, data sovereignty, or tech transfer conditions.
  • US and Chinese tech companies (Microsoft, Huawei) compete for AI infrastructure control in South Africa, while the policy does not specify what South Africa demands in return.

RayNeo Launches GT Series and V4, Teases Next-Gen AI Glasses RayNeo iO

On May 27, RayNeo held a summer launch event to unveil the industry's first professional cinema-grade AR glasses, the GT series, and the latest AI shooting glasses, the V4. The GT series starts at RMB 1,899, and the V4 starts at RMB 2,199. The company also previewed its next-generation AI glasses, the RayNeo iO, expected in Q3.

  • GT series: professional cinema-grade AR glasses with 59° FOV, Dolby Vision support, 78g weight, starting at RMB 1,899.
  • V4: AI shooting glasses with 0.2s wake-up, 2.1s response, 11.5h music playback, IP67 rating, 38g weight, starting at RMB 2,199.

Peking University, CUHK, and Shanghai AI Lab Develop VGGT-Edit: 3D Scene Editing in 5 Seconds with 120x Speedup

Researchers from Peking University, The Chinese University of Hong Kong, Shanghai AI Lab, and NTU have introduced VGGT-Edit, a native 3D editing framework that performs scene editing in approximately 5 seconds, achieving up to 120x acceleration over traditional methods. It outperforms existing approaches in semantic consistency, multi-view stability, and inference speed.

  • VGGT-Edit is the first native 3D editing framework that operates directly in 3D space, eliminating multi-view inconsistencies caused by 2D approaches.
  • Residual field prediction enables the model to modify only local changes while keeping the background stable, ensuring fast and high-quality edits.

Last Week in AI #341 - Musk loses to OpenAI, Google's IO updates, OpenAI solves Erdős

This week's top AI news includes Elon Musk losing his $150 billion lawsuit against OpenAI, Google unveiling major AI updates at I/O 2026, OpenAI's AI solving an 80-year-old math problem, the Take It Down Act enforcement, and SpaceX planning to acquire coding startup Cursor after its IPO.

  • Elon Musk's $150B lawsuit against OpenAI dismissed; OpenAI prepares for IPO.
  • Google I/O 2026 introduces Gemini 3.5 Flash, Gemini Spark AI agent, Gemini Omni, and more.

The Download: puncturing the AI jobs panic

Despite growing hysteria over AI's threat to white-collar jobs, data shows the technology has not yet had a large-scale impact on the labor market. AI-exposed occupations have lower unemployment than less-exposed ones. However, a Stanford study found that AI may be quietly eroding entry-level positions, causing a sharp decline in employment for young workers in AI-exposed jobs. The article also covers other tech news including the Pope's call for AI regulation, SpaceX's launch, and Huawei's chip breakthrough.

  • AI has not caused mass unemployment but may be weakening entry-level jobs.
  • Stanford study shows sharp decline in employment for young workers in AI-exposed occupations.

Show HN: Mirdel – a local-first AI workspace with UI-based agent workflows

Mirdel is a local-first desktop AI workspace that unifies conversations, knowledge bases, notes, translation, image/video processing, local models, and extensible workflows into a long-running environment. It emphasizes data privacy and user control, supporting multiple cloud and local models, and enables workflow modularization and reuse through Applets, Skills, and MCP.

  • Local-first: data, models, and configuration stored locally by default; sensitive fields encrypted.
  • Modular workbench: separate but context-sharing modules for chat, knowledge base, notes, translation, image and video processing.

[AINews] New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)

AI infrastructure startups Fireworks, Baseten, and OpenRouter are raising massive rounds, signaling the rise of inference infrastructure as a key AI platform layer. Meanwhile, agent harness engineering, new benchmarks, and model updates dominate the AI news cycle.

  • Fireworks ($15B), Baseten ($11B), and OpenRouter ($113M) lead a wave of inference infrastructure funding.
  • Agent harness engineering becomes the main differentiator for coding agents.

DeepSeek Researcher Develops Automated Research Skill: Writing a Paper with Only 2 Hours of Human Brain Time

DeepSeek researcher Chen Deli used his self-developed DeliAutoResearch skill, collaborating with DeepSeek-V4-Pro and GPT-Image2, to complete a 46-page paper in just 6 days. The paper introduces an L1-L5 autonomy classification for research agents, analyzes four architectural patterns and 17 mainstream systems, and identifies six open problems. Chen Deli says only about 2 hours of human 'CPU time' were needed, with the rest handled by AI agents.

  • Chen Deli's DeliAutoResearch skill enabled the paper to be 99% written by AI agents.
  • The paper proposes an L1-L5 autonomy classification for research agents, analogous to SAE levels for autonomous driving.

AI Weekly Issue #496: Anthropic's Pentagon model is now everyone's model

Anthropic released its formerly classified Mythos model to the public, collapsing the gap between sovereign and developer AI. DeepMind's Demis Hassabis moved AGI timeline to 2029. Critical vulnerabilities in Starlette impacted millions of AI agents, and a coordinated takedown dismantled the Glassworm botnet. BNP Paribas partnered with Mistral for sovereign AI security, while China restricted travel for top AI engineers at Alibaba and DeepSeek. Corporate AI spending and layoffs made headlines: Uber burned its full-year AI budget by April, ClickUp restructured with a 3:1 AI-to-human ratio, and Sam Altman reversed his white-collar apocalypse prediction. However, MIT Technology Review data showed AI-exposed roles have lower unemployment.

  • Anthropic releases Mythos, previously limited to government contractors, now available via standard API.
  • DeepMind CEO Hassabis advances AGI timeline to 2029, citing AlphaProof Nexus solving nine Erdős problems cheaply.

Some ideas for what comes next, May 2026

AI progress continues to accelerate in 2026: open models lag in agentic capabilities, Google's Gemini is not yet competitive with Claude Code or Codex, American open models are on the rise, Anthropic and OpenAI are locked in fierce competition, and power structures are asserting control.

  • Open models are 5-6 months behind in agentic capabilities, likely extending to 12+ months.
  • Google's Gemini lacks a clear competitor to Claude Code and Codex.

Introducing DSA Attention to Multimodal: Kuaishou Keye 2.0 Opens a New Paradigm of Enhanced Reasoning

Kuaishou has released Keye-VL-2.0-30B-A3B, a multimodal large language model that is the first to apply DeepSeek Sparse Attention (DSA) to multimodal scenarios, enabling deep perception over a 256K ultra-long context. It achieves SOTA on long-video temporal understanding benchmarks and introduces built-in Agent collaboration, paving the way for enhanced reasoning and real-world business applications.

  • First to integrate DSA attention into a multimodal model, addressing long-video understanding bottlenecks.
  • Achieves SOTA on TimeLens, LongVideoBench, MLVU; reverses long-context decay by boosting accuracy from 35.34% to 42.44% when scaling from 64 to 512 frames.

BODHI: Precise OS Kernel Specification Inference

Researchers propose BODHI, a domain-knowledge prompting method that significantly improves LLM performance in generating formal OS kernel specifications. On the OSV-Bench benchmark, BODHI with Claude Opus 4.6 achieves 96.73% Pass@1, substantially surpassing previous best results.

  • BODHI augments few-shot prompts with a structured C-to-Python translation guide covering 15 domain-specific patterns.
  • It improves Pass@1 from 55.10% to 96.73% on OSV-Bench with 245 tasks.
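The two Pass@1 percentages can be translated into task counts, assuming each of the 245 tasks counts once toward Pass@1 (the summary does not state the counts, so they are inferred here):

```python
tasks = 245

# Implied number of tasks solved on the first attempt (inferred, not quoted).
baseline_solved = round(tasks * 0.5510)
bodhi_solved = round(tasks * 0.9673)

print(baseline_solved, f"{baseline_solved / tasks:.2%}")  # 135 55.10%
print(bodhi_solved, f"{bodhi_solved / tasks:.2%}")        # 237 96.73%
print(tasks - bodhi_solved)                                # 8
```

Under that assumption, the reported figures correspond to 135 and 237 solved tasks, leaving 8 unsolved with BODHI.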

Cited AI Workspace: No More Re-Uploading Files

UUMuse is a cloud AI knowledge base platform where you upload files once and use them across GPT, Claude, DeepSeek, Qwen, and more — with cited answers, persistent memory, agent mode, a multi-expert debate feature (Spark), and flexible deployment as docs sites, APIs, or MCP servers.

  • Upload files once and query multiple AI models (GPT, Claude, DeepSeek, Qwen) with source citations.
  • Persistent memory remembers your writing style and project context across conversations.

ContextVault – Local-First AI Conversation Recorder for ChatGPT, Claude, Gemini

ContextVault is a browser extension that captures AI conversations in real-time across major LLM platforms like ChatGPT, Claude, and Gemini, storing them locally in IndexedDB. It allows one-click export as Markdown or ZIP, ensuring your data never leaves your device. Free, open source, no accounts or backend required.

  • Real-time capture across 7 LLM platforms including ChatGPT, Claude, and Gemini.
  • All data stored locally in IndexedDB, no cloud sync or third-party access.

Show HN: HTML Deployer – AI Code to Website Publisher

HTML Deployer is a Chrome extension that extracts AI-generated HTML from ChatGPT, Claude, and Gemini, allowing users to preview, download ZIP, or publish directly to Netlify, GitHub, FTP, or self-hosted servers. It's designed for developers, founders, marketers, agencies, and beginners.

  • Extract HTML from ChatGPT, Claude, and Gemini.
  • Preview, export ZIP, or publish directly to cloud, FTP, or self-hosted.

DeepSeek V4 Gets Even Cheaper: New Tool Boasts 99.82% Cache Hit Rate, Slashes Bills to 20%

One month after DeepSeek V4's release, the open-source community unveiled Reasonix, a tool specifically designed to minimize API costs by maximizing cache efficiency. It achieves a staggering 99.82% cache hit rate, reducing a $61 bill for 400M+ tokens to just $12.

  • Reasonix is a dedicated coding harness for DeepSeek, focusing on cost reduction.
  • Its cache-first loop, tool-call repair, and automatic context compression maintain over 90% cache hit rate in long sessions.
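The "20%" in the headline matches the quoted bills: $12 is about a fifth of $61. The sketch below checks that, then shows why the hit rate matters, using a simple blended-price model. The per-million-token prices are placeholders chosen for illustration, not DeepSeek's actual rates, and `blended_cost` is a hypothetical helper that covers input tokens only, so it will not reproduce the article's dollar figures.

```python
# Ratio of the quoted bills.
bill_ratio = 12 / 61
print(f"{bill_ratio:.1%}")  # 19.7%

def blended_cost(tokens, hit_rate, hit_price, miss_price):
    """Input cost in dollars; prices are per million tokens."""
    hits = tokens * hit_rate
    misses = tokens - hits
    return (hits * hit_price + misses * miss_price) / 1_000_000

# Placeholder prices (assumptions, not real rates).
HIT_PRICE = 0.1   # dollars per million cached input tokens
MISS_PRICE = 1.0  # dollars per million uncached input tokens

tokens = 400_000_000
no_cache = blended_cost(tokens, 0.0, HIT_PRICE, MISS_PRICE)
with_cache = blended_cost(tokens, 0.9982, HIT_PRICE, MISS_PRICE)

print(f"cached run costs {with_cache / no_cache:.1%} of the uncached run")
```

Under this model, the saving approaches the ratio of the two prices as the hit rate nears 100%, which is why a harness built around keeping the prompt prefix stable can cut bills sharply.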

Claude's Pass Rate Below 4%: SaaS-Bench Shatters the 'Fully Automated Office' Illusion of Computer-Use

UniPat AI releases SaaS-Bench, a benchmark evaluating mainstream large models on real office tasks. The highest full pass rate is only 3.8%, revealing that AI-powered fully automated offices are far from reality.

  • SaaS-Bench evaluation shows the best model, Claude Opus 4.7, achieves a full pass rate of only 3.8%.
  • 93.4% of tasks span at least two applications, and 97.3% of text tasks involve over 100 steps.

Lynote Humanize Text – Open-source AI text humanization toolkit

Lynote Humanize Text is an open-source toolkit for humanizing AI-generated text, featuring a production-grade Standard Pipeline that uses multi-step LLM rewriting and cross-engine translation to bypass AI detectors like Turnitin and GPTZero. It offers three tiers of humanization with the Lynote.ai platform providing intelligent selection. The repository includes reference implementations, n8n workflow support, and achieved a 9.1/10 expert quality score with 100% key information retention.

  • Open-source toolkit to convert AI text into human-like writing, bypassing major AI detectors.
  • Production-ready Standard Pipeline uses a 5-step chain involving DeepSeek rewrites and multi-engine translation.

Future Inference Will Consume 70% of Compute, Leaving 30% for Training | Silicon Valley Investor Zhang Lu at AIGC2026

At the 2026 China AIGC Industry Summit, Zhang Lu, Founding Partner of Fusion Fund, highlighted that the focus of AI compute demand is shifting from training to inference, with inference expected to account for 70% of compute. Communication in data centers may consume 100 times more electricity than computation, making technologies like optical communication critical. The biggest bottleneck for physical AI is the scarcity of high-quality real-world data. Healthcare, space, and nanorobots are the three most promising application directions.

  • Inference compute share will rise from 50% to 70%, becoming the core optimization target for AI infrastructure.
  • Communication in data centers can consume over 100 times more electricity than computation, driving innovations like optical communication.
