2026-06-01 09:46 UTC | Updated: 2026-06-30 13:03 UTC

Nearly $200M! VAST Completes New Funding Round, Officially Reveals World Model Roadmap

General AI company VAST has announced nearly $200 million in new funding and publicly unveiled its world model roadmap, Project Eden. The approach decouples state prediction from visual rendering, enabling persistent environments, scene reuse, and native multi-player interaction, targeting both creators and embodied AI research.

Source: 量子位 | Author: 听雨

Chinese AI company VAST has announced two major milestones: the completion of nearly $200 million in new funding and the public disclosure of its world model roadmap, Project Eden.

The funding rounds, totaling nearly $200 million, include a Series A+ and a Series A++ round led by Ince Capital and CLSC Yangtze River Delta Science and Innovation Fund. Other investors include Shenzhen AI Terminal Industry Fund, Shanghai Semiconductor Investment, Shenzhen Capital Group, Yuan Sheng Capital, Wofu Venture Capital, and Fangguang Capital. Existing shareholders such as Chunhua Venture Capital, Jingya Capital, BV Baidu Ventures, and Dongfang Jiafu also increased their stakes. This follows VAST's $50 million Series A round in March 2026.

Project Eden takes a different approach to world models. Unlike conventional methods such as action-conditioned video generation or static 3D scene generation, Eden natively decouples underlying state reasoning from visual presentation. The system is structured into three layers:

A structured state layer that maintains a persistent global world state across time, independent of camera viewpoint.
A conditional interface layer that transforms the 3D state into semantic and geometric constraints for specific viewpoints.
A generative rendering layer that completes textures, lighting, materials, and dynamic details based on the constraints.

This decoupling reduces the model's burden: the state model focuses solely on reasoning "what happened," while the rendering model handles "how it looks." VAST's chief scientist Cao Yanpei explains that coupling both tasks exponentially increases difficulty.
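The three-layer split can be illustrated with a toy sketch. This is not VAST's implementation: every name here (`WorldState`, `Camera`, `build_constraints`, `render`) is hypothetical, the "camera" is just a sphere of visibility, and the rendering layer is a string formatter standing in for a generative model. The only point is that the state advances without reference to any viewpoint, and each view is derived from it afterwards.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """One object in the world; position is (x, y, z) in world coordinates."""
    name: str
    position: tuple[float, float, float]
    velocity: tuple[float, float, float] = (0.0, 0.0, 0.0)


@dataclass
class WorldState:
    """State layer (hypothetical): persistent, global, camera-independent."""
    entities: dict[str, Entity] = field(default_factory=dict)
    time: float = 0.0

    def step(self, dt: float) -> None:
        # "What happened": advance every entity, whether or not anyone sees it.
        for e in self.entities.values():
            e.position = tuple(p + v * dt for p, v in zip(e.position, e.velocity))
        self.time += dt


@dataclass
class Camera:
    """A toy viewpoint that sees everything within `radius` of `center`."""
    center: tuple[float, float, float]
    radius: float


def build_constraints(state: WorldState, cam: Camera) -> list[dict]:
    """Interface layer (hypothetical): turn global state into per-view constraints."""
    constraints = []
    for e in state.entities.values():
        offset = tuple(p - c for p, c in zip(e.position, cam.center))
        distance = sum(d * d for d in offset) ** 0.5
        if distance <= cam.radius:
            constraints.append({"name": e.name, "offset": offset, "distance": distance})
    return constraints


def render(constraints: list[dict]) -> str:
    """Rendering layer stand-in: "how it looks", given only the constraints."""
    if not constraints:
        return "<empty view>"
    ordered = sorted(constraints, key=lambda c: c["distance"])
    return ", ".join(f"{c['name']}@{c['distance']:.1f}" for c in ordered)


if __name__ == "__main__":
    state = WorldState()
    state.entities["ball"] = Entity("ball", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    near_origin = Camera(center=(0.0, 0.0, 0.0), radius=2.0)

    print(render(build_constraints(state, near_origin)))
    state.step(5.0)  # the ball rolls out of this camera's range
    print(render(build_constraints(state, near_origin)))
    print(state.entities["ball"].position)  # still tracked in the state layer
```

In this sketch, `render` never touches `WorldState` directly, so the state logic and the rendering stand-in can be changed independently, which is the property the decoupling is meant to provide.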

To train the system, VAST employs a two-tier data strategy. The first tier, L1, uses self-annotated internet videos, extracting depth, camera pose, and geometric trajectories to create dual-state data. The second tier, L2, uses engine-synthesized data: AI agents explore game environments around the clock, producing precise 3D-state annotations.

The decoupled architecture enables three key capabilities:

Persistent environments: Objects continue to exist and evolve even when out of view.
Scene reuse and modularity: Users can repeatedly interact with and modify the same world state, with changes persisting.
Native multi-player interaction: A single underlying world can support numerous users and AI agents concurrently, with linear scaling of compute costs.
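A rough way to picture the persistence and multi-user properties listed above is a single shared state that is updated once per tick and then viewed by each participant. The sketch below is an assumption-laden toy, not a description of Eden: `SharedWorld` and its counters are hypothetical, and the article does not say how VAST measures compute cost. Here the cost is simply counted as one state update per tick plus one render per viewer, which grows linearly with the number of viewers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedWorld:
    """Hypothetical single world state shared by every user and agent."""
    objects: dict[str, str] = field(default_factory=dict)  # object name -> description
    tick: int = 0
    state_updates: int = 0  # counts state-layer work
    view_renders: int = 0   # counts rendering-layer work

    def apply_edit(self, user: str, name: str, description: str) -> None:
        # Edits are written into the shared state, so they persist for everyone.
        self.objects[name] = f"{description} (by {user})"

    def advance(self, viewers: list[str]) -> dict[str, str]:
        # The state is updated once per tick, however many viewers there are.
        self.tick += 1
        self.state_updates += 1
        scene = "; ".join(sorted(self.objects.values()))
        frames = {}
        for viewer in viewers:
            # One render per viewer: the only per-user cost in this sketch.
            self.view_renders += 1
            frames[viewer] = f"tick {self.tick}: {scene}"
        return frames


if __name__ == "__main__":
    world = SharedWorld()
    world.apply_edit("alice", "tree", "oak tree")
    frames = world.advance(["alice", "bob", "agent-1"])
    print(frames["bob"])  # bob sees alice's edit
    print(world.state_updates, world.view_renders)
```

Because the edit lives in the shared state rather than in any one user's view, a viewer who joins on a later tick still sees it.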

These capabilities open two main application directions:

For creators, an AI-native sandbox platform enabling natural language or simple actions to build shareable, persistent digital worlds.
For research, a high-quality simulation base for embodied AI, supporting multi-agent training and collaborative studies.

VAST's world model work builds on its foundation in 3D generation. The company's Tripo series of 3D generation models is among the most widely used globally, with over 20 million creators on Tripo Studio. Its ecosystem covers major Chinese tech companies such as Alibaba, Tencent, ByteDance, and NetEase. Recent advances include Tripo H3.1 with sculpt-level geometric detail, Tripo P1.0 for production-ready meshes in seconds, and Tripo 8K textures that reduce work from days to minutes.

VAST's founders include CEO Song Yachen, an entrepreneur born in 1997 who previously worked at SenseTime and co-founded MiniMax; CTO Liang Ding, a Tsinghua PhD with extensive AI experience; and Chief Scientist Cao Yanpei, also a Tsinghua PhD, who leads major open-source 3D projects.

Looking ahead, VAST acknowledges challenges in more complex physics simulation and autonomous state maintenance. The ultimate goal is a fully self-supervised world model that updates its underlying state based on agent interactions without external annotations. While the road to a true world model is still long, VAST has charted a unique direction from the start.