The AI revolution will not be televised — it’ll be quantized
Chinese AI labs are pioneering quantization and open-weight models, making frontier AI accessible and cost-effective. Experts discuss how this shifts power from proprietary US models to local, customizable solutions, potentially commoditizing raw model intelligence.
With apologies to Gil Scott-Heron and his timeless 1971 protest song, if anyone thought the AI revolution would not be quantized, they might be wrong. A cultural shift is underway in China, where quantization is indeed driving change.
Quantization: The compression of AI model weights
Quantization compresses AI model weights to a lower numerical precision, making models smaller and cheaper to run. It goes hand in hand with open-weight access, where a model’s trained parameters are published so that developers can customize the model and run it locally or on their chosen cloud.
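To make the idea concrete, here is a minimal sketch of one common scheme, symmetric 8-bit quantization with a single shared scale. It is an illustration only, not the method any particular lab uses; the function names and the weight values are made up for the example.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # avoid dividing by zero
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.42, -1.27, 0.003, 0.88, -0.15]  # made-up values for illustration
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code costs at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now fits in one byte instead of two or four, at the cost of a small rounding error. Production schemes typically use per-group scales and lower bit widths, but the trade-off is the same.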
According to RQR Intelligence, “The massive advantage of the Chinese AI ecosystem is its unwavering commitment to open weights.”
Software engineers can download the weights (the precise numerical values learned during model training) of models such as Qwen, Xiaomi’s MiMo, or DeepSeek V4 Pro, put them through a quantization process, and then run and host them locally on their own machine (or choice of cloud service) to achieve frontier-level intelligence.
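A back-of-the-envelope calculation shows why quantization matters for running a model locally. The sketch below estimates the size of the weights alone at different precisions. The 27-billion-parameter count is an illustrative assumption (it matches the size of the Qwen 3.6 27B model mentioned in this article), and the estimate ignores activations, the key-value cache, and the small overhead of storing quantization scales.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 27e9  # illustrative parameter count, an assumption for this example

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_footprint_gb(PARAMS, bits):.1f} GB")
# 16-bit: 54.0 GB
#  8-bit: 27.0 GB
#  4-bit: 13.5 GB
```

At 16-bit precision the weights exceed the memory of most single consumer GPUs; at 4-bit they come within reach of a well-equipped workstation or laptop.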
Principal Engineer at AI code verification and governance company Sonar, Gautam Korlam, tells The New Stack that the biggest advantage of Chinese frontier models is not just another benchmark gain. This is a power play from a different perspective.
“With these Chinese frontier models, developers can inspect them, fine-tune them, run them locally, and integrate them into workflows that are difficult to achieve through API-only deployments. That gives teams more control over cost and intelligence,” Korlam says.
Useful tools, not autonomous engineers
He expands on this, noting that Chinese frontier models like Z.AI, Qwen, GLM, and DeepSeek have become what he views as “practical tools” for software development today.
“They’re well suited for test generation, refactoring, repo analysis, documentation, and first-pass debugging. The caveat is that they still need verification. They’re useful engineering tools, but they’re not autonomous senior engineers,” Korlam confirms.
The revolution against the established, largely proprietary and closed-weight frontier models (Anthropic’s Claude, OpenAI’s GPT-5.5, Google’s Gemini 3 Pro, Meta’s Llama, Mistral, and so on) stems, in part if not wholly, from a strategic response to US export controls on GPU hardware.
These constraints have driven Chinese AI labs to innovate on efficiency. According to Index.dev, models such as Alibaba Cloud’s Qwen achieve efficiency through the sparse model approach, activating only a subset of parameters during inference.
“Unlike traditional AI models that activate all parameters at once, Qwen3-Max only uses the relevant parts for a given task. This makes it about 30% more efficient on inference, meaning it delivers high performance without burning through computing power,” noted the portal.
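One widely used way to activate only a subset of parameters is top-k expert routing, in which a router scores several sub-networks ("experts") and only the best few run for a given input. The sketch below is a toy illustration of that general idea. It is an assumption about how such sparsity can be realized, not a description of Qwen’s internals, and the experts, scores, and function names are hypothetical.

```python
import math


def top_k_route(router_scores, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    exps = {i: math.exp(router_scores[i]) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sparse_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    routing = top_k_route(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routing.items())
    return output, routing


# Four toy "experts", each just scaling its input by a different factor.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one input

output, routing = sparse_forward(3.0, experts, scores, k=2)
print(sorted(routing), output)  # experts 1 and 3 run; the other two are skipped
```

Here only two of the four experts do any work, so roughly half of the expert parameters are touched for this input. Real models apply the same principle with far more experts and learned router scores.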
Frontier AI is no longer a three-horse race
Piotr Migdał, founding engineer at agentic AI evaluation and training company Quesma, tells The New Stack that “things have turned out interesting” with the release of GLM 5.2 by Z.ai, a Chinese model at the frontier level.
He reasons that this development, in particular, means the AI race is “no longer a US-only affair” involving the three usual suspects: OpenAI, Anthropic, and Google. Alongside Z.ai’s GLM, Migdał also points to Qwen 3.6 27B, which he thinks is the sweet spot for local development today.
“While the race is and will be fierce, we can expect more Chinese models to be right in the lead,” Migdał says. “GLM 5.2, unlike proprietary models, can be fine-tuned, tweaked to one’s needs to improve its performance on specific tasks, or to remove limitations. This makes it a double-edged sword. It is a blessing for companies and developers, fostering business and open source because there is no longer an API tap controlled by an oligopoly. At the same time, it means that the tools can be used by any party – state or private – for defense or offense.”
With Chinese frontier AI models being widely benchmarked, commented on, and rarely castigated or berated for hallucination across developer comment boards, the next inflection point may see a new degree of standardization, transparency, and, dare we say, commoditization?
The specter of model commoditization
James McGibney, partner at OC&C Strategy Consultants, tells The New Stack that this is exactly what might be happening.
“Arguably, raw model intelligence is already starting to commoditize, and the emergence of cheaper Chinese open-weight and quantized models will accelerate that shift,” McGibney says.
He thinks that the result of this shift could be enterprises increasingly choosing models on a case-by-case or application-by-application basis.
“If and when this commoditization blossoms, it will further push frontier AI companies — Chinese ones and, for that matter, US ones as well — to move up the stack and so serve to encourage players in this market to monetize the software, workflow integration, governance and implementation layers that make AI reliable and valuable in real business settings.”
Coming full circle then, when Gil Scott-Heron told us that the revolution will not be televised, he meant that real change is internal, that it won’t be sponsored or commercialized, and that nobody should sit back and be a spectator.
If he were around today (and if he had developed an interest in the growth of Chinese frontier models), he might apply the same scrutiny to the power structures he targeted back in the seventies and then agree that the AI model revolution may well be quantized. Perhaps the only difference (given the presence of software developers) is that this revolution will almost certainly go better with Coke.
The post The AI revolution will not be televised — it’ll be quantized appeared first on The New Stack.