Tokens Are Expensive Because You Feed the Model Too Much Junk | @Wang Xiaoye at AIGC2026
At the 2026 China AIGC Industry Summit, Wang Xiaoye, Technical Director of Amazon Web Services, pointed out that 87% of enterprises claim to have deployed AI at scale, but only 10% have gained real production value. He emphasized that enterprise-grade agent deployment must bridge four major gaps: model selection, build complexity, the barrier to use for business users, and a talent shortage. He introduced AWS's five-layer architecture (compute, model, data, harness platform, and agent applications) and products such as Quick that help enterprises move from demo to production.
At the 2026 China AIGC Industry Summit hosted by QuantumBit, Wang Xiaoye, Technical Director of Amazon Web Services, delivered a hard-hitting talk on the real challenges of deploying AI agents in enterprise environments. He began with a striking statistic: while 87% of enterprises claim to have deployed AI at scale, a mere 10% actually derive production value from it.
Wang argued that the difficulty lies not in building demos but in making AI run reliably in production. Running a personal agent on a Mac Mini that can be rebooted at will is worlds apart from orchestrating thousands of agents across a distributed enterprise environment while guaranteeing security, trust, and zero downtime. He identified four major gaps that enterprises must cross: model selection and response speed, the complexity of building agents, ease of use for business users, and the shortage of talent for end-to-end agent deployment.
Drawing on AWS's experience serving millions of global customers, Wang introduced a five-layer framework to bridge the gap from demo to production:
- Compute: Optimized for inference, leveraging custom chips like Graviton and Trainium to deliver best-in-class price-performance for different agent workloads.
- Model: Choice over lock-in. Amazon Bedrock supports multiple models, including Chinese ones such as GLM and MiniMax, with enterprise-grade data protection.
- Data and Knowledge: A shift from data platforms built to serve humans to platforms ready for AI agents. Challenges include memory sharing and isolation, memory lifecycle management, and token efficiency. High token bills often come from feeding the model irrelevant information rather than from high unit prices.
- Harness Platform: Beyond the model, the "harness" (like Bedrock AgentCore) provides production-level capabilities: automatic scaling, memory management, identity integration, policy enforcement, and observability. Wang compared the model to a CPU and the harness to the operating system and applications that make a computer usable.
- Agent Applications: Ready-to-use agents, such as coding agents and workplace agents (e.g., Amazon Quick), that break down silos and act as personal assistants, accumulating a personal knowledge graph over time.
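The token-efficiency point in the Data and Knowledge layer can be illustrated with a small sketch. It is not taken from Wang's talk or from any AWS API: the `Chunk` type, the word-count token estimate, the keyword-overlap relevance score, and the `min_score` threshold are all hypothetical stand-ins for a real tokenizer and retriever. The idea is only that filtering context before the model call cuts the number of tokens billed, whatever the unit price.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical label for where the text came from
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # A real system would use the model's own tokenizer.
    return len(text.split())


def relevance(query: str, chunk: Chunk) -> float:
    # Toy lexical score: fraction of query words found in the chunk.
    query_words = set(query.lower().split())
    chunk_words = set(chunk.text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def select_context(query, chunks, token_budget, min_score=0.3):
    """Keep only relevant chunks, best first, within a token budget."""
    scored = [(relevance(query, c), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected, used = [], 0
    for _, chunk in scored:
        cost = estimate_tokens(chunk.text)
        if used + cost > token_budget:
            continue  # skip chunks that would exceed the budget
        selected.append(chunk)
        used += cost
    return selected, used


# Illustrative data only; none of it comes from the talk.
query = "refund policy for enterprise orders"
chunks = [
    Chunk("policy", "The refund policy for enterprise orders allows returns within 30 days"),
    Chunk("menu", "The cafeteria menu this week includes noodles and dumplings"),
    Chunk("holiday", "Office holiday schedule for the spring festival is posted in the lobby"),
    Chunk("orders", "Enterprise orders are invoiced monthly"),
]

tokens_if_everything_sent = sum(estimate_tokens(c.text) for c in chunks)
selected, tokens_after_filtering = select_context(query, chunks, token_budget=20)

print("sent unfiltered:", tokens_if_everything_sent)
print("sent after filtering:", tokens_after_filtering)
print("kept sources:", [c.source for c in selected])
```

In this toy data, the cafeteria and holiday chunks never reach the model, so the prompt shrinks without any change in per-token pricing. A production system would replace the keyword overlap with embedding-based retrieval or a reranker, but the cost argument is the same.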
Wang highlighted AWS's partnership with OpenAI to offer managed agents combining OpenAI's frontier models and best practices with AWS's security infrastructure. He concluded by emphasizing that every application will be rebuilt in the age of AI, and encouraged enterprises to embrace the agent era with a solid production-ready foundation.