2026-06-02 16:44 UTC | Updated: 2026-06-30 13:03 UTC

Microsoft debuts Surface RTX Spark Dev Box to run LLMs without cloud costs

Microsoft unveiled the Surface RTX Spark Dev Box at Build 2026, a compact desktop with Nvidia's Blackwell-architecture RTX Spark processor and 128GB of unified memory, rated by Nvidia at 1 petaflop of AI compute. It lets developers run models with more than 120 billion parameters locally, challenging the per-token cloud pricing model.

Source: Hacker News AI | Author: theanonymousone

Microsoft on Monday unveiled the Surface RTX Spark Dev Box, a compact desktop computer designed to let software developers run large AI models on their desks instead of paying for cloud computing — a move that directly challenges the per-token pricing model that has defined the AI industry's economics since ChatGPT launched three and a half years ago.

The device, announced at Microsoft Build 2026, packs Nvidia’s new Blackwell-architecture RTX Spark processor and 128 gigabytes of unified memory into a small-form-factor chassis, delivering what Nvidia rates at one petaflop of AI compute. In practical terms, that means a developer can load, run and interact with AI models exceeding 120 billion parameters without sending a single API call to the cloud.
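To put the 120-billion-parameter figure in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The precisions shown are assumptions for illustration; the article does not say which quantization Microsoft or Nvidia have in mind, and the function name is hypothetical.

```python
UNIFIED_MEMORY_GB = 128  # memory pool size reported in the article


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes.

    Ignores activations, the key-value cache, and runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumed precisions for illustration: 16-bit, 8-bit and 4-bit weights.
for bits in (16, 8, 4):
    gb = weight_footprint_gb(120e9, bits)
    verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.0f} GB ({verdict} in {UNIFIED_MEMORY_GB} GB)")
```

Under these assumptions, a 120-billion-parameter model needs about 240 GB at 16-bit precision, 120 GB at 8-bit and 60 GB at 4-bit, so running it within 128 gigabytes would likely depend on aggressive quantization.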

"These class of devices, we think, will get to about 100 billion parameter model running," Pavan Davuluri, Microsoft's executive vice president of Windows and Devices, said during a press briefing ahead of the event. He emphasized that raw model size is only part of the equation: "The model size is one thing, but for the model to be effective, it kind of needs to be able to have enough context, because a larger model, you feed it larger context." At 100,000 tokens of context, he noted, the key-value cache alone can consume 40 to 50 gigabytes of memory — which is precisely why Microsoft and Nvidia engineered the device around a 128-gigabyte unified memory pool shared dynamically between the CPU and GPU.
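The key-value cache figure can be sanity-checked with the standard sizing arithmetic for transformer models: one key and one value vector per layer, per attention head, per token. The sketch below uses a hypothetical model configuration (96 layers, 8 key-value heads, head dimension 128, 16-bit values); none of these numbers come from Microsoft or Nvidia, and they were chosen only to show how a 100,000-token context can reach tens of gigabytes.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # assumes 16-bit cache entries
) -> int:
    """Approximate key-value cache size for a single sequence, in bytes."""
    # The leading 2 accounts for storing both a key and a value per position.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


# Hypothetical configuration, not taken from the article.
size = kv_cache_bytes(
    num_layers=96,
    num_kv_heads=8,
    head_dim=128,
    context_tokens=100_000,
)
print(f"KV cache at 100,000 tokens: {size / 1e9:.1f} GB")
```

With this assumed configuration the cache comes to roughly 39 GB, close to the low end of the 40-to-50-gigabyte range Davuluri cited. Models with more key-value heads or more layers would need proportionally more.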