2026-06-01 15:00 UTC | In-site rewrite | 4 min read | Updated: 2026-06-30 13:03 UTC

New Server Hopes to Break Through AI’s “Memory Wall”

AI hardware startup Majestic Labs is developing a new AI server, Prometheus, with up to 128 terabytes of memory, over 60 times more than Nvidia's DGX B300. It uses a DRAM-centric architecture with a proprietary copper-cable memory interface and custom memory aggregation chips, delivering up to 25.6 TB/s bandwidth. The server features 12 Ignite AI processors combining ARM and RISC-V cores, and supports PyTorch, vLLM, and Triton frameworks without code modifications. Expected to ship in 2027, it claims to reduce capital expenditure and power consumption by 10 to 50 times.

Source: IEEE Spectrum AI | Author: Matthew S. Smith



Memory is arguably the most serious constraint on today’s large language models (LLMs). According to one influential paper, LLM token generation is inherently memory-bound: the rate at which a model outputs text is limited by how quickly data can be read from memory rather than by how quickly it can be processed. The bottleneck grows more severe as models get larger. The result is a “memory wall” that holds back LLM inference performance.
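To make the “memory-bound” idea concrete, here is a back-of-the-envelope sketch. It is not taken from the article or from the paper it cites, and it assumes single-stream decoding (batch size 1) in which every model weight is read from memory once per generated token, ignoring KV-cache traffic and compute time. Under those assumptions, memory bandwidth divided by the size of the weights gives a ceiling on tokens per second. The 70-billion-parameter model, the 1 TB/s comparison point, and the function names are all hypothetical; only the 25.6 TB/s figure comes from the article.

```python
def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Bytes needed to store the model weights."""
    return num_params * bytes_per_param


def max_decode_tokens_per_second(num_params: float, bytes_per_param: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    Assumes every weight is streamed from memory once per generated token
    (batch size 1) and ignores KV-cache traffic and compute time.
    """
    return bandwidth_bytes_per_s / weight_bytes(num_params, bytes_per_param)


def arithmetic_intensity(bytes_per_param: float) -> float:
    """Approximate FLOPs performed per byte read during batch-1 decoding.

    Each weight takes part in roughly one multiply and one add (2 FLOPs)
    per token, so the intensity is about 2 / bytes_per_param.
    """
    return 2.0 / bytes_per_param


if __name__ == "__main__":
    # Hypothetical 70B-parameter model stored in 16-bit precision (2 bytes each).
    params, bpp = 70e9, 2
    # 1 TB/s is an arbitrary comparison point; 25.6 TB/s is the article's figure.
    for label, bw in [("1 TB/s", 1e12), ("25.6 TB/s", 25.6e12)]:
        tps = max_decode_tokens_per_second(params, bpp, bw)
        print(f"{label}: at most {tps:.1f} tokens/s")
    print(f"~{arithmetic_intensity(bpp):.1f} FLOP per byte read")
```

With these assumed numbers the ceiling is roughly 7 tokens per second at 1 TB/s and roughly 183 tokens per second at 25.6 TB/s. At about one floating-point operation per byte, the processor spends most of its time waiting on memory, which is the sense in which decoding hits a wall.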

AI hardware start-up Majestic Labs is taking a direct and comprehensive approach to the memory-wall problem. It’s developing a new AI server, Prometheus, with up to 128 terabytes of memory. That’s over 60 times more than Nvidia’s DGX B300, a cutting-edge AI server.

Sha Rabii, co-founder and president of Majestic Labs, believes this drastic increase in memory will give his company an edge. While he acknowledges that “Nvidia’s done a phenomenal job creating a system that can scale out,” he argues that Nvidia’s approach becomes less economical as models grow and “ends up greatly over-provisioning on compute and starving on memory.”

DRAM-Centric Architecture for LLM Memory

Majestic Labs plans to surmount the “memory wall” with an architecture that fundamentally differs from competitors’.

Nvidia’s current servers use fast high-bandwidth memory (HBM), which typically holds an LLM’s model weights, alongside an often larger but slower pool of dynamic random-access memory (DRAM) that handles the remaining LLM and server overhead. Majestic instead goes all-in on DRAM (specifically LPDDR6) in a single, unified architecture.
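Why raw capacity matters becomes clearer when you add up what inference has to keep in memory. The sketch below uses the standard estimate for a transformer’s key-value (KV) cache, 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per value, adds the weights, and checks the total against a memory pool. It is an illustration only: the model dimensions, request counts, and the 1 TB comparison pool are hypothetical, and a real deployment also needs room for activations and runtime overhead.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical transformer dimensions used only for illustration."""
    num_params: float
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # 16-bit weights and cache entries


def kv_cache_bytes(m: ModelShape, tokens_in_flight: int) -> int:
    """KV-cache size: one key and one value vector per layer, KV head, and token."""
    return (2 * m.num_layers * m.num_kv_heads * m.head_dim
            * tokens_in_flight * m.bytes_per_value)


def footprint_bytes(m: ModelShape, tokens_in_flight: int) -> float:
    """Weights plus KV cache; ignores activations and runtime overhead."""
    return m.num_params * m.bytes_per_value + kv_cache_bytes(m, tokens_in_flight)


def fits(m: ModelShape, tokens_in_flight: int, capacity_bytes: float) -> bool:
    """True if weights plus KV cache fit in a pool of the given size."""
    return footprint_bytes(m, tokens_in_flight) <= capacity_bytes


if __name__ == "__main__":
    model = ModelShape(num_params=70e9, num_layers=80, num_kv_heads=8, head_dim=128)
    # Hypothetical load: 64 concurrent requests, each holding 128,000 tokens of context.
    tokens = 64 * 128_000
    total_tb = footprint_bytes(model, tokens) / 1e12
    print(f"weights + KV cache: {total_tb:.2f} TB")
    for label, cap in [("1 TB pool", 1e12), ("128 TB pool", 128e12)]:
        print(label, "fits" if fits(model, tokens, cap) else "does not fit")
```

In this made-up scenario the KV cache, not the weights, dominates the footprint, which is why long contexts and many concurrent users push memory demand up so quickly.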

Rabii says that most memory interfaces are designed to operate over short physical distances, sometimes only a few millimeters. That limits how much memory can be placed close enough to the processor. “You get this shoreline at the compute die where you can put your HBM. If you wanted to put more, you can’t,” Rabii explains.
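The “shoreline” argument is essentially geometry: if memory stacks must sit within a few millimeters of the compute die, only as many as fit side by side along its usable edges can be attached. The toy calculation below makes that explicit. Every dimension in it (die edge length, number of usable edges, stack width, spacing, capacity per stack) is hypothetical and not drawn from the article or from any real product.

```python
import math


def max_stacks_on_shoreline(die_edge_mm: float, usable_edges: int,
                            stack_width_mm: float, gap_mm: float) -> int:
    """How many memory stacks fit side by side along the die's usable edges.

    n stacks with n - 1 gaps fit on an edge of length L when
    n * width + (n - 1) * gap <= L, i.e. n <= (L + gap) / (width + gap).
    """
    per_edge = math.floor((die_edge_mm + gap_mm) / (stack_width_mm + gap_mm))
    return per_edge * usable_edges


def shoreline_capacity_gb(die_edge_mm: float, usable_edges: int,
                          stack_width_mm: float, gap_mm: float,
                          gb_per_stack: float) -> float:
    """Total memory capacity that the die's shoreline can host."""
    stacks = max_stacks_on_shoreline(die_edge_mm, usable_edges, stack_width_mm, gap_mm)
    return stacks * gb_per_stack


if __name__ == "__main__":
    # Hypothetical: 30 mm die edge, two usable edges, 11 mm-wide stacks,
    # 1 mm spacing, 36 GB per stack.
    stacks = max_stacks_on_shoreline(30, 2, 11, 1)
    print(stacks, "stacks,", shoreline_capacity_gb(30, 2, 11, 1, 36), "GB")
```

Capacity can only grow here by making each stack denser or the edge longer, which is the ceiling Rabii describes. An interface that reaches up to a meter, as Majestic claims for its cable-based design, removes the requirement that memory sit on the die’s edge at all.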

To escape that limit, Majestic uses a proprietary memory interface built from miniature copper cables that remains effective over distances of up to a meter. This is paired with custom memory aggregation chips that sit physically next to the memory modules and coordinate memory access across the server.

“It’s an endpoint for that high-speed interface, and fans out to many, many commodity DRAM chips,” explains Rabii. In addition to addressing large pools of memory, Majestic says this design offers memory bandwidth up to 25.6 terabytes per second.
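Majestic has not described how its aggregation chips map addresses onto the DRAM behind them. One common way for a memory controller to “fan out” to many devices, and to add up their individual bandwidth, is address interleaving: consecutive chunks of the address space are spread across different devices so that a long read keeps all of them busy at once. The sketch below shows that general technique; the aggregator count, chips per aggregator, and interleave granularity are hypothetical parameters, not Majestic’s design.

```python
from typing import NamedTuple


class Location(NamedTuple):
    aggregator: int  # which (hypothetical) aggregation chip
    dram_chip: int   # which DRAM chip behind that aggregator
    offset: int      # byte offset inside that chip


def interleave(addr: int, num_aggregators: int, chips_per_aggregator: int,
               granule_bytes: int) -> Location:
    """Map a flat address onto many DRAM chips using fine-grained interleaving.

    Consecutive granules go to different aggregators first, then to different
    chips behind each aggregator, so a long sequential read is spread across
    all chips once it covers at least one full round of granules.
    """
    granule, within = divmod(addr, granule_bytes)
    aggregator = granule % num_aggregators
    rest = granule // num_aggregators
    chip = rest % chips_per_aggregator
    row = rest // chips_per_aggregator
    return Location(aggregator, chip, row * granule_bytes + within)


def full_sweep_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read every byte of a memory pool once at a given bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical layout: 4 aggregators, 8 chips each, 256-byte granules.
    for addr in (0, 256, 1024, 8197):
        print(addr, interleave(addr, 4, 8, 256))
    print(full_sweep_seconds(128e12, 25.6e12), "s to read 128 TB at 25.6 TB/s")
```

If the 25.6 TB/s figure applies to a full 128 TB configuration (the article does not say so explicitly), reading every byte once would take about 5 seconds.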

Ignite AI Processor for LLM Acceleration

More memory is good, but it needs to be paired with AI acceleration, something akin to Nvidia’s GPU. Majestic’s solution to this is Ignite, a custom AI processing unit that serves as the server’s compute engine. The Prometheus server contains 12 Ignite chips.

Ignite combines datacenter-class ARM application cores with RISC-V vector and tensor cores on a single die, all sharing the same memory space. The ARM cores act as an on-chip host processor that orchestrates the AI model, while the RISC-V cores carry out the actual LLM computation. The result is a single chip that handles both orchestration and computation for LLM inference without handing work off between separate processors. Majestic Labs has yet to reveal specific metrics for Prometheus’s compute performance.
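The division of labor can be pictured with a conceptual sketch. Here the main thread stands in for the ARM host, which splits a matrix-vector product (the core operation of LLM decoding) into row blocks, and Python worker threads stand in for the RISC-V compute cores. Every worker reads the same weight matrix object rather than receiving its own copy, mimicking a shared memory space. Nothing here reflects Majestic’s actual software, and Python threads do not provide real parallel speedup; the point is only the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec_rows(weights: Matrix, x: Vector, start: int, stop: int) -> Vector:
    """'Compute core' role: rows [start, stop) of weights @ x."""
    return [sum(w * xi for w, xi in zip(weights[r], x)) for r in range(start, stop)]


def host_matvec(weights: Matrix, x: Vector, num_cores: int) -> Vector:
    """'Host' role: split the work into row blocks and dispatch them.

    All workers read the same `weights` object, standing in for a shared
    memory space: no per-core copy of the matrix is made.
    """
    if num_cores < 1:
        raise ValueError("need at least one core")
    n = len(weights)
    if n == 0:
        return []
    block = -(-n // num_cores)  # ceiling division
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        futures = [pool.submit(matvec_rows, weights, x, s, min(s + block, n))
                   for s in range(0, n, block)]
        result: Vector = []
        for f in futures:  # collect blocks in row order
            result.extend(f.result())
    return result


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(host_matvec(W, [1.0, 1.0], num_cores=2))
```

Because the blocks are collected in submission order, the result matches a plain single-threaded matrix-vector product regardless of how many workers are used.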

Rabii acknowledges that software matters as well, given how entrenched many AI frameworks already are. “We’re trying to reduce friction as much as possible in every aspect of our customer adoption, whether it’s physical or software,” he says. Prometheus will support PyTorch, vLLM, and OpenAI’s Triton without requiring code modifications, which means existing models built on these frameworks can run as-is.

Prometheus Server Design and Pricing

All of this comes together in the server itself, which is built in an Open Compute Project-compliant form factor 21 inches wide and 36 inches deep. Up to four servers fit in a rack, power draw is expected to reach up to 120 kilowatts per rack, and heat will be managed with cold-plate liquid cooling. The server’s memory is modular, so servers purchased with less than the maximum 128 TB of memory can be upgraded later.
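The rack-level figures follow from simple arithmetic on the numbers reported in the article (128 TB maximum per server, 12 Ignite chips per server, up to four servers and 120 kW per rack). The sketch below models a rack of modular servers. The upgrade granularity is unknown, so the code accepts any whole number of terabytes, and the per-server power split is an assumption for illustration rather than a published figure.

```python
from dataclasses import dataclass, field
from typing import List

# Figures reported in the article.
MAX_MEMORY_TB = 128           # per server
IGNITE_CHIPS_PER_SERVER = 12
MAX_SERVERS_PER_RACK = 4
RACK_POWER_KW = 120           # expected upper bound per rack


@dataclass
class Server:
    memory_tb: int

    def __post_init__(self) -> None:
        if not 0 < self.memory_tb <= MAX_MEMORY_TB:
            raise ValueError("memory must be between 1 and 128 TB")

    def upgrade(self, extra_tb: int) -> None:
        """Add memory later (hypothetical: any whole number of TB), up to the maximum."""
        if extra_tb <= 0 or self.memory_tb + extra_tb > MAX_MEMORY_TB:
            raise ValueError("upgrade must be positive and stay within 128 TB")
        self.memory_tb += extra_tb


@dataclass
class Rack:
    servers: List[Server] = field(default_factory=list)

    def add(self, server: Server) -> None:
        if len(self.servers) >= MAX_SERVERS_PER_RACK:
            raise ValueError("rack is full")
        self.servers.append(server)

    def memory_tb(self) -> int:
        return sum(s.memory_tb for s in self.servers)

    def ignite_chips(self) -> int:
        return IGNITE_CHIPS_PER_SERVER * len(self.servers)


if __name__ == "__main__":
    rack = Rack()
    for _ in range(MAX_SERVERS_PER_RACK):
        rack.add(Server(memory_tb=32))  # hypothetical partial configuration
    rack.servers[0].upgrade(96)         # later upgrade to the 128 TB maximum
    print(rack.memory_tb(), "TB,", rack.ignite_chips(), "Ignite chips")
    # Assumes the rack power budget splits evenly across servers.
    print(RACK_POWER_KW / MAX_SERVERS_PER_RACK, "kW per server")
```

A fully populated rack would hold 4 × 128 TB = 512 TB of memory and 48 Ignite chips, with roughly 30 kW per server if the 120 kW rack budget were split evenly.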

Despite the project’s breadth, Majestic also wants Prometheus to compete on price, which may come as a surprise given how much memory each server can hold. The company argues this is possible because Prometheus uses DRAM instead of HBM. Pricing has not yet been announced; Prometheus is expected to ship in 2027.

“Our customers’ capital expenditure will come down by, depending on the workload, ten to fifty times, and the power consumption comes down by a similar amount,” Rabii claims.
