OpenAI, Broadcom debut custom Jalapeño chip for AI inference

OpenAI has unveiled a custom chip called Jalapeño, developed in collaboration with Broadcom, specifically designed for AI inference. The chip boasts higher performance per watt and an architecture that reduces data movement. Initial servers are expected by year-end, marking the first step in a multi-generation compute platform.

Source: SiliconANGLE AI. Author: Maria Deutscher

OpenAI Group PBC today revealed a custom chip called Jalapeño that it will use to power its large language models.

The processor is the fruit of a collaboration with Broadcom Inc., which is no stranger to custom silicon design. The company helped Google LLC develop its TPU line of artificial intelligence accelerators. In April, the search giant extended its chip collaboration with Broadcom to 2031.

Nvidia Corp.’s flagship Rubin graphics cards can run both training and inference workloads. By contrast, Jalapeño is designed only for inference, the process of running trained AI models in response to queries. According to OpenAI, early testing indicates that the chip can perform inference with significantly higher performance per watt than “current state-of-the-art,” which may be a reference to Nvidia chips.

The company has shared few details about Jalapeño’s design. However, the blog post in which it announced the chip specifies that the underlying “architecture reduces data movement.” That wording suggests the chip may be designed to limit traffic between its logic circuits and off-chip memory, one of the main performance bottlenecks in inference clusters.

AI chip suppliers take several approaches to reducing data movement. One of the most common methods is to equip an accelerator with a large amount of onboard SRAM, a type of high-speed memory. The more SRAM a chip includes, the less data must be sent to off-chip memory. Cerebras Systems Inc. and Groq Inc. are among the companies that have adopted that approach.
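The SRAM tradeoff described above can be illustrated with a toy calculation. This is a minimal sketch, not a description of Jalapeño or any vendor's chip: it assumes every model weight is read once per generated token and that weights held in on-chip SRAM never need to be fetched from off-chip memory. The function name and the 16 GiB model size are hypothetical values chosen for illustration.

```python
def off_chip_bytes_per_token(weight_bytes: int, sram_bytes: int) -> int:
    """Bytes fetched from off-chip memory to generate one token.

    Toy assumption: every weight is read once per token, and any weights
    that fit in on-chip SRAM stay resident there.
    """
    if weight_bytes < 0 or sram_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return max(0, weight_bytes - sram_bytes)


GIB = 1024 ** 3
weights = 16 * GIB  # hypothetical model size, not a Jalapeño figure

# More on-chip SRAM means less traffic to off-chip memory per token.
for sram_gib in (0, 4, 16):
    traffic = off_chip_bytes_per_token(weights, sram_gib * GIB)
    print(f"{sram_gib:>2} GiB SRAM: {traffic / GIB:.0f} GiB off-chip per token")
```

Real accelerators also move activations and cached context, so the model only captures the direction of the effect, not its size.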

OpenAI says that its Jalapeño-powered inference clusters will use multiple Broadcom networking technologies. One of them is the company’s Tomahawk chip series, which is designed to power Ethernet switches. Tomahawk-based switches can be used to move data both between servers in the same rack and between racks.

Broadcom’s newest Tomahawk chip, the Tomahawk 6, can process up to 102.4 terabits of traffic per second. A built-in congestion management engine is designed to ease network bottlenecks that might otherwise slow down connections.

OpenAI plans to deploy Jalapeño and its Broadcom-supplied network equipment in custom server racks. The ChatGPT developer is building the systems in collaboration with Celestica Inc., a Toronto-based provider of data center equipment design services. Celestica can also help customers optimize their server production lines.

OpenAI will bring its first Jalapeño servers online by year’s end and plans to expand its use of the chip over time. Its blog post describes Jalapeño as the “first step in a multi-generation compute platform,” which hints that the company may develop additional inference processors in the future. Another possibility is that OpenAI will design custom chips for adjacent use cases such as model training.

Jalapeño may have the potential to open new revenue streams for the company. Nvidia sells its graphics cards as part of systems called DGX appliances that also include central processing units, cooling modules and other hardware. OpenAI has the resources to bring competing Jalapeño-powered appliances to market. It could even enable customers to run its AI models on-premises using such systems.

A move into the lucrative AI hardware market might not only boost OpenAI’s revenue growth but also raise investor interest in its upcoming public offering. Anthropic PBC, the company’s top rival, recently filed for a listing of its own. An inference hardware offering could be a valuable differentiator for OpenAI during its roadshow, particularly if Anthropic goes public first.
