
We heard you like HBM – Nvidia’s Blackwell Ultra GPUs have 288 GB of it

Tobias Mann
Last updated: March 18, 2025 6:35 pm

GTC Nvidia’s Blackwell GPU architecture is barely out of the cradle – and the graphics chip giant is already looking to extend its lead over rival AMD with an Ultra-themed refresh of the technology.

Announced on stage at Nvidia’s GPU Technology Conference (GTC) in San Jose, California, on Tuesday by CEO and leather jacket aficionado Jensen Huang, the Blackwell Ultra family of accelerators boasts up to 15 petaFLOPS of dense 4-bit floating-point performance and up to 288 GB of HBM3e memory per chip.

And if you’re primarily interested in deploying GPUs for AI inference, that’s a bigger deal than you might think. While training is generally limited by how much compute you can throw at the problem, inference is primarily a memory-bound workload. The more memory you have, the bigger the model you can serve.

According to Ian Buck, Nvidia veep of hyperscale and HPC, the Blackwell Ultra will enable reasoning models including DeepSeek-R1 to be served at 10x the throughput of the Hopper generation, meaning questions that previously may have taken more than a minute to be answered can now be done in as little as ten seconds.

With 288 GB of capacity across eight stacks of HBM3e memory onboard, a single Blackwell Ultra GPU can now run substantially larger models. At FP4, Meta’s Llama 405B could fit on a single GPU with plenty of vRAM left over for key-value caches.
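As a rough sanity check on that claim, here’s a back-of-the-envelope sketch in Python. It assumes a straight 0.5 bytes per parameter for 4-bit weights and ignores quantization block-scale overhead, activations, and framework buffers, so treat the numbers as approximations rather than a deployment guide.

```python
# Back-of-the-envelope check: does a 405B-parameter model fit in 288 GB at FP4?
# Assumes 0.5 bytes per parameter (4-bit weights); ignores quantization scale
# overhead, activations, and framework buffers -- an approximation only.

PARAMS = 405e9             # Llama 405B parameter count
BYTES_PER_PARAM_FP4 = 0.5  # 4 bits = half a byte
HBM_CAPACITY_GB = 288      # Blackwell Ultra per-GPU HBM3e capacity

weights_gb = PARAMS * BYTES_PER_PARAM_FP4 / 1e9
leftover_gb = HBM_CAPACITY_GB - weights_gb

print(f"FP4 weights: ~{weights_gb:.1f} GB")              # ~202.5 GB
print(f"Left for KV cache etc.: ~{leftover_gb:.1f} GB")  # ~85.5 GB
```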

To achieve this higher capacity, Nvidia’s Blackwell Ultra swapped last-gen’s eight-high HBM3e stacks for fatter 12-high modules, boosting capacity by 50 percent. However, we’re told that memory bandwidth remains the same at a still class-leading 8 TB/s.
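For the curious, the capacity math behind that 50 percent jump works out as below, assuming 3 GB (24 Gb) per DRAM die, which is what 36 GB 12-high HBM3e stacks imply.

```python
# HBM3e capacity math: eight stacks per GPU, 12-high vs the previous 8-high.
# Assumes 3 GB (24 Gb) per DRAM die.

STACKS = 8
GB_PER_DIE = 3

blackwell_gb       = STACKS * 8  * GB_PER_DIE   # 8-high stacks  -> 192 GB
blackwell_ultra_gb = STACKS * 12 * GB_PER_DIE   # 12-high stacks -> 288 GB

print(blackwell_gb, blackwell_ultra_gb)                        # 192 288
print(f"uplift: {blackwell_ultra_gb / blackwell_gb - 1:.0%}")  # 50%
```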

If any of this sounds familiar, it’s because this isn’t the first time we’ve seen Nvidia employ this strategy. In fact, Nv is following a similar playbook to its H200, which was essentially just an H100 with faster, higher-capacity HBM3e onboard. This time around, though, Nvidia isn’t just strapping more memory onto these latest Blackwells; it has also juiced the peak floating-point performance by 50 percent – at least for FP4, anyway.

Nvidia tells us that FP8 and FP16/BF16 performance is unchanged from last gen.

More memory, more compute, more ‘GPUs’

While many have fixated on Nvidia’s $30,000 or $40,000 chips, it’s worth remembering that Hopper, Blackwell, and now its Ultra refresh aren’t one chip so much as a family of products running the gamut from PCIe add-in cards and servers to rack-scale systems and even entire supercomputing clusters.

In the datacenter, Nvidia will offer Blackwell Ultra in both its more traditional HGX servers and its rack-scale NVL72 offerings.

Nvidia’s HGX form factor has, at least for the past few generations, featured up to eight air-cooled GPUs stitched together by a high-speed NVLink switch fabric. This time, however, there’s a new config it’s calling the B300 NVL16, which might sound like Nvidia is cramming twice as many GPUs into a box, something it has previously done with the HGX V100. In reality, Nvidia has changed its mind and decided to count the individual compute dies on each package as GPUs.

According to Nvidia, the Blackwell-based B300 NVL16 will deliver 7x the compute and 4x the memory capacity of its Hopper-generation equivalent, which we’ve learned refers to its 80 GB H100s and not the higher-capacity H200 systems. By our calculation, that works out to 112 petaFLOPS of dense FP4 compute, or about 7 petaFLOPS of dense FP4 per GPU die, or 14 petaFLOPS per SXM module. That’s quite a performance uplift, with each B300 die performing on par with the Blackwell B100-series chips announced last year.

Nvidia does appear to have done a fair bit of rounding with its memory claims, though. By our calculation, the HGX B300 systems actually deliver closer to 3.6x more memory, at 2.3 TB versus the 640 GB of the HGX H100.
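Here’s our working for those figures as a short Python sketch. The eight-package, two-dies-per-package layout and the 288 GB per package follow from the description above; treat this as arithmetic on the article’s numbers rather than official specs.

```python
# Reproducing the HGX B300 NVL16 estimates above.
# Assumes 8 SXM packages per system, 2 compute dies per package (16 'GPUs'),
# 288 GB HBM3e per package, and ~7 petaFLOPS dense FP4 per die.

PACKAGES = 8
DIES_PER_PACKAGE = 2
HBM_PER_PACKAGE_GB = 288
DENSE_FP4_PER_DIE_PF = 7

total_fp4_pf = PACKAGES * DIES_PER_PACKAGE * DENSE_FP4_PER_DIE_PF  # 112 petaFLOPS
total_hbm_gb = PACKAGES * HBM_PER_PACKAGE_GB                       # 2,304 GB ~= 2.3 TB
hgx_h100_gb  = 8 * 80                                              # 640 GB

print(total_fp4_pf, total_hbm_gb)
print(f"memory uplift vs HGX H100: {total_hbm_gb / hgx_h100_gb:.1f}x")  # 3.6x
```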

For even larger workloads, Nvidia will also offer the accelerators in its Superchip form factor. Unlike last year’s GB200, the GB300 Superchip will pair two 72-core Grace Arm-compatible CPUs with four Blackwell Ultra GPUs packing a combined 1,152 GB of HBM3e memory.

Up to 18 of these Superchips can be stitched together using Nvidia’s NVLink switches to form an NVL72 rack-scale system. But rather than the 13.5 terabytes of HBM3e of last year’s model, the Grace-Blackwell GB300-based systems will offer up to 20 terabytes of vRAM. What’s more, Buck says the system has been redesigned for this generation with an eye toward improved energy efficiency and serviceability.

And if that’s still not big enough, eight of these racks can be combined to form a GB300 SuperPOD system containing 576 Blackwell Ultra GPUs and 288 Grace CPUs.
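Chaining those figures together gives the headline numbers above. This is simply the article’s quantities multiplied out, not an official bill of materials, and the per-rack HBM total comes out slightly above the “up to 20 terabytes” Nvidia quotes.

```python
# Scaling up: GB300 Superchip -> NVL72 rack -> SuperPOD, per the figures above.

GPUS_PER_SUPERCHIP, CPUS_PER_SUPERCHIP = 4, 2
SUPERCHIPS_PER_RACK = 18
RACKS_PER_SUPERPOD = 8
HBM_PER_GPU_GB = 288

gpus_per_rack = SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP  # 72 -> hence 'NVL72'
hbm_per_rack_tb = gpus_per_rack * HBM_PER_GPU_GB / 1000   # ~20.7 TB raw; quoted as 'up to 20 TB'

pod_gpus = RACKS_PER_SUPERPOD * gpus_per_rack                             # 576 Blackwell Ultra GPUs
pod_cpus = RACKS_PER_SUPERPOD * SUPERCHIPS_PER_RACK * CPUS_PER_SUPERCHIP  # 288 Grace CPUs

print(gpus_per_rack, round(hbm_per_rack_tb, 1), pod_gpus, pod_cpus)  # 72 20.7 576 288
```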

Where does this leave Blackwell?

Given its larger memory capacity, it’d be easy to look at Nvidia’s line-up and question whether Blackwell Ultra will end up cannibalizing shipments of the non-Ultra variant.

However, the two platforms are clearly aimed at different markets, with Nvidia presumably charging a premium for its Ultra SKUs.

In a press briefing ahead of Huang’s keynote address today, Nvidia’s Buck described three distinct AI scaling laws, including pre-training scaling, post-training scaling, and test-time scaling, each of which requires compute resources to be applied in different ways.

At least on paper, Blackwell Ultra’s higher memory capacity should make it well suited to the third of these regimes, as it allows customers to serve up larger models – AKA inference – faster or at higher volumes.

Meanwhile, for those building large clusters for compute-bound training workloads, we expect the standard Blackwell parts to continue to see strong demand. After all, there’s little sense in paying extra for memory you don’t necessarily need.

With that said, there’s no reason why you wouldn’t use a GB300 for training. Nvidia tells us the higher HBM capacity and faster 800G networking offered by its ConnectX-8 NICs will contribute to higher training performance. 

  • DeepSeek-R1-beating perf in a 32B package? El Reg digs its claws into Alibaba’s QwQ
  • AI bubble? What AI bubble? Datacenter investors all in despite whispers of a pop
  • AI running out of juice despite Microsoft’s hard squeezing
  • Nvidia won the AI training race, but inference is still anyone’s game

Competition

Nvidia’s Blackwell Ultra processors are expected to start trickling out sometime in the second half of 2025, which puts them in contention with AMD’s upcoming Instinct MI355X accelerators and leaves those chips in an awkward spot. We would say the same about Intel’s Gaudi3, but that was already true when it was announced.

Since launching its MI300-series GPUs in late 2023, AMD’s main point of differentiation has been that its accelerators pack more memory (192 GB and later 256 GB) than Nvidia’s (141 GB and later 192 GB), making them attractive to customers, such as Microsoft or Meta, deploying models at the multi-hundred-billion- or even trillion-parameter scale.

The MI355X will also see AMD juice memory capacity to 288 GB of HBM3e and bandwidth to 8 TB/s. What’s more, AMD claims the chips will close the gap considerably, promising floating-point performance roughly on par with Nvidia’s B200.

However, at a system level, Nvidia’s new HGX B300 NVL16 systems will offer the same amount of memory, and significantly higher FP4 floating-point performance. If that weren’t enough, AMD’s answer to Nvidia’s NVL72 is still another generation away with its forthcoming MI400 platform. 

This may explain why, during its last earnings call, AMD CEO Lisa Su revealed that her company planned to move up the release of its MI355X from late in the second half to the middle of the year. Team Red also has the potential to undercut its rival on pricing and availability, a strategy it’s used to great effect in its ongoing effort to steal share from Intel. ®

Updated at 15.55 UTC on March 19, 2025, to add

This article was updated to clarify the memory configuration of the HGX B300 NVL16. We had asked Nvidia about this following our earlier chat with them, and it transpires the silicon goliath changed the definition of a GPU and NVL naming convention for the B300 but confusingly not the GB300.

Never mind that, our friends over at The Next Platform have more on Nvidia’s roadmap to 2028.
