There's a new Nvidia architecture in town, and it's a doozy. Just announced by Nvidia CEO Jensen Huang at GTC, Blackwell will feature inside the ridiculously large B200 GPU. However, calling it a "GPU" would be technically incorrect. It's a dual-GPU package with a total of 208 billion transistors. To put that in perspective, Nvidia's previous must-have compute chips, the Hopper H200 and H100, have just over 80 billion transistors, and an RTX 4090 has 76.3 billion. Blackwell packs more than double that, which makes a lot of sense given the dual GPUs and a new chip-to-chip interconnect.
Unfortunately, Blackwell is not for gaming. Boo! I'm not sure our bank accounts would be ready for such a massive thing anyway. Blackwell is primarily intended for data centers chasing ever-greater compute volumes. Why? Mostly artificial intelligence.
But while we wait for news on next-gen GeForce graphics cards, let's consider which features might carry over from those mahoosive Blackwell chips into the architecture of the next gaming graphics cards – which could be Nvidia Blackwell too, albeit a slimmed-down version, since we don't need all the gubbins included in the B200/B100.
Let's start with something we'll likely see in a future GeForce GPU: Blackwell features new fifth-generation Tensor Cores. These are instruction accelerators used primarily in AI workloads, such as inference and training, and the fifth-generation versions are said to increase performance by up to 30x. The new Tensor Cores include new precision formats and an updated version of the Transformer Engine, first introduced with Hopper, to accelerate inference and training of large language models.
Since GeForce cards use Tensor Cores for features like DLSS, and we've already seen fourth-generation Tensor Cores make the jump from Nvidia's enterprise-only Hopper architecture to the Ada Lovelace architecture that powers the RTX 40-series, it's likely we'll see the same thing happen with the next generation. What will be crucial, however, is how Nvidia uses these extra capabilities: a new DLSS version or an enhanced frame generation feature would be the likely candidates.
The Blackwell B200 and B100 both seem to be cut from the same cloth. The B200 offers higher processing power – 40 TFLOPS of FP64 in the eight-GPU HGX B200 system versus 30 TFLOPS of FP64 for the B100 in the eight-GPU HGX B100 system – but the figures are close enough that we assume both the B200 and B100 consist of the same huge 208 billion transistor package.
“It’s okay, Hopper,” Huang says as he holds the giant Blackwell package next to a Hopper cube.
| Spec | B200 |
| --- | --- |
| Architecture | Blackwell |
| Transistors | 208 billion (2x 104B) |
| Process node | TSMC 4NP |
| Tensor cores | 5th Gen |
| Tensor core count | ? |
| CUDA core count | ? |
| Clock speed | ? |
| Memory | HBM3e |
| Memory capacity | 192GB |
| Price | Lol |
The Blackwell chip has two GPUs that function like a single chip – each GPU is built at the so-called reticle boundary, which is essentially the maximum manufacturable size for a single chip in a given lithography process. As Huang noted during his keynote: “There is a fine line between two chips. This is the first time that two chips have bumped into each other in such a way that the two chips think it is one chip.”
Nvidia has toyed with splitting up GPUs before. The Ampere GA100 GPU was more or less split in half down the middle by an interconnect, but the actual silicon was not. Blackwell takes the further leap with two properly separate silicon halves.
"Four years ago we split GA100 into two halves that communicate via an interconnect. It was a big step – and yet hardly anyone noticed, thanks to the great work of the CUDA and GPU teams. Today, that work comes to fruition with the launch of Blackwell. Two dies. One great GPU." (March 18, 2024)
Will the same dual-GPU design make it into a gaming graphics card? It's damn unlikely, but not completely impossible.
First of all, leaks suggest that Nvidia's largest next-generation graphics card, presumably the RTX 5090, will feature many more CUDA cores than its predecessor, the RTX 4090. However, current rumors do not point to a straight doubling of cores over the RTX 4090's specification. And even if the RTX 5090 were to use two smaller GPUs to make a more efficient chip, there are concrete reasons why a dual-GPU approach would be unrealistically difficult.
It's hugely difficult to get two GPUs running simultaneously while gaming – that's why CrossFire and SLI are dead. For a multi-GPU approach to work, the two GPUs would have to function as a single unit while requiring almost no changes to the APIs that communicate with a graphics card.
However, Huang notes that “these two sides of the Blackwell chip have no idea which side they're on,” referring to the way the Blackwell GPU package works together as a whole. “There are no memory locality issues, no cache issues. It’s just a huge chip,” Huang continues.
While this gives me some hope that we'll reach a point where multi-GPU gaming is feasible, it's still a tough nut to crack. It's easier with compute chips like the B200, provided there's enough bandwidth between the two dies – hence the 10TB/s interconnect on the Blackwell GPU.
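To get a feel for why that 10TB/s figure matters, here's a rough back-of-envelope sketch (the framebuffer size and format are illustrative assumptions, not anything Nvidia has stated) of how quickly a single 4K frame's worth of data could cross such a link:

```python
# Back-of-envelope: time for one 4K frame's worth of data to cross a
# 10 TB/s chip-to-chip link. Illustrative figures, not a real latency model.
link_bandwidth = 10e12          # bytes per second (10 TB/s)
framebuffer = 3840 * 2160 * 4   # bytes for one 4K frame at 4 bytes/pixel

transfer_time_us = framebuffer / link_bandwidth * 1e6
print(f"{framebuffer / 1e6:.1f} MB in {transfer_time_us:.2f} microseconds")
```

At that rate, shuttling a frame between dies takes single-digit microseconds – a tiny fraction of the 16.7ms budget for a 60fps frame, which is why Huang can claim the two dies behave as one chip.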
For now, I think it's more likely that a newer process node will be Nvidia's primary way of packing more cores into a gaming chip. And this is one area where Blackwell hints at what to expect.
Above: Nvidia has prototype boards for some really powerful (and expensive) systems that use Blackwell.
What could enable Blackwell's leap to a next-generation gaming card is its use of TSMC's 4NP process node. This is reportedly an extension of the custom 4N process node designed exclusively for Nvidia and used by its Ada and Hopper chips. It's not really a 4nm process node, though; it's more closely related to TSMC's 5nm node. It's confusing, but that's apparently intentional, since almost every major semiconductor manufacturer does the same thing. Intel 7, for example, is actually its 10nm process – and what constitutes a 10nm process anyway? We could be here a while. The point is that it's very likely we'll also see the next-generation GeForce cards use the 4NP process.
The decompression engine on Blackwell is of particular interest to gamers. Nvidia introduced RTX IO back in 2020 as a way to shift the load from the CPU to the GPU to speed up game asset load times. It's part of a broader industry initiative to integrate GPU decompression into games, including AMD's SmartAccess Storage, Microsoft's DirectStorage and Khronos Group's Vulkan API. They are all based on an open GPU compression standard called GDeflate.
Blackwell's new decompression engine specifically accelerates GDeflate, among other decompression standards, and that could prove useful in a broader push to bring GDeflate to games if it's also integrated into next-generation GeForce GPUs. The faster a GPU can decompress assets, the faster they can be loaded into a game, and that means more detailed game worlds can be designed with a reasonable expectation of performance.
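For a sense of what's being accelerated: GDeflate is a GPU-parallel variant of the familiar DEFLATE format, so Python's `zlib` module (plain DEFLATE, running on the CPU) can sketch the same compress/decompress round-trip. The asset data here is a made-up stand-in, and this runs nothing on a GPU; it only illustrates the workload a dedicated decompression engine takes off the CPU:

```python
import zlib

# Illustrative only: plain DEFLATE on the CPU standing in for GDeflate,
# the GPU-parallel variant that hardware decompression engines accelerate.
asset = b"vertex, texture, and audio data " * 4096  # stand-in game asset

compressed = zlib.compress(asset, level=9)   # what ships on disk
restored = zlib.decompress(asset and compressed)  # what happens at load time

assert restored == asset
print(f"{len(asset)} bytes shipped as {len(compressed)} compressed bytes")
```

Highly repetitive data like this compresses dramatically; real game assets compress less, but the principle is the same – the faster that decompress step runs, the faster assets stream in.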
Now, there are some parts of Blackwell that probably won't make it into a future gaming GPU. The RAS (Reliability, Availability, Serviceability) engine is designed to identify and report errors, or potential errors, before they occur. That's a feature that's much more handy if, like Meta, you're running hundreds of thousands of these things at once. Likewise, the focus on the TEE I/O security model for "secure AI" won't necessarily be on the agenda for GeForce. The ability to combine lots of GPUs into a superchip, or multiple superchips into a supersystem, using NVLink is also likely to be thrown on the pyre for gaming.
Finally, we won't see hundreds of gigabytes of HBM3e memory on any RTX 50-series or similar gaming chip. Nvidia's Grace Blackwell superchip, which packs two Blackwell GPUs and a Grace CPU, looks great (and expensive) with 384GB of HBM3e memory and 16TB/s of bandwidth, but we'll most likely be playing with 8GB or more (hopefully a lot more) of GDDR7 memory.
So, there are a few potential graphics goodies from Blackwell that could make the jump into a gaming graphics card, and a few more that probably won't. Unfortunately, we don't know exactly when we'll find out. Nvidia has not yet given us a firm date for the arrival of its next-generation gaming graphics cards, and it's not really known when Blackwell-based products will arrive either. The company has little need for advertising, though: Meta, Google, Microsoft, OpenAI, Oracle, xAI, Dell, and Amazon are among the customers already lining up to get a piece of Blackwell.