With Nvidia’s RTX 50 collection launch at CES earlier this month and all the following critiques of the RTX 5090, particularly in-depth ones like our Dave’s, laying out all of the specs, everyone knows that the most important Blackwell chip is one severely huge GPU. And because of a brand new die shot of the processor, we will now feast our eyes on all these shaders and cache.
Creating an in depth die shot of any processor is not easy. It takes many failed makes an attempt, involving quite a few cracked chips and pores and skin burns, to excellent the method. If that chip simply so occurs to be an Nvidia GB202, the GPU powering the GeForce RTX 5090, then there are a couple of different limitations to beat, particularly getting your fingers on one and being keen to sacrifice a $2,000+ graphics card for the sake of an image.
GB202 Dieshot/5090 DieshotThanks By@ASUS Tony 俞元麟 by Chip@万扯淡 by Dieshot@Kurnalsalts LayoutPhoto1 GB202 DieshotPhoto2 AD102 vs GB202 full Pixel Picture pls take part Kurnal’s Telegram teamhttps://t.co/MI6oCa2yOA pic.twitter.com/pny7bvCs5jJanuary 25, 2025
Enter Tony Yu, common supervisor of Asus China and all-round high chap, to save lots of the day, sharing a high-resolution picture of the GB202 die thank X person Kurnal managed to seize maintain of after which helpfully label all the important thing components (by way of Tom’s {Hardware}).
Whereas the picture itself would not reveal any main surprises, as Nvidia has caught with the identical basic design structure for a few years now, it does that the engineers needed to make some fascinating selections as a way to get every part to suit inside the die’s bodily dimensions.
For instance, if you happen to have a look at the GB202 and evaluate it to the AD102 (the RTX 4090’s GPU), you will see that all the logic blocks for the NVENC video encoders and decoders have moved from the underside to the very center of the chip.
The rationale for that is twofold: firstly, the GB202 sports activities three encoders and two decoders, to the AD102’s two and one respectively, and it additionally has an aggregated 512-bit reminiscence bus. If Nvidia had stored the NVENC blocks all on the backside, then these 16 bodily reminiscence interfaces (PHYs) would have made the die very lengthy/tall. Maybe too tall.
One thing else we will clearly see is the all that L2 cache within the very centre of the die. The place AMD makes use of a quick however advanced multi-level cache hierarchy, Nvidia takes a less complicated method, leading to a most important L1 cache for every SM (Streaming Multiprocessor) after which a hulking L2 (last-level) cache, in addition to some smaller ones dotted about within the SMs.
Aside from that, the design is fairly easy. The complete die includes 12 GPCs (Graphics Processing Cluster), every sporting its personal ‘Raster Engine’, often known as a ROPs cluster. These GPCs are organised into eight TPCs (Texture Processing Items) and inside every of these, you will discover two SMs—these home 128 CUDA cores apiece, for a grand complete of 24,576 shader items.
However for all its massiveness, it isn’t the most important chip Nvidia has ever stuffed right into a gaming graphics card. It is when it comes to transistor and shader depend however not when it comes to bodily dimensions. With an space of 750 mm2, the GB202 is 23% bigger than the AD102 (609 mm2) and 19% bigger than the GA102 (RTX 3090, 628 mm2).
Nevertheless, it is 0.5% smaller than the TU102 (754 mm2), the GPU within the RTX 2080 Ti, and eight% smaller than the GV100 (815 mm2). The latter, primarily based on the Volta structure, is not actually a gaming GPU, however for some time, Nvidia marketed it as such. The Titan V was very a lot the ‘RTX 5090’ of its period (2017), not least due to its $2,999 price ticket.
A little bit over 800 mm2 is about as massive as a single die can go, as a result of reticule restrict, however the GB202 is not all that far off. Whether or not the following generations of GPUs are this huge stays to be seen but when that is the final hurrah for monstrously enormous, monolithic chips in gaming graphics playing cards, earlier than switching to tiled or stacked chiplets, then at the very least we’ve this pretty die shot to stare at and attempt to spot when the RT and Tensor cores is likely to be amidst within the ocean of transistors.