Samsung’s annual Memory Tech Day is taking place in San Jose this morning, and as part of the event, the company is making a couple of notable memory technology announcements/disclosures. The highlight of Samsung’s event is the introduction of Shinebolt, Samsung’s HBM3E memory that will set new marks for both memory bandwidth and memory capacity for high-end processors. The company is also disclosing a bit more on their GDDR7 memory, which will mark a significant technological update to the GDDR family of memory standards.
Starting things off, we have today’s marquee announcement: Shinebolt HBM3E memory. Like the rest of the memory industry, Samsung is preparing a successor to the current generation of HBM3 memory that is being used with high-end/HPC-grade processors, with the industry settling around the upcoming HBM3E standard. HBM3E is slated to offer both significantly higher capacities and greater memory bandwidth than HBM3, helping the high-bandwidth memory technology keep up with ever-growing workloads on high-end processors.
| Samsung HBM Memory Generations | HBM3E (Shinebolt) | HBM3 (Icebolt) | HBM2E (Flashbolt) | HBM2 (Aquabolt) |
|---|---|---|---|---|
| Max Capacity | 36 GB | 24 GB | 16 GB | 8 GB |
| Max Bandwidth Per Pin | 9.8 Gb/s | 6.4 Gb/s | 3.6 Gb/s | 2.0 Gb/s |
| Number of DRAM ICs per Stack | 12 | 12 | 8 | 8 |
| Effective Bus Width | 1024-bit | 1024-bit | 1024-bit | 1024-bit |
| Voltage | ? | 1.1 V | 1.2 V | 1.2 V |
| Bandwidth per Stack | 1.225 TB/s | 819.2 GB/s | 460.8 GB/s | 256 GB/s |
The basis of Shinebolt is a new 24Gbit HBM memory die, which Samsung will be producing on their D1a process, the company’s EUV-based 4th generation 10nm-class (14nm) node. Samsung will be producing both 8-Hi and eventually 12-Hi stacks based on this new die, allowing for total stack capacities of 24GB and 36GB respectively, 50% more capacity than their HBM3 (Icebolt) equivalents.
According to Samsung, Shinebolt will be able to hit memory clockspeeds as high as 9.8Gbps/pin, better than 50% faster than their HBM3 products. Though given some of Samsung’s previous memory clockspeed claims, there’s a good chance this is a semi-overclocked state. Shinebolt development isn’t far enough along for Samsung to list individual SKUs, but even on the conservative end of things, Samsung is promoting data rates of at least 8Gbps/pin in their event press release. And if Samsung’s ambitious memory frequencies do come to fruition, then that would put Samsung ahead of their competition as well; so far, SK hynix and Micron have announced plans for 9Gbps/pin and 9.2Gbps/pin memory respectively, so Samsung’s claims are certainly the most aggressive.
Overall, these clockspeeds would give a single HBM3E stack a minimum bandwidth of 1TB/sec, and a maximum bandwidth of 1.225TB/sec, well ahead of the 819GB/sec data rate of HBM3. Or to frame things in reference to a high-end processor (e.g. NVIDIA H100), a 6-stack chip would be able to access as much as 216GB of memory with an aggregate memory bandwidth as high as 7.35TB/sec.
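For those who want to check the math, those figures fall out of the 1024-bit stack interface directly; the short Python sketch below is just that arithmetic (the 6-stack, 12-Hi configuration is an illustrative H100-style layout, not a specific announced product).

```python
# Back-of-the-envelope HBM3E bandwidth math (illustrative, derived from the figures above)
BUS_WIDTH_BITS = 1024                 # effective bus width of a single HBM stack

def stack_bandwidth_tbs(pin_rate_gbps: float) -> float:
    """Per-stack bandwidth in TB/s for a given per-pin data rate.
    Uses 1 TB = 1024 GB, which is how the 1.225 TB/s figure above works out."""
    gb_per_sec = pin_rate_gbps * BUS_WIDTH_BITS / 8
    return gb_per_sec / 1024

for rate in (8.0, 9.8):               # conservative and peak Shinebolt data rates
    print(f"{rate} Gbps/pin -> {stack_bandwidth_tbs(rate):.3f} TB/s per stack")

# A hypothetical 6-stack, 12-Hi configuration (H100-style layout)
stacks, stack_capacity_gb = 6, 36
print(f"{stacks * stack_capacity_gb} GB total, "
      f"{stacks * stack_bandwidth_tbs(9.8):.2f} TB/s aggregate")
# Prints 1.000 and 1.225 TB/s per stack, then 216 GB total at 7.35 TB/s aggregate
```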
As for power efficiency, things look to be a bit of a mixed bag. On a relative basis, Samsung says that Shinebolt will be 10% more efficient than Icebolt – in other words, consuming 10% less power per bit transferred (pJ/bit). However, a 25%+ clockspeed improvement will more than wipe out those gains due to the significant increase in bits transferred. So while Shinebolt will be more efficient overall, on an absolute basis it seems that total power consumption for HBM memory will continue to grow with the next generation.
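To make that trade-off concrete, here is the same sort of back-of-the-envelope math applied to power; the 10% pJ/bit figure and the data rates are from above, while the resulting multipliers are our own rough arithmetic rather than anything Samsung has published.

```python
# Rough illustration of efficiency vs. absolute power (our math, not Samsung's figures)
HBM3_RATE = 6.4                        # Icebolt, Gbps/pin
relative_energy_per_bit = 0.90         # Shinebolt: ~10% less energy per bit transferred
for shinebolt_rate in (8.0, 9.8):      # conservative and peak Shinebolt data rates
    rate_increase = shinebolt_rate / HBM3_RATE
    relative_power = relative_energy_per_bit * rate_increase
    print(f"{shinebolt_rate} Gbps/pin: {rate_increase:.2f}x the bits moved -> "
          f"~{relative_power:.2f}x the stack power")
# Even at the conservative 8 Gbps/pin, per-stack power rises ~13%;
# at 9.8 Gbps/pin it is closer to ~38%.
```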
Either way, for the high-end processor market that Samsung is targeting with Shinebolt, chipmakers are unlikely to be fazed by the power increase. Like the rest of the high-end processor space, Samsung has the AI market set squarely in its sights – a market segment where both memory bandwidth and memory capacity are limiting factors, especially with massive large language models (LLMs). Along with the traditional supercomputer and networking market segments, Samsung should have little trouble selling faster HBM in the midst of a booming AI market.
Like the other major memory vendors, Samsung expects to ship Shinebolt at some point in 2024. Given that the company has just started sampling the memory – and that HBM3 Icebolt itself just hit mass production – Shinebolt likely won't be shipping until the later part of the year.
A Brief Teaser on HBM4: FinFETs & Copper-to-Copper Bonding
Finally, looking even further into the future, Samsung is briefly talking about their plans for HBM4 memory. While that technology is still a few years off (there isn't even an approved specification for it yet), we know from previous disclosures that the memory industry is aiming to move to a wider, 2048-bit memory interface. Which, as Samsung likes to frame things, is the only practical choice when further HBM clockspeed increases would blow out power consumption.
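As a rough illustration of why the industry is reaching for bus width rather than clockspeed, consider the two ways of doubling per-stack bandwidth below; the 2048-bit figure comes from those earlier disclosures, while the specific data rates are purely hypothetical.

```python
# Two hypothetical ways to double per-stack HBM bandwidth (illustrative data rates only)
def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8

hbm3e_like  = stack_bandwidth_gbs(1024, 9.8)    # today's 1024-bit interface
wider_bus   = stack_bandwidth_gbs(2048, 9.8)    # HBM4-style: same pin rate, twice the pins
faster_pins = stack_bandwidth_gbs(1024, 19.6)   # alternative: same pins, twice the rate

print(hbm3e_like, wider_bus, faster_pins)       # 1254.4, 2508.8, 2508.8 GB/s
# Both routes double bandwidth, but driving each pin twice as fast pushes per-pin
# signaling energy (pJ/bit) up sharply, while adding pins keeps per-pin rates - and
# power per bit - roughly where they are today. Hence the industry's 2048-bit plan.
```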
For HBM4, Samsung is looking at utilizing more advanced fab and packaging technologies that are currently the domain of logic chips. On the fab side of things, the company wants to move to using FinFET transistors for their memory, versus the planar transistors still used for DRAM today. As with logic, FinFETs would reduce the drive current required, which would help to improve DRAM energy efficiency. Meanwhile on the packaging side of things, Samsung is looking at moving from micro-bump bonding to bumpless (direct copper-to-copper) bonding, a packaging technique that is still on the cutting edge of development even in the logic space. Embracing cutting-edge technologies will be critical to keeping HBM bandwidth growing as it has over the last decade, but the costs and complexities of doing so also underscore why HBM remains a niche, high-end memory technology.
GDDR7 Update: 50% Lower Stand-By Power Than GDDR6
Besides HBM3E, Samsung's other big bandwidth memory update of the day is a brief status update on their GDDR7 memory.
Back in July of this year, Samsung announced that they had completed initial development of their GDDR7 memory. The next generation of GDDR memory, GDDR7 brings with it several major changes versus today's GDDR6, the most significant of which is a switch to PAM3 encoding. PAM3 allows for 1.5 bits to be transferred per cycle (or rather 3 bits over two cycles), opening the door to improving memory transfer rates without resorting to the more costly approach of further increasing the frequency of the memory bus.
| GDDR Memory Generations | GDDR7 | GDDR6X | GDDR6 |
|---|---|---|---|
| B/W Per Pin | 32 Gbps (Projected) | 24 Gbps (Shipping) | 24 Gbps (Sampling) |
| Chip Density | 2 GB (16 Gb) | 2 GB (16 Gb) | 2 GB (16 Gb) |
| Total B/W (256-bit bus) | 1024 GB/sec | 768 GB/sec | 768 GB/sec |
| DRAM Voltage | 1.2 V | 1.35 V | 1.35 V |
| Data Rate | QDR | QDR | QDR |
| Signaling | PAM-3 | PAM-4 | NRZ (Binary) |
| Packaging | 266 FBGA | 180 FBGA | 180 FBGA |
As a quick recap from Samsung's July announcement, Samsung will be rolling out 16Gbit (2GB) modules, which will be able to run at up to 32Gbps/pin. That's a 33% improvement in bandwidth per pin over current GDDR6 memory, and would bring the aggregate bandwidth of a 256-bit memory bus to a cool 1TB/second. GDDR7 should also deliver a 20% improvement in power efficiency over Samsung's GDDR6 (in terms of pJ/bit), thanks in part to the use of Samsung's 3rd generation D1z (10nm-class) fab node.
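Those bandwidth figures are easy enough to verify, and the same arithmetic also shows what PAM3 buys in terms of signaling rate; the sketch below uses the numbers from the table above, with the symbol-rate comparison being our own framing rather than Samsung's.

```python
# GDDR7 vs GDDR6 bandwidth on a 256-bit bus, plus the signaling rate each implies
BUS_WIDTH_BITS = 256
specs = [                        # (name, per-pin data rate in Gbps, bits per symbol)
    ("GDDR6 (NRZ)",  24, 1.0),
    ("GDDR7 (PAM3)", 32, 1.5),
]
for name, pin_rate, bits_per_symbol in specs:
    total_gbs   = BUS_WIDTH_BITS * pin_rate / 8
    symbol_rate = pin_rate / bits_per_symbol         # Gsymbols/s on the wire
    print(f"{name}: {total_gbs:.0f} GB/s total, ~{symbol_rate:.1f} Gsym/s per pin")
# GDDR6: 768 GB/s at 24 Gsym/s; GDDR7: 1024 GB/s at ~21.3 Gsym/s - more data moved
# while the bus itself actually toggles more slowly, which is the point of PAM3.
```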
Today's event from Samsung is largely a recap of July's announcement, but in the process we have learned a couple of new technical details on GDDR7 that Samsung hasn't previously disclosed. First off, GDDR7 isn't just improving active power consumption; the tech will also improve on stand-by power consumption to a significant degree. Thanks to additional clock controls, GDDR7 will consume 50% less stand-by power than GDDR6.
Second, in discussing why Samsung (and the industry as a whole) went with PAM3 encoding for GDDR7 instead of even denser PAM4, the company confirmed some of our technical suppositions about the new technology. In short, PAM3 has a lower average bit error rate (BER) than PAM4, largely thanks to the wider margins on the eye window. None of which makes PAM4 unworkable (as Micron has already proven), but Samsung and the rest of the memory industry are favoring the relative simplicity of PAM3, given the trade-offs.
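To put some rough numbers behind that eye-window argument, here is a generic comparison of the three signaling schemes in the table above; this is textbook signaling math rather than Samsung data.

```python
# Rough comparison of NRZ, PAM3, and PAM4 signaling (generic, not Samsung's data)
import math

for name, levels in (("NRZ", 2), ("PAM3", 3), ("PAM4", 4)):
    bits_per_symbol = math.log2(levels)   # theoretical; GDDR7 uses 1.5 of PAM3's ~1.58
    eyes = levels - 1                     # vertical eye openings within one voltage swing
    eye_height = 1 / eyes                 # fraction of the swing available per eye
    print(f"{name}: {bits_per_symbol:.2f} bits/symbol, "
          f"{eyes} eye(s), ~{eye_height:.0%} of the swing per eye")
# PAM4 packs 2 bits/symbol but squeezes three eyes into the same voltage swing (~33% each);
# PAM3's two eyes get ~50% each, hence the wider margins and lower BER noted above.
```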
Besides the usual video card/gaming customers, Samsung is expecting GDDR7 to be adopted by AI chip makers, and perhaps a bit more surprisingly, the automotive industry. In fact, some of those non-traditional customers may be the first to adopt the memory; since the traditional GPU vendors are still mid-cycle on their current generation of products, it will still be quite some time before they ship any GDDR7-capable silicon.
At this point Samsung has not announced a projected date for when their GDDR7 memory will go into mass production. But the company is still expecting that they will be the first vendor to ship the next-generation memory, presumably in 2024.