Alongside their EPYC server CPU updates, as part of today's AMD Data Center event, the company is also offering an update on the status of their nearly-finished AMD Instinct MI300 accelerator family. The company's next-generation HPC-class processors, which use both Zen 4 CPU cores and CDNA 3 GPU cores on a single package, have now become a multi-SKU family of XPUs.
Joining the previously announced 128GB MI300 APU, which is now being called the MI300A, AMD is also producing a pure GPU part using the same design. This chip, dubbed the MI300X, uses just CDNA 3 GPU tiles rather than the MI300A's mix of CPU and GPU tiles, making it a pure, high-performance GPU that gets paired with 192GB of HBM3 memory. Aimed squarely at the large language model market, the MI300X is designed for customers who need all the memory capacity they can get to run the largest of models.
First announced back in June of last year, and detailed in greater depth at CES 2023, the AMD Instinct MI300 is AMD's big play into the AI and HPC market. The unique, server-grade APU packs both Zen 4 CPU cores and CDNA 3 GPU cores onto a single, chiplet-based chip. None of AMD's competitors have (or will have) a combined CPU+GPU product like the MI300 series this year, so it gives AMD an interesting solution with a truly unified memory architecture, and plenty of bandwidth between the CPU and GPU tiles.
MI300 also includes on-package memory via HBM3, using 8 stacks of the stuff. At the time of the CES reveal, the highest-capacity HBM3 stacks were 16GB, yielding a chip design with a maximum local memory pool of 128GB. However, thanks to the recent introduction of 24GB HBM3 stacks, AMD is now going to be able to offer a version of the MI300 with 50% more memory: 192GB. This, along with the additional GPU chiplets found on the MI300X, is meant to make it a powerhouse for processing the largest and most complex of LLMs.
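As a quick sanity check on the capacity math, the figures above reduce to simple per-stack arithmetic. The sketch below uses only the stack count and per-stack capacities stated in this article; nothing in it is an AMD spec sheet.

```python
# Illustrative arithmetic only: the 8-stack count and the 16GB/24GB
# per-stack HBM3 capacities come from the article text above.
STACKS = 8

def local_memory_pool(stack_capacity_gb: int, stacks: int = STACKS) -> int:
    """Total local HBM3 pool in GB for a given per-stack capacity."""
    return stack_capacity_gb * stacks

mi300a_pool = local_memory_pool(16)   # 16GB stacks
mi300x_pool = local_memory_pool(24)   # 24GB stacks
uplift = (mi300x_pool - mi300a_pool) / mi300a_pool

print(mi300a_pool, mi300x_pool, f"{uplift:.0%}")  # 128 192 50%
```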
Under the hood, MI300X is actually a slightly simpler chip than MI300A. AMD has replaced MI300A's trio of CPU chiplets with just two more CDNA 3 GPU chiplets, resulting in a 12-chiplet design overall: 8 GPU chiplets and what appear to be another 4 IO memory chiplets. Otherwise, despite excising the CPU cores (and de-APUing the APU), the GPU-only MI300X looks a lot like the MI300A. And clearly, AMD is aiming to take advantage of the synergy of offering both an APU and a flagship GPU built from the same design.
Raw GPU performance aside (we don't have any hard numbers to speak of right now), a big part of AMD's story with the MI300X is going to be memory capacity. Just offering a 192GB chip on its own is a big deal, given that memory capacity is the constraining factor for the current generation of large language models (LLMs) for AI. As we've seen with recent developments from NVIDIA and others, AI customers are snapping up GPUs and other accelerators as quickly as they can get them, all the while demanding more memory to run even larger models. So being able to offer a massive, 192GB GPU that uses 8 channels of HBM3 memory is going to be a sizable advantage for AMD in the current market, at least once MI300X starts shipping.
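To put that 192GB figure in context, a common back-of-the-envelope rule of thumb is that a model's weights alone need roughly two bytes per parameter at FP16/BF16 precision. The sketch below applies that rule to a few hypothetical model sizes; it is an illustration of why capacity is the constraint, not an AMD figure or a precise deployment requirement.

```python
# Rule-of-thumb estimate: bytes needed to hold a model's weights alone.
# Real deployments also need memory for activations and the KV cache,
# so these are floor figures, not precise requirements.
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory in GB for a model's weights, assuming FP16/BF16 storage."""
    return params_billion * 1e9 * bytes_per_param / 1e9

CAPACITY_GB = 192  # the MI300X's local HBM3 pool

for params in (40, 70, 96):  # hypothetical model sizes, billions of parameters
    need = weight_memory_gb(params)
    fits = "fits" if need <= CAPACITY_GB else "does not fit"
    print(f"{params}B params @ FP16: ~{need:.0f} GB -> {fits} in {CAPACITY_GB}GB")
```

By this rough measure, a single 192GB accelerator can hold the weights of models up to roughly 96 billion FP16 parameters before anything has to be split across multiple chips.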
The MI300 family remains on track to ship at some point later this year. According to AMD, the 128GB MI300A APU is already sampling to customers now. Meanwhile, the 192GB MI300X GPU will be sampling to customers in Q3 of this year.
It also goes without saying that, with this announcement, AMD has solidified that they are doing a flexible XPU design at least 3 years ahead of rival Intel. While Intel scrapped their combined CPU+GPU Falcon Shores product in favor of a pure GPU Falcon Shores, AMD is now slated to offer a flexible CPU+GPU/GPU-only product as soon as the end of this year. In this timeframe, it will be going up against products such as NVIDIA's Grace Hopper superchip, which, although it isn't an APU/XPU either, comes very close by linking up NVIDIA's Grace CPU with a Hopper GPU via a high-bandwidth NVLink. So while we're waiting on further details on MI300X, it should make for a very interesting battle between the two GPU titans.
Overall, the pressure on AMD with regard to the MI300 family is significant. Demand for AI accelerators has been through the roof for much of the past year, and MI300 will be AMD's first opportunity to make a significant play for the market. MI300 won't quite be a make-or-break product for the company, but besides giving them the technical advantage of being the first to ship a single-chip server APU (and the bragging rights that come with it), it will also give them a fresh product to sell into a market that's buying up all the hardware it can get. In short, MI300 is expected to be AMD's license to print money (a la NVIDIA's H100), or so AMD's eager investors hope.
AMD Infinity Architecture Platform
Alongside today's 192GB MI300X news, AMD is also briefly announcing what they are calling the AMD Infinity Architecture Platform. This is an 8-way MI300X design, allowing for up to 8 of AMD's top-end GPUs to be interlinked to work on larger workloads.
As we've seen with NVIDIA's 8-way HGX boards and Intel's own x8 UBB for Ponte Vecchio, an 8-way processor configuration is currently the sweet spot for high-end servers. This is both for physical design reasons (room to place the chips and room to route cooling through them) as well as for the best topologies available to link up a large number of chips without putting too many hops between them. If AMD is to go toe-to-toe with NVIDIA and capture part of the HPC GPU market, then this is one more area where they're going to need to match NVIDIA's hardware offerings.
AMD is calling the Infinity Architecture Platform an "industry-standard" design. According to AMD, they're using an OCP server platform as their base here; and while this suggests that MI300X is using an OAM form factor, we're still waiting on explicit confirmation of this.