Advanced Micro Devices CEO Lisa Su on Tuesday in San Francisco unveiled a chip that could be a centerpiece of the company’s strategy for artificial intelligence computing, touting its large memory and data throughput for so-called generative AI tasks such as large language models.
The Instinct MI300X, as the part is known, is a follow-on to the previously announced MI300A. The chip is a combination of multiple “chiplets,” individual chips that are joined together in a single package by shared memory and networking links.
Su, onstage before an invitation-only audience at the Fairmont Hotel in downtown San Francisco, referred to the part as a “generative AI accelerator,” and said the GPU chiplets it contains, a family known as CDNA 3, are “designed specifically for AI and HPC [high-performance computing] workloads.”
The MI300X is a “GPU-only” version of the part. The MI300A combines three Zen 4 CPU chiplets with multiple GPU chiplets; in the MI300X, the CPUs are swapped out for two additional CDNA 3 chiplets.
The MI300X raises the transistor count from 146 billion to 153 billion, and the shared DRAM memory is boosted from 128 gigabytes in the MI300A to 192 gigabytes.
The memory bandwidth is boosted from 800 gigabytes per second to 5.2 terabytes per second.
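Those bandwidth figures matter because large-language-model inference is typically memory-bound: generating each new token requires streaming essentially all of the model’s weights through the chip once. A back-of-envelope sketch of that lower bound (the 16-bit-weight assumption is illustrative, not a figure AMD gave):

```python
# Back-of-envelope: lower bound on per-token latency when generation is
# limited purely by memory bandwidth (all weights streamed once per token).
# Assumes 16-bit weights; ignores caching effects and compute time.

BYTES_PER_PARAM = 2  # fp16/bf16

def min_seconds_per_token(params_billions: float, bandwidth_tb_s: float) -> float:
    """Time to stream every weight once at the given memory bandwidth."""
    weight_bytes = params_billions * 1e9 * BYTES_PER_PARAM
    return weight_bytes / (bandwidth_tb_s * 1e12)

print(f"{min_seconds_per_token(40, 0.8) * 1e3:.0f} ms/token at 800 GB/s")  # ~100
print(f"{min_seconds_per_token(40, 5.2) * 1e3:.0f} ms/token at 5.2 TB/s")  # ~15
```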
“Our use of chiplets in this product is very, very strategic,” said Su, because of the ability to mix and match different kinds of compute, swapping CPU for GPU.
Su said the MI300X offers 2.4 times the memory density of Nvidia’s H100 “Hopper” GPU, and 1.6 times the memory bandwidth.
“Generative AI, large language models, have changed the landscape,” said Su. “The need for more compute is growing exponentially, whether you’re talking about training or about inference.”
To demonstrate the need for powerful computing, Su showed the part working on what she said is the most popular large language model at the moment, the open-source Falcon-40B. Language models require more compute as they are built with greater and greater numbers of what are called neural network “parameters.” Falcon-40B comprises 40 billion parameters.
The MI300X, she said, is the first chip powerful enough to run a neural network of that size entirely in memory, rather than having to shuttle data back and forth to and from external memory.
Su demonstrated the MI300X composing a poem about San Francisco using Falcon-40B.
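AMD didn’t show the code behind the demo, but the same task takes only a few lines with the open-source Hugging Face transformers library. A minimal sketch, assuming the public tiiuae/falcon-40b checkpoint and an accelerator with enough memory for the roughly 80 gigabytes of 16-bit weights:

```python
# Minimal sketch: have Falcon-40B write a poem about San Francisco.
# Not AMD's demo code; assumes sufficient accelerator memory (~80 GB
# of 16-bit weights) and the `accelerate` package for device_map.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 2 bytes per parameter
    device_map="auto",           # spread weights across available devices
)

prompt = "Write a poem about San Francisco."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```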
“A single MI300X can run models up to approximately 80 billion parameters” in memory, she said.
“When you compare MI300X to the competition, MI300X offers 2.4 times more memory and 1.6 times more memory bandwidth, and with all of that additional memory capacity, we actually have an advantage for large language models because we can run larger models directly in memory.”
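Su’s numbers check out with simple arithmetic, taking Nvidia’s published H100 SXM specs (80 GB of memory, 3.35 TB/s of bandwidth) as the comparison point; those Nvidia figures are an assumption here, not something AMD cited on stage:

```python
# Sanity-check the on-stage claims. The H100 SXM specs (80 GB, 3.35 TB/s)
# are Nvidia's published figures, assumed here as the comparison point.

MI300X_GB, MI300X_TB_S = 192, 5.2
H100_GB, H100_TB_S = 80, 3.35

# An 80-billion-parameter model at 2 bytes/parameter needs ~160 GB of
# weights, which fits in the MI300X's 192 GB but not in a single H100.
print(80e9 * 2 / 1e9, "GB of weights for an 80B-parameter model")  # 160.0

print(MI300X_GB / H100_GB, "x memory capacity")                # 2.4
print(round(MI300X_TB_S / H100_TB_S, 2), "x memory bandwidth")  # ~1.55
```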
Being able to run the entire model in memory, said Su, means that, “for the largest models, that actually reduces the number of GPUs you need, significantly speeding up the performance, especially for inference, as well as reducing the total cost of ownership.”
“I love this chip, by the way,” enthused Su. “We love this chip.”
“With MI300X, you can reduce the number of GPUs, and as model sizes keep growing, this will become even more important.”
“With more memory, more memory bandwidth, and fewer GPUs needed, we can run more inference jobs per GPU than you could before,” said Su. That can reduce the total cost of ownership for large language models, she said, making the technology more accessible.
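The fewer-GPUs argument is, at bottom, capacity arithmetic: how many accelerators does it take just to hold a model’s weights? A hedged sketch, assuming 16-bit weights and ignoring activations, KV cache, and parallelism overhead (the 175-billion-parameter example is hypothetical, not a model AMD named):

```python
# How many accelerators are needed just to hold a model's weights?
# Assumes 16-bit weights; ignores activations, KV cache, and the
# overheads of splitting a model across devices.
import math

def gpus_needed(params_billions: float, gb_per_gpu: int) -> int:
    weight_gb = params_billions * 2  # 2 bytes per parameter
    return math.ceil(weight_gb / gb_per_gpu)

# Hypothetical 175B-parameter model (350 GB of weights):
print(gpus_needed(175, 192))  # 2 MI300X-class parts (192 GB each)
print(gpus_needed(175, 80))   # 5 H100-class parts (80 GB each)
```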
To compete with Nvidia’s DGX systems, Su unveiled a family of AI computers, the “AMD Instinct Platform.” Its first instance combines eight MI300X GPUs, whose pooled memory (eight times 192 gigabytes) totals 1.5 terabytes of HBM3. The server conforms to the industry-standard Open Compute Project specification.
“For customers, they can use all of this AI compute capability, in memory, in an industry-standard platform that drops right into their existing infrastructure,” said Su.
Unlike the MI300X, which is a GPU only, the existing MI300A goes up against Nvidia’s Grace Hopper combo chip, which pairs Nvidia’s Grace CPU with its Hopper GPU and which the company announced last month is in full production.
The MI300A is being built into the El Capitan supercomputer under construction at the Department of Energy’s Lawrence Livermore National Laboratory, Su noted.
The MI300A is sampling to AMD customers now, and the MI300X will begin sampling to customers in the third quarter of this year, said Su. Both will be in volume production in the fourth quarter, she said.
You can watch a replay of the presentation on the website AMD set up for the news.