Amazon Web Services this week launched Trainium2, its new accelerator for artificial intelligence (AI) workloads that tangibly increases performance compared to its predecessor, enabling AWS to train foundation models (FMs) and large language models (LLMs) with up to trillions of parameters. In addition, AWS has set itself the ambitious goal of giving its clients access to a massive 65 'AI' ExaFLOPS of performance for their workloads.
The AWS Trainium2 is Amazon's 2nd-generation accelerator designed specifically for training FMs and LLMs. Compared to its predecessor, the original Trainium, it features four times higher training performance, twice the performance per watt, and three times as much memory, for a total of 96GB of HBM. The chip, designed by Amazon's Annapurna Labs, is a multi-tile system-in-package integrating two compute tiles, four HBM memory stacks, and two chiplets whose purpose is undisclosed for now.
Amazon notably does not disclose specific performance numbers for Trainium2, but it says that its Trn2 instances scale out to up to 100,000 Trainium2 chips to deliver up to 65 ExaFLOPS of low-precision compute performance for AI workloads. Working backwards, that would put a single Trainium2 accelerator at roughly 650 TFLOPS. 65 EFLOPS is a level set to be achievable only by the highest-performing upcoming AI supercomputers, such as Jupiter. Such scaling should dramatically reduce the training time for a 300-billion-parameter large language model from months to weeks, according to AWS.
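For readers who want the arithmetic behind that estimate, here is a minimal back-of-envelope sketch in Python; the cluster-level figures come from AWS's announcement, while the per-chip result is our own inference rather than an official specification:

```python
# Back-of-envelope estimate of per-chip throughput implied by AWS's cluster claim.
cluster_exaflops = 65          # claimed low-precision AI performance at full scale
chips_per_cluster = 100_000    # maximum Trainium2 chips per deployment, per AWS

# 1 ExaFLOPS = 1,000,000 TFLOPS
per_chip_tflops = cluster_exaflops * 1_000_000 / chips_per_cluster
print(f"~{per_chip_tflops:.0f} TFLOPS per Trainium2")  # prints "~650 TFLOPS per Trainium2"
```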
Amazon has yet to disclose the full specifications of Trainium2, but we would be surprised if it did not add features on top of what the original Trainium already supports. As a reminder, that co-processor supports FP32, TF32, BF16, FP16, UINT8, and configurable FP8 data formats and delivers up to 190 TFLOPS of FP16/BF16 compute performance.
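Those first-generation numbers also allow a rough cross-check of the estimate above. Assuming the quoted 4x generational uplift applies to raw FP16/BF16 throughput (our assumption; AWS does not say which metric the 4x refers to), the result lands in the same ballpark as the ~650 TFLOPS derived from the cluster figures:

```python
# Cross-check: scale the original Trainium's published FP16/BF16 throughput
# by the claimed 4x uplift (assumes the 4x applies to raw compute throughput).
trainium1_fp16_tflops = 190    # original Trainium, FP16/BF16, per AWS
claimed_uplift = 4             # "four times higher training performance"

trainium2_estimate = trainium1_fp16_tflops * claimed_uplift
print(f"~{trainium2_estimate} TFLOPS")  # prints "~760 TFLOPS", close to the ~650 TFLOPS above
```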
Perhaps more important than the raw performance numbers of a single AWS Trainium2 accelerator is that Amazon has partners, such as Anthropic, that are ready to deploy it.
"We are working closely with AWS to develop our future foundation models using Trainium chips," said Tom Brown, co-founder of Anthropic. "Trainium2 will help us build and train models at a very large scale, and we expect it to be at least 4x faster than first-generation Trainium chips for some of our key workloads. Our collaboration with AWS will help organizations of all sizes unlock new possibilities as they use Anthropic's state-of-the-art AI systems together with AWS's secure, reliable cloud technology."