First APU-based Exascale System Shaping Up For 2024

Lawrence Livermore National Laboratory has received the first components of its upcoming El Capitan supercomputer and begun installing them, the laboratory announced on Wednesday. The system is set to come online in mid-2024 and is expected to deliver performance of over 2 ExaFLOPS.

LLNL’s El Capitan is based on Cray’s Shasta supercomputer architecture and will be built by HPE, just like two other exascale systems in the U.S., Frontier and Aurora. Unlike those first two exascale machines, which use a traditional discrete CPU plus discrete GPU configuration, El Capitan will be the first one based on AMD server-grade APUs that integrate both processor types into a single, highly connected package.

AMD’s Instinct MI300A APU incorporates both CPU and GPU chiplets, offering 24 general-purpose Zen 4 cores, compute GPUs powered by the CDNA 3 architecture, and 128 GB of unified on-package HBM3 memory. AMD has been internally evaluating its Instinct MI300A APU for months, and it appears that AMD and HPE are now ready to start installing the first pieces of hardware that make up El Capitan.

According to pictures released by Lawrence Livermore National Laboratory, its engineers have already put a substantial number of servers into racks. LLNL’s announcement leaves it unclear, however, whether these are “completed” servers with production-quality silicon or pre-production servers that will be filled out with production silicon at a later date. Notably, parts of Aurora were initially assembled with pre-production CPUs, which were only swapped out for Xeon CPU Max chips over the last couple of months. Given the amount of validation work required to stand up a world-class supercomputer, AMD and HPE may be employing a similar strategy here.

“We’ve begun receiving & installing components for El Capitan, first #exascale #supercomputer,” a Tweet by LLNL reads. “While we’re still a ways from deploying it for national security purposes in 2024, it’s exciting to see years of work becoming reality.”

LLNL expects El Capitan to be the fastest supercomputer in the world when it comes online in 2024. With its full specifications still being held back, though, it is not clear how much faster it is on paper compared to the 2 EFLOPS Aurora, let alone in real-world performance. Part of the design goal of AMD’s MI300A APU is to exploit the additional performance and efficiency gains that come from placing CPU and GPU blocks so close together, so it will be interesting to see what the software development teams programming for El Capitan can achieve, especially as they get their software further optimized.

LLNL’s El Capitan is expected to cost $600 million. The system will be used for nuclear weapons simulations and will be crucial to U.S. national security. It replaces Sierra, a supercomputer based on IBM Power9 CPUs and NVIDIA Volta accelerators, and promises to offer performance that is 16 times higher.
