Tenstorrent has unveiled its next-generation Wormhole processor for AI workloads, which promises to offer decent performance at a low price. The company currently offers two add-on PCIe cards carrying one or two Wormhole processors, as well as the TT-LoudBox and TT-QuietBox workstations aimed at software developers. Today’s entire launch is aimed at developers rather than at those who will deploy Wormhole boards for commercial workloads.
“It’s always rewarding to get more of our products into developer hands. Releasing development systems with our Wormhole™ card helps developers scale up and work on multi-chip AI software,” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we’re excited that the tape-out and power-on for our second generation, Blackhole, is going very well.”
Each Wormhole processor packs 72 Tensix cores (each featuring five RISC-V cores and supporting various data formats) with 108 MB of SRAM to deliver 262 FP8 TFLOPS at 1 GHz within a 160W thermal design power. A single-chip Wormhole n150 card carries 12 GB of GDDR6 memory with 288 GB/s of bandwidth.
Wormhole processors offer flexible scalability to meet the varying needs of workloads. In a standard workstation setup with four Wormhole n300 cards, the processors can merge to function as a single unit, appearing to the software as one unified, extensive network of Tensix cores. This configuration allows the accelerators to work on the same workload, be divided among four developers, or run up to eight distinct AI models concurrently. A crucial feature of this scalability is that it operates natively, without the need for virtualization. In data center environments, Wormhole processors will scale both inside one machine using PCIe and outside of a single machine using Ethernet.
From a performance standpoint, Tenstorrent’s single-chip Wormhole n150 card (72 Tensix cores at 1 GHz, 108 MB SRAM, 12 GB GDDR6 at 288 GB/s) is capable of 262 FP8 TFLOPS at 160W, while the dual-chip Wormhole n300 board (128 Tensix cores at 1 GHz, 192 MB SRAM, an aggregate 24 GB GDDR6 at 576 GB/s) can offer up to 466 FP8 TFLOPS at 300W (according to Tom’s Hardware).
To put that 466 FP8 TFLOPS at 300W number into context, let’s compare it to what AI market leader Nvidia has to offer at this thermal design power. Nvidia’s A100 does not support FP8, but it does support INT8, and its peak performance is 624 TOPS (1,248 TOPS with sparsity). By contrast, Nvidia’s H100 supports FP8, and its peak performance is a massive 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, which is a big difference from Tenstorrent’s Wormhole n300.
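As a rough, back-of-the-envelope check on that comparison, the Python sketch below divides the peak FP8 figures quoted in this article by the quoted TDP. The dictionary layout and the function name are illustrative choices, not anything published by Tenstorrent or Nvidia. The A100 is left out because its quoted figure is INT8 TOPS rather than FP8 TFLOPS. Peak throughput per watt also says nothing about delivered performance on real models.

```python
# Peak dense FP8 throughput (TFLOPS) and TDP (watts), as quoted in this article.
cards = {
    "Wormhole n150": {"fp8_tflops": 262, "tdp_w": 160},
    "Wormhole n300": {"fp8_tflops": 466, "tdp_w": 300},
    "Nvidia H100": {"fp8_tflops": 1670, "tdp_w": 300},
}


def tflops_per_watt(spec):
    """Return peak FP8 TFLOPS divided by TDP for one card."""
    return spec["fp8_tflops"] / spec["tdp_w"]


# Map each card name to its peak FP8 TFLOPS per watt.
efficiency = {name: tflops_per_watt(spec) for name, spec in cards.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} peak FP8 TFLOPS per watt")
```

On these paper numbers, the H100 is roughly 3.6 times more efficient per watt than the n300 at the same 300W TDP.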
There is a big catch, though. Tenstorrent’s Wormhole n150 is offered for $999, while the n300 is available for $1,399. By contrast, one Nvidia H100 card can retail for $30,000, depending on quantities. Of course, we do not know whether four or eight Wormhole processors can indeed deliver the performance of a single H100, though they would do so at 600W or 1200W TDP, respectively.
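The price gap can be illustrated with a small Python sketch that uses only the peak FP8, price, and TDP figures quoted in this article. It assumes, purely for illustration, that peak throughput adds up linearly across cards. That is an optimistic assumption, since multi-card scaling losses and software maturity are ignored. The variable and function names are hypothetical.

```python
import math

# Figures quoted in this article: peak dense FP8 TFLOPS, price (USD), TDP (W).
H100 = {"fp8_tflops": 1670, "price_usd": 30_000, "tdp_w": 300}
N300 = {"fp8_tflops": 466, "price_usd": 1_399, "tdp_w": 300}


def cards_to_match(target_tflops, card):
    """Number of cards whose summed peak FP8 throughput reaches the target.

    Assumes perfectly linear scaling across cards (an idealization).
    """
    return math.ceil(target_tflops / card["fp8_tflops"])


n300_count = cards_to_match(H100["fp8_tflops"], N300)
total_price = n300_count * N300["price_usd"]
total_tdp = n300_count * N300["tdp_w"]

print(f"n300 cards needed on paper: {n300_count}")
print(f"Combined price: ${total_price:,} vs ${H100['price_usd']:,} for one H100")
print(f"Combined TDP: {total_tdp} W vs {H100['tdp_w']} W for one H100")
```

Under that idealized assumption, four n300 cards (eight Wormhole processors) exceed the H100’s peak FP8 figure for $5,596, but at 1,200W rather than 300W.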
In addition to cards, Tenstorrent offers developers pre-built workstations with four n300 cards inside: the more affordable Xeon-based TT-LoudBox with active cooling and the premium EPYC-powered TT-QuietBox with liquid cooling.
Sources: Tenstorrent, Tom’s Hardware