LLM Training and Inference with Intel Gaudi 2 AI Accelerators

by sailplease on 1/6/2024, 2:37 AM with 8 comments

"Based on these public on-demand quoted prices from AWS and IDC, we found that the IntelR GaudiR 2 has the best training performance-per-dollar, with an average advantage of 4.8x vs the NVIDIA A100-80GB, 4.2x vs. the NVIDIA A100-40GB, and 5.19x vs. the NVIDIA H100"

by remexre on 1/6/2024, 5:59 PM

Kinda funny that instead of NVLink, they're just using (presumably standard) 100GbE as their interconnect/protocol; wonder if this also lets you wire up larger and more complex topologies of these cards across servers using normal 100GbE switches.