MLCommons has published the results of its industry-leading AI performance benchmark, MLPerf Training 3.0, in which both the Habana Gaudi2 deep learning accelerator and the 4th Gen Intel Xeon Scalable processor delivered impressive training results.
“The latest MLPerf results published by MLCommons validate the TCO value that Intel Xeon processors and Intel Gaudi deep learning accelerators provide to AI customers,” said Sandra Rivera, Intel executive vice president and general manager of Data Center and AI Group.
She added: “Xeon’s built-in accelerators make it an ideal solution for running volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models and generative AI. Intel’s scalable systems with optimized, easy-to-program open software lower the barrier for customers and partners to deploy a wide range of AI-based solutions across the data center, from the cloud to the intelligent edge.”
Why it matters
The current industry narrative is that generative AI and large language models (LLMs) can only run on Nvidia GPUs. New data shows that Intel's portfolio of AI solutions offers competitive and compelling options for customers looking to break free from closed ecosystems that limit efficiency and scale.
The latest MLPerf Training 3.0 results highlight the performance of Intel products across a range of deep learning models. The maturity of the software and training systems built on Gaudi2 was demonstrated at scale on the large language model GPT-3. Gaudi2 is one of only two semiconductor solutions to submit performance results for the GPT-3 LLM training benchmark.
Gaudi2 also offers substantial cost advantages to customers in both server and system costs. The accelerator's MLPerf-validated performance on GPT-3, computer vision and natural language models, plus upcoming software advancements, makes Gaudi2 an extremely attractive price/performance alternative to Nvidia's H100.
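As a rough illustration of what running on Gaudi2 looks like in practice, the sketch below shows the kind of minimal change typically needed to move a standard PyTorch training step onto a Gaudi ("hpu") device. The `habana_frameworks` module names follow Habana's SynapseAI PyTorch integration; the toy model and data are placeholders, not anything Intel submitted to MLPerf.

```python
import torch
import torch.nn as nn

# Habana's SynapseAI bridge for PyTorch; importing it registers the "hpu" device.
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")

# Toy stand-in model and data; any standard PyTorch model moves over the same way.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 512, device=device)
labels = torch.randint(0, 10, (64,), device=device)

for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    htcore.mark_step()  # flush the lazy-mode graph accumulated so far
    optimizer.step()
    htcore.mark_step()
```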
On the CPU side, the deep learning training performance of 4th Gen Xeon processors with Intel AI engines demonstrated that customers can use Xeon-based servers to build a single, general-purpose AI system for data preprocessing, model training, and deployment, delivering the right mix of AI performance, efficiency, accuracy, and scalability.
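To make that "single system" idea concrete, here is a minimal sketch of CPU training with Intel's AI engines, assuming the Intel Extension for PyTorch (`intel_extension_for_pytorch`) is installed; `ipex.optimize` plus bfloat16 autocast is the usual route to the AMX units on 4th Gen Xeon. The model and data are placeholders, not an MLPerf workload.

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # Intel Extension for PyTorch

# Placeholder model and data; in practice this would be a real training workload.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
# Optimize model and optimizer for Xeon; bfloat16 enables the AMX fast path.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

inputs = torch.randn(32, 1024)
labels = torch.randint(0, 2, (32,))

for _ in range(10):
    optimizer.zero_grad()
    # Autocast to bfloat16 on CPU so matmuls hit the AMX tile instructions.
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):
        loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
```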
Habana Gaudi2 results
Training generative AI and large language models requires clusters of servers to meet enormous computing requirements at scale. These MLPerf results provide tangible validation of Habana Gaudi2's excellent performance and efficient scalability on the most demanding model tested, the 175-billion-parameter GPT-3.
Highlights of the results:
- Gaudi2 delivered an impressive training time on GPT-3: 311 minutes on 384 accelerators.
- Near-linear scaling of 95% from 256 to 384 accelerators on the GPT-3 model (a back-of-the-envelope check of this figure is sketched after this list).
- Excellent training results on computer vision models — ResNet-50 (8 accelerators) and Unet3D (8 accelerators) — and natural language processing models — BERT (8 and 64 accelerators).
- Performance increases of 10% and 4% for the BERT and ResNet models, respectively, compared to the November submission, evidence of the growing maturity of the Gaudi2 software.
- Gaudi2 results were submitted “out of the box,” meaning customers can achieve comparable performance when implementing Gaudi2 on premises or in the cloud.
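For readers who want to sanity-check the 95% figure: scaling efficiency for a fixed workload is commonly computed as the ideal (perfectly linear) time-to-train at the larger scale divided by the measured time. A minimal sketch, using the published 311-minute result at 384 accelerators and a hypothetical 256-accelerator time chosen purely for illustration:

```python
def scaling_efficiency(t_small, n_small, t_large, n_large):
    """Ratio of ideal linear-scaling time to measured time at the larger scale."""
    ideal_t_large = t_small * n_small / n_large  # perfect linear scaling
    return ideal_t_large / t_large

t_384 = 311.0  # published GPT-3 result: 311 minutes on 384 accelerators
t_256 = 443.0  # hypothetical 256-accelerator time, consistent with ~95% efficiency

print(f"{scaling_efficiency(t_256, 256, t_384, 384):.0%}")  # prints "95%"
```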