MLCommons benchmark results show Intel AI gains

MLCommons has published the results of its industry-leading AI performance benchmark, MLPerf Training 3.0, in which both the Habana Gaudi2 deep learning accelerator and the 4th Gen Intel Xeon Scalable processor delivered impressive training results.

4th generation Intel Xeon Scalable processor, codenamed Sapphire Rapids. (Credit: Intel Corporation)

“The latest MLPerf results published by MLCommons validate the TCO value that Intel Xeon processors and Intel Gaudi deep learning accelerators provide to AI customers,” said Sandra Rivera, Intel executive vice president and general manager of Data Center and AI Group.

She added: “Xeon's built-in accelerators make it an ideal solution for running volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models and generative AI. Intel’s scalable systems with optimized, easy-to-program open software lower the barrier for customers and partners to deploy a wide range of AI-based solutions across the data center, from the cloud to the intelligent edge.”

Why it matters

The current industry narrative is that generative AI and large language models (LLMs) can only run on Nvidia GPUs. New data shows that Intel's portfolio of AI solutions offers competitive and compelling options for customers looking to break free from closed ecosystems that limit efficiency and scale.

The latest MLPerf Training 3.0 results highlight the performance of Intel products across a range of deep learning models. The maturity of Gaudi2-based software and training systems was demonstrated at scale on GPT-3, the benchmark's large language model. Gaudi2 is one of only two semiconductor solutions to submit performance results to the GPT-3 LLM training benchmark.

Gaudi2 also offers customers substantial cost advantages, in both server and system costs. The accelerator's MLPerf-validated performance on GPT-3, computer vision and natural language models, plus upcoming software advances, makes Gaudi2 an extremely attractive price/performance alternative to Nvidia's H100.

On the CPU side, the deep learning training performance of 4th generation Xeon processors with Intel AI engines demonstrated that customers can build, with Xeon-based servers, a single universal AI system for data preprocessing, model training and deployment, delivering the right mix of AI performance, efficiency, accuracy and scalability.

The results of Habana Gaudi2

Training generative AI and large language models requires clusters of servers to meet enormous computing requirements at scale. These MLPerf results provide tangible validation of Habana Gaudi2's excellent performance and efficient scalability on the most demanding model tested, the 175 billion parameter GPT-3.

Highlights of the results:

  • Gaudi2 showed an impressive training time on GPT-3: 311 minutes on 384 accelerators.
  • 95% near-linear scaling from 256 to 384 accelerators on the GPT-3 model (roughly a 1.43x speedup against the ideal 1.5x from 1.5x more accelerators).
  • Excellent training results on computer vision models (ResNet-50 on 8 accelerators and Unet3D on 8 accelerators) and natural language processing models (BERT on 8 and 64 accelerators).
  • Performance increases of 10% and 4%, respectively, for the BERT and ResNet models compared to the November submission, evidence of the growing maturity of the Gaudi2 software.
  • Gaudi2 results were submitted “out of the box,” meaning customers can achieve comparable performance results when implementing Gaudi2 on premises or in the cloud, as sketched below.
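
To ground that “out of the box” claim, here is a minimal sketch of a single PyTorch training step on a Gaudi device. It assumes Habana's SynapseAI PyTorch bridge (the habana_frameworks package) is installed on a Gaudi system; the tiny model and shapes are illustrative stand-ins, not the benchmark workloads.

    import torch
    import torch.nn as nn
    import habana_frameworks.torch.core as htcore  # Habana bridge; registers the "hpu" device

    device = torch.device("hpu")

    # Toy stand-in model; the MLPerf runs trained GPT-3, BERT, ResNet-50 and Unet3D.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    inputs = torch.randn(64, 512).to(device)
    labels = torch.randint(0, 10, (64,)).to(device)

    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    htcore.mark_step()  # in lazy mode, flushes the accumulated graph for execution
    optimizer.step()
    htcore.mark_step()
    print(loss.item())

The notable design point is what is absent: aside from the device name and the mark_step() calls, this is ordinary PyTorch, which is what lets results transfer between on-premises and cloud deployments.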

The results of 4th generation Xeon processors

Xeon was the only CPU featured among numerous alternative solutions, and the MLPerf results prove that Intel Xeon processors give enterprises out-of-the-box capabilities to deploy AI on general-purpose systems, avoiding the cost and complexity of introducing dedicated AI systems.

The small number of customers who intermittently train large models from scratch can do so with general-purpose CPUs, often on the Intel-based servers they already deploy to run their businesses. Most, however, will use pre-trained models and fine-tune them with their own smaller, curated datasets. Intel previously published results demonstrating that this fine-tuning can be accomplished in just minutes using Intel AI software and industry-standard open source software.
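
To make that concrete, below is a minimal sketch of a few bf16 fine-tuning steps on a Xeon CPU using Intel Extension for PyTorch (ipex). The tiny model and random batch are placeholders for a real pre-trained model and a curated dataset; treat this as an illustration of the workflow, not of the setup Intel benchmarked.

    import torch
    import torch.nn as nn
    import intel_extension_for_pytorch as ipex  # Intel's open source CPU extension for PyTorch

    # Placeholder for a pre-trained model being fine-tuned on a small dataset.
    model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 2))
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()

    # ipex.optimize applies operator fusion and bf16-friendly weight layouts;
    # on 4th Gen Xeon, the bf16 math is executed by Intel AMX through oneDNN.
    model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

    inputs = torch.randn(32, 768)
    labels = torch.randint(0, 2, (32,))

    for _ in range(3):  # a handful of fine-tuning steps
        optimizer.zero_grad()
        with torch.autocast("cpu", dtype=torch.bfloat16):
            loss = nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()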

Highlights of the MLPerf results:

  • In the closed division, 4th generation Xeons could train the BERT and ResNet-50 models in less than 50 minutes (47.93 minutes) and less than 90 minutes (88.17 minutes), respectively.
  • With BERT in the open division, the results show that Xeon was able to train the model in about 30 minutes (31.06 minutes) when scaling out to 16 nodes.
  • For the larger RetinaNet model, Xeon was able to achieve a time of 232 minutes on 16 nodes, giving customers the flexibility to use off-peak Xeon cycles to train their models in the morning, over lunch, or overnight.
  • The 4th Gen Xeon with Intel Advanced Matrix Extensions (Intel® AMX) delivers significant out-of-the-box performance improvements that span multiple frameworks, end-to-end data science tools, and a broad ecosystem of smart solutions (see the sketch after this list).

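As a hedged way to check the AMX point on a 4th Gen Xeon of your own, the sketch below assumes Linux and a oneDNN-backed PyTorch build; which kernel a given op dispatches to depends on the build, so the verbose log, not this snippet, is the ground truth.

    import os
    os.environ["ONEDNN_VERBOSE"] = "1"  # must be set before the first oneDNN kernel runs
    import torch

    # 1) On Linux, AMX support is advertised as CPU feature flags.
    with open("/proc/cpuinfo") as cpuinfo:
        flags = cpuinfo.read()
    print("AMX advertised:", all(flag in flags for flag in ("amx_tile", "amx_bf16")))

    # 2) Run a bf16 layer; on 4th Gen Xeon the oneDNN verbose log should report
    #    an avx512_core_amx_bf16 kernel for the underlying primitive.
    layer = torch.nn.Linear(1024, 1024).to(torch.bfloat16)
    x = torch.randn(256, 1024, dtype=torch.bfloat16)
    _ = layer(x)

The point of the bullet above is that none of this requires code changes: oneDNN selects AMX kernels at runtime when the hardware exposes them.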