Friday, April 28th 2023

NVIDIA H100 Compared to A100 for Training GPT Large Language Models

Apr 28th, 2023 01:59

NVIDIA's H100 has recently become available through Cloud Service Providers (CSPs), and it was only a matter of time before someone benchmarked it against the previous-generation A100 GPU. Today, thanks to benchmarks from MosaicML, a startup led by Naveen Rao, former CEO of Nervana and former GM of Artificial Intelligence (AI) at Intel, we have a comparison of the two GPUs along with an interesting insight into training cost. MosaicML trained Generative Pre-trained Transformer (GPT) models of various sizes using the bfloat16 and FP8 floating-point formats. All training ran on CoreWeave cloud GPU instances.

In terms of performance, the NVIDIA H100 delivered a 2.2x to 3.3x speedup over the A100. The more interesting finding, however, concerns the cost of running these GPUs in the cloud. CoreWeave prices the H100 SXM at $4.76/hr/GPU and the A100 80 GB SXM at $2.21/hr/GPU. Although the H100 costs roughly 2.15 times as much per hour, its speedup more than offsets the higher rate: a model trains in less time and at a lower total cost. This makes the H100 the more attractive option for researchers and companies training Large Language Models (LLMs), despite its higher hourly price. Below, you can see tables comparing the two GPUs in training time, speedup, and training cost.

Source: MosaicML
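The cost argument above comes down to simple arithmetic: if both runs use the same number of GPUs, the H100 run's cost relative to the A100 run is the hourly price ratio divided by the speedup. The sketch below uses only the CoreWeave prices and the 2.2x/3.3x speedups quoted in the article. The function name `relative_training_cost` is hypothetical, and the sketch assumes that cost equals price per GPU-hour times GPU-hours, ignoring storage, networking, and other charges.

```python
# Hourly CoreWeave prices quoted in the article (USD per GPU-hour).
H100_PRICE = 4.76
A100_PRICE = 2.21


def relative_training_cost(speedup: float,
                           h100_price: float = H100_PRICE,
                           a100_price: float = A100_PRICE) -> float:
    """Return the H100 training cost as a fraction of the A100 training cost.

    Hypothetical helper. Assumes cost = price per GPU-hour * GPU-hours and the
    same GPU count on both systems, so H100 hours = A100 hours / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    price_ratio = h100_price / a100_price  # how many times pricier per hour
    return price_ratio / speedup           # < 1.0 means the H100 run is cheaper


if __name__ == "__main__":
    print(f"Price ratio: {H100_PRICE / A100_PRICE:.3f}x")
    # Low and high ends of the speedup range reported by MosaicML.
    for speedup in (2.2, 3.3):
        rel = relative_training_cost(speedup)
        print(f"{speedup}x speedup -> H100 run costs {rel:.1%} of the A100 run")
```

Under these assumptions, the price ratio is about 2.154x, so the H100 run costs roughly 97.9% of the A100 run at the low end of the speedup range and roughly 65.3% at the high end.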


2 Comments on NVIDIA H100 Compared to A100 for Training GPT Large Language Models

kondamin

At those prices, isn't it cheaper for researchers to buy the actual systems?

Scrizz

kondamin wrote: At those prices, isn't it cheaper for researchers to buy the actual systems?

Not really. The electricity, HVAC, maintenance, etc. would cost more than the systems themselves.
Having your own little data center is expensive.

I also forgot to mention the real estate.
