Monday, May 8th 2023

NVIDIA A800 China-Tailored GPU Performance within 70% of A100

The recent growth in demand for training Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) has sparked the interest of many companies in investing in the GPU solutions used to train these models. However, countries like China are subject to US sanctions, and NVIDIA has had to create custom models that meet US export regulations. Two such GPUs, the H800 and A800, are cut-down versions of the original H100 and A100, respectively. We previously reported on the H800; however, it remained as mysterious as the A800 that we are talking about today. Thanks to MyDrivers, we now have information that the A800's performance lands within 70% of the regular A100.

The regular A100 GPU manages 9.7 TeraFLOPS of FP64, 19.5 TeraFLOPS of FP64 Tensor, and up to 624 BF16/FP16 TeraFLOPS with sparsity. Rough napkin math suggests that 70% of the original performance (a 30% cut) would equal 6.8 TeraFLOPS of FP64, 13.7 TeraFLOPS of FP64 Tensor, and 437 BF16/FP16 TeraFLOPS with sparsity. MyDrivers notes that the A800 can be had for 100,000 Yuan, translating to about 14,462 USD at the time of writing. This is not the most capable GPU that Chinese companies can acquire, as the H800 exists; however, we don't have any information about its performance for now.
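For reference, here is a minimal Python sketch of that napkin math. The baseline A100 figures and the 0.70 scaling factor come from the article; the dictionary keys and script structure are purely illustrative, not an official NVIDIA spec sheet.

```python
# Napkin math: estimated A800 throughput at 70% of the A100 figures.
# Baseline numbers and the 0.70 factor are taken from the article above;
# everything else here is an illustrative sketch, not a confirmed spec.

A100_TFLOPS = {
    "FP64": 9.7,
    "FP64 Tensor": 19.5,
    "BF16/FP16 w/ sparsity": 624.0,
}

SCALE = 0.70  # "performance within 70% of the regular A100"

for precision, tflops in A100_TFLOPS.items():
    print(f"A800 {precision}: ~{tflops * SCALE:.2f} TFLOPS")

# Expected output (rounds to the article's 6.8 / 13.7 / 437 figures):
#   A800 FP64: ~6.79 TFLOPS
#   A800 FP64 Tensor: ~13.65 TFLOPS
#   A800 BF16/FP16 w/ sparsity: ~436.80 TFLOPS
```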
Source: MyDrivers

8 Comments on NVIDIA A800 China-Tailored GPU Performance within 70% of A100

#1
Verpal
Honestly, the better Chinese models like GLM are very efficient and will run on a potato; the A800 is probably more than enough for years to come.
#2
lemonadesoda
I don't see how this sanction is effective. The cut-down models will be cheaper. Buy two instead of one. Get 70+70=140% performance unless the drivers force single implementations only, in which case, virtualize.
#3
Kohl Baas
lemonadesoda said: I don't see how this sanction is effective. The cut-down models will be cheaper. Buy two instead of one. Get 70+70=140% performance unless the drivers force single implementations only, in which case, virtualize.
Standalone unit price is secondary. Power consumption and compute density are much more important, since you plan for years, if not decades, with servers built around these.

Having 2 of these instead of 1 in a blade that can fit like 4 means your whole array of these units will be ~1.5 times bigger. That also means more boards, more cooling, and more power supplies too.
#4
lemonadesoda
I still don't see it as an effective sanction. A sanction needs to prohibit or scale back by an order of magnitude. BTW, I don't agree with the sanction, BUT if you are going to do it, do it properly.
#5
Fluffmeister
Yeah sadly these are more than capable of computing various ways of flattening Taiwan.
#6
kondamin
lemonadesoda said: I still don't see it as an effective sanction. A sanction needs to prohibit or scale back by an order of magnitude. BTW, I don't agree with the sanction, BUT if you are going to do it, do it properly.
You can do that with a party that can't bite back. In this case, there are a lot of things China could do to make things really hard for the Biden administration.

This boils down to nothing but a bit of an export tax.

That causes a whole lot of extra pollution, which is kinda funny for this admin to do.
#7
Vayra86
lemonadesoda said: I don't see how this sanction is effective. The cut-down models will be cheaper. Buy two instead of one. Get 70+70=140% performance unless the drivers force single implementations only, in which case, virtualize.
Power is cost, and if you have large-scale datacenters you will want highly efficient stuff.
#8
lemonadesoda
Vayra86 said: Power is cost, and if you have large-scale datacenters you will want highly efficient stuff.
Not sure where it says the H/A800 has poorer performance/watt than the H/A100 series.