Wednesday, January 29th 2025

Reports Suggest DeepSeek Running Inference on Huawei Ascend 910C AI GPUs

Huawei's Ascend 910C AI chip was positioned as one of the better Chinese-developed alternatives to NVIDIA's H100 accelerator—reports from last autumn suggested that samples were being sent to highly important customers. The likes of Alibaba, Baidu, and Tencent have long relied on Team Green enterprise hardware for all manner of AI crunching, but trade sanctions have severely limited the supply and potency of Western-developed AI chips. NVIDIA's region-specific B20 "Blackwell" accelerator is due for release this year, but industry watchdogs reckon that the Ascend 910C AI GPU is a strong rival. The latest online rumblings have pointed to another major Huawei customer—DeepSeek—having Ascend silicon in their back pockets.

DeepSeek's recent unveiling of its R1 open-source large language model has disrupted international AI markets. A lot of press attention has focused on DeepSeek's CEO stating that his team can access up to 50,000 NVIDIA H100 GPUs, but many have not looked into the company's (alleged) pool of natively-made chips. Yesterday, Alexander Doria—an LLM enthusiast—shared an interesting insight: "I feel this should be a much bigger story—DeepSeek has trained on NVIDIA H800, but is running inference on the new home Chinese chips made by Huawei, the 910C." Experts believe that there will be a plentiful supply of Ascend 910C GPUs—estimates from last September posit that 70,000 chips (worth around $2 billion) were in the mass production pipeline. Additionally, industry whispers suggest that Huawei is already working on a—presumably, even more powerful—successor.
Sources: Alexander Doria Tweet, Wccftech

10 Comments on Reports Suggest DeepSeek Running Inference on Huawei Ascend 910C AI GPUs

#1
ZoneDymo
"Alexander Doria—an LLM enthusiast" ah so literally the most boring person in the world
#2
Chaitanya
Shouldn't it be NPU instead of GPU, since no graphics are being processed with LLMs?
#3
tommo1982
Would their choice of the Nvidia assembly language for R1 be influenced by the hardware? I don't know much about AI training in general. It's interesting when you read that the training was done on Huawei's hardware, however.
#4
OSdevr
tommo1982: Would their choice of the Nvidia assembly language for R1 be influenced by the hardware? I don't know much about AI training in general. It's interesting when you read that the training was done on Huawei's hardware, however.
Training done on Nvidia, inference done on Huawei.
#5
igormp
That claim is misleading. The picture that served as the source covers only the distilled, smaller models, some of which you can even run on your smartphone.
Chaitanya: Shouldn't it be NPU instead of GPU, since no graphics are being processed with LLMs?
NPUs are mostly meant for inference on quantized types (like INT4 and INT8); they are not that useful for training, or for inference with higher-precision weights.
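To illustrate why: quantized inference boils down to integer multiply-accumulate plus a single dequantization step at the end, which is exactly the narrow workload NPUs are built around. A minimal CUDA sketch (my own toy example, not any vendor's actual pipeline):

    #include <cstdint>
    #include <cstdio>
    #include <cuda_runtime.h>

    // Toy INT8 dot product: integer multiply-accumulate, then one
    // dequantization step. This narrow pattern is what NPUs accelerate.
    __global__ void dot_int8(const int8_t* w, const int8_t* x, int n,
                             float scale, float* out) {
        int acc = 0;
        for (int i = 0; i < n; ++i)
            acc += int(w[i]) * int(x[i]);  // integer MAC
        *out = acc * scale;                // back to float once, at the end
    }

    int main() {
        const int n = 4;
        int8_t hw[n] = {10, -3, 7, 2}, hx[n] = {1, 4, -2, 5};
        int8_t *dw, *dx; float *dout, hout;
        cudaMalloc(&dw, n); cudaMalloc(&dx, n); cudaMalloc(&dout, sizeof(float));
        cudaMemcpy(dw, hw, n, cudaMemcpyHostToDevice);
        cudaMemcpy(dx, hx, n, cudaMemcpyHostToDevice);
        dot_int8<<<1, 1>>>(dw, dx, n, 0.01f, dout);  // scale is arbitrary here
        cudaMemcpy(&hout, dout, sizeof(float), cudaMemcpyDeviceToHost);
        printf("dequantized result: %f\n", hout);    // (-6) * 0.01 = -0.06
        return 0;
    }

Training, by contrast, needs high-precision gradients and weight updates, which is why it stays on GPUs.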
tommo1982: Would their choice of the Nvidia assembly language for R1 be influenced by the hardware? I don't know much about AI training in general. It's interesting when you read that the training was done on Huawei's hardware, however.
They used PTX only for a small portion of the pipeline. From their own paper:
Specifically, we employ customized PTX (Parallel Thread Execution) instructions and auto-tune the communication chunk size, which significantly reduces the use of the L2 cache and the interference to other SMs.
Most of the code was likely in CUDA.
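For anyone curious what "customized PTX instructions" inside CUDA code actually look like, here is a minimal sketch (my own generic illustration, not DeepSeek's code). The inline ld.global.cg load caches data in L2 while bypassing L1, one of the cache-control knobs PTX exposes that plain CUDA C++ does not:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread loads one float via a hand-written PTX instruction.
    __global__ void copy_bypass_l1(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v;
            // ld.global.cg: cache the load in L2 only, skipping L1
            asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(in + i));
            out[i] = v;
        }
    }

    int main() {
        const int n = 256;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = float(i);
        copy_bypass_l1<<<1, n>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("out[42] = %f\n", out[42]);  // expect 42.0
        return 0;
    }

The rest of the kernel is ordinary CUDA, which is the point: PTX is dropped in only where the compiler's defaults aren't good enough.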
Also keep in mind that OP is talking about inference, not training.
#6
tommo1982
OSdevr: Training done on Nvidia, inference done on Huawei.
igormp: That claim is misleading. [...] Also keep in mind that OP is talking about inference, not training.
I see. So they're limited to US hardware in some way.
I had to check what inference and training mean with regard to AI. I thought they were the same; maybe English being my second language plays a role :)
#7
OSdevr
tommo1982: I see. So they're limited to US hardware in some way. I had to check what inference and training mean with regard to AI. I thought they were the same; maybe English being my second language plays a role :)
Basically, training is creating the AI model (based on an architectural blueprint), and inference is running an AI model that already exists.
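A toy sketch of the distinction (my own example, nothing to do with LLM-scale work): training iteratively adjusts a parameter against known answers; inference just applies the frozen parameter to new input.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Inference: apply an already-trained weight. Nothing is learned here.
    __global__ void predict(float w, float x, float* y) { *y = w * x; }

    int main() {
        // Training: fit y = w*x to examples of y = 2x by gradient descent.
        float xs[4] = {1, 2, 3, 4}, ys[4] = {2, 4, 6, 8};
        float w = 0.0f, lr = 0.02f;
        for (int step = 0; step < 500; ++step)
            for (int i = 0; i < 4; ++i) {
                float err = w * xs[i] - ys[i];  // prediction error
                w -= lr * err * xs[i];          // gradient step on squared error
            }
        printf("trained weight: %f (approaches 2.0)\n", w);

        // Inference: the frozen weight is applied to a new input.
        float *dy, hy;
        cudaMalloc(&dy, sizeof(float));
        predict<<<1, 1>>>(w, 10.0f, dy);
        cudaMemcpy(&hy, dy, sizeof(float), cudaMemcpyDeviceToHost);
        printf("inference on x = 10: %f\n", hy);  // ~20.0
        return 0;
    }

Scale that single weight up to billions and you get why training and inference can sit on entirely different hardware.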
#8
MacZ
One reassures oneself however one can.

Training is more intensive, but you perform inference a lot more times.

The hardware functions required for neural networks are basic mathematical operations on huge matrices and vectors. This is neither rocket science nor secret sauce.
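For example, a dense layer's forward pass is essentially a matrix-vector product, which a naive CUDA kernel can state in a few lines (toy illustration; real accelerators add tiling, tensor cores, and fused kernels, but the math is this):

    #include <cstdio>
    #include <cstring>
    #include <cuda_runtime.h>

    // One thread per output row: y = W * x is a multiply-accumulate loop.
    __global__ void matvec(const float* W, const float* x, float* y,
                           int rows, int cols) {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r < rows) {
            float acc = 0.0f;
            for (int c = 0; c < cols; ++c)
                acc += W[r * cols + c] * x[c];
            y[r] = acc;
        }
    }

    int main() {
        const int rows = 2, cols = 3;
        float hW[rows * cols] = {1, 0, 0, 0, 1, 0};  // selects x[0] and x[1]
        float hx[cols] = {5, 7, 9};
        float *W, *x, *y;
        cudaMallocManaged(&W, sizeof(hW));
        cudaMallocManaged(&x, sizeof(hx));
        cudaMallocManaged(&y, rows * sizeof(float));
        memcpy(W, hW, sizeof(hW));
        memcpy(x, hx, sizeof(hx));
        matvec<<<1, rows>>>(W, x, y, rows, cols);
        cudaDeviceSynchronize();
        printf("y = [%f, %f]\n", y[0], y[1]);  // expect [5, 7]
        return 0;
    }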

The performance difference between a 3/4 nm chip and a 7 nm one (which, as I understand it, the Chinese are able to produce) does not seem that important.

And really, chasing smaller nodes and larger numbers of GPUs is just trying to brute-force the problem.

DeepSeek just demonstrated that cleverness and algorithmic optimization provide much more benefit than brute force.

That's why I think the use of NVIDIA chips is related more to the software stack around them (CUDA) than to the hardware itself: it makes it easy to add, remove, and change things while developing the model. Inference, once you have the model with its layers and weights, is straightforward and does not need such flexible development.
#9
kondamin
MacZ: DeepSeek just demonstrated that cleverness and algorithmic optimization provide much more benefit than brute force.
That's the joke: now that the cat is out of the bag, everyone can use it, and brute force matters again.
The market reaction to this is dumb and extremely short-sighted.

The problem with the bubble still remains: how are they going to make money with it? It's not 60 queries a month for $20.
#10
MacZ
kondamin: That's the joke: now that the cat is out of the bag, everyone can use it, and brute force matters again.
The market reaction to this is dumb and extremely short-sighted.

The problem with the bubble still remains: how are they going to make money with it? It's not 60 queries a month for $20.
You have to pay for brute force, and that may prove difficult if you are competing with a free model you can run at home.

Hence all the media chatter trying to smear it or fearmonger over it.

Also, you can always find better optimizations, and that is not something the US can somehow restrict China from developing.

And the fact that it was China that did it, while the US was stuck on brute force, is quite a problem, I think.