Monday, July 11th 2022

NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

When designing integrated circuits, engineers aim to produce an efficient design that is easy to manufacture. Keeping the circuit size down also keeps the cost of manufacturing it down. NVIDIA has detailed on its technical blog a technique that uses an artificial intelligence model called PrefixRL. Using deep reinforcement learning, NVIDIA's PrefixRL model outperforms traditional EDA (Electronic Design Automation) tools from major vendors such as Cadence, Synopsys, and Siemens/Mentor. EDA vendors typically offer their own in-house AI solutions for silicon placement and routing (PnR); however, NVIDIA's PrefixRL approach seems to be doing wonders in the company's workflow.

The goal of PrefixRL is a deep reinforcement learning model that keeps latency on par with the EDA tool's result while achieving a smaller die area. According to the technical blog, the latest Hopper H100 GPU architecture uses 13,000 instances of arithmetic circuits designed by the PrefixRL AI model. NVIDIA's model outputs circuits that are 25% smaller than comparable EDA output while achieving similar or better latency. Below, you can compare a 64-bit adder design made by PrefixRL with the same design made by an industry-leading EDA tool.
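The article's framing (hold latency at the level of the EDA result while shrinking die area) maps naturally onto a reinforcement learning reward that trades area against delay. The sketch below is purely illustrative, not NVIDIA's implementation: a toy tabular Q-learning agent toggles hypothetical optional prefix nodes in an abstract adder, and the area and delay functions are invented stand-ins for real synthesis feedback (the actual PrefixRL model is a deep reinforcement learning agent, per the article).

```python
# Purely illustrative sketch, not NVIDIA's code: a tabular Q-learning agent
# toggles optional "prefix nodes" in an abstract adder design and is rewarded
# for lowering a weighted combination of area and delay. The formulas below
# are toy stand-ins for real synthesis results.
import random

N_OPTIONAL = 6                     # optional prefix nodes the agent may toggle
BASE_AREA, BASE_DELAY = 10.0, 8.0  # invented numbers for the starting design
W_AREA = 0.5                       # area-vs-delay trade-off weight

def area(state):
    # Each added prefix node costs one unit of area.
    return BASE_AREA + len(state)

def delay(state):
    # Each added node shortens the carry chain, with diminishing returns.
    return BASE_DELAY - sum(1.0 / (k + 1) for k in range(len(state)))

def cost(state):
    return W_AREA * area(state) + (1.0 - W_AREA) * delay(state)

def step(state, action):
    # Toggle one optional node; reward is the improvement in weighted cost.
    nxt = state ^ frozenset([action])
    return nxt, cost(state) - cost(nxt)

Q = {}                             # tabular stand-in for a Q network
alpha, gamma, eps = 0.2, 0.9, 0.2
actions = range(N_OPTIONAL)

for episode in range(2000):
    state = frozenset()            # start from the plain, node-free design
    for _ in range(N_OPTIONAL):
        if random.random() < eps:
            a = random.choice(list(actions))
        else:
            a = max(actions, key=lambda x: Q.get((state, x), 0.0))
        nxt, r = step(state, a)
        best_next = max(Q.get((nxt, x), 0.0) for x in actions)
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (r + gamma * best_next - old)
        state = nxt

# Greedy rollout with the learned values.
state = frozenset()
for _ in range(N_OPTIONAL):
    a = max(actions, key=lambda x: Q.get((state, x), 0.0))
    state, _ = step(state, a)
print(f"chosen nodes: {sorted(state)}, area={area(state):.1f}, delay={delay(state):.2f}")
```

Sweeping the W_AREA weight in this toy would trace out an area/delay trade-off of the kind circuit designers have to navigate; the headline result corresponds to picking designs that cut area without giving up latency.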
Training such a model is a compute-intensive task. NVIDIA reports that training the agent to design a 64-bit adder circuit took 256 CPU cores for every GPU and 32,000 GPU hours. To handle this kind of industrial-scale reinforcement learning, the company developed Raptor, an in-house distributed reinforcement learning platform that takes unique advantage of NVIDIA hardware; an overview of how it operates is shown below. Overall, the system is quite complex and requires a lot of hardware and input, but the results pay off with smaller and more efficient GPUs.
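NVIDIA has not published Raptor or its interfaces, so the following is only a generic sketch of the split the paragraph implies: many CPU-side workers doing the slow evaluation work and feeding results to a single learner that stands in for GPU-side training. Every name, number, and the queue-based layout here is an assumption made for illustration.

```python
# Generic producer/consumer sketch, not Raptor's API: CPU workers emulate the
# slow circuit-evaluation work (synthesis, area/delay measurement) and push
# results to a single learner standing in for the GPU-side training process.
import multiprocessing as mp
import random
import time

def cpu_worker(worker_id, queue, n_rollouts=5):
    # Pretend each rollout is a synthesized candidate circuit with measured
    # area and delay; the numbers here are random placeholders.
    for _ in range(n_rollouts):
        time.sleep(0.01)
        queue.put({
            "worker": worker_id,
            "area": random.uniform(90.0, 110.0),
            "delay": random.uniform(0.9, 1.1),
        })
    queue.put(None)  # sentinel: this worker is finished

def learner(queue, n_workers):
    # Stand-in for the GPU-side trainer that would run gradient updates on
    # whatever experience the workers deliver.
    finished, seen = 0, 0
    while finished < n_workers:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        seen += 1
        # ...a real learner would update the policy/Q network here...
    print(f"learner consumed {seen} rollouts from {n_workers} workers")

if __name__ == "__main__":
    n_workers = 4
    q = mp.Queue()
    workers = [mp.Process(target=cpu_worker, args=(i, q)) for i in range(n_workers)]
    for w in workers:
        w.start()
    learner(q, n_workers)
    for w in workers:
        w.join()
```

The 256-CPU-cores-per-GPU ratio the article cites suggests that, in practice, the evaluation side rather than the GPU training dominates the compute budget.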
Source: NVIDIA

43 Comments on NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

#1
bug
It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.
#2
Richards
This will help get higher clock speeds
#3
ModEl4
I wonder who is going to use a similar A.I.-enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc.?)
#4
bug
ModEl4: I wonder who is going to use a similar A.I.-enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc.?)
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
#5
bonehead123
Richards: This will help get higher clock speeds
And don't forget the most important aspect (for Ngreedia).....

HIGHER PRICES !


Yes I know R&D aint cheap, but this all sounds like just ANUTHA way to justify keeping GPU prices & profits at scalper/pandemic levels, which they have become addicted to like crackheads & their rocks... they always need/want moar and can't quit even if they wanted to....
#6
Nanochip
Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months' time.
#7
PapaTaipei
Good shit. Now release the new GPUs already.
#8
bug
PapaTaipei: Good shit. Now release the new GPUs already.
This is for Hopper. Hopper is for datacenter.
#9
GuiltySpark
Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?
#10
bug
GuiltySpark: Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?
It probably influences the path taken (i.e. time spent) to get to the result.
Now that you've mentioned it, I'm starting to wonder why reinforcement learning is a better fit than genetic algorithms. Start with a solution and mutate it till it gets significantly better. There's obviously an explanation for that (I'm not smarter than an entire department of engineers), I just don't know it.
#11
Valantar
bug: It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.
That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".
#12
Fourstaff
Ampere has >50 bln transistors; any efficiency gains will be very welcome. I will not be surprised if this translates to a profit-margin gain vs. the competition.
#13
bug
Valantar: That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".
It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" about 13% shorter themselves (√0.75 ≈ 0.87).
A case of intuition playing tricks on us.
Fourstaff: Ampere has >50 bln transistors; any efficiency gains will be very welcome. I will not be surprised if this translates to a profit-margin gain vs. the competition.
If they're smart, they'll just split the difference.
#14
Valantar
bug: It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" about 13% shorter themselves (√0.75 ≈ 0.87).
A case of intuition playing tricks on us.
That's true, but center alignment makes that look like even less by distributing the shrinkage visually.
#15
ModEl4
bug: I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
Did any other chip from Apple, Intel, Qualcomm, or AMD use a similar A.I.-enhanced method (deep reinforcement learning)?
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»
#16
ppn
Yeah, we can see that even before this they had much better density, so AMD on 5 nm may end up slightly better than Nvidia on 7 nm:
AMD on 7 nm - 51.3 MTr/mm²
Nvidia on 7 nm - 65.6 MTr/mm²
Nvidia on 4/5 nm - 98.2 MTr/mm²
#17
GreiverBlade
Nanochip: Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months' time.
Raptor? Raptor-Lake? eh?

I heard Raptor from AMD Gaming Evolved, oh wait ... no, it was Raptr (and it died in 2017 :laugh: )

oh well, I guess one more O does not hurt :roll:
#18
bug
ModEl4: Did any other chip from Apple, Intel, Qualcomm, or AMD use a similar A.I.-enhanced method (deep reinforcement learning)?
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»
Hard to say. At what point does reinforcement learning become "deep"? It's possible others also use reinforcement learning (at least in some aspect), but treat it like a trade secret and don't talk about it. It's also possible Nvidia is the first that made this work.
What I meant to say is that I am pretty certain, in some form or another, others also use some AI techniques in their product pipelines.
#19
DeathtoGnomes
When I first started reading this, I thought: nice smaller and shorter video cards, back down to 2 slots maybe?

Then I started to function after I drank my coffee... :banghead:
#20
zlobby
bug: I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
Allegedly better. Does smaller really mean better efficiency, performance, and fewer bugs?
#21
bug
zlobby: Allegedly better. Does smaller really mean better efficiency, performance, and fewer bugs?
Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
#22
Punkenjoy
bug: I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?
#23
Valantar
zlobby: Allegedly better. Does smaller really mean better efficiency, performance, and fewer bugs?
bug: Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
This. Also, smaller dice are generally more efficient due to shorter internal wire lengths. Not a huge difference, but given how many km of wiring a single modern chip contains, it adds up. Less bugs is... well, essentially random. Is there a possibility a chip with AI-designed functional blocks has more bugs than a traditionally designed one? Sure. But that might also not be the case. Better performance depends on a ton of factors, but generally, more compact designs perform better until thermal density becomes an issue.
#24
bug
Punkenjoy: I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?
Better than what? It's better than the classic attempts by a measurable 25%. Better than the perfect design? Probably not.
#25
Valantar
Tbh, I don't think I've seen any other major chipmaker confirm that they're actually using AI-generated layouts in high volume, large size chips before. I might obviously be wrong, but all I can remember seeing is various reports of this being tested. That also clearly leaves the possibility of it being used without this being public.