Monday, July 11th 2022

NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

When designing integrated circuits, engineers aim to produce an efficient design that is easy to manufacture. Keeping the circuit size down also keeps the cost of manufacturing it down. NVIDIA has detailed on its technical blog a technique that uses an artificial intelligence model called PrefixRL. Using deep reinforcement learning, NVIDIA's PrefixRL model outperforms traditional EDA (Electronic Design Automation) tools from major vendors such as Cadence, Synopsys, and Siemens/Mentor. EDA vendors typically offer their own in-house AI solutions for silicon placement and routing (PnR); however, NVIDIA's PrefixRL approach seems to be doing wonders in the company's workflow.

The goal of PrefixRL is a deep reinforcement learning model that keeps latency on par with the EDA tool's result while achieving a smaller die area. According to the technical blog, the latest Hopper H100 GPU architecture uses 13,000 instances of arithmetic circuits designed by the PrefixRL AI model. NVIDIA's model outputs circuits that are 25% smaller than comparable EDA output while achieving similar or better latency. Below, you can compare a 64-bit adder design made by PrefixRL with the same design made by an industry-leading EDA tool.
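The article's framing (hold latency at the level of the EDA result while shrinking die area) maps naturally onto a reinforcement learning reward that trades area against delay. The sketch below is purely illustrative, not NVIDIA's implementation: a toy tabular Q-learning agent toggles hypothetical optional prefix nodes in an abstract adder, and the area and delay functions are invented stand-ins for real synthesis feedback (the actual PrefixRL model is a deep reinforcement learning agent, per the article).

```python
# Purely illustrative sketch, not NVIDIA's code: a tabular Q-learning agent
# toggles optional "prefix nodes" in an abstract adder design and is rewarded
# for lowering a weighted combination of area and delay. The formulas below
# are toy stand-ins for real synthesis results.
import random

N_OPTIONAL = 6                     # optional prefix nodes the agent may toggle
BASE_AREA, BASE_DELAY = 10.0, 8.0  # invented numbers for the starting design
W_AREA = 0.5                       # area-vs-delay trade-off weight

def area(state):
    # Each added prefix node costs one unit of area.
    return BASE_AREA + len(state)

def delay(state):
    # Each added node shortens the carry chain, with diminishing returns.
    return BASE_DELAY - sum(1.0 / (k + 1) for k in range(len(state)))

def cost(state):
    return W_AREA * area(state) + (1.0 - W_AREA) * delay(state)

def step(state, action):
    # Toggle one optional node; reward is the improvement in weighted cost.
    nxt = state ^ frozenset([action])
    return nxt, cost(state) - cost(nxt)

Q = {}                             # tabular stand-in for a Q network
alpha, gamma, eps = 0.2, 0.9, 0.2
actions = range(N_OPTIONAL)

for episode in range(2000):
    state = frozenset()            # start from the plain, node-free design
    for _ in range(N_OPTIONAL):
        if random.random() < eps:
            a = random.choice(list(actions))
        else:
            a = max(actions, key=lambda x: Q.get((state, x), 0.0))
        nxt, r = step(state, a)
        best_next = max(Q.get((nxt, x), 0.0) for x in actions)
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (r + gamma * best_next - old)
        state = nxt

# Greedy rollout with the learned values.
state = frozenset()
for _ in range(N_OPTIONAL):
    a = max(actions, key=lambda x: Q.get((state, x), 0.0))
    state, _ = step(state, a)
print(f"chosen nodes: {sorted(state)}, area={area(state):.1f}, delay={delay(state):.2f}")
```

Sweeping the W_AREA weight in this toy would trace out an area/delay trade-off of the kind circuit designers have to navigate; the headline result corresponds to picking designs that cut area without giving up latency.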
Training such a model is a compute-intensive task. NVIDIA reports that training the agent to design a 64-bit adder circuit took 256 CPU cores for every GPU and 32,000 GPU hours. To handle this kind of industrial-scale reinforcement learning, the company developed Raptor, an in-house distributed reinforcement learning platform that takes unique advantage of NVIDIA hardware; an overview of how it operates is shown below. Overall, the system is quite complex and requires a lot of hardware and input, but the results pay off with smaller and more efficient GPUs.
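NVIDIA has not published Raptor or its interfaces, so the following is only a generic sketch of the split the paragraph implies: many CPU-side workers doing the slow evaluation work and feeding results to a single learner that stands in for GPU-side training. Every name, number, and the queue-based layout here is an assumption made for illustration.

```python
# Generic producer/consumer sketch, not Raptor's API: CPU workers emulate the
# slow circuit-evaluation work (synthesis, area/delay measurement) and push
# results to a single learner standing in for the GPU-side training process.
import multiprocessing as mp
import random
import time

def cpu_worker(worker_id, queue, n_rollouts=5):
    # Pretend each rollout is a synthesized candidate circuit with measured
    # area and delay; the numbers here are random placeholders.
    for _ in range(n_rollouts):
        time.sleep(0.01)
        queue.put({
            "worker": worker_id,
            "area": random.uniform(90.0, 110.0),
            "delay": random.uniform(0.9, 1.1),
        })
    queue.put(None)  # sentinel: this worker is finished

def learner(queue, n_workers):
    # Stand-in for the GPU-side trainer that would run gradient updates on
    # whatever experience the workers deliver.
    finished, seen = 0, 0
    while finished < n_workers:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        seen += 1
        # ...a real learner would update the policy/Q network here...
    print(f"learner consumed {seen} rollouts from {n_workers} workers")

if __name__ == "__main__":
    n_workers = 4
    q = mp.Queue()
    workers = [mp.Process(target=cpu_worker, args=(i, q)) for i in range(n_workers)]
    for w in workers:
        w.start()
    learner(q, n_workers)
    for w in workers:
        w.join()
```

The 256-CPU-cores-per-GPU ratio the article cites suggests that, in practice, the evaluation side rather than the GPU training dominates the compute budget.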
Source: NVIDIA

43 Comments on NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

#1
bug
It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.
#2
Richards
This will help get higher clock speeds
#3
ModEl4
I wonder who is going to use a similar A.I.-enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc.?)
#4
bug
ModEl4: I wonder who is going to use a similar A.I.-enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc.?)
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
#5
bonehead123
Richards: This will help get higher clock speeds
And don't forget the most important aspect (for Ngreedia).....

HIGHER PRICES !


Yes I know R&D aint cheap, but this all sounds like just ANUTHA way to justify keeping GPU prices & profits at scalper/pandemic levels, which they have become addicted to like crackheads & their rocks... they always need/want moar and can't quit even if they wanted to....
#6
Nanochip
Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months' time.
#7
PapaTaipei
Good shit. Now release the new GPUs already.
#8
bug
PapaTaipei: Good shit. Now release the new GPUs already.
This is for Hopper. Hopper is for datacenter.
#9
GuiltySpark
Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?
#10
bug
GuiltySpark: Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?
It probably influences the path taken (i.e. time spent) to get to the result.
Now that you've mentioned it, I'm starting to wonder why reinforcement learning is a better fit than genetic algorithms. Start with a solution and mutate it till it gets significantly better. There's obviously an explanation for that (I'm not smarter than an entire department of engineers), I just don't know it.
#11
Valantar
bug: It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.
That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".
#12
Fourstaff
Ampere has >50 bln transistors; any efficiency gains will be very welcome. I will not be surprised if this translates to a profit-margin gain vs. the competition.
#13
bug
Valantar: That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".
It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" about 13% shorter themselves (√0.75 ≈ 0.87).
A case of intuition playing tricks on us.
Fourstaff: Ampere has >50 bln transistors; any efficiency gains will be very welcome. I will not be surprised if this translates to a profit-margin gain vs. the competition.
If they're smart, they'll just split the difference.
#14
Valantar
bug: It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" about 13% shorter themselves (√0.75 ≈ 0.87).
A case of intuition playing tricks on us.
That's true, but center alignment makes that look like even less by distributing the shrinkage visually.
#15
ModEl4
bug: I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
Did any other chip from Apple, Intel, Qualcomm, or AMD use a similar A.I.-enhanced method (deep reinforcement learning)?
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»
#16
ppn
Yeah, we can see that even before this they had much better density, so AMD on 5 nm may end up slightly better than Nvidia on 7 nm:
AMD on 7 nm - 51.3 MTr/mm²
Nvidia on 7 nm - 65.6 MTr/mm²
Nvidia on 4/5 nm - 98.2 MTr/mm²
#17
GreiverBlade
Nanochip: Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months' time.
Raptor? Raptor-Lake? eh?

I heard Raptor from AMD Gaming Evolved, oh wait ... no, it was Raptr (and it died in 2017 :laugh: )

oh well, I guess one more O does not hurt :roll:
#18
bug
ModEl4: Did any other chip from Apple, Intel, Qualcomm, or AMD use a similar A.I.-enhanced method (deep reinforcement learning)?
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»
Hard to say. At what point does reinforcement learning become "deep"? It's possible others also use reinforcement learning (at least in some aspect), but treat it like a trade secret and don't talk about it. It's also possible Nvidia is the first that made this work.
What I meant to say is that I am pretty certain, in some form or another, others also use some AI techniques in their product pipelines.
#19
DeathtoGnomes
When I first started reading this, I thought: nice smaller and shorter video cards, back down to 2 slots maybe?

Then I started to function after I drank my coffee... :banghead:
#20
zlobby
bug: I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
Allegedly better. Does smaller really mean better efficiency, performance, and fewer bugs?
#21
bug
zlobby: Allegedly better. Does smaller really mean better efficiency, performance, and fewer bugs?
Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
#22
Punkenjoy
bug: I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?
#23
Valantar
zlobby: Allegedly better. Does smaller really mean better efficiency, performance, and fewer bugs?
bug: Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
This. Also, smaller dice are generally more efficient due to shorter internal wire lengths. Not a huge difference, but given how many km of wiring a single modern chip contains, it adds up. Less bugs is... well, essentially random. Is there a possibility a chip with AI-designed functional blocks has more bugs than a traditionally designed one? Sure. But that might also not be the case. Better performance depends on a ton of factors, but generally, more compact designs perform better until thermal density becomes an issue.
#24
bug
Punkenjoy: I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?
Better than what? It's better than the classic attempts by a measurable 25%. Better than the perfect design? Probably not.
#25
Valantar
Tbh, I don't think I've seen any other major chipmaker confirm that they're actually using AI-generated layouts in high volume, large size chips before. I might obviously be wrong, but all I can remember seeing is various reports of this being tested. That also clearly leaves the possibility of it being used without this being public.