NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

AleksandarK · Jul 11, 2022

When designing integrated circuits, engineers aim to produce an efficient design that is easy to manufacture. If they manage to keep the circuit size down, the cost of manufacturing that circuit goes down as well. NVIDIA has described on its technical blog a technique built around an artificial intelligence model called PrefixRL. Using deep reinforcement learning, NVIDIA says the PrefixRL model outperforms traditional EDA (Electronic Design Automation) tools from major vendors such as Cadence, Synopsys, or Siemens/Mentor. EDA vendors usually implement their own in-house AI solutions for silicon placement and routing (PnR); however, NVIDIA's PrefixRL solution seems to be doing wonders in the company's workflow.

The goal of PrefixRL is a deep reinforcement learning model that keeps latency the same as the EDA PnR attempt while achieving a smaller die area. According to the technical blog, the latest Hopper H100 GPU architecture uses 13,000 instances of arithmetic circuits that the PrefixRL AI model designed. NVIDIA produced a model that outputs a 25% smaller circuit than comparable EDA output, all while achieving similar or better latency. Below, you can compare a 64-bit adder design made by PrefixRL and the same design made by an industry-leading EDA tool.

Training such a model is a compute-intensive task. NVIDIA reports that training to design a 64-bit adder circuit took 256 CPU cores for each GPU and 32,000 GPU hours. The company developed Raptor, an in-house distributed reinforcement learning platform that takes advantage of NVIDIA hardware for this kind of industrial reinforcement learning; its operation is shown below. Overall, the system is complex and requires a lot of hardware and input; however, the results pay off with smaller and more efficient GPUs.


bug · Jul 11, 2022

It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.

Richards · Jul 11, 2022

This will help get higher clock speeds

ModEl4 · Jul 11, 2022

I wonder who is going to use similar A.I. enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc?)

bug · Jul 11, 2022

ModEl4 said:
I wonder who is going to use similar A.I. enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc?)

I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.

bonehead123 · Jul 11, 2022

Richards said:
This will help get higher clock speeds

And don't forget the most important aspect (for Ngreedia).....

HIGHER PRICES !

Yes I know R&D aint cheap, but this all sounds like just ANUTHA way to justify keeping GPU prices & profits at scalper/pandemic levels, which they have become addicted to like crackheads & their rocks... they always need/want moar and can't quit even if they wanted to....

Nanochip · Jul 11, 2022

Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months time.

PapaTaipei · Jul 11, 2022

Good shit. Now release the new GPUs already.

bug · Jul 11, 2022

PapaTaipei said:
Good shit. Now release the new GPUs already.

This is for Hopper. Hopper is for datacenter.

GuiltySpark · Jul 11, 2022

Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?

bug · Jul 11, 2022

GuiltySpark said:
Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?

It probably influences the path taken (i.e. time spent) to get to the result.
Now that you've mentioned it, I'm starting to wonder why reinforcement learning is a better fit than genetic algorithms. Start with a solution and mutate it till it gets significantly better. There's obviously an explanation for that (I'm not smarter than an entire department of engineers), I just don't know it.
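
For what it's worth, the "start with a solution and mutate it" idea can be sketched in a few lines. This is only a toy: a single-candidate mutate-and-keep loop (closer to stochastic hill climbing than a full genetic algorithm with a population and crossover), and `toy_cost` is a made-up stand-in for circuit area. It says nothing about how PrefixRL itself works.

```python
import random


def mutate(bits, rng):
    """Return a copy of `bits` with one randomly chosen bit flipped."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child


def evolve(cost, start, steps, seed=0):
    """Keep a single candidate and accept a mutation only if it lowers the cost."""
    rng = random.Random(seed)
    best, best_cost = list(start), cost(start)
    for _ in range(steps):
        child = mutate(best, rng)
        child_cost = cost(child)
        if child_cost < best_cost:
            best, best_cost = child, child_cost
    return best, best_cost


def toy_cost(bits):
    # Hypothetical objective: pretend every set bit costs one unit of "area".
    return sum(bits)


start = [1] * 16
best, best_cost = evolve(toy_cost, start, steps=500)
print(best_cost, "<=", toy_cost(start))
```

The catch with a loop like this is that every candidate has to be scored from scratch and nothing learned on one design carries over to the next, which may be part of the answer.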

Valantar · Jul 11, 2022

bug said:
It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.

That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".

Fourstaff · Jul 11, 2022

Ampere has >50bln transistors, any efficiency gains will be very welcome. I will not be surprised if this translates to a profitability margin gain vs competition.

bug · Jul 11, 2022

Valantar said:
That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".

It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" 13% smaller themselves.
A case of intuition playing tricks on us.
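
A quick check of that figure, assuming both layouts are squares (the real ones don't have to be); `side_shrink` is just a helper name made up for this sketch:

```python
import math


def side_shrink(area_shrink: float) -> float:
    """Fractional reduction in the side of a square whose area shrinks by `area_shrink`."""
    return 1.0 - math.sqrt(1.0 - area_shrink)


# A 25% smaller area leaves 75% of the area, so each side is sqrt(0.75) of the original.
print(f"{side_shrink(0.25):.1%}")  # 13.4%
```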

Fourstaff said:
Ampere has >50bln transistors, any efficiency gains will be very welcome. I will not be surprised if this translates to a profitability margin gain vs competition.

If they're smart, they'll just split the difference.

Valantar · Jul 11, 2022

bug said:
It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" 13% smaller themselves.
A case of intuition playing tricks on us.

That's true, but center alignment makes that look like even less by distributing the shrinkage visually.

ModEl4 · Jul 11, 2022

bug said:
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.

Has any other chip from Apple, Intel, Qualcomm, or AMD used a similar A.I.-enhanced method (deep reinforcement learning)?
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»

ppn · Jul 11, 2022

Yeah, we can see that even before this they had much better density, so AMD on 5nm may end up slightly better than Nvidia on 7nm:
AMD on 7nm - 51,3 Mtr/mm2
Nvidia on 7nm - 65,6 Mtr/mm2
Nvidia on 4/5nm - 98,2 Mtr/mm2

GreiverBlade · Jul 11, 2022

Nanochip said:
Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months time.

Raptor? Raptor-Lake? eh?

i heard Raptor from AMD Gaming Evolved, oh wait ... no it was Raptr (and died in 2017 :laugh: )

oh, well, i guess an O more does not hurt :roll:

bug · Jul 11, 2022

ModEl4 said:
Has any other chip from Apple, Intel, Qualcomm, or AMD used a similar A.I.-enhanced method (deep reinforcement learning)?
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»

Hard to say. At what point does reinforcement learning become "deep"? It's possible others also use reinforcement learning (at least in some aspect), but treat it like a trade secret and don't talk about it. It's also possible Nvidia is the first that made this work.
What I meant to say is that I am pretty certain, in some form or another, others also use some AI techniques in their product pipelines.

DeathtoGnomes · Jul 11, 2022

When I first started reading this, I thought: nice smaller and shorter video cards, back down to 2 slots maybe?

Then I drank my coffee and started to function... :banghead:

zlobby · Jul 11, 2022

bug said:
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.

Allegedly better. Does smaller really mean better efficiency, performance and less bugs?

bug · Jul 11, 2022

zlobby said:
Allegedly better. Does smaller really mean better efficiency, performance and less bugs?

Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
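
To give a feel for the dies-per-wafer part, here is a rough sketch using a common first-order approximation for gross dies on a round wafer. The 300 mm wafer and the 800 mm² and 600 mm² die areas are made-up illustration numbers, not H100 figures (the 25% in the article is about the arithmetic circuits, not the whole die), and yield, scribe lines and edge exclusion are all ignored.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: wafer area / die area, minus a correction for partial dies at the edge."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# Hypothetical example: the same 300 mm wafer with a die and a 25% smaller die.
print(gross_dies_per_wafer(300, 800))  # 64
print(gross_dies_per_wafer(300, 600))  # 90
```

In this made-up case, a 25% smaller die gives about 40% more candidate dies per wafer, because less area is also wasted at the wafer edge.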

Punkenjoy · Jul 11, 2022

bug said:
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.

I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?

Valantar · Jul 11, 2022

zlobby said:
Allegedly better. Does smaller really mean better efficiency, performance and less bugs?

bug said:
Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".

This. Also, smaller dice are generally more efficient due to shorter internal wire lengths. Not a huge difference, but given how many km of wiring a single modern chip contains, it adds up. Less bugs is... well, essentially random. Is there a possibility a chip with AI-designed functional blocks has more bugs than a traditionally designed one? Sure. But that might also not be the case. Better performance depends on a ton of factors, but generally, more compact designs perform better until thermal density becomes an issue.

bug · Jul 11, 2022

Punkenjoy said:
I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?

Better than what? It's better than the classic attempts by a measurable 25%. Better than the perfect design? Probably not.
