
NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,299 (0.92/day)
When designing integrated circuits, engineers aim for an efficient design that is easy to manufacture. If they keep the circuit size down, the cost of manufacturing that circuit goes down as well. NVIDIA has described on its technical blog a technique built around an artificial intelligence model called PrefixRL. Using deep reinforcement learning, NVIDIA reports that PrefixRL outperforms traditional EDA (Electronic Design Automation) tools from major vendors such as Cadence, Synopsys, or Siemens/Mentor. EDA vendors usually implement their own in-house AI solutions for silicon placement and routing (PnR); however, NVIDIA's PrefixRL solution seems to be doing wonders in the company's workflow.

The goal of PrefixRL is a deep reinforcement learning model that matches the latency of the EDA PnR result while achieving a smaller circuit area. According to the technical blog, the latest Hopper H100 GPU architecture uses 13,000 instances of arithmetic circuits that the PrefixRL AI model designed. NVIDIA produced a model that outputs a circuit 25% smaller than comparable EDA output, all while achieving similar or better latency. Below, you can compare a 64-bit adder design made by PrefixRL and the same design made by an industry-leading EDA tool.



Training such a model is a compute-intensive task. NVIDIA reports that training a model to design a 64-bit adder circuit took 32,000 GPU hours, with 256 CPU cores for each GPU. The company developed Raptor, an in-house distributed reinforcement learning platform that takes advantage of NVIDIA hardware for this kind of industrial reinforcement learning; the accompanying diagram illustrates how it operates. Overall, the system is pretty complex and requires a lot of hardware and input; however, the results pay off with smaller and more efficient GPUs.
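For readers unfamiliar with the kind of circuit being optimized: a 64-bit adder is usually built as a parallel-prefix network that computes all carries from per-bit "generate" and "propagate" signals. The sketch below models one classic hand-designed structure, Kogge-Stone, in software. It is only an illustration of what such a circuit computes. It is not NVIDIA's design, not PrefixRL output, and the function name is made up for this example.

```python
def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Per-bit signals: generate (both inputs set) and propagate (exactly one set).
    g = a & b
    p = a ^ b
    p0 = p  # the original propagate bits are needed for the final sum

    # Each level merges neighbouring spans, doubling the span length:
    # (g, p) combined with the span below it gives (g | p & g_low, p & p_low).
    shift = 1
    while shift < width:
        g = (g | (p & (g << shift))) & mask
        p = (p & (p << shift)) & mask
        shift *= 2

    # g now holds the carry out of every bit; the carry into bit i is g[i-1].
    carries = (g << 1) & mask
    return p0 ^ carries


if __name__ == "__main__":
    x, y = 0x0123456789ABCDEF, 0x0FEDCBA987654321
    print(hex(kogge_stone_add(x, y)))
```

Kogge-Stone uses log2(width) levels, which keeps latency low at the cost of many nodes and wires. Other prefix structures trade that area against delay differently, and that trade-off is the space a tool like PrefixRL searches.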


 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.
 
Joined
Oct 27, 2020
Messages
789 (0.60/day)
I wonder who is going to use similar A.I. enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc?)
 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
I wonder who is going to use similar A.I. enhanced method next (Apple, Intel, Qualcomm, AMD?) and from where (in-house solution, Cadence, Synopsys, etc?)
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
 
Joined
Oct 18, 2013
Messages
5,661 (1.46/day)
Location
Everywhere all the time all at once
System Name The Little One
Processor i5-11320H @4.4GHZ
Motherboard AZW SEI
Cooling Fan w/heat pipes + side & rear vents
Memory 64GB Crucial DDR4-3200 (2x 32GB)
Video Card(s) Iris XE
Storage WD Black SN850X 4TB m.2, Seagate 2TB SSD + SN850 4TB x2 in an external enclosure
Display(s) 2x Samsung 43" & 2x 32"
Case Practically identical to a mac mini, just purrtier in slate blue, & with 3x usb ports on the front !
Audio Device(s) Yamaha ATS-1060 Bluetooth Soundbar & Subwoofer
Power Supply 65w brick
Mouse Logitech MX Master 2
Keyboard Logitech G613 mechanical wireless
Software Windows 10 pro 64 bit, with all the unnecessary background shitzu turned OFF !
Benchmark Scores PDQ
This will help get higher clock speeds
And don't forget the most important aspect (for Ngreedia).....

HIGHER PRICES !

Yes I know R&D aint cheap, but this all sounds like just ANUTHA way to justify keeping GPU prices & profits at scalper/pandemic levels, which they have become addicted to like crackheads & their rocks... they always need/want moar and can't quit even if they wanted to....
 

Joined
Aug 31, 2021
Messages
21 (0.02/day)
Processor Ryzen 9 5900X
Motherboard Gigabyte Aorus B550i pro ax
Cooling Noctua NH-D15 chromax.black (with original fans)
Memory G.skill 32GB 3200MHz
Video Card(s) 4060ti 16GB
Storage 1TB Samsung PM9A1, 256GB Toshiba pcie3 (from a laptop), 512GB crucial MX500, 2x 1TB Toshiba HDD 2.5
Display(s) Mateview GT 34''
Case Thermaltake the tower 100, 1 noctua NF-A14 ippc3000 on top, 2 x Arctic F14
Power Supply Seasonic focus-GX 750W
Software Windows 10, Ubuntu when needed.
Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?
 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
Didn't get why reinforcement learning is necessary. Can't they compute a score directly from the result?
It probably influences the path taken (i.e. time spent) to get to the result.
Now that you've mentioned it, I'm starting to wonder why reinforcement learning is a better fit than genetic algorithms. Start with a solution and mutate it till it gets significantly better. There's obviously an explanation for that (I'm not smarter than an entire department of engineers), I just don't know it.
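To make the "start with a solution and mutate it" idea concrete, here is a toy sketch of the simplest form of that search: plain hill climbing with single-bit mutations. Everything in it is an assumption for illustration. The bit-list "design", the `cost` function and the `MIN_NODES` rule are made up, and a real flow would have to run synthesis to score each candidate. It is not reinforcement learning and not what NVIDIA describes.

```python
import random

MIN_NODES = 3  # made-up validity rule: a design needs at least this many nodes


def cost(design):
    """Toy cost: node count stands in for area; invalid designs are penalized."""
    nodes = sum(design)
    penalty = 1000 if nodes < MIN_NODES else 0
    return nodes + penalty


def mutate(design, rng):
    """Return a copy of the design with one randomly chosen bit flipped."""
    child = list(design)
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child


def hill_climb(start, steps, seed=0):
    """Keep a mutation only when it does not make the cost worse."""
    rng = random.Random(seed)
    best = list(start)
    for _ in range(steps):
        candidate = mutate(best, rng)
        if cost(candidate) <= cost(best):
            best = candidate
    return best


if __name__ == "__main__":
    start = [1] * 16
    result = hill_climb(start, steps=500)
    print(cost(start), "->", cost(result))
```

Note that this loop evaluates every candidate from scratch and learns nothing that carries over between candidates, which may be part of why a learned policy is attractive when each evaluation is expensive.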
 
Joined
May 2, 2017
Messages
7,762 (2.99/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
It's interesting that 25% smaller yields a die that is not that much smaller, optically. Sure, it's basic geometry, but I bet most people will be surprised seeing those dies side by side.
That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".
hopper.jpg
 

Fourstaff

Moderator
Staff member
Joined
Nov 29, 2009
Messages
10,051 (1.89/day)
Location
Home
System Name Orange! // ItchyHands
Processor 3570K // 10400F
Motherboard ASRock z77 Extreme4 // TUF Gaming B460M-Plus
Cooling Stock // Stock
Memory 2x4Gb 1600Mhz CL9 Corsair XMS3 // 2x8Gb 3200 Mhz XPG D41
Video Card(s) Sapphire Nitro+ RX 570 // Asus TUF RTX 2070
Storage Samsung 840 250Gb // SX8200 480GB
Display(s) LG 22EA53VQ // Philips 275M QHD
Case NZXT Phantom 410 Black/Orange // Tecware Forge M
Power Supply Corsair CXM500w // CM MWE 600w
Ampere has >50bln transistors, any efficiency gains will be very welcome. I will not be surprised if this translates to a profitability margin gain vs competition.
 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
That's mainly because they're centre aligned. If they were corner aligned instead, the difference would be a lot more intuitive, even if it still doesn't "look like 25% less".
It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" 13% smaller themselves.
A case of intuition playing tricks on us.
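The arithmetic behind that 13%, as a quick check: for a square, side length scales with the square root of the area, so a 25% area reduction leaves sides at sqrt(0.75) of their original length. The helper name below is just for this example.

```python
import math


def side_shrink(area_shrink: float) -> float:
    """Fractional reduction in side length of a square whose area shrinks by area_shrink."""
    return 1.0 - math.sqrt(1.0 - area_shrink)


if __name__ == "__main__":
    print(f"{side_shrink(0.25) * 100:.1f}%")  # prints 13.4%
```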

Ampere has >50bln transistors, any efficiency gains will be very welcome. I will not be surprised if this translates to a profitability margin gain vs competition.
If they're smart, they'll just split the difference.
 
Joined
May 2, 2017
Messages
7,762 (2.99/day)
Location
Back in Norway
It's not because of alignment, it's because when the surface is 25% smaller, the sides are "only" 13% smaller themselves.
A case of intuition playing tricks on us.
That's true, but center alignment makes that look like even less by distributing the shrinkage visually.
 
Joined
Oct 27, 2020
Messages
789 (0.60/day)
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
Has any other chip from Apple, Intel, Qualcomm, or AMD used a similar A.I. enhanced method? (deep reinforcement learning)
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»
 

ppn

Joined
Aug 18, 2015
Messages
1,231 (0.38/day)
Yeah, we can see that even before this they had much better density, so AMD on 5nm may end up slightly better than Nvidia on 7nm.
AMD on 7nm: 51.3 MTr/mm²
Nvidia on 7nm: 65.6 MTr/mm²; on 4/5nm: 98.2 MTr/mm²
 
Joined
May 9, 2012
Messages
8,438 (1.91/day)
Location
Ovronnaz, Wallis, Switzerland
System Name main/SFFHTPCARGH!(tm)/Xiaomi Mi TV Stick/Samsung Galaxy S23/Ally
Processor Ryzen 7 5800X3D/i7-3770/S905X/Snapdragon 8 Gen 2/Ryzen Z1 Extreme
Motherboard MSI MAG B550 Tomahawk/HP SFF Q77 Express/uh?/uh?/Asus
Cooling Enermax ETS-T50 Axe aRGB /basic HP HSF /errr.../oh! liqui..wait, no:sizable vapor chamber/a nice one
Memory 64gb Corsair Vengeance Pro 3600mhz DDR4/8gb DDR3 1600/2gb LPDDR3/8gb LPDDR5x 4200/16gb LPDDR5
Video Card(s) Hellhound Spectral White RX 7900 XTX 24gb/GT 730/Mali 450MP5/Adreno 740/RDNA3 768 core
Storage 250gb870EVO/500gb860EVO/2tbSandisk/NVMe2tb+1tb/4tbextreme V2/1TB Arion/500gb/8gb/256gb/2tb SN770M
Display(s) X58222 32" 2880x1620/32"FHDTV/273E3LHSB 27" 1920x1080/6.67"/AMOLED 2X panel FHD+120hz/FHD 120hz
Case Cougar Panzer Max/Elite 8300 SFF/None/back/back-front Gorilla Glass Victus 2+ UAG Monarch Carbon
Audio Device(s) Logi Z333/SB Audigy RX/HDMI/HDMI/Dolby Atmos/KZ x HBB PR2/Edifier STAX Spirit S3 & SamsungxAKG beans
Power Supply Chieftec Proton BDF-1000C /HP 240w/12v 1.5A/4Smart Voltplug PD 30W/Asus USB-C 65W
Mouse Speedlink Sovos Vertical-Asus ROG Spatha-Logi Ergo M575/Xiaomi XMRM-006/touch/touch
Keyboard Endorfy Thock 75% <3/none/touch/virtual
VR HMD Medion Erazer
Software Win10 64/Win8.1 64/Android TV 8.1/Android 13/Win11 64
Benchmark Scores bench...mark? i do leave mark on bench sometime, to remember which one is the most comfortable. :o
Raptor eh? I thought I also heard a Raptor is coming to a Lake near you in only a few short months time.
Raptor? Raptor-Lake? eh?

i heard Raptor from AMD Gaming Evolved, oh wait ... no it was Raptr (and died in 2017 :laugh: )

oh, well, i guess a O more does not hurt :roll:
 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
Has any other chip from Apple, Intel, Qualcomm, or AMD used a similar A.I. enhanced method? (deep reinforcement learning)
I don't know, I just read the source (Nvidia):
«to the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.»
Hard to say. At what point does reinforcement learning become "deep"? It's possible others also use reinforcement learning (at least in some aspect), but treat it like a trade secret and don't talk about it. It's also possible Nvidia is the first that made this work.
What I meant to say is that I am pretty certain, in some form or another, others also use some AI techniques in their product pipelines.
 
Joined
Jul 16, 2014
Messages
8,143 (2.25/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Artic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steeseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
When I first started reading this, I thought: nice smaller and shorter video cards, back down to 2 slots maybe?

Then I started to function once I drank my coffee...:banghead:
 
Joined
Jul 10, 2017
Messages
2,671 (1.06/day)
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.
Allegedly better. Does smaller really mean better efficiency, performance and fewer bugs?
 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
Allegedly better. Does smaller really mean better efficiency, performance and fewer bugs?
Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
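A rough way to see the "more dies per wafer" effect is the common first-order estimate: wafer area divided by die area, minus a term for partial dies lost at the edge. The sketch below uses that approximation with a 300 mm wafer and two hypothetical die areas. The 100 mm² and 75 mm² figures are made up for illustration, and the estimate ignores yield, scribe lines and edge exclusion. Also keep in mind the article's 25% applies to individual arithmetic circuits, not to the whole die.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross dies per wafer (no yield or scribe lines)."""
    radius = wafer_diameter_mm / 2
    # Wafer area divided by die area.
    gross = math.pi * radius ** 2 / die_area_mm2
    # Approximate number of partial dies lost along the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


baseline = dies_per_wafer(300, 100.0)  # hypothetical 100 mm^2 die
shrunk = dies_per_wafer(300, 75.0)     # the same die if its whole area shrank by 25%

if __name__ == "__main__":
    print(baseline, shrunk)  # prints 640 865
```

Under these assumptions a 25% smaller die gives roughly 35% more candidates per wafer, since fewer dies are also wasted at the edge.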
 
Joined
Oct 12, 2005
Messages
682 (0.10/day)
I'm pretty sure everyone already does. It just happens Nvidia employs a better model, for the time being.
In general, the problem of dividing a given surface into small pieces in an optimal way is NP-complete (https://en.wikipedia.org/wiki/NP-completeness), thus not something you want to/can tackle without some form of approximation or heuristic.

I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?
 
Joined
May 2, 2017
Messages
7,762 (2.99/day)
Location
Back in Norway
Allegedly better. Does smaller really mean better efficiency, performance and fewer bugs?
Smaller dies mean more dies per wafer. Size is the main factor that dictates price. In this particular case it seems it also means a little better performance, since it says "achieving similar or better latency".
This. Also, smaller dice are generally more efficient due to shorter internal wire lengths. Not a huge difference, but given how many km of wiring a single modern chip contains, it adds up. Fewer bugs is... well, essentially random. Is there a possibility a chip with AI-designed functional blocks has more bugs than a traditionally designed one? Sure. But that might also not be the case. Better performance depends on a ton of factors, but generally, more compact designs perform better until thermal density becomes an issue.
 

bug

Joined
May 22, 2015
Messages
13,344 (4.03/day)
I know AMD does it, and like you, I'm pretty sure Intel and other manufacturers use AI too. But what makes you think Nvidia uses a better model?
Better than what? It's better than the classic attempts by a measurable 25%. Better than the perfect design? Probably not.
 