AMD Patents a New Method for GPU Instruction Scheduling

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,700 (1.00/day)
With growing revenues coming from strong sales of Ryzen and Radeon products, AMD is more focused on innovation than ever. Any company needs to reinvest its capital into R&D to stay ahead, and that is exactly what AMD is doing by focusing on future technologies while constantly improving existing solutions.

On June 13th, AMD published a new method for instruction scheduling of shader programs for a GPU. The method operates on a fixed number of registers. It works in five stages:
  • Compute liveness-based register usage across all basic blocks
  • Compute the range of the number of waves for the shader program
  • Assess the impact of available post-register allocation optimizations
  • Compute the scoring data based on the number of waves of the plurality of registers
  • Compute the optimal number of waves
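
The five stages above can be sketched roughly as follows. This is a toy model under stated assumptions, not AMD's implementation: the register file size, the wave cap, the `savings` parameter (registers that post-allocation optimizations could free) and the scoring formula are all hypothetical and chosen only to show how the stages fit together.

```python
# Hypothetical sketch of the five stages. All names and numbers are
# illustrative assumptions and are not taken from the patent.
from dataclasses import dataclass

REGISTER_FILE_SIZE = 256  # assumed fixed number of registers shared by waves
MAX_WAVES = 10            # assumed hardware cap on resident waves


@dataclass
class BasicBlock:
    name: str
    live_registers: int   # peak number of simultaneously live registers


def peak_register_usage(blocks):
    """Stage 1: liveness-based register usage across all basic blocks."""
    return max(b.live_registers for b in blocks)


def wave_range(peak_regs, savings):
    """Stages 2-3: wave counts reachable without optimizations (low end)
    and with post-register-allocation optimizations applied (high end)."""
    low = min(MAX_WAVES, REGISTER_FILE_SIZE // max(1, peak_regs))
    high = min(MAX_WAVES, REGISTER_FILE_SIZE // max(1, peak_regs - savings))
    return range(low, high + 1)


def score(waves, peak_regs, cost_per_reg):
    """Stage 4: toy score. More waves are better, but every register that
    must be optimized away to fit the per-wave budget costs something."""
    budget = REGISTER_FILE_SIZE // waves
    reduction = max(0, peak_regs - budget)
    return waves - cost_per_reg * reduction


def optimal_waves(blocks, savings, cost_per_reg=0.1):
    """Stage 5: pick the wave count with the best score."""
    peak = peak_register_usage(blocks)
    return max(wave_range(peak, savings),
               key=lambda w: score(w, peak, cost_per_reg))


shader = [BasicBlock("entry", 24), BasicBlock("loop", 64), BasicBlock("exit", 40)]
print(optimal_waves(shader, savings=16))
```

In this model, raising the assumed cost of freeing a register pushes the choice toward fewer waves, while lowering it favours higher occupancy.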




It is important to note that the "liveness" of registers is most probably a reference to compiler liveness analysis, which tracks which registers hold values that are still needed later in the program, while the term "wave" most likely refers to a wavefront, the group of threads that a GPU executes in lockstep. The new method is supposed to bring additional performance improvements and reduce latency by balancing how many registers each wave uses against how many waves can run at once.
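
In compiler terms, a register is live at a point in the program if its value may still be read later. A minimal sketch of measuring that for a single straight-line block is shown here; the `(dest, sources)` instruction format and the function name are assumptions made for illustration, not anything from the patent.

```python
def peak_live_registers(instructions, live_out=()):
    """Walk a straight-line block backwards and return the largest number
    of registers that are live at the same time.

    Each instruction is a (dest, sources) pair; live_out lists the
    registers still needed after the block ends.
    """
    live = set(live_out)
    peak = len(live)
    for dest, sources in reversed(instructions):
        live.discard(dest)    # dest is defined here, so it is not live before
        live.update(sources)  # operands must be live before the instruction
        peak = max(peak, len(live))
    return peak


block = [
    ("a", []),          # a = load
    ("b", []),          # b = load
    ("c", ["a", "b"]),  # c = a + b
    ("d", ["c", "a"]),  # d = c * a
    ("e", ["d", "b"]),  # e = d + b
]
print(peak_live_registers(block, live_out={"e"}))
```

The fewer registers a shader needs at its peak, the more waves can share a fixed register file.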

You can find out more about it here.

 
Joined
Nov 4, 2005
Messages
12,048 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an x86-64 core, but it only makes sense if they now have the know-how to make a 4 GHz scheduling CPU on die to make their GPU cores more efficient without any overhead, since it could be considered the first basic AI for accelerating GPU workloads.
 
Joined
Jul 16, 2014
Messages
8,223 (2.15/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Artic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steeseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
Might also be part of an Infinity Fabric (IF) interface for multiple GPUs.
 
Joined
Sep 17, 2014
Messages
22,842 (6.06/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Nice to see some progress on AMD's GPU side. It's about goddamn time we get a bit more than a roadmap full of too little, too late. But then, this won't see the light of day for at least 3 years.

It also doesn't look mighty complicated... 'when it's full, see if you can stuff in some more' 'and then some' captures it quite well, I think. But it does sound very much like a fix for AMD's resource allocation and efficiency problem.
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Gskill Trident Z 3900cas18 32Gb in four sticks./16Gb/16GB
Video Card(s) Asus tuf RX7900XT /Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores laptop Timespy 6506
I'm surprised it's this patent and the chip cooler one. Did no one see the raytracing one on another site? It's on Tom's.


direct link to patent


I think these are from 2017, so in a few more years we might see them.
 
Joined
Oct 8, 2006
Messages
173 (0.03/day)
Infinity Fabric GPUs, like I saw from the new Mac Pro with its dual Navi GPU card:

  • Support for Infinity Fabric Link GPU interconnect technology – With up to 84GB/s per direction low-latency peer-to-peer memory access, the scalable GPU interconnect technology enables GPU-to-GPU communications up to 5X faster than PCIe Gen 3 interconnect speeds.
Do chiplets on GPUs and AMD could have an easy time beating Nvidia. Either one would rock!
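
A rough sanity check on the "5X" figure quoted above. It assumes the comparison is against a PCIe 3.0 x16 link (8 GT/s per lane with 128b/130b encoding); the quote does not state the link width, so the x16 default is an assumption.

```python
def pcie3_bandwidth_gb_per_s(lanes=16):
    """Usable PCIe 3.0 bandwidth per direction in GB/s (x16 is assumed)."""
    transfers_per_second = 8e9       # 8 GT/s per lane
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    bits_per_second = transfers_per_second * encoding_efficiency * lanes
    return bits_per_second / 8 / 1e9


infinity_fabric_link = 84  # GB/s per direction, from the quoted spec
ratio = infinity_fabric_link / pcie3_bandwidth_gb_per_s()
print(f"PCIe 3.0 x16: {pcie3_bandwidth_gb_per_s():.2f} GB/s, ratio: {ratio:.2f}x")
```

Under that assumption the ratio comes out a little above five, which is consistent with the quoted claim.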
 
Joined
Nov 15, 2016
Messages
454 (0.15/day)
System Name Sillicon Nightmares
Processor Intel i7 9700KF 5ghz (5.1ghz 4 core load, no avx offset), 4.7ghz ring, 1.412vcore 1.3vcio 1.264vcsa
Motherboard Asus Z390 Strix F
Cooling DEEPCOOL Gamer Storm CAPTAIN 360
Memory 2x8GB G.Skill Trident Z RGB (B-Die) 3600 14-14-14-28 1t, tRFC 220 tREFI 65535, tFAW 16, 1.545vddq
Video Card(s) ASUS GTX 1060 Strix 6GB XOC, Core: 2202-2240, Vcore: 1.075v, Mem: 9818mhz (Sillicon Lottery Jackpot)
Storage Samsung 840 EVO 1TB SSD, WD Blue 1TB, Seagate 3TB, Samsung 970 Evo Plus 512GB
Display(s) BenQ XL2430 1080p 144HZ + (2) Samsung SyncMaster 913v 1280x1024 75HZ + A Shitty TV For Movies
Case Deepcool Genome ROG Edition
Audio Device(s) Bunta Sniff Speakers From The Tip Edition With Extra Kenwoods
Power Supply Corsair AX860i/Cable Mod Cables
Mouse Logitech G602 Spilled Beer Edition
Keyboard Dell KB4021
Software Windows 10 x64
Benchmark Scores 13543 Firestrike (3dmark.com/fs/22336777) 601 points CPU-Z ST 37.4ns AIDA Memory
Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an x86-64 core, but it only makes sense if they now have the know-how to make a 4 GHz scheduling CPU on die to make their GPU cores more efficient without any overhead, since it could be considered the first basic AI for accelerating GPU workloads.
So like how Maxwell has an ARM CPU integrated in it.
 
Joined
Nov 4, 2005
Messages
12,048 (1.72/day)
So like how Maxwell has an ARM CPU integrated in it.


That's only used for boot and power management. An actual x86-64 core can run native code, and already runs at a much higher clock speed than ARM cores.
 

Attachments

  • main-qimg-0a52f3be40ff5b8a3cf75fb531eba7b4.png
Joined
Oct 28, 2010
Messages
251 (0.05/day)
Looks like the first patent for an on-die CPU scheduler for an upcoming architecture. It may or may not be an x86-64 core, but it only makes sense if they now have the know-how to make a 4 GHz scheduling CPU on die to make their GPU cores more efficient without any overhead, since it could be considered the first basic AI for accelerating GPU workloads.
Patenting can help them, but in a relatively limited way.
Generally speaking, alternatives that do the same thing can be developed and implemented without infringing on what someone else did.
I think it's still for x64 stuff; what else could it serve?
 
Joined
Mar 21, 2016
Messages
2,508 (0.78/day)
That's only used for boot and power management. An actual x86-64 core can run native code, and already runs at a much higher clock speed than ARM cores.
Yup, let the main CPU hand off a task to the GPU, and from there its onboard, GPU-optimized CPU can handle the rest until it needs to communicate with the main CPU again, which it could do in short bursts. The big benefit is that it could be a more GPU-optimized CPU in terms of cache, instruction sets, and frequency scaling. On top of that, there is no OS contention to deal with, unlike the primary CPU, which has who knows what background tasks running (telemetry, Windows updates, virus scans, etc.) that could be slowing it down constantly or intermittently. The primary CPU also probably wouldn't scale to as high a frequency as a simpler 1-2c/2-4t CPU could, especially with binning.

Think of Intel's 5 GHz CPUs: integrate 1-2 cores like that on the GPU itself and suddenly the primary CPU is a lot less frequency starved from a gaming standpoint, at least for the 1080p esports e-peen talking points Intel likes. When you think about it like that, it also makes a lot more sense than trying to get 16 cores to run at 5 GHz on all cores just to match Intel's performance advantage, which is a bit of grasping at straws, in games that don't scale, at resolutions that don't scale, lol, with overkill refresh rates of course, because hey, gotta win somehow at all costs. 240p at 960 Hz refresh rate, here I come, pew pew pew!!
 