
AMD WMMA Instruction is Direct Response to NVIDIA Tensor Cores

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,651 (0.99/day)
AMD's RDNA3 graphics IP is just around the corner, and we are hearing more information about the upcoming architecture. Historically, as GPUs advance, it is not unusual for companies to add dedicated hardware blocks to accelerate a specific task. Today, AMD engineers have updated the backend of the LLVM compiler to include a new instruction called Wave Matrix Multiply-Accumulate (WMMA). This instruction will be present on GFX11, which is the RDNA3 GPU architecture. With WMMA, AMD will offer support for processing 16x16x16 size tensors in FP16 and BF16 precision formats. With these instructions, AMD is adding new arrangements to support the processing of matrix multiply-accumulate operations. This closely mimics the work NVIDIA is doing with Tensor Cores.
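
For readers unfamiliar with the operation, a matrix multiply-accumulate computes D = A x B + C. The sketch below is plain Python, not AMD's instruction or any real API: `mma_tile` and `TILE` are names made up for illustration, and it does not model the FP16/BF16 rounding the hardware would apply.

```python
# Illustrative sketch only: a 16x16x16 multiply-accumulate, D = A x B + C.
# mma_tile and TILE are hypothetical names; real WMMA runs on the GPU and
# takes FP16/BF16 inputs, which this plain-float version does not model.
TILE = 16


def mma_tile(a, b, c):
    """Return D = A x B + C for TILE x TILE matrices stored as lists of lists."""
    d = [[0.0] * TILE for _ in range(TILE)]
    for i in range(TILE):
        for j in range(TILE):
            acc = c[i][j]  # start from the accumulator input
            for k in range(TILE):
                acc += a[i][k] * b[k][j]
            d[i][j] = acc
    return d


# Usage: identity x B + ones should give B + 1 element by element.
identity = [[1.0 if i == j else 0.0 for j in range(TILE)] for i in range(TILE)]
b = [[float(i * TILE + j) for j in range(TILE)] for i in range(TILE)]
ones = [[1.0] * TILE for _ in range(TILE)]
d = mma_tile(identity, b, ones)
```

The "accumulate" part is the `c` input: the product is added onto an existing tile rather than written to a fresh one, which is what lets larger products be built up tile by tile.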

AMD ROCm 5.2 API update lists the use case for this type of instruction, which you can see below:
rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.

rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
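
The fragment idea in AMD's description can be sketched in a few lines. This is a plain-Python illustration, not rocWMMA code: `blocked_gemm`, `naive_gemm`, and the tile size are hypothetical names chosen here, and the loops over output tiles stand in for work a GPU would distribute across wavefronts.

```python
# Illustrative sketch only: a GEMM broken into 16x16 fragments.
# Each (ti, tj) output tile is independent of the others, which is why the
# real library can hand tiles to different wavefronts in parallel.
TILE = 16


def naive_gemm(a, b):
    """Reference product of an m x k and a k x n matrix (lists of lists)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def blocked_gemm(a, b, tile=TILE):
    """Same product, computed one tile-sized multiply-accumulate at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if m % tile or k % tile or n % tile:
        raise ValueError("dimensions must be multiples of the tile size")
    c = [[0] * n for _ in range(m)]
    for ti in range(0, m, tile):          # output tile row
        for tj in range(0, n, tile):      # output tile column
            for tk in range(0, k, tile):  # walk the shared K dimension
                # One tile x tile x tile multiply-accumulate into c.
                for i in range(ti, ti + tile):
                    for j in range(tj, tj + tile):
                        acc = c[i][j]
                        for p in range(tk, tk + tile):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


# Usage: small integer matrices so the comparison is exact.
a = [[(i + 2 * p) % 7 for p in range(48)] for i in range(32)]
b = [[(3 * p - j) % 5 for j in range(32)] for p in range(48)]
c_blocked = blocked_gemm(a, b)
c_naive = naive_gemm(a, b)
```

The innermost three loops are the part a matrix instruction would replace; the outer loops are the block-wise decomposition the quoted text refers to.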


 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
Oh so apparently tensor cores are not so useless now :roll:
 
Joined
Sep 17, 2014
Messages
22,666 (6.05/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Well well, so the consensus is moving towards dedicated hardware.

Let's see where RDNA3's power budget goes...

I need to read better it seems
 
Joined
May 17, 2021
Messages
3,042 (2.31/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Perless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200Mhz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
Oh so apparently tensor cores are not so useless now :roll:

they are if you don't have them, they aren't if you have them :D
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
This is good news. I hope AMD will show us how much the ray-tracing performance is improved from RDNA 2 to RDNA 3 - should be multiple times because RDNA 2's RT performance was abysmal.
 
Joined
May 17, 2021
Messages
3,042 (2.31/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Perless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200Mhz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
This is good news. I hope AMD will show us how much the ray-tracing performance is improved from RDNA 2 to RDNA 3 - should be multiple times because RDNA 2's RT performance was abysmal.

Ray tracing is a joke anyway, no one is missing much for not having it. It's true we haven't had many AAA releases, but still, after so long we have only a few games that have done anything meaningful with it. Seems more like a must-have buzzword for the box than anything else.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.61/day)
Location
Ex-usa | slava the trolls
Ray tracing is a joke anyway, no one is missing much for not having it. It's true we haven't had many AAA releases, but still, after so long we have only a few games that have done anything meaningful with it. Seems more like a must-have buzzword for the box than anything else.

It is not a joke because it makes the Nvidia cards fly off the shelves - the Nvidia users claim it's better because:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...
 
Joined
Oct 27, 2009
Messages
1,190 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
This is the third generation of matrix cores from AMD, CDNA 1/2 now getting added to the consumer line as they unify a certain amount of features for ROCm support with CDNA3 and RDNA3.
 
Joined
Sep 17, 2014
Messages
22,666 (6.05/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
It is not a joke because it makes the Nvidia cards fly off the shelves - the Nvidia users claim it's better because:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...
You mean flying off the pallets to miners...?

I haven't seen anything on shelves for a looong time tbh. It's just recently that we're getting some semblance of normal availability back, and as usual, Nvidia is faster in restocking the sales channels.
 
D

Deleted member 185088

Guest
Oh so apparently tensor cores are not so useless now :roll:
They are not useless per se, as they do something, especially for Nvidia, which I believe just wants to focus on professional applications of ML. For games I still think they are useless. Games IMO still look awful, mainly due to low polygon counts and textures. I don't see the point of having pretty lighting but cubes instead of mugs or balls, and textures are of such low quality (though VRAM is more important).
Wasting silicon for special hardware just for some ML isn't the right way, once we achieve perfect geometry then I'm all for it.
 
Joined
Mar 10, 2010
Messages
11,878 (2.20/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Oh so apparently tensor cores are not so useless now :roll:
Except unlike Nvidia, AMD seem to have not gone with separate fixed-function hardware, and instead use the 64-wide wavefronts they already had, via instructions and likely bigger registers.

More information is required though tbf, but this doesn't sound like specialised fixed-function hardware like tensor cores to me, just optimised use of what their SIMD array could theoretically do.


"rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts.".


As they say.
 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
They are not useless per se, as they do something, especially for Nvidia, which I believe just wants to focus on professional applications of ML. For games I still think they are useless. Games IMO still look awful, mainly due to low polygon counts and textures. I don't see the point of having pretty lighting but cubes instead of mugs or balls, and textures are of such low quality (though VRAM is more important).
Wasting silicon for special hardware just for some ML isn't the right way, once we achieve perfect geometry then I'm all for it.

Just don't play games then, wait until you are on your deathbed, I'm sure games will look amazing then
 
Joined
Sep 17, 2014
Messages
22,666 (6.05/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Except unlike Nvidia, AMD seem to have not gone with separate fixed-function hardware, and instead use the 64-wide wavefronts they already had, via instructions and likely bigger registers.

More information is required though tbf, but this doesn't sound like specialised fixed-function hardware like tensor cores to me, just optimised use of what their SIMD array could theoretically do.


"rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts.".


As they say.

Wow, you read better than I did.
 
Joined
Mar 6, 2012
Messages
569 (0.12/day)
Processor i5 4670K - @ 4.8GHZ core
Motherboard MSI Z87 G43
Cooling Thermalright Ultra-120 *(Modded to fit on this motherboard)
Memory 16GB 2400MHZ
Video Card(s) HD7970 GHZ edition Sapphire
Storage Samsung 120GB 850 EVO & 4X 2TB HDD (Seagate)
Display(s) 42" Panasonice LED TV @120Hz
Case Corsair 200R
Audio Device(s) Xfi Xtreme Music with Hyper X Core
Power Supply Cooler Master 700 Watts
It is not a joke because it makes the Nvidia cards fly off the shelves - the Nvidia users claim it's better because:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...
RTX 3070 user here, they are useless for me.
 
Joined
Apr 8, 2008
Messages
342 (0.06/day)
System Name Xajel Main
Processor AMD Ryzen 7 5800X
Motherboard ASRock X570M Steel Legened
Cooling Corsair H100i PRO
Memory G.Skill DDR4 3600 32GB (2x16GB)
Video Card(s) ZOTAC GAMING GeForce RTX 3080 Ti AMP Holo
Storage (OS) Gigabyte AORUS NVMe Gen4 1TB + (Personal) WD Black SN850X 2TB + (Store) WD 8TB HDD
Display(s) LG 38WN95C Ultrawide 3840x1600 144Hz
Case Cooler Master CM690 III
Audio Device(s) Built-in Audio + Yamaha SR-C20 Soundbar
Power Supply Thermaltake 750W
Mouse Logitech MK710 Combo
Keyboard Logitech MK710 Combo (M705)
Software Windows 11 Pro
Oh so apparently tensor cores are not so useless now :roll:

AMD's idea leans against a fixed-function ASIC (like the tensor cores), which can be considered somewhat of a waste of silicon because the die area will sit idle most of the time; they like to make the silicon do other things as well.

A single-function ASIC is easier to make and faster to implement, that's why NV didn't have to do more and could add it quicker.

AMD's way is harder and needs more engineering work, so even after NV announced it, they took some time to implement a similar function, but the benefit is greater as they're reusing mostly the same silicon die space they had before. It's just tweaked to do more specialised work while still being able to do other things at the same time, so it won't be like a fixed-function ASIC that can only do a single thing.

It's like saying, NV is adding 15% more die area to have this function. AMD took 2 more years but they only needed to have 5% more die area. And they might be able to use it for future uses as well for other things.
 
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
Just don't play games then, wait until you are on your deathbed, I'm sure games will look amazing then
If you say so :slap:
When your whole life flashes before your eyes, how much of it do you want to not have ray tracing?
Never gets old, does it :laugh:
 
Joined
Mar 10, 2010
Messages
11,878 (2.20/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Wow, you read better than I did.
Well, back when Rapid Packed Math was introduced I couldn't believe they had not also incorporated this. You have a 64-wide wavefront that can already do multiple math ops on one wave in one pass, so why not do quadratics like that? Clearly patience was needed by me.
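
For context, Rapid Packed Math refers to operating on two FP16 values packed into one 32-bit register lane. A rough Python illustration of that packing follows; the helper names are made up for this sketch, and it models only the data layout, not how the GPU executes the operation.

```python
# Illustrative sketch only: two FP16 values sharing one 32-bit word.
# pack_half2, unpack_half2 and packed_add are hypothetical helper names.
import struct


def pack_half2(lo, hi):
    """Pack two floats as FP16 into one 32-bit word (lo in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 values as Python floats."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(x, y):
    """Add both FP16 halves of x and y, returning a new packed word."""
    x_lo, x_hi = unpack_half2(x)
    y_lo, y_hi = unpack_half2(y)
    return pack_half2(x_lo + y_lo, x_hi + y_hi)


# Usage: one packed_add call updates two values at once.
x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, 4.0)
result = unpack_half2(packed_add(x, y))
```

The values here were picked to be exactly representable in FP16, so no rounding shows up in the result.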
 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
AMD's idea leans against a fixed-function ASIC (like the tensor cores), which can be considered somewhat of a waste of silicon because the die area will sit idle most of the time; they like to make the silicon do other things as well.

A single-function ASIC is easier to make and faster to implement, that's why NV didn't have to do more and could add it quicker.

AMD's way is harder and needs more engineering work, so even after NV announced it, they took some time to implement a similar function, but the benefit is greater as they're reusing mostly the same silicon die space they had before. It's just tweaked to do more specialised work while still being able to do other things at the same time, so it won't be like a fixed-function ASIC that can only do a single thing.

It's like saying, NV is adding 15% more die area to have this function. AMD took 2 more years but they only needed to have 5% more die area. And they might be able to use it for future uses as well for other things.

Same with RT then, AMD's implementation of RT is just weak sauce.

Tensor cores are on their 4th gen with Ada now, and probably take less than 5% die space.

If you say so :slap:

Never gets old, does it :laugh:

Well if money is everything to you, then why are you spending it on useless PC stuff anyway?
 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
What do you mean? Selling someone/anything on that kind of sales pitch is just bad period ~ I'd rather see (all) wars end in my lifetime than be hung up on "real-time" ray tracing!

And yes all of us can do little things to make that day come forth.

well IMO Gamersnexus was being a dumbass for calling that article out, knowing now that those GPUs get to keep insane resale value 3.5 years after launch
used.jpg
 
Joined
Feb 22, 2022
Messages
109 (0.11/day)
System Name Lexx
Processor Threadripper 2950X
Motherboard Asus ROG Zenith Extreme
Cooling Custom Water
Memory 32/64GB Corsair 3200MHz
Video Card(s) Liquid Devil 6900XT
Storage 4TB Solid State PCI/NVME/M.2
Display(s) LG 34" Curved Ultrawide 160Hz
Case Thermaltake View T71
Audio Device(s) Onboard
Power Supply Corsair 1000W
Mouse Logitech G502
Keyboard Asus
VR HMD NA
Software Windows 10 Pro
Same with RT then, AMD's implementation of RT is just weak sauce.
Can we give up this A < B, therefore A must be crap mentality - it's getting boring.
 
Joined
Jan 8, 2017
Messages
9,504 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
You guys realize this is completely irrelevant for consumers, right ?
 
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
well IMO Gamersnexus was being a dumbass for calling that article out, knowing now that those GPUs get to keep insane resale value 3.5 years after launch
Purely in terms of resale value yes it's outdone probably any other dGPUs in the past, but then you forgot the backdrop? A one in 100 year global pandemic. As for your particular point about Tensor cores, correct me if I'm wrong, outside of DLSS are they actually that useful anywhere else? The way things are shaping up right now DLSS vs FSR will end up almost exactly as Gsync vs Freesync!

Unless of course Nvidia is willing to throw another billion or two each year for the next decade or so.
 
Joined
May 17, 2021
Messages
3,042 (2.31/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Perless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200Mhz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
You guys realize this is completely irrelevant for consumers, right ?

Nvidia spent time and money and even renamed its whole brand of GPUs, and AMD seems to try all it can to do some nice benchmarks, and yet for us consumers RTX is nothing, zero, a couple of games, a gimmick.

The time and money spent on this is absurd and they pile on
 