AMD WMMA Instruction is Direct Response to NVIDIA Tensor Cores

AleksandarK · Jun 30, 2022

AMD's RDNA3 graphics IP is just around the corner, and we are hearing more information about the upcoming architecture. Historically, as GPUs advance, it is not unusual for companies to add dedicated hardware blocks to accelerate a specific task. Today, AMD engineers have updated the backend of the LLVM compiler to include a new instruction called Wave Matrix Multiply-Accumulate (WMMA). This instruction will be present on GFX11, which is the RDNA3 GPU architecture. With WMMA, AMD will offer support for processing 16x16x16 tensors in FP16 and BF16 precision formats. With these instructions, AMD is adding new arrangements to support the processing of matrix multiply-accumulate operations, closely mimicking the work NVIDIA is doing with Tensor Cores.
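For readers unfamiliar with the operation, here is a minimal CPU-side sketch of a single 16x16x16 matrix multiply-accumulate, D = A x B + C. It is not AMD's implementation and says nothing about how the hardware schedules the work. The names `tile_mma` and `to_fp16` are made up for illustration, and rounding the inputs to FP16 while accumulating in full-precision Python floats is an assumption of this sketch.

```python
import struct

TILE = 16  # the 16x16x16 shape mentioned in the article


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def tile_mma(a, b, c):
    """Return D = A x B + C for 16x16 tiles given as lists of rows.

    A and B are rounded to FP16 first; accumulation uses Python floats
    (an assumption for illustration, not a statement about the hardware).
    """
    d = [row[:] for row in c]
    for i in range(TILE):
        for j in range(TILE):
            acc = c[i][j]
            for k in range(TILE):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d


# Identity x ones + ones: every element of the result should be 2.0.
identity = [[1.0 if i == j else 0.0 for j in range(TILE)] for i in range(TILE)]
ones = [[1.0] * TILE for _ in range(TILE)]
result = tile_mma(identity, ones, ones)
```

A matrix instruction performs this whole tile operation as one unit of work instead of leaving the three nested loops to ordinary scalar or vector code.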

The AMD ROCm 5.2 API update describes the use case for this type of instruction, as quoted below:

rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.

rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
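To make the "fragments" and "block-wise operations" idea concrete, here is a rough CPU-side sketch of splitting a larger GEMM into 16x16 tiles. It does not use the rocWMMA API. `mma_block` and `blocked_gemm` are hypothetical names, and the sketch assumes all matrix dimensions are multiples of 16.

```python
TILE = 16  # fragment edge length, matching the 16x16x16 shape in the article


def mma_block(a, b, c, i0, j0, k0):
    """Accumulate the product of one A block and one B block into c in place."""
    for i in range(i0, i0 + TILE):
        for j in range(j0, j0 + TILE):
            acc = c[i][j]
            for k in range(k0, k0 + TILE):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def blocked_gemm(a, b):
    """Compute C = A x B by walking 16x16 output tiles."""
    m, k_dim, n = len(a), len(b), len(b[0])
    if len(a[0]) != k_dim:
        raise ValueError("inner dimensions do not match")
    if m % TILE or k_dim % TILE or n % TILE:
        raise ValueError("dimensions must be multiples of 16 in this sketch")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, TILE):
        for j0 in range(0, n, TILE):
            # Each (i0, j0) output tile depends on no other output tile, so
            # on a GPU different wavefronts could each own one of them.
            for k0 in range(0, k_dim, TILE):
                mma_block(a, b, c, i0, j0, k0)
    return c


# Small integer-valued inputs so the result is exact in floating point.
a = [[float((i + 2 * k) % 5) for k in range(48)] for i in range(32)]
b = [[float((3 * k + j) % 7) for j in range(16)] for k in range(48)]
c = blocked_gemm(a, b)
```

The two outer loops are the part that would be spread across wavefronts, while the inner `k0` loop is the chain of multiply-accumulate steps for a single output tile.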

nguyen · Jun 30, 2022

Oh so apparently tensor cores are not so useless now :roll:

Vayra86 · Jun 30, 2022

Well well, so the consensus is moving towards dedicated hardware.

Let's see where RDNA3's power budget goes...
I need to read better it seems

Bomby569 · Jun 30, 2022

nguyen said:
Oh so apparently tensor cores are not so useless now

they are if you don't have them, they aren't if you have them

ARF · Jun 30, 2022

This is good news. I hope AMD will show us how much the ray-tracing performance has improved from RDNA 2 to RDNA 3 - it should be multiple times better, because RDNA 2's RT performance was abysmal.

Bomby569 · Jun 30, 2022

ARF said:
This is good news. I hope AMD will show us how much the ray-tracing performance has improved from RDNA 2 to RDNA 3 - it should be multiple times better, because RDNA 2's RT performance was abysmal.

Ray tracing is a joke anyway, no one is missing much by not having it. It's true we haven't had many AAA releases, but still, after so long we have only a few games that have done anything meaningful with it. Seems more like a must-have buzzword for the box than anything else.

ARF · Jun 30, 2022

Bomby569 said:
Ray tracing is a joke anyway, no one is missing much by not having it. It's true we haven't had many AAA releases, but still, after so long we have only a few games that have done anything meaningful with it. Seems more like a must-have buzzword for the box than anything else.

It is not a joke, because it makes the Nvidia cards fly off the shelves - Nvidia users claim it's better because of:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...

Patriot · Jun 30, 2022

This is the third generation of matrix cores from AMD, after CDNA 1 and 2. They are now being added to the consumer line as AMD unifies a certain amount of features for ROCm support across CDNA3 and RDNA3.

Vayra86 · Jun 30, 2022

ARF said:
It is not a joke, because it makes the Nvidia cards fly off the shelves - Nvidia users claim it's better because of:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...

You mean flying off the pallets to miners...?

I haven't seen anything on shelves for a looong time tbh. It's only recently that we're getting some semblance of normal availability back, and as usual, Nvidia is faster at restocking the sales channels.

Deleted member 185088 · Jun 30, 2022

nguyen said:
Oh so apparently tensor cores are not so useless now

They are not useless per se, as they do something, especially for Nvidia, which I believe just wants to focus on professional applications of ML. For games I still think they are useless. Games IMO still look awful, mainly due to low polygon counts and textures. I don't see the point of having pretty lighting but cubes instead of mugs or balls, and textures are of such low quality (though VRAM is more important).
Wasting silicon on special hardware just for some ML isn't the right way. Once we achieve perfect geometry, then I'm all for it.

TheoneandonlyMrK · Jun 30, 2022

nguyen said:
Oh so apparently tensor cores are not so useless now

Except unlike Nvidia, AMD seems not to have gone with separate fixed-function hardware, and instead uses the 64-bit wavefronts they already had, via instructions and likely bigger registers.

More information is required though tbf, but this doesn't sound like specialised fixed-function hardware like tensor cores to me, just optimised use of what their SIMD array could theoretically do.

"rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts."

As they say.

nguyen · Jun 30, 2022

Xex360 said:
They are not useless per se, as they do something, especially for Nvidia, which I believe just wants to focus on professional applications of ML. For games I still think they are useless. Games IMO still look awful, mainly due to low polygon counts and textures. I don't see the point of having pretty lighting but cubes instead of mugs or balls, and textures are of such low quality (though VRAM is more important).
Wasting silicon on special hardware just for some ML isn't the right way. Once we achieve perfect geometry, then I'm all for it.

Just don't play games then. Wait until you are on your deathbed, I'm sure games will look amazing by then.

Vayra86 · Jun 30, 2022

TheoneandonlyMrK said:
Except unlike Nvidia, AMD seems not to have gone with separate fixed-function hardware, and instead uses the 64-bit wavefronts they already had, via instructions and likely bigger registers.

More information is required though tbf, but this doesn't sound like specialised fixed-function hardware like tensor cores to me, just optimised use of what their SIMD array could theoretically do.

"rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts."

As they say.

Wow, you read better than I did.

jigar2speed · Jun 30, 2022

ARF said:
It is not a joke, because it makes the Nvidia cards fly off the shelves - Nvidia users claim it's better because of:
1. Ray-tracing;
2. DLSS;
3. CUDA, tensor, etc...

RTX 3070 user here, they are useless for me.

ARF · Jun 30, 2022

jigar2speed said:
RTX 3070 user here, they are useless for me.

Sell the card and buy a 16 GB Radeon RX 6800 XT or a 12 GB Radeon RX 6700 XT.

Why did you buy it in the first place?

Xajel · Jun 30, 2022

nguyen said:
Oh so apparently tensor cores are not so useless now

AMD's approach leans away from a fixed-function ASIC (like the tensor cores), which can be considered something of a waste of silicon because that die area will sit idle most of the time. They prefer to make the silicon do other things as well.

A single-function ASIC is easier to make and faster to implement, that's why NV didn't have to do more and could add it quicker.

AMD's way is harder and needs more engineering work, so even after NV announced it, they took some time to implement a similar function. The benefit is bigger though, as they're mostly reusing the same die space they had before. It's just tweaked to do more specialised work while still being able to do other things at the same time, so it won't be like a fixed-function ASIC that can only do a single thing.

It's like saying, NV is adding 15% more die area to have this function. AMD took 2 more years but they only needed to have 5% more die area. And they might be able to use it for future uses as well for other things.

R0H1T · Jun 30, 2022

nguyen said:
Just don't play games then, wait until you are on your deathbed, I'm sure games will look amazing then

If you say so :slap:

When your whole life flashes before your eyes, how much of it do you want to not have ray tracing?

Just Buy It: Why Nvidia RTX GPUs Are Worth the Money

Some enthusiasts are upset about the price of the Nvidia RTX 2080 / 2070, but the opportunity cost of sticking with old technology is higher.

www.tomshardware.com

Never gets old, does it :laugh:

TheoneandonlyMrK · Jun 30, 2022

Vayra86 said:
Wow, you read better than I did.

Well, back when Rapid Packed Math was introduced I couldn't believe they had not also incorporated this. You have a 64-bit wavefront that can already do multiple math ops on one wave in one pass, so why not do quadratics like that? Clearly patience was needed on my part.

nguyen · Jun 30, 2022

Xajel said:
AMD's approach leans away from a fixed-function ASIC (like the tensor cores), which can be considered something of a waste of silicon because that die area will sit idle most of the time. They prefer to make the silicon do other things as well.

A single-function ASIC is easier to make and faster to implement, that's why NV didn't have to do more and could add it quicker.

AMD's way is harder and needs more engineering work, so even after NV announced it, they took some time to implement a similar function. The benefit is bigger though, as they're mostly reusing the same die space they had before. It's just tweaked to do more specialised work while still being able to do other things at the same time, so it won't be like a fixed-function ASIC that can only do a single thing.

It's like saying, NV is adding 15% more die area to have this function. AMD took 2 more years but they only needed to have 5% more die area. And they might be able to use it for future uses as well for other things.

Same with RT then, AMD's implementation of RT is just weak sauce.

Tensor cores are on their 4th gen with Ada now, and probably take less than 5% of die space.

R0H1T said:
If you say so

Just Buy It: Why Nvidia RTX GPUs Are Worth the Money

Some enthusiasts are upset about the price of the Nvidia RTX 2080 / 2070, but the opportunity cost of sticking with old technology is higher.

www.tomshardware.com

Never gets old, does it

Well, if money is everything to you, then why are you spending it on useless PC stuff anyway?

R0H1T · Jun 30, 2022

What do you mean? Selling someone anything on that kind of sales pitch is just bad, period ~ I'd rather see (all) wars end in my lifetime than be hung up on "real-time" ray tracing!

And yes all of us can do little things to make that day come forth.

nguyen · Jun 30, 2022

R0H1T said:
What do you mean? Selling someone anything on that kind of sales pitch is just bad, period ~ I'd rather see (all) wars end in my lifetime than be hung up on "real-time" ray tracing!

And yes all of us can do little things to make that day come forth.

Well, IMO Gamers Nexus was being a dumbass for calling that article out, knowing now that those GPUs have kept insane resale value 3.5 years after launch.

beedoo · Jun 30, 2022

nguyen said:
Same with RT then, AMD's implementation of RT is just weak sauce.

Can we give up this A < B, therefore A must be crap mentality - it's getting boring.

Vya Domus · Jun 30, 2022

You guys realize this is completely irrelevant for consumers, right?

R0H1T · Jun 30, 2022

nguyen said:
Well, IMO Gamers Nexus was being a dumbass for calling that article out, knowing now that those GPUs have kept insane resale value 3.5 years after launch.

Purely in terms of resale value, yes, it has probably outdone any other dGPU in the past, but did you forget the backdrop? A once-in-100-years global pandemic. As for your particular point about Tensor cores, correct me if I'm wrong, but outside of DLSS are they actually that useful anywhere else? The way things are shaping up right now, DLSS vs FSR will end up almost exactly like G-Sync vs FreeSync!

Unless of course Nvidia is willing to throw another billion or two each year for the next decade or so.

Bomby569 · Jun 30, 2022

Vya Domus said:
You guys realize this is completely irrelevant for consumers, right?

Nvidia spent time and money and even renamed its whole GPU brand, and AMD seems to be trying all it can to post some nice benchmarks, and yet for us consumers RTX is nothing, zero, a couple of games, a gimmick.

The time and money spent on this is absurd, and they keep piling on.
