
CHOO CHOOOOO!!!!1! Navi Hype Train be rollin'

Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Gskill Trident Z 3900cas18 32Gb in four sticks./16Gb/16GB
Video Card(s) Asus tuf RX7900XT /Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores laptop Timespy 6506
Rapid Packed Math is really simple: the FP32 FPUs can alternatively handle 2xFP16 in the same space/cycle.
Well, that's its initial implementation. Later versions support lower bit ranges like 4x16-bit, 8x8-bit and 16x4-bit, and that's through 64-bit wavefronts, not 32. On 32-bit jobs it can still put through 2x.

This is why GCN isn't changing as soon as some would like.
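To make the "2xFP16 in the same space" idea concrete, here is a minimal sketch in Python, not how the hardware is wired. It only shows that two half-precision values fit in one 32-bit register-sized word and that a single packed operation yields two results. The function names (`pack_half2`, `unpack_half2`, `packed_add`) are made up for illustration.

```python
import struct

def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word (lo in bits 0-15, hi in bits 16-31)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))

def packed_add(a: int, b: int) -> int:
    """Add two packed words lane by lane: one operation, two FP16 results."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)

x = pack_half2(1.5, -2.0)
y = pack_half2(0.25, 4.0)
print(unpack_half2(packed_add(x, y)))  # (1.75, 2.0)
```

The word stays 32 bits wide throughout, which is why the gain shows up as both throughput and register savings when full float precision is not needed.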
 
Joined
Feb 3, 2017
Messages
3,877 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Well, that's its initial implementation. Later versions support lower bit ranges like 4x16-bit, 8x8-bit and 16x4-bit, and that's through 64-bit wavefronts, not 32. On 32-bit jobs it can still put through 2x.
This is why GCN isn't changing as soon as some would like.
Vega already has 1xFP32, 2xFP16, 4xINT8 and 8xINT4, so does Turing. Pascal should have everything besides 2xFP16.
Lower bit ranges have quite limited utility though, and they have not really been used much outside of some ML applications.
 
Joined
May 31, 2016
Messages
4,473 (1.42/day)
Location
Currently Norway
System Name Bro2
Processor Ryzen 5800X
Motherboard Gigabyte X570 Aorus Elite
Cooling Corsair h115i pro rgb
Memory 32GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s) Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s) LG 27UD69 UHD / LG 27GN950
Case Fractal Design G
Audio Device(s) Realtec 5.1
Power Supply Seasonic 750W GOLD
Mouse Logitech G402
Keyboard Logitech slim
Software Windows 10 64 bit
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
Vega already has 1xFP32, 2xFP16, 4xINT8 and 8xINT4, so does Turing. Pascal should have everything besides 2xFP16.
Lower bit ranges have quite limited utility though, and they have not really been used much outside of some ML applications.
They, meaning Nvidia, do not have RPM. They can do all of it, but some of it is done with special hardware, i.e. tensor or RT cores, and some is done by CUDA cores, so they're not doing it the same way at all.

I have a Vega, I know what it can do.
 

AlienIsGOD

Vanguard Beta Tester
Joined
Aug 9, 2008
Messages
5,120 (0.85/day)
Location
Kingston, Ontario Canada
System Name Aliens Ryzen Rig | 2nd Hand Omen
Processor Ryzen R5 5600 | Ryzen R5 3600
Motherboard Gigabyte B450 Aorus Elite (F61 BIOS) | B450 matx
Cooling DeepCool Castle EX V2 240mm AIO| stock for now
Memory 8GB X 2 DDR4 3000mhz Team Group Vulcan | 16GB DDR4
Video Card(s) Sapphire Pulse RX 5700 8GB | GTX 1650 4GB
Storage Adata XPG 8200 PRO 512GB SSD OS / 240 SSD + 2TB M.2 SSD Games / 1000 GB Data | SSD + HDD
Display(s) Acer Nitro x27OU 27" VA 165hz Freesync Premium|TCL 32" 1080P w/ HDR
Case NZXT H500 Black | HP Omen Obelisk
Audio Device(s) Onboard Realtek | Onboard Realtek
Power Supply EVGA SuperNOVA G3 650w 80+ Gold | 500w
Mouse Steelseries Rival 500 15 button mouse w/ Razor Goliathus Chroma XL mousemat | Logitech G502
Keyboard Corsair K65 Mini w/ Cherry MX brown keys | Logitech G513 Carbon w/ Romer G tactile keys
Software Windows 10 Pro | Windows 10 Pro
CHOO CHOOOOO!!!!1! Navi Hype Train be rollin'

looks like thread title was written by a 5 year old.....
 
Joined
Feb 3, 2017
Messages
3,877 (1.33/day)
They, meaning Nvidia, do not have RPM. They can do all of it, but some of it is done with special hardware, i.e. tensor or RT cores, and some is done by CUDA cores, so they're not doing it the same way at all.
Yes, Nvidia has a different implementation. Does it matter all that much as long as the same featureset is there?
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
Yes, Nvidia has a different implementation. Does it matter all that much as long as the same featureset is there?
It does to Nvidia and AMD, but not so much to us, no.
But in saying that, Nvidia are making quite a big deal at the moment about what their special hardware can do, aren't they?
 
Joined
Feb 3, 2017
Messages
3,877 (1.33/day)
But in saying that, Nvidia are making quite a big deal at the moment about what their special hardware can do, aren't they?
Well, it depends on the context or features/hardware in question.
A couple of operations Nvidia implemented in hardware as RT Cores do seem to be somewhat worth hyping: doable in shaders, definitely, but RT Cores are clearly much more efficient at them.
Tensor cores are a question mark, and it looks like Nvidia has been somewhat hush-hush about what these actually do. For example, the fact that FP16 is done (or can be done) on Tensor cores is worth noting, but of the bigger sites Anandtech was the one that caught wind of it, in their TU116 review. I would say this is interesting.
 
Joined
Mar 10, 2015
Messages
3,984 (1.10/day)
System Name Wut?
Processor 3900X
Motherboard ASRock Taichi X570
Cooling Water
Memory 32GB GSkill CL16 3600mhz
Video Card(s) Vega 56
Storage 2 x AData XPG 8200 Pro 1TB
Display(s) 3440 x 1440
Case Thermaltake Tower 900
Power Supply Seasonic Prime Ultra Platinum
CHOO CHOOOOO!!!!1! Navi Hype Train be rollin'

looks like thread title was written by a 5 year old.....

I am 7 actually, sheesh. Age discrimination. The thread was supposed to be fun (and a joke) because everybody is salty as fuck. Like you. Carry on.
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
Well, it depends on the context or features/hardware in question.
A couple of operations Nvidia implemented in hardware as RT Cores do seem to be somewhat worth hyping: doable in shaders, definitely, but RT Cores are clearly much more efficient at them.
Tensor cores are a question mark, and it looks like Nvidia has been somewhat hush-hush about what these actually do. For example, the fact that FP16 is done (or can be done) on Tensor cores is worth noting, but of the bigger sites Anandtech was the one that caught wind of it, in their TU116 review. I would say this is interesting.
So it does matter, but only if it's Nvidia lauding it. Anywho.

In the context of this thread we probably need to get more on topic.
 

AlienIsGOD

Vanguard Beta Tester
Joined
Aug 9, 2008
Messages
5,120 (0.85/day)
Location
Kingston, Ontario Canada
I am 7 actually, sheesh. Age discrimination. The thread was supposed to be fun (and a joke) because everybody is salty as fuck. Like you. Carry on.
LOL I'm not salty, just wish ppl could act and write more adult like.... This site has gone downhill forum wise the last few years...
 
Joined
Mar 10, 2015
Messages
3,984 (1.10/day)
LOL I'm not salty, just wish ppl could act and write more adult like.

I'm sorry you couldn't see the joke that it was. I have ordered a happy meal for you.
 
Joined
May 15, 2014
Messages
235 (0.06/day)
Tensor cores are a question mark, and it looks like Nvidia has been somewhat hush-hush about what these actually do. For example, the fact that FP16 is done (or can be done) on Tensor cores is worth noting, but of the bigger sites Anandtech was the one that caught wind of it, in their TU116 review. I would say this is interesting.

For RTX TU, FP16 is exclusively a tensor op. GTX TU FP16 is interesting, given that it has no tensor cores according to NV. I'm not entirely convinced the hardware is very different. The TU SM layout is more tightly packed than GP's, but the RTX/Tensor silicon appears to be only ~10% of the die. The TU uarch consumes more area even without the RTX pipeline. Given that RTX features only make sense with a minimum raster performance level (2060), I wouldn't be surprised if GTX TU had similar hardware but limited to FP16 ops. The big benefit of RTX tensor cores IMO is the FP32 accumulate for data science.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
43,255 (6.74/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
I am 7 actually, sheesh. Age discrimination. The thread was supposed to be fun (and a joke) because everybody is salty as fuck. Like you. Carry on.

should be in General Nonsense
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
I'm not sure where you are going with this, but thanks for the tip, and that's what I said: mixed precision. Anyway, my confusion with you is about a different matter. Let me ask you straight: I understand that tensor cores are AI or deep learning for you, or did I just understand you wrong? Because that's my impression.
The add is the only one that supports FP32 and the reason for that is so that it is less likely to overflow the FP16*FP16 result. The main point (and why it is good for AI) is that it is a matrix solver for tensor flow. AMD doesn't have a matrix solver. GCN has to do these calculations on the shaders which is much, much slower. Example: Vega can do about 24 TFLOP FP16; Volta can do over 100 TFLOP FP16 in its tensor cores alone.

They, meaning Nvidia, do not have RPM. They can do all of it, but some of it is done with special hardware, i.e. tensor or RT cores, and some is done by CUDA cores, so they're not doing it the same way at all.

I have a Vega, I know what it can do.
NVIDIA added parallelism to deal with the problem in Turing where AMD made Vega more flexible. As a result, Turing has a lot of transistors but more performance where Vega has fewer transistors but less performance.

AMD is going to want to compete in AI so AMD is going to have to add tensor cores eventually but I don't think that is in Navi because it was made for Sony who has no use for it.
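The point about the FP32 add guarding against overflow of the FP16*FP16 result can be illustrated with a small sketch. This is plain Python emulating the rounding, not tensor core code; the helper names are hypothetical. The largest finite FP16 value is 65504, so products like 256 x 256 already fall outside the half-precision range.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return float("inf") if x > 0 else float("-inf")

def to_fp32(x: float) -> float:
    """Round x to the nearest FP32 value (inputs here stay well inside FP32 range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp16_accumulate(a, b):
    """Dot product where products and the running sum are both kept in FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(to_fp16(x) * to_fp16(y)))
    return acc

def dot_fp32_accumulate(a, b):
    """Dot product with FP16 inputs but an FP32 running sum."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return acc

a = [256.0] * 4
b = [256.0] * 4
print(dot_fp16_accumulate(a, b))  # inf
print(dot_fp32_accumulate(a, b))  # 262144.0
```

Same inputs, same multiplies; only the width of the accumulator changes whether the result survives.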
 
Joined
May 15, 2014
Messages
235 (0.06/day)
NVIDIA added parallelism to deal with the problem in Turing where AMD made Vega more flexible. As a result, Turing has a lot of transistors but more performance where Vega has fewer transistors but less performance.

Not entirely the same. GCN makes no distinction between graphics & compute modes & can schedule concurrently. TU is better at this than GP et al, but parallelism is a function of running integer & floats at the same time. Just highlights the different uarch approaches. NV prefers discrete specialized silicon costing more die space, whereas AMD (til now) has preferred generalist alus.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
Turing doesn't sacrifice anything (other than die space) for concurrent FP16 performance. Vega gets FP16 performance by taking away from FP32 performance. This is a disadvantage for Vega and an advantage for Turing when it comes to anything that can benefit from FP16.
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
The add is the only one that supports FP32 and the reason for that is so that it is less likely to overflow the FP16*FP16 result. The main point (and why it is good for AI) is that it is a matrix solver for tensor flow. AMD doesn't have a matrix solver. GCN has to do these calculations on the shaders which is much, much slower. Example: Vega can do about 24 TFLOP FP16; Volta can do over 100 TFLOP FP16 in its tensor cores alone.


NVIDIA added parallelism to deal with the problem in Turing where AMD made Vega more flexible. As a result, Turing has a lot of transistors but more performance where Vega has fewer transistors but less performance.

AMD is going to want to compete in AI so AMD is going to have to add tensor cores eventually but I don't think that is in Navi because it was made for Sony who has no use for it.
Nvidia couldn't easily put back 64-bit compute, so they had to go with special hardware. They added tensor cores after Google ditched their GPUs for its own tensor ASIC.

And just look how much use their specific hardware is generally: it's useless.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
For games, mostly. Navi is a gaming product which is why I don't think it will have tensor cores. I would be shocked if Arcturus didn't have tensor cores because AMD is so far behind in machine learning. Then again, companies like Tesla are designing their own chips for machine learning anyway.

Point is: RPM doesn't help much with tensor flow where RTX's tensor cores do. DLSS isn't something Navi will have because it will lack the hardware to do it effectively.
 
Joined
Oct 19, 2007
Messages
8,269 (1.31/day)
Processor Intel i9 9900K @5GHz w/ Corsair H150i Pro CPU AiO w/Corsair HD120 RBG fan
Motherboard Asus Z390 Maximus XI Code
Cooling 6x120mm Corsair HD120 RBG fans
Memory Corsair Vengeance RBG 2x8GB 3600MHz
Video Card(s) Asus RTX 3080Ti STRIX OC
Storage Samsung 970 EVO Plus 500GB , 970 EVO 1TB, Samsung 850 EVO 1TB SSD, 10TB Synology DS1621+ RAID5
Display(s) Corsair Xeneon 32" 32UHD144 4K
Case Corsair 570x RBG Tempered Glass
Audio Device(s) Onboard / Corsair Virtuoso XT Wireless RGB
Power Supply Corsair HX850w Platinum Series
Mouse Logitech G604s
Keyboard Corsair K70 Rapidfire
Software Windows 11 x64 Professional
Benchmark Scores Firestrike - 23520 Heaven - 3670
If patterns are anything to go by, the AMD hype train for their GPUs is going to crash.
 
Joined
May 15, 2014
Messages
235 (0.06/day)
Turing doesn't sacrifice anything (other than die space) for concurrent FP16 performance.

What does "concurrent fp16" even mean? You are aware that half floats & RPM (2xfp16) are used instead of fp32 to increase performance of ops not requiring full float precision? It's a register/resource & throughput gain in the case of 2xfp16. Int32, Int16, transcendentals, etc, still happen in the SM. TU "concurrency" is the ability to pack both integer & floats in the pipeline without bubbles/stalls/context switching.

Vega gets FP16 performance by taking away from FP32 performance. This is a disadvantage for Vega and an advantage for Turing when it comes to anything that can benefit from FP16.

Frightening. You should read the TU uarch & mixed precision white papers.

Point is: RPM doesn't help much with tensor flow where RTX's tensor cores do. DLSS isn't something Navi will have because it will lack the hardware to do it effectively.

Tensor math is just 4x4 matrix FMA. It's the ability of the tensors to work on fp16, int8, int4 that makes them useful in nn ML. I asked someone else earlier: what do you think DLSS is?
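For anyone who wants to see what "4x4 matrix FMA" means in practice, here is a rough Python sketch of D = A x B + C on 4x4 matrices with FP16 inputs and an FP32 accumulator. It emulates only the arithmetic (and only the FP16 path, not int8/int4); the function names are invented for the example and say nothing about how the silicon schedules the work.

```python
import struct

def fp16(x: float) -> float:
    """Round to the nearest half-precision value (inputs assumed in FP16 range)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp32(x: float) -> float:
    """Round to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def matrix_fma_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices: FP16 multiplicands, FP32 accumulate."""
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = fp32(c[i][j])
            for k in range(4):
                acc = fp32(acc + fp16(a[i][k]) * fp16(b[k][j]))
            d[i][j] = acc
    return d

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[float(4 * i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]

d = matrix_fma_4x4(a, identity, ones)
print(d[2])  # [9.0, 10.0, 11.0, 12.0]
```

That is 64 multiply-adds per 4x4 tile, which is the kind of work a neural network layer is made of.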
 
Joined
Apr 1, 2019
Messages
97 (0.05/day)
Location
Indonesia
Processor Reasonably good Intel CPU
Motherboard Eh, the cheapest ATX that supports the processor
Cooling Big ass Noctua always a good thing to have
Memory Cheapest 32GB kit for my needs
Video Card(s) Nvidia 3070
Storage NVME 2x, SSD 2x, HDD 1x
Display(s) Dual monitor 1080p for life!
Case NZXT Flow
Audio Device(s) ALC something-something
Power Supply Good ol' Corsair
Mouse Good ol' Corsair
Keyboard Cheap Logitech wireless keyboard
Software Windows 11 Pro
If patterns are anything to go by, the AMD hype train for their GPUs is going to crash.
AMD has a bad marketing team. Instead of quelling the rumors, they just let them spread like wildfire. I was one of the victims of the Radeon 7 hype train. The fall from the hype hurt so badly that I now consider EVERY rumor about Zen 2 and Navi as nothing but bad gossip. Frankly, I don't want to be part of a community that harbors and encourages the spreading of bad information.
 
Joined
Jul 10, 2010
Messages
1,235 (0.23/day)
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
If one googles GFX1010:
//On GFX10 I$ is 4 x 64 bytes cache lines. By default prefetcher keeps one cache line behind and reads two ahead. We can modify it with S_INST_PREFETCH for larger loops to have two lines behind and one ahead. Therefor we can benefit from aligning loop headers if loop fits 192 bytes. If loop fits 64 bytes it always spans no more than two cache lines and does not need an alignment. Else if loop is less or equal 128 bytes we do not need to modify prefetch, Else if loop is less or equal 192 bytes we need two lines behind.

-> L0 cache, which is referred to below.
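The loop alignment comment quoted above boils down to a size-based decision. A sketch of that decision in Python, assuming my reading of the comment is right: the function name and the returned keys are made up, and the "larger than 192 bytes" branch is an inference from "if loop fits 192 bytes", not something the comment states outright.

```python
CACHE_LINE_BYTES = 64  # from the quoted comment: I$ is 4 x 64-byte cache lines

def loop_prefetch_plan(loop_size_bytes: int) -> dict:
    """Decide whether to align a loop header and change the prefetch mode."""
    if loop_size_bytes <= CACHE_LINE_BYTES:
        # Spans at most two cache lines wherever it starts: nothing to do.
        return {"align_header": False, "modify_prefetch": False}
    if loop_size_bytes <= 2 * CACHE_LINE_BYTES:
        # Align, but the default (one line behind, two ahead) already covers it.
        return {"align_header": True, "modify_prefetch": False}
    if loop_size_bytes <= 3 * CACHE_LINE_BYTES:
        # Align and use S_INST_PREFETCH to keep two lines behind, one ahead.
        return {"align_header": True, "modify_prefetch": True}
    # Assumption: loops over 192 bytes gain nothing from this trick.
    return {"align_header": False, "modify_prefetch": False}

for size in (48, 100, 180, 400):
    print(size, loop_prefetch_plan(size))
```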

// In WGP mode the waves of a work-group can be executing on either CU of the WGP. Therefore need to invalidate the L0 which is per CU. Otherwise in CU mode and all waves of a work-group are on the same CU, and so the L0 does not need to be invalidated.

-> CU mode and WGP mode

// HWRC = Register destination cache
&
// Try to reassign registers on GFX10+ to reduce register bank conflicts.
// On GFX10 registers are organized in banks. VGPRs have 4 banks assigned in a round-robin fashion: v0, v4, v8... belong to bank 0. v1, v5, v9... to bank 1, etc. SGPRs have 8 banks and allocated in pairs, so that s0:s1, s16:s17, s32:s33 are at bank 0. s2:s3, s18:s19, s34:s35 are at bank 1 etc.
// The shader can read one dword from each of these banks once per cycle. If an instruction has to read more register operands from the same bank an additional cycle is needed. HW attempts to pre-load registers through input operand gathering, but a stall cycle may occur if that fails. For example V_FMA_F32 V111 = V0 + V4 * V8 will need 3 cycles to read operands, potentially incurring 2 stall cycles.
// The pass tries to reassign registers to reduce bank conflicts.
// In this pass bank numbers 0-3 are VGPR banks and 4-11 are SGPR banks, so that 4 has to be subtracted from an SGPR bank number to get the real value. This also corresponds to bit numbers in bank masks used in the pass.
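The bank layout in those comments is easy to check by hand. Below is a small Python sketch of the mapping and of the worst-case operand read cost; it follows the numbers in the quoted comments (4 VGPR banks round-robin, 8 SGPR banks allocated in pairs, SGPR banks numbered 4-11 inside the pass). The function names are my own, and the cycle count is the worst case, since the comment says hardware operand gathering may hide the stall.

```python
from collections import Counter

NUM_VGPR_BANKS = 4
NUM_SGPR_BANKS = 8
SGPR_BANK_OFFSET = 4  # pass-internal numbering: 0-3 are VGPR banks, 4-11 are SGPR banks

def vgpr_bank(index: int) -> int:
    """v0, v4, v8... -> bank 0; v1, v5, v9... -> bank 1; and so on."""
    return index % NUM_VGPR_BANKS

def sgpr_bank(index: int) -> int:
    """s0:s1, s16:s17... -> pass bank 4 (real bank 0); s2:s3... -> pass bank 5."""
    return (index // 2) % NUM_SGPR_BANKS + SGPR_BANK_OFFSET

def operand_read_cycles(banks) -> int:
    """One dword per bank per cycle, so the busiest bank sets the cycle count."""
    return max(Counter(banks).values(), default=0)

# V_FMA_F32 reading v0, v4, v8: all three operands sit in bank 0.
conflicting = [vgpr_bank(r) for r in (0, 4, 8)]
print(operand_read_cycles(conflicting))  # 3 cycles, i.e. up to 2 stall cycles

# Reassigned to v0, v5, v10: three different banks, one cycle.
reassigned = [vgpr_bank(r) for r in (0, 5, 10)]
print(operand_read_cycles(reassigned))  # 1
```

This is the whole motivation for the reassignment pass: the same instruction goes from three read cycles to one just by renaming registers.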

-> HWRC and banking are part of Super-SIMD patents;
https://patents.google.com/patent/US20180357064A1
https://patents.google.com/patent/US20180121386A1

//In one embodiment, each bank of the vector destination cache holds 4 entries, for a total 8 entries with 2 banks.
-> destination register cache // HWRC => 8 destination registers with 3-entry source operand forwarding.

//In one embodiment, source operands buffer holds up to 6 VALU instruction's source operands. In one embodiment, source operand buffer includes dedicated buffers for providing 3 different operands per clock cycle to serve instructions like a fused multiply-add operation which performs a*b+c.
-> source operand buffer => 6 * 3-entry source operand buffer
 