
AMD Radeon "Navi" OpenCL Bug Makes it Unfit for SETI@Home

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,300 (7.52/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
A bug in the OpenCL compute API ICD (installable client driver) for the Radeon RX 5700-series "Navi" GPUs is causing them to crunch incorrect results for the distributed computing project SETI@Home. Since there are "many" Navi GPUs crunching the project and cross-validating each other's incorrect results, the large volume of incorrect results is able to beat the platform's algorithm and pass statistical validation, "polluting" the SETI@Home database. Some volunteers at the SETI@Home forums, where the issue is being discussed, advocate banning or limiting results from contributors using these GPUs until AMD comes out with a fix for its OpenCL driver. SETI@Home is a distributed computing project run by SETI (Search for Extraterrestrial Intelligence), tapping into volunteers' compute power to make sense of radio waves from space.
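To make the failure mode concrete, here is a minimal sketch of quorum-style validation. It is a toy under stated assumptions, not SETI@Home's actual validator: the function name `validate`, the quorum of 2, the tolerance, and the sample values are all made up for illustration.

```python
def validate(results, quorum=2, tol=1e-6):
    """Return a value that at least `quorum` results agree on (within `tol`), else None.

    Hypothetical toy validator: it only checks that independent hosts agree
    with each other; it has no way of knowing which answer is actually right.
    """
    for candidate in results:
        agreeing = [r for r in results if abs(r - candidate) <= tol]
        if len(agreeing) >= quorum:
            return candidate
    return None


CORRECT = 42.0  # what a healthy host would return (made-up value)
BAD = 17.0      # the same wrong answer from every faulty host (made-up value)

print(validate([CORRECT, CORRECT]))  # 42.0 -> accepted
print(validate([CORRECT, BAD]))      # None -> no quorum, work unit would be reissued
print(validate([BAD, BAD]))          # 17.0 -> wrong result accepted
```

The last call is the case described above: when two hosts share the same systematic fault, agreement between them says nothing about correctness.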



 

Space Lynx

Astronaut
Joined
Oct 17, 2014
Messages
17,427 (4.68/day)
Location
Kepler-186f
Processor 7800X3D -25 all core
Motherboard B650 Steel Legend
Cooling Frost Commander 140
Video Card(s) Merc 310 7900 XT @3100 core -.75v
Display(s) Agon 27" QD-OLED Glossy 240hz 1440p
Case NZXT H710 (Red/Black)
Audio Device(s) Asgard 2, Modi 3, HD58X
Power Supply Corsair RM850x Gold
Thank goodness for the Scientific Method. :roll:
 
Joined
Jun 28, 2016
Messages
3,595 (1.16/day)
LOL
SETI@Home is fun and all, but this is a general problem in OpenCL. There's a suggestion that Navi has a bad FFT implementation.
So as of this moment Navi cards are unfit for almost all computing production systems... and rather pointless for development (even students).

And this shows up basically a week after W5700 launch.

Fun stuff.
 
Joined
Mar 11, 2008
Messages
982 (0.16/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 970PRO 500GB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
Is this project still going on?
Wow
 
Joined
Dec 22, 2011
Messages
3,890 (0.82/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
Navi gonna find aliens this way.
 
Joined
Jun 19, 2010
Messages
409 (0.08/day)
Location
Germany
Processor Ryzen 5600X
Motherboard MSI A520
Cooling Thermalright ARO-M14 orange
Memory 2x 8GB 3200
Video Card(s) RTX 3050 (ROG Strix Bios)
Storage SATA SSD
Display(s) UltraHD TV
Case Sharkoon AM5 Window red
Audio Device(s) Headset
Power Supply beQuiet 400W
Mouse Mountain Makalu 67
Keyboard MS Sidewinder X4
Software Windows, Vivaldi, Thunderbird, LibreOffice, Games, etc.
That's a feature to stop non-gaming misuse.
 
Joined
Jan 27, 2015
Messages
454 (0.13/day)
System Name Marmo / Kanon
Processor Intel Core i7 9700K / AMD Ryzen 7 5800X
Motherboard Gigabyte Z390 Aorus Pro WiFi / X570S Aorus Pro AX
Cooling Noctua NH-U12S x 2
Memory Corsair Vengeance 32GB 2666-C16 / 32GB 3200-C16
Video Card(s) KFA2 RTX3070 Ti / Asus TUF RX 6800XT OC
Storage Samsung 970 EVO+ 1TB, 860 EVO 1TB / Samsung 970 Pro 1TB, 970 EVO+ 1TB
Display(s) Dell AW2521HFA / U2715H
Case Fractal Design Focus G / Pop Air RGB
Audio Device(s) Onboard / Creative SB ZxR
Power Supply SeaSonic Focus GX 650W / PX 750W
Mouse Logitech MX310 / G1
Keyboard Logitech G413 / G513
Software Win 11 Ent
LOL
SETI@Home is fun and all, but this is a general problem in OpenCL. There's a suggestion that Navi has a bad FFT implementation.
So as of this moment Navi cards are unfit for almost all computing production systems... and rather pointless for development (even students).

And this shows up basically a week after W5700 launch.

Fun stuff.

Really?

Fourier Transform is one of the fundamentals for compute work. If AMD indeed screwed up its implementation at the hardware level, they would need a recall.
 
Joined
Jan 11, 2005
Messages
1,491 (0.20/day)
Location
66 feet from the ground
System Name 2nd AMD puppy
Processor FX-8350 vishera
Motherboard Gigabyte GA-970A-UD3
Cooling Cooler Master Hyper TX2
Memory 16 Gb DDR3:8GB Kingston HyperX Beast + 8Gb G.Skill Sniper(by courtesy of tabascosauz &TPU)
Video Card(s) Sapphire RX 580 Nitro+;1450/2000 Mhz
Storage SSD :840 pro 128 Gb;Iridium pro 240Gb ; HDD 2xWD-1Tb
Display(s) Benq XL2730Z 144 Hz freesync
Case NZXT 820 PHANTOM
Audio Device(s) Audigy SE with Logitech Z-5500
Power Supply Riotoro Enigma G2 850W
Mouse Razer copperhead / Gamdias zeus (by courtesy of sneekypeet & TPU)
Keyboard MS Sidewinder x4
Software win10 64bit ltsc
Benchmark Scores irrelevant for me
It's a special bug inserted by aliens so we can't find them :roll:
 
Joined
Jan 27, 2015
Messages
1,065 (0.29/day)
System Name loon v4.0
Processor i7-11700K
Motherboard asus Z590TUF+wifi
Cooling Custom Loop
Memory ballistix 3600 cl16
Video Card(s) eVga 3060 xc
Storage WD sn570 1tb(nvme) SanDisk ultra 2tb(sata)
Display(s) cheap 1080&4K 60hz
Case Roswell Stryker
Power Supply eVGA supernova 750 G6
Mouse eats cheese
Keyboard warrior!
Benchmark Scores https://www.3dmark.com/spy/21765182 https://www.3dmark.com/pr/1114767
As usual, AMD is late to the party anyhow.

2080TIs already found space invaders.
 
Joined
Jan 8, 2017
Messages
9,506 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Since there are "many" Navi GPUs crunching the project and cross-validating each other's incorrect results, the large volume of incorrect results is able to beat the platform's algorithm and pass statistical validation, "polluting" the SETI@Home database.

What? Why? That is by far the shittiest validation method I have ever heard of.
 
Joined
Jun 28, 2016
Messages
3,595 (1.16/day)
What? Why? That is by far the shittiest validation method I have ever heard of.
?
That's how science works. If most people on Earth do an experiment incorrectly, the bad result becomes statistically relevant (as in: not an obvious outlier).
There's no way to test this other than performing a different experiment on the same phenomenon.

In fact, that's why we're able to notice these issues in computational science.
There are different libraries that do equivalent math. And there are different CPUs and GPUs that we can compare.

If Navi was doing some computation incorrectly, but no other hardware was used, there would be no way to test for this error.
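As a rough illustration of comparing "different libraries that do equivalent math", the sketch below checks a radix-2 FFT against a direct DFT written from the definition. Everything here is an assumption made for the example: the pure-Python implementations, the function names, the test signal, and `buggy_fft`, which simulates a faulty implementation by corrupting one output bin. It does not reproduce the actual Navi fault.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform, straight from the definition."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def buggy_fft(x):
    """Simulated faulty implementation: one output bin is corrupted."""
    out = fft(x)
    out[1] += 1.0
    return out


def max_abs_diff(a, b):
    """Largest element-wise difference between two spectra."""
    return max(abs(p - q) for p, q in zip(a, b))


signal = [float((3 * i) % 7) for i in range(16)]
reference = dft(signal)

print(max_abs_diff(fft(signal), reference) < 1e-9)        # True
print(max_abs_diff(buggy_fft(signal), reference) < 1e-9)  # False
```

The comparison only flags the fault because the reference was computed by a different route; two copies of `buggy_fft` would agree with each other perfectly.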
 
Joined
Jul 16, 2014
Messages
8,219 (2.16/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Artic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steeseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
?
That's how science works. If most people on Earth do an experiment incorrectly, the bad result becomes statistically relevant (as in: not an obvious outlier).
There's no way to test this other than performing a different experiment on the same phenomenon.

In fact, that's why we're able to notice these issues in computational science.
There are different libraries that do equivalent math. And there are different CPUs and GPUs that we can compare.

If Navi was doing some computation incorrectly, but no other hardware was used, there would be no way to test for this error.
Hold the phone, something I can agree with you on? Nahhh :p

The only way to test correctly is to use other hardware. IIRC, they try not to send the validation to a similar system. They won't just discard the data; they'll save it and send it out again. I do agree they should suspend the 5700s for the time being.
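A minimal sketch of the "don't send the validation to a similar system" idea follows. It assumes the policy as recalled in the post above; the project's real scheduler logic is not described in this thread, and `Host`, `gpu_family`, and `pick_validator` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Host:
    name: str
    gpu_family: str  # hypothetical label, e.g. "navi", "polaris", "turing"


def pick_validator(first: Host, candidates: Sequence[Host]) -> Optional[Host]:
    """Pick a second host whose GPU family differs from the first host's.

    Returns None when every candidate shares the first host's GPU family,
    in which case the work unit would have to wait for a different host.
    """
    for host in candidates:
        if host.name != first.name and host.gpu_family != first.gpu_family:
            return host
    return None


navi_a = Host("navi-a", "navi")
navi_b = Host("navi-b", "navi")
polaris = Host("polaris-a", "polaris")

print(pick_validator(navi_a, [navi_b, polaris]))  # Host(name='polaris-a', gpu_family='polaris')
print(pick_validator(navi_a, [navi_b]))           # None
```

Pairing across GPU families means a fault specific to one family shows up as a disagreement instead of a matching pair of wrong results.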
 
Joined
Jan 8, 2017
Messages
9,506 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Yep, it is like saying Trump is a "nice person" because many people voted for him.

The way it works is that you collect experimental and validation data within the same experiment. Afterwards, when you have a model, you test it against the validation data, not against the output of another model, which is pretty much what SETI is doing with these results.
 
Joined
Nov 7, 2009
Messages
4,513 (0.82/day)
Location
Denmark
System Name The work PC /2700x/5950x
Processor 3900X stock/ 2700x stock/ 5950x 4200 MHz fixed @ 1,056-1,08V
Motherboard Gigabyte AORUS Master X570/2xMSI X470 M7 AC
Cooling Custom WC XSPC RX480, Laing DDC, XSPC Laing DDC Top V3 and EK Velocity/NH15/NH-U12S SE
Memory 32 GB Viper 3600/14 /16 GB Trident Z F4-4000C18D-16GTZSW 3600 /32 GB G Skill Flare CL14 3400
Video Card(s) 2070 Super X MSI/GTX 970 MSI/ GTX 970 MSI
Storage 1 TB SSD+500 GB NVMe / 500 GB SSD/ 2 TB 990 Pro
Display(s) Dell UltraSharp U2518D/2408WFP
Case Corsair 800D / Lian test bench/NZXT 500
Power Supply AX 850 Titanium/AX 860i/AX 760
Software Dual boot/Win 10 / Linux / Win 10+Linux
They don't work at F@H either....
 
Joined
Mar 18, 2008
Messages
5,717 (0.93/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
No surprise. OpenCL has been losing developer interest for a long time: small community, few resources, buggy GPU drivers, etc.

This is the case for almost all “Open Standard” computation acceleration frameworks. Not a lot of researchers like to invest their money and human resources into such things, for fear of being ripped off by bigger fish, since everything published is fair game to use. It is a damn shame though. OpenCL would have been a great alternative to CUDA.
 
Joined
Apr 21, 2010
Messages
578 (0.11/day)
System Name Home PC
Processor Ryzen 5900X
Motherboard Asus Prime X370 Pro
Cooling Thermaltake Contac Silent 12
Memory 2x8gb F4-3200C16-8GVKB - 2x16gb F4-3200C16-16GVK
Video Card(s) XFX RX480 GTR
Storage Samsung SSD Evo 120GB -WD SN580 1TB - Toshiba 2TB HDWT720 - 1TB GIGABYTE GP-GSTFS31100TNTD
Display(s) Cooler Master GA271 and AoC 931wx (19in, 1680x1050)
Case Green Magnum Evo
Power Supply Green 650UK Plus
Mouse Green GM602-RGB ( copy of Aula F810 )
Keyboard Old 12 years FOCUS FK-8100
No surprise. OpenCL has been losing developer interest for a long time: small community, few resources, buggy GPU drivers, etc.

This is the case for almost all “Open Standard” computation acceleration frameworks. Not a lot of researchers like to invest their money and human resources into such things, for fear of being ripped off by bigger fish, since everything published is fair game to use. It is a damn shame though. OpenCL would have been a great alternative to CUDA.

The issue is Navi, not other AMD cards. This problem has nothing to do with the OpenCL driver or anything, only Navi. Hold your breath, man, and read all the comments!!

I run a rx5700 and have noticed this issue. The task runs to completion and returns blatantly incorrect results. The only times when my rx5700 GPU gets a valid result is when it is validated against another AMD rx5700 series GPU (both get the wrong result). I've currently stopped my computer from accepting GPU work units (it took me way too long to realize something was wrong, sorry). I believe this is an issue with the Navi architecture and not necessarily solely with AMD's OpenCL driver, as I see older AMD GPUs still returning "correct" results.

Someone has to redo all the work units where the results came from Navi AMD GPUs (RX5700, RX 5700XT, RX 5500M, RX 5500), and ban all AMD Navi GPUs until a fix is found.

Interestingly, my RX5700 has not been causing issues with other projects, like Einstein@home, Milkyway@home, Collatz, etc. Something about Navi and OpenCL really does not like Seti@home.

If any of you need any testing or logs on an AMD RX5700, hit me up.

edit: Corrected OpenGl to OpenCl, thanks Keith Myers
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
2,043 (0.35/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA)
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case Cooler Master QUBE 500 Flatpack Macaron
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Keychron K2 HE Wireless / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Meta Quest 3 512GB
Software Windows 11 Pro 64-bit 24H2 Build 26100.2605
Hopefully the Adrenalin Pro drivers for the new Radeon Pro W5700 aren't affected by this, because this would be bad for its launch.

They probably prioritized fixing the random crashes in the drivers first before concentrating on GPGPU stuff.
 
Joined
Jun 28, 2016
Messages
3,595 (1.16/day)
They don't work at F@H either....
Until this is solved, we can safely assume Navi doesn't work in most popular computation scenarios.
Of course this can be fixed in software. Let's hope there will not be any performance penalty, because what would that mean for all the Navi supercomputers ordered? :D
 
Joined
Feb 25, 2016
Messages
292 (0.09/day)
Until this is solved, we can safely assume Navi doesn't work in most popular computation scenarios.
Of course this can be fixed in software. Let's hope there will not be any performance penalty, because what would that mean for all the Navi supercomputers ordered? :D
It's working fine with projects like Einstein@home, Milkyway@home, Collatz, etc. I know Seti@home isn't working fine. I'm not sure about F@H. And which supercomputers have ordered navi?
Vega is AMD's compute card atm. Arcturus is the upcoming compute card, which is more similar to Vega than Navi.
 
Joined
Feb 18, 2005
Messages
5,847 (0.81/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Razer Pro Type Ultra
Software Windows 10 Professional x64
LMAO ouch:

Keith Myers from SETI@home forums said:
The new Navi 5700 and 5700XT are useless for compute currently. The drivers are not ready for compute. All projects that rely on AMD OpenCL drivers are producing nothing but garbage results and invalids. The AMD developers and the Khronos group are aware of the problem but not a peep from either of them about what the real problem is or when to expect a fix. In the meantime, I think those cards should be banned until the drivers are fixed for all projects.

and:

Keith Myers from SETI@home forums said:
Phoronix did testing and reviews of the RX 5700XT and could not get the card and drivers to pass the OpenCL parts of their standardized test suite.
 
Joined
Dec 30, 2010
Messages
2,200 (0.43/day)
Could be pretty much due to running maths on a consumer graphics card instead of a pro version. Many of the Vega chips were initially designed as Pro cards but failed certain quality guidelines.
 
Joined
Jun 28, 2016
Messages
3,595 (1.16/day)
It's working fine with projects like Einstein@home, Milkyway@home, Collatz, etc. I know Seti@home isn't working fine. I'm not sure about F@H. And which supercomputers have ordered navi?
Vega is AMD's compute card atm. Arcturus is the upcoming compute card, which is more similar to Vega than Navi.
Computing is not about funky distributed projects.
This problem was noticed in one of them because gamers already started using Navi (card for scientists/engineers was just announced and isn't used yet).

A GPU doesn't have a "calculate Seti@home" instruction that doesn't work (while "calculate Einstein@home" does).
It makes errors in some math instruction that Einstein@home may not use. That's it.

As mentioned earlier: there's a possibility that FFT results are incorrect. FFT (Fast Fourier Transform) is a fundamental algorithm used for many problems. So the card is already almost useless for computing.
And another thing is about being reliable. It's obvious that AMD haven't properly tested this card, so there's really no reason to believe in other results. Everything will have to be tested by the clients... and there goes the "value".
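For the FFT point above, one way a client could test a transform themselves is with known-answer checks, which need no second machine. The sketch below is only an assumption-laden example: `known_answer_checks`, its parameters, and the pure-Python `reference_dft` stand-in are hypothetical, and on real hardware `transform` would wrap whatever GPU FFT is being tested.

```python
import cmath
import math


def reference_dft(x):
    """Direct O(n^2) DFT, used here only as a stand-in transform to exercise the checks."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def known_answer_checks(transform, n=32, k0=5, tol=1e-9):
    """Run two analytic sanity checks on a forward DFT/FFT `transform`.

    1. A complex tone at bin k0 must produce the value n at bin k0 and 0 elsewhere.
    2. Parseval's theorem: sum |x|^2 must equal (1/n) * sum |X|^2.
    """
    tone = [cmath.exp(2j * math.pi * k0 * t / n) for t in range(n)]
    spectrum = transform(tone)
    peak_ok = all(
        abs(spectrum[k] - (n if k == k0 else 0)) <= tol for k in range(n)
    )

    x = [complex(math.sin(0.3 * t), math.cos(1.7 * t)) for t in range(n)]
    big_x = transform(x)
    energy_time = sum(abs(v) ** 2 for v in x)
    energy_freq = sum(abs(v) ** 2 for v in big_x) / n
    parseval_ok = abs(energy_time - energy_freq) <= tol * max(1.0, energy_time)

    return peak_ok and parseval_ok


def scaled_wrong(x):
    """Simulated faulty transform: every output bin is off by a factor of 2."""
    return [2 * v for v in reference_dft(x)]


print(known_answer_checks(reference_dft))  # True
print(known_answer_checks(scaled_wrong))   # False
```

Checks like these catch gross errors only; passing them does not prove a transform is correct for every input.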
 
Joined
Nov 4, 2005
Messages
12,014 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Unacceptable, these cards are sold with a feature set as advertised. Failing the standards set forth as advertised is false advertising, and consumers of all types should receive the product they pay for.
 