
Samsung SSD 860 EVO 250GB - major sudden corruption of random files - advice needed

Joined
Feb 1, 2019
Messages
3,667 (1.70/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
I would be diagnosing the RAM at this point. Change back to JEDEC if it is overclocked, and yes, XMP is an overclock.
Also do a SATA cable swap and check that it is properly connected.
If the CPU is overclocked or undervolted, reset it to stock.

Then continue any drive diagnosis. I am guessing there is no backup, based on you doing a clone after the problem started.
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
I would be diagnosing the RAM at this point. Change back to JEDEC if it is overclocked, and yes, XMP is an overclock.
Also do a SATA cable swap and check that it is properly connected.
If the CPU is overclocked or undervolted, reset it to stock.

Then continue any drive diagnosis. I am guessing there is no backup, based on you doing a clone after the problem started.
The SATA cable was changed, even though SMART showed no cable errors. The RAM was tested for 2 hours (1 pass) in Memtest86. The CPU is a 3770K at stock.

Regarding mounting the drive from the Ubuntu live CD, I get the error shown in the photos, and shortly after, Ubuntu freezes. I remember having this freeze issue with the Ubuntu live CD in the past (on 2 different computers, when dealing with drives that had NTFS partitions). This was performed on a 2nd computer.
 

Attachments: 1.jpeg (328.8 KB), 2.jpeg (279.1 KB)
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Regarding mounting the drive from the Ubuntu live CD, I get the error shown in the photos, and shortly after, Ubuntu freezes. I remember having this freeze issue with the Ubuntu live CD in the past (on 2 different computers, when dealing with drives that had NTFS partitions). This was performed on a 2nd computer.
Click the partition first. /dev/sda is the entire drive; the correct partition will have a number at the end, like /dev/sda1, /dev/sda2, /dev/sda3 or /dev/sda4.
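
A minimal sketch of the same distinction from a terminal, assuming the drive shows up as /dev/sda and the data partition is /dev/sda2 (both are placeholders; check the lsblk output first). The helper names is_sd_partition and ro_mount_cmd are made up for illustration, and the mount line assumes the ntfs-3g driver is installed. Mounting read-only avoids further writes to a filesystem that may already be damaged.

```shell
# Hypothetical helper: succeeds only for names like /dev/sda1, not /dev/sda.
is_sd_partition() {
    case "$1" in
        /dev/sd[a-z][0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a read-only mount command for a partition; refuse a whole-disk name.
ro_mount_cmd() {
    if is_sd_partition "$1"; then
        echo "sudo mount -t ntfs-3g -o ro $1 $2"
    else
        echo "refusing: $1 looks like a whole disk, pick a numbered partition" >&2
        return 1
    fi
}

# List disks and partitions with their filesystems (a read-only query):
#   lsblk -o NAME,SIZE,FSTYPE,LABEL
# Then print the mount command for the chosen partition and mount point:
#   ro_mount_cmd /dev/sda2 /mnt/evo
```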

Which version of Ubuntu is this? If it's an old one, get a newer version.

To see if there is anything useful before the system freezes, open an extra terminal and run this; it will follow the system log:
Bash:
tail -f /var/log/syslog
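
If the log scrolls too fast to read before the freeze, it can be narrowed to disk-related lines. This is only a sketch: filter_disk_errors is a made-up helper, and the pattern list is a guess at typical kernel messages, not an exhaustive set.

```shell
# Keep only lines that usually matter for a failing or misbehaving disk.
# The pattern list is an assumption; extend it to match what the log shows.
filter_disk_errors() {
    grep -Ei --line-buffered 'ata[0-9]+|i/o error|ntfs|blk_update_request'
}

# Usage while reproducing the freeze:
#   tail -f /var/log/syslog | filter_disk_errors
# Kernel messages only, on systems without /var/log/syslog:
#   sudo dmesg --follow | filter_disk_errors
```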
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
Click the partition first. /dev/sda is the entire drive; the correct partition will have a number at the end, like /dev/sda1, /dev/sda2, /dev/sda3 or /dev/sda4.

Which version of Ubuntu is this? If it's an old one, get a newer version.

To see if there is anything useful before the system freezes, open an extra terminal and run this; it will follow the system log:
Bash:
tail -f /var/log/syslog
I will do this once it finishes the 4th clone using the 2nd computer.
 
Joined
Feb 1, 2019
Messages
3,667 (1.70/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
The SATA cable was changed, even though SMART showed no cable errors. The RAM was tested for 2 hours (1 pass) in Memtest86. The CPU is a 3770K at stock.

Regarding mounting the drive from the Ubuntu live CD, I get the error shown in the photos, and shortly after, Ubuntu freezes. I remember having this freeze issue with the Ubuntu live CD in the past (on 2 different computers, when dealing with drives that had NTFS partitions). This was performed on a 2nd computer.
Memtest86 is easy to pass. I have had memory pass every test I threw at it and still cause I/O errors on an SSD; it can happen.

If you are reluctant to move the RAM to JEDEC without a failed test, then try the 'stressapptest' tool from a live Linux boot.
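
A rough sketch of what such a run could look like. The helpers pct_of_mb and stress_cmd are made up for illustration, and the choice of 90% of available memory and a one-hour run are assumptions, not recommendations from the tool's authors. The -M (megabytes), -s (seconds) and -W (more CPU-intensive copy) options are stressapptest's own.

```shell
# Hypothetical helper: N percent of a size in MB, rounded down.
pct_of_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Print a stressapptest command line that tests most of the available RAM.
# $1: available memory in MB (e.g. from `free -m`), $2: run time in seconds.
stress_cmd() {
    echo "stressapptest -M $(pct_of_mb "$1" 90) -s $2 -W"
}

# Example: with about 30000 MB available, print the command for a 1-hour run.
stress_cmd 30000 3600
```

Any reported miscompare counts as a failure, the same way a failed Prime95 worker does.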

 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
I started doing some low-level analysis (with OSForensics and HxD).
OSForensics lets you see the clusters associated with a file. This information is stored in the MFT for each file.
HxD lets you read the clusters you are interested in.
A cluster is 4096 bytes. Multiply the cluster size by the LCN value to obtain the decimal byte offset to look up in HxD.
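
The multiplication above can be sketched as a small shell helper. It assumes the offset is counted from the start of the NTFS volume (not the whole disk), and "volume.img" is a placeholder name for a clone of that partition; lcn_to_offset and dump_clusters_cmd are made-up names.

```shell
CLUSTER_SIZE=4096   # bytes per cluster, as stated in the post

# Byte offset of a cluster, counted from the start of the volume.
lcn_to_offset() {
    echo $(( $1 * $CLUSTER_SIZE ))
}

# Print a dd command that dumps COUNT clusters starting at LCN from an image.
# $1: image file (placeholder), $2: LCN, $3: number of clusters.
dump_clusters_cmd() {
    echo "dd if=$1 bs=$CLUSTER_SIZE skip=$2 count=$3 status=none | xxd | head"
}

lcn_to_offset 2102126    # prints 8610308096
# dump_clusters_cmd volume.img 2102126 2
```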

I tried to compare some files that are also present in the backup. For example, this one: the left side is the backup, the right side is the corrupted drive.

1728820482659.png


This is a small txt file. It should occupy 2 clusters, starting at LCN 2102126 (decimal offset 8610308096) on both the backup and the corrupted drive.
On the corrupted drive there is garbage data at that location, but I searched the whole drive and found the file still present at LCN 96872.

So for some reason, the file was moved to LCN 96872 and the MFT still thinks that it is at LCN 2102126.
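
The side-by-side comparison can also be scripted when both drives exist as image files. This is a sketch that assumes GNU cmp (its -i option skips bytes in both files, -n limits how many are compared) and 4096-byte clusters; the image names are placeholders and clusters_match is a made-up helper.

```shell
# Succeeds if COUNT clusters starting at LCN are byte-identical in both images.
# $1, $2: image files, $3: starting LCN, $4: number of clusters.
clusters_match() {
    img_a=$1; img_b=$2; lcn=$3; count=$4
    cluster=4096
    cmp -s -i "$(( lcn * cluster ))" -n "$(( count * cluster ))" "$img_a" "$img_b"
}

# Example with placeholder image names:
#   clusters_match backup.img corrupted.img 2102126 2 && echo same || echo differs
```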

Any ideas on how to continue?
 
Joined
Feb 11, 2015
Messages
105 (0.03/day)
What process found the file on cluster 96K?

Is the data of all directories intact, and do all files list okay?

That file on cluster 96K, is it exactly the same (all 4,102 bytes)?
What's on that cluster in the "good copy"?

For disk recovery and twiddling I like DMDE:
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
What process found the file on cluster 96K?
I searched the whole disk for its ASCII string using HxD. It found it, and it's legit. I did this for multiple files, using strings as long as possible.

Is the data of all directories intact, and do all files list okay?
I could not detect a single problem with the directories or the filenames. Everything I checked was there.
What I saw is that the data for the files is at different offsets than the ones specified in the MFT. At least for small files, the offsets are the same as in the backup (I'd need to verify this assumption for larger files, but almost all of these larger files have corrupted fragments inside).

That file in cluster 96K, is it exactly the same (all 4,102 bytes)?
What's on that cluster in the "good copy"?
Exactly the same: a phrase that I wrote myself, some memories.

For disk recovery and twiddling I like DMDE:
I don't know how to use DMDE for this situation, but I tried this software.

How should I continue the investigation?
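
For reference, the whole-disk string search described above can be reproduced on an image file, turning each hit into a cluster number that can be checked against the MFT. This is a sketch under several assumptions: GNU grep, 4096-byte clusters, plain ASCII text (UTF-16 content will not match), and a placeholder image name. offset_to_lcn and find_string_lcns are made-up helpers, and grep may need a lot of memory on a large binary image.

```shell
# Cluster number that contains a given byte offset (4096-byte clusters assumed).
offset_to_lcn() {
    echo $(( $1 / 4096 ))
}

# Print "LCN<TAB>byte offset" for every occurrence of a fixed string in an image.
# grep: -a treat binary as text, -b print byte offset, -o match only, -F literal.
find_string_lcns() {
    grep -a -b -o -F -- "$2" "$1" | while IFS=: read -r off _; do
        printf '%s\t%s\n' "$(offset_to_lcn "$off")" "$off"
    done
}

# Example with placeholder names:
#   find_string_lcns corrupted.img "a long phrase from the text file"
```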
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
42,684 (6.68/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
EVO strikes again. Back up your files and eliminate the EVO drive.



 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
EVO strikes again. Back up your files and eliminate the EVO drive.




But how do you know that it is because of the EVO drive? There's no error in SMART; according to SMART, the SSD is working perfectly.
Indeed, I have had many other SSDs/HDDs, various BSODs (over the years, one every 1 to 8 weeks) and lots of computer uptime (I never turn it off, I put it to sleep), and such corruption hasn't happened to me since the Windows Me era.

I had a brand-new Kingston KC600 that kept disconnecting completely from the computer (tested on 2 different computers) while the computer was running, and it could not be seen again in the BIOS until I powered off the computer completely. It did this to me many times (but it didn't cause any corruption!), until I got fed up and migrated to this brand-new 860 EVO, which gave me the present corruption.
 
Joined
Mar 18, 2023
Messages
935 (1.44/day)
System Name Never trust a socket with less than 2000 pins
I ran Memtest86 for 1 pass, over an hour(32GB of RAM), zero errors.
I know that usually you should run it for a day or so, if you really want to be thorough, but I didn't have the nerves at that point.

Plus, the corruption has happened across lots of files, and various directories that were written in different periods over the past year.

It is still more likely that general memory corruption is the cause. If you corrupt file allocation tables and directories you will absolutely destroy files you didn't touch lately. In fact your observation is typical.

I know you ran memtest, but it is time to bring out SuperPi and mprime/prime95. Memtest runs its tests, as good as they are, with a different load than SuperPi, which really stresses the memory controller and memory timings for correctness.
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
It is still more likely that general memory corruption is the cause. If you corrupt file allocation tables and directories you will absolutely destroy files you didn't touch lately. In fact your observation is typical.

I know you ran memtest, but it is time to bring out SuperPi and mprime/prime95. Memtest runs its tests, as good as they are, with a different load than SuperPi, which really stresses the memory controller and memory timings for correctness.
I will do this now.

I let Prime95 run for 10 minutes, and it is still running. SuperPI finished in 11 minutes; probably 50% of that time was in parallel with Prime95. Prime95 was set to stress the memory controller.

It failed!

1728832388003.png




Another one:

1728832664485.png


I don't know how to remove the XMP. On Manual, I have to come up with some values myself. The RAM is 4x G.Skill Ares F3-1600C10-8GAO.

1728834545697.png
 

Shadowized

New Member
Joined
Nov 21, 2023
Messages
14 (0.04/day)
When you do file operations in a newer OS, the file is copied to RAM and cached there, so it's very likely your RAM is bad. To disable XMP, just set it to Auto, save, and then check. Otherwise, set it to Manual and 1333MHz.
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
I set it to Auto and 1333.
I started doing some low-level analysis (with OSForensics and HxD).
OSForensics lets you see the clusters associated with a file. This information is stored in the MFT for each file.
HxD lets you read the clusters you are interested in.
A cluster is 4096 bytes. Multiply the cluster size by the LCN value to obtain the decimal byte offset to look up in HxD.

I tried to compare some files that are also present in the backup. For example, this one: the left side is the backup, the right side is the corrupted drive.

View attachment 367413

This is a small txt file. It should occupy 2 clusters, starting at LCN 2102126 (decimal offset 8610308096) on both the backup and the corrupted drive.
On the corrupted drive there is garbage data at that location, but I searched the whole drive and found the file still present at LCN 96872.

So for some reason, the file was moved to LCN 96872 and the MFT still thinks that it is at LCN 2102126.

Any ideas on how to continue?
Actually, what is at LCN 96872 is a copy of the file, not the original one. It seems that I also had a copy in that folder.
I discovered this using the command: fsutil volume querycluster D: 96872
and I saw a " - Copy" directory.
 
Joined
Feb 1, 2019
Messages
3,667 (1.70/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
I will do this now.

I let Prime95 run for 10 minutes, and it is still running. SuperPI finished in 11 minutes; probably 50% of that time was in parallel with Prime95. Prime95 was set to stress the memory controller.

It failed!

View attachment 367447



Another one:

View attachment 367448

I don't know how to remove the XMP. On Manual, I have to come up with some values myself. The RAM is 4x G.Skill Ares F3-1600C10-8GAO.

View attachment 367450
Auto will probably remove it. But after you select Auto, reboot straight back into the BIOS; if it only adjusted the timings but not the clocks, then fix the clock manually.
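
As a cross-check outside the BIOS, the memory speed the firmware reports can be read from a live Linux session. This is only a sketch: it assumes dmidecode is available, memory_speeds is a made-up helper, and the output shows clock speed only, not timings.

```shell
# Keep only the speed lines from `dmidecode -t memory` output read on stdin,
# with leading whitespace stripped.
memory_speeds() {
    grep -i 'speed:' | sed 's/^[[:space:]]*//'
}

# Usage from a live Linux session:
#   sudo dmidecode -t memory | memory_speeds
```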
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
I don't know how to make them stable in Prime95 for large FFTs. I put them at 1333 and 10-10-10-30, but after 30-60 minutes, some worker still fails.
 
Joined
Jun 3, 2008
Messages
775 (0.13/day)
Location
Pacific Coast
System Name Z77 Rev. 1
Processor Intel Core i7 3770K
Motherboard ASRock Z77 Extreme4
Cooling Water Cooling
Memory 2x G.Skill F3-2400C10D-16GTX
Video Card(s) EVGA GTX 1080
Storage Samsung 850 Pro
Display(s) Samsung 28" UE590 UHD
Case Silverstone TJ07
Audio Device(s) Onboard
Power Supply Seasonic PRIME 600W Titanium
Mouse EVGA TORQ X10
Keyboard Leopold Tenkeyless
Software Windows 10 Pro 64-bit
Benchmark Scores 3DMark Time Spy: 7695
You have 4x 8GB DIMMs? You may need to play with your CPU settings a bit to stabilize its memory controller. I would try increasing the PCH and VCCSA voltages a bit, to around 1.1 V each.

Otherwise, ask G.Skill. They have helped me in the past by advising different speeds and latencies for when all the memory slots are full and the system struggles.

What G.Skill has told me is that the advertised speeds and latencies only apply to the memory in the form sold. So if you buy 2 kits of 2 to make 4, or 4 kits of 1 to make 4, or anything like that, it is not abnormal for the memory to be unstable at the advertised speeds and latencies.
 
Joined
Feb 1, 2019
Messages
3,667 (1.70/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
You're confusing DDR3 and DDR4 JEDEC speeds, OP has DDR3 which you can see in the BIOS picture.
Then the answer is DDR3 JEDEC speeds, probably 800-1333, which was in an earlier post anyway. Thanks for correcting me.

I think what we are talking about here is just slowing down the RAM to the point where it is not going to cause I/O errors, to see if that stabilises the storage. So the lower the clock speed, the better.
 

Shadowized

New Member
Joined
Nov 21, 2023
Messages
14 (0.04/day)
I don't know how to make them stable in Prime95 for large FFTs. I put them at 1333 and 10-10-10-30, but after 30-60 minutes, some worker still fails.
I would suggest upgrading your BIOS to the latest version if you haven't already, followed by loading optimized defaults. Then test again, and if it still happens, take out 2 of the 4 sticks and test them in pairs to narrow it down further. If that still fails, your CPU or RAM is at fault. I lean towards the RAM, but it could just be a weak IMC on the CPU that requires more tuning to make it stable, though ideally one shouldn't need to do that.
 

eternalsadness

New Member
Joined
Oct 4, 2024
Messages
26 (0.31/day)
After lots of Prime95 testing, it seems that with 2 DIMMs it does not throw large-FFT errors. With 3 or 4 DIMMs, it does.
I think I'm at 1.05V for PCH and VCCSA, and the RAM is at 1333MHz. What should I do? 1.1V is 10%; is it safe? Will I fry some chips?
G.Skill hasn't replied with voltage/config suggestions yet.
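
On the percentage itself, a quick worked calculation. It only shows the arithmetic and says nothing about what is safe for this board or CPU. pct_change is a made-up helper, and the baselines are assumptions: the 1.05 V mentioned above, and 1.00 V for comparison.

```shell
# Percentage change from one voltage to another, to one decimal place.
pct_change() {
    awk -v from="$1" -v to="$2" 'BEGIN { printf "%.1f\n", (to - from) / from * 100 }'
}

pct_change 1.05 1.10   # prints 4.8
pct_change 1.00 1.10   # prints 10.0
```

So going from 1.05 V to 1.1 V is an increase of about 4.8%; the 10% figure corresponds to a 1.00 V baseline.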
 