Tuesday, January 24th 2023
Samsung 990 PRO Flagship SSD Has an Endurance Problem, Users Notice Rapid Drive-Health Drops
Samsung 990 PRO is the company's flagship client SSD, which is among the fastest Gen 4 NVMe SSDs you can buy. It also commands a very high price premium, with the 1 TB variant priced at $170, and the 2 TB variant at $290. When you're buying in this segment, you expect the highest endurance figures for your SSD. Client SSD endurance figures are already on the rise, as NAND flash technology evolves. Neowin noticed that their 990 PRO isn't meeting this vital expectation, and with a little digging, found that there are others with this problem, and they didn't just get a bad drive.
Apparently, the "drive health" reading in Samsung Magician (the utility software for Samsung SSDs) drops rather rapidly for the 990 PRO. After a clean software installation on a new drive, Neowin observed that their drive's health reading was already at 99% (something very unexpected for a new drive); worse, even with regular use of the drive in the following days, the drive health would drop by 1 percentage point every day. Drive health is interchangeable with endurance, as it indicates the number of program-erase (PE) cycles left on the NAND flash memory before regions of the drive's user-area become unwritable.
Such a rapid drop in endurance used to be a problem in the very first generations of client SSDs some 15 years ago, but it's highly unusual for a flagship product like the Samsung 990 PRO. A user on Twitter claims that their drive health dropped down to 64% at just 2 TB of total bytes written, something you don't expect even entry-level SSDs to end up with. Neowin's initial RMA request was rejected (the drive returned) as the company found "no defect" with it, but once it realized that it was dealing with the press, it quickly reached out to replace the drive and try to reproduce the issue.
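To put the reported figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the health reading falls linearly with writes (or with time), which real drives need not do; the function names are ours, made up for illustration, and are not part of any Samsung tool.

```python
def implied_endurance_tb(health_percent: float, written_tb: float) -> float:
    """Extrapolate total writes at 0% health, ASSUMING health falls linearly with writes."""
    used_percent = 100.0 - health_percent
    if used_percent <= 0:
        raise ValueError("no wear reported yet; nothing to extrapolate")
    return written_tb * 100.0 / used_percent


def days_until_depleted(health_percent: float, drop_per_day: float) -> float:
    """Days left if the health reading keeps dropping at a constant daily rate (assumption)."""
    if drop_per_day <= 0:
        raise ValueError("drop_per_day must be positive")
    return health_percent / drop_per_day


# Twitter report quoted above: 64% health at 2 TB written.
print(f"{implied_endurance_tb(64, 2):.1f} TB implied total writes")
# Neowin's drive: 99% health, losing 1 percentage point per day.
print(f"{days_until_depleted(99, 1):.0f} days until the reading hits 0%")
```

Under those assumptions the Twitter report works out to roughly 5.6 TB of total writes, and Neowin's drive would read 0% in about 99 days, which is why these reports stand out.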
Source:
Neowin
80 Comments on Samsung 990 PRO Flagship SSD Has an Endurance Problem, Users Notice Rapid Drive-Health Drops
It is cheap, locally.
MX500 $275 for 4tb , $130 for 2tb
870EVO $330 for 4tb, $160 for 2tb
Consider buying 4 of those for RAID5 and you saved up the money for another drive which could be RAID6 / hot spare
There is no meaningful performance difference vs 870EVO
Very reliable in my use case
Currently running 6x2tb + 1x4tb + 3x500gb MX500s in the house in NAS / EPYC VM server
None of them fails so far.
Samsung Magician is biased software.
You can verify with the above-mentioned HWiNFO64, using sensors-only mode and agreeing to the warning popup message.
www.hwinfo.com/download/
You can also use CrystalDiskInfo.
www.majorgeeks.com/files/details/crystaldiskinfo.html
It will be interesting to see what's causing it. I would think that if it's a reporting/calculation bug, it would be the same across all drives with the same firmware.
Here's my 2TB in HWINFO:
99% life on a very old 256gb 850 pro os ssd
Guess these days are long gone
Funny 990 has three temperatures now
HWiNFO64 shows 3 for some reason, but Drive Temperature is most likely just the NAND temp taken from another sensor.
So unless Samsung has placed a third sensor on the DRAM, I doubt it shows more than the Controller and NAND temps.
I still have one 128GB 850 PRO. The thing is a tank. The MLC NAND barely degraded. Before I retired it, it had only about 1% wear after 100TB+ of writes over multiple years, thousands of power-on hours and hundreds of power cycles.
photos.app.goo.gl/eCUa9RFDkTNwa3XJA :cool:
Normally every drive model has a small batch of bad units. E.g. go to any drive review and look at the often insignificant (although unfortunate) number of bad reviews. How many people have come forward with 990 PRO life degradation so far? I've heard of 2, maybe more. What's the alternative? PCIe 5.0 NVMe SSDs that have no user reviews and cost 2 to 3 times as much for real-world performance that is probably similar? If anyone does experience performance loss, please post here. That said, I don't want people to go on a testing frenzy, even though testing shouldn't cause loss of life that rapid either.
Update: I am running them at Standard Mode, not sure if full performance is also a culprit?
I have a 960 PRO that I've had for several years now and its life is still at 98% (13k on-hours, total reads/writes 86/58TB), and the used Crucial P1, which is also a few years old but newer than the Samsung, is sitting at 99%.
Funny there is no way to undo a drive at your own risk when it goes read only.
At the company I work for, we did extensive testing on SSDs years ago, after one brand failed us due to firmware bugs. This stuff is not hard to detect with a reasonable testing setup if you're looking, especially if you have cooperation from the manufacturer, which reviewers typically have. It was a pet peeve of mine when I used to read SSD reviews that all the reviewer would say about endurance was the marketing terabytes written. I used to comment on reviews requesting that they actually test life expectancy, or at least LOOK at the SMART data after all their testing and get a gut feeling for whether it is OK or not. This would have caught this issue, and it could have been presented to Samsung to look at and hopefully fix before release, or to have a firmware ready to go soon after release. Or reviewers could have given a warning to buyers.
Something to understand about life expectancy and "terabytes written": the spec that gets advertised, and repeated by most people and reviewers, is NAND writes. It is based on the NAND used in the product, which is well tested: take how many writes the NAND can handle on average, factor in over-provisioning, and you have your terabytes written.
The problem is that people (and probably some reviewers) expect that to be "OS bytes written". When you copy 1 TB to your disk, that's an OS write. Unfortunately, unless you look at the SMART data (you may need the manufacturer's help to understand how to decode it, or to get the units they are stored in) and look at the NAND writes, you really have no idea how much you've written to the disk.
Your "1 TB" written may actually be much greater than 1 TB of NAND writes. Actually, now that I think about it, on DRAM-less cached models it may even be greater than 2 TB, if they copy the data from SLC-mode NAND over to the standard multi-level-cell/3D NAND.
The ratio of NAND writes to OS writes is called "write amplification". It's a known metric for SSD manufacturers, and different grades of SSD intended for different uses (consumer vs. enterprise, for example) will accept different write amplification as being within spec. That's why it's a faux pas to use a consumer-grade drive in a NAS. Perhaps DRAM-less SSDs have higher write amplification due to caching, I don't know. But I do know that features like wear leveling and maintenance tasks will increase the write amplification, as the SSD maintains the data on the drive and pushes it around. This is all normal and expected, but if something is wrong with the firmware, things can easily go wrong, and write amplification can get out of control.
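To make that concrete, here's a minimal Python sketch of the arithmetic. The function names and all the numbers are made up for illustration, and it takes this post's framing that the rated figure is a NAND-write budget.

```python
def write_amplification(nand_tb_written: float, host_tb_written: float) -> float:
    """Write amplification factor: NAND writes divided by host (OS) writes."""
    if host_tb_written <= 0:
        raise ValueError("host_tb_written must be positive")
    return nand_tb_written / host_tb_written


def host_tb_until_rated_limit(rated_nand_tb: float, waf: float) -> float:
    """Host writes that fit into a NAND-write budget at a given write amplification."""
    if waf <= 0:
        raise ValueError("waf must be positive")
    return rated_nand_tb / waf


# Hypothetical drive: the OS wrote 1 TB, the NAND absorbed 2.5 TB.
waf = write_amplification(2.5, 1.0)
print(f"WAF = {waf}")
# Hypothetical 600 TB NAND budget at that amplification.
print(f"{host_tb_until_rated_limit(600, waf)} TB of OS writes before the budget is spent")
```

With those invented numbers the drive would run out of its budget after 240 TB of OS writes. If a firmware bug pushes the amplification way up, the same budget disappears much faster.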
We should try to get a full dump of the SMART data and see if Samsung publishes its SMART data documentation. We'll probably find that NAND writes are actually right in line with the life expectancy that's being reported. If the workload is typical, then this is likely a firmware bug.
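For anyone who wants to compare dumps, here's a rough Python sketch that summarizes the standard NVMe health log. It assumes the JSON layout that smartmontools prints with `smartctl -j -a <device>` (key names as I understand them; check them against your own output), and the sample values are invented. As far as I know the standard log only reports host writes; NAND writes, if the drive exposes them at all, sit in vendor-specific fields that need the manufacturer's documentation.

```python
import json

# NVMe reports "data units" of 1000 x 512 bytes each.
NVME_DATA_UNIT_BYTES = 512 * 1000


def summarize_nvme_health(smart_json: str) -> dict:
    """Pull the wear-related fields out of a smartctl-style JSON dump (assumed layout)."""
    log = json.loads(smart_json)["nvme_smart_health_information_log"]
    host_tb = log["data_units_written"] * NVME_DATA_UNIT_BYTES / 1e12
    return {
        "host_tb_written": round(host_tb, 2),
        "percentage_used": log["percentage_used"],
        "available_spare": log["available_spare"],
        "media_errors": log["media_errors"],
    }


# Invented sample, NOT a real dump from any drive.
SAMPLE = json.dumps({
    "nvme_smart_health_information_log": {
        "percentage_used": 36,
        "available_spare": 100,
        "data_units_written": 3_906_250,
        "media_errors": 0,
    }
})

print(summarize_nvme_health(SAMPLE))
```

Running this on dumps taken a few days apart would show whether "percentage used" is climbing out of proportion to the host writes.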
I've also noticed some SSDs tend to wear out very quickly (not this model in particular), without much data written at all. My theory is that lots of small writes will cause a lot of wear, at least in certain patterns. But I haven't tested this out in a controlled environment.
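One way to see why small writes could hurt is a deliberately oversimplified Python model in which every host write gets rounded up to whole NAND pages. The 16 KiB page size is an assumption, and real controllers buffer and combine writes, so treat this as a worst-case illustration only, not a description of any actual drive.

```python
import math


def nand_bytes_programmed(write_sizes, page_bytes=16 * 1024):
    """Bytes programmed if each host write is rounded up to whole pages (no coalescing)."""
    return sum(math.ceil(size / page_bytes) * page_bytes for size in write_sizes)


small_writes = [4096] * 1000   # 1000 separate 4 KiB writes
one_big_write = [4096 * 1000]  # the same amount of data in a single write

for name, writes in (("small", small_writes), ("big", one_big_write)):
    host = sum(writes)
    nand = nand_bytes_programmed(writes)
    print(f"{name}: host={host} B, NAND={nand} B, amplification={nand / host}")
```

In this toy model the small writes cost four times as much NAND wear as the same data written in one go.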
I would actually like it if TPU wrote an editorial asking the market to create a proper pro SSD again, like an SLC SSD in 256 GB, 512 GB, 1 TB, 2 TB and 4 TB variants. In a market oversaturated with crappy white-label SSDs, isn't there room for at least one proper enthusiast/prosumer SSD? I would buy several.
Not sure if this helps, but a Tom's Hardware article mentions that the 980 PRO's early deaths might be fixed with new firmware: www.tomshardware.com/news/samsung-980-pro-ssd-failures-firmware-update
The 870 EVO on paper is no better than the 860 EVO, so one wonders why the 870 was even released. The only explanation is that they did some cost cutting somewhere.
My post was made before I discovered the new firmware and found a way to update.
Media errors kept increasing and available spare kept dropping. Updating the firmware did stop the degradation.
At least for now, these values have not changed any further. It was also much more difficult because OEM versions of Samsung SSDs like my PM9A1 can't be updated through Magician and have to be updated via a command-line utility in a roundabout way.
Even finding this utility took me over a year, as it was not available when I first checked in 2021.
They need to clean house in their firmware department or actually test the crapware better before release.
I think they adopted Microsoft-style testing, where the testers are end users, not paid MS personnel :laugh:
hardware/comments/110x049/_/j8by0s8
semiconductor.samsung.com/consumer-storage/support/tools/
Unfortunately this only stops the degradation. It does not reverse it.