Intel Rolls Out SSD 670p Mainstream NVMe SSD Series

bug · Mar 4, 2021

newtekie1 said:
That does not happen. Every time a higher bit per cell of flash is used in SSDs the endurance ratings are always lower than the previous. When MLC drives came out, their endurance ratings were significantly lower than SLC drives. As MLC matured the endurance ratings increased, but they never got to SLC levels. When TLC came out, the endurance ratings were significantly lower than MLC. As TLC matured the endurance ratings went up, but never matched MLC. Now the same is happening with QLC.

Actually, TLC reached MLC endurance levels of endurance when manufacturers started to use V-NAND.

minami · Mar 4, 2021

Oh!
It's amazing that there are dealers who can perfectly distinguish between a NAND or controller failure when an SSD fails!
When the Ready/Busy signal on the NAND chip is fixed at "L", how can a dealer tell if it's a NAND failure or a controller failure?
A store that can tell if it's not a controller failure when a NAND chip with a power supply of around 2.7V has data corruption due to a problem with the power supply's buck converter.
A store that can determine that it's a NAND failure when one of the multiple NAND chips has a damaged boost circuit that prevents it from writing correctly!
If that's true, it would be nice to have one close to home.

This is one of my actual experiences, but the job of data recovery of Silicon Power S55 came to me.
I disassembled it and examined it thoroughly, and found that one small coil was defective.
The OS instructed me to initialize it. And the Device Manager informed me that it was a Kingston SSD.
When I connected the coil to where it should be, the SSD was recognized as a Silicon Power SSD and data could be read. This is a strange symptom, but the cause is a bad coil.

Today's NAND is based on a very delicate technology. Most of the failures of SSDs with low heat generation are caused by defective NAND or solder.
I see a lot of products that use poor quality TLC-NAND. Naturally, QLC-NAND also appears. It's a silicon lottery.
If you must buy a QLC-SSD, do your research and make sure you don't get a bad one.

lexluthermiester · Mar 4, 2021

bug said:
Actually, TLC reached MLC endurance levels of endurance when manufacturers started to use V-NAND.

Did they? I'd read somewhere that it was close but not quite there.

minami said:
When the Ready/Busy signal on the NAND chip is fixed at "L", how can a dealer tell if it's a NAND failure or a controller failure?

Two points. When a drive is readable, accessing the smart data and testing the wear by reading how many sectors have been reaccocated is a solid indication of drive stability. The sheer number of QLC based drives that have come back is the glaring indicator. Whether or not it's the NAND or the controller is sometimes impossible to determine. However, when the OS is telling the user that the drive is in a state of "imminent failure" and cautions them to replace the drive soon, that's when we can look at the drive data and see where the problem is. Of the QLC drives that have failed, only one refused to boot. For the rest it was not a controller problem, the NAND was wearing out.

bug · Mar 4, 2021

lexluthermiester said:
Did they? I'd read somewhere that it was close but not quite there.

With the countless manufacturing processes, let's just say they got in the same ballpark.
On one hand, he had TLC drives with better endurance than MLC: https://forums.anandtech.com/threads/how-is-3d-tlc-nand-more-reliable-than-mlc.2493029/
On the other, there was V-NAND MLC and of course that had way better endurance.

Bottom line, this little manufacturing trick brought the then worrisome TLC into a territory people were comfortable with. It doesn't look like QLC will get the same chance.

minami · Mar 4, 2021

lexluthermiester said:
Two points. When a drive is readable, accessing the smart data and testing the wear by reading how many sectors have been reaccocated is a solid indication of drive stability. The sheer number of QLC based drives that have come back is the glaring indicator. Whether or not it's the NAND or the controller is sometimes impossible to determine. However, when the OS is telling the user that the drive is in a state of "imminent failure" and cautions them to replace the drive soon, that's when we can look at the drive data and see where the problem is. Of the QLC drives that have failed, only one refused to boot. For the rest it was not a controller problem, the NAND was wearing out.

You're right. I do the same thing.
In addition, I read the entire area as a backup and write it back later.
It will wear out the NAND, but I am in the data recovery business, and data rescue is the most important thing.
After the rescue is completed, the entire area is written back to check the health of the NAND.

If there are a lot of blocks with long read latency right after they are written, even if the total write volume is small, the SSD can no longer be trusted.
In the worst case, you may have to wait more than 350ms to read one block.
With the advent of TLC-NAND, the number of jobs has increased, and QLC is naturally going to get worse.
If the controller is the cause, it's still better because the data can be saved. But the actual cause is NAND wear and solder.

efikkan · Mar 4, 2021

newtekie1 said:
That does not happen. Every time a higher bit per cell of flash is used in SSDs the endurance ratings are always lower than the previous. When MLC drives came out, their endurance ratings were significantly lower than SLC drives. As MLC matured the endurance ratings increased, but they never got to SLC levels. When TLC came out, the endurance ratings were significantly lower than MLC. As TLC matured the endurance ratings went up, but never matched MLC. Now the same is happening with QLC.

I was talking of the endurance ratings of the products, not the cells, being inflated. With TLC and QLC SSDs using multiple qualities of flash, it matters a lot how they estimate the usage patterns, as the QLC has somewhere of 1/100th - 1/1000th of the endurance of SLC, if we're optimistic.

Look at this 670p QLC device offering an impressive 370 TBW for 1 TB variant(this is close to TLC numbers), compared to the 660p at 200 TBW, both with a 140 GB SLC cache. So they mostly achieved this "improvement" in tuning how the SLC caching works, because the QLC flash itself contributes very little to this endurance number. Post #54 from TheLostSwede really shows in their marketing materials, they clearly have assumptions of how to use the SSD right, assume how much will be in the SLC on average, and uses this to inflate the endurance ratings. If your usage pattern deviates from this model just a tiny bit, your endurance will be less than half, and if it deviates a lot you probably get more than one order of magnitude less.

This is in stark contrast with good MLC SSDs like Samsung 960 Pro 1TB (800 TBW) and 970 Pro 1 TB (1200 TBW), which probably have optimistic endurance ratings too, but at least they were much less sensitive to the user using it the right way.

If intel thinks their buyers should stay <25% usage of their SSDs for them to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?

newtekie1 said:
And my experience contradicts yours. Of the hundreds of QLC drives I've sold through my shop at this point, not one has come back with a NAND failure(I have had a few come back due to controller failures).

Since you haven't observed the same cases, I don't think you can say it contradicts his experience, unless you somehow can discredit his observations. Proving a negative is hard; you need a significant statistical basis to claim there isn't a problem, while only needing a comparatively few samples to prove a problem exists.

lexluthermiester · Mar 4, 2021

bug said:
It doesn't look like QLC will get the same chance.

This is because the electrochemical process of programing QLC with data pushes the chemistry to it's physical limits very quickly. It's not a viable technology for data reliability unless the device it's used in is not exposed to frequent writes.

efikkan said:
as the QLC has somewhere of 1/10-1/100th of the endurance of TLC.

This needs correction.

efikkan said:
If intel thinks their buyers should stay <25% usage of their SSDs for them to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?

Very well said!

bug · Mar 4, 2021

lexluthermiester said:
This is because the electrochemical process of programing QLC with data pushes the chemistry to it's physical limits very quickly. It's not a viable technology for data reliability unless the device it's used in is not exposed to frequent writes.

And with no writes, you're subjected to leakage and thus data loss. It's kinda lose-lose situation.

I could tolerate QLC as my storage drive (and remember to write everything again every now and then). But I don't want to imagine every manufacturer saying bye-bye to TLC and start offering QLC only.

lexluthermiester · Mar 4, 2021

bug said:
And with no writes, you're subjected to leakage and thus data loss. It's kinda lose-lose situation.

To be fair, even with QLC, that process take nearly 4 years to become a serious problem to data. Most NAND controllers do a data refresh cycle to mitigate that problem.

bug said:
But I don't want to imagine every manufacturer saying bye-bye to TLC and start offering QLC only.

Right there with you on that one. They do that, I'll go back to HDD's exclusively.

newtekie1 · Mar 4, 2021

efikkan said:
If intel thinks their buyers should stay <25% usage of their SSDs for them to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?

Staying under 25% usage is not at all about lifetime, it is about performance. They aren't changing their TBW spec based on how full the drive is. The same is true with the SLC cache. The improvements there were about performance not lifespan of the drive and that is what those slides were demonstrating.

It is also worth pointing out that writing directly to the QLC on the 670p is significantly faster than writing directly to the QLC on the 660p(by a factor of 4x faster). So it would seem there have in fact been improvements to the QLC NAND itself.

efikkan said:
Since you haven't observed the same cases, I don't think you can say it contradicts his experience, unless you somehow can discredit his observations. Proving a negative is hard; you need a significant statistical basis to claim there isn't a problem, while only needing a comparatively few samples to prove a problem exists.

Sure it can. I've sold hundreds of QLC drives and none have come back with NAND errors. His relatively small sample size, and samples from just one manufacturer, isn't enough to say there is a problem with QLC in general.

bug said:
And with no writes, you're subjected to leakage and thus data loss. It's kinda lose-lose situation.

I could tolerate QLC as my storage drive (and remember to write everything again every now and then). But I don't want to imagine every manufacturer saying bye-bye to TLC and start offering QLC only.

Not really, the controller handles the refreshing of the cells as long as the drive has power. This solution has been in place since TLC drives which require the same thing. The only time it becomes a problem is when the drive sits unpowered for a significant amount of time.

bug · Mar 4, 2021

newtekie1 said:
Not really, the controller handles the refreshing of the cells as long as the drive has power. This solution has been in place since TLC drives which require the same thing. The only time it becomes a problem is when the drive sits unpowered for a significant amount of time.

True, I have forgotten about that. But even refreshing eats into the precious p/e cycles

It's one of the reason I was rooting for XPoint.

dragontamer5788 · Mar 4, 2021

bug said:
True, I have forgotten about that. But even refreshing eats into the precious p/e cycles
It's one of the reason I was rooting for XPoint.

Patrol Reads shouldn't have to erase-cycle cells unless the voltage level started going out-of-whack.

"If (block has error) rewrite block".

I don't know what kind of error correction exists per block, but its probably on the order of 16-bytes or more. Which means 8+bytes (64-bits) need to be in error before you reach an uncorrectable error. If you're constantly doing patrol-reads, then only one or two bits will go bad at a time, giving plenty of time for the controller to fix the issue.

bug · Mar 4, 2021

dragontamer5788 said:
Patrol Reads shouldn't have to erase-cycle cells unless the voltage level started going out-of-whack.

"If (block has error) rewrite block".

I don't know what kind of error correction exists per block, but its probably on the order of 16-bytes or more. Which means 8+bytes (64-bits) need to be in error before you reach an uncorrectable error. If you're constantly doing patrol-reads, then only one or two bits will go bad at a time, giving plenty of time for the controller to fix the issue.

Even so, to correct 1 or 2 bits, it still has to write an entire block (4kB), right?
But yes, I didn't mean to imply the drive will go bad just by refreshing itself. Just the refreshing itself isn't a "free lunch" kinda deal.

dragontamer5788 · Mar 4, 2021

bug said:
Even so, to correct 1 or 2 bits, it still has to write an entire block (4kB), right?

Depends on the Flash. The block erase could be 128kB for example even if individual read/write pages are 4kB.

EDIT: Now that I think of it: that's the point of TRIM. You erase 128kB sections, but then new writes are just 4kB at a time. So TRIM is a delayed "garbage collection", that helps batch up writes across your SSD, so this whole process is surprisingly efficient.

But yes, I didn't mean to imply the drive will go bad just by refreshing itself. Just the refreshing itself isn't a "free lunch" kinda deal.

Fair.

efikkan · Mar 5, 2021

newtekie1 said:
Staying under 25% usage is not at all about lifetime, it is about performance. They aren't changing their TBW spec based on how full the drive is. The same is true with the SLC cache. The improvements there were about performance not lifespan of the drive and that is what those slides were demonstrating.

It absolutely matters.
One user who does a lot of writes but never fills up a lot will cause way less wear compared to one user filling up most and randomly overwrites most of it. The endurance rating is based on a very optimistic usage pattern, where the QLC probably contributes to less than 2% of the TBW rating.

newtekie1 said:
It is also worth pointing out that writing directly to the QLC on the 670p is significantly faster than writing directly to the QLC on the 660p(by a factor of 4x faster). So it would seem there have in fact been improvements to the QLC NAND itself.

Faster doesn't mean it's more durable. Even if they managed to improve QLC endurance by 10x, it will still be fairly bad.

newtekie1 said:
Sure it can. I've sold hundreds of QLC drives and none have come back with NAND errors. His relatively small sample size, and samples from just one manufacturer, isn't enough to say there is a problem with QLC in general.

These problems are highly dependent on the usage pattern, not random chance, so a selection of hundreds is nowhere near close enough to dismiss the existence of a problem. You also may not know whether your customers are able to detect these problems in time (or even bother) or correctly identify them.

When people like me have witnessed up to hundreds of computers in companies which a specific usage pattern, and problems with file system corruption and SMART errors are far more frequent than they should, then this forms a solid basis to conclude that usage patterns can significantly affect the lifespan.

newtekie1 · Mar 5, 2021

efikkan said:
It absolutely matters.
One user who does a lot of writes but never fills up a lot will cause way less wear compared to one user filling up most and randomly overwrites most of it.

I'm not debating this.

efikkan said:
The endurance rating is based on a very optimistic usage pattern, where the QLC probably contributes to less than 2% of the TBW rating.

This is incorrect. The rating remains the same no matter how the user fills up the drive. Nothing Intel has said indicates the endurance rating changes based on how filled the drive is. It is also why their endurance ratings are a worst case and the drives will likely far exceed that rating.

efikkan said:
Faster doesn't mean it's more durable. Even if they managed to improve QLC endurance by 10x, it will still be fairly bad.

Bad compared to TLC/MCL, but still good enough for a system drive.

efikkan said:
These problems are highly dependent on the usage pattern, not random chance, so a selection of hundreds is nowhere near close enough to dismiss the existence of a problem. You also may not know whether your customers are able to detect these problems in time (or even bother) or correctly identify them.

When people like me have witnessed up to hundreds of computers in companies which a specific usage pattern, and problems with file system corruption and SMART errors are far more frequent than they should, then this forms a solid basis to conclude that usage patterns can significantly affect the lifespan.

And a selection of a few drives from a single manufacturer is not enough to indicate a problem in general with QLC.

bug said:
True, I have forgotten about that. But even refreshing eats into the precious p/e cycles
It's one of the reason I was rooting for XPoint.

A refresh is not a full erase and program. It might cause minor wear on the cell, but not nearly as much as a full erase and program.

dragontamer5788 · Mar 5, 2021

newtekie1 said:
This is incorrect. The rating remains the same no matter how the user fills up the drive. Nothing Intel has said indicates the endurance rating changes based on how filled the drive is. It is also why their endurance ratings are a worst case and the drives will likely far exceed that rating.

The rating doesn't change, but the fundamental attributes of static-wear leveling vs dynamic-wear leveling remain the same.

A full drive has to work harder (ie: more write amplification) than an empty SSD. That's just how modern SSDs work.

newtekie1 said:
A refresh is not a full erase and program. It might cause minor wear on the cell, but not nearly as much as a full erase and program.

NAND Flash literally can't be written to unless its been erased. Any form of writing by necessity, the very design of Flash-RAM, requires an erase cycle.

An erase sets an entire block of 128kB (or ~Mbit) either sets the bit to 1. Then writing sets the bits you care about to 0 (or 1 in inverted Flash). That's literally how erase / write cycles work. There's no way to set a singular bit to 1: you have to "erase" all 128kB... you have to set ALL the bits in the entire block to 1.

You can set individual bits to 0. But if a cell is 1 and needs to be refreshed back to 1, that necessitates an erase cycle.

newtekie1 · Mar 5, 2021

dragontamer5788 said:
The rating doesn't change, but the fundamental attributes of static-wear leveling vs dynamic-wear leveling remain the same.

A full drive has to work harder (ie: more write amplification) than an empty SSD. That's just how modern SSDs work.

And that is why the manufacturers of consumer drives use the worst case. They aren't expecting people to keep the drive 75% empty.

Processor	Intel i5-12600k
Motherboard	Asus H670 TUF
Cooling	Arctic Freezer 34
Memory	2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s)	EVGA GTX 1060 SC
Storage	500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s)	Dell U3219Q + HP ZR24w
Case	Raijintek Thetis
Audio Device(s)	Audioquest Dragonfly Red :D
Power Supply	Seasonic 620W M12
Mouse	Logitech G502 Proteus Core
Keyboard	G.Skill KM780R
Software	Arch Linux + Win10

Processor	Intel i5-12600k
Motherboard	Asus H670 TUF
Cooling	Arctic Freezer 34
Memory	2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s)	EVGA GTX 1060 SC
Storage	500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s)	Dell U3219Q + HP ZR24w
Case	Raijintek Thetis
Audio Device(s)	Audioquest Dragonfly Red :D
Power Supply	Seasonic 620W M12
Mouse	Logitech G502 Proteus Core
Keyboard	G.Skill KM780R
Software	Arch Linux + Win10

Processor	AMD Ryzen 9 5900X \|\|\| Intel Core i7-3930K
Motherboard	ASUS ProArt B550-CREATOR \|\|\| Asus P9X79 WS
Cooling	Noctua NH-U14S \|\|\| Be Quiet Pure Rock
Memory	Crucial 2 x 16 GB 3200 MHz \|\|\| Corsair 8 x 8 GB 1333 MHz
Video Card(s)	MSI GTX 1060 3GB \|\|\| MSI GTX 680 4GB
Storage	Samsung 970 PRO 512 GB + 1 TB \|\|\| Intel 545s 512 GB + 256 GB
Display(s)	Asus ROG Swift PG278QR 27" \|\|\| Eizo EV2416W 24"
Case	Fractal Design Define 7 XL x 2
Audio Device(s)	Cambridge Audio DacMagic Plus
Power Supply	Seasonic Focus PX-850 x 2
Mouse	Razer Abyssus
Keyboard	CM Storm QuickFire XT
Software	Ubuntu

Processor	Intel i5-12600k
Motherboard	Asus H670 TUF
Cooling	Arctic Freezer 34
Memory	2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s)	EVGA GTX 1060 SC
Storage	500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s)	Dell U3219Q + HP ZR24w
Case	Raijintek Thetis
Audio Device(s)	Audioquest Dragonfly Red :D
Power Supply	Seasonic 620W M12
Mouse	Logitech G502 Proteus Core
Keyboard	G.Skill KM780R
Software	Arch Linux + Win10

Processor	Intel Core i7 10850K@5.2GHz
Motherboard	AsRock Z470 Taichi
Cooling	Corsair H115i Pro w/ Noctua NF-A14 Fans
Memory	32GB DDR4-3600
Video Card(s)	RTX 2070 Super
Storage	500GB SX8200 Pro + 8TB with 1TB SSD Cache
Display(s)	Acer Nitro VG280K 4K 28"
Case	Fractal Design Define S
Audio Device(s)	Onboard is good enough for me
Power Supply	eVGA SuperNOVA 1000w G3
Software	Windows 10 Pro x64

Intel Rolls Out SSD 670p Mainstream NVMe SSD Series

bug

minami

lexluthermiester

bug

minami

efikkan

lexluthermiester

bug

lexluthermiester

newtekie1

Semi-Retired Folder

bug

dragontamer5788

bug

dragontamer5788

efikkan

newtekie1

Semi-Retired Folder

dragontamer5788

newtekie1

Semi-Retired Folder