Monday, March 1st 2021
Intel Rolls Out SSD 670p Mainstream NVMe SSD Series
Intel today rolled out the SSD 670p series, a new line of M.2 NVMe SSDs targeted at the mainstream segment. Built in the M.2-2280 form-factor with a PCI-Express 3.0 x4 host interface, the drive implements Intel's latest 144-layer 3D QLC NAND flash memory, mated with a re-badged Silicon Motion SM2265G 8-channel controller that uses a fixed 256 MB DDR3L DRAM cache across all capacity variants. It comes in capacities of 512 GB, 1 TB, and 2 TB.
The 1 TB and 2 TB variants offer sequential read speeds of up to 3500 MB/s, while the 512 GB variant reads at up to 3000 MB/s. Sequential write speeds vary, with the 512 GB variant writing at up to 1600 MB/s, the 1 TB variant at up to 2500 MB/s, and the 2 TB variant at up to 2700 MB/s. The drives offer significantly higher endurance than past generations of QLC-based drives, with the 512 GB variant capable of up to 185 TBW, the 1 TB variant up to 370 TBW, and the 2 TB variant up to 740 TBW. Intel is backing the drives with 5-year warranties. The 512 GB variant is priced at $89, the 1 TB variant at $154, and the 2 TB variant at $329.
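For a rough sense of what those endurance figures mean in daily use, the rated TBW works out to roughly 0.2 drive writes per day over the 5-year warranty at every capacity. A minimal sketch of that arithmetic, using only the figures quoted above:

```python
# Rough drive-writes-per-day (DWPD) estimate from the rated TBW figures above.
# Capacities and TBW values are taken from the article; 5-year warranty as stated.
WARRANTY_DAYS = 5 * 365

drives = {
    "512 GB": (512, 185),   # (capacity in GB, rated TBW)
    "1 TB": (1000, 370),
    "2 TB": (2000, 740),
}

for name, (capacity_gb, tbw) in drives.items():
    dwpd = (tbw * 1000) / (capacity_gb * WARRANTY_DAYS)
    print(f"{name}: ~{dwpd:.2f} drive writes per day")
# Every capacity lands around 0.2 DWPD, i.e. roughly 200 GB of writes per day on the 1 TB model.
```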
92 Comments on Intel Rolls Out SSD 670p Mainstream NVMe SSD Series
It's amazing that there are dealers who can perfectly distinguish between a NAND failure and a controller failure when an SSD dies!
When the Ready/Busy signal on a NAND chip is stuck at "L", how can a dealer tell whether it's a NAND failure or a controller failure?
A shop that can rule out a controller failure when a NAND chip running on around 2.7 V has data corruption because of a problem with the power supply's buck converter.
A shop that can determine it's a NAND failure when one of several NAND chips has a damaged boost circuit that prevents it from writing correctly!
If shops like that really exist, it would be nice to have one close to home.
Here is one of my actual experiences: a data-recovery job on a Silicon Power S55 came my way.
I disassembled it and examined it thoroughly, and found that one small coil was defective.
The OS prompted me to initialize it, and Device Manager reported it as a Kingston SSD.
When I reconnected the coil where it belonged, the SSD was recognized as a Silicon Power drive again and the data could be read. A strange symptom, but the cause was a single bad coil.
Today's NAND is built on very delicate technology. Most failures of SSDs that run cool are caused by defective NAND or bad solder.
I see a lot of products that use poor-quality TLC NAND, and naturally poor-quality QLC NAND shows up too. It's a silicon lottery.
If you must buy a QLC-SSD, do your research and make sure you don't get a bad one.
On one hand, he had TLC drives with better endurance than MLC: forums.anandtech.com/threads/how-is-3d-tlc-nand-more-reliable-than-mlc.2493029/
On the other, there was V-NAND MLC and of course that had way better endurance.
Bottom line, this little manufacturing trick brought the then worrisome TLC into a territory people were comfortable with. It doesn't look like QLC will get the same chance.
In addition, I read the entire area as a backup and write it back later.
It will wear out the NAND, but I am in the data recovery business, and data rescue is the most important thing.
After the rescue is completed, the entire area is written back to check the health of the NAND.
If a lot of blocks show long read latency right after being written, the SSD can no longer be trusted, even if the total write volume is small.
In the worst case, you may have to wait more than 350ms to read one block.
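A minimal sketch of that read-back latency check, assuming a Linux block device and a 1 MiB scan granularity; the device path, chunk size, and threshold are placeholders, not the poster's actual tooling:

```python
import os, time, mmap

# Read the device back chunk by chunk and flag chunks that take suspiciously long.
DEVICE = "/dev/sdX"          # placeholder: the drive under test
CHUNK = 1024 * 1024          # scan granularity: 1 MiB per read (an assumption)
SLOW_MS = 350.0              # flag anything slower than ~350 ms, per the comment above

fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache so we time the flash
buf = mmap.mmap(-1, CHUNK)                         # page-aligned buffer, required by O_DIRECT

offset, slow = 0, []
while True:
    t0 = time.perf_counter()
    try:
        n = os.preadv(fd, [buf], offset)
    except OSError:                                # unaligned tail of the device: stop scanning
        break
    dt_ms = (time.perf_counter() - t0) * 1000
    if n <= 0:
        break
    if dt_ms > SLOW_MS:
        slow.append((offset, dt_ms))
    offset += n

os.close(fd)
print(f"{len(slow)} slow chunks (>{SLOW_MS} ms) out of {offset // CHUNK} read")
```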
With the advent of TLC NAND, the number of recovery jobs has increased, and QLC will naturally make things worse.
If the controller is the cause, that's still the better case, because the data can be saved. But the actual cause is usually NAND wear or solder.
Look at this 670p QLC device offering an impressive 370 TBW for the 1 TB variant (close to TLC numbers), compared to the 660p at 200 TBW, both with a 140 GB SLC cache. So they mostly achieved this "improvement" by tuning how the SLC caching works, because the QLC flash itself contributes very little to this endurance number. Post #54 from TheLostSwede really shows it: in their marketing materials they clearly make assumptions about how the SSD should be used and how much data will sit in the SLC cache on average, and they use those assumptions to inflate the endurance ratings. If your usage pattern deviates from this model just a tiny bit, your endurance will be less than half, and if it deviates a lot you will probably get more than an order of magnitude less.
This is in stark contrast with good MLC SSDs like Samsung 960 Pro 1TB (800 TBW) and 970 Pro 1 TB (1200 TBW), which probably have optimistic endurance ratings too, but at least they were much less sensitive to the user using it the right way.
If Intel thinks their buyers should stay under 25% usage of their SSDs for the drives to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?
Since you haven't observed the same cases, I don't think you can say it contradicts his experience, unless you can somehow discredit his observations. Proving a negative is hard; you need a significant statistical basis to claim there isn't a problem, while you only need comparatively few samples to prove a problem exists.
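To make the usage-pattern sensitivity concrete, here is a toy endurance model under loudly made-up assumptions: the QLC P/E rating, the fraction of writes folded to QLC, and the write-amplification values are illustrative guesses, not Intel's numbers. It only shows how strongly the effective TBW swings with how the drive is used:

```python
# Toy model of how the usage pattern swings an SLC-cached QLC drive's effective
# endurance. All numbers here are illustrative assumptions, not Intel's figures.
CAPACITY_TB = 1.0
QLC_PE_CYCLES = 1000        # assumed program/erase rating of the QLC itself

def effective_tbw(fraction_folded_to_qlc, write_amplification):
    """Host TB that can be written before the QLC wears out.

    fraction_folded_to_qlc: share of host writes that actually get programmed
    into QLC (writes absorbed and overwritten inside the SLC cache never do).
    write_amplification: extra internal writes from garbage collection.
    """
    qlc_budget_tb = CAPACITY_TB * QLC_PE_CYCLES
    return qlc_budget_tb / (fraction_folded_to_qlc * write_amplification)

# Light use, mostly-empty drive: most writes die in the SLC cache, low WA.
print(effective_tbw(0.3, 1.5))   # ~2200 TB - the rating looks generous
# Heavy use, mostly-full drive: nearly everything is folded to QLC, high WA.
print(effective_tbw(1.0, 4.0))   # ~250 TB - well under the 370 TBW rating
```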
I could tolerate QLC as my storage drive (and remember to rewrite everything every now and then). But I don't want to imagine every manufacturer saying bye-bye to TLC and starting to offer QLC only.
It is also worth pointing out that writing directly to the QLC on the 670p is significantly faster than writing directly to the QLC on the 660p (roughly 4x faster). So it would seem there have in fact been improvements to the QLC NAND itself.
Sure it can. I've sold hundreds of QLC drives and none have come back with NAND errors. His relatively small sample size, with samples from just one manufacturer, isn't enough to say there is a problem with QLC in general.
Not really, the controller handles refreshing the cells as long as the drive has power. This solution has been in place since TLC drives, which require the same thing. The only time it becomes a problem is when the drive sits unpowered for a significant amount of time.
It's one of the reasons I was rooting for XPoint.
"If (block has error) rewrite block".
I don't know what kind of error correction exists per block, but it's probably on the order of 16 bytes or more, which means 8+ bytes (64 bits) would need to be in error before you reach an uncorrectable error. If you're constantly doing patrol reads, then only one or two bits will go bad at a time, giving the controller plenty of time to fix the issue.
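A toy patrol-read loop along those lines, with the per-block error reporting simulated; the thresholds and the error distribution are assumptions for illustration, not any real controller's firmware:

```python
import random

# Toy patrol-read loop for the "if (block has error) rewrite block" idea above.
CORRECTABLE_BYTES = 8      # assume the ECC can fix up to this many bad bytes per block
REFRESH_THRESHOLD = 2      # rewrite the block long before the ECC limit is reached
NUM_BLOCKS = 1000

def read_block_corrected_errors(block_id):
    # Stand-in for the controller reporting how many bytes the ECC had to fix.
    # Here we just simulate slow charge drift with a small random error count.
    return random.choices([0, 1, 2, 3], weights=[90, 6, 3, 1])[0]

refreshed = 0
for block in range(NUM_BLOCKS):
    errors = read_block_corrected_errors(block)
    if errors > CORRECTABLE_BYTES:
        print(f"block {block}: uncorrectable, data lost")
    elif errors >= REFRESH_THRESHOLD:
        refreshed += 1     # read, correct, erase, and rewrite the block while it is still fixable

print(f"patrol pass refreshed {refreshed} of {NUM_BLOCKS} blocks")
```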
But yes, I didn't mean to imply the drive will go bad just by refreshing itself. Just the refreshing itself isn't a "free lunch" kinda deal.
EDIT: Now that I think of it: that's the point of TRIM. You erase 128 kB sections, but new writes are just 4 kB at a time. So TRIM is a delayed "garbage collection" that helps batch up writes across your SSD, and this whole process is surprisingly efficient.
Fair.
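A back-of-the-envelope sketch of why that batching matters, using the 128 kB block / 4 kB write sizes from the post above; the valid-page counts are made-up examples:

```python
# Write amplification from garbage collection: to reclaim an erase block, the
# controller must first copy its still-valid pages elsewhere, then erase it.
# TRIM marks dead pages so fewer of them look "valid" at reclaim time.
ERASE_BLOCK_KB = 128
PAGE_KB = 4
PAGES_PER_BLOCK = ERASE_BLOCK_KB // PAGE_KB   # 32 pages per erase block

def write_amplification(valid_pages_per_reclaimed_block):
    """Ratio of pages the NAND actually programs to pages the host asked to write."""
    free_pages = PAGES_PER_BLOCK - valid_pages_per_reclaimed_block
    return (free_pages + valid_pages_per_reclaimed_block) / free_pages

print(write_amplification(4))    # mostly-trimmed blocks: WA ~ 1.14
print(write_amplification(28))   # mostly-valid blocks:   WA = 8.0
```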
One user who does a lot of writes but never fills the drive up much will cause far less wear than one who fills up most of it and randomly overwrites most of it. The endurance rating is based on a very optimistic usage pattern, where the QLC probably contributes less than 2% of the TBW rating.
Faster doesn't mean it's more durable. Even if they managed to improve QLC endurance by 10x, it would still be fairly bad.
These problems are highly dependent on the usage pattern, not random chance, so a selection of hundreds is nowhere near enough to dismiss the existence of a problem. You also may not know whether your customers are able to detect these problems in time (or even bother), or whether they correctly identify them.
When people like me have seen up to hundreds of computers at companies with a specific usage pattern, and problems with file-system corruption and SMART errors are far more frequent than they should be, that forms a solid basis for concluding that usage patterns can significantly affect the lifespan.
A full drive has to work harder (i.e. more write amplification) than an empty SSD. That's just how modern SSDs work. NAND flash literally can't be written to unless it's been erased. By the very design of flash memory, any write necessarily involves an erase cycle.
An erase operates on an entire block of 128 kB (roughly a megabit) and sets every bit in it to 1. Then writing sets the bits you care about to 0 (or 1 in inverted flash). That's literally how erase/write cycles work. There's no way to set a single bit back to 1: you have to "erase" all 128 kB, that is, set ALL the bits in the entire block to 1.
You can set individual bits to 0, but if a cell holding a 1 needs to be refreshed back to a clean 1, that necessitates an erase cycle.
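A tiny model of that erase/program rule, purely illustrative; real erase blocks are around 128 kB of 4 kB pages, while this toy uses 8-bit "pages" to keep it readable:

```python
# Erase sets every bit in the whole block to 1; a program can only clear bits (1 -> 0).
class NandBlock:
    PAGES = 32
    ERASED_PAGE = 0xFF          # all bits 1 after an erase

    def __init__(self):
        self.pages = [None] * self.PAGES   # None = never erased / unknown state

    def erase(self):
        # The only way to get bits back to 1 is to erase the entire block at once.
        self.pages = [self.ERASED_PAGE] * self.PAGES

    def program(self, page_index, data):
        current = self.pages[page_index]
        if current is None:
            raise RuntimeError("block must be erased before programming")
        # Programming can only pull bits down to 0, never push them back up to 1.
        if current & data != data:
            raise RuntimeError("data needs a 1 where the cell is already 0: erase the block first")
        self.pages[page_index] = current & data

block = NandBlock()
block.erase()
block.program(0, 0b1010_1010)     # fine: only clears bits
# block.program(0, 0b1111_1111)   # would fail: several bits are already 0, a block erase is needed
```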