Monday, March 1st 2021

Intel Rolls Out SSD 670p Mainstream NVMe SSD Series

Intel today rolled out the SSD 670p series, a new line of M.2 NVMe SSDs targeted at the mainstream segment. Built in the M.2-2280 form factor with a PCI-Express 3.0 x4 host interface, the drive implements Intel's latest 144-layer 3D QLC NAND flash memory, mated with a re-badged Silicon Motion SM2265G 8-channel controller that uses a fixed 256 MB DDR3L DRAM cache across all capacity variants. It comes in capacities of 512 GB, 1 TB, and 2 TB.

The 1 TB and 2 TB variants offer sequential read speeds of up to 3500 MB/s, while the 512 GB variant reads at up to 3000 MB/s. Sequential write speeds vary, with the 512 GB variant writing at up to 1600 MB/s, the 1 TB variant at up to 2500 MB/s, and the 2 TB variant at up to 2700 MB/s. The drives offer significantly higher endurance than past generations of QLC-based drives, with the 512 GB variant capable of up to 185 TBW, the 1 TB variant up to 370 TBW, and the 2 TB variant up to 740 TBW. Intel is backing the drives with 5-year warranties. The 512 GB variant is priced at $89, the 1 TB variant at $154, and the 2 TB variant at $329.

92 Comments on Intel Rolls Out SSD 670p Mainstream NVMe SSD Series

#76
minami
Oh!
It's amazing that there are dealers who can perfectly distinguish between a NAND failure and a controller failure when an SSD dies!
When the Ready/Busy signal on the NAND chip is stuck at "L", how can a dealer tell whether it's a NAND failure or a controller failure?
A store that can tell it's not a controller failure when a NAND chip running on roughly 2.7 V suffers data corruption because of a faulty buck converter in the power supply.
A store that can determine it's a NAND failure when one of several NAND chips has a damaged boost circuit that prevents it from writing correctly!
If such a place really exists, it would be nice to have one close to home.

Here's one of my actual cases: a data-recovery job for a Silicon Power S55 came in.
I disassembled it, examined it thoroughly, and found that one small coil was defective.
The OS prompted me to initialize the drive, and Device Manager reported it as a Kingston SSD.
Once I reconnected the coil where it belonged, the drive was recognized as a Silicon Power SSD again and the data could be read. A strange symptom, but the cause was a single bad coil.

Today's NAND is a very delicate technology. Most failures in SSDs that run cool are caused by defective NAND or bad solder joints.
I see a lot of products that use poor-quality TLC NAND. Naturally, QLC NAND shows up as well. It's a silicon lottery.
If you must buy a QLC-SSD, do your research and make sure you don't get a bad one.
Posted on Reply
#77
lexluthermiester
bugActually, TLC reached MLC levels of endurance when manufacturers started to use V-NAND.
Did they? I'd read somewhere that it was close but not quite there.
minamiWhen the Ready/Busy signal on the NAND chip is fixed at "L", how can a dealer tell if it's a NAND failure or a controller failure?
Two points. When a drive is readable, accessing the SMART data and checking the wear by reading how many sectors have been reallocated is a solid indication of drive stability. The sheer number of QLC-based drives that have come back is the glaring indicator. Whether it's the NAND or the controller is sometimes impossible to determine. However, when the OS is telling the user that the drive is in a state of "imminent failure" and cautions them to replace the drive soon, that's when we can look at the drive data and see where the problem is. Of the QLC drives that have failed, only one refused to boot. For the rest it was not a controller problem; the NAND was wearing out.
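For anyone who wants to run the same kind of check themselves, here's a minimal Python sketch that shells out to smartmontools and pulls the reallocated-sector count. It assumes a SATA drive at a hypothetical /dev/sda, that smartctl is installed, and that the drive exposes the usual Reallocated_Sector_Ct attribute (NVMe drives report health through a different log):

import subprocess

def reallocated_sectors(device="/dev/sda"):
    """Return the raw Reallocated_Sector_Ct value reported by smartctl, or None."""
    # smartctl -A prints the vendor-specific SMART attribute table for SATA drives.
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        if "Reallocated_Sector_Ct" in line:
            # The raw value is the last column of the attribute row.
            return int(line.split()[-1])
    return None  # attribute not present (e.g. NVMe) or drive not readable

if __name__ == "__main__":
    count = reallocated_sectors()
    if count is None:
        print("No Reallocated_Sector_Ct attribute found for this device.")
    elif count == 0:
        print("No reallocated sectors reported; no obvious NAND wear yet.")
    else:
        print(f"{count} reallocated sectors; keep an eye on this drive.")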
Posted on Reply
#78
bug
lexluthermiesterDid they? I'd read somewhere that it was close but not quite there.
With the countless manufacturing processes out there, let's just say they got into the same ballpark.
On one hand, you had TLC drives with better endurance than MLC: forums.anandtech.com/threads/how-is-3d-tlc-nand-more-reliable-than-mlc.2493029/
On the other, there was V-NAND MLC and of course that had way better endurance.

Bottom line, this little manufacturing trick brought the then worrisome TLC into a territory people were comfortable with. It doesn't look like QLC will get the same chance.
Posted on Reply
#79
minami
lexluthermiesterTwo points. When a drive is readable, accessing the SMART data and checking the wear by reading how many sectors have been reallocated is a solid indication of drive stability. The sheer number of QLC-based drives that have come back is the glaring indicator. Whether it's the NAND or the controller is sometimes impossible to determine. However, when the OS is telling the user that the drive is in a state of "imminent failure" and cautions them to replace the drive soon, that's when we can look at the drive data and see where the problem is. Of the QLC drives that have failed, only one refused to boot. For the rest it was not a controller problem; the NAND was wearing out.
You're right. I do the same thing.
In addition, I read the entire area as a backup, and later write it back.
It wears out the NAND, but I'm in the data-recovery business, and rescuing the data is the most important thing.
After the rescue is completed, I write back the entire area to check the health of the NAND.

If there are a lot of blocks with long read latency right after they are written, the SSD can no longer be trusted, even if the total write volume is small.
In the worst case, you may have to wait more than 350 ms to read a single block.
Since TLC NAND arrived, the number of these recovery jobs has increased, and QLC will naturally make it worse.
If the controller is the cause, that's actually the better case, because the data can still be saved. But the actual cause is usually NAND wear or solder.
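To illustrate that kind of latency check, here's a minimal Python sketch that reads a device or image file in fixed chunks and flags anything slower than a threshold. The device path, the 4 KiB chunk size, and the 350 ms threshold are placeholder assumptions; on Linux you'd also want O_DIRECT with aligned buffers so the page cache doesn't hide slow reads:

import os
import time

DEVICE = "/dev/sdX"        # hypothetical device or image file, opened read-only
CHUNK = 4096               # read size in bytes (assumed block size)
THRESHOLD_S = 0.350        # flag reads slower than 350 ms

def scan_slow_blocks(path=DEVICE, chunk=CHUNK, threshold=THRESHOLD_S):
    slow = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            start = time.monotonic()
            data = os.read(fd, chunk)
            elapsed = time.monotonic() - start
            if not data:
                break                      # end of device/file
            if elapsed > threshold:
                slow.append((offset, elapsed))
            offset += len(data)
    finally:
        os.close(fd)
    return slow

if __name__ == "__main__":
    for offset, elapsed in scan_slow_blocks():
        print(f"slow read at offset {offset}: {elapsed * 1000:.0f} ms")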
Posted on Reply
#80
efikkan
newtekie1That does not happen. Every time a higher bit per cell of flash is used in SSDs the endurance ratings are always lower than the previous. When MLC drives came out, their endurance ratings were significantly lower than SLC drives. As MLC matured the endurance ratings increased, but they never got to SLC levels. When TLC came out, the endurance ratings were significantly lower than MLC. As TLC matured the endurance ratings went up, but never matched MLC. Now the same is happening with QLC.
I was talking about the endurance ratings of the products, not the cells, being inflated. With TLC and QLC SSDs using multiple qualities of flash, it matters a lot how they estimate the usage patterns, as QLC has somewhere around 1/100th to 1/1000th of the endurance of SLC, if we're optimistic.

Look at this 670p QLC device offering an impressive 370 TBW for the 1 TB variant (close to TLC numbers), compared to the 660p at 200 TBW, both with a 140 GB SLC cache. So they mostly achieved this "improvement" by tuning how the SLC caching works, because the QLC flash itself contributes very little to this endurance number. Post #54 from TheLostSwede really shows it in their marketing materials: they clearly make assumptions about how the SSD is supposed to be used, assume how much data will sit in the SLC cache on average, and use that to inflate the endurance ratings. If your usage pattern deviates from this model just a tiny bit, your endurance will be less than half, and if it deviates a lot you will probably get more than one order of magnitude less.

This is in stark contrast with good MLC SSDs like the Samsung 960 Pro 1 TB (800 TBW) and 970 Pro 1 TB (1200 TBW), which probably have optimistic endurance ratings too, but at least they were much less sensitive to whether the user uses the drive "the right way".

If Intel thinks their buyers should stay under 25% usage of their SSDs for them to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?
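To make the sensitivity to write amplification concrete, here's a back-of-the-envelope sketch in Python. The P/E cycle count and the write-amplification factors are illustrative assumptions, not figures published by Intel or anyone else:

def tbw(capacity_tb, pe_cycles, write_amplification):
    """Rough endurance estimate: host terabytes written before the rated
    program/erase cycles are used up, for a given write-amplification factor."""
    return capacity_tb * pe_cycles / write_amplification

# Hypothetical 1 TB QLC drive assumed to be rated for ~1000 P/E cycles.
for waf in (1.5, 3, 6):
    print(f"WAF {waf}: ~{tbw(1, 1000, waf):.0f} TBW")
# WAF 1.5: ~667 TBW  (nearly everything absorbed by the SLC cache)
# WAF 3:   ~333 TBW
# WAF 6:   ~167 TBW  (cache frequently full, lots of rewrites)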
newtekie1And my experience contradicts yours. Of the hundreds of QLC drives I've sold through my shop at this point, not one has come back with a NAND failure (I have had a few come back due to controller failures).
Since you haven't observed the same cases, I don't think you can say your experience contradicts his, unless you can somehow discredit his observations. Proving a negative is hard; you need a significant statistical basis to claim there isn't a problem, while you only need comparatively few samples to prove that a problem exists.
Posted on Reply
#81
lexluthermiester
bugIt doesn't look like QLC will get the same chance.
This is because the electrochemical process of programming QLC with data pushes the chemistry to its physical limits very quickly. It's not a viable technology for data reliability unless the device it's used in is not exposed to frequent writes.
efikkanas the QLC has somewhere of 1/10-1/100th of the endurance of TLC.
This needs correction.
efikkanIf Intel thinks their buyers should stay under 25% usage of their SSDs for them to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?
Very well said!
Posted on Reply
#82
bug
lexluthermiesterThis is because the electrochemical process of programing QLC with data pushes the chemistry to it's physical limits very quickly. It's not a viable technology for data reliability unless the device it's used in is not exposed to frequent writes.
And with no writes, you're subject to leakage and thus data loss. It's kind of a lose-lose situation.

I could tolerate QLC as my storage drive (and remember to rewrite everything every now and then). But I don't want to imagine every manufacturer saying bye-bye to TLC and starting to offer QLC only.
Posted on Reply
#83
lexluthermiester
bugAnd with no writes, you're subject to leakage and thus data loss. It's kind of a lose-lose situation.
To be fair, even with QLC, that process takes nearly 4 years to become a serious problem for the data. Most NAND controllers run a data-refresh cycle to mitigate that problem.
bugBut I don't want to imagine every manufacturer saying bye-bye to TLC and starting to offer QLC only.
Right there with you on that one. If they do that, I'll go back to HDDs exclusively.
Posted on Reply
#84
newtekie1
Semi-Retired Folder
efikkanIf Intel thinks their buyers should stay under 25% usage of their SSDs for them to have a decent lifetime, then what good are these capacity gains from QLC over TLC and MLC?
Staying under 25% usage is not at all about lifetime, it is about performance. They aren't changing their TBW spec based on how full the drive is. The same is true for the SLC cache: the improvements there were about performance, not the lifespan of the drive, and that is what those slides were demonstrating.

It is also worth pointing out that writing directly to the QLC on the 670p is significantly faster than writing directly to the QLC on the 660p (roughly four times faster). So it would seem there have in fact been improvements to the QLC NAND itself.
efikkanSince you haven't observed the same cases, I don't think you can say it contradicts his experience, unless you somehow can discredit his observations. Proving a negative is hard; you need a significant statistical basis to claim there isn't a problem, while only needing a comparatively few samples to prove a problem exists.
Sure it can. I've sold hundreds of QLC drives and none have come back with NAND errors. His relatively small sample size, and samples from just one manufacturer, isn't enough to say there is a problem with QLC in general.
bugAnd with no writes, you're subject to leakage and thus data loss. It's kind of a lose-lose situation.

I could tolerate QLC as my storage drive (and remember to rewrite everything every now and then). But I don't want to imagine every manufacturer saying bye-bye to TLC and starting to offer QLC only.
Not really, the controller handles refreshing the cells as long as the drive has power. This solution has been in place since TLC drives, which require the same thing. The only time it becomes a problem is when the drive sits unpowered for a significant amount of time.
Posted on Reply
#85
bug
newtekie1Not really, the controller handles refreshing the cells as long as the drive has power. This solution has been in place since TLC drives, which require the same thing. The only time it becomes a problem is when the drive sits unpowered for a significant amount of time.
True, I had forgotten about that. But even refreshing eats into the precious P/E cycles :(
It's one of the reasons I was rooting for XPoint.
Posted on Reply
#86
dragontamer5788
bugTrue, I had forgotten about that. But even refreshing eats into the precious P/E cycles :(
It's one of the reasons I was rooting for XPoint.
Patrol reads shouldn't have to erase-cycle cells unless the voltage level has started going out of whack.

"If (block has error) rewrite block".

I don't know what kind of error correction exists per block, but it's probably on the order of 16 bytes or more, which means 8+ bytes (64 bits) would need to be in error before you reach an uncorrectable error. If you're constantly doing patrol reads, then only one or two bits will go bad at a time, giving the controller plenty of time to fix the issue.
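As a toy version of that logic, here's a minimal Python sketch of a patrol-read pass. The read_page_with_ecc stand-in and the corrected-bit threshold are pure assumptions for illustration, not how any particular controller firmware works:

import random

CORRECTED_BITS_LIMIT = 8          # assumed threshold before a page is rewritten
PAGES = 1024                      # hypothetical number of pages to patrol

def read_page_with_ecc(page):
    """Stand-in for the controller's ECC read: returns how many bits the ECC
    engine had to correct on this page (simulated here with random decay)."""
    return random.choices([0, 1, 2, 12], weights=[90, 6, 3, 1])[0]

def patrol_read():
    rewrites = 0
    for page in range(PAGES):
        corrected_bits = read_page_with_ecc(page)
        if corrected_bits >= CORRECTED_BITS_LIMIT:
            # Only now does the data get rewritten elsewhere, costing P/E wear.
            rewrites += 1
        # Pages with only a bit or two corrected are left alone: no erase, no wear.
    return rewrites

if __name__ == "__main__":
    print(f"{patrol_read()} of {PAGES} pages needed a refresh this pass")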
Posted on Reply
#87
bug
dragontamer5788Patrol reads shouldn't have to erase-cycle cells unless the voltage level has started going out of whack.

"If (block has error) rewrite block".

I don't know what kind of error correction exists per block, but it's probably on the order of 16 bytes or more, which means 8+ bytes (64 bits) would need to be in error before you reach an uncorrectable error. If you're constantly doing patrol reads, then only one or two bits will go bad at a time, giving the controller plenty of time to fix the issue.
Even so, to correct 1 or 2 bits, it still has to write an entire block (4 kB), right?
But yes, I didn't mean to imply the drive will go bad just by refreshing itself. It's just that the refreshing isn't a "free lunch" kind of deal.
Posted on Reply
#88
dragontamer5788
bugEven so, to correct 1 or 2 bits, it still has to write an entire block (4 kB), right?
Depends on the flash. The block erase could be 128 kB, for example, even if individual read/write pages are 4 kB.

EDIT: Now that I think of it, that's the point of TRIM. You erase 128 kB sections, but new writes are just 4 kB at a time. So TRIM is a delayed "garbage collection" that helps batch up writes across your SSD, and the whole process is surprisingly efficient.
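A toy model of that garbage collection, purely as an illustration (the 32-pages-per-block geometry and the page states are assumptions, not any specific drive's firmware):

PAGES_PER_BLOCK = 32   # e.g. a 128 kB erase block holding 4 kB pages

# Each page is "free", "valid", or "invalid" (overwritten or TRIMmed away).
block = ["valid"] * 20 + ["invalid"] * 12

def garbage_collect(block):
    """Reclaim a block: copy still-valid pages elsewhere, then erase the block.
    Pages already marked invalid by TRIM don't need to be copied, which is
    exactly the write amplification TRIM saves you."""
    copied = sum(1 for page in block if page == "valid")
    block[:] = ["free"] * PAGES_PER_BLOCK    # the erase, always a whole block
    return copied

print(f"{garbage_collect(block)} of {PAGES_PER_BLOCK} pages had to be rewritten")
# Without TRIM, stale pages would still look "valid" and get copied too.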
But yes, I didn't mean to imply the drive will go bad just by refreshing itself. It's just that the refreshing isn't a "free lunch" kind of deal.
Fair.
Posted on Reply
#89
efikkan
newtekie1Staying under 25% usage is not at all about lifetime, it is about performance. They aren't changing their TBW spec based on how full the drive is. The same is true for the SLC cache: the improvements there were about performance, not the lifespan of the drive, and that is what those slides were demonstrating.
It absolutely matters.
One user who does a lot of writes but never fills up much of the drive will cause far less wear than one who fills up most of it and randomly overwrites most of that. The endurance rating is based on a very optimistic usage pattern, where the QLC probably contributes less than 2% of the TBW rating.
newtekie1It is also worth pointing out that writing directly to the QLC on the 670p is significantly faster than writing directly to the QLC on the 660p (roughly four times faster). So it would seem there have in fact been improvements to the QLC NAND itself.
Faster doesn't mean it's more durable. Even if they managed to improve QLC endurance by 10x, it will still be fairly bad.
newtekie1Sure it can. I've sold hundreds of QLC drives and none have come back with NAND errors. His relatively small sample size, and samples from just one manufacturer, isn't enough to say there is a problem with QLC in general.
These problems are highly dependent on the usage pattern, not random chance, so a selection of hundreds is nowhere near enough to dismiss the existence of a problem. You also may not know whether your customers are able to detect these problems in time (or even bother), or whether they identify them correctly.

When people like me have seen up to hundreds of computers in companies with a specific usage pattern, where file system corruption and SMART errors are far more frequent than they should be, that forms a solid basis for concluding that usage patterns can significantly affect the lifespan.
Posted on Reply
#90
newtekie1
Semi-Retired Folder
efikkanIt absolutely matters.
One user who does a lot of writes but never fills up much of the drive will cause far less wear than one who fills up most of it and randomly overwrites most of that.
I'm not debating this.
efikkanThe endurance rating is based on a very optimistic usage pattern, where the QLC probably contributes less than 2% of the TBW rating.
This is incorrect. The rating remains the same no matter how the user fills up the drive. Nothing Intel has said indicates the endurance rating changes based on how full the drive is. It is also why their endurance ratings are a worst case and the drives will likely far exceed that rating.
efikkanFaster doesn't mean it's more durable. Even if they managed to improve QLC endurance by 10x, it will still be fairly bad.
Bad compared to TLC/MLC, but still good enough for a system drive.
efikkanThese problems are highly dependent on the usage pattern, not random chance, so a selection of hundreds is nowhere near enough to dismiss the existence of a problem. You also may not know whether your customers are able to detect these problems in time (or even bother), or whether they identify them correctly.

When people like me have seen up to hundreds of computers in companies with a specific usage pattern, where file system corruption and SMART errors are far more frequent than they should be, that forms a solid basis for concluding that usage patterns can significantly affect the lifespan.
And a selection of a few drives from a single manufacturer is not enough to indicate a problem in general with QLC.
bugTrue, I had forgotten about that. But even refreshing eats into the precious P/E cycles :(
It's one of the reasons I was rooting for XPoint.
A refresh is not a full erase and program. It might cause minor wear on the cell, but not nearly as much as a full erase and program.
Posted on Reply
#91
dragontamer5788
newtekie1This is incorrect. The rating remains the same no matter how the user fills up the drive. Nothing Intel has said indicates the endurance rating changes based on how full the drive is. It is also why their endurance ratings are a worst case and the drives will likely far exceed that rating.
The rating doesn't change, but the fundamental attributes of static-wear leveling vs dynamic-wear leveling remain the same.

A full drive has to work harder (i.e. more write amplification) than an empty SSD. That's just how modern SSDs work.
newtekie1A refresh is not a full erase and program. It might cause minor wear on the cell, but not nearly as much as a full erase and program.
NAND flash literally can't be written to unless it's been erased. Any form of writing, by the very design of flash memory, requires an erase cycle.

An erase sets an entire block of 128 kB (about a megabit) to all 1s. Then writing sets the bits you care about to 0 (or to 1 in inverted flash). That's literally how erase/write cycles work. There's no way to set a single bit back to 1: you have to "erase" all 128 kB, i.e. set ALL the bits in the entire block to 1.

You can set individual bits to 0. But if a cell is at 1 and needs to be refreshed back to 1, that necessitates an erase cycle.
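That asymmetry is easy to show in a toy model. This Python sketch uses a tiny 16-byte "block" purely for illustration; real erase blocks are on the order of 128 kB and the geometry varies by part:

BLOCK_SIZE = 16   # toy block size in bytes; real erase blocks are far larger

class NandBlock:
    def __init__(self):
        self.cells = bytearray([0xFF] * BLOCK_SIZE)   # erased state: all bits 1
        self.erase_cycles = 0

    def erase(self):
        """The only way to get bits back to 1: erase the whole block."""
        self.cells = bytearray([0xFF] * BLOCK_SIZE)
        self.erase_cycles += 1

    def program(self, offset, data):
        """Programming can only pull bits from 1 to 0 (AND with existing cells)."""
        for i, byte in enumerate(data):
            if byte & ~self.cells[offset + i] & 0xFF:
                raise ValueError("bit already 0: must erase the block first")
            self.cells[offset + i] &= byte

block = NandBlock()
block.program(0, b"\xF0")       # fine: only clears bits
try:
    block.program(0, b"\xFF")   # tries to raise a cleared bit back to 1
except ValueError as e:
    print("refresh needs an erase:", e)
block.erase()                   # costs one of the block's limited P/E cycles
block.program(0, b"\xFF")       # now the byte can hold all 1s again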
Posted on Reply
#92
newtekie1
Semi-Retired Folder
dragontamer5788The rating doesn't change, but the fundamental attributes of static-wear leveling vs dynamic-wear leveling remain the same.

A full drive has to work harder (i.e. more write amplification) than an empty SSD. That's just how modern SSDs work.
And that is why the manufacturers of consumer drives use the worst case. They aren't expecting people to keep the drive 75% empty.
Posted on Reply