# Effect of SLC Caching on SSD Endurance



## WhoDecidedThat (Jan 8, 2022)

Kingston KC3000 Review - Faster Than Samsung 980 Pro

> The Kingston KC3000 is built using the Phison E18 controller and Micron's best 176-layer TLC NAND flash. In our performance testing, the drive can beat the Samsung 980 Pro and is the fastest SSD we ever tested. It shares that performance throne with the WD Black SN850.

www.techpowerup.com




The Kingston KC3000's 2 TB model has 2000 GB of TLC NAND. It can use almost all of it (1930 GB) in SLC mode, giving 1930/3 ≈ 643 GB of SLC cache.

I keep wondering though: isn't this writing to the NAND twice? Say I write 100 GB. First I consume 300 GB worth of NAND writing in SLC mode, then I consume 100 GB worth of NAND in TLC mode. Of course, writing in SLC isn't nearly as harmful, but it is somewhat harmful, isn't it? We don't even get to choose whether we are willing to let go of SLC caching for better endurance.

@Chris_Ramseyer Can you offer some insights into how harmful (or harmless) SLC caching is to NAND endurance?

@W1zzard Can you offer some insights to this?


----------



## Selaya (Jan 9, 2022)

writing to TLC in SLC mode causes about as much wear as writing to SLC in SLC mode - compared to actual TLC writes like, next to none at all.


----------



## Mussels (Jan 9, 2022)

SLC caching uses fewer writes; you could think of it like anything written in SLC mode uses 1/4 the lifespan of QLC writes.


My numbers there are made up; we'd need an expert to tell us how much it actually helps. Short version: keep some free space on the drive if you want it to live longer.


----------



## WhoDecidedThat (Jan 9, 2022)

Mussels said:


> My numbers there are made up; we'd need an expert to tell us how much it actually helps. Short version: keep some free space on the drive if you want it to live longer.



I read online that a 2D NAND cell in TLC mode has a life of 3k write cycles, in MLC mode 10k, and in SLC mode 100k. Based on this, if you do the math, the net endurance cost of writing to the NAND chip first in SLC mode and then in TLC mode is about 10%. Meaning what would have been a life of 3,000 cycles becomes closer to 2,750 cycles. (See the math below.)

However, all of this is based on the assumption that a TLC chip in SLC mode will have 30x the endurance. I am also (incorrectly) assuming an identical failure point for both: a NAND cell which has reached the point of failure in TLC mode should still be perfectly okay to use in SLC mode. This means SLC mode cannot simply be assumed to have 30 times the write cycles; it might be closer to 10x or 20x, I don't know. Which is exactly why I asked the question: how much endurance are we sacrificing for increased write performance? High read performance will become useful for gamers because of DirectStorage in 5-6 years, so having that as an option might become necessary.

But how much write performance do most consumers need? Aren't their needs ultimately limited by their internet bandwidth? I am pretty sure most people don't have a 4 Gbps (500 MB/s) internet connection. Some might have a 5 Gbps (600 MB/s) external SSD, but writing directly to TLC can already keep up with that.

Intel's 760p had a sequential write speed of 560 MB/s. Compared to a similar drive (ADATA SX8200) with 1660 MB/s sequential writes but identical read performance, the ADATA was barely 6% faster overall in TPU's review, because the two drives had basically identical real-life performance for most uses (Photoshop editing and ISO file copy being the biggest differentiators).



| | P/E Cycles | Capacity in GB | Total Life in GB | % life used per 10 GB write |
| --- | --- | --- | --- | --- |
| SLC | 100,000 | 10 | 1,000,000 | 0.001% |
| TLC | 3,000 | 30 | 90,000 | 0.011% |

Total % life used: 0.012%
Effective life in GB: 82,569
Effective P/E: 2,752
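The table's arithmetic, as a quick script (the 100k/3k cycle figures are the assumed values from above, not measured ones):

```python
# Napkin math: effective P/E cycles when every write passes through an
# SLC cache first and is then flushed to TLC. Cycle counts are the
# assumed figures from the post, not measured values.
slc_pec, slc_gb = 100_000, 10     # SLC-mode cache: 10 GB, 100k cycles
tlc_pec, tlc_gb = 3_000, 30       # TLC pool: 30 GB, 3k cycles
write_gb = 10                     # one 10 GB host write

slc_life = slc_pec * slc_gb       # 1,000,000 GB of SLC-mode writes total
tlc_life = tlc_pec * tlc_gb       # 90,000 GB of TLC writes total

# Fraction of total life consumed by one 10 GB write (it hits both pools)
frac = write_gb / slc_life + write_gb / tlc_life
effective_life_gb = write_gb / frac
effective_pe = effective_life_gb / tlc_gb

print(round(effective_life_gb))   # 82569
print(round(effective_pe))        # 2752
```

The SLC term barely matters (0.001% vs 0.011%), which is why the net cost comes out to only about 10%.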


----------



## Mussels (Jan 9, 2022)

No, you don't lose any endurance from SLC mode; you gain it.

SLC mode *reduces* the number of write operations, since fewer cells have to be written.


----------



## WhoDecidedThat (Jan 9, 2022)

Mussels said:


> No, you don't lose any endurance from SLC mode; you gain it.
> 
> SLC mode *reduces* the number of write operations, since fewer cells have to be written.


Allow me to explain how NAND works.

The smallest unit of storage in a NAND chip is called a page. Its size would be 16 KB in TLC mode or 5.33 KB in SLC mode. This is the smallest amount you are allowed to write/read: whether you are writing/reading 1 KB or 16 KB, the SSD will have to read/write that whole page.

64 to 512 of these pages make a block.

Hundreds of these blocks are then organized into slices.

Each NAND chip is made up of multiple slices.

The parallelism in SSDs comes from having multiple slices and multiple chips.

Say your SSD has 4 NAND chips, each of which has 4 slices. That is 16 slices total. Your SSD controller has 16 connections to these 4 NAND chips, one for each slice.

You want to write 80 KB to your SSD.

Step 1 - break that 80 KB into fifteen 5.33 KB chunks.

Step 2 - each 5.33 KB chunk is written to an SLC-mode page in a slice. This is done for all 15 chunks, to 15 pages in 15 slices, in parallel.

By the way, it is because of this parallel reading and writing that SSDs get faster (in MB/s) as your file gets larger.

Step 3 - The 80 KB worth of data that was written in SLC mode is written again to five 16 KB TLC mode pages (distributed across 5 slices).

Step 4 - Those fifteen 5.33 KB SLC pages are then emptied to be used again in the future.

Steps 3 and 4 taken together are called "flushing the SLC cache to TLC". 

This is the difference I am talking about.

Without SLC caching, I would have just consumed 5 TLC pages.

With SLC caching, I am consuming 15 SLC pages in addition to the 5 TLC pages.

This is why SLC caching reduces SSD endurance by some (as of yet undetermined) amount.
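The page counts in the steps above can be tallied with a quick sketch (using the post's illustrative sizes, including the 5.33 KB SLC-mode page, which is itself a simplification):

```python
import math

TLC_PAGE_KB = 16
SLC_PAGE_KB = 16 / 3            # the post's 5.33 KB SLC-mode page
write_kb = 80                   # the 80 KB host write from the example

# Without SLC caching: data goes straight to TLC pages
tlc_only = math.ceil(write_kb / TLC_PAGE_KB)     # 5 page programs

# With SLC caching: write to SLC first, then flush to TLC
slc_pages = math.ceil(write_kb / SLC_PAGE_KB)    # 15 SLC-page programs
cached = slc_pages + tlc_only                    # 15 + 5 = 20 programs

print(tlc_only, cached)  # 5 20
```

Same 80 KB of host data, but with caching the NAND sees 20 page programs instead of 5, which is the double-write the post is worried about.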


----------



## AlLOL2001 (Jan 12, 2022)

I have the same question. I'm researching before buying an SSD, and I think I'll go with the Samsung 980...

My new question is: is there some way to disable the SLC cache?

I'm not sure how "Intelligent TurboWrite" works, but I really hope there's some way... If somebody knows how to disable it, please tell me...

I've read something about Intel being able to disable that SLC cache...

Sorry, English isn't my first language... TY in advance...


----------



## joemama (Jan 12, 2022)

AlLOL2001 said:


> I have the same question. I'm researching before buying an SSD, and I think I'll go with the Samsung 980...
> 
> My new question is: is there some way to disable the SLC cache?
> 
> ...


I don't think the SLC cache function is something you can control from the computer side, since it is implemented in the SSD controller's firmware.


----------



## chrcoluk (Jan 12, 2022)

It depends how intelligent the controller is.

An SSD insider revealed sometime in 2020 that there are primitive controllers that write almost everything in the SLC cache again to TLC, then back to SLC again if the file is updated. So whilst SLC-cache writes might be good for endurance, it doesn't mean much if the data still has to be written again to TLC.

He developed a drive, working with AMD, that had more intelligent behaviour: files that are mostly written would stay in SLC all the time (e.g. logs). For this to work, files would generally stay in SLC for a while whilst the drive figures out whether they are read- or write-heavy, and only after this delay do read-heavy files get moved to TLC. With this new behaviour, the expected endurance of the drive far exceeded 3D MLC.

Many existing drives just go for headline benchmark results; I believe they start moving data out of SLC quite quickly to try and maximise performance for new writes.


----------



## bug (Jan 12, 2022)

Selaya said:


> writing to TLC in SLC mode causes about as much wear as writing to SLC in SLC mode - compared to actual TLC writes like, next to none at all.


I doubt that very much. Whether you write 0 and 1 or 0..7, you're writing to the same physical cell. You're still using one P/E cycle.

@blanarahul 3D TLC brings the P/E cycles in line with planar MLC. The endurance of these drives is more than enough. For example, my 850 EVO is over 5 years old, has been used as a gaming and then as an OS drive, and still displays over 90% integrity.


----------



## WhoDecidedThat (Jan 12, 2022)

chrcoluk said:


> Many existing drives just go for headline benchmark results; I believe they start moving data out of SLC quite quickly to try and maximise performance for new writes.


I believe this to be the case as well.

Back in the day we could choose between TLC without SLC caching and TLC with SLC caching. Nowadays we don't get that choice anymore.





This is what Samsung's TLC-without-SLC-caching SSD looks like. In comparison, with SLC caching (980 Pro):



Random write performance sees quite the jump from 100k IOPS to 1000k IOPS, and sequential write numbers are 2.4 GB/s for a 2 TB-class drive. So it would definitely look quite bad for marketing to not have such performance. But if you ask me, having a 1 TB SSD with 1.2 GB/s sequential writes would be more than okay.


This is what Kingston says about their TLC without caching drive.




Had they used a PCIe 4.0 controller, they would be able to reach 7000 MB/s reads no problem. But that 925 MB/s will be hard to advertise. On the other hand, endurance numbers are a lot better for this drive: 1095 TB for their 960 GB model (compared to 600 TB for typical consumer 1 TB SSDs).


----------



## GabrielLP14 (Jan 20, 2022)

Very interesting content; however, you guys failed to mention one detail that's a drawback: WAF, the Write Amplification Factor.


----------



## WhoDecidedThat (Jan 20, 2022)

GabrielLP14 said:


> WAF. Write Amplification Factor


How is that relevant to our discussion? Can you please explain?


----------



## Tetras (Jan 21, 2022)

blanarahul said:


> I believe this to be the case as well.
> 
> Back in the day we could choose between TLC without SLC caching and TLC with SLC caching. Nowadays we don't get that choice anymore.
> 
> ...



I don't think the IronWolf 110 or 125 Pro have an SLC cache either, and they also have much higher endurance.


----------



## Wooden Law - Black (Jan 31, 2022)

blanarahul said:


> How is that relevant to our discussion? Can you please explain?


A dynamic SLC design leads to a worse WAF (data may first be written as SLC and later be rewritten as TLC/QLC), while a static one doesn't. In fact, an SSD with a dynamic SLC cache doesn't improve endurance, while one with a static SLC cache does; static SLC can improve NAND endurance to something like 30-40K PEC (P/E cycles), depending on the NAND model.


----------



## bug (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> A dynamic SLC design leads to a worse WAF (data may first be written as SLC and later be rewritten as TLC/QLC), while a static one doesn't. In fact, an SSD with a dynamic SLC cache doesn't improve endurance, while one with a static SLC cache does; static SLC can improve NAND endurance to something like 30-40K PEC (P/E cycles), depending on the NAND model.


Whether the cache is static or dynamic, it's still just a cache (i.e. data gets written to it and then flushed to the main storage). What's the difference here? What am I missing?


----------



## Selaya (Jan 31, 2022)

dynamic slc cache contents get written into tlc at some point (either when the drive fills or when it's idle, depending on the configuration of the firmware), which means that data gets written twice (into slc cache, then into tlc proper) = write amplification
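That is just the definition of write amplification; as a toy calculation (assuming, for simplicity, that every host write lands in the dynamic SLC cache and is later folded to TLC):

```python
# Toy write-amplification calculation for a dynamic SLC cache.
# Assumes every host write is cached in SLC and later rewritten to TLC.
host_writes_gb = 100

# The same data is programmed twice: once into SLC-mode blocks,
# once more when the cache is folded into TLC.
nand_writes_gb = host_writes_gb + host_writes_gb

waf = nand_writes_gb / host_writes_gb
print(waf)  # 2.0
```

Real drives land somewhere below this worst case, since some data is overwritten or trimmed while still in the cache and never reaches TLC.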


----------



## Wooden Law - Black (Jan 31, 2022)

bug said:


> Whether the cache is static or dynamic, it's still just a cache (i.e. data gets written to it and then flushed to the main storage). What's the difference here? What am I missing?


The fact that dynamic and static - even if we are talking about a cache in both cases - aren't the same thing: they work in different ways and have different advantages/disadvantages. The static SLC cache sits in the OP (over-provisioning) space, is always available, and is always in SLC mode for the device's life; the dynamic SLC cache depends on how much space the user is using, "diminishing in size as the drive is filled".


----------



## R0H1T (Jan 31, 2022)

blanarahul said:


> Steps 3 and 4 taken together are called "flushing the SLC cache to TLC".


Then you should also know that not all the data from SLC (or MLC cache) is flushed to the disk! Only the final writes, & that too if necessary!




Deferred caching with PrimoCache is the easiest way to learn this: only final writes, or urgent writes when the cache is full, are generally flushed to the disk! This saves a lot of unnecessary writes and is actually a major way to save on your disk's lifetime endurance. Whatever napkin math you're doing doesn't translate to real-world numbers; there are literally *a ton of variables involved*.


----------



## WhoDecidedThat (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> A dynamic SLC design leads to a worse WAF (data may first be written as SLC and later be rewritten as TLC/QLC)


That's exactly what I said in my original post without calling it something fancy like WAF.


Black [Super Saiyan Rosé] said:


> static SLC can improve NAND endurance to something like 30-40K PEC


By how much? Because whether you write to TLC directly or funnelled through SLC cache, you are writing to TLC either way. How is it improving TLC's endurance?


----------



## bug (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> The fact that dynamic and static - even if we are talking about a cache in both cases - aren't the same thing: they work in different ways and have different advantages/disadvantages. The static SLC cache sits in the OP (over-provisioning) space, is always available, and is always in SLC mode for the device's life; the dynamic SLC cache depends on how much space the user is using, "diminishing in size as the drive is filled".


Still, each write means one write to the cache and another one to the main storage. No difference in write count.


----------



## Wooden Law - Black (Jan 31, 2022)

Tetras said:


> I don't think the IronWolf 110 or 125 Pro have an SLC cache either, and they also have much higher endurance.


From Real Hardware Reviews review (IronWolf 110): "At its heart this is a pseudo-SLC cache buffer; however, unlike most TLC solid-state drives which use a fixed capacity for their pseudo-SLC cache, DuraWrite is a complete floating pseudo-SLC cache buffer that will use every bit’s worth of free space on the drive."


----------



## WhoDecidedThat (Jan 31, 2022)

R0H1T said:


> Then you should also know that not all the data from SLC (or MLC cache) is flushed to the disk! Only the final writes, & that too if necessary!


I highly doubt this. What is your source on this?


----------



## R0H1T (Jan 31, 2022)

You mean apart from some/many of the SSD reviews?






It's not 100% copy of PrimoCache because SLC caches come in lots of different sizes & are implemented differently. But the net result is mostly the same.


----------



## Wooden Law - Black (Jan 31, 2022)

blanarahul said:


> By how much? Because whether you write to TLC directly or funnelled through SLC cache, you are writing to TLC either way.


Usually TLC flash is around 2000-3000 PEC, but here too it depends on the model: for example, Micron B37R (128L) is 5000 PEC, while a SpecTek part (a sub-brand with a worse bin) is more like 700.


blanarahul said:


> How is it improving TLC's endurance?


I think it's because static SLC sits in OP or other non-user-accessible SSD space, unlike the dynamic cache, whose size depends on the free space in the SSD and other factors.


bug said:


> Still, each write means one write to the cache and another one to the main storage. No difference in write count.


Maybe yes in terms of write count, but in terms of SSD/NAND endurance, no.


----------



## WhoDecidedThat (Jan 31, 2022)

R0H1T said:


> You mean apart from some/many of the SSD reviews?


You do realise that if I write 100 GB to the 980 Pro and wait for 15 minutes, the drive moves the data from the SLC cache to TLC so that the SLC cache is ready to be used again, right?



R0H1T said:


> It's not 100% copy of PrimoCache because SLC caches come in lots of different sizes & are implemented differently. But the net result is mostly the same.


I want to know. What do you think PrimoCache does?


----------



## R0H1T (Jan 31, 2022)

There's a simple way to test this: use CDM or any other benchmark to write totally random data to a small file, and loop it 5-10 times so that it fits well within the SLC cache size. Check your write (throughput) speeds in the benchmark application, and check the actual data written with something like Process Hacker, HD Sentinel or any other utility, in real time. Now admittedly, like I said, it's not a 100% copy of PrimoCache, so all that data will not be "trimmed" like it would be with PrimoCache, but the writes should be much lower than what the (benchmark) app requested. It's important to use a utility which measures real-time data written.



blanarahul said:


> What do you think PrimoCache does?


What do you mean? Do you have an idea of what it does/doesn't do from that screenshot I posted?


----------



## bug (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> Maybe yes, but in terms of SSD'/NAND's endurance no.


And once again: if we're dealing with the same number of writes, what makes a static cache better for endurance? Are you referring to the fact that, when dealing with a static cache, the user-facing storage is only written to once?


----------



## Wooden Law - Black (Jan 31, 2022)

bug said:


> And once again: if we're dealing with the same number of writes, what makes static cache better for endurance? Are you referring to the fact that, when dealing with static cache, the user-facing storage is only written to once?


Did you read the fact that static SLC is in OP (so in non-user-accessible space)? Theoretically, how can you do writes to SLC if it isn't accessible to the user?
Also, keep in mind that some pSLC SSDs exist, meaning the entire SSD is in SLC mode; look at the LX3030 or the P200 for example (you can read that static SLC improves endurance).


----------



## WhoDecidedThat (Jan 31, 2022)

R0H1T said:


> What do you mean? Do you have an idea of what it does/doesn't do from that screenshot I posted?


PrimoCache and TLC SSD's SLC cache have different objectives and thus different approaches.

The history behind SLC caching is that when Samsung released their first TLC SSD, the 840, back in 2012, they found that compared to its MLC counterpart (the 840 Pro) it had comparable read speeds but much slower write speeds. So to compensate for the slow write speed of native TLC NAND, they created SLC caching (840 Evo) in 2013. The objective of SLC caching was to increase write speeds for TLC NAND SSDs. This is why it is necessary for the SLC cache to be emptied: so that it is ready to be filled again the next time you write lots of data.

PrimoCache was created to increase read/write speeds for data that is accessed frequently, while not affecting the read/write speeds for data that is accessed infrequently. For example, if you have a movie stored on your drive, you probably don't access it frequently, so it stays in the slower storage, while something like frequently accessed Windows system files stays in high-speed storage. This is why PrimoCache uses deferred caching.
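The deferred-caching idea (only the final version of a dirty block gets flushed) can be sketched like this; a toy model, not PrimoCache's actual algorithm:

```python
# Toy model of deferred write-back caching: repeated writes to the same
# logical block are coalesced in RAM, and only the final version is
# flushed to disk. Not PrimoCache's actual algorithm, just the idea.
class DeferredCache:
    def __init__(self):
        self.pending = {}        # logical block -> latest data
        self.disk_writes = 0

    def write(self, block, data):
        self.pending[block] = data   # overwrite in cache, no disk I/O yet

    def flush(self):
        self.disk_writes += len(self.pending)  # one write per dirty block
        self.pending.clear()

cache = DeferredCache()
for i in range(100):          # 100 host writes, all to the same block
    cache.write(7, f"version {i}")
cache.flush()
print(cache.disk_writes)      # 1 - instead of 100 direct writes
```

An SLC cache, by contrast, has already committed every version to NAND by the time it folds to TLC, which is why the two approaches affect endurance so differently.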



Black [Super Saiyan Rosé] said:


> Did you read the fact that static SLC is in OP (so in non-user-accessible space)? Theoretically, how can you do writes to SLC if it isn't accessible to the user?
> Also, keep in mind that some pSLC SSDs exist, meaning the entire SSD is in SLC mode; look at the LX3030 or the P200 for example (you can read that static SLC improves endurance).


How should I explain this... Bug and I are saying: think of the SSD's entire NAND pool (OP or not). Irrespective of whether you use a static or dynamic SLC cache, whenever I make a write to the SSD, I am writing to the NAND pool twice: first to the SLC portion of the pool, then to the TLC portion. And we are concerned about how the endurance of the entire NAND pool is affected because we are writing to it twice. Hope it makes sense.


----------



## bug (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> Did you read the fact that static SLC is in OP (so in non-user-accessible space)? Theoretically, how can you do writes to SLC if it isn't accessible to the user?
> Also, keep in mind that some pSLC SSDs exist, meaning the entire SSD is in SLC mode; look at the LX3030 or the P200 for example (you can read that static SLC improves endurance).


Technically, _you_ are not writing to the cache, that's something the SSD does on its own.

Long story short, can you explain, step by step, what in the static nature of a cache affects a drive's endurance? Pretend I know nothing about SSDs.


----------



## Wooden Law - Black (Jan 31, 2022)

bug said:


> Long story short, can you explain, step by step, what in the static nature of a cache affects a drive's endurance?





			https://cdn.discordapp.com/attachments/845376921478889512/915655122795761724/US20210342191A1.pdf


----------



## bug (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> https://cdn.discordapp.com/attachments/845376921478889512/915655122795761724/US20210342191A1.pdf


"This site is blocked due to a security threat." 
Will try a computer that's not managed by my employer later.


----------



## Wooden Law - Black (Jan 31, 2022)

bug said:


> "This site is blocked due to a security threat."
> Will try a computer that's not managed by my employer later.


Now I am on iPhone, I can’t download it and post it.


----------



## WhoDecidedThat (Jan 31, 2022)

bug said:


> "This site is blocked due to a security threat."
> Will try a computer that's not managed by my employer later.





Black [Super Saiyan Rosé] said:


> Now I am on iPhone, I can’t download it and post it.


----------



## bug (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> Now I am on iPhone, I can’t download it and post it.


No worries, I'll get to it later today. Linux isn't scared as easily as Cisco Umbrella 

Thanks @blanarahul , I doubt I'll go through all that. I was looking for a simple explanation, I doubt I need a 19 page document for that.


----------



## WhoDecidedThat (Jan 31, 2022)

bug said:


> Thanks @blanarahul , I doubt I'll go through all that. I was looking for a simple explanation, I doubt I need a 19 page document for that.


I'll post some excerpts.


> [0017] One downside to the use of SLC cache is that it increases the amount of times data is written to the physical memory, because data is written twice: once to the SLC cache, and then later to MLC storage. Instances in which the same data is written multiple times to flash is called Write Amplification (WA). WA can be defined as the actual amount of information physically written to the storage media in comparison to the logical amount intended to be written over the life of that data as it moves throughout the memory device. In addition to the use of SLC cache, the amount of WA is also affected by other necessary tasks on the NAND such as garbage collection. The larger the SLC cache, the more likely a write request is to be serviced by SLC cache. Consequently, the larger the SLC cache, the greater the likelihood of an increase in write amplification.


You and I aren't wrong in worrying about the effect of SLC caching on endurance.



> [0018] There are two types of SLC cache: static SLC cache, in which blocks can only be used in SLC mode; and dynamic SLC cache, in which blocks can be used in SLC mode or TLC mode. Most current mobile storage devices use dynamic SLC cache. *The maximum program/erase cycle (PEC) of the dynamic blocks is the same as a TLC block regardless of whether the block is being used in SLC or TLC mode.* Thus, for dynamic SLC cache, the terabytes written (TBW) of a dynamic SLC block is limited to the TBW of a TLC block.


This is extremely concerning and does somewhat answer the question I had. A drive like the Kingston KC3000 with dynamic SLC caching is trading endurance for speed.



> [0019] Currently, the static SLC cache size is fixed and the dynamic SLC cache size is dynamic. The present subject matter makes the static SLC cache size dynamic based on maximum logical saturation (LS) in a device lifetime, in various embodiments. For static SLC cache, the maximum PEC is 20-40 times that of dynamic SLC cache, which means that static SLC cache may have 20-40 times the data written in the same time period compared to a same-size dynamic SLC cache.


So dynamic SLC caching is bad for endurance.



> [0050] FIG. 3B illustrates an example table for providing a dynamic size of static SLC cache. In various embodiments, the device monitors the highest LS and changes the static SLC cache size based on the monitored highest LS. Thus, a memory device residing in different devices may have different static SLC cache sizes. In the depicted example, if the LS is A%, the SLC cache size is determined using the equation: (100% - A%)/3. In addition, assuming a current OP for 100% LS is 7%, the OP static SLC cache is determined using the equation: ((100% - A%)/3 + 7%)/A% = (121 - A)/(3A). As shown in the table of FIG. 3B, the largest number of blocks of GC to free one block is not increased. Thus, a device using the memory controller of the present subject matter can get the increased TBW benefit from the static SLC cache without increasing the worst-case GC to free additional storage.


I don't understand how they reached the conclusion they did regarding the TBW benefit from static SLC cache. There is no mention of TBW/endurance anywhere else in the paper.
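For what it's worth, the cache-sizing algebra in [0050] does check out; a quick sketch (A is the logical saturation in percent, and the 7% OP figure is the patent's assumption):

```python
# Check the patent's static-SLC sizing algebra for a few logical
# saturation (LS) values. A is LS in percent; OP is assumed to be 7%,
# as in the patent's example.
for a in (50, 70, 90):
    cache_pct = (100 - a) / 3                 # static SLC cache size
    lhs = (cache_pct + 7) / a                 # ((100 - A)/3 + 7) / A
    rhs = (121 - a) / (3 * a)                 # patent's simplified form
    assert abs(lhs - rhs) < 1e-12             # identical, as claimed
```

The simplification is just ((100 - A) + 21)/(3A) = (121 - A)/(3A); the TBW claim itself is the part the paper never derives.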


----------



## bug (Jan 31, 2022)

blanarahul said:


> You and I aren't wrong in asking about the effect on endurance of SLC caching. I'll post more as I read more.
> 
> 
> I don't understand how they reached the conclusion they did regarding the TBW benefit from static SLC cache. There is no mention of TBW/endurance anywhere else in the paper.
> ...


Ah, you went for the red herring 

If you understand something well enough, you can explain it in plain language to somebody who doesn't understand the first thing about the subject. If you can't, you'll do exactly what Black did.

PS: Of course writing in SLC mode will still eat one P/E cycle. You're still physically writing to a 3-bit cell. SLC mode means you're only setting the cell at the max or min voltage level, which means the voltage doesn't need to be as strict since you don't need to discern between 8 levels anymore. But the wear is still there; the cell continues to lose charge capacity.


----------



## WhoDecidedThat (Jan 31, 2022)

bug said:


> Ah, you went for the red herring


I am afraid I don't understand.


bug said:


> Of course writing in SLC mode will still eat one P/E cycle. You're still physically writing to a 3-bit cell. SLC mode means you're only setting the cell at the max or min voltage level, which means the voltage doesn't need to be as strict since you don't need to discern between 8 levels anymore. But the wear is still there; the cell continues to lose charge capacity.


It is because the voltage doesn't need to be strict in SLC mode that I was expecting it to consume less than one P/E cycle. The fact that, TLC or SLC, either way you are consuming one P/E cycle means that we are throwing half the endurance away for speed that most people will rarely use.


----------



## bug (Jan 31, 2022)

blanarahul said:


> I am afraid I don't understand.


He was unable to explain plainly how he got to his conclusion; he just dumped a (seemingly useless) document on us instead. And you went for it.


----------



## R-T-B (Jan 31, 2022)

bug said:


> He was unable to explain plainly how he got to the conclusion, he just dumped a (seemingly useless) document on us instead. And you went for it


I mean citations do have uses.  But so do explanations, yeah.


----------



## Mussels (Jan 31, 2022)

Black [Super Saiyan Rosé] said:


> Did you read the fact that static SLC is in OP (so in non-user-accessible space)? Theoretically, how can you do writes to SLC if it isn't accessible to the user?
> Also, keep in mind that some pSLC SSDs exist, meaning the entire SSD is in SLC mode; look at the LX3030 or the P200 for example (you can read that static SLC improves endurance).


The drives can still use the OP space; the OS can't.


----------



## Wooden Law - Black (Feb 1, 2022)

bug said:


> He was unable to explain plainly how he got to the conclusion, he just dumped a (seemingly useless) document on us instead. And you went for it


Oh, I understand, I'm sorry that I wasn't able to explain this to the "the guy who is right 90% of the time".


----------



## bug (Feb 1, 2022)

Black [Super Saiyan Rosé] said:


> Oh, I understand, I'm sorry that I wasn't able to explain this to the "the guy who is right 90% of the time".


You managed to get even the sarcasm wrong. Kudos.

And there's no need to apologize, there's still space left to put in a few words how a static cache improves endurance. I'll be around.


----------



## Maxx (Feb 1, 2022)

blanarahul said:


> I keep wondering though: isn't this writing to the NAND twice? Say I write 100 GB. First I consume 300 GB worth of NAND writing in SLC mode, then I consume 100 GB worth of NAND in TLC mode. Of course, writing in SLC isn't nearly as harmful, but it is somewhat harmful, isn't it? We don't even get to choose whether we are willing to let go of SLC caching for better endurance.



I can absolutely answer this for you in detail but it will have to be at a later time. I have discussed this a lot on my discord server. I'll be brief here for now on a quick post (so there might be some errors) but feel free to hit me up directly and/or on discord.

"Writing to TLC in SLC mode causes about as much wear as writing to SLC in SLC mode" from an above comment. This is absolutely false. Native SLC has higher endurance, for one thing, but also there's critical differences in static and dynamic pSLC. The former has its own wear zone, is in OP space, and is made up of the cells with the best data retention (top layers). The latter shares a zone with the native flash (e.g. TLC). Black unfortunately linked the wrong patent for this discussion; Intel has one where they clarify that on the balance, a dynamic SLC write that later goes to TLC is approximately 0.4 times as impactful as a TLC erase but they count it conservatively as a full TLC erase. Micron in their Dynamic Write Acceleration document also specifically talks about "additive wear" which means rewriting to TLC increases wear.

"Anything written in SLC mode uses 1/4 the lifespan of QLC writes" is also false. Look at the pSLC Chia drives made from QLC flash rated for up to 1000-1500 P/E in QLC mode (64-96L Intel): the same flash is rated for 30K P/E in permanent (static) SLC mode. The relationship isn't linear in any case; for example, you only need one read point for SLC but 7 for TLC and 15 for QLC, which amounts to 7/3 (points/bits) or 2.33 for TLC and 15/4 or 3.75 for QLC nominally (see Kioxia's 96L QLC ISSCC digest). Programming is more complex, but you need verification reads there as well.
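The nominal read-point arithmetic above can be sketched in a few lines (a toy calculation mirroring the points/bits ratios quoted; the function name is my own):

```python
# An n-bit cell has 2^n threshold states, which require 2^n - 1 read
# (sense) points to distinguish. Cost per stored bit = (2^n - 1) / n.
def read_points_per_bit(bits_per_cell: int) -> float:
    states = 2 ** bits_per_cell
    return (states - 1) / bits_per_cell

print(read_points_per_bit(1))  # SLC:  1 point / 1 bit  = 1.0
print(read_points_per_bit(3))  # TLC:  7 points / 3 bits ~ 2.33
print(read_points_per_bit(4))  # QLC: 15 points / 4 bits = 3.75
```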

"SLC mode *reduces* the number of write operations" - no, a page is a page. pSLC mode is just one page per word line, while TLC is three pages per word line. SSDs generally write with page granularity, which is 16k with modern consumer flash; sooner or later the data may get moved to native flash, and it takes up the same amount of space either way. "Folding" takes 3 SLC blocks and compresses them into a single TLC block, but each SLC block is made from a TLC block. As such you are doing an SLC write, an SLC read, and then a TLC write, with the TLC write being an average over all 3 of its pages (lower/middle/upper). If you mean that writing to SLC can defer writes and avoid writing to TLC, which on balance is better for wear, then that's true; DRAM on an SSD works similarly (for metadata updates), and likewise host memory (RAM) caching writes before committing to non-volatile media.
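The folding path can be counted out explicitly (a simplified tally under the 3-SLC-blocks-into-1-TLC-block scheme described above; real wear per operation differs by mode):

```python
# Array operations needed to commit one TLC block's worth of user data
# through the dynamic SLC cache, versus writing direct-to-TLC.
def folding_ops(tlc_blocks: int = 1) -> dict:
    slc_blocks = 3 * tlc_blocks          # each SLC block holds 1/3 the data
    return {
        "slc_programs": slc_blocks,      # initial cache writes
        "slc_reads": slc_blocks,         # read back during folding
        "tlc_programs": tlc_blocks,      # final native write
    }

# Direct-to-TLC is just 1 TLC program; via cache the data physically
# hits the array twice (once in SLC mode, once in TLC mode).
print(folding_ops())
```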

"Intelligent behavior files ... would stay in SLC" - this is actually true. Modern SSDs have behavioral profiles and algorithms for SLC caching and will retain certain user data in SLC to improve read performance. It's also good to defer writes to reduce additive wear.

"What makes static cache better for endurance" - because it uses the best cells/blocks of each die, but also because it doesn't have to convert back to native flash. Dynamic does, which as mentioned above is typically counted as a native flash erase; the SSD will cycle through all available flash (addressed logically) based on wear. The average lifespan of the deck will be weaker because the lower cells/blocks have worse data retention (but faster program speed) due to differences in the critical dimension and related coupling capacitance, caused by uneven etching from the required high aspect ratio.

As with all things this can become more complex, because space used for static SLC may reduce what's available for ECC and/or spare, and in fact many patents (including Black's) allow for rebalancing as the flash is worn - a good example being a focus on OP early on (to reduce write amplification), then reallocating for more ECC near end of life - and similarly static SLC can be reallocated as dynamic-native. (Note: static having its own wear/GC zone means that endurance is the "worst of" that zone and dynamic-native, and drives can balance writes accordingly later in life; random writes -> SLC mode vs. sequential -> TLC is one such strategy.)

Back to the OP - yes, programming in pSLC mode has less impact, since the threshold window is much larger and you can use fewer pulses (ISPP). It's a bit more complicated when other factors are involved (temperature incl. dwell/swing, wear level of the flash, architecture, et cetera), but on balance it's an order of magnitude less harmful. I'm also discounting direct-to-native (e.g. direct-to-TLC), which can be done for algorithmic reasons; we do see many modern drives get "stuck" in such a mode (for example, the launch 980 PRO with benchmarks).


----------



## bug (Feb 1, 2022)

@Maxx Finally something I can work with.
Reserving best cells for pSLC doesn't seem to be an advantage. Controllers shuffle writes around to use the best cells anyway (like you noted), so the overall number of writes is still the same. That seems to be the only argument in favor of "static cache is better for endurance".
I also don't understand the "but also because it doesn't have to convert back to native flash". I would appreciate if you could elaborate a bit.


----------



## WhoDecidedThat (Feb 1, 2022)

bug said:


> I also don't understand the "but also because it doesn't have to convert back to native flash". I would appreciate if you could elaborate a bit.





Maxx said:


> Intel has one where they clarify that on the balance, a dynamic SLC write that later goes to TLC is approximately 0.4 times as impactful as a TLC erase but they count it conservatively as a full TLC erase.


If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E. Or to put it another way static SLC mode has 25,000 P/E cycles and dynamic SLC mode has 10,000 P/E cycles if TLC has 10,000 P/E cycles. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as it did for planar SLC flash a decade ago).
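A quick sanity check on those figures (my own arithmetic, using the 0.4-erase accounting from the Intel patent discussed above and a hypothetical 10,000 P/E TLC rating):

```python
# If a dynamic SLC write that is later folded to TLC really costs ~0.4 of a
# TLC erase, a block rated at 10,000 TLC P/E cycles could absorb
# 10,000 / 0.4 = 25,000 such SLC-mode cycles worth of wear.
tlc_pe = 10_000        # hypothetical TLC rating
dynamic_cost = 0.4     # fraction of a TLC erase per dynamic SLC cycle (patent figure)

print(tlc_pe / dynamic_cost)  # 25000.0 when counted at 0.4
print(tlc_pe / 1.0)           # 10000.0 when each cycle is billed as a full erase
```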



Maxx said:


> I have discussed this a lot on my discord server


Sorry I can't find the link to your discord in your profile. Can you share it here?


----------



## bug (Feb 1, 2022)

blanarahul said:


> If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E. Or to put it another way static SLC mode has 25,000 P/E cycles and dynamic SLC mode has 10,000 P/E cycles if TLC has 10,000 P/E cycles. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as it did for planar SLC flash a decade ago).


But if the cache is dynamic, each write is still of the "less harmful" kind.
More to the point, if a static SLC cell only has 1,000 p/e cycles left, that's it: it has 1,000 p/e cycles and there's nothing you can do about it. But if a dynamic cache cell is down to 1,000, you can switch to another cell with more p/e cycles available that was previously allotted to main storage.


----------



## Maxx (Feb 1, 2022)

bug said:


> @Maxx Finally something I can work with.
> Reserving best cells for pSLC doesn't seem to be an advantage. Controllers shuffle writes around to use the best cells anyway (like you noted), so the overall number of writes is still the same. That seems to be the only argument in favor of "static cache is better for endurance".
> I also don't understand the "but also because it doesn't have to convert back to native flash". I would appreciate if you could elaborate a bit.


Static pSLC does use the top layers, which due to HAR have better data retention characteristics. Samsung discusses this in a digest for their 6th-generation V-NAND (92L, reference 3), citing this article (see figure 6). For a source on the static SLC part, see here (start reading at line 53, column 5, then refer to 330-1 in figure 3). Static pSLC is dedicated for the life of the device and never converts back to native flash in operation, so it does not have the additive wear associated with dynamic pSLC. Because of this, its wear zone (and garbage collection) is separate from native flash and dynamic pSLC (which share a zone), such that the lifetime of the flash is the worst of the two zones. Drives like Intel's 545s, which has static pSLC, count SLC writes separately from TLC for this reason. There are multiple patents related to this, like this one, and plenty of other good patents with more details on endurance with SLC vs. XLC (listing 40K P/E for static SLC in this case - I also have articles suggesting the average P/E of a dynamic SLC block, again since it comes from the logical pool of native flash, is a bit lower, e.g. 30K relative).

Fundamentally, all else being equal, static pSLC improves the endurance of the flash for these reasons, however (again all else being equal) as I stated above this may need to be balanced with other factors as the device is worn. This patent illustrates how and why you might want to reallocate over time - see figure 3B. Specifically also read [0018] and [0019], then [0021] and [0022].



blanarahul said:


> If they won't be converting the dynamic SLC back to TLC, they can count it properly as 0.4 P/E. Or to put it another way static SLC mode has 25,000 P/E cycles and dynamic SLC mode has 10,000 P/E cycles if TLC has 10,000 P/E cycles. Of course, 3D NAND designed for SLC operation should be able to easily reach 100,000 P/E cycles (as it did for planar SLC flash a decade ago).
> 
> 
> Sorry I can't find the link to your discord in your profile. Can you share it here?


Here is the patent I'm referencing (hosted on my domain/site); start at line 4 in column 7. To see why they count it as a full TLC erase anyway, start at line 59 in column 6. If you check my sources above (in my quote-reply to bug) you'll see 40K P/E being referenced, which is a real possibility - most datasheets are at 30-40K in static/permanent SLC mode (as with QLC for Chia drives). I've posted these on my discord; specifically, Micron's datasheets list SLC- and TLC-mode endurance (B17A, for example, would be 30000/1500). Also correct that on the same node, native SLC will be 100K+. I've illustrated this by comparing Kioxia's 96L flash - they have digests for TLC, QLC, and SLC (XL-Flash), which we have access to as well.

I'm on Reddit under NewMaxx and also run a subreddit with the same name (/r/newmaxx) - which links to my server. Not sure on rules about posting these things here.



bug said:


> But if the cache is dynamic, each write is still of the "less harmful" kind.
> More to the point, if a static SLC cell only has 1,000 p/e cycles left, that's it: it has 1,000 p/e cycles and there's nothing you can do about it. But if a dynamic cache cell is down to 1,000, you can switch to another cell with more p/e cycles available that was previously allotted to main storage.



Not all cells will be written equally in absolute cycles; the controller picks the cells with the least _effective_ wear for dynamic SLC mode and cycles through them over time. Blocks and their properties (differences) are tracked in tables, for example with bias for programming, because of variation. This variation implies that the dynamic-native zone will have lower average endurance/cycles than the dedicated (static) zone, because the average cell/block is in the middle of the deck. Worth noting from my source above: the lower cells do program faster, and this is also a characteristic of P/E cycling (i.e. worn-out flash performs worse on reads due to ECC and has worse data retention, but programs faster due to material breakdown). So we're talking writes per block in comparison, but it's a bit irrelevant when you consider my sources above.


----------



## GabrielLP14 (Feb 1, 2022)

Nice explanation @Maxx, or should I say NewMaxx ahah.


----------



## bug (Feb 1, 2022)

@Maxx Ok, that actually makes sense (I haven't gone through the documents yet).
And now the next question: if static cache can improve endurance, by how much does it do so?


----------



## Maxx (Feb 1, 2022)

bug said:


> @Maxx Ok, that actually makes sense (I haven't gone through the documents yet).
> And now the next question: if static cache can improve endurance, by how much does it do so?



It's effectively two arguments anyway: actual versus measured endurance. Which is to say, dynamic pSLC is treated effectively as native/TLC even though there will still be real endurance improvements. It's also oriented at consumer rather than enterprise workloads, similar to the distinctions made by JEDEC. There's a lot to it without even considering the technical aspects at a lower level; I only mention them to illustrate that all flash is not equal even within a single die, on top of other issues like write amplification.

Two drives you can compare are the FuzeDrive, which has static pSLC + QLC, and the T-Create Expert, which is dynamic pSLC with industrial TLC. The endurance document for the FuzeDrive shows that they take the static pSLC at 30K and the native QLC at 600 PEC. The 2TB (of flash) model has 137.44GB of static pSLC (549.76GB of QLC) and 1462.73GB of native QLC (2012.49GB total). With all in QLC mode at 600, this is ~1.2PB of writes. Calculating it with static pSLC comes out to 5PB. The T-Create Expert, conversely, uses flash rated for 10K PEC in TLC mode but utilizes dynamic pSLC; its warranty is only 6PB per 1TB of capacity (6000 PEC equivalent).
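Those FuzeDrive numbers check out arithmetically (a sketch plugging in the capacities and P/E ratings quoted above, which come from the endurance document referenced, not my own measurements):

```python
# Capacities from the FuzeDrive endurance document (2TB-of-flash model).
static_pslc_gb = 137.44      # static pSLC zone (occupies 549.76 GB of QLC cells)
qlc_behind_pslc_gb = 549.76
native_qlc_gb = 1462.73
pslc_pe, qlc_pe = 30_000, 600

# Everything treated as QLC at 600 P/E: ~1.2 PB of writes.
all_qlc_pb = (qlc_behind_pslc_gb + native_qlc_gb) * qlc_pe / 1e6
# Static pSLC zone at 30K P/E plus native QLC at 600: ~5 PB.
hybrid_pb = (static_pslc_gb * pslc_pe + native_qlc_gb * qlc_pe) / 1e6

print(f"{all_qlc_pb:.2f} PB vs {hybrid_pb:.2f} PB")
```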

Static pSLC in most consumer drives is limited, e.g. ~12GB in a 1TB model. However, you can calculate it using a typical B17A value of 1500/30000, with pSLC taking up 3 times the space, to see that the improvement isn't huge in direct terms (and again, it must be balanced against other factors like OP). Be mindful, though, that SLC writes in that case do not count towards general NAND writes as they do on dynamic-only drives, so the drive can have a <1.0 WAF. However, most drives are moving towards a hybrid (static + dynamic) structure as with Samsung's TurboWrite; in that case, it writes to static first, then dynamic, and empties as such (FIFO), but there are complex algorithms for others like the P5/P5 Plus. Nevertheless, if you read up on TW you will see they do static first to improve endurance. (More complicated algorithms will send workloads to the different zones - i.e. static pSLC versus native - based on their anticipated WAF, among other things, and I do have patents for that; additionally, controllers have algorithms for this even with dynamic pSLC - I have some from Phison - and moreover they can shift the size of zones, as per the SanDisk patent above and, more recently, Micron patents for the P5/P5 Plus proprietary controllers.)
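To put rough numbers on "the improvement isn't huge in direct terms": a sketch for a hypothetical 1TB drive with ~12GB of static pSLC, using the B17A-style 1500/30000 TLC/SLC-mode ratings mentioned above (the drive layout here is my illustration, not a specific product):

```python
tlc_pe, slc_pe = 1_500, 30_000
drive_gb = 1_000
static_slc_gb = 12                   # hypothetical static pSLC zone
tlc_gb_consumed = static_slc_gb * 3  # pSLC occupies 3x the space in TLC cells

# Baseline: all flash used as TLC.
baseline_tb = drive_gb * tlc_pe / 1e3                        # 1500 TB
# Hybrid: the static pSLC zone absorbs its own writes at 30K P/E.
hybrid_tb = (static_slc_gb * slc_pe
             + (drive_gb - tlc_gb_consumed) * tlc_pe) / 1e3  # 1806 TB

print(baseline_tb, hybrid_tb)  # roughly a 20% gain on paper
```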


----------



## WhoDecidedThat (Feb 3, 2022)

Maxx said:


> With all in QLC mode at 600, this is ~1.2PB of writes. Calculating it with static pSLC comes out to 5PB.





Maxx said:


> most drives are moving towards a hybrid (static + dynamic) structure as with Samsung's TurboWrite; in that case, it writes to static first, then dynamic, and empties as such (FIFO)



Sorry if I sound like a broken record, but the FuzeDrive has 5 PB endurance because it is not flushing the SLC writes to TLC. Samsung's TurboWrite would eventually have to empty its static SLC cache. Won't this negate any endurance benefit?


----------



## GabrielLP14 (Feb 3, 2022)

Eventually even data on the static pSLC gets moved to the TLC blocks; as @Maxx said, some data might be retained longer in order to improve performance on read requests.


----------



## bug (Feb 3, 2022)

GabrielLP14 said:


> Eventually even data on the static pSLC gets moved to the TLC blocks; as @Maxx said, some data might be retained longer in order to improve performance on read requests.


Reads aren't a bottleneck for SSDs, writes are.


----------



## WhoDecidedThat (Feb 3, 2022)

bug said:


> Reads aren't a bottleneck for SSDs, writes are.


Agreed. It is beneficial to have greater read speed (Intel P5800X Optane SSD) but it is the slow direct-to-TLC write speed that bothers SSD makers and makes them want SLC caching solutions in their products (which ends up harming endurance in the case of dynamic SLC caching).


----------



## GabrielLP14 (Feb 3, 2022)

bug said:


> Reads aren't a bottleneck for SSDs, writes are.


But that's not what I'm saying.


----------



## Maxx (Feb 3, 2022)

blanarahul said:


> Sorry if I sound like a broken record, but the FuzeDrive has 5 PB endurance because it is not flushing the SLC writes to TLC. Samsung's TurboWrite would eventually have to empty its static SLC cache. Won't this negate any endurance benefit?


The QLC portion of the FuzeDrive does have dynamic pSLC, but the SLC portion is effectively static pSLC (there are also Chia drives made from QLC as static pSLC, e.g. 8TB -> 2TB). I've disassembled Enmotus's driver (which is _required_ for proper use) and it uses a table structure to determine where data goes (via blocks), but that doesn't mean data STAYS in one area or the other. As per AnandTech: "The host system sees device with one pool of storage, but the first 24GB or 128GB of logical block addresses are mapped to the SLC part of the drive and the rest is the QLC portion. The Enmotus FuzeDrive software abstracts over this to move data in and out of the SLC portion." To be fair, there are more considerations with a drive like this. (I believe the drive is more generally rated for 3.6PB of writes.)

Their endurance calculation in the document is simply the rated cycles of the SLC section times its capacity, plus the rated cycles of the QLC section times its capacity, which is an accurate representation of static pSLC as discussed above because, as I mentioned (and supported with a patent link), flash endurance is the worst of both zones, such that a controller may balance writes to ensure maximum endurance - e.g. 5PB in this case. Indirectly you also have workload placement - which I mentioned, as an example, with random writes going to SLC and sequential to native, so that the higher-WAF traffic hits SLC - and other criteria (I've asked them about this, and they actually have more than a few considerations) here, which is important because utilizing SLC is not just about performance.

As for Samsung: again, static SLC never converts to native, so it doesn't have the additive wear. It's also a way to defer writes, which can reduce write amplification, as with dynamic pSLC, although being in dedicated OP space it has a bit more flexibility (being mindful that it should be using the "best" cells/blocks). Good examples of its utilization are static-only drives, like the SN550 and SN750, and hybrid-caching drives with QLC, like Intel's 660p/665p/670p series.



bug said:


> Reads aren't a bottleneck for SSDs, writes are.


He's saying that data often languishes in SLC cache on consumer drives because consumer use is read-heavy, but yes, it's also done to defer writes, which can reduce total wear. Patents showing this decision (via algorithm) indicate that user/boot data usually remains in SLC mode, for example to improve OS boot times, even though the difference is not huge. Other data is always or almost always stored in SLC - for example metadata, including what is mirrored in DRAM - specifically to improve performance, although there tends to be the most benefit with writes. Read disturb and data decay (stale data) are growing issues, even if not significant for consumer use, and pSLC is less impacted.



blanarahul said:


> Agreed. It is beneficial to have greater read speed (Intel P5800X Optane SSD) but it is the slow direct-to-TLC write speed that bothers SSD makers and makes them want SLC caching solutions in their products (which ends up harming endurance in the case of dynamic SLC caching).


SLC mode has many benefits, including some listed above in this larger reply. For example, SLC is much less prone to data-in-flight errors from power loss. It also ensures protection when data is moved to native flash, and improves performance because you can bypass ECC with copyback (folding). Write amplification is reduced since copyback is sequential, which is one reason some patents divide workloads by type, e.g. random writes to SLC; this takes ECC into consideration based on data type as well (for efficiency purposes if nothing else). Which is to say there are many reasons SSD makers use an SLC mode - it's not just write performance - and pSLC is faster in tR as well (and is therefore often used for metadata). There are reasons to prefer direct-to-TLC, such as heat generation, but it actually seems a lot of manufacturers artificially limit native flash speeds these days, for example with the new SN550 (or TLC SN530), Samsung's 870 QVO, the P5 Plus, etc. Which certainly does indicate they are keen on being SLC-reliant. The endurance ramifications are more challenging to fully describe, although I don't think flash wear-out is a serious concern in any case here.


----------

