# Sequential R/W vs Random R/W



## Houdini (Jun 22, 2020)

Hi ! 

I'm trying to understand the behavior behind these two terms: "Random" and "Sequential".

From the examples I've seen, it's as follows:

1) *Sequential read*: A file isn't written piece by piece, scattered all over the disk clusters, right? I mean, the sectors taken by a 1 GB file come one after another. So when software loads a big file, that's the best example of a sequential read.
2) *Sequential write*: In short, the best example would be when software writes one big file.

3) *Random read*: When software loads, it needs to load hundreds of DLLs and other components, which are scattered at different places around the disk.

4) *Random write*: When software scatters files all over the disk? Or what is the best example of a random write? That's the hardest part for me.

And after all that, are the first three statements above correct at all?

Thanks !


----------



## sam_86314 (Jun 22, 2020)

Sequential read/write is relevant when dealing with large files. Think moving game installs or OS images around.

Random read/write is relevant when dealing with lots of small files scattered around the disk. An example would be any program that needs to access system libraries.

SSDs tend to have excellent performance in both scenarios, and HDDs are usually decent when it comes to sequential and pretty bad for random.
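To make the two patterns concrete, here's a rough sketch (my own illustration, not a real benchmark tool) that reads the same blocks of a file first in order and then in shuffled order. On such a small file the OS page cache will mask most of the drive's behaviour, so the timings are only indicative; the access patterns are the point.

```python
import os
import random
import tempfile
import time

BLOCK = 4096          # 4 KiB per read, a common benchmark block size
BLOCKS = 2048         # 8 MiB test file

def make_test_file() -> str:
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(BLOCK * BLOCKS))
    return path

def read_pattern(path: str, offsets: list[int]) -> float:
    """Read one BLOCK at each offset, in the order given, and time it."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)           # a random pattern forces a jump each time
            f.read(BLOCK)
    return time.perf_counter() - start

path = make_test_file()
sequential = [i * BLOCK for i in range(BLOCKS)]
shuffled = sequential.copy()
random.shuffle(shuffled)          # same blocks, scattered order

t_seq = read_pattern(path, sequential)
t_rnd = read_pattern(path, shuffled)
print(f"sequential: {t_seq:.4f}s, random: {t_rnd:.4f}s")
os.remove(path)
```

On a HDD, run against a file large enough to defeat the cache, the shuffled pass would be dramatically slower; on an SSD the gap is much smaller.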


----------



## Houdini (Jun 22, 2020)

sam_86314 said:


> Sequential read/write is relevant when dealing with large files. Think moving game installs or OS images around.
> 
> Random read/write is relevant when dealing with lots of small files scattered around the disk. An example would be any program that needs to access system libraries.
> 
> SSDs tend to have excellent performance in both scenarios, and HDDs are usually decent when it comes to sequential and pretty bad for random.



That's what I thought.
So that means all four statements in the first post are correct.


----------



## sam_86314 (Jun 22, 2020)

Houdini said:


> That's what I thought.
> So that means all four statements in the first post are correct.


Yes.

Random write would probably affect things like changing settings in your OS. Some games and programs may also save data in folders containing small files.

If you've seen how quickly a 4GB ISO (or whatever) file copies from one drive to another compared to a 4GB folder full of thousands of files, then you've seen the difference between sequential and random performance.
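That ISO-vs-folder difference can be sketched in a few lines. This is my own scaled-down illustration (4 MiB instead of 4 GB, and again the OS cache will flatten the numbers on fast drives): copy the same amount of data once as one big file and once as many small files. Per-file overhead (open/close, metadata updates) is a large part of why the folder copy is slower on real drives.

```python
import os
import shutil
import tempfile
import time

TOTAL = 4 * 1024 * 1024   # 4 MiB here, standing in for the 4 GB example
SMALL = 4096              # 4 KiB per small file

src_big, src_small, dst = (tempfile.mkdtemp() for _ in range(3))

# One big file...
with open(os.path.join(src_big, "big.bin"), "wb") as f:
    f.write(os.urandom(TOTAL))

# ...versus the same number of bytes split into many small files.
for i in range(TOTAL // SMALL):
    with open(os.path.join(src_small, f"{i}.bin"), "wb") as f:
        f.write(os.urandom(SMALL))

start = time.perf_counter()
shutil.copy(os.path.join(src_big, "big.bin"), dst)
t_big = time.perf_counter() - start

start = time.perf_counter()
for name in os.listdir(src_small):
    shutil.copy(os.path.join(src_small, name), dst)
t_small = time.perf_counter() - start

n_copied = len(os.listdir(dst))
print(f"one big file: {t_big:.4f}s, {n_copied - 1} small files: {t_small:.4f}s")

for d in (src_big, src_small, dst):   # tidy up
    shutil.rmtree(d)
```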


----------



## WarthogARJ (Jul 23, 2020)

Hi,
In order to define Sequential and Random, it's better not to use a specific example like loading software, especially since you've also added in Read and Write.

Instead, think of them in terms of how they relate to the transfer of data from one "place" to the storage disk, or back.
I say a "place" because let's not worry for now about where it's coming from, just that it's coming from outside the storage drive itself.

And there's a difference in how they behave, depending on the TYPE of storage disk.
As in if it's a HDD (spinning disk) or a SSD (solid state drive).

Fundamentally, RANDOM means writing, or reading from or to, RANDOM places on your storage disk: as in all over the place, in no order.
In a HDD, the head needs to physically move to that space, to write or read.
Since it only moves in and out, it moves to the right radius on the disc, and picks up the data as it comes by on the spin.

If it's RANDOM, then it will need to move again, before the next piece of the file can be read, or written.
If it's SEQUENTIAL, then the data will be READ or WRITTEN without needing to move the HDD head.
So you can see that Random operations on a HDD are intrinsically slower than Sequential, all else being equal.

However, when you apply this to an SSD, with no moving parts, it's different.
Partly because there is no mechanical movement needed, but also because the actions of READING and WRITING on a SSD are different from each other, and from what happens on a HDD.

This is very simplified, but the idea is that to WRITE data on an SSD, you first need to ERASE a block of NAND flash.
Or at least have an already-erased block available.
A block is a fairly large chunk of the total memory (a given SSD has many thousands of them).
Only then can you write data to it (it's usually called "Program" rather than "Write").
And regardless of whether it's a big piece of data (say 1 MB) or a small piece (4 kB), you ALWAYS have to erase an entire block before you can write to it.
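A toy calculation makes the erase-block point clear. This is my own simplification (real controllers avoid the worst case with spare blocks and caching, and page/block sizes vary by NAND part): if a block of 128 x 4 KiB pages must be erased as a whole before re-programming, then updating a single 4 KiB page in-place costs as much programming as updating a whole block.

```python
PAGE = 4 * 1024               # 4 KiB page (assumed for illustration)
PAGES_PER_BLOCK = 128         # 512 KiB erase block (varies by NAND part)
BLOCK = PAGE * PAGES_PER_BLOCK

def rewrite_cost(bytes_changed: int) -> int:
    """Bytes physically programmed to update `bytes_changed` in-place
    when no pre-erased block is available (worst case)."""
    blocks_touched = -(-bytes_changed // BLOCK)   # ceiling division
    return blocks_touched * BLOCK

small = rewrite_cost(4 * 1024)        # update one 4 KiB page
big = rewrite_cost(1024 * 1024)       # update 1 MiB sequentially

print(small // 1024, "KiB programmed for a 4 KiB update")   # 512 KiB
print(big // 1024, "KiB programmed for a 1 MiB update")     # 1024 KiB
```

The 4 KiB update is amplified 128x, while the 1 MiB sequential update is amplified not at all, which is the root of the sequential/random gap on SSDs.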

To READ, you don't worry about that.

If the data is being read or written sequentially, it's more complex to explain.
It's NOT just a faster version of what happens on a HDD.

Sequential writes go to a single BLOCK of NAND at a time, whereas random writes get spread across several blocks.
Reads are the same story.
So the controller will see a sequential write command and direct it all to the same block.
A bunch of random writes will just as likely be sent to whatever space is available.
And random reads, obviously, have to do the same in reverse.

Although there is no mechanical head moving, there is still some delay in the various operations, called LATENCY.
When you add that up, it tends to make Sequential FASTER than Random: all else being equal.

However, a lot depends on both how FULL the SSD is, and how big the demand on it is, to write and read.
The SSD NAND flash interacts with the controller and other caches (like DRAM, sometimes system RAM, and sometimes the NAND itself) to determine how fast everything happens.

Something called "garbage collection" is involved, in terms of freeing up space that's faster to write to.
And the relative as well as absolute speed of Reads and Writes, be they Random or Sequential, depends on what has been done to the SSD before.
So it's quite a delicate balance of:
- Demand on the controller, in terms of VOLUME Reads and Writes, and the TYPE of Reads and Writes (size of data packet, queue depth, sequential or random)
- Free space left
- Type of Space left (if it's SLC, MLC, TLC, QLC or pseudoSLC)
- Amount of over provisioning (a special term; you need to look that up)

If there's low demand, and lots of free space, it all goes smoothly, like a juggler with only a few balls.
If the juggler is given a LOT of balls, and has to ride a unicycle while trying to sing a song, it's much more complex.

This is from Seagate:
"... the need for garbage collection affects an SSD’s performance, because any write operation to a “full” disk (one whose initial free space or capacity has been filled at least once) needs to await the availability of new free space created through the garbage collection process. Because garbage collection occurs at the block level, there is also a significant performance difference, depending on whether sequential or random data is involved. Sequential files fill entire blocks, which dramatically simplifies garbage collection. The situation is very different for random data.

As random data is written, often by multiple applications, the pages are written sequentially throughout the blocks of the flash memory.
The problem is: This new data is replacing old data distributed randomly in other blocks. This causes a potentially large number of small “holes” of invalid pages to become scattered among the pages still containing valid data. During garbage collection of these blocks, all valid data must be moved (i.e. read and re-written) to a different block.
By contrast, when sequential files are replaced, entire blocks are often invalid, so no data needs to be moved. Sometimes a portion of a sequential file might share a block with another file, but on average only about half of such blocks will need to be moved, making it much faster than garbage collection for randomly-written blocks. ..."
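The Seagate passage above can be put in numbers with a toy model (my own simplification; the block count and valid-page counts are made up for illustration): reclaiming a block means copying its still-valid pages elsewhere before erasing, so wholly-invalid blocks from sequential overwrites cost nothing to reclaim, while randomly-overwritten blocks force a lot of copying.

```python
import random

PAGES_PER_BLOCK = 128

def gc_copy_cost(valid_pages_per_block: list[int]) -> int:
    """Total pages that must be read and re-written to reclaim all blocks."""
    return sum(valid_pages_per_block)

# Sequential overwrite: the old blocks become entirely invalid.
sequential_blocks = [0] * 8

# Random overwrite: invalid pages are scattered, so each block still holds
# roughly half its pages as valid data that must be moved first.
rng = random.Random(0)
random_blocks = [rng.randint(48, 80) for _ in range(8)]

print("pages moved (sequential):", gc_copy_cost(sequential_blocks))  # 0
print("pages moved (random):    ", gc_copy_cost(random_blocks))
```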

A good reference is here:
Seagate

But bear in mind that some of what Seagate says is specific to their own controllers, which can work better with what's called compressible data.
However, the general ideas are applicable to all SSDs.

You should Google about this, and read a number of links: there's a LOT written about this:
Understanding Flash Memory
Flash DBA
Flash Memory

This is EXCELLENT for the graphics, and very well explained definitions:
SNIA Standard
And the SNIA site is good, but takes some patience in reading and THINKING:
Solid State Storage (SSS) Performance Test Specification (PTS) | SNIA (www.snia.org)

Micron has made these three modules which are worth going through:
Micron Training Modules

And this:
Micron White Paper

And going to a more detailed level, here's a link to a NAND flash data sheet:
Micron NAND Datasheet

If you just skim through it, and see the various details about NAND, it's useful.
Page 15 to 19 are good.
They show the schematic of how a given NAND package works.
An SSD will have several packages of NAND, each made up of dies of NAND.

The specifics are not important, but the CONCEPT that you have a fairly complex ARRAY of flash, that's arranged in a specific ORDER (or architecture), is a VERY useful idea.
You can see that just for this quite old type of flash, there are LOTS of ways to combine it.
And this is only the older planar (2D) variant.
The more modern 3D types are even more complex.

The other useful page is 118 (Table 33), where they summarise the LATENCY of the flash, in terms of how long it takes to do a specific operation:
Block Erase Time: 0.3 milliseconds typical
Program Page (i.e. Write): 200 microseconds typical (or 0.2 milliseconds)
Data transfer time (i.e. Read): 45 microseconds

These are characteristics of that type of NAND flash, and they change by quite a bit from type to type (number of layers; SLC, MLC, TLC or QLC; planar or 3D; etc.).
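Some quick arithmetic with those datasheet numbers shows why the operations differ so much. This is a back-of-envelope sketch only (real drives hide the erase behind pre-erased blocks and caching most of the time): a worst-case write that has to wait for an erase pays erase plus program time, while a read pays only the read time.

```python
# Latencies quoted from the datasheet discussion above (typical values)
ERASE_US = 300      # block erase: 0.3 ms
PROGRAM_US = 200    # page program (write): 0.2 ms
READ_US = 45        # page read / data transfer: 45 us

write_with_erase = ERASE_US + PROGRAM_US     # worst case: 500 us
print(f"write incl. erase: {write_with_erase} us")
print(f"read:              {READ_US} us")
print(f"ratio: {write_with_erase / READ_US:.1f}x slower")  # ~11.1x
```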

The point to take away is that this is why Read vs Write, and Random vs Sequential, operations are going to behave differently.

To end this very brief answer to quite a complex question, here is what SNIA says:
"The Storage Networking Industry Association – SNIA – determined a few years back that it should address SSDs since they were about to become an important part of most storage systems.  To this end SNIA created the Solid State Storage Initiative, or SSSI. They didn’t name it after SSDs *since there will clearly come a time when flash stops pretending it’s an HDD and abandons standard HDD mechanical and interface specifications*."

That's very important to understand, especially what I highlighted at the end.

SSDs are COMPLETELY different from HDDs, but a lot of the terminology, concepts and benchmarks from HDDs are still being used on SSDs: not always very successfully.
It's not a complete evil, but you need to be careful, because it can lead you astray.

It's exactly like photography, where modern digital photography, with a sensor that writes to memory, has largely replaced film.
The ideas of ISO speed, shutter speed, and even the need for an SLR (with a prism etc.) don't really translate directly from film to digital photography. And yet, for tradition's sake, a lot of them are still used.
That's sometimes useful for understanding the concepts of photography in general, but it can be misleading too.
That's the point that the SNIA is making.

Regards,
Alan
P.S. I wrote this because it was useful to me to put my thoughts down, in terms of helping ME understand things better.
I'm not an SSD expert per se, but I've spent some time looking into them, the concepts are still fresh to me, and I think I'm aware of the critical things a newcomer (like myself) needs to understand.
If someone thinks there's a drastic mistake, please tell me, and I can revise it.


----------



## WarthogARJ (Jul 25, 2020)

Hi Monsieur Escape Artiste,
Not sure if you have read my response yet; I trust you followed it OK.

I tried to edit my first post, but I think it gets closed to edits automatically after a certain time.

Important point: your question was posted in the STORAGE part of the forum, and I was interpreting it that way. But in truth, it is best to look at it first from the entire SYSTEM point of view, as in how the COMPUTER handles Read and Write commands, and whether they are Random or Sequential. The SPEED and EFFICIENCY of these is indeed largely determined by the storage they operate on, but in fact the computer as a whole affects them:
- OS
- CPU
- Motherboard
- RAM
- GPU (arguably indirectly)

There's a lot of interaction between the storage and the rest of the system. For instance, sometimes the drive uses part of system RAM as a cache ("HMB").

As I said, the distribution of data on a drive is very strongly affected by the TYPE of drive: say, HDD or SSD.

In a HDD, after some use, you find that data files are spread all over the entire volume. And if you want to read one big file, it might be in small pieces, physically spread all over the disk, so it takes quite a bit of time to read. What you can do to improve this is to "defragment" your drive, physically reading and re-writing file data closer together, so you can access it faster.

But an SSD doesn't work like that, and you cannot "defragment" it. However, the "trim" operation is similar in overall effect. And "garbage collection" is not what it sounds like: quite the opposite, in fact.

In addition, "Random" is a poor word to use, because it's not actually random in the sense of a complete absence of order. A better way would be to think of "Sequential" as being Ordered, and "Random" as "not sequential": something between Ordered and truly Random.

Lastly, the throughput you measure from a Read or Write command, whether it is Random or Sequential, in terms of how much data you transmit, is quite complex, and depends on factors like:
- Type of drive
- Computer system (OS, RAM, CPU etc)
- Characteristics of the drive (for a SSD, this is capacity, how full, type of NAND, amount of pSLC, amount of Over Provisioning)
- History: past operations to the drive (what was written to it before)
- Time: with respect to the actual throughput you are asking for, this affects things like thermal behaviour, and if you are in steady-state or not

If you look at the relative speed of Read vs Write, it becomes difficult to predict which is faster in a given case, unless you know all of the above. For instance, Read is not always faster than Write. You can see this by looking at benchmarks that measure Read vs Write, and even at Mixed R/W operations. 

In terms of Random vs Sequential, the general trend is that Sequential is faster, all else being equal. But not in every case. And the relative difference between the two in terms of speed can vary dramatically, even on the same drive.

The amount of data being written (or, if you like, the time it took to write it) has a major influence, as Figure 4 of the Seagate whitepaper showed (attached).
Steady state is achieved for sequential writes in 15 minutes, but it takes hours for random. And as you can see from just a cursory look at long-term write tests, the performance before steady state, and even at steady state, is extremely volatile.




Indeed, even from this one example you can see that for a short period just after 15 minutes, Random performance was much higher than Sequential. Maybe only for a few minutes, but it shows the point, and one might well see it happen for longer under other circumstances.

In general, a good analogy would be to think of these processes as a workman passing bricks to another. The 2nd guy could stack them in an ordered way (Sequential), and when it came time to use them, it would be faster in many ways. But stacking takes a bit longer, so when he gets rushed, he just tosses them around, in a "non-ordered" way.

If he gets a bit of a break from the bricks coming in, he can make things better by collecting them, and imposing a bit more order, to make it easier to use them in later stages (as in READ them).

A fancier system would have him put the incoming bricks in a wheelbarrow, which he could then move to where he wants to stack them in more order. Maybe not sequentially, but better than just flinging them around. But if the first guy is flinging bricks so fast that the wheelbarrow overflows, then he cannot use it efficiently.

So that's where the size of the cache comes in, in terms of more DRAM. It's a bigger wheelbarrow.

And initially he can deal with incoming bricks quite quickly, because he's just had his break, and a strong coffee. But as time wears on, that effect fades, and he can only work at a slower speed. So having a bigger pseudo-SLC cache is like having a stronger coffee.

As far as computers go, Random, Sequential, Read and Write are important in terms of how the system performs as a whole, not just in storage benchmarks.

Ask any questions; I'd be interested in improving this, as much to better my own understanding as for yours.
I'm a believer in what Feynman thought: that you only truly understand something if you can explain it in terms an intelligent teenager would follow.
Not that I'm saying you are a teenager.

In addition, this is not my first rodeo, so to speak. Although I'm somewhat new to SSDs, and I suppose to the latest computer science jargon, I'm not new to computing. My first analytical tool was a slide rule (in high school), and my first computing course used Fortran, on a mainframe, with punched cards.

I'm an engineer, and I like standards and guidelines. And I don't see very rigorous use of them in current computing science. At least not consistently, or completely.

So, rant over. But you do need to be careful about people telling you things that are either wrong or incomplete, mostly because the foundation they have built their understanding on is weak. Hah... OK, rant NOW over.


----------

