This topic has also been discussed in the FreeBSD forums related to ZFS:
https://forums.freebsd.org/threads/best-ashift-for-samsung-mzvlb512hajq-pm981.75333/#post-529225
OTOH, using 4K filesystem sectors on a 512n disk only means that the drive firmware has to split that one request into 8 distinct 512-byte accesses. You needed to access all of those 512-byte sectors anyway, so the performance impact is minuscule, if there is any at all.
So it is always the safer option to go for 4K sectors on the filesystem side, especially if your pools are "here to stay". I've been using ashift=13 at pool creation for several years now, and even on pools that have migrated from spinning rust to SSD and now to NVMe I've never seen any performance issues.
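For reference, here is a minimal sketch of how an explicit ashift can be requested and then verified at pool creation time; the pool name "tank", the device nvd0 and the value 12 are placeholders, not taken from the thread:

# force at least 4 KiB sectors for newly added vdevs (FreeBSD sysctl)
sysctl vfs.zfs.min_auto_ashift=12
# or request it explicitly when creating the pool (use ashift=13 for 8 KiB)
zpool create -o ashift=12 tank /dev/nvd0
# confirm which ashift the vdev actually received
zdb -C tank | grep ashift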
Except that the disk we're talking about here is not spinning rust, but flash cells. It has a real block size, which is likely to be much larger than either 512 bytes or 4K. There are interestingly complex performance implications of adjusting the block size that the file system uses (ashift for ZFS), and bigger is not always better. The biggest problem is that a larger ashift value may waste space. In my personal opinion that is much less relevant than people typically think (since most space in a file system is typically used by a small fraction of large files), but depending on the usage and workload, picking the right value can matter significantly.
Is there a measurable performance improvement using ashift=13 vs ashift=12? Personally, I don't think so.
I am sure NVMe drives have some kind of flash translation layer. Do you think they use pages too?
A deeper question: why do we need software TRIM when the drives handle garbage collection at the controller level?
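For what it's worth, TRIM tells the controller which LBAs no longer hold live data, so garbage collection has fewer pages to copy around. With OpenZFS it can be issued on demand or continuously; the pool name below is a placeholder:

# one-off TRIM of currently free space
zpool trim tank
# or let ZFS trim blocks as they are freed
zpool set autotrim=on tank
# check TRIM progress and status per device
zpool status -t tank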
Then there is also someone who has done a benchmark:
4096 # sectorsize
4 kbytes: 17.6 usec/IO = 222.2 Mbytes/s
8 kbytes: 19.3 usec/IO = 405.6 Mbytes/s
16 kbytes: 23.6 usec/IO = 661.4 Mbytes/s
32 kbytes: 29.7 usec/IO = 1050.8 Mbytes/s
64 kbytes: 42.4 usec/IO = 1473.2 Mbytes/s
128 kbytes: 68.6 usec/IO = 1823.1 Mbytes/s
256 kbytes: 118.8 usec/IO = 2104.9 Mbytes/s
512 kbytes: 235.2 usec/IO = 2125.7 Mbytes/s
1024 kbytes: 468.8 usec/IO = 2133.2 Mbytes/s
2048 kbytes: 938.9 usec/IO = 2130.2 Mbytes/s
4096 kbytes: 1870.4 usec/IO = 2138.6 Mbytes/s
8192 kbytes: 3758.2 usec/IO = 2128.7 Mbytes/s
512 # sectorsize
4 kbytes: 17.5 usec/IO = 223.3 Mbytes/s
8 kbytes: 19.1 usec/IO = 409.5 Mbytes/s
16 kbytes: 23.8 usec/IO = 657.9 Mbytes/s
32 kbytes: 29.7 usec/IO = 1052.2 Mbytes/s
64 kbytes: 42.5 usec/IO = 1471.1 Mbytes/s
128 kbytes: 68.1 usec/IO = 1834.2 Mbytes/s
256 kbytes: 130.2 usec/IO = 1920.9 Mbytes/s
512 kbytes: 252.6 usec/IO = 1979.4 Mbytes/s
1024 kbytes: 497.4 usec/IO = 2010.6 Mbytes/s
2048 kbytes: 958.2 usec/IO = 2087.2 Mbytes/s
4096 kbytes: 1861.7 usec/IO = 2148.6 Mbytes/s
8192 kbytes: 3718.1 usec/IO = 2151.6 Mbytes/s
No increase in speed at any transfer size between the 512-byte and 4096-byte sector formats.
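For context, the table above matches the output format of FreeBSD's diskinfo(8) write tests; a run along these lines (destructive to data on the device, nvd0 being a placeholder name) produces that kind of listing:

# print sector size, media size and other device parameters
diskinfo -v nvd0
# destructive per-block-size random-write latency test (-w allows writes)
diskinfo -wS nvd0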
He achieves 1834.2 Mbytes/s at 128 kbytes, but he achieves this for random writes, even though according to Samsung that figure is roughly the drive's maximum sequential write speed. This is striking, because at 128 kbytes there is normally a large gap between sequential and random writes:
"Sequential I/Os always outperform random I/Os. Accessing data randomly is slower and less efficient than accessing it sequentially. True for HDDs and SSDs." (condusiv.com)
My guess is that Windows 11 with ZFS will only get around 600 Mbytes/s with this drive for 128-kbyte random writes, and most likely even less.
Linux will get around 1000 Mbytes/s for 128-kbyte random writes on ZFS, or less.
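Those guesses could be checked directly; a minimal sketch with fio, assuming it is installed and that /tank/fiotest is a scratch dataset on the pool (the path, size and runtime are placeholders):

# 128 KiB random writes for 60 seconds, with a final fsync so cached writes are counted
fio --name=randwrite128k --directory=/tank/fiotest --rw=randwrite \
    --bs=128k --size=4g --runtime=60 --time_based --end_fsync=1
# the same workload sequentially, for comparison with the random numbers
fio --name=seqwrite128k --directory=/tank/fiotest --rw=write \
    --bs=128k --size=4g --runtime=60 --time_based --end_fsync=1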
ZFS is designed mainly for features, and it is arguably the most robust form of data storage that exists. It is not designed for speed.
These are remarkable results.
We should not forget that NTFS lags decades behind ZFS in terms of technological sophistication.
We can clearly see that the operating system is by far the single biggest determinant of read/write and IOPS performance.