I guess it is a mystery, then.
I feel like random reads/writes should be the main focus of engineers, both hardware and software. SSD progress has basically stalled in that regard. What is the point of introducing more bandwidth (PCIe Gen4) at almost double the price when those drives have the same garbage random access performance as slower, cheaper drives?
It should be, but it doesn't look good on marketing slides, so it rarely is. Also, increasing random performance is a lot harder than increasing sequential performance, due both to the aforementioned OS bottlenecks and to how flash memory physically works. That's also why good SSD testing relies on real-world applications rather than canned benchmarks: canned benchmarks have little to do with real-world use, and drives are often tuned to perform well in them (their access patterns and performance requirements are well known and relatively simple).
By the way, what does the 4K-64Thrd benchmark represent? Accessing 64 different random 4K blocks at the same time? So in theory software (including Windows) could be programmed to access data in that way? Is that what is needed to get the max out of SSDs?
Increasing thread counts and increasing queue depths are both ways of giving an SSD more work to do simultaneously or in rapid succession, which increases performance in random workloads. After all, if a single thread asks for one 4K read at a time - especially with processing or other work in between, but even just waiting for each read to finish and the data to arrive before asking for the next - the SSD sits idle the vast majority of the time. The problem is that you can't just change how software is programmed: while programming is full of hacks and poorly optimized code, at the end of the day an application only needs the data it needs, and only when it needs it. Making your app capable of accessing data with tons of threads doesn't matter if none of those threads actually have anything to fetch.
Of course, this quickly gets complicated: games, for example, have historically been developed with the expectation of HDD storage, as assuming an SSD (and, say, its near-instant seek times) can leave anyone running off an HDD with severe performance issues - stuttering, data failing to load, major pop-in, etc. So a lot of software is developed for a lowest common denominator of performance. This also means that how data is packaged together in files (things likely to be used together are stored together to avoid random reads on HDDs, a lot is done to avoid fragmentation, etc.) is sub-optimal for SSD use. If the data structures, software behavior, and access patterns of these applications were instead tuned with the expectation of an SSD, that could bring significant performance improvements. But AFAIK we're still not quite at the point where the industry has shifted over - SSD storage as a system requirement has started showing up, but it's not ubiquitous yet.