AMD GPUs are doing surprisingly well in the 1% lows here, much better than their Nvidia counterparts. The results correlate closely with each card's half-precision (FP16) processing power:
GPU | 1% low FPS @ 4K | FP16 TFLOPS |
---|---|---|
7900 XTX | 75 | 122.9 |
7900 XT | 63 | 103.2 |
4090 | 61 | 82.6 |
4080 | 54 | 48.8 |
6900 XT | 48 | 46.1 |
3090 Ti | 46 | 40.0 |
4070 Ti | 45 | 40.1 |
6800 XT | 45 | 41.5 |
3090 | 40 | 35.7 |
6800 | 38 | 32.3 |
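If you want to sanity-check that correlation yourself, here's a quick sketch that computes the Pearson coefficient from the exact numbers in the table above (no new measurements, just the figures already listed):

```cpp
// Pearson correlation between 1% low FPS @ 4K and FP16 TFLOPS,
// using the values from the table above, in the same order.
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<double, double>> data = {
        {75, 122.9}, {63, 103.2}, {61, 82.6}, {54, 48.8}, {48, 46.1},
        {46, 40.0},  {45, 40.1},  {45, 41.5}, {40, 35.7}, {38, 32.3}};

    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    const double n = static_cast<double>(data.size());
    for (const auto& [x, y] : data) {
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
    }
    const double r = (n * sxy - sx * sy) /
                     std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    std::printf("Pearson r = %.3f\n", r);
    return 0;
}
```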
Here's my theory:
Ratchet & Clank is the first PC title to take advantage of GPU Decompression. This feature of DirectStorage 1.1+ allows the decompression of game assets to take place on the GPU. This approach is way faster than traditional decompression on the CPU because it avoids a number of bottlenecks, frees up the CPU for other game-related tasks, and takes advantage of the massive parallel processing capabilities of modern GPUs. It also leverages the much higher bandwidth of the card's VRAM while decompressing and copying game data. Since GPGPU workloads execute faster at reduced (half) floating-point precision, I would presume that the GDeflate compression stream format used by GPU Decompression performs better on cards with a higher FP16 rating.
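For anyone curious what that looks like in practice, here's a rough sketch of a GDeflate read using the public DirectStorage 1.1 API. The file name, sizes and destination buffer are placeholders and error handling is omitted, so treat it as an illustration rather than how the game actually does it:

```cpp
// Sketch of a DirectStorage 1.1 request that reads compressed data from disk
// and decompresses it on the GPU (GDeflate) straight into a GPU buffer.
// Placeholder names: "asset.gdeflate", compressedSize, uncompressedSize, assetBuffer.
#include <dstorage.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void LoadAssetWithGpuDecompression(ID3D12Device* device,
                                   ID3D12Resource* assetBuffer,
                                   UINT32 compressedSize,
                                   UINT32 uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"asset.gdeflate", IID_PPV_ARGS(&file));

    // A queue that feeds file reads to the GPU.
    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    // The request: read compressed bytes from the file, decompress them on the
    // GPU using GDeflate, and write the result into the destination buffer.
    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;
    request.UncompressedSize          = uncompressedSize;
    request.Destination.Buffer.Resource = assetBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit();  // a real engine would also enqueue a fence signal and wait on it
}
```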
Since the whole idea of GPU Decompression (besides reducing load times) is to improve asset streaming -- notably in open world games -- GPUs that can do it faster should also show better 1% and 0.1% low figures, allowing for smoother gameplay.
You're right in saying that DirectStorage increases VRAM usage, but it does so by a negligible amount. It places two additional staging buffers in VRAM, whose size can be configured by the developer. It is assumed that 128-256 MB per buffer is optimal (the snippet at the end of this post shows how that size is set):
View attachment 306681 View attachment 306682
Images taken from here. And here's a good article on the subject.
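And since the staging buffer size came up: a minimal sketch of how a developer would set it through the DirectStorage factory. The 256 MB figure just mirrors the range mentioned above; the library's default is 32 MB:

```cpp
// Raising the DirectStorage staging buffer size via IDStorageFactory.
// 256 MB is only an example value matching the range discussed above.
#include <dstorage.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void ConfigureStagingBuffer()
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    // Typically set early, before creating queues and submitting work
    // (the default is DSTORAGE_STAGING_BUFFER_SIZE_DEFAULT_32MB).
    factory->SetStagingBufferSize(256 * 1024 * 1024);
}
```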