I have just received a brand new ASUS GTX 960 4GB and submitted its BIOS to the TechPowerUp database. However, I found that the 4GB card appears to behave the same way as the GTX 970: bandwidth drops sharply for memory above roughly 3.5 GB.
As a result, I am going to RMA the card, because for about $100 more I can get a much better GTX 970 instead. I expected a new release to run at full speed across all memory segments, but apparently it does not. Here is the result.
Code:
Nai's Cache Size Benchmark
DISCLAIMER:
This Benchmark tries to roughly estimate the L2 cache size in
CUDA by benchmarking memory latencies for differently sized
working sets and different chunks of global memory.
Use it without anything in the DRAM of your GPU or else
the swapping behaviour of the GPU may corrupt the measurement.
If the benchmark produces strange outputs nevertheless,
there is a high proability that this benchmark is not working
as intended. Your GPU is probably just fine. So please stop
making annoying whine posts in any forums, if this benchmark
produces a suspicous output.
Press any key to continue . . .
Device name: GeForce GTX 960
Device memory size: 4096 MiByte
Chunk Size: 128 MiByte
Allocated 30 Chunks
Allocated 3840 MiByte
Benchmarking L2 cache size
L2 cache size of chunk no. 0 (0 MiByte to 128 MiByte): 1024 kiByte
L2 cache size of chunk no. 1 (128 MiByte to 256 MiByte): 1024 kiByte
L2 cache size of chunk no. 2 (256 MiByte to 384 MiByte): 1024 kiByte
L2 cache size of chunk no. 3 (384 MiByte to 512 MiByte): 1024 kiByte
L2 cache size of chunk no. 4 (512 MiByte to 640 MiByte): 1024 kiByte
L2 cache size of chunk no. 5 (640 MiByte to 768 MiByte): 1024 kiByte
L2 cache size of chunk no. 6 (768 MiByte to 896 MiByte): 1024 kiByte
L2 cache size of chunk no. 7 (896 MiByte to 1024 MiByte): 1024 kiByte
L2 cache size of chunk no. 8 (1024 MiByte to 1152 MiByte): 1024 kiByt
L2 cache size of chunk no. 9 (1152 MiByte to 1280 MiByte): 1024 kiByt
L2 cache size of chunk no. 10 (1280 MiByte to 1408 MiByte): 1024 kiBy
L2 cache size of chunk no. 11 (1408 MiByte to 1536 MiByte): 1024 kiBy
L2 cache size of chunk no. 12 (1536 MiByte to 1664 MiByte): 1024 kiBy
L2 cache size of chunk no. 13 (1664 MiByte to 1792 MiByte): 1024 kiBy
L2 cache size of chunk no. 14 (1792 MiByte to 1920 MiByte): 1024 kiBy
L2 cache size of chunk no. 15 (1920 MiByte to 2048 MiByte): 1024 kiBy
L2 cache size of chunk no. 16 (2048 MiByte to 2176 MiByte): 1024 kiBy
L2 cache size of chunk no. 17 (2176 MiByte to 2304 MiByte): 1024 kiBy
L2 cache size of chunk no. 18 (2304 MiByte to 2432 MiByte): 1024 kiBy
L2 cache size of chunk no. 19 (2432 MiByte to 2560 MiByte): 1024 kiBy
L2 cache size of chunk no. 20 (2560 MiByte to 2688 MiByte): 1024 kiBy
L2 cache size of chunk no. 21 (2688 MiByte to 2816 MiByte): 1024 kiBy
L2 cache size of chunk no. 22 (2816 MiByte to 2944 MiByte): 1024 kiBy
L2 cache size of chunk no. 23 (2944 MiByte to 3072 MiByte): 1024 kiBy
L2 cache size of chunk no. 24 (3072 MiByte to 3200 MiByte): 1024 kiBy
L2 cache size of chunk no. 25 (3200 MiByte to 3328 MiByte): 1024 kiBy
L2 cache size of chunk no. 26 (3328 MiByte to 3456 MiByte): 1024 kiBy
L2 cache size of chunk no. 27 (3456 MiByte to 3584 MiByte): 1024 kiBy
Error estimating L2 cache size of chunk no. 28 (3584 MiByte to 3712 M
ably because of swapping!
Latency for the smallest working set: 0.000267 ms
Latency for the largest working set: 0.000268 ms
L2 cache size of chunk no. 29 (3712 MiByte to 3840 MiByte): 1024 kiBy
Benchmarking DRAM
0 MiByte to 128 MiByte: 88.68 GByte/s Read, 86.64 GByte/s Write
128 MiByte to 256 MiByte: 88.68 GByte/s Read, 86.24 GByte/s Write
256 MiByte to 384 MiByte: 88.65 GByte/s Read, 85.85 GByte/s Write
384 MiByte to 512 MiByte: 88.71 GByte/s Read, 86.62 GByte/s Write
512 MiByte to 640 MiByte: 88.68 GByte/s Read, 86.14 GByte/s Write
640 MiByte to 768 MiByte: 88.63 GByte/s Read, 85.81 GByte/s Write
768 MiByte to 896 MiByte: 88.65 GByte/s Read, 86.65 GByte/s Write
896 MiByte to 1024 MiByte: 88.71 GByte/s Read, 86.10 GByte/s Write
1024 MiByte to 1152 MiByte: 88.65 GByte/s Read, 85.85 GByte/s Write
1152 MiByte to 1280 MiByte: 88.71 GByte/s Read, 86.65 GByte/s Write
1280 MiByte to 1408 MiByte: 88.72 GByte/s Read, 86.34 GByte/s Write
1408 MiByte to 1536 MiByte: 88.64 GByte/s Read, 85.87 GByte/s Write
1536 MiByte to 1664 MiByte: 88.71 GByte/s Read, 86.65 GByte/s Write
1664 MiByte to 1792 MiByte: 88.58 GByte/s Read, 86.31 GByte/s Write
1792 MiByte to 1920 MiByte: 88.62 GByte/s Read, 85.94 GByte/s Write
1920 MiByte to 2048 MiByte: 88.71 GByte/s Read, 86.64 GByte/s Write
2048 MiByte to 2176 MiByte: 88.56 GByte/s Read, 86.30 GByte/s Write
2176 MiByte to 2304 MiByte: 88.66 GByte/s Read, 85.94 GByte/s Write
2304 MiByte to 2432 MiByte: 88.70 GByte/s Read, 86.61 GByte/s Write
2432 MiByte to 2560 MiByte: 88.56 GByte/s Read, 86.27 GByte/s Write
2560 MiByte to 2688 MiByte: 88.66 GByte/s Read, 85.94 GByte/s Write
2688 MiByte to 2816 MiByte: 88.71 GByte/s Read, 86.62 GByte/s Write
2816 MiByte to 2944 MiByte: 88.59 GByte/s Read, 86.29 GByte/s Write
2944 MiByte to 3072 MiByte: 88.61 GByte/s Read, 85.98 GByte/s Write
3072 MiByte to 3200 MiByte: 88.70 GByte/s Read, 86.60 GByte/s Write
3200 MiByte to 3328 MiByte: 88.56 GByte/s Read, 86.29 GByte/s Write
3328 MiByte to 3456 MiByte: 88.63 GByte/s Read, 86.06 GByte/s Write
3456 MiByte to 3584 MiByte: 14.00 GByte/s Read, 15.35 GByte/s Write
3584 MiByte to 3712 MiByte: 7.57 GByte/s Read, 8.42 GByte/s Write
3712 MiByte to 3840 MiByte: 9.48 GByte/s Read, 10.50 GByte/s Write
Benchmarking L2 cache
0 MiByte to 128 MiByte: 278.67 GByte/s Read, 284.78 GByte/s Write
128 MiByte to 256 MiByte: 278.70 GByte/s Read, 284.94 GByte/s Write
256 MiByte to 384 MiByte: 278.77 GByte/s Read, 284.93 GByte/s Write
384 MiByte to 512 MiByte: 278.76 GByte/s Read, 285.01 GByte/s Write
512 MiByte to 640 MiByte: 278.85 GByte/s Read, 285.13 GByte/s Write
640 MiByte to 768 MiByte: 278.78 GByte/s Read, 285.19 GByte/s Write
768 MiByte to 896 MiByte: 278.85 GByte/s Read, 284.98 GByte/s Write
896 MiByte to 1024 MiByte: 278.72 GByte/s Read, 284.93 GByte/s Write
1024 MiByte to 1152 MiByte: 278.73 GByte/s Read, 284.89 GByte/s Write
1152 MiByte to 1280 MiByte: 278.83 GByte/s Read, 284.85 GByte/s Write
1280 MiByte to 1408 MiByte: 278.78 GByte/s Read, 284.88 GByte/s Write
1408 MiByte to 1536 MiByte: 278.69 GByte/s Read, 284.79 GByte/s Write
1536 MiByte to 1664 MiByte: 278.76 GByte/s Read, 284.80 GByte/s Write
1664 MiByte to 1792 MiByte: 278.82 GByte/s Read, 284.73 GByte/s Write
1792 MiByte to 1920 MiByte: 278.64 GByte/s Read, 284.82 GByte/s Write
1920 MiByte to 2048 MiByte: 278.82 GByte/s Read, 284.80 GByte/s Write
2048 MiByte to 2176 MiByte: 278.71 GByte/s Read, 285.25 GByte/s Write
2176 MiByte to 2304 MiByte: 278.85 GByte/s Read, 284.73 GByte/s Write
2304 MiByte to 2432 MiByte: 278.58 GByte/s Read, 284.64 GByte/s Write
2432 MiByte to 2560 MiByte: 278.56 GByte/s Read, 285.08 GByte/s Write
2560 MiByte to 2688 MiByte: 278.82 GByte/s Read, 284.89 GByte/s Write
2688 MiByte to 2816 MiByte: 278.77 GByte/s Read, 285.10 GByte/s Write
2816 MiByte to 2944 MiByte: 278.95 GByte/s Read, 284.64 GByte/s Write
2944 MiByte to 3072 MiByte: 278.75 GByte/s Read, 285.56 GByte/s Write
3072 MiByte to 3200 MiByte: 278.65 GByte/s Read, 285.10 GByte/s Write
3200 MiByte to 3328 MiByte: 278.78 GByte/s Read, 284.69 GByte/s Write
3328 MiByte to 3456 MiByte: 278.83 GByte/s Read, 284.72 GByte/s Write
3456 MiByte to 3584 MiByte: 7.31 GByte/s Read, 8.55 GByte/s Write
3584 MiByte to 3712 MiByte: 7.32 GByte/s Read, 8.54 GByte/s Write
3712 MiByte to 3840 MiByte: 24.47 GByte/s Read, 28.34 GByte/s Write
Press any key to continue . . .
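To make the drop easier to spot than by reading the rows by eye, here is a minimal Python sketch that scans the bandwidth lines and lists the chunks whose read speed falls well below the fastest chunk. The function name `slow_chunks` and the 50% threshold are my own assumptions for illustration; they are not part of Nai's benchmark. It also assumes you pass in only one section (for example the "Benchmarking DRAM" rows), since the L2 rows use the same line format.

```python
import re

# Matches benchmark rows such as:
# "3456 MiByte to 3584 MiByte: 14.00 GByte/s Read, 15.35 GByte/s Write"
LINE_RE = re.compile(
    r"(\d+) MiByte to (\d+) MiByte: ([\d.]+) GByte/s Read, ([\d.]+) GByte/s Write"
)

def slow_chunks(log_text, threshold=0.5):
    """Return (start_mib, end_mib, read_gbs) for every chunk whose read
    bandwidth is below `threshold` times the fastest chunk's read bandwidth.
    The threshold value is an arbitrary illustrative choice."""
    rows = [(int(a), int(b), float(r)) for a, b, r, _w in LINE_RE.findall(log_text)]
    if not rows:
        return []
    fastest = max(read for _, _, read in rows)
    return [row for row in rows if row[2] < threshold * fastest]

# A few rows copied from the DRAM section of the output above.
sample = """\
3200 MiByte to 3328 MiByte: 88.56 GByte/s Read, 86.29 GByte/s Write
3328 MiByte to 3456 MiByte: 88.63 GByte/s Read, 86.06 GByte/s Write
3456 MiByte to 3584 MiByte: 14.00 GByte/s Read, 15.35 GByte/s Write
3584 MiByte to 3712 MiByte: 7.57 GByte/s Read, 8.42 GByte/s Write
3712 MiByte to 3840 MiByte: 9.48 GByte/s Read, 10.50 GByte/s Write
"""

for start, end, read in slow_chunks(sample):
    print(f"{start} to {end} MiByte: {read} GByte/s read")
```

On the rows above, this flags the three chunks from 3456 MiByte upward, which matches where the numbers fall off in the DRAM section.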