Component | Specification |
---|---|
System Name | "Icy Resurrection" |
Processor | 13th Gen Intel Core i9-13900KS |
Motherboard | ASUS ROG Maximus Z790 Apex Encore |
Cooling | Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM |
Memory | 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V |
Video Card(s) | NVIDIA RTX A2000 |
Storage | 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD |
Display(s) | 55-inch LG G3 OLED |
Case | Pichau Mancer CV500 White Edition |
Audio Device(s) | Sony MDR-V7 connected through Apple USB-C |
Power Supply | EVGA 1300 G2 1.3kW 80+ Gold |
Mouse | Microsoft Classic IntelliMouse (2017) |
Keyboard | IBM Model M type 1391405 |
Software | Windows 10 Pro 22H2 |
Benchmark Scores | I pulled a Qiqi~ |
> Similar to your results, I've noticed that when gaming on my 5950X, Windows tends to use the first CCD for the game, with little to no activity on the second CCD.
The chiplet architecture that Ryzen and EPYC use is sensitive to locality. If you can fit a workload's full demand onto a single CCD and avoid sending data through the Infinity Fabric, or accessing data that currently resides in the cache of an adjacent CCD (the worst-case scenario), that is what you should do. The trade-off is that only the resources local to that CCD get used. This is essentially why the dual-CCD X3D chips have some trouble on Windows: the OS sees the processor as "one big block of available resources", without any regard for their physical location. Ideally, CCD1 should only be used when the application requests more resources than CCD0 can provide.
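A minimal sketch of the "fill CCD0 first, spill to CCD1 only when needed" idea, plus the affinity bitmask you would use to confine a process to CCD0. The topology is an assumption modelled on a 5950X (2 CCDs, 8 cores each, SMT on), and it assumes SMT siblings get adjacent logical CPU numbers, which is how Windows typically enumerates them. Check your own system's topology before trusting the mask. The function names are hypothetical, for illustration only.

```python
# Assumed topology, modelled on a Ryzen 9 5950X: 2 CCDs x 8 cores x 2 SMT threads.
# Assumes SMT siblings are numbered adjacently (CPU 0,1 = core 0; 2,3 = core 1; ...).
CORES_PER_CCD = 8
THREADS_PER_CORE = 2
NUM_CCDS = 2


def ccd_logical_cpus(ccd: int) -> list[int]:
    """Logical CPU indices belonging to one CCD under the assumed numbering."""
    per_ccd = CORES_PER_CCD * THREADS_PER_CORE
    start = ccd * per_ccd
    return list(range(start, start + per_ccd))


def affinity_mask(cpus: list[int]) -> int:
    """Bitmask with one bit set per logical CPU, the format affinity masks use."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def place_threads(n_threads: int) -> dict[int, int]:
    """Locality-first placement: fill CCD0's physical cores before touching CCD1.

    Returns a mapping of CCD index -> threads placed there. Only physical
    cores are counted; SMT siblings are ignored in this sketch.
    """
    placement = {}
    remaining = n_threads
    for ccd in range(NUM_CCDS):
        take = min(remaining, CORES_PER_CCD)
        placement[ccd] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("more threads than physical cores")
    return placement


print(place_threads(8))                           # an 8-thread load stays on CCD0
print(place_threads(12))                          # a 12-thread load spills onto CCD1
print(hex(affinity_mask(ccd_logical_cpus(0))))    # mask covering CCD0 only
```

Under these assumptions the CCD0 mask is `0xffff`; on Windows a hex mask like that can be passed to `start /affinity FFFF game.exe` from cmd (`game.exe` being a placeholder).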
That's why an 8-thread workload will run better on a 5800X or 5950X than on a 5900X, even though the latter has 12 cores and should technically handle 12 threads just fine. The 5900X is laid out as 6+6, not 8+0 or 8+8, so 8 threads cannot fit on one CCD and have to span both. Zen 3 also did away with the issue Zen 2 had with CCXs by making each CCX the same size as the CCD itself; the 3900X was effectively a 3+3+3+3 setup and the 3950X a 4+4+4+4 one.
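The 6+6 versus 8+0/8+8 argument is simple arithmetic, sketched below: count how many CCDs an N-thread workload spans when each CCD is filled before the next. It assumes one thread per physical core and ignores SMT; `ccds_touched` is a hypothetical helper, not any scheduler's real logic.

```python
# Enabled cores per CCD (Zen 3: one 8-core CCX per CCD).
LAYOUTS = {
    "5800X": [8],
    "5900X": [6, 6],
    "5950X": [8, 8],
}


def ccds_touched(layout: list[int], n_threads: int) -> int:
    """Number of CCDs an n-thread workload spans when each CCD is filled in order."""
    remaining = n_threads
    used = 0
    for cores in layout:
        if remaining <= 0:
            break
        used += 1
        remaining -= cores
    if remaining > 0:
        raise ValueError("workload needs more cores than the part has")
    return used


for name, layout in LAYOUTS.items():
    print(name, ccds_touched(layout, 8))
```

This prints 1 for the 5800X and 5950X and 2 for the 5900X. The same function works at CCX granularity for Zen 2: `ccds_touched([3, 3, 3, 3], 8)` gives 3 CCXs for a 3900X, and `[4, 4, 4, 4]` gives 2 for a 3950X.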
Scroll down a little in this article and there is a huge chart showing inter-core, inter-CCD and inter-socket access latencies on a Turin (EPYC 9005) system. As you can see, there are both memory bandwidth and access latency implications. The same concept also applies to the Ryzen 9s, obviously at a much, much smaller scale.