Monday, October 8th 2018

AMD Introduces Dynamic Local Mode for Threadripper: up to 47% Performance Gain

AMD has published a blog post describing an upcoming feature for its Threadripper processors called "Dynamic Local Mode", which should help a lot with gaming performance on AMD's latest flagship CPUs.
The top Threadripper models (2970WX and 2990WX) use four dies in a multi-chip package, of which only two have a direct path to the memory modules. The other two dies have to rely on Infinity Fabric for all their memory accesses, which comes with a significant latency hit. Many compute-heavy applications can run their workloads in the CPU cache, or need very little memory access; these are not affected. Other applications, especially games, spread their workload over multiple cores, some of which end up with higher memory latency than expected, resulting in suboptimal performance.

The concept of multiple processors having different memory access paths is called NUMA (Non-Uniform Memory Access). While it is technically possible for software to detect the NUMA configuration and pin each thread to the ideal processor core, most applications are not NUMA-aware, and adoption is very slow, probably because few systems use such a design.
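As a rough illustration of what "NUMA-aware" means in practice, here is a minimal sketch in Python that reads which logical CPUs belong to a NUMA node and restricts the current process to them. It assumes Linux, which exposes the topology under /sys/devices/system/node; the helper names (parse_cpulist, node_cpus, pin_to_node) are hypothetical, and this is not how Windows or Ryzen Master handle it.

```python
from __future__ import annotations

import os

# Hypothetical helper names; assumes Linux, which exposes NUMA topology in sysfs.
NODE_DIR = "/sys/devices/system/node"


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-7,16-23" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a node that has no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int) -> set[int]:
    """Return the logical CPUs that belong to one NUMA node."""
    with open(f"{NODE_DIR}/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def pin_to_node(node: int) -> set[int]:
    """Restrict the calling process to the CPUs of one NUMA node.

    Only CPUs the process is already allowed to use are kept, so the
    call never requests a CPU outside the current affinity set.
    """
    cpus = node_cpus(node) & os.sched_getaffinity(0)
    if cpus:
        os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    if hasattr(os, "sched_setaffinity") and os.path.isdir(f"{NODE_DIR}/node0"):
        print("pinned to CPUs:", sorted(pin_to_node(0)))
```

A NUMA-aware application would do this per thread and pick the node that holds its data; the point is only that the information is available, yet most software never asks for it.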
On Threadripper, users can switch between "Local Memory Access" mode and "Distributed Memory Access" mode in Ryzen Master. The latter is the default for Threadripper, since it yields the highest performance in compute applications. Local Mode, on the other hand, is better suited to games, but switching between the modes requires a reboot, which is very inconvenient for users.

AMD's new "Dynamic Local Mode" seeks to remove that requirement by introducing a background process that continually monitors the CPU usage of all running applications. It pushes the busiest ones onto the cores that have direct memory access by adjusting their process affinity mask, which determines the processors an application is allowed to be scheduled on. Applications that use very little CPU time are in turn pushed onto the cores without direct memory access, since their execution speed matters less.
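The policy described above can be sketched as a pure function that turns one sample of per-process CPU usage into affinity masks. This is a minimal sketch under stated assumptions, not AMD's implementation: the function names, the 5% busy threshold, and the CPU numbering are all hypothetical. On Windows, a real monitor would sample usage periodically and apply each mask with the Win32 SetProcessAffinityMask function.

```python
from __future__ import annotations

from typing import Iterable


def affinity_mask(cpus: Iterable[int]) -> int:
    """Build an affinity bitmask: bit i set means logical CPU i is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def plan_affinities(
    usage: dict[int, float],
    local_cpus: Iterable[int],
    remote_cpus: Iterable[int],
    busy_threshold: float = 5.0,  # hypothetical cut-off, in percent CPU
) -> dict[int, int]:
    """Assign each process id an affinity mask based on its CPU usage.

    usage maps a process id to its recent CPU usage in percent (hypothetical input).
    Busy processes get the CPUs on dies with direct memory access ("local");
    light ones get the dies that reach memory over Infinity Fabric ("remote").
    """
    local_mask = affinity_mask(local_cpus)
    remote_mask = affinity_mask(remote_cpus)
    return {
        pid: local_mask if pct >= busy_threshold else remote_mask
        for pid, pct in usage.items()
    }


if __name__ == "__main__":
    # Toy topology (assumption): CPUs 0-3 sit on a memory-attached die, CPUs 4-7 do not.
    plan = plan_affinities({100: 40.0, 200: 0.5}, range(0, 4), range(4, 8))
    for pid, mask in plan.items():
        print(pid, hex(mask))
```

In the toy run, the busy process 100 gets mask 0xf (CPUs 0-3) and the nearly idle process 200 gets 0xf0 (CPUs 4-7). Keeping the decision separate from the code that applies it makes the policy easy to test without touching real processes.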
The update will be available in Ryzen Master starting October 29, and will be enabled by default unless the user manually disables it. AMD also plans to bring the feature to even more users by including Dynamic Local Mode as a default package in the AMD Chipset Drivers.
Source: AMD Blog Post

86 Comments on AMD Introduces Dynamic Local Mode for Threadripper: up to 47% Performance Gain

#26
mouacyk
So in local mode, how low can the latency go in AIDA64? Intel can already go sub-40ns.
#27
eidairaman1
The Exiled Airman
mouacyk wrote: So in local mode, how low can the latency go in AIDA64? Intel can already go sub-40ns.
Since this is an announcement and not a finished product, we can't know yet; just wait and see.
#28
bug
qubit wrote: AMD is still, what, 10% slower on IPC? That's still a win for Intel. Also, notice that I said Intel would win, not by how much. That depends on specific cases, which are outside the scope of my comment.
It's less than 10%.
Interestingly enough, as diligent as HardOCP is about testing Intel's IPC, they seem to have missed Zen completely. Still, I could find:
Amd/comments/8f0pxr
www.sweclockers.com/test/24701-intel-core-i9-7980xe-skylake-x/19#content

and if you don't mind looking at slightly older hardware:
forums.anandtech.com/threads/ryzen-strictly-technical.2500572/#post-38770109
mouacyk wrote: So in local mode, how low can the latency go in AIDA64? Intel can already go sub-40ns.
I don't think this is supposed to lower latency. Just to move threads that need lower latency to the cores that are connected to RAM.
#29
cdawall
where the hell are my stars
I already did this manually for games that needed it (GTA 5). Kind of cool that they added a quick button for it.

I would still like to see some BIOS improvements; the downcore modes for the 2990WX are silly on my ASRock board. I just want to tell it to run in two-module mode and basically boot like a 2950X, but that isn't possible unless I use Ryzen Master, which kills my OC, and then I have to overclock through Windows, which isn't the same for me.
#30
LiveOrDie
Well, if this were built in and didn't need software, I'd buy an AMD CPU.
#31
bug
Live OR Die wrote: Well, if this were built in and didn't need software, I'd buy an AMD CPU.
I very much doubt that is what makes or breaks a decision to go for an AMD CPU.
#32
eidairaman1
The Exiled Airman
Live OR Die wrote: Well, if this were built in and didn't need software, I'd buy an AMD CPU.
That might become a thing in the 3000 series.
#33
efikkan
No reason to celebrate, this is patchwork to deal with the flawed design of 2970WX/2990WX, they should have worked better than this in the first place.
Mysteoa wrote: So AMD can fix what Microsoft can't fix in the scheduler.
Why should Microsoft redesign their kernel to fit a flawed CPU design?
AMD is at fault for outfitting 2970WX/2990WX with two "crippled" dies. When AMD needs to make a program to manipulate the running threads in real time, then something is not right.

While 2950X(16-core) is an okay product, 2970WX/2990WX only scales well for certain workloads, more server workloads rather than workstation. AMD would have to do better for Zen 2 Threadrippers.
#34
eidairaman1
The Exiled Airman
efikkan wrote: No reason to celebrate, this is patchwork to deal with the flawed design of 2970WX/2990WX, they should have worked better than this in the first place.


Why should Microsoft redesign their kernel to fit a flawed CPU design?
AMD is at fault for outfitting 2970WX/2990WX with two "crippled" dies. When AMD needs to make a program to manipulate the running threads in real time, then something is not right.

While 2950X(16-core) is an okay product, 2970WX/2990WX only scales well for certain workloads, more server workloads rather than workstation. AMD would have to do better for Zen 2 Threadrippers.
You sound like you're a paid puppet to say crap like that. Remember one thing: this is a new architecture. There is no flaw in the design; it's just that any time a brand-new processor comes out, there are adjustments that need to be made to the operating system itself. But Microsoft is too darn lazy to do it.

There is no problem in Linux. So you are telling me that open software foundations are able to fix the problems, yet a multi-billion-dollar company cannot?

I'll tell you what is flawed: Intel processors, due to Spectre and Meltdown. BIOS updates and winblows patches were released that crippled their performance even further. Yet Microsoft tried to force those patches onto AMD rigs, but won't fix scheduler flaws?

Go back into your cave and hibernate.
#35
bug
efikkan wrote: No reason to celebrate, this is patchwork to deal with the flawed design of 2970WX/2990WX, they should have worked better than this in the first place.


Why should Microsoft redesign their kernel to fit a flawed CPU design?
AMD is at fault for outfitting 2970WX/2990WX with two "crippled" dies. When AMD needs to make a program to manipulate the running threads in real time, then something is not right.

While 2950X(16-core) is an okay product, 2970WX/2990WX only scales well for certain workloads, more server workloads rather than workstation. AMD would have to do better for Zen 2 Threadrippers.
Eh, it's not crippled any more than a core with HT on top.
It's an asymmetric design in a world that's not used to that. To that end, AMD could have probably done a better job and ensured everything was worked out before launch. That aside, if you understand the limitations and you really need all those cores*, these CPUs can deliver.

*admittedly a sliver of the market as a whole, but HEDT never catered to anything but
#36
cucker tarlson
eidairaman1 wrote: You sound like you're a paid puppet to say crap like that. Remember one thing: this is a new architecture. There is no flaw in the design; it's just that any time a brand-new processor comes out, there are adjustments that need to be made to the operating system itself. But Microsoft is too darn lazy to do it.

There is no problem in Linux. So you are telling me that open software foundations are able to fix the problems, yet a multi-billion-dollar company cannot?

I'll tell you what is flawed: Intel processors, due to Spectre and Meltdown. BIOS updates and winblows patches were released that crippled their performance even further. Yet Microsoft tried to force those patches onto AMD rigs, but won't fix scheduler flaws?

Go back into your cave and hibernate.
Do AMD CPUs get a performance hit from those updates too?
#37
efikkan
eidairaman1 wrote: You sound like you're a paid puppet to say crap like that. Remember one thing: this is a new architecture. There is no flaw in the design; it's just that any time a brand-new processor comes out, there are adjustments that need to be made to the operating system itself. But Microsoft is too darn lazy to do it.
Stop making excuses, this has nothing to do with this being a new architecture, only 2970WX/2990WX have scaling issues this severe. Claiming that Microsoft should make a specialized kernel to work around the design flaws of these two CPU models is ridiculous, even if Microsoft had all the money in the world. No amount of software workarounds will be a complete solution to this fault.
bug wrote: Eh, it's not crippled any more than a core with HT on top.
Nope. Two of the dies have to go through two other dies to access memory, which is a major bottleneck. As seen in a number of benchmarks, the 32-core may even perform worse than the 16-core.
#38
bug
efikkan wrote: Nope. Two of the dies have to go through two other dies to access memory, which is a major bottleneck. As seen in a number of benchmarks, the 32-core may even perform worse than the 16-core.
Yes and in the case of HT, two cores compete not only for the same memory bandwidth, but also for the same prefetch and decode hardware. In both cases, some flows work better, some work worse with these enabled.
#39
eidairaman1
The Exiled Airman
efikkan wrote: Stop making excuses, this has nothing to do with this being a new architecture, only 2970WX/2990WX have scaling issues this severe. Claiming that Microsoft should make a specialized kernel to work around the design flaws of these two CPU models is ridiculous, even if Microsoft had all the money in the world. No amount of software workarounds will be a complete solution to this fault.


Nope. Two of the dies have to go through two other dies to access memory, which is a major bottleneck. As seen in a number of benchmarks, the 32-core may even perform worse than the 16-core.
As I said before, you're a paid puppet, and I ain't making no excuses; you're the one making excuses and talking crap.

Go back to your Intel threads.
#40
qubit
Overclocked quantum bit
efikkan wrote: No reason to celebrate, this is patchwork to deal with the flawed design of 2970WX/2990WX, they should have worked better than this in the first place.


Why should Microsoft redesign their kernel to fit a flawed CPU design?
AMD is at fault for outfitting 2970WX/2990WX with two "crippled" dies. When AMD needs to make a program to manipulate the running threads in real time, then something is not right.

While 2950X(16-core) is an okay product, 2970WX/2990WX only scales well for certain workloads, more server workloads rather than workstation. AMD would have to do better for Zen 2 Threadrippers.
efikkan wrote: Stop making excuses, this has nothing to do with this being a new architecture, only 2970WX/2990WX have scaling issues this severe. Claiming that Microsoft should make a specialized kernel to work around the design flaws of these two CPU models is ridiculous, even if Microsoft had all the money in the world. No amount of software workarounds will be a complete solution to this fault.


Nope. Two of the dies have to go through two other dies to access memory, which is a major bottleneck. As seen in a number of benchmarks, the 32-core may even perform worse than the 16-core.
While it's not my favourite design, I don't think you can claim it's flawed. As I said in a previous post above, a monolithic design like Intel's gives better performance without those latency drawbacks, but AMD's design is much easier to scale and bring to market. I'm sure later versions of the CPU will have a better version of Infinity Fabric, too.
#41
efikkan
bug wrote: Yes and in the case of HT, two cores compete not only for the same memory bandwidth, but also for the same prefetch and decode hardware. In both cases, some flows work better, some work worse with these enabled.
SMT (like HT) does not give you two cores in any way; it can't be compared with having fast and "slow" real cores.
qubit wrote: While it's not my favourite design, I don't think you can claim it's flawed. As I said in a previous post above, a monolithic design like Intel's gives better performance without those latency drawbacks, but AMD's design is much easier to scale and bring to market. I'm sure later versions of the CPU will have a better version of Infinity Fabric, too.
Epyc doesn't suffer the same drawbacks. I'm not criticizing AMD's design of multiple dies, but putting crippled dies on two products.
#42
bug
cucker tarlson wrote: Do AMD CPUs get a performance hit from those updates too?
The only possible hit would be during thread migration. But that would take well under a second, so you wouldn't spot it in benchmarks. A minor hiccup is what I'd expect to see in a worst-case scenario.
#43
eidairaman1
The Exiled Airman
cucker tarlson wrote: Do AMD CPUs get a performance hit from those updates too?
Yes, they do, because the patches were originally designed for Intel. Intel had them intentionally try to cripple those systems, because Microsoft tries to slip them in without you knowing.
#44
cucker tarlson
eidairaman1 wrote: Yes, they do, because the patches were originally designed for Intel. Intel had them intentionally try to cripple those systems, because Microsoft tries to slip them in without you knowing.
Well, that sucks.
Is the hit that big?
#45
qubit
Overclocked quantum bit
efikkan wrote: Epyc doesn't suffer the same drawbacks. I'm not criticizing AMD's design of multiple dies, but putting crippled dies on two products.
Ok great, so when you talk about crippled dies, do you mean disabled dies to make a lower end processor? If so, why is that a bad thing? It just means that they can still sell lower end products through binning.
#46
efikkan
qubit wrote: Ok great, so when you talk about crippled dies, do you mean disabled dies to make a lower end processor? If so, why is that a bad thing? It just means that they can still sell lower end products through binning.
(facepalm)
No, not at all. Where do you get this from? Two dies on 2970WX/2990WX have to go through the Infinity Fabric to access memory, which causes significant latency. Many workloads are latency sensitive, and this only gets worse when using multiple applications at once. AMD could have made Threadripper without these limitations, but perhaps not on this socket.
I thought this was a tech forum…
#47
eidairaman1
The Exiled Airman
efikkan wrote: (facepalm)
No, not at all. Where do you get this from? Two dies on 2970WX/2990WX have to go through the Infinity Fabric to access memory, which causes significant latency. Many workloads are latency sensitive, and this only gets worse when using multiple applications at once. AMD could have made Threadripper without these limitations, but perhaps not on this socket.
I thought this was a tech forum…
Yes, this is a tech forum; however, you're a chronic troll of AMD threads.

As I said before, GTHO; go back to your Intel threads.
#48
Octavean
This is for the new Threadripper 2000 series, though, isn't it? It won't work on the earlier 1000 series Threadripper or on standard Ryzen.
#49
eidairaman1
The Exiled Airman
Octavean wrote: This is for the new Threadripper 2000 series, though, isn't it? It won't work on the earlier 1000 series Threadripper or on standard Ryzen.
Send an email to AMD's Ryzen Master developers; they would know.
#50
R0H1T
bug wrote: Eh, it's not crippled any more than a core with HT on top.
It's an asymmetric design in a world that's not used to that. To that end, AMD could have probably done a better job and ensured everything was worked out before launch. That aside, if you understand the limitations and you really need all those cores*, these CPUs can deliver.

*admittedly a sliver of the market as a whole, but HEDT never catered to anything but
Except it's crippled because there are still EPYC chips to sell. Intel, on the other hand, dumped another Xeon instead of releasing that 28-core 5 GHz 8180 killer.
IIRC, Hardware Unboxed showed that the 2970WX will be a great product and won't slow down as much due to the 4-channel memory limitation.