
AMD Introduces Dynamic Local Mode for Threadripper: up to 47% Performance Gain

"AMD also plans to open the feature up to even more users by including Dynamic Local Mode as a default package in the AMD Chipset Drivers."

Does this mean the technology will be extended to other Ryzen CPUs and not just Threadripper?
 
To help clear up some misunderstandings: NUMA vs. UMA is not normally about the number of dies, chips, or CPUs in a system. It's about who has access to the memory and at what cost. A single die could easily have more than one path to the same bank of memory and still be considered NUMA. I believe Zen has one memory controller per die, so Dynamic Local Mode would only apply to Threadripper and the Rome processors, as they are the ones with multiple memory controllers.
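To make that concrete, here is a minimal sketch of why "who owns the memory" matters more than the chip count. The two-node topology and all latency numbers are invented for illustration, not measured Threadripper figures:

```python
# Toy NUMA model: the access cost depends on *which* memory controller
# owns an address, not on how many physical packages are in the box.
# Latencies below are assumed round numbers, not real measurements.

LOCAL_NS = 80    # assumed latency to memory attached to the core's own die
REMOTE_NS = 130  # assumed latency when the request crosses to the other die

def access_latency(core_node: int, memory_node: int) -> int:
    """Modeled latency for a core on `core_node` touching memory homed on `memory_node`."""
    return LOCAL_NS if core_node == memory_node else REMOTE_NS

# A core on die 0 reading its own die's memory vs. the other die's memory:
assert access_latency(0, 0) == 80   # local access
assert access_latency(0, 1) == 130  # remote access over the fabric
```

Under this model, a scheduler that keeps a thread's work on the die that owns its memory pays the local cost every time, which is the whole point of a "local mode".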

I really love these new tricks that AMD keeps throwing at Intel. It's a smart marketing move: reward your fan base while continuing to attract new customers, especially with the rumor mill starting to pick up on the new chips supposedly coming out next year.

This site does a good job of visually showing the internal workings of the Zen microarchitecture:

https://en.wikichip.org/wiki/amd/microarchitectures/zen
 
Brilliant? I wouldn't go that far... in fact, their marketing is their main problem: overpromising and under-delivering, at least on the GPU side.

The AMD chips certainly have a place in the market, but they have drawbacks due to their architecture and latency in certain heavily multi-threaded applications.
 
Brilliantly rolled out and delivered... no.

Brilliantly planned for the end game... yes.

But that's my viewpoint. AMD has a lot to catch up on, so they will be behind on delivering industry-standard features. The soup is prepped; it's just on a very low flame at the moment.
 
I view this mode as a way to have less loss, and I don't mean that in a bad way. A system has a certain peak performance, and you want to remove anything that might prevent it from reaching that. Would I be right in saying that a monolithic design like Intel's wouldn't have suffered from this in the first place? Not saying that's better, as there are certainly good and bad points to either scheme.
 
> Brilliant? I wouldn't go that far... in fact, their marketing is their main problem: overpromising and under-delivering, at least on the GPU side.
>
> The AMD chips certainly have a place in the market, but they have drawbacks due to their architecture and latency in certain heavily multi-threaded applications.

I have to agree with that. AMD promises a lot, but in reality users get "beta" products that only become good after a couple of weeks or months. At least that's how it looks on the processor side. I feel like the engineers are not working closely with the marketing department, and marketing is always overpromising.
They've introduced every single product in the last two years the same way: a lot of noise -> new product on the market -> high expectations on the end-user side -> disappointment of early adopters -> fixing bugs and improving the product -> a lot of noise about improved performance that should have been delivered on launch day, not after a couple of weeks or months.

Not so long ago we saw patches for the new TR because it was underperforming, so I wonder whether this improvement delivers additional performance above what was planned, or only fixes issues users found after the 2000-series TR launched.
There were also new Nvidia drivers that were supposed to solve some issues on TR and improve performance. That was mostly about the 32-core TR, but maybe the lower models too.
 
> Brilliantly rolled out and delivered... no.
>
> Brilliantly planned for the end game... yes.
>
> But that's my viewpoint. AMD has a lot to catch up on, so they will be behind on delivering industry-standard features. The soup is prepped; it's just on a very low flame at the moment.

> I have to agree with that. AMD promises a lot, but in reality users get "beta" products that only become good after a couple of weeks or months. At least that's how it looks on the processor side. I feel like the engineers are not working closely with the marketing department, and marketing is always overpromising.
> They've introduced every single product in the last two years the same way: a lot of noise -> new product on the market -> high expectations on the end-user side -> disappointment of early adopters -> fixing bugs and improving the product -> a lot of noise about improved performance that should have been delivered on launch day, not after a couple of weeks or months.
>
> Not so long ago we saw patches for the new TR because it was underperforming, so I wonder whether this improvement delivers additional performance above what was planned, or only fixes issues users found after the 2000-series TR launched.
> There were also new Nvidia drivers that were supposed to solve some issues on TR and improve performance. That was mostly about the 32-core TR, but maybe the lower models too.

Are you sure Intel never had the same problem? It's really hard to find anything on Google right now with Spectre and Meltdown clogging the pipes. Also, if AMD is doing things a bit differently than Intel has in the past, then obviously they need to work out software to get the maximum performance out of it (e.g. FX core scheduler patches). Also again, this is new stuff. x86 has never had these core counts or so many "modules" before. Saying AMD has to "catch up" here is a bit silly given they're the first ones doing it. IBM has done extreme core counts in Power, but I don't recall them sharing any of their IP with AMD recently.
 
I'm not sure if CPUs work like this, but wouldn't it be better if these instructions were in the CPU's instruction set instead of another program running in the background of whichever OS you use? It seems like a fairly simple set of instructions to steer memory-heavy work toward the cores with the lowest latency to the memory channels.

Of course I'm no programmer or engineer, so I have no clue how it would be done, but I'm sure the engineers can figure out which cores have the fastest path to which memory channel.

Here's a decent question: TR is quad-channel memory, correct? Correct me if I'm wrong, but doesn't that mean there are actually four separate paths memory can take? Is that for multi-tasking, or is there a performance boost in single-process workloads too (I'm guessing multi-tasking)? So wouldn't it be wise, from an engineering standpoint, to map the fastest route from each core to a particular memory channel to improve performance? And couldn't that be accomplished more effectively on the CPU itself, without running more processes on an already flawed OS (M$ user here, never felt comfortable with Linux for some reason)?

I really should figure out how to make my posts shorter, easier to read, and more to the point.

Edit: All in all, I think what AMD is doing is a good thing. This is the sort of thing they should have introduced with TR in the first place; imagine if AMD had released TR with this function, it would have made much more of an impact at launch. Yes, I prefer team red, always have since the Duron days, so it makes me happy when they innovate. I like to see competition (not that it's all that close lately, but it's getting there), something there really hasn't been for a while in blue vs. red.
 
> I'm not sure if CPUs work like this, but wouldn't it be better if these instructions were in the CPU's instruction set instead of another program running in the background of whichever OS you use? It seems like a fairly simple set of instructions to steer memory-heavy work toward the cores with the lowest latency to the memory channels.

I don't have deep knowledge of every CPU's instruction set, but such instructions are implemented. For the most part, memory placement is controllable: you can call specific addresses or have the system assign address space for you to work with. From my understanding, a lot of applications let the kernel/BIOS handle memory placement, since it best understands where to put things. That's where the Threadripper update helps. The kernel is built by MS or Linux or whomever, sometimes for specific processors but mostly for a family of processors. They aren't given the most optimized ways to use memory or other instruction calls, but they do the best they can. I'd say the majority of the time this works out, except in cases like this where the memory architecture is a bit unusual compared to other designs. So you can think of this update as a way to assist the kernel by telling it where to put memory.
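As a rough illustration of that "assist the kernel" idea: a userspace helper (which is essentially what Dynamic Local Mode's service is) could work out which cores sit next to a given memory node and hand that set to the OS scheduler. The topology below is entirely made up, and the actual affinity syscall is only mentioned in a comment:

```python
# Sketch: picking cores for a memory-hungry process given a NUMA map.
# The node-to-core mapping is a hypothetical 8-core, 2-die layout.

NODE_CORES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def cores_near(memory_node: int) -> set[int]:
    """Return the cores with local (same-die) access to a node's memory channels."""
    return NODE_CORES[memory_node]

mask = cores_near(0)
assert mask == {0, 1, 2, 3}
# On Linux, a helper would then apply the mask with something like:
#   os.sched_setaffinity(pid, mask)
# so the kernel keeps that process on the die nearest its memory.
```

This is exactly the kind of policy the kernel can't guess on its own without topology hints, which is why it ships as a driver package rather than silicon.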

> Here's a decent question: TR is quad-channel memory, correct? Correct me if I'm wrong, but doesn't that mean there are actually four separate paths memory can take? Is that for multi-tasking, or is there a performance boost in single-process workloads too (I'm guessing multi-tasking)? So wouldn't it be wise, from an engineering standpoint, to map the fastest route from each core to a particular memory channel to improve performance? And couldn't that be accomplished more effectively on the CPU itself, without running more processes on an already flawed OS (M$ user here, never felt comfortable with Linux for some reason)?

TR is quad-channel because there are multiple dies that each have a dual-channel controller. The CPU firmware is then configured to present the two dual-channel controllers as a single quad-channel pool. I don't think you would call this true quad-channel memory; I'd confirm that with Woomak. I believe they can get away with this because the Zen architecture uses a master-slave CPU configuration, meaning any additional Zen die (a set of cores called a CCX) or even additional CPUs (multi-processor) are slaves to the primary master Zen die. This gives the master complete control over the slaves, and it can do what it wants with them. It's another reason the memory architecture is flexible: the master can either take complete control over where memory is stored, or defer to the slaves to do what they want.
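On the "four separate paths" question, here is a toy model of channel interleaving, which is how consecutive blocks of physical addresses get spread across channels so several accesses can proceed in parallel (a bandwidth win, mostly for multi-tasking and streaming workloads). The channel count matches quad-channel TR, but the stripe size is an assumed illustrative value, not AMD's actual granularity:

```python
# Toy channel-interleaving map: consecutive address stripes round-robin
# across the four channels, so sequential reads hit all channels in turn.

CHANNELS = 4
GRANULE = 256  # bytes per interleave stripe (illustrative, not AMD's value)

def channel_for(addr: int) -> int:
    """Which channel (0-3) services a given physical address in this model."""
    return (addr // GRANULE) % CHANNELS

assert channel_for(0) == 0      # first stripe -> channel 0
assert channel_for(256) == 1    # next stripe -> channel 1
assert channel_for(1024) == 0   # wraps back around after four stripes
```

The trade-off Dynamic Local Mode navigates is that interleaving across both dies maximizes bandwidth, while keeping a thread's memory on its own die's channels minimizes latency.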
 
It sounds like this is what Principled Technologies were trying to do with the 9900K game benchmark debacle. Is this what they call "Game Mode"?
 
> It sounds like this is what Principled Technologies were trying to do with the 9900K game benchmark debacle. Is this what they call "Game Mode"?

1. Perhaps, but given their apparent incompetence I doubt it. I suspect they saw the "Gamer!" label and blindly checked it off without reading. That or Intel told them to do it. Intel doesn't have a great history of keeping their hands above the table. This feature isn't even released yet, so I doubt PT would've had access to this even if they wanted it.
2. This is not "Game Mode". It achieves a similar function, but without disabling cores.
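A sketch of that difference, with invented core lists and thread loads: a Game Mode-like approach would simply drop the remote die's cores from the pool, while the approach below keeps every core available and just hands the busiest threads the cores closest to memory:

```python
# Rough model of DLM-style placement: all cores stay enabled; the
# hottest threads are steered to the die with local memory first.
# Core lists and load numbers are invented for illustration.

LOCAL_CORES = [0, 1, 2, 3]    # die with directly attached memory
REMOTE_CORES = [4, 5, 6, 7]   # die that reaches memory over the fabric

def assign(threads: dict[str, float]) -> dict[str, int]:
    """Map thread name -> core, giving the busiest threads local cores first."""
    order = sorted(threads, key=threads.get, reverse=True)
    cores = LOCAL_CORES + REMOTE_CORES  # nothing is disabled, only ordered
    return {t: cores[i % len(cores)] for i, t in enumerate(order)}

placement = assign({"game": 0.9, "audio": 0.3, "chat": 0.1})
assert placement["game"] in LOCAL_CORES  # hottest thread lands on the local die
```

Game Mode instead parks one die outright, which also removes its compute; this scheme keeps the extra cores for background work while still giving the hot thread local memory latency.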
 