
AMD Introduces Dynamic Local Mode for Threadripper: up to 47% Performance Gain

"AMD also plans to open the feature up to even more users by including Dynamic Local Mode as a default package in the AMD Chipset Drivers."

Does this mean the technology will be extended to other Ryzen CPUs and not just Threadripper?
 
To help clear up some misunderstandings: NUMA vs. UMA is not normally about the number of dies, chips, or CPUs in a system. It's about who has access to the memory and at what cost. A single die could easily have more than one path to the same bank of memory and still be considered NUMA. I believe Zen has one memory controller per die, so Dynamic Local Mode would only apply to Threadripper and the Rome processors, as they are the ones with multiple memory controllers.
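To make that concrete, here is a minimal sketch of why "who owns the memory" matters more than the chip count. The two-node topology and all latency numbers are invented for illustration, not measured Threadripper figures:

```python
# Toy NUMA model: the access cost depends on *which* memory controller
# owns an address, not on how many physical packages are in the box.
# Latencies below are assumed round numbers, not real measurements.

LOCAL_NS = 80    # assumed latency to memory attached to the core's own die
REMOTE_NS = 130  # assumed latency when the request crosses to the other die

def access_latency(core_node: int, memory_node: int) -> int:
    """Modeled latency for a core on `core_node` touching memory homed on `memory_node`."""
    return LOCAL_NS if core_node == memory_node else REMOTE_NS

# A core on die 0 reading its own die's memory vs. the other die's memory:
assert access_latency(0, 0) == 80   # local access
assert access_latency(0, 1) == 130  # remote access over the fabric
```

Under this model, a scheduler that keeps a thread's work on the die that owns its memory pays the local cost every time, which is the whole point of a "local mode".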

I really love these new tricks that AMD keeps throwing at Intel. It's a smart marketing move: reward your fan base while continuing to attract new customers, especially with the rumor mill starting to pick up on the new chips supposedly coming out next year.

This site does a good job of visually showing the internal workings of the Zen microarchitecture:

https://en.wikichip.org/wiki/amd/microarchitectures/zen
 
Brilliant? I wouldn't go that far... in fact, their marketing is their main problem: overpromising and under-delivering, at least on the GPU side.

The AMD chips certainly have a place in the market, but they have drawbacks due to their architecture and latency in certain heavily multi-threaded applications.
 
Brilliantly rolled out and delivered... no.

Brilliantly planned for the end game... yes.

But that's my viewpoint. AMD has a lot to catch up on, so they will be behind on delivering industry-standard features. The soup is prepped; it's just on a very low flame at the moment.
 
I view this mode as a way to have less loss, and I don't mean that in a bad way. A system has a certain peak performance, and you want to remove anything that might prevent it from reaching that. Would I be right in saying that a monolithic design like Intel's wouldn't have suffered from this in the first place? Not saying that's better, as there are certainly good and bad points to either scheme.
 
> Brilliant? I wouldn't go that far... in fact, their marketing is their main problem: overpromising and under-delivering, at least on the GPU side.
>
> The AMD chips certainly have a place in the market, but they have drawbacks due to their architecture and latency in certain heavily multi-threaded applications.

I have to agree with that. AMD promises a lot, but in reality users get "beta" products that only become good after a couple of weeks or months. At least that's how it looks on the processor side. I feel like the engineers are not working closely with the marketing department, and marketing is always overpromising.
They've introduced every single product in the last two years the same way: a lot of noise -> new product on the market -> high expectations on the end-user side -> disappointment of early adopters -> fixing bugs and improving the product -> a lot of noise about improved performance that should have been delivered on launch day, not after a couple of weeks or months.

Not so long ago we saw patches for the new TR because it was underperforming, so I wonder whether this improvement delivers additional performance above what was planned, or only fixes issues users found after the 2000-series TR launched.
There were also new Nvidia drivers that were supposed to solve some issues on TR and improve performance. That was mostly about the 32-core TR, but maybe the lower models too.
 
> Brilliantly rolled out and delivered... no.
>
> Brilliantly planned for the end game... yes.
>
> But that's my viewpoint. AMD has a lot to catch up on, so they will be behind on delivering industry-standard features. The soup is prepped; it's just on a very low flame at the moment.

> I have to agree with that. AMD promises a lot, but in reality users get "beta" products that only become good after a couple of weeks or months. At least that's how it looks on the processor side. I feel like the engineers are not working closely with the marketing department, and marketing is always overpromising.
> They've introduced every single product in the last two years the same way: a lot of noise -> new product on the market -> high expectations on the end-user side -> disappointment of early adopters -> fixing bugs and improving the product -> a lot of noise about improved performance that should have been delivered on launch day, not after a couple of weeks or months.
>
> Not so long ago we saw patches for the new TR because it was underperforming, so I wonder whether this improvement delivers additional performance above what was planned, or only fixes issues users found after the 2000-series TR launched.
> There were also new Nvidia drivers that were supposed to solve some issues on TR and improve performance. That was mostly about the 32-core TR, but maybe the lower models too.

Are you sure Intel never had the same problem? It's really hard to find anything on Google right now with Spectre and Meltdown clogging the pipes. Also, if AMD is doing things a bit differently than Intel has in the past, then obviously they need to work out software to get the maximum performance out of it (e.g. FX core scheduler patches). Also again, this is new stuff. x86 has never had these core counts or so many "modules" before. Saying AMD has to "catch up" here is a bit silly given they're the first ones doing it. IBM has done extreme core counts in Power, but I don't recall them sharing any of their IP with AMD recently.
 
I'm not sure if CPUs work like this, but wouldn't it be better if these instructions were in the CPU's instruction set instead of another program running in the background of whichever OS you use? It seems like a fairly simple set of instructions to steer memory-heavy work toward the cores with the lowest latency to the memory channels.

Of course I'm no programmer or engineer, so I have no clue how it would be done, but I'm sure the engineers can figure out which cores have the fastest path to which memory channel.

Here's a decent question: TR is quad-channel memory, correct? Correct me if I'm wrong, but doesn't that mean there are actually four separate paths memory can take? Is that for multi-tasking, or is there a performance boost in single-process workloads too (I'm guessing multi-tasking)? So wouldn't it be wise, from an engineering standpoint, to map the fastest route from each core to a particular memory channel to improve performance? And couldn't that be accomplished more effectively on the CPU itself, without running more processes on an already flawed OS (M$ user here, never felt comfortable with Linux for some reason)?

I really should figure out how to make my posts shorter, easier to read, and more to the point.

Edit: All in all, I think what AMD is doing is a good thing. This is the sort of thing they should have introduced with TR in the first place; imagine if AMD had released TR with this function, it would have made much more of an impact at launch. Yes, I prefer team red, always have since the Duron days, so it makes me happy when they innovate. I like to see competition (not that it's all that close lately, but it's getting there), something there really hasn't been for a while in blue vs. red.
 
> I'm not sure if CPUs work like this, but wouldn't it be better if these instructions were in the CPU's instruction set instead of another program running in the background of whichever OS you use? It seems like a fairly simple set of instructions to steer memory-heavy work toward the cores with the lowest latency to the memory channels.

I don't have deep knowledge of every CPU's instruction set, but such instructions are implemented. For the most part, memory placement is controllable: you can call specific addresses or have the system assign address space for you to work with. From my understanding, a lot of applications let the kernel/BIOS handle memory placement, since it best understands where to put things. That's where the Threadripper update helps. The kernel is built by MS or Linux or whomever, sometimes for specific processors but mostly for a family of processors. They aren't given the most optimized ways to use memory or other instruction calls, but they do the best they can. I'd say the majority of the time this works out, except in cases like this where the memory architecture is a bit unusual compared to other designs. So you can think of this update as a way to assist the kernel by telling it where to put memory.
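As a rough illustration of that "assist the kernel" idea: a userspace helper (which is essentially what Dynamic Local Mode's service is) could work out which cores sit next to a given memory node and hand that set to the OS scheduler. The topology below is entirely made up, and the actual affinity syscall is only mentioned in a comment:

```python
# Sketch: picking cores for a memory-hungry process given a NUMA map.
# The node-to-core mapping is a hypothetical 8-core, 2-die layout.

NODE_CORES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def cores_near(memory_node: int) -> set[int]:
    """Return the cores with local (same-die) access to a node's memory channels."""
    return NODE_CORES[memory_node]

mask = cores_near(0)
assert mask == {0, 1, 2, 3}
# On Linux, a helper would then apply the mask with something like:
#   os.sched_setaffinity(pid, mask)
# so the kernel keeps that process on the die nearest its memory.
```

This is exactly the kind of policy the kernel can't guess on its own without topology hints, which is why it ships as a driver package rather than silicon.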

> Here's a decent question: TR is quad-channel memory, correct? Correct me if I'm wrong, but doesn't that mean there are actually four separate paths memory can take? Is that for multi-tasking, or is there a performance boost in single-process workloads too (I'm guessing multi-tasking)? So wouldn't it be wise, from an engineering standpoint, to map the fastest route from each core to a particular memory channel to improve performance? And couldn't that be accomplished more effectively on the CPU itself, without running more processes on an already flawed OS (M$ user here, never felt comfortable with Linux for some reason)?

TR is quad-channel because there are multiple dies that each have a dual-channel controller. The CPU firmware is then configured to present the two dual-channel controllers as a single quad-channel pool. I don't think you would call this true quad-channel memory; I'd confirm that with Woomak. I believe they can get away with this because the Zen architecture uses a master-slave CPU configuration, meaning any additional Zen die (a set of cores called a CCX) or even additional CPUs (multi-processor) are slaves to the primary master Zen die. This gives the master complete control over the slaves, and it can do what it wants with them. It's another reason the memory architecture is flexible: the master can either take complete control over where memory is stored, or defer to the slaves to do what they want.
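On the "four separate paths" question, here is a toy model of channel interleaving, which is how consecutive blocks of physical addresses get spread across channels so several accesses can proceed in parallel (a bandwidth win, mostly for multi-tasking and streaming workloads). The channel count matches quad-channel TR, but the stripe size is an assumed illustrative value, not AMD's actual granularity:

```python
# Toy channel-interleaving map: consecutive address stripes round-robin
# across the four channels, so sequential reads hit all channels in turn.

CHANNELS = 4
GRANULE = 256  # bytes per interleave stripe (illustrative, not AMD's value)

def channel_for(addr: int) -> int:
    """Which channel (0-3) services a given physical address in this model."""
    return (addr // GRANULE) % CHANNELS

assert channel_for(0) == 0      # first stripe -> channel 0
assert channel_for(256) == 1    # next stripe -> channel 1
assert channel_for(1024) == 0   # wraps back around after four stripes
```

The trade-off Dynamic Local Mode navigates is that interleaving across both dies maximizes bandwidth, while keeping a thread's memory on its own die's channels minimizes latency.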
 
It sounds like this is what Principled Technologies were trying to do with the 9900K game benchmark debacle. Is this what they call "Game Mode"?
 
> It sounds like this is what Principled Technologies were trying to do with the 9900K game benchmark debacle. Is this what they call "Game Mode"?

1. Perhaps, but given their apparent incompetence I doubt it. I suspect they saw the "Gamer!" label and blindly checked it off without reading. That or Intel told them to do it. Intel doesn't have a great history of keeping their hands above the table. This feature isn't even released yet, so I doubt PT would've had access to this even if they wanted it.
2. This is not "Game Mode". It achieves a similar function, but without disabling cores.
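A sketch of that difference, with invented core lists and thread loads: a Game Mode-like approach would simply drop the remote die's cores from the pool, while the approach below keeps every core available and just hands the busiest threads the cores closest to memory:

```python
# Rough model of DLM-style placement: all cores stay enabled; the
# hottest threads are steered to the die with local memory first.
# Core lists and load numbers are invented for illustration.

LOCAL_CORES = [0, 1, 2, 3]    # die with directly attached memory
REMOTE_CORES = [4, 5, 6, 7]   # die that reaches memory over the fabric

def assign(threads: dict[str, float]) -> dict[str, int]:
    """Map thread name -> core, giving the busiest threads local cores first."""
    order = sorted(threads, key=threads.get, reverse=True)
    cores = LOCAL_CORES + REMOTE_CORES  # nothing is disabled, only ordered
    return {t: cores[i % len(cores)] for i, t in enumerate(order)}

placement = assign({"game": 0.9, "audio": 0.3, "chat": 0.1})
assert placement["game"] in LOCAL_CORES  # hottest thread lands on the local die
```

Game Mode instead parks one die outright, which also removes its compute; this scheme keeps the extra cores for background work while still giving the hot thread local memory latency.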
 