Can a Ryzen 2600-2700(X) owner run some benches please?

mackerel · Jul 27, 2018

Basically I want to look at the IPC of Zen+ with and without SMT on, but I don't own such a CPU (yet). Any volunteers to help with that?

Let's just go for two benchmarks for now to keep it simple: Cinebench R15, and 3DPM. We all know CB15, but you might wonder what 3DPM is... it is something Ian Cutress of Anandtech wrote himself. If you wonder why I'm interested in that, it is because it shows massive increases comparing SMT on vs. off, higher than on Intel.

Ian linked to 3DPM at the twitter location below.

https://twitter.com/x/status/1014558641385598981

Because I'm interested in IPC, it is of most importance to know what clock the CPU is running at, so I'd suggest setting a fixed clock regardless of number of cores active. It doesn't have to be a high clock, only that it is fixed and stable. Run each bench at least 3 times, and let me know the best overall scores with and without SMT enabled. For CB15, only the multi-thread test is needed, not the single core one as by definition that wont use SMT. We know ram has a weak influence on CB15, but I guess it doesn't directly impact 3DPM other than the cache speed tie, so please also state ram speed/configuration as well as CPU model/clock.

Edit: anyone know what the all core turbo is for Ryzen 2000 series? Is there even one at all? It looks like the new boost feature over 1st gen means it is less predictable as it aims to optimise performance to the conditions available.

Johan45 · Jul 27, 2018

Look near the end of my review it might already have what you want https://www.overclockers.com/amd-ryzen-7-2700x-and-ryzen-5-2600x-cpu-review/
Intel has app 5% IPC and AMD's SMT is up to 30% compared to20% on Intel but that varies by benchmark

mackerel · Jul 27, 2018

The CB15 HT gain of 22% sounds low to me, I'd expect closer to 30% based on my previous observations on quad core Intels. Haven't done same on 6 core, so guess I'll have to do that this weekend.

It's more the 3DPM result I'm interested in, CB15 is just to give another reference point as a sanity check since it is more widely understood.

EarthDog · Jul 27, 2018

I don't have Ryzen, but have his test suite that includes 3DPM and a 8700K (I work at AT,

).

Would the results in bench be helpful?
https://www.anandtech.com/bench/CPU/1603

mackerel · Jul 27, 2018

I forgot that existed, but the "problem" with using that data for my objective is I don't know the exact running clocks (Ryzen 2000 all core turbo???), and also I'm guessing there wont be data with SMT off. I'll have a deeper look later, but running something in known and fixed condition is the best way to get a result.

Again, my goal in this is to compare the IPC with and without HT/SMT, as far as is practical taking out other limitations. For example, CB15 doesn't scale with ram speed much so I think I can ignore it to a large extent. I'm not comparing CPU models, but more the CPU architectures. When Ryzen first came out, Agner Fog in his CPU microarchitecture guides did state that Zen has more execution potential with SMT over Intel. I did see some hints of that in early testing, and this is a follow up now that things are much more mature and stable. My past testing on HT shows 0-50% gain depending on the workload. I was wondering if 50% was a hard limit. Well, 3DPM is the first example I'm aware of on AMD side which smashes that. Ian's data showed almost 80%, I've only seen just under 70% on a quick test with 1st gen Ryzen but this will have to be repeated under more controlled conditions.

Is it crazy this is my idea of fun?

EarthDog · Jul 27, 2018

I wasn't sure, but wanted to throw it out there.

Also, you are testing the efficiency of SMT vs HT. IPC doesn't change when enabling/disabling HT/SMT. Each core is still cranking out the same amount of IPC /thread.

mackerel · Jul 27, 2018

EarthDog said:
I wasn't sure, but wanted to throw it out there.

Also, you are testing the efficiency of SMT vs HT. IPC doesn't change when enabling/disabling HT/SMT. Each core is still cranking out the same amount of IPC /thread.

IPC definitely changes with HT/SMT. If not, there would be no point in it existing. I'm looking at it per core. 1c1t will generally have lower total IPC than 1c2t loads. If you look at it per thread, IPC will be lower when both threads of one core are used due to the shared resources.

EarthDog · Jul 27, 2018

IPC doesn't change. The number of active threads which can be used change which yields more work that can be done. This is a test of HT and SMT efficiency. The amount of instructions per thread and clock does not change, just the number of threads. How the CPU deals with those non-logical cores is a different story.

EDIT: For exmaple if I have a 4 lane highway with each lane allowing 20 cars per second max (IPC), if I add more lanes, the amount of cars per lane does not change on the original, however more cars can go through because of more lanes (not faster lanes that can handle more cars). Its like adding a dirt road compared to cement highway, HT/SMT to real cores... the dirt road isn't as fast (that resource sharing), but its still adding volume/more work done.

EDIT2: The HT/SMT threads have less "IPC(thread)" than the physical cores due to the shared resources and scheduling.

EDIT3: From Wiki:

For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture, in which multiple instructions operate on separate data in parallel. With HTT, one physical core appears as two processors to the operating system, allowing concurrent scheduling of two processes per core. In addition, two or more processes can use the same resources: if resources for one process are not available, then another process can continue if its resources are available.

mackerel · Jul 27, 2018

Is there a formal definition of IPC that I might be deviating from? From my perspective, if a core does more work per clock, it has higher IPC. HT/SMT is an enabler of such.

EarthDog · Jul 27, 2018

Each core/thread is not doing anything different. There are simply MORE threads to use when enabling HT/SMT. And because of the resource sharing, it isn't a 1:1 situation as far as the amount of work that can be done.

By defintion: Calculation of IPC
The number of instructions per second and floating point operations per second for a processor can be derived by multiplying the number of instructions per cycle with the clock rate (cycles per second given in Hertz) of the processor in question. The number of instructions per second is an approximate indicator of the likely performance of the processor.

Does any of that change? The amount of instructions (per core/thread) does not change, nor does the clock rate. The only thing that changes is the number of threads available and virtualization of cores.

Hopefully that clarifies things.

mackerel · Jul 27, 2018

Still clear as mud. To me, it is blindingly clear IPC must change. If you do more work, without changing the clock, then one of two things must happen. Either the instruction used does more work, or you do more instructions. In the case of SMT/HT, it would be the latter. If you used different instructions (such as using AVX-512 over AVX2), it would be the former.

Say we have a 4c processor, with HT/SMT off. You run 4 single thread tasks on it. You get X amount of work done per unit time. You enable HT/SMT on it, without changing anything else. You run 8 single thread tasks on it. You get >100% the performance of before. It is the same CPU, the same clock, the same workload. You are doing more, it must be processing more instructions to do so.

Would it help if I stop using IPC and call it throughput or something?

EarthDog · Jul 27, 2018

It's funny that its 'blindingly obvious' to me that the number of threads is what is increasing the work completed.

Say we have a 4c processor, with HT/SMT off. You run 4 single thread tasks on it. You get X amount of work done per unit time. You enable HT/SMT on it, without changing anything else. You run 8 single thread tasks on it. You get >100% the performance of before. It is the same CPU, the same clock, the same workload. You are doing more, it must be processing more instructions to do so.

I see what you are saying (additive), but IPC isnt really the proper term for this.

Again, the same amount of traffic is able to go through each thread, that doesn't increase. The number of threads able to process the data did increase. To use your analogy, yes more WORK is getting done, but the amount of work each thread can potentially do (IPC) does not change, just the number of threads. So yes, technically more work is getting crunched, but its due to more threads, not an increase in instructions per clock/how much each thread can do per clock.

Would it help if I stop using IPC and call it throughput or something?

Yes, actually.

Kenrou · Jul 27, 2018

I always thought IPC was a CPU/gen fixed number, but would increase (or throughput would increase) with HT off simply because it doesn't need to "split the load" ?

mackerel · Jul 27, 2018

I think we're almost there now!

When people do architecture comparisons, they often use single thread performance. That makes sense, you push something through it, it does some work. Normalise for clock, you get single thread IPC.

To me, it is not wrong to have other variations of IPC. The one I'm thinking of here, would be "per core" IPC. Thus, 2 threads on one core is a valid measurement towards that. The definition you posted earlier didn't mention anything about cores or threads.

And if you think this is fun, I would even propose concepts like peak IPC. In my previous P95 testing, you can get a good feel for the peak IPC by using small FFT sizes as they run relatively unhindered out of the CPU cache. Large FFTs will be bottlenecked by ram and effective IPC drops. For the testing I want to do here, I want to determine the peak, and thus may have to avoid other limiting factors. One method is simply to run fewer cores, and/or lower clocks.

Are we done?

I'd still like that testing, whatever we might like to call it, done on a Zen+ CPU if anyone can help.

EarthDog · Jul 27, 2018

That depends on the load and software. Youll note in the links above there are passages explaining how it works and in part, uses idle time to feed the additional threads. But its workload specific depending on the resources each thread needs. If another thread is using the same resource, it waits (is how i understand it).

mackerel said:
I think we're almost there now!

When people do architecture comparisons, they often use single thread performance. That makes sense, you push something through it, it does some work. Normalise for clock, you get single thread IPC.

To me, it is not wrong to have other variations of IPC. The one I'm thinking of here, would be "per core" IPC. Thus, 2 threads on one core is a valid measurement towards that. The definition you posted earlier didn't mention anything about cores or threads.

And if you think this is fun, I would even propose concepts like peak IPC. In my previous P95 testing, you can get a good feel for the peak IPC by using small FFT sizes as they run relatively unhindered out of the CPU cache. Large FFTs will be bottlenecked by ram and effective IPC drops. For the testing I want to do here, I want to determine the peak, and thus may have to avoid other limiting factors. One method is simply to run fewer cores, and/or lower clocks.

Are we done?

I'd still like that testing, whatever we might like to call it, done on a Zen+ CPU if anyone can help.

were done... not going to beat a dead horse to a total pulp (seems squishy enough, lol). You can conjure up new terms and apply definitions to anything youd like...

Looking forward to seeing the end result.

Alaric · Jul 27, 2018

Again, the same amount of traffic is able to go through each thread, that doesn't increase. The number of threads able to process the data did increase. To use your analogy, yes more WORK is getting done, but the amount of work each thread can potentially do (IPC) does not change, just the number of threads. So yes, technically more work is getting crunched, but its due to more threads, not an increase in instructions per clock/how much each thread can do per clock.

I must be missing something. This makes perfect sense to me, but I'm the Slow Kid here.

Edit: Figured it out. ED used a car analogy. LOL

mackerel · Jul 27, 2018

I'm less interested on how SMT/HT works, as that is more a problem for a programmer to work out. To me it is more about how much benefit it can provide a user. The short version of my current understanding is, for a single thread per core per clock case, Intel have better performance. If you run SMT/HT, there is overall more execution potential in Zen allowing it to take a lead (outside of FPU heavy cases anyway). As said, I have seen that before, but this would be a more modern revisit of that. Over the years I have seen up to 50% throughput increase from HT, and was wondering if there was a hard limit. It doesn't seem to be the case for Zen SMT, but the question then is, under what situations might that be realised? I suspect Ian's 3DPM bench is the exception rather than the rule, and want to explore it more.

Johan45 · Jul 27, 2018

Hey Mack in that same review there are head to head benchmarks with the CPUs all at 4.0 GHz that might be closer to what you are looking for. Results vary depending on benchmark as Intel is stronger at some things than AMD but you can see both cinebench results just divide by cores

mackerel · Jul 27, 2018

Johan45 said:
Hey Mack in that same review there are head to head benchmarks with the CPUs all at 4.0 GHz that might be closer to what you are looking for. Results vary depending on benchmark as Intel is stronger at some things than AMD but you can see both cinebench results just divide by cores

Thanks but my main interest is in getting a 3DPM run with and without SMT on. CB15 is just a parallel check.

I guess it isn't clear what I'm trying to do... in essence, it is the HT/SMT scaling test, but with different workloads, and different CPUs. Best thing to do is the testing I was going to do on other systems (maybe some or all of: 6700k, 7800X, 8086k, 1700, 1600, possibly older CPUs if I feel like it). That way, it'll be more obvious where the Ryzen 2000 would fit in.

Johan45 · Jul 27, 2018

You'll have to wait at least for me , I'm setting up 2600X for LN2 then I need an Intel setup for some SW testing.

Can a Ryzen 2600-2700(X) owner run some benches please?

Member

Benching Team Leader Super Moderator

Member

Gulper Nozzle Co-Owner

Member

Gulper Nozzle Co-Owner

Member

Gulper Nozzle Co-Owner

Member

Gulper Nozzle Co-Owner

Member

Gulper Nozzle Co-Owner

Member

Member

Gulper Nozzle Co-Owner

New Member

Member

Benching Team Leader Super Moderator

Member

Benching Team Leader Super Moderator

Similar threads