Relative IPC at prime number finding and other thoughts

mackerel

Member
Joined
Mar 7, 2008
I am one of those few people who care a lot about RAM bandwidth, but that might be about to change due to software changes. The results below are multithreaded results using LLR, which can be seen as similar to Prime95 since it uses the same math library for the heavy calculations. The FFT size of the biggest units is 1920k, if you want to relate this to Prime95 stress settings.

Previously LLR could only run on a single thread, so if you had multiple cores, you had to run one instance per core to load them up. Note HT has no benefit in this application, so people usually turn HT off where available to simplify management: leaving it on can cause a significant performance drop unless you manually manage affinity to compensate. Run this way, with each instance working on a separate data set, the combined working set exceeds the CPU's cache and really hammers RAM bandwidth. A fast quad core will be RAM bandwidth limited, not CPU limited.

With multithreading now enabled, it looks like the picture is changing. With threads working on the same task, there isn't so much data to throw around. At just under 16MB of data, these tasks are still too big to fit in most CPU caches, although they could fit in Ryzen's 16MB. I would still expect RAM bandwidth to be a factor, but how much of one is a new question.
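As a rough sanity check on the "just under 16MB" figure, here is a minimal sketch. It assumes the FFT data is one double-precision value (8 bytes) per element and that "k" means 1024, and it ignores any extra tables or padding the library may keep, so treat it as a ballpark only. The function name is made up for illustration.

```python
# Ballpark working-set size for one FFT of a given length.
# Assumptions (not from the post): 8-byte doubles, "k" = 1024 elements,
# no allowance for twiddle tables or padding.
BYTES_PER_DOUBLE = 8


def fft_footprint_mib(fft_len_k: int) -> float:
    """Return the approximate data size in MiB for an FFT of fft_len_k * 1024 elements."""
    total_bytes = fft_len_k * 1024 * BYTES_PER_DOUBLE
    return total_bytes / 2**20


if __name__ == "__main__":
    print(f"1920k FFT: about {fft_footprint_mib(1920):.1f} MiB")
```

With one instance per core, four such data sets would be in flight at once on a quad, which is why that running style spills out of cache so easily.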

I took a bunch of results and worked out the BOINC credit awarded per second. The PrimeGrid PSP project awards credit that scales with the work being done, and in a separate check this seemed to work pretty well, with deviations between individual units in the low single digit % range. So what follows is the actual credit per second for each system as is, with the hypothetical performance of a quad core at 3 GHz in brackets afterwards. In other words, the second number is relative IPC.

i7-6700k 4c @ 4.2: 0.65 (0.46)
i5-6600k 4c @ 4.2: 0.64 (0.46)
i5-5675c 4c @ 3.5: 0.51 (0.44)
i5-4570s 4c @ 3.2: 0.44 (0.42)
i3-4360 2c @ 3.7: 0.25 (0.41)
E5-2650 x2 16c @ 2.0: 0.68 (0.26)
R7 1700 8c @ 3.2: 0.48 (0.23)
E5-2583v3 14c @ 2.3: 0.99 (0.37)
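The bracketed figures above appear to come from scaling each result linearly to 4 cores at 3 GHz. A minimal sketch of that normalisation follows; the formula is inferred from the numbers rather than stated anywhere, the function and variable names are made up for illustration, and it assumes perfect linear scaling with both core count and clock.

```python
# Normalise credit/sec to a hypothetical 4-core, 3.0 GHz system.
# Assumption: throughput scales linearly with cores and clock speed.

# (label, cores, clock in GHz, measured credit per second) copied from the list above
RESULTS = [
    ("i7-6700k", 4, 4.2, 0.65),
    ("i5-6600k", 4, 4.2, 0.64),
    ("i5-5675c", 4, 3.5, 0.51),
    ("i5-4570s", 4, 3.2, 0.44),
    ("i3-4360", 2, 3.7, 0.25),
    ("E5-2650 x2", 16, 2.0, 0.68),
    ("R7 1700", 8, 3.2, 0.48),
    ("E5-2583v3", 14, 2.3, 0.99),
]


def relative_ipc(credit_per_sec, cores, clock_ghz, ref_cores=4, ref_ghz=3.0):
    """Scale a measured rate to the reference core count and clock."""
    return credit_per_sec * (ref_cores / cores) * (ref_ghz / clock_ghz)


if __name__ == "__main__":
    for label, cores, ghz, rate in RESULTS:
        print(f"{label:12s} {relative_ipc(rate, cores, ghz):.3f}")
```

Small differences in the last digit against the list are expected, since the inputs here are already rounded to two decimal places.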

The Intel quads are in a similar ballpark, although I suspect Broadwell might be punching above its weight due to its L4 cache. Scaling between the Haswell i3 and i5 looks pretty consistent. Ryzen is about half of Skylake, which is consistent with past observations. I would caution that the math library used in LLR hasn't been updated for Ryzen yet, so it is likely not using the FMA3 code but an older transform, and it may do better once updated. I wouldn't expect that to make a big difference, though, based on testing with a pre-release test version of Prime95 29.1. The Haswell Xeon scales poorly, as I think the software can't cope with so many threads. I have not looked at the SB Xeon system in detail yet.

The takeaway from all this is that changing software can and does alter hardware needs. It is still early days in this testing, and using live project data is always risky as the units aren't all doing exactly the same work. For this type of application, Intel's CPU architecture is still better than Ryzen, but the bandwidth demands might not be as bad as before.