Above is the result of a bunch of testing... let me explain.
System
i3-8350K at either 4.0 GHz core / 4.0 GHz cache, or 5.0 GHz core (-3 AVX offset) / 4.4 GHz cache
Asrock Z370 Pro4 bios 1.30
Ram is TridentZ 3000C14, 2x8GB, containing Samsung B-die
Rest probably doesn't matter.
Test software is the latest Aida64, plus Prime95 29.3 set to benchmark a 4096k FFT in one-worker and four-worker configurations. 4 cores with one worker puts all cores on the same data set, 32MB in size. 4 cores with 4 workers is one worker per core, so the total load is 4x 32MB = 128MB. Because of the AVX offset, the 5 GHz configuration would actually be running at 4.7 GHz during the Prime95 runs.
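For anyone wanting to check the data set sizes, here is a rough sketch of the arithmetic. It assumes 8 bytes (one double) per FFT element and ignores any extra tables Prime95 keeps, and the 8MB L3 figure for the i3-8350K is my own addition for comparison. The names are just for illustration.

```python
# Rough working-set estimate for a Prime95 FFT benchmark.
# Assumption: 8 bytes (one double) per FFT element, no extra tables counted.
BYTES_PER_ELEMENT = 8
L3_CACHE_MIB = 8  # assumption: i3-8350K L3 cache size, for comparison only


def working_set_mib(fft_length_k, workers):
    """Total data touched, in MiB, for `workers` independent FFTs of the given size."""
    per_worker_bytes = fft_length_k * 1024 * BYTES_PER_ELEMENT
    return per_worker_bytes * workers / (1024 * 1024)


if __name__ == "__main__":
    for workers in (1, 4):
        total = working_set_mib(4096, workers)
        print(f"{workers} worker(s): {total:.0f} MiB, "
              f"{total / L3_CACHE_MIB:.0f}x the L3 cache")
```

Either way the data is several times larger than the cache, so the benchmark has to keep going out to ram.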
I've been overclocking the ram, hence the various scenarios above.
Baseline performance is at the SPD default of 2133. Safe, slow.
The ram contains an XMP profile, which sets 3000C14.
After some tinkering, I got 3600C16 running, requiring some voltage tweaking for stability.
Here I hit a wall, and couldn't get 3700 running no matter what. Then one day I decided to try 3733, and it worked first time! I pushed on to...
3866, which required some more manual tuning of voltages.
Then my self-torture started, and I went about tweaking the secondary and tertiary timings. This took a lot of trial and error, mostly error. I haven't done a final stability test, but what I have right now seems stable enough.
The Prime95 results are in iter/s, where higher is better. The code is very efficient at getting work done on the core, and for data sets that don't entirely fit in CPU cache, ram performance becomes significant. The test size was chosen to ensure that would be the case. I had previously determined a rough rule of thumb: to not be significantly ram limited, a quad core Intel needs dual channel ram with a rated speed comparable to the core clock. E.g. for a 4 GHz CPU, you'd aim for 4000 rated ram. That's tricky! This is also in part why I'm concerned about rising core counts without a corresponding rise in memory channels.
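To put rough numbers on that rule of thumb, here is a small sketch. The peak bandwidth formula (rated MT/s x 8 bytes per 64-bit channel x number of channels) is the standard theoretical figure, not a measured one, and the function names are made up for this example. A ratio of 1.0 or more would meet the rule of thumb.

```python
# Theoretical peak ram bandwidth and the "rated ram speed vs core clock" ratio.
# These are paper numbers only; real measured bandwidth will be lower.


def peak_bandwidth_gbs(rated_mts, channels=2, bus_bytes=8):
    """Theoretical peak in GB/s: transfers/s x bytes per transfer x channels."""
    return rated_mts * bus_bytes * channels / 1000


def ram_to_core_ratio(rated_mts, core_mhz):
    """Rule-of-thumb ratio: rated ram speed divided by core clock in MHz."""
    return rated_mts / core_mhz


if __name__ == "__main__":
    for rated in (2133, 3000, 3600, 3733, 3866):
        print(f"{rated}: {peak_bandwidth_gbs(rated):.1f} GB/s peak, "
              f"ratio {ram_to_core_ratio(rated, 4000):.2f} at 4.0 GHz, "
              f"{ram_to_core_ratio(rated, 4700):.2f} at 4.7 GHz")
```

By this measure even 3866 falls a little short at 4.0 GHz, and further short at the 4.7 GHz the overclocked configuration runs under AVX load.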
Results here certainly do illustrate the ram limiting in action. For the overclocked CPU configuration, an 81% increase in ram speed (plus the timing optimisations) gets up to a 69% throughput increase! That suggests we're still deep in a ram bandwidth limited situation. By testing both a near-stock and an overclocked condition, we can also see that the gains from the CPU overclock are held back significantly by the ram. In the past I haven't overclocked much on my prime number finding systems for this reason: the ram is limiting, so trying to make the CPU faster just ends up burning more power without getting much more throughput.
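The 81% figure is just the jump from 2133 to 3866. A minimal sketch of that calculation follows, along with how much of the ram speed gain shows up as throughput. The 69% is the best-case figure quoted above, and the helper name is only for illustration.

```python
# Percentage gain from SPD 2133 to 3866, and how much of it reached throughput.


def pct_increase(old, new):
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100


if __name__ == "__main__":
    ram_gain = pct_increase(2133, 3866)
    throughput_gain = 69  # best-case Prime95 gain quoted in the post
    print(f"ram speed gain: {ram_gain:.0f}%")
    print(f"throughput gain per unit of ram gain: {throughput_gain / ram_gain:.2f}")
```

If ram were not the bottleneck, that last ratio would be close to zero, since faster ram would barely change the throughput.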
It is also interesting to compare 3866 with auto timings against 3866 with manual timings. Depending on the scenario, we're seeing a 6% to 14% increase. Remember, this is the same speed, just with highly optimised timings. In the past I had tried adjusting primary timings and they didn't make much difference. It would seem the key lies elsewhere. In the hope of explaining this difference, I took the Aida64 measurements as well. These don't show as much difference, around 4 to 6%. So the Aida64 numbers aren't giving us the complete picture here. Even if you stack the latency difference on top, that only adds up to another couple of %.