CPU vs GPU

mackerel (Member, joined Mar 7, 2008)
[Attached image cpugpu.png: table of test results]

I've done a bunch of testing. CPUs are stock, ram is probably over stock in many cases.
dc = dual channel
qc = quad channel
sr = single rank
dr = dual rank

The test uses genefer, which looks for prime numbers of a particular form. The software is available for both CPU and GPU, and multi-thread support was recently added on the CPU side, which makes CPUs interesting again. Times are estimated hours for 304208^2097152+1, which is representative of the current search range. In case you wonder what the OCL number is: it indicates which transform method is used. The software runs a mini-bench at the start to find out what is best for the hardware being used at the time. The maths behind it is beyond me, but roughly speaking the different transforms don't do exactly the same calculation; each is a different path to the same result.
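
For anyone curious about the "particular form": these are generalized Fermat numbers, b^(2^n)+1, and 2097152 is 2^21. A tiny Python sketch (just my illustration of the form, nothing to do with genefer's actual code) showing how the example candidate fits and roughly how big it is:

Code:
import math

# Generalized Fermat form: b^(2^n) + 1. Here 2097152 = 2^21,
# so the example candidate 304208^2097152+1 uses b = 304208, n = 21.
b, n = 304208, 21
exponent = 2 ** n
assert exponent == 2097152

# The number itself is far too big to construct here; its size is roughly
# exponent * log10(b) decimal digits, which works out to about 11.5 million.
digits = int(exponent * math.log10(b)) + 1
print(f"{b}^{exponent}+1 has about {digits:,} decimal digits")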

For the GPU tests I left them a short time to warm up before taking the estimate, as they all downclock as they get hotter. I can't say for sure I left them long enough for the clocks to stop falling.

Power is as reported in software. It should be noted the different scenarios are NOT comparable with each other. That is, we have 4 cases: nvidia GPU, AMD GPU, AMD CPU, Intel CPU.

nvidia GPU - I understand this is the total power taken by the card
AMD GPU - I understand this is only the power taken by the GPU, so doesn't include ram, VRM efficiency etc.
AMD CPU - this is the reported core+SoC total power
Intel CPU - this is the reported socket power

I also worked out kWh/unit as a kind of performance per watt metric, which will obviously still be limited by how those watts are reported.
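
Roughly speaking, that metric is just reported watts multiplied by the estimated hours per candidate, converted to kWh. A quick sketch (the numbers below are made up for illustration, not taken from the table):

Code:
def kwh_per_unit(reported_watts, hours_per_unit):
    # one "unit" here = one candidate tested end to end
    return reported_watts * hours_per_unit / 1000.0

# e.g. a hypothetical card reporting 150 W that needs 10 h per candidate
print(kwh_per_unit(150, 10))   # -> 1.5 kWh per unit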

General observations:
On nvidia GPUs, Turing is more efficient than Pascal, which is more efficient than Maxwell. No real surprise there. The Turing test (pun not intended) picked OCL5, whereas Pascal/Maxwell picked OCL4. The 970 did momentarily switch to OCL5, which made the time estimate jump up to 24 hours. The software may do this if it suspects a bad calculation due to hardware errors, as the different transforms work in different ways and some are safer than others. The 970 in question was an EVGA card with a factory-overclocked core; the ram was still stock. I didn't try removing the overclock to see if that got rid of the transform switch.

On AMD GPUs, the ones I have all picked the OCL transform, at least. The 280X is faster than the 580X, but it didn't report power, so I don't know how well it does there. Vega did well in general. Without double checking, I think the OCL transform uses FP64, which would be a strong point for the 280X.

On Intel CPUs, it isn't a big surprise, but it seems many of these CPUs are held back by ram bandwidth. Look at the two 6700ks, for example: one takes about 30% longer due to slightly slower ram and not having dual rank modules. It is unclear if the Broadwell L4 cache is helping here, but comparing against the 6600k I'm leaning towards yes. Note the 5675C was tested twice, as the mobo used has a bug that hard limits it to the 65W TDP, and I can only relax that via software after boot.

On AMD CPUs, it feels like a similar story to Intel, in that I could use better ram in those systems.
 
Pretty interesting! I was always under the impression that Intel had plenty of ram bandwidth due to having the memory controller integrated now, or is this considered a 'synthetic' test where having the fastest ram gives the biggest boost in performance?

Looks like quad channel sr on the 5930k is as fast as dual channel dr on the 6700k, but with much faster ram. Maybe this kind of calculation benefits a lot from a better ram setup?
 
So what is the point of this? To test the power efficiency of various CPUs and GPUs?
 
There is a challenge at PrimeGrid starting on what will be Tuesday night my time. I'm debating how much resource to throw at it, which is balanced both by speed (don't want to run slow equipment) and power (house temperature). Also this is one of the few occasions where CPU and GPU can be used competitively. Other cases tend to favour one or the other.

I know from past experience that for Prime95-like use cases, you hit ram bandwidth limits pretty easily. This software isn't based on the Prime95 code, it's another implementation, but at the end of the day it still has to do similar things, so we may hit similar limits. Both Prime95 and genefer also have scaling to consider when running multi-threaded on CPU. I certainly wouldn't describe this software as "synthetic", since it does a meaningful task in the process. That of course doesn't mean it is indicative of mainstream uses. To do a certain amount of work, you need a certain amount of data moving to/from ram, and once the total CPU throughput exceeds what the ram can feed, you hit similar limits on both Intel and AMD. This is in part why I cry at the thought of a potential 16 core CPU crippled by dual channel ram.
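
As a very rough illustration of how quickly the bandwidth adds up (every number below is an assumption pulled out of the air, not a measurement):

Code:
# Back-of-envelope only; all figures are assumed round numbers for illustration.
fft_bytes = (2 ** 21) * 8          # a ~2M-point transform at 8 bytes/point is ~16 MiB, too big for cache
passes_per_iteration = 4           # guess at how many read/write sweeps of the data one squaring needs
iterations_per_second = 500        # hypothetical aggregate rate across all cores
traffic = fft_bytes * passes_per_iteration * iterations_per_second
print(f"~{traffic / 1e9:.0f} GB/s of ram traffic")   # ~34 GB/s, already in dual channel DDR4 territory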

Given how much the systems vary, I didn't think it particularly fair to measure and show total system power, so the next best thing was software-reported power. Within the limits of the different ways they measure, it is an indication to be used with caution, and it doesn't tell the whole story.

The ram I have in my "fast" 6700k is certainly interesting. G.Skill Ripjaws V 3200C16 2x8GB kit. It is unlike most other recent modules in that it is dual rank, and that seems to give it a significant boost over single rank ram in this type of workload.
 
Interesting how we had triple channel in first gen i7, then we went to dual channel for a few gens, then went to quad in 5th gen and then back to dual from that onwards. Why?
 
The 1st gen was them trying stuff out I guess. 2nd gen onwards mainstream is dual channel. Quad channel is on HEDT platforms.
 
Thank you, Sir Mack
I started a separate thread on channels, so as not to derail this one.
 