Core Performance Guide

walaka7 · May 20, 2005

Many of you already know that not all procs are created equal, and one will certainly perform better than another given what project/core it's running on. This is a work in progress: as data is compiled, it will hopefully become a user-friendly guide to which flags to run to maximise production on a given platform. Since most of my experience is with the Socket A platform, I'm going to start with data pertaining to those first. Then I'll move on to other platforms, and with the help of outside references, an accurate picture of what can be expected in terms of production may be constructed. I'm going to give these results in ppd/ghz (points per day per gigahertz). From left to right: project number, what the project is worth (points per work unit, ppwu), the ppd/ghz, and the type of project it is. Memory specs could, and probably do, play a role here, so the memory used is noted as well:

Okay, for the Socket A cores running PC2100 memory speed:
Tbreds and Palominos, figured in relation to 1 ghz:

638 = 239 ppwu = 86 ppd/ghz Tinker
741 = 148 ppwu = 32 ppd/ghz Gromacs
1132 = 241 ppwu = 85 ppd/ghz Tinker
1136 = 241 ppwu = 85 ppd/ghz Tinker
1139 = 241 ppwu = 85 ppd/ghz Tinker
1282 = 72 ppwu = 45 ppd/ghz Gromacs
1312 = 254 ppwu = 77 ppd/ghz BP Gromacs
1140 = 600 ppwu = 87 ppd/ghz BP Gromacs
1475 = 364 ppwu = 74 ppd/ghz BP Gromacs
1476 = 364 ppwu = 77 ppd/ghz BP Gromacs
1605 = 78 ppwu = 50 ppd/ghz Gromacs

Barton cores, also PC2100:

957 = 48 ppwu = 58.75 ppd/ghz Gromacs
1123 = 237 ppwu = 81.40 ppd/ghz Tinker
1136 = 241 ppwu = 85.13 ppd/ghz Tinker
1137 = 241 ppwu = 83.04 ppd/ghz Tinker
1139 = 241 ppwu = 86.23 ppd/ghz Tinker
1141 = 600 ppwu = 89.77 ppd/ghz BP Gromacs
1285 = 73 ppwu = 46.34 ppd/ghz Gromacs
1316 = 343 ppwu = 95.82 ppd/ghz BP Gromacs
1321 = 346 ppwu = 100.58 ppd/ghz BP Gromacs
1322 = 302 ppwu = 84.28 ppd/ghz BP Gromacs
1475 = 364 ppwu = 78.44 ppd/ghz BP Gromacs
1476 = 364 ppwu = 82.22 ppd/ghz BP Gromacs

Now Barton cores with 3200 memory:

1137 B-3200- 84.64 ppd/ghz Tinker
1136 B-3200- 87.18 ppd/ghz Tinker
1141 B-3200- 112.21 ppd/ghz BP Gromacs
1317 B-3200- 121.71 ppd/ghz BP Gromacs
1321 B-3200- 119.33 ppd/ghz BP Gromacs
1478 B-3200- 127.47 ppd/ghz BP Gromacs

On Tinkers there is almost no difference in production with respect to memory. But for Gromacs, notably BPs, there is on average a 20% gain using 3200 memory over 2100. Perhaps you guys could help me with a little experiment: take my numbers for the Barton on 2100, multiply by 1.23, and see if the result closely matches your production on BPs.
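
A quick way to check the memory claim against the two Barton tables is to divide the PC3200 figure by the PC2100 figure for every project that shows up in both. This is only a sketch: the numbers are copied from the tables above, and the function name `memory_gain_by_type` is made up for illustration.

```python
# ppd/ghz figures copied from the Barton tables above (PC2100 vs PC3200).
barton_pc2100 = {1136: 85.13, 1137: 83.04, 1141: 89.77, 1321: 100.58}
barton_pc3200 = {1136: 87.18, 1137: 84.64, 1141: 112.21, 1321: 119.33}
project_type = {1136: "Tinker", 1137: "Tinker",
                1141: "BP Gromacs", 1321: "BP Gromacs"}


def memory_gain_by_type(slow, fast, kinds):
    """Average fast/slow ppd/ghz ratio per project type.

    Only projects present in both tables are compared.
    (Hypothetical helper, not part of any folding tool.)
    """
    ratios = {}
    for project in slow.keys() & fast.keys():
        ratios.setdefault(kinds[project], []).append(fast[project] / slow[project])
    return {kind: sum(vals) / len(vals) for kind, vals in ratios.items()}


gains = memory_gain_by_type(barton_pc2100, barton_pc3200, project_type)
for kind, gain in sorted(gains.items()):
    print(f"{kind}: x{gain:.3f}")
```

With only the four projects common to both tables, this comes out to roughly x1.02 for Tinker and x1.22 for BP Gromacs, which is in line with the "about 20%" figure. It is a small sample, so treat it as a sanity check rather than a proper coefficient.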

Lately I've been folding with a couple of Pentium Ms.
Here are the results on those, with memory at PC2700 speed.
Dothan core:

231 = 64 ppwu = 72.34 ppd/ghz Gromacs
234 = 52 ppwu = 60.43 ppd/ghz Gromacs
741 = 149 ppwu = 62.98 ppd/ghz Gromacs
952 = 48 ppwu = 67.44 ppd/ghz Gromacs
1138 = 241 ppwu = 65.11 ppd/ghz Tinker
1140 = 600 ppwu = 158.72 ppd/ghz BP Gromacs
1141 = 600 ppwu = 159.57 ppd/ghz BP Gromacs
1276 = 46 ppwu = 57.00 ppd/ghz Gromacs
1314 = 302 ppwu = 153.19 ppd/ghz BP Gromacs
1325 = 310 ppwu = 154.89 ppd/ghz BP Gromacs
1476 = 364 ppwu = 302.13 ppd/ghz BP Gromacs
1477 = 364 ppwu = 318.72 ppd/ghz BP Gromacs
1479 = 364 ppwu = 318.64 ppd/ghz BP Gromacs
1605 = 78 ppwu = 62.28 ppd/ghz Gromacs
1912 = 450 ppwu = 119.16 ppd/ghz QMD

I'm using a Celly D right now for testing, but I'm not going to have much usable data on it, as it is only folding for a week (stability testing for a friend).

Perhaps some of you with A64s, P4 Northys and Pressies, Cellys (different cores), and even P3s could post up with results to make this more complete.

Here's what I've got now:

Celeron D 533 mhz memory is PC3200 speeds
1138 = 241 ppwu = 46.59 ppd/ghz Tinker
1317 = 343 ppwu = 101.16 ppd/ghz BP Gromacs
1318 = 343 ppwu = 102.99 ppd/ghz BP Gromacs
1140 = 600 ppwu = 68.09 ppd/ghz BP Gromacs
1900 = 125 ppwu = 97.90 ppd/ghz QMD
1912 = 450 ppwu = 93.65 ppd/ghz QMD

A64 3200+ Newcastle at 200 MHz 3-3-3-8

240 = 38 ppwu = 58.3 ppd/ghz Gromacs
242 = 32 ppwu = 57.9 ppd/ghz Gromacs
249 = 158 ppwu = 55.4 ppd/ghz Gromacs
981 = 168 ppwu = 167 ppd/ghz DGromacs (variance between runs ~5 ppd)
983 = 138 ppwu = 145 ppd/ghz GBGromacs
1477 = 364 ppwu = 167 ppd/ghz BP Gromacs
1478 = 364 ppwu = 167.6 ppd/ghz BP Gromacs
1481 = 364 ppwu = 167 ppd/ghz BP Gromacs
1551 = 108 ppwu = 57.6 ppd/ghz Gromacs
1554 = 134 ppwu = 56.4 ppd/ghz Gromacs
1405 = 234 ppwu = 55.0 ppd/ghz Gromacs
1705 = 56 ppwu = 62.1 ppd/ghz Gromacs
2019 = 95 ppwu = 166 ppd/ghz BP Gromacs

Perhaps those of you with procs/cores not mentioned here can add some data, and we could make a kind of guide as to what to expect out of a project/WU. Just put it in the same format as above and I'll cut and paste it.

Thanks go out to CHasR, TollhouseFrank, and Who for their input!

dicecca112 · May 20, 2005

great start, I hope to see this continue. Keep up the good work

Steveo989 · May 21, 2005

What does that mean? From left to right, is that core speed, points per work unit, then points per day avg for the core? I didn't understand, because a processor with a lower clock speed had more ppd.

walaka7 · May 21, 2005

Steveo989 said:
What does that mean? From left to right, is that core speed, points per work unit, then points per day avg for the core? I didn't understand, because a processor with a lower clock speed had more ppd.

From left to right: project number / points for the given project / points per day per 1 GHz.

To maintain somewhat of a standard, all results are posted per GHz of clock speed. E.g., the Dothan results are based on 2 processors, one running at 1879 MHz and the other at 2350 MHz, yet they produce nearly the same per 1 GHz of clock. The Dothan running at 1879 MHz completes a 1479 work unit at a rate of roughly 600 ppd (almost exactly), while the one running at 2350 MHz gets 749 ppd. Working from 318.64 ppd/ghz, 318.64 * 1.879 and 318.64 * 2.35 yield roughly those numbers.
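
The per-GHz arithmetic above can be written out as a tiny sketch. The function name `expected_ppd` is made up for illustration, and it assumes production scales linearly with clock speed, which is the premise behind the ppd/ghz figure itself.

```python
def expected_ppd(ppd_per_ghz, clock_mhz):
    """Scale a per-GHz figure to a specific clock speed.

    Assumes linear scaling with clock (hypothetical helper).
    """
    return ppd_per_ghz * clock_mhz / 1000.0


# Project 1479 on the Dothan: 318.64 ppd/ghz from the table.
slow = expected_ppd(318.64, 1879)
fast = expected_ppd(318.64, 2350)
print(f"{slow:.1f} ppd at 1879 MHz, {fast:.1f} ppd at 2350 MHz")
```

That gives about 598.7 and 748.8 ppd, matching the roughly 600 and 749 ppd quoted for the two Dothans.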

The results are averaged over several folded WUs of the same project (except for the Celeron results, as I only had a week with it) to arrive at a guideline.

TehMize · May 21, 2005

Oh, that clears things up for me. Awesome work.

walaka7 · Jun 4, 2005

Some changes

walaka7 · Jun 4, 2005

Well, hopefully more peeps can shoot me some more data or post it. ChasR's input was fantastic, very well organized. The addendum with the faster memory is his, and the ones I listed were only a sample of what he sent. I will make that section more complete as time allows. But I would love to post data on P4 variants and, if enough data is compiled, perhaps develop a loose coefficient for memory speed.

pik4chu · Jun 4, 2005

Should I assume these numbers are gathered from the calculator that comes with EMIII, or what? It doesn't seem to work for me, but I have several machines I could post data with.

orionlion82 · Jun 4, 2005

Hooray for a guide to folding hardware! We've needed it forever.

ChasR · Jun 5, 2005

pik4chu said:
Should I assume these numbers are gathered from the calculator that comes with EMIII, or what? It doesn't seem to work for me, but I have several machines I could post data with.

The numbers I furnished, XP-M Barton w/PC3200, were obtained from the FAHlog during periods I knew the machine was otherwise idle. Generally I used the average of 10 frames to make the calculation.

pik4chu · Jun 5, 2005

ChasR said:
The numbers I furnished, XP-M Barton w/PC3200, were obtained from the FAHlog during periods I knew the machine was otherwise idle. Generally I used the average of 10 frames to make the calculation.

Hmm, OK, perhaps I'll dig through there and see if I can come up with some numbers then, thanks.

walaka7 · Jun 5, 2005

I used a 50-frame calculation on shorter WUs and 10-25 frame calculations on the larger ones. Whether you use the first 50 or the last doesn't seem to make much difference, EXCEPT for QMDs. The short time I spent on them with my Celly D showed frame times to be quite inconsistent. So please, on QMD projects, use all frames of the WU from your folding logs for the calculation.
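
For anyone wanting to post numbers in the same format, the frame-time calculation described above might look roughly like this. It is a sketch under stated assumptions: the function name `ppd_per_ghz` is made up, the default of 100 frames per WU is an assumption you should check against your own log, and the sample frame times are invented placeholders, not measurements.

```python
SECONDS_PER_DAY = 86400


def ppd_per_ghz(frame_seconds, points_per_wu, clock_ghz, frames_per_wu=100):
    """Estimate ppd/ghz from a sample of frame times taken from the log.

    frames_per_wu=100 is an assumption; use the real frame count of your WU.
    (Hypothetical helper, not part of any folding tool.)
    """
    avg_frame = sum(frame_seconds) / len(frame_seconds)
    wu_seconds = avg_frame * frames_per_wu        # projected time for the whole WU
    ppd = points_per_wu * SECONDS_PER_DAY / wu_seconds
    return ppd / clock_ghz


# Invented frame times in seconds (NOT measured data): ten frames of a
# 364-point WU on a machine running at 2.0 GHz.
sample = [520, 524, 518, 530, 522, 519, 527, 521, 525, 523]
print(round(ppd_per_ghz(sample, 364, 2.0), 2))
```

For QMDs, pass every frame of the WU in `frame_seconds` and set `frames_per_wu` to the length of that list, so the inconsistent frames are all counted instead of being projected from a sample.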

ChasR · Jun 5, 2005

walaka7 said:
I used a 50-frame calculation on shorter WUs and 10-25 frame calculations on the larger ones. Whether you use the first 50 or the last doesn't seem to make much difference, EXCEPT for QMDs. The short time I spent on them with my Celly D showed frame times to be quite inconsistent. So please, on QMD projects, use all frames of the WU from your folding logs for the calculation.

On a dedicated machine, QMDs are consistent after convergence. But if you do anything at all with the machine, or fold two instances, frame times will vary by as much as 20% on consecutive frames. Checkpoints take a long time too, 45* seconds on a fast HD. If you've got a P4C @ 3.0 GHz, frame times are going to be a few seconds longer than 15 minutes. If you have checkpointing set for 15 minutes, sometimes it's going to write a checkpoint just before finishing a frame, which will stretch out that frame, and sometimes it won't.
Edit
* This turned out to be incorrect. The observed result was due to an extra step per frame, not checkpoint duration. QMD checkpoints only take 2 seconds on a fast HD.*

walaka7 · Jun 5, 2005

You are spot on as usual

These were the things I was seeing on the Celly D. 'Tis why I kinda stressed using all the frames. Otherwise you would have to do a sort of average, because the frames required for convergence differ in completion time from the rest of the WU. So doing the first ten frames, or even 70 for that matter, won't give an accurate indication of the time it takes to complete the whole WU.

walaka7 · Jul 2, 2005

I swapped out the Celly D I had running for a Dothan and did a little QMD work. The Dothan blows balls on QMD from what I've seen so far:

1912 = 450 ppwu = 119.16 ppd/ghz

gulp35 · Jul 3, 2005

I think this is the appropriate thread to say that I FREAKING HATE THE 700 SERIES PROTEINS!!!! Only 124 PPD with the Green Rig in my sig.

ChasR · Jul 3, 2005

walaka7 said:
I swapped out the Celly D I had running for a Dothan and did a little QMD work. The Dothan blows balls on QMD from what I've seen so far:

1912 = 450 ppwu = 119.16 ppd/ghz

Assuming SSE2 was enabled, it would appear lack of memory bandwidth is holding back the Dothan. P4Cs produce about 145 ppd/GHz and P4Es (6xx anyway) produce 160 ppd/GHz. While the Dothan blows away all comers on p147x (a bit less bandwidth-intensive), it just isn't doing it on the QMDs. How many steps were in the QMD you folded? More steps = more time = less ppd.

walaka7 · Jul 3, 2005

I did all but three frames of that one on the Dothan, and it was a dedicated run, no playing around with goodies. I wanted convergence to have little effect on the overall picture, as the frame times are considerably shorter. And I concur on the bandwidth limitations, as that appears to be the issue. I'm not sure how much more testing I'm going to do on the Dothan and QMDs, as you can't just get them on their own; they have to be piped or, in my case, already there with the proc swap. RAM timing was not the tightest, 2-3-3-6, but the testing was done on a 533 version of the Dothan and memory was @ 200 MHz.

Oh, and yes, SSE2 was enabled.

walaka7 · Jul 3, 2005

TollhouseFrank said:
I have you 2 more logs man. Both from my new A64. I hope it helps ya out.

Got 'em, thanks

Could you also be so kind as to send me the system specs? I wish to see if the little coef. I threw together is going to work on A64s.

walaka7 · Jul 3, 2005

Perfect! thanks :beer:

erm is there a smiley for cookies?
