
CUDA performance


QuietIce

I'm trying to create a database of CUDA SETI performance. If you run CUDA, would you please list your graphics card(s), core clock (if not stock), and approximate RAC from the card?

Here's the little bit of data I've managed to find so far; I'll add to this list as you post new numbers. Format is card (core), approximate RAC, shaders:TMUs:ROPs, core clock in MHz:


8600GTS (G84) ~1200 32:16:8 @ 675
8800GS (G92) ~1500 96:48:12 @ 550
8800GT (G92) ~2900 112:56:16 @ 600

9600GT (G94) ~3000 64:32:16 @ 675 (CPU core reserved)
9600GSO (G92) ~3500 96:48:12 @ 550 (dedicated cruncher)
9800GT (G92) ~3000 112:56:16 @ 667
9800GT (G92) ~5000 112:56:16 @ 635 (CPU core reserved)

GTX260 OC2 (GT200) ~5500 216:72:28 @ 630
GTX260 (GT200) ~6000 192:64:28 @ 685
GTX260 (GT200) ~9000 192:64:28 @ 685 (CPU core reserved)
GTX260 (GT200) ~10500 192:64:28 @ 711 (CPU core reserved)
GTX280 (GT200) ~8500 240:80:32 @ 602

GTX285 (GT200b) ~7000 240:80:32 @ 648 (calculated)
GTX285 (GT200b) ~12000 240:80:32 @ 648 (CPU core reserved)
GTX295 (GT200b) ~10500 (2x) 240:80:32 @ 648 (Probably under-rated, see post #125)


Thanks in advance ...! :)


Cost comparison: (2010-02-11)
(Newegg price, shown as RAC/$)
8600GTS $50 = 24
8800GTS $80 = 36 (no RAC data on this model; used the 8800GT's RAC)

9800GT $95 = 32

GTX260 OC2 $210 = 26
GTX260 $215 = 28
GTX260 $215 = 42** (CPU core reserved)
GTX285 $390 = 18
GTX285 $390 = 31** (CPU core reserved)
GTX295 $540 = 19

** This does NOT take into account the RAC lost on the reserved CPU core. Obviously, lower-RAC (and cheaper) CPUs paired with big video cards would be preferred, especially for a dedicated machine.
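The RAC/$ column is just each card's approximate RAC divided by its Newegg price. A minimal sketch reproducing a few of the rows above (card data taken from this thread):

Code:
# RAC per dollar, as used in the cost comparison above.
# RAC figures and prices are the ones posted in this thread.
cards = [
    ("8600GTS", 1200, 50),
    ("9800GT", 3000, 95),
    ("GTX260 (CPU core reserved)", 9000, 215),
    ("GTX285 (CPU core reserved)", 12000, 390),
    ("GTX295", 10500, 540),
]

for name, rac, price in cards:
    print(f"{name}: ${price} = {rac / price:.0f} RAC/$")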
 
The 9600GSO, 9800GT, and 9800GTX numbers were actually Folding PPD numbers, not SETI RAC. I posted them in the other thread just to give an example of their performance relative to each other.

I can confirm that a 9800GT (112:56:16) at 666/1728/1000 (core/shader/memory) clocks does 3000 RAC.

What I'm going to do is buy one of those cheap mobo/CPU combos and run one video card on it for a few weeks until the RAC stabilizes, then report back. I've got 4 different GPUs I can test this way.
 
Two ways.

1) Have a stable CPU RAC, then add the GPU and see where it goes. That's how I figured out the RAC for my 9800GT, though since the CUDA optimizations came out, I imagine it could be even higher.
2) I just bought a rig off the classifieds: an old single-core 3500+ combo for $35. I'm going to install a CUDA card, go into the SETI preferences, and crunch with the GPU only, with CPU crunching disabled; that will give me the RAC for the card. I'll switch out the card once it's stable and see how each card performs. I figure I won't be losing too much RAC with that single-core 3500+ not crunching. Method 1 boils down to a simple subtraction, sketched below.
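A trivial sketch of method 1, with made-up numbers (the CPU-only baseline is in line with the single-core A64 figures later in this thread):

Code:
# Method 1: GPU RAC ~= stabilized host RAC minus the CPU-only baseline.
# Both figures below are illustrative, not measurements.
cpu_only_rac = 600    # stable CPU-only RAC (roughly a single-core A64 @ 3.0 GHz)
combined_rac = 3600   # stable RAC a few weeks after adding the card

gpu_rac = combined_rac - cpu_only_rac
print(f"Estimated GPU RAC: ~{gpu_rac}")  # ~3000, in line with a 9800GT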
 
As far as I know you can't (other than what Duner has stated). :( But I have a plan that might work.

What I'd need from you guys is the average time for CPU units, which can only be seen in BOINC under Tasks. A screenshot showing a bunch of completed units would work for that as well. I'd also need to know which GPUs you have running in that rig, plus some way to distinguish that machine from any others you may have. From there I can make all the calculations to figure a rough GPU RAC, assuming your rigs aren't hidden. :)

I've done this calculation for a couple of my non-CUDA rigs and both came out within ~500 RAC of the average for the last week of crunching, so it's not too far off.
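The calculation amounts to something like the sketch below. All the inputs are placeholders you'd read off your own Tasks tab and host stats; the credit-per-task value in particular varies a lot by work unit:

Code:
# Estimate the CPU's daily credit from average task time and credit,
# then attribute the remainder of the host's RAC to the GPU.
# All numbers below are illustrative placeholders.
avg_cpu_task_hours = 3.0    # average completion time of CPU work units
avg_credit_per_task = 60.0  # average granted credit per work unit
cpu_cores = 4               # cores crunching

cpu_credit_per_day = cpu_cores * (24 / avg_cpu_task_hours) * avg_credit_per_task
host_rac = 5400             # stable RAC reported for the whole host

gpu_rac = host_rac - cpu_credit_per_day
print(f"CPU ~{cpu_credit_per_day:.0f}/day, GPU ~{gpu_rac:.0f} RAC")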
I figure I won't be losing too much RAC with that single-core 3500+ not crunching.
A single-core A64 will crunch 500-600 RAC in the 2.5-2.9 GHz range; at 3.0 GHz one core does just over 600 RAC.

I look forward to your results ... :)
 
Could you calculate the RAC from work-unit time instead of disabling the CPU? That way no one would have to disable their CPU.

My 9500GT has been doing the newer, lengthier multibeams in about 59 minutes. Before they made the multibeams take longer, it did them in significantly less time, maybe 35-45 minutes.

9500GT PCI-E, $49 @ CompUSA
G96 core, 32 shaders, 550 MHz stock / 760 OC
Memory: 1024MB DDR2, 128-bit bus, 400 MHz stock / 640 OC
SETI estimate: 22 GFLOPS OC, 16-17 stock

My 9800GTX+ that's now dead did the previous generation of multibeams in about 14 minutes overclocked. That was a nice card. :/ It overclocked like a mad dog, too.


Yes, expanding on this idea more: is there any way to calculate the RAC from a CUDA work unit's turnaround time? Say I downclocked my 9500GT back to stock, let it churn out work units for a day, and then averaged the results while throwing out the unusually long or slow ones. Can anything be done with that? Because my rig takes weeks for the RAC to stabilize, and I'd hate to turn off the CPU for a week or two.
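Something like this, maybe. A sketch of the averaging with a crude outlier trim; the times and credit-per-WU value are made up, and RAC would only converge toward this credit/day figure after a few weeks:

Code:
# Average CUDA work-unit times after dropping the extremes,
# then convert to work units (and credit) per day.
wu_minutes = [11, 10, 12, 11, 59, 10, 12, 11, 3, 11]  # illustrative times
credit_per_wu = 25.0                                  # illustrative credit

trimmed = sorted(wu_minutes)[1:-1]  # throw out the fastest and slowest
avg_min = sum(trimmed) / len(trimmed)

wus_per_day = 24 * 60 / avg_min
print(f"~{wus_per_day:.0f} WUs/day -> ~{wus_per_day * credit_per_wu:.0f} credit/day")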
 
Could you calculate the RAC from work-unit time instead of disabling the CPU? That way no one would have to disable their CPU.
A screenshot showing your BOINC window with 2-3 dozen completed work units in it would be good. From that I should be able to calculate what the CPU is crunching.

I would also need to know which machine the screenshot is taken from, what GPUs you're running in it, and any OC on the GPUs.


I don't expect anyone to stop crunching (CPU or GPU) for this ...
 
How about using the 'estimated XX GFLOPS' line from the BOINC message file? It's an objective number that could be used as a baseline. Adding a card and then measuring the RAC increase is subject to too many unknowns, and as for average completion time, aren't the WUs all of different lengths anyway? I have 4 different types, and here are the measurements for them:

Card    Mem (MB)  Est. GFLOPS
9800GT  1024      63
9800GT  512       60
9800GT  512       55
8400GS  512       4
8400GT  512       8
9400GT  1024      8

Quite a bit of delta, eh?

Edited 10/25: I added a memory column and some more cards. I don't know if these numbers mean anything now, because two of the cards are identical and show different numbers.
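If anyone wants to collect these without hunting through the log by hand, here's a sketch of pulling the figure out of a message line. The sample line mirrors the format quoted later in this thread; the exact prefix may differ between client versions:

Code:
import re

# One message line in the format quoted in this thread (assumed prefix).
sample = ("CUDA device: GeForce 9800 GT (driver version 19062, "
          "compute capability 1.1, 512MB, est. 60GFLOPS)")

m = re.search(r"CUDA device: (.+?) \(.*est\. (\d+)GFLOPS\)", sample)
if m:
    card, gflops = m.group(1), int(m.group(2))
    print(f"{card}: {gflops} GFLOPS")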
 
Using the GFLOPS number is a great idea, but we should still try to figure out the RAC of each card. Maybe eventually we can come up with a formula for RAC per GFLOPS.
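As a first stab with numbers already posted in this thread (RAC from the first post, GFLOPS from the table and BOINC messages; the pairings are approximate, so treat the ratios as ballpark only):

Code:
# Rough RAC per GFLOPS from figures posted in this thread.
data = [
    ("9800GT", 3000, 60),                       # ~3000 RAC, est. 60 GFLOPS
    ("GTX260", 6000, 100),                      # no CPU core reserved
    ("GTX260 (CPU core reserved)", 9000, 100),  # reserving a core helps a lot
]

for card, rac, gflops in data:
    print(f"{card}: ~{rac / gflops:.0f} RAC per GFLOPS")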
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up? If so, that would make a good number to keep track of then.

Here's the estimated Gflops on my 3 machines that are running the gpu client:

260GTX - (driver version 19062, compute capability 1.3, 896MB, est. 100GFLOPS) (updated)

8800 GTS 512 - (driver version 19062, compute capability 1.1, 512MB, est. 77GFLOPS)

GeForce 9800 GTX+ - (driver version 19062, compute capability 1.1, 512MB, est. 84GFLOPS)
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up?


Yes, that number will change with an overclock. I thought I read somewhere that BOINC was looking at implementing a co-processor time count. That would easily solve our little debacle. :)
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up? If so, that would make a good number to keep track of then.

Here's the estimated Gflops on my 3 machines that are running the gpu client:

260GTX - (driver version 19062, compute capability 1.3, 896MB, est. 87GFLOPS)

8800 GTS 512 - (driver version 19062, compute capability 1.1, 512MB, est. 77GFLOPS)

GeForce 9800 GTX+ - (driver version 19062, compute capability 1.1, 512MB, est. 84GFLOPS)

GTX260 looks a bit low compared to the other 2...
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up? If so, that would make a good number to keep track of then.

No, I don't think it does; it would have to run a benchmark, which it doesn't seem to. I think they have a lookup table of some sort. You could test it easily enough, I guess, by starting the client with no GPU OC and then again with one.
 
Yeah, it did look a bit low. I just closed out BOINC and restarted it, and it gave me 100 GFLOPS this time, which sounds a bit better. I'll update the previous post.

My superclocked 260s show 104 GFLOPS each. Is your 260 a vanilla? Slight difference for a slight factory OC. My 285s aren't crunching right now, so I can't be certain, but from memory they're around 127 or 137.

BOINC's GFLOPS estimation is probably based on something simple, like the clock readings of the card. A quick query could obtain that without running a benchmark.
 
I don't know if that would be true. I have my 9500GT overclocked to 760 MHz core and 1850 MHz shader, and BOINC only gives me around 22 GFLOPS for my card.

By the way, my little 9500GT has done the last 100 or so work units in 10-12 minutes each. Is it just me, or have CUDA work units sped way up recently?
 
My superclocked 260s show 104 GFLOPS each. Is your 260 a vanilla? Slight difference for a slight factory OC. My 285s aren't crunching right now, so I can't be certain, but from memory they're around 127 or 137.

BOINC's GFLOPS estimation is probably based on something simple, like the clock readings of the card. A quick query could obtain that without running a benchmark.

Yeah, mine's a vanilla 55 nm EVGA model. I clocked it up a little more today, and after restarting, BOINC showed 103.

I don't know if that would be true. I have my 9500GT overclocked to 760 MHz core and 1850 MHz shader, and BOINC only gives me around 22 GFLOPS for my card.

By the way, my little 9500GT has done the last 100 or so work units in 10-12 minutes each. Is it just me, or have CUDA work units sped way up recently?

I think CUDA processing has sped up quite a bit with the latest drivers and CUDA apps. My main system ran CUDA for about 6 months before I upgraded the drivers and apps, and it was averaging around 9-10k RAC; now it's at almost 14k and still climbing. Same video card, just new drivers and an updated BOINC client and apps.
 
BOINC's GFLOPS estimation is probably based on something simple, like the clock readings of the card. A quick query could obtain that without running a benchmark.

I don't know if that would be true. I have my 9500GT overclocked to 760 MHz core and 1850 MHz shader, and BOINC only gives me around 22 GFLOPS for my card.

The stream processor count is definitely factored into the GFLOPS number as well.
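In fact, the values reported so far fit a simple shaders-times-shader-clock formula pretty well. The 0.375 factor below is just fitted to the numbers in this thread, not taken from BOINC or SETI source code, and the GTX260 row assumes a 216-SP card at its stock 1242 MHz shader clock:

Code:
# Guessed estimator, fitted to values reported in this thread:
# est. GFLOPS ~= shaders * shader clock (GHz) * 0.375
def est_gflops(shaders, shader_mhz):
    return shaders * (shader_mhz / 1000.0) * 0.375

print(est_gflops(32, 1850))   # 9500GT @ 1850 shader -> 22.2 (reported ~22)
print(est_gflops(128, 1625))  # 8800GTS 512 stock    -> 78.0 (reported 77)
print(est_gflops(216, 1242))  # GTX260, 216 SP stock -> 100.6 (reported 100)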
 