
CUDA performance


QuietIce

I'm trying to create a database of CUDA SETI performance. If you run CUDA, would you please list your graphics card(s), core clock (if not stock), and approximate RAC from the card?

Here's the little bit of data I've managed to find so far; I'll add to this list as you post new numbers. Format is card (core), approximate RAC, shaders:TMUs:ROPs, core clock in MHz:


8600GTS (G84) ~1200 32:16:8 @ 675
8800GS (G92) ~1500 96:48:12 @ 550
8800GT (G92) ~2900 112:56:16 @ 600

9600GT (G94) ~3000 64:32:16 @ 675 (CPU core reserved)
9600GSO (G92) ~3500 96:48:12 @ 550 (dedicated cruncher)
9800GT (G92) ~3000 112:56:16 @ 667
9800GT (G92) ~5000 112:56:16 @ 635 (CPU core reserved)

GTX260 OC2 (GT200) ~5500 216:72:28 @ 630
GTX260 (GT200) ~6000 192:64:28 @ 685
GTX260 (GT200) ~9000 192:64:28 @ 685 (CPU core reserved)
GTX260 (GT200) ~10500 192:64:28 @ 711 (CPU core reserved)
GTX280 (GT200) ~8500 240:80:32 @ 602

GTX285 (GT200b) ~7000 240:80:32 @ 648 (calculated)
GTX285 (GT200b) ~12000 240:80:32 @ 648 (CPU core reserved)
GTX295 (GT200b) ~10500 (2x) 240:80:32 @ 648 (Probably under-rated, see post #125)


Thanks in advance ...! :)


Cost comparison: (2010-02-11)
(Newegg price, shown as RAC/$)
8600GTS $50 = 24
8800GTS $80 = 36 (no RAC data on this model; used the 8800GT's RAC)

9800GT $95 = 32

GTX260 OC2 $210 = 26
GTX260 $215 = 28
GTX260 $215 = 42** (CPU core reserved)
GTX285 $390 = 18
GTX285 $390 = 31** (CPU core reserved)
GTX295 $540 = 19

** This does NOT take into account the RAC lost on the reserved CPU core. Obviously, lower-RAC (and cheaper) CPUs paired with big video cards would be preferred, especially for a dedicated machine.
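The RAC/$ column is just each card's approximate RAC divided by its Newegg price. A minimal sketch reproducing a few of the rows above (card data taken from this thread):

Code:
# RAC per dollar, as used in the cost comparison above.
# RAC figures and prices are the ones posted in this thread.
cards = [
    ("8600GTS", 1200, 50),
    ("9800GT", 3000, 95),
    ("GTX260 (CPU core reserved)", 9000, 215),
    ("GTX285 (CPU core reserved)", 12000, 390),
    ("GTX295", 10500, 540),
]

for name, rac, price in cards:
    print(f"{name}: ${price} = {rac / price:.0f} RAC/$")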
 
The 9600GSO, 9800GT, and 9800GTX numbers were actually Folding PPD numbers, not SETI RAC. I posted them in the other thread just to give an example of their performance relative to each other.

I can confirm that a 9800GT (112:56:16) at 666/1728/1000 (core/shader/memory) clocks does 3000 RAC.

What I'm going to do is buy one of those cheap mobo/CPU combos and run one video card on it for a few weeks until the RAC stabilizes, then report back. I've got 4 different GPUs I can test this way.
 
Two ways.

1) Have a stable CPU RAC, then add the GPU and see where it goes. That's how I figured out the RAC for my 9800GT, though since the CUDA optimizations came out, I imagine it could be even higher.
2) I just bought a rig off the classifieds: an old single-core 3500+ combo for $35. I'm going to install a CUDA card, go into the SETI preferences, and crunch with the GPU only, with CPU crunching disabled; that will give me the RAC for the card. I'll switch out the card once it's stable and see how each card performs. I figure I won't be losing too much RAC with that single-core 3500+ not crunching. Method 1 boils down to a simple subtraction, sketched below.
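A trivial sketch of method 1, with made-up numbers (the CPU-only baseline is in line with the single-core A64 figures later in this thread):

Code:
# Method 1: GPU RAC ~= stabilized host RAC minus the CPU-only baseline.
# Both figures below are illustrative, not measurements.
cpu_only_rac = 600    # stable CPU-only RAC (roughly a single-core A64 @ 3.0 GHz)
combined_rac = 3600   # stable RAC a few weeks after adding the card

gpu_rac = combined_rac - cpu_only_rac
print(f"Estimated GPU RAC: ~{gpu_rac}")  # ~3000, in line with a 9800GT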
 
As far as I know you can't (other than what Duner has stated). :( But I have a plan that might work.

What I'd need from you guys is the average time for CPU units, which can only be seen in BOINC under Tasks. A screenshot showing a bunch of completed units would work for that as well. I'd also need to know which GPUs you have running in that rig, plus some way to distinguish that machine from any others you may have. From there I can make all the calculations to figure a rough GPU RAC, assuming your rigs aren't hidden. :)

I've done this calculation for a couple of my non-CUDA rigs and both came out within ~500 RAC of the average for the last week of crunching, so it's not too far off.
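The calculation amounts to something like the sketch below. All the inputs are placeholders you'd read off your own Tasks tab and host stats; the credit-per-task value in particular varies a lot by work unit:

Code:
# Estimate the CPU's daily credit from average task time and credit,
# then attribute the remainder of the host's RAC to the GPU.
# All numbers below are illustrative placeholders.
avg_cpu_task_hours = 3.0    # average completion time of CPU work units
avg_credit_per_task = 60.0  # average granted credit per work unit
cpu_cores = 4               # cores crunching

cpu_credit_per_day = cpu_cores * (24 / avg_cpu_task_hours) * avg_credit_per_task
host_rac = 5400             # stable RAC reported for the whole host

gpu_rac = host_rac - cpu_credit_per_day
print(f"CPU ~{cpu_credit_per_day:.0f}/day, GPU ~{gpu_rac:.0f} RAC")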
I figure I won't be losing too much RAC with that single-core 3500+ not crunching.
A single-core A64 will crunch 500-600 RAC in the 2.5-2.9 GHz range; at 3.0 GHz one core does just over 600 RAC.

I look forward to your results ... :)
 
Could you calculate the RAC from work-unit time instead of disabling the CPU? That way no one would have to disable their CPU.

My 9500GT has been doing the newer, lengthier multibeams in about 59 minutes. Before they made the multibeams take longer, it did them in significantly less time, maybe 35-45 minutes.

9500GT PCI-E, $49 @ CompUSA
G96 core, 32 shaders, 550 MHz stock / 760 OC
Memory: 1024MB DDR2, 128-bit bus, 400 MHz stock / 640 OC
SETI estimate: 22 GFLOPS OC, 16-17 stock

My 9800GTX+ that's now dead did the previous generation of multibeams in about 14 minutes overclocked. That was a nice card. :/ It overclocked like a mad dog, too.


Yes, expanding on this idea more: is there any way to calculate the RAC from a CUDA work unit's turnaround time? Say I downclocked my 9500GT back to stock, let it churn out work units for a day, and then averaged the results while throwing out the unusually long or slow ones. Can anything be done with that? Because my rig takes weeks for the RAC to stabilize, and I'd hate to turn off the CPU for a week or two.
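Something like this, maybe. A sketch of the averaging with a crude outlier trim; the times and credit-per-WU value are made up, and RAC would only converge toward this credit/day figure after a few weeks:

Code:
# Average CUDA work-unit times after dropping the extremes,
# then convert to work units (and credit) per day.
wu_minutes = [11, 10, 12, 11, 59, 10, 12, 11, 3, 11]  # illustrative times
credit_per_wu = 25.0                                  # illustrative credit

trimmed = sorted(wu_minutes)[1:-1]  # throw out the fastest and slowest
avg_min = sum(trimmed) / len(trimmed)

wus_per_day = 24 * 60 / avg_min
print(f"~{wus_per_day:.0f} WUs/day -> ~{wus_per_day * credit_per_wu:.0f} credit/day")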
 
Could you calculate the RAC from work-unit time instead of disabling the CPU? That way no one would have to disable their CPU.
A screenshot showing your BOINC window with 2-3 dozen completed work units in it would be good. From that I should be able to calculate what the CPU is crunching.

I would also need to know which machine the screenshot is taken from, what GPUs you're running in it, and any OC on the GPUs.


I don't expect anyone to stop crunching (CPU or GPU) for this ...
 
How about using the 'estimated XX GFLOPS' line from the BOINC message file? It's an objective number that could be used as a baseline. Adding a card and then measuring the RAC increase is subject to too many unknowns, and as for average completion time, aren't the WUs all of different lengths anyway? I have 4 different types, and here are the measurements for them:

Card    Mem (MB)  Est. GFLOPS
9800GT  1024      63
9800GT  512       60
9800GT  512       55
8400GS  512       4
8400GT  512       8
9400GT  1024      8

Quite a bit of delta, eh?

Edited 10/25: I added a memory column and some more cards. I don't know if these numbers mean anything now, because two of the cards are identical and show different numbers.
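If anyone wants to collect these without hunting through the log by hand, here's a sketch of pulling the figure out of a message line. The sample line mirrors the format quoted later in this thread; the exact prefix may differ between client versions:

Code:
import re

# One message line in the format quoted in this thread (assumed prefix).
sample = ("CUDA device: GeForce 9800 GT (driver version 19062, "
          "compute capability 1.1, 512MB, est. 60GFLOPS)")

m = re.search(r"CUDA device: (.+?) \(.*est\. (\d+)GFLOPS\)", sample)
if m:
    card, gflops = m.group(1), int(m.group(2))
    print(f"{card}: {gflops} GFLOPS")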
 
Using the GFLOPS number is a great idea, but we should still try to figure out the RAC of each card. Maybe eventually we can come up with a formula for RAC per GFLOPS.
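As a first stab with numbers already posted in this thread (RAC from the first post, GFLOPS from the table and BOINC messages; the pairings are approximate, so treat the ratios as ballpark only):

Code:
# Rough RAC per GFLOPS from figures posted in this thread.
data = [
    ("9800GT", 3000, 60),                       # ~3000 RAC, est. 60 GFLOPS
    ("GTX260", 6000, 100),                      # no CPU core reserved
    ("GTX260 (CPU core reserved)", 9000, 100),  # reserving a core helps a lot
]

for card, rac, gflops in data:
    print(f"{card}: ~{rac / gflops:.0f} RAC per GFLOPS")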
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up? If so, that would make a good number to keep track of then.

Here's the estimated Gflops on my 3 machines that are running the gpu client:

260GTX - (driver version 19062, compute capability 1.3, 896MB, est. 100GFLOPS) (updated)

8800 GTS 512 - (driver version 19062, compute capability 1.1, 512MB, est. 77GFLOPS)

GeForce 9800 GTX+ - (driver version 19062, compute capability 1.1, 512MB, est. 84GFLOPS)
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up?


Yes, that number will change with an overclock. I thought I read somewhere that BOINC was looking at implementing a co-processor time count. That would easily solve our little debacle. :)
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up? If so, that would make a good number to keep track of then.

Here's the estimated Gflops on my 3 machines that are running the gpu client:

260GTX - (driver version 19062, compute capability 1.3, 896MB, est. 87GFLOPS)

8800 GTS 512 - (driver version 19062, compute capability 1.1, 512MB, est. 77GFLOPS)

GeForce 9800 GTX+ - (driver version 19062, compute capability 1.1, 512MB, est. 84GFLOPS)

GTX260 looks a bit low compared to the other 2...
 
Does the estimated Gflops message also take into account any overclock on the card? In other words, is that number being given after a quick calculation test when you start the client up? If so, that would make a good number to keep track of then.

No, I don't think it does; it would have to run a benchmark, which it doesn't seem to. I think they have a lookup table of some sort. You could test it easily enough, I guess, by starting the client with no GPU OC and then again with one.
 
Yeah, it did look a bit low. I just closed out BOINC and restarted it, and it gave me 100 GFLOPS this time, which sounds a bit better. I'll update the previous post.

My superclocked 260s show 104 GFLOPS each. Is your 260 a vanilla? Slight difference for a slight factory OC. My 285s aren't crunching right now, so I can't be certain, but from memory they're around 127 or 137.

BOINC's GFLOPS estimation is probably based on something simple, like the clock readings of the card. A quick query could obtain that without running a benchmark.
 
I don't know if that would be true. I have my 9500GT overclocked to 760 MHz core and 1850 MHz shader, and BOINC only gives me around 22 GFLOPS for my card.

By the way, my little 9500GT has done the last 100 or so work units in 10-12 minutes each. Is it just me, or have CUDA work units sped way up recently?
 
My superclocked 260s show 104 GFLOPS each. Is your 260 a vanilla? Slight difference for a slight factory OC. My 285s aren't crunching right now, so I can't be certain, but from memory they're around 127 or 137.

BOINC's GFLOPS estimation is probably based on something simple, like the clock readings of the card. A quick query could obtain that without running a benchmark.

Yeah, mine's a vanilla 55 nm EVGA model. I clocked it up a little more today, and after restarting, BOINC showed 103.

I don't know if that would be true. I have my 9500GT overclocked to 760 MHz core and 1850 MHz shader, and BOINC only gives me around 22 GFLOPS for my card.

By the way, my little 9500GT has done the last 100 or so work units in 10-12 minutes each. Is it just me, or have CUDA work units sped way up recently?

I think CUDA processing has sped up quite a bit with the latest drivers and CUDA apps. My main system ran CUDA for about 6 months before I upgraded the drivers and apps, and it was averaging around 9-10k RAC; now it's at almost 14k and still climbing. Same video card, just new drivers and an updated BOINC client and apps.
 
BOINC's GFLOPS estimation is probably based on something simple, like the clock readings of the card. A quick query could obtain that without running a benchmark.

I don't know if that would be true. I have my 9500GT overclocked to 760 MHz core and 1850 MHz shader, and BOINC only gives me around 22 GFLOPS for my card.

The stream processor count is definitely factored into the GFLOPS number as well.
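In fact, the values reported so far fit a simple shaders-times-shader-clock formula pretty well. The 0.375 factor below is just fitted to the numbers in this thread, not taken from BOINC or SETI source code, and the GTX260 row assumes a 216-SP card at its stock 1242 MHz shader clock:

Code:
# Guessed estimator, fitted to values reported in this thread:
# est. GFLOPS ~= shaders * shader clock (GHz) * 0.375
def est_gflops(shaders, shader_mhz):
    return shaders * (shader_mhz / 1000.0) * 0.375

print(est_gflops(32, 1850))   # 9500GT @ 1850 shader -> 22.2 (reported ~22)
print(est_gflops(128, 1625))  # 8800GTS 512 stock    -> 78.0 (reported 77)
print(est_gflops(216, 1242))  # GTX260, 216 SP stock -> 100.6 (reported 100)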
 