bang for buck CUDA? advice?


Codeman05
04-05-09, 11:50 AM
I have enough parts it looks like to put up another CUDA rig. I'm not sure what the best bang for the buck is in terms of the cards to put in it, so looking to you all for advice!

I'm pretty new to the whole CUDA scene, so while I'm sure a gtx260 crunches faster than a 9800gtx for example....is it a significant difference? Or does an 8800gt do the job?

It is a tri-16x slot board, so I'm looking to put in 2-3 cards. CPU is a clunker e2200 (maybe 4500 have to check again). What would you guys recommend going with?

Thanks for the input

EDIT
So far the 8800GT or equivalent (9800gtx, etc) seems to be the way to go.
My GTX285's crank out CUDAs in 7-9 mins, whereas my 8800gt's seem to do 18-22 mins and can be had for 60-80 bucks a pop if you're lucky.

For reference, my 8800GS's do roughly 35-40 mins.

# of shaders is key for CUDA apps.

nzaneb
04-14-09, 08:49 AM
I've been wondering this same thing...
Do you know what a single one of your 285's does? I'm going to fire up a 8600GT soon, so maybe we can do our own little price/performance comparison. Seeing how the 8600GT is about as low as you can go on the CUDA list, and your 285's are about at the top, we might have a decent comparison. Maybe someone with a midrange card can chime in with some numbers too.
This is my first CUDA enabled card, which I picked up on the cheap, specifically for SETI... hoping it works well. I've got plenty of ATI power... too bad I can't use it :)

Codeman05
04-14-09, 09:43 AM
My understanding of CUDA...and I'm pretty new to this...is that the more stream processors (shaders) the better. Shader speed is also very important.

Now to give you an idea, here is the shader count on some popular CUDA cards
GTX280/285: 240
GTX260: 192
9800GTX: 128
8800GT: 112
8800GS: 96
8600GT: 32

Now I don't have a real precise way of knowing what my GPU's are individually putting out. My RAC hasn't stabilized yet and due to the CPU's crunching at the same time and the whole pending credit mess (over 100k right now), it's hard to say at this point.

The best way to test this would be to set up a rig with CUDA only and let it run 24 hours under a different PC name so that you know exactly what it did. Then with that exact same rig, run the alternate card you want to test. If you guys are really interested in something like this, I may be able to put one card in my X5200 rig sometime next week following the aforementioned procedure. Of course, not all WU's are created equal, but I assume that if you look at the point value, you should have a decent comparison...rather than comparing WU volume.....if there is real interest in experimenting with a few different cards, I can do this with a GTX285, 8800gs, and 8800gt but it won't be until the end of the month.


Now I have a rig running 8800GS's at the moment as well. I can pretty well surmise from watching the two machines closely for several days that the GTX285's are easily a minimum of 2x as fast...card per card. This does correlate pretty well with the shader count...among other things (higher clocks etc...). Another note is that SETI shows my overclocked 8800GS's worth 50Gflops each...whereas my GTX285s show to be 130Gflops each stock. Obviously the higher the better...
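As a rough check on that correlation, the shader counts and the per-card Gflops quoted in this post can be put side by side. This is only a sketch of the arithmetic, using the numbers given above and nothing measured independently:

```python
# Shader counts from the list above and the per-card Gflops BOINC reported.
shaders = {"GTX285": 240, "8800GS": 96}
reported_gflops = {"GTX285": 130, "8800GS": 50}  # 8800GS figure is overclocked

shader_ratio = shaders["GTX285"] / shaders["8800GS"]
gflops_ratio = reported_gflops["GTX285"] / reported_gflops["8800GS"]

print(f"shader count ratio:    {shader_ratio:.2f}x")
print(f"reported Gflops ratio: {gflops_ratio:.2f}x")
```

Both ratios land a little above the "minimum of 2x" observed on the two machines, which fits the idea that shader count is a reasonable first guide.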


All of that said, the 285's and the GTX series in general are damn fast...but I'm not sure I'd recommend putting them in a CUDA only cruncher. The price:RAC ratio is still a bit high. This rig is also my gaming rig, and I got the cards for a good price, so I was more or less able to justify it with 2 points of attack ;) Though it does crunch far more than it games.


I think the 8800 series, and probably the 8800GT particularly...would be the best bang for the buck. I've seen them new for $89 and I'm sure you can do better used.
If prices on the GTX260 continue to drop, that would probably be another good cuda bargain before long.

Of course, another good question would be watt/rac ratio for each card and how that plays into the price/performance mix.
For example (and this is totally theoretical). 2x 8800GTs together may consume more power than a single GTX285 that is putting out slightly higher rac.


Again, I'm new to CUDA, so please correct me if any of this seems out of whack. I've been looking it over pretty heavily over the last 1-2 weeks though so it should be fairly close ;)

shabado
04-14-09, 09:42 PM
72813

I have an ASUS EN9800GT. It completes a standard WU in 20 minutes and a shortie in a little over 8 minutes.

Every time a SETI WU is completed an entry goes into the job log. In XP it is located in "C:\Documents and Settings\All Users\Application Data\BOINC". The Application Data folder is a hidden folder. You must show hidden files and folders to see it.

Col 1:1239635091.875000 is the time in seconds since Jan 1 1970 that the entry was written.

Col 2: 344.039352 I think is time in seconds but I don't yet fully understand it.

Col 3: 80.234380 Is CPU time required to manage the CUDA card.

Col 4: 23780000000000.000000 is flops, I think.

Col 5: 13fe09aa.8280.13160.14.8.41_1 is the WU file name.

Col 6: 501.640625 is the clock time in seconds that it took to complete the WU.

Most of the shorties were completed in about 500 seconds.
Most normal AR WU's were completed in 1180 seconds.
Toward the end of the file there are 3 WU's that took over 5000 seconds. These are VLAR WU's which are very tough for the CUDA card. Raistmer over at KWSN lunatics has an optimized version which automatically deletes the VLAR's.

In about 33 hours my 9800GT completed 103 WU's
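The column description above can be turned into a small parser. This is a minimal sketch that assumes each entry is one line of six whitespace-separated values in the order listed in the post; the class and function names are made up for illustration, and if the real file puts short labels between the values the split would need adjusting.

```python
from dataclasses import dataclass


@dataclass
class JobLogEntry:
    written_at: float       # Col 1: seconds since Jan 1 1970
    col2_seconds: float     # Col 2: meaning unclear in the post above
    cpu_seconds: float      # Col 3: CPU time spent managing the CUDA card
    flops_estimate: float   # Col 4: believed to be flops
    wu_name: str            # Col 5: WU file name
    elapsed_seconds: float  # Col 6: clock time to complete the WU


def parse_job_log_line(line: str) -> JobLogEntry:
    """Split one log line into the six columns described in the post."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}")
    return JobLogEntry(
        float(parts[0]), float(parts[1]), float(parts[2]),
        float(parts[3]), parts[4], float(parts[5]),
    )


# The example values quoted in the post, joined into one line.
sample = ("1239635091.875000 344.039352 80.234380 "
          "23780000000000.000000 13fe09aa.8280.13160.14.8.41_1 501.640625")
entry = parse_job_log_line(sample)
print(f"{entry.wu_name}: {entry.elapsed_seconds / 60:.1f} min elapsed, "
      f"{entry.cpu_seconds:.0f} s CPU")
```

Sorting the parsed entries by elapsed_seconds would make the roughly 500 second shorties, the roughly 1180 second normal units, and the 5000+ second VLARs easy to pick apart.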

nzaneb
04-15-09, 08:25 AM
Thanks for the tips Shabado. As soon as I get the 8600GT I'll report in with the same information. Although I run AP exclusively, I'll have to switch it over for an equal test bed.

SuperMiguel
04-15-09, 08:40 AM
are CUDA's mainly for programmers?

QuietIce
04-15-09, 09:15 AM
I wouldn't say so. There was a lot of confusion and problems with the launch (as there are with most new applications) but other than that it's relatively straight-forward to use ...

Voodoo Rufus
04-30-09, 01:02 PM
Between a single GTX285 and a pair of GTS250s, which would probably be the faster cruncher overall? I have an IP35 with 2 spare PCIE slots to fill.

Codeman05
04-30-09, 01:18 PM
A pair of 250's should be a bit faster.

According to tech report, Gflops rating (which is the main factor in crunching performance) for a GTS250 is 484 Gflops. A GTX285 is 744 Gflops. Since the 250 has roughly half the number of shaders (128), that sounds about right.

So you would likely be looking at about 15-25% more output with a pair of 250s vs a single GTX285.

I have some 8800GT's (fairly similar to a GTS250 in terms of CUDA), they are about half the speed of my 285s...so that seems to confirm the above as well. The GTS250s will be a bit faster than that, so two would be faster than a single 285.

YRMV of course
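For reference, the paper comparison behind that estimate is just the two Gflops ratings quoted above. A quick sketch of the arithmetic, which says nothing about real-world crunching overhead:

```python
# Gflops ratings as quoted in the post (attributed there to Tech Report).
gts250_gflops = 484
gtx285_gflops = 744

pair_gflops = 2 * gts250_gflops
advantage = pair_gflops / gtx285_gflops - 1

print(f"2x GTS250: {pair_gflops} Gflops vs GTX285: {gtx285_gflops} Gflops")
print(f"paper advantage for the pair: {advantage:.0%}")
```

The ratings alone come out higher than the 15-25% quoted in the post, so that estimate presumably leaves some room for two cards not scaling perfectly.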

nzaneb
05-01-09, 08:35 AM
Seeing how ATi GPU support keeps getting pushed back, I just picked up a 9800GX2 for the cause :) Hopefully I can get it optimized correctly.

Codeman05
05-01-09, 09:44 AM
what's your userid on S@H?

nzaneb
05-01-09, 11:15 AM
what's your userid on S@H?

N-Zane-B... I only started crunching a few months ago, so Total credits I'm still a ways down the list (172 today), but daily output I'm in the top 25... definitely not top 5 like you...yet :)

Codeman05
05-01-09, 11:22 AM
Cool, yea i was just curious...I want to see what that GX2 does :)
That addition may get you up into the top 10 or at least really close :beer:

nzaneb
05-01-09, 02:37 PM
Any tips on getting the optimizations to work? I've been trying to get my other CUDA rig working properly, but haven't been very successful. It's only an SSE2 P4 with an 8600GT. I've been trying to use Raistmer's optimizations, but I can never get them working correctly. I've dropped the files into the projects folder, and added a cc_config file with an added CPU. After my attempt last night I actually had 4 tasks running at the same time?! If I set it back to no optimizations... it doesn't seem to be running on the GPU very much. It gets stuck for an hour or two and then it decides to do a CUDA WU, then it waits a bit more, and does another? I'm hoping the newer Host and the 9800GX2 will be easier to optimize.

Codeman05
05-01-09, 04:43 PM
it can be a real PITA. there have been many updates to the optimized app...however the files are usually buried in the middle of a 20 page thread, so kinda hard to find. It took me a few days to get my AP/CUDA rigs running correctly.

When you set up your new rig, let me know. I have a fully configured SSSE3 AP/MB/CUDA app running well on two of my crunchers. I can just zip it all up and send it to you. All you would have to do is make a couple of minor changes to your app_info file.

That way you don't have to dig through all of the threads and start from scratch

QuietIce
05-01-09, 07:10 PM
You could copy/paste that app_info file - hint, hint ...! ;)

Codeman05
05-01-09, 07:32 PM
Ok well here is what I'm using, maybe it will help.
It is basically a collection of Raistmer's most recent mods (AKv8, MBv11, etc...).
I zipped all the files into the zip below, you just need to follow a few steps to make it work for you.

Do this at your own risk, I don't see how this can cause a problem, but I accept no responsibility for you hosing BOINC, your PC, etc...
I've been using this configuration for almost a month now and my single rig RAC is at 16.5k and climbing...so it works rather well.

TO INSTALL UNMODIFIED REQUIRES
1) BOINC 6.6.20
2) 64bit O/S
3) A SSSE3 capable CPU
4) CUDA capable GPU(s)
You can switch to x86 and even SSE2...just replace the AP apps (leave CUDA alone) and change the app_info to reflect the new file names.


I zipped all of the "opt app" files I'm using, download here:
http://codeman05.net/ocf/s@h-SSSE3x-CUDA.zip

PHASE 1
Now before you do this, go into BOINC and do the following:
1) Transmit all completed WUs
2) Suspend Network Activity
3) Allow no new tasks
4) Close BOINC


PHASE 2
1) Extract the contents of the zip file.
2) Edit the app_info.xml file (NOTEPAD ONLY)
3) Edit all of the FLOP values using the info below

Astropulse 500 flops is CPU FLOPS * 2.25
Astropulse 503 flops is CPU FLOPS * 2.6
Multibeam 603 flops is CPU FLOPS * 1.75
Multibeam 608 flops is GPU FLOPS * 0.2 (IE: 40Gflops =40,000,000,000 x 0.2)


The CPU flops value can be found in the "client_state.xml" file. Look for the p_fpops entry. The GPU flops can be found in the Messages tab in BOINC when you first fire up the client. Note that if you have multiple GPUs, you only use the Gflops value of a single card.

4) Edit the Number_of_GPUs file....put in the number of GPUs. Note that I don't believe this is used anymore, but it can't hurt.

5) Save these files and copy EVERYTHING into the projects/seti@home folder (where the WUs are stored).
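The multipliers in step 3 can be applied with a few lines of Python. This is a sketch under a couple of assumptions: the function name is made up, the CPU value is truncated to a whole number before multiplying, and results are truncated again so no decimals end up in app_info.xml. The CPU value in the example call is hypothetical; the 40 Gflops GPU value is the one used as an example in step 3.

```python
from fractions import Fraction

# Multipliers from step 3, keyed by the app version number.
CPU_MULTIPLIERS = {
    500: Fraction("2.25"),  # Astropulse 500
    503: Fraction("2.6"),   # Astropulse 503
    603: Fraction("1.75"),  # Multibeam 603
}
GPU_MULTIPLIER = Fraction("0.2")  # Multibeam 608 (CUDA)


def app_info_flops(cpu_flops, gpu_gflops):
    """Return whole-number flops values keyed by version number."""
    cpu = int(cpu_flops)  # drop any decimals from the CPU value
    values = {ver: int(cpu * mult) for ver, mult in CPU_MULTIPLIERS.items()}
    # Use the Gflops of a single card, even with several GPUs installed.
    values[608] = int(Fraction(str(gpu_gflops)) * 10**9 * GPU_MULTIPLIER)
    return values


# Hypothetical CPU value of 2.5 billion flops with a 40 Gflops card.
for version, flops in app_info_flops(2_500_000_000, 40).items():
    print(version, flops)
```

Exact fractions are used instead of floats so the printed numbers match what hand multiplication gives, digit for digit.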


PHASE 3
1) Go to the S@H website and make sure you have the following preferences enabled:

If no work for selected applications is available, accept work from other applications? yes
Use Graphics Processing Unit (GPU) if available yes
Use Central Processing Unit (CPU) yes


2) Start BOINC. Pay attention to the messages screen and make sure you don't see any errors.

3) Assuming you have no errors, Enable network activity and new tasks. However I strongly recommend that you RESET the project at this time as well.


You should get AP wu's pretty fast. You may not get CUDA (or vice versa) immediately, give it some time.

If everything is working correctly you should now have a process running on each CPU core and each GPU. So on a C2D with two GPUs, you'd have 4 processes running for example


***DO NOT BE ALARMED IF YOU GET COMPUTATION ERRORS***
This is most likely VLAR kill. IE: VLAR WU's complete no faster on a GPU than a CPU, so the optimized app "kills" the task so you don't waste your time with it. That said, be sure to check the results on the seti@home website to make sure it says that VLAR kill was the reason and it was not a configuration error on your part.

Codeman05
05-01-09, 07:39 PM
Finally...if you have an SSE3 or SSE2 pc, you can use the above as a template for getting that to work. I did this for one of my older AMD rigs that has a couple of GPUs on it. It was slightly more complicated, as you have to locate the SSE2 optimized app files and then modify the app_info file to call those files instead.

QuietIce
05-01-09, 07:47 PM
That's straight-forward enough ... Thanks! :)

Codeman05
05-01-09, 07:48 PM
yep, if any of you use the above, please let me know how it goes.

Also as a side note, the MB CUDA file says V10, but it is the most recent v11 file released last week...I just renamed it for simplicity's sake ;)

Crunch on :beer:

QuietIce
05-01-09, 08:03 PM
OK, swapped x64 SSSE3 apps for AK_v8_win_SSE3 apps and changed the names in the app_info file. Changed CUDA app back to V11 (just to keep things straight!) and changed its names in app_info. All set to swap vid cards and update SETI first thing in the morning. That'll give me all day tomorrow & Sunday to keep my eye on it. ;)

For this rig it'll be really easy to tell what kind of output difference the apps make. It's an old, but very reliable, Opty 165 - my file server - and it's been running 24/7 for over two years now so there's a long history for comparison. :) Over that time it's back-slipped slightly (changes at SETI) from 1250 RAC in late 2006 to its current 1150-1200 RAC :( (and it's on AK_v8).


I think I'm ready to go ...! :beer:

Codeman05
05-01-09, 08:22 PM
sweet! Yep sounds like you should be golden, keep us posted!

nzaneb
05-01-09, 10:36 PM
You're a life saver man! Now I'm anxious to try it out :) Thanks!:beer:
EDIT: It appears to be working MUCH faster on my sse2 rig. I'll keep an eye on it and see what kind of results I get. 9800gx2 will be soon. Thanks for the help Codeman!

nzaneb
05-05-09, 07:37 AM
Well I've been struggling with this 8600GT, it just didn't seem to be doing much work.
So I went back through Codeman's write up, and lo and behold there it was staring me right in the face.

Astropulse 500 flops is CPU FLOPS * 2.25
Astropulse 503 flops is CPU FLOPS * 2.6
Multibeam 603 flops is CPU FLOPS * 1.75
Multibeam 608 flops is GPU FLOPS * 0.2
My mistake was to use the CPU flops for the 608 calculation, not the GPU flops. So a quick recalculation, input, and WHOA! HUGE difference! CUDA units are now only taking around 46 minutes each, as opposed to several hours before.

So I have a question for Codeman or any other GPU Guru. What happens if I overestimate my GPU flop value? For instance, what if I use a .3 or .4 multiplier instead of .2, or maybe use 16 Gflops as opposed to the stated 15 Gflops? Will I get computation errors? Is there a possibility that it will just run even faster?

Codeman05
05-05-09, 08:05 AM
glad you got it fixed.

My understanding is that the flops values in the app_info are primarily used to determine your queue size and make sure you have enough work for your rig.

Feel free to try different variables. I'm skeptical of seeing much if any performance increase, but I could be wrong.

nzaneb
05-05-09, 10:03 AM
glad you got it fixed.

My understanding is that the flops values in the app_info are primarily used to determine your queue size and make sure you have enough work for your rig.

Feel free to try different variables. I'm skeptical of seeing much if any performance increase, but I could be wrong.

You're probably right about the performance increase. I'm not sure that there isn't some performance gain though, from correct values. For the past few days it hardly completed 3 or 4 CUDA wu's, and since last night when I reconfigured it, it's cranked out at least 10. So something about the flop value change sped it up.

Codeman05
05-05-09, 10:11 AM
right i don't doubt you there at all. The wrong values will definitely confuse the apps and screw things up as you saw initially. I'm just not sure what the tolerances are.

If you decide to play with those values, definitely post it up

nzaneb
05-05-09, 01:43 PM
right i don't doubt you there at all. The wrong values will definitely confuse the apps and screw things up as you saw initially. I'm just not sure what the tolerances are.

If you decide to play with those values, definitely post it up

Over lunch, I adjusted the GPU flop value to where it was much closer to the actual computation time. So after crunching all night and morning on the last flop value change, I should have a little something to compare to, and see if my recent adjustment makes any change in the actual computation times... Now I'm anxious to get home and see how it's doing :)

nzaneb
05-05-09, 08:46 PM
No noticeable change in computation times, simply closer on the estimated finishing times. So evidently extremely low flop values have a very negative effect, but that's all.

QuietIce
05-05-09, 11:07 PM
We'll see if that helps my old Opty.

I changed the flops again just using the existing flops in the app_info file v the finish times as a base. I played with the numbers until finishing times were more in-line with what they should be ... :)

nzaneb
05-07-09, 01:24 PM
Well... I fired up the new 9800GX2 last night! It seems to be optimized correctly. It's slowed down my AP units on the 720 by a little bit, but that's to be expected due to the GPUs needing a little bit of the CPU. So far it's gone through 130 WU's in about 19 hours, while still completing AP units in around 50,000 CPU seconds. A lot of the early CUDA WU's were shorties, but it's still running through quite a few. I can't wait to see what it does to my RAC on that Host. It's absolutely smashing the 8600GT performance in my old P4 rig... but that's to be expected:) Now I want another... :screwy:

Robbman
05-07-09, 04:55 PM
Just based on cost per stream processor and factoring in architecture, the best bang for buck CUDA card would be a GTX 260 216. Don't know if I'd want three in a rig though...

I've seen 9800GTX+s at $100 with MIR, so those would be another very good bang for the buck. Though if the price was right, a GTS 250 would be my choice, simply because it only needs one 6-pin connector (three of those would be a decent cruncher)

Voodoo Rufus
05-11-09, 12:34 PM
I've just finished following codeman's instructions, and this thing is acting funny. It's running 8 AP units simultaneously and ripping through Cuda units like nobody's business, faster than I've ever seen it. Shouldn't it only be using 4 AP units on a quad core QX9650? A few of my AP units have ended in computation errors as well.

nzaneb
05-11-09, 12:40 PM
I've just finished following codeman's instructions, and this thing is acting funny. It's running 8 AP units simultaneously and ripping through Cuda units like nobody's business, faster than I've ever seen it. Shouldn't it only be using 4 AP units on a quad core QX9650? A few of my AP units have ended in computation errors as well.

Do you have Hyperthreading enabled? That would be the only reason I could imagine that 8 AP units would be going at once.
As far as the CUDA, how fast are we talking here? a few seconds, a couple minutes. It could be that it picked up a bunch of VLAR's to start with... mine did that... and it's killing each of them off as soon as it realizes what they are.

Voodoo Rufus
05-11-09, 12:43 PM
Hyperthreading is not available on Yorkfields, only i7s and some P4s.

Cudas are finishing normally in a couple minutes, taking only 22 seconds of CPU time.

APs are dying all at 5:25 of CPU time.

nzaneb
05-11-09, 12:50 PM
Hyperthreading is not available on Yorkfields, only i7s and some P4s.

Cudas are finishing normally in a couple minutes, taking only 22 seconds of CPU time.

My bad. Can you tell I'm not an Intel guy?:) I've got 2 P4's running, both with HT, and I knew the I7 had it... so I just assumed all current Intels did. That's what I get for ASSuming :), so that's probably why you're getting computation errors. What optimizations did you use?

For the GPU, it may be running ok. For comparison, my 9800GX2 takes 10 minutes (give or take a minute) per WU, for each gpu; and depending on the unit as little as 60 CPU seconds.

Voodoo Rufus
05-11-09, 12:53 PM
I'm just using Codeman's instructions to the letter.

At first I considered that my CPU was at fault as it's running a little warmer than just doing MB units, but with ALL aborting at 5:25, it makes me scratch my head.

nzaneb
05-11-09, 01:04 PM
Did you go in and change the Flop values? I'm 'assuming' your proc is SSSE3X capable. I used his format, but I had to go in and replace the SSSE3X apps (and change the app_info) for use on my SSE3 and SSE2 units. I had the Flop values wrong on one and it went wacko.

Codeman05
05-11-09, 01:12 PM
something is definitely off. I'm not sure that having incorrect flop values would do that, but its possible.

Looking at your errored out WU's, something is definitely wrong with the AP processing.
I'm fairly sure the QX series supports SSSE3x, but double check with cpu-z. Assuming it does, post up your app_info.xml file, more than likely there is an error inside.

Voodoo Rufus
05-11-09, 01:14 PM
I could not find a "p_flops" string in the client state file so I used the biggest flops number I could find. I did find a p_fpops string. Think that would work?

My cpu does support ssse3x.

Voodoo Rufus
05-11-09, 01:14 PM
<app_info>
<app>
<name>astropulse</name>
</app>
<file_info>
<name>ap_5.00r103_SSE3.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse</app_name>
<version_num>500</version_num>
<flops>48511399689315</flops>
<file_ref>
<file_name>ap_5.00r103_SSE3.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>astropulse_v5</name>
</app>
<file_info>
<name>ap_5.03r112_SSE3.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v5</app_name>
<version_num>503</version_num>
<flops>56057617418764</flops>
<file_ref>
<file_name>ap_5.03r112_SSE3.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>AK_v8_win_x64_SSSE3x.exe</name>
<executable/>
</file_info>
<file_info>
<name>MB_6.08_mod_CUDA_V10.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft.dll</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-1-1a_upx.dll</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>603</version_num>
<platform>windows_intelx86</platform>
<flops>37731088647245</flops>
<file_ref>
<file_name>AK_v8_win_x64_SSSE3x.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.127970</avg_ncpus>
<max_ncpus>0.127970</max_ncpus>
<flops>21200000000</flops>
<plan_class>cuda</plan_class>
<file_ref>
<file_name>MB_6.08_mod_CUDA_V10.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart.dll</file_name>
</file_ref>
<file_ref>
<file_name>cufft.dll</file_name>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
</app_version>
</app_info>
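A quick way to eyeball the flops entries in a file like the one above is to list them per app version. This is a sketch using Python's standard XML parser on a trimmed copy of the posted file (only the fields needed here are kept); the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted app_info.xml: only name, version and flops.
SAMPLE = """<app_info>
  <app_version>
    <app_name>astropulse</app_name>
    <version_num>500</version_num>
    <flops>48511399689315</flops>
  </app_version>
  <app_version>
    <app_name>astropulse_v5</app_name>
    <version_num>503</version_num>
    <flops>56057617418764</flops>
  </app_version>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <flops>37731088647245</flops>
  </app_version>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <flops>21200000000</flops>
    <plan_class>cuda</plan_class>
  </app_version>
</app_info>"""


def list_flops(xml_text):
    """Return (app_name, version_num, plan_class, flops) per app_version."""
    root = ET.fromstring(xml_text)
    rows = []
    for av in root.iter("app_version"):
        rows.append((
            av.findtext("app_name"),
            int(av.findtext("version_num")),
            av.findtext("plan_class", default="cpu"),
            float(av.findtext("flops")),
        ))
    return rows


for name, version, plan, flops in list_flops(SAMPLE):
    print(f"{name:22s} {version} {plan:5s} {flops:.3e}")
```

Printed in scientific notation, the three CPU entries show up in the 1e13 range while the CUDA entry is around 2e10, which makes an out-of-range value easy to spot.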

nzaneb
05-11-09, 01:14 PM
What OS is it installed on. The included AK v8 is for x64. Not sure if installing it on a 32 would mess with anything else.

nzaneb
05-11-09, 01:16 PM
I could not find a "p_flops" string in the client state file so I used the biggest flops number I could find. I did find a p_fpops string. Think that would work?

My cpu does support ssse3x.

That's the one that I use, and it seems to work out.

Codeman05
05-11-09, 01:19 PM
it's on XP64....which should be fine though I've never tried it on that o/s.

Flop values are wayyyyyy too high, what's the first entry of <p_fpops> in your client_state.xml ?

QuietIce
05-11-09, 01:20 PM
I've just finished following codeman's instructions, and this thing is acting funny. It's running 8 AP units simultaneously and ripping through Cuda units like nobody's business, faster than I've ever seen it. Shouldn't it only be using 4 AP units on a quad core QX9650? A few of my AP units have ended in computation errors as well.

Strange that the QX would do it but not the Q and I just checked - SETI shows it as 4 cores, not 8. *frown*


Just thought of something - you might want take a look at your client_state file. It'll be in the base BOINC folder - not projects. There's a line under host_info (about seven lines down in the file) that reads (on my old Opty): <p_ncpus>2</p_ncpus>

Might want to check and make sure yours says '4' there and not '8' ...

Voodoo Rufus
05-11-09, 01:23 PM
<p_fpops>4192500487.904749</p_fpops>

I'll recalculate the numbers and insert them and see what happens.

p_ncpus is 4
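Both values can also be pulled out of the file with a short script instead of searching by hand. This is a sketch that assumes the tags sit somewhere under the root element; the sample below is a made-up fragment holding only the two values reported in this thread, not a full client_state.xml.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: only the two tags discussed in the thread.
SAMPLE = """<client_state>
  <host_info>
    <p_ncpus>4</p_ncpus>
    <p_fpops>4192500487.904749</p_fpops>
  </host_info>
</client_state>"""


def read_host_values(xml_text):
    """Return (p_ncpus, p_fpops) from the first matching tags found."""
    root = ET.fromstring(xml_text)
    ncpus = next(root.iter("p_ncpus")).text
    fpops = next(root.iter("p_fpops")).text
    return int(ncpus), float(fpops)


ncpus, fpops = read_host_values(SAMPLE)
print(f"p_ncpus = {ncpus}, p_fpops = {fpops:.0f}")
```

To run it against a real file, read the file's text and pass it to read_host_values; the first p_fpops entry is the one asked for above.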

nzaneb
05-11-09, 01:24 PM
it's on XP64....which should be fine though I've never tried it on that o/s.

Flop values are wayyyyyy too high, what's the first entry of <p_fpops> in your client_state.xml ?

Also, what Flops value shows up for your GPU (not sure which you're using) in the Messages tab.

Voodoo Rufus
05-11-09, 01:28 PM
106GFlops for each GPU. I'm running a pair of GTX295s.

I had my cc_config.xml still active with 8 cpus listed (for Raistmer's V10 app). I disabled it and I now have 4 AP/603s running presently instead of 8.

Codeman05
05-11-09, 01:30 PM
yea, n_cpu's is no longer needed in cc_config, I'd recommend you remove it.

APs working now?

Voodoo Rufus
05-11-09, 01:30 PM
Nah, I fixed the 4 vs. 8 thing.

Do I need to reset the project? I have a hefty amount of work in the queue already. Will decimals cause problems if I left them in there?

Codeman05
05-11-09, 01:33 PM
they might, I've never seen anyone leave them in. I seem to recall Raistmer saying not to leave them in.

Your corrected flop values should be something like:
AP 500: 9433126095
AP 503: 10900501266
MB 603: 7336875852


I would recommend resetting the project. If you see a few errors in a row, kill it.

You can try without though first.


If that still errors, i think I'd do a fresh install of BOINC, making sure to nuke the BOINC/DATA folder as it sounds like something may be conflicting with the new apps.

nzaneb
05-11-09, 01:37 PM
106GFlops for each GPU. I'm running a pair of GTX295s.

I had my cc_config.xml still active with 8 cpus listed (for Raistmer's V10 app). I disabled it and I now have 4 AP/603s running presently instead of 8.

:drool: Now that's a rig! You should be 20,000+ on that thing alone! I'm glad you're on OUR team :)

Are the GPU's still screaming through units. You may have just gotten a bunch of shorties.

Voodoo Rufus
05-11-09, 01:40 PM
Yeah, it's my winter space heater at 600W continuous consumption. :D

The quad is running a hair under 4GHz on air cooling, also.

The highest I've gotten was 17k RAC on it and it never leveled out. Stupid instabilities due to OCing and my crazy work schedule do not help with getting a rig to play nice 24/7 sometimes. The GPUs are running stock speeds now unfortunately.

Project is reset and downloading tasks slowly. I'll report back on how it works out.

Voodoo Rufus
05-11-09, 01:57 PM
APs are moving along nicely now and my CUDA's are between 17 and 35 seconds.

Thanks for the help, guys.

Codeman05
05-11-09, 01:59 PM
good to hear!

Voodoo Rufus
05-11-09, 04:12 PM
This is kind of odd, but rather minor. Some of my Cuda units are taking ~12 minutes to complete. I'm guessing these are Vlars that are not killed?

Codeman05
05-11-09, 05:16 PM
sounds normal.

7-12 minutes are about average for the GTX280,285,295.
Those usually generate about 32-48 credits/wu.

You may get the occasional 2-4 min WU or less (only worth ~10 credits or less).
If a VLAR slips through, you might see 30 minutes.

You can take a look at your tasks on the seti@home website to verify. Note that it shows CPU time however, not elapsed time. A 10 min CUDA wu will probably only show 120-200 seconds of CPU time.

For reference, I'd say about 90% of my cuda units are taking 9-10 mins per gpu, you should be similar, so sounds like you're on pace.

QuietIce
05-11-09, 07:31 PM
If we're ever to figure out a bang for buck comparison we need some intermediate card speeds. I've got the low end covered - it takes the 8400GS forever to crunch a WU (~110 min) and the top end cards are running 9-10 minutes ...

Codeman05
05-11-09, 07:36 PM
my 8800gt's seem to be about 20 mins on average. 8800gs's about 35 mins

QuietIce
05-11-09, 07:41 PM
Hmmm, that information, along with the higher-end card times, might be a nice addition to the first post ... ;)

Codeman05
05-11-09, 08:34 PM
nah, make em work for it ;)

...edited #1

Voodoo Rufus
05-11-09, 09:52 PM
Looks to me like VLARs are not getting killed.

nzaneb
05-11-09, 09:54 PM
8600GT is taking right around 46 minutes for a 54 credit WU.

Codeman05
05-12-09, 07:53 AM
Looks to me like VLARs are not getting killed.

The v11 cuda app (the one in my file you're using) has been pretty good about that, how long are they taking?

With 4 GPU's, you may see slightly slower times as well due to the increased number of i/o hits. Each GPU needs some CPU time to get going, and with all CPUs crunching as well, the more GPUs you toss in, the longer wall time you will see on cuda units.

I looked at a few dozen of your completed WU's and nothing looks out of the ordinary, times look decent and angles are in green, .3-.4.

Watch your angle range on completed units, around 0.1 and lower is considered a VLAR (0.05 being true vlar). A couple may slip through, but I rarely see any.
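Those rules of thumb can be written down as a tiny helper for sorting through completed units. This is a sketch only: the cutoffs are the approximate ones from this post ("around 0.1 and lower", with 0.05 as a true VLAR), and the function name and labels are made up.

```python
def classify_angle_range(angle_range):
    """Label a WU by its true angle range using the rough cutoffs above."""
    if angle_range <= 0.05:
        return "true VLAR"
    if angle_range <= 0.1:
        return "VLAR"
    return "normal"


# 0.3 and 0.4 are the ranges seen on the completed units mentioned above.
for ar in (0.04, 0.08, 0.3, 0.4):
    print(f"{ar}: {classify_angle_range(ar)}")
```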

QuietIce
05-12-09, 09:22 AM
8600GT is taking right around 46 minutes for a 54 credit WU.

Got numbers on those 35-45 CS units? I've got a slew of them (especially 38-39 CS units) that both GPU and CPU's crunch so I thought everybody else got lots of them as well? I don't think the GPU has crunched any 54 CS units ...

Codeman05
05-12-09, 09:32 AM
CS? credit score?

I think it's just luck of the draw. I get a lot of 39 cred units, but some 50s and even 19s =/
The amount of processing time is relative to the credit score, so for the most part it's all relative.

QuietIce
05-12-09, 09:40 AM
CS? credit score?

CobbleStones!

At least, that's the term the Berkeley people use most often.
I bet you thought that was just something I dreamed up for the Milestones thread! :p


It seems all of my rigs have those 35-45 CS units - s939 single and DC, AM2, AM2+, Q6600, and now CUDA - all of them, 1/3 to 1/2 of which are the 38/39 CS units. I just figured the 38/39 units were really popular so they'd be a good WU to use as a reference ... :shrug:

Codeman05
05-12-09, 09:52 AM
ahhh....durrrrr... :banghead:

Where are you seeing the CS value for a WU?

Voodoo Rufus
05-12-09, 10:12 AM
The vast majority are taking 13-20 seconds of elapsed time while very few (~5%) are taking 10-17 minutes of elapsed time.

Codeman05
05-12-09, 10:16 AM
The vast majority are taking 13-20 seconds of elapsed time while very few (~5%) are taking 10-17 minutes of elapsed time.

If you look at the seti website, you're only getting .1 credits for those units taking 13-20 seconds...those that are taking 10+ minutes are giving you 30-40 credits.

A GTX295 is basically two GTX280's on a single card, so 10-17 minutes per wu is in line for a standard WU.

So nothing is wrong, you just have a lot of quick WU's in your queue :)
Once you run through those, you will likely start getting a vast majority of 10-15 minute units.


In the end, it all averages out, so I wouldn't worry about it.

QuietIce
05-12-09, 11:06 AM
ahhh....durrrrr... :banghead:

Where are you seeing the CS value for a WU?

That's what the "credits" are, they're CobbleStones. You still see it used on the message boards sometimes.
http://en.wikipedia.org/wiki/BOINC_Credit_System#cobblestones


And "CS" (some people probably use 'c' or 'C') is a lot easier to type than "credits". I also like using it because I get credit in CS, not credit in credits, so there's no mix-up about what someone is saying. :)



BTW - Thanks for adding the info in the first post ... :):thup:

Voodoo Rufus
05-12-09, 11:12 AM
Bah, I always thought that those .01 units were all of my computation error ones. :(

Can you explain this message on my .01s?

Work Unit Info:
...............
WU true angle range is : 0.509319
SETI@Home Informational message -9 result_overflow
NOTE: The number of result
</stderr_txt>

Codeman05
05-12-09, 11:16 AM
Gotcha, thanks for clarifying :beer:

QuietIce
05-12-09, 12:58 PM
Well, after looking through those rough numbers you posted and :banghead: for an hour, I can't get any type of linear relationship out of it. I tried shaders, core speeds, both, adding in RAM, throwing in a fudge factor for RAM - it's just not linear. :(

Obviously there's more at work there and I would assume it's the core, with later cores being intrinsically more efficient than earlier ones. Makes it hard to predict the value of new cards, though. :-/


If I've got it right the RAC for the cards listed falls roughly like this:
8800GS ~1500 (~$55)
8800GT ~2900 (~$90)
GTX285 ~7000 :eek:
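
One way to read those figures for bang for buck is RAC per dollar. A minimal sketch, using only the rough RAC and price numbers listed above; the GTX285 is left out because no price was given for it, and the variable and function names are just for illustration:

```python
# Rough RAC and approximate price figures from the list above.
# The GTX285 is omitted because no price was given for it.
cards = {
    "8800GS": {"rac": 1500, "price": 55},
    "8800GT": {"rac": 2900, "price": 90},
}

def rac_per_dollar(rac, price):
    """Return recent average credit (RAC) per dollar spent on the card."""
    return rac / price

for name, c in cards.items():
    print(f"{name}: {rac_per_dollar(c['rac'], c['price']):.1f} RAC per dollar")
```

On these rough numbers the 8800GT comes out a little ahead per dollar, but the inputs are ballpark guesses, so the gap is well within the noise.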

Codeman05
05-12-09, 01:14 PM
I'd say those RAC figures look pretty accurate. I'll have a better idea once my rigs stabilize, but that's probably pretty close in the ballpark.

QuietIce
05-12-09, 03:21 PM
Based on the vertex/geometry shader count I expected more from my 8400GS. As it turns out, the pixel shader count, along with the core clock, seems much closer to a performance indicator. For most cards that distinction doesn't make a difference since they all have the same vertex/geometry/pixel shader count ratio.

As a very rough calculation for the 8xxx (and probably 9xxx as well) the pixel shader count * core clock / 4 ±10% gets you in the ballpark for RAC. That's the best I could do with a formula and it's really not very good. The GTX ends up with a 3 divisor instead of 4 - again, a very very rough calculation ...
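
For anyone who wants to plug in their own card, the rule of thumb above can be sketched like this. It assumes the core clock is in MHz (the post doesn't say), and the example inputs are hypothetical placeholders, not looked-up specs:

```python
def estimate_rac(pixel_shaders, core_clock_mhz, divisor=4, tolerance=0.10):
    """Very rough RAC estimate: pixel shaders * core clock / divisor.

    Returns (low, centre, high), where low/high are centre -/+ tolerance.
    Assumption: core_clock_mhz is the core clock in MHz.
    """
    centre = pixel_shaders * core_clock_mhz / divisor
    return centre * (1 - tolerance), centre, centre * (1 + tolerance)

# Hypothetical inputs for illustration only; look up your card's real numbers.
low, centre, high = estimate_rac(pixel_shaders=16, core_clock_mhz=600)
print(f"8xxx-style card: {low:.0f} to {high:.0f} (centre {centre:.0f})")

# The post suggests a divisor of 3 instead of 4 for the GTX cards.
low, centre, high = estimate_rac(pixel_shaders=32, core_clock_mhz=650, divisor=3)
print(f"GTX-style card: {low:.0f} to {high:.0f} (centre {centre:.0f})")
```

The divisor and the ±10% band come straight from the post, so the output is only as good as that very rough fit.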

Voodoo Rufus
05-12-09, 03:41 PM
Remember that the core clock and shader clocks are different.

gpureview.com lists the shader clocks as well as core clocks for cards.

nzaneb
05-12-09, 08:16 PM
Got numbers on those 35-45 CS units? I've got a slew of them (especially 38-39 CS units) that both GPU and CPU's crunch so I thought everybody else got lots of them as well? I don't think the GPU has crunched any 54 CS units ...

CPU seconds for those units are anywhere from 347 to 370. That is all that the 8600GT seems to get. 54.44 claimed credit units, granted credit is usually 39, 46, or 54.

QuietIce
05-12-09, 10:26 PM
Remember that the core clock and shader clocks are different.

gpureview.com lists the shader clocks as well as core clocks for cards.

Yea, I looked at the shader speeds as well - but the shader clocks on the 8800's are both running at core clock x2.5 so those numbers would fall the same. The GTX285 is different but the multiplier is even less, which means the numbers would drop instead of go up like they need to.

I've been using the Wiki for hardware data. It seems comprehensive and it's updated as new cards come out:
http://en.wikipedia.org/wiki/Comparison_of_NVIDIA_Graphics_Processing_Units

CPU seconds for those units are anywhere from 347 to 370. That is all that the 8600GT seems to get. 54.44 claimed credit units, granted credit is usually 39, 46, or 54.

And all those units from 39-54 take that same (real-time BOINC, not CPU) compute time? Well, that makes it hard to compare ... :-/

nzaneb
05-16-09, 09:14 AM
OK... now that I'm hooked on GPU crunching, and nVidia is the only CURRENT solution, I am looking for some suggestions.
Here's what I have (GPU wise):
9800GX2
2x 4850 Top
My motherboard will not handle SLI setups (CFX only), but it will allow multiple nVidia GPU crunching with Dummy plugs.
Here are my options:

1. Sell both 4850's and buy another 9800GX2 = about an even trade
2. Sell the 4850's and the 9800GX2, plus some cash for a GTX295 = $150 more
3. Sell the 4850's and buy 2 GTX 260's = $50-100 more

I'll see the most stream processors (688 total) from the addition of 2 GTX260's, but cooling and space might be an issue. Second place for SPs (512 total) would be trading straight up for another 9800GX2. Last place for SPs, but the best for power and cooling, will be the GTX 295 (480 total).
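
The totals above can be double-checked with a few lines. A minimal sketch: the GTX 295 count is the one stated in the post, while the per-card counts for the 9800GX2 and the GTX 260 are assumptions inferred from the post's totals (512 for two GX2s, 688 for one GX2 plus two GTX 260 216 cards):

```python
# Stream processors per card. The GTX 295 figure is stated in the post; the
# 9800GX2 and GTX 260 figures are assumptions inferred from the post's totals.
SP = {"9800GX2": 256, "GTX260-216": 216, "GTX295": 480}

# Cards that would end up in the rig under each of the three options.
options = {
    "1. second 9800GX2": ["9800GX2", "9800GX2"],
    "2. single GTX 295": ["GTX295"],
    "3. 9800GX2 + two GTX 260s": ["9800GX2", "GTX260-216", "GTX260-216"],
}

def total_sp(cards):
    """Sum the stream processors across a list of card names."""
    return sum(SP[c] for c in cards)

# Print the options from most to fewest stream processors.
for name, cards in sorted(options.items(), key=lambda kv: total_sp(kv[1]), reverse=True):
    print(f"{name}: {total_sp(cards)} stream processors")
```

Raw SP count ignores shader clock and core generation, so treat the ranking as a starting point rather than a prediction of RAC.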

Just looking for everyone's thoughts...

Codeman05
05-16-09, 09:49 AM
FYI, the board does not have to support SLI to have multiple crunchers. I have 2 8800GTs in a P45 board :D

I think I'd say go with a pair of GTX260s. That will also give you great gaming performance for when you're not crunching. But another GX2 wouldn't be bad either if you can do a straight up swap.

nzaneb
05-19-09, 12:42 PM
Well I pulled the trigger on 2 EVGA GTX260 216's today:) Plus I traded one of my 4850's for a 9800GTX+ to replace the 8600GT in my P4 rig. Now I've just got to figure out where to put all these video cards.
I think I'm going to try the 260's and the 9800GX2 in my main cruncher, and then put the 9800GTX+ in the P4... or maybe I should switch the 9800GX2 and the 9800GTX+. Either way I'm going to have an extra 8600GT, hmmmmm.

QuietIce
05-19-09, 12:49 PM
Time to look through the Classifieds and find a cheap PCIe system - preferably dual or better PCIe ... :)

nzaneb
05-19-09, 01:14 PM
That's what I'm doing right now :)

I'm also trying to find Torin's dummy plugs for sale, but can't seem to locate them anywhere.

Sir-Epix
05-20-09, 04:26 PM
You could always PM him.

On a side note: I do not have any video cards in any of my systems (except for a mobile 4850 in my laptop). I am interested in maybe picking one up, but I don't know what my desktop system can currently handle.

It only has a 300 watt Antec power supply that is about 5 years old. The motherboard is an ASRock AliveNF6P-VSTA paired with a 45W X2 processor. Any suggestions? The whole unit only cost $150 to set up.

nzaneb
05-22-09, 08:04 AM
Evidently the post was too old, so it wasn't shown :shrug:, but I got ahold of him.

Your PSU probably doesn't have any PCI-e connectors, but you could use a molex converter. Depending on the rest of your components, I would think you could handle a single 9 series GPU or slower.
Here's a PSU calculator that I use... it's pretty accurate, with a lot of options:
http://extreme.outervision.com/psucalculatorlite.jsp

nzaneb
05-22-09, 08:14 AM
So I received and installed my 2 EVGA GTX260's. I bought the cheapest 216 55nm versions with the 576 MHz core clock. Lo and behold, last night I'm running EVGA Precision, and I notice that the cards are running overclocked (625 core). So I reset to factory defaults (thinking it kept my settings from the 9800GX2) and they WERE running at the factory defaults. Hopped on Newegg, and sure enough, they're running the exact speed as the more expensive Superclocked Editions. So I checked the boxes, but they're labeled as the base model. Newegg says, "We shipped according to the boxes, so they're yours". So EVGA packaged me up a nice little bonus :beer: I'm liking that company more and more by the second.

These things are FLYING compared to the 9800GX2 (which I was impressed with... at first). We'll see where my RAC on that machine goes. The 9800GX2 only netted me around a 2500 increase (it seemed close to topping out). I think that has a lot to do with it slowing down my CPU so much that the AP units weren't finishing as fast. I'm hoping the 260's land me in the 12k-13k RAC range for that rig.

QuietIce
05-22-09, 11:05 AM
I've used eVGA several times over the years without a problem ... :)

QuietIce
06-09-09, 11:18 AM
bump for good info ... ;)