
bang for buck CUDA? advice?

OK, swapped the x64 SSSE3 apps for the AK_v8_win_SSE3 apps and changed the names in the app_info file. Changed the CUDA app back to V11 (just to keep things straight!) and changed its names in app_info. All set to swap vid cards and update SETI first thing in the morning. That'll give me all day tomorrow & Sunday to keep my eye on it. ;)

For this rig it'll be really easy to tell what kind of output difference the apps make. It's an old, but very reliable, Opty 165 - my file server - and it's been running 24/7 for over two years now, so there's a long history for comparison. :) Over that time it's slipped back slightly (changes at SETI) from 1250 RAC in late 2006 to its current 1150-1200 RAC :( (and it's on AK_v8).


I think I'm ready to go ...! :beer:
 
You're a life saver man! Now I'm anxious to try it out :) Thanks!:beer:
EDIT: It appears to be working MUCH faster on my SSE2 rig. I'll keep an eye on it and see what kind of results I get. The 9800GX2 will be going in soon. Thanks for the help Codeman!
 
Well I've been struggling with this 8600GT, it just didn't seem to be doing much work.
So I went back through Codeman's write-up, and lo and behold, there it was staring me right in the face.
Astropulse 500 flops is CPU FLOPS * 2.25
Astropulse 503 flops is CPU FLOPS * 2.6
Multibeam 603 flops is CPU FLOPS * 1.75
Multibeam 608 flops is GPU FLOPS * 0.2
My mistake was to use the CPU flops for the 608 calculation, not the GPU flops. So a quick recalculation, input, and WHOA! HUGE difference! CUDA units are now only taking around 46 minutes each, as opposed to several hours before.
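For anyone else following along, here's roughly how that math falls out, as best I understand it (the numbers and the surrounding XML are just an example sketch, not copied from my rig): BOINC lists the 8600GT at around 15 GFLOPS, so the 608 entry is 15e9 * 0.2 = 3e9. In the app_info.xml that ends up as the <flops> line inside the 608 app_version block, something like:

  <app_version>
      <app_name>setiathome_enhanced</app_name>
      <version_num>608</version_num>
      <flops>3.0e9</flops>  <!-- GPU rated ~15 GFLOPS x 0.2 -->
      <!-- file_ref entries for the CUDA exe and DLLs go here, exactly as in Codeman's write-up -->
  </app_version>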

So I have a question for Codeman or any other GPU guru. What happens if I overestimate my GPU flops value? For instance, what if I use a 0.3 or 0.4 multiplier instead of 0.2, or maybe use 16 GFLOPS as opposed to the stated 15 GFLOPS? Will I get computation errors? Is there a possibility that it will just run even faster?
 
Glad you got it fixed.

My understanding is that the flops values in the app_info are primarily used to determine your queue size and make sure you have enough work for your rig.

Feel free to try different variables. I'm skeptical of seeing much if any performance increase, but I could be wrong.
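
Rough sketch of what I think is going on under the hood (example numbers, not anyone's actual values): the client estimates a task's run time as the work unit's rsc_fpops_est divided by the flops value in app_info, roughly

  estimated run time ≈ rsc_fpops_est / flops
  e.g. 28e12 / 3.0e9 ≈ 9,300 seconds (a bit over 2.5 hours)

If the flops value is way too low, every task looks enormous, the estimates balloon, and the scheduler won't keep enough work on hand.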
 
You're probably right about the performance increase. I'm not sure there isn't some performance gain from correct values, though. For the past few days it hardly completed 3 or 4 CUDA WUs, and since last night when I reconfigured it, it's cranked out at least 10. So something about the flops value change sped it up.
 
Right, I don't doubt you there at all. The wrong values will definitely confuse the apps and screw things up, as you saw initially. I'm just not sure what the tolerances are.

If you decide to play with those values, definitely post it up.
 
Over lunch, I adjusted the GPU flops value to where it was much closer to the actual computation time. So after crunching all night and morning on the last flops value change, I should have a little something to compare to, and see if my recent adjustment makes any change in the actual computation times... Now I'm anxious to get home and see how it's doing :)
 
No noticeable change in computation times, simply closer on the estimated finishing times. So evidently extremely low flop values have a very negative effect, but that's all.
 
We'll see if that helps my old Opty.

I changed the flops again, just using the existing flops in the app_info file vs. the finish times as a base. I played with the numbers until the finishing times were more in line with what they should be ... :)
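
For anyone wanting to do the same thing, the adjustment I'm effectively making works out to something like this (example numbers only):

  new flops ≈ old flops x (estimated time / actual time)
  e.g. 3.0e9 x (10,800 s / 2,700 s) = 1.2e10

In other words, if a unit was estimated at 3 hours but actually finishes in 45 minutes, scale the flops entry up by about 4x so the estimates line up.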
 
Well... I fired up the new 9800GX2 last night! It seems to be optimized correctly. It's slowed down my AP units on the 720 by a little bit, but that's to be expected due to the GPUs needing a little bit of the CPU. So far it's gone through 130 WUs in about 19 hours, while still completing AP units in around 50,000 CPU seconds. A lot of the early CUDA WUs were shorties, but it's still running through quite a few. I can't wait to see what it does to my RAC on that host. It's absolutely smashing the 8600GT performance in my old P4 rig... but that's to be expected :) Now I want another... :screwy:
 
Just based on cost per stream processor and factoring in architecture, the best bang for buck CUDA card would be a GTX 260 216. Don't know if I'd want three in a rig though...

I've seen 9800GTX+s at $100 with MIR, so those would be another very good bang for the buck. Though if the price was right, a GTS 250 would be my choice, simply because it only needs one 6-pin connector (three of those would make a decent cruncher).
 
I've just finished following Codeman's instructions, and this thing is acting funny. It's running 8 AP units simultaneously and ripping through CUDA units like nobody's business, faster than I've ever seen it. Shouldn't it only be running 4 AP units on a quad-core QX9650? A few of my AP units have ended in computation errors as well.
 
Do you have Hyperthreading enabled? That would be the only reason I could imagine that 8 AP units would be going at once.
As far as the CUDA goes, how fast are we talking here? A few seconds? A couple of minutes? It could be that it picked up a bunch of VLARs to start with... mine did that... and it's killing each of them off as soon as it realizes what they are.
 
Hyperthreading is not available on Yorkfields, only i7s and some P4s.

Cudas are finishing normally in a couple minutes, taking only 22 seconds of CPU time.

APs are all dying at 5:25 of CPU time.
 
My bad. Can you tell I'm not an Intel guy? :) I've got two P4s running, both with HT, and I knew the i7 had it... so I just assumed all current Intels did. That's what I get for ASSuming :). If it's really trying to run 8 AP units at once on a quad core, that's probably why you're getting computation errors. What optimizations did you use?

For the GPU, it may be running OK. For comparison, my 9800GX2 takes 10 minutes (give or take a minute) per WU for each GPU, and, depending on the unit, as little as 60 CPU seconds.
 
I'm just using Codeman's instructions to the letter.

At first I considered that my CPU was at fault, as it's running a little warmer than when it was just doing MB units, but with ALL of them aborting at 5:25, I'm scratching my head.
 
Did you go in and change the Flop values? I'm 'assuming' your proc is SSSE3X capable. I used his format, but I had to go in and replace the SSSE3X apps (and change the app_info) for use on my SSE3 and SSE2 units. I had the Flop values wrong on one and it went wacko.
 
Something is definitely off. I'm not sure that having incorrect flops values would do that, but it's possible.

Looking at your errored-out WUs, something is definitely wrong with the AP processing.
I'm fairly sure the QX series supports SSSE3x, but double-check with CPU-Z. Assuming it does, post up your app_info.xml file; more than likely there is an error inside.
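
For reference, the general shape of an AP entry in app_info.xml is below; treat it as a sketch to compare against rather than something to paste in, since the app name, version number, and especially the exe file names have to match your project folder and the write-up exactly (the names shown here are placeholders):

  <app>
      <name>astropulse</name>
  </app>
  <file_info>
      <name>ap_5.03_win_SSE3.exe</name>  <!-- placeholder: must match the optimized exe on disk, character for character -->
      <executable/>
  </file_info>
  <app_version>
      <app_name>astropulse</app_name>
      <version_num>503</version_num>
      <flops>6.5e9</flops>  <!-- CPU flops x 2.6, example value -->
      <file_ref>
          <file_name>ap_5.03_win_SSE3.exe</file_name>
          <main_program/>
      </file_ref>
  </app_version>

A typo in any of the name fields, or an entry pointing at an exe that isn't actually in the folder, is a common cause of tasks erroring out almost immediately.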
 
I could not find a "p_flops" string in the client state file so I used the biggest flops number I could find. I did find a p_fpops string. Think that would work?

My cpu does support ssse3x.
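
For reference, the bit of client_state.xml I'm looking at is inside the <host_info> block and looks roughly like this (my numbers swapped out for round examples):

  <host_info>
      <!-- other host fields omitted -->
      <p_fpops>2.4e9</p_fpops>  <!-- measured floating point speed per core, from the BOINC benchmark -->
      <p_iops>8.1e9</p_iops>
  </host_info>

So I'm assuming p_fpops is the per-core CPU number the multipliers above are meant to be applied to.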
 