
bang for buck CUDA? advice?

OK, swapped the x64 SSSE3 apps for the AK_v8_win_SSE3 apps and changed the names in the app_info file. Changed the CUDA app back to V11 (just to keep things straight!) and changed its names in app_info. All set to swap vid cards and update SETI first thing in the morning. That'll give me all day tomorrow & Sunday to keep my eye on it. ;)

For this rig it'll be really easy to tell what kind of output difference the apps make. It's an old, but very reliable, Opty 165 - my file server - and it's been running 24/7 for over two years now, so there's a long history for comparison. :) Over that time it's slipped back slightly (changes at SETI) from 1250 RAC in late 2006 to its current 1150-1200 RAC :( (and it's on AK_v8).


I think I'm ready to go ...! :beer:
 
You're a life saver man! Now I'm anxious to try it out :) Thanks!:beer:
EDIT: It appears to be working MUCH faster on my SSE2 rig. I'll keep an eye on it and see what kind of results I get. The 9800GX2 will be going in soon. Thanks for the help Codeman!
 
Well I've been struggling with this 8600GT, it just didn't seem to be doing much work.
So I went back through Codeman's write-up, and lo and behold, there it was staring me right in the face.
Astropulse 500 flops is CPU FLOPS * 2.25
Astropulse 503 flops is CPU FLOPS * 2.6
Multibeam 603 flops is CPU FLOPS * 1.75
Multibeam 608 flops is GPU FLOPS * 0.2
My mistake was to use the CPU flops for the 608 calculation, not the GPU flops. So a quick recalculation, input, and WHOA! HUGE difference! CUDA units are now only taking around 46 minutes each, as opposed to several hours before.
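For anyone else following along, here's roughly how that math falls out, as best I understand it (the numbers and the surrounding XML are just an example sketch, not copied from my rig): BOINC lists the 8600GT at around 15 GFLOPS, so the 608 entry is 15e9 * 0.2 = 3e9. In the app_info.xml that ends up as the <flops> line inside the 608 app_version block, something like:

  <app_version>
      <app_name>setiathome_enhanced</app_name>
      <version_num>608</version_num>
      <flops>3.0e9</flops>  <!-- GPU rated ~15 GFLOPS x 0.2 -->
      <!-- file_ref entries for the CUDA exe and DLLs go here, exactly as in Codeman's write-up -->
  </app_version>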

So I have a question for Codeman or any other GPU guru. What happens if I overestimate my GPU flops value? For instance, what if I use a 0.3 or 0.4 multiplier instead of 0.2, or maybe use 16 GFLOPS as opposed to the stated 15 GFLOPS? Will I get computation errors? Is there a possibility that it will just run even faster?
 
Glad you got it fixed.

My understanding is that the flops values in the app_info are primarily used to determine your queue size and make sure you have enough work for your rig.

Feel free to try different variables. I'm skeptical of seeing much if any performance increase, but I could be wrong.
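
Rough sketch of what I think is going on under the hood (example numbers, not anyone's actual values): the client estimates a task's run time as the work unit's rsc_fpops_est divided by the flops value in app_info, roughly

  estimated run time ≈ rsc_fpops_est / flops
  e.g. 28e12 / 3.0e9 ≈ 9,300 seconds (a bit over 2.5 hours)

If the flops value is way too low, every task looks enormous, the estimates balloon, and the scheduler won't keep enough work on hand.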
 
You're probably right about the performance increase. I'm not sure there isn't some performance gain from correct values, though. For the past few days it hardly completed 3 or 4 CUDA WUs, and since last night when I reconfigured it, it's cranked out at least 10. So something about the flops value change sped it up.
 
Right, I don't doubt you there at all. The wrong values will definitely confuse the apps and screw things up, as you saw initially. I'm just not sure what the tolerances are.

If you decide to play with those values, definitely post it up.
 
Over lunch, I adjusted the GPU flops value to where it was much closer to the actual computation time. So after crunching all night and morning on the last flops value change, I should have a little something to compare to, and see if my recent adjustment makes any change in the actual computation times... Now I'm anxious to get home and see how it's doing :)
 
No noticeable change in computation times, simply closer on the estimated finishing times. So evidently extremely low flop values have a very negative effect, but that's all.
 
We'll see if that helps my old Opty.

I changed the flops again, just using the existing flops in the app_info file vs. the finish times as a base. I played with the numbers until the finishing times were more in line with what they should be ... :)
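
For anyone wanting to do the same thing, the adjustment I'm effectively making works out to something like this (example numbers only):

  new flops ≈ old flops x (estimated time / actual time)
  e.g. 3.0e9 x (10,800 s / 2,700 s) = 1.2e10

In other words, if a unit was estimated at 3 hours but actually finishes in 45 minutes, scale the flops entry up by about 4x so the estimates line up.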
 
Well... I fired up the new 9800GX2 last night! It seems to be optimized correctly. It's slowed down my AP units on the 720 by a little bit, but that's to be expected due to the GPUs needing a little bit of the CPU. So far it's gone through 130 WUs in about 19 hours, while still completing AP units in around 50,000 CPU seconds. A lot of the early CUDA WUs were shorties, but it's still running through quite a few. I can't wait to see what it does to my RAC on that host. It's absolutely smashing the 8600GT performance in my old P4 rig... but that's to be expected :) Now I want another... :screwy:
 
Just based on cost per stream processor and factoring in architecture, the best bang for buck CUDA card would be a GTX 260 216. Don't know if I'd want three in a rig though...

I've seen 9800GTX+s at $100 with MIR, so those would be another very good bang for the buck. Though if the price was right, a GTS 250 would be my choice, simply because it only needs one 6-pin connector (three of those would make a decent cruncher).
 
I've just finished following Codeman's instructions, and this thing is acting funny. It's running 8 AP units simultaneously and ripping through CUDA units like nobody's business, faster than I've ever seen it. Shouldn't it only be running 4 AP units on a quad-core QX9650? A few of my AP units have ended in computation errors as well.
 
Do you have Hyperthreading enabled? That would be the only reason I could imagine that 8 AP units would be going at once.
As far as the CUDA goes, how fast are we talking here? A few seconds? A couple of minutes? It could be that it picked up a bunch of VLARs to start with... mine did that... and it's killing each of them off as soon as it realizes what they are.
 
Hyperthreading is not available on Yorkfields, only i7s and some P4s.

Cudas are finishing normally in a couple minutes, taking only 22 seconds of CPU time.

APs are all dying at 5:25 of CPU time.
 
My bad. Can you tell I'm not an Intel guy? :) I've got two P4s running, both with HT, and I knew the i7 had it... so I just assumed all current Intels did. That's what I get for ASSuming :). If it's really trying to run 8 AP units at once on a quad core, that's probably why you're getting computation errors. What optimizations did you use?

For the GPU, it may be running OK. For comparison, my 9800GX2 takes 10 minutes (give or take a minute) per WU for each GPU, and, depending on the unit, as little as 60 CPU seconds.
 
I'm just using Codeman's instructions to the letter.

At first I considered that my CPU was at fault, as it's running a little warmer than when it was just doing MB units, but with ALL of them aborting at 5:25, I'm scratching my head.
 
Did you go in and change the Flop values? I'm 'assuming' your proc is SSSE3X capable. I used his format, but I had to go in and replace the SSSE3X apps (and change the app_info) for use on my SSE3 and SSE2 units. I had the Flop values wrong on one and it went wacko.
 
Something is definitely off. I'm not sure that having incorrect flops values would do that, but it's possible.

Looking at your errored-out WUs, something is definitely wrong with the AP processing.
I'm fairly sure the QX series supports SSSE3x, but double-check with CPU-Z. Assuming it does, post up your app_info.xml file; more than likely there is an error inside.
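
For reference, the general shape of an AP entry in app_info.xml is below; treat it as a sketch to compare against rather than something to paste in, since the app name, version number, and especially the exe file names have to match your project folder and the write-up exactly (the names shown here are placeholders):

  <app>
      <name>astropulse</name>
  </app>
  <file_info>
      <name>ap_5.03_win_SSE3.exe</name>  <!-- placeholder: must match the optimized exe on disk, character for character -->
      <executable/>
  </file_info>
  <app_version>
      <app_name>astropulse</app_name>
      <version_num>503</version_num>
      <flops>6.5e9</flops>  <!-- CPU flops x 2.6, example value -->
      <file_ref>
          <file_name>ap_5.03_win_SSE3.exe</file_name>
          <main_program/>
      </file_ref>
  </app_version>

A typo in any of the name fields, or an entry pointing at an exe that isn't actually in the folder, is a common cause of tasks erroring out almost immediately.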
 
I could not find a "p_flops" string in the client state file so I used the biggest flops number I could find. I did find a p_fpops string. Think that would work?

My cpu does support ssse3x.
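
For reference, the bit of client_state.xml I'm looking at is inside the <host_info> block and looks roughly like this (my numbers swapped out for round examples):

  <host_info>
      <!-- other host fields omitted -->
      <p_fpops>2.4e9</p_fpops>  <!-- measured floating point speed per core, from the BOINC benchmark -->
      <p_iops>8.1e9</p_iops>
  </host_info>

So I'm assuming p_fpops is the per-core CPU number the multipliers above are meant to be applied to.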
 