Stumped: what did I forget to do?


Pete Church
03-18-11, 02:16 AM
So after I blew up my old CPU, I got a new one set up. I'm using the same OC profile as before, with Hyper-Threading turned off (I hate HT now). Same two GTX 460s, set at 2 tasks per card. Five cores are crunching and the sixth is dedicated to feeding the cards. The cards are OC'd the same as before and are running at about 95% GPU usage.

I can't think of anything I'm doing differently from before, but my GPU tasks are now taking 20-22 minutes to finish, where they used to take 18. That sucks, because the CPU cores are finishing a normal WU in about 25 minutes, so I'm not getting as much benefit out of the cards as I should.

Suggestions of something for me to look at?

Careface
03-18-11, 07:33 AM
Hmm.. I'd check whether the longer WUs have a slightly higher or lower AR (angle range), because CUDA cards don't crunch those very well (something about those ranges being slightly less FFT-intensive.. or maybe I'm thinking of something else :p)

On the flip side, is your RAC (recent average credit) actually dropping at all? Are the longer WUs claiming more credit?

Can't really think of anything else at the moment.. it's 1:30am and I'm tired :comp:

Pete Church
03-18-11, 11:06 AM
Yeah, I thought of that too, but I use the BOINC rescheduler to make sure all the VHAR and VLAR (very high/very low angle range) WUs get pushed to the CPU instead, so only the normal-angle WUs are going to the GPUs.
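A rescheduler's CPU/GPU split boils down to a rule like the one below. This is a minimal sketch, not the rescheduler's actual code: the AR cut-offs `VLAR_MAX` and `VHAR_MIN` are hypothetical placeholders (the thread doesn't give the exact values), and the `.vlar` name check assumes the tag Careface mentions further down appears somewhere in the work unit's name.

```python
# Hypothetical AR cut-offs; the real values a rescheduler uses are configurable
# and are NOT taken from this thread.
VLAR_MAX = 0.12   # assumed: AR below this is treated as "very low"
VHAR_MIN = 1.127  # assumed: AR above this is treated as "very high"


def classify(angle_range: float) -> str:
    """Label a work unit by its angle range (AR)."""
    if angle_range < VLAR_MAX:
        return "VLAR"
    if angle_range > VHAR_MIN:
        return "VHAR"
    return "normal"


def target_device(wu_name: str, angle_range: float) -> str:
    """Route tagged or out-of-range WUs to the CPU and normal-AR WUs to the GPU."""
    # Assumption: a pre-tagged VLAR carries ".vlar" somewhere in its name.
    if ".vlar" in wu_name or classify(angle_range) != "normal":
        return "CPU"
    return "GPU"


# Hypothetical WU names; 0.414548 and 0.01 are AR values from this thread,
# the other two are illustrative.
for name, ar in [("wu_a", 0.414548), ("wu_b", 0.01), ("wu_c.vlar", 0.05), ("wu_d", 1.5)]:
    print(f"{name}: AR={ar} ({classify(ar)}) -> {target_device(name, ar)}")
```

The AR check is what catches an unmarked VLAR like the 0.01 AR unit Careface describes below; a name check alone would miss it.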

It took a while to build up a cache; it looks like SETI was down for a power outage just as I was coming back online. So my RAC is slowly climbing, and it should get back to nearly normal, since I've also stopped running Rosetta on this rig (I want my RAC back up as fast as possible).

This isn't going to kill my RAC, but it is significant. At roughly 3 minutes longer per task than before, I think that works out to about 48 fewer WUs per day.
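The per-day estimate can be checked with a few lines of arithmetic. This sketch assumes the 18- and 21-minute figures are per-task wall-clock times with two tasks running on each of the two cards (4 concurrent tasks), and uses 21 minutes for "roughly 3 minutes longer".

```python
MINUTES_PER_DAY = 24 * 60


def wus_per_day(minutes_per_wu: float, cards: int, tasks_per_card: int) -> float:
    """Daily GPU throughput when each card runs tasks_per_card WUs side by side."""
    concurrent_tasks = cards * tasks_per_card
    return concurrent_tasks * MINUTES_PER_DAY / minutes_per_wu


before = wus_per_day(18, cards=2, tasks_per_card=2)  # 320 WUs/day
after = wus_per_day(21, cards=2, tasks_per_card=2)   # ~274 WUs/day
print(f"lost: {before - after:.1f} WUs/day")
```

Under those assumptions the loss comes out at about 46 WUs/day, in the same ballpark as the 48 estimated above.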

Tyrinon
03-18-11, 10:41 PM
Not sure if it's much consolation, but my i7's GPU times have also gone up by a couple of minutes on average (from quick glances at the numbers). I figured it was the current batch of WUs.

Careface
03-19-11, 05:14 AM
I'm seeing the same thing on my i7's GPU WUs. Usually they sit at around 12 mins, but now they're averaging 13-14 mins. I'm attributing this to the AR for these WUs being slightly lower (note I don't mean VLAR/VHAR, just slightly lower :)), sitting in the 0.405-0.41 range. Remember that 0.42-0.44 is the "sweet spot" where we see the highest performance; the further the AR moves from that range in either direction (that is, the telescope sweeping across the sky more slowly or more quickly, due to pointing direction, the Earth's rotation, etc.), the lower the performance.

Looking at your results, I'm going to stand by my original suggestion and say it's due to slightly lower AR:

23.4mins, AR = 0.369001 (http://setiathome.berkeley.edu/result.php?resultid=1843043527)
23.0mins, AR = 0.369017 (http://setiathome.berkeley.edu/result.php?resultid=1843043522)
27.1mins, AR = 0.322816 (http://setiathome.berkeley.edu/result.php?resultid=1842856496) (note the roughly 0.046 lower AR resulting in a longer processing time)
26.5mins, AR = 0.323099 (http://setiathome.berkeley.edu/result.php?resultid=1842856495)

On the flip side,

19.4mins, AR = 0.414548 (http://setiathome.berkeley.edu/result.php?resultid=1842485601)
19.7mins, AR = 0.414594 (http://setiathome.berkeley.edu/result.php?resultid=1842485547)
20.4mins, AR = 0.414410 (http://setiathome.berkeley.edu/result.php?resultid=1842431810)
20.1mins, AR = 0.415845 (http://setiathome.berkeley.edu/result.php?resultid=1842384236)
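To put a number on the trend, a least-squares line can be fitted to the eight results listed above. This is only a rough sketch: a straight line is a reasonable description over the 0.32-0.42 AR range shown here, but it shouldn't be extrapolated past the 0.42-0.44 sweet spot, where times are expected to rise again.

```python
# (AR, minutes) pairs copied from the eight results listed above.
results = [
    (0.369001, 23.4), (0.369017, 23.0), (0.322816, 27.1), (0.323099, 26.5),
    (0.414548, 19.4), (0.414594, 19.7), (0.414410, 20.4), (0.415845, 20.1),
]


def fit_line(points):
    """Ordinary least-squares fit of minutes = intercept + slope * AR."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(results)
print(f"{slope * 0.01:+.2f} min per +0.01 AR (valid only for roughly 0.32-0.42)")
```

With these eight points the slope comes out at roughly three quarters of a minute saved per 0.01 of extra AR.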

I can't get the rescheduler working on this rig for some reason, and a week or two back a 0.01 AR WU accidentally made its way onto my GPU. Not only did every application I ran lag badly, but I didn't realise that WU was the cause until I saw the card had been crunching it for a little over an hour and was only about 20% through. I knew VLAR/VHAR performance was rather poor on CUDA, but I didn't realise quite how bad it was. The entire PC was lagging; even basic tasks like dragging windows across the screen were jerky and unresponsive >_<
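Progress like that can be turned into a rough total-time estimate by linear extrapolation. This is a simple sketch, not how BOINC itself computes its estimates, and it treats "a little over an hour" as 1.0 h, so the real figure would be somewhat higher.

```python
def linear_eta_hours(elapsed_hours: float, fraction_done: float) -> tuple:
    """Return (estimated total, estimated remaining), assuming progress is linear."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours


total, remaining = linear_eta_hours(1.0, 0.20)
print(f"total ~{total:.1f} h, remaining ~{remaining:.1f} h")
```

That is about 5 hours for one WU, against roughly 20 minutes for a normal-AR unit: about fifteen times slower.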

Pete Church
03-19-11, 11:12 AM
OK, you may be on to something. I looked at the average credit of the GPU WUs I've returned over the last few days, and it looks like it has gone up from what I remember it being several weeks ago. That, plus the fact that I'm not the only one seeing increased times, suggests the lower AR may be just enough to cost the cards a few more minutes per WU.

I normally only pay attention to the VHAR/VLAR tasks and never noticed that, even within normal WUs, the angle could vary enough to sway the timing. Sounds like there's not much to worry about; I'll just keep an eye on it and see where my RAC ends up in a couple of weeks.
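For a feel of how long "a couple of weeks" takes to show the new level: BOINC's RAC is an exponentially decaying average with a one-week half-life. The sketch below assumes a constant daily credit after the change and uses a continuous approximation of that average rather than BOINC's exact update code, so treat the numbers as ballpark figures.

```python
HALF_LIFE_DAYS = 7.0  # BOINC's RAC half-life of one week


def gap_closed(days: float) -> float:
    """Fraction of the gap between the old RAC and the new steady level closed after `days`."""
    return 1.0 - 0.5 ** (days / HALF_LIFE_DAYS)


def projected_rac(old_rac: float, steady_daily_credit: float, days: float) -> float:
    """Approximate RAC after `days`, assuming a constant daily credit from day 0 onward."""
    decay = 0.5 ** (days / HALF_LIFE_DAYS)
    return steady_daily_credit + (old_rac - steady_daily_credit) * decay


for d in (7, 14, 28):
    print(f"after {d} days: {gap_closed(d):.0%} of the way to the new level")
```

So after two weeks RAC is about three quarters of the way to its new level, and it takes roughly four weeks to get within about 6% of it.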

Thanks guys, was a little baffled and somewhat worried.

Tyrinon
03-19-11, 06:24 PM
So... how do you set up the rescheduler so the GPUs don't get those longer units? My i7 has been down since yesterday. I swapped the PSU and added another GPU, and the whole system has been sluggish since. I'm reworking the overclock settings because the BIOS reset after the swap. What a pain. Anyway, I had the system shut off overnight and all day until a few minutes ago, and now the GPU work units have an estimated crunch time of almost an hour! I'm running two tasks per card on three GTX 460 1GB cards. I figured I had a bad OC, so I'm running Prime95 to check it. :(

Saaby900t
03-19-11, 08:47 PM
Tyrinon, I think you should check the NV drivers. When mine crash, the WUs end up taking hours.

eaglescouter
03-19-11, 08:58 PM
My GPU went to an hour per unit. I reinstalled the nVidia drivers and re-optimized BOINC, including editing the number of GPUs and CPUs. That brought me back to around 12 minutes.

Careface
03-19-11, 09:30 PM
If your GPU accidentally started crunching a VLAR/VHAR, that will cause system-wide lag. Fairly recently, since the VLARKill app fell into disrepute, the SETI team have been tagging these WUs with .vlar, and the latest Lunatics app ignores files carrying that tag, but sometimes an unmarked VLAR still gets through. I've not been able to get the rescheduler working on Windows 7, but hopefully someone can point us in the right direction!

Looking at some of your 1 hour+ GPU WUs, there doesn't seem to be much to tell. I can only suggest checking GPU-Z to see whether your card is dropping back to low-power 3D/2D mode every now and then (this wouldn't show up in the stderr output on the site; as far as I know, the app only records the card's clocks when the WU is loaded onto the GPU).

+1 on checking the NV drivers, as Saaby900t said. IIRC, when the drivers crash, the card restarts in low-power mode until the next reboot, even if you use RivaTuner to set the clocks.
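GPU-Z shows this interactively; as a scriptable alternative, `nvidia-smi` can report current clocks. This is a sketch with a swapped-in tool: it assumes a driver whose `nvidia-smi` supports `--query-gpu` (older GeForce drivers may report some fields as N/A), and `min_core_mhz` is a hypothetical threshold you would set just below the clock the card normally runs at while crunching.

```python
import csv
import io
import subprocess

QUERY = "index,name,clocks.gr,clocks.mem"


def parse_clocks(csv_text: str) -> list:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields:
            continue
        idx, name, core, mem = (f.strip() for f in fields)
        # int() raises ValueError if the driver reports a field as "[N/A]".
        gpus.append({"index": int(idx), "name": name,
                     "core_mhz": int(core), "mem_mhz": int(mem)})
    return gpus


def find_downclocked(gpus: list, min_core_mhz: int) -> list:
    """Return GPUs whose graphics clock is below the expected crunching clock."""
    return [g for g in gpus if g["core_mhz"] < min_core_mhz]


try:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for g in find_downclocked(parse_clocks(out), min_core_mhz=600):  # hypothetical threshold
        print(f"GPU {g['index']} ({g['name']}) at {g['core_mhz']} MHz: possibly in low-power mode")
except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as err:
    print(f"could not query or parse nvidia-smi: {err}")
```

Running it periodically while crunching would catch a card that drops to low clocks between WUs, which the stderr on the SETI site won't show.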

eaglescouter, out of interest, which card are you talking about? Your 285? If so, what drivers are you using (the SETI site says 260.99)? I'm using 266.77, and I've got some annoying problems with non-SETI stuff where I've ruled out everything except Windows 7 (I only upgraded from XP Pro a month ago) or the NV drivers. Perhaps you could shed some light :)

Tyrinon
03-19-11, 10:53 PM
Reinstalling the driver and re-optimizing sounds like a good idea (should have been a no-brainer... oh well, the last few days have been long ones). As for the cards in question, they're 3x GTX 460 1GB. I'll give that a try and report back. Hopefully the system doesn't abort all the WUs...

*edit*

My CPU OC tested fine at 3.7 GHz, my GPUs are running at stock and always have, and after reinstalling the video drivers and re-optimizing SETI, this thing is still a dog. Running one WU per GPU, I'm looking at 30-40 mins each. I must have done something wrong when I added this 3rd GPU, because this thing is horribly slow.

I'm running XP 64-bit, 6GB RAM, and RAID 0 with 7.2k SATA II drives on an Asus P6T. The XP splash animation used to cycle 3-5 times before I was at my desktop. Now it cycles 29-30 times, then shows a black screen for a few seconds, then sits at the loading screen for about another 20 seconds. It takes 20-30 seconds for IE8 to load or do anything.

I'll pull the plug on this for now and take it apart tomorrow to reseat everything and see if that makes a difference. I'll put in my two old 460s to see if the issue goes away, and if so, swap one of those out for the new one to see if the issue comes back. Right now PCIe is running 16/16/4. I don't see the x4 slot having an impact on everything else, but this is the first time I've run 3 video cards in one machine... IDK, maybe I somehow hosed my RAID. Will let you guys know what I find out, if anything, tomorrow. Night all.
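For a sense of scale on the 16/16/4 split, the nominal per-direction bandwidth of each slot is lanes times the per-lane rate. The per-lane rates below are the standard figures after 8b/10b encoding; which PCIe generation the x4 slot actually runs at is an assumption to check in the board manual, and whether SETI's CUDA app is bandwidth-limited at all isn't something this thread establishes.

```python
# Nominal per-lane, per-direction rates after 8b/10b encoding (MB/s).
MB_S_PER_LANE = {"1.x": 250, "2.0": 500}


def slot_bandwidth(lanes: int, gen: str) -> int:
    """Nominal one-direction bandwidth of a PCIe slot in MB/s."""
    return lanes * MB_S_PER_LANE[gen]


# The 16/16/4 split from the post; the generation of each slot is assumed, not confirmed.
for lanes, gen in [(16, "2.0"), (16, "2.0"), (4, "2.0"), (4, "1.x")]:
    print(f"x{lanes} PCIe {gen}: {slot_bandwidth(lanes, gen)} MB/s per direction")
```

An x4 link has a quarter of the bandwidth of an x16 link of the same generation, or an eighth if the x4 slot is PCIe 1.x and the x16 slots are 2.0.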

Pete Church
03-20-11, 09:20 AM
There are several reschedulers out there; this is the rescheduler (http://www.efmer.eu/forum_tt/index.php?topic=428.0) that I use. The post has links to the latest stable version (it works on Windows 7 for me), along with a fairly detailed description of the various options.

eaglescouter
03-20-11, 11:07 AM
My problem was on the GTX570 on XP.

Careface
03-20-11, 05:23 PM
Hmm. I've not messed around with SLI since the 6600GTs, but I hear people on the SETI forums saying that SLI lowers performance on SETI (rather than 2*100% performing GPUs, it'll be more like 2*80%, 3*70% or similar), so there's still an overall speed-up, but individual WU times will probably increase. I doubt it would cause as much slowdown as you're seeing, though.. :shrug: not sure, sorry! I'm about to head off to uni, so I'll mull it over during biological physics :p
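The scaling described above can be turned into numbers. This is a sketch of that rule of thumb only: the per-card efficiencies come from the post, while the 18-minute single-card time is a hypothetical baseline (borrowed from Pete's earlier figure) chosen purely for illustration.

```python
def multi_gpu_estimate(cards: int, per_card_efficiency: float, solo_minutes: float):
    """Return (effective number of full-speed cards, minutes per WU on each card)."""
    effective_cards = cards * per_card_efficiency
    minutes_per_wu = solo_minutes / per_card_efficiency
    return effective_cards, minutes_per_wu


for cards, eff in [(1, 1.0), (2, 0.8), (3, 0.7)]:
    total, per_wu = multi_gpu_estimate(cards, eff, solo_minutes=18.0)  # hypothetical baseline
    print(f"{cards} card(s) at {eff:.0%}: {total:.1f}x one card, {per_wu:.1f} min/WU")
```

Even at 70% per card, per-WU time only grows by a factor of 1/0.7, about 1.43, so this effect on its own is modest.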

Ahh, thanks for the rescheduler link, Pete :) I was using V2.4 and it was trashing all my WUs.. I guess there were some bugs fixed in V2.5.

Ah.. hmm, a GTX 570 on XP, then. I might just do a driver rollback anyway; I'm just reluctant to go too far back since the CUDA 4.0 SDK is coming out, which should boost performance on all CUDA cards (particularly Fermi-based ones, for all you 4xx/5xx owners :)), but it requires rather new drivers.. so I guess we'll see xD

Tyrinon
03-20-11, 10:40 PM
Well, after about 7.5hrs of troubleshooting, still in the same boat.

I swapped PSUs to my old psp750 for testing up to two video cards and triple-channel memory, then went to the Sparkle 1250 to test three video cards with the RAM (I had tested the Sparkle in another system first, since it was new, and it worked fine).

I tested each video card separately, then in pairs, then all three, and all were fine. I tested each DIMM in each orange memory slot (the only ones I have used), and each worked fine. I went to dual- then triple-channel memory, and those tested fine too (with one video card installed, then two). That's when I went back to the Sparkle.

Everything was fine up to the point of running three video cards and three DIMMs. There are no slowdowns with three video cards and two memory modules, but as soon as I add the third DIMM, the system goes to crap (to put it nicely). If I leave in the three DIMMs and pull a video card, the system works fine again.

Am I hitting some kind of bottleneck in the system? This happens whether I run the CPU at stock or OC settings. I'm at a loss on how to correct this, if it's even possible. :shrug:

I guess I'll let the system run with two DIMMs for now and see what happens. Shame there are no GPU WUs out there to crunch right now...

Time to take a break from this thing before I go :screwy:

lol, then again, I may already be there. :D

Sys specs for those interested:

i7 920 C0
Asus P6T
G.Skill DDR3 kit, 3x 2GB
Sparkle 1250W
3x MSI GTX 460 1GB (driver 266.58)
2x SATA 500GB 7.2k rpm drives in RAID 0
Philips DVD burner (IDE)
a few misc. system fans
...and I'm running Win XP 64-bit with all updates.

I haven't checked for an updated mobo BIOS, but I'll look into that and see if anything is available.

Careface
03-21-11, 12:32 AM
Interesting.. I'd try looking for an updated BIOS, though it seems to be a rather specific problem.. No harm in trying :) I'd also check out the 267.24 beta drivers. There's a huge list of bug fixes, including a lot of multi-GPU fixes, so maybe your problem got caught in the crossfire (shame we're not talking about AMD here :p) and is now fixed.

As for the bottleneck thing, it seems so weird for 3 GPUs + 2 DIMMs and 2 GPUs + 3 DIMMs to work, but 3 GPUs + 3 DIMMs not to.. I don't know much about the P6T, but perhaps putting the RAM in the black slots (are these the "compatibility" slots, or are the orange ones?) might help.

EDIT: as for my issue, I rolled back from 266.77 to 261.00 and it made no difference, so I'm guessing it's just a Windows 7 thing, given that the issue wasn't present on Win XP with the 261.00 drivers. Oh well :(

Tyrinon
03-26-11, 10:41 PM
Well, just wanted to give an update on Challenger. I finally had time tonight to update the BIOS from version 202 to 1404 (yes, a difference of 1202!) lol. At first glance, things appear to be working very well with the three 460s and 3x 2GB RAM. Windows boots after about six bars, and my internet browsing is back to where I expect it to be.

If this hadn't worked, I would have researched whether the MSI cards are PCIe x4 compatible. I don't see how that would affect everything, but who knows. A preliminary search didn't turn up much on it.

If things go belly up, I will just take one of these cards and stuff it into another system.

Now I can look forward to getting a system into the top 50 for SETI (unless it goes belly up). :burn: