Help! 1 of 4 clients WAAAYY underproducing!


torin3
03-25-09, 07:15 AM
Ok, I just got a new donated motherboard (thank you) that has more than 1 PCI-E slot. It is an EVGA 790i MB. I installed XP 32 on it after reformatting the hard drive, then installed the MB drivers, NIC drivers, anti-virus, Rivatuner, Folding@home GPU2, and the 182.08 drivers, and that is it. It has 2 GTX 295 cards in it, so 4 GPU clients. The clients all have the same settings except machine ID. The cards are running cool, and the whole machine is pulling about 550 Watts.

3 of the clients (gpu 0/gpu 2/gpu 3) are currently pulling about 6.5K ppd each. However, client 2 (gpu 1) is only pulling about 1.8K ppd.

I can't figure out why though. It occasionally starts out producing the same ppd as the others, but usually drops back down to the 1.8K level within 15 minutes.
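To put numbers on the gap, here is a minimal sketch (an illustration only, not part of any Folding@home tool). It takes the PPD figures quoted above, compares each client against the median, and flags anything running at less than half of it. The function name `find_underproducers` and the 0.5 cutoff are assumptions made up for this example.

```python
from statistics import median

def find_underproducers(ppd_by_gpu, threshold=0.5):
    """Return the GPUs whose PPD is below `threshold` times the median PPD.

    `threshold` is an arbitrary cutoff chosen for illustration.
    """
    typical = median(ppd_by_gpu.values())
    return {gpu: ppd for gpu, ppd in ppd_by_gpu.items()
            if ppd < threshold * typical}

# Approximate PPD figures from the post above.
readings = {"gpu0": 6500, "gpu1": 1800, "gpu2": 6500, "gpu3": 6500}

for gpu, ppd in find_underproducers(readings).items():
    share = ppd / median(readings.values())
    print(f"{gpu}: {ppd} ppd, about {share:.0%} of the typical client")
```

With these figures only gpu1 is flagged, at a bit over a quarter of what the other three clients produce.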

When I get home tonight, I'm going to shut down, swap the cards, and see if it moves with the card or stays on the same client.

If it moves with the card, I take it I should RMA it?

If it stays with the client, are there any suggestions as to what I can do to troubleshoot it? This is driving me crazy :bang head :bang head :bang head

Oh yeah, and I don't know if it is related, but when I tried to copy my media files back to the computer from an external USB drive, it locked the computer up unless I did it in safe mode. Also, occasionally the computer just locks up.

Adak
03-25-09, 07:27 AM
Sounds like either the video card or that mobo slot has a defect. What I usually do is strip the mobo down to the bare essentials and try to recreate the condition that causes it to lock up. In this case, that is copying media files from the USB drive.

Then move the same video card over to slot #2 and re-test. If it locks up in two slots, with two video cards that are known to be good in a similar mobo, then it's the mobo.

Otherwise, you have to eliminate the video cards one by one until you find the culprit. Remember to test each card in each slot until it's clear where the problem is.

Very frustrating! Good luck.
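The swap-and-retest idea above can be written down as a small decision rule. This is only a sketch under simplifying assumptions: each test is recorded as a (card, slot, ok) tuple, there is a single fault, and a dual-GPU card is treated as one unit. The names `locate_fault`, "card A", "card B", "slot 1" and "slot 3" are hypothetical labels for the example.

```python
def locate_fault(observations):
    """Decide whether a fault follows a card or a slot.

    observations: list of (card, slot, ok) tuples, one per test run.
    Returns "card:<name>", "slot:<name>", "no fault observed",
    or "inconclusive" when the tests cannot separate the two.
    """
    bad = [(card, slot) for card, slot, ok in observations if not ok]
    good = [(card, slot) for card, slot, ok in observations if ok]
    if not bad:
        return "no fault observed"

    bad_cards = {card for card, _ in bad}
    bad_slots = {slot for _, slot in bad}
    good_cards = {card for card, _ in good}
    good_slots = {slot for _, slot in good}

    # One card fails in more than one slot and never passes anywhere.
    if len(bad_cards) == 1 and len(bad_slots) > 1 and not bad_cards & good_cards:
        return "card:" + next(iter(bad_cards))
    # One slot fails with more than one card and never passes with any card.
    if len(bad_slots) == 1 and len(bad_cards) > 1 and not bad_slots & good_slots:
        return "slot:" + next(iter(bad_slots))
    return "inconclusive"

# Example: the slowdown stays in slot 1 after the two cards are swapped.
observations = [
    ("card A", "slot 1", False),
    ("card B", "slot 3", True),
    ("card B", "slot 1", False),
    ("card A", "slot 3", True),
]
print(locate_fault(observations))
```

A single test run is never enough to tell card from slot, which is why the rule returns "inconclusive" until at least one swap has been recorded.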

ozzlo
03-25-09, 07:32 AM
This is just a random idea without even researching the board or if my theory is even possible, but could this possibly be one of those boards that runs 16x/8x/8x... what if for some reason it was running 16x/16x/1x?

ok it's 5:30 in the morning... I think it's time for bed... my theories are starting to sound ridiculous

jonspd
03-25-09, 07:51 AM
My theory: change gpu 0's machine ID to gpu 1's and vice versa.

If gpu 1 stops getting low ppd and gpu 0 starts getting very low ppd, then I think it would be software. If not, it's hardware. Make sense?

StaTek
03-25-09, 07:57 AM
I agree with Adak's approach.

You could also open up the hardware monitoring on Rivatuner and see if you notice anything when the drop happens...like the shader dropping to 2D levels.

I haven't heard of a drop in production before and not getting EUEs after it...you sure are getting weird stuff with those 295s. :-/ Are you in SLI mode? Or do you have the 'do not use multi-gpu mode' marked under the nvidia control panel?

For the random lockups, I'd say look into raising the nForce SPP in bios. Or voltage levels on the memory...some brands like more powah (like OCZ). If that doesn't do the trick I'd read up on raising the nForce MCP.

Computekinc.us
03-25-09, 09:22 AM
I have the same problems using Rivatuner or Nvidia tools for overclocking after 2+ cards. Did you overclock? Did they work at stock clocks without Riva or Nvidia tools? I know when I lose a card it goes -50% or so, then I remove all overclocking tools and it's fixed. Hopefully this may be some help. I am going to just bios mod all of the cards and see if they work correctly. :bang head

Computekinc.us
03-25-09, 02:47 PM
I just reflashed all 4x GTX260 /216 to 660 CLK - 1533 SHDR - 1149 Mem and 100% fans on. I uninstalled all of the tuner software (Riva/Nvidia tools). So far it is working great. I will let you know if it EUEs or locks.

StaTek
03-25-09, 03:06 PM
Gratz Computekinc.us! How'd you reflash em? I have never been brave enough to look into it.

Surferseth
03-25-09, 04:22 PM
Torin,
what is CPU utilization looking like on the system? Is it possible that there just isn't enough CPU to push all 4 of the GPU clients?

For the USB media copy thing... I know the 780i and 790i chipsets had some issues with media files with early revs of their BIOS and drivers. Have you bios flashed up to P08? What Nvidia nforce drivers are you running? 15.25?

torin3
03-25-09, 05:30 PM
Sorry I didn't respond earlier...it was a busy day at work.

This is just a random idea without even researching the board or if my theory is even possible, but could this possibly be one of those boards that runs 16x/8x/8x... what if for some reason it was running 16x/16x/1x?

ok it's 5:30 in the morning... I think it's time for bed... my theories are starting to sound ridiculous

According to GPU-Z, all 4 GPUs are running at 16x.

My theory: change gpu 0's machine ID to gpu 1's and vice versa.

If gpu 1 stops getting low ppd and gpu 0 starts getting very low ppd, then I think it would be software. If not, it's hardware. Make sense?

Makes sense, and I tried it, and it follows the hardware.

I agree with Adak's approach.

You could also open up the hardware monitoring on Rivatuner and see if you notice anything when the drop happens...like the shader dropping to 2D levels.

I haven't heard of a drop in production before and not getting EUEs after it...you sure are getting weird stuff with those 295s. :-/ Are you in SLI mode? Or do you have the 'do not use multi-gpu mode' marked under the nvidia control panel?

For the random lockups, I'd say look into raising the nForce SPP in bios. Or voltage levels on the memory...some brands like more powah (like OCZ). If that doesn't do the trick I'd read up on raising the nForce MCP.

I'm not in SLI mode (as confirmed by GPU-Z), and I've got the "do not use multi-gpu" box checked. The cards are overclocked, and I haven't tried turning that off just yet, but I'll probably try it before going to bed tonight and see what the results are tomorrow. The motherboard doesn't seem to crash with just folding, so I'm going to try and see if I can isolate it to the cards. When I'm sure which card it is, I'll try the large data transfer with that card out. If it still locks up, I'll work on the motherboard for stability.

I have the same problems using Rivatuner or Nvidia tools for overclocking after 2+ cards. Did you overclock? Did they work at stock clocks without Riva or Nvidia tools? I know when I lose a card it goes -50% or so, then I remove all overclocking tools and it's fixed. Hopefully this may be some help. I am going to just bios mod all of the cards and see if they work correctly. :bang head

That looks like it might be a good solution if it seems to be an OC related problem. I'll probably PM you if it looks like modding the BIOS will fix it.

Torin,
what is CPU utilization looking like on the system? Is it possible that there just isn't enough CPU to push all 4 of the GPU clients?

For the USB media copy thing... I know the 780i and 790i chipsets had some issues with media files with early revs of their BIOS and drivers. Have you bios flashed up to P08? What Nvidia nforce drivers are you running? 15.25?

CPU is running at about 0-3% per core, and I'm running over 90% idle all the time.

I just got the board Friday, I haven't had a chance to check the BIOS and the driver versions yet. But they are on the agenda.

Thanks again for the help everybody. I'll keep you updated

Surferseth
03-25-09, 05:40 PM
The 790i motherboard has the following slots:
PCI Express 2.0 x16: 2 slots
PCI Express x16: 1 slot

Not sure if this is at all related.

Computekinc.us
03-25-09, 07:14 PM
Gratz Computekinc.us! How'd you reflash em? I have never been brave enough to look into it.

You can use GPU-Z to copy your bios off to a file. Then I have a utility that can edit the bios file: it can change Core / Mem / Shader / Fan speeds, and it can also Vmod to 1.18v on some cards to get a really high OC. Then I use a USB pen drive to boot DOS and use the Nvidia flash tool to rewrite the bios. Then I cross my fingers, say a prayer, and hit enter. PM me if you want the files and info and think you might want to play with it. I have XFX Black Edition and Standard bios backups if that may be helpful also.

torin3
04-06-09, 07:25 AM
Well, when physically swapping the cards, the loss of production stays with the motherboard slot. So it isn't the cards.

I flashed the bios to the most recent version. Still has the same problem.

I'm going to check and see if the motherboard drivers have a more current version. If so, I'll install those.

If they don't work, or I have the latest version, would it be RMA time?

jonspd
04-06-09, 08:47 AM
that is a 3 slot board right?

don't use the bad slot :D

ChelseaOilman
04-06-09, 08:50 AM
If they don't work, or I have the latest version, would it be RMA time?
I don't think it's RMA time, it's just tricky setting these GTX295 cards up. I think you're running the best OS choice for these cards. People seem to be having fewer issues setting up under XP versus Vista.

Did you go in nVidia control panel and disable SLI? Did you extend the desktop on each card?

There's a long thread over at the FF you should read through.
2 GTX295's installed, UNSTABLE_MACHINE issues (http://foldingforum.org/viewtopic.php?f=52&t=7874&start=150)

jaak ennuste seemed to get things figured out by page 11 of that thread.

torin3
04-06-09, 09:08 AM
that is a 3 slot board right?

don't use the bad slot :D

Tried that. The bad slot is slot 1. When I have cards in slots 2 & 3, and boot up, a BIOS message pops up and tells me it can't run SLI in that configuration, move the video card. Then stops, with no way to get out of that message without rebooting. Wash, Rinse, Repeat... :bang head

I don't think it's RMA time, it's just tricky setting these GTX295 cards up. I think you're running the best OS choice for these cards. People seem to be having fewer issues setting up under XP versus Vista.

Did you go in nVidia control panel and disable SLI? Did you extend the desktop on each card?

There's a long thread over at the FF you should read through.
2 GTX295's installed, UNSTABLE_MACHINE issues (http://foldingforum.org/viewtopic.php?f=52&t=7874&start=150)

jaak ennuste seemed to get things figured out by page 11 of that thread.

Yes, SLI is disabled. I don't have to extend the desktops because I'm using the force GPU flag.

The thread doesn't really help me because the client is running and I'm not getting UNSTABLE_MACHINE errors...it is just running really slow.

jonspd
04-06-09, 09:11 AM
RMA may end up being your best option then.

so you have to have a card in slot 1 and slot 1 is the bad slot?

torin3
04-06-09, 09:18 AM
RMA may end up being your best option then.

so you have to have a card in slot 1 and slot 1 is the bad slot?

Yep, going from close to the CPU to further away, there are 3 slots. There is a card in slot 1 and slot 3. Slot 1 is the one that is having the problem. If I move the card from slot 1 to slot 2, I get the error message.

jonspd
04-06-09, 09:30 AM
have you tried it in slot 2 only without slot 1 and slot 3 populated?

torin3
04-06-09, 09:37 AM
have you tried it in slot 2 only without slot 1 and slot 3 populated?

No I haven't. I'm not sure what I'll be testing with that though.

jonspd
04-06-09, 09:44 AM
the error you receive without a card in slot 1 might have to do with bios settings.

I know a few boards ask for PCI-E priority or something like that.

ChelseaOilman
04-06-09, 09:46 AM
If you run with just one card installed in your machine, installed in slot one, do both GPUs fold properly? All screensavers turned off? Is one GPU dropping down to 2D mode? I would try each card individually before thinking of RMAing anything.

torin3
04-06-09, 09:54 AM
the error you receive without a card in slot 1 might have to do with bios settings.

I know a few boards ask for PCI-E priority or something like that.

I did look for settings like that in the bios and didn't find them, but yeah, I'll try it with one card in slot 2 and see if I get any errors.

If you run with just one card installed in your machine, installed in slot one, do both GPUs fold properly? All screensavers turned off? Is one GPU dropping down to 2D mode? I would try each card individually before thinking of RMAing anything.

I'll try the single card in slot 1 and see what I get. I do have no screensaver, but I'll check and make sure that it doesn't power down the monitor ever either.

torin3
04-07-09, 08:22 AM
Ok, I tried with cards in slots 1 & 2.
Bios warning (move card for SLI function), can't boot.

I tried single card in slot 2.
Bios warning (move card from slot 2 or 3 for NON-SLI function), can't boot.

Didn't try single card in slot 3 based on the above warning.

Single card works fine in slot 1. Needed to turn off SLI/PhysX again after booting up this way. No problems...and after 20 minutes...no slowdown.

Okayyyy.....

Go back to cards in slots 1 & 3. Boots up. Is already out of SLI/PhysX mode. (and it was out before I started this...one of the things I've been repeatedly checking). And when I got up this morning, it had been at full production all night.

NO SLOWDOWN!

:eek::confused:

Ok. Either it needed SLI turned off with a single card in before it would really truly fully be off with 2 cards in....or it is another case of computer voodoo.

Computer Voodoo: Doing the same thing over and over again with a computer until it suddenly works right.

The thing that worries me about Computer Voodoo is that it is awfully close to this:

Sociopath: Someone who does the same thing over and over again, but expects different results.

Zerix01
04-07-09, 07:20 PM
Sociopath: Someone who does the same thing over and over again, but expects different results.

Hey I work with people like that....

I mean.....

Computer Voodoo: Doing the same thing over and over again with a computer until it suddenly works right.

Hey I work with people who do that!

Zerix01
04-07-09, 07:29 PM
Ok. Either it needed SLI turned off with a single card in before it would really truly fully be off with 2 cards in

Was that the same card that has always been in slot 1? Maybe the first card acts like a controller for the rest of them. If it was last set to non SLI then the rest default to that. Move another card in to its slot and it defaults to SLI mode and is now the controlling card.

Just a theory.

Have you added the third card back yet?

torin3
04-07-09, 07:49 PM
Was that the same card that has always been in slot 1? Maybe the first card acts like a controller for the rest of them. If it was last set to non SLI then the rest default to that. Move another card in to its slot and it defaults to SLI mode and is now the controlling card.

Just a theory.

Have you added the third card back yet?

Actually, I'm not sure for the final configuration, but prior to that, both cards were in slot 1.

I don't have a 3rd card in this machine yet.

Also, it did lock up today, but I think that was caused by AVG. It is set to run at noon, and it locked up at 12:01. I've left it on, but since it doesn't get used to browse websites, and I have a hardware firewall, I've turned off the daily machine scan.