GPUs seem to stop working without a reason when system is not idle?

txus.palacios (Member; joined Dec 10, 2011; location: Cádiz, Spain)

Dear DC gurus:

The title says it all. Sometimes, while I'm working on my computer with f@h running, the CPU keeps folding without a problem, but the GPUs seem to stop working for no apparent reason. They still report Running in the FAHv7 client, but Afterburner shows GPU usage at 0%.

Any tips?

Thank you very much,
a fellow folding@home user.
 
Watch the temps on the card as it's folding - especially right before you expect it to stop working.

That would be my first guess - the card is overheating, and shutting itself off. (Note that I am not the gpu guru however.) :D

Are your system administration error logs showing any events relating to this?
 
GPUs are watercooled, they barely reach the low 60s.

Nothing related to f@h that I can see in the logs. Only Source engine crashes (sometimes, when I close TF2 using the console, it says that hl2.exe crashed).
 
That's probably a driver problem, which is rather the bane of GPU folding setup. Some work fine, some don't work at all, some work only some of the time, etc.

Is this happening on the GTX 470s?

Answer that, and I'll ask a more current gpu folder to take a look at the problem.
 
I've sent the PM to a couple of our more knowledgeable gpu folders, but the reply may be slow, since it's Easter.

What driver are you using?

Ah! This is a Fermi card. They are special; let me check on that.

This isn't Fermi specific, but it may include Fermi cards as well:

First, update your folding client to the latest version of V7 (7.1.52). That alone may fix it. The installer is here:
http://folding.stanford.edu/English/Main

Click the Windows "Download" button to download the installer.

Second, if that doesn't solve it, try this:
In the past, support for specific GPUs was built into the client. We are working on ways to automatically update this information more easily within the v7 client to support new GPUs, such as the Kepler GPUs, which have just come out. While the automatic update isn't ready yet, here is how one can manually do this:

1) Download the GPUs.txt file from
https://fah-web.stanford.edu/file-releases/public/GPUs.txt


2) Copy the downloaded GPUs.txt file to the client's run directory. The run directory is also called the data directory. It's the same location as the 'client.db' file. In Windows there is a link to this directory in the start menu.

3) After installing the file you must restart your client.

The client has a built-in GPUs.txt which it will use if it does not find one on disk. The client will print a message to the log, very early on, when it reads GPUs.txt from the run directory.

In a future version of the v7 client, this will happen automatically, but for now, we are updating this file on our web site and donors can do this update manually for new hardware.
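
The manual steps above come down to one file copy into the client's data directory. Below is a minimal Python sketch of that copy, assuming GPUs.txt has already been downloaded. The function name `install_gpus_txt` and the backup file name are hypothetical and not part of the FAH client, and the caller supplies both paths, so nothing here guesses where the data directory is.

```python
import shutil
from pathlib import Path


def install_gpus_txt(downloaded: Path, data_dir: Path) -> Path:
    """Copy a downloaded GPUs.txt into the client's data (run) directory.

    Hypothetical helper: both paths come from the caller.
    """
    if not downloaded.is_file():
        raise FileNotFoundError(f"downloaded file not found: {downloaded}")
    if not data_dir.is_dir():
        raise NotADirectoryError(f"data directory not found: {data_dir}")

    target = data_dir / "GPUs.txt"
    # Keep a backup of any existing copy so the change is easy to undo.
    if target.exists():
        shutil.copy2(target, data_dir / "GPUs.txt.bak")
    shutil.copy2(downloaded, target)
    return target
```

After the copy, the client still has to be restarted, and the message near the top of the log about reading GPUs.txt from the run directory is the way to confirm it was picked up.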

If the above doesn't work, you'll have to get the right driver info from one of the gpu folders.
 
Right now, I'm using the modded ini 301.10 drivers, because of all the 29x BS (broken SLI, f@h wouldn't even load a WU, et cetera).

Your link is broken, but just googling "1) Download the GPUs.txt file from https://fah-web.stanford.edu/file-re...ublic/GPUs.txt" got me a link. I tried that, and will leave the computer folding all night (it's 11:30 PM here in Spain right now). Let's hope the system keeps working.
 
Did you upgrade to the latest version of V7? *That* was my link. The other link is FAH's; I'll correct it, and thank you for letting me know.

Where did you get the modded driver? That isn't something that FAH wants to do, unless absolutely necessary. They want to standardize everything, as much as possible.
 
Yes, I upgraded to F@H v7.1.52 and installed the GPUs.txt file in the FAHData folder. I couldn't find a client.db there, only another db, so to be safe I also copied the file into FAHCore.

The modded driver was something I found on Google. Basically you mod two files to add the device IDs of your card. It seems it works and adds nVidia's new FXAA.

So far, the system has been folding all night without any of the GPUs freezing. But I never saw usage reach 100%; it was always 60-80%. Maybe that has to do with the GPU drivers? But that would mean going back to 28x. No way am I going back to 29x with all their problems.
 
Post the log. The system section at startup and the part from just before to just after the failure are most important.

My experience on Fermis is limited to setting up FAH for others. Never had a problem with the 29x.xx drivers. I did have a problem with too high an OC causing the driver to crash, which I suspect is your problem.
 
Oh, the sig is not updated. I no longer run such a high OC (except when I'm trying to bench something). I'm running 700/1850/1.1 right now. The thing is, I don't get a "Failed" f@h report or a crash; it just seems that f@h is not giving any work to my GPU.

Just checked the logs, and it seems that I get this CLIENT_DIED error.

Code:
05:08:23:WU03:FS02:0xa4:
05:08:23:WU03:FS02:0xa4:*------------------------------*
05:08:24:WU03:FS02:0xa4:Folding@Home Gromacs GB Core
05:08:24:WU03:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
05:08:24:WU03:FS02:0xa4:
05:08:24:WU03:FS02:0xa4:Preparing to commence simulation
05:08:24:WU03:FS02:0xa4:- Looking at optimizations...
05:08:24:WU03:FS02:0xa4:- Created dyn
05:08:24:WU03:FS02:0xa4:- Files status OK
05:08:24:WU03:FS02:0xa4:- Expanded 39801 -> 203980 (decompressed 512.4 percent)
05:08:24:WU03:FS02:0xa4:Called DecompressByteArray: compressed_data_size=39801 data_size=203980, decompressed_data_size=203980 diff=0
05:08:24:WU03:FS02:0xa4:- Digital signature verified
05:08:24:WU03:FS02:0xa4:
05:08:24:WU03:FS02:0xa4:Project: 10084 (Run 5, Clone 43, Gen 52)
05:08:24:WU03:FS02:0xa4:
05:08:24:WU03:FS02:0xa4:Assembly optimizations on if available.
05:08:24:WU03:FS02:0xa4:Entering M.D.
05:08:28:WU03:FS02:0xa4:Mapping NT from 4 to 4 
05:08:28:WU03:FS02:0xa4:Completed 0 out of 10000000 steps  (0%)
05:08:30:WU02:FS02:Upload complete
05:08:30:WU02:FS02:Server responded WORK_ACK (400)
05:08:30:WU02:FS02:Final credit estimate, 1419.00 points
05:08:30:WU02:FS02:Cleaning up
05:20:09:WU03:FS02:0xa4:Completed 100000 out of 10000000 steps  (1%)
05:31:39:WU03:FS02:0xa4:Completed 200000 out of 10000000 steps  (2%)
05:42:56:WU03:FS02:0xa4:Completed 300000 out of 10000000 steps  (3%)
05:54:22:WU03:FS02:0xa4:Completed 400000 out of 10000000 steps  (4%)
05:59:34:Lost lifeline PID 5452, exiting
05:59:34:Server connection id=1 ended
05:59:35:FS00:Shutting core down
05:59:35:FS01:Shutting core down
05:59:35:FS02:Shutting core down
05:59:35:WU03:FS02:0xa4:Client no longer detected. Shutting down core 
05:59:35:WU03:FS02:0xa4:
05:59:35:WU03:FS02:0xa4:Folding@home Core Shutdown: CLIENT_DIED
05:59:39:WU00:FS00:0x15:Client no longer detected. Shutting down core 
05:59:39:WU00:FS00:0x15:
05:59:39:WU00:FS00:0x15:Folding@home Core Shutdown: CLIENT_DIED
05:59:40:Clean exit
05:59:40:WU01:FS01:0x15:Client no longer detected. Shutting down core 
05:59:40:WU01:FS01:0x15:
05:59:40:WU01:FS01:0x15:Folding@home Core Shutdown: CLIENT_DIED

This is the beginning of that log.
Code:
*********************** Log Started 2012-04-08T22:37:21Z ***********************
22:37:21:************************* Folding@home Client *************************
22:37:21:      Website: http://folding.stanford.edu/
22:37:21:    Copyright: (c) 2009-2012 Stanford University
22:37:21:       Author: Joseph Coffland <[email protected]>
22:37:21:         Args: --lifeline 5452 --command-port=36330
22:37:21:       Config: C:/Program Files (x86)/FAHData/config.xml
22:37:21:******************************** Build ********************************
22:37:21:      Version: 7.1.52
22:37:21:         Date: Mar 20 2012
22:37:21:         Time: 19:37:42
22:37:21:      SVN Rev: 3515
22:37:21:       Branch: fah/trunk/client
22:37:21:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
22:37:21:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
22:37:21:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
22:37:21:     Platform: win32 XP
22:37:21:         Bits: 32
22:37:21:         Mode: Release
22:37:21:******************************* System ********************************
22:37:21:          CPU: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
22:37:21:       CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
22:37:21:         CPUs: 4
22:37:21:       Memory: 15.98GiB
22:37:21:  Free Memory: 9.84GiB
22:37:21:      Threads: WINDOWS_THREADS
22:37:21:   On Battery: false
22:37:21:   UTC offset: 2
22:37:21:          PID: 4080
22:37:21:          CWD: C:/Program Files (x86)/FAHData
22:37:21:           OS: Windows 7 Home Premium
22:37:21:      OS Arch: AMD64
22:37:21:         GPUs: 2
22:37:21:        GPU 0: FERMI:1 GF100 [GeForce GTX 470]
22:37:21:        GPU 1: FERMI:1 GF100 [GeForce GTX 470]
22:37:21:         CUDA: 2.0
22:37:21:  CUDA Driver: 4020
22:37:21:Win32 Service: false
22:37:21:***********************************************************************
22:37:22:<config>
22:37:22:  <!-- Folding Slot Configuration -->
22:37:22:  <gpu v='true'/>
22:37:22:
22:37:22:  <!-- Network -->
22:37:22:  <proxy v=':8080'/>
22:37:22:
22:37:22:  <!-- User Information -->
22:37:22:  <passkey v='********************************'/>
22:37:22:  <team v='32'/>
22:37:22:  <user v='Jesus_Velez_Palacios'/>
22:37:22:
22:37:22:  <!-- Folding Slots -->
22:37:22:  <slot id='0' type='GPU'/>
22:37:22:  <slot id='1' type='GPU'/>
22:37:22:  <slot id='2' type='SMP'/>
22:37:22:</config>
22:37:22:Trying to access database...
22:37:22:Successfully acquired database lock
22:37:22:Enabled folding slot 00: READY gpu:0:"GF100 [GeForce GTX 470]"
22:37:22:Enabled folding slot 01: READY gpu:1:"GF100 [GeForce GTX 470]"
22:37:22:Enabled folding slot 02: READY smp:4
22:37:22:WU01:FS01:Starting
22:37:22:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Program Files (x86)/FAHData/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 01 -suffix 01 -version 701 -lifeline 4080 -checkpoint 15 -gpu 1
22:37:22:WU01:FS01:Started FahCore on PID 8412
22:37:22:WU01:FS01:Core PID:5060
22:37:22:WU01:FS01:FahCore 0x15 started
22:37:22:WU00:FS00:Starting
22:37:22:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Program Files (x86)/FAHData/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 00 -suffix 01 -version 701 -lifeline 4080 -checkpoint 15 -gpu 0
22:37:22:WU00:FS00:Started FahCore on PID 6028
22:37:22:WU00:FS00:Core PID:1832
22:37:22:WU00:FS00:FahCore 0x15 started
22:37:22:WU02:FS02:Starting
22:37:22:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Program Files (x86)/FAHData/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe" -dir 02 -suffix 01 -version 701 -lifeline 4080 -checkpoint 15 -np 4
22:37:22:WU02:FS02:Started FahCore on PID 9464
22:37:22:WU02:FS02:Core PID:3360
22:37:22:WU02:FS02:FahCore 0xa3 started
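
Read together, the two excerpts line up on one detail. The startup section shows the client was launched with `--lifeline 5452`, and the shutdown excerpt begins with `Lost lifeline PID 5452, exiting`, after which each core reports CLIENT_DIED. The short Python sketch below checks a log for that pairing. The function name is hypothetical and the patterns are written only against the lines quoted here. It shows what the log records, not why PID 5452 went away.

```python
import re

# Patterns based on the two log lines quoted in this thread.
ARGS_LIFELINE = re.compile(r"Args:.*--lifeline[ =](\d+)")
LOST_LIFELINE = re.compile(r"Lost lifeline PID (\d+)")


def lifeline_lost(log_text: str) -> bool:
    """True if the client lost the same PID it was started with via --lifeline."""
    started = ARGS_LIFELINE.search(log_text)
    lost = LOST_LIFELINE.search(log_text)
    return bool(started and lost and started.group(1) == lost.group(1))
```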

I think I can rule out core corruption, since I deleted everything f@h-related before installing 7.1.52.
 
My theory:
It appears to me that you have a custom installation and have located the FAH Data directory in Program Files (C:/Program Files (x86)/FAHData/config.xml), where it will not work. Windows 7 kills the FAH core when it attempts to write into the Program Files directory. A reinstall using the default locations should cure the problem.
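
A quick way to test the premise of this theory is to check whether the `Config:` or `CWD:` path from the log sits under a Program Files folder. The Python sketch below does only that. It assumes the system drive is C: and the default English folder names, and the function name is hypothetical. A fuller check would read the ProgramFiles environment variables. It says nothing about whether Windows actually blocks the writes.

```python
from pathlib import PureWindowsPath

# Assumed locations: C: system drive and default folder names.
PROGRAM_FILES_ROOTS = (
    PureWindowsPath("C:/Program Files"),
    PureWindowsPath("C:/Program Files (x86)"),
)


def under_program_files(path: str) -> bool:
    """True if `path` is one of the Program Files folders or lies inside one."""
    p = PureWindowsPath(path)
    # Windows path comparison ignores case and accepts either slash style.
    return any(root == p or root in p.parents for root in PROGRAM_FILES_ROOTS)
```

For the path in the log above, `under_program_files("C:/Program Files (x86)/FAHData/config.xml")` returns True.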
 
I'll try reinstalling to the default location, though I can't remember where I read that I should change it to that path. Anyway, I'll reply ASAP.
 
It has not crashed or given a CLIENT_DIED signal, but I get 40-50% GPU usage, not the 99% I was used to when I wasn't running SLI.
 
Glad to see you've made progress. Not sure why you're getting such low GPU usage, though. What work units are they crunching?

My SLI 580s run 98-99% constantly when they fold, but there's a specific WU that will drop their usage to 78-82%. Can't remember which, unfortunately.

Have you set "Prefer maximum performance" in the nVidia control panel?

Could also be your modded drivers causing the problem. I'm running 296.10 on my SLI setup and a single 460 without issues.
 