Diagnostics
From time to time, all of us experience seemingly baffling failures of the Folding@Home clients. With multiple clients on the same machine, figuring out what's wrong becomes complicated. Many of us turn to the forum for help. Unfortunately, all too often the request for help reads something like "It won't work. What's wrong?" and that's it. Here are the basics of what is needed to properly diagnose a problem.
1. All clients should be run with the -verbosity 9 flag in extra parameters.
2. Machine Specs
a. CPU @ GHz
b. GPU(s) @ core/shader/memory clocks, both what you set and what they are actually running at, as reported by a monitoring app like GPU-Z or Afterburner.
3. Software
a. OS and Version
b. GPU driver version
4. The log at the start of the client including the banner.
Code:
--- Opening Log file [December 24 14:35:18 UTC]
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.XX
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\FAH\FAH GPU1
Executable: C:\FAH\FAH GPU1\[email protected]
Arguments: -gpu 0 -verbosity 9
[14:35:18] - Ask before connecting: No
[14:35:18] - User name: ChasR (Team 32)
[14:35:18] - User ID: 28F95C4958A6F8B5
[14:35:18] - Machine ID: 2
[14:35:18]
[14:35:18] Gpu type=2 species=13.
[14:35:18] Loaded queue successfully.
[14:35:18]
[14:35:18] + Processing work unit
[14:35:18] - Autosending finished units... [December 24 14:35:18 UTC]
[14:35:18] Core required: FahCore_11.exe
[14:35:18] Core found.
[14:35:18] Trying to send all finished work units
[14:35:18] + No unsent completed units remaining.
[14:35:18] - Autosend completed
[14:35:18] Working on queue slot 08 [December 24 14:35:18 UTC]
[14:35:18] + Working ...
[14:35:18] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -nice 19 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 6984 -version 6XX'
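Before posting, it's worth confirming the banner really shows the client running with -verbosity 9. Here is a rough Python sketch of that check. It assumes the banner has an "Arguments:" line like the one above; the function names are made up for illustration and are not part of the client.

```python
import re

def launch_arguments(log_text):
    """Return the argument list from the banner's 'Arguments:' line, or None."""
    match = re.search(r"^Arguments:[ \t]*(.*)$", log_text, flags=re.MULTILINE)
    return match.group(1).split() if match else None

def has_verbosity_9(args):
    """True if '-verbosity' is immediately followed by '9' in the argument list."""
    for flag, value in zip(args, args[1:]):
        if flag == "-verbosity" and value == "9":
            return True
    return False

# A few lines copied from the banner above.
banner = """\
Launch directory: C:\\FAH\\FAH GPU1
Arguments: -gpu 0 -verbosity 9
[14:35:18] - Ask before connecting: No
"""

args = launch_arguments(banner)
print(args)
print(has_verbosity_9(args))
```

If the check comes back false, add the flag to extra parameters and restart the client before grabbing the logs.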
5. The log at the start of a failed WU
Code:
[11:44:17] + Processing work unit
[11:44:17] Core required: FahCore_11.exe
[11:44:17] Core found.
[11:44:17] Working on queue slot 08 [December 23 11:44:17 UTC]
[11:44:17] + Working ...
[11:44:17] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -nice 19 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 3148 -version 6XX'
[11:44:18]
[11:44:18] *------------------------------*
[11:44:18] Folding@Home GPU Core
[11:44:18] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[11:44:18]
[11:44:18] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[11:44:18] Build host: amoeba
[11:44:18] Board Type: Nvidia
[11:44:18] Core :
[11:44:18] Preparing to commence simulation
[11:44:18] - Looking at optimizations...
[11:44:18] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[11:44:18] - Created dyn
[11:44:18] - Files status OK
[11:44:18] - Expanded 46775 -> 252912 (decompressed 540.6 percent)
[11:44:18] Called DecompressByteArray: compressed_data_size=46775 data_size=252912, decompressed_data_size=252912 diff=0
[11:44:18] - Digital signature verified
[11:44:18]
[11:44:18] Project: 5768 (Run 10, Clone 38, Gen 411)
[11:44:18]
[11:44:18] Assembly optimizations on if available.
[11:44:18] Entering M.D.
[11:44:24] Tpr hash work/wudata_08.tpr: 4069322040 2473967236 3851648921 374735687 640539219
[11:44:24]
[11:44:24] Calling fah_main args: 14 usage=100
[11:44:24]
[11:44:24] Working on Protein
[11:44:25] Client config found, loading data.
[11:44:25] Starting GUI Server
[11:45:01] Completed 1%
6. The log at the WU failure
Code:
[12:00:36] Completed 27%
[12:00:36] mdrun_gpu returned
[12:00:36] NANs detected on GPU
[12:00:36]
[12:00:36] Folding@home Core Shutdown: UNSTABLE_MACHINE
[12:00:40] CoreStatus = 7A (122)
[12:00:40] Sending work to server
[12:00:40] Project: 5768 (Run 10, Clone 38, Gen 411)
[12:00:40] - Read packet limit of 540015616... Set to 524286976.
[12:00:40] - Error: Could not get length of results file work/wuresults_08.dat
[12:00:40] - Error: Could not read unit 08 file. Removing from queue.
[12:00:40] Trying to send all finished work units
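[12:00:40] + No unsent completed units remaining.
This failure was due to a faulty GPU core on a single-PCB GTX 295 running SLI (multi-GPU). Disabling multi-GPU in the NVIDIA Control Panel and folding only on the good core fixed it.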
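If you have a long log and want to pull out the key facts of a failure for your post, something like the Python sketch below could help. It only looks for the line formats seen in the failure log above (Project, Core Shutdown, CoreStatus, NANs); other cores or client versions may word these lines differently, and the function name is my own invention.

```python
import re

# Patterns modeled on the lines in the failure log above.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")

def summarize_failure(log_text):
    """Collect project, shutdown reason, core status and NaN flag from a log excerpt."""
    project = PROJECT_RE.search(log_text)
    shutdown = SHUTDOWN_RE.search(log_text)
    status = STATUS_RE.search(log_text)
    return {
        "project": tuple(int(n) for n in project.groups()) if project else None,
        "shutdown": shutdown.group(1) if shutdown else None,
        "core_status": status.group(1) if status else None,
        "has_nans": "NANs detected" in log_text,
    }

# A few lines copied from the failure log above.
failure_log = """\
[12:00:36] NANs detected on GPU
[12:00:36] Folding@home Core Shutdown: UNSTABLE_MACHINE
[12:00:40] CoreStatus = 7A (122)
[12:00:40] Project: 5768 (Run 10, Clone 38, Gen 411)
"""

print(summarize_failure(failure_log))
```

The summary does not replace the raw log. Post both, since whoever is helping will usually want the lines around the failure as well.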
7. For configuration issues, the client.cfg of each client on the machine (I use NoteTab Light to correctly display this file)
Code:
[settings]
username=ChasR
team=32
passkey=5d4520daa6737d909fbdf6f72088XXXX
asknet=no
machineid=2
bigpackets=big
extra_parms=-gpu 0 -verbosity 9
local=8583
[http]
active=no
host=localhost
port=8080
usereg=no
proxy_name=
proxy_passwd=
[core]
priority=96
cpuusage=100
disableassembly=no
nocpulock=1
checkpoint=30
addr=
[power]
battery=no
[clienttype]
memory=500
type=0
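Note that the passkey above is partly X'ed out. You should hide yours too before posting a client.cfg. As a rough sketch, Python's standard configparser can do this, assuming the file is plain INI text like the sample above. The function name, the list of keys to hide, and the sample passkey below are my own placeholders, not anything defined by the client.

```python
import configparser
import io

def redacted_cfg(cfg_text, secret_keys=("passkey", "proxy_passwd")):
    """Return the cfg text with non-empty secret values replaced by 'XXXX'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(cfg_text)
    for section in parser.sections():
        for key in list(parser[section]):
            if key in secret_keys and parser[section][key]:
                parser[section][key] = "XXXX"
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()

# Shortened sample; the passkey here is a made-up placeholder.
sample = """\
[settings]
username=ChasR
team=32
passkey=0123456789abcdef0123456789abcdef
extra_parms=-gpu 0 -verbosity 9
[http]
proxy_name=
proxy_passwd=
"""

print(redacted_cfg(sample))
```

This hides the whole value rather than just the last few characters, which is the safer choice for a public forum.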
Armed with the above information, most problems can be diagnosed quickly. With a few other questions answered, like "What does Task Manager say?" and "Did you do a clean driver install with Driver Sweeper run in Safe Mode?", the member looking for help can get it within minutes instead of hours or days.
Feel free to post up anything you think I left out.