
Diagnostic Information


ChasR

Senior Member
Joined
Apr 12, 2004
Location
Atlanta
Diagnostics

From time to time, all of us experience seemingly baffling failures of the Folding@Home clients. With multiple clients on the same machine, figuring out what's wrong becomes complicated. Many of us turn to the forum for help. Unfortunately, all too often the request for help reads something like "It won't work. What's wrong?" and that's it. Here are the basics of what is needed to properly diagnose a problem.

1. All clients should be run with the -verbosity 9 flag in extra parameters.
2. Machine Specs

a. CPU @ GHz
b. GPU(s) @ core/shader/memory clocks: both what you set and what they actually run at, as reported by a monitoring app like GPU-Z or Afterburner.

3. Software

a. OS and Version
b. GPU driver version

4. The log at the start of the client including the banner.
Code:
--- Opening Log file [December 24 14:35:18 UTC] 


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.XX

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH\FAH GPU1
Executable: C:\FAH\FAH GPU1\[email protected]
Arguments: -gpu 0 -verbosity 9 

[14:35:18] - Ask before connecting: No
[14:35:18] - User name: ChasR (Team 32)
[14:35:18] - User ID: 28F95C4958A6F8B5
[14:35:18] - Machine ID: 2
[14:35:18] 
[14:35:18] Gpu type=2 species=13.
[14:35:18] Loaded queue successfully.
[14:35:18] 
[14:35:18] + Processing work unit
[14:35:18] - Autosending finished units... [December 24 14:35:18 UTC]
[14:35:18] Core required: FahCore_11.exe
[14:35:18] Core found.
[14:35:18] Trying to send all finished work units
[14:35:18] + No unsent completed units remaining.
[14:35:18] - Autosend completed
[14:35:18] Working on queue slot 08 [December 24 14:35:18 UTC]
[14:35:18] + Working ...
[14:35:18] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -nice 19 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 6984 -version 6XX'
5. The log at the start of a failed WU
Code:
[11:44:17] + Processing work unit
[11:44:17] Core required: FahCore_11.exe
[11:44:17] Core found.
[11:44:17] Working on queue slot 08 [December 23 11:44:17 UTC]
[11:44:17] + Working ...
[11:44:17] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -nice 19 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 3148 -version 6XX'

[11:44:18] 
[11:44:18] *------------------------------*
[11:44:18] Folding@Home GPU Core
[11:44:18] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[11:44:18] 
[11:44:18] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[11:44:18] Build host: amoeba
[11:44:18] Board Type: Nvidia
[11:44:18] Core      : 
[11:44:18] Preparing to commence simulation
[11:44:18] - Looking at optimizations...
[11:44:18] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[11:44:18] - Created dyn
[11:44:18] - Files status OK
[11:44:18] - Expanded 46775 -> 252912 (decompressed 540.6 percent)
[11:44:18] Called DecompressByteArray: compressed_data_size=46775 data_size=252912, decompressed_data_size=252912 diff=0
[11:44:18] - Digital signature verified
[11:44:18] 
[11:44:18] Project: 5768 (Run 10, Clone 38, Gen 411)
[11:44:18] 
[11:44:18] Assembly optimizations on if available.
[11:44:18] Entering M.D.
[11:44:24] Tpr hash work/wudata_08.tpr:  4069322040 2473967236 3851648921 374735687 640539219
[11:44:24] 
[11:44:24] Calling fah_main args: 14 usage=100
[11:44:24] 
[11:44:24] Working on Protein
[11:44:25] Client config found, loading data.
[11:44:25] Starting GUI Server
[11:45:01] Completed 1%
6. The log at the WU failure
Code:
[12:00:36] Completed 27%
[12:00:36] mdrun_gpu returned 
[12:00:36] NANs detected on GPU
[12:00:36] 
[12:00:36] Folding@home Core Shutdown: UNSTABLE_MACHINE
[12:00:40] CoreStatus = 7A (122)
[12:00:40] Sending work to server
[12:00:40] Project: 5768 (Run 10, Clone 38, Gen 411)
[12:00:40] - Read packet limit of 540015616... Set to 524286976.
[12:00:40] - Error: Could not get length of results file work/wuresults_08.dat
[12:00:40] - Error: Could not read unit 08 file. Removing from queue.
[12:00:40] Trying to send all finished work units
[12:00:40] + No unsent completed units remaining.
This failure was due to a faulty GPU core on a single-PCB GTX 295 running SLI (Multi-GPU). Disabling Multi-GPU in the NVIDIA Control Panel and folding only on the good core fixed it.

7. For configuration issues, the client.cfg of each client on the machine (I use NoteTab Light to display this file correctly)
Code:
[settings]
username=ChasR
team=32
passkey=5d4520daa6737d909fbdf6f72088XXXX
asknet=no
machineid=2
bigpackets=big
extra_parms=-gpu 0 -verbosity 9 
local=8583

[http]
active=no
host=localhost
port=8080
usereg=no
proxy_name=
proxy_passwd=

[core]
priority=96
cpuusage=100
disableassembly=no
nocpulock=1
checkpoint=30
addr=

[power]
battery=no

[clienttype]
memory=500
type=0
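
Since a posted client.cfg includes the passkey, it may be worth masking it before pasting. Here is a rough Python sketch that does so, assuming the file parses as a plain INI file like the example above. The function name `sanitize_client_cfg` and the choice of which keys count as sensitive (`passkey`, `proxy_passwd`) are my own assumptions, not anything the client defines.

```python
import configparser
import io

# Assumption: these are the only values worth hiding before posting.
SENSITIVE_KEYS = {"passkey", "proxy_passwd"}


def sanitize_client_cfg(text):
    """Return client.cfg text with sensitive values replaced by X's."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.lower() in SENSITIVE_KEYS and value:
                parser.set(section, key, "X" * len(value))
    out = io.StringIO()
    # Write "key=value" without spaces, matching the layout shown above.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical usage: run from the client's folder.
    with open("client.cfg", encoding="utf-8", errors="replace") as fh:
        print(sanitize_client_cfg(fh.read()))
```

Everything else (machineid, extra_parms, and so on) is left as written, since those are the values people need to see to spot a configuration conflict between clients.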

Armed with the above information, most problems can be diagnosed quickly. With a few other questions answered, like "What does Task Manager say?" and "Did you do a clean driver install, with Driver Sweeper run in safe mode?", the member looking for help can get it within minutes instead of hours or days.
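If digging the failure out of a long log by hand is a chore, something like the following Python sketch can pull out the lines around each abnormal core shutdown, along with the Project/Run/Clone/Gen last seen before it. It relies only on the line formats visible in the example logs above. The function name `find_failed_units` is made up for illustration, and treating `FINISHED_UNIT` as the normal shutdown status is an assumption about how the client reports a completed unit.

```python
import re

# Patterns based on the example log lines in this post.
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

# Assumption: a normally completed unit reports this status.
NORMAL_STATUS = "FINISHED_UNIT"


def find_failed_units(log_text, context=3):
    """Return one dict per abnormal core shutdown found in the log text."""
    lines = log_text.splitlines()
    failures = []
    project = None  # most recent (project, run, clone, gen) seen so far
    for i, line in enumerate(lines):
        match = PROJECT_RE.search(line)
        if match:
            project = tuple(int(g) for g in match.groups())
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown and shutdown.group(1) != NORMAL_STATUS:
            failures.append({
                "status": shutdown.group(1),
                "project": project,
                # The shutdown line plus the few lines leading up to it.
                "excerpt": lines[max(0, i - context): i + 1],
            })
    return failures


if __name__ == "__main__":
    # Hypothetical usage: FAHlog.txt in the current folder.
    with open("FAHlog.txt", encoding="utf-8", errors="replace") as fh:
        for failure in find_failed_units(fh.read()):
            print(failure["status"], failure["project"])
            print("\n".join(failure["excerpt"]))
```

This only finds where to look. The full excerpts from items 4 to 6 are still what should go in the post.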

Feel free to post up anything you think I left out.
 
Stuck... still need to do something with the multitude of stickies, but for now. :)

Merry Christmas!!! :santa:
 
Ya that's on my to-do list, I have been meaning to re-do all of the FAH install info, troubleshooting, etc.

Guess I need to get on that soon...
 
I'm getting a lot of these.
A few in my queue have 17+ upload failures.

[21:16:48] + Attempting to send results [July 18 21:16:48 UTC]
[21:17:21] - Couldn't send HTTP request to server
[21:17:21] + Could not connect to Work Server (results)
[21:17:21] (171.67.108.33:8080)
[21:17:21] + Retrying using alternative port
[21:17:52] - Couldn't send HTTP request to server
[21:17:52] + Could not connect to Work Server (results)
[21:17:52] (171.67.108.33:80)
[21:17:52] - Error: Could not transmit unit 04 (completed July 17) to work server.
[21:17:52] - Read packet limit of 540015616... Set to 524286976.
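
To put a number on upload failures like these, a short Python sketch along the following lines can tally which work server address and port failed, and how often each unit could not be transmitted. It assumes the log lines look exactly like the excerpt above; the function name `summarize_upload_failures` is just illustrative.

```python
import re
from collections import Counter

# Patterns based on the log excerpt above.
ENDPOINT_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$")
UNIT_RE = re.compile(r"Could not transmit unit (\d+)")


def summarize_upload_failures(log_text):
    """Return (endpoints, units): Counters of failed servers and unsent units."""
    endpoints = Counter()
    units = Counter()
    after_connect_error = False
    for line in log_text.splitlines():
        # Only count an "(ip:port)" line that directly follows a connect error.
        endpoint = ENDPOINT_RE.search(line)
        if after_connect_error and endpoint:
            endpoints[(endpoint.group(1), int(endpoint.group(2)))] += 1
        after_connect_error = "Could not connect to Work Server" in line
        unit = UNIT_RE.search(line)
        if unit:
            units[unit.group(1)] += 1
    return endpoints, units


if __name__ == "__main__":
    # Hypothetical usage: FAHlog.txt in the current folder.
    with open("FAHlog.txt", encoding="utf-8", errors="replace") as fh:
        endpoints, units = summarize_upload_failures(fh.read())
    for (host, port), count in endpoints.most_common():
        print(f"{host}:{port} failed {count} time(s)")
    for unit, count in units.most_common():
        print(f"unit {unit} failed to send {count} time(s)")
```

If every failure points at the same address, the problem is more likely with that one server, or the route to it, than with the client.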
 
I couldn't open that work server in my browser; however, the log for that server shows it up and fully functional at 18:55 PDT.
 