
SOLVED GPU issue - failing WU's "NANs detected on GPU"


gabi_golan

Member
Joined
Dec 15, 2011
Hey Team32!
My 560 Ti recently started failing some WUs.
It started pretty much right after I added the "client-type = beta" flag and updated to client version 7.2.9, but I think it's unrelated to either change: I removed the "client-type = beta" flag and reinstalled the 7.1.52 client (on Windows 7 64-bit), and the WUs keep failing.
Here is the log:
Code:
*********************** Log Started 2012-11-07T19:05:38Z ***********************
19:05:38:************************* Folding@home Client *************************
19:05:38:      Website: http://folding.stanford.edu/
19:05:38:    Copyright: (c) 2009-2012 Stanford University
19:05:38:       Author: Joseph Coffland <[email protected]>
19:05:38:         Args: --lifeline 1848 --command-port=36330
19:05:38:       Config: D:/Program Files/Software/FAH Data/config.xml
19:05:38:******************************** Build ********************************
19:05:38:      Version: 7.1.52
19:05:38:         Date: Mar 20 2012
19:05:38:         Time: 20:36:05
19:05:38:      SVN Rev: 3515
19:05:38:       Branch: fah/trunk/client
19:05:38:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
19:05:38:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE2
19:05:38:               /QaxSSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
19:05:38:     Platform: win32 Vista
19:05:38:         Bits: 32
19:05:38:         Mode: Release
19:05:38:******************************* System ********************************
19:05:38:          CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
19:05:38:       CPU ID: GenuineIntel Family 6 Model 26 Stepping 4
19:05:38:         CPUs: 8
19:05:38:       Memory: 6.00GiB
19:05:38:  Free Memory: 1.82GiB
19:05:38:      Threads: WINDOWS_THREADS
19:05:38:   On Battery: false
19:05:38:   UTC offset: 2
19:05:38:          PID: 2940
19:05:38:          CWD: D:/Program Files/Software/FAH Data
19:05:38:           OS: Windows 7 Ultimate
19:05:38:      OS Arch: AMD64
19:05:38:         GPUs: 2
19:05:38:        GPU 0: FERMI:1 GF114 [GeForce GTX 560 Ti]
19:05:38:        GPU 1: UNSUPPORTED: Rage XL (Intel Corporation)
19:05:38:         CUDA: 2.1
19:05:38:  CUDA Driver: 4010
19:05:38:Win32 Service: false
19:05:38:***********************************************************************
19:05:39:<config>
19:05:39:  <!-- FahCore Control -->
19:05:39:  <core-priority v='low'/>
19:05:39:
19:05:39:  <!-- Folding Slot Configuration -->
19:05:39:  <gpu v='true'/>
19:05:39:
19:05:39:  <!-- Network -->
19:05:39:  <proxy v=':8080'/>
19:05:39:
19:05:39:  <!-- User Information -->
19:05:39:  <passkey v='********************************'/>
19:05:39:  <team v='32'/>
19:05:39:  <user v='gabi_golan'/>
19:05:39:
19:05:39:  <!-- Folding Slots -->
19:05:39:  <slot id='0' type='GPU'>
19:05:39:    <client-type v='beta'/>
19:05:39:  </slot>
19:05:39:</config>
19:05:40:Trying to access database...
19:05:40:Successfully acquired database lock
19:05:40:Enabled folding slot 00: READY gpu:0:"GF114 [GeForce GTX 560 Ti]"
19:05:40:WU01:FS00:Starting
19:05:40:WU01:FS00:Running FahCore: "D:\Program Files\Software\FAHClient 7.1.52 back from 7.2/FAHCoreWrapper.exe" "D:/Program Files/Software/FAH Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 01 -suffix 01 -version 701 -lifeline 2940 -checkpoint 15 -gpu 0
19:05:40:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
19:05:40:WU01:FS00:Started FahCore on PID 4872
19:05:41:WU01:FS00:Core PID:5964
19:05:41:WU01:FS00:FahCore 0x15 started
19:05:42:WU01:FS00:0x15:
19:05:42:WU01:FS00:0x15:*------------------------------*
19:05:42:WU01:FS00:0x15:Folding@Home GPU Core
19:05:42:WU01:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
19:05:42:WU01:FS00:0x15:Build host             AmoebaRemote
19:05:42:WU01:FS00:0x15:Board Type             NVIDIA/CUDA
19:05:42:WU01:FS00:0x15:Core                   15
19:05:42:WU01:FS00:0x15:
19:05:42:WU01:FS00:0x15:Window's signal control handler registered.
19:05:42:WU01:FS00:0x15:Preparing to commence simulation
19:05:42:WU01:FS00:0x15:- Looking at optimizations...
19:05:42:WU01:FS00:0x15:- Files status OK
19:05:42:WU01:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
19:05:42:WU01:FS00:0x15:- Expanded 126495 -> 507182 (decompressed 400.9 percent)
19:05:42:WU01:FS00:0x15:Called DecompressByteArray: compressed_data_size=126495 data_size=507182, decompressed_data_size=507182 diff=0
19:05:42:WU01:FS00:0x15:- Digital signature verified
19:05:42:WU01:FS00:0x15:
19:05:42:WU01:FS00:0x15:Project: 7624 (Run 353, Clone 0, Gen 87)
19:05:42:WU01:FS00:0x15:
19:05:42:WU01:FS00:0x15:Assembly optimizations on if available.
19:05:42:WU01:FS00:0x15:Entering M.D.
19:05:43:WU01:FS00:0x15:Will resume from checkpoint file 01/wudata_01.ckp
19:05:44:WU01:FS00:0x15:Tpr hash 01/wudata_01.tpr:  397988189 3447753660 1847899078 3636130656 1093746271
19:05:44:WU01:FS00:0x15:GPU device id=0
19:05:44:WU01:FS00:0x15:Working on Protein
19:05:44:WU01:FS00:0x15:Client config unavailable.
19:05:44:WU01:FS00:0x15:Starting GUI Server
19:06:46:WU01:FS00:0x15:Resuming from checkpoint
19:06:46:WU01:FS00:0x15:fcCheckPointResume: retreived and current tpr file hash:
19:06:46:WU01:FS00:0x15:   0    397988189    397988189
19:06:46:WU01:FS00:0x15:   1   3447753660   3447753660
19:06:46:WU01:FS00:0x15:   2   1847899078   1847899078
19:06:46:WU01:FS00:0x15:   3   3636130656   3636130656
19:06:46:WU01:FS00:0x15:   4   1093746271   1093746271
19:06:46:WU01:FS00:0x15:fcCheckPointResume: file hashes same.
19:06:46:WU01:FS00:0x15:fcCheckPointResume: state restored.
19:06:47:WU01:FS00:0x15:fcCheckPointResume: name 01/wudata_01.log Verified 01/wudata_01.log
19:06:47:WU01:FS00:0x15:fcCheckPointResume: name 01/wudata_01.trr Verified 01/wudata_01.trr
19:06:47:WU01:FS00:0x15:fcCheckPointResume: name 01/wudata_01.xtc Verified 01/wudata_01.xtc
19:06:47:WU01:FS00:0x15:fcCheckPointResume: name 01/wudata_01.edr Verified 01/wudata_01.edr
19:06:47:WU01:FS00:0x15:fcCheckPointResume: state restored 2
19:06:47:WU01:FS00:0x15:Resumed from checkpoint
19:06:47:WU01:FS00:0x15:Setting checkpoint frequency: 400000
19:06:47:WU01:FS00:0x15:Completed    800001 out of 40000000 steps (2%).
19:13:55:WU01:FS00:0x15:Completed   1200000 out of 40000000 steps (3%).
19:21:04:WU01:FS00:0x15:Completed   1600000 out of 40000000 steps (4%).
19:28:13:WU01:FS00:0x15:Completed   2000000 out of 40000000 steps (5%).
19:35:22:WU01:FS00:0x15:Completed   2400000 out of 40000000 steps (6%).
19:42:31:WU01:FS00:0x15:Completed   2800000 out of 40000000 steps (7%).
19:49:40:WU01:FS00:0x15:Completed   3200000 out of 40000000 steps (8%).
19:57:26:WU01:FS00:0x15:Completed   3600000 out of 40000000 steps (9%).
20:05:03:WU01:FS00:0x15:Completed   4000000 out of 40000000 steps (10%).
20:12:39:WU01:FS00:0x15:Completed   4400000 out of 40000000 steps (11%).
20:20:27:WU01:FS00:0x15:Completed   4800000 out of 40000000 steps (12%).
20:28:12:WU01:FS00:0x15:Completed   5200000 out of 40000000 steps (13%).
20:35:58:WU01:FS00:0x15:Completed   5600000 out of 40000000 steps (14%).
20:43:30:WU01:FS00:0x15:Completed   6000000 out of 40000000 steps (15%).
20:43:30:WU01:FS00:0x15:mdrun_gpu returned 52
20:43:30:WU01:FS00:0x15:NANs detected on GPU
20:43:30:WU01:FS00:0x15:
20:43:31:WU01:FS00:0x15:Folding@home Core Shutdown: UNSTABLE_MACHINE
20:43:31:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
20:43:31:WU01:FS00:Starting
20:43:31:WU01:FS00:Running FahCore: "D:\Program Files\Software\FAHClient 7.1.52 back from 7.2/FAHCoreWrapper.exe" "D:/Program Files/Software/FAH Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 01 -suffix 01 -version 701 -lifeline 2940 -checkpoint 15 -gpu 0
20:43:32:WU01:FS00:Started FahCore on PID 4748
20:43:33:WU01:FS00:Core PID:5396
20:43:33:WU01:FS00:FahCore 0x15 started
20:43:34:WU01:FS00:0x15:
20:43:34:WU01:FS00:0x15:*------------------------------*
20:43:34:WU01:FS00:0x15:Folding@Home GPU Core
20:43:34:WU01:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
20:43:34:WU01:FS00:0x15:Build host             AmoebaRemote
20:43:34:WU01:FS00:0x15:Board Type             NVIDIA/CUDA
20:43:34:WU01:FS00:0x15:Core                   15
20:43:34:WU01:FS00:0x15:
20:43:34:WU01:FS00:0x15:Window's signal control handler registered.
20:43:34:WU01:FS00:0x15:Preparing to commence simulation
20:43:34:WU01:FS00:0x15:- Looking at optimizations...
20:43:34:WU01:FS00:0x15:DeleteFrameFiles: successfully deleted file=01/wudata_01.ckp
20:43:34:WU01:FS00:0x15:- Created dyn
20:43:34:WU01:FS00:0x15:- Files status OK
20:43:34:WU01:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
20:43:35:WU01:FS00:0x15:- Expanded 126495 -> 507182 (decompressed 400.9 percent)
20:43:35:WU01:FS00:0x15:Called DecompressByteArray: compressed_data_size=126495 data_size=507182, decompressed_data_size=507182 diff=0
20:43:35:WU01:FS00:0x15:- Digital signature verified
20:43:35:WU01:FS00:0x15:
20:43:35:WU01:FS00:0x15:Project: 7624 (Run 353, Clone 0, Gen 87)
20:43:35:WU01:FS00:0x15:
20:43:35:WU01:FS00:0x15:Assembly optimizations on if available.
20:43:35:WU01:FS00:0x15:Entering M.D.
20:43:36:WU01:FS00:0x15:Tpr hash 01/wudata_01.tpr:  397988189 3447753660 1847899078 3636130656 1093746271
20:43:36:WU01:FS00:0x15:GPU device id=0
20:43:36:WU01:FS00:0x15:Working on Protein
20:43:36:WU01:FS00:0x15:Client config unavailable.
20:43:36:WU01:FS00:0x15:Starting GUI Server
20:44:41:WU01:FS00:0x15:Setting checkpoint frequency: 400000
20:44:41:WU01:FS00:0x15:Completed         3 out of 40000000 steps (0%).
20:44:42:WARNING:WU01:FS00:Detected clock skew, adjusting time estimates
20:52:26:WU01:FS00:0x15:Completed    400000 out of 40000000 steps (1%).
21:00:13:WU01:FS00:0x15:Completed    800000 out of 40000000 steps (2%).
21:07:48:WU01:FS00:0x15:Completed   1200000 out of 40000000 steps (3%).
Does anyone know what it could be?
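For anyone digging through a long log like the one above, here is a minimal Python sketch that lists each core exit that was not FINISHED_UNIT together with the last progress line seen for that WU. The regular expressions are written only against the line formats visible in this log, and the function name `find_failures` is made up for the example; other client versions may format lines differently.

```python
import re

# Core progress lines, e.g.
# "20:43:30:WU01:FS00:0x15:Completed   6000000 out of 40000000 steps (15%)."
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:0x\w+:Completed\s+(\d+) out of (\d+) steps"
)
# Client lines, e.g.
# "20:43:31:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)"
RETURNED = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:FahCore returned: (\w+)")


def find_failures(lines):
    """Return one dict per core exit that is not FINISHED_UNIT."""
    last_progress = {}  # WU id -> (steps done, total steps)
    failures = []
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            last_progress[m.group(2)] = (int(m.group(3)), int(m.group(4)))
            continue
        m = RETURNED.match(line)
        if m and m.group(3) != "FINISHED_UNIT":
            done, total = last_progress.get(m.group(2), (0, 0))
            failures.append({
                "time": m.group(1),
                "wu": m.group(2),
                "status": m.group(3),
                "done": done,
                "total": total,
            })
    return failures


# Three lines copied from the log above.
sample = [
    "20:43:30:WU01:FS00:0x15:Completed   6000000 out of 40000000 steps (15%).",
    "20:43:30:WU01:FS00:0x15:NANs detected on GPU",
    "20:43:31:WU01:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)",
]
for f in find_failures(sample):
    pct = 100 * f["done"] // f["total"] if f["total"] else 0
    print(f"{f['time']} {f['wu']} {f['status']} after {f['done']}/{f['total']} steps ({pct}%)")
```

To run it on a real log, pass the lines of the log file instead of `sample`. Comparing how far each WU got before failing can hint at whether the failures cluster after a certain run time.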
 
Could the temperature be the reason for the WUs failing?
The temp was indeed hotter than usual: the max was 77°C, and it usually didn't go higher than 70-73°C. 77°C seems too high.
 
77°C is too hot, imo. Can you ease off on the speed?
 
What do you mean? I didn't understand.

I've restarted the client, deleted all the configuration data, and reconfigured it. I also cleared the RealTemp monitor history, and now I see that the max GPU temp overnight was 70°C. Still, I got one 8054 WU finished, and another WU (I can't find the WU number in the log) failed with the same problem. The log:
Code:
23:20:26:Saving configuration to config.xml
23:20:26:<config>
23:20:26:  <!-- Folding Slot Configuration -->
23:20:26:  <gpu v='true'/>
23:20:26:  <smp v='false'/>
23:20:26:
23:20:26:  <!-- Network -->
23:20:26:  <proxy v=':8080'/>
23:20:26:
23:20:26:  <!-- User Information -->
23:20:26:  <passkey v='********************************'/>
23:20:26:  <team v='32'/>
23:20:26:  <user v='gabi_golan'/>
23:20:26:
23:20:26:  <!-- Folding Slots -->
23:20:26:  <slot id='0' type='GPU'>
23:20:26:    <client-type v='beta'/>
23:20:26:  </slot>
23:20:26:</config>
23:21:18:WU00:FS00:0x15:Setting checkpoint frequency: 500000
23:21:18:WU00:FS00:0x15:Completed         3 out of 50000000 steps (0%).
23:23:33:WU00:FS00:0x15:Completed    500000 out of 50000000 steps (1%).
23:25:47:WU00:FS00:0x15:Completed   1000000 out of 50000000 steps (2%).
23:28:01:WU00:FS00:0x15:Completed   1500000 out of 50000000 steps (3%).
23:30:14:WU00:FS00:0x15:Completed   2000000 out of 50000000 steps (4%).
23:32:28:WU00:FS00:0x15:Completed   2500000 out of 50000000 steps (5%).
23:34:42:WU00:FS00:0x15:Completed   3000000 out of 50000000 steps (6%).
23:36:56:WU00:FS00:0x15:Completed   3500000 out of 50000000 steps (7%).
23:39:10:WU00:FS00:0x15:Completed   4000000 out of 50000000 steps (8%).
23:41:24:WU00:FS00:0x15:Completed   4500000 out of 50000000 steps (9%).
23:43:38:WU00:FS00:0x15:Completed   5000000 out of 50000000 steps (10%).
23:45:51:WU00:FS00:0x15:Completed   5500000 out of 50000000 steps (11%).
23:48:05:WU00:FS00:0x15:Completed   6000000 out of 50000000 steps (12%).
23:50:19:WU00:FS00:0x15:Completed   6500000 out of 50000000 steps (13%).
23:52:33:WU00:FS00:0x15:Completed   7000000 out of 50000000 steps (14%).
23:54:47:WU00:FS00:0x15:Completed   7500000 out of 50000000 steps (15%).
23:57:01:WU00:FS00:0x15:Completed   8000000 out of 50000000 steps (16%).
23:59:14:WU00:FS00:0x15:Completed   8500000 out of 50000000 steps (17%).
00:01:28:WU00:FS00:0x15:Completed   9000000 out of 50000000 steps (18%).
00:03:42:WU00:FS00:0x15:Completed   9500000 out of 50000000 steps (19%).
00:05:56:WU00:FS00:0x15:Completed  10000000 out of 50000000 steps (20%).
00:08:10:WU00:FS00:0x15:Completed  10500000 out of 50000000 steps (21%).
00:10:23:WU00:FS00:0x15:Completed  11000000 out of 50000000 steps (22%).
00:12:37:WU00:FS00:0x15:Completed  11500000 out of 50000000 steps (23%).
00:14:51:WU00:FS00:0x15:Completed  12000000 out of 50000000 steps (24%).
00:17:05:WU00:FS00:0x15:Completed  12500000 out of 50000000 steps (25%).
00:19:19:WU00:FS00:0x15:Completed  13000000 out of 50000000 steps (26%).
00:21:33:WU00:FS00:0x15:Completed  13500000 out of 50000000 steps (27%).
00:23:47:WU00:FS00:0x15:Completed  14000000 out of 50000000 steps (28%).
00:26:00:WU00:FS00:0x15:Completed  14500000 out of 50000000 steps (29%).
00:28:14:WU00:FS00:0x15:Completed  15000000 out of 50000000 steps (30%).
00:30:28:WU00:FS00:0x15:Completed  15500000 out of 50000000 steps (31%).
00:32:42:WU00:FS00:0x15:Completed  16000000 out of 50000000 steps (32%).
00:34:56:WU00:FS00:0x15:Completed  16500000 out of 50000000 steps (33%).
00:37:10:WU00:FS00:0x15:Completed  17000000 out of 50000000 steps (34%).
00:39:23:WU00:FS00:0x15:Completed  17500000 out of 50000000 steps (35%).
00:41:37:WU00:FS00:0x15:Completed  18000000 out of 50000000 steps (36%).
00:43:51:WU00:FS00:0x15:Completed  18500000 out of 50000000 steps (37%).
00:46:05:WU00:FS00:0x15:Completed  19000000 out of 50000000 steps (38%).
00:48:18:WU00:FS00:0x15:Completed  19500000 out of 50000000 steps (39%).
00:50:32:WU00:FS00:0x15:Completed  20000000 out of 50000000 steps (40%).
00:52:46:WU00:FS00:0x15:Completed  20500000 out of 50000000 steps (41%).
00:55:00:WU00:FS00:0x15:Completed  21000000 out of 50000000 steps (42%).
00:57:14:WU00:FS00:0x15:Completed  21500000 out of 50000000 steps (43%).
00:59:28:WU00:FS00:0x15:Completed  22000000 out of 50000000 steps (44%).
01:01:41:WU00:FS00:0x15:Completed  22500000 out of 50000000 steps (45%).
01:03:55:WU00:FS00:0x15:Completed  23000000 out of 50000000 steps (46%).
01:06:09:WU00:FS00:0x15:Completed  23500000 out of 50000000 steps (47%).
01:08:23:WU00:FS00:0x15:Completed  24000000 out of 50000000 steps (48%).
01:10:37:WU00:FS00:0x15:Completed  24500000 out of 50000000 steps (49%).
01:12:50:WU00:FS00:0x15:Completed  25000000 out of 50000000 steps (50%).
01:15:04:WU00:FS00:0x15:Completed  25500000 out of 50000000 steps (51%).
01:17:18:WU00:FS00:0x15:Completed  26000000 out of 50000000 steps (52%).
01:19:32:WU00:FS00:0x15:Completed  26500000 out of 50000000 steps (53%).
01:21:33:WU00:FS00:0x15:Completed  27000000 out of 50000000 steps (54%).
01:21:33:WU00:FS00:0x15:mdrun_gpu returned 52
01:21:33:WU00:FS00:0x15:NANs detected on GPU
01:21:33:WU00:FS00:0x15:
01:21:33:WU00:FS00:0x15:Folding@home Core Shutdown: UNSTABLE_MACHINE
01:21:33:WU00:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
01:21:34:WU00:FS00:Starting
01:21:34:WU00:FS00:Running FahCore: "D:\Program Files\Software\FAHClient\FAHClient 7.1.52/FAHCoreWrapper.exe" "D:/Program Files/Software/FAH Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe" -dir 00 -suffix 01 -version 701 -lifeline 3916 -checkpoint 15 -gpu 0
01:21:34:WU00:FS00:Started FahCore on PID 4428
01:21:35:WU00:FS00:Core PID:6052
01:21:35:WU00:FS00:FahCore 0x15 started
01:21:35:WU00:FS00:0x15:
01:21:35:WU00:FS00:0x15:*------------------------------*
01:21:35:WU00:FS00:0x15:Folding@Home GPU Core
01:21:35:WU00:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
01:21:35:WU00:FS00:0x15:Build host             AmoebaRemote
01:21:35:WU00:FS00:0x15:Board Type             NVIDIA/CUDA
01:21:35:WU00:FS00:0x15:Core                   15
01:21:35:WU00:FS00:0x15:
01:21:35:WU00:FS00:0x15:Window's signal control handler registered.
01:21:35:WU00:FS00:0x15:Preparing to commence simulation
01:21:35:WU00:FS00:0x15:- Looking at optimizations...
01:21:35:WU00:FS00:0x15:DeleteFrameFiles: successfully deleted file=00/wudata_01.ckp
01:21:35:WU00:FS00:0x15:- Created dyn
01:21:35:WU00:FS00:0x15:- Files status OK
01:21:35:WU00:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
01:21:35:WU00:FS00:0x15:- Expanded 60356 -> 264278 (decompressed 437.8 percent)
01:21:36:WU00:FS00:0x15:Called DecompressByteArray: compressed_data_size=60356 data_size=264278, decompressed_data_size=264278 diff=0
01:21:36:WU00:FS00:0x15:- Digital signature verified
01:21:36:WU00:FS00:0x15:
01:21:36:WU00:FS00:0x15:Project: 8054 (Run 0, Clone 99, Gen 94)
01:21:36:WU00:FS00:0x15:
01:21:36:WU00:FS00:0x15:Assembly optimizations on if available.
01:21:36:WU00:FS00:0x15:Entering M.D.
01:21:37:WU00:FS00:0x15:Tpr hash 00/wudata_01.tpr:  2058243760 748406261 3247347251 2225436956 3372398577
01:21:37:WU00:FS00:0x15:GPU device id=0
01:21:37:WU00:FS00:0x15:Working on Good ROcking Metal Altar for Chronical Sinners
01:21:37:WU00:FS00:0x15:Client config unavailable.
01:21:38:WU00:FS00:0x15:Starting GUI Server
01:22:39:WU00:FS00:0x15:Setting checkpoint frequency: 500000
01:22:39:WU00:FS00:0x15:Completed         3 out of 50000000 steps (0%).
01:22:40:WARNING:WU00:FS00:Detected clock skew, adjusting time estimates
01:24:53:WU00:FS00:0x15:Completed    500000 out of 50000000 steps (1%).
01:27:06:WU00:FS00:0x15:Completed   1000000 out of 50000000 steps (2%).
01:29:20:WU00:FS00:0x15:Completed   1500000 out of 50000000 steps (3%).
01:31:34:WU00:FS00:0x15:Completed   2000000 out of 50000000 steps (4%).
01:33:48:WU00:FS00:0x15:Completed   2500000 out of 50000000 steps (5%).
01:36:02:WU00:FS00:0x15:Completed   3000000 out of 50000000 steps (6%).
01:38:16:WU00:FS00:0x15:Completed   3500000 out of 50000000 steps (7%).
01:40:30:WU00:FS00:0x15:Completed   4000000 out of 50000000 steps (8%).
01:42:44:WU00:FS00:0x15:Completed   4500000 out of 50000000 steps (9%).
01:44:57:WU00:FS00:0x15:Completed   5000000 out of 50000000 steps (10%).
01:47:11:WU00:FS00:0x15:Completed   5500000 out of 50000000 steps (11%).
01:49:25:WU00:FS00:0x15:Completed   6000000 out of 50000000 steps (12%).
01:51:39:WU00:FS00:0x15:Completed   6500000 out of 50000000 steps (13%).
01:53:53:WU00:FS00:0x15:Completed   7000000 out of 50000000 steps (14%).
01:56:07:WU00:FS00:0x15:Completed   7500000 out of 50000000 steps (15%).
01:58:21:WU00:FS00:0x15:Completed   8000000 out of 50000000 steps (16%).
02:00:35:WU00:FS00:0x15:Completed   8500000 out of 50000000 steps (17%).
02:02:48:WU00:FS00:0x15:Completed   9000000 out of 50000000 steps (18%).
02:05:02:WU00:FS00:0x15:Completed   9500000 out of 50000000 steps (19%).
02:07:16:WU00:FS00:0x15:Completed  10000000 out of 50000000 steps (20%).
02:09:30:WU00:FS00:0x15:Completed  10500000 out of 50000000 steps (21%).
02:11:44:WU00:FS00:0x15:Completed  11000000 out of 50000000 steps (22%).
02:13:58:WU00:FS00:0x15:Completed  11500000 out of 50000000 steps (23%).
02:16:12:WU00:FS00:0x15:Completed  12000000 out of 50000000 steps (24%).
02:18:26:WU00:FS00:0x15:Completed  12500000 out of 50000000 steps (25%).
02:20:39:WU00:FS00:0x15:Completed  13000000 out of 50000000 steps (26%).
02:22:53:WU00:FS00:0x15:Completed  13500000 out of 50000000 steps (27%).
02:25:07:WU00:FS00:0x15:Completed  14000000 out of 50000000 steps (28%).
02:27:21:WU00:FS00:0x15:Completed  14500000 out of 50000000 steps (29%).
02:29:35:WU00:FS00:0x15:Completed  15000000 out of 50000000 steps (30%).
02:31:49:WU00:FS00:0x15:Completed  15500000 out of 50000000 steps (31%).
02:34:03:WU00:FS00:0x15:Completed  16000000 out of 50000000 steps (32%).
02:36:17:WU00:FS00:0x15:Completed  16500000 out of 50000000 steps (33%).
02:38:30:WU00:FS00:0x15:Completed  17000000 out of 50000000 steps (34%).
02:40:44:WU00:FS00:0x15:Completed  17500000 out of 50000000 steps (35%).
02:42:58:WU00:FS00:0x15:Completed  18000000 out of 50000000 steps (36%).
02:45:12:WU00:FS00:0x15:Completed  18500000 out of 50000000 steps (37%).
02:47:26:WU00:FS00:0x15:Completed  19000000 out of 50000000 steps (38%).
02:49:40:WU00:FS00:0x15:Completed  19500000 out of 50000000 steps (39%).
02:51:54:WU00:FS00:0x15:Completed  20000000 out of 50000000 steps (40%).
02:54:08:WU00:FS00:0x15:Completed  20500000 out of 50000000 steps (41%).
02:56:21:WU00:FS00:0x15:Completed  21000000 out of 50000000 steps (42%).
02:58:35:WU00:FS00:0x15:Completed  21500000 out of 50000000 steps (43%).
03:00:49:WU00:FS00:0x15:Completed  22000000 out of 50000000 steps (44%).
03:03:03:WU00:FS00:0x15:Completed  22500000 out of 50000000 steps (45%).
03:05:16:WU00:FS00:0x15:Completed  23000000 out of 50000000 steps (46%).
03:07:30:WU00:FS00:0x15:Completed  23500000 out of 50000000 steps (47%).
03:09:44:WU00:FS00:0x15:Completed  24000000 out of 50000000 steps (48%).
03:11:58:WU00:FS00:0x15:Completed  24500000 out of 50000000 steps (49%).
03:14:12:WU00:FS00:0x15:Completed  25000000 out of 50000000 steps (50%).
03:16:26:WU00:FS00:0x15:Completed  25500000 out of 50000000 steps (51%).
03:18:40:WU00:FS00:0x15:Completed  26000000 out of 50000000 steps (52%).
03:20:53:WU00:FS00:0x15:Completed  26500000 out of 50000000 steps (53%).
03:23:07:WU00:FS00:0x15:Completed  27000000 out of 50000000 steps (54%).
03:25:21:WU00:FS00:0x15:Completed  27500000 out of 50000000 steps (55%).
03:27:35:WU00:FS00:0x15:Completed  28000000 out of 50000000 steps (56%).
03:29:48:WU00:FS00:0x15:Completed  28500000 out of 50000000 steps (57%).
03:32:02:WU00:FS00:0x15:Completed  29000000 out of 50000000 steps (58%).
03:34:16:WU00:FS00:0x15:Completed  29500000 out of 50000000 steps (59%).
03:36:30:WU00:FS00:0x15:Completed  30000000 out of 50000000 steps (60%).
03:38:44:WU00:FS00:0x15:Completed  30500000 out of 50000000 steps (61%).
03:40:58:WU00:FS00:0x15:Completed  31000000 out of 50000000 steps (62%).
03:43:11:WU00:FS00:0x15:Completed  31500000 out of 50000000 steps (63%).
03:45:25:WU00:FS00:0x15:Completed  32000000 out of 50000000 steps (64%).
03:47:39:WU00:FS00:0x15:Completed  32500000 out of 50000000 steps (65%).
03:49:53:WU00:FS00:0x15:Completed  33000000 out of 50000000 steps (66%).
03:52:07:WU00:FS00:0x15:Completed  33500000 out of 50000000 steps (67%).
03:54:21:WU00:FS00:0x15:Completed  34000000 out of 50000000 steps (68%).
03:56:35:WU00:FS00:0x15:Completed  34500000 out of 50000000 steps (69%).
03:58:49:WU00:FS00:0x15:Completed  35000000 out of 50000000 steps (70%).
04:01:03:WU00:FS00:0x15:Completed  35500000 out of 50000000 steps (71%).
04:03:16:WU00:FS00:0x15:Completed  36000000 out of 50000000 steps (72%).
04:05:30:WU00:FS00:0x15:Completed  36500000 out of 50000000 steps (73%).
04:07:44:WU00:FS00:0x15:Completed  37000000 out of 50000000 steps (74%).
04:09:58:WU00:FS00:0x15:Completed  37500000 out of 50000000 steps (75%).
04:12:12:WU00:FS00:0x15:Completed  38000000 out of 50000000 steps (76%).
04:14:25:WU00:FS00:0x15:Completed  38500000 out of 50000000 steps (77%).
04:16:39:WU00:FS00:0x15:Completed  39000000 out of 50000000 steps (78%).
04:18:53:WU00:FS00:0x15:Completed  39500000 out of 50000000 steps (79%).
04:21:07:WU00:FS00:0x15:Completed  40000000 out of 50000000 steps (80%).
04:23:21:WU00:FS00:0x15:Completed  40500000 out of 50000000 steps (81%).
04:25:35:WU00:FS00:0x15:Completed  41000000 out of 50000000 steps (82%).
04:27:48:WU00:FS00:0x15:Completed  41500000 out of 50000000 steps (83%).
04:30:02:WU00:FS00:0x15:Completed  42000000 out of 50000000 steps (84%).
04:32:16:WU00:FS00:0x15:Completed  42500000 out of 50000000 steps (85%).
04:34:30:WU00:FS00:0x15:Completed  43000000 out of 50000000 steps (86%).
04:36:44:WU00:FS00:0x15:Completed  43500000 out of 50000000 steps (87%).
04:38:58:WU00:FS00:0x15:Completed  44000000 out of 50000000 steps (88%).
04:41:11:WU00:FS00:0x15:Completed  44500000 out of 50000000 steps (89%).
04:43:25:WU00:FS00:0x15:Completed  45000000 out of 50000000 steps (90%).
04:45:39:WU00:FS00:0x15:Completed  45500000 out of 50000000 steps (91%).
04:47:53:WU00:FS00:0x15:Completed  46000000 out of 50000000 steps (92%).
04:50:07:WU00:FS00:0x15:Completed  46500000 out of 50000000 steps (93%).
04:52:20:WU00:FS00:0x15:Completed  47000000 out of 50000000 steps (94%).
04:54:34:WU00:FS00:0x15:Completed  47500000 out of 50000000 steps (95%).
04:56:48:WU00:FS00:0x15:Completed  48000000 out of 50000000 steps (96%).
04:59:02:WU00:FS00:0x15:Completed  48500000 out of 50000000 steps (97%).
05:01:16:WU00:FS00:0x15:Completed  49000000 out of 50000000 steps (98%).
05:03:30:WU00:FS00:0x15:Completed  49500000 out of 50000000 steps (99%).
05:03:30:WU01:FS00:Connecting to assign-GPU.stanford.edu:80
05:03:31:WU01:FS00:News: Welcome to Folding@Home
05:03:31:WU01:FS00:Assigned to work server 171.67.108.142
05:03:31:WU01:FS00:Requesting new work unit for slot 00: RUNNING gpu:0:"GF114 [GeForce GTX 560 Ti]" from 171.67.108.142
05:03:31:WU01:FS00:Connecting to 171.67.108.142:8080
05:03:33:WU01:FS00:Downloading 142.70KiB
05:03:34:WU01:FS00:Download complete
05:03:34:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8018 run:1160 clone:0 gen:24 core:0x15 unit:0x000000216953ee2e500f1f55fa41d0f5
05:03:34:WU01:FS00:Downloading core from http://www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/beta/Core_15.fah
05:03:34:WU01:FS00:Connecting to www.stanford.edu:80
05:03:36:WU01:FS00:FahCore 15: Downloading 1.88MiB
05:03:42:WU01:FS00:FahCore 15: 36.65%
05:03:48:WU01:FS00:FahCore 15: 73.31%
05:03:52:WU01:FS00:FahCore 15: Download complete
05:03:52:WU01:FS00:Valid core signature
05:03:52:WU01:FS00:Unpacked 7.71MiB to cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/beta/Core_15.fah/FahCore_15.exe
05:03:52:WU01:FS00:Downloading project 8018 description
05:03:52:WU01:FS00:Connecting to fah-web.stanford.edu:80
05:03:53:WU01:FS00:Project 8018 description downloaded successfully
05:05:44:WU00:FS00:0x15:Completed  50000000 out of 50000000 steps (100%).
05:05:44:WU00:FS00:0x15:Finished fah_main status=0
05:05:44:WU00:FS00:0x15:Successful run
05:05:44:WU00:FS00:0x15:DynamicWrapper: Finished Work Unit: sleep=10000
05:05:54:WU00:FS00:0x15:Reserved 330060 bytes for xtc file; Cosm status=0
05:05:54:WU00:FS00:0x15:Allocated 330060 bytes for xtc file
05:05:54:WU00:FS00:0x15:- Reading up to 330060 from "00/wudata_01.xtc": Read 330060
05:05:54:WU00:FS00:0x15:Read 330060 bytes from xtc file; available packet space=786100404
05:05:54:WU00:FS00:0x15:xtc file hash check passed.
05:05:54:WU00:FS00:0x15:Reserved 20616 20616 786100404 bytes for arc file=<00/wudata_01.trr> Cosm status=0
05:05:54:WU00:FS00:0x15:Allocated 20616 bytes for arc file
05:05:54:WU00:FS00:0x15:- Reading up to 20616 from "00/wudata_01.trr": Read 20616
05:05:54:WU00:FS00:0x15:Read 20616 bytes from arc file; available packet space=786079788
05:05:54:WU00:FS00:0x15:trr file hash check passed.
05:05:54:WU00:FS00:0x15:Allocated 544 bytes for edr file
05:05:54:WU00:FS00:0x15:Read bedfile
05:05:54:WU00:FS00:0x15:edr file hash check passed.
05:05:54:WU00:FS00:0x15:Allocated 36779 bytes for logfile
05:05:54:WU00:FS00:0x15:Read logfile
05:05:54:WU00:FS00:0x15:GuardedRun: success in DynamicWrapper
05:05:54:WU00:FS00:0x15:GuardedRun: done
05:05:54:WU00:FS00:0x15:Run: GuardedRun completed.
05:05:57:WU00:FS00:0x15:+ Opened results file
05:05:57:WU00:FS00:0x15:- Writing 388511 bytes of core data to disk...
05:05:57:WU00:FS00:0x15:Done: 387999 -> 357035 (compressed to 92.0 percent)
05:05:57:WU00:FS00:0x15:  ... Done.
05:05:57:WU00:FS00:0x15:DeleteFrameFiles: successfully deleted file=00/wudata_01.ckp
05:05:57:WU00:FS00:0x15:Shutting down core 
05:05:57:WU00:FS00:0x15:
05:05:57:WU00:FS00:0x15:Folding@home Core Shutdown: FINISHED_UNIT
05:05:57:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
05:05:57:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:8054 run:0 clone:99 gen:94 core:0x15 unit:0x000000656953ee2f50626a80695666af
05:05:58:WU01:FS00:Starting
05:05:58:WU00:FS00:Uploading 349.17KiB to 171.67.108.143
05:05:58:WU01:FS00:Running FahCore: "D:\Program Files\Software\FAHClient\FAHClient 7.1.52/FAHCoreWrapper.exe" "D:/Program Files/Software/FAH Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/beta/Core_15.fah/FahCore_15.exe" -dir 01 -suffix 01 -version 701 -lifeline 3916 -checkpoint 15 -gpu 0
05:05:58:WU00:FS00:Connecting to 171.67.108.143:8080
05:05:58:WU01:FS00:Started FahCore on PID 3832
05:05:59:WU01:FS00:Core PID:2288
05:05:59:WU01:FS00:FahCore 0x15 started
05:05:59:WU01:FS00:0x15:
05:05:59:WU01:FS00:0x15:*------------------------------*
05:05:59:WU01:FS00:0x15:Folding@Home GPU Core
05:05:59:WU01:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
05:05:59:WU01:FS00:0x15:Build host             AmoebaRemote
05:05:59:WU01:FS00:0x15:Board Type             NVIDIA/CUDA
05:05:59:WU01:FS00:0x15:Core                   15
05:05:59:WU01:FS00:0x15:
05:05:59:WU01:FS00:0x15:Window's signal control handler registered.
05:05:59:WU01:FS00:0x15:Preparing to commence simulation
05:05:59:WU01:FS00:0x15:- Looking at optimizations...
05:05:59:WU01:FS00:0x15:DeleteFrameFiles: successfully deleted file=01/wudata_01.ckp
05:05:59:WU01:FS00:0x15:- Created dyn
05:05:59:WU01:FS00:0x15:- Files status OK
05:05:59:WU01:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
05:05:59:WU01:FS00:0x15:- Expanded 145610 -> 660986 (decompressed 453.9 percent)
05:06:00:WU01:FS00:0x15:Called DecompressByteArray: compressed_data_size=145610 data_size=660986, decompressed_data_size=660986 diff=0
05:06:00:WU01:FS00:0x15:- Digital signature verified
05:06:00:WU01:FS00:0x15:
05:06:00:WU01:FS00:0x15:Project: 8018 (Run 1160, Clone 0, Gen 24)
05:06:00:WU01:FS00:0x15:
05:06:00:WU01:FS00:0x15:Assembly optimizations on if available.
05:06:00:WU01:FS00:0x15:Entering M.D.
05:06:01:WU01:FS00:0x15:Tpr hash 01/wudata_01.tpr:  3722694100 640778057 213910449 3799536088 3063139591
05:06:01:WU01:FS00:0x15:GPU device id=0
05:06:01:WU01:FS00:0x15:Working on GRowing Old MAkes el Chrono Sweat
05:06:01:WU01:FS00:0x15:Client config unavailable.
05:06:01:WU01:FS00:0x15:Starting GUI Server
05:06:05:WU00:FS00:Upload 91.65%
05:06:06:WU00:FS00:Upload complete
05:06:06:WU00:FS00:Server responded WORK_ACK (400)
05:06:06:WU00:FS00:Final credit estimate, 2387.00 points
05:06:07:WU00:FS00:Cleaning up
05:07:04:WU01:FS00:0x15:Setting checkpoint frequency: 250000
05:07:04:WU01:FS00:0x15:Completed         3 out of 25000000 steps (0%).
05:12:33:WU01:FS00:0x15:Completed    250000 out of 25000000 steps (1%).
05:18:03:WU01:FS00:0x15:Completed    500000 out of 25000000 steps (2%).
******************************** Date: 08/11/12 ********************************
05:23:32:WU01:FS00:0x15:Completed    750000 out of 25000000 steps (3%).
05:29:02:WU01:FS00:0x15:Completed   1000000 out of 25000000 steps (4%).
05:34:32:WU01:FS00:0x15:Completed   1250000 out of 25000000 steps (5%).
05:40:02:WU01:FS00:0x15:Completed   1500000 out of 25000000 steps (6%).
05:45:31:WU01:FS00:0x15:Completed   1750000 out of 25000000 steps (7%).
05:51:01:WU01:FS00:0x15:Completed   2000000 out of 25000000 steps (8%).
05:56:31:WU01:FS00:0x15:Completed   2250000 out of 25000000 steps (9%).
06:02:00:WU01:FS00:0x15:Completed   2500000 out of 25000000 steps (10%).
06:07:30:WU01:FS00:0x15:Completed   2750000 out of 25000000 steps (11%).
06:13:00:WU01:FS00:0x15:Completed   3000000 out of 25000000 steps (12%).
06:18:30:WU01:FS00:0x15:Completed   3250000 out of 25000000 steps (13%).
06:24:00:WU01:FS00:0x15:Completed   3500000 out of 25000000 steps (14%).
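As a side note, the progress lines in a log like the one above can be used to estimate the time per 1% of the WU (often called TPF), which helps to spot a card that has slowed down or stalled. The following is only a rough sketch: the regular expression is written against the line format seen in this log, and the function names (`parse_progress`, `seconds_per_percent`, `average_tpf`) are made up for this example and are not part of the F@h client.

```python
import re
from datetime import datetime

# A few progress lines copied from the log above.
LOG = """\
05:12:33:WU01:FS00:0x15:Completed    250000 out of 25000000 steps (1%).
05:18:03:WU01:FS00:0x15:Completed    500000 out of 25000000 steps (2%).
05:23:32:WU01:FS00:0x15:Completed    750000 out of 25000000 steps (3%).
05:29:02:WU01:FS00:0x15:Completed   1000000 out of 25000000 steps (4%).
"""

# Assumed line format: HH:MM:SS:WUnn:FSnn:0xNN:Completed <done> out of <total> steps (<pct>%).
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x\w+:"
    r"Completed\s+(\d+) out of (\d+) steps \((\d+)%\)"
)


def parse_progress(text):
    """Return a list of (timestamp, percent) tuples for every progress line."""
    entries = []
    for line in text.splitlines():
        match = PROGRESS.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            entries.append((stamp, int(match.group(4))))
    return entries


def seconds_per_percent(entries):
    """Seconds spent per 1% between each pair of consecutive progress lines."""
    deltas = []
    for (t0, p0), (t1, p1) in zip(entries, entries[1:]):
        elapsed = (t1 - t0).total_seconds()
        if elapsed < 0:
            # The log only has a time of day, so assume we crossed midnight.
            elapsed += 24 * 60 * 60
        if p1 > p0:
            deltas.append(elapsed / (p1 - p0))
    return deltas


def average_tpf(text):
    """Average seconds per 1%, or None if there are not enough progress lines."""
    deltas = seconds_per_percent(parse_progress(text))
    return sum(deltas) / len(deltas) if deltas else None


if __name__ == "__main__":
    tpf = average_tpf(LOG)
    print(f"Average time per 1%: {tpf:.0f} s")
```

For the four lines above this gives about 5.5 minutes per 1%, which matches the spacing of the timestamps in the log.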
The temp didn't go over 70C - that seems alright, so maybe temperature is not the reason for the failing WUs.
 
You can force the GPU fan to run at max speed to cool the GPU. EVGA has Precision X, which I run on my GTS 450s with the fans at 70% PWM or more. I'm sure other makers of GT, GTS and GTX GPUs have something similar. When I was playing a game like EverQuest (EverCrack :) ) I saw that my temp was over 60C, which to me is too hot; after maxing the fan speed, it doesn't go over 45C.
 
WhitehawkEQ, I've set the fan to spin at 100% above 70C with MSI Afterburner. I had a fan issue on this GPU (1 of the 2 fans spins slower and is harder to turn when I touch it, even though it wasn't clogged and I clean the dust occasionally), so I've added 2 fans around the GPU plus one 12cm fan blowing toward the GPU fans (from below the GPU, in an upward direction) - this way I get max temps of 70-72C.

I'm back from work now and I see more failed WUs, not only 7624 but also 8018 (I thought that only 7624 was failing).

I've read on EVGA's forum that the "NaNs (not a number)" error can occur on the GPU for 2 reasons:
1) An unstable OC of the GPU - some WUs are sensitive to OC'ed cards (they fail if the clocks are not stock)
- in my case, the card has a small factory OC, but it has been stable all the time. If everything else is unsuccessful I may try setting it to stock clocks and voltages - but that is kept as a last resort.

2) The advmethods flag being set for the GPU (the suggested fix is to disable it)
- I don't remember setting it at any time in the past, and I also can't find that option in the v7 client configuration.

Also, I've now noticed that after the update to 7.2.9 I didn't set the "Slightly Higher" F@h core priority - I've set it to "Slightly Higher" now and will see what it does. Maybe THAT was the reason for the "Unstable Machine" error.
 
Update: WUs still fail with "NaNs detected on GPU" (project 8018). Now I've added the "advmethods = false" flag, as the EVGA forums suggested - will see what it does. Later today I'll clean the dust out of the computer, although I still don't think that's the reason for the failures, as the max temp when folding was 70C and the GPU is not clogged with dust.
 
Unfortunately, I'll have to stop GPU folding.
I have checked the GPU fans - they were both clogged and didn't spin right (they have strong resistance to spinning). I've tried to clean them with a brush and a hair dryer (without the heat) - dust came out of the fan base, but they still don't spin well.

So, I removed the original GPU fans, remounted the heatsink (using the thermal compound from the Noctua NH-D14) and fixed 2 120mm fans under the GPU (one is 110 CFM, the other a bit lower) - no luck - I get even higher temps, up to 80-82C when folding, so I'll have to stop folding on the GPU, at least for now.

Maybe I'll look for an aftermarket cooler, like the Arctic Cooling Accelero Twin Turbo II, but I think that the WUs will still fail, because they failed at 70C max temps - these are OK temps for the 560 Ti, and I didn't get any errors with them until recently.
 
Don't feel too bad. I had an HD 5870 that simply wouldn't fold without locking up or crashing with the NaNs error that you reported. This card would play anything I threw at it, but simply wouldn't fold for squat. Anyway, no one could help me sort it out, and the manufacturer didn't even understand what my complaint was, but I finally got an RMA, received a replacement HD 5870, and guess what? It folded perfectly. Sometimes you can only know the limits of your card by folding on it.
 
Nah, I don't feel bad, and I'll keep folding on CPUs. Maybe I'll sort something out with the fans when the situation here calms down.
 
Looks like I've managed to solve the problem and fold again on the 560 Ti after downclocking it to 780 MHz - no more "NaNs" errors.
The card is a GV-N560OC-1GI with a factory OC to 900 MHz; the Nvidia default for this card is 820 MHz, but I downclocked further because of the temps issue.
Now I get 22K PPD and 73C on p80XX,
and 25K PPD and 78C on 76XX.
There may be a temps issue later in the summer, when temps here go up by 4-5C.
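
To put those numbers in perspective, here is a small sketch of the arithmetic. The clock values are the ones quoted in the post above; the summer projection assumes that GPU temperature rises one-for-one with room temperature, which is only a rough assumption and not something measured here.

```python
# Clock speeds as quoted in the post (MHz).
FACTORY_OC_MHZ = 900   # factory overclock of the card
REFERENCE_MHZ = 820    # Nvidia default as stated in the post
FOLDING_MHZ = 780      # clock that stopped the NaN errors


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = lower)."""
    return (new - old) / old * 100.0


vs_factory = percent_change(FACTORY_OC_MHZ, FOLDING_MHZ)
vs_reference = percent_change(REFERENCE_MHZ, FOLDING_MHZ)

# Current folding temperatures (C) per project family, from the post.
current_temps = {"p80XX": 73, "76XX": 78}

# Assumption: a 4-5C warmer room adds the same 4-5C to the GPU temperature.
AMBIENT_RISE_C = (4, 5)
projected_temps = {
    project: (temp + AMBIENT_RISE_C[0], temp + AMBIENT_RISE_C[1])
    for project, temp in current_temps.items()
}

if __name__ == "__main__":
    print(f"vs factory OC: {vs_factory:.1f}%")
    print(f"vs reference:  {vs_reference:.1f}%")
    for project, (low, high) in projected_temps.items():
        print(f"{project}: {low}-{high}C in summer (rough estimate)")
```

So 780 MHz is roughly 13% below the factory OC and about 5% below the quoted reference clock, and under that assumption the 76XX projects would land in the low 80s in summer, about where folding had to be stopped earlier.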
 