
Getting aborted work units


molnarc (Registered; joined Nov 10, 2009; location: Glendale, AZ)
For the past month or so I've been getting aborted work units. I was running the 6.34 client and had several aborts. I removed all overclocking and ran memory tests for hours with no problems. I then updated to the Version 7 client, had a couple of successful SMP units, and have now had two successive aborted units. I run a Core i7 980X and two GTX 580s. I don't see many problems with the GPUs, just the CPU.

I do not have any problems with any other applications. Below is a partial log file. One failing unit is Unit 02 at 01:25:01 and the other is Unit 03 at 02:55:29.

Does anyone see any clues in the log or have any suggestions on how to correct the problem or get additional information needed to debug?

Any help would be greatly appreciated as I seem to have wasted a lot of compute time lately.

Thank you for any help.

Code:
*********************** Log Started 21/Aug/2011-06:52:31 ***********************
06:52:31:************************* Folding@home Client *************************
06:52:31:      Website: http://folding.stanford.edu/
06:52:31:    Copyright: (c) 2009,2010 Stanford University
06:52:31:       Author: Joseph Coffland <[email protected]>
06:52:31:         Args: --lifeline 4008 --command-port=36330
06:52:31:       Config: C:/ProgramData/FAHClient/config.xml
06:52:31:******************************** Build ********************************
06:52:31:      Version: 7.1.24
06:52:31:         Date: Apr 6 2011
06:52:31:         Time: 21:37:58
06:52:31:      SVN Rev: 2908
06:52:31:       Branch: fah/trunk/client
06:52:31:     Compiler: Intel(R) C++ MSVC 1500 mode 1110
06:52:31:      Options: /TP /nologo /EHa /wd4297 /wd4103 /wd1786 /Ox -arch:SSE2
06:52:31:               /QaxSSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
06:52:31:     Platform: win32 Vista
06:52:31:         Bits: 32
06:52:31:         Mode: Release
06:52:31:******************************* System ********************************
06:52:31:           OS: Microsoft Windows 7 Ultimate
06:52:31:          CPU: Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz
06:52:31:       CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
06:52:31:         CPUs: 12
06:52:31:       Memory: 23.99GiB
06:52:31:  Free Memory: 15.84GiB
06:52:31:      Threads: WINDOWS_THREADS
06:52:31:         GPUs: 2
06:52:31:        GPU 0: FERMI:1 GF110 [Geforce GTX 580]
06:52:31:        GPU 1: FERMI:1 GF110 [Geforce GTX 580]
06:52:31:         CUDA: 2.0
06:52:31:  CUDA Driver: 4000
06:52:31:   On Battery: false
06:52:31:   UTC offset: -7
06:52:31:          PID: 12600
06:52:31:          CWD: C:/ProgramData/FAHClient
06:52:31:Win32 Service: false
06:52:31:***********************************************************************
06:52:31:<config>
06:52:31:  <service-description v='Folding@home Client'/>
06:52:31:  <service-restart v='true'/>
06:52:31:  <service-restart-delay v='5000'/>
06:52:31:
06:52:31:  <!-- Client Control -->
06:52:31:  <cycle-rate v='4'/>
06:52:31:  <cycles v='-1'/>
06:52:31:  <data-directory v='.'/>
06:52:31:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
06:52:31:  <exit-when-done v='false'/>
06:52:31:  <max-delay v='21600'/>
06:52:31:  <min-delay v='60'/>
06:52:31:  <threads v='4'/>
06:52:31:
06:52:31:  <!-- Configuration -->
06:52:31:  <config-rotate v='true'/>
06:52:31:  <config-rotate-dir v='configs'/>
06:52:31:  <config-rotate-max v='16'/>
06:52:31:
06:52:31:  <!-- Debugging -->
06:52:31:  <assignment-servers>
06:52:31:    assign3.stanford.edu:8080 assign4.stanford.edu:80
06:52:31:  </assignment-servers>
06:52:31:  <capture-directory v='capture'/>
06:52:31:  <capture-sockets v='false'/>
06:52:31:  <debug-sockets v='false'/>
06:52:31:  <exception-locations v='true'/>
06:52:31:  <gpu-assignment-servers>
06:52:31:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
06:52:31:  </gpu-assignment-servers>
06:52:31:  <stack-traces v='false'/>
06:52:31:
06:52:31:  <!-- Error Handling -->
06:52:31:  <max-slot-errors v='5'/>
06:52:31:  <max-unit-errors v='5'/>
06:52:31:
06:52:31:  <!-- FahCore Control -->
06:52:31:  <checkpoint v='30'/>
06:52:31:  <core-dir v='cores'/>
06:52:31:  <core-priority v='idle'/>
06:52:31:  <cpu-affinity v='false'/>
06:52:31:  <cpu-usage v='100'/>
06:52:31:  <no-assembly v='false'/>
06:52:31:
06:52:31:  <!-- Folding Slot Configuration -->
06:52:31:  <client-subtype v='STDCLI'/>
06:52:31:  <client-type v='bigadv'/>
06:52:31:  <cpu-species v='X86_PENTIUM_II'/>
06:52:31:  <cpu-type v='X86'/>
06:52:31:  <cpus v='12'/>
06:52:31:  <extra-core-args v='-forceasm'/>
06:52:31:  <gpu v='false'/>
06:52:31:  <gpu-id v='0'/>
06:52:31:  <max-packet-size v='big'/>
06:52:31:  <os-species v='UNKNOWN'/>
06:52:31:  <os-type v='WIN32'/>
06:52:31:  <project-key v='0'/>
06:52:31:  <smp v='true'/>
06:52:31:
06:52:31:  <!-- Logging -->
06:52:31:  <log v='log.txt'/>
06:52:31:  <log-color v='false'/>
06:52:31:  <log-crlf v='true'/>
06:52:31:  <log-date v='false'/>
06:52:31:  <log-debug v='true'/>
06:52:31:  <log-domain v='false'/>
06:52:31:  <log-header v='true'/>
06:52:31:  <log-level v='true'/>
06:52:31:  <log-no-info-header v='true'/>
06:52:31:  <log-redirect v='false'/>
06:52:31:  <log-rotate v='true'/>
06:52:31:  <log-rotate-dir v='logs'/>
06:52:31:  <log-rotate-max v='16'/>
06:52:31:  <log-short-level v='false'/>
06:52:31:  <log-simple-domains v='true'/>
06:52:31:  <log-thread-id v='false'/>
06:52:31:  <log-time v='true'/>
06:52:31:  <log-to-screen v='true'/>
06:52:31:  <log-truncate v='false'/>
06:52:31:  <verbosity v='5'/>
06:52:31:
06:52:31:  <!-- Network -->
06:52:31:  <proxy v=':8080'/>
06:52:31:  <proxy-enable v='false'/>
06:52:31:  <proxy-pass v=''/>
06:52:31:  <proxy-user v=''/>
06:52:31:
06:52:31:  <!-- Process Control -->
06:52:31:  <child v='false'/>
06:52:31:  <daemon v='false'/>
06:52:31:  <pid v='false'/>
06:52:31:  <pid-file v='Folding@home Client.pid'/>
06:52:31:  <respawn v='false'/>
06:52:31:  <service v='false'/>
06:52:31:
06:52:31:  <!-- Remote Command Server -->
06:52:31:  <command-address v='0.0.0.0'/>
06:52:31:  <command-allow v='127.0.0.1'/>
06:52:31:  <command-allow-no-pass v='127.0.0.1'/>
06:52:31:  <command-deny v='0.0.0.0/0'/>
06:52:31:  <command-deny-no-pass v='0.0.0.0/0'/>
06:52:31:  <command-port v='36330'/>
06:52:31:  <password v='*******'/>
06:52:31:
06:52:31:  <!-- Slot Control -->
06:52:31:  <max-shutdown-wait v='60'/>
06:52:31:  <pause-on-battery v='false'/>
06:52:31:  <pause-on-start v='false'/>
06:52:31:
06:52:31:  <!-- User Information -->
06:52:31:  <machine-id v='0'/>
06:52:31:  <passkey v='********************************'/>
06:52:31:  <team v='32'/>
06:52:31:  <user v='Molnarc'/>
06:52:31:
06:52:31:  <!-- Work Unit Control -->
06:52:31:  <dump-after-deadline v='true'/>
06:52:31:  <max-queue v='16'/>
06:52:31:  <max-units v='0'/>
06:52:31:  <next-unit-percentage v='98'/>
06:52:31:
06:52:31:  <!-- Folding Slots -->
06:52:31:  <slot id='0' type='SMP'/>
06:52:31:  <slot id='1' type='GPU'>
06:52:31:    <gpu-id v='1'/>
06:52:31:  </slot>
06:52:31:  <slot id='2' type='GPU'/>
06:52:31:</config>
06:52:31:Trying to access database...
06:52:32:Database locked
06:52:32:Enabled folding slot 00: READY smp:12
06:52:32:Enabled folding slot 01: READY gpu:1:"GF110 [Geforce GTX 580]"
06:52:32:Enabled folding slot 02: READY gpu:0:"GF110 [Geforce GTX 580]"
06:52:32:Started thread 3 on PID 12600
06:52:32:Started thread 4 on PID 12600
06:52:32:Starting Unit 02
06:52:32:Started thread 6 on PID 12600
06:52:32:Started thread 1 on PID 12600
06:52:32:Started thread 5 on PID 12600
06:52:32:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a5.fah/FahCore_a5.exe -dir 02 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -np 12 -forceasm
06:52:32:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
06:52:32:Started thread 7 on PID 12600
06:52:32:Started core on PID 11840
06:52:32:FahCore 0xa5 started
06:52:32:Started thread 8 on PID 12600
06:52:33:Starting Unit 01
06:52:33:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 01 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -gpu 0 -forceasm
06:52:33:Unit 02:
06:52:33:Unit 02:*------------------------------*
06:52:33:Unit 02:Folding@Home Gromacs SMP Core
06:52:33:Unit 02:Version 2.27 (Mar 12, 2010)
06:52:33:Started core on PID 13844
06:52:33:Unit 02:
06:52:33:FahCore 0x15 started
06:52:33:Started thread 9 on PID 12600
06:52:33:Unit 02:Preparing to commence simulation
06:52:33:Unit 02:- Ensuring status. Please wait.
06:52:33:Starting Unit 03
06:52:33:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 03 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -gpu 1 -forceasm
06:52:33:Started core on PID 11728
06:52:33:FahCore 0x15 started
06:52:33:Started thread 10 on PID 12600
06:52:34:Unit 01:
06:52:34:Unit 01:*------------------------------*
06:52:34:Unit 01:Folding@Home GPU Core
06:52:34:Unit 01:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
06:52:34:Unit 01:Build host             SimbiosNvdWin7
06:52:34:Unit 01:Board Type             NVIDIA/CUDA
06:52:34:Unit 01:Core                   15
06:52:34:Unit 01:
06:52:34:Unit 01:Window's signal control handler registered.
06:52:34:Unit 01:Preparing to commence simulation
06:52:34:Unit 01:- Ensuring status. Please wait.
06:52:34:Unit 03:
06:52:34:Unit 03:*------------------------------*
06:52:34:Unit 03:Folding@Home GPU Core
06:52:34:Unit 03:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
06:52:34:Unit 03:Build host             SimbiosNvdWin7
06:52:34:Unit 03:Board Type             NVIDIA/CUDA
06:52:34:Unit 03:Core                   15
06:52:34:Unit 03:GPU device id          1
06:52:34:Unit 03:
06:52:34:Unit 03:Window's signal control handler registered.
06:52:34:Unit 03:Preparing to commence simulation
06:52:34:Unit 03:- Ensuring status. Please wait.
06:52:42:Unit 02:- Assembly optimizations manually forced on.
06:52:42:Unit 02:- Not checking prior termination.
06:52:43:Unit 01:- Assembly optimizations manually forced on.
06:52:43:Unit 01:- Not checking prior termination.
06:52:43:Unit 01:sizeof(CORE_PACKET_HDR) = 512 file=<>
06:52:43:Unit 01:- Expanded 124563 -> 501826 (decompressed 402.8 percent)
06:52:43:Unit 01:Called DecompressByteArray: compressed_data_size=124563 data_size=501826, decompressed_data_size=501826 diff=0
06:52:43:Unit 01:- Digital signature verified
06:52:43:Unit 01:
06:52:43:Unit 01:Project: 7620 (Run 400, Clone 0, Gen 5)
06:52:43:Unit 01:
06:52:43:Unit 01:Assembly optimizations on if available.
06:52:43:Unit 01:Entering M.D.
06:52:43:Unit 03:- Assembly optimizations manually forced on.
06:52:43:Unit 03:- Not checking prior termination.
06:52:43:Unit 03:sizeof(CORE_PACKET_HDR) = 512 file=<>
06:52:43:Unit 03:- Expanded 124249 -> 501826 (decompressed 403.8 percent)
06:52:43:Unit 03:Called DecompressByteArray: compressed_data_size=124249 data_size=501826, decompressed_data_size=501826 diff=0
06:52:43:Unit 03:- Digital signature verified
06:52:43:Unit 03:
06:52:43:Unit 03:Project: 7621 (Run 314, Clone 0, Gen 3)
06:52:43:Unit 03:
06:52:43:Unit 03:Assembly optimizations on if available.
06:52:43:Unit 03:Entering M.D.
06:52:45:Unit 01:Will resume from checkpoint file 01/wudata_01.ckp
06:52:45:Unit 01:Tpr hash 01/wudata_01.tpr:  3883752428 635684737 129387903 3290618737 831376637
06:52:45:Unit 01:calling fah_main gpuDeviceId=0
06:52:45:Unit 01:Working on Protein
06:52:45:Unit 01:Client config unavailable.
06:52:45:Unit 03:Will resume from checkpoint file 03/wudata_01.ckp
06:52:45:Unit 03:Tpr hash 03/wudata_01.tpr:  1115717552 3400794249 773077988 2445058142 2685727039
06:52:45:Unit 03:calling fah_main gpuDeviceId=1
06:52:45:Unit 03:Working on Protein
06:52:45:Unit 03:Client config unavailable.
06:52:45:Unit 01:Starting GUI Server
06:52:45:Unit 03:Starting GUI Server
06:52:47:Unit 02:- Expanded 24863958 -> 30796292 (decompressed 123.8 percent)
06:52:47:Unit 02:Called DecompressByteArray: compressed_data_size=24863958 data_size=30796292, decompressed_data_size=30796292 diff=0
06:52:47:Unit 02:- Digital signature verified
06:52:47:Unit 02:
06:52:47:Unit 02:Project: 6900 (Run 39, Clone 5, Gen 33)
06:52:47:Unit 02:
06:52:47:Unit 02:Assembly optimizations on if available.
06:52:47:Unit 02:Entering M.D.
06:52:53:Unit 02:Using Gromacs checkpoints
06:52:54:Unit 02:Mapping NT from 12 to 12 
06:53:47:Unit 03:Resuming from checkpoint
06:53:47:Unit 03:fcCheckPointResume: retreived and current tpr file hash:
06:53:47:Unit 03:   0   1115717552   1115717552
06:53:47:Unit 03:   1   3400794249   3400794249
06:53:47:Unit 03:   2    773077988    773077988
06:53:47:Unit 03:   3   2445058142   2445058142
06:53:47:Unit 03:   4   2685727039   2685727039
06:53:47:Unit 03:fcCheckPointResume: file hashes same.
06:53:48:Unit 03:fcCheckPointResume: state restored.
06:53:48:Unit 03:fcCheckPointResume: name 03/wudata_01.log Verified 03/wudata_01.log
06:53:48:Unit 03:fcCheckPointResume: name 03/wudata_01.trr Verified 03/wudata_01.trr
06:53:48:Unit 03:fcCheckPointResume: name 03/wudata_01.xtc Verified 03/wudata_01.xtc
06:53:48:Unit 03:fcCheckPointResume: name 03/wudata_01.edr Verified 03/wudata_01.edr
06:53:48:Unit 03:fcCheckPointResume: state restored 2
06:53:48:Unit 03:Resumed from checkpoint
06:53:48:Unit 03:Setting checkpoint frequency: 400000
06:53:48:Unit 03:Completed  38000001 out of 40000000 steps (95%).
06:53:48:Unit 01:Resuming from checkpoint
06:53:48:Unit 01:fcCheckPointResume: retreived and current tpr file hash:
06:53:48:Unit 01:   0   3883752428   3883752428
06:53:48:Unit 01:   1    635684737    635684737
06:53:48:Unit 01:   2    129387903    129387903
06:53:48:Unit 01:   3   3290618737   3290618737
06:53:48:Unit 01:   4    831376637    831376637
06:53:48:Unit 01:fcCheckPointResume: file hashes same.
06:53:48:Unit 01:fcCheckPointResume: state restored.
06:53:48:Unit 01:fcCheckPointResume: name 01/wudata_01.log Verified 01/wudata_01.log
06:53:48:Unit 01:fcCheckPointResume: name 01/wudata_01.trr Verified 01/wudata_01.trr
06:53:48:Unit 01:fcCheckPointResume: name 01/wudata_01.xtc Verified 01/wudata_01.xtc
06:53:48:Unit 01:fcCheckPointResume: name 01/wudata_01.edr Verified 01/wudata_01.edr
06:53:48:Unit 01:fcCheckPointResume: state restored 2
06:53:48:Unit 01:Resumed from checkpoint
06:53:48:Unit 01:Setting checkpoint frequency: 400000
06:53:48:Unit 01:Completed  38000001 out of 40000000 steps (95%).
06:54:56:Unit 02:Resuming from checkpoint
06:54:56:Unit 02:Verified 02/wudata_01.log
06:54:57:Unit 02:Verified 02/wudata_01.trr
06:54:57:Unit 02:Verified 02/wudata_01.xtc
06:54:57:Unit 02:Verified 02/wudata_01.edr
06:57:43:Unit 02:Completed 43245 out of 250000 steps  (17%)
06:57:58:Unit 03:Completed  38400000 out of 40000000 steps (96%).
06:57:58:Unit 01:Completed  38400000 out of 40000000 steps (96%).
07:02:09:Unit 03:Completed  38800000 out of 40000000 steps (97%).
07:02:10:Unit 01:Completed  38800000 out of 40000000 steps (97%).
07:06:21:Unit 01:Completed  39200000 out of 40000000 steps (98%).
07:06:22:Connecting to assign-GPU.stanford.edu:80
07:06:22:News: Welcome to Folding@Home
07:06:22:Assigned to work server 171.64.65.105
07:06:22:Requesting new work unit for slot 02: RUNNING gpu:0:"GF110 [Geforce GTX 580]" from 171.64.65.105
07:06:22:Connecting to 171.64.65.105:8080
07:06:23:Slot 02: Downloading 122.17KiB
07:06:23:Slot 02: Download complete
07:06:23:Received Unit: id:00 state:DOWNLOAD project:7620 run:440 clone:0 gen:6 core:0x15 unit:0x00000006664f2dd14e42f5de7477ea2d
07:06:25:Unit 03:Completed  39200000 out of 40000000 steps (98%).
07:06:26:Connecting to assign-GPU.stanford.edu:80
07:06:26:News: Welcome to Folding@Home
07:06:26:Assigned to work server 171.64.65.105
07:06:26:Requesting new work unit for slot 01: RUNNING gpu:1:"GF110 [Geforce GTX 580]" from 171.64.65.105
07:06:26:Connecting to 171.64.65.105:8080
07:06:27:Slot 01: Downloading 122.79KiB
07:06:27:Slot 01: Download complete
07:06:27:Received Unit: id:04 state:DOWNLOAD project:7621 run:502 clone:0 gen:1 core:0x15 unit:0x00000001664f2dd14e4307b69ffb389f
07:10:33:Unit 01:Completed  39600000 out of 40000000 steps (99%).
07:10:36:Unit 03:Completed  39600000 out of 40000000 steps (99%).
07:14:44:Unit 01:Completed  40000000 out of 40000000 steps (100%).
07:14:45:Unit 01:Finished fah_main status=0
07:14:45:Unit 01:Successful run
07:14:45:Unit 01:DynamicWrapper: Finished Work Unit: sleep=10000
07:14:48:Unit 03:Completed  40000000 out of 40000000 steps (100%).
07:14:48:Unit 03:Finished fah_main status=0
07:14:48:Unit 03:Successful run
07:14:48:Unit 03:DynamicWrapper: Finished Work Unit: sleep=10000
07:14:55:Unit 01:Reserved 762776 bytes for xtc file; Cosm status=0
07:14:55:Unit 01:Allocated 762776 bytes for xtc file
07:14:55:Unit 01:- Reading up to 762776 from "01/wudata_01.xtc": Read 762776
07:14:55:Unit 01:Read 762776 bytes from xtc file; available packet space=785667688
07:14:55:Unit 01:xtc file hash check passed.
07:14:55:Unit 01:Reserved 47688 47688 785667688 bytes for arc file=<01/wudata_01.trr> Cosm status=0
07:14:55:Unit 01:Allocated 47688 bytes for arc file
07:14:55:Unit 01:- Reading up to 47688 from "01/wudata_01.trr": Read 47688
07:14:55:Unit 01:Read 47688 bytes from arc file; available packet space=785620000
07:14:55:Unit 01:trr file hash check passed.
07:14:55:Unit 01:Allocated 544 bytes for edr file
07:14:55:Unit 01:Read bedfile
07:14:55:Unit 01:edr file hash check passed.
07:14:55:Unit 01:Allocated 36880 bytes for logfile
07:14:55:Unit 01:Read logfile
07:14:55:Unit 01:GuardedRun: success in DynamicWrapper
07:14:55:Unit 01:GuardedRun: done
07:14:55:Unit 01:Run: GuardedRun completed.
07:14:58:Unit 03:Reserved 760532 bytes for xtc file; Cosm status=0
07:14:58:Unit 03:Allocated 760532 bytes for xtc file
07:14:58:Unit 03:- Reading up to 760532 from "03/wudata_01.xtc": Read 760532
07:14:58:Unit 03:Read 760532 bytes from xtc file; available packet space=785669932
07:14:58:Unit 03:xtc file hash check passed.
07:14:58:Unit 03:Reserved 47688 47688 785669932 bytes for arc file=<03/wudata_01.trr> Cosm status=0
07:14:58:Unit 03:Allocated 47688 bytes for arc file
07:14:58:Unit 03:- Reading up to 47688 from "03/wudata_01.trr": Read 47688
07:14:58:Unit 03:Read 47688 bytes from arc file; available packet space=785622244
07:14:58:Unit 03:trr file hash check passed.
07:14:58:Unit 03:Allocated 544 bytes for edr file
07:14:58:Unit 03:Read bedfile
07:14:58:Unit 03:edr file hash check passed.
07:14:58:Unit 03:Allocated 36880 bytes for logfile
07:14:58:Unit 03:Read logfile
07:14:58:Unit 03:GuardedRun: success in DynamicWrapper
07:14:58:Unit 03:GuardedRun: done
07:14:58:Unit 03:Run: GuardedRun completed.
07:14:58:Unit 01:+ Opened results file
07:14:58:Unit 01:- Writing 848400 bytes of core data to disk...
07:14:59:FahCore, running Unit 01, returned: FINISHED_UNIT (100)
07:14:59:Sending unit results: id:01 state:SEND project:7620 run:400 clone:0 gen:5 core:0x15 unit:0x0000000b664f2dd14e42f5aa30262c40
07:15:00:Unit 01: Uploading 801.00KiB
07:15:00:Starting Unit 00
07:15:00:Connecting to 171.64.65.105:8080
07:15:00:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -gpu 0 -forceasm
07:15:00:Started core on PID 10680
07:15:00:FahCore 0x15 started
07:15:00:Started thread 11 on PID 12600
07:15:00:Unit 00:
07:15:00:Unit 00:*------------------------------*
07:15:00:Unit 00:Folding@Home GPU Core
07:15:00:Unit 00:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
07:15:00:Unit 00:Build host             SimbiosNvdWin7
07:15:00:Unit 00:Board Type             NVIDIA/CUDA
07:15:00:Unit 00:Core                   15
07:15:00:Unit 00:
07:15:00:Unit 00:Window's signal control handler registered.
07:15:00:Unit 00:Preparing to commence simulation
07:15:00:Unit 00:- Assembly optimizations manually forced on.
07:15:00:Unit 00:- Not checking prior termination.
07:15:00:Unit 00:sizeof(CORE_PACKET_HDR) = 512 file=<>
07:15:00:Unit 00:- Expanded 124594 -> 501826 (decompressed 402.7 percent)
07:15:00:Unit 00:Called DecompressByteArray: compressed_data_size=124594 data_size=501826, decompressed_data_size=501826 diff=0
07:15:00:Unit 00:- Digital signature verified
07:15:00:Unit 00:
07:15:00:Unit 00:Project: 7620 (Run 440, Clone 0, Gen 6)
07:15:00:Unit 00:
07:15:00:Unit 00:Assembly optimizations on if available.
07:15:01:Unit 00:Entering M.D.
07:15:01:Unit 03:+ Opened results file
07:15:01:Unit 03:- Writing 846156 bytes of core data to disk...
07:15:01:Unit 03:Done: 845644 -> 817562 (compressed to 96.6 percent)
07:15:01:Unit 03:  ... Done.
07:15:02:Unit 03:DeleteFrameFiles: successfully deleted file=03/wudata_01.ckp
07:15:02:Unit 03:Shutting down core 
07:15:02:Unit 03:
07:15:02:Unit 03:Folding@home Core Shutdown: FINISHED_UNIT
07:15:02:Unit 00:Tpr hash 00/wudata_01.tpr:  388382055 862599623 516521409 2648003334 1809571494
07:15:02:Unit 00:calling fah_main gpuDeviceId=0
07:15:02:Unit 00:Working on Protein
07:15:02:Unit 00:Client config unavailable.
07:15:02:FahCore, running Unit 03, returned: FINISHED_UNIT (100)
07:15:02:Unit 00:Starting GUI Server
07:15:03:Sending unit results: id:03 state:SEND project:7621 run:314 clone:0 gen:3 core:0x15 unit:0x00000004664f2dd14e4306cbffd4d395
07:15:04:Unit 03: Uploading 798.90KiB
07:15:04:Starting Unit 04
07:15:04:Connecting to 171.64.65.105:8080
07:15:04:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 04 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -gpu 1 -forceasm
07:15:04:Started core on PID 10856
07:15:04:FahCore 0x15 started
07:15:04:Started thread 12 on PID 12600
07:15:04:Unit 01: Upload complete
07:15:04:Unit 04:
07:15:04:Unit 04:*------------------------------*
07:15:04:Unit 04:Folding@Home GPU Core
07:15:04:Unit 04:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
07:15:04:Unit 04:Build host             SimbiosNvdWin7
07:15:04:Unit 04:Board Type             NVIDIA/CUDA
07:15:04:Unit 04:Core                   15
07:15:04:Unit 04:GPU device id          1
07:15:04:Unit 04:
07:15:04:Unit 04:Window's signal control handler registered.
07:15:04:Unit 04:Preparing to commence simulation
07:15:04:Unit 04:- Assembly optimizations manually forced on.
07:15:04:Unit 04:- Not checking prior termination.
07:15:04:Unit 04:sizeof(CORE_PACKET_HDR) = 512 file=<>
07:15:04:Unit 04:- Expanded 125222 -> 501826 (decompressed 400.7 percent)
07:15:04:Unit 04:Called DecompressByteArray: compressed_data_size=125222 data_size=501826, decompressed_data_size=501826 diff=0
07:15:04:Unit 04:- Digital signature verified
07:15:04:Unit 04:
07:15:04:Unit 04:Project: 7621 (Run 502, Clone 0, Gen 1)
07:15:04:Unit 04:
07:15:04:Unit 04:Assembly optimizations on if available.
07:15:04:Unit 04:Entering M.D.
07:15:04:Server responded WORK_ACK (400)
07:15:04:Final credit estimate, 5187.00 points
07:15:05:Cleaning up Unit 01
07:15:06:Unit 04:Tpr hash 04/wudata_01.tpr:  4094808338 3356398220 360061434 2359766096 2780207650
07:15:06:Unit 04:calling fah_main gpuDeviceId=1
07:15:06:Unit 04:Working on Protein
07:15:06:Unit 04:Client config unavailable.
07:15:06:Unit 04:Starting GUI Server
07:15:08:Unit 03: Upload complete
07:15:08:Server responded WORK_ACK (400)
07:15:08:Final credit estimate, 5187.00 points
07:15:09:Cleaning up Unit 03
07:16:04:Unit 00:Setting checkpoint frequency: 400000
07:16:04:Unit 00:Completed         3 out of 40000000 steps (0%).
07:16:08:Unit 04:Setting checkpoint frequency: 400000
07:16:08:Unit 04:Completed         3 out of 40000000 steps (0%).
07:20:15:Unit 00:Completed    400000 out of 40000000 steps (1%).
07:20:20:Unit 04:Completed    400000 out of 40000000 steps (1%).
07:24:27:Unit 00:Completed    800000 out of 40000000 steps (2%).
07:24:31:Unit 04:Completed    800000 out of 40000000 steps (2%).
07:27:00:Unit 02:Completed 45000 out of 250000 steps  (18%)
07:28:38:Unit 00:Completed   1200000 out of 40000000 steps (3%).
07:28:44:Unit 04:Completed   1200000 out of 40000000 steps (3%).
07:32:50:Unit 00:Completed   1600000 out of 40000000 steps (4%).
07:33:08:Unit 04:Completed   1600000 out of 40000000 steps (4%).

01:23:23:Unit 01:
01:23:23:Unit 01:Folding@home Core Shutdown: FINISHED_UNIT
01:23:23:FahCore, running Unit 01, returned: FINISHED_UNIT (100)
01:23:24:Sending unit results: id:01 state:SEND project:7620 run:174 clone:0 gen:7 core:0x15 unit:0x00000007664f2dd14e42f48f24faaeca
01:23:24:Unit 01: Uploading 802.14KiB
01:23:24:Starting Unit 00
01:23:24:Connecting to 171.64.65.105:8080
01:23:24:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -gpu 0 -forceasm
01:23:24:Started core on PID 16904
01:23:24:FahCore 0x15 started
01:23:24:Started thread 24 on PID 12600
01:23:25:Unit 00:
01:23:25:Unit 00:*------------------------------*
01:23:25:Unit 00:Folding@Home GPU Core
01:23:25:Unit 00:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
01:23:25:Unit 00:Build host             SimbiosNvdWin7
01:23:25:Unit 00:Board Type             NVIDIA/CUDA
01:23:25:Unit 00:Core                   15
01:23:25:Unit 00:
01:23:25:Unit 00:Window's signal control handler registered.
01:23:25:Unit 00:Preparing to commence simulation
01:23:25:Unit 00:- Assembly optimizations manually forced on.
01:23:25:Unit 00:- Not checking prior termination.
01:23:25:Unit 00:sizeof(CORE_PACKET_HDR) = 512 file=<>
01:23:25:Unit 00:- Expanded 125351 -> 501826 (decompressed 400.3 percent)
01:23:25:Unit 00:Called DecompressByteArray: compressed_data_size=125351 data_size=501826, decompressed_data_size=501826 diff=0
01:23:25:Unit 00:- Digital signature verified
01:23:25:Unit 00:
01:23:25:Unit 00:Project: 7621 (Run 604, Clone 0, Gen 1)
01:23:25:Unit 00:
01:23:25:Unit 00:Assembly optimizations on if available.
01:23:25:Unit 00:Entering M.D.
01:23:27:Unit 00:Tpr hash 00/wudata_01.tpr:  2340313871 3498369425 2956118668 561125340 4152990171
01:23:27:Unit 00:calling fah_main gpuDeviceId=0
01:23:27:Unit 00:Working on Protein
01:23:27:Unit 00:Client config unavailable.
01:23:27:Unit 00:Starting GUI Server
01:23:29:Unit 01: Upload complete
01:23:29:Server responded WORK_ACK (400)
01:23:29:Final credit estimate, 5187.00 points
01:23:29:Cleaning up Unit 01
01:24:28:Unit 00:Setting checkpoint frequency: 400000
01:24:28:Unit 00:Completed         3 out of 40000000 steps (0%).
01:25:01:Unit 02:mdrun returned 255
01:25:01:Unit 02:Going to send back what have done -- stepsTotalG=250000
01:25:01:Unit 02:Work fraction=0.9986 steps=250000.
01:25:05:Unit 02:logfile size=192984 infoLength=192984 edr=0 trr=25
01:25:05:Unit 02:logfile size: 192984 info=192984 bed=0 hdr=25
01:25:05:Unit 02:- Writing 193522 bytes of core data to disk...
01:25:06:Unit 02:  ... Done.
01:25:10:Unit 02:
01:25:10:Unit 02:Folding@home Core Shutdown: EARLY_UNIT_END
01:25:10:FahCore, running Unit 02, returned: BAD_WORK_UNIT (114)
01:25:11:Sending unit results: id:02 state:SEND project:6900 run:39 clone:5 gen:33 core:0xa5 unit:0x0000003552be740d4de96badd5a72ac9
01:25:11:Unit 02: Uploading 188.99KiB
01:25:11:Starting Unit 03
01:25:11:Connecting to 130.237.232.141:8080
01:25:11:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a5.fah/FahCore_a5.exe -dir 03 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -np 12 -forceasm
01:25:11:Started core on PID 16732
01:25:11:FahCore 0xa5 started
01:25:11:Started thread 25 on PID 12600
01:25:12:Unit 03:
01:25:12:Unit 03:*------------------------------*
01:25:12:Unit 03:Folding@Home Gromacs SMP Core
01:25:12:Unit 03:Version 2.27 (Mar 12, 2010)
01:25:12:Unit 03:
01:25:12:Unit 03:Preparing to commence simulation
01:25:12:Unit 03:- Assembly optimizations manually forced on.
01:25:12:Unit 03:- Not checking prior termination.
01:25:16:Unit 03:- Expanded 24866519 -> 30796292 (decompressed 123.8 percent)
01:25:16:Unit 03:Called DecompressByteArray: compressed_data_size=24866519 data_size=30796292, decompressed_data_size=30796292 diff=0
01:25:17:Unit 02: Upload complete
01:25:17:Server responded WORK_ACK (400)
01:25:17:Unit 03:- Digital signature verified
01:25:17:Unit 03:
01:25:17:Unit 03:Project: 6900 (Run 19, Clone 7, Gen 27)
01:25:17:Unit 03:
01:25:17:Unit 03:Assembly optimizations on if available.
01:25:17:Unit 03:Entering M.D.
01:25:17:Cleaning up Unit 02
01:25:23:Unit 03:Mapping NT from 12 to 12 
01:25:26:Unit 03:Completed 0 out of 250000 steps  (0%)
01:26:13:Unit 04:Completed   4000000 out of 40000000 steps (10%).
01:28:39:Unit 00:Completed    400000 out of 40000000 steps (1%).
01:30:25:Unit 04:Completed   4400000 out of 40000000 steps (11%).
01:32:50:Unit 00:Completed    800000 out of 40000000 steps (2%).
01:34:37:Unit 04:Completed   4800000 out of 40000000 steps (12%).
01:37:01:Unit 00:Completed   1200000 out of 40000000 steps (3%).
01:38:49:Unit 04:Completed   5200000 out of 40000000 steps (13%).
01:41:12:Unit 00:Completed   1600000 out of 40000000 steps (4%).
01:43:01:Unit 04:Completed   5600000 out of 40000000 steps (14%).
01:45:24:Unit 00:Completed   2000000 out of 40000000 steps (5%).
01:47:12:Unit 04:Completed   6000000 out of 40000000 steps (15%).
01:49:35:Unit 00:Completed   2400000 out of 40000000 steps (6%).
01:51:24:Unit 04:Completed   6400000 out of 40000000 steps (16%).
01:53:46:Unit 00:Completed   2800000 out of 40000000 steps (7%).
01:55:36:Unit 04:Completed   6800000 out of 40000000 steps (17%).
01:58:00:Unit 00:Completed   3200000 out of 40000000 steps (8%).
01:59:48:Unit 04:Completed   7200000 out of 40000000 steps (18%).
02:02:22:Unit 00:Completed   3600000 out of 40000000 steps (9%).
02:03:06:Unit 03:Completed 2500 out of 250000 steps  (1%)
02:04:11:Unit 04:Completed   7600000 out of 40000000 steps (19%).
02:06:33:Unit 00:Completed   4000000 out of 40000000 steps (10%).
02:08:23:Unit 04:Completed   8000000 out of 40000000 steps (20%).
02:10:44:Unit 00:Completed   4400000 out of 40000000 steps (11%).
02:12:35:Unit 04:Completed   8400000 out of 40000000 steps (21%).
02:14:55:Unit 00:Completed   4800000 out of 40000000 steps (12%).
02:16:47:Unit 04:Completed   8800000 out of 40000000 steps (22%).
02:19:06:Unit 00:Completed   5200000 out of 40000000 steps (13%).
02:20:58:Unit 04:Completed   9200000 out of 40000000 steps (23%).
02:23:17:Unit 00:Completed   5600000 out of 40000000 steps (14%).
02:25:10:Unit 04:Completed   9600000 out of 40000000 steps (24%).
02:27:28:Unit 00:Completed   6000000 out of 40000000 steps (15%).
02:29:22:Unit 04:Completed  10000000 out of 40000000 steps (25%).
02:31:39:Unit 00:Completed   6400000 out of 40000000 steps (16%).
02:32:32:Unit 03:Completed 5000 out of 250000 steps  (2%)
02:33:33:Unit 04:Completed  10400000 out of 40000000 steps (26%).
02:35:50:Unit 00:Completed   6800000 out of 40000000 steps (17%).
02:37:45:Unit 04:Completed  10800000 out of 40000000 steps (27%).
02:40:01:Unit 00:Completed   7200000 out of 40000000 steps (18%).
02:41:57:Unit 04:Completed  11200000 out of 40000000 steps (28%).
02:44:12:Unit 00:Completed   7600000 out of 40000000 steps (19%).
02:46:09:Unit 04:Completed  11600000 out of 40000000 steps (29%).
02:48:23:Unit 00:Completed   8000000 out of 40000000 steps (20%).
02:50:21:Unit 04:Completed  12000000 out of 40000000 steps (30%).
02:52:34:Unit 00:Completed   8400000 out of 40000000 steps (21%).
02:54:33:Unit 04:Completed  12400000 out of 40000000 steps (31%).
02:56:45:Unit 00:Completed   8800000 out of 40000000 steps (22%).
02:58:46:Unit 04:Completed  12800000 out of 40000000 steps (32%).
03:00:57:Unit 00:Completed   9200000 out of 40000000 steps (23%).
03:01:40:Unit 03:Completed 7500 out of 250000 steps  (3%)
03:02:59:Unit 04:Completed  13200000 out of 40000000 steps (33%).
03:05:08:Unit 00:Completed   9600000 out of 40000000 steps (24%).
03:07:13:Unit 04:Completed  13600000 out of 40000000 steps (34%).
03:09:19:Unit 00:Completed  10000000 out of 40000000 steps (25%).
03:11:33:Unit 04:Completed  14000000 out of 40000000 steps (35%).
03:13:30:Unit 00:Completed  10400000 out of 40000000 steps (26%).
03:15:52:Unit 04:Completed  14400000 out of 40000000 steps (36%).
03:17:41:Unit 00:Completed  10800000 out of 40000000 steps (27%).
03:20:04:Unit 04:Completed  14800000 out of 40000000 steps (37%).
03:21:52:Unit 00:Completed  11200000 out of 40000000 steps (28%).
03:24:17:Unit 04:Completed  15200000 out of 40000000 steps (38%).
03:26:03:Unit 00:Completed  11600000 out of 40000000 steps (29%).
03:28:29:Unit 04:Completed  15600000 out of 40000000 steps (39%).
03:30:14:Unit 00:Completed  12000000 out of 40000000 steps (30%).
03:31:48:Unit 03:Completed 10000 out of 250000 steps  (4%)
03:32:47:Unit 04:Completed  16000000 out of 40000000 steps (40%).
03:34:25:Unit 00:Completed  12400000 out of 40000000 steps (31%).
03:37:06:Unit 04:Completed  16400000 out of 40000000 steps (41%).
03:38:36:Unit 00:Completed  12800000 out of 40000000 steps (32%).
03:41:21:Unit 04:Completed  16800000 out of 40000000 steps (42%).

02:33:55:Unit 03:Completed 127500 out of 250000 steps  (51%)
02:34:55:Unit 02:Completed  27200000 out of 40000000 steps (68%).
02:35:15:Unit 00:Completed  24000000 out of 40000000 steps (60%).
02:39:07:Unit 02:Completed  27600000 out of 40000000 steps (69%).
02:39:26:Unit 00:Completed  24400000 out of 40000000 steps (61%).
02:43:19:Unit 02:Completed  28000000 out of 40000000 steps (70%).
02:43:37:Unit 00:Completed  24800000 out of 40000000 steps (62%).
02:47:31:Unit 02:Completed  28400000 out of 40000000 steps (71%).
02:47:48:Unit 00:Completed  25200000 out of 40000000 steps (63%).
02:51:45:Unit 02:Completed  28800000 out of 40000000 steps (72%).
02:51:59:Unit 00:Completed  25600000 out of 40000000 steps (64%).
02:55:29:Unit 03:mdrun returned 255
02:55:29:Unit 03:Going to send back what have done -- stepsTotalG=250000
02:55:29:Unit 03:Work fraction=0.5173 steps=250000.
02:55:33:Unit 03:logfile size=106323 infoLength=106323 edr=0 trr=25
02:55:33:Unit 03:logfile size: 106323 info=106323 bed=0 hdr=25
02:55:33:Unit 03:- Writing 106861 bytes of core data to disk...
02:55:33:Unit 03:  ... Done.
02:55:35:FahCore, running Unit 03, returned: BAD_WORK_UNIT (114)
02:55:36:Sending unit results: id:03 state:SEND project:6900 run:19 clone:7 gen:27 core:0xa5 unit:0x0000002e52be740d4de960f6a2b4c99d
02:55:36:Unit 03: Uploading 104.36KiB
02:55:36:Connecting to assign3.stanford.edu:8080
02:55:36:Connecting to 130.237.232.141:8080
02:55:37:News: Welcome to Folding@Home
02:55:37:Assigned to work server 130.237.232.141
02:55:37:Requesting new work unit for slot 00: READY smp:12 from 130.237.232.141
02:55:37:Connecting to 130.237.232.141:8080
02:55:40:WARNING: Exception: Failed to send results to work server: Server responded: HTTP_INTERNAL_SERVER_ERROR
02:55:40:Trying to send results to collection server
02:55:40:Unit 03: Uploading 104.36KiB
02:55:40:Connecting to 130.237.165.141:8080
02:55:43:Slot 00: Downloading 23.71MiB
02:55:44:Unit 03: Upload complete
02:55:44:Server responded WORK_ACK (400)
02:55:44:Cleaning up Unit 03
02:55:49:Slot 00: 9.98%
02:55:55:Slot 00: 23.20%
02:55:57:Unit 02:Completed  29200000 out of 40000000 steps (73%).
02:56:01:Slot 00: 36.11%
02:56:07:Slot 00: 48.00%
02:56:10:Unit 00:Completed  26000000 out of 40000000 steps (65%).
02:56:13:Slot 00: 54.60%
02:56:19:Slot 00: 61.28%
02:56:25:Slot 00: 68.11%
02:56:31:Slot 00: 75.25%
02:56:37:Slot 00: 82.44%
02:56:43:Slot 00: 89.35%
02:56:49:Slot 00: 96.42%
02:56:51:Slot 00: Download complete
02:56:51:Received Unit: id:01 state:DOWNLOAD project:6900 run:7 clone:7 gen:18 core:0xa5 unit:0x0000001952be740d4de95a8395170aa2
02:56:52:Starting Unit 01
02:56:52:Running core: C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a5.fah/FahCore_a5.exe -dir 01 -suffix 01 -lifeline 12600 -version 701 -checkpoint 30 -np 12 -forceasm
02:56:52:Started core on PID 22412
02:56:52:FahCore 0xa5 started
02:56:52:Started thread 32 on PID 12600
02:56:52:Unit 01:
02:56:52:Unit 01:*------------------------------*
02:56:52:Unit 01:Folding@Home Gromacs SMP Core
02:56:52:Unit 01:Version 2.27 (Mar 12, 2010)
02:56:52:Unit 01:
02:56:52:Unit 01:Preparing to commence simulation
02:56:52:Unit 01:- Assembly optimizations manually forced on.
02:56:52:Unit 01:- Not checking prior termination.
02:56:57:Unit 01:- Expanded 24860329 -> 30796292 (decompressed 123.8 percent)
02:56:57:Unit 01:Called DecompressByteArray: compressed_data_size=24860329 data_size=30796292, decompressed_data_size=30796292 diff=0
02:56:57:Unit 01:- Digital signature verified
02:56:57:Unit 01:
02:56:57:Unit 01:Project: 6900 (Run 7, Clone 7, Gen 18)
02:56:57:Unit 01:
02:56:57:Unit 01:Assembly optimizations on if available.
02:56:57:Unit 01:Entering M.D.
02:57:03:Unit 01:Mapping NT from 12 to 12 
02:57:07:Unit 01:Completed 0 out of 250000 steps  (0%)
03:00:10:Unit 02:Completed  29600000 out of 40000000 steps (74%).
03:00:23:Unit 00:Completed  26400000 out of 40000000 steps (66%).
03:04:26:Unit 02:Completed  30000000 out of 40000000 steps (75%).
03:04:34:Unit 00:Completed  26800000 out of 40000000 steps (67%).
03:08:41:Unit 02:Completed  30400000 out of 40000000 steps (76%).
 
The odds of two different p6900s in a row failing due to bad WUs are pretty slim. I'd run the OCCT beta 16 and see if it passes on small and large data sets. You might want to go back to verbosity 3 on the v7 client. Most of what verbosity 5 adds to the log is useless for our diagnosis and makes the log even messier.
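
If the log is too messy to skim by eye, a small script can pull out just the lines that matter for this kind of diagnosis. The following is only a rough sketch, not anything that ships with the client: the function names are made up for illustration, and the pattern is an assumption based solely on the line formats visible in the log posted above (the "FahCore, running Unit NN, returned: ..." lines and the "WARNING:" lines). Other client versions may format these lines differently.

```python
import re

# Assumed format, taken from the log in this thread, e.g.
# "02:55:35:FahCore, running Unit 03, returned: BAD_WORK_UNIT (114)"
CORE_RESULT = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):FahCore, running Unit (?P<unit>\d+), "
    r"returned: (?P<status>\w+) \((?P<code>\d+)\)"
)


def find_core_results(lines):
    """Return (time, unit, status, code) for every FahCore return line."""
    results = []
    for line in lines:
        m = CORE_RESULT.match(line.strip())
        if m:
            results.append((m["time"], m["unit"], m["status"], int(m["code"])))
    return results


def find_warnings(lines):
    """Return every line the client tagged as a WARNING."""
    return [line.strip() for line in lines if ":WARNING:" in line]


# A few lines copied from the log above, used as sample input.
sample = [
    "02:51:59:Unit 00:Completed  25600000 out of 40000000 steps (64%).",
    "02:55:29:Unit 03:mdrun returned 255",
    "02:55:35:FahCore, running Unit 03, returned: BAD_WORK_UNIT (114)",
    "02:55:40:WARNING: Exception: Failed to send results to work server: "
    "Server responded: HTTP_INTERNAL_SERVER_ERROR",
]

for entry in find_core_results(sample):
    print(entry)
for warning in find_warnings(sample):
    print(warning)
```

To run it against a real log, read the file into a list of lines (for example with `open(path).readlines()`) and pass that list in place of `sample`. The path to the log depends on the install, so it is left out here.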
 
I've run OCCT on both large and small data sets for over 2 hours each with no problems. I'm not sure if that is long enough to prove anything. I've changed verbosity to 3 and currently have a p6900 at 32% complete and will see if it fails.
 