the_cultie
02-26-09, 03:39 PM
I've been having a slight problem folding on my second rig, where I'm running Ubuntu in a VM. For the last few weeks, every time the client gets an a2 WU it stops working. Here is the relevant section of the log:
[16:27:35] - Warning: Could not delete all work unit files (2): Core returned invalid code
[16:27:35] Trying to send all finished work units
[16:27:35] + No unsent completed units remaining.
[16:27:35] - Preparing to get new work unit...
[16:27:36] + Attempting to get work packet
[16:27:36] - Will indicate memory of 724 MB
[16:27:36] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 11, Stepping: 2
[16:27:36] - Connecting to assignment server
[16:27:36] Connecting to http://assign.stanford.edu:8080/
[16:27:36] Posted data.
[16:27:36] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[16:27:36] + News From Folding@Home: Welcome to Folding@Home
[16:27:37] Loaded queue successfully.
[16:27:37] Connecting to http://171.64.65.56:8080/
[16:27:41] Posted data.
[16:27:41] Initial: 0000; - Receiving payload (expected size: 3985424)
[16:28:14] - Downloaded at ~117 kB/s
[16:28:14] - Averaged speed for that direction ~243 kB/s
[16:28:14] + Received work.
[16:28:14] Trying to send all finished work units
[16:28:14] + No unsent completed units remaining.
[16:28:14] + Closed connections
[16:28:14]
[16:28:14] + Processing work unit
[16:28:14] Core required: FahCore_a2.exe
[16:28:14] Core found.
[16:28:14] Working on Unit 03 [February 25 16:28:14]
[16:28:14] + Working ...
[16:28:14] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 03 -checkpoint 15 -forceasm -verbose -lifeline 5451 -version 602'
[16:28:14]
[16:28:14] *------------------------------*
[16:28:14] Folding@Home Gromacs SMP Core
[16:28:14] Version 2.04 (Thu Jan 29 16:43:57 PST 2009)
[16:28:14]
[16:28:14] Preparing to commence simulation
[16:28:14] - Ensuring status. Please wait.
[16:28:23] - Assembly optimizations manually forced on.
[16:28:23] - Not checking prior termination.
[16:28:27] - Expanded 3984912 -> 16935197 (decompressed 424.9 percent)
[16:28:28] Called DecompressByteArray: compressed_data_size=3984912 data_size=16935197, decompressed_data_size=16935197 diff=0
[16:28:28] - Digital signature verified
[16:28:28]
[16:28:28] Project: 2675 (Run 0, Clone 178, Gen 4)
[16:28:28]
[16:28:28] Assembly optimizations on if available.
[16:28:28] Entering M.D.
[16:28:43] CoreStatus = FF (255)
[16:28:43] Client-core communications error: ERROR 0xff
[16:28:43] Deleting current work unit & continuing...
As you can see, at the end it says it's deleting the current work unit and continuing, but it doesn't. It just sits there until I stop the client, delete the work folder and the queue.dat file, and restart; then it gets an a1 WU, which runs fine. However, that means I'm stuck on a1-based WUs, which is giving me a bit of a PPD hit :bang head. Is there any way to fix this, or should I just reinstall the client?
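Until the root cause is found, the manual workaround above (stop the client, clear out the work folder and queue.dat, restart) can be semi-automated. What follows is a minimal, hypothetical Python sketch, not part of the F@H client. It assumes the client directory holds a `work` folder (the name taken from the `-dir work/` argument in the log) and a `queue.dat` file, and that you have already stopped the client before calling `reset_client_state`. The stall check only looks for the last two log lines shown above.

```python
"""Hypothetical helper sketching the manual workaround described above.

Assumptions (not part of the F@H client): the client directory contains a
`work` folder and a `queue.dat` file, and the client is already stopped
before reset_client_state() is called.
"""
import shutil
from pathlib import Path

# Log messages copied from the failing run above.
CORE_FAILURE = "Client-core communications error: ERROR 0xff"
STALL_MARKER = "Deleting current work unit & continuing..."


def log_shows_stall(log_text: str) -> bool:
    """Return True if the log ends with the 0xff core error followed by the
    'continuing' message, i.e. the client never moved on to a new WU."""
    lines = [ln.rstrip() for ln in log_text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    return lines[-1].endswith(STALL_MARKER) and lines[-2].endswith(CORE_FAILURE)


def reset_client_state(client_dir: Path) -> list[str]:
    """Delete the work folder and queue.dat, mirroring the manual fix.

    Returns the names of the items that were actually removed.
    """
    removed = []
    work = client_dir / "work"
    if work.is_dir():
        shutil.rmtree(work)  # removes the folder and everything inside it
        removed.append("work")
    queue = client_dir / "queue.dat"
    if queue.is_file():
        queue.unlink()
        removed.append("queue.dat")
    return removed
```

Checking the log first keeps the destructive step from running while a WU is progressing normally; note that this throws away the current WU, just as the manual fix does.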