Half a folding problem


the_cultie
02-26-09, 03:39 PM
I've been having a slight problem with folding on my second rig, which runs Ubuntu in a VM. For the last few weeks, each time the client gets an a2 WU it stops working. Here is the relevant section of the log:

[16:27:35] - Warning: Could not delete all work unit files (2): Core returned invalid code
[16:27:35] Trying to send all finished work units
[16:27:35] + No unsent completed units remaining.
[16:27:35] - Preparing to get new work unit...
[16:27:36] + Attempting to get work packet
[16:27:36] - Will indicate memory of 724 MB
[16:27:36] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 11, Stepping: 2
[16:27:36] - Connecting to assignment server
[16:27:36] Connecting to http://assign.stanford.edu:8080/
[16:27:36] Posted data.
[16:27:36] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[16:27:36] + News From Folding@Home: Welcome to Folding@Home
[16:27:37] Loaded queue successfully.
[16:27:37] Connecting to http://171.64.65.56:8080/
[16:27:41] Posted data.
[16:27:41] Initial: 0000; - Receiving payload (expected size: 3985424)
[16:28:14] - Downloaded at ~117 kB/s
[16:28:14] - Averaged speed for that direction ~243 kB/s
[16:28:14] + Received work.
[16:28:14] Trying to send all finished work units
[16:28:14] + No unsent completed units remaining.
[16:28:14] + Closed connections
[16:28:14]
[16:28:14] + Processing work unit
[16:28:14] Core required: FahCore_a2.exe
[16:28:14] Core found.
[16:28:14] Working on Unit 03 [February 25 16:28:14]
[16:28:14] + Working ...
[16:28:14] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 03 -checkpoint 15 -forceasm -verbose -lifeline 5451 -version 602'

[16:28:14]
[16:28:14] *------------------------------*
[16:28:14] Folding@Home Gromacs SMP Core
[16:28:14] Version 2.04 (Thu Jan 29 16:43:57 PST 2009)
[16:28:14]
[16:28:14] Preparing to commence simulation
[16:28:14] - Ensuring status. Please wait.
[16:28:23] - Assembly optimizations manually forced on.
[16:28:23] - Not checking prior termination.
[16:28:27] - Expanded 3984912 -> 16935197 (decompressed 424.9 percent)
[16:28:28] Called DecompressByteArray: compressed_data_size=3984912 data_size=16935197, decompressed_data_size=16935197 diff=0
[16:28:28] - Digital signature verified
[16:28:28]
[16:28:28] Project: 2675 (Run 0, Clone 178, Gen 4)
[16:28:28]
[16:28:28] Assembly optimizations on if available.
[16:28:28] Entering M.D.
[16:28:43] CoreStatus = FF (255)
[16:28:43] Client-core communications error: ERROR 0xff
[16:28:43] Deleting current work unit & continuing...

As you can see, at the end it says it is "Deleting current work unit & continuing...", but it doesn't. It just sits there until I stop the client, delete the work folder and the queue.dat file, and restart; then it gets an a1 WU, which works fine. However, it means I'm stuck on a1-based WUs, which is giving me a bit of a PPD hit :bang head. Is there any way to fix this, or should I just reinstall the client?
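The manual reset described above (stop the client, delete the work folder and queue.dat, restart) could be scripted roughly as follows. This is only a sketch: the function name `reset_client_state` is made up for illustration, the folder is assumed to be named `work` as in the `-dir work/` argument shown in the log, and the client must already be stopped. Any work unit in progress is lost.

```python
import shutil
from pathlib import Path


def reset_client_state(client_dir):
    """Remove the work folder and queue.dat from a stopped client's directory.

    Hypothetical helper, not part of the Folding@Home client.
    Returns the list of paths that were actually removed.
    """
    client_dir = Path(client_dir)
    removed = []

    # Work unit files live here (assumed name, taken from "-dir work/" in the log).
    work = client_dir / "work"
    if work.is_dir():
        shutil.rmtree(work)
        removed.append(work)

    # The queue file that the post says has to be deleted as well.
    queue = client_dir / "queue.dat"
    if queue.is_file():
        queue.unlink()
        removed.append(queue)

    return removed
```

Calling `reset_client_state(".")` from the client's directory and then restarting the client would match the steps in the post. Running it a second time removes nothing and returns an empty list.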

Adak
02-26-09, 03:43 PM
A2 WUs need more memory. It sounds like you could be running out of it, which would be why the client just crashes instead of getting another WU.

What's the size of your VM?

the_cultie
02-26-09, 03:48 PM
I have the memory for the VM set to 740MB. If it's memory it needs, then it's memory I'll have to get, unless my friend buys the PC first.

Edward2
02-26-09, 05:17 PM
I don't know whether memory is your problem or not, but I have my VMs set to 1024MB.

jintatsu
02-26-09, 08:52 PM
Haven't encountered that problem. I have my memory set to 640MB.

harlam357
02-27-09, 09:14 AM
I've got two VMs running on 512MB. I recently had a problem when I set the VM memory too high: the VMs grabbed so much memory that Windows started paging (I was watching things in Task Manager). In that case, my VM running an A2 WU crapped out (really before it could get started). The solution was to decrease the memory back to 512MB. I may now be paging (swapping) inside the VM itself, but Windows keeps 100~200MB to spare; there is only 2GB in that machine.

ozzlo
02-27-09, 01:10 PM
I am running 4 VMs, each set to 768MB, and have had no real issues. Also, can't the VM swap some memory to the virtual drive?

HayesK
02-27-09, 02:10 PM
I am running all my VMs set to 832MB. I think it took that much to keep Ubuntu out of its swap when I set them up a good while ago. I just checked three of them and found a p2653 using zero swap, but the other two using ~60MB. Perhaps the newer WUs are using more memory.
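For anyone who wants to check swap use inside the Ubuntu VM the same way, here is a minimal sketch that computes it from the text of `/proc/meminfo`. The function name `swap_used_kb` is made up for illustration. It relies on the standard `SwapTotal` and `SwapFree` fields, which Linux reports in kB.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB), given the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:   500000 kB".
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    # Swap in use is whatever part of the total is not free.
    return values["SwapTotal"] - values["SwapFree"]
```

Inside the VM it could be run as `print(swap_used_kb(open("/proc/meminfo").read()))`. A result of 0 means no swap is in use, as on the p2653 VM above.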

the_cultie
02-27-09, 05:50 PM
I'm just about to finish this a1 WU, so I'll lower the amount of memory to 600MB and hope I get an a2 WU. Will let you guys know what happens in 15 mins.

the_cultie
02-27-09, 06:49 PM
Well, I don't think it's a memory issue. I decreased the RAM to 600MB, got an a2 WU, and it errored out again. This is what was in the terminal window in Linux:

Reading file work/wudata_02.tpr, VERSION 3.3.99_development_20070618 (single precision)

-------------------------------------------------------
Program mdrun, VERSION 4.0.3_pre
Source code file: symtab.c, line: 108

Fatal error:
symtab get_symtab_handle 2612 not found
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[0]0:Return code = 255
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[23:33:26] CoreStatus = FF (255)
[23:33:26] Client-core communications error: ERROR 0xff
[23:33:26] Deleting current work unit & continuing...

It still had that "Warning: Could not delete all work unit files (2): Core returned invalid code" message.
Going to try reinstalling the folding client to see if that fixes the problem.
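Since the client sits idle after this error instead of fetching a new unit, it can help to spot the failure in the log automatically. Below is a rough sketch that scans log text for `CoreStatus = FF (255)` and reports which project was running at the time. The function name `find_core_failures` is hypothetical, and the patterns assume the exact line format quoted in this thread. Other client versions may log differently.

```python
import re

# Patterns modeled on the log lines quoted in this thread (an assumption).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"^\[(\d\d:\d\d:\d\d)\] CoreStatus = ([0-9A-F]+) \((\d+)\)")


def find_core_failures(log_text, bad_status=255):
    """Return one dict per CoreStatus line whose decimal code equals bad_status."""
    failures = []
    project = None  # most recent (project, run, clone, gen) seen so far
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            project = tuple(int(g) for g in m.groups())
            continue
        m = STATUS_RE.match(line)
        if m and int(m.group(3)) == bad_status:
            failures.append({"time": m.group(1), "project": project})
    return failures
```

Fed the first log in this thread, it would report one failure at 16:28:43 for project 2675 (Run 0, Clone 178, Gen 4). Where the terminal output does not include the `Project:` line, as in the second log, the project field is `None`.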

the_cultie
02-28-09, 05:19 AM
Good news everyone!! The reinstall was a complete success; crunching away on an a2 WU now. Getting ~1200 PPD at the moment on my 3800+ X2 :D