
Qfix for OSX?


Surferseth
02-27-09, 05:03 PM
I have had a couple of WUs go bad on me over the past few days; the log info is below. It loops the info below over and over until I kill the queue.dat, WUinfo, and work files / folders.

--- Opening Log file [February 27 22:00:28 UTC]


# Mac OS X SMP Console Edition ################################################
###############################################################################

Folding@Home Client Version 6.20

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/seth/Library/Folding@home
Executable: /Applications/Folding@home.app/fah6


[22:00:28] - Ask before connecting: No
[22:00:28] - User name: surferseth (Team 32)
[22:00:28] - User ID: 7AA1A0362AA27851
[22:00:28] - Machine ID: 1
[22:00:28]
[22:00:28] Loaded queue successfully.
[22:00:28]
[22:00:28] + Processing work unit
[22:00:28] Core required: FahCore_a2.exe
[22:00:28] Core found.
[22:00:28] Working on queue slot 01 [February 27 22:00:28 UTC]
[22:00:28] + Working ...
[22:00:28]
[22:00:28] *------------------------------*
[22:00:28] Folding@Home Gromacs SMP Core
[22:00:28] Version 2.04 (Wed Jan 21 09:16:08 PST 2009)
[22:00:28]
[22:00:28] Preparing to commence simulation
[22:00:28] - Ensuring status. Please wait.
[22:00:37] - Looking at optimizations...
[22:00:37] - Working with standard loops on this execution.
[22:00:37] - Files status OK
[22:00:40] - Expanded 4844336 -> 23994061 (decompressed 495.3 percent)
[22:00:40] Called DecompressByteArray: compressed_data_size=4844336 data_size=23994061, decompressed_data_size=23994061 diff=0
[22:00:40] - Digital signature verified
[22:00:40]
[22:00:40] Project: 2675 (Run 3, Clone 69, Gen 67)
[22:00:40]
[22:00:41] Entering M.D.
[22:00:47] Will resume from checkpoint file
[22:00:51] Resuming from checkpoint
[22:00:51] fcSaveRestoreState: I/O failed dir=0, var=0501A000, varsize=581916
[22:00:51] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore state.
[22:00:55] CoreStatus = FF (255)
[22:00:55] Client-core communications error: ERROR 0xff
[22:00:55] This is a sign of more serious problems, shutting down.


I have tried to download and run qfix for OSX, but I can't seem to figure out how to run the command. Any help would be appreciated.
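
For anyone who wants to confirm that a client is stuck in this kind of loop, a rough check is to count how often the core reports status FF in the log. This is only a sketch: count_core_failures is a made-up helper name, and the demo log is just two lines of the excerpt above pasted twice to mimic the loop. Point it at whatever log file your own client writes.

```shell
# Count how many times the core exited with status FF in a log file.
# count_core_failures is a hypothetical helper, not part of the F@H client.
count_core_failures() {
    grep -c 'CoreStatus = FF' "$1"
}

# Demo log: two lines from the excerpt above, pasted twice to mimic the loop.
demo_log=$(mktemp "${TMPDIR:-/tmp}/fahlog.XXXXXX")
cat > "$demo_log" <<'EOF'
[22:00:55] CoreStatus = FF (255)
[22:00:55] Client-core communications error: ERROR 0xff
[22:00:55] CoreStatus = FF (255)
[22:00:55] Client-core communications error: ERROR 0xff
EOF

count_core_failures "$demo_log"
```

A count that keeps climbing for the same Project/Run/Clone/Gen suggests the same work unit is failing on every restart.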

ihrsetrdr
02-27-09, 05:55 PM
Download qfix and place a copy in your F@H folder - that is, the folder containing the client and work folder for the hung work unit.

Open up a Terminal shell.

At the prompt type "cd" (no quotes, but keep the space in there), then grab the icon for the F@H folder and drag it onto the terminal window. The path to that folder will appear after the "cd ". Hit return.

Just to be sure the next step will work OK, list the details of the files in that folder by typing "ls -la" at the next prompt (no quotes, again, and hit return). You should see qfix on the list, but the column of "drwx" permission info to the left of the file name will have no x's.

At the next terminal prompt type "chmod +x qfix; ls -la" (no quotes, mind the spaces, hit return)

This will make qfix into a Unix executable file and re-list the files in your work folder. Leave the Terminal window open for the next step.
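
The check-then-chmod steps above can be tried safely in a scratch folder first. A minimal sketch, assuming an empty placeholder file standing in for the real qfix download; demo_dir is a hypothetical name, and in practice you would cd to your actual F@H folder instead.

```shell
# Scratch folder for the demo; demo_dir is a hypothetical stand-in for the F@H folder.
demo_dir=$(mktemp -d "${TMPDIR:-/tmp}/fahdemo.XXXXXX")
cd "$demo_dir" || exit 1
: > qfix            # empty placeholder standing in for the downloaded qfix

ls -la qfix         # permissions column shows no x yet
chmod +x qfix
ls -la qfix         # permissions column now includes x

# Scripted version of the eyeball check: -x is true only for executable files.
if [ -x qfix ]; then
    echo "qfix is executable"
else
    echo "qfix is not executable"
fi
```

The [ -x qfix ] test answers the same question as reading the permissions column by eye.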

Now, from InCrease, your old Terminal shell, or the preference pane, stop Folding. Check that there is a .results file in the work folder for the queue position corresponding to the hung work unit. [wuresults_xx.dat for Mac -- Phantom]

(*) In the Terminal window you were working in, type "./qfix" (no quotes, hit return).
Qfix will crank for a little while and then give you a list of your queue. It will say that everything is OK, but you'll see that at the left of the queue position for the hung unit, a number 1 instead of number 0. So far qfix has not fixed anything, only diagnosed. Remember that queue position xx (xx = 00 to 09) because you will use it in the next step.

In the Terminal window type "./fah6 -local -delete xx" (no quotes, hit return). This removes the record of unit xx from the queue, but doesn't touch the files associated with it.

In the terminal window type "./qfix" (no quotes, return). After a few seconds to minutes cranking, qfix will list the queue for you again, but this time the '1' will be replaced with the appropriate '0'.

In the terminal window type "./fah6 -local -send xx" (no quotes, return) where xx is the queue number of the problem work unit. [Note that it will take a few minutes to upload the results. Be patient. You can monitor the uploading using Activity Monitor -- to see the data sent/second to the network. -- Phantom]

If you have more than one hung work unit in a queue (unlikely) run the ./fah6 -local -delete xx command for each of them before rebuilding the queue and resending.
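
The ordering for several hung units (delete each one, rebuild once, then send each one) can be written out as a checklist generator. This is a sketch only: print_repair_plan is a made-up helper, slots 01 and 03 are example values, and it prints the commands rather than running them. It leaves out the first diagnostic ./qfix run that tells you which slots are hung.

```shell
# Print, in order, the commands for repairing one or more hung queue slots.
# Nothing is executed; print_repair_plan is a hypothetical helper.
print_repair_plan() {
    for slot in "$@"; do
        case "$slot" in
            0[0-9]) ;;    # valid slots are 00 to 09
            *) echo "bad slot: $slot (expected 00 to 09)" >&2; return 1 ;;
        esac
    done
    for slot in "$@"; do echo "./fah6 -local -delete $slot"; done
    echo "./qfix"         # rebuild the queue once, after all deletes
    for slot in "$@"; do echo "./fah6 -local -send $slot"; done
}

print_repair_plan 01 03   # example slots only; use the ones qfix flagged
```

Printing the plan first makes it easy to double-check the slot numbers before typing anything into the real F@H folder.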

If you have hung work units in multiple folders, stop folding and back up each folder containing a problem WU, copy qfix to each new folder, cd to the new folder by dragging the folder icon into the terminal window as described above, and repeat the above procedure from the (*). At this point qfix is already executable so you can skip the chmod step.
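
The backup step can also be done from the Terminal. A minimal sketch, assuming a ".backup" suffix for the copy (my own naming choice, not something the client expects); backup_folder is a made-up helper and the demo runs on a throwaway folder rather than a real client folder.

```shell
# Copy a folder next to itself as <name>.backup, keeping permissions and dates.
# backup_folder and the .backup suffix are hypothetical, chosen for illustration.
backup_folder() {
    src="${1%/}"                  # drop a trailing slash, if any
    dest="$src.backup"
    if [ -e "$dest" ]; then       # refuse to overwrite an earlier backup
        echo "backup already exists: $dest" >&2
        return 1
    fi
    cp -Rp "$src" "$dest" && echo "$dest"
}

# Demo on a throwaway folder laid out roughly like a client folder.
demo_root=$(mktemp -d "${TMPDIR:-/tmp}/fahdemo.XXXXXX")
mkdir -p "$demo_root/client1/work"
echo "placeholder" > "$demo_root/client1/queue.dat"
backup_folder "$demo_root/client1"
```

Stop folding before copying, so the client is not writing to the work folder while it is being backed up.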

When you're done repairing the queues and sending the work units, it's safe to start Folding again and get rid of your backups.

copy/pasted from: http://teammacosx.org/forum/cgi-bin/ikonboard.pl?act=ST;f=3;t=3630;hl=qfix

hope this helps.

Surferseth
02-27-09, 06:28 PM
Thank you! It was very helpful up to this point:

./fah6 -local -delete xx

I don't have a fah6 executable in my Folding@home folder. I run the system preference version, not the console version. I know fah6 runs, as I saw it in Activity Monitor before I turned off F@H; I'm just not sure where to look.

Found that the fah6 file lives inside the Folding@home.app in the Applications folder. Still can't get the 1 to change to a 0, but I will...
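
For the preference pane install, the two paths in the log header at the top of the thread suggest how the console commands might translate: run from the launch directory and call fah6 by its full path. This is a sketch under assumptions, not a tested recipe. It assumes -local makes the client use the queue in the current directory, as the quoted instructions imply, it uses $HOME in place of /Users/seth, and fah6_cmd is a made-up helper that only prints the commands. Slot 01 is the one named in the log above.

```shell
# Paths taken from the log header earlier in the thread; yours may differ.
FAH_WORKDIR="$HOME/Library/Folding@home"        # "Launch directory" in the log
FAH_BIN="/Applications/Folding@home.app/fah6"   # "Executable" in the log

# Print one fah6 command line. $1 is delete or send, $2 is the queue slot.
# fah6_cmd is a hypothetical helper; it does not run the client.
fah6_cmd() {
    printf '"%s" -local -%s %s\n' "$FAH_BIN" "$1" "$2"
}

# Print the whole sequence for review before typing it in by hand.
printf 'cd "%s"\n' "$FAH_WORKDIR"
fah6_cmd delete 01
echo "./qfix"
fah6_cmd send 01
```

The quotes around the paths are only a precaution; they matter when a folder name contains spaces.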