View Full Version : Grrrrr!!!


Audioaficionado
09-24-08, 12:19 AM
[02:52:28] Completed 495000 out of 500000 steps (99 percent)



M E G A - F L O P S A C C O U N T I N G

Parallel run - timing based on wallclock.
RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
NF=No Forces

Computing: M-Number M-Flops % of Flops
-----------------------------------------------------------------------
VdW(T) 1362696.261442 73585598.117868 16.6
RF Coul 541549.901934 17871146.763822 4.0
RF Coul [W3] 2405.828058 235771.149684 0.1
RF Coul + VdW(T) 613396.797033 39870791.807145 9.0
RF Coul + VdW(T) [W3] 287065.504534 37318515.589420 8.4
RF Coul + VdW(T) [W3-W3] 767987.491107 245755997.154240 55.4
Outer nonbonded loop 238993.886439 2389938.864390 0.5
1,4 nonbonded interactions 1059.002118 95310.190620 0.0
NS-Pairs 500793.896929 10516671.835509 2.4
Reset In Box 3876.077520 34884.697680 0.0
Shift-X 77423.154846 464538.929076 0.1
CG-CoM 1765.285305 51193.273845 0.0
Sum Forces 116280.232560 116280.232560 0.0
Bonds 13248.026496 569665.139328 0.1
Angles 15591.531183 2541419.582829 0.6
Propers 5376.010752 1231106.462208 0.3
Impropers 1006.002012 209248.418496 0.0
RB-Dihedrals 14114.028228 3486164.972316 0.8
Virial 38814.077628 698653.397304 0.2
Update 38760.077520 1201562.403120 0.3
Stop-CM 38760.000000 387600.000000 0.1
P-Coupling 38760.077520 232560.465120 0.1
Calc-Ekin 38760.155040 1046524.186080 0.2
Constraint-V 38760.077520 232560.465120 0.1
Constraint-Vir 25213.550427 605125.210248 0.1
Settle 8404.516809 2714658.929307 0.6
-----------------------------------------------------------------------
Total 443463488.237335 100.0
-----------------------------------------------------------------------

NODE (s) Real (s) (%)
Time: 81467.000 81467.000 100.0
22h37:47
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 43.884 5.443 1.061 22.630
[03:05:59] Writing local files
[03:05:59] Completed 500000 out of 500000 steps (100 percent)
[03:05:59] Writing final coordinates.
[03:05:59] Past main M.D. loop
[03:05:59] Will end MPI now
[03:06:59]
[03:06:59] Finished Work Unit:
[03:06:59] - Reading up to 3721200 from "work/wudata_07.arc": Read 3721200
[03:06:59] - Reading up to 1776960 from "work/wudata_07.xtc": Read 1776960
[03:06:59] goefile size: 0
[03:06:59] logfile size: 16915
[03:06:59] Leaving Run
[03:07:01] - Writing 5519475 bytes of core data to disk...
[03:07:01] ... Done.
[03:07:02] - Shutting down core
[03:07:02]
[03:07:02] Folding@home Core Shutdown: FINISHED_UNIT
[04:06:48] - Autosending finished units...
[04:06:48] Trying to send all finished work units
[04:06:48] + No unsent completed units remaining.
[04:06:48] - Autosend completed
[10:06:48] - Autosending finished units...
[10:06:48] Trying to send all finished work units
[10:06:48] + No unsent completed units remaining.
[10:06:48] - Autosend completed
[16:06:48] - Autosending finished units...
[16:06:48] Trying to send all finished work units
[16:06:48] + No unsent completed units remaining.
[16:06:48] - Autosend completed
[22:06:48] - Autosending finished units...
[22:06:48] Trying to send all finished work units
[22:06:48] + No unsent completed units remaining.
[22:06:48] - Autosend completed
[04:06:48] - Autosending finished units...
[04:06:48] Trying to send all finished work units
[04:06:48] + No unsent completed units remaining.
[04:06:48] - Autosend completed
[10:06:48] - Autosending finished units...
[10:06:48] Trying to send all finished work units
[10:06:48] + No unsent completed units remaining.
[10:06:48] - Autosend completed
[16:06:48] - Autosending finished units...
[16:06:48] Trying to send all finished work units
[16:06:48] + No unsent completed units remaining.
[16:06:48] - Autosend completed
[22:06:48] - Autosending finished units...
[22:06:48] Trying to send all finished work units
[22:06:48] + No unsent completed units remaining.
[22:06:48] - Autosend completed
[04:06:48] - Autosending finished units...
[04:06:48] Trying to send all finished work units
[04:06:48] + No unsent completed units remaining.
[04:06:48] - Autosend completed
[10:06:48] - Autosending finished units...
[10:06:48] Trying to send all finished work units
[10:06:48] + No unsent completed units remaining.
[10:06:48] - Autosend completed
[12:56:33] ***** Got an Activate signal (2)
[12:56:33] Killing all core threads

Folding@Home Client Shutdown.
audioaficionado@debian-lenny:~/fah$ ./fah6 -smp -verbosity 9

Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.

2 cores detected


--- Opening Log file [September 22 12:57:15]


# SMP Client ##################################################################
###############################################################################

Folding@Home Client Version 6.02

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/audioaficionado/fah
Executable: ./fah6
Arguments: -smp -verbosity 9

[12:57:15] - Ask before connecting: No
[12:57:15] - User name: audioaficionado (Team 32)
[12:57:15] - User ID: 6147E20A5A473921
[12:57:15] - Machine ID: 2
[12:57:15]
[12:57:15] Loaded queue successfully.
[12:57:15] - Autosending finished units...
[12:57:15] Trying to send all finished work units
[12:57:15] + No unsent completed units remaining.
[12:57:15] - Autosend completed
[12:57:15]
[12:57:15] + Processing work unit
[12:57:15] Core required: FahCore_a1.exe
[12:57:15] Core found.
[12:57:15] Working on Unit 07 [September 22 12:57:15]
[12:57:15] + Working ...
[12:57:15] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 30 -verbose -lifeline 4079 -version 602'

[12:57:15]
[12:57:15] *------------------------------*
[12:57:15] Folding@Home Gromacs SMP Core
[12:57:15] Version 1.74 (November 27, 2006)
[12:57:15]
[12:57:15] Preparing to commence simulation
[12:57:15] - Ensuring status. Please wait.
[12:57:15] - Shutting down core
[12:57:32] put
[12:57:32] - Starting from initial work packet
[12:57:32]
[12:57:32] Project: 0 (Run 0, Clone 0, Gen 0)
[12:57:32]
[12:57:32] Error: Could not write local file. Exiting.
[12:57:32] - Shutting down core
[12:58:30] ***** Got an Activate signal (2)
[12:58:30] Killing all core threads

Folding@Home Client Shutdown.
audioaficionado@debian-lenny:~/fah$ [0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
I won't bother if this keeps happening. The only way to fix this the last several times was to blow away the FAH folder and start from scratch. :rolleyes:

I'm too busy with college to keep this up much longer. :bang head

Stanford... we have a problem! :mad:
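The timing block in the log above can be sanity-checked with a little arithmetic. The sketch below recomputes the wall-clock string and the performance figures from the numbers the log prints. The 2 fs timestep is an assumption (the log does not state it); it is chosen because it makes 500000 steps equal 1 ns, which matches the reported hour/ns figure.

```python
# Figures copied from the log's timing block.
wall_seconds = 81467.0
total_mflop = 443463488.237335  # "Total" row of the M-Flops column
steps = 500000

# Assumption: 2 fs per step (not stated in the log), i.e. 1 ns simulated.
timestep_fs = 2.0
simulated_ns = steps * timestep_fs / 1e6


def hms(seconds):
    """Format seconds the way the log does, e.g. 22h37:47."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes:02d}:{secs:02d}"


gflops = total_mflop / wall_seconds / 1000.0
ns_per_day = simulated_ns * 86400.0 / wall_seconds
hours_per_ns = wall_seconds / 3600.0 / simulated_ns

print(hms(wall_seconds))
print(f"{gflops:.3f} GFlops, {ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
```

Under that assumption the results agree with the log's own Performance line, so the finished unit itself looks healthy; the trouble starts only when the next unit is loaded.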

Adak
09-24-08, 03:20 AM
I posted your problem over at FAH's forum. There are many others with this same complaint, unfortunately. Some suggestions:

Remove the -advmethods flag, if your client is setting it.

Remove the -forceasm flag, if your client is setting it.

Use qfix as soon as you see the problem. Don't restart the client when you see it, because that will quickly thrash the queue and ruin the WU result beyond repair.


I'll check back in a few days and see what they have come up with for an answer, but right now they say "we're working on it".
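Since the advice is to act as soon as the error appears rather than restarting, it may help to spot the failure in the log automatically. Below is a minimal sketch that scans log lines for the error message seen in the paste above. It is not part of the FAH client and does not run qfix; the function name and the assumption that every log line starts with a `[HH:MM:SS]` stamp are mine.

```python
import re

# Assumption: log lines look like "[HH:MM:SS] message", as in the paste above.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")
ERROR_TEXT = "Could not write local file"


def find_write_errors(lines):
    """Return (timestamp, message) for each line reporting the write error."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and ERROR_TEXT in match.group(2):
            hits.append((match.group(1), match.group(2)))
    return hits


# Sample lines copied from the log in the first post.
sample = [
    "[12:57:32] Project: 0 (Run 0, Clone 0, Gen 0)",
    "[12:57:32]",
    "[12:57:32] Error: Could not write local file. Exiting.",
    "[12:57:32] - Shutting down core",
]
hits = find_write_errors(sample)
for stamp, message in hits:
    print(f"{stamp}: {message}")
```

If a check like this reports a hit, that would be the point to stop the client and follow the qfix advice instead of restarting.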

Zerix01
09-24-08, 04:23 AM
Just out of curiosity, is Linux installed right on the hardware or is it in a VM?

If it is a VM, see if it makes any difference to run the client as root.
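Running as root is one way to rule out a permissions problem. A lighter check is to test directly whether the client's user can create a file in the work folder and whether the disk has free space. The sketch below does that; it is only a generic filesystem check, the helper name is hypothetical, and nothing in the thread confirms that permissions or disk space are the actual cause of the "Could not write local file" error.

```python
import os
import shutil
import tempfile


def check_work_dir(path):
    """Report whether path exists, accepts a new file, and has free space."""
    report = {"exists": os.path.isdir(path), "writable": False, "free_bytes": None}
    if not report["exists"]:
        return report
    report["free_bytes"] = shutil.disk_usage(path).free
    try:
        # Create and remove a real file; permission bits alone can mislead.
        with tempfile.NamedTemporaryFile(dir=path):
            report["writable"] = True
    except OSError:
        report["writable"] = False
    return report


# Demonstrated on a throwaway directory; point it at the client's
# work/ folder (e.g. ~/fah/work) to test the real thing.
with tempfile.TemporaryDirectory() as demo_dir:
    result = check_work_dir(demo_dir)
missing = check_work_dir(os.path.join(tempfile.gettempdir(), "no-such-fah-dir-example"))
print(result)
print(missing)
```

Run it as the same user that launches `./fah6`, since the point is to compare what that user can do with what root can do.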

Audioaficionado
09-24-08, 11:07 AM
I posted your problem over at FAH's forum. There are many others with this same complaint, unfortunately. Some suggestions:

Remove the -advmethods flag, if your client is setting it.

Remove the -forceasm flag, if your client is setting it.

Use qfix as soon as you see the problem. Don't restart the client when you see it, because that will quickly thrash the queue and ruin the WU result beyond repair.


I'll check back in a few days and see what they have come up with for an answer, but right now they say "we're working on it".

No advanced flags are set, only -smp -verbosity 9. I do have the config set to accept advanced work, and eventually I get the a2 core for the occasional advanced work units.

What is qfix?

Just out of curiosity, is Linux installed right on the hardware or is it in a VM?

If it is a VM, see if it makes any difference to run the client as root.

They are both pure Debian Lenny testing. I've run root terminals, but the problem still occurs.

Like I said, I'm too busy to futz with the clients much these days, so if they crash, it's Stanford's loss. My rigs will just use less electricity while not folding, just spinning in error loops. Fewer points for me, but that's not a life-or-death situation for me anyway. :shrug:

Macaholic
09-24-08, 04:21 PM
What is qfix?

Check the thread here (http://foldingforum.org/viewtopic.php?f=8&t=191) for more details.

drshivas
09-24-08, 06:20 PM
Try what I did. My VMs are fine now:

http://www.ocforums.com/showpost.php?p=5800448&postcount=27