-BigAdv VM Stability issue?


Surferseth
01-30-10, 01:05 PM
I'm commonly seeing this error with my Ubuntu VM running -bigadv WUs. I have 4600MB of RAM dedicated to the VM.

seth@seth-desktop:~/folding$ ./fah

Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.

8 cores detected


--- Opening Log file [January 30 06:19:35 UTC]


# Linux SMP Console Edition ################################################## #
################################################## #############################

Folding@Home Client Version 6.24R3

http://folding.stanford.edu

################################################## #############################
################################################## #############################

Launch directory: /home/seth/folding
Executable: ./fah6
Arguments: -smp 8 -bigadv -verbosity 9

seth@seth-desktop:~/folding$ [06:19:35] - Ask before connecting: No
[06:19:35] - User name: surferseth (Team 32)
[06:19:35] - User ID: 6FC4A9AC0DD3ABF0
[06:19:35] - Machine ID: 1
[06:19:35]
[06:19:35] Loaded queue successfully.
[06:19:35]
[06:19:35] + Processing work unit
[06:19:35] Core required: FahCore_a2.exe
[06:19:35] Core found.
[06:19:35] - Autosending finished units... [January 30 06:19:35 UTC]
[06:19:35] Trying to send all finished work units
[06:19:35] + No unsent completed units remaining.
[06:19:35] - Autosend completed
[06:19:35] Working on queue slot 03 [January 30 06:19:35 UTC]
[06:19:35] + Working ...
[06:19:35] - Calling './mpiexec -np 8 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -nice 19 -suffix 03 -checkpoint 15 -verbose -lifeline 9962 -version 624'

[06:19:35]
[06:19:35] *------------------------------*
[06:19:35] Folding@Home Gromacs SMP Core
[06:19:35] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[06:19:35]
[06:19:35] Preparing to commence simulation
[06:19:35] - Looking at optimizations...
[06:19:35] - Working with standard loops on this execution.
[06:19:35] - Files status OK
[06:19:45] is execution.
[06:19:45] - Files status OK
[06:21:33] (decompressed 101.8 percent)
[06:21:34] 49 (decompressed 101.8 percent)
[06:21:57] ressed_data_size=30327709 data_size=159726549, decompressed_data_size=159726549 diff=0
[06:21:57] ssed_data_size=159726549 diff=0
[06:21:59] - Digital signature verified
[06:21:59]
[06:21:59] Project: 2681 (Run 5, Clone 10, Gen 67)
[06:21:59]
[06:22:41] Entering M.D.
[06:22:47] Using Gromacs checkpoints
NNODES=8, MYRANK=0, HOSTNAME=seth-desktop
NODEID=0 argc=23
NNODES=8, MYRANK=1, HOSTNAME=seth-desktop
NNODES=8, MYRANK=2, HOSTNAME=seth-desktop
NODEID=2 argc=23
NNODES=8, MYRANK=3, HOSTNAME=seth-desktop
NODEID=3 argc=23
NNODES=8, MYRANK=4, HOSTNAME=seth-desktop
NODEID=4 argc=23
NNODES=8, MYRANK=5, HOSTNAME=seth-desktop
NODEID=5 argc=23
NNODES=8, MYRANK=6, HOSTNAME=seth-desktop
NODEID=6 argc=23
NNODES=8, MYRANK=7, HOSTNAME=seth-desktop
NODEID=7 argc=23
NODEID=1 argc=23
Reading file work/wudata_03.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

Reading checkpoint file work/wudata_03.cpt generated: Fri Jan 29 05:24:43 2010


NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 8 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
17000001 steps, 68000.0 ps (continuing from step 16938345, 67753.4 ps).
[06:23:14] data_03.log
[06:23:14] Verified work/wudata_03.trr
[06:23:16] Verified work/wudata_03.xtc
[06:23:16] Verified work/wudata_03.edr
[06:24:05] Completed 188344 out of 250000 steps (75%)
[06:47:42] Completed 190000 out of 250000 steps (76%)
[07:22:49] Completed 192500 out of 250000 steps (77%)
[07:57:49] Completed 195000 out of 250000 steps (78%)
[08:32:42] Completed 197500 out of 250000 steps (79%)
[09:07:42] Completed 200000 out of 250000 steps (80%)
[09:42:47] Completed 202500 out of 250000 steps (81%)
[10:17:34] Completed 205000 out of 250000 steps (82%)

t = 67823.363 ps: Water molecule starting at atom 904089 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 67823.363 ps: Water molecule starting at atom 876966 can not be settled.
Check for bad contacts and/or reduce the timestep.
[10:29:20]
[10:29:20] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[0]4:Return code = 0, signaled with Quit
[0]5:Return code = 0, signaled with Quit
[0]6:Return code = 0, signaled with Quit
[0]7:Return code = 0, signaled with Quit
[10:29:28] CoreStatus = 66 (102)
[10:29:28] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[10:29:28] Killing all core threads

Folding@Home Client Shutdown.


This error is somewhat random. For days it will fold away, finishing WU after WU, and then it just seems to become unstable. Any ideas?

Norcalsteve
01-30-10, 01:26 PM
Check your OC. I thought my i7 was stable (Prime95 for 24 hrs), but when I tried Folding, my VM/client went haywire. It turned out that I needed a bump in QPI/Vcore.

Giz
01-30-10, 01:48 PM
+1

I was Prime95 stable till I started folding. I was getting the same SIGTERM shutdown until I lowered my overclock.
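
To see whether the crashes get rarer after backing off the overclock, it may help to count the failure signatures from the log in the first post. Below is a minimal sketch, assuming the console output has been saved to a text file. The search strings are copied from that log; the labels and the function name `count_signatures` are made up for illustration and are not part of the Folding@Home client.

```python
import re
import sys
from collections import Counter

# Signature strings copied from the console log in the first post.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "settle_failure": re.compile(r"can not be settled"),
    "segfault": re.compile(r"signaled with Segmentation fault"),
    "core_status_66": re.compile(r"CoreStatus = 66"),
    "mpi_abort": re.compile(r"MPI_Abort"),
}


def count_signatures(lines):
    """Count how many lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Runs only when a file path is given on the command line.
# The path is whatever file the console output was saved to (an assumption).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for label, n in sorted(count_signatures(fh).items()):
            print(f"{label}: {n}")
```

Running it against saved output before and after a clock or voltage change gives a rough comparison. A count of zero over a few days does not prove stability, since the error is intermittent.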