Project: 2677 (Run 30, Clone 70, Gen 39)


Audioaficionado
08-30-09, 09:01 AM
I picked this one up and it ran out of time. Over 1 hour per 1% step (2500 steps) :eek:

It was on a VM with two virtual processors (2 VMs on a Q6600)

[21:24:49] Completed 147500 out of 250000 steps (59%)
[22:35:03] Completed 150000 out of 250000 steps (60%)
[23:45:17] Completed 152500 out of 250000 steps (61%)
[00:55:41] Completed 155000 out of 250000 steps (62%)
[02:04:51] Completed 157500 out of 250000 steps (63%)
[02:04:51] Unit 6's deadline (August 29 01:23) has passed.
[02:04:51] Going to interrupt core and move on to next unit...


Received the TERM signal, stopping at the next step



Received the TERM signal, stopping at the next step



Received the TERM signal, stopping at the next step



Received the TERM signal, stopping at the next step


Average load imbalance: 300.0 %
Part of the total run time spent waiting due to load imbalance: 0.6 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 9 %


NOTE: 52 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun


Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time: 261669.000 261669.000    100.0
                       3d00h41:09
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:      0.000      2.674      0.104    230.734

gcq#0: Thanx for Using GROMACS - Have a Nice Day

Error encountered before initializing MPICH
[02:04:52] mdrun_gpu returned 1
[02:04:52] Gromacs was interrupted
[02:04:52] Folding@home Core Shutdown: INTERRUPTED
[03:23:46] - Autosending finished units... [August 29 03:23:46 UTC]
[03:23:46] Trying to send all finished work units
[03:23:46] + No unsent completed units remaining.
[03:23:46] - Autosend completed
[09:23:46] - Autosending finished units... [August 29 09:23:46 UTC]
[09:23:46] Trying to send all finished work units
[09:23:46] + No unsent completed units remaining.
[09:23:46] - Autosend completed
[15:23:46] - Autosending finished units... [August 29 15:23:46 UTC]
[15:23:46] Trying to send all finished work units
[15:23:46] + No unsent completed units remaining.
[15:23:46] - Autosend completed
[21:23:46] - Autosending finished units... [August 29 21:23:46 UTC]
[21:23:46] Trying to send all finished work units
[21:23:46] + No unsent completed units remaining.
[21:23:46] - Autosend completed
[03:23:46] - Autosending finished units... [August 30 03:23:46 UTC]
[03:23:46] Trying to send all finished work units
[03:23:46] + No unsent completed units remaining.
[03:23:46] - Autosend completed



I finally just hit Ctrl+C and started over, as it couldn't seem to restart by itself.
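
For anyone who wants to check the pace from their own log, here is a rough Python sketch that reads the "Completed ... steps" lines quoted above and works out the time per 1% and the time still needed. The function names (`parse_progress`, `seconds_per_percent`, `hours_remaining`) are made up for illustration and are not part of the client; the only input is the log text pasted in this post.

```python
import re

# Progress lines copied from the log in this post.
LOG = """\
[21:24:49] Completed 147500 out of 250000 steps (59%)
[22:35:03] Completed 150000 out of 250000 steps (60%)
[23:45:17] Completed 152500 out of 250000 steps (61%)
[00:55:41] Completed 155000 out of 250000 steps (62%)
[02:04:51] Completed 157500 out of 250000 steps (63%)
"""

PROGRESS = re.compile(
    r"\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps"
)


def parse_progress(log_text):
    """Return (seconds, steps_done, total_steps) per progress line.

    The log only carries a time of day, so a timestamp that goes
    backwards is treated as having crossed midnight.
    """
    rows = []
    day_offset = 0
    previous = None
    for match in PROGRESS.finditer(log_text):
        hours, minutes, seconds, done, total = map(int, match.groups())
        clock = hours * 3600 + minutes * 60 + seconds
        if previous is not None and clock < previous:
            day_offset += 24 * 3600
        previous = clock
        rows.append((clock + day_offset, done, total))
    return rows


def seconds_per_percent(rows):
    """Average wall-clock seconds per 1% between first and last line."""
    first_time, first_done, total = rows[0]
    last_time, last_done, _ = rows[-1]
    percent_done = (last_done - first_done) / (total / 100)
    return (last_time - first_time) / percent_done


def hours_remaining(rows):
    """Hours still needed at the average pace seen in the log."""
    _, done, total = rows[-1]
    percent_left = (total - done) / (total / 100)
    return seconds_per_percent(rows) * percent_left / 3600


rows = parse_progress(LOG)
print(f"{seconds_per_percent(rows) / 60:.1f} minutes per 1%")
print(f"{hours_remaining(rows):.1f} hours still needed")
```

On the five lines above this comes out at roughly 70 minutes per 1%, with about 43 hours of work left at 63%, so the unit had no chance of making its deadline.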

PhilColwill
08-30-09, 09:40 AM
I believe that project was noted in the Bad WU's thread ..

http://www.ocforums.com/showthread.php?t=615646

I think Jolly had some issues with it too ..

GeneralMac
08-30-09, 09:54 AM
Sorry, thought it was something else. If the WU is only running on 1 core, it's probably a bad WU.

ihrsetrdr
08-30-09, 11:26 AM
I had one of those (not sure which Run-Clone-Gen) running on a quad; it started spewing "LINCS warnings", which basically trashed the WU. :rolleyes:

ChasR
08-30-09, 11:42 AM
Bad a2 WUs have a smaller than expected download size. If it's around 1.5 MB, you don't have to wait on progress: stop the client, delete the Work directory, queue.dat and machinedependent.dat, and you will get a new WU when you restart the client. Deleting machinedependent.dat will cause your Stanford processor count to go up by one. A fix is coming on Monday.
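
The check-and-delete routine described above could be scripted along these lines. This is only a sketch under several assumptions that are not from the post: that the work directory is named `work` inside the client folder, that downloaded WU data files match `wudata_*.dat`, and that 2 MB is a sensible cutoff for "around 1.5 MB". The function names are hypothetical too. Stop the client before running anything like this, and leave `dry_run=True` until the list of targets looks right.

```python
import shutil
from pathlib import Path

# Assumption: anything at or under 2 MB counts as suspiciously small.
# The post only says bad WUs are "around 1.5 MB".
SUSPECT_MAX_BYTES = 2 * 1024 * 1024


def find_suspect_wu_files(client_dir, max_bytes=SUSPECT_MAX_BYTES,
                          work_dir_name="work", pattern="wudata_*.dat"):
    """List WU data files small enough to look like a bad download.

    work_dir_name and pattern are assumptions about the client layout.
    """
    work = Path(client_dir) / work_dir_name
    if not work.is_dir():
        return []
    return sorted(p for p in work.glob(pattern)
                  if p.stat().st_size <= max_bytes)


def reset_client_state(client_dir, dry_run=True, work_dir_name="work"):
    """Remove the work directory, queue.dat and machinedependent.dat.

    Returns the paths that were (or, in a dry run, would be) removed.
    The client must already be stopped; this does not stop it.
    """
    base = Path(client_dir)
    targets = [base / work_dir_name,
               base / "queue.dat",
               base / "machinedependent.dat"]
    removed = []
    for target in targets:
        if not target.exists():
            continue
        removed.append(target)
        if dry_run:
            continue
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return removed


if __name__ == "__main__":
    client = Path(".")  # assumption: run from inside the client folder
    for wu in find_suspect_wu_files(client):
        print(f"suspect: {wu} ({wu.stat().st_size} bytes)")
    for path in reset_client_state(client, dry_run=True):
        print(f"would delete: {path}")
```

The config file and the client executable are deliberately left alone, so the client comes back up with the same settings and simply asks for a new WU.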

Audioaficionado
08-30-09, 07:35 PM
OK...

Did all that and Stanford's server served me up the same crap again :mad:

3rd time was the charm :)

ChasR
08-30-09, 08:37 PM
I've never gotten the same WU when I deleted machinedependent.dat, and you can look at my processor count to see I've deleted it a lot since the bad WUs came out. Delete that file and the rig is seen as a fresh install by the AS (assignment server). Odds are very high against getting the same WU on a fresh install, but it's not impossible. You must not be very lucky. If you didn't delete machinedependent.dat, you will get the same WU several times before the AS gives up and sends you a different one.

Voidn
09-02-09, 10:09 AM
Ugh, I just got 2 bad 2667s and a bad 2669 in a row. Watching the file size is very helpful.

Bobnova
09-02-09, 10:56 AM
I've managed to get the same lousy WU a couple times in a row deleting everything but fah6 and the config file. Odds are against it, but it happens.

HayesK
09-02-09, 08:25 PM
Caught three more 1-core A2 WUs today:
p2677 (R1-C89-G43)
p2677 (R3-C79-G33)
p2677 (R39-C20-G41)

Good news. Received the new core 2.10 after deleting the old core. The first rig received a good WU. The 2nd rig initially got a bad WU, which errored immediately; it then downloaded the same WU and errored two more times before getting a good WU on the fourth attempt. On the third rig, I only deleted the old core and received the new core, which errored on the current bad WU, then downloaded the same WU and errored four more times before getting a good WU on the fifth attempt.