SMP Client hangs for no reason... o.O


pik4chu
04-25-07, 12:21 AM
Not sure why it is doing this, but 4 times now in the last week or two the progress has just stopped. I currently use the console client, launched at startup (but not as a service). It has not always been obvious because of the timezone thing in the client, but here is the most recent one.


[17:36:33] Preparing to commence simulation
[17:36:33] - Ensuring status. Please wait- Created dyn
[17:36:33] - Files status OK
[17:36:39] - Starting from initial work packet
[17:36:40]
[17:36:40] Project: 2609 (Run 0, Clone 74, Gen 3)
[17:36:40]
[17:36:42] Clone 74, Gen 3)
[17:36:42]
[17:36:44] M.D.
[17:36:44] ing M.D.
[17:36:54] packet
[17:36:54]
[17:36:54] Project: 2609 (Run 0, Clone 74, Gen 3)
[17:36:54]
[17:36:54] Entering M.D.
[17:37:00] Rejecting checkpoint
[17:37:01] Protein: Protein
[17:37:01] Writing local files
[17:37:02] Extra SSE boost OK.
[17:37:02] Writing local files
[17:37:03] Completed 0 out of 500000 steps (0 percent)
[17:55:40] Writing local files
[17:55:40] Completed 5000 out of 500000 steps (1 percent)
[18:14:16] Writing local files
[18:14:17] Completed 10000 out of 500000 steps (2 percent)
[18:32:53] Writing local files
[18:32:53] Completed 15000 out of 500000 steps (3 percent)
[18:51:30] Writing local files
[18:51:30] Completed 20000 out of 500000 steps (4 percent)
[19:10:07] Writing local files
[19:10:07] Completed 25000 out of 500000 steps (5 percent)
[19:28:45] Writing local files
[19:28:45] Completed 30000 out of 500000 steps (6 percent)
[19:47:24] Writing local files
[19:47:24] Completed 35000 out of 500000 steps (7 percent)
[20:06:01] Writing local files
[20:06:01] Completed 40000 out of 500000 steps (8 percent)
[20:24:38] Writing local files
[20:24:39] Completed 45000 out of 500000 steps (9 percent)
[20:43:16] Writing local files
[20:43:16] Completed 50000 out of 500000 steps (10 percent)
[21:01:52] Writing local files
[21:01:52] Completed 55000 out of 500000 steps (11 percent)
*snip*
[21:28:55] Writing local files
[21:28:55] Completed 440000 out of 500000 steps (88 percent)
[21:47:58] Writing local files
[21:47:58] Completed 445000 out of 500000 steps (89 percent)
[22:07:02] Writing local files
[22:07:02] Completed 450000 out of 500000 steps (90 percent)
[22:26:05] Writing local files
[22:26:05] Completed 455000 out of 500000 steps (91 percent)
[22:45:08] Writing local files
[22:45:08] Completed 460000 out of 500000 steps (92 percent)
[23:04:12] Writing local files
[23:04:12] Completed 465000 out of 500000 steps (93 percent)
[23:23:15] Writing local files
[23:23:16] Completed 470000 out of 500000 steps (94 percent)
[23:42:18] Writing local files
[23:42:19] Completed 475000 out of 500000 steps (95 percent)
[00:01:21] Writing local files
[00:01:22] Completed 480000 out of 500000 steps (96 percent)

Folding@Home Client Shutdown at user request.

Folding@Home Client Shutdown.



The shutdown is me closing the client, and below is the log once I start it again. Each time I restart it, the thing pretends nothing ever happened: it continues on, finishes, uploads, the points are counted, and it downloads the next one and keeps going. This certainly won't do on what I hope will be an unattended farm. Or is this to be expected for the beta client?


--- Opening Log file [April 25 02:48:07]


# SMP Client ##################################################################
###############################################################################

Folding@Home Client Version 5.91beta

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\Folding@Home Windows SMP Client V1.01
Executable: C:\Program Files\Folding@Home Windows SMP Client V1.01\fah.exe


[02:48:07] - Ask before connecting: No
[02:48:07] - User name: [OC]Pik4chu (Team 32)
[02:48:07] - User ID: 3F2991767EAA9CA4
[02:48:07] - Machine ID: 1
[02:48:07]
[02:48:07] Loaded queue successfully.
[02:48:07]
[02:48:07] + Processing work unit
[02:48:07] Core required: FahCore_a1.exe
[02:48:07] Core found.
[02:48:07] Working on Unit 07 [April 25 02:48:07]
[02:48:07] + Working ...
[02:48:08]
[02:48:08] *------------------------------*
[02:48:08] Folding@Home Gromacs SMP Core
[02:48:08] Version 1.74 (March 10, 2007)
[02:48:08]
[02:48:08] Preparing to commence simulation
[02:48:08] - Ensuring status. Please wait.
[02:48:25] - Looking at optimizations...
[02:48:25] - Working with standard loops on this execution.
[02:48:25] - Previous termination of core was improper.
[02:48:25] - Going to use standard loops.
[02:48:25] - Files status OK
[02:48:38] - Expanded 3965785 -> 21619596 (decompressed 545.1 percent)
[02:48:40]
[02:48:40] Project: 2609 (Run 0, Clone 74, Gen 3)
[02:48:40]
[02:48:42] Entering M.D.
[02:48:48] Calling FAH init
[02:48:49] Writing local files
[02:48:49] ing from checkpoint)
[02:48:49] Read checkpoint
[02:48:49] Protein: Protein
[02:48:49] Writing local files
[02:48:50] Completed 480000 out of 500000 steps (96 percent)
[02:48:50] Extra SSE boost OK.
[03:08:25] Writing local files
[03:08:25] Completed 485000 out of 500000 steps (97 percent)
[03:27:45] Writing local files
[03:27:45] Completed 490000 out of 500000 steps (98 percent)
[03:47:05] Writing local files
[03:47:05] Completed 495000 out of 500000 steps (99 percent)
[04:06:23] Writing local files
[04:06:24] Completed 500000 out of 500000 steps (100 percent)
[04:06:24] Writing final coordinates.
[04:06:25] Past main M.D. loop
[04:06:25] Will end MPI now
[04:07:25]
[04:07:25] Finished Work Unit:
[04:07:25] - Reading up to 6048840 from "work/wudata_07.arc": Read 6048840
[04:07:25] - Reading up to 20680104 from "work/wudata_07.xtc": Read 20680104
[04:07:25] goefile size: 0
[04:07:25] logfile size: 356059
[04:07:25] Leaving Run
[04:07:26] - Writing 27187735 bytes of core data to disk...
[04:07:27] ... Done.
[04:07:28] - Failed to delete work/wudata_07.sas
[04:07:28] - Failed to delete work/wudata_07.goe
[04:07:28] Warning: check for stray files
[04:07:28] - Shutting down core
[04:09:28]
[04:09:28] Folding@home Core Shutdown: FINISHED_UNIT
[04:09:28]
[04:09:28] Folding@home Core Shutdown: FINISHED_UNIT
[04:09:33] CoreStatus = 64 (100)
[04:09:33] Sending work to server


[04:09:33] + Attempting to send results
[04:19:42] + Results successfully sent
[04:19:42] Thank you for your contribution to Folding@Home.
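
To put numbers on the logs above, here is a minimal Python sketch that pulls the "Completed N out of M steps" lines out of a log and measures the time between them. It assumes the log format shown in this post; the names `PROGRESS`, `progress_points`, and `frame_intervals` are made up for illustration and are not part of the client. Because the timestamps carry no date, the sketch also assumes no more than one midnight rollover between two consecutive progress lines.

```python
import re
from datetime import timedelta

# Matches lines such as:
# "[23:42:19] Completed 475000 out of 500000 steps (95 percent)"
PROGRESS = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)

def progress_points(lines):
    """Return (seconds_since_midnight, steps_done) for each progress line."""
    points = []
    for line in lines:
        m = PROGRESS.match(line.strip())
        if m:
            h, mi, s, done, _total = (int(g) for g in m.groups())
            points.append((h * 3600 + mi * 60 + s, done))
    return points

def frame_intervals(points):
    """Seconds between consecutive progress lines.

    The log timestamps have no date, so a negative difference is
    treated as a single midnight rollover (modulo 24 hours).
    """
    gaps = []
    for (t0, _), (t1, _) in zip(points, points[1:]):
        gaps.append((t1 - t0) % 86400)
    return gaps

# Lines copied from the first log in this post.
sample = [
    "[23:23:16] Completed 470000 out of 500000 steps (94 percent)",
    "[23:42:18] Writing local files",
    "[23:42:19] Completed 475000 out of 500000 steps (95 percent)",
    "[00:01:21] Writing local files",
    "[00:01:22] Completed 480000 out of 500000 steps (96 percent)",
]
gaps = frame_intervals(progress_points(sample))
print([str(timedelta(seconds=g)) for g in gaps])  # ['0:19:03', '0:19:03']
```

On this work unit each 1 percent takes about 19 minutes, so a log that has been silent from 00:01 until 02:48 is well outside the normal rhythm.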

Adak
04-25-07, 01:27 AM
I don't know if this is your trouble, but I've been running the Linux SMP client for a while, and 3 of my 4 rigs have had problems with just stopping work for no reason. This is using Ubuntu 6.10. You should check the similar settings in your BIOS and in your Windows OS.

The cure for them was to go to System Tab >> Services, uncheck the box for "Power Management (acpid)", close it, and reboot.

My Foxconn (Intel) board had no problem with this being left checked, but 3 Abit mobos and 1 Gigabyte board would work for a while and then just stop. Doing the above was their cure.

Your equivalent of the acpid setting is probably in your BIOS, but Windows may also have a power management setting for it.

Good luck.

Adak

WarriorII
04-25-07, 01:30 AM
I've run the SMP client (Ubuntu 6.10 also) on my Asus board with zero problems.

Are you using the Windoz SMP or Linux?

WarriorII
04-25-07, 01:31 AM
NM, Windows client.

ChasR
04-25-07, 07:43 AM
If you hadn't caught it, after three hours the client would have gone to sleep and probably EUEd the WU. I'm seeing this on one rig at about the same time each night. I first attributed it to Windows Update, but it now seems more likely to be a power management thing. Check to be sure hibernation is disabled.

pscout
04-25-07, 08:55 AM
Another possible cause for the Windows SMP client just stopping is losing the LAN connection, which I imagine can happen easily if you are running wireless.

harlam357
04-25-07, 09:10 AM
I had similar problems on a Linux SMP client... but the cause seemed to be highly clocked RAM. Not a power management issue. :shrug:

pik4chu
04-25-07, 11:04 AM
Quoting ChasR: "If you hadn't caught it, after three hours the client would have gone to sleep and probably EUEd the WU. I'm seeing this on one rig at about the same time each night. I first attributed it to Windows Update, but it now seems more likely to be a power management thing. Check to be sure hibernation is disabled."
The only time I have ever had the cores sleep was when I messed up the username when running the install.bat. And I have had this hang like I mentioned above for a good deal more than 3 hours, and upon restarting all is well. But this is very troublesome: after the initial setup I have little interest in checking each rig once a day to make sure this flaky client is working :(

I will check the power settings in the BIOS when I get home today and see, but I'm pretty sure they are the same as on every other rig I set up.

Goshawk
04-25-07, 11:15 AM
Well, I've got the same thing happening on all four of my farming rigs. I have to restart the dumba** clients twice a day in order to keep a decent PPD up :/ Otherwise, these things wouldn't do any work at all.

Checked all machines, none of them are running the acpid service, nor do they give me the option to disable it.


*frustrated*

EDIT: I tested things to make sure, I've run the SMP Client and a conventional client side by side for the past 2 days and the SMP client is the only one that stops working. I'm just about to uninstall the SMP and go with 4 regular clients and eat the PPD hit :/


~ Gos
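
Rather than checking each rig by hand, a rough watchdog could flag a client whose log has gone quiet. Below is a minimal Python sketch under stated assumptions: the log file is taken to be `FAHlog.txt` in the client's launch directory, and the 45-minute threshold is simply a bit more than twice the roughly 19 minutes between "Completed" lines in the log posted at the top of this thread, so it would need adjusting on slower or faster machines. `is_stalled` and `seconds_since_last_write` are hypothetical helper names, and what to do once a stall is detected (for example restarting the client) is deliberately left out.

```python
import os
import time

def seconds_since_last_write(log_path, now=None):
    """Seconds elapsed since the log file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path)

def is_stalled(log_path, max_idle_seconds=45 * 60, now=None):
    """True if the log has been silent for longer than max_idle_seconds."""
    return seconds_since_last_write(log_path, now) > max_idle_seconds

if __name__ == "__main__":
    LOG = "FAHlog.txt"  # assumed log file name; adjust the path per rig
    if os.path.exists(LOG) and is_stalled(LOG):
        print("No log activity for over 45 minutes; the client may be hung")
```

Run from a scheduled task every few minutes, this only reports the problem. It will also trigger during long uploads or downloads, when the client writes nothing for a while, so treat a hit as a hint rather than proof of a hang.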

AlabamaCajun
04-25-07, 01:43 PM
I've seen this on two Fedora Linux rigs.

I've also noticed my points are not being credited. Yesterday 2 Windows SC clients uploaded and no results posted. Only 1 was SMP and one Ribo; the other 2 were some of these new cores. One is on the Win graphical client, the other a win-service. :bang head

pik4chu
04-25-07, 03:06 PM
Quoting AlabamaCajun: "I've seen this on two Fedora Linux rigs.

I've also noticed my points are not being credited. Yesterday 2 Windows SC clients uploaded and no results posted. Only 1 was SMP and one Ribo; the other 2 were some of these new cores. One is on the Win graphical client, the other a win-service. :bang head"
Just noticed that the points for the one I posted above have not been tallied... hmmm, hopefully it is just another issue with Stanford and they are not being sent to a black hole or anything.
But it seems that this problem is with all SMP clients, not just Windows, so that is a little more comforting but still frustrating.

Andisoss
04-25-07, 05:01 PM
I keep getting core downloading errors with my SMP client in Vista64. Trying to figure that one out..

pik4chu
04-25-07, 06:05 PM
Well, I think that last WU went into oblivion after all. Wonderful, just wonderful. :mad: