Weird problem with BigPackets


Audioaficionado
09-11-04, 10:18 PM
I have two clients set up for BigPackets. FAH1 and FAH2. For some strange reason the BigPacket dies on FAH2 but not FAH1. Same exact package every time. p1301_1RYP_AAAA_UM. What could be causing this? Can I set the affinity to both physical CPUs and let the logical CPUs run the regular WUs?

The only difference in the client setups is that I used the -config shortcut to set up #1 and then, using Notepad, just copied the "bigpackets=yes" line over to the #2 config file rather than using the -config shortcut switch again for the second client.

Both clients were stopped and restarted after the config changes. The config files are the same except for machine and local ID#.

FAH1:
machineid=1
local=423
bigpackets=yes

FAH2:
machineid=2
local=417
bigpackets=yes
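
For anyone who wants to double-check a hand-edited config like this, here is a minimal sketch that compares two sets of key=value lines and ignores the keys that are expected to differ. The function names `parse_cfg` and `diff_cfg` are made up for illustration, and the sketch assumes the files contain only plain key=value lines plus optional [section] headers; it is not part of the FAH client.

```python
def parse_cfg(text):
    """Parse key=value lines into a dict, skipping blanks and [section] headers."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def diff_cfg(a, b, ignore=("machineid", "local")):
    """Return {key: (value_in_a, value_in_b)} for keys that differ, minus ignored ones."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


# Values taken from the two config excerpts above.
fah1 = parse_cfg("machineid=1\nlocal=423\nbigpackets=yes\n")
fah2 = parse_cfg("machineid=2\nlocal=417\nbigpackets=yes\n")
print(diff_cfg(fah1, fah2))
```

An empty result means the two files agree on everything except the ignored keys. A stray character picked up while copying and pasting would show up as a differing key or value.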

FizzledFiend
09-11-04, 10:48 PM
Do they run at all on it, or does it fail immediately after it starts up? I had this same problem; it took a reinstall of the cores to correct it.

Audioaficionado
09-11-04, 11:14 PM
Well, FAH#1 finished one and is busy with another. FAH#2 lost one at 66 frames and a second one at 33 frames. Both are in the early stages with two more. All five BP WUs I've received were the p1301_1RYP_AAAA_UM, worth 242, and they fold a third faster than a 242 Tinker. DGromacs still do the best for me PPW.

stan03
09-11-04, 11:23 PM
I've heard that the BigPackets just aren't stable. I've lost a few myself, although I haven't noticed only one instance losing them. Is the error the CLIENT DIED one or the EARLY UNIT END one?

aftermath
09-12-04, 05:53 AM
It would be a good idea to try to get each client to fold on one CPU to see if one is bad. But I don't think this is possible with the service, as it's a system program and does not allow the user to interfere; stop the service and launch the consoles manually.
Try switching the clients to the other CPU. If the one that was failing carries on failing on the other, reliable CPU, then it's just BigPackets.

http://www.ocforums.com/showthread.php?t=327042
I've found that my system wasn't quite as stable as I thought with these. After a little tweaking I'm back at normal OC speed and having more luck with BP WUs; I just dropped the voltage a bit.
I've had one die since then.

Audioaficionado
09-12-04, 12:06 PM
As much as it can be trusted, Task Manager shows 4 CPU graphs, and if I start one, two or three clients, all CPU graphs show usage. Some more than others, but never on just one CPU. With FAH off, my typical usage is 0 to 3%, and with one client running the usage on the other CPUs exceeds that.

Here's the snippet of FAH log.

[22:14:06] Completed 33000 out of 100000 steps (33)
[22:25:45] Quit 101 - Fatal error:
[22:25:45] Step 33474, time 33.474 (ps) LINCS WARNING
[22:25:45] relative constraint deviation after LINCS:
[22:25:45] max 0.212701 (between atoms 119473 and 119475) rms 0.001324
[22:25:45]
[22:25:45] Simulation instability has been encountered. The run has entered a
[22:25:45] state from which no further progress can be made.
[22:25:45] If you often see other project units terminating early like this
[22:25:45] too, you may wish to check the stability of your computer (issues
[22:25:45] such as high temperature, overclocking, etc.).
[22:25:45] Going to send back what have done.
[22:25:45] logfile size: 32545
[22:25:45] - Writing 33231 bytes of core data to disk...
[22:25:45] ... Done.
[22:25:45]
[22:25:45] Folding@home Core Shutdown: EARLY_UNIT_END
[22:25:48] CoreStatus = 72 (114)
[22:25:48] Sending work to server


[22:25:48] + Attempting to send results
[22:25:52] + Results successfully sent
[22:25:52] Thank you for your contribution to Folding@Home.
[22:25:56] - Preparing to get new work unit...
[22:25:56] + Attempting to get work packet
[22:25:56] - Connecting to assignment server
[22:25:57] - Successful: assigned to (171.67.89.154).
[22:25:57] + News From Folding@Home: Welcome to Folding@Home
[22:25:57] Loaded queue successfully.
[22:26:32] + Closed connections
[22:26:37]
[22:26:37] + Processing work unit
[22:26:37] Core required: FahCore_78.exe
[22:26:37] Core found.
[22:26:37] Working on Unit 01 [September 11 22:26:37]
[22:26:37] + Working ...
[22:26:37]
[22:26:37] *------------------------------*
[22:26:37] Folding@Home Gromacs Core
[22:26:37] Version 1.68 (August 18, 2004)
[22:26:37]
[22:26:37] Preparing to commence simulation
[22:26:37] - Looking at optimizations...
[22:26:37] - Created dyn
[22:26:37] - Files status OK
[22:26:51] - Expanded 3877865 -> 22162121 (decompressed 571.5 percent)
[22:26:51] - Starting from initial work packet
[22:26:51]
[22:26:51] Project: 1301 (Run 552, Clone 0, Gen 7)
[22:26:51]
[22:26:52] Assembly optimizations on if available.
[22:26:52] Entering M.D.
[22:27:04] Protein: p1301_1RYP_AAAA_UM
[22:27:04]
[22:27:04] Writing local files
[22:27:39] Extra SSE boost OK.
[22:27:40] Writing local files
[22:27:41] Completed 0 out of 100000 steps (0)

[22:54:06] Writing local files
[22:54:07] Completed 1000 out of 100000 steps (1)

BTW both BigPackets I have running are now at frame 43 and so far so good.
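
To keep track of how often and how far into a unit each client dies, a log like the one above can be scanned with a short script. This is only a sketch based on the two line formats visible in the snippet ("Completed N out of M steps" and "Core Shutdown: STATUS"); the function name `core_shutdowns` is made up for illustration, and other core versions may log differently.

```python
import re

SHUTDOWN = re.compile(r"Core Shutdown: (\w+)")
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")


def core_shutdowns(lines):
    """Return a list of (status, steps_done, steps_total) for each core shutdown."""
    events = []
    done = total = None
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            # Remember the most recent progress report.
            done, total = int(m.group(1)), int(m.group(2))
            continue
        m = SHUTDOWN.search(line)
        if m:
            events.append((m.group(1), done, total))
            done = total = None  # progress starts over with the next work unit
    return events


# Lines copied from the log snippet above.
sample = [
    "[22:14:06] Completed 33000 out of 100000 steps (33)",
    "[22:25:45] Quit 101 - Fatal error:",
    "[22:25:45] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[22:27:41] Completed 0 out of 100000 steps (0)",
]
print(core_shutdowns(sample))  # [('EARLY_UNIT_END', 33000, 100000)]
```

Feeding each client's log file to the function line by line gives a quick side-by-side count of early ends for FAH1 and FAH2.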

stan03
09-12-04, 01:11 PM
I believe Early Unit End has to do with the WU itself and not your computer.

FizzledFiend
09-12-04, 10:50 PM
I had about 4 or 5 early ends. Toward the last ones they would fire up and then shut down; after that it started completing them totally. I think it's like Stanford said: even though they end early, it tells them a lot and they can correct the issues. Just keep going man, you're getting credit nonetheless.

Audioaficionado
09-13-04, 02:56 AM
Well, FAH2 lost a third one at frame 55, so I deleted the core and had a new one redownloaded just in case it might help.

Audioaficionado
09-15-04, 01:26 AM
Update: Since I replaced that core, FAH2 has completed a BP and so far is working well on the next one. FAH1 hasn't lost any yet.