7500/7501 UNSTABLE and EARLY

petteyg359

I've just lost 10 units in a row (all 7500 and 7501) from UNSTABLE_MACHINE and EARLY_UNIT_END. All other units have been completing fine. The annoying part is that after giving me 10 of these bad units in a row, the assignment server decided to ignore me for 24 hours.

Code:
[08:46:14] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:46:15] CoreStatus = 7A (122)
[08:46:15] Sending work to server
[08:46:15] Project: 7500 (Run 0, Clone 59, Gen 98)
--
[08:47:20] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:47:20] CoreStatus = 7A (122)
[08:47:20] Sending work to server
[08:47:20] Project: 7500 (Run 0, Clone 135, Gen 71)
--
[08:48:24] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:48:24] CoreStatus = 7A (122)
[08:48:24] Sending work to server
[08:48:24] Project: 7500 (Run 0, Clone 103, Gen 79)
--
[08:49:27] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:49:28] CoreStatus = 7A (122)
[08:49:28] Sending work to server
[08:49:28] Project: 7500 (Run 0, Clone 147, Gen 70)
--
[08:50:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:50:29] CoreStatus = 7A (122)
[08:50:29] Sending work to server
[08:50:29] Project: 7500 (Run 0, Clone 91, Gen 81)
--
[17:21:43] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:21:43] CoreStatus = 7A (122)
[17:21:43] Sending work to server
[17:21:43] Project: 7500 (Run 0, Clone 51, Gen 80)
--
[17:22:46] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:22:46] CoreStatus = 7A (122)
[17:22:46] Sending work to server
[17:22:46] Project: 7500 (Run 0, Clone 33, Gen 69)
--
[17:23:45] Folding@home Core Shutdown: EARLY_UNIT_END
[17:23:45] CoreStatus = 72 (114)
[17:23:45] Sending work to server
[17:23:45] Project: 7500 (Run 0, Clone 256, Gen 0)
--
[17:24:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:24:48] CoreStatus = 7A (122)
[17:24:48] Sending work to server
[17:24:48] Project: 7501 (Run 0, Clone 33, Gen 40)
--
[17:25:50] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:25:51] CoreStatus = 7A (122)
[17:25:51] Sending work to server
[17:25:51] Project: 7501 (Run 0, Clone 24, Gen 37)
--
[17:26:51] Folding@home Core Shutdown: EARLY_UNIT_END
[17:26:51] CoreStatus = 72 (114)
[17:26:51] Sending work to server
[17:26:51] Project: 7500 (Run 0, Clone 257, Gen 0)
--
[17:27:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:27:54] CoreStatus = 7A (122)
[17:27:54] Sending work to server
[17:27:54] Project: 7501 (Run 0, Clone 86, Gen 37)

Anybody know anything about those?
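For anyone who wants to total these up from their own log rather than by eye, here is a rough Python sketch. It assumes the log lines look exactly like the excerpts above (a "Core Shutdown" line followed a few lines later by a "Project" line); the function name `tally_shutdowns` and the regular expressions are made up for illustration and may need adjusting for other client versions.

```python
import re
from collections import Counter

# Patterns copied from the log format shown in the excerpts above.
# Assumption: other client versions may word these lines differently.
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def tally_shutdowns(log_text):
    """Count (project, shutdown reason) pairs in a log excerpt.

    Each shutdown line is paired with the next Project line after it.
    Project lines that are not preceded by a shutdown line are ignored.
    """
    counts = Counter()
    pending_reason = None
    for line in log_text.splitlines():
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown:
            pending_reason = shutdown.group(1)
            continue
        project = PROJECT_RE.search(line)
        if project and pending_reason is not None:
            counts[(int(project.group(1)), pending_reason)] += 1
            pending_reason = None
    return counts


# Two entries taken verbatim from the excerpt above.
sample = """\
[17:23:45] Folding@home Core Shutdown: EARLY_UNIT_END
[17:23:45] CoreStatus = 72 (114)
[17:23:45] Sending work to server
[17:23:45] Project: 7500 (Run 0, Clone 256, Gen 0)
--
[17:24:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:24:48] CoreStatus = 7A (122)
[17:24:48] Sending work to server
[17:24:48] Project: 7501 (Run 0, Clone 33, Gen 40)
"""

for (project, reason), n in sorted(tally_shutdowns(sample).items()):
    print(project, reason, n)
```

Feeding it the whole log instead of `sample` should give a per-project breakdown of UNSTABLE_MACHINE versus EARLY_UNIT_END.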
 
Fantastic producers for regular smp WUs. I haven't seen a bad one yet. I suspect you really do have an unstable machine.
 
It's an i7 930 in a datacenter in Germany (at all stock settings). It's been folding fine (-smp 7) for 7 months. It was just those 10-in-a-row 7500/7501 units a couple of days ago that were not happy.
 
p7500 and p7501 will not run at all if -smp is set to a prime number. That's the problem. Set to -smp or -smp 6 and it'll work fine.
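As a quick illustration of that rule, here is a small Python sketch that steps a requested thread count down to the nearest non-prime value. It takes the "no prime numbers" rule literally as stated in this post; whether small primes such as 2 or 3 are really affected is not established here, so treat that part as an assumption. The helper names `is_prime` and `safe_smp` are hypothetical and not part of any Folding@home tool.

```python
def is_prime(n):
    """Return True if n is a prime number (simple trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def safe_smp(requested):
    """Largest thread count <= requested that is not prime.

    Hypothetical helper: applies the 'avoid prime -smp values' rule
    from this thread literally, which is an assumption for small counts.
    """
    n = requested
    while n > 1 and is_prime(n):
        n -= 1
    return n


for threads in (4, 6, 7, 8):
    print(threads, "->", safe_smp(threads))
```

With this sketch, a request for 7 threads comes back as 6, which matches the suggestion to use -smp 6.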
 
That sounds kind of stupid... but I'll try it.
 
From Pande Group's Peter Kasson:
kasson said:
Re: new project 7500

Post by kasson » 07 May 2011, 19:36
Hmm--the core shouldn't allow SMP 7 in the first place. Please don't run SMP 7--it's a bad idea for WU stability. I'll talk to the core developers about removing it.

No benchmark changes to this project in place or planned.

It should map threads to an even number, but we can't see if it did that from what you posted.
Code:
[02:12:22] Project: 7500 (Run 0, Clone 42, Gen 42)
[02:12:22] 
[02:12:22] Entering M.D.
[02:12:28] Using Gromacs checkpoints
[02:12:28] Mapping NT from 4 to 4
On yours, I believe it should say "Mapping NT from 7 to 6" or it won't work.
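If you'd rather not scroll through the log looking for that line, something like the following Python sketch could pull out every "Mapping NT" entry and flag the ones that end up on an odd thread count. It assumes the line format shown in the excerpt above; `find_thread_mappings` and `odd_mappings` are hypothetical names, and "odd means trouble" is just the rule of thumb from this post, not something confirmed by the core's documentation.

```python
import re

# Matches lines such as "[02:12:28] Mapping NT from 4 to 4".
# Assumption: the core always logs the mapping in this form.
MAPPING_RE = re.compile(r"Mapping NT from (\d+) to (\d+)")


def find_thread_mappings(log_text):
    """Return a list of (requested, mapped) thread counts found in the log."""
    return [(int(a), int(b)) for a, b in MAPPING_RE.findall(log_text)]


def odd_mappings(mappings):
    """Keep only the mappings whose final thread count is odd."""
    return [(req, mapped) for req, mapped in mappings if mapped % 2 != 0]


# Lines taken from the excerpt above.
sample = """\
[02:12:22] Entering M.D.
[02:12:28] Using Gromacs checkpoints
[02:12:28] Mapping NT from 4 to 4
"""

mappings = find_thread_mappings(sample)
print("mappings:", mappings)
print("odd:", odd_mappings(mappings))
```

An empty "odd" list means every work unit in the log was mapped to an even thread count.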
 
Nope. It was staying with 7 (and again, it was doing just fine with 7 on every other project).

Code:
[08:46:31] Connecting to http://128.143.199.97:8080/
[08:46:32] Posted data.
[08:46:32] Initial: 0000; - Receiving payload (expected size: 1247514)
[08:46:38] - Downloaded at ~203 kB/s
[08:46:38] - Averaged speed for that direction ~516 kB/s
[08:46:38] + Received work.
[08:46:38] Trying to send all finished work units
[08:46:38] + No unsent completed units remaining.
[08:46:38] + Closed connections
[08:46:43]
[08:46:43] + Processing work unit
[08:46:43] Core required: FahCore_a3.exe
[08:46:43] Core found.
[08:46:43] Working on queue slot 01 [June 8 08:46:43 UTC]
[08:46:43] + Working ...
[08:46:43] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 7 -priority 96 -checkpoint 15 -verbose -lifeline 11456 -version 634'

[08:46:43]
[08:46:43] *------------------------------*
[08:46:43] Folding@Home Gromacs SMP Core
[08:46:43] Version 2.27 (Dec. 15, 2010)
[08:46:43]
[08:46:43] Preparing to commence simulation
[08:46:43] - Looking at optimizations...
[08:46:43] - Created dyn
[08:46:43] - Files status OK
[08:46:43] - Expanded 1247002 -> 2077012 (decompressed 166.5 percent)
[08:46:43] Called DecompressByteArray: compressed_data_size=1247002 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:46:43] - Digital signature verified
[08:46:43]
[08:46:43] Project: 7500 (Run 0, Clone 135, Gen 71)
[08:46:43]
[08:46:43] Assembly optimizations on if available.
[08:46:43] Entering M.D.
[08:46:49] Mapping NT from 7 to 7
[08:46:49] mdrun returned 255
[08:46:49] Going to send back what have done -- stepsTotalG=500000
[08:46:49] Work fraction=304942678016.0000 steps=500000.
[08:46:53] logfile size=6828 infoLength=6828 edr=25 trr=1
[08:46:53] logfile size: 6828 info=6828 bed=25 hdr=1
[08:46:53] - Writing 7366 bytes of core data to disk...
[08:46:53] Done: 6854 -> 2442 (compressed to 35.6 percent)
[08:46:53]   ... Done.
[08:47:20]
[08:47:20] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:47:20] CoreStatus = 7A (122)
[08:47:20] Sending work to server
[08:47:20] Project: 7500 (Run 0, Clone 135, Gen 71)
 
Oops... and here I have been advocating -smp 7 for i7 w/GPUs. It's always worked much better for me. Guess I'm gonna have to rethink that strategy.

What other tricks are there to get GPU(s) running well with -smp 8 configurations, ChasR, besides the normal priority (low/idle) settings? I'm not even sure whether the GPU cores still lock themselves to a specific CPU core. Does it help to spread the GPU work across all CPUs using the affinity environment variable?
 
I think the changes in the a3 core are the cause of the failures, and we'll be seeing more and more of them with new WUs based on the latest Gromacs cores. The a5 core is based on an older version of Gromacs, and the prime number -smp failures haven't affected its WUs, at least at the lower prime numbers. I wouldn't bother folding GPU on a 2600K at -smp 8 no matter how you set it. On any rig folding -bigadv, you could get a p7500 or p7501 if the servers are out of -bigadv WUs. If you're running -smp 7, you're screwed.
 
Agreed on 2600K... but what about socket 1366 i7 CPUs? That's what I was really asking... how best to mix bigadv w/GPUs while also avoiding -smp 7 on 1366?
 