Stability and folding, a frank discussion

LuvToGame

Member
Hi fellow folders,

I searched the forum and other sources without finding a satisfactory answer on this delicate subject: how stable should a machine be to fold without introducing errors into the project? Many seem to hold the opinion that if it completes a run, then all is good, no harm, no foul. However, the integrity of the science comes before the PPD, or at least it does for me. I personally fold on a machine that absolutely will not pass a Prime95 torture test past 8 hours, but it has never, ever failed to complete a WU due to machine error.

So how stable must "stable" be to fold without risk? What is your opinion, and why?
 
Generally it is the other way around: machines that are 24-hour Prime stable aren't folding stable. Of late, more robust checkpointing built into the Gromacs core has resulted in far fewer lost WUs due to instability.
 

So if I understand you correctly, Gromacs is robust enough to detect any and all errors that might occur on these consumer processors? Furthermore, are you suggesting that the software will recover and retry the calculations until it passes check?
 
Without risk? There's always a risk... even with off the shelf hardware.

Personally, I use tools like P95, LinX, IBT, OCCT, etc. to test the stability of my CPU(s). Once those have been run long enough, which is relative to one's personal standard, it's just a matter of folding and seeing if problems arise.

If one can run a machine 24x7 for several weeks (a month perhaps) straight under FAH without issue, I call that really stable. :)
 
L2G,
A lot depends on why the machine is unstable. Gromacs tends to recover pretty well from random reboots (due to undervolting for the speed of the CPU). If the RAM is whacked or the CPU isn't the least bit stable, you will see a bunch of EUEs and Unstable machines. Most of my machines will run over 100 SMP WUs per failure; that's pretty stable. Thanks to Harlam and HFM, it's really easy to track completions and failures, and that is one of the main reasons I have not adopted v7, even though I am a tester.
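
For anyone who wants to put a number on "WUs per failure", here is a rough sketch of the arithmetic. The counts and function names are made up for illustration; they are not real log data and not output from HFM or any other tool.

```python
def wus_per_failure(completed: int, failed: int) -> float:
    """Average number of completed WUs for each failed WU."""
    if failed == 0:
        return float("inf")  # no failures recorded yet
    return completed / failed


def failure_rate(completed: int, failed: int) -> float:
    """Fraction of all attempted WUs that failed."""
    total = completed + failed
    return failed / total if total else 0.0


# Hypothetical counts read off a monitoring tool by hand.
completed, failed = 412, 4
print(f"{wus_per_failure(completed, failed):.0f} WUs per failure")
print(f"{failure_rate(completed, failed):.2%} of WUs failed")
```

By the yardstick above, a machine would count as "pretty stable" once the first figure is over 100.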

Gromacs has very good internal testing and will fail the WU if the results/trajectory go out of bounds. If it completes I'd assume it's good.
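
To give a feel for what an "out of bounds" check means in general, here is a toy sketch. It is not Gromacs code and does not claim to match how the core actually validates a trajectory; the function name and the limit value are assumptions for illustration only.

```python
import math


def trajectory_in_bounds(positions, limit):
    """Return True if every coordinate is finite and within +/- limit.

    Toy check only: a flipped bit or bad math that produces NaN, infinity,
    or an absurdly large value would be caught here.
    """
    return all(math.isfinite(x) and abs(x) <= limit for x in positions)


# Hypothetical coordinate values and limit.
print(trajectory_in_bounds([0.1, -2.0, 3.5], limit=10.0))
print(trajectory_in_bounds([0.1, float("nan"), 3.5], limit=10.0))
```

Note that a check like this only catches errors big enough to push a value past the limit; a small error that stays in range would slip through a toy check of this kind.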
 