So what am I doing wrong here? (Stability problems)
Hi all.
I've upgraded my computer a bit :)
I've got myself a Q6600 with 8GB RAM and an ATI 4850 video card, running Vista 64-bit.
And of course I want to fold with this thing. :D I have been running the GPU client very nicely, plus the 6.20 client on two of the CPU cores.
Everything was working fine for a week or so, but now the 6.20 client isn't running stably anymore. Here is what happens:
[08:30:50] - Ask before connecting: No
[08:30:50] - User name: Wega! (Team 32)
[08:30:50] - User ID: 77E0CA5E472721AB
[08:30:50] - Machine ID: 1
[08:30:50]
[08:30:50] Loaded queue successfully.
[08:30:50]
[08:30:50] + Processing work unit
[08:30:50] Core required: FahCore_78.exe
[08:30:50] Core found.
[08:30:50] Working on queue slot 01 [December 2 08:30:50 UTC]
[08:30:50] + Working ...
[08:30:50]
[08:30:50] *------------------------------*
[08:30:50] Folding@Home Gromacs Core
[08:30:50] Version 1.90 (March 8, 2006)
[08:30:50]
[08:30:50] Preparing to commence simulation
[08:30:50] - Ensuring status. Please wait.
[08:31:07] - Assembly optimizations manually forced on.
[08:31:07] - Not checking prior termination.
[08:31:09] - Expanded 414346 -> 10981737 (decompressed 2650.3 percent)
[08:31:09]
[08:31:09] Project: 2620 (Run 3, Clone 12, Gen 22)
[08:31:09]
[08:31:09] Assembly optimizations on if available.
[08:31:09] Entering M.D.
[08:31:16] Protein: p2620_p1475_tet1_03_1 t= 20000.00000
[08:31:16]
[08:31:16] Writing local files
[08:31:16] Gromacs error.
[08:31:16]
[08:31:16] Folding@home Core Shutdown: UNKNOWN_ERROR
[08:31:18] CoreStatus = 79 (121)
[08:31:18] Client-core communications error: ERROR 0x79
[08:31:18] This is a sign of more serious problems, shutting down.
Sometimes it can run a WU, but most of the time I get this error.
So what could be the problem? I haven't overclocked anything in the computer (:screwy:) and I can't remember changing anything important. I'm not using any flags; it's just a standard F@H installation.
I hope you can help me out with this problem :)
$SOLID$Necro
12-02-08, 04:39 AM
It's possible the Folding install is corrupt. Try uninstalling and reinstalling it, and don't forget to manually check for any leftover files (corrupt cores, for example).
Have you bothered to monitor your temps on the CPU and GPU? (Try Real Temp, Core Temp, or SpeedFan; just google them.)
My GTX 260 at stock fan speeds (around 30-40%) will hit 85C! I use the NVIDIA OC tool (you can get it from their site) to bump my fan speed to around 70%, and the temp sits at a comfy 60C.
N-TUNE
http://www.nvidia.com/object/ntune_5.05.54.00.html
You will have to experiment to find what you are comfortable with noise-wise. I have another fan that is even louder than the card's fan at 70%, so it works well for me, but you may find anything over 50% noticeable.
CPU temps should also be kept at around 60C or less, though you should be safe up to 70C.
Electronics do not like heat: the cooler you can keep them running, the longer they will last. Running the CPU and GPU at 80C+ will shorten their lifespan by a fair amount, probably just long enough to run out of warranty, lol.
Do you have a good-brand power supply? Some of the lesser brands do not hold up well under heavy loads like the one you're putting on yours, and will often corrupt software as a result.
I am using a PC Power and Cooling 750 watt with 80 amps on a single rail, and it's solid as a rock. I could easily get by with a 600 watt version of the same brand, but I got a good deal on it.
You may be close to the limits of a poor-quality PSU even if it's rated at 500 watts. Makers tend to exaggerate that number on cheap units; they are often only capable of supplying 400 watts or less at full load and maximum temperature.
A 200 watt video card plus your quad core will be right at the edge of that 400 watts. Think of driving a car with a top speed of 100 MPH at 98 MPH, 24/7. It would do it for a while, but would eventually have problems.
If your car could do 125 MPH, cruising at 100 MPH 24/7 would be a lot easier on it!
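To put numbers on the car analogy, here is a rough Python sketch of the headroom arithmetic. The 200 W card and the roughly 400 W real capacity of a cheap "500 W" unit come from the post above; the 180 W figure for the CPU plus the rest of the system is purely an assumption for illustration, not a measurement of this machine.

```python
def load_fraction(draw_watts: float, capacity_watts: float) -> float:
    """Return sustained draw as a fraction of what the supply can really deliver."""
    if capacity_watts <= 0:
        raise ValueError("capacity must be positive")
    return draw_watts / capacity_watts


# The car analogy from the post: cruising speed vs. top speed.
print(f"98 of 100 MPH:  {load_fraction(98, 100):.0%}")   # 98%
print(f"100 of 125 MPH: {load_fraction(100, 125):.0%}")  # 80%

GPU_W = 200           # "200 watt video card" from the post
CPU_AND_REST_W = 180  # ASSUMED: quad-core CPU plus board, RAM, drives, fans

# Compare the same hypothetical system on supplies of different real capacity.
for real_capacity in (400, 500, 600):
    frac = load_fraction(GPU_W + CPU_AND_REST_W, real_capacity)
    print(f"{real_capacity} W real capacity: {frac:.0%} load")
# 400 W -> 95%, 500 W -> 76%, 600 W -> 63%
```

The point is the same as the car analogy: the closer the sustained load sits to 100% of what the supply can truly deliver, the less margin is left for heat and aging.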
If you still have corruption issues with your setup, try running just the GPU client for a while and see if it's stable, then try just CPU folding.
If you can't run them both stably at the same time, it's either going to be heat related, as I mentioned, or your PSU isn't up to the task.
EDIT: OOPS! I see you are using the ATI 4850. It is known to run extremely hot!
There are "fan fixes" all over the net for ATI cards. Again, keeping the temps as low as possible is what you should aim for. If possible, consider adding an 80-120 mm fan on the side panel of your case, or a fan inside the case blowing directly on the card itself. This may let you keep the ATI fan speed low enough not to be annoying while keeping the card cool.
http://forums.extremeoverclocking.com/t296594.html
This is what the FAH Wiki has to say about this error:
Gromacs error.
Folding@home Core Shutdown: UNKNOWN_ERROR
CoreStatus = 79 (121)
Client-core communications error: ERROR 0x79
Deleting current work unit & continuing...
This error can occur with these lines preceding the error message above:
- Couldn't open work/wudata_xx.chk
- Couldn't open work/wudata_xx.chk
Couldn't open for writing
Writing local files
In this case the error is caused by the core being unable to open, and therefore write to, its checkpoint file. Check that the permissions on the files are correct and that you didn't run out of space on the disk.
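To rule out the permissions and disk-space causes, a quick check of the client's work directory could look like the minimal Python sketch below. It is not an FAH tool; the 500 MB threshold and the example path are assumptions, not values from the FAH documentation.

```python
import os
import shutil
import tempfile


def check_work_dir(work_dir: str, min_free_mb: int = 500) -> list[str]:
    """Return a list of problems found; an empty list means no obvious issue.

    min_free_mb is an ASSUMED threshold, not an official FAH requirement.
    """
    if not os.path.isdir(work_dir):
        return [f"{work_dir} does not exist or is not a directory"]

    problems = []
    if not os.access(work_dir, os.W_OK):
        problems.append(f"no write permission on {work_dir}")

    free_mb = shutil.disk_usage(work_dir).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free on the drive holding {work_dir}")

    # os.access does not fully reflect Windows ACLs, so also attempt a real write.
    try:
        with tempfile.NamedTemporaryFile(dir=work_dir):
            pass
    except OSError as exc:
        problems.append(f"test write failed: {exc}")

    return problems


# Hypothetical usage: point it at the client's "work" folder (the folder named in the log lines above).
print(check_work_dir("work"))
```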
This error can also be caused by memory errors, which may be related to overclocking, wrong voltages, or simply bad RAM. If this error occurs when the core is just starting, there's a reasonable chance that it was an "unable to allocate" issue, such as running out of space in the paging file or a memory fragmentation issue.
This error can also be caused by a WU that was corrupted during downloading.
Folding-community: p1488 r6 c9 g2, dies at 98% - Linux & error 79
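The wiki's distinction (0x79 preceded by "Couldn't open" lines versus 0x79 on its own) can be checked mechanically. Below is a hypothetical Python helper, not an official FAH tool; the 8-line lookback window is an assumption based on the sample logs in this thread. It also confirms that the two numbers in "CoreStatus = 79 (121)" are the same code, since hex 0x79 is decimal 121.

```python
import re

# Matches e.g. "CoreStatus = 79 (121)": hex code, then decimal in parentheses.
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def classify_core_errors(log_text: str, lookback: int = 8) -> list[str]:
    """Heuristically label each CoreStatus line, following the wiki's explanation."""
    lines = log_text.splitlines()
    findings = []
    for i, line in enumerate(lines):
        m = STATUS_RE.search(line)
        if not m:
            continue
        hex_code, dec_code = m.group(1), int(m.group(2))
        if int(hex_code, 16) != dec_code:
            findings.append(f"line {i + 1}: hex and decimal codes disagree, skipping")
            continue
        if dec_code != 0x79:
            findings.append(f"line {i + 1}: status 0x{hex_code} is not covered here")
            continue
        # Look at the few lines just before the status line.
        window = lines[max(0, i - lookback):i]
        if any("Couldn't open" in prev for prev in window):
            findings.append(f"line {i + 1}: 0x79 after 'Couldn't open': check permissions and disk space")
        else:
            findings.append(f"line {i + 1}: 0x79 without file errors: suspect RAM, paging file, or a corrupt WU")
    return findings
```

Run against the excerpt from the first post (which has no "Couldn't open" lines), this points toward the memory, paging-file, or corrupt-WU explanations rather than file permissions.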
From the above, it seems the trouble could be anything from dust bunnies making your temps too high, to RAM or HD problems, or even the fault of the WU being mangled in transmission.
Checking your temps, and testing your RAM (this usually needs a long test of a few hours, at least), and your HD, could help. I'd also check the FAH forum for any mention of problems with the specific WU you're folding.
Good luck. Let us know what you find, please.
If I understand your post correctly, you're running 3 clients on your quad: two unicpu and one GPU. The log is from a unicpu WU. I presume the GPU is folding fine? Post the log of each client, particularly the start of each log showing the FAH version and paths. You could have an installation issue (clients in the same directory).
The current version of both the unicpu client and the GPU client is 6.23, though they are different executables. I recommend you upgrade via a complete reinstallation.
$SOLID$Necro: Yes I have bothered to check my temps :)
My video card is running between 70-75C, and the two CPU cores that are running F@H are at about 50-55C, all at full load.
I'm running with an Antec 500W PSU.
Adak:
Other than Folding@home, my computer is running 100% stable. If it is a problem with my RAM, shouldn't I get errors in other programs like Photoshop or in games? Also, I think my temps are fine, so that should not be a problem, or am I wrong about my temps?
I will try to install the client on another HD to see if that helps.
ChasR:
No, I'm only running two clients: 1 GPU and 1 SMP client. So I'm "only" using my video card and 2 of the 4 CPU cores. I will try the 6.23 version.
Thank you all for helping me with this :)
Your temps look OK. :)
When RAM first begins to go bad, one program will nearly always show it first, and that program is FAH. The last guy who had RAM fail ran a memtest program for about an hour, and it showed everything fine.
So he stopped the test. The errors continued with FAH, and only a few weeks later did he notice an error in another program. Then he ran the memtest program again, this time for 3 hours, and sure enough, he had one stick of memory that had to be replaced.
It's a common mistake to "logically" try to figure out what is and what is not wrong when we troubleshoot. That's just human, but usually wrong. :p
Only by testing thoroughly do we really know what's what.
Cluster
12-02-08, 07:12 PM
Thought about running the Linux SMP client? I've had much more luck with it, especially with the beta client that can be found on the foldingforum.org forums. Plus, production is almost triple that of the Windows client. Something to think about.
The WU in the portion of the log you posted is neither an SMP nor a GPU WU, so you do have a config problem.
A little update.
I reinstalled the 6.23 client on another HD and it's been running nicely since then...
Cluster: I've run on a Linux setup before, through a VM. But my computer is not a dedicated folder, so I won't go that way this time.
Thank you all for your help :)