Crapped GPU/Kick in the nuttz


AmbientFiction
12-08-09, 04:15 PM
Yep so I went out to the store for like 2 hrs today and came back to this crap.

[19:34:47] Resuming from checkpoint
[19:34:47] fcCheckPointResume: retreived and current tpr file hash:
[19:34:47] 0 1051483441 1051483441
[19:34:47] 1 3992233678 3992233678
[19:34:47] 2 1644592073 1644592073
[19:34:47] 3 2701558493 2701558493
[19:34:47] 4 240015355 240015355
[19:34:47] Verified work/wudata_06.log
[19:34:47] Verified work/wudata_06.edr
[19:34:48] Verified work/wudata_06.xtc
[19:34:48] Completed 25%
[19:39:57] Completed 26%
[19:46:32] Completed 27%
[19:53:06] Completed 28%
[19:53:46] mdrun_gpu returned
[19:53:46] NANs detected on GPU
[19:53:46]
[19:53:46] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:53:50] CoreStatus = 7A (122)
[19:53:50] Sending work to server
[19:53:50] Project: 5745 (Run 1, Clone 42, Gen 671)
[19:53:50] - Read packet limit of 540015616... Set to 524286976.
[19:53:50] - Error: Could not get length of results file work/wuresults_06.dat
[19:53:50] - Error: Could not read unit 06 file. Removing from queue.
[19:53:50] Trying to send all finished work units
[19:53:50] + No unsent completed units remaining.
[19:53:50] - Preparing to get new work unit...
[19:53:50] + Attempting to get work packet
[19:53:50] - Will indicate memory of 2047 MB
[19:53:50] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 3, Stepping: 3

[19:53:50] - Connecting to assignment server
[19:53:50] Connecting to http://assign-GPU.stanford.edu:8080/
[19:53:54] Posted data.
[19:53:54] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[19:53:54] + News From Folding@Home: Welcome to Folding@Home
[19:53:54] Loaded queue successfully.
[19:53:54] Connecting to http://171.64.65.102:8080/
[19:53:55] Posted data.
[19:53:55] Initial: 0000; - Receiving payload (expected size: 69067)
[19:53:55] Conversation time very short, giving reduced weight in bandwidth avg
[19:53:55] - Downloaded at ~134 kB/s
[19:53:55] - Averaged speed for that direction ~88 kB/s
[19:53:55] + Received work.
[19:53:55] Trying to send all finished work units
[19:53:55] + No unsent completed units remaining.
[19:53:55] + Closed connections
[19:54:00]
[19:54:00] + Processing work unit
[19:54:00] Core required: FahCore_11.exe
[19:54:00] Core found.
[19:54:00] Working on queue slot 07 [December 8 19:54:00 UTC]
[19:54:00] + Working ...
[19:54:00] - Calling '.\FahCore_11.exe -dir work/ -suffix 07 -checkpoint 15 -verbose -lifeline 744 -version 623'

[19:54:00]
[19:54:00] *------------------------------*
[19:54:00] Folding@Home GPU Core - Beta
[19:54:00] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[19:54:00]
[19:54:00] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:54:00] Build host: amoeba
[19:54:00] Board Type: AMD
[19:54:00] Core :
[19:54:00] Preparing to commence simulation
[19:54:00] - Looking at optimizations...
[19:54:00] - Created dyn
[19:54:00] - Files status OK
[19:54:00] - Expanded 68555 -> 357580 (decompressed 521.5 percent)
[19:54:00] Called DecompressByteArray: compressed_data_size=68555 data_size=357580, decompressed_data_size=357580 diff=0
[19:54:00] - Digital signature verified
[19:54:00]
[19:54:00] Project: 5745 (Run 1, Clone 42, Gen 671)
[19:54:00]
[19:54:01] Assembly optimizations on if available.
[19:54:01] Entering M.D.
[19:54:07] Tpr hash work/wudata_07.tpr: 1051483441 3992233678 1644592073 2701558493 240015355
[19:54:07] Working on Protein
[19:54:09] Client config found, loading data.
[19:54:09] Starting GUI Server
[20:00:11] mdrun_gpu returned
[20:00:11] NANs detected on GPU
[20:00:11]
[20:00:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[20:00:15] CoreStatus = 7A (122)
[20:00:15] Sending work to server
[20:00:15] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:00:15] - Read packet limit of 540015616... Set to 524286976.
[20:00:15] - Error: Could not get length of results file work/wuresults_07.dat
[20:00:15] - Error: Could not read unit 07 file. Removing from queue.
[20:00:15] Trying to send all finished work units
[20:00:15] + No unsent completed units remaining.
[20:00:15] - Preparing to get new work unit...
[20:00:15] + Attempting to get work packet
[20:00:15] - Will indicate memory of 2047 MB
[20:00:15] - Connecting to assignment server
[20:00:15] Connecting to http://assign-GPU.stanford.edu:8080/
[20:00:16] Posted data.
[20:00:16] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[20:00:16] + News From Folding@Home: Welcome to Folding@Home
[20:00:16] Loaded queue successfully.
[20:00:16] Connecting to http://171.64.65.102:8080/
[20:00:17] Posted data.
[20:00:17] Initial: 0000; - Receiving payload (expected size: 69067)
[20:00:17] Conversation time very short, giving reduced weight in bandwidth avg
[20:00:17] - Downloaded at ~134 kB/s
[20:00:17] - Averaged speed for that direction ~93 kB/s
[20:00:17] + Received work.
[20:00:17] Trying to send all finished work units
[20:00:17] + No unsent completed units remaining.
[20:00:17] + Closed connections
[20:00:22]
[20:00:22] + Processing work unit
[20:00:22] Core required: FahCore_11.exe
[20:00:22] Core found.
[20:00:22] Working on queue slot 08 [December 8 20:00:22 UTC]
[20:00:22] + Working ...
[20:00:22] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -checkpoint 15 -verbose -lifeline 744 -version 623'

[20:00:23]
[20:00:23] *------------------------------*
[20:00:23] Folding@Home GPU Core - Beta
[20:00:23] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[20:00:23]
[20:00:23] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[20:00:23] Build host: amoeba
[20:00:23] Board Type: AMD
[20:00:23] Core :
[20:00:23] Preparing to commence simulation
[20:00:23] - Looking at optimizations...
[20:00:23] - Created dyn
[20:00:23] - Files status OK
[20:00:23] - Expanded 68555 -> 357580 (decompressed 521.5 percent)
[20:00:23] Called DecompressByteArray: compressed_data_size=68555 data_size=357580, decompressed_data_size=357580 diff=0
[20:00:23] - Digital signature verified
[20:00:23]
[20:00:23] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:00:23]
[20:00:23] Assembly optimizations on if available.
[20:00:23] Entering M.D.
[20:00:29] Tpr hash work/wudata_08.tpr: 1051483441 3992233678 1644592073 2701558493 240015355
[20:00:29] Working on Protein
[20:00:31] Client config found, loading data.
[20:00:31] Starting GUI Server
[20:03:20] mdrun_gpu returned
[20:03:20] NANs detected on GPU
[20:03:20]
[20:03:20] Folding@home Core Shutdown: UNSTABLE_MACHINE
[20:03:23] CoreStatus = 7A (122)
[20:03:23] Sending work to server
[20:03:23] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:03:23] - Read packet limit of 540015616... Set to 524286976.
[20:03:23] - Error: Could not get length of results file work/wuresults_08.dat
[20:03:23] - Error: Could not read unit 08 file. Removing from queue.
[20:03:23] Trying to send all finished work units
[20:03:23] + No unsent completed units remaining.
[20:03:23] - Preparing to get new work unit...
[20:03:23] + Attempting to get work packet
[20:03:23] - Will indicate memory of 2047 MB
[20:03:23] - Connecting to assignment server
[20:03:23] Connecting to http://assign-GPU.stanford.edu:8080/
[20:03:24] Posted data.
[20:03:24] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[20:03:24] + News From Folding@Home: Welcome to Folding@Home
[20:03:24] Loaded queue successfully.
[20:03:24] Connecting to http://171.64.65.102:8080/
[20:03:25] Posted data.
[20:03:25] Initial: 0000; - Receiving payload (expected size: 69067)
[20:03:25] Conversation time very short, giving reduced weight in bandwidth avg
[20:03:25] - Downloaded at ~134 kB/s
[20:03:25] - Averaged speed for that direction ~98 kB/s
[20:03:25] + Received work.
[20:03:25] Trying to send all finished work units
[20:03:25] + No unsent completed units remaining.
[20:03:25] + Closed connections
[20:03:30]
[20:03:30] + Processing work unit
[20:03:30] Core required: FahCore_11.exe
[20:03:30] Core found.
[20:03:30] Working on queue slot 09 [December 8 20:03:30 UTC]
[20:03:30] + Working ...
[20:03:30] - Calling '.\FahCore_11.exe -dir work/ -suffix 09 -checkpoint 15 -verbose -lifeline 744 -version 623'

[20:03:30]
[20:03:30] *------------------------------*
[20:03:30] Folding@Home GPU Core - Beta
[20:03:30] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[20:03:30]
[20:03:30] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[20:03:30] Build host: amoeba
[20:03:30] Board Type: AMD
[20:03:30] Core :
[20:03:30] Preparing to commence simulation
[20:03:30] - Looking at optimizations...
[20:03:30] - Created dyn
[20:03:30] - Files status OK
[20:03:30] - Expanded 68555 -> 357580 (decompressed 521.5 percent)
[20:03:30] Called DecompressByteArray: compressed_data_size=68555 data_size=357580, decompressed_data_size=357580 diff=0
[20:03:30] - Digital signature verified
[20:03:30]
[20:03:30] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:03:30]
[20:03:31] Assembly optimizations on if available.
[20:03:31] Entering M.D.
[20:03:37] Tpr hash work/wudata_09.tpr: 1051483441 3992233678 1644592073 2701558493 240015355
[20:03:37] Working on Protein
[20:03:39] Client config found, loading data.
[20:03:40] Starting GUI Server
[20:07:05] mdrun_gpu returned
[20:07:05] NANs detected on GPU
[20:07:05]
[20:07:05] Folding@home Core Shutdown: UNSTABLE_MACHINE
[20:07:09] CoreStatus = 7A (122)
[20:07:09] Sending work to server
[20:07:09] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:07:09] - Read packet limit of 540015616... Set to 524286976.
[20:07:09] - Error: Could not get length of results file work/wuresults_09.dat
[20:07:09] - Error: Could not read unit 09 file. Removing from queue.
[20:07:09] Trying to send all finished work units
[20:07:09] + No unsent completed units remaining.
[20:07:09] - Preparing to get new work unit...
[20:07:09] + Attempting to get work packet
[20:07:09] - Will indicate memory of 2047 MB
[20:07:09] - Connecting to assignment server
[20:07:09] Connecting to http://assign-GPU.stanford.edu:8080/
[20:07:10] Posted data.
[20:07:10] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[20:07:10] + News From Folding@Home: Welcome to Folding@Home
[20:07:10] Loaded queue successfully.
[20:07:10] Connecting to http://171.64.65.102:8080/
[20:07:11] Posted data.
[20:07:11] Initial: 0000; - Receiving payload (expected size: 69067)
[20:07:11] Conversation time very short, giving reduced weight in bandwidth avg
[20:07:11] - Downloaded at ~134 kB/s
[20:07:11] - Averaged speed for that direction ~102 kB/s
[20:07:11] + Received work.
[20:07:11] Trying to send all finished work units
[20:07:11] + No unsent completed units remaining.
[20:07:11] + Closed connections
[20:07:16]
[20:07:16] + Processing work unit
[20:07:16] Core required: FahCore_11.exe
[20:07:16] Core found.
[20:07:16] Working on queue slot 00 [December 8 20:07:16 UTC]
[20:07:16] + Working ...
[20:07:16] - Calling '.\FahCore_11.exe -dir work/ -suffix 00 -checkpoint 15 -verbose -lifeline 744 -version 623'

[20:07:16]
[20:07:16] *------------------------------*
[20:07:16] Folding@Home GPU Core - Beta
[20:07:16] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[20:07:16]
[20:07:16] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[20:07:16] Build host: amoeba
[20:07:16] Board Type: AMD
[20:07:16] Core :
[20:07:16] Preparing to commence simulation
[20:07:16] - Looking at optimizations...
[20:07:16] - Created dyn
[20:07:16] - Files status OK
[20:07:16] - Expanded 68555 -> 357580 (decompressed 521.5 percent)
[20:07:16] Called DecompressByteArray: compressed_data_size=68555 data_size=357580, decompressed_data_size=357580 diff=0
[20:07:16] - Digital signature verified
[20:07:16]
[20:07:16] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:07:16]
[20:07:17] Assembly optimizations on if available.
[20:07:17] Entering M.D.
[20:07:23] Tpr hash work/wudata_00.tpr: 1051483441 3992233678 1644592073 2701558493 240015355
[20:07:23] Working on Protein
[20:07:25] Client config found, loading data.
[20:07:26] Starting GUI Server
[20:10:50] mdrun_gpu returned
[20:10:50] NANs detected on GPU
[20:10:50]
[20:10:50] Folding@home Core Shutdown: UNSTABLE_MACHINE
[20:10:54] CoreStatus = 7A (122)
[20:10:54] Sending work to server
[20:10:54] Project: 5745 (Run 1, Clone 42, Gen 671)
[20:10:54] - Read packet limit of 540015616... Set to 524286976.
[20:10:54] - Error: Could not get length of results file work/wuresults_00.dat
[20:10:54] - Error: Could not read unit 00 file. Removing from queue.
[20:10:54] EUE limit exceeded. Pausing 24 hours.
:cry::cry::rain::rain:

What a frigging kick in the nuttz. Anyone have any ideas? Running XP Pro x64. Rig one in sig. ATI HD 4350. Just installed loopback today. It gave me crap so I uninstalled it, but the client was running fine after a restart.

Good thing I've got another rig coming up with 2 nvidia cards coming in.

**Edit** Deleted all files but client.cfg and my log. Grabbed 6.23 off the site and reinstalled. Client seems to be running fine. Only time will tell on this.
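
For anyone who wants to see how fast each unit died without reading the whole dump, here is a rough Python sketch. It assumes the log lines look like the ones pasted above (a `[HH:MM:SS]` stamp, a "Working on queue slot NN" line when a unit starts, and "NANs detected on GPU" when it fails). The function name `summarize_failures` is made up for this example and is not part of the client.

```python
import re
from datetime import datetime

# Matches "[HH:MM:SS] message"; the message may be empty.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")
SLOT_RE = re.compile(r"Working on queue slot (\d+)")


def summarize_failures(log_text):
    """Return (slot, seconds_until_failure) for each unit that ended in NaNs.

    slot and seconds are None when the log does not show where the unit
    started (for example, a unit resumed from a checkpoint).
    """
    failures = []
    slot = None
    started = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # wrapped or untimestamped line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        slot_match = SLOT_RE.search(message)
        if slot_match:
            slot, started = slot_match.group(1), stamp
        elif "NANs detected on GPU" in message:
            elapsed = None
            if started is not None:
                # Modulo keeps the result positive if the run crosses midnight.
                elapsed = (stamp - started).total_seconds() % 86400
            failures.append((slot, elapsed))
            slot, started = None, None
    return failures
```

Feeding it the text of the log and printing the result gives one entry per failed unit, which makes it easier to tell whether the failures are getting quicker.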

ChasR
12-08-09, 04:23 PM
Do the standard stuff.
reboot: fixes the problem a good percentage of the time.
reduce flush interval: the 4350 can't handle large flush intervals.
uninstall and reinstall the driver: usually fixes the problem.

AmbientFiction
12-08-09, 04:26 PM
Is 128 too high of a flush interval?

Running the following vars:
BROOK_YIELD=2
CAL_NO_FLUSH=1
CAL_PRE_FLUSH=1
FLUSH_INTERVAL=128

See anything I need to change?

**Edit** Correction: this is my 4650, my bad.
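
If it helps with testing a lower flush interval without touching the system-wide variables, something like the following Python sketch could launch the client with its own environment. The variable names are the ones listed above; `build_env`, `launch`, and the client path are hypothetical, and the value 64 in the usage note is only a placeholder, not a known-good setting for any card.

```python
import os
import subprocess

# The values posted above. FLUSH_INTERVAL is overridden per launch below.
GPU_ENV = {
    "BROOK_YIELD": "2",
    "CAL_NO_FLUSH": "1",
    "CAL_PRE_FLUSH": "1",
    "FLUSH_INTERVAL": "128",
}


def build_env(flush_interval, base=None):
    """Copy the base environment and apply the GPU variables on top of it."""
    env = dict(os.environ if base is None else base)
    env.update(GPU_ENV)
    env["FLUSH_INTERVAL"] = str(int(flush_interval))
    return env


def launch(client_path, flush_interval):
    """Start the client (path supplied by the caller) with the adjusted variables."""
    return subprocess.Popen([client_path], env=build_env(flush_interval))
```

Calling `launch(r"C:\path\to\client.exe", 64)` (path hypothetical) would start one run at the lower interval and leave the variables set elsewhere alone, so it is easy to go back to 128 if nothing changes.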

ChasR
12-08-09, 06:20 PM
128 might be too high for the 4350. Not sure about the 4650. HD 3850 runs at 128, HD 4870 runs at 192.

Bobnova
12-08-09, 10:32 PM
If your GPU doesn't have heatsinks on the memory, I strongly suggest putting them on. It may or may not have anything to do with what you have going on now (probably not), but I got the same error constantly before sinking my GPU's RAM.

EDIT:
Plus I got an extra 300 MHz RAM speed :beer:

AmbientFiction
12-08-09, 11:49 PM
Yeah, I know, I have so many projects to do. It will all come with time. Also, when I get the 8800 GTX that Jester220 is sending me, I should have some ramsinks. I've also got a lot of old heatsinks. I'm sure a hacksaw would make some quick sinks.

I just modded an ASKA Silver Mtn to fit a socket 939 X2. I'm gonna test it out to see if it is boss enough to cool off that 3800 X2. I feel it should be, and the size is perfect. I'll let you know how it goes. 1.2 lbs of copper coated in silver with a 92mm fan on it.