I need smp help
Mustanley
03-01-07, 08:45 AM
Trying to get smp running in native Debian. Had been running the regular client without problems. Finished off the last two WUs with the -oneunit flag and installed the smp client.
The machine is a dual Opteron (2 x single-core Opteron 246s) w/ 2 GB of RAM and the kernel is 2.6. Here's a sample from the log file showing the error I'm getting:
[14:39:10] Trying to send all finished work units
[14:39:10] + No unsent completed units remaining.
[14:39:10] - Preparing to get new work unit...
[14:39:10] + Attempting to get work packet
[14:39:10] - Will indicate memory of 1943 MB
[14:39:10] - Connecting to assignment server
[14:39:10] Connecting to http://assign.stanford.edu:8080/
[14:39:10] Posted data.
[14:39:10] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[14:39:10] + News From Folding@Home: Welcome to Folding@Home
[14:39:10] Loaded queue successfully.
[14:39:10] Connecting to http://171.64.65.56:8080/
[14:39:13] Posted data.
[14:39:13] Initial: 0000; - Receiving payload (expected size: 2955328)
[14:40:04] - Downloaded at ~56 kB/s
[14:40:04] - Averaged speed for that direction ~57 kB/s
[14:40:04] + Received work.
[14:40:04] + Closed connections
[14:40:09]
[14:40:09] + Processing work unit
[14:40:09] Core required: FahCore_a1.exe
[14:40:09] Core found.
[14:40:09] Working on Unit 08 [March 1 14:40:09]
[14:40:09] + Working ...
[14:40:09] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 15 -forceasm -verbose -lifeline 4177 -version 591'
[14:40:09] CoreStatus = 0 (0)
[14:40:09] Client-core communications error: ERROR 0x0
[14:40:09] Deleting current work unit & continuing...
[14:40:09] - Warning: Could not delete all work unit files (8): Core file absent
tia everybody
EDIT, Looks like you did get the core, I have no clue! My personal troubleshooting would include scrapping the whole FAH SMP directory and re-installing......
Mustanley
03-01-07, 10:00 AM
tried reinstalling in a new directory... still same problem.
[15:58:41] Initial: DC33; + 1484800 bytes downloaded
[15:58:41] Initial: 55C2; + 1489733 bytes downloaded
[15:58:41] Verifying core Core_a1.fah...
[15:58:41] Signature is VALID
[15:58:41]
[15:58:41] Trying to unzip core FahCore_a1.exe
[15:58:42] Decompressed FahCore_a1.exe (3624144 bytes) successfully
[15:58:42] + Core successfully engaged
[15:58:47]
[15:58:47] + Processing work unit
[15:58:47] Core required: FahCore_a1.exe
[15:58:47] Core found.
[15:58:47] Working on Unit 01 [March 1 15:58:47]
[15:58:47] + Working ...
[15:58:47] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -forceasm -verbose -lifeline 7494 -version 591'
[15:58:47] CoreStatus = 0 (0)
[15:58:47] Client-core communications error: ERROR 0x0
[15:58:47] Deleting current work unit & continuing...
[15:58:47] - Warning: Could not delete all work unit files (1): Core file absent
[15:58:47] Trying to send all finished work units
[15:58:47] + No unsent completed units remaining.
[15:58:47] - Preparing to get new work unit...
[15:58:47] + Attempting to get work packet
[15:58:47] - Will indicate memory of 1943 MB
[15:58:47] - Connecting to assignment server
[15:58:47] Connecting to http://assign.stanford.edu:8080/
[15:58:47] Posted data.
[15:58:47] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[15:58:47] + News From Folding@Home: Welcome to Folding@Home
[15:58:48] Loaded queue successfully.
[15:58:48] Connecting to http://171.64.65.56:8080/
[15:58:51] Posted data.
[15:58:51] Initial: 0000; - Receiving payload (expected size: 2949369)
fg
./fah5 -forceasm -verbosity 9
[15:58:58] ***** Got an Activate signal (2)
[15:58:58] Killing all core threads
Folding@Home Client Shutdown.
what's with all of the flags you've got going? I don't run any flags when I go SMP.....not sure if any work besides like -sendall and stuff like that
**Since I don't know much Linux, is this a 64 bit distro? SMP Beta currently needs a 64 bit distro
Mustanley
03-01-07, 10:26 AM
those flags are not required, but the -verbosity 9 flag makes the log file contain more information and the -forceasm flag prevents the client from disabling optimization after a crash or EUE.
definitely a 64bit OS.
jstanley@web:~/folding$ uname -a
Linux web 2.6.8-12-amd64-k8-smp #1 SMP Tue Dec 5 23:23:41 UTC 2006 x86_64 GNU/Linux
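The `x86_64` field in the `uname -a` output above is what marks the kernel as 64-bit. A minimal sketch of the same check as a script follows. The function name `is_64bit_machine` is made up for illustration, and the list of machine strings is an assumption covering common 64-bit names, not everything `uname -m` can report.

```shell
# is_64bit_machine MACHINE: succeed if MACHINE (as printed by `uname -m`)
# names a 64-bit architecture. The list below is an illustrative assumption.
is_64bit_machine() {
    case "$1" in
        x86_64|amd64|aarch64|ppc64|ppc64le|s390x) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the machine this script runs on.
if is_64bit_machine "$(uname -m)"; then
    echo "64-bit kernel: $(uname -m)"
else
    echo "not a 64-bit kernel: $(uname -m)"
fi
```

This only inspects the kernel's reported architecture; it says nothing about which libraries are installed.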
mkay....this is the end of my ability to support! And you probably already did all of the stuff I said before I mentioned it :p
Mustanley
03-01-07, 11:04 AM
thanks for trying dz_jad :)
Macaholic
03-01-07, 12:42 PM
Just to verify. You were running the Uniprocessor client under Linux and then installed the SMP client for Linux? Are you sure it is pointed to the right folder? If you have the Uniprocessor client folder and the SMP client folder on the same install, it may be getting confused as to which folder to use. I've had a similar thing happen under OS X. Just be sure you are pointed to the correct SMP client folder by checking ls -la in terminal (you should see mpiexec and FahCore_a1.exe in the listing) and perhaps use the -local flag to make certain it is using the correct folder.
Mustanley
03-01-07, 01:46 PM
Yes, I was running dual uniprocessor clients first. I installed the smp client to a different directory.
it looks to be using the correct path and I even added the -local flag but no luck. same error.
jstanley@web:~/folding$ ls -al
total 4028
drwxr-xr-x 3 jstanley jstanley 360 2007-03-01 10:58 .
drwxr-xr-x 9 jstanley jstanley 560 2007-03-01 10:54 ..
-rwxr-x--- 1 jstanley jstanley 119 2007-03-01 10:56 client.cfg
-rwx--x--x 1 jstanley jstanley 249556 2007-01-30 22:29 fah5
-rwxr-x--- 1 jstanley jstanley 3624144 2007-03-01 10:58 FahCore_a1.exe
-rw-r--r-- 1 jstanley jstanley 14234 2007-03-01 10:59 FAHlog.txt
-rw-r--r-- 1 jstanley jstanley 138220 2007-01-30 22:30 FAH_SMP_Linux.tgz
-rw-r--r-- 1 jstanley jstanley 8 2007-03-01 10:56 machinedependent.dat
-rwx------ 1 jstanley jstanley 68492 2006-11-21 11:12 mpiexec
-rw-r--r-- 1 jstanley jstanley 1490 2007-03-01 10:56 MyFolding.html
-rw-r--r-- 1 jstanley jstanley 7168 2007-03-01 10:58 queue.dat
drwxr-x--- 2 jstanley jstanley 80 2007-03-01 10:58 work
edit:
here's one more error I see each time after it redownloads the new core...
[20:01:10] Initial: DC33; + 1484800 bytes downloaded
[20:01:10] Initial: 55C2; + 1489733 bytes downloaded
[20:01:10] Verifying core Core_a1.fah...
[20:01:10] Signature is VALID
[20:01:10]
[20:01:10] Trying to unzip core FahCore_a1.exe
[20:01:11] Decompressed FahCore_a1.exe (3624144 bytes) successfully
[20:01:11] + Core successfully engaged
[20:01:11] Deleting current work unit & continuing...
[20:01:11] - Warning: Could not delete all work unit files (2): Core file absent
[20:01:11] Trying to send all finished work units
[20:01:11] + No unsent completed units remaining.
[20:01:11] - Preparing to get new work unit...
[20:01:11] + Attempting to get work packet
[20:01:11] - Will indicate memory of 1943 MB
[20:01:11] - Connecting to assignment server
[20:01:11] Connecting to http://assign.stanford.edu:8080/
[20:01:11] Posted data.
[20:01:11] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[20:01:11] + News From Folding@Home: Welcome to Folding@Home
[20:01:12] Loaded queue successfully.
[20:01:12] Connecting to http://171.64.65.56:8080/
[20:01:12] Posted data.
[20:01:12] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:01:12] - Error: Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
leelegend
03-01-07, 01:59 PM
i have had that a few times....let it do it a few times over and over
eventually mine downloaded a new core and all went fine
lee
Mustanley
03-01-07, 02:10 PM
I've let it download a new core probably 10 times by now.
the /work directory has 10 wudata_xx.dat files piled up...:confused: :shrug:
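To put a number on how often the failure has repeated, the log and the work directory can be counted directly. This is a rough sketch: the helper names `count_core_errors` and `count_wudata` are hypothetical, the error string is copied from the log excerpts above, and the `wudata_*.dat` pattern is assumed from the file names mentioned in the post.

```shell
# count_core_errors LOG: number of lines in LOG containing the
# "Client-core communications error" message seen in the excerpts above.
count_core_errors() {
    grep -c 'Client-core communications error' "$1"
}

# count_wudata DIR: number of wudata_*.dat files directly inside DIR.
count_wudata() {
    find "$1" -maxdepth 1 -type f -name 'wudata_*.dat' | wc -l | tr -d ' '
}

# Usage, from the client directory (file names taken from the listing above):
#   count_core_errors FAHlog.txt
#   count_wudata work
```

If the two counts rise together on every retry, each downloaded work unit is being abandoned as soon as the core is launched, which points at the launch step and away from the download.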
This is going to sound stupid but are you running a 64bit version of debian?
and if you are make sure you have the 32bit libs installed
the smp client needs these to run
of course make sure your permissions are set correctly
I am not familiar with anything linux besides ubuntu 6.06 6.10 and only limitedly.
Hope this helps
I seem to recall some problem i had similar to this and i can't remember what it was:bang head
Shelnutt2
03-01-07, 05:18 PM
Debian still includes the 32bit libraries by default. Also if it wasn't a 64bit OS he would not be able to run it. The client checks first to make sure it's a 64bit OS.
Here is what I would try. 1) Delete everything. 2) Redownload. 3) Run with the -configonly flag. Then run with the -verbosity(9?) flag. And/Or just follow the guide on the FAH SMP FAQ.
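The clean-reinstall steps above could be sketched roughly as follows. The helper `fresh_install` and the target directory name are hypothetical, the archive name comes from the directory listing earlier in the thread, and the sketch assumes the archive unpacks its files at the top level.

```shell
# fresh_install ARCHIVE DIR: unpack ARCHIVE into DIR, refusing to reuse an
# existing path so no stale queue or work files are carried over.
fresh_install() {
    archive=$1
    dir=$2
    if [ -e "$dir" ]; then
        echo "refusing to reuse existing path: $dir" >&2
        return 1
    fi
    mkdir -p "$dir" && tar -xzf "$archive" -C "$dir"
}

# Usage (directory name is hypothetical):
#   fresh_install FAH_SMP_Linux.tgz "$HOME/folding-smp"
#   cd "$HOME/folding-smp"
#   ./fah5 -configonly      # step 3: configure the client
#   ./fah5 -verbosity 9     # then run with detailed logging
```

Unpacking into a path that must not exist yet is a deliberate choice: it rules out leftovers from an earlier attempt as a cause.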
Also have you messed with your hosts file at all?
Macaholic
03-01-07, 05:44 PM
of course make sure your permissions are set correctly
Yeah. That is an easy one to forget sometimes, be sure to chmod +x fah5 and mpiexec.
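A quick way to check both things at once (the files are present and carry the executable bit) is sketched below. The helper name `report_missing_exec` is made up for illustration; the file names are the ones mentioned in this thread.

```shell
# report_missing_exec DIR FILE...: print each FILE that is absent from DIR
# or is not executable by the current user; print nothing if all pass.
report_missing_exec() {
    dir=$1
    shift
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
        elif [ ! -x "$dir/$f" ]; then
            echo "not executable: $f"
        fi
    done
}

# Usage, from the client directory:
#   report_missing_exec . fah5 mpiexec FahCore_a1.exe
#   chmod +x fah5 mpiexec      # the fix suggested above
```

Empty output means every listed file exists and can be executed by whoever ran the check, so it should be run as the same user that starts the client.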
Mustanley
03-02-07, 07:39 AM
permissions are set correctly...
hosts file is pretty much stock aside from one entry for the LAN
the only other entry is for the loopback interface
127.0.0.1 localhost.localdomain localhost
:shrug: I hope you figure it out!
Macaholic
03-02-07, 08:50 AM
the only other entry is for the loopback interface
Hmmm. Have a look at the known bug list (http://forum.folding-community.org/ftopic16928.html) for the SMP client. Have you tried pinging yourself? Look at the last bug on the list;
22) There are potential errors if you've chosen a hostname for your computer which also exists on the internet. If your machine is called "MyComputer" and a ping MyComputer gives you any appreciable time delay, change your computer's hostname.
Any other unusual network configurations or considerations?
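For the hostname check described in bug 22, it can help to see what the hosts file maps the machine's name to before pinging it. The sketch below is one way to do that. `hosts_lookup` is a hypothetical helper, and it only reads a file in `/etc/hosts` format; it does not consult DNS, which on a machine that is also a DNS server may give a different answer.

```shell
# hosts_lookup FILE NAME: print every address that FILE (in /etc/hosts
# format) maps NAME to, one per line. Text after '#' is ignored.
hosts_lookup() {
    awk -v name="$2" '
        {
            sub(/#.*/, "")                 # strip trailing comments
            for (i = 2; i <= NF; i++)      # field 1 is the address
                if ($i == name) print $1
        }
    ' "$1"
}

# Usage:
#   hosts_lookup /etc/hosts "$(hostname)"
#   ping -c 3 "$(hostname)"     # look for a delay or an unexpected address
```

If the lookup prints nothing, the name is being resolved somewhere other than the hosts file, and the address `ping` reports is the one worth checking.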
Mustanley
03-02-07, 09:40 AM
I tried changing the hostname to something that was ping-able but it didn't make a difference.
Not sure what network settings might be breaking the smp client. This machine serves two web site domains, is a mail server and dns server.
nothing else on the known bug list seems applicable.