Trouble Sending WUs in


U D13 N0W
02-28-09, 03:07 AM
Hi all
Just got my main rig setup back online after a several-month hiatus. As was the case right before shutting it down, the rigs can pull in WUs and work them through, but for whatever reason I cannot seem to get them to send the WUs back. I'm running the four rigs through a proxy using port 6588. I know a copy of the console text would help, but my flash drive is acting up. I will try to get one posted in the next few days.
My only suspicion is that since my boxes are using port 6588 and the FAH servers respond on 8080, I'm just not getting through on an open port (probably a firewall blocking it or such). The boxes used to get onto the servers just fine. Any thoughts?

science man
02-28-09, 03:33 AM
You should post the fahlog on here but I would say to try not using a proxy.

Adak
02-28-09, 03:47 AM
Let the client know about your proxy, in its configuration.

You can just delete the current client.cfg file to force the client to re-start its configuration function, or you can add the -config or -configonly initialization strings to its start up command line.

First test: try pinging the FAH servers listed in the connectivity sticky by ChasR, and see if you get an OK reply.
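
A minimal Python sketch of that first test, assuming a plain TCP connect is an acceptable stand-in for a ping (many networks drop ICMP anyway). The function names are made up for illustration, and the check goes out directly from the machine it runs on, not through the client's configured proxy, so treat a failure as a hint rather than a diagnosis.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_servers(targets, timeout: float = 5.0) -> dict:
    """Map each (host, port) pair to True/False reachability."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, port in targets
    }
```

Feed it the server addresses from the sticky together with the two ports the client uses, 8080 and 80, for example `check_servers([("171.65.103.100", 8080), ("171.65.103.100", 80)])`.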

PeddlerOfFlesh
02-28-09, 04:41 AM
More info on your setup may help, too. What proxy? What's it running on? In what kind of environment? Why are you running a proxy in the first place? What kind of firewall and NAT? Outgoing really shouldn't be affected. Looking at the logs, it seems to act just like posting data over http.

Without knowing anything about your setup, there's about a million things it could be :(

Jolly-Swagman
02-28-09, 05:28 AM
Could also well be the Stanford servers, as I too have a few WUs queued to send.

See here Stanford Servers Status http://fah-web.stanford.edu/serverstat.html

Some of the GPU servers are full.

U D13 N0W
02-28-09, 12:17 PM
OK, the long version.
First, why use a proxy: I'm on a college network, and for whatever reason they won't allow more than one computer to be seen coming from a single room. As a workaround, I'm feeding the four folding rigs through a USB NAT router into my laptop, which has the connection that feeds into the network. Thus all traffic being sent looks as though it comes from one computer (ideally).
Two of the rigs are on the Windows 7 Beta (build 7000), and the other two are on Windows XP Pro.
As far as external firewalls in the network go, the school's tech policy is generally don't ask, don't tell, so I have no clue if they are blocking my traffic.
The four rigs connect to my machine through a Netgear switch, and my machine is connected to a similar switch that is then connected to the web.
The proxy will only let me use port 6588.
In the connectivity test, http://171.65.103.100/ took two tries but came up OK,
http://171.64.65.64/ doesn't load,
and http://171.64.122.76 comes up with an error message saying unable to contact site, please refresh browser to try again, proxy w4.14(Release).
Looking at what the different links go to, my rigs should be fine. I'm running the standard CPU folding console, so the fact that I can't connect to the SMP servers shouldn't be a problem. IE connected fine to all the others, and I was able to get WUs from FAH, but I can't send them in. :shrug:

Adak
02-28-09, 01:43 PM
Shut down your folding client (Ctrl + C) and then restart it.

Paste a copy of that restart portion of the fahlog.txt file, right from the very first line of the restart, to where it keeps trying to send, but can't.

PeddlerOfFlesh
02-28-09, 03:35 PM
Did you do those tests from the computer sharing the internet connection? If you did that and it worked, then it may be something you can fix. If not, then it might be out of your control. If you have a firewall on the computer that is sharing the connection, try disabling it. Most firewall software for home computers will not handle connections well. The fact that this has lasted months makes me think it's not Stanford. Do you have any other weird connection problems?

On a side note, I've known people in dorms that just registered the MAC of a router, then set up computers behind that. Now that I think of it, you can most likely clone the MAC of your PC into a router, so you wouldn't even have to update it with IT and could just borrow one to see if it works. That should work for you and would probably be better. At least you will have internet if your computer you're sharing it through goes down. You'd probably have better network handling, too.

U D13 N0W
03-09-09, 09:34 PM
Sorry for the delay, had a few papers and a book to read for classes. To PeddlerOfFlesh: yes, I ran the tests from the internet-sharing rig. Also, I like the idea of telling one of the switches to pick up my IP address. I'm not sure of the process, since I got the two Netgear switches used off eBay and the last time I tried to access their settings I couldn't. Might have been doing it wrong; I had never tried to before.

Here's a copy of the FAH log from one of the rigs



--- Opening Log file [March 10 01:16:42 UTC]


# Windows CPU Console Edition #################################################
###############################################################################

Folding@Home Client Version 6.23

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\admin 2\Desktop\Folding@home-Win32-x86-623
Executable: C:\Documents and Settings\admin 2\Desktop\Folding@home-Win32-x86-623\Folding@home-Win32-x86.exe


[01:16:42] - Ask before connecting: No
[01:16:42] - Proxy: 10.0.0.1:6588
[01:16:42] - User name: U_D13_N0W (Team 32)
[01:16:42] - User ID: 4B4A813225E01DB1
[01:16:42] - Machine ID: 1
[01:16:42]
[01:16:42] Loaded queue successfully.
[01:16:42]
[01:16:42] + Processing work unit
[01:16:42] Core required: FahCore_78.exe
[01:16:42] Core found.
[01:16:42] Project: 4459 (Run 630, Clone 4, Gen 7)
[01:16:42] - Read packet limit of 540015616... Set to 524286976.


[01:16:42] + Attempting to send results [March 10 01:16:42 UTC]
[01:16:42] Working on queue slot 09 [March 10 01:16:42 UTC]
[01:16:42] + Working ...
[01:16:42]
[01:16:42] *------------------------------*
[01:16:42] Folding@Home Gromacs Core
[01:16:42] Version 1.90 (March 8, 2006)
[01:16:42]
[01:16:42] Preparing to commence simulation
[01:16:42] - Looking at optimizations...
[01:16:42] - Files status OK
[01:16:47] - Expanded 2224737 -> 15108313 (decompressed 679.1 percent)
[01:16:47]
[01:16:47] Project: 2496 (Run 3, Clone 19, Gen 0)
[01:16:47]
[01:16:48] Assembly optimizations on if available.
[01:16:48] Entering M.D.
[01:17:10] (Starting from checkpoint)
[01:17:10] Protein: system
[01:17:10]
[01:17:10] Writing local files
[01:17:10] Completed 153167 out of 250000 steps (61%)
[01:17:13] Extra SSE boost OK.
[01:19:52] - Couldn't send HTTP request to server
[01:19:52] + Could not connect to Work Server (results)
[01:19:52] (171.67.108.13:8080)
[01:19:52] + Retrying using alternative port
[01:20:13] - Couldn't send HTTP request to server
[01:20:13] + Could not connect to Work Server (results)
[01:20:13] (171.67.108.13:80)
[01:20:13] - Error: Could not transmit unit 01 (completed February 25) to work server.
[01:20:13] - Read packet limit of 540015616... Set to 524286976.


[01:20:13] + Attempting to send results [March 10 01:20:13 UTC]
[01:20:13] - Couldn't send HTTP request to server
[01:20:13] + Could not connect to Work Server (results)
[01:20:13] (171.67.108.17:8080)
[01:20:13] + Retrying using alternative port
[01:20:14] - Couldn't send HTTP request to server
[01:20:14] + Could not connect to Work Server (results)
[01:20:14] (171.67.108.17:80)
[01:20:14] Could not transmit unit 01 to Collection server; keeping in queue.
[01:20:14] Project: 2606 (Run 19, Clone 10, Gen 153)
[01:20:14] - Read packet limit of 540015616... Set to 524286976.


[01:20:14] + Attempting to send results [March 10 01:20:14 UTC]
[01:20:14] - Couldn't send HTTP request to server
[01:20:14] + Could not connect to Work Server (results)
[01:20:14] (171.64.65.65:8080)
[01:20:14] + Retrying using alternative port
[01:20:15] - Couldn't send HTTP request to server
[01:20:15] + Could not connect to Work Server (results)
[01:20:15] (171.64.65.65:80)
[01:20:15] - Error: Could not transmit unit 02 (completed February 26) to work server.
[01:20:15] - Read packet limit of 540015616... Set to 524286976.


[01:20:15] + Attempting to send results [March 10 01:20:15 UTC]
[01:20:15] - Couldn't send HTTP request to server
[01:20:15] + Could not connect to Work Server (results)
[01:20:15] (171.67.108.25:8080)
[01:20:15] + Retrying using alternative port

Folding@Home Client Shutdown.


--- Opening Log file [March 10 01:20:22 UTC]


# Windows CPU Console Edition #################################################
###############################################################################

Folding@Home Client Version 6.23

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\admin 2\Desktop\Folding@home-Win32-x86-623
Executable: C:\Documents and Settings\admin 2\Desktop\Folding@home-Win32-x86-623\Folding@home-Win32-x86.exe


[01:20:22] - Ask before connecting: No
[01:20:22] - Proxy: 10.0.0.1:6588
[01:20:22] - User name: U_D13_N0W (Team 32)
[01:20:22] - User ID: 4B4A813225E01DB1
[01:20:22] - Machine ID: 1
[01:20:22]
[01:20:22] Loaded queue successfully.
[01:20:22]
[01:20:22] + Processing work unit
[01:20:22] Core required: FahCore_78.exe
[01:20:22] Core found.
[01:20:22] Project: 4459 (Run 630, Clone 4, Gen 7)
[01:20:22] Working on queue slot 09 [March 10 01:20:22 UTC]
[01:20:22] - Read packet limit of 540015616... Set to 524286976.


[01:20:23] + Attempting to send results [March 10 01:20:23 UTC]
[01:20:22] + Working ...
[01:20:23]
[01:20:23] *------------------------------*
[01:20:23] Folding@Home Gromacs Core
[01:20:23] Version 1.90 (March 8, 2006)
[01:20:23]
[01:20:23] Preparing to commence simulation
[01:20:23] - Looking at optimizations...
[01:20:23] - Files status OK
[01:20:27] - Expanded 2224737 -> 15108313 (decompressed 679.1 percent)
[01:20:27]
[01:20:27] Project: 2496 (Run 3, Clone 19, Gen 0)
[01:20:27]
[01:20:29] Assembly optimizations on if available.
[01:20:29] Entering M.D.
[01:20:50] (Starting from checkpoint)
[01:20:50] Protein: system
[01:20:50]
[01:20:50] Writing local files
[01:20:51] Completed 153167 out of 250000 steps (61%)
[01:20:54] Extra SSE boost OK.
[01:23:24] + Could not connect to Work Server (results)
[01:23:24] (171.67.108.13:8080)
[01:23:24] + Retrying using alternative port
[01:28:28] + Could not connect to Work Server (results)
[01:28:28] (171.67.108.13:80)
[01:28:28] - Error: Could not transmit unit 01 (completed February 25) to work server.
[01:28:28] - Read packet limit of 540015616... Set to 524286976.


[01:28:28] + Attempting to send results [March 10 01:28:28 UTC]
[01:28:29] - Couldn't send HTTP request to server
[01:28:29] + Could not connect to Work Server (results)
[01:28:29] (171.67.108.17:8080)
[01:28:29] + Retrying using alternative port
[01:28:30] - Couldn't send HTTP request to server
[01:28:30] + Could not connect to Work Server (results)
[01:28:30] (171.67.108.17:80)
[01:28:30] Could not transmit unit 01 to Collection server; keeping in queue.
[01:28:30] Project: 2606 (Run 19, Clone 10, Gen 153)
[01:28:30] - Read packet limit of 540015616... Set to 524286976.


[01:28:31] + Attempting to send results [March 10 01:28:31 UTC]
[01:28:31] - Couldn't send HTTP request to server
[01:28:31] + Could not connect to Work Server (results)
[01:28:31] (171.64.65.65:8080)
[01:28:31] + Retrying using alternative port
[01:28:33] - Couldn't send HTTP request to server
[01:28:33] + Could not connect to Work Server (results)
[01:28:33] (171.64.65.65:80)
[01:28:33] - Error: Could not transmit unit 02 (completed February 26) to work server.
[01:28:33] - Read packet limit of 540015616... Set to 524286976.


[01:28:33] + Attempting to send results [March 10 01:28:33 UTC]
[01:28:34] - Couldn't send HTTP request to server
[01:28:34] + Could not connect to Work Server (results)
[01:28:34] (171.67.108.25:8080)
[01:28:34] + Retrying using alternative port
[01:28:35] - Couldn't send HTTP request to server
[01:28:35] + Could not connect to Work Server (results)
[01:28:35] (171.67.108.25:80)
[01:28:35] Could not transmit unit 02 to Collection server; keeping in queue.
[01:28:35] Project: 4460 (Run 961, Clone 3, Gen 8)
[01:28:35] - Read packet limit of 540015616... Set to 524286976.


[01:28:35] + Attempting to send results [March 10 01:28:35 UTC]
[01:28:36] - Couldn't send HTTP request to server
[01:28:36] + Could not connect to Work Server (results)
[01:28:36] (171.67.108.13:8080)
[01:28:36] + Retrying using alternative port
[01:28:37] - Couldn't send HTTP request to server
[01:28:37] + Could not connect to Work Server (results)
[01:28:37] (171.67.108.13:80)
[01:28:37] - Error: Could not transmit unit 03 (completed February 27) to work server.
[01:28:37] - Read packet limit of 540015616... Set to 524286976.


[01:28:37] + Attempting to send results [March 10 01:28:37 UTC]
[01:28:38] - Couldn't send HTTP request to server
[01:28:38] + Could not connect to Work Server (results)
[01:28:38] (171.67.108.17:8080)
[01:28:38] + Retrying using alternative port
[01:28:39] - Couldn't send HTTP request to server
[01:28:39] + Could not connect to Work Server (results)
[01:28:39] (171.67.108.17:80)
[01:28:39] Could not transmit unit 03 to Collection server; keeping in queue.
[01:28:39] Project: 2496 (Run 2, Clone 4, Gen 0)
[01:28:39] - Read packet limit of 540015616... Set to 524286976.


[01:28:39] + Attempting to send results [March 10 01:28:39 UTC]
[01:28:40] - Couldn't send HTTP request to server
[01:28:40] + Could not connect to Work Server (results)
[01:28:40] (171.65.103.160:80)
[01:28:40] + Retrying using alternative port
[01:28:42] - Couldn't send HTTP request to server
[01:28:42] + Could not connect to Work Server (results)
[01:28:42] (171.65.103.160:8080)
[01:28:42] - Error: Could not transmit unit 04 (completed March 3) to work server.
[01:28:42] - Read packet limit of 540015616... Set to 524286976.


[01:28:42] + Attempting to send results [March 10 01:28:42 UTC]
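
To make a log like this easier to read, the failed attempts can be tallied per server address and port. The following is a rough Python sketch, not part of the FAH client; the function name and the regular expression are assumptions based only on the line format visible in the log above.

```python
import re
from collections import Counter

# Address line the client prints right after a failed attempt,
# e.g. "[01:19:52] (171.67.108.13:8080)".
ADDR_RE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$"
)
FAIL_MARK = "Could not connect to Work Server"


def failed_endpoints(log_text: str) -> Counter:
    """Count failed connection attempts per (ip, port) in a fahlog.txt excerpt."""
    counts = Counter()
    pending = False  # True only for the line right after a failure message
    for line in log_text.splitlines():
        if FAIL_MARK in line:
            pending = True
            continue
        match = ADDR_RE.match(line.strip())
        if pending and match:
            counts[(match.group(1), int(match.group(2)))] += 1
        pending = False
    return counts
```

Fed the log above, a tally like this would show every attempted server failing on both port 8080 and port 80, which suggests the block is not specific to one port or one server.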

U D13 N0W
03-11-09, 01:40 AM
I have a partial solution, at least for the current batch of completed WUs, but I need to run it by OCF before being sure it will work. Could I connect my HDDs to a computer back home and send the WUs in that way? (That is assuming the machine I hook them to is compatible with the hard drives.)

Jolly-Swagman
03-11-09, 02:35 AM
[01:20:22] - Ask before connecting: No
[01:20:22] - Proxy: 10.0.0.1:6588
[01:20:22] - User name: U_D13_N0W (Team 32)
[01:20:22] - User ID: 4B4A813225E01DB1
[01:20:22] - Machine ID: 1
[01:20:22]
[01:20:22] Loaded queue successfully.

Have your proxy point to ports 80 and 8080, as it doesn't seem like you are connecting through these ports.

Also, have you run qfix to see if that fixes the queue.dat file?

U D13 N0W
03-11-09, 11:36 AM
qfix? What is that? Never heard of it before.
And with this proxy I'm using I can't change the port; it is automatically set to 6588. I tried using the Apache proxy before this one but couldn't make sense of it (two years and a bit of education ago), and I stumbled onto this one, which is really straightforward. If anyone knows of a better proxy...

Adak
03-11-09, 07:51 PM
Qfix is a utility to fix your queue in FAH. :clap:

Yeah! for the fix it crew! Please don't keep starting the FAH client, because it will trash your client's queue, and then the WU will be lost.

I'll edit this with the right info on Qfix in a minute. Since I'm not familiar with how the FAH client works with a proxy, I'm going to ask about this right now, in the FAH forum, and see what the Stanford guys can advise.

Back in a bit.

Here's the way to use Qfix: (quote is from ChasR, last month)

If you stop and restart the client, you'll trash the queue and lose the completed WU. Stop the client. Run qfix in the FAH directory. It should find a full queue slot and say the file is OK but it really isn't. If, perchance, qfix were to report it had fixed the file and requeued the WU for upload, you could go ahead and start FAH. This has never happened for me. Run fah6 with the delete flag and the 2 digit queue # (leading zero): -delete 0X.

The client will report after several minutes that it wasn't able to delete the WU and close. Run qfix again and it will report the file fixed and the WU requeued for upload.

Restart FAH and your WU will be sent.



All this queue work is fine, but the FAH client still needs to get through and connect with the FAH servers.

U D13 N0W
03-11-09, 09:20 PM
Is there a problem with my queue? All I had noticed was that I wasn't getting out to the results server. If there is a queue problem I'll try this out.

Also, would my above-mentioned solution of pulling the hard drives and hooking them up to another rig work? The other computer would be back home, not using a proxy. I know I have been told before that my main machine can't send the WUs in, but if it's the same folding data on the same hard drive, just in a different rig than the one that actually folded it, will that work?

I'm willing to change my proxy if anyone knows of an easy-to-use one that is reliable for sending in WUs. Going to go start a thread on this in the FAH section (already threw one up in the software/apps area).

ChasR
03-11-09, 09:36 PM
much easier to use a flash drive than pull the hard drives.

U D13 N0W
03-12-09, 12:23 AM
How would I go about doing that, though, as far as being able to send in the WUs?

Adak
03-12-09, 03:19 AM
The server you've been trying to reach has had some problems. New server code is scheduled to come out any day now.

When you complete a WU, it tries to return it to the right server for you, according to your WU, etc. If that server is down or too busy, then it will switch to the backup collection server, at some point.

What I'd like to see you do is just restart the client a few times, and see if you can get it through to the secondary (or back to the primary) collection server.

I can't comment on sneakernetting because I haven't done it in years. When I could really have used sneakernetting, I only had one computer, so it didn't do me any good, anyway. :p

ChasR
03-12-09, 08:17 AM
(quote is from U D13 N0W)
How would I go about doing that, though, as far as being able to send in the WUs?

Copy the contents of your folding directory to a flash drive. Move it to another computer and run the FAH executable from the flash drive with the -send all flag. FAH will only send the WUs and close on completion.
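
The copy-and-send steps can be sketched in Python as below. This is only an illustration under assumptions: the helper names are hypothetical, the paths are whatever your folding directory and flash drive happen to be, and the only client option used is the -send all flag mentioned above.

```python
import shutil
from pathlib import Path
from typing import List


def copy_folding_dir(src: Path, dest: Path) -> Path:
    """Copy the whole folding directory (client, cores, queue, work folder)."""
    if dest.exists():
        # Refuse to merge into an existing folder so files never get mixed.
        raise FileExistsError(f"{dest} already exists; choose an empty target")
    shutil.copytree(src, dest)
    return dest


def send_all_command(folding_dir: Path, exe_name: str) -> List[str]:
    """Build the command line that sends all finished WUs and then exits."""
    return [str(folding_dir / exe_name), "-send", "all"]
```

On the second computer you would then run the returned command with the copied folder as the working directory, e.g. `subprocess.run(send_all_command(dest, "Folding@home-Win32-x86.exe"), cwd=dest)`; the executable name here is the one shown in the log earlier in the thread.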

ChasR
03-12-09, 08:24 AM
If your proxy truly only passed port 6588, you wouldn't connect to the internet and you wouldn't have gotten a reply from FAH servers using port 8080 or 80. Post up a copy of your client.cfg.
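
One way to check how the proxy treats traffic to ports 8080 and 80 is to send a plain HTTP request through it, outside of FAH. A hedged Python sketch using only the standard library follows; the helper names are invented for illustration, and the proxy address 10.0.0.1:6588 is the one reported in the fahlog.

```python
import urllib.error
import urllib.request


def build_proxy_opener(proxy_host: str, proxy_port: int) -> urllib.request.OpenerDirector:
    """Return an opener that routes plain-HTTP requests through the given proxy."""
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def http_status_via_proxy(opener, url: str, timeout: float = 10.0):
    """Return the HTTP status code for url, or None if no HTTP answer came back."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Something answered, just with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise never completed.
        return None
```

`http_status_via_proxy(build_proxy_opener("10.0.0.1", 6588), "http://171.67.108.13:8080/")` returning `None` would mean the request never got an HTTP answer. Any status code means either the server or the proxy itself answered, so a 5xx result may still point at the proxy.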

U D13 N0W
03-12-09, 03:45 PM
(quote is from ChasR)
Copy the contents of your folding directory to a flash drive. Move it to another computer and run the FAH executable from the flash drive with the -send all flag. FAH will only send the WUs and close on completion.

So if I were to stop the FAH client on my main rig and then start the one located on the flash drive, that would send the units in properly?
Can someone post a link to the page on adding flags? It's been over a year since I last did that and I don't want to screw it up. Would it work the same if I deleted the old config file and just retyped the info?

Jolly-Swagman
03-12-09, 11:13 PM
(quote is from U D13 N0W)
So if I were to stop the FAH client on my main rig and then start the one located on the flash drive, that would send the units in properly?
Can someone post a link to the page on adding flags? It's been over a year since I last did that and I don't want to screw it up. Would it work the same if I deleted the old config file and just retyped the info?

Here is the link for Flags for clients
http://fahwiki.net/index.php/How_do_I_know_what_the_client_flags_(-switches)_are_and_what_they_do%3F

Here is also a useful link too on Sneakernetting
http://fahwiki.net/index.php/Sneakernetting

When you send the WU, use "-send all" (without quotes).

U D13 N0W
03-13-09, 01:38 AM
(quote is from Jolly-Swagman)
Here is the link for Flags for clients
http://fahwiki.net/index.php/How_do_I_know_what_the_client_flags_(-switches)_are_and_what_they_do%3F

Here is also a useful link too on Sneakernetting
http://fahwiki.net/index.php/Sneakernetting

When you send the WU, use "-send all" (without quotes).

Thanks for the links, Jolly-Swagman. On the topic of sneakernetting, would it be possible for me to do something relatively similar? My situation specifically involves rigs that all have access to the get-work server, but only one has access to the returns server (not sure why really, and still trying to puzzle that one out). If I copy the folding folder from one of the rigs that has a backlog of completed WUs and then open that client on the rig that can access the returns server, would the units send? And would I get credit for them, since technically they were assigned to and done by a different machine? Maybe if I change the machine ID in the config file of the other machines before hooking them up to my main rig?

HayesK
03-13-09, 09:53 AM
You need to use the "-send all" flag when you run the client on the 2nd rig to cause the client to shut down after sending the results, otherwise the client will request another WU from the server.

No reason to delete your "cfg" file, just copy the shortcut that you use to start FAH and add the extra parameter to the copy.

I have several different scripts/shortcuts in my FAH folders with special parameters for performing various maintenance activities like "-send all", "-cfg", "-del 09", "-del 06".

I keep the same FAH client version on all my rigs, so no concern about the client version, but suspect that ChasR suggested you copy the entire FAH folder and run FAH from the jump drive to ensure the same client version is used and that you get all the necessary files.

There are a number of ways to "move" WU between rigs. My method is to setup an extra FAH instance on each rig using a different "machine ID", copy the key files ("work folder", queue, FahCore, unitinfo) from the problem rig into the extra instance, then start FAH with the desired flags. All my rigs are on the same home network, so it is easy for me to move the files around.

If you are going to run FAH off the jump drive using copy of the folder from the problem machine, I would suggest that you stop any clients running on the second rig to avoid possible conflict with "machine ID". You could also change the cfg on all the rigs to use different IDs.

I do not have any windows CPU clients, but below is what the log looks like running "-send all" on Linux SMP. This particular log is from this AM after running qfix to fix a hung WU.

# Linux Console Edition #######################################################
###############################################################################

Folding@Home Client Version 6.24beta

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/homeuser/folding/FAH-SMP1
Executable: ./fah6
Arguments: -local -verbosity 9 -send all

[12:06:00] - Ask before connecting: No
[12:06:00] - User name: HayesK (Team 32)
[12:06:00] - User ID: 4DA244A3339EE19E
[12:06:00] - Machine ID: 1
[12:06:00]
[12:06:00] Loaded queue successfully.
[12:06:00] Attempting to return result(s) to server...
[12:06:00] Trying to send all finished work units
[12:06:00] Project: 2653 (Run 3, Clone 11, Gen 130)
[12:06:00] - Read packet limit of 540015616... Set to 524286976.


[12:06:00] + Attempting to send results [March 13 12:06:00 UTC]
[12:06:00] - Reading file work/wuresults_09.dat from core
[12:06:01] (Read 5517365 bytes from disk)
[12:06:01] Connecting to http://171.64.65.64:8080/
[12:06:17] Posted data.
[12:06:17] Initial: 0000; - Uploaded at ~336 kB/s
[12:06:17] - Averaged speed for that direction ~249 kB/s
[12:06:17] + Results successfully sent
[12:06:17] Thank you for your contribution to Folding@Home.
[12:06:17] + Number of Units Completed: 285

[12:06:18] + Sent 1 of 1 completed units to the server
[12:06:18] ***** Got a SIGTERM signal (15)
[12:06:18] Killing all core threads

Folding@Home Client Shutdown.

U D13 N0W
03-13-09, 02:57 PM
Victory has been achieved at last. I got almost all of the WUs in properly using a variation of the sneakernetting method Jolly-Swagman posted. While it is less than ideal and does cause weird spikes in my point count, it works, and that's much more than what I had going for me yesterday.
Thanks to all who replied.
Still looking for a different proxy if anyone has input to share. That way maybe next time this issue won't arise.

ChasR
03-13-09, 07:28 PM
Post up your client.cfg so we can see how you're addressing the proxy with FAH.

U D13 N0W
03-16-09, 07:23 PM
I'm home right now so I don't have the client.cfg file handy. I just did it through the options: use proxy (yes), bind to (10.0.0.1), port (6588). Will post it once I'm back at school.

Got another question: after successfully sending in the WUs, I still haven't received points for them, and they haven't shown up as completed according to Extreme Overclocking's stats page. I sent in about 16 WUs last Friday; it shows only 2 turned in, though all 16 came back as received in the FAH console. Thoughts?