Thread: New Diskless Cruncher difficulty
09-20-02, 10:01 PM #1
- Join Date
- Dec 2001
- Corner of No and Where
I haven't added, changed, or even tweaked anything on my 3-PC cluster for a while because I haven't had time, BUT a new problem has been occurring:
Every few days one of my clients gets stuck with a completed WU and can't upload it.
Rebooting the client does not fix the problem; I have to restart the server and then the clients to get the WU to upload each time.
screenlog.txt just shows that SETI couldn't connect and will retry in an hour; /var/log/messages shows this:
Sep 20 19:37:29 ltspserver dhcpd: DHCPREQUEST for 192.168.0.3 from 00:50:bf:40:0c:10 via eth0
Sep 20 19:37:29 ws003.ltsp dhclient: DHCPREQUEST on eth0 to 192.168.0.254 port 67
Sep 20 19:37:29 ltspserver dhcpd: DHCPACK on 192.168.0.3 to 00:50:bf:40:0c:10 via eth0
Sep 20 19:37:29 ws003.ltsp dhclient: DHCPACK from 192.168.0.254
Sep 20 19:37:29 ws003.ltsp dhclient: bound to 192.168.0.3 -- renewal in 10800 seconds.
Sep 20 19:37:31 ws003.ltsp -- MARK --
Sep 20 20:37:17 ws001.ltsp -- MARK --
Sep 20 20:37:32 ws003.ltsp -- MARK --
Sep 20 21:37:17 ws001.ltsp -- MARK --
Sep 20 21:37:32 ws003.ltsp -- MARK --
as the most recent entry.
I am thinking it is a DHCP timeout problem, but...
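One rough way to test the DHCP theory is to work out when the next renewal should appear in the log and then look for a DHCPREQUEST around that time. The sketch below only parses the "bound to ... renewal in N seconds" line format shown above. The function name `next_renewals` is made up for illustration, and the year is an assumption because syslog timestamps don't carry one.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the log excerpt above.
LOG = """\
Sep 20 19:37:29 ws003.ltsp dhclient: bound to 192.168.0.3 -- renewal in 10800 seconds.
Sep 20 21:37:32 ws003.ltsp -- MARK --
"""

# Matches dhclient's "bound to <ip> -- renewal in <n> seconds" syslog line.
BOUND_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) dhclient: "
    r"bound to (\S+) -- renewal in (\d+) seconds"
)

def next_renewals(log_text, year=2002):
    """Map each host to (ip, expected renewal time) from its last 'bound' line.

    The year is an assumption: syslog timestamps do not include it.
    """
    result = {}
    for line in log_text.splitlines():
        m = BOUND_RE.match(line)
        if not m:
            continue  # skip MARK lines, dhcpd lines, and anything else
        stamp, host, ip, seconds = m.groups()
        bound_at = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        result[host] = (ip, bound_at + timedelta(seconds=int(seconds)))
    return result

for host, (ip, due) in next_renewals(LOG).items():
    print(f"{host} ({ip}) should renew around {due:%H:%M:%S}")
```

For the excerpt above, the lease was bound at 19:37:29 with a 10800-second renewal, so the next DHCPREQUEST from ws003 would be expected around 22:37:29. If nothing shows up near that time, that would point toward a renewal problem; if it does show up, the cause is probably elsewhere.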
Anyone have ideas?
Still overclocked and running linux on watercooled computers after all these years.
09-20-02, 11:43 PM #2
Looks like the client is not getting its IP renewed and consequently can't get out on the LAN to fetch a unit. As for why this is happening, I'm gonna hang my head down low and kick the bucket a few times.
09-21-02, 04:22 AM #3
- Join Date
- Aug 2002
- Chicago, IL
1) Ya, I'm getting that too: completed WU, can't send.
2) Sometimes, after I've rebooted the machines, one machine gets stuck in the middle of a WU and can't continue. Turning the machine off and on doesn't get it going again; I have to delete all the *.sah files, and then it starts up fresh.
3) Both problems occur only occasionally, AND they have happened to some of my machines booting their own OS, NOT just to LTSP nodes.
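The cleanup from item 2 could be scripted roughly like this. It is only a sketch: `clear_sah_files` is a made-up helper name, the directory you pass in is whatever folder your client runs from, and it defaults to a dry run that only lists the files, since deleting them throws away the unit in progress.

```python
from pathlib import Path

def clear_sah_files(client_dir, dry_run=True):
    """Return the names of *.sah files directly inside client_dir.

    With dry_run=False the files are also deleted, which discards
    the work unit in progress so the client starts fresh.
    """
    targets = sorted(Path(client_dir).glob("*.sah"))
    for path in targets:
        if not dry_run:
            path.unlink()  # remove the file from disk
    return [p.name for p in targets]
```

Call it once with the default `dry_run=True` to see what would go, then again with `dry_run=False` while the client is stopped.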
No solutions from me, but maybe descriptions of these other issues can jog someone's thoughts...
IT HAS BECOME TIME TO CRUNCH MORE
Dell XPS M1710, core 2 CPU
Alienware quad core extreme dual gforce 280gtx.
Quad core Q6600 - 1, 2 X 260GTX, 216 core
Quad core Q6600 - 2, 2 X 260GTX, 216 core
OLD AMD socket 939, 260GTX 196 core + 8800GTX
HP proliant ML350G4 dual xeon server
2 pcie slots left for more CUDA to be added later ....