  1. #1
    Linux challenged Senior, not that it stops me... rogerdugans
    Join Date
    Dec 2001
    Location
    Corner of No and Where
    Posts
    5,956

    New Diskless Cruncher difficulty

    I haven't added, changed or even tweaked anything on my 3-PC cluster for a while because I haven't had time, BUT a new problem has been occurring:

    Every few days one of my clients gets stuck with a completed WU and can't upload it.

    Rebooting the client does not fix the problem. Each time, I have to restart the server and then the clients to get the WU to upload.

    screenlog.txt just shows that SETI couldn't connect and will retry in an hour; /var/log/messages shows this:

    Sep 20 19:37:29 ltspserver dhcpd: DHCPREQUEST for 192.168.0.3 from 00:50:bf:40:0c:10 via eth0
    Sep 20 19:37:29 ws003.ltsp dhclient: DHCPREQUEST on eth0 to 192.168.0.254 port 67
    Sep 20 19:37:29 ltspserver dhcpd: DHCPACK on 192.168.0.3 to 00:50:bf:40:0c:10 via eth0
    Sep 20 19:37:29 ws003.ltsp dhclient: DHCPACK from 192.168.0.254
    Sep 20 19:37:29 ws003.ltsp dhclient: bound to 192.168.0.3 -- renewal in 10800 seconds.
    Sep 20 19:37:31 ws003.ltsp -- MARK --
    Sep 20 20:37:17 ws001.ltsp -- MARK --
    Sep 20 20:37:32 ws003.ltsp -- MARK --
    Sep 20 21:37:17 ws001.ltsp -- MARK --
    Sep 20 21:37:32 ws003.ltsp -- MARK --

    as the most recent entries.

    I am thinking it is a DHCP timeout problem, but...
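
    One way to test the DHCP theory against the log above is to work out when ws003's next renewal was due. This is a minimal Python sketch, not part of the LTSP setup: the function name `next_renewals` is made up for illustration, and because syslog timestamps carry no year, the `year` argument is an assumption supplied by the caller.

```python
import re
from datetime import datetime, timedelta

# Two lines copied from the /var/log/messages excerpt above.
LOG = """\
Sep 20 19:37:29 ws003.ltsp dhclient: bound to 192.168.0.3 -- renewal in 10800 seconds.
Sep 20 21:37:32 ws003.ltsp -- MARK --
"""

# Matches dhclient's "bound to <ip> -- renewal in <n> seconds" line.
BOUND = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) dhclient: "
    r"bound to (\S+) -- renewal in (\d+) seconds"
)


def next_renewals(log_text, year):
    """Return {host: (ip, expected_renewal_time)} from 'bound to' lines.

    `year` is an assumption: syslog lines do not record it.
    """
    result = {}
    for line in log_text.splitlines():
        m = BOUND.match(line.strip())
        if not m:
            continue
        stamp, host, ip, secs = m.groups()
        bound_at = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        # Later "bound to" lines for the same host overwrite earlier ones.
        result[host] = (ip, bound_at + timedelta(seconds=int(secs)))
    return result


for host, (ip, due) in next_renewals(LOG, year=2002).items():
    print(f"{host} ({ip}) should renew at {due:%b %d %H:%M:%S}")
```

    Taking the timestamps at face value, 19:37:29 plus 10800 seconds is 22:37:29, about an hour after the last MARK line at 21:37:32. So this excerpt on its own neither confirms nor rules out a missed renewal; the lines logged after 22:37:29 would be the ones to check.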

    Anyone have ideas?
    Still overclocked and running linux on watercooled computers after all these years.

  2. #2
    Senior Seti Addict TC
    Join Date
    Jan 2001
    Location
    Denver, CO
    Posts
    7,504
    Looks like the client is not getting its IP renewed and consequently it can't get out on the LAN to fetch a unit. As for why this is happening, I'm gonna hang my head down low and kick the bucket a few times.

  3. #3
    Member
    Join Date
    Aug 2002
    Location
    Chicago, IL
    Posts
    935
    1) Ya, I'm getting that too: completed WU, can't send.

    2) Sometimes, after I've rebooted the machines, one machine gets stuck in the middle of a WU and can't continue. Turning the machine off and on doesn't get it started; I have to delete all the *.sah files, and then it starts up fresh.

    3) Both problems occur only occasionally, AND, it has happened to some of my machines booting their own OS, NOT just to LTSP nodes.
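
    For the workaround in item 2, a slightly safer variant is to move the *.sah files aside instead of deleting them outright, so they can be inspected or restored later. This is only a sketch: `set_aside_sah_files`, the `sah_backup` folder name, and the file names in the demo are all made up for illustration, and the client directory is whatever path the client runs from on your machine.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_sah_files(client_dir, backup_name="sah_backup"):
    """Move every *.sah file in client_dir into a backup subfolder.

    Returns the names of the files that were moved. The backup folder
    name is a hypothetical choice, not something the client knows about.
    """
    client_dir = Path(client_dir)
    backup = client_dir / backup_name
    backup.mkdir(exist_ok=True)
    moved = []
    for f in sorted(client_dir.glob("*.sah")):
        shutil.move(str(f), str(backup / f.name))
        moved.append(f.name)
    return moved


if __name__ == "__main__":
    # Demo on a throwaway directory with placeholder file names.
    with tempfile.TemporaryDirectory() as d:
        for name in ("example1.sah", "example2.sah", "notes.txt"):
            (Path(d) / name).write_text("placeholder")
        print(set_aside_sah_files(d))
```

    Stop the client before running something like this and restart it afterwards, the same as with the manual delete.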

    No solutions from me, but maybe descriptions of these other issues can jog someone's thoughts...
    IT HAS BECOME TIME TO CRUNCH MORE
    Dell XPS M1710, core 2 CPU
    Alienware quad core extreme dual gforce 280gtx.
    Quad core Q6600 - 1, 2 X 260GTX, 216 core
    Quad core Q6600 - 2, 2 X 260GTX, 216 core
    OLD AMD socket 939, 260GTX 196 core + 8800GTX
    HP proliant ML350G4 dual xeon server
    2 pcie slots left for more CUDA to be added later ....
