CentOS 5.9: system goes to sleep and locks up?

magellan · Jul 9, 2014

I have a weird problem: if I go away from my CentOS 5.9 box for a while (a couple of hours or more) and come back, it appears to be hung with no screen output. Moving the mouse or hitting keys on the keyboard does nothing. The power light stays a steady green (as it always is when powered up). I have to manually shut the system down by holding down the power button. One time I heard the HDD power down and the heads retract; after that the system was hung. Could this have to do with some sort of sleep/hibernation mode? How can I stop it?

EarthDog · Jul 9, 2014

Not sure this is an OS issue. Those Intels are notorious for having problems coming out of sleep. I have had sleep disabled for years anyway. Having an SSD is the reason why, since I can get to an active desktop in well under 20s.

RJARRRPCGP · Jul 9, 2014

Current: It may be from the Vcore being too high in combination with stock clocks. (Overclocking may stop those freezes.) I would turn off all "green" stuff.

In the past: (informational only)

(2001 and 2002: I've only seen this with Windows 98 on my first Athlon build. There was a bug that apparently was in the driver bundled with my Philips 107 S CRT monitor. When Windows 98 was left idle, I would often come back to a black screen and the system failing to respond!)

magellan · Jul 10, 2014

Oops, I should've explained this is not the system in my sig. This is my dedicated Linux box -- an old HP Compaq dc7700 convertible minitower with a bottom of the line Core 2 Duo.
Could the problem still be the hibernation/sleep mode? Can I turn this off in the BIOS?

EarthDog · Jul 10, 2014

I don't recall the problem with that generation of Intel... that said, yes, sleep modes can be disabled in the BIOS.

magellan · Jul 12, 2014

How hot would a 1st gen Core 2 Duo have to get before it locks up? I remember my old q9550 could get up to the 70's (in degrees celsius) without any issues, but I'm not sure if this applies to a Core 2 Duo. It seems really unlikely the system would lock up (because it always happens when I'm not using it at all) due to overheating. I wonder if it could be the north bridge (MCH?) overheating, because there's no fan whatsoever on it?

EarthDog · Jul 13, 2014

I want to say its TJmax is around 100C...

Unless you are overclocking the CPU, the MCH should be fine. I mean

magellan · Jul 15, 2014

My box just locked up again. I heard the HDD heads retract and that was that, it was hung.
I've checked smartctl and the HDD looks good. It's an old 320 GB EIDE though. I'm wondering if it might be so old it might not support being powered down.
At least it's not overheating (thanks for that info ED).

ihrsetrdr · Jul 16, 2014

magellan said:
I have a weird problem: if I go away from my CentOS 5.9 box for a while (a couple of hours or more) and come back, it appears to be hung with no screen output. Moving the mouse or hitting keys on the keyboard does nothing. The power light stays a steady green (as it always is when powered up). I have to manually shut the system down by holding down the power button. One time I heard the HDD power down and the heads retract; after that the system was hung. Could this have to do with some sort of sleep/hibernation mode? How can I stop it?

I suspect that it is sleep/hibernation related. Could you post the terminal output of

Code:

cat /proc/acpi/wakeup

magellan · Jul 16, 2014

[bobaroni@localhost ~]# cat /proc/acpi/wakeup

Device  Sleep state  Status
PCI0    4            disabled
COM1    4            disabled
PEG1    4            disabled
IGBE    4            disabled
PCX1    4            disabled
PCX2    4            disabled
HUB     4            disabled
USB1    3            disabled
USB2    3            disabled
USB3    3            disabled
USB4    3            disabled
USB5    3            disabled
EUS1    3            disabled
EUS2    3            disabled
PBTN    4            * enabled

ihrsetrdr · Jul 16, 2014

magellan said:
[bobaroni@localhost ~]# cat /proc/acpi/wakeup

Device  Sleep state  Status
PCI0    4            disabled
COM1    4            disabled
PEG1    4            disabled
IGBE    4            disabled
PCX1    4            disabled
PCX2    4            disabled
HUB     4            disabled
USB1    3            disabled
USB2    3            disabled
USB3    3            disabled
USB4    3            disabled
USB5    3            disabled
EUS1    3            disabled
EUS2    3            disabled
PBTN    4            * enabled

OK, looks like all "power-on" devices are disabled, except for PBTN (the power button).
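If you want to check this without reading the table by eye, here is a minimal sketch that prints only the devices marked as enabled. The function name enabled_wake_devices is made up for illustration, and it assumes the three-column layout shown in the output above, with the status as the last field (with or without a leading "*").

```shell
#!/bin/sh
# Print the names of devices whose status is "enabled" in a
# /proc/acpi/wakeup-style table.
# $1 = file to read (hypothetical default: /proc/acpi/wakeup)
enabled_wake_devices() {
    awk 'NR > 1 {
             s = $NF                 # status is the last field on each row
             sub(/^\*/, "", s)       # tolerate "*enabled" as well as "* enabled"
             if (s == "enabled") print $1
         }' "${1:-/proc/acpi/wakeup}"
}
```

Calling enabled_wake_devices with no argument reads the live table; against the output posted above it should list only PBTN.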

magellan said:
Oops, I should've explained this is not the system in my sig. This is my dedicated Linux box -- an old HP Compaq dc7700 convertible minitower with a bottom of the line Core 2 Duo.
Could the problem still be the hibernation/sleep mode? Can I turn this off in the BIOS?

You should be able to; here is the Technical Reference Guide for the Compaq dc7700.

magellan · Jul 17, 2014

When using hdparm to configure HDD settings (for example hdparm -S 0 /dev/hde), are they persistent? Or do I have to manually configure a file somewhere in /etc?
I've disabled APM, but the drive still is going into sleep mode (where you can hear the heads retract).
Here's what hdparm -iI /dev/hde reports:

/dev/hde:

Model=WDC WD3200JB-00KFA0, FwRev=08.05J08, SerialNo=WD-WCAMR2193465
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=65
BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: Unspecified: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6

* signifies the current active mode

ATA device, with non-removable media
Model Number: WDC WD3200JB-00KFA0
Serial Number: WD-WCAMR2193465
Firmware Revision: 08.05J08
Standards:
Supported: 6 5 4
Likely used: 6
Configuration:
Logical max current
cylinders 16383 16383
heads 16 1
sectors/track 63 63
--
CHS current addressable sectors: 1032129
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 625142448
device size with M = 1024*1024: 305245 MBytes
device size with M = 1000*1000: 320072 MBytes (320 GB)
Capabilities:
LBA, IORDY(can be disabled)
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* DOWNLOAD_MICROCODE
SET_MAX security extension
Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
HW reset results:
CBLID- above Vih
Device num = 0 determined by the jumper
Checksum: correct
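
To confirm whether the drive really is dropping into standby before a hang, one option is to poll its power state with hdparm -C and log it. The helper below is only a sketch: drive_state is a made-up name, and it assumes hdparm prints a line of the form "drive state is:  standby" (or "active/idle"), which may differ between hdparm versions.

```shell
#!/bin/sh
# Extract the power state from `hdparm -C` output read on stdin.
# Assumes a line like " drive state is:  standby"; prints only the state.
drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}
```

As root, something like hdparm -C /dev/hde | drive_state run from a loop or cron job, with a timestamp appended to a log file, would show whether the last entry before a lockup was "standby".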

DocClock aka MadClocker · Jul 17, 2014

From your output of cat /proc/acpi/wakeup, it would appear that the only device that will respond when it goes to sleep is the power button, in which case you should be able to just hit the power button and it should wake up... Knowing nothing about Linux, I would still assume that you can edit some file that will allow you to hit the spacebar or another key to wake the system, but like I said, I know nothing about Linux, so all the above is an assumption on my part.

DocClock aka MadClocker · Jul 18, 2014

Something I do know, though, is that every system needs a swap file for hibernation, so you need to make sure you have one and that it is set to at least the same size as your RAM. Double is preferred, e.g. if you have 2 GB of RAM then you would ideally have a swap file of 4 GB.
Hope this helps.
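
A quick way to check that rule of thumb on a Linux box is to compare the MemTotal and SwapTotal lines in /proc/meminfo. This is just a sketch; swap_covers_ram is a made-up helper name, and it only tests the "swap at least equal to RAM" condition described above.

```shell
#!/bin/sh
# Report RAM and swap sizes and succeed (exit status 0) only when
# swap is at least as large as RAM.
# $1 = meminfo-style file to read (default: /proc/meminfo)
swap_covers_ram() {
    awk '/^MemTotal:/  { mem  = $2 }   # value is in kB
         /^SwapTotal:/ { swap = $2 }   # value is in kB
         END {
             printf "RAM %d kB, swap %d kB\n", mem, swap
             exit !(swap >= mem)
         }' "${1:-/proc/meminfo}"
}
```

Running swap_covers_ram && echo "swap is big enough" on the box prints both sizes and the message only if the condition holds.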

ihrsetrdr · Jul 18, 2014

magellan said:
When using hdparm to configure HDD settings (for example hdparm -S 0 /dev/hde ) are they
persistent? Or do I have to manually configure a file somewhere in /etc?
I've disabled APM, but the drive still is going into sleep mode (where you can hear the heads retract).

I'm not sure about CentOS, but in Debian you can edit /etc/hdparm.conf with gedit, vi, nano or another editor to make changes permanent. CentOS may have the hdparm.conf file somewhere else, or handle it in another manner.
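
If there turns out to be no hdparm.conf, one fallback is to re-run the hdparm command at every boot from a local startup script. The sketch below assumes CentOS 5 runs /etc/rc.d/rc.local at the end of boot and that hdparm lives at /sbin/hdparm; both are assumptions to verify on the box, and persist_cmd is a made-up helper name.

```shell
#!/bin/sh
# Append a command line to a startup script only if that exact line
# is not already present, so running this twice adds nothing new.
# $1 = command line to persist
# $2 = target file (assumed default: /etc/rc.d/rc.local)
persist_cmd() {
    cmd=$1
    file=${2:-/etc/rc.d/rc.local}
    grep -qxF -- "$cmd" "$file" 2>/dev/null || printf '%s\n' "$cmd" >> "$file"
}
```

As root, persist_cmd '/sbin/hdparm -S 0 /dev/hde' would add the standby-timer setting from the question to the startup script, which matters if the drive forgets the setting after a power cycle.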

magellan · Jul 18, 2014

ihrsetrdr said:
I'm not sure about CentOS, but in Debian you can edit /etc/hdparm.conf with gedit, vi, nano or another editor to make changes permanent. CentOS may have the hdparm.conf file somewhere else, or handle it in another manner.

find / -name hdparm.conf
didn't turn up any hdparm.conf file.

Using the hdparm utility did stop the HDD from powering down though,
so I didn't have any lockups.

magellan · Jul 18, 2014

DocClock aka MadClocker said:
From your output of cat /proc/acpi/wakeup, it would appear that the only device that will respond when it goes to sleep is the power button, in which case you should be able to just hit the power button and it should wake up... Knowing nothing about Linux, I would still assume that you can edit some file that will allow you to hit the spacebar or another key to wake the system, but like I said, I know nothing about Linux, so all the above is an assumption on my part.

If the system locks up again today, I'll try this. Thanks.

magellan · Jul 23, 2014

OK, I had some weird errors reported by the HDD on startup. When I ran the Western Digital diagnostics long test it reported no errors, but when I look at smartctl -a /dev/hde it reports some recent errors:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 29
3 Spin_Up_Time 0x0003 205 183 021 Pre-fail Always - 4750
4 Start_Stop_Count 0x0032 094 094 000 Old_age Always - 6628
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 200 200 051 Pre-fail Always - 0
9 Power_On_Hours 0x0032 069 069 000 Old_age Always - 23092
10 Spin_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 094 094 000 Old_age Always - 6587
194 Temperature_Celsius 0x0022 116 103 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 2
200 Multi_Zone_Error_Rate 0x0009 200 200 051 Pre-fail Offline - 0

SMART Error Log Version: 1
ATA Error Count: 39 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 39 occurred at disk power-on lifetime: 23090 hours (962 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 42 eb a7 06 e0 Error: UNC 66 sectors at LBA = 0x0006a7eb = 436203

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 40 ed a7 06 00 58 00:01:48.970 READ DMA EXT
25 00 48 e5 a7 06 00 58 00:01:46.815 READ DMA EXT
25 00 50 dd a7 06 00 58 00:01:44.825 READ DMA EXT
25 00 58 d5 a7 06 00 58 00:01:42.835 READ DMA EXT
25 00 60 cd a7 06 00 58 00:01:40.675 READ DMA EXT

Error 38 occurred at disk power-on lifetime: 23090 hours (962 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.

So is this HDD dying?
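
To keep an eye on this over time, here is a rough sketch that pulls a few attributes out of the smartctl table: reallocated sectors (5), pending sectors (197), offline uncorrectable (198) and UDMA CRC errors (199). Picking these four IDs is my own assumption about what is worth tracking, smart_watch is a made-up name, and the parsing assumes the column layout shown above, with the raw value as the last field.

```shell
#!/bin/sh
# Read `smartctl -A` / `smartctl -a` output on stdin and print NAME=RAW_VALUE
# for attribute IDs 5, 197, 198 and 199.
# The "$3 ~ /^0x/" check (the FLAG column) skips error-log lines that
# happen to start with one of those numbers.
smart_watch() {
    awk '$1 ~ /^(5|197|198|199)$/ && $3 ~ /^0x/ { print $2 "=" $NF }'
}
```

Running smartctl -A /dev/hde | smart_watch every so often and comparing the numbers would show whether any of them are climbing.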
