My Exchange server 2003 crashed

Joeteck

Retired
Joined
Oct 5, 2001
Location
Long Island
Came in this morning (7:30 AM) to see that my Outlook had frozen.

Go to the server, and sure enough the system is LOCKED.

Shutdown, RAID arrays come up fine, showing "optimal."

Rebooted and system came up without a problem.

About 10 minutes into system idle, the system freezes.

Reboot again..

I shut down services for Symantec for exchange, just in case that's causing it.

5 minutes, freeze. I can't even get into event viewer.

Open up MMC on a different server and add a snap-in for my Exchange server, and that server freezes!

Reboot exchange again.

Finally I beeline to Event Viewer after Windows comes up.

Event viewer > system. Shows an error on one of my controllers. Boot RAID 1 array, having problems. FROZE!

Shut down yet again.

After restart I run Diskeeper 2008. During the attempt to defrag, I hear the drives grinding, then a pause, then more grinding, as if they are attempting to relocate the data. Does not sound right. So I stop defragging. Reboot once again.

This time I remove one of the SCSI drives. So now the array is degraded.

It starts to boot, then restarts... Oh crap! That looks like the bad drive. So I shut down again and reconnect the other drive and disconnect the failed drive.

System comes up, however still making that strange noise when I defrag, and guess what it freezes!

I shut down, find my Acronis disk, and back this thing up before I lose the entire system!

8 minutes later it backs up to my external 500gig 2.5" USB drive.

I grab a backup drive I bought when I built this server. Run Acronis again, and restore to a single 74gig 15K drive.

5 minutes later, reboot. System does the standard "check disk" for inconsistencies.

Reboot again, and we are back online! Whew! 4 hours down, and about 600 emails need to get delivered.

What makes this so very strange is that my RAID 1 array FAILED. Both drives, "POOF"! They started to show some major signs of problems. In all my years of network administration this is the first time I've seen this..

I wanted to share my morning with you!

Joe
 

ThePerfectCore

Red Raccoon Dojo
Joined
Mar 1, 2002
Location
Texas
Well that was a pleasant ending. I was half-expecting you to finish up with "now how do I fix this PLEASE HELP INTERNETS"

Were the disks from the same batch/manufacturer/model etc?
 

ThePerfectCore

Red Raccoon Dojo
Joined
Mar 1, 2002
Location
Texas
What was the manufacturer?

I guess I should add some content to this thread.

On Monday afternoon I was speed-walking out of the office. I'd had a new build delivered to my apartment and needed to get home before it became someone else's. While walking past our domain controller I heard the tell-tale "click whirrr.... click whirrr..." of a dying hard disk.

My company has two DCs, one in the main office and one in the satellite office. We use off-the-shelf external drives for nightly backups in the satellite office. Fortunately only the external drive had died.

I plugged the other external in and left.

On Tuesday I walked in and guess what! The second external had died too. Clickity-clickity-click-buzz! Both drives died within 12 hours of each other.

Of course, they were both Seagates.
 
Joeteck

Retired
Joined
Oct 5, 2001
Location
Long Island
That's a funny story....

The drives are Fujitsu... their hard drive business is now owned by Toshiba.

Model MAX3073NP x 2 and MAX3147NP x 2 for my information store.
 

simcom

Member
Joined
Sep 4, 2006
I have a thought:
Since you bought the drives at the same time, does that mean both drives went through the same shipping, handling and environmental stress? Just wondering if this could be a factor.

Also, if some hardware (usually the RAID card or the card's RAM) is failing, it can corrupt both drives in a RAID 1 together.

And like fellow board members all say, RAID IS NOT A BACKUP!
 
Joeteck

Retired
Joined
Oct 5, 2001
Location
Long Island
hahaha. Funny. These drives failed 3 years later. The server is up and running with another drive on the same controller. Correct, RAID 1 is not a backup; however, RAID 1 is a low-cost solution for redundancy. Both drives in a RAID 1 array never usually fail all at once.. This is extremely rare...
 

ThePerfectCore

Red Raccoon Dojo
Joined
Mar 1, 2002
Location
Texas
Both drives in a RAID 1 array never usually fail all at once.. This is extremely rare...

But it almost got you. :p

This is why when I build RAID arrays I buy two drives from different manufacturers and build the array according to whichever drive ends up being the smallest (usually by a few megs at most). I believe this lessens the chance of dual hard disk failures... does anyone else do this?
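
For anyone sizing a mixed-vendor mirror, here is a minimal Python sketch of the "build to the smallest drive" idea. The drive sizes are made-up illustrative numbers, not specs for any real model, and the function names are hypothetical.

```python
def mirror_capacity(drive_sizes_bytes):
    """Usable size of a RAID 1 mirror: limited by its smallest member."""
    if len(drive_sizes_bytes) < 2:
        raise ValueError("a mirror needs at least two drives")
    return min(drive_sizes_bytes)


def unused_space(drive_sizes_bytes):
    """Bytes left unused on each member once the mirror is sized."""
    usable = mirror_capacity(drive_sizes_bytes)
    return {index: size - usable for index, size in enumerate(drive_sizes_bytes)}


# Hypothetical sizes for two "same capacity" drives from different vendors.
drive_a = 73_500_000_000
drive_b = 73_407_000_000

print(mirror_capacity([drive_a, drive_b]))
print(unused_space([drive_a, drive_b]))
```

The cost of mixing vendors is only the small slice of the larger drive that goes unused.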
 

madhatter256

Special Member
Joined
Jul 5, 2008
Location
CFL
I've seen a RAID 1 system fail completely. Both HDDs refused to spin. Thought the PSU was the issue, but when the system is running (POST) the voltages are fine and the drives DO get power.

They were both from the same manufacturer, but one was 1 year older than the other.
 

[email protected]

Member
Joined
May 29, 2006
I suppose it's only a matter of time before it happens, statistically speaking. A million different servers, running a million different RAID 1s, etc etc. Craziness...
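
A rough back-of-the-envelope version of that, as a Python sketch. Everything numeric here is an assumption for illustration: a 3% annual failure rate per drive, failures that are independent with a constant hazard rate, and "at once" meaning the second drive dies within one day of the first. Drives from the same batch on the same controller are probably not independent, so treat this as a floor rather than a prediction.

```python
import math


def daily_hazard(afr: float) -> float:
    """Constant per-day failure rate implied by an annual failure rate."""
    return -math.log(1.0 - afr) / 365.0


def p_dual_failure_per_year(afr: float, window_days: float) -> float:
    """Chance that one drive of a two-drive mirror fails during the year
    and the surviving drive then fails within `window_days` of it."""
    lam = daily_hazard(afr)
    p_first = 1.0 - (1.0 - afr) ** 2                # either drive fails this year
    p_second = 1.0 - math.exp(-lam * window_days)   # survivor fails in the window
    return p_first * p_second


ASSUMED_AFR = 0.03      # assumption, not a measured figure
WINDOW_DAYS = 1.0       # assumption: "at once" = within a day
ARRAYS = 1_000_000      # the "million different RAID 1s"

p = p_dual_failure_per_year(ASSUMED_AFR, WINDOW_DAYS)
print(f"per-array chance per year: {p:.2e}")
print(f"expected arrays hit per year out of {ARRAYS:,}: {p * ARRAYS:.1f}")
```

Under those assumptions the chance for any single array is tiny, but across a million arrays it works out to a few double failures every year.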

I shut down, find my Acronis disk, and back this thing up before I lose the entire system!

8 minutes later it backs up to my external 500gig 2.5" usb drive.

Bacon==saved. Nice that you got out of that one without issue, you should casually mention that you saved every last email ever to your boss :)
-Drew
 
Joeteck

Retired
Joined
Oct 5, 2001
Location
Long Island
But it almost got you. :p

This is why when I build RAID arrays I buy two drives from different manufacturers and build the array according to whichever drive ends up being the smallest (usually by a few megs at most). I believe this lessens the chance of dual hard disk failures... does anyone else do this?

You can't do this... This is a good way to have your array break often. Specs are different. One drive will most likely be slower and will end up in a failed array... I would not recommend this at all..

We are talking about SCSI 15K drives with TCQ... Not SATA drives with NCQ.. You can't beat SCSI or SAS in a server... they just rock!
 

ThePerfectCore

Red Raccoon Dojo
Joined
Mar 1, 2002
Location
Texas
Enterprise drives are enterprise drives. A good RAID card can handle different disks, and the idea IS if one disk fails the chances of the other failing for being in the same batch / from the same manufacturer / on the same truck is less.
 

Xaotic

Very kind Senior
Joined
Mar 13, 2002
Location
Greensboro NC
Any chance that your controller has available ports and hot spare capability? I have seen more failures during rebuild than at any other time, due to increased disk activity. Not having to source a spare and having the rebuild start immediately can be priceless. Is there a monitoring software included with the card? Many have SNMP and/or email notification.

At least the good news is that you had no data loss, just an outage.
 
Joeteck

Retired
Joined
Oct 5, 2001
Location
Long Island
This SCSI card (29320ALP-R) only has one port, which is all you need, because you can have 15 drives on it. When you're on a budget you have to make sacrifices, and I had very little money to do anything radical. So I picked two of these, and set up two RAID 1 arrays.. Boot and Information store. It's very fast and fits my business model perfectly. I built this 3 years ago... Things have changed!

EDIT: I may set up two RAID 10s in the near future... four drives each.
 

visbits

Member
Joined
Jan 20, 2009
I doubt both drives are going. Did you try a different controller and import the foreign configuration? I've never been happy with Adaptec SCSI controllers. The best I've found so far is an HP641, which has a huge amount of RAM and a very fast processor.
:comp:
 
Joeteck

Retired
Joined
Oct 5, 2001
Location
Long Island
Yes, both drives were on their way out.. I degraded the array on purpose and booted off one of the drives, and it rebooted midway through startup. I then moved to the other drive, and the system froze about 10 minutes in. Did you even read my original post?? I made an image of the so-called working drive, restored it to a new 15K drive, and voilà, all good... Both drives failed at the same time... yes... RARE!
 

visbits

Member
Joined
Jan 20, 2009
I still don't believe that, given that you were able to make a copy of the array with no problem while it kept locking up on boot. I've been working with RAID and controllers for many, many years, and the situation you described is bad controller RAM 99% of the time.


Did you check the SMART status on those drives and run diags on them?? It's rare for enterprise drives to fail that quickly, and all enterprise drives do a KILLER good job of reporting problems. Perhaps the monitoring you had in place was inadequate.
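
As a rough illustration of that kind of check, here is a minimal Python sketch that asks smartmontools for a drive's overall health. It assumes `smartctl` is installed and that the drive is visible to it, which may not be the case behind a RAID controller; the device path is a placeholder and the function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_health(output: str) -> Optional[bool]:
    """Return True/False for a healthy/unhealthy verdict, None if no verdict found."""
    for line in output.splitlines():
        line = line.strip()
        # Wording smartctl uses for ATA/SATA drives.
        if line.startswith("SMART overall-health self-assessment test result:"):
            return line.endswith("PASSED")
        # Wording smartctl uses for SCSI/SAS drives.
        if line.startswith("SMART Health Status:"):
            return line.endswith("OK")
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run `smartctl -H` on a device and parse the verdict.

    smartctl signals problems through non-zero exit codes, so the return
    code is deliberately not treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_health(result.stdout)


# Parsing a sample line of SCSI-style output (no hardware needed):
sample = "SMART Health Status: OK\n"
print(parse_health(sample))
```

On a real box you would call something like `check_drive("/dev/sda")` from a scheduled job and alert when the result is not `True`. A passing verdict does not prove the drive is fine, so vendor diagnostics are still worth running.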

Life as an administrator :thup: