
Project: Rackmount Overkill

The new server, Lucid, is in the rack and fully functional. I hit an issue that was my mistake: I passed through the H700 RAID controller into the file server virtual machine instead of the M1015, which promptly caused the server to have issues and require a restart. Other than that, it is working really well.
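A quick way to avoid mixing the two cards up is to check which PCI address belongs to which controller before setting up the passthrough. This is only a sketch; both cards are LSI-based, so the grep pattern assumes their usual lspci descriptions:
Code:
#The M1015 shows up as a SAS2008 Fusion-MPT controller, while the H700 is a MegaRAID part.
#Descriptions and addresses vary by system and firmware.
lspci | grep -iE "lsi|megaraid"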

r710_second_racked1.JPG


r710_second_racked2.JPG
 
Well, it was bound to happen at some point. This is why you run RAID, folks. The drive was working perfectly up until a day or two ago. Problem is, I don't have any spares.

Code:
        NAME                                            STATE     READ WRITE CKSUM
        StoragePool                                     DEGRADED     0     0     0
          raidz1-1                                      DEGRADED     0     0     0
            ata-ST31500341AS_9VS18FCY                   ONLINE       0     0     0
            ata-ST31500341AS_9VS3DX5F                   ONLINE       0     0     0
            ata-ST31500341AS_9VS1Y86S                   ONLINE       0     0     0
            [COLOR=Red]ata-ST31500341AS_9VS20RAK                   UNAVAIL      4   114     1  corrupted data[/COLOR]
My options are to get one here, buy one off eBay, or buy new drives for the pool. The chances of someone having one lying around and ready to sell are pretty slim, so I'm not expecting that to work out. Getting a disk off eBay is risky in and of itself, and they are way overpriced (>$90 each).

The last option is the most expensive (4x $150) but the most direct; however, with the newer drives' larger cluster sizes, a four-drive RAIDz1 won't perform well. If I had known that, I'd have built the array differently.
 
Very nice ongoing project... How many drives fail on you monthly, thiddy?
This is the second drive I've had fail, and I've had more than 40. It is a pretty rare occurrence.

And, what are you using these for? :p
The failed drives? They get destroyed. The storage array is all personal usage and will (when I get the time) also be used as a remote storage target for virtual machines.
 
Ah, pretty cool... I don't store that much stuff, but it'd be nice to have something like this.
Today I was looking at some cool HP servers with dual hexa-core CPUs.
I was shocked when I saw they only cost $399 USD (eBay, refurbished).
So tempted!
 
If they are only $400 USD, then they are likely the old AMD six-cores, and I would suggest avoiding them. It is old tech and you'd be much better off getting something newer.
 
Finally took some time to figure out why ZFS was not automounting the array when the virtual machine starts, and found the init script in init.d. It checks whether SELinux is enabled and bails out if it is. I made sure ZFS was on the startup list (chkconfig zfs on) and then set SELinux to permissive so it stays out of the way.
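For reference, that amounted to roughly the following (a sketch; the config file path is the stock RHEL/CentOS location):
Code:
#Make sure the ZFS init script runs at boot
chkconfig zfs on
#Drop SELinux to permissive immediately...
setenforce 0
#...and keep it permissive across reboots
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config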

I can now add CrashPlan to the startup list and I don't have to mount everything manually. However, I will still want to check that the array is mounted before starting CrashPlan; otherwise it will write to the virtual machine drive, which is substantially smaller, fill it up, and prevent ZFS from mounting.
 
Well, that was pretty easy. This script works on one mount point, but could easily be adapted to check multiple points with an array. I don't need it to do that, however. It takes the path set in the LOCATION variable, checks whether it is mounted, then starts the service if everything looks good. It relies on the "mountpoint" binary to report whether a path is mounted.

If you add this to your init.d folder, you can use "chkconfig StartCrashPlan on" (change "StartCrashPlan" to whatever you named the file). It will then run automatically at boot.

Code:
#!/bin/bash
#chkconfig: 345 95 05

#This checks to see if the given mount point exists for ZFS (or anything really).
#If the location is mounted, it is safe to start the CrashPlan service. If it isn't, then make sure the service is stopped.
#We check this because CrashPlan writing files to the virtual machine root disk will quickly fill it and will prevent ZFS (or other mounts) from working.

LOCATION="/mnt/StoragePool"

start() {
        #Check to see if the location is mounted
        if mountpoint -q "$LOCATION"; then
                #The location is mounted, yay. Start the service.
                service crashplan start
                return 0
        else
                #The location is not mounted, boo. Make sure it isn't running.
                service crashplan stop
                return 1
        fi
}

stop() {
        #Simply stop the service from running
        service crashplan stop
}

case "$1" in
        start)
                start
                RETVAL=$?
                ;;
        stop)
                stop
                RETVAL=$?
                ;;
        restart)
                stop
                start
                RETVAL=$?
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart}"
                RETVAL=3
                ;;
esac

exit $RETVAL
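To install it, it's roughly this (the file name is whatever you picked above; paths assume a RHEL/CentOS layout):
Code:
cp StartCrashPlan /etc/init.d/StartCrashPlan
chmod +x /etc/init.d/StartCrashPlan
chkconfig --add StartCrashPlan
chkconfig StartCrashPlan on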

There's a job I could apply to here: Unix event monitoring. Twelve hours staring at a bunch of servers. :p
I wish I knew more; my job is much worse than that.

The servers I found are these.
I'm not seeing a lot of information on those processors, but they were not what I expected them to be. I'd want to find out exactly what it has before buying it, but that doesn't look terrible.
 
Just found a slightly easier method to locate which PCI address a USB device is plugged into. My old method was going through the list of USB ports to pass in, one at a time, and then figuring out which ones mapped to what on the server. That takes forever and is stupid. I stumbled across a better method that doesn't require brute force.

First, locate the USB device and note the ID tag (04f9:0033 in this case):
Code:
[root@lucid ~]# lsusb
Bus 005 Device 005: ID 0624:0248 Avocent Corp. 
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
[COLOR=Red]Bus 003 Device 004: ID 04f9:0033 Brother Industries, Ltd [/COLOR]
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Then tell lsusb to print in tree mode, locate the device in the list, and note which bus (root hub) it sits under; that bus belongs to the USB controller whose PCI address we need:
Code:
[root@lucid ~]# lsusb -t
Bus#  6
`-Dev#   1 Vendor 0x1d6b Product 0x0001
Bus#  5
`-Dev#   1 Vendor 0x1d6b Product 0x0001
  `-Dev#   5 Vendor 0x0624 Product 0x0248
Bus#  4
`-Dev#   1 Vendor 0x1d6b Product 0x0001
[COLOR=Red]Bus#  3
`-Dev#   1 Vendor 0x1d6b Product 0x0001
  `-Dev#   4 Vendor 0x04f9 Product 0x0033[/COLOR]
Bus#  1
`-Dev#   1 Vendor 0x1d6b Product 0x0002
  `-Dev#   2 Vendor 0x0424 Product 0x2514
Now we need the PCI address of that controller. Find the matching Bus/Device numbers in the output of the following command; the root hub's iSerial field gives the PCI address:
Code:
[root@lucid ~]# lsusb -v
<a bunch of output removed>

[COLOR=Red]Bus 003 Device 001: ID 1d6b:0001[/COLOR] Linux Foundation 1.1 root hub
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0001 1.1 root hub
  bcdDevice            2.06
  iManufacturer           3 Linux 2.6.32.43-0.4.1.xs1.6.10.734.170748xen uhci_hcd
  iProduct                2 UHCI Host Controller
  [COLOR=Red]iSerial                 1 0000:00:1a.0[/COLOR]
Bam, now we have the PCI address to pass through in Xen/XCP. The first address is the M1015 RAID card.
Code:
[root@lucid ~]# xe vm-param-set other-config:pci=0/0000:06:00.0,[COLOR=Red]1/0000:00:1a.0[/COLOR] uuid=VIRTUALMACHINEUUIDHERE
Shut down the virtual machine (this is important; a reboot will NOT work) to update the PCI passthrough, then check to see if the device is there:
Code:
[root@vm-fileserver ~]# lsusb
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
[COLOR=Red]Bus 002 Device 002: ID 04f9:0033 Brother Industries, Ltd[/COLOR]
Then do whatever you need to from there.
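As a side note, the same answer can be pulled out of sysfs without paging through lsusb -v. This is just a rough sketch (the ID is the printer from above, and the path layout assumes standard Linux sysfs):
Code:
#!/bin/bash
#Rough sketch: find the PCI address of the USB controller behind a device, given its USB ID
ID="04f9:0033"
#Pull the bus and device numbers from lsusb (e.g. "Bus 003 Device 004: ID 04f9:0033 ...")
read BUS DEV <<< "$(lsusb | awk -v id="$ID" '$0 ~ id {print $2, $4}' | tr -d ':')"
#/sys/bus/usb/devices/usb<bus> is a symlink into the PCI device tree
PCIPATH=$(readlink -f "/sys/bus/usb/devices/usb$((10#$BUS))")
#The controller's PCI address is the parent directory name (e.g. 0000:00:1a.0)
echo "USB bus $BUS, device $DEV -> PCI $(basename "$(dirname "$PCIPATH")")"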
 
I'm not seeing a lot of information on those processors, but they were not what I expected them to be. I'd want to find out exactly what it has before buying it, but that doesn't look terrible.

Processors.

Just a tidbit for you, they're 60W TDP Nehalem chips. 1366 stuff.
Seems like most people want ~$100 for one of those processors by itself.
 
Well, I didn't have much choice on the disks. I couldn't get one here and I didn't want to pay the ridiculous prices on eBay for a used replacement drive that may or may not live for longer than ten seconds and may or may not arrive this year.

2013-08-24 13_20_37-Place Your Order - Amazon.com Checkout.png

This will allow me to rebuild my array properly with additional redundancy. I had been using an even number of disks in RAIDz1, which is not good for performance. I'll be doing 6 disks in RAIDz2 and I'll add more vdevs in the future.
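For reference, the rebuild will look roughly like this (a sketch; the disk names are placeholders rather than the real serials, and ashift=12 forces 4K-aligned allocation for the newer drives):
Code:
zpool create -o ashift=12 -m /mnt/StoragePool StoragePool raidz2 \
        ata-DISK1 ata-DISK2 ata-DISK3 \
        ata-DISK4 ata-DISK5 ata-DISK6
#Later expansion: add another six-disk RAIDz2 vdev to the same pool
#zpool add StoragePool raidz2 <six more disks>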
 
Finally got Icinga up and running. Getting Icinga itself going was easy and didn't take much time at all. Configuring NSClient++ (the Windows Nagios plugin "replacement") was an absolute headache because the documentation is difficult to understand and finding help online is just plain hard. I did finally get it working, however. I never thought the Linux NRPE/Nagios plugins would be easier than the Windows side.

The second part was a lot harder. My new ISP blocks outgoing connections on port 25, so I can't let Icinga send email normally. Instead, I had to configure Postfix to relay outgoing mail through my ISP's mail server, which took a while to figure out. But I finally got that going as well.
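For anyone fighting the same ISP block, the relevant part of /etc/postfix/main.cf ends up looking roughly like this (a sketch; the relay host name and port are placeholders for whatever your ISP uses for authenticated submission):
Code:
relayhost = [smtp.example-isp.net]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_use_tls = yes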

2013-08-24 17_22_27-Icinga - HostStatus.png


I picked up the Seagate NAS 3TBs for a little bit less than that, but the Reds should be a solid choice. Was two-day not fast enough?
If one more drive fails in that array, it is catastrophic data loss. While I have backups, I'll pay the $20 to get it a day early. I still have to migrate the data, which is going to stress the disks. I'm not comfortable with it being down that long.
 
I'm in a similar situation...

I'd had good luck with hard drives until two weeks ago. I'd only had one IBM Deathstar disk fail on me in all my years of computing (and I'm going back to Amiga days!). Now I've had three failures in less than two weeks!

My old PATA Maxtor drive failed in my IPFire router; I replaced it with a 500GB 2.5" WD Blue that was an RMA drive, and it lasted less than a week! :mad: RMA time again...

So that's three so far.

And then one of the four 1TB Hitachi drives failed in my RAID5 data array. The drive was a lot older than the other three; it was identical except for having 16MB of cache rather than 32MB, AFAIK.

Now I either buy a replacement for £55 or start over with three 2TB WD Reds (£270)... If money were no problem, it would be option two.

Looks like my good luck with hard disks came to an end this month :cry:

My data array has been critical for over a week now; let's hope the remaining three Hitachi drives hold up, along with the 3TB WD Green that the array is backed up to. It's going to be over three weeks until I can get a replacement drive. :-/
 
Icinga is completely configured and monitoring all the virtual machines and hypervisors after much desk-face-smashing.

I created a custom plugin to check if a location is mounted.
Code:
#!/bin/bash
#Created by thideras.

#This plugin relies on the binary "mountpoint" to check if a location is mounted and returns the value.

#Check to see if the mountpoint binary exists before attempting to use it
MOUNTPOINTLOC=`which mountpoint 2> /dev/null`
if [ "$MOUNTPOINTLOC" = "" ]; then
        #Nagios/Icinga treats exit 1 as WARNING
        echo MOUNT WARNING - Binary \"mountpoint\" does not exist
        exit 1
fi

LOCATION=$1

if mountpoint -q "$LOCATION"; then
        #Exit 0 is OK
        echo MOUNT OK - $LOCATION
        exit 0
else
        #Exit 2 is CRITICAL
        echo MOUNT CRITICAL - $LOCATION
        exit 2
fi
It is called with the following line in nrpe.cfg:
Code:
command[check_mount_storagepool]=/usr/lib64/nagios/plugins/check_mount "/mnt/StoragePool"
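On the Icinga side, the matching service definition looks something like this (a sketch; the host name and template are assumptions, and it relies on the usual check_nrpe command definition that passes the command name as an argument):
Code:
define service {
        use                     generic-service
        host_name               vm-fileserver
        service_description     StoragePool Mounted
        check_command           check_nrpe!check_mount_storagepool
}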
2013-08-25 17_51_04-Icinga - Host List.png
The only items not being watched now are the switch and the firewall.


Notifications are also working:
icinganotifications.png
 
Drives are here, so I'm getting started on that.

DSC_0255.JPG

The problem then becomes: I have a 20 (19 now) drive ZFS array that I can't remove disks from without greatly increasing the chance of catastrophic data loss. Copying files across the network would work, but would be very slow and getting the data off the array is important. So, how do you do it?

LIKE THIS (Certified Dell Technicians, don't click)
 