
Project: Rackmount Overkill

Sorry, wasn't implying that you didn't know. Just wasn't sure what you were using.
No apology required. I appreciate the offer, but I created the testing methodology for the NAS server tests I was planning before I got a bit busier. To get real results, you need to use real files, which is what I'm doing. Synthetic tests can tell you useful things, but I find it best to use my own files, since that is what I'll actually be working with and it is a bit more random/realistic.

I got interrupted before I could finish my compression tests, which I'm starting over right now. I'm curious to see how it fares with "document" data. I may dedicate a pool to backups/archives (snapshots even), and I'd like to see how well it compresses what I currently have versus how long the copy takes. The compression is certainly multithreaded, as it is using 8 full cores on Ruby.

Still on the plate are deduplication, read speed, pure sequential read/write throughput, snapshots, and something else I can't think of right now for some reason.
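
For the dedup test, the plan is roughly the following; the dataset name and source path are just placeholders:

Code:
# Simulate dedup on the existing pool data first to see the potential ratio without committing to it
zdb -S zfstest

# Enable dedup on a scratch dataset and copy the test data in
zfs create zfstest/deduptest
zfs set dedup=on zfstest/deduptest
cp -a /mnt/source/. /zfstest/deduptest/

# The DEDUP column shows the ratio actually achieved
zpool list zfstest
The simulation step matters because dedup eats RAM for its table, so it's worth seeing the projected ratio before turning it on for real.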
 
The compression test finished last night and it is pretty impressive. I sync a large dataset every hour with rsync, which includes my /home folder, a secondary mounted storage drive, and my entire Windows disk (gaming). That gives the following stats:

541.6 GiB
448,381 items

Uncompressed, ncdu reports the same amount of space used on the pool as the source files take, which tells me I can compare numbers elsewhere with confidence. "zpool list" shows 821G allocated, which looks like it is reporting the file data plus parity; I can't account for the extra space any other way.

With compression on the exact same dataset, ncdu reports 423.1 GiB used (78% of the original size) and "zpool list" reports 635G allocated (77% of the original). That may not sound like much, but it effectively increases the usable space in that pool by about 28%. Not bad.
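
For anyone following along, checking this is straightforward since ZFS tracks the ratio per dataset. Something like the following, where the dataset name is a placeholder and lz4 assumes a recent ZFS on Linux build (older builds only have lzjb/gzip):

Code:
# Compression only applies to data written after it is enabled
zfs set compression=lz4 zfstest/backups

# ZFS reports the achieved ratio itself, which should line up with the ncdu numbers
zfs get compression,compressratio,used,available zfstest/backups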

I also received the rails for the fiber switch tonight, so I got that installed after a bit of adjusting.

brocade200e_racked_1.JPG
 
Reading from the array is pretty quick. This is much faster than it was previously. I'm watching actual iostat numbers to make sure the system isn't cheating by preloading it into memory.

Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
zfstest     5.35T  87.0G  2.15K     95   275M   348K
zfstest     5.35T  87.0G  1.83K    116   234M   377K
zfstest     5.35T  87.0G  1.93K    121   248M   455K
zfstest     5.35T  87.0G  2.15K    138   275M   579K
zfstest     5.35T  87.0G  2.17K     64   277M   323K
zfstest     5.35T  87.0G  2.31K      0   295M      0
Code:
38999659736 bytes (39 GB) copied, 132.507 s, 294 MB/s
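For reference, the output above is roughly what you get from watching the pool while timing a straight sequential read; the test file name here is just an example:

Code:
# Pool-level throughput in 5-second samples while the read runs
zpool iostat zfstest 5

# Time a sequential read of a large file; using a file bigger than RAM keeps the ARC from hiding the disks
dd if=/zfstest/bigfile.bin of=/dev/null bs=1M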
I've filled the file system to see how quickly it does a scrub. Each disk is reading at roughly 80 MB/sec and the pool is scanning at around 340 MB/sec; ZFS says it is about 5 hours away from a full scrub.

Code:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb               0.00         0.00         0.00          0          0
sda               1.00         0.00         0.01          0          0
sdc             421.80        79.27         0.00        396          0
sdh             423.80        79.21         0.00        396          0
sdd             438.00        79.21         0.00        396          0
sdf             408.60        79.62         0.00        398          0
sdi             406.00        79.74         0.00        398          0
sde             411.60        79.87         0.00        399          0
sdg               0.00         0.00         0.00          0          0
dm-0              2.00         0.00         0.01          0          0
dm-1              0.00         0.00         0.00          0          0
Code:
[root@ruby ~]# zpool status zfstest
  pool: zfstest
 state: ONLINE
  scan: scrub in progress since Fri Apr 19 21:16:09 2013
    50.9G scanned out of 5.35T at 343M/s, 4h30m to go
    0 repaired, 0.93% done
config:

        NAME        STATE     READ WRITE CKSUM
        zfstest     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sde     ONLINE       0     0     0
        spares
          sdg       AVAIL

errors: No known data errors


[root@ruby ~]# zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zfstest  5.44T  5.35T  87.0G    98%  1.00x  ONLINE  -
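If anyone wants to repeat this, kicking off and watching a scrub is just a couple of commands; iostat gives the per-disk view shown above:

Code:
# Start the scrub and check progress / estimated time remaining
zpool scrub zfstest
zpool status zfstest

# Per-disk throughput in MB/s, 5-second samples
iostat -m 5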
After working with it for a few days, I can't say that there is anything that I don't like, short of not being able to remove disks.
 
I've cleaned up my data to fit on the zfstest temporary array to get ready for the migration to ZFS. Getting ready to copy everything over now. Wish me luck.

Migrating the OS disk is going to be very interesting, since the array doesn't carry over between RAID cards.
 
Hey thid, it seems you're running ZFS on Linux. Is that right?

If so, keep an Illumos/Solaris live disk around (e.g., OpenIndiana or OmniOS). That thing is a real lifesaver for ZFS. When everything else fails, the usual advice is to load your zpools on a "native" ZFS system for kicks and giggles, and it often yields access to data you thought was lost.
 
Yes, this is ZFS on Linux. I'll grab a copy and keep it local, thanks for the tip. The data copy should be done by now, but I'm still at work and can't check.

Still trying to decide if I want to include the "external" 1.5 TB drives in the ZFS pool. If those drives get disconnected or shut off, the pool goes offline; the upside is that the pool would be huge. I'm also trying to figure out whether to go with one big vdev per drive type (all the 1 TB drives in a single RAIDz2 vdev) or break them into groups of 3-4 drives with RAIDz1. The first gets me more space and is easier to configure, but it will be harder to upgrade in the future since I'd have to replace more drives at once.
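
To make the two layouts concrete, they would look roughly like this for one drive type; the pool name and device names are placeholders, not my actual disks:

Code:
# Option 1: one big RAIDz2 vdev per drive type -- more usable space, but an upgrade means replacing every drive in the vdev
zpool create tank raidz2 sdb sdc sdd sde sdf sdg sdh sdi

# Option 2: smaller RAIDz1 groups of 3-4 drives -- less space, but upgrading only means swapping the drives in one group at a time
zpool create tank raidz1 sdb sdc sdd raidz1 sde sdf sdg sdh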
 
What "external" 1.5TBs?
At the bottom of this picture, you can see my Omnistar (Rackable Systems) external SAS expander tray, which currently holds eight 1.5 TB drives. Looking back through the thread, it seems I never posted a picture of just the unit installed in the rack. Please ignore the mess; it's temporary while I wait on new parts and participate in the folding competition.

brocade_200e_configured_2.JPG
 
Oh, that. Why would they get disconnected or shut off?
If the power goes out, the unit does not turn back on. I'm fairly sure I can configure it via a serial connection, but I need a cable to do it. Plus, things can go wrong. I don't want to design the system assuming everything is working.
 
Oh, right.

When are you going to get a UPS? It would surely be useful for scenarios like these.

BTW: Found this. Especially, this.

It seems there is a jumper somewhere that would enable the automatic power-on feature?
 
Interesting, I'll have to check that out.

I'm thinking that combining them all into one pool will still be best. Doing some quick math, smaller groups should give me about 25 TB of usable space, versus 28 TB if I lump them into RAIDz2 vdevs by drive type.

The problem now is imaging the OS drive. I have no file server to save the image to, because this machine is the file server. I'll set up a temporary share on the R710 for a bit; that's pretty much the only way I can bootstrap this process.
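
The rough plan is to push the image across the network to that temporary share, something along these lines; the hostname and paths are placeholders, and the dump would be done from a live environment rather than the running OS:

Code:
# Image the OS disk to a file on the temporary share, compressing on the fly
dd if=/dev/sda bs=1M | gzip -c | ssh r710 "cat > /srv/images/ruby-os.img.gz"

# Later, restore it onto the disk sitting behind the new RAID card
ssh r710 "cat /srv/images/ruby-os.img.gz" | gunzip -c | dd of=/dev/sda bs=1M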
 
That was a slight scare with ZFS. I got the OS drive moved over to the new RAID card, ran "zpool status" to make sure everything was OK, and everything was not OK: over half the disks in my migration pool were showing offline ("missing disk"). A quick search suggested that ZFS only scans for the devices at import time, but since the pool was already imported (and broken), it wouldn't rescan. I had to export the pool and import it again, and then all was well.
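
For the record, the fix was nothing more exotic than forcing a re-import so ZFS would rescan the devices; the last line is just a fallback in case a plain import still misses disks:

Code:
zpool export zfstest
zpool import zfstest

# If devices still come up missing, pointing the import at a stable device directory can help
zpool import -d /dev/disk/by-id zfstest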

Ruby is doing updates for the first time in forever. When that finishes, I'm going to start the R510's migration to its local disks by installing the 8708EM2 I just pulled out of Ruby. I may also reinstall the OS on the R710 when this is all done (probably later tonight) to get XCP back up, since it won't finish another WU before the race ends.
 
Well, this is great. Either one SAS drive in Ruby is throwing fits, or I have a bigger problem. I got everything installed and updated, went to reboot, and every command I typed returned "Input/output error". I had to power the server down manually. Thinking this was caused by the third OS drive I had in as a hot spare (which had tripped the RAID alarm in the past), I pulled that drive and re-imaged the disk. It got about 10 seconds in before it exploded with buffer I/O errors, and now the disks are rebuilding again.

Lovely. Just once, I'd like an upgrade to finish without errors on the first try. That'd be a nice break.
 
After running all night and day without issue, the RAID 1 array broke again today. I discarded the array setup and I'm just booting a single SAS drive at the moment. I'm just going to order a Samsung Pro and be done with this. I don't know if it is the card or not. No other drives are having issues, just these three.
 
The file server is behaving better now with the SAS drive off of the SAS expander. I have no idea why only those drives are throwing fits, but I'm taking absolutely zero chances with my file server. In the meantime, I have ZFS completely set up with my data on it; I just need to sort it out into the different pools.

Code:
[root@ruby mnt]# zpool status
  pool: StoragePool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        StoragePool  ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdp     ONLINE       0     0     0
            sdq     ONLINE       0     0     0
            sdn     ONLINE       0     0     0
            sdo     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            sdy     ONLINE       0     0     0
            sdx     ONLINE       0     0     0
            sdv     ONLINE       0     0     0
            sdu     ONLINE       0     0     0
            sdr     ONLINE       0     0     0
            sdt     ONLINE       0     0     0
            sdw     ONLINE       0     0     0
            sds     ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdm     ONLINE       0     0     0
            sdl     ONLINE       0     0     0
            sdk     ONLINE       0     0     0

errors: No known data errors
Code:
[root@ruby mnt]# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
StoragePool  33.4T  4.02T  29.4T    12%  1.00x  ONLINE  -
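Sorting it out mostly just means carving the pool into datasets and moving things into place; the dataset names below are examples, not necessarily my final layout:

Code:
# Separate datasets so compression, quotas, and snapshots can be set per use
zfs create StoragePool/backups
zfs create StoragePool/media
zfs set compression=lz4 StoragePool/backups

# Sanity check the layout and mountpoints
zfs list -o name,used,avail,mountpoint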
Interestingly enough, moving data between pools is not the same as moving a folder on the same drive: ZFS rewrites the data, re-balancing it across the disks as I move it.

Code:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdy             195.80        13.45        10.36         67         51
sdx             195.20        13.48        10.32         67         51
sdv             202.80        13.49        10.54         67         52
sdu             199.40        13.52        10.17         67         50
sdr             205.20        13.62        10.80         68         54
sdt             197.00        13.35        10.38         66         51
sdw             187.60        13.44        10.84         67         54
sds             199.40        13.44        10.60         67         53
sdc              70.20         0.00        12.26          0         61
sde              72.40         0.00        12.20          0         61
sdd              64.00         0.00        11.71          0         58
sdj              63.40         0.00        12.33          0         61
sdh             265.20        13.09        11.26         65         56
sdf             170.80        16.10        11.82         80         59
sdm              64.00         0.00        12.10          0         60
sdp             260.80        13.31        11.27         66         56
sdq             267.80        13.13        11.26         65         56
sdl              79.60         0.00        12.11          0         60
sdn             164.80        16.05        11.53         80         57
sdk              63.00         0.00        12.19          0         60
sdo             266.80        13.04        11.21         65         56
sdg             265.80        13.14        11.25         65         56
sdb             170.60        15.94        11.56         79         57
sdi             264.60        12.99        11.30         64         56
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0


Also, say hello to AWK (R710) and SED (R510). My goal was to switch them over yesterday, but a migraine decided to be a jerk and I was out pretty much the entire day. I got them switched over this morning and completely re-did the network in the rack. Each server has four bonded 1 Gb NICs going to the PowerConnect. I also got the NICs bound to their final addresses on the router, including the management iDRAC cards. I'll try to get pictures of that tomorrow. It isn't done, as I still need to route the fiber cables, but it is pretty close.
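
For reference, a minimal sketch of what one bond looks like on a RHEL-style install; the interface name, address, and the 802.3ad mode (which needs a matching LAG on the PowerConnect) are assumptions, not necessarily my exact config:

Code:
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=802.3ad miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-em1  (one of these per slave NIC)
DEVICE=em1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none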

xenpool.png
 
LOL maybe I should use you for my offsite backup :)
Well, I am already offsite storage for Google, as per the Storage Megathread title.

Here are the slightly updated wiring pictures, as promised. I still have quite a bit more to do, mainly the power cables; those are flying every which way. The IBM x3650 M1s are no longer in the rack.

rewire_04-23-2013_1.JPG


rewire_04-23-2013_2.JPG


rewire_04-23-2013_3.JPG
 
The file server is behaving better now with the SAS drive off of the SAS expander.
That didn't last long. I was configuring CrashPlan on the server, now that all the data is in its final location and in the right pool, when suddenly all my terminals popped up with "kernel: journal commit I/O error". Great, this again. Any command I hadn't already run returned an input/output error. I went down to the rack and the console was covered in /dev/sda device errors (the SAS drive). I have the disk cloning over to another disk right now, but it is cloning as "raw", which isn't a good sign. I won't lose anything important other than configuration files, but it is still pretty annoying. The replacement should be here shortly.
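
If the plain clone keeps choking on bad sectors, GNU ddrescue is probably the next step, since it keeps a map of what it has recovered; the target device and map file name here are placeholders:

Code:
# First pass: grab everything readable, skipping the slow scraping of bad areas
ddrescue -f -n /dev/sda /dev/sdz rescue.map

# Second pass: go back and retry the bad areas a few times
ddrescue -f -r3 /dev/sda /dev/sdz rescue.map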
 