
ZFS All-In-One Instructions (complete)


cw823

I'm trying to put all of the instructions for an ESXi-based ZFS All-In-One in one location. I will try to cite my sources as used.

My ZFS All In One consists of the following:
Hardware:
Supermicro X8SIL-F (with IPMI)
Xeon x3440 ES
4x8Gb DDR3-1333 ECC Registered (as much as you can afford)
16Gb Kingston USB Flash Drive (ESXi Install)
Hardware RAID1 - 2x250Gb SATA drives for main datastore (software raid will not be seen by ESXi)
IBM M1015 SAS/SATA controller in IT mode - passed through via vt-d to VM
(4) 4tb Hitachi 7k4000
(2) 60Gb Agility 3

Software:
ESXi 5.5 Hypervisor
OmniOS (latest stable release from their website)
Server 2K8
Ubuntu


VMs:
NAS (OmniOS)
Torrent (all legal stuff, of course)
Plex server serving multiple Roku 3
vCenter
Other VMs as needed

http://www.napp-it.org/napp-it/all-in-one/index_en.html
http://www.napp-it.org/doc/downloads/all-in-one.pdf
http://napp-it.org/doc/ESXi-OmniOS_Installation_HOWTO_en.pdf
The PDFs have most of what you need to know, with a few things I'll add below.

Based on my hardware above, I built the OmniOS NAS VM with the following virtual hardware:
24Gb Memory
2 Virtual CPUs (1 virtual socket, two cores)
30Gb hard drive
M1015 passed through to OS
(1) e1000g (for install/setup, will remove later)
(3) vmxnet3s adapters, two to the main vswitch (one mgmt and one lan access), and one to the vswitch for the NFS share to ESX (screenshots to follow)
Floppy drive (OmniOS absolutely will not install without a floppy drive present, even though the installer never touches it. Who knows. :shrug: )
CD/DVD drive which you will use to mount the iso for whatever NAS software that you want to install.

The PDF has the setup instructions for the most part.
If you have a hardware RAID1, I would not worry about mirroring the boot disks. If you don't, I'd suggest two hard drives as datastores, with a 30Gb virtual disk on each datastore assigned to the VM (which you will mirror in your NAS software).
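If you go the two-datastore route, the mirror gets set up inside OmniOS after the install. A minimal sketch, assuming the second 30Gb disk shows up as c2t1d0 (your device names will differ; check with format):
zpool status rpool                                                   # note the existing boot disk, e.g. c2t0d0s0
zpool attach rpool c2t0d0s0 c2t1d0s0                                 # attach the second disk as a mirror
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t1d0s0   # make the new disk bootable
zpool status rpool                                                   # wait for the resilver to finish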

ESX Network Setup
I use two vswitches in ESX, one for lan/mgmt and one for NFS share
[screenshot: vswitchsetup.jpg]
You will create the second vswitch (Add Networking > Virtual Machine, etc.), then add a VMkernel port to it, and you can set the IP of your NFS network here, e.g. 192.168.7.x.
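If you prefer the ESXi shell to the vSphere Client, roughly the same setup via esxcli (vSwitch1, the NFS portgroup name, vmk1, and 192.168.7.1 are example names/addresses; use your own):
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=NFS --vswitch-name=vSwitch1
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=NFS
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=192.168.7.1 --netmask=255.255.255.0 --type=static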

OmniOS network setup
Log in as root (no password), then:

ipadm create-if e1000g0
ipadm create-addr -T dhcp e1000g0/dhcp
echo 'nameserver 8.8.8.8' >> /etc/resolv.conf
cp /etc/nsswitch.dns /etc/nsswitch.conf
ping 8.8.4.4 (google dns)
you should get "8.8.4.4 is alive"
now ping google.com
you should get "google.com is alive" (which means your dns and routing is working)

Install napp-it 0.9
wget -O - www.napp-it.org/nappit | perl
Reboot after the napp-it installation finishes:
reboot

Install vmtools
Use the instructions in the ESXi-OmniOS_Installation_HOWTO_en.pdf linked to above

Once that is completed, your vmxnet3 adapters should show up when you do a
dladm show-link
You can either set up the IP addresses manually from the shell (an example follows below), OR you can use the napp-it interface at:
http://<e1000gipaddressviadhcp>:81
of course substituting the actual IP address.
There is no password for napp-it.
Now click on System, network, and you'll have a page where you can set the IP information for your vmxnet3 adapters
Mine are set to:
10.10.10.x - management
10.10.10.x - lan access
192.168.7.x - NFS share to ESXi
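For reference, setting those addresses manually from the OmniOS shell instead of the napp-it GUI looks something like this (the addresses are only examples matching the list above; match the interface names to what dladm shows):
dladm show-link                                              # confirm the vmxnet3s adapters are visible
ipadm create-if vmxnet3s0
ipadm create-addr -T static -a 10.10.10.20/24 vmxnet3s0/v4   # management
ipadm create-if vmxnet3s1
ipadm create-addr -T static -a 10.10.10.21/24 vmxnet3s1/v4   # LAN access
ipadm create-if vmxnet3s2
ipadm create-addr -T static -a 192.168.7.10/24 vmxnet3s2/v4  # NFS share to ESXi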

Now let's enable jumbo frames.
Go back to the command line in your NAS console.
FOR vmxnet3s:
vi /kernel/drv/vmxnet3s.conf
We need to change the entries for LSO and MTU; see the screenshot for what it should look like:
[screenshot: vmxnet3_mtu.jpg]
FOR e1000g:
vi /kernel/drv/e1000g.conf
We need to change the entries for MaxFrameSize
Reboot the VM. Now that vmtools is installed, you can do this from the ESXi console via VM > Power > "Restart Guest".
log back in as root, now is a good time to change the password:
"passwd root" and type your new root password in twice
do a "ndd -set /dev/vmxnet3s0 accept-jumbo 1" for each vmxnet3 adapter (in my case there are three, so I did this for vmxnet3s0, vmxnet3s1, and vmxnet3s2) THIS STEP IS CRUCIAL. They will show in napp-it as 9000 MTU but until we tell Solaris to accept jumbo frames, JF will absolutely not work.

Once that is configured, you will want to set up your pool of drives that are passed through via the M1015 card. You can do this from napp-it on the Pools tab, which is very simple.
RAIDz for space, RAIDz2 for space with extra redundancy.
If you need I/O, use sets of 2- or 3-drive mirrors so ZFS can stripe your data across the mirrors (fastest). The more spindles, the more I/O.
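As an example, with the four 4tb drives on the M1015, the two layouts would look roughly like this from the shell (the cXtXdX device names are hypothetical; use format or napp-it's disk list to find yours):
# capacity: one 4-drive RAIDz vdev
zpool create ZFS1 raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0
# I/O: two 2-drive mirrors that ZFS stripes across
zpool create ZFS1 mirror c3t0d0 c3t1d0 mirror c3t2d0 c3t3d0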
Once your pool is created, create a filesystem for our NFS share to ESXi. Go to the ZFS Filesystems tab
"Create"
Make sure the pool you just created is listed in the "Pool" field (mine is ZFS1)
Enter the name of your ZFS Filesystem (mine is ESXi)
Change SMB Share to off
Change nbmand to off
Submit
Click the ZFS Filesystems tab again and you should be able to see your new filesystem for the NFS share
Click on "off" under the NFS column next to this file system
It will come up as sharenfs= on
Click "set property"
You are now ready to connect to ESXi
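For the command-line inclined, the same filesystem and share settings from the OmniOS shell would look like this (pool/filesystem names from my example):
zfs create ZFS1/ESXi
zfs set sharesmb=off ZFS1/ESXi
zfs set nbmand=off ZFS1/ESXi
zfs set sharenfs=on ZFS1/ESXi
zfs get sharenfs ZFS1/ESXi      # should report sharenfs on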

Enabling jumbo frames in ESXi
Configuration tab > Networking. If you click "Properties" for the NFS vSwitch that you created, you will see that both the "vSwitch" and "VMkernel" entries (under Ports) have the MTU set to 1500. We need to change this to 9000 for both. It will bark at you about there being no physical network adapters, and that's OK. Make both entries look like:
[screenshot: jf1.jpg]
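The esxcli equivalent, assuming the NFS vswitch is named vSwitch1 and its VMkernel port is vmk1 (adjust to your names):
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000
esxcli network ip interface list     # check that vmk1 now shows MTU 9000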


Mount the NFS share in ESXi. Given my pool name, ip address, and filesystem name above....
in ESXi click on "configuration" tab, and "Storage"
Click on "Add Storage"
Choose "Network File System"
Server: 192.168.7.x (this is your nas ip on the separate private network for NFS)
Folder: ZFS1/ESXi (this is case sensitive)
Datastore Name: whatever you want. Mine is NASESXi
Next
your datastore should show up now in the list, and you can start building VMs and storing them on your NAS through ESXi.
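If you'd rather do it from the ESXi shell, the same mount is one command (IP and names are my examples from above; adjust to yours):
esxcli storage nfs add --host=192.168.7.10 --share=/ZFS1/ESXi --volume-name=NASESXi
esxcli storage nfs list          # confirm the datastore mounted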

One box, high performance, many servers!

Other tweaks:
"ipadm set-prop -p max_buf=4194304 tcp"
"ipadm set-prop -p send_buf=1048576 tcp"
"ipadm set-prop -p recv_buf=1048576 tcp"

Other notes:
Do NOT use the "Upgrade Virtual Hardware" option when you right-click a VM. This will make it impossible to change settings in the vSphere Client; you would have to use VMware Workstation 10 (limited functionality) or vCenter Server (not free).

http://blog.cyberexplorer.me/2013/03/improving-vm-to-vm-network-throughput.html
 
I will get the screenshots attached later today
 
Benchmarks. ZFS1 is 4x 4Tb Hitachi 7k4000 7,200rpm drives attached to an M1015 controller. ZFStemp is 1x 4Tb Hitachi 7k4000. The ZFS VM has 32Gb of ECC RAM and ZFS1 has a 128Gb SSD for ZIL.

20Gb ddbench: [screenshot: 20G.jpg]
40Gb ddbench: [screenshot: 40G.jpg]
80Gb ddbench: [screenshot: 80G.jpg]
iozone 1g: [screenshot: iozone_1g.jpg]
 
I would be interested in seeing some benchmarks of SSD being used as L2ARC also. What SSD cell technology are you using for the ZIL?
 
Regular OCZ consumer SSD. I need to get a 3700 and try that on there to see if any improvement can really be made.

Actually, I could drop the SSD from ZIL and attach it as L2ARC, but I'm not sure I would want to run any benchmarks right away.
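For reference, the swap itself would just be a couple of zpool commands. A sketch, assuming the SSD shows up as c4t0d0 (check zpool status for the real device name):
zpool status ZFS1                # note the device listed under "logs"
zpool remove ZFS1 c4t0d0         # pull the SSD out of its SLOG role
zpool add ZFS1 cache c4t0d0      # re-add it as L2ARC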

All benchmarks have been done with a minimum of 4 active VMs.
 
Sorry to be annoying, but can you upload them to the site? The scaling is pretty hard to read on the linked Photobucket page.

Pretty cool numbers there, though. I would have thought there would be a larger improvement with the SSD added.
 
Maybe.....

When I go through my next round of testing I will do so. And yes, I thought I would see a huge improvement on writes
 
I would be interested in seeing some benchmarks of SSD being used as L2ARC also.

As I think about this, I'm not sure how I could benchmark L2ARC. I will have to Google whether there is a way, given what L2ARC is and what its purpose is.
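One rough way to watch whether the L2ARC is doing anything under a normal workload would be the kstat counters (these names are from the illumos arcstats and may vary by release):
kstat -p zfs:0:arcstats:l2_size      # bytes currently cached on the L2ARC device
kstat -p zfs:0:arcstats:l2_hits
kstat -p zfs:0:arcstats:l2_misses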
 
C-Dub was holding back. I'm going to read through this. Thanks for your time, hard work, and dedication, brother!!
 
Ok, so I'd assume you'd want some feedback from someone who is not too savvy with this stuff.
Could you elaborate on the VLAN setup for NFS? I added a VMkernel port and tried 10.0.69.1 / 255.255.255.0, with 10.0.69.1 as the default gateway and then the physical network's gateway, to no avail. I can't grab an IP from DHCP, so I'm guessing I set it up wrong.
*Edit* OK, so I had to use a static IP in napp-it within that subnet, and with that I was able to get the datastore added via the NFS network.


Also, why do we want the physical network attached twice with two NICs (LAN and management)? I set one to static for management and the other to DHCP; in my case 192.168.69.170 (static) and 192.168.69.35 (DHCP). I get mixed results when pinging the hostname (omni-napp). Just trying to figure out the benefit of having two NICs since I can access management via both. For now, I removed the second NIC.

I tried to enable jumbo frames and it still shows up as 1500 in napp-it. I tried everything you mentioned in the CLI, even rebooted napp-it, and I still see 1500 MTU for my two NICs.
 
ZFS is the best file system.

Meh, I don't like how it's immutable and RAM-hungry. It's more mature than btrfs and feature-complete, but I think btrfs has more promise. That's what I use on my computer, by the way. I tried ZFS, but it wasn't for me, mostly because of how much of a pain it is to get working on Linux, but also because of the aforementioned RAM and immutability issues.
 
ZFS is by nature RAM-hungry because of how it uses memory to cache data for reads and writes. There is also dedup, which is neither required nor on by default, and it is the biggest memory hog since ZFS dedupes in real time. But without dedup, OpenIndiana hosting a ZFS pool terabytes big can run in as little as 2-3GiB.
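For what it's worth, on the illumos/OmniOS side you can also cap how much RAM the ARC is allowed to take if the box has other duties. A minimal sketch (the 8GiB value is just an example); add to /etc/system and reboot:
* cap the ZFS ARC at 8GiB (0x200000000 bytes)
set zfs:zfs_arc_max = 0x200000000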
 
Hmmm, well, I still don't like how I can't change the array without starting over. Just last week, for instance, I took elements of a btrfs volume offline one at a time, put a bcache backing partition on each, added the backing partition back into the volume, rinsed and repeated. I know that ZFS natively has SSD caching (again, it's a mature FS), but what about when I want to add another drive, like I did over the summer and will do again this summer or spring?

If I were building a NAS, I definitely would use ZFS over btrfs, but for everyday use, even though btrfs is still under heavy development, I would choose btrfs.
 
I tried to enable jumbo frames and it still shows up as 1500 in napp-it. I tried everything you mentioned in the CLI, even rebooted napp-it, and I still see 1500 MTU for my two NICs.

vmxnet3 or e1000?

For e1000 you'd want to edit /kernel/drv/e1000g.conf
Change the "max frame size" to 3 for how ever many adapters you have. For me I just changed the first one so it reads MaxFrameSize=3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
Save
Reboot
for vmxnet3 you'd want to edit /kernel/drv/vmxnet3s.conf
Change the "MTU" to 9000 for how ever many adapters you have. For me it's MTU=9000,9000,9000,9000,1500,1500,1500,1500;

Yeah, I have more vmxnet3 adapters than you'll have; that many isn't necessary.
 
Hmmm, well, I still don't like how I can't change the array without starting over. Just last week, for instance, I took elements of a btrfs volume offline one at a time, put a bcache backing partition on each, added the backing partition back into the volume, rinsed and repeated. I know that ZFS natively has SSD caching (again, it's a mature FS), but what about when I want to add another drive, like I did over the summer and will do again this summer or spring?

If I were building a NAS, I definitely would use ZFS over btrfs, but for everyday use, even though btrfs is still under heavy development, I would choose btrfs.

https://rudd-o.com/linux-and-free-software/ways-in-which-zfs-is-better-than-btrfs

Lack of planning does not make ZFS a lesser system than another system. If you are concerned about not being able to add drives, then do drive sets of 2. Yes, if you install a 7-drive RAIDz2, you can't just add one drive to it to expand it.

ZFS lets you do a write cache (SLOG) and L2ARC.
 
The implications brought up in that link are inaccurate. It implies that btrfs can't do a number of things that the author says ZFS can. For instance, mounting subvolumes willy-nilly. I can, and do, mount subvolumes on top of parents. I can, and do, give each user his own home directory. Anyone who says these things are hard to do in btrfs is beyond lazy or incredibly stupid, because it's all handled by /etc/fstab.

Snapshotting a volume and all of its children is one command in btrfs. Again, I've done it. However, it was an accident, since I ended up snapshotting the snapshots (a script error that was easy to fix).

At that point I stopped reading, because the author clearly didn't do his or her research, or the piece is outdated. I don't use ZFS regularly, but I also don't go around listing off how many things it can't do or doesn't do well.

Bcache can work as write caching in the proper Unix sense of "everything has one job and does it well". I have btrfs sitting on top of bcache, and it works wonderfully and is easy to tune.

I won't argue that in its current state ZFS is superior to btrfs; I said so myself earlier. However, I will say that ZFS is not for me, as it isn't for many others. That Oracle owns ZFS and is the chief driving force behind btrfs, yet backs btrfs as the future, speaks volumes about the worth of the two. In this way, it's not just me.

The three most significant things that btrfs is missing are (1) parity-based RAID, (2) per-subvolume RAID levels, and (3) per-subvolume mount options. All three of these are on the table for btrfs, and ZFS is not getting the second one, which admittedly is less important than the third.

[edit] Oh yeah, forgot about dedup. That's also a pretty big one.
 
I'll be updating this thread with new information, as I migrate to a new ZFS AIO server.

Hardware:
Supermicro X9SRL-F motherboard
Xeon E5-2628L v2 - 8-core w/ HT (16 logical cores)
64Gb (8x8Gb) DDR3-RDIMM
HP P410 (1Gb cache, BBU) w/ 2x 300Gb SAS 15k drives - ESXi local storage
1x 80Gb Intel DC S3500 SSD (provisioned to 40Gb) - ESXi cache

~10 VMs, including AD, DNS, torrent, MCM, 2x Plex servers, pfSense

NAS VM:
(2) virtual cores (plenty of CPU if not using dedup)
32Gb memory locked (locked memory is a requirement if passing through PCI devices)
(2) passed through Dell H200 flashed as 9211-8i IT mode
(1) 20Gb virtual HD for OmniOS
Array 1 - aka ZFS1 - (6) Crucial M500 240Gb SSD
- (1) Intel DC S3500 SSD (provisioned to ~10Gb) SLOG
Array 2 - aka ZFS2 - (8) WD 2Tb RE4 Enterprise hard drives
- 2x 4 drive vdevs

I've learned a few other tricks that I will pass along for the build. I will most likely still be using ESX 5.5
 
As long as the ZFS drives are not changing, it should be a super easy swap. If you aren't reinstalling the virtual machine, you shouldn't even need to export/import!
 