Storage Megathread - The basics of storage

Automata

Destroyer of Empires and Use
Joined
May 15, 2006
This is the storage megathread, which will contain all information regarding non-volatile storage in computers. Storage changes frequently and is a very important part of our computers. This thread aims to stay updated and be a complete reference. If you have information that you'd like to see added, corrected, or changed, please post here or message me directly. I encourage you to post in this thread, whether to discuss storage, ask questions, or simply help others out. Feel free to post pictures of your storage setups! If you want to help out, please see the end of this post.

Disclaimer: The information in this thread is a combination of my own knowledge of storage devices, information freely found online, and information compiled from forums or other posts. I don't have a way to verify that every piece of information is completely correct, so there is a chance that something in this post is incorrect. If you see something wrong, please post in this thread or contact me directly so we can discuss it. Regardless, the posters of this forum (including myself) have done our best to ensure that the information contained is accurate and non-destructive. With that being said, I have to put an obligatory warning stating that the forum posters, the staff, and I are not responsible for any mistakes contained here or for any issues that may result from using the information.

Table of contents
1. Terminology
2. RAID
3. Common RAID levels
4. Nested RAID levels

Definitions:

Hard drives:
A hard drive is a device that stores information on spinning metal platters contained within an enclosure. There can be one or more platters in a single hard drive. Information on the platters is stored in sectors of a fixed size (as of this writing, usually 512 bytes or 4 kilobytes). Each side of a platter has a read-write head that swivels to access information. As the platter spins, the head is pre-emptively placed where the sector will be, and it reads or writes the information as the sector passes underneath. Most consumer level hard drives spin in the 5400-7200 RPM (revolutions per minute) range, while server drives can reach 15,000 RPM. Common sizes for enclosures are 2.5" (usually found in laptops or small devices) and 3.5".
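To put the RPM figures in perspective, here is a rough sketch of how spindle speed translates into waiting time. It assumes that, on average, the head waits half a revolution for the sector to come around; seek time and transfer time are ignored, and the function name is only illustrative.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to rotate under the head, in milliseconds.

    Assumption: on average the sector is half a revolution away.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0


for rpm in (5400, 7200, 15000):
    print(f"{rpm} RPM: about {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

Faster spindles shorten only this part of the access time; the head still has to seek to the right track.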

Solid state drives:
A solid state drive is a device similar to a hard drive, in that it can read and write information. The difference is in how it stores the information: there are no mechanical parts that move. Instead of using spinning platters with a read-write head, solid state drives use non-volatile memory chips. Because of this, solid state drives are capable of far faster random access times. Hard drives usually take 7-15 ms (milliseconds) to reach a file, while solid state drives usually take 0.1 ms or less. That is roughly two orders of magnitude faster than a hard drive. For now, solid state drives are expensive when compared with hard drives, but offer a large performance increase.
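As a back-of-the-envelope illustration of those access times, the sketch below converts an access time into the number of random reads a drive could complete per second. It assumes one request at a time with no queuing or caching, which is a simplification, and the function name is made up for this example.

```python
def random_reads_per_second(access_time_ms: float) -> float:
    """Upper bound on random reads per second when requests are served one at a time."""
    if access_time_ms <= 0:
        raise ValueError("access time must be positive")
    return 1000.0 / access_time_ms


hdd_slow = random_reads_per_second(15.0)  # hard drive, slow end of the range
hdd_fast = random_reads_per_second(7.0)   # hard drive, fast end of the range
ssd = random_reads_per_second(0.1)        # solid state drive

print(f"HDD: roughly {hdd_slow:.0f} to {hdd_fast:.0f} random reads per second")
print(f"SSD: roughly {ssd:.0f} random reads per second")
print(f"SSD advantage: {ssd / hdd_fast:.0f}x to {ssd / hdd_slow:.0f}x")
```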

TLER:
Time Limited Error Recovery (TLER) is found on all Western Digital hard drives, but there are similar technologies available from other manufacturers and they work in roughly the same manner. This is often confused with the head-parking feature of the "Green" models, which parks the heads when there is no activity; the two are not related. TLER allows the hard drive to retry reading or writing data when there is an error, up to a specified time limit. How long it tries is limited by the configuration of the hard drive. In consumer level drives (Green/Blue/Black), the limit is set to 90 seconds and in enterprise drives, the limit is 7 seconds. If the time limit is reached, the hard drive will give up and report the error to the disk controller. Using consumer level drives in RAID is generally a bad idea because during the retry, the hard drive is completely unresponsive. In hardware RAID, the controller will wait around 9 seconds before kicking the drive from the array and marking the array as degraded. You can see why this would be a large issue if the drive is allowed up to 90 seconds to recover. In software RAID, it usually isn't as much of a problem because the software will simply wait until the hard drive returns; this is likely a setting in the software, so be sure to research it if you will use these drives.

MTBF:
Mean Time Between Failure (MTBF) is the predicted (average) time between failures of a product. It is a statistical average across a large population of drives, so an MTBF of 50,000 hours does not guarantee that any single hard drive will last that long. External factors such as manufacturing quality or environment will impact this time. Please note that a higher MTBF does not always mean the hard drive is higher quality than alternatives. Remember this is a number given by the manufacturer.
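One common way to make an MTBF figure more tangible is to turn it into an approximate yearly failure chance. The sketch below assumes a constant failure rate (an exponential model) and a drive that is powered on all year; both are simplifying assumptions of this example, not something a manufacturer's number promises.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumes the drive is powered on all year


def annual_failure_probability(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year, under a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# The 50,000 hour figure is the example value used in the text above.
print(f"{annual_failure_probability(50_000):.1%} chance of failure per year")
```

Under these assumptions, the 50,000 hour example works out to roughly a 16% chance of failure in any given year, which shows why MTBF should not be read as a lifespan.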

RAID:
RAID stands for Redundant Array of Independent/Inexpensive Disks, with the keyword being "redundant". For a history of RAID, see here. All levels of RAID (except for 0) are redundant. RAID achieves redundancy by using parity or mirroring, which can be spread across multiple disks. If a hard drive fails, the original data is either read from a mirror or reconstructed by combining the information on the remaining drives with the parity data. Different levels of RAID have their own pros and cons, so choosing the right level for your system is crucial for good performance and reliability later on.

Parity:
Parity is used in RAID to provide redundancy, detect discrepancies in data between hard drives, and rebuild data should a hard drive fail. For a more detailed explanation of parity in RAID 5/6, see this post.
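As a minimal sketch of how single parity can rebuild lost data, the example below uses XOR, the operation commonly used for RAID 5 style parity. The block contents and helper name are invented for illustration, and the second parity block used by RAID 6 relies on different math that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per disk, plus a parity block on a fourth disk.
disk0, disk1, disk2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([disk0, disk1, disk2])

# Pretend disk1 failed: XOR the surviving data with the parity to rebuild it.
rebuilt = xor_blocks([disk0, disk2, parity])
print(rebuilt == disk1)
```

The same calculation also exposes discrepancies: if the parity recomputed from the data disks does not match the stored parity, something on the array is inconsistent.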

Mirroring:
Used in RAID, mirroring is simply treating two or more hard drives as a single device and writing the same information to all of them. Drives that are mirrored will contain the exact same information in the exact same way.

Fault tolerance:
An array's ability to sustain a hard drive failure. The more fault tolerant an array is, the more drives can fail. This usually comes at the cost of hard drive space.

Write hole:
A write hole can occur when a write operation is interrupted and the system is unable to write all data to the hard drive. Common causes are a power outage, hardware failure, or stability issues. To prevent this from happening, you should only use write-back cache if there is a battery backup on the RAID controller; otherwise use write-through. Another option is to use a journaled file system. These measures do not completely eliminate the chance of a write hole, because hard drives have caches of their own, but they reduce it by a fair amount. To detect incorrect data on the array, you should be doing fairly frequent consistency checks.

BBU:
A battery backup unit is a device that powers volatile RAM found on RAID controllers. Its job is preventing data loss during a system failure (power outage, hardware failure, or instability) when information is stored on the RAID card and before the data is written to the hard drives. If you have write-back cache enabled, it is highly suggested that you have a BBU.

Write-back cache:
This option is commonly found on RAID controllers and is used to speed up writing data to the hard drive. It works in conjunction with the RAID controller's memory. If the operating system requests that a file be written to the array, the controller accepts the data, stores it in the controller's cache, and immediately tells the operating system that the write is done. In reality, the file has not yet been written to the hard drive. This allows the controller to decide when it is best to actually write the file to the hard drive. A good example is writing thousands of 4 KiB files to the array. Instead of writing each file to the hard drives individually, the controller saves them up in its cache and writes them in far fewer operations. This can greatly speed up file writes. Since the data is stored in volatile memory until it is written, you should have a BBU to prevent data loss in case of a power failure.

Write-through cache:
Unlike write-back cache, the hard drive controller will write each file to the hard drives as it is requested. This will be slower than write-back cache, but is safer.
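The trade-off between the two cache modes can be shown with a toy model. This is only a sketch: the class and its counters are made up for illustration and do not mirror how any real controller firmware works.

```python
class ToyController:
    """Toy RAID controller that counts disk operations and unsaved writes."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.cache: list[bytes] = []  # volatile controller memory
        self.disk: list[bytes] = []   # data that is safely on the drives
        self.disk_operations = 0

    def write(self, data: bytes) -> None:
        if self.write_back:
            # Acknowledge immediately; the data only lives in the cache for now.
            self.cache.append(data)
        else:
            # Write-through: every request goes straight to the drives.
            self.disk.append(data)
            self.disk_operations += 1

    def flush(self) -> None:
        """Write everything in the cache to the drives in one batched operation."""
        if self.cache:
            self.disk.extend(self.cache)
            self.cache.clear()
            self.disk_operations += 1

    def power_loss(self) -> int:
        """Simulate losing power without a BBU; returns how many writes were lost."""
        lost = len(self.cache)
        self.cache.clear()
        return lost


write_back = ToyController(write_back=True)
write_through = ToyController(write_back=False)
for _ in range(1000):
    write_back.write(b"x" * 4096)
    write_through.write(b"x" * 4096)

print("write-through disk operations:", write_through.disk_operations)
print("write-back writes lost if power fails now:", write_back.power_loss())
```

In this model, write-back needs a single batched operation once `flush()` is called, but everything still in the cache disappears if power is lost first, which is the risk a BBU covers.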

Hot swap:
The ability to remove a drive from a system without powering the computer down. All RAID controllers should have the ability to hot swap drives, and motherboards should be able to if AHCI is enabled. This feature is useful for RAID arrays since you can remove a disk from the system and put in a replacement without turning the computer off. You can even set up some RAID controllers to start rebuilding when a new disk is inserted in the place of an old one.

AHCI:
Advanced Host Controller Interface is a mode that can be enabled for SATA controllers. This enables some features for the hard drives, such as hot swap or port multiplication. If you are reinstalling your operating system and are not using RAID, enable this feature. If you are running Windows and want to enable this feature without reinstalling, you will have to do a registry change and install the drivers first.

DAS:
Direct Attached Storage. As the acronym implies, a DAS is a local disk that you have block-level access to. You have direct access to the file system and can do whatever you want to it. An example would be a hard drive in your desktop computer. When you access a file, the location on the hard drive is looked up in an index table, and the file is read.

NAS:
Network Attached Storage. This is storage that you access on a file-level basis. Because the requests are done at the file-level ("importantfile.txt"), the client computers do not access the file system directly. This means client computers do not need to understand the file system on the server. They only need to know how to request a file. A common method by which it shares is called SMB (Server Message Block) or CIFS (Common Internet File System). Windows has this built in; all folder shares and network drive mounts use this method. Linux uses Samba for the server and CIFS to mount remote shares. NAS servers or devices are commonly used in home networks.

SAN:
Storage Area Network. The concept of a SAN is a mixture between DAS and NAS. It is shared out to multiple systems through a network, but unlike NAS, the access is block-level instead of file-level. A SAN is mounted and appears like a local disk. This means the system mounting the device will need to know how to read the file system since it is accessing it directly. Depending on the file system used, multiple computers could access the same share and folder structure. SAN usage is generally limited to business or data centers due to complexity and cost.



RAID topics:

"RAID is not a backup":.
A common misunderstanding is that RAID is a form of backup; this is completely false. RAID is designed to prevent data loss when a hard drive fails. A backup is a "snapshot" of data from a specific point in time and it can be restored if something was deleted or lost. If data on a RAID array is updated, overwritten, or deleted, the changes immediately propagate to all the hard drives. By the time you realize the data is gone, it is too late; there is no way to pull the old version. However, you can use backups and RAID in tandem to create a more resilient backup, since it will be guarded against hard drive failures. It is important to know the difference between RAID and backups.

Catastrophic data loss:
When data is unrecoverable, it is defined as "catastrophic data loss". For example, if you had one hard drive that failed, that is catastrophic data loss. While the goal of RAID is to avoid the loss, it can still happen if too many hard drives fail.

Degraded array:
If a hard drive fails in an array with parity, the array is marked as "degraded". Unlike catastrophic data loss, a degraded array can be recovered using the parity data. You can still read data from the array just as you normally would, but performance will be slow since the RAID controller (or software) will have to reconstruct data from parity as you need it. Unless the circumstance is dire, avoid reading or writing information on a degraded array and get it rebuilt as quickly as possible. It isn't that reading or writing will break the array itself, but rather that more hard drive failures may cause catastrophic data loss. Basically, run a degraded array for as short a time as you possibly can. If you are waiting on a disk replacement, shut the system down.

Hardware RAID:
A RAID controller that has a dedicated processor (and usually RAM) is a hardware RAID controller. When accessing the array, the hardware controller is tasked with the parity calculations, instead of leaving it to the CPU. If the system is used for other tasks than storage, this frees up the processor. Hardware RAID controllers are generally faster, but with today's hardware, you probably won't notice a difference for home use. Since there is extra hardware, the cost will be higher than a RAID card that uses software RAID. However, you can get features that are not (yet) available for software RAID.

Software RAID:
As a general definition, software RAID encompasses controllers or software that uses the CPU for calculations instead of a dedicated processor. Most motherboard RAID controllers use the CPU, which makes them software RAID. Many cheap RAID controllers also use software RAID. Features and performance will be limited by the hardware and software used.




Common RAID Levels:
Not all RAID levels are discussed here because most won't be used in a home environment. For a full listing of RAID levels, see the following articles: Standard RAID levels, Nested RAID levels, and Non-standard RAID levels. In all RAID levels, the disks should be the same size; otherwise, every disk is limited to the capacity of the smallest disk in the array. For example, if you had a 500 GB and a 1 TB disk in RAID 0, the array would only be 1 TB in size; the last 500 GB of the larger drive is not accessible. Mixing different disk models is generally a bad idea as well. Performance-wise, you will be limited by the slowest disk in the array. Incompatibility issues between models are possible, although none have been reported. Using drives of the same model mitigates both problems.

0:
RAID 0 doesn't actually fit the definition of "redundant", as found in the acronym. Data is split into chunks (called "stripes") that are evenly distributed across all the disks. This allows a RAID 0 array to be very fast in reading large amounts of sequential information, because it can tell each disk to load stripes at the same time. Since it is using the disks concurrently, the theoretical throughput is the number of disks multiplied by the speed of each. Meaning, if you have two hard drives in this array and each could do 100 MB/sec, you'd have a theoretical maximum throughput of 200 MB/sec. There are a few downsides, however. The biggest issue is a problem of MTBF. Since there is only one copy of the data spread across the disks, there would be no way to recover any* data if one disk failed. This means our two drive RAID 0 example from earlier would have roughly twice the chance of catastrophic data loss compared to a single drive. You will also lose some random access time since the RAID controller has to reassemble data read from the drives, but the difference is a few percent.

*This isn't 100% true. If the files are smaller or the same size as the stripe, you could technically recover them if they resided on the disk that didn't fail. This shouldn't be relied on, by any means.
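The striping, throughput, and failure claims above can be sketched in a few lines. The round-robin placement and the 5% per-disk failure chance are assumptions chosen for illustration (real controllers and real drives will differ), and disk failures are treated as independent.

```python
def disk_for_stripe(stripe_index: int, disk_count: int) -> int:
    """Round-robin placement: which disk holds a given stripe."""
    return stripe_index % disk_count


def raid0_throughput_mb_s(disk_count: int, per_disk_mb_s: float) -> float:
    """Theoretical sequential throughput: all disks read concurrently."""
    return disk_count * per_disk_mb_s


def raid0_loss_probability(disk_count: int, per_disk_failure_probability: float) -> float:
    """Chance that at least one disk fails, which loses the whole array."""
    return 1.0 - (1.0 - per_disk_failure_probability) ** disk_count


print([disk_for_stripe(i, 2) for i in range(6)])  # stripes alternate between the two disks
print(raid0_throughput_mb_s(2, 100.0), "MB/sec theoretical maximum")
print(f"{raid0_loss_probability(2, 0.05):.2%} chance of loss with two disks at 5% each")
```

With small per-disk failure chances, the array's chance of loss is close to the per-disk chance multiplied by the number of disks, which is where "roughly twice" comes from for two drives.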

1:
RAID 1 is called "mirroring" and is the simplest of the RAID levels. As the nickname implies, the hard drives are mirrored and contain the exact same data. If one fails, a full copy of the data is on the other hard drive. For read performance, RAID 1 can get near-RAID 0 speeds since it can read different information from both drives at the same time, but this depends on the controller. Write performance is limited to the slowest disk because the information has to be written to all the hard drives. The MTBF of the array is higher than that of the individual drives. No matter how many disks are in the array, the array size will always be the size of one disk.

5:
A RAID 5 array uses block-level striping and distributed parity to prevent data loss and maximize data storage. It will have one disk worth of parity. So, if you have four 1 TB disks in RAID 5, you would have 3 TB usable space. The smallest array is three disks. If you have five or more hard drives, RAID 6 is recommended since it has more fault tolerance. Read speeds are similar to RAID 0, minus the disk worth of parity. Write speeds will have a penalty since parity data needs to be calculated. Therefore, RAID 5 is good for large amounts of information that don't frequently change, such as backups or media.

6:
Like RAID 5, RAID 6 uses block-level striping and distributed parity. The only difference is there are two disks worth of parity and the minimum number of disks is four.



Nested RAID levels:
Simply put, they are RAID arrays of RAID arrays. These are a combination of basic RAID levels to increase performance or fault tolerance. Normally, these are reserved for arrays with a large number of disks or for special situations. The basic RAID levels can be combined in any order and fashion.

10:
RAID 10 is a RAID 0 array of paired disks in RAID 1. The array will always be half the size of all disks in the array and the minimum number of disks is 4. You must also keep an even number of disks (e.g. you can't have five disks in this array level). Compared to a RAID 5 or 6 array, RAID 10 will have better write performance since there are no parity calculations.
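The capacity rules for the levels covered so far can be collected into one small calculator. This is a sketch rather than a sizing tool: the function name is made up, sizes are in GB, every disk is treated as being as large as the smallest one, and the two-disk minimum for RAID 0 and RAID 1 is an assumption of this example (the minimums for RAID 5, 6, and 10 come from the text above).

```python
MINIMUM_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity_gb(level: int, disk_sizes_gb: list[int]) -> int:
    """Usable array size in GB for RAID 0, 1, 5, 6, or 10."""
    if level not in MINIMUM_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    count = len(disk_sizes_gb)
    if count < MINIMUM_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MINIMUM_DISKS[level]} disks")
    if level == 10 and count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks")

    smallest = min(disk_sizes_gb)  # larger disks are cut down to the smallest one
    if level == 0:
        return count * smallest        # no redundancy, every disk counts
    if level == 1:
        return smallest                # every disk holds the same copy
    if level == 5:
        return (count - 1) * smallest  # one disk worth of parity
    if level == 6:
        return (count - 2) * smallest  # two disks worth of parity
    return (count // 2) * smallest     # RAID 10: half of the disks are mirrors


print(usable_capacity_gb(0, [500, 1000]))   # the mixed-size RAID 0 example
print(usable_capacity_gb(5, [1000] * 4))    # four 1 TB disks in RAID 5
print(usable_capacity_gb(10, [1000] * 4))   # four 1 TB disks in RAID 10
```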

Others:
Any of the common RAID levels can be combined to give different effects. For example, a RAID 50 array would give better sequential throughput compared to a RAID 5 array. To get more redundancy than RAID 50, a RAID 51 could be used.






Versions:
1.0 - 07/14/12: Initial post






If you are knowledgeable and want to help out, doing research or writing sections is a great way to get involved! Sections will be added as they are written. Please let me know if you have a section you'd like to see added. Here is a list of topics that I'd like to see added:

HDD - RPM, speed, interfaces, (de)fragmentation
SSD - Types (NAND, etc), speed, lifespan (wear leveling, etc), (de)fragmentation, TRIM/garbage collection
File systems - FAT32, NTFS, EXT3/4, ZFS, Reiser4, Btrfs, etc
ZFS - Features, operating systems
Interfaces - SATA, eSATA, USB 2/3, Firewire(?), Thunderbolt
Flash drives
NAS - Basics, types, operating systems, reviews (in progress)


In addition, I would like to see a list of all front page articles related to storage. Other lists could include: useful posts/threads, links, pictures, and whatever you can find. This thread is aiming to have as much information as possible.
 
Whoa! This could be really cool, and really huge.

I've actually been toying around with doing something like this for just SSDs. It is just that when I made a list of topics it ended up being intimidatingly huge. It also helped shine a spotlight on all the stuff I don't know about it. Still, I could probably do as much as is written here just talking about NAND.
 
If that is something you are interested in, that would be great! I can always do research on topics, but I'd rather defer to someone that knows it already. :cool:
 
Mr. A, if you have the energy, it would be a terrific thread from you. :thup:
 
Since I haven't had time to update or add much, I've stickied the thread.
 
Great post thideras. It's also worth mentioning that even the same size hard drives from a manufacturer that are different models or even revisions may not have the same number of sectors, and therefore you can only RAID to the lowest common denominator. I know of some people who will only use 90-95% of the drive capacity for their RAID in case their back up drives are different models.

Possibilities to add for definitions are NCQ and SSD alignment, especially the latter since I've seen that question asked on a frequent basis, as well as SAS vs SATA.
 
I've recently gone into the deep-end on a 20TB+ Freenas setup at home and am loving it. Would some sort of guide with use cases, basic instructions, etc. be helpful to anyone?

I'm relatively busy at my job during the day but could *try* to write something up.
 
As long as it is related to storage, feel free to do a write up. I mainly created the thread to define terminology or explain concepts.
 
As a topic for discussion (would it be better to make a new thread?), I have a question on drive reliability and people's opinions on it:

1) If creating a RAID array with fault tolerance (ex: RAID 5), is it a bad idea to use drives from the same 'batch'? By batch, I'm suggesting that multiple drives were manufactured in the same plant on the same day. It seems that if 'identical' drives operate under the same conditions, they would all die at the same time.

Similarly, would you consider doing regular drive changes? After 1 year, replace 25% of the drives with new ones. After 2 years, replace a different 25%...

2) What is the largest number of drives (or size of drives) that is 'safe' to use in a RAID array with fault tolerance? Would you feel safer using:
a) Four 4TB hard drives in a RAID 5 array
b) Five 3TB hard drives in a RAID 5 array
c) Six 3TB hard drives in a RAID 6 array
d) Seven 2TB hard drives in a RAID 5 array
e) Seven 2TB hard drives in a RAID 6 array
Assume drive performance (speed and reliability) is similar, and controller performance is similar. I'm curious if different RAID levels have different hardware requirements (e.g. option 'c' may take significantly longer to rebuild than option 'b', so is the chance of data loss higher with 'c'? Will it take half the time for array 'd' to rebuild compared to 'a', offsetting the higher probability of a drive failing?).

Also, has anyone played with the standalone prebuilt DAS boxes? For example, the Drobos or a SANS DIGITAL TR4UTBPN 4Bay RAID 5 box? My PERC 5i with 1.5TB drives is getting very full, and I'm looking at very simple plug-n-play options. I might end up just getting an army of 4TB externals...
 
Questions here are fine and one of the reasons I created the thread. :)

1) I don't see this as much more of a risk if the disks are tested before being put in use (as with any hardware). If the drives are prone to problems, the chances of them failing at exactly the same time is slim. However, if you don't have a spare handy to start rebuilding the array immediately, your chances of failure are going to be higher, but that is the case with any situation. I wouldn't worry about this.

2) Assuming that the drive is immediately replaced and rebuilt, RAID 6 in any of those situations is going to be far superior. However, you want to consider the uncorrectable read error rate of the drives. For example, if the drives are rated for 1 out of every 12 TB to be uncorrectable, you could theoretically have a RAID 5 array with 10 TB raw disk space and be ok during a rebuild, but you are pushing it. Having RAID 6 means you still have a way of verifying data when it is rebuilding. So if you do get an uncorrectable error, it can recover. With RAID 5, you're screwed.
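To get a feel for why the uncorrectable read error rate matters, here is a rough sketch of the odds. It assumes errors are independent and evenly spread at one per 12 TB read (the example rating from above), which real drives will not follow exactly, so treat the result as a ballpark rather than a prediction.

```python
import math


def chance_of_read_error(terabytes_read: float, terabytes_per_error: float = 12.0) -> float:
    """Chance of at least one uncorrectable error while reading the given amount.

    Assumption: errors arrive independently at an average of one per
    `terabytes_per_error` TB read (a Poisson model).
    """
    if terabytes_read < 0 or terabytes_per_error <= 0:
        raise ValueError("amounts must be non-negative and the rating positive")
    return 1.0 - math.exp(-terabytes_read / terabytes_per_error)


for terabytes in (2, 6, 10):
    print(f"reading {terabytes} TB: {chance_of_read_error(terabytes):.0%} chance of hitting an error")
```

Under this model, a rebuild that has to read about 10 TB has a better than even chance of running into an error, which RAID 5 cannot recover from while it is degraded and RAID 6 can.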

Assuming the controller isn't garbage and the disks are not bothered (you aren't using the array) while rebuilding, it should rebuild in about the same time it takes to do a full read of the entire disk. With thrashing or a slow controller, it is going to take a lot longer.
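That rule of thumb can be turned into a quick estimate. The 150 MB/sec sustained speed below is an assumed figure for illustration, not a measurement of any particular drive, and the estimate ignores controller overhead and any other activity on the array.

```python
def rebuild_hours(disk_size_gb: float, sustained_mb_per_s: float) -> float:
    """Best-case rebuild time: one full sequential pass over a single disk."""
    if disk_size_gb <= 0 or sustained_mb_per_s <= 0:
        raise ValueError("size and speed must be positive")
    seconds = (disk_size_gb * 1000.0) / sustained_mb_per_s  # 1 GB = 1000 MB
    return seconds / 3600.0


for size_gb in (2000, 3000, 4000):
    print(f"{size_gb} GB disk: about {rebuild_hours(size_gb, 150.0):.1f} hours")
```

Since the estimate depends on the size of one disk rather than the number of disks, an array of 2 TB drives should rebuild in roughly half the time of an array of 4 TB drives at the same speed.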
 
Thanks for the info!

In the future, do you have plans for consolidating information like "how to test a new hard drive" and "what are acceptable drive parameters/temps/SMART info" and "here is a list of RAID controllers and average prices paid for them"? ;)
 
I could if I get the time to write them, but I'm hoping that people would contribute so I don't have to write the whole thing. :)
 
This thread is just amazing! :attn: I can't believe I've never seen this. Thank you thideras. :salute:
 