Practical RAID Decision-Making

FEBRUARY 26TH, 2015

A truly monumental amount of information exists about RAID storage systems, exploring topics such as risk, performance, capacity, trends, approaches, and more. While the body of work on this subject is nearly staggering, it can be distilled into a handful of common, practical storage approaches that cover nearly all use cases. My goal here is to provide a handy guide that will allow a non-storage practitioner to approach RAID decision making in a practical and, most importantly, safe way.

For the purposes of this guide we will assume storage projects of no more than twenty-five traditional drives (spinning platter drives, properly known as Winchester drives). These drives could be SFF (2.5") or LFF (3.5"), SATA or SAS, consumer or enterprise. We will not tackle solid state drives, as these have very different characteristics and require their own guidance. Storage systems larger than roughly twenty-five spindles should not work from standard guidance but should delve deeper into specific storage needs to ensure proper planning.

The guidance here is written for standard systems in 2015. Over the past two decades the common approaches to RAID storage have changed dramatically. It is not anticipated that the key factors influencing these decisions will change enough in the future to alter these recommendations, but it is very possible that they will. Good RAID design of 1998 is very poor RAID design today. The rate of change in the industry has dropped significantly since that time and these recommendations are likely to stand for a very long time, very possibly until spindle-based drive storage is no longer available or at least no longer popular, but like all predictions they are subject to great change.

One Big Array

In general we use what is termed a "One Big Array" approach: a single RAID array on which all system and data partitions are created. The need or desire to split our storage into multiple physical arrays is mostly gone today, and splitting should only be done in non-general circumstances, where careful study and heavy analysis of the storage needs are being done. Array splitting is far more likely to cause harm than good. When in doubt, avoid split arrays.

The goal of this guide is to provide general rules of thumb that allow any IT Pro to build a safe and reliable storage system. Rules of thumb do not and cannot cover every scenario because exceptions always exist. The idea here is to cover the vast majority of cases with tried and true approaches that are designed around modern equipment, use cases, and needs, while being mindful to err on the side of safety: when a choice is less than ideal it is still safe. None of these choices is at all reckless; at worst they are overly conservative.

RAID 0 for non-critical data

The first scenario we should consider is one where your data does not matter. This may sound like an odd thing to consider but it is a very important scenario. There are many times where data saved to disk is considered ephemeral and does not need to be protected. This is common for reconstructable data such as working space for rendering, intermediary calculation spaces, or caches: situations where spending money to protect data is wasted and it would be acceptable to simply recreate lost data. It could also be a case where downtime is not a problem and the data is static or nearly so; rather than spending to reduce downtime, we protect the data only via backup mechanisms so that if an array fails we simply restore it completely.

In these cases the obvious choice is RAID 0. It is very fast, very simple, and provides the most cost-effective capacity. The only downside of RAID 0 is that it is fragile: it provides no protection against data loss in case of drive failure or even an unrecoverable read error (URE), which would cause data corruption just as it would on a single desktop drive.

It should be noted that systems using RAID 0 for data are a common exception to the "One Big Array" approach. There is a very good argument for keeping the OS, along with any application data that would be cumbersome to reinstall, on a small dedicated RAID 1 array that is separate from the RAID 0 data array. This way recovery after an array loss could be very rapid, requiring only that the data be recreated rather than the entire system be rebuilt from scratch.

Having eliminated cases where the data does not require protection, we will assume for all remaining cases that the data is quite important and that we want to protect it at some cost. We will assume that protecting the data as it exists on the live storage is important, generally because we want to avoid downtime or because the data on disk is not static and an array failure would also constitute data loss. With this assumption we will continue.
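
As a brief aside, the fragility of RAID 0 described above can be put into rough numbers. The sketch below is only an illustration, not a reliability model: it assumes every drive fails independently with the same probability over some period, and the 3% figure is a made-up example value. The function name `raid0_survival_probability` is hypothetical.

```python
def raid0_survival_probability(drive_count: int, drive_failure_prob: float) -> float:
    """Probability that a RAID 0 array loses no drive over some period.

    Assumption: drives fail independently with the same probability.
    Real-world failures are often correlated, so treat this as a rough guide.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    if not 0.0 <= drive_failure_prob <= 1.0:
        raise ValueError("drive_failure_prob must be between 0 and 1")
    # RAID 0 has no redundancy, so every single drive must survive.
    return (1.0 - drive_failure_prob) ** drive_count


if __name__ == "__main__":
    # 0.03 is an illustrative per-drive failure probability, not a measured one.
    for n in (1, 2, 4, 8):
        print(f"{n} drives: {raid0_survival_probability(n, 0.03):.4f}")
```

Whatever per-drive figure is plugged in, the survival probability only falls as drives are added, which is why RAID 0 suits data that can simply be recreated or restored.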

RAID 1 for a two-disk array

If we have an array of only two disks the answer is very simple: we choose RAID 1. There is no other option at this size, so there is no decision to be made. In theory we should plan our arrays holistically, choosing the number of drives and the type of array together rather than purchasing drives and then deciding how to use that arbitrary number. But two-drive chassis are so common that the case is worth mentioning.

RAID 10 for a four-disk array

Likewise, with a four drive array the only real choice to consider is RAID 10. There is no need for further evaluation. Simply select RAID 10 and continue.

The case of the three-disk array

An awkward case is the three-drive array. It is very, very rare that we are limited to three drives: the only common chassis limited to three drives was the Apple Xserve, and this has been off the market for some time, so the need to make decisions around three-spindle arrays should be extremely unlikely. In cases where we do have three drives it is often best to seek guidance, but the most common approaches are to add a fourth drive and therefore choose RAID 10 or, if no more than a single drive's worth of capacity is needed, to put all three drives into a single triple-mirror RAID 1.

RAID 6 and 10 for a five- to twenty-five-disk array

For all other cases, therefore, we are dealing with five to twenty-five drives. Since we have eliminated the situations where RAID 0 and RAID 1 would apply, all remaining common scenarios come down to RAID 6 or RAID 10, and these constitute the vast majority of cases. Choosing between them becomes the biggest challenge that we will face, as we must look solely at our "soft" needs of reliability, performance, and capacity.
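
The rules of thumb in this guide can be summarized as a small lookup. The sketch below is one possible encoding, not an official tool: the function name `recommend_raid`, its parameters, and the returned strings are all hypothetical, and the default priority of "safety" is an assumption chosen to match the guide's preference for erring on the safe side. The `priority` argument stands in for the "soft" needs weighed in the RAID 10 and RAID 6 sections.

```python
def recommend_raid(drive_count: int, data_is_disposable: bool = False,
                   priority: str = "safety") -> str:
    """Return the rule-of-thumb RAID choice for an array of spinning drives.

    Hypothetical helper: names, parameters, and wording are illustrative.
    """
    if drive_count < 2:
        raise ValueError("this guide covers arrays of two or more drives")
    if drive_count > 25:
        # Beyond roughly twenty-five spindles, rules of thumb no longer apply.
        return "seek specialist guidance"
    if data_is_disposable:
        # Ephemeral or easily restored data: speed and capacity win.
        return "RAID 0"
    if drive_count == 2:
        return "RAID 1"
    if drive_count == 3:
        return "add a fourth drive for RAID 10, or use a triple-mirror RAID 1"
    if drive_count == 4:
        return "RAID 10"
    # Five to twenty-five drives: the choice depends on the "soft" needs.
    if priority in ("safety", "performance"):
        # RAID 10 uses drives in pairs, so the drive count must be even.
        return "RAID 10"
    if priority in ("capacity", "cost"):
        return "RAID 6"
    raise ValueError("priority must be safety, performance, capacity, or cost")


if __name__ == "__main__":
    print(recommend_raid(2))                        # RAID 1
    print(recommend_raid(4))                        # RAID 10
    print(recommend_raid(12, priority="capacity"))  # RAID 6
```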

RAID 10 advantages

Choosing between RAID 6 and RAID 10 should not be incredibly difficult. RAID 10 is ideal for situations where performance and safety are the priorities. RAID 10 has much faster write performance and is safe regardless of the disk type used (low-cost consumer disks can still be extremely safe, even in large arrays). RAID 10 scales well to extremely large sizes, much larger than should be implemented using rules of thumb! RAID 10 is the safest of all choices; it is both fast and safe. The obvious downsides are that RAID 10 yields less storage capacity from the same disks and is more costly per unit of capacity. It must also be mentioned that RAID 10 can only utilize an even number of disks, as disks are added in pairs.
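
The capacity trade-off can be made concrete with a little arithmetic. The sketch below uses the standard layouts: RAID 10 mirrors drives in pairs, so half of the raw capacity is usable, while RAID 6 spends two drives' worth of capacity on parity. The function names and the eight-drive, 4 TB example are illustrative assumptions, and the figures ignore filesystem and formatting overhead.

```python
def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of RAID 10: half the drives hold mirror copies."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    return (drive_count // 2) * drive_tb


def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of RAID 6: two drives' worth of space goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_tb


if __name__ == "__main__":
    # Illustrative example: eight drives of 4 TB each.
    print(raid10_usable_tb(8, 4.0))  # 16.0
    print(raid6_usable_tb(8, 4.0))   # 24.0
```

The gap widens as arrays grow, because RAID 10 always gives up half of the raw capacity while RAID 6 gives up a fixed two drives.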

RAID 6 advantages

RAID 6 is generally safe and fast but never as safe or as fast as RAID 10. RAID 6 suffers specifically in write performance, so it is very poorly suited to workloads such as databases and to heavily mixed loads like those in large virtualization systems. RAID 6 is cost effective and provides a heavy focus on available capacity compared to RAID 10. When budgets are tight or capacity needs dominate over performance, RAID 6 is an ideal choice. Rarely is the difference in safety between RAID 10 and RAID 6 a concern except in very large systems with consumer-class drives. RAID 6 is subject to additional risk with consumer-class drives that RAID 10 is not affected by, which could warrant some concern around reliability in larger RAID 6 systems, such as those above roughly 40 TB, when consumer drives are used.
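
The write-performance gap can be roughly estimated with the commonly cited "write penalty" figures: each small random write costs about two back-end disk operations on RAID 10 (one per mirror) and about six on RAID 6 (read data and both parities, then write all three). The sketch below is a simplified estimate under those assumptions. It ignores controller caches and full-stripe writes, and the 150 IOPS per drive is an illustrative value, not a measurement. The names `WRITE_PENALTY` and `estimated_write_iops` are hypothetical.

```python
# Commonly cited back-end operations per small random write (an approximation).
WRITE_PENALTY = {"RAID 10": 2, "RAID 6": 6}


def estimated_write_iops(level: str, drive_count: int, iops_per_drive: float) -> float:
    """Rough random-write IOPS: total raw drive IOPS divided by the write penalty."""
    if level not in WRITE_PENALTY:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < 4:
        raise ValueError("both levels need at least four drives")
    return drive_count * iops_per_drive / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Illustrative example: eight drives at an assumed 150 IOPS each.
    print(estimated_write_iops("RAID 10", 8, 150))  # 600.0
    print(estimated_write_iops("RAID 6", 8, 150))   # 200.0
```

Under these assumptions the same eight drives deliver about a third of the random-write throughput on RAID 6, which is consistent with the advice to avoid it for databases and heavily mixed workloads.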

Conclusion

In the small business space especially, the majority of systems will use RAID 10 simply because arrays rarely need to be larger than four drives. When arrays are larger, RAID 6 is the more common choice due to somewhat tight budgets and generally low concern around performance. Both RAID 6 and RAID 10 are safe and effective solutions for nearly all usage scenarios, with RAID 10 dominating when performance or extreme reliability are key and RAID 6 dominating when cost and capacity are key. And, of course, when storage needs are highly unique or very large, such as more than twenty-five spindles in an array, remember to leverage a storage consultant, as the scenario can easily become complex. Storage is one place where it pays to be extra diligent: so many things depend upon it, mistakes are so easy to make, and the flexibility to change it after the fact is so low.

If you’re curious about performance at different RAID levels, check out our article Understanding RAID Performance at Various Levels.

Photo credit: John Athayde via Flickr