Differences between Hardware RAID, HBAs, and Software RAID
With Windows Home Server 2011 coming out in the near future, many less experienced home users are looking into RAID subsystems to create larger storage pools. This is a segment where, without a solid basic understanding of the technologies involved, a user can make a purchasing decision detrimental to their machine’s ultimate performance and data security. It is important that a user understands the relative strengths and weaknesses of the different RAID philosophies: software RAID, “Fake-RAID”, and hardware RAID.
As an example, a favorite site of mine, HomeServerShow, recently reviewed a low cost Fake-RAID solution, the HighPoint 2680 SGL, as an alternative to onboard RAID (which, from the pictures, appears to be the P55’s BD82P55 PCH controller). The review did a fairly good job of explaining the merits of an add-in controller, such as portability between motherboard vendors (one can easily migrate ICH9R arrays to ICH10R, for example, but not ICH10R arrays to AMD motherboards, whereas this card can transcend motherboard manufacturer differences), additional ports using SFF-8087 connectors, and potentially better RAID rebuild times. What it did not mention is that the HighPoint’s onboard Marvell 88SE6485 is actually a simple I/O controller, not a RAID-on-Chip solution, which makes the HighPoint card a “Fake-RAID” solution.
I have been considering writing this article for a long time, but with Windows Home Server 2011 being a hot topic, it is more important than ever. This piece is a high-level overview; it will not cover every RAID incarnation out there, nor cover any of them in depth. Rather, it is meant to help guide hardware purchasing decisions.
Software RAID (OS/ File system Level)
Generally, when one speaks of pure software RAID, they mean a controller-agnostic RAID platform that does mirroring, striping, and parity calculations using the host CPU. Some hybrid solutions, like the Promise solutions based on the C3500 and C5500, use special embedded Intel Xeon processors with RAID functions built in to allow an OS to perform quicker parity calculations. Those solutions do blur the lines a bit between pure software RAID and hardware RAID, but as this is a general primer, I will focus on the common cases.
Common incarnations of software RAID include Oracle/Sun ZFS, Linux’s mdadm, FlexRAID, Drobo BeyondRAID, Lime Technology’s unRAID, Windows Dynamic Disk based RAID functionality, NetApp’s RAID-DP, and others. Windows Home Server V1’s Drive Extender was not a RAID 1 implementation, but it does utilize the CPU to make stored data redundant, as can be attested to by anyone who has been impacted by DEmigrator.exe. For purposes of picking hardware, if one continues to use Windows Home Server V1 Drive Extender, then the software RAID category is probably the place to look for ideas.
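To make "parity calculations using the CPU" a little more concrete, here is a minimal sketch of the XOR parity idea behind single-parity schemes such as RAID 5. It is only an illustration under simplifying assumptions (tiny in-memory blocks, no striping or rotation of parity), and the function names are my own, not taken from any real RAID implementation.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same, non-zero count and length")
    # zip(*blocks) walks the blocks column by column; each column is XORed together.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Reconstruct one lost data block from the surviving blocks plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three hypothetical "drives", each holding one 8-byte block of the same stripe.
data = [b"disk0 da", b"disk1 da", b"disk2 da"]
parity = xor_parity(data)

# Pretend the second drive failed and rebuild its block from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Every write that touches a stripe means redoing this calculation, which is the work that lands on the host CPU in software RAID and on a dedicated processor in hardware RAID.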
One big advantage of software RAID is that it can be hardware agnostic when it comes to migrating drives and arrays. If a server fails, one can move drives to a new system with new HBAs and access data in most cases assuming that the vendor allows migration and the new system’s controllers are compatible. An example of migration not working using software RAID would be if one were to take Drobo drives and place them into another system without the proprietary RAID implementation.
Another major advantage of software RAID is that one can get many advanced features, and the feature set may expand over time. ZFS is a great example here, offering things like de-duplication, L2ARC SSD caching, encryption, and triple-parity RAID-Z3. These are really enterprise-class features, added in successive ZFS versions.
For software RAID, one wants to purchase simple host bus adapters (HBAs) for use in systems. HBAs perform the simple task of providing the hardware interface so that a drive can be accessed by the underlying operating system. It is best practice not to use RAID controllers with additional RAID logic built in, because one does not want three layers (the drive’s own controller, the RAID controller, and the OS) all potentially trying to do things like error correction.
From a cost and support perspective, this is an area where LSI excels. HBAs based upon controllers such as the LSI 1068E and SAS2008 can be flashed to initiator-target (IT) mode, discussed extensively on this site, which turns them into simple HBAs. These two controllers are used in literally millions of systems, as they are sold by OEMs such as Dell, IBM, Intel, HP, Sun, Supermicro, and others. As a result, driver support is generally excellent and prices are reasonable.
Fake-RAID: the Hardware-Software Solution
Users generally use the term “Fake-RAID” for products such as the Intel ICH10R, the AMD SB850, and various Marvell products, where RAID mirroring, striping, and parity calculations occur in software powered by the host system’s CPU. The key difference from RAID done at the OS level is that this solution is generally tied to a controller type. Although some add-in cards do support arrays spanning multiple controllers, the vast majority limit an array to a single controller type. Controller type is important here because one can generally migrate arrays from one system to another, so long as the new system’s controller is compatible. For example, moving a RAID 1 array from an Intel ICH9R to an ICH10R is a very simple process.
The major advantage of Fake-RAID is simply cost. Intel supports it with Intel Matrix Storage, and AMD has south bridge support too. For most users, especially those using a decent server chipset (or most consumer motherboards outside the budget segment), this is a “free” feature. For RAID 0 and RAID 1, especially using a south bridge/PCH implementation, Fake-RAID can have solid performance due to high bandwidth, low latency interfaces to the CPU. Another advantage of Fake-RAID is that many implementations can be used by multiple operating systems. For example, one can format a FAT32 volume on an ICH10R array, then change the host system’s operating system to Linux and still utilize the volume. Under software RAID scenarios, the equivalent, such as having Windows or Linux systems use ZFS volumes directly, is at minimum difficult and in most cases impossible.
One caveat here is that most Fake-RAID solutions are limited to at most RAID 0, RAID 1, RAID 10, RAID 5, and RAID 50. With modern 2TB and 3TB drives, double parity protection schemes such as RAID 6 become both practical and arguably necessary over single parity RAID 4 and RAID 5 implementations.
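As a rough illustration of what single versus double parity costs and buys, the sketch below computes usable capacity and the number of drive failures an array is guaranteed to survive for the common levels. It uses the textbook definitions only and assumes identical drives and simple two-way mirrored pairs for RAID 10; the function name and parameters are made up for this example.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable_tb, guaranteed_failures_tolerated) for a simple array."""
    if level == "0":
        return drives * drive_tb, 0            # striping only, no redundancy
    if level == "1":
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb, drives - 1            # n-way mirror of a single drive
    if level == "10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2 * drive_tb, 1       # a second failure in the same pair is fatal
    if level == "5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb, 1      # one drive's worth of parity
    if level == "6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb, 2      # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


# Eight hypothetical 2TB drives under each parity scheme.
for level in ("5", "6"):
    usable, tolerated = raid_summary(level, 8, 2)
    print(f"RAID {level}: {usable} TB usable, survives {tolerated} failure(s)")
```

With eight drives, RAID 6 gives up one more drive of capacity than RAID 5 in exchange for surviving a second failure, which matters most during the long rebuild of a large disk.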
Aside from the lack of double parity options, there is one other major Fake-RAID caveat: write-back cache (also known as copy-back cache) can be enabled in many implementations but should be avoided by those building storage servers. Fake-RAID implementations are tied to hardware solutions, but do not have onboard DRAM cache. As a result, enabling write-back cache means that data is temporarily stored in main system memory before being written to disk. While one may think that this is a good thing if they have lots of fast memory, it is not good for data integrity. If power fails in the system, data stored in main memory will be lost. To mitigate this risk, UPS systems and redundant power supplies can be used; however, in a major failure, data can still be lost. Without write-back caching, RAID 5 and RAID 50 performance is hindered in situations where there are large numbers of writes. Best practice is to not turn on write-back cache on Fake-RAID controllers.
From a controller recommendation perspective, I would argue that the Intel ICH10R/PCH solutions and AMD SB850 solutions are probably the best bets, using RAID 1 or RAID 10 (RAID 0 does not provide redundancy). Frankly, for years to come both setups will have available, low cost motherboards that can read arrays in a recovery situation. That often limits one to four to six ports of connectivity, so those needing higher drive counts should look to something like an LSI SAS2008 controller in IR (RAID) mode for RAID 1 or 10, or to hardware or software RAID solutions. Both Silicon Image and Marvell make popular controllers that are used in “Fake-RAID” class add-in controllers.
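The write-back risk can be shown with a toy model. The class below is purely hypothetical and ignores everything a real driver does; it only tracks which writes have reached the disk and which have been acknowledged while still sitting in volatile system RAM.

```python
class ToyVolume:
    """Toy model of a volume with an optional volatile write-back cache."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.disk = {}        # blocks that have reached persistent storage
        self.ram_cache = {}   # blocks acknowledged but held only in system RAM

    def write(self, block, data):
        if self.write_back:
            self.ram_cache[block] = data   # acknowledged before it is on disk
        else:
            self.disk[block] = data        # write-through: on disk before acknowledging

    def flush(self):
        """Push everything held in RAM down to the disk."""
        self.disk.update(self.ram_cache)
        self.ram_cache.clear()

    def power_loss(self):
        """Drop whatever was only in RAM and report which blocks were lost."""
        lost = sorted(self.ram_cache)
        self.ram_cache.clear()
        return lost


wb = ToyVolume(write_back=True)
wb.write(0, b"flushed in time")
wb.flush()
wb.write(1, b"still in RAM")
lost_wb = wb.power_loss()

wt = ToyVolume(write_back=False)
wt.write(0, b"safe")
lost_wt = wt.power_loss()

print("write-back lost blocks:", lost_wb)
print("write-through lost blocks:", lost_wt)
```

In this model the write-back volume loses the block that was acknowledged but never flushed, while the write-through volume loses nothing. That is the trade being made when write-back is enabled without protected cache memory.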
Hardware RAID
Hardware RAID is usually the most expensive option, but it still provides a lot of value in many applications. Hardware RAID can most easily be thought of as a miniature computer on an expansion board. It generally has components such as its own BIOS, its own management interface, sometimes a web server and NIC (in high-end Areca cards), a CPU (such as the venerable Intel IOP348 or newer chips), onboard ECC DRAM, optional power supplies (battery backup units), drive interfaces, and I/O through the PCIe bus to peripherals (in this case, the rest of the PC). If one wants to understand why many true hardware RAID solutions are expensive, this picture of a hardware RAID controller as a small PC is probably a good model to keep in mind.
Hardware RAID has some definite advantages. It is usually OS agnostic, so volumes are not specific to an OS or file system as they are with software RAID. Beyond that, hardware RAID usually has at least the option of a battery backed or newer capacitor-flash based write cache. These allow write-back caching to be enabled with the added security of protection during extended power outages. In battery backed write cache schemes, a battery backup unit (BBU) is connected to the controller and maintains power to the DRAM in the event that power is no longer being supplied to the card. In capacitor-flash based protection schemes, when the card loses power, the capacitor keeps the DRAM and NAND powered long enough for the DRAM contents to be transferred to NAND storage. BBUs are typically spec’d for at least two days of power protection. NAND storage can theoretically maintain data for months. This is not best practice, but as an interesting note, one could theoretically pull the plug on a server while data is being written and cached in DRAM, install the controller and drives into a new system the next day, and lose no data. I have done this on two occasions, but I will stress: do not try this unless there is no other option.
Performance-wise, hardware RAID is an interesting mix. When new controllers are released, they generally offer higher IOPS, newer PCIe and drive interfaces, faster DRAM, and so on, all of which have a positive effect on performance. Near the end of a controller’s life cycle, performance is generally not up to par for the newest generation(s) of drives. For example, a Dell PERC 5/i with an older Intel IOP333 processor will choke when used with eight “generation 2” solid state drives. Solid state drives are not the only way to bottleneck an older controller; many large disks in arrays can cause long rebuild times due to processor speed and the sheer amount of data to be processed.
One important factor is that many vendors offer things like SSD caching (with Adaptec maxCache and LSI CacheCade), SSD optimizations, controller fail-over (drives are not the only storage components that fail), and more on hardware RAID cards. Many times, these features do guide purchasing decisions.
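For a feel of why large disks stretch rebuild times, here is some back-of-the-envelope arithmetic. It assumes the rebuild is limited purely by a sustained throughput figure, which is a simplification, and the throughput numbers are illustrative guesses rather than measurements of any particular controller.

```python
def min_rebuild_hours(drive_tb, sustained_mb_per_s):
    """Lower bound on rebuild time: the whole replacement drive must be written once."""
    if drive_tb <= 0 or sustained_mb_per_s <= 0:
        raise ValueError("capacity and throughput must be positive")
    total_bytes = drive_tb * 1e12               # decimal terabytes, as drive vendors count them
    bytes_per_second = sustained_mb_per_s * 1e6
    return total_bytes / bytes_per_second / 3600


# Hypothetical sustained rebuild rates for the same 2TB drive.
for mb_s in (100, 50, 25):
    print(f"2TB at {mb_s} MB/s: about {min_rebuild_hours(2, mb_s):.1f} hours")
```

Halving the rate the controller can sustain doubles the rebuild window, and a real rebuild on a busy array will usually take longer than this bound.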
Probably the two biggest disadvantages to hardware RAID controllers are vendor lock-in and cost. Vendor lock-in involves being able to migrate arrays only to controllers from the same vendor. Product lines can be crossed, for example Adaptec 3805 created arrays can be migrated to Adaptec 5805 controllers if appropriate firmware revisions are used, however those same arrays will not work on Areca RAID controllers. This is especially important when cost is put into perspective. Full featured hardware RAID controllers can cost several hundred dollars with BBUs adding an additional $100 or more for each controller. If a controller fails, a replacement can be an expensive proposition. A disadvantage of some controllers is that RAID arrays cannot span multiple controllers. If this is the case, a solution is limited to the number of drives that can be connected to a single controller. Software RAID usually does not have this limitation.
Right now, the Areca 1880 series and the LSI 9260 and LSI 9280 series are probably the top hardware RAID solutions, offering 6.0Gbps SAS and SATA connectivity and a host of enhancements over previous generations. It should be noted that expensive battery backed hardware RAID solutions are only really required if RAID 5 or RAID 6 is being used. RAID 1 or RAID 10 solutions work fairly well even without expensive hardware RAID controllers and can be acquired relatively inexpensively.
This was a big article to write in an evening, but hopefully it helps people understand the general implications of different RAID implementations. There are a lot of variations in each of the implementations above, so consider this a general overview of the subject. As always, there is more to come in the coming weeks and months on this subject. If anyone sees something that could be clearer, please let me know on the forums. Also, the RAID controller and HBA forums are a great place to discuss different controller options.