When purchasing servers, customers often get a choice of warranty options during the process. Various extended durations and response time features can be offered and can add substantially to the overall cost of a purchase. However, buying an expensive warranty from your chosen server vendor is not the only option available, and if you purchase a server from a smaller vendor, you may not have the SLA you want. In this article, we will explore alternatives, specifically the notion of self-warrantying, which is what many cloud providers and major OEMs do themselves. For many organizations, this approach can yield significant benefits. Most of this article will be focused on hard drives, but a similar methodology can be applied to different components or even entire servers as well.
OEMs and Cloud Providers vs Manufacturer Warranties
When major OEMs like Dell, HPE, or Apple sell you a hard drive, they typically purchase those drives without a warranty from the manufacturer, only taking a guarantee of failure rates. By doing this, they receive discounted pricing and take on the service warranty from the drive vendor. This is the reason why you typically must contact your OEM for warranty support. Even though the drive may say Seagate or WD, it is serviced by the OEM that sold the server. OEMs cover the warranty for the drives and other system components with the markup from the system and by selling you a warranty for the whole server.
There is another aspect to this, which is that large OEMs typically have their own tested and validated firmware. This firmware is often tuned by the vendor to work specifically with their own systems, and the vendors build warranty assumptions around systems using drives with specific firmware features. Increasingly, this firmware and data are being used by OEMs to perform predictive analytics and determine when a device is about to fail so a replacement can be dispatched before the device fails.
Likewise, a major cloud provider, which is often purchasing drives by weight, does not send each failed drive back. Instead, failures are baked into the purchase agreement, and spares are part of the overall discount and volume purchase discussion.
This contrasts with when you acquire drives from retail, where the individual drives have warranties serviced by their manufacturer. Purchasing whitebox storage or servers from some resellers may also rely upon manufacturer warranties, especially with smaller and lower-cost resellers.
Anecdote: Service Contracts Are Not 100% Guaranteed
This falls into the category of an anecdote and represents an extremely unlikely scenario, but it did happen to me personally so I feel like I can tell this story. Approximately 8 years ago, I was operating a couple of 48-bay SANs from a big-name vendor. When those SANs were purchased they came with a 4-hour onsite service contract lasting 5 years. We thought this was a path to never worrying about hardware.
At some point early in their life, a drive failed in the SAN being used as the primary unit, and we rang the vendor for support. They were very willing to help us out, of course, but there was a problem: they simply did not have any drives available. If you remember back to late 2011, flooding in Thailand had a large impact on hard drive availability and our vendor was impacted. As a result, they could not deliver within their 4-hour replacement window; in fact, it took nearly two weeks before a replacement drive arrived. During those two weeks, the degraded SAN suffered multiple additional drive failures which consumed all the internally configured hot-spare disks and began to pose a critical risk of losing the array. By the time replacement disks started arriving, we were seriously contemplating pilfering disks from the secondary SAN unit to stave off disaster on the primary device. That is not a situation we expected to be in with a 4-hour replacement service contract.
Clearly, this story is unlikely to repeat itself, but it is not hard to think up a scenario where replacement disks from a vendor might be delayed for one reason or another. This incident was the genesis of the “buy some spares” purchasing philosophy which led to this article being written. After we got perilously close to losing our primary SAN, we purchased our own cold spare to avoid ever being in that situation again.
Drive Failure Rate Assumptions
First, it would be a good idea to determine the likelihood of a drive failure in the first place. Obviously the failure rate of a hard drive will vary from model to model, but for the purposes of this article, we are interested in some kind of average. Backblaze, an online backup company operating over 100k hard drives, has been collecting and providing quarterly statistics on their fleet of disks for years now. They have enough drives to provide some real data we can use without relying upon anecdotal evidence. According to their Q3 2019 report, the overall Annualized Failure Rate (AFR) across their entire fleet of disks, across the entire lifetime of those disks, is 1.73%.
The hypothetical system we will be considering will have 8x 8TB drives. Assuming each drive has an individual 1.73% chance of failure in any given year, you are looking at around a 13.03% chance that our 8-drive system will suffer at least one failure in any given year. Over a 5 year lifetime, there is approximately a 50% chance that at least one disk will fail, and around a 25% chance of two disk failures. Of course, these are just probabilities; throw in some good or bad luck and your experience could be very different. For the purposes of this article, let us assume we have a bit of bad luck and will suffer two drive failures in that five year period.
Author’s note: I have corrected my math on the probability of failures here. Thanks to the comments section for pointing out I’m a bit bad at math!
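As a rough check on the figures above, here is a minimal Python sketch of the arithmetic. It assumes drive failures are independent, that every drive shares the same 1.73% AFR, and that failed drives are not replaced within the window being modelled; none of those assumptions come from Backblaze, and the function names are purely illustrative. Under this simplified model the "at least one failure" figures come out at roughly 13% for one year and roughly 50% for five years, while the chance of two or more failures in five years comes out closer to 14%, so the 25% figure used in this article can be read as a pessimistic planning number.

```python
# Simplifying assumptions (not from the Backblaze data): failures are
# independent, every drive has the same AFR, and failed drives are not
# replaced inside the modelled window.
AFR = 0.0173   # fleet-wide annualized failure rate quoted in the article
DRIVES = 8     # drives in the hypothetical system
YEARS = 5      # planned service life


def p_drive_fails(afr: float, years: int) -> float:
    """Probability that a single drive fails at some point within `years`."""
    return 1.0 - (1.0 - afr) ** years


def p_at_least_one(afr: float, drives: int, years: int) -> float:
    """Probability that at least one of `drives` fails within `years`."""
    return 1.0 - (1.0 - p_drive_fails(afr, years)) ** drives


def p_at_least_two(afr: float, drives: int, years: int) -> float:
    """Probability that two or more of `drives` fail within `years`."""
    q = p_drive_fails(afr, years)
    p_none = (1.0 - q) ** drives
    p_exactly_one = drives * q * (1.0 - q) ** (drives - 1)
    return 1.0 - p_none - p_exactly_one


if __name__ == "__main__":
    print(f"1 year,  >=1 failure:  {p_at_least_one(AFR, DRIVES, 1):.2%}")
    print(f"5 years, >=1 failure:  {p_at_least_one(AFR, DRIVES, YEARS):.2%}")
    print(f"5 years, >=2 failures: {p_at_least_two(AFR, DRIVES, YEARS):.2%}")
```

Changing `AFR` to the rate for a specific drive model, or `DRIVES` to the size of your own array, shows how quickly the odds move with larger systems.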
Drive Cost: System Vendor vs CDW
Let us take a look at Dell EMC PowerEdge T640 hard drive pricing by way of example, since Dell is the largest server vendor in the world at the time of this writing. We are going to use 7.2K rpm 512e SATA hard drives of 8TB capacity for our comparison, as they are a fairly common size.
Of course, for the big OEMs your warranty does not come from the drives themselves but from the overall warranty on the server you purchase them with, which can add its own costs. Using Dell as an example, our hypothetical system with 8x of their 8TB drives was $9107.95 with a 3-year next-business-day service contract, and moving to a 5-year term brought an increase of $460.90.
If we turn to the retail drives from CDW, a large US IT distributor/reseller, we see very different pricing:
Seagate Exos 7E8 4TB 7.2K 512n
Seagate Exos 7E8 8TB 7.2K 512e: $287.99
Seagate Exos X14 10TB 7.2K 512e
Seagate Exos X14 12TB 7.2K 512e
Seagate Exos X16 14TB 7.2K 512e
Seagate Exos X16 16TB 7.2K 512e: $589.99
These drives will not all be an exact model match obviously, but they should be comparable in their capabilities to their OEM brethren. In comparing retail drive costs to Dell (and this usually works for Lenovo and HPE as well), it is less expensive to buy 16TB drives at retail than it is to buy 8TB drives from the big system OEMs.
Our 8TB 512e 7.2K rpm hard drive is $287.99 from CDW and $636.61 from Dell EMC for a delta of $348.62. Or put another way, you can get 2.21 CDW 8TB drives for every 1 Dell EMC drive. This is on a component with a failure rate of only 1.73% per year. At 1.02:1 we would be at a virtual tie, but at 2.21:1, this can be a significant source of cost savings.
Self-Warranty Through Cold Spares
With the standard warranty process for a retail drive, the turnaround time for getting a disk replaced via RMA can take weeks. One way to sidestep this is to simply purchase a cold-spare drive or two when the disks are originally purchased, paying for the drives out of the cost savings achieved by buying retail in the first place. In our example above where 8x 8TB drives were purchased, with Dell that would have cost $5092.88, plus the extra $460.90 to bring the warranty up to 5 years. Buying the retail drives was only $2303.92, resulting in a cost savings of approximately $3250. Put another way, you could buy two CDW drives for every Dell drive purchased with this warranty and still save around $950 in total. That gives you eight cold spares and eight running drives plus the opportunity to RMA drives and keep your spare pile stocked. You would have the drives immediately on hand in case they are needed.
Self-warranty is, again, not for everyone, and a 1:1 installed-to-cold-spare ratio is needlessly high. We expect only about a 25% chance that two of the eight drives will fail in five years, yet we would be provisioning 8 cold spares; a more reasonable general recommendation would be 2 or 3 spare drives.
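The cost arithmetic in this comparison can be sketched in a few lines of Python. The prices are the ones quoted in this article; the function names and the idea of parameterizing the spare count are illustrative assumptions, and the comparison follows the article in counting the 5-year warranty uplift against the Dell drives even though it covers the whole server.

```python
# Prices quoted in the article (USD); everything else here is illustrative.
DELL_DRIVE = 636.61       # Dell EMC 8TB 7.2K 512e drive
RETAIL_DRIVE = 287.99     # CDW Seagate Exos 7E8 8TB drive
WARRANTY_UPLIFT = 460.90  # cost of moving the server from 3-year to 5-year
DRIVES = 8                # installed drives in the hypothetical system


def oem_cost(drives: int) -> float:
    """OEM drives plus the 5-year warranty uplift."""
    return drives * DELL_DRIVE + WARRANTY_UPLIFT


def retail_cost(drives: int, spares: int) -> float:
    """Retail drives, including any cold spares bought up front."""
    return (drives + spares) * RETAIL_DRIVE


def savings(drives: int, spares: int) -> float:
    """How much cheaper the retail route is for a given spare count."""
    return oem_cost(drives) - retail_cost(drives, spares)


if __name__ == "__main__":
    for spares in (0, 2, 3, DRIVES):
        print(f"{spares} spares: save ${savings(DRIVES, spares):,.2f}")
```

Running it with zero spares reproduces the roughly $3250 saving, and with eight spares the roughly $950 left over after doubling up on drives.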
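One way to sanity-check a spare count is to ask how many spares are needed to cover a chosen fraction of outcomes. The sketch below is a simplified model, not a sizing tool: it assumes independent failures at the 1.73% AFR quoted earlier, ignores the possibility of a replacement drive itself failing, and uses hypothetical function names. Under those assumptions, two spares cover roughly 98% of five-year outcomes for an 8-drive system and three spares cover over 99%.

```python
from math import comb

# Simplifying assumptions: independent failures, one shared AFR, and
# replacement drives that do not themselves fail within the window.
AFR = 0.0173
DRIVES = 8
YEARS = 5


def p_drive_fails(afr: float, years: int) -> float:
    """Probability that a single drive fails at some point within `years`."""
    return 1.0 - (1.0 - afr) ** years


def p_at_most(k: int, drives: int, q: float) -> float:
    """Binomial probability that no more than `k` of `drives` fail."""
    return sum(
        comb(drives, i) * q**i * (1.0 - q) ** (drives - i)
        for i in range(k + 1)
    )


def spares_needed(target: float, drives: int = DRIVES,
                  afr: float = AFR, years: int = YEARS) -> int:
    """Smallest spare count whose coverage probability reaches `target`."""
    q = p_drive_fails(afr, years)
    for k in range(drives + 1):
        if p_at_most(k, drives, q) >= target:
            return k
    return drives  # fall back to one spare per installed drive


if __name__ == "__main__":
    for target in (0.95, 0.99):
        print(f"{target:.0%} coverage: {spares_needed(target)} spares")
```

RMA replacements arriving during the five years would restock the spare pile, so in practice the coverage is somewhat better than this static model suggests.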
Another important aspect here is that hard drive prices tend to fall over time. Today’s $589.99 16TB drive will likely cost less than half that in five years. STH had an article focusing on the math behind this logic years ago in Internal or External Hard Drives: Are Warranties Worth the Cost? You can see the analysis there using future value discount rates and an even higher AFR (5%) using no-warranty external drives for comparison.
Buying drives up-front as spares helps to protect against events such as the Thailand flooding where buying drives on the open market becomes challenging.
Clear Benefits of Vendor Support
Alongside the cost difference, there are obvious differences in how warranties are serviced. Your 3 or 5-year warranty with Dell includes the cost of a technician who will come onsite to replace a faulty disk. Depending on your physical proximity to your server equipment, this convenience can be worth a lot. If your server is in a data center hundreds of miles away, servicing it yourself may not be a viable option or may incur additional charges for remote hands support. For many organizations, this is a key benefit. Big vendor service organizations often have harrowing tales of the extreme places they have replaced hard drives.
Most retail-purchased drives are serviced by sending them back to the manufacturer and then receiving a replacement drive in return, which can require two trips to the data center or two sets of remote hands. There is also the matter of drive carriers, which may seem like a small detail but can be important when working with many drives. Dell EMC drives and replacement drives will come with an appropriate sled/carrier while a retail bare drive will not. One may have to swap the replacement drive into an existing carrier or find another carrier, adding to the cost of drive replacement.
For many organizations, this is all that matters and that is why these agreements are so popular.
Self-Warranty Not Just Limited to Hard Drives
This article has been all about hard drives, but the concept of keeping spare equipment around is not limited to disks. Especially as equipment ages or exceeds warranty terms, keeping some spare parts around can be a good idea. Power supplies are another semi-common point of failure and are a relatively inexpensive investment to keep a spare. For 1-2 server installations, having cold spares on hand can seem like a waste. At larger installation sizes the cost becomes relatively small. If you were to look at a hyper-scale data center, they are not waiting for a server vendor to replace a part. Instead, they have spares on hand that their staff can use for replacement. They are buying high-quality but lower-cost servers to ensure this model works.
When it came time to put in new core switches in my datacenter, the decision was made to buy less expensive switches without a service contract, and use the savings to buy an entire extra switch. The extra unit was given a generic configuration and racked in-between our two core switches, ready to take over for either in the case of failure. The same logic can be used to order another server. Spare server capacity immediately available in a rack is invaluable in a failure scenario.
Planning how to configure the servers you buy and whether you buy the extra warranty contract or not is a multi-factor decision and a risk management balancing act. The OEMs offer convenience and relatively quick service at a sometimes-steep cost. Indeed, buying a complete support package is the risk-averse way to make it someone else’s responsibility to deal with failure. For the vast majority of businesses, that is the model they want and frankly, the model vendors such as Dell EMC, HPE, and Lenovo cater to.
For those who are extra cost-conscious, or who are averse to the kind of risk we faced during the Thailand flooding, where a large vendor simply could not get a drive even under a support contract, self-warrantying can make sense if the structure is in place to replace drives or other components in the data center. There are a lot of variables in this equation, but navigating them can help manage risk while potentially offering greater than 50% cost savings.