As part of our reviews, we always highlight one server attribute: hot-swap fans. Fans are a hot topic because of a few notable characteristics. Modern fans, along with heatsinks and other design elements, dictate the cooling capacity of a server and its ability to handle hot components. They can also have a notable impact on environmental attributes such as power consumption, vibration, and acoustics. Years ago, a common question we received was whether a server's fans were hot-swappable. The question became so prevalent that we now address it regularly in our reviews.
Over the past few months, I have been casually asking representatives from every major server vendor about fans. I have also asked major enterprise customers, as well as a few folks who work on servers at the "Super 7" hyperscalers. Based on these discussions, I think it is worth starting a conversation about fans, and about whether we still need hot-swap fans.
Why Hot-Swap Fans Matter
Hot-swap fans are excellent insofar as they enable a few different scenarios. Primarily, if a fan fails, one can replace it easily. That statement is usually true, but there is more to it. Fans are usually found in the midplanes of servers, such as this Lenovo example from our Lenovo ThinkSystem SR650 2U Server Review:
One of the key factors is that one needs to pull the chassis out of the rack, open a cover, and then replace the fan. Meanwhile, a well-designed server should keep operating throughout, but that is not always a given.
Great care has gone into designing handles and connectors that make hot-swap possible, but one still needs to physically move the server to access the fan partition. Without proper cable lengths and/or cable arms, you may end up with a hot-swap fan in a chassis that cannot practically be serviced while fully operational.
Other reasons to swap a fan include cleaning, or installing higher-performance fans when a server is upgraded. These are corner cases, however. If one is upgrading CPUs or PCIe cards, the server is powered off anyway, making hot-swap unnecessary.
One thing is certain: hot-swap fans make servicing easier. Especially in denser fan configurations, being able to remove a fan quickly, often with one hand and little effort, can greatly decrease service times. When a fan does fail, it can be replaced in under a minute if the rack environment is designed for hot-swap servicing.
The question is: do we still need hot-swap fans?
A Case Against Hot-Swap Fans
Although well-designed fan carriers are exciting from a mechanical design standpoint, things have changed. Fans are one of the few moving parts, if not the only moving parts, in modern servers with all-flash storage. In today's $100,000+ servers, they are indeed the only parts with a motor that can stop working.
Historically, fans were among the parts that failed most frequently, but this has changed. Electric motors, and the fans built around them, are now extremely reliable. Advancements in materials and manufacturing have led to excellent reliability, especially when external factors such as debris getting into the rotating portion are excluded.
Just because a fan is not hot-swappable does not necessarily make it difficult to service. The above HPE ProLiant DL20 Gen10 fan takes seconds to replace. Likewise, removing the chassis screws to reach the Dell Networking X4012 fan below takes longer than the minute or so it takes to swap the fan itself.
In my conversations with people who build servers and operate large data centers, the consensus is that fans have become very reliable. For example, one hyperscale data center tech told me that they keep no more than a few fans onsite as spares because fans rarely fail.
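To illustrate why a low failure rate translates into a small spares pool, here is a minimal sketch that treats fan failures as a Poisson process. The fleet size, annual failure rate, and restock window below are hypothetical numbers chosen purely for illustration, not data from anyone we spoke with, and the function names are our own.

```python
import math


def expected_failures(fleet_fans: int, annual_failure_rate: float, years: float) -> float:
    """Expected fan failures in a window (hypothetical inputs, not vendor data)."""
    return fleet_fans * annual_failure_rate * years


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson random variable with mean lam."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return cdf


def spares_needed(lam: float, confidence: float) -> int:
    """Smallest spare count k such that P(failures <= k) >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    k = 0
    term = math.exp(-lam)
    cdf = term
    while cdf < confidence:
        k += 1
        term *= lam / k
        cdf += term
    return k


if __name__ == "__main__":
    # Hypothetical: 10,000 fans, 0.5% assumed annual failure rate, quarterly restock.
    lam = expected_failures(10_000, 0.005, 0.25)
    print(f"expected failures per quarter: {lam:.1f}")
    print("spares for 99% coverage:", spares_needed(lam, 0.99))
```

The point of the model is simply that the spares count scales with the failure rate: if fans rarely fail, a handful of spares covers a large fleet between restocks.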
We are seeing server designs like the HPE ProLiant DL325 Gen10, and even the Supermicro BigTwin we reviewed (see picture above), forgo hot-swap fans. When I ask whether this configuration has been a problem on the service side, the answer is "no, fans rarely fail." Indeed, some customers prefer their 2U4N platforms, like the BigTwin nodes shown above, to have fans on the nodes. This is because it is easier to service an entire node, including its fans, than to move the chassis out of the rack far enough to reach midplane fans.
Normally, if I ask a question like this to 20-30 people, I get at least one outlier. At a minimum, I get a great anecdote about one customer, one time, with one machine. As I asked around, I never got that anecdote. Instead, the consensus is that fans rarely fail in servers. By extension, fans have been very reliable for the past few server generations, since nobody shared a failure story even dating back to the Xeon E5 days.
Adding carriers and special connectors to both the fans and the motherboards is probably not the largest cost per node. Today, the engineering for hot-swap fans is excellent. Designs from major vendors are easy to operate and work smoothly, and the cost of that mechanical design work has been amortized across generations of servers.
Ten years ago, when we started STH, hot-swap fans were a big deal. Not every vendor had a great design, and fan failures were a real enough risk that our readers wanted us to highlight hot-swap fans in our reviews. Now the questions we get are the opposite: are hot-swap fans still needed?
Today, the purpose of this piece is to ask whether the world has changed. Do we still need to highlight hot-swap fans in reviews? Should we consider them in our Design rating category? The deeper I got into this in informal interviews (nobody wanted to share hard data), the more I came to the opinion that hot-swap does not necessarily matter. Instead, fans need to be easy to service. The standard 4-pin PWM fan connector works well, but one also needs a way to manage fan cables if going that route. Perhaps the answer is that we do need hot-swap fans, but to aid robots in server manufacturing rather than for field service.
Personally, I think we need a new standard for fan form factors so that server manufacturers can easily automate fan installation and fan manufacturers can standardize production. Fans last a long time and, given their reliability, are among the most reusable parts in servers. Moving to a standard across servers that is easy to install and service would help the industry become more sustainable in the longer term.
Of course, if you have thoughts, feel free to discuss below, or in our forums, which are a better venue for this discussion. I am sure folks have anecdotes about fan failures in modern servers, or opinions on how cooling technology should evolve.