Home Lab and Secondary Server Market Killer
Think about the lifecycle of a server for a moment. When it is first sold with an AMD EPYC processor, the fuses are blown, and the server OEM effectively brands the AMD EPYC CPU to its ecosystem. Beyond that chain, we now have challenges. This is where the new security paradigm introduces new hurdles.
Today, there is a healthy grey market for CPUs. Many of the Intel Xeon Bronze and low-end EPYC 7232P / 7252 system sales actually happen because those CPUs can be removed and replaced in the socket with a higher-end CPU at a lower cost than OEMs charge for it. The reverse is also true: a discount on an entire server may make it viable to order a higher-end SKU and then downgrade the processor. Extra CPUs can be purchased for projects, testing, or advance stock. Perhaps a partner needs to buy 20 more CPUs to hit a program tier limit, so the partner simply purchases extras to resell in order to move up a tier. These are some of the ways that even “new” CPUs can hit the secondary market. If any of these CPUs come from, say, a Dell server that has been turned on and has had its one-time-programmable fuses blown, then when that CPU is resold on the grey market, it may not work in systems from other vendors. For a grey market reseller, this is an enormous headache, especially if they do not know about the feature. If you do have a reseller, send them this article and ask them to detail what they are doing about their supply chain.
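To make the resale problem concrete, here is a minimal toy model in Python of the behavior described above. It is a sketch under stated assumptions, not AMD's or any OEM's actual implementation: the `Cpu`, `Platform`, and `try_boot` names are hypothetical, the "key" is placeholder bytes, and we assume the fuses hold a fingerprint of the vendor's firmware signing key that every later boot must match.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def key_hash(public_key: bytes) -> str:
    """Hash standing in for the fused fingerprint of an OEM signing key."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class Cpu:
    # None means the one-time-programmable fuses have not been blown yet.
    fused_key_hash: Optional[str] = None


@dataclass
class Platform:
    vendor: str
    firmware_signing_key: bytes  # placeholder bytes, not a real key
    locks_cpu: bool              # assumption: vendor enables the fuse-blowing feature


def try_boot(cpu: Cpu, platform: Platform) -> bool:
    """Return True if this toy CPU would boot in this toy platform."""
    platform_hash = key_hash(platform.firmware_signing_key)
    if cpu.fused_key_hash is None:
        if platform.locks_cpu:
            # One-way step: nothing in this model ever clears the field.
            cpu.fused_key_hash = platform_hash
        return True
    # A fused CPU only accepts firmware signed with the key it was bound to.
    return cpu.fused_key_hash == platform_hash


cpu = Cpu()
vendor_a = Platform("VendorA", b"vendor-a-key", locks_cpu=True)
vendor_b = Platform("VendorB", b"vendor-b-key", locks_cpu=False)

print(try_boot(cpu, vendor_b))  # unfused CPU in a non-locking vendor's system
print(try_boot(cpu, vendor_a))  # first boot in a locking vendor's system
print(try_boot(cpu, vendor_b))  # same CPU resold into the other vendor's system
```

In this model the first two boots succeed and the third fails, which is the grey market scenario: nothing about the CPU looks different, yet it no longer works outside the vendor that fused it.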
Beyond the early grey market, decommissioned servers are often repurposed as used systems. These can be for home labs, or simply for companies that want to use lower-cost used gear. Something fascinating about that market is that these systems are often not repurposed as a whole. STH’s first colocation web hosting servers were Dell C6100 systems that were repurposed Twitter and Facebook systems. Today’s hyper-scale servers likely would not work in many environments, so recycling companies pull CPUs, RAM, and potentially other components that can be used to upgrade or augment other existing systems. That can also happen in enterprise refreshes, where recyclers break down enterprise servers to make certain valuable components easier to resell. For the security “dumb” processors of yesteryear, this is a normal model. Once these field-programmable fuses are blown, it gets more complex. Moving a processor to a new system carries a high likelihood of the processor not working.
So I asked Dell EMC about this. Not all vendors are currently using the PSB feature, so enabling it effectively makes a Dell EMC system, or a system from any other vendor that enables the feature, less reusable. That limit on reusability severely curtails the useful lifespan of systems and is, therefore, less green. Here is Dell EMC’s statement to STH:
As you rightly point out, the AMD Platform Secure Boot Feature (PSB) is a mitigation for firmware Advanced Persistent Threats. This allows us to establish an unbroken chain of trust from AMD’s silicon root of trust to our BIOS and then from the BIOS to the OS Bootloader using UEFI secure boot. This provides a very powerful defense against remote and local attackers seeking to embed malware into a platform’s firmware.
We design PowerEdge servers with security built-in as the security of our products is critical to helping ensure our customers’ data and systems are protected. Given the pervasiveness and increasing sophistication of these ongoing persistent threats, we decided to enable the PSB function that AMD makes available. The tradeoff, as you pointed out, is that the CPU would only be able to operate in another Dell EMC PowerEdge server. However, we feel that’s a rather limited use case for how customers look to decommission old equipment and wanted to err on the side of security.
We have a decades-long focus on sustainability, and are fully committed to accelerating the circular economy and diverting all e-waste from landfills (more information here about our efforts and our sustainability goals). We also provide recycling solutions that protect our customers’ data, safeguard their brand reputation, reuse precious materials and responsibly recycle in compliance with local regulatory guidelines, such as the EPA and WEEE legislation and waste regulations. (Source: Dell EMC statement to STH.)
My position is that this feature will lead to more e-waste, while Dell points to its commitments to the environment and to recycling. We will let our readers think about the implications, having heard both sides, and draw their own conclusions.
At the end of the day, vendors and their customers are happy to have better security, so these are touted features. It is also fairly easy to see how they will impact the secondary markets for used servers and CPUs, whether in a professional or a “home lab” scenario. Imagine the confusion it will cause if the source of CPUs is not tracked down to the level of which firmware was being used. In theory, a vendor servicing multiple hyper-scalers could use the same motherboard but load different cryptographic signatures for the firmware used by each end customer. Once a CPU is removed from a system after its fuses are blown, there is no easy way to visually tell what has previously transpired. That record needs to travel with the CPU, which is easy enough, except that today’s ecosystem does not track this information.
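As an illustration of what a record that travels with a CPU might look like, here is a small Python sketch. It is purely hypothetical: the `CpuProvenance` fields, the `resale_risk` wording, and the serial number are our own placeholders, and no vendor or reseller format is implied.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class CpuProvenance:
    cpu_model: str
    serial: str                   # placeholder value in the example below
    source_vendor: Optional[str]  # OEM of the system the CPU was pulled from
    source_system: Optional[str]  # server model, if anyone wrote it down
    fuses_blown: Optional[bool]   # None means nobody recorded it


def resale_risk(record: CpuProvenance) -> str:
    """Summarize what a buyer should assume from the recorded history."""
    if record.fuses_blown is None:
        return "unknown: may not work outside its original vendor's systems"
    if record.fuses_blown:
        vendor = record.source_vendor or "an unrecorded vendor"
        return f"vendor-locked: expect it to work only in systems from {vendor}"
    return "not locked: no vendor restriction recorded"


def to_label(record: CpuProvenance) -> str:
    """Serialize the record so it can ship with the part, e.g. on a tray label."""
    return json.dumps(asdict(record), sort_keys=True)


pulled = CpuProvenance(
    cpu_model="EPYC 7252",
    serial="HYPOTHETICAL-0001",
    source_vendor="Dell EMC",
    source_system="PowerEdge C6525",
    fuses_blown=True,
)
print(resale_risk(pulled))
print(to_label(pulled))
```

The three-state `fuses_blown` field is the point of the sketch. An untracked CPU is not the same as an unlocked one, and a buyer should treat "unknown" as a risk rather than as a pass.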
We have been focusing on AMD EPYC CPUs, but it goes beyond AMD.
Stepping Back: A Look at the Future
It is important to remember that these security features are being implemented by AMD ahead of Intel, but they are being demanded by customers. At some point, Intel will have to match this feature set. Intel’s Management Engine found in its Lewisburg PCH is not as feature-rich, and so Intel will be forced to change. The LBG-R refresh part is already making major moves. Other modern architectures do not have PCHs, so at some point Intel is likely to follow suit here.
HPE told us, for example, that it does work with the LBG ME, but that the functionality it is using to secure Xeon platforms is not as robust as what AMD is offering. The bottom line is that Intel will need to meet customer demands in this space, especially given AMD’s lead. Ice Lake Xeons will see a move in this direction, and there is a security roadmap Intel is pursuing for the Xeon line. As a fun piece of trivia, the ME means that every Intel Xeon Scalable server has multiple Intel Quark cores inside. If you thought Quark was a dead architecture, it is alive in mainstream server platforms today. Indeed, one could even say that Facebook is deploying Quark servers even in its new Cooper Lake generation.
Outside of x86, IBM Power10 is making a push for enhanced security, so it will need to have a silicon root of trust to enable its security feature set.
Even Arm offerings are going to need to add these types of security processors. We originally wrote this article with Marvell ThunderX3 before it was canceled (although Marvell OCTEON is still moving in this direction) but we can see Ampere receiving the same security mandates from customers and implementing them in its designs.
Ampere Altra has control processors that handle many of its security features. Assuming vendors want CPUs to validate against signed platform firmware, Ampere will have to do something similar.
Neither the IBM Power10 nor the Ampere Altra is commercially available to purchase for a site like STH today given where we are in their lifecycles, so we are not going to comment on features of these platforms.
The key takeaway from this section is that we have focused on AMD here because they are a few years ahead of Intel on addressing this requirement. Other vendors will need similar functionality to meet hyper-scale, government, and some enterprise requirements. This is not AMD trying to harm the used market; it is really that the initial customers buying these parts want that functionality.
This started out as a 500-word post clarifying some comments in the Dell EMC PowerEdge C6525 Review and video. It turned into multiple exchanges with vendors and a few large customers to understand what is going on. Hardware security is a big deal, especially given potential state-level threats, so those who are driving the server industry are making it a requirement that what they buy has a level of security capability they are comfortable with. Vendors are tailoring their products to those requirements, as they should.
For those who buy or sell either used IT equipment or grey-market equipment, this is new functionality that needs to be on your mind today with AMD EPYC, and with other platforms in the future. Intel, for its part, benefits in the secondary market from not having the same security features that make AMD EPYC processors attractive in the primary market. Still, it should be standard practice to document where any AMD EPYC processors are pulled from so that evidence can be provided to potential buyers, since these are invisible changes.
One mitigation would be to add an “un-securable” feature to these CPUs. Once this feature is set, they cannot be used with secure firmware/platforms. That would allow the secondary market to use these chips knowing they are not secure. A drawback is that it would not help the case where a grey market CPU gets fused un-securable and is then sold for use in a secure platform. Perhaps if this were some kind of physical, external, and visible PCB-based fuse, it would make it clear to buyers that it had been irreversibly set. There are no great options, but the industry will have to explore possibilities in the future.
For STH readers, we want you to be aware of the implications of increased platform security so you can inform your colleagues, suppliers, and customers of the changes. If we learned anything from the PowerEdge C6525 comments on this subject, it is that the market does not have a great grasp of the features and, perhaps as importantly, the implications of increased security.