Ever since the Intel Xeon Scalable Processor family launched, we have been fielding questions asking for comparisons to the AMD EPYC. This week we posted some initial results showing the differences between DDR4-2400 and DDR4-2666 on AMD Infinity Fabric. Expect a larger set of benchmarks over the next few days. We feel the Supermicro test bed we are using is close to a shipping spec.
In the market today, there is a ton of fear, uncertainty, and doubt (aka FUD) that needs to be addressed. Furthermore, there are many commentators who have no hands-on experience with the platforms, or little architectural understanding, and yet are crowning a winner. We have had test systems for some time. At this point, we wanted to share a few myths we are consistently hearing and getting questions on. While we do not do a ton of these pieces, we also do not have the bandwidth to respond to individual requests, so this format is easier on the STH team.
Myth 1: AMD EPYC Was Available Before Intel Xeon Scalable
In terms of public launch events, we were at the AMD EPYC launch on June 20, 2017 in Austin, TX. We wrote about how pleasantly surprised we were to see the flourishing AMD EPYC ecosystem. It is clear that the industry is behind AMD EPYC. Before we proceed, we do think AMD EPYC will capture noticeable market share in this generation.
Launches are important. CPUs generally have only five or six quarters on the market until new generations arrive. Launch timing is important because that often sets a clock towards when we can expect a new generation. While AMD “launched” EPYC on June 20, 2017, commercially available systems will arrive in August 2017. Conversely, Intel “launched” Xeon Scalable on July 11, 2017 but in reality, it was shipping significant volume before that date:
While that may not be a full quarter's worth of shipments, it is enough to see public clouds, TOP500 supercomputers, and other organizations not just buy but test and deploy systems well before launch. Essentially, Intel’s launch was for a product that had been shipping and deployed for months, while AMD EPYC was something that would be shipping a half quarter after launch.
I received quite a few comments from folks asking why we have not yet published a full suite of competitive benchmarks for AMD EPYC, given that we have had a test system since before the June 20th launch. One of the biggest reasons is that we have watched AMD performance improve substantially. When we publish numbers, readers see them for months and years. We were the first major publication with a test system, but we deemed publishing results from a hobbled set of pre-production AMD AGESA firmware and DDR4-2400 to be irresponsible. Here is one glimpse of the performance gains we have seen going from woefully pre-production firmware to production firmware with DDR4-2666.
Those gains are enormous. Our test suite takes many days to run, but it appears as though we have a commercial system (i.e. one you can buy, not an AMD test rig) that has usable commercial firmware.
To be clear, we think the first Supermicro firmware that we would say is OK to ship arrived on July 18. This is after a motherboard upgrade (to shipping spec), multiple firmware upgrades, and an upgrade to DDR4-2666 RAM.
AMD EPYC is getting close to where we think the platform is manageable. However, as of July 24 we contacted our Dell EMC, HPE, Lenovo, and Supermicro reps and resellers cash-in-hand and nobody would sell us an EPYC system. Further, we have heard from some of the earlier vendors that we should expect early August 2017 availability of the initial systems.
With its July 24, 2017 earnings announcement, AMD released information that its Enterprise, Embedded and Semi-Custom segment revenues were down slightly compared to the same quarter a year ago. Once AMD EPYC starts shipping we expect this to change drastically as AMD EPYC will sell in the market.
The bottom line is, AMD EPYC is not something we can buy off the shelf today. We did buy an Intel Xeon Silver dual socket machine last week. We have also bought a number of Intel Xeon Scalable CPUs at retail for testing.
It seems as though the AMD EPYC ship date is sometime after Intel Xeon Scalable. Our sense is that through July 2017, there is no real Intel Xeon Scalable v. AMD EPYC contest because AMD EPYC is not shipping in commercial systems outside of potentially a very limited early ship program. If AMD’s early ship program were comparable to Intel’s 500K units, AMD would likely have at least doubled its last quarter enterprise revenue.
Myth 2: An AMD EPYC Chip is a Single NUMA Node
We have seen this one even from prominent industry analysts. Here is what a dual socket AMD EPYC looks like in Windows and Linux (Ubuntu):
NUMA is not a new concept. In our initial latency testing, there is a significant difference between on die, on package, and cross socket Infinity Fabric latencies. As a result, you want software and OSes to be aware of this.
While we may be moving to a world of multi-chip modules, we need accurate architectural reporting to the software layers.
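For readers who want to see what the OS reports on their own hardware, here is a minimal Python sketch. It assumes a Linux system that exposes the usual sysfs layout under /sys/devices/system/node, with one nodeN directory per NUMA node and a cpulist file inside each. The function names parse_cpulist and numa_nodes are ours, chosen for illustration, and are not part of any tool we use in testing.

```python
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-7,64-71" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # Memory-only nodes report an empty cpulist.
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} for every nodeN directory found under root."""
    nodes = {}
    if not os.path.isdir(root):
        # Not Linux, or sysfs is not mounted where we assumed.
        return nodes
    for name in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue
        with open(os.path.join(root, name, "cpulist")) as f:
            nodes[int(match.group(1))] = parse_cpulist(f.read())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: {len(cpus)} logical CPUs")
```

A system that really was a single NUMA node per socket would list one node per socket here. Software that pins threads or allocates memory can use the same mapping to keep work local to a node.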
Myth 3: AMD has more PCIe lanes than Intel Xeon Scalable in Dual Socket
AMD v. Intel in the single socket arena is no contest. AMD has more lanes, hands down, thanks to its innovative architecture. It is very strange that Intel has not been interested in ramping up PCIe lanes and in recent years has been reducing them in some key product lines. For example, the Intel Xeon E7 chips only had 32x PCIe 3.0 lanes while the E5 lines had 40x lanes. The actual silicon PCIe controllers on the Broadwell-EX die could support extra lanes, but Intel made a product definition decision to limit its previous-generation CPUs to 32 instead of 40 lanes. As a result, the four socket Xeon E5 V4 systems had more PCIe lanes than the four socket Xeon E7 V4 systems. We have no idea why there are so few lanes on the Intel Xeon Scalable processor family. Our speculation would be that Intel set its LGA3647 socket details before NVMe became a major push.
Looking at the spec sheets, you may assume that there are 128 lanes for AMD EPYC and 96 for Intel in a dual socket configuration. That is not entirely true. Intel actually has 64x PCIe 3.0 lanes per CPU; they just are not exposed in most use cases.
On the AMD side, there are 128 lanes, but some are lost to get a usable server platform. Most implementations from vendors we have seen expose 96-112 PCIe 3.0 lanes to add-in devices.
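The lane arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration using the figures quoted in the text. The 48 lanes per Intel CPU is simply the 96 lane dual socket figure divided by two, and treating all 96 Intel lanes as usable by add-in devices is our simplifying assumption. The names platform_lanes and exposed_gap are hypothetical helpers.

```python
def platform_lanes(sockets, lanes_per_cpu):
    """Raw PCIe lane count for a platform: sockets times lanes per CPU."""
    return sockets * lanes_per_cpu


# Dual socket figures quoted in the text.
INTEL_2S_SPEC = platform_lanes(2, 48)    # 96 on the spec sheet (48 per CPU is 96 / 2)
INTEL_2S_ON_DIE = platform_lanes(2, 64)  # 64 per CPU, not exposed in most use cases
AMD_2S_SPEC = 128                        # spec sheet figure for dual socket EPYC
AMD_2S_EXPOSED = (96, 112)               # range seen in vendor implementations

# Four socket previous-generation comparison from the text.
XEON_E5_V4_4S = platform_lanes(4, 40)
XEON_E7_V4_4S = platform_lanes(4, 32)


def exposed_gap(amd_exposed_range, intel_exposed):
    """Smallest and largest AMD lead, in lanes usable by add-in devices."""
    low, high = amd_exposed_range
    return low - intel_exposed, high - intel_exposed


if __name__ == "__main__":
    print("Spec sheet gap (2S):", AMD_2S_SPEC - INTEL_2S_SPEC)
    print("Exposed lane gap (2S):", exposed_gap(AMD_2S_EXPOSED, INTEL_2S_SPEC))
    print("4S Xeon E5 V4 v. E7 V4:", XEON_E5_V4_4S, "v.", XEON_E7_V4_4S)
```

Under these assumptions, the 32 lane spec sheet gap shrinks to somewhere between 0 and 16 lanes once platform overhead on the AMD side is accounted for. Actual numbers depend on each vendor's motherboard design.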
On balance, we do (strongly) prefer AMD’s implementation with EPYC. System vendors can, through a BIOS change, re-allocate SATA lanes to PCIe as an example. However, saying that Intel has a huge effective PCIe lane deficiency in dual socket configuration should have a disclaimer attached at all times. If you want to read more, see our AMD EPYC and Intel Xeon Scalable Architecture Ultimate Deep Dive piece.
When it comes to single socket configurations, AMD has the advantage hands down, which is why we see single socket AMD platforms being touted by major vendors such as HPE.
PCIe is Not Always the PCIe You Expect
On the subject of PCIe lanes, when is a PCIe lane not a PCIe lane? That may sound like a trick question, but it is far from trivial. Serious analysts and journalists need to know the difference as it is of supreme importance to many companies that may be evaluating the AMD EPYC platform.
While at the AMD EPYC launch event, we saw an HPE system that fit 24x NVMe devices into a 1U form factor. That is absolutely cool!
When we inspected the server internals, something about the design struck us: the U.2 drives were not hot swap.
We wondered why this might be and asked the HPE representatives presenting the system. At the event, HPE’s response to us was that hyper-scale customers will use fail in place and not hot swap the drives. HPE cited that SSDs do not fail that often anyway.
That still seemed like an answer that could be improved upon and it turns out there is a fairly significant feature missing with the AMD EPYC platforms at present. We have had this confirmed by five different server vendors and it seems to be a known challenge that the server vendors are working around.
At present, AMD EPYC servers do not support industry standard hot plug NVMe
U.2 NVMe SSDs are popular because they have the ability to hot swap much like traditional SAS drives. For data center customers, this means that failed drives can be replaced in the field using hot swap trays, even by unskilled data center remote hands staff without taking systems down.
If one thinks back to the early days of U.2 NVMe SSDs on Intel and ARM platforms, not every server vendor supported NVMe hot swap. It seems as though AMD and its system partners are going through the same support cycle with EPYC. We hear that workarounds are coming, but are still a few months or quarters away.
While PCIe lanes may be PCIe lanes, the inability to hot plug an NVMe SSD on an AMD EPYC platform may be a non-starter for enterprise software defined storage vendors until this functionality is brought to EPYC.
We have little doubt that there will be some sort of fix for this generation of AMD EPYC. We have had five different vendors confirm that they are aware and are working on workarounds. Until that fix is out, the “holy grail” of the single socket AMD EPYC system that directly connects 24x NVMe drives is still going to be unsuitable for many enterprise customers.
Edit 2017-07-31: AMD requested that we re-iterate that they support hot plug NVMe on EPYC. The above references that there may be OEM decisions to not support the feature. Other OEMs are still working on enabling the feature on their platforms. As of this update, we have not heard of a shipping EPYC system that supports hot plug NVMe.
Myth 4: AMD EPYC is a Bad Server Platform
Just as we see folks spreading misinformation on the Intel Xeon Scalable platform, we see it on AMD EPYC as well. The AMD EPYC platform is surprisingly good. Going back to our Myth 1 statement that we have been looking to purchase AMD EPYC: after testing a system, we are looking to add AMD EPYC to our hosting infrastructure.
There are applications where AMD EPYC will be very good. For example, in storage servers (especially once NVMe hot plug is available) EPYC can be great. Furthermore, for a web hosting tier or as a VPN server, AMD EPYC is likely to make an excellent solution. Here is an example of preliminary results we are seeing in one of the AMD EPYC’s better workloads:
There is certainly a ton of merit to the platform even though it is decidedly less mature.
We do see the single socket platforms as potential standouts, especially for containerized workloads. Docker and AMD EPYC have been working very well in our initial testing. We have been getting 0.5% variance between Dockerized and bare metal benchmarks, which is within normal test variation for the workloads.
AMD EPYC is a great product, but AMD is fighting some uphill battles. While we expect to be able to purchase AMD systems in August, the market is deploying Xeon Scalable today. We had a vendor tell us this week that two of our Skylake-SP review systems were being delayed because “we are still trying to meet customer demand for Skylake orders and cannot manufacture systems fast enough.”
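For anyone repeating this kind of comparison, here is a minimal Python sketch of how a Docker v. bare metal difference can be expressed as a percentage. The function name percent_difference and the scores in the example are placeholders we made up for illustration. They are not our benchmark results, and this is not a description of our test harness.

```python
def percent_difference(bare_metal, containerized):
    """Absolute difference between two scores as a percentage of the bare metal score."""
    if bare_metal == 0:
        raise ValueError("bare metal score must be non-zero")
    return abs(containerized - bare_metal) / abs(bare_metal) * 100.0


if __name__ == "__main__":
    # Placeholder scores only, chosen to make the arithmetic easy to follow.
    bare = 1000.0
    docker = 996.0
    print(f"difference: {percent_difference(bare, docker):.2f}%")
```

A difference that stays at or below the run-to-run variation of the same benchmark on bare metal suggests the container layer is not adding measurable overhead for that workload.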
At the same time, we have been experiencing the AMD EPYC system go from a product in June that had a lot of rough edges to a product in late July that we gave our stamp of approval to and are willing to deploy in our production cluster. Current challenges like the NVMe hot plug will get fixed because there will be customer demand for the feature and we know several vendors are working on it.
Looking forward to December 2017 and into 2018, we do think that many of these early shipping challenges will be ironed out and we will see more enterprises start buying test systems or clusters. For those enterprises that want an alternative architecture, adding EPYC is a 2-3 on a 10-point scale for difficulty, while adding ARM or POWER can be a 7-10.
Finally, we have heard requests to add more TCO analysis like we have been doing with our GPU compute DeepLearning10 and DeepLearning11 reference builds and with the Intel Xeon E3-1200 V6 series. We will be heeding that request once pricing gets finalized. In the enterprise server space, CPU costs are usually a relatively minor component. Here is an example from DeepLearning11 using a 12-month cost for a server with relatively little RAM and storage that uses completely open source software and management tools. Over 3-5 years, especially with per-core licensed software, CPU costs can be negligible.