The AMD EPYC 7742 has 64 cores and 128 threads per socket. As such, it is peerless in today’s market. The Intel Xeon Scalable LGA3647 lineup in the market during the second half of 2019 and much of 2020 has a maximum of 28 cores. For general purpose workloads that do not take advantage of some of the bleeding edge Xeon features, AMD is matching and sometimes beating Intel’s per-core performance. If you want a general-purpose x86 server today that can offer huge consolidation ratios over Intel Xeon E5 V1-V4 and even Xeon Scalable servers, the AMD EPYC 7742 is simply the way to go.
Key stats for the AMD EPYC 7742: 64 cores / 128 threads with a 2.25GHz base clock and 3.4GHz turbo boost. There is 256MB of onboard L3 cache. The CPU features a 225W TDP. One can configure the cTDP up to 240W. These are $6,950 list price parts.
Here is what the lscpu output looks like for an AMD EPYC 7742:
Cores are easy to point to as being more. We wanted to focus on the cache and NUMA advantages of these 64 core chips for a moment.
AMD EPYC 7742 Cache Advantage
256MB of L3 cache is an enormous number in the current x86 ecosystem. The 28 core Intel Xeon Platinum 8280 has 66.5MB of L2 plus L3 cache. L2 plus L3 on the AMD EPYC 7742 is 288MB. With that, the AMD EPYC 7742 has about 4.5MB of L2 + L3 cache per core. Intel Xeon Platinum 8280, 2.375MB of L2 + L3 cache. Having almost double the cache is a big deal.
Another way to look at this is is that if you have 8GB of RAM per core, much like a cloud provider, you would target 512GB per EPYC 7742 and have about 0.5MB of L3 cache per GB of RAM. For Intel, even adding L2 cache, one still has less than 0.13MB of L2+L3 cache per GB of RAM at 512GB per CPU. A big part of AMD’s design is putting massive high-speed caches near compute to minimize requests off-chip. This is an exact parallel to how accelerators like the NVIDIA Tesla V100S, Intel’s next-gen Xe HPC GPU, Intel’s Nervana NNP-T, Intel’s new Agilex FPGAs, Xilinx Versal AI Core ACAP, Habana Labs Gaudi AI training accelerator, the Graphcore C2 IPU, and even the Cerebras Wafer Scale Engine AI chip are designed. Big fast caches, close to the compute resources. Intel’s non-Xeon silicon products are already doing this. The rest of the industry is focused on minimizing DDR4 (or GDDR6 access) using caches.
Moving to chiplets and 7nm allows AMD to do this with yields that Intel cannot get with its current monolithic die designs. In a single die, it is too hard for Intel to yield a 28 core chip with 256MB of L3 cache on 14nm. In future generations, Intel will address this, and AMD will have future iterations as well. AMD is just selling you a more modern processor design today. In 2021, that may change, but the AMD EPYC 7742 has been shipping now for months as we covered in our AMD EPYC 7002 Series Rome Delivers a Knockout piece. Aside from the enormous caches, AMD made improvements to how it fills caches with the new generation as we covered in that article.
Intel is getting some limited traction with the Xeon Platinum 9200 series at up to 56 cores, but that is not socketed meaning Intel supplies both chips and the motherboard. The Xeon Platinum 9200 series systems will not have features such as iDRAC and HPE iLO. One cannot specify one server model and run from a single socket low-end SKU to a dual-socket high core count SKU. Also, the Platinum 9200 uses significantly more power making it a less eco-friendly solution for general-purpose workloads. If you want a socketed platform, you are limited to the Platinum 8200 series today.
AMD EPYC 7742 NUMA Advantage
That has an important impact on another area: NUMA domains. Here is a look at a dual AMD EPYC 7742 server’s topology:
In the EPYC 7601 days (2017), we would see eight NUMA nodes for 64 cores. Now we see one domain for 64 cores or two for 128 cores. With Intel Xeon, technically to get to 64 cores, the only way to do it is with 4x 16 core CPUs in a four-socket server. I did not see a 4x 16 core Intel Xeon topology map in our CMS, but here is a 4x 12 Intel Xeon Scalable topology map from our Supermicro SYS-2049U-TR4 Review:
As you can see, there are more NUMA domains and PCIe devices sit on different sockets which can hammer inter-socket bandwidth over UPI. Quad Xeon servers currently have lower inter-socket bandwidth which adds to that stress. The key takeaway is that to scale cores, we essentially need twice the number of chips and NUMA nodes and a less efficient architecture on the Intel Xeon side than with the AMD EPYC 7742. Even the Platinum 9200 series enumerates each die in a CPU package separately more similar to how the AMD EPYC 7001 series did than the EPYC 7002 series.
In terms of memory capacity, the AMD EPYC 7742 is limited to 4TB of memory in 8 channels or 16 DIMMs per CPU (16x 256GB.) Technically, the Xeon “L” SKUs can handle up to 4.5TB of memory in 6 channels and 12 DIMMs using Optane DCPMM. AMD is pricing to compete with standard Xeon Scalable SKUs that have 1TB of memory support. Intel also offers a premium “M” series that can support 2TB of memory in this generation.
Not everyone needs or wants 64 cores per socket. Some will want Intel’s unique AVX-512 / VNNI features, DCPMM. Most in the market will continue buying what they bought before. AMD has other SKUs in its stack which may work better for organizations (e.g. the AMD EPYC 7702P single-socket part.) Intel has a lot of pricing levers it can use. Still, at the end of the day, if you want 64 x86 cores in a socket, the AMD EPYC 7742 is peerless.
For most of our charts (except the dual-socket virtualization testing) we are using the Tyan Transport SX TS65A-B8036.
- Platform: Tyan Transport SX TS65A-B8036
- CPU: AMD EPYC 7742
- RAM: 8x 32GB Micron DDR4-3200 RDIMMs
- OS SSD: 400GB Intel DC S3700
- Data SSD: 960GB Intel Optane 905P
You are going to see more about this platform, but this is a PCIe Gen4 single-socket platform from Tyan that has 16x front U.2 NVMe bays, 10x front SATA/ SAS bays, two rear 2.5″ SATA OS SSD bays, six expansion slots on risers and an OCP mezzanine slot. All this is achieved using a single AMD EPYC 7002 series CPU.
If you count the PCIe lanes and bandwidth in the platform, Tyan is offering more PCIe I/O than a dual-socket Xeon server in a single socket AMD EPYC 7002 server. We were going to use a dual AMD EPYC 7002 series platform for this review, but the single AMD EPYC 7742 is closer to two Xeons and we have an acceptable, at best, quad Xeon 6200/ 8200 data set.
Also, one may read that you cannot use Optane with AMD EPYC. While it is true that you cannot use the Intel Optane DCPMMs with AMD EPYC, you can use their standard NVMe form factor. Intel Optane NVMe SSDs are excellent for database applications as an example when paired with EPYC.
A Word on Power Consumption
We tested these in a number of configurations. Here is what we found using the AMD EPYC 7742 in this configuration when we pushed the sliders all the way to performance mode and a 225W cTDP:
- Idle Power (Performance Mode): 117W
- STH 70% Load: 299W
- STH 100% Load: 323W
- Maximum Observed Power (Performance Mode): 351W
Note there is a 240W cTDP mode that we are not using here since we are running this series at default cTDP. Depending on the chip, you can get a bin or two of additional performance by running at 240W, just ensure your platform can handle the additional heat.
Next, let us look at our performance benchmarks before getting to market positioning and our final words.