Supermicro SYS-2049U-TR4 Topology
System topology is an area we are keenly aware of today, and one that will matter even more with future multi-chip packages. Because it is a four-socket solution, the Supermicro SYS-2049U-TR4 has one of the most complex topologies of any current system.
One can see that we have 384GB per CPU, giving us a total of 1.5TB of RAM. One can also see that in our test system shown above, devices use PCIe lanes from every CPU except one, but only because we did not install any cards in that CPU's x16 slots. The net impact is that whenever a PCIe device or memory is attached to a different NUMA node than the core accessing it, the traffic has to cross the UPI bus. Perhaps the larger impact is that we suggest using no fewer than four CPUs in these systems so that all PCIe lanes are available.
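On a Linux host, one way to check which socket a PCIe device hangs off is the `numa_node` attribute the kernel exposes for each device under `/sys/bus/pci/devices/`. The sketch below assumes that sysfs layout; the function names and the `sysfs_root` parameter are our own, hypothetical choices. It maps each device to its reported node and lists the devices that are remote from a chosen node, i.e. the ones whose traffic from that node's cores would cross UPI.

```python
import os
from typing import Dict, List


def pci_numa_nodes(sysfs_root: str = "/sys/bus/pci/devices") -> Dict[str, int]:
    """Map each PCI address (e.g. '0000:17:00.0') to the NUMA node the kernel reports.

    A value of -1 means the kernel does not know which node the device belongs to.
    """
    nodes: Dict[str, int] = {}
    for bdf in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, bdf, "numa_node")
        try:
            with open(path) as f:
                nodes[bdf] = int(f.read().strip())
        except (OSError, ValueError):
            # Attribute missing or unreadable: skip this device.
            continue
    return nodes


def remote_devices(nodes: Dict[str, int], local_node: int) -> List[str]:
    """Return devices on a known NUMA node other than local_node.

    Accesses from cores on local_node to these devices have to cross a UPI link.
    """
    return [bdf for bdf, node in nodes.items() if node >= 0 and node != local_node]
```

Run on the server itself, `remote_devices(pci_numa_nodes(), 0)` would list the cards that sit behind another CPU from the point of view of node 0.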
Supermicro SYS-2049U-TR4 Bandwidth and Latency
Using Intel Memory Latency Checker (MLC), we measured the idle latencies between the different NUMA nodes in the Intel Xeon Platinum 8158 4P configuration.
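To turn an MLC node-to-node matrix into a single local-versus-remote comparison, a small parser is enough. This is a sketch that assumes the text layout we understand MLC to print for its matrix modes: a `Numa node` header row listing the target node IDs, then one row per source node beginning with that node's ID. Any other lines are skipped. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Matrix = Dict[int, Dict[int, float]]


def parse_mlc_matrix(text: str) -> Matrix:
    """Parse an MLC-style NUMA matrix into {source_node: {target_node: value}}."""
    targets: List[int] = []
    matrix: Matrix = {}
    for line in text.splitlines():
        tokens = line.split()
        # Header row such as "Numa node  0  1  2  3": remember the column node IDs.
        if len(tokens) > 2 and tokens[:2] == ["Numa", "node"] and all(t.isdigit() for t in tokens[2:]):
            targets = [int(t) for t in tokens[2:]]
            continue
        # Data row: an integer node ID followed by one numeric value per column.
        if len(tokens) < 2 or not tokens[0].isdigit():
            continue
        try:
            values = [float(t) for t in tokens[1:]]
        except ValueError:
            continue
        # Fall back to columns 0..n-1 if no matching header row was seen.
        cols = targets if len(targets) == len(values) else list(range(len(values)))
        matrix[int(tokens[0])] = dict(zip(cols, values))
    return matrix


def local_remote_means(matrix: Matrix) -> Tuple[float, float]:
    """Mean of node-local (diagonal) entries and of cross-node (off-diagonal) entries.

    Raises statistics.StatisticsError if either group is empty (e.g. a single node).
    """
    local = [row[src] for src, row in matrix.items() if src in row]
    remote = [v for src, row in matrix.items() for dst, v in row.items() if dst != src]
    return mean(local), mean(remote)
```

Given saved output from an `mlc --latency_matrix` run, `remote / local` from `local_remote_means` gives one number for the NUMA latency penalty. Assuming MLC prints its bandwidth matrix in the same layout, the same parser should handle that output as well.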
In general, we would expect higher-core-count chips to have slightly higher latency due to their larger dies.
Here is the memory bandwidth between nodes:
Here is the cache-to-cache transfer latency:
One of the biggest reasons we hear for companies moving away from quad-socket servers is the complexity that the NUMA nodes create. In this generation, Intel did a relatively good job of managing that complexity with its UPI setup, but as we move to faster and larger PCIe complexes, a single UPI link between nodes is not going to be enough.
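A rough per-direction calculation illustrates the concern. The sketch below assumes a 10.4 GT/s UPI link carrying 2 bytes of payload per transfer (the commonly quoted figure of about 20.8 GB/s each way) and compares it with PCIe 3.0 and PCIe 4.0 x16 slots after 128b/130b encoding. Protocol overhead is ignored on both sides, so treat the results as upper bounds rather than measured throughput.

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Per-direction PCIe bandwidth in GB/s with 128b/130b line coding (PCIe 3.0/4.0).

    Ignores packet and protocol overhead, so real transfers land somewhat lower.
    """
    return gt_per_s * lanes * (128 / 130) / 8


def upi_gb_per_s(gt_per_s: float = 10.4, bytes_per_transfer: float = 2.0) -> float:
    """Per-direction UPI link bandwidth in GB/s.

    Assumes 2 bytes of payload per transfer; protocol overhead is ignored.
    """
    return gt_per_s * bytes_per_transfer


def x16_slots_per_link(pcie_gt_per_s: float, upi_gt_per_s: float = 10.4) -> float:
    """How many saturated x16 devices one UPI link could carry, per direction."""
    return upi_gb_per_s(upi_gt_per_s) / pcie_gb_per_s(pcie_gt_per_s)


for name, rate in [("PCIe 3.0", 8.0), ("PCIe 4.0", 16.0)]:
    print(f"{name} x16: {pcie_gb_per_s(rate):.2f} GB/s per direction, "
          f"{x16_slots_per_link(rate):.2f} slots' worth per UPI link")
```

By this estimate, one saturated PCIe 3.0 x16 device (about 15.75 GB/s) fits within a single link with room for roughly a third more, while a single PCIe 4.0 x16 device (about 31.5 GB/s) could already exceed it.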
Next, we are going to look at the performance of the Supermicro SYS-2049U-TR4 before moving to power consumption and our final thoughts.