AMD Milan-X Delivers AMD EPYC Caches to the GB-era

AMD Milan-X: Performance and Power Consumption

Here we are going to take a look at two different aspects. First, what the technical computing/simulation/HPC community thinks about this. Second, what we found testing the chips ourselves. Then, we are going to bring it together and simply discuss how you can think about Milan-X for your organization. To me, having seen our data and the third-party data, that is actually the more impactful exercise.

AMD was pushing the concept of super-linear scaling. The basic idea is that if you can scale out and keep hot data in cache next to the cores, you can get application speedups that exceed what you would expect from simply adding more CPUs.

AMD EPYC 7003X Milan-X Super-Linear Scaling

This is something several ISVs have shown, but it is not something that all applications will show, not even close.
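To make the super-linear idea concrete, here is a toy strong-scaling model (our own sketch, not AMD's methodology): split a fixed working set across more nodes, and once each node's share fits in its L3, the average memory access cost drops, so the measured speedup can exceed the node count. The working-set size and access costs below are hypothetical.

```python
# Toy strong-scaling model for the super-linear effect (illustrative only,
# not AMD's methodology). Assumption: a fixed working set is split evenly
# across nodes; once a node's share fits in its 768 MB of L3, most accesses
# hit cache instead of DRAM, so time per node drops faster than 1/N.

L3_BYTES = 768 * 1024**2           # L3 per Milan-X socket
WORKING_SET = 1536 * 1024**2       # 1.5 GB total working set (hypothetical)
T_CACHE, T_DRAM = 1.0, 3.0         # relative access costs (hypothetical)

def time_per_node(nodes: int) -> float:
    share = WORKING_SET / nodes                 # bytes each node touches
    hit_rate = min(1.0, L3_BYTES / share)       # fraction of hot data in L3
    avg_cost = hit_rate * T_CACHE + (1 - hit_rate) * T_DRAM
    return share * avg_cost                     # work per node * average cost

base = time_per_node(1)
for n in (1, 2, 4, 8):
    print(f"{n} node(s): {base / time_per_node(n):.1f}x vs. {n}x linear")
```

In this toy model the two-node case lands well above 2x because the per-node share suddenly fits in L3; real applications see smaller and far less uniform effects.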

Normally we do not go into partner benchmarks, but the target market for Milan-X is one where we do not do a lot of benchmarking, and the applications are very expensive and require expertise to set up. Since the software vendors are the experts in their applications, and these are big applications, we are going to show their thoughts on Milan-X.

Microsoft Azure HPC with AMD Milan-X

Microsoft Azure has a special HPC cloud. Perhaps the most shocking part when we covered AMD Milan-X previously was that Microsoft is taking the existing HBv3 instance hardware and swapping in Milan-X CPUs without changing the instance type. Microsoft saw such huge gains that it is doing a rip-and-replace upgrade of current-generation EPYC 7003 CPUs. That is perhaps the best testimonial to the impact Microsoft and its cloud HPC customers see from the new chips. Also, you are going to see a lot of the F1_racecar_140m model.

Supercomputing With Azure And Milan X Performance

Here is Azure’s super-linear scaling due to a higher portion of hot data being cached in L3. This is using that racecar Ansys Fluent simulation.

Supercomputing With Azure And Milan X Performance Scaling

Here is what the new chips will do to the Azure HBv3 instances. From what we have heard, Microsoft upgraded its west coast site first, then Europe, then the east coast, and it should all be complete at this launch or soon after.

Supercomputing With Azure And Milan X New HBv3 Instance Specs

Again, up to 1.5GB of L3 cache (across two sockets) per system is awesome. We are in the Gigabyte Era of CPUs.
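For reference, the arithmetic behind that figure: each Milan-X CCD pairs 32MB of on-die L3 with 64MB of stacked 3D V-Cache, and the top parts have eight CCDs per socket.

```python
# Back-of-the-envelope L3 total for a dual-socket Milan-X system.
base_l3_per_ccd = 32      # MB of on-die L3 per CCD
v_cache_per_ccd = 64      # MB of stacked 3D V-Cache per CCD
ccds_per_socket = 8       # top-bin Milan-X parts
sockets = 2

per_socket = ccds_per_socket * (base_l3_per_ccd + v_cache_per_ccd)
print(per_socket, "MB per socket")                    # 768 MB
print(per_socket * sockets / 1024, "GB per system")   # 1.5 GB
```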

Siemens Simcenter STAR CCM+ with AMD Milan X

Siemens Simcenter STAR CCM+ is a multi-physics platform that is used in many engineering shops.

Siemens Simcenter STAR CCM+ Multi-Physics Platform

Engineers use it to see how systems will work without having to build physical prototypes.

Siemens Simcenter STAR CCM+ Description

Here, on Microsoft Azure, Siemens is seeing super-linear scaling. We will quickly note that this is not an example from Microsoft's slide above, nor from AMD's; it is net-new.

Siemens Simcenter STAR CCM+ With AMD Milan X Speedup

We are going to see more of these, but note that Siemens is talking about the AMD EPYC 7V73X here. That is Microsoft Azure's custom Milan-X SKU, which is why it is not one of the four SKUs we listed above.

Altair AcuSolve and Radioss with AMD Milan X

Altair is an application provider in the simulation space. AcuSolve is for problems like simulating airflow.

Altair AcuSolve CFD

Altair shows that simply flipping the switch to enable 3D V-Cache (it can be disabled in BIOS) adds 5-40% more performance with AcuSolve.

Altair AcuSolve AMD Milan X Performance Boost
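If you want to verify which mode a system booted in after flipping that BIOS switch, one quick check (a minimal sketch, assuming a Linux host where cache index3 is the shared L3, which is the usual layout on x86) is to read the reported L3 size back from sysfs:

```python
# Minimal Linux sanity check: read the L3 size the kernel reports for cpu0.
# On Milan-X, each CCX should report roughly 96 MB with 3D V-Cache enabled
# (32 MB base + 64 MB stacked) versus 32 MB with it disabled in the BIOS.
from pathlib import Path

size = Path("/sys/devices/system/cpu/cpu0/cache/index3/size").read_text().strip()
level = Path("/sys/devices/system/cpu/cpu0/cache/index3/level").read_text().strip()
print(f"cache index3 is L{level}, size {size}")
```

`lscpu` also summarizes the L3 across all CCXs if you prefer a one-liner.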

The bigger gains come with Altair Radioss for crash simulation.

Altair Radioss Crash Simulation

Altair claims 10-80% increased performance.

Altair Radioss Crash Simulation AMD Milan X Performance Boost

For those wondering, "Neon" is a lower-complexity model of a Dodge Neon automobile. As a smaller model, more of it fits into the cache, so there is an 80% speedup there. The Ford Taurus example is much larger, so less of it is being cached, which is why the additional cache only yields a 10% speedup.
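A rough rule of thumb for where a given model lands on that spectrum (our sketch, with hypothetical working-set sizes rather than Altair's actual meshes) is to compare the hot working set to the 768MB of L3 per socket:

```python
# Rule-of-thumb check: how much of a model's hot working set can live in
# the 768 MB of L3 per Milan-X socket. The sizes below are hypothetical,
# not Altair's actual Neon/Taurus meshes.
L3_MB = 768

def cache_fit(name: str, working_set_mb: int) -> None:
    fraction = min(1.0, L3_MB / working_set_mb)
    print(f"{name}: ~{fraction:.0%} of the hot data fits in L3")

cache_fit("smaller crash model", 600)    # fits entirely, expect larger gains
cache_fit("larger crash model", 6000)    # mostly DRAM-bound, smaller gains
```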

12 COMMENTS

  1. This is excellent. I’m excited to get a r7525 with these and try them out. I sent this to my boss this morning and he OK’d ordering one so we can do profiling on our VMware servers

  2. @cedric – make sure you order it with all the connectivity you’ll ever want. Dell has been a bunch of [censored] when we’ve opened cases about bog-standard Intel X710 NICs not working correctly in our 7525s. So much for being an open platform.

    Not that I’m bitter.

  3. Now that the 7003x is “shipping”, perhaps they can get around to shipping the 7003 in bulk. I’ve got orders nearly 9 months old.

  4. While per-core licensing costs seem to be a consideration for some people, I think this kind of optimisation is only possible because certain proprietary licensing models need updating to account for modern computer hardware. Given the nonlinear scaling between frequency and power consumption, it appears environmentally backwards to base hardware choices on weird software licensing costs rather than performance per watt or something similar that neglects arbitrary licensing constraints.

    On another note, NOAA open sourced their weather forecasting codes a few years ago and WRF (based on models developed by NCAR) has been open source for much longer. I think the benchmark problems associated with these applications would make for an interesting journalistic comparison between new server CPUs with larger cache sizes.

  5. @Eric – Environmentally backwards, possibly, but so often the hardware platform is the cheapest part of the solution – at least in terms of capital costs. I don’t think it’s necessarily unreasonable to optimize for licensing costs when the software can easily dwarf the hardware costs–sometimes by multiple orders of magnitude. To your point though, yes, the long-term operational expense, including power consumption, should be considered as well.

    The move to core-based licensing was largely a response to increasing core counts – per-socket licensing was far more common before cores started reaching the dozen+ level. Hopefully you’re not advocating for a performance/benchmark based licensing model…it’s certainly been done (Oracle).

  6. I find the speedups in compilation a bit underwhelming. My hunch is that the tests are performed the usual way – each file as a separate compilation unit. I work on projects with tens of thousands of C++ files and the build system generates files that contain includes for the several hundred cpp files each and then compiles those.

    When you have a complicated set of header files, just parsing and analyzing the headers takes most of the compilation time. When you bunch lots of source files together you amortize this cost. I guess in such scenario the huge L3 cache would help more than for a regular file-by-file build.
