Deploying AMD Instead of Arm in our Infrastructure in 2025: Here is Why

Feature Parity with Cloud Offerings

Hybrid multi-cloud is a hot topic these days, and it will continue to be one. The idea of running on-prem infrastructure to lower costs while tapping cloud providers for additional capacity and capabilities is only going to grow in importance. As new AI capabilities roll out, leveraging multiple cloud providers will matter. At the same time, even those AI applications can have payback periods of under 12 months when brought on-prem (or, realistically, into colocation).
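
To make that concrete, here is a minimal back-of-the-envelope payback sketch. Every figure in it is a hypothetical assumption, not a measured number:

```python
# Hypothetical payback-period sketch for moving an AI workload on-prem.
# All numbers are illustrative assumptions, not measured figures.

monthly_cloud_cost = 40_000.0   # assumed cloud GPU instance spend per month
server_capex = 250_000.0        # assumed up-front cost of on-prem servers
colo_monthly_cost = 5_000.0     # assumed colocation power/space per month

monthly_savings = monthly_cloud_cost - colo_monthly_cost
payback_months = server_capex / monthly_savings
print(f"Payback period: {payback_months:.1f} months")  # ~7.1 months here
```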

MikroTik CRS520 4XS 16XQ RM Annapurna Labs CPU Heatsink

Still, feature parity is a strange topic. Amazon has its own Graviton chips that are only found in its cloud. Companies like Oracle have their Ampere Altra and AmpereOne instances. Some other providers have a mix of Ampere Altra and custom Arm processors. Each option has very different capabilities. If you have heavy floating-point applications, those are not Ampere's design optimization point. If you want simple features many take for granted, like nested virtualization, then you do not want to run on an Ampere Altra (Max) platform.

Cloud providers say that these chips cost less, but that somewhat side-steps the way hyper-scalers negotiate pricing. Hyper-scale clients are sophisticated enough to take silicon die area, build models with yield and the cost to fab the silicon, add costs for things like packaging, and come up with a cost to produce a chip. They then grant a margin to silicon providers, and that is the price they buy at. Compare this to the high-list-price, high-discount model of enterprise sales, and one can imagine why hyper-scalers usually get great pricing. On the other hand, in building these models, the only difference might end up being the margin percentage of building chips themselves versus the margin they will accept from other providers. A toy version of that model is sketched below.
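
Here is that toy model; the wafer cost, die area, defect density, and accepted margin are all made-up assumptions for illustration:

```python
# Toy silicon cost model of the kind hyper-scalers are described as building.
# Every input here is an assumed figure, not real pricing data.
import math

wafer_cost = 17_000.0      # assumed cost per 300mm wafer
wafer_diameter = 300.0     # mm
die_area = 400.0           # assumed die area in mm^2
defect_density = 0.001     # assumed defects per mm^2
packaging_cost = 30.0      # assumed packaging/test cost per chip

# Simple gross dies-per-wafer approximation with edge loss
radius = wafer_diameter / 2
dies_per_wafer = (math.pi * radius**2) / die_area - (
    math.pi * wafer_diameter
) / math.sqrt(2 * die_area)

# Classic Poisson yield model
yield_rate = math.exp(-defect_density * die_area)

good_dies = dies_per_wafer * yield_rate
cost_per_chip = wafer_cost / good_dies + packaging_cost

# The hyper-scaler then adds the margin it is willing to give the vendor
accepted_margin = 0.30     # assumed
buy_price = cost_per_chip * (1 + accepted_margin)
print(f"Modeled cost: ${cost_per_chip:.0f}, buy price: ${buy_price:.0f}")
```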

AMD EPYC 9005 SKU Stack 2024 10

Realistically, the discounts we see in hyper-scale cloud pricing for Arm processors serve a different purpose. Baked into cloud instance pricing is not just the price of the chips. Cloud providers know there is an attach rate of other services with every instance. If a web application is built in a cloud using a compute instance, it often has storage attached, backup storage, cloud egress bandwidth, and so forth. So landing a compute instance means a cloud provider can sell more services around that instance, as the quick sketch below illustrates.
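
Every figure here is purely illustrative:

```python
# Why a discounted instance can still be lucrative: a toy attach-rate model.
# All revenue and margin figures are assumed for illustration.

instance_revenue = 100.0          # assumed monthly instance price
instance_margin = 0.10            # assumed thin margin on discounted compute

attached_services = {             # assumed monthly attach revenue and margins
    "block storage": (40.0, 0.50),
    "backup storage": (20.0, 0.60),
    "egress bandwidth": (30.0, 0.80),
}

profit = instance_revenue * instance_margin
profit += sum(rev * margin for rev, margin in attached_services.values())
print(f"Monthly profit per instance incl. attach: ${profit:.2f}")
# The compute itself contributes $10; the attached services add another $56.
```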

5th Gen Intel Xeon SKU List

That brings us to another benefit for cloud providers. Absent any real enterprise Arm hardware for companies to repatriate instances to, there really is no legitimate way to download an instance image and turn it on in a server that you bought from a major vendor on-prem. If you are on x86, this is less of an issue since there are plenty of options to run on-prem. Cloud providers know this, so Arm becomes a Hotel California proposition. Indeed, if you are running on AWS Graviton, then you might be able to get an image running on an AmpereOne or Altra (Max) instance, but performance varies to the point that you might have to spend time analyzing that. This is different from the x86 side, where you can simply buy the same generation of server (or a newer one) as the cloud provider is running.

Intel Xeon 6780E Sierra Forest SP And Xeon 6781P Granite Rapids SP 1

The closest x86 parallel, in a way, is the new Intel E-core CPUs: if you have a heavy FP workload and move from a P-core instance to an E-core instance, the application might still work, but performance may be very different. STH readers might throw a flag here, since Intel has two types of P-cores in Xeon 6 as well, with the Xeon 6700P and 6900P (and the Xeon 6 SoC), and then another P-core that does not support AVX-512. At the same time, you can go buy servers with all of those options, so it is a bit different. A quick way to sanity-check what a given core actually supports is sketched below.
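
This is a minimal Linux-only sketch that parses /proc/cpuinfo; the helper is hypothetical, and note that Arm systems report a "Features" field there instead of "flags":

```python
# Check which ISA features the cores you landed on actually expose,
# before assuming code paths like AVX-512 will light up.

def cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Return the feature-flag set reported for the first CPU (Linux, x86)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()  # e.g., on Arm the field is named "Features" instead

flags = cpu_flags()
for feature in ("avx2", "avx512f", "avx512_vnni", "amx_tile"):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")
```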

Software Support

Software-wise, the world is divided into two strong and different categories: "cloud-native" applications and licensed enterprise packages. We started reviewing Arm servers when they became usable with the original Cavium ThunderX (now Marvell) servers in April 2016. If you had asked me then, I would not have guessed that Arm would still be fighting for the mainstream almost a decade later.

On the cloud-native side, if you want to run WordPress, for example, on an Arm application stack, it is really easy these days. With containers and the maturity of these applications on Arm, life is pretty simple. In the enterprise, on the other hand, the push is just not there, and that somewhat makes sense.
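
As a rough illustration of how easy the checking has become, here is a sketch that asks a registry whether common images publish linux/arm64 variants. It assumes the docker Python SDK (pip install docker) and a reachable Docker daemon; treat the exact calls as a sketch rather than gospel:

```python
# Verify an image publishes a linux/arm64 variant before committing
# to Arm nodes. Requires the docker SDK and a running Docker daemon.
import docker

client = docker.from_env()
for image in ("wordpress", "mariadb", "redis"):
    # Query the registry's manifest data without pulling the image
    data = client.images.get_registry_data(image)
    arm_ok = data.has_platform("linux/arm64")
    print(f"{image}: arm64 {'available' if arm_ok else 'missing'}")
```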

If enterprises cannot buy Arm servers, then they cannot deploy them. If there is no Arm installed base, then existing applications will not be ported to Arm. With neither a short-term Arm server answer nor an installed base, there is not a huge push behind the architecture.

It is a cycle. If you are an ISV looking at which architectures to support, x86 is a must because it is the majority of the market. Beyond that, it is hard to get excited about porting and supporting software on Arm, or RISC-V for that matter, beyond IoT and edge use cases. Putting it in a different light: if low-volume, harder-to-source platforms were easy to port to and support, then IBM POWER would likely be a strong #2 in the market, since it is a well-known architecture with a stable customer base accustomed to spending money. Still, a huge library of enterprise software is out there that ISVs do not support on POWER. Arm has the advantage of volume over POWER, but there are parallels that are hard to ignore.

That cycle is vicious. Without software support beyond cloud-native applications, why would I ask my server OEM to make and sell me an Arm server? Without those servers deployed, why would an ISV care about supporting Arm? The answer might be vendor lock-in with a cloud provider, or NVIDIA's push to sell full-stack solutions in the style of IBM Z. In either case, with almost a decade of using Arm servers, hearing from OEMs and customers in the market with them, and watching the dynamics play out, I have become less confident that this cycle fixes itself on the pull of potentially saving a few watts serving web pages.

Licensing is Hard

That brings me to my biggest point: licensing. Arm server vendors are happy to talk about cloud-native applications because those usually do not have a license fee attached. When they do, many of those businesses run on a supported-node basis or something similar.

Let us, however, say that you are an enterprise and, as many do, you run Microsoft Windows Server. For this hypothetical, assume you could get a supported Windows Server on Arm build for on-prem use. You would then need to license it, and currently that is licensed on a per-core basis. Features like SMT and maximizing performance per core are much more valuable for per-physical-core licensed products, as we went into in our recent virtualization piece. If you are paying on a per-core basis, most would strongly prefer one SMT core that performs as well as two or more lower-power cores. A rough sketch of that math follows.
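
Here is that preference in back-of-the-envelope form; the license fee and the relative per-core performance figures are assumptions, not benchmarks:

```python
# Back-of-the-envelope per-core licensing math with assumed inputs.

license_per_core = 1_000.0         # assumed annual fee per licensed core

x86_smt_core_perf = 1.0            # one fast SMT core (normalized)
arm_small_core_perf = 0.55         # assumed slower, lower-power core

target_perf = 64.0                 # total throughput we need (normalized)
x86_cores = target_perf / x86_smt_core_perf
arm_cores = target_perf / arm_small_core_perf

print(f"x86: {x86_cores:.0f} cores -> ${x86_cores * license_per_core:,.0f}/yr")
print(f"Arm: {arm_cores:.0f} cores -> ${arm_cores * license_per_core:,.0f}/yr")
```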

AMD Pensando Elba DPU 100G And 25G 2

That problem is not limited to Microsoft's licensing. Take VMware, for example. We showed VMware ESXio running on the AMD Pensando DPU more than two years ago now. While that is a supported model, running VMware on Arm is still mostly a fling. Nick had a piece for STH in 2020 on running VMware's fling on a Raspberry Pi. In 2021, Tom Fenton's and my book on running VMware ESXi on Arm on a Raspberry Pi came out (see: Running ESXi on a Raspberry Pi). In June 2025, as I write this, good luck getting that running for production use on an Arm server. Given Broadcom's changes to VMware licensing, it would be hard to muster any desire to run VMware on Arm on cost grounds alone.

AMD Pensando DSC2 100 100G 2P QSFP56 DPU VMware ESXio 8.0.1 Enable ESXi Shell

With basic low-level components lacking licenses that are effectively usable with the Arm hardware on the market, higher-level applications do not fit either. Would you want to license CFD software on a per-core basis if each core were not the fastest it could be? Even per-socket licensing would still fall in favor of x86, with the AMD EPYC 9005 "Turin" hitting 192 cores and 384 threads. ISVs would need to create licensing schemes that account for per-core performance to make Arm attractive, and that is a slippery slope when hyper-scalers offer everything from older Ampere Altra Arm cores to newer custom cores. The sketch below shows why.
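
And the per-socket version. The Turin core count comes from the article; the license fee and the per-core performance scaling are assumed, and the Arm socket is hypothetical:

```python
# Per-socket licensing sketch: license cost per unit of delivered
# performance, with assumed pricing and performance scaling.

license_per_socket = 10_000.0          # assumed annual fee per socket

platforms = {
    # name: (cores per socket, assumed relative per-core performance)
    "AMD EPYC 9005 Turin": (192, 1.00),
    "Hypothetical Arm socket": (128, 0.70),
}

for name, (cores, per_core) in platforms.items():
    socket_perf = cores * per_core
    print(f"{name}: ${license_per_socket / socket_perf:.2f} per perf unit")
```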

VMware VSphere Client With NVIDIA BlueField 2 DPU And ESXi 8.0 Host New VM UPT Activated Post Driver

If you did have per-core or per-socket licensing that worked for Arm applications on a server you could actually buy, the next challenge arises: what would you do with it? Would you use the same license on lower performance-per-socket and per-core Arm, or on higher performance-per-socket-and-core EPYC (and in some cases Xeon)? Better yet, if you run a licensed application on VMware ESXi, then you have to figure out layers of licensing and where to place workloads between x86 and Arm servers. If you have not had a nightmare about licensing yet, then apologies if you have one after reading and pondering that point.

Across those five vectors, it is simply hard for enterprises to deploy Arm servers in 2025. That brings us to our decision.

9 COMMENTS

  1. While software is key, it’s an interesting observation that current ARM hardware is not attractive enough to motivate further software development.

    ARM had its chance when the Raspberry Pi craze put a small ARM development system on every software engineer’s shelf while Fujitsu and Nvidia started building systems with competitive performance. Unfortunately Nvidia’s bid to take over ARM with a well capitalised development team was rejected on political grounds, AmpereOne underperformed, the Raspberry Pi craze faded and ARM sued Qualcomm for breach of license. Given how ARM intellectual property appears impossible to sell, the only recourse was for SoftBank to purchase Ampere. The above chaos suggests an uncertain future and missed opportunity for ARM.

    On the other hand, IBM Power has no entry level hardware, no new customers and as a result few independent developers. It’s possible OpenPower will lead to cost competitive hardware ahead of RISC-V. It’s also possible neither will succeed and Loongson with LoongArch will emerge as the next dominant architecture.

    Yesterday the enterprise solution was System z, today it is x86 and tomorrow has not yet arrived.

  2. If ARM were in the NVDA stable, as obnoxious as that sounds, it would have a brighter future, with software development forced along in the pursuit of AI. ARM has for decades chased efficiency rather than raw performance. And currently it’s close on performance, close enough that it would take off if it weren’t for per-core pricing and migration headaches between architectures. The answer from ARM’s stable should be an extension of the architecture for performance gains. That means bigger silicon, losing the efficiency that got it into mobile, and splitting the designs between mobile and performance parts.

    LoongArch/Loongson would have an even larger uphill battle for adoption in the enterprise, even more so in the ‘west’, having all the caveats of ARM and RISC-V but also the fallout of political/tariff issues as well. Apple’s ARM architecture will continue to be the prime competitor from an architectural standpoint. On compiler builds per architecture, it probably pits Apple ARM M chips vs. x86; nothing else comes close today.

    I really like RISC-V, but its open nature will mean fragmentation of designs. I don’t think it would be adopted server-side any better than ARM, and probably worse.

    Also, RPis are everywhere still, and I don’t think they are moving to RISC-V anytime soon.

  3. I think most arguments are not relevant. Regarding software, the Linux stack, with its open software in the distributions’ repositories, overwhelmingly runs on ARM, which covers the most common server use cases.
    Things like nested virtualization and proprietary software (Oracle, etc.) exist, but today they do not comprise the majority of use cases.

    My argument is that the thing that matters most is total cost per performance unit. On AWS, ARM is slowly eating into the market share, currently at 25% of the total and rising.
    I don’t see this trend changing anytime soon, and other hyperscalers will follow. People/companies self-hosting/colo-hosting these days are not early adopters and will follow over time.

  4. WTF do you need all that computing muscle for ?
    Do you have a massive operation and STH is just hobby for you guys or what ?

  5. @Patrick Do you have any idea why there are no tower servers with EPYCs from any of the major OEMs (Dell, Lenovo, HPE)? The tower server market is a bit niche, but it’s also a very useful option when you don’t have dedicated server rooms or cabinets. Unfortunately, all of these options are Intel-only. Any ideas?

  6. That’s AWS. They’re discounting to lock companies into their cloud. If you’re running enterprise IT, you’re running on x86 today b/c you’ve got many apps that don’t run on Arm. Outside of AI spend, the hip thing to do today is to move off of cloud into colo. Companies that are still cloud-only have weak IT departments that don’t have the skills to do it themselves because they’ve got weak CIOs. I work at a large F500 company, and our payback for moving workloads off the public cloud was under 7 quarters. The workloads we moved off were the result of our previous trend-following CIO, who wanted to sound like they were doing something on trend, but was just putting IT on autopilot without adding skills to our team.

    Public cloud is great if you need burst, or if you need something so fast that you can’t do it yourself yet. If you can do it yourself, then it isn’t just about the instance pricing, and it’s a lot more expensive once they’ve gotten you locked into their platform.

    I have over 200 people working for me. If one of them stood up and said we need to add Graviton because it’s cool, I’d coach them to find a new job.

  7. For enterprise IT with established on-premise datacenters, hybrid cloud (whatever that means) is the sensible approach. For me hybrid cloud implies the same or similar infrastructure in the cloud is also available locally and provides flexible resilience as well as a lever when negotiating both on and off premise prices.

    As discussed in the article, ARM is not great for a hybrid cloud strategy because on-premise Altra and AmpereOne servers are slower than Amazon Graviton and Microsoft Cobalt. As also mentioned, since it’s difficult to migrate valuable legacy software to ARM, an enterprise with existing datacenters ends up with a long-term combination of x86 and ARM systems–yuck.

    For IBM shops the problem is reversed. Hybrid cloud is difficult because the major cloud providers–Amazon, Azure and Oracle–do not provide Power and System z instances. Given Amazon’s attempt to capture HPC and AI workloads, I’m somewhat surprised they haven’t sought traditional IBM workloads.

    I also wonder what Serve the Home does with all their servers when not evaluating them for a journalistic review. Practical use provides important insight and that’s what this article is about.

    While likely just a brainstorm, an independent test bed available for companies to compare competing hardware would be really useful, and Serve the Home has the stuff to do that. It’s another level to securely give people access to run their own tests, but doing so would illustrate additional aspects of the review hardware.

  8. > “… there really is no legitimate way to download an instance image and turn it on in a server that you bought from a major vendor on prem. … but performance varies to the point that you might have to spend time analyzing that.”.

    Easy to say: just use “dd”, VMware migration, or HashiCorp Packer. Slightly harder to do: practice makes perfect. It’s not just the CPU (and this applies to x86 too); they’ve got the connectivity (and bigger pipes), more hardware in many cities to fail over to (which you can manage from the home or office, for a monthly fee), people available 24 hours, and you can reconfigure or move quickly and scale up huge for an event – all things difficult to do from the office.

    It’s never one thing, with one thing being the best. It’s frequently several things all working together extremely well, maybe almost perfectly. Even if a few of those things aren’t the ‘best’ (and x86 isn’t far from it for most people), it’s that all the things just work; there’s no tripping point or wall or unexpected goal-post movement.

  9. @Vincent S. Cojot: The Lenovo ST45 V3 tower server features the AMD EPYC 4000 series CPUs. As you call out, the tower form factor is ideally suited for deployments where you don’t have a dedicated server room or cabinet. It is a compact tower and currently supports up to 12 cores (16-core options that can optimize Windows Server licensing are coming soon).
