HPE The Machine reaches 160TB Using Cavium ThunderX2

HPE The Machine May 2017

Today HPE The Machine saw some additional disclosures. For those not familiar with the project, The Machine is HPE’s vision for the future of computing. With the new infrastructure, HPE hopes to move more data into memory and then perform in-memory computation for faster big data analytics. Today HPE came out with two intriguing insights: a 160TB memory capacity and Cavium ThunderX2.

New HPE The Machine Technical Specifications

From HPE’s tech specs on the new prototype:

  • 160 TB of shared memory spread across 40 physical nodes, interconnected using a high-performance fabric protocol.
  • An optimized Linux-based operating system (OS) running on ThunderX2, Cavium’s flagship second generation dual socket capable ARMv8-A workload optimized System on a Chip.
  • Photonics/Optical communication links, including the new X1 photonics module, are online and operational.
  • Software programming tools designed to take advantage of abundant persistent memory.

(Source: HPE)

Commentary on HPE’s New Disclosures

At STH, our Editor-in-Chief Patrick is a big believer in Cavium’s architecture. He calls it “the first usable ARM architecture for general purpose data center compute.” A major point he makes is that Cavium was first to market with multi-socket ARM systems offering usable memory bandwidth and capacity. You can see more of his published thoughts and performance findings on the original ThunderX here. STH also covered the Cavium ThunderX2 in the context of the Microsoft OCP platform.

The 160TB of memory capacity is clearly spread among multiple nodes. Using the optical communication links, HPE is able to stitch node-local memory into a single large pool. From the release: “HPE expects the architecture could easily scale to an exabyte-scale single-memory system and, beyond that, to a nearly-limitless pool of memory—4,096 yottabytes.” Being able to store active data in persistent memory is where the industry is heading. Intel Optane memory is another clear step in this direction. Persistent memory does create challenges for software designers over the current paradigm, so HPE is developing tools to address those challenges.
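To illustrate the kind of challenge persistent memory poses for software, here is a minimal sketch using a memory-mapped file as a stand-in for byte-addressable persistent memory. This is not HPE’s tooling or API; it is a simplified analogy (real NVM programming involves CPU cache-line writebacks and fences rather than `msync`), but it shows the core issue: a store that has “happened” from the program’s point of view is not durable until it is explicitly flushed.

```python
# Sketch of the persistence-ordering problem software must handle on
# memory-centric systems. A memory-mapped file stands in for
# byte-addressable persistent memory (a simplification).
import mmap
import os

PATH = "pmem_demo.bin"  # hypothetical backing file, not an HPE interface

# Create and size the "persistent" region.
with open(PATH, "wb") as f:
    f.truncate(4096)

# Map it and update it in place, byte-addressably, like ordinary DRAM.
with open(PATH, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:5] = b"hello"
    # Without an explicit flush, a crash here could lose the store even
    # though the write already completed from the CPU's point of view.
    mm.flush()  # analogous to a cache-line writeback plus a fence
    mm.close()

# After the flush, the data survives the process: reopen and verify.
with open(PATH, "rb") as f:
    readback = f.read(5)
os.remove(PATH)
print(readback)  # b'hello'
```

The design point is that durability becomes an explicit ordering obligation on the programmer, which is exactly the gap HPE’s software tools aim to close.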

We are still some time away from the commercial release and introduction of The Machine, but what HPE has going looks promising.


  1. It would be great to see some ThunderX2 benchmarks here at STH. The original ThunderX was already very promising.

  2. So, any mention of what type of memory “The Machine” uses? I have not heard much about their effort in memristors for years now (and not from their partners either, not even PCM), so are they using NVDIMMs currently?

    So if they actually are using NVDIMMs, then what is really the difference between “The Machine” and a cluster of servers? The difference seems to be only the fabric and software that create a unified memory pool?

    So “The Machine”, in its current state, is closer to a regular server with a custom fabric and interconnects (since it uses aggregated nodes just like all other servers) than what the original “The Machine” was touted to be when it was first announced years ago.

    Also, a second point: since it is aggregated nodes that create a shared memory structure, what about node or even rack failures? Is everything stored in memory with 1+n copies, or does it use some sort of Reed-Solomon code to guard itself from failure modes?

    Sorry to sound like a huge complainer, I just find it interesting and I am curious how they have solved some of these issues.

  3. I was at HP Discover last year and spent a bit of time talking to the team there about The Machine. At least at that time, the emphasis was on memory-centric computing and the interconnect. CPU and GPU were secondary, just bolt-ons. The Machine is agnostic to CPU or memory technologies. Also, without a doubt, they were using existing tech (like DRAM) for the time being, to allow them to make progress on the software and photonics interconnect. As new memory options become available, they are simply swapped in. I think the point he made to me was something along the lines of “don’t worry about what type of memory/CPU/GPU it is using, otherwise you are missing what is important/exciting about The Machine.”

  4. Sounds like a somewhat extended NUMA system where you can access memory not only from other CPUs but also from other nodes in the cluster over the optical interconnect (somewhat like RDMA).
