
Interview with Alan Chang of Inspur on OCP Regional Summit 2019

September 29, 2019

At last week’s OCP Regional Summit 2019 in Amsterdam, Inspur announced a new set of products, solutions, and initiatives. I had the opportunity to sit down with Alan Chang, deputy GM for Inspur’s server business, to discuss the new announcements. STH already covered some of them in Inspur OAM UBB Sets New Accelerator Platform Standard.

Over the past few years, Inspur has become increasingly active in the Open Compute Project. This coincides with many of its customers, including Baidu, also moving to OCP. I wanted to provide some insight beyond the press release, both on what Inspur announced and, more importantly, on why Inspur is investing in these key areas. About two months ago, I had the chance to do a joint interview with two CTOs, John Hu of Inspur and Bill Carter of OCP. You can read more about that here: STH Interview with Bill Carter of OCP and John Hu of Inspur.

Interview with Alan Chang of Inspur

Patrick Kennedy (PK): Could you give our readers a bit on your background?

Alan Chang (AC): Sure. I’m the deputy GM for the server business at Inspur. I’ve been working in servers for the last 10 years. I have worked in a range of roles from software developer, to system architect, to where I am right now, product marketing for servers.

PK: Could you give us a quick overview of Inspur’s participation in OCP, and why you are participating in the OCP event in Amsterdam?

AC: We see a lot of value in participating in OCP. Obviously, as a vendor, we want to work on various technologies, including the OCP platform. The OCP platform gives us access to a lot of end users’ problems, and we then try to create solutions for them.

Second, there are a lot more members involved in silicon development, and they have a lot of technology they want to bring to the market. Big companies such as Intel and NVIDIA have the resources to work with suppliers like us to bring technology to the market. However, there are a lot of startups, right? As we know, AI is a growing trend in this new computing infrastructure. A lot of startups do not have the opportunity or the resources to actually access the market. We use OCP as a platform to access new technologies, not only from the big silicon developers but also from the small ones. I think that is a great opportunity for everyone.

Open Accelerator Infrastructure

PK: Great. Moving into the big announcements for next week, the first one I want to talk about is the OAM UBB. Could you give us a little background on your perspective on the Open Accelerator Module (OAM) and why you’re participating?

AC: Sure. When we first heard about OAM, we felt it was absolutely a great idea. The PCIe standard has been around for many, many years. Right now it’s PCIe Gen3, and everyone’s talking about PCIe Gen4. But the peripheral form factor has stayed the same for a while: a small form factor card, half-height/half-length, full-length/double-width, or others. A couple of years ago, NVIDIA showed its SXM2 modules for acceleration, which was mind-blowing for the industry.

I think the OCP community saw that as one part of the solution. OAM was then created to give silicon companies an equal opportunity to bring their products to customers. OAM allows end users to pick the best technology for their needs rather than picking a lot of different proprietary solutions and then spending a lot of money trying to figure out how it all fits together. With an open module, customers do not need to pick a proprietary platform, and companies do not have to develop one.

Intel was really aggressive on this within OCP from the start. That’s why the first OAM mock-up that we are demonstrating at the OCP Summit in Amsterdam is going to be based on Intel silicon.

PK: So it is Nervana?

AC: It is Nervana, yes. Beyond that, in the last couple of months we have seen a lot of new startups. Of course, major silicon companies including NVIDIA, AMD, and others are also participating in the OAM ecosystem. Again, I think this is a great opportunity for everyone to have better technology and for newcomers to compete at the same level as the more established companies. With OAM, the best technologies can win without having to re-invent a platform.

PK: Are we going to start seeing OAM adopted more in the PCIe Gen4 era, so really the 2020 timeframe, or are we going to see that more in the CXL and PCIe Gen5 timeframe?

AC: For me, I think it’s a little bit uncertain at this point, to be honest with you. Right now we are developing everything based on PCIe Gen3 because that’s the current technology we have. I think most of the people we talk to are also trying to figure out not just the PCIe protocol, but also which silicon they want to use down the road.

For mass deployment, I would say probably somewhere around 2020. It really depends on how PCIe Gen4 and then CXL compete in the ecosystem. But from what I see today, in 2019, we want to make OAM available for companies to actually test it out, in terms of the physical dimensions, the technology, and the silicon itself, and then see which one wins.

PK: Can you tell us about your work on the OAM Universal Baseboard (UBB)?

AC: Sure, absolutely. We have been participating in UBB and OAM from very early on. It was great that Inspur’s product management made the decision to participate this early. We have been working very closely with the OCP project leads, and we are going to be one of the leading suppliers for OAM.

There are multiple topologies that can be formed with OAM modules on the UBB. We picked one that we believe really provides a benefit to the end customers we talk to. We know there are other suppliers working on different topologies, and I think that’s also great, right?

Again, at the end of the day, better technology wins. So at this OCP Summit, we want to demonstrate OAM in a joint development with Baidu, which is X-MAN 4.0. We have worked on X-MAN 1.0, 2.0, and 3.0, so 4.0 is the fourth generation. We see it being adopted very soon by Baidu, and hopefully we’ll see adoption with other customers as well.

Baidu X-MAN 4.0

PK: That is a great segue into X-MAN 4.0. Can you tell us a little bit about it and what makes it special?

AC: X-MAN is based on ODCC. ODCC is a standards organization in China, very similar to what OCP is doing in the United States today. Baidu was already participating in ODCC because, with its massive deployments, it wants to be more effective and more efficient.

Baidu is also part of OCP, right? So while ODCC is what they’re using right now, they are also aggressively talking to OCP participants and members. What I’m trying to figure out is how to merge ODCC and OCP. Even though X-MAN 4.0 is still an ODCC standard or form factor, we found we could incorporate OCP. The Universal Baseboard can be leveraged and reused in the ODCC rack, Open Rack, the Project Olympus rack, and even the Open19 rack. So the board itself is going to be very flexible. You just have to put it in a different enclosure with the appropriate power delivery and cooling.

Note: STH previously covered the Baidu X-MAN Liquid Cooled 8-Way NVIDIA Tesla V100 Shelf, and Inspur OAM UBB Sets New Accelerator Platform Standard.

Inspur I-Flex and PCIe Switched Infrastructure

PK: Regarding the I-Flex, could you give us an idea about what your customer uses it for, and what your involvement is?

AC: I-Flex is based on a PCIe switch topology. It used to be that a server had only a few PCIe devices, such as a NIC or a RAID card. Now we are seeing NVMe drives, GPUs, and other devices in different form factors. We have a hyperscale customer that is preparing to attach many PCIe devices to servers. They are asking what the best and most efficient way is for multiple CPUs or hosts to connect to a device, or to utilize all of these expensive PCIe devices more effectively.

I-Flex is based on the idea of having centralized PCIe switching within the rack. There might be one switch or two, depending on how many PCIe peripherals you want to attach. You then have more of a centralized box that connects to all these different PCIe devices and pools the resources. I think that’s a great idea. We designed I-Flex based on our customer’s specifications. The board itself is going to be publicly available, and we’ll continue contributing so it can be reused in various platform enclosures.

PK: So your customer is moving towards a more disaggregated infrastructure, taking PCIe devices outside of the traditional server chassis and pooling them across a number of servers?

AC: Yes, I think that’s what all the hyperscale companies are trying to do. This customer has a lot of low-latency applications, so they are a little more aggressive about moving to PCIe for this compared to other protocols. So yes, I absolutely agree on the move toward disaggregating and sharing devices.

For now, it is this particular hyperscaler that is using this approach. Everyone is trying to do the same thing but their approaches are a little bit different. Sometimes companies may put the PCIe switch inside an enclosure to do external PCIe switching to servers attached to it. The approach with the I-Flex is to have PCIe switches externally at a rack level, providing a different layer of flexibility. I think that’s a great idea and great approach.

PK: When does that move beyond the hyperscale guys and into enterprises, hosting companies, startups? When does that transition happen?

AC: I think adoption by traditional enterprises could be a little slower. Beyond the hyperscalers, which vertical or market could adopt this next? I think it will be the telco 5G space because, as you can imagine, telcos have many offices and cellular locations. With 5G infrastructure, space is also constrained, yet applications still have to run on the 5G network. Having the flexibility to change the configuration of PCIe-attached devices is a useful capability. Dynamic allocation of PCIe devices gives 5G providers more flexibility in a constrained space, so they can do a lot more. I think that’s the next highly motivated vertical to actually use this technology.

OpenRMC Project

PK: Moving to OpenRMC. Could you give us a quick overview of the project, why Inspur has been involved for some time, and what the next steps are?

AC: We have all heard about open-source management, right? The first, obvious project is OpenBMC, which is for individual boxes. At Inspur, given the volumes we deliver and how we engage with hyperscale customers, everything is developed at the rack level. It’s great to have an individual BMC running OpenBMC on a single box, but everyone wants simplified management across racks. You can see that OCP and ODCC have centralized power supplies. ODCC is even a little more aggressive, with centralized fans and cooling.

All of these components outside the individual nodes need to be managed and controlled. When we saw the OpenBMC concept, we also felt there was a need for control at the rack level, with OpenRMC. That’s why we kicked off the project in the community. We spoke to Bill Carter, CTO of OCP, saying, “Hey, we feel like there’s a need for a rack-level solution; can we start a subgroup?” Bill absolutely saw value in it as well, so that’s why we created OpenRMC. Since we released the first 1.0 spec in August 2018, we have gained momentum. We are planning version 2.0 for 2020, so I would say it’s still not 100% complete at this time. It’s an open-source software project, so we want to give this idea to the community, and hopefully there will be more participation.

PK: Does OpenRMC end up being limited to OCP in Inspur’s ecosystem? Or is OpenRMC something that Inspur will build into its other platforms and management toolsets?

AC: That’s a great question. Inspur has a history of creating a lot of complex systems, including entire proprietary rack systems, and we also have proprietary servers, right? What we see is that OCP is going to be a platform for collecting all this information and all these requirements.

At the same time, we’re taking all the knowledge that we learn and create in OCP and implementing it in standard 19″ form factors. So our idea is that OpenBMC, OpenRMC, and open servers are going to be the future. That’s where the industry is driving. I have to guesstimate here, and it’s not a commitment, right? In 2020, we are still going to support the proprietary sets and solutions, but soon we’re going to create a different, parallel set for all the platforms we have. We should be able to offer proprietary and open management sets at the same time. I think that’s going to be great for giving customers different options. That’s the goal for sure.

OCP Inspired or Accepted Products

PK: On the OCP accepted or inspired products, you have had a couple so far. Do you want to talk about the new ones for the OCP Regional Summit 2019?

AC: At the beginning of this year, we made a commitment to ourselves that we would deliver at least four to five different platforms to the OCP market.

In March of this year, at the OCP Summit 2019, we worked jointly with Intel to announce the first-ever four-socket 2U 19-inch form factor pizza box. It has been really well received. A lot of people were saying they wanted to try a four-socket platform, but those were always much larger, not the 2U we delivered. There are a lot of customers doing POCs, and we have delivered a lot of systems since then. We finalized that one and contributed the files and everything to the OCP Marketplace, so you can order it from the OCP Marketplace today.

Note: STH reviewed this platform in Inspur Systems NF8260M5 4P Intel Xeon OCP Server Review

The others that we contributed this year include a GPU server and a storage server based on Open Rack. Let me tell you why we did this. Even though OCP is open, what we see is that most suppliers are just building exactly what Facebook might buy today, right?

Since everyone is building for Facebook, there are fewer derivatives or alternative form factors available for the wider community to choose from. We talked to various end users, and they like what Facebook is doing, but sometimes their applications aren’t equivalent to what Facebook runs as a social network. Different applications might require a different GPU topology, right? There are also different ways that storage can work. That’s why we created these two products.

We want to create alternative options, not dictate which is better. Different usage models have different requirements, right? It’s the same with 19-inch racks today, because there are practical rack constraints in data centers. We just want to help the marketplace and the OCP community by developing more options.

The last contribution we have is actually another four-socket platform project. As you can imagine, we’re leveraging a lot of the different four-socket motherboards we have internally and designing them for OCP, because that helps drive higher quality and lower cost for the customer.

Everything we have contributed to the OCP Marketplace is something we already ship in huge volume. For OCP, we want to deliver something really solid and reliable, cost-effectively, to the customer. Hopefully, customers will see this and consider Inspur as one of their suppliers.

Final Words

I wanted to thank Alan and the team at Inspur for arranging this interview. As Inspur and its traditional customers move into the OCP space, we are seeing major market impacts, such as Open Accelerator Infrastructure changing entire generations of hardware. This is certainly a key area where STH is watching for new industry developments.
