Inspur is one of the biggest, if not the biggest, AI and deep learning server vendors in the world. At Inspur Partner Forum 2019, the company claimed over 51% of the deep learning and AI server market in China. Earlier this year we had our interview with Liu Jun, AVP and GM of AI and HPC for Inspur. He graciously agreed to join STH again for our Deep Learning and AI Q3 2019 Interview Series. His is a great perspective, as he leads AI systems at the company leading the Chinese market. From IPF 2019, I can say that AI is a major focus of Inspur and its customers, ranging from Baidu, Tencent, and Alibaba to smaller organizations working on edge applications.
Inspur Focus on Deep Learning and AI Q3 2019
In this series, we sent major server vendors several questions and gave them carte blanche to interpret and answer them as they saw fit. Our goal is simple: to provide our readers with unique perspectives from the industry. Each person in this series is shaped by their background, company, customer interactions, and unique experiences. The value of the series lies both in the individual answers and in what they collectively say about how the industry views its future.
Who are the hot companies for training applications that you are seeing in the market? What challenges are they facing to take over NVIDIA’s dominance in the space?
The so-called Super 7 Internet companies, including Google, have either launched or are developing AI processors such as TPUs. They will be the most promising players in the training market, since they are themselves superusers and super developers of AI. Besides them, there are also a number of emerging AI chip companies. NVIDIA’s leading position in the AI market is not only due to the powerful performance of its GPU, but also the overall maturity of the GPU-based software ecosystem, which is a huge challenge faced by new chip manufacturers.
How are form factors going to change for training clusters? Are you looking at OCP’s OAM form factor for future designs or something different?
At present, training clusters come in various form factors, such as PCIe cards and separate heterogeneous acceleration modules. The PCIe card form factor can support up to 16 GPUs. As for the node form factor, rack servers are the mainstream. Heterogeneous acceleration modules external to the server enable greater vertical scalability as well as pooling and sharing of heterogeneous acceleration resources. This solution is more powerful and flexible, and is mainly used for large-scale and ultra-large-scale training scenarios.
The OAM specification was developed by the OAI team in the OCP community. The specification has gone through two revisions to date; the latest, v0.90, is based on the AI deployment practices of companies such as Baidu and Facebook.
As a member of the OAI team, Inspur is actively participating in the development of the OAM standard, which primarily aims to solve the standardization problem of acceleration modules. Current heterogeneous acceleration schemes are very complicated, and it often takes 6-12 months to design one. The OAM standard simplifies this task, effectively reducing the time to market and cost of innovative heterogeneous technologies.
What kind of storage back-ends are you seeing as popular for your deep learning training customers? What are some of the lessons learned from your customers that STH readers can take advantage of?
Given the data-intensive nature of AI training, we suggest that customers use a shared storage system built on high-performance NVMe to handle the heavy I/O load of reading massive sample data while training deep learning models.
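To see why shared NVMe-backed storage matters, it helps to estimate the sustained read bandwidth a training cluster actually needs. The sketch below is a back-of-the-envelope calculation; the sample size and throughput figures are illustrative assumptions, not numbers from the interview.

```python
# Rough estimate of the storage read bandwidth needed to keep GPUs fed
# during training. All figures below are hypothetical assumptions.

def required_read_bandwidth(samples_per_sec, avg_sample_bytes):
    """Sustained read bandwidth in GB/s implied by the training throughput."""
    return samples_per_sec * avg_sample_bytes / 1e9

# Assumption: one 8-GPU node consumes ~2,000 ImageNet-sized JPEGs/s
# (~110 KB each).
per_node = required_read_bandwidth(2_000, 110_000)  # ~0.22 GB/s per node
cluster = 16 * per_node                              # ~3.5 GB/s for 16 nodes

print(f"per node: {per_node:.2f} GB/s, 16-node cluster: {cluster:.2f} GB/s")
```

Even with these modest assumptions, a 16-node cluster needs multiple GB/s of sustained small-file reads, which is the access pattern where NVMe-backed shared storage outpaces spinning disk arrays.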
What storage and networking solutions are you seeing as the predominant trends for your AI customers? What will the next generation AI storage and networking infrastructure look like?
In the future, AI will be an important part of enterprise applications and will gradually integrate with other businesses. Thus, AI will also be an important part of the enterprise IT infrastructure. Enterprise AI applications require higher computing performance and data throughput, whether it’s during offline training or online inferencing. Therefore, a hybrid AI architecture should be employed. Some trends we see are:
- Unification and integration of heterogeneous computing technologies such as CPU, GPU, and FPGA
- Multi-level cache storage systems consisting of Optane memory, NVMe SSDs, SATA SSDs, and mechanical hard disks
- Networks gradually upgrading to 10 Gigabit and 100 Gigabit Ethernet or InfiniBand
For enterprises, especially large enterprises, only this type of infrastructure can support the development of future AI applications.
Over the next 2-3 years, what are trends in power delivery and cooling that your customers demand?
The power supply for single racks will increase dramatically, accompanied by innovations in cooling methods. However, air cooling or hybrid air/water cooling will remain mainstream for the next 2-3 years.
What should STH readers keep in mind as they plan their 2019 AI clusters?
We believe that customers should pay attention to AutoML, which allows more users to build AI algorithm models and launch their own AI business. AutoML will play a very important role in accelerating the application of AI in enterprise scenarios. Meanwhile, AutoML relies heavily on high-performance, scalable AI computing clusters, which in turn demand more powerful AutoML frameworks and high-speed interconnects.
Recently, Inspur released AutoML Suite, which enables one-stop automatic generation of model products based on GPU cluster visualization. It has four major features:
- One-stop visual processing. The user can construct a network model for a learning task and obtain high precision simply by following six steps of visualization: task setting, data uploading, model searching, model training, model evaluation, and model deployment;
- Automatic generation of CV models. Through reinforcement learning, classification and regression model generation can be automatically completed, supporting both supervised learning and unsupervised learning;
- Flexible deployment. As the world’s first product that supports dual-mode deployment (on-premise and cloud), it can be deployed in minutes;
- Multi-machine, multi-GPU parallelism. Running across multiple machines and multiple GPU cards greatly reduces the time required for model searching and model training. For example, during the model searching phase, running on a 16-GPU AI server cluster reduces the average search time for a single model to 9.6 minutes. As a result, 144 models can be searched in one day, greatly improving model production efficiency.
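The throughput claim above is easy to sanity-check: 144 models at an average of 9.6 minutes per model works out to roughly a day of wall-clock time.

```python
# Sanity check of the model-search throughput figures quoted above.
models = 144
minutes_per_model = 9.6
total_hours = models * minutes_per_model / 60
print(f"{total_hours:.1f} hours")  # → 23.0 hours, i.e. about one day
```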
Are you seeing a lot of demand for new inferencing solutions based on chips like the NVIDIA Tesla T4?
In the past few years, the entire industry has been investing heavily in offline training. In that time, a large number of application systems have been trained for deployment. Therefore, the market demand for online inferencing is increasing. The future market size of online inferencing is expected to equal or even exceed that of offline training. Video recognition, image recognition, and natural language recognition remain the primary applications of online inferencing. Traditional industries with the fastest adoption include finance, telecommunications, and transportation.
The NVIDIA Tesla T4 combines high-performance video and image processing with powerful AI inferencing performance. Many video- and image-based AI applications are using T4-based solutions. With the arrival of 5G, customers at the edge have a high demand for inferencing chips that maintain performance while meeting low power requirements.
Are your customers demanding more FPGAs in their infrastructures?
In the course of our business at Inspur, we have found that many customers are trialing and deploying FPGA-based solutions because of their flexible deployment and low latency.
According to the 2018-2019 China Artificial Intelligence Computational Development Evaluation Report issued by IDC and Inspur, the distribution of AI computing power will follow an "80/20 rule": 80% of computing power will be concentrated on training scenarios in the early stage, while 80% will be concentrated on inferencing scenarios in the future large-scale application phase.
FPGA heterogeneous acceleration boards deliver more flexibility and low-cost features that better meet the needs of online inferencing services. Highly flexible FPGA solutions can be deeply optimized for AI inferencing applications for higher performance and lower TCO. In addition, FPGA solutions are more adaptable to new AI algorithms. Therefore, more customers choose to deploy FPGA solutions online for AI inferencing services.
Who are the big accelerator companies that you are working with in the AI inferencing space?
Today, Inspur is working with all of the most well-known FPGA vendors. However, for now NVIDIA is still Inspur's largest supplier worldwide. Meanwhile, we are also working closely with Intel to launch more competitive accelerator products.
Are there certain form factors that you are focusing on to enable in your server portfolio? For example, Facebook is leaning heavily on M.2 for inferencing designs.
Most are still PCIe cards, which have very good compatibility. We are also looking at other form factors such as M.2.
What percentage of your customers today are looking to deploy inferencing in their server clusters? Are they doing so with dedicated hardware or are they looking at technologies like 2nd Generation Intel Xeon Scalable VNNI as “good enough” solutions?
Today that percentage is about 10%, which is not very large, but the year-over-year growth rate is very high. These customers mainly use NVIDIA GPUs, and also run some workloads on Intel Xeon.
What should STH readers keep in mind as they plan their 2019 server purchases when it comes to AI inferencing?
We recommend selecting the hardware platform best suited to your application scenarios and algorithm characteristics, in combination with the software ecosystem. If possible, test the server with actual workloads in advance to determine real-world performance. Scalability is also a very important factor. For example, Inspur provides inferencing servers that support up to 16 accelerators and a variety of accelerator cards, including GPUs, FPGAs, and ASICs.
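Testing with actual loads can be as simple as a small timing harness around your real inference call. The sketch below uses a placeholder function in place of a real model; the callable, batch shape, and iteration counts are all illustrative assumptions.

```python
import time

def measure_throughput(infer, batch, warmup=10, iters=100):
    """Time an inference callable and report samples/sec.
    In a real evaluation, `infer` would be the actual model running
    on the candidate server, fed with production-like data."""
    for _ in range(warmup):          # warm caches, clocks, and any JIT
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return iters * len(batch) / elapsed

# Placeholder "model" for illustration only: sums each sample.
fake_infer = lambda batch: [sum(x) for x in batch]
batch = [[1.0] * 256 for _ in range(32)]
print(f"{measure_throughput(fake_infer, batch):,.0f} samples/sec")
```

Running the same harness on each candidate platform, with the same batch sizes you expect in production, gives a like-for-like number that vendor datasheets cannot.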
How are you using AI and ML to make your servers and storage solutions better?
We leverage AI technology to discover the performance characteristics of various application loads and recommend the most suitable servers and storage solutions for our customers.
The AISTATION software platform launched by Inspur is embedded with an optimized framework, and can be used as an AI computing resource management and AI development training platform to accelerate customers’ AI operations on our servers.
Where and when do you expect an IT admin will see AI-based automation take over a task that is now so big that they will have a “wow” moment?
Wow moments will not come all of a sudden; rather, we believe AI will continue to deliver pleasant surprises.
For example, the main problem facing IT admins is that the monitoring data in their data center machine rooms is dynamic, multi-dimensional, and massive. Different businesses have different indicators, which can number in the tens of thousands. With limited manpower, it can be difficult to process resource scheduling requests and fault alarms in a timely manner. AI-based automation helps achieve comprehensive performance management, unified resource management, timely fault report management, and unified centralized display management. These features will wow IT admins by freeing them from heavy workloads.
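As a toy illustration of the kind of automated alerting described above, a monitoring pipeline can flag metric values that deviate sharply from their recent history. This is a minimal statistical sketch, not Inspur's implementation; real systems use far richer models across thousands of indicators.

```python
import statistics

def anomalies(values, window=20, z_thresh=3.0):
    """Flag indices whose value is more than z_thresh standard deviations
    from the rolling mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        hist = values[i - window:i]
        mu = statistics.fmean(hist)
        sigma = statistics.pstdev(hist) or 1e-9  # guard against flat history
        if abs(values[i] - mu) / sigma > z_thresh:
            flagged.append(i)
    return flagged

metrics = [50.0] * 40
metrics[30] = 95.0  # injected spike in an otherwise steady metric
print(anomalies(metrics))  # → [30]
```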
I again wanted to thank Liu Jun for taking time out of his busy schedule to answer our Q&A questions. He is driving what may be the top AI and deep learning server business in the world, so getting his insights is extremely valuable. He also has a unique perspective on how the entire ecosystem is evolving.
Something that we did not cover in this Q&A, but that I found fascinating this year, was the opportunity to visit the Inspur Intelligent Factory in Jinan, China. Although it was not mentioned explicitly in this interview, Inspur is using AI to power the robots building and testing its servers.