Intel Xeon Phi Knights Mill for Machine Learning

Intel Xeon Phi Knights Mill High Level

Intel Knights Mill is the company’s offering for deep learning. CPU design takes many years, and by the time Intel Knights Landing (see STH’s hands-on piece here) was being deployed, the HPC market had moved toward supporting deep learning applications. In Q4 2017 we will see Intel Knights Mill, which builds on Knights Landing and specifically targets deep learning applications. Interestingly enough, Intel is pushing integer math for deep learning with this release.

Intel Portfolio for Deep Learning

Before we get too far into this story, note that Intel also made two relevant acquisitions: Altera and Nervana. Altera brings leading FPGA technology, which Intel is targeting at the inferencing market given FPGAs’ programmability and low latency. The upcoming Nervana chips will likely arrive in time to displace Intel Knights Hill for deep learning training.

About Intel Xeon Phi Knights Mill

Let us get down to some figures. Here is the new architecture overview:

Intel Xeon Phi Knights Mill High Level

One takeaway is the 384GB memory capacity limit, twice what Knights Landing could support. Intel is also using DDR4-2400 rather than the newer DDR4-2666. On the other hand, there is the same 16GB of MCDRAM that we saw on the previous generation KNL part.

Intel Knights Mill High Level SoC Overview

There are a total of 36 tiles connected using a 2D mesh architecture, similar to what Intel Xeon Scalable uses.

Intel Knights Mill Core

The core continues to be 4-way SMT, whereas the standard Xeon CPUs are 2-way SMT (Hyper-Threading). Here are the core details.

Intel KNL And KNM Port Comparison

Knights Mill is based largely on Knights Landing, with some changes specifically to address scale-out deep learning training. Here is the FMA port difference for example:

Intel Xeon Phi Knights Mill Quad FMA
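To get a feel for the idea, here is a rough sketch in plain Python (an emulation of the concept, not the actual AVX-512 instruction; the `quad_fma` helper is hypothetical): a quad FMA chains four multiply-adds into one accumulator per step, so a single instruction does the work of four separate FMA instructions.

```python
def quad_fma(acc, a, b):
    """Emulate a 4-step fused multiply-add chain on one accumulator lane.

    a and b are sequences of four operands each; real hardware would do
    this per SIMD lane in a single instruction, amortizing instruction
    issue overhead across four multiply-adds.
    """
    for ai, bi in zip(a[:4], b[:4]):
        acc += ai * bi
    return acc

# One quad-FMA step replaces four separate FMA instructions.
print(quad_fma(1.0, [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))  # 6.0
```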

Here is the benefit slide:

Intel Xeon Phi Knights Mill Quad FMA Efficiency

One of the more eyebrow-raising parts of the talk was that Intel is advocating variable precision with VNNI-16, which uses integer math for neural network training.

Intel Xeon Phi Knights Mill VNNI 16
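The gist of the integer approach, sketched in plain Python (the function name and the exact saturation behavior here are illustrative assumptions, not Intel's documented semantics): multiplicands stay in 16-bit integers while the running sum is kept in a wider, saturating 32-bit accumulator, preserving range during training-style accumulations.

```python
def vnni16_dot(acc, a, b):
    """Emulate a VNNI-style step: multiply int16 values, accumulate into int32.

    Inputs are checked to fit in 16 bits; the 32-bit accumulator saturates
    rather than wrapping on overflow.
    """
    INT32_MAX, INT32_MIN = 2**31 - 1, -2**31
    for ai, bi in zip(a, b):
        assert -2**15 <= ai < 2**15 and -2**15 <= bi < 2**15  # int16 inputs
        acc += ai * bi
    # Saturate the 32-bit accumulator instead of wrapping around.
    return max(INT32_MIN, min(INT32_MAX, acc))

print(vnni16_dot(0, [1000, -2000], [30, 40]))  # -50000
```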

When you use the quad FMA and VNNI together, Intel calls it QVNNI:

Intel Xeon Phi Knights Mill QVNNI

That is how Intel is claiming a 4x performance speedup with Knights Mill.

Final Words

Overall, the Intel Xeon Phi Knights Mill is interesting. First off, Intel is going to get more competitive in the deep learning space. What will make this more interesting is that Intel will have the ability to add Omni-Path to the SKUs, as with the Intel Xeon Scalable and Intel Xeon Phi x200 (Knights Landing) lines. Major supercomputing centers are key targets for deep learning, yet the overall software ecosystem is heavily floating point. As a result, Intel is releasing updates to MKL and other libraries to help its customers utilize these new chips with existing frameworks.
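To see why that library support matters, here is a minimal fixed-point sketch of how a framework might bridge float data to integer math (every name and the scale factor are assumptions for illustration, not actual MKL APIs): quantize once, do the bulk of the arithmetic in integers, and rescale back to float at the end.

```python
def quantize_int16(x, scale):
    """Map a float to int16 fixed point; hypothetical helper, not an MKL call."""
    q = int(round(x * scale))
    return max(-2**15, min(2**15 - 1, q))  # clamp into the int16 range

weights = [0.125, -0.5, 0.25]
activations = [1.0, 2.0, -1.0]
S = 256  # scale factor; an assumption chosen for this sketch

qw = [quantize_int16(w, S) for w in weights]
qa = [quantize_int16(a, S) for a in activations]

# Integer dot product, then rescale back to float once at the end.
acc = sum(w * a for w, a in zip(qw, qa))
print(acc / (S * S))  # -1.125, matching the float dot product exactly here
```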

