Intel QAT without a QAT Accelerator: QAT Engine and Optimizations
The Intel QAT Engine is important to understand here because we are going to show its impact. Put simply, the QAT Engine is middleware (it plugs into OpenSSL as an engine) that sits between applications and Intel's acceleration, letting them use either instruction-set optimizations on the CPU or a QAT accelerator back-end.
The QAT Engine can use the hardware accelerator, which is our primary focus, but it can also use the new instructions that Intel added in its Ice Lake Xeons (and, we would imagine, future Sapphire Rapids chips) to accelerate certain crypto functions. Here is Intel's mapping from new instructions to ciphers to give you some idea of how this is used.
That gives us three main cases for acceleration with an Intel CPU. One can use no acceleration, use the QAT Engine in "software" mode to take advantage of the ISA enhancements, or use a hardware QAT accelerator. While some tests refer to the QAT Engine, in reality Intel has many optimized implementations that use the instructions above and more, such as ISA-L for storage and compression and its Multi-Buffer Crypto for IPsec library.
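To make the three cases above a bit more concrete, here is a minimal Python sketch that classifies a Linux host into one of them. It is not the QAT Engine's actual back-end selection logic. The feature-to-workload table is an assumption for illustration (public-key multi-buffer code leaning on AVX-512 IFMA, AES-GCM leaning on VAES and VPCLMULQDQ), not Intel's official mapping, and `has_qat_device` is a hypothetical input because detecting a QAT card is out of scope here.

```python
from typing import Dict, FrozenSet, List

# ASSUMPTION: simplified, illustrative mapping from Linux /proc/cpuinfo flag
# names to crypto workloads a software (ISA-based) path could accelerate.
# This is NOT Intel's official instruction-to-cipher table.
SW_CRYPTO_FEATURES: Dict[str, FrozenSet[str]] = {
    "RSA / ECDSA / ECDHE (multi-buffer)": frozenset({"avx512f", "avx512ifma"}),
    "AES-GCM (vectorized AES + GHASH)": frozenset({"avx512f", "vaes", "vpclmulqdq"}),
}


def read_cpu_flags(cpuinfo_text: str) -> FrozenSet[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Lines look like "flags\t\t: fpu vme ... avx512f ..."
            return frozenset(line.split(":", 1)[1].split())
    return frozenset()


def sw_accelerated_workloads(flags: FrozenSet[str]) -> List[str]:
    """List workloads whose required ISA extensions are all present."""
    return [name for name, needed in SW_CRYPTO_FEATURES.items() if needed <= flags]


def acceleration_case(flags: FrozenSet[str], has_qat_device: bool) -> str:
    """Pick one of the three cases: hardware QAT, software mode, or none.

    has_qat_device is a HYPOTHETICAL input supplied by the caller.
    """
    if has_qat_device:
        return "hardware QAT accelerator"
    if sw_accelerated_workloads(flags):
        return "QAT Engine software mode (ISA enhancements)"
    return "no acceleration"


if __name__ == "__main__":
    # Usage on a Linux host: classify the local CPU, assuming no QAT card.
    with open("/proc/cpuinfo") as f:
        local_flags = read_cpu_flags(f.read())
    print(acceleration_case(local_flags, has_qat_device=False))
    print(sw_accelerated_workloads(local_flags))
```

The ordering in `acceleration_case` simply mirrors the three cases in the text: a hardware accelerator takes priority when present, the ISA-based software path is the fallback, and otherwise there is no acceleration.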
Getting "real" here for a moment, AMD has a well-known strategy of following Intel's ISA developments. When Intel discusses the performance of its cores, it looks not only at the ISA but also at the software that uses the new CPU features. A big part of what I wanted to do was to look at these software/ISA optimizations as well, not just the hardware acceleration.
The test setup is an important one. We wanted to test five cases to see the impact of using QAT:
- Intel Base Case: 3rd Generation Intel Xeon Scalable Ice Lake without Acceleration
- Intel QAT Engine or Optimizations: 3rd Generation Intel Xeon Scalable Ice Lake with QAT Engine using software acceleration
- Intel QAT Hardware: 3rd Generation Intel Xeon Scalable Ice Lake with QAT hardware acceleration
- AMD EPYC Case: AMD EPYC 7003 Milan baseline case
- AMD EPYC with Optimizations: AMD EPYC 7003 Milan with optimizations available to it (e.g. ISA-L)
To cover those five cases, we needed two test platforms: an Ice Lake platform that we could vary based on which acceleration technology we wanted to use, and an AMD EPYC platform for the Milan cases.
The Intel CPUs we are using are Intel Xeon Gold 6338N processors. These are 32-core, network-focused processors that are well suited to the IPsec testing. The Ice Lake system also has an Intel QAT 8970 card alongside an Intel E810-CQDA2 100GbE adapter.
On the AMD side, we have a Supermicro 2U Ultra server with two AMD EPYC 7513 CPUs and the same NIC, but without the QAT adapter. One could argue that the EPYC 7513 is a higher-TDP part, but since it is not getting QAT acceleration, I felt it made sense to give the AMD EPYC CPUs a bit of TDP headroom.
The overall demo was far from pretty, but you will likely see a follow-up to this piece covering the embedded Intel Ice Lake D parts as well. We will also have more details and b-roll of the setup in the accompanying video. Make no mistake, this took a lot of work to put together.
With that, let us get to the testing.