Testing Intel QAT Encryption: nginx Performance
Here we are going to look at something that goes beyond QAT hardware acceleration alone: nginx HTTPS TLS handshake performance using the QAT Engine. This covers both the hardware offload and the software path, which uses the IPP crypto library with the onboard Ice Lake extensions. Since performance varies a lot across our different cases, we aimed for 45,000 connections per second and tuned each solution to land around that mark by dialing thread counts up and down, as well as testing different placements of the threads on the chips.
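To make the "connections per second" metric concrete, here is a minimal sketch of how a full-TLS-handshake rate could be measured from a client. This is not the tooling used for the results in this article. The host, port, worker count, and duration are hypothetical placeholders, and a Python client will not come close to generating 45,000 handshakes per second on its own. It only shows what is being counted: each connection opens a new TCP socket, completes one TLS handshake without session reuse, and closes.

```python
import socket
import ssl
import time
from concurrent.futures import ThreadPoolExecutor


def handshake_once(host: str, port: int, ctx: ssl.SSLContext, timeout: float = 5.0) -> bool:
    """Open a TCP connection, complete one TLS handshake, then close."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # No session object is passed, so every call is a full handshake.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers ssl.SSLError, timeouts, and refused connections
        return False


def connections_per_second(completed: int, elapsed_s: float) -> float:
    """Convert a count of completed handshakes into a rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completed / elapsed_s


def measure(host: str, port: int, workers: int, duration_s: float) -> float:
    """Run `workers` client threads for about `duration_s` seconds; return handshakes/s."""
    ctx = ssl.create_default_context()
    # Assumption: a lab server with a self-signed certificate, so verification is off.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    deadline = time.monotonic() + duration_s

    def worker() -> int:
        done = 0
        while time.monotonic() < deadline:
            if handshake_once(host, port, ctx):
                done += 1
        return done

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        total = sum(f.result() for f in futures)
    return connections_per_second(total, time.monotonic() - start)


if __name__ == "__main__":
    # Hypothetical target: replace with the address of the nginx server under test.
    print(f"{measure('192.0.2.10', 443, workers=8, duration_s=5.0):,.0f} connections/s")
```

On the server side, the knob being turned in the tests here is how many threads nginx gets and where they are placed. The client only needs to generate enough load to saturate them.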
To give some sense of where the acceleration is happening, Intel has this diagram showing which parts of the HTTPS TLS handshake it accelerates:
To give a sense of how taxing this is, here is a screenshot of the ~45,000 connections per second run using 100 threads and none of the QAT acceleration. With the QAT Engine using the IPP crypto library, this goes down to 30 threads and our performance goes up to around 46,900 connections per second. We tried dropping more cores here, but ended up losing too much performance. Again, it took a lot of trial and error to hit the ~45Kcps range.
With the QAT hardware acceleration, we are doing effectively the same work with only 18 threads.
AMD needed only 82 threads to reach this 45K connection range (44,200cps). The AMD solution has some crypto offloads, but not as much as we can get using the more optimized QAT solutions.
What we learned here is that, without QAT, AMD’s higher clocks and TDP meant it used 18% fewer threads (82 versus 100) to maintain around the 45Kcps range. Once the Intel QAT Engine was added, Intel pulled ahead. Finally, once the QAT hardware acceleration was added, Intel was roughly 4.8x AMD’s implementation on a connections-per-core basis, or over 5x the Intel Xeon baseline we had.
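The per-thread comparison can be reproduced with some quick arithmetic. The sketch below uses the thread counts and connection rates quoted above and treats threads as the resource unit. One figure is an assumption: the text does not quote an exact rate for the QAT hardware run, only that it was "effectively the same work," so 45,000 is used as a placeholder. With that placeholder the ratio against AMD lands a little under the ~4.8x quoted above, which is based on the measured run.

```python
# (threads, connections per second) for each configuration described in the text.
results = {
    "Intel baseline (no QAT)": (100, 45_000),          # "~45,000" in the text
    "Intel QAT Engine + IPP (software)": (30, 46_900),
    "Intel QAT hardware": (18, 45_000),                # ASSUMPTION: exact rate not quoted
    "AMD": (82, 44_200),
}


def cps_per_thread(threads: int, cps: float) -> float:
    """Connections per second handled by each thread."""
    return cps / threads


per_thread = {name: cps_per_thread(t, c) for name, (t, c) in results.items()}

for name, value in per_thread.items():
    print(f"{name:36s} {value:8.1f} connections/s per thread")

# AMD's thread savings versus the unaccelerated Intel baseline.
amd_thread_savings = 1 - results["AMD"][0] / results["Intel baseline (no QAT)"][0]
print(f"AMD threads saved vs Intel baseline: {amd_thread_savings:.0%}")

# Per-thread advantage of the QAT hardware run (uses the assumed 45,000 figure).
hw_vs_amd = per_thread["Intel QAT hardware"] / per_thread["AMD"]
hw_vs_baseline = per_thread["Intel QAT hardware"] / per_thread["Intel baseline (no QAT)"]
print(f"QAT hardware vs AMD:            {hw_vs_amd:.2f}x")
print(f"QAT hardware vs Intel baseline: {hw_vs_baseline:.2f}x")
```

Even the software-only QAT Engine run works out to roughly 2.9x AMD's per-thread rate, which is why Intel pulls ahead before the hardware offload is involved.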
Taking a step back, this is what makes servers notoriously difficult to benchmark. Companies with other architectures, such as Arm CPUs, often discuss nginx performance. What they often miss is that a lot of their advantages disappear when encryption and compression are turned on and proper accelerators are used. Using IPP and the QAT Engine did not require special hardware, and even that offered massive performance gains.
At the same time, there are many folks running nginx web servers without the QAT Engine/IPP offload for Ice Lake who are therefore not seeing this performance.
Next, let us discuss the “gotchas” of QAT, and our final words.