Testing Intel QAT Encryption: IPsec VPN Performance
IPsec VPN performance is another great example of where commercial providers figured out that QAT hardware acceleration delivers a huge performance boost. IPsec VPNs are a mature and widely deployed technology, and encryption is a big part of establishing secure connections over the public Internet. As a result, we took a look at IPsec VPN performance using a few configurations:
- Intel Ice Lake Xeon Gold 6338N with:
  - QAT Engine software acceleration using the Intel Multi-Buffer Crypto for IPsec library that you can find on GitHub
  - QAT hardware acceleration
- AMD EPYC 7513 “Milan” with:
  - AES offload, but not VAES, since Milan does not support AVX-512 VAES
On this one, we are including the AMD EPYC 7513 numbers mainly to stay consistent. Many of the instructions the QAT Engine's software acceleration path uses are not present in the EPYC 7003 series. Still, we are going to show the results, but perhaps the most important point to look at is the Intel scaling.
Here is the basic diagram. We are using the DPDK ipsec-secgw sample application to measure how many packets per second (and how many Gbps) we can process with IPsec. We are using the DPDK libraries here since DPDK is widely used in Intel, AMD, and Arm solutions at this point. We are using Cisco TRex as our load generator, and we allow a maximum acceptable packet drop rate of 0.00001%.
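For context, ipsec-secgw drives its security policy (SP), security association (SA), and routing rules from a plain-text configuration file passed with `-f`. A minimal sketch of what one outbound tunnel can look like follows; the addresses, SPI, and keys below are placeholder values for illustration, not our actual test configuration:

```
# Security policy: protect traffic to this subnet with SA 5
sp ipv4 out esp protect 5 pri 1 dst 192.168.105.0/24 sport 0:65535 dport 0:65535

# Security association 5: AES-CBC + SHA1-HMAC in IPv4 tunnel mode
sa out 5 cipher_algo aes-128-cbc cipher_key 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 \
auth_algo sha1-hmac auth_key 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 \
mode ipv4-tunnel src 172.16.1.5 dst 172.16.2.5

# Route the tunneled traffic out port 1
rt ipv4 dst 172.16.2.5/32 port 1
```

Whether the crypto work then lands on the CPU cores or on the QAT hardware is a matter of which DPDK crypto device is bound at launch time.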
When we talk about the QAT Engine here, we mean software acceleration that runs on the CPU cores and utilizes instructions like VAES (vectorized AES) on the Ice Lake cores. This is different from using the Intel QAT hardware accelerator, even though one could call both a form of acceleration.
Here, we see that the use of VAES helps Intel quite a bit. Ice Lake Xeons can hit our ~84Gbps target traffic across 2,000 IPsec tunnels using only 6 threads, while AMD needs 8 threads. To be fair, the AMD EPYC 7513 hit 84.14Gbps and 7.39Mpps here; the Intel result was 84.04Gbps and 7.38Mpps. Again, we tried different core/thread placements. The bigger challenge was that we were using a 100Gbps link speed, and with 64 cores/128 threads in this system, the real bottleneck was the networking link between the client and the server.
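As a quick sanity check on those figures, the reported throughput and packet rate together imply the frame size in use. This is our own back-of-the-envelope arithmetic on the numbers above, not tool output:

```python
# Implied frame size from the reported AMD EPYC 7513 result:
# 84.14 Gbps sustained at 7.39 Mpps.
gbps = 84.14
mpps = 7.39

# bits/s -> bytes/s, divided by packets/s, gives bytes per packet
bytes_per_packet = (gbps * 1e9 / 8) / (mpps * 1e6)
print(f"Implied frame size: ~{bytes_per_packet:.0f} bytes/packet")
```

That works out to roughly 1.4KB frames, which is why the 100Gbps link, not the CPUs, became the limiter.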
Here is a look at this result in a similar Gbps/core view.
What we can see here is that the QAT accelerator helps a lot, as does VAES versus just using traditional acceleration like AES-NI, which is what the AMD EPYC Milan CPU is using.
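To make the per-thread comparison concrete, here is the arithmetic from the numbers above (an illustrative calculation of our own, not chart data):

```python
# Per-thread throughput at the ~84Gbps target, using the thread
# counts reported above: Intel needed 6 threads, AMD needed 8.
intel_gbps, intel_threads = 84.04, 6
amd_gbps, amd_threads = 84.14, 8

intel_per_thread = intel_gbps / intel_threads  # ~14.0 Gbps/thread
amd_per_thread = amd_gbps / amd_threads        # ~10.5 Gbps/thread
print(f"Intel: {intel_per_thread:.1f} Gbps/thread")
print(f"AMD:   {amd_per_thread:.1f} Gbps/thread")
```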
This will make more of an impact in a future piece we are going to show you, but for now, the Gbps-per-thread jump from using QAT is much larger than the Intel versus AMD gap without the accelerator.
Next, let us take a look at nginx performance.