Kioxia CD6-L PCIe Gen4 NVMe SSD Review Ending Data Center SATA


Traditional “Four Corners” Testing

Our first test was to see sequential transfer rates and 4K random IOPS performance for the Kioxia CD6-L 7.68TB SSD. Please excuse the smaller-than-normal comparison set. The explanation above covers why we are not using legacy Xeon Scalable platform results.

Kioxia CD6 L Sequential Performance
Kioxia CD6 L 4K Performance

Here we can see what Kioxia means by calling the CD6-L a “read optimized” SSD. We actually get very good sequential performance. The one area of four corners testing that seems to be lower is the 4K random write testing, which makes sense given the target market. At the same time, we are getting better performance than we got on the original CD6.
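For readers who want to reproduce something like a four corners run, here is a minimal sketch that builds one fio command line per corner. This is not our test harness. The use of fio, the device path, the 128k sequential block size, the queue depth, the job count, and the runtime are all assumptions chosen for illustration.

```python
import shlex

# Hypothetical device path, for illustration only.
# WARNING: the write jobs overwrite whatever is on this device.
DEVICE = "/dev/nvme0n1"

# The four "corners": sequential read/write and 4K random read/write.
# The 128k sequential block size is an assumption, not a value from the review.
FOUR_CORNERS = {
    "seq_read": {"rw": "read", "bs": "128k"},
    "seq_write": {"rw": "write", "bs": "128k"},
    "rand_read_4k": {"rw": "randread", "bs": "4k"},
    "rand_write_4k": {"rw": "randwrite", "bs": "4k"},
}


def fio_command(name, rw, bs, device=DEVICE, iodepth=32, numjobs=4, runtime_s=60):
    """Build a fio command line for one corner (all parameters are illustrative)."""
    args = [
        "fio",
        f"--name={name}",
        f"--filename={device}",
        f"--rw={rw}",
        f"--bs={bs}",
        "--ioengine=libaio",   # Linux asynchronous I/O engine
        "--direct=1",          # bypass the page cache
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]
    return shlex.join(args)


commands = {name: fio_command(name, **job) for name, job in FOUR_CORNERS.items()}

if __name__ == "__main__":
    for cmd in commands.values():
        print(cmd)
```

The script only prints the commands, so nothing touches a drive until you choose to run them yourself.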

We also took a quick look at the CD6-L on the Ampere Altra Q80 platform, as a representation of Arm CPU performance, relative to the AMD EPYC 7002 platform. Here we get fairly good performance.

Wiwynn Mt Jade Ampere Altra Platform CPUs And Memory

A few quick notes here. First, we were going to test these with the IBM POWER9 systems we have in the lab. Our test results looked distinctly wrong, so we are not publishing them here. That test setup is less mature, so we found we need to go back and re-examine the system setup. Second, the Ampere parts are in the Wiwynn Mt. Jade platform, and this is not the final firmware you will see on systems in Q1 2021. Third, in Q1 2021 we also expect to see AMD EPYC 7003 CPUs. We do have Milan CPUs that we obtained through authorized channels, but we told AMD we would not publish the results given the sensitivity of the SKUs we have. Milan will bring an update to the AMD figures, so expect both the AMD and Ampere/Arm figures to move up next quarter. Finally, Intel Xeon Ice Lake chips will arrive in 2021, and we are now expecting them in early Q2. Please take all of these Gen4 results as a point-in-time snapshot that will see a lot of changes over the next quarter and a half.

STH Application Testing

Here is a quick look at real-world application testing versus our PCIe 3.0 x4 and x8 reference drives:

Kioxia CD6 L Application Performance Big

As you can see, there is a lot of variability here in terms of how much impact the Kioxia CD6-L and PCIe Gen4 have. As noted in the chart, some of these tests use x86 VMs, and we have not ported everything to Arm or validated Arm-based setups at this point, so we are only showing two Ampere results. Let us go through and discuss the performance drivers.

On the NVIDIA T4 MobileNet V1 script, we see a small performance impact. The key here is that we are mostly limited by the performance of the NVIDIA T4, and storage is not the bottleneck. There is a benefit to the newer drives in terms of performance, but it is not huge. Perhaps the more impactful change is that the move from Gen3 x8 to Gen4 x4 frees up PCIe connectivity for additional NVIDIA T4s in a system, which has a greater impact on total performance. This is a strange way to discuss system performance for storage, but it is very relevant in the AI space.
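To put rough numbers on the lane trade-off, here is a small sketch of the link arithmetic. The per-lane signaling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and the 128b/130b encoding come from the PCIe specifications. The drive count and the x16 slot width per accelerator are hypothetical values for illustration, not a configuration we tested.

```python
# Raw per-lane signaling rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}

# Gen3 and Gen4 both use 128b/130b line encoding.
ENCODING = 128 / 130


def link_bandwidth_gbs(gen, lanes):
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # 8 bits per byte


gen3_x8 = link_bandwidth_gbs(3, 8)
gen4_x4 = link_bandwidth_gbs(4, 4)

# Hypothetical server: 8 drives, each moving from a Gen3 x8 to a Gen4 x4 link.
DRIVES = 8                    # assumption for illustration
LANES_PER_ACCELERATOR = 16    # assumption: one x16 slot per added card

lanes_freed = DRIVES * (8 - 4)
extra_x16_slots = lanes_freed // LANES_PER_ACCELERATOR

if __name__ == "__main__":
    print(f"Gen3 x8: {gen3_x8:.2f} GB/s, Gen4 x4: {gen4_x4:.2f} GB/s")
    print(f"Lanes freed: {lanes_freed}, extra x16 slots: {extra_x16_slots}")
```

The two links come out the same on paper, which is the point: the drive keeps its link bandwidth while using half the lanes.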

Likewise, our Adobe Media Encoder script times the copy to the drive, then the transcoding of the video file, followed by the transfer off of the drive. Here, we have a bigger impact because there are some larger sequential reads/writes involved, but the primary performance driver is still the encoding speed. The key takeaway from these tests is that if you are compute limited but still need to go to storage for some parts of a workflow, there is an appreciable impact, just not as big an impact as getting more compute. The CD6-L performed about where we would expect given the sequential read/write numbers we saw.
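The reasoning above can be sketched with simple arithmetic: total time is copy-in plus transcode plus copy-out, so a faster drive only shrinks the two copy terms. Every number below (file size, drive throughput, transcode time) is hypothetical and chosen only to illustrate the shape of the result. None of it is measured data from our testing.

```python
def workflow_seconds(file_gb, write_gbs, read_gbs, transcode_s):
    """Total time for copy-in, transcode, and copy-out, run back to back."""
    copy_in = file_gb / write_gbs    # writing the source file to the drive
    copy_out = file_gb / read_gbs    # reading the result off of the drive
    return copy_in + transcode_s + copy_out


# Hypothetical inputs, for illustration only.
FILE_GB = 50.0
TRANSCODE_S = 300.0

baseline = workflow_seconds(FILE_GB, write_gbs=2.0, read_gbs=3.0, transcode_s=TRANSCODE_S)

# Case A: the drive is twice as fast, compute is unchanged.
faster_storage = workflow_seconds(FILE_GB, write_gbs=4.0, read_gbs=6.0, transcode_s=TRANSCODE_S)

# Case B: the drive is unchanged, the encode is twice as fast.
faster_compute = workflow_seconds(FILE_GB, write_gbs=2.0, read_gbs=3.0, transcode_s=TRANSCODE_S / 2)

storage_speedup = baseline / faster_storage
compute_speedup = baseline / faster_compute

if __name__ == "__main__":
    print(f"Storage 2x: {storage_speedup:.2f}x overall")
    print(f"Compute 2x: {compute_speedup:.2f}x overall")
```

With these made-up inputs, doubling drive throughput helps, but much less than doubling encode speed, which matches the compute-limited behavior described above.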

On the KVM virtualization testing, we see heavier reliance upon storage. KVM virtualization Workload 1 is more CPU limited than Workload 2 or the VM Boot Storm workload, so we see strong performance, albeit a smaller gain than in the other two. These are KVM virtualization-based workloads where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker. We know, based on our performance profiling, that Workload 2 actually scales better with fast storage and Optane PMem because of the databases being used. At the same time, if the dataset is larger, PMem does not have the capacity to scale. This profiling is also why we use Workload 1 in our CPU reviews. We see that the Kioxia CM6 is frankly faster than the Kioxia CD6-L. Our sense is that this is purposeful, as the CM6 is a higher cost per GB drive. For many, trading a few percentage points of performance is worth getting more capacity. If one can hold more VMs on the storage, then that can have a bigger TCO benefit than having slightly faster application performance. That is not true in all cases, which is why we have the mixed-use CD6-L and the CM6.

Moving to the file server and nginx CDN, we see much better QoS from the new CD6 versus the PCIe Gen3 x4 drives. Perhaps this makes sense if we think of an SSD on PCIe Gen4 as having a lower-latency link as well. On the nginx CDN test, we are using an old snapshot and access patterns from the STH website, with DRAM caching disabled, to show what the performance looks like in that case. Here is a quick look at the distribution:

Kioxia CD6 L CDN Latency
Kioxia CD6-L nginx Latency

Overall, we saw a few outliers, but this is excellent performance. Our performance was again not as good as we saw on the Kioxia CM6, but it was much better than our baseline PCIe Gen3 SSDs. Perhaps the key takeaway is that the CM6 is faster, but the CD6 is a better value if you need capacity and are focused on reads.
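For anyone building a similar latency distribution from their own request logs, here is a minimal sketch of a nearest-rank percentile calculation. The sample values are synthetic and only illustrate how a handful of outliers show up in the tail while leaving the median untouched. They are not measurements from the CD6-L.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds (illustrative only):
# 990 fast requests, 9 slower ones, and a single large outlier.
latencies_ms = [1.0] * 990 + [5.0] * 9 + [50.0]

if __name__ == "__main__":
    for pct in (50, 99, 100):
        print(f"p{pct}: {percentile(latencies_ms, pct)} ms")
```

Comparing drives at the higher percentiles, instead of only at the average, is what exposes QoS differences like the ones in the chart above.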

We swapped the drives to an AMD EPYC 7742 platform and the Ampere platform. This gives us a 128 core/256 thread AMD platform and a 160 core/160 thread Ampere platform. Since the application stack is Linux and nginx, it is fairly mature on Arm.

Kioxia CD6 L CDN Latency Altra Rome
Kioxia CD6-L nginx Latency Altra Rome

Here the Ampere system performs a bit better than the AMD system as we get further into the tail. AMD’s SMT design, along with its chiplet architecture, does have an impact, and here it shows up as a small one. In general, as we are testing the Q80-33s, the Altra is not always better than the EPYC 7002 series, but I will let Patrick discuss that in the full Altra review.

To us, the key takeaway is that as we migrate to the PCIe Gen4 era, it is going to become very difficult to recommend SATA over NVMe, as drives like the Kioxia CD6-L (likely soon joined by others) push PCIe Gen4 NVMe pricing down to meet SATA. Next-gen systems are going to have better support for NVMe drives, so the close of SATA in the data center is coming.

Next, we are going to give some market perspective on a new variant before moving to our final words.

5 COMMENTS

  1. I’m ashamed but we’re still using SATA in our Dells. We’ll be looking at gen4 in our next refresh for sure.

    If people missed it… watch the video. I’m sending it to a colleague here since it explains the why of arch. It’s different than the article but related.

    Good job on the ARM ampere tests too. we prob won’t buy this cycle, but having this info will help for 2022 plans

  2. We won, John, we won. The SATAs can never again destroy our bandwidth. But the price, John, the terrible terrible price.

  3. Plan to buy Dell 7525 for Media Storage but not sure can support NVME raid on Pci 4.0 ?
    if we use raid for read only performance must multiply by disk ?
    didn’t see anyone test on this 🙂

  4. Can I use these drives in a normal AMD PCIe gen4 system, using a m.2 to u.2 cable? Or is there a m.2 to u.3 cable?

  5. As far as I know those Kioxia pro drives don’t have any end-user support in terms of sw tools or firmware. They won’t even disclose any endurance numbers. In my view they are only an option if you buy perhaps > 1000 disks to get the right support.
