Recently we became aware of a new version of diskinfo that uses FreeBSD libraries to simulate the ZFS ZIL / SLOG device pattern. While many tests online focus on pure writes or 70/30 read/write workloads, high write endurance drives are also used as log or cache devices, where data is written and then flushed. Just about everyone in the storage world knows about ZFS and its ability to use a fast device in front of an array to improve performance. We wanted to take both traditional NAND SSDs and Intel Optane SSDs and use this new tool to see how they compare.
What is the ZFS ZIL?
ZIL stands for ZFS Intent Log. The purpose of the ZIL in ZFS is to log synchronous operations to disk before they are written to your array. Synchronous operations are how you can be sure that a write has completed and is safe on persistent storage rather than cached in volatile memory. The ZIL in ZFS acts as a write cache ahead of the spa_sync() operation that actually writes data to the array. Since spa_sync() can take considerable time on a disk-based storage system, ZFS has the ZIL, which is designed to handle synchronous operations quickly and safely before spa_sync() writes the data to disk.
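To make the ordering concrete, here is a deliberately simplified Python toy model; it is not ZFS code, and the names `ToyPool`, `sync_write`, `txg_commit` and `crash_and_replay` are hypothetical. It assumes a synchronous write is acknowledged only once its record is in the log, and that a periodic commit, loosely analogous to spa_sync() committing a transaction group, writes the buffered data to the array and then discards the log records. As in ZFS, the data that reaches the array comes from memory; the log is only read back to recover acknowledged writes after a crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Toy model of sync writes, an intent log and periodic commits. Not ZFS code."""
    log: list = field(default_factory=list)    # stands in for the ZIL (in-pool or on a SLOG)
    dirty: dict = field(default_factory=dict)  # acknowledged data still held in RAM
    pool: dict = field(default_factory=dict)   # stands in for the main array

    def sync_write(self, key, value):
        self.dirty[key] = value
        self.log.append((key, value))  # the log record exists before we acknowledge
        return "ack"

    def txg_commit(self):
        """Loose analogue of spa_sync(): write RAM-held data to the array, then drop log records."""
        self.pool.update(self.dirty)
        self.dirty.clear()
        self.log.clear()

    def crash_and_replay(self):
        """Power loss wipes RAM; replaying the log recovers every acknowledged write."""
        self.dirty.clear()
        for key, value in self.log:
            self.pool[key] = value
        self.log.clear()


if __name__ == "__main__":
    p = ToyPool()
    p.sync_write("a", 1)
    p.txg_commit()         # "a" reaches the array from RAM; the log is emptied
    p.sync_write("b", 2)   # "b" is acknowledged but not yet committed
    p.crash_and_replay()   # "b" survives only because it was in the log
    print(p.pool)          # {'a': 1, 'b': 2}
```

The log only has to hold each record for the short window between acknowledgment and the next commit, which is why a small, fast device can serve the role.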
What is the ZFS SLOG?
In ZFS, people commonly refer to adding a write cache SSD as adding an “SSD ZIL.” Colloquially, that has become like using the phrase “laughing out loud.” Your English teacher may have corrected you to say “aloud,” but nowadays people simply accept LOL (yes, we found a way to fit another acronym in the piece!). It is more correct to call it a SLOG, or Separate intent LOG, SSD. In ZFS, the SLOG caches synchronous ZIL data before it is flushed to disk. When added to a ZFS array, it is essentially meant to be a high-speed write cache.
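For reference, a SLOG is attached to an existing pool as a `log` vdev with `zpool add`. The small Python helper below only builds that command line; the helper name is hypothetical, and the pool name `tank` and the FreeBSD NVMe device names `nvd0` / `nvd1` are placeholder assumptions. Given two or more devices, it builds a mirrored log vdev.

```python
import shlex


def slog_add_command(pool, devices):
    """Build argv for `zpool add <pool> log [mirror] <devices...>`."""
    if not devices:
        raise ValueError("at least one log device is required")
    argv = ["zpool", "add", pool, "log"]
    if len(devices) > 1:
        argv.append("mirror")  # two or more devices become one mirrored log vdev
    argv.extend(devices)
    return argv


if __name__ == "__main__":
    cmd = slog_add_command("tank", ["nvd0", "nvd1"])  # placeholder pool and devices
    print(shlex.join(cmd))  # zpool add tank log mirror nvd0 nvd1
    # To actually modify the pool: subprocess.run(cmd, check=True)
```

A mirrored log vdev guards against losing in-flight synchronous writes if a single log device dies around the time of a crash or power loss.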
If you want to read more about the ZFS ZIL / SLOG, check out our article What is the ZFS ZIL SLOG and what makes a good one.
Testing the Intel Optane with the ZFS ZIL SLOG Usage Pattern
Today we have results for Intel Optane products as ZIL / SLOG devices. We have numbers for several products, including the Intel Optane 900p and lower-end products like the Optane Memory M.2 devices. For comparison, we also have Intel NVMe SSDs along with a few devices from other vendors across the NVMe, SAS and SATA ranges.
The diskinfo slogbench test we are using has a usage pattern unlike many pure write tests: it performs writes followed by regular flushes. This is different from the write-specific workloads often tested with tools like fio and Iometer. Instead, it is intended to more closely resemble ZFS ZIL SLOG usage, or more generally a write caching / log device. These are write-heavy roles, and they are also where we typically see high-speed SSDs with high write endurance and reliability.
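As a rough illustration of the write-then-flush pattern, here is a Python sketch that times write + fsync() pairs at several transfer sizes and reports MB/s and microseconds per I/O. It is an approximation under stated assumptions, not how diskinfo implements its test: it goes through a file on a filesystem rather than a raw device, and what fsync() actually forces to media depends on the OS, filesystem and drive. The function name and defaults are hypothetical.

```python
import os
import tempfile
import time


def write_flush_bench(path, sizes, ios_per_size=64):
    """Return {transfer_bytes: (mb_per_s, usec_per_io)} for write+fsync pairs."""
    results = {}
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for size in sizes:
            buf = os.urandom(size)
            os.lseek(fd, 0, os.SEEK_SET)  # start each transfer size at offset 0
            start = time.perf_counter()
            for _ in range(ios_per_size):
                os.write(fd, buf)  # issue the write
                os.fsync(fd)       # wait for the flush before issuing the next write
            elapsed = max(time.perf_counter() - start, 1e-9)
            results[size] = (
                size * ios_per_size / elapsed / 1e6,  # MB/s (10^6 bytes/s)
                elapsed / ios_per_size * 1e6,         # average usec per write+flush
            )
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "slog-test.bin")
        for size, (mbps, usec) in write_flush_bench(target, [4096, 65536, 524288]).items():
            print(f"{size // 1024:>4}K  {mbps:8.1f} MB/s  {usec:9.1f} usec/IO")
```

Because each flush must complete before the next write is issued, the per-I/O flush latency of the device, rather than its peak bandwidth, dominates at small transfer sizes.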
The genesis of this project was a user request that we set up a custom demo in our DemoEval lab to compare drives using this specific workload. The individual wanted to compare a few of their existing NAND solutions to Optane. Here is the basic setup we used for this test:
- System: Supermicro 2U Ultra
- CPUs: 2x Intel Xeon E5-2650 V4
- RAM: 256GB (16x16GB DDR4-2400)
- OS: FreeBSD 11.1-RELEASE
The SSD stable is more interesting. We picked the Intel Optane Memory M.2 16GB and 32GB drives just for fun. We also used a 280GB U.2 Intel Optane 900p. You will notice that we do not have P4800X results. We have numbers for it, but they were off from where we would expect. For NVMe SSDs, we have Samsung U.2 and M.2 offerings as well as the Intel DC P3700 and P3600. The particular Samsung M.2 drive we are using has PLP (power loss protection) capacitors, so it is representative of a data center M.2 device rather than a consumer M.2 device. Consumer drives without PLP have such poor performance that we excluded their results from our sets. We also have SATA drives in the form of the Intel DC S3700 and S3610 SSDs to represent SATA performance. Finally, we have a popular HGST SAS SLOG device. You are going to quickly see the stratification of these results.
The Results: Transfer Rate
We distilled the output into three views that we find telling. One shows transfer rate in this usage scenario; the other two show average latency per I/O. The results are broadly similar to what we saw in our Intel Optane Memory v. SATA v. NVMe SSD: WordPress / vBulletin Database Import / Export Performance and Intel Optane: Hands-on Real World Benchmark and Test Results pieces.
The first view is raw MB/s. In this chart, you can see that the Intel Optane 900p 280GB drive is the clear leader, despite some of the NAND NVMe SSDs having better throughput specs. If you are still on 10GbE, NAND NVMe SSDs can fill the pipe at larger transfer sizes. Once you move to 25GbE and 40GbE, you simply want Optane or something more exotic.
We wanted to highlight a few other parts of this chart. First, the NAND-based NVMe SSDs occupy a distinct band of their own, with the two Intel and one Samsung 2.5″ U.2 SSDs offering broadly similar performance. The Samsung PM953 M.2 SSD we are using has capacitors for power loss protection, so it still performs well, albeit at a lower level than the performance-oriented NAND NVMe SSDs in our comparison. A Samsung 960 Pro M.2 NVMe SSD would land near the bottom of this chart, often below the SATA SSDs, because it lacks power loss protection for sync writes.
The SATA and SAS NAND SSDs form a tight grouping on this relative scale. With latency-sensitive I/O like this, the legacy buses show their weaknesses. If you have a 1GbE ZFS NAS, this is unlikely to be an issue, but the impact is readily visible.
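The network comparisons in this section come down to simple arithmetic. This sketch converts Ethernet line rates to MB/s (decimal, 10^6 bytes/s) so they can be read against the MB/s axis of the chart. It ignores Ethernet, TCP/IP and file protocol overhead, so real usable throughput is somewhat lower; the function name is ours.

```python
def line_rate_mb_s(gbit_per_s):
    """Raw Ethernet line rate in MB/s (10^6 bytes/s), ignoring protocol overhead."""
    return gbit_per_s * 1e9 / 8 / 1e6


if __name__ == "__main__":
    for speed in (1, 10, 25, 40):
        print(f"{speed:>2}GbE ~ {line_rate_mb_s(speed):,.0f} MB/s")
    # 1GbE ~ 125 MB/s, 10GbE ~ 1,250 MB/s, 25GbE ~ 3,125 MB/s, 40GbE ~ 5,000 MB/s
```

For example, absorbing a saturated 10GbE link of synchronous writes means sustaining on the order of 1,250 MB/s in this write-and-flush pattern, while a 1GbE link needs only about 125 MB/s.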
Perhaps the most intriguing result comes from the Optane Memory M.2 devices. These are the least expensive devices in the comparison group, yet the 32GB version essentially obliterates the SATA / SAS2 options despite its obvious handicaps from a product standpoint. Pairing a PCIe x2 interface with a low-power, low-package-count Optane design does surprisingly well. What these drives lack is capacity and endurance, but the performance is certainly there.
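To put the PCIe x2 handicap in perspective, here is a back-of-the-envelope calculation of link bandwidth after line encoding only. It assumes PCIe 3.0 (8 GT/s per lane, 128b/130b encoding) for the M.2 and U.2 NVMe devices and 6 Gb/s with 8b/10b encoding for SATA III and SAS2. Packet and protocol overheads are ignored, so these are ceilings, not expected results; the function name is hypothetical.

```python
def link_mb_s(gtransfers_per_s, payload_bits, total_bits, lanes=1):
    """Link bandwidth in MB/s (10^6 bytes/s) after line encoding, before protocol overhead."""
    bits_per_s = gtransfers_per_s * 1e9 * payload_bits / total_bits * lanes
    return bits_per_s / 8 / 1e6


if __name__ == "__main__":
    links = {
        "SATA III / SAS2 (6 Gb/s, 8b/10b)": link_mb_s(6, 8, 10),
        "PCIe 3.0 x2 (Optane Memory M.2)": link_mb_s(8, 128, 130, lanes=2),
        "PCIe 3.0 x4 (U.2 NVMe)": link_mb_s(8, 128, 130, lanes=4),
    }
    for name, mb_s in links.items():
        print(f"{name:34} {mb_s:7.0f} MB/s")
```

That works out to roughly 600, 1,969 and 3,938 MB/s, so even the x2 Optane Memory link has more than three times the raw bandwidth of a single 6 Gb/s SATA or SAS2 port. For this flush-heavy workload, media latency matters at least as much as raw bandwidth.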
The Results: IO Latency
With a ZFS ZIL SLOG device, a key concern is I/O latency. Most storage systems do not run at 100% write utilization 24x7x365, so an important factor is how long each I/O takes. Here is what the chart looks like across the entire sample set.
One can see the general grouping here, again with the SATA and SAS offerings lagging and the NVMe NAND SSDs performing well. The Intel Optane 900p 280GB drive is again obliterating the competition.
Below the 512K size, this chart is so compressed that all offerings look essentially the same. They are not. We made a second version of the chart that stops at 512K to show the difference.
Again, the highest latency per I/O comes from the group of SATA and SAS drives. We actually tested a few more drives, but we wanted to limit our comparison group to 10 offerings. Realistically, while you can debate which one is faster, the message is clear: get on the PCIe bus.
The Intel Optane Memory M.2 devices are beyond intriguing. They are handicapped by the PCIe x2 interface and limited media, but at smaller transfer sizes (under 16K) they are competitive with higher-end NVMe drives. As transfer sizes go up, the enterprise NAND NVMe SSDs' higher throughput lets them pull ahead.
Coming back to the Intel Optane 900p: if you were not looking closely, you may have missed it on the chart. For a good portion of this zoomed-in chart, its grey bar looks like part of the X-axis, with 1/10th the latency of the higher-performance enterprise NAND NVMe SSDs.
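The throughput and latency views are two sides of the same measurement when I/Os are issued one at a time, which we assume here; if a benchmark keeps several I/Os in flight, the relationship no longer holds directly. Under that assumption, the average time per I/O is simply transfer size divided by throughput, which is why small-transfer results are dominated by per-I/O latency rather than bus bandwidth. The helper names and the 20 usec device are hypothetical.

```python
def usec_per_io(transfer_bytes, mb_per_s):
    """Average microseconds per I/O implied by throughput at queue depth 1."""
    return transfer_bytes / (mb_per_s * 1e6) * 1e6


def mb_per_s(transfer_bytes, usec):
    """Inverse: throughput in MB/s implied by a per-I/O latency at queue depth 1."""
    return transfer_bytes / usec  # bytes per microsecond == MB/s (10^6 bytes/s)


if __name__ == "__main__":
    # A hypothetical device that needs 20 usec per 4K write+flush:
    print(f"{mb_per_s(4096, 20):.1f} MB/s")  # 204.8 MB/s
    # Holding 20 usec at 128K would imply 6553.6 MB/s, more than a PCIe 3.0 x4 link
    # carries, so larger transfers end up bandwidth-bound instead.
```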
What about the Intel DC P4800X as a ZFS ZIL SLOG?
As we mentioned earlier, we do have data on the Intel DC P4800X. Directionally, you can look at the Intel Optane 900p 280GB results and assume the P4800X is a bit better. When we double-checked the configuration before publication, we found the P4800X had been tested in a different PCIe slot, so we did not feel comfortable publishing the comparison. The physical slot location seemed to have a slight impact on performance.
Realistically, there are other factors in deciding whether to use an Intel DC P4800X that matter more than its muted performance advantage: endurance and data integrity. The Intel DC P4800X 375GB SSD is rated at 4x the write endurance of the Intel Optane 900p, which is rated at 5PB. One could make a legitimate argument that a majority of 100-200TB ZFS appliances will never push even 1PB of writes onto a SLOG device over five years. That is fair. Once you are over 200TB, however, the cost of a mirrored pair of Intel DC P4800X drives becomes so small on a TCO basis that we would recommend the P4800X in a heartbeat.
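The endurance argument is easy to check with the ratings quoted above. This sketch converts a rated petabytes-written (PBW) figure into a daily write budget and drive writes per day (DWPD), assuming a five-year period, decimal units and 365-day years. The 20PB figure for the P4800X 375GB is simply 4x the 900p's 5PB rating, and the function names are ours.

```python
DAYS = 5 * 365  # assumed five-year service life, ignoring leap days


def daily_budget_tb(rated_pb, days=DAYS):
    """Terabytes per day needed to consume a rated PBW figure over `days`."""
    return rated_pb * 1000 / days


def dwpd(rated_pb, capacity_gb, days=DAYS):
    """Drive writes per day implied by a PBW rating and a capacity."""
    return rated_pb * 1e6 / capacity_gb / days


if __name__ == "__main__":
    print(f"900p 280GB, 5PB:     {daily_budget_tb(5):.2f} TB/day, {dwpd(5, 280):.1f} DWPD")
    print(f"P4800X 375GB, 20PB:  {daily_budget_tb(20):.2f} TB/day, {dwpd(20, 375):.1f} DWPD")
    print(f"1PB over five years: {daily_budget_tb(1):.2f} TB/day")
```

In other words, a SLOG would need to absorb roughly 550GB of writes every day for five years just to reach 1PB, while the 900p's rating allows about 2.74TB per day.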
The other reason to choose the Intel DC P4800X over the Intel Optane 900p is data integrity. Originally, the Intel Optane 900p was marked on Intel ARK as having power loss protection. That makes sense given the physical architecture: because of how Optane works, neither the P4800X nor the 900p has DRAM write cache packages. What matters more is end-to-end data protection.
The official Intel DC P4800X v. Intel Optane 900p comment we got from Intel is:
As an enterprise part, the Intel® Optane™ SSD DC P4800X offers multiple data protection features that the Intel® Optane™ SSD 900P does not, including DIF data integrity checking, circuit checks on the power loss system and ECRC. The DC P4800X also offers a higher MTBF/AFR rating.
Given Optane performance, if you are building a large ZFS cluster or want a fast ZFS ZIL SLOG device, get a mirrored pair of Intel DC P4800X drives and rest easy that you have an awesome solution. If you are building a small proof of concept ZFS solution to get budget for a larger deployment, the Intel Optane 900p is a great choice and simply blows away the competition in its price range.
When we used the new FreeBSD diskinfo slogbench, the Intel Optane drives stood out. In fact, they categorically obliterated the SATA and SAS options. Even the previous-generation category killer, the Intel DC P3700, is easily bested by the Intel Optane 900p (and the P4800X). Although this part is conjecture on our side, if you are building a 10TB ZFS proof-of-concept NAS with 1GbE networking, the Intel Optane Memory 32GB M.2 drive is an enormous upgrade over the SATA and SAS devices in the sub-$100 category, provided you can live with its lower endurance and reliability ratings. This particular SLOG usage pattern is quite common in other log device scenarios, so the results are instructive well beyond ZFS applications.
In the first part of this series, we investigated what the ZFS ZIL SLOG is and what makes a good one. In this article, we used a new tool to simulate the writes and flushes that a ZFS ZIL SLOG device goes through. We did not want to stop at a synthetic test, so in the next installment we have real-world data from a lab ZFS NAS. A quick spoiler: we have been seeing actual ZFS NAS performance follow the stratification we see with this benchmark.