This weekend I got a call from a local business that produces highlight reels from video game footage. Admittedly, I am not a big gamer (my PS3 was purchased at launch and has played games for less than four hours over the years), but I did think that this would be an interesting opportunity to do some troubleshooting over a rainy weekend.
Soon after arriving I saw the setup: rather beefy PCs using KillerNICs for gaming traffic and various Realtek 8111 series NICs, plus a few machines with dedicated Intel NICs, running to an HP ProCurve gigabit switch. Separating traffic this way is a best practice when two very different types of traffic share the network: the gaming network was optimized for low latency while the storage network was optimized for sequential transfers. The first thing I did was look at the recording setup:
The main recording setup used a program called FRAPS to capture the gaming video. The “X” drive was a network share mapped to a lower-end four-drive NAS that uses a proprietary RAID-like feature to store data (I am deliberately not naming the manufacturer). Inside the NAS were four 7,200rpm Seagate drives, so it initially seemed like there was plenty of disk throughput to saturate a gigabit Ethernet link. To remove variables, I brought along a small SSD-based NAS that I use to troubleshoot these types of problems.
When troubleshooting network connections, I tend to use a program called DU Meter, which has a few handy features that show when network traffic is being generated in Windows and how much. I had the team load a quick 30-second run (it ended up taking almost 45 seconds including loading times) and used the DU Meter stopwatch feature.
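For readers without DU Meter, a stopwatch-style write measurement can be approximated with a short script. This is a minimal sketch, not the tool I used: the file path, test size, and chunk size are all illustrative assumptions.

```python
import os
import time


def measure_write_throughput(path, size_mb=512, chunk_mb=4):
    """Write a throwaway test file and report sustained MB/s.

    A rough stand-in for DU Meter's stopwatch feature; the path and
    sizes are illustrative, not from the original setup.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the target
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up the test file
    return size_mb / elapsed
```

Pointing `path` at the mapped share (e.g. `X:\throughput_test.bin`) gives a rough sustained write figure for the NAS over the network.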
Since I knew the network and the SSD NAS I was using were both easily capable of sustaining 125MB/s, I was able to determine that the maximum transfer speed the FRAPS application produced was in the area of 44MB/s. That is a figure most custom-built NAS systems can handle very easily. When I tried the pre-built NAS, transfers were less than 35MB/s.
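The arithmetic behind those figures is worth spelling out, since it is what rules the network out as the bottleneck. A quick sanity check using the numbers above:

```python
# Gigabit Ethernet line rate: 1,000,000,000 bits/s = 125 MB/s (decimal MB)
gigabit_mb_s = 1_000_000_000 / 8 / 1_000_000

fraps_mb_s = 44     # measured peak write rate of the FRAPS capture
prebuilt_mb_s = 35  # measured rate against the pre-built NAS

# The link has nearly 3x the headroom FRAPS needs...
assert gigabit_mb_s == 125.0
# ...so when transfers fall under 35MB/s, the NAS, not the network,
# is what cannot keep up.
assert prebuilt_mb_s < fraps_mb_s < gigabit_mb_s
```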
Armed with these figures, I started to investigate the pre-built NAS in question. One thing I noticed was that only about 600GB remained free on a device with approximately 6TB of raw capacity. The company did have extra, clean drives on hand, and after a quick reload of the OS and a rebuild of the storage pools, sequential transfers were in the 90-100MB/s range over a single gigabit Ethernet link. After re-installing the previous set of storage drives, which were 90% full, transfer speeds dropped back to around 35MB/s.
From what I can tell, the business was being penalized on two fronts. First, data was being written to the inner parts of the drive platters. On modern disks, it is not uncommon to see sequential throughput fall from 130MB/s or more at the outer edge to half of that at the inner edge. The disks in the production system were filling their last portions, so per-drive performance was probably closer to 70MB/s. Second, the vendor's proprietary RAID-like feature carries overhead of its own: even with four new disks, sequential transfers were well below those of the SSD-based NAS, and a quick search confirmed the vendor's solution is known to have significant overhead. Between the nearly full disks and the software penalty, the NAS in question was unable to sustain 45MB/s.
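To illustrate the first penalty, the falloff from outer to inner tracks can be modeled as roughly linear with fill level. This is a simplification (real drives step down in zones rather than smoothly), and the edge speeds are the illustrative figures from above, not measurements of these particular Seagates.

```python
def seq_speed_at_fill(fill_fraction, outer_mb_s=130.0, inner_mb_s=65.0):
    """Estimate sequential speed at a given fill level.

    Assumes a linear falloff from the outer edge (fastest) to the
    inner edge (roughly half as fast); speeds are illustrative.
    """
    return outer_mb_s + (inner_mb_s - outer_mb_s) * fill_fraction


# At 90% full, new writes land near the slow inner tracks:
print(seq_speed_at_fill(0.9))  # → 71.5, in line with "closer to 70MB/s"
```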
One major thing to consider when looking at NAS benchmarks online is the difference between new and well-used systems. As spindle-based systems fill up, performance degrades simply due to the physical location of writes on the platter. In addition, many SMB (and consumer) friendly NAS units are optimized for ease of use rather than performance. As a result, the performance of the business' storage server declined over time to the point where it became unacceptable even in a single-user environment. My advice is to research a given NAS product both when it is new (when most people benchmark) and after it has been deployed for some time. Doing so will help avoid the issue the business I helped over the weekend experienced.