WD Red DM-SMR Update 3 Vendors Bail and WD Knew of ZFS Issues


How Was This Missed?

In all of my interactions with WD personnel, they have been top-notch. So how does something like this get missed? WD advertises that it does extensive testing on the WD Red line in NAS scenarios. It knows it has NAS vendors using ZFS. It also knows that ZFS does not play well with DM-SMR. NAS vendors previously had these WD Red DM-SMR drives on their compatibility matrices/ hardware compatibility lists. Indeed, some still do. So how can something that we can test and find, and that readers are seeing in real-world usage, slip through the cracks?

A few quick data points:

  • iXsystems (the company behind FreeNAS/ TrueNAS) disqualified the drives in April 2020 after shipping them in September 2019 and having moderators in their forums warn users against them in the summer of 2019.
  • Synology, a non-ZFS NAS vendor, has now changed course, listing the 2TB and 6TB drives as incompatible.
Synology Says 2TB And 6TB WD Red DM SMR Drives Are Incompatible With Their NAS Units

Again, we have focused a lot on ZFS since that is what we use in our NASes. However, this is Synology, which is not using ZFS in these NASes, listing the WD Red DM-SMR WD20EFAX (2TB) and WD60EFAX (6TB) drives as incompatible with its product lines.

That Synology “Incompatible Models” listing has over 60 different NAS models that are incompatible. Synology has both traditional RAID at different levels along with Synology Hybrid RAID. Just for these two drives, testing across 60+ platforms and even eight different RAID levels means that there are roughly 1,000 configuration permutations. Those are just two drives, from one hard drive vendor, and a single drive line. As one can imagine, it is nearly impossible for NAS vendors to actually test every drive, NAS, and RAID-level permutation.
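To make the size of that test matrix concrete, here is a minimal sketch of the arithmetic. The two drive model numbers come from the Synology listing; the NAS model names and RAID option names are placeholders, and the exact count of 60 models and eight RAID options is an assumption based on the "60+" and "eight" figures used here.

```python
from itertools import product


def count_configurations(drives, nas_models, raid_levels):
    """Count every (drive, NAS model, RAID level) combination."""
    return sum(1 for _ in product(drives, nas_models, raid_levels))


# The two drive models named in the Synology listing.
drives = ["WD20EFAX", "WD60EFAX"]

# Placeholder names: the listing has "over 60" models, so 60 is a lower bound.
nas_models = [f"nas-model-{i}" for i in range(1, 61)]

# Placeholder names for eight RAID options (assumed count).
raid_levels = [f"raid-option-{i}" for i in range(1, 9)]

total = count_configurations(drives, nas_models, raid_levels)
print(f"{total} configurations for just two drives")
```

With exactly 60 models this comes to 960 combinations. The count passes 1,000 once the list reaches 63 models (2 × 63 × 8 = 1,008), and every additional drive model or RAID option multiplies it further.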

As a result, these companies need to do something that is very common in the technology industry. That is to create a test matrix. That matrix is built upon assumptions on what permutations must be tested to consider sets of drives as qualified.

Assuming that WD actually tested these drives and was OK with the single drive and traditional RAID performance drops, they may not have tested ZFS. While they may have had employees who knew DM-SMR would not be a good fit for ZFS NASes, that knowledge still had to be introduced into the test matrix. Also, if a simple workload was used that fit into the CMR area of the DM-SMR drives instead of a thorough real-world test, then they may not have seen the issues. WD has a smaller set of drives to test, but it has a large number of NAS usage scenarios to test.

On the NAS vendor side, one of those test matrix assumptions seems to have been that every drive in a line uses the same recording technology. Since that was the experience for many NAS vendors, it was a reasonable one. If you have a new drive, such as a WD Red, perhaps you test the new larger capacity models instead of the smaller traditional 2TB-6TB models that you have used for generations.

A NAS vendor would not necessarily test these smaller drives unless WD had told them about the DM-SMR change for the lower capacity drives in the new WD Red series. Without that notice, the NAS vendor’s test matrix would cover the higher capacity CMR drives and skip the lower capacity points.

All of this sounds reasonable, and like what may have happened here. There is a considerable risk to this information not passing properly if the engineering teams are not the ones working with customers. One can imagine how the WD Sales Reps for given NAS vendors never received the memo from Product Management that Engineering said there was a tweak to certain drives. As a result, that information never passed to the NAS vendors to update test matrices.

Toshiba SMR MC Extreme Performance Delta Over Time Example F6

Even after the drives are qualified, because of how DM-SMR drives are designed to have CMR areas for effectively caching random writes, system-level testing may miss this. A system may pass basic functional tests at the NAS vendor by only hitting cache areas with burn-in workloads. The unit is then shipped to a customer. The customer may take quarters or years to store appreciable amounts of data on the drives or see a failure causing a rebuild scenario. Unless very rigorous testing is done at the time of manufacture and installation, DM-SMR technology can effectively hide the drive’s important performance characteristics for a long time through the chain.
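A toy model can illustrate why a short burn-in may never expose the slow path. This is only a sketch under stated assumptions: the cache size and both throughput figures are hypothetical values chosen for illustration, not measurements of any WD drive, and the model ignores the drive flushing its cache during idle time.

```python
def simulate_write(total_gb, cache_gb, fast_mbps, slow_mbps):
    """Return (seconds, average MB/s) for one sustained write of total_gb.

    Toy model: data lands in the CMR cache area at fast_mbps until that
    area is full, then the remainder is written at slow_mbps.
    Idle-time cache flushing is deliberately ignored.
    """
    cached_gb = min(total_gb, cache_gb)
    spilled_gb = total_gb - cached_gb
    seconds = cached_gb * 1000 / fast_mbps + spilled_gb * 1000 / slow_mbps
    return seconds, total_gb * 1000 / seconds


# All values below are hypothetical, for illustration only.
CACHE_GB = 50      # assumed size of the CMR cache area
FAST_MBPS = 150    # assumed throughput while writes fit in the cache
SLOW_MBPS = 15     # assumed throughput once the cache is exhausted

_, burn_in_avg = simulate_write(20, CACHE_GB, FAST_MBPS, SLOW_MBPS)
_, rebuild_avg = simulate_write(2000, CACHE_GB, FAST_MBPS, SLOW_MBPS)

print(f"20 GB burn-in average:   {burn_in_avg:.1f} MB/s")
print(f"2000 GB rebuild average: {rebuild_avg:.1f} MB/s")
```

In this model, the 20 GB burn-in never leaves the cache area and reports the fast rate, while the 2,000 GB rebuild spends almost all of its time at the slow rate. A qualification test would need to write well past the cache size to see the second behavior.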

What is Going on at WD?

This brings us to the question of why it is now June 14, 2020 and we still have not heard anything from the company since April 22, 2020 even with the italicized “Continue checking here for updates regarding our WD Red NAS Drives” emblazoned across the page.

WD Red DM SMR Page Accessed June 14, 2020

How do we reconcile what we have learned and the interest this story has generated, with no new updates for almost two months?

There are two main ways. Generally, I like to assume the best of people. WD people are very sharp individuals. While they could just be hoping to weather a negative news cycle and keep shipping WD Red DM-SMR drives to customers hoping it does not impact sales in the long run, that feels too far given what we have seen.

Perhaps the most interesting aspect is that as of publishing this article, nobody at WD has contacted me. Huge numbers of folks in the market, including NAS vendors and end customers, have seen our pieces. From time to time, we publish these investigative pieces since we are an independent site. Usually, we are met with large meetings in-person or on the phone. At a minimum, most large companies will have a point person reach out within a few hours after a piece goes up, even if that is simply to stall and set up a meeting for a few days in the future. After 11 years of growing STH, I can say there is a fairly standard process.

STH Turns 11

Part of my process here has been to wait. Prior to running STH, I did management consulting at one of the world’s largest professional services firms. During years of consulting, I was able to see many companies large and small, including several in the storage space. One gets a good sense of how corporate structures dictate crisis response from management consulting. A reason companies do not act is that they are often waiting for a decision.

Looking at symptoms we covered here:

  • WD has individuals who are intelligent and passionate about their work
  • WD knew, or should have known, that ZFS NASes would not function properly with the WD Red DM-SMR drives
  • That information did not flag the issue in or for WD’s internal testing
  • The DM-SMR change seems not to have made its way to NAS vendors that would be impacted by the change
  • Multiple NAS vendors did not hear from their WD contacts about the DM-SMR change. This is especially important because the change applied to only a portion of the new line of Red hard drives. Accordingly, these NAS vendors did not update their test matrices
  • WD has been unable to coordinate even a “Hi, we saw your piece, can we chat?” reach-out (usually a symptom of not having a message to follow-up with)

Generally, these are symptoms of having a corporate structure and culture that is functionally aligned, rather than product aligned. Manufacturing has a pipeline of product in-progress and shipping. Product, marketing, and sales teams have unit and revenue targets. Finance, Customer Service, Legal, PR, and other functions will need to sign off on messaging and plans of action. Instead of there being an individual with general manager or GM-like responsibilities, a problem like this WD Red DM-SMR issue requires many people to develop a response. More people involved means more touchpoints and approvals, which slows response times unless a cross-functional executive drives teams to a course of action.

My sense is that this is a brand issue for WD, but the Red 2TB-6TB line is not the biggest for the company. It is not big enough to drive swift action across functionally aligned groups. The alternative is not one worth entertaining since that paints a poor picture of human empathy.

Final Words

We are getting to the precipice here. Either WD is going to go down the path of staying silent, or they are going to stop selling these drives and get a rectification plan in place. Beyond consumers, it is also the NAS vendors and resellers that were willing to recommend WD drives based on historical greatness (and promises of MDF) to their customers who are hurt. Staying on the continued course of silent inaction will not help either the reputation of the WD Red brand or the brands of those NAS resellers who sold these drives to their customers.

Red Vs Red

For our readers who have systems with these WD Red SMR drives, see Rob’s story and his blog post. It may be worth contacting WD directly since they seem to have a support process. Also, develop contingency plans for how to deal with these drives in your environment.

Even if you are not impacted, if you know of clients, friends, family, or others using NAS drives, share your opinion about the SMR drives. Use our data, use Jim Salter’s data, or whatever you would like, but have those conversations. The biggest danger to the community today is not those who know about DM-SMR. Instead, as a tech community, we should take it upon ourselves to help those who are casual NAS users, but who do not understand the impacts of using SMR technology where it does not perform well.

21 COMMENTS

  1. Thank you, again, for taking a stance here!
    And just as your previous article on the subject, it is neutral, balanced, insightful. But also on point, clear and just great tech journalism.

    On the topic: I hope WD will change course. No one could have imagined that they would be able to hurt their own brand image so much only a year ago.
    But they did. And without changing course, they are in danger of totally losing faith in the NAS space.
    I am not sure that they realize this. To me it seemed from the start that they were trying to wait “till this news blows over”.
    Only that it won’t. People buying NAS drives are in the huge majority quite informed, and most of them will now know of this mess. And like me, either be happy to have bought Seagate, or feel so badly burned they will never touch WD again, short of a total change in course.

    WD: Accept this won’t go away. Clearly label SMR drives as such; Keep all NAS drives CMR/PMR; Communicate openly the drawbacks for SMR in all channels, advertising, marketing, etc; And replace ALL NAS-SMR drives with CMR when customers who were burned by this request it.
    This may not fix the damage done, but without it, the damage will only grow exponentially.

  2. You’ve got WD dead to rights with this. Nice work.

    Maybe it’s just me but I appreciate you look for the best in people. It’s refreshing for online. I’d say it’s turning my head. Now I’m thinking what if they aren’t just a series of misses, assumption, and luck. You could’ve gone way harder on them the more I’m stewing on it.

  3. Synology may not use ZFS, but their support for btrfs may also expose them to similar issues with SMR drives — hence dropping support for them. I couldn’t tell if Rob’s issues with RAID 6 involved btrfs or just ext4 and mdraid.

  4. I wondered the same thing re: btrfs vs ext4. My tests for Ars Technica were dead simple ext4 on mdraid6, but a Synology uses btrfs, LVM, and mdraid.

    It’s possible that my ext4 tests missed a pathological interaction between either btrfs and SMR or LVM and SMR.

  5. Jim Salter – first off, let me say again, great work.

    We have had a lot of requests for different file systems/ storage arrays to be tested. As folks hopefully can appreciate from this article, testing all of them is a huge challenge for us and Jim/ Ars. We can help give insights and data points, but it is testing that vendors are being paid to do via the premiums in their products.

  6. Great thanks for your information. It seems like WD is getting greedy. You have to clearly state what type of technology you are using in your drive, not hide it by leaving it out of the spec. It’s so dishonest to do business like this. Shame on you, WD. Time to say goodbye to WD, including for my customers, until you clearly state which drives are using SMR, with different pricing.

  7. I see there in the screenshot from Synology’s drive compat list that the WD Red SMR drives do not have vibration sensors. I remember distinctly that WD Greens, as well as desktop HDDs from other manufacturers, had at least one or two vibration sensors on their PCB. Is this yet another way of cutting corners by WD? (Those vibration sensors do not seem cheap; looking at bulk pricing at Digikey, for example the Murata PKGS-00GXP1-R comes at USD 0.466 a pop for 30K units)

    Anyways, thanks for the follow-up on this topic!

  8. I agree with Steven, above.

    I’ve been managing enterprise storage since c. 2000 and WD drives have by far been the most prone to failure, especially when used in RAID. Some of the big players that make their drive reliability data available do not bear out my experience on a brand-wide scale, but I would never buy another WD drive.

    HGST improved WD’s hardware and engineering (opinion) but the company’s very long history of denial of drive issues (fact) has permanently steered me clear of their products. Your mileage may vary.

    I’ve got about two years in to Seagate Ironwolf drives on a 108TB box with zero failures and super-low failure predictors (read/write error counts, bad sector counts). Two years? Time to swap them out and consolidate, but it’ll be on Seagate drives again.

  9. Funny…. that video was one of the things I emphasized when I brought this whole stinking mess to the attention of tech media and it’s _WHY_ Chris Mellor ran with the original stories

    There have been a bunch of wannabe reporters jumping on the bandwagon and failing to attribute sources or bother doing their homework since this hit Blocks and Files (the Register) but it was all published there first. Chris and I spent about a month going over various bits (including lots of forums posts containing warnings about the drives) in order to work out when they actually first started hitting channels and that’s when we discovered that DM-SMR was so widespread and had been around in small drives for at least 3 years.

    As for why WD are staying quiet: what they’ve already said has been used against them in court and I explicitly warned them that unless they come utterly clean, whatever they say will end up as court evidence – plus they’re clearly aware they’re being investigated for Sherman act violations.

  10. Many sites would’ve just bashed WD, but you gave them the benefit of the doubt. That’s quite remarkable.

    Hanlon’s razor, isn’t it?

  11. The rot at WD goes back further than the SMR debacle. I was very disappointed recently when I replaced some older 8TB WD red drives with some newer ones to find the newer ones run a fair bit hotter. Turns out the old ones were helium-filled. It just seems like another bait-n-switch. Build up a brand and good reputation for a product line – then over time cheapen it yet keep the prices the same, and trade off the reputation and cash in.
    Eventually though, consumers wake up to this.
    The sad part is there is just not enough competition in the HDD space. It seems almost like a two horse race now. It’s almost like these two vendors feel like they can dictate terms and do what they please with no consequences.

  12. Re: Synology & BTRFS

    Even something as simple as running Extended SMART Tests on a Synology will show a significant difference in my experience. I have an 8×8 Synology 1815+ with 3 WD80EMAZ CMR drives and 5 Seagate ST8000AS0002 SMR drives, and the WD CMR drives will finish something as simple as that Extended SMART Test in like 10-20% of the time it takes the Seagate SMR drives. The SMR drives take literal days to finish.

    FWIW, originally my 8×8 was all Seagate SMRs, but I’ve slowly replaced them with the WD drives as they’ve failed. Luckily for me it turned out that the WD 8tb drives were CMR and so I wasn’t replacing SMR with SMR. The rebuild time for a failed drive on my original 8×8 SMR SHR RAID was over a week.

  13. I personally have 3 2TB WD Red drives (model WD20EFRX-68EUZN0). I had originally wanted to use the drives with an HBA card in a FreeNAS build, but I did not because my hardware was too lackluster to run it (not enough memory, and an older LGA 775 system). So I put them in RAID and used them in Windows for my network storage. Since I did that, I have been having nothing but problems: drives dropping from the array, slow writes. Eventually I figured the HBA card was bad, so I ditched it and went with the Windows Storage Spaces solution. Still the problems have persisted. So I sought to add more storage and added a 4TB Seagate Ironwolf drive. That kept getting flagged by Windows Storage Spaces as being problematic. So I removed the drive from the storage space and have had no more issues. But with these WD drives, these recent issues, and the news of WD doing this with their NAS drives, it is clear I will not purchase another WD product.

  14. Patrick,

    Been a fan of the site for years, love the content, love the quality.

    Please, calm down on the article and video length. I hate to be all ADHD but surely this and the video could’ve been trimmed a little.
    Rather than watch your video, I opted not to at all, it was just that long.

    Please consider keeping it a bit shorter, it’s painfully long, your audience are generally pretty clued up, making the video for bottom of the barrel and padding for 30 minutes does nothing for your great content

  15. Just stumbled across this after having issues with what I now know to be WD Red 4TB EFAX drives.
    Western Digital won’t help me as they claim they are OEM drives (I bought them from Amazon) and Amazon won’t send replacements, only refund me, which leaves me in the dilemma of having to be without any drives whilst I wait for the refund so I can order more. (Yes, I have backups!)
    So beware, if you are buying from Amazon, or any other place, make sure you get the boxed retail version and not the OEM version…

  16. I just read this article after running into exactly these issues. But in my case I bought the new WD Red Plus (WD**EFZX) version, which should be the same as the old WD Red (WD**EFRX). I bought one drive from Amazon and one from Newegg. Both failed with resilvering on ZFS. Faulted write errors, the kind you see from SMR drives. I have a sneaky feeling that due to supply chain issues, WD is branding EFAX drives as EFZX drives, giving them the correct serial and label. I returned all the drives I bought; I was going to use them to slowly replace older EFRX drives. I guess I will have to go with Seagate Ironwolf CMR drives. Can’t trust WD anymore. Two drives from two suppliers both fail with the same error, yet pass long SMART testing.
