Update to the Intel Xeon Platinum 9282 GROMACS Benchmarks Piece

26
Intel Server System 9200WK 2U Air Cooled Node CPU Heatsink Cover
Intel Server System 9200WK 2U Air Cooled Node CPU Heatsink Cover

I wanted to circle back on this one since apparently a lot of people read the Intel Performance Strategy Team Publishing Intentionally Misleading Benchmarks. As one might imagine, I have been chatting with Intel folks since even before that article came out yesterday. I think we have a fairly good idea of what happened so I wanted to lay this out.

Getting to the Bottom of Intel’s GROMACS Results

Over the last day or so since the piece went live, Intel has re-tested using GROMACS 2019.4. They showed me the results which were largely the same as they presented publicly yesterday.

There were a few things that we addressed in the discussion. The first we are going to use this chart from the discussion yesterday:

Intel Xeon Platinum 9282 GROMACS Config Compared To EPYC 7742 Form Intel
Intel Xeon Platinum 9282 GROMACS Config Compared To EPYC 7742 Form Intel

There are two points that needed clarification:

  • First, the “AVX2 Build” meant that Intel specifically enabled the AVX2 data path and allege it is working properly in their GROMACS 2019.3 run. One of the big changes in GROMACS 2019.4, was that the tool automatically optimizes for this.
  • Second, the disclosure #31 threads per core = 1 on AMD, Intel maintains that it was a typo and that the tests were actually done with two threads per core.

During the discussions, I mentioned that this seems to be a process breakdown. I think we agreed that a better proofing process and disclosures need to be part of Intel’s go-forward process. I also offered to have them bounce this type of content off me before they release it to help prevent this type of error to creep into the public domain.

I also confirmed that Intel did the testing using the AMD EPYC 7742 default TDP of 225W, not a 240W cTDP that the chips are capable of. That cTDP has some variability in how much it helps different pieces of silicon, but it can get single-digit performance gains on tasks like this. Frankly, I still think a 240W cTDP is a better proof point if they could not get something like the EPYC 7H12 if one is being completely fair and best optimizing the AMD system for comparison. One could argue that AMD would similarly not extend that courtesy to an Intel part, but this needs to be documented that the feature was available and not being used.

Publishing incorrect information is misleading, but Intel maintains that given its test setup above, it still was directionally correct on its numbers. We do not have a Platinum 9282 system in the lab for our team to run comparisons on, so at some point, we look to Intel for those numbers and it is up to us and our readers to decide whether the optimizations Intel presented are acceptable.

Broader Perspective

I wanted to add a few points to the discussion:

  1. Many of our readers rightly noted that Intel is using its software stack, and it is comparing a 400W TDP CPU versus the default 225W configurable TDP of a competitor. The Platinum 9282 platform is only available from Intel and lacks some substantial features that are also important to the discussion such as PCIe Gen4. At the same time, Intel disclosed what they did, albeit with an error in the initial draft.
  2. While the criticism of the comparison between a mainstream CPU versus a niche part is valid, as it how the test was conducted, it seems like Intel did a better job than their publishing process indicated.
  3. We need to remember this is a marketing exercise. As such, we also need to expect that Intel is going to try presenting its best case.
  4. Intel needs a better process to accurately communicate what they are showing. This is a case where that process broke down. Without a solid process to accurately communicate what is being shown, we end up in situations like these where Intel is trying to show a competitive landscape but end up showing something else. Even if one item was just a typo, which everyone makes (I will be the first to admit I frequently do), it changes what is being presented to the public. A process to maintain accuracy in communication is key to ensuring that content is not misleading and also ensuring reputations are not damaged.
  5. I pointed out that the disclosures are not easy to find given their citations. I was told that the company has a project to improve that.
  6. I actually know the Intel performance teams fairly well, and they are generally nice folks. I know there are some comments out there that are extremely disparaging citing specific individuals that we never mentioned on STH. It is worthwhile to remember that these folks have families and are often not accustomed to the spotlight.

Final Words

The best comparison for the AMD EPYC 7742 to Intel’s lineup is the Platinum 8280. Comparing the EPYC 7742 to the Platinum 9282 is probably not the comparison that is the most useful, however, it is a story that Intel is pushing. Intel is being very aggressive in providing support for the Xeon Platinum 9282 and there are going to be cases where it is able to beat a lower power chip. That is Intel’s story to tell whether many would agree with that comparison or not.

At the end of the day, to increase accuracy, Intel needs to implement a better publishing process to bridge the gap between the work that their benchmarking teams do, and what they show publicly. The company can choose what data to present and how to present it. Ensuring accurate communication of what was, and what was not done in the comparison is important.

From our perspective, I am more than happy to listen to concerns and help find these errors so that accurate information can be presented.

26 COMMENTS

  1. Thank you Patrick, for your hard and honest work. I just want to ask one single thing regarding your chats: what is that point 5. “I pointed out that the disclosures are not easy to find given their citations. I was told that the company has a project to improve that.” bullcrap? Have you had the opportunity to ask back, WHY ON EARTH does a company need a project(!!!) to copy and paste the already existing disclosures as footnotes? Why must it be a project not to bend laws and Intel-specific FTC degrees to their outmost limits? This is really beyond dishonest, really. Naturally, I’m not addressing this to your buddies who do the benchmarking and performance evaluating themselves. Those folks 100% surely feel really bad having to say ‘we are having a project to copy and paste already written lines’. Damn.

  2. Bravo STH for this update. I don’t think the 7742 against 9282 is fair at all. Still, great on the courage to post the update and take the high road here.

  3. Thanks for the update. We should all know why there’s been a lot of vitriol thrown at Intel. It is unfortunate that a few good people at Intel will get the brunt of it, but that’s what happens when you work for a company with a poor track record of competitive fairness.

    We have not forgotten about the industrial chiller incident, followed by the commissioned benchmarks that were full of errors that favoured Intel over AMD. We all know that Intel has a history of performing various non-competitive, and in some cases arguably illegal actions, rather than stepping up to the plate and competing within the established lines of fairness. There’s absolutely no reason we should give Intel the benefit of the doubt, instead, it’s up to Intel to disclose what they are doing accurately and up front in plain sight, which the company simply has not done in this case as well as others. Any benchmarks that were made using Intel’s compiler should be seen as intentionally misleading propaganda, and it’s made far worse by attempting to hide the fact from view, I’m not going to give Intel the benefit of the doubt here, no one should.

    We should only be OK with Intel using its own compiler when the competing product uses software compiled with a known high quality open source compiler with all optimization settings disclosed. As far as I’m concerned, Intel is still up to no good and should be called out for it. There’s nothing wrong with painting your products in the best light possible, so long as there’s at least a semblance of an apple-to-apples comparison, and what is being done is stated accurately up front in plain sight, allowing it to be verified by a 3rd party should one choose to do so.

    Intel needs to do a lot more than roll out a project to improve citations, that’s just another symptom of a much wider problem at Intel, which as far as I can see, is not yet being tackled in a serious way.

  4. I think it’s important to mention that Intel’s 9892, for all intents and purposes, doesn’t really exist. It’s impossible to get pricing on it, there are no third party tests of it, and Intel doesn’t openly sell them… We truly don’t even know if any “customer” owns one.

  5. Ok so the guy publishing that blog series for Intel should be fired for letting something like this slide.

    EVEN WITH Intel making a level playing field, they still are gaming the comparison. That they didn’t raise cTDP for AMD means that they didn’t fully optimize for AMD.

    So they’ve got fully a fully optimized Intel that isn’t a direct competitor against a less than fully optimized AMD test.

    They’ve called out what they’ve done but still are pushing a cooked competiton.

  6. Intel presented a valid but unfair test blog that wasnt proofread. I don’t think that’s much better. They’re trying to

    I also take issue with the notion that they gave AMD every benefit. CTDP 240W they didn’t use.

    Very poor showing from Intel meant to decieve those that don’t know better.

  7. Is cTDP hard to set or is it an obvious BIOS setting? Maybe Intel and Supermicro didn’t know how to? I saw their system supports 240W so there’s got’a be a setting, no?

    If they’ve done that across all tests then this update’s biggest finding is that Intel ignored a max performance feature when doing max performance benchmarks across the board.

  8. I have a nagging suspicion that if you’re in the market for these products, you are running your own test.

  9. “if you’re in the market for these products, you are running your own test”

    True, otherwise being a company’s paycheck collectors 😀

    First thing first about the design integrity and potential unknown security issues, having Intel explain its using partial addresses and “thoroughly assess” the potential risks is much more important since people don’t know what they don’t know until … but Intel sure does know ist design the best.

  10. Intel, a great company being able to pump $Bs Q after Q but nothing else:

    1) Breached the 386 agreement.
    2) Bribed major OEMs, e.g. $Bs cash to Dell per just the discussion between two CEOs.
    3) Deleting its executives’ email critically related to the lawsuit with AMD and overwriting the backup disk.
    4) Security holes, the ex-CEO told the world “no bug, the chips were designed as intended.”
    5) Using partial addresses, the most fundamental design problem, unknown potential security issues until Intel thoroughly combs through the design to ensure.
    6) …

  11. “the company has a project to improve that”

    The company has a project to find the glue AMD has been using to glue its 2 28-core dice together 😀

  12. TL;DR

    “We are not sinister, just incompetent.”

    Sadly, they are both, but it is understandable, bloated monopolies aren’t exactly known for being neither honest nor nimble in their responses to a changing market landscape.

  13. The fact that they took an Epyc part that is not at the top of the stack performance-wise, and then kept it at it’s lowest configurable TDP, to compare against the 5x more expensive 400w TDP, socketless, “who-knows-how-its-cooled” specialized product, pretty much says all that needs to be said about Intel’s in-house comparisons …

    Benchmark test configurations is another story altogether .. sure, maybe they can claim incompetence here, but even if give we them the benefit of the doubt … that still leaves the issue above ….

  14. Patrick. I just commented a moment ago on this, but I wanted to include a link to Agner Fog’s website, where he documents Intel’s ongoing and flagrantly illegal activities with respect to inspecting CPU id’s in the Intel MKL dispatcher. https://www.agner.org/optimize/blog/read.php?i=49#1006

    This is only the most recent activity, but the site/blog goes back to 2009, and thoroughly documents the issues around the dispatcher’s illegal way of gimping non-Intel x86 processors and the case history, where there was a settlement between Inlet and AMD after AMD sued them, and the FTC filed a complaint after the settlement, to force Intel to cease and desist in artificially limiting performance of other vendor’s x86 chips in the MKL, and other dispatchers.

    Basically, for an Inlet processor it will use the best AVX/vector instructions, and for non-Intel processors, the MKL dispatcher will choose the least good supported vector instructions. As an example from the link above, in the MKL from 2010, rev 10.3 and “Intel processor 64 bit mode” will use AVX, but “non-Intel processor 64 bit mode” the dispatcher will choose SSE2.

    “There are many different versions of Intel compilers and function libraries with different CPU dispatching schemes. Some of these are fair to non-Intel processors and some are unfair. By unfair dispatching I mean that it chooses a suboptimal code path when running on a non-Intel CPU even when the CPU is compatible with a better code path.”

    Unfortunately, MKL latest revision continues to be an intentionally unfair implementation and GROMACS is a biased benchmark as a result, as are other benchmarks using MKL and certain other libraries and compilers produced by Intel. Unfortunately, I don’t know of a way to measure the impact because there’s basically no other option than to use those Intel libraries. Libraries make no money, so there’s is no competition and therefore everyone just uses the Intel libraries, so you can’t substitute fair code without writing a library/compiler/linker yourself to determine the impact.

  15. I should also mention that Agner Fog has produced voluminous documentation about how to mitigate the unfair code execution that Intel persists in including the in MKL. Any benchmark coders should probably be more than passingly familiar with that optimization manual, and benchmarking organizations who don’t account for that should just stop pretending to be a benchmarking organization and admit to being part of Intel’s PR machine.

    *********************
    At the same time, it is incumbent on media outlets to not let this sort of thing get a pass, but specifically call out and exclude all unfair benchmarks, which would include any benchmark using unfairly written Intel libraries and compilers, as is the case here with Intel’s pathetic attempts to use an internal “benchmark” to mitigate the damage being done by AMD’s chips through PR and not better hardware and better pricing.
    *********************

    Obviously, the GROMACS authors should re-base to fair code, because their results are basically meaningless for evaluating actual performance of x86 chips’ vector math implementations.

  16. It all boils down to the subtle difference between being competitive and being anti-competitive.

    To compete means to do your best and match it against other’s best.

    To be anti-competitive means to alter the performance of others, usually, because your own falls short.

    There is nothing wrong with optimizing every last clock cycle out of code, if that gets the job faster – it is a good thing. Using your influence to deliberately make it so that the competing product is deliberate ran through a sub-optimal code path is simply despicable.

    It would be identical to an Olympic runner kicking others in their ankles just before the race… About as uncool as it gets…

    Yet it also happens to be all too common practice, and AMD in particular has seen more of that than any other tech company, both in their CPUs and GPUs, where nvidia has been actively doing the same ever since they found themselves with spare money to throw at it.

  17. Someone asked to ELI5 on the 1st post so I’m going to repost here.

    If you’ve read the update as well here’s the short version:
    — Intel published benchmarks citations that were off the chart and hard to find describing what they did
    — In those benchmarks, Intel was sloppy and had a typo that said they only used half the threads available on AMD. STH got them to update and confirm they really used all the available threads.
    — Intel still used its compiler and software stack that is much better tuned for Intel and has a history of de-tuning performance on AMD.
    — Intel used an older version of the GROMACS software that didn’t correctly support AMD CPUs but they did some changes to get that performance back with AVX2. They later confirmed in testing that what they did worked, or at least gave similar numbers using Intel software.
    — Intel did a lot of changes to boost settings and such on their CPU while doing other changes on AMD like making the chips appear as 8 CPUs instead of just 2.
    — Intel is running AMD chips at 225W of power when they’re capable of running at 240W with a BIOS setting.
    — Intel is using that not fully tuned AMD 225W CPU against a 400W CPU of its own that it changed boost settings on which is what is impacted by 225 instead of 240W.
    — The Xeon Platinum Intel is using in comparison is not socketed, and is sold only affixed to an Intel motherboard. The AMD EPYC is available from systems from almost every OEM.
    — The Xeon Platinum costs 3x the EPYC chips they are comparing to.
    — They didn’t say they were using it, but they’re using air cooling on AMD and most probably water on the Intel system.

    Even with all that they’re getting only like 30% more performance. Now that they’ve updated to fix the mistake they made after these guys pointed it out, and are claiming it’s fair.

    The reason that everyone’s still pissed at Intel is really because they’re comparing a 3x the price, hard to get part, that costs more to operate, and requires water cooling to a standard AMD part. In that comparison they only won by 30% including running the AMD chips at 225W instead of 240W that they are capable of.

  18. This is unfortunate. This isn’t the first time Intel has performed a deceptive act and Intel is given a pass by STH. Not addressing the use of software that runs great on Intel and poorly on AMD should be the main focus. I’m relatively new to the site, and the site shouldn’t tow the line for Intel like the NBA to China.

  19. I think STH is about the only computing media site that made any note of this misleading test, and STH is taking a risk by being critical of Intel, so I think STH is doing fine on this coverage.

    However, my point in bringing up the use of the Intel Compiler and the Intel libraries (MKL), is that those are known to under-represent AMD performance especially for vector instructions, by inspecting the vendor ID information and choosing a sub-optimal code-path, and so even though they re-ran the test on GROMACS 2019.4 which now supports AVX2, that does not guarantee that the AMD chip will be using AVX2 instructions.

    The only way to clear the air as to what the GROMACS performance of these two chips actually is, would be for someone to perform the same tests on this exact hardware using GROMACS 2019.4 compiled under gcc and with glibc, which are known to be fair to AMD and Intel, not the Intel compiler or libraries. That is exactly what I hope that STH will do, but they are not likely to get a 9282 for testing, so we will probably never know how these two cpus stack-up against each other in a fair environment.

  20. GROMACS recommends using gcc to compile the GROMACS test software, btw, so retesting with gcc would just make sense anyway.

  21. Starting from AMD 7nm Zen 2, with truly competitive (not trying-to-compete) or even better products available, the matter is only people’s own choices; some enjoy being milked by the big brand.

    If the professionals who buy servers or server chips don’t know what is going in the industry about the suppliers and products, they need to get themselves up to the speed.

    Divide and conquer applies to silicon too; for high core count chips, the chiplet chips is the only practical way to go and forget about super-sized monolithic chips.

  22. Misleading or marketing?

    Has anyone seen a company doing and publishing a apple-to-apple benchmark with the results showing that the competitor’s product is better?

  23. Misleading marketing is pretty much the only illegal kind of marketing, so it is especially worth noting.

    It’s also why we only bother to pay attention to independent reviewer’s benchmark results … amirite

  24. The week after they do this, they’ve released new security vulnerabilities and patches which hurt Intel performance more. I can’t believe they’re getting away with this. It’s even more misleading now than with the first piece.

  25. I contend that the GROMACS 19.4 retest did not properly use AVX2 instructions for the Rome chip, because Intel claimed no performance improvement vs. running 19.3 that does not support AVX2 despite GROMACS and STH indicating performance gains from 19.4. In addition, there’s a lovely little post you may have heard of on Reddit from Nedflanders1976, which indicates that his patch to force the MKL to run in AVX2 mode on Ryzen chips demonstrates 20-300 percent improvement on MATLAB using the MKL indicating that performance improvements are there when forcing Intel’s libraries to behave fairly toward non-Intel chips as required by law. This raises the question of whether Ryzen AVX2 is actually faster than Intel’s AVX512 on latest available chips as some testing had shown.

    Thank you again for bringing these shenanigans to the public’s attention, Patrick. Can’t wait to see your GROMACS tests using fair compilers and libraries.

LEAVE A REPLY

Please enter your comment!
Please enter your name here