I wanted to circle back on this one, since a lot of people apparently read our piece “Intel Performance Strategy Team Publishing Intentionally Misleading Benchmarks.” As one might imagine, I have been chatting with Intel folks since even before that article came out yesterday. I think we have a fairly good idea of what happened, so I wanted to lay it out.
Getting to the Bottom of Intel’s GROMACS Results
In the day or so since the piece went live, Intel has re-tested using GROMACS 2019.4. Intel showed me the results, which were largely the same as those it presented publicly yesterday.
We addressed a few things in our discussion. For the first, we are going to use this chart from yesterday’s article:
Two points needed clarification:
- First, the “AVX2 Build” meant that Intel specifically enabled the AVX2 data path, and Intel alleges that it worked properly in its GROMACS 2019.3 run. One of the big changes in GROMACS 2019.4 was that the tool now automatically optimizes for this.
- Second, regarding disclosure #31, which listed “threads per core = 1” on AMD, Intel maintains that this was a typo and that the tests were actually run with two threads per core.
During the discussions, I mentioned that this looks like a process breakdown. I think we agreed that better proofreading and better disclosures need to be part of Intel’s process going forward. I also offered to let them bounce this type of content off me before release to help keep these errors from creeping into the public domain.
I also confirmed that Intel tested the AMD EPYC 7742 at its default TDP of 225W, not at the 240W cTDP the chips support. How much the higher cTDP helps varies from one piece of silicon to another, but it can yield single-digit percentage performance gains on tasks like this. Frankly, if one is being completely fair and optimizing the AMD system as well as possible, I still think a 240W cTDP is a better proof point if Intel could not get something like the EPYC 7H12. One could argue that AMD would not extend that courtesy to an Intel part either, but the fact that the feature was available and not used needs to be documented.
Publishing incorrect information is misleading, but Intel maintains that, given the test setup above, its numbers were still directionally correct. We do not have a Platinum 9282 system in the lab to run comparisons on, so at some point we have to rely on Intel for those numbers, and it is up to us and our readers to decide whether the optimizations Intel presented are acceptable.
I wanted to add a few points to the discussion:
- Many of our readers rightly noted that Intel is using its own software stack and is comparing a 400W TDP CPU against a competitor running at its default 225W configurable TDP. The Platinum 9282 platform is only available from Intel and lacks some substantial features that also matter to this discussion, such as PCIe Gen4. At the same time, Intel disclosed what it did, albeit with an error in the initial draft.
- While criticism of comparing a mainstream CPU against a niche part is valid, as is criticism of how the test was conducted, it seems that Intel did a better job than its publishing process indicated.
- We need to remember that this is a marketing exercise. As such, we should expect Intel to present its best case.
- Intel needs a better process to accurately communicate what it is showing. This is a case where that process broke down. Without one, we end up in situations like this, where Intel tries to show the competitive landscape but ends up showing something else. Even if one item was just a typo, and everyone makes typos (I will be the first to admit I frequently do), it changes what is being presented to the public. A process that maintains accuracy in communication is key both to keeping content from being misleading and to keeping reputations from being damaged.
- I pointed out that the disclosures are not easy to find, given the way they are cited. I was told that the company has a project underway to improve that.
- I actually know the Intel performance teams fairly well, and they are generally nice folks. I have seen some extremely disparaging comments out there that name specific individuals we never mentioned on STH. It is worth remembering that these folks have families and are often not accustomed to the spotlight.
The best comparison for the AMD EPYC 7742 in Intel’s lineup is the Platinum 8280. Comparing the EPYC 7742 to the Platinum 9282 is probably not the most useful comparison; however, it is the story Intel is pushing. Intel is being very aggressive in supporting the Xeon Platinum 9282, and there are going to be cases where it can beat a lower-power chip. That is Intel’s story to tell, whether or not many would agree with the comparison.
At the end of the day, Intel needs a better publishing process to bridge the gap between the work its benchmarking teams do and what it shows publicly. The company can choose what data to present and how to present it, but it is important to communicate accurately what was, and what was not, done in the comparison.
For my part, I am more than happy to listen to concerns and help find these errors so that accurate information can be presented.