Intel Skylake Bug – UEFI BIOS updates required


Although the Intel Skylake generation (the 6th generation Core architecture) has been generally well received, we have heard news of a new bug with the chips. The bug manifests itself when some of the higher-end AVX/FMA3 (Fused Multiply-Add) features of the chip are used for scientific workloads: the chips freeze up after some time.

Here is a quick primer on FMA from Intel's Haswell-EP slide deck. One can see that the targeted workloads include the Fast Fourier Transforms (FFTs) that Prime95 users are running.

Example Intel Xeon E5-2600 V3 (Haswell-EP) AVX2 FMA
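To make the "fused" part of fused multiply-add concrete, here is a minimal Python sketch. It does not use the CPU's FMA3 instructions directly. Instead, a hypothetical helper called `emulated_fma` uses exact rational arithmetic to reproduce the defining property of an FMA: the product and the sum are computed exactly and rounded only once at the end. The input values are chosen purely for illustration.

```python
from fractions import Fraction


def emulated_fma(a, b, c):
    """Return a*b + c with a single final rounding, as a hardware FMA would.

    Hypothetical helper for illustration: Fraction(float) is exact, so the
    product and sum carry no rounding error until float() rounds once.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Illustrative inputs: both a and b are exactly representable as doubles.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Separate multiply and add: the exact product 1 - 2**-60 is first rounded
# to 1.0, so the small term is lost before c is added.
unfused = a * b + c

# Fused: the exact product is kept until the one final rounding.
fused = emulated_fma(a, b, c)

print(unfused == 0.0)            # True
print(fused == -(2.0 ** -60))    # True
```

Besides doing two operations in one instruction, the fused form keeps low-order bits that a separate multiply and add would discard, which is part of why it is attractive for numerically heavy code such as FFTs.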

From the Intel Communities forum discussing the bug (and cross-posted from mersenne.org):

Steps to freeze your Skylake system:

(If you want to familiarize yourself with the software use the readme, a background in math will be helpful, but is not needed.)

  • In the menu go to ‘Advanced | Test’ and fill in the number 14942209 in the box labeled ‘Exponent to test’
  • Let the program run for some time and at some point, minutes or hours, the system will freeze.

The prime95 software does multiplications of extreme high numbers using the Fast Fourier Transformation. The implementation of these FFT’s in prime95 is handcoded in assembly by George Woltman, and is the most efficient implementation available. This project runs for more than 20 years now and has always been carefully maintained. Tens of thousands of machines run this software 24 hours a day.

For optimization, different FFT sizes have been implemented in Prime95, only the FFT with length 768K freezes the Skylake. [Emphasis added]
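For readers unfamiliar with the idea of multiplying very large numbers with an FFT, the following is a minimal Python sketch of the general technique. It is not Prime95's code, which is hand-written assembly and far more sophisticated. The function names `fft` and `fft_multiply` and the choice of decimal digits as the working base are assumptions made for illustration only.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_multiply(x, y):
    """Multiply two non-negative integers by convolving their digit lists."""
    # Least-significant digit first, so index i has place value 10**i.
    a = [int(d) for d in reversed(str(x))]
    b = [int(d) for d in reversed(str(y))]

    # Pad to a power of two large enough to hold the full product.
    size = 1
    while size < len(a) + len(b):
        size *= 2
    a += [0] * (size - len(a))
    b += [0] * (size - len(b))

    # Pointwise multiplication in the frequency domain is convolution
    # of the digit sequences in the original domain.
    fa, fb = fft(a), fft(b)
    raw = fft([p * q for p, q in zip(fa, fb)], invert=True)

    # The unscaled inverse transform needs a division by size; rounding
    # removes the small floating-point error in each coefficient.
    coeffs = [round(c.real / size) for c in raw]

    # Summing with place values also takes care of the carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))


print(fft_multiply(123456789, 987654321) == 123456789 * 987654321)  # True
```

The inner loop of the transform is a long run of floating-point multiplies and adds, which is the kind of arithmetic AVX and FMA instructions are designed to speed up.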

Here is the quote from Intel on this issue:

Intel has identified an issue that potentially affects the 6th Gen Intel® Core™ family of products.  This issue only occurs under certain complex workload conditions, like those that may be encountered when running applications like Prime95.  In those cases, the processor may hang or cause unpredictable system behavior.  Intel has identified and released a fix and is working with external business partners to get the fix deployed through BIOS.

What this means is that Intel is sending the fix to motherboard vendors, who will then need to incorporate it into new UEFI BIOS releases for Intel Skylake systems. Users putting this type of heavy AVX/FMA3 stress on their processors will need to install those updates. We do not yet know the exact nature of the fix or how it may impact performance.

This is unlikely to impact the majority of users outside the scientific communities. However, it is going to be important for those using this newest generation of Intel Skylake chips for this kind of computation.
