Stop constant CentOS 7.3 crashes with AMD Ryzen using Kernel 4.10

AMD Ryzen Crash With CentOS Installer

If you have tried AMD Ryzen with CentOS 7.3 (or earlier), you have likely seen a crash or two, possibly many more. We have multiple systems where the installer crashes, and we have seen hard lockups of running systems. One of the biggest reasons is that CentOS uses an older kernel. Our suspicion is that the Kernel 4.10 AMD SMT patches make the newer kernel a much more stable option. We recently published "AMD Ryzen with Ubuntu: Here is what you have to do to fix constant crashes!" Now it is time for the CentOS guide.

Examples of CentOS 7 Crashes Running AMD Ryzen

Within a few hours of trying CentOS 7.3 (1611) on AMD Ryzen we were greeted with several types of crashes. One great example is that the CentOS 1611 Everything image continually crashed while running the installer.

AMD Ryzen Crash With CentOS Installer

Even if you could get CentOS 7.3 installed, there were still issues. We also saw hard lockups on the Ryzen system that required a power cycle of the machine. Power cycling is significantly harder on current Ryzen platforms, which come without IPMI.

AMD Ryzen Select CentOS Kernel Panic

As with the vanilla Ubuntu releases available at the time of publication, the out-of-the-box experience was less than stellar. Note: We are working with the Canonical team to get them hardware in our DemoEval lab to use for a fix to 14.04, 16.04 and 16.10.

The AMD Ryzen plus CentOS 7.3 (1611) fix with Kernel 4.10.1

Since we saw similar issues with Ubuntu, we decided to install the 4.10.1 kernel. To make life easier, we are going to use the ELRepo.org repository to get the kernel we need. Installing Kernel 4.10.1 on CentOS 7.3 (1611) is extremely easy. You can fire up nano and make a script we called kernelupdate.sh. Here is the script that we are using to enable ELRepo.org and install the newer kernel:

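#!/bin/bash
# Enable ELRepo and install the mainline (kernel-ml) kernel. Run as root.
echo "The current kernel is:"
uname -r
# Import the ELRepo signing key, then install the ELRepo release package
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# kernel-ml lives in the elrepo-kernel repository, which is disabled by default
yum --enablerepo=elrepo-kernel install -y kernel-ml
echo "Time to reboot"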

You can simply use bash kernelupdate.sh and it will add the ELRepo and install the Linux 4.10.1 kernel. Alternatively, you can just copy and paste those commands if you are using a single system.

AMD Ryzen Install CentOS Linux Kernel 4.10.1

After that script executes or after you run the commands manually, you can reboot and you should see a 4.10.1 elrepo entry on the GRUB bootloader. You want to ensure you select the 4.10.1 kernel version.

AMD Ryzen Select CentOS Linux Kernel 4.10.1
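
If you would rather confirm the entry from a shell than from the boot menu, the GRUB menu entries can be listed together with their zero-based index. This is a minimal sketch: the helper name list_menu_entries is our own, and it assumes a BIOS install where /etc/grub2.cfg points at the generated configuration (reading it normally requires root).

```shell
#!/bin/bash
# list_menu_entries is a hypothetical helper name, not part of GRUB.
# It prints each top-level "menuentry" title with its zero-based index.
list_menu_entries() {
  awk -F\' '$1 == "menuentry " { print i++ " : " $2 }' "$1"
}

# Assumption: BIOS install; /etc/grub2.cfg is a symlink to the generated config.
cfg=/etc/grub2.cfg
if [ -r "$cfg" ]; then
  list_menu_entries "$cfg"
else
  echo "Cannot read $cfg (run as root, or adjust the path)" >&2
fi
```

The index printed next to the elrepo entry is the number that GRUB_DEFAULT refers to.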

After the system reboots, you may need to update GRUB to use the new kernel by default. Here we open the config file and set GRUB_DEFAULT=0, since 0 is the first menu option and our first menu option is the 4.10.1 kernel.

nano /etc/default/grub

AMD Ryzen With CentOS Linux Kernel 4.10.1 Change GRUB Default To Correct Kernel
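
For scripted installs, the same edit can be made without opening an editor. The sketch below uses a hypothetical helper, set_grub_default, and demonstrates it on a scratch file rather than the live /etc/default/grub; the sample file contents are illustrative, not taken from a real system.

```shell
#!/bin/bash
# set_grub_default is a hypothetical helper: it sets GRUB_DEFAULT in the
# given file to the given menu index, appending the line if it is missing.
set_grub_default() {
  local file="$1" index="$2"
  if grep -q '^GRUB_DEFAULT=' "$file"; then
    sed -i "s/^GRUB_DEFAULT=.*/GRUB_DEFAULT=${index}/" "$file"
  else
    echo "GRUB_DEFAULT=${index}" >> "$file"
  fi
}

# Demonstration on a scratch file (illustrative contents only).
scratch="$(mktemp)"
printf 'GRUB_TIMEOUT=5\nGRUB_DEFAULT=saved\n' > "$scratch"
set_grub_default "$scratch" 0
cat "$scratch"
rm -f "$scratch"
```

On a real system you would run set_grub_default /etc/default/grub 0 as root, ideally after backing the file up.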

Now we will want to generate the grub2 configuration file so that this change will persist through reboots.

grub2-mkconfig -o /boot/grub2/grub.cfg

AMD Ryzen With CentOS Linux Kernel 4.10.1 Generate New GRUB Configuration
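
The output path above applies to BIOS installs. On UEFI installs of CentOS 7 the generated file lives under /boot/efi/EFI/centos/ instead. Here is a minimal sketch that picks the path by checking for /sys/firmware/efi. The helper name grub_cfg_path is our own, and the script only prints the command rather than running it.

```shell
#!/bin/bash
# grub_cfg_path is a hypothetical helper. It prints the grub.cfg location
# for this boot mode. $1 optionally overrides the directory probed for EFI.
grub_cfg_path() {
  local efi_dir="${1:-/sys/firmware/efi}"
  if [ -d "$efi_dir" ]; then
    echo /boot/efi/EFI/centos/grub.cfg   # UEFI install of CentOS 7
  else
    echo /boot/grub2/grub.cfg            # BIOS install, as used in this guide
  fi
}

# Print the command instead of running it, so the sketch is safe to try.
echo "grub2-mkconfig -o $(grub_cfg_path)"
```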

At this point, you should have a working CentOS 7.3 system updated with the Linux 4.10.1 kernel.

AMD Ryzen With CentOS Linux Kernel 4.10.1
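
A quick way to confirm the result is to check the running kernel release string. The sketch below assumes ELRepo kernel releases contain the string elrepo (for example 4.10.1-1.el7.elrepo.x86_64); the helper name is_elrepo_kernel is hypothetical.

```shell
#!/bin/bash
# is_elrepo_kernel is a hypothetical helper. It returns 0 when the given
# kernel release string contains "elrepo", and 1 otherwise.
is_elrepo_kernel() {
  case "$1" in
    *elrepo*) return 0 ;;
    *)        return 1 ;;
  esac
}

running="$(uname -r)"
if is_elrepo_kernel "$running"; then
  echo "Running ELRepo kernel: $running"
else
  echo "Still on a non-ELRepo kernel: $running"
fi
```

If the second message appears after a reboot, the system most likely booted the stock 3.10 entry and the GRUB default needs another look.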

Final Words

The impact of this change was tremendous. We went from crashes every few minutes to being able to run 24-hour stable workloads. You may want to compile your own kernel, but the stock CentOS 7.3 install will likely crash before compiling is complete. We found the ELRepo.org method installs quickly enough that we could get onto the new kernel. This at least will allow you to get a setup stable enough to further customize your AMD Ryzen system.

3 COMMENTS

  1. Or just say goodbye to a distribution which lags behind many others, and comes with, for example, a glibc about 8 generations (4 years) behind what is the state-of-the-art.

  2. I have a 1700x running CentOS 7.3.1611 with kernel 3.10.0-514.2.2.el7.x86_64. I have 32 GB of memory, an Asus Prime X370 motherboard, and my root and boot are on an nvme SSD. I have no stability problems. I originally had problems with some memory that wasn’t on Asus’ compatibility list, but no longer. I am seeing 40% faster results on my workloads compared to a Xeon E5-2670 Sandy Bridge.

  3. I bought a Ryzen 7 1700X (with an Asus PRIME B350M-A) just after the release. Once I had the system assembled, I just wanted to get an idea of the performance, so I connected a hard disk with CentOS 6.4 and some working software installed. I knew it would not work perfectly (it is too old to have any support for the new hardware, but previously it somewhat worked on new systems like Intel Broadwell/Skylake and AMD APUs), but it literally would not boot at all (black screen) after the bootloader screen. After that, I tried another hard disk with CentOS 7 installed and it also gave a black screen. I got a new CentOS 7.3 DVD and did a fresh reinstall; it froze at the last step of the installation, and after reboot it got stuck at the GRUB2 command screen.

    At the same time, I tried another Red Hat based distribution (Fedora 25) and it appears to work fine after installation (with a few oopses after boot, but the system works and I got my work done). My software seems to work, with only a strange CPU scheduling problem on the default kernel-4.8.6 as well as the updated 4.9.14: performance was less than optimal when my software used more than 6 cores, and I had to use ‘taskset’ to get maximum performance (around 10% difference). I have since updated to 4.10.3 and it works perfectly (no oops, no scheduler problem). For my work it should be as quick as an 8-core Sandy Bridge E5 at the same frequency when a new version with 256-bit AVX code is used. (For the old version of the software that uses 128-bit SSE2/3 instructions, it has similar core-to-core performance to a Skylake CPU at the same frequency.)
