Stop constant CentOS 7.3 crashes with AMD Ryzen using Kernel 4.10
If you have tried AMD Ryzen with CentOS 7.3 (or earlier), you have likely seen a crash or two, possibly many more. We have multiple systems where the installer crashes, and we have seen hard lockups of running systems. One of the biggest reasons for this is that CentOS uses an older kernel. Our suspicion is that the AMD SMT patches in Kernel 4.10 make it a much more stable option. We recently published "AMD Ryzen with Ubuntu: Here is what you have to do to fix constant crashes!" Now it is time for the CentOS guide.
Examples of CentOS 7 Crashes Running AMD Ryzen
Within a few hours of trying CentOS 7.3 (1611) on AMD Ryzen we were greeted with several types of crashes. One great example is that the CentOS 1611 Everything image continually crashed while running the installer.
Even if you could get CentOS 7.3 installed, there were still issues. We also saw hard lockups on the Ryzen system which required a power cycle of the machine. That is significantly harder on current Ryzen platforms without IPMI.
As with the vanilla Ubuntu releases available as of this publication, the out-of-the-box experience was less than stellar. Note: We are working with the Canonical team to get them hardware in our DemoEval lab to use for a fix to 14.04, 16.04 and 16.10.
The AMD Ryzen plus CentOS 7.3 (1611) fix with Kernel 4.10.1
Since we saw similar issues with Ubuntu, we decided to install the 4.10.1 kernel. To make life easier, we are going to use the ELRepo.org repository to get the kernel we need. Installing Kernel 4.10.1 on CentOS 7.3 (1611) is extremely easy. You can fire up nano and make a script we called kernelupdate.sh. Here is the script that we are using to enable ELRepo.org and install the newer kernel:
echo The current kernel is:
uname -r
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install -y kernel-ml
echo Time to reboot
You can simply run bash kernelupdate.sh, and it will add the ELRepo repository and install the Linux 4.10.1 kernel. Alternatively, you can just copy and paste those commands if you are using a single system.
After that script executes or after you run the commands manually, you can reboot and you should see a 4.10.1 elrepo entry on the GRUB bootloader. You want to ensure you select the 4.10.1 kernel version.
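Once the system is back up, it is worth confirming that it actually booted the newer kernel rather than the stock one. The following is a minimal sketch, not part of our original script: version_at_least is a hypothetical helper name, and it assumes GNU sort with the -V (version sort) option, which CentOS 7 provides.

```shell
#!/usr/bin/env bash
# version_at_least HAVE WANT
# Hypothetical helper: succeeds if version HAVE is the same as or newer than WANT.
# Relies on GNU sort -V to order the two version strings.
version_at_least() {
    local have="$1" want="$2"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

running="$(uname -r)"
if version_at_least "$running" "4.10"; then
    echo "Running kernel $running is 4.10 or newer"
else
    echo "Running kernel $running is older than 4.10; select the elrepo entry in GRUB"
fi
```

If the second message appears, the machine most likely booted the stock CentOS kernel and the GRUB default still needs to be changed.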
After the system reboots, you may need to update GRUB to use the new kernel by default. Here we will open up the config file (/etc/default/grub) and set GRUB_DEFAULT=0, since 0 is the first menu option and our first menu option is the 4.10.1 kernel.
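If you would rather script the change than edit the file by hand, something along these lines should work. This is a sketch under assumptions: set_grub_default is a hypothetical helper name, it relies on GNU sed's -i option, and it is demonstrated on a scratch file rather than the live /etc/default/grub (the usual location of this file on CentOS 7).

```shell
#!/usr/bin/env bash
# set_grub_default FILE INDEX
# Hypothetical helper: set GRUB_DEFAULT in FILE to INDEX,
# appending the line if the file does not define it yet.
set_grub_default() {
    local file="$1" index="$2"
    if grep -q '^GRUB_DEFAULT=' "$file"; then
        sed -i "s/^GRUB_DEFAULT=.*/GRUB_DEFAULT=${index}/" "$file"
    else
        echo "GRUB_DEFAULT=${index}" >> "$file"
    fi
}

# Demonstrate on a scratch file so nothing on the system is modified.
demo="$(mktemp)"
printf 'GRUB_TIMEOUT=5\nGRUB_DEFAULT=saved\n' > "$demo"
set_grub_default "$demo" 0
cat "$demo"
rm -f "$demo"
```

On a real system you would back up the file first and then, as root, call set_grub_default /etc/default/grub 0.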
Now we will want to generate the grub2 configuration file so that this change will persist through reboots.
grub2-mkconfig -o /boot/grub2/grub.cfg
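The path above is the one used on BIOS-booted installs, which is what we tested. On a UEFI install the generated file normally lives elsewhere, so a small check can pick the right target. This is a sketch with assumptions: grub_cfg_path is a hypothetical helper name, and /boot/efi/EFI/centos/grub.cfg is assumed to be the location used by a stock CentOS 7 UEFI install; verify it on your system before writing to it.

```shell
#!/usr/bin/env bash
# grub_cfg_path [EFI_DIR]
# Hypothetical helper: print the grub.cfg path to regenerate.
# The system is treated as UEFI-booted if EFI_DIR exists
# (defaults to /sys/firmware/efi, which the kernel exposes on UEFI boots).
grub_cfg_path() {
    local efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo /boot/efi/EFI/centos/grub.cfg   # assumed stock CentOS 7 UEFI path
    else
        echo /boot/grub2/grub.cfg            # BIOS path used in this guide
    fi
}

# Print the command instead of running it, so this is safe to try without root.
echo "grub2-mkconfig -o $(grub_cfg_path)"
```

The sketch only prints the command; run the printed line as root once you have confirmed the path is correct.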
At this point, you should have a working CentOS 7.3 system updated with the Linux 4.10.1 kernel.
The impact of this change was tremendous. We went from crashes every few minutes to being able to run 24-hour stable workloads. You may want to compile your own kernel, but the stock CentOS 7.3 install will likely crash before compiling is complete. We found the ELRepo.org method installs fast enough that we could get onto the new kernel. At the very least, this gives you a setup stable enough to use while you further customize your AMD Ryzen system.