As with all web systems, at some point, downtime is required. Over the next few days we are doing a series of maintenance upgrades to the underpinnings of STH and the forums. Here is a bit of insight with regard to what we are doing.
First, the previous uptime report
For those wondering, the previous uptime going into today for the forums and the main site stood at just under 173 days. Not the greatest, but certainly decent uptime from the Dell PowerEdge C6100 machines in the colocation facility. For those who have not seen the picture, here is the hardware STH runs on:
The previous downtime was caused by a power outage during routine testing at the datacenter. Oh well! We did learn a key lesson from that experience: remember to mark “Start at Boot” in Proxmox VE, so that guests come back up on their own once the host has power again.
Suffice it to say, other than that hiccup the colocation has been running fairly smoothly. The HP V1910-24G switches have been humming along, and we have not needed to pull any of the spare Dell PowerEdge C6100 power supplies, chassis or nodes into use just yet. (Of course, as I am typing this, something is getting ready to fail.)
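For anyone who would rather script that setting than click through the web interface, here is a minimal sketch. It assumes the guests are KVM virtual machines managed with the Proxmox `qm` tool, and the VM IDs 100 and 101 are made up for illustration. The script only prints the commands, so it is safe to run anywhere; review the output before running it on a node.

```python
import shlex
from typing import List


def onboot_command(vmid: int, enabled: bool = True) -> List[str]:
    """Build the `qm set` command that toggles start-at-boot for one VM."""
    return ["qm", "set", str(vmid), "--onboot", "1" if enabled else "0"]


# Hypothetical VM IDs; substitute the IDs shown in your own Proxmox VE node list.
EXAMPLE_VMIDS = [100, 101]

for vmid in EXAMPLE_VMIDS:
    # Dry run: print the command instead of executing it.
    print(shlex.join(onboot_command(vmid)))
```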
What we are upgrading
A few major components required updating. First we updated WordPress on the main site and played a bit with some of the plugins. Early results are showing slightly faster load times so that is a major positive.
The next big piece was the Proxmox VE cluster, which needed to be updated in its entirety. Since the colocation was planned just before Proxmox VE 2.3 was released, we had been running version 2.2 ever since the colocation installation. Taking the safe route, we went Proxmox VE 2.2 -> 2.3 -> 3.0 -> 3.1. For anyone looking to do the same, we will post a small piece in the next few days. It was very simple, but it did require three reboots per node to get the entire Proxmox cluster onto the latest revision.
The forums were still running CentOS 6.3. We updated to CentOS 6.5 and did a number of major package updates as well.
All told, this involved a total of around 20 reboots. Those 20 reboots yielded total downtime of around 4 minutes each for the forums and the main site. Certainly not too bad at all!
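To show where the three reboots per node come from, here is a small sketch that walks the upgrade path one hop at a time. It assumes one reboot after each hop, which matches our experience but is not a statement about what Proxmox requires in general.

```python
# Upgrade path taken on each node, oldest version first.
UPGRADE_PATH = ["2.2", "2.3", "3.0", "3.1"]


def upgrade_hops(path):
    """Pair each version with the next one in the path."""
    return list(zip(path, path[1:]))


hops = upgrade_hops(UPGRADE_PATH)
for old, new in hops:
    print(f"Proxmox VE {old} -> {new}, then reboot the node")

# Assumption: exactly one reboot per hop.
print(f"Reboots per node: {len(hops)}")
```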
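To put 4 minutes in perspective, here is a rough availability calculation. The 30-day window is an assumption picked purely for illustration, and the figure only means something if the maintenance was the sole downtime in that window.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes: float, window_days: float) -> float:
    """Percentage of the window during which the site was reachable."""
    window_minutes = window_days * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / window_minutes)


# Around 4 minutes of downtime per site (from the post); 30-day window is assumed.
print(f"{availability_percent(4, 30):.3f}% availability over 30 days")
```

A longer window makes the same 4 minutes look even smaller, so treat the number as a sanity check rather than a measured SLA.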
Is this the end?
Unfortunately, no. We just changed our backup scheme for the WordPress site, and the forums will likely follow suit. Our HA setup is also less than ideal. In addition, we are renewing the search for a new admin, as we need to get someone new onboard to help out. If you have any recommendations, they would be appreciated, and there is a forum thread here.
Also, we are seriously considering moving to an Intel Avoton and Rangeley hosting platform. If only a company made a 3-node, 2U solution for web hosting. If you have not yet seen it, here is the Mini Cluster in a Box V2 proof of concept.