Years ago, we had a big decision to make. In 2013, STH had grown to a size that seems tiny next to the traffic we serve today. Still, at that point, we decided that it made fiscal sense to leave Amazon AWS for colocation. We chronicled the reasoning in Falling From the Sky Why STH is Leaving the Cloud and then the cost breakdown in Falling From the Sky Part 3 – Evaluating Amazon EC2, VPS, Dedicated and Colocation Options. Since 2013, we have published irregular updates that largely correspond to planned upgrades of our infrastructure. Since we are looking at a few upgrades now, it is time to go through the exercise again.
If you want to hear this instead of just reading, we have a YouTube video with some commentary here:
Of course, we can go into a bit more detail below, but some prefer to listen rather than read so we have that option.
Grading Estimates from our 2018 5-year Checkpoint
In 2018, we looked again at whether it was time to move back to the cloud. In our cost analysis, this is what we found using 1-year estimates:
Just to give some sense of how that March 2018 estimate has gone in the 32 months since we looked at this, we ended up using:
- Way more bandwidth. STH has grown a ton since 2018. We also focus more on reviews which tend to use more photos. Even with lazy loading images, our bandwidth usage is significantly higher. This did not incur an incremental cost based on how we buy bandwidth. See: Buyer’s Guide to Hosting: Bandwidth, Data Transfer and Billing
- Several of our VMs doubled their memory requirements and we added slightly more storage. We overprovisioned, so this was absorbed with capacity still left over.
- We replaced an SSD, upgraded a firewall, and added another ATS PDU. We have not done long-term infrastructure upgrades.
- We ended up using an upgraded node that we had available, as a “hot spare”. We tend to “self-warranty” hardware, and so we had an extra system/ chassis there.
- We are going to say we used four hours of labor. This includes drive time to the primary data center. Since it is far away (18-20 minute drive), we actually did not go there for several quarters. Over 32 months, we had budgeted $640 for remote hands. Even rounding up to four hours, that works out to $160/hr at most.
Overall, it looks like we probably over-estimated self-hosting costs again, and underestimated AWS costs with respect to how much we would spend there, even after service discounts.
The Late 2020 AWS v. Colocation Update
We wanted to answer the question of what the picture would look like for hosting STH now. A few quick words before we get there on assumptions.
First off, we completely could change the way we run the site. That is a given. Frankly, running in VMs, whether in the cloud or self-hosted, is convenient. Indeed, we run containers in VMs as well. We could also overhaul the entire software stack again, but frankly, we want to spend more time creating content than doing that work. Something we learned was that increasing complexity made us less reliable than keeping things as simple as possible.
Second, we are modeling current data transfer, and a minimal set of VMs. We actually have a lot more running, including some services that we run for some of the labs. One could argue that since they are lab services they are part of bringing you STH, but they are not focused on the web hosting aspect so we are going to remove them. Also, we have other VMs that are likely only online because we wanted to try something and had capacity. We may or may not elect to run those VMs if they carried an incremental AWS cost. We could model these as on-demand or spot instances, but instead, we are just removing them entirely.
Third, we completely understand spot pricing. We are modeling a basic set instead of adding extras. At some point, we need databases, nginx servers, and so forth.
Fourth, we are going to add a mix of AMD EPYC and Intel Xeon instances roughly matching what we use for our hosting. We are heavily weighting the larger instances toward EPYC since that helps bring down the costs and, for our workloads, there is no appreciable difference. We could go Arm, but that requires some small lift and shift work.
Finally, we do use some AWS services. Those services we would use regardless so we are excluding them from the analysis. We are also not modeling services such as Mailchimp which handles our weekly newsletter, Teespring which handles our online merch shop, YouTube which hosts our videos, and so forth.
AWS Cost 1-Year Reserved No Upfront
Here is the calculator for the absolute base setup for our hosting using 1-year reserved no-upfront instances:
As you can see, our hosting costs are just under $4,300 per month.
AWS Cost 1-Year Reserved Partial Upfront
Swapping to 1-year reserved partial upfront on the instances helps bring pricing down a bit, albeit with a $19,512 up-front cost.
When we spread the up-front payment across the year, we get a $4,137.63 monthly cost and a $49,651.56 total cost for the year. We are not discounting here using future values/ present values. There is a big issue with this model, though. We tend to see our servers run for years. To model that, we use 3-year reserved partial upfront.
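As a rough check on those figures, here is a small sketch of the arithmetic. The up-front payment and the monthly figure are the ones quoted above. The recurring monthly charge is not quoted in the text, so the value computed here is only what those two numbers imply, assuming the up-front payment is spread evenly over 12 months with no discounting.

```python
from decimal import Decimal

TERM_MONTHS = 12
upfront = Decimal("19512.00")           # up-front payment quoted above
effective_monthly = Decimal("4137.63")  # monthly cost with the up-front folded in

# Spread the up-front payment evenly across the term (no present-value discounting).
amortized_upfront = upfront / TERM_MONTHS

# Derived, not quoted: the recurring charge implied by the two figures above.
implied_recurring = effective_monthly - amortized_upfront

annual_total = effective_monthly * TERM_MONTHS

print(f"up-front per month: ${amortized_upfront:,.2f}")
print(f"implied recurring:  ${implied_recurring:,.2f}")
print(f"annual total:       ${annual_total:,.2f}")
```

The annual total matches the $49,651.56 stated above, which suggests the monthly figure already includes the amortized up-front payment.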
AWS Cost 3-Year Reserved Partial Upfront
Using the 3-year reserved partial upfront on the instances gives us a much lower operating cost with a larger up-front payment.
First off, the $39,020 is more than we have spent in the last three years on hosting hardware. We do not purchase machines with long warranties or high markups, so if you are buying the average Dell/ HPE/ Lenovo server and think that sounds like a single server, you are trading higher upfront costs for service contracts. Given what we have seen on hardware/ remote hands, it is not a model we are pursuing. On the operating side, we get down to $1,968.51 per month which is great.
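To put the 3-year figures on the same footing as the 1-year ones, the sketch below spreads the $39,020 evenly over 36 months and adds the $1,968.51 monthly charge. The helper name `amortized_monthly` is ours for illustration, and like the rest of this piece it ignores present-value discounting.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def amortized_monthly(upfront: Decimal, recurring: Decimal, term_months: int) -> Decimal:
    """Spread the up-front payment evenly over the term and add the recurring charge."""
    return (recurring + upfront / term_months).quantize(CENT, rounding=ROUND_HALF_UP)


# 3-year reserved partial upfront, figures from the text.
three_year = amortized_monthly(Decimal("39020"), Decimal("1968.51"), 36)

# 1-year reserved partial upfront, quoted earlier with the up-front already included.
one_year = Decimal("4137.63")

monthly_saving = one_year - three_year

print(f"3-year effective monthly: ${three_year:,.2f}")
print(f"saving vs 1-year:         ${monthly_saving:,.2f}")
```

On these assumptions the 3-year term works out to roughly $3,052 per month once the up-front payment is included, so the $1,968.51 operating figure on its own understates the commitment.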
STH 2020 Hosting Budget
Next year, we will likely make two small changes. First, we will upgrade the database nodes, moving from Optane SSDs to Cascade Lake with Optane PMem DIMMs. Second, we will upgrade a few older nodes to AMD EPYC 7002 “Rome” systems. We are testing the Ampere Altra 80-core server right now, and we are at the point where we might consider using Arm in the hosting cluster this year. We are going to increase our hardware budget to $10,000 this year. Although we did not use most of our hardware budget in 2019 or 2020, we expect to in 2021.
For our monthly cost, we increased the figure a bit for inflation. We used an $895/mo budget in 2018. Our costs are effectively flat, but we are going to assume a bit more labor to install servers and upgrade the hardware.
We are budgeting $22,000 per year or around $1,833.33 per month. This is about the same as the $1,968.51 monthly operating cost of EC2’s 3-year partial upfront reserved instances, albeit without the up-front costs.
The one item that skews this substantially is that we are not replacing every node every year. We are now in a very different place than we were when we started this journey. We have existing infrastructure that is frankly fine from a performance and node count standpoint even though we have relatively under-invested over the past 32 months. We had budgeted around $1,687/month for the last two months and spent under $1,000. Still, at some point, we like to replace equipment before it fails.
There is clearly a lot going into this. It has now been just under 8 years since this 10U colocation spot in Las Vegas became our first setup:
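A quick sketch of that comparison, using only the figures quoted in this piece. The three-year self-hosting total assumes the $22,000 budget stays flat for all three years. That is our simplifying assumption for illustration, not a forecast.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
TERM_MONTHS = 36

sth_annual_budget = Decimal("22000")  # self-hosting budget from the text
aws_upfront = Decimal("39020")        # 3-year partial upfront payment
aws_monthly = Decimal("1968.51")      # 3-year partial upfront monthly charge

sth_monthly = (sth_annual_budget / 12).quantize(CENT, rounding=ROUND_HALF_UP)

# Assumption: the self-hosting budget is unchanged for three years.
sth_total = sth_annual_budget * (TERM_MONTHS // 12)
aws_total = aws_upfront + aws_monthly * TERM_MONTHS

print(f"self-hosting per month:  ${sth_monthly:,.2f}")
print(f"self-hosting, 36 months: ${sth_total:,.2f}")
print(f"AWS 3-year, 36 months:   ${aws_total:,.2f}")
print(f"difference:              ${aws_total - sth_total:,.2f}")
```

The monthly numbers look close, but once the up-front payment is counted the 36-month totals diverge by a wide margin under these assumptions.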
What is not reflected in our discussion is all of the lessons learned along the way. Also, as hardware has gotten faster, and memory prices have decreased, the cost of self-hosting has gone down significantly for our applications. We are also taking advantage of falling bandwidth prices. While AWS is absolutely the right choice for many applications, and indeed we use them, for our web hosting it is not what we want for a simple and inexpensive setup. This may not be the perfect analysis, but it is a little bit of how we now look at hosting at STH.