The Supermicro Hyper-Speed server sent to STH for review has been featured several times recently, mostly focusing on features. For those not familiar with the Supermicro Hyper-Speed server line, it allows selected pre-configured components to run faster than their native specifications. For example, the Intel Xeon E5-2687W is the fastest CPU in the Xeon E5-2600 line at the time of this testing. The Supermicro Hyper-Speed line can run these processors 6% faster than other systems. Likewise, while DDR3-1600 may be fast, the Hyper-Speed line runs memory at over 1900MHz, faster than even the DDR3-1866 spec provides for. The key here is that Supermicro picks parts for the Hyper-Speed line to ensure stability. It does so to validate that the systems will work reliably in data center environments.
Supermicro sent the following configuration for our testing. This represents one common configuration for a compute node. Another popular configuration uses dual Intel Xeon E5-2637 CPUs (4C/8T) for applications where one needs high clock speeds and lower core counts due to per-core license costs.
- Intel Xeon E5-2687W @ 3.286GHz base clocks
- Supermicro SYS-6027AX-TRF
- Supermicro X9DAX-iF Motherboard
- Samsung 8GB x 8 DDR3-1600 running at 1978MHz
- Operating Systems: Ubuntu Server 12.04 and RHEL 6.3
- 2x Mellanox ConnectX-2 MHQH19B-XTR 40Gbps QDR InfiniBand cards (added for the STH Transaction Benchmark test)
Overall, this is the fastest dual socket configuration available in terms of today’s processors. Currently the Intel Xeon E5-2600 series tops out with the Intel Xeon E5-2687W at a 3.1GHz base clock and 8 cores / 16 threads each. We used the settings found in the Hyper-Speed BIOS piece to take the chips to a 106MHz BCLK, an increase of 6% over stock.
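As a rough illustration of the clock arithmetic, here is a minimal Python sketch that scales a stock clock linearly with BCLK while the multiplier stays fixed. The function name is ours, and the DDR3-1866 memory ratio is an assumption inferred from the 1978MHz figure, not a setting confirmed by Supermicro.

```python
def scaled_clock_mhz(stock_mhz: float, bclk_mhz: float,
                     stock_bclk_mhz: float = 100.0) -> float:
    """Scale a stock clock linearly with BCLK, holding the multiplier fixed."""
    return stock_mhz * bclk_mhz / stock_bclk_mhz


# Xeon E5-2687W base clock (3.1GHz) at a 106MHz BCLK.
cpu_mhz = scaled_clock_mhz(3100.0, 106.0)

# Assumption: memory runs on the DDR3-1866 ratio, which is what
# lines up with the 1978MHz figure quoted for this system.
mem_mhz = scaled_clock_mhz(1866.0, 106.0)

print(f"CPU base clock: {cpu_mhz / 1000:.3f}GHz")
print(f"Memory speed: {mem_mhz:.0f}MHz")
```

Under these assumptions the CPU lands at 3.286GHz, matching the configuration list above.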
Supermicro Hyper-Speed Benchmarks
For this piece, we use a few benchmarks to see the impact of Supermicro Hyper-Speed overclocking. The unit was tested at stock clocks (100MHz BCLK and DDR3-1600) as well as at 106MHz BCLK and 1978MHz DDR3 speeds. Since the Hyper-Speed line is meant for applications such as High-Frequency Trading and HPC markets, we also ran some of the benchmarks on other reference architectures.
Folding@Home – GROMACS Protein Folding
GROMACS is a widely used molecular dynamics package. Folding@Home is a fairly mature GROMACS application that can run under different operating systems and is easily obtainable. Speed is often measured in Time Per Frame or TPF. A TPF measurement is the amount of time it takes to complete 1% of the work assigned to the node. Therefore we look for the lowest TPF possible. The results here may vary slightly compared to what one sees on a given work unit. We are using the same work unit across configurations to minimize variances. The test does scale well to 64 logical CPUs and beyond.
As one can see, Supermicro’s Hyper-Speed implementation does show a marked improvement over the stock speed dual Intel Xeon E5-2687W configuration. The test scales well with core count and clock speed. Memory speed also has some impact on the results, so this is a solid showing from the Hyper-Speed line. As an interesting note, Folding@Home uses a logarithmic scale to value contributions, so faster return times equal more “points” that donors can earn. Those serious about helping the Stanford project therefore see a large benefit even from results that are a single second faster.
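For readers new to TPF, the sketch below shows how the metric translates into total work unit time and relative throughput. It is a minimal Python illustration: the function names are ours, and the TPF values are hypothetical placeholders, not measurements from this review.

```python
def work_unit_seconds(tpf_seconds: float) -> float:
    """Total run time for a work unit: one TPF covers 1% of the work."""
    return tpf_seconds * 100.0


def speedup_percent(baseline_tpf: float, faster_tpf: float) -> float:
    """Throughput gain implied by a lower TPF, in percent."""
    return (baseline_tpf / faster_tpf - 1.0) * 100.0


# Placeholder TPF values in seconds (hypothetical, not measured results).
stock_tpf = 100.0
hyper_speed_tpf = 94.0

print(f"Stock work unit time: {work_unit_seconds(stock_tpf) / 3600:.2f} hours")
print(f"Throughput gain: {speedup_percent(stock_tpf, hyper_speed_tpf):.1f}%")
```

Because a TPF is a time, lower is better, and the throughput gain is the ratio of the old TPF to the new one rather than the difference in seconds.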
Chess benchmarks are used as a proxy for complex simulations that calculate probabilities. The Crafty Chess benchmark has become one of several benchmarks that currently define the space.
Here we see that the Hyper-Speed system does show a consistent performance gain over the stock clocked part.
Stream is a benchmark that needs virtually no introduction. It is considered by many to be the de facto memory performance benchmark. Authored by John D. McCalpin, Ph.D., it can be found at http://www.cs.virginia.edu/stream/ and is very easy to use.
At first, I was in total disbelief, even after running the four tests ten times on each configuration, throwing out the min/max and averaging the rest. The Supermicro Hyper-Speed server has a huge advantage in this test due to a few factors, not the least of which is the massively improved memory clock rate. Stock Xeon E5-2600 series CPUs run DDR3 memory at 1600MHz if the memory is capable. The Supermicro Hyper-Speed platform runs the memory at 1978MHz. The result shows in the graph above.
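The averaging method described above (ten runs, drop the minimum and maximum, average the rest) can be sketched in a few lines of Python. The function name and the sample values are ours and purely illustrative. They are not Stream results from this system.

```python
def trimmed_mean(samples: list) -> float:
    """Average the samples after discarding one minimum and one maximum."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to drop min and max")
    ordered = sorted(samples)
    kept = ordered[1:-1]  # drop the lowest and highest run
    return sum(kept) / len(kept)


# Ten placeholder bandwidth readings in MB/s (hypothetical values).
runs = [70100.0, 70350.0, 69900.0, 70200.0, 64000.0,
        70250.0, 70150.0, 75000.0, 70300.0, 70050.0]

print(f"Trimmed mean over {len(runs)} runs: {trimmed_mean(runs):.1f} MB/s")
```

Dropping the extremes keeps a single unusually slow or fast run from skewing the reported figure.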
Linpack is probably the most well known HPC benchmark. It is used to rate the Top500 supercomputers in the world.
I will note that these are not the bleeding-edge tuned performance numbers that most vendors use. They are more representative of an out-of-the-box OS and Linpack compilation. One can see a significant improvement, which is important for those looking for maximum performance.
STH JMeter Transaction Benchmark
High-frequency trading algorithms are fairly difficult to get a hold of since they represent very valuable IP. What we were able to do is run a JMeter-based test against a local Silicon Valley start-up’s solution. We tend not to use proprietary benchmarks like this, but the simple explanation for what is going on is that the machine works with a source server holding a 220GB data set in RAM and a smaller ~30GB local data set, and the algorithm attempts to predict and match requests based on these data sources. Unfortunately, the test chokes on 1GbE and 10GbE, so we had to use on-hand Mellanox ConnectX-2 MHQH19B-XTR 40Gbps QDR InfiniBand cards to string together a high-performance 40Gbps network to run the tests. As their offering matures, hopefully we can feature their public solution later in 2013. Instead of using raw numbers, we use the dual Intel Xeon X5670 as a baseline and look at transactions serviced per second relative to that baseline.
This is an interesting result. It does fall in line with what we would expect given the performance figures mentioned above. For those wondering, this was similar to what we saw on these machines with Apache Bench and with our new JMeter-based Apache/PHP transaction benchmark, but we wanted to verify using something closer to the proprietary solutions used in the field.
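For clarity on how the baseline-relative figures are derived, here is a minimal Python sketch of the normalization. The function name is ours, and the transactions-per-second values are hypothetical placeholders, since the raw numbers are not published here.

```python
def relative_to_baseline(tps_by_system: dict, baseline: str) -> dict:
    """Express each system's transactions per second as a multiple of the baseline."""
    base = tps_by_system[baseline]
    if base <= 0:
        raise ValueError("baseline throughput must be positive")
    return {name: tps / base for name, tps in tps_by_system.items()}


# Placeholder throughput values (hypothetical, not measured results).
sample_tps = {
    "Dual Xeon X5670": 1000.0,
    "Dual Xeon E5-2687W (stock)": 1500.0,
    "Dual Xeon E5-2687W (106MHz BCLK)": 1600.0,
}

for name, ratio in relative_to_baseline(sample_tps, "Dual Xeon X5670").items():
    print(f"{name}: {ratio:.2f}x baseline")
```

Reporting ratios this way keeps the start-up's raw throughput private while still showing how the configurations compare.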
The performance speaks for itself. Performance is simply faster than one can get in competing systems. For years, performance differences in servers with the same memory, storage, and CPUs were less than 1%. The Supermicro Hyper-Speed line changes this, producing tangible performance boosts for those looking for every last bit of reliable performance. Although this piece did not delve into power consumption, heat and stability, the Folding@Home tests were run three times at each of stock speeds, 104MHz BCLK, 105MHz BCLK and 106MHz BCLK (the numbers used above). Never did voltage need to be increased for the multi-day benchmarking sessions that relentlessly pegged the twin CPUs at 100%.