Understanding the Green SPECpower Benchmark

In the middle of this year, the Standard Performance Evaluation Corporation (SPEC) released the final version of the SPECpower_ssj2008 benchmark, a new suite of tests that distills power and performance into a single number. SPEC is a vendor-neutral industry body with a long tradition of producing impartial benchmarks and strict compliance rules that prevent vendors from tuning their systems to inflate or otherwise game their results. Over the years, its benchmarks have become widely respected as true gauges of what they measure. As I mentioned in my July column, Virtualization Servers: The New Green Platform for IT, SPEC has yet to come out with a benchmark for certain kinds of servers, such as virtualization servers. But for the generic server, SPECpower (as it's called for short) is the one benchmark that combines power and performance and distills them into a single metric.

Before embracing it, though, it's important to understand exactly what SPECpower measures and how applicable it is to your situation. The heart of the benchmark consists of running server-side Java programs (these are alluded to by the "ssj" in the benchmark's full name) and determining the peak throughput capacity of the system under test. That peak capacity is treated as a 100% workload. The benchmark then runs the ssj code at 10 different loads (100%, 90%, 80%, and so on down to 10%) and logs the watts consumed by the server at each load level. It then takes the average throughput across those load levels and divides it by the average watts used across them. The resulting number is the final benchmark result. For recent dual-processor, quad-core servers, this number is in the range of 800-1000, indicating that the server under test can perform, on average, 800 to 1000 ssj operations per watt of electricity consumed. Figure 1 shows the typical way in which the results are reported and computed by SPEC.

Figure 1. The measurements that form the basis of SPECpower_ssj2008, shown in tabular and graphical form, for the IBM x3350 server. (Courtesy of SPEC)
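To make the arithmetic concrete, here is a minimal Python sketch of the calculation as described above: average throughput across the load levels divided by average power. The server figures (200,000 ssj_ops at peak, 150 W at idle, 250 W at full load, power rising linearly in between) are invented for illustration and do not come from Figure 1 or any SPEC report. The function name is likewise hypothetical. As I understand SPEC's published reports, they also fold an active-idle power reading into the total; that is left out here to match the description in the text.

```python
def overall_ops_per_watt(ops_by_load, watts_by_load):
    """Average throughput across load levels divided by average power."""
    if not ops_by_load or len(ops_by_load) != len(watts_by_load):
        raise ValueError("need one power reading per load level")
    avg_ops = sum(ops_by_load) / len(ops_by_load)
    avg_watts = sum(watts_by_load) / len(watts_by_load)
    return avg_ops / avg_watts


# Hypothetical server, NOT taken from any SPEC report:
# peak throughput of 200,000 ssj_ops, 150 W at idle, 250 W at full load,
# with power rising linearly in between (an assumed power model).
PEAK_OPS = 200_000
load_levels = list(range(100, 0, -10))  # 100%, 90%, ..., 10%

ops = [PEAK_OPS * pct // 100 for pct in load_levels]
watts = [150 + pct for pct in load_levels]

score = overall_ops_per_watt(ops, watts)
per_level = [o / w for o, w in zip(ops, watts)]

print(f"overall: {score:.1f} ssj_ops/watt")
print(f"at 100% load: {per_level[0]:.1f}, at 10% load: {per_level[-1]:.1f}")
```

With these made-up numbers the overall figure works out to about 537 ssj_ops/watt, while the per-level efficiency ranges from 125 at 10% load to 800 at 100% load. The single score hides a wide spread.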

Results expressed in operations/watt are a measure of power efficiency, rather than power consumption. So, the first point about this benchmark is that it does not tell you how much power the server under test will consume (unless you look up the report and examine the base data, as shown in Figure 1). All it will tell you is how efficient the system is.

And even then, the efficiency might not match what you experience. The table in Figure 1 shows that the benchmark is an average of the performance at each of 10 workload levels. What this means is that the model assumes an equal amount of time is spent at each workload level, a scenario that of course will never occur at your site. Depending on your business, you could be running at greater than 50% 24 hours a day, or at 60% for eight hours a day and nearly 0% when there's no one on site. As you can see from the table and diagram, power efficiency increases as workload goes up. Consequently, of the two scenarios I just presented, the first will see better actual power efficiency than the benchmark suggests, and the second will see worse. (As a side note: The increased efficiency at higher workloads validates the thinking behind the strategy of server consolidation via virtualization. Namely, that it's better to have a few servers working at higher workloads than it is to have many working at lower levels.)
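The effect of the equal-time assumption can be sketched by weighting each load level by the hours actually spent there. The Python below is an illustration only: the server model (200,000 ssj_ops at peak, 150 W idle, 250 W at full load, linear in between) is invented, and the function names are hypothetical. The two usage profiles mirror the scenarios just described, with 70% standing in for "greater than 50%".

```python
# Hypothetical server model, NOT from any SPEC report.
PEAK_OPS = 200_000   # ssj_ops at 100% load (assumed)
IDLE_WATTS = 150     # power draw at 0% load (assumed)
FULL_WATTS = 250     # power draw at 100% load (assumed)


def ops_rate(load_pct):
    """Throughput at a given load, assuming it scales linearly."""
    return PEAK_OPS * load_pct / 100


def watts_at(load_pct):
    """Power draw at a given load, assuming a linear power curve."""
    return IDLE_WATTS + (FULL_WATTS - IDLE_WATTS) * load_pct / 100


def profile_ops_per_watt(hours_by_load):
    """Time-weighted efficiency for a {load_pct: hours} usage profile.

    The hours cancel out, so the result is in the same units as the
    benchmark score (ssj_ops per watt).
    """
    total_ops = sum(ops_rate(pct) * h for pct, h in hours_by_load.items())
    total_energy = sum(watts_at(pct) * h for pct, h in hours_by_load.items())
    return total_ops / total_energy


# Equal time at 10%, 20%, ..., 100%: the benchmark's implicit assumption.
benchmark_like = profile_ops_per_watt({pct: 1 for pct in range(10, 101, 10)})
# Scenario 1: busy around the clock.
busy_all_day = profile_ops_per_watt({70: 24})
# Scenario 2: 60% for eight hours, idle for the remaining sixteen.
office_hours = profile_ops_per_watt({60: 8, 0: 16})

print(f"equal time at each level: {benchmark_like:.1f}")
print(f"70% around the clock:     {busy_all_day:.1f}")
print(f"60% for 8 h, idle 16 h:   {office_hours:.1f}")
```

Under these assumed numbers the equal-time profile yields roughly 537 ssj_ops/watt, the always-busy server about 636, and the office-hours server about 235. The ordering, not the specific values, is the point.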

If you've bought a car in the US during the last few decades, you know that the EPA mileage posted on the car's sticker is almost always higher than the mileage you actually get. To avoid a similar disappointment with the SPECpower benchmark, it's worth noting several more details. First, while the server-side tests are Java-based, they do not use the de facto Java server framework, Java EE. In fact, they don't use databases either. So, in essence, this test measures server performance for CPUs, RAM, and network I/O and no other aspect of hardware. On servers that are hooked to a spindle farm, this might be good enough (even though the tested operations are probably not similar to yours); however, if you have local storage on your server, you definitely will experience lower power efficiency.

A final point, one the benchmark does not measure at all, is the heat generated at the various workloads. While loading a server to 100% does good things for the power efficiency of the system as a computing device, what are the costs of cooling that hot monster? Each higher level of workload brings a corresponding increase in heat that datacenter resources must dissipate. Hence, to arrive at a complete measure of performance per total power consumed, you have to factor in cooling numbers of your own alongside the SPECpower results.
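One rough way to fold cooling into the picture is to add the cooling power attributed to the server at each load level to the denominator. The sketch below assumes you can estimate those cooling watts yourself; every number in it is invented for illustration, and the function name is hypothetical.

```python
def ops_per_total_watt(ops_by_load, server_watts_by_load, cooling_watts_by_load):
    """Throughput per watt, counting both server and cooling power."""
    n = len(ops_by_load)
    if n == 0 or n != len(server_watts_by_load) or n != len(cooling_watts_by_load):
        raise ValueError("need matching readings for every load level")
    total_watts = sum(server_watts_by_load) + sum(cooling_watts_by_load)
    return sum(ops_by_load) / total_watts


# Hypothetical readings at two load levels (100% and 50%).
ops = [200_000, 100_000]   # ssj_ops (assumed)
server_watts = [250, 200]  # server power draw (assumed)
cooling_watts = [125, 80]  # your own cooling estimate (assumed)

server_only = sum(ops) / sum(server_watts)
with_cooling = ops_per_total_watt(ops, server_watts, cooling_watts)

print(f"server only:  {server_only:.1f} ssj_ops/watt")
print(f"with cooling: {with_cooling:.1f} ssj_ops/watt")
```

With these made-up figures, efficiency drops from about 667 to about 458 ssj_ops/watt once cooling is counted. How much it drops at your site depends entirely on the cooling estimates you supply.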

Despite its shortcomings and the narrow dimension that it assesses, SPECpower_ssj2008 is a new and important benchmark -- as long as it's used correctly.
