QTestLib, benchmarks and iterations

Improving the accuracy of benchmarks with the minimum total option

QTestLib comes with quite useful benchmarking functionality. All it takes to turn an existing unit test into a benchmark is to wrap the relevant code with the QBENCHMARK macro.

Benchmarks are often influenced by warm-up biases and jitter, which leads to varying results when the same benchmark is repeated. The -iterations option specifies how many times a benchmark should be executed, with the results aggregated. The displayed result is the average of the benchmark runs, which reduces that variance. Some benchmarks, however, cannot be repeated like that, because they do not execute the same operations when repeated. To control the variance of benchmark results in such situations, the -minimumtotal option was added to QTestLib benchmarks.

Consider this example, taken from the QueueBenchmark of ThreadWeaver. It cannot use multiple iterations:

void QueueBenchmarksTest::IndividualJobsBenchmark()
{
    //...
    ThreadWeaver::Weaver weaver;
    //...
    // Suspend the queue so that the jobs pile up without being executed yet.
    weaver.suspend();
    AccumulateJob jobs[n];
    for (int i = 0; i < n; ++i) {
        jobs[i].setCount(m);
        weaver.enqueue(&jobs[i]);
    }

    QBENCHMARK {
        weaver.resume();
        weaver.finish();
    }
}

The benchmark queues up a number of jobs, and then measures the time it takes to execute them in the thread pool. With multiple iterations, the body of QBENCHMARK would be executed once per iteration, leading to useless results: the first iteration would execute the jobs as expected, whereas the following ones would find the queue empty and skip over the benchmark almost immediately. The QBENCHMARK_ONCE macro should be used to mark benchmarks that can only be executed once:

void QueueBenchmarksTest::IndividualJobsBenchmark()
{
    //...

    QBENCHMARK_ONCE {
        weaver.resume();
        weaver.finish();
    }
}
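To make the problem with iterations concrete, here is a minimal simulation in plain C++. It does not use QTestLib or ThreadWeaver. The names averageOverIterations and simulateDrainingBenchmark are hypothetical, and the number of processed jobs stands in for elapsed time so that the outcome is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>

// Hypothetical stand-in for an iteration-averaging harness (not QTestLib code):
// runs `body` `iterations` times and returns the mean of its measurements.
double averageOverIterations(const std::function<std::size_t()> &body, int iterations)
{
    assert(iterations > 0);
    std::size_t total = 0;
    for (int i = 0; i < iterations; ++i)
        total += body();
    return static_cast<double>(total) / iterations;
}

// Simulates the example above: jobs are queued once, outside the measured body.
// The "measurement" is the number of jobs processed, standing in for elapsed time.
double simulateDrainingBenchmark(std::size_t jobCount, int iterations)
{
    std::queue<int> pending;
    for (std::size_t i = 0; i < jobCount; ++i)
        pending.push(static_cast<int>(i));

    auto body = [&pending]() {
        std::size_t processed = 0;
        while (!pending.empty()) {   // only the first iteration finds any work
            pending.pop();
            ++processed;
        }
        return processed;
    };
    return averageOverIterations(body, iterations);
}
```

With 100 queued jobs, a single iteration reports 100, while ten iterations report an average of 10, because nine of them measure an empty queue.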

Short-running benchmarks often display a larger variance of benchmark results, due to problems of measurement granularity and the larger impact of external influences such as a single system call. Short-running benchmarks that can be executed only once are the most vulnerable to result variance. Repeating the benchmark multiple times and displaying the median result will still reduce the variance, since a single strongly deviating result is less likely to be picked. Instead of using iterations (repeating the QBENCHMARK block and averaging the results), the whole benchmark can be repeated until a certain threshold of total execution time is reached. This is the behaviour of the -minimumtotal command line option. The total is specified in units of the selected measurement, like CPU ticks or walltime milliseconds.
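The repeat-until-threshold idea can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions, not QTestLib's actual implementation: medianUntilMinimumTotal is a hypothetical helper, runOnce is assumed to perform the complete benchmark (setup plus measured part) and return one measurement, and for an even number of runs the upper of the two middle values is taken as the median.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not QTestLib API): repeats a whole benchmark run until
// the accumulated measurements reach `minimumTotal`, then returns the median.
// `runOnce` returns one measurement in the selected unit, for example
// walltime milliseconds.
double medianUntilMinimumTotal(const std::function<double()> &runOnce, double minimumTotal)
{
    std::vector<double> results;
    double total = 0.0;
    do {
        const double measured = runOnce();
        assert(measured > 0.0);   // a zero measurement could loop forever
        results.push_back(measured);
        total += measured;
    } while (total < minimumTotal);

    // Median: the middle element, or the upper of the two middle elements
    // when the number of runs is even (an assumption of this sketch).
    const auto middle = results.begin() + static_cast<std::ptrdiff_t>(results.size() / 2);
    std::nth_element(results.begin(), middle, results.end());
    return *middle;
}
```

Because the loop always runs at least once, a benchmark whose first measurement already exceeds the threshold is executed exactly one time, and an outlier among several short runs does not end up as the reported value.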

-minimumtotal affects short-running tests more than longer-running ones. When using iterations to reduce variance, the total execution time increases linearly with the number of iterations, since all benchmarks are performed with the same number of iterations. With -minimumtotal, long-running benchmarks that exceed the specified threshold in one execution will still be executed only once.
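A rough estimate of how often a benchmark is repeated under a given threshold can be written down as follows. The function runsToReachMinimumTotal is hypothetical and assumes, for simplicity, that every run yields the same measurement, which real benchmarks will not do exactly.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical estimate (not part of QTestLib): number of whole-benchmark runs
// needed to reach `minimumTotal`, assuming every run measures `singleRunResult`
// in the same unit as the threshold.
long runsToReachMinimumTotal(double singleRunResult, double minimumTotal)
{
    assert(singleRunResult > 0.0);
    const long runs = static_cast<long>(std::ceil(minimumTotal / singleRunResult));
    return runs < 1 ? 1 : runs;   // a benchmark is always executed at least once
}
```

With illustrative numbers, a benchmark measuring 200 units under a threshold of 4000 would be repeated 20 times, while one measuring 10000 units would run only once. The cost of the option therefore falls on the short benchmarks that need the repetition.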

Choosing the best value for -minimumtotal is done based on experience and the desired level of accuracy. Choosing a multiple of the result of the longest-running benchmark in a test suite that is still affected by variance is a good approach. In the following example, the IndividualJobsBenchmark:64 threads, 1 values data tag of ThreadWeaver’s QueueBenchmark is used (the example code above is based on the same benchmark). This benchmark is rather sensitive to variance, since it combines very short execution time of the individual jobs with a large number of threads, leading to many system calls and massive lock contention. The table displays the variance of 10 benchmark runs, once with a single iteration, and once with -minimumtotal 4000. 4000 is about 20 times the expected result of this benchmark, and this individual benchmark is the shortest-running one in the 64 threads benchmark suite.

[Image: minimumtotal-stats, table of result variance over 10 benchmark runs]

The reduction in variance in this example is significant. It shows that the -minimumtotal approach can lead to improvements in accuracy similar to those of multiple iterations, and that it is applicable to benchmarks where iterations cannot be used. The feature has been submitted to Qt and approved by the QTestLib maintainer. It will hopefully end up in the next Qt release.

