std::sort in C
As some of you might already know through my Twitter, I’m currently attempting to build an advanced standard library for C, with similar performance and usage characteristics as C++’s STL.
It’s going really well so far, but the sort algorithm of std::sort
is giving me headaches.
In theory, it’s a simple hybrid sort (since the C++11 standard requires a complexity of O(N log N)
in all cases, in comparison to C++03, which only required average case performance of O(N log N)
). In case of the libstdc++
, it’s a hybrid-3 sorting algorithm, consisting of introsort (which itself is a hybrid of quicksort and heapsort), up to a defined maximum depth (in case of libstdc++
it’s log2(N) * 2
), followed by an insertionsort.
The whole source code can be found on Github, here is an excerpt of the README
containing the performance benchmarks:
Performance
Analysis
If compiled without any compiler optimizations, that is no -O2
or -O3
, the algorithm in C performs better, compared to std::sort
.
If, however, compiled with optmizations turned on (-O2
or -O3
), the std::sort
implementation performs better. I have yet to figure out why.
Comparison
Tests are done on my GNU/Linux VM (running Debian 8).
Host is an Intel Core i7-5930K @ 3.5GHz / 32GiB RAM.
Guest has 4 Cores, 8GiB RAM.
The test consists of sorting a randomly generated array of 1 million integers.
Compiler | Optimizations | Time |
---|---|---|
gcc 4.9.2 | -O0 -march=native | ~180ms |
gcc 4.9.2 | -O3 -march=native | ~72ms |
g++ 4.9.2 | -O0 -march=native | ~300ms |
g++ 4.9.2 | -O3 -march=native | ~58ms |
clang 3.5 | -O0 -march=native | ~170ms |
clang 3.5 | -O3 -march=native | ~65ms |
clang++ 3.5 | -O0 -march=native | ~360ms |
clang++ 3.5 | -O3 -march=native | ~56ms |
Tests compiled with gcc
or clang
are testing the C
implementation.
Tests compiled with g++
or clang++
are testing the C++
implementation.
The Makefile
contains all you need to compile the tests yourself, and see what you got.
Disclaimer
I wrote this in about 2 hours. Don’t expect it to be either production ready code, nor bugfree, nor actually optimized to be used in another scenario than what I developed it for.