std::sort in C
As some of you might already know through my Twitter, I’m currently attempting to build an advanced standard library for C, with similar performance and usage characteristics as C++’s STL.
It’s going really well so far, but the sort algorithm of std::sort is giving me headaches.
In theory, it’s a simple hybrid sort (since the C++11 standard requires a complexity of O(N log N) in all cases, in comparison to C++03, which only required average case performance of O(N log N)). In case of the libstdc++, it’s a hybrid-3 sorting algorithm, consisting of introsort (which itself is a hybrid of quicksort and heapsort), up to a defined maximum depth (in case of libstdc++ it’s log2(N) * 2), followed by an insertionsort.
The whole source code can be found on Github, here is an excerpt of the README containing the performance benchmarks:
Performance
Analysis
If compiled without any compiler optimizations, that is no -O2 or -O3, the algorithm in C performs better, compared to std::sort.
If, however, compiled with optmizations turned on (-O2 or -O3), the std::sort implementation performs better. I have yet to figure out why.
Comparison
Tests are done on my GNU/Linux VM (running Debian 8).
Host is an Intel Core i7-5930K @ 3.5GHz / 32GiB RAM.
Guest has 4 Cores, 8GiB RAM.
The test consists of sorting a randomly generated array of 1 million integers.
| Compiler | Optimizations | Time |
|---|---|---|
| gcc 4.9.2 | -O0 -march=native | ~180ms |
| gcc 4.9.2 | -O3 -march=native | ~72ms |
| g++ 4.9.2 | -O0 -march=native | ~300ms |
| g++ 4.9.2 | -O3 -march=native | ~58ms |
| clang 3.5 | -O0 -march=native | ~170ms |
| clang 3.5 | -O3 -march=native | ~65ms |
| clang++ 3.5 | -O0 -march=native | ~360ms |
| clang++ 3.5 | -O3 -march=native | ~56ms |
Tests compiled with gcc or clang are testing the C implementation.
Tests compiled with g++ or clang++ are testing the C++ implementation.
The Makefile contains all you need to compile the tests yourself, and see what you got.
Disclaimer
I wrote this in about 2 hours. Don’t expect it to be either production ready code, nor bugfree, nor actually optimized to be used in another scenario than what I developed it for.