std::sort in C

As some of you might already know through my Twitter, I'm currently attempting to build an advanced standard library for C, with similar performance and usage characteristics as C++'s STL.
It's going really well so far, but the sort algorithm of std::sort is giving me headaches.

In theory, it's a simple hybrid sort (since the C++11 standard requires a complexity of O(N log N) in all cases, in comparison to C++03, which only required average case performance of O(N log N)). In case of the libstdc++, it's a hybrid-3 sorting algorithm, consisting of introsort (which itself is a hybrid of quicksort and heapsort), up to a defined maximum depth (in case of libstdc++ it's log2(N) * 2), followed by an insertionsort.

The whole source code can be found on Github1, here is an excerpt of the README containing the performance benchmarks:

Performance

Analysis

If compiled without any compiler optimizations, that is no -O2 or -O3, the algorithm in C performs better, compared to std::sort.

If, however, compiled with optmizations turned on (-O2 or -O3), the std::sort implementation performs better. I have yet to figure out why.

Comparison

Tests are done on my GNU/Linux VM (running Debian 8).
Host is an Intel Core i7-5930K @ 3.5GHz / 32GiB RAM.
Guest has 4 Cores, 8GiB RAM.

The test consists of sorting a randomly generated array of 1 million integers.

Compiler Optimizations Time
gcc 4.9.2 -O0 -march=native ~180ms
gcc 4.9.2 -O3 -march=native ~72ms
g++ 4.9.2 -O0 -march=native ~300ms
g++ 4.9.2 -O3 -march=native ~58ms
clang 3.5 -O0 -march=native ~170ms
clang 3.5 -O3 -march=native ~65ms
clang++ 3.5 -O0 -march=native ~360ms
clang++ 3.5 -O3 -march=native ~56ms

Tests compiled with gcc or clang are testing the C implementation.
Tests compiled with g++ or clang++ are testing the C++ implementation.
The Makefile contains all you need to compile the tests yourself, and see what you got.

Disclaimer

I wrote this in about 2 hours. Don't expect it to be either production ready code, nor bugfree, nor actually optimized to be used in another scenario than what I developed it for.