C Wishlist: Inline Functions

This is the first piece of a series of posts describing a few, very easily solvable, shortcomings of the C language. And proposing a way to fix these.
This is purely done the fact that I'm currently working on a C compiler. While I've written a few compilers over the years, none was actually intended to be used in production. This time it's different!

I'm currently rewriting the expression parser of my compiler, to use the power of top down operator precedence parsing as described by Vaughan Pratt in his paper of the same name.

I'm defining token classes with structures, like this:

struct tok_class tok_add = { 10, add_nud, add_led };

when it glared me, that this could've been done a lot simpler, by declaring the function pointers inline:

struct tok_class tok_add = {
    .lbp = 10,
    .nud = ^int64_t(struct tok_info *t) {
        /* do work. */
        return 0;
    .led = ^int64_t(struct tok_info *t) {
        /* do work. */
        return 0;

I have chosen this syntax with a reason, since it resembles the already existing syntax for blocks. Which already are a C language extension, implemented in clang.

Unfortunately (or fortunately) blocks work a little different, they work like a std::function in C++. This would merely be syntactic sugar for a static function declaration of which the address is assigned to the function pointer.

Questions, criticism or just want to say hi? You can find me on Twitter @ArvidGerstmann.


SIMD Instruction Format

Did you know, that Intel has a naming scheme for their SIMD instructions? Neither did I.

I found, by watching a CppCon talk by Tim Haines, that
SIMD instruction have a pre-defined scheme which can be used to easily decode the, sometimes, cryptic mnemonics.
To much of my surprise, it's not mentioned nor explained in any of Intel's manuals.


An instruction is composed of these parts:


PREFIX (Optional): Whether the instruction is AVX or SSE. Possible values: A[V]X, or left out

OPCODE: The operation to perform. Possible values: Any basic arithmetic instruction (e.g ADD/SUB/MUL), MOV

ALIGNMENT (Optional): The alignment requirements of the data, only applicable to a few instructions. Possible values: [A]ligned, [U]naligned

PACKING: Whether it's a packed operation or operation on a single scalar. Possible values: [P]acked, [S]calar

PRECISION: Whether it operates on single or double precision floats. Possible values: [S]ingle-Precision float, [D]ouble-Precision float


MOVAPD: [MOV][A][P][D]. [MOV]e [A]ligned [P]acked [D]ouble-Precision float

SUBSS: [SUB][S][S]. [SUB]tract [S]calar [S]ingle-Precision float

VMOVSS: [V][MOV][S][S]. A[V]X [MOV]e [S]calar [S]ingle-Precision float


This does only apply to a few basic instructions, namely the basic arithmetic and MOV instructions.

Many instructions, added later in SSE2/3/4 are very specialized, which can be applied vertically or horizontally, use non-temporal stores, etc.


Feedback, criticism? Tweet me at @ArvidGerstmann.


Test if a variable is unavailable in GDB

GDB on macOS can't display a few of the segment registers, which results in that the respective variables are not available for use. Trying to use them will only print $var is not available and skip the rest of any currently running function.

Merely executing this line

printf " %04X  ", $ds

without $ds being available would stop executing the whole function.

To lazy to read and just want the solution? Skip to the bottom.

This posed an issue, since I have a function for displaying all current registers, the stack and the current plus the 5 next instructions, which would stop executing in the middle due to the not available register. For a while now I've simply commented the part of the script out, due to there being no obvious way of testing if a variable is available. As soon as you try to access it, you get the error.

This was frustrating and I wanted a solution, so I started investigating.

After googling and going through the docs for a few minutes, I noticed I was in for a fun ride. The only relevant source I found was an unanswered question on reverseengineering.stackexchange.com (which, as a result of this, could answer).

A little more googling revealed a question, and answer, on GDB's mailing list on how to ignore errors in user defined commands, this was making clever use of GDB's python API. The first solution was found.
I copied the ignore-errors function into a python script, sourced it in my .gdbinit and replaced all printf's with

ignore-errors printf " %04X  ", $ds

It worked! Hooray!

But it was ugly, the printf wasn't executed at all, leaving empty spaces. So I decided to search for a better solution.

I know knew you can catch errors in a python script, so I thought there must also be a way to inspect the variable in python, without GDB throwing an error? I wrote a little function to test my theory:

class IsValid (gdb.Function):
    def __init__ (self):
        super (IsValid, self).__init__("isvalid")

    def invoke (self, var):
        print "var: ", var
        return 0

IsValid ()

It printed var: <unavailable>! The same you would get in GDB, but without any errors. I was on the right track.
But since I haven't written any python in a few years, and was therefore a bit rusty and I had to figure out how to get the value print is getting from the variable, without throwing an error. The obvious gdb.Variable.string() function was throwing errors on me if I tried to access the value through it. After a little of of try and error I figure out __str__() would get me where I wanted to.

Resulting in this final function:

class IsValid (gdb.Function):
    def __init__ (self):
        super (IsValid, self).__init__("isvalid")

    def invoke (self, var):
        if var.__str__() == "<unavailable>":
            return 0
            return 1

IsValid ()

Now I could replace the printf lines with this and it would work absolutely lovely:

if ($isvalid($ds))
    printf " %04X  ", $ds
    printf " ----  "

One last test, and ... Hooray! It worked.

You can get my fixed .gdbinit from my dotfiles repository on GitHub.


Minimal Makefile, Maximum Outcome

Makefiles are flexible in many ways, it's a small language in itself.
As I said in a post before, I normally write the same Makefile over and over, even just for compiling a two or more files. The proposed Makefile there, however, grew to big while writing the post, since I wanted it to be small, but still have a few convinient features.

Today I want to present you a variant which is tiny, yet compiles your small to medium projects just fine. I'll make use of some of makes every convenient implicit rules.

The implicit rules of make for compiling a C file look like this:


%.o: %.c

%: %.o
	$(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -o $@

Making use of these builtin rules, a basic Makefile can be as simple as:
(make sure the executable target contains at least one object file with the same name, or the executale won't be linked)

CC    = gcc
SRC   = $(wildcard *.c)
OBJ   = $(SRC:.c=.o)

.PHONY: clean

foo: foo.o

	rm -f $(OBJ) foo

While this works, and is a variant of the Makefile I write for every tiny project, it does not yet track dependencies on headers, or other include files, requiring you to manually clean the project and recompile.
Modern compilers (gcc > 4, clang > 3.0 and ICC > 12.1) support the options -MMD and -MP, which will generate dependency files (without using sed, like before, with only -MM).
Neither does it respect the environment variable CC, nor generate any debug files.

After applying these changes, this is the resulting Makefile:

CC				= gcc
DEBUG			= -ggdb -O0 -march=native
CFLAGS			:= $(DEBUG) -W -Wall -Wextra -Wpedantic -pedantic -ansi
LDLIBS			:= -lm

SRC				:= $(wildcard *.c)
OBJ				:= $(SRC:.c=.o)
DEP				:= $(SRC:.c=.d)
-include $(DEP)

.PHONY: clean

all: test1
test1: test1.o

	-rm -f $(OBJ) $(DEP) test1

It now correctly tracks dependencies, respects the CC environment variable, generates gdb debugging symbols, and has some default compiler / warning flags.


std::sort in C

As some of you might already know through my Twitter, I'm currently attempting to build an advanced standard library for C, with similar performance and usage characteristics as C++'s STL.
It's going really well so far, but the sort algorithm of std::sort is giving me headaches.

In theory, it's a simple hybrid sort (since the C++11 standard requires a complexity of O(N log N) in all cases, in comparison to C++03, which only required average case performance of O(N log N)). In case of the libstdc++, it's a hybrid-3 sorting algorithm, consisting of introsort (which itself is a hybrid of quicksort and heapsort), up to a defined maximum depth (in case of libstdc++ it's log2(N) * 2), followed by an insertionsort.

The whole source code can be found on Github[^n], here is an excerpt of the README containing the performance benchmarks:



If compiled without any compiler optimizations, that is no -O2 or -O3, the algorithm in C performs better, compared to std::sort.

If, however, compiled with optmizations turned on (-O2 or -O3), the std::sort implementation performs better. I have yet to figure out why.


Tests are done on my GNU/Linux VM (running Debian 8).
Host is an Intel Core i7-5930K @ 3.5GHz / 32GiB RAM.
Guest has 4 Cores, 8GiB RAM.

The test consists of sorting a randomly generated array of 1 million integers.

Compiler Optimizations Time
gcc 4.9.2 -O0 -march=native ~180ms
gcc 4.9.2 -O3 -march=native ~72ms
g++ 4.9.2 -O0 -march=native ~300ms
g++ 4.9.2 -O3 -march=native ~58ms
clang 3.5 -O0 -march=native ~170ms
clang 3.5 -O3 -march=native ~65ms
clang++ 3.5 -O0 -march=native ~360ms
clang++ 3.5 -O3 -march=native ~56ms

Tests compiled with gcc or clang are testing the C implementation.
Tests compiled with g++ or clang++ are testing the C++ implementation.
The Makefile contains all you need to compile the tests yourself, and see what you got.


I wrote this in about 2 hours. Don't expect it to be either production ready code, nor bugfree, nor actually optimized to be used in another scenario than what I developed it for.