Using clang on Windows

Update 1: Visual Studio 2017 works. Thanks to STL.


Disclaimer: This isn't about clang/C2. clang/C2 is Microsoft's own fork of clang, made to work with their backend. This post uses clang + LLVM.

tl;dr: All the source is in this repository: https://github.com/Leandros/ClangOnWindows


Recently Chrome decided to switch their Windows builds to use clang, exclusively. That got me intrigued to try it again, since my former experience of trying to use clang on Windows was rather mixed. However, if it's good enough for Chrome, it surely must've improved!

Unfortunately, getting clang to compile MSVC-based projects isn't as easy as just dropping in clang and changing a few flags. Let's get started.

Requirements

You'll need:

Building

Since I want to keep this build-system independent, I've set up a .bat script with all the required steps to compile a simple example. You can grab it here: github.com/Leandros/ClangOnWindows.

Open the build.bat and let's walk through it:

  • Set LLVMPath, VSPath and WinSDKPath to the installation paths of LLVM, VS 2017 and the current Windows Kit.
  • OUTPUT defines the name of the final .exe.
  • CFLAGS contains all your usual clang compiler flags. For our example I've kept them simple.
  • CPPFLAGS defines the include directories of the Universal CRT, C++ Standard Library and Windows SDK.
  • LDLIBS defines the library import paths for the Universal CRT, C++ Standard Library and Windows SDK.
  • MSEXT are the flags that make clang act more like CL. They are no longer required; Visual Studio 2017 works without them.

The rest of the file is dedicated to compiling all .cc files in the current directory and linking them into an executable.

This example makes use of lld, LLVM's linker. It has a caveat: it's not yet able to fully emit PDBs, so you might want to keep using LINK.EXE until lld is fully ready. You can use your normal linking process, since the output of clang is fully compatible.

Questions? @ArvidGerstmann on Twitter.


Is my output going to bash.exe or cmd.exe?

If you want to color the output of your terminal program on Windows, you might've noticed that it behaves differently depending on the shell it is run from.

cmd.exe does not recognize the ANSI escape sequences to change the foreground/background color, while bash.exe does not recognize Windows' SetConsoleTextAttribute. This poses a problem. A way to detect if the output is going to bash.exe or cmd.exe is required.

Fortunately, an old mail on the Cygwin mailing list [1] hinted at the fact that GetFileType returns a different value for the console handle obtained from GetStdHandle. And after a little testing, it in fact does! Equipped with this information, we can now distinguish between our output terminals:

HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD dwFiletype = GetFileType(hConsole);
if (dwFiletype == FILE_TYPE_PIPE) {         /* 0x3 */
    /* We're running in bash.exe */
} else if (dwFiletype == FILE_TYPE_CHAR) {  /* 0x2 */
    /* We're running in cmd.exe */
}
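
To show how this check could drive the actual coloring, here is a minimal sketch. The names term_kind, classify_file_type and print_red are made up for this example, and the fallback #defines only repeat the Win32 values of FILE_TYPE_CHAR (0x2) and FILE_TYPE_PIPE (0x3) so the file also compiles outside Windows. Keep in mind that a pipe handle can also mean the output is redirected into another program, so treating it as bash.exe is an assumption:

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Win32 values, repeated so the sketch compiles on other platforms. */
#define FILE_TYPE_CHAR 0x0002
#define FILE_TYPE_PIPE 0x0003
#endif

/* Hypothetical helper type, not part of the Windows API. */
enum term_kind { TERM_UNKNOWN, TERM_CMD, TERM_BASH };

/* Map the result of GetFileType to the kind of terminal we assume. */
static enum term_kind classify_file_type(unsigned long file_type)
{
    switch (file_type) {
    case FILE_TYPE_CHAR: return TERM_CMD;   /* real console: cmd.exe */
    case FILE_TYPE_PIPE: return TERM_BASH;  /* pipe: assumed to be bash.exe */
    default:             return TERM_UNKNOWN;
    }
}

/* Print msg in red, using whichever mechanism the terminal understands. */
static void print_red(const char *msg)
{
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    enum term_kind kind = classify_file_type(GetFileType(h));

    if (kind == TERM_CMD) {
        CONSOLE_SCREEN_BUFFER_INFO info;
        if (!GetConsoleScreenBufferInfo(h, &info)) {
            fputs(msg, stdout);             /* can't save colors: print plain */
            return;
        }
        SetConsoleTextAttribute(h, FOREGROUND_RED | FOREGROUND_INTENSITY);
        fputs(msg, stdout);
        fflush(stdout);                     /* flush before restoring colors */
        SetConsoleTextAttribute(h, info.wAttributes);
        return;
    }
    if (kind == TERM_UNKNOWN) {
        fputs(msg, stdout);                 /* e.g. a file: no color codes */
        return;
    }
#endif
    /* bash.exe (or a non-Windows terminal): ANSI escape sequence. */
    printf("\x1b[31m%s\x1b[0m", msg);
}
```

Calling print_red("error\n") would then use SetConsoleTextAttribute in cmd.exe and ANSI escapes in bash.exe.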

Questions? Criticism? Wanna talk? I'm @ArvidGerstmann on Twitter.

  1. Despite the author saying it's a bug, this isn't the case, as later emails in the thread confirm.


Stop using #ifdef for configuration

Using #ifdef to configure conditional compilation is very error-prone.

I believe we've all had the case where something was compiled when it shouldn't have been, because a #define of the same name was accidentally created or included, or because the macro was defined to 0 while being checked with #ifdef.

While I won't give you a solution to fix the flawed model of using the preprocessor for conditional inclusion, I'll give you a solution to make it less error prone:

#if USING(DEBUG)
  /* Do something in debug. */
#else
  /* Do something in production. */
#endif

The USING macro requires each configuration macro to be explicitly defined to the special ON or OFF values, or you'll get an error [1].

By simply defining USING as an arithmetic expression and ON / OFF as its operators, we get a compile error whenever a macro that is undefined, or defined to anything else, is passed as the argument:

#define USING(x)        ((1 x 1) == 2)
#define ON              +
#define OFF             -
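
To make the behavior concrete, here is a small sketch of how this could look in use. PROFILING, build_mode and profiling_enabled are hypothetical names invented for the example; the three macro definitions are repeated only so the snippet compiles on its own:

```c
/* Definitions repeated from above so the sketch is self-contained. */
#define USING(x)        ((1 x 1) == 2)
#define ON              +
#define OFF             -

/* Hypothetical switches; every switch must be defined to ON or OFF. */
#define DEBUG           ON
#define PROFILING       OFF

/* USING(DEBUG) expands to ((1 + 1) == 2), which is true. */
static const char *build_mode(void)
{
#if USING(DEBUG)
    return "debug";
#else
    return "production";
#endif
}

/* USING(PROFILING) expands to ((1 - 1) == 2), which is false. */
static int profiling_enabled(void)
{
#if USING(PROFILING)
    return 1;
#else
    return 0;
#endif
}

/*
 * A switch that was never defined, or was defined to 0, 1 or nothing,
 * no longer falls silently into the #else branch. For an undefined
 * TRACING, USING(TRACING) becomes ((1 TRACING 1) == 2); inside #if the
 * unknown identifier is replaced by 0, leaving ((1 0 1) == 2), which
 * is not a valid expression and stops the build.
 */
```

The price is that every switch has to be defined somewhere central, even the ones that are OFF, which is arguably the point.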

Comments, criticism? Drop me a tweet @ArvidGerstmann.

  1. Not entirely correct, but good enough for our case.


C Wishlist: Inline Functions

This is the first in a series of posts describing a few shortcomings of the C language that would be very easy to solve, and proposing a way to fix them.
This is purely due to the fact that I'm currently working on a C compiler. While I've written a few compilers over the years, none was actually intended to be used in production. This time it's different!

I'm currently rewriting the expression parser of my compiler, to use the power of top down operator precedence parsing as described by Vaughan Pratt in his paper of the same name.

I'm defining token classes with structures, like this:

struct tok_class tok_add = { 10, add_nud, add_led };  

when it struck me that this could be done a lot more simply, by declaring the function pointers inline:

struct tok_class tok_add = {  
    .lbp = 10,
    .nud = ^int64_t(struct tok_info *t) {
        /* do work. */
        return 0;
    },
    .led = ^int64_t(struct tok_info *t) {
        /* do work. */
        return 0;
    }
};

I have chosen this syntax for a reason: it resembles the existing syntax for blocks, which are already a C language extension implemented in clang.

Unfortunately (or fortunately) blocks work a little differently; they behave like a std::function in C++. The syntax proposed here would merely be syntactic sugar for a static function declaration whose address is assigned to the function pointer.
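
In other words, the compiler could lower the inline version to something like the following plain C. This is only a sketch of the idea: the layouts of tok_info and tok_class and the generated names tok_add_nud / tok_add_led are assumptions for illustration, not taken from my compiler:

```c
#include <stdint.h>

/* Hypothetical definitions; the real structs may look different. */
struct tok_info {
    int64_t value;
};

struct tok_class {
    int lbp;                                /* left binding power */
    int64_t (*nud)(struct tok_info *t);     /* null denotation */
    int64_t (*led)(struct tok_info *t);     /* left denotation */
};

/* What the compiler could generate for the inline .nud body. */
static int64_t tok_add_nud(struct tok_info *t)
{
    (void)t;    /* do work. */
    return 0;
}

/* What the compiler could generate for the inline .led body. */
static int64_t tok_add_led(struct tok_info *t)
{
    (void)t;    /* do work. */
    return 0;
}

/* The initializer then only stores the addresses of both functions. */
struct tok_class tok_add = {
    .lbp = 10,
    .nud = tok_add_nud,
    .led = tok_add_led
};
```

Since nothing is captured, no runtime support is needed, unlike with blocks.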

Questions, criticism or just want to say hi? You can find me on Twitter @ArvidGerstmann.


SIMD Instruction Format

Did you know that Intel has a naming scheme for their SIMD instructions? Neither did I.

I found out, by watching a CppCon talk by Tim Haines, that
SIMD instructions follow a pre-defined scheme which can be used to easily decode the sometimes cryptic mnemonics.
Much to my surprise, it's neither mentioned nor explained in any of Intel's manuals.

Format

An instruction is composed of these parts:

[PREFIX][OPCODE][ALIGNMENT][PACKING][PRECISION]

PREFIX (Optional): Whether the instruction is AVX or SSE. Possible values: A[V]X, or left out for SSE

OPCODE: The operation to perform. Possible values: Any basic arithmetic instruction (e.g. ADD/SUB/MUL), MOV

ALIGNMENT (Optional): The alignment requirements of the data, only applicable to a few instructions. Possible values: [A]ligned, [U]naligned

PACKING: Whether it's a packed operation or operation on a single scalar. Possible values: [P]acked, [S]calar

PRECISION: Whether it operates on single or double precision floats. Possible values: [S]ingle-Precision float, [D]ouble-Precision float

Examples

MOVAPD: [MOV][A][P][D]. [MOV]e [A]ligned [P]acked [D]ouble-Precision float

SUBSS: [SUB][S][S]. [SUB]tract [S]calar [S]ingle-Precision float

VMOVSS: [V][MOV][S][S]. A[V]X [MOV]e [S]calar [S]ingle-Precision float
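
The scheme is regular enough to decode mechanically. Below is a rough sketch of such a decoder; struct simd_name and simd_decode are names I made up for this example, and it assumes that only MOV carries an alignment letter and that no basic opcode starts with V:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of any Intel API. */
struct simd_name {
    int  avx;          /* 1 if the mnemonic starts with the AVX prefix V */
    char opcode[8];    /* e.g. "MOV", "ADD", "SUB" */
    char alignment;    /* 'A', 'U', or 0 when absent */
    char packing;      /* 'P' (packed) or 'S' (scalar) */
    char precision;    /* 'S' (single) or 'D' (double) */
};

/* Returns 0 on success, -1 if the mnemonic does not fit the scheme. */
static int simd_decode(const char *mnemonic, struct simd_name *out)
{
    const char *p = mnemonic;
    size_t len = strlen(mnemonic);

    memset(out, 0, sizeof *out);

    /* Optional AVX prefix. */
    if (len > 0 && p[0] == 'V') {
        out->avx = 1;
        p++;
        len--;
    }

    /* Need at least a one-letter opcode plus packing and precision. */
    if (len < 3)
        return -1;

    /* The last two letters are packing and precision. */
    out->packing = p[len - 2];
    out->precision = p[len - 1];
    if (out->packing != 'P' && out->packing != 'S')
        return -1;
    if (out->precision != 'S' && out->precision != 'D')
        return -1;
    len -= 2;

    /* Assumption: only MOV is followed by an alignment letter. */
    if (len == 4 && strncmp(p, "MOV", 3) == 0 &&
        (p[3] == 'A' || p[3] == 'U')) {
        out->alignment = p[3];
        len--;
    }

    /* Whatever remains is the opcode; reject it if it doesn't fit. */
    if (len >= sizeof out->opcode)
        return -1;
    memcpy(out->opcode, p, len);
    out->opcode[len] = '\0';
    return 0;
}
```

Feeding it "MOVAPD" yields opcode MOV, alignment A, packing P and precision D, matching the first example above.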

Caveat

This only applies to a few basic instructions, namely the basic arithmetic and MOV instructions.

Many of the instructions added later in SSE2/3/4 are very specialized: they can be applied vertically or horizontally, use non-temporal stores, etc.

Feedback

Feedback, criticism? Tweet me at @ArvidGerstmann.


Test if a variable is unavailable in GDB

GDB on macOS can't display a few of the segment registers, so the respective variables are not available for use. Trying to use them will only print $var is not available and skip the rest of any currently running function.

Merely executing this line

printf " %04X  ", $ds

without $ds being available would stop executing the whole function.

Too lazy to read and just want the solution? Skip to the bottom.

This posed an issue, since I have a function for displaying all current registers, the stack, and the current plus the next 5 instructions, which would stop executing in the middle because of the unavailable register. For a while I simply commented that part of the script out, since there was no obvious way of testing whether a variable is available. As soon as you try to access it, you get the error.

This was frustrating and I wanted a solution, so I started investigating.

After googling and going through the docs for a few minutes, I noticed I was in for a fun ride. The only relevant source I found was an unanswered question on reverseengineering.stackexchange.com (which, as a result of this, I could answer).

A little more googling revealed a question, and answer, on GDB's mailing list on how to ignore errors in user-defined commands, which made clever use of GDB's Python API. The first solution was found.
I copied the ignore-errors function into a Python script, sourced it in my .gdbinit and replaced all printf's with

ignore-errors printf " %04X  ", $ds

It worked! Hooray!

But it was ugly: the printf wasn't executed at all, leaving empty spaces. So I decided to search for a better solution.

I now knew you can catch errors in a Python script, so I thought there must also be a way to inspect the variable in Python, without GDB throwing an error. I wrote a little function to test my theory:

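import gdb

class IsValid (gdb.Function):
    def __init__ (self):
        super (IsValid, self).__init__("isvalid")

    def invoke (self, var):
        # Print the value to see what GDB hands us for an unavailable register.
        print("var:", var)
        return 0

# Instantiating the class registers $isvalid with GDB.
IsValid ()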

It printed var: <unavailable>! That's the same output you would get in GDB, but without any errors. I was on the right track. Since I hadn't written any Python in a few years, I was a bit rusty and had to figure out how to get the value that print obtains from the variable, without throwing an error. The obvious gdb.Value.string() function threw an error when I tried to access the value through it. After a little trial and error I figured out that __str__() would get me where I wanted to go.

Resulting in this final function:

class IsValid (gdb.Function):  
    def __init__ (self):
        super (IsValid, self).__init__("isvalid")

    def invoke (self, var):
        if var.__str__() == "<unavailable>":
            return 0
        else:
            return 1

IsValid ()  

Now I could replace the printf lines with this and it would work beautifully:

if ($isvalid($ds))  
    printf " %04X  ", $ds
else  
    printf " ----  "
endif  

One last test, and ... Hooray! It worked.

You can get my fixed .gdbinit from my dotfiles repository on GitHub.
