The IRIX LLVM Port - Part 4

2022-02-10

Back in ~June of 2021, I began porting clang and LLD (and by extension, LLVM) to Silicon Graphics' IRIX operating system. It was my first time ever hacking on any compiler or linker, and even though I was flying blind a lot of the time, it was incredibly interesting, and I learned a lot from the process. Along the way, I wrote a series of forum posts over on IRIXNet which described my progress as well as bugs I ran into while enabling IRIX support. These forum posts, mostly unchanged, are going to make up the contents of this series. As for the LLVM port itself, even though I had to drop out of the effort due to real life circumstances, Vladimir Vukićević finished the port, and tackled plenty of hard bugs related to ELF relocations and MIPS codegen.

Hey people, sorry for not updating for so long. I didn’t die, it just turned out that C++ support was far more complicated than I initially expected it to be (mostly because of my own stupidity), and also uni started back up so I haven’t had as much time to work on this as I would have liked. This post isn’t going to be quite as comprehensive about things I did as my previous posts have been. A lot has happened in the past few weeks, including this project getting featured on the BSD Now podcast 😀, me getting into contact with Eschaton (the first person to attempt a port of LLVM for IRIX), and best of all, a new contributor joining me! Vladimir Vukićević is probably the first person I’ve talked to more than once who has a Wikipedia article written about them, and he’s been absolutely invaluable. He’s taken the initiative on reexamining and fixing assumptions and hacks I’ve made both now and in the past, as well as extending LLVM’s support for IRIX by tracking down bugs caused by clang/LLD interactions with rld. He’s the first to actually get clang and LLD to run on IRIX, and I’m genuinely so grateful for all of his help. A small community is beginning to form around this project, and I’m happy knowing that even if I were to quit for some reason (I’m not planning on it), the project would continue progressing without my involvement.

Starting from the beginning, after the last update, I took a break for a few days from LLVM and worked on other stuff, so when I came back to LLVM, I immediately dived headfirst into C++ support. I picked up where I left off before the shared libraries bug, with getting libc++ and libc++abi up and running. I figured out what was causing that bizarre cmake failure from last time: cmake was missing an IRIX platform file, as well as IRIX definitions inside the libc++ CMakeLists.txt. I added the platform file from the SGUG cmake port, and I added some IRIX flags to the CMakeLists.txt.

Then I was onto the build process, which at this point mainly consisted of me fighting with both the libc++ headers and GCC’s include-fixed headers. The main sources of frustration were anything involving va_list and wchar_t. clang and GCC both define wchar_t themselves, however the IRIX headers have a typedef-d wchar_t for MIPSPro, causing a type redefinition error on both clang and GCC. The IRIX headers define va_list as a char*, which GCC doesn’t expect, and so the GCC include-fixed headers provide alternate functions that use its own builtin __gnuc_va_list type. clang doesn’t know about this, however, and so it tries to use the normal IRIX va_list, inevitably causing an error because it can’t find functions using the IRIX va_list. There wasn’t really any easy way to patch this within the libc++ support files, and so I ended up simply hacking away at the include-fixed headers so that the IRIX va_list functions would be available when building with clang.

There were some other bugs in libc++ I had to fix through defining preprocessor macros and turning off certain functionality. Unfortunately my memory’s pretty fuzzy on them, since it’s been nearly 3 weeks since I was working on them. I also extended libxg to provide strerror_r. Around this time I built libc++abi and got a whole lot of errors related to relocations on read only segments.

I disabled the errors by passing -Wl,-z,notext and they went away, but unfortunately this would still end up coming into play later on.

I got libc++ and libc++abi to both build, and I linked a C++ hello world, however rld surprised me with this:

iswblank was a relatively easy fix, it turns out that IRIX exports iswblank in their C library as _iswblank, so I simply defined an iswblank macro for libc++ that pointed at _iswblank. Those other 2 errors were more involved. I didn’t know what they were so I demangled the function names, and realized they were the aligned operator new, and aligned operator delete. I had to create a posix_memalign in libxg for libc++ so that I could use libc++’s aligned memory allocation function, and thus enable the missing operator overloads that were causing this error.
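For anyone wondering where those two operators come from in the first place, here’s a minimal sketch (not code from the port). It assumes a C++17 compiler, and the CacheLine type and aligned_new_works() function are made-up names for illustration. Any type whose alignment is bigger than what plain operator new guarantees makes the compiler emit calls to the aligned overloads, which is why a runtime that doesn’t provide them leaves unresolved symbols behind.

```cpp
#include <cstdint>
#include <new>

// Hypothetical over-aligned type: 64 bytes is larger than the default
// alignment of operator new on common platforms, so C++17 routes the
// allocation through the std::align_val_t overloads.
struct alignas(64) CacheLine {
    unsigned char bytes[64];
};

// Returns true if a heap-allocated CacheLine honours its 64-byte alignment.
bool aligned_new_works() {
    CacheLine* p = new CacheLine;  // calls operator new(std::size_t, std::align_val_t)
    bool ok = reinterpret_cast<std::uintptr_t>(p) % alignof(CacheLine) == 0;
    delete p;                      // calls an aligned operator delete overload
    return ok;
}
```

Calling aligned_new_works() from main() is enough to pull both operators into the link.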

Finally I had resolved all the missing functions, and I went to run my hello world program, only to hit this:

I ran the program in GDB (without any debugging symbols) and realized it was segfaulting when it was calling a constructor. I peppered some printf statements around and confirmed that it was crashing right here:

Now here’s the rather dumb part. I ended up messing around for a little over a week in my off hours trying to get the LLD output to more closely match Binutils LD in terms of the ELF segment layout. I assumed the layout was the culprit because, for one thing, I was still quite leery of the hack I’d done previously to get LLD to output an executable text segment at the DT_MIPS_BASE_ADDRESS. It resulted in a very suspicious looking “ghost” executable segment that, while it technically worked, was definitely one of those things I couldn’t explain. See for yourself:

On top of this ghost segment issue, I was also leery of the relocation errors I was getting prior. Binutils LD only has two PT_LOAD segments, one for executable code and read only data, and one read/write segment for everything else. The eh_frame section is inside that read/write segment, and so I assumed that difference was potentially triggering a bug in rld, causing it to improperly relocate runtime data.

I’m not gonna bore you guys with the details, because, well, it was really boring. I managed to get the LLD output to more or less mirror the Binutils LD output, however it still didn’t work. It was still crashing in the exact same place. I paused on the project for a few days since I was starting to feel quite drained.

I came back though, refreshed and ready to get to work, and so I decided this time around, screw libc++, let’s see what happens if we try and use GNU libstdc++. I created 4 test binaries using a combo of g++, clang++, Binutils LD, and LLD. Out of the 4, only the combination of g++ and Binutils LD actually produced a working binary that didn’t just immediately segfault. (I later on got a test binary from MIPSPro, however it ended up being irrelevant.) I decided to step through all of the nonworking binaries in GDB, and that’s when I finally noticed something that I’d completely missed the first time around: the programs were not only all segfaulting at the same place, but they were segfaulting when they were trying to run operations using member variables of ostream objects.

This didn’t make any sense though, the objects themselves were all allocated in memory, why were they all full of invalid member variables? I then noticed that, wait a second, the objects aren’t on the stack like a normal variable, they’re all off in memory occupied by the shared library, this means they’re all global variables.

How is a global variable initialized? Good question. In C, the compiler is responsible for assigning values to global variables: at compile time, it allocates space in the .data section for the variable and writes the constant value into it, and the linker simply adds this data section to the final binary. C++ is somewhat different. The compiler performs the same operation for any constant initializers, such as, say, int x = 5; For non-constant initializers though, such as int* x = new int[5]; the variable is initialized at program startup, before the main function is even called. The specific way the runtime initializes variables can differ between compilers. On most modern platforms, GCC’s runtime looks through the entries in the .init_array section and calls each global constructor placed inside to initialize any global variables. Other operating systems such as IRIX differ: there, GCC’s runtime instead walks backwards through the .ctors section to call the global constructors. Regardless of the method, however, the global variables should always be initialized by the global constructors before the main function is even entered, so that they’re immediately available for use with valid contents.
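To make the “walks backwards through .ctors” idea a bit more concrete, here’s a toy model of it in plain C++. This is only a sketch: the real table is assembled by the linker from every object’s .ctors section and walked by the crt startup code, whereas here kCtorTable, the init_* functions, and run_ctors_backwards() are all hypothetical stand-ins.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Ctor = void (*)();

// Records the order in which the stand-in constructors ran.
std::vector<std::string> g_log;

void init_a() { g_log.push_back("a"); }
void init_b() { g_log.push_back("b"); }
void init_c() { g_log.push_back("c"); }

// Stand-in for the table of function pointers that ends up in .ctors.
const Ctor kCtorTable[] = { init_a, init_b, init_c };

// Calls every entry, starting from the last one and working towards the first.
void run_ctors_backwards() {
    for (std::size_t i = sizeof(kCtorTable) / sizeof(kCtorTable[0]); i-- > 0; ) {
        kCtorTable[i]();
    }
}
```

If nothing ever calls the equivalent of run_ctors_backwards() before main, every one of those initializers is silently skipped.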

I didn’t make the connection before, but std::cout is one of those global objects, and just like other global objects, it needs to be initialized by the runtime before it can be used, explaining why it was causing a segfault every time I tried using it without it being initialized. Just to double check that my theory about the global constructors not being executed was correct though, I decided to make a test program that simply initialized a global object of a custom class, then called member functions on it to read its contents. As I suspected, none of the values that were returned had been initialized:

This should have returned 5 and Hello! for X and Y respectively.
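A test along those lines could look roughly like the sketch below. This is a reconstruction, not the original program: the Probe class and g_probe object are hypothetical names, and only the expected values (5 and Hello!) come from the text above.

```cpp
#include <string>

// Hypothetical class standing in for the custom class used in the test.
class Probe {
public:
    Probe(int x, const char* y) : x_(x), y_(y) {}
    int x() const { return x_; }
    const std::string& y() const { return y_; }

private:
    int x_;
    std::string y_;
};

// Global object with a non-constant initializer: its constructor has to be
// run by the startup code before main() is entered.
Probe g_probe(5, "Hello!");
```

Printing g_probe.x() and g_probe.y() from main() gives 5 and Hello! on a working toolchain. If the global constructors never run, the members are never set and the reads come back as garbage.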

Around this time was when Vladimir Vukićević joined me, and he made rapid progress on cleaning up the various fixes and hacks I’d applied, making it easier for newcomers to get started on setting up an IRIX clang/LLD cross compiler, natively hosting clang/LLD on IRIX, and debugging/fixing his own issues he ran into while cross compiling software with clang/LLD.

It was pretty easy to get a working binary out of clang and Binutils LD, I passed -fno-use-init-array to clang so that it would revert to the older .ctors method of generating a global constructors list, then passed the object file through to Binutils LD, resulting in a working hello world. clang with LLD proved to be more tricky though. Even with the command line flag, it still wasn’t calling any of the global constructors at all, even though they were now being stored in the .ctors section instead of the .init_array. I dumped a list of the symbols present in the clang/Binutils LD binary, and a list of symbols present in the clang/LLD binary, and I realized that clang/LLD was missing a lot of the symbols necessary to actually call all the global constructors.

This was pretty baffling to me. At first I assumed that maybe these symbols just weren’t being read in from the crt objects GCC provides, however I was able to confirm that LLD was seeing them by placing a printf at the function where LLD loads in object file symbols, and it directly printed out the list of symbols that were missing from the final binary. I then assumed that maybe LLD was somehow deciding they were unnecessary and was discarding them? This didn’t really make a whole lot of sense though, so I opened up the crti.o file that contained these symbols and realized something right away.

Do you remember this screenshot?

It was from one of my earlier posts. It turns out that the local symbols, which were all related to global object initialization, were mixed in with the global symbols. I figured that LLD was only preserving the symbols that appeared before the first global or weak symbol. I searched through the LLD code and I found my suspicions confirmed:
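For context, the ELF spec says that all local symbols in a symbol table come before the global and weak ones, and the section header’s sh_info field holds the index of the first non-local symbol, so a linker can grab the locals as a simple prefix. Here’s a toy model of why that breaks on a table with stray locals. It is not LLD’s code: the Sym struct, the helper functions, and the symbol names are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class Binding { Local, Global, Weak };

struct Sym {
    std::string name;
    Binding binding;
};

// Index of the first non-local symbol; plays the role of .symtab's sh_info.
std::size_t first_non_local(const std::vector<Sym>& syms) {
    auto it = std::find_if(syms.begin(), syms.end(),
                           [](const Sym& s) { return s.binding != Binding::Local; });
    return static_cast<std::size_t>(it - syms.begin());
}

// What a linker sees if it trusts that every local sits before that index.
std::vector<Sym> locals_by_prefix(const std::vector<Sym>& syms) {
    return std::vector<Sym>(syms.begin(), syms.begin() + first_non_local(syms));
}

// Restores the expected ordering: locals first, relative order preserved.
void sort_locals_first(std::vector<Sym>& syms) {
    std::stable_partition(syms.begin(), syms.end(),
                          [](const Sym& s) { return s.binding == Binding::Local; });
}

// Hypothetical table with a local symbol stranded after a global one.
std::vector<Sym> mixed_table() {
    return { {"local_a", Binding::Local},
             {"global_a", Binding::Global},
             {"local_b", Binding::Local},
             {"global_b", Binding::Global} };
}
```

On mixed_table(), locals_by_prefix() only finds local_a and silently loses local_b. After sort_locals_first(), the same call finds both.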

Perhaps I was wrong when I said I was going to be more concise this time around?

I began writing a kludge to simply force getLocalSymbols() to return all symbols in the object, and let copyLocalSymbols() deal with the work of determining which symbols were actually necessary to operate on, but then vladv informed me that llvm-objcopy would actually do most of the work of reordering the symbols correctly inside the object file. I ran it on crti.o and crtn.o, and hey, whadayya know, it worked perfectly, I didn’t even have to pass any flags. This STILL wasn’t the end of the global constructors saga though. I was still getting a segfault, and so I ended up spending some time researching online how the global constructors are actually called by the runtime, until I eventually found this comment in a GNU irix-crti.s file on Google:

Following the advice of this comment (bless you beautiful past people who left this knowledge for future porters), besides my previous -fno-use-init-array, I added -Wl,-init=__gcc_init, and -Wl,-fini=__gcc_fini to the clang command line, as well as added some extra code to 2 places in LLD to ensure that __gcc_init and __gcc_fini aren’t tampered with. The next time I ran the clang/LLD output, I hit an illegal instruction error rather than a plain old segfault. I yet again did the work of stepping through the binary in GDB, and this time I found that the binary was calling the __gcc_init function and then… more or less just falling into the abyss, it ran past the end of the function and kept calling what it thought were instructions next to it in memory.

I dumped a copy of the __gcc_init symbol from clang/Binutils LD’s binary, and a copy from clang/LLD’s binary, and found that __gcc_init was missing some return instructions in the clang/LLD version:

clang/LLD version

clang/Binutils LD version

I puzzled over this for a bit, until I decided to dump the __gcc_init symbol from all 4 C runtime object files that LLD and Binutils LD should pull in. crtbegin.o, crti.o, and crtend.o all appeared in both versions of the binary. The one exception was crtn.o: that object file contained the function ending/return statements, and this was what was missing from the clang/LLD version. I ran clang/LLD in verbose mode and discovered yet another IRIX quirk. Most platforms will only have 1 crtn.o, however IRIX has 2 if you’re using libgcc as your runtime, one from IRIX in /usr/libXY/mipsZ/crtn.o, and another from /usr/lib/gcc/mips-sgi-irix6.5/$GCC_VERSION/crtn.o.

When clang was constructing the LLD flags, it didn’t know that IRIX was a bit special, so since /usr/libXY/mipsZ was higher in the path hierarchy, it pulled in the IRIX crtn.o rather than including both the IRIX crtn.o and libgcc’s crtn.o. I mentioned it to vladv and he quickly wrote up a patch for this, as well as another patch to make sure the correct init and fini flags are automatically applied.

Finally, after all of this work, guess what, we got a working Hello world! from clang/LLD using libstdc++!

To be clear, the global object saga still isn’t totally over, there’s still bugs, and vladv’s been hard at work fixing a relocation bug related to global statics in particular. But we’ve progressed enough now that we can get simple C++ programs to work, and I felt we’d reached a point where I could give a tangible update on progression.

Stepping away from the global objects bugs for now though, vladv is the first person in the world to run clang/LLD, targeting IRIX, on real SGI hardware under IRIX!

The reason I mostly talked about my own changes in this post is that these posts are about what my thought process was at the time I was making each change, and well, I’m not vladv, so I can’t talk about what his thought process specifically was at the time of each change.

I’m extremely happy with all the work he’s done. It’s been awesome having another person who’s just as motivated (possibly even more than I am) to see this port through, and he’s incredibly sharp.