The IRIX LLVM Port – Part 2 

Back in ~June of 2021, I began porting clang and LLD (and by extension, LLVM) to Silicon Graphics’ IRIX operating system. It was my first time ever hacking on any compiler or linker, and even though I was flying blind a lot of the time, it was incredibly interesting, and I learned a lot from the process. Along the way, I wrote a series of forum posts over on IRIXNet which described my progress as well as bugs I ran into while enabling IRIX support. These forum posts, mostly unchanged, are going to make up the contents of this series. As for the LLVM port itself, even though I ended up having to drop out of the effort due to real life circumstances, Vladimir Vukicevic ended up finishing the port, and tackled plenty of hard bugs related to ELF relocations and MIPS codegen.

Hey guys, I’ve done quite a bit of work since the previous comment. Immediately after I got clang to emit object files for IRIX, I went to work on fixing the missing predefines the IRIX standard library relies on. Raion helped out by dumping some of the MIPSPro predefines. These were mostly to enable/disable features for different MIPS microarchitectures, so I added them to clang’s MIPS target, while the SGI defines I added to the IRIX driver. There were 2 predefines that clang already implemented but that I had to change for IRIX: __mips and _MIPS_ISA. These were set by default to 64 and _MIPS_ISA_MIPS64 respectively, while MIPSPro sets them to mips3/mips4 and _MIPS_ISA_MIPS3/_MIPS_ISA_MIPS4 depending on which microarchitecture you’re building for. Even after I finished implementing all these predefines, however, this wasn’t enough to build a hello world: there were still plenty of errors from clang being unable to find various types and function prototypes from the IRIX standard library. I then dumped GCC’s and clang’s predefines and wrote a tiny Python script to compare the two outputs and figure out which predefines I needed to add to clang.
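The script itself isn’t reproduced in this post, but a minimal sketch of that kind of comparison might look like the following. It assumes both dumps are in the `#define NAME VALUE` form that GCC and clang print with `-dM -E`; the function names `parse_predefines` and `compare_predefines` are made up for illustration.

```python
import re

# Matches "#define NAME VALUE" as well as function-like "#define NAME(args) VALUE".
DEFINE_RE = re.compile(r"^#define\s+(\w+)(\([^)]*\))?(?:\s+(.*))?$")


def parse_predefines(dump: str) -> dict:
    """Turn a `-dM -E` style dump into a {macro name: value} mapping."""
    macros = {}
    for line in dump.splitlines():
        match = DEFINE_RE.match(line.strip())
        if match:
            macros[match.group(1)] = (match.group(3) or "").strip()
    return macros


def compare_predefines(gcc_dump: str, clang_dump: str):
    """Return (names only GCC defines, names both define with different values)."""
    gcc = parse_predefines(gcc_dump)
    clang = parse_predefines(clang_dump)
    missing = sorted(set(gcc) - set(clang))
    differing = sorted(name for name in gcc.keys() & clang.keys()
                       if gcc[name] != clang[name])
    return missing, differing
```

The two dumps would typically come from running each compiler with `-dM -E` on an empty input and saving the output to a file. Names in `missing` are candidates to add to clang, and names in `differing` are ones like __mips whose values disagree.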

This revealed about a hundred predefines that GCC had but clang didn’t. This was too much to just add to clang manually, especially since I suspected the vast majority of them weren’t even used by anything on IRIX, so I ended up extending the script to check which GCC predefines actually got used inside the IRIX standard library. My suspicions were confirmed when I only uncovered about 8 or so of these predefines actually being used there. _LANGUAGE_C was a big predefine that I was missing: without it, sgidefs.h, which defines some of the types the standard library uses elsewhere, literally appeared to the clang preprocessor as a blank file. I added these few missing-and-important predefines to clang, and then finally, I managed to get clang to compile a hello world object file without needing me to manually stick a big list of predefines at the front.
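As a rough illustration of that filtering step (not the actual script), one way to do it is to tokenize each header into identifiers and keep only the candidate predefines that show up as whole tokens. The helper names and the idea of walking a sysroot include directory for `*.h` files are assumptions made for this sketch.

```python
import re
from pathlib import Path

# C identifiers; matching whole tokens avoids false hits on substrings.
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def read_headers(root):
    """Yield the text of every .h file under `root` (hypothetical sysroot path)."""
    for path in Path(root).rglob("*.h"):
        yield path.read_text(errors="replace")


def used_predefines(candidates, header_texts):
    """Return the candidate macro names that appear in any of the header texts."""
    remaining = set(candidates)
    used = set()
    for text in header_texts:
        hits = remaining & set(IDENT_RE.findall(text))
        used |= hits
        remaining -= hits
        if not remaining:  # every candidate has been seen, stop early
            break
    return sorted(used)
```

Calling `used_predefines(missing, read_headers(some_include_dir))` would then shrink a long list of GCC-only predefines down to the handful the headers actually test for. This is a plain text scan, so it will also count names that only appear in comments.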

From here, I initially began working on adding IRIX support to LLVM’s libc++. However, I changed my mind and instead decided to kill two birds with one stone: get a proper IRIX linker for my host machine, and also implement IRIX support for LLD. I quickly fixed the bug I was having before where the Binutils LD wasn’t detecting the object files or shared libraries it needed. It turns out I wasn’t actually passing the paths to the IRIX /usr/lib32 inside my sysroot. I also made sure to download my GCC installation from the Octane as well, since I don’t have a working compiler-rt yet, and GCC’s libraries provide a good enough equivalent for now. I then discovered that LLD expected to use MIPS target emulation elf32btsmipn32 rather than elf32bmipn32, so I commented out the previous “fix” I had added to clang.

After my 2 previous fixes, I was left with this:

I was extremely confused, since I’ve used those exact same object files and libraries to link object files emitted from clang on the Octane under Binutils LD before, and I’ve never had it error out like this. Everything looked fine under readelf, so I initially assumed that maybe I was accidentally linking the GCC libraries, which are built for the MIPS3 microarchitecture, with MIPS4 libraries. However, even after fixing the paths so they specifically pointed to MIPS3 files in all cases, I still got the same error. From here I ended up spending quite a while searching the internet and looking at ELF documentation, until I finally stumbled across a page from Oracle about ELF symbol tables.

Something I had read prior finally clicked:

LLD was complaining that even after it had read what it thought was the last STB_LOCAL symbol, according to the sh_info field in .symtab’s section header, there were still more local symbols mixed in with the global and weak symbols. In other words, the crtbegin.o provided by GCC doesn’t follow the ELF specification. Unlike figuring out what the issue meant, fixing it was rather simple: all I did was patch out the check for mixed local and global/weak symbols inside LLD, so that LLD emits warnings rather than erroring out. LLD now only emitted one more error related to library parsing. I got to work on figuring out why LLD was now complaining that libgcc_s.so had an invalid sh_entsize. I checked, and it turned out that the .dynamic section of libgcc_s.so was what LLD had a problem with. I did some more reading on ELF and discovered that sh_entsize essentially defines the size of each entry inside the dynamic array.
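To make the symbol ordering rule concrete, here is a small sketch of the same kind of check in Python. It is not LLD’s code. It assumes a big-endian ELF32 symbol table (16-byte Elf32_Sym entries, as used by n32 objects) already read into memory, and the function name `misplaced_locals` is made up for this example.

```python
import struct

STB_LOCAL = 0  # binding value for local symbols in the ELF specification

# Big-endian Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx
ELF32_SYM = struct.Struct(">IIIBBH")


def misplaced_locals(symtab: bytes, sh_info: int) -> list:
    """Return indices of STB_LOCAL symbols found at or after index sh_info.

    For a symbol table, sh_info is the index of the first non-local symbol,
    so every symbol from that index onward should be global or weak.
    """
    bad = []
    for index in range(len(symtab) // ELF32_SYM.size):
        fields = ELF32_SYM.unpack_from(symtab, index * ELF32_SYM.size)
        st_info = fields[3]
        binding = st_info >> 4  # upper four bits hold the binding
        if index >= sh_info and binding == STB_LOCAL:
            bad.append(index)
    return bad
```

An empty result means the table is ordered the way the specification asks. Any indices returned are the local symbols a strict linker would object to.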

At first, in order to get around the sh_entsize error, I wrote some code for LLD that literally just set .dynamic’s sh_entsize to sizeof(Elf_Dyn). I then had to rebuild the entirety of LLVM (since the change was inside a header file), only to discover I had wasted my time: LLD didn’t like me directly modifying .dynamic’s section header in memory, and it crashed. I didn’t really feel like spending the time to figure out why LLD was segfaulting after modifying the section headers, plus I knew that the libgcc_s.so file specifically was the problem anyway, so I simply took the easy way out and directly overwrote libgcc_s.so’s .dynamic’s sh_entsize with a value of 8 in a hex editor. The reason I picked 8 is that it’s the native size of each entry under the n32 ABI. After I rebuilt LLVM, I tested out my new libgcc_s.so, and hey, it worked fine. I was getting some new errors, but these were just missing symbol definitions rather than invalid ELF headers.
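For anyone who would rather script the hex edit, here is a rough sketch of the same patch in Python. It assumes a big-endian ELF32 file whose .dynamic section header offset you have already looked up (for example with readelf); `shdr_offset` and the function name are hypothetical. In an Elf32_Shdr, sh_entsize is the tenth 32-bit field, so it sits 36 bytes into the 40-byte header.

```python
import struct

# Big-endian Elf32_Dyn: a 4-byte d_tag followed by a 4-byte d_un union.
ELF32_DYN = struct.Struct(">iI")

# Elf32_Shdr is ten 32-bit words; sh_entsize is the last one.
ELF32_SHDR = struct.Struct(">10I")
SH_ENTSIZE_OFFSET = 36


def patch_sh_entsize(image: bytearray, shdr_offset: int,
                     entsize: int = ELF32_DYN.size) -> int:
    """Overwrite sh_entsize in the section header at shdr_offset.

    Returns the value that was there before, so the change can be logged.
    """
    field = shdr_offset + SH_ENTSIZE_OFFSET
    (old,) = struct.unpack_from(">I", image, field)
    struct.pack_into(">I", image, field, entsize)
    return old
```

`ELF32_DYN.size` evaluates to 8, which matches the value written by hand above. The file would be read into a `bytearray`, patched, and written back out under a new name so the original stays untouched.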

One of the missing symbol definitions was _rld_new_interface, which is supposed to be filled in at runtime by IRIX’s runtime linker, rld, and the other one was a missing function from libm. These were easily fixed by passing both -lm and --allow-shlib-undefined. However, 4 more errors cropped up from /usr/lib32/mips3/crt1.o even after this. 3 of them were unknown ELF relocation types, and the other was complaining about an invalid type of ELF relocation being applied.

This is what I saw after I looked up the unknown relocation value in the SGI ELF64 documentation. According to the addendum at the bottom of the table, this is “intended for address reset records in an Event Location section, (see Section 2.10)”. This would make sense, since LLD was complaining about .MIPS.event.init having an unknown relocation type. After reading the Event Location section (the aforementioned section 2.10), from what I can understand, R_MIPS_SCN_DISP is used to monitor what the program is currently doing, which is where the “event” in the section’s name comes from. This info seems to be mainly used for debugging in conjunction with DWARF, but supposedly it’s also used for processor bug workarounds. Anyway, it didn’t really seem like LLD should have to touch this, so I implemented R_MIPS_SCN_DISP in LLD, but had it simply do nothing when encountering this relocation type.

Finally, only one more error to go (for real this time). LLD didn’t like that .MIPS.event.text held a non-ABS relocation, in this case R_MIPS_CALL_HI16, against the __istart symbol. R_MIPS_CALL_HI16 performs lazy resolution of symbols: essentially, it waits until the first time a symbol is called to actually determine the location of that symbol. It appears that LLD wanted the absolute location to be found right away rather than waiting until a function call is made. I simply patched the error out and instead had LLD emit a warning.

Finally after all of this, clang could compile hello world and have LLD link it to get an a.out.

(I guess libm and libc also don’t follow the ELF specification)

The resulting binary actually ran on the Octane!

Of course I couldn’t stop there. I compiled figlet and linked it using clang and LLD, and it built just fine. I then went to run it on the Octane, and I got a segmentation violation. I inspected the headers with readelf and they were quite obviously corrupted.

I got worried for a few minutes that my fixes to LLD hadn’t worked after all, but then I happened to notice something: figlet was larger by a couple dozen bytes on my host machine. Curious as to why this was, I inspected figlet under readelf on my host, and it was perfectly normal.

It turned out that FTP was clobbering the figlet binary while it was being uploaded to the Octane. I have no idea why this didn’t happen to any of the other object files or the hello world when they were being uploaded to the Octane, but to get around this, I simply switched to SFTP rather than using the builtin FTP server. I reuploaded figlet, and just like the hello world did, it ran perfectly.

Certainly quite an amazing sight. 

I’ve committed the changes I made to LLD as well as the changes to the predefines to GitHub. My next goal after this is to implement IRIX support into libc++, then get compiler-rt working. After that, hopefully I can get a native clang/LLD build that can run on IRIX itself.