The IRIX LLVM Port - Part 3

2022-02-09

Back in ~June of 2021, I began porting clang and LLD (and by extension, LLVM) to Silicon Graphics’ IRIX operating system. It was my first time hacking on a compiler or linker, and even though I was flying blind much of the time, it was incredibly interesting and I learned a lot from the process. Along the way, I wrote a series of forum posts on IRIXNet describing my progress and the bugs I ran into while enabling IRIX support. Those posts, mostly unchanged, make up the contents of this series. As for the LLVM port itself, I had to drop out of the effort due to real-life circumstances, but Vladimir Vukicevic finished the port and tackled plenty of hard bugs related to ELF relocations and MIPS codegen.

Hi again everyone. After the excitement of getting a full LLVM cross-compilation toolchain working last week, the last thing I needed to complete it (other than runtime pieces like compiler-rt and the LLDB debugger) was a C++ standard library, so I set my sights on getting a LibC++ port up and running.

Most of this work was relatively boring. The main issues were these. First, IRIX’s libc tries to typedef wchar_t as a long, which conflicts with clang++, since wchar_t is a built-in type in C++. Luckily, SGI had implemented a macro, _WCHAR_T, to disable this typedef for their own C++ compiler, so I #define-d it and the error went away. IRIX’s libc was also missing max_align_t, so I added a struct for it to the IRIX support directory I placed into LibC++. Finally, IRIX’s libc is missing basically all modern POSIX locale functions and definitions. LibC++ requires a locale implementation, and I fumbled around with this for a bit until I realized that the LibC++ contributors had anticipated ports to POSIX platforms without a working locale implementation and had already written locale stubs. I created an xlocale.h that referenced these stubs, which left me with only 2 errors: IRIX has no wcsnrtombs or mbsnrtowcs implementation, so I wrote stubs for both that returned null. With that, it was on to the linking stage for LibC++.
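A minimal C sketch of the kind of compatibility pieces described above. Everything here is illustrative: the compat_ prefix is a hypothetical naming choice so the sketch compiles on a modern host whose libc already provides max_align_t and the real functions; in an IRIX support header they would carry the standard names. The locale part follows the same idea as the LibC++ stubs (every locale behaves like the "C" locale). Instead of a stub that returns null, mbsnrtowcs is built on top of mbrtowc here, which is a swapped-in approach that assumes the target libc provides mbrtowc; wcsnrtombs would follow the same pattern using wcrtomb.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* max_align_t: a type whose alignment is at least that of the common scalar
 * types. Hypothetical name; an IRIX header would typedef this as max_align_t. */
typedef struct {
    long long   compat_ll;
    long double compat_ld;
    void       *compat_ptr;
} compat_max_align_t;

/* Locale fallback: with no real locale support, every locale_t behaves like
 * the "C" locale, so the *_l variants ignore their argument and forward. */
typedef void *compat_locale_t;

static inline int compat_isalnum_l(int c, compat_locale_t loc) {
    (void)loc;
    return isalnum(c);
}

static inline int compat_toupper_l(int c, compat_locale_t loc) {
    (void)loc;
    return toupper(c);
}

static inline double compat_strtod_l(const char *s, char **end, compat_locale_t loc) {
    (void)loc;
    return strtod(s, end);
}

/* mbsnrtowcs: convert at most nms bytes from *src into at most len wide
 * characters (len is ignored when dst is NULL), built on mbrtowc.
 * *src is only updated when dst is non-NULL, as POSIX specifies. */
static size_t compat_mbsnrtowcs(wchar_t *dst, const char **src, size_t nms,
                                size_t len, mbstate_t *ps) {
    static mbstate_t internal;          /* used when ps is NULL */
    const char *s = *src;
    size_t count = 0;

    if (ps == NULL)
        ps = &internal;
    while (nms > 0 && (dst == NULL || count < len)) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, nms, ps);
        if (r == (size_t)-1) {          /* invalid sequence; mbrtowc set errno */
            if (dst != NULL)
                *src = s;
            return (size_t)-1;
        }
        if (r == (size_t)-2) {          /* incomplete character: bytes absorbed into *ps */
            s += nms;
            nms = 0;
            break;
        }
        if (r == 0) {                   /* reached the terminating null byte */
            if (dst != NULL) {
                dst[count] = L'\0';
                *src = NULL;
            }
            return count;
        }
        if (dst != NULL)
            dst[count] = wc;
        s += r;
        nms -= r;
        count++;
    }
    if (dst != NULL)
        *src = s;
    return count;
}
```

Unlike a stub that always returns null, this version lets callers that genuinely convert strings (for example, ASCII text in the "C" locale) keep working, at the cost of depending on mbrtowc being present.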

Of course, my C++ hello world didn’t link. As part of building LLVM, LibC++ gets built for the platform the compiler itself is going to run on, and I was building clang/LLVM/LibC++ to run on Darwin. It was time to build a native LLVM for IRIX! I downloaded the SGUG libstdc++ from the Octane and set out to build LLVM using my cross compiler. I ran into issues right away. It’s surprising how out of date most of the documentation is when it comes to cross-building modern LLVM toolchains: no matter which flags I passed in from the LLVM cross-build docs, the build kept targeting my host machine and pulling in my host libraries. It took a long while of searching around the web before I finally realized that I should be passing a CMake toolchain file to the cmake configure stage. Then it was finally off to the races!

I ran into some more issues that I’m still not sure about, since there weren’t any obvious error messages or anything dropped into the logs. I didn’t investigate further, however, because I hit another showstopper: LLD was emitting IRIX-incompatible shared libraries. I only noticed this because I realized LLVM was going to need libxg to build for IRIX; I had used libxg to provide a few standard C/POSIX functions that IRIX lacks back when I was trying to compile LLVM on IRIX itself. I went to build libxg and hit one issue: the link step was trying to use crtbeginS.o and crtendS.o from the GCC multilib directory. GCC on IRIX, however, doesn’t use these object files. I figured I could simply symlink crtbeginS.o and crtendS.o to the regular crtbegin.o and crtend.o respectively, but just to make sure, I checked the shared library linking behavior of Binutils LD on IRIX, and as I suspected, it just used the normal crtbegin.o and crtend.o. I created the symlinks and got a shared library out of LLD, but when I went to test it on IRIX, rld rejected it completely.

Here’s what the SGI ELF64 manual had to say about it:

🤨

Okay, fine then: what if we try forcing the shared library base to 0x10000 by adding -Wl,-image-base=0x10000 to the LDFLAGS? This should give every section inside the shared library a virtual address at or above 0x10000 (although rld might still move it if another shared library is trying to load into the same address range).

(I don’t have a screenshot of it, but IRIX returned a segmentation violation when I tried loading a shared library after this “fix”)

Okay, what is even going on?

Sidenote: in hindsight this should have been a very easy clue; shame on me, I suppose, for not knowing much about how ELF binaries are loaded into memory before this week.

I decided to forcibly set DT_MIPS_BASE_ADDRESS to 0x10000 inside LLD. This time, I got this when running a test program against the shared library:

It actually ran, but rld was emitting warnings. They were pretty annoying, and I was worried that what I was doing could result in undefined behavior, so I set to work figuring out how to set DT_MIPS_BASE_ADDRESS in a way rld was happy with. To narrow down what LLD was doing differently, I generated 3 variants of a test shared library: one using clang and Binutils LD, one using the -Wl,-image-base=0x10000 flag, and one using my forced DT_MIPS_BASE_ADDRESS trick. Unfortunately, this led me to chase a red herring. After comparing the program headers of the 3 shared libraries with readelf, I came to the incorrect conclusion that the executable segment of the image-base library being placed at 0x20440 was what caused the segmentation violation. I assumed this because in the library generated with my DT_MIPS_BASE_ADDRESS trick, the executable segment was at 0x10440, and in the Binutils LD library, it was at 0x10000. I forced the executable sections to be generated at an address below 0x20000, but this still caused a segmentation violation.
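When comparing variants like these, the value each linker actually wrote into DT_MIPS_BASE_ADDRESS is worth checking alongside the program headers; `readelf -d` lists it among the dynamic entries. A minimal C sketch of the same lookup follows, assuming the raw bytes of the `.dynamic` section have already been read out of a big-endian IRIX object (locating the section in the file is left out). DT_MIPS_BASE_ADDRESS is the MIPS-specific dynamic tag 0x70000006; the helper name find_mips_base_address is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag values from the ELF gABI and the MIPS psABI supplement. */
#define TAG_DT_NULL              0x0u
#define TAG_DT_MIPS_BASE_ADDRESS 0x70000006u

/* IRIX objects are big-endian MIPS, so read fields most-significant byte first. */
static uint64_t read_be(const unsigned char *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Scan the raw bytes of a .dynamic section for DT_MIPS_BASE_ADDRESS.
 * Each entry is a {d_tag, d_val} pair: 4 + 4 bytes for ELF32, 8 + 8 for ELF64.
 * Returns 1 and stores the value in *base if found, 0 otherwise. */
static int find_mips_base_address(const unsigned char *dyn, size_t len,
                                  int is_elf64, uint64_t *base) {
    size_t word = is_elf64 ? 8 : 4;
    for (size_t off = 0; off + 2 * word <= len; off += 2 * word) {
        uint64_t tag = read_be(dyn + off, word);
        if (tag == TAG_DT_NULL)             /* DT_NULL terminates the array */
            break;
        if (tag == TAG_DT_MIPS_BASE_ADDRESS) {
            *base = read_be(dyn + off + word, word);
            return 1;
        }
    }
    return 0;
}
```

Comparing this value against the lowest PT_LOAD address of each variant is one way to spot a mismatch between what the dynamic section claims and where the segments really sit.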

I really wasn’t sure where to go from here. I tried passing the default Binutils LD linker script to LLD, which didn’t work at all: LLD errored out and complained about some of the symbols being placed over one another. I was about to throw in the towel and accept the warnings for now when I had a realization I should have had long before: I should create a shared library using MIPSpro and compare it to the result of clang + Binutils LD. Once I did, I noticed something immediately:

(Top is Binutils LD, bottom is MIPSpro)

In both of them, not only was the executable PT_LOAD segment placed right at the base address with a file offset of 0, it was also the very first PT_LOAD segment. I quickly found this reference diagram from the Linux Foundation and everything clicked into place:

As far as I can tell, this is what was happening before: rld expects the PT_LOAD segment placed at or near 0x10000 to be executable. With the DT_MIPS_BASE_ADDRESS trick, this happened to work, since the only segment near 0x10000 was the executable segment at 0x10440, and every other segment was placed above or below 0x10000. The ELF headers were supposedly placed below 0x10000, but I suspect rld was shifting them up to 0x10000 using DT_MIPS_BASE_ADDRESS, which could also explain the warning rld was emitting about a movement calculation error. I wasn’t so lucky with the image-base version of the shared library. There, rld must have been mmap-ing the first PT_LOAD segment, which LLD had placed at 0x10000, into memory and then attempting to jump into it, causing IRIX to throw a segmentation violation since that segment wasn’t marked as executable.
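The layout the Binutils LD and MIPSpro libraries share, and that the reasoning above suggests rld depends on, can be written down as a small check: the first PT_LOAD program header should have file offset 0, a virtual address equal to the base address, and the PF_X flag set. The C sketch below checks exactly that on an in-memory big-endian ELF32 or ELF64 image. It encodes the inference from this post, not documented rld behavior, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_PT_LOAD 1u   /* PT_LOAD */
#define SEG_PF_X    1u   /* PF_X (executable) */

typedef struct {
    uint64_t offset;     /* p_offset: file offset of the segment */
    uint64_t vaddr;      /* p_vaddr: virtual address it is mapped at */
    uint32_t flags;      /* p_flags: PF_X / PF_W / PF_R bits */
} load_segment;

/* IRIX objects are big-endian, so read fields most-significant byte first. */
static uint64_t be_field(const unsigned char *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Locate the first PT_LOAD program header of an in-memory big-endian ELF image.
 * Returns 0 on success, -1 if the image is malformed or has no PT_LOAD. */
static int first_load_segment(const unsigned char *img, size_t len, load_segment *out) {
    if (len < 16 || memcmp(img, "\177ELF", 4) != 0 || img[5] != 2 /* ELFDATA2MSB */)
        return -1;
    if (img[4] != 1 && img[4] != 2)                  /* ELFCLASS32 / ELFCLASS64 */
        return -1;
    int is64 = img[4] == 2;
    if (len < (is64 ? 64u : 52u))                    /* full ELF header present */
        return -1;
    uint64_t phoff = is64 ? be_field(img + 32, 8) : be_field(img + 28, 4);
    uint64_t phentsize = is64 ? be_field(img + 54, 2) : be_field(img + 42, 2);
    uint64_t phnum = is64 ? be_field(img + 56, 2) : be_field(img + 44, 2);
    uint64_t need = is64 ? 56 : 32;                  /* size of Elf64_Phdr / Elf32_Phdr */

    if (phentsize < need || phoff > len)
        return -1;
    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t off = phoff + i * phentsize;
        if (off > len || len - off < need)
            return -1;
        const unsigned char *ph = img + off;
        if (be_field(ph, 4) != SEG_PT_LOAD)
            continue;
        if (is64) {   /* Elf64_Phdr: type, flags, offset, vaddr, ... */
            out->flags = (uint32_t)be_field(ph + 4, 4);
            out->offset = be_field(ph + 8, 8);
            out->vaddr = be_field(ph + 16, 8);
        } else {      /* Elf32_Phdr: type, offset, vaddr, paddr, filesz, memsz, flags, ... */
            out->offset = be_field(ph + 4, 4);
            out->vaddr = be_field(ph + 8, 4);
            out->flags = (uint32_t)be_field(ph + 24, 4);
        }
        return 0;
    }
    return -1;
}

/* The inferred rld-friendly layout: first PT_LOAD at file offset 0,
 * mapped at the base address, and executable. */
static int first_load_matches_rld_layout(const unsigned char *img, size_t len, uint64_t base) {
    load_segment seg;
    if (first_load_segment(img, len, &seg) != 0)
        return 0;
    return seg.offset == 0 && seg.vaddr == base && (seg.flags & SEG_PF_X) != 0;
}
```

Run against the three test libraries with a base of 0x10000, a check like this would be expected to pass for the Binutils LD output and fail for the image-base variant, whose first PT_LOAD segment was not executable.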

It took me a day or 2 of experimenting with LLD, but I finally figured out a way to force LLD to flag the first PT_LOAD segment as executable and place it at 0x10000. In the end, after all my poking and prodding, only a few lines of code needed to change inside LLD, but hey, it worked: rld wasn’t throwing any errors or warnings, and there were no more segmentation violations. I’ll likely have to improve the method, though. While it works, I’m not sure whether it’ll cause problems on other platforms that use ELF, so I might have to add a flag signaling that LLD is building for IRIX. That’s something I’ll worry about later. For now, I can go back to figuring out how to build a native LLVM for IRIX.