Fixing a 16+ Year Old Bug in Linux for DEC Alpha With Only One Line of Code

2024-02-07

In July of 2022, I acquired two workstations with processor architectures I had long had my eye on. The first was a Sun Blade 150, featuring an UltraSPARC IIe processor, on which I promptly installed Solaris 10. The second was a Compaq AlphaStation DS10, with an Alpha 21264A processor. Rather than one of Digital or Compaq’s proprietary operating systems, I decided to go with a Linux distro, first installing Gentoo and later Debian, as I had several projects in mind to experiment with, and a proper Linux install would make them much easier to carry out. At the top of my list was getting Mesa up and running, as I was eager to try out a modern hardware-accelerated desktop such as XFCE, GNOME, or possibly even KDE Plasma. Given both the age of the workstation and its unusual processor architecture, I figured it would be a fun novelty to explore.

Make sure to also check out the companion article I wrote for this blog post covering lazy binding on Linux for DEC Alpha!

Alpha AXP

I imagine most people who would willingly take the time to read this post already know what the Alpha architecture is. Just in case - it was Digital Equipment Corporation (DEC)’s response to the rapid erosion of their traditional market niche, the enterprise minicomputer and low-to-mid-end mainframe markets. That erosion was driven by the onslaught of UNIX vendors building workstations and servers that, by using microprocessors based on the then-new RISC paradigm, could match or even surpass the performance of DEC’s highest end VAX 9000 mainframe for a small fraction of the cost. While the personal computer market was still in its infancy, it too was rapidly expanding, and with it came a further bite out of DEC’s relevance.

DEC initiated development of the Alpha architecture in 1989, and shipped the first commercial implementation, the DECchip 21064 (later renamed to Alpha 21064), in 1993. DEC aggressively followed industry trends, fully embracing RISC principles, as well as designing Alpha from the outset to have a 64-bit native word size. Throughout the 1990s, and for a short period in the 2000s, Alpha processors were near or at the top of the performance charts. Even once x86 processors began to catch up with the high-end RISC processors of the era, Alpha processors still maintained a slight lead in integer performance and kept a large lead in floating point performance over contemporary x86 processors.

While development of the Alpha architecture ended up being rather short lived, falling far short of DEC’s anticipated 25 year active development timeline, I still harbor a deep interest in it due to both its place in computing history and its design. Alpha - both in its architecture and implementations - managed to be both extremely conventional and unconventional at the same time. Digital, and later Compaq, followed many industry trends that were all the rage at the time, incorporating out of order execution, superscalar instruction issue, deep pipelines, and high clock frequencies, and yet also managed to leave their own flair with features such as support for both IEEE 754 and VAX floating point in order to make migration from Digital’s prior VAX architecture easier, PALcode (which was something of a combination between microcode, a BIOS, a hypervisor, and x86 SMM), a memory model that even to this day is quite infamously weak, a lack of 8/16-bit load/store instructions until the EV56, and more. Compaq’s research project, Piranha, designed an octa-core Alpha processor that almost resembled what later became Sun’s UltraSPARC T1, and the cancelled Tarantula extension to the Alpha 21464 called for a then-monstrously large vector unit, containing 31 registers (register 32 was all zeros) each capable of holding 128 64-bit values. Finally, compared to most other high performance RISC vendors of the 90s and early 2000s, DEC was comparatively quite open with their Alpha platforms, publishing plenty of microarchitectural details and the SROM source code, open sourcing some of their AlphaBIOS and SRM PALcode variants, releasing evaluation boards and their associated schematics, and being the first high-end computer vendor to create a Linux port for their computers, which thus became the first officially mainlined non-x86 Linux port.
DEC’s open sourced SRM PALcode was the basis for the first major Alpha Linux bootloader, named MILO, and DEC/Compaq themselves both contributed code to the project and hosted copies of it.

The State of Linux on Alpha AXP

As I mentioned, Alpha was the second officially supported architecture port within the Linux kernel (there had been some unofficial Motorola 68000 ports but the kernel didn’t gain official support for m68k until after Alpha had already been mainlined). For the time, Alpha-based computers enjoyed first class support: DEC and Compaq contributed documentation and code here and there, as well as software packages like their proprietary optimizing C/C++ compiler suite, and various popular Linux distros of the era such as Red Hat offered releases for Alpha computers. Compaq even went a step further and developed the Linux and Tru64 UNIX Affinity program, which made it significantly easier not only to develop applications that could be compiled for both their proprietary Tru64 UNIX (formerly DEC’s Digital UNIX) and Linux, but also, in some cases, to directly execute Tru64 applications on Alpha Linux systems.

However, when I acquired my AlphaStation, it had been 21 years since Compaq had officially announced they were discontinuing further Alpha development and 15 years since HP had last manufactured any new Alphas or systems using Alpha CPUs. Needless to say, support for Alpha within the Linux ecosystem has more or less dried up at this point¹. Although the Linux kernel, glibc, and GCC still maintain their Alpha ports, the only Linux distros that still officially support Alpha machines are Gentoo and T2 SDE. Debian cut official support for Alpha some years back; however, several porters have maintained an unofficial release within the Debian Ports tree since then. I tried working with Gentoo initially, but the compile times eventually got to me, and I didn’t particularly feel like trying to set up a distcc box, so I switched to the Debian 11 Alpha port.

The Project

I quite enjoy doing silly and unusual things with vintage computers, especially formerly high end hardware such as workstations, servers, and game consoles. Thus, the first thing I wanted to do with the AlphaStation was attempt to install a contemporary hardware accelerated desktop under my Linux installation. For this, I needed to achieve a few things:

Track down and install a newer GPU that was able to fit within a PCI slot (the AlphaStation is too old to even support AGP, much less PCIe), could support at least OpenGL 2.0, and perhaps most importantly, has some decent open source driver support.
Install the graphics card drivers, both the userspace direct rendering manager (DRM) component, and its corresponding kernel modules.
Install X.org + hardware accelerated XFCE/Cinnamon/Gnome/whatever + preferably some cool demo apps.

I ended up settling on the PCI variant of the Nvidia GeForce 6200; it’s based on Nvidia’s NV40 architecture from around 2004, and from looking at the Nouveau driver support table, it appeared to be the best supported out of all the various OpenGL 2.0-capable Nvidia GPU architectures with a PCI variant I could easily find. It wasn’t a particularly good GPU even when it was still contemporary, but performance-wise, it was more than sufficient for everything I had in mind.

I gestured at it before, but to elaborate a bit further, the modern Linux accelerated graphics architecture is composed of two main components: the OpenGL/Vulkan implementation in userspace - which, unless you’re using Nvidia’s proprietary drivers, is nearly always going to be Mesa3D, and the GPU specific Linux kernel driver. All the OpenGL/Vulkan calls and any GLSL or SPIR-V shader code get compiled down to GPU-specific commands or executables within Mesa and transmitted to the driver-specific kernel UAPI. From there, the GPU’s DRM Linux kernel driver will arbitrate GPU commands submitted from userspace.

Initially, the process was smooth sailing. Nouveau, XFCE, and the various Mesa3D components all had packages available within the Debian 11 Alpha port tree, allowing me to do a simple apt install for everything I needed. However, once I ran modprobe to load the nouveau kernel driver, it failed, complaining about a relocation overflow.

It turned out Nouveau wasn’t the only kernel module with this problem. XFS, and really, any especially large kernel module, would fail to load, returning the same error (albeit with different functions being cited as having failed to relocate). I suppose this also explains why XFS wasn’t available as a filesystem option for the root partition within the Debian installer.

The Theory

At this point, it’s worth backing up and talking about what a relocation is and what it means for a relocation to overflow. When a program is compiled, whether into object files or into a dynamically linked program (i.e., it uses shared libraries, such as .dll on Windows, .dylib on macOS, or .so on most other Unixes and Unix-likes), the compiler/linker can’t know ahead of time where every single symbol (functions and variables not allocated on the stack) will be laid out in memory. For example, if a function within a source file references a symbol outside that source file, the compiler will emit a reference to be filled in later, essentially deferring the task of connecting referenced symbols to actual memory addresses until those addresses are known, i.e., when the program is linked or executed. If we take a look at a description of the various relocation types available on Alpha, we find the following:

This description of relocation 4, aka R_LITERAL, is partly correct; it’s sourced from the Digital UNIX/Tru64 ECOFF manual since Linux on Alpha’s ABI and relocations are based on the former’s respective ABI and relocations, albeit with some differences. Digital UNIX/Tru64’s ECOFF binary format has a notion of different types of tables depending on how a binary is created, such as a GOT for shared objects and literal address pools for relocatable and static objects. Linux on Alpha’s ELF binary format makes no such distinction, instead exclusively relying on the GOT to hold pointers.

Effectively, an R_LITERAL relocation is emitted by the compiler upon generating code that references a function/variable, with the relocation in turn referencing the code making the function/variable reference. Upon encountering an R_LITERAL relocation, the linker will turn the indirect reference into a direct reference by filling in the offset to the function/variable address index from the beginning of the GOT. This all sounds great, except what is the GOT?

While the exact details of how it works partly depend on the processor architecture a program has been compiled for (or more specifically, that processor architecture’s Application Binary Interface, aka its ABI), broadly, executable code uses the GOT to figure out where functions and global variables are located in memory. When a program wants to access something located in the GOT, say, the globally shared errno result variable from the C library, the program computes the address of that variable’s slot in the GOT by taking the global pointer, contained in the gp register, and adding an offset to it. On Alpha, it’s usually a load instruction as follows:

ldq $dest_reg, symbol_offset(gp)

The R_LITERAL relocation applies to that offset, modifying the offset as needed to make sure the variable/function address load reaches the right index of the GOT.
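To make the two-step lookup a bit more concrete, here is a toy Python model of it. Every address and value in it is made up for illustration, and the `ldq` function is only a stand-in for the real instruction, so treat it as a sketch of the idea rather than anything resembling the real loader.

```python
# Toy model of a gp-relative GOT load. All addresses are hypothetical.
GOT_START = 0x20000          # made-up address of the first GOT slot
GP = GOT_START + 0x8000      # made-up global pointer, biased past the GOT start
ERRNO_ADDR = 0x7F000         # made-up address of the errno variable

# "Memory" maps an address to the 64-bit value stored there.
memory = {
    GOT_START + 0: 0x12340,      # slot 0: pointer to some function
    GOT_START + 8: ERRNO_ADDR,   # slot 1: pointer to errno
    ERRNO_ADDR: 2,               # current value of errno
}

def ldq(base, displacement):
    """Stand-in for `ldq dest, displacement(base)`: load the quadword at base + displacement."""
    return memory[base + displacement]

# This displacement is the value the relocation has to fill in.
errno_slot_disp = (GOT_START + 8) - GP

errno_ptr = ldq(GP, errno_slot_disp)   # first load: fetch the pointer from the GOT
errno_val = ldq(errno_ptr, 0)          # second load: follow that pointer
```

The point of the sketch is that the instruction never carries the address of errno itself, only a small displacement from gp to the slot that holds it.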

GOT entries within an Alpha binary (objdump thinks the pointers are actually valid Alpha instructions).

Of course, this raises the question: why does the module even need a Global Offset Table in the first place? Why can’t the addresses it needs simply be encoded as part of the instruction? I already covered a few reasons within the companion article, Diving Into Lazy Binding on Linux for DEC Alpha, but to recap, among other reasons, as the relocation error itself hints, Alpha - and most other RISC architectures - don’t have enough free bits within their instructions to encode a full 64-bit address. Needless to say, fixed-length 32-bit instructions are a bit cramped. Thus, Alpha and most other RISC architectures use the aforementioned indexed addressing mode, where they store a base pointer in a register and make a load at an offset away from that base pointer. Alpha memory instruction encodings in particular can only hold a 16-bit value for their offset. Additionally, this immediate is treated as a signed value, allowing memory instructions to address both 32KB behind the address stored in the register containing the base memory pointer and 32KB ahead.
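For a sense of just how cramped that is, the following Python sketch packs and unpacks an Alpha memory-format instruction (6 bits of opcode, two 5-bit register fields, and the 16-bit displacement). The helper names are my own invention for this illustration; the field layout, the ldq opcode of 0x29, and gp being register $29 come from the Alpha architecture and its usual calling convention.

```python
LDQ = 0x29    # opcode of ldq
GP_REG = 29   # gp is register $29

def encode_mem(opcode, ra, rb, disp):
    """Pack a memory-format instruction: opcode[31:26] ra[25:21] rb[20:16] disp[15:0]."""
    return (opcode << 26) | (ra << 21) | (rb << 16) | (disp & 0xFFFF)

def decode_disp(insn):
    """Pull out the low 16 bits and sign-extend them, as the CPU does."""
    raw = insn & 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw

# ldq $1, -8($29)
insn = encode_mem(LDQ, 1, GP_REG, -8)

# A displacement one past the maximum silently wraps around to the minimum.
wrapped = decode_disp(encode_mem(LDQ, 1, GP_REG, 32768))
```

The last line is the interesting one: nothing in the encoding itself stops an oversized offset from wrapping, so whoever fills in the field has to check the range first.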

As Digital UNIX/Tru64’s ECOFF manual points out, in order for a memory instruction to be able to reach the full 64KB of the GOT, the gp register has to be set to 32KB ahead of the start of the GOT at the beginning of each function, as well as when a subroutine call returns back to the calling function. There can also be multiple GOTs in a given program, although any given function can only use one of those GOTs at most.
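A quick way to see why the bias matters is to count how much of a 64KB GOT a signed 16-bit displacement can reach for different placements of gp. This is only a back-of-the-envelope Python sketch; the GOT address is hypothetical and the function name is mine.

```python
DISP_MIN, DISP_MAX = -0x8000, 0x7FFF   # range of a signed 16-bit displacement
GOT_SIZE = 0x10000                      # a full 64KB GOT

def reachable_got_bytes(got_start, gp):
    """Count the bytes of [got_start, got_start + GOT_SIZE) that gp + displacement can hit."""
    lo = max(got_start, gp + DISP_MIN)
    hi = min(got_start + GOT_SIZE - 1, gp + DISP_MAX)
    return max(0, hi - lo + 1)

got_start = 0x40000  # hypothetical address

unbiased = reachable_got_bytes(got_start, got_start)          # gp at the GOT start
biased = reachable_got_bytes(got_start, got_start + 0x8000)   # gp 32KB into the GOT
```

With gp sitting right at the start of the GOT, the negative half of the displacement range is wasted and only 32KB is reachable; biasing gp by 32KB puts the whole table in range.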

Lastly, a relocation overflow is effectively what it sounds like: when the linker calculates the offset between a base pointer and the target address it is expected to reach, and that offset is too large for the allowed data width, the relocation fails to apply.
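In code, that check might look something like the Python sketch below. It is not the kernel’s module loader; the function name, the instruction word, and the addresses are all assumptions made up for the example.

```python
def apply_literal(insn, got_entry_addr, gp):
    """Patch the 16-bit displacement of a memory-format instruction so that
    gp + displacement == got_entry_addr, refusing if it does not fit."""
    disp = got_entry_addr - gp
    if not -0x8000 <= disp <= 0x7FFF:
        raise OverflowError(f"relocation overflow: displacement {disp} does not fit in 16 bits")
    # Keep the opcode and register fields, replace only the low 16 bits.
    return (insn & 0xFFFF0000) | (disp & 0xFFFF)

insn = 0xA43D0000        # ldq $1, 0($29) with the displacement not yet filled in
got_start = 0x40000      # hypothetical GOT address

# gp within reach of the entry: the relocation applies.
patched = apply_literal(insn, got_start + 0x10, got_start + 0x8000)

# gp far away from the GOT: the relocation overflows.
try:
    apply_literal(insn, got_start, got_start + 83454)
    overflowed = False
except OverflowError:
    overflowed = True
```

Whether the relocation applies depends entirely on where gp ends up relative to the GOT, which is why a bad gp value shows up as an overflow rather than as anything more obviously related to the global pointer.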

The Fix

Given the above information, I initially began taking a look at the way in which the kernel module loader resolved LITERAL relocations, as well as the related GPDISP relocations. My assumption was that perhaps hidden within the convoluted mess of bitshifts and arithmetic necessary for relocation application, there was a small mistake. When that turned out to not be the issue, I turned my attention towards the way the global pointer itself was created.

Needless to say, that global pointer calculation looked a bit suspicious, especially given that the Digital UNIX/Tru64 ECOFF manual specifies the calculation as:

GP value = GP range start address + 32752

I decided to boot a Debian install within qemu-system-alpha, and take a peek at the two values using KGDB.

Needless to say, the above global pointer value was very incorrect: the GOT is only 14104 bytes long, and yet the global pointer was 83454 bytes away from the start of the GOT.

Anyways, as you might be able to guess, changing the global pointer calculation to use the Digital UNIX/Tru64 ECOFF manual’s method completely solved the relocation failures. All in a single line.
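The numbers above are easy to sanity check with a few lines of Python. This sketch takes the two figures from the debugger session at face value and assumes 8-byte GOT slots; since the distance alone doesn’t say which side of the GOT the pointer was on, it can be run for both directions.

```python
GOT_BYTES = 14104      # GOT length reported above
SLOT = 8               # one 64-bit pointer per slot (assumed)
OLD_DISTANCE = 83454   # observed distance between gp and the GOT start
NEW_DISTANCE = 32752   # offset given by the ECOFF manual's formula

def reachable_slots(gp_minus_start):
    """Count GOT slots whose displacement from gp fits in a signed 16-bit field."""
    count = 0
    for slot_off in range(0, GOT_BYTES, SLOT):
        disp = slot_off - gp_minus_start
        if -0x8000 <= disp <= 0x7FFF:
            count += 1
    return count

total = GOT_BYTES // SLOT
old_above = reachable_slots(OLD_DISTANCE)    # gp past the GOT start
old_below = reachable_slots(-OLD_DISTANCE)   # gp before the GOT start
fixed = reachable_slots(NEW_DISTANCE)        # gp placed per the manual
```

Under those assumptions, none of the slots are reachable with the old distance in either direction, while the manual’s placement puts every slot in range.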

The Result

While it was already possible to use filesystems such as XFS if one compiled an Alpha kernel with the filesystems baked in, dynamically-loaded modules would fail to load due to their size. Now that we had a working kernel module loader though…

That was pretty awesome, but I still wanted to try and get Nouveau working so I could get an accelerated desktop up and running. I went back to the AlphaStation, installed XFCE via apt-get, and upon launching it, XFCE more or less worked!

Predictably, it was extremely slow, and of course there were a few graphical glitches here and there, unsurprising given the age of the GeForce card and Nvidia’s persistent reluctance to support open source GPU drivers. Regardless though, seeing a window manager that requires OpenGL 2.1 at a bare minimum, a feature that up until now had been effectively nonexistent on Alpha systems, actually running was incredible.

I decided to have a bit of fun, and so I copied someone’s rice off Reddit and began applying it on the AlphaStation. Along the way I decided to additionally try out ClassiCube.

ClassiCube running while I was still working on the rice.

Fully finished rice! Background artwork by @nandawa_TW on Twitter.

Once I was done messing around, I submitted a patch to the linux-alpha mailing list. It took a while, but it eventually made it into the tree.

¹ It’d be remiss of me to not bring up the fact that both NetBSD and OpenBSD still maintain official support for Alpha platforms. However, I figured for this project I’d like to experiment with the Linux ecosystem, given that it was historically treated as a tier 1 platform by DEC and Compaq.