A Short History Behind The Next Front-End Computer Operating System

Front-End Computers (FECs) are embedded systems whose main purpose is low-level control of, and data acquisition from, the accelerator hardware. As their Operating System (OS), we use a Linux distribution on top of which we add dedicated software.

We migrated from LynxOS (https://www.lynx.com/) to Linux around 2010, and since then we have periodically evaluated which Linux distribution to support for the upcoming RUN. Until now, the distribution of choice has always been a Red Hat Enterprise Linux (RHEL) rebuild (https://www.redhat.com/). We started with Scientific Linux 5 and 6 (https://scientificlinux.org/) (a.k.a. L865 and L866), a distribution maintained by CERN and Fermilab. We then adopted, and still use, CERN CentOS 7 (https://www.centos.org/) (a.k.a. L867). The main reason for this choice was the need for a CERN-wide policy on Linux systems, so that everyone could draw on the same pool of knowledge, work and support. Today, however, in planning for RUN4, we are diverging from this policy and moving to Debian 12 (https://www.debian.org/), because technical limitations in the RHEL family prevent most of our systems from booting.

Red Hat Enterprise Linux is a Linux distribution made by Red Hat that mainly targets server-class machines. Red Hat's market segment is clear, and the company tailors all its services to it. Millions of people rely every day on server-class machines that process enormous amounts of data, so server performance is a critical factor. Intel processors make up the large majority of this class of machines, and Intel works hard to improve their performance with every release. However, only code compiled with processor-specific optimisations can exploit the full power of the processor. To ease this work, processor generations have been grouped into micro-architecture levels based on CPU features (see x86-64 psABI [1]), and compiler support has followed. Because of its priorities, Red Hat decided to optimise its code for micro-architecture level 2 (`-march=x86-64-v2`) in RHEL 9, and then for level 3 (`-march=x86-64-v3`) in RHEL 10 (to be released in 2025). The former choice made RHEL binaries incompatible with CPUs designed before 2010, while the latter extends the incompatibility to those designed before 2015. In other words, these computers will not even boot RHEL 9 or 10. For the server-class market segment, the impact should be minor because machines are typically replaced every 5 to 10 years. However, this does not necessarily hold for other classes of machines, such as FECs.
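
To make the notion of micro-architecture levels concrete, the following is a minimal Python sketch of how one might estimate the level a given machine supports. It assumes the feature lists of the x86-64 psABI [1], spelled with the flag names that Linux prints in /proc/cpuinfo (for example, SSE3 appears as "pni" and LZCNT as "abm"). The function names are hypothetical and are not part of any existing tool; treat the result as an indication only.

```python
# Feature flags required by each x86-64 micro-architecture level, written
# with the names used on the "flags" line of Linux /proc/cpuinfo.
# Assumption: "xsave" is used as a stand-in for OSXSAVE, which /proc/cpuinfo
# does not list under its own name.
LEVEL_FLAGS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def parse_cpuinfo_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def x86_64_level(flags):
    """Return the highest level (0 to 4) whose flags, and those of all lower levels, are present."""
    level = 0
    for number, required in LEVEL_FLAGS:
        if not required <= flags:
            break  # a missing feature caps the level here
        level = number
    return level


def can_run(flags, required_level):
    """Tell whether binaries built for required_level can run on a CPU with these flags."""
    return x86_64_level(flags) >= required_level
```

On a Linux host, `x86_64_level(parse_cpuinfo_flags(open("/proc/cpuinfo").read()))` would give the level of the local CPU. Under the choices described above, a machine would need `can_run(flags, 2)` to hold for RHEL 9 and `can_run(flags, 3)` for RHEL 10.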

Focusing back on FECs, optimising code for micro-architecture level 2 and level 3 makes 47% and 64% of our systems obsolete, respectively, which implies major investments in hardware upgrades. In BE-CEM and BE-CSS we conducted a risk analysis to better estimate the cost of adopting a RHEL-based distribution, and therefore accepting this new optimisation, and we concluded that the cost is prohibitive and the chance of success low. This is because, beyond replacing between 47% and 64% of our processors, there are further obstacles, some of them even more significant: Single Board Computer (SBC) or platform replacements, redesign of old modules, procurement, chip shortages and prices, re-cabling, redesign of rack space, validation, hiring and, not to be underestimated, the time needed for these changes, since LS3 is approaching. For this reason, we decided to provide a software solution to a software problem, and we are adopting Debian 12 as our next Linux distribution for Front-End Computers. Thanks to our strategic decision to invest in the portability of our custom layer, this change will not affect our internal developments.

The Next Front-End Computer Operating System

 

In addition to using a different Linux distribution from other systems at CERN, we are investing in a new development paradigm. A Linux system is made of three or four key components: a bootloader, a Linux kernel, optionally an initial ramdisk, and a root filesystem. Linux distributions ship these components together, and users consume them together. The advantage is that the user needs to do little work to use the distribution. The downside is the loss of control over these components, in particular the Linux kernel, which is in the hands of the distribution's developers. Because FECs are embedded systems with a lot of custom hardware developed in-house, a higher degree of control and knowledge at the kernel level is necessary to better support all clients with current and future use cases (e.g. SoC-based systems). For this reason, in the future FEC OS we have decomposed the system into four independent components: a GRUB 2.12 bootloader from https://www.gnu.org/, a Linux 5.10 stable kernel from https://www.kernel.org/, an initial ramdisk image built with dracut 57 from https://github.com/dracutdevs/dracut/, and a Debian 12 root filesystem from https://www.debian.org/. Because these components are independent, they can easily be replaced by different versions, for example for debugging, testing or evaluation purposes. Moreover, this paradigm gives users enough flexibility to self-support a dedicated solution for one component while relying on supported releases for the others (e.g. an SoC needing a dedicated Linux kernel version).
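
The independence of the four components can be pictured with a small Python sketch. It is only a model of the idea, not a description of our build tooling: the class names `Component` and `FecOsImage` are hypothetical, and the alternative kernel version used in the example is an arbitrary placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Component:
    """One independently versioned building block of the system."""
    name: str
    version: str
    source: str


@dataclass(frozen=True)
class FecOsImage:
    """Hypothetical model of a system assembled from four independent components."""
    bootloader: Component
    kernel: Component
    initramfs: Component
    rootfs: Component


# The reference combination named in the text.
REFERENCE = FecOsImage(
    bootloader=Component("GRUB", "2.12", "https://www.gnu.org/"),
    kernel=Component("Linux", "5.10", "https://www.kernel.org/"),
    initramfs=Component("dracut", "57", "https://github.com/dracutdevs/dracut/"),
    rootfs=Component("Debian", "12", "https://www.debian.org/"),
)

# Swap only the kernel, e.g. for an SoC that needs a dedicated version.
# The version "6.1" is a placeholder chosen for illustration.
soc_image = replace(
    REFERENCE,
    kernel=Component("Linux", "6.1", "https://www.kernel.org/"),
)
```

In this model, `soc_image` differs from `REFERENCE` only in its kernel; the bootloader, initial ramdisk and root filesystem remain the supported ones.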

For further validation, this summer we kicked off a testing campaign with several Equipment Groups, which we hope to conclude by the end of Q1 or early Q2 2024.

[1] https://gitlab.com/x86-psABIs/x86-64-ABI