Breaking Down the System Layer
Tony Garnock-Jones
October 2022
The system layer (Rice 2019; Corbet 2019) is an essential part of an operating system, mediating between user-facing programs and the kernel. Its importance lies in its role as the technical foundation for many qualities[1] relevant to system security, resilience, connectivity, maintainability and usability.
In the Linux world, existing system layer realizations cross-cut many, many projects: NetworkManager, GNOME, D-Bus, systemd, OpenRC, apt, apk, and so on. Each project has its own role in the overall system layer, and none takes a strong stance on the overall architecture that results from their combination. However, there is a group of basic concepts involved in a system layer that transcends individual subprojects, relating to issues of IPC, discovery, and whole-machine and application state management.
This document examines the architecture of system layers in general, touching on responsibilities currently handled at each of these levels, with the aim of bringing the concept of "system layer" into sharper focus.
What is a system layer?
The term "system layer" was coined[2] by Benno Rice in a 2019 talk. Here's an excerpt from the relevant portion of Rice's talk:[3]
... dynamic DHCP, IPv6 auto config, all these kinds of things are more dynamic. Time is more dynamic. Some aspects of device handling, you know, all of these things are a lot more dynamic now, and we need a way of strapping these things together so we can manage them that doesn't involve installing 15 different packages that all behave differently.
[15:08] And so what that ends up becoming, is what I term the system layer. Which is a bunch of stuff which might be running in user space or might be running in kernel space but is providing systemic level stuff as opposed to the stuff that you're writing or using directly. So this could include things like NetworkManager, and udev, and a whole bunch of things.
Systemd as a project ends up complementing the Linux kernel by providing all of this user space system layer.
(It's a really good talk.) The system layer idea seems to have been latent for a long time, and only recently to have been given a name.
Some examples include:
- The Mac OS frameworks above the kernel level
- The Android system with its APIs and SDKs
- Various combinations of package manager, init system, service manager, support daemons, and user interface (be it ever so minimal); for example, debian+systemd+udevd+GNOME, or alpine+OpenRC+eudev+SSH.
Both Android and Mac OS embody substantially complete visions of a system layer, while the visions are much more fragmented in the Linux world. Even in cases where systemd makes up a good fraction of a particular system layer, most systems augment it with a wide variety of other software.
What does a system layer do?
A system layer addresses myriad system-level problems that applications face that are out-of-scope for the operating system kernel.
It solves these problems so that application developers can rely on shared vocabulary, common interfaces, and communal development effort. The result is improved interoperability, compositionality, securability, etc., and reduced duplication of effort, less scope for design flaws, and so on.
The scope of the system layer changes with time as the needs of applications and users change and grow. The problems it addresses range from the highly abstract to the relatively concrete. For example, a system layer may:
- supply services in response to static or dynamic demand
- monitor and react to changes in system state
- give higher-level perspectives to users and applications on system state and resources
- offer access control mechanisms and enforce access control policies
- offer a coherent, system-wide approach to security and privacy
- offer inter-process communication media
- provide name-binding and name resolution services
- provide job-queueing and -scheduling services, including calendar-like and time-based scheduling
- provide user interface facilities
- provide system-wide "cut-and-paste" services for user-controlled IPC
- provide system configuration and user preference databases
- support software package installation, upgrade, and removal
- offer state (data, configuration) replication services
- provide data backup facilities
among other things. All of these areas are common across applications, unique to none of them.
To come up with this list, I surveyed[4] a number of existing open systems such as Linux distributions, desktop environments, and so on, plus (in a limited way) Android and Mac OS, looking for commonalities and differences. That is, the list was developed in a largely informal way. Despite this, I've found it a fruitful starting point for an investigation of the properties of system layers in general. I welcome additional perspectives that others might bring.
In the remainder of this document, I'll use each of the topics in the list above as a perspective from which to examine existing software. I'll then attempt a synthesis of the results of this analysis into a firmer idea of what form a system layer could and perhaps should take.
Service management and system reactivity
An extremely common, recurring pair of related themes in system layers of all sorts is service management and system reactivity. That is, the system layer takes on the tasks of starting and stopping services in response to static or dynamic demand, and of monitoring and reacting to changes in system state. While the kernel offers raw sense data plus a low-level vocabulary for managing the collection of running processes on a system, applications and users need a higher-level vocabulary for managing running software in terms of services and service relationships.
These tasks can be broken down into smaller, but still general, pieces:
- primitive ability to start and stop service instances
- declaration of singleton service instances, service classes, and instances of service classes
- declaration of relationships (including runtime dependencies) among services
- facility for managing service names and connecting service names to service instances
- user interface for examining the service namespace and the collection of running and runnable services
- facility for noticing and a medium for publishing and subscribing to changes in system state
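As a rough illustration of the "declaration of relationships" item above, runtime dependencies can be written down as a graph and turned into a start order. This is only a sketch: the service names are hypothetical, the start action is a placeholder callback, and it assumes Python 3.9 or later for the standard graphlib module. It does not describe how any particular init system works.

```python
from graphlib import TopologicalSorter

# Hypothetical service declarations: each service maps to the set of
# services it requires at runtime. The names are illustrative only.
SERVICES = {
    "udev": set(),
    "network": {"udev"},
    "dns-cache": {"network"},
    "sshd": {"network"},
}


def start_order(services):
    """Return a start order in which every dependency precedes its dependents.

    Raises graphlib.CycleError if the declarations contain a cycle.
    """
    return list(TopologicalSorter(services).static_order())


def start_all(services, start):
    """Call start(name) for each declared service, dependencies first."""
    for name in start_order(services):
        start(name)


if __name__ == "__main__":
    start_all(SERVICES, lambda name: print("starting", name))
```

A real service manager would also wait for readiness before starting dependents; here the ordering alone is shown.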
Concrete examples include:
- starting services in response to statically-configured runlevels (OpenRC, systemd, SysV init, etc.)
- starting dependencies before dependent services (OpenRC, systemd, SysV init, etc.), including readiness-detection and -signalling
- restarting terminated or failed services in a supervision hierarchy (daemontools, s6, etc.; Erlang/OTP)
- starting services by service name on demand (D-Bus, etc.)
- starting services by socket activation (systemd, etc.)
- virtual-machine and container lifecycles, including supervision and restart of containers (docker, docker-compose, etc.)
- reacting to hotplugging of a device by installing a driver or starting a program (udevd, etc.)
- reacting to system metrics (e.g. temperature, load average, memory pressure) by changing something
- reacting to network connectivity changes (NetworkManager, etc.)
- setup and naming of devices and network routes (udevd, NetworkManager, etc.)
Laurent Bercot has produced an excellent comparison table in a page describing a new service manager for Linux distributions.
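Of the concrete examples listed above, supervision is perhaps the easiest to sketch. The following is a minimal illustration, not a description of daemontools, s6 or Erlang/OTP: the function name supervise, the max_restarts parameter and the policy of restarting only on a non-zero exit status are all assumptions made for the example.

```python
import subprocess
import sys


def supervise(argv, max_restarts=3):
    """Run argv, restarting it after each non-zero exit.

    Gives up after max_restarts restarts. Returns the list of exit
    codes observed, in order.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.run(argv).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
    return exit_codes


if __name__ == "__main__":
    # A stand-in "service" that always fails immediately.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

Real supervisors typically add back-off delays, logging and a way to stop the service on request; those are left out to keep the restart loop visible.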
Higher-level perspectives on and control over system state and resources
An essential system layer task is to give users and applications higher-level perspectives on system state, resource availability and resource consumption than those offered by the kernel. This has two parts: refining low-level information about system state into higher-level knowledge, and reflecting user (or application) preferences expressed in terms of the higher-level perspective back into concrete actions to perform at the lower level.
As an example of the first, the kernel's NETLINK_ROUTE sockets allow processes to observe changes in network interface and routing configuration, but applications often do not need the fine detail on offer: instead, they need higher-level knowledge such as "a usable default route for IPv4 exists", or "IPv4 connectivity is available, but metered".
As an example of the second, NetworkManager allows users to set policy for wifi connection establishment in terms of a priority ordering over SSIDs and conditions for when and whether to use a particular network. NetworkManager's job is to translate this into a sequence of low-level wifi scans, associations and disconnections.
Breaking this task down into smaller pieces yields:
- access to low-level descriptions of system state, resource availability, and resource usage
- ability to either poll for or subscribe to changes in such state
- ability to compute relevant higher-level perspectives on the state
- a medium for communicating such changes to users and applications
- a medium for retrieving preferences and actions from users and applications
- ability to perform actions on low-level system resources
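To make the middle items of this breakdown concrete, here is a small sketch that refines a stream of low-level route events into the single higher-level fact "a default IPv4 route exists" and publishes changes in that fact to subscribers. RouteEvent and DefaultRouteMonitor are hypothetical names invented for the example, and the events are plain Python objects rather than real NETLINK_ROUTE messages.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class RouteEvent:
    """Hypothetical, simplified stand-in for a low-level routing event."""
    action: str        # "add" or "del"
    destination: str   # "0.0.0.0/0" denotes the IPv4 default route
    interface: str


@dataclass
class DefaultRouteMonitor:
    """Refines route events into the fact 'a default IPv4 route exists'."""
    default_interfaces: set = field(default_factory=set)
    subscribers: list = field(default_factory=list)

    def subscribe(self, callback: Callable[[bool], None]) -> None:
        self.subscribers.append(callback)

    @property
    def available(self) -> bool:
        return bool(self.default_interfaces)

    def handle(self, event: RouteEvent) -> None:
        if event.destination != "0.0.0.0/0":
            return  # not a default route: irrelevant to this perspective
        before = self.available
        if event.action == "add":
            self.default_interfaces.add(event.interface)
        elif event.action == "del":
            self.default_interfaces.discard(event.interface)
        if self.available != before:
            # Publish only when the high-level fact actually changes.
            for callback in self.subscribers:
                callback(self.available)


if __name__ == "__main__":
    monitor = DefaultRouteMonitor()
    monitor.subscribe(lambda up: print("default route available:", up))
    monitor.handle(RouteEvent("add", "192.168.1.0/24", "eth0"))
    monitor.handle(RouteEvent("add", "0.0.0.0/0", "eth0"))
    monitor.handle(RouteEvent("del", "0.0.0.0/0", "eth0"))
```

Subscribers see only the transitions of the high-level fact, not the individual route changes that caused them.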
Concrete examples include:
- computing default-route availability from NETLINK_ROUTE events over netlink sockets, as discussed
- use of NETLINK_KOBJECT_UEVENT by udev to configure and expose hotplugged devices to userland
- interrogation of disk devices and partition tables to provide views on and control over available filesystems (gnome-disks, etc.)
- interrogation of audio devices and audio routing options to provide high-level views and control over audio setup (pipewire, pulseaudio, etc.), e.g. volume level display and volume controls, mute, select input/output channel, play/pause, skip, rewind etc.
- high-level perspectives on devices such as displays, printers, mice, keyboards, touchpads, accelerometers, proximity sensors, temperature monitors and so on (GNOME, XFCE4, KDE, cups, etc.), communicated via D-Bus and friends
- system configuration databases (/etc, Windows' Registry, GNOME configuration databases)
- location services mapping from low-level GPS and wifi information to medium-level concrete location coordinates to high-level "you are at home", "you are in the office"-style knowledge about location
- telephony services exposing high-level call management interfaces backed by low-level modem operations
Slightly harder to see, but still certainly an example of the subject of this section, is the collection of userland tools commonly associated with Unix-like operating systems more generally. The file system, for example, is firmly a systems concern and not an application-level concern, so the system layer provides general tools for manipulating, examining, and repairing the file system. This includes not only tools such as fsck, df, and mount, but facilities such as automounting, mounting and fscking at boot, scanning and manipulating partition tables, configuring lvm, and even the humble ls, cp and friends.
On systems such as Mac OS, the Finder and Disk Utility programs and their associated underlying
system services are analogous parts of the system layer.
Access control mechanisms and policies, security, and privacy
An inescapable concern when composing software across trust domains is access control. System layers provide mechanisms for controlling access to software resources and data, allow users and applications to specify access control policies, and enforce those policies on their behalf.
Given the increasingly blurry lines between local and cloud-based personal computing, the scope of access controls can be broad, including confidentiality and integrity protections for user data and careful control over user privacy.
Multiple trust domains appear even in a single-user personal computing system: the kernel is its own trust domain; its daemon representatives within the system layer are at least one other; the user is a trust domain, and its system-layer representatives another; and each application is a trust domain, particularly when it is a third-party application acting on behalf of a user, perhaps bringing cloud services into the picture. Moving from a single- to a multiple-user system then adds only minor complexity.
Existing system layer realizations, at least within the Linux world, tend to address access control, security and particularly privacy at a relatively primitive level, relying on single-machine approaches to security and securability that do not scale well: for example, Unix ACLs and user- and group-ID-based permissions.
- Debian, Alpine, and other Unix-like Linux distributions offer few or no access controls other than those provided by the kernel
- Android uses the kernel user ID mechanism in a different way, giving an effective improvement in separation between trust domains when compared to traditional Unix approaches
- D-Bus authenticates each connection separately, usually mapping principal identities onto Unix user IDs; within the scope of a connection, it uses ACLs to make authorization decisions
- Some isolation among trust domains can be achieved with careful use of kernel namespaces; however, namespaces are not fine-grained and are awkward to use for privacy-protection purposes. They see use primarily for resource isolation in containerization systems.
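To show how coarse the user- and group-ID-based permission model mentioned above is, here is a simplified sketch of the classic Unix read check. The function name may_read and its parameters are hypothetical, and the sketch deliberately ignores the superuser, POSIX ACLs and capabilities.

```python
import stat


def may_read(mode, owner_uid, owner_gid, uid, gids):
    """Classic Unix read check: owner bits, else group bits, else other bits.

    Simplified: ignores the superuser, POSIX ACLs and capabilities.
    Exactly one permission class applies to a given caller.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    # A file with mode rw-r----- owned by uid 1000, gid 100.
    print(may_read(0o640, 1000, 100, uid=1000, gids={100}))
    print(may_read(0o640, 1000, 100, uid=1002, gids={200}))
```

The decision depends only on which of three identity classes the caller falls into, which is one reason such checks struggle to express per-application or per-purpose policies.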
Inter-process communication and networking
Networking is interprocess communication. —Robert Metcalfe, 1972, quoted in Day 2008
A key part of an operating system is the selection of communications media it offers its applications. The kernel itself offers a plethora of communication channels, from the file system itself through SysV IPC, shared memory, and pipes up to sockets in multiple flavours.
System layers need richer facilities in order to handle the reactivity, publish-subscribe, name-discovery and -management and access control needs previously discussed. In addition, the concept of an "address" within a system layer is often more complex than the low-level endpoint addresses on offer by the kernel: for example, D-Bus object names, email addresses and aliases, and Docker container names do not fit easily into kernel constructs, and this applies double for the addresses of fine-grained resources (e.g. single objects) within a process.
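The gap between kernel-level endpoints and system-layer addresses can be seen in a small sketch. Below, two ends of a socket pair have no name at all as far as the kernel is concerned, while the messages travelling over them are addressed to fine-grained, named objects inside the receiving side. The object names, the JSON message format and the helper functions are all assumptions made for illustration; they do not correspond to D-Bus, binder or any other real protocol.

```python
import json
import socket

# Hypothetical fine-grained objects living inside one "server" process.
OBJECTS = {
    "/audio/volume": lambda args: {"level": args.get("level", 50)},
    "/power/state": lambda args: {"on_battery": False},
}


def send_request(conn, obj, args):
    """Send one newline-terminated JSON request addressed to obj."""
    conn.sendall((json.dumps({"object": obj, "args": args}) + "\n").encode())


def serve_one(conn):
    """Read one request and dispatch it by object name, not by socket."""
    request = json.loads(conn.makefile("r").readline())
    handler = OBJECTS.get(request["object"])
    if handler is None:
        reply = {"error": "no such object"}
    else:
        reply = {"result": handler(request.get("args", {}))}
    conn.sendall((json.dumps(reply) + "\n").encode())


def read_reply(conn):
    """Read one newline-terminated JSON reply."""
    return json.loads(conn.makefile("r").readline())


if __name__ == "__main__":
    client, server = socket.socketpair()  # anonymous kernel-level endpoints
    send_request(client, "/audio/volume", {"level": 30})
    serve_one(server)
    print(read_reply(client))
```

Everything that makes "/audio/volume" an address lives above the kernel: the naming, the dispatch and the error for unknown names are all conventions of the layer built on top of the socket.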
- Traditional Unix-like system layers configure email for use by system services, primarily for system-to-user communication but also in principle for program-to-program communication.
- D-Bus is a coarse-grained, ACL-based message bus with an ad-hoc object model and publish-subscribe mechanism. It has been used as the foundation for a lot of system layer software such as the components in the GNOME desktop environment and the building-blocks of NetworkManager and similar services.
- X11 offers multiple methods by which clients can communicate with each other. Primary applications include shared clipboard management and window management, but the selection and property change notification mechanisms are general-purpose and could in principle form an interesting substrate for organising software components.
- Android IPC is (if I understand correctly!) primarily based around binder and layers a number of communication "personalities" on top of it (such as AIDL, Broadcasts, and Messengers). Binder is apparently (1, 2, 3) a (mostly) object-capability ("ocap") system, with fine-grained object passing, failure-signalling (a "link to death" facility, much like Erlang's links and monitors), and distributed garbage-collection[5] that is extremely widely used in Android.
From a 2009 email from Dianne Hackborn:
For a rough idea of the scope of the binder's use in Android, here is a list of the basic system services that are implemented on top of it: package manager, telephony manager, app widgets, audio services, search manager, location manager, notification manager, accessibility manager, connectivity manager, wifi manager, input method manager, clipboard, status bar, window manager, sensor service, alarm manager, content service, activity manager, power manager, surface compositor.
Name-binding, name-resolution, and namespaces
Many of the services offered by a system layer involve management and querying of mappings between high-level names and (zero or more) lower-level addresses (Day 2008). These appear in many different guises, from the directories in the file system, to DNS names (mDNS services like avahi; the libc resolver; services like dnsmasq), to device names (managed by udev), to object names (D-Bus), to service names, to preconfigured connection settings (NetworkManager), to user and group names and so on. Namespace management is a core feature of a system layer.
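The common shape behind all of these guises is a mapping from a name to zero or more addresses. A minimal sketch follows, with a hypothetical NameRegistry class and made-up names and addresses; real namespaces add hierarchy, access control and change notification on top.

```python
from collections import defaultdict


class NameRegistry:
    """Maps high-level names to zero or more lower-level addresses."""

    def __init__(self):
        self._bindings = defaultdict(set)

    def bind(self, name, address):
        self._bindings[name].add(address)

    def unbind(self, name, address):
        addresses = self._bindings.get(name)
        if addresses is not None:
            addresses.discard(address)
            if not addresses:
                del self._bindings[name]  # forget names with no addresses

    def resolve(self, name):
        """Return the addresses bound to name, sorted; empty if unbound."""
        return sorted(self._bindings.get(name, ()))


if __name__ == "__main__":
    registry = NameRegistry()
    registry.bind("printer", "192.0.2.10:631")
    registry.bind("printer", "/run/example-printer.sock")
    print(registry.resolve("printer"))
    print(registry.resolve("scanner"))
```

Resolving an unbound name yields an empty list rather than an error, matching the "zero or more" phrasing above.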
Job queueing and job scheduling
System layers frequently provide job-queueing and -scheduling services, including calendar-like and time-based scheduling. As a corollary, they also provide job- and schedule-management interfaces.
- Traditional Unix has cron and at for job scheduling.
- Android has system alarm services.
- systemd has timers as a replacement for cron.
- systemd also has a job engine (see also here and here) for decoupling work in space and time.
- print queues like lpd and cups are job management engines at heart
- you can even see the mail queue as a kind of job queue (and if you squint very hard, you can see all the intermediate buffers in a networking or IPC system as job queues; cf Day 2008).
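What these examples share is a queue of work ordered by when it becomes due. The sketch below is a hypothetical, in-memory JobQueue in which the caller supplies the current time, so that no real clock is involved; it is not modelled on cron, systemd timers or any other existing scheduler.

```python
import heapq
import itertools


class JobQueue:
    """Time-ordered job queue; the caller supplies the current time."""

    def __init__(self):
        self._heap = []
        # Tie-breaker: jobs with equal due times run in insertion order,
        # and the job callables themselves are never compared.
        self._counter = itertools.count()

    def schedule(self, due, job):
        """Queue a zero-argument callable to run once time reaches due."""
        heapq.heappush(self._heap, (due, next(self._counter), job))

    def run_due(self, now):
        """Run and remove every job with due <= now; return their results."""
        results = []
        while self._heap and self._heap[0][0] <= now:
            _, _, job = heapq.heappop(self._heap)
            results.append(job())
        return results

    def pending(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = JobQueue()
    queue.schedule(10, lambda: "rotate logs")
    queue.schedule(5, lambda: "send mail")
    print(queue.run_due(now=10))
    print(queue.pending())
```

The pending method stands in for the job- and schedule-management interface mentioned at the start of this section.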
User interface
The user interface is a classic example of a system facility that cross-cuts individual applications and tasks. A system layer must provide some kind of user interface service to applications (and to its own system services).
- At a minimum, Unix-like kernels offer ttys. Access to a system via ssh is a natural next step.
- X11 is the traditional Unix user interface, with its own IPC protocol and ad-hoc object model; wayland is a recent entrant into a similar space, also with its own IPC protocol and ad-hoc object model. Android offers SurfaceFlinger and WindowManager along with a large library of user interface widgets; the underlying IPC is presumably binder (see above).
- In Smalltalk-80-derived systems (like squeak), the user interface is tightly integrated with the multiprocessing and IPC facilities (such as they are). Squeak also offers simple, quick-and-dirty "alert" and "prompt" APIs to applications, similar to the alert/prompt/confirm functions included in web browsers.
- Many, but not all, system layers provide a system-wide "cut and paste" service as part of their user interface, for user-controlled IPC. X11 applications have a clipboard convention; Mac OS, Windows, Android etc. have a standard clipboard.
- System-level email can be seen as a form of user interface for reaching users (system administrators).
- Many desktop environments include notifications and some form of system tray giving quick reference to high-level perspectives on system status as previously discussed.
- Some system-layer administration tasks require user interface: for example, user input during apt package configuration.
Software management
System management involves upgrade of system code and installation, management and removal of application code. Android has a solid story around software management. Linux distributions tend to have package management tools (e.g. apt, apk, yum etc.). Stretching a little further, one might include the system programming language and its development environment as part of the software management portion of a system layer: for example, many Unix-like systems include cc, and Smalltalk systems make the system programming language (Smalltalk) available from any text input field.
State replication and data backup
The notion of state replication appears in many different contexts. For example, user contact/address databases must often be replicated and accessible across devices. System configuration data is often shared across servers in a cloud deployment (ansible, puppet). Many add-on applications like Dropbox, NextCloud, Syncthing etc. add file replication to a system. Applications like Google Keep, to-do list applications, and other sticky-notes/reminder apps replicate their databases across machines. Very few system layer realizations offer a coherent data replication facility, despite its clear cross-application utility.
Relatedly, preserving user data in case of calamity is a core operating system feature. Despite this, few whole systems offer a coherent data backup facility. Exceptions include Apple's Time Machine and Google's Android backup support libraries.
Synthesis, or, Toward a Complete Vision of a System Layer
Looking back at all these features and variations in design and implementation, we might imagine some kind of ideal system layer.
- It should be structured around a flexible, high-performance communications substrate with a coherent, system-wide security model, a story around data privacy, flexible name-to-address mapping, and reliable failure signalling
- It should offer a service description language and a mechanism for managing services, tracking service demand, and responding with appropriate service supply
- It should allow modular addition of components that enrich it with additional high-level perspectives on the system
- It should offer utility services such as job-queueing and -scheduling, including calendar-like and time-based scheduling
- It should offer a user interface
- It should provide data backup services
- It could provide data replication services
The most important of these is, in my view, the communications substrate, which dovetails inextricably with the state-management and -introspection subsystem. A good design for this part of a system will have compounded effects and will make it easy to integrate portions of a system layer together. (Witness the success of Android's binder, discussed above!)
References
[Bass et al 1998] Bass, Len, Paul Clements, and Rick Kazman. Software Architecture in Practice. Addison-Wesley, 1998.
[Clements et al 2001] Clements, Paul, Rick Kazman, and Mark Klein. Evaluating Software Architectures: Methods and Case Studies. Addison-Wesley, 2001.
[Corbet 2019] Corbet, Jonathan. “Systemd as Tragedy.” LWN.net, January 28, 2019. https://lwn.net/Articles/777595/.
[Day 2008] Day, John. Patterns in Network Architecture: A Return to Fundamentals. Prentice Hall, 2008.
[Rice 2019] Rice, Benno. “The Tragedy of Systemd.” Conference Presentation at linux.conf.au, Christchurch, New Zealand, January 24, 2019. https://www.youtube.com/watch?v=o_AIw9bGogo.
Notes
[1] Known in the literature as “-ilities”; see e.g. Bass et al 1998 or Clements et al 2001.
[2] I wrote to Benno Rice to ask him about the term. He replied that he doesn't know of any earlier use of "system layer" for this particular bundle of ideas. Quoted (with permission) from his email to me: "I’m not going to claim to be the first who thought of the idea but the name was something I came up with to describe the services that run in userspace but provide system-level services. I’m happy to own it if nobody else had the idea first. 🙃" It looks to me, then, like the term originated with him in 2019.
[3] I cut and pasted the automated YouTube transcript of the talk, and then cleaned it up. (Emphasis mine.)
[4] The raw notes that I took during my survey and during the Synit design process are available.
[5] Looking at binder, I see strong similarities with the Syndicated Actor Model and its protocol!