Synit is a Reactive Operating System

Welcome!

Synit is an experiment in applying pervasive reactivity and object capabilities to the System Layer of an operating system for personal computers, including laptops, desktops, and mobile phones. Its architecture follows the principles of the Syndicated Actor Model.

Synit builds upon the Linux kernel, but replaces many pieces of familiar Linux software, including systemd, NetworkManager, D-Bus, and so on. It makes use of many concepts that will be familiar to Linux users, but also incorporates many ideas drawn from programming languages and operating systems not closely connected with Linux's Unix heritage.

Quickstart

If you have a mobile phone or computer capable of running PostmarketOS, then you can install the software to try it out. You can also run Synit inside a virtual machine.

See the installation instructions for a list of supported devices.

Acknowledgements

Much initial work on Synit was made possible by a generous grant from the NLnet Foundation as part of the NGI Zero PET programme. Please see "Structuring the System Layer with Dataspaces (2021)" for details of the funded project.

This manual is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © 2021–2022 Tony Garnock-Jones tonyg@leastfixedpoint.com.

The Synit programs and source code are separately licensed. Please see the source code for details.

Architecture

The Syndicated Actor Model (SAM) is at the heart of Synit. In turn, the SAM builds upon E-style actors, replacing message-exchange with eventually-consistent state replication as the fundamental building block for interaction. Both E and the SAM are instances of the Object Capability (ocap) model, a compositional approach to system security.

The "feel" of the system is somewhere between Smalltalk-style object-orientation, publish-subscribe programming, E- or Erlang-style actor interaction, Prolog-style logic programming, and Esterel-style reactive dataflow.

  1. Programs are Actors. Synit programs ("actors" in the SAM) interoperate by dataspace-mediated exchange of messages and replication of conversational state expressed as assertions.

  2. Ocaps for security and privacy. The ocap model provides the fundamental building blocks for secure composition of programs in the system. Synit extends the core ocap model with Macaroon-inspired attenuation of capabilities, for both limiting visibility of state and constraining access to behaviour.

  3. Reactivity and homeostasis. Programs publish relevant aspects of their internal state to peers (usually by placing assertions in a dataspace). Peers subscribe to those assertions, reacting to changes in state to preserve overall system equilibrium.

  4. Heterogeneous; "open". Different programming languages and styles interoperate freely. Programs may or may not be structured internally using SAM principles: the system as a whole is where the architectural principles are applied. However, it often makes good sense to use SAM principles within a given Synit program as well as between programs.

  5. Language-neutral. Where possible, programs interoperate via a simple protocol across transports like TCP/IP, WebSockets, and Unix sockets and pipes. Otherwise, they interoperate using traditional Unix techniques. The concrete syntax for the messages and assertions exchanged among programs is the Preserves data language.

  6. Strongly typed. Preserves Schemas describe the data exchanged among programs. Schemas compile to type definitions in various programming languages, helping give an ergonomic development experience as well as ensuring safety at runtime.

Source code, Building, and Installation

The initial application of Synit is to mobile phones.

As such, in addition to regular system layer concepts, Synit supports concepts from mobile telephony: calls, SMSes, mobile data, headsets, speakerphone, hotspots, battery levels and charging status, and so on.

Synit builds upon many existing technologies, but primarily relies on the following:

  • PostmarketOS. Synit builds on PostmarketOS, replacing only a few core packages. All of PostmarketOS and Alpine Linux are available underneath Synit.

  • Preserves. The Preserves data language and its associated schema and query languages are central to Synit.

  • Syndicate. Syndicate is an umbrella project for tools and specifications related to the Syndicated Actor Model (the SAM).

You will need

  • A Linux development system. (I use Debian testing/unstable.)

  • Rust nightly and Cargo (perhaps installed via rustup).

  • Python 3.9 or greater

  • git, ssh, rsync

  • Make, a C compiler, and so on; standard Unix programming tools.

  • For cross builds (e.g. the very common case of building for aarch64 on an x86_64 host), qemu and its binfmt support. On Debian, apt install binfmt-support qemu-user-static. (NB. Version 1:7.0+dfsg-7 of qemu-user-static has a bug (possibly this one) which makes Docker-based cross builds hang. Downgrading qemu-user-static to version 1:5.2+dfsg-11+deb11u2 worked for me.)

  • Source code for Synit components (see below).

  • A standard PostmarketOS distribution for the target computer or mobile phone. If you don't want to install on actual hardware, you can use a virtual machine. See the instructions for installing PostmarketOS.

  • Great tolerance for the possibility of soft-bricking your phone. This is experimental software! When it breaks, you'll often have to (at least) reinstall PostmarketOS from absolute scratch on the machine. I do lots of development using qemu-amd64 for this reason.

Get the code

The Synit codebase itself is contained in the synit git repository:

git clone https://git.syndicate-lang.org/synit/synit

See the README for an overview of the contents of the repository.

Synit depends on published packages for Preserves and Syndicate support in each of the many programming languages it uses. These will be automatically found and downloaded during the Synit build process, but you can find details on the Preserves and Syndicate homepages, respectively.

For the Smalltalk-based phone-management and UI part of the system, you will need a number of other tools. See the README for the squeak-phone repository:

git clone https://git.syndicate-lang.org/tonyg/squeak-phone

Build the packages

To build, type make ARCH=<architecture> in the root of your checkout, where <architecture> is one of:

  • aarch64 (default), for e.g. Pinephone or Samsung Galaxy S7 deployment
  • x86_64, for e.g. qemu-amd64 deployment

If you see errors of the form "exec /bin/sh: exec format error" while building, say, the aarch64 packages using an x86_64 build host, you need to install qemu's binfmt support. See above.

The result of the build will be a collection of Alpine Linux apk packages in packaging/target/packages/<architecture>/. At the time of writing, these include:

  • preserves-schemas, common schema files for working with general Preserves data and schemas
  • preserves-tools, standard command-line tools for working with Preserves documents (pretty-printer, document query processor, etc.)
  • py3-preserves, python support libraries for Preserves
  • py3-syndicate, python support for the Syndicated Actor Model
  • squeak-cog-vm and squeak-stack-vm, Squeak Smalltalk virtual machines for the Smalltalk-based portions of the system
  • syndicate-schemas, common schema files for working with the Syndicated Actor Model
  • syndicate-server, package for the core system bus
  • synit-config, main package for Synit, with configuration files, init scripts, system daemons and so on.
  • synit-pid1, PID1 program for Synit that starts the core system bus and then becomes passive

Install PostmarketOS on your system

Follow the instructions for your device on the PostmarketOS wiki.

Boot and connect your device to your development machine. Make sure you can ssh into it.

Upload Synit packages

Use scripts/upload-bundle.sh to rsync to the phone the ingredients needed to transform a stock PostmarketOS installation into Synit.

Run the transformation script

Use ssh to log into your phone. Run ./transmogrify.sh. (If your user's password on the phone is anything other than user, you will have to run SUDOPASS=yourpassword ./transmogrify.sh.)

This will install the Synit packages. After this step is complete, next time you boot the system, it will boot into Synit. It may very well be unbootable at this point, depending on the state of the codebase! Make sure you know how to restore to a stock PostmarketOS installation.

Install the Smalltalk parts of the system (optional)

If you want to experiment with the Smalltalk-based modem support and UI, follow the instructions in the squeak-phone README now.

Reboot and hope

With luck, you'll see the Smalltalk user interface start up. (If you didn't install the UI, you should still be able to ssh into the system.) From here, you can operate the system normally, following the information in the next chapter.

Glossary

Action

In the Syndicated Actor Model, an action may be performed by an actor during a turn. Actions are quasi-transactional, taking effect only if their containing turn is committed.

Four core varieties of action, each with a matching variety of event, are offered across all realisations of the SAM:

  • An assertion action publishes an assertion at a target entity. A unique handle names the assertion action so that it may later be retracted. For more detail, see below on Assertions.

  • A retraction action withdraws a previously-published assertion from the target entity.

  • A message action sends a message to a target entity.

  • A synchronization action carries a local entity reference to a target entity. When it eventually reaches the target, the target will (by default) immediately reply with a simple acknowledgement to the entity reference carried in the request. For more detail, see below on Synchronization.

Besides the four core actions, many individual implementations offer action variants such as the following:

  • A spawn action will, when the containing turn commits, create a new actor running alongside the acting party. In many implementations, spawned actors may optionally be linked to the spawning actor.

  • Replacement of a previously-established assertion, "altering" the target entity reference and/or payload. This proceeds, conventionally, by establishment of the new assertion followed immediately by retraction of the old.

Finally, implementations may offer pseudo-actions whose effects are local to the acting party.
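
The four core action/event varieties can be pictured as a minimal entity interface. The following Python sketch is purely illustrative; the class and method names are hypothetical and do not reflect the API of any actual SAM implementation:

```python
class Entity:
    """Hypothetical sketch of an entity's four core SAM event handlers."""
    def on_assert(self, handle, value): pass    # assertion event
    def on_retract(self, handle): pass          # retraction event
    def on_message(self, body): pass            # message event
    def on_sync(self, peer):                    # synchronization event
        peer.on_message("ack")                  # default: acknowledge immediately

class Logger(Entity):
    """Records the events it receives, keyed by handle for later correlation."""
    def __init__(self):
        self.log = []
    def on_assert(self, handle, value):
        self.log.append(("assert", handle, value))
    def on_retract(self, handle):
        self.log.append(("retract", handle))

target = Logger()
target.on_assert(1, ("Present", "alice"))
target.on_retract(1)    # handle 1 correlates the retraction with the assertion
```

Note how the handle, rather than the assertion payload, is what links a retraction back to the original assertion.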

Active Facet

The facet associated with the event currently being processed in an active turn.

Actor

In the Syndicated Actor Model, an actor is an isolated thread of execution. An actor repeatedly takes events from its mailbox, handling each in a turn. In many implementations of the SAM, each actor is internally structured as a tree of facets.

Alarm

See timeout.

Assertion

  • verb. To assert (or to publish) a value is to choose a target entity and perform an action conveying an assertion to that entity.

  • noun. An assertion is a value carried as the payload of an assertion action, denoting a relevant portion of a public aspect of the conversational state of the sending party that it has chosen to convey to the recipient entity.

    The value carried in an assertion may, in some implementations, depend on one or more dataflow variables; in those implementations, when the contents of such a variable changes, the assertion is automatically withdrawn, recomputed, and re-published (with a fresh handle).

Attenuation

To attenuate a capability (yielding an attenuated capability), a sequence of filters is prepended to the possibly-empty list of filters attached to an existing capability. Each filter either discards, rewrites, or accepts unchanged any payload directed at the underlying capability. A special pattern language exists in the Syndicate network protocol for describing filters; many implementations also allow in-memory capabilities to be filtered by the same language.
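
The mechanics of attenuation can be sketched in a few lines of Python. Here `attenuate` and the predicate-style filters are illustrative stand-ins for the protocol's actual pattern language:

```python
def attenuate(deliver, *filters):
    """Prepend filters to a capability's delivery path (toy sketch).
    Each filter returns a payload (possibly rewritten), or None to discard it."""
    def guarded(payload):
        for f in filters:
            payload = f(payload)
            if payload is None:
                return            # payload discarded by this filter
        deliver(payload)          # payload accepted (perhaps rewritten)
    return guarded

received = []
full_cap = received.append        # the underlying, unattenuated capability

def temperatures_only(p):
    """Admit only ("Temperature", reading) payloads, rounding the reading."""
    if isinstance(p, tuple) and p and p[0] == "Temperature":
        return (p[0], round(p[1]))
    return None

weak_cap = attenuate(full_cap, temperatures_only)
weak_cap(("Temperature", 21.6))     # rewritten, then delivered
weak_cap(("Location", 52.1, 4.3))   # discarded
```

Because attenuation prepends filters, an already-attenuated capability can be attenuated further, and the new filters run before the old ones.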

Capability

(a.k.a. Cap) Used roughly interchangeably with "reference", connoting a security-, access-control-, or privacy-relevant aspect.

Cell

See dataflow variable.

Compositional

To quote the Stanford Encyclopedia of Philosophy, the "principle of compositionality" can be understood to be that

The meaning of a complex expression is determined by its structure and the meanings of its constituents.

People often implicitly intend "... and nothing else." For example, when I claim that the object-capability model is a compositional approach to system security, I mean that the access conveyed by an assemblage of capabilities can be understood in terms of the access conveyed by each individual capability taken in isolation, and nothing else.

Configuration Scripting Language

Main article: The Configuration Scripting Language

The syndicate-server program includes a scripting language, used for configuration of the server and its clients, population of initial dataspaces for the system that the syndicate-server instance is part of, and scripting of simple behaviours in reaction to appearance of assertions or transmission of messages.

The scripting language is documented here.

Conversational State

The collection of facts and knowledge held by a component participating in an ongoing conversation about some task that the component is undertaking:

The conversational state that accumulates as part of a collaboration among components can be thought of as a collection of facts. First, there are those facts that define the frame of a conversation. These are exactly the facts that identify the task at hand; we label them “framing knowledge”, and taken together, they are the “conversational frame” for the conversation whose purpose is completion of a particular shared task. Just as tasks can be broken down into more finely-focused subtasks, so can conversations be broken down into sub-conversations. In these cases, part of the conversational state of an overarching interaction will describe a frame for each sub-conversation, within which corresponding sub-conversational state exists. The knowledge framing a conversation acts as a bridge between it and its wider context, defining its “purpose” in the sense of the [Gricean Cooperative Principle]. [The following figure] schematically depicts these relationships.

Figure 2 from Garnock-Jones 2017

Some facts define conversational frames, but every shared fact is contextualized within some conversational frame. Within a frame, then, some facts will pertain directly to the task at hand. These, we label “domain knowledge”. Generally, such facts describe global aspects of the common problem that remain valid as we shift our perspective from participant to participant. Other facts describe the knowledge or beliefs of particular components. These, we label “epistemic knowledge”.

Excerpt from Chapter 2 of (Garnock-Jones 2017). The quoted section continues here.

In the Syndicated Actor Model, there is often a one-to-one correspondence between a facet and a conversational frame, with fate-sharing employed to connect the lifetime of the one with the lifetime of the other.

Dataflow

A programming model in which changes in stored state automatically cause re-evaluation of computations depending on that state. The results of such re-evaluations are themselves often used to update a store, potentially triggering further re-computation.

In the Syndicated Actor Model, dataflow appears in two guises: first, at a coarse granularity, among actors and entities, in the form of changes in published assertions; and second, at a fine granularity, within individual actors, where many implementations provide dataflow variables and dataflow blocks for managing conversational state and related computation.

Dataflow Block

Implementations of the Syndicated Actor Model often include some language feature or library operation for marking a portion of code as participating in dataflow, where changes in observed dataflow variables cause re-evaluation of the code block.

For example, in a Smalltalk implementation of the SAM,

a := Turn active cell: 1.
b := Turn active cell: 2.
sum := Turn active cell: 0.
Turn active dataflow: [sum value: a value + b value].

Later, as a and b have their values updated, sum will automatically be updated by re-evaluation of the block given to the dataflow: method.

Analogous code can be written in TypeScript:

field a: number = 1;
field b: number = 2;
field sum: number = 0;
dataflow {
    sum.value = a.value + b.value;
}

in Racket:

(define-field a 1)
(define-field b 2)
(define/dataflow sum (+ (a) (b)))

in Python:

a = turn.field(1)
b = turn.field(2)
sum = turn.field(0)
@turn.dataflow
def maintain_sum():
    sum.value = a.value + b.value

and in Rust:

turn.dataflow(|turn| {
    let a_value = turn.get(&a);
    let b_value = turn.get(&b);
    turn.set(&sum, a_value + b_value);
})

Dataflow Variable

(a.k.a. Field, Cell) A dataflow variable is a store for a single value, used with dataflow blocks in dataflow programming.

When the value of a dataflow variable is read, the active dataflow block is marked as depending on the variable; and when the value of the variable is updated, the variable is marked as damaged, leading eventually to re-evaluation of dataflow blocks depending on that variable.
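
The read-records-a-dependency, write-marks-damage cycle can be illustrated with a toy implementation. All names here are hypothetical; real SAM implementations integrate this machinery with turns:

```python
class Graph:
    """Toy dataflow graph: tracks the active block and damaged dependents."""
    def __init__(self):
        self.active = None      # block currently being (re-)evaluated
        self.damaged = set()    # blocks awaiting re-evaluation
    def dataflow(self, block):
        previous, self.active = self.active, block
        try:
            block()             # reads performed here register dependencies
        finally:
            self.active = previous
    def repair(self):
        while self.damaged:
            self.dataflow(self.damaged.pop())

class Field:
    """Toy dataflow variable (a.k.a. cell): records readers, damages them on write."""
    def __init__(self, graph, value):
        self.graph, self._value, self.readers = graph, value, set()
    @property
    def value(self):
        if self.graph.active is not None:
            self.readers.add(self.graph.active)   # record the dependency
        return self._value
    @value.setter
    def value(self, v):
        self._value = v
        self.graph.damaged |= self.readers        # mark dependents as damaged
        self.graph.repair()

g = Graph()
a, b, total = Field(g, 1), Field(g, 2), Field(g, 0)
g.dataflow(lambda: setattr(total, "value", a.value + b.value))
a.value = 10    # damages the block, which recomputes total
```

A production implementation would defer repair until the end of the current turn rather than re-evaluating eagerly on every write, but the dependency-tracking idea is the same.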

Dataspace

In the Syndicated Actor Model, a dataspace is a particular class of entity with prescribed behaviour. Its role is to route and replicate published assertions according to the declared interests of its peers.

See here for a full explanation of dataspaces.

Dataspace Pattern

In the Syndicated Actor Model, a dataspace pattern is a structured value describing a pattern over other values.

TODO: link to documentation

E

The E programming language is an object-capability model Actor language that has strongly influenced the Syndicated Actor Model.

Many good sources exist describing the language and its associated philosophy, including:

  • The ERights.org website, the home of E

  • E (programming language) on Wikipedia

  • Miller, Mark S. “Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control.” PhD, Johns Hopkins University, 2006. [PDF]

  • Miller, Mark S., E. Dean Tribble, and Jonathan Shapiro. “Concurrency Among Strangers.” In Proc. Int. Symp. on Trustworthy Global Computing, 195–229. Edinburgh, Scotland, 2005. [DOI] [PDF]

Embedded References

In the Syndicated Actor Model, the values carried by assertions and messages may include references to entities. Because the SAM uses Preserves as its data language, the Preserves concept of an embedded value is used in the SAM to reliably mark portions of a datum referring to SAM entities.

Concretely, in Preserves text syntax, embedded values appear prepended with #!. In messages transferred across links using the Syndicate network protocol, references might appear as #![0 123], #![1 555], and so on.

Entity

In the Syndicated Actor Model, an entity is a stateful programming-language construct, located within an actor, that is the target of events. Each entity has its own behaviour, specifying in code how it responds to incoming events.

An entity is the SAM analogue of "object" in E-style languages: an addressable construct logically contained within and fate-sharing with an actor. The concept of "entity" differs from "object" in that entities are able to respond to assertions, not just messages.

In many implementations of the SAM, entities fate-share with individual facets within their containing actor rather than with the actor as a whole: when the facet associated with an entity is stopped, the entity becomes unresponsive.

Erlang

Erlang is a process-style Actor language that has strongly influenced the Syndicated Actor Model. In particular, Erlang's approach to failure-handling, involving supervisors arranged in supervision trees and processes (actors) connected via links and monitors, has been influential on the SAM. In the SAM, links and monitors become special cases of assertions, and Erlang's approach to process supervision is used directly and is an important aspect of SAM system organisation.

Event

In the Syndicated Actor Model, an event is processed by an entity during a turn, and describes the outcome of an action taken by some other actor.

Events come in four varieties corresponding to the four core actions in the SAM:

  • An assertion event notifies the recipient entity of an assertion published by some peer. A unique handle names the event so that later retraction of the assertion can be correlated with the assertion event.

  • A retraction event notifies the recipient entity of withdrawal of a previously-published assertion.

  • A message event notifies the recipient entity of a message sent by some peer.

  • A synchronization event, usually not handled explicitly by an entity, carries an entity reference. The recipient should arrange for an acknowledgement to be delivered to the referenced entity once previously-received events that might modify the recipient's state (or the state of a remote entity that it is proxy for) have been completely processed. For more detail, see below on Synchronization.

Facet

In many implementations of the Syndicated Actor Model, a facet is a programming-language construct representing a conversation and corresponding to a conversational frame. Facets are similar to the "nested threads" of Martin Sústrik's idea of Structured Concurrency (see also Wikipedia).

Every actor is structured as a tree of facets. (Compare and contrast with the diagram in the entry for Conversational State.)

Every facet is either "running" or "stopped". Each facet is the logical owner of zero or more entities as well as of zero or more published assertions. A facet's entities and published assertions share its fate. While a facet is running, its associated entities are responsive to incoming events; when it stops, its entities become permanently unresponsive. A stopped facet never starts running again. When a facet is stopped, all its assertions are retracted and all its subfacets are also stopped.

Facets may have stop handlers associated with them: when a facet is stopped, its stop handlers are executed, one at a time. The stop handlers of each facet are executed before the stop handlers of its parent and before its assertions are withdrawn.

Facets may be explicitly stopped by a stop action, or implicitly stopped when an actor crashes. When an actor crashes, its stop handlers are not run: stop handlers are for orderly processing of conversation termination. Instead, many implementations allow actors to have associated crash handlers which run only in case of an actor crash. In the limit, of course, even crash handlers cannot be guaranteed to run, because the underlying hardware or operating system may suffer some kind of catastrophic failure.
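
The fate-sharing rules above can be made concrete with a toy facet tree. The names are hypothetical, and the sketch fixes one consistent ordering (subfacets fully stopped before the parent's handlers run, and each facet's handlers run before its assertions are withdrawn):

```python
class Facet:
    """Toy facet: owns subfacets, stop handlers, and published assertions."""
    def __init__(self, trace, parent=None):
        self.trace = trace
        self.running = True
        self.children, self.stop_handlers, self.assertions = [], [], []
        if parent is not None:
            parent.children.append(self)
    def stop(self):
        if not self.running:
            return                    # a stopped facet never starts again
        self.running = False
        for child in self.children:
            child.stop()              # subfacet handlers run before ours
        for handler in self.stop_handlers:
            handler()                 # handlers run before retraction
        for a in self.assertions:
            self.trace.append(("retract", a))

trace = []
root = Facet(trace)
child = Facet(trace, parent=root)
root.assertions.append("root-presence")
child.assertions.append("child-presence")
root.stop_handlers.append(lambda: trace.append("root handler"))
child.stop_handlers.append(lambda: trace.append("child handler"))
root.stop()
```

Crash-stop would follow the same retraction path but skip the calls to the stop handlers.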

Fate-sharing

A design principle from large-scale network engineering, due to David Clark:

The fate-sharing model suggests that it is acceptable to lose the state information associated with an entity if, at the same time, the entity itself is lost.

David D. Clark, “The Design Philosophy of the DARPA Internet Protocols.” ACM SIGCOMM Computer Communication Review 18, no. 4 (August 1988): 106–14. [DOI]

In the Syndicated Actor Model, fate-sharing is used in connecting the lifetime of conversational state with the programming language representation of a conversational frame, a facet.

Field

See dataflow variable.

Handle

In the Syndicated Actor Model, every assertion action (and the corresponding event) includes a scope-lifetime-unique handle that denotes the specific action/event concerned, for purposes of later correlation with a retraction action.

Handles are, in many cases, implemented as unsigned integers, allocated using a simple scope-wide counter.
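
The counter strategy is simple enough to show directly. This is a sketch of one common approach, not the allocator of any particular implementation:

```python
import itertools

class Scope:
    """Allocate assertion handles from a scope-wide counter (toy sketch)."""
    def __init__(self):
        self._handles = itertools.count(1)
    def next_handle(self):
        return next(self._handles)   # unique for the lifetime of the scope

scope = Scope()
h1, h2 = scope.next_handle(), scope.next_handle()
```

Uniqueness only needs to hold within the scope and lifetime of the session, so a plain incrementing integer suffices.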

Initial OID

In the Syndicate network protocol, the initial OID is a special OID value understood by prior arrangement to denote an entity (specified by the "initial ref") owned by some remote peer across some network medium. The initial OID of a session is used to bootstrap activity within that session.

Initial Ref

In the Syndicate network protocol, the initial ref is a special entity reference associated by prior arrangement with an initial OID in a session in order to bootstrap session activity.

Linked Actor

Many implementations of the Syndicated Actor Model offer a feature whereby an actor can be spawned so that its root facet is linked to the spawning facet in the spawning actor, so that when one terminates, so does the other (by default).

(Figure: two actors, each structured as a tree of facets. Actor 1 has root facet 1, sub-facets 1.2 and 1.3, and sub-sub-facets 1.3.4 and 1.3.5; Actor 6 has root facet 6 and sub-facets 6.7 and 6.8. A pair of "presence" assertions connects the two actors.)

Links are implemented as a pair of "presence" assertions, atomically established at the time of the spawn action, each indicating to a special entity with "stop on retraction" behaviour the presence of its peer. When one of these assertions is withdrawn, the targeted entity stops its associated facet, automatically terminating any subfacets and executing any stop handlers.

This allows a "parent" actor to react to termination of its child, perhaps releasing associated resources, and the corresponding "child" actor to be automatically terminated when the facet in its parent that spawned the actor terminates.

This idea is inspired by Erlang, whose "links" are symmetric, bidirectional, failure-propagating connections among Erlang processes (actors) and whose "monitors" are unidirectional connections similar to the individual "presence" assertions described above.

Linked Task

Many implementations of the Syndicated Actor Model offer the ability to associate a facet with zero or more native threads, coroutines, objects, or other language-specific representations of asynchronous activities. When such a facet stops (either by explicit stop action or by crash-termination of the facet's actor), its linked tasks are also terminated. By default, the converse is also the case: a terminating linked task will trigger termination of its associated facet. This allows for resource management patterns similar to those enabled by the related idea of linked actors.

Macaroon

A macaroon is an access token for authorization of actions in distributed systems. Macaroons were introduced in the paper:

“Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud.”, by Arnar Birgisson, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. In Proc. Network and Distributed System Security Symposium (NDSS), 2014. [PDF]

In the Syndicated Actor Model, a variation of the macaroon concept is used to represent "sturdyrefs". A sturdyref is a long-lived token authorizing interaction with some entity, which can be upgraded to a live entity reference by presenting it to a gatekeeper entity (TODO: link) across a session of the Syndicate network protocol. (The term "sturdyref" is lifted directly from the E language and associated ecosystem.)

Mailbox

Every actor notionally has a mailbox which receives events resulting from its peers' actions. Each actor spends its existence waiting for an incoming event to appear in its mailbox, removing the event, taking a turn to process it, and repeating the cycle.
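
The mailbox cycle amounts to a simple event loop. The sketch below is illustrative only (a real actor would block waiting for events and process each turn transactionally):

```python
from collections import deque

class Actor:
    """Toy actor: take an event from the mailbox, handle it in a turn, repeat."""
    def __init__(self, behaviour):
        self.mailbox = deque()
        self.behaviour = behaviour
        self.state = []
    def deliver(self, event):
        self.mailbox.append(event)       # peers' actions become queued events
    def run(self):
        while self.mailbox:              # a real actor blocks here instead
            event = self.mailbox.popleft()
            self.behaviour(self.state, event)   # one turn per event

a = Actor(lambda state, event: state.append(event))
a.deliver("hello")
a.deliver("world")
a.run()
```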

Membrane

A membrane is a structure used in implementations of the Syndicate network protocol to keep track of wire symbols.

Message

In the Syndicated Actor Model, a message is a value carried as the payload or body of a message action (and associated event), conveying transient information from some sending actor to a recipient entity.

Network

A network is a group of peers (actors), plus a medium of communication (a transport), an addressing model (references), and an associated scope.

Object-Capability Model

The Object-capability model is a compositional means of expressing access control in a distributed system. It has its roots in operating systems research stretching back decades, but was pioneered in a programming language setting by the E language and the Scheme dialect W7.

In the Syndicated Actor Model, object-capabilities manifest as potentially-attenuated entity references.

Observe

In the Syndicated Actor Model, assertion of an Observe record at a dataspace declares an interest in receiving notifications about matching assertions and messages as they are asserted, retracted and sent through the dataspace.

Each Observe record contains a dataspace pattern describing a structural predicate over assertion and message payloads, plus an entity reference to the entity which should be informed as matching events appear at the dataspace.
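
The routing behaviour can be sketched with a toy dataspace. Plain Python predicates stand in for real dataspace patterns, and the method names are hypothetical:

```python
class Dataspace:
    """Toy dataspace: routes assertions to observers with matching patterns."""
    def __init__(self):
        self.assertions = {}   # handle -> asserted value
        self.observers = {}    # handle -> (pattern predicate, observer callback)
    def publish(self, handle, value):
        self.assertions[handle] = value
        for pattern, observer in self.observers.values():
            if pattern(value):
                observer("assert", value)
    def observe(self, handle, pattern, observer):
        self.observers[handle] = (pattern, observer)
        for value in self.assertions.values():
            if pattern(value):
                observer("assert", value)   # existing matches are replayed
    def retract(self, handle):
        value = self.assertions.pop(handle)
        for pattern, observer in self.observers.values():
            if pattern(value):
                observer("retract", value)

ds = Dataspace()
seen = []
ds.publish(1, ("Present", "alice"))
ds.observe(2, lambda v: v[0] == "Present", lambda op, v: seen.append((op, v)))
ds.publish(3, ("Present", "bob"))
ds.retract(1)
```

Note that subscribing after the fact still yields the already-established matching assertions; this replay of existing state is a key difference between dataspaces and plain publish-subscribe message buses.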

OID

An OID is an "object identifier", a small, session-unique integer acting as an entity reference across a transport link in an instance of the Syndicate network protocol.

Preserves

Main article: Preserves

Many implementations of the SAM use Preserves, a programming-language-independent language for data, as the language defining the possible values that may be exchanged among entities in assertions and messages.

See the chapter on Preserves in this manual for more information.

Publishing

To publish something is to assert it; see assertion.

Record

The Preserves data language defines the notion of a record, a tuple containing a label and zero or more numbered fields. The dataspace pattern language used by dataspaces allows for patterns over records as well as over other compound data structures.

Reference

(a.k.a. Ref, Entity Reference, Capability) A reference is a pointer or handle denoting a live, stateful entity running within an actor. The entity accepts Preserves-format messages and/or assertions. The capability may be attenuated to restrict the messages and assertions that may be delivered to the denoted entity by way of this particular reference.

Retraction

In the Syndicated Actor Model, a retraction is an action (and corresponding event) which withdraws a previous assertion. Retractions can be explicitly performed within a turn, or implicitly performed during facet shutdown or actor termination (both normal termination and crash stop).

The SAM guarantees that an actor's assertions will be retracted when it terminates, no matter whether an orderly shutdown or an exceptional or crashing situation was the cause.

Relay

A relay connects scopes, allowing references to denote entities resident in remote networks, making use of the Syndicate network protocol to do so.

See the Syndicate network protocol for more on relays.

Relay Entity

A relay entity is a local proxy for an entity at the other side of a relay link. It forwards events delivered to it across its transport to its counterpart at the other end.

See the Syndicate network protocol for more on relay entities.

S6

S6, "Skarnet's Small Supervision Suite", is

a small suite of programs for UNIX, designed to allow process supervision (a.k.a. service supervision), in the line of daemontools and runit, as well as various operations on processes and daemons.

The S6 website

Synit uses s6-log to capture standard error output from the root system bus.

Schema

A schema defines a mapping between values and host-language types in various programming languages. The mapping describes how to parse values into host-language data, as well as how to unparse host-language data, generating equivalent values. Another way of thinking about a schema is as a specification of the allowable shapes for data to be used in a particular context.

Synit, and many programs making use of the Syndicated Actor Model, uses Preserves' schema language to define schemas for many different applications.

For more, see the section on schemas in the chapter on Preserves.

Scope

A scope maps refs to the entities they denote. Scopes exist in one-to-one relationship to networks. Because message bodies and asserted values contain embedded references, each message and assertion transmitted via some network is also inseparable from its scope.

Most actors will participate in a single scope. However, relay actors participate in two or more scopes, translating refs back and forth as messages and assertions traverse the relay.

Examples.

  1. A process is a scope for in-memory values: in-memory refs contain direct pointers to entities, which cannot be interpreted outside the context of the process's address space. The "network" associated with the process's scope is the intra-process graph of object references.

  2. A TCP/IP socket (or serial link, or WebSocket, or Unix socket, etc.) is a scope for values travelling between two connected processes: refs on the wire denote entities owned by one or the other of the two participants. The "network" for a socket's scope is exactly the two connected peers (NB. and is not the underlying TCP/IP network, HTTP network, or Unix kernel that supports the point-to-point link).

  3. An ethernet segment is a scope for values broadcast among stations: the embedded refs are (MAC address, OID) pairs. The network is the set of participating peers.

  4. A running web page is a scope for the JavaScript objects it contains: both local and remote entities are represented by JavaScript objects. The "network" is the JavaScript heap.

Subscription

See observation.

Supervision tree

A supervision tree is a concept borrowed from Erlang, where a root supervisor supervises other supervisors, which in turn supervise worker actors engaged in some task. As workers fail, their supervisors restart them; if the failures are too severe or too frequent, their direct supervisors fail in turn, and the supervisors' supervisors take action to recover from the failures.

Supervisor

A supervisor is an actor or facet whose role is to monitor the state of some service, taking action to ensure its availability to other portions of a complete system. When the service fails, the supervisor is able to restart it. If the failures are too severe or too frequent, the supervisor can take an alternative action, perhaps pausing for some time before retrying the service, or perhaps even terminating itself to give its own supervisor in a supervision tree a chance to get things back on track.

Synit uses supervisors extensively to monitor system daemons and other system services.

Sync Peer Entity

The sync peer entity is the entity reference carried in a synchronization action or event.

Synchronization

An actor may synchronize with an entity by scheduling a synchronization action targeted at that entity. The action will carry a local entity reference acting as a continuation. When the target entity eventually responds, it will transmit an acknowledgement to the continuation entity reference carried in the request.

An entity receiving a synchronization event should arrange for an acknowledgement to be delivered to the referenced continuation entity once previously-received events that might modify the recipient's state (or the state of a remote entity that it is proxy for) have been completely processed.

Most entities do not explicitly include code for responding to synchronization requests. The default code, which simply replies to the continuation immediately, usually suffices. However, sometimes the default is not appropriate. For example, when a relay entity is proxying for some remote entity via a relay across a transport, it should react to synchronization events by forwarding them to the remote entity. When the remote entity receives the forwarded request, it will reply to its local proxy for the continuation entity, which will in turn forward the reply back across the transport.

Syndicate Protocol

Main article: The Syndicate Protocol

The Syndicate Protocol (a.k.a. the Syndicate Network Protocol) allows relays to proxy entities from remote scopes into the local scope.

For more, see the protocol specification document.

Syndicated Actor Model

Main article: The Syndicated Actor Model

The Syndicated Actor Model (often abbreviated SAM) is the model of concurrency and communication underpinning Synit. The SAM offers a “conversational” metaphor: programs meet and converse in virtual locations, building common understanding of the state of their shared tasks.

In the SAM, source entities running within an actor publish assertions and send messages to target entities, possibly in other actors. The essential idea of the SAM is that state replication is more useful than message-passing; message-passing protocols often end up simulating state replication.

A thorough introduction to the Syndicated Actor Model is available.

System Layer

The system layer is an essential part of an operating system, mediating between user-facing programs and the kernel. It provides the technical foundation for many qualities relevant to system security, resilience, connectivity, maintainability and usability.

The concept of a system layer has only recently been recognised—the term itself was coined by Benno Rice in a 2019 conference presentation—although many of the ideas it entails have a long history.

The hypothesis that the Synit system explores is that the Syndicated Actor Model provides a suitable theoretical and practical foundation for a system layer. The system layer demands, and the SAM supplies, well-integrated expression of features such as service naming, presence, discovery and activation; security mechanism and policy; subsystem isolation; and robust handling of partial failure.

System Dataspace

The system dataspace in Synit is the primary dataspace entity, owned by an actor running within the root system bus, and (selectively) made available to daemons, system services, and user programs.

Timeout

Many implementations of the Syndicated Actor Model offer actions for establishing timeouts, i.e. one-off or repeating alarms. Timeouts are frequently implemented as linked tasks.

Transport

A transport is the underlying medium connecting one relay to its counterpart(s) in an instance of the Syndicate network protocol. For example, a TLS-on-TCP/IP socket may connect a pair of relays to one another, or a UDP multicast socket may connect an entire group of relays across an ethernet.

Turn

Each time an event arrives at an actor's mailbox, the actor takes a turn. A turn is the process of handling the triggering event, from the moment of its withdrawal from the mailbox to the moment of the completion of its interpretation.

Relatedly, the programming-language representation of a turn is a convenient place to attach the APIs necessary for working with the Syndicated Actor Model. In many implementations, some class named Turn or similar exposes methods corresponding to the actions available in the SAM.

In the SAM, a turn comprises

  • the event that triggered the turn,
  • the entity addressed by the event,
  • the facet owning the targeted entity, and
  • the collection of pending actions produced during execution.

If a turn proceeds to completion without an exception or other crash, its pending actions are committed (finalised and/or delivered to their target entities). If, on the other hand, the turn is aborted for some reason, its pending actions are rolled back (discarded), the actor is terminated, its assertions retracted, and all its resources released.

Value

A Preserves Value with embedded data. The embedded data are often embedded references but, in some implementations, may be other kinds of data. Every message body and every assertion payload is a value.

Wire Symbol

A wire symbol is a structure used in implementations of the Syndicate network protocol to maintain a connection between an in-memory entity reference and the equivalent name for the entity as used in packets sent across the network.

System overview

Synit uses the Linux kernel as a hardware abstraction and virtualisation layer.

All processes in the system are arranged into a supervision tree, conceptually rooted at the system bus.

(Process-tree diagram: the Root System Bus (syndicate-server) supervises init, getty, udevd, the Network Interface Monitor Daemon, the Wifi Daemon (wlan0), the X Server, per-session buses (syndicate-server), and user programs such as the browser, console, and email client.)

While init is PID 1, and thus the root of the tree of processes according to the kernel, it is not the root of the supervision tree. The init process, acting as management daemon for the kernel from Synit's perspective, is "supervised" by the system bus like all other services. The supervision tree is a Synit concept, not a Linux concept.

Boot process

The kernel first loads the stock PostmarketOS initrd, which performs a number of important tasks and then delegates to /sbin/init.

/sbin/init = synit-init.sh

The synit-config package overrides the usual contents of /sbin/init, replacing it with a short shell script, synit-init.sh. This script, in turn, takes care of a few boring tasks such as mounting /dev, /proc, /run, etc., ensuring that a few important directories exist, and remounting / as read-write before execing /sbin/synit-pid1.

For the remainder of the lifetime of the system, /sbin/synit-pid1 is the PID 1 init process.

/sbin/synit-pid1

The synit-pid1 program starts by spawning the system bus (syndicate-server in the process tree above) and the program /sbin/synit-log, connecting stderr of the former to stdin of the latter.

It then goes on to perform two tasks concurrently: the first is the Unix init role, reaping zombie processes, and the second is to interact with the system bus as an ordinary system service.

The latter allows the system to treat init just like any other part of the system, accessing its abilities to reboot or power off the system using messages and assertions in the system dataspace as usual.

Even though synit-pid1 is, to the kernel, a parent process of syndicate-server, it is logically a child of the system bus.

/sbin/synit-log

This short shell script invokes the S6 program s6-log to capture log output from the system bus, directing it to files in /var/log/synit/.

The System Bus: syndicate-server

The syndicate-server program has a number of closely-related functions. In many ways, it is a reification of the system layer concept itself.

It provides:

  1. A root system bus service for use by other programs. In this way, it is analogous to D-Bus.

  2. A configuration language suitable for programming dataspaces with simple reactive behaviours.

  3. A general-purpose service dependency tracking facility.

  4. A gatekeeper service, for exposing capabilities to running objects as (potentially long-lived) macaroon-style "sturdy references", plus TCP/IP- and Unix-socket-based transports for accessing capabilities through the gatekeeper.

  5. An inotify-based configuration tracker which loads and executes configuration files written in the scripting language.

  6. Process startup and supervision services for running external programs.

The program can also be used as an "inferior" bus. For example, there may be a per-user bus, or a per-session bus, or both. Each bus would appropriately scope the lifetime of its supervised processes.

Finally, it can be used completely standalone, outside a Synit context.

The root system bus

The synit-pid1 program invokes syndicate-server like this:

/usr/bin/syndicate-server --inferior --config /etc/syndicate/boot

The first flag, --inferior, tells the server to expect to be able to communicate on its stdin/stdout using the standard wire protocol. This lets synit-pid1 join the community of actors running within the system dataspace.

The second flag, --config /etc/syndicate/boot, tells the server to start monitoring the directory tree rooted at /etc/syndicate/boot for changes. Files whose names end with .pr within that tree are loaded as configuration script files.

Almost all of Synit is a consequence of careful use of the configuration script files in /etc/syndicate.

Configuration scripting language

The syndicate-server program includes a mechanism that was originally intended for populating a dataspace with assertions, for use in configuring the server, but which has since grown into a small Syndicated Actor Model scripting language in its own right. This seems to be the destiny of "configuration formats"—why fight it?—but the current language is inelegant and artificially limited in many ways. I have an as-yet-unimplemented sketch of a more refined design to replace it. Please forgive the ad-hoc nature of the actually-implemented language described below, and be warned that this is an unstable area of the Synit design.

See near the end of this document for a few illustrative examples.

Evaluation model

The language consists of sequences of instructions. For example, one of the most important instructions simply publishes (asserts) a value at a given entity (which will often be a dataspace).

The language evaluation context includes an environment mapping variable names to Preserves Values.

Variable references are lexically scoped.

Each source file is interpreted in a top-level environment. The top-level environment is supplied by the context invoking the script, and is generally non-empty. It frequently includes a binding for the variable config, which happens to be the default target variable name.

Source file syntax

Program = Instruction ...

A configuration source file is a file whose name ends in .pr that contains zero or more Preserves text-syntax values, which are together interpreted as a sequence of Instructions.

Comments. Preserves comments are ignored. One unfortunate wart is that because Preserves comments are really annotations, they are required by the Preserves data model to be attached to some other value. Syntactically, this manifests as the need for some non-comment following every comment. In scripts written to date, often an empty SequencingInstruction serves to anchor comments at the end of a file:

; A comment
; Another comment
; The following empty sequence is needed to give the comments
; something to attach to
[]

Patterns, variable references, and variable bindings

Symbols are treated specially throughout the language. Perl-style sigils control the interpretation of any given symbol:

  • $var is a variable reference. The variable var will be looked up in the environment, and the corresponding value substituted.

  • ?var is a variable binder, used in pattern-matching. The value being matched at that position will be captured into the environment under the name var.

  • _ is a discard or wildcard, used in pattern-matching. The value being matched at that position will be accepted (and otherwise ignored), and pattern matching will continue.

  • =sym denotes the literal symbol sym. It is used wherever syntactic ambiguity could prevent use of a bare literal symbol. For example, =?foo denotes the literal symbol ?foo, where ?foo on its own would denote a variable binder for the variable named foo.

  • all other symbols are bare literal symbols, denoting just themselves.

The special variable . (referenced using $.) denotes "the current environment, as a dictionary".
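For instance, a script could embed a snapshot of its entire environment in a log message (a hypothetical sketch; the log record shape follows the convention used elsewhere in this chapter, and assumes $log is bound):

; Hypothetical: publish the whole current environment as a dictionary
$log ! <log "-" { line: "debug", env: $. }>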

The active target

During loading and compilation (!) of a source file, the compiler maintains a compile-time register called the active target (often simply the "target"), containing the name of a variable that will be used at runtime to select an entity reference to act upon. At the beginning of compilation, it is set to the name config, so that whatever is bound to config in the initial environment at runtime is used as the default target for targeted Instructions.

This is one of the awkward parts of the current language design.

Instructions

Instruction =
    SequencingInstruction |
    RetargetInstruction |
    AssertionInstruction |
    SendInstruction |
    ReactionInstruction |
    LetInstruction |
    ConditionalInstruction

Sequencing

SequencingInstruction = [Instruction...]

A sequence of instructions is written as a Preserves sequence. The carried instructions are compiled and executed in order. NB: to publish a sequence of values, use the += form of AssertionInstruction.
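A hypothetical sketch illustrating the distinction (the record shapes here are invented for the example):

; A sequence of two instructions, executed in order;
; each asserts a record at the active target:
[ <ready my-service> <milestone my-service> ]
; By contrast, this publishes the three-element sequence itself:
+= [1 2 3]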

Setting the active target

RetargetInstruction = $var

The target is set with a variable reference standing alone. After compiling such an instruction, the active target register will contain the variable name var. NB: to publish the contents of a variable, use the += form of AssertionInstruction.

Publishing an assertion

AssertionInstruction =
    += ValueExpr |
    AttenuationExpr |
    <ValueExpr ValueExpr...> |
    {ValueExpr:ValueExpr ...}

The most general form of AssertionInstruction is "+= ValueExpr". When executed, the result of evaluating ValueExpr will be published (asserted) at the entity denoted by the active target register.

As a convenient shorthand, the compiler also interprets every Preserves record or dictionary in Instruction position as denoting a ValueExpr to be used to produce a value to be asserted.
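So, assuming a hypothetical record shape <greeting ...>, the following two instructions are equivalent, each asserting the record at the active target:

; Explicit form:
+= <greeting "hello">
; Shorthand: a record in Instruction position is asserted directly
<greeting "hello">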

Sending a message

SendInstruction = ! ValueExpr

When executed, the result of evaluating ValueExpr will be sent as a message to the entity denoted by the active target register.

Reacting to events

ReactionInstruction =
    DuringInstruction |
    OnMessageInstruction |
    OnStopInstruction

These instructions establish event handlers of one kind or another.

Subscribing to assertions and messages

DuringInstruction = ? PatternExpr Instruction
OnMessageInstruction = ?? PatternExpr Instruction

These instructions publish assertions of the form <Observe pat #!ref> at the entity denoted by the active target register, where pat is the dataspace pattern resulting from evaluation of PatternExpr, and ref is a fresh entity whose behaviour is to execute Instruction in response to assertions (resp. messages) carrying captured values from the binding-patterns in pat.

When the active target denotes a dataspace entity, the Observe record establishes a subscription to matching assertions and messages.

Each time a matching assertion arrives at a ref, a new facet is created, and Instruction is executed in the new facet. If the instruction creating the facet is a DuringInstruction, then the facet is automatically terminated when the triggering assertion is retracted. If the instruction is an OnMessageInstruction, the facet is not automatically terminated.1

Programs can react to facet termination using OnStopInstructions, and can trigger early facet termination themselves using the facet form of ConvenienceExpr (see below).
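A hypothetical sketch contrasting the two forms (the record shapes <present ...>, <greeting ...>, <ping ...> and <pong ...> are invented for the example):

; During: while <present ?who> is asserted, assert a matching greeting;
; the greeting is retracted automatically when the facet terminates.
? <present ?who> <greeting $who>
; On message: each <ping ?n> message triggers a <pong ...> message
; sent to the active target.
?? <ping ?n> ! <pong $n>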

Reacting to facet termination

OnStopInstruction = ?- Instruction

This instruction installs a "stop handler" on the facet active during its execution. When the facet terminates, Instruction is run.
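For example (hypothetical, assuming $log is bound), a stop handler installed in the facet created by a DuringInstruction runs when the triggering assertion is retracted:

? <service-state ?x ready> [
  ?- $log ! <log "-" { line: "a service is no longer ready" }>
]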

Destructuring-bind and convenience expressions

LetInstruction = let PatternExpr=ConvenienceExpr

ConvenienceExpr =
    dataspace |
    timestamp |
    facet |
stringify ConvenienceExpr |
    ValueExpr

Values can be destructured and new variables introduced into the environment with let, which is a "destructuring bind" or "pattern-match definition" statement. When executed, the result of evaluating ConvenienceExpr is matched against the result of evaluating PatternExpr. If the match fails, the actor crashes. If the match succeeds, the resulting binding variables (if any) are introduced into the environment.

The right-hand-side of a let, after the equals sign, is either a normal ValueExpr or one of the following special "convenience" expressions:

  • dataspace: Evaluates to a fresh, empty dataspace entity.

  • timestamp: Evaluates to a string containing an RFC-3339-formatted timestamp.

  • facet: Evaluates to a fresh entity representing the current facet. Sending the message stop to the entity (using e.g. the SendInstruction "! stop") triggers termination of its associated facet. The entity does not respond to any other assertion or message.

  • stringify: Evaluates its argument, then renders it as a Preserves value using Preserves text syntax, and yields the resulting string.
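A short hypothetical sketch combining these forms (the <booted-at ...> record shape is invented for the example):

; Bind the current RFC-3339 timestamp, a fresh dataspace,
; and an entity representing the current facet
let ?now = timestamp
let ?scratch = dataspace
let ?self = facet
; Publish a record mentioning the timestamp
<booted-at $now>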

Conditional execution

ConditionalInstruction = $var=~PatternExpr Instruction Instruction ...

When executed, the value in variable var is matched against the result of evaluating PatternExpr.

  • If the match succeeds, the resulting bound variables are placed in the environment and execution continues with the first Instruction. The subsequent Instructions are not executed in this case.

  • If the match fails, then the first Instruction is skipped, and the subsequent Instructions are executed.

Value Expressions

ValueExpr =
    #t | #f | float | double | int | string | bytes |
    $var | =symbol | bare-symbol |
    AttenuationExpr |
    <ValueExpr ValueExpr...> |
    [ValueExpr...] |
    #{ValueExpr...} |
    {ValueExpr:ValueExpr ...}

Value expressions are recursively evaluated and yield a Preserves Value. Syntactically, they consist of literal non-symbol atoms, compound data structures (records, sequences, sets and dictionaries), plus special syntax for attenuated entity references, variable references, and literal symbols:

  • AttenuationExpr, described below, evaluates to an entity reference with an attached attenuation.

  • $var evaluates to the binding for var in the environment, if there is one, or crashes the actor, if there is not.

  • =symbol and bare-symbol (i.e. any symbols except a binding, a reference, or a discard) denote literal symbols.

Attenuation Expressions

AttenuationExpr = <* $var [Rewrite ...]>

Rewrite =
    <filter PatternExpr> |
    <rewrite PatternExpr TemplateExpr>

An attenuation expression looks up var in the environment, asserts that it is an entity reference orig, and returns a new entity reference ref, like orig but attenuated with zero or more Rewrites. The result of evaluation is ref, the new attenuated entity reference.

When an assertion is published or a message body arrives at ref, the sequence of Rewrites is executed left-to-right. If a Rewrite succeeds, the value it produces is forwarded on to orig. If all Rewrites fail, the assertion or message is silently ignored.

A rewrite Rewrite matches values with PatternExpr. If the match fails, the next Rewrite is tried; if it succeeds, the resulting bindings are used along with the current environment to evaluate TemplateExpr, and the resulting value is forwarded on to orig.

A filter Rewrite is the same as <rewrite <?v PatternExpr> $v>, for some fresh v.

Supplying zero Rewrites will cause the new entity to reject all assertions and messages sent to it.
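For instance (hypothetical, assuming $log and $config are bound), a filter-only attenuation that lets only log records through, and a zero-Rewrite attenuation that discards everything:

; Allow only <log ...> records through to $log; drop everything else
let ?logOnly = <* $log [ <filter <log _ _>> ]>
; Zero Rewrites: every assertion and message is silently ignored
let ?blackHole = <* $config []>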

Pattern Expressions

PatternExpr =
    #t | #f | float | double | int | string | bytes |
    $var | ?var | _ | =symbol | bare-symbol |
    AttenuationExpr |
    <?var PatternExpr> |
    <PatternExpr PatternExpr...> |
    [PatternExpr...] |
    {literal:PatternExpr ...}

Pattern expressions are recursively evaluated to yield a dataspace pattern. Evaluation of a PatternExpr is like evaluation of a ValueExpr, except that binders and wildcards are allowed, set syntax is not allowed, and dictionary keys are constrained to being literal values rather than PatternExprs.

Two kinds of binder are supplied. The more general is <?var PatternExpr>, which evaluates to a pattern that succeeds, capturing the matched value in a variable named var, only if PatternExpr succeeds. For the special case of <?var _>, the shorthand form ?var is supported.

The pattern _ (discard, wildcard) always succeeds, matching any value.

Template Expressions

TemplateExpr =
    #t | #f | float | double | int | string | bytes |
    $var | =symbol | bare-symbol |
    AttenuationExpr |
    <TemplateExpr TemplateExpr...> |
    [TemplateExpr...] |
    {literal:TemplateExpr ...}

Template expressions are used in attenuation expressions as part of value-rewriting instructions. Evaluation of a TemplateExpr is like evaluation of a ValueExpr, except that set syntax is not allowed and dictionary keys are constrained to being literal values rather than TemplateExprs.

Additionally, record template labels (just after a "<") must be "literal-enough". If any sub-part of the label TemplateExpr refers to a variable's value, the variable must have been bound in the environment surrounding the AttenuationExpr that the TemplateExpr is part of, and must not be any of the capture variables from the PatternExpr corresponding to the template. This is a constraint stemming from the definition of the syntax used for expressing capability attenuation in the underlying Syndicated Actor Model. (TODO: link to sturdy.prs documentation)

Examples

Example 1. The simplest example uses no variables, publishing constant assertions to the implicit default target, $config:

<require-service <daemon console-getty>>
<daemon console-getty "getty 0 /dev/console">

Example 2. A more complex example subscribes to two kinds of service-state assertion at the dataspace named by the default target, $config, and in response to their existence asserts a rewritten variation on them:

? <service-state ?x ready> <service-state $x up>
? <service-state ?x complete> <service-state $x up>

In prose, it reads as "during any assertion at $config of a service-state record with state ready for any service name x, assert (also at $config) that x's service-state is up in addition to ready," and similar for state complete.

Example 3. The following example first attenuates $config, binding the resulting capability to $sys. Any require-service record published to $sys is rewritten into a require-core-service record; other assertions are forwarded unchanged.

let ?sys = <* $config [
  <rewrite <require-service ?s> <require-core-service $s>>
  <filter _>
]>

Then, $sys is used to build the initial environment for a configuration tracker, which executes script files in the /etc/syndicate/core directory using the environment given.

<require-service <config-watcher "/etc/syndicate/core" {
  config: $sys
  gatekeeper: $gatekeeper
  log: $log
}>>

Example 4. The final example executes a script in response to an exec record being sent as a message to $config. The use of ?? indicates a message-event-handler, rather than ?, which would indicate an assertion-event-handler.

?? <exec ?argv ?restartPolicy> [
  let ?id = timestamp
  let ?facet = facet
  let ?d = <temporary-exec $id $argv>
  <run-service <daemon $d>>
  <daemon $d {
    argv: $argv,
    readyOnStart: #f,
    restart: $restartPolicy,
  }>
  ? <service-state <daemon $d> complete> [$facet ! stop]
  ? <service-state <daemon $d> failed>   [$facet ! stop]
]

First, the current timestamp is bound to $id, and a fresh entity representing the facet established in response to the exec message is created and bound to $facet. The variable $d is then initialized to a value uniquely identifying this particular exec request. Next, run-service and daemon assertions are placed in $config. These assertions communicate with the built-in program execution and supervision service, causing a Unix subprocess to be created to execute the command in $argv. Finally, the script responds to service-state assertions from the execution service by terminating the facet by sending its representative entity, $facet, a stop message.

Programming idioms

Conventional top-level variable bindings. Besides config, many scripts are executed in a context where gatekeeper names a server-wide gatekeeper entity, and log names an entity that logs messages of a certain shape that are delivered to it.

Setting the active target register. The following pairs of Instructions first set and then use the active target register:

$log ! <log "-" { line: "Hello, world!" }>
$config ? <configure-interface ?ifname <dhcp>> [
  <require-service <daemon <udhcpc $ifname>>>
]
$config ? <service-object <daemon interface-monitor> ?cap> [
  $cap {
    machine: $machine
  }
]

In the last one, $cap is captured from service-object records at $config and is then used as a target for publication of a dictionary (containing key machine).

Using conditionals. The syntax of ConditionalInstruction is such that it can be easily chained:

$val =~ pat1 [ ... if pat1 matches ...]
$val =~ pat2 [ ... if pat2 matches ...]
... if neither pat1 nor pat2 matches ...

Using dataspaces as ad-hoc entities. Constructing a dataspace, attaching subscriptions to it, and then passing it to somewhere else is a useful trick for creating scripted entities able to respond to a few different kinds of assertion or message:

let ?ds = dataspace ; create the dataspace

$config += <my-entity $ds> ; send it to peers for them to use

$ds [ ; select $ds as the active target for `DuringInstruction`s inside the [...]
  ? pat1 [ ... ] ; respond to assertions of the form `pat1`
  ? pat2 [ ... ] ; respond to assertions of the form `pat2`
  ?? pat3 [ ... ] ; respond to messages of the form `pat3`
  ?? pat4 [ ... ] ; respond to messages of the form `pat4`
]

Notes

1

This isn't quite true. If, after execution of Instruction, the new facet is "inert"—roughly speaking, has published no assertions and has no subfacets—then it is terminated. However, since inert facets are unreachable and cannot interact with anything or affect the future of a program in any way, this is operationally indistinguishable from being left in existence, and so serves only to release memory for later reuse.

Services and service dependencies

Assertions in the main $config dataspace are the means Synit uses to declare services and service dependencies.

Services are started "gracefully", taking their dependencies into consideration, using require-service assertions; upon appearance of require-service, and after dependencies are satisfied, a run-service assertion is automatically made. Services can also be "force-started" using run-service assertions directly. Once all run-service assertions for a service have been withdrawn, the service shuts itself down.

Example: Docker daemon

As a concrete example, take the file /etc/syndicate/services/docker.pr, which both defines and invokes a service for running the Docker daemon:

<require-service <daemon docker>>
<depends-on <daemon docker> <service-state <milestone network> up>>
<daemon docker "/usr/bin/dockerd --experimental 2>/var/log/docker.log">

This is an example of the scripting language in action, albeit a simple one without use of variables or any reactive constructs.

  • The require-service assertion instructs syndicate-server to solve the dependencies for the service named <daemon docker> and to start the service running.

  • The depends-on assertion specifies that the Docker daemon requires the network milestone (configured primarily in network.pr) to have been reached.

  • The daemon assertion is interpreted by the built-in external service class, and specifies how to configure and run the service once its dependencies are ready.

Details

A few different kinds of assertions, all declared in the service.prs schema, form the heart of the system.

Assert that a service and its dependencies should be started

RequireService = <require-service @serviceName any>.

Asserts that a service should begin (and stay) running after waiting for its dependencies and considering reverse-dependencies, blocks, and so on.

Assert that a service should start right now

RunService = <run-service @serviceName any>.

Asserts that a service should begin (and stay) running RIGHT NOW, without considering its dependencies.

The built-in handler for require-service assertions will assert run-service automatically once all dependencies have been satisfied.

Declare a dependency among services

ServiceDependency = <depends-on @depender any @dependee ServiceState>.

Asserts that, when depender is require-serviced, it should not be started until dependee has been asserted, and also that dependee's serviceName should be require-serviced.

Convey the current state of a service

ServiceState = <service-state @serviceName any @state State>.
State = =started / =ready / =failed / =complete / @userDefined any .

Asserts one or more current states of service serviceName. The overall state of the service is the union of asserted states.

A few built-in states are defined:

  • started - the service has begun its startup routine, and may or may not be ready to take requests from other parties.

  • started + ready - the service has started and is also ready to take requests from other parties. Note that the ready state is special in that it is asserted in addition to started.

  • failed - the service has failed.

  • complete - the service has completed execution.

In addition, any user-defined value is acceptable as a State.

Make an entity representing a service instance available

ServiceObject = <service-object @serviceName any @object any>.

A running service publishes zero or more of these. The details of the object vary by service.

Request a service restart

RestartService = <restart-service @serviceName any>.

This is a message, not an assertion. It should be sent in order to request a service restart.

Built-in services and service classes

The syndicate-server program includes built-in knowledge about a handful of useful services, including a means of loading external programs and integrating them into the running system.

  • Every server program starts a gatekeeper service, which is able to manage conversion between live references and so-called "sturdy refs", long-lived capabilities for access to resources managed by the server.

  • A simple logging actor copies log messages from the system dataspace to the server's standard error file descriptor.

  • Any number of TCP/IP, WebSocket, and Unix socket transports may be configured to allow external access to the gatekeeper and its registered services. (These can also be started from the syndicate-server command-line with -p and -s options.)

  • Any number of configuration watchers may be created to monitor directories for changes to files written using the server scripting language. (These can also be started from the syndicate-server command-line with -c options.)

  • Finally, external programs can be started, either as long-lived "daemon" services or as one-off scripts.

Resources available at startup

The syndicate-server program uses the Rust tracing crate, which means different levels of internal logging verbosity are available via the RUST_LOG environment variable. See the tracing crate's documentation for more on RUST_LOG.

If tracing of Syndicated Actor Model actions is enabled with the -t flag, it is configured prior to the start of the main server actor.

As the main actor starts up, it

  • creates a fresh dataspace, known as the $config dataspace, intended to contain top-level/global configuration for the server instance;

  • creates a fresh dataspace, known as $log, for assertions and messages related to service logging within the server instance;

  • creates the $gatekeeper actor implementing the gatekeeper service, attaching it to the $config dataspace;

  • exposes $config, $log and $gatekeeper as the variables available to configuration scripts loaded by config-watchers started with the -c flag (N.B. the $config dataspace is thus the default target for assertions in config files);

  • creates service factories monitoring various service assertions in the $config dataspace;

  • processes -p command-line options, each of which creates a TCP/IP relay listener;

  • processes -s command-line options, each of which creates a Unix socket relay listener;

  • processes -c command-line options, each of which creates a config-watcher monitoring a file-system directory; and finally

  • creates the logging actor, listening to certain events on the $log dataspace.

Once these tasks have been completed, it quiesces, leaving the rest of the operation of the system up to other actors (relay-listeners, configuration scripts, and other configured services).

Gatekeeper

When syndicate-server starts, it creates a gatekeeper service entity, which accepts resolve assertions requesting conversion of a long-lived "sturdyref" to a live reference. The gatekeeper is the default object, available as OID 0 to peers at the other end of relay listener connections.

Gatekeeper protocol

Resolve = <resolve @sturdyref sturdy.SturdyRef @observer #!#!any>.

When a request to resolve a given sturdyref appears, the gatekeeper entity queries a dataspace (by default, the server's top-level $config dataspace) for bind assertions:

Bind = <bind @oid any @key bytes @target #!any>.

Each bind assertion matching the requested sturdyref is checked against the credentials provided in the sturdyref, and if the checks pass, the target entity from the bind is asserted to the observer in the resolve.

Sturdyrefs

A "sturdyref" is a long-lived certificate including a cryptographic signature that can be upgraded by a gatekeeper entity to a live reference to the entity named in the sturdyref. The current sturdyref implementation is based on the design of Macaroons.

The following definitions are taken from the sturdy.prs schema.

SturdyRef = <ref @oid any @caveatChain [Attenuation ...] @sig bytes>.

Within a ref record, the oid field is a free-form value that the targeted service chooses to name itself. The sig is an iterated keyed-HMAC construction, just as in macaroons. First, the service's secret key is used to key an HMAC of the oid. Then, the result is used to key an HMAC of the first Attenuation in caveatChain. Each Attenuation's HMAC becomes the key for the next in the caveatChain. The final result is equal to the sig field in a valid sturdyref.
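The signature chain can be sketched in Python. This is a hedged illustration of the construction only: the real implementation operates on the canonical Preserves encodings of the oid and of each Attenuation, and the specific hash function is whatever the sturdyref implementation specifies, not necessarily SHA-256 as assumed here.

```python
import hashlib
import hmac

def chain_sig(secret_key: bytes, oid: bytes, caveat_chain: list) -> bytes:
    # The service's secret key keys an HMAC of the oid...
    sig = hmac.new(secret_key, oid, hashlib.sha256).digest()
    # ...then each step's result keys the HMAC of the next Attenuation.
    for attenuation in caveat_chain:
        sig = hmac.new(sig, attenuation, hashlib.sha256).digest()
    return sig

def valid(secret_key: bytes, oid: bytes, caveat_chain: list, sig: bytes) -> bool:
    # A sturdyref is valid when its sig field equals the recomputed chain.
    return hmac.compare_digest(chain_sig(secret_key, oid, caveat_chain), sig)
```

Note the essential macaroon property this construction gives: anyone holding a valid sig can extend the caveatChain (attenuating the reference further) without knowing the secret key, but cannot remove caveats already present.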

Attenuation of authority

When it comes to publishing assertions or sending messages to the entity denoted by a sturdyref, the caveatChain is used to attenuate the authority denoted by the sturdyref by filtering and/or rewriting assertion and message bodies. The caveatChain is run right to left, with newer rewrites-and-filters at the right-hand end of the chain and older ones at the left-hand end. Of course, an empty caveatChain is an unattenuated reference.

Attenuation = [Caveat ...].

Each individual Attenuation in a caveatChain is a sequence of Caveats. The term "caveat" is shamelessly taken from macaroons, though our caveats presently embody only what in the Macaroons paper are called "first-party caveats" over assertion structure; future versions of the server may add "third-party caveats" and other, richer, predicates over assertions.

Each Attenuation's Caveats are run in right-to-left order. The structure and interpretation of Caveats is described fully in the relevant section of the Syndicate network protocol specification.

Logging

The Synit logging infrastructure is still underdeveloped.

At present, there is an actor created at syndicate-server startup time that monitors the $log dataspace for messages of the form:

LogEntry = <log @timestamp string @detail { any: any ...:... }> .

When it receives a log entry, it looks for a few conventional and optional keys in the detail field, each permitted to be any kind of value:

  • pid, conventionally a Unix process ID;
  • line, conventionally a string of free-form text intended for people to read;
  • service, conventionally a service name in the sense of require-service/run-service; and
  • stream, conventionally one of the symbols stdout or stderr.

The timestamp and the special keys are then formatted, along with all other information in the entry record, and printed to the syndicate-server's standard error at INFO level using tracing.

Relay Listeners

The syndicate-server program can be configured to listen on TCP/IP ports and Unix sockets1 for incoming connections speaking the Syndicate network protocol.

TCP/IP and WebSockets

Assertions requiring a service with name matching TcpRelayListener cause the server to start a TCP server socket on the given addr's host and port, exposing the gatekeeper entity reference as the initial ref of incoming connections:

TcpRelayListener = <relay-listener @addr Tcp @gatekeeper #!gatekeeper.Resolve> .
Tcp = <tcp @host string @port int>.

When a new connection arrives, the first byte is examined to see what kind of connection it is and which Preserves syntax it will use.

  • If it is ASCII "G" (0x47), it cannot be the start of a protocol packet, so it is interpreted as the start of a WebSocket connection and handed off to the tokio_tungstenite WebSocket library. Within the WebSocket's context, each packet must be encoded as a binary packet using Preserves binary syntax.

  • Otherwise, if it could start a valid UTF-8 sequence, the connection will be a plain TCP/IP link using the Preserves text syntax.

  • Otherwise, it's a byte which cannot be part of a valid UTF-8 sequence, so it is interpreted as a Preserves binary syntax tag: the connection will be a plain TCP/IP link using Preserves binary syntax.
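The detection amounts to a three-way test on the first byte, which can be sketched as follows (the ranges taken to mean "could start a valid UTF-8 sequence" are my reading of the UTF-8 encoding rules, not copied from the server source):

```python
def classify_connection(first_byte: int) -> str:
    if first_byte == 0x47:
        # ASCII "G", as in "GET ...": the start of a WebSocket HTTP handshake.
        return "websocket"
    # Bytes that can begin a valid UTF-8 sequence: ASCII (0x00-0x7F),
    # or the leading byte of a multi-byte sequence (0xC2-0xF4).
    if first_byte <= 0x7F or 0xC2 <= first_byte <= 0xF4:
        return "preserves-text"
    # Anything else cannot start UTF-8, so treat it as a binary syntax tag.
    return "preserves-binary"
```

The check order matters: 0x47 is itself a valid UTF-8 byte, so the WebSocket case must be tested first.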

Unix sockets

Assertions requiring a service with name matching UnixRelayListener cause the server to start a Unix server socket on the given addr's path, exposing the gatekeeper entity reference as the initial ref of incoming connections:

UnixRelayListener = <relay-listener @addr Unix @gatekeeper #!gatekeeper.Resolve> .
Unix = <unix @path string>.

Syntax autodetection is as for TCP/IP, except that WebSockets are not supported over Unix sockets.


Notes

1

Only SOCK_STREAM Unix sockets are supported, at present. In future, SOCK_DGRAM could be useful for e.g. file-descriptor passing.

Configuration watcher

Assertions requiring a service with name matching ConfigWatcher cause the server to start a configuration watcher service monitoring files in and subdirectories of the given path for changes:

ConfigWatcher = <config-watcher @path string @env ConfigEnv>.
ConfigEnv = { symbol: any ...:... }.

The path may name either a file or directory. Any time the configuration watcher finds a file matching the glob *.pr within the tree rooted at path, it loads the file. Each time a *.pr file is loaded, it is interpreted as a configuration scripting language program, with a copy of env as the "initial environment" for the script.

Whenever a change to a *.pr file is detected, the configuration watcher reloads the file, discarding previous internal state related to the file.

Note that a quirk of the config language requires that there exist an entry in env with key the symbol config and value an entity reference (usually denoting a dataspace entity). However, the config entry need not be the same as the surrounding $config! A useful pattern is to set up a new ConfigWatcher with env containing a config binding pointing to an attenuated reference to the current config dataspace, or even an entirely fresh dataspace created specifically for the task.

Process supervision and management

Assertions requiring a service with name matching DaemonService cause the server to start a subprocess-based service:

DaemonService = <daemon @id any> .

Each daemon service can have zero or more subprocesses associated with it. Subprocesses can be long-lived services or short-lived, system-state-changing programs or scripts.

Adding process specifications to a service

Each subprocess associated with a DaemonService is defined with a DaemonProcess assertion:

DaemonProcess = <daemon @id any @config DaemonProcessSpec>.
DaemonProcessSpec =
  / @simple CommandLine
  / @oneShot <one-shot @setup CommandLine>
  / @full FullDaemonProcess .

The simplest kind of subprocess specification is a CommandLine, either a string (sent to sh -c) or an array of program name (looked up in the $PATH) and arguments:

CommandLine = @shell string / @full FullCommandLine .
FullCommandLine = [@program string, @args string ...] .

The simple and oneShot variants of DaemonProcessSpec expand into FullDaemonProcess values as follows:

  • a simple command-line c becomes { argv: c }; and
  • a record <one-shot c> becomes { argv: c, readyOnStart: false, restart: on-error }.
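That expansion can be sketched as a small Python function. Here a one-shot record is modelled as a hypothetical ("one-shot", cmd) tuple and dictionaries stand in for Preserves dictionaries; the actual values are Preserves records.

```python
def expand_daemon_spec(spec):
    """Expand a DaemonProcessSpec into a FullDaemonProcess-style dictionary."""
    if isinstance(spec, tuple) and spec[:1] == ("one-shot",):
        # <one-shot c> becomes { argv: c, readyOnStart: #f, restart: on-error }
        return {"argv": spec[1], "readyOnStart": False, "restart": "on-error"}
    if isinstance(spec, (str, list)):
        # a simple CommandLine c becomes { argv: c }
        return {"argv": spec}
    return spec  # already a FullDaemonProcess dictionary
```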

Subprocess specification

The FullDaemonProcess type matches a Preserves dictionary having, at minimum, an argv key, and optionally including many other parameters controlling various aspects of the subprocess to be created.1

FullDaemonProcess =
  & @process FullProcess
  & @readyOnStart ReadyOnStart
  & @restart RestartField
  & @protocol ProtocolField .

FullProcess =
  & { argv: CommandLine }
  & @env ProcessEnv
  & @dir ProcessDir
  & @clearEnv ClearEnv .

The CommandLine associated with argv specifies the program name to invoke and its command-line arguments. The other options are described in the remainder of this section.

Ready-signalling

If the key readyOnStart is present in a FullDaemonProcess dictionary, then if its associated value is #t (the default), the service will be considered ready immediately after it has been spawned; if its value is #f, some other arrangement is expected to be made to announce a ready ServiceState against the service's name.

ReadyOnStart =
  / @present { readyOnStart: bool }
  / @invalid { readyOnStart: any }
  / @absent {} .

Whether and when to restart

The default restart policy is always. It can be overridden by providing the key restart in a FullDaemonProcess dictionary, mapping it to a valid RestartPolicy value.

RestartField =
  / @present { restart: RestartPolicy }
  / @invalid { restart: any }
  / @absent {} .

RestartPolicy = =always / =on-error / =all / =never .

The valid restart policies are:

  • always: Whether the process terminates normally or abnormally, restart it without affecting any peer processes within the service.

  • on-error: If the process terminates normally, leave everything alone; if it terminates abnormally, restart it without affecting peers.

  • all: If the process terminates normally, leave everything alone; if it terminates abnormally, restart the whole daemon (all processes within the DaemonService).

  • never: Treat both normal and abnormal termination as normal termination; that is, never restart, and enter state complete even if the process fails.
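The four policies reduce to a small decision table, sketched below. The action names are illustrative, not the server's internal vocabulary.

```python
def restart_action(policy: str, exited_normally: bool) -> str:
    # Returns "restart-process", "restart-daemon", or "leave-alone".
    if policy == "always":
        return "restart-process"
    if policy == "on-error":
        return "leave-alone" if exited_normally else "restart-process"
    if policy == "all":
        return "leave-alone" if exited_normally else "restart-daemon"
    if policy == "never":
        return "leave-alone"  # failure is treated as normal completion
    raise ValueError(f"unknown restart policy: {policy}")
```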

Speaking Syndicate Network Protocol via stdin/stdout

By default, the syndicate-server program assumes nothing about the information to be read and written via a subprocess's standard input and standard output. This can be overridden with a protocol entry in a FullDaemonProcess specification. (Standard error is always considered to produce information to be put in the system logs, however.)

ProtocolField =
  / @present { protocol: Protocol }
  / @invalid { protocol: any }
  / @absent {} .

Protocol = =none / =application/syndicate / =text/syndicate .

The available options for protocol are:

  • none: the standard input of the subprocess is connected to /dev/null, and the standard output and standard error are logged.

  • application/syndicate: the subprocess standard input and output are used as a binary syntax Syndicate network protocol relay. Standard error is logged. The subprocess is expected to make some entity available to the server via initial oid 0. The server reflects this expectation by automatically placing a service object record into the dataspace alongside the daemon record defining the subprocess.

  • text/syndicate: as for application/syndicate, but Preserves' text syntax is used instead of binary syntax.

Specifying subprocess environment variables

By default, the Unix process environment passed on to subprocesses is not changed. Supplying clearEnv and/or env keys alters this behaviour.

ClearEnv =
  / @present { clearEnv: bool }
  / @invalid { clearEnv: any }
  / @absent {} .

ProcessEnv =
  / @present { env: { EnvVariable: EnvValue ...:... } }
  / @invalid { env: any }
  / @absent {} .

EnvVariable = @string string / @symbol symbol / @invalid any .
EnvValue = @set string / @remove #f / @invalid any .

Setting clearEnv to #t causes the environment to be emptied before env is processed and before the subprocess is started. The env key is expected to contain a dictionary whose keys are strings or symbols and whose values are either a string, to set the variable to a new value, or #f, to remove it from the environment.
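The order of operations can be sketched as follows. This is a hedged illustration only, with the inherited environment passed in explicitly rather than read from the process.

```python
def build_subprocess_env(inherited: dict, clear_env: bool, env_spec: dict) -> dict:
    # clearEnv: #t empties the environment before env is processed.
    env = {} if clear_env else dict(inherited)
    for var, value in env_spec.items():
        if value is False:        # #f removes the variable
            env.pop(var, None)
        else:                     # a string sets it to a new value
            env[var] = value
    return env
```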

Setting the Current Working Directory for a subprocess

By default, each subprocess inherits the current working directory of the syndicate-server program. Setting a dir key to a string value in a FullDaemonProcess overrides this.

ProcessDir =
  / @present { dir: string }
  / @invalid { dir: any }
  / @absent {} .

Notes

1

The FullProcess type is split out in order for it to be able to be reused outside the specific context of a daemon process.

Configuration files and directories

The root system bus is started with a --config /etc/syndicate/boot command-line argument, which causes it to execute configuration scripts in that directory. In turn, the boot directory contains instructions for loading configuration from other locations on the filesystem.

This section will examine the layout of the configuration scripts and directories.

The boot layer

The files in /etc/syndicate/boot define the boot layer.

Console getty

The first thing the boot layer does, in 001-console-getty.pr, is start a getty on /dev/console:

<require-service <daemon console-getty>>
<daemon console-getty "getty 0 /dev/console">

Ad-hoc execution of programs

Next, in 010-exec.pr, it installs a handler that responds to messages requesting ad-hoc execution of programs:

?? <exec ?argv ?restartPolicy> [
  let ?id = timestamp
  let ?facet = facet
  let ?d = <temporary-exec $id $argv>
  <run-service <daemon $d>>
  <daemon $d { argv: $argv, readyOnStart: #f, restart: $restartPolicy }>
  ? <service-state <daemon $d> complete> [$facet ! stop]
  ? <service-state <daemon $d> failed>   [$facet ! stop]
]

If the restart policy is not specified, it is defaulted to on-error:

?? <exec ?argv> ! <exec $argv on-error>

"Milestone" pseudo-services

Then, in 010-milestone.pr, it defines how to respond to a request to run a "milestone" pseudo-service:

? <run-service <milestone ?m>> [
  <service-state <milestone $m> started>
  <service-state <milestone $m> ready>
]

The definition is trivial—when requested, simply declare success—but useful in that a "milestone" can be used as a proxy for a configuration state that other services can depend upon.

Concretely, milestones are used in two places at present: a core milestone declares that the core layer of services is ready, and a network milestone declares that initial network configuration is complete.

Synthesis of service state "up"

The definition of ServiceState includes ready, for long-running service programs, and complete, for successful exit (exit status 0) of "one-shot" service programs. In 010-service-state-up.pr, we declare an alias up that is asserted in either of these cases:

? <service-state ?x ready> <service-state $x up>
? <service-state ?x complete> <service-state $x up>

Loading of "core" and "services" layers

The final tasks of the boot layer are to load the "core" and "services" layers, in that order.

Services declared in the "core" layer are automatically marked as dependencies of the <milestone core> pseudo-service, and those declared in the "services" layer are automatically marked as depending on <milestone core>.

[Diagram: services in the core layer (eudev, hostname, machine dataspace) are depended on by <milestone core>; services in the services layer (docker, modem, network, ntpd, sshd, userSettings, wifi) depend on <milestone core>.]

The core layer loader

For the core layer, in 020-load-core-layer.pr, a configuration watcher is started, monitoring /etc/syndicate/core for scripts defining services to place into the layer. Instead of passing an unattenuated reference to $config to the configuration watcher, an attenuation expression rewrites require-service assertions into require-core-service assertions:

let ?sys = <* $config [
  <rewrite <require-service ?s> <require-core-service $s>>
  <filter _>
]>

<require-service <config-watcher "/etc/syndicate/core" {
  config: $sys
  gatekeeper: $gatekeeper
  log: $log
}>>

Then, require-core-service is given meaning:

? <require-core-service ?s> [
  <depends-on <milestone core> <service-state $s up>>
  <require-service $s>
]

The services layer loader

The services layer is treated similarly in 030-load-services.pr, except require-basic-service takes the place of require-core-service, and the configuration watcher isn't started until <milestone core> is ready. Any require-basic-service assertions are given meaning as follows:

? <require-basic-service ?s> [
  <depends-on $s <service-state <milestone core> up>>
  <require-service $s>
]

The core layer: /etc/syndicate/core

The files in /etc/syndicate/core define the core layer.

The configdirs.pr script brings in scripts from the /run and /usr/local analogues of the core config directory:

<require-service <config-watcher "/run/etc/syndicate/core" $.>>
<require-service <config-watcher "/usr/local/etc/syndicate/core" $.>>

The eudev.pr script runs a udevd instance and, once it's ready, starts an initial scan:

<require-service <daemon eudev>>
<daemon eudev ["/sbin/udevd", "--children-max=5"]>

<require-service <daemon eudev-initial-scan>>
<depends-on <daemon eudev-initial-scan> <service-state <daemon eudev> up>>
<daemon eudev-initial-scan <one-shot "
  echo '' > /proc/sys/kernel/hotplug &&
  udevadm trigger --type=subsystems --action=add &&
  udevadm trigger --type=devices --action=add &&
  udevadm settle --timeout=30
">>

The hostname.pr script simply sets the machine hostname:

<require-service <daemon hostname>>
<daemon hostname <one-shot "hostname $(cat /etc/hostname)">>

Finally, the machine-dataspace.pr script declares a fresh, empty dataspace, and asserts a reference to it in a "well-known location" for use by other services later:

let ?ds = dataspace
<machine-dataspace $ds>

The services layer: /etc/syndicate/services

The files in /etc/syndicate/services define the services layer.

The configdirs.pr script brings in /run and /usr/local service definitions, analogous to the same file in the core layer:

<require-service <config-watcher "/run/etc/syndicate/services" $.>>
<require-service <config-watcher "/usr/local/etc/syndicate/services" $.>>

Networking core

The network.pr script defines the <milestone network> pseudo-service and starts a number of ancillary services for generically monitoring and configuring system network interfaces.

First, <daemon interface-monitor> is a small Python program, required by <milestone network>, using Netlink sockets to track changes to interfaces and interface state. It speaks the Syndicate network protocol on its standard input and output, and publishes a service object which expects a reference to the machine dataspace defined earlier:

<require-service <daemon interface-monitor>>
<depends-on <milestone network> <service-state <daemon interface-monitor> ready>>
<daemon interface-monitor {
  argv: "/usr/lib/synit/interface-monitor"
  protocol: application/syndicate
}>
? <machine-dataspace ?machine> [
  ? <service-object <daemon interface-monitor> ?cap> [
    $cap {
      machine: $machine
    }
  ]
]

The interface-monitor publishes assertions describing interface presence and state to the machine dataspace. The network.pr script responds to these assertions by requesting configuration of an interface once it reaches a certain state. First, all interfaces are enabled when they appear and disabled when they disappear:

  $machine ? <interface ?ifname _ _ _ _ _ _> [
    $config [
      ! <exec ["ip" "link" "set" $ifname "up"]>
      ?- ! <exec ["ip" "link" "set" $ifname "down"] never>
    ]
  ]

Next, a DHCP client is invoked for any "normal" (wired-ethernet-like) interface in "up" state with a carrier:

  $machine ? <interface ?ifname _ normal up up carrier _> [
    $config <configure-interface $ifname <dhcp>>
  ]
  $machine ? <interface ?ifname _ normal up unknown carrier _> [
    $config <configure-interface $ifname <dhcp>>
  ]

  $config ? <configure-interface ?ifname <dhcp>> [
    <require-service <daemon <udhcpc $ifname>>>
  ]
  $config ? <run-service <daemon <udhcpc ?ifname>>> [
    <daemon <udhcpc $ifname> ["udhcpc" "-i" $ifname "-fR" "-s" "/usr/lib/synit/udhcpc.script"]>
  ]

We use a custom udhcpc script which modifies the default script to give mobile-data devices a sensible routing metric.

The final pieces of network.pr are static configuration of the loopback interface:

<configure-interface "lo" <static "127.0.0.1/8">>
? <configure-interface ?ifname <static ?ipaddr>> [
  ! <exec ["ip" "address" "add" "dev" $ifname $ipaddr]>
  ?- ! <exec ["ip" "address" "del" "dev" $ifname $ipaddr] never>
]

and conditional publication of a default-route record, allowing services to detect when the internet is (nominally) available:

  $machine ? <route ?addressFamily default _ _ _ _> [
    $config <default-route $addressFamily>
  ]

Wifi & Mobile Data

Building atop the networking core, wifi.pr and modem.pr provide the necessary support for wireless LAN and mobile data interfaces, respectively.

When interface-monitor detects presence of a wireless LAN interface, wifi.pr reacts by starting wpa_supplicant for the interface along with a small Python program, wifi-daemon, that acts as a client to wpa_supplicant, adding and removing networks and network configuration according to selected-wifi-network assertions in the machine dataspace.

  $machine ? <interface ?ifname _ wireless _ _ _ _> [
    $config [
      <require-service <daemon <wpa_supplicant $ifname>>>
      <depends-on
        <daemon <wifi-daemon $ifname>>
        <service-state <daemon <wpa_supplicant $ifname>> up>>
      <require-service <daemon <wifi-daemon $ifname>>>
    ]
  ]

  $config ? <run-service <daemon <wifi-daemon ?ifname>>> [
    <daemon <wifi-daemon $ifname> {
      argv: "/usr/lib/synit/wifi-daemon"
      protocol: application/syndicate
    }>
    ? <service-object <daemon <wifi-daemon $ifname>> ?cap> [
      $cap {
        machine: $machine
        ifname: $ifname
      }
    ]
  ]

  $config ? <run-service <daemon <wpa_supplicant ?ifname>>> [
    <daemon <wpa_supplicant $ifname> [
      "wpa_supplicant" "-Dnl80211,wext" "-C/run/wpa_supplicant" "-i" $ifname
    ]>
  ]

The other tasks performed by wifi.pr are to request DHCP configuration for available wifi interfaces:

  $machine ? <interface ?ifname _ wireless up up carrier _> [
    $config <configure-interface $ifname <dhcp>>
  ]

and to relay selected-wifi-network records from user settings (described below) into the machine dataspace, for wifi-daemon instances to pick up:

  $config ? <user-setting <?s <selected-wifi-network _ _ _>>> [ $machine += $s ]

Turning to modem.pr, which is currently hard-coded for Pinephone devices, we see two main blocks of config. The simpler of the two just starts the eg25-manager daemon for controlling the Pinephone's Quectel modem, along with a simple monitoring script for restarting it if and when /dev/EG25.AT disappears:

<daemon eg25-manager "eg25-manager">
<depends-on <daemon eg25-manager> <service-state <daemon eg25-manager-monitor> up>>
<daemon eg25-manager-monitor "/usr/lib/synit/eg25-manager-monitor">

The remainder of modem.pr handles cellular data, configured via the qmicli program.

<require-service <qmi-wwan "/dev/cdc-wdm0">>
<depends-on <qmi-wwan "/dev/cdc-wdm0"> <service-state <daemon eg25-manager> up>>

When the user settings mobile-data-enabled and mobile-data-apn are both present, it responds to qmi-wwan service requests by invoking qmi-wwan-manager, a small shell script, for each particular device and APN combination:

? <user-setting <mobile-data-enabled>> [
  ? <user-setting <mobile-data-apn ?apn>> [
    ? <run-service <qmi-wwan ?dev>> [
      <require-service <daemon <qmi-wwan-manager $dev $apn>>>
    ]
  ]
]
? <run-service <daemon <qmi-wwan-manager ?dev ?apn>>> [
  <daemon <qmi-wwan-manager $dev $apn> ["/usr/lib/synit/qmi-wwan-manager" $dev $apn]>
]

(Because qmicli is sometimes not well behaved, there is also code in modem.pr for restarting it in certain circumstances when it gets into a state where it reports errors but does not terminate.)

Simple daemons

A few simple daemons are also started as part of the services layer.

The docker.pr script starts the docker daemon, but only once the network configuration is available:

<require-service <daemon docker>>
<depends-on <daemon docker> <service-state <milestone network> up>>
<daemon docker "/usr/bin/dockerd --experimental 2>/var/log/docker.log">

The ntpd.pr script starts an NTP daemon, but only when an IPv4 default route exists:

<require-service <daemon ntpd>>
<depends-on <daemon ntpd> <default-route ipv4>>
<daemon ntpd "ntpd -d -n -p pool.ntp.org">

Finally, the sshd.pr script starts the OpenSSH server daemon after ensuring both that the network is available and that SSH host keys exist:

<require-service <daemon sshd>>
<depends-on <daemon sshd> <service-state <milestone network> up>>
<depends-on <daemon sshd> <service-state <daemon ssh-host-keys> complete>>
<daemon sshd "/usr/sbin/sshd -D">
<daemon ssh-host-keys <one-shot "ssh-keygen -A">>

User settings

A special folder, /etc/syndicate/user-settings, acts as a persistent database of assertions relating to user settings, including such things as wifi network credentials and preferences, mobile data preferences, and so on. The userSettings.pr script sets up the programs responsible for managing the folder.

The contents of the folder itself are managed by a small Python program, user-settings-daemon, which responds to requests arriving via the $config dataspace by adding and removing files containing assertions in /etc/syndicate/user-settings.

let ?settingsDir = "/etc/syndicate/user-settings"
<require-service <daemon user-settings-daemon>>
<daemon user-settings-daemon {
  argv: "/usr/lib/synit/user-settings-daemon"
  protocol: application/syndicate
}>
? <service-object <daemon user-settings-daemon> ?cap> [
  $cap {
    config: $config
    settingsDir: $settingsDir
  }
]

Each such file is named after the SHA-1 digest of the canonical form of the assertion it contains. For example, /etc/syndicate/user-settings/8814297f352be4ebbff19137770e619b2ebc5e91.pr contains <mobile-data-enabled>.
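The naming scheme is easy to reproduce given the canonical bytes of an assertion. A sketch (producing the canonical Preserves encoding itself is out of scope here):

```python
import hashlib

def settings_filename(canonical_assertion_bytes: bytes) -> str:
    # The file is named after the SHA-1 hex digest of the assertion's
    # canonical encoding, with a .pr extension.
    return hashlib.sha1(canonical_assertion_bytes).hexdigest() + ".pr"
```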

The files in /etc/syndicate/user-settings are brought into the main config dataspace by way of a rewriting configuration watcher:

let ?settings = <* $config [ <rewrite ?item <user-setting $item>> ]>
<require-service <config-watcher $settingsDir { config: $settings }>>

Every assertion from /etc/syndicate/user-settings is wrapped in a <user-setting ...> record before being placed into the main $config dataspace.

How-to ...

The following pages walk through examples of common system administration tasks.

How to define services and service classes

Synit services are started in response to run-service assertions. These, in turn, are eventually asserted by the service dependency tracker in response to require-service assertions, once any declared dependencies have been started.

So to implement a service, respond to run-service records mentioning the service's name.

There are a number of concepts involved in service definitions:

  • Service name. A unique identifier for a service instance.

  • Service implementation. Code that responds to run-service requests for a service instance to start running, implementing the service's ongoing behaviour.

  • Service class. A parameterized collection of services sharing a common parameterized implementation.

A service may be an instance of a service class (a parameterized family of services) or may be a simple service that is the only instance of its class. Service dependencies can be statically-declared or dynamically-computed.

A service's implementation may be external, running as a subprocess managed by syndicate-server; internal, backed by code that is part of the syndicate-server process itself; or user-defined, implemented via user-supplied code written in the configuration language or as other actor programs connected somehow to the system bus.

An external service may involve a long-running process (a "daemon"; what s6-rc calls a "longrun"), or may involve a short-lived activity that, at startup or shutdown, modifies aspects of overall system state outside the purview of the supervision tree (what s6-rc calls a "one-shot").

Service names

Every service is identified with its name. A service name can be any Preserves value. A simple symbol may suffice, but records and dictionaries are often useful in giving structure to service names.

Here are a few example service names:

  • <config-watcher "/foo/bar" $.>1
  • <daemon docker>
  • <milestone network>
  • <qmi-wwan "/dev/cdc-wdm0">
  • <udhcpc "eth0">

The first two invoke service behaviours that are built into syndicate-server; the last three are user-defined service names.

Defining a simple external service

As an example of a simple external service, take the ntpd daemon. The following assertions placed in the configuration file /etc/syndicate/services/ntpd.pr cause ntpd to be run as part of the Synit services layer.

First, we choose the service name: <daemon ntpd>. The name is a daemon record, marking it as a supervised external service. Having chosen a name, and chosen to use the external service supervision mechanism to run the service, we make our first assertion, which defines the program to be launched:

<daemon ntpd "ntpd -d -n -p pool.ntp.org">

Next, we mark the service as depending on the presence of another assertion, <default-route ipv4>. This assertion is managed by the networking core.

<depends-on <daemon ntpd> <default-route ipv4>>

Together, these two assertions constitute the complete definition of the service.

However, without a final require-service assertion, the service will not be activated. By requiring the service, we connect the service definition into the system dependency tree, enabling actual loading and activation of the service.

<require-service <daemon ntpd>>
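Putting the pieces together, the complete contents of /etc/syndicate/services/ntpd.pr are just these three assertions:

```
<daemon ntpd "ntpd -d -n -p pool.ntp.org">
<depends-on <daemon ntpd> <default-route ipv4>>
<require-service <daemon ntpd>>
```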

Defining a service class

The following stanza (actually part of the networking core) waits for run-service assertions matching a family of service names, <daemon <udhcpc ifname>>. When it sees one, it computes the specification for the corresponding command-line, on the fly, substituting the value of the ifname binding in the correct places (once in the service name and once in the command-line specification).

? <run-service <daemon <udhcpc ?ifname>>> [
  <daemon
    <udhcpc $ifname>
    ["udhcpc" "-i" $ifname "-fR" "-s" "/usr/lib/synit/udhcpc.script"]
  >
]

This suffices to define the service. To instantiate it, we may either manually provide assertions mentioning the interfaces we care about,

<require-service <daemon <udhcpc "eth0">>>
<require-service <daemon <udhcpc "wlan0">>>

or, as actually implemented in the networking core (in network.pr lines 13–15 and 42–47), we may respond to assertions placed in the dataspace by a daemon, interface-monitor, whose role is to reflect AF_NETLINK events into assertions:

? <configure-interface ?ifname <dhcp>> [
  <require-service <daemon <udhcpc $ifname>>>
]

Here, when an assertion of the form <configure-interface ifname <dhcp>> appears in the dataspace, we react by asserting a require-service record that in turn eventually triggers assertion of a matching run-service, which then in turn results in invocation of the udhcpc command-line we specified above.
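Concretely, if the interface monitor (or an administrator, by hand) asserts the following — shown here for a hypothetical eth0 interface:

```
<configure-interface "eth0" <dhcp>>
```

the stanza reacts by asserting <require-service <daemon <udhcpc "eth0">>>, with the same effect as the manual require-service assertion shown earlier.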

Defining non-daemon services; reacting to user settings

Only service names of the form <daemon name> are backed by external service supervisor code. Other service name schemes have other implementations. In particular, user-defined service name schemes are possible and useful.

For example, in the configuration relating to setup of mobile data interfaces, service names of the form <qmi-wwan devicePath> are defined:

? <user-setting <mobile-data-enabled>> [
  ? <user-setting <mobile-data-apn ?apn>> [
    ? <run-service <qmi-wwan ?dev>> [
      <require-service <daemon <qmi-wwan-manager $dev $apn>>>
      $log ! <log "-" { line: "starting wwan manager", dev: $dev, apn: $apn }>
    ]
  ]
]

Reading this inside-out,

  • run-service for qmi-wwan service names is defined to require a <daemon <qmi-wwan-manager deviceName APN>> service, defined elsewhere; in addition, when a run-service assertion appears, a log message is produced.

  • the stanza reacting to run-service is only active when some <user-setting <mobile-data-apn APN>> assertion exists.

  • the stanza querying the mobile-data-apn user setting is itself only active when <user-setting <mobile-data-enabled>> has been asserted.

In sum, this means that even if a qmi-wwan service is requested and activated, nothing will happen until the user enables mobile data and selects an APN. If the user later disables mobile data, the qmi-wwan implementation will automatically be retracted, and the corresponding qmi-wwan-manager service terminated.


1

This first service name example is interesting because it includes an embedded capability reference using the $. syntax from the scripting language to denote the active scripting language environment dictionary.

How to restart services

Send a restart-service message mentioning the service name of the service to restart. Use the ! operator of the configuration language to send a message (as opposed to making an assertion):

! <restart-service <daemon <wifi-daemon "wlan0">>>
! <restart-service <daemon user-settings-daemon>>

In future, a command-line tool for sending messages to a system dataspace will be provided; for now, create temporary configuration language scripts in /run/etc/syndicate/services:

THROCK=/run/etc/syndicate/services/throck.pr
echo '! <restart-service <daemon <wifi-daemon "wlan0">>>' > $THROCK
sleep 1
rm -f $THROCK

How to schedule one-off or repeating tasks

(TODO. Not yet implemented: a cron-like program will eventually respond to assertions demanding periodic or delayed execution of tasks (likely expressed as assertions, making it more of a delayed-or-periodic-assertion-producing program).)

Timer tolerances

Apple has come up with the useful idea of a timer tolerance, applicable to both repeating and one-off timers. In their documentation, they write:

The timer may fire at any time between its scheduled fire date and the scheduled fire date plus the tolerance. [...] As a general rule, set the tolerance to at least 10% of the interval [...] Even a small amount of tolerance has significant positive impact on the power usage of your application.

One-off tasks

Repeating tasks

How to manage user settings

Send a user-settings-command message containing an assert or retract record wrapping the setting assertion to add or remove. Use the ! operator of the configuration language to send a message (as opposed to making an assertion):

! <user-settings-command <assert <mobile-data-enabled>>>
! <user-settings-command <assert <mobile-data-apn "internet">>>
! <user-settings-command <retract <mobile-data-enabled>>>
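Recall that each assertion in the user settings database appears in the main $config dataspace wrapped in a user-setting record. After the first two commands above, then, the dataspace contains:

```
<user-setting <mobile-data-enabled>>
<user-setting <mobile-data-apn "internet">>
```

These are exactly the forms matched by the mobile-data stanzas shown in the service-definition examples.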

In future, a command-line tool for sending such messages will be provided; for now, create temporary configuration language scripts in /run/etc/syndicate/services:

THROCK=/run/etc/syndicate/services/throck.pr
echo '! <user-settings-command <assert <mobile-data-enabled>>>' > $THROCK
sleep 1
rm -f $THROCK

How to reboot and power off the machine

(TODO. Not yet implemented: eventually, synit-pid1 will respond to messages/assertions from the dataspace, implementing the necessary coordination for a graceful shutdown procedure. For now, sync three times, sleep a bit, and reboot -f or poweroff -f...)

How to suspend the machine

(TODO. Not yet implemented: eventually, assertions in the dataspace will control the desired suspend state, and reactive stanzas will allow responses to any kind of ambient conditions to include changes in the suspend state.)

The preserves-tools package

The preserves-tools package includes useful command-line utilities for working with Preserves values and schemas.

At present, it includes the preserves-tool Swiss-army-knife utility, which is useful for

  • converting between text and binary Preserves syntaxes;
  • pretty-printing (indenting) text Preserves syntax;
  • manipulating Preserves annotations;
  • breaking down and filtering Preserves documents using preserves path selectors;
  • and so on.

See also the preserves-tool documentation.

Preserves

Synit makes extensive use of Preserves, a programming-language-independent language for data.

The Preserves data language is in many ways comparable to JSON, XML, S-expressions, CBOR, ASN.1 BER, and so on. From the specification document:

Preserves supports records with user-defined labels, embedded references, and the usual suite of atomic and compound data types, including binary data as a distinct type from text strings.

Why does Synit rely on Preserves?

There are four aspects of Preserves that make it particularly relevant to Synit:

Grammar of values

Preserves has programming-language-independent semantics: the specification defines an equivalence relation over Preserves values.1 This makes it a solid foundation for a multi-language, multi-process, potentially distributed system like Synit.2

Values and Types

Preserves values come in various types: a few basic atomic types, plus sequence, set, dictionary, and record compound types. From the specification:

                    Value = Atom           Atom = Boolean
                          | Compound            | Float
                          | Embedded            | Double
                                                | SignedInteger
                 Compound = Record              | String
                          | Sequence            | ByteString
                          | Set                 | Symbol
                          | Dictionary

Concrete syntax

Preserves offers multiple syntaxes, each useful in different settings. Values are automatically, losslessly translatable from one syntax to another because Preserves' semantics are syntax-independent.

The core Preserves specification defines a text-based, human-readable, JSON-like syntax that is a syntactic superset of JSON, and a completely equivalent compact binary syntax that is crucial to the definition of canonical form for Preserves values.3

Here are a few example values, written using the text syntax (see the specification for the grammar):

Boolean    : #t #f
Float      : 1.0f 10.4e3f -100.6f
Double     : 1.0 10.4e3 -100.6
Integer    : 1 0 -100
String     : "Hello, world!\n"
ByteString : #"bin\x00str\x00" #[YmluAHN0cgA] #x"62696e0073747200"
Symbol     : hello-world |hello world| = ! hello? || ...
Record     : <label field1 field2 ...>
Sequence   : [value1 value2 ...]
Set        : #{value1 value2 ...}
Dictionary : {key1: value1 key2: value2 ...: ...}
Embedded   : #!value

Commas are optional in sequences, sets, and dictionaries.
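For example, the following two spellings of a sequence denote the same value:

```
[1, 2, 3]
[1 2 3]
```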

Canonical form

Every Preserves value can be serialized into a canonical form using the binary syntax along with a few simple rules about serialization ordering of elements in sets and keys in dictionaries.

Having a canonical form means that, for example, a cryptographic hash of a value's canonical serialization can be used as a unique fingerprint for the value.
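Preserves defines its canonical form over the binary syntax, but the fingerprinting idea itself can be sketched with nothing but the Python standard library, using sorted-keys JSON as a stand-in for a canonical serialization (an analogy only, not real Preserves canonical form):

```python
import hashlib
import json

def fingerprint(value):
    # Serialize with sorted keys and fixed separators, so that equal
    # values always yield byte-identical output, then hash the bytes.
    # (Stand-in for Preserves canonical binary serialization.)
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha512(canonical.encode("utf-8")).hexdigest()

# Two equal dictionaries written with their keys in different orders
# share a single fingerprint:
a = {"name": "Chapter One", "sub_items": []}
b = {"sub_items": [], "name": "Chapter One"}
assert fingerprint(a) == fingerprint(b)
```

The same property — equal values yield equal digests — is what makes a canonical serialization's hash usable as a unique fingerprint.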

For example, the SHA-512 digest of the canonical serialization of the value

<sms-delivery <address international "31653131313">
              <address international "31655512345">
              <rfc3339 "2022-02-09T08:18:29.88847+01:00">
              "This is a test SMS message">

is

bfea9bd5ddf7781e34b6ca7e146ba2e442ef8ce04fd5ff912f889359945d0e2967a77a13
c86b13959dcce7e8ba3950d303832b825648609447b3d147677163ce

Capabilities

Preserves values can include embedded references, written as values with a #! prefix. For example, a command adding <some-setting> to the user settings database might look like this as it travels over a Unix pipe connecting a program to the root dataspace:

<user-settings-command <assert <some-setting>> #![0 123]>

The user-settings-command structure includes the assert command itself, plus an embedded capability reference, #![0 123], which encodes a transport-specific reference to an object.

TODO: Link to documentation for sturdy.prs.

The syntax of values under #! differs depending on the medium carrying the message. For example, point-to-point transports need to be able to refer to "my references" (#![0 n]) and "your references" (#![1 n]), while multicast/broadcast media (like Ethernet) need to be able to name references within specific, named conversational participants (#![<udp [192 168 1 10] 5999> n]), and in-memory representations need to use direct pointers (#!140425190562944).

In every case, the references themselves work like Unix file descriptors: an integer or similar that unforgeably denotes, in a local context, some complex data structure on the other side of a trust boundary.

When capability-bearing Preserves values are read off a transport, the capabilities are automatically rewritten into references to in-memory proxy objects. The reverse process of rewriting capability references happens when an in-memory value is serialized for transmission.

Schemas

Preserves comes with a schema language suitable for defining protocols among actors/programs in Synit. Because Preserves is a superset of JSON, its schemas can be used for parsing JSON just as well as for native Preserves values.4 From the schema specification:

A Preserves schema connects Preserves Values to host-language data structures. Each definition within a schema can be processed by a compiler to produce

  • a host-language type definition;
  • a partial parsing function from Values to instances of the produced type; and
  • a total serialization function from instances of the type to Values.

Every parsed Value retains enough information to always be able to be serialized again, and every instance of a host-language data structure contains, by construction, enough information to be successfully serialized.

Instead of taking host-language data structure definitions as primary, in the way that systems like Serde do, Preserves schemas take the shape of the serialized data as primary.

To see the difference, let's look at an example.

Example: Book Outline

Systems like Serde concentrate on defining (de)serializers for host-language type definitions.

Serde starts from definitions like the following.5 It generates (de)serialization code for various different data languages (such as JSON, XML, CBOR, etc.) in a single programming language: Rust.


pub struct BookOutline {
    pub sections: Vec<BookItem>,
}
pub enum BookItem {
    Chapter(Chapter),
    Separator,
    PartTitle(String),
}
pub struct Chapter {
    pub name: String,
    pub sub_items: Vec<BookItem>,
}

The (de)serializers are able to convert between in-memory and serialized representations such as the following JSON document. The focus is on Rust: interpreting the produced documents from other languages is out-of-scope for Serde.

{
  "sections": [
    { "PartTitle": "Part I" },
    "Separator",
    {
      "Chapter": {
        "name": "Chapter One",
        "sub_items": []
      }
    },
    {
      "Chapter": {
        "name": "Chapter Two",
        "sub_items": []
      }
    }
  ]
}

By contrast, Preserves schemas map a single data language to and from multiple programming languages. Each specific programming language has its own schema compiler, which generates type definitions and (de)serialization code for that language from a language-independent grammar.

For example, a schema able to parse values compatible with those produced by Serde for the type definitions above is the following:

version 1 .

BookOutline = {
  "sections": @sections [BookItem ...],
} .

BookItem = @chapter { "Chapter": @value Chapter }
         / @separator "Separator"
         / @partTitle { "PartTitle": @value string } .

Chapter = {
  "name": @name string,
  "sub_items": @sub_items [BookItem ...],
} .

Using the Rust schema compiler, we see types such as the following, which are similar to but not the same as the original Rust types above:


pub struct BookOutline {
    pub sections: std::vec::Vec<BookItem>
}
pub enum BookItem {
    Chapter { value: std::boxed::Box<Chapter> },
    Separator,
    PartTitle { value: std::string::String }
}
pub struct Chapter {
    pub name: std::string::String,
    pub sub_items: std::vec::Vec<BookItem>
}

Using the TypeScript schema compiler, we see

export type BookOutline = {"sections": Array<BookItem>};

export type BookItem = (
    {"_variant": "chapter", "value": Chapter} |
    {"_variant": "separator"} |
    {"_variant": "partTitle", "value": string}
);

export type Chapter = {"name": string, "sub_items": Array<BookItem>};

Using the Racket schema compiler, we see

(struct BookOutline (sections))
(define (BookItem? p)
    (or (BookItem-chapter? p)
        (BookItem-separator? p)
        (BookItem-partTitle? p)))
(struct BookItem-chapter (value))
(struct BookItem-separator ())
(struct BookItem-partTitle (value))
(struct Chapter (name sub_items))

and so on.

Example: Book Outline redux, using Records

The schema for book outlines above accepts Preserves (JSON) documents compatible with the (de)serializers produced by Serde for a Rust-native type.

Instead, we might choose to define a Preserves-native data definition, and to work from that:6

version 1 .
BookOutline = <book-outline @sections [BookItem ...]> .
BookItem = Chapter / =separator / @partTitle string .
Chapter = <chapter @name string @sub_items [BookItem ...]> .

The schema compilers produce exactly the same type definitions7 for this variation. The differences are in the (de)serialization code only.

Here's the Preserves value equivalent to the example above, expressed using the Preserves-native schema:

<book-outline [
  "Part I"
  separator
  <chapter "Chapter One" []>
  <chapter "Chapter Two" []>
]>

Notes

1

The specification defines a total order relation over Preserves values as well.

2

In particular, dataspaces need the assertion data they contain to have a sensible equivalence predicate in order to be useful at all. If you can't reliably tell whether two values are the same or different, how are you supposed to use them to look things up in anything database-like? Languages like JSON, which don't have a well-defined equivalence relation, aren't good enough. When programs communicate with each other, they need to be sure that their peers will understand the information they receive exactly as it was sent.

3

Besides the two core syntaxes, other serialization syntaxes are in use in other systems. For example, the Spritely Goblins actor library uses a serialization syntax called Syrup, reminiscent of bencode.

4

You have to use a Preserves text-syntax reader on JSON terms to do this, though: JSON values like null, true, and false naively read as Preserves symbols. Preserves doesn't have the concept of null.

5

This example is a simplified form of the preprocessor type definitions for mdBook, the system used to render these pages. I use a real Preserves schema definition for parsing and producing Serde's JSON representation of mdBook Book structures in order to preprocess the text.

6

By doing so, we lose compatibility with the Serde structures, but the point is to show the kinds of schemas available to us once we move away from strict compatibility with existing data formats.

7

Well, almost exactly the same. The only difference is in the Rust types, which use tuple-style instead of record-style structs for chapters and part titles.

Working with schemas

Schema source code: *.prs files

Preserves schemas are written in a syntax that (ab)uses Preserves text syntax as a kind of S-expression. Schema source code looks like this:

version 1 .
Present = <Present @username string> .
Says = <Says @who string @what string> .
UserStatus = <Status @username string @status Status> .
Status = =here / <away @since TimeStamp> .
TimeStamp = string .

Conventionally, schema source code is stored in *.prs files. In this example, the source code above is placed in simpleChatProtocol.prs.
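Values conforming to these definitions look like the following (sample instances; the username "tonyg" is arbitrary):

```
<Present "tonyg">
<Says "tonyg" "Hello world!">
<Status "tonyg" here>
<Status "tonyg" <away "2022-03-08">>
```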

Compiling source code to metaschema instances: *.prb files

Many of the code generator tools for Preserves schemas require not source code, but instances of the Preserves metaschema. To compile schema source code to metaschema instances, use preserves-schemac:

yarn global add @preserves/schema
preserves-schemac .:simpleChatProtocol.prs > simpleChatProtocol.prb

Binary-syntax metaschema instances are conventionally stored in *.prb files.

If you have a whole directory tree of *.prs files, you can supply just "." without the ":"-prefixed fileglob part. See the preserves-schemac documentation.

Converting the simpleChatProtocol.prb file to Preserves text syntax lets us read the metaschema instance corresponding to the source code:

cat simpleChatProtocol.prb | preserves-tool convert

The result:

<bundle {
  [
    simpleChatProtocol
  ]: <schema {
    definitions: {
      Present: <rec <lit Present> <tuple [
        <named username <atom String>>
      ]>>
      Says: <rec <lit Says> <tuple [
        <named who <atom String>>
        <named what <atom String>>
      ]>>
      Status: <or [
        [
          "here"
          <lit here>
        ]
        [
          "away"
          <rec <lit away> <tuple [
            <named since <ref [] TimeStamp>>
          ]>>
        ]
      ]>
      TimeStamp: <atom String>
      UserStatus: <rec <lit Status> <tuple [
        <named username <atom String>>
        <named status <ref [] Status>>
      ]>>
    }
    embeddedType: #f
    version: 1
  }>
}>

Generating support code from metaschema instances

Support exists for working with schemas in many languages, including Python, Rust, TypeScript, Racket, and Squeak Smalltalk.

Python

Python doesn't have a separate compilation step: it loads binary metaschema instances at runtime, generating classes on the fly.

After pip install preserves, load metaschemas with preserves.schema.load_schema_file:

from preserves import stringify, schema, parse
S = schema.load_schema_file('./simpleChatProtocol.prb')
P = S.simpleChatProtocol

Then, members of P are the definitions from simpleChatProtocol.prs:

>>> P.Present('me')
Present {'username': 'me'}

>>> stringify(P.Present('me'))
'<Present "me">'

>>> P.Present.decode(parse('<Present "me">'))
Present {'username': 'me'}

>>> P.Present.try_decode(parse('<Present "me">'))
Present {'username': 'me'}

>>> P.Present.try_decode(parse('<NotPresent "me">')) is None
True

>>> stringify(P.UserStatus('me', P.Status.here()))
'<Status "me" here>'

>>> stringify(P.UserStatus('me', P.Status.away('2022-03-08')))
'<Status "me" <away "2022-03-08">>'

>>> x = P.UserStatus.decode(parse('<Status "me" <away "2022-03-08">>'))
>>> x.status.VARIANT
#away
>>> x.status.VARIANT == Symbol('away')
True

Rust

Generate Rust definitions corresponding to a metaschema instance with preserves-schema-rs. The best way to use it is to integrate it into your build.rs (see the docs), but you can also use it as a standalone command-line tool.

The following command generates a directory ./rs/chat containing Rust sources for a module that expects to be called chat in Rust code:

preserves-schema-rs --output-dir rs/chat --prefix chat simpleChatProtocol.prb

Representative excerpts from one of the generated files, ./rs/chat/simple_chat_protocol.rs:

pub struct Present {
    pub username: std::string::String
}
pub struct Says {
    pub who: std::string::String,
    pub what: std::string::String
}
pub struct UserStatus {
    pub username: std::string::String,
    pub status: Status
}
pub enum Status {
    Here,
    Away {
        since: std::boxed::Box<TimeStamp>
    }
}
pub struct TimeStamp(pub std::string::String);

TypeScript

Generate TypeScript definitions from schema sources (not metaschema instances) using preserves-schema-ts. Unlike other code generators, this one understands schema source code directly.

The following command generates a directory ./ts/gen containing TypeScript sources:

preserves-schema-ts --output ./ts/gen .:simpleChatProtocol.prs

Representative excerpts from one of the generated files, ./ts/gen/simpleChatProtocol.ts:

export type Present = {"username": string};
export type Says = {"who": string, "what": string};
export type UserStatus = {"username": string, "status": Status};
export type Status = ({"_variant": "here"} | {"_variant": "away", "since": TimeStamp});
export type TimeStamp = string;

Squeak Smalltalk

After loading the Preserves package from the Preserves project SqueakSource page, perhaps via

Installer squeaksource project: 'Preserves'; install: 'Preserves'.

you can load and compile the bundle using something like

(PreservesSchemaEnvironment fromBundleFile: 'simpleChatProtocol.prb')
	category: 'Example-Preserves-Schema-SimpleChat';
	prefix: 'SimpleChat';
	cleanCategoryOnCompile: true;
	compileBundle.

which results in classes whose names are prefixed with SimpleChat being created in package Example-Preserves-Schema-SimpleChat. Here's a screenshot of a browser showing the generated classes:

Screenshot of Squeak Browser on class SimpleChatSimpleChatProtocol

Exploring the result of evaluating the following expression, which generates a Smalltalk object in the specified schema, yields the following screenshot:

SimpleChatSimpleChatProtocolStatus away
    since: (SimpleChatSimpleChatProtocolTimeStamp new value: '2022-03-08')

Exploration of a SimpleChatSimpleChatProtocolStatus object

Exploring the result of evaluating the following expression, which generates a Smalltalk object representing the Preserves value corresponding to the value produced in the previous expression, yields the following screenshot:

(SimpleChatSimpleChatProtocolStatus away
        since: (SimpleChatSimpleChatProtocolTimeStamp new value: '2022-03-08'))
    asPreserves

Exploration of a SimpleChatSimpleChatProtocolStatus preserves value object

Finally, the following expression parses a valid Status string input:

SimpleChatSimpleChatProtocolStatus
    from: '<away "2022-03-08">' parsePreserves
    orTry: []

If it had been invalid, the answer would have been nil (because [] value is nil).

Capturing and rendering interaction traces

The syndicate-server program is able to capture traces of all Syndicated Actor Model interactions that traverse it, saving them as TraceEntry records (defined in trace.prs) to a file for later analysis.

Recording a trace

To record a trace, start syndicate-server with the -t <trace-file> (or --trace-file <trace-file>) command-line option. All interactions will be recorded in the named file.

The contents of the file will look a bit like this:

<trace 1643236405.7954443 1 <start <anonymous>>>
<trace 1643236405.7959989 11 <start <named dependencies_listener>>>
<trace 1643236405.7960189 21 <start <named config_watcher>>>
<trace 1643236405.7960294 31 <start <named daemon_listener>>>
<trace 1643236405.7960389 41 <start <named debt_reporter_listener>>>
<trace 1643236405.7960542 51 <start <named milestone_listener>>>
<trace 1643236405.7960613 61 <start <named tcp_relay_listener>>>
<trace 1643236405.7960687 71 <start <named unix_relay_listener>>>
<trace 1643236405.7960766 81 <start <named logger>>>
<trace 1643236405.7960895 1 <turn 9 <external "top-level actor"> [<facet-start [12 2]> <spawn #f 11> <spawn #f 21> <spawn #f 31> <spawn #f 41> <spawn #f 51> <spawn #f 61> <spawn #f 71> <enqueue <event <entity 1 12 140358591713488> <assert <value <run-service <config-watcher "config" {config: #!"1/12:00007fa7c80010d0" gatekeeper: #!"1/12:00007fa7c8005420" log: #!"1/12:00007fa7c80011b0"}>>> 3>>> <spawn #f 81>]>>
<trace 1643236405.7961453 1 <turn 29 <caused-by 9> [<dequeue <event <entity 1 12 140358591713488> <assert <value <run-service <config-watcher "config" {config: #!"1/12:00007fa7c80010d0" gatekeeper: #!"1/12:00007fa7c8005420" log: #!"1/12:00007fa7c80011b0"}>>> 3>>>]>>
<trace 1643236405.7962394 81 <turn 49 <caused-by 9> [<facet-start [122 92]> <enqueue <event <entity 1 12 140358591713712> <assert <value <Observe <rec log [<bind <_>> <bind <_>>]> #!"81/122:00007fa7c800ff10">> 13>>>]>>
<trace 1643236405.796323 11 <turn 19 <caused-by 9> [<facet-start [102 22]> <enqueue <event <entity 1 12 140358591713488> <assert <value <Observe <rec require-service [<bind <_>>]> #!"11/102:00007fa75c0010b0">> 23>>>]>>
...

Rendering a trace

Tools such as syndicate-render-msd can process trace files to produce message-sequence-diagram-like interactive renderings of their contents. The trace file excerpted above renders (in part) in the browser to the following screenshot:

Interaction snapshot

Enhancements such as streaming of a live trace and filtering and selecting subtraces are on the roadmap.

Python support libraries

The py3-preserves and py3-syndicate packages include the Python implementations of Preserves (preserves on PyPI; git) and the Syndicated Actor Model and Syndicate Protocol (syndicate-py on PyPI; git), respectively.

When installed, the libraries are available in the standard location for system-wide Python packages.

Shell-scripting libraries

The syndicate-sh package includes /usr/lib/syndicate/syndicate.sh, an implementation of the Syndicate Protocol for Bash. Scripts may take advantage of the library to interact with peers via system dataspaces, either as supervised services or as external programs making use of the gatekeeper service.

Examples of both kinds of script are included in the syndicate-sh git repository (see the examples directory).

Preserves schemas

The following pages document schemas associated with the Preserves data language and its tools.

Preserves Schema metaschema

The Preserves Schema metaschema defines the structure of the abstract syntax trees (ASTs) of schemas. Every valid Preserves Schema document can be represented as an instance of the metaschema. (And of course the metaschema is itself a schema, which is in turn an instance of the metaschema!)

⟶ See Appendix A: "Metaschema" of the Preserves Schema specification.

Preserves Path schema

Preserves Path is a language for selecting and filtering portions of a Preserves value. It has an associated schema describing the various kinds of Path expressions as abstract syntax.

The schema source below is taken from path/path.prs in the Preserves source code repository.

Preserves Path expressions come in several flavours: selectors, steps (axes and filters), and predicates. Each is described below along with its abstract syntax definitions.

Selectors and Steps

Selectors are a sequence of steps, applied one after the other to the currently-selected value. Each step transforms an input into zero or more outputs. A step is an axis or a filter.

Selector = [Step ...] .
Step = Axis / Filter .

Axes: selecting portions of the input

Each axis step generally selects some sub-portion or -portions of the current document. An axis may also have a secondary filtering effect: for example, label only applies to Records, and will yield an empty result set when applied to any other kind of input.

Axis =
/ <values>          ;; yields the immediate subvalues of the input nonrecursively
/ <descendants>     ;; recurses through all descendant subvalues of the input
/ <at @key any>     ;; extracts a subvalue named by the given key, if any
/ <label>           ;; extracts a Record's label, if any
/ <keys>            ;; extracts all keys (for subvalues) of the input, nonrecursively
/ <length>          ;; extracts the length/size of the input, if any
/ <annotations>     ;; extracts all annotations attached to the input
/ <embedded>        ;; moves into the representation of an embedded value, if any
/ <parse @module [symbol ...] @name symbol>   ;; parses using Preserves Schema
/ <unparse @module [symbol ...] @name symbol> ;; unparses using Preserves Schema
.

The parse and unparse variants name Schema definitions, to be resolved by the eventual surrounding context in which the expression will be executed. A parse axis parses the input using a Schema definition; if the parse succeeds, the axis moves into the parse result. Similarly, unparse expects an abstract parse result, transforming it back into a concrete value according to the Schema definition.

Filters: rejecting inputs

Each filter step generally applies some test to the current document as a whole, either emitting it unchanged (with exceptions, detailed below) or emitting no outputs at all.

Filter =
/ <nop>                                 ;; Always emit the input
/ <compare @op Comparison @literal any> ;; Emit iff the comparison holds
/ <regex @regex string>                 ;; Emit iff input is String and regex matches
/ <test @pred Predicate>                ;; Apply complex predicate
/ <real>                                ;; Emit iff input is Float, Double, or Integer
/ <int>                                 ;; TRUNCATE and emit iff Float, Double or Integer
/ <kind @kind ValueKind>                ;; Emit iff input kind matches
.

Complex predicates

The complex predicates in a test filter are built up from logical connectives over selectors. A Selector predicate evaluates to true whenever, applied to its input, it results in a non-empty output set.

Predicate =
/ Selector
/ <not @pred Predicate>
/ <or @preds [Predicate ...]>
/ <and @preds [Predicate ...]>
.

Comparison against a literal

Each compare filter includes a Comparison and a literal value to compare the input against. For example, <compare eq 3> only produces an output if the input is equal (according to the Preserves semantic model) to 3.

Comparison = =eq / =ne / =lt / =ge / =gt / =le .

NB. For inequalities (lt/ge/gt/le), comparison between values of different kinds is undefined in the current draft specification.
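As an illustrative sketch, a compare filter can be modelled as a step emitting either one output or none. Python's own ordering stands in for the Preserves ordering here, and the names are hypothetical.

```python
import operator

# Comparison operators, keyed by the Comparison symbols above.
OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
       "ge": operator.ge, "gt": operator.gt, "le": operator.le}

def compare(op, literal):
    """Build a filter step: emit the input unchanged iff the comparison holds."""
    def step(v):
        return [v] if OPS[op](v, literal) else []
    return step
```

So `compare("eq", 3)` emits `3` unchanged and drops every other input. Per the note above, applying an inequality across values of different kinds is left undefined.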

Filtering by value kind

Each kind filter selects only values from one of the kinds of Preserves value:

ValueKind =
/ =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol
/ =Record / =Sequence / =Set / =Dictionary
/ =Embedded
.

Syndicated Actor Model schemas

The following pages document schemas associated with the Syndicated Actor Model. By and large, these schemas are contained in the syndicate-protocols Git repository.

"Observe" assertions

The protocol for interaction with a dataspace entity is very simple: any assertion can be sent to a dataspace. The job of a dataspace is to relay assertions on to interested peers; dataspaces do not generally interpret assertions themselves.

The sole exception is assertions of interest in other assertions.

These are called "Observe" assertions, or subscriptions:

Observe = <Observe @pattern dataspacePatterns.Pattern @observer #!any>.

An Observe assertion contains a pattern and a reference to an observer entity. When an Observe assertion is published to a dataspace, the dataspace alters its internal index to make a note of the new expression of interest. It also immediately relays any of its already-existing assertions that match the pattern to the observer entity. As other assertions come and go subsequently, the dataspace takes care to inform the observer entity in the Observe record of the arrival or departure of any of the changing assertions that match the pattern.

Patterns over assertions

Each subscription record asserted at a dataspace entity contains a pattern over Preserves values.

The pattern language is carefully chosen to be reasonably expressive without closing the door to efficient indexing of dataspace contents.1

Interpretation of patterns

A pattern is matched against a candidate input value. Matching can either fail or succeed; if matching succeeds, a sequence of (numbered) bindings is produced. Each binding in the sequence corresponds to a (possibly-nested) binding pattern in the overall pattern.

Example

Consider the pattern:

<arr [<lit 1> <bind <arr [<bind <_>> <_>]>> <_>]>

The following values each yield different results:

  • [1 2 3] fails, because 2 is not an array.

  • [1 [2 3] 4] succeeds, yielding a binding sequence [[2 3] 2], because the outer bind captures the whole [2 3] array, and the inner (nested) bind captures the 2.

  • [1 [2 3 4] 5] fails, because [2 3 4] has more than the expected two elements.

  • [1 [<x> <y>] []] succeeds, yielding a binding sequence [[<x> <y>] <x>]. Each discard pattern (<_>) ignores the specific input it is given.

Abstract syntax of patterns

A pattern may be either a discard, a (nested) binding, a literal, or a compound.

Pattern = DDiscard / DBind / DLit / DCompound .

Discard

A discard pattern matches any input value.

DDiscard = <_>.

Binding

A binding pattern speculatively pushes the portion of the input under consideration onto the end of the binding sequence being built, and then recursively evaluates its subpattern. If the subpattern succeeds, so does the overall binding pattern (keeping the binding); otherwise, the speculative addition to the binding sequence is undone, and the overall binding pattern fails.

DBind = <bind @pattern Pattern>.

Literal

A literal pattern matches exactly the specific atomic Preserves value it names. In order to match a literal compound value, a combination of compound and literal patterns must be used.

DLit = <lit @value AnyAtom>.
AnyAtom =
    / @bool bool
    / @float float
    / @double double
    / @int int
    / @string string
    / @bytes bytes
    / @symbol symbol
    / @embedded #!any
.

Compound

Each compound pattern first checks the type of its input: a rec pattern fails unless it is given a Record, an arr demands a Sequence, and a dict only matches a Dictionary.

DCompound = <rec @label any @fields [Pattern ...]>
          / <arr @items [Pattern ...]>
          / <dict @entries { any: Pattern ...:... }> .

If the type check fails, the pattern match fails. Otherwise, matching continues:

  • rec patterns compare the label of the input Record against the label field in the pattern; unless they match literally and exactly, the overall match fails. Otherwise, if the number of fields in the input does not equal the number expected in the pattern, the match fails. Otherwise, matching proceeds structurally recursively.

  • arr patterns fail if the number of subpatterns does not match the number of items in the input Sequence. Otherwise, matching proceeds structurally recursively.

  • dict patterns consider each of the key/subpattern pairs in entries in turn, according to the Preserves order of the keys.2 If any given key from the pattern is not present in the input value, matching fails. Otherwise, matching proceeds recursively. The pattern ignores keys in the input value that are not mentioned in the pattern.
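The matching rules above can be cross-checked with a small, illustrative Python matcher. This is not the real implementation: records are modelled as `(label, fields)` tuples purely for this sketch.

```python
from dataclasses import dataclass

DISCARD = object()                    # stands for <_>

@dataclass
class Bind:                           # <bind @pattern Pattern>
    pattern: object

@dataclass
class Lit:                            # <lit @value AnyAtom>
    value: object

@dataclass
class Arr:                            # <arr @items [Pattern ...]>
    items: list

@dataclass
class Rec:                            # <rec @label any @fields [Pattern ...]>
    label: object
    fields: list

@dataclass
class Dict_:                          # <dict @entries { any: Pattern ...:... }>
    entries: dict

def match(p, v, bindings):
    """Try pattern p against value v, appending captures to bindings.
    On overall failure, the caller should discard the bindings list."""
    if p is DISCARD:
        return True
    if isinstance(p, Bind):
        bindings.append(v)            # speculatively push the capture
        if match(p.pattern, v, bindings):
            return True
        bindings.pop()                # subpattern failed: undo the capture
        return False
    if isinstance(p, Lit):
        return p.value == v
    if isinstance(p, Arr):
        return (isinstance(v, list) and len(v) == len(p.items)
                and all(match(sp, sv, bindings)
                        for sp, sv in zip(p.items, v)))
    if isinstance(p, Rec):
        if not (isinstance(v, tuple) and len(v) == 2):
            return False
        label, fields = v
        return (label == p.label and len(fields) == len(p.fields)
                and all(match(sp, sv, bindings)
                        for sp, sv in zip(p.fields, fields)))
    if isinstance(p, Dict_):          # visit keys in a deterministic order
        return (isinstance(v, dict)
                and all(k in v and match(sp, v[k], bindings)
                        for k, sp in sorted(p.entries.items())))
    return False
```

Running the worked example from above: the pattern `<arr [<lit 1> <bind <arr [<bind <_>> <_>]>> <_>]>` becomes `Arr([Lit(1), Bind(Arr([Bind(DISCARD), DISCARD])), DISCARD])`, and matching it against `[1, [2, 3], 4]` succeeds with the binding sequence `[[2, 3], 2]`.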


1

Most implementations of Syndicated Actor Model dataspaces use an efficient index datastructure described here.

2

The order in which keys of a dict pattern are visited is important because bindings are numbered in this pattern language, not named. Recall that <dict {a: <bind <_>>, b: <bind <_>>}> is an identical Preserves value to <dict {b: <bind <_>>, a: <bind <_>>}>, so to guarantee consistent binding results, we must choose some deterministic order for visiting the subpatterns of the dict. (In this example, a will be visited before b, because a < b).

Gatekeeper and Sturdy-references

Wire-protocol

The wire-protocol schema, used for communication among entities separated by a point-to-point link of some kind, is fully described as part of the Syndicate Protocol specification.

Service dependencies

Tracing

Transport addresses

Syndicated Actor Model

The Syndicated Actor Model (SAM) [Garnock-Jones 2017] is an approach to concurrency based on the Communicating Event-Loop Actor Model [De Koster et al 2016] as pioneered by E [Miller 2006] and AmbientTalk [Van Cutsem et al 2007].

While other Actor-like models take message-passing as fundamental, the SAM builds on a different underlying primitive: eventually-consistent replication of state among actors. Message-passing follows as a derived operation.

This fundamental difference integrates Tuplespace- [Gelernter and Carriero 1992] and publish/subscribe-like ideas with concurrent object-oriented programming, and makes the SAM well-suited for building programs and systems that are reactive, robust to change, and graceful in the face of partial failure.

Outline. This document first describes the primitives of SAM interaction, and then briefly illustrates their application to distributed state management and handling of partial failure. It goes on to present the idea of a dataspace, an integration of Tuplespace- and publish/subscribe-like ideas with the SAM. Finally, it discusses the SAM's generalization of object capabilities to allow for control not only over invocation of object behaviour but subscription to object state.

Throughout, we will limit discussion to interaction among actors connected directly to one another: that is, to interaction within a single scope. Scopes can be treated as "subnets" and connected together: see the Syndicate protocol specification.

For more on the SAM, on the concept of "conversational concurrency" that the model is a response to, and on other aspects of the larger project that the SAM is a part of, please see https://syndicate-lang.org/about/ and Garnock-Jones' 2017 dissertation.

Concepts and components of SAM interaction

A number of inter-related ideas must be taken together to make sense of SAM interaction. This section will outline the essentials.

For core concepts of Actor models generally, see De Koster et al.'s outstanding 2016 survey paper, which lays out a taxonomy of Actor systems as well as introducing solid definitions for terms such as "actor", "message", and so on.

Actors, Entities, Assertions and Messages

The SAM is based around actors which not only exchange messages, but publish ("assert") selected portions of their internal state ("assertions") to their peers in a publish/subscribe, reactive manner. Assertions and messages in the SAM are semi-structured data: their structure allows for pattern-matching and content-based routing.

Assertions are published and withdrawn freely throughout each actor's lifetime. When an actor terminates, all its published assertions are automatically withdrawn. This holds for both normal and exceptional termination: crashing actors are cleaned up, too.

An actor in the SAM comprises

  • an inbox, for receiving events from peers;
  • a state, "all the state that is synchronously accessible by that actor" (De Koster et al 2016);
  • a collection of entities; and
  • a collection of outbound assertions, the data to be automatically retracted upon actor termination.

The term "entity" in the SAM denotes a reactive object, owned by a specific actor.1 Entities, not actors, are the unit of addressing in the SAM. Every published assertion and every sent message is targeted at some entity. Entities never outlive their actors—when an actor terminates, its entities become unresponsive—but may have lifetimes shorter than their owning actors.

Local interactions, among objects (entities) within the state of the same actor, occur synchronously. All other interactions are considered "remote", and occur asynchronously.

Turns

Each time an event arrives at an actor's inbox, the actor takes a turn. De Koster et al. define turns as follows:

A turn is defined as the processing of a single message by an actor. In other words, a turn defines the process of an actor taking a message from its inbox and processing that message to completion.

In the SAM, a turn comprises

  • the event that triggered the turn and the entity addressed by the event,
  • the entity's execution of its response to the event, and
  • the collection of pending actions produced during execution.

If a turn proceeds to completion without an exception or other crash, its pending actions are delivered to their target entities/actors. If, on the other hand, the turn is aborted for some reason, its pending actions are discarded. This transactional "commit" or "rollback" of a turn is familiar from other event-loop-style models such as Ken [Yoo et al 2012].
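The commit-or-discard behaviour can be sketched in a few lines of Python. All names here are hypothetical, and real implementations buffer richer action structures than simple tuples.

```python
def take_turn(entity, event):
    """Run one turn: deliver an event to an entity, buffering its actions."""
    pending = []                      # actions produced during this turn
    try:
        entity.handle(event, pending.append)
    except Exception:
        return []                     # turn aborted: roll back, discard actions
    return pending                    # turn committed: these get delivered

class Greeter:
    def handle(self, event, schedule):
        schedule(("message", "peer", "hello, " + event))

class Crasher:
    def handle(self, event, schedule):
        schedule(("message", "peer", "this will never be delivered"))
        raise RuntimeError("crash mid-turn")
```

A `Greeter` turn commits its single pending message; a `Crasher` turn schedules an action but then crashes, so the action is discarded rather than delivered.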

Events and Actions

SAM events convey a new assertion, retraction of a previously-established assertion, delivery of a message, or a request for synchronisation.

In response to an event, an actor (entity) schedules actions to be performed at the end of the turn. Actions include not only publication and retraction of assertions, transmission of messages, and issuing of synchronisaton requests, but also termination of the running actor and creation of new actors to run alongside the running actor.

Entity References are Object Capabilities

As mentioned above, entities are the unit of addressing in the SAM. Assertions and message bodies may include references to entities. Actors receiving such references may then use them as targets for later assertions and messages. Entity references act as object capabilities, very similar to those offered by E [Miller 2006].

Entity references play many roles in SAM interactions, but two are of particular importance. First, entity references are used to simulate functions and continuations for encoding remote procedure calls (RPCs). Second, entity references can act like consumers or subscribers, receiving asynchronous notifications about state changes from peers.

Illustrative Examples

To show some of the potential of the SAM, we will explore two representative examples: a distributed spreadsheet, and a cellular modem server.

Spreadsheet cell

Imagine a collection of actors representing portions of a spreadsheet, each containing entities representing spreadsheet cells. Each cell entity publishes public aspects of its state to interested peers: namely, its current value. It also responds to messages instructing it to update its formula. In pseudocode:

     1: define entity Cell(formula):
     2:         subscribers ← ∅
     3:         on assertion from a peer of interest in our value,
     4:                 add peer, the entity reference carried in the assertion of interest, to subscribers
     5:         on retraction of previously-expressed interest from some peer,
     6:                 remove peer from subscribers
     7:         assert subscriptions to other Cells (using entity references in formula)
     8:         on message conveying a new formula,
     9:                 formula ← newFormula
    10:                 replace subscription assertions using references in new formula
    11:         on assertion conveying updated contents relevant to formula,
    12:                 value ← eval(formula)
    13:         continuously, whenever value or subscribers changes,
    14:                 assert the contents of value to every peer in subscribers,
    15:                 retracting previously-asserted values

Much of the subscription-management behaviour of Cell is generic: lines 2–6 managing the subscribers set and lines 13–14 iterating over it will be common to any entity wishing to allow observers to track portions of its state. This observation leads to the factoring-out of dataspaces, introduced below.

Cellular modem server

Imagine an actor implementing a simple driver for a cellular modem that accepts requests (as Hayes modem command strings) paired with continuations represented as entity references. Any responses the modem sends in reply to a command string are delivered to the continuation entity as a SAM message.

     1: define entity Server():
     2:         on assertion Request(commandString, replyEntity)
     3:                 output commandString via modem serial port
     4:                 collect response(s) from modem serial port
     5:                 send response(s) as a message to replyEntity

     6: define entity Client(serverRef):
     7:         define entity k:
     8:                 on message containing responses,
     9:                         retract the Request assertion
    10:                         (and continue with other tasks)
    11:         assert Request("AT+CMGS=...", k) to serverRef

This is almost a standard continuation-passing style encoding of remote procedure call.2 However, there is one important difference: the request is sent to the remote object not as a message, but as an assertion. Assertions, unlike messages, have a lifetime and so can act to set up a conversational frame within which further interaction can take place.

Here, subsequent interaction appears at first glance to be limited to transmission of a response message to replyEntity. But what if the Server were to crash before sending a response?

Erlang [Armstrong 2003] pioneered the use of "links" and "monitors" to detect failure of a remote peer during an interaction; "broken promises" and a suite of special system messages such as __whenBroken and __reactToLostClient [Miller 2006, chapter 17] do the same for E. The SAM instead uses retraction of previous assertions to signal failure.

To see how this works, we must step away from the pseudocode above and examine the context in which serverRef is discovered for eventual use with Client. If an assertion, rather than a message, conveys serverRef to the client actor, then when Server crashes, that assertion is automatically retracted. The client actor, interpreting the retraction as failure, can choose to respond appropriately.

The ubiquity of these patterns of service discovery and failure signalling also contributed, along with the patterns of generic publisher/subscriber state management mentioned above, to the factoring-out of dataspaces.

Dataspaces

A special kind of syndicated actor entity, a dataspace, routes and replicates published data according to actors' interests.

     1: define entity Dataspace():
     2:         allAssertions ← new Bag()
     3:         allSubscribers ← new Set()
     4:         on assertion of semi-structured datum a,
     5:                 add a to allAssertions
     6:                 if a appears exactly once now in allAssertions,
     7:                         if a matches Observe(pattern, subscriberRef),
     8:                                 add (pattern, subscriberRef) to allSubscribers
     9:                                 for x in allAssertions, if x matches pattern,
    10:                                         assert x at subscriberRef
    11:                         otherwise,
    12:                                 for (p, s) in allSubscribers, if a matches p,
    13:                                         assert a at s
    14:         on retraction of previously-asserted a,
    15:                 remove a from allAssertions
    16:                 if a no longer appears at all in allAssertions,
    17:                         retract a from all subscribers to whom it was forwarded
    18:                         if a matches Observe(pattern, subscriberRef),
    19:                                 remove (pattern, subscriberRef) from allSubscribers
    20:                                 retract all assertions previously sent to subscriberRef

Assertions sent to a dataspace are routed by pattern-matching. Subscriptions—tuples associating a pattern with a subscriber entity—are placed in the dataspace as assertions like any other.

A dataspace entity behaves very similarly to a tuplespace [Gelernter and Carriero 1992]. However, there are two key differences.

The first is that, while tuples in a tuplespace are "generative" [Gelernter 1985], taking on independent existence once created and potentially remaining in a tuplespace indefinitely, SAM assertions never outlive their asserting actors. This means that assertions placed at a dataspace only exist as long as they are actively maintained. If an actor terminates or crashes, all its assertions are withdrawn, including those targeted at a dataspace entity. The dataspace, following its definition, forwards all withdrawals on to interested subscribers.

The second is that assertion of a value is idempotent: multiple assertions of the same value3 are, to observers, indistinguishable from a single assertion. In other words, assertions at a dataspace are deduplicated.
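As a cross-check on the pseudocode definition above, here is a direct, single-threaded Python transliteration. The names are illustrative: patterns are simplified to predicate functions, subscribers to callbacks, and neither is the real protocol representation.

```python
from collections import Counter

class Observe:
    """A subscription: a pattern paired with a subscriber callback."""
    def __init__(self, pattern, subscriber):
        self.pattern = pattern        # predicate: assertion -> bool
        self.subscriber = subscriber  # callback: (change, assertion) -> None

class Dataspace:
    def __init__(self):
        self.assertions = Counter()   # the Bag: assertion -> multiplicity
        self.subscribers = []         # currently-asserted Observe records

    def assert_(self, a):
        self.assertions[a] += 1
        if self.assertions[a] == 1:   # dedup: only the first copy is visible
            if isinstance(a, Observe):
                self.subscribers.append(a)
                for x in list(self.assertions):
                    if a.pattern(x):
                        a.subscriber("assert", x)
            else:
                for ob in self.subscribers:
                    if ob.pattern(a):
                        ob.subscriber("assert", a)

    def retract(self, a):
        self.assertions[a] -= 1
        if self.assertions[a] == 0:   # last copy gone: forward the withdrawal
            del self.assertions[a]
            if isinstance(a, Observe):
                self.subscribers.remove(a)
                # (line 20 of the pseudocode, retracting assertions already
                # forwarded to this subscriber, is elided for brevity)
            else:
                for ob in self.subscribers:
                    if ob.pattern(a):
                        ob.subscriber("retract", a)
```

A subscriber arriving late still sees already-established matching assertions, and withdrawal of the last copy of an assertion is forwarded on, matching the tuplespace comparison in the surrounding text.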

Applications of dataspaces

Dataspaces have many uses. They are ubiquitous in SAM programs. The form of state replication embodied in dataspaces subsumes Erlang-style links and monitors, publish/subscribe, tuplespaces, presence notifications, directory/naming services, and so on.

Subscription management

The very essence of a dataspace entity is subscription management. Entities wishing to manage collections of subscribers can cooperate with dataspaces: they may either manage a private dataspace entity, or share a dataspace with other entities. For example, in the spreadsheet cell example above, each cell could use its own private dataspace, or all cells could share a dataspace by embedding their values in a record alongside some name for the cell.

Service directory and service discovery

Assertions placed at a dataspace may include entity references. This makes a dataspace an ideal implementation of a service directory. Services advertise their existence by asserting service presence [Konieczny et al 2009] records including their names alongside relevant entity references:

    Service("name", serviceRef)

Clients discover services by asserting interest in such records using patterns:

    Observe(⌜Service("name", _)⌝, clientRef)

Whenever some matching Service record has been asserted by a server, the dataspace asserts the corresponding record to clientRef. (The real dataspace pattern language includes binding, not discussed here; see [TODO].)

Failure signalling

Since assertions of service presence are withdrawn on failure, and withdrawals are propagated to interested subscribers, service clients like clientRef above will be automatically notified whenever serviceRef goes out of service. The same principle can also be applied in other similar settings.

Independence from service identity

There's no need to separate service discovery from service interaction. A client may assert its request directly at the dataspace; a service may subscribe to requests in the same direct way:

    (client:) ServiceRequest("name", arg1, arg2, ..., replyRef)
    (server:) Observe(⌜ServiceRequest("name", ?a, ?b, ..., ?k)⌝, serviceRef)

In fact, there are benefits to doing things this way. If the service should crash mid-transaction, then when it restarts, the incomplete ServiceRequest record will remain, and it can pick up where it left off. The client is thereby decoupled from the specific identity of the service provider: any service instance subscribed to matching requests may handle them.

Asserting interest in assertions of interest

Subscriptions at a dataspace are assertions like any other. This opens up the possibility of reacting to subscriptions:

    Observe(⌜Observe(⌜...⌝, _)⌝, r)

This allows dataspace subscribers to express interest in which other subscribers are present.

In many cases, explicit assertion of presence (via, e.g., the Service records above) is the right thing to do, but from time to time it can make sense for clients to treat the presence of some subscriber interested in their requests as sufficient indication of service presence to go ahead.4

Illustrative Examples revisited

Now that we have Dataspaces in our toolbelt, let's revisit the spreadsheet cell and cellular modem examples from above.

Spreadsheet cell with a dataspace

     1: define entity Cell(dataspaceRef, name, formula):
     2:         continuously, whenever value changes,
     3:                 assert CellValue(name, value) to dataspaceRef
     4:         continuously, whenever formula changes,
     5:                 for each name n in formula,
     6:                         define entity k:
     7:                                 on assertion of nValue,
     8:                                         value ← (re)evaluation based on formula, nValue, and other nValues
     9:                         assert Observe(⌜CellValue(n, ?nValue)⌝, k) to dataspaceRef
    10:         on message conveying a new formula,
    11:                 formula ← newFormula

The cell is able to outsource all subscription management to the dataspaceRef it is given. Its behaviour function is looking much closer to an abstract prose specification of a spreadsheet cell.

Cellular modem server with a dataspace

There are many ways to implement RPC using dataspaces,2 each with different characteristics. This implementation uses anonymous service instances, implicit service names, asserted requests, and message-based responses:

     1: define entity Server(dataspaceRef):
     2:         define entity serviceRef:
     3:                 on assertion of commandString and replyEntity
     4:                         output commandString via modem serial port
     5:                         collect response(s) from modem serial port
     6:                         send response(s) as a message to replyEntity
     7:         assert Observe(⌜Request(?commandString, ?replyEntity)⌝, serviceRef) to dataspaceRef

     8: define entity Client(dataspaceRef):
     9:         define entity k:
    10:                 on message containing responses,
    11:                         retract the Request assertion
    12:                         (and continue with other tasks)
    13:         assert Request("AT+CMGS=...", k) to dataspaceRef

If the service crashes before replying, the client's request remains outstanding, and a service supervisor [Armstrong 2003, section 4.3.2] can reset the modem and start a fresh service instance. The client remains blissfully unaware that anything untoward happened.

We may also consider a variation where the client wishes to retract or modify its request in case of service crash. To do this, the client must pay more attention to the conversational frame of its interaction with the server. In the pseudocode above, no explicit service discovery step is used, but the client could reason about the server's lifecycle by observing the (disappearance of) presence of the server's subscription to requests: Observe(⌜Observe(⌜Request(⌞_⌟, ⌞_⌟)⌝, _)⌝, ...).

Object-capabilities for access control

Object capabilities are the only properly compositional way to secure a distributed system.5 They are a natural fit for Actor-style systems, as demonstrated by E and its various descendants [Miller 2006, Van Cutsem et al 2007, Stiegler and Tie 2010, Yoo et al 2012 and others], so it makes sense that they would work well for the Syndicated Actor Model.

The main difference between SAM capabilities and those in E-style Actor models is that syndicated capabilities express pattern-matching-based restrictions on the assertions that may be directed toward a given entity, as well as the messages that may be sent its way.

Combined with the fact that subscription is expressed with assertions like any other, this yields a mechanism offering control over state replication and observation of replicated state as well as ordinary message-passing and RPC.

In the SAM, a capability is a triple of

  • target actor reference,
  • target entity reference within that actor, and
  • an attenuation describing accepted assertions and messages.

An "attenuation" is a piece of syntax including patterns over semi-structured data. When an assertion or message is directed to the underlying entity by way of an attenuated capability, the asserted value or message body is checked against the patterns in the attenuation. Values not matching are discarded silently.6

Restricting method calls. For example, a reference to the dataspace where our cellular modem server example is running could be attenuated to only allow assertions of the form Request("ATA", _). This would have the effect of limiting holders of the capability to only being able to cause the modem to answer an incoming call ("ATA").

Restricting subscriptions. As another example, a reference to the dataspace where our spreadsheet cells are running could be attenuated to only allow assertions of the form Observe(⌜CellValue("B13", _)⌝, _). This would have the effect of limiting holders of the capability to only being able to read the contents (or presence) of cell B13.
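A sketch of attenuation as a guarding wrapper follows. All names here are hypothetical, and a real attenuation is a piece of syntax containing patterns over semi-structured data, simplified here to a single predicate.

```python
class AttenuatedRef:
    """Wraps a target entity reference with an acceptance predicate."""
    def __init__(self, target, accepts):
        self.target = target          # the underlying entity reference
        self.accepts = accepts        # predicate over assertions / messages

    def assert_value(self, v):
        if self.accepts(v):           # forward matching assertions...
            self.target.assert_value(v)
        # ...and silently discard the rest, per SAM semantics

    def send_message(self, v):
        if self.accepts(v):
            self.target.send_message(v)

class RecordingEntity:
    """Test double standing in for a real entity."""
    def __init__(self):
        self.received = []
    def assert_value(self, v):
        self.received.append(v)
    def send_message(self, v):
        self.received.append(v)
```

For the modem example, `AttenuatedRef(modem, lambda v: v[:2] == ("Request", "ATA"))` lets holders cause the modem to answer incoming calls, while every other command is dropped without feedback.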

Conclusion

We have looked at the concepts involved in the Syndicated Actor Model (SAM), an Actor-like approach to concurrency that offers a form of concurrent object-oriented programming with intrinsic publish/subscribe support. The notion of a dataspace factors out common interaction patterns and decouples SAM components from one another in useful ways. Object capabilities are used in the SAM not only to restrict access to the behaviour offered by objects, but to restrict the kinds of subscriptions that can be established to the state published by SAM objects.

While we have examined some of the high level forms of interaction among entities residing in SAM actors, we have not explored techniques for effectively structuring the internals of such actors. For this, the SAM offers the concept of "facets", which relate directly to conversational contexts; for a discussion of these, see Garnock-Jones' 2017 dissertation, especially chapter 2, chapter 5, chapter 8 and section 11.1. A less formal discussion of facets can also be found on the Syndicate project website.

Bibliography

[Armstrong 2003] Armstrong, Joe. “Making Reliable Distributed Systems in the Presence of Software Errors.” PhD, Royal Institute of Technology, Stockholm, 2003. [PDF]

[De Koster et al 2016] De Koster, Joeri, Tom Van Cutsem, and Wolfgang De Meuter. “43 Years of Actors: A Taxonomy of Actor Models and Their Key Properties.” In Proc. AGERE. Amsterdam, The Netherlands, 2016. [DOI (PDF available)]

[Felleisen 1991] Felleisen, Matthias. “On the Expressive Power of Programming Languages.” Science of Computer Programming 17, no. 1–3 (1991): 35–75. [DOI (PDF available)] [PS]

[Fischer et al 1985] Fischer, Michael J., Nancy A. Lynch, and Michael S. Paterson. “Impossibility of Distributed Consensus with One Faulty Process.” Journal of the ACM 32, no. 2 (April 1985): 374–382. [DOI (PDF available)] [PDF]

[Garnock-Jones 2017] Garnock-Jones, Tony. “Conversational Concurrency.” PhD, Northeastern University, 2017. [PDF] [HTML]

[Gelernter 1985] Gelernter, David. “Generative Communication in Linda.” ACM TOPLAS 7, no. 1 (January 2, 1985): 80–112. [DOI]

[Gelernter and Carriero 1992] Gelernter, David, and Nicholas Carriero. “Coordination Languages and Their Significance.” Communications of the ACM 35, no. 2 (February 1, 1992): 97–107. [DOI]

[Karp 2015] Karp, Alan H. “Access Control for IoT: A Position Paper.” In IEEE Workshop on Security and Privacy for IoT. Washington, DC, USA, 2015. [PDF]

[Konieczny et al 2009] Konieczny, Eric, Ryan Ashcraft, David Cunningham, and Sandeep Maripuri. “Establishing Presence within the Service-Oriented Environment.” In IEEE Aerospace Conference. Big Sky, Montana, 2009. [DOI]

[Miller 2006] Miller, Mark S. “Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control.” PhD, Johns Hopkins University, 2006. [PDF]

[Morris 1968] Morris, James Hiram, Jr. “Lambda-Calculus Models of Programming Languages.” PhD thesis, Massachusetts Institute of Technology, 1968. [Available online]

[Stiegler and Tie 2010] Stiegler, Marc, and Jing Tie. “Introduction to Waterken Programming.” Technical Report. Hewlett-Packard Labs, August 6, 2010. [Available online]

[Van Cutsem et al 2007] Van Cutsem, Tom, Stijn Mostinckx, Elisa González Boix, Jessie Dedecker, and Wolfgang De Meuter. “AmbientTalk: Object-Oriented Event-Driven Programming in Mobile Ad Hoc Networks.” In Proc. XXVI Int. Conf. of the Chilean Soc. of Comp. Sci. (SCCC’07). Iquique, Chile, 2007. [DOI]

[Yoo et al 2012] Yoo, Sunghwan, Charles Killian, Terence Kelly, Hyoun Kyu Cho, and Steven Plite. “Composable Reliability for Asynchronous Systems.” In Proc. USENIX Annual Technical Conference. Boston, Massachusetts, 2012. [Talk] [PDF] [Project page]


Notes

1

The terminology used in the SAM connects to the names used in E [Miller 2006] as follows: our actors are E's vats; our entities are E's objects.

2

Many variations on RPC are discussed in section 8.7 of Garnock-Jones' 2017 dissertation (direct link to relevant section of online text).

3

Here the thorny question of the equivalence of entity references rears its head. Preserves specifies an equivalence over its Values that is generic in the equivalence over embedded values such as entity references. The ideal equivalence here would be observational equivalence [Morris 1968, Felleisen 1991]: two references are the same when they react indistinguishably to assertions and messages. However, this isn't something that can be practically implemented except in relatively limited circumstances. Fortunately, in most cases, pointer equivalence of entity references is good enough to work with, and that's what I've implemented to date (modulo details such as structural comparison of attenuations attached to a reference etc.).

4

For more on assertions of interest, see here and here.

5

Karp [2015] offers a good justification of this claim along with a worked example of object-capabilities in a personal-computing setting. The capabilities are ordinary E-style capabilities rather than SAM-style capabilities, but the conclusions hold.

6

You might be wondering "why silent discard of assertions rejected by an attenuation filter?", or more generally, "why discard assertions and messages silently on any kind of failure?" The answer is related to the famous Fischer/Lynch/Paterson (FLP) result [Fischer et al 1985], in which one cannot distinguish between a failed process and a slow process. By extending the reasoning to a process that simply ignores some or all of its inputs, we see that offering any kind of response at the SAM level in case of failure or rejection would be a false comfort, because nothing would prevent successful delivery of a message to a recipient which then simply discards it. Instead, processes have to agree ahead of time on the conversational frame in which they will communicate. The SAM encourages a programming style where assertions are used to set up a conversational frame, and then other interactions happen in the context of the information carried in those assertions; see the section where we revisit the cellular modem server with the components decoupled and placed in a conversational frame by addition of a dataspace to the system. Finally, and with all this said, debugging-level notifications of rejected or discarded messages have their place: it's just the SAM itself that does not include feedback of this kind. Implementations are encouraged to offer such aids to debugging.

Syndicate Protocol

Actors that share a local scope can communicate directly. To communicate further afield, scopes are connected using relay actors.1 Relays allow indirect communication: distant entities can be addressed as if they were local.

Relays exchange Syndicate Protocol messages across a transport. A transport is the underlying medium connecting one relay to its counterpart(s). For example, a TLS-on-TCP/IP socket may connect a pair of relays to one another, or a UDP multicast socket may connect an entire group of relays across an ethernet.2

Transport requirements

Transports must

  • be able to carry Preserves values back and forth,
  • be reliable and in-order,
  • have a well-defined session lifecycle (created → connected → disconnected), and
  • assure confidentiality, integrity, authenticity, and replay-resistance.

This document focuses primarily on point-to-point transports, discussing multicast and in-memory variations briefly toward the end.

Roles and session lifecycle

The protocol is completely symmetric, aside from certain conventions detailed below about the entities available for use immediately upon connection establishment. It is not a client/server protocol.

Session startup. To begin a session on a newly-established point-to-point link, a relay simply starts sending packets. Each peer starts the session with an empty entity reference map (see below) and with no assertions in either the outbound (on behalf of local entities) or inbound (on behalf of the remote peer) direction.

Session teardown. At the end of a session, terminated normally or abnormally, cleanly or through involuntary transport disconnection, all published assertions are retracted.3 This is in keeping with the essence of the Syndicated Actor Model (SAM).

Packet definitions

Packets exchanged by relays are Preserves values defined using Preserves schema.

Packet = Turn / Error / Extension .

A packet may be a turn, an error, or an extension.

Packets are neither commands nor responses; they are events.

Extension packets

Extension = <<rec> @label any @fields [any ...]> .

An extension packet must be a Preserves record, but is otherwise unconstrained.

Handling. Peers MUST ignore extensions that they do not understand.4

Error packets

Error = <error @message string @detail any>.

Handling. An error packet describes something that went wrong on the other end of the connection. Error packets are primarily intended for debugging.

Receipt of an error packet denotes that the sender has terminated (crashed) and will not respond further; the connection will usually be closed shortly thereafter.

Error packets are optional: connections may simply be closed without comment.

Turn packets

Turn      = [TurnEvent ...].
TurnEvent = [@oid Oid @event Event].
Event     = Assert / Retract / Message / Sync .

Assert = <assert @assertion Assertion @handle Handle>.
Retract = <retract @handle Handle>.
Message = <message @body Assertion>.
Sync = <sync @peer #!#t>.

Assertion = any .
Handle    = int .
Oid       = int .

A Turn is the most important packet variant. It directly reflects the SAM notion of a turn.

Handling. Each Turn carries events to be delivered to entities residing in the scope at the receiving end of the transport. Each event is either publication of an assertion, retraction of a previously-published assertion, delivery of a single message, or a synchronization event.

Upon receipt of a Turn, the sequence of TurnEvents is examined. The OID in each TurnEvent selects an entity known to the recipient. If a particular TurnEvent's OID is not mapped to an entity, the TurnEvent is silently ignored, and the remaining TurnEvents in the Turn are processed.

The assertion fields of Assert events and the body fields of Message events may contain any Preserves value, including embedded entity references. On the wire, these will always be formatted as described below. As each Assert or Message is processed, embedded references are mapped to internal references. Symmetrically, internal references are mapped to their external form prior to transmission. The mapping procedure to follow is detailed below.

Turn boundaries. In the case that the receiving party is structured internally using the SAM, it is important to preserve turn boundaries. Since turn boundaries are a per-actor concept, but a Turn mentions only entities, the receiver must map entities to actors, group TurnEvents into per-actor queues, and deliver those queues to each actor in a single SAM turn for each actor.

Uniqueness. The Handles used to refer to published assertions MUST be unique within the scope of the transport connection.
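The shape of a Turn can be sketched in Python, modelling Preserves records as tagged tuples. This encoding, and the helper names, are inventions of this sketch; a real relay would use a Preserves implementation for serialization.

```python
# Model Preserves records as ("label", field, ...) tuples -- a hypothetical
# encoding for illustration only.

def make_assert(assertion, handle):
    return ("assert", assertion, handle)

def make_message(body):
    return ("message", body)

def make_turn(events):
    # A Turn is a sequence of [oid, event] pairs (TurnEvents).
    return [[oid, event] for (oid, event) in events]

next_handle = 0
def fresh_handle():
    # Handles must be unique within the scope of the transport connection.
    global next_handle
    next_handle += 1
    return next_handle

h = fresh_handle()
turn = make_turn([
    (0, make_assert(("Observe", "some-pattern"), h)),  # publish to entity at OID 0
    (0, make_message(("ping",))),                      # one-shot message to OID 0
])
```

A relay would encode `turn` as a Preserves `Turn` packet and write it to the transport; retracting the assertion later requires only the handle, not the assertion itself.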

Capabilities on the wire

References embedded in Turn packets denote capabilities for interacting with some entity.

For example, assertion of a capability-bearing record could appear as the following Event (the @handle field is omitted here for brevity):

<assert <please-reply-to #![0 555]>>

The #![0 555] is concrete Preserves text syntax for an embedded (#!) value ([0 555]).

In the Syndicate Protocol, these embedded values MUST conform to the WireRef schema:5

WireRef = @mine [0 @oid Oid] / @yours [1 @oid Oid @attenuation Caveat ...].
Oid = int .

The mine variant denotes capability references managed by the sender of a given packet; the yours variant, the receiver of the packet. A relay receiving a packet mentioning #![0 555] will use #![1 555] in later responses that refer to that same entity, and vice versa.

Attenuation of authority

A yours-variant capability may include a request6 to impose additional conditions on the receiver's use of its own capability, known as an attenuation of the capability's authority.

An attenuation is a chain of Caveats.7 A Caveat acts as a function that, given a Preserves value representing an assertion or message body, yields either a possibly-rewritten value, or no value at all.8 In the latter case, the value has been rejected. In the former case, the rewritten value is used as input to the next Caveat in the chain, or as the final assertion or message body for delivery to the entity backing the capability.

The chain of Caveats in an attenuation is written down in reverse order: newer Caveats are appended to the sequence, and each Caveat's output is fed into the input of the next leftward Caveat in the sequence. If no Caveats are present, the capability is unattenuated, and inputs are passed through to the backing capability unmodified.

Caveat = Rewrite / Alts .

Rewrite = <rewrite @pattern Pattern @template Template>.
Alts = <or @alternatives [Rewrite ...]>.

A Caveat can be either a single Rewrite or a sequence of alternative possible rewrites, tried in left-to-right order until one of them accepts the input or there are none left to try. (A single Rewrite R is equivalent to <or [R]>.)

A Rewrite applies its Pattern to the input to the Caveat. If it matches, the bindings captured by the pattern are gathered together and used in instantiation of the Rewrite's Template, yielding the output from the Caveat. If the pattern does not match, the Rewrite has rejected the input, and other alternatives are tried until none remain, at which point the whole Caveat has rejected the input and processing of the triggering event stops.
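The right-to-left evaluation order of a Caveat chain can be sketched by modelling each Caveat as a plain function from value to value-or-None, standing in for the Rewrite structures above:

```python
def apply_chain(caveats, value):
    # Newer caveats sit at the right-hand end of the sequence and run
    # first: iterate right to left, feeding each output into the next.
    for caveat in reversed(caveats):
        value = caveat(value)
        if value is None:          # rejection silently stops processing
            return None
    return value

only_pings = lambda v: v if v[0] == "ping" else None   # reject non-pings
stamp      = lambda v: v + ("stamped",)                # rewrite: add a field

# `stamp` was attached first; `only_pings` was appended later:
chain = [stamp, only_pings]
apply_chain(chain, ("ping",))   # -> ("ping", "stamped")
apply_chain(chain, ("pong",))   # -> None (rejected by the newest caveat)
```

Note that the most recently added Caveat filters first, so an attenuated capability can always be further restricted but never widened by appending more Caveats.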

Patterns

A Pattern within a rewrite can be any of the following variants:

Pattern = PDiscard / PAtom / PEmbedded / PBind / PAnd / PNot / Lit / PCompound .

Wildcard. PDiscard matches any value:

PDiscard = <_>.

Atomic type. PAtom requires that a matched value be a boolean, a single- or double-precision float, an integer, a string, a binary blob, or a symbol, respectively:

PAtom = =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol .

Embedded value. PEmbedded requires that a matched value be an embedded capability:

PEmbedded = =Embedded .

Binding. PBind first captures the matched value, adding it to the bindings vector, and then applies the nested pattern. If the subpattern matches, the PBind succeeds; otherwise, it fails:

PBind = <bind @pattern Pattern>.

Conjunction. PAnd is a conjunction of patterns; every pattern in patterns must match for the PAnd to match:

PAnd = <and @patterns [Pattern ...]>.

Negation. PNot is a pattern negation: if pattern matches, the PNot fails to match, and vice versa. It is an error for pattern to include any PBind subpatterns.

PNot = <not @pattern Pattern>.

Literal. Lit is an exact match pattern. If the matched value is exactly equal to value (according to Preserves' own built-in equivalence relation), the match succeeds; otherwise, it fails:

Lit = <lit @value any>.

Compound. Finally, PCompound patterns match compound data structures. The rec variant demands that a matched value be a record, with label exactly equal to label and fields one-for-one matching the Patterns in fields; the arr variant demands a sequence, with each element matching the corresponding element of items; and dict demands a dictionary having at least entries named by the keys of the entries dictionary, each matching the corresponding Pattern.

PCompound =
    / @rec <rec @label any @fields [Pattern ...]>
    / @arr <arr @items [Pattern ...]>
    / @dict <dict @entries { any: Pattern ...:... }> .

Bindings

Matching notionally produces a sequence of values, one for each PBind in the pattern.

When a PBind pattern is seen, the matcher first appends the matched value to the binding sequence and then recurses on the nested subpattern. This makes binding indexes appear in left-to-right order as a Pattern is read.

Example. Given the pattern <bind <arr [<bind <_>>, <bind <_>>]>> and the matched value ["a" "b"], the resulting captured values are, in order, ["a" "b"], "a", and "b"; the template <ref 0> will be instantiated to ["a" "b"], <ref 1> to "a", and <ref 2> to "b".
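A minimal matcher covering just the PDiscard, PBind, and arr-style PCompound variants reproduces this capture order. The tagged-tuple encoding of patterns is an assumption chosen for the sketch:

```python
def match(pattern, value, bindings):
    tag = pattern[0]
    if tag == "_":                       # PDiscard: match anything
        return True
    if tag == "bind":                    # PBind: capture first, then recurse
        bindings.append(value)
        return match(pattern[1], value, bindings)
    if tag == "arr":                     # PCompound/arr: element-wise match
        items = pattern[1]
        if not isinstance(value, list) or len(value) != len(items):
            return False
        return all(match(p, v, bindings) for p, v in zip(items, value))
    return False

# <bind <arr [<bind <_>> <bind <_>>]>> matched against ["a" "b"]:
pat = ("bind", ("arr", [("bind", ("_",)), ("bind", ("_",))]))
caps = []
match(pat, ["a", "b"], caps)
# caps is now [["a", "b"], "a", "b"]: <ref 0> denotes the whole array,
# <ref 1> denotes "a", and <ref 2> denotes "b".
```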

Templates

A Template within a rewrite produces a concrete Preserves value when instantiated with a vector of captured binding values. Template instantiation may fail, yielding no value.

A given Template may be any of the following variants:

Template = TAttenuate / TRef / Lit / TCompound .

TAttenuate first instantiates the sub-template. If it yields a value, and if that value is an embedded reference (i.e. a capability), the Caveats in attenuation are appended to the (possibly-empty) sequence of Caveats already present in the embedded capability. The resulting possibly-attenuated capability is the final result of instantiation of the TAttenuate.

TAttenuate = <attenuate @template Template @attenuation [Caveat ...]>.

TRef retrieves the captured value at (0-based) index binding in the bindings vector, yielding it as the result of instantiation. It is an error if binding is less than zero, or greater than or equal to the number of bindings in the bindings vector.

TRef = <ref @binding int>.

Lit (the same definition as used in the grammar for Pattern above) instantiates to exactly its value argument:

Lit = <lit @value any>.

Finally, TCompound instantiates to compound data. The rec variant produces a record with the given label and fields; arr produces an array; and dict a dictionary:

TCompound =
    / @rec <rec @label any @fields [Template ...]>
    / @arr <arr @items [Template ...]>
    / @dict <dict @entries { any: Template ...:... }> .

Validity of Caveats

The above definitions imply some validity constraints on Caveats.

  • All TRefs must be bound: the index referred to must relate to the index associated with some PBind in the pattern corresponding to the template.

  • Binding under negation is forbidden: a pattern within a PNot may not include any PBind constructors.

  • The value produced by instantiation of template within a TAttenuate must be an embedded reference (a capability).

Implementations MUST enforce these constraints (either statically or dynamically).
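A dynamic check for the first two constraints might look like the following sketch. The tagged-tuple encodings of patterns and templates, and the function names, are assumptions made for illustration:

```python
def count_binds(pattern, under_not=False):
    # Returns the number of PBind captures in `pattern`;
    # raises if a PBind occurs under a PNot (binding under negation).
    tag = pattern[0]
    if tag == "bind":
        if under_not:
            raise ValueError("PBind inside PNot is forbidden")
        return 1 + count_binds(pattern[1], under_not)
    if tag == "not":
        return count_binds(pattern[1], True)
    if tag in ("and", "arr"):
        return sum(count_binds(p, under_not) for p in pattern[1])
    return 0  # discard, atom, lit, etc. capture nothing

def check_refs(template, n_bindings):
    # Every TRef index must fall within the pattern's binding count.
    tag = template[0]
    if tag == "ref":
        if not (0 <= template[1] < n_bindings):
            raise ValueError("unbound TRef")
    elif tag in ("rec", "arr"):
        for t in template[-1]:
            check_refs(t, n_bindings)

n = count_binds(("bind", ("arr", [("bind", ("_",)), ("_",)])))
check_refs(("ref", 1), n)   # ok: two bindings, index 1 is in range
```

An implementation could equally enforce these constraints statically, when a Caveat is first installed, rather than on every rewrite.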

Membranes

Every relay maintains two stateful objects called membranes. A membrane is a bidirectional mapping between OID and relay-internal entity pointer. Membranes connect embedded references on the wire to entity references local to the relay.

  • The import membrane connects OIDs managed by the remote peer to local relay entities which proxy access to an "imported" remote entity.

  • The export membrane connects OIDs managed by the local peer to any local "exported" entities accessible to the peer.

[Figure: Two relays connected by a transport exchanging packets. Each relay maintains an import membrane and an export membrane; the membranes map wire-level IDs such as "my 3" and "your 7" to local entity pointers such as 0x1234 and 0xa043, so that entities exported by one relay appear as imports on the other.]

Logically, a membrane's state can be represented as a set of WireSymbol structures: a WireSymbol is a triple of an OID, a local reference pointer (its ref), and a reference count. There is never more than one WireSymbol associated with an OID or a ref.

A WireSymbol exists only so long as some assertion mentioning its OID exists across the relay link. When the last assertion mentioning an OID is retracted, its WireSymbol is deleted. Assertions mentioning a particular OID can come from either side of the relay link: initially, a local reference is sent to the peer in an assertion, but then the peer may assert something back, either targeting or mentioning the same entity. Care must be taken not to release an OID entry prematurely in such situations.

For example, at least the following contribute to a WireSymbol's reference count:

  • The initial entry mapping a local entity ref to a well-known OID for use at session startup (see below) contributes a permanent reference.

  • Mention of an OID in a received or sent TurnEvent adds one to the OID's reference count for the duration of processing of the event. For Assert events in either direction, the duration of processing is until the assertion is later retracted. For received Message events, the duration of processing is until the incoming message has been forwarded on to the target ref.
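The refcounting discipline for WireSymbols might be sketched as follows. The class and method names are invented for this illustration; a real membrane also maintains the reverse ref-to-OID index used during transmission:

```python
class Membrane:
    # Bidirectional OID <-> ref mapping with reference counts.
    def __init__(self):
        self.by_oid = {}   # oid -> (ref, count): the WireSymbol set
        self.by_ref = {}   # ref -> oid

    def grab(self, oid, make_ref):
        # Increment the count for `oid`, creating a WireSymbol on first use.
        if oid in self.by_oid:
            ref, count = self.by_oid[oid]
            self.by_oid[oid] = (ref, count + 1)
        else:
            ref = make_ref()
            self.by_oid[oid] = (ref, 1)
            self.by_ref[ref] = oid
        return self.by_oid[oid][0]

    def drop(self, oid):
        # Decrement; delete the WireSymbol when nothing mentions it anymore.
        ref, count = self.by_oid[oid]
        if count == 1:
            del self.by_oid[oid]
            del self.by_ref[ref]
        else:
            self.by_oid[oid] = (ref, count - 1)

m = Membrane()
r1 = m.grab(5, lambda: object())   # first Assert mentioning OID 5
r2 = m.grab(5, lambda: object())   # second mention: same ref, count now 2
m.drop(5)                          # first retraction
m.drop(5)                          # last retraction: WireSymbol deleted
```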

"Transient" references. Embedded references in Message event bodies are special. Because messages, unlike assertions, have no notion of lifetime—they are forwarded and forgotten—it is not possible for a message to cause establishment of a long-lived entry in a membrane's WireSymbol set. Therefore, messages MUST NOT embed any reference not previously known to the peer (a "transient reference"). In other words, only after using an assertion to introduce a reference, associating a conversational context with its lifetime, is it permitted to discuss the reference using messages. A relay receiving a message bearing a transient reference MUST terminate the session with an error. A relay about to send such a message SHOULD preemptively refuse to do so.

Rewriting embedded references upon receipt

When processing a Value v in a received Assert or Message event, embedded references in v are decoded from their on-the-wire WireRef form to in-memory ref-pointer form.

The value is recursively traversed. As the relay comes across each embedded WireRef,

  • If it is of mine variant, it refers to an entity exported by the remote, sending peer. Its OID is looked up in the import membrane.

    • If no WireSymbol exists in the import membrane, one is created, mapping the OID to a fresh relay entity for the OID.

    • If a WireSymbol is already present, its associated ref is substituted into v.

  • If it is of yours variant, it refers to an entity previously exported by the local, receiving peer. Its OID is looked up in the export membrane.

    • If no WireSymbol exists for the OID, one is created, associating the OID with a dummy inert entity ref. The dummy ref is substituted into v. It will later be released once the reference count of the WireSymbol drops to zero.

    • If a WireSymbol exists for the OID, and the WireRef is not attenuated, the associated ref is substituted into v. If the WireRef is attenuated, the associated ref is wrapped with the Caveats from the WireRef before its substitution into v.

  • In each case, the WireSymbol associated with the OID has its reference count incremented (if an Assert is being processed).

Rewriting embedded references for transmission

When transmitting a Value v in an Assert or Message event, embedded references in v are encoded from their in-memory ref-pointer form to on-the-wire WireRef form.

The value is recursively traversed. As the relay comes across each embedded reference:

  • The reference is first looked up in the export membrane. If an associated WireSymbol is present in the export membrane, its OID is substituted as a mine-variant WireRef into v.

  • Otherwise, it is looked up in the import membrane. If no associated WireSymbol exists there, a fresh OID and WireSymbol are placed in the export membrane, and the new OID is substituted as a mine-variant WireRef into v. If a WireSymbol exists in the import membrane, however, the embedded reference must be a local relay entity referencing a previously-imported remote entity:

    • If the local entity reference has not been attenuated subsequent to its import, the OID it was imported under is substituted as a yours-variant WireRef into v with an empty attenuation.

    • If it has been attenuated, the relay may choose whether to trust the remote party to enforce an attenuation request. If it trusts the peer to honour attenuation requests, it substitutes a yours-variant WireRef with non-empty attenuation into v. Otherwise, a fresh OID and WireSymbol are placed in the export membrane, with ref denoting the attenuated local reference, and the new OID is substituted as a mine-variant WireRef into v.

Relay entities

A relay entity is a local proxy for an entity at the other side of a relay link. It forwards events delivered to it—assert, retract, message and sync—across the link to its counterpart at the other end. It holds two pieces of state: a pointer to the relay link, and the OID of the remote entity it represents. It packages all received events into TurnEvents which are then sent across the transport.

Turn boundaries. When the relay is structured internally using the SAM, it is important to preserve turn boundaries. When all the relay entities of a given relay instance are managed by a single actor, this will be natural: a single turn can deliver events to a group of entities in the actor, so if the relay entity enqueues its TurnEvents in a buffer which is flushed into a Turn packet sent across the transport at the conclusion of the turn, the correct turn boundaries will be preserved.
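The buffer-and-flush discipline for preserving turn boundaries could be sketched like this (class and method names are hypothetical; "transmission" is modelled by collecting Turn packets in a list):

```python
class RelayLink:
    def __init__(self):
        self.buffer = []   # TurnEvents accumulated during the current turn
        self.sent = []     # Turn packets "transmitted" so far

    def enqueue(self, oid, event):
        self.buffer.append([oid, event])

    def flush(self):
        # Called at the conclusion of a SAM turn: one Turn packet per turn.
        if self.buffer:
            self.sent.append(self.buffer)
            self.buffer = []

class RelayEntity:
    # Local proxy: holds a pointer to the relay link and the remote OID.
    def __init__(self, link, oid):
        self.link, self.oid = link, oid

    def on_assert(self, assertion, handle):
        self.link.enqueue(self.oid, ("assert", assertion, handle))

    def on_message(self, body):
        self.link.enqueue(self.oid, ("message", body))

link = RelayLink()
a, b = RelayEntity(link, 1), RelayEntity(link, 2)
a.on_assert("present", 10)   # events delivered during one SAM turn...
b.on_message("hello")
link.flush()                 # ...become a single Turn packet on the wire
```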

Client and server roles

While the protocol itself is symmetric, in many cases there will be one active ("client") and one passive ("server") party during the establishment of a transport connection.

As an optional convention, a "server" MAY have a single entity exposed as well-known OID 0 at the establishment of a connection, and a "client" MAY likewise expect OID 0 to resolve to some pre-arranged entity. It is frequently useful for the pre-arranged entity to be a gatekeeper service, but direct exposure of a dataspace or even some domain-specific object can also be useful. Each party to a connection may play one role, the other, neither, or both.

APIs for making use of relays in programs should permit programs to supply to a newly-constructed relay an (optional) initial ref, to be exposed as well-known OID 0; an (optional) initial OID, to denote a remote well-known OID and to be immediately proxied by a local relay entity; or both.

In the case of TCP/IP, the "client" role is often played by a connecting party, and the "server" by a listening party, but the opposite arrangement is also useful from time to time.

Security considerations

The security considerations for this protocol fall into two categories: those having to do with particular transports for relay instances, and those having to do with the protocol itself.

Transport security

The security of an instance of the protocol depends on the security characteristics of its transport.

Confidentiality. Parties outwith the communicating peers must not be able to deduce the contents of packets sent back and forth: some of the packets may contain secrets. For example, a Resolve message sent to a gatekeeper service contains a "bearer capability", which conveys authority to any holder able to present it to the gatekeeper.

Integrity. Packets delivered to peers must be proof against tampering or other in-flight damage.

Authenticity. Each packet delivered to a peer must have genuinely originated with another party, and must have genuinely originated in the same session. Forgery of packets must be prevented.

Replay-resistance. Each packet delivered to a peer must be delivered exactly once within the context of the transport session. That is, replay of otherwise-authentic packets must not be possible from outside the session.

Protocol security

The protocol builds on, and directly reflects, the object-capability security model of the SAM. Entities are accessed via unforgeable references (OIDs). OIDs are meaningful only within the context of their transport session; in this way, they are analogous to Unix file descriptors, which are small integers that meaningfully denote objects only within the context of a single Unix process. If the transport is secure, so is the reference.

Entities can only obtain references to other entities by the standard methods by which "connectivity begets connectivity"; namely:

  • By initial conditions. The relevant initial conditions here are the state of the relays at the moment a transport session is established, including any mappings from well-known OIDs to their underlying refs.

  • By parenthood and by endowment. No direct provision is made for creation of new entities in this protocol, so these do not apply.

  • By introduction. Transmission of OIDs in Turn packets, and the associated rules for managing the mappings between OIDs and references, are the normal method by which references pass from one entity to another.

While transport confidentiality is important for preserving secrecy of secrets such as bearer capabilities, OIDs do not need this kind of protection. An attacker able to observe OIDs communicated via a transport does not gain authority to deliver events to the denoted entity. At most, the attacker may glean information on patterns of interconnectivity among entities communicating across a transport link.

Relation to CapTP

This protocol is strikingly similar to a family of protocols known as CapTP (see, for example, here, here and here). This is no accident: the Syndicated Actor Model draws heavily on the actor model, and has over the years been incrementally evolving to be closer and closer to the actor model as it appears in the E programming language. However, the Syndicate protocol described in this document was developed based on the needs of the Syndicated Actor Model, without particular reference to CapTP. This makes it all the more striking that the similarities should be so strong. No doubt I have been subconsciously as well as consciously influenced by E's design, but perhaps there might also be a Platonic form awaiting discovery somewhere nearby.

For example:

  • CapTP has the notion of a "c-list [capability list] index", cognate with our OID. A c-list index is meaningful only within the context of a transport connection, just like an OID is. A given c-list index maps to a "live-ref", an in-memory pointer to an object, in the same way that an OID maps to a ref via a WireSymbol.

  • CapTP has "the four tables" at each end of a connection; each of our relays has two membranes, each having two unidirectional mapping tables.

  • Syndicate gatekeeper services borrow the concept of a SturdyRef directly from CapTP. However, the notion of a gatekeeper entity at well-known OID 0 is an example of convergent evolution in action: in the CapTP world, the analogous service happens also to be available at c-list index 0, by convention.

A notable difference is that this protocol completely lacks support for the promises/futures of CapTP. CapTP c-list indices are just one part of a framework of descriptors (descs) denoting various kinds of remote object and eventual remote-procedure-call (RPC) result. The SAM handles RPC in a different, more low-level way.

Specific transport mappings

For now, this document focuses on SOCK_STREAM-like transports: reliable, in-order, bidirectional, connection-oriented, full-duplex byte streams. While these transports naturally have a certain level of integrity assurance and replay-resistance associated with them, special care should be taken in the case of non-cryptographic transport protocols like plain TCP/IP.

To use such a transport for this protocol, establish a connection and begin transmitting Packets encoded as Preserves values using either the Preserves text syntax or the Preserves binary syntax. The session starts with the first packet and ends with transport disconnection. If either peer in a connection detects a syntax error, it MUST disconnect the transport. A responding server MUST support the binary syntax, and MAY also support the text syntax. It can autodetect the syntax variant by following the rules in the specification: the first byte of a valid binary-syntax Preserves document is guaranteed not to be interpretable as the start of a valid UTF-8 sequence.

Packets encoded in either binary or text syntax are self-delimiting. However, peers using text syntax MAY choose to insert whitespace (e.g. newline) after each transmitted packet.

Some domain-specific details are also relevant:

  • Unix-domain sockets. An additional layer of authentication checks can be made based on process-ID and user-ID credentials associated with each Unix-domain socket.

  • TCP/IP sockets. Plain TCP/IP sockets offer only weak message integrity and replay-resistance guarantees, and offer no authenticity or confidentiality guarantees at all. Plain TCP/IP sockets SHOULD NOT be used; consider using TLS sockets instead.

  • TLS atop TCP/IP. An additional layer of authentication checks can be made based on the signatures and certificates exchanged during TLS setup.

    TODO: concretely develop some recommendations for ordinary use of TLS certificates, including referencing a domain name in a SturdyRef, checking the presented certificate, and requiring SNI at the server end.

  • WebSockets atop HTTP 1.x. These suffer similar flaws to plain TCP/IP sockets and SHOULD NOT be used.

  • WebSockets atop HTTPS 1.x. Similar considerations to the use of TLS sockets apply regarding authentication checks. WebSocket messages are self-delimiting; peers MUST place exactly one Packet in each WebSocket message. Since (a) WebSockets are established after a standard HTTP(S) message header exchange, (b) every HTTP(S) request header starts with an ASCII letter, and (c) every Packet in text syntax begins with the ASCII "<" character, it is possible to autodetect use of a WebSocket protocol multiplexed on a server socket that is also able to handle plain Preserves binary and/or text syntax for Packets: any ASCII character between "A" and "Z" or "a" and "z" must be HTTP, an ASCII "<" must be Preserves text syntax, and any byte with the high bit set must be Preserves binary syntax.
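The autodetection rule just described amounts to a classifier over the first byte received on a multiplexed server socket. A sketch (a real server would then hand the connection to the matching protocol handler):

```python
def classify_first_byte(b):
    # b: integer value of the first byte received on the socket.
    if (0x41 <= b <= 0x5A) or (0x61 <= b <= 0x7A):  # 'A'-'Z' or 'a'-'z'
        return "http"               # start of an HTTP request header
    if b == 0x3C:                   # '<'
        return "preserves-text"     # a Packet in Preserves text syntax
    if b >= 0x80:                   # high bit set
        return "preserves-binary"   # a Packet in Preserves binary syntax
    return "unknown"                # not valid for any of the three

classify_first_byte(ord("G"))   # -> "http" (e.g. "GET / HTTP/1.1")
classify_first_byte(ord("<"))   # -> "preserves-text"
classify_first_byte(0xB4)       # -> "preserves-binary"
```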

Appendix: Complete schema of the protocol

The following is a consolidated form of the definitions from the text above.

Protocol packets

The authoritative version of this schema is [syndicate-protocols]/schemas/protocol.prs.

version 1 .

Packet = Turn / Error / Extension .

Extension = <<rec> @label any @fields [any ...]> .

Error = <error @message string @detail any>.

Assertion = any .
Handle    = int .
Event     = Assert / Retract / Message / Sync .
Oid       = int .
Turn      = [TurnEvent ...].
TurnEvent = [@oid Oid @event Event].

Assert = <assert @assertion Assertion @handle Handle>.
Retract = <retract @handle Handle>.
Message = <message @body Assertion>.
Sync = <sync @peer #!#t>.

Capabilities, WireRefs, and attenuations

The authoritative version of this schema is [syndicate-protocols]/schemas/sturdy.prs.

version 1 .

Attenuation = [Caveat ...].

Caveat = Rewrite / Alts .
Rewrite = <rewrite @pattern Pattern @template Template>.
Alts = <or @alternatives [Rewrite ...]>.

Oid = int .
WireRef = @mine [0 @oid Oid] / @yours [1 @oid Oid @attenuation Caveat ...].

Lit = <lit @value any>.

Pattern = PDiscard / PAtom / PEmbedded / PBind / PAnd / PNot / Lit / PCompound .
PDiscard = <_>.
PAtom = =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol .
PEmbedded = =Embedded .
PBind = <bind @pattern Pattern>.
PAnd = <and @patterns [Pattern ...]>.
PNot = <not @pattern Pattern>.
PCompound =
    / @rec <rec @label any @fields [Pattern ...]>
    / @arr <arr @items [Pattern ...]>
    / @dict <dict @entries { any: Pattern ...:... }> .

Template = TAttenuate / TRef / Lit / TCompound .
TAttenuate = <attenuate @template Template @attenuation Attenuation>.
TRef = <ref @binding int>.
TCompound =
    / @rec <rec @label any @fields [Template ...]>
    / @arr <arr @items [Template ...]>
    / @dict <dict @entries { any: Template ...:... }> .

Appendix: Pseudocode for attenuation, pattern matching, and template instantiation

Attenuation

def attenuate(attenuation, value):
    for caveat in reversed(attenuation):
        value = applyCaveat(caveat, value)
        if value is None:
            return None
    return value

def applyCaveat(caveat, value):
    if caveat is 'Alts' variant:
        for rewrite in caveat.alternatives:
            possibleResult = tryRewrite(rewrite, value)
            if possibleResult is not None:
                return possibleResult
        return None
    if caveat is 'Rewrite' variant:
        return tryRewrite(caveat, value)

def tryRewrite(rewrite, value):
    bindings = applyPattern(rewrite.pattern, value)
    if bindings is None:
        return None
    else:
        return instantiateTemplate(rewrite.template, bindings)

Pattern matching

def match(pattern, value, bindings):
    if pattern is 'PDiscard' variant:
        return True
    if pattern is 'PAtom' variant:
        return True if value is of the appropriate atomic class else False
    if pattern is 'PEmbedded' variant:
        return True if value is a capability else False
    if pattern is 'PBind' variant:
        append value to bindings
        return match(pattern.pattern, value, bindings)
    if pattern is 'PAnd' variant:
        for p in pattern.patterns:
            if not match(p, value, bindings):
                return False
        return True
    if pattern is 'PNot' variant:
        return False if match(pattern.pattern, value, bindings) else True
    if pattern is 'Lit' variant:
        return (pattern.value == value)
    if pattern is 'PCompound' variant:
        if pattern is 'rec' variant:
            if value is not a record: return False
            if value.label is not equal to pattern.label: return False
            if value.fields.length is not equal to pattern.fields.length: return False
            for i in [0 .. pattern.fields.length):
                if not match(pattern.fields[i], value.fields[i], bindings):
                    return False
            return True
        if pattern is 'arr' variant:
            if value is not a sequence: return False
            if value.length is not equal to pattern.items.length: return False
            for i in [0 .. pattern.items.length):
                if not match(pattern.items[i], value[i], bindings):
                    return False
            return True
        if pattern is 'dict' variant:
            if value is not a dictionary: return False
            for k in keys of pattern.entries:
                if k not in keys of value: return False
                if not match(pattern.entries[k], value[k], bindings):
                    return False
            return True
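The matching pseudocode can be transcribed into runnable Python. Patterns are represented here as tagged tuples such as `('bind', subpattern)` rather than Preserves records — a representation chosen for brevity, not mandated by the spec — and the PAtom and PEmbedded variants are omitted, since checking atomic classes and capabilities depends on the host Preserves implementation.

```python
# Sketch only: patterns as tagged tuples, e.g. ('rec', label, [subpatterns]).
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Record:
    label: Any
    fields: List[Any]

def match(pattern, value, bindings):
    tag = pattern[0]
    if tag == 'discard':          # PDiscard: matches anything
        return True
    if tag == 'bind':             # PBind: capture, then match subpattern
        bindings.append(value)
        return match(pattern[1], value, bindings)
    if tag == 'and':              # PAnd: all subpatterns must match
        return all(match(p, value, bindings) for p in pattern[1])
    if tag == 'not':              # PNot: invert the subpattern's result
        return not match(pattern[1], value, bindings)
    if tag == 'lit':              # Lit: exact equality
        return pattern[1] == value
    if tag == 'rec':              # PCompound/rec: label and all fields match
        return (isinstance(value, Record)
                and value.label == pattern[1]
                and len(value.fields) == len(pattern[2])
                and all(match(p, f, bindings)
                        for p, f in zip(pattern[2], value.fields)))
    if tag == 'arr':              # PCompound/arr: same length, items match
        return (isinstance(value, list)
                and len(value) == len(pattern[1])
                and all(match(p, v, bindings)
                        for p, v in zip(pattern[1], value)))
    if tag == 'dict':             # PCompound/dict: each key present and matching
        return (isinstance(value, dict)
                and all(k in value and match(p, value[k], bindings)
                        for k, p in pattern[1].items()))
    raise ValueError(f"unknown pattern variant: {tag!r}")

def apply_pattern(pattern, value):
    # Returns the sequence of captured bindings, or None on match failure.
    bindings = []
    return bindings if match(pattern, value, bindings) else None
```

For instance, matching `('rec', 'Present', [('bind', ('discard',))])` against `Record('Present', ['tonyg'])` yields the single binding `'tonyg'`; the same pattern against a record with a different label yields None.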

Template instantiation

def instantiate(template, bindings):
    if template is 'TAttenuate' variant:
        c = instantiate(template.template, bindings)
        if c is not a capability: raise an exception
        c′ = c with the caveats in template.attenuation appended to the existing
             attenuation in c
        return c′
    if template is 'TRef' variant:
        if 0 ≤ template.binding < bindings.length:
            return bindings[template.binding]
        else:
            raise an exception
    if template is 'Lit' variant:
        return template.value
    if template is 'TCompound' variant:
        if template is 'rec' variant:
            return Record(label=template.label,
                          fields=[instantiate(t, bindings) for t in template.fields])
        if template is 'arr' variant:
            return [instantiate(t, bindings) for t in template.items]
        if template is 'dict' variant:
            result = {}
            for k in keys of template.entries:
                result[k] = instantiate(template.entries[k], bindings)
            return result
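The instantiation pseudocode can likewise be sketched in runnable Python, again using tagged tuples for templates (an illustrative representation, not the spec's). The TAttenuate variant is omitted because it requires a capability type carrying an attenuation chain, which is host-implementation-specific.

```python
# Sketch only: templates as tagged tuples, e.g. ('rec', label, [subtemplates]).
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Record:
    label: Any
    fields: List[Any]

def instantiate(template, bindings):
    tag = template[0]
    if tag == 'ref':              # TRef: look up a captured binding by index
        index = template[1]
        if 0 <= index < len(bindings):
            return bindings[index]
        raise IndexError(f"binding {index} out of range")
    if tag == 'lit':              # Lit: embed the literal value
        return template[1]
    if tag == 'rec':              # TCompound/rec: rebuild the record
        return Record(template[1],
                      [instantiate(t, bindings) for t in template[2]])
    if tag == 'arr':              # TCompound/arr: rebuild the sequence
        return [instantiate(t, bindings) for t in template[1]]
    if tag == 'dict':             # TCompound/dict: rebuild the dictionary
        return {k: instantiate(t, bindings)
                for k, t in template[1].items()}
    raise ValueError(f"unknown template variant: {tag!r}")
```

For instance, instantiating `('rec', 'says', [('ref', 0), ('lit', 'hello')])` with bindings `['tonyg']` produces `Record('says', ['tonyg', 'hello'])`; an out-of-range TRef index raises an exception, as the pseudocode requires.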

Notes

1

Strictly speaking, scope subnets are connected by relay actors. The situation is directly analogous to IP subnets being connected by IP routers.

2

In fact, it makes perfect sense to run the relay protocol between actors that are already connected in some scope: this is like running a VPN, tunnelling IP over IP. A variation of the Syndicate Protocol like this gives federated dataspaces.

3

This process of assertion-retraction on termination is largely automatic when relay actors are structured internally using the SAM: simply terminating a SAM actor automatically retracts its published assertions.

4

This specification does not define any extensions, but future revisions could, for example, use extensions to perform version-negotiation. Another potential future use could be to propagate provenance information for tracing/debugging.

5

The syntax for WireRefs is slightly silly, using tuples as quasi-records with 0 and 1 acting as quasi-labels. It would probably be better to use real records, like <my @oid Oid> and <yours @oid Oid @attenuation [Caveat ...]>. Pros: less cryptic. Cons: slightly more verbose on the wire. TODO: should we revise the spec in this regard?

6

Such conditions can only ever be requests: after all, every yours-capability is already completely accessible to the recipient of the packet. Similarly, it does not make sense to include an attenuation description on a my-capability. However, in every case, if a party wishes to enforce an attenuation on a my- or yours-capability, it may record the attenuation against the underlying capability internally, issuing to its peers a fresh my-capability denoting the attenuated capability.

7

This terminology, "caveat", is lifted from the excellent paper on Macaroons, where it is used to describe a more general mechanism. Future versions of this specification may opt to include some of this generality.

8

TODO: It might be better to have a Caveat yield zero or more values? That way they can act as filters. I've sometimes wanted the multiple-value case, though I've so far been able to work around its lack. TODO: Perhaps it would also make sense to have a Caveat map an event to zero or more events, rather than to values? Tricky corners there include ensuring that carried authority isn't misused; macaroons are a very elegant solution to this problem, of course, so maybe the macaroon design idea could be adapted to this. For now, ValueOption<Value> is probably OK.