Synit is a Reactive Operating System
Welcome!
Synit is an experiment in applying pervasive reactivity and object capabilities to the System Layer of an operating system for personal computers, including laptops, desktops, and mobile phones. Its architecture follows the principles of the Syndicated Actor Model.
Synit builds upon the Linux kernel, but replaces many pieces of familiar Linux software, including systemd, NetworkManager, D-Bus, and so on. It makes use of many concepts that will be familiar to Linux users, but also incorporates many ideas drawn from programming languages and operating systems not closely connected with Linux's Unix heritage.
- Project homepage: https://synit.org/
- Source code: https://git.syndicate-lang.org/synit/
Quickstart
If you have a mobile phone or computer capable of running PostmarketOS, then you can install the software to try it out. You can also run Synit inside a virtual machine.
See the installation instructions for a list of supported devices.
Acknowledgements
Much initial work on Synit was made possible by a generous grant from the NLnet Foundation as part of the NGI Zero PET programme. Please see "Structuring the System Layer with Dataspaces (2021)" for details of the funded project.
Copyright and License
This manual is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © 2021–2022 Tony Garnock-Jones tonyg@leastfixedpoint.com.
The Synit programs and source code are separately licensed. Please see the source code for details.
Architecture
The Syndicated Actor Model (SAM) is at the heart of Synit. In turn, the SAM builds upon E-style actors, replacing message-exchange with eventually-consistent state replication as the fundamental building block for interaction. Both E and the SAM are instances of the Object Capability (ocap) model, a compositional approach to system security.
The "feel" of the system is somewhere between Smalltalk-style object-orientation, publish-subscribe programming, E- or Erlang-style actor interaction, Prolog-style logic programming, and Esterel-style reactive dataflow.
- Programs are Actors. Synit programs ("actors" in the SAM) interoperate by dataspace-mediated exchange of messages and replication of conversational state expressed as assertions.
- Ocaps for security and privacy. The ocap model provides the fundamental building blocks for secure composition of programs in the system. Synit extends the core ocap model with Macaroon-inspired attenuation of capabilities, for both limiting visibility of state and constraining access to behaviour.
- Reactivity and homeostasis. Programs publish relevant aspects of their internal state to peers (usually by placing assertions in a dataspace). Peers subscribe to those assertions, reacting to changes in state to preserve overall system equilibrium.
- Heterogeneous; "open". Different programming languages and styles interoperate freely. Programs may or may not be structured internally using SAM principles: the system as a whole is where the architectural principles are applied. However, it often makes good sense to use SAM principles within a given Synit program as well as between programs.
- Language-neutral. Where possible, programs interoperate via a simple protocol across transports like TCP/IP, WebSockets, and Unix sockets and pipes. Otherwise, they interoperate using traditional Unix techniques. The concrete syntax for the messages and assertions exchanged among programs is the Preserves data language.
- Strongly typed. Preserves Schemas describe the data exchanged among programs. Schemas compile to type definitions in various programming languages, helping give an ergonomic development experience as well as ensuring safety at runtime.
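For flavour, a schema definition in the Preserves schema language looks roughly like the following. This example is invented for this manual (it defines a record labelled `present` whose single string field is bound to the name `username`); Synit's real schemas ship in the `preserves-schemas` and `syndicate-schemas` packages:

```preserves
version 1 .
Present = <present @username string> .
```

A schema compiler then turns `Present` into a host-language type with a `username` field, giving the type-checked ergonomics described above.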
Source code, Building, and Installation
The initial application of Synit is to mobile phones.
As such, in addition to regular system layer concepts, Synit supports concepts from mobile telephony: calls, SMSes, mobile data, headsets, speakerphone, hotspots, battery levels and charging status, and so on.
Synit builds upon many existing technologies, but primarily relies on the following:
- PostmarketOS. Synit builds on PostmarketOS, replacing only a few core packages. All of PostmarketOS and Alpine Linux are available underneath Synit.
- Preserves. The Preserves data language and its associated schema and query languages are central to Synit.
- Syndicate. Syndicate is an umbrella project for tools and specifications related to the Syndicated Actor Model (the SAM).
You will need
- A Linux development system. (I use Debian testing/unstable.)
- Rust nightly and Cargo (perhaps installed via rustup).
- Python 3.9 or greater.
- `git`, `ssh`, `rsync`.
- Make, a C compiler, and so on; standard Unix programming tools.
- For cross builds (e.g. the very common case of building for `aarch64` on an `x86_64` host), `qemu` and its `binfmt` support. On Debian, `apt install binfmt-support qemu-user-static`. (NB. Version `1:7.0+dfsg-7` of `qemu-user-static` has a bug (possibly this one) which makes Docker-based cross builds hang. Downgrading `qemu-user-static` to version `1:5.2+dfsg-11+deb11u2` worked for me.)
- Source code for Synit components (see below).
- A standard PostmarketOS distribution for the target computer or mobile phone. If you don't want to install on actual hardware, you can use a virtual machine. See the instructions for installing PostmarketOS.
- Great tolerance for the possibility of soft-bricking your phone. This is experimental software! When it breaks, you'll often have to (at least) reinstall PostmarketOS from absolute scratch on the machine. I do lots of development using `qemu-amd64` for this reason.
Get the code
The Synit codebase itself is contained in the `synit` git repository:

```shell
git clone https://git.syndicate-lang.org/synit/synit
```
See the README for an overview of the contents of the repository.
Synit depends on published packages for Preserves and Syndicate support in each of the many programming languages it uses. These will be automatically found and downloaded during the Synit build process, but you can find details on the Preserves and Syndicate homepages, respectively.
For the Smalltalk-based phone-management and UI part of the system, you will need a number of other tools. See the README for the `squeak-phone` repository:

```shell
git clone https://git.syndicate-lang.org/tonyg/squeak-phone
```
Build the packages
To build, type `make ARCH=<architecture>` in the root of your checkout, where `<architecture>` is one of:

- `aarch64` (default), for e.g. Pinephone or Samsung Galaxy S7 deployment
- `x86_64`, for e.g. `qemu-amd64` deployment
If you see errors of the form "`exec /bin/sh: exec format error`" while building, say, the `aarch64` packages using an `x86_64` build host, you need to install qemu's `binfmt` support. See above.
The result of the build will be a collection of Alpine Linux `apk` packages in `packaging/target/packages/<architecture>/`. At the time of writing, these include:

- `preserves-schemas`, common schema files for working with general Preserves data and schemas
- `preserves-tools`, standard command-line tools for working with Preserves documents (pretty-printer, document query processor, etc.)
- `py3-preserves`, Python support libraries for Preserves
- `py3-syndicate`, Python support for the Syndicated Actor Model
- `squeak-cog-vm` and `squeak-stack-vm`, Squeak Smalltalk virtual machines for the Smalltalk-based portions of the system
- `syndicate-schemas`, common schema files for working with the Syndicated Actor Model
- `syndicate-server`, package for the core system bus
- `synit-config`, main package for Synit, with configuration files, `init` scripts, system daemons and so on
- `synit-pid1`, PID 1 program for Synit that starts the core system bus and then becomes passive
Install PostmarketOS on your system
Follow the instructions for your device on the PostmarketOS wiki.
Boot and connect your device to your development machine. Make sure you can `ssh` into it.
Upload Synit packages
Use `scripts/upload-bundle.sh` to `rsync` the ingredients for transforming stock PostmarketOS into Synit over to the phone.
Run the transformation script
Use `ssh` to log into your phone. Run `./transmogrify.sh`. (If your user's password on the phone is anything other than `user`, you will have to run `SUDOPASS=yourpassword ./transmogrify.sh`.)
This will install the Synit packages. After this step is complete, next time you boot the system, it will boot into Synit. It may very well be unbootable at this point, depending on the state of the codebase! Make sure you know how to restore to a stock PostmarketOS installation.
Install the Smalltalk parts of the system (optional)
If you want to experiment with the Smalltalk-based modem support and UI, follow the instructions in the squeak-phone README now.
Reboot and hope
With luck, you'll see the Smalltalk user interface start up. (If you didn't install the UI, you should still be able to `ssh` into the system.) From here, you can operate the system normally, following the information in the next chapter.
Glossary
Action
In the Syndicated Actor Model, an action may be performed by an actor during a turn. Actions are quasi-transactional, taking effect only if their containing turn is committed.
Four core varieties of action, each with a matching variety of event, are offered across all realisations of the SAM:
- An assertion action publishes an assertion at a target entity. A unique handle names the assertion action so that it may later be retracted. For more detail, see below on Assertions.
- A retraction action withdraws a previously-published assertion from the target entity.
- A message action sends a message to a target entity.
- A synchronization action carries a local entity reference to a target entity. When it eventually reaches the target, the target will (by default) immediately reply with a simple acknowledgement to the entity reference carried in the request. For more detail, see below on Synchronization.
Besides the four core actions, many individual implementations offer action variants such as the following:
- A spawn action will, when the containing turn commits, create a new actor running alongside the acting party. In many implementations, spawned actors may optionally be linked to the spawning actor.
- Replacement of a previously-established assertion, "altering" the target entity reference and/or payload. This proceeds, conventionally, by establishment of the new assertion followed immediately by retraction of the old.
Finally, implementations may offer pseudo-actions whose effects are local to the acting party:
- Creation of a new facet.
- Creation of a new entity reference associated with the active facet denoting a freshly-created local entity.
- Shutdown (stopping) of the active facet or any other facet within the acting party.
- Stopping of the current actor, either gracefully or with a simulated crash.
- Creation of a new field/cell/dataflow variable.
- Creation of a new dataflow block.
- Creation of a new linked task associated with the active facet.
- Scheduling of a new one-off or periodic alarm.
Active Facet
The facet associated with the event currently being processed in an active turn.
Actor
In the Syndicated Actor Model, an actor is an isolated thread of execution. An actor repeatedly takes events from its mailbox, handling each in a turn. In many implementations of the SAM, each actor is internally structured as a tree of facets.
Alarm
See timeout.
Assertion
- *verb.* To assert (or to publish) a value is to choose a target entity and perform an action conveying an assertion to that entity.
- *noun.* An assertion is a value carried as the payload of an assertion action, denoting a relevant portion of a public aspect of the conversational state of the sending party that it has chosen to convey to the recipient entity.
The value carried in an assertion may, in some implementations, depend on one or more dataflow variables; in those implementations, when the contents of such a variable changes, the assertion is automatically withdrawn, recomputed, and re-published (with a fresh handle).
Attenuation
To attenuate a capability (yielding an attenuated capability), a sequence of filters is prepended to the possibly-empty list of filters attached to an existing capability. Each filter either discards, rewrites, or accepts unchanged any payload directed at the underlying capability. A special pattern language exists in the Syndicate network protocol for describing filters; many implementations also allow in-memory capabilities to be filtered by the same language.
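The filter chain can be modelled in a few lines of Python. This is an illustrative sketch, not Synit's implementation: the `Ref` class and `attenuate` function are invented here, and filters are plain Python functions rather than the protocol's pattern language:

```python
class Ref:
    """A reference: a target entity plus a list of filters applied in order."""
    def __init__(self, entity, filters=()):
        self.entity = entity
        self.filters = list(filters)

    def deliver(self, payload):
        # Each filter may discard (return None) or rewrite the payload.
        for f in self.filters:
            payload = f(payload)
            if payload is None:
                return  # discarded: the payload never reaches the entity
        self.entity(payload)

def attenuate(ref, *new_filters):
    """Prepend filters: the new checks run before any existing ones."""
    return Ref(ref.entity, list(new_filters) + ref.filters)

# Example: an entity that records what it receives.
seen = []
full = Ref(seen.append)
read_only = attenuate(full, lambda p: p if p[0] == "get" else None)

full.deliver(("set", "x", 1))       # accepted via the unattenuated ref
read_only.deliver(("set", "x", 2))  # silently discarded by the filter
read_only.deliver(("get", "x"))     # accepted
# seen == [("set", "x", 1), ("get", "x")]
```

Because `attenuate` prepends rather than appends, an attenuated reference can only ever be further restricted, never widened.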
Capability
(a.k.a. Cap) Used roughly interchangeably with "reference", connoting a security-, access-control-, or privacy-relevant aspect.
Cell
See dataflow variable.
Compositional
To quote the Stanford Encyclopedia of Philosophy, the "principle of compositionality" can be understood to be that
The meaning of a complex expression is determined by its structure and the meanings of its constituents.
People often implicitly intend "... and nothing else." For example, when I claim that the object-capability model is a compositional approach to system security, I mean that the access conveyed by an assemblage of capabilities can be understood in terms of the access conveyed by each individual capability taken in isolation, and nothing else.
Configuration Scripting Language
Main article: The Configuration Scripting Language
The `syndicate-server` program includes a scripting language, used for configuration of the server and its clients, population of initial dataspaces for the system that the `syndicate-server` instance is part of, and scripting of simple behaviours in reaction to appearance of assertions or transmission of messages.
The scripting language is documented here.
Conversational State
The collection of facts and knowledge held by a component participating in an ongoing conversation about some task that the component is undertaking:
The conversational state that accumulates as part of a collaboration among components can be thought of as a collection of facts. First, there are those facts that define the frame of a conversation. These are exactly the facts that identify the task at hand; we label them “framing knowledge”, and taken together, they are the “conversational frame” for the conversation whose purpose is completion of a particular shared task. Just as tasks can be broken down into more finely-focused subtasks, so can conversations be broken down into sub-conversations. In these cases, part of the conversational state of an overarching interaction will describe a frame for each sub-conversation, within which corresponding sub-conversational state exists. The knowledge framing a conversation acts as a bridge between it and its wider context, defining its “purpose” in the sense of the [Gricean Cooperative Principle]. [The following figure] schematically depicts these relationships.
Some facts define conversational frames, but every shared fact is contextualized within some conversational frame. Within a frame, then, some facts will pertain directly to the task at hand. These, we label “domain knowledge”. Generally, such facts describe global aspects of the common problem that remain valid as we shift our perspective from participant to participant. Other facts describe the knowledge or beliefs of particular components. These, we label “epistemic knowledge”.
— Excerpt from Chapter 2 of (Garnock-Jones 2017). The quoted section continues here.
In the Syndicated Actor Model, there is often a one-to-one correspondence between a facet and a conversational frame, with fate-sharing employed to connect the lifetime of the one with the lifetime of the other.
Dataflow
A programming model in which changes in stored state automatically cause re-evaluation of computations depending on that state. The results of such re-evaluations are themselves often used to update a store, potentially triggering further re-computation.
In the Syndicated Actor Model, dataflow appears in two guises: first, at a coarse granularity, among actors and entities in the form of changes in published assertions; and second, at fine granularity, many implementations include dataflow variables and dataflow blocks for intra-actor dataflow-based management of conversational state and related computation.
Dataflow Block
Implementations of the Syndicated Actor Model often include some language feature or library operation for marking a portion of code as participating in dataflow, where changes in observed dataflow variables cause re-evaluation of the code block.
For example, in a Smalltalk implementation of the SAM:

```smalltalk
a := Turn active cell: 1.
b := Turn active cell: 2.
sum := Turn active cell: 0.
Turn active dataflow: [sum value: a value + b value].
```

Later, as `a` and `b` have their values updated, `sum` will automatically be updated by re-evaluation of the block given to the `dataflow:` method.
Analogous code can be written in TypeScript:

```typescript
field a: number = 1;
field b: number = 2;
field sum: number = 0;
dataflow {
  sum.value = a.value + b.value;
}
```
in Racket:

```racket
(define-field a 1)
(define-field b 2)
(define/dataflow sum (+ (a) (b)))
```
in Python:

```python
a = turn.field(1)
b = turn.field(2)
sum = turn.field(0)

@turn.dataflow
def maintain_sum():
    sum.value = a.value + b.value
```
and in Rust:

```rust
turn.dataflow(|turn| {
    let a_value = turn.get(&a);
    let b_value = turn.get(&b);
    turn.set(&sum, a_value + b_value);
})
```
Dataflow Variable
(a.k.a. Field, Cell) A dataflow variable is a store for a single value, used with dataflow blocks in dataflow programming.
When the value of a dataflow variable is read, the active dataflow block is marked as depending on the variable; and when the value of the variable is updated, the variable is marked as damaged, leading eventually to re-evaluation of dataflow blocks depending on that variable.
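The cycle just described — reads record dependencies, writes mark damage, damage triggers re-evaluation — can be sketched in Python. This is a toy model written for this manual (the names `Engine` and `Field` are invented here); real SAM implementations integrate the same cycle with turns and facets:

```python
class Engine:
    """Tracks which dataflow block is evaluating and which fields changed."""
    def __init__(self):
        self.active_block = None  # block currently being evaluated, if any
        self.damaged = set()      # fields written since the last repair

    def field(self, value):
        return Field(self, value)

    def dataflow(self, block):
        self._evaluate(block)     # first run records the block's dependencies
        return block

    def _evaluate(self, block):
        prev, self.active_block = self.active_block, block
        try:
            block()
        finally:
            self.active_block = prev

    def repair(self):
        # Re-evaluate every block depending on a damaged field, repeatedly,
        # until the damage stops propagating.
        while self.damaged:
            dirty, self.damaged = self.damaged, set()
            for block in {b for f in dirty for b in f.observers}:
                self._evaluate(block)

class Field:
    """A dataflow variable: a store for a single value."""
    def __init__(self, engine, value):
        self.engine, self._value, self.observers = engine, value, set()

    @property
    def value(self):
        active = self.engine.active_block
        if active is not None:
            self.observers.add(active)     # reading records a dependency
        return self._value

    @value.setter
    def value(self, v):
        if v != self._value:
            self._value = v
            self.engine.damaged.add(self)  # writing marks the field damaged

e = Engine()
a, b, total = e.field(1), e.field(2), e.field(0)
e.dataflow(lambda: setattr(total, "value", a.value + b.value))
a.value = 10
e.repair()  # total.value is now 12
```

Note that re-evaluation here is deferred until `repair` is called; in real implementations this happens automatically as part of turn processing.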
Dataspace
In the Syndicated Actor Model, a dataspace is a particular class of entity with prescribed behaviour. Its role is to route and replicate published assertions according to the declared interests of its peers.
See here for a full explanation of dataspaces.
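The routing-and-replication role can be sketched in Python. This is a deliberately tiny model written for this manual: patterns are plain Python predicates rather than the real dataspace pattern language, and the `Dataspace` class is invented for illustration:

```python
class Dataspace:
    def __init__(self):
        self.assertions = {}  # handle -> asserted value
        self.observers = {}   # handle -> (predicate, callback)
        self.next_handle = 0

    def _fresh_handle(self):
        self.next_handle += 1
        return self.next_handle

    def assert_(self, value):   # trailing underscore: `assert` is a keyword
        h = self._fresh_handle()
        self.assertions[h] = value
        for predicate, callback in self.observers.values():
            if predicate(value):
                callback("assert", value)
        return h                # the handle is needed to retract later

    def retract(self, handle):
        value = self.assertions.pop(handle)
        for predicate, callback in self.observers.values():
            if predicate(value):
                callback("retract", value)

    def observe(self, predicate, callback):
        # A new observer immediately learns about already-matching assertions.
        h = self._fresh_handle()
        self.observers[h] = (predicate, callback)
        for value in self.assertions.values():
            if predicate(value):
                callback("assert", value)
        return h

ds = Dataspace()
log = []
h = ds.assert_(("temperature", 21))
ds.observe(lambda v: v[0] == "temperature",
           lambda event, value: log.append((event, value)))
ds.retract(h)
# log == [("assert", ("temperature", 21)), ("retract", ("temperature", 21))]
```

The handle returned by `assert_` also illustrates the glossary entry for handles: it is the scope-unique name that correlates a later retraction with the original assertion.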
Dataspace Pattern
In the Syndicated Actor Model, a dataspace pattern is a structured value describing a pattern over other values.
TODO: link to documentation
E
The E programming language is an object-capability model Actor language that has strongly influenced the Syndicated Actor Model.
Many good sources exist describing the language and its associated philosophy, including:
- The ERights.org website, the home of E
- E (programming language) on Wikipedia
- Miller, Mark S. “Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control.” PhD, Johns Hopkins University, 2006. [PDF]
- Miller, Mark S., E. Dean Tribble, and Jonathan Shapiro. “Concurrency Among Strangers.” In Proc. Int. Symp. on Trustworthy Global Computing, 195–229. Edinburgh, Scotland, 2005. [DOI] [PDF]
Embedded References
In the Syndicated Actor Model, the values carried by assertions and messages may include references to entities. Because the SAM uses Preserves as its data language, the Preserves concept of an embedded value is used in the SAM to reliably mark portions of a datum referring to SAM entities.
Concretely, in Preserves text syntax, embedded values appear prepended with `#!`. In messages transferred across links using the Syndicate network protocol, references might appear as `#![0 123]`, `#![1 555]`, and so on.
Entity
In the Syndicated Actor Model, an entity is a stateful programming-language construct, located within an actor, that is the target of events. Each entity has its own behaviour, specifying in code how it responds to incoming events.
An entity is the SAM analogue of "object" in E-style languages: an addressable construct logically contained within and fate-sharing with an actor. The concept of "entity" differs from "object" in that entities are able to respond to assertions, not just messages.
In many implementations of the SAM, entities fate-share with individual facets within their containing actor rather than with the actor as a whole: when the facet associated with an entity is stopped, the entity becomes unresponsive.
Erlang
Erlang is a process-style Actor language that has strongly influenced the Syndicated Actor Model. In particular, Erlang's approach to failure-handling, involving supervisors arranged in supervision trees and processes (actors) connected via links and monitors, has been influential on the SAM. In the SAM, links and monitors become special cases of assertions, and Erlang's approach to process supervision is used directly and is an important aspect of SAM system organisation.
Event
In the Syndicated Actor Model, an event is processed by an entity during a turn, and describes the outcome of an action taken by some other actor.
Events come in four varieties corresponding to the four core actions in the SAM:
- An assertion event notifies the recipient entity of an assertion published by some peer. A unique handle names the event so that later retraction of the assertion can be correlated with the assertion event.
- A retraction event notifies the recipient entity of withdrawal of a previously-published assertion.
- A message event notifies the recipient entity of a message sent by some peer.
- A synchronization event, usually not handled explicitly by an entity, carries an entity reference. The recipient should arrange for an acknowledgement to be delivered to the referenced entity once previously-received events that might modify the recipient's state (or the state of a remote entity that it is proxy for) have been completely processed. For more detail, see below on Synchronization.
Facet
In many implementations of the Syndicated Actor Model, a facet is a programming-language construct representing a conversation and corresponding to a conversational frame. Facets are similar to the "nested threads" of Martin Sústrik's idea of Structured Concurrency (see also Wikipedia).
Every actor is structured as a tree of facets. (Compare and contrast with the diagram in the entry for Conversational State.)
Every facet is either "running" or "stopped". Each facet is the logical owner of zero or more entities as well as of zero or more published assertions. A facet's entities and published assertions share its fate. While a facet is running, its associated entities are responsive to incoming events; when it stops, its entities become permanently unresponsive. A stopped facet never starts running again. When a facet is stopped, all its assertions are retracted and all its subfacets are also stopped.
Facets may have stop handlers associated with them: when a facet is stopped, its stop handlers are executed, one at a time. The stop handlers of each facet are executed before the stop handlers of its parent and before its assertions are withdrawn.
Facets may be explicitly stopped by a stop action, or implicitly stopped when an actor crashes. When an actor crashes, its stop handlers are not run: stop handlers are for orderly processing of conversation termination. Instead, many implementations allow actors to have associated crash handlers which run only in case of an actor crash. In the limit, of course, even crash handlers cannot be guaranteed to run, because the underlying hardware or operating system may suffer some kind of catastrophic failure.
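The shutdown ordering described above can be modelled in Python (an invented, minimal `Facet` class, ignoring turns, entities and crash-stops; the `trace` list stands in for observable effects):

```python
class Facet:
    def __init__(self, parent=None, name=""):
        self.name = name
        self.children = []
        self.stop_handlers = []
        self.assertions = []      # things to retract on stop
        self.running = True
        if parent is not None:
            parent.children.append(self)

    def on_stop(self, handler):
        self.stop_handlers.append(handler)

    def stop(self, trace):
        if not self.running:
            return                # a stopped facet never runs again
        self.running = False
        for child in self.children:     # subfacets stop first
            child.stop(trace)
        for h in self.stop_handlers:    # handlers run before retraction
            h()                         # and before the parent's handlers
        for a in self.assertions:
            trace.append(("retract", a))

trace = []
root = Facet(name="root")
child = Facet(root, name="child")
root.assertions.append("root-assertion")
child.assertions.append("child-assertion")
root.on_stop(lambda: trace.append("root handler"))
child.on_stop(lambda: trace.append("child handler"))
root.stop(trace)
# trace == ["child handler", ("retract", "child-assertion"),
#           "root handler", ("retract", "root-assertion")]
```

Stopping `root` runs the child's stop handler first, then retracts the child's assertions, and only then does the same for the parent, matching the ordering guarantees stated above.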
Fate-sharing
A design principle from large-scale network engineering, due to David Clark:
The fate-sharing model suggests that it is acceptable to lose the state information associated with an entity if, at the same time, the entity itself is lost.
— David D. Clark, “The Design Philosophy of the DARPA Internet Protocols.” ACM SIGCOMM Computer Communication Review 18, no. 4 (August 1988): 106–14. [DOI]
In the Syndicated Actor Model, fate-sharing is used in connecting the lifetime of conversational state with the programming language representation of a conversational frame, a facet.
Field
See dataflow variable.
Handle
In the Syndicated Actor Model, every assertion action (and the corresponding event) includes a scope-lifetime-unique handle that denotes the specific action/event concerned, for purposes of later correlation with a retraction action.
Handles are, in many cases, implemented as unsigned integers, allocated using a simple scope-wide counter.
Initial OID
In the Syndicate network protocol, the initial OID is a special OID value understood by prior arrangement to denote an entity (specified by the "initial ref") owned by some remote peer across some network medium. The initial OID of a session is used to bootstrap activity within that session.
Initial Ref
In the Syndicate network protocol, the initial ref is a special entity reference associated by prior arrangement with an initial OID in a session in order to bootstrap session activity.
Linked Actor
Many implementations of the Syndicated Actor Model offer a feature whereby an actor can be spawned so that its root facet is linked to the spawning facet in the spawning actor, so that when one terminates, so does the other (by default).
Links are implemented as a pair of "presence" assertions, atomically established at the time of the spawn action, each indicating to a special entity with "stop on retraction" behaviour the presence of its peer. When one of these assertions is withdrawn, the targeted entity stops its associated facet, automatically terminating any subfacets and executing any stop handlers.
This allows a "parent" actor to react to termination of its child, perhaps releasing associated resources, and the corresponding "child" actor to be automatically terminated when the facet in its parent that spawned the actor terminates.
This idea is inspired by Erlang, whose "links" are symmetric, bidirectional, failure-propagating connections among Erlang processes (actors) and whose "monitors" are unidirectional connections similar to the individual "presence" assertions described above.
Linked Task
Many implementations of the Syndicated Actor Model offer the ability to associate a facet with zero or more native threads, coroutines, objects, or other language-specific representations of asynchronous activities. When such a facet stops (either by explicit stop action or by crash-termination of the facet's actor), its linked tasks are also terminated. By default, the converse is also the case: a terminating linked task will trigger termination of its associated facet. This allows for resource management patterns similar to those enabled by the related idea of linked actors.
Macaroon
A macaroon is an access token for authorization of actions in distributed systems. Macaroons were introduced in the paper:
“Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud.”, by Arnar Birgisson, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. In Proc. Network and Distributed System Security Symposium (NDSS), 2014. [PDF]
In the Syndicated Actor Model, a variation of the macaroon concept is used to represent "sturdyrefs". A sturdyref is a long-lived token authorizing interaction with some entity, which can be upgraded to a live entity reference by presenting it to a gatekeeper entity (TODO: link) across a session of the Syndicate network protocol. (The term "sturdyref" is lifted directly from the E language and associated ecosystem.)
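The chained-HMAC construction at the heart of macaroons is small enough to sketch with Python's standard library. This is a generic illustration of the technique, not Synit's actual sturdyref format (and it checks only signatures, not the meaning of caveats):

```python
import hmac, hashlib

def sig(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, identifier: bytes):
    # A token is its identifier, a list of caveats, and a chained signature.
    return (identifier, [], sig(root_key, identifier))

def attenuate(token, caveat: bytes):
    identifier, caveats, s = token
    # Anyone holding a token can add a caveat, but no one can remove one:
    # each caveat's signature is keyed by the previous signature.
    return (identifier, caveats + [caveat], sig(s, caveat))

def verify(root_key: bytes, token) -> bool:
    identifier, caveats, s = token
    expected = sig(root_key, identifier)
    for caveat in caveats:
        expected = sig(expected, caveat)
    return hmac.compare_digest(expected, s)

t = mint(b"server-secret", b"user=alice")
t2 = attenuate(t, b"op=read-only")
# verify(b"server-secret", t) and verify(b"server-secret", t2) both hold;
# tampering with a caveat, or dropping one, invalidates the signature.
```

Only the holder of the root key (the gatekeeper, in the sturdyref setting) can verify tokens, while any holder of a token can attenuate it further offline.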
Mailbox
Every actor notionally has a mailbox which receives events resulting from its peers' actions. Each actor spends its existence waiting for an incoming event to appear in its mailbox, removing the event, taking a turn to process it, and repeating the cycle.
Membrane
A membrane is a structure used in implementations of the Syndicate network protocol to keep track of wire symbols.
Message
In the Syndicated Actor Model, a message is a value carried as the payload or body of a message action (and associated event), conveying transient information from some sending actor to a recipient entity.
Network
A network is a group of peers (actors), plus a medium of communication (a transport), an addressing model (references), and an associated scope.
Object-Capability Model
The Object-capability model is a compositional means of expressing access control in a distributed system. It has its roots in operating systems research stretching back decades, but was pioneered in a programming language setting by the E language and the Scheme dialect W7.
In the Syndicated Actor Model, object-capabilities manifest as potentially-attenuated entity references.
Observe
In the Syndicated Actor Model, assertion of an `Observe` record at a dataspace declares an interest in receiving notifications about matching assertions and messages as they are asserted, retracted and sent through the dataspace.

Each `Observe` record contains a dataspace pattern describing a structural predicate over assertion and message payloads, plus a reference to the entity which should be informed as matching events appear at the dataspace.
OID
An OID is an "object identifier", a small, session-unique integer acting as an entity reference across a transport link in an instance of the Syndicate network protocol.
Publishing
To publish something is to assert it; see assertion.
Preserves
Main article: Preserves
Many implementations of the SAM use Preserves, a programming-language-independent language for data, as the language defining the possible values that may be exchanged among entities in assertions and messages.
See the chapter on Preserves in this manual for more information.
Record
The Preserves data language defines the notion of a record, a tuple containing a label and zero or more numbered fields. The dataspace pattern language used by dataspaces allows for patterns over records as well as over other compound data structures.
Reference
(a.k.a. Ref, Entity Reference, Capability) A reference is a pointer or handle denoting a live, stateful entity running within an actor. The entity accepts Preserves-format messages and/or assertions. The capability may be attenuated to restrict the messages and assertions that may be delivered to the denoted entity by way of this particular reference.
Retraction
In the Syndicated Actor Model, a retraction is an action (and corresponding event) which withdraws a previous assertion. Retractions can be explicitly performed within a turn, or implicitly performed during facet shutdown or actor termination (both normal termination and crash stop).
The SAM guarantees that an actor's assertions will be retracted when it terminates, no matter whether an orderly shutdown or an exceptional or crashing situation was the cause.
Relay
A relay connects scopes, allowing references to denote entities resident in remote networks, making use of the Syndicate network protocol to do so.
See the Syndicate network protocol for more on relays.
Relay Entity
A relay entity is a local proxy for an entity at the other side of a relay link. It forwards events delivered to it across its transport to its counterpart at the other end.
See the Syndicate network protocol for more on relay entities.
S6
S6, "Skarnet's Small Supervision Suite", is
a small suite of programs for UNIX, designed to allow process supervision (a.k.a service supervision), in the line of daemontools and runit, as well as various operations on processes and daemons.
Synit uses `s6-log` to capture standard error output from the root system bus.
Schema
A schema defines a mapping between values and host-language types in various programming languages. The mapping describes how to parse values into host-language data, as well as how to unparse host-language data, generating equivalent values. Another way of thinking about a schema is as a specification of the allowable shapes for data to be used in a particular context.
Synit, and many programs making use of the Syndicated Actor Model, uses Preserves' schema language to define schemas for many different applications.
For more, see the section on schemas in the chapter on Preserves.
Scope
A scope maps refs to the entities they denote. Scopes exist in one-to-one relationship to networks. Because message bodies and asserted values contain embedded references, each message and assertion transmitted via some network is also inseparable from its scope.
Most actors will participate in a single scope. However, relay actors participate in two or more scopes, translating refs back and forth as messages and assertions traverse the relay.
Examples.
- A process is a scope for in-memory values: in-memory refs contain direct pointers to entities, which cannot be interpreted outside the context of the process's address space. The "network" associated with the process's scope is the intra-process graph of object references.
- A TCP/IP socket (or serial link, or WebSocket, or Unix socket, etc.) is a scope for values travelling between two connected processes: refs on the wire denote entities owned by one or the other of the two participants. The "network" for a socket's scope is exactly the two connected peers (NB. and is not the underlying TCP/IP network, HTTP network, or Unix kernel that supports the point-to-point link).
- An ethernet segment is a scope for values broadcast among stations: the embedded refs are (MAC address, OID) pairs. The network is the set of participating peers.
- A running web page is a scope for the JavaScript objects it contains: both local and remote entities are represented by JavaScript objects. The "network" is the JavaScript heap.
Subscription
See observation.
Supervision tree
A supervision tree is a concept borrowed from Erlang, where a root supervisor supervises other supervisors, which in turn supervise worker actors engaged in some task. As workers fail, their supervisors restart them; if the failures are too severe or too frequent, their direct supervisors fail in turn, and the supervisors' supervisors take action to recover from the failures.
Supervisor
A supervisor is an actor or facet whose role is to monitor the state of some service, taking action to ensure its availability to other portions of a complete system. When the service fails, the supervisor is able to restart it. If the failures are too severe or too frequent, the supervisor can take an alternative action, perhaps pausing for some time before retrying the service, or perhaps even terminating itself to give its own supervisor in a supervision tree a chance to get things back on track.
Synit uses supervisors extensively to monitor system daemons and other system services.
Sync Peer Entity
The sync peer entity is the entity reference carried in a synchronization action or event.
Synchronization
An actor may synchronize with an entity by scheduling a synchronization action targeted at that entity. The action will carry a local entity reference acting as a continuation. When the target entity eventually responds, it will transmit an acknowledgement to the continuation entity reference carried in the request.
An entity receiving a synchronization event should arrange for an acknowledgement to be delivered to the referenced continuation entity once previously-received events that might modify the recipient's state (or the state of a remote entity that it is proxy for) have been completely processed.
Most entities do not explicitly include code for responding to synchronization requests. The default code, which simply replies to the continuation immediately, usually suffices. However, sometimes the default is not appropriate. For example, when a relay entity is proxying for some remote entity via a relay across a transport, it should react to synchronization events by forwarding them to the remote entity. When the remote entity receives the forwarded request, it will reply to its local proxy for the continuation entity, which will in turn forward the reply back across the transport.
Syndicate Protocol
Main article: The Syndicate Protocol
The Syndicate Protocol (a.k.a the Syndicate Network Protocol) allows relays to proxy entities from remote scopes into the local scope.
For more, see the protocol specification document.
Syndicated Actor Model
Main article: The Syndicated Actor Model
The Syndicated Actor Model (often abbreviated SAM) is the model of concurrency and communication underpinning Synit. The SAM offers a “conversational” metaphor: programs meet and converse in virtual locations, building common understanding of the state of their shared tasks.
In the SAM, source entities running within an actor publish assertions and send messages to target entities, possibly in other actors. The essential idea of the SAM is that state replication is more useful than message-passing; message-passing protocols often end up simulating state replication.
A thorough introduction to the Syndicated Actor Model is available.
System Layer
The system layer is an essential part of an operating system, mediating between user-facing programs and the kernel. It provides the technical foundation for many qualities relevant to system security, resilience, connectivity, maintainability and usability.
The concept of a system layer has only recently been recognised—the term itself was coined by Benno Rice in a 2019 conference presentation—although many of the ideas it entails have a long history.
The hypothesis that the Synit system explores is that the Syndicated Actor Model provides a suitable theoretical and practical foundation for a system layer. The system layer demands, and the SAM supplies, well-integrated expression of features such as service naming, presence, discovery and activation; security mechanism and policy; subsystem isolation; and robust handling of partial failure.
System Dataspace
The system dataspace in Synit is the primary dataspace entity, owned by an actor running within the root system bus, and (selectively) made available to daemons, system services, and user programs.
Timeout
Many implementations of the Syndicated Actor Model offer actions for establishing timeouts, i.e. one-off or repeating alarms. Timeouts are frequently implemented as linked tasks.
Transport
A transport is the underlying medium connecting one relay to its counterpart(s) in an instance of the Syndicate network protocol. For example, a TLS-on-TCP/IP socket may connect a pair of relays to one another, or a UDP multicast socket may connect an entire group of relays across an ethernet.
Turn
Each time an event arrives at an actor's mailbox, the actor takes a turn. A turn is the process of handling the triggering event, from the moment of its withdrawal from the mailbox to the moment of the completion of its interpretation.
Relatedly, the programming-language representation of a turn is a convenient place to attach the APIs necessary for working with the Syndicated Actor Model. In many implementations, some class named `Turn` or similar exposes methods corresponding to the actions available in the SAM.
In the SAM, a turn comprises
- the event that triggered the turn,
- the entity addressed by the event,
- the facet owning the targeted entity, and
- the collection of pending actions produced during execution.
If a turn proceeds to completion without an exception or other crash, its pending actions are committed (finalised and/or delivered to their target entities). If, on the other hand, the turn is aborted for some reason, its pending actions are rolled back (discarded), the actor is terminated, its assertions retracted, and all its resources released.
Value
A Preserves `Value` with embedded data. The embedded data are often embedded references but, in some implementations, may be other kinds of data. Every message body and every assertion payload is a value.
Wire Symbol
A wire symbol is a structure used in implementations of the Syndicate network protocol to maintain a connection between an in-memory entity reference and the equivalent name for the entity as used in packets sent across the network.
System overview
Synit uses the Linux kernel as a hardware abstraction and virtualisation layer.
All processes in the system are arranged into a supervision tree, conceptually rooted at the system bus.
While `init` is PID 1, and thus the root of the tree of processes according to the kernel, it is not the root of the supervision tree. The `init` process, acting as management daemon for the kernel from Synit's perspective, is "supervised" by the system bus like all other services. The supervision tree is a Synit concept, not a Linux concept.
Boot process
The kernel first loads the stock PostmarketOS `initrd`, which performs a number of important tasks and then delegates to `/sbin/init`.
/sbin/init = synit-init.sh
The `synit-config` package overrides the usual contents of `/sbin/init`, replacing it with a short shell script, `synit-init.sh`. This script, in turn, takes care of a few boring tasks such as mounting `/dev`, `/proc`, `/run`, etc., ensuring that a few important directories exist, and remounting `/` as read-write before `exec`ing `/sbin/synit-pid1`.
For the remainder of the lifetime of the system, `/sbin/synit-pid1` is the PID 1 `init` process.
/sbin/synit-pid1
- Source code: [synit]/synit-pid1/
- Packaging: [synit]/packaging/packages/synit-pid1/
The `synit-pid1` program starts by spawning the system bus (`syndicate-server` in the process tree above) and the program `/sbin/synit-log`, connecting `stderr` of the former to `stdin` of the latter.
It then goes on to perform two tasks concurrently: the first is the Unix `init` role, reaping zombie processes, and the second is to interact with the system bus as an ordinary system service. The latter allows the system to treat `init` just like any other part of the system, accessing its abilities to reboot or power off the system using messages and assertions in the system dataspace as usual.
Even though `synit-pid1` is, to the kernel, a parent process of `syndicate-server`, it is logically a child process.
/sbin/synit-log
- Source code: [synit]/packaging/packages/synit-pid1/synit-log
This short shell script invokes the S6 program `s6-log` to capture log output from the system bus, directing it to files in `/var/log/synit/`.
The System Bus: syndicate-server
- Source code: [syndicate-rs]/syndicate-server/
- Packaging: [synit]/packaging/packages/syndicate-server/
The `syndicate-server` program has a number of closely-related functions. In many ways, it is a reification of the system layer concept itself.
It provides:
- A root system bus service for use by other programs. In this way, it is analogous to D-Bus.
- A configuration language suitable for programming dataspaces with simple reactive behaviours.
- A general-purpose service dependency tracking facility.
- A gatekeeper service, for exposing capabilities to running objects as (potentially long-lived) macaroon-style "sturdy references", plus TCP/IP- and Unix-socket-based transports for accessing capabilities through the gatekeeper.
- An `inotify`-based configuration tracker which loads and executes configuration files written in the scripting language.
- Process startup and supervision services for running external programs.
The program can also be used as an "inferior" bus. For example, there may be a per-user bus, or a per-session bus, or both. Each bus would appropriately scope the lifetime of its supervised processes.
Finally, it can be used completely standalone, outside a Synit context.
The root system bus
The `synit-pid1` program invokes `syndicate-server` like this:
/usr/bin/syndicate-server --inferior --config /etc/syndicate/boot
The first flag, `--inferior`, tells the server to expect to be able to communicate on its stdin/stdout using the standard wire protocol. This lets `synit-pid1` join the community of actors running within the system dataspace.
The second flag, `--config /etc/syndicate/boot`, tells the server to start monitoring the directory tree rooted at `/etc/syndicate/boot` for changes. Files whose names end with `.pr` within that tree are loaded as configuration script files.
Almost all of Synit is a consequence of careful use of the configuration script files in `/etc/syndicate`.
Configuration scripting language
The `syndicate-server` program includes a mechanism that was originally intended for populating a dataspace with assertions, for use in configuring the server, but which has since grown into a small Syndicated Actor Model scripting language in its own right. This seems to be the destiny of "configuration formats"—why fight it?—but the current language is inelegant and artificially limited in many ways. I have an as-yet-unimplemented sketch of a more refined design to replace it. Please forgive the ad-hoc nature of the actually-implemented language described below, and be warned that this is an unstable area of the Synit design.
See near the end of this document for a few illustrative examples.
Evaluation model
The language consists of sequences of instructions. For example, one of the most important instructions simply publishes (asserts) a value at a given entity (which will often be a dataspace).
The language evaluation context includes an environment mapping variable names to Preserves `Value`s. Variable references are lexically scoped.
Each source file is interpreted in a top-level environment. The top-level environment is supplied by the context invoking the script, and is generally non-empty. It frequently includes a binding for the variable `config`, which happens to be the default target variable name.
Source file syntax
Program = Instruction ...
A configuration source file is a file whose name ends in `.pr` that contains zero or more Preserves text-syntax values, which are together interpreted as a sequence of Instructions.
Comments. Preserves comments are ignored. One unfortunate wart is that because Preserves comments are really annotations, they are required by the Preserves data model to be attached to some other value. Syntactically, this manifests as the need for some non-comment following every comment. In scripts written to date, often an empty SequencingInstruction serves to anchor comments at the end of a file:
; A comment
; Another comment
; The following empty sequence is needed to give the comments
; something to attach to
[]
Patterns, variable references, and variable bindings
Symbols are treated specially throughout the language. Perl-style sigils control the interpretation of any given symbol:
- `$`var is a variable reference. The variable var will be looked up in the environment, and the corresponding value substituted.
- `?`var is a variable binder, used in pattern-matching. The value being matched at that position will be captured into the environment under the name var.
- `_` is a discard or wildcard, used in pattern-matching. The value being matched at that position will be accepted (and otherwise ignored), and pattern matching will continue.
- `=`sym denotes the literal symbol sym. It is used wherever syntactic ambiguity could prevent use of a bare literal symbol. For example, `=?foo` denotes the literal symbol `?foo`, where `?foo` on its own would denote a variable binder for the variable named `foo`.
- All other symbols are bare literal symbols, denoting just themselves.
The special variable `.` (referenced using `$.`) denotes "the current environment, as a dictionary".
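To see the sigils working together, here is a small illustrative fragment (the record labels `book` and `loan` are hypothetical, chosen purely for this example). The pattern binds `title`, and the body refers back to the binding:

```preserves
? <book ?title _>       ; `?title` binds; `_` matches and ignores the second field
  <loan $title pending> ; `$title` references the binding; `pending` is a bare literal symbol
```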
The active target
During loading and compilation (!) of a source file, the compiler maintains a compile-time register called the active target (often simply the "target"), containing the name of a variable that will be used at runtime to select an entity reference to act upon. At the beginning of compilation, it is set to the name `config`, so that whatever is bound to `config` in the initial environment at runtime is used as the default target for targeted Instructions.
This is one of the awkward parts of the current language design.
Instructions
Instruction =
SequencingInstruction |
RetargetInstruction |
AssertionInstruction |
SendInstruction |
ReactionInstruction |
LetInstruction |
ConditionalInstruction
Sequencing
SequencingInstruction = [Instruction ...]

A sequence of instructions is written as a Preserves sequence. The carried instructions are compiled and executed in order. NB: to publish a sequence of values, use the `+=` form of AssertionInstruction.
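For instance, using hypothetical record labels, the first instruction below is a SequencingInstruction performing two assertions in order, while the second publishes a single sequence value as one assertion:

```preserves
[<a 1> <b 2>]  ; two AssertionInstructions, executed in order
+= [1 2 3]     ; publishes the sequence [1 2 3] itself as one assertion
```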
Setting the active target
RetargetInstruction = $var

The target is set with a variable reference standing alone. After compiling such an instruction, the active target register will contain the variable name var. NB: to publish the contents of a variable, use the `+=` form of AssertionInstruction.
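A small hypothetical illustration of the difference: the first line retargets and then sends a message at `$log`, while the second publishes the value bound to `x` at the current target:

```preserves
$log ! <log "-" { line: "hello" }> ; retarget to $log, then send a message there
+= $x                              ; publish the contents of variable x
```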
Publishing an assertion
AssertionInstruction =
+= ValueExpr |
AttenuationExpr |
<ValueExpr ValueExpr ...> |
{ValueExpr: ValueExpr ...}
The most general form of AssertionInstruction is "`+=` ValueExpr". When executed, the result of evaluating ValueExpr will be published (asserted) at the entity denoted by the active target register.
As a convenient shorthand, the compiler also interprets every Preserves record or dictionary in Instruction position as denoting a ValueExpr to be used to produce a value to be asserted.
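So, with a hypothetical record label, the following two instructions are equivalent:

```preserves
+= <greeting "hello">
<greeting "hello">
```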
Sending a message
SendInstruction = ! ValueExpr
When executed, the result of evaluating ValueExpr will be sent as a message to the entity denoted by the active target register.
Reacting to events
ReactionInstruction =
DuringInstruction |
OnMessageInstruction |
OnStopInstruction
These instructions establish event handlers of one kind or another.
Subscribing to assertions and messages
DuringInstruction = ? PatternExpr Instruction
OnMessageInstruction = ?? PatternExpr Instruction
These instructions publish assertions of the form `<Observe pat #!ref>` at the entity denoted by the active target register, where pat is the dataspace pattern resulting from evaluation of PatternExpr, and ref is a fresh entity whose behaviour is to execute Instruction in response to assertions (resp. messages) carrying captured values from the binding-patterns in pat.
When the active target denotes a dataspace entity, the `Observe` record establishes a subscription to matching assertions and messages.
Each time a matching assertion arrives at a ref, a new facet is created, and Instruction is executed in the new facet. If the instruction creating the facet is a DuringInstruction, then the facet is automatically terminated when the triggering assertion is retracted. If the instruction is an OnMessageInstruction, the facet is not automatically terminated.1
Programs can react to facet termination using OnStopInstructions, and can trigger early facet termination themselves using the `facet` form of ConvenienceExpr (see below).
Reacting to facet termination
OnStopInstruction = ?- Instruction
This instruction installs a "stop handler" on the facet active during its execution. When the facet terminates, Instruction is run.
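As a sketch (assuming the conventional `$log` binding and a hypothetical `session` record), the following logs a line whenever a facet created for a `session` assertion terminates, i.e. when the triggering assertion is retracted:

```preserves
? <session ?who> [
  ?- [$log ! <log "-" { line: "session ended" }>]
]
```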
Destructuring-bind and convenience expressions
LetInstruction = let PatternExpr = ConvenienceExpr

ConvenienceExpr =
dataspace |
timestamp |
facet |
stringify ConvenienceExpr |
ValueExpr
Values can be destructured and new variables introduced into the environment with `let`, which is a "destructuring bind" or "pattern-match definition" statement. When executed, the result of evaluating ConvenienceExpr is matched against the result of evaluating PatternExpr. If the match fails, the actor crashes. If the match succeeds, the resulting binding variables (if any) are introduced into the environment.
The right-hand side of a `let`, after the equals sign, is either a normal ValueExpr or one of the following special "convenience" expressions:
- `dataspace`: Evaluates to a fresh, empty dataspace entity.
- `timestamp`: Evaluates to a string containing an RFC-3339-formatted timestamp.
- `facet`: Evaluates to a fresh entity representing the current facet. Sending the message `stop` to the entity (using e.g. the SendInstruction "`! stop`") triggers termination of its associated facet. The entity does not respond to any other assertion or message.
- `stringify`: Evaluates its argument, then renders it as a Preserves value using Preserves text syntax, and yields the resulting string.
Conditional execution
ConditionalInstruction = $var =~ PatternExpr Instruction Instruction ...
When executed, the value in variable var is matched against the result of evaluating PatternExpr.
- If the match succeeds, the resulting bound variables are placed in the environment and execution continues with the first Instruction. The subsequent Instructions are not executed in this case.
- If the match fails, then the first Instruction is skipped, and the subsequent Instructions are executed.
Value Expressions
ValueExpr =
#t | #f | float | double | int | string | bytes |
$var | =symbol | bare-symbol |
AttenuationExpr |
<ValueExpr ValueExpr ...> |
[ValueExpr ...] |
#{ValueExpr ...} |
{ValueExpr: ValueExpr ...}
Value expressions are recursively evaluated and yield a Preserves `Value`. Syntactically, they consist of literal non-symbol atoms, compound data structures (records, sequences, sets and dictionaries), plus special syntax for attenuated entity references, variable references, and literal symbols:
- AttenuationExpr, described below, evaluates to an entity reference with an attached attenuation.
- `$`var evaluates to the binding for var in the environment, if there is one, or crashes the actor, if there is not.
- `=`symbol and bare-symbol (i.e. any symbols except a binding, a reference, or a discard) denote literal symbols.
Attenuation Expressions
AttenuationExpr = <* $var [Rewrite ...]>

Rewrite =
<filter PatternExpr> |
<rewrite PatternExpr TemplateExpr>
An attenuation expression looks up var in the environment, asserts that it is an entity reference orig, and returns a new entity reference ref, like orig but attenuated with zero or more Rewrites. The result of evaluation is ref, the new attenuated entity reference.
When an assertion is published or a message body arrives at ref, the sequence of Rewrites is executed left-to-right. If a Rewrite succeeds, the value it produces is forwarded on to orig. If all Rewrites fail, the assertion or message is silently ignored.
A `rewrite` Rewrite matches values with PatternExpr. If the match fails, the next Rewrite is tried; if it succeeds, the resulting bindings are used along with the current environment to evaluate TemplateExpr, and the resulting value is forwarded on to orig.
A `filter` Rewrite is the same as `<rewrite <?v PatternExpr> $v>`, for some fresh v.
Supplying zero Rewrites will cause the new entity to reject all assertions and messages sent to it.
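For example (with a hypothetical `query` record), the following binds `$ro` to an attenuated view of `$config` that passes through only `query` assertions and messages, silently dropping everything else; with an empty rewrite list it would instead reject everything:

```preserves
let ?ro = <* $config [<filter <query _>>]>
```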
Pattern Expressions
PatternExpr =
#t | #f | float | double | int | string | bytes |
$var | ?var | _ | =symbol | bare-symbol |
AttenuationExpr |
<? var PatternExpr> |
<PatternExpr PatternExpr ...> |
[PatternExpr ...] |
{literal: PatternExpr ...}
Pattern expressions are recursively evaluated to yield a dataspace pattern. Evaluation of a PatternExpr is like evaluation of a ValueExpr, except that binders and wildcards are allowed, set syntax is not allowed, and dictionary keys are constrained to being literal values rather than PatternExprs.
Two kinds of binder are supplied. The more general is `<? var PatternExpr>`, which evaluates to a pattern that succeeds, capturing the matched value in a variable named var, only if PatternExpr succeeds. For the special case of `<? var _>`, the shorthand form `?`var is supported.
The pattern `_` (discard, wildcard) always succeeds, matching any value.
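For instance (with a hypothetical `user` record label), the first two patterns below are equivalent, capturing any value into `who`, while the third captures only values matching the nested record pattern:

```preserves
<? who _>
?who
<? who <user _ _>>
```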
Template Expressions
TemplateExpr =
#t | #f | float | double | int | string | bytes |
$var | =symbol | bare-symbol |
AttenuationExpr |
<TemplateExpr TemplateExpr ...> |
[TemplateExpr ...] |
{literal: TemplateExpr ...}
Template expressions are used in attenuation expressions as part of value-rewriting instructions. Evaluation of a TemplateExpr is like evaluation of a ValueExpr, except that set syntax is not allowed and dictionary keys are constrained to being literal values rather than TemplateExprs.
Additionally, record template labels (just after a "`<`") must be "literal-enough". If any sub-part of the label TemplateExpr refers to a variable's value, the variable must have been bound in the environment surrounding the AttenuationExpr that the TemplateExpr is part of, and must not be any of the capture variables from the PatternExpr corresponding to the template. This is a constraint stemming from the definition of the syntax used for expressing capability attenuation in the underlying Syndicated Actor Model. (TODO: link to sturdy.prs documentation)
Examples
Example 1. The simplest example uses no variables, publishing constant assertions to the implicit default target, `$config`:
<require-service <daemon console-getty>>
<daemon console-getty "getty 0 /dev/console">
Example 2. A more complex example subscribes to two kinds of `service-state` assertion at the dataspace named by the default target, `$config`, and in response to their existence asserts a rewritten variation on them:
? <service-state ?x ready> <service-state $x up>
? <service-state ?x complete> <service-state $x up>
In prose, it reads as "during any assertion at `$config` of a `service-state` record with state `ready` for any service name `x`, assert (also at `$config`) that `x`'s `service-state` is `up` in addition to `ready`," and similarly for state `complete`.
Example 3. The following example first attenuates `$config`, binding the resulting capability to `$sys`. Any `require-service` record published to `$sys` is rewritten into a `require-core-service` record; other assertions are forwarded unchanged.
let ?sys = <* $config [
<rewrite <require-service ?s> <require-core-service $s>>
<filter _>
]>
Then, `$sys` is used to build the initial environment for a configuration tracker, which executes script files in the `/etc/syndicate/core` directory using the environment given.
<require-service <config-watcher "/etc/syndicate/core" {
config: $sys
gatekeeper: $gatekeeper
log: $log
}>>
Example 4. The final example executes a script in response to an `exec` record being sent as a message to `$config`. The use of `??` indicates a message-event-handler, rather than `?`, which would indicate an assertion-event-handler.
?? <exec ?argv ?restartPolicy> [
let ?id = timestamp
let ?facet = facet
let ?d = <temporary-exec $id $argv>
<run-service <daemon $d>>
<daemon $d {
argv: $argv,
readyOnStart: #f,
restart: $restartPolicy,
}>
? <service-state <daemon $d> complete> [$facet ! stop]
? <service-state <daemon $d> failed> [$facet ! stop]
]
First, the current timestamp is bound to `$id`, and a fresh entity representing the facet established in response to the `exec` message is created and bound to `$facet`. The variable `$d` is then initialized to a value uniquely identifying this particular `exec` request. Next, `run-service` and `daemon` assertions are placed in `$config`. These assertions communicate with the built-in program execution and supervision service, causing a Unix subprocess to be created to execute the command in `$argv`. Finally, the script responds to `service-state` assertions from the execution service by sending the facet's representative entity, `$facet`, a `stop` message, terminating the facet.
Programming idioms
Conventional top-level variable bindings. Besides `config`, many scripts are executed in a context where `gatekeeper` names a server-wide gatekeeper entity, and `log` names an entity that logs messages of a certain shape that are delivered to it.
Setting the active target register. The following pairs of Instructions first set and then use the active target register:
$log ! <log "-" { line: "Hello, world!" }>
$config ? <configure-interface ?ifname <dhcp>> [
<require-service <daemon <udhcpc $ifname>>>
]
$config ? <service-object <daemon interface-monitor> ?cap> [
$cap {
machine: $machine
}
]
In the last one, `$cap` is captured from `service-object` records at `$config` and is then used as a target for publication of a dictionary (containing key `machine`).
Using conditionals. The syntax of ConditionalInstruction is such that it can be easily chained:
$val =~ pat1 [ ... if pat1 matches ...]
$val =~ pat2 [ ... if pat2 matches ...]
... if neither pat1 nor pat2 matches ...
Using dataspaces as ad-hoc entities. Constructing a dataspace, attaching subscriptions to it, and then passing it to somewhere else is a useful trick for creating scripted entities able to respond to a few different kinds of assertion or message:
let ?ds = dataspace ; create the dataspace
$config += <my-entity $ds> ; send it to peers for them to use
$ds [ ; select $ds as the active target for `DuringInstruction`s inside the [...]
? pat1 [ ... ] ; respond to assertions of the form `pat1`
? pat2 [ ... ] ; respond to assertions of the form `pat2`
?? pat3 [ ... ] ; respond to messages of the form `pat3`
?? pat4 [ ... ] ; respond to messages of the form `pat4`
]
Notes
This isn't quite true. If, after execution of Instruction, the new facet is "inert"—roughly speaking, has published no assertions and has no subfacets—then it is terminated. However, since inert facets are unreachable and cannot interact with anything or affect the future of a program in any way, this is operationally indistinguishable from being left in existence, and so serves only to release memory for later reuse.
Services and service dependencies
- Relevant schema source: [syndicate-protocols]/schemas/service.prs
Assertions in the main `$config` dataspace are the means Synit uses to declare services and service dependencies.
Services are started "gracefully", taking their dependencies into consideration, using `require-service` assertions; upon appearance of `require-service`, and after dependencies are satisfied, a `run-service` assertion is automatically made. Services can also be "force-started" using `run-service` assertions directly. Once all `run-service` assertions for a service have been withdrawn, services shut themselves down.
Example: Docker daemon
As a concrete example, take the file `/etc/syndicate/services/docker.pr`, which both defines and invokes a service for running the Docker daemon:
<require-service <daemon docker>>
<depends-on <daemon docker> <service-state <milestone network> up>>
<daemon docker "/usr/bin/dockerd --experimental 2>/var/log/docker.log">
This is an example of the scripting language in action, albeit a simple one without use of variables or any reactive constructs.
- The `require-service` assertion instructs `syndicate-server` to solve the dependencies for the service named `<daemon docker>` and to start the service running.
- The `depends-on` assertion specifies that the Docker daemon requires the `network` milestone (configured primarily in network.pr) to have been reached.
- The `daemon` assertion is interpreted by the built-in external service class, and specifies how to configure and run the service once its dependencies are ready.
Details
A few different kinds of assertions, all declared in the `service.prs` schema, form the heart of the system.
Assert that a service and its dependencies should be started
RequireService = <require-service @serviceName any>.
Asserts that a service should begin (and stay) running after waiting for its dependencies and considering reverse-dependencies, blocks, and so on.
Assert that a service should start right now
RunService = <run-service @serviceName any>.
Asserts that a service should begin (and stay) running RIGHT NOW, without considering its dependencies.
The built-in handler for `require-service` assertions will assert `run-service` automatically once all dependencies have been satisfied.
Declare a dependency among services
ServiceDependency = <depends-on @depender any @dependee ServiceState>.
Asserts that, when depender is `require-service`d, it should not be started until dependee has been asserted, and also that dependee's serviceName should be `require-service`d.
Convey the current state of a service
ServiceState = <service-state @serviceName any @state State>.
State = =started / =ready / =failed / =complete / @userDefined any .
Asserts one or more current states of service serviceName. The overall state of the service is the union of asserted `state`s.
A few built-in states are defined:
- `started` - the service has begun its startup routine, and may or may not be ready to take requests from other parties.
- `started`+`ready` - the service has started and is also ready to take requests from other parties. Note that the `ready` state is special in that it is asserted in addition to `started`.
- `failed` - the service has failed.
- `complete` - the service has completed execution.
In addition, any user-defined value is acceptable as a State
.
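For example (service name hypothetical), a long-running service that performs some setup before accepting requests might publish its states in two steps:

```preserves
<service-state <daemon webapp> started>
<service-state <daemon webapp> ready>
```

Since `ready` is asserted in addition to `started`, the overall state of the service is the union of the two.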
Make an entity representing a service instance available
ServiceObject = <service-object @serviceName any @object any>.
A running service publishes zero or more of these. The details of the object vary by service.
Request a service restart
RestartService = <restart-service @serviceName any>.
This is a message, not an assertion. It should be sent in order to request a service restart.
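In the configuration scripting language, messages are sent with `!` rather than asserted; for example, to request a restart of a hypothetical daemon:

```preserves
! <restart-service <daemon webapp>>
```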
Built-in services and service classes
The `syndicate-server` program includes built-in knowledge about a handful of useful services, including a means of loading external programs and integrating them into the running system.
- Every server program starts a gatekeeper service, which is able to manage conversion between live references and so-called "sturdy refs", long-lived capabilities for access to resources managed by the server.
- A simple logging actor copies log messages from the system dataspace to the server's standard error file descriptor.
- Any number of TCP/IP, WebSocket, and Unix socket transports may be configured to allow external access to the gatekeeper and its registered services. (These can also be started from the `syndicate-server` command line with `-p` and `-s` options.)
- Any number of configuration watchers may be created to monitor directories for changes to files written using the server scripting language. (These can also be started from the `syndicate-server` command line with `-c` options.)
- Finally, external programs can be started, either as long-lived "daemon" services or as one-off scripts.
Resources available at startup
The `syndicate-server` program uses the Rust `tracing` crate, which means different levels of internal logging verbosity are available via the `RUST_LOG` environment variable. See the `tracing` documentation for more on `RUST_LOG`.
If tracing of Syndicated Actor Model actions is enabled with the `-t` flag, it is configured prior to the start of the main server actor.
As the main actor starts up, it
- creates a fresh dataspace, known as the `$config` dataspace, intended to contain top-level/global configuration for the server instance;
- creates a fresh dataspace, known as `$log`, for assertions and messages related to service logging within the server instance;
- creates the `$gatekeeper` actor implementing the gatekeeper service, attaching it to the `$config` dataspace;
- exposes `$config`, `$log` and `$gatekeeper` as the variables available to configuration scripts loaded by config-watchers started with the `-c` flag (N.B. the `$config` dataspace is thus the default target for assertions in config files);
- creates service factories monitoring various service assertions in the `$config` dataspace;
- processes `-p` command-line options, each of which creates a TCP/IP relay listener;
- processes `-s` command-line options, each of which creates a Unix socket relay listener;
- processes `-c` command-line options, each of which creates a config-watcher monitoring a file-system directory; and finally
- creates the logging actor, listening to certain events on the `$log` dataspace.
Once these tasks have been completed, it quiesces, leaving the rest of the operation of the system up to other actors (relay-listeners, configuration scripts, and other configured services).
Gatekeeper
When `syndicate-server` starts, it creates a gatekeeper service entity, which accepts `resolve` assertions requesting conversion of a long-lived "sturdyref" to a live reference. The gatekeeper is the default object, available as OID 0 to peers at the other end of relay listener connections.
Gatekeeper protocol
- Relevant schema: [syndicate-protocol]/schemas/gatekeeper.prs
Resolve = <resolve @sturdyref sturdy.SturdyRef @observer #!#!any>.
When a request to resolve a given `sturdyref` appears, the gatekeeper entity queries a dataspace (by default, the server's top-level `$config` dataspace) for `bind` assertions:
Bind = <bind @oid any @key bytes @target #!any>.
Each `bind` assertion matching the requested `sturdyref` is checked against the credentials provided in the sturdyref; if the checks pass, the `target` entity from the `bind` is asserted to the `observer` in the `resolve`.
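A `bind` can be placed in the `$config` dataspace from a configuration script. The following sketch uses an illustrative oid and a placeholder secret key, and assumes `$ds` is some entity reference already in scope:

```preserves
<bind "my-service" #x"00112233445566778899aabbccddeeff" $ds>
```

A sturdyref minted for oid `"my-service"` with the matching secret should then resolve, via the gatekeeper, to `$ds`.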
Sturdyrefs
- Relevant schema: [syndicate-protocol]/schemas/sturdy.prs
A "sturdyref" is a long-lived certificate including a cryptographic signature that can be upgraded by a gatekeeper entity to a live reference to the entity named in the sturdyref. The current sturdyref implementation is based on the design of Macaroons.
The following definitions are taken from the sturdy.prs schema.
SturdyRef = <ref @oid any @caveatChain [Attenuation ...] @sig bytes>.
Within a `ref` record, the `oid` field is a free-form value that the targeted service chooses to name itself. The `sig` is an iterated keyed-HMAC construction, just as in macaroons. First, the service's secret key is used to key an HMAC of the `oid`. Then, the result is used to key an HMAC of the first `Attenuation` in `caveatChain`. Each `Attenuation`'s HMAC becomes the key for the next in the `caveatChain`. The final result is equal to the `sig` field in a valid sturdyref.
Attenuation of authority
When it comes to publishing assertions or sending messages to the entity denoted by a sturdyref, the `caveatChain` is used to attenuate the authority denoted by the sturdyref by filtering and/or rewriting assertion and message bodies. The `caveatChain` is run right to left, with newer rewrites-and-filters at the right-hand end of the chain and older ones at the left-hand end. Of course, an empty `caveatChain` is an unattenuated reference.
Attenuation = [Caveat ...].
Each individual `Attenuation` in a `caveatChain` is a sequence of `Caveat`s. The term "caveat" is shamelessly taken from macaroons, though our caveats presently embody only what in the Macaroons paper are called "first-party caveats" over assertion structure; future versions of the server may add "third-party caveats" and other, richer, predicates over assertions. Each `Attenuation`'s `Caveat`s are run in right-to-left order. The structure and interpretation of `Caveat`s is described fully in the relevant section of the Syndicate network protocol specification.
Logging
The Synit logging infrastructure is still underdeveloped.
At present, there is an actor created at `syndicate-server` startup time that monitors the `$log` dataspace for messages of the form:
LogEntry = <log @timestamp string @detail { any: any ...:... }> .
When it receives a log entry, it looks for a few conventional and optional keys in the `detail` field, each permitted to be any kind of value:
- `pid`, conventionally a Unix process ID;
- `line`, conventionally a string of free-form text intended for people to read;
- `service`, conventionally a service name in the sense of `require-service`/`run-service`; and
- `stream`, conventionally one of the symbols `stdout` or `stderr`.
The timestamp and the special keys are then formatted, along with all other information in the entry record, and printed to the `syndicate-server`'s standard error at `INFO` level using `tracing`.
Relay Listeners
- Relevant schema:
The `syndicate-server` program can be configured to listen on TCP/IP ports and Unix sockets (but see the Notes below) for incoming connections speaking the Syndicate network protocol.
TCP/IP and WebSockets
Assertions requiring a service with name matching `TcpRelayListener` cause the server to start a TCP server socket on the given `addr`'s `host` and `port`, exposing the `gatekeeper` entity reference as the initial ref of incoming connections:
TcpRelayListener = <relay-listener @addr Tcp @gatekeeper #!gatekeeper.Resolve> .
Tcp = <tcp @host string @port int>.
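For example (host and port illustrative), a relay listener is itself requested as a service from a configuration script, using the `$gatekeeper` variable provided at startup; this is roughly equivalent to passing `-p 8001` on the command line:

```preserves
<require-service <relay-listener <tcp "0.0.0.0" 8001> $gatekeeper>>
```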
When a new connection arrives, the first byte is examined to see what kind of connection it is and which Preserves syntax it will use.
- If it is ASCII "G" (0x47), it cannot be the start of a protocol packet, so it is interpreted as the start of a WebSocket connection and handed off to the tokio_tungstenite WebSocket library. Within the WebSocket's context, each packet must be encoded as a binary packet using Preserves binary syntax.
- Otherwise, if it could start a valid UTF-8 sequence, the connection will be a plain TCP/IP link using the Preserves text syntax.
- Otherwise, it's a byte which cannot be part of a valid UTF-8 sequence, so it is interpreted as a Preserves binary syntax tag: the connection will be a plain TCP/IP link using Preserves binary syntax.
Unix sockets
Assertions requiring a service with name matching `UnixRelayListener` cause the server to start a Unix server socket on the given `addr`'s `path`, exposing the `gatekeeper` entity reference as the initial ref of incoming connections:
UnixRelayListener = <relay-listener @addr Unix @gatekeeper #!gatekeeper.Resolve> .
Unix = <unix @path string>.
Syntax autodetection is as for TCP/IP, except that WebSockets are not supported over Unix sockets.
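As with TCP (socket path illustrative), a Unix-socket listener can be requested from a configuration script:

```preserves
<require-service <relay-listener <unix "/run/syndicate/system-bus.sock"> $gatekeeper>>
```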
Notes
Only `SOCK_STREAM` Unix sockets are supported at present. In future, `SOCK_DGRAM` could be useful for e.g. file-descriptor passing.
Configuration watcher
Assertions requiring a service with name matching `ConfigWatcher` cause the server to start a configuration watcher service monitoring files in, and subdirectories of, the given `path` for changes:
ConfigWatcher = <config-watcher @path string @env ConfigEnv>.
ConfigEnv = { symbol: any ...:... }.
The `path` may name either a file or a directory. Any time the configuration watcher finds a file matching the glob `*.pr` within the tree rooted at `path`, it loads the file. Each time a `*.pr` file is loaded, it is interpreted as a configuration scripting language program, with a copy of `env` as the "initial environment" for the script.
Whenever a change to a `*.pr` file is detected, the configuration watcher reloads the file, discarding previous internal state related to the file.
Note that a quirk of the config language requires that there exist an entry in `env` whose key is the symbol `config` and whose value is an entity reference (usually denoting a dataspace entity). However, the `config` entry need not be the same as the surrounding `$config`! A useful pattern is to set up a new `ConfigWatcher` with `env` containing a `config` binding pointing to an attenuated reference to the current `config` dataspace, or even an entirely fresh dataspace created specifically for the task.
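Putting this together (directory path hypothetical), a watcher for an extra configuration directory, reusing the surrounding `$config` dataspace as the default target for the loaded scripts, looks like:

```preserves
<require-service <config-watcher "/etc/syndicate/extra" { config: $config }>>
```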
Process supervision and management
Assertions requiring a service with name matching `DaemonService` cause the server to start a subprocess-based service:
DaemonService = <daemon @id any> .
Each `daemon` service can have zero or more subprocesses associated with it. Subprocesses can be long-lived services or short-lived, system-state-changing programs or scripts.
Adding process specifications to a service
Each subprocess associated with a `DaemonService` is defined with a `DaemonProcess` assertion:
DaemonProcess = <daemon @id any @config DaemonProcessSpec>.
DaemonProcessSpec =
/ @simple CommandLine
/ @oneShot <one-shot @setup CommandLine>
/ @full FullDaemonProcess .
The simplest kind of subprocess specification is a `CommandLine`, either a string (sent to `sh -c`) or an array of program name (looked up in the `$PATH`) and arguments:
CommandLine = @shell string / @full FullCommandLine .
FullCommandLine = [@program string, @args string ...] .
The `simple` and `oneShot` variants of `DaemonProcessSpec` expand into `FullDaemonProcess` values as follows:
- a `simple` command-line c becomes `{ argv: c }`; and
- a record `<one-shot c>` becomes `{ argv: c, readyOnStart: #f, restart: on-error }`.
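Concretely (service names and commands are illustrative), each shorthand form below is followed, after a comment, by its equivalent expanded dictionary:

```preserves
<daemon greeter "echo hello">
# ... is equivalent to:
<daemon greeter { argv: "echo hello" }>

<daemon make-dirs <one-shot "mkdir -p /run/myapp">>
# ... is equivalent to:
<daemon make-dirs { argv: "mkdir -p /run/myapp", readyOnStart: #f, restart: on-error }>
```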
Subprocess specification
The `FullDaemonProcess` type matches a Preserves dictionary having, at minimum, an `argv` key, and optionally including many other parameters controlling various aspects of the subprocess to be created (see the Notes at the end of this section).
FullDaemonProcess =
& @process FullProcess
& @readyOnStart ReadyOnStart
& @restart RestartField
& @protocol ProtocolField .
FullProcess =
& { argv: CommandLine }
& @env ProcessEnv
& @dir ProcessDir
& @clearEnv ClearEnv .
The `CommandLine` associated with `argv` specifies the program name to invoke and its command-line arguments. The other options are described in the remainder of this section.
Ready-signalling
If the key `readyOnStart` is present in a `FullDaemonProcess` dictionary, then if its associated value is `#t` (the default), the service will be considered ready immediately after it has been spawned; if its value is `#f`, some other arrangement is expected to be made to announce a `ready` `ServiceState` against the service's name.
ReadyOnStart =
/ @present { readyOnStart: bool }
/ @invalid { readyOnStart: any }
/ @absent {} .
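For instance (names and command hypothetical), a daemon that signals its own readiness would be declared with `readyOnStart: #f`, with some other party later asserting the `ready` state:

```preserves
<daemon db-server { argv: "run-db", readyOnStart: #f }>
# asserted elsewhere, once the database accepts connections:
<service-state <daemon db-server> ready>
```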
Whether and when to restart
The default restart policy is `always`. It can be overridden by providing the key `restart` in a `FullDaemonProcess` dictionary, mapping to a valid `RestartPolicy` value.
RestartField =
/ @present { restart: RestartPolicy }
/ @invalid { restart: any }
/ @absent {} .
RestartPolicy = =always / =on-error / =all / =never .
The valid restart policies are:
- `always`: whether the process terminates normally or abnormally, restart it without affecting any peer processes within the service.
- `on-error`: if the process terminates normally, leave everything alone; if it terminates abnormally, restart it without affecting peers.
- `all`: if the process terminates normally, leave everything alone; if it terminates abnormally, restart the whole daemon (all processes within the `daemon` service).
- `never`: treat both normal and abnormal termination as normal termination; that is, never restart, and enter state `complete` even if the process fails.
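As an example (service name and command hypothetical), a one-off migration task that should be retried only if it fails might be declared with an explicit `restart` key:

```preserves
<daemon migrate { argv: "/usr/local/bin/run-migrations", readyOnStart: #f, restart: on-error }>
```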
Speaking Syndicate Network Protocol via stdin/stdout
By default, the `syndicate-server` program assumes nothing about the information to be read and written via a subprocess's standard input and standard output. This can be overridden with a `protocol` entry in a `FullDaemonProcess` specification. (Standard error is always considered to produce information to be put in the system logs, however.)
ProtocolField =
/ @present { protocol: Protocol }
/ @invalid { protocol: any }
/ @absent {} .
Protocol = =none / =application/syndicate / =text/syndicate .
The available options for `protocol` are:
- `none`: the standard input of the subprocess is connected to `/dev/null`, and the standard output and standard error are logged.
- `application/syndicate`: the subprocess standard input and output are used as a binary-syntax Syndicate network protocol relay. Standard error is logged. The subprocess is expected to make some entity available to the server via initial oid 0. The server reflects this expectation by automatically placing a service object record into the dataspace alongside the `daemon` record defining the subprocess.
- `text/syndicate`: as for `application/syndicate`, but Preserves' text syntax is used instead of binary syntax.
Specifying subprocess environment variables
By default, the Unix process environment passed on to subprocesses is not changed. Supplying `clearEnv` and/or `env` keys alters this behaviour.
ClearEnv =
/ @present { clearEnv: bool }
/ @invalid { clearEnv: any }
/ @absent {} .
ProcessEnv =
/ @present { env: { EnvVariable: EnvValue ...:... } }
/ @invalid { env: any }
/ @absent {} .
EnvVariable = @string string / @symbol symbol / @invalid any .
EnvValue = @set string / @remove #f / @invalid any .
Setting `clearEnv` to `#t` causes the environment to be emptied before `env` is processed and before the subprocess is started. The `env` key is expected to contain a dictionary whose keys are strings or symbols and whose values are either a string, to set the variable to a new value, or `#f`, to remove it from the environment.
Setting the Current Working Directory for a subprocess
By default, each subprocess inherits the current working directory of the `syndicate-server` program. Setting a `dir` key to a string value in a `FullDaemonProcess` overrides this.
ProcessDir =
/ @present { dir: string }
/ @invalid { dir: any }
/ @absent {} .
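Combining several of the options above (all names and paths hypothetical), a full subprocess specification with a working directory and a scrubbed environment might read:

```preserves
<daemon builder {
  argv: ["make" "all"]
  dir: "/srv/build"
  clearEnv: #t
  env: { PATH: "/usr/bin:/bin" }
  restart: never
}>
```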
Notes
The `FullProcess` type is split out so that it can be reused outside the specific context of a daemon process.
Configuration files and directories
- On a running system: /etc/syndicate/
- Source repository: [synit]/packaging/packages/synit-config/files/etc/syndicate
The root system bus is started with a `--config /etc/syndicate/boot` command-line argument, which causes it to execute configuration scripts in that directory. In turn, the `boot` directory contains instructions for loading configuration from other locations on the filesystem.
This section will examine the layout of the configuration scripts and directories.
The boot layer
The files in /etc/syndicate/boot define the boot layer.
Console getty
The first thing the boot layer does, in 001-console-getty.pr, is start a `getty` on `/dev/console`:
<require-service <daemon console-getty>>
<daemon console-getty "getty 0 /dev/console">
Ad-hoc execution of programs
Next, in 010-exec.pr, it installs a handler that responds to messages requesting ad-hoc execution of programs:
?? <exec ?argv ?restartPolicy> [
let ?id = timestamp
let ?facet = facet
let ?d = <temporary-exec $id $argv>
<run-service <daemon $d>>
<daemon $d { argv: $argv, readyOnStart: #f, restart: $restartPolicy }>
? <service-state <daemon $d> complete> [$facet ! stop]
? <service-state <daemon $d> failed> [$facet ! stop]
]
If the restart policy is not specified, it defaults to `on-error`:
?? <exec ?argv> ! <exec $argv on-error>
"Milestone" pseudo-services
Then, in 010-milestone.pr, it defines how to respond to a request to run a "milestone" pseudo-service:
? <run-service <milestone ?m>> [
<service-state <milestone $m> started>
<service-state <milestone $m> ready>
]
The definition is trivial—when requested, simply declare success—but useful in that a "milestone" can be used as a proxy for a configuration state that other services can depend upon.
Concretely, milestones are used in two places at present: a `core` milestone declares that the core layer of services is ready, and a `network` milestone declares that initial network configuration is complete.
Synthesis of service state "up"
The definition of ServiceState includes `ready`, for long-running service programs, and `complete`, for successful exit (exit status 0) of "one-shot" service programs. In 010-service-state-up.pr, we declare an alias `up` that is asserted in either of these cases:
? <service-state ?x ready> <service-state $x up>
? <service-state ?x complete> <service-state $x up>
Loading of "core" and "services" layers
The final tasks of the boot layer are to load the "core" and "services" layers, respectively. Services declared in the "core" layer are automatically marked as dependencies of the `<milestone core>` pseudo-service, and those declared in the "services" layer are automatically marked as depending on `<milestone core>`.
The core layer loader
For the core layer, in 020-load-core-layer.pr, a configuration watcher is started, monitoring /etc/syndicate/core for scripts defining services to place into the layer. Instead of passing an unattenuated reference to `$config` to the configuration watcher, an attenuation expression rewrites `require-service` assertions into `require-core-service` assertions:
let ?sys = <* $config [
<rewrite <require-service ?s> <require-core-service $s>>
<filter _>
]>
<require-service <config-watcher "/etc/syndicate/core" {
config: $sys
gatekeeper: $gatekeeper
log: $log
}>>
Then, `require-core-service` is given meaning:
? <require-core-service ?s> [
<depends-on <milestone core> <service-state $s up>>
<require-service $s>
]
The services layer loader
The services layer is treated similarly in 030-load-services.pr, except `require-basic-service` takes the place of `require-core-service`, and the configuration watcher isn't started until `<milestone core>` is ready. Any `require-basic-service` assertions are given meaning as follows:
? <require-basic-service ?s> [
<depends-on $s <service-state <milestone core> up>>
<require-service $s>
]
The core layer: /etc/syndicate/core
The files in /etc/syndicate/core define the core layer.
The configdirs.pr script brings in scripts in `/run` and `/usr/local` analogues of the core config directory:
<require-service <config-watcher "/run/etc/syndicate/core" $.>>
<require-service <config-watcher "/usr/local/etc/syndicate/core" $.>>
The eudev.pr script runs a `udevd` instance and, once it's ready, starts an initial scan:
<require-service <daemon eudev>>
<daemon eudev ["/sbin/udevd", "--children-max=5"]>
<require-service <daemon eudev-initial-scan>>
<depends-on <daemon eudev-initial-scan> <service-state <daemon eudev> up>>
<daemon eudev-initial-scan <one-shot "
echo '' > /proc/sys/kernel/hotplug &&
udevadm trigger --type=subsystems --action=add &&
udevadm trigger --type=devices --action=add &&
udevadm settle --timeout=30
">>
The hostname.pr script simply sets the machine hostname:
<require-service <daemon hostname>>
<daemon hostname <one-shot "hostname $(cat /etc/hostname)">>
Finally, the machine-dataspace.pr script declares a fresh, empty dataspace, and asserts a reference to it in a "well-known location" for use by other services later:
let ?ds = dataspace
<machine-dataspace $ds>
The services layer: /etc/syndicate/services
The files in /etc/syndicate/services define the services layer.
The configdirs.pr script brings in `/run` and `/usr/local` service definitions, analogous to the same file in the core layer:
<require-service <config-watcher "/run/etc/syndicate/services" $.>>
<require-service <config-watcher "/usr/local/etc/syndicate/services" $.>>
Networking core
The network.pr script defines the `<milestone network>` pseudo-service and starts a number of ancillary services for generically monitoring and configuring system network interfaces.
First, `<daemon interface-monitor>` is a small Python program, required by `<milestone network>`, using Netlink sockets to track changes to interfaces and interface state. It speaks the Syndicate network protocol on its standard input and output, and publishes a service object which expects a reference to the machine dataspace defined earlier:
<require-service <daemon interface-monitor>>
<depends-on <milestone network> <service-state <daemon interface-monitor> ready>>
<daemon interface-monitor {
argv: "/usr/lib/synit/interface-monitor"
protocol: application/syndicate
}>
? <machine-dataspace ?machine> [
? <service-object <daemon interface-monitor> ?cap> [
$cap {
machine: $machine
}
]
]
The `interface-monitor` publishes assertions describing interface presence and state to the machine dataspace. The network.pr script responds to these assertions by requesting configuration of an interface once it reaches a certain state. First, all interfaces are enabled when they appear and disabled when they disappear:
$machine ? <interface ?ifname _ _ _ _ _ _> [
$config [
! <exec ["ip" "link" "set" $ifname "up"]>
?- ! <exec ["ip" "link" "set" $ifname "down"] never>
]
]
Next, a DHCP client is invoked for any "normal" (wired-ethernet-like) interface in "up" state with a carrier:
$machine ? <interface ?ifname _ normal up up carrier _> [
$config <configure-interface $ifname <dhcp>>
]
$machine ? <interface ?ifname _ normal up unknown carrier _> [
$config <configure-interface $ifname <dhcp>>
]
$config ? <configure-interface ?ifname <dhcp>> [
<require-service <daemon <udhcpc $ifname>>>
]
$config ? <run-service <daemon <udhcpc ?ifname>>> [
<daemon <udhcpc $ifname> ["udhcpc" "-i" $ifname "-fR" "-s" "/usr/lib/synit/udhcpc.script"]>
]
We use a custom `udhcpc` script which modifies the default script to give mobile-data devices a sensible routing metric.
The final pieces of network.pr are static configuration of the loopback interface:
<configure-interface "lo" <static "127.0.0.1/8">>
? <configure-interface ?ifname <static ?ipaddr>> [
! <exec ["ip" "address" "add" "dev" $ifname $ipaddr]>
?- ! <exec ["ip" "address" "del" "dev" $ifname $ipaddr] never>
]
and conditional publication of a `default-route` record, allowing services to detect when the internet is (nominally) available:
$machine ? <route ?addressFamily default _ _ _ _> [
$config <default-route $addressFamily>
]
Wifi & Mobile Data
Building atop the networking core, wifi.pr and modem.pr provide the necessary support for wireless LAN and mobile data interfaces, respectively.
When `interface-monitor` detects the presence of a wireless LAN interface, wifi.pr reacts by starting `wpa_supplicant` for the interface along with a small Python program, `wifi-daemon`, that acts as a client to `wpa_supplicant`, adding and removing networks and network configuration according to `selected-wifi-network` assertions in the machine dataspace.
$machine ? <interface ?ifname _ wireless _ _ _ _> [
$config [
<require-service <daemon <wpa_supplicant $ifname>>>
<depends-on
<daemon <wifi-daemon $ifname>>
<service-state <daemon <wpa_supplicant $ifname>> up>>
<require-service <daemon <wifi-daemon $ifname>>>
]
]
$config ? <run-service <daemon <wifi-daemon ?ifname>>> [
<daemon <wifi-daemon $ifname> {
argv: "/usr/lib/synit/wifi-daemon"
protocol: application/syndicate
}>
? <service-object <daemon <wifi-daemon $ifname>> ?cap> [
$cap {
machine: $machine
ifname: $ifname
}
]
]
$config ? <run-service <daemon <wpa_supplicant ?ifname>>> [
<daemon <wpa_supplicant $ifname> [
"wpa_supplicant" "-Dnl80211,wext" "-C/run/wpa_supplicant" "-i" $ifname
]>
]
The other tasks performed by wifi.pr are to request DHCP configuration for available wifi interfaces:
$machine ? <interface ?ifname _ wireless up up carrier _> [
$config <configure-interface $ifname <dhcp>>
]
and to relay `selected-wifi-network` records from user settings (described below) into the machine dataspace, for `wifi-daemon` instances to pick up:
$config ? <user-setting <?s <selected-wifi-network _ _ _>>> [ $machine += $s ]
Turning to modem.pr, which is currently hard-coded for Pinephone devices, we see two main blocks of config. The simplest just starts the `eg25-manager` daemon for controlling the Pinephone's Quectel modem, along with a simple monitoring script for restarting it if and when `/dev/EG25.AT` disappears:
<daemon eg25-manager "eg25-manager">
<depends-on <daemon eg25-manager> <service-state <daemon eg25-manager-monitor> up>>
<daemon eg25-manager-monitor "/usr/lib/synit/eg25-manager-monitor">
The remainder of modem.pr handles cellular data, configured via the qmicli program.
<require-service <qmi-wwan "/dev/cdc-wdm0">>
<depends-on <qmi-wwan "/dev/cdc-wdm0"> <service-state <daemon eg25-manager> up>>
When the user settings `mobile-data-enabled` and `mobile-data-apn` are both present, it responds to `qmi-wwan` service requests by invoking `qmi-wwan-manager`, a small shell script, for each particular device and APN combination:
? <user-setting <mobile-data-enabled>> [
? <user-setting <mobile-data-apn ?apn>> [
? <run-service <qmi-wwan ?dev>> [
<require-service <daemon <qmi-wwan-manager $dev $apn>>>
]
]
]
? <run-service <daemon <qmi-wwan-manager ?dev ?apn>>> [
<daemon <qmi-wwan-manager $dev $apn> ["/usr/lib/synit/qmi-wwan-manager" $dev $apn]>
]
(Because qmicli is sometimes not well behaved, there is also code in modem.pr for restarting it in certain circumstances when it gets into a state where it reports errors but does not terminate.)
Simple daemons
A few simple daemons are also started as part of the services layer.
The docker.pr script starts the docker daemon, but only once the network configuration is available:
<require-service <daemon docker>>
<depends-on <daemon docker> <service-state <milestone network> up>>
<daemon docker "/usr/bin/dockerd --experimental 2>/var/log/docker.log">
The ntpd.pr script starts an NTP daemon, but only when an IPv4 default route exists:
<require-service <daemon ntpd>>
<depends-on <daemon ntpd> <default-route ipv4>>
<daemon ntpd "ntpd -d -n -p pool.ntp.org">
Finally, the sshd.pr script starts the OpenSSH server daemon after ensuring both that the network is available and that SSH host keys exist:
<require-service <daemon sshd>>
<depends-on <daemon sshd> <service-state <milestone network> up>>
<depends-on <daemon sshd> <service-state <daemon ssh-host-keys> complete>>
<daemon sshd "/usr/sbin/sshd -D">
<daemon ssh-host-keys <one-shot "ssh-keygen -A">>
User settings
A special folder, /etc/syndicate/user-settings, acts as a persistent database of assertions relating to user settings, including such things as wifi network credentials and preferences, mobile data preferences, and so on. The userSettings.pr script sets up the programs responsible for managing the folder.
The contents of the folder itself are managed by a small Python program, `user-settings-daemon`, which responds to requests arriving via the `$config` dataspace by adding and removing files containing assertions in /etc/syndicate/user-settings.
let ?settingsDir = "/etc/syndicate/user-settings"
<require-service <daemon user-settings-daemon>>
<daemon user-settings-daemon {
argv: "/usr/lib/synit/user-settings-daemon"
protocol: application/syndicate
}>
? <service-object <daemon user-settings-daemon> ?cap> [
$cap {
config: $config
settingsDir: $settingsDir
}
]
Each such file is named after the SHA-1 digest of the canonical form of the assertion it contains. For example, /etc/syndicate/user-settings/8814297f352be4ebbff19137770e619b2ebc5e91.pr contains `<mobile-data-enabled>`.
The files in /etc/syndicate/user-settings are brought into the main config dataspace by way of a rewriting configuration watcher:
let ?settings = <* $config [ <rewrite ?item <user-setting $item>> ]>
<require-service <config-watcher $settingsDir { config: $settings }>>
Every assertion from /etc/syndicate/user-settings is wrapped in a `<user-setting ...>` record before being placed into the main `$config` dataspace.
How-to ...
The following pages walk through examples of common system administration tasks.
How to define services and service classes
Synit services are started in response to `run-service` assertions. These, in turn, are eventually asserted by the service dependency tracker in response to `require-service` assertions, once any declared dependencies have been started. So to implement a service, respond to `run-service` records mentioning the service's name.
There are a number of concepts involved in service definitions:
- Service name. A unique identifier for a service instance.
- Service implementation. Code that responds to `run-service` requests for a service instance to start running, implementing the service's ongoing behaviour.
- Service class. A parameterized collection of services sharing a common parameterized implementation.
A service may be an instance of a service class (a parameterized family of services) or may be a simple service that is the only instance of its class. Service dependencies can be statically-declared or dynamically-computed.
A service's implementation may be external, running as a subprocess managed by `syndicate-server`; internal, backed by code that is part of the `syndicate-server` process itself; or user-defined, implemented via user-supplied code written in the configuration language or as other actor programs connected somehow to the system bus.
An external service may involve a long-running process (a "daemon"; what s6-rc calls a "longrun"), or may involve a short-lived activity that, at startup or shutdown, modifies aspects of overall system state outside the purview of the supervision tree (what s6-rc calls a "one-shot").
Service names
Every service is identified with its name. A service name can be any Preserves value. A simple symbol may suffice, but records and dictionaries are often useful in giving structure to service names.
Here are a few example service names:
<config-watcher "/foo/bar" $.>
<daemon docker>
<milestone network>
<qmi-wwan "/dev/cdc-wdm0">
<udhcpc "eth0">
The first two invoke service behaviours that are built in to `syndicate-server`; the last three are user-defined service names.
Defining a simple external service
As an example of a simple external service, take the `ntpd` daemon. The following assertions, placed in the configuration file /etc/syndicate/services/ntpd.pr, cause `ntpd` to be run as part of the Synit services layer.
First, we choose the service name: `<daemon ntpd>`. The name is a `daemon` record, marking it as a supervised external service. Having chosen a name, and chosen to use the external service supervision mechanism to run the service, we make our first assertion, which defines the program to be launched:
<daemon ntpd "ntpd -d -n -p pool.ntp.org">
Next, we mark the service as depending on the presence of another assertion, `<default-route ipv4>`. This assertion is managed by the networking core.
<depends-on <daemon ntpd> <default-route ipv4>>
These two assertions together constitute the complete definition of the service. However, without a final `require-service` assertion, the service will not be activated. Requiring the service connects its definition into the system dependency tree, enabling actual loading and activation of the service.
<require-service <daemon ntpd>>
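Putting the pieces together, the complete `/etc/syndicate/services/ntpd.pr` for this example is just the three assertions shown above:

```
<daemon ntpd "ntpd -d -n -p pool.ntp.org">
<depends-on <daemon ntpd> <default-route ipv4>>
<require-service <daemon ntpd>>
```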
Defining a service class
The following stanza (actually part of the networking core) waits for `run-service` assertions matching a family of service names, `<daemon <udhcpc ifname>>`. When it sees one, it computes the specification for the corresponding command line on the fly, substituting the value of the `ifname` binding in the correct places (once in the service name and once in the command-line specification).
? <run-service <daemon <udhcpc ?ifname>>> [
<daemon
<udhcpc $ifname>
["udhcpc" "-i" $ifname "-fR" "-s" "/usr/lib/synit/udhcpc.script"]
>
]
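For example, when `<run-service <daemon <udhcpc "eth0">>>` appears, the stanza above computes and asserts the corresponding concrete daemon specification:

```
<daemon
  <udhcpc "eth0">
  ["udhcpc" "-i" "eth0" "-fR" "-s" "/usr/lib/synit/udhcpc.script"]
>
```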
This suffices to define the service. To instantiate it, we may either manually provide assertions mentioning the interfaces we care about,
<require-service <daemon <udhcpc "eth0">>>
<require-service <daemon <udhcpc "wlan0">>>
or, as actually implemented in the networking core (in `network.pr` lines 13–15 and 42–47), we may respond to assertions placed in the dataspace by a daemon, `interface-monitor`, whose role is to reflect AF_NETLINK events into assertions:
? <configure-interface ?ifname <dhcp>> [
<require-service <daemon <udhcpc $ifname>>>
]
Here, when an assertion of the form `<configure-interface ifname <dhcp>>` appears in the dataspace, we react by asserting a `require-service` record. That in turn eventually triggers assertion of a matching `run-service`, which then results in invocation of the `udhcpc` command line we specified above.
Defining non-`daemon` services; reacting to user settings
Only service names of the form `<daemon name>` are backed by external service supervisor code. Other service name schemes have other implementations. In particular, user-defined service name schemes are possible and useful.
For example, in the configuration relating to setup of mobile data interfaces, service names of the form `<qmi-wwan devicePath>` are defined:
? <user-setting <mobile-data-enabled>> [
? <user-setting <mobile-data-apn ?apn>> [
? <run-service <qmi-wwan ?dev>> [
<require-service <daemon <qmi-wwan-manager $dev $apn>>>
$log ! <log "-" { line: "starting wwan manager", dev: $dev, apn: $apn }>
]
]
]
Reading this inside-out,
- `run-service` for `qmi-wwan` service names is defined to require a `<daemon <qmi-wwan-manager deviceName APN>>` service, defined elsewhere; in addition, when a `run-service` assertion appears, a log message is produced.
- the stanza reacting to `run-service` is only active when some `<user-setting <mobile-data-apn APN>>` assertion exists.
- the stanza querying the `mobile-data-apn` user setting is itself only active when `<user-setting <mobile-data-enabled>>` has been asserted.
In sum, this means that even if a `qmi-wwan` service is requested and activated, nothing will happen until the user enables mobile data and selects an APN. If the user later disables mobile data, the `qmi-wwan` implementation will automatically be retracted, and the corresponding `qmi-wwan-manager` service terminated.
The first service name example above, `<config-watcher "/foo/bar" $.>`, is interesting because it includes an embedded capability reference: the `$.` syntax from the scripting language denotes the active scripting-language environment dictionary.
How to restart services
Send a `restart-service` message mentioning the service name of the service to restart. Use the `!` operator of the configuration language to send a message (as opposed to making an assertion):
! <restart-service <daemon <wifi-daemon "wlan0">>>
! <restart-service <daemon user-settings-daemon>>
In future, a command-line tool for sending messages to a system dataspace will be provided; for now, create temporary configuration language scripts in `/run/etc/syndicate/services`:
THROCK=/run/etc/syndicate/services/throck.pr
echo '! <restart-service <daemon <wifi-daemon "wlan0">>>' > $THROCK
sleep 1
rm -f $THROCK
How to schedule one-off or repeating tasks
(TODO. Not yet implemented: a cron-like program will eventually respond to assertions demanding periodic or delayed execution of tasks (likely expressed as assertions, making it more of a delayed-or-periodic-assertion-producing program).)
Timer tolerances
Apple has come up with the useful idea of a timer tolerance, applicable to both repeating and one-off timers. In their documentation, they write:
The timer may fire at any time between its scheduled fire date and the scheduled fire date plus the tolerance. [...] As a general rule, set the tolerance to at least 10% of the interval [...] Even a small amount of tolerance has significant positive impact on the power usage of your application.
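As a tiny worked instance of the 10% rule of thumb (the function names here are purely illustrative, not part of any Synit API):

```python
def suggested_tolerance(interval: float) -> float:
    """Apple's rule of thumb: tolerance of at least 10% of the repeat interval."""
    return interval * 0.10

def fire_window(scheduled: float, tolerance: float) -> tuple:
    """The timer may fire anywhere in [scheduled, scheduled + tolerance]."""
    return (scheduled, scheduled + tolerance)

# A 60-second repeating timer scheduled to fire at t=100 may fire
# anywhere between t=100 and t=106, letting the scheduler batch wakeups.
lo, hi = fire_window(100.0, suggested_tolerance(60.0))
```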
One-off tasks
Repeating tasks
How to manage user settings
Send a `user-settings-command` message containing an `assert` or `retract` record containing the setting assertion to add or remove. Use the `!` operator of the configuration language to send a message (as opposed to making an assertion):
! <user-settings-command <assert <mobile-data-enabled>>>
! <user-settings-command <assert <mobile-data-apn "internet">>>
! <user-settings-command <retract <mobile-data-enabled>>>
In future, a command-line tool for sending such messages will be provided; for now, create temporary configuration language scripts in `/run/etc/syndicate/services`:
THROCK=/run/etc/syndicate/services/throck.pr
echo '! <user-settings-command <assert <mobile-data-enabled>>>' > $THROCK
sleep 1
rm -f $THROCK
How to reboot and power off the machine
(TODO. Not yet implemented: eventually, `synit-pid1` will respond to messages/assertions from the dataspace, implementing the necessary coordination for a graceful shutdown procedure. For now, `sync` three times, sleep a bit, and `reboot -f` or `poweroff -f`...)
How to suspend the machine
(TODO. Not yet implemented: eventually, assertions in the dataspace will control the desired suspend state, and reactive stanzas will allow responses to any kind of ambient conditions to include changes in the suspend state.)
The `preserves-tools` package
The `preserves-tools` package includes useful command-line utilities for working with Preserves values and schemas.
At present, it includes the `preserves-tool` Swiss-army-knife utility, which is useful for
- converting between text and binary Preserves syntaxes;
- pretty-printing (indenting) text Preserves syntax;
- manipulating Preserves annotations;
- breaking down and filtering Preserves documents using Preserves Path selectors;
- and so on.
See also the `preserves-tool` documentation.
Preserves
Synit makes extensive use of Preserves, a programming-language-independent language for data.
- Preserves homepage
- Preserves specification
- Preserves Schema specification
- Source code for many (not all) of the implementations
- Implementations for Nim, Python, Racket, Rust, Squeak Smalltalk, TypeScript/Javascript
The Preserves data language is in many ways comparable to JSON, XML, S-expressions, CBOR, ASN.1 BER, and so on. From the specification document:
Preserves supports records with user-defined labels, embedded references, and the usual suite of atomic and compound data types, including binary data as a distinct type from text strings.
Why does Synit rely on Preserves?
There are four aspects of Preserves that make it particularly relevant to Synit:
- the core Preserves data language has a robust semantics;
- a canonical form exists for every Preserves value;
- Preserves values may have capability references embedded within them; and
- Preserves has a schema language useful for specifying protocols among actors.
Grammar of values
Preserves has programming-language-independent semantics: the specification defines an equivalence relation over Preserves values.1 This makes it a solid foundation for a multi-language, multi-process, potentially distributed system like Synit.2
Values and Types
Preserves values come in various types: a few basic atomic types, plus sequence, set, dictionary, and record compound types. From the specification:
Value = Atom
      | Compound
      | Embedded

Atom = Boolean
     | Float
     | Double
     | SignedInteger
     | String
     | ByteString
     | Symbol

Compound = Record
         | Sequence
         | Set
         | Dictionary
Concrete syntax
Preserves offers multiple syntaxes, each useful in different settings. Values are automatically, losslessly translatable from one syntax to another because Preserves' semantics are syntax-independent.
The core Preserves specification defines a text-based, human-readable, JSON-like syntax, that is a syntactic superset of JSON, and a completely equivalent compact binary syntax, crucial to the definition of canonical form for Preserves values.3
Here are a few example values, written using the text syntax (see the specification for the grammar):
Boolean : #t #f
Float : 1.0f 10.4e3f -100.6f
Double : 1.0 10.4e3 -100.6
Integer : 1 0 -100
String : "Hello, world!\n"
ByteString : #"bin\x00str\x00" #[YmluAHN0cgA] #x"62696e0073747200"
Symbol : hello-world |hello world| = ! hello? || ...
Record : <label field1 field2 ...>
Sequence : [value1 value2 ...]
Set : #{value1 value2 ...}
Dictionary : {key1: value1 key2: value2 ...: ...}
Embedded : #!value
Commas are optional in sequences, sets, and dictionaries.
Canonical form
Every Preserves value can be serialized into a canonical form using the binary syntax along with a few simple rules about serialization ordering of elements in sets and keys in dictionaries.
Having a canonical form means that, for example, a cryptographic hash of a value's canonical serialization can be used as a unique fingerprint for the value.
For example, the SHA-512 digest of the canonical serialization of the value
<sms-delivery <address international "31653131313">
<address international "31655512345">
<rfc3339 "2022-02-09T08:18:29.88847+01:00">
"This is a test SMS message">
is
bfea9bd5ddf7781e34b6ca7e146ba2e442ef8ce04fd5ff912f889359945d0e2967a77a13
c86b13959dcce7e8ba3950d303832b825648609447b3d147677163ce
Capabilities
Preserves values can include embedded references, written as values with a `#!` prefix. For example, a command adding `<some-setting>` to the user settings database might look like this as it travels over a Unix pipe connecting a program to the root dataspace:
<user-settings-command <assert <some-setting>> #![0 123]>
The `user-settings-command` structure includes the `assert` command itself, plus an embedded capability reference, `#![0 123]`, which encodes a transport-specific reference to an object.
TODO: Link to documentation for `sturdy.prs`.
The syntax of values under `#!` differs depending on the medium carrying the message. For example, point-to-point transports need to be able to refer to "my references" (`#![0 n]`) and "your references" (`#![1 n]`), while multicast/broadcast media (like Ethernet) need to be able to name references within specific, named conversational participants (`#![<udp [192 168 1 10] 5999> n]`), and in-memory representations need to use direct pointers (`#!140425190562944`).
In every case, the references themselves work like Unix file descriptors: an integer or similar that unforgeably denotes, in a local context, some complex data structure on the other side of a trust boundary.
When capability-bearing Preserves values are read off a transport, the capabilities are automatically rewritten into references to in-memory proxy objects. The reverse process of rewriting capability references happens when an in-memory value is serialized for transmission.
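The file-descriptor analogy can be sketched concretely. The toy Python below (purely illustrative; not Synit's actual wire protocol or implementation) shows a per-transport table that hands out small-integer handles standing for local objects, the way `#![0 123]` stands for an object on one side of a pipe:

```python
class CapTable:
    """Maps small-integer handles to local objects, like a per-process fd table."""
    def __init__(self):
        self._exports = []  # handle -> local object

    def export(self, obj) -> int:
        """Hand out a fresh handle denoting obj; the peer sees only the integer."""
        self._exports.append(obj)
        return len(self._exports) - 1

    def lookup(self, handle: int):
        """Resolve a handle received from the peer back into the local object."""
        return self._exports[handle]

class Logger:
    """Stand-in for some rich object living behind the trust boundary."""
    def __init__(self):
        self.lines = []
    def log(self, line):
        self.lines.append(line)

table = CapTable()
logger = Logger()
handle = table.export(logger)       # serializing: object -> small integer on the wire
table.lookup(handle).log("hello")   # deserializing: integer -> the denoted object
```

The handle is meaningful only relative to the table that issued it, which is what makes it unforgeable from the peer's side: the peer can name only objects it has been given.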
Schemas
Preserves comes with a schema language suitable for defining protocols among actors/programs in Synit. Because Preserves is a superset of JSON, its schemas can be used for parsing JSON just as well as for native Preserves values.4 From the schema specification:
A Preserves schema connects Preserves Values to host-language data structures. Each definition within a schema can be processed by a compiler to produce
- a host-language type definition;
- a partial parsing function from Values to instances of the produced type; and
- a total serialization function from instances of the type to Values.
Every parsed Value retains enough information to always be able to be serialized again, and every instance of a host-language data structure contains, by construction, enough information to be successfully serialized.
Instead of taking host-language data structure definitions as primary, in the way that systems like Serde do, Preserves schemas take the shape of the serialized data as primary.
To see the difference, let's look at an example.
Example: Book Outline
Systems like Serde concentrate on defining (de)serializers for host-language type definitions.
Serde starts from definitions like the following.5 It generates (de)serialization code for various different data languages (such as JSON, XML, CBOR, etc.) in a single programming language: Rust.
pub struct BookOutline {
    pub sections: Vec<BookItem>,
}

pub enum BookItem {
    Chapter(Chapter),
    Separator,
    PartTitle(String),
}

pub struct Chapter {
    pub name: String,
    pub sub_items: Vec<BookItem>,
}
The (de)serializers are able to convert between in-memory and serialized representations such as the following JSON document. The focus is on Rust: interpreting the produced documents from other languages is out-of-scope for Serde.
{
"sections": [
{ "PartTitle": "Part I" },
"Separator",
{
"Chapter": {
"name": "Chapter One",
"sub_items": []
}
},
{
"Chapter": {
"name": "Chapter Two",
"sub_items": []
}
}
]
}
By contrast, Preserves schemas map a single data language to and from multiple programming languages. Each specific programming language has its own schema compiler, which generates type definitions and (de)serialization code for that language from a language-independent grammar.
For example, a schema able to parse values compatible with those produced by Serde for the type definitions above is the following:
version 1 .
BookOutline = {
"sections": @sections [BookItem ...],
} .
BookItem = @chapter { "Chapter": @value Chapter }
/ @separator "Separator"
/ @partTitle { "PartTitle": @value string } .
Chapter = {
"name": @name string,
"sub_items": @sub_items [BookItem ...],
} .
Using the Rust schema compiler, we see types such as the following, which are similar to but not the same as the original Rust types above:
pub struct BookOutline {
    pub sections: std::vec::Vec<BookItem>
}

pub enum BookItem {
    Chapter { value: std::boxed::Box<Chapter> },
    Separator,
    PartTitle { value: std::string::String }
}

pub struct Chapter {
    pub name: std::string::String,
    pub sub_items: std::vec::Vec<BookItem>
}
Using the TypeScript schema compiler, we see
export type BookOutline = {"sections": Array<BookItem>};
export type BookItem = (
{"_variant": "chapter", "value": Chapter} |
{"_variant": "separator"} |
{"_variant": "partTitle", "value": string}
);
export type Chapter = {"name": string, "sub_items": Array<BookItem>};
Using the Racket schema compiler, we see
(struct BookOutline (sections))
(define (BookItem? p)
(or (BookItem-chapter? p)
(BookItem-separator? p)
(BookItem-partTitle? p)))
(struct BookItem-chapter (value))
(struct BookItem-separator ())
(struct BookItem-partTitle (value))
(struct Chapter (name sub_items))
and so on.
Example: Book Outline redux, using Records
The schema for book outlines above accepts Preserves (JSON) documents compatible with the (de)serializers produced by Serde for a Rust-native type.
Instead, we might choose to define a Preserves-native data definition, and to work from that:6
version 1 .
BookOutline = <book-outline @sections [BookItem ...]> .
BookItem = Chapter / =separator / @partTitle string .
Chapter = <chapter @name string @sub_items [BookItem ...]> .
The schema compilers produce exactly the same type definitions7 for this variation. The differences are in the (de)serialization code only.
Here's the Preserves value equivalent to the example above, expressed using the Preserves-native schema:
<book-outline [
"Part I"
separator
<chapter "Chapter One" []>
<chapter "Chapter Two" []>
]>
Notes
The specification defines a total order relation over Preserves values as well.
In particular, dataspaces need the assertion data they contain to have a sensible equivalence predicate in order to be useful at all. If you can't reliably tell whether two values are the same or different, how are you supposed to use them to look things up in anything database-like? Languages like JSON, which don't have a well-defined equivalence relation, aren't good enough. When programs communicate with each other, they need to be sure that their peers will understand the information they receive exactly as it was sent.
Besides the two core syntaxes, other serialization syntaxes are in use in other systems. For example, the Spritely Goblins actor library uses a serialization syntax called Syrup, reminiscent of bencode.
You have to use a Preserves text-syntax reader on JSON terms to do this, though: JSON values like `null`, `true`, and `false` naively read as Preserves symbols. Preserves doesn't have the concept of `null`.
This example is a simplified form of the preprocessor type definitions for mdBook, the system used to render these pages. I use a real Preserves schema definition for parsing and producing Serde's JSON representation of mdBook `Book` structures in order to preprocess the text.
By doing so, we lose compatibility with the Serde structures, but the point is to show the kinds of schemas available to us once we move away from strict compatibility with existing data formats.
Well, almost exactly the same. The only difference is in the Rust types, which use tuple-style instead of record-style structs for chapters and part titles.
Working with schemas
Schema source code: *.prs files
Preserves schemas are written in a syntax that (ab)uses Preserves text syntax as a kind of S-expression. Schema source code looks like this:
version 1 .
Present = <Present @username string> .
Says = <Says @who string @what string> .
UserStatus = <Status @username string @status Status> .
Status = =here / <away @since TimeStamp> .
TimeStamp = string .
Conventionally, schema source code is stored in `*.prs` files. In this example, the source code above is placed in `simpleChatProtocol.prs`.
Compiling source code to metaschema instances: *.prb files
Many of the code generator tools for Preserves schemas require not source code, but instances of the Preserves metaschema. To compile schema source code to metaschema instances, use preserves-schemac:
yarn global add @preserves/schema
preserves-schemac .:simpleChatProtocol.prs > simpleChatProtocol.prb
Binary-syntax metaschema instances are conventionally stored in `*.prb` files.
If you have a whole directory tree of `*.prs` files, you can supply just `.` without the `:`-prefixed fileglob part. See the preserves-schemac documentation.
Converting the `simpleChatProtocol.prb` file to Preserves text syntax lets us read the metaschema instance corresponding to the source code:
cat simpleChatProtocol.prb | preserves-tool convert
The result:
<bundle {
[
simpleChatProtocol
]: <schema {
definitions: {
Present: <rec <lit Present> <tuple [
<named username <atom String>>
]>>
Says: <rec <lit Says> <tuple [
<named who <atom String>>
<named what <atom String>>
]>>
Status: <or [
[
"here"
<lit here>
]
[
"away"
<rec <lit away> <tuple [
<named since <ref [] TimeStamp>>
]>>
]
]>
TimeStamp: <atom String>
UserStatus: <rec <lit Status> <tuple [
<named username <atom String>>
<named status <ref [] Status>>
]>>
}
embeddedType: #f
version: 1
}>
}>
Generating support code from metaschema instances
Support exists for working with schemas in many languages, including Python, Rust, TypeScript, Racket, and Squeak Smalltalk.
Python
Python doesn't have a separate compilation step: it loads binary metaschema instances at runtime, generating classes on the fly.
After `pip install preserves`, load metaschemas with `preserves.schema.load_schema_file`:
from preserves import Symbol, parse, schema, stringify
S = schema.load_schema_file('./simpleChatProtocol.prb')
P = S.simpleChatProtocol
Then, members of `P` are the definitions from `simpleChatProtocol.prs`:
>>> P.Present('me')
Present {'username': 'me'}
>>> stringify(P.Present('me'))
'<Present "me">'
>>> P.Present.decode(parse('<Present "me">'))
Present {'username': 'me'}
>>> P.Present.try_decode(parse('<Present "me">'))
Present {'username': 'me'}
>>> P.Present.try_decode(parse('<NotPresent "me">')) is None
True
>>> stringify(P.UserStatus('me', P.Status.here()))
'<Status "me" here>'
>>> stringify(P.UserStatus('me', P.Status.away('2022-03-08')))
'<Status "me" <away "2022-03-08">>'
>>> x = P.UserStatus.decode(parse('<Status "me" <away "2022-03-08">>'))
>>> x.status.VARIANT
#away
>>> x.status.VARIANT == Symbol('away')
True
Rust
Generate Rust definitions corresponding to a metaschema instance with preserves-schema-rs.
The best way to use it is to integrate it into your `build.rs` (see the docs), but you can also use it as a standalone command-line tool.
The following command generates a directory `./rs/chat` containing Rust sources for a module that expects to be called `chat` in Rust code:
preserves-schema-rs --output-dir rs/chat --prefix chat simpleChatProtocol.prb
Representative excerpts from one of the generated files, `./rs/chat/simple_chat_protocol.rs`:
pub struct Present {
pub username: std::string::String
}
pub struct Says {
pub who: std::string::String,
pub what: std::string::String
}
pub struct UserStatus {
pub username: std::string::String,
pub status: Status
}
pub enum Status {
Here,
Away {
since: std::boxed::Box<TimeStamp>
}
}
pub struct TimeStamp(pub std::string::String);
TypeScript
Generate TypeScript definitions from schema sources (not metaschema instances) using preserves-schema-ts. Unlike other code generators, this one understands schema source code directly.
The following command generates a directory `./ts/gen` containing TypeScript sources:
preserves-schema-ts --output ./ts/gen .:simpleChatProtocol.prs
Representative excerpts from one of the generated files, `./ts/gen/simpleChatProtocol.ts`:
export type Present = {"username": string};
export type Says = {"who": string, "what": string};
export type UserStatus = {"username": string, "status": Status};
export type Status = ({"_variant": "here"} | {"_variant": "away", "since": TimeStamp});
export type TimeStamp = string;
Squeak Smalltalk
After loading the `Preserves` package from the Preserves project SqueakSource page, perhaps via
Installer squeaksource project: 'Preserves'; install: 'Preserves'.
you can load and compile the bundle using something like
(PreservesSchemaEnvironment fromBundleFile: 'simpleChatProtocol.prb')
category: 'Example-Preserves-Schema-SimpleChat';
prefix: 'SimpleChat';
cleanCategoryOnCompile: true;
compileBundle.
which results in classes whose names are prefixed with `SimpleChat` being created in package `Example-Preserves-Schema-SimpleChat`. Here's a screenshot of a browser showing the generated classes:
Exploring the result of evaluating the following expression, which generates a Smalltalk object in the specified schema, yields the following screenshot:
SimpleChatSimpleChatProtocolStatus away
since: (SimpleChatSimpleChatProtocolTimeStamp new value: '2022-03-08')
Exploring the result of evaluating the following expression, which generates a Smalltalk object representing the Preserves value corresponding to the value produced in the previous expression, yields the following screenshot:
(SimpleChatSimpleChatProtocolStatus away
since: (SimpleChatSimpleChatProtocolTimeStamp new value: '2022-03-08'))
asPreserves
Finally, the following expression parses a valid `Status` string input:
SimpleChatSimpleChatProtocolStatus
from: '<away "2022-03-08">' parsePreserves
orTry: []
If it had been invalid, the answer would have been `nil` (because `[] value` is `nil`).
Capturing and rendering interaction traces
- Trace schema: [syndicate-protocols]/schemas/trace.prs
- Trace rendering tool: syndicate-render-msd
The `syndicate-server` program is able to capture traces of all Syndicated Actor Model interactions that traverse it, saving them as `TraceEntry` records (defined in trace.prs) to a file for later analysis.
Recording a trace
To record a trace, start `syndicate-server` with the `-t <trace-file>` (or `--trace-file <trace-file>`) command-line option. All interactions will be recorded in the named file.
The contents of the file will look a bit like this:
<trace 1643236405.7954443 1 <start <anonymous>>>
<trace 1643236405.7959989 11 <start <named dependencies_listener>>>
<trace 1643236405.7960189 21 <start <named config_watcher>>>
<trace 1643236405.7960294 31 <start <named daemon_listener>>>
<trace 1643236405.7960389 41 <start <named debt_reporter_listener>>>
<trace 1643236405.7960542 51 <start <named milestone_listener>>>
<trace 1643236405.7960613 61 <start <named tcp_relay_listener>>>
<trace 1643236405.7960687 71 <start <named unix_relay_listener>>>
<trace 1643236405.7960766 81 <start <named logger>>>
<trace 1643236405.7960895 1 <turn 9 <external "top-level actor"> [<facet-start [12 2]> <spawn #f 11> <spawn #f 21> <spawn #f 31> <spawn #f 41> <spawn #f 51> <spawn #f 61> <spawn #f 71> <enqueue <event <entity 1 12 140358591713488> <assert <value <run-service <config-watcher "config" {config: #!"1/12:00007fa7c80010d0" gatekeeper: #!"1/12:00007fa7c8005420" log: #!"1/12:00007fa7c80011b0"}>>> 3>>> <spawn #f 81>]>>
<trace 1643236405.7961453 1 <turn 29 <caused-by 9> [<dequeue <event <entity 1 12 140358591713488> <assert <value <run-service <config-watcher "config" {config: #!"1/12:00007fa7c80010d0" gatekeeper: #!"1/12:00007fa7c8005420" log: #!"1/12:00007fa7c80011b0"}>>> 3>>>]>>
<trace 1643236405.7962394 81 <turn 49 <caused-by 9> [<facet-start [122 92]> <enqueue <event <entity 1 12 140358591713712> <assert <value <Observe <rec log [<bind <_>> <bind <_>>]> #!"81/122:00007fa7c800ff10">> 13>>>]>>
<trace 1643236405.796323 11 <turn 19 <caused-by 9> [<facet-start [102 22]> <enqueue <event <entity 1 12 140358591713488> <assert <value <Observe <rec require-service [<bind <_>>]> #!"11/102:00007fa75c0010b0">> 23>>>]>>
...
Rendering a trace
Tools such as syndicate-render-msd can process trace files to produce message-sequence-diagram-like interactive renderings of their contents. The trace file excerpted above renders (in part) in the browser to the following screenshot:
Enhancements such as streaming of a live trace and filtering and selecting subtraces are on the roadmap.
Python support libraries
The `py3-preserves` and `py3-syndicate` packages include the Python implementations of Preserves (`preserves` on PyPI; git) and of the Syndicated Actor Model and Syndicate Protocol (`syndicate-py` on PyPI; git), respectively.
When installed, the libraries are available in the standard location for system-wide Python packages.
Shell-scripting libraries
The `syndicate-sh` package includes `/usr/lib/syndicate/syndicate.sh`, an implementation of the Syndicate Protocol for Bash. Scripts may take advantage of the library to interact with peers via system dataspaces, either as supervised services or as external programs making use of the gatekeeper service.
Examples of both kinds of script are included in the `syndicate-sh` git repository (see the `examples` directory).
Preserves schemas
The following pages document schemas associated with the Preserves data language and its tools.
Preserves Schema metaschema
The Preserves Schema metaschema defines the structure of the abstract syntax (AST) of schemas. Every valid Preserves Schema document can be represented as an instance of the metaschema. (And of course the metaschema is itself a schema, which is in turn an instance of the metaschema!)
⟶ See Appendix A: "Metaschema" of the Preserves Schema specification.
Preserves Path schema
Preserves Path is a language for selecting and filtering portions of a Preserves value. It has an associated schema describing the various kinds of Path expressions as abstract syntax.
The schema source below is taken from path/path.prs in the Preserves source code repository.
Preserves Path expressions come in several flavours: selectors, steps (axes and filters), and predicates. Each is described below along with its abstract syntax definitions.
Selectors and Steps
Selectors are a sequence of steps, applied one after the other to the currently-selected value. Each step transforms an input into zero or more outputs. A step is an axis or a filter.
Selector = [Step ...] .
Step = Axis / Filter .
Axes: selecting portions of the input
Each axis step generally selects some sub-portion or -portions of the current document. An axis may also have a secondary filtering effect: for example, `label` only applies to Records, and will yield an empty result set when applied to any other kind of input.
Axis =
/ <values> ;; yields the immediate subvalues of the input nonrecursively
/ <descendants> ;; recurses through all descendant subvalues of the input
/ <at @key any> ;; extracts a subvalue named by the given key, if any
/ <label> ;; extracts a Record's label, if any
/ <keys> ;; extracts all keys (for subvalues) of the input, nonrecursively
/ <length> ;; extracts the length/size of the input, if any
/ <annotations> ;; extracts all annotations attached to the input
/ <embedded> ;; moves into the representation of an embedded value, if any
/ <parse @module [symbol ...] @name symbol> ;; parses using Preserves Schema
/ <unparse @module [symbol ...] @name symbol> ;; unparses using Preserves Schema
.
The `parse` and `unparse` variants name Schema definitions, to be resolved by the eventual surrounding context in which the expression will be executed. A `parse` axis parses the input using a Schema definition; if the parse succeeds, the axis moves into the parse result. Similarly, `unparse` expects an abstract parse result, transforming it back into a concrete value according to the Schema definition.
Filters: rejecting inputs
Each filter step generally applies some test to the current document as a whole, either emitting it unchanged (with exceptions, detailed below) or emitting no outputs at all.
Filter =
/ <nop> ;; Always emit the input
/ <compare @op Comparison @literal any> ;; Emit iff the comparison holds
/ <regex @regex string> ;; Emit iff input is String and regex matches
/ <test @pred Predicate> ;; Apply complex predicate
/ <real> ;; Emit iff input is Float, Double, or Integer
/ <int> ;; TRUNCATE and emit iff Float, Double or Integer
/ <kind @kind ValueKind> ;; Emit iff input kind matches
.
Complex predicates
The complex predicates in a `test` filter are built up from logical connectives over selectors. A `Selector` predicate evaluates to true whenever, applied to its input, it results in a non-empty output set.
Predicate =
/ Selector
/ <not @pred Predicate>
/ <or @preds [Predicate ...]>
/ <and @preds [Predicate ...]>
.
Comparison against a literal
Each `compare` filter includes a `Comparison` and a literal value to compare the input against. For example, `<compare eq 3>` only produces an output if the input is equal (according to the Preserves semantic model) to `3`.
Comparison = =eq / =ne / =lt / =ge / =gt / =le .
NB. For inequalities (`lt`/`ge`/`gt`/`le`), comparison between values of different kinds is undefined in the current draft specification.
Filtering by value kind
Each `kind` filter selects only values from one of the kinds of Preserves value:
ValueKind =
/ =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol
/ =Record / =Sequence / =Set / =Dictionary
/ =Embedded
.
Syndicated Actor Model schemas
- Schema definitions:
[syndicate-protocols]/schemas/
The following pages document schemas associated with the Syndicated Actor Model. By and large, these schemas are contained in the syndicate-protocols Git repository.
"Observe" assertions
The protocol for interaction with a dataspace entity is very simple: any assertion can be sent to a dataspace. The job of a dataspace is to relay assertions on to interested peers; dataspaces do not generally interpret assertions themselves.
The sole exception is assertions of interest in other assertions.
These are called "Observe" assertions, or subscriptions:
Observe = <Observe @pattern dataspacePatterns.Pattern @observer #!any>.
An Observe assertion contains a pattern and a reference to an observer entity. When an Observe assertion is published to a dataspace, the dataspace alters its internal index to make a note of the new expression of interest. It also immediately relays any of its already-existing assertions that match the pattern to the observer entity. As other assertions come and go subsequently, the dataspace takes care to inform the observer entity in the Observe record of the arrival or departure of any of the changing assertions that match the pattern.
Patterns over assertions
Each subscription record asserted at a dataspace entity contains a pattern over Preserves values.
The pattern language is carefully chosen to be reasonably expressive without closing the door to efficient indexing of dataspace contents.1
Interpretation of patterns
A pattern is matched against a candidate input value. Matching can either fail or succeed; if matching succeeds, a sequence of (numbered) bindings is produced. Each binding in the sequence corresponds to a (possibly-nested) binding pattern in the overall pattern.
Example
Consider the pattern:
<arr [<lit 1> <bind <arr [<bind <_>> <_>]>> <_>]>
The following values each yield different results:
- [1 2 3] fails, because 2 is not an array.
- [1 [2 3] 4] succeeds, yielding a binding sequence [[2 3] 2], because the outer bind captures the whole [2 3] array, and the inner (nested) bind captures the 2.
- [1 [2 3 4] 5] fails, because [2 3 4] has more than the expected two elements.
- [1 [<x> <y>] []] succeeds, yielding a binding sequence [[<x> <y>] <x>]. Each discard pattern (<_>) ignores the specific input it is given.
Abstract syntax of patterns
A pattern may be either a discard, a (nested) binding, a literal, or a compound.
Pattern = DDiscard / DBind / DLit / DCompound .
Discard
A discard pattern matches any input value.
DDiscard = <_>.
Binding
A binding pattern speculatively pushes the portion of the input under consideration onto the end of the binding sequence being built, and then recursively evaluates its subpattern. If the subpattern succeeds, so does the overall binding pattern (keeping the binding); otherwise, the speculative addition to the binding sequence is undone, and the overall binding pattern fails.
DBind = <bind @pattern Pattern>.
Literal
A literal pattern matches any atomic Preserves value. In order to match a literal compound value, a combination of compound and literal patterns must be used.
DLit = <lit @value AnyAtom>.
AnyAtom =
/ @bool bool
/ @float float
/ @double double
/ @int int
/ @string string
/ @bytes bytes
/ @symbol symbol
/ @embedded #!any
.
Compound
Each compound pattern first checks the type of its input: a rec pattern fails unless it is given a Record, an arr demands a Sequence, and a dict only matches a Dictionary.
DCompound = <rec @label any @fields [Pattern ...]>
/ <arr @items [Pattern ...]>
/ <dict @entries { any: Pattern ...:... }> .
If the type check fails, the pattern match fails. Otherwise, matching continues:
- rec patterns compare the label of the input Record against the label field in the pattern; unless they match literally and exactly, the overall match fails. Otherwise, if the number of fields in the input does not equal the number expected in the pattern, the match fails. Otherwise, matching proceeds structurally recursively.
- arr patterns fail if the number of subpatterns does not match the number of items in the input Sequence. Otherwise, matching proceeds structurally recursively.
- dict patterns consider each of the key/subpattern pairs in entries in turn, according to the Preserves order of the keys.2 If any given key from the pattern is not present in the input value, matching fails. Otherwise, matching proceeds recursively. The pattern ignores keys in the input value that are not mentioned in the pattern.
Most implementations of Syndicated Actor Model dataspaces use an efficient index datastructure described here.
The ordering of visiting of keys in a dict pattern is important because bindings are numbered in this pattern language, not named. Recall that <dict {a: <bind <_>>, b: <bind <_>>}> is an identical Preserves value to <dict {b: <bind <_>>, a: <bind <_>>}>, so to guarantee consistent binding results, we must choose some deterministic order for visiting the subpatterns of the dict. (In this example, a will be visited before b, because a < b).
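The matching rules above can be sketched as a small recursive function. This is an illustrative model, not the implementation used by Synit: patterns are encoded as Python tuples, Records as a tuple of label followed by fields, and Python's sorted is a stand-in for the Preserves key order.

```python
# Pattern encoding (illustrative): ("_",) discard, ("bind", p), ("lit", v),
# ("rec", label, [p...]), ("arr", [p...]), ("dict", {key: p}).
def match(pattern, value, bindings=None):
    """Return the binding sequence on success, or None on failure."""
    if bindings is None:
        bindings = []
    kind = pattern[0]
    if kind == "_":                           # discard: matches anything
        return bindings
    if kind == "bind":                        # speculatively record the value
        bindings.append(value)
        if match(pattern[1], value, bindings) is None:
            bindings.pop()                    # undo on subpattern failure
            return None
        return bindings
    if kind == "lit":                         # literal atom, compared for equality
        return bindings if pattern[1] == value else None
    if kind == "rec":                         # Record: label and arity must match
        if not (isinstance(value, tuple) and len(value) >= 1):
            return None
        if value[0] != pattern[1] or len(value) - 1 != len(pattern[2]):
            return None
        for p, v in zip(pattern[2], value[1:]):
            if match(p, v, bindings) is None:
                return None
        return bindings
    if kind == "arr":                         # Sequence: arity must match exactly
        if not isinstance(value, list) or len(value) != len(pattern[1]):
            return None
        for p, v in zip(pattern[1], value):
            if match(p, v, bindings) is None:
                return None
        return bindings
    if kind == "dict":                        # visit keys in deterministic order
        if not isinstance(value, dict):
            return None
        for k in sorted(pattern[1]):          # stand-in for the Preserves order
            if k not in value or match(pattern[1][k], value[k], bindings) is None:
                return None
        return bindings
    raise ValueError(f"unknown pattern kind: {kind}")

# The worked example: <arr [<lit 1> <bind <arr [<bind <_>> <_>]>> <_>]>
pat = ("arr", [("lit", 1),
               ("bind", ("arr", [("bind", ("_",)), ("_",)])),
               ("_",)])
print(match(pat, [1, [2, 3], 4]))    # → [[2, 3], 2]
print(match(pat, [1, 2, 3]))         # → None
print(match(pat, [1, [2, 3, 4], 5])) # → None
```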
Gatekeeper and Sturdy-references
Wire-protocol
The wire-protocol schema, used for communication among entities separated by a point-to-point link of some kind, is fully described as part of the Syndicate Protocol specification.
Service dependencies
Tracing
Transport addresses
Syndicated Actor Model
The Syndicated Actor Model (SAM) [Garnock-Jones 2017] is an approach to concurrency based on the Communicating Event-Loop Actor Model [De Koster et al 2016] as pioneered by E [Miller 2006] and AmbientTalk [Van Cutsem et al 2007].
While other Actor-like models take message-passing as fundamental, the SAM builds on a different underlying primitive: eventually-consistent replication of state among actors. Message-passing follows as a derived operation.
This fundamental difference integrates Tuplespace- [Gelernter and Carriero 1992] and publish/subscribe-like ideas with concurrent object-oriented programming, and makes the SAM well-suited for building programs and systems that are reactive, robust to change, and graceful in the face of partial failure.
Outline. This document first describes the primitives of SAM interaction, and then briefly illustrates their application to distributed state management and handling of partial failure. It goes on to present the idea of a dataspace, an integration of Tuplespace- and publish/subscribe-like ideas with the SAM. Finally, it discusses the SAM's generalization of object capabilities to allow for control not only over invocation of object behaviour but subscription to object state.
Throughout, we will limit discussion to interaction among actors connected directly to one another: that is, to interaction within a single scope. Scopes can be treated as "subnets" and connected together: see the Syndicate protocol specification.
For more on the SAM, on the concept of "conversational concurrency" that the model is a response to, and on other aspects of the larger project that the SAM is a part of, please see https://syndicate-lang.org/about/ and Garnock-Jones' 2017 dissertation.
Concepts and components of SAM interaction
A number of inter-related ideas must be taken together to make sense of SAM interaction. This section will outline the essentials.
For core concepts of Actor models generally, see De Koster et al.'s outstanding 2016 survey paper, which lays out a taxonomy of Actor systems as well as introducing solid definitions for terms such as "actor", "message", and so on.
Actors, Entities, Assertions and Messages
The SAM is based around actors which not only exchange messages, but publish ("assert") selected portions of their internal state ("assertions") to their peers in a publish/subscribe, reactive manner. Assertions and messages in the SAM are semi-structured data: their structure allows for pattern-matching and content-based routing.
Assertions are published and withdrawn freely throughout each actor's lifetime. When an actor terminates, all its published assertions are automatically withdrawn. This holds for both normal and exceptional termination: crashing actors are cleaned up, too.
An actor in the SAM comprises
- an inbox, for receiving events from peers;
- a state, "all the state that is synchronously accessible by that actor" (De Koster et al 2016);
- a collection of entities; and
- a collection of outbound assertions, the data to be automatically retracted upon actor termination.
The term "entity" in the SAM denotes a reactive object, owned by a specific actor.1 Entities, not actors, are the unit of addressing in the SAM. Every published assertion and every sent message is targeted at some entity. Entities never outlive their actors—when an actor terminates, its entities become unresponsive—but may have lifetimes shorter than their owning actors.
Local interactions, among objects (entities) within the state of the same actor, occur synchronously. All other interactions are considered "remote", and occur asynchronously.
Turns
Each time an event arrives at an actor's inbox, the actor takes a turn. De Koster et al. define turns as follows:
A turn is defined as the processing of a single message by an actor. In other words, a turn defines the process of an actor taking a message from its inbox and processing that message to completion.
In the SAM, a turn comprises
- the event that triggered the turn and the entity addressed by the event,
- the entity's execution of its response to the event, and
- the collection of pending actions produced during execution.
If a turn proceeds to completion without an exception or other crash, its pending actions are delivered to their target entities/actors. If, on the other hand, the turn is aborted for some reason, its pending actions are discarded. This transactional "commit" or "rollback" of a turn is familiar from other event-loop-style models such as Ken [Yoo et al 2012].
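The commit-or-discard behaviour of a turn can be modelled in a few lines. This is a minimal sketch, not the Synit implementation: run_turn and the callback encoding of pending actions are assumptions made for illustration.

```python
# A turn runs an event handler; actions it schedules are delivered only if
# the handler runs to completion. A crash discards the pending actions.
def run_turn(handler, event):
    pending = []
    try:
        handler(event, pending.append)   # the handler schedules zero or more actions
    except Exception:
        return False                     # turn aborted: pending actions discarded
    for action in pending:
        action()                         # turn completed: commit pending actions
    return True

delivered = []
ok = run_turn(lambda e, schedule: schedule(lambda: delivered.append(e)), "ping")
print(ok, delivered)    # → True ['ping']
bad = run_turn(lambda e, schedule: (schedule(lambda: delivered.append(e)), 1/0), "boom")
print(bad, delivered)   # → False ['ping']
```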
Events and Actions
SAM events convey a new assertion, retraction of a previously-established assertion, delivery of a message, or a request for synchronisation.
In response to an event, an actor (entity) schedules actions to be performed at the end of the turn. Actions include not only publication and retraction of assertions, transmission of messages, and issuing of synchronisaton requests, but also termination of the running actor and creation of new actors to run alongside the running actor.
Entity References are Object Capabilities
As mentioned above, entities are the unit of addressing in the SAM. Assertions and message bodies may include references to entities. Actors receiving such references may then use them as targets for later assertions and messages. Entity references act as object capabilities, very similar to those offered by E [Miller 2006].
Entity references play many roles in SAM interactions, but two are of particular importance. First, entity references are used to simulate functions and continuations for encoding remote procedure calls (RPCs). Second, entity references can act like consumers or subscribers, receiving asynchronous notifications about state changes from peers.
Illustrative Examples
To show some of the potential of the SAM, we will explore two representative examples: a distributed spreadsheet, and a cellular modem server.
Spreadsheet cell
Imagine a collection of actors representing portions of a spreadsheet, each containing entities representing spreadsheet cells. Each cell entity publishes public aspects of its state to interested peers: namely, its current value. It also responds to messages instructing it to update its formula. In pseudocode:
 1: define entity Cell(formula):
 2:   subscribers ← ∅
 3:   on assertion from a peer of interest in our value,
 4:     add peer, the entity reference carried in the assertion of interest, to subscribers
 5:   on retraction of previously-expressed interest from some peer,
 6:     remove peer from subscribers
 7:   assert subscriptions to other Cells (using entity references in formula)
 8:   on message conveying a new formula,
 9:     formula ← newFormula
10:     replace subscription assertions using references in new formula
11:   on assertion conveying updated contents relevant to formula,
12:     value ← eval(formula)
13:   continuously, whenever value or subscribers changes,
14:     assert the contents of value to every peer in subscribers,
15:     retracting previously-asserted values
Much of the subscription-management behaviour of Cell is generic: lines 2–6 managing the subscribers set and lines 13–14 iterating over it will be common to any entity wishing to allow observers to track portions of its state. This observation leads to the factoring-out of dataspaces, introduced below.
Cellular modem server
Imagine an actor implementing a simple driver for a cellular modem, that accepts requests (as Hayes modem command strings) paired with continuations represented as entity references. Any responses the modem sends in reply to a command string are delivered to the continuation entity as a SAM message.
 1: define entity Server():
 2:   on assertion Request(commandString, replyEntity)
 3:     output commandString via modem serial port
 4:     collect response(s) from modem serial port
 5:     send response(s) as a message to replyEntity

 6: define entity Client(serverRef):
 7:   define entity k:
 8:     on message containing responses,
 9:       retract the Request assertion
10:       (and continue with other tasks)
11:   assert Request("AT+CMGS=...", k) to serverRef
This is almost a standard continuation-passing style encoding of remote procedure call.2 However, there is one important difference: the request is sent to the remote object not as a message, but as an assertion. Assertions, unlike messages, have a lifetime and so can act to set up a conversational frame within which further interaction can take place.
Here, subsequent interaction appears at first glance to be limited to transmission of a response message to replyEntity. But what if the Server were to crash before sending a response?
Erlang [Armstrong 2003] pioneered the use of "links" and "monitors" to detect failure of a
remote peer during an interaction; "broken promises" and a suite of special system messages
such as __whenBroken
and __reactToLostClient
[Miller 2006, chapter 17] do the same for
E. The SAM instead uses retraction of previous assertions to signal failure.
To see how this works, we must step away from the pseudocode above and examine the context where serverRef is discovered for eventual use with Client. In the case that an assertion, rather than a message, conveys serverRef to the client actor, then when Server crashes, the assertion conveying serverRef is automatically retracted. The client actor, interpreting this as failure, can choose to respond appropriately.
The ubiquity of these patterns of service discovery and failure signalling also contributed, along with the patterns of generic publisher/subscriber state management mentioned above, to the factoring-out of dataspaces.
Dataspaces
A special kind of syndicated actor entity, a dataspace, routes and replicates published data according to actors' interests.
 1: define entity Dataspace():
 2:   allAssertions ← new Bag()
 3:   allSubscribers ← new Set()
 4:   on assertion of semi-structured datum a,
 5:     add a to allAssertions
 6:     if a appears exactly once now in allAssertions,
 7:       if a matches Observe(pattern, subscriberRef),
 8:         add (pattern, subscriberRef) to allSubscribers
 9:         for x in allAssertions, if x matches pattern,
10:           assert x at subscriberRef
11:       otherwise,
12:         for (p, s) in allSubscribers, if a matches p,
13:           assert a at s
14:   on retraction of previously-asserted a,
15:     remove a from allAssertions
16:     if a no longer appears at all in allAssertions,
17:       retract a from all subscribers to whom it was forwarded
18:       if a matches Observe(pattern, subscriberRef),
19:         remove (pattern, subscriberRef) from allSubscribers
20:         retract all assertions previously sent to subscriberRef
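The pseudocode above can be transcribed almost directly into a runnable sketch. Here assertions are plain Python values, patterns are predicate functions standing in for the real dataspace pattern language, and subscriber entities are objects with on_assert/on_retract callbacks; all names are illustrative, not Synit API.

```python
from collections import Counter

class Observe:
    """A subscription: a pattern paired with a subscriber entity reference."""
    def __init__(self, pattern, subscriber):
        self.pattern, self.subscriber = pattern, subscriber

class Dataspace:
    def __init__(self):
        self.all_assertions = Counter()   # a bag: assertion value -> count
        self.all_subscribers = []         # (pattern, subscriber) pairs

    def assert_(self, a):
        self.all_assertions[a] += 1
        if self.all_assertions[a] != 1:
            return                        # duplicate: deduplicated, nothing to do
        if isinstance(a, Observe):
            self.all_subscribers.append((a.pattern, a.subscriber))
            for x in list(self.all_assertions):   # replay already-existing matches
                if a.pattern(x):
                    a.subscriber.on_assert(x)
        else:
            for p, s in self.all_subscribers:
                if p(a):
                    s.on_assert(a)

    def retract(self, a):
        self.all_assertions[a] -= 1
        if self.all_assertions[a] > 0:
            return                        # other copies remain asserted
        del self.all_assertions[a]
        if isinstance(a, Observe):
            self.all_subscribers.remove((a.pattern, a.subscriber))
            # (a fuller version would also retract everything sent to a.subscriber)
        else:
            # re-matching approximates "all subscribers to whom it was forwarded"
            for p, s in self.all_subscribers:
                if p(a):
                    s.on_retract(a)

class Log:
    def __init__(self): self.events = []
    def on_assert(self, v): self.events.append(("+", v))
    def on_retract(self, v): self.events.append(("-", v))

ds, log = Dataspace(), Log()
ds.assert_(("Present", "alice"))
ds.assert_(Observe(lambda v: isinstance(v, tuple) and v[0] == "Present", log))
ds.assert_(("Present", "bob"))
ds.assert_(("Present", "bob"))   # second copy: observers see nothing new
ds.retract(("Present", "bob"))   # one copy remains: no retraction forwarded
ds.retract(("Present", "bob"))   # last copy gone: retraction forwarded
print(log.events)
# → [('+', ('Present', 'alice')), ('+', ('Present', 'bob')), ('-', ('Present', 'bob'))]
```

Note how the replay on subscription and the deduplicating bag together give the eventual-consistency behaviour described in the surrounding text.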
Assertions sent to a dataspace are routed by pattern-matching. Subscriptions—tuples associating a pattern with a subscriber entity—are placed in the dataspace as assertions like any other.
A dataspace entity behaves very similarly to a tuplespace [Gelernter and Carriero 1992]. However, there are two key differences.
The first is that, while tuples in a tuplespace are "generative" [Gelernter 1985], taking on independent existence once created and potentially remaining in a tuplespace indefinitely, SAM assertions never outlive their asserting actors. This means that assertions placed at a dataspace only exist as long as they are actively maintained. If an actor terminates or crashes, all its assertions are withdrawn, including those targeted at a dataspace entity. The dataspace, following its definition, forwards all withdrawals on to interested subscribers.
The second is that assertion of a value is idempotent: multiple assertions of the same value3 appear to observers indistinguishable from a single assertion. In other words, assertions at a dataspace are deduplicated.
Applications of dataspaces
Dataspaces have many uses. They are ubiquitous in SAM programs. The form of state replication embodied in dataspaces subsumes Erlang-style links and monitors, publish/subscribe, tuplespaces, presence notifications, directory/naming services, and so on.
Subscription management
The very essence of a dataspace entity is subscription management. Entities wishing to manage collections of subscribers can cooperate with dataspaces: they may either manage a private dataspace entity, or share a dataspace with other entities. For example, in the spreadsheet cell example above, each cell could use its own private dataspace, or all cells could share a dataspace by embedding their values in a record alongside some name for the cell.
Service directory and service discovery
Assertions placed at a dataspace may include entity references. This makes a dataspace an ideal implementation of a service directory. Services advertise their existence by asserting service presence [Konieczny et al 2009] records including their names alongside relevant entity references:
Service("name", serviceRef)
Clients discover services by asserting interest in such records using patterns:
Observe(⌜Service("name", _)⌝, clientRef)
Whenever some matching Service record has been asserted by a server, the dataspace asserts the corresponding record to clientRef. (The real dataspace pattern language includes binding, not discussed here; see [TODO].)
Failure signalling
Since assertions of service presence are withdrawn on failure, and withdrawals are propagated to interested subscribers, service clients like clientRef above will be automatically notified whenever serviceRef goes out of service. The same principle can also be applied in other similar settings.
Independence from service identity
There's no need to separate service discovery from service interaction. A client may assert its request directly at the dataspace; a service may subscribe to requests in the same direct way:
(client:)
ServiceRequest("name", arg1, arg2, ..., replyRef)
(server:)
Observe(⌜ServiceRequest("name", ?a, ?b, ..., ?k)⌝, serviceRef)
In fact, there are benefits to doing things this way. If the service should crash mid-transaction, then when it restarts, the incomplete ServiceRequest record will remain, and it can pick up where it left off. The client has become decoupled from the specific identity of the service provider, allowing flexibility that wasn't available before.
Asserting interest in assertions of interest
Subscriptions at a dataspace are assertions like any other. This opens up the possibility of reacting to subscriptions:
Observe(⌜Observe(⌜...⌝, _)⌝, r)
This allows dataspace subscribers to express interest in which other subscribers are present.
In many cases, explicit assertion of presence (via, e.g., the Service records above) is the right thing to do, but from time to time it can make sense for clients to treat the presence of some subscriber interested in their requests as sufficient indication of service presence to go ahead.4
Illustrative Examples revisited
Now that we have Dataspaces in our toolbelt, let's revisit the spreadsheet cell and cellular modem examples from above.
Spreadsheet cell with a dataspace
 1: define entity Cell(dataspaceRef, name, formula):
 2:   continuously, whenever value changes,
 3:     assert CellValue(name, value) to dataspaceRef
 4:   continuously, whenever formula changes,
 5:     for each name n in formula,
 6:       define entity k:
 7:         on assertion of nValue,
 8:           value ← (re)evaluation based on formula, nValue, and other nValues
 9:       assert Observe(⌜CellValue(n, ?nValue)⌝, k) to dataspaceRef
10:   on message conveying a new formula,
11:     formula ← newFormula
The cell is able to outsource all subscription management to the dataspaceRef it is given. Its behaviour function is looking much closer to an abstract prose specification of a spreadsheet cell.
Cellular modem server with a dataspace
There are many ways to implement RPC using dataspaces,2 each with different characteristics. This implementation uses anonymous service instances, implicit service names, asserted requests, and message-based responses:
 1: define entity Server(dataspaceRef):
 2:   define entity serviceRef:
 3:     on assertion of commandString and replyEntity
 4:       output commandString via modem serial port
 5:       collect response(s) from modem serial port
 6:       send response(s) as a message to replyEntity
 7:   assert Observe(⌜Request(?commandString, ?replyEntity)⌝, serviceRef) to dataspaceRef

 8: define entity Client(dataspaceRef):
 9:   define entity k:
10:     on message containing responses,
11:       retract the Request assertion
12:       (and continue with other tasks)
13:   assert Request("AT+CMGS=...", k) to dataspaceRef
If the service crashes before replying, the client's request remains outstanding, and a service supervisor [Armstrong 2003, section 4.3.2] can reset the modem and start a fresh service instance. The client remains blissfully unaware that anything untoward happened.
We may also consider a variation where the client wishes to retract or modify its request in case of service crash. To do this, the client must pay more attention to the conversational frame of its interaction with the server. In the pseudocode above, no explicit service discovery step is used, but the client could reason about the server's lifecycle by observing the (disappearance of) presence of the server's subscription to requests: Observe(⌜Observe(⌜Request(⌞_⌟, ⌞_⌟)⌝, _)⌝, ...).
Object-capabilities for access control
Object capabilities are the only properly compositional way to secure a distributed system.5 They are a natural fit for Actor-style systems, as demonstrated by E and its various descendants [Miller 2006, Van Cutsem et al 2007, Stiegler and Tie 2010, Yoo et al 2012 and others], so it makes sense that they would work well for the Syndicated Actor Model.
The main difference between SAM capabilities and those in E-style Actor models is that syndicated capabilities express pattern-matching-based restrictions on the assertions that may be directed toward a given entity, as well as the messages that may be sent its way.
Combined with the fact that subscription is expressed with assertions like any other, this yields a mechanism offering control over state replication and observation of replicated state as well as ordinary message-passing and RPC.
In the SAM, a capability is a triple of
- target actor reference,
- target entity reference within that actor, and
- an attenuation describing accepted assertions and messages.
An "attenuation" is a piece of syntax including patterns over semi-structured data. When an assertion or message is directed to the underlying entity by way of an attenuated capability, the asserted value or message body is checked against the patterns in the attenuation. Values not matching are discarded silently.6
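The silent-discard behaviour of an attenuated capability can be sketched as a simple wrapper. This is an illustrative model under assumed names (Attenuated, assert_, message); attenuations here are predicate functions standing in for the pattern syntax.

```python
class Attenuated:
    """Wraps a target capability; values matching no pattern are dropped."""
    def __init__(self, target, patterns):
        self.target, self.patterns = target, patterns
    def _allows(self, value):
        return any(p(value) for p in self.patterns)
    def assert_(self, value):
        if self._allows(value):
            self.target.assert_(value)
        # otherwise: discarded silently, as the SAM specifies
    def message(self, value):
        if self._allows(value):
            self.target.message(value)

class Recorder:
    def __init__(self): self.received = []
    def assert_(self, v): self.received.append(("assert", v))
    def message(self, v): self.received.append(("message", v))

# Mirroring the "answer an incoming call" example: only Request("ATA") passes.
modem = Recorder()
ata_only = Attenuated(modem, [lambda v: v == ("Request", "ATA")])
ata_only.assert_(("Request", "ATD555"))  # dropped silently
ata_only.assert_(("Request", "ATA"))     # forwarded
print(modem.received)                    # → [('assert', ('Request', 'ATA'))]
```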
Restricting method calls. For example, a reference to the dataspace where our cellular modem server example is running could be attenuated to only allow assertions of the form Request("ATA", _). This would have the effect of limiting holders of the capability to only being able to cause the modem to answer an incoming call ("ATA").
Restricting subscriptions. As another example, a reference to the dataspace where our spreadsheet cells are running could be attenuated to only allow assertions of the form Observe(⌜CellValue("B13", _)⌝, _). This would have the effect of limiting holders of the capability to only being able to read the contents (or presence) of cell B13.
Conclusion
We have looked at the concepts involved in the Syndicated Actor Model (SAM), an Actor-like approach to concurrency that offers a form of concurrent object-oriented programming with intrinsic publish/subscribe support. The notion of a dataspace factors out common interaction patterns and decouples SAM components from one another in useful ways. Object capabilities are used in the SAM not only to restrict access to the behaviour offered by objects, but to restrict the kinds of subscriptions that can be established to the state published by SAM objects.
While we have examined some of the high level forms of interaction among entities residing in SAM actors, we have not explored techniques for effectively structuring the internals of such actors. For this, the SAM offers the concept of "facets", which relate directly to conversational contexts; for a discussion of these, see Garnock-Jones' 2017 dissertation, especially chapter 2, chapter 5, chapter 8 and section 11.1. A less formal discussion of facets can also be found on the Syndicate project website.
Bibliography
[Armstrong 2003] Armstrong, Joe. “Making Reliable Distributed Systems in the Presence of Software Errors.” PhD, Royal Institute of Technology, Stockholm, 2003. [PDF]
[De Koster et al 2016] De Koster, Joeri, Tom Van Cutsem, and Wolfgang De Meuter. “43 Years of Actors: A Taxonomy of Actor Models and Their Key Properties.” In Proc. AGERE. Amsterdam, The Netherlands, 2016. [DOI (PDF available)]
[Felleisen 1991] Felleisen, Matthias. “On the Expressive Power of Programming Languages.” Science of Computer Programming 17, no. 1–3 (1991): 35–75. [DOI (PDF available)] [PS]
[Fischer et al 1985] Fischer, Michael J., Nancy A. Lynch, and Michael S. Paterson. “Impossibility of Distributed Consensus with One Faulty Process.” Journal of the ACM 32, no. 2 (April 1985): 374–382. [DOI (PDF available)] [PDF]
[Garnock-Jones 2017] Garnock-Jones, Tony. “Conversational Concurrency.” PhD, Northeastern University, 2017. [PDF] [HTML]
[Gelernter 1985] Gelernter, David. “Generative Communication in Linda.” ACM TOPLAS 7, no. 1 (January 2, 1985): 80–112. [DOI]
[Gelernter and Carriero 1992] Gelernter, David, and Nicholas Carriero. “Coordination Languages and Their Significance.” Communications of the ACM 35, no. 2 (February 1, 1992): 97–107. [DOI]
[Karp 2015] Karp, Alan H. “Access Control for IoT: A Position Paper.” In IEEE Workshop on Security and Privacy for IoT. Washington, DC, USA, 2015. [PDF]
[Konieczny et al 2009] Konieczny, Eric, Ryan Ashcraft, David Cunningham, and Sandeep Maripuri. “Establishing Presence within the Service-Oriented Environment.” In IEEE Aerospace Conference. Big Sky, Montana, 2009. [DOI]
[Miller 2006] Miller, Mark S. “Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control.” PhD, Johns Hopkins University, 2006. [PDF]
[Morris 1968] Morris, James Hiram, Jr. “Lambda-Calculus Models of Programming Languages.” PhD thesis, Massachusetts Institute of Technology, 1968. [Available online]
[Stiegler and Tie 2010] Stiegler, Marc, and Jing Tie. “Introduction to Waterken Programming.” Technical Report. Hewlett-Packard Labs, August 6, 2010. [Available online]
[Van Cutsem et al 2007] Van Cutsem, Tom, Stijn Mostinckx, Elisa González Boix, Jessie Dedecker, and Wolfgang De Meuter. “AmbientTalk: Object-Oriented Event-Driven Programming in Mobile Ad Hoc Networks.” In Proc. XXVI Int. Conf. of the Chilean Soc. of Comp. Sci. (SCCC’07). Iquique, Chile, 2007. [DOI]
[Yoo et al 2012] Yoo, Sunghwan, Charles Killian, Terence Kelly, Hyoun Kyu Cho, and Steven Plite. “Composable Reliability for Asynchronous Systems.” In Proc. USENIX Annual Technical Conference. Boston, Massachusetts, 2012. [Talk] [PDF] [Project page]
Notes
The terminology used in the SAM connects to the names used in E [Miller 2006] as follows: our actors are E's vats; our entities are E's objects.
Many variations on RPC are discussed in section 8.7 of Garnock-Jones' 2017 dissertation (direct link to relevant section of online text).
Here the thorny question of the equivalence of entity references rears its head. Preserves specifies an equivalence over its Values that is generic in the equivalence over embedded values such as entity references. The ideal equivalence here would be observational equivalence [Morris 1968, Felleisen 1991]: two references are the same when they react indistinguishably to assertions and messages. However, this isn't something that can be practically implemented except in relatively limited circumstances. Fortunately, in most cases, pointer equivalence of entity references is good enough to work with, and that's what I've implemented to date (modulo details such as structural comparison of attenuations attached to a reference etc.).
Karp [2015] offers a good justification of this claim along with a worked example of object-capabilities in a personal-computing setting. The capabilities are ordinary E-style capabilities rather than SAM-style capabilities, but the conclusions hold.
You might be wondering "why silent discard of assertions rejected by an attenuation filter?", or more generally, "why discard assertions and messages silently on any kind of failure?" The answer is related to the famous Fischer/Lynch/Paterson (FLP) result [Fischer et al 1985], where one cannot distinguish between a failed process and a slow process. By extending the reasoning to a process that simply ignores some or all of its inputs, we see that offering any kind of response at the SAM level in case of failure or rejection would be a false comfort, because nothing would prevent successful delivery of a message to a recipient which then simply discards it. Instead, processes have to agree ahead of time on the conversational frame in which they will communicate. The SAM encourages a programming style where assertions are used to set up a conversational frame, and then other interactions happen in the context of the information carried in those assertions; see the section where we revisit the cellular modem server with the components decoupled and placed in a conversational frame by addition of a dataspace to the system. Finally, and with all this said, debugging-level notifications of rejected or discarded messages have their place: it's just the SAM itself that does not include feedback of this kind. Implementations are encouraged to offer such aids to debugging.
Syndicate Protocol
Actors that share a local scope can communicate directly. To communicate further afield, scopes are connected using relay actors.1 Relays allow indirect communication: distant entities can be addressed as if they were local.
Relays exchange Syndicate Protocol messages across a transport. A transport is the underlying medium connecting one relay to its counterpart(s). For example, a TLS-on-TCP/IP socket may connect a pair of relays to one another, or a UDP multicast socket may connect an entire group of relays across an ethernet.2
Transport requirements
Transports must
- be able to carry Preserves values back and forth,
- be reliable and in-order,
- have a well-defined session lifecycle (created → connected → disconnected), and
- assure confidentiality, integrity, authenticity, and replay-resistance.
This document focuses primarily on point-to-point transports, discussing multicast and in-memory variations briefly toward the end.
Roles and session lifecycle
The protocol is completely symmetric, aside from certain conventions detailed below about the entities available for use immediately upon connection establishment. It is not a client/server protocol.
Session startup. To begin a session on a newly-established point-to-point link, a relay simply starts sending packets. Each peer starts the session with an empty entity reference map (see below) and with no assertions in either the outbound (on behalf of local entities) or inbound (on behalf of the remote peer) direction.
Session teardown. At the end of a session, terminated normally or abnormally, cleanly or through involuntary transport disconnection, all published assertions are retracted.[3] This is in keeping with the essence of the Syndicated Actor Model (SAM).
Packet definitions
Packets exchanged by relays are Preserves values defined using Preserves schema.
Packet = Turn / Error / Extension .
A packet may be a turn, an error, or an extension.
Packets are neither commands nor responses; they are events.
Extension packets
Extension = <<rec> @label any @fields [any ...]> .
An extension packet must be a Preserves record, but is otherwise unconstrained.
Handling. Peers MUST ignore extensions that they do not understand.[4]
Error packets
Error = <error @message string @detail any>.
Handling. An error packet describes something that went wrong on the other end of the connection. Error packets are primarily intended for debugging.
Receipt of an error packet denotes that the sender has terminated (crashed) and will not respond further; the connection will usually be closed shortly thereafter.
Error packets are optional: connections may simply be closed without comment.
Turn packets
Turn = [TurnEvent ...].
TurnEvent = [@oid Oid @event Event].
Event = Assert / Retract / Message / Sync .
Assert = <assert @assertion Assertion @handle Handle>.
Retract = <retract @handle Handle>.
Message = <message @body Assertion>.
Sync = <sync @peer #!#t>.
Assertion = any .
Handle = int .
Oid = int .
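As a concrete, purely hypothetical illustration of this grammar, the following Turn publishes an assertion at OID 0, delivers a message, and then retracts the assertion; the record labels present and says are invented for the example:

```preserves
[[0 <assert <present "Alice"> 7>]
 [0 <message <says "Alice" "Hello world">>]
 [0 <retract 7>]]
```

Note that the Handle (7) chosen in the Assert is what the later Retract refers back to.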
A Turn is the most important packet variant. It directly reflects the SAM notion of a turn.
Handling. Each Turn carries events to be delivered to entities residing in the scope at the receiving end of the transport. Each event is either publication of an assertion, retraction of a previously-published assertion, delivery of a single message, or a synchronization event.
Upon receipt of a Turn, the sequence of TurnEvents is examined. The OID in each TurnEvent selects an entity known to the recipient. If a particular TurnEvent's OID is not mapped to an entity, the TurnEvent is silently ignored, and the remaining TurnEvents in the Turn are processed.
The assertion fields of Assert events and the body fields of Message events may contain any Preserves value, including embedded entity references. On the wire, these are always formatted as described below. As each Assert or Message is processed, embedded references are mapped to internal references. Symmetrically, internal references are mapped to their external form prior to transmission. The mapping procedure to follow is detailed below.
Turn boundaries. In the case that the receiving party is structured internally using the SAM, it is important to preserve turn boundaries. Since turn boundaries are a per-actor concept, but a Turn mentions only entities, the receiver must map entities to actors, group TurnEvents into per-actor queues, and deliver each queue to its actor in a single SAM turn.
Uniqueness. The Handles used to refer to published assertions MUST be unique within the scope of the transport connection.
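The per-actor grouping of received TurnEvents can be sketched in a few lines of Python; the names entity_table and actor_of are illustrative, not from any Syndicate implementation:

```python
def group_turn_events(turn, entity_table, actor_of):
    """Group a received Turn's events into per-actor queues (sketch).

    turn: list of [oid, event] pairs, as in the Turn grammar;
    entity_table maps OID -> entity; actor_of maps entity -> owning actor.
    """
    queues = {}
    for oid, event in turn:
        entity = entity_table.get(oid)
        if entity is None:
            continue  # unknown OID: the TurnEvent is silently ignored
        queues.setdefault(actor_of(entity), []).append((entity, event))
    return queues  # deliver each queue to its actor in a single SAM turn
```

Each resulting queue is then delivered to its actor as one turn, preserving the sender's turn boundary.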
Capabilities on the wire
References embedded in Turn packets denote capabilities for interacting with some entity. For example, assertion of a capability-bearing record could appear as the following Event:
<assert <please-reply-to #![0 555]>>
The #![0 555] is concrete Preserves text syntax for an embedded (#!) value ([0 555]).
In the Syndicate Protocol, these embedded values MUST conform to the WireRef schema:[5]
WireRef = @mine [0 @oid Oid] / @yours [1 @oid Oid @attenuation Caveat ...].
Oid = int .
The mine variant denotes capability references managed by the sender of a given packet; the yours variant, those managed by the receiver. A relay receiving a packet mentioning #![0 555] will use #![1 555] in later responses that refer to that same entity, and vice versa.
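This perspective flip can be captured in a couple of lines; a sketch assuming WireRefs are modelled as plain lists, and ignoring any attenuation carried on yours-variant refs:

```python
def refer_back(wire_ref):
    """Given a WireRef received from the peer, compute the WireRef to use
    when referring to the same entity in a reply (sketch)."""
    tag, oid = wire_ref[0], wire_ref[1]
    # the peer's 'mine' (0) is our 'yours' (1), and vice versa
    return [1, oid] if tag == 0 else [0, oid]
```

For example, refer_back([0, 555]) yields [1, 555].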
Attenuation of authority
A yours-variant capability may include a request[6] to impose additional conditions on the receiver's use of its own capability, known as an attenuation of the capability's authority.
An attenuation is a chain of Caveats.[7] A Caveat acts as a function that, given a Preserves value representing an assertion or message body, yields either a possibly-rewritten value or no value at all.[8] In the latter case, the value has been rejected. In the former case, the rewritten value is used as input to the next Caveat in the chain, or as the final assertion or message body for delivery to the entity backing the capability.
The chain of Caveats in an attenuation is written down in reverse order: newer Caveats are appended to the sequence, and each Caveat's output is fed into the input of the next leftward Caveat in the sequence. If no Caveats are present, the capability is unattenuated, and inputs are passed through to the backing capability unmodified.
Caveat = Rewrite / Alts .
Rewrite = <rewrite @pattern Pattern @template Template>.
Alts = <or @alternatives [Rewrite ...]>.
A Caveat can be either a single Rewrite or a sequence of alternative possible rewrites, tried in left-to-right order until one of them accepts the input or there are none left to try. (A single Rewrite R is equivalent to <or [R]>.)
A Rewrite applies its Pattern to the input to the Caveat. If it matches, the bindings captured by the pattern are gathered together and used to instantiate the Rewrite's Template, yielding the output from the Caveat. If the pattern does not match, the Rewrite has rejected the input, and other alternatives are tried until none remain, at which point the whole Caveat has rejected the input and processing of the triggering event stops.
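For example, the following hypothetical Caveat (written using the Pattern and Template grammars defined below, with an invented record label says) passes through says records whose first field is "Alice" and rejects everything else:

```preserves
<rewrite <rec says [<lit "Alice"> <bind <_>>]>
         <rec says [<lit "Alice"> <ref 0>]>>
```

Applied to <says "Alice" "hi"> it yields the value unchanged (the single binding captures "hi"); applied to <says "Bob" "hi"> the pattern fails and the input is rejected.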
Patterns
A Pattern within a rewrite can be any of the following variants:
Pattern = PDiscard / PAtom / PEmbedded / PBind / PAnd / PNot / Lit / PCompound .
Wildcard. PDiscard matches any value:
PDiscard = <_>.
Atomic type. PAtom requires that a matched value be a boolean, a single- or double-precision float, an integer, a string, a binary blob, or a symbol, respectively:
PAtom = =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol .
Embedded value. PEmbedded requires that a matched value be an embedded capability:
PEmbedded = =Embedded .
Binding. PBind first captures the matched value, adding it to the bindings vector, and then applies the nested pattern. If the subpattern matches, the PBind succeeds; otherwise, it fails:
PBind = <bind @pattern Pattern>.
Conjunction. PAnd is a conjunction of patterns; every pattern in patterns must match for the PAnd to match:
PAnd = <and @patterns [Pattern ...]>.
Negation. PNot is a pattern negation: if pattern matches, the PNot fails to match, and vice versa. It is an error for pattern to include any PBind subpatterns.
PNot = <not @pattern Pattern>.
Literal. Lit is an exact-match pattern. If the matched value is exactly equal to value (according to Preserves' own built-in equivalence relation), the match succeeds; otherwise, it fails:
Lit = <lit @value any>.
Compound. Finally, PCompound patterns match compound data structures. The rec variant demands that a matched value be a record, with label exactly equal to label and fields one-for-one matching the Patterns in fields; the arr variant demands a sequence, with each element matching the corresponding element of items; and dict demands a dictionary having at least the entries named by the keys of the entries dictionary, each matching the corresponding Pattern.
PCompound =
/ @rec <rec @label any @fields [Pattern ...]>
/ @arr <arr @items [Pattern ...]>
/ @dict <dict @entries { any: Pattern ...:... }> .
Bindings
Matching notionally produces a sequence of values, one for each PBind in the pattern. When a PBind pattern is seen, the matcher first appends the matched value to the binding sequence and then recurses on the nested subpattern. This makes binding indexes appear in left-to-right order as a Pattern is read.
Example. Given the pattern <bind <arr [<bind <_>>, <bind <_>>]>> and the matched value ["a" "b"], the resulting captured values are, in order, ["a" "b"], "a", and "b"; the template <ref 0> will be instantiated to ["a" "b"], <ref 1> to "a", and <ref 2> to "b".
Templates
A Template within a rewrite produces a concrete Preserves value when instantiated with a vector of captured binding values. Template instantiation may fail, yielding no value.
A given Template may be any of the following variants:
Template = TAttenuate / TRef / Lit / TCompound .
TAttenuate first instantiates the sub-template. If it yields a value, and if that value is an embedded reference (i.e. a capability), the Caveats in attenuation are appended to the (possibly-empty) sequence of Caveats already present in the embedded capability. The resulting possibly-attenuated capability is the final result of instantiation of the TAttenuate.
TAttenuate = <attenuate @template Template @attenuation [Caveat ...]>.
TRef retrieves the value at (0-based) index binding in the bindings vector, yielding the associated captured value as the result of instantiation. It is an error if binding is less than zero, or greater than or equal to the number of bindings in the bindings vector.
TRef = <ref @binding int>.
Lit (the same definition as used in the grammar for Pattern above) instantiates to exactly its value argument:
Lit = <lit @value any>.
Finally, TCompound instantiates to compound data. The rec variant produces a record with the given label and fields; arr produces an array; and dict a dictionary:
TCompound =
/ @rec <rec @label any @fields [Template ...]>
/ @arr <arr @items [Template ...]>
/ @dict <dict @entries { any: Template ...:... }> .
Validity of Caveats
The above definitions imply some validity constraints on Caveats.
- All TRefs must be bound: each index referred to must be the index associated with some PBind in the pattern corresponding to the template.
- Binding under negation is forbidden: a pattern within a PNot may not include any PBind constructors.
- The value produced by instantiation of the template within a TAttenuate must be an embedded reference (a capability).
Implementations MUST enforce these constraints (either statically or dynamically).
Membranes
Every relay maintains two stateful objects called membranes. A membrane is a bidirectional mapping between OID and relay-internal entity pointer. Membranes connect embedded references on the wire to entity references local to the relay.
- The import membrane connects OIDs managed by the remote peer to local relay entities which proxy access to an "imported" remote entity.
- The export membrane connects OIDs managed by the local peer to any local "exported" entities accessible to the peer.
Logically, a membrane's state can be represented as a set of WireSymbol structures: a WireSymbol is a triple of an OID, a local reference pointer (its ref), and a reference count. There is never more than one WireSymbol associated with a given OID or ref.
A WireSymbol exists only so long as some assertion mentioning its OID exists across the relay link. When the last assertion mentioning an OID is retracted, its WireSymbol is deleted.
Assertions mentioning a particular OID can come from either side of the relay link:
initially, a local reference is sent to the peer in an assertion, but then the peer may assert
something back, either targeting or mentioning the same entity. Care must be taken not to
release an OID entry prematurely in such situations.
For example, at least the following contribute to a WireSymbol's reference count:
- The initial entry mapping a local entity ref to a well-known OID for use at session startup (see below) contributes a permanent reference.
- Mention of an OID in a received or sent TurnEvent adds one to the OID's reference count for the duration of processing of the event. For Assert events in either direction, the duration of processing is until the assertion is later retracted. For received Message events, the duration of processing is until the incoming message has been forwarded on to the target ref.
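The reference-counting bookkeeping can be sketched in Python; the class and method names here are illustrative, not drawn from any Syndicate implementation:

```python
class WireSymbol:
    """A triple of OID, local ref, and reference count."""
    def __init__(self, oid, ref):
        self.oid = oid
        self.ref = ref
        self.count = 0

class Membrane:
    """Bidirectional OID <-> ref mapping, with refcounted entries."""
    def __init__(self):
        self.by_oid = {}
        self.by_ref = {}

    def grab(self, oid, ref):
        """Look up or create the WireSymbol for oid, adding one reference."""
        ws = self.by_oid.get(oid)
        if ws is None:
            ws = WireSymbol(oid, ref)
            self.by_oid[oid] = ws
            self.by_ref[ref] = ws
        ws.count += 1
        return ws

    def drop(self, ws):
        """Release one reference; delete the WireSymbol when none remain."""
        ws.count -= 1
        if ws.count == 0:
            del self.by_oid[ws.oid]
            del self.by_ref[ws.ref]
```

An Assert mentioning an OID would grab on receipt and drop on the corresponding Retract; the well-known session-startup entry would simply never be dropped.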
"Transient" references. Embedded references in Message
event bodies are special. Because
messages, unlike assertions, have no notion of lifetime—they are forwarded and forgotten—it is
not possible for a message to cause establishment of a long-lived entry in a membrane's
WireSymbol
set. Therefore, messages MUST NOT embed any reference not previously known to the
peer (a "transient reference"). In other words, only after using an assertion to introduce a
reference, associating a conversational context with its lifetime, is it permitted to discuss
the reference using messages. A relay receiving a message bearing a transient reference MUST
terminate the session with an error. A relay about to send such a message SHOULD preemptively
refuse to do so.
Rewriting embedded references upon receipt
When processing a Value v in a received Assert or Message event, embedded references in v are decoded from their on-the-wire WireRef form to in-memory ref-pointer form.
The value is recursively traversed. As the relay comes across each embedded WireRef:
- If it is of mine variant, it refers to an entity exported by the remote, sending peer. Its OID is looked up in the import membrane.
  - If no WireSymbol exists in the import membrane, one is created, mapping the OID to a fresh relay entity for the OID.
  - If a WireSymbol is already present, its associated ref is substituted into v.
- If it is of yours variant, it refers to an entity previously exported by the local, receiving peer. Its OID is looked up in the export membrane.
  - If no WireSymbol exists for the OID, one is created, associating the OID with a dummy inert entity ref. The dummy ref is substituted into v. It will later be released once the reference count of the WireSymbol drops to zero.
  - If a WireSymbol exists for the OID, and the WireRef is not attenuated, the associated ref is substituted into v. If the WireRef is attenuated, the associated ref is wrapped with the Caveats from the WireRef before its substitution into v.
- In each case, the WireSymbol associated with the OID has its reference count incremented (if an Assert is being processed).
Rewriting embedded references for transmission
When transmitting a Value v in an Assert or Message event, embedded references in v are encoded from their in-memory ref-pointer form to on-the-wire WireRef form.
The value is recursively traversed. As the relay comes across each embedded reference:
- The reference is first looked up in the export membrane. If an associated WireSymbol is present in the export membrane, its OID is substituted as a mine-variant WireRef into v.
- Otherwise, it is looked up in the import membrane. If no associated WireSymbol exists there, a fresh OID and WireSymbol are placed in the export membrane, and the new OID is substituted as a mine-variant WireRef into v. If a WireSymbol exists in the import membrane, however, the embedded reference must be a local relay entity referencing a previously-imported remote entity:
  - If the local entity reference has not been attenuated subsequent to its import, the OID it was imported under is substituted as a yours-variant WireRef into v with an empty attenuation.
  - If it has been attenuated, the relay may choose whether to trust the remote party to enforce an attenuation request. If it trusts the peer to honour attenuation requests, it substitutes a yours-variant WireRef with non-empty attenuation into v. Otherwise, a fresh OID and WireSymbol are placed in the export membrane, with ref denoting the attenuated local reference, and the new OID is substituted as a mine-variant WireRef into v.
Relay entities
A relay entity is a local proxy for an entity at the other side of a relay link. It forwards events delivered to it—assert, retract, message and sync—across the link to its counterpart at the other end. It holds two pieces of state: a pointer to the relay link, and the OID of the remote entity it represents. It packages all received events into TurnEvents, which are then sent across the transport.
Turn boundaries. When the relay is structured internally using the SAM, it is important to preserve turn boundaries. When all the relay entities of a given relay instance are managed by a single actor, this will be natural: a single turn can deliver events to a group of entities in the actor, so if each relay entity enqueues its TurnEvents in a buffer which is flushed into a Turn packet sent across the transport at the conclusion of the turn, the correct turn boundaries will be preserved.
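The buffering scheme can be sketched as follows, assuming the host environment calls flush at the end of each local turn; the names are illustrative, not from any Syndicate implementation:

```python
class RelayLink:
    """Buffers TurnEvents from relay entities and flushes them as a single
    Turn packet at the end of each local SAM turn (sketch)."""
    def __init__(self, send_packet):
        self.send_packet = send_packet  # transmits one encoded Turn packet
        self.buffer = []

    def enqueue(self, oid, event):
        """Called by a relay entity for each event it forwards."""
        self.buffer.append([oid, event])

    def flush(self):
        """Called at the conclusion of the local turn."""
        if self.buffer:
            self.send_packet(self.buffer)  # one Turn preserves the boundary
            self.buffer = []
```

Because every event from the same local turn lands in the same Turn packet, the receiving relay can reconstruct the sender's turn boundary.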
Client and server roles
While the protocol itself is symmetric, in many cases there will be one active ("client") and one passive ("server") party during the establishment of a transport connection.
As an optional convention, a "server" MAY have a single entity exposed as well-known OID 0 at the establishment of a connection, and a "client" MAY likewise expect OID 0 to resolve to some pre-arranged entity. It is frequently useful for the pre-arranged entity to be a gatekeeper service, but direct exposure of a dataspace or even some domain-specific object can also be useful. Each party to a connection may play one role, the other, neither, or both.
APIs for making use of relays in programs should permit programs to supply to a newly-constructed relay an (optional) initial ref, to be exposed as well-known OID 0; an (optional) initial OID, to denote a remote well-known OID and to be immediately proxied by a local relay entity; or both.
In the case of TCP/IP, the "client" role is often played by the connecting party, and the "server" role by the listening party, but the opposite arrangement is also useful from time to time.
Security considerations
The security considerations for this protocol fall into two categories: those having to do with particular transports for relay instances, and those having to do with the protocol itself.
Transport security
The security of an instance of the protocol depends on the security characteristics of its transport.
Confidentiality. Parties outwith the communicating peers must not be able to deduce the contents of packets sent back and forth: some of the packets may contain secrets. For example, a Resolve message sent to a gatekeeper service contains a "bearer capability", which conveys authority to any holder able to present it to the gatekeeper.
Integrity. Packets delivered to peers must be protected against tampering and other in-flight damage.
Authenticity. Each packet delivered to a peer must have genuinely originated with another party, and must have genuinely originated in the same session. Forgery of packets must be prevented.
Replay-resistance. Each packet delivered to a peer must be delivered exactly once within the context of the transport session. That is, replay of otherwise-authentic packets must not be possible from outside the session.
Protocol security
The protocol builds on, and directly reflects, the object-capability security model of the SAM. Entities are accessed via unforgeable references (OIDs). OIDs are meaningful only within the context of their transport session; in this way, they are analogous to Unix file descriptors, which are small integers that meaningfully denote objects only within the context of a single Unix process. If the transport is secure, so is the reference.
Entities can only obtain references to other entities by the standard methods by which "connectivity begets connectivity"; namely:
- By initial conditions. The relevant initial conditions here are the state of the relays at the moment a transport session is established, including any mappings from well-known OIDs to their underlying refs.
- By parenthood and by endowment. No direct provision is made for creation of new entities in this protocol, so these do not apply.
- By introduction. Transmission of OIDs in Turn packets, and the associated rules for managing the mappings between OIDs and references, are the normal method by which references pass from one entity to another.
While transport confidentiality is important for preserving secrecy of secrets such as bearer capabilities, OIDs do not need this kind of protection. An attacker able to observe OIDs communicated via a transport does not gain authority to deliver events to the denoted entity. At most, the attacker may glean information on patterns of interconnectivity among entities communicating across a transport link.
Relation to CapTP
This protocol is strikingly similar to a family of protocols known as CapTP (see, for example, here, here and here). This is no accident: the Syndicated Actor Model draws heavily on the actor model, and has over the years been incrementally evolving to be closer and closer to the actor model as it appears in the E programming language. However, the Syndicate protocol described in this document was developed based on the needs of the Syndicated Actor Model, without particular reference to CapTP. This makes it all the more striking that the similarities should be so strong. No doubt I have been subconsciously as well as consciously influenced by E's design, but perhaps there might also be a Platonic form awaiting discovery somewhere nearby.
For example:
- CapTP has the notion of a "c-list [capability list] index", cognate with our OID. A c-list index is meaningful only within the context of a transport connection, just like an OID is. A given c-list index maps to a "live-ref", an in-memory pointer to an object, in the same way that an OID maps to a ref via a WireSymbol.
- CapTP has "the four tables" at each end of a connection; each of our relays has two membranes, each comprising two unidirectional mapping tables.
- Syndicate gatekeeper services borrow the concept of a SturdyRef directly from CapTP. However, the notion of a gatekeeper entity at well-known OID 0 is an example of convergent evolution in action: in the CapTP world, the analogous service happens also to be available at c-list index 0, by convention.
A notable difference is that this protocol completely lacks support for the promises/futures of CapTP. CapTP c-list indices are just one part of a framework of descriptors (descs) denoting various kinds of remote object and eventual remote-procedure-call (RPC) result. The SAM handles RPC in a different, more low-level way.
Specific transport mappings
For now, this document focuses on SOCK_STREAM-like transports: reliable, in-order, bidirectional, connection-oriented, full-duplex byte streams. While these transports naturally have a certain level of integrity assurance and replay-resistance associated with them, special care should be taken in the case of non-cryptographic transport protocols like plain TCP/IP.
To use such a transport for this protocol, establish a connection and begin transmitting Packets encoded as Preserves values using either the Preserves text syntax or the Preserves binary syntax.
The session starts with the first packet and ends with transport disconnection. If either peer in a connection detects a syntax error, it MUST disconnect the transport. A responding server MUST support the binary syntax, and MAY also support the text syntax. It can autodetect the syntax variant by following the rules in the specification: the first byte of a valid binary-syntax Preserves document is guaranteed not to be interpretable as the start of a valid UTF-8 sequence.
Packets encoded in either binary or text syntax are self-delimiting. However, peers using text syntax MAY choose to insert whitespace (e.g. a newline) after each transmitted packet.
Some domain-specific details are also relevant:
- Unix-domain sockets. An additional layer of authentication checks can be made based on process-ID and user-ID credentials associated with each Unix-domain socket.
- TCP/IP sockets. Plain TCP/IP sockets offer only weak message integrity and replay-resistance guarantees, and offer no authenticity or confidentiality guarantees at all. Plain TCP/IP sockets SHOULD NOT be used; consider using TLS sockets instead.
- TLS atop TCP/IP. An additional layer of authentication checks can be made based on the signatures and certificates exchanged during TLS setup. TODO: concretely develop some recommendations for ordinary use of TLS certificates, including referencing a domain name in a SturdyRef, checking the presented certificate, and requiring SNI at the server end.
- WebSockets atop HTTP 1.x. These suffer similar flaws to plain TCP/IP sockets and SHOULD NOT be used.
- WebSockets atop HTTPS 1.x. Similar considerations to the use of TLS sockets apply regarding authentication checks. WebSocket messages are self-delimiting; peers MUST place exactly one Packet in each WebSocket message. Since (a) WebSockets are established after a standard HTTP(S) message header exchange, (b) every HTTP(S) request header starts with an ASCII letter, and (c) every Packet in text syntax begins with the ASCII "<" character, it is possible to autodetect use of a WebSocket protocol multiplexed on a server socket that is also able to handle plain Preserves binary and/or text syntax for Packets: any ASCII character between "A" and "Z" or "a" and "z" must be HTTP, an ASCII "<" must be Preserves text syntax, and any byte with the high bit set must be Preserves binary syntax.
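The three-way first-byte dispatch described above can be sketched as:

```python
def classify_first_byte(b: int) -> str:
    """Classify an incoming connection by its first byte, following the
    multiplexing rule above (sketch)."""
    if b >= 0x80:
        return 'preserves-binary'   # high bit set: binary syntax
    if b == ord('<'):
        return 'preserves-text'     # text-syntax Packet
    if ord('A') <= b <= ord('Z') or ord('a') <= b <= ord('z'):
        return 'http'               # HTTP(S) header, possibly a WebSocket upgrade
    return 'unknown'
```

A server would peek at the first byte of a new connection and hand the stream to the appropriate protocol handler.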
Appendix: Complete schema of the protocol
The following is a consolidated form of the definitions from the text above.
Protocol packets
The authoritative version of this schema is [syndicate-protocols]/schemas/protocol.prs.
version 1 .
Packet = Turn / Error / Extension .
Extension = <<rec> @label any @fields [any ...]> .
Error = <error @message string @detail any>.
Assertion = any .
Handle = int .
Event = Assert / Retract / Message / Sync .
Oid = int .
Turn = [TurnEvent ...].
TurnEvent = [@oid Oid @event Event].
Assert = <assert @assertion Assertion @handle Handle>.
Retract = <retract @handle Handle>.
Message = <message @body Assertion>.
Sync = <sync @peer #!#t>.
Capabilities, WireRefs, and attenuations
The authoritative version of this schema is [syndicate-protocols]/schemas/sturdy.prs.
version 1 .
Attenuation = [Caveat ...].
Caveat = Rewrite / Alts .
Rewrite = <rewrite @pattern Pattern @template Template>.
Alts = <or @alternatives [Rewrite ...]>.
Oid = int .
WireRef = @mine [0 @oid Oid] / @yours [1 @oid Oid @attenuation Caveat ...].
Lit = <lit @value any>.
Pattern = PDiscard / PAtom / PEmbedded / PBind / PAnd / PNot / Lit / PCompound .
PDiscard = <_>.
PAtom = =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol .
PEmbedded = =Embedded .
PBind = <bind @pattern Pattern>.
PAnd = <and @patterns [Pattern ...]>.
PNot = <not @pattern Pattern>.
PCompound =
/ @rec <rec @label any @fields [Pattern ...]>
/ @arr <arr @items [Pattern ...]>
/ @dict <dict @entries { any: Pattern ...:... }> .
Template = TAttenuate / TRef / Lit / TCompound .
TAttenuate = <attenuate @template Template @attenuation Attenuation>.
TRef = <ref @binding int>.
TCompound =
/ @rec <rec @label any @fields [Template ...]>
/ @arr <arr @items [Template ...]>
/ @dict <dict @entries { any: Template ...:... }> .
Appendix: Pseudocode for attenuation, pattern matching, and template instantiation
Attenuation
def attenuate(attenuation, value):
for caveat in reversed(attenuation):
value = applyCaveat(caveat, value)
if value is None:
return None
return value
def applyCaveat(caveat, value):
if caveat is 'Alts' variant:
for rewrite in caveat.alternatives:
possibleResult = tryRewrite(rewrite, value);
if possibleResult is not None:
return possibleResult
return None
if caveat is 'Rewrite' variant:
return tryRewrite(caveat, value)
def tryRewrite(rewrite, value):
bindings = applyPattern(rewrite.pattern, value)
if bindings is None:
return None
else:
return instantiateTemplate(rewrite.template, bindings)
Pattern matching
def match(pattern, value, bindings):
if pattern is 'PDiscard' variant:
return True
if pattern is 'PAtom' variant:
return True if value is of the appropriate atomic class else False
if pattern is 'PEmbedded' variant:
return True if value is a capability else False
if pattern is 'PBind' variant:
append value to bindings
return match(pattern.pattern, value, bindings)
if pattern is 'PAnd' variant:
for p in pattern.patterns:
if not match(p, value, bindings):
return False
return True
if pattern is 'PNot' variant:
return False if match(pattern.pattern, value, bindings) else True
if pattern is 'Lit' variant:
return (pattern.value == value)
if pattern is 'PCompound' variant:
if pattern is 'rec' variant:
if value is not a record: return False
if value.label is not equal to pattern.label: return False
if value.fields.length is not equal to pattern.fields.length: return False
for i in [0 .. pattern.fields.length):
if not match(pattern.fields[i], value.fields[i], bindings):
return False
return True
if pattern is 'arr' variant:
if value is not a sequence: return False
if value.length is not equal to pattern.items.length: return False
for i in [0 .. pattern.items.length):
if not match(pattern.items[i], value[i], bindings):
return False
return True
if pattern is 'dict' variant:
if value is not a dictionary: return False
for k in keys of pattern.entries:
if k not in keys of value: return False
if not match(pattern.entries[k], value[k], bindings):
return False
return True
Template instantiation
def instantiate(template, bindings):
if template is 'TAttenuate' variant:
c = instantiate(template.template, bindings)
if c is not a capability: raise an exception
c′ = c with the caveats in template.attenuation appended to the existing
attenuation in c
return c′
if template is 'TRef' variant:
if 0 ≤ template.binding < bindings.length:
return bindings[template.binding]
else:
raise an exception
if template is 'Lit' variant:
return template.value
if template is 'TCompound' variant:
if template is 'rec' variant:
return Record(label=template.label,
fields=[instantiate(t, bindings) for t in template.fields])
if template is 'arr' variant:
return [instantiate(t, bindings) for t in template.items]
if template is 'dict' variant:
result = {}
for k in keys of template.entries:
result[k] = instantiate(template.entries[k], bindings)
return result
Notes
1. Strictly speaking, scope subnets are connected by relay actors. The situation is directly analogous to IP subnets being connected by IP routers.
2. In fact, it makes perfect sense to run the relay protocol between actors that are already connected in some scope: this is like running a VPN, tunnelling IP over IP. A variation of the Syndicate Protocol like this gives federated dataspaces.
3. This process of assertion-retraction on termination is largely automatic when relay actors are structured internally using the SAM: simply terminating a SAM actor automatically retracts its published assertions.
4. This specification does not define any extensions, but future revisions could, for example, use extensions to perform version-negotiation. Another potential future use could be to propagate provenance information for tracing/debugging.
5. The syntax for WireRefs is slightly silly, using tuples as quasi-records with 0 and 1 acting as quasi-labels. It would probably be better to use real records, like <my @oid Oid> and <yours @oid Oid @attenuation [Caveat ...]>. Pros: less cryptic. Cons: slightly more verbose on the wire. TODO: should we revise the spec in this regard?
6. Such conditions can only ever be requests: after all, every yours-capability is already completely accessible to the recipient of the packet. Similarly, it does not make sense to include an attenuation description on a my-capability. However, in every case, if a party wishes to enforce an attenuation on a my- or yours-capability, it may record the attenuation against the underlying capability internally, issuing to its peers a fresh my-capability denoting the attenuated capability.
7. This terminology, "caveat", is lifted from the excellent paper on Macaroons, where it is used to describe a more general mechanism. Future versions of this specification may opt to include some of this generality.
8. TODO: It might be better to have a Caveat yield zero or more values? That way they could act as filters. I've sometimes wanted the multiple-value case, though I've so far been able to work around its lack. TODO: Perhaps it would also make sense to have a Caveat map an event to zero or more events, rather than to values? Tricky corners there include ensuring that carried authority isn't misused; macaroons are a very elegant solution to this problem, of course, so maybe the macaroon design idea could be adapted to this. For now, Value→Option<Value> is probably OK.