diff --git a/docs/domains.rst b/docs/domains.rst
new file mode 100644
index 0000000..8819e30
--- /dev/null
+++ b/docs/domains.rst
@@ -0,0 +1,85 @@
+Domains
+=======
+
+One of Takahē's key design features is that it supports multiple domains
+for ActivityPub users to sit under.
+
+As a server administrator, you do this by specifying one or more Domains on
+your server that users can make Identities (posting accounts) under.
+
+Domains can take two forms:
+
+* **Takahē lives on and serves the domain**. In this case, you just set the
+  domain to point to Takahē and ensure you have a matching domain record;
+  ignore the "service domain" setting.
+
+* **Takahē handles accounts under the domain but does not live on it**. For
+  example, you want to serve the ``@andrew@aeracode.org`` handle, but there
+  is already a site on ``aeracode.org``, so Takahē must instead live
+  elsewhere (e.g. ``fedi.aeracode.org``).
+
+In this second case, you need a *service domain* - a place where Takahē and
+the Actor URIs for your users live, but which is different from the main
+domain you'd like the account handles to contain.
+
+To set this up, you need to:
+
+* Choose a service domain and point it at Takahē. *You cannot change this
+  domain later without breaking everything*, so choose very wisely.
+
+* On your primary domain, forward the URLs ``/.well-known/webfinger``,
+  ``/.well-known/nodeinfo`` and ``/.well-known/host-meta`` to Takahē.
+
+* Set up a Domain in Takahē with these separate primary and service domains
+  in its record.
+
+
+Technical Details
+-----------------
+
+At its core, ActivityPub is a system built around URIs; the
+``@username@domain.tld`` format is actually based on Webfinger, a different
+standard, and is merely used to discover the Actor URI for someone.
+
+Making a system that accepts any Webfinger handle is relatively easy, but
+unfortunately Webfinger only covers discovering users via mentions and
+search; when a Follow comes in, or a Post is boosted onto your timeline,
+you have to discover the user's Webfinger handle *from their Actor URI*,
+and this is where it gets tricky.
+
+Mastodon, and from what we can tell most other implementations, do this by
+taking the ``preferredUsername`` field from the Actor object and the domain
+from the Actor URI, and running a Webfinger lookup on that combination of
+username and domain. This means that the domain you serve the Actor URI on
+must uniquely map to a Webfinger handle domain - they don't need to match,
+but they do need to be translatable into one another.
+
+Takahē handles all this internally with its concept of Domains. Each domain
+has a primary (display) domain name, and an optional "service" domain; the
+primary domain is what we will use for the user's Webfinger handle, and the
+service domain is what their Actor URI is served on.
+
+We look at ``HOST`` headers on incoming requests to match users to their
+domains, though for Actor URIs we ensure the domain is in the URI anyway.
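+
+As a purely hypothetical illustration of that mapping (reusing the
+``@andrew@aeracode.org`` example; the exact paths and JSON here are
+invented, not captured from a real server), a Webfinger lookup against the
+primary domain hands back an Actor URI on the service domain::
+
+    GET https://aeracode.org/.well-known/webfinger?resource=acct:andrew@aeracode.org
+
+    {
+        "subject": "acct:andrew@aeracode.org",
+        "links": [
+            {
+                "rel": "self",
+                "type": "application/activity+json",
+                "href": "https://fedi.aeracode.org/users/andrew/"
+            }
+        ]
+    }
+
+Going the other way, a handle built from the Actor URI's domain
+(``@andrew@fedi.aeracode.org``) must resolve back to the same account,
+which is why the two domains need to translate uniquely into one another.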
diff --git a/docs/index.rst b/docs/index.rst
index 9bd09b5..95c5a1e 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -15,4 +15,5 @@ in alpha. For more information about Takahē, see
    :caption: Contents:
 
    installation
-   principles
+   domains
+   stator
diff --git a/docs/installation.rst b/docs/installation.rst
index 32a82a4..660bd3c 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -14,6 +14,7 @@ Prerequisites
 * SSL support (Takahē *requires* HTTPS)
 * Something that can run Docker/OCI images
 * A PostgreSQL 14 (or above) database
+* Hosting/reverse proxy that passes the ``HOST`` header down to Takahē
 * One of these to store uploaded images and media:
 
   * Amazon S3
@@ -28,7 +29,25 @@
 This means that a "serverless" platform like AWS Lambda or Google Cloud Run
 is not enough by itself; while you can use these to serve the web pages if
 you like, you will need to run the Stator runner somewhere else as well.
 
-The flagship Takahē instance, [takahe.social](https://takahe.social), runs
+The flagship Takahē instance, `takahe.social <https://takahe.social>`_, runs
 inside of Kubernetes, with one Deployment for the webserver and one for the
 Stator runner.
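+
+If you put a reverse proxy or load balancer in front of Takahē, remember
+that it must pass the original ``HOST`` header through, as that is how
+identities are matched to their domains. A rough nginx sketch (the upstream
+name and port are assumptions for this example, not anything Takahē
+mandates)::
+
+    server {
+        listen 443 ssl;
+        server_name fedi.example.com;
+
+        location / {
+            # Takahē matches requests to Domain records by Host,
+            # so do not rewrite or drop this header.
+            proxy_set_header Host $host;
+            proxy_pass http://takahe-web:8000;
+        }
+    }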
diff --git a/docs/principles.rst b/docs/principles.rst
deleted file mode 100644
index 737c5f9..0000000
--- a/docs/principles.rst
+++ /dev/null
@@ -1,59 +0,0 @@
-Design Principles
-=================
-
-Takahē is somewhat opinionated in its design goals, which are:
-
-* Simplicity of maintenance and operation
-* Multiple domain support
-* Asychronous Python core
-* Low-JS user interface
-
-These are explained more below, but it's important to stress the one thing we
-are not aiming for - scalability.
-
-If we wanted to build a system that could handle hundreds of thousands of
-accounts on a single server, it would be built very differently - queues
-everywhere as the primary communication mechanism, most likely - but we're
-not aiming for that.
-
-Our final design goal is for around 10,000 users to work well, provided you do
-some PostgreSQL optimisation. It's likely the design will work beyond that,
-but we're not going to put any specific effort towards it.
-
-After all, if you want to scale in a federated system, you can always launch
-more servers. We'd rather work towards the ability to share moderation and
-administration workloads across servers rather than have one giant big one.
-
-
-Simplicity Of Maintenance
--------------------------
-
-It's important that, when running a social networking server, you have as much
-time to focus on moderation and looking after your users as you can, rather
-than trying to be an SRE.
-
-To this end, we use our deliberate design aim of "small to medium size" to try
-and keep the infrastructure simple - one set of web servers, one set of task
-runners, and a PostgreSQL database.
-
-The task system (which we call Stator) is not based on a task queue, but on
-a state machine per type of object - which have retry logic built in. The
-system continually examines every object to see if it can progress its state
-by performing an action, which is not quite as *efficient* as using a queue,
-but recovers much more easily and doesn't get out of sync.
-
-
-Multiple Domain Support
------------------------
-TODO
-
-
-Asynchronous Python
--------------------
-TODO
-
-
-Low-JS User Interface
----------------------
-
-TODO
diff --git a/docs/stator.rst b/docs/stator.rst
new file mode 100644
index 0000000..0ddd05c
--- /dev/null
+++ b/docs/stator.rst
@@ -0,0 +1,115 @@
+Stator
+======
+
+Takahē's background task system is called Stator, and rather than being a
+traditional task queue, it is a *reconciliation loop* system; the workers
+look for objects that could have actions taken, try to take them, and
+update them if successful.
+
+As someone running Takahē, the most important aspects of this are:
+
+* You have to run at least one Stator worker to make things like follows,
+  posting, and timelines work.
+
+* You can run as many workers as you want; there is a locking system to
+  ensure they can coexist.
+
+* You can get away without running any workers for a few minutes; the
+  server will continue to accept posts and follows from other servers, and
+  will process them when a worker comes back up.
+
+* There is no separate queue to run, flush or replay; it is all stored in
+  the main database.
+
+* If all your workers die, just restart them; within a few minutes the
+  existing locks will time out, and the system will recover itself and
+  process everything that's pending.
+
+You run a worker via the command ``manage.py runstator``. It will run
+forever until it is killed; send SIGINT (Ctrl-C) to it once to have it
+enter graceful shutdown, and a second time to force exiting immediately.
+
+
+Technical Details
+-----------------
+
+Each object managed by Stator has a set of extra columns:
+
+* ``state``, the name of a state in a state machine
+* ``state_ready``, a boolean saying if it's ready to have a transition tried
+* ``state_changed``, when it entered its current state
+* ``state_attempted``, when a transition was last attempted
+* ``state_locked_until``, the time until which a worker holds a lock on it
+
+Each object also has an associated state machine - a subclass of
+``stator.graph.StateGraph`` - which defines a series of states, the
+possible transitions between them, and handlers that run for each state to
+see if a transition is possible.
+
+First, an object becomes ready for execution:
+
+* If it has just entered a new state, or was just created, it is marked
+  ready.
+* If ``state_attempted`` is far enough in the past (based on the
+  ``try_interval`` of the current state), a small scheduling loop marks it
+  as ready.
+
+Then, in the main fast loop of the worker (sketched below), it:
+
+* Selects an item with ``state_ready`` set that is in a state it can handle
+  (some states are "externally progressed" and have no handlers run)
+* Fires up a coroutine for that handler and lets it run
+* When that coroutine exits, sees if it returned a new state name and, if
+  so, transitions the object to that state.
+* If that coroutine errors or exits with ``None`` as a return value, it
+  marks down the attempt and leaves the object to be rescheduled after its
+  ``try_interval``.
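+
+As a purely illustrative sketch of that loop - this is not Takahē's actual
+code, and every name in it is invented for the example - one pass of a
+worker might look something like this::
+
+    import asyncio
+    from dataclasses import dataclass, field
+    from datetime import datetime, timedelta
+
+    @dataclass
+    class Item:
+        state: str = "new"
+        state_ready: bool = True
+        state_changed: datetime = field(default_factory=datetime.utcnow)
+        state_attempted: datetime | None = None
+
+    async def handle_new(item):
+        # Try the action for the "new" state; return the next state name
+        # on success, or None to have the item rescheduled later.
+        return "done"
+
+    HANDLERS = {"new": handle_new}
+    TRY_INTERVAL = {"new": timedelta(seconds=30)}
+
+    async def worker_pass(items):
+        now = datetime.utcnow()
+        # Scheduling loop: re-mark items as ready once try_interval passes.
+        for item in items:
+            interval = TRY_INTERVAL.get(item.state)
+            if (interval and item.state_attempted
+                    and now - item.state_attempted >= interval):
+                item.state_ready = True
+        # Fast loop: run handlers for ready items in states we can handle;
+        # "externally progressed" states simply have no handler entry.
+        for item in items:
+            if not item.state_ready or item.state not in HANDLERS:
+                continue
+            item.state_ready = False
+            item.state_attempted = datetime.utcnow()
+            try:
+                new_state = await HANDLERS[item.state](item)
+            except Exception:
+                new_state = None  # errors behave like a None return
+            if new_state is not None:
+                item.state = new_state
+                item.state_changed = datetime.utcnow()
+                item.state_ready = True  # newly entered states start ready
+
+    asyncio.run(worker_pass([Item()]))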