Write some more docs
parent c8ad22a704
commit 807d546b12

5 changed files with 130 additions and 61 deletions

docs/domains.rst (new file, 63 lines)
@@ -0,0 +1,63 @@

Domains
=======

One of our key design features in Takahē is that we support multiple different
domains for ActivityPub users to be under.

As a server administrator, you do this by specifying one or more Domains on
your server that users can make Identities (posting accounts) under.

Domains can take two forms:

* **Takahē lives on and serves the domain**. In this case, you just set the
  domain to point to Takahē and ensure you have a matching Domain record;
  ignore the "service domain" setting.

* **Takahē handles accounts under the domain but does not live on it**. For
  example, you want to serve the ``@andrew@aeracode.org`` handle, but there
  is already a site on ``aeracode.org``, so Takahē must instead live elsewhere
  (e.g. on ``fedi.aeracode.org``).

In this second case, you need a *service domain* - a place where Takahē and
the Actor URIs for your users live, but which is different to the main domain
you'd like the account handles to contain.

To set this up, you need to:

* Choose a service domain and point it at Takahē. *You cannot change this
  domain later without breaking everything*, so choose very wisely.

* On your primary domain, forward the URLs ``/.well-known/webfinger``,
  ``/.well-known/nodeinfo`` and ``/.well-known/host-meta`` to Takahē (the
  sketch below illustrates what this forwarding enables).

* Set up a Domain with these separate primary and service domains in its
  record.
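
For a rough picture of what that forwarding enables, here is how another
server discovers one of your users under this split setup. This is an
illustrative sketch only: ``example.com`` (primary) and ``fedi.example.com``
(service) are hypothetical domains, and the response fields are simplified
rather than Takahē's exact output.

.. code-block:: python

    # Discovery flow for @alice@example.com once /.well-known/webfinger on the
    # primary domain is forwarded to Takahē. Domains are hypothetical.
    import json
    import urllib.parse
    import urllib.request

    handle = "alice@example.com"  # handle uses the primary (display) domain

    # 1. Webfinger is always queried on the handle's domain...
    url = (
        "https://example.com/.well-known/webfinger?"
        + urllib.parse.urlencode({"resource": f"acct:{handle}"})
    )
    with urllib.request.urlopen(url) as response:
        webfinger = json.load(response)

    # 2. ...but the Actor URI it points to lives on the service domain,
    #    where Takahē is actually running.
    actor_uri = next(
        link["href"] for link in webfinger["links"] if link.get("rel") == "self"
    )
    print(actor_uri)  # e.g. https://fedi.example.com/@alice@example.com/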

Technical Details
-----------------

At its core, ActivityPub is a system built around URIs; the
``@username@domain.tld`` format is actually based on Webfinger, a different
standard, and is merely used to discover the Actor URI for someone.

Making a system that allows any Webfinger handle to be accepted is relatively
easy, but unfortunately this is only how users are discovered via mentions
and search; when an incoming Follow arrives, or a Post is boosted onto your
timeline, you have to discover the user's Webfinger handle *from their Actor
URI*, and this is where it gets tricky.

Mastodon - and, from what we can tell, most other implementations - does this
by taking the ``preferredUsername`` field from the Actor object and the domain
from the Actor URI, and webfingering that combination of username and domain.
This means that the domain you serve the Actor URI on must map uniquely to a
Webfinger handle domain - they don't need to match, but they do need to be
translatable into one another.
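
A minimal sketch of that derivation, using the hypothetical actor from the
example above (illustrative only, not code from Mastodon or Takahē):

.. code-block:: python

    # How a remote server typically reconstructs a Webfinger handle from a
    # fetched Actor object, per the description above. Illustrative only.
    from urllib.parse import urlparse

    actor = {
        # A heavily trimmed Actor document fetched from the Actor URI.
        "id": "https://fedi.example.com/@alice@example.com/",
        "preferredUsername": "alice",
    }

    username = actor["preferredUsername"]
    domain = urlparse(actor["id"]).hostname  # "fedi.example.com"

    # The remote server then webfingers this combination, so the service
    # domain must map back to a display domain unambiguously.
    guessed_handle = f"{username}@{domain}"
    print(guessed_handle)  # alice@fedi.example.com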

Takahē handles all this internally, however, with its concept of Domains. Each
Domain has a primary (display) domain name and an optional "service" domain;
the primary domain is what we use for the user's Webfinger handle, and the
service domain is what their Actor URI is served on.

We look at ``HOST`` headers on incoming requests to match users to their
domains, though for Actor URIs we ensure the domain is in the URI anyway.
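
Conceptually, each Domain record is a small two-way mapping. The sketch below
is a hypothetical simplification to illustrate the idea - the real Takahē
Domain model has more fields and logic than this:

.. code-block:: python

    # Hypothetical sketch of the primary/service domain mapping; not the
    # actual Takahē Domain model.
    from dataclasses import dataclass

    @dataclass
    class Domain:
        domain: str                 # display domain used in Webfinger handles
        service_domain: str | None  # where Actor URIs are served, if different

        @property
        def uri_domain(self) -> str:
            # Actor URIs live on the service domain when one is set.
            return self.service_domain or self.domain

    aeracode = Domain(domain="aeracode.org", service_domain="fedi.aeracode.org")

    print(f"@andrew@{aeracode.domain}")  # handle shown to users
    print(aeracode.uri_domain)           # host the Actor URI is served from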

docs/index.rst
@@ -15,4 +15,5 @@ in alpha. For more information about Takahē, see
   :caption: Contents:

   installation
-   principles
+   domains
+   stator

docs/installation.rst
@@ -14,6 +14,7 @@ Prerequisites
* SSL support (Takahē *requires* HTTPS)
* Something that can run Docker/OCI images
* A PostgreSQL 14 (or above) database
+* Hosting/reverse proxy that passes the ``HOST`` header down to Takahē
* One of these to store uploaded images and media:

  * Amazon S3

@@ -28,7 +29,7 @@ This means that a "serverless" platform like AWS Lambda or Google Cloud Run is
not enough by itself; while you can use these to serve the web pages if you
like, you will need to run the Stator runner somewhere else as well.

-The flagship Takahē instance, [takahe.social](https://takahe.social), runs
+The flagship Takahē instance, `takahe.social <https://takahe.social>`_, runs
inside of Kubernetes, with one Deployment for the webserver and one for the
Stator runner.

docs/principles.rst (deleted, 59 lines)
@@ -1,59 +0,0 @@
Design Principles
=================

Takahē is somewhat opinionated in its design goals, which are:

* Simplicity of maintenance and operation
* Multiple domain support
* Asynchronous Python core
* Low-JS user interface

These are explained more below, but it's important to stress the one thing we
are not aiming for - scalability.

If we wanted to build a system that could handle hundreds of thousands of
accounts on a single server, it would be built very differently - queues
everywhere as the primary communication mechanism, most likely - but we're
not aiming for that.

Our final design goal is for around 10,000 users to work well, provided you do
some PostgreSQL optimisation. It's likely the design will work beyond that,
but we're not going to put any specific effort towards it.

After all, if you want to scale in a federated system, you can always launch
more servers. We'd rather work towards the ability to share moderation and
administration workloads across servers than have one giant instance.


Simplicity Of Maintenance
-------------------------

It's important that, when running a social networking server, you have as much
time to focus on moderation and looking after your users as you can, rather
than trying to be an SRE.

To this end, we use our deliberate design aim of "small to medium size" to try
and keep the infrastructure simple - one set of web servers, one set of task
runners, and a PostgreSQL database.

The task system (which we call Stator) is not based on a task queue, but on a
state machine per type of object, with retry logic built in. The system
continually examines every object to see if it can progress its state by
performing an action, which is not quite as *efficient* as using a queue, but
recovers much more easily and doesn't get out of sync.


Multiple Domain Support
-----------------------

TODO


Asynchronous Python
-------------------

TODO


Low-JS User Interface
---------------------

docs/stator.rst (new file, 63 lines)
@@ -0,0 +1,63 @@

Stator
======

Takahē's background task system is called Stator, and rather than being a
traditional task queue, it is instead a *reconciliation loop* system: the
workers look for objects that could have actions taken, try to take them, and
update the objects if successful.

As someone running Takahē, the most important aspects of this are:

* You have to run at least one Stator worker to make things like follows,
  posting, and timelines work.

* You can run as many workers as you want; there is a locking system to ensure
  they can coexist.

* You can get away without running any workers for a few minutes; the server
  will continue to accept posts and follows from other servers, and will
  process them when a worker comes back up.

* There is no separate queue to run, flush or replay; it is all stored in the
  main database.

* If all your workers die, just restart them; within a few minutes the
  existing locks will time out, and the system will recover itself and process
  everything that's pending.

You run a worker via the command ``manage.py runstator``. It will run forever
until it is killed; send SIGINT (Ctrl-C) to it once to have it enter graceful
shutdown, and a second time to force it to exit immediately.


Technical Details
-----------------

Each object managed by Stator has a set of extra columns:

* ``state``, the name of a state in a state machine
* ``state_ready``, a boolean saying if it's ready to have a transition tried
* ``state_changed``, when it entered its current state
* ``state_attempted``, when a transition was last attempted
* ``state_locked_until``, the time until which the entry is locked by a worker
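
In Django terms, these correspond roughly to the following fields (an
illustrative approximation - field types, defaults, and the mixin name are
assumptions, not the actual Takahē model definition):

.. code-block:: python

    # Approximate shape of the Stator bookkeeping columns as Django fields.
    # Types, defaults, and the class name are assumptions, not Takahē's code.
    from django.db import models

    class StatorTrackedModel(models.Model):
        state = models.CharField(max_length=100)                      # current state name
        state_ready = models.BooleanField(default=True)                # eligible for an attempt
        state_changed = models.DateTimeField(auto_now_add=True)        # entered current state
        state_attempted = models.DateTimeField(null=True, blank=True)  # last transition attempt
        state_locked_until = models.DateTimeField(null=True, blank=True)  # worker lock expiry

        class Meta:
            abstract = True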

They also have an associated state machine - a subclass of
``stator.graph.StateGraph`` - which defines a series of states, the possible
transitions between them, and handlers that run for each state to see if a
transition is possible.
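
As a purely illustrative sketch of the shape such a definition takes - the
class layout, method name, and helper call below are invented for this
example and are not the real ``stator.graph`` API:

.. code-block:: python

    # Invented example of a per-object state machine with states, allowed
    # transitions, and a handler that decides whether to move state.
    class FollowStates:
        states = ["unrequested", "local_requested", "accepted"]
        transitions = {
            "unrequested": ["local_requested"],
            "local_requested": ["accepted"],
        }

        @staticmethod
        async def handle_unrequested(instance):
            # Try to deliver the follow request; returning a state name moves
            # the object there, returning None leaves it to be retried later.
            delivered = await instance.deliver_follow_request()
            return "local_requested" if delivered else None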

An object becomes ready for execution in one of two ways:

* If it has just entered a new state, or has just been created, it is marked
  ready immediately.
* If ``state_attempted`` is far enough in the past (based on the
  ``try_interval`` of the current state), a small scheduling loop marks it as
  ready.

Then, in its main fast loop, each worker:

* Selects an item with ``state_ready`` set that is in a state it can handle
  (some states are "externally progressed" and do not have handlers run).
* Fires up a coroutine for that state's handler and lets it run.
* When that coroutine exits, sees if it returned a new state name and, if so,
  transitions the object to that state.
* If that coroutine errors or returns ``None``, records the attempt and leaves
  the object to be rescheduled after its ``try_interval``.
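
Put together, the worker behaves like the loop sketched below. This is a
simplified illustration of the reconciliation pattern described above, not the
actual runner code: ``get_ready_object``, ``transition_to``, and
``record_attempt`` are invented stand-ins, and the real worker adds locking,
batching, and concurrency control.

.. code-block:: python

    # Simplified sketch of the reconciliation loop; helper names are invented
    # and the real runner handles locking and concurrency as well.
    import asyncio

    async def run_worker(get_ready_object, handlers):
        while True:
            obj = get_ready_object()  # something ready, in a state we handle
            if obj is None:
                await asyncio.sleep(1)  # nothing to do right now
                continue
            handler = handlers[obj.state]
            try:
                new_state = await handler(obj)   # run the state's handler coroutine
            except Exception:
                new_state = None                 # an error counts as a failed attempt
            if new_state is not None:
                obj.transition_to(new_state)     # move to the returned state
            else:
                obj.record_attempt()             # retry after try_interval elapses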