Tuning
======

This page contains a collection of tips and settings that can be used to
tune your server based upon its users and the other servers it federates
with.

We recommend that all installations are run behind a CDN and have caches
configured. See below for more details on each.

Scaling
-------

The only bottleneck, and single point of failure, in a Takahē installation
is its database; no permanent state is stored elsewhere.

Provided your database is happy (and PostgreSQL does a very good job of just
using more resources if you give them to it), you can:

* Run more webserver containers to handle a higher request load (requests
  come from both users and other ActivityPub servers trying to forward you
  messages). Consider setting up the ``DEFAULT`` cache under high request
  load, too.

* Run more Stator worker containers to handle a higher processing load
  (Stator handles pulling profiles, fanning out messages to followers, and
  processing stats, among other things). You'll generally see Stator load
  climb roughly in proportion to the sum of the follower counts of the users
  on your instance; a "celebrity" or other popular account will give Stator
  a lot of work, as it has to send a copy of each of their posts to every
  follower separately.

* Takahē is run with Gunicorn, which spawns several
  `workers <https://docs.gunicorn.org/en/stable/settings.html#workers>`_ to
  handle requests. Depending on the environment you run Takahē in, you may
  want to customize this via the ``GUNICORN_CMD_ARGS`` environment variable;
  for example, ``GUNICORN_CMD_ARGS="--workers 2"`` sets the worker count
  to 2 (see the sketch after this list).
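
As a minimal sketch of scaling these out with Docker Compose (the service
names, image tag, and replica counts here are assumptions for illustration,
not an official configuration)::

    # docker-compose.yml (hypothetical fragment)
    services:
      web:
        image: jointakahe/takahe:latest
        environment:
          GUNICORN_CMD_ARGS: "--workers 4"
        deploy:
          replicas: 3
      stator:
        image: jointakahe/takahe:latest
        command: python3 manage.py runstator
        deploy:
          replicas: 2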

As you scale up the number of containers, keep the PostgreSQL connection
limit in mind; this is generally the first thing that will fail, as Stator
workers in particular are quite connection-hungry (the parallel nature of
their internal processing means they might be working on 50 different
objects at once). It's generally a good idea to set the limit as high as
your PostgreSQL server will take (consult PostgreSQL tuning guides for the
effect changing that setting has on memory usage, specifically).
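
As a rough illustration (the value below is hypothetical; the right number
depends on your container counts and your server's memory)::

    # postgresql.conf -- raised from the default of 100
    max_connections = 300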

If you end up having a large server that is running into database performance
problems, please get in touch with us and discuss it; Takahē is young enough
that we need data and insight from those installations to help optimise it more.

Stator (Task Processing)
------------------------

Takahē's background task processing system is called Stator, and it uses
asynchronous Python to pack lots of tasks at once into a single process.

By default, it will try to run up to 20 tasks at once, with a maximum of 4
from any single model (``FanOut`` will usually be the one it's doing most
of). You can tweak these with the ``TAKAHE_STATOR_CONCURRENCY`` and
``TAKAHE_STATOR_CONCURRENCY_PER_MODEL`` environment variables; for every
extra unit of concurrency you add, however, Stator will use an additional
database connection in a new worker thread. Be wary of hitting your
database's connection limits.
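
For example, to double both limits from their defaults (illustrative values
only; remember each extra unit of concurrency costs a database connection)::

    TAKAHE_STATOR_CONCURRENCY=40
    TAKAHE_STATOR_CONCURRENCY_PER_MODEL=8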

The only real limits Stator can hit are CPU and memory usage; if you see your
Stator (worker) containers not using anywhere near all of their CPU or memory,
you can safely increase these numbers.

Federation
----------

ActivityPub, as a federated protocol, involves talking to a lot of other
servers. Sometimes, those servers may be under heavy load and not respond
when Takahē tries to fetch user details, posts, or images.

There is a ``TAKAHE_REMOTE_TIMEOUT`` setting to specify the number of seconds
Takahē will wait when making remote requests to other Fediverse instances; it
is set to 5 seconds by default. We recommend you keep this relatively low,
unless for some reason your server is on a very slow internet link.

This may also be a tuple of four floats to set the connect, read, write,
and pool timeouts individually::

    TAKAHE_REMOTE_TIMEOUT='[0.5, 1.0, 1.0, 0.5]'

Note that if your server is unreachable (including being so slow that other
servers' timeouts make the connection fail) for more than about a week, some
servers may consider it permanently unreachable and stop sending posts.

Pruning
-------

Over time, the amount of Fediverse content your server consumes will grow -
you'll see every reply to every post from every user you follow, and fetch
every identity of every author of those replies.

Obviously, you don't need all of this past a certain date, as it's unlikely
you'll want to go back to view what the timeline would have looked like
months ago. If you want to remove this data, you can run the two "pruning"
commands::

    ./manage.py pruneposts
    ./manage.py pruneidentities

Each operates in batches, and takes an optional ``--number=1000`` argument
to specify the batch size. The ``TAKAHE_REMOTE_PRUNE_HORIZON`` environment
variable specifies the number of days of history you want to keep intact
before the pruning happens; this defaults to 90 days (about 3 months).

Post pruning removes any post that isn't:

* Written by a local identity
* Newer than ``TAKAHE_REMOTE_PRUNE_HORIZON`` days old
* Favourited, bookmarked or boosted by a local identity
* Replied to by a local identity
* A reply to a local identity's post

Identity pruning removes any identity that isn't:

* A local identity
* Newer than ``TAKAHE_REMOTE_PRUNE_HORIZON`` days old
* Mentioned by a post by a local identity
* Followed or blocked by a local identity
* Following or blocking a local identity
* A liker or booster of a local post

We recommend you run the pruning commands on a scheduled basis (e.g. as a
cronjob). They return a ``0`` exit code if they deleted something and a ``1``
exit code if they found nothing to delete, so you can put them in a loop that
runs until deletion is complete::

    while ./manage.py pruneposts; do sleep 1; done
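
A hypothetical crontab setup (the install path, schedule, and batch size are
assumptions for your deployment) could run both commands nightly::

    # prune in batches of 500 every night (hypothetical path)
    0 3 * * * cd /srv/takahe && ./manage.py pruneposts --number=500
    30 3 * * * cd /srv/takahe && ./manage.py pruneidentities --number=500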

Caching
-------

There are two ways Takahē uses caches:

* For caching rendered pages and responses, like user profile information.
  These caches reduce database load on your server and improve performance.

* For proxying and caching remote user images and post images. These must be
  proxied to protect your users' privacy; caching them also reduces your
  server's consumed bandwidth and improves users' loading times.

By default, Takahē has Nginx inside its container image configured to perform
read-through HTTP caching for the image and media files, and no cache
configured for page rendering.

Each cache can be adjusted to your needs; let's talk about both.

Page Caching
~~~~~~~~~~~~

This caching helps Takahē avoid database hits by rendering complex pages and
API endpoints only once; turning it on will reduce your database load. No
cache is enabled for this by default.

To configure it, set the ``TAKAHE_CACHES_DEFAULT`` environment variable. We
support anything that is available as part of
`django-cache-url <https://github.com/epicserve/django-cache-url>`_, but
some cache backends will require additional Python packages not installed
by default with Takahē. More discussion of the major backends is below.
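
For instance, to point the default cache at a Redis server (the hostname and
database number here are placeholders)::

    TAKAHE_CACHES_DEFAULT=redis://redis:6379/0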

Redis
#####

Examples::

    redis://redis:6379/0
    redis://user:password@redis:6379/0
    rediss://user:password@redis:6379/0

A Redis-protocol server. Use ``redis://`` for unencrypted communication and
``rediss://`` for TLS.

Memcache
########

Examples::

    memcached://memcache:11211?key_prefix=takahe
    memcached://server1:11211,server2:11211

A remote Memcache-protocol server (or set of servers).

Filesystem
##########

Examples::

    file:///var/cache/takahe/

A cache on the local disk. Slower than other options, and only really useful
if you have no other choice.

Note that if you are running Takahē in a cluster, this cache will not be
shared across different machines. This is not quite as bad as it first seems;
it just means you will have more potential uncached requests until all
machines have a cached copy.

Local Memory
############

Examples::

    locmem://default

A local memory cache, inside the Python process. This will consume additional
memory for the process, and should be used with care.

Image and Media Caching
~~~~~~~~~~~~~~~~~~~~~~~

In order to protect your users' privacy and IP addresses, we can't just send
them the remote URLs of user avatars and post images that aren't on your
server; we instead need to proxy them through Takahē in order to obscure who
is requesting them.

Some other ActivityPub servers do this by downloading all media and images as
soon as they see them, and storing them all locally with some sort of
clean-up job; Takahē instead opts for using a read-through cache for this
task, which uses a bit more bandwidth in the long run but which has much
easier maintenance and better failure modes.

Our Docker image comes with this cache built in, as without it you'd be
making Python do a lot of file proxying on every page load (and it's not the
best at that). The cache is set to 1 GB of disk on each container by default,
but you can adjust this by setting the ``TAKAHE_NGINX_CACHE_SIZE``
environment variable to a value Nginx understands, like ``10g``.

The cache directory is ``/cache/``, and you can mount a different disk into
this path if you'd like to give it faster or more ephemeral storage.
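
A hypothetical Docker Compose fragment combining both of these options (the
volume name is an assumption)::

    environment:
      TAKAHE_NGINX_CACHE_SIZE: "10g"
    volumes:
      - takahe-nginx-cache:/cache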

If you have an external CDN or cache, you can also opt to add your own
caching to these URLs; they all begin with ``/proxy/``, and have appropriate
``Cache-Control`` headers set.

CDNs
----

Takahē can be run behind a CDN if you want to offset some of the load from
the webserver containers. Takahē has to proxy all remote user avatars and
images in order to protect the privacy of your users, and has a built-in
cache to help with this (see "Caching" above), but at large scale this might
start to get strained.

If you do run behind a CDN, ensure that your CDN is set to respect
``Cache-Control`` headers from the origin rather than deciding what to cache
purely from file extensions; some CDNs do the latter by default, which will
not capture all of the proxy views Takahē uses to show remote images without
leaking user information.

If you don't want to use a CDN but still want a performance improvement, a
read-through cache that respects ``Cache-Control``, like Varnish, will also
help if placed in front of Takahē.
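
As an illustrative sketch only (not an officially supported configuration),
a minimal Varnish backend definition pointing at a local Takahē webserver
might look like::

    # /etc/varnish/default.vcl (hypothetical; Varnish's built-in VCL
    # already respects Cache-Control headers from the backend)
    vcl 4.1;

    backend takahe {
        .host = "127.0.0.1";
        .port = "8000";
    }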

Remote Content Pruning
----------------------

By default, Takahē will prune (delete) any remote posts or identities that
haven't been interacted with in the last 90 days. You can change this using
the ``TAKAHE_REMOTE_PRUNE_HORIZON`` environment variable, which accepts an
integer number of days as its value; setting it to ``0`` disables this
feature entirely.
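
For example, to keep six months of remote content instead (an arbitrary
illustrative value)::

    TAKAHE_REMOTE_PRUNE_HORIZON=180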

Sentry.io integration
---------------------

Takahē can optionally integrate with https://sentry.io to collect exceptions
raised by the webserver or Stator.

To enable this, set the ``TAKAHE_SENTRY_DSN`` environment variable to the
value found in your Sentry project at
``https://<org>.sentry.io/settings/projects/<project>/keys/``.

Other Sentry configuration can be controlled through the environment
variables found in ``takahe/settings.py``. See the
`Sentry Python documentation <https://docs.sentry.io/platforms/python/configuration/options/>`_
for details.
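
For example (the DSN below is a placeholder, not a real key)::

    TAKAHE_SENTRY_DSN=https://examplepublickey@o0.ingest.sentry.io/0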