Don't prune replies to local, add docs

Andrew Godwin 2023-11-12 18:32:38 -07:00
parent eb0b0d775c
commit 460d1d7e1c
6 changed files with 82 additions and 17 deletions

View file

@@ -17,7 +17,7 @@ class Command(BaseCommand):
             "--number",
             "-n",
             type=int,
-            default=5000,
+            default=500,
             help="The maximum number of posts to prune at once",
         )
@@ -49,14 +49,32 @@ class Command(BaseCommand):
         for reply in replies:
             if reply and reply in post_ids_and_uris:
                 del post_ids_and_uris[reply]
+        print(f" narrowed down to {len(post_ids_and_uris)}")
+
+        # Fetch all the posts that they are replies to, and don't delete ones
+        # that are replies to local posts
+        print("Excluding ones that are replies to local posts...")
+        in_reply_tos = (
+            Post.objects.filter(id__in=post_ids_and_uris.values())
+            .values_list("in_reply_to", flat=True)
+            .distinct()
+        )
+        local_object_uris = Post.objects.filter(
+            local=True, object_uri__in=in_reply_tos
+        ).values_list("object_uri", flat=True)
+        final_post_ids = list(
+            Post.objects.filter(id__in=post_ids_and_uris.values())
+            .exclude(in_reply_to__in=local_object_uris)
+            .values_list("id", flat=True)
+        )
+        print(f" narrowed down to {len(final_post_ids)}")

         # Delete them
-        print(f" narrowed down to {len(post_ids_and_uris)}")
-        if not post_ids_and_uris:
+        if not final_post_ids:
             sys.exit(1)
         print("Deleting...")
-        _, deleted = Post.objects.filter(id__in=post_ids_and_uris.values()).delete()
+        _, deleted = Post.objects.filter(id__in=final_post_ids).delete()
         print("Deleted:")
         for model, model_deleted in deleted.items():
             print(f" {model}: {model_deleted}")
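
Reviewer note: the two exclusion steps in this hunk can be restated in plain Python over in-memory data, which may make the intent easier to check than the ORM form. This is only a sketch: `FakePost` and `select_prunable` are hypothetical names introduced for illustration and are not part of Takahe, and the real command performs these steps as database queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakePost:
    """Hypothetical stand-in for a Post row; not the real Django model."""

    id: int
    object_uri: str
    local: bool
    in_reply_to: Optional[str] = None


def select_prunable(candidates, all_posts):
    """Return the sorted ids of candidates that survive both reply checks."""
    by_uri = {post.object_uri: post.id for post in candidates}

    # Step 1: drop candidates that a local post has replied to.
    for post in all_posts:
        if post.local and post.in_reply_to in by_uri:
            del by_uri[post.in_reply_to]

    # Step 2: drop candidates that are themselves replies to a local post.
    local_uris = {post.object_uri for post in all_posts if post.local}
    remaining_ids = set(by_uri.values())
    return sorted(
        post.id
        for post in candidates
        if post.id in remaining_ids and post.in_reply_to not in local_uris
    )


# Illustrative data (made up for this sketch).
all_posts = [
    FakePost(1, "local/1", local=True, in_reply_to="remote/10"),
    FakePost(10, "remote/10", local=False),  # has a local reply
    FakePost(11, "remote/11", local=False, in_reply_to="local/1"),  # replies to local
    FakePost(12, "remote/12", local=False, in_reply_to="remote/10"),
    FakePost(13, "remote/13", local=False),
]
candidates = [post for post in all_posts if not post.local]
result = select_prunable(candidates, all_posts)
```

In this sketch, posts 10 and 11 are kept because they are tied to a local conversation, and only posts 12 and 13 remain eligible for deletion.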

View file

@@ -49,6 +49,20 @@ You can download images from `Docker Hub <https://hub.docker.com/r/jointakahe/ta
 or use the image name ``jointakahe/takahe:0.10``.


+0.10.1
+------
+
+*Released: Not Yet Released*
+
+This is a bugfix and small feature addition release:
+
+* The ``runstator`` command now logs its output to the terminal again
+
+* Two new commands, ``pruneposts`` and ``pruneidentities``, are added to enable
+  pruning (deletion of old content) of Posts and Identities respectively.
+  You can read more about them in :doc:`/tuning`.
+
+
 Upgrade Notes
 -------------

View file

@@ -9,17 +9,6 @@ Notes TBD.
 Upgrade Notes
 -------------

-Remote Pruning
-~~~~~~~~~~~~~~
-
-Post pruning is now in and comes *enabled by default*, as it is not directly
-destructive (it will only delete content that has not been interacted with
-locally and which can be re-fetched).
-
-Nevertheless, if you want to avoid post deletion triggering on your server at
-all, you should set the ``TAKAHE_REMOTE_PRUNE_HORIZON`` environment variable to
-``0``.
-
 VAPID keys and Push notifications
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View file

@@ -88,6 +88,50 @@ servers' timeouts make the connection fail) for more than about a week, some
 servers may consider it permanently unreachable and stop sending posts.


+Pruning
+-------
+
+Over time, the amount of Fediverse content your server consumes will grow -
+you'll see every reply to every post from every user you follow, and fetch
+every identity of every author of those replies.
+
+You don't need all of this past a certain date, as it's unlikely
+you'll want to go back to view what the timeline would have looked like months
+ago. If you want to remove this data, you can run the two "pruning" commands::
+
+    ./manage.py pruneposts
+    ./manage.py pruneidentities
+
+Each operates in batches, and takes an optional ``--number=1000`` argument
+to specify the batch size. The ``TAKAHE_REMOTE_PRUNE_HORIZON`` environment
+variable specifies the number of days of history you want to keep intact before
+the pruning happens - this defaults to 3 months.
+
+Post pruning removes any post that isn't:
+
+* Written by a local identity
+* Newer than ``TAKAHE_REMOTE_PRUNE_HORIZON`` days old
+* Favourited, bookmarked or boosted by a local identity
+* Replied to by a local identity
+* A reply to a local identity's post
+
+Identity pruning removes any identity that isn't:
+
+* A local identity
+* Newer than ``TAKAHE_REMOTE_PRUNE_HORIZON`` days old
+* Mentioned by a post by a local identity
+* Followed or blocked by a local identity
+* Following or blocking a local identity
+* A liker or booster of a local post
+
+We recommend you run the pruning commands on a schedule (for example, as
+a cron job). They return a ``0`` exit code if they deleted something and
+a ``1`` exit code if they found nothing to delete, so you can put them in
+a loop that runs until deletion is complete::
+
+    while ./manage.py pruneposts; do sleep 1; done
+
+
 Caching
 -------
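
Reviewer note: for anyone who would rather drive the batches from Python than from a shell loop, the same "repeat while the exit code is 0" idea from the tuning docs could look roughly like this. It is a sketch under assumptions: `run_prune_command` and `prune_until_done` are hypothetical helpers, the `./manage.py` path is taken from the docs example, and the `max_batches` safety cap is an addition that the shell loop does not have.

```python
import subprocess
import time


def run_prune_command(command):
    """Run one pruning batch and return the process exit code."""
    return subprocess.run(["./manage.py", command]).returncode


def prune_until_done(command, run_once=run_prune_command, pause=1.0, max_batches=1000):
    """Repeat batches while the command exits 0; return the number of batches that deleted something."""
    batches = 0
    while batches < max_batches:
        # Any non-zero exit code stops the loop, as in the shell version.
        if run_once(command) != 0:
            break
        batches += 1
        time.sleep(pause)
    return batches
```

Calling `prune_until_done("pruneposts")` would then behave like the shell loop in the docs, stopping at the first non-zero exit code. The `run_once` parameter exists only so the loop can be exercised without a real Takahe checkout.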

View file

@@ -1 +1 @@
-__version__ = "0.10.0"
+__version__ = "0.10.1"

View file

@@ -16,7 +16,7 @@ class Command(BaseCommand):
             "--number",
             "-n",
             type=int,
-            default=1000,
+            default=500,
             help="The maximum number of identities to prune at once",
         )
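
Reviewer note: the effect of the new default on the command line can be checked with a standalone `argparse` parser, since Django's command parser is a subclass of `argparse.ArgumentParser`. The parser below is a stand-in built for illustration, not the actual command class, and the variable names are hypothetical.

```python
import argparse

# Stand-in parser mirroring the argument definition in this hunk.
parser = argparse.ArgumentParser(prog="pruneidentities")
parser.add_argument(
    "--number",
    "-n",
    type=int,
    default=500,
    help="The maximum number of identities to prune at once",
)

# No flag given: the new default batch size applies.
default_batch = parser.parse_args([]).number
# Short and long forms both override the default.
custom_batch = parser.parse_args(["-n", "50"]).number
long_form_batch = parser.parse_args(["--number=1000"]).number
```

With this definition, omitting the flag yields a batch size of 500, while `-n 50` or `--number=1000` override it.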