1. populate_streams_get_audience
This tries to set status_reply_parent_privacy as None if there is no status.reply_parent, but None is not a valid value for privacy.
This doesn't appear to be breaking anything but does result in a lot of error messages in the logs.
I have set this to equal the original status.privacy - this won't realy have any effect since it only happens when there is no parent,
however we could set this to "direct" if we want to be highly cautious.
2. rerank_user_task
Again, this doesn't seem to caused major issues, but is throwing errors if the user in question no longer exists for some reason.
This commit checks whether 'user' exists before attempting to rerank.
I think the reason I didn't do this initially was so that related users
and books, which are added necessarily after the model instance is
crated, will be part of the object when the task runs, but I have
investigated this and because of the transaction.atomic statement in the
to_model method in bookwyrm/activitypub/base_activity.py and in the
status view (added in this commit), this is not an issue.
Looking at the tracing data from this function in prod, only ~500ms is
spent in the database. My best guess for the rest of the time is
transferring and creating the user objects, which we don't use, since we
simply need the ID.
This is essentially a revert of 9cbff312a. The commit was at the advice
of the Celery docs for optimization, but I've since decided that the
downsides in terms of making things harder to debug (it makes Flower
nearly useless, for instance) are bigger than the upsides in performance
gain (which seem extremely small in practice, given how long our tasks
take, and the number of tasks we have).
This avoids filtering for the user that made the post in the same query
as we use for other things, which should allow for better use of indices
in all cases. Previously, #2723 did some work on this that only worked
for some cases in HomeStream, but this code should work for all cases.
Related: #2720
This is still slow in some cases, despite #2723, so this information
should give useful data about how it could be optimized more.
This also adds some abstraction around getting the tracer, just to
follow the advice in the OpenTelemetry documentation not to use __name__
directly to set the tracer name. The advice is ignored in most of their
examples, so it probably doesn't matter, but IDK, seems reasonable to
try to follow it.
Related: #2720
The queries as they previously existed required joining together 12
different tables, which is extremely expensive. Splitting it into four
queries means that the individual queries can effectively use the
indexes we have, and should be very fast no matter how many statuses are
in the database.
Removing the .distinct() call is fine, since we're adding them to a set
in Redis anyways, which will take care of the duplicates.
It's a bit ugly that we now make four separate calls to Redis (this
might result in things being slightly slower in cases where there are an
extremely small number of statuses), but doing things differently would
result in significantly more surgery to the existing code, so I've opted
to avoid that for the moment.
Fixes: #2725
This splits HomeStream.get_audience into two separate database queries,
in order to more effectively take advantage of the indexes we have.
Combining the user ID query and the user following query means that
Postgres isn't able to use the index we have on the userfollows table.
The query planner claims that the userfollows query should be about 20
times faster than it was previously, and the id query should take a
negligible amount of time, since it's selecting a single item by primary
key.
We don't need to worry about duplicates, since there is a constraint
preventing a user from following themself.
Fixes: #2720
Anywhere we have a user object, we can easily get the user ID in the
caller, and this will allow us more flexibility in the future to
implement optimizations that involve knowing a user ID without querying
the database for the user object.
Since we don't use the results of our Celery tasks (all of them return
None implicitly), it's prudent to set the ignore_result flag, for a
potential performance improvement. See the Celery docs for details [1].
We could do this with the global CELERY_IGNORE_RESULT setting, but it
offers more flexibility if we want to use task results in the future to
set it on a per-task basis.
[1]: https://docs.celeryq.dev/en/stable/userguide/tasks.html#ignore-results-you-don-t-want
Previously, every time a status was saved, a task would start to add it
to people's timelines. This meant there were a ton of duplicate tasks
that were potentially heavy to run. Now, the Status model has a "ready"
field which indicates that it's worth updating the timelines. It
defaults to True, which prevents statuses from accidentally not being
added due to ready state.
The ready state is explicitly set to false in the view, which is the
source of most of the noise for that task.
Retains 'direct' messages at the top of the logic tree to make it easier to understand.
In practice because direct messages are excluded from feeds anyway, this doesn't seem to make much difference, but it's easier to read.
This is in response to #1870
Users should not see links to posts they are not allowed to see, in their feed. The main question is how to stop that happening.
This commit hides all replies to posts if the original post was "followers only" and the user is not a follower of the original poster. The privacy of the reply is not considered relevant (except "direct").
I believe this is the cleanest way to deal with the problem, as it avoids orphaned replies and confusing 404s, and a reply without access to the context of the original post is not particularly useful to anyone. This also feels like it respects the wishes of the original poster more accurately, as it does not draw attention from non-followers to the original followers-only post.
A less draconian approach might be to remove the link to the original status in the feed interface, however that simply leads to confusion of another kind since it will make the interface inconsistent.
This commit does not change any ActivityPub behaviour - it only affects the Bookwyrm user feeds. This means orphaned posts may be sent to external apps like Mastodon.