Commit graph

117 commits

Author SHA1 Message Date
Bart Schuurmans e1c54b2933 Remove optimizations with adverse effects
`if not audience` actually causes the entire query to be evaluated, before .values_list() is called.
2024-04-04 13:47:56 +02:00
Bart Schuurmans 439cb3ccaa Remove unnecessary conversions between list and set 2024-04-04 13:15:31 +02:00
Bart Schuurmans 321397a349 Specify which column DISTINCT should apply to 2024-04-03 21:28:22 +02:00
Hugh Rundle 5ed1441ddb
fix illegal values in redis jobs
1. populate_streams_get_audience

This tries to set status_reply_parent_privacy as None if there is no status.reply_parent, but None is not a valid value for privacy.
This doesn't appear to be  breaking anything but does result in a lot of error messages in the logs.
I have set this to equal the original status.privacy - this won't realy have any effect since it only happens when there is no parent,
however we could set this to "direct" if we want to be highly cautious.

2. rerank_user_task

Again, this doesn't seem to caused major issues, but is throwing errors if the user in question no longer exists for some reason.
This commit checks whether 'user' exists before attempting to rerank.
2023-08-19 08:34:03 +10:00
Mouse Reeve f6fba19ac4 Only trigger add_status_task when status is first created
I think the reason I didn't do this initially was so that related users
and books, which are added necessarily after the model instance is
crated, will be part of the object when the task runs, but I have
investigated this and because of the transaction.atomic statement in the
to_model method in bookwyrm/activitypub/base_activity.py and in the
status view (added in this commit), this is not an issue.
2023-07-31 17:23:57 -07:00
Mouse Reeve 71dc05f894 Reduce activity in streams queue from boost tasks 2023-07-30 13:07:12 -07:00
Wesley Aptekar-Cassels 3e78e398c0 Switch from priority queues to function-based queues
Fixes: #2907
2023-07-20 12:25:30 -04:00
Wesley Aptekar-Cassels 097cd3ed72 Optimize get_audience by only fetching IDs
Looking at the tracing data from this function in prod, only ~500ms is
spent in the database. My best guess for the rest of the time is
transferring and creating the user objects, which we don't use, since we
simply need the ID.
2023-04-28 12:51:44 -04:00
Wesley Aptekar-Cassels 1048638e30 Stop ignoring task results
This is essentially a revert of 9cbff312a. The commit was at the advice
of the Celery docs for optimization, but I've since decided that the
downsides in terms of making things harder to debug (it makes Flower
nearly useless, for instance) are bigger than the upsides in performance
gain (which seem extremely small in practice, given how long our tasks
take, and the number of tasks we have).
2023-04-07 21:51:44 -04:00
Wesley Aptekar-Cassels 07b50a1453 Optimize get_audience
This avoids filtering for the user that made the post in the same query
as we use for other things, which should allow for better use of indices
in all cases. Previously, #2723 did some work on this that only worked
for some cases in HomeStream, but this code should work for all cases.

Related: #2720
2023-04-07 10:38:14 -04:00
Wesley Aptekar-Cassels 77264493eb Override get_audience instead of _get_audience in LocalStream
I suspect this will make some future work simpler.
2023-04-07 10:37:02 -04:00
Mouse Reeve 5272786fbb
Merge pull request #2779 from WesleyAC/get_audience_more_telemetry
Add more information to get_audience telemetry
2023-04-07 07:12:41 -07:00
Mouse Reeve 4e3513bd41
Merge pull request #2784 from WesleyAC/add-status-cache-get-audience
Only call get_audience once in add_status
2023-04-07 06:43:04 -07:00
Mouse Reeve 10f53d9809
Merge branch 'main' into get_audience_more_telemetry 2023-04-07 06:30:59 -07:00
Wesley Aptekar-Cassels 776c5526c8 Remove ActivityStream.get_stores_for_object
I think this makes the code somewhat more understandable.
2023-04-05 23:20:47 -04:00
Wesley Aptekar-Cassels 7a93b5c315 Only call get_audience once in add_status
This is by far the most expensive part of this task, so this should
double the speed in the increment_unread case.

Related: #2720
2023-04-05 22:11:49 -04:00
Wesley Aptekar-Cassels 78607a0c3e Remove get_stores_for_object abstract method
The implementations still have and use this, we've just removed this
concept from the RedisStore abstraction, which simplifies things
somewhat.
2023-04-05 22:07:38 -04:00
Wesley Aptekar-Cassels 68c6a9e748 Rename remove_object_from_related_stores
This makes the stores argument required, making it simpler to change the
code.
2023-04-05 22:06:09 -04:00
Wesley Aptekar-Cassels 8053f49acc Always pass stores to remove_object_from_related_stores 2023-04-05 21:48:24 -04:00
Wesley Aptekar-Cassels 93bd66ad3e Refactor to delete add_object_to_related_stores
This is working towards some optimizations.
2023-04-05 21:48:22 -04:00
Josh Soref 94c573b469 spelling: appear
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2023-04-04 20:02:47 -04:00
Wesley Aptekar-Cassels 118b5bfda7 Add more information to get_audience telemetry
This will assist with debugging #2720, by letting us see which kinds of
requests take the longest.
2023-04-04 14:23:15 -04:00
Wesley Aptekar-Cassels 7efbdb1865 Add more detailed telemetry for get_audience
This is still slow in some cases, despite #2723, so this information
should give useful data about how it could be optimized more.

This also adds some abstraction around getting the tracer, just to
follow the advice in the OpenTelemetry documentation not to use __name__
directly to set the tracer name. The advice is ignored in most of their
examples, so it probably doesn't matter, but IDK, seems reasonable to
try to follow it.

Related: #2720
2023-03-20 20:51:20 -04:00
Mouse Reeve 6345beb90d
Merge pull request #2714 from WesleyAC/celery-ignore-results
Ignore Celery task results
2023-03-12 16:26:20 -07:00
Mouse Reeve c28d523e6f
Merge branch 'main' into get-audience-perf 2023-03-12 15:40:53 -07:00
Wesley Aptekar-Cassels 2a5f722f6e Optimize add/remove book statuses task queries
The queries as they previously existed required joining together 12
different tables, which is extremely expensive. Splitting it into four
queries means that the individual queries can effectively use the
indexes we have, and should be very fast no matter how many statuses are
in the database.

Removing the .distinct() call is fine, since we're adding them to a set
in Redis anyways, which will take care of the duplicates.

It's a bit ugly that we now make four separate calls to Redis (this
might result in things being slightly slower in cases where there are an
extremely small number of statuses), but doing things differently would
result in significantly more surgery to the existing code, so I've opted
to avoid that for the moment.

Fixes: #2725
2023-03-09 15:26:03 -05:00
Wesley Aptekar-Cassels 56243f6529 Optimize HomeStream.get_audience
This splits HomeStream.get_audience into two separate database queries,
in order to more effectively take advantage of the indexes we have.
Combining the user ID query and the user following query means that
Postgres isn't able to use the index we have on the userfollows table.

The query planner claims that the userfollows query should be about 20
times faster than it was previously, and the id query should take a
negligible amount of time, since it's selecting a single item by primary
key.

We don't need to worry about duplicates, since there is a constraint
preventing a user from following themself.

Fixes: #2720
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels 23698dafe5 Change get_audience to return list of user IDs
This will make it simpler to implement various optimizations.
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels 41e14bdfaf Change unread_by_status_type_id to take user ID
Same reason as in prior commit.
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels 653e8ee81b Change unread_id to take user ID
Same reason as described in the prior commit.
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels 5446869c38 Change stream_id to take user ID
Anywhere we have a user object, we can easily get the user ID in the
caller, and this will allow us more flexibility in the future to
implement optimizations that involve knowing a user ID without querying
the database for the user object.
2023-03-09 00:50:16 -05:00
Wesley Aptekar-Cassels 9cbff312a5 Ignore Celery task results
Since we don't use the results of our Celery tasks (all of them return
None implicitly), it's prudent to set the ignore_result flag, for a
potential performance improvement. See the Celery docs for details [1].

We could do this with the global CELERY_IGNORE_RESULT setting, but it
offers more flexibility if we want to use task results in the future to
set it on a per-task basis.

[1]: https://docs.celeryq.dev/en/stable/userguide/tasks.html#ignore-results-you-don-t-want
2023-03-08 02:12:13 -05:00
Mouse Reeve 0354eb9828 Don't add imported reviews to timelines
Generally they're so backdated that they don't add, and they put too
much load on the instance.
2022-12-16 14:11:15 -08:00
Mouse Reeve 317cf5fcf5 Generate fewer add_status_tasks
Previously, every time a status was saved, a task would start to add it
to people's timelines. This meant there were a ton of duplicate tasks
that were potentially heavy to run. Now, the Status model has a "ready"
field which indicates that it's worth updating the timelines. It
defaults to True, which prevents statuses from accidentally not being
added due to ready state.

The ready state is explicitly set to false in the view, which is the
source of most of the noise for that task.
2022-11-15 14:14:32 -08:00
Hugh Rundle cc97c52d12 make get_audience logic clearer
Retains 'direct' messages at the top of the logic tree to make it easier to understand.
In practice because direct messages are excluded from feeds anyway, this doesn't seem to make much difference, but it's easier to read.
2022-08-21 09:33:43 +10:00
Hugh Rundle 8d593e4498 hide replies to posts user cannot see
This is in response to #1870

Users should not see links to posts they are not allowed to see, in their feed. The main question is how to stop that happening.
This commit hides all replies to posts if the original post was "followers only" and the user is not a follower of the original poster. The privacy of the reply is not considered relevant (except "direct").

I believe this is the cleanest way to deal with the problem, as it avoids orphaned replies and confusing 404s, and a reply without access to the context of the original post is not particularly useful to anyone. This also feels like it respects the wishes of the original poster more accurately, as it does not draw attention from non-followers to the original followers-only post.

A less draconian approach might be to remove the link to the original status in the feed interface, however that simply leads to confusion of another kind since it will make the interface inconsistent.

This commit does not change any ActivityPub behaviour - it only affects the Bookwyrm user feeds. This means orphaned posts may be sent to external apps like Mastodon.
2022-08-14 14:17:10 +10:00
Mouse Reeve 876d9c2695 Fixes how backdated statuses are prioritized 2022-05-16 09:24:01 -07:00
Mouse Reeve 23e498879e Fixes account create tasks 2022-01-04 14:17:14 -08:00
Joachim db5e7a886a Handle count of notifications banner 2021-11-24 19:00:30 +01:00
Mouse Reeve f71ef286b6 Updates mocks 2021-11-12 08:55:47 -08:00
Mouse Reeve 3190ef4346 Deprioritize adding old statuses to timelines 2021-11-11 19:19:23 -08:00
Mouse Reeve 2307cb2227 Filter followers only in place
It's only used in one spot
2021-10-06 11:12:03 -07:00
Mouse Reeve 97cc129478 Updates calls to privacy_filter 2021-10-06 10:37:09 -07:00
Mouse Reeve 321949f2fa Lightly updates tests 2021-10-04 09:47:33 -07:00
Mouse Reeve 0798ba028f Fixes unblock signal 2021-10-03 11:51:17 -07:00
Mouse Reeve ea303fb285 Updating string format synatx part 3 2021-09-20 16:45:26 -07:00
Mouse Reeve 08f6a97653 Python formatting 2021-09-18 11:33:43 -07:00
Mouse Reeve 377a4e1ef1 Updating string format syntax part 1 2021-09-17 21:39:18 -07:00
Mouse Reeve c20f6c21ae Fixes date formats 2021-09-11 11:37:10 -07:00
Mouse Reeve cd9fe70dbc Don't increment unread counts on csv import statuses 2021-09-11 10:26:33 -07:00