bookwyrm

mirror of https://github.com/bookwyrm-social/bookwyrm.git synced 2024-06-01 07:08:16 +00:00

Author	SHA1	Message	Date
Bart Schuurmans	e1c54b2933	Remove optimizations with adverse effects `if not audience` actually causes the entire query to be evaluated, before .values_list() is called.	2024-04-04 13:47:56 +02:00
Bart Schuurmans	439cb3ccaa	Remove unnecessary conversions between list and set	2024-04-04 13:15:31 +02:00
Bart Schuurmans	321397a349	Specify which column DISTINCT should apply to	2024-04-03 21:28:22 +02:00
Hugh Rundle	5ed1441ddb	fix illegal values in redis jobs 1. populate_streams_get_audience This tries to set status_reply_parent_privacy as None if there is no status.reply_parent, but None is not a valid value for privacy. This doesn't appear to be breaking anything but does result in a lot of error messages in the logs. I have set this to equal the original status.privacy - this won't realy have any effect since it only happens when there is no parent, however we could set this to "direct" if we want to be highly cautious. 2. rerank_user_task Again, this doesn't seem to caused major issues, but is throwing errors if the user in question no longer exists for some reason. This commit checks whether 'user' exists before attempting to rerank.	2023-08-19 08:34:03 +10:00
Mouse Reeve	f6fba19ac4	Only trigger add_status_task when status is first created I think the reason I didn't do this initially was so that related users and books, which are added necessarily after the model instance is crated, will be part of the object when the task runs, but I have investigated this and because of the transaction.atomic statement in the to_model method in bookwyrm/activitypub/base_activity.py and in the status view (added in this commit), this is not an issue.	2023-07-31 17:23:57 -07:00
Mouse Reeve	71dc05f894	Reduce activity in streams queue from boost tasks	2023-07-30 13:07:12 -07:00
Wesley Aptekar-Cassels	3e78e398c0	Switch from priority queues to function-based queues Fixes: #2907	2023-07-20 12:25:30 -04:00
Wesley Aptekar-Cassels	097cd3ed72	Optimize get_audience by only fetching IDs Looking at the tracing data from this function in prod, only ~500ms is spent in the database. My best guess for the rest of the time is transferring and creating the user objects, which we don't use, since we simply need the ID.	2023-04-28 12:51:44 -04:00
Wesley Aptekar-Cassels	1048638e30	Stop ignoring task results This is essentially a revert of `9cbff312a`. The commit was at the advice of the Celery docs for optimization, but I've since decided that the downsides in terms of making things harder to debug (it makes Flower nearly useless, for instance) are bigger than the upsides in performance gain (which seem extremely small in practice, given how long our tasks take, and the number of tasks we have).	2023-04-07 21:51:44 -04:00
Wesley Aptekar-Cassels	07b50a1453	Optimize get_audience This avoids filtering for the user that made the post in the same query as we use for other things, which should allow for better use of indices in all cases. Previously, #2723 did some work on this that only worked for some cases in HomeStream, but this code should work for all cases. Related: #2720	2023-04-07 10:38:14 -04:00
Wesley Aptekar-Cassels	77264493eb	Override get_audience instead of _get_audience in LocalStream I suspect this will make some future work simpler.	2023-04-07 10:37:02 -04:00
Mouse Reeve	5272786fbb	Merge pull request #2779 from WesleyAC/get_audience_more_telemetry Add more information to get_audience telemetry	2023-04-07 07:12:41 -07:00
Mouse Reeve	4e3513bd41	Merge pull request #2784 from WesleyAC/add-status-cache-get-audience Only call get_audience once in add_status	2023-04-07 06:43:04 -07:00
Mouse Reeve	10f53d9809	Merge branch 'main' into get_audience_more_telemetry	2023-04-07 06:30:59 -07:00
Wesley Aptekar-Cassels	776c5526c8	Remove ActivityStream.get_stores_for_object I think this makes the code somewhat more understandable.	2023-04-05 23:20:47 -04:00
Wesley Aptekar-Cassels	7a93b5c315	Only call get_audience once in add_status This is by far the most expensive part of this task, so this should double the speed in the increment_unread case. Related: #2720	2023-04-05 22:11:49 -04:00
Wesley Aptekar-Cassels	78607a0c3e	Remove get_stores_for_object abstract method The implementations still have and use this, we've just removed this concept from the RedisStore abstraction, which simplifies things somewhat.	2023-04-05 22:07:38 -04:00
Wesley Aptekar-Cassels	68c6a9e748	Rename remove_object_from_related_stores This makes the stores argument required, making it simpler to change the code.	2023-04-05 22:06:09 -04:00
Wesley Aptekar-Cassels	8053f49acc	Always pass stores to remove_object_from_related_stores	2023-04-05 21:48:24 -04:00
Wesley Aptekar-Cassels	93bd66ad3e	Refactor to delete add_object_to_related_stores This is working towards some optimizations.	2023-04-05 21:48:22 -04:00
Josh Soref	94c573b469	spelling: appear Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2023-04-04 20:02:47 -04:00
Wesley Aptekar-Cassels	118b5bfda7	Add more information to get_audience telemetry This will assist with debugging #2720, by letting us see which kinds of requests take the longest.	2023-04-04 14:23:15 -04:00
Wesley Aptekar-Cassels	7efbdb1865	Add more detailed telemetry for get_audience This is still slow in some cases, despite #2723, so this information should give useful data about how it could be optimized more. This also adds some abstraction around getting the tracer, just to follow the advice in the OpenTelemetry documentation not to use __name__ directly to set the tracer name. The advice is ignored in most of their examples, so it probably doesn't matter, but IDK, seems reasonable to try to follow it. Related: #2720	2023-03-20 20:51:20 -04:00
Mouse Reeve	6345beb90d	Merge pull request #2714 from WesleyAC/celery-ignore-results Ignore Celery task results	2023-03-12 16:26:20 -07:00
Mouse Reeve	c28d523e6f	Merge branch 'main' into get-audience-perf	2023-03-12 15:40:53 -07:00
Wesley Aptekar-Cassels	2a5f722f6e	Optimize add/remove book statuses task queries The queries as they previously existed required joining together 12 different tables, which is extremely expensive. Splitting it into four queries means that the individual queries can effectively use the indexes we have, and should be very fast no matter how many statuses are in the database. Removing the .distinct() call is fine, since we're adding them to a set in Redis anyways, which will take care of the duplicates. It's a bit ugly that we now make four separate calls to Redis (this might result in things being slightly slower in cases where there are an extremely small number of statuses), but doing things differently would result in significantly more surgery to the existing code, so I've opted to avoid that for the moment. Fixes: #2725	2023-03-09 15:26:03 -05:00
Wesley Aptekar-Cassels	56243f6529	Optimize HomeStream.get_audience This splits HomeStream.get_audience into two separate database queries, in order to more effectively take advantage of the indexes we have. Combining the user ID query and the user following query means that Postgres isn't able to use the index we have on the userfollows table. The query planner claims that the userfollows query should be about 20 times faster than it was previously, and the id query should take a negligible amount of time, since it's selecting a single item by primary key. We don't need to worry about duplicates, since there is a constraint preventing a user from following themself. Fixes: #2720	2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels	23698dafe5	Change get_audience to return list of user IDs This will make it simpler to implement various optimizations.	2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels	41e14bdfaf	Change unread_by_status_type_id to take user ID Same reason as in prior commit.	2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels	653e8ee81b	Change unread_id to take user ID Same reason as described in the prior commit.	2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels	5446869c38	Change stream_id to take user ID Anywhere we have a user object, we can easily get the user ID in the caller, and this will allow us more flexibility in the future to implement optimizations that involve knowing a user ID without querying the database for the user object.	2023-03-09 00:50:16 -05:00
Wesley Aptekar-Cassels	9cbff312a5	Ignore Celery task results Since we don't use the results of our Celery tasks (all of them return None implicitly), it's prudent to set the ignore_result flag, for a potential performance improvement. See the Celery docs for details [1]. We could do this with the global CELERY_IGNORE_RESULT setting, but it offers more flexibility if we want to use task results in the future to set it on a per-task basis. [1]: https://docs.celeryq.dev/en/stable/userguide/tasks.html#ignore-results-you-don-t-want	2023-03-08 02:12:13 -05:00
Mouse Reeve	0354eb9828	Don't add imported reviews to timelines Generally they're so backdated that they don't add, and they put too much load on the instance.	2022-12-16 14:11:15 -08:00
Mouse Reeve	317cf5fcf5	Generate fewer add_status_tasks Previously, every time a status was saved, a task would start to add it to people's timelines. This meant there were a ton of duplicate tasks that were potentially heavy to run. Now, the Status model has a "ready" field which indicates that it's worth updating the timelines. It defaults to True, which prevents statuses from accidentally not being added due to ready state. The ready state is explicitly set to false in the view, which is the source of most of the noise for that task.	2022-11-15 14:14:32 -08:00
Hugh Rundle	cc97c52d12	make get_audience logic clearer Retains 'direct' messages at the top of the logic tree to make it easier to understand. In practice because direct messages are excluded from feeds anyway, this doesn't seem to make much difference, but it's easier to read.	2022-08-21 09:33:43 +10:00
Hugh Rundle	8d593e4498	hide replies to posts user cannot see This is in response to #1870 Users should not see links to posts they are not allowed to see, in their feed. The main question is how to stop that happening. This commit hides all replies to posts if the original post was "followers only" and the user is not a follower of the original poster. The privacy of the reply is not considered relevant (except "direct"). I believe this is the cleanest way to deal with the problem, as it avoids orphaned replies and confusing 404s, and a reply without access to the context of the original post is not particularly useful to anyone. This also feels like it respects the wishes of the original poster more accurately, as it does not draw attention from non-followers to the original followers-only post. A less draconian approach might be to remove the link to the original status in the feed interface, however that simply leads to confusion of another kind since it will make the interface inconsistent. This commit does not change any ActivityPub behaviour - it only affects the Bookwyrm user feeds. This means orphaned posts may be sent to external apps like Mastodon.	2022-08-14 14:17:10 +10:00
Mouse Reeve	876d9c2695	Fixes how backdated statuses are prioritized	2022-05-16 09:24:01 -07:00
Mouse Reeve	23e498879e	Fixes account create tasks	2022-01-04 14:17:14 -08:00
Joachim	db5e7a886a	Handle count of notifications banner	2021-11-24 19:00:30 +01:00
Mouse Reeve	f71ef286b6	Updates mocks	2021-11-12 08:55:47 -08:00
Mouse Reeve	3190ef4346	Deprioritize adding old statuses to timelines	2021-11-11 19:19:23 -08:00
Mouse Reeve	2307cb2227	Filter followers only in place It's only used in one spot	2021-10-06 11:12:03 -07:00
Mouse Reeve	97cc129478	Updates calls to privacy_filter	2021-10-06 10:37:09 -07:00
Mouse Reeve	321949f2fa	Lightly updates tests	2021-10-04 09:47:33 -07:00
Mouse Reeve	0798ba028f	Fixes unblock signal	2021-10-03 11:51:17 -07:00
Mouse Reeve	ea303fb285	Updating string format synatx part 3	2021-09-20 16:45:26 -07:00
Mouse Reeve	08f6a97653	Python formatting	2021-09-18 11:33:43 -07:00
Mouse Reeve	377a4e1ef1	Updating string format syntax part 1	2021-09-17 21:39:18 -07:00
Mouse Reeve	c20f6c21ae	Fixes date formats	2021-09-11 11:37:10 -07:00
Mouse Reeve	cd9fe70dbc	Don't increment unread counts on csv import statuses	2021-09-11 10:26:33 -07:00

1 2 3

117 commits