Commit graph

9032 commits

Author SHA1 Message Date
Wesley Aptekar-Cassels
60fee54da9 Optimize CSV export query
Splitting this into five separate queries avoids the large join that
prevents us from using indexes and requires materializing to disk.

Fixes: #2157 (hopefully)
2023-03-13 15:45:21 -04:00
Mouse Reeve
ded3f469ef
Merge pull request #2738 from bookwyrm-social/update-version
Update version number and js cachebuster
2023-03-13 08:16:24 -07:00
Mouse Reeve
177131f53f
Merge pull request #2739 from bookwyrm-social/locales
Updates locales
2023-03-13 08:16:12 -07:00
Mouse Reeve
ae164ee6e1 Updates locales 2023-03-13 07:56:25 -07:00
Mouse Reeve
9c1aaadab3 Update version number and js cachebuster 2023-03-13 07:52:28 -07:00
Mouse Reeve
a73a461867
Merge pull request #2737 from WesleyAC/fix-500-page-css
Use context processor for 500 page
2023-03-13 07:43:29 -07:00
Wesley Aptekar-Cassels
0b9e4d617e Use context processor for 500 page
By default, Django doesn't run any context processors for server errors,
to make the error path as simple as possible. However, this has the
downside that our template does not load correctly. To fix this, I added
a custom 500 error handler, which will run the context processor.

Fixes: #2736
2023-03-13 03:47:23 -04:00
Mouse Reeve
cca20f4834
Merge pull request #2735 from bookwyrm-social/migration
Adds merge migration
2023-03-12 17:34:17 -07:00
Mouse Reeve
7ffe5b9440 Adds merge migration 2023-03-12 16:43:06 -07:00
Mouse Reeve
12af5992a3
Merge pull request #2524 from chdorner/feature/tag-support
Initial hashtag support
2023-03-12 16:37:39 -07:00
Mouse Reeve
48889ee6c4
Merge pull request #2695 from chdorner/book-edit-form-validation-notification
Show notification banner on top of form when book failed to update
2023-03-12 16:33:34 -07:00
Mouse Reeve
2e7eb0f3ce
Merge pull request #2702 from Ryuno-Ki/lazyload-images
Add attributes to images to hint async load
2023-03-12 16:31:27 -07:00
Mouse Reeve
d253a61f02
Merge pull request #2708 from WesleyAC/portable-hashbangs
Use more portable hashbang for dev scripts.
2023-03-12 16:29:10 -07:00
Mouse Reeve
863ec1602a
Merge pull request #2710 from WesleyAC/celery-env-vars
Add env vars for celery concurrency and time limit
2023-03-12 16:27:02 -07:00
Mouse Reeve
6345beb90d
Merge pull request #2714 from WesleyAC/celery-ignore-results
Ignore Celery task results
2023-03-12 16:26:20 -07:00
Mouse Reeve
84b8a5c433
Merge pull request #2713 from WesleyAC/buffer-csv-export
Change CSV export to buffer instead of streaming
2023-03-12 16:17:53 -07:00
Mouse Reeve
d17190fae3
Merge pull request #2718 from WesleyAC/broaden-dashboard-http-except
Broaden except section for HTTP request in dashboard
2023-03-12 16:10:01 -07:00
Mouse Reeve
600340771a
Merge pull request #2723 from WesleyAC/get-audience-perf
Improve `HomeStream.get_audience` performance
2023-03-12 16:08:54 -07:00
Mouse Reeve
352ba972c5
Merge pull request #2724 from WesleyAC/fix-bw-dev-dbshell
Fix dbshell command
2023-03-12 15:45:45 -07:00
Mouse Reeve
c28d523e6f
Merge branch 'main' into get-audience-perf 2023-03-12 15:40:53 -07:00
Mouse Reeve
efe3cb9461
Merge pull request #2726 from WesleyAC/optimize-add-remove-book-statuses-task
Optimize add/remove book statuses task queries
2023-03-12 15:36:27 -07:00
Wesley Aptekar-Cassels
2a5f722f6e Optimize add/remove book statuses task queries
The queries as they previously existed required joining together 12
different tables, which is extremely expensive. Splitting it into four
queries means that the individual queries can effectively use the
indexes we have, and should be very fast no matter how many statuses are
in the database.

Removing the .distinct() call is fine, since we're adding them to a set
in Redis anyways, which will take care of the duplicates.

It's a bit ugly that we now make four separate calls to Redis (this
might result in things being slightly slower in cases where there are an
extremely small number of statuses), but doing things differently would
result in significantly more surgery to the existing code, so I've opted
to avoid that for the moment.

Fixes: #2725
2023-03-09 15:26:03 -05:00
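The deduplication argument in the commit above can be illustrated with a minimal sketch. It assumes each of the smaller queries returns a list of status IDs; the function name and the in-memory `set` (standing in for the Redis set) are hypothetical and not taken from the BookWyrm code.

```python
def collect_status_ids(*query_results):
    """Merge ID lists from several small queries into one set.

    Hypothetical helper: the set plays the role of the Redis set mentioned
    in the commit message, so duplicates disappear without a DISTINCT.
    """
    ids = set()
    for rows in query_results:
        ids.update(rows)
    return ids


# Four made-up query results with overlapping IDs.
merged = collect_status_ids([1, 2], [2, 3], [3], [])
```

Because set insertion is idempotent, the order of the queries and any overlap between them do not affect the final contents.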
Wesley Aptekar-Cassels
cc610372ca Fix dbshell command
dbshell needs to be run in an already-running container, thus exec rather
than run is the correct docker-compose command.
2023-03-09 02:02:56 -05:00
Wesley Aptekar-Cassels
56243f6529 Optimize HomeStream.get_audience
This splits HomeStream.get_audience into two separate database queries,
in order to more effectively take advantage of the indexes we have.
Combining the user ID query and the user following query means that
Postgres isn't able to use the index we have on the userfollows table.

The query planner claims that the userfollows query should be about 20
times faster than it was previously, and the id query should take a
negligible amount of time, since it's selecting a single item by primary
key.

We don't need to worry about duplicates, since there is a constraint
preventing a user from following themself.

Fixes: #2720
2023-03-09 00:50:24 -05:00
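As a rough illustration of the two-query approach described above, here is a sketch using an in-memory SQLite database in place of Postgres. The table and column names (`users`, `follows`, `follower_id`, `followed_id`) are assumptions made for the example, as is the reading that the audience is the author plus their followers.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with hypothetical table names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL,
            followed_id INTEGER NOT NULL,
            CHECK (follower_id <> followed_id)  -- nobody follows themself
        );
        CREATE INDEX follows_followed ON follows (followed_id);
        INSERT INTO users (id) VALUES (1), (2), (3);
        INSERT INTO follows VALUES (2, 1), (3, 1);
        """
    )
    return conn


def get_audience_ids(conn, author_id):
    """Return the author's ID plus follower IDs, using two simple queries."""
    # Query 1: a single-row lookup by primary key.
    own = conn.execute(
        "SELECT id FROM users WHERE id = ?", (author_id,)
    ).fetchall()
    # Query 2: a lookup that can use the index on the followed column.
    followers = conn.execute(
        "SELECT follower_id FROM follows WHERE followed_id = ?", (author_id,)
    ).fetchall()
    # No deduplication needed: the CHECK constraint rules out self-follows.
    return [row[0] for row in own + followers]
```

Keeping the two lookups separate means each one has a single, simple filter, which is the property the commit message relies on for index use.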
Wesley Aptekar-Cassels
23698dafe5 Change get_audience to return list of user IDs
This will make it simpler to implement various optimizations.
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels
41e14bdfaf Change unread_by_status_type_id to take user ID
Same reason as in prior commit.
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels
653e8ee81b Change unread_id to take user ID
Same reason as described in the prior commit.
2023-03-09 00:50:24 -05:00
Wesley Aptekar-Cassels
5446869c38 Change stream_id to take user ID
Anywhere we have a user object, we can easily get the user ID in the
caller, and this will allow us more flexibility in the future to
implement optimizations that involve knowing a user ID without querying
the database for the user object.
2023-03-09 00:50:16 -05:00
Mouse Reeve
e4edef03c5
Merge pull request #2721 from verymilan/verymilan-patch-1
fix typo in systemd example
2023-03-08 18:39:05 -08:00
Wesley Aptekar-Cassels
50a81bdfdd Change CSV export to buffer instead of streaming
The idea behind a streaming CSV export was to reduce the amount of
memory used, by avoiding building the entire CSV file in memory before
sending it to the client. However, it didn't work out this way in
practice: the query objects that were created to represent each line
caused Postgres to generate a very large (~200MB on bookwyrm.social)
temp file, not to mention the memory being used by the Query object
likely being similar to, if not larger than that used by the finalized
CSV row.

While we should in the long term run our CSV exports as a Celery task,
this change should allow CSV exports to work on large servers without
causing disk-space problems.

Fixes: #2157
2023-03-08 21:37:56 -05:00
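A buffered export of the kind described above might look roughly like the following sketch, which builds the whole CSV in memory with the standard library before returning it. The function name and the sample columns are hypothetical; the real view would wrap the result in an HTTP response.

```python
import csv
import io


def build_csv(header, rows):
    """Write all rows into an in-memory buffer and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data for illustration.
text = build_csv(["title", "author"], [["Dune", "Frank Herbert"]])
```

The trade-off is that the full file lives in memory at once, which the commit message argues is still cheaper than what the streaming version cost in practice.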
Mouse Reeve
5c109a2566
Merge branch 'main' into celery-env-vars 2023-03-08 18:37:03 -08:00
Mouse Reeve
2f737efeff
Merge pull request #2709 from WesleyAC/improve-polling-backoff
Improve polling algorithm
2023-03-08 18:36:19 -08:00
Wesley Aptekar-Cassels
4af4f30cde Broaden except section for HTTP request in dashboard
get_data can return exceptions other than ConnectorException, and when
it does, we want to simply not show the update section, rather than
crashing.

Related: #2717
2023-03-08 21:32:41 -05:00
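The behaviour described above, hiding the update section on any failure instead of crashing, can be sketched as follows. The helper name is hypothetical, and catching `Exception` is an assumption about how broad the real handler is.

```python
def fetch_update_info(get_data):
    """Return the fetched data, or None if the request fails for any reason.

    `get_data` is any zero-argument callable; a None result signals the
    caller to skip rendering the update section.
    """
    try:
        return get_data()
    except Exception:  # deliberately broad: a failed check is not fatal
        return None


def failing_request():
    raise ValueError("unexpected response")
```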
Chris Moultrie
86675ee944
Example Settings and run black 2023-03-08 14:48:04 -05:00
Milan
a6bc53a936
fix typo in systemd example
...which prevented imports from running
2023-03-08 19:58:58 +01:00
Wesley Aptekar-Cassels
9cbff312a5 Ignore Celery task results
Since we don't use the results of our Celery tasks (all of them return
None implicitly), it's prudent to set the ignore_result flag, for a
potential performance improvement. See the Celery docs for details [1].

We could do this with the global CELERY_IGNORE_RESULT setting, but it
offers more flexibility if we want to use task results in the future to
set it on a per-task basis.

[1]: https://docs.celeryq.dev/en/stable/userguide/tasks.html#ignore-results-you-don-t-want
2023-03-08 02:12:13 -05:00
Mouse Reeve
c3109f1238
Merge branch 'main' into improve-polling-backoff 2023-03-07 13:57:00 -08:00
Mouse Reeve
3c57797852
Merge branch 'main' into portable-hashbangs 2023-03-07 13:56:44 -08:00
Mouse Reeve
1350e91971
Merge branch 'main' into book-edit-form-validation-notification 2023-03-07 13:56:23 -08:00
Mouse Reeve
00666c4f52
Merge pull request #2711 from bookwyrm-social/fix/reorder-head-migration
Reorder head migration
2023-03-07 13:56:04 -08:00
Christof Dorner
ee0a89faf2 Reorder head migration 2023-03-07 22:31:58 +01:00
Christof Dorner
bc0b291d36 Show notification banner on top of form when book failed to update 2023-03-07 21:58:12 +01:00
Wesley Aptekar-Cassels
26e34ddffa Add env vars for celery concurrency and time limit 2023-03-07 13:52:02 -05:00
Wesley Aptekar-Cassels
abb5dc857e Use more portable shebang for dev scripts
/bin/bash, while common, is not part of the unix standard, and does not
exist on some operating systems (such as NixOS). /usr/bin/env, on the
other hand, is standardized, and thus should exist on all systems.
2023-03-07 13:39:18 -05:00
Wesley Aptekar-Cassels
43ad3d0c15 Improve polling algorithm
The existing polling code had a few problems:

* It started the timer for a new request when the first request was
  sent, rather than when a response was received.
* It increased the delay regardless of whether the response was a
  success or a failure.

This commit changes it to a more standard exponential backoff system,
where it starts with a 5 minute ± 30 second delay, and uses that same
delay until it hits an error, at which point the delay is increased by
10%. Once it receives a successful response again, the delay is reset to
the default.

I suspect this should be nicer on the server, since it avoids the
initial sending of many requests. After about half an hour of leaving
the page open, the request rate for this new code will be higher than
that of the old code, so it's possible that this may cause problems, but
I think that a five-minute request frequency should be pretty reasonable.
2023-03-07 13:15:52 -05:00
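The delay rule from the commit above can be sketched in a few lines. This is a Python illustration only (the real polling code runs in the browser), and it assumes the jitter is picked once when polling starts; the function names are hypothetical.

```python
import random

BASE_DELAY_S = 5 * 60   # 5 minute default delay
JITTER_S = 30           # plus or minus 30 seconds
ERROR_GROWTH = 1.10     # delay grows by 10% after each error


def initial_delay(rng=random):
    """Pick the default delay once, with jitter (an assumption)."""
    return BASE_DELAY_S + rng.uniform(-JITTER_S, JITTER_S)


def next_delay(current, default, succeeded):
    """Reset to the default after a success, grow by 10% after an error."""
    if succeeded:
        return default
    return current * ERROR_GROWTH
```

In this model the timer for the next request would be started only after a response (or error) arrives, and `next_delay` decides how long that timer runs.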
Mouse Reeve
05a303ea18
Merge pull request #2690 from bookwyrm-social/link-domain-notifications
Create notifications for link domains that need approval
2023-03-07 08:43:14 -08:00
Mouse Reeve
1612217eaa
Merge pull request #2696 from bookwyrm-social/chronological-pagination
Only use chronological pagination sometimes
2023-03-07 08:42:43 -08:00
Christof Dorner
9ca9883e0b Enable finding existing hashtags case-insensitive
We should store hashtags case-sensitively, but ensure that an existing
hashtag with different casing is found and re-used. For example,
an existing #BookWyrm hashtag will be found and used even if the
status content is using #bookwyrm.
2023-03-07 13:16:45 +01:00
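A minimal sketch of the lookup described above, assuming hashtags are kept in a dictionary keyed by their lowercased name. The function and the dictionary are hypothetical stand-ins for the database lookup.

```python
def find_or_create_hashtag(existing, name):
    """Return the stored hashtag matching `name`, ignoring case.

    `existing` maps a lowercased name to the name as first stored, so the
    original casing is preserved while lookups are case-insensitive.
    """
    key = name.lower()
    if key not in existing:
        existing[key] = name
    return existing[key]


tags = {}
first = find_or_create_hashtag(tags, "#BookWyrm")
second = find_or_create_hashtag(tags, "#bookwyrm")
```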
Christof Dorner
f3334b1550 Render hashtag links with data-mention="hashtag" attribute 2023-03-07 13:16:45 +01:00
Christof Dorner
276b255f32 Post-process status.content field to change hashtag URLs
Since the status content already contains rendered HTML when we receive an
ActivityPub inbox message, it contains links to the mentioned hashtags on the
originating instance.

To fix this on the receiving instance we need to post-process the status content
after successfully storing the status and its many-to-many fields (the one we're
interested in is `mention_hashtags`). Post-processing means that we run a regex
against the content to find the anchor tags linking to the originating hashtag and
replace the `href` attribute with the URL to the hashtag page on the receiving
(local) instance.
2023-03-07 13:16:45 +01:00
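The post-processing step described above could be sketched like this. It is a simplification: it matches anchors by their visible `#tag` text rather than by the originating URL, and the function name, the `local_urls` mapping, and the example domains are all hypothetical.

```python
import re

# Matches <a ... href="..." ...>#tag</a>, capturing the other attributes.
HASHTAG_LINK = re.compile(r'<a\b([^>]*?)href="[^"]*"([^>]*)>(#\w+)</a>')


def localize_hashtag_links(content, local_urls):
    """Point hashtag anchors at local URLs, leaving other links untouched.

    `local_urls` maps a lowercased hashtag (including '#') to the URL of
    its page on the receiving instance.
    """
    def swap(match):
        local = local_urls.get(match.group(3).lower())
        if local is None:
            return match.group(0)  # unknown hashtag: keep the original link
        before, after, text = match.groups()
        return f'<a{before}href="{local}"{after}>{text}</a>'

    return HASHTAG_LINK.sub(swap, content)


remote = (
    '<p>Reading <a href="https://remote.example/hashtag/7" '
    'data-mention="hashtag">#BookWyrm</a></p>'
)
fixed = localize_hashtag_links(
    remote, {"#bookwyrm": "https://local.example/hashtag/3"}
)
```

Only the `href` value changes; other attributes such as `data-mention` and the link text are carried over as they were.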