In most cases, we want to return back to where we came from after
performing an action. It's not safe to return to an arbitrary referer,
so this streamlines using the util validator to verify the redirect and
fall back on regular redirect params if the referer is outside our
domain.
Many of these environment variables were probably not actually usable,
since they would be strings if set in the env file. Using the
typecasting functions fixes this, and generally shows the intention of
the code more clearly.
Splitting this into five separate queries avoids the large join that
prevents us from using indexes, and requires materializing to disk.
Fixes: #2157 (hopefully)
By default, Django doesn't run any context processors for server errors,
to make the error path as simple as possible. However, this has the
downside that our template does not load correctly. To fix this, I added
a custom 500 error handler, which will run the context processor.
Fixes: #2736
The queries as they previously existed required joining together 12
different tables, which is extremely expensive. Splitting it into four
queries means that the individual queries can effectively use the
indexes we have, and should be very fast no matter how many statuses are
in the database.
Removing the .distinct() call is fine, since we're adding them to a set
in Redis anyways, which will take care of the duplicates.
It's a bit ugly that we now make four separate calls to Redis (this
might result in things being slightly slower in cases where there are an
extremely small number of statuses), but doing things differently would
result in significantly more surgery to the existing code, so I've opted
to avoid that for the moment.
Fixes: #2725
This splits HomeStream.get_audience into two separate database queries,
in order to more effectively take advantage of the indexes we have.
Combining the user ID query and the user following query means that
Postgres isn't able to use the index we have on the userfollows table.
The query planner claims that the userfollows query should be about 20
times faster than it was previously, and the id query should take a
negligible amount of time, since it's selecting a single item by primary
key.
We don't need to worry about duplicates, since there is a constraint
preventing a user from following themself.
Fixes: #2720
Anywhere we have a user object, we can easily get the user ID in the
caller, and this will allow us more flexibility in the future to
implement optimizations that involve knowing a user ID without querying
the database for the user object.
The idea behind a streaming CSV export was to reduce the amount of
memory used, by avoiding building the entire CSV file in memory before
sending it to the client. However, it didn't work out this way in
practice: the query objects that were created to represent each line
caused Postgres to generate a very large (~200MB on bookwyrm.social)
temp file, not to mention the memory being used by the Query object
likely being similar to, if not larger than that used by the finalized
CSV row.
While we should in the long term run our CSV exports as a Celery task,
this change should allow CSV exports to work on large servers without
causing disk-space problems.
Fixes: #2157
get_data can return exceptions other than ConnectorException, and when
it does, we want to simply not show the update section, rather than
crashing.
Related: #2717
Since we don't use the results of our Celery tasks (all of them return
None implicitly), it's prudent to set the ignore_result flag, for a
potential performance improvement. See the Celery docs for details [1].
We could do this with the global CELERY_IGNORE_RESULT setting, but it
offers more flexibility if we want to use task results in the future to
set it on a per-task basis.
[1]: https://docs.celeryq.dev/en/stable/userguide/tasks.html#ignore-results-you-don-t-want
The existing polling code had a few problems:
* It started the timer for a new request when the first request was
sent, rather than when a response was received.
* It increased the delay regardless of whether the response was a
success or a failure.
This commit changes it to a more standard exponential backoff system,
where it starts with a 5 minute ± 30 second delay, and uses that same
delay until it hits an error, at which point the delay is increased by
10%. Once it receives a successful response again, the delay is reset to
the default.
I suspect this should be nicer on the server, since it avoids the
initial sending of many requests. After about half an hour of leaving
the page open, the request rate for this new code will be higher than
that of the old code, so it's possible that this may cause problems, but
I think that a five-minute request frequency should be pretty reasonable.
We should store hashtags case-sensitive, but ensures that an existing
hashtag with different case are found and re-used. for example,
an existing #BookWyrm hashtag will be found and used even if the
status content is using #bookwyrm.
Since the status content already contains rendered HTML when we receive an
ActivityPub inbox message it contains links to the mentioned hashtags on the
originating instance.
To fix this on the receiving instance we need to post-process the status content
after successfully storing the status and its many-to-many fields (the one we're
is the `mention_hashtags`). Post-processing means that we run a regex against the
content to find the anchor tags linking to the originating hashtag and replace the
`href` attribute with the URL to the hashtag page on the receiving (local) instance.
This ensures that when an existing hashtag comes in through ActivityPub federation,
it correctly finds the local one, instead of creating duplicate hashtags.