This is essentially a revert of 9cbff312a. That commit followed the
advice of the Celery docs on optimization, but I've since decided that
the downsides for debugging (it makes Flower nearly useless, for
instance) outweigh the performance gains, which seem extremely small in
practice given how long our tasks take and how many tasks we have.
Since we don't use the results of our Celery tasks (all of them return
None implicitly), it's prudent to set the ignore_result flag, for a
potential performance improvement. See the Celery docs for details [1].
We could do this with the global CELERY_IGNORE_RESULT setting, but
setting it on a per-task basis offers more flexibility if we want to
use task results in the future.
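For illustration, per-task it looks like this (the task name here is
hypothetical; the flag itself is standard Celery):

    from celery import shared_task

    # hypothetical task: we never read its return value, so skip
    # writing a result to the backend
    @shared_task(ignore_result=True)
    def update_suggestions(user_id):
        ...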
[1]: https://docs.celeryq.dev/en/stable/userguide/tasks.html#ignore-results-you-don-t-want
ERROR HANDLING FIXES
- use raise_for_status() to pass through the response code
- handle exceptions where no response object is passed through (see the sketch below)
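A sketch of the pattern (not the exact code; get_data is illustrative):

    import logging

    import requests

    logger = logging.getLogger(__name__)

    def get_data(url):
        try:
            response = requests.get(url, timeout=8)
            # surface 4xx/5xx responses with their status code
            response.raise_for_status()
        except requests.exceptions.RequestException as err:
            # ConnectionError and Timeout arrive with no response
            # object attached, so handle the whole family here
            logger.info("request to %s failed: %s", url, err)
            raise
        return response.json()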
INSTANCE ACTOR
- the models.User.objects.create_user function cannot take an ID
- allow instance admins to determine username and email for instance actor in settings.py
These errors in resolve_remote_id aren't really errors; they're routine
problems that we can expect when dealing with the outside world, like a
connection timeout, a server being down, a server being blocked, et
cetera. They're cluttering up the logs and causing unnecessary worry.
- upper-case the ISBN before checking whether it's a number, to account for a trailing 'x'
- check maybe_isbn for the search_identifiers search. Without this we are only searching external connectors, not locally!
ISBNs are always numeric, except when the check digit in an ISBN-10 is a ten, indicated with a capital X.
These changes ensure that ISBNs are always upper-case, so that a lower-case 'x' is not used when searching.
Additionally, some ancient ISBNs have been printed without a leading zero (i.e. they have only 9 characters on the physical book). This change prepends a zero to anything that looks like an ISBN but has only 9 characters.
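As a sketch of the normalization (the helper name is hypothetical):

    def normalize_isbn(isbn):
        # upper-case so an ISBN-10 check digit of 'x' still matches
        isbn = isbn.strip().upper()
        # prepend the missing leading zero on 9-character ISBN-10s
        if len(isbn) == 9 and isbn[:8].isdecimal():
            isbn = "0" + isbn
        return isbn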
These are normal, expected errors, and while we should probably
re-evaluate the connectors in some way, pending that, there's no need
to log them as unexpected errors, which causes confusion and clutters
my error logging.
Loading every edition in one task takes ages, and produces a large task
that clogs up the queue. This will create more, smaller tasks that will
finish more quickly.
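The fan-out is roughly this (task and queue names are hypothetical):

    from celery import shared_task

    @shared_task(queue="low_priority")
    def load_edition(connector_id, edition_key):
        ...  # fetch and save a single edition

    @shared_task(queue="low_priority")
    def load_editions(connector_id, edition_keys):
        # one big slow task becomes many small quick ones
        for key in edition_keys:
            load_edition.delay(connector_id, key)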
Set OpenLibrary search confidence based on the provided result order,
using 1/(list position): the first result has confidence 1, the second
0.5, the third 0.33, et cetera.
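i.e., something like (the function name is hypothetical):

    def add_confidence(results):
        # confidence decays with position in OpenLibrary's result list
        for position, result in enumerate(results, start=1):
            result.confidence = 1 / position  # 1, 0.5, 0.33, ...
        return results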
Since we get all the results quickly now, this aggregates everything
that came back, sorts it by confidence, and returns the
highest-confidence result. The confidences aren't great on free text
search, but conceptually that's how it should work, at least.
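Conceptually (a sketch; the confidence field is as set above):

    def best_result(batches):
        # flatten each connector's batch and keep the top result
        results = [result for batch in batches for result in batch]
        return max(results, key=lambda r: r.confidence, default=None)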
It may make sense to aggregate the search results in all contexts, but
I'll propose that in a separate PR.
The parser was extracting the list of search results from the JSON
object returned by the search endpoint, and the formatter was
converting an individual JSON entry into a SearchResult object. This
merges them into one function, because they are never used separately.
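For an OpenLibrary-style connector the merged method might look like
this (a sketch: 'docs' matches OpenLibrary's payload, and SearchResult
is the project's result class):

    def process_search_response(self, query, data):
        # the parser pulled out data["docs"]; the formatter built a
        # SearchResult per entry; now both happen in one step
        return [
            SearchResult(
                title=doc.get("title"),
                key=doc.get("key"),
                connector=self,
            )
            for doc in data.get("docs", [])
        ]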
Adds logging and error handling for some of the numerous ways a request
could fail (the remote site is down, the url is blocked, etc).
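A sketch of the failure modes now handled (names illustrative):

    import asyncio
    import logging

    import aiohttp

    logger = logging.getLogger(__name__)

    async def get_results(session, url):
        try:
            async with session.get(url) as response:
                if not response.ok:
                    logger.info("bad response from %s: %s", url, response.status)
                    return None
                return await response.json()
        except aiohttp.ClientError as err:
            # remote site down, connection refused, blocked url, ...
            logger.info("connection error for %s: %s", url, err)
        except asyncio.TimeoutError:
            logger.info("timed out waiting on %s", url)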
I also have the results boxes open by default, which makes it more
legible imo.
The database lookup doesn't work during the async process, so this
change loops through the connectors and grabs the formatted urls before
sending them to the async handler.
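Roughly (helper names assumed):

    def prepare_search(query):
        # do the ORM work up front; database lookups don't work
        # from inside the async event loop
        return [
            (connector, connector.get_search_url(query))
            for connector in get_connectors()
        ]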
Instead of having individual search functions that make individual
requests, the connectors are now always searched asynchronously
together. The process_search_response method combines the parse and
format functions, which could probably be merged into one overridable
function.
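The concurrent dispatch is essentially this (a sketch, reusing the
get_results shape from above):

    import asyncio

    import aiohttp

    async def search_all(params, timeout=8):
        # one session, every connector queried concurrently
        client_timeout = aiohttp.ClientTimeout(total=timeout)
        async with aiohttp.ClientSession(timeout=client_timeout) as session:
            tasks = [get_results(session, url) for _, url in params]
            return await asyncio.gather(*tasks)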
The current to-do on this is to remove Inventaire search results that
are below the confidence threshold after search, which used to happen
in the `search` function.
This is the untested first pass at re-arranging remote search to work
in parallel rather than in sequence. It moves a couple of functions
around (raise_not_valid_url, for example, needs to be in
connector_manager.py now to avoid circular imports). It adds a function
to Connector objects that generates a search request (against either
the isbn endpoint or the free text endpoint) based on the query, which
was previously done as part of the search.
I also lowered the timeout to 8 seconds by default.
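The endpoint choice on the Connector looks roughly like this (a sketch;
the method name is assumed, maybe_isbn as above):

    from urllib.parse import quote_plus

    def get_search_url(self, query):
        # ISBN-looking queries hit the isbn endpoint; everything
        # else goes to free text search
        if maybe_isbn(query) and self.isbn_search_url:
            return f"{self.isbn_search_url}{query}"
        return f"{self.search_url}{quote_plus(query)}"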