Commit graph

208 commits

Author SHA1 Message Date
Hugh Rundle 1ee2ff4811 normalise isbn on local book search
- uppercase ISBN before checking it's a number to account for trailing 'x'
- check maybe_isbn for the search_identifiers search. Without this we only search external connectors, not locally!
2022-08-30 20:00:09 +10:00
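A minimal sketch of the normalisation these two bullets describe: strip and uppercase the query first, then check that everything except an optional trailing X is numeric. The helper name `looks_like_isbn` and the 10/13-length rule are assumptions for illustration, not BookWyrm's actual code.

```python
from typing import Optional


def looks_like_isbn(query: str) -> Optional[str]:
    """Hypothetical helper: return a normalised ISBN, or None if the query isn't one."""
    # Trim surrounding whitespace, drop hyphens/spaces, and uppercase so a
    # lower-case check digit 'x' becomes 'X' before the numeric check.
    candidate = query.strip().replace("-", "").replace(" ", "").upper()
    if len(candidate) not in (10, 13):
        return None
    body, last = candidate[:-1], candidate[-1]
    # Only an ISBN-10 may end in 'X'; every other character must be a digit.
    last_ok = last.isdigit() or (last == "X" and len(candidate) == 10)
    if body.isdigit() and last_ok:
        return candidate
    return None
```

With this ordering, `looks_like_isbn(" 0-8044-2957-x ")` returns `"080442957X"`, whereas checking for digits before uppercasing would miss the lower-case `x`.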
Hugh Rundle 18d3d2f85d linting 2022-08-28 17:30:46 +10:00
Hugh Rundle f219851f3a strip leading and trailing spaces from ISBN 2022-08-28 17:28:00 +10:00
Hugh Rundle da5fd32196 normalise isbn searching
ISBNs are always numeric except when the check digit of an ISBN-10 is ten, which is written as a capital X.
These changes ensure that ISBNs are always upper-case so that a lower-case 'x' is not used when searching.

Additionally some ancient ISBNs have been printed without a leading zero (i.e. they only have 9 characters on the physical book). This change prepends a zero if something looks like an ISBN but only has 9 chars.
2022-08-28 11:05:40 +10:00
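The leading-zero rule above could be sketched roughly as follows. The function name `pad_short_isbn` is hypothetical, and treating "looks like an ISBN" as "digits with an optional final X" is an assumption drawn from the commit message.

```python
def pad_short_isbn(raw: str) -> str:
    """Hypothetical helper: normalise an ISBN-like string for searching."""
    # Uppercase so an ISBN-10 check digit of ten is always a capital 'X'.
    isbn = raw.strip().upper()
    body, last = isbn[:-1], isbn[-1:]
    # Nine characters that otherwise look like an ISBN-10 are assumed to be
    # an old number printed without its leading zero, so prepend one.
    if len(isbn) == 9 and body.isdigit() and (last.isdigit() or last == "X"):
        isbn = "0" + isbn
    return isbn
```

For example, `pad_short_isbn("80442957x")` gives `"080442957X"`, while a full ten-character ISBN passes through unchanged.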
Mouse Reeve 5706028656 Log failing to connect as info instead of exception
These are normal, expected errors, and while we should probably
re-evaluate the connectors in some way, pending that, there's no need to
log these as unexpected errors, which causes confusion and clutters my
error logging.
2022-07-11 08:47:18 -07:00
Mouse Reeve 5d363da175 Handle getting edition data as dict or string 2022-07-03 11:05:20 -07:00
Mouse Reeve e7b0a84ded
Merge pull request #2142 from bookwyrm-social/load-data-duration
Split expand book data task into per-edition tasks
2022-06-30 11:47:23 -07:00
Mouse Reeve d149e57494 Split expand book data task into per-edition tasks
Loading every edition in one task takes ages, and produces a large task
that clogs up the queue. This will create more, smaller tasks that will
finish more quickly.
2022-05-31 12:41:57 -07:00
Mouse Reeve 374fdcf467 Use relative list order ranking in openlibrary search
Set OpenLibrary search confidence based on the provided result order,
just using 1/(list index), so the first has rank 1, the second 0.5, the
third 0.33, et cetera.
2022-05-31 10:22:49 -07:00
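The 1/(list index) ranking amounts to trusting the remote service's ordering and assigning each result the reciprocal of its 1-based position. A sketch, with `SearchResult` as a hypothetical stand-in for the connector's result type:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    """Hypothetical stand-in for a connector search result."""
    title: str
    confidence: float = 0.0


def rank_by_position(results: List[SearchResult]) -> List[SearchResult]:
    # Trust the remote ordering: the nth result gets confidence 1/n.
    for position, result in enumerate(results, start=1):
        result.confidence = 1 / position
    return results
```

Three results come back with confidences 1.0, 0.5 and roughly 0.33, matching the sequence in the commit message.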
Mouse Reeve c3b35760a2 Updates test mocks for remote search 2022-05-31 09:37:54 -07:00
Mouse Reeve 969db13ff2 Safely return None in remote search return_first 2022-05-31 08:49:23 -07:00
Mouse Reeve a053f20961 Re-implements return first option
Since we get all the results quickly now, this aggregates all the
results that came back and sorts them by confidence, and returns the
highest confidence result. The confidences aren't great on free text
search, but conceptually that's how it should work at least.

It may make sense to aggregate the search results in all contexts, but
I'll propose that in a separate PR.
2022-05-31 08:20:59 -07:00
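The aggregate-then-pick behaviour described above, together with the "safely return None" fix from 969db13ff2, could look something like this. The function signature and the `SearchResult` fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    """Hypothetical result type with the fields this sketch needs."""
    title: str
    connector: str
    confidence: float


def return_first(results_by_connector: List[List[SearchResult]]) -> Optional[SearchResult]:
    # Flatten every connector's results into one list.
    combined = [r for results in results_by_connector for r in results]
    if not combined:
        # Nothing came back from any connector: return None instead of raising.
        return None
    # Sort highest confidence first and take the top result.
    combined.sort(key=lambda r: r.confidence, reverse=True)
    return combined[0]
```

Calling `max(combined, key=...)` would return the same top result without a full sort. Sorting is used here only to mirror the commit's description.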
Mouse Reeve 98ed03b6b4 Python formatting and test update 2022-05-30 17:00:34 -07:00
Mouse Reeve 83ee5a756f Filter Inventaire results by confidence 2022-05-30 16:42:37 -07:00
Mouse Reeve af19d728d2 Removes outdated unit tests 2022-05-30 16:16:10 -07:00
Mouse Reeve 87fe984462 Combines search formatter and parser function
The parser was extracting the list of search results from the json
object returned by the search endpoint, and the formatter was converting
an individual json entry into a SearchResult object. This merges
them into one function, because they are never used separately.
2022-05-30 12:52:31 -07:00
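A single parse-and-format function might look like the sketch below. The response shape `{"docs": [{"title": ..., "key": ...}]}`, the function name `parse_search_data`, and the `SearchResult` fields are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class SearchResult:
    """Hypothetical result type."""
    title: str
    key: str


def parse_search_data(data: dict) -> Iterator[SearchResult]:
    # Extracting the list of entries and converting each one into a
    # SearchResult happen in one pass, since the steps are never used separately.
    for entry in data.get("docs", []):
        yield SearchResult(title=entry.get("title", ""), key=entry.get("key", ""))
```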
Mouse Reeve 525e2a591d More error handling
Adds logging and error handling for some of the numerous ways a request
could fail (the remote site is down, the url is blocked, etc).

I also made the results boxes open by default, which makes them more
legible imo.
2022-05-30 12:40:13 -07:00
Mouse Reeve 45f2199c71 Gather and wait on async requests
This sends out the request tasks all at once and then aggregates the
results, instead of just running them one after another asynchronously.
2022-05-30 12:05:22 -07:00
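Sending every request at once and then aggregating the results maps naturally onto `asyncio.gather`. The sketch below also shows a per-request timeout and tolerance of individual failures, echoing the error-handling and 8-second-timeout commits nearby. `fake_request` is a hypothetical stand-in for a real HTTP call, and the connector names are illustrative only.

```python
import asyncio
from typing import Dict, List

SEARCH_TIMEOUT = 8  # seconds; mirrors the default mentioned in this log


async def fake_request(name: str, delay: float) -> List[str]:
    """Hypothetical stand-in for one connector's HTTP search request."""
    await asyncio.sleep(delay)
    if name == "broken":
        raise ConnectionError(f"{name} is down")
    return [f"{name}: result"]


async def search_all(connectors: Dict[str, float], timeout: float = SEARCH_TIMEOUT) -> Dict[str, List[str]]:
    names = list(connectors)
    # Start every request at once, each bounded by the timeout.
    coros = [asyncio.wait_for(fake_request(n, connectors[n]), timeout) for n in names]
    # return_exceptions=True keeps one failure from cancelling the others.
    outcomes = await asyncio.gather(*coros, return_exceptions=True)
    results = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            # Expected failures (site down, timeout) are skipped, not raised.
            continue
        results[name] = outcome
    return results
```

Because the requests run concurrently, total latency is roughly that of the slowest connector (capped by the timeout), not the sum of all of them.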
Mouse Reeve 5e81ec75fb Set request headers in async search get request
Gotta ask for json
2022-05-30 11:19:16 -07:00
Mouse Reeve 9a9cef7766 Verify url before async search
The database lookup doesn't work during the async process, so this change
loops through the connectors and grabs the formatted urls before sending
them to the async handler.
2022-05-30 11:16:05 -07:00
Mouse Reeve 0adda36da7 Remove search endpoints from Connector
Instead of having individual search functions that make individual
requests, the connectors will always be searched asynchronously
together. The process_search_response function combines the parse and
format functions, which could probably be merged into one overridable
function.

The current to-do on this is to remove Inventaire search results that
are below the confidence threshold after search, which used to happen
in the `search` function.
2022-05-30 10:37:24 -07:00
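Dropping low-confidence results after the search, as described in the to-do above, reduces to a simple filter. The threshold value `MIN_CONFIDENCE` is hypothetical (the log does not give one), as is the `SearchResult` type.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off; the real threshold is not stated in the log.
MIN_CONFIDENCE = 0.1


@dataclass
class SearchResult:
    title: str
    confidence: float


def filter_by_confidence(results: List[SearchResult], threshold: float = MIN_CONFIDENCE) -> List[SearchResult]:
    # Keep only results at or above the threshold.
    return [r for r in results if r.confidence >= threshold]
```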
Mouse Reeve 9c03bf782e Make an async request to all search connectors
This is the untested first pass at re-arranging remote search to work in
parallel rather than in sequence. It moves a couple of functions around
(raise_not_valid_url, for example, needs to be in connector_manager.py
now to avoid circular imports). It adds a function to Connector objects
that generates a search URL (for either the isbn endpoint or the free
text endpoint) based on the query, which was previously done as part of
the search.

I also lowered the timeout to 8 seconds by default.
2022-05-30 10:15:22 -07:00
Mouse Reeve 72d6a4ce52 Log info, not exception, for expected errors 2022-03-11 14:55:54 -08:00
Mouse Reeve 39691bed3a Merge branch 'main' into openlibrary-author-fields 2022-02-16 18:06:04 -08:00
Mouse Reeve 3e635f497e Adds some simple url validation 2022-02-03 15:11:01 -08:00
Mouse Reeve 194c69f512 Fixes return values of null responses 2022-02-02 07:09:35 -08:00
Mouse Reeve 754e24812b Check image extensions before saving 2022-02-01 21:18:25 -08:00
Mouse Reeve 9611815b44 Extract wikipedia and inventaire ids 2022-01-30 12:02:18 -08:00
Mouse Reeve 44dad43f36 Load new fields via connector 2022-01-30 11:41:33 -08:00
Mouse Reeve b18c69e186 Make search timeouts configurable 2022-01-07 07:42:05 -08:00
Mouse Reeve 3545085a7d Fixes tests 2021-12-14 14:19:27 -08:00
Mouse Reeve 09f5218f9c Fixes accept header 2021-12-14 13:47:09 -08:00
Mouse Reeve 6e61e4d52c
Merge pull request #1578 from bookwyrm-social/improve-compatibility
Improve federation compatibility with Hubzilla and Zap
2021-12-09 11:06:04 -08:00
Mouse Reeve 02313f40b8 Adds update from inventaire link for books 2021-12-05 13:48:05 -08:00
Mouse Reeve 071da7d4fb Handle various link generation needs 2021-12-05 13:38:15 -08:00
Mouse Reeve 4085714764 Update openlibrary author with ISNI 2021-12-05 13:26:22 -08:00
Mouse Reeve d7e4e6aa1e Adds openlibrary update for book 2021-12-05 13:02:42 -08:00
Mouse Reeve b824841cb3 Adds update logic to connectors 2021-12-05 12:47:27 -08:00
Mouse Reeve 6dd7eebd98 Fixes tests 2021-11-16 10:16:28 -08:00
Mouse Reeve d3e4c7e8d9 Removes change to boolean logic 2021-10-27 10:40:37 -07:00
Mouse Reeve 07446fa7d2 Adds more tests for the inventaire connector 2021-10-27 10:03:09 -07:00
Mouse Reeve 8ba875af4a Improve federation compatibility with Hubzilla and Zap
Co-authored-by: hubzilla <redmatrix@users.noreply.github.com>
Fixes #1564
2021-10-26 14:41:06 -07:00
Mouse Reeve 1033d3d045 Updates connector tests 2021-09-30 11:33:04 -07:00
Mouse Reeve 5dd2aac600 Merge branch 'main' into search-refactor 2021-09-30 10:41:30 -07:00
Mouse Reeve d36ef2bcf1 Pylint change 2021-09-29 12:42:28 -07:00
Mouse Reeve 32391dd64d Python formatting 2021-09-29 12:38:31 -07:00
Mouse Reeve 0aef011258 Don't use the format detail if it maps directly 2021-09-29 12:29:17 -07:00
Mouse Reeve 123b23728f Infer format in openlibrary import 2021-09-29 12:21:19 -07:00
Mouse Reeve 08f6a97653 Python formatting 2021-09-18 11:33:43 -07:00
Mouse Reeve acfb1bb376 Updating string format syntax part 2 2021-09-18 11:32:00 -07:00