Commit graph

1124 commits

Author SHA1 Message Date
Markus Heiser
a88e3e4fea [pylint] searx/engines/unsplash.py, add logger & norm indentation
- fix messages from pylint
- add logger and log request URL
- normalized various indentation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-25 16:45:32 +02:00
Markus Heiser
84a943f867 [enh] XPath engine - add time safe-search support
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 22:26:18 +02:00
Markus Heiser
6bfe3fd033 [enh] XPath engine - add time range support
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 16:49:30 +02:00
Markus Heiser
1933577c8e [enh] XPath engine - add ISO 639-1 {lang} replacement to search-URL
BTW: remove obsolte params['query'] and not needed paging condition.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 15:05:36 +02:00
Markus Heiser
8cd544b2a6 [doc] add documentation about the XPath engine
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 11:48:21 +02:00
Markus Heiser
ffcebf5e12 [enh] xpath engine - add request parameter 'soft_max_redirects'
Make 'soft_max_redirects' configurable per Xpath engine::

    - name : <engine-name>
      engine : xpath
      soft_max_redirects: 1
      ...

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-17 15:04:55 +02:00
Alexandre Flament
8c1a65d32f [mod] multithreading only in searx.search.* packages
it prepares the new architecture change,
everything about multithreading in moved in the searx.search.* packages

previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (request not sent using the defined proxy)
* it requires to monkey patch the code to avoid HTTP requests during the tests
2021-05-05 13:12:42 +02:00
Marc Abonce Seguin
448bfe6005 fix Qwant's fetch_languages function 2021-05-02 17:46:40 -07:00
Michael Ilsaas
0c43cf89ca [fix] URL to solidtorrent result page
Reported-by: https://github.com/searx/searx/pull/2786
2021-04-29 10:40:47 +02:00
Markus Heiser
dc29f1d826 [pylint] tag PYLINT_FILES by comment # lint: pylint
These py files are linted by `test.pylint`, all other files are linted by
`test.pep8`.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-26 20:18:20 +02:00
Markus Heiser
28b25185c5 [brand] searxng -- fix links to issue tracker & WEB-GUI
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-25 14:25:08 +02:00
Markus Heiser
8efabd3ab7 [mod] core.ac.uk engine
- add to list of pylint scripts
- add debug log messages
- move API key int `settings.yml`
- improved readability
- add some metadata to results

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-04-24 09:00:53 +02:00
spongebob33
7528e38c8a add core.ac.uk engine 2021-04-24 08:55:45 +02:00
Alexandre Flament
d01741c9a2
Merge pull request #15 from return42/add-springer
Add a search engine for Springer Nature
2021-04-22 13:23:31 +02:00
Pierre Chevalier
a80bf1ba97 [enh] Add Springer Nature engine
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].

To test this PR, first get your API key following this page:

   https://dev.springernature.com/signup

In searx/engines/springer.py at line 24, add this API key.  I left my own key,
commented out in the line aboce.  Feel free to use it, if needed.

[1] https://www.springernature.com/
[2] https://dev.springernature.com/
2021-04-22 12:35:25 +02:00
habsinn
41a2e3785e [enh] add engine using API from "The Art Institute of Chicago" 2021-04-22 12:25:43 +02:00
Markus Heiser
e9a6ab4015 [fix] youtube - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com

This patch adds a CONSENT Cookie to the youtube request to avoid redirection.

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
2021-04-22 12:09:09 +02:00
Alexandre Flament
c6d5605d27
Merge pull request #7 from searxng/metrics
Metrics
2021-04-22 08:34:17 +02:00
Alexandre Flament
b7848e3422 [fix] searxng fix: sjp engine 2021-04-21 16:31:29 +02:00
Alexandre Flament
7acd7ffc02 [enh] rewrite and enhance metrics 2021-04-21 16:24:46 +02:00
Alexandre Flament
aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
2021-04-21 16:24:46 +02:00
Alexandre Flament
48720e20a8 Merge remote-tracking branch 'searx/master' 2021-04-19 09:35:12 +02:00
Noémi Ványi
8362257b9a
Merge pull request #2736 from plague-doctor/sjp
Add new engine: SJP - Słownik języka polskiego
2021-04-16 17:30:14 +02:00
Noémi Ványi
e56323d3c8
Merge pull request #2759 from ypid/fix/typo
Fix grammar mistake in debug log output
2021-04-16 17:26:45 +02:00
Plague Doctor
d275d7a35e Code refactoring. 2021-04-16 12:23:27 +10:00
Markus Heiser
062d589f86 [fix] xpath expressions to grap all items from bandcamp's response
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.

BTW: added bandcamp's favicon

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-15 08:52:11 +02:00
Kyle Anthony Williams
4d3c399ee9 [feat] add bandcamp engine 2021-04-15 08:52:11 +02:00
Alexandre Flament
d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Robin Schneider
dfc66ff0f0
Fix grammar mistake in debug log output 2021-04-11 22:12:53 +02:00
Alexandre Flament
eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Plague Doctor
599ff39ddf Fix conflicts 2021-04-09 06:54:03 +10:00
Plague Doctor
6631f11305 Add new engine: SJP 2021-04-08 10:21:54 +10:00
Plague Doctor
7035bed4ee Add new engine: Wordnik.com 2021-04-08 09:58:00 +10:00
Noémi Ványi
07f5edce3d Add Meilisearch engine
Website: https://www.meilisearch.com/
2021-04-06 21:57:05 +02:00
Alexandre Flament
725a69616b
Merge pull request #2681 from dalf/fix-wikipedia-title
[fix] wikipedia: remove HTML from the title
2021-03-27 17:43:36 +01:00
Noémi Ványi
9bb312c505 Remove duplicated key from dict in Semantic Scholar 2021-03-27 16:58:32 +01:00
Noémi Ványi
f596f5767b fix Semantic Scholar engine 2021-03-27 16:54:01 +01:00
Adam Tauber
28286cf3f2 [fix] update seznam engine to be compatible with the new website 2021-03-27 15:29:04 +01:00
Alexandre Flament
fcfcf662ff [fix] wikipedia: remove HTML from the title
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)

The commit uses api_result['title']
2021-03-25 08:31:39 +01:00
Adam Tauber
0ba71c3644 [fix] make ina engine compatible with the new response json 2021-03-25 01:20:41 +01:00
Adam Tauber
5f450fda74 [enh] add year filter to duckduckgo 2021-03-25 00:25:36 +01:00
Adam Tauber
fd737dc9d8 [fix] remove debug code 2021-03-24 23:54:39 +01:00
Alexandre Flament
38c210d746
[mod] soundcloud: faster initialization
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.

This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
2021-03-21 09:29:53 +01:00
Adam Tauber
4c631ac6d0 [fix] remove debug code 2021-03-15 21:47:27 +01:00
Noémi Ványi
8158d8654a fix Microsoft Academic engine 2021-03-15 20:21:28 +01:00
Adam Tauber
f97b4ff7b6 [fix] update youtube_noapi paging 2021-03-15 17:22:31 +01:00
Adam Tauber
dd34ac396c
Merge pull request #2652 from kvch/solr-engine
Add Apache Solr engine
2021-03-15 15:39:39 +01:00
Alexandre Flament
1664258061
Merge pull request #2655 from return42/fix-imports
[fix] remove unused import from yahoo-news engine
2021-03-15 08:38:34 +01:00
Markus Heiser
6e1f1085ef [fix] remove unused import from yahoo-news engine
Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 15:13:57 +01:00
Markus Heiser
3703ebb22a [drop] Acgsou engine - www.acgsou.com no longer exists
- https://www.acgsou.com/ acgsou.com is redirected to 36dm.club
- @rinpatch do not plan on maintaining the engine [1]

[1] https://github.com/searx/searx/pull/1283#issuecomment-798783585

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-14 11:49:18 +01:00