Commit graph

4267 commits

Author SHA1 Message Date
Alexandre Flament
7d6e69e2f9 [upd] wikipedia engine: return an empty result on query with illegal characters
on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
2021-02-11 12:29:21 +01:00
Alexandre Flament
ff84a1af35 [mod] json_engine: add content_html_to_text and title_html_to_text
Some JSON API returns HTML in either in the HTML or the content.
This commit adds two new parameters to the json_engine:
content_html_to_text and title_html_to_text, False by default.

If True, then the searx.utils.html_to_text removes the HTML tags.

Update crossref, openairedatasets and openairepublications engines
2021-02-10 16:42:11 +01:00
Alexandre Flament
436d366448
Merge pull request #2544 from mrwormo/congresslibrary
[Engine] Add Library of Congress engine
2021-02-10 10:13:46 +01:00
Alexandre Flament
eafd27f42a
Merge pull request #2556 from dalf/fix-apk-mirror
[fix] fix apk_mirror engine
2021-02-10 10:12:37 +01:00
Alexandre Flament
c40316d957
Merge pull request #2558 from dalf/remove-google-play-music
[upd] remove google_play_music engine
2021-02-10 10:12:21 +01:00
Alexandre Flament
d2dac11392 [mod] duckduckgo engine: better support of the language preference
After the main request, send a second to https://duckduckgo.com/t/sl_h

See https://github.com/searx/searx/issues/2259
2021-02-09 14:36:43 +01:00
Alexandre Flament
74d56f6cfb [mod] poolrequests: for one (user request, engine) always use the same HTTPAdapter
The duckduckgo engine requires an additional request after the results have been sent.
This commit makes sure that the second request uses the same HTTPAdapter
= the same IP address, and the same proxy.
2021-02-09 14:33:36 +01:00
Markus Heiser
bc1be3f0e9 [enh] add engine MediathekViewWeb (API)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-09 13:08:01 +01:00
mrwormo
051da88328 Add Library of Congress engine 2021-02-09 12:45:39 +01:00
Alexandre Flament
9211cdfe9b [upd] remove google_play_music engine
Google Play Music has been replaced by Youtube music.
2021-02-09 11:38:50 +01:00
Alexandre Flament
aedf03c0f7 Fix: activate raise_for_error by default
Fix commit d703119d3a :
Some engines need to parse the HTTP error but
raise_for_error is always set to False in the "request" function.
2021-02-09 11:27:41 +01:00
Alexandre Flament
5e055b069b [fix) fix apk_mirror engine 2021-02-09 11:02:12 +01:00
Alexandre Flament
f03ad0a3c0
Merge pull request #2555 from dalf/fix-github-data-update
[fix] fix github action data-update.yml
2021-02-09 10:48:55 +01:00
Alexandre Flament
966a7a1f25 [fix] fix github action data-update.yml 2021-02-09 09:58:59 +01:00
Alexandre Flament
e4cc7f13a3
Merge pull request #2542 from kvch/fix-naver-engine
Fix XPATHs in Naver engine
2021-02-09 08:52:38 +01:00
Alexandre Flament
bec9e30fe7
Merge pull request #2554 from MarcAbonce/zh-variants-in-wikipedia
Add support for Chinese variants in Wikipedia
2021-02-09 08:49:59 +01:00
Alexandre Flament
6c513095e4
Merge pull request #2553 from danielhones/improve-results-highlighting-updated
Ignore double-quotes when highlighting query parts
2021-02-09 08:39:07 +01:00
Daniel Hones
138f32471c Updated webutils.highlight_content to ignore double-quotes when highlighting query parts 2021-02-08 23:58:54 -05:00
Marc Abonce Seguin
64e81794fe add support for Chinese variants in Wikipedia 2021-02-08 21:56:45 -07:00
Noémi Ványi
ac309f5b8d Fix naver engine
Closes #2540
2021-02-07 18:58:13 +01:00
Noémi Ványi
ab8739809c
Merge pull request #2538 from return42/drop-metager
[drop] metager - xpath engine won't work anymore
2021-02-07 15:21:40 +01:00
Markus Heiser
41c03cf011 [drop] metager - xpath engine won't work anymore
The new version of MetaGer needs to reload the reults (into a iframe) with a
unique tag (see HTML response below).

Implementing a dedicated metager-engine for searx makes no sense to me. The
great days of MetaGer seems to be ended.  I remember the good old days this
project started in the 90's of the last century.  But in the last few years it
becomes more and more crap.  As the name suggested, MetaGer was made for
germans in the first place.  They have added a english and spain translation but
the i18n is very poor compared to what searx offers.

It's a pity, lets drop MetaGer.

This is the first response, the id (b82679980656899ba5a17ffd02a56846) is unique
for each query:

    $ curl "https://metager.org/meta/meta.ger3?eingabe=foo&submit-query=&focus=web"
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="/index.css?id=b82679980656899ba5a17ffd02a56846">
        <script src="/index.js?id=b82679980656899ba5a17ffd02a56846"></script>
    <title>foo - MetaGer</title>
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
    </head>
    <body>
        <iframe id="mg-framed" src="https://metager.org/meta/meta.ger3?eingabe=foo&amp;submit-query=&amp;focus=web&amp;mgv=b82679980656899ba5a17ffd02a56846" autofocus="true" onload="this.contentWindow.focus();"></iframe>
     </body>

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-07 14:55:21 +01:00
Noémi Ványi
1f09d7d561
Merge pull request #2539 from OliveiraHermogenes/recoll/paged_json
[feat] recoll: add support for paging
2021-02-07 14:28:57 +01:00
Hermógenes Oliveira
514faa9162 [feat] recoll: paged json support 2021-02-07 10:05:35 -03:00
Alexandre Flament
1e35c3ccce
Merge pull request #2531 from MarcAbonce/fix-browser-locale
[fix] Get correct locale with country from browser
2021-02-05 10:55:37 +01:00
Marc Abonce Seguin
c937a9e85f [fix] get correct locale with country from browser
Some of our interface locales include uppercase country codes,
which are separated by `_` instead of the more common `-`.
Also, a browser's `Accept-Language` header could be in lowercase.

This commit attempts to normalize those cases so a browser's
language+country codes can better match with our locales.

This solution assumes that our UI locales have nothing more than
language and optionally country. If we ever add a script specific
locale like `zh-Hant-TW` this would have to change to accomodate
that, but the idea would be pretty much the same as this fix.
2021-02-04 19:53:59 -07:00
Alexandre Flament
321788f14a
Merge pull request #2528 from dalf/mod-ci-gh-pages
[mod] CI: minor changes
2021-02-04 23:12:27 +01:00
Noémi Ványi
ffaf785f82
Merge pull request #2533 from mrwormo/ccengine
[Engine] Add Creative Commons search engine
2021-02-04 22:35:08 +01:00
mrwormo
c4c1636b18 Add Creative Commons search engine 2021-02-04 11:31:35 +01:00
Noémi Ványi
006f206dc9
Merge pull request #2530 from return42/fix-user-hb
[fix] make books/user.pdf
2021-02-02 20:50:35 +01:00
Markus Heiser
89554e42a9 [fix] make books/user.pdf
Error:

  Configuration error:
  There is a programmable error in your configuration file:
  ...
  NameError: name 'DOCS_URL' is not defined
  make: *** [utils/makefile.sphinx:156: books/user.latex] Fehler 2

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-02 20:14:07 +01:00
Alexandre Flament
90b9d0d6a8 [mod] CI: minor changes
* utils/makefile.python: travis-gh-pages renamed ci-gh-pages
2021-02-02 08:53:57 +01:00
Alexandre Flament
34de715e62
Merge pull request #2500 from dalf/github-action-data
[enh] every Sunday, call utils/fetch_*.py scripts and create a PR automatically
2021-02-01 17:16:58 +01:00
Alexandre Flament
1742355eb8
Merge pull request #2499 from dalf/remove-language-support-variable
[mod] dynamically set language_support variable
2021-02-01 17:16:18 +01:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Alexandre Flament
99244440e4
Merge pull request #2514 from return42/fix-gh-pages
[fix] Makefile target gh-pages & flatten history of branch gh.pages
2021-02-01 17:07:08 +01:00
Alexandre Flament
0a8799b834
Merge pull request #2517 from dalf/debug-ci
Update pyenv pyenvinstall Make targets
2021-02-01 17:01:34 +01:00
Markus Heiser
8c45f1149d [hardening] github workflows - corrupted cache
aka: ensure that 'make test' works as expected

The cache contains a copy './local' which is - under some circumstance -
corrupted.  It is not possible to clear the cache [1] (see the top of the page).

Ensure that 'make test' works as expected [2] even if

- the python interpreter is missing
- the virtualenv exists but pyyaml is missing

To hardening when the workflow cache fails, this patch adds the new target
'travis.test' into the workflow.  This target probes to import a python module
'yaml'.  If this fails the virtualenv will be completely new build.

[1] https://github.com/actions/cache/issues/2#issuecomment-673493515
[2] https://github.com/searx/searx/pull/2517#discussion_r567240235

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01 16:58:04 +01:00
Markus Heiser
38b39ef0ae [fix] re-add 'pip-exe' target - partial revert 9b48ae47
Target pip-exe is a prerequisite of the targets:

  - pyinstall
  - pyuninstall

and was accidentally deleted in commit 9b48ae47.

HINT:
  do not confuse pyinstall with penvinstall

pyinstall & pyuninstall
    Installing into user's HOME using pip from OS,
    therefore the message is needed.

pyenvinstall & pyenvuninstall
    Installing into virtualenv (./local) using pip which is provided by
    prerequisite 'pyenv' in the virtualenv.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01 16:58:04 +01:00
Alexandre Flament
d70c5a621a [mod] more robust make pyenv / make pyenvinstall
"make pyenv" ensures that ./local/py3/bin/python is an executable
2021-02-01 16:58:04 +01:00
Alexandre Flament
806af50738
Merge pull request #2494 from return42/rm-fabfile
[fix] remove Fabric file
2021-02-01 15:09:35 +01:00
Markus Heiser
40d2a116e1 [fix] Makefile target gh-pages & flatten history of branch gh.pages
1. This patch fixes error:

    rm -rf gh-pages/
    make V=1 gh-pages
    make[1]: Leaving directory '/800GBPCIex4/share/searx'
    [ -d "gh-pages/.git" ] || git clone  gh-pages
    fatal: repository 'gh-pages' does not exist

2. The gh-page build has been moved to ./build/gh-pages this also affects
   'travis-gh-pages'

3. The gh-pages commit messages now includes a ref to the repository and commit

4. Since a gh-pages history has only the drawback that the reposetory grows
   fast, this patch also flattens the history:

    cd build/gh-pages/; git log --oneline
    bash: cd: build/gh-pages/: Datei oder Verzeichnis nicht gefunden
    026126be (HEAD -> gh-pages, origin/gh-pages) make gh-pages: from https://github.com/return42/searx.git@71d66979c2935312e0aed7fc7c3cf6199fbe88a2

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-29 11:41:48 +01:00
Alexandre Flament
71d66979c2
Merge pull request #2482 from return42/fix-google-video
[fix] revise of the google-Video engine
2021-01-28 11:11:07 +01:00
Markus Heiser
7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing non valid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
e436287385 [mod] checker: add some additional tests
BTW: fix indentation by 2 spaces

The additional tests has been commented out in the google engines to not release
any CAPTCHA issues.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament
0f18e885bf
Merge pull request #2479 from Tobi823/master
Document workaround for using 2 languages simultaneously #1508
2021-01-27 21:29:42 +01:00
Alexandre Flament
b661c3f5d4
Merge pull request #2509 from return42/fix-morty-key
[doc] improve admin-docs about result proxy (morty) configuration
2021-01-27 15:31:29 +01:00
Markus Heiser
a69a8a3ed5 [doc] improve admin-docs about result proxy (morty) configuration
[1] https://github.com/searx/searx/pull/1872#issuecomment-768107138

Suggested-by @dalf [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-27 09:58:06 +01:00
Markus Heiser
923b490022 [mod] add Makfile targets for search.checker.<engine_name>
To check all engines:

    make search.checker

To check a engine 'google news' replace space by underline:

    make search.checker.google_news

To see HTTP requests and more use SEARX_DEBUG:

    make SEARX_DEBUG=1 search.checker.google_news

To filter out HTTP redirects:

    make SEARX_DEBUG=1 search.checker.google_news | grep -A1 "HTTP/1.1\" 3[0-9][0-9]"
    ...
    Engine google news                   Checking
    https://news.google.com:443 "GET /search?q=life&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=life&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --
    https://news.google.com:443 "GET /search?q=computer&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-26 11:46:36 +01:00