Commit graph

4025 commits

Author SHA1 Message Date
Alexandre Flament
71d66979c2
Merge pull request #2482 from return42/fix-google-video
[fix] revise of the google-Video engine
2021-01-28 11:11:07 +01:00
Markus Heiser
7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing non valid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
e436287385 [mod] checker: add some additional tests
BTW: fix indentation by 2 spaces

The additional tests has been commented out in the google engines to not release
any CAPTCHA issues.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament
0f18e885bf
Merge pull request #2479 from Tobi823/master
Document workaround for using 2 languages simultaneously #1508
2021-01-27 21:29:42 +01:00
Alexandre Flament
b661c3f5d4
Merge pull request #2509 from return42/fix-morty-key
[doc] improve admin-docs about result proxy (morty) configuration
2021-01-27 15:31:29 +01:00
Markus Heiser
a69a8a3ed5 [doc] improve admin-docs about result proxy (morty) configuration
[1] https://github.com/searx/searx/pull/1872#issuecomment-768107138

Suggested-by @dalf [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-27 09:58:06 +01:00
Markus Heiser
923b490022 [mod] add Makfile targets for search.checker.<engine_name>
To check all engines:

    make search.checker

To check a engine 'google news' replace space by underline:

    make search.checker.google_news

To see HTTP requests and more use SEARX_DEBUG:

    make SEARX_DEBUG=1 search.checker.google_news

To filter out HTTP redirects:

    make SEARX_DEBUG=1 search.checker.google_news | grep -A1 "HTTP/1.1\" 3[0-9][0-9]"
    ...
    Engine google news                   Checking
    https://news.google.com:443 "GET /search?q=life&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=life&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --
    https://news.google.com:443 "GET /search?q=computer&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-26 11:46:36 +01:00
Alexandre Flament
6047087aac [mod] utils/fetch_languages.py: write files at the right location 2021-01-24 14:25:27 +01:00
Alexandre Flament
3330cf4a46 [enh] every monday, call utils/fetch_*.py scripts and create a PR automatically 2021-01-24 13:32:39 +01:00
Markus Heiser
ff6804e545 [data] make engines.languages
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:52:32 +01:00
Markus Heiser
8cdad5d85d [fix] google-videos: parse values for 'length' & 'author'
The 'video.html' template from the 'oscar' design supports replacement
for *author* and *length*.  Google-videos does not have an author, alternatively
the publisher info from is used for the *author*.

Hint: these replacements are not supported by the 'simple' design.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:51:24 +01:00
Markus Heiser
89b3050b5c [fix] revise of the google-Video engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:39:30 +01:00
Alexandre Flament
f4a17acb7a
Merge pull request #2498 from dalf/minor-fix-google-news
[fix] google_news: avoid one HTTP redirect except for the English results
2021-01-24 09:13:48 +01:00
Alexandre Flament
96c2996857
Merge pull request #2497 from return42/fix-test.sh
[fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
2021-01-24 09:06:11 +01:00
Alexandre Flament
8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser
ea5c992d4f [fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
$ make test.sh
  In utils/lxc.sh line 42:
  ubu2010_boilerplate="$ubu1904_boilerplate"
  ^-----------------^ SC2034: ubu2010_boilerplate appears unused. Verify use (or export if used externally).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 08:29:13 +01:00
Alexandre Flament
7d24850d49
Merge pull request #2483 from return42/fix-google-news
[fix] revise of the google-News engine
2021-01-23 20:21:09 +01:00
Markus Heiser
5f92dfcdbe [fix] google-news: query uses locale without country tag
Wthout country-region tag google will redirect to correct the contry tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser
baec54c492 [fix] revise of the google-news engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Markus Heiser
a8544798ec [fix] remove Fabric file
The fabfile.py has not been updated since 5 years.  I also asked [1] if someone
still use Fabric wtihout any response.  Lets drop outdated Fabric file.

[1] https://github.com/searx/searx/discussions/2400

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 17:57:55 +01:00
Adam Tauber
f310305c54
Merge pull request #2481 from dalf/mod-check
Mod check
2021-01-20 18:48:29 +00:00
Alexandre Flament
73c86f9bf2 [mod] checker: disable by default 2021-01-19 21:44:48 +01:00
Alexandre Flament
3b7b852aa8 [fix] checker: minor fix about language detection 2021-01-19 21:29:31 +01:00
Alexandre Flament
aa887eb375 [mod] checker : replace pycld3 by langdetect
pycld3 requires the native library cld3
langdetect is a pure python package
2021-01-19 21:26:04 +01:00
Tobi823
16a0a01553 Document workaround for using 2 languages simultaneously #1508 2021-01-18 17:23:09 +01:00
Alexandre Flament
0495e15df4
Merge pull request #2476 from dalf/fix-error-recording-and-checker
Fix error recording and checker
2021-01-18 08:29:25 +01:00
Alexandre Flament
67a1aab0d5 [fix] /stats/checker : remove the timestamp field when the checker is disabled 2021-01-18 08:19:53 +01:00
Alexandre Flament
d473407ec9 [fix] checker: fix engine statistics
Without this commit, the URL /stats/errors shows percentage above 100% after the checker has run.
2021-01-18 08:19:44 +01:00
Alexandre Flament
ca76f3119a [fix] error_recorder: record code and lineno about the engine
since the PR #2225 , code and lineno were sometimes meaningless
see /stats/errors
2021-01-17 16:25:11 +01:00
Alexandre Flament
80d7411f2c
Merge pull request #2452 from kvch/add-wilby-engine
Add wiby.me engine
2021-01-16 22:36:31 +01:00
Alexandre Flament
b405646749
Merge pull request #2451 from mrwormo/invidious-engine
[Fix] Invidious Engine
2021-01-16 19:25:45 +01:00
Alexandre Flament
709dd960f1
Merge pull request #2473 from return42/fix-setup.py
[fix] setup.py requires pyyaml installed
2021-01-16 19:05:36 +01:00
Alexandre Flament
1d13ad8452
Merge pull request #2460 from dalf/engine-about
[enh] engines: add about variable
2021-01-16 19:05:17 +01:00
Markus Heiser
c4a98862bf [fix] setup.py requires pyyaml installed
pip install -e .
...
Obtaining file:///usr/local/searx/searx-src
    ERROR: Command errored out with exit status 1:
     command: /usr/local/searx/searx-pyenv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/usr/local/searx/searx-src/setup.py'"'"'; __file__='"'"'/usr/local/searx/searx-src/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'rn'"'"', '"'"'n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-vzer91m2
         cwd: /usr/local/searx/searx-src/
    Complete output (9 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/searx/searx-src/setup.py", line 10, in <module>
        from searx.version import VERSION_STRING
      File "/usr/local/searx/searx-src/searx/__init__.py", line 19, in <module>
        import searx.settings_loader
      File "/usr/local/searx/searx-src/searx/settings_loader.py", line 8, in <module>
        import yaml
    ModuleNotFoundError: No module named 'yaml'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-16 08:58:13 +01:00
Alexandre Flament
a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament
5a511f0d62 [fix] CI: fix docker push 2021-01-14 20:35:10 +01:00
Alexandre Flament
824fe40a28
Merge pull request #2467 from dalf/fix-ci
[fix] github actions: use ubuntu-20.04 instead of ubuntu-latest
2021-01-14 17:14:59 +01:00
Alexandre Flament
38090daa29 [fix] github actions: use ubuntu-20.04 instead of ubuntu-latest 2021-01-14 16:49:17 +01:00
mrwormo
2dff3887f0 [fix] Invidious engine by enabling requests by randomly picking amongst working instances 2021-01-14 12:12:56 +01:00
Alexandre Flament
484dc99580
Merge pull request #2419 from dalf/checker
[enh] add checker
2021-01-13 15:46:48 +01:00
Alexandre Flament
912c7e975c [fix] checker: don't run the checker when uwsgi is not properly configured
Before this commit, even with the scheduler disabled, the checker was running
at least once for each uwsgi worker.
2021-01-13 14:07:39 +01:00
Alexandre Flament
7f0c508598 [fix] checker: fix typo unknown instead of unknow 2021-01-12 11:47:17 +01:00
Alexandre Flament
a0c8b413a6 [mod] searx.shared: minor tweaks
searx.shared.shared_abstract.SharedDict inherit from abc.ABC
searx.shared.shared_uwsgi.schedule can schedule multiple functions without issue
2021-01-12 11:47:17 +01:00
Alexandre Flament
87bafbc32b [mod] checker: add status and timestamp to the result
for each engine: replace status by success
2021-01-12 11:47:17 +01:00
Alexandre Flament
f3e1bd308f [mod] checker: minor adjustements on the default tests
the query "time" is convinient because most of the search engine will return some results,
but some engines in the general category will return documentation about the HTML tags <time> or <input type="time">
2021-01-12 11:47:17 +01:00
Alexandre Flament
45bfab77d0 |mod] checker: improve searx-checker command line
* output is unbuffered
* verbose mode describe more precisly the errrors
2021-01-12 11:47:17 +01:00
Alexandre Flament
3a9f513521 [enh] checker: background check
See settings.yml for the options
SIGUSR1 signal starts the checker.
The result is available at /stats/checker
2021-01-12 11:47:17 +01:00
Alexandre Flament
6e2872f436 [enh] add searx.shared
shared dictionary between the workers (UWSGI or werkzeug)
scheduler: run a task once every x seconds (UWSGI or werkzeug)
2021-01-12 11:47:17 +01:00
Markus Heiser
9c581466e1 [fix] do not colorize output on dumb terminals
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-12 11:47:17 +01:00