Slightly modified merge of commit [1cb1d3ac] from searx [PR 2543]:
This adds Docker Hub .. as a search engine .. the engine's favicon was
downloaded from the Docker Hub website with wget and converted to a PNG
with ImageMagick .. It supports the parsing of URLs, titles, content,
published dates, and thumbnails of Docker images.
[1cb1d3ac] https://github.com/searx/searx/pull/2543/commits/1cb1d3ac
[PR 2543] https://github.com/searx/searx/pull/2543
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Access to formats can be denied by settings configuration::
search:
formats: [html, csv, json, rss]
Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test & demonstrate this implementation download:
https://liste.mediathekview.de/filmliste-v2.db.bz2
and unpack into searx/data/filmliste-v2.db, in your settings.yml define a sqlite
engine named "demo"::
- name : demo
engine : sqlite
shortcut: demo
categories: general
result_template: default.html
database : searx/data/filmliste-v2.db
query_str : >-
SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
COALESCE( NULLIF(url_video_hd,''), NULLIF(url_video_sd,''), url_video) AS url,
description AS content
FROM film
WHERE title LIKE :wildcard OR description LIKE :wildcard
ORDER BY duration DESC
disabled : False
Query to test: "!demo concert"
This is a rewrite of the implementation from commit [1]
[1] searx/searx@8e90a21
Suggested-by: @virtadpt searx/searx#2808
No functional change, just some linting.
- fix messages from pylint (see below)
- log where general Exceptions are catched (broad-except)
- normalized various indentation
- To avoid clashes with common names, add prefix 'route_' to all @app.route
decorated functions.
Fixed messages::
searx/webapp.py:744:0: C0301: Line too long (146/120) (line-too-long)
searx/webapp.py:756:0: C0301: Line too long (132/120) (line-too-long)
searx/webapp.py:730:9: W0511: TODO, check if timezone is calculated right (fixme)
searx/webapp.py:1:0: C0114: Missing module docstring (missing-module-docstring)
searx/webapp.py:126:8: I1101: Module 'setproctitle' has no 'setthreadtitle' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
searx/webapp.py:126:36: W0212: Access to a protected member _name of a client class (protected-access)
searx/webapp.py:131:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
searx/webapp.py:141:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
searx/webapp.py:255:38: W0621: Redefining name 'request' from outer scope (line 32) (redefined-outer-name)
searx/webapp.py:307:4: W0702: No exception type(s) specified (bare-except)
searx/webapp.py:374:24: W0621: Redefining name 'theme' from outer scope (line 155) (redefined-outer-name)
searx/webapp.py:420:8: R1705: Unnecessary "else" after "return" (no-else-return)
searx/webapp.py:544:4: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
searx/webapp.py:551:4: W0702: No exception type(s) specified (bare-except)
searx/webapp.py:566:15: W0703: Catching too general exception Exception (broad-except)
searx/webapp.py:613:4: R1705: Unnecessary "elif" after "return" (no-else-return)
searx/webapp.py:690:8: W0621: Redefining name 'search' from outer scope (line 661) (redefined-outer-name)
searx/webapp.py:661:0: R0914: Too many local variables (22/20) (too-many-locals)
searx/webapp.py:674:8: R1705: Unnecessary "else" after "return" (no-else-return)
searx/webapp.py:697:11: W0703: Catching too general exception Exception (broad-except)
searx/webapp.py:748:4: R1705: Unnecessary "elif" after "return" (no-else-return)
searx/webapp.py:661:0: R0911: Too many return statements (9/6) (too-many-return-statements)
searx/webapp.py:661:0: R0912: Too many branches (29/12) (too-many-branches)
searx/webapp.py:661:0: R0915: Too many statements (74/50) (too-many-statements)
searx/webapp.py:931:4: W0621: Redefining name 'image_proxy' from outer scope (line 1072) (redefined-outer-name)
searx/webapp.py:946:4: W0621: Redefining name 'stats' from outer scope (line 1132) (redefined-outer-name)
searx/webapp.py:917:0: R0914: Too many local variables (34/20) (too-many-locals)
searx/webapp.py:917:0: R0912: Too many branches (19/12) (too-many-branches)
searx/webapp.py:917:0: R0915: Too many statements (65/50) (too-many-statements)
searx/webapp.py:1063:44: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
searx/webapp.py:1072:0: R0911: Too many return statements (9/6) (too-many-return-statements)
searx/webapp.py:1151:4: C0103: Variable name "SORT_PARAMETERS" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*)|([a-z]))$' pattern (invalid-name)
searx/webapp.py:1297:0: R1721: Unnecessary use of a comprehension (unnecessary-comprehension)
searx/webapp.py:1303:0: C0103: Argument name "e" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*))$' pattern (invalid-name)
searx/webapp.py:1303:19: W0613: Unused argument 'e' (unused-argument)
searx/webapp.py:1338:23: W0621: Redefining name 'app' from outer scope (line 162) (redefined-outer-name)
searx/webapp.py:1318:0: R0903: Too few public methods (1/2) (too-few-public-methods)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
pylint message: wrong-import-order
Respect PEP8 import order (standard imports first, then third-party libraries,
then local imports).
pylint message: wrong-import-position
Do not mix code & imports
BTW:
- only one import per line
- replace licence text by SPDX tag
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove extension of the sys.path (aka PYTHONPATH). Running instance directly
from repository's folder is a relict from the early beginning in
2014 (fd651083f) and is no longer supported.
Since commit dd46629 was merged the command line 'searx-run' exists and should
be used.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- Use result 'alt_description' as title, if not given use
default title 'unknown'.
- Use result 'description' from unsplash as 'content'
Fix error::
DEBUG:searx:result: invalid title: {..., 'title': None, 'content': '', 'engine': 'unsplash'}
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
No functional change!
- fix messages from pylint
- add ``global THREADLOCAL``
- normalized various indentation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove 'template' from result. Engine genius should
not use the video template. BTW: fix indentations
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- init stat values by None
- drop round_or_none
- don't try to get percentage if base is 'None'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add a new environment variable SEARX_DISABLE_ETC_SETTINGS
to disable loading of /etc/searx/settings.yml
unit tests:
* set SEARX_DISABLE_ETC_SETTINGS to 1
* remove SEARX_SETTINGS_PATH if it exists
Based on commits
- 0507e185 [fix] bar graph and rename CSS class engine-scores -> engine-score
- 3e9ad7ae [fix] make /stats more CSP compliant - github issue form
- 34859d0e [fix] make /stats more CSP compliant - oscar theme
- 0a6c4884 [fix] make /stats more CSP compliant - simple theme
- cdfb4b7f [fix] make /stats more CSP compliant - bar graph
- 965817f2 [fix] simple theme - generate missing sourceMap file
this patch is generated by::
make themes.all
Reported-by: https://github.com/searxng/searxng/issues/57
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- drop #main_stats selector in stats.less
- 'engine-score' exists before this PR.
- untabify searx/static/themes/__common__/less/stats.less
for details see comment at: d93bec7638..1204e4f07e (r633571496)
Suggested-by: @dalf in commit 1204e4f0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Upgraded [v3.3.0] otherwise::
` width: calc(100% - 5rem);`
becomes `width: 95%` once compiled by less version 1.4.1.
[v3.3.0] https://github.com/gruntjs/grunt-contrib-less/releases/tag/v3.0.0
Suggested-by: @dalf in commit 1204e4f0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Make 'soft_max_redirects' configurable per Xpath engine::
- name : <engine-name>
engine : xpath
soft_max_redirects: 1
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
it prepares the new architecture change,
everything about multithreading in moved in the searx.search.* packages
previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (request not sent using the defined proxy)
* it requires to monkey patch the code to avoid HTTP requests during the tests
* [mod] option to enable or disable "proxy" button next to each result
Closes: https://github.com/searxng/searxng/issues/51
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Alexandre Flament <alex@al-f.net>
When there is at least one errors or one failed checker test:
* the warning icon is displayed in the reliability column
* the link "View error logs and submit a bug report" is displayed on engine name tooltip.
Before:
* the warning icon was displayed only when one or more checker test(s) failed.
* the link "View error logs and submit a bug report" was not shown when a checker test failed but there were no error.
In the preference page, in the 'about' toolbox of an engine, add a link to the
stats page of the engine, if the engine had one or more errors.
Condition is::
reliabilities[<engine.name>].errors
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Inline styles are blocked by default with Content Security Policy (CSP). Move
the inline styles from 'new_issue.html' to::
searx/static/themes/__common__/less/new_issue.less
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
File searx/static/themes/simple/less/stats.less is not used (imported) in any
other less file. I can't say when it's usage was dropped or if it has ever been
used. ATM this file is without any usage.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* searx.network.client.LOOP is initialized in a thread
* searx.network.__init__ imports LOOP which may happen
before the thread has initialized LOOP
This commit adds a new function "searx.network.client.get_loop()"
to fix this issue
The issue exists only in the debug log::
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.9/logging/__init__.py", line 1086, in emit
stream.write(msg + self.terminator)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 79-89: ordinal not in range(128)
Call stack:
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/searx/searx-src/searx/webapp.py", line 1316, in __call__
return self.app(environ, start_response)
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
return self.app(environ, start_response)
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/searx/searx-src/searx/webapp.py", line 766, in search
number_of_results=format_decimal(number_of_results),
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask_babel/__init__.py", line 458, in format_decimal
locale = get_locale()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask_babel/__init__.py", line 226, in get_locale
rv = babel.locale_selector_func()
File "/usr/local/searx/searx-src/searx/webapp.py", line 249, in get_locale
logger.debug("%s uses locale `%s` from %s", request.url, locale, locale_source)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- add to list of pylint scripts
- add debug log messages
- move API key int `settings.yml`
- improved readability
- add some metadata to results
Signed-off-by: Markus Heiser <markus@darmarit.de>
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].
To test this PR, first get your API key following this page:
https://dev.springernature.com/signup
In searx/engines/springer.py at line 24, add this API key. I left my own key,
commented out in the line aboce. Feel free to use it, if needed.
[1] https://www.springernature.com/
[2] https://dev.springernature.com/
The new sci-hub URLs are comming from @aurora-vasiliev [1].
[1] https://github.com/searx/searx/pull/2706
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com
This patch adds a CONSENT Cookie to the youtube request to avoid redirection.
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
* display the median time instead of the average.
* add a "Reliability" column (sum up the metrics and the checker results).
* the "selected language", "SafeSearch", "Time range" values are displayed as "broken" when the checker tests fail.
Some error won't stop the engine:
* additional HTTP redirects for example
* some invalid results
secondary=True allows to flag these errors as not important.
Report to the user suspended engines.
searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
Some engine do have set result.img_src, other return a result.thumbnail. If
result.img_src is unset and a result.thumbnail is given, show it to the UI.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.
BTW: added bandcamp's favicon
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
settings.yml:
* outgoing.networks:
* can contains network definition
* propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
* retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
* local_addresses can be "192.168.0.1/24" (it supports IPv6)
* support_ipv4 & support_ipv6: both True by default
see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
* either a full network description
* either reference an existing network
* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
This patch is an addition to PR #2656 which removed all usage of `base_url` from
the templates, except one was forgotten in the cookie URL of the preferences.
closes: 2740
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I can't set `default_doi_resolver` in `settings.yml` if I'm using
`use_default_settings`. Searx seems to try to interpret all settings at root
level in `settings.yml` as dict, which is correct except for
`default_doi_resolver` which is at root level and a string::
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 125, in load_settings
update_settings(default_settings, user_settings)
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 61, in update_settings
update_dict(default_settings[k], v)
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 48, in update_dict
for k, v in user_dict.items():
AttributeError: 'str' object has no attribute 'items'
Signed-off-by: Markus Heiser <markus@darmarit.de>
Suggested-by: @0xhtml https://github.com/searx/searx/issues/2722#issuecomment-813391659
The `url_for` function in the template context is not the one from Flask, it is
the one from `webapp`. The `webapp.url_for_theme` is different from its
namesake of Flask and has it quirks, when called with argument `_external=True`.
The `webapp.url_for_theme` can't handle absolute URLs since it pokes a leading
'/', here is the snippet of the old code::
url = url_for(endpoint, **values)
if settings['server']['base_url']:
if url.startswith('/'):
url = url[1:]
url = urljoin(settings['server']['base_url'], url)
Next drawback of (Flask's) `_external=True` is, that it will not return the HTTP
scheme when searx (the Flask app) listens on http and is proxied by a https
server.
To get the right scheme `HTTP_X_SCHEME` is needed by Flask (werkzeug). Since
this is not provided in every environment (e.g. behind Apache mod_wsgi or the
HTTP header is not fully set for some other reasons) it is recommended to
get *script_name*, *server* and *scheme* from the configured `base_url`. If
`base_url` is specified, then these values from are given preference over any
Flask's generics.
BTW this patch normalize to use `url_for` in the `opensearch.xml` and drop the
need of `host` and `urljoin` in template's context.
Signed-off-by: Markus Heiser <markus@darmarit.de>
Instead of a hard-coded `oadoi.org` default, use the default value from
`settings.yml`.
Fix an issue in the themes: The replacement 'current_doi_resolver' contains the
doi_resolver_url, not the name of the DOI resolver. Compare return value of::
searx.plugins.oa_doi_rewrite.get_doi_resolver(...)
Fix a typo in `get_doi_resolver(..)`: suggested by @kvch:
*L32 should set doi_resolver not doi_resolvers*
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)
The commit uses api_result['title']
the json response has been changed and it contains html chunks which is
not compatible with our json engine, so we have to switch to html/xpath
parsing
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.
This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
Added a line to the yacy entry to enable HTTP if the local yacy instance isn't using HTTPS. Otherwise, an error will be thrown in the logs: "No connection adapters were found for 'http://localhost:8090/yacysearch.json...'". This is likely related to ticket #2641 that forces HTTPS by default.
See https://github.com/requirejs/requirejs/issues/1816
requirejs loads one file: leaflet.
This commit:
* removes requirejs
* load leaflet using <script src...> HTML tag in searx/templates/oscar/base.html
Many things have been changed since last review of this engine. This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.
Signed-off-by: Markus Heiser <markus@darmarit.de>
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:
ERROR:searx.engines:yggtorrent engine: Fail to initialize
Traceback (most recent call last):
File "share/searx/searx/engines/__init__.py", line 293, in engine_init
init_fn(get_engine_from_settings(engine_name))
File "share/searx/searx/engines/yggtorrent.py", line 42, in init
resp = http_get(url, allow_redirects=False)
File "share/searx/searx/poolrequests.py", line 197, in get
return request('get', url, **kwargs)
File "share/searx/searx/poolrequests.py", line 190, in request
raise_for_httperror(response)
File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
raise_for_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
raise_for_cloudflare_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000
For SearxEngineResponseException this is not needed. Those types of exceptions
can be a normal use case. E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:
WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000
closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- unittest2 is a backport of the new features added to the unittest testing
framework in Python 2.7
- unittest2 was only needed in py2 and can be dropped now
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.
In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at the time, either the region's default or
the request's language if the latter is supported by that region.
For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.
This is specially a problem when Bing assumes that the request is in
English because it overrides enough language codes to make several major
languages like Arabic dissappear from the languages.py file.
To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
use a sparql request on wikidata to get the list of currencies.
currencies.json contains the translation for all supported searx languages.
Supersede #993