Commit graph

1434 commits

Author SHA1 Message Date
Markus Heiser
7daf4f95ef [mod] Wikipedia: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Wikipedia engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
f78f908383 [mod] Google: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Google engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
dba8977b09 [mod] DuckDuckGo: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the DuckDuckGo engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
ef143729a0 [mod] yahoo: fetch engine traits (data_type: traits_v1)
Implements a fetch_traits function for the Yahoo engine.

.. note::

   Includes migration of the request methode from 'supported_languages' to
   'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
c1ae2ef57c [mod] qwant: fetch engine traits (data_type: traits_v1)
Implements a fetch_traits function for the Qwant engines.

.. note::

   Includes migration of the request methode from 'supported_languages' to
   'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
fc0c775030 [mod] Dailymotion: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Dailymotion engine.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
61383edb27 [mod] Startpage: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Startpage engine.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
d3aa690a7a [mod] bing: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Bing engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
a7fe22770a [mod] Peertube: re-engineered & upgrade to data_type: traits_v1
- fetch_traits(): Fetch languages from peertube's search-index source code.

  [mod] Include migration of the request methode from 'supported_languages'
        to 'traits' (EngineTraits) object.
  [fix] old supported_languages_url is no longer valid since the sources
        has been moved to a different path.

- fixed code to pass pylint
- request(): complete re-implementation based on the API docs [1]
- response(): complete re-implementation, adds serveral fields missed before
- add source code documentation

[1] https://docs.joinpeertube.org/api-rest-reference.html#tag/Search/operation/searchVideos

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
6e5f22e558 [mod] replace engines_languages.json by engines_traits.json
Implementations of the *traits* of the engines.

Engine's traits are fetched from the origin engine and stored in a JSON file in
the *data folder*.  Most often traits are languages and region codes and their
mapping from SearXNG's representation to the representation in the origin search
engine.

To load traits from the persistence::

    searx.enginelib.traits.EngineTraitsMap.from_data()

For new traits new properties can be added to the class::

    searx.enginelib.traits.EngineTraits

.. hint::

   Implementation is downward compatible to the deprecated *supported_languages
   method* from the vintage implementation.

   The vintage code is tagged as *deprecated* an can be removed when all engines
   has been ported to the *traits method*.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Solirs
ac169a0f75 Pass black formatting test 2023-03-21 00:41:36 +01:00
Solirs
e26bce33d4 WIKIDATA: Add description for results 2023-03-21 00:14:54 +01:00
Alexandre Flament
3e9cddc606
rollback test 2023-03-15 19:55:20 +01:00
Alexandre Flament
41ed0ef0c7
test 2023-03-15 19:53:53 +01:00
Markus Heiser
4c06837a50 [mod] make python code pylint 2.16.1 compliant
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-10 13:59:21 +01:00
Markus Heiser
257dc7d6c4 [fix-2146] set different HTTP Referer header to DuckDuckGo requests
For what ever reasons, ddg-lite don't like the Referer

  https://lite.duckduckgo.com/

In an interactive session in the WEB browser the the Reverer has exactly this
value, but ddg-lite don't like this value when the request is build up by
SearXNG.  The new value is:

  https://google.com/

What fakes a user comes from a google link.

Related: https://github.com/searxng/searxng/pull/2081
Closes: https://github.com/searxng/searxng/issues/2146

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-03 08:45:51 +01:00
Alexandre Flament
9d102fb08f
Merge pull request #2132 from dalf/update_pr_1967
search.suspended_time settings: bug fixes
2023-01-29 20:48:43 +01:00
Alexandre Flament
bfca63c536 wikipedia engine: update _fetch_supported_languages
the layout https://meta.wikimedia.org/wiki/List_of_Wikipedias has changed
2023-01-29 10:01:58 +00:00
Alexandre Flament
8256de2fe8 peertube engine: update _fetch_supported_languages
There is now an API to get the list of supported languages
https://docs.joinpeertube.org/api-rest-reference.html#tag/Video/operation/getLanguages
2023-01-29 10:01:54 +00:00
Alexandre Flament
37addec69e search.suspended_time settings: bug fixes
* fix type in settings.yml: replace suspend_times by suspended_times
* always use delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: CAPTCHA suspend the engine for one day instead of one week
2023-01-28 10:24:14 +00:00
Ahmad Alkadri
7fc8d72889 [fix] bing: parsing result; check to see if the element contains links
This patch is to hardening the parsing of the bing response:

1. To fix [2087] check if the selected result item contains a link, otherwise
   skip result item and continue in the result loop.  Increment the result
   pointer when a result has been added / the enumerate that counts for skipped
   items is no longer valid when result items are skipped.

   To test the bugfix use:   ``!bi :all cerbot``

2. Limit the XPath selection of result items to direct children nodes (list
   items ``li``) of the ordered list (``ol``).

   To test the selector use: ``!bi :en pontiac aztek wiki``

   .. in the result list you should find the wikipedia entry on top,
   compare [2068]

[2087] https://github.com/searxng/searxng/issues/2087
[2068] https://github.com/searxng/searxng/issues/2068
2023-01-09 15:08:24 +01:00
ahmad-alkadri
9ee99423fe [fix] Bing-Web engine: XPath to get the wikipedia result
Modify the XPath selector to get the wikipedia result plus small fixes.

About result content: especially with the Wikipedia result, we'd get several
paragraph elements, only the first paragraph would be taken and displayed on the
search result
2023-01-08 09:11:16 +01:00
Rudis Muiznieks
128b8c7f0a
Add HTTP Referer header to DuckDuckGo requests
closes #2080
2023-01-06 16:07:37 -06:00
Rudis Muiznieks
6804ff048d
Fix: add trailing slash to duckduckgo url
Close #1854
2022-12-22 07:49:58 -06:00
Alexandre Flament
269326063a Fix: don't crash when engine or name is missing in settings.yml
SearXNG crashes if the engine or name fields are missing.
With this commit, the app displays an error in the log and keeps loading.

Close #1951
2022-12-04 23:43:59 +01:00
Émilien Devos
46ad32343a Switch back to protobuf for raw HTML 2022-11-11 07:39:48 +00:00
ngosang
78be4b4c70 Fix Google search engine.
- Fix broken links. Resolves #1794
- Fix missing results. Resolves #1829
2022-11-11 07:34:19 +01:00
Alexandre Flament
8f19bdaf17
Merge pull request #1882 from fehho/metacpan
Add MetaCPAN engine
2022-11-07 21:54:11 +01:00
fehho
fe351c2802 Add MetaCPAN engine 2022-11-07 08:07:06 -06:00
Vasilis Gerakaris
947b62c9d5
Fix floating point format in DDG weather humidity
Fixes #1836
2022-10-20 11:44:17 +03:00
Alexandre FLAMENT
035bc507ec [fix] startpage engine 2022-10-14 18:27:53 +00:00
Alexandre Flament
a3148e5115
Merge pull request #1814 from return42/fix-typos
[fix] typos / reported by @kianmeng in searx PR-3366
2022-09-28 09:22:02 +02:00
Alexandre Flament
0e00af9c26
Merge pull request #1810 from return42/fix-1809
[fix] springer: unsupported operand type(s) for +: 'NoneType' and 'str'
2022-09-28 09:20:03 +02:00
Markus Heiser
ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Markus Heiser
52023e3d6e [fix] doc of the paper.html template (isbn, issn)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-25 15:46:29 +02:00
Markus Heiser
0052887929 [fix] springer: unsupported operand type(s) for +: 'NoneType' and 'str'
- fix issue reported #1809
- filter out `None` value from issn and isbn list
- add comments (from publicationName)
- add publisher

Closes: https://github.com/searxng/searxng/issues/1809
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-25 15:25:55 +02:00
Markus Heiser
e36b023508 [mod] core.ac.uk: add cetgory 'scientific publications'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 16:16:22 +02:00
Alexandre Flament
16443d4f4a [mod] core.ac.uk: try multiple ways to get url
If the url is not found, using:
* the DOI
* the downloadUrl
* the ARK id
2022-09-24 15:02:39 +02:00
Markus Heiser
c76830d8a8 [mod] core.ac.uk: use paper.html template
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 13:19:33 +02:00
Markus Heiser
3ff2ad939d [fix] ERROR searx.engines.core.ac.uk: list index out of range
Some result items from core.ac.uk do not have an URL::

  Traceback (most recent call last):
  File "searx/search/processors/online.py", line 154, in search
    search_results = self._search_basic(query, params)
  File "searx/search/processors/online.py", line 142, in _search_basic
    return self.engine.response(response)
  File "SearXNG/searx/engines/core.py", line 73, in response
    'url': source['urls'][0].replace('http://', 'https://', 1),

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 13:19:33 +02:00
Alexandre Flament
d6446be38f [mod] science category: various update of about PR 1705 2022-09-23 20:52:55 +02:00
Alexandre FLAMENT
e36f85b836 Science category: update the engines
* use the paper.html template
* fetch more data from the engines
* add crossref.py
2022-09-23 20:45:58 +02:00
Alexandre Flament
bef3984d03
Merge pull request #1728 from liimee/eng-ddw
add duckduckgo weather engine
2022-09-23 18:14:09 +02:00
Alexandre Flament
d3fec1388c
Merge pull request #1624 from liimee/eng-wttr
Add wttr.in engine
2022-09-23 18:13:37 +02:00
Alexandre Flament
1a7b6872b5
Merge pull request #1792 from unixfox/google-images-internal-api
use the internal API for google images
2022-09-21 19:50:38 +02:00
Markus Heiser
cf7ee67f71 [mod] google-images: slightly improvements of the engine
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-21 18:59:55 +02:00
Emilien Devos
df5f8d0e8e use the internal API for google images 2022-09-20 22:52:38 +02:00
Markus Heiser
dcf1d408a5 [fix] google-news: origin result does not have a content area
The google news are in a rework, the content area of a news item has been
removed.

Closes: https://github.com/searxng/searxng/issues/1790
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-20 20:18:43 +02:00
Markus Heiser
fbf07237ff [fix] and improve docs generated from source code.
Fix::

    searx/locales.py:docstring of searx.locales.get_engine_locale:17: \
      WARNING: Definition list ends without a blank line; unexpected unindent.

Improvement: don't show default values in the generated documentation whe it is
more a mess than a usefull information (`:meta hide-value:`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-18 12:44:12 +02:00
Alexandre FLAMENT
dd0887be18 xpath engine: change raise_for_httperror to no_result_for_http_status
no_result_for_http_status contains a list of HTTP status.
These HTTP status are seen an empty result list.
In other cases an exception is thrown as usual.

Previously raise_for_httperror were ignoring all HTTP error,
which make defective engines invisible in the stats.
2022-09-04 09:07:28 +02:00