Commit graph

1738 commits

Author SHA1 Message Date
Markus Heiser
460bbe5b81 [mod] implement brave (WEB) engine to replace XPath configuration
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-08 16:21:45 +02:00
Bnyro
d151497db3 [feat] engine: brave - support for news 2023-08-08 16:21:45 +02:00
Bnyro
cae06f2781 [feat] engine: brave - support for videos 2023-08-08 16:21:45 +02:00
Bnyro
73364e158e [feat] engine: brave - support for images 2023-08-08 16:21:45 +02:00
Markus Heiser
1d0abb7157 [doc] engine bt4g: add documentation to docs/dev/engines/online/
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-06 09:30:48 +02:00
Emilien Devos
0fc8f99ecc [feat] new engine: bt4g added & enabled, btdigg disabled by default
Disable btdigg because most SearXNG instances are blocked by btdigg
(Cloudflare answers with "too many requests").

This implementation does not parse the HTML page because there is an API in
XML (RSS).  The RSS feed provides less data; it lacks, for example, the number
of seeders/leechers and the files in the torrent file.  It's a tradeoff for a
"stable" engine, as the XML from the RSS feed will change far less often than
the HTML page.
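
A minimal sketch of reading such an RSS feed with Python's standard library,
assuming a plain RSS 2.0 layout.  The sample feed, the helper name
``parse_rss`` and the result keys are invented for illustration; they are not
taken from the bt4g engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the fields of the real feed may differ.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>search results</title>
    <item>
      <title>example torrent</title>
      <link>magnet:?xt=urn:btih:0000</link>
      <pubDate>Sun, 06 Aug 2023 09:30:48 +0200</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Turn every <item> of an RSS 2.0 document into a small result dict."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iterfind("./channel/item"):
        results.append(
            {
                "title": item.findtext("title", default=""),
                "url": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
            }
        )
    return results


results = parse_rss(SAMPLE_RSS)
```

The element names of RSS are fixed by the format, which is the stability
argument made above: a redesign of the HTML page would not affect this code.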

Closes: https://github.com/searxng/searxng/issues/2553
2023-08-06 09:30:48 +02:00
Markus Heiser
db522cf76d [mod] engine: wikimedia - improve results, add additional settings & doc
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-04 19:06:50 +02:00
Bnyro
7d8c20c80d [feat] new engine: wikispecies 2023-08-04 19:06:50 +02:00
Markus Heiser
1b030d4b41 [doc] engine: Yacy
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-03 19:58:51 +02:00
zutto
ca518c6803 add option to change yacy search mode 2023-08-03 19:58:51 +02:00
Markus Heiser
203f1f0928 [fix] engine piped: 'invalid content'
SearXNG does not allow a None value in the content field of a result item.

If the key (shortDescription, uploaderName) in the JSON response from piped
exists but is set to None, SearXNG ignores this result item::

  DEBUG   searx    : result: invalid content: { ..,  'content': None,  ..}
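
A minimal sketch of the kind of guard this implies, assuming the result item
is built from a plain dict.  The helper ``safe_content`` is hypothetical and
is not the code of the piped engine.  Note that ``item.get(key, "")`` alone is
not enough: the default is only used when the key is missing, not when it
exists with the value ``None``.

```python
def safe_content(item, *keys):
    """Return the first non-empty value found under ``keys``, else ''."""
    for key in keys:
        value = item.get(key)
        if value:  # skips missing keys, None and empty strings
            return str(value)
    return ""


# The keys exist but are set to None, as described above.
piped_item = {"title": "demo", "shortDescription": None, "uploaderName": None}

result = {
    "title": piped_item["title"],
    "content": safe_content(piped_item, "shortDescription", "uploaderName"),
}
```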

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-03 16:23:36 +02:00
Markus Heiser
207fcc0c8c [mod] engine piped: add paging support
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-03 16:23:36 +02:00
Markus Heiser
ef5831cd84 [mod] engine piped: split into two dedicated engines for video & music
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-03 16:23:36 +02:00
Markus Heiser
7aa95d2d52 [doc] engine piped: add documentation to docs/dev/engines/online/
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-03 16:23:36 +02:00
Bnyro
636bfdac68 [feat] engine: implementation of Piped 2023-08-03 16:23:36 +02:00
Paolo Basso
cada89ee36 [feat] engine: re-enables z-library (zlibrary-global.se)
- re-enables z-library as the new domain zlibrary-global.se is now available
  from the open web.   The announcement of the domain:

    https://www.reddit.com/r/zlibrary/comments/13whe08/mod_note_zlibraryglobalse_domain_is_officially/

  It is an official domain; logging in to the "personal" subdomain is required
  only to download files, and the search works without it.

- changes the result template of zlibrary to paper.html, filling the appropriate fields
- implements language filtering for zlibrary
- implements zlibrary custom filters (engine traits)
- refactors and documents the zlibrary engine
2023-07-07 21:36:51 +02:00
Markus Heiser
5720844fcd [doc] rearranges Settings & Engines docs for better readability
We have built up detailed documentation of the *settings* and the *engines* over
the past few years.  However, this documentation was still spread over various
chapters and was difficult to navigate in its entirety.

This patch rearranges the Settings & Engines documentation for better
readability.

To review the reordered docs::

   make docs.clean docs.live

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-07-01 22:45:19 +02:00
Markus Heiser
87e7926ae9 [fix] engine: Anna's Archive - grep results from '.js-scroll-hidden' elements
The rendering of the WEB page is very strange; except for the first position,
all other positions of Anna's result page are enclosed in SGML comments.  These
comments are *uncommented* by some JS code, see the query of the class
'.js-scroll-hidden' in Anna's HTML template [1].

[1] https://annas-software.org/AnnaArchivist/annas-archive/-/blob/main/allthethings/templates/macros/md5_list.html

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 09:32:57 +02:00
Markus Heiser
e2df6b77a3 [mod] engine: Anna's Archive - additional settings (content, sort, ext)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 09:32:57 +02:00
Markus Heiser
eafc2906f1 [mod] engine: Anna's Archive - fetch search arguments from search form
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-29 09:32:57 +02:00
Paolo Basso
7adb9090e5 [mod] engine: Anna's Archive - add language support 2023-06-29 09:32:57 +02:00
Paolo Basso
e5637fe7b9 [feat] engine: implementation of Anna's Archive
Anna's Archive [1] is a free non-profit online shadow library metasearch engine
providing access to a variety of book resources (also via IPFS), created by a
team of anonymous archivists [2].

[1] https://annas-archive.org/
[2] https://annas-software.org/AnnaArchivist/annas-archive
2023-06-29 09:32:57 +02:00
Paolo Basso
401561cb58 [mod] engine torznab - refactor & option to hide links
- torznab engine using types and clearer code
- torznab option to hide torrent and magnet links.
- document the torznab engine
- add myself to authors

Closes: https://github.com/searxng/searxng/issues/1124
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-28 10:03:44 +02:00
Markus Heiser
da7c30291d [fix] Google API changed
It seems that Google is rolling out a modified WEB API [1][2].

In the past there was only the UI language in the `hl` argument but nowadays it
seems a combination of the UI language and the "search region" is mixed in this
argument and the `gl` argument has been removed.  I'm very surprised that google
is starting to mix the parameters of the UI with the parameters of the search
index.

This patch modifies the get_google_info(..) function.  Beside Google-WEB this
function is also used by other Google services, here are some examples to test
region & language of ..

- Google-WEB:    `!go dragon boat :en-CA`
- Google-News:   `!gon dragon boat :en-CA`
- Google-Videos: `!gov bmw :en-CA`
- Google-Images: `!goi bmw :en-CA`

- [1] https://github.com/searxng/searxng/issues/2515#issuecomment-1606294635
- [2] https://github.com/searxng/searxng/issues/2515#issuecomment-1607150817

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-26 18:28:09 +02:00
Markus Heiser
e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that error messages from the type checker
are no longer reported.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
pankaj
4900c091a6 use logger.warning
logger.warn() is deprecated.
logger.warning is already being used in some files.
2023-05-19 19:35:29 +05:30
Markus Heiser
caebd297e9 [fix] engine ddg: minor change in the API of ddg
Closes: https://github.com/searxng/searxng/issues/2419
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-12 18:58:49 +02:00
Markus Heiser
9b575a997b [fix] doc of locales.get_engine_locale() / zh-classical is misleading
Wikipedia's zh-classical is not zh_Hant (see doc-string of engines.wikipedia).
Fixed the example in the doc-string of locales.get_engine_locale() to 'zh_TW'.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-17 08:48:57 +02:00
Markus Heiser
f1b6351ae1 [fix] engine: google play movies
Closes: https://github.com/searxng/searxng/pull/1746
Closes: https://github.com/searxng/searxng/issues/1599

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-16 19:15:44 +02:00
Markus Heiser
27369ebec2 [fix] searxng_extra/update/update_engine_descriptions.py (part 1)
Follow up of #2269

The script to update the descriptions of the engines no longer works since
PR #2269 has been merged.

searx/engines/wikipedia.py
==========================

1. There was a misuse of zh-classical.wikipedia.org:

   - `zh-classical` is dedicated to classical Chinese [1] which is not
     traditional Chinese [2].

   - zh.wikipedia.org has LanguageConverter enabled [3] and is going to
     dynamically show simplified or traditional Chinese according to the
     HTTP Accept-Language header.

2. The update_engine_descriptions.py needs a list of all wikipedias.  The
   implementation from #2269 included only a reduced list:

   - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
   - https://meta.wikimedia.org/wiki/List_of_Wikipedias

searxng_extra/update/update_engine_descriptions.py
==================================================

Before PR #2269 there was a match_language() function that did an approximation
using various methods.  With PR #2269 there are only the types in the data model
of the languages, which can be recognized by babel.  The approximation methods,
which are needed (only here) in the determination of the descriptions, must be
replaced by other methods.

[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter

Closes: https://github.com/searxng/searxng/issues/2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-15 16:03:59 +02:00
Markus Heiser
23ac964e35 [fix] Bing-WEB: use <span class='algoSlug_icon'> for the description
On some result items from Bing-WEB the `<span class='algoSlug_icon'>` tag is the
only tag that contains a description.  The issue can be reproduced by [1]::

    !bi vmware

[1] https://github.com/searxng/searxng/issues/1764#issuecomment-1417990531

Reported-by: @AlyoshaVasilieva
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-08 09:43:04 +02:00
Markus Heiser
2ffd446e5c [mod] clarify the difference of the default category and subgrouping
This PR makes no functional change; it is just an attempt to make clearer in
the code what a default category is and what a subcategory is.  The previous
name 'others' led to confusion with the **category 'other'**.

If an engine is not assigned to a category, the default is assigned::

    DEFAULT_CATEGORY = 'other'

If an engine has only one category and this category is shown as tab in the user
interface, this engine has no further subgrouping::

    NO_SUBGROUPING = 'without further subgrouping'
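
The two rules above can be sketched as follows.  This is an illustration under
assumptions: the helpers ``categories_of`` and ``subgroup_of``, the shape of
the engine settings and the fallback used for engines with several categories
are all invented here and are not SearXNG's implementation.

```python
DEFAULT_CATEGORY = 'other'
NO_SUBGROUPING = 'without further subgrouping'


def categories_of(engine_settings):
    """Return the engine's categories, falling back to the default category."""
    categories = engine_settings.get('categories') or []
    return list(categories) or [DEFAULT_CATEGORY]


def subgroup_of(engine_settings, tab_categories):
    """Return the subgroup of an engine, given the categories shown as tabs."""
    categories = categories_of(engine_settings)
    if len(categories) == 1 and categories[0] in tab_categories:
        return NO_SUBGROUPING
    # Hypothetical rule for this sketch: the first category that is not a tab.
    for category in categories:
        if category not in tab_categories:
            return category
    return NO_SUBGROUPING


tabs = {'general', 'images'}
```

With these definitions, ``categories_of({})`` yields ``['other']`` and an
engine whose only category is the tab ``general`` has no further subgrouping.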

Related:

- https://github.com/searxng/searxng/issues/1604
- https://github.com/searxng/searxng/pull/1545

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-07 11:03:25 +02:00
Markus Heiser
5234e45010 [fix] Gigablast.com has been erased
[1] https://www.reddit.com/r/searchengines/comments/128wdcp/gigablastcom_has_been_erased/

Closes: https://github.com/searxng/searxng/issues/2321
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-06 08:22:57 +02:00
Markus Heiser
a762172bf7 [fix] engine ddg: quote !bangs in a request send to ddg
Closes: https://github.com/searxng/searxng/issues/392
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-03 09:52:16 +02:00
Markus Heiser
0430662189 [fix] engine google-News: fix decoding of URLs (part 2)
Follow up of 8de8070ed to fix the issue reported by AlyoshaVasilieva [1].

[1] https://github.com/searxng/searxng/issues/1959#issuecomment-1493300574

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-02 19:19:59 +02:00
Markus Heiser
8de8070ed9 [fix] engine google-News: fix decoding of URLs
Google-News returns internal links where the origin URL is encoded in a
URL-safe base64 string (RFC 4648).
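
A minimal sketch of decoding a URL-safe base64 token with Python's standard
library, assuming the token may have lost its ``=`` padding.  The helper name
and the sample URL are invented; the layout of Google's actual payload is not
described in this commit and is not modelled here.

```python
import base64


def decode_urlsafe(token):
    """Decode URL-safe base64 ('-' and '_' alphabet), restoring lost padding."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


# Round trip with a made-up URL.
original = b"https://example.org/article?id=1"
token = base64.urlsafe_b64encode(original).decode("ascii").rstrip("=")
decoded = decode_urlsafe(token)
```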

Closes: https://github.com/searxng/searxng/issues/1959
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-01 19:33:13 +02:00
Markus Heiser
509afbbb84 [fix] engine seznam: fix issues reported by black & pylint
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-31 17:25:39 +02:00
Venca24
c8d78355ff [fix] engine seznam 2023-03-31 16:11:27 +02:00
Markus Heiser
270ad18897 [fix] engine flickr: adapt to the new data model from flicker's response
Closes: https://github.com/searxng/searxng/issues/1879
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-30 21:04:53 +02:00
Markus Heiser
2b8dfab33f [fix] engine gigablast: add &userid=<User ID>&code=<Feed Code>
Gigablast's API blocks unauthorized requests [1].

[1] https://gigablast.com/searchfeed.html

Closes: https://github.com/searxng/searxng/issues/1454
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-29 16:18:02 +02:00
Markus Heiser
6f9e678346 [fix] engine: google has changed the layout of its response
Since 28 March Google has changed its response; this patch fixes the google
engine to scrape the results & images from the newly designed response.

closes: https://github.com/searxng/searxng/issues/2287

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-28 14:39:16 +02:00
Markus Heiser
4d4aa13e1f [mod] remove obsolete EngineTraits.supported_languages
All engines have been migrated from ``supported_languages`` to the
``fetch_traits`` concept.  There is no longer a need for the obsolete code that
implements the ``supported_languages`` concept.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
96a2eec3b5 [mod] Archlinux Wiki: improved request API & upgrade to data_type: traits_v1
re-implementation of the Archlinux Wiki:

- fetch_traits(): fetch languages, wiki URLs and title arguments
- add content field to the result list
- add documentation

Wikis from wiki.archlinux.fr, wiki.archlinux.ro, archtr.org/wiki no longer
exist (they have been merged into the main wiki).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
057e9bc1d1 [mod] SepiaSearch: re-engineered & upgrade to data_type: traits_v1
- fetch_traits() SepiaSearch and Peertube are using identical languages.
  Replace module's dictionary `supported_languages` by `engine.traits.languages`
  (data_type: `traits_v1`).
- fixed code to pass pylint
- request(): add argument boostLanguages
- response(): is replaced by peertube's video_response() function, which adds
  metadata from channel name, host & tags

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
8a8c584fec [mod] Dailymotion: improved request API & upgrade to data_type: traits_v1
- fetch_traits(): fetch locales (and languages) from dailymotion API
- removed obsolete data-type "supported_languages"
- add documentation
- improved argument list of the HTTP request:
  - add argument: family_filter_map
  - add conditional argument: localization
    Don't add localization and country arguments if the user does select a
    language (:de, :en, ..)
- improve code quality (mainly improve readability)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
2499899554 [mod] Google: reverse engineered & upgrade to data_type: traits_v1
Partial reverse engineering of the Google engines including an improved language
and region handling based on the engine.traits_v1 data.

Whenever possible the implementations of the Google engines try to make use of
the async REST APIs.  The get_lang_info() has been generalized to a
get_google_info() function; in particular, the region handling has been
improved by adding the cr parameter.

searx/data/engine_traits.json
  Add data type "traits_v1" generated by the fetch_traits() functions from:

  - Google (WEB),
  - Google images,
  - Google news,
  - Google scholar and
  - Google videos

  and remove data from obsolete data type "supported_languages".

  A traits.custom type that maps region codes to *supported_domains* is fetched
  from https://www.google.com/supported_domains

searx/autocomplete.py:
  Reverse-engineered autocomplete from Google WEB.  Supports Google's languages and
  subdomains.  The old API suggestqueries.google.com/complete has been replaced
  by the async REST API: https://{subdomain}/complete/search?{args}

searx/engines/google.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
  - always use the async REST API (formerly known as 'use_mobile_ui')
  - use *supported_domains* from traits
  - improved the result list by fetching './/div[@data-content-feature]'
    and parsing the type of the various *content features* --> thumbnails are
    added

searx/engines/google_images.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - if exists, freshness_date is added to the result
  - issue 1864: result list has been improved a lot (due to the new cr parameter)

searx/engines/google_news.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
    *supported_domains* is not needed but a ceid list has been added.
  - different region handling compared to Google WEB
  - fixed for various languages & regions (due to the new ceid parameter) /
    avoid CONSENT page
  - Google News no longer supports time range
  - result list has been fixed: XPath of pub_date and pub_origin

searx/engines/google_videos.py
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - add paging support
  - implement an async request ('asearch': 'arc' & 'async':
    'use_ac:true,_fmt:html')
  - simplified code (thanks to '_fmt:html' request)
  - issue 1359: fixed xpath of video length data

searx/engines/google_scholar.py
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - request(): include patents & citations
  - response(): fixed CAPTCHA detection (Scholar has its own CAPTCHA manager)
  - hardened XPath to iterate over results
  - fixed XPath of pub_type (has been changed from gs_ct1 to gs_cgt2 class)
  - issue 1769 fixed: new request implementation is no longer incompatible

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
c80e82a855 [mod] DuckDuckGo: reverse engineered & upgrade to data_type: traits_v1
Partial reverse engineering of the DuckDuckGo (DDG) engines including an
improved language and region handling based on the engine.traits_v1 data.

- DDG Lite
- DDG Instant Answer API
- DDG Images
- DDG Weather

docs/src/searx.engine.duckduckgo.rst:
  Online documentation of the DDG engines (make docs.live)

searx/data/engine_traits.json
  Add data type "traits_v1" generated by the fetch_traits() functions from:

  - "duckduckgo" (WEB),
  - "duckduckgo images" and
  - "duckduckgo weather"

  and remove data from obsolete data type "supported_languages".

searx/autocomplete.py:
  Reverse-engineered autocomplete from DDG.  Supports DDG's languages.

searx/engines/duckduckgo.py:
  - fetch_traits():  Fetch languages & regions from DDG.

  - get_ddg_lang(): Get DDG's language identifier from SearXNG's locale.  DDG
    defines its languages by region codes.  DDG-Lite does not offer a language
    selection to the user, only a region can be selected by the user.

  - Cache ``vqd`` value: The vqd value depends on the query string and is needed
    for the follow-up pages or the images loaded by an XMLHttpRequest (DDG
    images).  The ``vqd`` value of a search term is stored for 10min in the
    redis DB.

  - DDG Lite engine: reverse-engineered request method with improved language
    and region support and better ``vqd`` handling.

searx/engines/duckduckgo_definitions.py: DDG Instant Answer API
  The *instant answers* API does not support languages, or at least we could not
  find out how language support should work.  It seems that most of the features
  are based on English terms.

searx/engines/duckduckgo_images.py: DDG Images
  Reverse-engineered request method.  Improved language and region handling
  based on cookies and the engine.traits_v1 data.  Response: add image format to
  the result list

searx/engines/duckduckgo_weather.py: DDG Weather
  Improved language and region handling based on cookies and the
  engine.traits_v1 data.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
e9afc4f8ce [mod] Startpage: reverse engineered & upgrade to data_type: traits_v1
One reason for the frequent CAPTCHAs on Startpage requests is the incomplete
requests SearXNG sends to startpage.com: this patch is a completely new
implementation of the ``request()`` function, reverse engineered from
Startpage's search form.  The new implementation:

- use traits of data_type: traits_v1 and drop deprecated data_type: supported_languages
- adds time-range support
- adds save-search support
- fix searxng/searxng/issues 1884
- fix searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the
variable ``startpage_categ`` was set up.  The default value is ``web`` and other
categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
858aa3e604 [mod] wikipedia & wikidata: upgrade to data_type: traits_v1
BTW this fixes an issue in wikipedia: SearXNG's locales zh-TW and zh-HK are now
using language `zh-classical` from wikipedia (and not `zh`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
e0a6ca96cc [doc] add a description of bing engines (web, news, video, images)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
15eaf0f15f [mod] bing_news: use async API & upgrade to data_type: traits_v1
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
ff80e7637e [mod] bing_images: use async API & upgrade to data_type: traits_v1
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
bc21d28298 [mod] bing_videos: use async API & upgrade to data_type: traits_v1
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
d0f465e6fa [mod] bing: add time_range support & upgrade to data_type: traits_v1
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
7daf4f95ef [mod] Wikipedia: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Wikipedia engines.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
f78f908383 [mod] Google: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Google engines.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
dba8977b09 [mod] DuckDuckGo: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the DuckDuckGo engines.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
ef143729a0 [mod] yahoo: fetch engine traits (data_type: traits_v1)
Implements a fetch_traits function for the Yahoo engine.

.. note::

   Includes migration of the request method from 'supported_languages' to
   'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
c1ae2ef57c [mod] qwant: fetch engine traits (data_type: traits_v1)
Implements a fetch_traits function for the Qwant engines.

.. note::

   Includes migration of the request method from 'supported_languages' to
   'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
fc0c775030 [mod] Dailymotion: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Dailymotion engine.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
61383edb27 [mod] Startpage: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Startpage engine.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
d3aa690a7a [mod] bing: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Bing engines.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
a7fe22770a [mod] Peertube: re-engineered & upgrade to data_type: traits_v1
- fetch_traits(): Fetch languages from peertube's search-index source code.

  [mod] Include migration of the request method from 'supported_languages'
        to 'traits' (EngineTraits) object.
  [fix] old supported_languages_url is no longer valid since the sources
        has been moved to a different path.

- fixed code to pass pylint
- request(): complete re-implementation based on the API docs [1]
- response(): complete re-implementation, adds several fields missed before
- add source code documentation

[1] https://docs.joinpeertube.org/api-rest-reference.html#tag/Search/operation/searchVideos

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
6e5f22e558 [mod] replace engines_languages.json by engines_traits.json
Implementations of the *traits* of the engines.

Engine's traits are fetched from the origin engine and stored in a JSON file in
the *data folder*.  Most often traits are languages and region codes and their
mapping from SearXNG's representation to the representation in the origin search
engine.
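
To illustrate the idea of such a mapping, here is a deliberately simplified
sketch.  ``DemoTraits`` is a hypothetical stand-in and not the real
``EngineTraits`` class; the engine-side values (``lang_de``, ``countryCA``)
and the fallback from ``de-AT`` to ``de`` are assumptions made for this
example.

```python
from dataclasses import dataclass, field


@dataclass
class DemoTraits:
    """Hypothetical, simplified traits: SearXNG tag -> engine representation."""

    languages: dict = field(default_factory=dict)
    regions: dict = field(default_factory=dict)

    def get_language(self, locale, default=None):
        # Try the full tag first, then only its language part ('de-AT' -> 'de').
        language = locale.split("-")[0]
        return self.languages.get(locale, self.languages.get(language, default))

    def get_region(self, locale, default=None):
        return self.regions.get(locale, default)


traits = DemoTraits(
    languages={"de": "lang_de", "en": "lang_en"},  # invented engine values
    regions={"en-CA": "countryCA"},                # invented engine values
)
```

An engine's request function would then look up the user's locale in such an
object instead of consulting a hard-coded language list.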

To load traits from the persistence::

    searx.enginelib.traits.EngineTraitsMap.from_data()

For new traits new properties can be added to the class::

    searx.enginelib.traits.EngineTraits

.. hint::

   Implementation is downward compatible to the deprecated *supported_languages
   method* from the vintage implementation.

   The vintage code is tagged as *deprecated* and can be removed when all engines
   have been ported to the *traits method*.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Solirs
ac169a0f75 Pass black formatting test 2023-03-21 00:41:36 +01:00
Solirs
e26bce33d4 WIKIDATA: Add description for results 2023-03-21 00:14:54 +01:00
Alexandre Flament
3e9cddc606
rollback test 2023-03-15 19:55:20 +01:00
Alexandre Flament
41ed0ef0c7
test 2023-03-15 19:53:53 +01:00
Markus Heiser
4c06837a50 [mod] make python code pylint 2.16.1 compliant
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-10 13:59:21 +01:00
Markus Heiser
257dc7d6c4 [fix-2146] set different HTTP Referer header to DuckDuckGo requests
For whatever reason, ddg-lite doesn't like the Referer

  https://lite.duckduckgo.com/

In an interactive session in the WEB browser the Referer has exactly this
value, but ddg-lite doesn't like this value when the request is built by
SearXNG.  The new value is:

  https://google.com/

This fakes a user coming from a Google link.
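
A minimal sketch of setting the header on an outgoing request, using Python's
standard library rather than SearXNG's own network layer.  The helper
``build_request`` is hypothetical; the request object is only constructed
here, nothing is sent.

```python
import urllib.request


def build_request(url, referer):
    """Create a request object that carries the given Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


req = build_request("https://lite.duckduckgo.com/", "https://google.com/")
referer = req.get_header("Referer")
```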

Related: https://github.com/searxng/searxng/pull/2081
Closes: https://github.com/searxng/searxng/issues/2146

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-03 08:45:51 +01:00
Alexandre Flament
9d102fb08f
Merge pull request #2132 from dalf/update_pr_1967
search.suspended_time settings: bug fixes
2023-01-29 20:48:43 +01:00
Alexandre Flament
bfca63c536 wikipedia engine: update _fetch_supported_languages
the layout https://meta.wikimedia.org/wiki/List_of_Wikipedias has changed
2023-01-29 10:01:58 +00:00
Alexandre Flament
8256de2fe8 peertube engine: update _fetch_supported_languages
There is now an API to get the list of supported languages
https://docs.joinpeertube.org/api-rest-reference.html#tag/Video/operation/getLanguages
2023-01-29 10:01:54 +00:00
Alexandre Flament
37addec69e search.suspended_time settings: bug fixes
* fix typo in settings.yml: replace suspend_times by suspended_times
* always use delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: a CAPTCHA suspends the engine for one day instead of one week
2023-01-28 10:24:14 +00:00
Ahmad Alkadri
7fc8d72889 [fix] bing: parsing result; check to see if the element contains links
This patch hardens the parsing of the bing response:

1. To fix [2087] check if the selected result item contains a link, otherwise
   skip result item and continue in the result loop.  Increment the result
   pointer when a result has been added / the enumerate that counts for skipped
   items is no longer valid when result items are skipped.

   To test the bugfix use:   ``!bi :all cerbot``

2. Limit the XPath selection of result items to direct children nodes (list
   items ``li``) of the ordered list (``ol``).

   To test the selector use: ``!bi :en pontiac aztek wiki``

   .. in the result list you should find the wikipedia entry on top,
   compare [2068]
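
Both points can be sketched with the standard library.  This is an
illustration only: the markup is an invented, well-formed sample (a real
response needs an HTML parser), and ``parse_results`` is a hypothetical helper
rather than the engine's code.

```python
import xml.etree.ElementTree as ET

# Invented sample: three list items, one without a link, one with a nested list.
SAMPLE = """<ol>
  <li><h2><a href="https://example.org/a">A</a></h2>
    <ul><li>nested item, not a result</li></ul>
  </li>
  <li><p>no link in this item</p></li>
  <li><h2><a href="https://example.org/b">B</a></h2></li>
</ol>"""


def parse_results(markup):
    root = ET.fromstring(markup)
    results = []
    position = 0  # own counter: enumerate() would also count skipped items
    for item in root.findall("./li"):  # direct children of <ol> only
        link = item.find(".//a")
        if link is None or not link.get("href"):
            continue  # result item without a link: skip it
        position += 1
        results.append(
            {
                "position": position,
                "url": link.get("href"),
                "title": "".join(link.itertext()),
            }
        )
    return results


results = parse_results(SAMPLE)
all_list_items = ET.fromstring(SAMPLE).findall(".//li")
```

Selecting with ``.//li`` instead of ``./li`` would also match the nested list
item, which is the kind of over-selection the second point avoids.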

[2087] https://github.com/searxng/searxng/issues/2087
[2068] https://github.com/searxng/searxng/issues/2068
2023-01-09 15:08:24 +01:00
ahmad-alkadri
9ee99423fe [fix] Bing-Web engine: XPath to get the wikipedia result
Modify the XPath selector to get the wikipedia result plus small fixes.

About result content: especially with the Wikipedia result, we'd get several
paragraph elements, only the first paragraph would be taken and displayed on the
search result
2023-01-08 09:11:16 +01:00
Rudis Muiznieks
128b8c7f0a
Add HTTP Referer header to DuckDuckGo requests
closes #2080
2023-01-06 16:07:37 -06:00
Rudis Muiznieks
6804ff048d
Fix: add trailing slash to duckduckgo url
Close #1854
2022-12-22 07:49:58 -06:00
Alexandre Flament
269326063a Fix: don't crash when engine or name is missing in settings.yml
SearXNG crashes if the engine or name fields are missing.
With this commit, the app displays an error in the log and keeps loading.

Close #1951
2022-12-04 23:43:59 +01:00
Émilien Devos
46ad32343a Switch back to protobuf for raw HTML 2022-11-11 07:39:48 +00:00
ngosang
78be4b4c70 Fix Google search engine.
- Fix broken links. Resolves #1794
- Fix missing results. Resolves #1829
2022-11-11 07:34:19 +01:00
Alexandre Flament
8f19bdaf17
Merge pull request #1882 from fehho/metacpan
Add MetaCPAN engine
2022-11-07 21:54:11 +01:00
fehho
fe351c2802 Add MetaCPAN engine 2022-11-07 08:07:06 -06:00
Vasilis Gerakaris
947b62c9d5
Fix floating point format in DDG weather humidity
Fixes #1836
2022-10-20 11:44:17 +03:00
Alexandre FLAMENT
035bc507ec [fix] startpage engine 2022-10-14 18:27:53 +00:00
Alexandre Flament
a3148e5115
Merge pull request #1814 from return42/fix-typos
[fix] typos / reported by @kianmeng in searx PR-3366
2022-09-28 09:22:02 +02:00
Alexandre Flament
0e00af9c26
Merge pull request #1810 from return42/fix-1809
[fix] springer: unsupported operand type(s) for +: 'NoneType' and 'str'
2022-09-28 09:20:03 +02:00
Markus Heiser
ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Markus Heiser
52023e3d6e [fix] doc of the paper.html template (isbn, issn)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-25 15:46:29 +02:00
Markus Heiser
0052887929 [fix] springer: unsupported operand type(s) for +: 'NoneType' and 'str'
- fix issue reported #1809
- filter out `None` value from issn and isbn list
- add comments (from publicationName)
- add publisher

Closes: https://github.com/searxng/searxng/issues/1809
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-25 15:25:55 +02:00
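Sketch for 0052887929: filtering `None` out of the issn and isbn lists before building strings avoids the `'NoneType' and 'str'` error. The helper name `clean_ids` and the sample values are made up for illustration.

```python
def clean_ids(values):
    """Drop ``None`` entries so the remaining strings can be joined safely."""
    return [value for value in values if value is not None]


issn = clean_ids(['1234-5678', None])
isbn = clean_ids([None])

# string concatenation is now safe because no None value is left in the list
comments = ('ISSN: ' + ', '.join(issn)) if issn else ''
```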
Markus Heiser
e36b023508 [mod] core.ac.uk: add category 'scientific publications'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 16:16:22 +02:00
Alexandre Flament
16443d4f4a [mod] core.ac.uk: try multiple ways to get url
If the url is not found, using:
* the DOI
* the downloadUrl
* the ARK id
2022-09-24 15:02:39 +02:00
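Sketch for 16443d4f4a: a possible fallback chain for the result URL. Only `urls` and `downloadUrl` appear in the commit messages; the keys `doi` and `arkId` and the resolver prefixes used here are assumptions for this illustration.

```python
def pick_url(source):
    """Return the best available URL for a result item, or None."""
    urls = source.get('urls') or []
    if urls:
        # same normalisation as in the traceback of 3ff2ad939d
        return urls[0].replace('http://', 'https://', 1)
    if source.get('doi'):  # assumed key name
        return 'https://doi.org/' + source['doi']
    if source.get('downloadUrl'):
        return source['downloadUrl']
    if source.get('arkId'):  # assumed key name
        return 'https://n2t.net/' + source['arkId']
    return None
```

Checking for an empty `urls` list first also avoids the "list index out of range" error from 3ff2ad939d.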
Markus Heiser
c76830d8a8 [mod] core.ac.uk: use paper.html template
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 13:19:33 +02:00
Markus Heiser
3ff2ad939d [fix] ERROR searx.engines.core.ac.uk: list index out of range
Some result items from core.ac.uk do not have an URL::

  Traceback (most recent call last):
  File "searx/search/processors/online.py", line 154, in search
    search_results = self._search_basic(query, params)
  File "searx/search/processors/online.py", line 142, in _search_basic
    return self.engine.response(response)
  File "SearXNG/searx/engines/core.py", line 73, in response
    'url': source['urls'][0].replace('http://', 'https://', 1),

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 13:19:33 +02:00
Alexandre Flament
d6446be38f [mod] science category: various updates related to PR 1705 2022-09-23 20:52:55 +02:00
Alexandre FLAMENT
e36f85b836 Science category: update the engines
* use the paper.html template
* fetch more data from the engines
* add crossref.py
2022-09-23 20:45:58 +02:00
Alexandre Flament
bef3984d03
Merge pull request #1728 from liimee/eng-ddw
add duckduckgo weather engine
2022-09-23 18:14:09 +02:00
Alexandre Flament
d3fec1388c
Merge pull request #1624 from liimee/eng-wttr
Add wttr.in engine
2022-09-23 18:13:37 +02:00
Alexandre Flament
1a7b6872b5
Merge pull request #1792 from unixfox/google-images-internal-api
use the internal API for google images
2022-09-21 19:50:38 +02:00
Markus Heiser
cf7ee67f71 [mod] google-images: slight improvements of the engine
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-21 18:59:55 +02:00
Emilien Devos
df5f8d0e8e use the internal API for google images 2022-09-20 22:52:38 +02:00
Markus Heiser
dcf1d408a5 [fix] google-news: origin result does not have a content area
Google News is being reworked; the content area of a news item has been
removed.

Closes: https://github.com/searxng/searxng/issues/1790
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-20 20:18:43 +02:00
Markus Heiser
fbf07237ff [fix] and improve docs generated from source code.
Fix::

    searx/locales.py:docstring of searx.locales.get_engine_locale:17: \
      WARNING: Definition list ends without a blank line; unexpected unindent.

Improvement: don't show default values in the generated documentation when they
are more of a mess than useful information (`:meta hide-value:`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-18 12:44:12 +02:00
Alexandre FLAMENT
dd0887be18 xpath engine: change raise_for_httperror to no_result_for_http_status
no_result_for_http_status contains a list of HTTP status codes.
These HTTP status codes are treated as an empty result list.
In all other cases an exception is thrown as usual.

Previously raise_for_httperror ignored all HTTP errors,
which made defective engines invisible in the stats.
2022-09-04 09:07:28 +02:00
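Sketch for dd0887be18: how a list like `no_result_for_http_status` can separate "no results" from real errors. `EngineHTTPError` and `check_status` are hypothetical names; the real xpath engine may structure this differently.

```python
no_result_for_http_status = [404]  # example value, e.g. a dictionary site


class EngineHTTPError(Exception):
    """Hypothetical error type for unexpected HTTP status codes."""


def check_status(status_code):
    """Return True if the response should be parsed, False for 'no results'.

    Status codes outside the list still raise, so a defective engine stays
    visible in the stats.
    """
    if status_code in no_result_for_http_status:
        return False
    if status_code >= 400:
        raise EngineHTTPError(f'HTTP status {status_code}')
    return True
```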
Markus Heiser
a15dfa5ee1 [fix] engine woxikon.de - don't raise exception on empty result list
Woxikon expects a word in German, so with query "foo" the site finds nothing and
responds with a 404:

    httpx.HTTPStatusError: Client error '404 Not Found' \
      for url 'https://synonyme.woxikon.de/synonyme/foo.php'

[1] https://github.com/searxng/searxng/issues/1543#issuecomment-1193317054

Closes: https://github.com/searxng/searxng/issues/1543
Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-04 09:07:28 +02:00
Markus Heiser
8e9fb0b435
Merge pull request #1647 from return42/deepl-engine
[mod] add deepl translation engine
2022-09-02 14:09:22 +02:00
ta
85b5293e40 simplify infobox result 2022-08-31 18:29:50 +07:00
ta
12f7d4a46b add duckduckgo weather engine 2022-08-31 17:29:32 +07:00
Alexandre Flament
56000d5162
Merge pull request #1699 from liimee/eng-app-store
add apple app store engine
2022-08-27 07:43:23 +02:00
Alexandre Flament
44bc94c36e
Merge pull request #1700 from liimee/eng-ddm
add apple maps engine
2022-08-27 07:36:16 +02:00
ta
5057007270 remove thumbnail from results 2022-08-27 06:23:30 +07:00
ta
525946d7dd add poi's website and phone number, doesn't crash when there is no displayMapRegion, query the token on the first request 2022-08-27 06:17:58 +07:00
ta
5dce299b22 add apple maps engine 2022-08-25 17:05:40 +07:00
ta
cef7bbab22 get the not cropped version of the thumbnail when the image height is not too important 2022-08-24 18:33:11 +07:00
ta
78bff4618c add safesearch support 2022-08-24 18:31:04 +07:00
ta
bcae7ae4e3 add developer info as author 2022-08-24 17:50:38 +07:00
ta
e5c1b64b1d add the apple app store engine
The Apple App Store is the digital app distribution platform for iOS & iPadOS.
2022-08-24 17:27:36 +07:00
ta
040e24f9ad support playing videos directly 2022-08-24 16:48:31 +07:00
ta
79d06509c1 add tags as suggestions 2022-08-23 05:18:35 +07:00
ta
d22f469010 use invalid-name instead of C0103 for pylint 2022-08-22 18:27:35 +07:00
ta
dd9127492f add 9gag engine
9GAG is a social media website where users upload and share user-generated images and videos
2022-08-22 17:35:07 +07:00
ta
e64cca8c3f don't raise error when nothing was found 2022-08-22 17:04:29 +07:00
M Asenov
faa32d5773 fixed xpath selector for appropriate results 2022-08-21 20:08:00 +01:00
Alexandre Flament
5ed40af3ba
Merge pull request #1661 from liimee/eng-tw
Add twitter engine
2022-08-21 15:21:18 +02:00
Markus Heiser
77a0f33819 [fix] engine duden - don't raise exception on empty result list
Duden expects a word in German, so with query "amazing" the site finds nothing
and responds with a 404:

    httpx.HTTPStatusError: Client error '404 Not Found' for url\
      'https://www.duden.de/suchen/dudenonline/amazing'

[1] https://github.com/searxng/searxng/issues/1543#issuecomment-1193317054

Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-20 08:41:03 +02:00
ta
05851978cf add explanation of token 2022-08-17 19:45:42 +07:00
ta
c8acd4a3b6 add profile image to user results 2022-08-17 14:30:59 +07:00
ta
b6fd7cd571 add thumbnail to results if available 2022-08-17 14:25:22 +07:00
Markus Heiser
27385e7898 [mod] qwant - add safesearch option
Closes: https://github.com/searxng/searxng/issues/1640
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:36:14 +02:00
Markus Heiser
6579d6d558 [fix] qwant - API error::locale must be one ..
The request function should not request a language (aka locale) that is not
supported by qwant. Selecting a locale like zh-TW ends in qwant's API error:

  ERROR searx.engines.qwant news: exception : \
  API error::locale must be one of the following values: \
    en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
    fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
    es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
    nl_be, nl_nl

The existing searx.utils.match_language function is unsuitable for this purpose;
it is replaced by the function searx.locales.get_engine_locale, which is based on
the methods from the babel package.

qwant's _fetch_supported_languages function has been revised to filter out
languages (aka locales) not supported by qwant.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:36:14 +02:00
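Sketch for 6579d6d558: a deliberately simplified locale fallback that never asks for a locale outside the supported set. It is not the babel-based `searx.locales.get_engine_locale`; `pick_locale`, the subset of locales and the default are assumptions for illustration.

```python
# subset of the locales listed in the API error message above
SUPPORTED = {'en_gb', 'en_us', 'de_de', 'de_ch', 'fr_fr', 'fr_ca', 'es_es', 'it_it', 'nl_nl'}


def pick_locale(requested, supported=SUPPORTED, default='en_us'):
    """Map a locale tag such as 'de-CH', 'fr' or 'zh-TW' to a supported locale."""
    tag = requested.lower().replace('-', '_')
    if tag in supported:
        return tag  # exact match
    language = tag.split('_')[0]
    if f'{language}_{language}' in supported:
        return f'{language}_{language}'  # e.g. 'fr' -> 'fr_fr'
    candidates = sorted(loc for loc in supported if loc.split('_')[0] == language)
    if candidates:
        return candidates[0]  # any territory of the same language
    return default  # language not supported at all
```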
Markus Heiser
75bb8c45d0 [mod] decouple qwant's categories from SearXNG's categories
By using new property `qwant_categ:` the category of qwant is no longer bound to
the category of SearXNG.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:26:54 +02:00
ta
96ea355a1f add twitter engine 2022-08-14 08:39:41 +07:00
Markus Heiser
eb02cc77c5 [fix] google - simplify XPath selectors to fetch more results
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-10 18:55:31 +02:00
Émilien Devos
b9f16a77db output format protobuf to HTML for google mobile 2022-08-10 09:36:06 +00:00
Thomas Renard
d4acbcfe63 [mod] add deepl translation engine
This implements the DeepL translation engine. It works much like lingva but
talks directly to the DeepL API.  The API only needs a to-lang; from-lang is a
placeholder for now.

There is a free option to use [1].

[1] https://www.deepl.com/pro-api?cta=header-pro-api for registering a free account.
2022-08-10 09:14:36 +02:00
Brock Vojković
24210fb10b
Revert PR #1633
This reverts the changes made to the Google results XPath in PR #1633.
2022-08-10 03:41:39 +02:00
Léon Tiekötter
94b3656b4a [fix] google engine: results XPath
It seems Google rolls out changes first on the `google.com` domain and later on
the "language" domains.  For example: yesterday [1] `google.com` did not work but
`google.de` and `google.fr` did work; today they no longer work and this
fix is needed on all domains.

Closes: https://github.com/searxng/searxng/issues/1628
[1] https://github.com/searxng/searxng/issues/1628#issuecomment-1208191816
2022-08-09 06:23:59 +02:00
liimee
8c318562e2
add description and wikidata ID to wttr.in engine 2022-08-07 14:57:10 +07:00
ta
8aa018db95 add wttr.in engine 2022-08-07 13:04:18 +07:00
Markus Heiser
8df1f0c47e [mod] add 'Accept-Language' HTTP header to online processors
Most engines that support languages (and regions) use the Accept-Language header
from the web browser to build a response that fits the language (and region).

- add new engine option: send_accept_language_header

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-01 17:01:59 +02:00
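Sketch for 8df1f0c47e: building an `Accept-Language` value from a locale tag when the option is enabled. Only the option name `send_accept_language_header` comes from the commit; the helper, the q-values and the default locale are assumptions.

```python
def accept_language_header(locale_tag):
    """Build an Accept-Language value, e.g. 'de-CH,de;q=0.9,*;q=0.5'."""
    language = locale_tag.split('-')[0]
    if language == locale_tag:
        # no region given: language first, then anything else
        return f'{language},*;q=0.5'
    return f'{locale_tag},{language};q=0.9,*;q=0.5'


def request(params, send_accept_language_header=True, locale_tag='de-CH'):
    """Add the header to the outgoing request only when the option is set."""
    if send_accept_language_header:
        params['headers']['Accept-Language'] = accept_language_header(locale_tag)
    return params
```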
Alexandre Flament
2babf59adc [fix] pyright reported errors
The errors make pyright usage useless since a new error won't be seen [1].

[1] https://github.com/searxng/searxng/pull/1569

```
  searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]"
    "Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]"
    Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
  searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str"
    Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
  searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int"
    Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
  searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join"
    Type "str" cannot be assigned to type "BytesPath"
      "str" is incompatible with "bytes"
      "str" is incompatible with protocol "PathLike[bytes]"
        "__fspath__" is not present (reportGeneralTypeIssues)
  searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
    Type "Literal['themes']" cannot be assigned to type "BytesPath"
      "Literal['themes']" is incompatible with "bytes"
      "Literal['themes']" is incompatible with protocol "PathLike[bytes]"
        "__fspath__" is not present (reportGeneralTypeIssues)
  searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
    Type "str | Any | None" cannot be assigned to type "BytesPath"
      Type "str" cannot be assigned to type "BytesPath"
        "str" is incompatible with "bytes"
        "str" is incompatible with protocol "PathLike[bytes]"
          "__fspath__" is not present (reportGeneralTypeIssues)
  searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
    Type "Literal['img']" cannot be assigned to type "BytesPath"
      "Literal['img']" is incompatible with "bytes"
      "Literal['img']" is incompatible with protocol "PathLike[bytes]"
        "__fspath__" is not present (reportGeneralTypeIssues)
  searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
  searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
  searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
  searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
  searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
  searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown
    Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
  searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
  searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
```
2022-07-30 18:04:44 +02:00
Markus Heiser
c72d70d45c Revert "Quick fix for google engine for EU countries"
This reverts commit 747cf1a246.
2022-07-26 06:39:44 +02:00
Léon Tiekötter
950f036c03
[fix] google engine: results XPath 2022-07-26 00:24:15 +02:00
Émilien Devos
747cf1a246
Quick fix for google engine for EU countries
This reverts part of commit 5fb2071cb2
2022-07-25 20:48:50 +00:00
Markus Heiser
0be0e63117 [fix] demo_online.py - fixed typo
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-25 20:04:00 +02:00
Emilien Devos
5fb2071cb2 [fix] google & youtube - set EU consent cookie
This changes the previous bypass method for Google consent using
``ucbcb=1`` (6face215b8) to accept the consent using ``CONSENT=YES+``.

The youtube_noapi and google engines have a similar API, at least for the consent [1].

Get the CONSENT cookie from a google request::

    curl -i "https://www.google.com/search?q=time&tbm=isch" \
         -A "Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Firefox/102.0" \
         | grep -i consent
    ...
    location: https://consent.google.com/m?continue=https://www.google.com/search?q%3Dtime%26tbm%3Disch&gl=DE&m=0&pc=irp&uxe=eomtm&hl=en-US&src=1
    set-cookie: CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure
    ...

PENDING & YES [2]:

  Google change the way for consent about YouTube cookies agreement in EU
  countries. Instead of showing a popup in the website, YouTube redirects the
  user to a new webpage at consent.youtube.com domain ...  Fix for this is to
  put a cookie CONSENT with YES+ value for every YouTube request

[1] https://github.com/iv-org/invidious/pull/2207
[2] https://github.com/TeamNewPipe/NewPipeExtractor/issues/592

Closes: https://github.com/searxng/searxng/issues/1432
2022-07-25 13:27:06 +02:00
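Sketch for 5fb2071cb2: setting the consent cookie on every outgoing request. The cookie name and value (`CONSENT=YES+`) come from the commit message; the shape of `params` and the URL building are simplified assumptions.

```python
from urllib.parse import quote_plus


def request(query, params):
    """Prepare a search request that already carries the consent cookie."""
    params['url'] = 'https://www.google.com/search?q=' + quote_plus(query)
    # accept the EU consent up front instead of following the redirect
    # to consent.google.com
    params['cookies']['CONSENT'] = 'YES+'
    return params
```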
Markus Heiser
4231a5770b [fix] sjp engine - convert engine name to a latin1 compliant name
The engine name is not only a *name*, it is also an identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: https://github.com/searxng/searxng/issues/1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-24 21:10:55 +02:00
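Sketch for 4231a5770b: checking whether a name survives latin-1 encoding, and one possible way to derive an ASCII-only identifier. Both helper names are hypothetical, and the commit itself may simply have renamed the engine by hand.

```python
import unicodedata


def is_latin1(name):
    """Return True if ``name`` can be encoded as latin-1."""
    try:
        name.encode('latin-1')
        return True
    except UnicodeEncodeError:
        return False


def to_ascii_name(name):
    """Strip accents via NFKD decomposition and drop whatever is not ASCII.

    Characters without a decomposition are dropped rather than
    transliterated, so the result should be reviewed by a human.
    """
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')
```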
james-still
2516e21c58 [fix] emojipedia - update XPath to be relative 2022-07-24 19:14:26 +02:00
Markus Heiser
1540891561 [fix] engine tineye: handle 422 response of not supported img format
Closes: https://github.com/searxng/searxng/issues/1449
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-23 16:00:58 +02:00
Markus Heiser
4e05197444
Merge pull request #1475 from return42/Emojipedia
[mod] Add engine for Emojipedia
2022-07-15 09:30:40 +02:00
Jay
10edcbe3c2 [mod] Add engine for Emojipedia
Emojipedia is an emoji reference website which documents the meaning and
common usage of emoji characters in the Unicode Standard.  It is owned by Zedge
since 2021. Emojipedia is a voting member of The Unicode Consortium.[1]

Cherry picked from @james-still [2][3] and slightly modified to fit SearXNG's
quality gates.

[1] https://en.wikipedia.org/wiki/Emojipedia
[2] 2fc01eb20f
[3] https://github.com/searx/searx/pull/3278
2022-07-15 09:26:44 +02:00
Alexandre Flament
44f2eb50a5
Merge pull request #1219 from dalf/follow_bing_redirect
bing.py: remove redirection links
2022-07-10 18:06:22 +02:00
Emilien Devos
6face215b8 bypass google consent with ucbcb=1 2022-07-09 21:33:24 +00:00
Alexandre Flament
a1e8af0796 bing.py: resolve bing.com/ck/a redirections
add a new function searx.network.multi_requests to send multiple HTTP requests at once
2022-07-08 22:02:21 +02:00
Markus Heiser
970a69012b [fix] engine z-library https URL
before this patch:

    DEBUG   searx.engines.z-library : using base_url: https:https://de1lib.org

with this patch URL is fixed to:

    DEBUG   searx.engines.z-library : using base_url: https://de1lib.org

Closes: https://github.com/searxng/searxng/issues/1435
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-05 22:27:55 +02:00
ta
14756a2674 [mod] Adds Lingva translate engine
Add the lingva engine (which grabs data from google translate).  Results from
Lingva are added to the infobox results.
2022-07-04 19:06:45 +02:00
Markus Heiser
5831c15b49 [fix] engines/openstreetmap.py typo: user_langage --> user_language
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-02 16:51:25 +02:00
Alexandre Flament
6716c6b0c3 openstreetmap engine: return the localized name.
For example: display "Tokyo" instead of "東京都" when the language is English.
2022-07-02 16:51:25 +02:00
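Sketch for 6716c6b0c3: preferring the localized OSM tag (`name:<lang>`) over the plain `name` tag. The function name and the flat `tags` dict are assumptions for illustration.

```python
def localized_name(tags, user_language):
    """Return the name in the user's language, falling back to 'name'."""
    return tags.get('name:' + user_language) or tags.get('name')
```

For example, `localized_name({'name': '東京都', 'name:en': 'Tokyo'}, 'en')` returns `'Tokyo'`, while a language without its own tag falls back to `'東京都'`.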
ta
8883aed132 [fix] google play apps engine: implement engines/google_play_apps.py 2022-06-18 16:02:39 +02:00
Alexandre Flament
5bcbec9b06 Fix: use sys.modules.copy() to avoid RuntimeError
use sys.modules.copy() to avoid "RuntimeError: dictionary changed size during iteration"
see https://github.com/python/cpython/issues/89516
and https://docs.python.org/3.10/library/sys.html#sys.modules

close https://github.com/searxng/searxng/issues/1342
2022-06-18 07:39:46 +02:00
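Sketch for 5bcbec9b06: iterating over a snapshot of `sys.modules` so that imports triggered during the loop cannot raise "dictionary changed size during iteration". The function `loaded_module_names` is a made-up example, not the code touched by the commit.

```python
import sys


def loaded_module_names(prefix):
    """Return the sorted names of loaded modules that start with ``prefix``."""
    names = []
    # iterate over a shallow copy: another thread or a lazy import may add
    # entries to sys.modules while the loop is running
    for name in sys.modules.copy():
        if name.startswith(prefix):
            names.append(name)
    return sorted(names)
```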
Alexandre Flament
2455f1d06a
Merge pull request #1308 from allendema/add-yep-com-json
[enh] Add yep.com via json_engine
2022-06-12 11:09:04 +02:00
Allen
fd9a13a3e5 [enh] Initial no paging support for Yep.com
Upstream example query:
https://yep.com/web?q=test

https://yep.com/about
2022-06-11 14:17:44 +02:00
Alexandre Flament
cd2dd5dd55 Wikidata engine: ignore dummy entities
Close #641
2022-06-11 11:09:21 +02:00
Alexandre Flament
d068b67a71 Wikidata engine: minor change of the SPARQL request
The engine can be slow especially when the query won't return any answer.
See https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_articles_in_Wikipedia_speaking_about_cheese_and_see_which_Wikibase_items_they_correspond_to

Related to #1290
2022-06-11 10:50:11 +02:00
Markus Heiser
2de007138c [fix] prepare for pylint 2.14.0
Remove issues reported by Pylint 2.14.0:

- no-self-use: has been moved to optional extension [1]
- The refactoring checker now also raises 'consider-using-generator' messages
  for max(), min() and sum(). [2]

.pylintrc:
  - <option name>-hint was removed long ago; Pylint 2.14.0 raises an
    error on invalid options
  - bad-continuation and bad-whitespace have been removed [3]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
[2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
[3] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-06-03 15:41:52 +02:00
Allen
43dc9eb7d6 [enh] Initial Petalsearch Images support
Upstream example query:

  https://petalsearch.com/search?query=test&channel=image&ps=50&pn=1&region=de-de&ss_mode=off&ss_type=normal

Depending on locale it will internally use some/all results from other
engines. See:

  https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/#general-indexing-search-engines
2022-06-02 14:32:37 +02:00
Émilien Devos
06cb15cbf7
Reflect the real world parameter from settings.yml 2022-05-10 20:44:35 +00:00
Markus Heiser
4326009d00 [format.python] based on bugfix in 9ed626130 2022-05-07 18:23:10 +02:00
capric98
8c7e6cc983 [fix] FutureWarning from lxml
If content is None, the original code skips extract_text() and just appends the
None value to 'content'. So just add allow_none=True, which returns None
without raising a ValueError in extract_text().
2022-04-22 16:09:36 +02:00
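Sketch for 8c7e6cc983: the effect of an `allow_none` flag, shown on a simplified stand-in. `extract_text_sketch` only handles strings and `None`; it is not the real `searx.utils.extract_text`, which works on XPath results.

```python
def extract_text_sketch(value, allow_none=False):
    """Normalise whitespace in ``value``; optionally pass ``None`` through."""
    if value is None:
        if allow_none:
            return None
        raise ValueError('extract_text_sketch(None, allow_none=False)')
    return ' '.join(str(value).split())


# the caller no longer needs its own "is None" branch
content = extract_text_sketch(None, allow_none=True)
```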
Alexandre Flament
bbf13a4657
Merge pull request #1101 from allendema/pass-cookies-from-settings
[enh] Allow passing headers/cookies from settings.yml
2022-04-17 11:37:07 +02:00
Allen
dae8a08089
[fix] Update only cookies/headers 2022-04-17 11:29:23 +02:00
Allen
67fb6fba84
[lint] Remove whitespace
From GH GUI
2022-04-17 10:42:25 +02:00
Allen
15862ebc35
[mod] Pass desired ebay domain in settings
https://www.ebay.de
https://www.ebay.com
https://www.ebay.es

etc
2022-04-16 19:10:35 +02:00
Allen
155333f625
[enh] Allow passing headers/cookies from settings.yml
Example:

   - engine: xpath
   - search_url: example.org
   - headers: {'example_header': 'example_header'}
   - cookies: {'safesearch': 'off'}
2022-04-16 17:42:04 +02:00
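Sketch for 155333f625: merging the configured headers and cookies into the request parameters. The two dicts mirror the example values above; the module-level names and the `request` signature are assumptions for illustration.

```python
# values as they would arrive from settings.yml (taken from the example above)
engine_headers = {'example_header': 'example_header'}
engine_cookies = {'safesearch': 'off'}


def request(params):
    """Merge configured headers and cookies into the outgoing request."""
    # update() keeps entries that are already present (for example a
    # User-Agent) instead of replacing the whole dict
    params['headers'].update(engine_headers)
    params['cookies'].update(engine_cookies)
    return params
```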
Alexandre Flament
c474616642
Merge pull request #1071 from return42/fix-lang-dailymotion
[fix] dailymotion engine: filter by language & country
2022-04-16 11:54:49 +02:00
Alexandre Flament
1a82e79b50 dailymotion: send valid value for the language parameter 2022-04-16 09:27:34 +02:00
Markus Heiser
3bb62823ec [fix] dailymotion engine: filter by language & country
- fix the issue of fetching more than 7000 *languages*
- improve the request function and filter by language & country
- implement time_range_support & safesearch
- add more fields to the response from dailymotion (allow_embed, length)
- better clean up of HTML tags in the 'content' field.

This is more or less a complete rework based on the '/videos' API from [1].
This patch cleans up the language list in SearXNG that has been polluted by the
ISO-639-3 2 and 3 letter codes from dailymotion languages which have never been
used.

[1] https://developers.dailymotion.com/tools/

Closes: https://github.com/searxng/searxng/issues/1065
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-16 09:27:34 +02:00
Jabster28
9eb1b04f48
change "Wolfram|Alpha" to "Wolfram Alpha" in search results 2022-04-12 10:37:33 +01:00
Alexandre Flament
592cea0e5e
Merge pull request #1030 from austinhuang0131/master
(feat) add jisho.org
2022-04-09 18:57:20 +02:00
Alexandre Flament
74c7aee9ec jisho : code refactoring 2022-04-09 18:01:57 +02:00
Austin Huang
19fa0095a0
(fix) satisfy the linter, and btw reduce timeout 2022-04-01 09:23:24 -04:00
Austin Huang
a399248f56
update jisho.py according to suggestions 2022-04-01 09:18:19 -04:00
Alexandre FLAMENT
f00cdb5e51 bing engine: _fetch_supported_languages: don't use the language code as a country
ref #1029
2022-03-31 20:03:34 +00:00
Austin Huang
934ae4e086
(feat) add jisho.org
Closes #1016
2022-03-31 14:45:39 -04:00
Alexandre Flament
378b29be2f fix startpage: update XPath in _fetch_supported_languages 2022-03-19 14:16:37 +01:00
Markus Heiser
53b5a804e2 [fix] engine mediathekviewweb: replace http links by https
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-07 19:49:16 +01:00
Markus Heiser
20f4538e13 [fix] engine: Semantic Scholar (Science) // rework & fix
Closes: https://github.com/searxng/searxng/issues/939
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-05 11:53:41 +01:00
Markus Heiser
8d937179ab
Merge pull request #913 from return42/add-artwork
[mod] add artwork to mixcloud & soundcloud engines
2022-02-21 22:24:40 +01:00
Markus Heiser
b08b81b434 [mod] bandcamp & genius: in result set img_src instead thumbnail
Suggested-by: @dalf https://github.com/searxng/searxng/pull/900#issuecomment-1046009057
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-21 22:12:07 +01:00
Markus Heiser
bded1ee280 [fix] genius: add player and avoid exceptional programming
Add player:

- The players just play 30 seconds of the title.  Some of the players will be
  blocked because of a cross-origin request and some players will link to apple
  when you press the play button.

Avoid exceptions (and BTW improve results)

-  ERROR   searx.engines.genius          : list index out of range

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-21 22:12:07 +01:00
Markus Heiser
36aee70c24
Merge pull request #910 from tiekoetter/fix-909
[fix] google images engine: Fix 'scrap_img_by_id' function
2022-02-20 18:29:50 +01:00
Markus Heiser
2921d3cd17 [mod] add artwork to mixcloud & soundcloud engines
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-19 21:59:12 +01:00
Markus Heiser
4a28b593c2 [fix] google images engine: Fix 'scrap_img_by_id' function
The 'scrap_img_by_id' function no longer returned anything useful.  This
fix allows the google images engine to present the full source image instead of
only the thumbnail.

The function scrap_img_by_id() is replaced by a full rewrite that parses image
URLs with a regular expression. The new function parse_urls_img_from_js(dom)
returns a mapping of data-id to image URL.

Closes: https://github.com/searxng/searxng/issues/909
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-19 14:33:56 +01:00
Alexandre Flament
ace5401632
Merge pull request #900 from return42/fix-883
[fix] bandcamp: fix itemtype (album|track) and exceptions
2022-02-19 13:42:53 +01:00
Markus Heiser
943a7fdcb5 [mod] mediathekviewweb engine: add iframe_src and use videos template
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-19 00:50:54 +01:00
Markus Heiser
05c105b837 [fix] bandcamp: fix itemtype (album|track) and exceptions
BTW: polish implementation and show tracklist for albums

Closes: https://github.com/searxng/searxng/issues/883
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-18 22:44:43 +01:00
Markus Heiser
7352c6bc79 [mod] templates: rename field for <iframe> URL to iframe_src
Rename result field data_src to iframe_src

Suggested-by: @dalf https://github.com/searxng/searxng/pull/882#issuecomment-1037997402
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-18 19:00:49 +01:00
Markus Heiser
98cab4cf75 [mod] result_templates/default.html replace embedded HTML by data_src audio_src
Embedded HTML breaks SearXNG architecture.  To modularize, HTML is generated in
the templates (oscar & simple) and result parameter 'embedded' is replaced by
'data_src' (and 'audio_src'), an URL for embedded content (<iframe>).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-13 14:20:47 +01:00
Markus Heiser
46e131fdad [mod] result_templates/videos.html: replace embedded HTML by data_src
Embedded HTML breaks SearXNG architecture.  To modularize, HTML is generated in
the templates (oscar & simple) and result parameter 'embedded' is replaced by
'data_src', an URL for embedded content (<iframe>).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-13 14:20:47 +01:00
Émilien Devos
7d3e8118b0
Update the XPath for fetching the Google results 2022-02-09 14:34:14 +01:00
Markus Heiser
906a0a99cd [fix] openstreetmap: load thumbnail from uploads.wikimedia.org
Openstreetmap images are now loaded from uploads.wikimedia.org instead of
commons.wikimedia.org to prevent redirects.

With `image_proxy` enabled, images from commons.wikimedia.org can't be loaded
since they are redirected.  We already discussed this issue [875] and
@tiekoetter fixed this issue in PR [878].

Related-to:
- [875] https://github.com/searxng/searxng/issues/875
- [878] https://github.com/searxng/searxng/pull/878
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-07 13:05:52 +01:00
Markus Heiser
a967e59590 [pylint] searx/engines/wikidata.py (no functional change)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-07 10:15:32 +01:00
Léon Tiekötter
1c151ae92b
[fix] wikidata: URL decoding and file extension handling
Add '.png' to the second img_src_name if it has the extension '.svg'.
Use urllib.parse.unquote for URL decoding.
2022-02-07 00:21:02 +01:00
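Sketch for 1c151ae92b: decoding the file name with `urllib.parse.unquote` and appending `.png` when the source is an SVG. The function name `thumbnail_name` is hypothetical, and the real engine builds a full thumbnail URL around this.

```python
from urllib.parse import unquote


def thumbnail_name(img_src_name):
    """Return the decoded file name, with '.png' appended for SVG files."""
    name = unquote(img_src_name)
    if name.lower().endswith('.svg'):
        # thumbnails of SVG files are served as rendered PNG images
        name += '.png'
    return name
```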
Markus Heiser
a13c5d70c7 [fix] wikidata engine: select image with higher (not lower) priority
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-06 23:35:55 +01:00
Léon Tiekötter
a50f32bcfc
wikidata: load thumbnail instead of full image 2022-02-06 23:25:50 +01:00
Léon Tiekötter
560a14e77b
[fix] wikidata info box images
Wikidata info box images are now loaded from uploads.wikimedia.org instead of commons.wikimedia.org to prevent redirects

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-06 22:16:06 +01:00
Markus Heiser
b35ef9789b [pylint] engines/invidious.py
Fix remarks from pylint and remove useless comments

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-04 15:42:06 +01:00
Markus Heiser
e2ec6b4211 [fix] invidious engine: store random base_url in param
Two different threads ( = two different user queries) can call the request
function in a row and then the response function.  The namespace will be the same
since this is the same engine.

To keep exactly the same value, ``base_url`` must be stored in params and then
retrieved using ``resp.search_params["base_url"]``.

Suggested-by: @dalf https://github.com/searxng/searxng/pull/862#discussion_r799324861
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-04 15:42:06 +01:00
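Sketch for e2ec6b4211: keeping the randomly chosen `base_url` in the request params so that the response handler sees the value that belongs to its own request. The instance URLs, the `FakeResponse` class and the URL paths are placeholders; only `resp.search_params["base_url"]` is taken from the commit message.

```python
import random
from urllib.parse import quote

# hypothetical instance list
base_urls = ['https://invidious.example.org', 'https://invidious.example.net']


class FakeResponse:
    """Stand-in for the response object; it only carries the request params."""

    def __init__(self, search_params):
        self.search_params = search_params


def request(query, params):
    # store the choice per request, not in a module-level variable that a
    # second thread could overwrite before response() runs
    params['base_url'] = random.choice(base_urls)
    params['url'] = params['base_url'] + '/api/v1/search?q=' + quote(query)
    return params


def response(resp):
    base_url = resp.search_params['base_url']
    return {'url': base_url + '/watch?v=VIDEO_ID'}
```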
Markus Heiser
ddc2102a07 [fix] solidtorrents engine: store random base_url in param
Two different threads ( = two different user queries) can call the request
function in a row and then the response function.  The namespace will be the same
since this is the same engine.

To keep exactly the same value, ``base_url`` must be stored in params and then
retrieved using ``resp.search_params["base_url"]``.

Suggested-by: @dalf https://github.com/searxng/searxng/pull/862#discussion_r799324861
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-04 14:55:21 +01:00
Markus Heiser
d6061b7c8a [mod] solidtorrents engine: add metadata & torrentfile
BTW: define min_len in eval_xpath_list of 'stats' list

Suggested-by: @dalf https://github.com/searxng/searxng/pull/862#pullrequestreview-872910744
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-04 14:53:42 +01:00
Markus Heiser
f9c4868142 [fix] solidtorrents engine: use get_torrent_size from searx.utils
Suggested-by: @dalf https://github.com/searxng/searxng/pull/862#pullrequestreview-872858489
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-04 14:53:42 +01:00
Markus Heiser
d92b3d96fd [fix] solidtorrents engine: JSON API no longer exists
The API endpoint we were using does not exist anymore.  This patch is a
rewrite that parses the HTML page.

Related: https://github.com/paulgoio/searxng/issues/17
Closes: https://github.com/searxng/searxng/issues/858

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-04 14:53:37 +01:00
Markus Heiser
50a56532c4 [pylint] engines/currency_convert.py
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-02-01 08:02:42 +01:00
Markus Heiser
15320b5eec [fix] engines description - currency_convert.py
Currency engine has DuckDuckGo metadata

In the engine selector of the preferences window, the currency search engine has
the same metadata and wikidata url as duckduckgo, I'd assume there should be a
difference of some sort there clarifying what source the currency uses or, if
it's a duckduckgo service, at least clarifying that it's a currency service by
duck duck go.

Closes: https://github.com/searxng/searxng/issues/787
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-31 23:17:28 +01:00
Markus Heiser
60e7fee47a
Merge pull request #475 from return42/tineye
[enh] engine - add Tineye reverse image search
2022-01-31 08:51:35 +01:00
Alexandre Flament
ebd3013a1a [mod] tineye engine: minor changes
* remove "disable: false" in settings.yml
* use the json() method from httpx.Response (faster character encoding detection)
2022-01-30 20:49:22 +01:00
Léon Tiekötter
a6673a1a94 [fix] 1x engine
1x changed the XML result layout.
2022-01-30 19:48:40 +01:00
Markus Heiser
a6b879f19c [mod] tineye engine: set engine_type to 'online_url_search'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-30 16:30:52 +01:00
Alexandre Flament
116802852d [fix] ina engine
based on a45408e8e2
2022-01-28 22:33:41 +01:00
Markus Heiser
b7f74fbe42 [mod] tineye - add some documentation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-28 09:06:44 +01:00
Allen
880555e263 [enh] engine - add Tineye reverse image search
Other optional parameter ..

`&sort=crawl_date`
    can be appended to search_string to sort results by date.

`&domain=example.org`
    can be appended to search_string to get results from just one domain.

Public instances may get timed out fairly quickly, for 3600s.

--

Merged from @allendema's commit [1] and slightly modified / see [2].

Related-to: [1] 455b2b4460
Related-to: [2] https://github.com/searx/searx/pull/3040
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-28 09:06:44 +01:00
Léon Tiekötter
0cbf73a1f4
Allow 'using_tor_proxy' to be set for each engine individually
Check 'using_tor_proxy' for each engine individually instead of checking globally

[fix] searx.network: update _rdns test to the last httpx version

Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-01-27 22:37:02 +01:00
Markus Heiser
1a0760c10a [fix] google engine - "some results are invalids: invalid content"
Fix google issues listed in `/stats?engine=google` with the message::

    some results are invalids: invalid content

The log is::

    DEBUG   searx                         : result: invalid content: {'url': 'https://de.wikipedia.org/wiki/Foo', 'title': 'Foo - Wikipedia', 'content': None, 'engine': 'google'}
    WARNING searx.engines.google          : ErrorContext('searx/search/processors/abstract.py', 111, 'result_container.extend(self.engine_name, search_results)', None, 'some results are invalids: invalid content', ()) True
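
One way to avoid results like the one in the log is to never emit ``None`` in
the content field. The sketch below is an assumption about a possible guard,
not necessarily what this patch does; ``build_result`` is a hypothetical
helper.

```python
def build_result(url, title, content):
    # A result whose 'content' is None is reported as "invalid content"
    # (see the log above), so fall back to an empty string.
    return {"url": url, "title": title, "content": content or ""}


result = build_result("https://de.wikipedia.org/wiki/Foo", "Foo - Wikipedia", None)
```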

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-18 13:23:35 +01:00
Markus Heiser
f0102a95c9 [fix] google engine: remove ads and fix mobile_ui selector
1. Fix issue reported in comment [1]
2. Fix XPath selector for the response of google's mobile UI, reported in
   comment [2]

[1] https://github.com/searxng/searxng/pull/777#issuecomment-1015121322
[2] https://github.com/searxng/searxng/pull/777#issuecomment-1015236238

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-18 11:05:45 +01:00
Émilien Devos
6670063e0d
Update XPath for Google engine 2022-01-17 21:49:57 +00:00
Alexandre Flament
e07417848f
Merge pull request #695 from return42/fix-sp
[fix] startpage engine / modified API
2022-01-16 20:27:36 +01:00
Alexandre Flament
f9271d595f [fix] startpage: workaround to use the startpage network
workaround for the issue #762
2022-01-15 22:56:34 +01:00
Markus Heiser
bf593af423 [mod] engine mysql_server: make port configurable
Cherry picked from https://github.com/searx/searx/commit/82ac634070

Suggested-by: https://github.com/searx/searx/issues/3117
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-11 23:47:40 +01:00
Markus Heiser
df238e944c [mod] startpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
21e884f369 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7
days.

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
2f4e567e90 [fix] Get an actual sc argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
1cbcddb3f7 [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
f1f5e69c42 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments have been removed and a new `sc` argument has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:12 +01:00
Martin Fischer
576e19dad1 [fix] add default for "about" engine property
Fixes #732.
2022-01-10 08:40:06 +01:00
Markus Heiser
4fc5e5299c [fix] ccengine engine - avoid unwanted redirects
api.openverse.engineering is a little picky and wants to have a trailing slash
in the path:

    /v1/images? --> /v1/images/?

otherwise it redirects, here is the debug log:

    DEBUG   searx.network.openverse       : HTTP Request: GET https://api.openverse.engineering/v1/images?&page=1&page_size=20&format=json&q=foo "HTTP/2 301 Moved Permanently" (text/html; charset=utf-8)
    DEBUG   searx.network.openverse       : HTTP Request: GET https://api.openverse.engineering/v1/images/?&page=1&page_size=20&format=json&q=foo "HTTP/2 200 OK" (application/json)
    WARNING searx.engines.openverse       : ErrorContext('searx/search/processors/online.py', 105, 'count_error(', None, '1 redirects, maximum: 0', ('200', 'OK', 'api.openverse.engineering')) True
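
A small sketch of building the request URL with the trailing slash in place,
so that no redirect is triggered. The argument names are taken from the debug
log above; ``search_url`` is a hypothetical helper, not the engine's code.

```python
from urllib.parse import urlencode

# Note the trailing slash after "images".
base_url = "https://api.openverse.engineering/v1/images/"


def search_url(query, page=1, page_size=20):
    args = {"page": page, "page_size": page_size, "format": "json", "q": query}
    return base_url + "?" + urlencode(args)


url = search_url("foo")
```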

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-07 14:14:31 +01:00
Léon Tiekötter
37baf46ece [fix] Rename ccengine engine to openverse
The CC engine was merged with WordPress and renamed to Openverse

Source: https://wordpress.org/news/2021/05/welcome-to-openverse/
2022-01-07 13:06:05 +01:00
Léon Tiekötter
4be6deb0a1 [fix] ccengine engine
Change domain to api.openverse.engineering
2022-01-07 13:01:37 +01:00
Markus Heiser
ced656606f
Merge pull request #709 from return42/drop-etools
[fix] drop etools engine module
2022-01-07 11:18:47 +01:00
Markus Heiser
5dd3442f83 [fix] drop etools engine module
The implementation of the etools engine is poor.  No date-range support, no
language support and it is broken by a CAPTCHA.

etools is a metasearch engine, the major search engines it supports (google,
bing, wikipedia, Yahoo) are already available in SearXNG.

While etools does support several engines we currently don't support directly,
support for them should be added directly to SearXNG if there is demand.

In practice: in SearXNG the worse etools results will be mixed with good results
from other engines we have (as long as there is no captcha).

In the best case, what we gain with etools is e.g. results from de.ask.com for a
query from a German request; in all other cases worse results bubble up in
SearXNG's result list.

[1] https://github.com/searxng/searxng/issues/696#issuecomment-1005855499

Closes: https://github.com/searxng/searxng/issues/696
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-07 10:41:09 +01:00
Martin Fischer
e12525a1fa
Merge pull request #708 from not-my-profile/pref-refactor
Refactor `preferences`
2022-01-07 09:45:23 +01:00
Léon Tiekötter
3ab826de22 Drop microsoft academic engine
Microsoft academic was discontinued on 2021-12-31.

Source: https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-to-expand-horizons-with-community-driven-approach/
2022-01-07 01:35:13 +01:00
Martin Fischer
bb06758a7b [refactor] add type hints & remove Setting._post_init
Previously the Setting classes used a horrible _post_init
hack that prevented proper type checking.
2022-01-06 14:21:14 +01:00
Alexandre Flament
aedd6279b3
Merge pull request #634 from not-my-profile/powered-by
Introduce `categories_as_tabs` & group engines in tabs
2022-01-06 09:22:02 +01:00
Alexandre Flament
d3ecadd3f8
Merge pull request #679 from dalf/brand-searxng
searxng.org: update setup.py & settings.yml
2022-01-05 19:07:53 +01:00
Martin Fischer
d01e8aa8cc [mod] introduce searx.engines.Engine for type hinting 2022-01-05 11:03:44 +01:00
Martin Fischer
1e195f5b95 [mod] move group_engines_in_tab to searx.webutils 2022-01-05 11:03:44 +01:00
Martin Fischer
5d74bf3820 [enh] move dictionaries, Erowid & IMDb out of general category
The general category is the category that is searched by default.
From a privacy standpoint it doesn't make sense to send all general
queries to specialized search engines that cannot deal with those
queries anyway.
2022-01-05 11:03:44 +01:00
Martin Fischer
ab90e2ac49 [enh] show categories not in any tab category in "Other" preferences tab
Previously we didn't have a good place to put search engines that don't
fit into any of the tab categories. This commit automatically puts
search engines that don't belong to any tab category in an "other"
category, that is only displayed in the user preferences (and not above
search results).
2022-01-05 11:03:44 +01:00
Martin Fischer
b02f762687 [enh] add more categories 2022-01-05 11:00:11 +01:00
Martin Fischer
8e9ad1ccc2 [enh] introduce categories_as_tabs
Previously all categories were displayed as search engine tabs.
This commit changes that so that only the categories listed under
categories_as_tabs in settings.yml are displayed.

This lets us introduce more categories without cluttering up the UI.
Categories not displayed as tabs  can still be searched with !bangs.
2022-01-03 07:01:49 +01:00
Martin Fischer
df34b1ddcf [enh] settings.yml: allow granular overwrites for about 2022-01-03 07:01:49 +01:00
Alexandre Flament
d83aa2b0d2
Merge pull request #613 from return42/pylint-bing-images
[pylint] Bing (Images) engine
2022-01-02 22:00:55 +01:00
Alexandre Flament
76cbfbbdda reference docs.searxng.org 2022-01-02 21:18:29 +01:00
Markus Heiser
61ce0c2244 [fix] bing engines: fetch_supported_languages
The Request to and the Response from https://www.bing.com/account/general has
been changed.

[1] https://github.com/searxng/searxng/pull/672#discussion_r777104919

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-01 17:31:38 +01:00
Markus Heiser
dc4f1f705d [pylint] Bing (Images) engine
Fix remarks from pylint and remove obsolete try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-28 14:43:39 +01:00
Markus Heiser
6d7a38a912 [pylint] Bing (Videos) engine
Fix remarks from pylint and remove obsolete try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-28 14:33:05 +01:00
Markus Heiser
d84226bf63 [fix] issues reported by pylint
Fix pylint issues from commit (3d96a983)

    [format.python] initial formatting of the python code

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 10:16:20 +01:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser
fcdc2c2cd2 [format.python] disable py code formatting for some hunks of code
Disable the python code formatting from python-black, where the readability of
code suffers by formatting.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:16:03 +01:00
Martin Fischer
e28c6bda35 [doc] introduce about.language and sort engines by it 2021-12-21 09:58:51 +01:00
Markus Heiser
7a215e07e7
Merge pull request #611 from return42/fix-bing
[fix] bing engine: fix paging support, show initial page.
2021-12-20 10:08:52 +01:00
Markus Heiser
2af50c2588 [pylint] Reddit engine
Add Reddit engine to pylint process

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-18 17:59:47 +01:00
Markus Heiser
6b85607274 [fix] bing engine: fix paging support, show initial page.
Follow up queries for the pages needed to be fixed.

- Split search-term in one for initial query and one for following queries.
- Set some headers in HTTP requests, bing needs for paging support.
- IMO //div[@class="sa_cc"] no longer matches in a bing response.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-18 13:50:38 +01:00
Markus Heiser
b2177e5916 [pylint] Bing (Web) engine
Fix remarks from pylint and improved code-style.  In preparation for a bug-fix
of the Bing (Web) engine I add this engine to the pylint-list.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-18 13:40:36 +01:00
Markus Heiser
f41734a543 [fix] engine bing-news: replace the http:// by https://
BTW: add bing_news to the pylint process

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-17 13:25:50 +01:00
Markus Heiser
8cc7c880ae
Merge pull request #587 from dalf/fix-gigablast
[fix] gigablast engine
2021-12-12 15:58:13 +01:00
Markus Heiser
b5c9cc4ff3
Merge pull request #586 from dalf/remove-yggtorrent
[del] remove yggtorrent
2021-12-07 07:00:47 +01:00
Alexandre Flament
1a6207574e [fix] gigablast engine
fetch extra params after 3000 seconds
2021-12-06 22:55:15 +01:00
Alexandre Flament
fbc2a6ab4b [del] remove yggtorrent
yggtorrent is behind cloudflare now
close #580
2021-12-06 21:59:51 +01:00
Alexandre Flament
037cb7dd3d [fix] imdb: don't crash when there is no result 2021-12-06 21:49:18 +01:00
Markus Heiser
6e06618e0c [fix] google-videos engine: ignore news articles
In the video search, google also sometimes includes news.  E.g. in the DE
language when you search for `!gov paris`, google adds an article from a German
newspaper (FAZ); I assume these are sponsored links (not tagged advertisement?)

Those links do not have an image / this patch ignores *video links* without an
image ID.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-26 17:11:20 +01:00
Markus Heiser
1ce09df9aa [fix] google video engine - rework of the HTML parser
The google video response has been changed slightly, a rework of the parser was
needed.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-26 01:14:17 +01:00
Markus Heiser
488ace1da9 [fix] google engine - suggestion
BTW: google no longer offers *spelling suggestions*

Closes: https://github.com/searxng/searxng/issues/442
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-25 19:42:03 +01:00
Markus Heiser
5b28c9109f [fix] google images: @href index 0 not found
Sometimes there is no href in the `<a ..>` tag of a *link_node* [1].

[1] https://github.com/searxng/searxng/issues/532

Reported-by: @TheEssem
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-21 09:55:59 +01:00
Markus Heiser
4c82ac7670 [drop] engine digg - https://digg.com/api is no longer available
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-11-19 15:00:22 +01:00
Tom
e1d60051ca
[fix] Qwant search query string
Search string: "!qwant time"
Resulting request URL: https://api.qwant.com/v3/search/web?q=q=time&count=10&offset=0&device=desktop&safesearch=1&locale=en_US
Notice the double "q="

Resulting request URL after fix: https://api.qwant.com/v3/search/web?q=time&count=10&offset=0&device=desktop&safesearch=1&locale=en_US
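
One plausible way the doubled "q=" can arise is combining a URL template that
already contains "q=" with the output of ``urlencode``, which emits "q=..."
itself. The template below is a simplified assumption, not the engine's
actual one.

```python
from urllib.parse import urlencode

# Simplified, hypothetical template: it must not add its own "q=".
URL = "https://api.qwant.com/v3/search/web?{query}&count={count}&offset={offset}"


def build_url(query, count=10, offset=0):
    # urlencode() already produces "q=<escaped query>".
    return URL.format(query=urlencode({"q": query}), count=count, offset=offset)


url = build_url("time")
```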
2021-11-17 18:13:54 +01:00
MrPaulBlack
41494d9f47 [fix] make reddit available only in the social media category
fix https://github.com/searxng/searxng/issues/470
2021-11-01 20:37:17 +01:00
Alexandre Flament
64b29ad838 [mod] microsoft academic: increase timeout to 6 seconds
also avoid a crash when there is no result

close #433
2021-10-26 12:26:43 +02:00
Markus Heiser
713814547a [fix] yahoo engine - don't lump all search suggestions together
Closes: https://github.com/searxng/searxng/issues/421
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-21 07:51:05 +00:00
Markus Heiser
f63ffbb22b [fix] engine - yahoo: rewrite and fix issues
Languages are supported by mapping the language to a domain.  If domain is not
found in :py:obj:`lang2domain` URL ``<lang>.search.yahoo.com`` is used.
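
The lookup with fallback can be sketched as follows. The entries in
``lang2domain`` here are placeholders for illustration, not the engine's
real table.

```python
# Placeholder entries; the real mapping lives in the engine module.
lang2domain = {
    "en": "search.yahoo.com",
    "zh_cht": "tw.search.yahoo.com",
}


def yahoo_domain(lang):
    # Use the mapped domain if there is one, else <lang>.search.yahoo.com.
    return lang2domain.get(lang, f"{lang}.search.yahoo.com")


known = yahoo_domain("en")
fallback = yahoo_domain("de")
```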

BTW: fix issue reported at https://github.com/searx/searx/issues/3020

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-16 20:05:26 +00:00
Markus Heiser
38a157b56f [pylint] engines: yahoo fix several issues reported from pylint
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-16 20:05:26 +00:00
MrPaulBlack
00b0394e19 [fix] language param for qwant 2021-10-14 16:11:44 +00:00
Noémi Ványi
4cc1ee8565 [fix] qwant engine - only get results from categories
Reported-by: https://github.com/searx/searx/issues/3014
Cherry-picked: https://github.com/searx/searx/commit/3bcca43
2021-10-12 18:42:50 +00:00
Paolo Basso
64df011e2f [mod] engines - add zlibrary engine 2021-10-11 14:58:44 +00:00
Markus Heiser
3abbe6d25b [fix] engine torznab - categories, before join convert int to str
BTW add init() function and replace SearxEngineAPIException by ValueError.
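
For illustration, the join problem and its fix in isolation:
``str.join`` raises a ``TypeError`` on integers, so each category is
converted first. ``join_categories`` is a hypothetical helper name.

```python
def join_categories(categories):
    # ",".join([2000, 5000]) would raise TypeError; convert each item to str.
    return ",".join(str(category) for category in categories)


joined = join_categories([2000, 5000])
```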

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-07 15:27:55 +00:00
Markus Heiser
9fb77065bd [fix] engine torznab - marginal issues reported from linters
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-07 15:27:55 +00:00
Paolo Basso
d803df8d89 [mod] engines - add torznab WebAPI 2021-10-07 15:27:55 +00:00
Markus Heiser
19e41c137e [mod] set 'engine.supported_languages' from the origin python module
The key of the dictionary 'searx.data.ENGINES_LANGUAGES' is the *engine name*
configured in settings.yml.  When multiple engines are configured to use the
same origin engine (e.g. `engine: google`)::

    - name: google
      engine: google
      use_mobile_ui: false
      ...

    - name: google italian
      engine: google
      use_mobile_ui: false
      language: it
      ...

    - name: google mobile ui
      engine: google
      shortcut: gomui
      use_mobile_ui: true

There exists no entry for ENGINES_LANGUAGES[engine.name] (e.g. `name: google
mobile ui` or `name: google italian`).  This issue can be solved by recreating
ENGINES_LANGUAGES::

    make data.languages

But this is nothing a SearXNG admin would like to do when just configuring
additional engines, since this just duplicates entries in ENGINES_LANGUAGES and
BTW: `make data.languages` has various external requirements which might not be
installed or available on a production host.

With this patch, if engine.name fails, ENGINES_LANGUAGES[engine.engine] is used
to get the engine.supported_languages (e.g. `google` for the engine named
`google mobile`).

For an engine, when there is `language: ...` in the YAML settings, the engine
supports only one language; in this case engine.supported_languages should
contain the value defined in settings.yml (e.g. `it` for the engine named
`google italian`).
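
A simplified sketch of the lookup order described above. The table contents
and the function ``supported_languages`` are assumptions for illustration;
the real loader may differ in detail.

```python
# Hypothetical data; the real table is generated by `make data.languages`.
ENGINES_LANGUAGES = {"google": ["de", "en", "it"]}


def supported_languages(name, engine, language=None):
    # Look up by the configured name first, then by the origin module.
    languages = ENGINES_LANGUAGES.get(name) or ENGINES_LANGUAGES.get(engine, [])
    # A fixed `language:` setting narrows the engine to that one language.
    if language is not None:
        return [language]
    return languages


mobile = supported_languages("google mobile ui", "google")
italian = supported_languages("google italian", "google", language="it")
```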

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Closes: https://github.com/searxng/searxng/issues/384
2021-10-07 08:45:02 +02:00
Alexandre Flament
8a897b86f1 [mod] engines - IMDB: add thumbnails 2021-10-05 09:10:02 +02:00
Paul Alcock
823d44ed0a [mod] engines - add IMDB / Internet Movie Database
Merged from @Guilvareux's commit [1] and slightly modified / see [2].

[1] https://github.com/searx/searx/pull/2980/commits/f2f90071
[2] https://github.com/searx/searx/pull/2980
2021-10-03 11:44:25 +02:00
Markus Heiser
a5b7ed9550 [mod] engine duckduckgo - update supported_languages_url
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01 20:01:41 +02:00
Markus Heiser
4c9b8b29ee [mod] engine duckduckgo - use DuckDuckGo-Lite
Implement a scraper for DuckDuckGo-Lite [1].  The existing DuckDuckGo [2]
engine does not support paging.  DuckDuckGo-Lite is much faster, less verbose
and does have a paging option (reverse engineered from the input form of [1]).

[1] https://lite.duckduckgo.com/lite
[2] https://duckduckgo.com/

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01 20:01:41 +02:00
Markus Heiser
ecb3912bd0 [fix] engine stackexchange - decode HTML entities in title & content
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-29 08:08:18 +02:00
Markus Heiser
b62851559b [mod] replace old stackoverflow engine by Stack Exchange API v2.3
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-28 19:12:37 +02:00
Markus Heiser
55fee1e45d [mod] engines - add Stack Exchange API v2.3
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-28 19:01:04 +02:00
Alexandre Flament
b046322c7b
Merge pull request #333 from dalf/enh-engine-descriptions
RFC: /preferences: display engine descriptions
2021-09-25 11:29:25 +02:00
Alexandre Flament
ab569c1e12 [fix] openstreetmap engine: optimize SPARQL query
add
hint:Query hint:optimizer "None".
to the SPARQL query to keep the response time small.

It tells the optimizer to follow the path from ?item to the different property values
instead of the other way around.
See https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#Property_paths
2021-09-25 11:16:22 +02:00
Alexandre Flament
8961131497 [fix] fix the about section of some engines 2021-09-24 20:20:30 +02:00
Alexandre Flament
6f11b61cd5 [fix] openstreetmap engine: map "all" language to English 2021-09-24 20:12:18 +02:00
Markus Heiser
443bf35e09 [pylint] fix global-variable-not-assigned issues
If there is no write access, there is no need for global.  Remove global
statement if there is no assignment.

global-variable-not-assigned:
  Using global for names but no assignment is done Used when a variable is
  defined through the "global" statement but no assignment to this variable is
  done.

In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]
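
A short illustration of the rule, using made-up names: reading or mutating a
module-level object needs no ``global`` statement, while rebinding the name
does.

```python
SETTINGS = {"hits": 0}
TOTAL = 0


def count_hit():
    # Mutating the dict does not rebind the name, so no `global` is needed;
    # adding one here would trigger global-variable-not-assigned.
    SETTINGS["hits"] += 1
    return SETTINGS["hits"]


def add_to_total(value):
    # Rebinding a module-level name does require `global`.
    global TOTAL
    TOTAL += value
    return TOTAL
```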

[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-17 10:14:27 +02:00