31 commits

Author SHA1 Message Date
Alexandre FLAMENT
bb5db079c7 [fix] searxng_extra/update/update_engine_descriptions.py (part 2)
Wikipedia descriptions are fetched without the help of the wikipedia engine:

* the SPARQL query returns the wikipedia URL of the article
2023-04-15 16:04:05 +02:00
Markus Heiser
27369ebec2 [fix] searxng_extra/update/update_engine_descriptions.py (part 1)
Follow up of #2269

The script to update the descriptions of the engines no longer works since
PR #2269 was merged.

searx/engines/wikipedia.py
==========================

1. There was a misuse of zh-classical.wikipedia.org:

   - `zh-classical` is dedicated to classical Chinese [1], which is not
     traditional Chinese [2].

   - zh.wikipedia.org has LanguageConverter enabled [3] and is going to
     dynamically show simplified or traditional Chinese according to the
     HTTP Accept-Language header.

2. The update_engine_descriptions.py needs a list of all wikipedias.  The
   implementation from #2269 included only a reduced list:

   - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
   - https://meta.wikimedia.org/wiki/List_of_Wikipedias

searxng_extra/update/update_engine_descriptions.py
==================================================

Before PR #2269 there was a match_language() function that did an approximation
using various methods.  Since PR #2269 the data model of the languages contains
only types that can be recognized by babel.  The approximation methods, which
are still needed (only here) to determine the descriptions, must be replaced by
other methods.

[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter

Closes: https://github.com/searxng/searxng/issues/2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-15 16:03:59 +02:00
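The Accept-Language behaviour described in 27369ebec2 (zh.wikipedia.org picking simplified or traditional Chinese through LanguageConverter) can be illustrated with a minimal sketch. The helper name `build_zh_request`, the `/wiki/` URL layout and the `zh-TW` default are assumptions made for this illustration; the request is only built, never sent, and this is not the code used by the update script.

```python
from urllib.parse import quote
from urllib.request import Request


def build_zh_request(title: str, variant: str = "zh-TW") -> Request:
    """Build (but do not send) a request for an article on zh.wikipedia.org.

    The script variant is requested via the Accept-Language header instead
    of a separate wiki such as zh-classical.wikipedia.org.
    """
    # Assumed URL layout; the article title is percent-encoded.
    url = "https://zh.wikipedia.org/wiki/" + quote(title)
    return Request(url, headers={"Accept-Language": variant})


# Hypothetical usage: ask for the traditional Chinese rendering.
req = build_zh_request("維基百科", "zh-TW")
```

Passing `zh-CN` instead of `zh-TW` would request the simplified rendering from the same host.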
Markus Heiser
16f0db4493 [mod] replace utils.match_language by locales.match_locale
This patch replaces the *full of magic* ``utils.match_language`` function with
``locales.match_locale``.  The ``locales.match_locale`` function is based on the
``locales.build_engine_locales`` introduced in 9ae409a0 [1].

In the past SearXNG only supported searching by language, not by region.
This changed a long time ago: regions were added to the SearXNG core, but not
to the engines.  ``utils.match_language`` was the function that handled the
different aspects of languages/regions in the SearXNG core and the supported
*languages* in the engine.  It did this with some magic, which worked well for
most use cases but failed in some edge cases.

To replace the competing concepts of languages and regions in the SearXNG core,
``locales.build_engine_locales`` was introduced in 9ae409a0 [1].  With the last
patches all engines have been migrated to ``fetch_traits`` and a
language/region concept that is based on ``locales.build_engine_locales``.

To summarize: there is no longer a need for ``utils.match_language``.

[1] https://github.com/searxng/searxng/pull/1652

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
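Commit 16f0db4493 does not spell out how a language/region tag is matched against what an engine supports. As a rough illustration of the general idea only, the sketch below tries an exact match, then the bare language, then any region of the same language. The function `match_tag` and the sample mapping are hypothetical and are not the implementation of ``locales.match_locale``.

```python
def match_tag(tag, supported, default=None):
    """Map a 'lang' or 'lang-REGION' tag to an engine-specific value.

    `supported` maps tags to the value the engine expects.
    """
    # 1. exact match, e.g. "de-CH"
    if tag in supported:
        return supported[tag]
    # 2. language without region, e.g. "de"
    lang = tag.split("-")[0]
    if lang in supported:
        return supported[lang]
    # 3. any region of the same language (sorted to stay deterministic)
    for key in sorted(supported):
        if key.split("-")[0] == lang:
            return supported[key]
    return default


# Hypothetical engine mapping used only for this example.
supported = {"de-DE": "de_DE", "en": "en", "fr-FR": "fr_FR"}
```

With this mapping, `match_tag("de-CH", supported)` falls through to step 3 and yields the value stored for `de-DE`.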
Markus Heiser
c9cd376186 [mod] replace searx.languages by searx.sxng_locales
With the language and region tags from the EngineTraitsMap the handling of
SearXNG's tags of languages and regions has been normalized and is no longer
a *mystery*.  The "languages" became "locales" that are supported by babel and
by this, the update_engine_traits.py can be simplified a lot.

Other code places can be simplified as well, but these simplifications
should (respectively can) only be done when none of the engines work with the
deprecated EngineTraits.supported_languages interface anymore.

This commit replaces searx.languages by searx.sxng_locales and renames some
identifiers from "language" to "locale" (e.g. language_codes --> sxng_locales).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
6e5f22e558 [mod] replace engines_languages.json by engines_traits.json
Implementations of the *traits* of the engines.

Engine's traits are fetched from the origin engine and stored in a JSON file in
the *data folder*.  Most often traits are languages and region codes and their
mapping from SearXNG's representation to the representation in the origin search
engine.

To load the persisted traits::

    searx.enginelib.traits.EngineTraitsMap.from_data()

For new traits new properties can be added to the class::

    searx.enginelib.traits.EngineTraits

.. hint::

   The implementation is backward compatible with the deprecated
   *supported_languages method* from the vintage implementation.

   The vintage code is tagged as *deprecated* and can be removed once all
   engines have been ported to the *traits method*.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
9a710587e8 [fix] remove usage of deprecated-module distutils
Closes: https://github.com/searxng/searxng/issues/2168

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-10 15:31:54 +01:00
Markus Heiser
4c06837a50 [mod] make python code pylint 2.16.1 compliant
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-10 13:59:21 +01:00
ArtikusHG
1f8f8c1e91 Replace langdetect with fasttext
2022-12-16 21:07:39 +02:00
Alexandre Flament
e473addaff User agent: don't include the patch number in the Firefox version
The Firefox version in the user agent doesn't include the patch version: 106.0 not 106.0.2

Close #1914
2022-11-05 22:04:37 +01:00
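The rule from e473addaff (use 106.0 rather than 106.0.2 in the user agent) amounts to dropping everything after the minor version. A minimal sketch follows; the helper names `major_minor` and `user_agent` are hypothetical, and the template string is modelled on the Windows user agent quoted in 56e6d19b48.

```python
# Template modelled on the Windows user agent quoted in this log.
UA_TEMPLATE = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:{version}) "
    "Gecko/20100101 Firefox/{version}"
)


def major_minor(version: str) -> str:
    """Keep only 'major.minor' of a dotted version string."""
    return ".".join(version.split(".")[:2])


def user_agent(version: str) -> str:
    """Fill the template with the version stripped of its patch number."""
    return UA_TEMPLATE.format(version=major_minor(version))
```

A version that already has only two components, such as `106.0`, passes through unchanged.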
Markus Heiser
9933155a2e [fix] update_osm_keys_tags.py: sort JSON dump
To get a meaningful diff, the keys in the JSON dump need to be sorted.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-10-11 11:45:26 +02:00
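The point of 9933155a2e, a JSON dump whose key order is stable so that diffs only show real changes, can be sketched with the standard library. The sample dictionary and the `dump_sorted` helper are made up for this illustration and are not taken from update_osm_keys_tags.py.

```python
import json


def dump_sorted(data) -> str:
    """Serialize with sorted keys so repeated runs produce stable output."""
    return json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)


# Made-up sample; the insertion order deliberately differs from sorted order.
sample = {"tags": {"shop": "Shop"}, "keys": {"name": "Name"}}
text = dump_sorted(sample)
```

Because the keys are sorted at every nesting level, two dictionaries with the same content but different insertion order serialize to identical text.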
Markus Heiser
ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre Flament
578b2a8183 fix searxng_extra/update/update*.py scripts
call searx.locales.locales_initialize before using LOCALE_NAMES

Related to https://github.com/searxng/searxng/pull/1306
2022-07-02 12:16:00 +02:00
Markus Heiser
e8541b6006 [theme] peel out oscar from SearXNG development
This is the first step of removing the oscar theme.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-30 13:20:27 +02:00
Markus Heiser
62982c8812 [fix] add back missing languages & regions (followup of PR #1071)
In PR #1071 the language catalog of dailymotion was cleaned up; before that
there were over 7000 "languages" in the catalog.

As a side effect of this clean-up the language & region catalog in SearXNG has
been reduced [1].

This patch reduces ``min_engines_per_lang`` from 13 to 12 to get the missing
languages back into the language & region catalog of SearXNG.

[1] 3bb62823ec (diff-f3f00db0f87f95b882624a192e0aac21525638af0b18c9514e765fcf1991678d)

Requested-by: @tiekoetter in a Matrix chat
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-22 12:09:42 +02:00
Markus Heiser
effcde3d0e [fix] add missing territory (country) name
Related-to: https://github.com/searxng/searxng/pull/1029#issuecomment-1086824911
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-05 16:48:25 +02:00
Alexandre Flament
0379856712 Merge pull request #967 from return42/language-filter
[mod] add flags to the languages filter
2022-03-28 21:36:20 +02:00
Markus Heiser
34fd2021d8 [fix] pylint issue in py3.10
searxng_extra/update/update_firefox_version.py:16:0: W0402:
Uses of a deprecated module 'distutils.version' (deprecated-module)

[1] https://github.com/searxng/searxng/pull/1007

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-25 08:39:40 +01:00
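The pylint warning quoted in 34fd2021d8 concerns `distutils.version`. The log does not say what replaced it. One stdlib-only way to compare purely numeric dotted versions is sketched below; `version_key` is a hypothetical helper, and it does not handle suffixes such as `b1` or `rc2`.

```python
def version_key(version: str) -> tuple:
    """Turn '106.0.2' into (106, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Made-up list of version strings for the example.
versions = ["95.0", "106.0.2", "100.0", "9.0.1"]
newest = max(versions, key=version_key)
```

Comparing the raw strings instead would rank `9.0.1` above `106.0.2`, which is why the numeric key matters.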
Markus Heiser
2e4557f3f3 [fix] languages: show country name even if there is only one country
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-19 16:45:14 +01:00
Markus Heiser
a25e3767d4 [fix] don't show flags for languages without region identifier
SearXNG shows two different things:

region:
  "de-CH" is the equivalent of "Schweiz (de)" in DDG.

languages:
  "en" doesn't say anything about the location. It is up to the engines to do
  their best to select English results without a region.

Suggested-by: @dalf https://github.com/searxng/searxng/pull/967#issuecomment-1072979693
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-19 15:09:13 +01:00
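The distinction drawn in a25e3767d4 (a region such as "de-CH" gets a flag, a bare language such as "en" does not) can be sketched as follows. The function `region_flag` is hypothetical, and building the flag from Unicode regional indicator symbols is an assumption of this sketch, not necessarily how SearXNG renders flags.

```python
# Offset between 'A' and REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
_INDICATOR_BASE = 0x1F1E6 - ord("A")


def region_flag(tag: str) -> str:
    """Return a flag emoji for 'lang-REGION' tags, '' for bare languages."""
    parts = tag.split("-")
    if len(parts) < 2:
        return ""  # language only: no location implied, so no flag
    region = parts[-1].upper()
    if len(region) != 2 or not (region.isascii() and region.isalpha()):
        return ""  # not a two-letter country code
    return "".join(chr(_INDICATOR_BASE + ord(char)) for char in region)
```

For example, `region_flag("de-CH")` yields the two regional indicators for C and H, while `region_flag("en")` yields an empty string.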
Markus Heiser
2841abaf55 [mod] add flags to the languages filter
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-19 15:09:13 +01:00
Markus Heiser
7cdd31440e [fix] external bangs: don't overwrite Bangs in data trie
Bangs with a `*` suffix (e.g. `!!d*`) overwrite Bangs with the same
prefix (e.g. `!!d`) [1].  This can be avoided when a non-printable character is
used to tag a LEAF_KEY.

[1] https://github.com/searxng/searxng/pull/740#issuecomment-1010411888

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-12 19:37:13 +01:00
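The idea behind 7cdd31440e can be sketched with a small dictionary-based trie. If the leaf marker were a printable character such as `*`, a bang ending in that character would land on the same dictionary key as the leaf of its prefix. The value `chr(16)` and the helpers `add_bang` and `get_bang` are assumptions made for this illustration, not the layout of the actual data trie.

```python
# Non-printable marker; it cannot occur in a bang typed by a user.
LEAF_KEY = chr(16)


def add_bang(trie: dict, bang: str, url: str) -> None:
    """Insert `bang` character by character and store `url` at its leaf."""
    node = trie
    for char in bang:
        node = node.setdefault(char, {})
    node[LEAF_KEY] = url


def get_bang(trie: dict, bang: str):
    """Return the URL stored for `bang`, or None if it is unknown."""
    node = trie
    for char in bang:
        node = node.get(char)
        if node is None:
            return None
    return node.get(LEAF_KEY)


# Hypothetical bangs and URLs, showing that "d" and "d*" coexist.
trie = {}
add_bang(trie, "d", "https://example.org/d")
add_bang(trie, "d*", "https://example.org/d-star")
```

Both entries remain retrievable because the `*` child and the leaf marker of `d` are different keys in the same node.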
Markus Heiser
295876abaa [pylint] add scripts from searxng_extra/update to pylint
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-05 16:09:40 +01:00
Markus Heiser
ffea5d8ef5 [docs] add documentation for the scripts in searxng_extra/update
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-05 16:09:40 +01:00
Markus Heiser
8191e1a253 [fix] update_languages.py: generate code that passes CI
The file searx/languages.py, created by update_languages.py, has to pass the
quality check from CI::

    make format.python

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-01 18:32:21 +01:00
Markus Heiser
8a07559ab5 [fix] update_languages.py: no exception on unknown locale & language
Fix exception handling of unknown locales and languages::

    ERROR: ca_ES_valencia --> [Errno 2] No such file or directory: 'local/py3/lib/python3.8/site-packages/babel/locale-data/ca_ES_valencia.dat'
    ERROR: languages['fil-PH'] --> {'name': None, 'english_name': None}
    ERROR: languages['nb-NO'] --> {'name': None, 'english_name': None}

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-01 17:31:38 +01:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser
fcdc2c2cd2 [format.python] disable py code formatting for some hunks of code
Disable the python code formatting from python-black where the readability of
the code suffers from formatting.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:16:03 +01:00
Alexandre Flament
56e6d19b48 update_firefox_version.py: update user agent signature
The user agent from Windows is
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent/Firefox#windows
2021-12-16 23:10:39 +01:00
Alexandre Flament
828088fa5a [mod] update_languages: min_engines_per_country=7
A (language, country) tuple is included if 7 engines have it; the threshold was 10 before.

close #432
2021-10-26 12:13:23 +02:00
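The threshold rule from 828088fa5a (keep a (language, country) tuple only if enough engines support it) can be sketched with a counter. The function `select_locales` and the sample data are hypothetical; the example uses a threshold of 2 to stay small, where the commit uses 7.

```python
from collections import Counter


def select_locales(engine_locales: dict, min_engines: int) -> list:
    """Return the (language, country) tuples supported by enough engines."""
    counts = Counter()
    for locales in engine_locales.values():
        # set() ensures an engine counts at most once per tuple
        counts.update(set(locales))
    return sorted(loc for loc, n in counts.items() if n >= min_engines)


# Made-up engines and locales for the example.
engine_locales = {
    "engine_a": [("de", "CH"), ("en", "US")],
    "engine_b": [("de", "CH")],
    "engine_c": [("en", "US"), ("de", "CH"), ("fr", "FR")],
}
selected = select_locales(engine_locales, min_engines=2)
```

Lowering the threshold, as the commit does, can only add tuples to the result, never remove any.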
Markus Heiser
955eab8240 [mod] searxng_extras - minor improvements
- fix docs/searxng_extra/standalone_searx.py.rst
- add SPDX tag
- pylint standalone_searx.py and update_wikidata_units.py

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-03 19:04:18 +02:00
Alexandre Flament
1bb82a6b54 SearXNG: searxng_extra
2021-10-02 17:30:39 +02:00