The URL was accidentally deleted in a85907a98, but is still required in
base.html for auto-discovery / from base.html::
<link title="{{ instance_name }}"
rel="search" href="{{ opensearch_url }}"
Signed-off-by: Markus Heiser <>
Changed value of "extra_proxy_timeout" from 10.0 to 10 as the variable expects an int.
Uncommenting this value with a non-int value will throw many errors and crash all engines.
To avoid unnecessary changes to the file, the list should be sorted before it is
written to the file.
You can test it by calling multiple times::
make data.locales
and searx/data/locales.json should be unchanged.
Signed-off-by: Markus Heiser <>
babel.Locale.parse loads more than 60MB in RAM. The only purpose is to get:
This commit calls babel.Locale.parse when the translations are updated from
weblate and stored in::
This file can be built by::
./manage data.locales
By storing these variables when the translations are updated, we save
roughly 65MB per worker (usually 4 workers = 260MB of RAM saved).
Co-authored-by: Markus Heiser <>
Parse the result list from given in the variable named
<script nonce="..">
window.MESON = window.MESON || {};
window.MESON.initialState = {"siteConfig": ...
window.MESON.loadedLang = "en";
The result list is in field::
Signed-off-by: Markus Heiser <>
Highlight all query terms in a search result in one pass.
This fixes the case where the search query contains a word from the
highlighting HTML code, which caused broken HTML to appear in search results.
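A minimal sketch of the one-pass idea, not the code from this commit: all query terms are combined into a single regular expression, so the markup inserted for one term is never scanned again for another term. The function name ``highlight_terms`` and the ``highlight`` CSS class are assumptions made for illustration.

```python
import re
from html import escape


def highlight_terms(content: str, query: str) -> str:
    """Wrap every query term found in ``content`` in a <span>, in one pass."""
    # Longest terms first, so the longer alternative wins at the same position.
    terms = sorted({t for t in query.split()}, key=len, reverse=True)
    if not terms:
        return escape(content)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    out = []
    pos = 0
    for match in pattern.finditer(content):
        # Escape the untouched text and the match separately; the inserted
        # markup itself is never searched for further terms.
        out.append(escape(content[pos:match.start()]))
        out.append('<span class="highlight">' + escape(match.group(0)) + "</span>")
        pos = match.end()
    out.append(escape(content[pos:]))
    return "".join(out)
```

A query such as ``span class`` shows the difference: replacing term by term would find ``class`` again inside the markup added for ``span``, while the single pass leaves that markup alone.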
timeout: 4.0
The timeout of presearch-WEB is raised from the default of 3sec to 4sec. The
engine has to send two HTTP requests, which often exceed the default timeout of
3sec. Since all other presearch categories (images, videos, news) also have a
timeout of 4 sec, the WEB search should have the same timeout.
network: presearch
Place all HTTP requests in the same network, named ``presearch``.
Signed-off-by: Markus Heiser <>
In Presearch there are languages for the UI and regions for narrowing down the
search. With this change the SearXNG engine supports a search by region. The
details can be found in the documentation of the source code.
To test, you can search terms like::
!presearch bmw :zh-TW
!presearch bmw :en-CA
1. You should get results corresponding to the region (Taiwan, Canada)
2. and in the language (Chinese, English).
3. The context in info box content is in the same language.
1. Region or language is not supported by Presearch or
2. SearXNG user did not select a region tag, example::
!presearch bmw :en
Signed-off-by: Markus Heiser <>
This patch fixes issue reported by ``make test.unit``::
searx/search/checker/ SyntaxWarning: invalid escape sequence '\>'
rep = ['<' + tag + '[^\>]*>' for tag in HTML_TAGS]
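One possible way to silence the warning, sketched under assumptions: inside a character class ``>`` needs no backslash, so ``[^>]*`` matches the same input as ``[^\>]*``. The ``HTML_TAGS`` list below is a hypothetical subset for illustration, and the sample input is made up.

```python
import re

# Hypothetical subset of tags, for illustration only.
HTML_TAGS = ["b", "i", "em"]

# '>' is not special inside [...], so the backslash can simply be dropped.
# A raw string (r'[^\>]*>') would also avoid the SyntaxWarning.
rep = ['<' + tag + '[^>]*>' for tag in HTML_TAGS]
pattern = re.compile('|'.join(rep))

# Only the opening tags are matched by these patterns.
cleaned = pattern.sub('', '<b class="x">bold</b> <i>it</i>')
```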
Signed-off-by: Markus Heiser <>
All the environments defined in ./utils/brand.env are generated on the fly, so
there is no longer a need to define the brand environment in this file and all
the workflows to handle this file.
Signed-off-by: Markus Heiser <>
The URL in the sidebar only exists in HTTP POST requests. On HTTP GET requests the
selector ``#search_url button#copy_url`` results in a ``null`` type and a
``.style.display`` raises::
Uncaught TypeError: d.querySelector(...) is null
As a result, the initialization of the event handler is no longer carried out.
Suggested-by: Markus Heiser <>
- the option server:public_instance lacks some documentation
- the processing of this option belongs in the limiter and not
in botdetection module
Signed-off-by: Markus Heiser <>
This patch was inspired by the discussion around PR-2882 [2]. The goals of this
patch are:
1. Convert plugin searx.plugin.limiter to normal code [1]
2. isolation of botdetection from the limiter [2]
3. searx/{tools => botdetection}/ and drop
4. in URL /config, 'limiter.enabled' is true only if the limiter is really
enabled (Redis is available).
This patch moves all the code that belongs to botdetection into namespace
searx.botdetection and code that belongs to limiter is placed in namespace
The limiter used to be a plugin; when botdetection was added at some point, it
was not a plugin. The modularization of these two components was long overdue.
With the clear modularization, the documentation could then also be organized
according to the architecture.
To test:
- check the app works without the limiter, check `/config`
- check the app works with the limiter and with the token, check `/config`
- make .. and read
Signed-off-by: Markus Heiser <>
To test this PR run a local instance and try to query page 51:
A parameter exception will be raised:
searx.exceptions.SearxParameterException: Invalid value "51" for parameter pageno
And the client will receive an HTTP 400 (Bad request).
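The check can be sketched roughly as follows. This is not the code of the patch: ``MAX_PAGE = 50`` is an assumption derived only from the fact that page 51 is rejected, and ``ParameterError`` is a local stand-in for ``searx.exceptions.SearxParameterException``.

```python
MAX_PAGE = 50  # assumed upper bound; the text only states that page 51 fails


class ParameterError(ValueError):
    """Local stand-in for searx.exceptions.SearxParameterException."""

    def __init__(self, name: str, value: str):
        super().__init__(f'Invalid value "{value}" for parameter {name}')


def parse_pageno(form: dict) -> int:
    """Return the validated page number from a dict of form values."""
    raw = form.get("pageno", "1")
    try:
        pageno = int(raw)
    except ValueError:
        raise ParameterError("pageno", raw) from None
    if not 1 <= pageno <= MAX_PAGE:
        raise ParameterError("pageno", raw)
    return pageno
```

In a web application the exception would then be mapped to the HTTP 400 response.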
Signed-off-by: Markus Heiser <>
In python versions <py3.10 there is an issue with an undocumented method
HTMLParser.error() [1][2] that was deprecated in Python 3.4 and removed
in Python 3.5.
To be compatible with higher versions (>=py3.10) an error method is implemented
which raises an AssertionError exception like the higher Python versions do [3].
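A minimal sketch of such a compatibility method. The class ``TextExtractor`` is a hypothetical example subclass, not the class touched by this patch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical minimal subclass that collects text nodes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def error(self, message):
        # Older interpreters may still call self.error() from the parser
        # base class; raise the same exception type newer versions use.
        raise AssertionError(message)
```

Callers can then catch ``AssertionError`` on every supported Python version.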
Signed-off-by: Markus Heiser <>
DDG's bot detection is sensitive to the vqd value. For some search terms (such
as extremely long search terms that are often sent by bots), no vqd value can be
If SearXNG cannot determine a vqd value, then no request should go out to
DDG (WEB): a request with a wrong vqd value leads to DDG temporarily putting
SearXNG's IP on a block list.
Requests from IPs in this block list run into timeouts.
Not sure, but it seems the block list is a sliding window: to get my IP off
the bot list I had to cool down my IP for 1h (send no requests from that IP to
Since such issues can't be reproduced in a local instance I tested this patch for 24h on
my public SearXNG instance: There are still errors (rare), but the reliability
is still 100%.
Signed-off-by: Markus Heiser <>
Some search terms do not have results and therefore no vqd value
BTW: remove a leftover from 9197efa
Signed-off-by: Markus Heiser <>
We have had problems with this before, the bot protection from ddg-lite seems to
have included this referer in the rating [1][2].
From reverse engineering:
- The Referer ```` was set in commit 257dc7d6c4 --> DDG lite
does not like this referer anymore!
- The 'Referer' header is only set on second and follow up pages but not on the
first page
- The vqd value is not needed on the first page, the ddg-lite client sets this
value only on follow up pages / this can help to reduce the vqd requests from
Related to 'Referer' header & ddg requests:
Signed-off-by: Markus Heiser <>
The change in the hotkey mechanism introduced in 317db5b04 does not allow
configuration via `settings.yml`. This commit adds that functionality.
Closes: #2898
Instead of thumbnail use img_src in the result item, otherwise the "movies"
category looks clunky.
- b4e0d2eedc (r128785388)
Signed-off-by: Markus Heiser <>
Anna’s Archive has cleaned up their languages, available file extensions and
changed the HTML form.
Signed-off-by: Markus Heiser <>
Crossref was broken on result types journal-issue and component .. The old code
had lots of assumptions, and broke during parsing. Now the assumptions are more
explicit and have been checked against the API.
This PR improves the UX by making auto-scroll smoother. The CSS is changed
so that all auto-scrolling is smoother, while user scrolling is not affected.
The scroll-behavior CSS property sets the behavior for a scrolling box when
scrolling is triggered by the navigation or CSSOM scrolling APIs.[1]
Remove the usage of
The results from Bing contain the target URL encoded in base64
See the u parameter, remove the first two characters "a1", and done.
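The decoding step can be sketched as below. Assumptions: the URL-safe base64 alphabet is used, padding may have been stripped, and the redirect URL in the usage example is made up for illustration. The function name ``decode_bing_url`` is hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlparse


def decode_bing_url(url: str) -> str:
    """Return the target URL hidden in the ``u`` parameter of ``url``."""
    params = parse_qs(urlparse(url).query)
    encoded = params["u"][0][2:]          # drop the leading "a1"
    encoded += "=" * (-len(encoded) % 4)  # restore padding that may be missing
    return base64.urlsafe_b64decode(encoded).decode("utf-8")


# Build a made-up redirect URL to demonstrate the round trip.
target = "https://example.org/a?b=1"
token = "a1" + base64.urlsafe_b64encode(target.encode()).decode().rstrip("=")
redirect = "https://bing.example/ck/a?u=" + token
```

``decode_bing_url(redirect)`` gives back ``target``.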
Also add a comment on the check of the result_len / pageno
( from )
It seems there is an API change:
extratags can be either a dictionary or None.
This commit avoids a crash when extratags is None
Test query "!osm gare du nord"
The method EngineTraits.get_region(..) returns the engine's region string
that **best fits** SearXNG's locale. This means it returns a
region (country) even if only a language is set in the locale. For example, the method
returns for a locale tag `es` a region `ES`.
Google's search parameter `cr` restricts search results to documents originating
in a particular country / in case of a locale tag (language) as described above,
this argument should be unset in the query sent to Google.
Signed-off-by: Markus Heiser <>
The search engines deliver hits for many search terms [1], but these are usually
not the focus of the user. In order to arrange these hits further down in the
list, their weighting is reduced.
Signed-off-by: Markus Heiser <>
Show URL of the ddg-search page, not the URL of a (generic) JavaScript. The
latter one is not useful for the user.
Signed-off-by: Markus Heiser <>
This patch adds some more fields to the result items and changes paging to the
``nextResultSet`` given in seekr's JSON response.
Signed-off-by: Markus Heiser <>
Sadly is blocked by a CAPTCHA that can't be avoided (at least in an
XPath engine).
Signed-off-by: Markus Heiser <>
* this is a small fix to increase the colspan of the category in engine preferences from 7 to 8, since there was a column added
=> fixing a small fallout from 4731290317
flask_babel.gettext() does not work in the engine modules.
the request() and response() functions of the engine modules run in the
processor, whose search() method runs in a thread and in the threads the
context of the Flask app does not exist. The context of the Flask app is
needed by the gettext() function for the L10n.
copy context of the Flask app into the threads. [1]
special case:
We cannot equip the search() method of the processors with the decorator [1],
because the decorator requires a context (Flask app) that does not yet exist
at the time of the initialization of the processors (the initialization of the
processors is part of the initialization of the Flask app).
Signed-off-by: Markus Heiser <>
Disable btdigg because on most SearXNG instances, SearXNG is blocked by btdigg
due to cloudflare too many requests.
This implementation does not parse the HTML page because there is an API in
XML (RSS). The RSS feed provides less data like number of seeders/leechers and
the files in the torrent file. It's a tradeoff for a "stable" engine as the XML
from RSS content will change way less than the HTML page.
The Wikimedia wikis [1] engines provide good answers and have short response
times --> no reason to disable these engines by default. BTW: this patch adds
a (sub-) category ``wikimedia`` for the engines [1].
Signed-off-by: Markus Heiser <>
SearXNG does not allow a None value in the content field of a result item.
If the key (shortDescription, uploaderName) in the JSON response from piped
exists but is set to None, SearXNG ignores this result item::
DEBUG searx : result: invalid content: { .., 'content': None, ..}
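The pitfall behind this can be shown with a small sketch. The helper ``text_or_empty`` is hypothetical and the sample item is made up; only the key names come from the text above.

```python
def text_or_empty(item: dict, key: str) -> str:
    """Return the value of ``key`` as text, mapping a missing key or None to ''."""
    # dict.get(key, "") only falls back when the key is missing,
    # not when the key exists and its value is None.
    return item.get(key) or ""


# Made-up item, shaped like a response where a key exists but is null.
item = {"shortDescription": None, "uploaderName": "someone"}
```

Here ``item.get("shortDescription", "")`` still yields ``None``, while ``text_or_empty(item, "shortDescription")`` yields an empty string.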
Signed-off-by: Markus Heiser <>
`pointer-events` never gets set to "none" when the button is hidden,
allowing you to click the button. And your mouse further changes its
cursor to the pointer style.
- re-enables z-library as the new domain is now available
from the open web. The announcement of the domain:
It is an official domain, it requires to log in to the "personal" subdomain
only to download files, but the search works.
- changes the result template of zlibrary to paper.html, filling the appropriate fields
- implements language filtering for zlibrary
- implement zlibrary custom filters (engine traits)
- refactor and document the zlibrary engine
We have built up detailed documentation of the *settings* and the *engines* over
the past few years. However, this documentation was still spread over various
chapters and was difficult to navigate in its entirety.
This patch rearranges the Settings & Engines documentation for better
To review the reordered docs::
make docs.clean
Signed-off-by: Markus Heiser <>
The rendering of the WEB page is very strange; except for the first position, all
other positions of Anna's result page are enclosed in SGML comments. These
comments are *uncommented* by some JS code, see query of the class
'.js-scroll-hidden' in Anna's HTML template [1].
Signed-off-by: Markus Heiser <>
- torznab engine using types and clearer code
- torznab option to hide torrent and magnet links.
- document the torznab engine
- add myself to authors
Signed-off-by: Markus Heiser <>
It seems that Google is rolling out a modified WEB API [1][2].
In the past there was only the UI language in the `hl` argument but nowadays it
seems a combination of the UI language and the "search region" is mixed in this
argument and the `gl` argument has been removed. I'm very surprised that Google
is starting to mix the parameters of the UI with the parameters of the search
This patch modifies the get_google_info(..) function. Beside Google-WEB this
function is also used by other Google services, here are some examples to test
region & language of ..
- Google-WEB: `!go dragon boat :en-CA`
- Google-News: `!gon dragon boat :en-CA`
- Google-Videos: `!gov bmw :en-CA`
- Google-Images: `!goi bmw :en-CA`
- [1]
- [2]
Signed-off-by: Markus Heiser <>
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia
Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that error messages from the type checker are
no longer reported.
Related and (partial) fixed issue:
- [1]
- [2]
- [3]
Signed-off-by: Markus Heiser <>
This patch implements a simple JSONEncoder just to fix #2502 / in the long term
SearXNG needs a data schema for the result items and a json generator for the
result list.
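For illustration, a simple encoder of this kind could look like the sketch below. The class name ``ResultEncoder`` and the set of handled types (datetime, timedelta, set) are assumptions; the text does not say which types the patch actually covers.

```python
import json
from datetime import datetime, timedelta


class ResultEncoder(json.JSONEncoder):
    """Hypothetical encoder for values the json module cannot serialize."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        if isinstance(o, timedelta):
            return o.total_seconds()
        if isinstance(o, (set, frozenset)):
            # Sort by string form to get a stable order in the output.
            return sorted(o, key=str)
        # Let the base class raise TypeError for everything else.
        return super().default(o)


# Made-up result item for the usage example.
result = {"publishedDate": datetime(2023, 1, 2, 3, 4, 5), "tags": {"b", "a"}}
dumped = json.dumps(result, cls=ResultEncoder)
```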
Signed-off-by: Markus Heiser <>
Over the years the webapp module became more and more of a mess. To improve the
modularization a little this patch moves some implementations from the webapp
module to webutils module.
HINT: this patch brings no functional change
Signed-off-by: Markus Heiser <>
A blocklist and a passlist can be configured in /etc/searxng/limiter.toml::
pass_ip = [
'', # IPv4 of
block_ip = [
'', # IPv4 of
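How such lists might be evaluated is sketched below with the standard ``ipaddress`` module. This is an assumption-laden illustration, not the limiter's code: the networks are documentation ranges chosen for the example, the function name ``check_lists`` is hypothetical, and checking the passlist before the blocklist is an assumed order.

```python
from ipaddress import ip_address, ip_network

# Example values only (documentation address ranges), not real settings.
PASS_IP = [ip_network("192.0.2.0/24")]
BLOCK_IP = [ip_network("198.51.100.7/32"), ip_network("2001:db8::/32")]


def check_lists(client_ip: str):
    """Return True (pass), False (block) or None (IP is in neither list)."""
    addr = ip_address(client_ip)
    # Assumed order: the passlist wins over the blocklist.
    if any(addr in net for net in PASS_IP):
        return True
    if any(addr in net for net in BLOCK_IP):
        return False
    return None
```

A ``None`` result would leave the decision to the remaining checks.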
Signed-off-by: Markus Heiser <>
The monolithic implementation of the limiter was divided into methods and
implemented in the Python package searx.botdetection. Detailed documentation on
the methods has been added.
The methods are divided into two groups:
1. Probe HTTP headers
- Method http_accept
- Method http_accept_encoding
- Method http_accept_language
- Method http_connection
- Method http_user_agent
2. Rate limit:
- Method ip_limit
- Method link_token (new)
The (reduced) implementation of the limiter is now in the module
searx.botdetection.limiter. The first group was transferred unchanged to this
module. The ip_limit contains the sliding windows implemented by the limiter so
This merge also fixes some long outstanding issues:
- limiter does not evaluate the Accept-Language correctly [1]
- limiter needs an IPv6 prefix to block networks instead of IPs [2]
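The idea of blocking by prefix can be sketched with the ``ipaddress`` module: the rate-limit key is the network of the client, not the single address. The prefix lengths (/32 for IPv4, /48 for IPv6) and the name ``network_of`` are assumptions for illustration.

```python
from ipaddress import ip_address, ip_network

IPV4_PREFIX = 32  # assumed: every IPv4 address is counted on its own
IPV6_PREFIX = 48  # assumed: IPv6 clients are counted per /48 network


def network_of(client_ip: str):
    """Return the network that serves as rate-limit key for ``client_ip``."""
    addr = ip_address(client_ip)
    prefix = IPV4_PREFIX if addr.version == 4 else IPV6_PREFIX
    # strict=False masks the host bits instead of raising ValueError.
    return ip_network(f"{addr}/{prefix}", strict=False)
```

Two addresses from the same IPv6 /48 then share one counter, so rotating addresses inside that network does not reset the limit.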
Without additional configuration the limiter works as before (apart from the
bugfixes). To enable additional methods (link_token), a
configuration must be made in an additional configuration file. Without this
configuration, the limiter runs as before (zero configuration).
The ip_limit method implements the sliding windows of the vanilla limiter,
additionally the link_token method can be used in this method. The link_token
method can be used to investigate whether a request is suspicious. To activate
the link_token method in the ip_limit method add the following to your
link_token = true
HINT: this patch has no functional change / it is the preparation for following
changes and bugfixes
Over the years, the preferences template became an unmanageable beast. To make
the source code more readable the monolith is split into elements. The
splitting into elements also has the advantage that a new template can make use
of them.
The reversed checkbox is a quirk that is only used in the preferences and must be
eliminated in the long term. For this the macro 'checkbox_onoff_reversed' was
added to the preferences.html template. The 'checkbox' macro is also a quirk of
the preferences.html we don't want to use in other templates (it is an
input-checkbox in a HTML form that was misused for status display).
Signed-off-by: Markus Heiser <>
In my tests I see bots rotating IPs (with endless IP lists). If such a bot has
100 IPs and has three attempts (SUSPICIOUS_IP_MAX = 3) then it can successfully
send up to 300 requests in one day while rotating the IP. To block the bots for
a longer period of time the SUSPICIOUS_IP_WINDOW, as the time period in which an
IP is observed, must be increased.
For normal WEB-browsers this is no problem, because the SUSPICIOUS_IP_WINDOW is
deleted as soon as the CSS with the token is loaded.
SUSPICIOUS_IP_WINDOW = 3600 * 24 * 30
Time (sec) before sliding window for one suspicious IP expires.
Maximum requests from one suspicious IP in the :py:obj:`SUSPICIOUS_IP_WINDOW`."""
Signed-off-by: Markus Heiser <>
For correct determination of the IP of the request, the function
botdetection.get_real_ip() is implemented. This function is used in the
ip_limit and link_token methods of the botdetection and in the
self_info plugin.
Documentation about the X-Forwarded-For header has been added.
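A common way to derive the client IP from X-Forwarded-For is sketched below. It is not the implementation of botdetection.get_real_ip(): the name ``real_ip_sketch``, the ``trusted_proxies`` parameter and the fallback to an ``X-Real-IP`` header are assumptions, and the headers are passed as a plain (case-sensitive) dict to keep the example self-contained.

```python
def real_ip_sketch(headers: dict, remote_addr: str, trusted_proxies: int = 1) -> str:
    """Pick the client IP, trusting only entries added by our own proxies."""
    forwarded = headers.get("X-Forwarded-For", "")
    chain = [part.strip() for part in forwarded.split(",") if part.strip()]
    # Each proxy appends the address of the peer it received the request from,
    # so with N trusted proxies the client is the N-th entry from the right.
    # Entries further left can be forged by the client.
    if len(chain) >= trusted_proxies > 0:
        return chain[-trusted_proxies]
    return headers.get("X-Real-IP") or remote_addr
```

Taking the leftmost entry instead would let a client pick its own "real" IP by sending a forged header.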
Signed-off-by: Markus Heiser <>
- counting requests in LONG_WINDOW and BURST_WINDOW is not needed when the
request is validated by the link_token method [1]
- renew a ping-key on validation [2], this is needed for infinite scrolling,
where no new token (CSS) is loaded. / this does not fix the BURST_MAX issue in
the vanilla limiter
- normalize the counter names of the ip_limit method to 'ip_limit.*'
- just integrate the ip_limit method straight forward in the limiter plugin /
no intermediate code --> ip_limit now returns None or a werkzeug.Response
object that can be passed by the plugin to the flask application / no
intermediate code that returns a tuple
Signed-off-by: Markus Heiser <>
To intercept bots that get their IPs from a range of IPs, there is a
``SUSPICIOUS_IP_WINDOW``. In this window the suspicious IPs are stored for a
longer time. IPs stored in this sliding window have a maximum of
``SUSPICIOUS_IP_MAX`` accesses before they are blocked. As soon as the IP makes
a request that is not suspicious, the sliding window for this IP is dropped.
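The behaviour described above can be sketched with an in-memory stand-in. The real limiter relies on Redis; the class ``SuspiciousIPs``, its method names and the injectable clock are assumptions for illustration, and the two constants use values mentioned elsewhere in these notes.

```python
import time

SUSPICIOUS_IP_WINDOW = 3600 * 24 * 30  # seconds; value taken from a later note
SUSPICIOUS_IP_MAX = 3                  # value taken from a later note


class SuspiciousIPs:
    """In-memory sliding window per IP (illustrative stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = {}  # ip -> timestamps of suspicious requests

    def record_suspicious(self, ip: str) -> bool:
        """Count a suspicious request; return True if the IP is to be blocked."""
        now = self._clock()
        # Keep only the requests that are still inside the window.
        hits = [t for t in self._hits.get(ip, []) if now - t < SUSPICIOUS_IP_WINDOW]
        hits.append(now)
        self._hits[ip] = hits
        return len(hits) > SUSPICIOUS_IP_MAX

    def record_clean(self, ip: str) -> None:
        """A non-suspicious request drops the window of this IP."""
        self._hits.pop(ip, None)
```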
Signed-off-by: Markus Heiser <>
To activate the ``link_token`` method in the ``ip_limit`` method add the
following to your ``/etc/searxng/limiter.toml``::
link_token = true
Signed-off-by: Markus Heiser <>
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.
This patch does not contain functional change, except it fixes issue #2455
Activate the limiter in settings.yml and simulate a bot request by::
curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
-H 'Accept: text/html' \
-H 'User-Agent: xyz' \
-H 'Accept-Encoding: gzip' \
In the LOG:
DEBUG searx.botdetection.link_token : missing ping for this request: .....
Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two times
before you get a "Too Many Requests" response.
Signed-off-by: Markus Heiser <>
If there were no results but errors in the engines then the error dialogs of the
engines were displayed in the result list.
With the new design errors of the engines should only be displayed in the
sidebar and at the same time duplications of the (template) code will be
Signed-off-by: Markus Heiser <>
* set border top and bottom on sidebar collapsibles
* increase padding on summary so it's easier to click on mobile
* remove margins and add flex wrapper to normalize elements in sidebar
Make elements in the sidebar collapsible. Except infoboxes all elements in
the sidebar are collapsed by default.
By collapsing the sidebar elements, the UI looks less cluttered. Especially on
small devices like smartphones, where the sidebar is above the results list, the
UX should be improved [1].
Signed-off-by: Markus Heiser <>
Block requests from PetalBot. Normally robots.txt is enough to stop
PetalBot from making requests [1]. However, if SearXNG is offered below a
path (, then the robots.txt is not available in the root
paths of the domain / subdomain.
Signed-off-by: Markus Heiser <>
Wikipedia's zh-classical is not zh_Hant (see doc-string of engines.wikipedia).
Fixed the example in the doc-string of locales.get_engine_locale() to 'zh_TW'.
Signed-off-by: Markus Heiser <>
To set the language from language recognition and hold the value selected by the
client, the previous implementation creates a copy of the SearchQuery object and
manipulates the SearchQuery object by calling function replace_auto_language().
This patch tries to implement a similar functionality in a more central place,
in function get_search_query_from_webapp() when the SearchQuery object is built
Additionally, this patch uses the language preferred by the client, if language
recognition does not have a match / the existing implementation does not care
about client preferences and uses 'all' in case of no match.
Signed-off-by: Markus Heiser <>
Follow up of #2269
The script to update the descriptions of the engines no longer works since
PR #2269 has been merged.
1. There was a misusage of
- `zh-classical` is dedicated to classical Chinese [1] which is not
traditional Chinese [2].
- has LanguageConverter enabled [3] and is going to
dynamically show simplified or traditional Chinese according to the
HTTP Accept-Language header.
2. The needs a list of all wikipedias. The
implementation from #2269 included only a reduced list:
Before PR #2269 there was a match_language() function that did an approximation
using various methods. With PR #2269 there are only the types in the data model
of the languages, which can be recognized by babel. The approximation methods,
which are needed (only here) in the determination of the descriptions, must be
replaced by other methods.
Signed-off-by: Markus Heiser <>
Since [bb3a01f8] has been merged to the Farside project, Farside instances no
longer need to send requests to SearXNG instances [1].
There are some old unmaintained Farside instances on the web that continue to
query SearXNG instances --> we can safely block their requests.
Signed-off-by: Markus Heiser <>
When the user presses [TAB] the input form should be filled with the highlighted
item from the autocomplete list, but no search should be triggered / in other words:
what we now have by pressing once on [ENTER] should be mapped to the [TAB] key
and pressing [ENTER] once should trigger a search query. [1]
Signed-off-by: Markus Heiser <>
- Update input when selecting autocomplete prediction with keyboard
- Search immediately by pressing enter key
- Search immediately by clicking on an autocomplete suggestion
On some result items from Bing-WEB the `<span class='algoSlug_icon'>` tag is the
only tag that contains a description. The issue can be reproduced by [1]::
!bi vmware
Reported-by: @AlyoshaVasilieva
Signed-off-by: Markus Heiser <>
This PR makes no functional change; it is just an attempt to make clearer in
the code what a default category is and what a subcategory is. The previous
name 'others' leads to confusion with the **category 'other'**.
If an engine is not assigned to a category, the default is assigned::
If an engine has only one category and this category is shown as tab in the user
interface, this engine has no further subgrouping::
NO_SUBGROUPING = 'without further subgrouping'
Signed-off-by: Markus Heiser <>
When using ``use_default_settings: true``, removing default categories from
settings.yml will not remove them from the UI.
The value ``categories_as_tabs`` is a dictionary type (a4c2cfb) and dictionary
types are merged additively by ``settings_loader.update_settings()``.
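Why an additive merge cannot remove entries is shown in the sketch below. ``update_additive`` is a simplified stand-in written for this example, not the code of ``settings_loader.update_settings()``, and the category names are made up.

```python
def update_additive(defaults: dict, user: dict) -> dict:
    """Recursively merge ``user`` into ``defaults``; keys are only added or overwritten."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = update_additive(merged[key], value)
        else:
            merged[key] = value
    return merged


# Made-up settings: the user config leaves out the "music" tab.
defaults = {"categories_as_tabs": {"general": {}, "images": {}, "music": {}}}
user = {"categories_as_tabs": {"general": {}, "images": {}}}

# Additive merge: "music" from the defaults survives.
additive = update_additive(defaults, user)

# Replacing the whole dictionary: only the user's tabs remain.
replaced = {**defaults, "categories_as_tabs": user["categories_as_tabs"]}
```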
This patch replaces the default ``categories_as_tabs`` by the one from the
Signed-off-by: Markus Heiser <>
- requests without HTTP header 'Connection' or missing 'User-Agent' will be
blocked by the limiter
- re_bot is related to 'User-Agent' and has been renamed to block_user_agent
Signed-off-by: Markus Heiser <>
Google-News returns internal links where the origin URL is encoded in a
base64 (RFC 2045 aka URL-safe) string.
Signed-off-by: Markus Heiser <>
In debug mode more detailed logging is needed to evaluate if an access should
have been blocked by the limiter.
BTW: remove duplicate code checking bot signature ``re_bot.match(user_agent)``
Signed-off-by: Markus Heiser <>
Since 28 March, Google has changed its response; this patch fixes the Google
engine to scrape the results & images from the newly designed response.
Signed-off-by: Markus Heiser <>
This patch replaces the *full of magic* ``utils.match_language`` function by a
``locales.match_locale``. The ``locales.match_locale`` function is based on the
``locales.build_engine_locales`` introduced in 9ae409a0 [1].
In the past SearXNG only supported a search by language but not by region.
This was changed a long time ago and regions have been added to SearXNG
core but not to the engines. The ``utils.match_language`` was the function to
handle the different aspects of language/regions in SearXNG core and the
supported *languages* in the engine. The ``utils.match_language`` did it with
some magic and works well for most use cases but fails in some edge cases.
To replace the concurrence of languages and regions in the SearXNG core the
``locales.build_engine_locales`` was introduced in 9ae409a0 [1]. With the last
patches all engines have been migrated to a ``fetch_traits`` and a
language/region concept that is based on ``locales.build_engine_locales``.
To summarize: there is no longer a need for the ``utils.match_language``.
Signed-off-by: Markus Heiser <>
All engines have been migrated from ``supported_languages`` to the
``fetch_traits`` concept. There is no longer a need for the obsolete code that
implements the ``supported_languages`` concept.
Signed-off-by: Markus Heiser <>
re-implementation of the Archlinux Wiki:
- fetch_traits(): fetch languages, wiki URLs and title arguments
- add content field to the result list
- add documentation
Wikis from,, no longer
exist (they have been merged into the main wiki).
Signed-off-by: Markus Heiser <>
- fetch_traits() SepiaSearch and Peertube are using identical languages.
Replace module's dictionary `supported_languages` by `engine.traits.languages`
(data_type: `traits_v1`).
- fixed code to pass pylint
- request(): add argument boostLanguages
- response(): is replaced by peertube's video_response() function, which adds
metadata from channel name, host & tags
Signed-off-by: Markus Heiser <>
- fetch_traits(): fetch locales (and languages) from dailymotion API
- removed obsolete data-type "supported_languages"
- add documentation
- improved argument list of the HTTP request:
- add argument: family_filter_map
- add conditional argument: localization
Don't add localization and country arguments if the user does select a
language (:de, :en, ..)
- improve code quality (mainly improve readability)
Signed-off-by: Markus Heiser <>