[doc] documentation of the favicons infrastructure

Run ``make docs.live`` and visit http://0.0.0.0:8000/admin/searx.favicons.html

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This commit is contained in:
Markus Heiser 2024-09-20 18:08:40 +02:00 committed by Markus Heiser
parent 5ded9ada82
commit a7d02d4101
6 changed files with 264 additions and 16 deletions

View file

@ -15,6 +15,7 @@ Administrator documentation
installation-apache
update-searxng
answer-captcha
searx.favicons
searx.limiter
api
architecture

View file

@ -0,0 +1,251 @@
.. _favicons:
========
Favicons
========
.. sidebar:: warning
Don't activate the favicons before reading the documentation.
.. contents::
:depth: 2
:local:
:backlinks: entry
Activating the favicons in SearXNG is very easy, but this **generates a
significantly higher load** in the client/server communication and increases
resources needed on the server.
To mitigate these disadvantages, various methods have been implemented,
including a *cache*. The cache must be parameterized according to your own
requirements and maintained regularly.
To activate favicons in SearXNG's result list, set a default
``favicon_resolver`` in the :ref:`search <settings search>` settings:
.. code:: yaml
search:
favicon_resolver: "duckduckgo"
By default and without any extensions, SearXNG serves these resolvers:
- ``duckduckgo``
- ``allesedv``
- ``google``
- ``yandex``
With the above setting favicons are displayed, the user has the option to
deactivate this feature in his settings. If the user is to have the option of
selecting from several *resolvers*, a further setting is required / but this
setting will be discussed :ref:`later <register resolvers>` in this article,
first we have to setup the favicons cache.
Infrastructure
==============
The infrastructure for providing the favicons essentially consists of three
parts:
- :py:obj:`Favicons-Proxy <.favicons.proxy>` (aka *proxy*)
- :py:obj:`Favicons-Resolvers <.favicons.resolvers>` (aka *resolver*)
- :py:obj:`Favicons-Cache <.favicons.cache>` (aka *cache*)
To protect the privacy of users, the favicons are provided via a *proxy*. This
*proxy* is automatically activated with the above activation of a *resolver*.
Additional requests are required to provide the favicons: firstly, the *proxy*
must process the incoming requests and secondly, the *resolver* must make
outgoing requests to obtain the favicons from external sources.
A *cache* has been developed to massively reduce both, incoming and outgoing
requests. This *cache* is also activated automatically with the above
activation of a *resolver*. In its defaults, however, the *cache* is minimal
and not well suitable for a production environment!
.. _favicon cache setup:
Setting up the cache
====================
To parameterize the *cache* and more settings of the favicons infrastructure, a
TOML_ configuration is created in the file ``/etc/searxng/favicons.toml``.
.. code:: toml
[favicons]
cfg_schema = 1 # config's schema version no.
[favicons.cache]
db_url = "/var/cache/searxng/faviconcache.db" # default: "/tmp/faviconcache.db"
LIMIT_TOTAL_BYTES = 2147483648 # 2 GB / default: 50 MB
# HOLD_TIME = 5184000 # 60 days / default: 30 days
# BLOB_MAX_BYTES = 40960 # 40 KB / default 20 KB
# MAINTENANCE_MODE = "off" # default: "auto"
# MAINTENANCE_PERIOD = 600 # 10min / default: 1h
:py:obj:`cfg_schema <.FaviconConfig.cfg_schema>`:
Is required to trigger any processes required for future upgrades / don't
change it.
:py:obj:`cache.db_url <.FaviconCacheConfig.db_url>`:
The path to the (SQLite_) database file. The default path is in the `/tmp`_
folder, which is deleted on every reboot and is therefore unsuitable for a
production environment. The FHS_ provides the folder for the
application cache
The FHS_ provides the folder `/var/cache`_ for the cache of applications, so a
suitable storage location of SearXNG's caches is folder ``/var/cache/searxng``.
In container systems, a volume should be mounted for this folder and in a
standard installation (compare :ref:`create searxng user`), the folder must be
created and the user under which the SearXNG process is running must be given
write permission to this folder.
.. code:: bash
$ sudo mkdir /var/cache/searxng
$ sudo chown root:searxng /var/cache/searxng/
$ sudo chmod g+w /var/cache/searxng/
:py:obj:`cache.LIMIT_TOTAL_BYTES <.FaviconCacheConfig.LIMIT_TOTAL_BYTES>`:
Maximum of bytes stored in the cache of all blobs. The limit is only reached
at each maintenance interval after which the oldest BLOBs are deleted; the
limit is exceeded during the maintenance period.
.. attention::
If the maintenance period is too long or maintenance is switched
off completely, the cache grows uncontrollably.
SearXNG hosters can change other parameters of the cache as required:
- :py:obj:`cache.HOLD_TIME <.FaviconCacheConfig.HOLD_TIME>`
- :py:obj:`cache.BLOB_MAX_BYTES <.FaviconCacheConfig.BLOB_MAX_BYTES>`
Maintenance of the cache
------------------------
Regular maintenance of the cache is required! By default, regular maintenance
is triggered automatically as part of the client requests:
- :py:obj:`cache.MAINTENANCE_MODE <.FaviconCacheConfig.MAINTENANCE_MODE>` (default ``auto``)
- :py:obj:`cache.MAINTENANCE_PERIOD <.FaviconCacheConfig.MAINTENANCE_PERIOD>` (default ``6000`` / 1h)
As an alternative to maintenance as part of the client request process, it is
also possible to carry out maintenance using an external process. For example,
by creating a :man:`crontab` entry for maintenance:
.. code:: bash
$ python -m searx.favicons cache maintenance
The following command can be used to display the state of the cache:
.. code:: bash
$ python -m searx.favicons cache state
.. _favicon proxy setup:
Proxy configuration
===================
Most of the options of the :py:obj:`Favicons-Proxy <.favicons.proxy>` are
already set sensibly with settings from the :ref:`settings.yml <searxng
settings.yml>` and should not normally be adjusted.
.. code:: toml
[favicons.proxy]
max_age = 5184000 # 60 days / default: 7 days (604800 sec)
:py:obj:`max_age <.FaviconProxyConfig.max_age>`:
The `HTTP Cache-Control max-age`_ response directive indicates that the
response remains fresh until N seconds after the response is generated. This
setting therefore determines how long a favicon remains in the client's cache.
As a rule, in the favicons infrastructure of SearXNG's this setting only
affects favicons whose byte size exceeds :ref:`BLOB_MAX_BYTES <favicon cache
setup>` (the other favicons that are already in the cache are embedded as
`data URL`_ in the :py:obj:`generated HTML <.favicons.proxy.favicon_url>`,
which can greatly reduce the number of additional requests).
.. _register resolvers:
Register resolvers
------------------
A :py:obj:`resolver <.favicon.resolvers>` is a function that obtains the favicon
from an external source. The resolver functions available to the user are
registered with their fully qualified name (FQN_) in a ``resolver_map``.
If no ``resolver_map`` is defined in the ``favicon.toml``, the favicon
infrastructure of SearXNG generates this ``resolver_map`` automatically
depending on the ``settings.yml``. SearXNG would automatically generate the
following TOML configuration from the following YAML configuration:
.. code:: yaml
search:
favicon_resolver: "duckduckgo"
.. code:: toml
[favicons.proxy.resolver_map]
"duckduckgo" = "searx.favicons.resolvers.duckduckgo"
If this automatism is not desired, then (and only then) a separate
``resolver_map`` must be created. For example, to give the user two resolvers to
choose from, the following configuration could be used:
.. code:: toml
[favicons.proxy.resolver_map]
"duckduckgo" = "searx.favicons.resolvers.duckduckgo"
"allesedv" = "searx.favicons.resolvers.allesedv"
# "google" = "searx.favicons.resolvers.google"
# "yandex" = "searx.favicons.resolvers.yandex"
.. note::
With each resolver, the resource requirement increases significantly.
The number of resolvers increases:
- the number of incoming/outgoing requests and
- the number of favicons to be stored in the cache.
In the following we list the resolvers available in the core of SearXNG, but via
the FQN_ it is also possible to implement your own resolvers and integrate them
into the *proxy*:
- :py:obj:`searx.favicons.resolvers.duckduckgo`
- :py:obj:`searx.favicons.resolvers.allesedv`
- :py:obj:`searx.favicons.resolvers.google`
- :py:obj:`searx.favicons.resolvers.yandex`
.. _SQLite:
https://www.sqlite.org/
.. _FHS:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html
.. _`/var/cache`:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s05.html
.. _`/tmp`:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s18.html
.. _TOML:
https://toml.io/en/
.. _HTTP Cache-Control max-age:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#response_directives
.. _data URL:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
.. _FQN: https://en.wikipedia.org/wiki/Fully_qualified_name

View file

@ -43,13 +43,9 @@
- ``wikipedia``
``favicon_resolver``:
:ref:`Favicon resolver <favicons>`, leave blank to turn off the feature by
default.
- ``allesedv``
- ``duckduckgo``
- ``google``
- ``yandex``
To activate favicons in SearXNG's result list select a default
favicon-resolver, leave blank to turn off the feature. Don't activate the
favicons before reading the :ref:`Favicons documentation <favicons>`.
``default_lang``:
Default search language - leave blank to detect from browser information or

View file

@ -1,8 +1,8 @@
.. _favicons:
.. _favicons source:
========
Favicons
========
=================
Favicons (source)
=================
.. contents::
:depth: 2

View file

@ -27,7 +27,7 @@ def init():
if not cfg_file.exists():
if is_active():
logger.error(f"missing favicon config: {cfg_file}")
cfg_file = config.DEFAULT_CFG_TOML
cfg_file = config.DEFAULT_CFG_TOML_PATH
logger.debug(f"load favicon config: {cfg_file}")
cfg = config.FaviconConfig.from_toml_file(cfg_file, use_cache=True)

View file

@ -13,10 +13,10 @@ from .proxy import FaviconProxyConfig
CONFIG_SCHEMA: int = 1
"""Version of the configuration schema."""
TOML_CACHE: dict[str, "FaviconConfig"] = {}
TOML_CACHE_CFG: dict[str, "FaviconConfig"] = {}
"""Cache config objects by TOML's filename."""
DEFAULT_CFG_TOML = pathlib.Path(__file__).parent / "favicons.toml"
DEFAULT_CFG_TOML_PATH = pathlib.Path(__file__).parent / "favicons.toml"
class FaviconConfig(BaseModel):
@ -40,7 +40,7 @@ class FaviconConfig(BaseModel):
specifies whether a cache should be used.
"""
cached = TOML_CACHE.get(str(cfg_file))
cached = TOML_CACHE_CFG.get(str(cfg_file))
if use_cache and cached:
return cached
@ -57,6 +57,6 @@ class FaviconConfig(BaseModel):
cfg = cls(**cfg)
if use_cache and cached:
TOML_CACHE[str(cfg_file.resolve())] = cfg
TOML_CACHE_CFG[str(cfg_file.resolve())] = cfg
return cfg