searxng/docs/admin/searx.favicons.rst
Markus Heiser a7d02d4101 [doc] documentation of the favicons infrastructure
Run ``make docs.live`` and visit http://0.0.0.0:8000/admin/searx.favicons.html

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-05 08:18:28 +02:00

8.6 KiB

Favicons

warning

Don't activate the favicons before reading the documentation.

Activating the favicons in SearXNG is very easy, but this generates a significantly higher load in the client/server communication and increases resources needed on the server.

To mitigate these disadvantages, various methods have been implemented, including a cache. The cache must be parameterized according to your own requirements and maintained regularly.

To activate favicons in SearXNG's result list, set a default favicon_resolver in the search <settings search> settings:

search:
  favicon_resolver: "duckduckgo"

By default and without any extensions, SearXNG serves these resolvers:

  • duckduckgo
  • allesedv
  • google
  • yandex

With the above setting favicons are displayed, the user has the option to deactivate this feature in his settings. If the user is to have the option of selecting from several resolvers, a further setting is required / but this setting will be discussed later <register resolvers> in this article, first we have to setup the favicons cache.

Infrastructure

The infrastructure for providing the favicons essentially consists of three parts:

  • :pyFavicons-Proxy <.favicons.proxy> (aka proxy)
  • :pyFavicons-Resolvers <.favicons.resolvers> (aka resolver)
  • :pyFavicons-Cache <.favicons.cache> (aka cache)

To protect the privacy of users, the favicons are provided via a proxy. This proxy is automatically activated with the above activation of a resolver. Additional requests are required to provide the favicons: firstly, the proxy must process the incoming requests and secondly, the resolver must make outgoing requests to obtain the favicons from external sources.

A cache has been developed to massively reduce both, incoming and outgoing requests. This cache is also activated automatically with the above activation of a resolver. In its defaults, however, the cache is minimal and not well suitable for a production environment!

Setting up the cache

To parameterize the cache and more settings of the favicons infrastructure, a TOML configuration is created in the file /etc/searxng/favicons.toml.

[favicons]

cfg_schema = 1   # config's schema version no.

[favicons.cache]

db_url = "/var/cache/searxng/faviconcache.db"  # default: "/tmp/faviconcache.db"
LIMIT_TOTAL_BYTES = 2147483648                 # 2 GB / default: 50 MB
# HOLD_TIME = 5184000                            # 60 days / default: 30 days
# BLOB_MAX_BYTES = 40960                         # 40 KB / default 20 KB
# MAINTENANCE_MODE = "off"                       # default: "auto"
# MAINTENANCE_PERIOD = 600                       # 10min / default: 1h
:pycfg_schema <.FaviconConfig.cfg_schema>:

Is required to trigger any processes required for future upgrades / don't change it.

:pycache.db_url <.FaviconCacheConfig.db_url>:

The path to the (SQLite) database file. The default path is in the /tmp folder, which is deleted on every reboot and is therefore unsuitable for a production environment. The FHS provides the folder for the application cache

The FHS provides the folder /var/cache for the cache of applications, so a suitable storage location of SearXNG's caches is folder /var/cache/searxng. In container systems, a volume should be mounted for this folder and in a standard installation (compare create searxng user), the folder must be created and the user under which the SearXNG process is running must be given write permission to this folder.

$ sudo mkdir /var/cache/searxng
$ sudo chown root:searxng /var/cache/searxng/
$ sudo chmod g+w /var/cache/searxng/
:pycache.LIMIT_TOTAL_BYTES <.FaviconCacheConfig.LIMIT_TOTAL_BYTES>:

Maximum of bytes stored in the cache of all blobs. The limit is only reached at each maintenance interval after which the oldest BLOBs are deleted; the limit is exceeded during the maintenance period.

Attention

If the maintenance period is too long or maintenance is switched off completely, the cache grows uncontrollably.

SearXNG hosters can change other parameters of the cache as required:

  • :pycache.HOLD_TIME <.FaviconCacheConfig.HOLD_TIME>
  • :pycache.BLOB_MAX_BYTES <.FaviconCacheConfig.BLOB_MAX_BYTES>

Maintenance of the cache

Regular maintenance of the cache is required! By default, regular maintenance is triggered automatically as part of the client requests:

  • :pycache.MAINTENANCE_MODE <.FaviconCacheConfig.MAINTENANCE_MODE> (default auto)
  • :pycache.MAINTENANCE_PERIOD <.FaviconCacheConfig.MAINTENANCE_PERIOD> (default 6000 / 1h)

As an alternative to maintenance as part of the client request process, it is also possible to carry out maintenance using an external process. For example, by creating a crontab entry for maintenance:

$ python -m searx.favicons cache maintenance

The following command can be used to display the state of the cache:

$ python -m searx.favicons cache state

Proxy configuration

Most of the options of the :pyFavicons-Proxy <.favicons.proxy> are already set sensibly with settings from the settings.yml <searxng settings.yml> and should not normally be adjusted.

[favicons.proxy]

max_age = 5184000             # 60 days / default: 7 days (604800 sec)
:pymax_age <.FaviconProxyConfig.max_age>:

The HTTP Cache-Control max-age response directive indicates that the response remains fresh until N seconds after the response is generated. This setting therefore determines how long a favicon remains in the client's cache. As a rule, in the favicons infrastructure of SearXNG's this setting only affects favicons whose byte size exceeds BLOB_MAX_BYTES <favicon cache setup> (the other favicons that are already in the cache are embedded as data URL in the :pygenerated HTML <.favicons.proxy.favicon_url>, which can greatly reduce the number of additional requests).

Register resolvers

A :pyresolver <.favicon.resolvers> is a function that obtains the favicon from an external source. The resolver functions available to the user are registered with their fully qualified name (FQN) in a resolver_map.

If no resolver_map is defined in the favicon.toml, the favicon infrastructure of SearXNG generates this resolver_map automatically depending on the settings.yml. SearXNG would automatically generate the following TOML configuration from the following YAML configuration:

search:
  favicon_resolver: "duckduckgo"
[favicons.proxy.resolver_map]

"duckduckgo" = "searx.favicons.resolvers.duckduckgo"

If this automatism is not desired, then (and only then) a separate resolver_map must be created. For example, to give the user two resolvers to choose from, the following configuration could be used:

[favicons.proxy.resolver_map]

"duckduckgo" = "searx.favicons.resolvers.duckduckgo"
"allesedv" = "searx.favicons.resolvers.allesedv"
# "google" = "searx.favicons.resolvers.google"
# "yandex" = "searx.favicons.resolvers.yandex"

Note

With each resolver, the resource requirement increases significantly.

The number of resolvers increases:

  • the number of incoming/outgoing requests and
  • the number of favicons to be stored in the cache.

In the following we list the resolvers available in the core of SearXNG, but via the FQN it is also possible to implement your own resolvers and integrate them into the proxy:

  • :pysearx.favicons.resolvers.duckduckgo
  • :pysearx.favicons.resolvers.allesedv
  • :pysearx.favicons.resolvers.google
  • :pysearx.favicons.resolvers.yandex