searxng/docs/admin/filtron.rst
Markus Heiser 39feb141bc docs(admin): add description of the utils/filtron.sh script
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-01-11 12:50:40 +01:00

How to protect an instance

Searx depends on external search services. To avoid abuse of these services, it is advisable to limit the number of requests processed by searx.

An application firewall, filtron, solves exactly this problem. Filtron is a middleware between your web server (nginx, apache, ...) and searx.

filtron & go

Filtron needs Go installed. If Go is preinstalled, filtron can be installed with Go's package management, go get (see the filtron README). If you use filtron as middleware, a more isolated setup is recommended:

  1. Create a separate user account (filtron).
  2. Download and install the Go binary in the user's $HOME (~filtron).
  3. Install filtron with the package management of Go (go get -v -u github.com/asciimoo/filtron).
  4. Set up a proper rule configuration in /etc/filtron/rules.json (template: utils/templates/etc/filtron/rules.json).
  5. Set up a systemd service unit in /lib/systemd/system/filtron.service (template: utils/templates/lib/systemd/system/filtron.service).

To simplify such an installation and its maintenance, use our script utils/filtron.sh:

../utils/filtron.sh --help

Sample configuration of filtron

An example configuration can be found below. This configuration limits access by:

  • scripts or applications (roboagent limit)
  • webcrawlers (botlimit)
  • IPs which send too many requests (IP limit)
  • too many json, csv, etc. requests (rss/json limit)
  • the same User-Agent, if it sends too many requests (useragent limit)

[{
   "name":"search request",
   "filters":[
      "Param:q",
      "Path=^(/|/search)$"
   ],
   "interval":"<time-interval-in-sec (int)>",
   "limit":"<max-request-number-in-interval (int)>",
   "subrules":[
      {
         "name":"roboagent limit",
         "interval":"<time-interval-in-sec (int)>",
         "limit":"<max-request-number-in-interval (int)>",
         "filters":[
            "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
         ],
         "actions":[
            {
               "name":"block",
               "params":{
                  "message":"Rate limit exceeded"
               }
            }
         ]
      },
      {
         "name":"botlimit",
         "limit":0,
         "stop":true,
         "filters":[
            "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
         ],
         "actions":[
            {
               "name":"block",
               "params":{
                  "message":"Rate limit exceeded"
               }
            }
         ]
      },
      {
         "name":"IP limit",
         "interval":"<time-interval-in-sec (int)>",
         "limit":"<max-request-number-in-interval (int)>",
         "stop":true,
         "aggregations":[
            "Header:X-Forwarded-For"
         ],
         "actions":[
            {
               "name":"block",
               "params":{
                  "message":"Rate limit exceeded"
               }
            }
         ]
      },
      {
         "name":"rss/json limit",
         "interval":"<time-interval-in-sec (int)>",
         "limit":"<max-request-number-in-interval (int)>",
         "stop":true,
         "filters":[
            "Param:format=(csv|json|rss)"
         ],
         "actions":[
            {
               "name":"block",
               "params":{
                  "message":"Rate limit exceeded"
               }
            }
         ]
      },
      {
         "name":"useragent limit",
         "interval":"<time-interval-in-sec (int)>",
         "limit":"<max-request-number-in-interval (int)>",
         "aggregations":[
            "Header:User-Agent"
         ],
         "actions":[
            {
               "name":"block",
               "params":{
                  "message":"Rate limit exceeded"
               }
            }
         ]
      }
   ]
}]
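To make the roles of interval, limit and aggregations more concrete, the following is a toy model of a single rule such as "IP limit". It is only a sketch of the idea, not filtron's actual implementation: the class name WindowLimit, the fixed-window counting strategy, and the example values (10 seconds, 3 requests, the documentation-range IP addresses) are all assumptions made for illustration.

```python
from collections import defaultdict


class WindowLimit:
    """Toy model of one rule: at most `limit` requests per `interval`
    seconds, counted separately per aggregation key (e.g. a client IP)."""

    def __init__(self, interval, limit):
        self.interval = interval      # window length in seconds
        self.limit = limit            # max requests allowed per window
        self.window_start = 0.0
        self.counts = defaultdict(int)

    def allow(self, key, now):
        # Start a fresh window once the interval has elapsed.
        if now - self.window_start >= self.interval:
            self.window_start = now
            self.counts.clear()
        self.counts[key] += 1
        # Requests beyond the limit would trigger the rule's action ("block").
        return self.counts[key] <= self.limit


# Hypothetical values: 3 requests per 10 seconds for each client IP.
ip_limit = WindowLimit(interval=10, limit=3)

same_ip = [ip_limit.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
other_ip = ip_limit.allow("198.51.100.2", now=4)
next_window = ip_limit.allow("203.0.113.7", now=12)

print(same_ip)      # [True, True, True, False]
print(other_ip)     # True
print(next_window)  # True
```

In this model the fourth request from the same address inside one window is rejected, a different address is counted independently, and the counter starts over once the interval has passed.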

Route requests through filtron

Filtron can be started using the following command:

$ filtron -rules rules.json

It listens on 127.0.0.1:4004 and forwards filtered requests to 127.0.0.1:8888 by default.
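The sample rules above use placeholder strings for interval and limit, which are documented as integers. Before passing a rules file to -rules, a quick check along the following lines may help to spot placeholders that were left in. This is a hedged sketch: the function name non_integer_fields and the example values (60, 100) are hypothetical, and the check only looks at the interval and limit fields shown in the sample.

```python
import json


def non_integer_fields(rules, prefix=""):
    """Return descriptions of 'interval'/'limit' values that are not integers."""
    problems = []
    for index, rule in enumerate(rules):
        label = f"{prefix}{rule.get('name', index)}"
        for key in ("interval", "limit"):
            value = rule.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            if key in rule and (isinstance(value, bool) or not isinstance(value, int)):
                problems.append(f"{label}: {key}={value!r}")
        # Subrules have the same shape, so check them recursively.
        problems.extend(
            non_integer_fields(rule.get("subrules", []), prefix=label + " > ")
        )
    return problems


# Shortened example with hypothetical values; two placeholders are left in.
example = json.loads("""
[{
  "name": "search request",
  "filters": ["Param:q", "Path=^(/|/search)$"],
  "interval": 60,
  "limit": "<max-request-number-in-interval (int)>",
  "subrules": [
    {
      "name": "IP limit",
      "interval": "<time-interval-in-sec (int)>",
      "limit": 100,
      "stop": true,
      "aggregations": ["Header:X-Forwarded-For"]
    }
  ]
}]
""")

for problem in non_integer_fields(example):
    print(problem)
```

For a real file, the same function could be applied to the result of json.load on the opened rules file, for example /etc/filtron/rules.json.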

Use filtron together with nginx, as in the following example configuration:

location / {
     proxy_set_header   Host    $http_host;
     proxy_set_header   X-Real-IP $remote_addr;
     proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
     proxy_set_header   X-Scheme $scheme;
     proxy_pass         http://127.0.0.1:4004/;
}

Requests arrive on port 4004, pass through filtron, and are then forwarded to port 8888, where searx is running.
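When a request does not make it through this chain, a first step can be to check whether both local ports accept TCP connections. The sketch below assumes the default addresses mentioned above (127.0.0.1:4004 for filtron, 127.0.0.1:8888 for searx); the helper name port_open is hypothetical, and the check only tests that something is listening, not that it is filtron or searx.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Default addresses named in the text above.
for name, port in (("filtron", 4004), ("searx", 8888)):
    state = "listening" if port_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on 127.0.0.1:{port}: {state}")
```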