Commit graph

200 commits

Author SHA1 Message Date
Venca24
c9593c8ffd [enh] add plugin converting strings into hash digests 2020-10-23 21:25:10 +02:00
Alexandre Flament
2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Alexandre Flament
485a502b88 [mod] add typing and __slots__ 2020-09-22 19:05:05 +02:00
Alexandre Flament
691d12726b [mod] check the engine tokens in searx/webadapter.py instead of searx/search.py 2020-09-22 18:59:51 +02:00
Alexandre Flament
2929495112 [mod] add searx.search.EngineRef
was previously a Dict with two or three keys: name, category, from_bang
make clear that this is a engine reference (see tests/unit/test_search.py for example)
all variables using this class are renamed accordingly.
2020-09-22 18:59:51 +02:00
Alexandre Flament
2dbc0de0cd [mod] add searx/webadapter.py
* move searx.search.get_search_query_from_webapp to searx.webadapter
* move searx.query.SearchQuery to searx.search
2020-09-22 18:59:51 +02:00
Alexandre Flament
ad0758e52a [mod] add searx/webutils.py
contains utility functions and classes used only by webapp.py
2020-09-22 11:57:06 +02:00
Alexandre Flament
6deb85072a [fix] searx.utils.HTMLTextExtractor: invalid HTML don't raise an Exception
Close #2188
2020-09-13 10:28:11 +02:00
Alexandre Flament
df12ed6e55 [mod] searx.RawTextQuery: the constructor call parse_query 2020-09-12 15:25:58 +02:00
Dalf
c225db45c8 Drop Python 2 (4/n): SearchQuery.query is a str instead of bytes 2020-09-10 10:49:42 +02:00
Dalf
7888377743 Drop Python 2 (3/n): objects 2020-09-10 10:39:04 +02:00
Dalf
1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Noémi Ványi
f0ca1c3483
[enh] Add command line engines: git grep, find, etc. (#2128)
A new "base" engine called command is introduced. It is the foundation for all command line engines for now.
You can use this engine to create your own command line engine.

Add some engines (commented out to make sure no one enables anything accidentally):
* git grep: This engine lets you grep in the searx repo.
* locate: If locate is installed and initialized, you can search on the FS.
* find: You can find files with a specific name from where you started searx.
* pattern search in files: This engine utilizes the command fgrep.
* regex search in files: This engine runs `grep` to find a file based on its contents.
2020-09-08 09:51:53 +02:00
Alexandre Flament
b329058c1a Revert "[enh] test: load each engine to check for syntax errors"
This reverts commit 4fb3ed2c63.
2020-08-31 19:00:06 +02:00
Dalf
4fb3ed2c63 [enh] test: load each engine to check for syntax errors 2020-08-28 12:12:32 +02:00
Lukas van den Berk
4829a76aae
Created new plugin type custom_results. Added new plugin bang_redirect (#2027)
* Made first attempt at the bangs redirects plugin.

* It redirects. But in a messy way via javascript.

* First version with custom plugin

* Added a help page and a operator to see all the bangs available.

* Changed to .format because of support

* Changed to .format because of support

* Removed : in params

* Fixed path to json file and changed bang operator

* Changed bang operator back to &

* Made first attempt at the bangs redirects plugin.

* It redirects. But in a messy way via javascript.

* First version with custom plugin

* Added a help page and a operator to see all the bangs available.

* Changed to .format because of support

* Changed to .format because of support

* Removed : in params

* Fixed path to json file and changed bang operator

* Changed bang operator back to &

* Refactored getting search query. Also changed bang operator to ! and is now working.

* Removed prints

* Removed temporary bangs_redirect.js file. Updated plugin documentation

* Added unit test for the bangs plugin

* Fixed a unit test and added 2 more for bangs plugin

* Changed back to default settings.yml

* Added myself to AUTHORS.rst

* Refacored working of custom plugin.

* Refactored _get_bangs_data from list to dict to improve search speed.

* Decoupled bangs plugin from webserver with redirect_url

* Refactored bangs unit tests

* Fixed unit test bangs. Removed dubbel parsing in bangs.py

* Removed a dumb print statement

* Refactored bangs plugin to core engine.

* Removed bangs plugin.

* Refactored external bangs unit tests from plugin to core.

* Removed custom_results/bangs documentation from plugins.rst

* Added newline in settings.yml so the PR stays clean.

* Changed searx/plugins/__init__.py back to the old file

* Removed newline search.py

* Refactored get_external_bang_operator from utils to external_bang.py

* Removed unnecessary import form test_plugins.py

* Removed _parseExternalBang and _isExternalBang from query.py

* Removed get_external_bang_operator since it was not necessary

* Simplified external_bang.py

* Simplified external_bang.py

* Moved external_bangs unit tests to test_webapp.py. Fixed return in search with external_bang

* Refactored query parsing to unicode to support python2

* Refactored query parsing to unicode to support python2

* Refactored bangs plugin to core engine.

* Refactored search parameter to search_query in external_bang.py
2020-07-03 13:25:04 +00:00
Markus Heiser
eae3481688 [fix] commit 2c6531b2 breaks the unit test, this is a hotfix
commit 2c6531b2 does not only break the unit test, it is a significant change of
the data model and the searx search-syntax model (UI) without any discussion nor
documentation.

At the end, adding routes to instant answers is a nice feature but commit
2c6531b2 leaf some questions open.

In that sense, this patch is only a hotfix not a assessment.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-06-11 10:15:15 +02:00
Adam Tauber
8e727ac77f [fix] update csv unit test 2020-03-13 01:05:02 +01:00
Noémi Ványi
99435381a8 [enh] introduce private engines
This PR adds a new setting to engines named `tokens`.
It expects a list of tokens which lets searx validate
if the request should be accepted or not.
2020-02-08 11:47:39 +01:00
Adam Tauber
52ccaa7acc [mod] remove useless engine unit tests
These tests are not able to detect engine errors if the upstream
site changes.
2019-12-21 21:15:09 +01:00
Marc Abonce Seguin
ccaf6ca02c [fix] update xpaths for new google results page 2019-12-07 16:37:24 -07:00
Marc Abonce Seguin
9299355570 add seedpeer again 2019-11-24 22:01:44 -07:00
Adam Tauber
3c425f09c1 [fix] remove useless engine tests 2019-10-16 15:32:21 +02:00
Adam Tauber
e74bdf8429 [fix] engine test 2019-10-14 15:09:25 +02:00
Léo Bourrel
88261e111c Fix bing engine results count (#1387)
This PR fixes the result count from bing which was throwing an (hidden) error and add a validation to avoid reading more results than avalaible.

For example :
If there is 100 results from some search and we try to get results from 120 to 130, Bing will send back the results from 0 to 10 and no error. If we compare results count with the first parameter of the request we can avoid this "invalid" results.
2019-08-05 16:15:40 +02:00
Dalf
fcc9587ee9 [fix] fdroid engine 2019-08-05 15:44:02 +02:00
Dalf
9ff5001816 [fix] arxiv engine 2019-08-05 15:43:01 +02:00
Alexandre Flament
333e54943d
[fix] fix monkey patch in test_webapp.py (#1667)
at the end of test_webapp.py, the monkey patch of searx.search.Search was not revert which lead to side effects on other tests
close #1663
2019-08-03 13:23:36 +02:00
Alexandre Flament
72029d27de
[enh] Add timeout limit per request (#1640)
The new url parameter "timeout_limit" set timeout limit defined in second.
Example "timeout_limit=1.5" means the timeout limit is 1.5 seconds.

In addition, the query can start with <[number] to set the timeout limit.

For number between 0 and 99, the unit is the second :
Example: "<30 searx" means the timeout limit is 3 seconds

For number above 100, the unit is the millisecond:
Example: "<850 searx" means the timeout is 850 milliseconds.

In addition, there is a new optional setting: outgoing.max_request_timeout.
If not set, the user timeout can't go above searx configuration (as before: the max timeout of selected engine for a query).

If the value is set, the user can set a timeout between 0 and max_request_timeout using
<[number] or timeout_limit query parameter.

Related to #1077
Updated version of PR #1413 from @isj-privacore
2019-08-02 13:50:51 +02:00
Alexandre Flament
2179079a91
[fix] fix flickr_noapi decoding (#1655)
Characters that were not ASCII were incorrectly decoded.
Add an helper function: searx.utils.ecma_unescape (Python implementation of unescape Javascript function).
2019-08-02 13:37:13 +02:00
Dalf
6e0285b2db [fix] wikidata engine: faster processing, remove one HTTP redirection.
* Search URL is https://www.wikidata.org/w/index.php?{query}&ns0=1 (with ns0=1 at the end to avoid an HTTP redirection)
* url_detail: remove the disabletidy=1 deprecated parameter
* Add eval_xpath function: compile once for all xpath.
* Add get_id_cache: retrieve all HTML with an id, avoid the slow to procress dynamic xpath '//div[@id="{propertyid}"]'.replace('{propertyid}')
* Create an etree.HTMLParser() instead of using the global one (see #1575)
2019-07-29 07:39:39 +02:00
Frank de Lange
cbc5e13275 [enh] flickr_noapi: use complete JSON data block, add 'content', 'img_format', 'source', etc. (#1571)
Fetch complete JSON data block, use legend to extract images. 
Unquote urlencoded strings.
Add image description as 'content'. 
Add 'img_format' and 'source' data (needs PR #1567 to enable this data to be displayed). 
Show images which lack ownerid instead of discarding them.
2019-07-28 10:42:00 +02:00
Frank de Lange
204a2cbbf0 [fix] bing_videos (#1579)
use JSON where possible, compose 'content' using all available data, use correct 'url' (direct to source instead of redirect through bing)
2019-07-27 17:49:30 +02:00
Frank de Lange
11fc9913e9 [enh] bing_images: use data from embedded JSON to improve results (e.g. real page title) (#1568)
use data from embedded JSON to improve results (e.g. real page title), add image format and source info (see PR #1567), improve paging logic (it now works)
2019-07-27 08:22:02 +02:00
volth
eb182df132 [mod] restore btdigg engine as btdig.com (#1515) 2019-07-25 08:40:48 +02:00
rachmadani haryono
3b1122c5fa [fix] fix duden engine (#1594) 2019-07-25 08:17:45 +02:00
Alexandre Flament
554a21e1d0
[enh] Add Server-Timing header (#1637)
Server Timing specification: https://www.w3.org/TR/server-timing/

In the browser Dev Tools, focus on the main request, there are the responses per engine in the Timing tab.
2019-07-17 10:38:45 +02:00
rachmadani haryono
ec88fb8a0f [fix] secret_key can be bytes instead of a string (#1602)
Fix #1600
In settings.yml, the secret_key can be written as string or as base64 encoded data using !!binary notation.
2019-07-17 10:09:09 +02:00
rachmadani haryono
8f44014627 [fix] preference query parameter decoding (#1599)
Fix issue #1598
2019-07-17 09:42:40 +02:00
rachmadani haryono
ac357b12e3
Merge branch 'master' into feature/fix-config 2019-05-28 19:16:58 +08:00
Dalf
ffe0972f91 Remove some engines : subtitleseeker, seedpeer, swisscows
http://www.subtitleseeker.com and http://www.seedpeer.eu don't exist anymore.
https://swisscows.ch/ has change : the engine needs to be updated
2019-05-28 04:06:35 +02:00
rachmadaniHaryono
9afc1b1e83 new: dev: test for config api 2019-05-18 18:58:05 +08:00
Marc Abonce Seguin
3e1c2153f7 [fix] duckduckgo images requests 2019-04-13 00:38:37 -05:00
Marc Abonce Seguin
f2d49a6971 [fix] get youtube results from js object
Results are not appearing in the html document anymore,
instead they are found inside an object embedded in a script.
2019-03-26 21:09:15 -06:00
d-tux
f1814079f0
Merge branch 'master' into engines/unsplash 2019-01-14 13:40:57 +01:00
Marc Abonce Seguin
626a8e9ac9 [fix] unicode error with WolframAlpha API engine 2019-01-08 21:02:23 -06:00
d-tux
329172f66e
Merge branch 'master' into engines/unsplash 2019-01-08 09:24:45 +01:00
Noémi Ványi
97351a2c72 fix after rebase 2019-01-07 21:28:58 +01:00
Noémi Ványi
b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Marc Abonce Seguin
0169b63e84 [fix] fetch google's supported languages 2019-01-06 21:31:45 -06:00