searxng/searx/engines/deviantart.py

# SPDX-License-Identifier: AGPL-3.0-or-later
# lint: pylint
"""Deviantart (Images)

Scrapes the HTML search result pages of https://www.deviantart.com; the
official API is not used.
"""

import urllib.parse
from lxml import html

from searx.utils import extract_text, eval_xpath, eval_xpath_list

# about
about = {
    "website": 'https://www.deviantart.com/',
    "wikidata_id": 'Q46523',
    "official_api_documentation": 'https://www.deviantart.com/developers/',
    "use_official_api": False,
    "require_api_key": False,
    "results": 'HTML',
}

# engine dependent config
categories = ['images']
paging = True

# search-url
base_url = 'https://www.deviantart.com'

results_xpath = '//div[@class="_2pZkk"]/div/div/a'
url_xpath = './@href'
thumbnail_src_xpath = './div/img/@src'
img_src_xpath = './div/img/@srcset'
title_xpath = './@aria-label'
premium_xpath = '../div/div/div/text()'
premium_keytext = 'Watch the artist to view this deviation'
cursor_xpath = '(//a[@class="_1OGeq"]/@href)[last()]'


def request(query, params):

    # https://www.deviantart.com/search?q=foo

    nextpage_url = params['engine_data'].get('nextpage')
    # don't use nextpage when user selected to jump back to page 1
    if params['pageno'] > 1 and nextpage_url is not None:
        params['url'] = nextpage_url
    else:
        params['url'] = f"{base_url}/search?{urllib.parse.urlencode({'q': query})}"

    return params


def response(resp):

    results = []
    dom = html.fromstring(resp.text)

    for result in eval_xpath_list(dom, results_xpath):
        # skip images that are blurred
        _text = extract_text(eval_xpath(result, premium_xpath))
        if _text and premium_keytext in _text:
            continue
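        img_src = extract_text(eval_xpath(result, img_src_xpath))
        if img_src:
            # srcset may list several candidates; keep only the first URL
            img_src = img_src.split(' ')[0]
            # cut the path at '/v1' to drop the segments that follow it
            parsed_url = urllib.parse.urlparse(img_src)
            img_src = parsed_url._replace(path=parsed_url.path.split('/v1')[0]).geturl()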

        results.append(
            {
                'template': 'images.html',
                'url': extract_text(eval_xpath(result, url_xpath)),
                'img_src': img_src,
                'thumbnail_src': extract_text(eval_xpath(result, thumbnail_src_xpath)),
                'title': extract_text(eval_xpath(result, title_xpath)),
            }
        )

    # hand the next-page link to the follow-up request() via engine_data
    nextpage_url = extract_text(eval_xpath(dom, cursor_xpath))
    if nextpage_url:
        results.append(
            {
                'engine_data': nextpage_url.replace("http://", "https://"),
                'key': 'nextpage',
            }
        )

    return results
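

# Illustrative sketch, not part of the upstream engine: the srcset clean-up
# that response() performs inline, pulled out into a helper so it can be
# exercised on its own.  The helper name `_strip_srcset_transform` and the
# sample URL in the usage note are assumptions made for illustration only;
# the helper relies on the module-level `import urllib.parse`.
def _strip_srcset_transform(srcset):
    """Return the first URL of ``srcset`` with its path cut at ``/v1``."""
    # a srcset value is a list of "<url> <descriptor>" candidates
    first_url = srcset.split(' ')[0]
    parsed_url = urllib.parse.urlparse(first_url)
    # keep scheme, host and query string; only the path is shortened
    return parsed_url._replace(path=parsed_url.path.split('/v1')[0]).geturl()


# Usage (hypothetical input): passing
#   'https://images.example.org/f/abc/pic.jpg/v1/fill/w_300,h_200/pic.jpg?token=xyz 1x'
# drops the ' 1x' descriptor and every path segment from '/v1' onwards.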