zyte-spider-templates

Google search spider template (google_search)

Basic use

scrapy crawl google_search -a search_queries="foo bar"
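Each parameter documented below can be passed the same way, as an extra `-a NAME=VALUE` spider argument. The following is a minimal sketch of assembling such a command line in Python; `build_crawl_command` is a hypothetical helper written for this illustration, not part of zyte-spider-templates or Scrapy.

```python
import shlex


def build_crawl_command(spider: str, **params: object) -> str:
    """Return a shell-safe `scrapy crawl` command with one -a flag per parameter.

    Hypothetical helper for illustration; not part of any library.
    """
    args = ["scrapy", "crawl", spider]
    for name, value in params.items():
        # Scrapy receives each spider argument as a single NAME=VALUE string.
        args += ["-a", f"{name}={value}"]
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(args)


command = build_crawl_command("google_search", search_queries="foo bar", max_pages=2)
print(command)
```

Quoting the whole `NAME=VALUE` pair is equivalent, from the shell's point of view, to quoting only the value as in the command above.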

Parameters

pydantic model zyte_spider_templates.spiders.serp.GoogleSearchSpiderParams
field cr: str | None = None

Restricts search results to documents originating in particular countries. See https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#body.QUERY_PARAMETERS.cr

field domain: GoogleDomain = GoogleDomain.google_com

Target Google domain.

field geolocation: Geolocation | None = None

Country of the IP addresses to use.

field gl: GoogleGl | None = None

Boosts results relevant to this country. See https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#body.QUERY_PARAMETERS.gl

field hl: GoogleHl | None = None

User interface language, which can affect search results. See https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#body.QUERY_PARAMETERS.hl

field item_type: SerpItemType = SerpItemType.off

If specified, follow organic search result links, and extract the selected data type from the target pages. Spider output items will be of the specified data type, not search engine results page items.

field lr: str | None = None

Restricts search results to documents written in the specified languages. See https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#body.QUERY_PARAMETERS.lr
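The cr and lr fields are plain strings, so their values must already be in the format described in the linked Google reference, where country restricts look like `countryCA` and language restricts look like `lang_en`. The sketch below shows one way to build single values of that shape; both helper names are hypothetical, and the sketch does not cover the Boolean combinations that the Google reference also allows.

```python
def country_restrict(country_code: str) -> str:
    """Format a two-letter country code as a cr value such as "countryCA".

    Hypothetical helper for illustration; not part of zyte-spider-templates.
    """
    code = country_code.strip().upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"country{code}"


def language_restrict(language_code: str) -> str:
    """Format a language code as an lr value such as "lang_en".

    The code is passed through unchanged apart from surrounding whitespace.
    """
    return f"lang_{language_code.strip()}"


print(country_restrict("ca"))
print(language_restrict("en"))
```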

field max_pages: int = 1

Maximum number of result pages to visit per search query.

field max_requests: int | None = 100

The maximum number of Zyte API requests allowed for the crawl.

Requests whose error responses cannot be retried, or that exceed their retry limit, also count toward this limit, but they incur no cost and do not increase the request count in Scrapy Cloud.
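Because max_pages applies per search query while max_requests applies to the whole crawl, it can help to estimate the request count before starting. The sketch below is a rough upper bound under stated assumptions: one request per result page, one extra request per followed result link when item_type is set, no retries, and a max_requests of None treated as "no limit". The function names and the `links_per_page` argument are hypothetical and exist only for this illustration.

```python
from typing import Optional


def estimate_requests(query_count: int, max_pages: int = 1, links_per_page: int = 0) -> int:
    """Rough upper bound on requests for a crawl.

    Assumes one request per result page plus one request per followed
    result link. Leave links_per_page at 0 when result links are not followed.
    """
    result_pages = query_count * max_pages
    followed_links = result_pages * links_per_page
    return result_pages + followed_links


def fits_budget(estimate: int, max_requests: Optional[int] = 100) -> bool:
    """Return True if the estimate is within max_requests.

    Treating None as "no limit" is an assumption of this sketch.
    """
    return max_requests is None or estimate <= max_requests


serp_only = estimate_requests(query_count=5, max_pages=3)
with_links = estimate_requests(query_count=5, max_pages=3, links_per_page=10)
print(serp_only, fits_budget(serp_only))    # 15 True
print(with_links, fits_budget(with_links))  # 165 False
```

Under these assumptions, five queries at three pages each fit comfortably within the default limit of 100, but following ten links per page does not, so max_requests would need to be raised for that crawl.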

field results_per_page: int | None = None

Maximum number of results per page.

field search_queries: List[str] | None [Required]

Enter one search query per line (e.g. foo bar).
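To illustrate the "one query per line" convention, the sketch below splits multi-line text into a list of query strings, which matches the List[str] type of the field. It is not the library's own parsing code; `parse_search_queries` is a hypothetical name, and skipping blank lines is an assumption of this sketch.

```python
from typing import List


def parse_search_queries(text: str) -> List[str]:
    """Split multi-line input into one query per non-empty line.

    Hypothetical helper for illustration; not the library's implementation.
    """
    queries = []
    for line in text.splitlines():
        query = line.strip()
        if query:  # assumption: blank lines carry no query and are skipped
            queries.append(query)
    return queries


print(parse_search_queries("foo bar\n\n  baz \n"))
```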

© Copyright 2023, Zyte Group Ltd.
