E-commerce spider template (`ecommerce`)

Basic use

scrapy crawl ecommerce -a url="https://books.toscrape.com"

Parameters

pydantic model zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams

Config:

json_schema_extra: dict = {'groups': [{'id': 'inputs', 'title': 'Inputs', 'description': 'Input data that determines the start URLs of the crawl.', 'widget': 'exclusive'}]}

Validators:

single_input » all fields

field crawl_strategy: EcommerceCrawlStrategy = EcommerceCrawlStrategy.full

Determines how the start URL and follow-up URLs are crawled.

Validated by:

single_input

field extract_from: ExtractFrom | None = None

Whether to perform extraction using a browser request (browserHtml) or an HTTP request (httpResponseBody).

Validated by:

single_input

field geolocation: Geolocation | None = None

Two-character ISO 3166-1 alpha-2 country code, as specified in https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/geolocation.

Validated by:

single_input

field max_requests: int | None = 100

The maximum number of Zyte API requests allowed for the crawl.

Requests with error responses that cannot be retried or that exceed their retry limit also count here, but they incur no costs and do not increase the request count in Scrapy Cloud.

Validated by:

single_input

field url: str = ''

Initial URL for the crawl. Enter the full URL, including http(s); you can copy and paste it from your browser. Example: https://toscrape.com/

Constraints:

pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$

Validated by:

single_input
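To check a candidate value against the pattern constraint before starting a crawl, a minimal sketch along these lines may help. It assumes the documented pattern is meant to contain the escapes `\s` and `\d`, which appear to have been lost when the page was rendered; `URL_PATTERN` and `matches_url_pattern` are hypothetical names used only for illustration and are not part of the library.

```python
import re

# Assumed reconstruction of the documented pattern, with the backslash
# escapes (\s, \d) restored. This is an assumption, not the library source.
URL_PATTERN = re.compile(r"^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$")


def matches_url_pattern(value: str) -> bool:
    """Return True if value satisfies the assumed URL pattern."""
    return URL_PATTERN.match(value) is not None


if __name__ == "__main__":
    # A full URL with a scheme is accepted.
    print(matches_url_pattern("https://books.toscrape.com"))
    # A bare host name without http(s):// is rejected.
    print(matches_url_pattern("books.toscrape.com"))
    # Whitespace inside the path is rejected.
    print(matches_url_pattern("https://example.com/a b"))
```

The pattern only checks the overall shape of the string (scheme, host, optional port, path, and fragment); it does not verify that the host exists or that the URL is reachable.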

field urls_file: str = ''

URL that points to a plain-text file with a list of URLs to crawl, e.g. https://example.com/url-list.txt. The linked list must contain 1 URL per line.

Constraints:

pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$

Validated by:

single_input
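The one-URL-per-line format expected for the linked file can be illustrated with a short sketch. It is not the library's own parsing code: `parse_url_list` is a hypothetical helper, and stripping surrounding whitespace and skipping blank lines are assumptions made here for robustness.

```python
def parse_url_list(text: str) -> list[str]:
    """Split plain text into URLs, one per line.

    Assumption: surrounding whitespace is stripped and blank lines are
    skipped. The documentation only states "1 URL per line".
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical file contents, as they might be served from urls_file.
    sample = "https://books.toscrape.com\nhttps://toscrape.com/\n"
    for url in parse_url_list(sample):
        print(url)
```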

validator single_input » all fields: Fields url and urls_file form a mandatory, mutually-exclusive field group: one of them must be defined, the rest must not be defined.
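The rule enforced by single_input can be sketched in plain Python as follows. This is an illustration of the described behavior, not the library's validator: `INPUT_FIELDS` and `check_single_input` are hypothetical names, and treating an empty string (the documented default for both fields) as "not defined" is an assumption.

```python
# Hypothetical names for illustration; not part of zyte_spider_templates.
INPUT_FIELDS = ("url", "urls_file")


def check_single_input(params: dict) -> dict:
    """Require exactly one of the input fields to be defined.

    Assumption: an empty string or a missing key counts as "not defined".
    """
    defined = [name for name in INPUT_FIELDS if params.get(name)]
    if not defined:
        raise ValueError(f"One of {INPUT_FIELDS} must be defined.")
    if len(defined) > 1:
        raise ValueError(f"Only one of {INPUT_FIELDS} may be defined, got {defined}.")
    return params


if __name__ == "__main__":
    # Exactly one input field defined: accepted.
    check_single_input({"url": "https://books.toscrape.com", "urls_file": ""})
    # Both input fields defined: rejected.
    try:
        check_single_input({
            "url": "https://books.toscrape.com",
            "urls_file": "https://example.com/url-list.txt",
        })
    except ValueError as error:
        print(error)
```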
