E-commerce spider template (ecommerce)
Basic use
scrapy crawl ecommerce -a url="https://books.toscrape.com"
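The command above passes a single parameter as a Scrapy spider argument (`-a name=value`). As a minimal sketch, assuming the other parameters documented on this page are passed the same way, the following hypothetical helper (`build_crawl_command` is not part of zyte-spider-templates) assembles such a command line from keyword arguments:

```python
import shlex
from typing import List


def build_crawl_command(spider: str = "ecommerce", **params: object) -> List[str]:
    """Build a `scrapy crawl` argument list (hypothetical helper).

    Each keyword argument becomes one `-a name=value` spider argument.
    """
    command = ["scrapy", "crawl", spider]
    for name, value in params.items():
        command += ["-a", f"{name}={value}"]
    return command


command = build_crawl_command(url="https://books.toscrape.com", max_requests=50)
# Quote the arguments for display in a POSIX shell.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run` from inside a Scrapy project that has the spider available.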
Parameters
- pydantic model zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams
- Config:
json_schema_extra: dict = {'groups': [{'id': 'inputs', 'title': 'Inputs', 'description': 'Input data that determines the start URLs of the crawl.', 'widget': 'exclusive'}]}
- Validators:
single_input
»all fields
- field crawl_strategy: EcommerceCrawlStrategy = EcommerceCrawlStrategy.full
Determines how the start URL and follow-up URLs are crawled.
- Validated by:
single_input
- field extract_from: ExtractFrom | None = None
Whether to perform extraction using a browser request (browserHtml) or an HTTP request (httpResponseBody).
- Validated by:
single_input
- field geolocation: Geolocation | None = None
ISO 3166-1 alpha-2 two-character country code, as specified in https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/geolocation.
- Validated by:
single_input
- field max_requests: int | None = 100
The maximum number of Zyte API requests allowed for the crawl.
Requests with error responses that cannot be retried or that exceed their retry limit also count toward this limit, but they incur no costs and do not increase the request count in Scrapy Cloud.
- Validated by:
single_input
- field url: str = ''
Initial URL for the crawl. Enter the full URL, including http(s); you can copy and paste it from your browser. Example: https://toscrape.com/
- Constraints:
pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$
- Validated by:
single_input
- field urls_file: str = ''
URL that points to a plain-text file with a list of URLs to crawl, e.g. https://example.com/url-list.txt. The linked list must contain 1 URL per line.
- Constraints:
pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$
- Validated by:
single_input
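To illustrate the pattern constraint shared by url and urls_file, and the one-URL-per-line format of the linked file, here is a minimal sketch using only the Python standard library. It assumes the pattern uses the `\s` and `\d` character classes. The helper names `is_valid_url` and `parse_url_list` are hypothetical, and skipping blank or non-matching lines is an assumption about reasonable handling, not documented behavior of the spider.

```python
import re
from typing import List

# Pattern documented for the url and urls_file fields
# (assumed to use the \s and \d character classes).
URL_PATTERN = re.compile(r"^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$")


def is_valid_url(value: str) -> bool:
    """Return True if the value satisfies the documented pattern."""
    return URL_PATTERN.search(value) is not None


def parse_url_list(text: str) -> List[str]:
    """Extract URLs from plain text that holds one URL per line.

    Assumption: blank lines and lines that do not match the pattern
    are skipped rather than treated as errors.
    """
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        if candidate and is_valid_url(candidate):
            urls.append(candidate)
    return urls


sample = "https://books.toscrape.com\n\nnot a url\nhttps://toscrape.com/\n"
print(parse_url_list(sample))
```

Note that the pattern requires an explicit http or https scheme, so a bare host name such as books.toscrape.com does not pass.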