E-commerce spider template (`ecommerce`)

Basic use

scrapy crawl ecommerce -a url="https://books.toscrape.com"
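Additional parameters from the list below can be passed the same way, each with its own `-a name=value` argument. The following is a minimal sketch that assembles such a command line; `build_crawl_command` is a hypothetical helper written for illustration and is not part of zyte-spider-templates or Scrapy.

```python
import shlex


def build_crawl_command(spider, params):
    """Build a `scrapy crawl` argument list (hypothetical helper).

    Each spider parameter is passed as a separate `-a name=value` pair.
    """
    args = ["scrapy", "crawl", spider]
    for name, value in params.items():
        args += ["-a", f"{name}={value}"]
    return args


command = build_crawl_command(
    "ecommerce",
    {"url": "https://books.toscrape.com", "max_requests": 50},
)
# shlex.join quotes arguments where the shell requires it.
print(shlex.join(command))
```

The resulting list can also be handed directly to `subprocess.run` from within a Scrapy project directory.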

Parameters

pydantic model zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams

field crawl_strategy: EcommerceCrawlStrategy = EcommerceCrawlStrategy.automatic: Determines how the start URL and follow-up URLs are crawled.

field custom_attrs_input: Json[Dict[str, Any]] | None = None: Custom attributes to extract.

field custom_attrs_method: CustomAttrsMethod = CustomAttrsMethod.generate: Which model to use for custom attribute extraction.

field extract: EcommerceExtract = EcommerceExtract.product: Data to return.

field extract_from: ExtractFrom | None = None: Whether to perform extraction using a browser request (browserHtml) or an HTTP request (httpResponseBody).

field geolocation: Geolocation | None = None: Country of the IP addresses to use.

field max_requests: int | None = 100

The maximum number of Zyte API requests allowed for the crawl.

Requests with error responses that cannot be retried or that exceed their retry limit also count toward this limit, but they incur no costs and do not increase the request count in Scrapy Cloud.

field search_queries: List[str] [Optional]: A list of search queries, one per line, to submit using the search form found on each input URL. Only works for input URLs that support search. May not work on every website. Search queries are not compatible with the “full” and “navigation” crawl strategies, and when extracting products, they are not compatible with the “direct_item” crawl strategy either.
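The compatibility rules for search queries can be sketched as a small check. This is only an illustration of the rules as stated above, not the library's own validation; the function name and the use of plain strings for the strategy and extraction values are assumptions.

```python
def search_queries_conflict(crawl_strategy, extract, search_queries):
    """Return a reason string if the combination is incompatible, else None.

    Hypothetical helper: strategies and extract values are plain strings here.
    """
    if not search_queries:
        return None  # No search queries, so nothing can conflict.
    if crawl_strategy in {"full", "navigation"}:
        return f"search queries cannot be used with the {crawl_strategy!r} strategy"
    if extract == "product" and crawl_strategy == "direct_item":
        return "search queries cannot be used with 'direct_item' when extracting products"
    return None


print(search_queries_conflict("full", "product", ["laptop"]))
print(search_queries_conflict("automatic", "product", ["laptop"]))
```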

field url: str = '': Initial URL for the crawl. Enter the full URL, including http(s); you can copy and paste it from your browser. Example: https://toscrape.com/

field urls: List[str] | None = None: Initial URLs for the crawl, separated by new lines. Enter the full URLs, including http(s); you can copy and paste them from your browser. Example: https://toscrape.com/

field urls_file: str = '': URL that points to a plain-text file with a list of URLs to crawl, e.g. https://example.com/url-list.txt. The linked file must contain one URL per line.
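To illustrate the expected file format, here is a minimal sketch that reads a one-URL-per-line text into a list. `parse_url_list` is a hypothetical helper, and skipping blank lines is an assumption made for this example rather than documented behavior of the spider.

```python
def parse_url_list(text):
    """Return the non-empty lines of a plain-text URL list, stripped of whitespace.

    Hypothetical helper; blank-line handling is an assumption.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# Sample file content with one URL per line and a stray blank line.
sample = "https://toscrape.com/\n\nhttps://books.toscrape.com\n"
urls = parse_url_list(sample)
print(urls)
```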

Settings

The following zyte-spider-templates settings may be useful for the e-commerce spider template:

MAX_REQUESTS_PER_SEED: Limit the number of follow-up requests per initial URL.
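As with other Scrapy settings, this one would typically go in the project's `settings.py`. The value below is an arbitrary number chosen for illustration, not a recommended or default value.

```python
# settings.py
# Cap the number of follow-up requests per initial URL.
# The value 10 is illustrative only.
MAX_REQUESTS_PER_SEED = 10
```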
