Reference
Spiders
- class zyte_spider_templates.EcommerceSpider(*args: Any, **kwargs: Any)
Yield products from an e-commerce website.
See EcommerceSpiderParams for supported parameters.
- class zyte_spider_templates.GoogleSearchSpider(*args: Any, **kwargs: Any)
Yield results from Google searches.
See GoogleSearchSpiderParams for supported parameters.
Pages
Parameter mixins
- pydantic model zyte_spider_templates.params.ExtractFromParam
- field extract_from: ExtractFrom | None = None
Whether to perform extraction using a browser request (browserHtml) or an HTTP request (httpResponseBody).
- enum zyte_spider_templates.params.ExtractFrom(value)
- Member Type:
Valid values are as follows:
- pydantic model zyte_spider_templates.params.GeolocationParam
- field geolocation: Geolocation | None = None
Country code as an ISO 3166-1 alpha-2 two-character string, as specified in https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/geolocation.
- enum zyte_spider_templates.params.Geolocation(value)
- Member Type:
Valid values are as follows:
- pydantic model zyte_spider_templates.params.UrlParam
- field url: str = ''
Initial URL for the crawl. Enter the full URL, including the http:// or https:// scheme; you can copy and paste it from your browser. Example: https://toscrape.com/
- Constraints:
pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$
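To get a feel for which start URLs the url constraint accepts, the pattern can be tried with Python's standard re module. This is a minimal sketch, not the library's own validation path: it assumes the pattern is meant to be read with the usual regex escapes (\s for whitespace, \d for digits), and the helper name is_valid_start_url is hypothetical.

```python
import re

# Pattern from the UrlParam constraints, read with regex escapes
# (\s = whitespace, \d = digit). Checking it with the stdlib `re`
# module is an assumption made for illustration; the model itself
# is validated by Pydantic.
URL_PATTERN = re.compile(r"^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$")


def is_valid_start_url(url: str) -> bool:
    """Hypothetical helper: True if `url` matches the UrlParam pattern."""
    return URL_PATTERN.match(url) is not None


if __name__ == "__main__":
    candidates = [
        "https://toscrape.com/",            # scheme, host, path
        "http://localhost:8000/path#frag",  # optional port and fragment
        "toscrape.com",                     # missing scheme
        "ftp://example.com",                # unsupported scheme
        "https://example.com/a b",          # whitespace is not allowed
    ]
    for candidate in candidates:
        print(candidate, is_valid_start_url(candidate))
```

Under this reading, the scheme is mandatory, the port is optional and limited to five digits, and no whitespace is allowed anywhere in the URL.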
- pydantic model zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategyParam
- field crawl_strategy: EcommerceCrawlStrategy = EcommerceCrawlStrategy.automatic
Determines how the start URL and follow-up URLs are crawled.
- enum zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategy(value)
- Member Type:
Valid values are as follows:
- automatic: str = <EcommerceCrawlStrategy.automatic: 'automatic'>
Automatically use the best crawl strategy based on the given URL inputs.
If given a homepage URL, it attempts to crawl as many products as it can discover. Otherwise, it attempts to crawl the products on the given category page.
- full: str = <EcommerceCrawlStrategy.full: 'full'>
Follow most links within the domain of the URL in an attempt to discover and extract as many products as possible.
Follow pagination, subcategories, and product detail pages.
Pagination Only is a better choice if the target URL does not have subcategories, or if Zyte API is misidentifying some URLs as subcategories.