Changes
0.12.0 (2025-03-31)
Search queries support is added to the job posting spider template.
Fixed support for POST requests in search queries.
Improved validation in the Google search spider template.
0.11.2 (2024-12-30)
Do not log warning about disabled components.
0.11.1 (2024-12-26)
The e-commerce and job posting spider templates no longer ignore item requests for a different domain.
0.11.0 (2024-12-16)
New Articles spider template, built on top of Zyte API’s article and articleNavigation.
New Job Posting spider template, built on top of Zyte API’s jobPosting and jobPostingNavigation.
Search queries support is added to the e-commerce spider template. This allows providing a list of search queries to the spider; the spider finds a search form on the target webpage and submits all the queries.
ProductList extraction support is added to the e-commerce spider template. This allows spiders to extract basic product information without going into product detail pages.
New features are added to the Google Search spider template:
An option to follow the result links and extract data from the target pages (via the extract argument)
Content Languages (lr) parameter
Content Countries (cr) parameter
User Country (gl) parameter
User Language (hl) parameter
results_per_page parameter
Added a Scrapy add-on. This greatly simplifies the initial zyte-spider-templates configuration.
Bug fix: incorrectly extracted URLs no longer make spiders drop other requests.
Cleaned up the CI; improved the testing suite; cleaned up the documentation.
0.10.0 (2024-11-22)
Dropped Python 3.8 support, added Python 3.13 support.
Increased the minimum required versions of some dependencies:
pydantic: 2 → 2.1
scrapy-poet: 0.21.0 → 0.24.0
scrapy-spider-metadata: 0.1.2 → 0.2.0
scrapy-zyte-api[provider]: 0.16.0 → 0.23.0
zyte-common-items: 0.22.0 → 0.23.0
Added custom attributes support to the e-commerce spider template through its new custom_attrs_input and custom_attrs_method parameters.
The max_pages parameter of the Google Search spider template can no longer be 0 or lower.
The Google Search spider template now follows pagination for the results of each query page by page, instead of sending a request for every page in parallel. It stops once it reaches a page without organic results.
Improved the description of EcommerceCrawlStrategy values.
Fixed type hint issues related to Scrapy.
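The page-by-page pagination introduced in 0.10.0 can be pictured with a minimal sketch. This is not the template's actual code: fetch_page is a hypothetical callable standing in for a Zyte API request, and the "organicResults" key is an assumed shape for a results page.

```python
from typing import Callable, Dict, List

# Hypothetical page shape: a dict with an "organicResults" list.
Page = Dict[str, list]


def crawl_query(
    fetch_page: Callable[[str, int], Page], query: str, max_pages: int
) -> List[str]:
    """Request result pages one at a time and stop at the first empty page."""
    if max_pages < 1:
        # Mirrors the rule that max_pages can no longer be 0 or lower.
        raise ValueError("max_pages must be 1 or higher")
    results: List[str] = []
    for page_number in range(1, max_pages + 1):
        page = fetch_page(query, page_number)
        organic = page.get("organicResults", [])
        if not organic:
            # A page without organic results ends pagination for this query.
            break
        results.extend(organic)
    return results
```

Compared with requesting every page in parallel, a sequential loop like this avoids spending requests on pages past the end of the results.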
0.9.0 (2024-09-17)
Now requires zyte-common-items >= 0.22.0.
New Google Search spider template, built on top of Zyte API’s serp.
The heuristics of the e-commerce spider template to ignore certain URLs when following category links now also handle subdomains. For example, https://example.com/blog was already ignored before; now https://blog.example.com is ignored as well.
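As a rough illustration of matching both paths and subdomains, the sketch below checks a URL against a set of ignored labels. It does not reproduce the template's real heuristics: the IGNORED_LABELS set and the looks_ignorable name are hypothetical, and treating the last two host labels as the registered domain is a simplifying assumption that breaks for suffixes such as co.uk.

```python
from urllib.parse import urlparse

# Hypothetical label set; the template's real list is not reproduced here.
IGNORED_LABELS = {"blog"}


def looks_ignorable(url: str) -> bool:
    """Return True if a path segment or a subdomain label is in IGNORED_LABELS."""
    parsed = urlparse(url)
    path_segments = [segment for segment in parsed.path.lower().split("/") if segment]
    # Assumption: the last two host labels form the registered domain.
    subdomain_labels = (parsed.hostname or "").lower().split(".")[:-2]
    return any(label in IGNORED_LABELS for label in path_segments + subdomain_labels)
```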
In the spider parameters JSON schema, the crawl_strategy parameter of the e-commerce spider template switches position, from being the last parameter to being between urls_file and geolocation.
Removed the valid_page_types attribute of zyte_spider_templates.middlewares.CrawlingLogsMiddleware.
0.8.0 (2024-08-21)
Added new input parameters:
urls accepts a newline-delimited list of URLs.
urls_file accepts a URL that points to a plain-text file with a newline-delimited list of URLs.
Only one of url, urls and urls_file should be used at a time.
Added new crawling strategies:
automatic - uses heuristics to see if an input URL is a homepage, for which it uses a modified full strategy where other links are discovered only in the homepage. Otherwise, it assumes it’s a navigation page and uses the existing navigation strategy.
direct_item - input URLs are directly extracted as products.
Added new parameters classes: LocationParam and PostalAddress. Note that these are available for use when customizing the templates and are not currently being utilized by any template.
Backward incompatible changes:
automatic becomes the new default crawling strategy instead of full.
CI test improvements.
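The relationship between the url, urls and urls_file inputs added in 0.8.0 can be sketched as follows. This is only an illustration under stated assumptions: resolve_start_urls and parse_url_list are hypothetical helpers, the download callable stands in for fetching the plain-text file, and raising an error when more than one input is set is an assumption about how strictly the rule is enforced.

```python
from typing import Callable, List, Optional


def parse_url_list(text: str) -> List[str]:
    """Split newline-delimited text into URLs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def resolve_start_urls(
    url: str = "",
    urls: str = "",
    urls_file: str = "",
    download: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Return the start URLs from exactly one of the three inputs."""
    provided = [
        name
        for name, value in (("url", url), ("urls", urls), ("urls_file", urls_file))
        if value
    ]
    if len(provided) != 1:
        raise ValueError(
            f"Expected exactly one of url, urls, urls_file; got {provided or 'none'}"
        )
    if url:
        return [url]
    if urls:
        return parse_url_list(urls)
    if download is None:
        raise ValueError("A download callable is required to read urls_file")
    # urls_file is itself a URL pointing to a newline-delimited text file.
    return parse_url_list(download(urls_file))
```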
0.7.2 (2024-05-07)
Implemented mixin classes for spider parameters, to improve reuse.
Improved docs, providing an example about overriding existing parameters when customizing parameters, and featuring AnyResponse in the example about overriding parsing.
0.7.1 (2024-02-22)
The crawl_strategy parameter of EcommerceSpider now defaults to full instead of navigation. We also reworded some descriptions of EcommerceCrawlStrategy values for clarification.
0.7.0 (2024-02-09)
Updated requirement versions:
scrapy-poet >= 0.21.0
scrapy-zyte-api >= 0.16.0
With the updated dependencies above, this fixes the issue of having 2 separate Zyte API Requests (productNavigation and httpResponseBody) for the same URL. Note that this issue only occurs when requesting product navigation pages.
Moved zyte_spider_templates.spiders.ecommerce.ExtractFrom into zyte_spider_templates.spiders.base.ExtractFrom.
0.6.1 (2024-02-02)
Improved the zyte_spider_templates.spiders.base.BaseSpiderParams.url description.
0.6.0 (2024-01-31)
Fixed the extract_from spider parameter that wasn’t working.
The “www.” prefix is now removed when setting the spider’s allowed_domains.
The zyte_common_items.ProductNavigation.nextPage link won’t be crawled if zyte_common_items.ProductNavigation.items is empty.
zyte_common_items.Product items that are dropped due to low probability (below 0.1) are now logged in stats: drop_item/product/low_probability.
zyte_spider_templates.pages.HeuristicsProductNavigationPage now inherits from zyte_common_items.AutoProductNavigationPage instead of zyte_common_items.BaseProductNavigationPage.
Moved e-commerce code from zyte_spider_templates.spiders.base.BaseSpider to zyte_spider_templates.spiders.ecommerce.EcommerceSpider.
Documentation improvements.
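The “www.” prefix removal from 0.6.0 amounts to a small hostname normalization. The sketch below shows one way to derive such a value from an input URL; allowed_domain_for is a hypothetical helper, not the function the spider uses.

```python
from urllib.parse import urlparse


def allowed_domain_for(url: str) -> str:
    """Return the URL's hostname with a leading "www." removed."""
    host = (urlparse(url).hostname or "").lower()
    # Only a leading "www." label is stripped; other subdomains are kept.
    return host[len("www."):] if host.startswith("www.") else host
```

Dropping the prefix means that links to example.com and to its subdomains still match the allowed domain when the input URL used the www. form.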
0.5.0 (2023-12-18)
The zyte_spider_templates.page_objects module is now deprecated in favor of zyte_spider_templates.pages, in line with web_poet.pages.
0.4.0 (2023-12-14)
Products outside of the target domain can now be crawled using zyte_spider_templates.middlewares.AllowOffsiteMiddleware.
Updated the documentation to also set up zyte_common_items.ZyteItemAdapter.
The max_requests spider parameter now has a default value of 100. Previously, it was None, which meant unlimited.
Improved the description of the max_requests spider parameter.
Official support for Python 3.12.
Misc documentation improvements.
0.3.0 (2023-11-03)
Added documentation.
Added a middleware that logs information about the crawl in JSON format, zyte_spider_templates.middlewares.CrawlingLogsMiddleware. This replaces the old crawling information that was difficult to parse using regular expressions.
0.2.0 (2023-10-30)
Now requires zyte-common-items >= 0.12.0.
Added a new crawl strategy, “Pagination Only”.
Improved the request priority calculation based on the metadata probability value.
CI improvements.
0.1.0 (2023-10-24)
Initial release.