Changes

0.12.0 (2025-03-31)

0.11.2 (2024-12-30)

  • Do not log warning about disabled components.

0.11.1 (2024-12-26)

0.11.0 (2024-12-16)

  • New Articles spider template, built on top of Zyte API’s article and articleNavigation.

  • New Job Posting spider template, built on top of Zyte API’s jobPosting and jobPostingNavigation.

  • Search query support is added to the e-commerce spider template. This allows providing a list of search queries to the spider; the spider finds a search form on the target webpage and submits all the queries.

  • ProductList extraction support is added to the e-commerce spider template. This allows spiders to extract basic product information without going into product detail pages.

  • New features are added to the Google Search spider template:

    • An option to follow the result links and extract data from the target pages (via the extract argument)

    • Content Languages (lr) parameter

    • Content Countries (cr) parameter

    • User Country (gl) parameter

    • User Language (hl) parameter

    • results_per_page parameter

  • Added a Scrapy add-on, which greatly simplifies the initial zyte-spider-templates configuration.

  • Bug fix: incorrectly extracted URLs no longer make spiders drop other requests.

  • Cleaned up the CI; improved the testing suite; cleaned up the documentation.
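
As a rough illustration of the add-on entry above, a project's settings.py could enable it through Scrapy's ADDONS setting. This is only a sketch: the import path zyte_spider_templates.Addon and the priority value 1000 are assumptions not stated in this changelog, and other add-ons may also be needed, so check the setup documentation for the exact configuration.

```python
# settings.py (sketch)
# ADDONS is Scrapy's setting for enabling add-ons (available in Scrapy 2.10+).
# Assumption: the add-on class is importable as "zyte_spider_templates.Addon";
# the priority value 1000 is also an assumption.
ADDONS = {
    "zyte_spider_templates.Addon": 1000,
}
```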

0.10.0 (2024-11-22)

0.9.0 (2024-09-17)

0.8.0 (2024-08-21)

  • Added new input parameters:

    • urls accepts a newline-delimited list of URLs.

    • urls_file accepts a URL that points to a plain-text file with a newline-delimited list of URLs.

    Only one of url, urls and urls_file should be used at a time.

  • Added new crawling strategies:

    • automatic - uses heuristics to determine whether an input URL is a homepage. If it is, a modified full strategy is used in which other links are discovered only on the homepage. Otherwise, the URL is assumed to be a navigation page and the existing navigation strategy is used.

    • direct_item - input URLs are directly extracted as products.

  • Added new parameter classes: LocationParam and PostalAddress. Note that these are available for use when customizing the templates and are not currently used by any template.

  • Backward incompatible changes:

    • automatic becomes the new default crawling strategy instead of full.

  • CI test improvements.
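
The urls input format and the "only one of url, urls and urls_file" rule from this release can be sketched as follows. This is not the library's implementation: the helper names parse_url_list and pick_input are hypothetical, and treating "no input at all" as an error is an assumption.

```python
from typing import List, Optional


def parse_url_list(text: str) -> List[str]:
    """Split a newline-delimited string into stripped, non-empty URLs."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_input(
    url: Optional[str] = None,
    urls: Optional[str] = None,
    urls_file: Optional[str] = None,
) -> str:
    """Return the name of the single input parameter that was set.

    Hypothetical helper: raises ValueError when none or more than one
    of the three parameters is provided.
    """
    candidates = (("url", url), ("urls", urls), ("urls_file", urls_file))
    given = [name for name, value in candidates if value]
    if len(given) != 1:
        raise ValueError(
            f"Expected exactly one of url, urls, urls_file; got {given or 'none'}"
        )
    return given[0]
```

For example, pick_input(urls="https://a.example\nhttps://b.example") returns "urls", after which parse_url_list can turn the value into a list of two URLs.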

0.7.2 (2024-05-07)

0.7.1 (2024-02-22)

0.7.0 (2024-02-09)

  • Updated requirement versions:

  • The updated dependencies above fix the issue of two separate Zyte API requests (productNavigation and httpResponseBody) being sent for the same URL. Note that this issue only occurred when requesting product navigation pages.

  • Moved zyte_spider_templates.spiders.ecommerce.ExtractFrom into zyte_spider_templates.spiders.base.ExtractFrom.

0.6.1 (2024-02-02)

  • Improved the zyte_spider_templates.spiders.base.BaseSpiderParams.url description.

0.6.0 (2024-01-31)

0.5.0 (2023-12-18)

  • The zyte_spider_templates.page_objects module is now deprecated in favor of zyte_spider_templates.pages, in line with web_poet.pages.

0.4.0 (2023-12-14)

  • Products outside of the target domain can now be crawled using zyte_spider_templates.middlewares.AllowOffsiteMiddleware.

  • Updated the documentation to also set up zyte_common_items.ZyteItemAdapter.

  • The max_requests spider parameter now has a default value of 100. Previously, it was None, which meant unlimited.

  • Improved the description of the max_requests spider parameter.

  • Official support for Python 3.12.

  • Misc documentation improvements.
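
As a sketch of how the AllowOffsiteMiddleware entry above might be wired into a project, the settings below swap it in for Scrapy's built-in offsite spider middleware. Both the idea that it replaces the built-in middleware and the priority value 500 are assumptions not stated in this changelog, and the right configuration may depend on the Scrapy version in use.

```python
# settings.py (sketch)
# Assumption: AllowOffsiteMiddleware is used in place of Scrapy's built-in
# offsite spider middleware, which is disabled by mapping it to None.
# The priority value 500 is also an assumption.
SPIDER_MIDDLEWARES = {
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None,
    "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500,
}
```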

0.3.0 (2023-11-03)

0.2.0 (2023-10-30)

  • Now requires zyte-common-items >= 0.12.0.

  • Added a new crawl strategy, “Pagination Only”.

  • Improved the request priority calculation based on the metadata probability value.

  • CI improvements.

0.1.0 (2023-10-24)

Initial release.