Initial setup

Learn how to install and configure spider templates in an existing Scrapy project.

Tip

If you do not have a Scrapy project yet, use zyte-spider-templates-project as a starting template.

Requirements

  • Python 3.8+

  • Scrapy 2.11+

For Zyte API features, including AI-powered parsing, you need a Zyte API subscription.

Installation

pip install zyte-spider-templates

Configuration

Make the following changes in your Scrapy project settings (usually in settings.py).

For Zyte API features, including AI-powered parsing, configure scrapy-zyte-api with scrapy-poet integration.
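As a rough illustration, one way to enable both integrations is through Scrapy add-ons. This is a minimal sketch, not the project's prescribed configuration: it assumes recent versions of scrapy-zyte-api and scrapy-poet that ship add-on classes, the priority values are illustrative, and the API key is a placeholder. Check the documentation of both packages for the setup that matches your installed versions.

```python
# settings.py (sketch): enable scrapy-poet and scrapy-zyte-api via add-ons.
# Assumption: the installed versions of both packages provide Addon classes.

# Placeholder value; replace it with your own Zyte API key.
ZYTE_API_KEY = "YOUR_API_KEY"

ADDONS = {
    # scrapy-poet add-on (dependency injection for page objects).
    "scrapy_poet.Addon": 300,
    # scrapy-zyte-api add-on (routes requests through Zyte API).
    "scrapy_zyte_api.Addon": 500,
}
```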

The following additional settings are recommended:

  • Set CLOSESPIDER_TIMEOUT_NO_ITEM to 600, to force the spider to stop if no item has been found for 10 minutes.

  • Set SCHEDULER_DISK_QUEUE to "scrapy.squeues.PickleFifoDiskQueue" and SCHEDULER_MEMORY_QUEUE to "scrapy.squeues.FifoMemoryQueue", for better request priority handling.

  • Update SPIDER_MIDDLEWARES to include "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000, to log crawl data in JSON format for debugging purposes.

  • Ensure that zyte_common_items.ZyteItemAdapter is also configured:

    from itemadapter import ItemAdapter
    from zyte_common_items import ZyteItemAdapter
    
    ItemAdapter.ADAPTER_CLASSES.appendleft(ZyteItemAdapter)
    
  • Update SPIDER_MIDDLEWARES to include "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500 and "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None. This allows for crawling item links outside of the domain.
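Taken together, the recommended settings listed above could look roughly like the following in settings.py. This is a sketch that only restates the values from the list. It merges the two SPIDER_MIDDLEWARES updates into a single dictionary, and it assumes your project does not already define other spider middlewares (if it does, add these entries to the existing dictionary instead). The ZyteItemAdapter snippet shown above would go in the same file.

```python
# settings.py (sketch): recommended additional settings from the list above.

# Stop the spider if no item has been found for 600 seconds (10 minutes).
CLOSESPIDER_TIMEOUT_NO_ITEM = 600

# FIFO queues, recommended for better request priority handling.
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

SPIDER_MIDDLEWARES = {
    # Allow crawling item links outside of the domain.
    "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500,
    # Disable the built-in offsite filtering that the entry above replaces.
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None,
    # Log crawl data in JSON format for debugging purposes.
    "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000,
}
```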

For an example of a properly configured settings.py file, see the one in zyte-spider-templates-project.