Initial setup
Learn how to get spider templates installed and configured on an existing Scrapy project.
Tip
If you do not have a Scrapy project yet, use zyte-spider-templates-project as a starting template.
Requirements
Python 3.8+
Scrapy 2.11+
For Zyte API features, including AI-powered parsing, you need a Zyte API subscription.
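Before installing, you may want to confirm that your environment meets the Python and Scrapy requirements. The following is a minimal sketch that uses only the standard library. The helper names (release_tuple, meets_minimum, check_requirements) are hypothetical, and the version parsing is a simplified heuristic rather than full PEP 440 handling. It cannot check whether you have a Zyte API subscription.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple

# Minimum versions taken from the Requirements list.
MIN_PYTHON = (3, 8)
MIN_SCRAPY = (2, 11)


def release_tuple(text: str) -> Tuple[int, ...]:
    """Turn "2.11.2" into (2, 11, 2), keeping only leading numeric parts.

    Simplified heuristic: "2.12.0rc1" becomes (2, 12, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text: str, minimum: Tuple[int, ...]) -> bool:
    # Tuple comparison is element-wise, so (2, 11, 2) >= (2, 11) is True.
    return release_tuple(text) >= minimum


def check_requirements() -> List[str]:
    """Return a list of human-readable problems (empty if all is fine)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.8+ is required")
    try:
        scrapy_version = version("scrapy")
    except PackageNotFoundError:
        problems.append("Scrapy is not installed")
    else:
        if not meets_minimum(scrapy_version, MIN_SCRAPY):
            problems.append(f"Scrapy 2.11+ is required, found {scrapy_version}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

A stricter check could compare versions with the third-party packaging library instead of the regular expression used here.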
Installation
pip install zyte-spider-templates
Configuration
In your Scrapy project settings (usually in settings.py):
- Update SPIDER_MODULES to include "zyte_spider_templates.spiders".
- Configure scrapy-poet, and update SCRAPY_POET_DISCOVER to include "zyte_spider_templates.pages".
- For Zyte API features, including AI-powered parsing, configure scrapy-zyte-api with scrapy-poet integration.
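As a rough illustration, the SPIDER_MODULES and SCRAPY_POET_DISCOVER changes might look like this in settings.py. This is a sketch only: "myproject" is a hypothetical project package, and the middleware and add-on settings that scrapy-poet and scrapy-zyte-api themselves need are intentionally left out. Take those from each library's own documentation.

```python
# settings.py (excerpt): a minimal sketch, not a complete configuration.
# "myproject" is a hypothetical project name; replace it with your own.

# Keep your own spider modules and add the template spiders.
SPIDER_MODULES = [
    "myproject.spiders",
    "zyte_spider_templates.spiders",
]

# Modules where scrapy-poet looks for page objects. This assumes scrapy-poet
# is already enabled as described in its documentation (not shown here).
SCRAPY_POET_DISCOVER = [
    "zyte_spider_templates.pages",
]
```

Keeping your existing spider module in SPIDER_MODULES means your own spiders remain available alongside the templates.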
The following additional settings are recommended:
- Set CLOSESPIDER_TIMEOUT_NO_ITEM to 600, to stop the spider if no item has been found for 10 minutes.
- Set SCHEDULER_DISK_QUEUE to "scrapy.squeues.PickleFifoDiskQueue" and SCHEDULER_MEMORY_QUEUE to "scrapy.squeues.FifoMemoryQueue", for better request priority handling.
- Update SPIDER_MIDDLEWARES to include "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000, to log crawl data in JSON format for debugging purposes.
- Ensure that zyte_common_items.ZyteItemAdapter is also configured:

      from itemadapter import ItemAdapter
      from zyte_common_items import ZyteItemAdapter

      # Put ZyteItemAdapter first, so it is tried before the default adapters.
      ItemAdapter.ADAPTER_CLASSES.appendleft(ZyteItemAdapter)

- Update SPIDER_MIDDLEWARES to include "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500 and "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None. This allows crawling item links outside of the domain.
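Put together, the recommended settings from the list above could be written roughly as follows. This is a sketch that combines both SPIDER_MIDDLEWARES updates into a single dict; if your project already defines SPIDER_MIDDLEWARES for other middlewares, merge these entries into it rather than replacing it. The ZyteItemAdapter registration is Python code rather than a setting and is not repeated here.

```python
# settings.py (excerpt): the recommended settings, as a sketch.

# Stop the spider if no item has been scraped for 600 seconds (10 minutes).
CLOSESPIDER_TIMEOUT_NO_ITEM = 600

# FIFO queues, for better request priority handling.
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

# Both SPIDER_MIDDLEWARES updates combined into one dict.
SPIDER_MIDDLEWARES = {
    # Allow crawling item links outside of the domain.
    "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500,
    # Disable Scrapy's built-in offsite spider middleware.
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None,
    # Log crawl data in JSON format for debugging purposes.
    "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000,
}
```

Setting a middleware's value to None is how Scrapy disables a middleware that would otherwise be enabled by default.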
For an example of a properly configured settings.py file, see the one in zyte-spider-templates-project.