Customizing spider templates

Subclass a built-in spider template to customize its metadata, parameters, and crawling logic.

Customizing metadata

Spider template metadata is defined using scrapy-spider-metadata, and can be redefined or customized in a subclass.

For example, to keep the upstream title but change the description:

from zyte_spider_templates import EcommerceSpider


class MySpider(EcommerceSpider):
    name = "my_spider"
    metadata = {
        **EcommerceSpider.metadata,
        "description": "Custom e-commerce spider template.",
    }
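The `**EcommerceSpider.metadata` unpacking keeps every upstream key and overrides only the keys listed after it. A plain-Python sketch of the same pattern (the keys here are made up for illustration, not the real metadata schema):

```python
# Dict unpacking copies all base keys; later keys win on conflict.
base = {"title": "E-commerce", "description": "Upstream description."}
custom = {**base, "description": "Custom e-commerce spider template."}

print(custom["title"])        # E-commerce (kept from base)
print(custom["description"])  # Custom e-commerce spider template.
```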

Customizing parameters

Spider template parameters are also defined using scrapy-spider-metadata and can likewise be redefined or customized in a subclass.

For example, to add a min_price parameter and filter out products with a lower price:

from decimal import Decimal
from typing import Iterable

from scrapy_poet import DummyResponse
from scrapy_spider_metadata import Args
from zyte_common_items import Product
from zyte_spider_templates import EcommerceSpider
from zyte_spider_templates.spiders.ecommerce import EcommerceSpiderParams


class MyParams(EcommerceSpiderParams):
    min_price: str = "0.00"


class MySpider(EcommerceSpider, Args[MyParams]):
    name = "my_spider"

    def parse_product(
        self, response: DummyResponse, product: Product
    ) -> Iterable[Product]:
        for item in super().parse_product(response, product):
            # Skip products without a price; Decimal(None) would raise.
            if item.price is None:
                continue
            if Decimal(item.price) >= Decimal(self.args.min_price):
                yield item
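The comparison uses Decimal rather than float to avoid binary floating-point rounding surprises with money values. A standalone sketch of the same filter, using hypothetical records that mimic the string-typed `Product.price` field:

```python
from decimal import Decimal

# Hypothetical records; Product.price is a string in zyte-common-items.
products = [
    {"name": "cheap", "price": "9.99"},
    {"name": "pricey", "price": "10.00"},
]
min_price = Decimal("10.00")

# Keep only products at or above the threshold.
kept = [p for p in products if Decimal(p["price"]) >= min_price]
print([p["name"] for p in kept])  # ['pricey']
```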

You can also override existing parameters. For example, to hard-code the start URL:

from scrapy_spider_metadata import Args
from zyte_spider_templates import EcommerceSpider
from zyte_spider_templates.spiders.ecommerce import EcommerceSpiderParams


class MyParams(EcommerceSpiderParams):
    url: str = "https://books.toscrape.com"


class MySpider(EcommerceSpider, Args[MyParams]):
    name = "my_spider"

A mixin class exists for every spider parameter (see Parameter mixins), so you can combine any of them, in any order, in your custom parameter classes, while still benefiting from future improvements to validation, documentation, and Scrapy Cloud UI integration:

from scrapy import Spider
from scrapy_spider_metadata import Args
from zyte_spider_templates.params import GeolocationParam, UrlParam


class MyParams(GeolocationParam, UrlParam):
    pass


class MySpider(Spider, Args[MyParams]):
    name = "my_spider"
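The combination works through ordinary Python multiple inheritance. The real parameter mixins are pydantic models, but a plain-Python sketch shows the composition idea:

```python
# Plain-class stand-ins for the real pydantic parameter mixins,
# for illustration only.
class GeolocationParam:
    geolocation: str = ""


class UrlParam:
    url: str = ""


class MyParams(GeolocationParam, UrlParam):
    pass


# The combined class exposes the defaults from both mixins.
print(MyParams.geolocation == "" and MyParams.url == "")  # True
```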

Customizing the crawling logic

The crawling logic of spider templates can be customized like that of any other Scrapy spider.

For example, you can make a spider that expects a product details URL and does not follow navigation at all:

from typing import Iterable

from scrapy import Request
from zyte_spider_templates import EcommerceSpider


class MySpider(EcommerceSpider):
    name = "my_spider"

    def start_requests(self) -> Iterable[Request]:
        for request in super().start_requests():
            yield request.replace(callback=self.parse_product)
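`Request.replace` returns a copy of the request with the given attributes swapped, leaving the original untouched. Conceptually it is the same copy-with-overrides idea as `dataclasses.replace`, sketched here with a simplified stand-in (the real `scrapy.Request` is not a dataclass):

```python
from dataclasses import dataclass, replace
from typing import Optional


# Simplified stand-in for scrapy.Request, for illustration only.
@dataclass(frozen=True)
class FakeRequest:
    url: str
    callback: Optional[str] = None


original = FakeRequest(url="https://books.toscrape.com")
updated = replace(original, callback="parse_product")

print(original.callback)  # None (the original is untouched)
print(updated.callback)   # parse_product
```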

All parsing logic is implemented separately in page objects, which makes the code of the built-in spider templates easier to read and to modify as desired.