Customizing spider templates

Subclass a built-in spider template to customize its metadata, parameters, and crawling logic.

Customizing metadata

Spider template metadata is defined using scrapy-spider-metadata, and can be redefined or customized in a subclass.

For example, to keep the upstream title but change the description:

from zyte_spider_templates import EcommerceSpider


class MySpider(EcommerceSpider):
    name = "my_spider"
    metadata = {
        # Reuse every upstream key, then override only the description.
        **EcommerceSpider.metadata,
        "description": "Custom e-commerce spider template.",
    }
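The `**EcommerceSpider.metadata` line is ordinary Python dictionary unpacking. Keys listed after it override the inherited ones, and the parent class's dictionary is left untouched. The self-contained sketch below illustrates this with a hypothetical `BaseTemplate` class standing in for `EcommerceSpider`. Its metadata values are made up for the example and are not the real upstream values.

```python
class BaseTemplate:
    # Hypothetical stand-in for a built-in template such as EcommerceSpider.
    metadata = {
        "title": "E-commerce",
        "description": "Template for e-commerce websites.",
    }


class MyTemplate(BaseTemplate):
    # Copy every upstream key, then override only the description.
    metadata = {
        **BaseTemplate.metadata,
        "description": "Custom e-commerce spider template.",
    }


print(MyTemplate.metadata["title"])          # E-commerce
print(MyTemplate.metadata["description"])    # Custom e-commerce spider template.
print(BaseTemplate.metadata["description"])  # Template for e-commerce websites.
```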

Customizing parameters

Spider template parameters are also defined using scrapy-spider-metadata, and can be redefined or customized in a subclass as well.

For example, to add a min_price parameter and filter out products with a lower price:

from decimal import Decimal
from typing import Iterable

from scrapy_poet import DummyResponse
from scrapy_spider_metadata import Args
from zyte_common_items import Product
from zyte_spider_templates import EcommerceSpider
from zyte_spider_templates.spiders.ecommerce import EcommerceSpiderParams


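class MyParams(EcommerceSpiderParams):
    # New parameter, received as a string and parsed with Decimal below.
    min_price: str = "0.00"


# Args[MyParams] makes self.args an instance of MyParams.
class MySpider(EcommerceSpider, Args[MyParams]):
    name = "my_spider"

    def parse_product(
        self, response: DummyResponse, product: Product
    ) -> Iterable[Product]:
        min_price = Decimal(self.args.min_price)
        for item in super().parse_product(response, product):
            # Product.price is an optional string; skip items without a price.
            if item.price is not None and Decimal(item.price) >= min_price:
                yield item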
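The filter converts prices to `Decimal` because the `price` field of a zyte-common-items `Product` is a string, and comparing the strings directly would be lexicographic, so `"10.00"` would compare as lower than `"9.00"`. The sketch below isolates that comparison in a hypothetical `filter_by_min_price()` helper, using a minimal, hypothetical `FakeProduct` dataclass in place of `Product` so that it runs without Scrapy installed. Skipping products that have no price is one possible policy, chosen here for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Iterator, Optional


@dataclass
class FakeProduct:
    # Hypothetical stand-in for zyte_common_items.Product.
    name: str
    price: Optional[str] = None  # prices are strings such as "19.99"


def filter_by_min_price(
    products: Iterable[FakeProduct], min_price: str
) -> Iterator[FakeProduct]:
    threshold = Decimal(min_price)
    for product in products:
        # Skip products without a price: there is nothing to compare.
        if product.price is None:
            continue
        # Compare as Decimal; comparing the raw strings would be lexicographic.
        if Decimal(product.price) >= threshold:
            yield product


products = [
    FakeProduct("cheap", "9.00"),
    FakeProduct("pricey", "10.00"),
    FakeProduct("unknown"),
]
print("10.00" >= "9.00")  # False: string comparison is lexicographic
print([p.name for p in filter_by_min_price(products, "9.50")])  # ['pricey']
```

Using `Decimal` rather than `float` also avoids binary rounding errors for prices such as `"0.10"`.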

Customizing the crawling logic

The crawling logic of spider templates can be customized like that of any other Scrapy spider.

For example, you can make a spider that expects a product details URL and does not follow navigation at all:

from typing import Iterable

from scrapy import Request
from zyte_spider_templates import EcommerceSpider


class MySpider(EcommerceSpider):
    name = "my_spider"

    def start_requests(self) -> Iterable[Request]:
        for request in super().start_requests():
            # Send each start URL straight to parse_product, skipping navigation.
            yield request.replace(callback=self.parse_product)
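`Request.replace()` returns a new request in which the given attributes are overridden and all others, such as the URL and `meta`, are copied from the original. The start requests therefore keep whatever the template set on them, except for their callback. The sketch below mimics that behavior with a hypothetical `FakeRequest` dataclass and the standard library's `dataclasses.replace()`, so it runs without Scrapy. The `FakeTemplate` class, its callbacks, and the `meta` contents are all hypothetical, and the code illustrates the pattern rather than Scrapy's implementation.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Iterator, Optional


@dataclass
class FakeRequest:
    # Hypothetical stand-in for scrapy.Request.
    url: str
    callback: Optional[Callable] = None
    meta: Dict[str, object] = field(default_factory=dict)


class FakeTemplate:
    # Hypothetical template whose start requests go to a navigation callback.
    def parse_navigation(self, response):
        ...

    def parse_product(self, response):
        ...

    def start_requests(self) -> Iterator[FakeRequest]:
        yield FakeRequest(
            "https://example.com/product/1",
            callback=self.parse_navigation,
            meta={"note": "set by the template"},
        )


class ProductOnlyTemplate(FakeTemplate):
    def start_requests(self) -> Iterator[FakeRequest]:
        for request in super().start_requests():
            # Only the callback changes; url and meta are carried over.
            yield replace(request, callback=self.parse_product)


spider = ProductOnlyTemplate()
(request,) = spider.start_requests()
print(request.callback == spider.parse_product)  # True
print(request.meta)  # {'note': 'set by the template'}
```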

Because all parsing logic is implemented separately in page objects, the code of the built-in spider templates is short and easy to read, which makes it easier to modify them as desired.