Reference

Spiders

class zyte_spider_templates.BaseSpider(*args: Any, **kwargs: Any)

class zyte_spider_templates.EcommerceSpider(*args: Any, **kwargs: Any)

Yield products from an e-commerce website.

See EcommerceSpiderParams for supported parameters.

class zyte_spider_templates.GoogleSearchSpider(*args: Any, **kwargs: Any)

Yield results from Google searches.

See GoogleSearchSpiderParams for supported parameters.

Pages

class zyte_spider_templates.pages.HeuristicsProductNavigationPage(request_url: RequestUrl, product_navigation: ProductNavigation, response: AnyResponse, page_params: PageParams)

Parameter mixins

pydantic model zyte_spider_templates.params.ExtractFromParam
field extract_from: ExtractFrom | None = None

Whether to perform extraction using a browser request (browserHtml) or an HTTP request (httpResponseBody).

enum zyte_spider_templates.params.ExtractFrom(value)
Member Type:

str

Valid values are as follows:

httpResponseBody: str = <ExtractFrom.httpResponseBody: 'httpResponseBody'>

Use HTTP responses. A cost-efficient and fast extraction method that works well on many websites.

browserHtml: str = <ExtractFrom.browserHtml: 'browserHtml'>

Use browser rendering. Often provides the best quality.
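Because ExtractFrom has str as its member type, its members behave like plain strings: a value received as text can be looked up by value, and members compare equal to their raw strings. The sketch below illustrates this with a standard-library stand-in that mirrors the two documented values; it does not import zyte_spider_templates, and the class shown is not the library's own.

```python
from enum import Enum


class ExtractFrom(str, Enum):
    """Stand-in mirroring the documented enum (not imported from the library)."""

    httpResponseBody = "httpResponseBody"  # plain HTTP response body
    browserHtml = "browserHtml"  # browser-rendered HTML


# A value that arrives as plain text is mapped to a member by value.
choice = ExtractFrom("browserHtml")

# Since the member type is str, members compare equal to their raw values.
print(choice is ExtractFrom.browserHtml)
print(choice == "browserHtml")
```

An unknown value, for example ExtractFrom("html"), raises ValueError in this stand-in.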

pydantic model zyte_spider_templates.params.GeolocationParam
field geolocation: Geolocation | None = None

Geolocation for requests, as a two-character ISO 3166-1 alpha-2 country code (see https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/geolocation).

enum zyte_spider_templates.params.Geolocation(value)
Member Type:

str

Valid values are as follows:

AF: str = <Geolocation.AF: 'AF'>
AL: str = <Geolocation.AL: 'AL'>
DZ: str = <Geolocation.DZ: 'DZ'>
AS: str = <Geolocation.AS: 'AS'>
AD: str = <Geolocation.AD: 'AD'>
AO: str = <Geolocation.AO: 'AO'>
AI: str = <Geolocation.AI: 'AI'>
AQ: str = <Geolocation.AQ: 'AQ'>
AG: str = <Geolocation.AG: 'AG'>
AR: str = <Geolocation.AR: 'AR'>
AM: str = <Geolocation.AM: 'AM'>
AW: str = <Geolocation.AW: 'AW'>
AU: str = <Geolocation.AU: 'AU'>
AT: str = <Geolocation.AT: 'AT'>
AZ: str = <Geolocation.AZ: 'AZ'>
BS: str = <Geolocation.BS: 'BS'>
BH: str = <Geolocation.BH: 'BH'>
BD: str = <Geolocation.BD: 'BD'>
BB: str = <Geolocation.BB: 'BB'>
BY: str = <Geolocation.BY: 'BY'>
BE: str = <Geolocation.BE: 'BE'>
BZ: str = <Geolocation.BZ: 'BZ'>
BJ: str = <Geolocation.BJ: 'BJ'>
BM: str = <Geolocation.BM: 'BM'>
BT: str = <Geolocation.BT: 'BT'>
BO: str = <Geolocation.BO: 'BO'>
BQ: str = <Geolocation.BQ: 'BQ'>
BA: str = <Geolocation.BA: 'BA'>
BW: str = <Geolocation.BW: 'BW'>
BV: str = <Geolocation.BV: 'BV'>
BR: str = <Geolocation.BR: 'BR'>
IO: str = <Geolocation.IO: 'IO'>
BN: str = <Geolocation.BN: 'BN'>
BG: str = <Geolocation.BG: 'BG'>
BF: str = <Geolocation.BF: 'BF'>
BI: str = <Geolocation.BI: 'BI'>
CV: str = <Geolocation.CV: 'CV'>
KH: str = <Geolocation.KH: 'KH'>
CM: str = <Geolocation.CM: 'CM'>
CA: str = <Geolocation.CA: 'CA'>
KY: str = <Geolocation.KY: 'KY'>
CF: str = <Geolocation.CF: 'CF'>
TD: str = <Geolocation.TD: 'TD'>
CL: str = <Geolocation.CL: 'CL'>
CN: str = <Geolocation.CN: 'CN'>
CX: str = <Geolocation.CX: 'CX'>
CC: str = <Geolocation.CC: 'CC'>
CO: str = <Geolocation.CO: 'CO'>
KM: str = <Geolocation.KM: 'KM'>
CG: str = <Geolocation.CG: 'CG'>
CD: str = <Geolocation.CD: 'CD'>
CK: str = <Geolocation.CK: 'CK'>
CR: str = <Geolocation.CR: 'CR'>
HR: str = <Geolocation.HR: 'HR'>
CU: str = <Geolocation.CU: 'CU'>
CW: str = <Geolocation.CW: 'CW'>
CY: str = <Geolocation.CY: 'CY'>
CZ: str = <Geolocation.CZ: 'CZ'>
CI: str = <Geolocation.CI: 'CI'>
DK: str = <Geolocation.DK: 'DK'>
DJ: str = <Geolocation.DJ: 'DJ'>
DM: str = <Geolocation.DM: 'DM'>
DO: str = <Geolocation.DO: 'DO'>
EC: str = <Geolocation.EC: 'EC'>
EG: str = <Geolocation.EG: 'EG'>
SV: str = <Geolocation.SV: 'SV'>
GQ: str = <Geolocation.GQ: 'GQ'>
ER: str = <Geolocation.ER: 'ER'>
EE: str = <Geolocation.EE: 'EE'>
SZ: str = <Geolocation.SZ: 'SZ'>
ET: str = <Geolocation.ET: 'ET'>
FK: str = <Geolocation.FK: 'FK'>
FO: str = <Geolocation.FO: 'FO'>
FJ: str = <Geolocation.FJ: 'FJ'>
FI: str = <Geolocation.FI: 'FI'>
FR: str = <Geolocation.FR: 'FR'>
GF: str = <Geolocation.GF: 'GF'>
PF: str = <Geolocation.PF: 'PF'>
TF: str = <Geolocation.TF: 'TF'>
GA: str = <Geolocation.GA: 'GA'>
GM: str = <Geolocation.GM: 'GM'>
GE: str = <Geolocation.GE: 'GE'>
DE: str = <Geolocation.DE: 'DE'>
GH: str = <Geolocation.GH: 'GH'>
GI: str = <Geolocation.GI: 'GI'>
GR: str = <Geolocation.GR: 'GR'>
GL: str = <Geolocation.GL: 'GL'>
GD: str = <Geolocation.GD: 'GD'>
GP: str = <Geolocation.GP: 'GP'>
GU: str = <Geolocation.GU: 'GU'>
GT: str = <Geolocation.GT: 'GT'>
GG: str = <Geolocation.GG: 'GG'>
GN: str = <Geolocation.GN: 'GN'>
GW: str = <Geolocation.GW: 'GW'>
GY: str = <Geolocation.GY: 'GY'>
HT: str = <Geolocation.HT: 'HT'>
HM: str = <Geolocation.HM: 'HM'>
VA: str = <Geolocation.VA: 'VA'>
HN: str = <Geolocation.HN: 'HN'>
HK: str = <Geolocation.HK: 'HK'>
HU: str = <Geolocation.HU: 'HU'>
IS: str = <Geolocation.IS: 'IS'>
IN: str = <Geolocation.IN: 'IN'>
ID: str = <Geolocation.ID: 'ID'>
IR: str = <Geolocation.IR: 'IR'>
IQ: str = <Geolocation.IQ: 'IQ'>
IE: str = <Geolocation.IE: 'IE'>
IM: str = <Geolocation.IM: 'IM'>
IL: str = <Geolocation.IL: 'IL'>
IT: str = <Geolocation.IT: 'IT'>
JM: str = <Geolocation.JM: 'JM'>
JP: str = <Geolocation.JP: 'JP'>
JE: str = <Geolocation.JE: 'JE'>
JO: str = <Geolocation.JO: 'JO'>
KZ: str = <Geolocation.KZ: 'KZ'>
KE: str = <Geolocation.KE: 'KE'>
KI: str = <Geolocation.KI: 'KI'>
KP: str = <Geolocation.KP: 'KP'>
KR: str = <Geolocation.KR: 'KR'>
KW: str = <Geolocation.KW: 'KW'>
KG: str = <Geolocation.KG: 'KG'>
LA: str = <Geolocation.LA: 'LA'>
LV: str = <Geolocation.LV: 'LV'>
LB: str = <Geolocation.LB: 'LB'>
LS: str = <Geolocation.LS: 'LS'>
LR: str = <Geolocation.LR: 'LR'>
LY: str = <Geolocation.LY: 'LY'>
LI: str = <Geolocation.LI: 'LI'>
LT: str = <Geolocation.LT: 'LT'>
LU: str = <Geolocation.LU: 'LU'>
MO: str = <Geolocation.MO: 'MO'>
MG: str = <Geolocation.MG: 'MG'>
MW: str = <Geolocation.MW: 'MW'>
MY: str = <Geolocation.MY: 'MY'>
MV: str = <Geolocation.MV: 'MV'>
ML: str = <Geolocation.ML: 'ML'>
MT: str = <Geolocation.MT: 'MT'>
MH: str = <Geolocation.MH: 'MH'>
MQ: str = <Geolocation.MQ: 'MQ'>
MR: str = <Geolocation.MR: 'MR'>
MU: str = <Geolocation.MU: 'MU'>
YT: str = <Geolocation.YT: 'YT'>
MX: str = <Geolocation.MX: 'MX'>
FM: str = <Geolocation.FM: 'FM'>
MD: str = <Geolocation.MD: 'MD'>
MC: str = <Geolocation.MC: 'MC'>
MN: str = <Geolocation.MN: 'MN'>
ME: str = <Geolocation.ME: 'ME'>
MS: str = <Geolocation.MS: 'MS'>
MA: str = <Geolocation.MA: 'MA'>
MZ: str = <Geolocation.MZ: 'MZ'>
MM: str = <Geolocation.MM: 'MM'>
NA: str = <Geolocation.NA: 'NA'>
NR: str = <Geolocation.NR: 'NR'>
NP: str = <Geolocation.NP: 'NP'>
NL: str = <Geolocation.NL: 'NL'>
NC: str = <Geolocation.NC: 'NC'>
NZ: str = <Geolocation.NZ: 'NZ'>
NI: str = <Geolocation.NI: 'NI'>
NE: str = <Geolocation.NE: 'NE'>
NG: str = <Geolocation.NG: 'NG'>
NU: str = <Geolocation.NU: 'NU'>
NF: str = <Geolocation.NF: 'NF'>
MK: str = <Geolocation.MK: 'MK'>
MP: str = <Geolocation.MP: 'MP'>
NO: str = <Geolocation.NO: 'NO'>
OM: str = <Geolocation.OM: 'OM'>
PK: str = <Geolocation.PK: 'PK'>
PW: str = <Geolocation.PW: 'PW'>
PS: str = <Geolocation.PS: 'PS'>
PA: str = <Geolocation.PA: 'PA'>
PG: str = <Geolocation.PG: 'PG'>
PY: str = <Geolocation.PY: 'PY'>
PE: str = <Geolocation.PE: 'PE'>
PH: str = <Geolocation.PH: 'PH'>
PN: str = <Geolocation.PN: 'PN'>
PL: str = <Geolocation.PL: 'PL'>
PT: str = <Geolocation.PT: 'PT'>
PR: str = <Geolocation.PR: 'PR'>
QA: str = <Geolocation.QA: 'QA'>
RO: str = <Geolocation.RO: 'RO'>
RU: str = <Geolocation.RU: 'RU'>
RW: str = <Geolocation.RW: 'RW'>
RE: str = <Geolocation.RE: 'RE'>
BL: str = <Geolocation.BL: 'BL'>
SH: str = <Geolocation.SH: 'SH'>
KN: str = <Geolocation.KN: 'KN'>
LC: str = <Geolocation.LC: 'LC'>
MF: str = <Geolocation.MF: 'MF'>
PM: str = <Geolocation.PM: 'PM'>
VC: str = <Geolocation.VC: 'VC'>
WS: str = <Geolocation.WS: 'WS'>
SM: str = <Geolocation.SM: 'SM'>
ST: str = <Geolocation.ST: 'ST'>
SA: str = <Geolocation.SA: 'SA'>
SN: str = <Geolocation.SN: 'SN'>
RS: str = <Geolocation.RS: 'RS'>
SC: str = <Geolocation.SC: 'SC'>
SL: str = <Geolocation.SL: 'SL'>
SG: str = <Geolocation.SG: 'SG'>
SX: str = <Geolocation.SX: 'SX'>
SK: str = <Geolocation.SK: 'SK'>
SI: str = <Geolocation.SI: 'SI'>
SB: str = <Geolocation.SB: 'SB'>
SO: str = <Geolocation.SO: 'SO'>
ZA: str = <Geolocation.ZA: 'ZA'>
GS: str = <Geolocation.GS: 'GS'>
SS: str = <Geolocation.SS: 'SS'>
ES: str = <Geolocation.ES: 'ES'>
LK: str = <Geolocation.LK: 'LK'>
SD: str = <Geolocation.SD: 'SD'>
SR: str = <Geolocation.SR: 'SR'>
SJ: str = <Geolocation.SJ: 'SJ'>
SE: str = <Geolocation.SE: 'SE'>
CH: str = <Geolocation.CH: 'CH'>
SY: str = <Geolocation.SY: 'SY'>
TW: str = <Geolocation.TW: 'TW'>
TJ: str = <Geolocation.TJ: 'TJ'>
TZ: str = <Geolocation.TZ: 'TZ'>
TH: str = <Geolocation.TH: 'TH'>
TL: str = <Geolocation.TL: 'TL'>
TG: str = <Geolocation.TG: 'TG'>
TK: str = <Geolocation.TK: 'TK'>
TO: str = <Geolocation.TO: 'TO'>
TT: str = <Geolocation.TT: 'TT'>
TN: str = <Geolocation.TN: 'TN'>
TM: str = <Geolocation.TM: 'TM'>
TC: str = <Geolocation.TC: 'TC'>
TV: str = <Geolocation.TV: 'TV'>
TR: str = <Geolocation.TR: 'TR'>
UG: str = <Geolocation.UG: 'UG'>
UA: str = <Geolocation.UA: 'UA'>
AE: str = <Geolocation.AE: 'AE'>
GB: str = <Geolocation.GB: 'GB'>
US: str = <Geolocation.US: 'US'>
UM: str = <Geolocation.UM: 'UM'>
UY: str = <Geolocation.UY: 'UY'>
UZ: str = <Geolocation.UZ: 'UZ'>
VU: str = <Geolocation.VU: 'VU'>
VE: str = <Geolocation.VE: 'VE'>
VN: str = <Geolocation.VN: 'VN'>
VG: str = <Geolocation.VG: 'VG'>
VI: str = <Geolocation.VI: 'VI'>
WF: str = <Geolocation.WF: 'WF'>
EH: str = <Geolocation.EH: 'EH'>
YE: str = <Geolocation.YE: 'YE'>
ZM: str = <Geolocation.ZM: 'ZM'>
ZW: str = <Geolocation.ZW: 'ZW'>
AX: str = <Geolocation.AX: 'AX'>
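Since the members are ISO 3166-1 alpha-2 codes, the United Kingdom appears as GB, and a value such as UK is not among them. The following is a minimal sketch of validating user input against such an enum, using a hypothetical stand-in with only three of the members listed above and a hypothetical helper, parse_geolocation. The upper-casing and whitespace-stripping step is an assumption of this sketch, not documented behaviour of the library.

```python
from enum import Enum


class Geolocation(str, Enum):
    """Hypothetical stand-in with a few of the documented members."""

    DE = "DE"
    GB = "GB"
    US = "US"


def parse_geolocation(raw: str) -> Geolocation:
    """Normalize user input and map it to a member (hypothetical helper)."""
    # Assumption: tolerate surrounding spaces and lowercase input.
    code = raw.strip().upper()
    try:
        return Geolocation(code)
    except ValueError:
        raise ValueError(f"{raw!r} is not an accepted ISO 3166-1 alpha-2 code") from None


print(parse_geolocation(" us ").value)
```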
pydantic model zyte_spider_templates.params.MaxRequestsParam
field max_requests: int | None = 100

The maximum number of Zyte API requests allowed for the crawl.

Requests with error responses that cannot be retried, or that exceed their retry limit, also count toward this limit, but they incur no costs and do not increase the request count in Scrapy Cloud.
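The counting rule can be illustrated with a small tally over final request outcomes: both successful responses and unrecoverable error responses count toward max_requests, while only the successful ones are billed. This is a hypothetical sketch, not the library's accounting code; the outcome labels "success" and "final_error" are invented for illustration, and treating None as "no limit" is an assumption.

```python
from typing import List, Optional, Tuple


def tally(outcomes: List[str], max_requests: Optional[int] = 100) -> Tuple[int, int, bool]:
    """Hypothetical tally of final request outcomes against max_requests.

    Each outcome is "success" or "final_error" (an error response that cannot
    be retried or has exceeded its retry limit).
    """
    counted = len(outcomes)  # both kinds count toward max_requests
    billed = outcomes.count("success")  # final errors incur no cost
    # Assumption: None means there is no limit.
    stop = max_requests is not None and counted >= max_requests
    return counted, billed, stop


print(tally(["success", "final_error", "success"], max_requests=3))  # (3, 2, True)
```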

pydantic model zyte_spider_templates.params.UrlParam
field url: str = ''

Initial URL for the crawl. Enter the full URL, including http:// or https://; you can copy and paste it from your browser. Example: https://toscrape.com/

Constraints:
  • pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$
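To check a start URL against this constraint before launching a crawl, the pattern can be applied with Python's re module. The sketch below writes the constraint as a raw string, where \s matches whitespace and \d a digit; the helper name is_valid_start_url is hypothetical. It uses fullmatch so that a trailing newline is rejected, which is a choice of this sketch rather than a statement about how the library evaluates the pattern.

```python
import re

# The URL constraint written as a Python raw string.
URL_PATTERN = re.compile(r"^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$")


def is_valid_start_url(url: str) -> bool:
    """Return True if url satisfies the pattern (hypothetical helper)."""
    return URL_PATTERN.fullmatch(url) is not None


for candidate in ["https://toscrape.com/", "toscrape.com", "https://exa mple.com"]:
    print(candidate, is_valid_start_url(candidate))
```

A URL without a scheme, or one containing whitespace, does not match.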

pydantic model zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategyParam
field crawl_strategy: EcommerceCrawlStrategy = EcommerceCrawlStrategy.automatic

Determines how the start URL and follow-up URLs are crawled.

enum zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategy(value)
Member Type:

str

Valid values are as follows:

automatic: str = <EcommerceCrawlStrategy.automatic: 'automatic'>

Automatically use the best crawl strategy based on the given input URLs.

If given a homepage URL, it attempts to crawl as many products as it can discover. Otherwise, it attempts to crawl the products on the given category page.

full: str = <EcommerceCrawlStrategy.full: 'full'>

Follow most links within the domain of the URL in an attempt to discover and extract as many products as possible.

navigation: str = <EcommerceCrawlStrategy.navigation: 'navigation'>

Follow pagination, subcategories, and product detail pages.

The pagination_only strategy is a better choice if the target URL has no subcategories, or if Zyte API misidentifies some URLs as subcategories.

pagination_only: str = <EcommerceCrawlStrategy.pagination_only: 'pagination_only'>

Follow pagination and product detail pages. Subcategory links are ignored.

direct_item: str = <EcommerceCrawlStrategy.direct_item: 'direct_item'>

Treat input URLs as direct links to product detail pages, and extract a product from each.
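The strategies differ mainly in which kinds of links they follow. The sketch below paraphrases the descriptions above as a lookup table, using a stand-in enum that mirrors the documented values and a hypothetical follows helper. How automatic resolves is an assumption here (homepage treated like full, anything else like navigation); the library's actual heuristics may differ.

```python
from enum import Enum
from typing import FrozenSet


class EcommerceCrawlStrategy(str, Enum):
    """Stand-in mirroring the documented values; not the library's class."""

    automatic = "automatic"
    full = "full"
    navigation = "navigation"
    pagination_only = "pagination_only"
    direct_item = "direct_item"


# Hypothetical summary of what each strategy follows, paraphrased from the
# descriptions above. "automatic" is resolved in follows() instead.
FOLLOWS = {
    EcommerceCrawlStrategy.full: frozenset({"most_links_in_domain"}),
    EcommerceCrawlStrategy.navigation: frozenset({"pagination", "subcategories", "products"}),
    EcommerceCrawlStrategy.pagination_only: frozenset({"pagination", "products"}),
    EcommerceCrawlStrategy.direct_item: frozenset(),  # extract from the input URL itself
}


def follows(strategy: str, is_homepage: bool = False) -> FrozenSet[str]:
    """Return the link kinds a strategy follows (hypothetical helper)."""
    s = EcommerceCrawlStrategy(strategy)
    if s is EcommerceCrawlStrategy.automatic:
        # Assumption: a homepage behaves like "full", other pages like "navigation".
        s = EcommerceCrawlStrategy.full if is_homepage else EcommerceCrawlStrategy.navigation
    return FOLLOWS[s]


print(sorted(follows("pagination_only")))
```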

pydantic model zyte_spider_templates.spiders.serp.SerpMaxPagesParam
field max_pages: int = 1

Maximum number of result pages to visit per search query.
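Because the limit applies per search query, the number of result pages visited is at most the number of queries multiplied by max_pages. The sketch below enumerates that upper bound for a hypothetical list of queries; planned_pages is an illustrative helper, not part of the library, and it does not model how the spider builds result-page URLs.

```python
from typing import Iterator, List, Tuple


def planned_pages(queries: List[str], max_pages: int = 1) -> Iterator[Tuple[str, int]]:
    """Yield (query, page_number) pairs, at most max_pages per query (hypothetical)."""
    for query in queries:
        for page in range(1, max_pages + 1):
            yield query, page


plan = list(planned_pages(["laptops", "headphones"], max_pages=2))
print(plan)
```

With the default max_pages of 1, only the first result page of each query would be visited.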