E-commerce spider template (ecommerce)

Basic use

scrapy crawl ecommerce -a url="https://books.toscrape.com"
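The parameters listed below can be passed the same way, as additional Scrapy spider arguments (`-a name=value`). The following sketch assembles such a command line in Python. The helper `build_crawl_args` is hypothetical. It assumes that each parameter is passed under its field name with a plain string value, and that a parameter left unset falls back to its documented default.

```python
import shlex


def build_crawl_args(url: str, **params: object) -> list[str]:
    """Build a `scrapy crawl ecommerce` argument list (hypothetical helper).

    Each keyword becomes one `-a name=value` spider argument. None values
    are skipped so that the spider's own defaults apply.
    """
    args = ["scrapy", "crawl", "ecommerce", "-a", f"url={url}"]
    for name, value in params.items():
        if value is None:
            continue
        args += ["-a", f"{name}={value}"]
    return args


args = build_crawl_args(
    "https://books.toscrape.com",
    crawl_strategy="navigation",
    max_requests=50,
)
# Print a shell-ready version of the command.
print(shlex.join(args))
```

The resulting list can also be passed directly to `subprocess.run`. That avoids shell quoting issues when a URL contains characters such as `&`.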

Parameters

pydantic model zyte_spider_templates.spiders.ecommerce.EcommerceSpiderParams
field crawl_strategy: EcommerceCrawlStrategy = EcommerceCrawlStrategy.full

Determines how the start URL and follow-up URLs are crawled.

field extract_from: ExtractFrom | None = None

Whether to perform extraction using a browser request (browserHtml) or an HTTP request (httpResponseBody).

field geolocation: Geolocation | None = None

ISO 3166-1 alpha-2 (2-character) country code to use as the request geolocation, as specified in https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/geolocation.

field max_requests: int | None = 100

The maximum number of Zyte API requests allowed for the crawl.

Requests with error responses that cannot be retried or that exceed their retry limit also count toward this limit, but they incur no costs and do not increase the request count in Scrapy Cloud.
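The sketch below illustrates the counting rule described above. Every finished request counts toward `max_requests`, including error responses that could not be retried or ran out of retries, while only successful responses incur costs. The `RequestBudget` class is hypothetical and models only this rule, not how the template enforces it. Treating `None` as "no limit" is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestBudget:
    """Hypothetical model of the max_requests rule (not the template's code)."""

    max_requests: Optional[int] = 100
    counted: int = 0   # requests counted toward max_requests
    billable: int = 0  # requests that incur Zyte API costs

    def can_send(self) -> bool:
        # Assumption: None means the crawl has no request limit.
        return self.max_requests is None or self.counted < self.max_requests

    def record(self, ok: bool) -> None:
        # Successful and finally failed requests both count toward the limit...
        self.counted += 1
        # ...but only successful responses are billed.
        if ok:
            self.billable += 1


budget = RequestBudget(max_requests=3)
for ok in (True, False, True, True):
    if not budget.can_send():
        break
    budget.record(ok)

print(budget.counted, budget.billable)  # 3 2
```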

field url: str [Required]

Initial URL for the crawl. Enter the full URL, including http(s); you can copy and paste it from your browser. Example: https://toscrape.com/

Constraints:
  • pattern = ^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$
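The `url` value must match this regular expression. Python's `re` module offers a quick way to check a start URL before launching a crawl. The helper name `is_valid_start_url` is illustrative only. The check covers only this pattern and not any other validation the template may perform.

```python
import re

# The `url` pattern from the constraint above, written as a raw string.
URL_PATTERN = re.compile(r"^https?://[^:/\s]+(:\d{1,5})?(/[^\s]*)*(#[^\s]*)?$")


def is_valid_start_url(url: str) -> bool:
    """Return True if `url` matches the documented `url` pattern."""
    return URL_PATTERN.match(url) is not None


for candidate in (
    "https://books.toscrape.com",
    "https://toscrape.com/",
    "http://example.com:8080/catalogue/page-2.html",
    "books.toscrape.com",   # missing scheme
    "ftp://toscrape.com/",  # scheme other than http(s)
):
    print(candidate, is_valid_start_url(candidate))
```

The pattern requires an explicit `http://` or `https://` scheme and rejects whitespace anywhere in the URL. Bare domains copied without a scheme therefore fail validation.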

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

enum zyte_spider_templates.spiders.ecommerce.EcommerceCrawlStrategy(value)
Member Type:

str

Valid values are as follows:

full: str = <EcommerceCrawlStrategy.full: 'full'>

Follow most links within the domain of the start URL in an attempt to discover and extract as many products as possible.

navigation: str = <EcommerceCrawlStrategy.navigation: 'navigation'>

Follow pagination, subcategories, and product detail pages.

Pagination Only (pagination_only) is a better choice if the target URL has no subcategories, or if Zyte API misidentifies some URLs as subcategories.

pagination_only: str = <EcommerceCrawlStrategy.pagination_only: 'pagination_only'>

Follow pagination and product detail pages. Subcategory links are ignored.
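The three strategies differ only in which kinds of links they follow. The sketch below encodes that difference as a lookup table. `LinkKind` and `follows` are hypothetical. As the note under `navigation` indicates, the real template relies on Zyte API to identify links such as subcategories, rather than on code like this. The sketch also simplifies `full` to "every same-domain link", whereas the template follows "most" links.

```python
from enum import Enum


class EcommerceCrawlStrategy(str, Enum):
    # Mirrors the documented values; not imported from the real package.
    full = "full"
    navigation = "navigation"
    pagination_only = "pagination_only"


class LinkKind(str, Enum):
    # Hypothetical labels for the kinds of links a page can contain.
    pagination = "pagination"
    subcategory = "subcategory"
    product = "product"
    other = "other"  # any other link within the start URL's domain


FOLLOWED: dict[EcommerceCrawlStrategy, frozenset[LinkKind]] = {
    EcommerceCrawlStrategy.full: frozenset(LinkKind),
    EcommerceCrawlStrategy.navigation: frozenset(
        {LinkKind.pagination, LinkKind.subcategory, LinkKind.product}
    ),
    EcommerceCrawlStrategy.pagination_only: frozenset(
        {LinkKind.pagination, LinkKind.product}
    ),
}


def follows(strategy: EcommerceCrawlStrategy, kind: LinkKind) -> bool:
    """Return True if `strategy` follows links of the given kind."""
    return kind in FOLLOWED[strategy]


print(follows(EcommerceCrawlStrategy("pagination_only"), LinkKind.subcategory))  # False
```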

enum zyte_spider_templates.spiders.base.ExtractFrom(value)
Member Type:

str

Valid values are as follows:

httpResponseBody: str = <ExtractFrom.httpResponseBody: 'httpResponseBody'>

Use HTTP responses. Cost-efficient and fast extraction method, which works well on many websites.

browserHtml: str = <ExtractFrom.browserHtml: 'browserHtml'>

Use browser rendering. Often provides the best quality.
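Both values are names of Zyte API response fields. The sketch below shows how a product-extraction request body could carry this choice. It assumes the Zyte API `productOptions.extractFrom` request field and is not the template's own request-building code. The helper `product_request` is hypothetical. When `extract_from` is left as None, the sketch simply omits the option.

```python
import json
from enum import Enum
from typing import Optional


class ExtractFrom(str, Enum):
    # Mirrors the documented values; not imported from the real package.
    httpResponseBody = "httpResponseBody"
    browserHtml = "browserHtml"


def product_request(url: str, extract_from: Optional[ExtractFrom] = None) -> dict:
    """Build a Zyte API request body for product extraction (hypothetical helper)."""
    body: dict = {"url": url, "product": True}
    if extract_from is not None:
        # Assumed Zyte API field that selects the extraction source.
        body["productOptions"] = {"extractFrom": extract_from.value}
    return body


print(json.dumps(product_request(
    "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    ExtractFrom.httpResponseBody,
)))
```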

enum zyte_spider_templates.spiders.base.Geolocation(value)
Member Type:

str

Valid values are as follows:

AF: str = <Geolocation.AF: 'AF'>
AL: str = <Geolocation.AL: 'AL'>
DZ: str = <Geolocation.DZ: 'DZ'>
AS: str = <Geolocation.AS: 'AS'>
AD: str = <Geolocation.AD: 'AD'>
AO: str = <Geolocation.AO: 'AO'>
AI: str = <Geolocation.AI: 'AI'>
AQ: str = <Geolocation.AQ: 'AQ'>
AG: str = <Geolocation.AG: 'AG'>
AR: str = <Geolocation.AR: 'AR'>
AM: str = <Geolocation.AM: 'AM'>
AW: str = <Geolocation.AW: 'AW'>
AU: str = <Geolocation.AU: 'AU'>
AT: str = <Geolocation.AT: 'AT'>
AZ: str = <Geolocation.AZ: 'AZ'>
BS: str = <Geolocation.BS: 'BS'>
BH: str = <Geolocation.BH: 'BH'>
BD: str = <Geolocation.BD: 'BD'>
BB: str = <Geolocation.BB: 'BB'>
BY: str = <Geolocation.BY: 'BY'>
BE: str = <Geolocation.BE: 'BE'>
BZ: str = <Geolocation.BZ: 'BZ'>
BJ: str = <Geolocation.BJ: 'BJ'>
BM: str = <Geolocation.BM: 'BM'>
BT: str = <Geolocation.BT: 'BT'>
BO: str = <Geolocation.BO: 'BO'>
BQ: str = <Geolocation.BQ: 'BQ'>
BA: str = <Geolocation.BA: 'BA'>
BW: str = <Geolocation.BW: 'BW'>
BV: str = <Geolocation.BV: 'BV'>
BR: str = <Geolocation.BR: 'BR'>
IO: str = <Geolocation.IO: 'IO'>
BN: str = <Geolocation.BN: 'BN'>
BG: str = <Geolocation.BG: 'BG'>
BF: str = <Geolocation.BF: 'BF'>
BI: str = <Geolocation.BI: 'BI'>
CV: str = <Geolocation.CV: 'CV'>
KH: str = <Geolocation.KH: 'KH'>
CM: str = <Geolocation.CM: 'CM'>
CA: str = <Geolocation.CA: 'CA'>
KY: str = <Geolocation.KY: 'KY'>
CF: str = <Geolocation.CF: 'CF'>
TD: str = <Geolocation.TD: 'TD'>
CL: str = <Geolocation.CL: 'CL'>
CN: str = <Geolocation.CN: 'CN'>
CX: str = <Geolocation.CX: 'CX'>
CC: str = <Geolocation.CC: 'CC'>
CO: str = <Geolocation.CO: 'CO'>
KM: str = <Geolocation.KM: 'KM'>
CG: str = <Geolocation.CG: 'CG'>
CD: str = <Geolocation.CD: 'CD'>
CK: str = <Geolocation.CK: 'CK'>
CR: str = <Geolocation.CR: 'CR'>
HR: str = <Geolocation.HR: 'HR'>
CU: str = <Geolocation.CU: 'CU'>
CW: str = <Geolocation.CW: 'CW'>
CY: str = <Geolocation.CY: 'CY'>
CZ: str = <Geolocation.CZ: 'CZ'>
CI: str = <Geolocation.CI: 'CI'>
DK: str = <Geolocation.DK: 'DK'>
DJ: str = <Geolocation.DJ: 'DJ'>
DM: str = <Geolocation.DM: 'DM'>
DO: str = <Geolocation.DO: 'DO'>
EC: str = <Geolocation.EC: 'EC'>
EG: str = <Geolocation.EG: 'EG'>
SV: str = <Geolocation.SV: 'SV'>
GQ: str = <Geolocation.GQ: 'GQ'>
ER: str = <Geolocation.ER: 'ER'>
EE: str = <Geolocation.EE: 'EE'>
SZ: str = <Geolocation.SZ: 'SZ'>
ET: str = <Geolocation.ET: 'ET'>
FK: str = <Geolocation.FK: 'FK'>
FO: str = <Geolocation.FO: 'FO'>
FJ: str = <Geolocation.FJ: 'FJ'>
FI: str = <Geolocation.FI: 'FI'>
FR: str = <Geolocation.FR: 'FR'>
GF: str = <Geolocation.GF: 'GF'>
PF: str = <Geolocation.PF: 'PF'>
TF: str = <Geolocation.TF: 'TF'>
GA: str = <Geolocation.GA: 'GA'>
GM: str = <Geolocation.GM: 'GM'>
GE: str = <Geolocation.GE: 'GE'>
DE: str = <Geolocation.DE: 'DE'>
GH: str = <Geolocation.GH: 'GH'>
GI: str = <Geolocation.GI: 'GI'>
GR: str = <Geolocation.GR: 'GR'>
GL: str = <Geolocation.GL: 'GL'>
GD: str = <Geolocation.GD: 'GD'>
GP: str = <Geolocation.GP: 'GP'>
GU: str = <Geolocation.GU: 'GU'>
GT: str = <Geolocation.GT: 'GT'>
GG: str = <Geolocation.GG: 'GG'>
GN: str = <Geolocation.GN: 'GN'>
GW: str = <Geolocation.GW: 'GW'>
GY: str = <Geolocation.GY: 'GY'>
HT: str = <Geolocation.HT: 'HT'>
HM: str = <Geolocation.HM: 'HM'>
VA: str = <Geolocation.VA: 'VA'>
HN: str = <Geolocation.HN: 'HN'>
HK: str = <Geolocation.HK: 'HK'>
HU: str = <Geolocation.HU: 'HU'>
IS: str = <Geolocation.IS: 'IS'>
IN: str = <Geolocation.IN: 'IN'>
ID: str = <Geolocation.ID: 'ID'>
IR: str = <Geolocation.IR: 'IR'>
IQ: str = <Geolocation.IQ: 'IQ'>
IE: str = <Geolocation.IE: 'IE'>
IM: str = <Geolocation.IM: 'IM'>
IL: str = <Geolocation.IL: 'IL'>
IT: str = <Geolocation.IT: 'IT'>
JM: str = <Geolocation.JM: 'JM'>
JP: str = <Geolocation.JP: 'JP'>
JE: str = <Geolocation.JE: 'JE'>
JO: str = <Geolocation.JO: 'JO'>
KZ: str = <Geolocation.KZ: 'KZ'>
KE: str = <Geolocation.KE: 'KE'>
KI: str = <Geolocation.KI: 'KI'>
KP: str = <Geolocation.KP: 'KP'>
KR: str = <Geolocation.KR: 'KR'>
KW: str = <Geolocation.KW: 'KW'>
KG: str = <Geolocation.KG: 'KG'>
LA: str = <Geolocation.LA: 'LA'>
LV: str = <Geolocation.LV: 'LV'>
LB: str = <Geolocation.LB: 'LB'>
LS: str = <Geolocation.LS: 'LS'>
LR: str = <Geolocation.LR: 'LR'>
LY: str = <Geolocation.LY: 'LY'>
LI: str = <Geolocation.LI: 'LI'>
LT: str = <Geolocation.LT: 'LT'>
LU: str = <Geolocation.LU: 'LU'>
MO: str = <Geolocation.MO: 'MO'>
MG: str = <Geolocation.MG: 'MG'>
MW: str = <Geolocation.MW: 'MW'>
MY: str = <Geolocation.MY: 'MY'>
MV: str = <Geolocation.MV: 'MV'>
ML: str = <Geolocation.ML: 'ML'>
MT: str = <Geolocation.MT: 'MT'>
MH: str = <Geolocation.MH: 'MH'>
MQ: str = <Geolocation.MQ: 'MQ'>
MR: str = <Geolocation.MR: 'MR'>
MU: str = <Geolocation.MU: 'MU'>
YT: str = <Geolocation.YT: 'YT'>
MX: str = <Geolocation.MX: 'MX'>
FM: str = <Geolocation.FM: 'FM'>
MD: str = <Geolocation.MD: 'MD'>
MC: str = <Geolocation.MC: 'MC'>
MN: str = <Geolocation.MN: 'MN'>
ME: str = <Geolocation.ME: 'ME'>
MS: str = <Geolocation.MS: 'MS'>
MA: str = <Geolocation.MA: 'MA'>
MZ: str = <Geolocation.MZ: 'MZ'>
MM: str = <Geolocation.MM: 'MM'>
NA: str = <Geolocation.NA: 'NA'>
NR: str = <Geolocation.NR: 'NR'>
NP: str = <Geolocation.NP: 'NP'>
NL: str = <Geolocation.NL: 'NL'>
NC: str = <Geolocation.NC: 'NC'>
NZ: str = <Geolocation.NZ: 'NZ'>
NI: str = <Geolocation.NI: 'NI'>
NE: str = <Geolocation.NE: 'NE'>
NG: str = <Geolocation.NG: 'NG'>
NU: str = <Geolocation.NU: 'NU'>
NF: str = <Geolocation.NF: 'NF'>
MK: str = <Geolocation.MK: 'MK'>
MP: str = <Geolocation.MP: 'MP'>
NO: str = <Geolocation.NO: 'NO'>
OM: str = <Geolocation.OM: 'OM'>
PK: str = <Geolocation.PK: 'PK'>
PW: str = <Geolocation.PW: 'PW'>
PS: str = <Geolocation.PS: 'PS'>
PA: str = <Geolocation.PA: 'PA'>
PG: str = <Geolocation.PG: 'PG'>
PY: str = <Geolocation.PY: 'PY'>
PE: str = <Geolocation.PE: 'PE'>
PH: str = <Geolocation.PH: 'PH'>
PN: str = <Geolocation.PN: 'PN'>
PL: str = <Geolocation.PL: 'PL'>
PT: str = <Geolocation.PT: 'PT'>
PR: str = <Geolocation.PR: 'PR'>
QA: str = <Geolocation.QA: 'QA'>
RO: str = <Geolocation.RO: 'RO'>
RU: str = <Geolocation.RU: 'RU'>
RW: str = <Geolocation.RW: 'RW'>
RE: str = <Geolocation.RE: 'RE'>
BL: str = <Geolocation.BL: 'BL'>
SH: str = <Geolocation.SH: 'SH'>
KN: str = <Geolocation.KN: 'KN'>
LC: str = <Geolocation.LC: 'LC'>
MF: str = <Geolocation.MF: 'MF'>
PM: str = <Geolocation.PM: 'PM'>
VC: str = <Geolocation.VC: 'VC'>
WS: str = <Geolocation.WS: 'WS'>
SM: str = <Geolocation.SM: 'SM'>
ST: str = <Geolocation.ST: 'ST'>
SA: str = <Geolocation.SA: 'SA'>
SN: str = <Geolocation.SN: 'SN'>
RS: str = <Geolocation.RS: 'RS'>
SC: str = <Geolocation.SC: 'SC'>
SL: str = <Geolocation.SL: 'SL'>
SG: str = <Geolocation.SG: 'SG'>
SX: str = <Geolocation.SX: 'SX'>
SK: str = <Geolocation.SK: 'SK'>
SI: str = <Geolocation.SI: 'SI'>
SB: str = <Geolocation.SB: 'SB'>
SO: str = <Geolocation.SO: 'SO'>
ZA: str = <Geolocation.ZA: 'ZA'>
GS: str = <Geolocation.GS: 'GS'>
SS: str = <Geolocation.SS: 'SS'>
ES: str = <Geolocation.ES: 'ES'>
LK: str = <Geolocation.LK: 'LK'>
SD: str = <Geolocation.SD: 'SD'>
SR: str = <Geolocation.SR: 'SR'>
SJ: str = <Geolocation.SJ: 'SJ'>
SE: str = <Geolocation.SE: 'SE'>
CH: str = <Geolocation.CH: 'CH'>
SY: str = <Geolocation.SY: 'SY'>
TW: str = <Geolocation.TW: 'TW'>
TJ: str = <Geolocation.TJ: 'TJ'>
TZ: str = <Geolocation.TZ: 'TZ'>
TH: str = <Geolocation.TH: 'TH'>
TL: str = <Geolocation.TL: 'TL'>
TG: str = <Geolocation.TG: 'TG'>
TK: str = <Geolocation.TK: 'TK'>
TO: str = <Geolocation.TO: 'TO'>
TT: str = <Geolocation.TT: 'TT'>
TN: str = <Geolocation.TN: 'TN'>
TM: str = <Geolocation.TM: 'TM'>
TC: str = <Geolocation.TC: 'TC'>
TV: str = <Geolocation.TV: 'TV'>
TR: str = <Geolocation.TR: 'TR'>
UG: str = <Geolocation.UG: 'UG'>
UA: str = <Geolocation.UA: 'UA'>
AE: str = <Geolocation.AE: 'AE'>
GB: str = <Geolocation.GB: 'GB'>
US: str = <Geolocation.US: 'US'>
UM: str = <Geolocation.UM: 'UM'>
UY: str = <Geolocation.UY: 'UY'>
UZ: str = <Geolocation.UZ: 'UZ'>
VU: str = <Geolocation.VU: 'VU'>
VE: str = <Geolocation.VE: 'VE'>
VN: str = <Geolocation.VN: 'VN'>
VG: str = <Geolocation.VG: 'VG'>
VI: str = <Geolocation.VI: 'VI'>
WF: str = <Geolocation.WF: 'WF'>
EH: str = <Geolocation.EH: 'EH'>
YE: str = <Geolocation.YE: 'YE'>
ZM: str = <Geolocation.ZM: 'ZM'>
ZW: str = <Geolocation.ZW: 'ZW'>
AX: str = <Geolocation.AX: 'AX'>
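Because `Geolocation` has member type `str`, a plain two-letter code such as "DE" can be converted to a member by value, and members compare equal to their string values. The sketch below reproduces only three members to demonstrate that behavior and to show how an unlisted code is rejected. It is a stand-in for the real enum, and `to_geolocation` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class Geolocation(str, Enum):
    # A small subset of the documented members, for illustration only.
    DE = "DE"
    GB = "GB"
    US = "US"


def to_geolocation(code: Optional[str]) -> Optional[Geolocation]:
    """Convert a country code to a Geolocation member; None means no geolocation."""
    if code is None:
        return None
    # Lookup by value raises ValueError for codes that are not members.
    return Geolocation(code)


print(to_geolocation("DE") is Geolocation.DE)  # True
print(Geolocation.US == "US")  # True
try:
    to_geolocation("XX")
except ValueError as exc:
    print("rejected:", exc)
```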