Start Crawl
Crawl a website and scrape every discovered page
POST
Start a crawl over a website. The crawler discovers pages from the entry URL (and sitemaps) and scrapes each one with the same output options as Scrape URL.
Crawls are always asynchronous: the response returns a crawl_id immediately. Poll Crawl Status for progress and results, or register a webhook for delivery.
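The polling half of this flow can be sketched as a small loop. This is a minimal sketch, not the official client: `wait_for_crawl` and `get_status` are hypothetical names, the Crawl Status response is assumed to carry a `status` field, and the terminal values `completed` and `failed` are assumed from the webhook event names rather than confirmed by this page. The HTTP call itself is left to the caller because the Crawl Status endpoint is not shown here.

```python
import time

# Assumed terminal states, borrowed from the webhook event names on this page.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_crawl(crawl_id, get_status, interval=2.0, timeout=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the crawl reaches an assumed terminal state.

    get_status is a caller-supplied function that takes a crawl_id, calls
    Crawl Status, and returns the decoded response as a dict.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(crawl_id)
        # Stop as soon as the job reports a terminal state.
        if status.get("status") in TERMINAL_STATES:
            return status
        # Give up once the time budget is spent.
        if clock() >= deadline:
            raise TimeoutError(
                f"crawl {crawl_id} did not finish within {timeout} seconds"
            )
        sleep(interval)
```

Passing `get_status` in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses. A webhook avoids polling altogether when the caller can expose a public endpoint.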
Request Body
The entry URL to start crawling from.
Crawler options:
limit (number, default 100): max pages to crawl
include / exclude (string[]): URL patterns to include or skip
max_depth (number, default 3): link depth from the entry URL
include_entire_domain (boolean, default false): allow crawling any path on the entry domain
include_subdomains (boolean, default false): allow crawling subdomains of the entry domain
include_external_links (boolean, default false): allow crawling external links
sitemaps (boolean, default true): seed from sitemap.xml
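The crawler options and their documented defaults can be captured in a small helper. This is a sketch under assumptions: `crawler_options` is a hypothetical helper, `include` and `exclude` are treated as two separate keys, and the pattern syntax in the usage note is a guess, since this page does not say whether patterns are globs or regular expressions.

```python
# Defaults as listed on this page; include/exclude have no documented default.
CRAWLER_DEFAULTS = {
    "limit": 100,
    "max_depth": 3,
    "include_entire_domain": False,
    "include_subdomains": False,
    "include_external_links": False,
    "sitemaps": True,
}
PATTERN_KEYS = {"include", "exclude"}


def crawler_options(**overrides):
    """Return the documented defaults with the caller's overrides applied."""
    unknown = set(overrides) - set(CRAWLER_DEFAULTS) - PATTERN_KEYS
    if unknown:
        # Catch typos such as "maxdepth" before they reach the API.
        raise ValueError(f"unknown crawler options: {sorted(unknown)}")
    options = dict(CRAWLER_DEFAULTS)
    options.update(overrides)
    return options
```

For example, `crawler_options(limit=25, exclude=["/blog/*"])` keeps every other default and adds one exclusion pattern. Sending the defaults explicitly is optional; the helper mainly documents them next to the overrides.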
Output options per page: same as Scrape URL.
Proxy options:
location (default US) and sticky_session.
Request options: same as Scrape URL.
JavaScript rendering options: same as Scrape URL.
Delivery webhook:
url, optional headers, events (default ["started", "completed", "failed"]).
Response
Unique ID for this crawl job. Use it with Crawl Status.
The value is crawling when the crawl is accepted.
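As a rough illustration of how the request body and response fit together, the sketch below assembles a body and reads the accept response. Several names are assumptions, because the field names are not shown on this page: the top-level keys `url`, `crawler`, and `webhook`, and the response field `status`, are hypothetical. Only `crawl_id`, the webhook fields `url`, `headers`, and `events`, and the default event list come from the text above.

```python
import json

# Default webhook events as documented on this page.
DEFAULT_WEBHOOK_EVENTS = ["started", "completed", "failed"]


def build_crawl_request(entry_url, crawler=None, webhook_url=None,
                        webhook_headers=None):
    """Assemble a request body; the top-level key names are assumptions."""
    body = {"url": entry_url}  # assumed name for the entry URL field
    if crawler:
        body["crawler"] = dict(crawler)  # assumed name for crawler options
    if webhook_url:
        hook = {"url": webhook_url, "events": list(DEFAULT_WEBHOOK_EVENTS)}
        if webhook_headers:
            hook["headers"] = dict(webhook_headers)
        body["webhook"] = hook  # assumed name for the delivery webhook
    return body


def parse_accept_response(raw):
    """Return (crawl_id, status) from the JSON text of the accept response."""
    data = json.loads(raw)
    # "status" is an assumed field name; crawl_id is documented above.
    return data["crawl_id"], data.get("status")
```

The returned `crawl_id` is the value to pass to Crawl Status. Check the real field names against the full reference before relying on this shape.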