Skip to main content
POST
/
web
/
crawl
curl -X POST https://scrape.st/web/crawl \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "crawler": { "limit": 50, "max_depth": 2 },
    "output": { "formats": ["markdown"] }
  }'
{
  "crawl_id": "crl_01HZX...",
  "version": "1.0.0",
  "status": "crawling"
}
Start a crawl over a website. The crawler discovers pages from the entry URL (and sitemaps) and scrapes each one with the same output options as Scrape URL. Crawls are always asynchronous: the response returns a crawl_id immediately — poll Crawl Status for progress and results, or register a webhook for delivery.

Request Body

url
string
required
The entry URL to start crawling from.
crawler
object
Crawler options:
  • limit (number, default 100) — max pages to crawl
  • include / exclude (string[]) — URL patterns to include or skip
  • max_depth (number, default 3) — link depth from the entry URL
  • include_entire_domain (boolean, default false) — allow crawling any path on the entry domain
  • include_subdomains (boolean, default false) — allow crawling subdomains of the entry domain
  • include_external_links (boolean, default false) — allow crawling external links
  • sitemaps (boolean, default true) — seed from sitemap.xml
output
object
Output options per page — same as Scrape URL.
proxy
object
Proxy options: location (default US) and sticky_session.
request
object
Request options — same as Scrape URL.
js_render
object
JavaScript rendering options — same as Scrape URL.
webhook
object
Delivery webhook: url, optional headers, events (default ["started", "completed", "failed"]).

Response

crawl_id
string
Unique ID for this crawl job. Use it with Crawl Status.
status
string
crawling on accept.
curl -X POST https://scrape.st/web/crawl \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "crawler": { "limit": 50, "max_depth": 2 },
    "output": { "formats": ["markdown"] }
  }'
{
  "crawl_id": "crl_01HZX...",
  "version": "1.0.0",
  "status": "crawling"
}