Website | Guides | API Docs | Chat
A web crawler and scraper that provides building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Anti-Bot mitigation
- Disk persistence
- Privacy and Efficiency through Ad, Analytics, and Custom Tiered Network Blocking
- Blacklisting, Whitelisting, and Budgeting Depth
- Dynamic AI Prompt Scripting Headless with Step Caching
- CSS/XPath Scraping with spider_utils
- HTML to markdown, text, and other transformations with spider_transformations
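
To make the blacklisting, whitelisting, and depth budgeting feature listed above more concrete, here is a minimal, self-contained sketch of how such a crawl policy could behave. It is not spider's API: `CrawlPolicy`, `allows`, `crawl`, and `sample_graph` are hypothetical names, prefix matching is an assumed rule format, and an in-memory link graph stands in for real network requests.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical crawl policy (illustrative only, not part of spider).
struct CrawlPolicy {
    /// URL prefixes that must never be visited.
    blacklist: Vec<String>,
    /// If non-empty, only URLs starting with one of these prefixes are visited.
    whitelist: Vec<String>,
    /// Maximum link depth from the start URL (the start URL is depth 0).
    max_depth: usize,
}

impl CrawlPolicy {
    /// Returns true when `url` at `depth` passes the depth budget and both lists.
    fn allows(&self, url: &str, depth: usize) -> bool {
        if depth > self.max_depth {
            return false;
        }
        if self.blacklist.iter().any(|p| url.starts_with(p.as_str())) {
            return false;
        }
        self.whitelist.is_empty()
            || self.whitelist.iter().any(|p| url.starts_with(p.as_str()))
    }
}

/// Breadth-first walk over an in-memory link graph that stands in for the network.
/// Returns the visited URLs in visit order.
fn crawl(start: &str, graph: &HashMap<&str, Vec<&str>>, policy: &CrawlPolicy) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back((start.to_string(), 0usize));

    while let Some((url, depth)) = queue.pop_front() {
        // Skip URLs rejected by the policy or already seen.
        if !policy.allows(&url, depth) || !visited.insert(url.clone()) {
            continue;
        }
        order.push(url.clone());
        if let Some(links) = graph.get(url.as_str()) {
            for link in links {
                queue.push_back((link.to_string(), depth + 1));
            }
        }
    }
    order
}

/// A tiny made-up site used only for this example.
fn sample_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        (
            "https://example.com/",
            vec!["https://example.com/docs", "https://example.com/admin"],
        ),
        ("https://example.com/docs", vec!["https://example.com/docs/api"]),
        ("https://example.com/docs/api", vec!["https://example.com/docs/api/v2"]),
    ])
}

fn main() {
    let policy = CrawlPolicy {
        blacklist: vec!["https://example.com/admin".to_string()],
        whitelist: vec![],
        max_depth: 2,
    };
    for url in crawl("https://example.com/", &sample_graph(), &policy) {
        println!("{}", url);
    }
}
```

With this policy the walk skips the blacklisted `/admin` page and stops before `/docs/api/v2`, which sits at depth 3 and exceeds the budget of 2. For the real configuration options, consult the API Docs linked at the top.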
The simplest way to get started is to use the Spider Cloud hosted service. View the spider or spider_cli directory for local installations. You can also use spider with Node.js using spider-nodejs and Python using spider-py.
See BENCHMARKS.
See EXAMPLES.
This project is licensed under the MIT license.
See CONTRIBUTING.