spider-rs/spider

Spider

Website | Guides | API Docs | Chat

A web crawler and scraper that provides building blocks for data curation workloads.

  • Concurrent

  • Streaming

  • Decentralized

  • Headless Chrome Rendering

  • HTTP Proxies

  • Cron Jobs

  • Subscriptions

  • Smart Mode

  • Anti-Bot mitigation

  • Disk persistence

  • Privacy and Efficiency through Ad, Analytics, and Custom Tiered Network Blocking

  • Blacklisting, Whitelisting, and Budgeting Depth

  • Dynamic AI Prompt Scripting Headless with Step Caching

  • CSS/XPath Scraping with spider_utils

  • HTML to Markdown, text, and other transformations with spider_transformations

  • Changelog
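
To make the "Blacklisting, Whitelisting, and Budgeting Depth" item above more concrete, here is a minimal, standard-library-only sketch of how such rules might gate a breadth-first crawl. It is an illustration of the idea, not Spider's API: `CrawlPolicy`, `allows`, `crawl`, and the in-memory link graph are all hypothetical names invented for this example, and no network requests are made.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical crawl policy (illustrative only, not a Spider type).
struct CrawlPolicy {
    blacklist: Vec<String>, // path prefixes that are never visited
    whitelist: Vec<String>, // if non-empty, only these prefixes are visited
    max_depth: usize,       // maximum number of link hops from the start page
}

impl CrawlPolicy {
    /// Returns true when a path at the given depth may be visited.
    fn allows(&self, path: &str, depth: usize) -> bool {
        if depth > self.max_depth {
            return false;
        }
        if self.blacklist.iter().any(|p| path.starts_with(p.as_str())) {
            return false;
        }
        self.whitelist.is_empty()
            || self.whitelist.iter().any(|p| path.starts_with(p.as_str()))
    }
}

/// Breadth-first traversal over an in-memory link graph that stands in for a site.
fn crawl(graph: &HashMap<&str, Vec<&str>>, start: &str, policy: &CrawlPolicy) -> Vec<String> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back((start.to_string(), 0usize));

    while let Some((path, depth)) = queue.pop_front() {
        // Skip pages the policy rejects and pages already visited.
        if !policy.allows(&path, depth) || !visited.insert(path.clone()) {
            continue;
        }
        order.push(path.clone());
        if let Some(links) = graph.get(path.as_str()) {
            for link in links {
                queue.push_back((link.to_string(), depth + 1));
            }
        }
    }
    order
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert("/", vec!["/docs", "/admin", "/blog"]);
    graph.insert("/docs", vec!["/docs/api", "/"]);
    graph.insert("/docs/api", vec!["/docs/api/v2"]);
    graph.insert("/blog", vec!["/blog/post"]);

    let policy = CrawlPolicy {
        blacklist: vec!["/admin".to_string()],
        whitelist: vec![],
        max_depth: 2,
    };

    for page in crawl(&graph, "/", &policy) {
        println!("{page}");
    }
}
```

In this sketch the blacklist is checked before the whitelist, so a path matching both is rejected. A real crawler would apply comparable checks to full URLs; see the spider directory and API Docs for the options Spider actually exposes.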

Getting Started

The simplest way to get started is to use the Spider Cloud hosted service. For a local installation, see the spider or spider_cli directory. Spider is also available for Node.js through spider-nodejs and for Python through spider-py.

Benchmarks

See BENCHMARKS.

Examples

See EXAMPLES.

License

This project is licensed under the MIT license.

Contributing

See CONTRIBUTING.