forked from bitmakerla/estela

estela, an elastic web scraping cluster 🕸

zanachka/estela

estela

estela is an elastic web scraping cluster running on Kubernetes. It provides mechanisms to deploy, run and scale web scraping spiders via a REST API and a web interface.

Technologies

Docker, Python, React, Node.js

Project Structure

The project consists of three main modules:

  • REST API: built with the Django REST framework toolkit, it exposes several endpoints to manage projects, spiders, and jobs. It uses Celery for task processing and takes care of deploying your Scrapy projects, among other things.
  • Queueing: estela needs a high-throughput, low-latency platform that handles real-time data feeds in a producer-consumer architecture. This module contains a consumer that collects the data produced by the spider jobs and transports it into a database.
  • Web: a web interface implemented with React and TypeScript that lets you manage projects and spiders.

Each of these modules works independently of the rest and can be changed. Each module has a more detailed description in its corresponding directory.
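The producer-consumer flow described for the Queueing module can be illustrated with a minimal sketch. This is not estela's implementation: it uses Python's standard-library queue as a stand-in for the real queueing platform and a plain list as a stand-in for the database. The names spider_job, consumer, and run_demo are hypothetical and exist only for this example.

```python
import queue
import threading

# Marker a producer puts on the queue when it has no more items (assumption
# of this sketch, not an estela convention).
SENTINEL = None


def spider_job(job_id, n_items, out_queue):
    """Hypothetical producer: emits n_items fake scraped records for one job."""
    for i in range(n_items):
        out_queue.put({"job": job_id, "item": i})
    out_queue.put(SENTINEL)


def consumer(in_queue, database, n_producers):
    """Move records from the queue into `database` until all producers finish."""
    finished = 0
    while finished < n_producers:
        record = in_queue.get()
        if record is SENTINEL:
            finished += 1
        else:
            database.append(record)


def run_demo(n_jobs=2, n_items=3):
    """Run n_jobs producers and one consumer; return the collected records."""
    items = queue.Queue()
    database = []  # stand-in for a real database
    producers = [
        threading.Thread(target=spider_job, args=(job_id, n_items, items))
        for job_id in range(n_jobs)
    ]
    sink = threading.Thread(target=consumer, args=(items, database, n_jobs))
    sink.start()
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    sink.join()
    return database


if __name__ == "__main__":
    for record in run_demo():
        print(record)
```

Because the spider jobs only write to the queue and the consumer only reads from it, either side can be replaced or scaled without changing the other, which matches the independence between modules described above.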

estela-cli

estela-cli is a command-line interface for estela.

How to Contribute

Please read CONTRIBUTING.md and follow the steps it describes. Remember to abide by our Code of Conduct as well.

Languages

  • TypeScript 76.2%
  • Python 21.3%
  • SCSS 1.0%
  • JavaScript 0.6%
  • Makefile 0.5%
  • HTML 0.2%
  • Other 0.2%