spatula is a modern Python library for writing maintainable web scrapers.
Source: https://github.com/jamesturk/spatula
Documentation: https://jamesturk.github.io/spatula/
Issues: https://github.com/jamesturk/spatula/issues
- Page-oriented design: Encourages writing understandable & maintainable scrapers.
- Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
- Fast HTML parsing: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.
- Flexible Data Model Support: Compatible with `dataclasses`, `attrs`, `pydantic`, or bring your own data model classes for storing & validating your scraped data.
- CLI Tools: Offers several CLI utilities that can help streamline the development & testing cycle.
- Fully Typed: Makes full use of Python 3 type annotations.
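
To make the page-oriented idea and the data model support concrete, here is a minimal sketch that uses only the Python standard library. It does not use spatula's API: `Employee`, `EmployeeListPage`, `process_page`, and `SAMPLE_HTML` are hypothetical names chosen for illustration, and `html.parser` stands in for `lxml.html`. See the documentation for the library's real page classes.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Employee:
    """Hypothetical data model for one scraped record."""
    first: str
    last: str


class _CellCollector(HTMLParser):
    """Collects the text of every <td>, grouped by enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


class EmployeeListPage:
    """One class per kind of page: it knows how to turn that page into records."""

    def __init__(self, html: str):
        self.html = html

    def process_page(self):
        parser = _CellCollector()
        parser.feed(self.html)
        for row in parser.rows:
            if len(row) == 2:  # skip header rows and malformed rows
                yield Employee(first=row[0], last=row[1])


# Hypothetical sample input; a real scraper would fetch this from a URL.
SAMPLE_HTML = (
    "<table>"
    "<tr><td>Ada</td><td>Lovelace</td></tr>"
    "<tr><td>Alan</td><td>Turing</td></tr>"
    "</table>"
)

employees = list(EmployeeListPage(SAMPLE_HTML).process_page())
```

Keeping the parsing logic for each kind of page in its own class, and returning typed records instead of loose dictionaries, is what keeps a scraper like this easy to read and test.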