This repository contains three web scrapers, each tailored to a different website. Each one collects full-size images together with their metadata for later analysis.
This scraper extracts images and their metadata from Wikimedia.
This scraper collects full-size images along with their titles, descriptions, dates, and locations from the DVIDSHUB website.
This scraper gathers images and their metadata from the Metropolitan Museum of Art's website.
- Functionality: Extracts images and metadata from Wikimedia.
- Data Collected:
  - Full-size images
  - Titles
  - Descriptions
  - Metadata available on the image page
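
As a rough illustration of how the Wikimedia fields listed above might be gathered, the sketch below builds a MediaWiki `imageinfo` query URL and flattens a response into records. It assumes the public Wikimedia Commons API is used rather than HTML parsing, which may not match this repository's actual implementation; `build_query_url`, `parse_imageinfo`, and the record keys are hypothetical names chosen for the example.

```python
from urllib.parse import urlencode

# Assumption: the Wikimedia Commons endpoint of the MediaWiki Action API.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_url(file_title: str) -> str:
    """Build an imageinfo query URL for one file page, e.g. "File:Example.jpg"."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",  # full-size URL plus extended metadata
        "titles": file_title,
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_imageinfo(response: dict) -> list[dict]:
    """Flatten a decoded imageinfo response into one record per file."""
    records = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            records.append({
                "title": page.get("title", ""),
                "image_url": info.get("url", ""),
                # Description values may contain HTML markup.
                "description": meta.get("ImageDescription", {}).get("value", ""),
                "metadata": {key: field.get("value", "") for key, field in meta.items()},
            })
    return records
```

The URL can be fetched with any HTTP client, and the decoded JSON passed to `parse_imageinfo`.
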
- Functionality: Scrapes images and associated metadata from DVIDSHUB.
- Data Collected:
  - Full-size images
  - Title (extracted from above the image on the image page)
  - Description (extracted from below the image on the image page)
  - Date (appended after "ca.")
  - Location (prepended to the description/headline fields)
- Note: The scraper omits image-counter text such as "[Image 1 of 7]" from the output.
- Format: Outputs data to an Excel file, with the title and description combined in a single cell, formatted as:
  Location: Title - Description ca. Date
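
The cell format above can be sketched in a few lines of Python. This is a minimal illustration rather than the scraper's actual code: `strip_counter` and `format_cell` are hypothetical helper names, the sample values are invented, and the regular expression assumes counters always follow the "[Image N of M]" pattern.

```python
import re

# Matches image-counter text such as "[Image 1 of 7]" plus surrounding spaces.
IMAGE_COUNTER = re.compile(r"\s*\[Image \d+ of \d+\]\s*", re.IGNORECASE)


def strip_counter(text: str) -> str:
    """Remove "[Image N of M]" counters and collapse leftover whitespace."""
    return " ".join(IMAGE_COUNTER.sub(" ", text).split())


def format_cell(location: str, title: str, description: str, date: str) -> str:
    """Build the single-cell string: Location: Title - Description ca. Date."""
    title = strip_counter(title)
    description = strip_counter(description)
    return f"{location}: {title} - {description} ca. {date}"


# Hypothetical sample values, for illustration only.
cell = format_cell(
    location="Example Base",
    title="Training exercise [Image 1 of 7]",
    description="Soldiers assemble.",
    date="2021",
)
print(cell)  # Example Base: Training exercise - Soldiers assemble. ca. 2021
```

The resulting string can then be written to a spreadsheet cell with whichever Excel library the scraper uses.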
- Functionality: Extracts data from the Metropolitan Museum of Art's website.
- Data Collected:
  - Full-size images
  - Titles
  - Descriptions
  - Metadata available on the artwork page
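
One possible way to pull the title, description, and image URL from an artwork page is to read its Open Graph `<meta>` tags with Python's standard-library HTML parser. The sketch below assumes the page exposes `og:title`, `og:description`, and `og:image` tags, which is not confirmed by this README, and that `og:image` may not always point to the full-size file; `OpenGraphParser` and `extract_artwork` are hypothetical names.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content"):
            # Store "og:title" under the key "title", and so on.
            self.tags[prop[3:]] = attr["content"]


def extract_artwork(html: str) -> dict:
    """Return title, description, and image URL found in Open Graph tags."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.tags.get("title", ""),
        "description": parser.tags.get("description", ""),
        "image_url": parser.tags.get("image", ""),
    }
```

Fields missing from the page come back as empty strings, so downstream code can tell which records still need a fallback source.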