Python Web Scraper
- Scrap all search results for a keyword entered as an argument.
- Can be saved as
.csv
and.json
. - Also collect user data who uploaded contents included in search results.
pip install git+https://github.com/bigpicture-kr/default-scraper.git
It may require authentication before installing since default-scraper is a private repository of bigpicture-kr organization.
from default_scraper.instagram.parser import InstagramParser
USERNAME = ""
PASSWORD = ""
KEYWORD = ""
parser = InstagramParser(USERNAME, PASSWORD, KEYWORD, False)
parser.run()
Run following command to scrap contents from Instagram:
python main.py --platform instagram --keyword {KEYWORD} [--output_file OUTPUT_FILE] [--all]
Use --all
or -a
option to also scrap unstructured fields.
- Structured fields
pk
id
taken_at
media_type
code
comment_count
user
like_count
caption
accessibility_caption
original_width
original_height
images
- Some fields may be missing depending on Instagram's response data.
- Will support scraping from more platform services.