- Install Anaconda
- Create conda env
- conda env create -f environment.yml
- Download geckodriver and add it to your PATH. Also install Firefox if you don't already have it.
Note: We used the Katalon IDE browser plugin to easily generate some of the Python Selenium statements, since the standard Selenium IDE no longer supports Python code export.
source activate new_councilmatic
conda env update -f environment.yml
source activate new_councilmatic
python -m ipykernel install --user --name new_councilmatic --display-name "new councilmatic"
make scrape generate
python run_calendar.py -h
python run_calendar.py -d 2018 > WebPage/website/scraped/year2018.json
Scrape starting from 1/1/2019 up to (but not including) 1/14/2019:
python run_calendar.py -d 2019 -sdt 1/1/2019 -edt 1/14/2019 > WebPage/website/scraped/cal01012019_01142019.json
python run_calendar.py -s "parking"
python run_calendar.py -d 2018 -s "parking"
python run_calendar.py -d 2018 -s "parking" > parking2018.csv
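The JSON files produced by the commands above (e.g. year2018.json) can be consumed by downstream code. A minimal sketch of filtering scraped records by topic, assuming each record carries `name`, `date`, and `topic` fields — the actual schema emitted by run_calendar.py may differ:

```python
import json

# Hypothetical sample mirroring an assumed output schema of run_calendar.py;
# the real field names may differ.
sample = """
[
  {"name": "City Council", "date": "2018-06-05", "topic": "parking"},
  {"name": "Public Works Committee", "date": "2018-06-12", "topic": "roads"}
]
"""

def filter_by_topic(records, keyword):
    """Return records whose topic contains the keyword (case-insensitive)."""
    return [r for r in records if keyword.lower() in r.get("topic", "").lower()]

records = json.loads(sample)
parking = filter_by_topic(records, "parking")
print([r["name"] for r in parking])  # prints ['City Council']
```

This is the same kind of filtering the `-s "parking"` flag performs at scrape time, applied after the fact to an already-saved JSON file.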
Run ScraperUpdate_AWS.sh on the Amazon server.
jupyter notebook calendar.ipynb
- PC access http://councilmatic.aws.openoakland.org/pc/
- Mobile access http://councilmatic.aws.openoakland.org/mobile/
- Provide a web scraping library.
- Scrape from https://oakland.legistar.com/Calendar.aspx.
- Scrape data from the city council table, the city council events table (aka city meetings, the calendar page), and the legislation page.
- Scrapers inherit from the Scraper class and use Selenium to navigate to the pages (which may require JavaScript) and access web content inside tables on the page.
- Store results in models (decoupled from the DB).
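The architecture above — a Scraper base class with page-specific subclasses, returning DB-decoupled models — can be sketched as follows. The class and field names here are illustrative assumptions, not the project's actual code, and the driver is duck-typed so the sketch runs without Selenium:

```python
from dataclasses import dataclass

# Hypothetical model, decoupled from any database layer; the field names
# are illustrative, not the project's actual schema.
@dataclass
class Event:
    name: str
    date: str
    url: str

class Scraper:
    """Base class: holds a Selenium-style driver and a target URL.
    Any object with .get() and .page_source works as the driver, so
    this sketch runs without a real browser."""
    url = None

    def __init__(self, driver):
        self.driver = driver

    def fetch(self):
        self.driver.get(self.url)  # a real browser would execute JavaScript here
        return self.parse(self.driver.page_source)

    def parse(self, html):
        raise NotImplementedError

class CalendarScraper(Scraper):
    """Illustrative subclass for the Legistar calendar page."""
    url = "https://oakland.legistar.com/Calendar.aspx"

    def parse(self, html):
        # A real implementation would walk the calendar table rows in `html`;
        # here we return a fixed Event to show the shape of the result.
        return [Event(name="City Council", date="2019-01-08", url=self.url)]
```

With a real `selenium.webdriver.Firefox()` instance (backed by geckodriver, per the setup steps above) passed as the driver, `fetch()` would load the live page.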
Next milestones: we might want to add some kind of user interface. (Note: the first milestone is almost done.)