hospital-ranking

Scrapes Google rating information (stars and review count) for Australian hospitals without using the Google Maps API.


Requirements

  • Python 3
  • Google Chrome
  • Chromedriver

Usage

Copy the source code

git clone https://github.com/minho42/hospital-ranking.git
cd hospital-ranking/

Optional: use a virtual environment

python -m venv venv
source venv/bin/activate

Install required packages

pip install -r requirements.txt

Download Chromedriver

Set the CHROME_DRIVER_PATH variable in app.py to the location of the downloaded Chromedriver

Download raw data ('Current Listing of Commonwealth declared hospitals')

Rename the downloaded file to all_hospitals.xlsx

Run the script (this takes a long time, more than 30 minutes)

python app.py

When the script finishes, it writes frontend/ranking.json, which the frontend app can use.


Variables in app.py

RAW_DATA_FILE

all_hospitals.xlsx

EXTRACTED_DATA_FILE

all_hospitals.json

[
  {
    "sector": "PUBLIC",
    "state": "NSW",
    "name": "CONCORD REPATRIATION HOSPITAL"
  }
]

RATING_FILE

rating.json

[
  {
    "sector": "PUBLIC",
    "state": "NSW",
    "name": "CONCORD REPATRIATION HOSPITAL",
    "stars": "2.9",
    "reviews": "288"
  }
]

RANKING_FILE

frontend/ranking.json

[
  {
    "sector": "PUBLIC",
    "state": "NSW",
    "name": "CONCORD REPATRIATION HOSPITAL",
    "stars": "2.9",
    "reviews": "288",
    "ranking": "2.9022899952150216"
  }
]
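Every value in ranking.json is stored as a string, so a consumer needs to convert ranking to a number before sorting; otherwise "10.0" would sort below "2.9". The following is a minimal sketch of reading the file, assuming the layout shown above. The names load_ranking and top_hospitals are hypothetical and are not part of app.py.

```python
import json


def load_ranking(path="frontend/ranking.json"):
    """Read the list of hospital entries from the ranking file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_hospitals(entries, n=10, state=None):
    """Return the n highest-ranked entries, optionally limited to one state."""
    if state is not None:
        entries = [e for e in entries if e["state"] == state]
    # "ranking" is a string in the file, so convert it before comparing.
    return sorted(entries, key=lambda e: float(e["ranking"]), reverse=True)[:n]
```

For example, top_hospitals(load_ranking(), n=5, state="NSW") would return the five highest-ranked NSW entries once the script has produced the file.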

Ranking Formula

weighted rating (WR) = (v / (v + m)) × R + (m / (v + m)) × C

R = average (mean) rating for the hospital = (stars)

v = number of reviews for the hospital = (reviews)

m = minimum reviews required to be listed (currently 1)

C = the mean rating across the whole dataset

Referenced from https://www.quora.com/How-does-IMDbs-rating-system-work
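The formula can be sketched in Python as follows. This is an illustration rather than the code in app.py: the function names weighted_rating and add_rankings are hypothetical, and computing C as the plain mean of the per-hospital stars values is an assumption (app.py may compute it differently, for example weighted by review count).

```python
def weighted_rating(R, v, m, C):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C"""
    return (v / (v + m)) * R + (m / (v + m)) * C


def add_rankings(ratings, m=1):
    """Return copies of the rating entries with a "ranking" field added."""
    # Assumption: C is the unweighted mean of each hospital's star value.
    C = sum(float(r["stars"]) for r in ratings) / len(ratings)
    ranked = []
    for r in ratings:
        # stars and reviews are strings in rating.json, so convert them first.
        wr = weighted_rating(float(r["stars"]), int(r["reviews"]), m, C)
        ranked.append({**r, "ranking": str(wr)})
    return ranked
```

With m = 1, a hospital with many reviews keeps a ranking close to its own stars value, while a hospital with only one or two reviews is pulled noticeably toward C.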