Tags: hipagesgroup/data-ml-utils
feat: [TMX-960] add databricks sql connector (#38)

* add libraries
* test if this is stuck on the mocking of databricks sql connect
* change file name to prevent confusion from pytest
* test for connecting sql client
* initialise the cursor first so that we can invoke the description
* update package

chore: light refactoring of py scripts [DP-1454] (#20)

* chore: light refactoring of py scripts
* Update pyathena_utils.py
* update readme.md
* change tech docs url
* update toml and makefile
* Update mlflow_model_utils.py
* Update mlflow_model_utils.py

---------

Co-authored-by: Shu Ming Peh <>

feat: [DP-1721] EMR cost optimisation through config (#14)

* add autoscaling and termination configuration to emr cluster
* change EMR version to be referenced from function variable
* add emr_version as a function parameter
* change idle time out to be 3600
* add response for creating emr cluster
* update test cases for EMR, moto does not work anymore for `run_job_flows` function
* changed version to 0.2.7 in `pyproject.toml`
* changed run_job_flow expected parameters in `conftest.py`
* add autoscaling policy and changed idle termination to 20 minutes
* add autoscaling role to expected run_job_flow parameters
* include polling library to `requirements*.txt` and `pyproject.toml`
* changing checking the master dns address to polling every 15 seconds for ~6 minutes.
* update test case for spinning up EMR cluster with polling

Co-authored-by: Shu Ming Peh <>
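
The last bullets of #14 replace a fixed wait with polling for the master DNS address every 15 seconds for roughly 6 minutes. A minimal sketch of that pattern follows, using only the standard library rather than the `polling` package the change adds. `poll_until`, its parameters, and the fake DNS value are hypothetical names chosen for illustration, not the repository's code.

```python
import time
from typing import Callable, Optional


def poll_until(
    check: Callable[[], Optional[str]],
    step_seconds: float = 15.0,
    timeout_seconds: float = 360.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `check` every `step_seconds` until it returns a non-empty value.

    Raises TimeoutError if nothing is returned within `timeout_seconds`.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        value = check()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("value not available within the polling window")
        sleep(step_seconds)


# Hypothetical stand-in for a describe-cluster call: empty until the third attempt.
attempts = iter([None, None, "ip-10-0-0-1.ec2.internal"])
# The sleep is stubbed out so the example finishes immediately.
master_dns = poll_until(lambda: next(attempts), sleep=lambda _seconds: None)
```

Passing the sleep function as a parameter keeps the waiting logic testable without real delays, which may be useful given that the same change updates the EMR test cases.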

fix: patch fix emr cluster creation (#13)

* change package version in poetry
* change spot price duration, and the check of dns address after 8.5 minutes
* change python version to 0.2.6
* change sleep time to 10.5 minutes before checking the master dns address

Co-authored-by: Shu Ming Peh <>

fix: fix EMR pytests (#8)

* add response of boto3 `get_cluster_id` and `describe_cluster`
* add pytest fixture
* stub boto3 functions that have malformatted input
* change version of package
* change installation guide on readme

Co-authored-by: Shu Ming Peh <>

feat: Add boto3 client (#5)

* adding skeleton of pyathena client first
* delete `common_utils` and have `core` instead to store our configurations and util libraries
* change how settings are called
* change pyathena_client to `client.py`
* change pyathena functions to be just initialise and create_msck_repair_table, borrows from utils function
* add pyathena_utils to core
* add tests for pyathena client
* test out github actions
* add pyathena in requirements.txt
* test if this works; change to install using poetry
* does this work?
* add pyathena to poetry
* use poetry to run tests instead
* change to 50% coverage first
* add test for `pyathena_utils.py`, and change coverage to 90% again.
* add test_pyathena_utils to be omitted for testing
* change project directory structure
* add another function to query from pyathena
* add pandas to poetry
* change directory of test coverage, and change ci action to only pull request
* dummy push
* add drop table function + tests
* add ipynb tutorial of pyathena client
* adding sphinx documentation for pyathena_client
* adding pyathena client sphinx docs
* add pyathena_utils_api_specs
* edit omit coverage files
* include documentation for pyathena_utils
* change name of boto3
* skeleton of boto3, botocore functions
* change client_sagemaker functions to be more generic
* change sagemaker functions to be more generic
* update docstrings
* change emr class to be more generic
* test files for sagemaker client
* remove unwanted test
* remove unwanted function in test sagemaker
* convert to decorator calls
* remove more lines
* add back aws creds
* add function parameters typehints
* add test for client initialisation
* add test for client emr
* sphinx documentation for both sagemaker and emr
* add tutorial notebook for sagemaker
* tutorial notebook for boto3 emr client
* finishing up editing of sphinx documentation
* change to github logo to redirect to git repo
* add hipages logo
* manually removing `docs/_build`
* aligning tutorial folder to master
* commenting out the error tests for now
* fix pyathena test to run with boto3 tests
* change tutorial to use dummy model version group name
* requirements updated
* cleanup
* pass lint
* change the way we open sql files to `pathlib.read_text` instead of `open(...,'r')`
* ensure the opened tar.gz file is closed

Co-authored-by: Shu Ming Peh <>
Co-authored-by: Vincent Koc <koconder@users.noreply.github.com>
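
The final two bullets of #5 describe reading SQL files through `pathlib` instead of `open(...,'r')` and making sure an opened tar.gz file is closed. A minimal sketch of both habits is shown here. The function names `read_sql` and `list_archive_members` are hypothetical and chosen for illustration; they are not taken from the repository.

```python
import tarfile
from pathlib import Path
from typing import List


def read_sql(path: Path) -> str:
    """Return the contents of a SQL file.

    Path.read_text opens, reads, and closes the file in one call,
    so no file handle is left open.
    """
    return path.read_text(encoding="utf-8")


def list_archive_members(archive: Path) -> List[str]:
    """Return the member names of a .tar.gz archive.

    The `with` block closes the archive even if reading raises.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        return tar.getnames()
```

Both helpers avoid an explicit `close()` call, which is the usual way a file handle gets leaked when an exception occurs between opening and closing.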