Tags · hipagesgroup/data-ml-utils

v1.3.0

feat: [TMX-960] add databricks sql connector (#38)

* add libraries

* test if this is stuck on the mocking of databricks sql connect

* change file name to prevent confusion from pytest

* test for connecting sql client

* initialise the cursor first so that we can invoke the description

* update package

Jul 1, 2024
8c801af
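
The "initialise the cursor first" change above relies on a DB-API 2.0 detail: `cursor.description` only holds column metadata once the cursor exists and a query has run on it. A minimal sketch of that pattern follows. It uses the standard-library `sqlite3` module as a stand-in for the Databricks SQL connector, and the `fetch_with_columns` helper and `jobs` table are hypothetical, not taken from the package.

```python
import sqlite3


def fetch_with_columns(connection, query):
    """Run a query and return (column_names, rows).

    Hypothetical helper: sqlite3 stands in for the Databricks SQL
    connector, which exposes the same DB-API 2.0 cursor interface.
    """
    cursor = connection.cursor()  # create the cursor first
    cursor.execute(query)  # execute() populates cursor.description
    column_names = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    cursor.close()
    return column_names, rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE jobs (id INTEGER, name TEXT)")
connection.execute("INSERT INTO jobs VALUES (1, 'plumbing')")
columns, rows = fetch_with_columns(connection, "SELECT id, name FROM jobs")
connection.close()
```

Each entry in `cursor.description` is a sequence whose first item is the column name, which is why the helper reads `column[0]`.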

v0.3.8

chore: light refactoring of py scripts [DP-1454] (#20)

* chore: light refactoring of py scripts

* Update pyathena_utils.py

* update readme.md

* change tech docs url

* update toml and makefile

* Update mlflow_model_utils.py

* Update mlflow_model_utils.py

---------

Co-authored-by: Shu Ming Peh <>

Apr 3, 2023
959d6a2

v0.2.9

fix: Fix dependencies issue (#16)

* change dependencies in toml file

* change requirements.txt

* change version with 'bug' fix

* change pyarrow version

* static pyarrow version

Co-authored-by: Shu Ming Peh <>

Aug 26, 2022
d26ca0c

v0.2.8

fix: [DP-1721] bug fix polling of master DNS address (#15)

* changed `response_body_dict` to be within the polling function, so that the correct value can be referenced

* changed version number

* change on-demand limit to 0

Co-authored-by: Shu Ming Peh <>

Aug 9, 2022
3f18bd0
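
The v0.2.8 fix moves `response_body_dict` inside the polling function so each attempt reads a fresh response rather than one captured before polling began. A minimal sketch of that idea, under stated assumptions: `fake_describe_cluster` and `poll_for_master_dns` are hypothetical names, and the response shape mirrors the `Cluster.MasterPublicDnsName` field returned by the boto3 EMR `describe_cluster` call.

```python
def fake_describe_cluster(responses):
    """Hypothetical stand-in for an EMR describe-cluster call.

    Returns a zero-argument function that yields the next canned
    response each time it is called.
    """
    iterator = iter(responses)
    return lambda: next(iterator)


def poll_for_master_dns(describe_cluster, attempts):
    """Return the master DNS name once it appears, else None."""
    for _ in range(attempts):
        # Fetch the response inside the loop so every attempt sees
        # the current cluster state, not a stale earlier copy.
        response_body_dict = describe_cluster()
        dns_name = response_body_dict.get("Cluster", {}).get("MasterPublicDnsName")
        if dns_name:
            return dns_name
    return None


# The DNS name is missing on the first two calls and present on the third.
describe = fake_describe_cluster(
    [
        {"Cluster": {}},
        {"Cluster": {}},
        {"Cluster": {"MasterPublicDnsName": "master.example.internal"}},
    ]
)
master_dns = poll_for_master_dns(describe, attempts=5)
```

Had `response_body_dict` been assigned once before the loop, every attempt would inspect the first (empty) response and the function would never find the address.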

v0.2.7

feat: [DP-1721] EMR cost optimisation through config (#14)

* add autoscaling and termination configuration to emr cluster

* change EMR version to be referenced from function variable

* add emr_version as a function parameter

* change idle time out to be 3600

* add response for creating emr cluster

* update test cases for EMR, since moto no longer works for the `run_job_flow` function

* changed version to 0.2.7 in `pyproject.toml`

* changed run_job_flow expected parameters in `conftest.py`

* add autoscaling policy and changed idle termination to 20 minutes

* add autoscaling role to expected run_job_flow parameters

* include polling library to `requirements*.txt` and `pyproject.toml`

* change the master DNS address check to poll every 15 seconds for ~6 minutes

* update test case for spinning up EMR cluster with polling

Co-authored-by: Shu Ming Peh <>

Aug 7, 2022
523c92f
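
The polling change in this release (check every 15 seconds for roughly 6 minutes) can be sketched with the standard library alone. This is an illustration of the timing logic, not the package's code: the release adds the third-party `polling` library for this, whereas the `poll` function below is a hypothetical plain-Python equivalent with an injectable `sleep` so it can be exercised without waiting.

```python
import time


def poll(target, step=15, timeout=360, sleep=time.sleep):
    """Call `target` every `step` seconds until it returns a truthy value.

    Raises TimeoutError if the next wait would exceed `timeout` seconds.
    """
    waited = 0
    while True:
        result = target()
        if result:
            return result
        if waited + step > timeout:
            raise TimeoutError(f"no result after {waited} seconds")
        sleep(step)
        waited += step


# Demonstration with a fake sleep that records waits instead of blocking.
recorded_sleeps = []
answers = iter([None, None, None, "master.example.internal"])
found = poll(lambda: next(answers), sleep=recorded_sleeps.append)
```

With the defaults, an address that never appears is checked 25 times (at 0, 15, ..., 360 seconds) before `TimeoutError` is raised.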

v0.2.6.0

fix: patch fix emr cluster creation (#13)

* change package version in poetry

* change spot price duration, and the check of dns address after 8.5 minutes

* change package version to 0.2.6

* change sleep time to 10.5 minutes before checking the master dns address

Co-authored-by: Shu Ming Peh <>

Jul 22, 2022
16b8c92

v0.2.4

feat: fix datatype conversion (#11)

May 25, 2022
aa3b3fb

v0.2.3

feat: loosen dependency requirement (#9)

* change packaging version in requirements.txt

* change packaging version in requirements.txt

* feat: loosen dependencies requirements

May 17, 2022
5cb945e

v0.2.2

fix: fix EMR pytests (#8)

* add response of boto3 `get_cluster_id` and `describe_cluster`

* add pytest fixture

* stub boto3 functions that have malformed input

* change version of package

* change installation guide on readme

Co-authored-by: Shu Ming Peh <>

May 11, 2022
b28c575

v0.2.1

feat: Add boto3 client (#5)

* adding skeleton of pyathena client first

* delete `common_utils` and have `core` instead to store our configurations and util libraries

* change how settings are called

* change pyathena_client to `client.py`

* change pyathena functions to be just initialise and create_msck_repair_table, borrows from utils function

* add pyathena_utils to core

* add tests for pyathena client

* test out github actions

* add pyathena in requirements.txt

* test if this works; change to install using poetry

* does this work?

* add pyathena to poetry

* use poetry to run tests instead

* change to 50% coverage first

* add test for `pyathena_utils.py`, and change coverage to 90% again.

* add test_pyathena_utils to be omitted for testing

* change project directory structure

* add another function to query from pyathena

* add pandas to poetry

* change directory of test coverage, and change ci action to only pull request

* dummy push

* add drop table function + tests

* add ipynb tutorial of pyathena client

* adding sphinx documentation for pyathena_client

* adding pyathena client sphinx docs

* add pyathena_utils_api_specs

* edit omit coverage files

* include documentation for pyathena_utils

* change name of boto3

* skeleton of boto3, botocore functions

* change client_sagemaker functions to be more generic

* change sagemaker functions to be more generic

* update docstrings

* change emr class to be more generic

* test files for sagemaker client

* remove unwanted test

* remove unwanted function in test sagemaker

* convert to decorator calls

* remove more lines

* add back aws creds

* add function parameters typehints

* add test for client initialisation

* add test for client emr

* sphinx documentation for both sagemaker and emr

* add tutorial notebook for sagemaker

* tutorial notebook for boto3 emr client

* finishing up editing of sphinx documentation

* change to github logo to redirect to git repo

* add hipages logo

* manually removing `docs/_build`

* aligning tutorial folder to master

* commenting out the error tests for now

* fix pyathena test to run with boto3 tests

* change tutorial to use dummy model version group name

* requirements updated

* cleanup

* pass lint

* change the way we open sql files to `pathlib.read_text` instead of `open(...,'r')`

* ensure the opened tar.gz file is closed

Co-authored-by: Shu Ming Peh <>
Co-authored-by: Vincent Koc <koconder@users.noreply.github.com>

May 5, 2022
b712e9b
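
The last two bullets of this release describe file-handling changes: reading SQL files with `pathlib` instead of `open(..., 'r')`, and making sure an opened tar.gz file is closed. A minimal sketch of both patterns, assuming hypothetical file names (`query.sql`, `model.tar.gz`) created in a temporary directory purely for illustration:

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Path.read_text() opens, reads, and closes the file in one call,
    # so no file handle is left for the caller to manage.
    sql_file = tmp_path / "query.sql"
    sql_file.write_text("SELECT 1;\n")
    query = sql_file.read_text()

    # Build a small archive to read back.
    archive_path = tmp_path / "model.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(sql_file, arcname="query.sql")

    # The with-block closes the archive even if reading raises.
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getnames()
```

Using a `with` block is one way to guarantee the close; an explicit `try`/`finally` with `archive.close()` achieves the same result.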