Tags: hipagesgroup/data-ml-utils

v1.3.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
feat: [TMX-960] add databricks sql connector (#38)

* add libraries

* test if this is stuck on the mocking of databricks sql connect

* change file name to prevent confusion from pytest

* test for connecting sql client

* initialise the cursor first so that we can invoke the description

* update package
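The last fix above ("initialise the cursor first so that we can invoke the description") follows from the DB-API 2.0 (PEP 249) interface, which the Databricks SQL Connector for Python implements: `description` is an attribute of a cursor and is only populated once `execute()` has run on it. The sketch below assumes a PEP 249 connection and uses the stdlib `sqlite3` module as a stand-in, since no live Databricks SQL warehouse is assumed. `fetch_as_records` is a hypothetical helper, not part of this package.

```python
import sqlite3
from typing import Any, Dict, List


def fetch_as_records(connection: Any, query: str) -> List[Dict[str, Any]]:
    """Run `query` and return each row as a dict keyed by column name.

    Hypothetical helper; works with any PEP 249 connection. sqlite3 is
    only a stand-in for a Databricks SQL connection here.
    """
    # Create the cursor first: `description` lives on the cursor object
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # After execute(), description has one 7-item sequence per column;
        # item 0 is the column name
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO jobs VALUES (?, ?)", [(1, "plumber"), (2, "electrician")])
    print(fetch_as_records(conn, "SELECT id, name FROM jobs ORDER BY id"))
    conn.close()
```

Against Databricks, the connection would typically come from `databricks.sql.connect(server_hostname=..., http_path=..., access_token=...)`; the cursor handling is the same because both drivers follow PEP 249.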

v0.3.8

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
chore: light refactoring of py scripts [DP-1454] (#20)

* chore: light refactoring of py scripts

* Update pyathena_utils.py

* update readme.md

* change tech docs url

* update toml and makefile

* Update mlflow_model_utils.py

* Update mlflow_model_utils.py

---------

Co-authored-by: Shu Ming Peh <>

v0.2.9

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix: Fix dependencies issue (#16)

* change dependencies in toml file

* change requirements.txt

* change version with 'bug' fix

* change pyarrow version

* static pyarrow version

Co-authored-by: Shu Ming Peh <>

v0.2.8

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix: [DP-1721] bug fix polling of master DNS address (#15)

* changed `response_body_dict` to be within the polling function, so that the correct value can be referenced

* changed version number

* change on-demand limit to 0

Co-authored-by: Shu Ming Peh <>
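The v0.2.8 fix moves `response_body_dict` inside the polling function so that every attempt reads a fresh `describe_cluster` response instead of a value captured once before polling started. A minimal sketch of that pattern, written with a plain stdlib loop rather than the `polling` library the package adopted in v0.2.7. `wait_for_master_dns` is a hypothetical name; the defaults mirror the "every 15 seconds for ~6 minutes" schedule from v0.2.7, and the response shape `{"Cluster": {"MasterPublicDnsName": ...}}` is that of boto3's EMR `describe_cluster`.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_master_dns(
    describe_cluster: Callable[[], Dict[str, Any]],
    step_seconds: float = 15.0,
    timeout_seconds: float = 360.0,
) -> str:
    """Poll until the EMR master node reports a public DNS name.

    `describe_cluster` is a zero-argument callable, for example
    ``lambda: emr_client.describe_cluster(ClusterId=cluster_id)`` (assumption).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Fetch a fresh response on every attempt; a dict read once outside
        # the loop would keep returning the stale first value
        response_body_dict = describe_cluster()
        dns_name: Optional[str] = response_body_dict["Cluster"].get("MasterPublicDnsName")
        if dns_name:
            return dns_name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"master DNS address not available after {timeout_seconds}s")
        time.sleep(step_seconds)
```

Passing a callable rather than a response dict is what guarantees the lookup is repeated on each poll.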

v0.2.7

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
feat: [DP-1721] EMR cost optimisation through config (#14)

* add autoscaling and termination configuration to emr cluster

* change EMR version to be referenced from function variable

* add emr_version as a function parameter

* change idle time out to be 3600

* add response for creating emr cluster

* update EMR test cases, since moto no longer works for the `run_job_flow` function

* changed version to 0.2.7 in `pyproject.toml`

* changed run_job_flow expected parameters in `conftest.py`

* add autoscaling policy and changed idle termination to 20 minutes

* add autoscaling role to expected run_job_flow parameters

* include polling library to `requirements*.txt` and `pyproject.toml`

* change the master DNS address check to poll every 15 seconds for ~6 minutes

* update test case for spinning up EMR cluster with polling

Co-authored-by: Shu Ming Peh <>
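The v0.2.7 changes configure idle termination and autoscaling through the `run_job_flow` call, with the EMR version passed in as a function parameter. The sketch below builds the keyword arguments for boto3's `emr_client.run_job_flow(**kwargs)` under several assumptions: the commits do not say whether EMR managed scaling or a custom automatic scaling policy is used, so managed scaling (`ManagedScalingPolicy`) is assumed here; instance types, counts and role names are illustrative; and mapping the v0.2.8 "on-demand limit" to `MaximumOnDemandCapacityUnits` is a guess. `build_run_job_flow_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_run_job_flow_kwargs(
    cluster_name: str,
    emr_version: str,
    idle_timeout_seconds: int = 20 * 60,  # "idle termination to 20 minutes"
    min_units: int = 1,
    max_units: int = 4,
    max_on_demand_units: Optional[int] = None,
    instance_type: str = "m5.xlarge",
    instance_count: int = 2,
) -> Dict[str, Any]:
    """Keyword arguments for ``emr_client.run_job_flow(**kwargs)``.

    Hypothetical helper; all defaults are illustrative assumptions.
    """
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")
    compute_limits: Dict[str, Any] = {
        "UnitType": "Instances",
        "MinimumCapacityUnits": min_units,
        "MaximumCapacityUnits": max_units,
    }
    if max_on_demand_units is not None:
        # Upper bound on On-Demand capacity under managed scaling
        compute_limits["MaximumOnDemandCapacityUnits"] = max_on_demand_units
    return {
        "Name": cluster_name,
        # EMR release supplied by the caller, e.g. "emr-6.9.0"
        "ReleaseLabel": emr_version,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Terminate the cluster after this many idle seconds
        "AutoTerminationPolicy": {"IdleTimeout": idle_timeout_seconds},
        "ManagedScalingPolicy": {"ComputeLimits": compute_limits},
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Keeping the arguments in a plain dict also makes them easy to assert on in tests, which matches the v0.2.7 move to checking expected `run_job_flow` parameters in `conftest.py` instead of relying on moto.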

v0.2.6.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix: patch fix emr cluster creation (#13)

* change package version in poetry

* change spot price duration, and check the DNS address after 8.5 minutes

* change package version to 0.2.6

* change sleep time to 10.5 minutes before checking the master dns address

Co-authored-by: Shu Ming Peh <>

v0.2.4

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
feat: fix datatype conversion (#11)

v0.2.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
feat: loosen dependency requirement (#9)

* change packaging version in requirements.txt

* change packaging version in requirements.txt

* feat: loosen dependencies requirements

v0.2.2

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix: fix EMR pytests (#8)

* add response of boto3 `get_cluster_id` and `describe_cluster`

* add pytest fixture

* stub boto3 functions that have malformed input

* change version of package

* change installation guide on readme

Co-authored-by: Shu Ming Peh <>
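The v0.2.2 fix stubs boto3 calls so the EMR tests do not depend on AWS. A minimal sketch of that idea using the stdlib `unittest.mock` in place of a real `boto3.client("emr")`; this is an assumed approach, not necessarily the one the package uses. `get_master_dns` is a hypothetical function under test, and the response dict mimics the shape of EMR `describe_cluster`.

```python
from typing import Any
from unittest.mock import MagicMock


def get_master_dns(emr_client: Any, cluster_id: str) -> str:
    """Hypothetical helper: read the master DNS name from describe_cluster."""
    response = emr_client.describe_cluster(ClusterId=cluster_id)
    return response["Cluster"]["MasterPublicDnsName"]


def test_get_master_dns_with_stubbed_client() -> None:
    # Stand-in for boto3.client("emr"): no network call or credentials needed
    emr_client = MagicMock()
    emr_client.describe_cluster.return_value = {
        "Cluster": {"Id": "j-EXAMPLE", "MasterPublicDnsName": "ec2-example.compute.amazonaws.com"}
    }
    assert get_master_dns(emr_client, "j-EXAMPLE") == "ec2-example.compute.amazonaws.com"
    # Check the stub received the expected request parameters
    emr_client.describe_cluster.assert_called_once_with(ClusterId="j-EXAMPLE")
```

A `MagicMock` accepts any arguments, so malformed input would pass silently. botocore's `Stubber` (`botocore.stub.Stubber`) is stricter: it validates both the request parameters and the stubbed response against the service model.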

v0.2.1

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
feat: Add boto3 client (#5)

* adding skeleton of pyathena client first

* delete `common_utils` and have `core` instead to store our configurations and util libraries

* change how settings are called

* change pyathena_client to `client.py`

* reduce pyathena functions to just `initialise` and `create_msck_repair_table`, which borrow from the utils functions

* add pyathena_utils to core

* add tests for pyathena client

* test out github actions

* add pyathena in requirements.txt

* test if this works; change to install using poetry

* does this work?

* add pyathena to poetry

* use poetry to run tests instead

* change to 50% coverage first

* add test for `pyathena_utils.py`, and change coverage to 90% again.

* add test_pyathena_utils to the files omitted from test coverage

* change project directory structure

* add another function to query from pyathena

* add pandas to poetry

* change directory of test coverage, and change ci action to only pull request

* dummy push

* add drop table function + tests

* add ipynb tutorial of pyathena client

* adding sphinx documentation for pyathena_client

* adding pyathena client sphinx docs

* add pyathena_utils_api_specs

* edit omit coverage files

* include documentation for pyathena_utils

* change name of boto3

* skeleton of boto3, botocore functions

* change client_sagemaker functions to be more generic

* change sagemaker functions to be more generic

* update docstrings

* change emr class to be more generic

* test files for sagemaker client

* remove unwanted test

* remove unwanted function in test sagemaker

* convert to decorator calls

* remove more lines

* add back aws creds

* add function parameters typehints

* add test for client initialisation

* add test for client emr

* sphinx documentation for both sagemaker and emr

* add tutorial notebook for sagemaker

* tutorial notebook for boto3 emr client

* finishing up editing of sphinx documentation

* change the GitHub logo to redirect to the git repo

* add hipages logo

* manually removing `docs/_build`

* aligning tutorial folder to master

* commenting out the error tests for now

* fix pyathena test to run with boto3 tests

* change tutorial to use dummy model version group name

* requirements updated

* cleanup

* pass lint

* change the way we open sql files to `pathlib.Path.read_text` instead of `open(...,'r')`

* ensure the opened tar.gz file is closed

Co-authored-by: Shu Ming Peh <>
Co-authored-by: Vincent Koc <koconder@users.noreply.github.com>
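The last two v0.2.1 commits switch SQL file reads to `Path.read_text` and make sure the opened tar.gz file is closed. Both avoid leaking file handles: `read_text` opens and closes the file internally, and `tarfile.open` used as a context manager closes the archive even if extraction fails. A minimal sketch, assuming `{placeholder}`-style SQL templates; `load_sql` and `extract_model_archive` are hypothetical helpers, and the `MSCK REPAIR TABLE` template in the demo is illustrative only.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def load_sql(sql_path: Path, **params: str) -> str:
    """Read a SQL template and fill `{placeholders}` (hypothetical helper).

    Assumes the template has no literal braces, since str.format is used.
    """
    # Path.read_text opens and closes the file itself
    return sql_path.read_text(encoding="utf-8").format(**params)


def extract_model_archive(archive_path: Path, target_dir: Path) -> List[str]:
    """Extract a .tar.gz archive and return its member names.

    Only use on trusted archives: member paths are not sanitised here.
    """
    # The with-block closes the archive even if extraction raises
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target_dir)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sql_file = Path(tmp) / "repair.sql"
        sql_file.write_text("MSCK REPAIR TABLE {database}.{table}", encoding="utf-8")
        print(load_sql(sql_file, database="analytics", table="events"))
```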