Release/0.11.0 #141

qbphilip · 2021-11-10T13:42:43Z

Motivation and Context

Why was this PR created?

Release new version.
Changelog:

Add expectation-maximisation (EM) algorithm to learn with latent variables
Add a new tutorial on adding a latent variable and identifying its candidate location
Allow users to provide self-defined CPDs, as per "Possibility to manually define the CPTs" #18 and "Assign known cpds to Bayesian Network" #99
Generalise the utility function to get the Markov blanket and incorporate it within StructureModel (cf. "Error about separate graphs when learning BN" #136)
Add a link to PyGraphviz installation guide under the installation prerequisites
Add GPU support to the PyTorch implementation, as requested in "About from_pandas" #56 and "Improve computational efficiency or speed up training process with GPU is applicable" #114 (some issues remain)
Add an example of structure model exporting to the first CausalNex tutorial, as per "Is there a feature that allows exporting the CausalNex generated graphs to DAGitty?" #124 and "Exporting graph structure" #129
Fix infinite loop when querying InferenceEngine after a do-intervention that splits the graph into two or more subgraphs, as per "do_intervention never ends running despite simple query" #45 and "Issues with isolates in do-calculus" #100
Fix decision tree and MDLP discretisation bug when input data is shuffled
Fix broken URLs in FAQ documentation, as per "404 in Documentation links" #113 and "Broken link in documentation" #125
Fix integer index type checking for time series data, as per "I have a question about Dynotears" #74 and "[DYNOTEARS] TypeError: Index must be integers" #86
Fix bug where inputs to the DAGRegressor/Classifier yielded different predictions between float and int dtypes, as per "DAGRegressor prediction data type changes results" #140

How has this been tested?

What testing strategies have you used?

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change and added my name to the list of supporting contributions in the RELEASE.md file
Added tests to cover my changes
Assigned myself to the PR

Notice

I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

…169) * Fix broken URL to RELEASE.md in the FAQ section as mentioned in #113 and #115 * Add a link to PyGraphviz installation guide under the installation prerequisites section in the CausalNex documentation. * Remove -j auto argument in sphinx-build to make sure that it works in MacOS.

* trying to reporduce the hanging error * first iteration to handle splitted graph by do-intervention. tests needed * reverted to develop as only commented * added functions docstrings and typing * first attempt at tests * fixing tests * flake * Speed up _create_node_functions by taking the first element using next() * Use next(iter(x)) to get the first element * first iterations to address PR comments and discussion: adds default marginals and returns upstream marginal from default ones rather than nans * removing nan import * setting default marginal with query() * lint changes * removed jupyter notebook file from git * lint changes * latest modifications * added my info and fix updates * fisrt attempt pr comment to avoid duplicate call to obtain parents of node * PR comment: avoide duplicate call to get node parents * fixing lint * Refactor _remove_disconnected_nodes() and tidy up codes and docstrings * Add edge one by one (instead of constructing edge list) to make graph construction faster * Linting * last PR comments * Shift add_node() inside the loop for _remove_disconnected_node Co-authored-by: oentaryorj <oentaryorj@gmail.com>
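For readers unfamiliar with the situation these commits handle: a do-intervention on a node cuts the edges pointing into it, which can leave the graph in several disconnected pieces. The sketch below only illustrates that effect on a plain edge list. It is not CausalNex code, and `do_cut` and `weak_components` are hypothetical helper names chosen for this example.

```python
from collections import defaultdict


def do_cut(edges, node):
    """Return the edge list after do(node): edges pointing into `node` are dropped."""
    return [(u, v) for u, v in edges if v != node]


def weak_components(nodes, edges):
    """Group nodes into weakly connected components (edge direction is ignored)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk over the undirected neighbourhood of `start`.
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        components.append(component)
    return components


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]

before = weak_components(nodes, edges)
after = weak_components(nodes, do_cut(edges, "c"))
print(len(before), len(after))
```

In this toy chain, intervening on `c` removes the edge from `b`, so the nodes upstream of `c` no longer share a component with it. Any inference routine that assumes one connected graph has to treat each piece separately.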

* NoTears as ScoreSolver * refactor continuous solver * adding attribute to access weight matrix * refactoring continuous solver * Adding fit_lasso method * add data_gen_continuous.py and tests (#38) * add data_gen.py * rename * wrap SM * move data_gen_continous, create test * more coverage * test fixes * move discrete sem to another file * node list dupe check test * ValueError tests * replace dag and sem functions with Ben's verions * add Ben's tests * fix fstring * to_numpy_array coverage * Ben's comments * remove unreachable ValueError for coverage * remove unused fixture * remove redundant test * remove extensions Co-Authored-By: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> * docstring Co-Authored-By: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> * docstring Co-Authored-By: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> * docs Co-Authored-By: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> * doc Co-Authored-By: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> * rename file, g_dag rename to sm * add new tests for equal weights * docstring * steve docstring, leq fix * steve comments + docstrings Co-authored-by: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> * Adding check input and removing some inner functions * Removing attribute original_ndarray * Aligning from pandas with new implementation * Adding tests for fit_lasso * More tests for lasso * wrapping tabu params in a dict * Aligning tests with new tabu params * Aligning from_pandas with new tabu_params * Adding fit_intercept option to _fit method * Adding scaling option * fixing lasso tests * Adding a test for fit_intercept * scaling option only with mean * Correction in lasso bounds * Fix typos * Remove duplicated bounds function * adding comments * add torch files from xunzheng * add from_numpy_torch function that works like from_numpy_lasso * lint * add requirements * add debug functionality * add visual debug test * add license * allow running as main for viz, comments * move to contrib * make multi 
layer work a bit better * add comment for multi layer * use polynomial dag constraint for better speed comparison * revert unnecessary changes to keep PR lean * revert unnecessary changes to keep PR lean * revert unnecessary changes to keep PR lean * fixes * refactor * Integrated tests * Checkpoint * Refactoring * Finished initial refactoring * All tests passed * Cleaning * Git add testing * Get adjacency matrix * Done cleaning * Revert change to original notears * Revert change to original structuremodel * Revert change to pylintrc * Undo deletion * Apply suggestions from Zain Co-authored-by: Zain Patel <zain.patel@quantumblack.com> * Addressed Zain comments * Migrated from_numpy * Delete contrib test * Migrated w_threshold * Some linting * Change to None * Undo deletion * List comprehension * Refactoring scipy and remove scipy optimiser * Refactoring * Refactoring * Refactoring complete * change from np to torch tensor * More refactoring * Remove hnew equal to None * Refactor again and remove commented line * Minor change * change to params * Addressing Philip's comment * Add property * Add fc2 property weights * Change to weights * Docstring * Linting * Linting completed * Add gpu code * Add gpu to from_numpy and from_pandas * cuda 0 run out of memory * Debugging * put 5 * debugging gpu * shift to inner loop * debugging not in place * Use cada instead of to * Support both interfaces * Benchmarking gpu * Minor fix * correct import path for test * change gpu from 5 to 1 * Debugging * Debugging * Experimenting * Linting * Remove hidden layer and gpu * Linting * Testing and linting * Correct pytorch to torch * Add init zeros * Change weight threshold to 0.25 * Revert requirements.txt * Update release.md * Address coments * Corrected release.md * fc1 to adjacency * Fix linting issues * Add Cython into test_requirements.txt * Update cython version in test_requirements.txt * Pytorch NOTEARS extension - GPU (#64) * Add gpu * Fix CUDA checking * Shift use_gpu argument 
into constructor and update test_pytorch_notears.py * Send DAG layer to device and fix linting issues * Tidy up * Fix linting Co-authored-by: oentaryorj <oentaryorj@gmail.com> * Remove pytorch from extras_require in setup.py as this is already included in requirements.txt * Update +mdlp-discretization version in test_requirements.txt * Add logging.info to indicate pytorch backend * Linting * User {} in place of dict() to meet the linting requirements * Use {} in place of dict() to meet the linting requirements Co-authored-by: Ben Horsburgh <Ben.Horsburgh@quantumblack.com> Co-authored-by: LiseDiagneQB <60981366+LiseDiagneQB@users.noreply.github.com> Co-authored-by: Casey Juanxi Li <50737712+caseyliqb@users.noreply.github.com> Co-authored-by: qbphilip <philip.pilgerstorfer@quantumblack.com> Co-authored-by: Steve Ler <Steve.Ler@quantumblack.com> Co-authored-by: stevelersl <55385183+SteveLerQB@users.noreply.github.com> Co-authored-by: Zain Patel <zain.patel@quantumblack.com>

* Speed up query by caching baseline marginals * Update type hint in inference.py * Refactor query() and _single_query(); handle invalid observations type * Fix type hint for observations argument in query() and _single_query()

* Add Expectation Maximisation (EM) algorithm implementation * Massively scale up E-step by converting multiindexed pandas dataframe into dict of dict * Add a tutorial on latent variable model based on EM algorithm * Add test cases for the EM algorithm and ensure 100% test coverage * Update sklearn tutorials * Add note on EM algorithm into RELEASE.md * Reorganise .dot and .jpg files under docs/source/03_tutorial/supporting_files folder * Fix link in plotting tutorial * Exclude docs/source/03_tutorial from end-of-file check * Update doc_requirements.txt * Add DOT to JPG conversion steps into build-docs.sh
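As background for the EM commits above, here is a minimal, generic sketch of expectation-maximisation with one latent variable: a 50/50 mixture of two biased coins, where the coin used in each trial is unobserved. It is an assumption-laden teaching example, not the CausalNex implementation; the function name `em_two_coins`, the fixed mixing weights and the toy data are all chosen for illustration.

```python
import math


def em_two_coins(heads, flips, theta_a=0.6, theta_b=0.5, n_iter=20):
    """EM for an equal-weight mixture of two biased coins (coin identity is latent).

    `heads` holds the number of heads per trial, each trial has `flips` tosses.
    Returns the two estimated head probabilities and the log-likelihood trace.
    """
    log_likelihoods = []
    for _ in range(n_iter):
        heads_a = tails_a = heads_b = tails_b = 0.0
        log_lik = 0.0
        for h in heads:
            t = flips - h
            # Likelihood of the trial under each coin. The binomial coefficient
            # is omitted: it is the same for both coins and cancels below.
            like_a = theta_a ** h * (1.0 - theta_a) ** t
            like_b = theta_b ** h * (1.0 - theta_b) ** t
            # E-step: posterior probability that coin A produced this trial.
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            log_lik += math.log(0.5 * like_a + 0.5 * like_b)
            # Accumulate expected head/tail counts per coin.
            heads_a += resp_a * h
            tails_a += resp_a * t
            heads_b += resp_b * h
            tails_b += resp_b * t
        log_likelihoods.append(log_lik)
        # M-step: re-estimate each coin from its expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b, log_likelihoods


theta_a, theta_b, trace = em_two_coins(heads=[5, 9, 8, 4, 7], flips=10)
print(round(theta_a, 2), round(theta_b, 2))
```

The log-likelihood trace should never decrease from one iteration to the next, which is a convenient sanity check when testing any EM implementation.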

* Developed add_cpd function for bn * Change docstring * Add more tests * Fix python 3.5 * Change into pandas dataframe * Modify test * Update causalnex/network/network.py Co-Authored-By: Zain Patel <52913697+ZainPatelQB@users.noreply.github.com> * Remove list * Update version to 0.11.0 * Tweat RELEASE.md * Exclude docs/source/03_tutorial from large files checking in .pre-commit-config.yaml * Merge latest codes from develop branch and resolve conflicts * Revert test_network_model.py * Revert 04_sklearn_tutorial.ipynb Co-authored-by: Zain Patel <52913697+ZainPatelQB@users.noreply.github.com> Co-authored-by: oentaryorj <oentaryorj@gmail.com>

* fix: doc_requirements.txt to reduce vulnerabilities The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-SPHINX-570772 - https://snyk.io/vuln/SNYK-PYTHON-SPHINX-570773 * Update sphinx requirements.txt * Adjust conf.py and breadcrumbs.html accordingly * Remove setting _dummy * Restore _dummy Co-authored-by: snyk-bot <snyk-bot@snyk.io> Co-authored-by: oentaryorj <oentaryorj@gmail.com> Co-authored-by: Richard Oentaryo <oentaryorj@users.noreply.github.com>

…ial (#187) * Consolidate all utility functions in tutorial_utils.py * Add location identification strategy into latent variable tutorial * Tweak markdown in latent variabl tutorial * Fix linting issues * Remove student-por.csv * Rename n_splits to n_cv_folds for clarity * Add copyright notice into tutorial_utils.py * Add consultation with domain experts to description of location identification strategy * Add type hints to _compute_auc_stub and _compute_auc_lv_stub * Fix get_markov_blanket in predict_using_all_nodes and add docstring to private helper functions * Simplify ValueError message in network.py * Fix get_markov_blanket() * Simplify get_markov_blanket() * Fix return typehint * Rename n_cv_folds to n_splits * Add release note for get_markov_blanket() * Revert release notes * Add a note on LV tutorial in RELEASE.md * Update RELEASE.md

* Consolidate all utility functions in tutorial_utils.py * Add location identification strategy into latent variable tutorial * Tweak markdown in latent variabl tutorial * Fix linting issues * Remove student-por.csv * Rename n_splits to n_cv_folds for clarity * Add copyright notice into tutorial_utils.py * Add consultation with domain experts to description of location identification strategy * Add type hints to _compute_auc_stub and _compute_auc_lv_stub * Fix get_markov_blanket in predict_using_all_nodes and add docstring to private helper functions * Simplify ValueError message in network.py * Fix get_markov_blanket() * Simplify get_markov_blanket() * Fix return typehint * Rename n_cv_folds to n_splits * Add a note on LV tutorial in RELEASE.md * Move baseline marginal initialisation into do_intervention * Restore LV tutorial from develop branch

* Update .pre-commit-config.yaml * Update config.yml * Update PULL_REQUEST_TEMPLATE.md * Update CONTRIBUTING.md * Update network.py * Update plots.py * Update test_categorical_variable_mapper.py * Update config.yml * Update .pre-commit-config.yaml * Move notice to the bottom of pull request template * Update RELEASE.md * Remove unnecessary f-string * Simplify ValueError message in network.py * Simplify ValueError message in network.py * Linting network.py * Update RELEASE.md Co-authored-by: Richard Oentaryo <oentaryorj@users.noreply.github.com>

#188) * Consolidate all utility functions in tutorial_utils.py * Add location identification strategy into latent variable tutorial * Tweak markdown in latent variabl tutorial * Fix linting issues * Remove student-por.csv * Rename n_splits to n_cv_folds for clarity * Add copyright notice into tutorial_utils.py * Add consultation with domain experts to description of location identification strategy * Add type hints to _compute_auc_stub and _compute_auc_lv_stub * Fix get_markov_blanket in predict_using_all_nodes and add docstring to private helper functions * Simplify ValueError message in network.py * Resolve linting issues * Add raises into docstring of get_markov_blanket * Update tutorial_utils.py to use StructureModel.get_markov_blanket instead * Generalise get_markov_blanket to weighted graph * Generalise get_markov_blanket to weighted graph * Add unit tests for StructureModel.get_markov_blanket() * Rename n_cv_folds to n_splits * Add release note for get_markov_blanket() * Delete get_markov_blanket in network_utils.py and update RELEASE.md accordingly * Remove bn_train_model fixture * Revert docstrings in structure/pytorch/dist_type * Revert docstrings of test scripts * Revert docstrings in the remaining test scripts * Update RELEASE.md * Update tuple type hint * Update tuple typehint
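For context on the `get_markov_blanket` work listed above: the Markov blanket of a node in a DAG consists of its parents, its children, and the other parents of its children. The sketch below computes that set from a plain, unweighted edge list. It is a hypothetical standalone helper, not the `StructureModel.get_markov_blanket` method itself, whose signature and return type are not shown in this PR.

```python
def markov_blanket(edges, node):
    """Return parents, children and children's other parents of `node` in a DAG.

    `edges` is an iterable of (source, target) pairs.
    """
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # "Spouses": nodes that share at least one child with `node`.
    spouses = {u for u, v in edges if v in children and u != node}
    return parents | children | spouses


edges = [("a", "c"), ("b", "c"), ("c", "d"), ("e", "d"), ("d", "f")]
blanket = markov_blanket(edges, "c")
print(sorted(blanket))
```

For a weighted graph the same logic applies, as edge weights do not change which nodes belong to the blanket.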

qbphilip and others added 30 commits May 11, 2021 19:43

Merge pull request #112 from quantumblacklabs/master

694c3cc

Merge of Master into dev after v0.10.0

fix docs (#164)

54fed40

Co-authored-by: philip_pilgerstorfer <philip.pilgerstorfer!@quantumblack.com>

fix discretisations bug when input index is shuffled

e04783b

bump dependencies

037afb6

Co-authored-by: philip_pilgerstorfer <philip.pilgerstorfer!@quantumblack.com>

Fix unit test and linting issues in CircleCI pipeline (#166)

e9a4db6

* Fix linting issues * Remove str from @pytest.mark.parametrize in TestDAGClassifier due to conflict when checking fro non-numeric columns

Fix broken links in documentation (#165)

372a390

* Fix broken URLs for release notes and user guide * Add name into contributors list * Add a release note for documentation fix

Merge branch 'develop' of github.com:quantumblacklabs/private-causaln…

add2d19

…ex into develop

Fix typos in first tutorial, distribution schema and plotting tutoria…

a1866ea

…ls (#167)

More robust integer index type checking for time series data (#171)

3efb04a

* Check index type using is_integer() * Update RELEASE.md
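The commit message does not say which `is_integer()` is meant, so the following is only a sketch of the general idea, using the standard library: accept any integer-like index value rather than comparing against the exact `int` type. The helper name `is_integer_index` is hypothetical.

```python
import numbers


def is_integer_index(index):
    """Return True if every entry of `index` is integer-like.

    `numbers.Integral` also matches integer types that register themselves
    with it, which an exact `type(x) is int` comparison would reject.
    Booleans are excluded explicitly because `bool` is a subclass of `int`.
    """
    return all(
        isinstance(i, numbers.Integral) and not isinstance(i, bool)
        for i in index
    )


print(is_integer_index([0, 1, 2]), is_integer_index([0, 1.0, 2]))
```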

Use f-string in place of .format() for standardisation of string form…

8450d7e

…atting (#179) * Convert .format() to f-string * Remove redundant str() and create a variable for ', '.join()

Rewrite summary of steps table in markdown syntax; Add \begin{align} …

d6b3126

…\end{align} into EM formula (#178)

Add prefer='threads' to parallel EM (#181)

e7ec39d

Add graph exporting example into first causalnex tutorial (#180)

5b6a323

* Add example for structure model exporting in first causalnex tutorial * Add a release note for the exporting example

Typo in the first tutorial (#131)

f7e8492

Co-authored-by: Richard Oentaryo <oentaryorj@users.noreply.github.com>

Fix LaTeX formula rendering in latent variable tutorial (#183)

6df60ae

* Fix broken tutorial link in README.md * Replace latex math mode $$ with $ in latent variable tutorial * Revert README.md

Fix broken tutorial link in README.md (#182)

821526b

Fix wrong image references inside table in the latent variable tutori…

a1967dc

…al (#184) * Fix broken tutorial link in README.md * Remove tables for images, as this may cause incorrect image referencing in the final readthedocs

Improve legibility of the tutorials (#132)

08649f0

Thank you for the contribution!

Remove duplicate torch dependency in test_requirements.txt (#186)

ccad902

Proofread and fix latent variable tutorial end-to-end (#185)

fe4d540

* Proofread latent variable tutorial end to end

Add CITATION.cff to repository (#189)

0e754ef

* Add CITATION.cff into repository * Add missing contributors to RELEASE.md * Fix linting issues * Update name list in CITATION.cff * Add citation instruction into README.md

Tidy up docstrings (#191)

660f194

* Tidy up docstrings * Fix linting * Empty commit * Update causalnex/network/network.py * Update causalnex/network/network.py Co-authored-by: Zain Patel <zain.patel@quantumblack.com>

oentaryorj and others added 7 commits October 15, 2021 18:39

Merge branch 'develop' of github.com:quantumblacklabs/private-causaln…

49d94c4

…ex into develop

fix: test_requirements.txt to reduce vulnerabilities (#192)

b567525

The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-SCIKITLEARN-1079100 Co-authored-by: snyk-bot <snyk-bot@snyk.io>

Fix error in DAGRegressor prediction due to numpy.dtype behaviour (re…

3118e16

…solves #140) Co-authored-by: philip_pilgerstorfer <philip.pilgerstorfer!@quantumblack.com>

reshuffle release changelog

b4566ee

qbphilip requested review from oentaryorj and mzjp2 November 10, 2021 13:42

qbphilip self-assigned this Nov 10, 2021

oentaryorj approved these changes Nov 10, 2021

qbphilip merged commit aa39d8a into master Nov 11, 2021

qbphilip deleted the release/0.11.0 branch November 11, 2021 14:49

qbphilip commented Nov 10, 2021 • edited by oentaryorj