Merge pull request #182 from quantumblacklabs/release/0.11.1
Changelog:
 * Add python 3.9, 3.10 support
 * Unlock Scipy restrictions
 * Fix bug: infinite loop on lv inference engine
 * Fix DAGLayer moving out of gpu during optimization step of Pytorch learning
 * Fix CPD comparison of floating point - rounding issue
 * Fix set_cpd for parentless nodes that are not MultiIndex
 * Add Docker files for development on a dockerized environment
qbphilip authored Nov 16, 2022
2 parents aa39d8a + 4274997 commit d67ca0c
Showing 34 changed files with 328 additions and 127 deletions.
16 changes: 16 additions & 0 deletions .circleci/config.yml
Expand Up @@ -87,6 +87,18 @@ jobs:
PYTHON_VERSION: '3.8'
<<: *unit_test_steps

unit_tests_39:
docker: *python
environment:
PYTHON_VERSION: '3.9'
<<: *unit_test_steps

unit_tests_310:
docker: *python
environment:
PYTHON_VERSION: '3.10'
<<: *unit_test_steps

linters_37:
docker: *python
environment:
Expand Down Expand Up @@ -127,12 +139,16 @@ workflows:
- unit_tests_36
- unit_tests_37
- unit_tests_38
- unit_tests_39
- unit_tests_310
- linters_37
- docs
- all_circleci_checks_succeeded:
requires:
- unit_tests_36
- unit_tests_37
- unit_tests_38
- unit_tests_39
- unit_tests_310
- linters_37
- docs
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Expand Up @@ -29,7 +29,7 @@ repos:
exclude: ^causalnex/ebaybbn

- repo: https://github.com/psf/black
rev: 20.8b1
rev: 22.3.0
hooks:
- id: black

Expand Down
35 changes: 35 additions & 0 deletions CONTRIBUTING.md
Expand Up @@ -150,6 +150,41 @@ This command will only work on Unix-like systems and requires `pandoc` to be ins

> ❗ Running `make build-docs` in a Python 3.5 environment may sometimes yield multiple warning messages like the following: `WARNING: toctree contains reference to nonexisting document '04_user_guide/04_user_guide'`. You can simply ignore them or switch to Python 3.6+ when building documentation.
## Developing in Docker
The Docker images have all the necessary dependencies built in. To develop using the Docker containers, do the following:

1. Build the necessary container
```bash
export CONTAINER_TYPE='cpu' # or gpu
docker build -t quantumblacklabs/causalnex:devel-$CONTAINER_TYPE -f devel-$CONTAINER_TYPE.Dockerfile .
```

2. Run the container in interactive mode.
For running on CPU, simply run the docker container:
```bash
docker run -it -w /causalnex_src -v $PWD:/causalnex_src quantumblacklabs/causalnex:devel-cpu bash
```

For the `gpu` type, your host machine needs access to a GPU with the CUDA driver installed. The `devel-gpu` image can then access the GPU on the host:
```bash
docker run --gpus all -it -w /causalnex_src -v $PWD:/causalnex_src quantumblacklabs/causalnex:devel-gpu bash
```

3. Run tests
```bash
make test
```

4. If all tests pass you can build the wheel
```bash
make package
```

5. Now you can install the pip package that contains your changes, either in the container or on your host machine. The name of the installed package will be `causalnex`.
```bash
make install
```
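
The two `docker run` variants in step 2 differ only in the `--gpus all` flag and the image tag. A minimal sketch of that rule, assuming a hypothetical helper named `docker_run_command` (it is not part of CausalNex) that only assembles the argument list and does not invoke Docker:

```python
from typing import List


def docker_run_command(container_type: str, src_dir: str) -> List[str]:
    """Build the `docker run` argument list from step 2 (hypothetical helper)."""
    if container_type not in ("cpu", "gpu"):
        raise ValueError("container_type must be 'cpu' or 'gpu'")

    command = ["docker", "run"]
    if container_type == "gpu":
        # Only the GPU image needs access to the GPUs on the host.
        command += ["--gpus", "all"]
    command += [
        "-it",
        "-w", "/causalnex_src",
        "-v", f"{src_dir}:/causalnex_src",
        f"quantumblacklabs/causalnex:devel-{container_type}",
        "bash",
    ]
    return command


# The source directory here is a placeholder path, not a real location.
print(" ".join(docker_run_command("gpu", "/path/to/causalnex")))
```

The list form can be passed to `subprocess.run` unchanged if you want to launch the container from a script.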

## Hints on pre-commit usage
The checks will automatically run on all the changed files on each commit.
Even more extensive set of checks (including the heavy set of `pylint` checks)
Expand Down
18 changes: 9 additions & 9 deletions README.md
Expand Up @@ -2,15 +2,15 @@

-----------------

| Theme | Status |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Latest Release | [![PyPI version](https://badge.fury.io/py/causalnex.svg)](https://pypi.org/project/causalnex/) |
| Python Version | [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue.svg)](https://pypi.org/project/causalnex/) |
| `master` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/causalnex/tree/master.svg?style=shield&circle-token=92ab70f03f3183655473dad16be641959cd31b83)](https://circleci.com/gh/quantumblacklabs/causalnex/tree/master) |
| Theme | Status |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Latest Release | [![PyPI version](https://badge.fury.io/py/causalnex.svg)](https://pypi.org/project/causalnex/) |
| Python Version | [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue.svg)](https://pypi.org/project/causalnex/) |
| `master` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/causalnex/tree/master.svg?style=shield&circle-token=92ab70f03f3183655473dad16be641959cd31b83)](https://circleci.com/gh/quantumblacklabs/causalnex/tree/master) |
| `develop` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/causalnex/tree/develop.svg?style=shield&circle-token=92ab70f03f3183655473dad16be641959cd31b83)](https://circleci.com/gh/quantumblacklabs/causalnex/tree/develop) |
| Documentation Build | [![Documentation](https://readthedocs.org/projects/causalnex/badge/?version=latest)](https://causalnex.readthedocs.io/) |
| License | [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) |
| Code Style | [![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black) |
| Documentation Build | [![Documentation](https://readthedocs.org/projects/causalnex/badge/?version=latest)](https://causalnex.readthedocs.io/) |
| License | [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) |
| Code Style | [![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black) |
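
The updated Python Version badge carries the version list in the URL path, with spaces encoded as `%20` and the pipe separator as `%7C`. A small sketch of how that path segment can be produced with the standard library; `python_badge_url` is a hypothetical helper used only for illustration:

```python
from typing import List
from urllib.parse import quote


def python_badge_url(versions: List[str]) -> str:
    """Build a shields.io badge URL for a list of Python versions."""
    label = " | ".join(versions)
    # quote() percent-encodes the spaces and the pipe characters.
    return f"https://img.shields.io/badge/python-{quote(label)}-blue.svg"


print(python_badge_url(["3.6", "3.7", "3.8", "3.9", "3.10"]))
```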


## What is CausalNex?
Expand Down Expand Up @@ -93,4 +93,4 @@ See our [LICENSE](LICENSE.md) for more detail.

## We're hiring!

Do you want to be part of the team that builds CausalNex and [other great products](https://quantumblack.com/labs) at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Machine Learning Engineers who love using data to drive their decisions. Take a look at [our open positions](https://www.quantumblack.com/careers/current-openings#content) and see if you're a fit.
Do you want to be part of the team that builds CausalNex and [other great products](https://www.mckinsey.com/capabilities/quantumblack/labs) at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Machine Learning Engineers who love using data to drive their decisions. Take a look at [our open positions](https://www.mckinsey.com/capabilities/quantumblack/careers-and-community) and see if you're a fit.
16 changes: 15 additions & 1 deletion RELEASE.md
@@ -1,3 +1,15 @@
# Upcoming release

# Release 0.11.1

* Add python 3.9, 3.10 support
* Unlock Scipy restrictions
* Fix bug: infinite loop on lv inference engine
* Fix DAGLayer moving out of gpu during optimization step of Pytorch learning
* Fix CPD comparison of floating point - rounding issue
* Fix set_cpd for parentless nodes that are not MultiIndex
* Add Docker files for development on a dockerized environment

# Release 0.11.0
* Add expectation-maximisation (EM) algorithm to learn with latent variables
* Add a new tutorial on adding latent variable as well as identifying its candidate location
Expand All @@ -12,6 +24,7 @@
* Fix broken URLs in FAQ documentation, as per #113 and #125
* Fix integer index type checking for timeseries data, as per #74 and #86
* Fix bug where inputs to the DAGRegressor/Classifier yielded different predictions between float and int dtypes, as per #140
* Fix bug in set_cpd() where only pd.MultiIndex dataframes were considered which does not account for parentless nodes, as per #146

# Release 0.10.0
* Add supervised discretisation strategies using Decision Tree and MDLP algorithms
Expand Down Expand Up @@ -130,7 +143,8 @@ This work was later turned into a product thanks to the following contributors:
, [Nikolaos Tsaousis](https://www.linkedin.com/in/ntsaousis/)
, [Shuhei Ishida](https://www.linkedin.com/in/shuhei-i/)
, [Francesca Sogaro](https://www.linkedin.com/in/francesca-sogaro/)
, [Deepyaman Datta](https://www.linkedin.com/in/deepyaman/).
, [Deepyaman Datta](https://www.linkedin.com/in/deepyaman/)
, [Ryan Ng](https://www.linkedin.com/in/ryannsj/).

CausalNex would also not be possible without the generous sharing from leading researchers in the field of causal inference
and we are grateful to everyone who advised and supported us, filed issues or helped resolve them, asked and answered
Expand Down
2 changes: 1 addition & 1 deletion causalnex/__init__.py
Expand Up @@ -30,6 +30,6 @@
causalnex toolkit for causal reasoning (Bayesian Networks / Inference)
"""

__version__ = "0.11.0"
__version__ = "0.11.1"

__all__ = ["structure", "discretiser", "evaluation", "inference", "network", "plots"]
5 changes: 3 additions & 2 deletions causalnex/inference/inference.py
Expand Up @@ -32,6 +32,7 @@
"""
import copy
import inspect
import math
import re
import types
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple, Union
Expand Down Expand Up @@ -205,7 +206,7 @@ def _do(self, observation: str, state: Dict[Hashable, float]):
Raises:
ValueError: if states do not match original states of the node, or probabilities do not sum to 1.
"""
if sum(state.values()) != 1.0:
if not math.isclose(sum(state.values()), 1.0):
raise ValueError("The cpd for the provided observation must sum to 1")

if max(state.values()) > 1.0 or min(state.values()) < 0:
Expand Down Expand Up @@ -345,7 +346,7 @@ def template() -> float:
# initially there are none present, but caller will add appropriate arguments to the function
# getargvalues was "inadvertently marked as deprecated in Python 3.5"
# https://docs.python.org/3/library/inspect.html#inspect.getfullargspec
arg_spec = inspect.getargvalues(inspect.currentframe())
arg_spec = inspect.getargvalues(inspect.currentframe()) # pragma: no cover

return self._cpds[arg_spec.args[0]][ # target name
arg_spec.locals[arg_spec.args[0]]
Expand Down
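The switch from `!= 1.0` to `math.isclose` in `_do` matters because probabilities that sum to 1 on paper can miss 1.0 by one rounding step in binary floating point. A minimal sketch of the effect; `validate_probabilities` is a hypothetical stand-in for the check, not the CausalNex method itself:

```python
import math
from typing import Dict, Hashable


def validate_probabilities(state: Dict[Hashable, float]) -> None:
    """Raise ValueError unless the probabilities sum to 1 within tolerance."""
    # math.isclose uses a relative tolerance of 1e-09 by default.
    if not math.isclose(sum(state.values()), 1.0):
        raise ValueError("The cpd for the provided observation must sum to 1")


# Ten equally likely states: mathematically the total is exactly 1.
state = {i: 0.1 for i in range(10)}

# Accumulate with `+=` to expose the rounding error explicitly; some newer
# Python versions use compensated summation inside sum() and may hide it.
naive_total = 0.0
for probability in state.values():
    naive_total += probability

print(naive_total == 1.0)              # exact comparison
print(math.isclose(naive_total, 1.0))  # tolerant comparison
validate_probabilities(state)          # passes with the tolerant check
```

The default relative tolerance is far tighter than any meaningful modelling error, so a distribution that is genuinely wrong (for example one summing to 0.9) is still rejected.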
42 changes: 34 additions & 8 deletions causalnex/network/network.py
Expand Up @@ -40,7 +40,7 @@
import pandas as pd
from pgmpy.estimators import BayesianEstimator, MaximumLikelihoodEstimator
from pgmpy.factors.discrete.CPD import TabularCPD
from pgmpy.models import BayesianModel
from pgmpy.models import BayesianNetwork as BayesianModel

from causalnex.estimator.em import EMSingleLatentVariable
from causalnex.structure import StructureModel
Expand Down Expand Up @@ -294,10 +294,14 @@ def set_cpd(self, node: str, df: pd.DataFrame) -> "BayesianNetwork":
parent_node: self.node_states[parent_node]
for parent_node in self._structure.predecessors(node)
}
table_parents = {
name: set(df.columns.levels[i].values)
for i, name in enumerate(df.columns.names)
}
if isinstance(df.columns, pd.MultiIndex):
table_parents = {
name: set(df.columns.levels[i].values)
for i, name in enumerate(df.columns.names)
}
else:
table_parents = {}

if not (
set(df.index.values) == self.node_states[node]
and true_parents == table_parents
Expand All @@ -307,9 +311,17 @@ def set_cpd(self, node: str, df: pd.DataFrame) -> "BayesianNetwork":

sorted_df = df.reindex(sorted(df.columns), axis=1)
node_card = len(self.node_states[node])
evidence, evidence_card = zip(
*[(key, len(table_parents[key])) for key in sorted(table_parents.keys())]
)

if table_parents: # Non-empty only when the node has parents
evidence, evidence_card = zip(
*[
(key, len(table_parents[key]))
for key in sorted(table_parents.keys())
]
)
else:
evidence, evidence_card = (None, None)

tabular_cpd = TabularCPD(
node,
node_card,
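
The guard around `zip(...)` in this hunk is needed because a parentless node has no `(name, cardinality)` pairs, and unpacking an empty `zip` into two names fails. A minimal sketch of both paths, assuming a hypothetical helper `evidence_from_parents` that takes the `table_parents` mapping as a plain dictionary. It tests the dictionary's truthiness directly, which does not depend on the key values:

```python
from typing import Dict, Optional, Set, Tuple


def evidence_from_parents(
    table_parents: Dict[str, Set],
) -> Tuple[Optional[tuple], Optional[tuple]]:
    """Return (evidence, evidence_card), or (None, None) for parentless nodes."""
    if table_parents:
        evidence, evidence_card = zip(
            *[(key, len(table_parents[key])) for key in sorted(table_parents)]
        )
    else:
        evidence, evidence_card = (None, None)
    return evidence, evidence_card


# A node with two parents: names are sorted, cardinalities follow the names.
print(evidence_from_parents({"b": {0, 1}, "a": {"x", "y", "z"}}))

# A parentless node: no evidence at all.
print(evidence_from_parents({}))

# Without the guard, unpacking the empty zip raises ValueError.
try:
    evidence, evidence_card = zip(*[])
except ValueError as error:
    print(f"unguarded unpacking fails: {error}")
```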
Expand Down Expand Up @@ -538,6 +550,7 @@ def fit_latent_cpds( # pylint: disable=too-many-arguments
if the latent variable cannot be found in the network or
if the latent variable is present/observed in the data
if the latent variable states are empty
if additional non-lv nodes are added to the subgraph without being fit
"""
if not isinstance(lv_name, str):
raise ValueError(f"Invalid latent variable name '{lv_name}'")
Expand All @@ -546,6 +559,19 @@ def fit_latent_cpds( # pylint: disable=too-many-arguments
if not isinstance(lv_states, list) or len(lv_states) == 0:
raise ValueError(f"Latent variable '{lv_name}' contains no states")

# Unaccounted nodes that have not been fit will result in infinite loop during
# generation of InferenceEngine
unaccounted_nodes = []
for node in self.nodes:
if (node not in self.cpds) and (node != lv_name):
unaccounted_nodes.append(node)
if len(unaccounted_nodes) > 0:
raise ValueError(
f"Node(s) {unaccounted_nodes} have not had their states and cpds fit. "
"Before fitting latent variable cpds, add the additional nodes and "
"edges to the subgraph and fit with .fit_node_states_and_cpds() first."
)

# Register states for the latent variable
self._node_states[lv_name] = {v: k for k, v in enumerate(sorted(lv_states))}

Expand Down
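The new check in `fit_latent_cpds` collects every node that has no fitted CPD and is not the latent variable itself, then fails early instead of letting the inference engine loop forever. A minimal sketch of that filter on plain Python containers; `find_unaccounted_nodes` and the sample node names are hypothetical:

```python
from typing import Dict, Hashable, Iterable, List


def find_unaccounted_nodes(
    nodes: Iterable[Hashable], cpds: Dict[Hashable, object], lv_name: Hashable
) -> List[Hashable]:
    """Return nodes that have no CPD yet, excluding the latent variable."""
    return [node for node in nodes if node not in cpds and node != lv_name]


nodes = ["a", "b", "z", "c"]                      # "z" is the latent variable
cpds = {"a": "cpd for a", "b": "cpd for b"}       # placeholders for fitted CPDs

unaccounted = find_unaccounted_nodes(nodes, cpds, lv_name="z")
if unaccounted:
    # The real method raises ValueError here; the sketch only reports.
    print(f"Node(s) {unaccounted} have not had their states and cpds fit.")
```

Failing before the states of the latent variable are registered keeps the network unchanged when the precondition is not met.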
2 changes: 1 addition & 1 deletion causalnex/plots/plots.py
Expand Up @@ -42,7 +42,7 @@ def plot_structure(
node_attributes: Dict[str, Dict[str, str]] = None,
edge_attributes: Dict[Tuple[str, str], Dict[str, str]] = None,
graph_attributes: Dict[str, str] = None,
): # pylint: disable=missing-return-type-doc
):
"""
Plot a `StructureModel` using pygraphviz.
Expand Down
2 changes: 1 addition & 1 deletion causalnex/structure/data_generators/wrappers.py
Expand Up @@ -485,7 +485,7 @@ def _generate_inter_structure(
u = []
for i in range(p):
u_i = np.random.uniform(low=w_min, high=w_max, size=[num_nodes, num_nodes]) / (
w_decay ** i
w_decay**i
)
u_i[np.random.rand(num_nodes, num_nodes) < neg] *= -1
u.append(u_i)
Expand Down
18 changes: 9 additions & 9 deletions causalnex/structure/dynotears.py
Expand Up @@ -314,12 +314,12 @@ def _reshape_wa(
w_mat = w_plus - w_minus
a_plus = (
w_tilde[2 * d_vars :]
.reshape(2 * p_orders, d_vars ** 2)[::2]
.reshape(2 * p_orders, d_vars**2)[::2]
.reshape(d_vars * p_orders, d_vars)
)
a_minus = (
w_tilde[2 * d_vars :]
.reshape(2 * p_orders, d_vars ** 2)[1::2]
.reshape(2 * p_orders, d_vars**2)[1::2]
.reshape(d_vars * p_orders, d_vars)
)
a_mat = a_plus - a_minus
Expand Down Expand Up @@ -422,8 +422,8 @@ def _func(wa_vec: np.ndarray) -> float:
)
)
_h_value = _h(wa_vec)
l1_penalty = lambda_w * (wa_vec[: 2 * d_vars ** 2].sum()) + lambda_a * (
wa_vec[2 * d_vars ** 2 :].sum()
l1_penalty = lambda_w * (wa_vec[: 2 * d_vars**2].sum()) + lambda_a * (
wa_vec[2 * d_vars**2 :].sum()
)
return loss + 0.5 * rho * _h_value * _h_value + alpha * _h_value + l1_penalty

Expand Down Expand Up @@ -457,16 +457,16 @@ def _grad(wa_vec: np.ndarray) -> np.ndarray:

grad_vec_w = np.append(
obj_grad_w, -obj_grad_w, axis=0
).flatten() + lambda_w * np.ones(2 * d_vars ** 2)
grad_vec_a = obj_grad_a.reshape(p_orders, d_vars ** 2)
).flatten() + lambda_w * np.ones(2 * d_vars**2)
grad_vec_a = obj_grad_a.reshape(p_orders, d_vars**2)
grad_vec_a = np.hstack(
(grad_vec_a, -grad_vec_a)
).flatten() + lambda_a * np.ones(2 * p_orders * d_vars ** 2)
).flatten() + lambda_a * np.ones(2 * p_orders * d_vars**2)
return np.append(grad_vec_w, grad_vec_a, axis=0)

# initialise matrix, weights and constraints
wa_est = np.zeros(2 * (p_orders + 1) * d_vars ** 2)
wa_new = np.zeros(2 * (p_orders + 1) * d_vars ** 2)
wa_est = np.zeros(2 * (p_orders + 1) * d_vars**2)
wa_new = np.zeros(2 * (p_orders + 1) * d_vars**2)
rho, alpha, h_value, h_new = 1.0, 0.0, np.inf, np.inf

for n_iter in range(max_iter):
Expand Down
18 changes: 9 additions & 9 deletions causalnex/structure/notears.py
Expand Up @@ -476,8 +476,8 @@ def _func(w_vec: np.ndarray) -> float:
float: objective.
"""

w_pos = w_vec[: d ** 2]
w_neg = w_vec[d ** 2 :]
w_pos = w_vec[: d**2]
w_neg = w_vec[d**2 :]

wmat_pos = w_pos.reshape([d, d])
wmat_neg = w_neg.reshape([d, d])
Expand All @@ -498,10 +498,10 @@ def _grad(w_vec: np.ndarray) -> np.ndarray:
np.ndarray: gradient vector.
"""

w_pos = w_vec[: d ** 2]
w_neg = w_vec[d ** 2 :]
w_pos = w_vec[: d**2]
w_neg = w_vec[d**2 :]

grad_vec = np.zeros(2 * d ** 2)
grad_vec = np.zeros(2 * d**2)
wmat_pos = w_pos.reshape([d, d])
wmat_neg = w_neg.reshape([d, d])

Expand All @@ -514,8 +514,8 @@ def _grad(w_vec: np.ndarray) -> np.ndarray:
+ (rho * (np.trace(exp_hdmrd) - d) + alpha) * exp_hdmrd.T * wmat * 2
)
lbd_grad = beta * np.ones(d * d)
grad_vec[: d ** 2] = obj_grad.flatten() + lbd_grad
grad_vec[d ** 2 :] = -obj_grad.flatten() + lbd_grad
grad_vec[: d**2] = obj_grad.flatten() + lbd_grad
grad_vec[d**2 :] = -obj_grad.flatten() + lbd_grad

return grad_vec

Expand All @@ -533,7 +533,7 @@ def _grad(w_vec: np.ndarray) -> np.ndarray:
sol = sopt.minimize(_func, w_est, method="L-BFGS-B", jac=_grad, bounds=bnds)
w_new = sol.x
h_new = _h(
w_new[: d ** 2].reshape([d, d]) - w_new[d ** 2 :].reshape([d, d])
w_new[: d**2].reshape([d, d]) - w_new[d**2 :].reshape([d, d])
)
if h_new > 0.25 * h_val:
rho *= 10
Expand All @@ -545,7 +545,7 @@ def _grad(w_vec: np.ndarray) -> np.ndarray:
if h_val > h_tol and n_iter == max_iter - 1:
warnings.warn("Failed to converge. Consider increasing max_iter.")

w_new = w_est[: d ** 2].reshape([d, d]) - w_est[d ** 2 :].reshape([d, d])
w_new = w_est[: d**2].reshape([d, d]) - w_est[d**2 :].reshape([d, d])
w_new[np.abs(w_new) < w_threshold] = 0
return StructureModel(w_new.reshape([d, d]))

Expand Down
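The changes to `wrappers.py`, `dynotears.py` and `notears.py` appear to be formatting only: they follow from bumping Black to 22.3.0 in `.pre-commit-config.yaml`, and Black releases from 22.1.0 onward drop the spaces around `**` when both operands are simple. A small sketch that checks this with the standard `ast` module, using a hypothetical helper `same_ast`:

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True when two source snippets parse to identical syntax trees."""
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


# (old formatting, new formatting) pairs taken from the hunks above.
pairs = [
    ("w_pos = w_vec[: d ** 2]", "w_pos = w_vec[: d**2]"),
    ("w_neg = w_vec[d ** 2 :]", "w_neg = w_vec[d**2 :]"),
    ("grad_vec = np.zeros(2 * d ** 2)", "grad_vec = np.zeros(2 * d**2)"),
]

for before, after in pairs:
    print(same_ast(before, after), "|", after)

# A real change of operator, by contrast, produces a different tree.
print(same_ast("d ** 2", "d * 2"))
```

Parsing does not execute the snippets, so names such as `np` and `d` do not need to be defined.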