Implement correlated random number generation #1069

mtazzari · 2022-07-05T17:20:46Z

This fixes #1068, following the specifications in #910.

TODO:

perform quantitative checks on correlated rng
update all docstrings
check performance
write release notes

Check flow:

if correlations are defined in meta-data/model_settings.json:
- if all peril corr groups have all correlation_value=0 --> do NOT compute correlations.
- if at least one peril corr group has correlation_value>0 --> compute correlations.
- if runs/losses.../input/correlations.bin is not present ---> raise ERROR
- if --ignore-correlation is passed to gulpy --> do NOT compute correlations, regardless of correlation definitions.
if correlations are NOT defined in meta-data/model_settings.json --> do NOT compute correlations
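The check flow above can be sketched as a small decision function. This is only an illustration, not the gulpy implementation: the function name, its parameters, and the error class are hypothetical. It also assumes the missing-file error is raised only when correlations would otherwise be computed, which the list above does not state.

```python
class MissingCorrelationsFile(FileNotFoundError):
    """Hypothetical error type for a missing input/correlations.bin."""


def should_compute_correlations(correlation_values, ignore_correlation, correlations_bin_exists):
    """Return True if correlated random numbers should be generated.

    correlation_values: the correlation_value of each peril correlation group,
        or None/empty if model_settings.json defines no correlations.
    ignore_correlation: True if --ignore-correlation was passed.
    correlations_bin_exists: True if input/correlations.bin is present.
    """
    # The --ignore-correlation flag wins, regardless of the definitions.
    if ignore_correlation:
        return False
    # No correlation definitions in model_settings.json.
    if not correlation_values:
        return False
    # Every group has correlation_value == 0: nothing to correlate.
    if not any(value > 0 for value in correlation_values):
        return False
    # At least one group has correlation_value > 0, so the file is required
    # (assumption: the error is only raised on this branch).
    if not correlations_bin_exists:
        raise MissingCorrelationsFile("input/correlations.bin not found")
    return True
```

For example, `should_compute_correlations([0.0, 0.7], False, True)` returns `True`, while passing `ignore_correlation=True` returns `False` whatever the other arguments are.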

Release notes feature title

This PR adds the ability to generate correlated random samples for items in the same peril correlation group. The specification of this feature is in #910.

sstruzik

Besides a few small details, this is excellent.

oasislmf/computation/generate/files.py

oasislmf/pytools/gul/manager.py

sstruzik · 2022-09-08T10:42:02Z

oasislmf/preparation/gul_inputs.py

+    Returns: (List[str]) the filtered columns
+    """
+    for col in VALID_OASIS_GROUP_COLS:
+        if col not in list(exposure_df_columns) + VALID_OASIS_GROUP_COLS:


I think this is always False, as every col in VALID_OASIS_GROUP_COLS is always in list(exposure_df_columns) + VALID_OASIS_GROUP_COLS.

This is merely legacy code that has been moved to another area, but I am happy to change this:

OasisLMF/oasislmf/preparation/gul_inputs.py

Line 156 in 1c59d93

for col in group_id_cols:

@sstruzik is right here: there is an issue with the logic, and the condition will always evaluate to False.

It checks for valid group_id columns by looping over the list VALID_OASIS_GROUP_COLS instead of checking the input given to the function in group_id_cols.
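A minimal sketch of the problem and of one possible fix, assuming the intent is to keep only requested columns that exist in the exposure data or in the valid list. The contents of VALID_OASIS_GROUP_COLS below and both function names are hypothetical and chosen for illustration only.

```python
# Hypothetical subset of valid group columns, for illustration only.
VALID_OASIS_GROUP_COLS = ["item_id", "peril_id", "coverage_id", "peril_correlation_group"]


def buggy_check_ever_fires(exposure_df_columns):
    """Reproduce the reviewed condition: it can never be True."""
    return any(
        col not in list(exposure_df_columns) + VALID_OASIS_GROUP_COLS
        for col in VALID_OASIS_GROUP_COLS
    )


def filter_group_id_cols(group_id_cols, exposure_df_columns):
    """Return the requested group_id columns that are usable."""
    allowed = set(exposure_df_columns) | set(VALID_OASIS_GROUP_COLS)
    # Iterate over the caller's input, not over VALID_OASIS_GROUP_COLS,
    # and build a new list instead of removing items while iterating.
    return [col for col in group_id_cols if col in allowed]
```

Returning a new list also avoids modifying the caller's group_id_cols in place.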

sstruzik · 2022-09-08T10:43:23Z

oasislmf/preparation/gul_inputs.py

+            group_id_cols.remove(col)
+
+    peril_correlation_group = 'peril_correlation_group'
+    if peril_correlation_group not in group_id_cols and correlations is True:


I usually just do "if correlations". Is there a reason to use "if correlations is True"?

correlations is a bool. However, a bare "if correlations" passes for any truthy value (for example a non-zero number or a non-empty string), not only for the boolean True. It is therefore safer to state "if correlations is True" explicitly. You can run the following code to see the difference:

    one = 1
    if one:
        print("one")
    if one is True:
        print("two")

sstruzik · 2022-09-08T10:51:36Z

oasislmf/preparation/gul_inputs.py

+    ]
+
+
+def process_group_id_cols(group_id_cols: List[str], exposure_df_columns: List[str], correlations: bool) -> List[str]:


This is an example of what I dislike about typing.
I'm not sure what you are checking later (see comment about ln 74), but exposure_df_columns doesn't need to be a list. Since you cast it to a list in your code, you could pass df.columns directly here, and there is no need to type it.

For correlations, a more explicit name such as has_correlation_groups or is_correlated would be better than the current name with a bool annotation.
Plus, it wouldn't need to be a bool if you used "if correlations" instead of "if correlations is True".
In the end, all the typing limits the function's potential use and, in my opinion, is more confusing.

The typing has now been removed, and correlations has been renamed to has_correlation_groups.

sstruzik · 2022-09-08T11:03:07Z

oasislmf/preparation/gul_inputs.py

+    return group_id_cols
+
+
+def hash_with_correlations(gul_inputs_df: pd.DataFrame, hashing_columns: List[str]) -> pd.DataFrame:


There is nothing specific to correlation in this function; it is just hashing based on some columns, and it is identical to the code we have at the end of get_gul_input_items. So the name is misleading.
The hashing itself should not be done twice in two parts of the code.

The function has been renamed to hash_group_id.
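To illustrate why a generic name fits, here is a sketch of column-based hashing with nothing correlation-specific in it. It is not the code from gul_inputs.py: it swaps the DataFrame for a list of row dicts and uses hashlib.blake2b from the standard library as a stand-in hash, and the body of hash_group_id shown here is an assumption.

```python
import hashlib


def hash_group_id(rows, hashing_columns):
    """Return copies of the rows with a deterministic 'group_id' added.

    Rows with identical values in hashing_columns get the same group_id.
    """
    hashed_rows = []
    for row in rows:
        # Join the selected values with a separator unlikely to appear in data.
        key = "\x1f".join(str(row[col]) for col in hashing_columns)
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
        # Copy the row so the caller's data is not modified in place.
        hashed_rows.append({**row, "group_id": int.from_bytes(digest, "big")})
    return hashed_rows
```

Whether peril_correlation_group takes part in the hash is then decided purely by the caller, through the contents of hashing_columns.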

sstruzik · 2022-09-08T11:10:29Z

oasislmf/computation/generate/files.py

        )
        correlation_input_items = get_correlation_input_items(
            model_settings_path=self.model_settings_json,
            gul_inputs_df=gul_inputs_df
        )

+        correlations: bool = establish_correlations(model_settings_path=self.model_settings_json)


I don't think you should have to read model_settings_path twice. get_correlation_input_items should return all the information that you need, so I would remove establish_correlations.

The model settings are only read once now.
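One way to get both pieces of information from a single read is to return them together. The sketch below works on an already-parsed settings dict; the function name and the "correlation_settings" key are assumptions made for illustration, not the signature of get_correlation_input_items.

```python
def summarise_correlation_settings(model_settings):
    """Return (correlation_items, has_correlation_groups) from parsed settings."""
    # "correlation_settings" is an assumed key; a missing or empty entry
    # means that no correlation groups are defined.
    items = model_settings.get("correlation_settings") or []
    has_correlation_groups = len(items) > 0
    return items, has_correlation_groups
```

The caller can then pass has_correlation_groups on to the group ID column processing without opening the settings file a second time.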

sstruzik · 2022-09-08T11:13:21Z

oasislmf/preparation/gul_inputs.py

-            break
-
-
+    # it is assumed that correlations are False for now, correlations for group ID hashing are assessed later on in


To avoid hashing twice, you could do the merge with the correlation data inside this function instead of after calling it. That should remove your chicken-and-egg problem.

Hashing is only performed once now.

sstruzik · 2022-09-08T11:13:56Z

oasislmf/utils/data.py

@@ -409,6 +410,27 @@ def get_model_settings(model_settings_fp, key=None, validate=True):
    return model_settings if not key else model_settings.get(key)


+def establish_correlations(model_settings_path: str) -> bool:


To be removed; see the comment in the calling function.

johcarter · 2022-09-21T13:15:37Z

The PiWind CI results have been updated in https://github.com/OasisLMF/OasisPiWind/tree/update/feature-correlated_rng. This branch should pass against those updated expected results.

A fix is still needed so that the model run proceeds when no correlation settings are present (fix the error on missing correlations.bin). @maxwellflitton is on the case.

johcarter · 2022-09-21T14:32:24Z

All working as expected

sambles · 2022-10-03T13:02:17Z

PiWind passes https://ci.oasislmfdev.org/blue/organizations/jenkins/oasis_PiWind/detail/PR-99/15/pipeline
with OasisLMF/OasisPiWind#99

* [gulpy] first implementation
* [gulpy] implementing correlated rng
* [wip] implementing correlated rng
* [wip]
* [gulpy] working implementation of the correlated random values
* minor cleanup
* [gulpy] Update docstrings for random module functions
* [gulpy] remove unused generate_correlated_hash
* [gulpy] introduce --ignore-correlation flag
* set hashed_group_id to True by default, cleanup
* adding haahing patch
* adding haahing patch
* [gulpy] minor cleanup files.py parameter on same line
* [gulpy] run correlation only if rho>0
* updating hashing
* [gulpy] improve flow depending on corr definitions
* Disable GroupID hashing for acceptance tests (#1094)
* Update expected acceptance tests
* Revert "Update expected acceptance tests" This reverts commit ad0907f.
* Default "hashed_group_id" to false in exposure run
* Move hashed_group_id=F default from "RunExposure" to "RunFmTest"
* Fix/pip compile (#1097)
* Only install pip-tools before pip-compile
* Try pinning flake8
* Revert "Try pinning flake8" This reverts commit d845d5b.
* Try pinning virtualenv
* add --upgrade to pip install pip-tools
* Fix test_get_dataframe__from_csv_file__set_col_defaults_option_and_use_defaults_ and run with falsifying example
* Remove falsifying example

Co-authored-by: Marco Tazzari <6020226+mtazzari@users.noreply.github.com>

* Update group_id_cols default in get_gul_input_items
* Hashing investigation (#1096)
* adding haahing patch
* adding haahing patch
* updating hashing
* Update oasislmf/preparation/gul_inputs.py

Co-authored-by: Marco Tazzari <6020226+mtazzari@users.noreply.github.com>

* [gul_inputs] bugfix don't modify inplace
* Update test_summaries.py to not rely on "loc_id" as default for group_id_cols
* Always create a correlations.bin, if missing model_settings file is blank (#1101)
* adding peril_correlation_group for valid_oasis_group_cols
* appending peril_correlation_group to columns if correlations group is present
* adding peril_correlation_group column to hashing of group IDs if correlations groups are used and hashing group IDs is done
* updating hashing group ID
* updating to accomodate non-correlations
* fixxing run
* fixing empty correlations df write header if empty correlations
* Remove empty file
* Add missing defaults to get_gul_input_items (backwards compatible)
* Fix Group_id valid column check
* Force retest

Co-authored-by: maxwellflitton <maxwellflitton@gmail.com>
Co-authored-by: sambles <sambles@users.noreply.github.com>
Co-authored-by: Sam Gamble <hexadessa@gmail.com>

mtazzari added 2 commits July 5, 2022 14:01

[gulpy] first implementation

eefce12

[gulpy] implementing correlated rng

a8f89de

mtazzari added the enhancement and feature labels Jul 5, 2022

mtazzari self-assigned this Jul 5, 2022

mtazzari changed the base branch from master to develop July 5, 2022 17:21

mtazzari marked this pull request as draft July 5, 2022 17:22

mtazzari changed the title ~~Implement correlated random number generation~~ (wip) Implement correlated random number generation Jul 5, 2022

mtazzari and others added 4 commits July 6, 2022 15:37

[wip] implementing correlated rng

d8215d8

[wip]

614dd3d

[gulpy] working implementation of the correlated random values

d407089

Merge branch 'develop' into feature/correlated_rng

579910b

mtazzari changed the title ~~(wip) Implement correlated random number generation~~ Implement correlated random number generation Jul 15, 2022

minor cleanup

5417f9b

mtazzari marked this pull request as ready for review July 20, 2022 09:42

mtazzari and others added 4 commits July 22, 2022 15:52

[gulpy] Update docstrings for random module functions

5ad10f9

Merge branch 'develop' into feature/correlated_rng

4f564a1

[gulpy] remove unused generate_correlated_hash

2066aa2

[gulpy] introduce --ignore-correlation flag

f0311c0

mtazzari requested a review from sstruzik August 2, 2022 10:59

set hashed_group_id to True by default, cleanup

1709cee

sstruzik approved these changes Aug 8, 2022

maxwellflitton and others added 7 commits August 8, 2022 14:11

adding haahing patch

2222be2

adding haahing patch

2c0d5e3

Merge branch 'develop' into hashing-investigation

6621208

Merge branch 'develop' into feature/correlated_rng

e8cf544

[gulpy] minor cleanup files.py parameter on same line

bb05858

[gulpy] run correlation only if rho>0

e593f0c

updating hashing

fbf1689

maxwellflitton added 3 commits September 5, 2022 13:08

adding peril_correlation_group for valid_oasis_group_cols

f7cb1ab

appending peril_correlation_group to columns if correlations group is…

7d772fb

… present

adding peril_correlation_group column to hashing of group IDs if corr…

e6ac89e

…elations groups are used and hashing group IDs is done

sstruzik requested changes Sep 8, 2022

maxwellflitton added 2 commits September 15, 2022 16:09

updating hashing group ID

823add8

updating to accomodate non-correlations

ff9f568

maxwellflitton added 2 commits September 21, 2022 14:19

fixxing run

32a5f66

fixing empty correlations df write header if empty correlations

d60deb2

sambles added 4 commits October 3, 2022 11:57

Merge branch 'develop' into feature/correlated_rng

1fc5651

Remove empty file

e95dac6

Add missing defaults to get_gul_input_items (backwards compatible)

62c805f

Fix Group_id valid column check

5f6933a

sambles mentioned this pull request Oct 3, 2022

Add correlation settings to PiWind testing OasisLMF/OasisPiWind#99

Merged

sambles self-requested a review October 3, 2022 13:31

sambles approved these changes Oct 3, 2022

Force retest

44e43f2

sambles merged commit 3d7f8a5 into develop Oct 3, 2022

sambles deleted the feature/correlated_rng branch October 4, 2022 09:57

awsbuild added this to the 1.27.0rc1 milestone Oct 6, 2022

mtazzari mentioned this pull request Oct 27, 2022

Implement correlated random number generation in gulpy #1068

Closed

sambles mentioned this pull request Nov 7, 2022

Fix/package install error #1133

Merged

mtazzari mentioned this pull request Nov 8, 2022

improve correlation functionality available through the engine #910

Closed

awsbuild modified the milestones: 1.27.0rc1, 1.27.0 Jan 12, 2023

sambles mentioned this pull request Jan 12, 2023

Release/1.27.0 OasisLMF/OasisPlatform#721

Merged
