-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement correlated random number generation #1069
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beside the few small details this is excellent.
…elations groups are used and hashing group IDs is done
Returns: (List[str]) the filtered columns | ||
""" | ||
for col in VALID_OASIS_GROUP_COLS: | ||
if col not in list(exposure_df_columns) + VALID_OASIS_GROUP_COLS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is always False as col in VALID_OASIS_GROUP_COLS are always in list(exposure_df_columns) + VALID_OASIS_GROUP_COLS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is merely legacy code that has been moved to another area but happy to chance this:
OasisLMF/oasislmf/preparation/gul_inputs.py
Line 156 in 1c59d93
for col in group_id_cols: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sstruzik is right here, there is an issue with the logic and will always evaluate to False
its checking for valid group_id columns by looking over the list VALID_OASIS_GROUP_COLS
instead of checking the input given to the function from group_id_cols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oasislmf/preparation/gul_inputs.py
Outdated
group_id_cols.remove(col) | ||
|
||
peril_correlation_group = 'peril_correlation_group' | ||
if peril_correlation_group not in group_id_cols and correlations is True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually just do "if correlation", is there a reason to use "if correlation is True"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correlations
is a bool
however, if we just have the if correlations
it will pass if correlations
is merely not None
. Therefore it is safer to explicitly state if correlations is True
. You can run the following code to see the difference:
one = 1
if one:
print("one")
if one is True:
print("two")
oasislmf/preparation/gul_inputs.py
Outdated
] | ||
|
||
|
||
def process_group_id_cols(group_id_cols: List[str], exposure_df_columns: List[str], correlations: bool) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is one of my dislike example about typing.
I'm not sure what you are checking later (see comment about ln 74) but exposure_df_columns doesn't need to be a list. As you cast it as a list in your code so for example here you could pass df.columns directly and there no need to type.
For correlations, naming would be more explicit. has_correlation_groups or is_correlated that the current choice with bool
plus it wouldn't need to be a bool if you did if correlations instead of if correlations is True.
In the end all the typing limit the function potential use and in my opinion is more confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now removed typing and changed the correlations
with has_correlation_groups
oasislmf/preparation/gul_inputs.py
Outdated
return group_id_cols | ||
|
||
|
||
def hash_with_correlations(gul_inputs_df: pd.DataFrame, hashing_columns: List[str]) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing specific to correlation in this function, it is just hashing based on some columns. and it is identical as the code we have at the end of get_gul_input_items. So the name is misleading.
The hashing itself should be done twice in two part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been changed to hash_group_id
for the function name
) | ||
correlation_input_items = get_correlation_input_items( | ||
model_settings_path=self.model_settings_json, | ||
gul_inputs_df=gul_inputs_df | ||
) | ||
|
||
correlations: bool = establish_correlations(model_settings_path=self.model_settings_json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should have to read twice model_settings_path. get_correlation_input_items should return all the information that you need. So I would remove establish_correlations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model settings is only read once now
break | ||
|
||
|
||
# it is assumed that correlations are False for now, correlations for group ID hashing are assessed later on in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid hashing twice, you could do the merge with the correlation data in this function instead of doing it after calling it. That should remove your chicken and egg problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashing is only performed once now
oasislmf/utils/data.py
Outdated
@@ -409,6 +410,27 @@ def get_model_settings(model_settings_fp, key=None, validate=True): | |||
return model_settings if not key else model_settings.get(key) | |||
|
|||
|
|||
def establish_correlations(model_settings_path: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to remove see comment in the calling function
piwind ci results have been updated in https://github.com/OasisLMF/OasisPiWind/tree/update/feature-correlated_rng. This branch should pass against those updated expected results. A fix is still needed to proceed with model run when no correlation settings are present (fix error on missing correlations.bin). @maxwellflitton is on the case. |
All working as expected |
* [gulpy] first implementation * [gulpy] implementing correlated rng * [wip] implementing correlated rng * [wip] * [gulpy] working implementation of the correlated random values * minor cleanup * [gulpy] Update docstrings for random module functions * [gulpy] remove unused generate_correlated_hash * [gulpy] introduce --ignore-correlation flag * set hashed_group_id to True by default, cleanup * adding haahing patch * adding haahing patch * [gulpy] minor cleanup files.py parameter on same line * [gulpy] run correlation only if rho>0 * updating hashing * [gulpy] improve flow depending on corr definitions * Disable GroupID hashing for acceptance tests (#1094) * Update expected acceptance tests * Revert "Update expected acceptance tests" This reverts commit ad0907f. * Default "hashed_group_id" to false in exposure run * Move hashed_group_id=F default from "RunExposure" to "RunFmTest" * Fix/pip compile (#1097) * Only install pip-tools before pip-compile * Try pinning flake8 * Revert "Try pinning flake8" This reverts commit d845d5b. * Try pinning virtualenv * add --upgrade to pip install pip-tools * Fix test_get_dataframe__from_csv_file__set_col_defaults_option_and_use_defaults_ and run with falsifying example * Remove falsifying example Co-authored-by: Marco Tazzari <6020226+mtazzari@users.noreply.github.com> * Update group_id_cols default in get_gul_input_items * Hashing investigation (#1096) * adding haahing patch * adding haahing patch * updating hashing * Update oasislmf/preparation/gul_inputs.py Co-authored-by: Marco Tazzari <6020226+mtazzari@users.noreply.github.com> * [gul_inputs] bugfix don't modify inplace * Update test_summaries.py to not rely on "loc_id" as default for group_id_cols * Always create a correlations.bin, if missing model_settings file is blank (#1101) * adding peril_correlation_group for valid_oasis_group_cols * appending peril_correlation_group to columns if correlations group is present * adding peril_correlation_group column to hashing of group IDs if correlations groups are used and hashing group IDs is done * updating hashing group ID * updating to accomodate non-correlations * fixxing run * fixing empty correlations df write header if empty correlations * Remove empty file * Add missing defaults to get_gul_input_items (backwards compatible) * Fix Group_id valid column check * Force retest Co-authored-by: maxwellflitton <maxwellflitton@gmail.com> Co-authored-by: sambles <sambles@users.noreply.github.com> Co-authored-by: Sam Gamble <hexadessa@gmail.com>
This Fix #1068 following specifications in #910
TODO:
Check flow:
meta-data/model_settings.json
:correlation_value=0
--> do NOT compute correlations.correlation_value>0
--> compute correlations.runs/losses.../input/correlations.bin
is not present ---> raise ERROR--ignore-correlation
is passed togulpy
--> do NOT compute correlations, regardless of correlation definitions.meta-data/model_settings.json
--> do NOT compute correlationsRelease notes feature title
This PR introduces the possibility to generate correlated random samples for items in the same peril correlation group. Specifications of this feature are at #910