-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidated Change list mapping method #4729
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #4729 +/- ##
===========================================
+ Coverage 44.66% 44.69% +0.02%
===========================================
Files 384 385 +1
Lines 23432 23462 +30
Branches 3799 3802 +3
===========================================
+ Hits 10467 10486 +19
- Misses 12965 12976 +11
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start! PTAL at inline comments.
Also shouldn't we be updating other usages of this functionality (like the ML job)? /cc @prasanna08
core/domain/exp_domain.py
Outdated
exploration. | ||
|
||
Attributes: | ||
states_added: list(str). The states added to the exploration from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added_state_names, deleted_state_names, new_to_old_state_names
(they're state names, not states -- might want to change the docstrings accordingly e.g. "Names of states added ...")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
core/domain/exp_domain.py
Outdated
|
||
Args: | ||
change_list: list(dict). A list of all of the commit cmds from the | ||
old_stats_model up to the next version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old_stats_model? Do you mean the old version of the exploration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
core/domain/exp_domain.py
Outdated
current_exp_version to the state names of prev_exp_version. | ||
""" | ||
|
||
def __init__(self, change_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you also need the state names of the old version of the exploration to seed this (at least as a sanity check that e.g. added names are not already in the list of states, deleted names are already in the list, etc.)?
It would be OK to re-characterize states_renamed as a mapping from states whose "identities" still exist in both the old and new versions, and have a state name map to itself if nothing has happened to it during the change. That way you can keep track of the state changes fully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seanlip Not necessarily would we need it, would we? For example, we can't use the old state names as a sanity check for deletion of a state since a state can be renamed and then deleted. This sanity check could only work for addition of states and over engineering for this case might not be required right?
Re the second point, this would introduce a lot more keys in this dict and iterating through it wouldn't be as efficient. While creating this mapping of renames, I didn't see why we'd need to also keep state names that didn't undergo any change. Is there a use case that'd require this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re your first paragraph, let's say A gets renamed to B and then gets deleted. I would expect the states_renamed dict to start with A --> A, then become A --> B, and then before deleting B we check that it's either in states_renamed.values() or states_added. I think you do want to take the initial state names into account because otherwise you're kind of assuming things about the exploration based on the changelist. This also acts as a sanity check on our production data.
Re the second paragraph, IMO the efficiency costs in this case would be negligible. I think it's more definitive if you take all states into account since then in the client code that uses this function you won't have to do "if state name is in renamed states then do this". It'd just be "do this". (So I think it would actually be simpler, at least conceptually.)
core/domain/exp_domain.py
Outdated
if state_name in states_added: | ||
states_added.remove(state_name) | ||
else: | ||
states_deleted.append(state_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove it from the "rename dict" in this case?
What if a state gets renamed before it is deleted?
(Add backend tests for these cases too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Fixed it differently, and tests cases added as well.
core/domain/stats_jobs_one_off.py
Outdated
prev_stats_dict['state_stats_mapping'][state_name] = ( | ||
stats_domain.StateStats.create_default()) | ||
|
||
for state_name in exp_versions_diff.states_deleted: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a state gets renamed before being deleted, does states_deleted hold the new name or the old name of the state? If the former then this won't work.
(It should probably hold the old name if you want to apply the ExplorationVersionsDiff directly to the new object.)
Also would it make more sense for the rename-dict to be keyed by the old state name? That way the algorithm for transferring the data is simpler: delete stuff, then map old state names to new state names by iterating through the rename dict, then add new states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seanlip Re the second point, does it make a difference? The data transfer is pretty simple right now as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keying by old state names is simpler since you're transforming the old exploration into the new one. So you basically have OldExp and a mapping from stuff in OldExp to stuff in NewExp. Having the mapping go the other way round is conceptually a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pranavsid98! A few more comments.
core/domain/exp_domain.py
Outdated
added_state_names.remove(state_name) | ||
else: | ||
original_state_name = state_name | ||
while original_state_name in new_to_old_state_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand this. Why do you need a while loop? Isn't a single lookup sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Done.
core/domain/exp_domain.py
Outdated
old_to_new_state_names[change_dict['old_state_name']] = ( | ||
old_to_new_state_names.pop(change_dict[ | ||
'new_state_name'])) | ||
exp_versions_diff = ExplorationVersionsDiff(change_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pranavsid98 -- I thought we chatted about re-examining this method because we noticed it didn't take into account adds and deletes? Could you please check with @prasanna08 about that and verify that its usage is correct? (With your new method it might be possible to simplify this, too, and perhaps even get rid of get_state_names_mapping() altogether.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
core/domain/stats_jobs_one_off.py
Outdated
change_dict['new_state_name']] = ( | ||
prev_stats_dict['state_stats_mapping'].pop( | ||
change_dict['old_state_name'])) | ||
for state_name in exp_versions_diff.added_state_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've got this in the wrong order. What if a changelist renames A --> B then adds A?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed order to delete, rename and then add!
core/domain/classifier_services.py
Outdated
old_state_name = current_state_name | ||
if current_state_name in ( | ||
exp_versions_diff.old_to_new_state_names.values()): | ||
# The structure of ExplorationVersionsDiff's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do a single pass outside the for loop to build the reverse mapping here (locally) and use that. Or better still add that as a method to the exp_versions_diff object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
core/domain/exp_domain.py
Outdated
old_state_name = new_state_name | ||
if new_state_name in ( | ||
exp_versions_diff.old_to_new_state_names.values()): | ||
# The structure of ExplorationVersionsDiff's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, compute new-to-old first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
core/domain/stats_jobs_one_off.py
Outdated
@@ -231,21 +231,22 @@ def _apply_state_name_changes(cls, prev_stats_dict, change_list): | |||
dict. A dict representation of an ExplorationStatsModel | |||
with updated state_stats_mapping and version. | |||
""" | |||
exp_versions_diff = exp_domain.ExplorationVersionsDiff(change_list) | |||
|
|||
# Handling state additions, deletions and renames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a note here along the lines of "(Note that the order these are applied is important.)" Also change to say "Handling state deletions, renames and additions (in that order)."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @pranavsid98!
@seanlip
I've added the system for consolidated change list mapping method and added tests too. I've modified the occurrences of where we can use this method too.
I did not modify the GenerateV1Stats job because its already been run in production.
Thanks
Pranav