Fix part of #3047: Collection contents migration #3209

wxyxinyu · 2017-03-16T05:53:49Z

This is task 1 of #3047: added a new field called collection_contents, and copies the old contents of nodes into it.

wxyxinyu · 2017-03-16T05:56:46Z

Also, some explanation for design choices:

The collection schema versions are structured such that migration is performed automatically upon loading (unlike in explorations), so the migration mapreduce job can't get the old version of the collection without significant changes here. I think this is as intended? This is why I changed the schema update command.
The migration job states the version being migrated to. The current changes don't increment the version, but I intend for the next release to have these changes as well as the addition of the skills list, which will increment the schema version.

codecov-io · 2017-03-16T15:34:19Z

Codecov Report

Merging #3209 into develop will decrease coverage by 0.03%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #3209      +/-   ##
===========================================
- Coverage    45.82%   45.79%   -0.04%     
===========================================
  Files          230      234       +4     
  Lines        18140    18203      +63     
  Branches      2933     2934       +1     
===========================================
+ Hits          8313     8336      +23     
- Misses        9827     9867      +40

Impacted Files	Coverage Δ
core/templates/dev/head/pages/Base.js	`7.69% <0%> (-69.24%)`	⬇️
...es/dev/head/domain/sidebar/SidebarStatusService.js	`3.7% <0%> (-22.23%)`	⬇️
...ead/domain/exploration/InteractionObjectFactory.js	`87.5% <0%> (-12.5%)`	⬇️
.../exploration_editor/editor_tab/StateInteraction.js	`38.65% <0%> (-1.61%)`	⬇️
core/templates/dev/head/app.js	`53.03% <0%> (-0.56%)`	⬇️
...es/exploration_editor/editor_tab/StateResponses.js	`18.01% <0%> (-0.13%)`	⬇️
...ev/head/pages/exploration_player/PlayerServices.js	`1.7% <0%> (ø)`	⬆️
.../exploration_editor/history_tab/HistoryServices.js	`93.9% <0%> (ø)`	⬆️
...ges/exploration_editor/feedback_tab/FeedbackTab.js	`0.72% <0%> (ø)`	⬆️
...ev/head/pages/exploration_editor/EditorServices.js	`36.01% <0%> (ø)`	⬆️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d7be26...ab02160. Read the comment docs.

BenHenning

Thanks @wxyxinyu, this is looking really good! I had a few high-level comments.

I think the idea was to called it collection_contents rather than collection_content (to avoid ambiguity with a state's content property).

Moreover, I think it's preferable to keep 'nodes' a top-level property of Collection. The collection_contents structure is only needed for the CollectionModel, this way we don't repeat 'collection' even inside the Collection object or its dicts. @seanlip do you have any input here?

BenHenning · 2017-03-17T04:42:37Z

core/domain/collection_domain.py

-# This takes additional 'from_version' and 'to_version' parameters for logging.
-CMD_MIGRATE_SCHEMA_TO_LATEST_VERSION = 'migrate_schema_to_latest_version'
+# This takes additional 'version' parameter for logging.
+CMD_MIGRATE_SCHEMA = 'migrate_schema'


This can't change, since we could have 'migrate_schema' currently stored in the database. Why change it?

See my comment above -- the collection migration structure is somewhat different from explorations, in that it performs any necessary migrations on reading the collection from the model, and not on save like in explorations. So, it doesn't make sense to have a 'from_version' and 'to_version'. Furthermore, I believe there have currently not been any migrations done (since this migration job is new), so changing it is ok.

Hmm, not having done any migrations up to now is a good point and we can change the string because of it. However, I don't know if I understand what you mean by not needing a from/to version. Isn't that still useful if we wanted to go back and see which versions were migrated as part of the job?

changed it back

BenHenning · 2017-03-17T04:42:53Z

core/domain/collection_domain.py

-            self.from_version = change_dict['from_version']
-            self.to_version = change_dict['to_version']
+        elif self.cmd == CMD_MIGRATE_SCHEMA:
+            self.version = change_dict['version']


Why change this?

see comment above.

changed it back

BenHenning · 2017-03-17T04:50:24Z

core/domain/collection_domain_test.py

+    Collection domain object class.
+    """
+
+    def test_correct_collection_content_schema_conversion_methods_exist(self):


These are good tests, but it's probably also worth adding a new test for each specific migration function (both for the collection content schema and the full collection dict migration).

What tests are you referring to here? I believe I already have a test which tests that nodes gets migrated to collection_contents.

BenHenning · 2017-03-17T04:52:51Z

core/domain/collection_jobs_one_off.py

+    models.NAMES.base_model, models.NAMES.collection])
+
+
+class CollectionMigrationJobManager(jobs.BaseMapReduceJobManager):


This should just be CollectionMigrationJob.

BenHenning · 2017-03-17T04:54:02Z

core/domain/collection_jobs_one_off.py

+            collection.validate()
+        except Exception as e:
+            logging.error(
+                'Collection %s failed non-strict validation: %s' %


I don't believe collections have strict validation. :) Please update the error string.

Just failed validation is specific enough, no need to clarify with strict versus non-strict.

BenHenning · 2017-03-17T05:06:34Z

core/domain/collection_jobs_one_off_test.py

+        nodes to collection_content if collection_content is empty.
+        """
+        # Create an exploration to put in the collection.
+        self.save_new_valid_exploration(


Do we need a valid exploration for this?

I guess not.

BenHenning · 2017-03-17T05:11:23Z

core/domain/collection_services.py

+    # If collection_content is empty, attempt to retrieve nodes data from nodes
+    # instead. This is temporary, and intended to not break backwards
+    # compatibility before the migration job is run.
+    # TODO(wxy): Remove this after collection migration is completed.


We can't really remove this, can we? Oppia can be used by other organizations that might have older collections before receiving this version, so they could effectively skip the releases that have this code if they are slow to update.

We can, actually, if it's dead code. They could upgrade piecemeal (to release X, first, do the migration, and then upgrade again to release Y). Similar to answer migration.

I'm not sure -- reading from nodes seems to be a hacky thing that isn't really supposed to happen, so I think it's better not to have the code in the codebase. But Ben makes a good point about backwards compatibility :/

BenHenning · 2017-03-17T05:18:05Z

core/domain/collection_services_test.py

+        rights_manager.publish_collection(self.owner_id, self.COLLECTION_ID)
+
+        # This should not give an error.
+        collection_services.update_collection(


Can we also validate that the commit actually happened?

BenHenning · 2017-03-17T05:18:59Z

core/storage/collection/gae_models.py

@@ -63,6 +63,11 @@ class CollectionModel(base_models.VersionedModel):
    # A dict representing all explorations belonging to this collection.
    nodes = ndb.JsonProperty(default={}, indexed=False)


Please add a comment indicating this is now deprecated.

@wxyxinyu see e.g. skill_tags in exploration/gae_models.py for an example.

BenHenning · 2017-03-17T05:19:40Z

core/templates/dev/head/domain/collection/CollectionObjectFactory.js

@@ -33,9 +33,10 @@ oppia.factory('CollectionObjectFactory', [
      // This map acts as a fast way of looking up a collection node for a given
      // exploration ID.
      this._explorationIdToNodeIndexMap = {};
-      for (var i = 0; i < collectionBackendObject.nodes.length; i++) {
+      var collectionContent = collectionBackendObject.collection_content;


Again, prefer not to represent collection_contents in the actual structure of the collection (frontend or backend), just as part of the model. I think this will simplify your PR a lot.

seanlip · 2017-03-17T05:43:10Z

Moreover, I think it's preferable to keep 'nodes' a top-level property of Collection. The collection_contents structure is only needed for the CollectionModel, this way we don't repeat 'collection' even inside the Collection object or its dicts. @seanlip do you have any input here?

@BenHenning, sorry -- are you only talking about the frontend here, or do you mean the domain object in the backend as well?

In both cases, I don't think I mind either way -- one way keeps parity with the storage model, the other way might be easier to understand. I lean mildly towards the latter for understandability, but would defer to @wxyxinyu's thoughts, too. It's the storage layer we really need to get right.

BenHenning

@seanlip I was referring to changing structure in both the backend and frontend domain objects. The need for the contents dict is generally confusing and specific to the model and migration pipeline, whereas from a collection domain point of view the structure requirements are different and can be simpler. I think what Xinyu has right now is really clean and looks good.

@wxyxinyu thanks! This really does look great. I had a few more comments. Let's not worry about the backward compatibility bits for now; Sean and I talked about this a bit at the meeting last night and we're not yet sold on a plan. Let's discuss it more during next Monday's meeting.

Regarding the tests, I was referring to having specific tests to verify the migration from v1 to v2 is correct, v2 to v3, etc. We do this for explorations, but it's more for general yaml conversion. I think taking precedence from that is fine, but we ought to have specific tests for the conversion behavior itself (not just verifying the migration job works or that the migration methods are present).

BenHenning · 2017-03-22T00:46:32Z

core/domain/collection_domain.py

-# This takes additional 'from_version' and 'to_version' parameters for logging.
-CMD_MIGRATE_SCHEMA_TO_LATEST_VERSION = 'migrate_schema_to_latest_version'
+# This takes additional 'version' parameter for logging.
+CMD_MIGRATE_SCHEMA = 'migrate_schema'


Hmm, not having done any migrations up to now is a good point and we can change the string because of it. However, I don't know if I understand what you mean by not needing a from/to version. Isn't that still useful if we wanted to go back and see which versions were migrated as part of the job?

BenHenning · 2017-03-22T00:50:40Z

core/domain/collection_domain.py

+        Note that the versioned_collection_contents being passed in is modified
+        in-place.
+        """
+        versioned_collection_contents['schema_version'] = (


Should we make sure that current_version + 1 doesn't exceed the current schema version defined in feconf? Maybe worth raising an exception if that happens.

BenHenning · 2017-03-22T00:53:28Z

core/domain/collection_jobs_one_off.py

+    @staticmethod
+    def map(item):
+        if item.deleted:
+            return


It's just so that whoever is running this job has a good idea of overall usage. We actually should also yield the successful case, so that in all we would see the following output from running the job:

number of collections not migrated because they are deleted

number of collections successfully migrated

each error message encountered when validating collections prior to migration

The exploration migration job ought to also be updated, but that can be done passively (not even worth tracking a bug for it). This is a nice-to-have that we might as well do with the introduction of this job.

BenHenning · 2017-03-22T00:54:05Z

core/domain/collection_jobs_one_off.py

+        try:
+            collection.validate()
+        except Exception as e:
+            logging.error(


Deleted collections aren't an error, but showing the admin the general behavior of the job is really useful. Please see my earlier comment about what this job ought to output for effectiveness.

BenHenning · 2017-03-22T00:57:41Z

core/domain/collection_jobs_one_off_test.py

+            updated_collection.schema_version,
+            feconf.CURRENT_COLLECTION_SCHEMA_VERSION)
+        after_converted_yaml = updated_collection.to_yaml()
+        self.assertEqual(after_converted_yaml, yaml_before_migration)


Ah, yes you are right. I was thinking that collection dicts included the version of the collection itself (not the schema version), but they apparently don't so checking dict is basically the same as yaml (except yaml doesn't include the collection ID, but that's moot).

BenHenning · 2017-03-22T01:00:54Z

core/domain/collection_services.py

+    # instead. This is temporary, and intended to not break backwards
+    # compatibility before the migration job is run.
+    # TODO(wxy): Remove this after collection migration is completed.
+    if versioned_collection_contents['collection_contents'] == {}:


More pythonic:

if not versioned_collection_contents['collection_contents']:

BenHenning · 2017-03-22T01:03:01Z

core/domain/collection_services.py

@@ -134,23 +135,35 @@ def get_collection_from_model(collection_model, run_conversion=True):
    """

    # Ensure the original collection model does not get altered.
-    versioned_collection = {
+    versioned_collection_contents = {


Maybe we could just do:

versioned_collection_contents = copy.deepcopy(collection_model.collection_contents) versioned_collection_contents['schema_version'] = collection_model.schema_version

It's a bit awkward to have versioned_collection_contents['collection_contents'], though I'm a bit ambivalent and am okay with it staying as is (since it has parity with explorations). It just seems highly unlikely we would ever have a key conflict in the contents dict with schema_version and have an opportunity to clean this up, whereas with exploration states it's more complicated since key names are user provided.

I think this is easier because we eventually need to copy versioned_collection_contents['collection_contents'] back into the model.

BenHenning · 2017-03-22T01:08:51Z

core/domain/collection_jobs_one_off.py

+            collection.validate()
+        except Exception as e:
+            logging.error(
+                'Collection %s failed non-strict validation: %s' %


Just failed validation is specific enough, no need to clarify with strict versus non-strict.

BenHenning · 2017-03-22T01:13:32Z

core/domain/collection_jobs_one_off_test.py

@@ -0,0 +1,152 @@
+# coding: utf-8
+#
+# Copyright 201/c7 The Oppia Authors. All Rights Reserved.


BenHenning · 2017-03-22T01:14:26Z

core/domain/collection_jobs_one_off_test.py

+        # job.
+        collection = collection_domain.Collection.create_default_collection(
+            self.COLLECTION_ID, 'A title', 'A Category', 'An Objective')
+        collection_services.save_new_collection(self.albert_id, collection)


Won't this be the newest schema version? See test_utils for adding a method which creates an early schema version of a collection, similar to what we have inside of test_utils for explorations. Happy to chat over Hangouts if you have any questions about this.

I believe this is actually supposed to be the latest version

Ah, yes you're definitely right.

seanlip · 2017-03-22T01:22:42Z

Just confirming that @BenHenning's comments on frontend/backend structure SGTM. (Thanks for clarifying, Ben!)

BenHenning

Thanks Xinyu. I had two small comments, but I'm still hoping to see:

Regarding the tests, I was referring to having specific tests to verify the migration from v1 to v2 is correct, v2 to v3, etc. We do this for explorations, but it's more for general yaml conversion. I think taking precedence from that is fine, but we ought to have specific tests for the conversion behavior itself (not just verifying the migration job works or that the migration methods are present).

Once we have more granular tests for the individual parts of the migration process, then this PR will LGTM.

BenHenning · 2017-03-24T02:57:57Z

core/domain/collection_domain.py

@@ -397,6 +398,13 @@ def update_collection_contents_from_model(
        Note that the versioned_collection_contents being passed in is modified
        in-place.
        """
+        if (versioned_collection_contents['schema_version'] >


Shouldn't current_version + 1 be checked?

BenHenning · 2017-03-24T02:59:22Z

core/domain/collection_jobs_one_off.py

-                (item.id, e))
+                'Collection %s failed validation: %s' % (item.id, e))
+            yield (CollectionMigrationJob._ERROR_KEY,
+                   'Collection %s failed validation: %s' % (item.id, e))


No need to log the exception since it's being logged with logging.error() and will be available via production logs. We can't always guarantee whether a particular Python value can be properly encoded into Unicode, and in the event this exception can't it will cause the mapreduce shard to fail (and a lot of errors to be outputted to the logs).

BenHenning · 2017-03-24T03:02:49Z

core/domain/collection_jobs_one_off_test.py

+        # job.
+        collection = collection_domain.Collection.create_default_collection(
+            self.COLLECTION_ID, 'A title', 'A Category', 'An Objective')
+        collection_services.save_new_collection(self.albert_id, collection)


Ah, yes you're definitely right.

wxyxinyu · 2017-03-26T15:56:22Z

I'm not quite sure what tests you're referring to. In exp_domain_test, yaml forms of explorations are loaded from each old version. This is already present in collection_domain_test. Are there any other tests?

BenHenning

Thanks @wxyxinyu!

wxyxinyu added 5 commits March 15, 2017 17:33

create collection_content

4f2f62b

Add collection migration job

7d8b70f

Fix some bugs

400ae6f

fix lint errors

0e9c4c5

fix frontend tests

814ee6f

wxyxinyu assigned BenHenning Mar 16, 2017

wxyxinyu requested review from seanlip and BenHenning March 16, 2017 05:53

seanlip removed their request for review March 16, 2017 09:49

Add backend test checking that schema migration methods exist

4a8ac38

wxyxinyu added 2 commits March 16, 2017 11:49

Fix test

85f8939

fix some typos

c5a9f09

BenHenning requested changes Mar 17, 2017

View reviewed changes

wxyxinyu added 2 commits March 18, 2017 12:17

respond to review comments

a5310d0

fix backend test

2afa041

BenHenning requested changes Mar 22, 2017

View reviewed changes

BenHenning assigned wxyxinyu and unassigned BenHenning Mar 23, 2017

wxyxinyu added 4 commits March 23, 2017 00:15

respond to review comments

ec702cb

yield job status in collection migration job

dd6df95

Edit error message

697eed0

fix test

91fd334

BenHenning requested changes Mar 24, 2017

View reviewed changes

respond to review comments

35e801b

wxyxinyu removed their assignment Mar 28, 2017

wxyxinyu assigned BenHenning Mar 28, 2017

BenHenning approved these changes Mar 28, 2017

View reviewed changes

put error message back

ab02160

seanlip added the PR: LGTM label Mar 28, 2017

seanlip merged commit 8530045 into develop Mar 28, 2017

seanlip deleted the collection-contents-migration branch March 28, 2017 08:20

wxyxinyu mentioned this pull request Apr 14, 2017

Migrate CollectionModel.nodes to CollectionModel.collection_dict #3047

Closed

		models.NAMES.base_model, models.NAMES.collection])


		class CollectionMigrationJobManager(jobs.BaseMapReduceJobManager):

		@@ -63,6 +63,11 @@ class CollectionModel(base_models.VersionedModel):
		# A dict representing all explorations belonging to this collection.
		nodes = ndb.JsonProperty(default={}, indexed=False)

Fix part of #3047: Collection contents migration #3209

Fix part of #3047: Collection contents migration #3209

Conversation

wxyxinyu commented Mar 16, 2017

wxyxinyu commented Mar 16, 2017 • edited Loading

codecov-io commented Mar 16, 2017 • edited Loading

Codecov Report

BenHenning left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanlip commented Mar 17, 2017

BenHenning left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanlip commented Mar 22, 2017

BenHenning left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wxyxinyu commented Mar 26, 2017

BenHenning left a comment

Choose a reason for hiding this comment

wxyxinyu commented Mar 16, 2017 •

edited

Loading

codecov-io commented Mar 16, 2017 •

edited

Loading