Migrate CollectionModel.nodes to CollectionModel.collection_dict #3047
Description
The recent questions work needs to extend the collection schema to include more than just collection nodes. Originally, CollectionModel only stored a list of nodes, similar to the states of an exploration. It was originally assumed that all changes to a collection would be done within CollectionNodes, but that assumption is no longer true. Fortunately, the schema version for collections is generic enough to apply to broader changes to a collection than changes to specific node structures, but there's no way to easily perform these changes using the collection migration framework.
There are a couple of problems with CollectionModel.nodes:
- The nodes are just a list of CollectionNode dicts, despite the property being a dict
- The property is called 'nodes' even though we want to extend its functionality beyond that
We want to effectively change nodes to collection_dict. Preliminary research seems to suggest NDB properties cannot be renamed. This means we need to introduce a new property to replace the old one. There are also consequences to removing an NDB property, so for now we only want to deprecate the old one and introduce a new one. We can eventually remove the property, but that will need to be a follow-up issue and will require a technical leadership decision, as it breaks our backward compatibility when combining certain versions of the app with certain copies of the database.
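To make the intended change of shape concrete, here is a minimal sketch of the stored value before and after. The node fields shown are illustrative assumptions, not the actual CollectionNode schema.

```python
# Hypothetical value currently stored under CollectionModel.nodes:
# a bare list of CollectionNode dicts (the fields here are made up).
old_nodes = [
    {'exploration_id': 'exp_a'},
    {'exploration_id': 'exp_b'},
]

# The same list stored under the new collection_dict property. Wrapping it
# in a dict leaves room for other top-level keys alongside 'nodes' later.
collection_dict = {'nodes': old_nodes}
```

Nothing about the nodes themselves changes; only the container around them does.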
The following work needs to be done:
Task list 1 [DONE except for running job in production]
- CollectionModel needs to be updated to introduce a new collection_dict property with the same type as nodes.
- CollectionModel's retrieval functions need to be updated to copy the value from nodes and save it under collection_dict['nodes'].
- collection_services.get_collection_from_model() and all other places accessing CollectionModel.nodes directly need to be updated to instead refer to collection_dict['nodes']. This includes converting from a Collection to a CollectionModel when saving.
- A mapreduce job needs to be introduced which simply rewrites all versions of all collections (without introducing new versions) to populate the new collection_dict property.
- Tests need to be introduced thoroughly testing the mapreduce job, ensuring each version of a collection correctly has its collection_dict value populated and no new versions are introduced.
- Manual testing should be done locally by checking out the develop branch, populating explorations, switching to the branch with the migration, and ensuring that the site works even without running the job (the copy-on-retrieval change in the second task is meant to allow this).
- The deployment documentation needs to be updated to mention this special, one-off migration job.
- Add a comment in CollectionModel to deprecate usage of the nodes property.
- File an issue to finish the tasks listed in task list 2 below.
- Release these changes all at once (in one PR) and in a single release.
- Run the migration job in production.
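The read-time fallback and the one-off migration described in this task list could look roughly like the sketch below. It uses a plain Python stand-in rather than the real NDB model or the mapreduce framework; FakeCollectionSnapshot, get_collection_dict and migrate_snapshots are hypothetical names introduced only for illustration.

```python
import copy


class FakeCollectionSnapshot(object):
    """Stand-in for one stored version of a collection (not the NDB class)."""

    def __init__(self, nodes, collection_dict=None):
        self.nodes = nodes
        self.collection_dict = (
            collection_dict if collection_dict is not None else {})


def get_collection_dict(snapshot):
    """Read path: use collection_dict, falling back to the deprecated nodes.

    This fallback is what lets the site work before the job has been run.
    """
    if 'nodes' in snapshot.collection_dict:
        return snapshot.collection_dict
    # Keep any other keys already present and fill in the missing node list.
    result = dict(snapshot.collection_dict)
    result['nodes'] = copy.deepcopy(snapshot.nodes)
    return result


def migrate_snapshots(snapshots):
    """One-off job: populate collection_dict in place on every version.

    Returns the number of versions rewritten. Nothing is appended to
    `snapshots`, so the number of versions stays the same.
    """
    rewritten = 0
    for snapshot in snapshots:
        if 'nodes' not in snapshot.collection_dict:
            snapshot.collection_dict = get_collection_dict(snapshot)
            rewritten += 1
    return rewritten
```

Running migrate_snapshots a second time rewrites nothing, which is the property the job's tests would want to check alongside the unchanged version count.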
Task list 2
- Write a new job which maps through all collection versions and clears each CollectionModel.nodes property by setting it to an empty dict.
- Add tests to verify that the cleanup job introduces no new versions and that CollectionModel.nodes is properly cleared.
- File an issue to complete the tasks listed in task list 3 below.
- Release these changes all at once (in one PR) and in a single release.
- Run the cleanup job in production.
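A rough sketch of the cleanup step follows, again using plain dicts in place of stored model versions. The function name is hypothetical, and skipping versions whose collection_dict has not been populated yet is an added safety assumption, not something this issue specifies.

```python
def clear_deprecated_nodes(snapshots):
    """Set the deprecated 'nodes' value to an empty dict on each version.

    Each snapshot is a plain dict with 'nodes' and 'collection_dict' keys,
    standing in for one stored version. Returns the number of versions
    cleared. No snapshot is added or removed.
    """
    cleared = 0
    for snapshot in snapshots:
        # Assumption: never clear a version that has not been migrated,
        # since that would lose the only copy of its nodes.
        if 'nodes' not in snapshot.get('collection_dict', {}):
            continue
        if snapshot['nodes'] != {}:
            snapshot['nodes'] = {}
            cleared += 1
    return cleared
```

A test for the real job would assert the same two things as this sketch's return value and side effects: the version count is unchanged and every migrated version ends up with an empty nodes value.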
Task list 3
- After the migration job has been run in production, remove the migration job
- After the cleanup job has been run in production, remove the cleanup job