
Migrate CollectionModel.nodes to CollectionModel.collection_dict #3047

Closed
@BenHenning

Description

The recent questions work needs to extend the collection schema to include more than just collection nodes. Originally, CollectionModel only stored a list of nodes, similar to the states of an exploration. It was assumed that all changes to a collection would be made within CollectionNodes, but that assumption no longer holds. Fortunately, the collection schema version is generic enough to cover broader changes to a collection than changes to specific node structures, but there is currently no easy way to perform such changes using the collection migration framework.

There are a couple of problems with CollectionModel.nodes:

  1. The nodes are just a list of CollectionNode dicts, despite the property being a dict
  2. The property is called 'nodes' even though we want to extend its functionality beyond that

We effectively want to change nodes to collection_dict. Preliminary research suggests that NDB properties cannot be renamed, which means we need to introduce a new property to replace the old one. Removing an NDB property also has consequences, so for now we only want to deprecate the old property and introduce the new one. We can eventually remove the old property, but that will need to be a follow-up issue and will require a technical leadership decision, as it breaks backward compatibility when combining certain versions of the app with certain copies of the database.
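The deprecate-and-replace approach described above can be sketched as follows. This is only an illustration using a plain Python class as a stand-in for the NDB model: the class name `CollectionModelSketch` and the helper `get_nodes` are hypothetical, and only the property names `nodes` and `collection_dict` come from this issue.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CollectionModelSketch:
    """Hypothetical stand-in for CollectionModel (not the real NDB model)."""

    # DEPRECATED: kept so that previously stored entities still load.
    nodes: List[Dict[str, Any]] = field(default_factory=list)
    # New property that replaces `nodes`; the node list lives under 'nodes'.
    collection_dict: Dict[str, Any] = field(default_factory=dict)


def get_nodes(model: CollectionModelSketch) -> List[Dict[str, Any]]:
    """Returns the node list, preferring the new property when populated."""
    if 'nodes' in model.collection_dict:
        return model.collection_dict['nodes']
    # Fall back to the deprecated property for entities not yet migrated.
    return model.nodes
```

Keeping both properties side by side means an entity written by an older version of the app can still be read, which is the backward-compatibility concern raised above.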

The following work needs to be done:

Task list 1 [DONE except for running job in production]

  1. CollectionModel needs to be updated to introduce a new collection_dict property with the same type as nodes
  2. CollectionModel's retrieval functions need to be updated to copy the value from nodes and save it under collection_dict['nodes']
  3. collection_services.get_collection_from_model() and all other places accessing CollectionModel.nodes directly need to be updated to instead refer to collection_dict['nodes']. This includes converting from a Collection to a CollectionModel when saving.
  4. A mapreduce job needs to be introduced which simply rewrites all versions of all collections (without introducing new versions) to populate the new collection_dict property.
  5. Tests need to be introduced that thoroughly test the mapreduce job, ensuring that each version of a collection has its collection_dict value populated correctly and that no new versions are introduced.
  6. Manual testing should be done locally by checking out the develop branch, populating explorations, switching to the branch with the migration, and ensuring that the site works even without running the job ((2) is meant to allow this).
  7. The deployment documentation needs to be updated to mention this special, one-off migration job.
  8. Add a comment in CollectionModel to deprecate usage of the nodes property
  9. File an issue to finish the tasks listed in task list 2 below.
  10. Release these changes all at once (in one PR) and in a single release.
  11. Run the migration job in production.
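The core of the migration in task list 1 can be sketched as follows, assuming each stored collection version is represented as a plain dict. The function names `populate_collection_dict` and `migrate_all_versions` are hypothetical, and the real job would run over NDB entities in a mapreduce pipeline rather than over an in-memory list.

```python
import copy
from typing import Any, Dict, List


def populate_collection_dict(snapshot: Dict[str, Any]) -> bool:
    """Copies snapshot['nodes'] into snapshot['collection_dict']['nodes'].

    Returns True if the snapshot was changed, False if it was already
    migrated. The same copy step could run at retrieval time (item 2).
    """
    collection_dict = snapshot.setdefault('collection_dict', {})
    if 'nodes' in collection_dict:
        return False
    collection_dict['nodes'] = copy.deepcopy(snapshot.get('nodes', []))
    return True


def migrate_all_versions(snapshots: List[Dict[str, Any]]) -> int:
    """Rewrites every stored version in place; never appends a version."""
    migrated = 0
    for snapshot in snapshots:
        if populate_collection_dict(snapshot):
            migrated += 1
    return migrated
```

Because the copy is skipped when `collection_dict['nodes']` already exists, running the sketch a second time changes nothing, which is the property the tests in item 5 would want to check alongside the unchanged version count.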

Task list 2

  1. Write a new job which maps through all collection versions and clears each CollectionModel.nodes property by setting it to an empty dict
  2. Add tests to verify no new versions are introduced with the cleanup job and that CollectionModel.nodes is properly cleared
  3. File an issue to complete the tasks listed in task list 3 below
  4. Release these changes all at once (in one PR) and in a single release.
  5. Run the migration job in production.
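The cleanup step in task list 2 can be sketched in the same style, again over plain dicts standing in for stored collection versions. The function name `clear_deprecated_nodes` is hypothetical, and the guard that skips versions which have not been migrated yet is an added assumption, not something this issue specifies.

```python
from typing import Any, Dict, List


def clear_deprecated_nodes(snapshots: List[Dict[str, Any]]) -> int:
    """Sets the deprecated 'nodes' value to an empty dict on each version.

    Returns the number of versions that were cleared. No version is added
    or removed.
    """
    cleared = 0
    for snapshot in snapshots:
        if 'nodes' not in snapshot.get('collection_dict', {}):
            # Assumption: skip unmigrated versions so no data is lost.
            continue
        if snapshot.get('nodes') != {}:
            snapshot['nodes'] = {}
            cleared += 1
    return cleared
```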

Task list 3

  1. After the migration job has been run in production, remove the migration job
  2. After the cleanup job has been run in production, remove the cleanup job
