Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553

seanlip · 2016-10-03T05:43:59Z

There is a job, exp_jobs_one_off.IndexAllExplorationsJobManager, that iterates through all the explorations and adds them to the search index. This should really be replaced with a job called activity_jobs_one_off.IndexAllActivitiesJobManager that instead iterates through all explorations and collections, and adds them to the search index.

This may entail a slight refactor (e.g. moving all search-index-related stuff to a new core.domain.search_services file so that methods like search_services.index_activities() can be called).

rajan-garg · 2017-01-29T06:45:53Z

@seanlip How to test this? and can you explain the refactoring part because i could not find the method search_services.index_activities()

seanlip · 2017-01-29T06:50:40Z

Go to /admin, and open the jobs panel; you should see a list of one-off jobs that you can run. I think there's functionality on localhost:8000 for manipulating the search index, so you could create a couple explorations, a couple collections, clear the search index, and see if running the job repopulates it. This can (and should) also be written as a backend test.

The idea with the refactoring is to create search_services by moving the functionality from exp_services to it. That's why the function doesn't exist yet. I should mention, though, that refactoring these things can be trickier than it looks at the outset (you'll need to make sure that there are no circular loops and that the backend architecture is still "layered" and coherent), so it might be better to take this up later if you feel like you need more experience with the codebase.

seanlip · 2017-06-04T07:51:55Z

Hi @LanJosh, would you be interested in taking this one up?

kevinlee12 · 2017-06-14T05:08:44Z

Hi @LanJosh, updates on this issue? thanks!

LanJosh · 2017-06-15T01:39:58Z

Hi @kevinlee12, sorry for taking so long. Currently trying to wrap my head around what's happening in the code right now, haven't gotten around to doing much of the refactoring yet!

kevinlee12 · 2017-06-15T03:18:57Z

okay, let me know if you have any questions!

LanJosh · 2017-07-04T15:35:06Z

Hi @kevinlee12 I'm a bit confused on indexing the activities. The entity classes to map over I think should be the CollectionModel and the ExplorationModel, but there is an ActivityModel that I can't see a relationship to.

seanlip · 2017-07-04T17:24:15Z

@LanJosh -- sorry, I'm confused. Where are you seeing ActivityModel? I can't find it in the codebase.

FYI, "activity" is a generic term that we sometimes use for referring to either an exploration or a collection.

seanlip · 2017-07-04T17:25:15Z

Ah -- if you're referring to the use of "all activities" in this issue, it just means "all explorations and collections".

LanJosh · 2017-07-04T22:06:38Z

I was referring to the ActivitiesReferenceModel in core.storage.activity, but that makes sense to just use collection and exploration model in the case of all activities for this issue.

seanlip · 2017-07-04T22:39:47Z

Ah, I see. I think that just stores an (activity_type, activity_id) reference that's used by code elsewhere -- I don't think you need to touch it for this issue.

LanJosh · 2017-07-09T04:00:56Z

For this issue I think there should be services from collection_services pulled into the new search_services file as well, in order to index the explorations and collections in the same job manager. Is there a simple way to differentiate between a collection and an exploration by their ID in order to select the correct indexing behavior? Or am I on the wrong path with merging those two services?

seanlip · 2017-07-10T06:09:26Z

I think you're right that some stuff needs to be pulled in from collection_services. There is no way to differentiate based on ID alone, you need to know the activity type ('exploration' or 'collection') as well.

anthkris · 2017-08-01T17:22:54Z

@LanJosh Any progress on this?

LanJosh · 2017-08-10T01:43:26Z

@anthkris I got a bit stuck on how to differentiate between the activity types and lost the thread. I had thought by returning exp_models.ExplorationModel and collection_models.CollectionModel in the EntityClassesToMapOver function I would have some information in the map function to differentiate, but it's still mysterious to me.

anthkris · 2017-08-10T02:37:16Z

@seanlip Can you offer any advice to @LanJosh ? Or direct us to who best to ask?

seanlip · 2017-08-10T06:35:55Z

Hi @LanJosh -- I think you're on the right track, actually. If you look at core/domain/stats_jobs_one_off.py, you'll see that the first job in that file uses isinstance to determine the type of the item. Does that help?

LanJosh · 2017-08-13T15:41:56Z

Yeah, I think I'll be able to use that. Thanks!

LanJosh · 2017-08-15T18:33:24Z

Had a few more questions if someone can answer them. Testing manually, the code seems to work, after deleting the objects in the full text search tab at localhost:8000, running the job make them appear again. I'm not having much luck with the unit tests though. After running the backend tests, it notes all test have passed but several warnings appear saying some tests were not run, probably due to an import error. Looking into the imports, it seems like it originates from a file I touched, exp_services, but the failed import is google.appengine.datastore, which is strange to me because gae seems to start locally. I tried resetting my virtualenv and redownloading gae using the prerequisite scripts but haven't had any luck. Any idea what could be going on?

seanlip · 2017-08-15T19:48:59Z

Hi @LanJosh, could you post a "[not for merge] work-in-progress" PR showing your changes, and the full error trace? Hard to tell what's going on just from the description.

Thanks!

LanJosh · 2017-08-15T23:51:53Z

PR: #3755
After wrestling with the linter a little, I think the issue is actually that there is a circular import between exp_services/collection_services and the new search_services. It may solve the problem to not have a new search services file and let the job manager differentiate between calling exp_services or collection_services, but I'm not sure if that the right direction for the refactor.

One example of the error trace

WARNING: FAILED TO RUN TESTS.

This is most likely due to an import error.

SUCCESS core.domain.user_query_services_test: 1 tests (2.5 secs)
SUCCESS core.domain.user_services_test: 39 tests (171.3 secs)
SUCCESS core.domain.value_generators_domain_test: 1 tests (2.4 secs)
SUCCESS core.domain.visualization_registry_test: 2 tests (4.1 secs)
= create_test_suites(parsed_args.test_target)
File "/Users/joshualan/Projects/opensource/oppia/core/tests/gae_suite.py", line 76, in create_test_suites
if test_target else [loader.discover(
File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 100, in loadTestsFromName
parent, obj = obj, getattr(obj, part)
AttributeError: 'module' object has no attribute 'gae_datastore_services_test'

seanlip · 2017-08-16T00:09:22Z

Thanks for posting the code! The refactor is looking great, actually -- I think it's nice to separate the search stuff from the other stuff, so thank you very much for doing it. It makes things a lot cleaner.

Also, I agree with your diagnosis. It looks like search_services is trying to get the exploration summary from exp_services, and exp_services is asking search_services to index stuff. How about we do something like this instead (for explorations; collections is similar):

call index_explorations_given_ids() with a list of exploration summaries instead. That way, all the data is already there in the summary objects. (The function name should probably be changed, maybe to index_exploration_summaries().) This removes the dependency of search_services on exp_services.
define some of the functions in search_services as helper methods on the ExplorationSummary object instead, then we can just call them in-place since we'll be given the summary objects themselves to index in search_services.
if something external wants to call index_exploration_given_ids() then route them through exp_services, have that compile the summaries, and then that function should call search_services.index_exploration_summaries().

What do you think?

LanJosh · 2017-08-16T14:42:56Z

I think that should work well, but it seems some fields from the exploration are missing from the exploration summary -- the blurb and author notes. I'm unsure of how costly it is to add those fields.

seanlip · 2017-08-16T16:49:28Z

Iirc we don't actually have those fields populated at the moment anyway, they are all blank. So it should be fine to remove them from the "search doc"!

…

On Wednesday, August 16, 2017, Joshua Lan ***@***.***> wrote: I think that should work well, but it seems some fields from the exploration are missing from the exploration summary -- the blurb and author notes. I'm unsure of how costly it is to add those fields. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2553 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKFeyulNMWpCpOT84rIf4c0vUynib89bks5sYv_ygaJpZM4KMQc_> .

LanJosh · 2017-08-21T01:16:26Z

Hi Sean, I updated the PR as I'm having another difficulty. I started writing the test cases, but have to do some weird behavior to get them to pass which makes me think the test may be flaky in the future. After adding a rating using rating service, occasionally I have retrieve the summary again to get the updated ratings. I'm not sure why I have to only do this occasionally. Lines 65 and 98 in search_services_tests.py are probably the most relevant ones. I tried adding a wait instead to see if it just took some time to update, but that did not work. Looking at how the tests were previously implemented, it doesn't seem like it was necessary to retrieve the exploration domain object again, but they did retrieve the exploration summary domain object before each assertion.

seanlip · 2017-08-21T02:20:27Z

Hi @LanJosh! Yes, retrieving the summary again each time is expected behavior. Think about it like this: the domain object is just a plain Python object which has no expectations of persistence -- unlike the stuff in the storage model layer. So if you update the rating, the datastore object is updated, but the domain object you've already retrieved isn't -- it's just a snapshot of the datastore object at the time it was originally retrieved. So, you'll need to retrieve it again each time. Does that make sense?

…dashboard to an "index all activities" job

…t-m2 * upstream/develop: (27 commits) Fixes 3687: batch call for exploration and exploration rights object together. (oppia#3815) Fix part oppia#3400 Objective field directive (oppia#3740) Fix oppia#3800 Dropdown menu of the ABOUT in navigation bar (oppia#3855) Fix oppia#3295 Save draft change list to local storage. (oppia#3584) Disable learner dashboard feedback updates send button when no text entered. (oppia#3860) Disable save button while uploading audio to server; add Spanish support to audio languages; add audio loading message to learner view (oppia#3856) Fix oppia#3834: Introduce backend event models to record statistics (oppia#3841) Fix oppia#3852: Fix failing e2e test in develop (oppia#3853) Fix oppia#3826: Move validatorsService from app.js to ValidatorsService.js (oppia#3847) Update the max RAND limit for ID generation to fit within 32 bits. (oppia#3843) Fix oppia#3404: Add default placeholder text for mobile devices (oppia#3845) Fix oppia#2553: Change the "index all explorations" job in the admin dashb… (oppia#3831) Add I18N to give up button; change tooltip text. (oppia#3844) Fix oppia#3721: Update the release_info script to use LCA and the most recent release tag in order to compile the list of changes, rather than relying on git describe. (oppia#3838) show_warning_only_on_button_click (oppia#3833) Deprecate splash page experiment (oppia#3829) Fixes for end to end working between Oppia and Oppia-ml. (oppia#3824) Fix 'Empty path passed in method' error on the collection page. (oppia#3827) Learner dashboard 3.2 (oppia#3759) 📝 Fix oppia#3789 :Space out the profile drop down icons (oppia#3789). (oppia#3817) ...

seanlip added type: feature (minor) needs design doc backend labels Oct 3, 2016

seanlip modified the milestone: Recommended projects Jan 25, 2017

seanlip added owner: @seanlip and removed team: search labels Jan 29, 2017

seanlip assigned LanJosh Jun 4, 2017

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 15, 2017

Refactor index all explorations oppia#2553

4786603

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 15, 2017

Refactor index all explorations oppia#2553

5d17fb1

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 15, 2017

Refactor index all explorations oppia#2553

01b2a09

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 30, 2017

Fix oppia#2553: Change the "index all explorations" job in the admin …

825b174

…dashboard to an "index all activities" job

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 30, 2017

Fix oppia#2553: Change the "index all explorations" job in the admin …

9c57e8d

…dashboard to an "index all activities" job

LanJosh mentioned this issue Aug 30, 2017

Fix #2553: Change the "index all explorations" job in the admin dashb… #3831

Merged

tjiang11 closed this as completed in dd4d7c3 Sep 5, 2017

seanlip removed this from the Recommended projects milestone Sep 18, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553

Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553

seanlip commented Oct 3, 2016

rajan-garg commented Jan 29, 2017

seanlip commented Jan 29, 2017

seanlip commented Jun 4, 2017

kevinlee12 commented Jun 14, 2017

LanJosh commented Jun 15, 2017

kevinlee12 commented Jun 15, 2017

LanJosh commented Jul 4, 2017

seanlip commented Jul 4, 2017

seanlip commented Jul 4, 2017

LanJosh commented Jul 4, 2017

seanlip commented Jul 4, 2017

LanJosh commented Jul 9, 2017

seanlip commented Jul 10, 2017

anthkris commented Aug 1, 2017

LanJosh commented Aug 10, 2017

anthkris commented Aug 10, 2017

seanlip commented Aug 10, 2017

LanJosh commented Aug 13, 2017

LanJosh commented Aug 15, 2017

seanlip commented Aug 15, 2017

LanJosh commented Aug 15, 2017

seanlip commented Aug 16, 2017

LanJosh commented Aug 16, 2017

seanlip commented Aug 16, 2017 via email

LanJosh commented Aug 21, 2017

seanlip commented Aug 21, 2017

Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553

Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553

Comments

seanlip commented Oct 3, 2016

rajan-garg commented Jan 29, 2017

seanlip commented Jan 29, 2017

seanlip commented Jun 4, 2017

kevinlee12 commented Jun 14, 2017

LanJosh commented Jun 15, 2017

kevinlee12 commented Jun 15, 2017

LanJosh commented Jul 4, 2017

seanlip commented Jul 4, 2017

seanlip commented Jul 4, 2017

LanJosh commented Jul 4, 2017

seanlip commented Jul 4, 2017

LanJosh commented Jul 9, 2017

seanlip commented Jul 10, 2017

anthkris commented Aug 1, 2017

LanJosh commented Aug 10, 2017

anthkris commented Aug 10, 2017

seanlip commented Aug 10, 2017

LanJosh commented Aug 13, 2017

LanJosh commented Aug 15, 2017

seanlip commented Aug 15, 2017

LanJosh commented Aug 15, 2017

One example of the error trace

seanlip commented Aug 16, 2017

LanJosh commented Aug 16, 2017

seanlip commented Aug 16, 2017 via email

LanJosh commented Aug 21, 2017

seanlip commented Aug 21, 2017