Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553

Closed
seanlip opened this issue Oct 3, 2016 · 26 comments

Comments

@seanlip
Copy link
Member

seanlip commented Oct 3, 2016

There is a job, exp_jobs_one_off.IndexAllExplorationsJobManager, that iterates through all the explorations and adds them to the search index. This should really be replaced with a job called activity_jobs_one_off.IndexAllActivitiesJobManager that instead iterates through all explorations and collections, and adds them to the search index.

This may entail a slight refactor (e.g. moving all search-index-related stuff to a new core.domain.search_services file so that methods like search_services.index_activities() can be called).

@seanlip seanlip modified the milestone: Recommended projects Jan 25, 2017
@rajan-garg
Copy link
Contributor

@seanlip How to test this? and can you explain the refactoring part because i could not find the method search_services.index_activities()

@seanlip
Copy link
Member Author

seanlip commented Jan 29, 2017

Go to /admin, and open the jobs panel; you should see a list of one-off jobs that you can run. I think there's functionality on localhost:8000 for manipulating the search index, so you could create a couple explorations, a couple collections, clear the search index, and see if running the job repopulates it. This can (and should) also be written as a backend test.

The idea with the refactoring is to create search_services by moving the functionality from exp_services to it. That's why the function doesn't exist yet. I should mention, though, that refactoring these things can be trickier than it looks at the outset (you'll need to make sure that there are no circular loops and that the backend architecture is still "layered" and coherent), so it might be better to take this up later if you feel like you need more experience with the codebase.

@seanlip
Copy link
Member Author

seanlip commented Jun 4, 2017

Hi @LanJosh, would you be interested in taking this one up?

@kevinlee12
Copy link
Contributor

Hi @LanJosh, updates on this issue? thanks!

@LanJosh
Copy link
Contributor

LanJosh commented Jun 15, 2017

Hi @kevinlee12, sorry for taking so long. Currently trying to wrap my head around what's happening in the code right now, haven't gotten around to doing much of the refactoring yet!

@kevinlee12
Copy link
Contributor

okay, let me know if you have any questions!

@LanJosh
Copy link
Contributor

LanJosh commented Jul 4, 2017

Hi @kevinlee12 I'm a bit confused on indexing the activities. The entity classes to map over I think should be the CollectionModel and the ExplorationModel, but there is an ActivityModel that I can't see a relationship to.

@seanlip
Copy link
Member Author

seanlip commented Jul 4, 2017

@LanJosh -- sorry, I'm confused. Where are you seeing ActivityModel? I can't find it in the codebase.

FYI, "activity" is a generic term that we sometimes use for referring to either an exploration or a collection.

@seanlip
Copy link
Member Author

seanlip commented Jul 4, 2017

Ah -- if you're referring to the use of "all activities" in this issue, it just means "all explorations and collections".

@LanJosh
Copy link
Contributor

LanJosh commented Jul 4, 2017

I was referring to the ActivitiesReferenceModel in core.storage.activity, but that makes sense to just use collection and exploration model in the case of all activities for this issue.

@seanlip
Copy link
Member Author

seanlip commented Jul 4, 2017

Ah, I see. I think that just stores an (activity_type, activity_id) reference that's used by code elsewhere -- I don't think you need to touch it for this issue.

@LanJosh
Copy link
Contributor

LanJosh commented Jul 9, 2017

For this issue I think there should be services from collection_services pulled into the new search_services file as well, in order to index the explorations and collections in the same job manager. Is there a simple way to differentiate between a collection and an exploration by their ID in order to select the correct indexing behavior? Or am I on the wrong path with merging those two services?

@seanlip
Copy link
Member Author

seanlip commented Jul 10, 2017

I think you're right that some stuff needs to be pulled in from collection_services. There is no way to differentiate based on ID alone, you need to know the activity type ('exploration' or 'collection') as well.

@anthkris
Copy link
Contributor

anthkris commented Aug 1, 2017

@LanJosh Any progress on this?

@LanJosh
Copy link
Contributor

LanJosh commented Aug 10, 2017

@anthkris I got a bit stuck on how to differentiate between the activity types and lost the thread. I had thought by returning exp_models.ExplorationModel and collection_models.CollectionModel in the EntityClassesToMapOver function I would have some information in the map function to differentiate, but it's still mysterious to me.

@anthkris
Copy link
Contributor

@seanlip Can you offer any advice to @LanJosh ? Or direct us to who best to ask?

@seanlip
Copy link
Member Author

seanlip commented Aug 10, 2017

Hi @LanJosh -- I think you're on the right track, actually. If you look at core/domain/stats_jobs_one_off.py, you'll see that the first job in that file uses isinstance to determine the type of the item. Does that help?

@LanJosh
Copy link
Contributor

LanJosh commented Aug 13, 2017

Yeah, I think I'll be able to use that. Thanks!

@LanJosh
Copy link
Contributor

LanJosh commented Aug 15, 2017

Had a few more questions if someone can answer them. Testing manually, the code seems to work, after deleting the objects in the full text search tab at localhost:8000, running the job make them appear again. I'm not having much luck with the unit tests though. After running the backend tests, it notes all test have passed but several warnings appear saying some tests were not run, probably due to an import error. Looking into the imports, it seems like it originates from a file I touched, exp_services, but the failed import is google.appengine.datastore, which is strange to me because gae seems to start locally. I tried resetting my virtualenv and redownloading gae using the prerequisite scripts but haven't had any luck. Any idea what could be going on?

@seanlip
Copy link
Member Author

seanlip commented Aug 15, 2017

Hi @LanJosh, could you post a "[not for merge] work-in-progress" PR showing your changes, and the full error trace? Hard to tell what's going on just from the description.

Thanks!

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 15, 2017
LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 15, 2017
LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 15, 2017
@LanJosh
Copy link
Contributor

LanJosh commented Aug 15, 2017

PR: #3755
After wrestling with the linter a little, I think the issue is actually that there is a circular import between exp_services/collection_services and the new search_services. It may solve the problem to not have a new search services file and let the job manager differentiate between calling exp_services or collection_services, but I'm not sure if that the right direction for the refactor.

One example of the error trace

WARNING: FAILED TO RUN TESTS.

This is most likely due to an import error.

SUCCESS core.domain.user_query_services_test: 1 tests (2.5 secs)
SUCCESS core.domain.user_services_test: 39 tests (171.3 secs)
SUCCESS core.domain.value_generators_domain_test: 1 tests (2.4 secs)
SUCCESS core.domain.visualization_registry_test: 2 tests (4.1 secs)
= create_test_suites(parsed_args.test_target)
File "/Users/joshualan/Projects/opensource/oppia/core/tests/gae_suite.py", line 76, in create_test_suites
if test_target else [loader.discover(
File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 100, in loadTestsFromName
parent, obj = obj, getattr(obj, part)
AttributeError: 'module' object has no attribute 'gae_datastore_services_test'

@seanlip
Copy link
Member Author

seanlip commented Aug 16, 2017

Thanks for posting the code! The refactor is looking great, actually -- I think it's nice to separate the search stuff from the other stuff, so thank you very much for doing it. It makes things a lot cleaner.

Also, I agree with your diagnosis. It looks like search_services is trying to get the exploration summary from exp_services, and exp_services is asking search_services to index stuff. How about we do something like this instead (for explorations; collections is similar):

  • call index_explorations_given_ids() with a list of exploration summaries instead. That way, all the data is already there in the summary objects. (The function name should probably be changed, maybe to index_exploration_summaries().) This removes the dependency of search_services on exp_services.
  • define some of the functions in search_services as helper methods on the ExplorationSummary object instead, then we can just call them in-place since we'll be given the summary objects themselves to index in search_services.
  • if something external wants to call index_exploration_given_ids() then route them through exp_services, have that compile the summaries, and then that function should call search_services.index_exploration_summaries().

What do you think?

@LanJosh
Copy link
Contributor

LanJosh commented Aug 16, 2017

I think that should work well, but it seems some fields from the exploration are missing from the exploration summary -- the blurb and author notes. I'm unsure of how costly it is to add those fields.

@seanlip
Copy link
Member Author

seanlip commented Aug 16, 2017 via email

@LanJosh
Copy link
Contributor

LanJosh commented Aug 21, 2017

Hi Sean, I updated the PR as I'm having another difficulty. I started writing the test cases, but have to do some weird behavior to get them to pass which makes me think the test may be flaky in the future. After adding a rating using rating service, occasionally I have retrieve the summary again to get the updated ratings. I'm not sure why I have to only do this occasionally. Lines 65 and 98 in search_services_tests.py are probably the most relevant ones. I tried adding a wait instead to see if it just took some time to update, but that did not work. Looking at how the tests were previously implemented, it doesn't seem like it was necessary to retrieve the exploration domain object again, but they did retrieve the exploration summary domain object before each assertion.

@seanlip
Copy link
Member Author

seanlip commented Aug 21, 2017

Hi @LanJosh! Yes, retrieving the summary again each time is expected behavior. Think about it like this: the domain object is just a plain Python object which has no expectations of persistence -- unlike the stuff in the storage model layer. So if you update the rating, the datastore object is updated, but the domain object you've already retrieved isn't -- it's just a snapshot of the datastore object at the time it was originally retrieved. So, you'll need to retrieve it again each time. Does that make sense?

LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 30, 2017
…dashboard to an "index all activities" job
LanJosh added a commit to LanJosh/oppia that referenced this issue Aug 30, 2017
…dashboard to an "index all activities" job
giritheja added a commit to giritheja/oppia that referenced this issue Sep 17, 2017
…t-m2

* upstream/develop: (27 commits)
  Fixes 3687: batch call for exploration and exploration rights object together. (oppia#3815)
  Fix part oppia#3400 Objective field directive (oppia#3740)
  Fix oppia#3800 Dropdown menu of the ABOUT in navigation bar (oppia#3855)
  Fix oppia#3295 Save draft change list to local storage. (oppia#3584)
  Disable learner dashboard feedback updates send button when no text entered. (oppia#3860)
  Disable save button while uploading audio to server; add Spanish support to audio languages; add audio loading message to learner view (oppia#3856)
  Fix oppia#3834: Introduce backend event models to record statistics (oppia#3841)
  Fix oppia#3852: Fix failing e2e test in develop (oppia#3853)
  Fix oppia#3826: Move validatorsService from app.js to ValidatorsService.js (oppia#3847)
  Update the max RAND limit for ID generation to fit within 32 bits. (oppia#3843)
  Fix oppia#3404: Add default placeholder text for mobile devices  (oppia#3845)
  Fix oppia#2553: Change the "index all explorations" job in the admin dashb… (oppia#3831)
  Add I18N to give up button; change tooltip text. (oppia#3844)
  Fix oppia#3721: Update the release_info script to use LCA and the most recent release tag in order to compile the list of changes, rather than relying on git describe. (oppia#3838)
  show_warning_only_on_button_click (oppia#3833)
  Deprecate splash page experiment (oppia#3829)
  Fixes for end to end working between Oppia and Oppia-ml. (oppia#3824)
  Fix 'Empty path passed in method' error on the collection page. (oppia#3827)
  Learner dashboard 3.2 (oppia#3759)
  📝 Fix oppia#3789 :Space out the profile drop down icons (oppia#3789). (oppia#3817)
  ...
@seanlip seanlip removed this from the Recommended projects milestone Sep 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants