-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the "index all explorations" job in the admin dashboard to an "index all activities" job #2553
Comments
@seanlip How to test this? and can you explain the refactoring part because i could not find the method search_services.index_activities() |
Go to /admin, and open the jobs panel; you should see a list of one-off jobs that you can run. I think there's functionality on localhost:8000 for manipulating the search index, so you could create a couple explorations, a couple collections, clear the search index, and see if running the job repopulates it. This can (and should) also be written as a backend test. The idea with the refactoring is to create search_services by moving the functionality from exp_services to it. That's why the function doesn't exist yet. I should mention, though, that refactoring these things can be trickier than it looks at the outset (you'll need to make sure that there are no circular loops and that the backend architecture is still "layered" and coherent), so it might be better to take this up later if you feel like you need more experience with the codebase. |
Hi @LanJosh, would you be interested in taking this one up? |
Hi @LanJosh, updates on this issue? thanks! |
Hi @kevinlee12, sorry for taking so long. Currently trying to wrap my head around what's happening in the code right now, haven't gotten around to doing much of the refactoring yet! |
okay, let me know if you have any questions! |
Hi @kevinlee12 I'm a bit confused on indexing the activities. The entity classes to map over I think should be the CollectionModel and the ExplorationModel, but there is an ActivityModel that I can't see a relationship to. |
@LanJosh -- sorry, I'm confused. Where are you seeing ActivityModel? I can't find it in the codebase. FYI, "activity" is a generic term that we sometimes use for referring to either an exploration or a collection. |
Ah -- if you're referring to the use of "all activities" in this issue, it just means "all explorations and collections". |
I was referring to the ActivitiesReferenceModel in core.storage.activity, but that makes sense to just use collection and exploration model in the case of all activities for this issue. |
Ah, I see. I think that just stores an (activity_type, activity_id) reference that's used by code elsewhere -- I don't think you need to touch it for this issue. |
For this issue I think there should be services from collection_services pulled into the new search_services file as well, in order to index the explorations and collections in the same job manager. Is there a simple way to differentiate between a collection and an exploration by their ID in order to select the correct indexing behavior? Or am I on the wrong path with merging those two services? |
I think you're right that some stuff needs to be pulled in from collection_services. There is no way to differentiate based on ID alone, you need to know the activity type ('exploration' or 'collection') as well. |
@LanJosh Any progress on this? |
@anthkris I got a bit stuck on how to differentiate between the activity types and lost the thread. I had thought by returning exp_models.ExplorationModel and collection_models.CollectionModel in the EntityClassesToMapOver function I would have some information in the map function to differentiate, but it's still mysterious to me. |
Hi @LanJosh -- I think you're on the right track, actually. If you look at core/domain/stats_jobs_one_off.py, you'll see that the first job in that file uses isinstance to determine the type of the item. Does that help? |
Yeah, I think I'll be able to use that. Thanks! |
Had a few more questions if someone can answer them. Testing manually, the code seems to work, after deleting the objects in the full text search tab at localhost:8000, running the job make them appear again. I'm not having much luck with the unit tests though. After running the backend tests, it notes all test have passed but several warnings appear saying some tests were not run, probably due to an import error. Looking into the imports, it seems like it originates from a file I touched, exp_services, but the failed import is google.appengine.datastore, which is strange to me because gae seems to start locally. I tried resetting my virtualenv and redownloading gae using the prerequisite scripts but haven't had any luck. Any idea what could be going on? |
Hi @LanJosh, could you post a "[not for merge] work-in-progress" PR showing your changes, and the full error trace? Hard to tell what's going on just from the description. Thanks! |
PR: #3755 One example of the error trace
SUCCESS core.domain.user_query_services_test: 1 tests (2.5 secs) |
Thanks for posting the code! The refactor is looking great, actually -- I think it's nice to separate the search stuff from the other stuff, so thank you very much for doing it. It makes things a lot cleaner. Also, I agree with your diagnosis. It looks like search_services is trying to get the exploration summary from exp_services, and exp_services is asking search_services to index stuff. How about we do something like this instead (for explorations; collections is similar):
What do you think? |
I think that should work well, but it seems some fields from the exploration are missing from the exploration summary -- the blurb and author notes. I'm unsure of how costly it is to add those fields. |
Iirc we don't actually have those fields populated at the moment anyway,
they are all blank. So it should be fine to remove them from the "search
doc"!
…On Wednesday, August 16, 2017, Joshua Lan ***@***.***> wrote:
I think that should work well, but it seems some fields from the
exploration are missing from the exploration summary -- the blurb and
author notes. I'm unsure of how costly it is to add those fields.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2553 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AKFeyulNMWpCpOT84rIf4c0vUynib89bks5sYv_ygaJpZM4KMQc_>
.
|
Hi Sean, I updated the PR as I'm having another difficulty. I started writing the test cases, but have to do some weird behavior to get them to pass which makes me think the test may be flaky in the future. After adding a rating using rating service, occasionally I have retrieve the summary again to get the updated ratings. I'm not sure why I have to only do this occasionally. Lines 65 and 98 in search_services_tests.py are probably the most relevant ones. I tried adding a wait instead to see if it just took some time to update, but that did not work. Looking at how the tests were previously implemented, it doesn't seem like it was necessary to retrieve the exploration domain object again, but they did retrieve the exploration summary domain object before each assertion. |
Hi @LanJosh! Yes, retrieving the summary again each time is expected behavior. Think about it like this: the domain object is just a plain Python object which has no expectations of persistence -- unlike the stuff in the storage model layer. So if you update the rating, the datastore object is updated, but the domain object you've already retrieved isn't -- it's just a snapshot of the datastore object at the time it was originally retrieved. So, you'll need to retrieve it again each time. Does that make sense? |
…dashboard to an "index all activities" job
…dashboard to an "index all activities" job
…t-m2 * upstream/develop: (27 commits) Fixes 3687: batch call for exploration and exploration rights object together. (oppia#3815) Fix part oppia#3400 Objective field directive (oppia#3740) Fix oppia#3800 Dropdown menu of the ABOUT in navigation bar (oppia#3855) Fix oppia#3295 Save draft change list to local storage. (oppia#3584) Disable learner dashboard feedback updates send button when no text entered. (oppia#3860) Disable save button while uploading audio to server; add Spanish support to audio languages; add audio loading message to learner view (oppia#3856) Fix oppia#3834: Introduce backend event models to record statistics (oppia#3841) Fix oppia#3852: Fix failing e2e test in develop (oppia#3853) Fix oppia#3826: Move validatorsService from app.js to ValidatorsService.js (oppia#3847) Update the max RAND limit for ID generation to fit within 32 bits. (oppia#3843) Fix oppia#3404: Add default placeholder text for mobile devices (oppia#3845) Fix oppia#2553: Change the "index all explorations" job in the admin dashb… (oppia#3831) Add I18N to give up button; change tooltip text. (oppia#3844) Fix oppia#3721: Update the release_info script to use LCA and the most recent release tag in order to compile the list of changes, rather than relying on git describe. (oppia#3838) show_warning_only_on_button_click (oppia#3833) Deprecate splash page experiment (oppia#3829) Fixes for end to end working between Oppia and Oppia-ml. (oppia#3824) Fix 'Empty path passed in method' error on the collection page. (oppia#3827) Learner dashboard 3.2 (oppia#3759) 📝 Fix oppia#3789 :Space out the profile drop down icons (oppia#3789). (oppia#3817) ...
There is a job, exp_jobs_one_off.IndexAllExplorationsJobManager, that iterates through all the explorations and adds them to the search index. This should really be replaced with a job called activity_jobs_one_off.IndexAllActivitiesJobManager that instead iterates through all explorations and collections, and adds them to the search index.
This may entail a slight refactor (e.g. moving all search-index-related stuff to a new core.domain.search_services file so that methods like search_services.index_activities() can be called).
The text was updated successfully, but these errors were encountered: