-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export schedule dispatch #5643
Export schedule dispatch #5643
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #5643 +/- ##
===========================================
- Coverage 45.83% 45.79% -0.05%
===========================================
Files 511 513 +2
Lines 29826 29872 +46
Branches 4514 4519 +5
===========================================
+ Hits 13672 13679 +7
- Misses 16154 16193 +39
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tjiang11! Looks pretty good, left a few comments.
export/app.yaml
Outdated
version: "latest" | ||
|
||
handlers: | ||
- url: /cloud-datastore-export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this handle URLs with stuff after /cloud-datastore-export too? Like, it should 404 on /cloud-datastore-export/1?
Asking because I noticed such URL requests get sent here by dispatch.yaml but then don't seem to get handled here if I'm reading correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm actually I think maybe dispatch.yaml just shouldn't include anything after /cloud-datastore-export, I don't see why it would need to. I'll change dispatch.yaml
dispatch.yaml
Outdated
@@ -0,0 +1,4 @@ | |||
dispatch: | |||
# Send requests for exports to the cloud datastore admin service. | |||
- url: "*/cloud-datastore-export/*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use underscores instead of hyphens, same as we do in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
export/cloud_datastore_admin.py
Outdated
|
||
output_url_prefix = self.request.get('output_url_prefix') | ||
assert output_url_prefix and output_url_prefix.startswith('gs://') | ||
if '/' not in output_url_prefix[5:]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does 5: come from? Maybe make the origin of the "5" more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question, this code came from the tutorial on the online docs. I forgot to ask about accreditation, since when I tried to accredit it to Google in the copyright notice, I couldn't pass the lint checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, just leave a comment in the file with a link to the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment
export/cloud_datastore_admin.py
Outdated
} | ||
headers = { | ||
'Content-Type': 'application/json', | ||
'Authorization': 'Bearer ' + access_token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer interpolation: 'Bearer %s' % access_token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
deadline=60, | ||
headers=headers) | ||
if result.status_code == httplib.OK: | ||
logging.info(result.content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's going to be lots of logs, maybe log a summary at the end of how many data items were successfully exported instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can you tell this is going to be a lot of logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one per request, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it just be one log then, from one request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you're right! I am not sure what I was thinking here, for some reason I thought there was a "for" loop. Please ignore this comment.
export/cloud_datastore_admin.py
Outdated
import webapp2 | ||
|
||
|
||
class Export(webapp2.RequestHandler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExportToCloudDatastoreHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
export/cloud_datastore_admin.py
Outdated
|
||
class Export(webapp2.RequestHandler): | ||
|
||
def get(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard this by a cron header check? See cron.py for how to do this -- @acl_decorators.can_perform_cron_tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having some trouble with this, when I try running the cron job it keeps saying it can't find the module core.domain? I tried putting an init.py in the export folder but it seems to say the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I wouldn't have expected that. I wonder whether it's because app engine is regarding export/ as its own folder that's isolated from the world.
Perhaps take a look at pythonpath in that file to see what's in the pythonpath, and contrast it to what you get when logging pythonpath in a file in core?
If they truly are separated then it's fine to add a decorators.py file in export/ as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look into the pythonpath, and it does seem like it's different, since the first thing listed is referring to the cloud-datastore-admin app specifically. I went ahead and made a separate acl_decorators.
Note that the current state does not properly work. |
export/cloud_datastore_admin.py
Outdated
# Look for slash in the portion of the bucket URL that comes | ||
# after 'gs://'. If not present, then only a bucket name has been | ||
# provided and we append a trailing slash. | ||
if '/' not in output_url_prefix[5:]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll still want to do something about the "5:". Maybe extract 'gs://' into a named constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple more things, otherwise LGTM. Thanks!
export/app.yaml
Outdated
|
||
libraries: | ||
- name: webapp2 | ||
version: "latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a specific version. (Don't want it to change under us.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
export/cloud_datastore_admin.py
Outdated
import json | ||
import logging | ||
|
||
import oppia.export.acl_decorators as acl_decorators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from oppia.export import acl_decorators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Sets up automatic backups utilizing the GCP import export service, as datastore admin backups are being deprecated.