Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EMR cluster master node hostname to info logs for convenience. #2007

Closed
wants to merge 1 commit into from

Conversation

aaronabf
Copy link
Contributor

Also removes superfluous setting of _emr_job_start.

Also removes superfluous setting of `_emr_job_start`.
@coyotemarin
Copy link
Collaborator

coyotemarin commented Feb 28, 2019

Hmm, printing the hostname might more distracting than useful. Good catch on _emr_job_start and _create_cluster(). Going to fork this branch and then merge it.

@aaronabf
Copy link
Contributor Author

Can I ask why you think this is distracting / not useful?

Very frequently when a job fails it is useful to ssh into the master. This logline skips the extra step of accessing the EMR UI to look up the master node ip.

@coyotemarin
Copy link
Collaborator

Oh, I see. I usually I ssh into the master node with aws emr ssh --cluster-id <cluster id>, so the hostname is kind of redundant.

Basically what I'm trying to avoid is making it harder to find the cluster ID (which is really, really useful information) by putting it right next to the hostname, which is much longer and only useful in specific cases.

It sounds like really when you need the hostname is after the job failed, not when we first join the cluster. And it's a little odd to provide it for existing pooled clusters only.

@aaronabf
Copy link
Contributor Author

I usually I ssh into the master node with aws emr ssh --cluster-id <cluster id>, so the hostname is kind of redundant.

Ah I see. I tend to not use the ssh command as 1) less api calls to potentially be throttled and 2) I often port forward to access the resource manager UI.

Basically what I'm trying to avoid is making it harder to find the cluster ID

What about putting put on a different log-line then?

And it's a little odd to provide it for existing pooled clusters only.

Ya I thought about that, but for new clusters we can't access the IP until it is finished provisioning.

@coyotemarin
Copy link
Collaborator

A different log line would be fine.

How about I integrate printing master node's address into the code that checks the cluster's status after the job has been submitted?

@aaronabf
Copy link
Contributor Author

Sure, something along those lines sounds good to me. I can also code that up if you would like.

@coyotemarin
Copy link
Collaborator

Sure, that would be super! Something like adding a method that logs the cluster's hostname if we haven't already, and adding it to _wait_for_steps_to_complete() and _wait_for_step_to_complete()?

Also don't forget that the runner can give up on a cluster and relaunch with a different one (see _relaunch()).

@coyotemarin
Copy link
Collaborator

Thank you for all the pull requests, by the way. :)

@aaronabf
Copy link
Contributor Author

_wait_for_step*_to_complete sounds good to me!

And my pleasure! We use mrjob quite heavily in our offline batch processing; just enjoying helping the tool evolve.

@coyotemarin
Copy link
Collaborator

Sorry to put this off for so long. Will try to get this into the next release.

@aaronabf
Copy link
Contributor Author

No worries, thanks David!

@coyotemarin coyotemarin added this to the v0.6.10 milestone May 31, 2019
@coyotemarin
Copy link
Collaborator

Superseded by #2074

@aaronabf aaronabf deleted the upstream/master-ip branch May 31, 2019 23:49
@coyotemarin coyotemarin removed this from the v0.6.10 milestone Jul 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants