Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check return value when calling ensureDnsRecords #28568

Conversation

xiangpengzhao
Copy link
Contributor

@xiangpengzhao xiangpengzhao commented Jul 7, 2016

When lockedUpdateDNSRecords calls ensureDnsRecords, it should check the return value. If it returns error, the ensuredCount should not increment.


This change is Reviewable

@k8s-bot
Copy link

k8s-bot commented Jul 7, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

1 similar comment
@k8s-bot
Copy link

k8s-bot commented Jul 7, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

ensuredCount += 1
if err := s.ensureDnsRecords(clusterName, service); err == nil {
ensuredCount += 1
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return the error directly ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes probably best to return the error here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jfrazelle ! I'm on vocation now, will fix it when I'm back to office.

@k8s-bot
Copy link

k8s-bot commented Jul 7, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

2 similar comments
@k8s-bot
Copy link

k8s-bot commented Jul 7, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-bot
Copy link

k8s-bot commented Jul 7, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Jul 7, 2016
@xiangpengzhao xiangpengzhao force-pushed the check_ensureDnsRecords_return_value branch 2 times, most recently from 3123d28 to 0ff3c52 Compare July 11, 2016 07:48
@xiangpengzhao
Copy link
Contributor Author

Fixed and rebased. PTAL @jfrazelle @nikhiljindal

@jessfraz
Copy link
Contributor

seems good to me!

On Mon, Jul 11, 2016 at 1:53 AM, Peter Zhao notifications@github.com
wrote:

Fixed and rebased. PTAL @jfrazelle https://github.com/jfrazelle
@nikhiljindal https://github.com/nikhiljindal


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#28568 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ABYNbDHT409kPyh9I_k2jkYPOzBLzkWlks5qUfaFgaJpZM4JGsa9
.

@nikhiljindal
Copy link
Contributor

cc @quinton-hoole @kubernetes/sig-cluster-federation
@k8s-bot ok to test

@@ -728,7 +728,9 @@ func (s *ServiceController) lockedUpdateDNSRecords(service *cachedService, clust
for key := range s.clusterCache.clientMap {
for _, clusterName := range clusterNames {
if key == clusterName {
s.ensureDnsRecords(clusterName, service)
if err := s.ensureDnsRecords(clusterName, service); err != nil {
return fmt.Errorf("Failed to update DNS records for service %v due to %v", service, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about logging the error, but still try to ensureDNS records for other clusters?
That way we can return errors for all clusters at once

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can't just bail out like this, unfortunately, as that will potentially cause updates of all services and clusters to fail. I'd be in support of logging an error here, and not incrementing the ensuredCount if the returned value from ensureDnsRecords is not nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nikhiljindal @quinton-hoole , I will change as you commented.

@ghost
Copy link

ghost commented Jul 19, 2016

Thanks @xiangpengzhao ! My apologies - I only saw this PR today. See my in-code comment. I think we need a fairly minor change before merging.

@ghost ghost added area/cluster-federation priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jul 19, 2016
@ghost
Copy link

ghost commented Jul 19, 2016

@xiangpengzhao Does this fix any specific bug you're seeing? If so, could you link to the associated issue here please.

Also, note that we do not yet have an automated pre-submit e2e test for Federation, so you will need to run the Federation tests manually, and paste the results here.

Thanks

@xiangpengzhao xiangpengzhao force-pushed the check_ensureDnsRecords_return_value branch from 0ff3c52 to 13eeb12 Compare July 20, 2016 07:26
@xiangpengzhao
Copy link
Contributor Author

Does this fix any specific bug you're seeing? If so, could you link to the associated issue here please.

I found this when learning and reviewing Federation code, so there isn't associated bug AFAIK.

Also, note that we do not yet have an automated pre-submit e2e test for Federation, so you will need to run the Federation tests manually, and paste the results here.

I tried to run e2e test as the doc federation-e2e-tests but failed to run in my local env. Will the PR check Jenkins GCE e2e or Jenkins GCE e2e run e2e test for Federation? If so, I will fetch the log of them.

@xiangpengzhao
Copy link
Contributor Author

Code changed, PTAL @nikhiljindal @quinton-hoole

@xiangpengzhao xiangpengzhao force-pushed the check_ensureDnsRecords_return_value branch 3 times, most recently from e2bc83b to b6c832a Compare July 21, 2016 08:13
@xiangpengzhao
Copy link
Contributor Author

/cc @nikhiljindal @quinton-hoole

I found this when learning and reviewing Federation code, so there isn't associated bug AFAIK.

I tried to run e2e test as the doc federation-e2e-tests but failed to run in my local env. Will the PR check Jenkins GCE e2e or Jenkins GCE e2e run e2e test for Federation? If so, I will fetch the log of them.

@xiangpengzhao xiangpengzhao force-pushed the check_ensureDnsRecords_return_value branch from b6c832a to 392c181 Compare July 21, 2016 17:27
@nikhiljindal
Copy link
Contributor

No Jenkins GCE e2e will not run federation tests (yet, we have an open issue: kubernetes/test-infra#155).

@k8s-bot
Copy link

k8s-bot commented Aug 3, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@nikhiljindal
Copy link
Contributor

@xiangpengzhao Sorry for all the trouble, we definitely need to make it easier to run federation e2e tests.
I have just sent a PR kubernetes/test-infra#335 that should help reduce the pain.

@xiangpengzhao
Copy link
Contributor Author

@nikhiljindal Glad to see that PR. Thank you so much for your work on it! Hope to see that PR merged ASAP:)

@k8s-bot
Copy link

k8s-bot commented Aug 10, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@apelisse apelisse removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Aug 11, 2016
@xiangpengzhao
Copy link
Contributor Author

@k8s-bot federation gce e2e test this

@xiangpengzhao
Copy link
Contributor Author

@nikhiljindal I saw that your PR kubernetes/test-infra#335 was merged, cool! I commented @k8s-bot federation gce e2e test this as mentioned in the PR, but it didn't work. Can't the owner of one PR request a federation test ? Thanks!

@nikhiljindal
Copy link
Contributor

Yes. As with all other kubernetes jenkins tests, only admins will be able to trigger federation test.
My PR was merged, but I later found another issue which I am fixing in kubernetes/test-infra#389.

I have #30458 to make sure the test works as expected.

Thanks for your patience

@xiangpengzhao
Copy link
Contributor Author

@nikhiljindal Thanks for your wonderful work! I see that kubernetes/test-infra#389 was also merged 3 hours ago. Happy waiting to try the federation test 😄

@nikhiljindal
Copy link
Contributor

#26723 getting fully functional will take some time.
This code LGTM. Thanks for your patience.

@nikhiljindal nikhiljindal added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2016
@k8s-github-robot
Copy link

@k8s-bot test this

Tests are more than 96 hours old. Re-running tests.

@k8s-bot
Copy link

k8s-bot commented Aug 17, 2016

GCE e2e build/test passed for commit 4ccd90d.

@nikhiljindal nikhiljindal removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 17, 2016
@xiangpengzhao
Copy link
Contributor Author

Thanks @nikhiljindal ! Will this PR be merged after you labeled 'lgtm' and it passed all the current checks? Do we need to continue to wait for the federation test to ensure it pass the test ?

@xiangpengzhao
Copy link
Contributor Author

cc @quinton-hoole

@ghost ghost modified the milestones: next-candidate, v1.4 Aug 18, 2016
@ghost ghost added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 18, 2016
@ghost
Copy link

ghost commented Aug 18, 2016

@k8s-bot federation gce e2e test this

@ghost
Copy link

ghost commented Aug 18, 2016

@xiangpengzhao The above should run the test for you, I think. I'm surprised that this has not already auto-merged, given that it has passed all of the mandatory checks already. I'll check up for you.

@ghost
Copy link

ghost commented Aug 18, 2016

@xiangpengzhao I checked and apparently the merge-bot is down at the moment. That's being worked on, and as soon as it's resolved, this PR should get automatically merged.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Aug 18, 2016

GCE e2e build/test passed for commit 4ccd90d.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 83ded28 into kubernetes:master Aug 18, 2016
@xiangpengzhao
Copy link
Contributor Author

Thanks @quinton-hoole !

The federation gce e2e test ran but failed due to below error. It seems to be some infra reasons.
Though the PR was merged, I hope that my code wouldn't have potential bugs : )

Wrote config for federation-cluster to /var/lib/jenkins/workspace/kubernetes-pull-build-test-federation-e2e-gce/.kube/config
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Pending)
federation-apiserver pod is not running! giving up.
2016/08/18 11:20:37 e2e.go:453: Error running up: exit status 1
2016/08/18 11:20:37 e2e.go:449: Step 'up' finished in 13m22.109745587s
2016/08/18 11:20:37 e2e.go:303: Dumping cluster logs to: /var/lib/jenkins/workspace/kubernetes-pull-build-test-federation-e2e-gce/_artifacts
2016/08/18 11:20:37 e2e.go:447: Running: dump cluster logs
Dumping master and node logs to /var/lib/jenkins/workspace/kubernetes-pull-build-test-federation-e2e-gce/_artifacts
Master not detected. Is the cluster up?
Nodes not detected. Is the cluster up?
2016/08/18 11:20:38 e2e.go:449: Step 'dump cluster logs' finished in 1.62235239s
2016/08/18 11:20:38 e2e.go:131: Error starting e2e cluster. Aborting.
exit status 1

@nikhiljindal
Copy link
Contributor

Yes am working on that. Not related to your PR

@xiangpengzhao xiangpengzhao deleted the check_ensureDnsRecords_return_value branch December 4, 2017 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants