Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check return value when calling ensureDnsRecords #28568

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 13 additions & 5 deletions federation/pkg/federation-controller/service/servicecontroller.go
Original file line number Diff line number Diff line change
Expand Up @@ -724,18 +724,26 @@ func (s *ServiceController) lockedUpdateDNSRecords(service *cachedService, clust
if !wantsDNSRecords(service.appliedState) {
return nil
}

ensuredCount := 0
unensuredCount := 0
for key := range s.clusterCache.clientMap {
for _, clusterName := range clusterNames {
if key == clusterName {
s.ensureDnsRecords(clusterName, service)
ensuredCount += 1
err := s.ensureDnsRecords(clusterName, service)
if err != nil {
unensuredCount += 1
glog.V(4).Infof("Failed to update DNS records for service %v from cluster %s: %v", service, clusterName, err)
} else {
ensuredCount += 1
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error 2 lines below is not correct now. We need to keep track of number of missing cluster clients and number of failed ensureDNsRecords and return an error accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. The numbers may be counted by below?

  1. number of missing cluster clients: count of key != clusterName
  2. number of failed ensureDNsRecords: count of when s.ensureDnsRecords return non nil?

If so, we can define two new variables: missedCount, unensuredCount(instead current ensuredCount)

if ensuredCount < len(clusterNames) {
return fmt.Errorf("Failed to update DNS records for %d of %d clusters for service %v due to missing clients for those clusters",
len(clusterNames)-ensuredCount, len(clusterNames), service)
missedCount := len(clusterNames) - ensuredCount - unensuredCount
if missedCount > 0 || unensuredCount > 0 {
return fmt.Errorf("Failed to update DNS records for %d clusters for service %v due to missing clients [missed count: %d] and/or failing to ensure DNS records [unensured count: %d]",
len(clusterNames), service, missedCount, unensuredCount)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my last commit, the way calculating missedCount is wrong. It cannot increment in the for loop. Fixed in this commit.

@quinton-hoole According to your in-code comments in last commit, I add the V4 log info when s.ensureDnsRecords returns error (call s.err below). But I cancel including any error info (s.err or all-cluster-missed error) in the final returned error (call final-err below). There are some reasons:

  1. When there is more than one s.err: for debugging, if we don't care about s.err, it's not necessary to include it in final-err; if we care about s.err, only include the last s.err in final.err is not enough, we have to inspect the logged s.err in each loop, so it's not necessary to include it in final-err either.
  2. When there is no s.err, there may be two cases: 1). all cluster miss; 2) some clusters miss and other ensureDnsRecords successfully. If we set error for these two cases along with s.err, it will make the logic complicated and code somewhat ugly.

IMO, the condition if missedCount > 0 || unensuredCount > 0 can cover all abnormal cases, and it's error text give enough info.

For example, if it prints:
Failed to update DNS records for 5 clusters for service foo due to missing clients [missed count: 2] and/or failing to ensure DNS records [unensured count: 1]
we know that there are 5 clusters, 2 missed, 1 failed to ensureDnsRecords and 2 succeeded.

I'm not sure if I describe clearly. Any more discuss is welcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the comments posted 10 days ago are listed here. We can ignore them here.

return nil
}
Expand Down