Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endpoints api object needs change to allow useful merge patching #47787

Closed
shyamjvs opened this issue Jun 20, 2017 · 18 comments
Closed

Endpoints api object needs change to allow useful merge patching #47787

shyamjvs opened this issue Jun 20, 2017 · 18 comments
Labels
area/api Indicates an issue on api area. area/controller-manager kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@shyamjvs
Copy link
Member

Problem: Endpoints object currently is unsuitable for doing any reasonable PATCH operation (i.e. at the level of individual list entries, because a PATCH on the entire list is as good as a PUT of the endpoints object and hence useless).

Reason: There is no way of key'ing individual entries in the endpoints list currently because we do not have one entry for each <IP, Port> pair but rather have one entry for all the IPs having the same set of ports. This is represented as a cartesian product:

#### trimmed down for brevity ####
type Endpoints struct {
	Subsets []EndpointSubset `json:"subsets" protobuf:"bytes,2,rep,name=subsets"`
}

// For eg.
//   {
//     Addresses: [{"ip": "10.10.1.1"}, {"ip": "10.10.2.2"}],
//     Ports:     [{"name": "a", "port": 8675}, {"name": "b", "port": 309}]
//   }
type EndpointSubset struct {
	Addresses []EndpointAddress `json:"addresses,omitempty" protobuf:"bytes,1,rep,name=addresses"`
	NotReadyAddresses []EndpointAddress `json:"notReadyAddresses,omitempty" protobuf:"bytes,2,rep,name=notReadyAddresses"`
	Ports []EndpointPort `json:"ports,omitempty" protobuf:"bytes,3,rep,name=ports"`
}

We need to key the EndpointSubset struct somehow.
Also, the compaction of size => some extra computation for the patch object (but I guess we are fine with it?)

cc @kubernetes/sig-api-machinery-misc @kubernetes/sig-scalability-misc @smarterclayton @gmarek

@shyamjvs shyamjvs added area/api Indicates an issue on api area. area/controller-manager kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jun 20, 2017
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Jun 20, 2017
@shyamjvs
Copy link
Member Author

This can help a lot in the performance of the endpoint controller which is currently using PUTs instead of PATCHs. Related issue: #47597

@shyamjvs
Copy link
Member Author

Maybe we can use the 'Addresses' list itself as the key but I'm not sure if that's going to be unique (though I'd hope so) and if it's fine to have a list as a key.
Moreover, seems like some previous discussions favoured using maps with keys for objects than having lists (ref: #4889) in such cases.
@smarterclayton WDYT?

@liggitt
Copy link
Member

liggitt commented Jun 20, 2017

without contention, patch is actually more expensive both client-side (because patch computation is required) and server-side (because patch application and conflict-detection is required).

do we have contention on writing endpoints? I expected the endpoints controller to largely be the only writer

@liggitt
Copy link
Member

liggitt commented Jun 20, 2017

I thought the main issue was the extremely short resync interval, not write conflicts

@smarterclayton
Copy link
Contributor

I suggested PATCH to do blind overwrite, which avoids having to round trip between client and server, which reduces latency, which means the endpoints controller clears work faster, and moves more CPU to the master, where there is less latency to etcd.

@gmarek
Copy link
Contributor

gmarek commented Jun 21, 2017

I agree with @smarterclayton - we want to avoid unnecessary GETs. The issue here is that we'd need to do some API changes to accomplish that. The question here is - are we doing things like this?

@shyamjvs
Copy link
Member Author

cc @lavalamp @wojtek-t

@liggitt
Copy link
Member

liggitt commented Jun 21, 2017

The issue here is that we'd need to do some API changes to accomplish that.

does a json merge patch that overwrites the entire subsets list not work as expected?

@shyamjvs
Copy link
Member Author

@liggitt We are currently PUT'ing the endpoints object which IIUC is as good as merging the entire subsets list (as that's pretty much the whole endpoints object). This is needing us to send huge endpoints objects over the wire each time. Also (as @smarterclayton pointed out), this could be bad if just a single endpoint is flapping states.

@shyamjvs
Copy link
Member Author

shyamjvs commented Jun 21, 2017

Seems like it's not possible to merge patch individual elements of arrays as per RFC 7386 (the one that our jsonpatch vendored package follows). Quoting from the document:
Also, it is not possible to patch part of a target that is not an object, such as to replace just some if the values in an array.

However IIUC we've solved this problem in k8s by using some field of the array elements' struct as the patchMergeKey, so individual array elements can be keyed on that field.
This means adding a 'key' field to the EndpointSubset struct should work. Does this make sense @smarterclayton / @liggitt ?

@liggitt
Copy link
Member

liggitt commented Jun 21, 2017

This means adding a 'key' field to the EndpointSubset struct should work.

I think that would be a breaking API change

@shyamjvs
Copy link
Member Author

shyamjvs commented Jun 21, 2017

That's true. However if we want to be able to merge patch, we need the elements of the array to be somehow keyed.

Another option I could think of was to use the index of the array element as it's key. This won't need any API change, but it's not a very clean way. Because a patch operation like "update element at index 5" might end up reaching after some other element got inserted there. What's worse is it won't be detected as a conflict and you'll end up wrongly updating unless there's some other safety mechanism.
Could there be some better option or we just live without patches? :)

@shyamjvs
Copy link
Member Author

Closing this issue as it is infeasible without breaking API change. Let's reopen it in future if need be.

@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 28, 2017 via email

@gmarek
Copy link
Contributor

gmarek commented Jun 29, 2017

Isn't that pretty much equivalent to normal update? There's not much data except those addresses.

@shyamjvs
Copy link
Member Author

@smarterclayton Yes, the reason for starting the issue was to do smarter patching than patching the whole list (which is basically the entire ep object). Unless there are some other benefits of patching I might be missing?

@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 29, 2017 via email

@gmarek
Copy link
Contributor

gmarek commented Jun 29, 2017

OK, I clearly don't understand what you mean by "send all of subsets as one patch". Isn't it pretty much equivalent to plain update (assuming that metadata didn't change), except that instead of using protobuf we'd use JSON?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/controller-manager kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.
Projects
None yet
Development

No branches or pull requests

5 participants