
api federation types #37561

Merged: 4 commits merged into kubernetes:master on Dec 6, 2016

Conversation

@deads2k (Contributor) commented Nov 28, 2016

First commit adds types that can back the kubernetes-discovery server with a kubectl-compatible way of adding federated servers. The second commit is just generated code.

After we have types, I'd like to start splitting kubernetes-discovery into a "legacy" mode which will support what we have today and a "normal" mode which will provide an API federation server, as described in https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federated-api-servers.md, that includes both discovery and proxy in a single server, something like https://github.com/openshift/kube-aggregator.

@kubernetes/sig-api-machinery @nikhiljindal



@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2016
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Nov 28, 2016
&APIServer{},
&APIServerList{},

&v1.ListOptions{},
Member

Why do we do this?
I've seen it in a lot of places, but I don't understand why we need it here...

Contributor Author

I've seen it in a lot of places, but I don't understand why we need it here...

It allows you to decode the options for requests sent to your group. The cross-registration makes it a valid conversion target.
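
For illustration, a minimal sketch of what such a registration plausibly looks like; the package name, group/version values, and import paths are assumptions for the 2016 tree layout, not taken from this PR, and APIServer/APIServerList are the types added in the first commit:

```go
package install // hypothetical package name

import (
	"k8s.io/kubernetes/pkg/api/v1" // assumed import paths, not necessarily those in this PR
	"k8s.io/kubernetes/pkg/runtime"
	"k8s.io/kubernetes/pkg/runtime/schema"
)

// SchemeGroupVersion is a placeholder; the real group/version live in the PR's types.
var SchemeGroupVersion = schema.GroupVersion{Group: "apifederation.k8s.io", Version: "v1alpha1"}

// addKnownTypes registers this group's own kinds plus the shared v1.ListOptions.
// Cross-registering ListOptions is what lets the server decode list/get options
// for requests sent to this group, since it becomes a valid conversion target.
func addKnownTypes(scheme *runtime.Scheme) error {
	scheme.AddKnownTypes(SchemeGroupVersion,
		&APIServer{},
		&APIServerList{},

		&v1.ListOptions{},
	)
	return nil
}
```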

"strings"
)

func SortByGroup(servers []*APIServer) [][]*APIServer {
Contributor

nit: SortedByGroup.

It's not in place.
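
For illustration, a not-in-place version matching the suggested name; the field names and the alphabetical group ordering are assumptions:

```go
import "sort"

// SortedByGroup returns the servers bucketed by spec.Group, one slice per group,
// without mutating the input (hence the suggested name).
func SortedByGroup(servers []*APIServer) [][]*APIServer {
	byGroup := map[string][]*APIServer{}
	groups := []string{}
	for _, server := range servers {
		if _, seen := byGroup[server.Spec.Group]; !seen {
			groups = append(groups, server.Spec.Group)
		}
		byGroup[server.Spec.Group] = append(byGroup[server.Spec.Group], server)
	}
	sort.Strings(groups) // assumption: deterministic alphabetical group order

	ret := make([][]*APIServer, 0, len(groups))
	for _, group := range groups {
		ret = append(ret, byGroup[group])
	}
	return ret
}
```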

}

// APIServerSpec contains information for locating and communicating with a server.
// Only https is supported, though you are able to disable host verification.
Contributor

not only host verification, also the whole cert check.

// InsecureSkipTLSVerify disables TLS certificate verification when communicating with this server.
// This is strongly discouraged. You should use the CABundle instead.
InsecureSkipTLSVerify bool
// CABundle is a PEM encoded CA bundle which be used to validate an API server's serving certificate.
Contributor

which will be used

return nil
}

func (obj *APIServer) GetObjectKind() schema.ObjectKind { return &obj.TypeMeta }
Contributor

Isn't this inherited from TypeMeta?

},
field.NewPath("metadata"))

// in this case we all empty group
Contributor

all?

allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "insecureSkipTLSVerify"), apiServer.Spec.InsecureSkipTLSVerify, "may not be true if caBundle is present"))
}
if !apiServer.Spec.InsecureSkipTLSVerify && len(apiServer.Spec.CABundle) == 0 {
allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "caBundle"), apiServer.Spec.CABundle, "must be present if insecureSkipTLSVerify is missing"))
Contributor

Which means you don't want to allow root cert bundle verification of non-self-signed certs?

Contributor Author

Which means you don't want to allow root cert bundle verification of non-self-signed certs?

fixed
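
For illustration, a plausible shape of the relaxed rule after this exchange: only the contradictory combination is rejected, so a server whose certificate chains to a root CA needs neither field. The helper name is hypothetical; the field names follow the snippet above:

```go
// validateTLSOptions keeps only the consistency check: insecureSkipTLSVerify and
// caBundle must not both be set. Leaving both unset is allowed, which permits
// ordinary root-CA verification of non-self-signed certificates.
func validateTLSOptions(spec APIServerSpec) field.ErrorList {
	allErrs := field.ErrorList{}
	if spec.InsecureSkipTLSVerify && len(spec.CABundle) > 0 {
		allErrs = append(allErrs, field.Invalid(
			field.NewPath("spec", "insecureSkipTLSVerify"),
			spec.InsecureSkipTLSVerify,
			"may not be true if caBundle is present"))
	}
	return allErrs
}
```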

@deads2k deads2k added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Nov 29, 2016
// Group is the API group name this server hosts
Group string
// Version is the API version this server hosts
Version string
Member

Shouldn't this be a list?

Contributor Author

Shouldn't this be a list?

I don't think we want to make this a list. Thinking about how I want to roll out a second version, I don't want to risk my v1 server in order to run a v2 server. That means I'm looking at a side-by-side deployment during the transition period, which means GroupVersion granularity instead of group granularity.

Member

does "side-by-side" mean more than one server could serve the same group/version?

do we want to have to repeat the whole object with all the server-specific bits if a single server is serving multiple groups and/or versions?

Contributor Author (@deads2k, Nov 30, 2016)

does "side-by-side" mean more than one server could serve the same group/version?

No. One service for v1, another service for v2.

do we want to have to repeat the whole object with all the server-specific bits if a single server is serving multiple groups and/or versions?

Yes. There's no reason to believe that things like priority order would be the same for every GroupVersion being served.

Member

does "side-by-side" mean more than one server could serve the same group/version?

No. One service for v1, another service for v2.

how do you plan to enforce that in case of conflicts? oldest wins?

Member

Sure, you could do it that way, but I don't think we should force people to do it that way.

Contributor Author

Since the name of the object is forced to be version.group (see validation rules), you can't end up with two trying to run the same API version.
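
For illustration, a plausible form of that naming rule; the helper name and the handling of the empty legacy group are assumptions:

```go
// validateName enforces metadata.name == "<version>.<group>", making the object
// name the uniqueness key so two APIServer objects cannot claim the same
// group/version. Handling of the empty (legacy) group is elided here.
func validateName(apiServer *APIServer) field.ErrorList {
	allErrs := field.ErrorList{}
	expectedName := apiServer.Spec.Version + "." + apiServer.Spec.Group
	if apiServer.Name != expectedName {
		allErrs = append(allErrs, field.Invalid(
			field.NewPath("metadata", "name"),
			apiServer.Name,
			"must be "+expectedName))
	}
	return allErrs
}
```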

Contributor Author (@deads2k, Nov 30, 2016)

Sure, you could do it that way, but I don't think we should force people to do it that way.

I think that the consistent rule of "one groupversion, one APIServer resource" is very easy to describe, enforce, prioritize, and avoid conflicts for. When you start combining them, you open the API enough to be self-contradictory. I don't think that avoiding granular objects is worth it. I don't think it's likely to be very onerous, and prescriptiveness can help tooling built around it.

Member

As long as it's explicitly acceptable to point multiple of these records at the same backing service, I guess I'm OK with this. It does mean there'll be more configuration work involved.

(I was actually expecting to split only by group. I guess it does seem valuable to host different versions separately in some situations, but is it really necessary?)

Contributor Author

(I was actually expecting to split only by group. I guess it does seem valuable to host different versions separately in some situations, but is it really necessary?)

This came out of my first encounter with the enemy as I was dealing with an openshift project API that had one version hosted in openshift and another hosted in a third server that works against bare kube. It got me thinking about what I wanted to do in an upgrade scenario and I realized I wanted to avoid risking my v1 server.


// APIServerSpec contains information for locating and communicating with a server.
// Only https is supported, though you are able to disable certificate verification.
type APIServerSpec struct {
Member

This actually (potentially) represents an APIServer replica pool in HA setups. The name is misleading.

Member

So maybe APIService{Spec, Status} is a better name.

Contributor Author

So maybe APIService{Spec, Status} is a better name.

doing

// Only https is supported, though you are able to disable certificate verification.
type APIServerSpec struct {
// InternalHost is the host:port for locating the server inside the pod/service network.
InternalHost string
Member

If it's a Host:Port pair, call it an address.

But why present it like this? Why not make this take a service name?

Contributor Author

But why present it like this? Why not make this take a service name?

I wasn't prepared to say that it must point to a particular service. An address allows an external reference more easily, and some of the first servers I wanted to federate weren't in-cluster to start with.

Member

That's a fair point, but you can represent such things as a service by adding the endpoint(s) yourself. I think referencing a service will let us understand the topology in a way that a single address will not.

Contributor Author

That's a fair point, but you can represent such things as a service by adding the endpoint(s) yourself. I think referencing a service will let us understand the topology in a way that a single address will not.

So you'd like a service reference plus a port?

Member

If you think it's too limiting to ask everyone to use 443, then I guess we need a port.

Contributor Author

If you think it's too limiting to ask everyone to use 443, then I guess we need a port.

As a service, I guess there's not a good reason to allow anything else.

Contributor Author

But why present it like this? Why not make this take a service name?

doing
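
For illustration, a plausible shape of the change being agreed on here, swapping the free-form InternalHost address for a service reference with the port fixed at 443; the type and field names are assumptions:

```go
// ServiceReference points at the in-cluster Service fronting this API server.
type ServiceReference struct {
	// Namespace is the namespace of the service.
	Namespace string
	// Name is the name of the service.
	Name string
}

// The spec then references the service instead of carrying a host:port; the
// summarizer/proxy resolves the endpoints itself and always speaks https on 443.
type APIServiceSpec struct {
	// Service is a reference to the service for this API server.
	Service ServiceReference

	// ... group/version, TLS, and priority fields as discussed elsewhere in this review ...
}
```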


// InsecureSkipTLSVerify disables TLS certificate verification when communicating with this server.
// This is strongly discouraged. You should use the CABundle instead.
InsecureSkipTLSVerify bool
Member

Why does this and the next field need to be in this type?

Contributor Author

Why does this and the next field need to be in this type?

As a summarizer, the discovery server accepts the remote TLS connection on your behalf.

Member

Why would that be configurable per-apiserver?

Contributor Author

Why would that be configurable per-apiserver?

You could have different custom CAs per-apiserver. It's less likely, but still possible that you have a few black sheep (unverifiable) amongst the rest.

Member

Anyway, I would say that the summarizer "owns" the discovery paths.

Member

Oh, I see. And you can't do it the ordinary way because you want it to be dynamic?

What do you think about handling that with a separate resource entirely, or even omitting it in the first draft. I expect for most people this will be unnecessary, and those who do have multiple CAs will still have only a few--CAs won't be 1:1 with apiservers.

Member

CAs won't be 1:1 with apiservers

in the simplest case (each one self-signs a cert), they are

Contributor Author

Oh, I see. And you can't do it the ordinary way because you want it to be dynamic?

Right, and potentially per-apiserver if someone is creating "burn after signing" keys (instead of self-signing).

What do you think about handling that with a separate resource entirely, or even omitting it in the first draft. I expect for most people this will be unnecessary, and those who do have multiple CAs will still have only a few--CAs won't be 1:1 with apiservers.

In openshift, I'd agree since we have a service serving cert signer infrastructure, but we don't have that here and I don't know that I think the same signing key will be used for every service. I hit this almost immediately when I tried to use this API with a "simple" server I built from the generic API server, though I will admit that was using the self-signed startup flow.

How about making the field optional, and having the summarizer/proxier also accept a CA bundle? Insecure trumps (I think I still want this per server), then apiserver.spec.CA, then the summarizer's CA bundle, then system CAs.

Member

The only use case I can see for that is a short dev/test cycle for apiservers themselves. I think in any real cluster, the sysadmin is going to want everything signed by the same CA? Is there really a good use case for multiple CAs?

I mean, aren't we adding a certificates resource, so it should be easy to get a cert signed by the main apiserver?

Contributor Author

I mean, aren't we adding a certificates resource, so it should be easy to get a cert signed by the main apiserver?

Not in the short term. Signing a serving certificate, even one for a service, requires strict rules about re-use of namespaces and the like. I wouldn't make plans based on it yet. In addition, it would itself be an extension API server.

Starting with a way to describe "this is how you trust me" seems very reasonable. I'm also willing to provide federation-server-level CA options. When we move out of alpha we can decide if it's been useful.
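
For illustration, the precedence proposed above expressed as client TLS configuration; the function and parameter names are hypothetical, and the summarizer-level bundle is the optional federation-server CA option mentioned here:

```go
import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// tlsConfigFor resolves trust in the proposed order:
// insecure > per-APIService caBundle > summarizer-level bundle > system CAs.
func tlsConfigFor(spec APIServiceSpec, summarizerCAs *x509.CertPool) (*tls.Config, error) {
	switch {
	case spec.InsecureSkipTLSVerify:
		return &tls.Config{InsecureSkipVerify: true}, nil
	case len(spec.CABundle) > 0:
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(spec.CABundle) {
			return nil, fmt.Errorf("spec.caBundle contains no usable certificates")
		}
		return &tls.Config{RootCAs: pool}, nil
	case summarizerCAs != nil:
		return &tls.Config{RootCAs: summarizerCAs}, nil
	default:
		// nil RootCAs means the system trust store.
		return &tls.Config{}, nil
	}
}
```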

// CABundle is a PEM encoded CA bundle which will be used to validate an API server's serving certificate.
CABundle []byte

// Priority controls the ordering of this API group in the overall discovery document that gets served.
Member

This seems like an administration nightmare. I would prefer to remove this. We can add it in later if it is really necessary. We can hardcode a "first-class citizen" group list for now.

Contributor Author

This seems like an administration nightmare. I would prefer to remove this. We can add it in later if it is really necessary. We can hardcode a "first-class citizen" group list for now.

Leaving it empty just means that secondary priorities take over. We hit this immediately with openshift types laid on top of kube types. Names collide and we need to pick priorities. I don't think that the winners should be hardcoded and impossible to change. Strong default resource manifests seem reasonable (kube resources come in as 1 by default), but I need to be able to change them.

Member

Allowing sysadmins to adjust it means kubectl will function differently on different clusters. Big user surprise. I really think we should come up with a better system.

Contributor Author

Allowing sysadmins to adjust it means kubectl will function differently on different clusters. Big user surprise. I really think we should come up with a better system.

Pushing it to the client means that a cluster-admin can't choose which groups he wants to have priority in his cluster. Imagine two groups with Roles. I don't think that a client should choose which of the Roles takes priority by default. As an explicit choice, certainly (we support this today), but by default I think that's unexpected. I think it's better for the cluster-admin to control resource preference in his cluster, with a client able to be explicit when it wants to be (the best practice recommended today).

Member

I didn't say we should push it to the client, I agree, that makes it worse. I do think the resolution order should be a fixed property of the GVKs involved, not a property of the configuration in the discovery merger.

Contributor Author

I didn't say we should push it to the client, I agree, that makes it worse. I do think the resolution order should be a fixed property of the GVKs involved, not a property of the configuration in the discovery merger.

I wouldn't expect that priority to be consistent across all clusters, so we can't bake it into code. If we bake it into code, everyone will choose negative infinity or whatever the highest priority is. Where would it live, if not here and not in code?

Member

I'm still thinking about this. I agree there's a problem, but I don't like any solution so far. The solution in this PR is high-cognitive overhead for cluster administrators. I would prefer a solution that "just works" but without hard coding something or a central registry somewhere, I haven't thought of anything yet.

Contributor Author

I'm still thinking about this. I agree there's a problem, but I don't like any solution so far. The solution in this PR is high-cognitive overhead for cluster administrators. I would prefer a solution that "just works" but without hard coding something or a central registry somewhere, I haven't thought of anything yet.

I think that this, in combination with the bootstrapping described in #37650, means that the default scenario results in no additional work for a cluster-admin, but the power is here when you need it, and we have scenarios for it already.

I mean, even the base kube resources rely on priority to work, since resources are/were cross-registered under extensions and other groups like autoscaling or batch. Those only work today because they happen to sort right alphabetically.
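
For illustration, how a discovery merger might apply Priority when assembling the merged document; the "lower value wins" convention and the field type are assumptions, and the alphabetical tie-break mirrors the behaviour described above:

```go
import "sort"

// sortForDiscovery orders APIService objects by Priority, falling back to the
// group name so unprioritized groups keep the alphabetical order relied on today.
func sortForDiscovery(services []*APIService) {
	sort.SliceStable(services, func(i, j int) bool {
		if services[i].Spec.Priority != services[j].Spec.Priority {
			return services[i].Spec.Priority < services[j].Spec.Priority
		}
		return services[i].Spec.Group < services[j].Spec.Group
	})
}
```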

@lavalamp (Member)

So the goal here is to add an api interface to the discovery merger?

Can you move the apifederator directory under cmd/kubernetes-discovery? In fact I think it should just be cmd/kubernetes-discovery/apis/.... I'm also in favor of moving the binary into the staging directory.

@deads2k (Contributor, Author) commented Nov 30, 2016

So the goal here is to add an api interface to the discovery merger?

I want that, and I want to use the same API as the basis for extending kubernetes-discovery to include a proxy. I've written up some ideas here: #37650.

I'm in favor of the move as well, but I want to wait until the genericapiserver gets there. I have several pulls in the queue to make genericapiserver easy enough to re-use that kubernetes-discovery can be driven from it.

@lavalamp (Member)

I'm pretty strongly opposed to starting a new top-level directory with api types for a binary that already has a directory.

@deads2k (Contributor, Author) commented Nov 30, 2016

I'm pretty strongly opposed to starting a new top-level directory with api types for a binary that already has a directory.

Ok, I don't mind moving it.

@lavalamp (Member)

OK, so one more thing to think about: one of the problems with TPR is that they're global, not namespaced. I think people will want to lock some apiservers into some namespaces. For dev/test cycles if no other reason.

@liggitt (Member) commented Nov 30, 2016

lock some apiservers into some namespaces

I'm not sure what you mean by that. A federated API that routes type A to one backend server inside namespace foo but a different backend server inside namespace bar?

@lavalamp (Member) commented Dec 1, 2016

Yeah, I think step one is to figure out what people need.

@deads2k (Contributor, Author) commented Dec 1, 2016

OK, so one more thing to think about: one of the problems with TPR is that they're global, not namespaced. I think people will want to lock some apiservers into some namespaces. For dev/test cycles if no other reason.

I don't think that a summarizer for the cluster as a whole should accept input from someone other than a cluster-admin. That means that I don't think the scenario of a namespace (project) admin adding a TPR which is reflected in the cluster-wide discovery summarizer is a reasonable one.

If they want to run their own API server hosting a new resource along with an etcd to back it, that seems fine. I don't think we should expose the core etcd to the whims of a namespace admin.

I would also claim that any "extra" summarizing API server which could be layered on in a namespace would operate differently than this, since it could not be trusted with front proxy credentials.

@lavalamp (Member) commented Dec 5, 2016

I agree, please change the name.

@smarterclayton (Contributor) commented Dec 6, 2016 via email

@lavalamp (Member) commented Dec 6, 2016 via email

@deads2k (Contributor, Author) commented Dec 6, 2016

comments addressed.

@deads2k deads2k added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2016
@deads2k deads2k added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Dec 6, 2016
@k8s-ci-robot (Contributor)

Jenkins GCI GKE smoke e2e failed for commit e0f1d406dcbe640401de9462f74d57df7263406a. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@deads2k (Contributor, Author) commented Dec 6, 2016

@k8s-bot unit test this: issue #37961

@deads2k (Contributor, Author) commented Dec 6, 2016

@k8s-bot gci gke e2e test this

@k8s-ci-robot (Contributor)

Jenkins GCE e2e failed for commit e0f1d406dcbe640401de9462f74d57df7263406a. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot (Contributor)

Jenkins unit/integration failed for commit e0f1d406dcbe640401de9462f74d57df7263406a. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@deads2k (Contributor, Author) commented Dec 6, 2016

@k8s-bot cvm gce e2e test this: issue #38198

@deads2k (Contributor, Author) commented Dec 6, 2016

@k8s-bot unit test this

@deads2k deads2k added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Dec 6, 2016
@rmmh (Contributor) commented Dec 6, 2016

@k8s-bot gci gce e2e test this (job was accidentally deleted)

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 36990, 37494, 38152, 37561, 38136)

@k8s-github-robot k8s-github-robot merged commit f600c94 into kubernetes:master Dec 6, 2016
@deads2k deads2k deleted the fed-01-types branch February 1, 2017 17:29