Pods need to pre-declare service links iff they want the environment variables created #1768

Closed
bgrant0607 opened this issue Oct 14, 2014 · 59 comments
Labels
area/api Indicates an issue on api area. area/downward-api kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog.

Comments

@bgrant0607
Member

Forked from #1107 and #386.

Now seems like a good time to decide whether we want to require/encourage/allow pods to declare services they depend upon. Internally, we've often wished we had such a mechanism.

Not only would pre-declaration reduce accidental/lazy coupling, but it would also improve scalability by reducing the number of iptables rules that must be created. Pre-declaration would also be compatible with Docker's approach to links. If we supported service aliasing in these declarations, that would facilitate dependency injection for tests and a wide variety of deployment adaptation scenarios, which seems like a compelling alternative to custom environment variables, command-line flags, dynamic configuration services, and so on.

However, it at least needs to be possible to opt out of static declaration and/or enforcement of service dependencies, such as when dependent services are registered dynamically -- think of a proxy, load balancer, web browser, monitoring service, or naming/discovery service running in a container.

We'd also definitely need to support more flavors of services (cardinal services, headless services, master election, sharding, ...) in order for most clients to be able to utilize the pre-declaration mechanism.

Something else to consider is how to address dependencies pulled in by client libraries, though perhaps it's not unreasonable to require client libraries to be transparent regarding which services they access.

/cc @thockin @smarterclayton

@bgrant0607 bgrant0607 added kind/design Categorizes issue or PR as related to design. sig/network Categorizes an issue or PR as relevant to SIG Network. area/api Indicates an issue on api area. area/kube-proxy area/downward-api labels Oct 14, 2014
@erictune
Member

How will services v2 interact with namespaces and authorization?

If namespaces are used to separate different companies or different organizations within a large company, then probably most of the time namespace owners will:

  • not normally allow cross-namespace listing of services objects
  • want to partition network traffic as a coarse form of access control (either the only, or as part of defense in depth)
  • not want automatically created DNS records to be by default visible to all, as that might leak information about their application structure, scope, etc.

Even if a cluster's users are confined to a few cooperative organizational units, they might want:

  • to not expose the internal architecture of a cluster of microservices which together form a mesoservice.
  • to namespace the automatically created DNS records to prevent collisions when same-named services are created in different clusters.

On the other hand, having to declare all dependencies seems like a bad user experience. There is probably a reason why we have wished for this but not implemented it for so many years.

One compromise might be to default to fully connected within a namespace but require explicit connection across namespaces.

@smarterclayton
Contributor

Generally, predeclaration seems valuable for making arbitrary software work, and automatic injection seems useful for Kube-designed software. Predeclaration works well for controlling dependencies, and automatic injection works well for forcing tolerance of missing components (degrading components?).


@smarterclayton
Contributor

On Oct 15, 2014, at 7:47 PM, Eric Tune notifications@github.com wrote:

One compromise might be to default to fully connected within a namespace but require explicit connection across namespaces.

I think this is an excellent rule of thumb. Connection across namespaces might require acks on both ends (to prevent me from injecting my variables into your space). That explicit connection might be modeled as a service on one end that talks to pods, but on the other end might point to a service in a namespace. Both are required for traffic to flow across a namespace boundary.

@smarterclayton
Contributor

What would an explicit dependency from a pod to a service look like (a rough sketch follows this list):

  1. A name of a service
  2. A namespace (if outside the current)
  3. The environment variable name to use for the service host
  4. Additional environment variables?
  5. An internal address and/or port to use (including 127.0.0.1 or hardcoded)
  6. The nature of the usage of the service?
  7. A DNS name for the service within this pod?
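
A rough Go sketch of what such a declaration could carry, simply mirroring the list above (all field names here are hypothetical, not a proposed API):

type ServiceDependency struct {
  // 1. Name of the service the pod depends on.
  Service string
  // 2. Namespace, if the service lives outside the pod's own namespace.
  Namespace string
  // 3./4. Environment variable names to publish for the service host/port,
  //       allowing aliasing (e.g. DB_HOST instead of FOO_SERVICE_HOST).
  HostEnvVar string
  PortEnvVar string
  // 5. Optional fixed address/port to expose inside the pod (e.g. 127.0.0.1:3306).
  LocalAddress string
  // 6. The nature of the usage of the service (read-only, admin, ...).
  Usage string
  // 7. Optional DNS alias for the service within this pod.
  DNSAlias string
}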

@bgrant0607 bgrant0607 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 4, 2014
@ghost

ghost commented Dec 16, 2014

+100

@bgrant0607
Member Author

@kelonye Could you please describe your use case in more detail? What are you trying to achieve? Do you actually want firewalls (#2880), or is this security through obscurity, or something else?

@ghost

ghost commented Dec 17, 2014

@bgrant0607 A client of mine wants to run mini user applications using untrusted user images as linked services, so security is reason 1. Reason 2 is having different services with the same name, so that, say, users A and B can each have a REDIS and a WEB service.

@bgrant0607
Member Author

@kelonye Namespaces were intended to address the multi-user issue. As for trusted vs. not, that's the firewall issue (#2880) -- which clients are permitted to see a service.

If we continue to support the environment variables, we are going to need to make them on request only. Creating so many variables for all services in a cluster, even just within a namespace, is a scalability problem. That has nothing to do with accessibility, though. It would just save the user from having to write a pre-start hook to resolve the service's DNS name and dump it into a file to be read by the container.
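
For illustration only, the pre-start hook described here could be little more than a DNS lookup written to a file; the service DNS name and output path below are assumptions, not anything this issue specifies:

package main

import (
  "fmt"
  "net"
  "os"
)

func main() {
  // Resolve the service's cluster DNS name (hypothetical name/namespace).
  addrs, err := net.LookupHost("redis.default.svc.cluster.local")
  if err != nil || len(addrs) == 0 {
    fmt.Fprintln(os.Stderr, "service not resolvable:", err)
    os.Exit(1)
  }
  // Dump the address where the main container expects to read it.
  if err := os.WriteFile("/etc/podinfo/redis-host", []byte(addrs[0]), 0644); err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }
}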

A separate issue is start-up dependencies. Though they're fragile, a number of applications do make (bad) assumptions about startup order, so we'll need to support them in some form in our deployment workflow mechanism(s) (#1704).

@pmorie
Member

pmorie commented Jan 9, 2015

The following use-case requires predeclaration:

  1. As a user I want to be able to customize how a service is consumed by allowing the name and
    form of environment variables to be transformed prior to use by a pod so that images which are
    not designed to work with a specific named service can be used with kubernetes

Predeclaration would also play a role in the cross-namespace use-case:

  1. As a user I want a pod to be able to consume the metadata of a service in a different
    namespace (example: namespace A contains a database service which I want to use from namespace B)

I'll also note that there is a use-case where you want to opt-out of predeclaration and get info / firewall rules for every service in a pod's namespace. This adds a wrinkle because it divides pods into two types, those that follow the normal rules and those that don't, which would definitely impact scheduling in order to keep an opted-out pod from wrecking port allocation on a host where all other pods predeclare. I think @erictune's suggestion of defaulting to fully connected within a namespace and requiring predeclaration of cross namespace dependencies is a good middle ground.

@bgrant0607 @erictune @smarterclayton As a next step, how about a PR to explore the predeclaration mechanism in a vacuum? I would suggest that PodSpec be changed as follows:

type PodSpec struct {
  // other fields omitted
  ServiceLinks []ServiceLink
}

type ServiceLink struct {
  TypeMeta
  ObjectMeta
  Name      string
  Namespace string
  // Better perhaps as an ObjectReference?
}

The first iteration could change the BoundPodFactory or move the service env functionality to the Kubelet, the latter of which seems like the direction the tide is going.

@pmorie
Member

pmorie commented Jan 9, 2015

@lavalamp ^

@lavalamp
Member

lavalamp commented Jan 9, 2015

Moving BoundPodFactory stuff to kubelet is good and necessary, but @erictune may be working on that already.

If we switch to DNS and deprecate env vars completely, does that eliminate the need for predeclaration? That sounds much easier...

@pmorie
Member

pmorie commented Jan 9, 2015

@lavalamp I don't think switching to DNS and deprecating env vars fully eliminates the need for predeclaration. There's still the problem of iptables rules on the node.

@smarterclayton
Contributor

You still have to know what DNS name you're looking for, and software that runs in different namespaces or clusters won't have the same DNS name.

You need something injected into the container that lets legacy software react to the cluster topology. Predeclaration for adaptation is a key thing; there's lots of software out there that doesn't know anything about X_SERVICE_HOST at all.

Also, that same software has to work outside of a cluster as well - on a local dev box how would you point your app to your db (except by using env or mutating a file on disk)?


@smarterclayton
Contributor

On Jan 9, 2015, at 12:46 PM, Paul Morie notifications@github.com wrote:

As a next step, how about a PR to explore the predeclaration mechanism in a vacuum? I would suggest that PodSpec be changed as follows:

type PodSpec struct {
  // other fields omitted
  ServiceLinks ServiceLinkList
}

type ServiceLinkList struct {
  TypeMeta
  ListMeta
  Items []ServiceLink
}

type ServiceLink struct {
  TypeMeta
  ObjectMeta
  Name      string
  Namespace string
  // Better perhaps as an ObjectReference?
}

I had been envisioning this to do adaptation, so mutating how the service shows up in the pod. I think that would make the use case a bit more concrete and practical.


@pmorie
Member

pmorie commented Jan 9, 2015

@smarterclayton We can roll adaptation into the POC; I will think through a design and propose a model here.

@erictune
Member

@pmorie can you give a more concrete example of a system with legacy software that needs service links?

@pmorie
Member

pmorie commented Jan 12, 2015

@erictune As a more detailed example, say I have an image that depends on a specially formatted environment variable with a URL for a service. As an example format, take:

DOCKER_URL=http://$DOCKER_HOST/$DOCKER_PORT

This use case is to be able to adapt to these special requirements an image may have without changing the image.

It's definitely the case that there's a pretty sizable subset of these cases that can be addressed by performing the translation via the shell and either setting containers' commands to set the variables or wrapping an image with another image containing a script.

It's arguable that for those cases, the adaptation mechanism isn't necessary. However, it is necessary for images that do not contain a shell, or that use the ENTRYPOINT feature of docker (in which case the environment cannot be overridden from the container's command without specifically overriding the entrypoint). Personally, I also think it's arguable that the experience will be better to adapt services in this manner even when images have a shell and can perform the substitution themselves.
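
To make the adaptation concrete: without such a mechanism, the translation ends up in a wrapper like this hypothetical Go shim, which builds DOCKER_URL from the env vars Kubernetes generates for a service named "docker" (all names and the entrypoint path are assumptions):

package main

import (
  "fmt"
  "os"
  "os/exec"
)

func main() {
  // Compose the image-specific variable from the generated service env vars,
  // in the format the image expects (per the example above).
  url := fmt.Sprintf("http://%s/%s",
    os.Getenv("DOCKER_SERVICE_HOST"), os.Getenv("DOCKER_SERVICE_PORT"))
  os.Setenv("DOCKER_URL", url)

  // Hand off to the image's real entrypoint with the adapted environment.
  cmd := exec.Command("/entrypoint.sh")
  cmd.Env = os.Environ()
  cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
  if err := cmd.Run(); err != nil {
    os.Exit(1)
  }
}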

@smarterclayton
Contributor

Or even the mysql client (http://dev.mysql.com/doc/refman/5.0/en/environment-variables.html)

MYSQL_HOST
MYSQL_TCP_PORT

Neither of those matches our existing env.


@pmorie
Member

pmorie commented Jan 12, 2015

@smarterclayton and I discussed this offline and think the adaptation use-case doesn't depend on predeclaration of services nor does it need to be coupled to services at all.

Here's another use-case that might require pre-declaration:

  1. As a user I want to express that a pod should not be started unless services it depends on have had IP/Ports allocated

Consider the following to address that use-case:

type PodSpec struct {
  // other fields omitted
  ServiceLinks []ServiceLink
}

type ServiceLink struct {
  TypeMeta
  ObjectMeta

  Target    ObjectReference
  NeedReady bool
}

NeedReady would be a precondition that states that the service must have at least one endpoint.
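
A sketch of how the NeedReady precondition could be evaluated, written against today's client-go (which postdates this discussion); the helper name is hypothetical:

import (
  "context"

  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/client-go/kubernetes"
)

// linkReady reports whether the linked service has at least one ready endpoint.
func linkReady(c kubernetes.Interface, namespace, name string) (bool, error) {
  ep, err := c.CoreV1().Endpoints(namespace).Get(context.TODO(), name, metav1.GetOptions{})
  if err != nil {
    return false, err
  }
  for _, subset := range ep.Subsets {
    if len(subset.Addresses) > 0 {
      return true, nil
    }
  }
  return false, nil
}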

Any thoughts? @smarterclayton @erictune @bgrant0607 @thockin @lavalamp

@pmorie
Member

pmorie commented Jan 12, 2015

Also, in the context of the above, it would be good to produce an event after some time if a pod which is scheduled cannot start, but that's perhaps another issue.

@bgrant0607
Member Author

Previous discussion of this last type of dependency was in #2385.

It has been my hope that DNS will eliminate the creation order problem between services and their clients. Creation-order dependencies are problematic since containers can go down at any time, and they have unclear meaning when updating or replacing objects (such as with rolling updates).

That said, sometimes there are unavoidable turn-up dependencies, such as initializing stateful services like databases or message brokers. I envision handling such dependencies in deployment automation: #1704.

@bgrant0607
Member Author

I just experienced this, as additional confirmation: The new directory-reading feature of kubectl is not so useful for containers using service environment variables. Someone tried to do:

cluster/kubectl.sh create -f examples/guestbook-go/

@zq-david-wang

@smarterclayton I notice a huge delay when spawning bash via "docker exec" if there are thousands of services (I tested with 3000) in the namespace. It would be great to have an option to disable the service env when creating the docker process.

The test I ran is as follows (kubelet 1.6):
I built a docker image with bash installed on an alpine base image.
When running "docker exec -it [docker-id] bash",
it took about 20s if there were 3000 services within the namespace;
it took less than 1s if I disabled the service env by modifying the code.

@bgrant0607
Member Author

cc @kubernetes/sig-network-feature-requests @kubernetes/sig-node-feature-requests

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2018
@bgrant0607
Member Author

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 23, 2018
@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@freehan freehan removed the triage/unresolved Indicates an issue that can not or will not be resolved. label May 16, 2019
@endocrimes
Member

Now that service links are optional (dealing with some of the performance and collision issues they used to cause), we're mostly waiting on a way to declare a subset of them for an application here. Since no headway seems to have been made on that recently, I'm going to bump this down to important-longterm.
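
For reference, the opt-out mentioned above is the pod-level enableServiceLinks field; a minimal sketch using the current core/v1 Go types (the pod name and image are placeholders):

import (
  corev1 "k8s.io/api/core/v1"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func podWithoutServiceLinks() *corev1.Pod {
  enable := false
  return &corev1.Pod{
    ObjectMeta: metav1.ObjectMeta{Name: "app"},
    Spec: corev1.PodSpec{
      // When false, the kubelet does not inject the {SVCNAME}_SERVICE_HOST/_PORT
      // env vars for the other services in the pod's namespace.
      EnableServiceLinks: &enable,
      Containers:         []corev1.Container{{Name: "app", Image: "example/app"}},
    },
  }
}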

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jun 24, 2021
@endocrimes
Member

/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 24, 2021
@MadhavJivrajani
Contributor

/remove-kind design
/kind feature

kind/design will soon be removed from k/k in favor of kind/feature. Relevant discussion can be found here: kubernetes/community#5641

@k8s-ci-robot k8s-ci-robot removed the kind/design Categorizes issue or PR as related to design. label Jun 29, 2021
@thockin
Member

thockin commented Jan 16, 2023

Realistically, no. Service links are not something we're going to do more to support.

@thockin thockin closed this as completed Jan 16, 2023