Services v2 (ip-per-service) #1402
Conversation
Force-pushed from f23d4f6 to fa5e46d
Anyone who was looking at this - I think it is almost done. I need to do better unit and e2e testing of the final solution, and there are a couple of FIXMEs left (any salt experts want to help?). I am happy to start entertaining review comments at this time.
Force-pushed from fa5e46d to b784e91
@eyakubovich We should consider how this affects flannel
Force-pushed from bcb4538 to 0d8b76d
At first look I think it should be OK. If flannel is executed with --ip-masq, it installs a masquerade rule for traffic going from a flannel network to the outside world (same stuff Docker does). This PR is all about DNAT, while flannel is doing SNAT, so they should not conflict.
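For context, a rough sketch of the two kinds of rules being contrasted here (the overlay CIDR, portal IP, node IP, and ports are made up, not the exact rules flannel or the proxy installs):

```bash
# Rough sketch only -- the overlay CIDR, portal IP, and ports are made up.

# Flannel's --ip-masq adds a SNAT/MASQUERADE rule in POSTROUTING, so traffic
# leaving the overlay network for the outside world gets the node's address:
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE

# The service proxy's portal rules instead rewrite the destination on the way
# in, steering traffic aimed at a service's portal IP to the proxy:
iptables -t nat -A PREROUTING -d 10.0.0.10/32 -p tcp --dport 80 \
  -j DNAT --to-destination 10.240.1.2:41892
```

One set of rules only touches the source address of egress traffic, the other only the destination of ingress traffic, which is why they should coexist cleanly.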
In docs/services.md:
+We expect that using iptables for portals will work at small scale, but will
+not scale to large clusters with thousands of services. We hope that, by that
+time, we will have moved to a model wherein each pod declares which services
This is one possible solution, but I don't think it's a foregone conclusion that we'll go that way.
If we intercepted DNS resolutions, could we lazily set up iptables rules at DNS resolution time?
We could make that optimization if we decided it was important to keep global portals, no pre-declarations, and scaling was actually a problem. IMO it's too clever by half, but it's worth keeping in our pocket, I suppose.
I agree that we shouldn't do the optimization now. My point was that requiring users to declare services used isn't the only solution, and I have concerns about the impact that would have on ease of use and use-case coverage.
Maybe we should sketch out what the user flow for both user-directed services and global services looks like and determine whether we want both or just one. We should also discuss how the ordering problems, as implemented today for global services and pods, affect actual user interaction.
Can you explain a bit more what you want to see? I'm too close to it, I think.
Force-pushed from 77d42a5 to 690dba9
I know this is an enormous PR - I can break it up into smaller PRs, though the commits are pretty clean. I'd really appreciate someone looking at at least the first few commits, though I think it is functionally complete modulo some improved testing. @smarterclayton do you have some iptables comprehension? Any other volunteers?
@thockin I can take a look.
I welcome all comers :) The first 4 or 5 commits are self-contained and can be reviewed on their own.
@@ -96,9 +96,9 @@ There are 4 ways that a container manifest can be provided to the Kubelet:
### Kubernetes Proxy
-Each node also runs a simple network proxy. This reflects `services` as defined in the Kubernetes API on each node and can do simple TCP stream forwarding or round robin TCP forwarding across a set of backends.
+Each node also runs a simple network proxy. This reflects `services` (see [here](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/serices.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends.
FYI, I have found that relative references work better than absolute ones -- docs/....md. Also, you have a typo: serices
Fixed in next push
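For context on the sentence being edited above: "round robin" just means the proxy cycles through a service's current backends for new connections. The real implementation is in the Go proxy; the shell sketch below, with made-up backend endpoints, only illustrates the selection pattern.

```bash
# Toy round-robin selection over a service's backends.
# Endpoints are made up; the actual proxy does this in Go, not shell.
backends=("10.244.1.5:9376" "10.244.2.7:9376" "10.244.3.2:9376")
next=0

pick_backend() {
  local choice="${backends[$next]}"
  next=$(( (next + 1) % ${#backends[@]} ))
  echo "$choice"
}

# Five new "connections" rotate evenly across the three backends.
for i in 1 2 3 4 5; do
  pick_backend
done
```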
# $3: service port
function wait_for_service_down() {
  for i in $(seq 1 20); do
    $(ssh-to-node "${test_node}" "
Same as above...
back at you :)
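For readers following along, the test excerpt above is cut off by the diff context. A minimal sketch of the polling pattern it implements could look like the following; the ssh-to-node helper, ${test_node}, and the "$3: service port" convention come from the excerpt, while the curl probe, the other arguments, and the timings are assumptions rather than the test's actual code.

```bash
# Sketch of a "wait until the service stops answering" helper.
# $1: service name (for logging), $2: portal IP, $3: service port
# ($1 and $2 are assumed; only $3 is documented in the excerpt above.)
function wait_for_service_down() {
  for i in $(seq 1 20); do
    # Probe from the test node; a failed probe means the service is gone.
    if ! ssh-to-node "${test_node}" \
        "curl -s --connect-timeout 1 http://${2}:${3}/" >/dev/null 2>&1; then
      echo "${1} is down"
      return 0
    fi
    sleep 2
  done
  echo "${1} still answering after 20 attempts" >&2
  return 1
}
```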
The e2e test passes on GCE every time. It still fails sometimes on Vagrant, failing to connect in places that just shouldn't fail. I give up and am punting it to the Vagrant folks. @derekwaynecarr @pweil- - your ball.
I am fine merging with the current behavior, and we can separately address the consistency of the results. There has been some talk about looking into a libvirt-based VM to improve performance over VirtualBox for Linux-based devs, or kernel devs in general.
As far as I can tell this PR is done.
Force-pushed from 41aef32 to 9b5162c
I agree with @derekwaynecarr. We can work on the specific env issues separately. LGTM
Bombs away!
DO NOT COMMIT. Needs test updates and a new e2e test. Some FIXMEs still. 95% complete.
Add support for IP-per-service behavior. This creates virtual IP addresses for each service, and uses iptables to route traffic "magically".
The first few commits are relatively independent, and could be extracted to different PRs, if you feel like making me suffer.
See #1107 for background.
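To make the "magic" in the description a bit more concrete: the general shape of the approach is one NAT rule per service portal, redirecting traffic for the portal IP and port to the local proxy. The chain name, portal IPs, and proxy ports below are invented for illustration and are not the exact rules this PR emits; REDIRECT here is just the DNAT-to-localhost special case.

```bash
# Illustration only -- chain name, portal IPs, and proxy ports are made up.
# One rule per service portal: traffic to the service's virtual IP:port is
# sent to the port where the local proxy listens for that service.
iptables -t nat -N EXAMPLE-PORTALS 2>/dev/null || true
iptables -t nat -A PREROUTING -j EXAMPLE-PORTALS
iptables -t nat -A OUTPUT -j EXAMPLE-PORTALS

while read -r portal_ip svc_port proxy_port; do
  iptables -t nat -A EXAMPLE-PORTALS -d "${portal_ip}/32" -p tcp --dport "${svc_port}" \
    -j REDIRECT --to-ports "${proxy_port}"
done <<'EOF'
10.0.0.10 80 40001
10.0.0.11 3306 40002
EOF
```

Because each service needs its own rule, the rule count grows linearly with the number of services, which is the scaling concern raised in the docs/services.md excerpt earlier in the thread.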