Proxy min sync period #35334
Conversation
I'll fix up the minor stuff in a bit.
Ugh, the splitting of component config + Go client is a mess.
@@ -77,7 +77,8 @@ func (s *ProxyServerConfig) AddFlags(fs *pflag.FlagSet) {
fs.StringVar(&s.HostnameOverride, "hostname-override", s.HostnameOverride, "If non-empty, will use this string as identification instead of the actual hostname.")
fs.Var(&s.Mode, "proxy-mode", "Which proxy mode to use: 'userspace' (older) or 'iptables' (faster). If blank, look at the Node object on the Kubernetes API and respect the '"+ExperimentalProxyModeAnnotation+"' annotation if provided. Otherwise use the best-available proxy (currently iptables). If the iptables proxy is selected, regardless of how, but the system's kernel or iptables versions are insufficient, this always falls back to the userspace proxy.")
fs.Int32Var(s.IPTablesMasqueradeBit, "iptables-masquerade-bit", util.Int32PtrDerefOr(s.IPTablesMasqueradeBit, 14), "If using the pure iptables proxy, the bit of the fwmark space to mark packets requiring SNAT with. Must be within the range [0, 31].")
- fs.DurationVar(&s.IPTablesSyncPeriod.Duration, "iptables-sync-period", s.IPTablesSyncPeriod.Duration, "How often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
+ fs.DurationVar(&s.IPTablesSyncPeriod.Duration, "iptables-sync-period", s.IPTablesSyncPeriod.Duration, "The Maximum interval of how often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
s/Maximum/maximum
@@ -80,6 +80,9 @@ func SetDefaults_KubeProxyConfiguration(obj *KubeProxyConfiguration) {
if obj.IPTablesSyncPeriod.Duration == 0 {
obj.IPTablesSyncPeriod = unversioned.Duration{Duration: 30 * time.Second}
}
+ if obj.IPTablesMinSyncPeriod.Duration == 0 {
+ obj.IPTablesMinSyncPeriod = unversioned.Duration{Duration: 10 * time.Second}
This means that changes in endpoints won't appear for 10 seconds? Too long. I was thinking 1-2 seconds max.
@@ -77,7 +77,8 @@ func (s *ProxyServerConfig) AddFlags(fs *pflag.FlagSet) {
fs.StringVar(&s.HostnameOverride, "hostname-override", s.HostnameOverride, "If non-empty, will use this string as identification instead of the actual hostname.")
fs.Var(&s.Mode, "proxy-mode", "Which proxy mode to use: 'userspace' (older) or 'iptables' (faster). If blank, look at the Node object on the Kubernetes API and respect the '"+ExperimentalProxyModeAnnotation+"' annotation if provided. Otherwise use the best-available proxy (currently iptables). If the iptables proxy is selected, regardless of how, but the system's kernel or iptables versions are insufficient, this always falls back to the userspace proxy.")
fs.Int32Var(s.IPTablesMasqueradeBit, "iptables-masquerade-bit", util.Int32PtrDerefOr(s.IPTablesMasqueradeBit, 14), "If using the pure iptables proxy, the bit of the fwmark space to mark packets requiring SNAT with. Must be within the range [0, 31].")
- fs.DurationVar(&s.IPTablesSyncPeriod.Duration, "iptables-sync-period", s.IPTablesSyncPeriod.Duration, "How often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
+ fs.DurationVar(&s.IPTablesSyncPeriod.Duration, "iptables-sync-period", s.IPTablesSyncPeriod.Duration, "The Maximum interval of how often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
+ fs.DurationVar(&s.IPTablesMinSyncPeriod.Duration, "iptables-min-sync-period", s.IPTablesMinSyncPeriod.Duration, "The Minimum interval of how often the iptables rules can be refreshed as endpoints and services change (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
s/Minimum/minimum
@@ -495,7 +498,10 @@ func (proxier *Proxier) OnServiceUpdate(allServices []api.Service) {
}
}
}
- proxier.syncProxyRules()
+ if expired := time.Since(proxier.lastSync); expired > proxier.minsyncPeriod {
This isn't right. If I am 10ms short of being able to refresh, I now have to wait for another event or for the longer sync-period. Don't you want a timer to wake up in proxier.minsyncPeriod - expired?
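A minimal sketch of the timer-based approach being suggested here: run the sync immediately if the minimum period has elapsed, otherwise arm a timer for minSyncPeriod - elapsed. The type and field names (proxier, lastSync, timer) are illustrative, not the PR's actual code.

package main

import (
	"fmt"
	"sync"
	"time"
)

// proxier is a hypothetical stand-in for the real iptables proxier; only the
// fields needed to illustrate the min-sync scheduling are shown.
type proxier struct {
	mu            sync.Mutex
	lastSync      time.Time
	minSyncPeriod time.Duration
	timer         *time.Timer
}

// maybeSync runs the sync immediately if the minimum interval has elapsed;
// otherwise it arms (or resets) a timer so the pending change is applied as
// soon as the interval expires instead of waiting for the next event.
func (p *proxier) maybeSync(syncProxyRules func()) {
	p.mu.Lock()
	defer p.mu.Unlock()

	elapsed := time.Since(p.lastSync)
	if elapsed >= p.minSyncPeriod {
		syncProxyRules()
		p.lastSync = time.Now()
		return
	}
	remaining := p.minSyncPeriod - elapsed
	if p.timer == nil {
		p.timer = time.AfterFunc(remaining, func() {
			p.mu.Lock()
			defer p.mu.Unlock()
			syncProxyRules()
			p.lastSync = time.Now()
		})
	} else {
		p.timer.Reset(remaining)
	}
}

func main() {
	p := &proxier{lastSync: time.Now(), minSyncPeriod: 2 * time.Second}
	p.maybeSync(func() { fmt.Println("synced") }) // too soon: scheduled for ~2s later
	time.Sleep(3 * time.Second)                   // let the deferred sync fire
}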
@@ -87,6 +87,7 @@ type Proxier struct {
mu sync.Mutex // protects serviceMap
serviceMap map[proxy.ServicePortName]*serviceInfo
syncPeriod time.Duration
+ minsyncPeriod time.Duration
s/minsync/minSync/
Force-pushed from 2474c7c to a293382.
@thockin comments addressed, rebased, tests now passing.
@@ -77,7 +77,8 @@ func (s *ProxyServerConfig) AddFlags(fs *pflag.FlagSet) {
fs.StringVar(&s.HostnameOverride, "hostname-override", s.HostnameOverride, "If non-empty, will use this string as identification instead of the actual hostname.")
fs.Var(&s.Mode, "proxy-mode", "Which proxy mode to use: 'userspace' (older) or 'iptables' (faster). If blank, look at the Node object on the Kubernetes API and respect the '"+ExperimentalProxyModeAnnotation+"' annotation if provided. Otherwise use the best-available proxy (currently iptables). If the iptables proxy is selected, regardless of how, but the system's kernel or iptables versions are insufficient, this always falls back to the userspace proxy.")
fs.Int32Var(s.IPTablesMasqueradeBit, "iptables-masquerade-bit", util.Int32PtrDerefOr(s.IPTablesMasqueradeBit, 14), "If using the pure iptables proxy, the bit of the fwmark space to mark packets requiring SNAT with. Must be within the range [0, 31].")
- fs.DurationVar(&s.IPTablesSyncPeriod.Duration, "iptables-sync-period", s.IPTablesSyncPeriod.Duration, "How often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
+ fs.DurationVar(&s.IPTablesSyncPeriod.Duration, "iptables-sync-period", s.IPTablesSyncPeriod.Duration, "The maximum interval of how often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
+ fs.DurationVar(&s.IPTablesMinSyncPeriod.Duration, "iptables-min-sync-period", s.IPTablesMinSyncPeriod.Duration, "The minimum interval of how often the iptables rules can be refreshed as endpoints and services change (e.g. '5s', '1m', '2h22m'). Must be greater than 0.")
What if this is > iptables-sync-period? The comment should say it can't be, and it should be validated.
Added logic check.
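For reference, a minimal sketch of the kind of validation being asked for here; the function name and error messages are illustrative, not the code actually added in the PR.

package main

import (
	"fmt"
	"time"
)

// validateSyncPeriods enforces the constraint discussed above: both periods
// must be positive, and the minimum sync period cannot exceed the full sync period.
func validateSyncPeriods(syncPeriod, minSyncPeriod time.Duration) error {
	if syncPeriod <= 0 {
		return fmt.Errorf("iptables-sync-period must be greater than 0, got %v", syncPeriod)
	}
	if minSyncPeriod <= 0 {
		return fmt.Errorf("iptables-min-sync-period must be greater than 0, got %v", minSyncPeriod)
	}
	if minSyncPeriod > syncPeriod {
		return fmt.Errorf("iptables-min-sync-period (%v) must not be greater than iptables-sync-period (%v)", minSyncPeriod, syncPeriod)
	}
	return nil
}

func main() {
	if err := validateSyncPeriods(30*time.Second, 2*time.Second); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("sync periods are valid")
}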
glog.V(6).Infof("Periodic sync")
proxier.Sync()
proxier.timer.Reset(proxier.syncPeriod)
Comments say proxier.timer is mutex protected. This doesn't take the mutex.
fixed.
proxier.syncProxyRules()
} else if proxier.timer != nil {
remaining := proxier.minSyncPeriod - expired
glog.V(4).Infof("Service update resetting synch period %v", remaining)
s/synch/sync
derp.
@@ -179,6 +180,7 @@ func createProxier(loadBalancer LoadBalancer, listenIP net.IP, iptables iptables
serviceMap: make(map[proxy.ServicePortName]*serviceInfo),
portMap: make(map[portMapKey]*portMapValue),
syncPeriod: syncPeriod,
+ minSyncPeriod: minSyncPeriod,
never used?
Plumbed if needed, but I should probably add a comment to denote that.
Here's what is going to happen. A Service update (create) will arrive. We sync it. Then an Endpoints update will arrive very soon thereafter. It will get stuck behind this delay. In practice, this adds at least 2 seconds to every Service creation. More if anyone tweaks that flag. Maybe not a huge deal, but worth thinking about. Can we do better? We could do a semaphore thing and start with something like 3 tokens. Add a new token every interval up to max 3. This would give low-load systems enough freedom to enact changes right away and would bandwidth-limit high-load systems. I'm just spitballing here...
It will add 2 seconds to the last service, not every one, as they will be batched, but I'm not terribly concerned about that. I'm far more concerned about controlling the thrashing at this point, and this puts a simple control knob on that.
I'd like to profile before adding any more complexity. For us, the largest concern is the high amount of churn on large clusters, not the update SLO on endpoints. In profiling, the longest delay is typically the docker pull, which is orders of magnitude greater than this delay. So keeping it simple makes the most sense, unless there is a use case that I'm missing. Other comments have been addressed.
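For illustration, a rough sketch of the semaphore/token-bucket idea @thockin floated above, using a buffered channel as the bucket; the burst size, refill interval, and names are examples, not anything taken from this PR.

package main

import (
	"fmt"
	"time"
)

// syncLimiter is a simple token bucket: it starts full so a quiet system can
// apply a small burst of changes immediately, and it refills one token per
// interval so a busy system is rate limited.
type syncLimiter struct {
	tokens chan struct{}
}

func newSyncLimiter(burst int, refill time.Duration) *syncLimiter {
	l := &syncLimiter{tokens: make(chan struct{}, burst)}
	for i := 0; i < burst; i++ {
		l.tokens <- struct{}{} // start with a full bucket
	}
	go func() {
		for range time.Tick(refill) {
			select {
			case l.tokens <- struct{}{}: // add a token, capped at the burst size
			default: // bucket already full
			}
		}
	}()
	return l
}

// accept blocks until a token is available.
func (l *syncLimiter) accept() { <-l.tokens }

func main() {
	limiter := newSyncLimiter(3, 2*time.Second)
	for i := 1; i <= 5; i++ {
		limiter.accept() // the first three run immediately, the rest every ~2s
		fmt.Printf("sync %d at %s\n", i, time.Now().Format("15:04:05.000"))
	}
}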
Force-pushed from e76e357 to e84aa4e.
Maybe I am misunderstanding the flow. The Service create will trigger OnServiceUpdate(), which will sync. That sets the timestamp. 10ms later, the Endpoints update will trigger OnEndpointsUpdate(), which will set a timer for 1990 ms later. If the normal pattern is these two operations being highly temporally coupled, it seems we should design for that case, no?
In a slow-churn, single-operator environment, perhaps.
@timothysc @thockin I did a patch yesterday to cut down on non-important Service/Endpoint changes, which does seem to reduce the need for resyncs somewhat. I'm currently looking into how to do an iptables diff to figure out if we really need to resync for the periodic loop, but that's a lot harder, as Kube doesn't write rules in the same way iptables-save reads them back, so we effectively have to create an iptables-save rule parser.
I know Tim said at one point that "by the time we get to iptables-restore the work is all pretty much done", but that's not the case. The Go side is fine; it's the kernel contention of iptables-restore that is the problem for most of our customers. When I benchmarked with 4.7 kernels and 6,000 or 8,000 iptables rules, I was getting hundreds of ms of runtime for a single iptables-restore. The Go-side time for generating those rules was incidental. So I think we can get some good mileage from simply not calling iptables-restore when we don't need to.
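A minimal sketch of the "skip iptables-restore when nothing changed" idea described above, comparing a hash of the generated ruleset against the last one pushed; this is illustrative only, not the patch being discussed.

package main

import (
	"crypto/sha256"
	"fmt"
)

// ruleSyncer hands the generated ruleset to restore() only when its content
// differs from what was pushed last time, so an unchanged ruleset never pays
// for an iptables-restore round trip.
type ruleSyncer struct {
	lastHash [sha256.Size]byte
}

func (s *ruleSyncer) syncIfChanged(rules []byte, restore func([]byte) error) error {
	h := sha256.Sum256(rules)
	if h == s.lastHash {
		return nil // nothing changed; skip the expensive restore
	}
	if err := restore(rules); err != nil {
		return err
	}
	s.lastHash = h
	return nil
}

func main() {
	s := &ruleSyncer{}
	rules := []byte("*nat\n-A KUBE-SERVICES ...\nCOMMIT\n")
	fakeRestore := func(b []byte) error {
		fmt.Printf("iptables-restore called with %d bytes\n", len(b))
		return nil
	}
	_ = s.syncIfChanged(rules, fakeRestore) // calls restore
	_ = s.syncIfChanged(rules, fakeRestore) // skipped, identical ruleset
}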
Force-pushed from 67ad40e to 1cb97b8.
Jenkins GCI GKE smoke e2e failed for commit 1cb97b8. Full PR test history. The magic incantation to run this job again is
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
@k8s-oncall @thockin @timothysc - it seems that this PR broke the gke-slow suite (it's been mostly red since then and nothing is merging).
@saad-ali ^^
@bprashanth I merged your PR #36282. Does this need to be reverted as well?
@saad-ali - I didn't see that PR; maybe that one actually fixes the problem.
Looks like #36282 just needs a timing shift, which makes sense, but I'm wondering how the assortment of other suites didn't catch this?
Yes, looks like PR #36282 took care of it. https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/kubernetes-e2e-gci-gke-slow/ has been green since.
Are you sure? AFK right now, but last I checked it flaked this morning and the PR went in last night. I just don't want to cry wolf without evidence :)
There are multiple slow suites. The one I looked at, kubernetes-e2e-gci-gke-slow, has been green since PR #36282 went in at 12:37 AM PDT. Same with kubernetes-e2e-gce-slow. However, kubernetes-e2e-gke-slow failed as recently as 2016-11-05 1:23 PM PDT. So it's possible PR #36282 just reduced the frequency of failure, but we need a smoking gun to revert this PR.
Did I just forget to read the time? I see 8 flakes since 12:30 AM last night.
Because of the way kube-proxy is structured, we're inserting a token into the bucket every 2 seconds for the iptables update, and sending a no-op update down the watch every second (2 every 2 seconds, from the scheduler and kube-controller-manager). So we end up filling the queue with no-op endpoint updates and blocking an interesting service update. @timothysc if you're interested in doing this right, you should look at the watch processing pipeline. This PR calls Accept() while holding the syncProxyRules lock (https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L782, https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L422), which is effectively a time.Sleep(1 or 2 seconds) given what I described above. The on-update functions receive a snapshot of endpoints/services on every update, meaning this can essentially go on forever just processing no-op updates from the endpoint handler, because Go's locks aren't fair.
Note that the "burst" will only help if you actually get 0 updates for a while, such that you accumulate enough tokens. What we were doing previously is also bad, but the no-op updates would run ASAP, leaving the channel empty for more important updates. I will probably send a PR to default it to the old behavior soonish. We can fix this as a bug during code freeze.
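A toy illustration of the failure mode described above: if a blocking rate-limiter wait happens while the proxier mutex is held, every other handler that only needs the lock still waits out the full token delay. All names here are illustrative, not the actual proxier code.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	waitForToken := func() { time.Sleep(2 * time.Second) } // stand-in for a blocking Accept()

	// An endpoints update takes the lock, then blocks waiting for a token.
	go func() {
		mu.Lock()
		defer mu.Unlock()
		fmt.Println("endpoints update: waiting for a token while holding the lock")
		waitForToken()
		fmt.Println("endpoints update: synced")
	}()

	time.Sleep(100 * time.Millisecond)

	// A service update only needs the lock, but still waits out the token
	// delay held by the goroutine above.
	start := time.Now()
	mu.Lock()
	fmt.Printf("service update: got the lock after %v\n", time.Since(start).Round(time.Millisecond))
	mu.Unlock()
}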
Ah, @MrHohn has empirical evidence #36281 (comment) |
iptables-restore is a very heavy operation, and depending on the kernel, the CPU, and the number of rules to restore, it can take a very long time (~500ms or more).
i7-5600U @ 2.6GHz: 700ms for 1000 services (2+4 cores+threads, kernel 4.6.6)
i7-4790 @ 3.6GHz: 270ms for 1000 services (4+8 cores+threads, kernel 4.6.7)
Other parts of Kubernetes (e.g. kubenet) might be running iptables-restore too, leading to some pileup as each iptables-restore waits for others to complete.
Related: kubernetes#26637
Related: kubernetes#33693
Related: kubernetes#35334
What this PR does / why we need it:
Gives the proxy the option to set a lower bound on the sync period when there is a high number of endpoint changes. This prevents excessive iptables rewrites under a number of conditions.
fixes #33693
and alleviates the symptoms of #26637
NOTE:
There are other minor fixes that I'm working on but keeping the PRs separate.
Release note:
Added iptables-min-sync-period (default: 2s) to the proxy to prevent excessive iptables writes
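For reference, a kube-proxy invocation using the new flag might look like the following (the values are examples, not recommended settings):

kube-proxy --proxy-mode=iptables --iptables-sync-period=30s --iptables-min-sync-period=2s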