Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kube-proxy: ICMP reject via LBs when no endpoints #74394

Merged
merged 3 commits into from
Mar 12, 2019

Conversation

thockin
Copy link
Member

@thockin thockin commented Feb 22, 2019

/kind bug

What this PR does / why we need it:

ICMP reject services with no endpoints through LBs. We already reject all other cases.

Fixes #48719

#72879 (by @marwinski) was almost identical, but I didn't see that until I had already written this. Full credit to @marwinski .

Does this PR introduce a user-facing change?:

Services of type=LoadBalancer which have no endpoints will now immediately ICMP reject connections, rather than time out.

@thockin thockin self-assigned this Feb 22, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 22, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Feb 22, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 22, 2019
@thockin thockin force-pushed the proxy-reject-lb-no-endpoints branch 2 times, most recently from 02a0bd9 to ef913cf Compare February 22, 2019 06:14
@thockin thockin added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 22, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 22, 2019
jig.Scale(ns1, 0)
jig.Scale(ns2, 0)

//FIXME: prove ICMP-reject
Copy link
Contributor

@danwinship danwinship Feb 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the internal case (#72561) we ended up just assuming that we'd get "connection refused" rather than "timed out".

You could just check the counters on the iptables rules; run iptables-save -c before and after, and verify that the packet count on the expected REJECT rule actually went up, and assume that if the right rule got hit, then the right end-user behavior happened too. (Though in fact, in this case the test would still pass even if the network plugin ate the ICMP reply...)

You could also check the global ICMP counters in netstat -s, which would let you confirm that the node had received an ICMP reject for some reason. That would have some unknowable number of false positives, but shouldn't have false negatives.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 27, 2019
@marwinski
Copy link

As this PR is virtually identical to #72879 I would appreciate if you name me as co-author and close the other one.

#72879 is similar, but I didn't see it until I had already written this. The testing is the tricky part.

Testing is indeed tricky but here is how I did it manually (with some proposal to automate). Our problem was not the "connection refused" but the stale conntrack entries that were caused by not rejecting the packets. Downside is that it works only for providers that have load balancers that do not terminate the TCP connections.

Prepare service

> kubectl run nginx --image nginx --replicas=1
deployment.apps/nginx created
> kubectl expose deployment nginx --type=LoadBalancer --port=80
service/nginx exposed
> kubectl describe service  nginx
...
LoadBalancer Ingress:     104.40.197.150
> curl 104.40.197.150
<!DOCTYPE html>
<html>
...

Log on to the host or a pod with host network (in this case calico-node)

Make sure there are not SYN_SENT entries:

> kubectl exec -it -n kube-system calico-node-bsn4f -c calico-node sh
/ # conntrack -L | grep SYN_SENT
conntrack v1.4.4 (conntrack-tools): 489 flow entries have been shown.
/ #

-> no stale SYN_SENT entries, all clean

Start curl endless loop

> while true; do curl 104.40.197.150 ; done
<!DOCTYPE html>
<html>
...

Scale replicas to 0

kubectl scale deployment nginx --replicas=0
deployment.extensions/nginx scaled

Go to host and look for stale entries:

/ # conntrack -L | grep SYN_SENT
tcp      6 119 SYN_SENT src=109.192.86.132 dst=104.40.197.150 sport=60659 dport=80 [UNREPLIED] src=104.40.197.150 dst=109.192.86.132 sport=80 dport=60659 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 434 flow entries have been shown.
/ #

With the fix we don't see any stale SYN_SENT entries.

This can be simplified in the following way, however that works only if the load balancer is not a terminating one:

  • create the service as shown above
  • Craft a connection request to the "external load balancer ip" but with the mac address of a cluster node
  • Without the rule you will see a hang when number of services is scaled to 0. You will also see stale SYN_SENT entries
  • With the rule you should see a REJECT and no stale conntrack entries.

This should be straightforward but will not work when NodePort is created for the LB (but is straightforward to automate).

@nikopen
Copy link
Contributor

nikopen commented Mar 4, 2019

/milestone v1.14

@k8s-ci-robot k8s-ci-robot added this to the v1.14 milestone Mar 4, 2019
@thockin thockin force-pushed the proxy-reject-lb-no-endpoints branch from 26d3382 to 67cb489 Compare March 5, 2019 23:51
@thockin
Copy link
Member Author

thockin commented Mar 5, 2019

Happy to name you as co-author (inasmuch as that's a thing).

We need to have automated testing for this. I have been jumping between tasks, and have not had much time on this one. I think I finally figured why the approach I had was not working. If this run fails the way I expect, then we can proceed.

@thockin thockin force-pushed the proxy-reject-lb-no-endpoints branch 2 times, most recently from e6eca1f to 2bf827b Compare March 6, 2019 02:14
rjaini added a commit to msazurestackworkloads/kubernetes that referenced this pull request May 6, 2019
* Fix kubernetes#73479 AWS NLB target groups missing tags

`elbv2.AddTags` doesn't seem to support assigning the same set of
tags to multiple resources at once leading to the following error:
  Error adding tags after modifying load balancer targets:
  "ValidationError: Only one resource can be tagged at a time"

This can happen when using AWS NLB with multiple listeners pointing
to different node ports.

When k8s creates a NLB it creates a target group per listener along
with installing security group ingress rules allowing the traffic to
reach the k8s nodes.

Unfortunately if those target groups are not tagged, k8s will not
manage them, thinking it is not the owner.

This small changes assigns tags one resource at a time instead of
batching them as before.

Signed-off-by: Brice Figureau <brice@daysofwonder.com>

* remove get azure accounts in the init process set timeout for get azure account operation

use const for timeout value

remove get azure accounts in the init process

add lock for account init

* add timeout in GetVolumeLimits operation

add timeout for getAllStorageAccounts

* add mixed protocol support for azure load balancer

* record event on endpoint update failure

* fix parse devicePath issue on Azure Disk

* Kubernetes version v1.12.7-beta.0 openapi-spec file updates

* add retry for detach azure disk

add more logging info in detach disk

add more logging for azure disk attach/detach

* Add/Update CHANGELOG-1.12.md for v1.12.6.

* Reduce cardinality of admission webhook metrics

* fix negative slice index error in keymutex

* Remove reflector metrics as they currently cause a memory leak

* Explicitly set GVK when sending objects to webhooks

* add Azure Container Registry anonymous repo support

apply fix for msi and fix test failure

* DaemonSet e2e: Update image and rolling upgrade test timeout

Use Nginx as the DaemonSet image instead of the ServeHostname image.
This was changed because the ServeHostname has a sleep after terminating
which makes it incompatible with the DaemonSet Rolling Upgrade e2e test.

In addition, make the DaemonSet Rolling Upgrade e2e test timeout a
function of the number of nodes that make up the cluster. This is
required because the more nodes there are, the longer the time it will
take to complete a rolling upgrade.

Signed-off-by: Alexander Brand <alexbrand09@gmail.com>

* Revert kubelet to default to ttl cache secret/configmap behavior

* cri_stats_provider: overload nil as 0 for exited containers stats

Always report 0 cpu/memory usage for exited containers to make
metrics-server work as expect.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>

* flush iptable chains first and then remove them

while cleaning up ipvs mode. flushing iptable chains first and then
remove the chains. this avoids trying to remove chains that are still
referenced by rules in other chains.

fixes kubernetes#70615

* Checks whether we have cached runtime state before starting a container that requests any device plugin resource. If not, re-issue Allocate grpc calls. This allows us to handle the edge case that a pod got assigned to a node even before it populates its extended resource capacity.

* Fix panic in kubectl cp command

* Augmenting API call retry in nodeinfomanager

* Bump debian-iptables to v11.0.1. Rebase docker image on debian-base:0.4.1

* Adding a check to make sure UseInstanceMetadata flag is true to get data from metadata.

* GetMountRefs fixed to handle corrupted mounts by treating it like an unmounted volume

* Update Cluster Autoscaler version to 1.12.3

* add module 'nf_conntrack' in ipvs prerequisite check

* Allow disable outbound snat when Azure standard load balancer is used

* Ensure Azure load balancer cleaned up on 404 or 403

* fix smb unmount issue on Windows

fix log warning

use IsCorruptedMnt in GetMountRefs on Windows

use errorno in IsCorruptedMnt check

fix comments: add more error code

add more error no checking

change year

fix comments

fix bazel error

fix bazel

fix bazel

fix bazel

revert bazel change

* kubelet: updated logic of verifying a static critical pod

- check if a pod is static by its static pod info
- meanwhile, check if a pod is critical by its corresponding mirror pod info

* Allow session affinity a period of time to setup for new services.

This is to deal with the flaky session affinity test.

* Restore username and password kubectl flags

* build/gci: bump CNI version to 0.7.5

* fix race condition issue for smb mount on windows

change var name

* allows configuring NPD release and flags on GCI and add cluster e2e test

* allows configuring NPD image version in node e2e test and fix the test

* bump repd min size in e2es

* Kubernetes version v1.12.8-beta.0 openapi-spec file updates

* Add/Update CHANGELOG-1.12.md for v1.12.7.

* stop vsphere cloud provider from spamming logs with `failed to patch IP` Fixes: kubernetes#75236

* Do not delete existing VS and RS when starting

* Fix updating 'currentMetrics' field for HPA with 'AverageValue' target

* Populate ClientCA in delegating auth setup

kubernetes#67768 accidentally removed population of the the ClientCA
in the delegating auth setup code.  This restores it.

* Update gcp images with security patches

[stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes.
[fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes.
[fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[metatada-proxy addon] Bump prometheus-to-sd v0.5.0 to pick up security fixes.

* Fix AWS driver fails to provision specified fsType

* Updated regional PD minimum size; changed regional PD failover test to use StorageClassTest to generate PVC template

* Bump debian-iptables to v11.0.2

* Avoid panic in cronjob sorting

This change handles the case where the ith cronjob may have its start
time set to nil.

Previously, the Less method could cause a panic in case the ith
cronjob had its start time set to nil, but the jth cronjob did not. It
would panic when calling Before on a nil StartTime.

* Add volume mode downgrade test: should not mount/map in <1.13

* disable HTTP2 ingress test

* ensuring that logic is checking for differences in listener

* Use Node-Problem-Detector v0.6.3 on GCI

* Delete only unscheduled pods if node doesn't exist anymore.

* proxy: Take into account exclude CIDRs while deleting legacy real servers

* Increase default maximumLoadBalancerRuleCount to 250

* kube-proxy: rename internal field for clarity

* kube-proxy: rename vars for clarity, fix err str

* kube-proxy: rename field for congruence

* kube-proxy: reject 0 endpoints on forward

Previously we only REJECTed on OUTPUT which works for packets from the
node but not for packets from pods on the node.

* kube-proxy: remove old cleanup rules

* Kube-proxy: REJECT LB IPs with no endpoints

We REJECT every other case.  Close this FIXME.

To get this to work in all cases, we have to process service in
filter.INPUT, since LB IPS might be manged as local addresses.

* Retool HTTP and UDP e2e utils

This is a prefactoring for followup changes that need to use very
similar but subtly different test.  Now it is more generic, though it
pushes a little logic up the stack.  That makes sense to me.

* Fix small race in e2e

Occasionally we get spurious errors about "no route to host" when we
race with kube-proxy.  This should reduce that.  It's mostly just log
noise.

* Fix Azure SLB support for multiple backend pools

Azure VM and vmssVM support multiple backend pools for the same SLB, but
not for different LBs.

* Revert "Merge pull request kubernetes#76529 from spencerhance/automated-cherry-pick-of-#72534-kubernetes#74394-upstream-release-1.12"

This reverts commit 535e3ad, reversing
changes made to 336d787.
k8s-ci-robot added a commit that referenced this pull request May 7, 2019
…#72534-#74394-upstream-release-1.13

Automated cherry pick of #72534: kube-proxy: rename internal field for clarity #74394: Fix small race in e2e
rjaini added a commit to msazurestackworkloads/kubernetes that referenced this pull request May 31, 2019
* Fix kubernetes#73479 AWS NLB target groups missing tags

`elbv2.AddTags` doesn't seem to support assigning the same set of
tags to multiple resources at once leading to the following error:
  Error adding tags after modifying load balancer targets:
  "ValidationError: Only one resource can be tagged at a time"

This can happen when using AWS NLB with multiple listeners pointing
to different node ports.

When k8s creates a NLB it creates a target group per listener along
with installing security group ingress rules allowing the traffic to
reach the k8s nodes.

Unfortunately if those target groups are not tagged, k8s will not
manage them, thinking it is not the owner.

This small changes assigns tags one resource at a time instead of
batching them as before.

Signed-off-by: Brice Figureau <brice@daysofwonder.com>

* remove get azure accounts in the init process set timeout for get azure account operation

use const for timeout value

remove get azure accounts in the init process

add lock for account init

* add timeout in GetVolumeLimits operation

add timeout for getAllStorageAccounts

* add mixed protocol support for azure load balancer

* record event on endpoint update failure

* fix parse devicePath issue on Azure Disk

* Fix scanning of failed targets

If a iSCSI target is down while a volume is attached, reading from
/sys/class/iscsi_host/host415/device/session383/connection383:0/iscsi_connection/connection383:0/address
fails with an error. Kubelet should assume that such target is not
available / logged in and try to relogin. Eventually, if such error
persists, it should continue mounting the volume if the other
paths are healthy instead of failing whole WaitForAttach().

* Kubernetes version v1.12.7-beta.0 openapi-spec file updates

* add retry for detach azure disk

add more logging info in detach disk

add more logging for azure disk attach/detach

* Add/Update CHANGELOG-1.12.md for v1.12.6.

* Reduce cardinality of admission webhook metrics

* fix negative slice index error in keymutex

* Remove reflector metrics as they currently cause a memory leak

* Explicitly set GVK when sending objects to webhooks

* add Azure Container Registry anonymous repo support

apply fix for msi and fix test failure

* DaemonSet e2e: Update image and rolling upgrade test timeout

Use Nginx as the DaemonSet image instead of the ServeHostname image.
This was changed because the ServeHostname has a sleep after terminating
which makes it incompatible with the DaemonSet Rolling Upgrade e2e test.

In addition, make the DaemonSet Rolling Upgrade e2e test timeout a
function of the number of nodes that make up the cluster. This is
required because the more nodes there are, the longer the time it will
take to complete a rolling upgrade.

Signed-off-by: Alexander Brand <alexbrand09@gmail.com>

* Revert kubelet to default to ttl cache secret/configmap behavior

* cri_stats_provider: overload nil as 0 for exited containers stats

Always report 0 cpu/memory usage for exited containers to make
metrics-server work as expect.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>

* flush iptable chains first and then remove them

while cleaning up ipvs mode. flushing iptable chains first and then
remove the chains. this avoids trying to remove chains that are still
referenced by rules in other chains.

fixes kubernetes#70615

* Checks whether we have cached runtime state before starting a container that requests any device plugin resource. If not, re-issue Allocate grpc calls. This allows us to handle the edge case that a pod got assigned to a node even before it populates its extended resource capacity.

* Fix panic in kubectl cp command

* Augmenting API call retry in nodeinfomanager

* Bump debian-iptables to v11.0.1. Rebase docker image on debian-base:0.4.1

* Adding a check to make sure UseInstanceMetadata flag is true to get data from metadata.

* GetMountRefs fixed to handle corrupted mounts by treating it like an unmounted volume

* Update Cluster Autoscaler version to 1.12.3

* add module 'nf_conntrack' in ipvs prerequisite check

* Allow disable outbound snat when Azure standard load balancer is used

* Ensure Azure load balancer cleaned up on 404 or 403

* fix smb unmount issue on Windows

fix log warning

use IsCorruptedMnt in GetMountRefs on Windows

use errorno in IsCorruptedMnt check

fix comments: add more error code

add more error no checking

change year

fix comments

fix bazel error

fix bazel

fix bazel

fix bazel

revert bazel change

* kubelet: updated logic of verifying a static critical pod

- check if a pod is static by its static pod info
- meanwhile, check if a pod is critical by its corresponding mirror pod info

* Allow session affinity a period of time to setup for new services.

This is to deal with the flaky session affinity test.

* Restore username and password kubectl flags

* build/gci: bump CNI version to 0.7.5

* fix race condition issue for smb mount on windows

change var name

* allows configuring NPD release and flags on GCI and add cluster e2e test

* allows configuring NPD image version in node e2e test and fix the test

* bump repd min size in e2es

* Kubernetes version v1.12.8-beta.0 openapi-spec file updates

* Add/Update CHANGELOG-1.12.md for v1.12.7.

* stop vsphere cloud provider from spamming logs with `failed to patch IP` Fixes: kubernetes#75236

* Do not delete existing VS and RS when starting

* Fix updating 'currentMetrics' field for HPA with 'AverageValue' target

* Populate ClientCA in delegating auth setup

kubernetes#67768 accidentally removed population of the the ClientCA
in the delegating auth setup code.  This restores it.

* Update gcp images with security patches

[stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes.
[fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes.
[fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[metatada-proxy addon] Bump prometheus-to-sd v0.5.0 to pick up security fixes.

* Fix AWS driver fails to provision specified fsType

* Updated regional PD minimum size; changed regional PD failover test to use StorageClassTest to generate PVC template

* Bump debian-iptables to v11.0.2

* Avoid panic in cronjob sorting

This change handles the case where the ith cronjob may have its start
time set to nil.

Previously, the Less method could cause a panic in case the ith
cronjob had its start time set to nil, but the jth cronjob did not. It
would panic when calling Before on a nil StartTime.

* Add volume mode downgrade test: should not mount/map in <1.13

* disable HTTP2 ingress test

* ensuring that logic is checking for differences in listener

* Use Node-Problem-Detector v0.6.3 on GCI

* Delete only unscheduled pods if node doesn't exist anymore.

* proxy: Take into account exclude CIDRs while deleting legacy real servers

* Increase default maximumLoadBalancerRuleCount to 250

* kube-proxy: rename internal field for clarity

* kube-proxy: rename vars for clarity, fix err str

* kube-proxy: rename field for congruence

* kube-proxy: reject 0 endpoints on forward

Previously we only REJECTed on OUTPUT which works for packets from the
node but not for packets from pods on the node.

* kube-proxy: remove old cleanup rules

* Kube-proxy: REJECT LB IPs with no endpoints

We REJECT every other case.  Close this FIXME.

To get this to work in all cases, we have to process service in
filter.INPUT, since LB IPS might be manged as local addresses.

* Retool HTTP and UDP e2e utils

This is a prefactoring for followup changes that need to use very
similar but subtly different test.  Now it is more generic, though it
pushes a little logic up the stack.  That makes sense to me.

* Fix small race in e2e

Occasionally we get spurious errors about "no route to host" when we
race with kube-proxy.  This should reduce that.  It's mostly just log
noise.

* Fix Azure SLB support for multiple backend pools

Azure VM and vmssVM support multiple backend pools for the same SLB, but
not for different LBs.

* Set CPU metrics for init containers under containerd

Copies PR kubernetes#76503 for release-1.12.

metrics-server doesn't return metrics for pods with init containers
under containerd because they have incomplete CPU metrics returned by
the kubelet /stats/summary API.

This problem has been fixed in 1.14 (kubernetes#74336), but the cherry-picks
dropped the usageNanoCores metric.

This change adds the missing usageNanoCores metric for init containers
in Kubernetes v1.12.

Fixes kubernetes#76292

* Restore metrics-server using of IP addresses

This preference list matches is used to pick prefered field from k8s
node object. It was introduced in metrics-server 0.3 and changed default
behaviour to use DNS instead of IP addresses. It was merged into k8s
1.12 and caused breaking change by introducing dependency on DNS
configuration.

* Revert "Merge pull request kubernetes#76529 from spencerhance/automated-cherry-pick-of-#72534-kubernetes#74394-upstream-release-1.12"

This reverts commit 535e3ad, reversing
changes made to 336d787.

* Kubernetes version v1.12.9-beta.0 openapi-spec file updates

* Add/Update CHANGELOG-1.12.md for v1.12.8.

* Upgrade compute API to version 2019-03-01

* Replace vmss update API with instance-level update API

* Cleanup codes that not required any more

* Add unit tests

* Update vendors

* Update Cluster Autoscaler to 1.12.5

* add shareName param in azure file storage class

skip create azure file if it exists

remove comments

* Create the "internal" firewall rule for kubemark master.

This is equivalent to the "internal" firewall rule that is created for
the regular masters.
The main reason for doing it is to allow prometheus scraping metrics
from various kubemark master components, e.g. kubelet.

Ref. kubernetes/perf-tests#503

* refactor detach azure disk retry operation

* move disk lock process to azure cloud provider

fix comments

fix import keymux check error

add unit test for attach/detach disk funcs

fix bazel issue

rebase

* fix disk list corruption issue

* Fix verify godeps failure for 1.12

github.com/evanphx/json-patch added a new tag at the same sha this
morning: https://github.com/evanphx/json-patch/releases/tag/v4.2.0

This confused godeps. This PR updates our file to match godeps
expectation.

Fixes issue 77238

* Upgrade Stackdriver Logging Agent addon image from 1.6.0 to 1.6.8.

* Test kubectl cp escape

* Properly handle links in tar

* use k8s.gcr.io/pause instead of kubernetes/pause

* Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2

* Error when etcd3 watch finds delete event with nil prevKV

* Make CreatePrivilegedPSPBinding reentrant

Make CreatePrivilegedPSPBinding reentrant so tests using it (e.g. DNS) can be
executed more than once against a cluster. Without this change, such tests will
fail because the PSP already exists, short circuiting test setup.

* check if Memory is not nil for container stats

* In GuaranteedUpdate, retry on any error if we are working with stale data

* BoundServiceAccountTokenVolume: fix InClusterConfig

* fix CVE-2019-11244: `kubectl --http-cache=<world-accessible dir>` creates world-writeable cached schema files

* Terminate watchers when watch cache is destroyed

* honor overridden tokenfile, add InClusterConfig override tests

* fix incorrect prometheus metrics
rjaini added a commit to msazurestackworkloads/kubernetes that referenced this pull request Jul 11, 2019
* Fix bug with volume getting marked as not in-use with pending op

Add test for verifying volume detach

* Fix flake with e2e test that checks detach while mount in progress

A volume can show up as in-use even before it gets attached
to the node.

* Fix kubernetes#73479 AWS NLB target groups missing tags

`elbv2.AddTags` doesn't seem to support assigning the same set of
tags to multiple resources at once leading to the following error:
  Error adding tags after modifying load balancer targets:
  "ValidationError: Only one resource can be tagged at a time"

This can happen when using AWS NLB with multiple listeners pointing
to different node ports.

When k8s creates a NLB it creates a target group per listener along
with installing security group ingress rules allowing the traffic to
reach the k8s nodes.

Unfortunately if those target groups are not tagged, k8s will not
manage them, thinking it is not the owner.

This small changes assigns tags one resource at a time instead of
batching them as before.

Signed-off-by: Brice Figureau <brice@daysofwonder.com>

* remove get azure accounts in the init process set timeout for get azure account operation

use const for timeout value

remove get azure accounts in the init process

add lock for account init

* add timeout in GetVolumeLimits operation

add timeout for getAllStorageAccounts

* add mixed protocol support for azure load balancer

* record event on endpoint update failure

* fix parse devicePath issue on Azure Disk

* Fix scanning of failed targets

If a iSCSI target is down while a volume is attached, reading from
/sys/class/iscsi_host/host415/device/session383/connection383:0/iscsi_connection/connection383:0/address
fails with an error. Kubelet should assume that such target is not
available / logged in and try to relogin. Eventually, if such error
persists, it should continue mounting the volume if the other
paths are healthy instead of failing whole WaitForAttach().

* Kubernetes version v1.12.7-beta.0 openapi-spec file updates

* add retry for detach azure disk

add more logging info in detach disk

add more logging for azure disk attach/detach

* Add/Update CHANGELOG-1.12.md for v1.12.6.

* Reduce cardinality of admission webhook metrics

* fix negative slice index error in keymutex

* Remove reflector metrics as they currently cause a memory leak

* Explicitly set GVK when sending objects to webhooks

* add Azure Container Registry anonymous repo support

apply fix for msi and fix test failure

* DaemonSet e2e: Update image and rolling upgrade test timeout

Use Nginx as the DaemonSet image instead of the ServeHostname image.
This was changed because the ServeHostname has a sleep after terminating
which makes it incompatible with the DaemonSet Rolling Upgrade e2e test.

In addition, make the DaemonSet Rolling Upgrade e2e test timeout a
function of the number of nodes that make up the cluster. This is
required because the more nodes there are, the longer the time it will
take to complete a rolling upgrade.

Signed-off-by: Alexander Brand <alexbrand09@gmail.com>

* Revert kubelet to default to ttl cache secret/configmap behavior

* cri_stats_provider: overload nil as 0 for exited containers stats

Always report 0 cpu/memory usage for exited containers to make
metrics-server work as expect.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>

* flush iptable chains first and then remove them

while cleaning up ipvs mode. flushing iptable chains first and then
remove the chains. this avoids trying to remove chains that are still
referenced by rules in other chains.

fixes kubernetes#70615

* Checks whether we have cached runtime state before starting a container that requests any device plugin resource. If not, re-issue Allocate grpc calls. This allows us to handle the edge case that a pod got assigned to a node even before it populates its extended resource capacity.

* Fix panic in kubectl cp command

* Augmenting API call retry in nodeinfomanager

* Bump debian-iptables to v11.0.1. Rebase docker image on debian-base:0.4.1

* Adding a check to make sure UseInstanceMetadata flag is true to get data from metadata.

* GetMountRefs fixed to handle corrupted mounts by treating it like an unmounted volume

* Update Cluster Autoscaler version to 1.12.3

* add module 'nf_conntrack' in ipvs prerequisite check

* Allow disable outbound snat when Azure standard load balancer is used

* Ensure Azure load balancer cleaned up on 404 or 403

* fix smb unmount issue on Windows

fix log warning

use IsCorruptedMnt in GetMountRefs on Windows

use errorno in IsCorruptedMnt check

fix comments: add more error code

add more error no checking

change year

fix comments

fix bazel error

fix bazel

fix bazel

fix bazel

revert bazel change

* kubelet: updated logic of verifying a static critical pod

- check if a pod is static by its static pod info
- meanwhile, check if a pod is critical by its corresponding mirror pod info

* Allow session affinity a period of time to setup for new services.

This is to deal with the flaky session affinity test.

* Restore username and password kubectl flags

* build/gci: bump CNI version to 0.7.5

* fix race condition issue for smb mount on windows

change var name

* allows configuring NPD release and flags on GCI and add cluster e2e test

* allows configuring NPD image version in node e2e test and fix the test

* bump repd min size in e2es

* Kubernetes version v1.12.8-beta.0 openapi-spec file updates

* Add/Update CHANGELOG-1.12.md for v1.12.7.

* stop vsphere cloud provider from spamming logs with `failed to patch IP` Fixes: kubernetes#75236

* Do not delete existing VS and RS when starting

* Fix updating 'currentMetrics' field for HPA with 'AverageValue' target

* Populate ClientCA in delegating auth setup

kubernetes#67768 accidentally removed population of the the ClientCA
in the delegating auth setup code.  This restores it.

* Update gcp images with security patches

[stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes.
[fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes.
[fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[metatada-proxy addon] Bump prometheus-to-sd v0.5.0 to pick up security fixes.

* Fix AWS driver fails to provision specified fsType

* Updated regional PD minimum size; changed regional PD failover test to use StorageClassTest to generate PVC template

* Bump debian-iptables to v11.0.2

* Avoid panic in cronjob sorting

This change handles the case where the ith cronjob may have its start
time set to nil.

Previously, the Less method could cause a panic in case the ith
cronjob had its start time set to nil, but the jth cronjob did not. It
would panic when calling Before on a nil StartTime.

* Add volume mode downgrade test: should not mount/map in <1.13

* disable HTTP2 ingress test

* ensuring that logic is checking for differences in listener

* Use Node-Problem-Detector v0.6.3 on GCI

* Delete only unscheduled pods if node doesn't exist anymore.

* proxy: Take into account exclude CIDRs while deleting legacy real servers

* Increase default maximumLoadBalancerRuleCount to 250

* kube-proxy: rename internal field for clarity

* kube-proxy: rename vars for clarity, fix err str

* kube-proxy: rename field for congruence

* kube-proxy: reject 0 endpoints on forward

Previously we only REJECTed on OUTPUT which works for packets from the
node but not for packets from pods on the node.

* kube-proxy: remove old cleanup rules

* Kube-proxy: REJECT LB IPs with no endpoints

We REJECT every other case.  Close this FIXME.

To get this to work in all cases, we have to process service in
filter.INPUT, since LB IPS might be manged as local addresses.

* Retool HTTP and UDP e2e utils

This is a prefactoring for followup changes that need to use very
similar but subtly different test.  Now it is more generic, though it
pushes a little logic up the stack.  That makes sense to me.

* Fix small race in e2e

Occasionally we get spurious errors about "no route to host" when we
race with kube-proxy.  This should reduce that.  It's mostly just log
noise.

* Fix Azure SLB support for multiple backend pools

Azure VM and vmssVM support multiple backend pools for the same SLB, but
not for different LBs.

* Set CPU metrics for init containers under containerd

Copies PR kubernetes#76503 for release-1.12.

metrics-server doesn't return metrics for pods with init containers
under containerd because they have incomplete CPU metrics returned by
the kubelet /stats/summary API.

This problem has been fixed in 1.14 (kubernetes#74336), but the cherry-picks
dropped the usageNanoCores metric.

This change adds the missing usageNanoCores metric for init containers
in Kubernetes v1.12.

Fixes kubernetes#76292

* Restore metrics-server using of IP addresses

This preference list matches is used to pick prefered field from k8s
node object. It was introduced in metrics-server 0.3 and changed default
behaviour to use DNS instead of IP addresses. It was merged into k8s
1.12 and caused breaking change by introducing dependency on DNS
configuration.

* Revert "Merge pull request kubernetes#76529 from spencerhance/automated-cherry-pick-of-#72534-kubernetes#74394-upstream-release-1.12"

This reverts commit 535e3ad, reversing
changes made to 336d787.

* Kubernetes version v1.12.9-beta.0 openapi-spec file updates

* Add/Update CHANGELOG-1.12.md for v1.12.8.

* Upgrade compute API to version 2019-03-01

* Replace vmss update API with instance-level update API

* Cleanup codes that not required any more

* Add unit tests

* Update vendors

* Update Cluster Autoscaler to 1.12.5

* add shareName param in azure file storage class

skip create azure file if it exists

remove comments

* Create the "internal" firewall rule for kubemark master.

This is equivalent to the "internal" firewall rule that is created for
the regular masters.
The main reason for doing it is to allow prometheus scraping metrics
from various kubemark master components, e.g. kubelet.

Ref. kubernetes/perf-tests#503

* refactor detach azure disk retry operation

* move disk lock process to azure cloud provider

fix comments

fix import keymux check error

add unit test for attach/detach disk funcs

fix bazel issue

rebase

* fix disk list corruption issue

* Fix verify godeps failure for 1.12

github.com/evanphx/json-patch added a new tag at the same sha this
morning: https://github.com/evanphx/json-patch/releases/tag/v4.2.0

This confused godeps. This PR updates our file to match godeps
expectation.

Fixes issue 77238

* Upgrade Stackdriver Logging Agent addon image from 1.6.0 to 1.6.8.

* Test kubectl cp escape

* Properly handle links in tar

* use k8s.gcr.io/pause instead of kubernetes/pause

* Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2

* Error when etcd3 watch finds delete event with nil prevKV

* Make CreatePrivilegedPSPBinding reentrant

Make CreatePrivilegedPSPBinding reentrant so tests using it (e.g. DNS) can be
executed more than once against a cluster. Without this change, such tests will
fail because the PSP already exists, short circuiting test setup.

* check if Memory is not nil for container stats

* Bump ip-masq-agent version to v2.3.0

* In GuaranteedUpdate, retry on any error if we are working with stale data

* BoundServiceAccountTokenVolume: fix InClusterConfig

* fix CVE-2019-11244: `kubectl --http-cache=<world-accessible dir>` creates world-writeable cached schema files

* Terminate watchers when watch cache is destroyed

* honor overridden tokenfile, add InClusterConfig override tests

* fix incorrect prometheus metrics

* Kubernetes version v1.12.10-beta.0 openapi-spec file updates

* Add/Update CHANGELOG-1.12.md for v1.12.9.

* fix azure retry issue when return 2XX with error

fix comments

* Disable graceful termination for udp

* fix: update vm if detach a non-existing disk

fix gofmt issue

fix build error

* Fix incorrect procMount defaulting

* ipvs: fix string check for IPVS protocol during graceful termination

Signed-off-by: Andrew Sy Kim <kiman@vmware.com>

* kubeadm: apply taints on non-control-plane node join

This backports a change made in 1.13 which fixes the process applying
taints when joining worker nodes.

* fix flexvol stuck issue due to corrupted mnt point

fix comments about PathExists

fix comments

revert change in PathExists func

* Avoid the default server mux

* Default resourceGroup should be used when value of annotation azure-load-balancer-resource-group is empty string
@thockin thockin deleted the proxy-reject-lb-no-endpoints branch August 14, 2019 17:42
tssurya added a commit to tssurya/kubernetes that referenced this pull request Oct 19, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.
tssurya added a commit to tssurya/kubernetes that referenced this pull request Oct 26, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.

Conflicts:
    pkg/proxy/iptables/proxier.go
Minor conflict due to 1f7ea16
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/kubernetes that referenced this pull request Oct 27, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.

Conflicts:
    pkg/proxy/iptables/proxier.go
Minor conflict due to 1f7ea16
tssurya added a commit to tssurya/kubernetes that referenced this pull request Nov 3, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.

Conflicts:
    pkg/proxy/iptables/proxier.go
Minor conflict due to 1f7ea16
tssurya added a commit to tssurya/kubernetes that referenced this pull request Nov 9, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.

Conflicts:
    pkg/proxy/iptables/proxier.go
Minor conflict due to 1f7ea16
mrmassis pushed a commit to mrmassis/kubernetes that referenced this pull request Nov 12, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.
tssurya added a commit to tssurya/kubernetes that referenced this pull request Nov 20, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.

Conflicts:
    pkg/proxy/iptables/proxier.go
Minor conflict due to 1f7ea16
tssurya added a commit to tssurya/kubernetes that referenced this pull request Nov 20, 2020
In kubernetes#56164, we had split the reject rules for non-ep existing services
into KUBE-EXTERNAL-SERVICES chain in order to avoid calling KUBE-SERVICES
from INPUT. However in kubernetes#74394 KUBE-SERVICES was re-added into INPUT.

As noted in kubernetes#56164, kernel is sensitive to the size of INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
instead adds the lb reject rule to the KUBE-EXTERNAL-SERVICES chain which will be
called from INPUT and FORWARD.

Conflicts:
    pkg/proxy/iptables/proxier.go
Minor conflict due to 1f7ea16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants