Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paramaterize stickyMaxAgeMinutes for service in API #49850

Merged

Conversation

m1093782566
Copy link
Contributor

@m1093782566 m1093782566 commented Jul 30, 2017

What this PR does / why we need it:

Currently I find stickyMaxAgeMinutes for a session affinity type service is hard code to 180min. There is a TODO comment, see

https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L205

I think the seesion sticky max time varies from service to service and users may not aware of it since it's hard coded in all proxier.go - iptables, userspace and winuserspace.

Once we parameterize it in API, users can set/get the values for their different services.

Perhaps, we can introduce a new field api.ClientIPAffinityConfig in api.ServiceSpec.

There is an initial discussion about it in sig-network group. See,

https://groups.google.com/forum/#!topic/kubernetes-sig-network/i-LkeHrjs80

Which issue this PR fixes:

fixes #49831

Special notes for your reviewer:

Release note:

Paramaterize session affinity timeout seconds in service API for Client IP based session affinity.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 30, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @m1093782566. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Jul 30, 2017
@m1093782566 m1093782566 force-pushed the service-session-timeout branch 2 times, most recently from 829d7be to 822d765 Compare July 30, 2017 14:35
@m1093782566
Copy link
Contributor Author

/assign @thockin

@m1093782566 m1093782566 changed the title [WIP] Paramaterize stickyMaxAgeMinutes for service in API Paramaterize stickyMaxAgeMinutes for service in API Jul 30, 2017
@thockin
Copy link
Member

thockin commented Jul 30, 2017

@k8s-bot ok to test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 30, 2017
pkg/api/types.go Outdated
@@ -2573,6 +2573,13 @@ const (
ServiceAffinityNone ServiceAffinity = "None"
)

type ClientIPAffinityConfig struct {
// TimeoutSeconds specifies the seconds of ClientIP type session sticky time.
// The value should be specified if ServiceAffinity == "ClientIP".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment that it must be > 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

pkg/api/types.go Outdated
@@ -2700,6 +2707,10 @@ type ServiceSpec struct {
// +optional
SessionAffinity ServiceAffinity

// Optional: Represents the configurations of Client IP based session affinity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want a struct, for future safety.

type ServiceAffinityConfig struct {
  // only one of the following may be set
  ClientIPConfig *ServiceAffinityConfigClientIP
}

type ServiceAffinityConfigClientIP struct {
  TimeoutSeconds int
}

Then embed ServiceAffinityConfig

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should see if there is a precedent for "Config" vs "Params"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is no precedent for "Config" or "Params". Does it matter?

return allErrs
}

allErrs = append(allErrs, ValidateNonnegativeField(int64(config.TimeoutSeconds), fldPath.Child("timeoutSeconds"))...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need positive, not just non-negative. 0 is no good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Validate (value >0 && value <= maxTimeout)

@@ -2844,6 +2854,8 @@ func ValidateService(service *api.Service) field.ErrorList {
allErrs = append(allErrs, field.NotSupported(specPath.Child("sessionAffinity"), service.Spec.SessionAffinity, supportedSessionAffinityType.List()))
}

allErrs = append(allErrs, validateClientIPAffinityConfig(service.Spec.ClientIPAffinityConfig, specPath.Child("clientIPAffinityConfig"))...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check which sessionAffinity setting is in use. If it is None, don't verify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if it is ClientIP, require that the config is non-nil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we should probably document and enforce an upper bound. 86400 for 1 day? 604800 for 1 week?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check which sessionAffinity setting is in use. If it is None, don't verify.

I check the sessionAffinity setting inside validateClientIPAffinityConfig(). PTAL. THANKS!

and if it is ClientIP, require that the config is non-nil.

If it is ClientIP, I create a default config. Is that ok?

also we should probably document and enforce an upper bound. 86400 for 1 day? 604800 for 1 week?

Current upper bound is 86400 for 1 day. Is that ok?


func TestValidateClientIPAffinityConfig(t *testing.T) {
validTimeoutSeconds := []int32{
0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 can't be valid. what does it mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. 0 is invalid.

}
errorCases := []*api.ClientIPAffinityConfig{}
for _, timeout := range invalidTimeoutSeconds {
config := &api.ClientIPAffinityConfig{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also test nil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also test too big

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also test too big

Fixed.

also test nil.

Currently I will create a default config if user-provided config is nil. So, I test nil config in success cases. Please take a look. Please let me know if there is problem. Thanks!

@@ -135,6 +135,9 @@ func (lkct LinuxKernelCompatTester) IsCompatible() error {
const sysctlRouteLocalnet = "net/ipv4/conf/all/route_localnet"
const sysctlBridgeCallIPTables = "net/bridge/bridge-nf-call-iptables"

// stickyMaxAgeSeconds default value is 180 minutes
const defaultStickyMaxAgeSeconds int = 180 * 60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have a default here. document and default it in the API code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -450,12 +452,16 @@ func (proxier *Proxier) mergeService(service *api.Service) sets.String {
info.loadBalancerStatus = *helper.LoadBalancerStatusDeepCopy(&service.Status.LoadBalancer)
info.nodePort = int(servicePort.NodePort)
info.sessionAffinityType = service.Spec.SessionAffinity
// Use session affinity timeout value
if service.Spec.ClientIPAffinityConfig != nil && service.Spec.ClientIPAffinityConfig.TimeoutSeconds > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API code should ensure that by the time we get here, we already can assume that if affinity == ClientIP, the config structure is non-nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I agree. Fixed.

type ClientIPAffinityConfig struct {
// TimeoutSeconds specifies the seconds of ClientIP type session sticky time.
// The value should be specified if ServiceAffinity == "ClientIP".
// +optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the same comments from internal code, plus:

If this is optional, it must be a pointer, and you need to document the default value
Must be greater than 0.
Must be less than whatever we decide for limit.
No "should" in the comments. If this struct is provided, this field must be specified.

@m1093782566 m1093782566 force-pushed the service-session-timeout branch 6 times, most recently from 2cab86f to b7aa674 Compare July 31, 2017 14:57
@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 31, 2017
@m1093782566 m1093782566 force-pushed the service-session-timeout branch from b7aa674 to 257e54c Compare July 31, 2017 15:31
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 31, 2017
@m1093782566 m1093782566 force-pushed the service-session-timeout branch from a278a4c to 5514f32 Compare July 31, 2017 16:28
@m1093782566
Copy link
Contributor Author

@thockin

Thanks for your quick reply. I fixed all comments now. Please take a look?

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults should be set in the defaulting code - ./pkg/api/v1/defaults.go

pkg/api/types.go Outdated
// TimeoutSeconds specifies the seconds of ClientIP type session sticky time.
// The value must be >0 && <=86400(for 1 day) if ServiceAffinity == "ClientIP".
// Default value is 86400 for 1 day.
// +optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to decide if this is optional or not. If it is optional, that means a user could specify:

serviceAffinityConfig:
  clientIPConfig: {}

Is that OK? Or do we want to require this field if you ever specify the config at all?

If it is optional, it has to be a pointer, because convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your good example.

Currently, I changed TimeoutSeconds to be a required field when clientIPConfig is not nil.

pkg/api/types.go Outdated
@@ -2700,6 +2725,10 @@ type ServiceSpec struct {
// +optional
SessionAffinity ServiceAffinity

// ServiceAffinityConfig contains the configurations of session affinity.
// +optional
ServiceAffinityConfig *ServiceAffinityConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about this from the API point of view. Do we need 2 levels of nesting? Or should this be embedded.

e.g.

sessionAffinity: ClientIP
serviceAffinityConfig:
  clientIPConfig:
    timeoutSeconds: 60

vs

sessionAffinity: ClientIP
clientIPAffinityConfig:
  timeoutSeconds: 60

Copy link
Contributor Author

@m1093782566 m1093782566 Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should consider it more.

In my previous implementaion, it was

sessionAffinity: ClientIP
clientIPAffinityConfig:
  timeoutSeconds: 60

And from your previous comment,

I think you want a struct, for future safety.

type ServiceAffinityConfig struct {
// only one of the following may be set
ClientIPConfig *ServiceAffinityConfigClientIP
}

type ServiceAffinityConfigClientIP struct {
TimeoutSeconds int
}
Then embed ServiceAffinityConfig

I guess you think clientIPAffinityConfig was not future-safe enough. So, I introduced a 2-level nesting.

sessionAffinity: ClientIP
serviceAffinityConfig:
  clientIPConfig:
    timeoutSeconds: 60

Am I missing the point?

I think the difference is serviceAffinityConfig vs. clientIPAffinityConfig. For example, if we introduce a new sessionAffinity type, such as "UserName", in the future. Perhaps we will to create a new type userNameAffinityConfig which has a timeoutSeconds field.

2 levels of nesting would be

sessionAffinity: UserName
serviceAffinityConfig:
  userNameAffinityConfig:
    timeoutSeconds: 60

embedded would be:

sessionAffinity: UserName
userNameAffinityConfig:
  timeoutSeconds: 60

It seems embedded way looks more clean and it seems serviceAffinityConfig is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I adopt the embedded way, and the API definition is

type ServiceSpec {
...
ClientIPAffinityConfig *ClientIPAffinityConfig
}
// ClientIPAffinityConfig represents the configurations of Client IP based session affinity.
type ClientIPAffinityConfig struct {
	TimeoutSeconds int32
}

What do you think about it? Thanks!

externalIPs: make([]string, len(service.Spec.ExternalIPs)),
loadBalancerSourceRanges: make([]string, len(service.Spec.LoadBalancerSourceRanges)),
onlyNodeLocalEndpoints: onlyNodeLocalEndpoints,
}
if info.sessionAffinityType == api.ServiceAffinityClientIP {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract this before, and use the result in the struct assignment. Don't use the API default here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Fixed.

sessionAffinityType: api.ServiceAffinityNone, // default
stickyMaxAgeMinutes: 180, // TODO: parameterize this in the API.
sessionAffinityType: api.ServiceAffinityNone, // default
stickyMaxAgeSeconds: int(api.ServiceAffinityClientIPDefaultSeconds), // default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as other - don't use the default here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Fixed.

ttlMinutes = 180 //default to 3 hours if not specified. Should 0 be unlimited instead????
func (lb *LoadBalancerRR) newServiceInternal(svcPort proxy.ServicePortName, affinityType api.ServiceAffinity, ttlSeconds int) *balancerState {
if ttlSeconds == 0 {
ttlSeconds = int(api.ServiceAffinityClientIPDefaultSeconds) //default to 3 hours if not specified. Should 0 be unlimited instead????
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as other - don't use the default here. do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this?

if ttlSeconds == 0 {
	ttlSeconds = int(api.ServiceAffinityClientIPDefaultSeconds) //default to 3 hours if not specified.  Should 0 be unlimited instead????

I'm afraid we may need this.

Since the function func (lb *LoadBalancerRR) newServiceInternal() will be called by OnEndpointsAdd(), see https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/userspace/roundrobin.go#L278-L282

and OnEndpointsUpdate(), see https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/userspace/roundrobin.go#L312-L316

They both set "0" as the ttlSeconds for newServiceInternal()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to keep it in order to not breaking the existing logic? What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are just placeholders until a real Service is propagated, right? Been a while since I touched that,

ttlMinutes = 180 //default to 3 hours if not specified. Should 0 be unlimited instead????
func (lb *LoadBalancerRR) newServiceInternal(svcPort proxy.ServicePortName, affinityType api.ServiceAffinity, ttlSeconds int) *balancerState {
if ttlSeconds == 0 {
ttlSeconds = int(api.ServiceAffinityClientIPDefaultSeconds) //default to 3 hours if not specified. Should 0 be unlimited instead????
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as other - don't use the default here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to userspace, the function func (lb *LoadBalancerRR) newServiceInternal() will be called by OnEndpointsAdd(), see https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/winuserspace/roundrobin.go#L268-L272

and OnEndpointsUpdate(), see https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/winuserspace/roundrobin.go#L302-L306

They both set "0" as the ttlSeconds for newServiceInternal(). So, we may need to set a default value here.

}
if affinity == api.ServiceAffinityClientIP {
// If ServiceAffinityConfig is nil, set the default value
if config == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting happens in API defaulting code, not validation.

./pkg/api/v1/defaults.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your tips. It's fixed now.

@m1093782566 m1093782566 force-pushed the service-session-timeout branch from b4d5404 to ad73fe6 Compare August 25, 2017 10:28
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2017
@m1093782566
Copy link
Contributor Author

/test pull-kubernetes-unit

@thockin
Copy link
Member

thockin commented Aug 25, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: m1093782566, thockin

Associated issue: 49831

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@m1093782566
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-etcd3

@m1093782566
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-bazel

@m1093782566
Copy link
Contributor Author

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 49850, 47782, 50595, 50730, 51341)

@k8s-github-robot k8s-github-robot merged commit b65f3cc into kubernetes:master Aug 26, 2017
@m1093782566 m1093782566 deleted the service-session-timeout branch August 26, 2017 03:49
@gnufied
Copy link
Member

gnufied commented Aug 29, 2017

I am seeing panic errors after this PR.

/data/go/src/runtime/asm_amd64.s:2197
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1616744]

goroutine 48 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x126
panic(0x18b5200, 0x2610290)
        /data/go/src/runtime/panic.go:489 +0x2cf
k8s.io/kubernetes/pkg/proxy/iptables.newServiceInfo(0xc4203bc7f0, 0x7, 0xc4203bc7e0, 0xa, 0xc4203bc82a, 0x5, 0xc42075c730, 0xc4203805a0, 0x9eb318c987fc4ed7)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/proxy/iptables/proxier.go:199 +0x9d4
k8s.io/kubernetes/pkg/proxy/iptables.serviceToServiceMap(0xc4203805a0, 0xc4206e2060)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/proxy/iptables/proxier.go:875 +0x1c8
k8s.io/kubernetes/pkg/proxy/iptables.(*serviceChangeMap).update(0xc420385620, 0xc420489df8, 0x0, 0xc4203805a0, 0x1ad7000)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/proxy/iptables/proxier.go:312 +0xb4
k8s.io/kubernetes/pkg/proxy/iptables.(*Proxier).OnServiceAdd(0xc420385600, 0xc4203805a0)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/proxy/iptables/proxier.go:645 +0x9a
k8s.io/kubernetes/pkg/proxy/config.(*ServiceConfig).handleAddService(0xc420295ad0, 0x1a807c0, 0xc4203805a0)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/proxy/config/config.go:225 +0xe8
k8s.io/kubernetes/pkg/proxy/config.(*ServiceConfig).(k8s.io/kubernetes/pkg/proxy/config.handleAddService)-fm(0x1a807c0, 0xc4203805a0)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/proxy/config/config.go:182 +0x3e
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(0xc42042ae90, 0xc42042aea0, 0xc42042aec0, 0x1a807c0, 0xc4203805a0)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:195 +0x49
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd(0xc4204e2420, 0x1a807c0, 0xc4203805a0)
        <autogenerated>:57 +0x73
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc4202fd130)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:545 +0x287
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).(k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.run)-fm()
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:381 +0x2a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc42000fa98, 0xc420202160)
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /home/shared/go_projects/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x62

I run somewhat of a odd setup of kubernetes for development because I don't use ./hack/local-cluster-up.sh, this is how kubeproxy launch command looks like:

/home/shared/go_projects/src/k8s.io/kubernetes/_output/bin/kube-proxy  --logtostderr=true -v 5 --master=http://xaos.lan:8080 --feature-gates ExpandPersistentVolumes=true

@gnufied
Copy link
Member

gnufied commented Aug 29, 2017

Crashes with --kubeconfig variant too:

/home/shared/go_projects/src/k8s.io/kubernetes/_output/bin/kube-proxy --logtostderr=true -v 5 --kubeconfig=/home/hekumar/redhat/kube_scripts/xaos_master_kube/xaos_kube_config --feature-gates ExpandPersistentVolumes=true

@gnufied
Copy link
Member

gnufied commented Aug 29, 2017

I think crash is caused by the service:

╰─k8s.io/kubernetes> k get service kubernetes -o json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "creationTimestamp": "2017-06-28T16:05:11Z",
        "labels": {
            "component": "apiserver",
            "provider": "kubernetes"
        },
        "name": "kubernetes",
        "namespace": "default",
        "resourceVersion": "6",
        "selfLink": "/api/v1/namespaces/default/services/kubernetes",
        "uid": "909a1c62-5c1b-11e7-8a60-4ccc6ab9973d"
    },
    "spec": {
        "clusterIP": "10.254.0.1",
        "ports": [
            {
                "name": "https",
                "port": 443,
                "protocol": "TCP",
                "targetPort": 6443
            }
        ],
        "sessionAffinity": "ClientIP",
        "type": "ClusterIP"
    },
    "status": {
        "loadBalancer": {}
    }
}

Still trying to figure out why this service doesn't have SessionAffinityConfig as it is expected of it.

@m1093782566
Copy link
Contributor Author

@gnufied Thanks for reporting.

Yes, currently sessionAffinityConfig is required when "sessionAffinity" = "ClientIP".

I will work with you together to find out why :)

Please feel free to update your information here.

@m1093782566
Copy link
Contributor Author

m1093782566 commented Aug 29, 2017 via email

@gnufied
Copy link
Member

gnufied commented Aug 29, 2017

Okay, I think deleting kubernetes service via k delete service kubernetes and restarting kubernetes components has fixed the problem. I see that, there was a change made to add sessionAffinityConfig in service that gets started via controller. I am wondering, how will this affect users who are simply upgrading and already have an etcd database.

@m1093782566
Copy link
Contributor Author

m1093782566 commented Aug 29, 2017 via email

@m1093782566
Copy link
Contributor Author

But I think any old service with Client IP type session affinity may has the same issue. So, I prefer solution 1.

@m1093782566
Copy link
Contributor Author

solution 3, which maybe better I think, add defaulting when create/update service.

k8s-github-robot pushed a commit that referenced this pull request Sep 3, 2017
Automatic merge from submit-queue (batch tested with PRs 51819, 51706, 51761, 51818, 51500)

fix kube-proxy panic because of nil sessionAffinityConfig

**What this PR does / why we need it**:

fix kube-proxy panic because of nil sessionAffinityConfig

**Which issue this PR fixes**: closes #51499 

**Special notes for your reviewer**:

I apology that this bug is introduced by #49850 :(

@thockin @smarterclayton @gnufied 

**Release note**:

```release-note
NONE
```
@chenopis
Copy link

chenopis commented Sep 6, 2017

As per @thockin's comment, I'm cc'ing @steveperry-53 for the doc and @radhikapc and @Bradamant3 for the release note.

@m1093782566
Copy link
Contributor Author

m1093782566 commented Sep 7, 2017

@chenopis Thanks for doing that. Please ping me when PR is ready, I would be very happy to review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Paramaterize stickyMaxAgeMinutes for service in API
10 participants