Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: prevent API server cache list/watch from other namespaces #25342

Closed
wants to merge 1 commit into from

Conversation

mqliang
Copy link
Contributor

@mqliang mqliang commented May 9, 2016

The previous code has a issue:

Assume we have two namespaces ns1 and ns11, when client list/watch key /registry/pods/ns1, API server cache will also return pods in ns11, since ns1 is a prefix of ns11

This PR fix this issue. @wojtek-t

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels May 9, 2016
}
return filter(obj)

return match && filter(obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch in general. Can you please add a unit test for it?

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2016
@xiang90
Copy link
Contributor

xiang90 commented May 11, 2016

/cc @hongchaodeng

match := false
for _, segment := range segments {
currPath = path.Join(currPath, segment)
if currPath == key {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if key ends with "/"?

Copy link
Contributor

@hongchaodeng hongchaodeng May 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more problem: the code is incorrect for exact match case.
For exact match, i.e. filter("/abc", "/abc"), it will be ignored in WatchList(). See etcd_helper and etcd3/watcher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. - but it's only for main directory. So e.g. filter(/ns1/pod1, /ns1/pod1) will return true,

I think that we should base on the fact that here objKey is computed from runtime.Object and we should probably assume what it returns. That said, we should allow for exact match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine here because the results are already filtered out by underlying storage impl.?

@mqliang
Copy link
Contributor Author

mqliang commented May 12, 2016

@wojtek-t IIUC, when a client LIST a nonexistent key, the cache layer will return a empty list, and if a client WATCH a nonexistent key, the cache layer will never return any events, while it should return 404 in such a case?

@wojtek-t
Copy link
Member

@wojtek-t IIUC, when a client LIST a nonexistent key, the cache layer will return a empty list, and if a client WATCH a nonexistent key, the cache layer will never return any events, while it should return 404 in such a case?

I don't agree. If client WATCH a nonexistent key it should open the watch and wait until such object appears. It may appear at some point. If not, then we will never return anything and this is fine.

We definitely want to watch for objects that don't exist now, and this is a very common practice in fact.

@mqliang
Copy link
Contributor Author

mqliang commented May 16, 2016

@wojtek-t

We definitely want to watch for objects that don't exist now, and this is a very common practice in fact.

Ok, the behavior when watch sounds good to me. But it seems that we should fix the behavior when list(cache should return KeyNotFound error instead of an empty list)? Since I think the cache layer should not break any assumption the old clients may hold before having such a cache. Thoughts?

@wojtek-t
Copy link
Member

wojtek-t commented May 16, 2016

If apiserver without cacher returns something different than apiserver with cacher, then this is definitely a bug and we should make them return the same (change cacher).

@wojtek-t
Copy link
Member

So to clarify - yes, they should be compatible.

@wojtek-t
Copy link
Member

wojtek-t commented Jul 4, 2016

@mqliang - sorry for being super unresponsive last time - I was on vacation and before I was super busy with 1.3 release.
I think this is now your turn on this PR, right? If you expect to me to do something on this one, please ping me.

@mqliang
Copy link
Contributor Author

mqliang commented Jul 4, 2016

@wojtek-t Never mind, it's not a urgent PR. Will update this tomorrow.

As for the "return KeyNotFound error instead of an empty list when list from a nonexistent namespace" issue, I will fix it after your #27277 are merged. I'd like add a namespace index for namespace scoped resource. It's very helpful for the above issue and for the "kubectl get XXX --namespcace=YYY" use case(it's also very common that many applications build on k8s need list a resource in a namespace).

@wojtek-t
Copy link
Member

wojtek-t commented Jul 4, 2016

@mqliang - thanks

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 7, 2016
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 7, 2016
@mqliang
Copy link
Contributor Author

mqliang commented Jul 7, 2016

@wojtek-t Updated. PTAL.

@wojtek-t wojtek-t added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Jul 7, 2016
@wojtek-t
Copy link
Member

wojtek-t commented Jul 7, 2016

@mqliang - thanks!

LGTM

@wojtek-t wojtek-t added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 7, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Jul 7, 2016

GCE e2e build/test passed for commit 3f7bca9.

@wojtek-t
Copy link
Member

wojtek-t commented Jul 7, 2016

@mqliang - it seems the test you added is flaky:

k8s.io/kubernetes/pkg/storage TestWatchList 1.50s
 go test -v k8s.io/kubernetes/pkg/storage -run TestWatchList$

cacher_test.go:182: (called from line 353)
cacher_test.go:183: Expected (ADDED): &api.Pod{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"foo", GenerateName:"", Namespace:"ns", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, Spec:api.PodSpec{Volumes:[]api.Volume(nil), InitContainers:[]api.Container(nil), Containers:[]api.Container(nil), RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc82031fbf8), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"", NodeName:"fakeNode", SecurityContext:(*api.PodSecurityContext)(0xc82081ab00), ImagePullSecrets:[]api.LocalObjectReference(nil), Hostname:"", Subdomain:""}, Status:api.PodStatus{Phase:"", Conditions:[]api.PodCondition(nil), Message:"", Reason:"", HostIP:"", PodIP:"", StartTime:(*unversioned.Time)(nil), InitContainerStatuses:[]api.ContainerStatus(nil), ContainerStatuses:[]api.ContainerStatus(nil)}}, got: &api.Pod{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"bar", GenerateName:"", Namespace:"ns", SelfLink:"", UID:"", ResourceVersion:"5", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, Spec:api.PodSpec{Volumes:[]api.Volume(nil), InitContainers:[]api.Container(nil), Containers:[]api.Container(nil), RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc8202e3690), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"", NodeName:"", SecurityContext:(*api.PodSecurityContext)(0xc8208e5d00), ImagePullSecrets:[]api.LocalObjectReference(nil), Hostname:"", Subdomain:""}, Status:api.PodStatus{Phase:"", Conditions:[]api.PodCondition(nil), Message:"", Reason:"", HostIP:"", PodIP:"", StartTime:(*unversioned.Time)(nil), InitContainerStatuses:[]api.ContainerStatus(nil), ContainerStatuses:[]api.ContainerStatus(nil)}}
cacher_test.go:182: (called from line 357)
cacher_test.go:183: Expected (ADDED): &api.Pod{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"bar", GenerateName:"", Namespace:"ns", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, Spec:api.PodSpec{Volumes:[]api.Volume(nil), InitContainers:[]api.Container(nil), Containers:[]api.Container(nil), RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc82031fbe0), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"", NodeName:"", SecurityContext:(*api.PodSecurityContext)(0xc82081aa40), ImagePullSecrets:[]api.LocalObjectReference(nil), Hostname:"", Subdomain:""}, Status:api.PodStatus{Phase:"", Conditions:[]api.PodCondition(nil), Message:"", Reason:"", HostIP:"", PodIP:"", StartTime:(*unversioned.Time)(nil), InitContainerStatuses:[]api.ContainerStatus(nil), ContainerStatuses:[]api.ContainerStatus(nil)}}, got: &api.Pod{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"foo", GenerateName:"", Namespace:"ns", SelfLink:"", UID:"", ResourceVersion:"8", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, Spec:api.PodSpec{Volumes:[]api.Volume(nil), InitContainers:[]api.Container(nil), Containers:[]api.Container(nil), RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc8202e3698), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"", NodeName:"fakeNode", SecurityContext:(*api.PodSecurityContext)(0xc8208e5d40), ImagePullSecrets:[]api.LocalObjectReference(nil), Hostname:"", Subdomain:""}, Status:api.PodStatus{Phase:"", Conditions:[]api.PodCondition(nil), Message:"", Reason:"", HostIP:"", PodIP:"", StartTime:(*unversioned.Time)(nil), InitContainerStatuses:[]api.ContainerStatus(nil), ContainerStatuses:[]api.ContainerStatus(nil)}}

Can you please take a look? [I'm removing lgtm for now.]

@wojtek-t wojtek-t removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 7, 2016
@wojtek-t
Copy link
Member

@mqliang - will you find some time to debug this test?

@k8s-github-robot
Copy link

@mqliang PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2016
@wojtek-t
Copy link
Member

Already fixed by #28966 - closing this one.

@wojtek-t wojtek-t closed this Jul 18, 2016
@mqliang mqliang deleted the cache-list-bug branch November 28, 2016 06:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants