Cannot Get after Create using Client with Custom Resource #343

michaelkipper · 2019-02-28T15:42:59Z

Hi!

I'm tearing my hair out trying to debug this issue.

func testCreation(c client.Client) error {
	config := getTestCuscoServiceConfig()

	err := c.Create(context.Background(), &config)
	if err != nil {
		return errors.Wrap(err, "Could not create CuscoServiceConfig")
	}

	new_csc := v1.CuscoServiceConfig{}
	err = c.Get(context.Background(), client.ObjectKey{Name: "test-config", Namespace: "cusco"}, &new_csc)
	if err != nil {
		return errors.Wrap(err, "Could not get CuscoServiceConfig")
	}

	return nil
}

When run, I get this error:

Could not get CuscoServiceConfig: CuscoServiceConfig.cusco.shopify.io \"test-config\" not found"

Outside the program, though:

$ kubectl get cuscoserviceconfigs test-config -n cusco -o yaml
apiVersion: cusco.shopify.io/v1
kind: CuscoServiceConfig
metadata:
  creationTimestamp: "2019-02-28T15:35:09Z"
  generation: 1
  name: test-config
  namespace: cusco
  resourceVersion: "21614"
  selfLink: /apis/cusco.shopify.io/v1/namespaces/cusco/cuscoserviceconfigs/test-config
  uid: 6e8bb6e3-3b6e-11e9-932e-02423b5580eb
spec:
  init-container:
    enabled: false
    percentage: 75
  istio-version: 1.1.0
  proxy:
    enabled: true
    percentage: 25

Stepping through the code, my theory is that the Create call doesn't store the object in the cache, and the Get call only looks at the cache.

How should I go about debugging this?
Is this because CuscoServiceConfig is a CustomResource?

The text was updated successfully, but these errors were encountered:

DirectXMan12 · 2019-02-28T19:21:28Z

In a controller, you always write to the API server and read from the cache (unless you manually construct a Client). You can't expect to read the object you wrote immediately (and, in fact, there's generally a bit of delay). What are you trying to accomplish with the immediate Get? You'll get back the created object as part of the call to create.

EDIT: clarification for posterity

michaelkipper · 2019-02-28T19:26:28Z

@DirectXMan12: This was a snippet of something that's happening in an integration test that's failing. I would have expected read-after-write to be strongly consistent.

If this is not a guarantee, I can always drop a WaitForResource into the test.

DirectXMan12 · 2019-02-28T19:43:33Z

how did you get the client for the integration test?

DirectXMan12 · 2019-02-28T19:45:06Z

direct clients (client.New) should have the property you desire, since they always talk to the API server (I figured that was a snippet from a controller -- sorry for the confusion)

michaelkipper · 2019-02-28T20:14:52Z

The client comes from manager.New(clientConfig, *managerOptions)

A rough snippet:

import (
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

mgrOptions := &manager.Options{
	MetricsBindAddress: *metricsAddr,
	Scheme:             scheme.Scheme,
}
...
clientConfig, err := config.GetConfig()
mgr, err := manager.New(clientConfig, *managerOptions)
...
client := mgr.GetClient()

DirectXMan12 · 2019-03-04T21:50:27Z

ah, yeah, this is probably a bit of a confusing statement, but don't use the manager client in tests. The manager-provided client is designed to do the right thing for controllers by default (which is to read from caches, meaning that it's not strongly consistent), which means it probably does the wrong thing for tests (which almost certainly want strong consistency).

The suggested pattern is to construct a new client using client.New for tests (which will provide you with a direct API client that doesn't have any special behavior).

This is something that we need to document better, but generally, my *_suite_test.go files (the general suite setup for a particular package) look like:

var testenv *envtest.Environment
var cfg *rest.Config
var client client.Client

var _ = BeforeSuite(func() {
    logf.SetLogger(zap.LoggerTo(GinkgoWriter, true))

    testenv = &envtest.Environment{}

    var err error
    cfg, err = testenv.Start()
    Expect(err).NotTo(HaveOccurred())

    client, err = client.New(cfg, client.Options{Scheme: myscheme.Scheme})
    Expect(err).NotTo(HaveOccurred())
})

var _ = AfterSuite(func() {
    testenv.Stop()
})

(this is very similar to a lot of the test suite setup files in controller-runtime itself)

shawn-hurley · 2019-03-04T21:53:13Z

Might be easier to document if we can instead tell folks to use manger.APIReader from this PR: #327

thoughts?

DirectXMan12 · 2019-03-04T21:53:56Z

yeah, once we get that merged, we can recommend that as well, although constructing a new client may be easier in some circumstances.

DirectXMan12 · 2019-03-04T22:00:59Z

(also, in case it wasn't clear, while it's fine (and generally preferable) to write tests expecting read-after-write consistency, you shouldn't do that for controllers. Kubernetes favors a kind-of 2-phase approach -- do your reads, process, do some writes, and return, letting the requeuing deal with the next cycle of reads if they're necessary, but don't try to read an object after you've written it).

DirectXMan12 · 2019-03-04T22:16:01Z

@michaelkipper does that answer your question?

runzexia · 2019-03-13T06:20:17Z

My first Reconciler completed the creation of the job and triggered the second Reconciler.
But when I get the job at the second Reconciler, there will be some time 404.
@DirectXMan12
user create my crd -> if job not exist -> get job -> create job

DirectXMan12 · 2019-03-14T20:22:44Z

if it's actually the create that's triggering the second reconciler, that shouldn't happen.

cizixs · 2019-04-29T12:40:30Z

In my case, I can not get the CRD resource, but list returns correct values.

// this works fine
myCRDList := &foov1beta1.FooList{}
g.Eventually(func() error { return c.List(context.TODO(), &client.ListOptions{}, myCRDList) }).Should(gomega.Succeed())

// this returns 404 not found error
foo := &foov1beta1.Foo{}
depKey := types.NamespacedName{Name: "foo-1")}
g.Eventually(func() error { return c.Get(context.TODO(), depKey, foo) }, timeout).Should(gomega.Succeed())

Is cache-reading relevant here?

@DirectXMan12

DirectXMan12 · 2019-06-04T00:59:50Z

you shouldn't be reading from the cache in a test -- don't use mgr.GetClient in your test methods. Instead, use client.New to get a direct client.

fejta-bot · 2019-09-02T02:02:16Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

DirectXMan12 · 2019-09-03T18:30:29Z

closing this as no response

edwardstudy · 2019-11-04T08:43:01Z

@DirectXMan12 Sorry about interrupting this closed issue.

I had the same situation that we can list resources but could not get the resource, and these actions were not happening after creations. Do we have any different path from local cache between List() and Get().

In my case, I can not get the CRD resource, but list returns correct values.

// this works fine
myCRDList := &foov1beta1.FooList{}
g.Eventually(func() error { return c.List(context.TODO(), &client.ListOptions{}, myCRDList) }).Should(gomega.Succeed())

// this returns 404 not found error
foo := &foov1beta1.Foo{}
depKey := types.NamespacedName{Name: "foo-1")}
g.Eventually(func() error { return c.Get(context.TODO(), depKey, foo) }, timeout).Should(gomega.Succeed())

Is cache-reading relevant here?

@DirectXMan12

DirectXMan12 · 2019-11-05T22:23:38Z

Do we have any different path from local cache between List() and Get().

No, that shouldn't happen. Is Foo namespaced?

Aside: I'd highly reccomend using a live client in the test bodies themselves, otherwise you'll get flaky tests (the standard caching client is fine in the controllers themselves, but for your test assertions, the live client is much better generally).

edwardstudy · 2019-11-07T09:04:22Z

@DirectXMan12 Hi, they're kube native resources. I could not get specified StatefulSet but can list them. And yes, them are namespaced resources. And it's in our reconciliation loop.

DirectXMan12 · 2019-12-06T01:26:45Z

since they're namespaced, you'll need depKey := types.NamespacedName{Name: "foo-1", Namespace: whateverTheNamespaceIs}

edwardstudy · 2020-01-16T08:25:06Z

@DirectXMan12 Sorry for late. correct my sentence that my request had right namespace. The point is that I could list them, but could not get it.

As the issue suggested, I got the resource from the direct client for a workaround.

Fix domain resource issues in controller

nobody4t · 2020-06-07T13:48:39Z

I have the same issue.
I created the controller, with clientset code generated. Its kubernetes version is 1.13.1.

Now I have my operator,based on operator-sdk 0.15.1. I try to get/list the controller CRD from my controller. But I can not. The operator uses this api. The result is empty, both of them as below:
List:

&{{ } {      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []} { false }}

Get:

&{{ } {   <nil>} []}

But I can use the clientset of the controller get the right result.

What confuses me is I can get/list all the CRD with kubectl, but not with this api. Not sure if I missed something?

* Use a KubeClient in EnvTest that always hits APIServer to avoid cache latency and inconsistency issues that can cause races and intermittent test failures. See kubernetes-sigs/controller-runtime#1464 and kubernetes-sigs/controller-runtime#343 for details. * Write Status after Spec. This ensures that tests waiting for a status update cannot possibly see it so fast that they go on to perform a write that conflicts with the Spec write of the controller. * Improve log messages to be clearer (aids in test debugging).

* Reduce test flakiness * Use a KubeClient in EnvTest that always hits APIServer to avoid cache latency and inconsistency issues that can cause races and intermittent test failures. See kubernetes-sigs/controller-runtime#1464 and kubernetes-sigs/controller-runtime#343 for details. * Write Status after Spec. This ensures that tests waiting for a status update cannot possibly see it so fast that they go on to perform a write that conflicts with the Spec write of the controller. * Improve log messages to be clearer (aids in test debugging). * Verify that finalizer is set before delete Prior to performing any deletion actions (either monitoring or deletion), verify that the finalizer is still set. If it's not set, immediately allow the k8s resource to be deleted. Don't take any action on the Azure resource in this instance.

johnugeorge mentioned this issue Mar 6, 2019

Retry studyjobcontroller kubeflow/katib#411

Closed

edwardstudy mentioned this issue Jul 19, 2019

Add a restart case for e2e test cloudfoundry-incubator/quarks-operator#420

Merged

oliviabarrick mentioned this issue Jul 19, 2019

Do not use cached client in controllers kudobuilder/kudo#602

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2019

DirectXMan12 closed this as completed Sep 3, 2019

DirectXMan12 pushed a commit that referenced this issue Jan 31, 2020

Merge pull request #343 from fanzhangio/fix-domain

d0f783b

Fix domain resource issues in controller

theishshah mentioned this issue Feb 5, 2020

📖: Add note about split client cache inconsistency #790

Merged

harveyxia mentioned this issue Apr 30, 2020

Manager default client does not provide strong read-after-write consistency solo-io/skv2#27

Open

moolen mentioned this issue Feb 15, 2021

Don't recommend using manager.GetClient in tests kubernetes-sigs/kubebuilder#2018

Closed

leovct mentioned this issue Mar 6, 2021

📖 Use client.New instead of manager client in tests kubernetes-sigs/kubebuilder#2066

Merged

mjaow mentioned this issue May 31, 2021

Get with resouce not found right after creating it successfully? #1543

Closed

RainbowMango mentioned this issue Jul 20, 2021

fix informer cache delay issue karmada-io/karmada#545

Merged

matthchr mentioned this issue Nov 5, 2021

Improve test reliability Azure/azure-service-operator#1960

Merged

2 tasks

evanli18 mentioned this issue Mar 14, 2022

Fix(health): terraform conflict kubevela/kubevela#3419

Closed

joejulian mentioned this issue Dec 6, 2022

operator: tests should use their own k8s client redpanda-data/redpanda#7638

Closed

6 tasks

jonathan-innis mentioned this issue Dec 24, 2022

Question: How to use fieldSelectors for CRDs during testing #2112

Closed

Aisuko mentioned this issue Apr 13, 2023

Replace envtest with fakeclient meshery/meshery-operator#457

Merged

1 task

moolen mentioned this issue Jul 17, 2023

Use the cached client for the controller tests external-secrets/external-secrets#2501

Closed

5 tasks

This was referenced Nov 17, 2024

RHOAIENG-15772: tests(nbcs): use the direct (uncached) client to drive tests opendatahub-io/kubeflow#461

Merged

tests(notebook-controller): use the direct (uncached) client to drive tests kubeflow/kubeflow#7663

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot Get after Create using Client with Custom Resource #343

Cannot Get after Create using Client with Custom Resource #343

michaelkipper commented Feb 28, 2019

DirectXMan12 commented Feb 28, 2019 •

edited

Loading

michaelkipper commented Feb 28, 2019

DirectXMan12 commented Feb 28, 2019

DirectXMan12 commented Feb 28, 2019 •

edited

Loading

michaelkipper commented Feb 28, 2019

DirectXMan12 commented Mar 4, 2019 •

edited

Loading

shawn-hurley commented Mar 4, 2019 •

edited

Loading

DirectXMan12 commented Mar 4, 2019

DirectXMan12 commented Mar 4, 2019

DirectXMan12 commented Mar 4, 2019

runzexia commented Mar 13, 2019

DirectXMan12 commented Mar 14, 2019

cizixs commented Apr 29, 2019 •

edited

Loading

DirectXMan12 commented Jun 4, 2019

fejta-bot commented Sep 2, 2019

DirectXMan12 commented Sep 3, 2019

edwardstudy commented Nov 4, 2019

DirectXMan12 commented Nov 5, 2019

edwardstudy commented Nov 7, 2019 •

edited

Loading

DirectXMan12 commented Dec 6, 2019

edwardstudy commented Jan 16, 2020

nobody4t commented Jun 7, 2020

Cannot Get after Create using Client with Custom Resource #343

Cannot Get after Create using Client with Custom Resource #343

Comments

michaelkipper commented Feb 28, 2019

DirectXMan12 commented Feb 28, 2019 • edited Loading

michaelkipper commented Feb 28, 2019

DirectXMan12 commented Feb 28, 2019

DirectXMan12 commented Feb 28, 2019 • edited Loading

michaelkipper commented Feb 28, 2019

DirectXMan12 commented Mar 4, 2019 • edited Loading

shawn-hurley commented Mar 4, 2019 • edited Loading

DirectXMan12 commented Mar 4, 2019

DirectXMan12 commented Mar 4, 2019

DirectXMan12 commented Mar 4, 2019

runzexia commented Mar 13, 2019

DirectXMan12 commented Mar 14, 2019

cizixs commented Apr 29, 2019 • edited Loading

DirectXMan12 commented Jun 4, 2019

fejta-bot commented Sep 2, 2019

DirectXMan12 commented Sep 3, 2019

edwardstudy commented Nov 4, 2019

DirectXMan12 commented Nov 5, 2019

edwardstudy commented Nov 7, 2019 • edited Loading

DirectXMan12 commented Dec 6, 2019

edwardstudy commented Jan 16, 2020

nobody4t commented Jun 7, 2020

DirectXMan12 commented Feb 28, 2019 •

edited

Loading

DirectXMan12 commented Feb 28, 2019 •

edited

Loading

DirectXMan12 commented Mar 4, 2019 •

edited

Loading

shawn-hurley commented Mar 4, 2019 •

edited

Loading

cizixs commented Apr 29, 2019 •

edited

Loading

edwardstudy commented Nov 7, 2019 •

edited

Loading