RFC: dynamic resource allocation prototype #1

Closed
wants to merge 58 commits into from

pohly
Owner

@pohly pohly commented Apr 29, 2022

The purpose of this PR is to have a place where the https://github.com/pohly/kubernetes/commits/dynamic-resource-allocation branch can be discussed and commented upon. The PR itself will never get merged.

To try out the new feature, run a local cluster:

$ make
$ FEATURE_GATES=DynamicResourceAllocation=true  RUNTIME_CONFIG=cdi.k8s.io/v1alpha1 CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/crio/crio.sock LOG_LEVEL=6 ENABLE_CSI_SNAPSHOTTER=false API_SECURE_PORT=6444 ALLOW_PRIVILEGED=1 hack/local-up-cluster.sh -O

Run the example resource driver controller:

$ export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
$ go run ./test/integration/cdi/example-driver -v5 controller

Run the example kubelet plugin:

$ export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
$ sudo mkdir -p /var/run/cdi && sudo chmod a+rwx /var/run/cdi /var/lib/kubelet/plugins_registry
$ go run ./test/integration/cdi/example-driver --feature-gates ContextualLogging=true -v=6 kubelet-plugin

Create some objects:

$ kubectl create -f test/integration/cdi/example-driver/deploy/example/resourceclass.yaml 
resourceclass.cdi.k8s.io/example created
$ kubectl create -f test/integration/cdi/example-driver/deploy/example/pod-inline.yaml
pod/pause created
$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
pause   1/1     Running   0          12s
$ kubectl get resourceclaims
NAME             CLASSNAME   ALLOCATIONMODE   PHASE       AGE
pause-resource   example     Delayed          Allocated   81s

ardaguclu and others added 17 commits June 1, 2022 14:01
This PR adds validation to check that the `dry-run` and `force` flags
are not used at the same time, because when the `force` flag is set,
`dry-run` is ignored and the objects are replaced anyway.
Signed-off-by: zhoumingcheng <zhoumingcheng@beyondcent.com>
When adding functionality to the kubelet package and a test file, it is
kind of painful to run the unit tests locally today.

We usually can't run the tests by specifying just the test file: if
xx_test.go and xx.go are in the same package, we need to specify all
the dependencies too. As soon as xx.go uses the Kubelet type (which we
need to fake a kubelet in the unit tests), this is completely impossible
to do in practice.

So the other option is to run the unit tests for the whole package or
run only a specific function. Running a single function can work in some
cases, but it is painful when we want to test all the functions we
wrote. On the other hand, running the tests for the whole package is
very slow.

Today some unit tests try to connect to the API server (with retries),
create and list a lot of pods/volumes, etc. This makes running the unit
tests for the kubelet package slow.

This patch tries to make running the unit tests for the whole package
more palatable. It adds a skip when short mode is requested
(go test -short ...), so that we neither try to connect to the API
server nor run the other slow tests.

Before this patch, running the unit tests on my computer took (I've run
them several times, so the compilation was already done):

	$ time go test -v
	real	0m21.303s
	user	0m9.033s
	sys	0m2.052s

With this patch it takes ~1/3 of the time:

	$ time go test -short -v
	real	0m7.825s
	user	0m9.588s
	sys	0m1.723s

Around 8 seconds is something I can wait to run the tests :)

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
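The usual Go idiom behind such a skip is a `testing.Short()` guard at the top of a slow test. In the sketch below, `TestSlowPodSync` and `shortMode` are hypothetical; `shortMode` only simulates `go test -short` by parsing the `-test.short` flag that `testing.Init` registers, so that the snippet can also run outside `go test`.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
)

// TestSlowPodSync is a hypothetical slow unit test. Placed in a _test.go
// file, it is skipped when the tests run with `go test -short`.
func TestSlowPodSync(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping in short mode: this test retries connections to the API server")
	}
	// ... slow work against the API server would go here ...
}

// shortMode simulates `go test` flag handling for the given arguments and
// reports what testing.Short() returns afterwards.
func shortMode(args []string) (bool, error) {
	testing.Init() // registers the test.* flags; a no-op if already called
	if err := flag.CommandLine.Parse(args); err != nil {
		return false, err
	}
	return testing.Short(), nil
}

func main() {
	short, err := shortMode([]string{"-test.short"})
	if err != nil {
		panic(err)
	}
	fmt.Println("short mode:", short)
}
```

With such a guard in place, `go test -short ./pkg/kubelet/...` skips the guarded tests while a plain `go test` still runs them.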
As outlined in the KEP, we now graduate the Kubelet feature to beta,
which means that it is enabled by default. The corresponding Kubelet
flag still defaults to `false`, but we now have the chance to e2e test
the feature by using a new serial test case.

KEP: kubernetes/enhancements#2413

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The test validates the following endpoints:

- createAppsV1NamespacedControllerRevision
- deleteAppsV1CollectionNamespacedControllerRevision
- deleteAppsV1NamespacedControllerRevision
- listAppsV1ControllerRevisionForAllNamespaces
- patchAppsV1NamespacedControllerRevision
- readAppsV1NamespacedControllerRevision
- replaceAppsV1NamespacedControllerRevision
The utils found in pkg/kubelet/cri/remote/utils are the same as the
ones in pkg/kubelet/utils, with the difference that the latter have
had a few improvements recently.

This commit removes the duplicated code.
We're not interested in checking the file permissions of the
symlink itself, but in its target's permissions.
Signed-off-by: Dave Chen <dave.chen@arm.com>
This makes ktesting more resilient against logging from leaked goroutines,
which is a problem that came up in kubelet node shutdown
tests (kubernetes#110854).
Update `godoc.org` to `pkg.go.dev` in kubeadm
…mers

Move kubectl wait to informers with a cache to avoid hanging due to objects disappearing from the cluster
scheduler: do not update sched.nextStartNodeIndex when evaluating the nominated node
@pohly pohly force-pushed the dynamic-resource-allocation branch from d05fb30 to 9c15ed5 July 7, 2022 18:11
Remove SIG Scheduling approvers from reviewers
k8s-ci-robot and others added 8 commits July 7, 2022 12:46
kubeadm: De-dup the confirmation on the interactive cmds
…on-test

Write ControllerRevisionLifecycleTest +7 Endpoints
…lt-beta

Graduate SeccompDefault feature to beta
…-file-permissions

agnhost: Check symlink target's permissions for Windows
Computation of the StorageVersionHash uses overridden storage versions in unit test
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters

Signed-off-by: Dave Chen <dave.chen@arm.com>
chendave and others added 10 commits July 8, 2022 10:46
Signed-off-by: Dave Chen <dave.chen@arm.com>
Ginkgo is now writing the JUnit file itself. The -report-dir parameter is used
as a fallback for enabling JUnit output in case users haven't migrated to
the new -junit-report parameter.

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
The alias for vendor/github.com/onsi/ginkgo/ginkgo ensures that code like
https://github.com/chendave/test-infra/blob/30e99cb2a97fdee7bb41316881cfbca91bb600db/experiment/kind-conformance-image-e2e.sh#L110
continues to work. The one without "vendor/" is there just in case it
was used, because that also worked.

Long term, "ginkgo" is a nicer, version-independent alias. It gets used
internally to avoid future churn and is also documented publicly in the
Makefile help.

The caveat is that there's no guarantee that a future v3 CLI will be compatible
with current invocations. But the most common usage is through
hack/ginkgo-e2e.sh, which can deal with such differences.
Ginkgo has been migrated to v2; add the old version to the unwanted
dependencies so that it won't show up as a dependency again in the future.

Signed-off-by: Dave Chen <dave.chen@arm.com>
…place

Validate that the dry-run and force flags cannot be used at the same time in replace
add unit test coverage for pkg/kubelet/util/util_unix_test.go
…tests

pkg/kubelet: skip long tests in short mode
@pohly pohly force-pushed the dynamic-resource-allocation branch from 9c15ed5 to 35bdb59 July 8, 2022 06:21
k8s-ci-robot and others added 5 commits July 7, 2022 23:57
ResourceClaimSpec and its associated code are needed for the core API and have
to be defined there to avoid import cycles. Therefore the other types also get
defined there.
Created with "make generated_files update".
This is needed for "kubectl get". It depends on the generated swagger docs.
@pohly pohly force-pushed the dynamic-resource-allocation branch 4 times, most recently from af09c21 to f19615c July 8, 2022 13:32
pohly added 5 commits August 1, 2022 20:15
The logic of the driver is very simple (no real allocation, it just sets some
environment variables). The main purpose is to develop and test the code that
integrates with Kubernetes.
This is similar to the support code for generic ephemeral inline volumes.
Differences:
- to avoid stuttering, the functions are just resourceclaim.Name and
  resourceclaim.IsForPod
- resourceclaim.Name returns the right name for both cases (template
  and reference), which will simplify some code
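A sketch of the naming rule, assuming it mirrors generic ephemeral inline volumes (`<pod name>-<entry name>` for a claim created from a template, the referenced name otherwise) and that `IsForPod` checks for a controlling owner reference like its volume counterpart does. The types and lowercase function names are simplified, hypothetical stand-ins, not the real API structs or signatures.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Pod and ResourceClaim API types.
type podResourceClaim struct {
	Name string // name of the entry in the pod spec
	// Exactly one of the following is expected to be set.
	ResourceClaimName         string // reference to an existing claim
	ResourceClaimTemplateName string // inline claim created from a template
}

type ownerRef struct {
	UID        string
	Controller bool
}

type pod struct {
	Name string
	UID  string
}

type resourceClaim struct {
	Name   string
	Owners []ownerRef
}

// claimName returns the name of the ResourceClaim that the pod uses for an
// entry: the referenced name, or "<pod>-<entry>" for a template.
func claimName(p pod, c podResourceClaim) string {
	if c.ResourceClaimName != "" {
		return c.ResourceClaimName
	}
	return p.Name + "-" + c.Name
}

// isForPod reports whether the claim is controlled by the pod. For the
// template case this guards against picking up an unrelated claim that
// happens to have the generated name.
func isForPod(p pod, claim resourceClaim) bool {
	for _, o := range claim.Owners {
		if o.Controller && o.UID == p.UID {
			return true
		}
	}
	return false
}

func main() {
	p := pod{Name: "pause", UID: "uid-1"}
	fmt.Println(claimName(p, podResourceClaim{Name: "resource", ResourceClaimTemplateName: "tmpl"})) // pause-resource
	fmt.Println(claimName(p, podResourceClaim{Name: "shared", ResourceClaimName: "my-claim"}))       // my-claim
	owned := resourceClaim{Name: "pause-resource", Owners: []ownerRef{{UID: "uid-1", Controller: true}}}
	fmt.Println(isForPod(p, owned)) // true
}
```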
The controller uses the exact same logic as the generic ephemeral inline volume
controller, just for inline ResourceClaimTemplate -> ResourceClaim.
The plugin handles the interaction with ResourceClaims that are referenced by a
Pod.
@pohly pohly force-pushed the dynamic-resource-allocation branch from f19615c to 3d85228 August 1, 2022 18:15
@pohly
Owner Author

pohly commented Aug 2, 2022

There is now an official PR for this code, which replaces this PR: kubernetes#111023

@pohly pohly closed this Aug 2, 2022
pohly pushed a commit that referenced this pull request Aug 24, 2022
* Add APF concurrency utilization test
pohly added a commit that referenced this pull request Oct 16, 2023
These were found with a modified klog that enables "go vet" to check klog call
parameters:

    cmd/kubeadm/app/features/features.go:149:4: printf: k8s.io/klog/v2.Warningf format %t has arg v of wrong type string (govet)
    			klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
    test/images/sample-device-plugin/sampledeviceplugin.go:147:5: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
    				klog.Errorf("error: %w", err)
    test/images/sample-device-plugin/sampledeviceplugin.go:155:3: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
    		klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
    staging/src/k8s.io/code-generator/cmd/prerelease-lifecycle-gen/prerelease-lifecycle-generators/status.go:207:5: printf: k8s.io/klog/v2.Fatalf does not support error-wrapping directive %w (govet)
    				klog.Fatalf("Package %v: unsupported %s value: %q :%w", i, tagEnabledName, ptag.value, err)
    staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:286:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
    		klog.V(4).Infof("Node %s missing in vSphere cloud provider cache, trying node informer")
    staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:302:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
    		klog.V(4).Infof("Node %s missing in vSphere cloud provider caches, trying the API server")