Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA-222: Add a manifest annotation to be used for object deletion #438

Merged

Conversation

jottofar
Copy link
Contributor

@jottofar jottofar commented Aug 17, 2020

The new manifest annotation will trigger new CVO delete logic, added by this PR, to delete the in-cluster object instead of creating/updating it. This will provide a more straightforward way for engineering to remove content.

See Enhancement for adding a manifest annotation for object removal
#419
for implementation details.

Per the enhancement whenever object deletions are pending, one or more deletions have been requested but object removal has not completed, the cluster is blocked from upgrading to the next minor release (Upgradeable==false).

if an object previously removed reappears a warning is logged which meets what is specified by the enhancement. This does not result in Upgradeable==false and no attempt is made to remove the object again.

Regarding implementation of enhancement section Upgrade / Downgrade Strategy, since an upgrade can result in >1 CVO restarts there is no straightforward way to distinguish the cause of an IsNotFound error. That is, discerning whether a given object being requested for deletion was ever present in the release or had been removed by a previous CVO pod. A warning is logged whenever an object specified for deletion is not found, and no indication exists in the current CVO pod that an object a delete request had been issued.

Since k8s.io v1beta1 APIs are no longer served In kuberentes v1.22 and are being removed via #545 delete is not implemented for these types. Users are expected to use v1 apiVersion in these delete manifests.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 17, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2020
@jottofar jottofar force-pushed the ota-222-manifest-delete branch 4 times, most recently from bff3f3e to 83ceaa0 Compare August 19, 2020 19:40
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2020
@jottofar jottofar force-pushed the ota-222-manifest-delete branch from 83ceaa0 to 42c6d09 Compare August 19, 2020 20:27
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2020
@jottofar jottofar force-pushed the ota-222-manifest-delete branch 3 times, most recently from 15720a8 to cf015af Compare August 20, 2020 21:33
@jottofar
Copy link
Contributor Author

/retest

@jottofar jottofar force-pushed the ota-222-manifest-delete branch 2 times, most recently from f758f44 to e73b06a Compare August 21, 2020 21:49
@openshift-ci-robot
Copy link
Contributor

@jottofar: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-upgrade e73b06a link /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wking added a commit to wking/origin that referenced this pull request Dec 8, 2020
Until we can remove that namespace.  Some background in [1].  We
cannot remove the namespace until we land [2].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1875900
[2]: openshift/cluster-version-operator#438
@openshift-merge-robot
Copy link
Contributor

@jottofar: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic e73b06a link /test e2e-agnostic
ci/prow/e2e-agnostic-upgrade e73b06a link /test e2e-agnostic-upgrade
ci/prow/e2e-agnostic-operator e73b06a link /test e2e-agnostic-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2021
@jottofar jottofar force-pushed the ota-222-manifest-delete branch 2 times, most recently from 6f3cb50 to 8478432 Compare April 5, 2021 20:42
@jottofar
Copy link
Contributor Author

jottofar commented Apr 8, 2021

/test unit

@jottofar
Copy link
Contributor Author

jottofar commented Apr 8, 2021

/test e2e-agnostic-upgrade

@jottofar jottofar force-pushed the ota-222-manifest-delete branch 4 times, most recently from fd89f4d to 37814ce Compare April 9, 2021 13:41
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

16 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@jottofar
Copy link
Contributor Author

jottofar commented Jun 21, 2021

/cc @wking

Can I get an override on /test e2e-agnostic-upgrade. Failures are unrelated to my PR and it's unchanged since passing this test before.

@openshift-ci openshift-ci bot requested a review from wking June 21, 2021 14:27
@jottofar
Copy link
Contributor Author

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 21, 2021

@jottofar: Overrode contexts on behalf of jottofar: ci/prow/e2e-agnostic-upgrade

In response to this:

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit d583781 into openshift:master Jun 21, 2021
wking added a commit to wking/cluster-version-operator that referenced this pull request Oct 7, 2021
…=False

Because:

  Upgradeable=False

    Reason: MultipleReasons
    Message: Cluster cannot be upgraded between minor versions for multiple reasons: AdminAckRequired,IncompatibleOperatorsInstalled

doesn't include all the useful information needed to resolve those
issues.  This pivots to using the same approach we use today when
aggregating multiple Upgradeable=False ClusterOperators.

Also shift the patch-blocking overrides check in front of the other
minor blocking checks.  It's more important (because it blocks more
targets), and it also will not included a nested sub-list (which might
be confusing, because today I'm not adding indents or any such markers
to make it easy to distinguish levels in a nested list hierarchy).

This cherry-picks 0c742e5 (pkg/cvo/upgradeable: Include messages
for multiple-reason Upgradeable=False, 2021-10-07, openshift#671) back to 4.8,
with a minor context change because 4.8 doesn't have 0afb8a8 (Add a
manifest annotation to be used for object deletion, 2020-08-17, openshift#438).
wking added a commit to wking/cluster-version-operator that referenced this pull request Oct 8, 2021
…=False

Because:

  Upgradeable=False

    Reason: MultipleReasons
    Message: Cluster cannot be upgraded between minor versions for multiple reasons: AdminAckRequired,IncompatibleOperatorsInstalled

doesn't include all the useful information needed to resolve those
issues.  This pivots to using the same approach we use today when
aggregating multiple Upgradeable=False ClusterOperators.

Also shift the patch-blocking overrides check in front of the other
minor blocking checks.  It's more important (because it blocks more
targets), and it also will not included a nested sub-list (which might
be confusing, because today I'm not adding indents or any such markers
to make it easy to distinguish levels in a nested list hierarchy).

This cherry-picks 0c742e5 (pkg/cvo/upgradeable: Include messages
for multiple-reason Upgradeable=False, 2021-10-07, openshift#671) back to 4.8,
with a minor context change because 4.8 doesn't have 0afb8a8 (Add a
manifest annotation to be used for object deletion, 2020-08-17, openshift#438).
wking added a commit to wking/cluster-version-operator that referenced this pull request May 4, 2022
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).

The always-fatal ApplyAPIServicev1 function gives us the required
symmetry with 0afb8a8's DeleteAPIServicev1, although I'm a bit
fuzzy on who would have been using an APIService manifest.  I'm not
excited about supporting a new manifest kind I know nothing about, and
failing all attempts at non-delete reconciling should protect us from
accidentally encouraging manifest authors to feed us this kind.
wking added a commit to wking/cluster-version-operator that referenced this pull request May 4, 2022
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).

The always-fatal ApplyAPIServicev1 function gives us the required
symmetry with 0afb8a8's DeleteAPIServicev1, although I'm a bit
fuzzy on who would have been using an APIService manifest.  I'm not
excited about supporting a new manifest kind I know nothing about, and
failing all attempts at non-delete reconciling should protect us from
accidentally encouraging manifest authors to feed us this kind.
wking added a commit to wking/cluster-version-operator that referenced this pull request May 4, 2022
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).

The always-fatal ApplyAPIServicev1 function gives us the required
symmetry with 0afb8a8's DeleteAPIServicev1, although I'm a bit
fuzzy on who would have been using an APIService manifest.  I'm not
excited about supporting a new manifest kind I know nothing about, and
failing all attempts at non-delete reconciling should protect us from
accidentally encouraging manifest authors to feed us this kind.
wking added a commit to wking/cluster-version-operator that referenced this pull request May 9, 2022
The original APIService reconciliation landed in 662e182 (handle
APIService, Service objects, 2018-09-27, openshift#26).  We definitely
reconcile a bunch of Service manifests (e.g. for serving operator
metrics, including the CVO's own Service), but it's not clear what the
use case was for APIService.

DeleteAPIServicev1 landed in 0afb8a8 (Add a manifest annotation to
be used for object deletion, 2020-08-17, openshift#438), but was never
consumed.  We dropped APIService reconciliation support in 5681a70
(Drop APIService support, 2021-06-10, openshift#566).  This commit drops the
unused DeleteAPIServicev1.  Auditing z-stream tips from 4.3 through
4.11:

  $ oc adm release extract --to 4.11 quay.io/openshift-release-dev/ocp-release-nightly@sha256:080e5cf5e3e043ac0877b8f545ba2b596016f3fd76f3f457d15060603b3615e1
  $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.13-x86_64
  $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.32-x86_64
  $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.39-x86_64
  $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.50-x86_64
  $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.57-x86_64
  $ oc adm release extract --to 4.5 quay.io/openshift-release-dev/ocp-release:4.5.41-x86_64
  $ oc adm release extract --to 4.4 quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64
  $ oc adm release extract --to 4.3 quay.io/openshift-release-dev/ocp-release:4.3.40-x86_64
  $ grep -r 'kind: APIService' 4.*
  ...no hits...
wking added a commit to wking/cluster-version-operator that referenced this pull request May 9, 2022
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants