Document API architectural approach for soundness and consistency #41954

bgrant0607 · 2016-08-16T18:10:22Z

There are a number of distributed-systems challenges with our API, which is:

eventually consistent (no guarantee about when a mutation will be observed),
weakly consistent (no guarantee that mutations will be observed in order), and
transactional only for individual resources.

Cases:

I modify resource A and watch A. How can I tell when I've observed my update, assuming I'm not the only actor?
I modify resource A and resource B. How can I tell when I've observed both updates, assuming I'm not the only actor?
I modify resource A and resource B. How can I tell when the controller managing resource B has observed the update to resource A?
I create resources A, B, ..., Z. How can I tell when I've observed the creation of all of those resources, assuming I'm not the only actor (e.g., some resources might quickly be deleted by another agent)?
More concrete: I'm the ReplicaSet controller. How can I ensure that I update ReplicaSet status with the most up-to-date pod status in a HA, master-elected configuration, and am not?

There are good reasons for the weak consistency semantics of the API, such as composability with add-on controllers, federated APIs, sharded storage, multiple layers of caches, etc.

The typical means of providing strong consistency is to provide all clients direct access to the database. That's not a viable approach for Kubernetes.

However, we probably have zero implementations of sound clients at the moment.

Examples of mechanisms we've discussed and/or partially implemented that would help:

per-resource sequence numbers (generation): "Create per-object sequence number and report last value seen in status of each object" (kubernetes#7328)
sequence number acknowledgement by controllers (observedGeneration): "Write proposal for controller pod management: adoption, orphaning, ownership, etc. (aka controllers v2)" (kubernetes#14961) and "RC shouldn't update observedGeneration with pending creates/deletes" (kubernetes#25170)
per-resource leader-election sequence numbers: "actors participating in leaderelection should annotate objects with a sequence number of the leadership" (kubernetes#22007)
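To make the third mechanism concrete, here is a minimal sketch of leadership sequence numbers used as fencing tokens. It assumes an in-memory `FencedStore` standing in for the API server and a hypothetical annotation key `example.com/leader-term`; neither is an existing Kubernetes API, and in a real cluster the check would have to live somewhere like an admission webhook. Each write carries the writer's leadership sequence number, and a write whose number is lower than the one already recorded on the object is rejected.

```python
from dataclasses import dataclass, field

# Hypothetical annotation key; not a real Kubernetes annotation.
LEADER_TERM_KEY = "example.com/leader-term"


class StaleLeaderError(Exception):
    """Raised when a write comes from a leader whose term has been superseded."""


@dataclass
class FakeObject:
    name: str
    annotations: dict = field(default_factory=dict)
    status: dict = field(default_factory=dict)


class FencedStore:
    """In-memory stand-in for the API server that enforces fencing tokens."""

    def __init__(self):
        self.objects = {}

    def write_status(self, name, status, leader_term):
        obj = self.objects.setdefault(name, FakeObject(name))
        recorded = int(obj.annotations.get(LEADER_TERM_KEY, "0"))
        if leader_term < recorded:
            # A newer leader has already written this object; reject the
            # deposed leader's possibly stale view of the world.
            raise StaleLeaderError(f"term {leader_term} < recorded term {recorded}")
        obj.annotations[LEADER_TERM_KEY] = str(leader_term)
        obj.status = dict(status)
        return obj
```

In use, a newly elected ReplicaSet controller writes with term 2, after which any late write from the deposed term-1 leader fails instead of overwriting fresher status.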

But we should think about the problem holistically.

This is a prereq to kubernetes/kubernetes#1957

cc @lavalamp @pwittrock @erictune

lavalamp · 2016-08-16T22:11:22Z

Actually it's better than I expected.

Cases:

I modify resource A and watch A. How can I tell when I've observed my update, assuming I'm not the only actor?

The modification gives you back a ResourceVersion, which you can compare (via equality) with RVs coming down your watch. Unfortunately you'd have to expand this to compare via < to handle the general case where you have to restart your watch; we currently claim that you cannot do this comparison, although in practice it is currently safe if you confine yourself to a single resource type.
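A minimal sketch of the equality check described above, assuming watch events are available as `(event_type, obj)` pairs where `obj` is a dict shaped like the API's JSON. Resource versions are treated as opaque strings and compared only for equality; the `<` comparison is deliberately not used because the API does not promise it is meaningful.

```python
def observed_own_update(events, name, written_rv):
    """Return True once the watch delivers the object version we wrote.

    `events` is any iterable of (event_type, obj) pairs, e.g. decoded from a
    watch stream. Resource versions are opaque, so only equality is used.
    """
    for _event_type, obj in events:
        meta = obj["metadata"]
        if meta["name"] != name:
            continue  # event for some other object
        if meta["resourceVersion"] == written_rv:
            return True
    return False
```

If the watch has to be restarted from a fresh list, the event carrying the written version may never be delivered, which is exactly the gap an ordering comparison would close.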

I modify resource A and resource B. How can I tell when I've observed both updates, assuming I'm not the only actor?

Same answer, just tracking RV per-resource.
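Tracking resourceVersion per resource can be sketched as follows, assuming the write responses have been collected into a hypothetical `written` map from `(kind, name)` to the returned resourceVersion, and that events from the relevant watches are merged into one iterable. Each key is checked independently, so no ordering of resource versions across objects is assumed. The same loop covers the case of creating A through Z.

```python
def wait_for_all(events, written):
    """Consume watch events until every (kind, name) in `written` has been
    observed at the resourceVersion returned by our own write.

    `written` maps (kind, name) -> resourceVersion from the write response.
    Returns the set of keys still pending when the stream ends.
    """
    pending = dict(written)
    for _event_type, obj in events:
        key = (obj["kind"], obj["metadata"]["name"])
        if pending.get(key) == obj["metadata"]["resourceVersion"]:
            del pending[key]
        if not pending:
            break  # everything we wrote has been seen
    return set(pending)
```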

I modify resource A and resource B. How can I tell when the controller managing resource B has observed the update to resource A?

No general way at the moment.

I create resources A, B, ..., Z. How can I tell when I've observed the creation of all of those resources, assuming I'm not the only actor (e.g., some resources might quickly be deleted by another agent)?

Same as first two answers.

More concrete: I'm the ReplicaSet controller. How can I ensure that I update ReplicaSet status with the most up-to-date pod status in a HA, master-elected configuration, and am not?

I'm not 100% sure I follow the question; I think the ending is garbled? But if you meant something like what I expect, I think we can extend the precondition concept to support this.

adohe-zz · 2016-08-16T23:53:46Z

/subscribe

caesarxuchao · 2016-10-05T17:34:01Z

/sub

bgrant0607 · 2016-10-07T22:28:01Z

Somewhat related: kubernetes/kubernetes#34363

lavalamp · 2016-10-07T22:34:27Z

Fixed my email-garbled comment above.

smarterclayton · 2017-06-14T16:20:57Z

One more - preconditions on deletion and other actions. A resourceVersion precondition on delete, for example.

ash2k · 2017-08-03T12:29:58Z

resourceVersion precondition on delete

Would be very useful. Right now there is a race between delete and any other operation that updates the object.
E.g. a controller that owns an object (has a controller owner reference pointing to it) cannot safely delete it because something else may change ownership concurrently. I don't know whether this happens in practice in the Kubernetes codebase, but it is an issue for Custom Resource controllers.
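The race can be illustrated with a small in-memory model of optimistic concurrency; `FakeStore` and `Conflict` are hypothetical stand-ins, not client-go types. The controller reads the object, another actor changes its owner, and a delete guarded by the resourceVersion the controller originally read is refused. In the real API the equivalent knob is the `resourceVersion` field of `preconditions` in `DeleteOptions`, which, as noted later in this thread, became available after this comment.

```python
import itertools


class Conflict(Exception):
    """Stand-in for the error the API server returns when a precondition fails."""


class FakeStore:
    """Tiny in-memory model of optimistic concurrency on one resource type."""

    def __init__(self):
        self._rv = itertools.count(1)
        self.objects = {}

    def create(self, name, owner):
        obj = {"name": name, "owner": owner, "resourceVersion": str(next(self._rv))}
        self.objects[name] = obj
        return dict(obj)

    def update_owner(self, name, owner):
        obj = self.objects[name]
        obj["owner"] = owner
        obj["resourceVersion"] = str(next(self._rv))  # every write bumps the RV
        return dict(obj)

    def delete(self, name, precondition_rv=None):
        obj = self.objects[name]
        if precondition_rv is not None and obj["resourceVersion"] != precondition_rv:
            # Someone changed the object after we read it; refuse the delete.
            raise Conflict(
                f"resourceVersion is {obj['resourceVersion']}, expected {precondition_rv}"
            )
        del self.objects[name]
```

Without the precondition, the controller's delete would silently remove an object it no longer owns.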

fejta-bot · 2018-01-02T07:54:38Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

bgrant0607 · 2018-01-09T06:20:53Z

/lifecycle frozen

nikhita · 2018-03-04T05:55:36Z

/remove-lifecycle stale

warmchang · 2019-05-09T07:39:23Z

Hi, is there any plan for this? Thx!

lavalamp · 2019-08-06T16:28:37Z

One more - preconditions on deletion and other actions. A resourceVersion precondition on delete, for example.

For those following along at home, we have this now.

lavalamp · 2019-08-06T16:38:58Z

I'd like to update my answers above considering the large amount of change the system has undergone.

I modify resource A and watch A. How can I tell when I've observed my update, assuming I'm not the only actor?

Record the metadata.generation returned; when you observe an update with a generation >= that one, you've observed your change.
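A sketch of the generation check, assuming events arrive as `(event_type, obj)` pairs of API-shaped dicts. Because `metadata.generation` is a per-object integer, an ordering comparison is meaningful here, unlike with resourceVersion. Generation is typically bumped by spec changes, so this detects that a spec write has been observed rather than a status write.

```python
def has_observed_generation(events, name, written_generation):
    """True once a watch event shows `name` at a generation >= the one
    returned by our write. Generation is an integer per object, so `>=`
    is safe where resourceVersion comparisons are not."""
    for _event_type, obj in events:
        meta = obj["metadata"]
        if meta["name"] == name and meta.get("generation", 0) >= written_generation:
            return True
    return False
```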

I modify resource A and resource B. How can I tell when I've observed both updates, assuming I'm not the only actor?

Same answer, just tracking RV per-resource.

I modify resource A and resource B. How can I tell when the controller managing resource B has observed the update to resource A?

We rely on the controller author to do something useful like record the observed generation.
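One way a controller author might do this, sketched with hypothetical conventions: the standard `status.observedGeneration` pattern for B's own spec, plus an invented `status.observedInputs` map in which B's controller records the generation of each input object (such as A) that it reconciled against. No built-in Kubernetes type has `observedInputs`.

```python
def caught_up_with_own_spec(obj):
    """True if the controller has reconciled the object's current spec,
    using the observedGeneration convention. A controller that never
    writes observedGeneration gives no signal, so this returns False."""
    generation = obj["metadata"].get("generation")
    observed = obj.get("status", {}).get("observedGeneration")
    if generation is None or observed is None:
        return False
    return observed >= generation


def controller_saw_input(b_obj, input_name, input_generation):
    """True if B's controller reports reconciling against `input_name` at
    `input_generation` or later. Relies on the hypothetical field
    status.observedInputs = {input_name: generation}."""
    seen = b_obj.get("status", {}).get("observedInputs", {}).get(input_name)
    return seen is not None and seen >= input_generation
```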

I create resources A, B, ..., Z. How can I tell when I've observed the creation of all of those resources, assuming I'm not the only actor (e.g., some resources might quickly be deleted by another agent)?

We don't offer a way to detect an "after" relationship with a deletion. But we do now offer both UID and RV deletion preconditions, so folks doing the deletion no longer risk losing a change accidentally.
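The UID precondition guards against a different hazard from the resourceVersion one: deleting an object that another agent deleted and recreated under the same name. A minimal in-memory sketch, with `NamedStore` and `Conflict` as hypothetical stand-ins; in the real API the corresponding field is `uid` in `DeleteOptions.preconditions`.

```python
import uuid


class Conflict(Exception):
    """Stand-in for the error the API server returns when a precondition fails."""


class NamedStore:
    """In-memory store where each create assigns a fresh UID, as the API does."""

    def __init__(self):
        self.objects = {}

    def create(self, name):
        obj = {"name": name, "uid": str(uuid.uuid4())}
        self.objects[name] = obj
        return dict(obj)

    def delete(self, name, precondition_uid=None):
        obj = self.objects[name]
        if precondition_uid is not None and obj["uid"] != precondition_uid:
            # Same name, different object: refuse to delete the newcomer.
            raise Conflict(f"uid is {obj['uid']}, expected {precondition_uid}")
        del self.objects[name]
```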

More concrete: I'm the ReplicaSet controller. How can I ensure that I update ReplicaSet status with the most up-to-date pod status in a HA, master-elected configuration, and am not?

I have unpublished drafts on individual object locking for controllers; it involves an annotation with an "I hold lock X" assertion plus a webhook that does a consistent read of the lock object to confirm.
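Since the drafts are unpublished, the following is only a guess at the general shape of the webhook's check, with a hypothetical annotation key and a `read_lock_holder` callable standing in for the consistent read. In Kubernetes terms that read would presumably be a GET with `resourceVersion` unset, which the API documentation describes as returning the most recent data rather than cached data.

```python
# Hypothetical annotation key; the real design is not public.
LOCK_ANNOTATION = "example.com/holds-lock"


def admit_write(obj, writer, read_lock_holder):
    """Admission-check sketch for the "I hold lock X" assertion.

    obj: the object being written (dict with metadata.annotations).
    writer: identity of the client making the request.
    read_lock_holder: callable(lock_name) -> current holder, assumed to do
        a consistent read of the lock object, not a cached read.
    Returns (allowed, reason).
    """
    annotations = obj.get("metadata", {}).get("annotations", {})
    lock_name = annotations.get(LOCK_ANNOTATION)
    if lock_name is None:
        return True, "no lock asserted"
    holder = read_lock_holder(lock_name)
    if holder != writer:
        return False, f"{writer} asserts {lock_name} but holder is {holder}"
    return True, "lock confirmed"
```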

bgrant0607 · 2019-08-06T16:49:22Z

Thanks for the updates.

Is metadata.generation updated for all resource types?

lavalamp · 2019-08-06T17:10:13Z

It is not 100% automated, so it's possible for an individual resource to do it wrong, but we would treat that as an important bug.
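For built-in resources, maintaining `metadata.generation` is handled in each resource's update strategy, which is why an individual resource can get it wrong. A sketch loosely modeled on that pattern, with the hypothetical function `prepare_for_update` and objects as plain dicts: generation increases when the spec changes and is carried over otherwise.

```python
import copy


def prepare_for_update(new_obj, old_obj):
    """Return a copy of new_obj with metadata.generation maintained.

    Bumps generation only when the spec changes; status-only or
    metadata-only edits keep the old value. The input is not mutated.
    """
    updated = copy.deepcopy(new_obj)
    old_gen = old_obj["metadata"].get("generation", 1)
    if updated.get("spec") != old_obj.get("spec"):
        updated["metadata"]["generation"] = old_gen + 1
    else:
        updated["metadata"]["generation"] = old_gen
    return updated
```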

thockin · 2022-08-19T17:34:22Z

Closing old issues that are unlikely to be useful any further.

logicalhan · 2022-08-19T17:44:46Z

Closing old issues that are unlikely to be useful any further.

I actually believe this issue is still useful: we need documentation on the types of guarantees we provide for which API calls, and on what combinations of API calls one would need to make in order to preserve data integrity at a certain level.

logicalhan · 2022-08-19T17:51:20Z

To some extent, this issue served as documentation.

thockin · 2022-08-19T18:09:09Z

Can we turn it into documentation? Or re-open it as a request to write such documentation?

logicalhan · 2022-08-19T19:22:48Z

Can we turn it into documentation? Or re-open it as a request to write such documentation?

Yeah that makes sense.

sftim · 2023-07-09T16:34:15Z

For some more context, read: Life Beyond Distributed Transactions: An apostate's opinion

/reopen

k8s-ci-robot · 2023-07-09T16:34:20Z

@sftim: Reopened this issue.

In response to this:

For some more context: https://queue.acm.org/detail.cfm?id=3025012

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2023-07-09T16:34:26Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sftim · 2023-07-09T16:34:57Z

/transfer website
/sig docs
/kind documentation

sftim · 2023-07-09T16:36:06Z

/retitle Document API architectural approach for soundness and consistency

Some parts of this work might end up in https://k8s.dev/docs/

sftim · 2023-07-09T16:36:29Z

/language en
/wg api-expression
/triage accepted
/lifecycle frozen
/priority important-longterm

sftim · 2023-07-09T16:36:49Z

/remove-priority backlog

k8s-triage-robot · 2024-07-08T16:41:44Z

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

bgrant0607 added the priority/backlog and sig/api-machinery labels Aug 16, 2016

bgrant0607 added the sig/service-catalog label Sep 29, 2016

bgrant0607 mentioned this issue Oct 7, 2016: Facilitate API orchestration kubernetes/kubernetes#34363 (Open)

ash2k mentioned this issue Aug 3, 2017: Delete operation should be concurrency safe atlassian/smith#118 (Open)

k8s-ci-robot added the lifecycle/stale label Jan 2, 2018

k8s-ci-robot added the lifecycle/frozen label Jan 9, 2018

k8s-ci-robot removed the lifecycle/stale label Mar 4, 2018

bgrant0607 mentioned this issue May 31, 2018: Promote watch e2e test to conformance kubernetes/kubernetes#61424 (Merged)

seh mentioned this issue Feb 2, 2019: Delete: needs more preconditions kubernetes/kubernetes#73648 (Closed)

bgrant0607 mentioned this issue Feb 19, 2019: WATCH from RV "" or "0" breaks watch order invariant kubernetes/kubernetes#74022 (Closed)

thockin closed this as completed Aug 19, 2022

k8s-ci-robot reopened this Jul 9, 2023

k8s-ci-robot added the needs-triage label Jul 9, 2023

k8s-ci-robot added the sig/docs label Jul 9, 2023

k8s-ci-robot transferred this issue from kubernetes/kubernetes Jul 9, 2023

k8s-ci-robot changed the title from "Make it possible to write a sound client from a distributed-systems perspective" to "Document API architectural approach for soundness and consistency" Jul 9, 2023

k8s-ci-robot removed the priority/backlog label Jul 9, 2023

divya-mohan0209 added this to SIG Docs Longterm issues Aug 17, 2023

divya-mohan0209 moved this to In Progress in SIG Docs Longterm issues Aug 17, 2023

k8s-ci-robot added the needs-triage label and removed the triage/accepted label Jul 8, 2024
