
kvserver,keys: force-flush when needed by split/merge/migrate #136077

Draft: wants to merge 1 commit into base: master
Conversation

sumeerbhola (Collaborator)

  • ReplicatedEvalResult.DoTimelyApplicationToAllReplicas is added and set in splitTriggerHelper, MergeTrigger, Migrate, and Subsume.
  • Setting the field to true is gated on cluster version V25_1_AddRangeForceFlushKey.
  • This causes ReplicaState.ForceFlushIndex to be set, and RangeForceFlushKey (a replicated range-id local key) to be written, when the corresponding batch is applied to the state machine. The index is set to the index of the entry being applied, and is monotonically increasing.
  • replica_rac2.Processor and rac2.RangeController have ForceFlushIndexChangedLocked methods that are called whenever the Replica sees a change in the force-flush-index. When in pull mode, the rangeController acts on the latest value it has seen to force-flush any replicaSendStream whose send-queue still contains entries at or below the force-flush-index. (A minimal sketch of this flow follows below.)
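
A minimal Go sketch of the apply-side flow described above (all type and function names, such as forceFlushState and applyEntry, are illustrative stand-ins rather than the actual kvserver code):

	// Sketch only: illustrates the flow described in the bullets above; all
	// names here are stand-ins, not the real kvserver code.
	package sketch

	import "context"

	// forceFlushState mimics the relevant slice of ReplicaState.
	type forceFlushState struct {
		Index uint64 // stands in for ReplicaState.ForceFlushIndex, monotonically increasing
	}

	// rangeController mimics the subset of rac2.RangeController used here.
	type rangeController interface {
		ForceFlushIndexChangedLocked(ctx context.Context, index uint64)
	}

	// applyEntry sketches the apply path: when an entry whose
	// ReplicatedEvalResult has DoTimelyApplicationToAllReplicas set is applied
	// to the state machine, ForceFlushIndex is bumped to that entry's index
	// (the real code also writes RangeForceFlushKey in the same batch), and
	// the processor / range controller is notified of the new value.
	func applyEntry(
		ctx context.Context,
		state *forceFlushState,
		rc rangeController,
		entryIndex uint64,
		doTimelyApplicationToAllReplicas bool,
	) {
		if !doTimelyApplicationToAllReplicas {
			return
		}
		if entryIndex > state.Index {
			state.Index = entryIndex
			rc.ForceFlushIndexChangedLocked(ctx, state.Index)
		}
	}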

Fixes #135601

Epic: CRDB-37515

Release note: None

@sumeerbhola sumeerbhola requested review from pav-kv and kvoli November 24, 2024 05:37

blathers-crl bot commented Nov 24, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member) commented

This change is Reviewable

@sumeerbhola (Collaborator, Author) left a comment

This is a draft. Based on an initial review and some testing, we can start pulling pieces out of it for separate PRs to merge.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)

@kvoli (Collaborator) left a comment

:lgtm:

Reviewed 28 of 28 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/kv/kvserver/kvserverpb/proposer_kv.go line 73 at r1 (raw file):

	allowlist.IsProbe = false // probes are trivial, they always get refused in CheckForcedErr
	allowlist.State = nil
	allowlist.DoTimelyApplicationToAllReplicas = false

nit: acknowledge this is a draft, for the non-draft PRs we should add a comment here.


pkg/roachpb/metadata.proto line 568 at r1 (raw file):

  option (gogoproto.goproto_stringer) = true;

  optional uint64 index = 1 [(gogoproto.nullable) = false];

nit: we are using a uint64 in the rangecontroller/processor code but we have a typed version available from here, is there a reason for it?


pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go line 1207 at r1 (raw file):

			// quorum or leaseholder requirements may already have ensured that this
			// replica must force-flush.
			if rs.scratchEvent.replicaStateInfo.Next <= rc.forceFlushIndex &&

Just double checking my understanding here (bit rusty) about:

if rs.scratchEvent.replicaStateInfo.Next <= rc.forceFlushIndex &&

This is Next because (Match,Next) are in-flight and [Next,forceFlushIndex] is the range that needs to be force flushed as a result (ffIndex - match - inflight)?


pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go line 1531 at r1 (raw file):

// ForceFlushIndexChangedLocked implements RangeController.
func (rc *rangeController) ForceFlushIndexChangedLocked(ctx context.Context, index uint64) {
	rc.forceFlushIndex = index

Is there an invariant we could assert here w.r.t the prior rc.forceFlushIndex? I was thinking an increasing invariant (not strictly), but I'm guessing there's some edge case where it is valid, but I can't think of anything.


pkg/kv/kvserver/kvserverpb/proposer_kv.proto line 267 at r1 (raw file):

  // DoTimelyApplicationToAllReplicas is set to true when this proposal needs
  // to be applied on all replicas for various reasons.

nit: similar disclaimer, we should extend this comment w/ some examples like split, merge etc.

@sumeerbhola (Collaborator, Author) left a comment

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)


pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go line 1207 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Just double checking my understanding here (bit rusty) about:

if rs.scratchEvent.replicaStateInfo.Next <= rc.forceFlushIndex &&

This is Next because (Match,Next) are in-flight and [Next,forceFlushIndex] is the range that needs to be force flushed as a result (ffIndex - match - inflight)?

Correct.
I added a comment.

			// NB: Next is exclusive and the first entry that has not yet been sent.
			// And forceFlushIndex is inclusive. Therefore, [Next, forceFlushIndex]
			// needs to have been sent for force-flush to not be needed, and we
			// check for non-emptiness of the interval below.
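
For illustration, the interval reasoning above as a tiny standalone Go helper (a sketch, not the actual range_controller.go code; the && in the quoted line continues with further conditions that are omitted here):

	// next is exclusive (the first entry not yet sent) and forceFlushIndex is
	// inclusive, so [next, forceFlushIndex] is non-empty exactly when
	// next <= forceFlushIndex, i.e. some unsent entries still need to be
	// force-flushed.
	func intervalNeedsForceFlush(next, forceFlushIndex uint64) bool {
		return next <= forceFlushIndex
	}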

pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go line 1531 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Is there an invariant we could assert here w.r.t the prior rc.forceFlushIndex? I was thinking an increasing invariant (not strictly), but I'm guessing there's some edge case where it is valid, but I can't think of anything.

Good point. I added the following in processorImpl, since that lives for the lifetime of the Replica struct and so can do a stricter check there:

	if buildutil.CrdbTestBuild && p.forceFlushIndex > index {
		panic(errors.AssertionFailedf("force-flush index decreased from %d to %d",
			p.forceFlushIndex, index))
	}
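
For context, a hedged sketch of how that assertion might sit in a processor-level ForceFlushIndexChangedLocked (only the assertion itself is quoted from the change; the struct, field names, and the forwarding to the range controller are assumptions based on the PR description):

	// Sketch, not the actual replica_rac2 code.
	package sketch

	import (
		"context"

		"github.com/cockroachdb/cockroach/pkg/util/buildutil"
		"github.com/cockroachdb/errors"
	)

	// rangeControllerIface stands in for the slice of rac2.RangeController used here.
	type rangeControllerIface interface {
		ForceFlushIndexChangedLocked(ctx context.Context, index uint64)
	}

	// processorSketch stands in for processorImpl, which lives for the lifetime
	// of the Replica struct and can therefore assert that the force-flush index
	// never regresses before recording it and forwarding it on.
	type processorSketch struct {
		forceFlushIndex uint64
		rangeController rangeControllerIface // nil when no range controller exists
	}

	func (p *processorSketch) ForceFlushIndexChangedLocked(ctx context.Context, index uint64) {
		if buildutil.CrdbTestBuild && p.forceFlushIndex > index {
			panic(errors.AssertionFailedf("force-flush index decreased from %d to %d",
				p.forceFlushIndex, index))
		}
		p.forceFlushIndex = index
		if p.rangeController != nil {
			p.rangeController.ForceFlushIndexChangedLocked(ctx, index)
		}
	}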

pkg/kv/kvserver/kvserverpb/proposer_kv.go line 73 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: acknowledge this is a draft, for the non-draft PRs we should add a comment here.

Done. This is a bit subtle, at least to me, so I'm glad you asked.


pkg/kv/kvserver/kvserverpb/proposer_kv.proto line 267 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: similar disclaimer, we should extend this comment w/ some examples like split, merge etc.

Done


pkg/roachpb/metadata.proto line 568 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: we are using a uint64 in the rangecontroller/processor code but we have a typed version available from here, is there a reason for it?

I didn't understand the question, specifically which "typed version" and what is "here" in "typed version available from here". Presumably "here" is not this PR, since it is simply using uint64.

Successfully merging this pull request may close these issues.

rac2: chronic send-queue in apply_to_all causes problems with splits, merges, and below-raft migrations