-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver,keys: force-flush when needed by split/merge/migrate #136077
base: master
Are you sure you want to change the base?
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a draft. Based on an initial review, and some testing, we can start pulling things out from here for PRs to merge.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)
86d622d
to
eb0f6e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 28 of 28 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)
pkg/kv/kvserver/kvserverpb/proposer_kv.go
line 73 at r1 (raw file):
allowlist.IsProbe = false // probes are trivial, they always get refused in CheckForcedErr allowlist.State = nil allowlist.DoTimelyApplicationToAllReplicas = false
nit: acknowledge this is a draft, for the non-draft PRs we should add a comment here.
pkg/roachpb/metadata.proto
line 568 at r1 (raw file):
option (gogoproto.goproto_stringer) = true; optional uint64 index = 1 [(gogoproto.nullable) = false];
nit: we are using a uint64 in the rangecontroller/processor code but we have a typed version available from here, is there a reason for it?
pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go
line 1207 at r1 (raw file):
// quorum or leaseholder requirements may already have ensured that this // replica must force-flush. if rs.scratchEvent.replicaStateInfo.Next <= rc.forceFlushIndex &&
Just double checking my understanding here (bit rusty) about:
if rs.scratchEvent.replicaStateInfo.Next <= rc.forceFlushIndex &&
This is Next because (Match,Next)
are in-flight and [Next,forceFlushIndex] is the range that needs to be force flushed as a result (ffIndex - match - inflight)?
pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go
line 1531 at r1 (raw file):
// ForceFlushIndexChangedLocked implements RangeController. func (rc *rangeController) ForceFlushIndexChangedLocked(ctx context.Context, index uint64) { rc.forceFlushIndex = index
Is there an invariant we could assert here w.r.t the prior rc.forceFlushIndex
? I was thinking an increasing invariant (not strictly), but I'm guessing there's some edge case where it is valid, but I can't think of anything.
pkg/kv/kvserver/kvserverpb/proposer_kv.proto
line 267 at r1 (raw file):
// DoTimelyApplicationToAllReplicas is set to true when this proposal needs // to be applied on all replicas for various reasons.
nit: similar disclaimer, we should extend this comment w/ some examples like split, merge etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)
pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go
line 1207 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Just double checking my understanding here (bit rusty) about:
if rs.scratchEvent.replicaStateInfo.Next <= rc.forceFlushIndex &&This is Next because
(Match,Next)
are in-flight and [Next,forceFlushIndex] is the range that needs to be force flushed as a result (ffIndex - match - inflight)?
Correct.
I added a comment.
// NB: Next is exclusive and the first entry that has not yet been sent.
// And forceFlushIndex is inclusive. Therefore, [Next, forceFlushIndex]
// needs to have been sent for force-flush to not be needed, and we
// check for non-emptiness of the interval below.
pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go
line 1531 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Is there an invariant we could assert here w.r.t the prior
rc.forceFlushIndex
? I was thinking an increasing invariant (not strictly), but I'm guessing there's some edge case where it is valid, but I can't think of anything.
Good point. I added the following in processorImpl
since that lives for the lifetime of the Replica
struct so can do a stricter check there
if buildutil.CrdbTestBuild && p.forceFlushIndex > index {
panic(errors.AssertionFailedf("force-flush index decreased from %d to %d",
p.forceFlushIndex, index))
}
pkg/kv/kvserver/kvserverpb/proposer_kv.go
line 73 at r1 (raw file):
Previously, kvoli (Austen) wrote…
nit: acknowledge this is a draft, for the non-draft PRs we should add a comment here.
Done. This is a bit subtle, at least to me, so I'm glad you asked.
pkg/kv/kvserver/kvserverpb/proposer_kv.proto
line 267 at r1 (raw file):
Previously, kvoli (Austen) wrote…
nit: similar disclaimer, we should extend this comment w/ some examples like split, merge etc.
Done
pkg/roachpb/metadata.proto
line 568 at r1 (raw file):
Previously, kvoli (Austen) wrote…
nit: we are using a uint64 in the rangecontroller/processor code but we have a typed version available from here, is there a reason for it?
I didn't understand the question, specifically which "typed version" and what is "here", in "typed version available from here". Presumably "here" is not this PR since it is simply using uint64
- ReplicatedEvalResult.DoTimelyApplicationToAllReplicas is added and set in splitTriggerHelper, MergeTrigger, Migrate, Subsume. - The previous setting to true is gated on cluster version V25_1_AddRangeForceFlushKey. - This causes ReplicaState.ForceFlushIndex to be set, and RangeForceFlushKey (a replicated range-id local key) to be written when applying the corresponding batch to the state machine. The index is set to the index of the entry being applied, and is monotonically increasing. - replica_rac2.Processor and rac2.RangeController have ForceFlushIndexChangedLocked methods that are called whenever the Replica sees a change in the force-flush-index. When in pull-mode the rangeController acts on the latest value it has seem to force-flush a replicaSendStream that has a send-queue that has some entries at or equal to the force-flush-index. Fixes cockroachdb#135601 Epic: CRDB-37515 Release note: None
Fixes #135601
Epic: CRDB-37515
Release note: None