Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advance epoch #105469

Merged
merged 1 commit into from
Oct 11, 2021
Merged

Advance epoch #105469

merged 1 commit into from
Oct 11, 2021

Conversation

MikeSpreitzer
Copy link
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR enhances the queueset implementation in API Priority and Fairness to cope with the possibility of overflow of work accumulators.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

This is built on #105412 and will be rebased once that PR merges.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 4, 2021
@MikeSpreitzer
Copy link
Member Author

/sig api-machinery
/cc @wojtek-t
/cc @deads2k
/cc @tkashem
/cc @lavalamp

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. area/apiserver and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 4, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2021
@MikeSpreitzer MikeSpreitzer force-pushed the advance-epoch branch 2 times, most recently from 4f1bb71 to 6839bc3 Compare October 4, 2021 21:04
@MikeSpreitzer MikeSpreitzer force-pushed the advance-epoch branch 2 times, most recently from cb40446 to 3708369 Compare October 5, 2021 17:17
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 5, 2021
klog.ErrorS(errors.New("progress meter wrapped around"), "Wrap", "QS", qs.qCfg.Name, "prevR", prevR, "currentR", qs.currentR)
incrR := SeatsTimesDuration(qs.getVirtualTimeRatioLocked(), timeSinceLast)
if incrR > RDecrement/2 {
klog.ErrorS(errors.New("unusably large R increment"), "Can not do it", "QS", qs.qCfg.Name, "when", realNow.Format(nsTimeFmt), "incrR", incrR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this - it's not an actionable error really..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actionable by whom? If it happens, I would like to know because that means we have more work to do.

}
metrics.SetCurrentR(qs.qCfg.Name, qs.currentR.ToFloat())
}

// advanceEpoch subtracts RDecrement from the global progress meter R
// and all the readings that have been taked from that meter.
func (qs *queueSet) advanceEpoch() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an admin - I would really like to be able to monitor this.

I think we need a metric that shows
a) when advancing epoch happened
b) any errors asociated with that (in particular I'm seeing the underflow error below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am most concerned by the fact that you are reporting seeing an underflow. That is far beyond what I expect. Can you give more details and/or describe how to reproduce?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I wasn't clear - I didn't see underflow myself. I see that as error below.
But I would like to be able to monitor for that (i.e. have metrics - not just logs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the metrics.

t.Errorf("SeatSeonds(%d) formatted as %q rather than expected %q", uint64(testCase.ss), actualStr, testCase.expect)
actualStrLen := len(actualStr)
actualSuffix := actualStr[actualStrLen-2:]
if actualSuffix != "ss" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against these types of types.
The tests are supposed to be as simple as possible to avoid bugs in tests themselfes.

Given that this in particular is testing if the returned string looks as it should, what we should do is:

  • have an expectString field in the testCase
  • simply compare that with with what ss.String() returns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have a test that does not require editing when ssScale changes. It was helpful when I experimented with different settings for that. But since you insist, I have sorrowfully made the test use literal strings.

@@ -149,6 +150,14 @@ const MaxSeatSeconds = SeatSeconds(math.MaxUint64)
// MinSeatSeconds is the lowest representable value of SeatSeconds
const MinSeatSeconds = SeatSeconds(0)

// RDecrement is the amount by which the progress meter R is wound backwards
// when needed to avoid overflow.
const RDecrement = MaxSeatSeconds / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation detail - let's move these constants to queueset.go and make them private.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea.

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Oct 6, 2021
@MikeSpreitzer
Copy link
Member Author

The force-push to c4c3d89f0f8 makes the changes suggested in review above.

@MikeSpreitzer
Copy link
Member Author

The force-push to 5b3e05d07c4 adds a bit of info to the info log message in queueset::advanceEpoch.

@@ -207,8 +208,8 @@ var (
Help: "Number of requests currently pending in queues of the API Priority and Fairness system",
StabilityLevel: compbasemetrics.ALPHA,
},
[]string{priorityLevel, flowSchema},
)
[]string{priorityLevel, flowSchema})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing those?

[I personally prefer what we had previously]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They weren't all using the same style. I picked a style that some used and involved blank lines, which I think is a good separator when the things being separated are big.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed them all to separate with close paren.

Namespace: namespace,
Subsystem: subsystem,
Name: "epoch_advance_total",
Help: "Number of times the queueset's progress meter zero-point was advanced",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove "zero-point" from the description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make it something like "Number of times the queueset's progress meter was jumped bakwards".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@MikeSpreitzer
Copy link
Member Author

The force-push to 7abd330961e makes the changes discussed in the latest review.

@MikeSpreitzer
Copy link
Member Author

The force-push to 49cecbe715f makes one overlooked change.

Also add test for that situation.
@MikeSpreitzer
Copy link
Member Author

The force-push to a797fbd corrected the corresponding test.

@wojtek-t
Copy link
Member

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 9253fef into kubernetes:master Oct 11, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 11, 2021
@MikeSpreitzer MikeSpreitzer deleted the advance-epoch branch October 11, 2021 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants