-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add namespace to unseal error metric #463
Merged
mkmik
merged 1 commit into
bitnami-labs:master
from
povilasv:add-namespace-to-error-metrics
Oct 26, 2020
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you have to remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
becaue we can't populate it any more, as the label combination is status abd namespace, and you can't init labels with just reason field, you need both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why it had to be initialized in the first place.
@kskewes I see you added that in #375; is it really necessary these counters are set to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The backstory is here: https://www.robustperception.io/existential-issues-with-metrics
How it relates:
Sealed Secrets Controller behaviour in an error case as I understand it.
Prometheus behaviour:
/metrics
endpoint on retry interval (10s or 60s or as configured)These scrapes can happen at any point after the controller has started up.
Situation:
So we do this manually.
absent()
function in Prometheus but there are a lot of edge cases we have tried to handle in our other applications with varying degrees of success.Options (in no specific order):
NaN
to5
within the scrape period interval due to fast retry loop and Prometheus won't catch it. The second increase (or unseal error) will alert via the existing alert.absent()
metrics withoffset
etc per the above link.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other options:
List+watch for namespaces via k8s api and initialize metrics based in existing namespaces.
Advise users on how to craft alerts correctly in case of missing values (using
or
and other trickeries as Brian hinted at in the linked doc)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds reasonable; thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. The retry interval must have changed since I last tested. :)
With 10s scrape interval provided we have retries over at least 2 it might be ok.
It would be good to update the alert unit test in tests.yaml and the alert itself in jsonnet.
Would you like to do in separate PR? Or I can once we pull next release and can validate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Karl, I don't have bandwidth to take care of this, it would be really great if I could delegate that to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Happy to pickup after current tasks. :)
Can validate retry versus scrape intervals in the unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we initialize at least "fetch" (which doesn't have a namespace) so the counter exists. See #543