-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROX-15088: Use integration health report in declarative config manager #5105
ROX-15088: Use integration health report in declarative config manager #5105
Conversation
Current dependencies on/for this PR: This comment was auto-generated by Graphite. |
Images are ready for the commit at 485047c. To use with deploy scripts, first |
80656e3
to
1abf2f1
Compare
1abf2f1
to
67c340c
Compare
/retest |
for _, message := range messages { | ||
if err := typeUpdater.Upsert(m.reconciliationCtx, message); err != nil { | ||
m.reconciliationErrorReporter.ProcessError(message, err) | ||
for handler, protoMessagesByType := range transformedMessagesByHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm eager to keep global order of proto types to avoid extra errors and displaying unhealthy configs when it's not needed. How about we use a structure like map[protoType]map[handlerID][]proto.Message
?
P.S. Another benefit of this structure is that it will allow to easily flag all handlers which contain unsupported type as unhealthy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, the global order is kept right now, no?
We go through each proto type, and for each proto type we consolidate all handlers, retrieving only the proto messages for this specific type. Once done, we continue with the next proto type. The global order is kept (also, ensured within the tests via gomock.InOrder
that the order is actually kept).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry - didn't notice the outer loop. No questions to this part
} | ||
|
||
func extractNameFromProtoMessage(message proto.Message) string { | ||
messageWithName, ok := message.(interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternative for groups that comes to mind is to create a fake name based on group properties - something like provider_id:key:value:roleName
- WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we are back to good old serialized key for groups 🥴
I'm not sure how much it's better, given that the auth provider ID will be non-telling for customers to identify the auth provider (imagine you have two groups sharing the same attributes / role name, but for different auth providers). Guess the same applies right now with the group ID, so I changed it to be the tuple of these values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we are back to good old serialized key for groups 🥴
“The more you know about the past, the better prepared you are for the future.” —Theodore Roosevelt
given that the auth provider ID will be non-telling for customers to identify the auth provider
I guess you're right, they should know what they are looking at. What do you think about terrifying
auth_provider_id:<id>:key:<key>:value:<value>:role_name:<roleName>
format? A more pleasant alternative would be group <key>:<value>:<roleName> for auth provider ID <authProviderID>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm fine doing it either way, since we can be flexible with the name.
The root issue we have is that the group has no name and the auth provider is in ID-format instead of name, so looking at the declarative config it may be non-trivial identifying the groups.
Updated the names to the group <key>:<value>:<roleName> for auth provider ID <authProviderID>
format.
Name: "Config Map my-cool-config-map", | ||
Type: storage.IntegrationHealth_DECLARATIVE_CONFIG, | ||
Status: storage.IntegrationHealth_UNHEALTHY, | ||
ErrorMessage: "unmarshalling raw configuration: raw configuration \n{\"cool\": \"key\", \"value\": \"pairs\"}\n didn't match any of the given configurations: 4 errors occurred:\n\t* yaml: unmarshal errors:\n line 2: field cool not found in type declarativeconfig.AuthProvider\n line 2: field value not found in type declarativeconfig.AuthProvider\n\t* yaml: unmarshal errors:\n line 2: field cool not found in type declarativeconfig.AccessScope\n line 2: field value not found in type declarativeconfig.AccessScope\n\t* yaml: unmarshal errors:\n line 2: field cool not found in type declarativeconfig.PermissionSet\n line 2: field value not found in type declarativeconfig.PermissionSet\n\t* yaml: unmarshal errors:\n line 2: field cool not found in type declarativeconfig.Role\n line 2: field value not found in type declarativeconfig.Role\n\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can specify matcher to check not exact error message here, but rather that it contains an unmarshalling error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could, but I don't want to make the gomock.Matcher more complex than it needs to be. I'd just keep it for now, maybe later we can change it if another long-ish error exists.
aaa15f1
to
d975301
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's resolve conversation around group name and merge this PR 👍
errorsPerDeclarativeConfig: map[string]int32{}, | ||
idExtractor: idExtractor, | ||
nameExtractor: nameExtractor, | ||
shortCircuitSignal: concurrency.NewSignal(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it necessary to initialize signals? Don't think we did that before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the description:
the release tests for ^^ didn't fail because the signals (both stopSignal & shortCircuitSignal) were not initialized via concurrency.NewSignal(), hence stopSignal.IsDone() always yielded true
Currently, the reconciliation didn't work because we always stopped the reconciliation loop.
for _, message := range messages { | ||
if err := typeUpdater.Upsert(m.reconciliationCtx, message); err != nil { | ||
m.reconciliationErrorReporter.ProcessError(message, err) | ||
for handler, protoMessagesByType := range transformedMessagesByHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry - didn't notice the outer loop. No questions to this part
} | ||
log.Debugf("Error within reconciliation for %+v: %v", message, err) | ||
} else { | ||
currentDeclarativeConfigErrors := m.errorsPerDeclarativeConfig[messageID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 3 lines can be simplified to
m.errorsPerDeclarativeConfig[messageID] = 0
} | ||
|
||
func extractNameFromProtoMessage(message proto.Message) string { | ||
messageWithName, ok := message.(interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we are back to good old serialized key for groups 🥴
“The more you know about the past, the better prepared you are for the future.” —Theodore Roosevelt
given that the auth provider ID will be non-telling for customers to identify the auth provider
I guess you're right, they should know what they are looking at. What do you think about terrifying
auth_provider_id:<id>:key:<key>:value:<value>:role_name:<roleName>
format? A more pleasant alternative would be group <key>:<value>:<roleName> for auth provider ID <authProviderID>
d975301
to
6a4a3ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small fixes, otherwise LGTM
// IDExtractor extracts the ID from proto messages. | ||
type IDExtractor func(m proto.Message) string | ||
|
||
// NameExtractor extracts the ID from proto messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NameExtractor extracts the ID from proto messages. | |
// NameExtractor extracts the name from proto messages. |
Description
This PR is a follow up for pull/5048, in which the API of the
/v1/integrationhealth
service was extended to cover declarative configurations.Within this PR, the
integrationhealth.Reporter
is now used within thedeclarativeconfig.Manager
to report the health status of resources created from declarative configuration as well as the overall health of the config maps.For now, the following is registered and updated:
Notes:
Caveats within the PR:
handleMissingTypeUpdater
calledstopSignal.Done()
, while it should have calledstopSignal.Signal()
(currently, not stop signal would be sent).stopSignal
&shortCircuitSignal
) were not initialized viaconcurrency.NewSignal()
, hencestopSignal.IsDone()
always yieldedtrue
Both things were fixed within this PR.
Checklist
If any of these don't apply, please comment below.
Testing Performed
Manual tests:
central-services
chartstackrox
namespace. Set theROX_DECLARATIVE_CONFIGURATION
feature flag to true, add declarative config mounts, and optionally set loglevel to debugUNINITIALIZED
.auth-provider
should be marked unhealthy, including the error