-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AlertRoute for merging alertmanager configs for different teams #2927
Comments
I think this is a valid use case and one that we would want. Merging the Alertmanager routing tree from different configs is tricky but I think it can be done. If you want to write up a design for it that would be fantastic! Then we can discuss the design and ultimately we/you can get to implementing this! :) |
I know that Rancher does some level of merging of Alertmanager configs for it's monitoring solution, it could be a useful reference |
Do you have a link to that? |
I think most of that logic lives here |
I can't find config merging there (doesn't mean it doesn't exist but I couldn't find it). What you linked looks like a copy of the |
Sorry, let me keep digging - I haven't looked at the internals much, just the user end |
I'm new to writing k8s CRD - please correct me if I'm saying something that is absolutely wrong: the way I see design of it - we need 2 new k8s CRDs:
all other configs - global settings, inhibit_rules will stay as now in Alertmanager CRD - just to store it in the end in configmap(same as prometheus config) and not secret. Default route should be created in AlertRoute CRD but with some special name. |
I think that’s pretty close to what I would have designed as well. I think the templates are probably fine to inline into the receiver API, and the default route is something that I think I would require to be specified in the alertmanager if the “distributed config” Mode discussed here is chosen (maybe a reference to an object). The tricky part about all of this is how to merge the route configs so that the first matching route doesn’t prevent an alert to not go into a later route that it was actually meant for. |
I believe, issue with merging routes should be not that big - mostly people will use labels "team", "project", "app" for their specific alert route. At first, line in documentation "main routes are merged in random order" should do the job :) |
I don't think those types of organizations are really the ones that benefit most from this feature. Organizations that are that well structured could already with a static alertmanager config accomplish everything. It's those that have very different requirements, maybe multi tenant environments, that really need this, and we need to ensure that we don't make it easy to do the wrong things 🙂 . I think this is perfectly doable with carefully thinking about |
This is a dupe of #1528. Won't close this though as the discussion here is more active than on the other thread :) . The The way I see it the alertmanager object should have a config option that allows either enforcing "tenancy" based on the namespace label or configure them as user provided. In the "tenant aware" case we probably need to still be able to allow-list individual objects to not require to be namespaced for example for kubelet or node monitoring as they don't nominally belong to a namespace but rather infrastructure monitoring. |
Hello! I've started a design document: Feel free to comment there... |
Any progress on the design document @simonpasquier ? |
@spencergilbert I'd say the document is more or less. I need to find time to work on the implementation but given the global situation, it's hard to tell when it will happen... |
Is this suggesting that it's desirable to have one global state/configuration for alertmanager? I'm attempting to enable the teams I support to be autonomous and not to step on each others toes, so I feel strongly against this statement. I'm asking for clarification though, not to start an argument. (this statement added for clarity) When you say |
I was alluding to the fact that in a well structured environment with enforced practices/workflows, it's not even strictly necessary to have routes per team, as what Alertmanager can do out of the box can be perfectly sufficient to route to different slack channels or pagerduty receivers etc. That said, it may still be desirable for other reasons to decentralize.
That's practically what the Alertmanager CRD is about. The trust part doesn't really make a difference for this feature, I'm aware of both types existing as in multiple distinct "untrusted" entities sharing a cluster, as well as just split between teams, although the first being a more rare extreme. |
Closed by #3451 |
What is missing?
I believe, ability for each team/project to configure their alertmanager routes, grouping options and own receivers is a missing part in Prometheus operator.
Why do we need it?
K8s CRD( let's name it AlertRoute) together with ServiceMonitor and PrometheusRule will give teams full control over their monitoring and alerting.
Is it something you will consider merging if I contribute a PR for it?
The text was updated successfully, but these errors were encountered: