Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AlertmanagerConfig CRD #3451

Merged
merged 3 commits into from
Oct 14, 2020

Conversation

grdryn
Copy link
Contributor

@grdryn grdryn commented Aug 27, 2020

This builds upon @brancz's work in #3265 and will hopefully eventually implement the core of the proposal at https://docs.google.com/document/d/1aVVttvocop8zNezwrNFbHh_HKRW_fhJUZqhmrr8fGK0/edit?usp=sharing

Closes #3261, #2766, #2927, #1528.

@grdryn grdryn requested review from paulfantom and a team as code owners August 27, 2020 17:11
@s-urbaniak
Copy link
Contributor

@grdryn Is this ready for final review? The build failing and I think we should add some more e2e tests here too. Let us know if you need further assistance :)

@grdryn
Copy link
Contributor Author

grdryn commented Aug 31, 2020

@s-urbaniak no, this is not ready for review yet. There's still a lot to do! 🙂 I think GH added reviewers automatically.

I will indeed be in touch if/when I need help, thanks! 🙂

@grdryn grdryn force-pushed the alertmanagerconfig-crd branch 2 times, most recently from 25d6434 to 893c47c Compare September 7, 2020 18:30
Copy link
Contributor Author

@grdryn grdryn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonpasquier @s-urbaniak @brancz This currently "works" for the CRD type that exists so far (i.e. only the PagerDuty receiver type is implemented, but others should be easy to add after). Probably some tests need to be fixed up & some more added, I'll look in that direction next.

Could you please take a look when you have some time, to see if what I've done so far is generally in the right direction, or if there are things you'd like changed? It's very much inspired by how the prometheus controller handles ServiceMonitor (etc) CRs, and in many cases blatently plagiarised. At least that should make it somewhat familiar! 🙂

pkg/alertmanager/operator.go Outdated Show resolved Hide resolved
pkg/alertmanager/operator.go Show resolved Hide resolved
pkg/alertmanager/operator.go Outdated Show resolved Hide resolved
pkg/alertmanager/operator.go Outdated Show resolved Hide resolved
pkg/apis/monitoring/v1/alertmanager_config_types.go Outdated Show resolved Hide resolved
pkg/alertmanager/amcfg.go Show resolved Hide resolved
@grdryn
Copy link
Contributor Author

grdryn commented Sep 9, 2020

Existing tests appear to work now! 🎉 However, the prometheus e2e tests are failing in travis with:

The job exceeded the maximum time limit for jobs, and has been terminated.

Is that a known issue, or is it caused by the changes here? I don't think these changes should affect those prometheus controller e2e tests that much 🤷‍♂️

scripts/generate-crds.go Outdated Show resolved Hide resolved
@grdryn grdryn force-pushed the alertmanagerconfig-crd branch 2 times, most recently from ea2de4e to 3907ed0 Compare September 11, 2020 16:42
@grdryn grdryn force-pushed the alertmanagerconfig-crd branch 2 times, most recently from afd1050 to 3082d0d Compare September 15, 2020 13:59
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for doing this! I've made a first limited review. I'll add more later.

pkg/alertmanager/amcfg.go Show resolved Hide resolved
pkg/alertmanager/amcfg.go Show resolved Hide resolved
pkg/alertmanager/amcfg.go Show resolved Hide resolved
pkg/alertmanager/operator.go Show resolved Hide resolved
pkg/alertmanager/operator.go Outdated Show resolved Hide resolved
pkg/alertmanager/operator.go Outdated Show resolved Hide resolved
@grdryn
Copy link
Contributor Author

grdryn commented Sep 22, 2020

I'm going to remove the WIP prefix for this PR now, as I think it's close to a state that could be merged. Only the PagerDuty receiver has been added so far. I can add the others here, or I can do them in follow-up changes if you'd prefer.

@grdryn grdryn changed the title WIP: Add AlertmanagerConfig CRD Add AlertmanagerConfig CRD Sep 22, 2020
@lilic
Copy link
Contributor

lilic commented Sep 23, 2020

Only the PagerDuty receiver has been added so far. I can add the others here, or I can do them in follow-up changes if you'd prefer.

I am fine in a follow-up as long as you can open the issues for maybe each one would be a good one for outside contributors to take, but up to @simonpasquier and @brancz on that one.

Copy link
Contributor

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on the PR! Thanks so much! Couple of comments, leaving up to Frederic and Simon to have a final look, no blockers on my side.

pkg/alertmanager/amcfg.go Outdated Show resolved Hide resolved
}
}

// TODO: Do we need to enque secrets just for the namespace or in general?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO still valid before we merge the PR?

Copy link
Contributor Author

@grdryn grdryn Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plagiarized this TODO when I plagiarised this function from here 🙈

// TODO: Do we need to enque secrets just for the namespace or in general?

I could be wrong, but here's how I see it:

  • We should enqueue reconciliation for an Alertmanager if any of the Secrets referenced in its spec are deleted (it has a secrets field which is a list, configSecret field, containers or initContainers fields may reference a Secret to mount, etc). These all have to be in the same namespace as the Alertmanager. This function, enqueuing all Alertmanagers in the namespace, covers these cases, I believe.

  • We may want to also enqueue reconciliation for an Alertmanager, if there's an AlertmanagerConfig selected by it that references a Secret that's deleted (upon reconciliation, if the Secret is missing for an AlertmanagerConfig, then that AlertmanagerConfig will be excluded when building the config file). If that's what we want, then I think it would either mean enqueuing all Alertmanagers in all namespaces, for any Secret changes in any AlertmanagerConfig namespaces; or determining if an Alertmanager should be reconciled by checking if the Secret is referenced in either the Alertmanager spec, or in any AlertmanagerConfigs that it selects.

I'm starting to confuse myself. In any case, if we want the second option in addition to the first, I'm not quite sure how to achieve that here. 🤔 Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to dig deeper into this.

pkg/apis/monitoring/v1beta1/doc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dgrisonnet dgrisonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just a few comments, but it looks really good.

pkg/alertmanager/amcfg_test.go Show resolved Hide resolved
pkg/alertmanager/asset_store_test.go Show resolved Hide resolved
pkg/alertmanager/asset_store.go Show resolved Hide resolved
test/e2e/alertmanager_test.go Outdated Show resolved Hide resolved
@brancz
Copy link
Contributor

brancz commented Sep 23, 2020

I won't get to doing a deep dive on this soon, so it's better for @simonpasquier to do so. Two general comments:

  • Yes I think starting with just one provider is fine, the rest is pretty mechanical and can be done in follow up PRs.
  • Let's make sure we start with a v1alpha1 API :)

@grdryn grdryn force-pushed the alertmanagerconfig-crd branch from fd9e294 to 3c47587 Compare October 8, 2020 16:16
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments for the Alertmanager controller. Otherwise this is looking really good. Given the amount of the code changes + the fact that the resource is v1alpha1, 👍 for merging this after addressing my last remarks.

}
}

func (c *Operator) enqueueForMonitorNamespace(nsName string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the intermediate enqueueForMonitorNamespace() function. We can have only func (c *Operator) enqueueForNamespace(nsName string) which would use the c.nsAlrtCfgInf's store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've done that now. I have to admit to being less familiar with this part of the changes, I think it comes from branczs earlier WIP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand, you've done an amazing work knowing that you didn't know extensively the code base before :)

}
}

// TODO: Do we need to enque secrets just for the namespace or in general?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to dig deeper into this.

pkg/alertmanager/operator.go Show resolved Hide resolved
pkg/alertmanager/amcfg.go Outdated Show resolved Hide resolved
pkg/alertmanager/operator.go Show resolved Hide resolved
@grdryn grdryn force-pushed the alertmanagerconfig-crd branch from a25808d to a9ba233 Compare October 12, 2020 10:53
@simonpasquier
Copy link
Contributor

👍 for me. Lets give other @prometheus-operator/prometheus-operator-reviewers a chance to review it :)

Copy link
Contributor

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

Thanks again!!

@lilic
Copy link
Contributor

lilic commented Oct 13, 2020

The only thing I would ask if the commits can be squashed, just to make it easier in case we need to revert something. 25 seems a bit too much even for this much changes. Do you mind doing that @grdryn and then we can merge 🎉

@simonpasquier
Copy link
Contributor

We could also Squash and merge with the GitHub UI.

@lilic
Copy link
Contributor

lilic commented Oct 13, 2020

Personally, I like to have generated (API in this example) and actual code separated, but not a blocker for me, happy to do it via github UI as well.

@simonpasquier
Copy link
Contributor

Leaving the final call to @grdryn if he's willing to squash the commits and split the generated code in a separate commit :)

brancz and others added 3 commits October 13, 2020 21:18
Command:

make --always-make generate \
 pkg/client/versioned/clientset.go \
 pkg/client/listers/monitoring/v1/prometheus.go \
 pkg/client/listers/monitoring/v1alpha1/prometheus.go \
 pkg/client/informers/externalversions/monitoring/v1/prometheus.go \
 pkg/client/informers/externalversions/monitoring/v1alpha1/prometheus.go
@grdryn grdryn force-pushed the alertmanagerconfig-crd branch from a9ba233 to 238b53e Compare October 13, 2020 20:19
@grdryn
Copy link
Contributor Author

grdryn commented Oct 13, 2020

I've squashed to 3 commits: brancz's original work, then the rest is the split between manual code changes and generated files. Is that the split you're looking for?

@lilic
Copy link
Contributor

lilic commented Oct 14, 2020

Thanks sounds good!

@lilic lilic merged commit e1bea09 into prometheus-operator:master Oct 14, 2020
@lilic
Copy link
Contributor

lilic commented Oct 14, 2020

Thanks so much again for the amazing work and patience! 🎉

@moshloop
Copy link

Thank you to everyone involved in getting this merged 🎉🙌

@keren-benzion
Copy link

keren-benzion commented Apr 25, 2022

Hi,
I think this can come in handy in my organisation. Do you have maybe a document which explains how to implement this? to make it easier for me to deploy it

thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] AlertmanagerRule CRD
9 participants