
Service failover option on Kubernetes CRD #11312

Draft · wants to merge 2 commits into base: master
Conversation

@jjsiv commented Dec 1, 2024

What does this PR do?

This is a basic implementation of the service failover feature for the Kubernetes CRD provider.
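
To illustrate the intended shape, a failover TraefikService might look roughly like the sketch below. The `failover` block and its field names are only an assumption modelled on the file provider's failover service, not necessarily the exact schema added by this PR.

```yaml
# Hypothetical sketch only: the `failover` block and its fields are assumed,
# mirroring the file provider's failover service; they may differ from this PR.
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: app-failover
  namespace: default
spec:
  failover:
    service:
      name: main-svc      # primary Kubernetes Service
      port: 80
    fallback:
      name: backup-svc    # Service to fall back to when the primary is unhealthy
      port: 80
```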

Motivation

Currently, failover is only configurable on the file provider. I created an issue last year, and there has been some interest in having this option available on the Kubernetes provider as well: #9919.
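
For reference, failover on the file provider is configured roughly like this (a sketch based on the documented dynamic configuration; the health check on the main service is what triggers the switch to the fallback):

```yaml
# File provider (dynamic configuration) sketch of the existing failover service.
http:
  services:
    app:
      failover:
        service: main
        fallback: backup
    main:
      loadBalancer:
        healthCheck:          # marks "main" unhealthy so traffic moves to "backup"
          path: /health
          interval: 10s
          timeout: 3s
        servers:
          - url: "http://10.0.0.1:8080"
    backup:
      loadBalancer:
        servers:
          - url: "http://10.0.0.2:8080"
```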

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

jjsiv added 2 commits December 1, 2024 12:17
Signed-off-by: Jedrzej Kotkowski <jedrzejk143@gmail.com>
Signed-off-by: Jedrzej Kotkowski <jedrzejk143@gmail.com>
@jjsiv (Author) commented Dec 1, 2024

This is more of a proof-of-concept implementation (no tests or documentation are included yet) than anything else, and I'd mostly like to get some feedback on how we can proceed with this, if at all.

  1. This cannot replicate the file provider's failover functionality 1:1 without, I believe, making major and possibly breaking changes to how services are currently created in Traefik's Kubernetes provider. Without NativeLB enabled, failover will not work at all when no endpoints are available for the main service (for example, when the main service's pods cannot be created). It generally works with NativeLB enabled, although still with some limitations compared to the file provider (e.g. it won't work if the wrong port is specified for the main service).
  2. The implementation depends on health checks being available on services. Health checks were implemented in Support HealthCheck for ExternalName services #10467, although for ExternalName services only. This PR enables them for all other service types; perhaps that should be a separate PR instead? Are there any considerations to keep in mind when enabling it? A configuration sketch follows this list.
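
For illustration, the relevant options on an IngressRoute service look roughly like this. The healthCheck block on a non-ExternalName service here is an assumption, since today it applies to ExternalName services only; extending it is exactly what point 2 above refers to.

```yaml
# Sketch, not taken from this PR: nativeLB and healthCheck on an IngressRoute
# service. Today healthCheck is honoured for ExternalName services only; this
# PR would allow it on other service types as well.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app
  namespace: default
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: main-svc
          port: 80
          nativeLB: true       # target the Service's ClusterIP instead of pod IPs
          healthCheck:         # Traefik-level health check (assumed usage here)
            path: /healthz
            interval: 10s
            timeout: 3s
```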

Any input on this, or any other related feedback before I continue, would be appreciated.

@kevinpollet (Member) commented:

Hello @jjsiv and thanks for your contribution,

We have reviewed the pull request and would like to share our concerns. Kubernetes (k8s) already includes a built-in health-check mechanism that manages service endpoints by adding or removing them based on pod readiness. Given this functionality, we believe extending the health-check mechanism to all service types is unnecessary. The health-check feature was originally introduced specifically for ExternalName services, as Kubernetes does not provide a native health-check mechanism for them.

We would like to leverage Kubernetes health-check information to design the Failover service. At first glance, we have identified two potential approaches.

  1. Modify the dynamic configuration process: Incorporate health-check information into the configuration, allowing Traefik to automatically remove services that are down.
  2. Enhance the Kubernetes provider: Add intelligence to the provider to construct the service tree based on Kubernetes health-check status.

Personally, I would vote for the first option, as it ensures that the health-check intelligence is not duplicated in the Kubernetes CRD provider.
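
As a rough illustration of the first option (the exact shape below is an assumption, not an agreed design), the provider could emit something like the following, with the main service's server list derived from Kubernetes endpoint readiness:

```yaml
# Rough illustration of option 1 (assumed shape, not an agreed design):
# the main service's servers come from Ready Kubernetes endpoints, so an
# empty list means "down" and traffic shifts to the fallback.
http:
  services:
    app-failover:
      failover:
        service: default-main-svc-80
        fallback: default-backup-svc-80
    default-main-svc-80:
      loadBalancer:
        servers: []            # no Ready endpoints behind main-svc right now
    default-backup-svc-80:
      loadBalancer:
        servers:
          - url: "http://10.42.0.7:80"
```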

Does that make sense? WDYT?

@jjsiv (Author) commented Dec 3, 2024

Thanks for your feedback @kevinpollet. I agree, relying on built-in Kubernetes mechanisms would be ideal, and the first approach sounds good. I will see what I can come up with.

@kevinpollet (Member) commented:

Hello @jjsiv,

Have you made any progress on the subject?

If you want, we can move forward on it and push some commits to your pull request, or open a new one.
