Enhance Local service affinity to reduce service to service network calls. #129361

Open
Bala2211 opened this issue Dec 22, 2024 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@Bala2211

What would you like to be added?

This is a suggestion to enhance Kubernetes by introducing a process that prioritizes local service communication when endpoints of the target service exist on the same node. It addresses one of the key efficiency challenges in service-to-service communication. Let's unpack this idea and its implications.


Key Idea

Introduce a mechanism in Kubernetes' service proxy (e.g., kube-proxy) or in a sidecar/service mesh that:

  1. Looks up local endpoints of the target service on the same node.
  2. Routes traffic locally (e.g., via shared memory, IPC, or direct communication) if an endpoint resides on the same node.
  3. Falls back to network calls only when no local endpoint is available.
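The selection logic in steps 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not kube-proxy's actual code: the `Endpoint` struct and the `preferred_endpoints` helper are hypothetical, and in a real cluster each endpoint's node name and readiness would come from the `nodeName` and `conditions.ready` fields of an EndpointSlice. The local transport itself (shared memory, IPC) is not modelled here.

```ruby
# Hypothetical endpoint record; not a real Kubernetes client type.
Endpoint = Struct.new(:address, :node_name, :ready, keyword_init: true)

# 1. Keep only ready endpoints of the target service.
# 2. Prefer the ones running on this node.
# 3. Fall back to every ready endpoint when none are local.
def preferred_endpoints(endpoints, local_node)
  ready = endpoints.select(&:ready)
  local = ready.select { |ep| ep.node_name == local_node }
  local.empty? ? ready : local
end

eps = [
  Endpoint.new(address: "10.0.1.5", node_name: "node-a", ready: true),
  Endpoint.new(address: "10.0.2.7", node_name: "node-b", ready: true)
]
preferred_endpoints(eps, "node-a").map(&:address) # => ["10.0.1.5"]
preferred_endpoints(eps, "node-c").map(&:address) # => ["10.0.1.5", "10.0.2.7"]
```

Returning the whole ready set on fallback keeps normal cluster-wide load balancing intact whenever no local endpoint exists.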

Benefits of the Proposed Design

  1. Reduced Latency
    Local communication (via loopback or IPC) is significantly faster than inter-node or even intra-node network communication.

  2. Lower Network Overhead
    By bypassing the network stack for local communication, the approach reduces cluster-wide bandwidth usage, alleviating congestion and improving performance for other applications.

  3. Cost Efficiency
    In cloud environments, reducing cross-zone or inter-node traffic can lower costs, as many providers charge for data egress between zones or regions.

  4. Improved Scalability
    With fewer network calls, clusters can handle larger workloads without hitting network bandwidth or performance bottlenecks.

  5. Seamless Integration
    If designed well, the change would be transparent to applications, preserving Kubernetes' abstraction of services while optimizing performance.


Challenges and Considerations

  1. Service Discovery Enhancements

    • The process needs to be aware of local service endpoints in real-time, potentially requiring integration with Kubernetes' Endpoints API or similar mechanisms.
    • Any lag in updates could result in stale routing decisions.
  2. Proxy Modification

    • Kube-proxy would need modifications to check for local endpoints and route calls accordingly. Alternatively, a custom sidecar or service mesh could implement this logic.
  3. State Consistency

    • Handling cases where a local endpoint becomes unavailable during a request is critical to avoid failed requests or unnecessary retries.
  4. Cross-Node Communication Scenarios

    • Some use cases explicitly require inter-node communication (e.g., when data locality or availability constraints exist). The design must ensure it doesn’t interfere with such requirements.
  5. Shared Memory or IPC Implementation

    • To bypass the network stack, efficient local communication mechanisms (e.g., shared memory or Unix domain sockets) must be introduced.
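For the state-consistency point, one possible approach is to try local endpoints first and move on to remote ones only when a local endpoint turns out to be gone. The sketch below assumes a hypothetical `EndpointUnavailable` error raised by the transport; it is not an existing Kubernetes or service-mesh API. Retrying like this is only safe when the request is idempotent or the failure happens before anything is sent.

```ruby
# Hypothetical error a transport would raise when an endpoint has disappeared.
class EndpointUnavailable < StandardError; end

# Tries local endpoints first, then remote ones, and returns the first
# successful result. Only EndpointUnavailable triggers a fallback; any other
# error propagates to the caller unchanged.
def call_with_fallback(local, remote)
  last_error = nil
  (local + remote).each do |endpoint|
    begin
      return yield(endpoint)
    rescue EndpointUnavailable => e
      last_error = e
    end
  end
  raise last_error || EndpointUnavailable.new("no endpoints available")
end

# The local endpoint vanished mid-request, so the remote one answers.
result = call_with_fallback(["10.0.1.5"], ["10.0.2.7"]) do |ep|
  raise EndpointUnavailable, "#{ep} went away" if ep == "10.0.1.5"
  "response from #{ep}"
end
result # => "response from 10.0.2.7"
```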

Minimal Design Change Example

  1. Enhance kube-proxy or Service Mesh
    Modify kube-proxy to:

    • Query the Kubernetes Endpoints API for available endpoints of the target service.
    • Prioritize endpoints on the same node.

    Example workflow:

    • Request arrives at kube-proxy.
    • Kube-proxy looks up endpoints for the service.
    • If a local endpoint exists, kube-proxy routes the request locally.
    • Otherwise, the request follows the standard networking path.
  2. Optional Local Library Layer
    Introduce a lightweight library or sidecar for service-to-service communication:

    • Services communicate with the library/sidecar instead of directly invoking the service proxy.
    • The library decides whether to route traffic locally or via the network.
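The decision made by the optional library layer might look like the sketch below. Everything here is an assumption for illustration: the `Target` struct, the `transport_for` and `open_connection` helpers, and especially the convention that a service instance on the same node also listens on a Unix domain socket under `/var/run/local-services/`. Kubernetes defines no such convention today.

```ruby
require 'socket'

# Hypothetical convention: node-local instances expose <service>.sock here.
SOCKET_DIR = "/var/run/local-services"

# Hypothetical description of a resolved service endpoint.
Target = Struct.new(:service, :address, :port, :node_name, keyword_init: true)

# Chooses a Unix domain socket for same-node targets and plain TCP otherwise.
def transport_for(target, local_node)
  if target.node_name == local_node
    [:unix, File.join(SOCKET_DIR, "#{target.service}.sock")]
  else
    [:tcp, target.address, target.port]
  end
end

# Opens the chosen connection using Ruby's standard socket classes.
def open_connection(target, local_node)
  kind, *args = transport_for(target, local_node)
  kind == :unix ? UNIXSocket.new(*args) : TCPSocket.new(*args)
end

cart = Target.new(service: "cart", address: "10.0.1.5", port: 8080, node_name: "node-a")
transport_for(cart, "node-a") # => [:unix, "/var/run/local-services/cart.sock"]
transport_for(cart, "node-b") # => [:tcp, "10.0.1.5", 8080]
```

Keeping the decision in a pure function (`transport_for`) separate from connection setup makes the routing rule easy to test without a running cluster.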

Long-Term Improvements

  1. Topology-Aware Improvements
    Kubernetes already has topology-aware hints (alpha in Kubernetes 1.21, beta in 1.23), which prioritize routing to endpoints in the same zone. This idea could extend them to the node level, enforcing strictly local communication whenever a local endpoint exists.

  2. Dynamic Endpoint Prioritization
    Extend Kubernetes' native load balancing to dynamically prioritize local endpoints while maintaining failover capabilities for cross-node communication.

  3. Integration with Service Mesh
    Service meshes like Istio or Linkerd could adopt this logic, ensuring optimized routing at the application layer without changing Kubernetes' core.
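Dynamic prioritization with failover can be sketched as a sequence of tiers: same node, then same zone (roughly what topology-aware hints already target), then the whole cluster. The `Candidate` struct and `closest_tier` helper are hypothetical; a real implementation would also need to account for endpoint health and load so that a single local endpoint is not overloaded.

```ruby
# Hypothetical endpoint record carrying its node and zone.
Candidate = Struct.new(:address, :node, :zone, keyword_init: true)

# Returns the closest non-empty tier: same node, then same zone, then all.
# Failover is implicit, because an empty tier is simply skipped.
def closest_tier(candidates, node:, zone:)
  tiers = [
    candidates.select { |c| c.node == node },
    candidates.select { |c| c.zone == zone },
    candidates
  ]
  tiers.find { |tier| !tier.empty? } || []
end

cands = [
  Candidate.new(address: "10.0.1.5", node: "node-a", zone: "zone-1"),
  Candidate.new(address: "10.0.2.7", node: "node-b", zone: "zone-1"),
  Candidate.new(address: "10.1.3.4", node: "node-c", zone: "zone-2")
]
closest_tier(cands, node: "node-a", zone: "zone-1").map(&:address) # => ["10.0.1.5"]
closest_tier(cands, node: "node-x", zone: "zone-1").map(&:address) # => ["10.0.1.5", "10.0.2.7"]
```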


Why is this needed?

Introducing local endpoint prioritization is a promising optimization that aligns well with Kubernetes' goal of efficient, scalable service orchestration. While requiring some design changes, the benefits in reduced latency, cost, and network usage make it worth exploring, particularly for workloads with high intra-node communication.

@Bala2211 Bala2211 added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 22, 2024
@k8s-ci-robot
Contributor

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 22, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 22, 2024
@sftim
Contributor

sftim commented Dec 23, 2024

This sounds like topology-aware routing with internalTrafficPolicy @Bala2211

Can you explain how what you're suggesting is different?
/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Dec 23, 2024
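To make the difference sftim asks about concrete, here is a minimal model of which endpoints a node's proxy may use. With `internalTrafficPolicy: Cluster` (the default) every endpoint is eligible; with `Local`, kube-proxy only uses endpoints on the same node and drops the traffic when there are none. The `:prefer_local` branch models the local-first-with-fallback behaviour this issue appears to request; it is a hypothetical label, not a value of `internalTrafficPolicy`, and the function and data shapes are illustrative assumptions.

```ruby
# endpoints_by_node: hypothetical map of node name => endpoint addresses.
def eligible_endpoints(endpoints_by_node, local_node, policy)
  all = endpoints_by_node.values.flatten
  local = endpoints_by_node.fetch(local_node, [])
  case policy
  when :cluster then all                               # internalTrafficPolicy: Cluster
  when :local then local                               # internalTrafficPolicy: Local; empty => dropped
  when :prefer_local then local.empty? ? all : local   # behaviour proposed in this issue
  else raise ArgumentError, "unknown policy #{policy}"
  end
end

eps = { "node-a" => ["10.0.1.5"], "node-b" => ["10.0.2.7"] }
eligible_endpoints(eps, "node-c", :local)        # => []
eligible_endpoints(eps, "node-c", :prefer_local) # => ["10.0.1.5", "10.0.2.7"]
```

The only behavioural gap between `Local` and the proposal is what happens on a node with no local endpoint.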
@bonsucosei

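require 'uri'
require 'net/http'

# Request URL; service, owner_username and repo_name appear to be placeholders.
url = URI("https://api.codecov.io/api/v2/service/owner_username/repos/repo_name/test-results/")

# Open an HTTPS connection to the API host.
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

# Build a GET request that asks for a JSON response.
request = Net::HTTP::Get.new(url)
request["accept"] = 'application/json'

# Send the request and print the raw response body.
response = http.request(request)
puts response.read_body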

Projects
None yet
Development

No branches or pull requests

4 participants