Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental gRPC transport plugin #16534

Merged
merged 19 commits into from
Dec 31, 2024
Merged

Conversation

finnegancarroll
Copy link
Contributor

@finnegancarroll finnegancarroll commented Oct 31, 2024

Description

Auxiliary transports

Provides a framework for implementing pluggable auxiliary transports through NetworkPlugin.
Auxiliary transports run in parallel to existing http & native transport servers.
None or many Auxiliary transports may be provided and can be configured with the following settings:

aux.transport.types enables all listed transports which are installed.
aux.transport.<type>.ports configures the port range for the specified transport.

In the absence of aux.transport.<type>.ports the auxiliary transport falls back to a default port range (9400-9500).
If aux.transport.types is set and the corresponding transport plugin is not installed an exception is thrown.

transport-grpc plugin

A gRPC server plugin is implemented as an initial auxiliary transport.
This gRPC server includes only default list and Health/Check services.

To enable the gRPC plugin include the following in run.gradle.

setting 'aux.transport.types', '[experimental-transport-grpc]'
setting 'aux.transport.experimental-transport-grpc.ports', '9400-9500' //optional

Then run OpenSearch with the plugin installed.

./gradlew run -PinstalledPlugins="['transport-grpc']"

grpcurl commands for querying default services.

grpcurl -plaintext localhost:9400 list
grpcurl -plaintext localhost:9400 grpc.health.v1.Health/Check

Related Issues

RFCs detailing generation of protobuf files from api-specification - 1 2.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for b5fddb1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 24a40ad: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll mentioned this pull request Nov 4, 2024
1 task
Copy link
Contributor

github-actions bot commented Nov 7, 2024

❌ Gradle check result for 3bcd5bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Nov 7, 2024

❌ Gradle check result for b2a1078: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Nov 8, 2024

❌ Gradle check result for ee85066: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for eda1c04: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 5038cf3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 1e5d354: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 00d4eb1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 6151622: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for b8d88d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for b404cee: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 41e70eb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for e985043: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for bbabecd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 853b206: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for da16c66: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll changed the base branch from main to feature/grpc November 23, 2024 00:17
@finnegancarroll finnegancarroll marked this pull request as ready for review November 23, 2024 00:24
Copy link
Contributor

❌ Gradle check result for ecd998e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Finn Carroll <carrofin@amazon.com>
Copy link
Contributor

✅ Gradle check result for 6fc1453: SUCCESS

@andrross andrross merged commit 7a0e8fb into opensearch-project:main Dec 31, 2024
39 of 40 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16534-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7a0e8fbed7116173e11f52236c347affd4e22a16
# Push it to GitHub
git push --set-upstream origin backport/backport-16534-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-16534-to-2.x.

@andrross
Copy link
Member

@finnegancarroll Can you create a manual backport for this?

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this pull request Jan 6, 2025
Introduce auxiliary transport to NetworkPlugin and add gRPC plugin.

Auxiliary transports are optional lifecycle components provided by
network plugins which run in parallel to the http server/native
transport. They are distinct from the existing NetworkPlugin
interfaces of 'getTransports' and 'getHttpTransports' as auxiliary
transports are optional. Each AuxTransport implements it's own
'aux.transport.type' and 'aux.transport.<type>.ports' setting. Since
Security.java initializes previous to Node.java during bootstrap
socket binding permissions are granted based on
'aux.transport.<type>.ports' for each enabled 'aux.transport.type',
falling back to a default if no ports are specified.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
(cherry picked from commit 7a0e8fb)
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this pull request Jan 6, 2025
Introduce auxiliary transport to NetworkPlugin and add gRPC plugin.

Auxiliary transports are optional lifecycle components provided by
network plugins which run in parallel to the http server/native
transport. They are distinct from the existing NetworkPlugin
interfaces of 'getTransports' and 'getHttpTransports' as auxiliary
transports are optional. Each AuxTransport implements it's own
'aux.transport.type' and 'aux.transport.<type>.ports' setting. Since
Security.java initializes previous to Node.java during bootstrap
socket binding permissions are granted based on
'aux.transport.<type>.ports' for each enabled 'aux.transport.type',
falling back to a default if no ports are specified.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
(cherry picked from commit 7a0e8fb)
@reta reta mentioned this pull request Jan 6, 2025
1 task
andrross pushed a commit that referenced this pull request Jan 7, 2025
Introduce auxiliary transport to NetworkPlugin and add gRPC plugin.

Auxiliary transports are optional lifecycle components provided by
network plugins which run in parallel to the http server/native
transport. They are distinct from the existing NetworkPlugin
interfaces of 'getTransports' and 'getHttpTransports' as auxiliary
transports are optional. Each AuxTransport implements it's own
'aux.transport.type' and 'aux.transport.<type>.ports' setting. Since
Security.java initializes previous to Node.java during bootstrap
socket binding permissions are granted based on
'aux.transport.<type>.ports' for each enabled 'aux.transport.type',
falling back to a default if no ports are specified.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
(cherry picked from commit 7a0e8fb)
@rishabhmaurya
Copy link
Contributor

@finnegancarroll how aux transport publish address is known to other nodes of the cluster to create a client?

bugmakerrrrrr pushed a commit to bugmakerrrrrr/OpenSearch that referenced this pull request Jan 8, 2025
Introduce auxiliary transport to NetworkPlugin and add gRPC plugin.

Auxiliary transports are optional lifecycle components provided by
network plugins which run in parallel to the http server/native
transport. They are distinct from the existing NetworkPlugin
interfaces of 'getTransports' and 'getHttpTransports' as auxiliary
transports are optional. Each AuxTransport implements it's own
'aux.transport.type' and 'aux.transport.<type>.ports' setting. Since
Security.java initializes previous to Node.java during bootstrap
socket binding permissions are granted based on
'aux.transport.<type>.ports' for each enabled 'aux.transport.type',
falling back to a default if no ports are specified.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
@finnegancarroll
Copy link
Contributor Author

@finnegancarroll how aux transport publish address is known to other nodes of the cluster to create a client?

Currently there is no endpoint for discovering published ports of an aux transport as we expect aux transports to implement this endpoint themselves if applicable to their specific client/server transport. The gRPC plugin will have an equivalent to _cat/nodes when service definitions are added to the plugin.

@rishabhmaurya
Copy link
Contributor

@finnegancarroll @reta I was wondering if we should maintain the mapping of aux transport type and port used by it in DiscoverNode?

@reta
Copy link
Collaborator

reta commented Jan 8, 2025

@finnegancarroll @reta I was wondering if we should maintain the mapping of aux transport type and port used by it in DiscoverNode?

@rishabhmaurya are we replacing the transport, like default TCP/Netty4 -> TCP/Apache Arrow Flight? If yes, than the transport port would be part of the DiscoverNode. Otherwise we probably should not maintain any mappings in DiscoverNode I think (this is not essential to the cluster management).

@rishabhmaurya
Copy link
Contributor

@reta I get your point to not make DiscoveryNode and eventually cluster state bulky and it is also not relevant to cluster formation. But the downside being, the plugin dependent on aux transport, will need an additional call to get published port info when nodes are added/removed. This is also an overhead on the cluster as cluster manager node needs to get this information from each of the node so internally num of nodes times API calls to fetch this info from each node.

@reta
Copy link
Collaborator

reta commented Jan 8, 2025

But the downside being, the plugin dependent on aux transport, will need an additional call to get published port info when nodes are added/removed.

@rishabhmaurya correct, this is how our sniffer for HTTP works:

  • each node publishes HTTP transport info
  • the clients (other plugins) may periodically pull the nodes

What I am trying to understand is:

  • do we have this aux transport to be installed on every node in the cluster?
  • is it supposed to be used for node-to-node as transport? or both?
  • if yes, is it a replacement for transport?
  • if no, is it an additional transport, not something auxiliary but necessary for cluster to function properly?

Could you please help me here by clarifying these questions? That would help to find the right place for it, thank you.

@rishabhmaurya
Copy link
Contributor

thanks @reta

do we have this aux transport to be installed on every node in the cluster?

Yes, it will be installed on all non-cluster manager nodes.

is it supposed to be used for node-to-node as transport? or both?

Initially only for node-to-node transport and eventually for both.

if yes, is it a replacement for transport?

No. It probably will always be auxiliary in nature.

if no, is it an additional transport, not something auxiliary but necessary for cluster to function properly?

The cluster formation and functionality will not be affected with this transport. It will be a replacement for some of the search APIs to begin with.

@reta
Copy link
Collaborator

reta commented Jan 8, 2025

Thank you @rishabhmaurya , I honestly believe that we could start with sniffer like approach (from your explanation, it does not seem to be critical to immediately reflect topology changes with nodes joining / leaving the cluster and should not happen often). We could reconsider this later on or combine some operations with cluster discovery (when nodes leaves the cluster, just remove the aux transport entry for this node as well).

@rishabhmaurya
Copy link
Contributor

sounds reasonable thanks @reta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants