server: gRPC server does not set NextProtocols #136367

Open
@tbg

Description

Describe the problem

While upgrading gRPC from v1.56.3 to v1.68.0, I noticed that post-bump CRDB nodes were unable to connect to pre-bump CRDB nodes. The nodes using gRPC at v1.68.0 would print:

W241128 13:28:08.615942 205 server/init.go:402 ⋮ [T1,Vsystem,n?] 25 outgoing join rpc to ‹127.0.0.1:29000› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property"›

This check is new in v1.68.0 (it is not present in v1.56.3). So the underlying problem likely predates the upgrade; it just has not caused any issues until now. The check was added to provide a better failure mode. Apparently every HTTP/2 server needs to support ALPN, but our HTTP/2 server is gRPC, and it is a bit of a mystery how we end up failing this check, especially given that everything works once we disable the check, which we can do via an env var.
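For illustration, here is a minimal, standalone Go sketch of the behavior the error message points at. It assumes the new check inspects the protocol selected during the TLS handshake, and that a server whose `tls.Config` leaves `NextProtos` empty never selects one. The helper names `selfSignedCert` and `negotiatedALPN` are hypothetical and are not taken from CRDB or grpc-go; the sketch uses only the standard `crypto/tls` package, not CRDB's actual server setup.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSignedCert builds a throwaway certificate for 127.0.0.1.
// Hypothetical helper, for illustration only.
func selfSignedCert() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "alpn-demo"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		IPAddresses:  []net.IP{net.ParseIP("127.0.0.1")},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// negotiatedALPN starts a TLS server that advertises serverProtos, connects
// a client that offers "h2", and returns the protocol the handshake selected.
// An empty result means no ALPN protocol was selected.
func negotiatedALPN(serverProtos []string) (string, error) {
	cert, err := selfSignedCert()
	if err != nil {
		return "", err
	}
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{
		Certificates: []tls.Certificate{cert},
		NextProtos:   serverProtos, // nil mimics a server that never sets NextProtos
	})
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// Accept a single connection and complete the server side of the handshake.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		_ = conn.(*tls.Conn).Handshake()
	}()

	conn, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{
		InsecureSkipVerify: true, // demo only: the certificate is self-signed
		NextProtos:         []string{"h2"},
	})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.ConnectionState().NegotiatedProtocol, nil
}

func main() {
	for _, protos := range [][]string{nil, {"h2"}} {
		got, err := negotiatedALPN(protos)
		fmt.Printf("server NextProtos=%v negotiated=%q err=%v\n", protos, got, err)
	}
}
```

If this assumption holds, the handshake itself succeeds in both cases, which would be consistent with everything working once the check is disabled: only the selected-protocol field differs.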

See the full initial analysis here: #136278 (comment)

With a fork of grpc-go that turns the check off, #136278 can merge. This issue serves as a reminder to get to the bottom of this problem, as future versions of gRPC are likely to hard-code the check, and we don't want to have to be on a fork forever.

To Reproduce

See #136278 (comment)

Jira issue: CRDB-45001

Metadata

    Labels

    C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
    T-server-and-security: DB Server & Security
    branch-master: Failures and bugs on the master branch.
