ARP cache stops nodes from reaching master #23395

Closed
@justinsb

Description

I observed this on AWS, but I see no reason why it would not also happen on GCE. If GCE has a workaround or fix, I would like to use the same one on AWS.

Here's what happened: I was upgrading from 1.1 to 1.2, which requires launching new nodes. One of my nodes did not come up; I think (but am not sure) that it had the same IP address as one of the old nodes. That node could not ping the master. The master's ARP table held an incorrect entry for the node in question: the MAC address did not match the one shown by ifconfig on the node. I flushed the node's ARP entry on the master; ping resumed immediately, and the node came up soon thereafter.
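The check described above (comparing the master's cached MAC for a node against the MAC the node actually reports) can be sketched as follows. This is a minimal illustration, not what was run during the incident: it assumes the ARP table is available in the Linux `/proc/net/arp` text format, the function names `parse_arp_table` and `find_stale_entries` are hypothetical, and the addresses in `SAMPLE_ARP` are invented.

```python
# Sketch: detect ARP cache entries whose MAC differs from the expected one.
# Assumes Linux /proc/net/arp layout:
#   IP address, HW type, Flags, HW address, Mask, Device
# All names and sample addresses below are hypothetical.

SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
10.0.0.5         0x1         0x2         0a:11:22:33:44:55     *        eth0
10.0.0.6         0x1         0x2         0a:aa:bb:cc:dd:ee     *        eth0
"""


def parse_arp_table(text):
    """Parse /proc/net/arp-style text into {ip: (mac, device)}."""
    entries = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore malformed rows
        ip, _hw_type, _flags, mac, _mask, device = fields[:6]
        entries[ip] = (mac.lower(), device)
    return entries


def find_stale_entries(arp_text, expected_macs):
    """Return (ip, cached_mac, expected_mac, device) for each mismatch.

    expected_macs maps a node IP to the MAC the node itself reports
    (for example, as read from ifconfig on that node).
    """
    table = parse_arp_table(arp_text)
    stale = []
    for ip, expected in expected_macs.items():
        cached = table.get(ip)
        if cached is not None and cached[0] != expected.lower():
            stale.append((ip, cached[0], expected.lower(), cached[1]))
    return stale


if __name__ == "__main__":
    # On a real master one might read open("/proc/net/arp").read() instead.
    expected = {"10.0.0.5": "0A:99:88:77:66:55", "10.0.0.6": "0a:aa:bb:cc:dd:ee"}
    for ip, cached, want, dev in find_stale_entries(SAMPLE_ARP, expected):
        print(f"stale ARP entry for {ip} on {dev}: cached {cached}, node reports {want}")
```

For each mismatch reported, the manual fix from the description corresponds to deleting that entry on the master, for example with `arp -d <ip>` (an assumption about the command; the report does not say which tool was used).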

I believe this is the problem; it might only affect newer kernels: https://news.ycombinator.com/item?id=8732151

Labels: priority/important-soon
