ARP cache stops nodes from reaching master

I observed this on AWS, but I see no reason why it would not also happen on GCE. If GCE has a workaround or fix, I would like to use the same one on AWS.

Here's what happened: I was upgrading from 1.1 to 1.2, which requires launching new nodes. One of my nodes did not come up. I think (but am not sure) that it had the same IP address as one of the old nodes. That node could not ping the master. The master's ARP table had an incorrect entry for the node in question: it did not match what ifconfig reported on the node. I flushed the node's ARP entry on the master, ping resumed immediately, and the node came up soon afterwards.
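
For anyone wanting to reproduce the check, the comparison and flush described above could be scripted roughly as follows. This is only a sketch, assuming the master runs Linux with iproute2 (`ip neigh`) and that the script runs as root. The function names and the sample addresses are hypothetical and are not taken from my cluster.

```python
import subprocess


def parse_neigh(output):
    """Parse `ip neigh show` output into {ip: (device, mac)}."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        # Entries such as FAILED or INCOMPLETE carry no lladdr; skip them.
        if "dev" not in fields or "lladdr" not in fields:
            continue
        ip = fields[0]
        dev = fields[fields.index("dev") + 1]
        mac = fields[fields.index("lladdr") + 1].lower()
        table[ip] = (dev, mac)
    return table


def stale_entry(table, node_ip, node_mac):
    """Return the device name if the cached MAC differs from node_mac, else None."""
    entry = table.get(node_ip)
    if entry is None:
        return None
    dev, cached_mac = entry
    return dev if cached_mac != node_mac.lower() else None


def flush_command(node_ip, dev):
    """Build the command that deletes one neighbour (ARP) entry."""
    return ["ip", "neigh", "del", node_ip, "dev", dev]


def check_and_flush(node_ip, node_mac):
    """Run on the master: drop the ARP entry for node_ip if its MAC is wrong.

    node_mac is the address the node itself reports (for example via ifconfig).
    Returns True if an entry was flushed.
    """
    result = subprocess.run(
        ["ip", "neigh", "show"], capture_output=True, text=True, check=True
    )
    dev = stale_entry(parse_neigh(result.stdout), node_ip, node_mac)
    if dev is None:
        return False
    subprocess.run(flush_command(node_ip, dev), check=True)
    return True
```

Calling `check_and_flush("10.0.0.5", "0a:aa:bb:cc:dd:ee")` on the master (placeholder values) would delete the cached entry only when it disagrees with the MAC address supplied. This is a manual workaround and does not address the underlying cause.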

I believe this is the problem, and it might only affect newer kernels: https://news.ycombinator.com/item?id=8732151


ARP cache stops nodes from reaching master #23395
