mon: Remove extra mon from quorum before taking down pod #14667

Merged
merged 1 commit into from
Sep 5, 2024

@travisn travisn commented Aug 30, 2024

When removing a mon from quorum, there is a race condition that can result in mon quorum being lost, at least temporarily. The mon pod was being deleted first, and only then was the mon removed from quorum. If any other mon went down between the time the pod of the bad mon was deleted and the time the mon was removed from quorum, there might not be sufficient quorum to complete the removal, and the operator would be stuck.

For example, there could be 4 mons temporarily due to timing of upgrading K8s nodes where mons may be taken down for some number of minutes. Say a new mon is started while the down mon also comes back up. Now the operator sees it can remove the 4th mon from quorum, so it starts to remove it. Now say another mon goes down on another node that is being updated or otherwise drained. Since the 4th mon pod was deleted and another mon is down, there are only two mons remaining in quorum, but 3 mons are required in quorum when there are 4 mons. Therefore, the quorum is stuck until the third mon comes back up.
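The arithmetic in this example can be checked with a small sketch. The helper names `majority` and `hasQuorum` are hypothetical and are not part of Rook or Ceph; the only assumption is that quorum needs a strict majority of the mons in the monmap.

```go
package main

import "fmt"

// majority returns the smallest number of mons that forms a strict
// majority of the mons listed in the monmap (hypothetical helper).
func majority(monsInMonmap int) int {
	return monsInMonmap/2 + 1
}

// hasQuorum reports whether the running mons can form quorum
// (hypothetical helper).
func hasQuorum(monsUp, monsInMonmap int) bool {
	return monsUp >= majority(monsInMonmap)
}

func main() {
	// Four mons in the monmap: the extra mon's pod was deleted and one
	// more mon is down, so only two are up.
	fmt.Println(majority(4), hasQuorum(2, 4))

	// Had the extra mon been removed from the monmap first, the same
	// two running mons would be measured against three.
	fmt.Println(majority(3), hasQuorum(2, 3))
}
```

With four mons in the monmap, two running mons are not enough; with three in the monmap, the same two are.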

The solution is to first remove the extra mon from quorum before taking down the mon pod.
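To illustrate why the ordering matters, here is a toy simulation of both orderings. It is only a sketch: the `cluster` type and its methods are hypothetical stand-ins, not Rook's operator code, and the model assumes that changing the monmap requires a strict majority of the mons currently in it.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical toy model: mon name -> whether its daemon runs.
type cluster struct {
	monmap map[string]bool
}

func newCluster() *cluster {
	return &cluster{monmap: map[string]bool{"a": true, "b": true, "c": true, "d": true}}
}

// inQuorum is true when a strict majority of the monmap is running.
func (c *cluster) inQuorum() bool {
	up := 0
	for _, running := range c.monmap {
		if running {
			up++
		}
	}
	return up >= len(c.monmap)/2+1
}

// removeFromQuorum drops a mon from the monmap, which needs quorum.
func (c *cluster) removeFromQuorum(name string) error {
	if !c.inQuorum() {
		return errors.New("no quorum: cannot update the monmap")
	}
	delete(c.monmap, name)
	return nil
}

// stopPod marks a mon daemon as down if it is still in the monmap.
func (c *cluster) stopPod(name string) {
	if _, ok := c.monmap[name]; ok {
		c.monmap[name] = false
	}
}

// oldOrder deletes the pod first; mon "c" fails before the removal.
func oldOrder() error {
	c := newCluster()
	c.stopPod("d")
	c.stopPod("c") // unrelated failure inside the race window
	return c.removeFromQuorum("d")
}

// newOrder removes the mon from quorum first; mon "c" fails afterwards.
func newOrder() error {
	c := newCluster()
	if err := c.removeFromQuorum("d"); err != nil {
		return err
	}
	c.stopPod("c") // same unrelated failure
	c.stopPod("d") // no-op in this model: "d" already left the monmap
	if !c.inQuorum() {
		return errors.New("quorum lost")
	}
	return nil
}

func main() {
	fmt.Println("old order:", oldOrder())
	fmt.Println("new order:", newOrder())
}
```

In this model the old ordering ends with two of four mons running and the removal is refused, while the new ordering leaves two of three running and quorum holds.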

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

return errors.Wrap(err, "failed to update cluster rbd bootstrap peer token")
}

// We remove the mon pod last so that if there is some disconnect
Member

We remove the mon pod last

Can this be reframed??

Member Author

Yes I'll clarify the comment


logger.Infof("there is an extra mon deployment that is not needed and not in quorum")
for _, deploy := range deployments {
	monName := deploy.Labels[controller.DaemonIDLabel]
Member

controller.DaemonIDLabel
How do we decide here which mon is the extra one using the label?

Member Author

I'll clarify the comments. Basically, if we find an extra mon deployment that is not in the ceph mon quorum, we can delete the extra mon deployment.

Member

@parth-gr parth-gr Sep 5, 2024

How is it identified as an extra mon using this expression?
monName := deploy.Labels[controller.DaemonIDLabel]

Member Author

That's the name of the mon daemon, found in a running deployment. Then if the mon daemon of the same name is not found in the loop below comparing against each mon in the desired list of mons, it will be considered extra and needs to be removed on line 1090.

When removing a mon from quorum, there is a race condition that
can result in mon quorum being lost, at least temporarily.
The mon pod was being deleted first, and then the mon removed
from quorum. If any other mon went down between the time the
pod of the bad mon was deleted and when the mon was removed from
quorum, there may not be sufficient quorum to complete the
action of removing the mon from quorum and the operator would
be stuck.

For example, there could be 4 mons temporarily due to timing
of upgrading K8s nodes where mons may be taken down for some
number of minutes. Say a new mon is started while the down
mon also comes back up. Now the operator sees it can remove
the 4th mon from quorum, so it starts to remove it. Now say
another mon goes down on another node that is being updated
or otherwise drained. Since the 4th mon pod was deleted
and another mon is down, there are only two mons remaining
in quorum, but 3 mons are required in quorum when there
are 4 mons. Therefore, the quorum is stuck until the
third mon comes back up.

The solution is to first remove the extra mon from quorum
before taking down the mon pod.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
Member

@parth-gr parth-gr left a comment

lgtm

@travisn travisn merged commit 772a4fa into rook:master Sep 5, 2024
54 checks passed
mergify bot added a commit that referenced this pull request Sep 5, 2024
mon: Remove extra mon from quorum before taking down pod (backport #14667)
@travisn travisn deleted the remove-mon-race branch October 4, 2024 19:36