Support mysql galera in PetSet #23828
cc @viglesiasce
We're running e2e tests with a petset galera cluster now: https://github.com/kubernetes/kubernetes/tree/master/test/e2e/testing-manifests/petset/mysql-galera. All that's left to close this bug is to align it with the example in HEAD and document all the productionizing twists and turns. My example is just the result of reading the manual and trying stuff till the e2e test consistently passed.
Btw, the image it uses is just the stock docker image from the galera site http://galeracluster.com/2015/05/getting-started-galera-with-docker-part-1/ (it's just uploaded to gcr.io for the e2e test); all the cluster bringup is done in the init container, so we're not managing a private image. Mysql runs as pid 1.
@bprashanth: Two questions:
You need a dynamic provisioner, http://kubernetes.io/docs/user-guide/petset/#alpha-limitations. Do you have one in your cluster? If not, you will need to hand-create the volumes. Can you describe your failure mode in more detail?
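For anyone hand-creating volumes, a minimal sketch is below; the names, sizes, and hostPath locations are made-up placeholders, not values from the e2e manifests:

```sh
# Hypothetical example: pre-create one PersistentVolume per pet so the
# PetSet's volume claims have something to bind to. All values are made up.
for i in 0 1 2; do
  kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: galera-pv-$i
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /tmp/galera-$i
EOF
done
```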
I was under the impression that specifying the hostname in the mysql config causes mysql to re-resolve DNS periodically, respecting the DNS TTL. Is that not the case? Pets don't get stable IPs currently; almost every db I've tested handles this case well. I haven't run into an issue restarting galera either, but that doesn't mean there isn't one. Feedback and improvements welcome.
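To make the hostname approach concrete, the config would list the pets' stable DNS names instead of IPs. A minimal sketch, assuming a PetSet named mysql with a governing service named galera in the default namespace (both names are assumptions, not the e2e manifest's values):

```sh
# Hypothetical my.cnf fragment, e.g. written by an init container.
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
wsrep_cluster_address=gcomm://mysql-0.galera.default.svc.cluster.local,mysql-1.galera.default.svc.cluster.local,mysql-2.galera.default.svc.cluster.local
EOF
```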
I have created the volumes manually; they get bound to the volume claims created by the PetSet. The changing-IP problem is what I want to test. The concern comes from the services department, I believe, but my task is to create a scenario to see if this problem will happen in a petset with DNS.
That's a spurious error; I believe it will be fixed by #28909.
@zefciu did you get around to testing the IP addresses?
I still cannot run the YAML. Even with dynamic provisioning I get |
I can debug when I have some time, but the e2e I pointed you at is passing as we speak, so I'm guessing it's something to do with your env. Where are you running this? Have you made any modifications to the yaml? What do the logs show on the init containers? What does describe show on the pod? Anything in the controller manager logs?
I am running on an Ubuntu machine with ./hack/local-up-cluster.sh
Did you create the volumes? |
The volumes and volume claims are created and bound using the dynamic provisioner. |
Synchronous replication for mysql. Each write is replicated across all nodes in the cluster and every server is an effective "master". New nodes added to the cluster download state based on a setting in my.cnf. There are 3 flavors of galera: codership, percona, mariadb; all support the wsrep (write set replication) api (https://github.com/codership/mysql-wsrep) but differ in other ways. There are 2 ways to transfer state between members: SST (state snapshot transfer), which copies the full data dir to the joining node, and IST (incremental state transfer), which replays only the missing write sets.
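For reference, the my.cnf settings that drive this look roughly like the sketch below; the option names are standard galera options, but the specific values are illustrative assumptions, not the e2e manifest's config:

```sh
# Hypothetical wsrep fragment of my.cnf. Values are illustrative only.
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Full-snapshot (SST) transfers use rsync here; IST is used automatically
# when a rejoining node is only a few write sets behind.
wsrep_sst_method=rsync
wsrep_cluster_name=galera
EOF
```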
Initial deploy
To bootstrap the cluster, start a single node up as a reference point for all other nodes, join everyone to this node, then restart the reference point. More explicitly:
1. Start one node with an empty cluster address; it founds a new cluster.
2. Start every other node pointing at the reference point; each joins and downloads state.
3. Restart the reference point pointing at the full member list, so it rejoins as an ordinary member.
All 3 vendors have a (different) "bootstrap" command that wraps the first step, but one still needs to start mysqld on all the other nodes manually.
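A minimal shell sketch of that sequence, assuming three hosts named node-0/1/2 and plain mysqld invocations (the hostnames are assumptions; each vendor's bootstrap wrapper replaces step 1):

```sh
# Step 1 (on node-0): found the cluster with an empty gcomm:// address.
mysqld --wsrep_cluster_address=gcomm:// &

# Step 2 (on node-1 and node-2): join by pointing at the founder.
mysqld --wsrep_cluster_address=gcomm://node-0 &

# Step 3 (on node-0): restart against the full member list so it stops
# acting as a lone founder on subsequent restarts.
mysqladmin shutdown
mysqld --wsrep_cluster_address=gcomm://node-0,node-1,node-2 &
```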
Perils: if the whole cluster goes down, the new reference point must be the most up-to-date member, found by comparing each node's wsrep_last_committed value from show status like 'wsrep_%';.
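A sketch of that comparison against a 3-pet PetSet (the pod names and kubectl access are assumptions):

```sh
# Hypothetical: read the committed seqno from each pet; bootstrap from
# whichever member reports the highest wsrep_last_committed.
for i in 0 1 2; do
  echo -n "mysql-$i: "
  kubectl exec mysql-$i -- mysql -N -e "show status like 'wsrep_last_committed';"
done
```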
Kube implementation notes:
Scaling
Adding nodes appears to be easy: add a new node, specify the IPs/hostnames of existing nodes, and it downloads state. In practice it's trickier: a single node is chosen as a "donor" and all state is rsynced from it. That donor takes a performance hit; the donor is chosen by the clustering algorithm.
The new node will need permissions to copy data from all nodes in the cluster. Instead of re-granting permissions for every new member, it might be easier to just grant for, e.g., the 10-dot/16 range (see the sketch below).
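A sketch of that blanket grant, assuming a state-transfer user named sst and a 10.0.0.0/16 pod range (the user name, password, and CIDR are all assumptions):

```sh
# Hypothetical: grant the state-transfer user access from the whole
# 10.0.0.0/16 range once, instead of per-member grants.
mysql -e "GRANT ALL ON *.* TO 'sst'@'10.0.%.%' IDENTIFIED BY 'changeme';"
mysql -e "FLUSH PRIVILEGES;"
```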
TODO: There might be a way to copy the db offline and use IST to get the last few commits.
Failures
Galera uses quorum for failure handling; there's no failover. A minority partition keeps trying to contact the others but cannot commit data. Ideally a loadbalancer in front would only send writes to the PC (primary component). If nodes diverge in a way that makes quorum impossible, one needs to pick and promote a master using the wsrep_last_committed value. Rehabilitation of failed nodes is tricky because SST mode will wipe the data dir (rm -rf, essentially) and redownload.
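One way to keep a loadbalancer pointed only at the primary component is a readiness-style check like the sketch below; wiring it up as a probe is an assumption, not something the e2e manifest is known to do:

```sh
#!/bin/sh
# Hypothetical readiness check: succeed only when this member is part of
# the primary component, so the service stops routing to minority members.
status=$(mysql -N -e "show status like 'wsrep_cluster_status';" | awk '{print $2}')
[ "$status" = "Primary" ]
```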
Upgrade
There are known incompatibility issues between some mysql versions; otherwise it doesn't matter which member is chosen for an upgrade, unless the cluster is currently bootstrapping.
Thoughts
Easier
Harder
Galera is simpler to reason about than other clustered solutions in some ways (including mysql cluster); the key differences, from some cursory research: