Skip to content

Commit

Permalink
Take availability.md doc and
Browse files Browse the repository at this point in the history
- extract the portion related to multi-cluster operation into a new multi-cluster.md doc
- merge the remainder (that was basically high-level troubleshooting advice) into cluster-troubleshooting.md
  • Loading branch information
David Oppenheimer committed Jul 16, 2015
1 parent 596a8a4 commit c57c877
Show file tree
Hide file tree
Showing 5 changed files with 86 additions and 82 deletions.
2 changes: 1 addition & 1 deletion docs/admin/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ Before choosing a particular guide, here are some things to consider:
Kubernetes.
- If you are configuring kubernetes on-premises, you will need to consider what [networking
model](networking.md) fits best.
- If you are designing for very [high-availability](availability.md), you may want multiple clusters in multiple zones.
- If you are designing for very high-availability, you may want [clusters in multiple zones](multi-cluster.md).

## Setting up a cluster

Expand Down
74 changes: 74 additions & 0 deletions docs/admin/cluster-troubleshooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,80 @@ of the relevant log files. (note that on systemd based systems, you may need to
* /var/log/kubelet.log - Kubelet, responsible for running containers on the node
* /var/log/kube-proxy.log - Kube Proxy, responsible for service load balancing

## A general overview of cluster failure modes

This is an incomplete list of things that could go wrong, and how to deal with them.

Root causes:
- VM(s) shutdown
- Network partition within cluster, or between cluster and users
- Crashes in Kubernetes software
- Data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume)
- Operator error, e.g. misconfigured kubernetes software or application software

Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
- Results
- unable to stop, update, or start new pods, services, replication controller
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
- Results
- apiserver should fail to come up
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying
- manual recovery or recreation of apiserver state necessary before apiserver is restarted
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
- in future, these will be replicated as well and may not be co-located
- they do not have their own persistent state
- Individual node (VM or physical machine) shuts down
- Results
- pods on that Node stop running
- Network partition
- Results
- partition A thinks the nodes in partition B are down; partition B thinks the apiserver is down. (Assuming the master VM ends up in partition A.)
- Kubelet software fault
- Results
- crashing kubelet cannot start new pods on the node
- kubelet might delete the pods or not
- node marked unhealthy
- replication controllers start new pods elsewhere
- Cluster operator error
- Results
- loss of pods, services, etc
- lost of apiserver backing store
- users unable to read API
- etc.

Mitigations:
- Action: Use IaaS providers automatic VM restarting feature for IaaS VMs
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Mitigates: Supporting services VM shutdown or crashes

- Action use IaaS providers reliable storage (e.g GCE PD or AWS EBS volume) for VMs with apiserver+etcd
- Mitigates: Apiserver backing storage lost

- Action: Use [replicated APIserver](high-availability.md) feature
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Will tolerate one or more simultaneous apiserver failures
- Mitigates: Apiserver backing storage (i.e., etcd's data directory) lost
- Each apiserver has independent storage. Etcd will recover from loss of one member. Risk of total data loss greatly reduced.

- Action: Snapshot apiserver PDs/EBS-volumes periodically
- Mitigates: Apiserver backing storage lost
- Mitigates: Some cases of operator error
- Mitigates: Some cases of kubernetes software fault

- Action: use replication controller and services in front of pods
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault

- Action: applications (containers) designed to tolerate unexpected restarts
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault

- Action: Multiple independent clusters (and avoid making risky changes to all clusters at once)
- Mitigates: Everything listed above.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admin/cluster-troubleshooting.md?pixel)]()
Expand Down
88 changes: 9 additions & 79 deletions docs/admin/availability.md → docs/admin/multi-cluster.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,87 +20,17 @@ certainly want the docs that go with that version.</h1>
<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
# Availability

This document collects advice on reasoning about and provisioning for high-availability when using Kubernetes clusters.

## Failure modes

This is an incomplete list of things that could go wrong, and how to deal with them.

Root causes:
- VM(s) shutdown
- network partition within cluster, or between cluster and users.
- crashes in Kubernetes software
- data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume).
- operator error misconfigures kubernetes software or application software.

Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
- Results
- unable to stop, update, or start new pods, services, replication controller
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
- Results
- apiserver should fail to come up.
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying.
- manual recovery or recreation of apiserver state necessary before apiserver is restarted.
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
- in future, these will be replicated as well and may not be co-located
- they do not have own persistent state
- Node (thing that runs kubelet and kube-proxy and pods) shutdown
- Results
- pods on that Node stop running
- Kubelet software fault
- Results
- crashing kubelet cannot start new pods on the node
- kubelet might delete the pods or not
- node marked unhealthy
- replication controllers start new pods elsewhere
- Cluster operator error
- Results:
- loss of pods, services, etc
- lost of apiserver backing store
- users unable to read API
- etc

Mitigations:
- Action: Use IaaS providers automatic VM restarting feature for IaaS VMs.
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Mitigates: Supporting services VM shutdown or crashes

- Action use IaaS providers reliable storage (e.g GCE PD or AWS EBS volume) for VMs with apiserver+etcd.
- Mitigates: Apiserver backing storage lost

- Action: Use Replicated APIserver feature (when complete: feature is planned but not implemented)
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Will tolerate one or more simultaneous apiserver failures.
- Mitigates: Apiserver backing storage lost
- Each apiserver has independent storage. Etcd will recover from loss of one member. Risk of total data loss greatly reduced.

- Action: Snapshot apiserver PDs/EBS-volumes periodically
- Mitigates: Apiserver backing storage lost
- Mitigates: Some cases of operator error
- Mitigates: Some cases of kubernetes software fault

- Action: use replication controller and services in front of pods
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault

- Action: applications (containers) designed to tolerate unexpected restarts
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault

- Action: Multiple independent clusters (and avoid making risky changes to all clusters at once)
- Mitigates: Everything listed above.

## Choosing Multiple Kubernetes Clusters
# Considerations for running multiple Kubernetes clusters

You may want to set up multiple kubernetes clusters, both to
have clusters in different regions to be nearer to your users; and to tolerate failures and/or invasive maintenance.
This document describes some of the issues to consider when making a decision about doing so.

### Scope of a single cluster
Note that at present,
Kubernetes does not offer a mechanism to aggregate multiple clusters into a single virtual cluster. However,
we [plan to do this in the future](../proposals/federation.md).

## Scope of a single cluster

On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
[zone](https://cloud.google.com/compute/docs/zones) or [availability
Expand All @@ -124,7 +54,7 @@ Reasons to have multiple clusters include:
below).
- test clusters to canary new Kubernetes releases or other cluster software.

### Selecting the right number of clusters
## Selecting the right number of clusters
The selection of the number of kubernetes clusters may be a relatively static choice, only revisited occasionally.
By contrast, the number of nodes in a cluster and the number of pods in a service may be change frequently according to
load and growth.
Expand Down Expand Up @@ -153,5 +83,5 @@ failures of a single cluster are not visible to end users.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admin/availability.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admin/multi-cluster.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
2 changes: 1 addition & 1 deletion docs/design/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ Kubernetes enables users to ask a cluster to run a set of containers. The system

Kubernetes is intended to run on a number of cloud providers, as well as on physical hosts.

A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../admin/availability.md) and [cluster federation proposal](../proposals/federation.md) for more details).
A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the multi-cluster doc](../admin/multi-cluster.md) and [cluster federation proposal](../proposals/federation.md) for more details).

Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner.

Expand Down
2 changes: 1 addition & 1 deletion docs/getting-started-guides/scratch.md
Original file line number Diff line number Diff line change
Expand Up @@ -471,7 +471,7 @@ You will need to run one or more instances of etcd.
- Alternative: run 3 or 5 etcd instances.
- Log can be written to non-durable storage because storage is replicated.
- run a single apiserver which connects to one of the etc nodes.
See [Availability](../admin/availability.md) for more discussion on factors affecting cluster
See [cluster-troubleshooting](../admin/cluster-troubleshooting.md) for more discussion on factors affecting cluster
availability.

To run an etcd instance:
Expand Down

0 comments on commit c57c877

Please sign in to comment.