Add original k8s-mesos docs to contrib/mesos
sttts committed Jul 19, 2015
1 parent e3521a8 commit 8fca9b6
Showing 14 changed files with 215 additions and 0 deletions.
40 changes: 40 additions & 0 deletions contrib/mesos/README.md
@@ -0,0 +1,40 @@
# Kubernetes-Mesos

Kubernetes-Mesos modifies Kubernetes to act as an [Apache Mesos](http://mesos.apache.org/) framework.

## Features On Mesos

Kubernetes gains the following benefits when installed on Mesos:

- **Node-Level Auto-Scaling** - Kubernetes minion nodes are created automatically, up to the size of the provisioned Mesos cluster.
- **Resource Sharing** - Co-location of Kubernetes with other popular next-generation services on the same cluster (e.g. [Hadoop](https://github.com/mesos/hadoop), [Spark](http://spark.apache.org/), [Chronos](https://mesos.github.io/chronos/), and [Cassandra](http://mesosphere.github.io/cassandra-mesos/)). Resources are allocated to the frameworks based on fairness and can be claimed or passed on depending on framework load.
- **Independence from special Network Infrastructure** - Mesos can (but of course doesn't have to) run on networks which cannot assign a routable IP to every container. The Kubernetes on Mesos endpoint controller is specially modified to allow pods to communicate with services in such an environment.

## Features On DCOS

Kubernetes can also be installed on [Mesosphere DCOS](https://mesosphere.com/learn/), which runs Mesos as its core. This provides the following *additional* enterprise features:

- **High Availability** - Kubernetes components themselves run within Marathon, which manages restarting/recreating them if they fail, even on a different host if the original host fails completely.
- **Easy Installation** - One-step installation via the [DCOS CLI](https://github.com/mesosphere/dcos-cli) or DCOS UI. Both download releases from the [Mesosphere Universe](https://github.com/mesosphere/universe), [Multiverse](https://github.com/mesosphere/multiverse), or private package repositories.
- **Easy Maintenance** - See what's going on in the cluster with the DCOS UI.

For more information about how Kubernetes-Mesos is different from Kubernetes, see [Architecture](./docs/architecture.md).


## Release Status

Kubernetes-Mesos is alpha quality, still under active development, and not yet recommended for production systems.

For more information about development progress, see the [known issues](./docs/issues.md) or the [kubernetes-mesos repository](https://github.com/mesosphere/kubernetes-mesos) where backlog issues are tracked.

## Usage

This project combines concepts and technologies from two already-complex projects: Mesos and Kubernetes. It may help to familiarize yourself with the basics of each project before reading on:

* [Mesos Documentation](http://mesos.apache.org/documentation/latest)
* [Kubernetes Documentation](../../README.md)

To get up and running with Kubernetes-Mesos, follow the [Getting started guide](../../docs/getting-started-guides/mesos.md).


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/contrib/mesos/README.md?pixel)]()
1 change: 1 addition & 0 deletions contrib/mesos/docs/architecture.gliffy

Large diffs are not rendered by default.

34 changes: 34 additions & 0 deletions contrib/mesos/docs/architecture.md
@@ -0,0 +1,34 @@
# Kubernetes-Mesos Architecture

An [Apache Mesos][1] cluster consists of one or more masters, and one or more slaves.
Kubernetes-Mesos (k8sm) operates as a Mesos framework that runs on the cluster.
As a framework, k8sm provides scheduler and executor components, both of which are hybrids of Kubernetes and Mesos:
the scheduler component integrates the Kubernetes scheduling API and the Mesos scheduler runtime, whereas
the executor component integrates Kubernetes kubelet services and the Mesos executor runtime.

Multiple Mesos masters are typically configured to coordinate leadership election via Zookeeper.
Future releases of Mesos may implement leader election protocols [differently][2].
Kubernetes maintains its internal registry (pods, replication controllers, bindings, minions, services) in etcd.
Users typically interact with Kubernetes using the `kubectl` command to manage Kubernetes primitives.

When a pod is created in Kubernetes, the k8sm scheduler creates an associated Mesos task and queues it for scheduling.
Upon pairing the pod/task with an acceptable resource offer, the scheduler binds the pod/task to the offer's slave.
As a result of binding, the pod/task is launched and delivered to an executor (an executor is created by the Mesos slave if one is not already running).
The executor launches the pod/task, registering the bound pod with the kubelet engine, and the kubelet then begins to manage the lifecycle of the pod instance.

![Architecture Diagram](architecture.png)

## Networking

Kubernetes-Mesos uses "normal" Docker IPv4, host-private networking, rather than Kubernetes' SDN-based networking that assigns an IP per pod. This is mostly transparent to the user, especially when using the service abstraction to access pods. For details on some issues it creates, see [issues][3].

![Network Diagram](networking.png)

[1]: http://mesos.apache.org/
[2]: https://issues.apache.org/jira/browse/MESOS-1806
[3]: issues.md#service-endpoints

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/contrib/mesos/docs/README.md?pixel)]()


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/contrib/mesos/docs/architecture.md?pixel)]()
Binary file added contrib/mesos/docs/architecture.png
1 change: 1 addition & 0 deletions contrib/mesos/docs/architecture.svg
66 changes: 66 additions & 0 deletions contrib/mesos/docs/ha.md
@@ -0,0 +1,66 @@
## High Availability

### Scheduler

The implementation of the scheduler HA feature includes:

- Checkpointing by default (`--checkpoint`)
- Large failover-timeout by default (`--failover_timeout`)
- Hot-failover with multiple scheduler instances (`--ha`)
- Best effort task reconciliation on failover

#### Multiple Instances

Multiple scheduler instances may be run to support a warm-standby scenario in which one scheduler fails and another takes over immediately.
But at any moment in time only one scheduler is actually registered with the leading Mesos master.
Scheduler leader election is implemented using etcd so it is important to have an HA etcd configuration established for reliable scheduler HA.

It is currently recommended that no more than 2 scheduler instances be running at the same time.
Running more than 2 schedulers at once may work but has not been extensively tested.
YMMV.

#### Failover

Scheduler failover may be triggered by either of the following events:

- loss of leadership when running in HA mode (`--ha`).
- the leading scheduler process receives a USR1 signal.

It is currently possible to signal failover to a single, non-HA scheduler process.
In this case, if there are problems launching a replacement scheduler process then the cluster may be without a scheduler until another is manually started.
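The USR1 trigger described above can be sent with the standard `kill` utility. The following is a minimal sketch, not part of the project: the helper name `signal_failover` is hypothetical, and how the PID of the leading scheduler is discovered is left to the operator.

```shell
# signal_failover PID
# Hypothetical helper: sends SIGUSR1 to the given process, which the text
# above names as one trigger for scheduler failover.
# Returns 2 if no PID was given and 1 if the process does not exist.
signal_failover() {
  pid=$1
  if [ -z "$pid" ]; then
    echo "usage: signal_failover PID" >&2
    return 2
  fi
  # kill -0 delivers no signal; it only checks that the process exists
  # and that we are allowed to signal it.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no such process: $pid" >&2
    return 1
  fi
  kill -USR1 "$pid"
}
```

For example, `signal_failover 12345` signals the scheduler running as PID 12345. When that scheduler is a single, non-HA instance, the caveat above applies: make sure a replacement can be started.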

#### How To

##### Command Line Arguments

- `--ha` is required to enable scheduler HA and multi-scheduler leader election.
- `--km_path` or else (`--executor_path` and `--proxy_path`) should reference non-local-file URIs and must be identical across schedulers.

If you have HDFS installed on your slaves then you can specify HDFS URI locations for the binaries:

```shell
$ hdfs dfs -put -f bin/km hdfs:///km
$ ./bin/km scheduler ... --mesos_master=zk://zk1:2181,zk2:2181/mesos --ha --km_path=hdfs:///km
```

**IMPORTANT:** some command line parameters specified for the scheduler process are passed to the Kubelet-executor and so are subject to compatibility tests:

- a Mesos master will not recognize differently configured executors as being compatible, and so...
- a scheduler will refuse to accept any offer for slave resources if there are incompatible executors running on the slave.

Within the scheduler, compatibility is largely determined by comparing executor configuration hashes:
a hash is calculated from a subset of the executor-related command line parameters provided to the scheduler process.
The command line parameters that affect the hash calculation are listed below.

- `--allow_privileged`
- `--api_servers`
- `--auth_path`
- `--cluster_*`
- `--executor_*`
- `--kubelet_*`
- `--km_path`
- `--profiling`
- `--proxy_path`
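To illustrate the idea of hashing only a subset of the flags, here is a rough sketch. It is not the scheduler's actual hash function: the helper name `executor_config_hash`, the use of SHA-256 via `sha256sum`, and the choice to sort flags before hashing are all assumptions made for this example. Only the list of flag names comes from the text above.

```shell
# executor_config_hash FLAG...
# Hypothetical helper: prints a SHA-256 digest computed over only those flags
# that the list above names as affecting executor compatibility.
# All other flags (e.g. --ha, --mesos_master) are ignored.
# NOTE: illustrative only; the real hash algorithm and input encoding used by
# the scheduler are not described in this document.
executor_config_hash() {
  printf '%s\n' "$@" \
    | grep -E '^--(allow_privileged|api_servers|auth_path|cluster_[a-z_]+|executor_[a-z_]+|kubelet_[a-z_]+|km_path|profiling|proxy_path)(=|$)' \
    | sort \
    | sha256sum \
    | cut -d' ' -f1
}
```

Under these assumptions, two scheduler command lines that differ only in flags outside the list yield the same digest, while changing `--km_path` yields a different one. This mirrors why the executor-related flags should be kept identical across schedulers.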


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/contrib/mesos/docs/ha.md?pixel)]()
66 changes: 66 additions & 0 deletions contrib/mesos/docs/issues.md
@@ -0,0 +1,66 @@
## Known Issues

### Pod Placement

The initial plan was to implement pod placement (aka scheduling "constraints") using rules similar to those found in Marathon.
Upon further consideration it has been decided that a greater alignment between the stock Kubernetes scheduler and kubernetes-mesos scheduler would benefit both projects, as well as end-users.
Currently it is not possible to specify pod placement constraints for the kubernetes-mesos scheduler.
This issue is being tracked here: https://github.com/mesosphere/kubernetes-mesos/issues/338

### Resource Allocation

Resource requirements (limits) specified on Kubernetes pods are currently ignored, both in the scheduler and on the node. Instead, hardcoded values are used for the time being. This issue is being tracked here: https://github.com/mesosphere/kubernetes-mesos/issues/68.

In general Mesos is designed to handle resource accounting and enforcement across the cluster. Part of that enforcement involves "growing" and "shrinking" the pool of resources allocated for executor containers.
The current implementation of the kubelet-executor launches pods as Docker containers (just like the upstream kubelet) and makes no attempt to actually "contain" the pods that are launched. Because the kubernetes-mesos scheduler cannot depend on the kubelet-executor to properly contain resources, it forgoes implementing accurate resource accounting.

Recent changes to both the Docker and Kubernetes codebase have made it possible to implement the necessary changes in the kubelet-executor for proper pod containment. This is in the works and will be merged into a later version when ready.

### Ports

Mesos typically defines `ports` resources for each slave and these ports are consumed by tasks, as they are launched, that require one or more host ports.
Kubernetes pod container specifications identify two types of ports, container ports and host ports:
container ports are allocated from the network namespace of the pod, which is independent from that of the host, whereas
host ports are allocated from the network namespace of the host.
The k8sm scheduler recognizes the declared host ports of each container in a pod/task and, for each such port, attempts to allocate it from the offered ports listed in Mesos resource offers.
If no host port is declared, then the scheduler may choose any port from the offered ports ranges.

If slaves are configured to offer a `ports` resource range, for example [31000-32000], then any host ports declared in the pod container specification must fall within that range.
Ports declared outside that range (other than zero) will never match resource offers received by the k8sm scheduler, and so pod specifications that declare such ports will never be executed as tasks on the cluster.

Unlike Kubernetes proper, a missing pod container host port specification, or a host port set to zero, causes a host port to be allocated from a resource offer.
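The matching rule described in this section can be sketched as a small shell check. This is only an illustration of the rule, not scheduler code: the helper name `host_port_matches_offer` is hypothetical, and an undeclared host port is represented here as `0`.

```shell
# host_port_matches_offer PORT LOW HIGH
# Hypothetical helper: succeeds if a declared host port could be satisfied by
# an offered `ports` range LOW-HIGH (inclusive).
host_port_matches_offer() {
  port=$1 low=$2 high=$3
  # Zero (or an undeclared host port, passed as 0) means the scheduler may
  # pick any port from the offered range, so it always matches.
  [ "$port" -eq 0 ] && return 0
  # Any other port must fall inside the offered range.
  [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]
}
```

With the example range from above, `host_port_matches_offer 31500 31000 32000` succeeds, whereas `host_port_matches_offer 8080 31000 32000` fails, so a pod declaring host port 8080 would never be launched on such a cluster.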

### Service Endpoints

At the time of this writing both Kubernetes and Mesos are using IPv4 addressing, albeit under different assumptions.
Mesos clusters configured with Docker typically use default Docker networking, which is host-private.
Kubernetes clusters assume a custom Docker networking configuration that assigns a cluster-routable IPv4 address to each pod, meaning that a process running anywhere on a Kubernetes cluster can reach a pod running on the same cluster by using the pod's Docker-assigned IPv4 address.

Kubernetes service endpoints terminate, by default, at a backing pod's IPv4 address using the container-port selected in the service specification (PodIP:ContainerPort).
This is problematic when default Docker networking has been configured, such as in the case of typical Mesos clusters, because a pod's host-private IPv4 address is not intended to be reachable outside of its host.

The k8sm project has implemented a work-around: service endpoints are terminated at HostIP:HostPort, where the HostIP is the IP address of the Mesos slave and the HostPort is the host port declared in the pod container port specification.
Host ports that are not defined, or else defined as zero, will automatically be assigned a (host) port resource from a resource offer.
When using the `controller-manager` provided by this project, users should be sure to assign a `name` to each `service.spec.port` object; otherwise errors may be reported in the endpoints controller manager regarding non-unique port values (#322).

To disable the work-around and revert to vanilla Kubernetes service endpoint termination:

* execute the k8sm controller-manager with `-host_port_endpoints=false`

Then the usual Kubernetes network assumptions must be fulfilled for Kubernetes to work with Mesos, i.e. each container must get a cluster-wide routable IP (compare [Kubernetes Networking documentation](../../../docs/design/networking.md#container-to-container)).
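The difference between the two termination modes can be summarized in a few lines of shell. This is a sketch of the behaviour described in this section, not code from the endpoints controller: the helper name `service_endpoint` and its argument order are invented for illustration, and the first argument stands in for the value of `-host_port_endpoints`.

```shell
# service_endpoint HOST_PORT_ENDPOINTS POD_IP CONTAINER_PORT HOST_IP HOST_PORT
# Hypothetical helper: prints the address at which a service endpoint would
# terminate for one backing pod.
service_endpoint() {
  if [ "$1" = "true" ]; then
    # k8sm work-around: terminate at the Mesos slave's IP and the host port.
    printf '%s:%s\n' "$4" "$5"
  else
    # Vanilla Kubernetes: terminate at the pod's IP and the container port.
    printf '%s:%s\n' "$2" "$3"
  fi
}
```

For a pod at 172.17.0.5 listening on container port 80, running on a slave at 10.10.10.1 with host port 31080, the work-around yields `10.10.10.1:31080`, while the vanilla behaviour yields `172.17.0.5:80`. The latter is only reachable if pod IPs are routable across the cluster.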

Future support for IPv6 addressing in Docker and Kubernetes should obviate the need for this work-around.

### Orphan Pods

The default `executor_shutdown_grace_period` of a Mesos slave is 3 seconds.
When the executor is shut down it forcefully terminates the Docker containers that it manages.
However, if terminating the Docker containers takes longer than the `executor_shutdown_grace_period` then some containers may not get a termination signal at all.
A consequence of this is that some pod containers, previously managed by the framework's executor, will remain running on the slave indefinitely.

There are two work-arounds to this problem:
* Restart the framework and it should terminate the orphaned tasks.
* Adjust the value of `executor_shutdown_grace_period` to something greater than 3 seconds.


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/contrib/mesos/docs/issues.md?pixel)]()
Binary file added contrib/mesos/docs/logos/k8s-256x256.png
Binary file added contrib/mesos/docs/logos/k8s-48x48.png
Binary file added contrib/mesos/docs/logos/k8s-96x96.png
1 change: 1 addition & 0 deletions contrib/mesos/docs/networking.gliffy

Large diffs are not rendered by default.

Binary file added contrib/mesos/docs/networking.png
1 change: 1 addition & 0 deletions contrib/mesos/docs/networking.svg
5 changes: 5 additions & 0 deletions docs/getting-started-guides/mesos.md
@@ -60,6 +60,8 @@ It provides a step by step walk through of adding Kubernetes to a Mesos cluster
**NOTE:** There are [known issues with the current implementation][7] and support for centralized logging and monitoring is not yet available.
Please [file an issue against the kubernetes-mesos project][8] if you have problems completing the steps below.

Further information is available in the Kubernetes on Mesos [contrib directory][13].

### Prerequisites

* Understanding of [Apache Mesos][6]
@@ -344,6 +346,8 @@ Address 1: 10.10.10.1

Try out some of the standard [Kubernetes examples][9].

Read about Kubernetes on Mesos' architecture in the [contrib directory][13].

**NOTE:** Some examples require Kubernetes DNS to be installed on the cluster.
Future work will add instructions to this guide to enable support for Kubernetes DNS.

@@ -361,6 +365,7 @@ Future work will add instructions to this guide to enable support for Kubernetes
[10]: http://open.mesosphere.com/getting-started/cloud/google/mesosphere/#vpn-setup
[11]: ../../cluster/addons/dns/skydns-rc.yaml.in
[12]: ../../cluster/addons/dns/skydns-svc.yaml.in
[13]: ../../contrib/mesos/README.md


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->