Skip to content

Commit

Permalink
docs: flink eks
Browse files Browse the repository at this point in the history
  • Loading branch information
gangliao committed Nov 27, 2021
1 parent e1a274b commit a01c894
Show file tree
Hide file tree
Showing 8 changed files with 2,723 additions and 1,101 deletions.
4 changes: 2 additions & 2 deletions docs/benchmarks.scrbl
Original file line number Diff line number Diff line change
Expand Up @@ -4,5 +4,5 @@
@local-table-of-contents[#:style 'immediate-only]

@include-section{benchmarks/flink.scrbl}
@include-section{benchmarks/ec2.scrbl}

@include-section{benchmarks/eks.scrbl}
@include-section{benchmarks/kinesis.scrbl}
3 changes: 0 additions & 3 deletions docs/benchmarks/ec2.scrbl

This file was deleted.

139 changes: 139 additions & 0 deletions docs/benchmarks/eks.scrbl
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
#lang scribble/manual
@(require "../fancyverb.rkt" "../utils.rkt")
@(require scribble/core racket/list)
@(require (for-label racket rackunit))
@(require redex/reduction-semantics
redex/pict (only-in pict scale))

@(require scribble/examples racket/sandbox)

@(define core-racket
(parameterize ([sandbox-output 'string]
[sandbox-error-output 'string]
[sandbox-memory-limit 50])
(make-evaluator 'racket)))

@(core-racket '(require racket/match))

@(define (bash-repl . s)
(filebox (emph "shell")
(apply fancyverbatim "bash" s)))

@title[#:tag "flink_eks" #:style 'unnumbered]{Flink on Amazon EKS}

@table-of-contents[]

@section{Overview}

Flink supports different deployment modes when running on @link["https://kubernetes.io/"]{Kubernetes}. We
will show you how to deploy Flink on Kubernetes using the
@link["https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/"]{Native Kubernetes Deployment}.

@section[#:style 'unnumbered]{Amazon EKS}

@link["https://aws.amazon.com/eks/"]{Amazon EKS} is a fully managed Kubernetes service. EKS supports creating and managing Spot Instances using Amazon EKS
managed node groups following Spot best practices. This enables you to take advantage of the steep savings and scale
that Spot Instances provide for interruptible workloads. EKS-managed node groups require less operational effort
compared to using self-managed nodes.

Flink can run jobs on Kubernetes via Application and Session Modes only:

@bold{Application Mode}: This is a lightweight and scalable way to submit an application on Flink and is the preferred way
to launch application as it supports better resource isolation. Resource isolation is achieved by running a cluster
per job. Once the application shuts down all the Flink components are cleaned up.

@bold{Session Mode}: This is a long running Kubernetes deployment of Flink. Multiple applications can be launched on a
cluster and the applications competes for the resources. There may be multiple jobs running on a TaskManager in
parallel. Its main advantage is that it saves time on spinning up a new Flink cluster for new jobs, however if one
of the Task Managers fails it may impact all the jobs running on that.

@section[#:style 'unnumbered]{AWS Spot Instances}

Flink is distributed to manage and process high volumes of data. Designed for failure, they can run on machines with
different configurations, inherently resilient and flexible. Spot Instances can optimize runtimes by increasing
throughput, while spending the same (or less). Flink can tolerate interruptions using restart and failover strategies.

Job Manager and Task Manager are key building blocks of Flink. The Task Manager is the compute intensive part and
Job Manager is the orchestrator. We would be running Task Manager on Spot Instances and Job Manager on On Demand
Instances.

Flink supports elastic scaling via @link["https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/"]{Reactive Mode}.
This is ideal with Spot Instances as it implements elastic scaling with higher throughput in a cost optimized way.

@section[#:style 'unnumbered]{Flink Deployment}

For production use, we recommend deploying Flink Applications in the @bold{Application Mode}, as these modes provide a
better isolation for the Applications. We will be bundling the user code in the Flink image for that purpose and
upload in @link["https://aws.amazon.com/ecr/"]{Amazon ECR}. Amazon ECR is a fully managed container registry that
makes it easy to store, manage, share, and deploy your container images and artifacts anywhere.

@itemlist[#:style 'ordered
@item{Build the Amazon ECR Image
@itemlist[
@item{Login using the following cmd and don’t forget to replace the AWS_REGION and AWS_ACCOUNT_ID with your details.}
@item{Create a repository.}
@item{Build the Docker image.}
@item{Tag and Push your image to Amazon ECR.}
]}

@item{Create Amazon S3/Amazon Kinesis Access Policy
@itemlist[
@item{We must create an access policy to allow the Flink application to read/write from Amazon S3 and read Kinesis data streams.
Run the following to create the policy. Note the ARN.}
]}

@item{Cluster and node groups deployment
@itemlist[
@item{Create an EKS cluster. The cluster takes approximately 15 minutes to launch.}
@item{Create the node group using the nodeGroup config file. We are using multiple nodeGroups of different
sizes to adapt Spot best practice of diversification. Replace the <<Policy ARN>> string using the ARN string
from the previous step.}
@item{Download the Cluster Autoscaler and edit it to add the cluster-name.}
]}

@item{Install the Cluster AutoScaler using the following command: kubectl apply -f cluster-autoscaler-autodiscover.yaml
@itemlist[
@item{Using EKS Managed node groups requires significantly less operational effort compared to using self-managed
node group and enables: 1) Auto enforcement of Spot best practices. 2) Spot Instance lifecycle management. 3)
Auto labeling of Pods.}
@item{eksctl has integrated @link["https://github.com/aws/amazon-ec2-instance-selector"]{amazon-ec2-instance-selector}
to enable auto selection of instances based on the criteria passed.}
]}

@item{Create service accounts for Flink

@bash-repl{
$ kubectl create serviceaccount flink-service-account
$ kubectl create clusterrolebinding flink-role-binding-flink --clusterrole=edit --serviceaccount=default:flink-service-account
}
}

@item{Deploy Flink

This install folder here has all the YAML files required to deploy a standalone Flink cluster. Run the install.sh
file. This will deploy the cluster with a JobManager, a pool of TaskManagers and a Service exposing JobManager’s
ports.

@itemlist[
@item{This is a High-Availability(HA) deployment of Flink with the use of Kubernetes high availability service.}
@item{The JobManager runs on OnDemand and TaskManager on Spot. As the cluster is launched in Application Mode,
if a node is interrupted only one job will be restarted.}
@item{Autoscaling is enabled by the use of Reactive Mode. Horizontal Pod Autoscaler is used to monitor the CPU
load and scale accordingly.}
@item{Check-pointing is enabled which allows Flink to save state and be fault tolerant.}
]}

]


@section[#:style 'unnumbered]{Conclusion}

In this post, we demonstrated how you can run Flink workloads on a Kubernetes Cluster using Spot Instances,
achieving scalability, resilience, and cost optimization.


@section[#:style 'unnumbered]{Reference}

@itemlist[#:style 'ordered
@item{Kinnar Sen. Optimizing Apache Flink on Amazon EKS using Amazon EC2 Spot Instances. AWS Compute Blog. 11 NOV 2021. @link["https://aws.amazon.com/blogs/compute/optimizing-apache-flink-on-amazon-eks-using-amazon-ec2-spot-instances/"]{https://aws.amazon.com/blogs/compute/optimizing-apache-flink-on-amazon-eks-using-amazon-ec2-spot-instances/}.}
]
4 changes: 4 additions & 0 deletions docs/benchmarks/kinesis.scrbl
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#lang scribble/manual
@title[#:tag "flink_kinesis" #:style 'unnumbered]{Amazon Kinesis Data Analytics for Apache Flink}


Loading

0 comments on commit a01c894

Please sign in to comment.