added support for operational alerts
michaelwittig committed Apr 26, 2017
1 parent 8614b45 commit 8532773
Showing 20 changed files with 614 additions and 41 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -12,6 +12,7 @@ We are offering the following templates:
* [Elastic Compute Cloud (EC2)](./ec2/)
* [EC2 Container Service (ECS)](./ecs/)
* [Jenkins ](./jenkins/)
* [Operations](./operations/)
* [Security](./security/)
* [Static website](./static-website/)
* [Virtual Private Cloud (VPC)](./vpc/)
3 changes: 2 additions & 1 deletion ec2/README.md
@@ -19,7 +19,8 @@ This template describes an EC2 instance with auto-recovery. If the instance fail

### Dependencies
* `vpc/vpc-*azs.yaml` (**required**)
* `vpc/vpc-ssh-bastion.yaml`
* `vpc/vpc-ssh-bastion.yaml` (recommended)
* `operations/alert.yaml` (recommended)

## Support
We offer support for our CloudFormation templates: setting up environments based on our templates, adapting templates to specific use cases, resolving issues in production environments. [Hire us!](https://widdix.net/)
25 changes: 24 additions & 1 deletion ec2/ec2-auto-recovery.yaml
@@ -9,6 +9,7 @@ Metadata:
Parameters:
- ParentVPCStack
- ParentSSHBastionStack
- ParentAlertStack
- Label:
default: 'EC2 Parameters'
Parameters:
@@ -24,7 +25,11 @@ Parameters:
Description: 'Stack name of parent VPC stack based on vpc/vpc-*azs.yaml template.'
Type: String
ParentSSHBastionStack:
Description: 'Optional Stack name of parent SSH bastion host/instance stack based on vpc/vpc-ssh-bastion.yaml template.'
Description: 'Optional but recommended stack name of parent SSH bastion host/instance stack based on vpc/vpc-ssh-bastion.yaml template.'
Type: String
Default: ''
ParentAlertStack:
Description: 'Optional but recommended stack name of parent alert stack based on operations/alert.yaml template.'
Type: String
Default: ''
KeyName:
@@ -105,6 +110,7 @@ Conditions:
HasSSHBastionSecurityGroup: !Not [!Equals [!Ref ParentSSHBastionStack, '']]
HasNotSSHBastionSecurityGroup: !Equals [!Ref ParentSSHBastionStack, '']
HasNewRelic: !Not [!Equals [!Ref NewRelicLicenseKey, '']]
HasAlertTopic: !Not [!Equals [!Ref ParentAlertStack, '']]
Resources:
ElasticIP:
Type: 'AWS::EC2::EIP'
@@ -418,6 +424,23 @@ Resources:
Dimensions:
- Name: InstanceId
Value: !Ref VirtualMachine
CPUTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Average CPU utilization over last 10 minutes higher than 80%'
Namespace: 'AWS/EC2'
MetricName: CPUUtilization
Statistic: Average
Period: 600
EvaluationPeriods: 1
ComparisonOperator: GreaterThanThreshold
Threshold: 80
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
Dimensions:
- Name: InstanceId
Value: !Ref VirtualMachine
Outputs:
TemplateID:
Description: 'cloudonaut.io template id'
5 changes: 4 additions & 1 deletion ecs/README.md
@@ -26,8 +26,9 @@ This template describes a fault tolerant and scalable ECS cluster on AWS. The cl

### Dependencies
* `vpc/vpc-*azs.yaml` (**required**)
* `vpc/vpc-ssh-bastion.yaml`
* `vpc/vpc-ssh-bastion.yaml` (recommended)
* `security/auth-proxy-*.yaml`
* `operations/alert.yaml` (recommended)

## ECS service
This template describes a fault tolerant and scalable ECS service on AWS. The service scales based on CPU utilization.
@@ -56,6 +57,7 @@ This template describes a fault tolerant and scalable ECS service that uses the

#### Dependencies
* `ecs/cluster.yaml` (**required**)
* `operations/alert.yaml` (recommended)

### Using a dedicated load balancer for the service
This template describes a fault tolerant and scalable ECS service that uses a dedicated load balancer for the service.
@@ -76,6 +78,7 @@ This template describes a fault tolerant and scalable ECS service that uses a de
#### Dependencies
* `vpc/vpc-*azs.yaml` (**required**)
* `ecs/cluster.yaml` (**required**)
* `operations/alert.yaml` (recommended)

## Support
We offer support for our CloudFormation templates: setting up environments based on our templates, adapting templates to specific use cases, resolving issues in production environments. [Hire us!](https://widdix.net/)
103 changes: 97 additions & 6 deletions ecs/cluster.yaml
@@ -10,6 +10,7 @@ Metadata:
- ParentVPCStack
- ParentSSHBastionStack
- ParentAuthProxyStack
- ParentAlertStack
- Label:
default: 'EC2 Parameters'
Parameters:
@@ -36,11 +37,15 @@ Parameters:
Description: 'Stack name of parent VPC stack based on vpc/vpc-*azs.yaml template.'
Type: String
ParentSSHBastionStack:
Description: 'Optional Stack name of parent SSH bastion host/instance stack based on vpc/vpc-ssh-bastion.yaml template.'
Description: 'Optional but recommended stack name of parent SSH bastion host/instance stack based on vpc/vpc-ssh-bastion.yaml template.'
Type: String
Default: ''
ParentAuthProxyStack:
Description: 'Optional Stack name of parent auth proxy stack based on security/auth-proxy-*.yaml template.'
Description: 'Optional stack name of parent auth proxy stack based on security/auth-proxy-*.yaml template.'
Type: String
Default: ''
ParentAlertStack:
Description: 'Optional but recommended stack name of parent alert stack based on operations/alert.yaml template.'
Type: String
Default: ''
KeyName:
@@ -147,6 +152,7 @@ Conditions:
HasLoadBalancerCertificateArn: !Not [!Equals [!Ref LoadBalancerCertificateArn, '']]
HasAuthProxySecurityGroupAndLoadBalancerCertificateArn: !And [!Condition HasAuthProxySecurityGroup, !Condition HasLoadBalancerCertificateArn]
HasNotAuthProxySecurityGroupAndLoadBalancerCertificateArn: !And [!Condition HasNotAuthProxySecurityGroup, !Condition HasLoadBalancerCertificateArn]
HasAlertTopic: !Not [!Equals [!Ref ParentAlertStack, '']]
Resources:
Cluster:
Type: 'AWS::ECS::Cluster'
@@ -323,6 +329,40 @@ Resources:
FromPort: 22
ToPort: 22
CidrIp: '0.0.0.0/0'
HTTPCodeELB5XXTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Application load balancer returns 5XX HTTP status codes'
Namespace: 'AWS/ApplicationELB'
MetricName: HTTPCode_ELB_5XX_Count
Statistic: Sum
Period: 60
EvaluationPeriods: 1
ComparisonOperator: GreaterThanThreshold
Threshold: 0
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
Dimensions:
- Name: LoadBalancer
Value: !GetAtt LoadBalancer.LoadBalancerFullName
HTTPCodeTarget5XXTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Application load balancer receives 5XX HTTP status codes from targets'
Namespace: 'AWS/ApplicationELB'
MetricName: HTTPCode_Target_5XX_Count
Statistic: Sum
Period: 60
EvaluationPeriods: 1
ComparisonOperator: GreaterThanThreshold
Threshold: 0
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
Dimensions:
- Name: LoadBalancer
Value: !GetAtt LoadBalancer.LoadBalancerFullName
LoadBalancer:
Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
Properties:
@@ -804,12 +844,46 @@ Resources:
Value: !Ref Cluster
MetricName: CPUReservation
ComparisonOperator: GreaterThanThreshold
Statistic: Average
Statistic: Average # special rule because we scale on reservations and not utilization
Period: 60
EvaluationPeriods: 1
Threshold: 80
AlarmActions:
- !Ref ScaleUpPolicy
CPUReservationTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Average CPU reservation over last 10 minutes higher than 90%'
Namespace: 'AWS/ECS'
Dimensions:
- Name: ClusterName
Value: !Ref Cluster
MetricName: CPUReservation
ComparisonOperator: GreaterThanThreshold
Statistic: Average # special rule because we scale on reservations and not utilization
Period: 600
EvaluationPeriods: 1
Threshold: 90
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
CPUUtilizationTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Average CPU utilization over last 10 minutes higher than 80%'
Namespace: 'AWS/ECS'
Dimensions:
- Name: ClusterName
Value: !Ref Cluster
MetricName: CPUUtilization
ComparisonOperator: GreaterThanThreshold
Statistic: Average
Period: 600
EvaluationPeriods: 1
Threshold: 80
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
MemoryReservationHighAlarm:
Type: 'AWS::CloudWatch::Alarm'
Properties:
@@ -820,12 +894,29 @@ Resources:
Value: !Ref Cluster
MetricName: MemoryReservation
ComparisonOperator: GreaterThanThreshold
Statistic: Average
Statistic: Average # special rule because we scale on reservations and not utilization
Period: 60
EvaluationPeriods: 1
Threshold: 80
AlarmActions:
- !Ref ScaleUpPolicy
MemoryReservationTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Average memory reservation over last 10 minutes higher than 90%'
Namespace: 'AWS/ECS'
Dimensions:
- Name: ClusterName
Value: !Ref Cluster
MetricName: MemoryReservation
ComparisonOperator: GreaterThanThreshold
Statistic: Average # special rule because we scale on reservations and not utilization
Period: 600
EvaluationPeriods: 1
Threshold: 90
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
CPUReservationLowAlarm:
Type: 'AWS::CloudWatch::Alarm'
Properties:
@@ -836,7 +927,7 @@ Resources:
Value: !Ref Cluster
MetricName: CPUReservation
ComparisonOperator: LessThanThreshold
Statistic: Average
Statistic: Average # special rule because we scale on reservations and not utilization
Period: 60
EvaluationPeriods: 1
Threshold: 20
@@ -852,7 +943,7 @@ Resources:
Value: !Ref Cluster
MetricName: MemoryReservation
ComparisonOperator: LessThanThreshold
Statistic: Average
Statistic: Average # special rule because we scale on reservations and not utilization
Period: 60
EvaluationPeriods: 1
Threshold: 20
36 changes: 31 additions & 5 deletions ecs/service-cluster-alb.yaml
@@ -8,6 +8,7 @@ Metadata:
default: 'Parent Stacks'
Parameters:
- ParentClusterStack
- ParentAlertStack
- Label:
default: 'Load Balancer Parameters'
Parameters:
@@ -26,6 +27,10 @@ Parameters:
ParentClusterStack:
Description: 'Stack name of parent Cluster stack based on ecs/cluster.yaml template.'
Type: String
ParentAlertStack:
Description: 'Optional but recommended stack name of parent alert stack based on operations/alert.yaml template.'
Type: String
Default: ''
LoadBalancerPriority:
Description: 'The priority for the rule. Elastic Load Balancing evaluates rules in priority order, from the lowest value to the highest value. If a request satisfies a rule, Elastic Load Balancing ignores all subsequent rules. A target group can have only one rule with a given priority.'
Type: Number
@@ -78,6 +83,7 @@ Conditions:
HasLoadBalancerHttps: !Equals [!Ref LoadBalancerHttps, 'true']
HasLoadBalancerPath: !Not [!Equals [!Ref LoadBalancerPath, '']]
HasLoadBalancerHostPattern: !Not [!Equals [!Ref LoadBalancerHostPattern, '']]
HasAlertTopic: !Not [!Equals [!Ref ParentAlertStack, '']]
Resources:
TaskDefinition:
Type: 'AWS::ECS::TaskDefinition'
@@ -251,6 +257,26 @@ Resources:
StepAdjustments:
- MetricIntervalLowerBound: 0
ScalingAdjustment: -25
CPUUtilizationTooHighAlarm:
Condition: HasAlertTopic
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'Average CPU utilization over last 5 minutes higher than 80%'
Namespace: 'AWS/ECS'
Dimensions:
- Name: ClusterName
Value:
'Fn::ImportValue': !Sub '${ParentClusterStack}-Cluster'
- Name: ServiceName
Value: !GetAtt 'Service.Name'
MetricName: CPUUtilization
ComparisonOperator: GreaterThanThreshold
Statistic: Average
Period: 300
EvaluationPeriods: 1
Threshold: 80
AlarmActions:
- 'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'
CPUUtilizationHighAlarm:
Type: 'AWS::CloudWatch::Alarm'
Properties:
@@ -265,9 +291,9 @@ Resources:
MetricName: CPUUtilization
ComparisonOperator: GreaterThanThreshold
Statistic: Average
Period: 60
Period: 300
EvaluationPeriods: 1
Threshold: 80
Threshold: 60
AlarmActions:
- !Ref ScaleUpPolicy
CPUUtilizationLowAlarm:
@@ -284,9 +310,9 @@ Resources:
MetricName: CPUUtilization
ComparisonOperator: LessThanThreshold
Statistic: Average
Period: 60
EvaluationPeriods: 1
Threshold: 20
Period: 300
EvaluationPeriods: 3
Threshold: 30
AlarmActions:
- !Ref ScaleDownPolicy
Outputs:
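
Every new alarm in this commit sends its notifications to `Fn::ImportValue` of `${ParentAlertStack}-TopicARN`, so the parent alert stack has to export a value under exactly that name. The `operations/alert.yaml` template itself is not shown in this part of the diff. The following is only a minimal sketch of a stack that would satisfy the import, assuming an SNS topic with a single email subscription. The `Email` parameter and the `Topic` logical ID are hypothetical names chosen for illustration and may differ from the real template.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Hypothetical minimal alert stack (sketch only, not the actual operations/alert.yaml)'
Parameters:
  Email:
    # Assumed parameter: address that receives alarm notifications
    Description: 'Email address that will receive alerts.'
    Type: String
Resources:
  Topic:
    Type: 'AWS::SNS::Topic'
    Properties:
      Subscription:
      - Endpoint: !Ref Email
        Protocol: email
Outputs:
  TopicARN:
    Description: 'ARN of the alert topic.'
    # !Ref on an AWS::SNS::Topic returns the topic ARN
    Value: !Ref Topic
    Export:
      # Must match the name imported by the alarms: '${ParentAlertStack}-TopicARN'
      Name: !Sub '${AWS::StackName}-TopicARN'
```

With a stack like this deployed, passing its stack name as `ParentAlertStack` makes the `HasAlertTopic` condition true and the alarms are created. Leaving the parameter at its default of `''` skips them.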