
Support spotFleet for instances groups #1784

Closed
ese opened this issue Feb 4, 2017 · 28 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Milestone

Comments

@ese (Contributor) commented Feb 4, 2017

Spot fleets allow using a wide range of instance types to provide a certain amount of resources by requesting spot instances. This is great for providing capacity for workloads that require large amounts of resources, don't require real-time interaction, and are able to recover if a node disappears.
kubernetes/kubernetes#24472
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html
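As a rough illustration of the request model described above (a sketch, not part of this issue: the role ARN, AMI, and subnet IDs below are placeholder values), a spot fleet request with the AWS CLI looks roughly like:

```shell
# Minimal sketch of a diversified spot fleet request via the AWS CLI.
# All identifiers below (role ARN, AMI, subnet) are placeholders.
cat > spot-fleet-config.json <<'EOF'
{
  "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
  "TargetCapacity": 3,
  "AllocationStrategy": "diversified",
  "LaunchSpecifications": [
    {"ImageId": "ami-00000000", "InstanceType": "m4.large",  "SubnetId": "subnet-00000000"},
    {"ImageId": "ami-00000000", "InstanceType": "c4.xlarge", "SubnetId": "subnet-00000000"}
  ]
}
EOF
# Sanity-check the JSON before submitting the request.
python3 -m json.tool spot-fleet-config.json > /dev/null && echo "config OK"
# aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet-config.json
```

With `AllocationStrategy: diversified`, the fleet spreads the target capacity across the listed instance types rather than always picking the cheapest one.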

@ese ese changed the title Support spot fleet fot instances groups Support spotFleet fot instances groups Feb 4, 2017
@ese ese changed the title Support spotFleet fot instances groups Support spotFleet for instances groups Feb 4, 2017
@justinsb justinsb added this to the 1.5.1 milestone Feb 5, 2017
@justinsb (Member) commented Feb 5, 2017

I agree this would be a great feature. There were some limitations around tagging of instances previously which made this non-trivial. An interesting option (IMO) is to have the autoscaler be "spot fleet" aware, either by directly using spot fleet or by reimplementing it.

The advantage of reimplementing spot fleet is that the autoscaler might better know what resources are required, for example whether we need CPU, memory, or GPUs. Also, it would be cloud-agnostic.

@jpcope commented May 11, 2017

The user mumoshu in the public kubernetes-incubator/kube-aws project has created an implementation that works around the tagging limitations: kubernetes-retired/kube-aws#112

As an individual interested in running kubernetes in a production context, the target-capacity limitation of CloudFormation with the aforementioned implementation, documented at https://github.com/kubernetes-incubator/kube-aws/blob/master/Documentation/kubernetes-on-aws-node-pool.md#known-limitations, is troubling.

> Running kube-aws update to increase or decrease targetCapacity of a spot fleet results in a complete replacement of the Spot Fleet, hence some downtime. This is due to how CloudFormation works for updating a Spot Fleet.

I do believe that a re-implementation of spot fleet would be beneficial, especially if it means I can mix node lifecycle types and define pods that are mission-critical (never use spot) vs. highly available (always have at least X pods on non-spot nodes; scale out using spot) vs. interruption-tolerant (only use spot).

In a production context, zone coverage of application pods is also important. Ideally this should be adjustable so you can get the biggest bang for your buck if you are not running in production.

Also, draining pods when a termination is signaled (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#spot-instance-termination-notices) would make interruptions less visible to end users of services running in the cluster.
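The drain-on-termination idea can be sketched as a small script run on each node. This is a sketch under assumptions: the metadata endpoint is the one from the AWS doc linked above, while the node-name convention and kubectl access are hypothetical and cluster-specific.

```shell
# Sketch: poll the EC2 instance metadata for a spot termination notice and
# drain the node when one appears. Functions only; the loop at the bottom
# is what a systemd unit or background process on the node would run.
TERMINATION_URL="http://169.254.169.254/latest/meta-data/spot/termination-time"

check_termination_notice() {
  # Returns 0 (success) once a termination time has been published.
  curl -sf -m 2 "${1:-$TERMINATION_URL}" > /dev/null
}

drain_self() {
  # Hypothetical node-name convention; adjust for your cluster.
  local node="$(hostname -s).ec2.internal"
  kubectl drain "$node" --ignore-daemonsets --force
}

# Example loop (commented out so sourcing this file has no side effects):
# while ! check_termination_notice; do sleep 5; done
# drain_self
```

AWS publishes the termination time roughly two minutes before the instance is reclaimed, so the poll interval needs to leave enough headroom for the drain to finish.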

@bcorijn (Contributor) commented Aug 2, 2017

As mentioned in kubernetes/kubernetes#24472, AWS now supports propagating tags on Spot Fleets!

@Globegitter (Contributor)

Would this then also allow supporting https://github.com/cristim/autospotting? Or would it just be a matter of that tool setting the right tag on each instance it attaches to the autoscaling group?

@ajohnstone (Contributor)

@justinsb (Member)

So spot fleet actually works with kubernetes, now that we have tagging support. Changing a spot fleet configuration though is different from how ASGs work, so we need another approach.

I'm hoping we can look at this in the 1.10 timeframe, probably using the machines API.

@justinsb justinsb modified the milestones: 1.5.2, 1.9 Nov 18, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 16, 2018
@cdenneen commented Feb 17, 2018

/remove-lifecycle stale

@justinsb are there any updated docs on how to configure kops to do this?

@chrislovecnm (Contributor)

@cdenneen this needs to be implemented; we only support spot instances via the ASG API calls.

/lifecycle frozen
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 17, 2018
@chrislovecnm chrislovecnm added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Feb 17, 2018
@chrislovecnm (Contributor)

I am marking this with the help wanted label since it is a bit of a project: not a great place to start, but a great feature to add.

@itskingori (Member)

@justinsb I have two comments about #1784 (comment) ...

> So spot fleet actually works with kubernetes, now that we have tagging support. Changing a spot fleet configuration though is different from how ASGs work, so we need another approach.

  1. We now have EC2 Fleets which supersede Spot Fleets. I'm not sure if we should keep this open and maybe change the title ... or close it and open another about EC2 Fleets.
  2. At the bottom of that post, they mention that they are planning on connecting EC2 Fleet and EC2 Auto Scaling groups ... IMO, this would significantly affect (simplify) the design/implementation.

@chrislovecnm And on #1784 (comment) ...

> I am marking with help wanted label since this is a bit of a project, not a great place to start, but a great feature to add.

I'd posted on Slack that I'm looking to contribute by tackling this, but based on my research I'm convinced that the solution is not trivial, a bit premature to tackle, and probably best addressed at the autoscaler level. And if AWS indeed connects EC2 Fleets with ASGs, then even the autoscaler won't need to reinvent the wheel. I guess the question is when! 😅


FYI, there are two issues on autoscaler about fleets. There's kubernetes/autoscaler#519 requesting support for Spot Fleets and kubernetes/autoscaler#838 requesting support for EC2 Fleets.

@justinsb justinsb modified the milestones: 1.9.0, 1.10 May 26, 2018
@cjbottaro

Is there any documentation on how to manually make your own instance groups?

In EKS, I made my own spot fleet, but that is because the userdata for nodes joining an EKS cluster is pretty simple and I could easily change it up for my needs.

The userdata for nodes launched with kops is a bit more involved... :/

@cdenneen commented Aug 8, 2018

@cjbottaro how did you make your own spot fleet for EKS? I'm interested in how you implemented this and the benefits you see in doing so. Thanks

@cjbottaro

> the benefits

We currently run in ECS and we have two ASGs (using launch configs) populating our cluster: one for on-demand instances and one for spot. More than a few times, we've seen spot prices spike and thus wipe out 3/4 of our cluster, causing an outage before we realize what's going on and up our on-demand instances to take over.

Spot fleets (which require launch templates instead of launch configs) let you specify multiple machine types. So if prices spike for one type and those instances get wiped out, the fleet will automatically fulfill capacity with other machine types you can afford. Furthermore, you can tell the spot fleet to diversify across machine types and availability zones, so there is less chance of mass extinction.
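The multi-machine-type diversification described above can be expressed with launch template overrides in the fleet request config. A sketch with placeholder names and IDs (the launch template name, role ARN, and subnets below are not taken from this thread):

```shell
# Sketch: a spot fleet request that diversifies one launch template across
# several instance types and availability zones (one subnet per AZ).
# All names and IDs below are placeholders.
cat > diversified-fleet.json <<'EOF'
{
  "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
  "TargetCapacity": 6,
  "AllocationStrategy": "diversified",
  "LaunchTemplateConfigs": [
    {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "k8s-node-template",
        "Version": "1"
      },
      "Overrides": [
        {"InstanceType": "m4.large",  "SubnetId": "subnet-aaaa0000"},
        {"InstanceType": "m5.large",  "SubnetId": "subnet-bbbb0000"},
        {"InstanceType": "c5.xlarge", "SubnetId": "subnet-cccc0000"}
      ]
    }
  ]
}
EOF
# Sanity-check the JSON before submitting the request.
python3 -m json.tool diversified-fleet.json > /dev/null && echo "config OK"
# aws ec2 request-spot-fleet --spot-fleet-request-config file://diversified-fleet.json
```

Each override is a candidate pool; with the diversified strategy the fleet spreads capacity across pools, so losing one instance type or zone only takes out a fraction of the nodes.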

How in EKS

I followed the tutorial to get an EKS cluster up and running, which includes creating a launch config and ASG for nodes. I simply converted the launch config to a launch template (by hand, in the AWS web console), copy/pasted the userdata which configures and starts kubelet, and then made a spot fleet using that launch template. The userdata for EKS nodes is pretty small and easy to understand. To add taints and labels, you can simply append something like this to the end of the userdata:

```shell
# Wait until the node has registered, then label and taint it.
NODE_NAME="$(hostname -s).ec2.internal"
export KUBECONFIG=/var/lib/kubelet/kubeconfig
until kubectl get node "$NODE_NAME"; do sleep 1; done
kubectl label node "$NODE_NAME" lifecycle=od
kubectl taint node "$NODE_NAME" dedicated=datastores:NoExecute
```

Issue with kops

Kops has these first-class instancegroup objects, and it seems like the userdata for nodes is the output of a template plus the instancegroup manifest. The problem is that a lot of values from the manifest end up in the userdata... too many for a simple copy/paste of the resulting userdata not to be error-prone.

It would be nice if there was a command to output the user data for a given instancegroup:

kops export userdata my_instance_group

So we can create our own launch configs / launch templates.

Thanks!

@rbtcollins (Contributor)

I think this also needs some proper k8s core integration - I've filed kubernetes/kubernetes#70342 about that.

@gambol99 (Contributor)

related #6277

@disha1104

@gambol99, I checked the PR and it looks great.
Does it mean we'll be able to use launch templates and smart ASGs with kops, as kops currently only supports launch configurations?

@gambol99 (Contributor)

@disha1104 ...

> Does it mean, we'll be able to use launch templates and smart ASG's with Kops?

yep :-) ..

@disha1104

That's really great.
Is this planned for the next release? By when can I expect this feature?
Also, is the launch-configuration-to-launch-template conversion handled? And what about spot-termination handling, in terms of the reliability of the nodes?

@oded-dd commented Feb 11, 2019

This is amazing. I was thinking of using Spotinst, but once this is deployed I would rather use it.

When is it planned to be merged?

@gambol99 (Contributor)

@disha1104

> Is this planned for next-release, like by when can I expect this feature?

So I don't cut the releases ... you'd need to speak to @justinsb

> Also, is the launch-configuration conversion to launch-template handled?

The launch-configuration -> launch-template conversion is done automatically (though admittedly it leaves the LC hanging, so it just needs to be deleted manually post-update).

> And also the Spot-termination handler thing, like in terms of reliability of the nodes?

So this doesn't add a spot-termination-handler addon, as I'd rather let users choose how they want it done.

@disha1104

@justinsb which release is this planned for?

@wanghanlin (Contributor) commented Feb 25, 2019

#6277 is merged, I think we can close this one 🎉

@k8s-ci-robot (Contributor)

@wanghanlin: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

> #6277 is merged, I think we can close this one 🎉
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@justinsb justinsb modified the milestones: 1.10, 1.12 Mar 14, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 12, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
