-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add container runtime #5572
Add container runtime #5572
Conversation
cc @vmarmol |
Container runtime interface looks good to me, and i am ok to iterate later with image handling operations. There are compiling issues in second commit. |
Oh sorry, I updated the Also rebased on master, replace BoundPod with Pod |
1dd6453
to
26f061d
Compare
// ListPods returns a list of Pod. The boolean parameter specifies whether | ||
// the runtime returns all pods including those already exited and dead pods(used | ||
// for garbage collection). | ||
ListPods(all bool) ([]*api.Pod, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out how you are planning to use ListPods in container garbage collection. Thinking about a scenario, a given pod with restart policy always has two containers: A and B. A is in crashloop, and per container cap is 5. How are you going to enforce the limit at the higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchen1107 Good question! In the rocket world, I think we won't need to garbage collect each container, because it will be taken care by the PID 1 in the pod, so we only need to GC in the level of pods.
But in the docker scenario, I think maybe we can add an ID
field in api.Container, so that we can get all A1 A2 A3 in the pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchen1107 Besides ID
, there is another thing we need to store -- hash
. Currently I did a hack which stores it on pod.Annotations..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also usually need to keep around a dead container for its logs and to find out how it ended (idk how Rocket handles that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to enforce per container limit, and I like the idea to push down those logic to runtime library including docker.
But how to enforce per node limit? The node might end up with tons of pods which in Running phase, but a lot of crashing containers. When exceeding per node limit, kubelet have to recycle containers more aggressively without respecting per container limit.
Also I don't think I understand your suggestion on ID field in api.Container? How is it going to help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I think I can just use the container statuses in the pod?
api.Pod.Status.Info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
api.Pod.Status.Info includes per container detail status, but one entry per container name. I plan to extend it to include last terminated information for each container. But still it doesn't have rest terminated containers' information.
I do think we might need a container type in runtime abstraction, but open to other suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think define container and pod in the container runtime here can be more clear. And actually there are lot of things in api.Container we don't actually use here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How bad to have ListContainersInPod(string containerName, pod_uid string, ...) which returns all containers running or dead grouped by a pod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not bad at all. But what could be the type of the returning containers? Since we want the container id (for killing) and hash (for comparison) in syncPod
, we will need to store them somewhere.
At least we need to return map[id]api.Container
, but we still need the hash, so maybe map[fullName]api.Container
. But embedding hash in the name is not a good idea IMO. So I am still thinking about creating Container type in runtime, which can has any fields we want: id, hash...
Any suggestions?
// running containers, or mixed with dead ones (when ListPods(true)). | ||
Containers []*Container | ||
// The status of the pod. | ||
Status api.PodStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only thing that worries me with Status in Pod is that for Docker we'll need to inspect every container after we list. This will be a bit expensive. Lets go with this and see how it goes. The abstraction does make it much nicer to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vmarmol Yes, I have thought about that. I think it might make more sense to let each syncPod() routine inspect its containers in the pod. Let me add a TODO to revisit this here, thank you!
Over all I really like the simplicity of the API :) today's API is a mess of stuff all put together. |
8e4a20b
to
8d6c144
Compare
// ListPods returns a list of Pod. The boolean parameter specifies whether | ||
// the runtime returns all pods including those already exited and dead pods(used | ||
// for garbage collection). | ||
ListPods(all bool) ([]*Pod, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Sorry to be a pain, but can we rename this to GetPods()? We seem to be standardizing on Get instead of List with the functions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
3dcb27b
to
8c9a875
Compare
LGTM overall. There are a couple of things missing, but we can iterate them later. Meanwhile I will also take a look on how to refactory existing dockertool package to support this API. @vmarmol Any more comments? I will merge it on green. |
@dchen1107 Sure, we might need to modify the interfaces while refactoring the dockertool. I have tried to do a GetPods(), I think that is OK. Tricky parts would be hide pod infra container in RunContainerInPod() with efficiency. |
I had a PR just merged which hiding pod infra container completely in dockertool now. That package needs more refactory / cleanup. |
@dchen1107 Fantastic! Ugh, Shippable build is failing. But I cannot get the information. What is the script being run on shippable? |
NM, just found a way to run my branch on shippable |
I kicked out shippable a while back. Travis takes long. |
LGTM :) |
@dchen1107 @vmarmol Think that GetPodStatus might take pod.Spec as well to know the terminationMessagePath. But we can change that in the refactor PR. |
I think you need a rebase again. |
1e03e32
to
7888e94
Compare
a014b0a
to
13033d4
Compare
Shippable keeps failing me... Try rebase again. |
I just kicked out shippable for this PR. :-) |
Thanks for the PR. We can start another round of refactory and cleanup. |
Split the PR of #5518.
This is an attempt to introduce container runtime interface.
Some pieces are missing, (image pulling part, which I will try to fill out in the next couple of days)
The runtime cache is just copy-pasted from the docker cache, (I think we can refactor that piece later).
The interface fits rocket's model well, though by now we cannot handle
RunContainerInPod
/KillContainerInPod
efficiently. But I think there is a plan to get that point soon.For the docker part, I am now trying to hide the
docker.GetKubeletDockerContainers
withListPods
. And will try to see if we can hide other docker code in the kubelet with the interface.Feedback welcome! Thank you!
@dchen1107