Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet should cache image history #25057

Closed
yujuhong opened this issue May 2, 2016 · 2 comments
Closed

kubelet should cache image history #25057

yujuhong opened this issue May 2, 2016 · 2 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@yujuhong
Copy link
Contributor

yujuhong commented May 2, 2016

ref: #23595 (comment)

kubelet list images and get image history from each container runtime periodically. We should cache the history for the images, and only query the runtime when necessary (new images).

@yujuhong yujuhong added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 2, 2016
@Random-Liu
Copy link
Member

Random-Liu commented May 27, 2016

From http://perf-dash.k8s.io/, @yujuhong and I observed that the runtime cpu usage increased a lot (0.35 cores -> 0.50 cores) after #25784 got merged.

The reason is that we have more images prepulled on each node, and the image history operation consumes much more resources.

From docker log, we could see every 10 seconds, there will be a bunch of image history inspection:

time="2016-05-25T15:40:32.026282119Z" level=info msg="GET /images/e0f821df2a220f64d2f6722dd9ebce4b60c50497622cd3189080f4dfd93fde9d/history" 
time="2016-05-25T15:40:32.027489324Z" level=info msg="GET /images/b2091bb33512a946fb6c81ae8e139640e84f20f165557dc5926a6cb73a9dfa43/history" 
time="2016-05-25T15:40:32.028450291Z" level=info msg="GET /images/4ec16685914fbb54f5c6d924c913bb5aca4edc7326f799d622db9fa50a403311/history" 
time="2016-05-25T15:40:32.029884964Z" level=info msg="GET /images/3f8217a3a8f1e891612aece9cbf8b8defeb1f1ffa39836ebb7de5e03139f56a7/history" 
time="2016-05-25T15:40:32.032075931Z" level=info msg="GET /images/5e39f07038295497ff06205bd9649e1fb3bb8e46baea8a69f185e1bc9141d9b4/history" 
time="2016-05-25T15:40:32.033525411Z" level=info msg="GET /images/fd115996926267fd4f08734f50aa556c99f19f683b03b977aa0ba22f3a3184e3/history" 
time="2016-05-25T15:40:32.034735627Z" level=info msg="GET /images/2f92faace8498fab0cc8c7b28f5c943c0540e108caf8ce41e05ab4162233d9f6/history" 
time="2016-05-25T15:40:32.036438076Z" level=info msg="GET /images/e26b3574db839ba907a7d9efb12265e128dbe35e671d3a63189b04dee7748b23/history" 
time="2016-05-25T15:40:32.037434341Z" level=info msg="GET /images/7948654508f0c0a9d43e45ccdc88a920717bd8d47bdd9d401ced76654aa0129b/history" 
time="2016-05-25T15:40:32.040446683Z" level=info msg="GET /images/c9d27dda963c3dbdcfbadccd3ef4b69155c1c72fa8c7964a8bcb7f93ee9abb6e/history" 
time="2016-05-25T15:40:32.041490576Z" level=info msg="GET /images/f926c435a94000574429abdddbf8cd2b3363ee8aff5a8590c1d26288f857114b/history" 
time="2016-05-25T15:40:32.043334034Z" level=info msg="GET /images/c119c36f298e4b0d8fde08f84e6fc8b610816fd684e78972f564931d5d571bdb/history" 
time="2016-05-25T15:40:32.044316075Z" level=info msg="GET /images/b175bcb790231169e232739bd2172bded9669c25104a9b723999c5f366ed7543/history" 
time="2016-05-25T15:40:32.045152871Z" level=info msg="GET /images/76c17b1c0712c1752e5deabc4837e5fa07ff705a062abdcb01c63455dfcbba2e/history" 
time="2016-05-25T15:40:32.047083206Z" level=info msg="GET /images/f268497d63d0301948891a34155a140cf956b07c0bfda1e2ca0fed2920431aa8/history" 
time="2016-05-25T15:40:32.048049361Z" level=info msg="GET /images/2bf7220faf78d974cc4aeabdacb259c20a54954078b33cbba3949bd8a475ad1b/history" 
time="2016-05-25T15:40:32.049137739Z" level=info msg="GET /images/4da9c39361ca001b9fea8dcca8d427bcaa521f6c84d349c28974975cd6da96d7/history" 
time="2016-05-25T15:40:32.051417961Z" level=info msg="GET /images/851b47c67382fe40ccfd24bcc25444992f796e88b7ffe4fcfef0b2de322425d2/history" 
time="2016-05-25T15:40:32.052385451Z" level=info msg="GET /images/451b3de376acbd981f8916e7ec00595f0d5c5d9baa0db8c722690a6a489d5677/history" 
time="2016-05-25T15:40:32.053749872Z" level=info msg="GET /images/cf20f838dc83ea34817be1c546825f25a65745658593650ebdb5049bc6386d11/history" 
time="2016-05-25T15:40:32.054537811Z" level=info msg="GET /images/8950680a606cf7a0b7916dbf4a435b35d28d75c705999847eddb5ed38eb53204/history" 
time="2016-05-25T15:40:32.055350941Z" level=info msg="GET /images/0bac9e8a78faf81cff38ba54baaf63c3818dce51c1ea24d3a516ab8c1ca70a72/history" 
time="2016-05-25T15:40:32.056456406Z" level=info msg="GET /images/b0399646762c78819414e113808b8bff9d30151f36e1ffba73fce190c1b7b960/history" 
time="2016-05-25T15:40:32.057320477Z" level=info msg="GET /images/4a7708194f7880f28498a7788b81770e230fae6de5d77bb848aecfefc90ae323/history" 
time="2016-05-25T15:40:32.059598328Z" level=info msg="GET /images/c832d89a011dae68d5f179bc6bb1024eae2f50aea4fc205c69fd4ba5da5aa85e/history" 
time="2016-05-25T15:40:32.062254938Z" level=info msg="GET /images/93e4a66f9fc5be431e8cbbd1d26692679e4bba57a701c0b81d6a18fc41613cab/history" 
time="2016-05-25T15:40:32.063909413Z" level=info msg="GET /images/5c8faa6d16fa088080ef69a997ae718732bda6416770c55f7c112b1ab3e5fb0b/history" 
time="2016-05-25T15:40:32.065862423Z" level=info msg="GET /images/f94ba9414e196599263898609a857a414fb9c7f201b3f8adc2599c15118ff048/history" 
time="2016-05-25T15:40:32.069331347Z" level=info msg="GET /images/9eb6fe14f2a4637e55183ca7bfc5b28f9b233008b5f718b983557ea9a8ed562b/history" 
time="2016-05-25T15:40:32.070359698Z" level=info msg="GET /images/4c86699a45349288ab91f45001c3a43e04bc87eab3f848c6bf45538fddf899ed/history" 
time="2016-05-25T15:40:32.071135204Z" level=info msg="GET /images/676c42d1b0b20b8f51f071258b502c5d45d5a9122ef98f3cdd77f59fcdd03436/history" 
time="2016-05-25T15:40:32.072157519Z" level=info msg="GET /images/3e004fdaffa93ed97f145f3d3aa817f7e29424a50bb615b9296a3a4f205cdb02/history" 
time="2016-05-25T15:40:32.073057381Z" level=info msg="GET /images/efb17dbf75379d08e56cc3686bc05355546ad93d83f9405d4a10f4c0eedc6018/history" 
time="2016-05-25T15:40:32.074556195Z" level=info msg="GET /images/e5f6c5a2b5646828f51e8e0d30a2987df7e8183ab2c3ed0ca19eaa03cc5db08c/history" 
time="2016-05-25T15:40:32.077393503Z" level=info msg="GET /images/3fc703149474a6c72d9c7f296417f49cf67be3788cbb4a91c821da5cf551d42c/history" 
time="2016-05-25T15:40:32.078596935Z" level=info msg="GET /images/8ecfc335293e1590eb35f0b64d21387a074c53b250208fd4b18f0c18bc270d30/history" 
time="2016-05-25T15:40:32.094003193Z" level=info msg="GET /images/a36f3a9b3692711e516317b1d6d012df6b6b390c92a415638d64a5b5f016de94/history" 
time="2016-05-25T15:40:32.095564872Z" level=info msg="GET /images/7332861f06d0f1487265e8c3d99bb8800df8e6770e11ad1fe5a7c580362c2a80/history" 
time="2016-05-25T15:40:32.097673811Z" level=info msg="GET /images/a9deb569fea0e569b9f8262b62e641a9e8e54c911ce798d3d5ac0922d6f27796/history" 
time="2016-05-25T15:40:32.098819152Z" level=info msg="GET /images/8d83bde9424777eb0ba69fc61fb7283fd0a51f6f9202dd558c9fbe44359a3bc6/history" 
time="2016-05-25T15:40:32.099769976Z" level=info msg="GET /images/028eeaf25269086b55474b7bb800744bf6f9e4d7f54db78b5ce20c5e24d02a47/history" 

After #23595 got merged, we've already observed the cpu usage increasing (0.30 cores -> 0.35 cores) (see #23595 (comment)).
Now we have 38 prepulled images on each node, the cpu usage increases more (0.35 cores -> 0.50 cores).

It means that this has eaten 0.20 cores with 38 images on the node and will consume even more if there are more images on the node. This is a significant performance regression, we should:

  • Cache the image history and avoid calling runtime each time.
  • Or extend the probe interval.

@yujuhong @vishh @timstclair @dchen1107 I think we should mark higher priority and finish this in 1.3. WDYT?
/cc @kubernetes/sig-node

@yujuhong
Copy link
Contributor Author

This almost doubles docker's cpu usage in some cases, as shown in the dashboard.
This also means that even if there are no containers running, nodes may still dedicate a significant amount just to check the image history. We should fix this.

k8s-github-robot pushed a commit that referenced this issue May 30, 2016
Automatic merge from submit-queue

Kubelet: Cache image history to eliminate the performance regression

Fix #25057.

The image history operation takes almost 50% of cpu usage in kubelet performance test. We should cache image history instead of getting it from runtime everytime.

This PR cached image history in imageStatsProvider and added unit test.

@yujuhong @vishh 
/cc @kubernetes/sig-node 

Mark v1.3 because this is a relatively significant performance regression.

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.
Projects
None yet
Development

No branches or pull requests

2 participants