Kernel panic when having a privileged container with docker >= 1.10 #27885
Comments
The Docker issue you pointed to describes a different kernel panic: the one reported in that thread is a NULL pointer dereference with a different stack trace. We need to check whether your panic has been reported before.
|
It seems that @jfrazelle encountered both issues before. |
Ah, that's a bad kernel, I remember that. I think there is a minor release update for it in Ubuntu that is much, much better. |
@rata Thanks for reporting the issue. @Random-Liu and I looked at the initial docker issue, and it looks like there are several kernel panics; both docker 1.10.X and docker 1.11.X on various kernel versions are affected. So far I haven't observed the same failure in our Jenkins tests, so it could be that we paper over the issue somehow. In any case, we should make the problem visible to end users first, and help with the debugging and the fix since it might affect our Kubernetes 1.3 users. Here is the plan I am thinking of:
|
@dchen1107 SGTM! :) |
@girishkalele ohh, sorry. I was in a hurry and they looked similar; I didn't have time to check in detail. @dchen1107: thanks! Is there any way to have some confidence that upgrading to k8s 1.3 won't cause many issues with nodes crashing because of this? I mean, when thinking about upgrading my production cluster to 1.3, I may need to create a new 1.3 cluster, run things there for a few weeks (only to test k8s) and then maybe upgrade? There is no downgrade procedure, right? Also, just curious: is it a problem if docker 1.9 continues to be used, or does 1.3 use some features that require docker > 1.9? Just to know if that is an option too, until the problem is better understood. Maybe the bug is caused by the storage driver in use (and only affects that storage driver). My container was using debian:jessie with docker installed from docker's apt repositories and the daemon simply started. I'm on a mobile connection right now, so I can't check the driver easily; I can check it in a few hours (about 6) when I'm home again. |
1.9 should still be supported.
I think it's aufs based on the kernel log and how you installed docker. :) |
Yes, 1.9.1 is still compatible with the Kubernetes 1.3 release. |
@Random-Liu @dchen1107: awesome, thanks! I'll try using another storage driver and report back if I hit it or not :-) |
It seems kubernetes 1.2.4 in AWS uses docker with AUFS:
Is this the case on GKE and GCE too? I'll check what the default storage driver is in k8s 1.3. |
@rata, Kubernetes today supports 3 different storage drivers: aufs, overlayfs, and devicemapper. On both GKE and GCE, Kubernetes uses aufs. We are switching to overlayfs through a new containervm image (gci), but that process has just started. |
@dchen1107: thanks for the info. It seems difficult for me to use another storage driver: the kube-up setup on AWS uses aufs, and as nodes crash they are recreated with aufs, so it is not easy to use another driver without modifying the Auto Scaling Group. |
@rata @dchen1107 @girishkalele |
I am confused about what is going on here. During the release burndown @mike-saparov mentioned that we are considering recommending Docker v1.9 for k8s v1.3 because of this kernel bug. However, it seems more reasonable to document it and ask the distros to patch their kernels. Can someone give an update on the current thinking for the release? |
@philips: Maybe trying to reproduce it helps you get a better idea? That's the only thing I can add; the rest of this message is mostly about that, so feel free to ignore :-) I can easily reproduce this using a pod with two containers (a rough sketch follows below): a) a privileged container running debian jessie with docker 1.11.2 or 1.10.3 from the docker repos (it happens with both), and b) the docker-gc branch "fixes" from https://github.com/rata/docker-gc (actually, the repo at work has a small script that sleeps and runs docker-gc in an infinite loop, and that is what runs). If I instead use a pod with only one container, with docker >= 1.10 installed on debian jessie listening as a daemon and used over the network to build docker images (just like the two-container pod, but without docker-gc the cache is not deleted), it still crashes after a few days; with docker-gc it crashes much faster. I can upload the Dockerfiles and yamls used if someone wants them. I'm not sure whether this bug has been fixed upstream, whether @jfrazelle, who also saw this, knows a workaround, or whether the tests and other people are using newer docker versions without issues. Maybe the bug is related to something docker-gc does and is unlikely to happen otherwise, but to the best of my knowledge it is not known. Also, kubernetes deletes docker images when there is not enough free space; I'm not sure if that (or something else kubernetes might do that I don't know about) makes it more likely to happen. I won't have time to try to fix the kernel bug (or try newer kernels and see whether it goes away) these days, but I'm happy to help someone reproduce it, upload the dockerfiles and deployments I use, etc. |
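For reference, a minimal sketch of what such a reproduction pod might look like. The image names, the docker-gc wrapper, and the shared socket path are assumptions; the actual Dockerfiles and yamls were not posted in this thread.

```yaml
# Hypothetical reproduction pod: a privileged docker-in-docker builder plus a
# docker-gc sidecar that repeatedly prunes images. All names and images are
# placeholders, not the ones used by the reporter.
apiVersion: v1
kind: Pod
metadata:
  name: dind-builder-repro
spec:
  containers:
  - name: docker-daemon
    image: my-registry/jessie-docker:1.10.3   # debian:jessie with docker installed from docker's apt repo
    securityContext:
      privileged: true                        # required to run a nested docker daemon
    volumeMounts:
    - name: docker-graph
      mountPath: /var/lib/docker              # the nested daemon's image/layer storage
    - name: docker-socket
      mountPath: /var/run
  - name: docker-gc
    image: my-registry/docker-gc:latest       # wrapper that loops: sleep, then run docker-gc
    volumeMounts:
    - name: docker-socket
      mountPath: /var/run                     # talk to the sibling daemon through its socket
  volumes:
  - name: docker-graph
    emptyDir: {}
  - name: docker-socket
    emptyDir: {}
```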
@philips We haven't recommended Docker 1.9 for k8s v1.3 yet, and what we discussed at the burndown meeting has nothing to do with this issue. For this one, we plan to document it and suggest that users of the node-problem-detector upgrade their detector so that the kernel issues are visible to end users; that way users can also understand why their applications are being restarted or their nodes rebooted. At the burndown meeting, we talked about the 1.3 blocker issue #27691. The engineers suspected the issue is either in a Kubernetes component (we changed the entire code path for 1.3) or in the docker runtime code. To narrow it down, we decided to run some tests against docker 1.9.1 and the kubernetes 1.3 beta. |
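For context, node-problem-detector is typically run as a DaemonSet that watches the kernel log and surfaces problems as node conditions and events. A minimal sketch is below; the API version, image tag, and host paths are assumptions and would need to match the cluster's actual setup.

```yaml
# Hypothetical node-problem-detector DaemonSet; image tag and mounts are placeholders.
apiVersion: extensions/v1beta1        # DaemonSet API group used by Kubernetes 1.3-era clusters
kind: DaemonSet
metadata:
  name: node-problem-detector
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: node-problem-detector
    spec:
      containers:
      - name: node-problem-detector
        image: gcr.io/google_containers/node-problem-detector:v0.1   # placeholder tag
        securityContext:
          privileged: true            # needed to read the host's kernel log
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log              # where kern.log lives on Debian/Ubuntu node images
```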
@dchen1107 thanks for clarification! |
XREF #27076 |
this error still happens with Docker 1.12 btw. |
Just in case it's useful to someone, I worked around this by writing to an external volume (sketched below). The pod that builds docker images now uses an EBS volume mounted at /var/lib/docker, and this issue has never happened again (so far, at least). That makes sense, as it seemed to be an aufs-related issue, and aufs is no longer used when writing the docker images. |
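A rough sketch of that workaround, assuming a pre-created EBS volume; the pod name, image, and volume ID are placeholders, since the actual spec was not posted.

```yaml
# Hypothetical builder pod with docker's storage directory on a dedicated EBS
# volume instead of the node's aufs-backed container filesystem.
apiVersion: v1
kind: Pod
metadata:
  name: image-builder
spec:
  containers:
  - name: docker-daemon
    image: my-registry/jessie-docker:1.10.3   # placeholder image
    securityContext:
      privileged: true
    volumeMounts:
    - name: docker-graph
      mountPath: /var/lib/docker              # nested docker now writes to the EBS-backed ext4 volume
  volumes:
  - name: docker-graph
    awsElasticBlockStore:
      volumeID: vol-0123456789abcdef0         # placeholder EBS volume ID
      fsType: ext4
```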
We downgraded kubernetes to 1.2.6 but kept using docker 1.12, and the problem disappeared, so it's a kubernetes 1.3 issue. |
On Fri, Aug 05, 2016 at 04:15:11AM -0700, Nugroho Herucahyono wrote:
A kernel bug seems more like a kernel issue :) What kernel version are you on? Can you upgrade your kernel and see if the issue persists? |
the same error happened:
server info: |
Anyone still observing this behavior? |
@cmluciano With the workaround I posted it doesn't happen, and it seems that with newer kernels it also doesn't happen. Are you seeing it? Which k8s, docker, and kernel versions? |
I have not, wondering if this issue should be closed |
@cmluciano oh, good point. Will close it, it can be reopened if relevant. Thanks! |
Hi,
I'm using a privileged container in a kubernetes pod to build images. The container runs docker 1.10.3. I'm using kubernetes 1.2.4 on AWS (set up with kube-up).
From time to time, a node crashes. The output of the last crash is at the end.
It seems this is the bug reported here, and it may be related to using docker >= 1.10 on the debian jessie kernel (although that is not confirmed): moby/moby#21081
If this is the case, THIS PROBABLY AFFECTS kubernetes 1.3, which is due to be released.
cc @justinsb