GCE PD Data Loss #3965

Closed
@smreed

Description

I'm running 0.9.1 (edit: I originally thought it was 0.9.2) on GCE, and I'm seeing data disappear from GCE PDs after deleting the pod that mounts them.

I'm attaching some relevant-looking bits from the kubelet log.

I0130 01:00:21.156131    4820 kubelet.go:1255] Killing unwanted container {podFullName:prometheus.default.api uid:6fcd7fc4-a7ff-11e4-85c6-42010af0b3db containerName:prometheus}
I0130 01:00:21.156169    4820 kubelet.go:863] Killing container with id "1f85b85dc825b218f9273869678d09b7f6c3a93307ac2b5edb9255f5a406dfb3" and name "/k8s_prometheus.95c0e9de_prometheus.default.api_6fcd7fc4-a7ff-11e4-85c6-42010af0b3db_0f4a949a"
I0130 01:00:21.156331    4820 event.go:117] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"prometheus", UID:"6fcd7fc4-a7ff-11e4-85c6-42010af0b3db", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"spec.containers{node-exporter}"}): reason: 'killing' Killing b4bfcd6c3de50df86f1e6dfb5ddbcba94fb731c07375f849e907d685bde1c142 - /k8s_node-exporter.562dc0b8_prometheus.default.api_6fcd7fc4-a7ff-11e4-85c6-42010af0b3db_6354e686
I0130 01:00:21.322846    4820 event.go:117] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"prometheus", UID:"6fcd7fc4-a7ff-11e4-85c6-42010af0b3db", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"spec.containers{prometheus}"}): reason: 'killing' Killing 1f85b85dc825b218f9273869678d09b7f6c3a93307ac2b5edb9255f5a406dfb3 - /k8s_prometheus.95c0e9de_prometheus.default.api_6fcd7fc4-a7ff-11e4-85c6-42010af0b3db_0f4a949a
E0130 01:00:21.354926    4820 kubelet.go:1345] Couldn't sync containers: remove /var/lib/kubelet/pods/6fcd7fc4-a7ff-11e4-85c6-42010af0b3db/volumes/kubernetes.io~gce-pd/prometheus-storage: device or resource busy
E0130 01:00:31.359134    4820 kubelet.go:1345] Couldn't sync containers: remove /var/lib/kubelet/pods/6fcd7fc4-a7ff-11e4-85c6-42010af0b3db/volumes/kubernetes.io~gce-pd/prometheus-storage: device or resource busy
E0130 01:00:41.363139    4820 kubelet.go:1345] Couldn't sync containers: remove /var/lib/kubelet/pods/6fcd7fc4-a7ff-11e4-85c6-42010af0b3db/volumes/kubernetes.io~gce-pd/prometheus-storage: device or resource busy
E0130 01:00:51.368149    4820 kubelet.go:1345] Couldn't sync containers: remove /var/lib/kubelet/pods/6fcd7fc4-a7ff-11e4-85c6-42010af0b3db/volumes/kubernetes.io~gce-pd/prometheus-storage: device or resource busy
E0130 01:01:01.372352    4820 kubelet.go:1345] Couldn't sync containers: remove /var/lib/kubelet/pods/6fcd7fc4-a7ff-11e4-85c6-42010af0b3db/volumes/kubernetes.io~gce-pd/prometheus-storage: device or resource busy

Around this point I SSHed to the minion, ran ls -l on the root of the PD, and found nothing there. There was definitely data on the device before I deleted the pod.

Also, after this happens, I have to manually detach the disk using gcloud or the GCE console and then umount all of the places where the PD was mounted on the minion.
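
For the manual cleanup above, it helps to list every place the PD is still mounted before running umount. The following is a minimal sketch rather than anything taken from the kubelet: the function name mount_points_for, the device node /dev/sdb, and the /mnt/example-pd mount point are all hypothetical, and the sample text only imitates the /proc/mounts layout (device, mount point, filesystem type, options). The pod volume path is the one from the log above.

```python
# Sample mount table in /proc/mounts layout. The device name /dev/sdb and
# the /mnt/example-pd entry are assumptions made for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb /mnt/example-pd ext4 rw,relatime 0 0
/dev/sdb /var/lib/kubelet/pods/6fcd7fc4-a7ff-11e4-85c6-42010af0b3db/volumes/kubernetes.io~gce-pd/prometheus-storage ext4 rw,relatime 0 0
"""


def mount_points_for(device, mounts_text):
    """Return, in file order, every mount point whose source is `device`."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the source device, field 1 is the mount point.
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


# Print each location that would need a umount for the hypothetical device.
for point in mount_points_for("/dev/sdb", SAMPLE_MOUNTS):
    print(point)
```

On a real minion the text would come from reading /proc/mounts instead of the sample string, and the device name would have to be checked against the attached disk first.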

Labels: kind/bug, priority/backlog
