
Unable to mount volume gcePersistentDisk with readOnly: true for multiple pods (timeout expired waiting for volumes to attach/mount) #31176

Closed
ceefour opened this issue Aug 22, 2016 · 13 comments
Assignees
Labels
area/controller-manager
lifecycle/rotten — Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node — Categorizes an issue or PR as relevant to SIG Node.
sig/storage — Categorizes an issue or PR as relevant to SIG Storage.

Comments

@ceefour

ceefour commented Aug 22, 2016

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

timeout expired waiting for volumes to attach/mount gcePersistentDisk readOnly


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

Bug report

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:45:16Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:38:31Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"
NAME="Debian GNU/Linux"
VERSION_ID="7"
VERSION="7 (wheezy)"
ID=debian
ANSI_COLOR="1;31"
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support/"
BUG_REPORT_URL="http://bugs.debian.org/"

What happened:

A gcePersistentDisk with readOnly: true fails to mount in more than one pod at a time.

What you expected to happen:

The disk should mount successfully in all pods, since GCE persistent disks support read-only attachment to multiple instances simultaneously.

How to reproduce it (as minimally and precisely as possible):

Create 3 RCs (with 1 replica each) that mount the same persistent disk as read only.
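The RC specs were not included in the report; a minimal sketch of one of the three controllers, assuming the pdName, image, and /etc/mongo-conf mount path shown in the kubectl describe output below (all other field values are illustrative), might look like:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-rc0
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: mongo-node0
    spec:
      containers:
      - name: mongo-node0
        image: mongo
        volumeMounts:
        - name: mongo-conf
          mountPath: /etc/mongo-conf
          readOnly: true
      volumes:
      # All three RCs reference the same pre-existing GCE PD, read-only.
      - name: mongo-conf
        gcePersistentDisk:
          pdName: mongo-conf
          fsType: ext4
          readOnly: true
```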

The first pod always succeeds mounting.

kubectl get po
NAME              READY     STATUS              RESTARTS   AGE
mongo-express     1/1       Running             0          2h
mongo-rc0-rpah5   1/1       Running             0          30m
mongo-rc1-xkcyy   1/1       Running             0          17m
mongo-rc2-aktox   0/1       ContainerCreating   0          30m
kubectl describe po mongo-rc0-rpah5
Name:           mongo-rc0-rpah5
Namespace:      default
Node:           gke-fatih-small-pool-59881027-k909/10.142.0.5
Start Time:     Tue, 23 Aug 2016 05:01:48 +0700
Labels:         instance=fatih0
                name=mongo-node0
Status:         Running
IP:             10.60.2.5
Controllers:    ReplicationController/mongo-rc0
Containers:
  mongo-node0:
    Container ID:       docker://2845af5d815ff62505284ee8ef22bb5be3fa7f276e00d9347c9ce0e726e45d2f
    Image:              mongo
    Image ID:           docker://sha256:af52553e1c34b3ec48a2e50cf73a1eed1fc6d2fd2b0d3d73d7397c8d6341551f
    Port:               27017/TCP
    Command:
      mongod
      --replSet
      bippo
      --storageEngine
      wiredTiger
      --keyFile
      /etc/mongo-conf/mongo.keyfile
    Requests:
      cpu:                      100m
    State:                      Running
      Started:                  Tue, 23 Aug 2016 05:02:06 +0700
    Ready:                      True
    Restart Count:              0
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  mongo-conf:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     mongo-conf
    FSType:     ext4
    Partition:  0
    ReadOnly:   true
  mongo-persistent-storage0:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     mongodb-disk0
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-i0lox:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-i0lox
QoS Tier:       Burstable
Events:
  FirstSeen  LastSeen  Count  From                                          SubobjectPath                 Type    Reason     Message
  ---------  --------  -----  ----                                          -------------                 ----    ------     -------
  33m        33m       1      {default-scheduler }                                                        Normal  Scheduled  Successfully assigned mongo-rc0-rpah5 to gke-fatih-small-pool-59881027-k909
  33m        33m       1      {kubelet gke-fatih-small-pool-59881027-k909}  spec.containers{mongo-node0}  Normal  Pulling    pulling image "mongo"
  33m        33m       1      {kubelet gke-fatih-small-pool-59881027-k909}  spec.containers{mongo-node0}  Normal  Pulled     Successfully pulled image "mongo"
  33m        33m       1      {kubelet gke-fatih-small-pool-59881027-k909}  spec.containers{mongo-node0}  Normal  Created    Created container with docker id 2845af5d815f
  33m        33m       1      {kubelet gke-fatih-small-pool-59881027-k909}  spec.containers{mongo-node0}  Normal  Started    Started container with docker id 2845af5d815f

The second pod is intermittent. The third pod (below) always fails.

kubectl describe po mongo-rc2-aktox
Name:           mongo-rc2-aktox
Namespace:      default
Node:           gke-fatih-small-pool-59881027-h46g/10.142.0.6
Start Time:     Tue, 23 Aug 2016 05:01:49 +0700
Labels:         instance=fatih2
                name=mongo-node2
Status:         Pending
IP:
Controllers:    ReplicationController/mongo-rc2
Containers:
  mongo-node2:
    Container ID:
    Image:              mongo
    Image ID:
    Port:               27017/TCP
    Command:
      mongod
      --replSet
      bippo
      --storageEngine
      wiredTiger
      --keyFile
      /etc/mongo-conf/mongo.keyfile
    Requests:
      cpu:                      100m
    State:                      Waiting
      Reason:                   ContainerCreating
    Ready:                      False
    Restart Count:              0
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  mongo-conf:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     mongo-conf
    FSType:     ext4
    Partition:  0
    ReadOnly:   true
  mongo-persistent-storage2:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     mongodb-disk2
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-i0lox:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-i0lox
QoS Tier:       Burstable
Events:
  FirstSeen  LastSeen  Count  From                                          SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                          -------------  ----     ------       -------
  30m        30m       1      {default-scheduler }                                         Normal   Scheduled    Successfully assigned mongo-rc2-aktox to gke-fatih-small-pool-59881027-h46g
  28m        1m        13     {kubelet gke-fatih-small-pool-59881027-h46g}                 Warning  FailedMount  Unable to mount volumes for pod "mongo-rc2-aktox_default(06ce97cd-68b4-11e6-b129-42010af0011e)": timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]
  28m        1m        13     {kubelet gke-fatih-small-pool-59881027-h46g}                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]

Anything else we need to know:

Relevant lines from /var/log/kubelet.log on the node:

ceefour@gke-fatih-small-pool-59881027-h46g:~$ tail -n100 /var/log/kubelet.log  | grep mongo-conf

E0822 22:30:30.157423    3439 kubelet.go:1932] Unable to mount volumes for pod "mongo-rc2-aktox_default(06ce97cd-68b4-11e6-b129-42010af0011e)": timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]; skipping pod
E0822 22:30:30.157465    3439 pod_workers.go:183] Error syncing pod 06ce97cd-68b4-11e6-b129-42010af0011e, skipping: timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]
I0822 22:31:05.117374    3439 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e")
E0822 22:31:05.120603    3439 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/gce-pd/mongo-conf\"" failed. No retries permitted until 2016-08-22 22:33:05.120584643 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e") is not yet attached according to node status.
E0822 22:32:45.156181    3439 kubelet.go:1932] Unable to mount volumes for pod "mongo-rc2-aktox_default(06ce97cd-68b4-11e6-b129-42010af0011e)": timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]; skipping pod
E0822 22:32:45.156226    3439 pod_workers.go:183] Error syncing pod 06ce97cd-68b4-11e6-b129-42010af0011e, skipping: timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]
I0822 22:33:05.156100    3439 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e")
E0822 22:33:05.159285    3439 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/gce-pd/mongo-conf\"" failed. No retries permitted until 2016-08-22 22:35:05.159269631 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e") is not yet attached according to node status.
E0822 22:34:58.157744    3439 kubelet.go:1932] Unable to mount volumes for pod "mongo-rc2-aktox_default(06ce97cd-68b4-11e6-b129-42010af0011e)": timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]; skipping pod
E0822 22:34:58.161069    3439 pod_workers.go:183] Error syncing pod 06ce97cd-68b4-11e6-b129-42010af0011e, skipping: timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]
I0822 22:35:05.188747    3439 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e")
E0822 22:35:05.191576    3439 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/gce-pd/mongo-conf\"" failed. No retries permitted until 2016-08-22 22:37:05.191557765 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e") is not yet attached according to node status.

Last 50 lines of /var/log/kubelet.log:

ceefour@gke-fatih-small-pool-59881027-h46g:~$ tail -n50 /var/log/kubelet.log
I0822 22:35:22.870417    3439 server.go:959] GET /healthz: (28.36µs) 200 [[curl/7.26.0] 127.0.0.1:47687]
I0822 22:35:32.239049    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/33f45369-687d-11e6-b129-42010af0011e-default-token-sb5ix" (spec.Name: "default-token-sb5ix") to pod "33f45369-687d-11e6-b129-42010af0011e" (UID: "33f45369-687d-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:35:32.242781    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/33f45369-687d-11e6-b129-42010af0011e-default-token-sb5ix" (spec.Name: "default-token-sb5ix") pod "33f45369-687d-11e6-b129-42010af0011e" (UID: "33f45369-687d-11e6-b129-42010af0011e").
I0822 22:35:32.883593    3439 server.go:959] GET /healthz: (42.657µs) 200 [[curl/7.26.0] 127.0.0.1:47694]
I0822 22:35:38.160222    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/ad21d245-689f-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") to pod "ad21d245-689f-11e6-b129-42010af0011e" (UID: "ad21d245-689f-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:35:38.163717    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/ad21d245-689f-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") pod "ad21d245-689f-11e6-b129-42010af0011e" (UID: "ad21d245-689f-11e6-b129-42010af0011e").
I0822 22:35:38.537711    3439 server.go:959] GET /healthz: (44.537µs) 200 [[Go-http-client/1.1] 127.0.0.1:47701]
I0822 22:35:39.660496    3439 container_manager_linux.go:614] Found 45 PIDs in root, 45 of them are not to be moved
I0822 22:35:42.896284    3439 server.go:959] GET /healthz: (38.658µs) 200 [[curl/7.26.0] 127.0.0.1:47705]
I0822 22:35:52.910178    3439 server.go:959] GET /healthz: (25.065µs) 200 [[curl/7.26.0] 127.0.0.1:47712]
I0822 22:36:02.922869    3439 server.go:959] GET /healthz: (24.963µs) 200 [[curl/7.26.0] 127.0.0.1:47717]
I0822 22:36:05.013580    3439 server.go:959] GET /stats/summary/: (6.953162ms) 200 [[Go-http-client/1.1] 10.60.2.3:57402]
I0822 22:36:12.935348    3439 server.go:959] GET /healthz: (27.682µs) 200 [[curl/7.26.0] 127.0.0.1:47724]
I0822 22:36:22.948215    3439 server.go:959] GET /healthz: (28.114µs) 200 [[curl/7.26.0] 127.0.0.1:47731]
I0822 22:36:32.961851    3439 server.go:959] GET /healthz: (44.817µs) 200 [[curl/7.26.0] 127.0.0.1:47738]
I0822 22:36:38.537762    3439 server.go:959] GET /healthz: (47.175µs) 200 [[Go-http-client/1.1] 127.0.0.1:47743]
I0822 22:36:39.176725    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/ad21d245-689f-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") to pod "ad21d245-689f-11e6-b129-42010af0011e" (UID: "ad21d245-689f-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:36:39.189075    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/ad21d245-689f-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") pod "ad21d245-689f-11e6-b129-42010af0011e" (UID: "ad21d245-689f-11e6-b129-42010af0011e").
I0822 22:36:39.663850    3439 container_manager_linux.go:614] Found 45 PIDs in root, 45 of them are not to be moved
I0822 22:36:42.974584    3439 server.go:959] GET /healthz: (40.231µs) 200 [[curl/7.26.0] 127.0.0.1:47746]
I0822 22:36:43.184638    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/33f45369-687d-11e6-b129-42010af0011e-default-token-sb5ix" (spec.Name: "default-token-sb5ix") to pod "33f45369-687d-11e6-b129-42010af0011e" (UID: "33f45369-687d-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:36:43.187674    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/33f45369-687d-11e6-b129-42010af0011e-default-token-sb5ix" (spec.Name: "default-token-sb5ix") pod "33f45369-687d-11e6-b129-42010af0011e" (UID: "33f45369-687d-11e6-b129-42010af0011e").
I0822 22:36:52.988636    3439 server.go:959] GET /healthz: (40.953µs) 200 [[curl/7.26.0] 127.0.0.1:47754]
I0822 22:37:03.003414    3439 server.go:959] GET /healthz: (42.733µs) 200 [[curl/7.26.0] 127.0.0.1:47759]
I0822 22:37:05.034193    3439 server.go:959] GET /stats/summary/: (6.947947ms) 200 [[Go-http-client/1.1] 10.60.2.3:57402]
I0822 22:37:05.233719    3439 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e")
E0822 22:37:05.236650    3439 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/gce-pd/mongo-conf\"" failed. No retries permitted until 2016-08-22 22:39:05.23662061 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e") is not yet attached according to node status.
E0822 22:37:09.156182    3439 kubelet.go:1932] Unable to mount volumes for pod "mongo-rc2-aktox_default(06ce97cd-68b4-11e6-b129-42010af0011e)": timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]; skipping pod
E0822 22:37:09.156242    3439 pod_workers.go:183] Error syncing pod 06ce97cd-68b4-11e6-b129-42010af0011e, skipping: timeout expired waiting for volumes to attach/mount for pod "mongo-rc2-aktox"/"default". list of unattached/unmounted volumes=[mongo-conf]
I0822 22:37:13.015933    3439 server.go:959] GET /healthz: (54.514µs) 200 [[curl/7.26.0] 127.0.0.1:47768]
I0822 22:37:23.028881    3439 server.go:959] GET /healthz: (43.187µs) 200 [[curl/7.26.0] 127.0.0.1:47775]
I0822 22:37:23.185043    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/06ce97cd-68b4-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") to pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:37:23.188665    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/06ce97cd-68b4-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e").
I0822 22:37:33.042085    3439 server.go:959] GET /healthz: (42.351µs) 200 [[curl/7.26.0] 127.0.0.1:47784]
I0822 22:37:38.537583    3439 server.go:959] GET /healthz: (40.577µs) 200 [[Go-http-client/1.1] 127.0.0.1:47789]
I0822 22:37:39.667127    3439 container_manager_linux.go:614] Found 45 PIDs in root, 45 of them are not to be moved
I0822 22:37:43.055076    3439 server.go:959] GET /healthz: (26.24µs) 200 [[curl/7.26.0] 127.0.0.1:47792]
I0822 22:37:53.067791    3439 server.go:959] GET /healthz: (26.387µs) 200 [[curl/7.26.0] 127.0.0.1:47800]
I0822 22:38:03.089970    3439 server.go:959] GET /healthz: (38.445µs) 200 [[curl/7.26.0] 127.0.0.1:47805]
I0822 22:38:05.013454    3439 server.go:959] GET /stats/summary/: (7.716171ms) 200 [[Go-http-client/1.1] 10.60.2.3:57402]
I0822 22:38:06.174617    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/ad21d245-689f-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") to pod "ad21d245-689f-11e6-b129-42010af0011e" (UID: "ad21d245-689f-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:38:06.177284    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/ad21d245-689f-11e6-b129-42010af0011e-default-token-i0lox" (spec.Name: "default-token-i0lox") pod "ad21d245-689f-11e6-b129-42010af0011e" (UID: "ad21d245-689f-11e6-b129-42010af0011e").
I0822 22:38:11.184431    3439 reconciler.go:254] MountVolume operation started for volume "kubernetes.io/secret/33f45369-687d-11e6-b129-42010af0011e-default-token-sb5ix" (spec.Name: "default-token-sb5ix") to pod "33f45369-687d-11e6-b129-42010af0011e" (UID: "33f45369-687d-11e6-b129-42010af0011e"). Volume is already mounted to pod, but remount was requested.
I0822 22:38:11.187573    3439 operation_executor.go:740] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/33f45369-687d-11e6-b129-42010af0011e-default-token-sb5ix" (spec.Name: "default-token-sb5ix") pod "33f45369-687d-11e6-b129-42010af0011e" (UID: "33f45369-687d-11e6-b129-42010af0011e").
I0822 22:38:13.101813    3439 server.go:959] GET /healthz: (35.796µs) 200 [[curl/7.26.0] 127.0.0.1:47814]
I0822 22:38:23.122472    3439 server.go:959] GET /healthz: (36.311µs) 200 [[curl/7.26.0] 127.0.0.1:47821]
I0822 22:38:33.136366    3439 server.go:959] GET /healthz: (40.312µs) 200 [[curl/7.26.0] 127.0.0.1:47828]
I0822 22:38:38.537838    3439 server.go:959] GET /healthz: (44.226µs) 200 [[Go-http-client/1.1] 127.0.0.1:47833]
I0822 22:38:39.670329    3439 container_manager_linux.go:614] Found 45 PIDs in root, 45 of them are not to be moved
I0822 22:38:43.157960    3439 server.go:959] GET /healthz: (34.945µs) 200 [[curl/7.26.0] 127.0.0.1:47836]
I0822 22:38:53.169820    3439 server.go:959] GET /healthz: (25µs) 200 [[curl/7.26.0] 127.0.0.1:47843]
I0822 22:39:03.181619    3439 server.go:959] GET /healthz: (29.069µs) 200 [[curl/7.26.0] 127.0.0.1:47849]
I0822 22:39:05.018297    3439 server.go:959] GET /stats/summary/: (6.900965ms) 200 [[Go-http-client/1.1] 10.60.2.3:57402]
I0822 22:39:05.318114    3439 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e")
E0822 22:39:05.321046    3439 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/gce-pd/mongo-conf\"" failed. No retries permitted until 2016-08-22 22:41:05.321027505 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/gce-pd/mongo-conf" (spec.Name: "mongo-conf") pod "06ce97cd-68b4-11e6-b129-42010af0011e" (UID: "06ce97cd-68b4-11e6-b129-42010af0011e") is not yet attached according to node status.
I0822 22:39:13.194669    3439 server.go:959] GET /healthz: (26.614µs) 200 [[curl/7.26.0] 127.0.0.1:47856]
@k8s-github-robot k8s-github-robot added area/controller-manager sig/node Categorizes an issue or PR as relevant to SIG Node. labels Aug 22, 2016
@jingxu97 jingxu97 added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 30, 2016
@jingxu97 jingxu97 self-assigned this Aug 30, 2016
@jingxu97
Contributor

@ceefour Could you please share the full log from when this error occurred, and also the spec for your RCs? Are you using a PVC/PV, or referencing the PD name directly? Is it possible that your PD mongo-conf is attached to some node in RW mode? Thanks!

@ceefour
Author

ceefour commented Aug 31, 2016

It's possible that this problem was masked because I also ran into #29358.
Unfortunately, I am no longer using or testing readOnly PDs.


@jingxu97
Contributor

@ceefour Thank you for your response. We recently fixed a few things in the storage-related code. If it's OK, could you please try a readOnly PD again and let us know if you still have any problems? Thanks!

@ceefour
Author

ceefour commented Nov 27, 2016

The situation improves on 1.4.6, but there is still a minor annoyance: a lag while it retries:

>kubectl get po
NAME                         READY     STATUS              RESTARTS   AGE
mongo-arb-3311699171-5yc0t   1/1       Running             0          3m
mongo0-375173819-7jtxa       0/1       ContainerCreating   0          3m
mongo1-1101050559-y1s10      1/1       Running             0          9m

>kubectl describe po mongo0-375173819-7jtxa
Name:           mongo0-375173819-7jtxa
Namespace:      default
Node:           gke-fatih-g1-d2fb3adc-nl4a/10.142.0.4
Start Time:     Sun, 27 Nov 2016 17:23:06 +0700
Labels:         instance=fatih0
                mongo-rs-name=bippo
                pod-template-hash=375173819
Status:         Pending
IP:
Controllers:    ReplicaSet/mongo0-375173819
Containers:
  mongo:
    Container ID:
    Image:              mongo
    Image ID:
    Port:               27017/TCP
    Command:
      /bin/sh
      -c
    Args:
      cp /etc/mongo-keyfile/mongo.keyfile /etc/mongo.keyfile && chmod 600 /etc/mongo.keyfile && chown mongodb:mongodb /etc/mongo.keyfile && mongod --replSet bippo --keyFile /etc/mongo.keyfile
    Requests:
      cpu:              100m
      memory:           700Mi
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Volume Mounts:
      /data/db from mongo-persistent-storage0 (rw)
      /etc/mongo-keyfile from mongo-keyfile (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-i0lox (ro)
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  mongo-keyfile:
    Type:       Secret (a volume populated by a Secret)
    SecretName: mongo-keyfile
  mongo-persistent-storage0:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     mongodb-disk0
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
  default-token-i0lox:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-i0lox
QoS Class:      Burstable
Tolerations:    <none>
Events:
  FirstSeen     LastSeen        Count   From                                    SubobjectPath   Type            Reason
        Message
  ---------     --------        -----   ----                                    -------------   --------        ------
        -------
  2m            2m              1       {default-scheduler }                                    Normal          Scheduled       Successfully assigned mongo0-375173819-7jtxa to gke-fatih-g1-d2fb3adc-nl4a
  2m            1m              8       {controller-manager }                                   Warning         FailedMount     Failed to attach volume "mongo-persistent-storage0" on node "gke-fatih-g1-d2fb3adc-nl4a" with: googleapi: Error 400: The disk resource 'mongodb-disk0' is already being used by 'gke-fatih-pool-1-2efbbabb-kfxk'
  47s           47s             1       {kubelet gke-fatih-g1-d2fb3adc-nl4a}                    Warning         FailedMount     Unable to mount volumes for pod "mongo0-375173819-7jtxa_default(7cd25e46-b48b-11e6-a2cb-42010af000ba)": timeout expired waiting for volumes to attach/mount for pod "mongo0-375173819-7jtxa"/"default". list of unattached/unmounted volumes=[mongo-persistent-storage0]
  47s           47s             1       {kubelet gke-fatih-g1-d2fb3adc-nl4a}                    Warning         FailedSync      Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "mongo0-375173819-7jtxa"/"default". list of unattached/unmounted volumes=[mongo-persistent-storage0]

The error says the disk resource 'mongodb-disk0' is already being used by 'gke-fatih-pool-1-2efbbabb-kfxk', but that instance no longer exists:

>kubectl get node
NAME                         STATUS    AGE
gke-fatih-g1-d2fb3adc-l0ad   Ready     5m
gke-fatih-g1-d2fb3adc-nl4a   Ready     5m

C:\Users\ceefour\git\yoopabot>kubectl get po
NAME                         READY     STATUS    RESTARTS   AGE
mongo-arb-3311699171-5yc0t   1/1       Running   0          5m
mongo0-375173819-7jtxa       1/1       Running   0          5m
mongo1-1101050559-y1s10      1/1       Running   0          11m

@jingxu97
Contributor

jingxu97 commented Nov 28, 2016 via email

@brianbaquiran

brianbaquiran commented Feb 2, 2017

I am seeing the same or a similar issue, but it happens after the pod has been running for quite some time.

Name:		influxdb-3155791781-100b4
Namespace:	default
Node:		gke-demo-cluster-default-pool-03638b68-15cz/10.128.0.4
Start Time:	Wed, 11 Jan 2017 18:42:33 +0800
Labels:		app=influxdb
		pod-template-hash=3155791781
Status:		Running
IP:		
Controllers:	ReplicaSet/influxdb-3155791781
Containers:
  db:
    Container ID:	docker://ed4458d25ec1d075b567a22d0b93ea2b59561bc6ff3b7f1efb39c450e4a4120c
    Image:		influxdb:latest
    Image ID:		docker://sha256:14e953917a5971a1638dcc329aa5ba50d9b55c7bb5638fbe4c837c5d43945f24
    Port:		8086/TCP
    Requests:
      cpu:		100m
    State:		Terminated
      Reason:		Completed
      Exit Code:	0
      Started:		Wed, 11 Jan 2017 18:44:30 +0800
      Finished:		Mon, 30 Jan 2017 23:21:59 +0800
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/lib/influxdb from influxdb-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-iz44v (ro)
    Environment Variables:
      INFLUXDB_GRAPHITE_ENABLED:	true
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  influxdb-data:
    Type:	GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:	influxdb
    FSType:	ext4
    Partition:	0
    ReadOnly:	false
  default-token-iz44v:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-iz44v
QoS Class:	Burstable
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From							SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----							-------------	--------	------		-------
  2d		27s		1717	{kubelet gke-demo-cluster-default-pool-03638b68-15cz}			Warning		FailedMount	Unable to mount volumes for pod "influxdb-3155791781-100b4_default(a8fb3123-d7ea-11e6-9070-42010af00136)": timeout expired waiting for volumes to attach/mount for pod "influxdb-3155791781-100b4"/"default". list of unattached/unmounted volumes=[influxdb-data]
  2d		27s		1717	{kubelet gke-demo-cluster-default-pool-03638b68-15cz}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "influxdb-3155791781-100b4"/"default". list of unattached/unmounted volumes=[influxdb-data]
Name:		shakeys-uwsgi-256045863-jvxtv
Namespace:	default
Node:		gke-demo-cluster-default-pool-03638b68-15cz/10.128.0.4
Start Time:	Wed, 11 Jan 2017 03:56:28 +0800
Labels:		app=shakeys-uwsgi
		pod-template-hash=256045863
Status:		Running
IP:		
Controllers:	ReplicaSet/shakeys-uwsgi-256045863
Containers:
  chatbot:
    Container ID:	docker://43fde1ebb31bb7631d2b995c217b0277a615931d1f6f1d05ab651fde17799f52
    Image:		us.gcr.io/panoptez/chatbotmvp_uwsgi
    Image ID:		docker://sha256:75f31ced698eaf79dceffa827587b722d085183b32371fbf27482f5fb5bbb1df
    Ports:		3031/TCP, 9191/TCP
    Requests:
      cpu:		100m
    State:		Terminated
      Reason:		Completed
      Exit Code:	0
      Started:		Wed, 11 Jan 2017 03:56:30 +0800
      Finished:		Mon, 30 Jan 2017 23:22:55 +0800
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /usr/local/lib/python2.7/dist-packages/spacy/data from spacy-data (rw)
      /usr/share/nltk_data from nltk-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-iz44v (ro)
    Environment Variables:
      PEZ_AI_ENDPOINT:		/shakeys
      PEZ_AI_CONF:		/pez_ai/conf/shk.cfg
      PAGE_ACCESS_TOKEN:	EAADFdZBumZCcIBAAt9CSYP3hwzHEybebZA4vKfOGuX25ZC3xoLJOXRKTvthj1kdrYtcK35kSHuXgROyAQhZBlzXJSk5PfI2zskffSxGguVJcOdtiVJUtoQBpmcl1KdlBlRs99PAY59rQ41AhFpoU0AOdnioZCMMSnBkkEQclN1XgZDZD
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  spacy-data:
    Type:	GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:	spacydata
    FSType:	ext4
    Partition:	0
    ReadOnly:	true
  nltk-data:
    Type:	GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:	nltkdata
    FSType:	ext4
    Partition:	0
    ReadOnly:	true
  default-token-iz44v:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-iz44v
QoS Class:	Burstable
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From							SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----							-------------	--------	------		-------
  2d		39s		1715	{kubelet gke-demo-cluster-default-pool-03638b68-15cz}			Warning		FailedMount	Unable to mount volumes for pod "shakeys-uwsgi-256045863-jvxtv_default(dfa2c601-d76e-11e6-9070-42010af00136)": timeout expired waiting for volumes to attach/mount for pod "shakeys-uwsgi-256045863-jvxtv"/"default". list of unattached/unmounted volumes=[spacy-data nltk-data]
  2d		39s		1715	{kubelet gke-demo-cluster-default-pool-03638b68-15cz}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "shakeys-uwsgi-256045863-jvxtv"/"default". list of unattached/unmounted volumes=[spacy-data nltk-data]

Actually, now that I'm looking more closely at the describe output, both pods were on the same node and experienced the error within a minute of each other.

Note also the pods were not sharing the same GCEPersistentDisk.

The nodes are currently at v1.4.5, but I see v1.4.7 is available on GKE. I can upgrade if it will fix the problem.

@msau42
Member

msau42 commented Aug 5, 2017

Can this be closed?

@patvdleer

I seem to be running into the same problem.

I have a backend API service that mounts a volume with readOnly: false, and a worker that mounts it with readOnly: true. This results in the following error:

Error: UPGRADE FAILED: failed to create resource: Deployment.apps "local-celeryworker" is invalid: [spec.template.spec.volumes[0].persistentVolumeClaim: Forbidden: may not specify more than 1 volume type, spec.template.spec.volumes[0].gcePersistentDisk.readOnly: Invalid value: false: must be true for replicated pods > 1; GCE PD can only be mounted on multiple machines if it is read-only]

Helm chart (the same for both, except the read-only flag):

      volumes:
      - name: datadir-{{ template "trackableappname" . }}
        gcePersistentDisk:
          pdName: datadir-{{ template "trackableappname" . }}
          fsType: ext4
        persistentVolumeClaim:
          readOnly: true
          claimName: {{ .Values.existingClaim | default (include "fullname" .) }}

@msau42
Member

msau42 commented Nov 27, 2017

Hi @patvdleer, there are a few problems with your Helm chart:

  • Each volume name can only have one volume type. So for each volume, you can only specify one of gcePersistentDisk or persistentVolumeClaim, but not both.
  • You cannot mount a GCE PD as writable on multiple nodes (i.e. replicas > 1).
  • You cannot mount the same GCE PD as writable on one node and readable on multiple nodes at the same time. The writing pod and the reading pods need to be launched one after the other, not simultaneously. See this for more information.
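
A volume entry that satisfies these constraints might look like the sketch below (using a hypothetical name datadir-myapp in place of the chart's template expressions). Only one volume type is specified per volume, and readOnly is set for the replicated consumers:

      volumes:
      - name: datadir-myapp
        persistentVolumeClaim:
          claimName: datadir-myapp
          readOnly: true   # required when replicas > 1 on a GCE PD

If the backend needs write access, it has to run as a single replica and should not be up at the same time as the read-only workers.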

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 27, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@nyurik

nyurik commented Oct 27, 2018

I hit a similar issue several times. After some digging in the node's dmesg, it turned out that the persistent disk had not been properly unmounted when I first initialized it. This left the disk in a bad state, unable to be mounted read-only. The solution was to manually re-attach the disk with --mode=rw, mount it read-write (which automatically repairs the bad state), unmount it, and detach it. Afterwards, everything worked well.
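
The recovery steps described above could be sketched with gcloud as follows (instance name recovery-vm, zone, mount point, and device path are example values; the actual device path depends on the node image):

    # Re-attach the disk read-write to a spare instance
    gcloud compute instances attach-disk recovery-vm --disk mongodb-disk0 \
        --mode rw --zone us-central1-a

    # On that instance: mount read-write (clears the dirty/unclean state),
    # then cleanly unmount
    sudo mkdir -p /mnt/fix
    sudo mount /dev/disk/by-id/google-mongodb-disk0 /mnt/fix
    sudo umount /mnt/fix

    # Detach so Kubernetes can attach it read-only again
    gcloud compute instances detach-disk recovery-vm --disk mongodb-disk0 \
        --zone us-central1-a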
