Pods with PV on rook-ceph don't start on the second node when the first node is offline #14993
Replies: 6 comments 1 reply
-
Try this doc on handling node loss.
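For reference, the manual recovery that node-loss docs describe usually boils down to fencing the dead node in Ceph and then cleaning up the stuck Kubernetes objects. A minimal sketch, with placeholders instead of real names:
# fence the lost node so its stale Ceph clients can no longer write
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd blocklist add <node-ip>
# force-remove the pods stuck in Terminating so their replacements can start
kubectl delete pod <stuck-pod> --grace-period=0 --force
# drop the stale VolumeAttachment still pointing at the lost node
kubectl delete volumeattachment <attachment-name>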
-
I was hoping there was something automatic in the rook-ceph cluster that could resolve these types of issues/locks, so that when one node goes down the pod spins up on the other node and the application restarts without any problems, all without manual intervention.
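The closest thing to that built into Kubernetes is non-graceful node shutdown (GA since v1.28): once you are certain a node is really dead, tainting it makes the control plane force-delete its pods and detach their volumes so they can restart elsewhere. A sketch, assuming the failed node is kubeclient1:
# declare the dead node out of service; its pods are force-deleted and volumes detached
kubectl taint nodes kubeclient1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
# remove the taint once the node is back and clean
kubectl taint nodes kubeclient1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
Note that someone (or some external controller) still has to apply the taint: Kubernetes deliberately refuses to fence a node it merely cannot reach.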
-
I have isolated the node kubeclient2 with systemctl stop kubelet.
pod/grafana-69d855495d-l7z2b 1/1 Running 0 17m 10.244.123.172 kubeclient1
kubectl describe pod/wordpress-mysql-65cd85d4d7-srb48
Events:
Normal Scheduled 18m default-scheduler Successfully assigned default/wordpress-mysql-65cd85d4d7-srb48 to kubeclient1
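For anyone reproducing this, note the timing: after kubelet stops, the node only turns NotReady after the node-monitor grace period (40s by default), and its pods are evicted via the node.kubernetes.io/unreachable taint only after the default toleration of 300s, so it is worth watching both transitions:
# watch the node go NotReady and the pods get marked for deletion
kubectl get nodes -w
kubectl get pods -o wide -w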
-
Hello Maduh-1,
default pod/wordpress-5b9ddb4b9d-6sph9 1/1 Running 0 23s 10.244.151.45 kubeclient2
NAMESPACE NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE VOLUMEMODE
bash-5.1$ ceph status
(services/data/io output truncated)
root@kubemaster:~# kubectl get nodes
root@kubemaster:~# kubectl get pods,nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
Logs of mysql:
2024-11-12 15:42:41 1 [Note] Binlog end
root@kubemaster:~# kubectl get pv,pvc -o wide
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE VOLUMEMODE
At this point I decided to start kubelet on node kubeclient2, and I decided to kill pod/rook-ceph-operator-5c49669f69-j255j.
bash-5.1$ ceph status
(services/data/io output truncated)
But now I have to remove one mon from the quorum:
bash-5.1$ ceph mon rm q
bash-5.1$ ceph status
(services/data/io output truncated)
In short, several problems here and there.
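A hedged aside on the mon step: in a Rook cluster the operator normally replaces a failed mon itself, so removing one by hand can race with it. If you do need to inspect quorum manually, the toolbox commands are roughly (a sketch, assuming the standard rook-ceph-tools deployment):
# open a shell in the toolbox
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# check quorum before touching anything
ceph quorum_status --format json-pretty
ceph mon stat
# remove a dead mon only if the operator has not already replaced it
ceph mon rm <mon-id>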
-
Hello all, bye and thanks.
-
Hello all, after applying the latest rook-ceph patch, the failover now works: when the original node fails, the Oracle database switches over and opens on the other node. Bye, Gabriele
-
I'm trying this test in my Kubernetes test environment.
I try to simulate a failure on one of the workers,
and I would like the pods on the good node to be up and running,
but actually they are not.
The application is on node kubeclient1:
root@kubemaster:~/Node_Check_Operator# kubectl get pods,nodes -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/grafana-69d855495d-qjfgd 1/1 Running 0 14m 10.244.151.57 kubeclient1
pod/wordpress-5b9ddb4b9d-ld6lv 1/1 Running 7 (5m27s ago) 14m 10.244.151.43 kubeclient1
pod/wordpress-mysql-65cd85d4d7-bf79h 1/1 Running 4 (6m39s ago) 14m 10.244.151.34 kubeclient1
systemctl stop kubelet on node kubeclient1
The pods on kubeclient1 go into Terminating state but don't release the lock on the PV,
and the pods on node kubeclient2 go into a crash loop.
kubectl get pods
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/grafana-69d855495d-hpfhp 1/1 Terminating 0 89m 10.244.123.163 kubeclient1
pod/grafana-69d855495d-qjfgd 1/1 Running 0 8m56s 10.244.151.57 kubeclient2
pod/wordpress-5b9ddb4b9d-ld6lv 0/1 CrashLoopBackOff 5 (2m44s ago) 8m56s 10.244.151.43 kubeclient2
pod/wordpress-5b9ddb4b9d-lr2rs 1/1 Terminating 4 (85m ago) 88m 10.244.123.149 kubeclient1
pod/wordpress-mysql-65cd85d4d7-26p28 1/1 Terminating 1 (86m ago) 89m 10.244.123.170 kubeclient1
pod/wordpress-mysql-65cd85d4d7-bf79h 1/1 Running 4 (43s ago) 8m56s 10.244.151.34 kubeclient2
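This is expected Kubernetes behaviour: the node controller taints the unreachable node and marks its pods for deletion, but with kubelet down nothing ever confirms the deletion, so the pods sit in Terminating and their volumes stay attached. The taint that triggered the eviction can be inspected with:
# show the taints placed on the dead node (expect node.kubernetes.io/unreachable)
kubectl get node kubeclient1 -o jsonpath='{.spec.taints}'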
kubectl get volumeattachment
NAME ATTACHER PV NODE ATTACHED AGE
csi-0211bb93fad967b204c6254e34680757cae2c93000977ab37d11e51a596d4fed rook-ceph.cephfs.csi.ceph.com pvc-46508706-ec05-4ab1-954a-54462f0e425c kubeclient2 true 10m
csi-590226f253850369b23ee9210ea2224f4a6cffe5c969eb2e46d494b9f334bea5 rook-ceph.cephfs.csi.ceph.com pvc-cd07016b-b5e6-4b87-b1c2-cf9bd913750d kubeclient1 true 90m
csi-9f3a64fd4123fd6c014ddc69f55db0d9ff05beb6ecae5b8c869ebdac1aa6c374 rook-ceph.cephfs.csi.ceph.com pvc-4c701e0b-8447-4312-8e38-495375bcfd98 kubeclient1 true 90m
csi-b9b3787eac769b8374ff4a8e96c531762f11f653f32da1c320bb12104f5e3da3 rook-ceph.cephfs.csi.ceph.com pvc-cd07016b-b5e6-4b87-b1c2-cf9bd913750d kubeclient2 true 10m
csi-c07f7a4420565d25008d365914d4d6a1f0227b3f10f11ce49830707c7fb55e7d rook-ceph.cephfs.csi.ceph.com pvc-46508706-ec05-4ab1-954a-54462f0e425c kubeclient1 true 90m
csi-e579a076da144ca7e95b768e2cc21cdd78dc8c870cd235540d67e0fefb767fb5 rook-ceph.cephfs.csi.ceph.com pvc-4c701e0b-8447-4312-8e38-495375bcfd98 kubeclient2 true 10m
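Since these are CephFS RWX volumes, an attachment on both nodes is legal in itself; the stale entries are the three still pointing at kubeclient1. One manual cleanup path (a sketch, not tested here, reusing the attachment names from the listing above) would be to delete exactly those:
kubectl delete volumeattachment csi-590226f253850369b23ee9210ea2224f4a6cffe5c969eb2e46d494b9f334bea5
kubectl delete volumeattachment csi-9f3a64fd4123fd6c014ddc69f55db0d9ff05beb6ecae5b8c869ebdac1aa6c374
kubectl delete volumeattachment csi-c07f7a4420565d25008d365914d4d6a1f0227b3f10f11ce49830707c7fb55e7d
Bear in mind this only cleans up the Kubernetes-side bookkeeping; it does not by itself break any lock the old client still holds inside Ceph.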
So, to recover the situation, I executed the following command, although I thought Kubernetes/rook-ceph would resolve this automatically:
kubectl delete pod/grafana-69d855495d-hpfhp pod/wordpress-5b9ddb4b9d-lr2rs pod/wordpress-mysql-65cd85d4d7-26p28 --force
After doing this, the situation is as follows:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/grafana-69d855495d-qjfgd 1/1 Running 0 9m49s 10.244.151.57 kubeclient2
pod/wordpress-5b9ddb4b9d-ld6lv 0/1 CrashLoopBackOff 6 (24s ago) 9m49s 10.244.151.43 kubeclient2
pod/wordpress-mysql-65cd85d4d7-bf79h 1/1 Running 4 (96s ago) 9m49s 10.244.151.34 kubeclient2
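The remaining CrashLoopBackOff fits an application-level lock rather than a Kubernetes one: kubelet on kubeclient1 is down, but the old containers there may still be running and holding files open on the shared CephFS volume. The crashing pod's own logs are the quickest way to confirm what it is failing on (pod name taken from the listing above):
kubectl logs pod/wordpress-5b9ddb4b9d-ld6lv --previous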
To get the pods to start definitively, it is necessary
to execute systemctl start kubelet on node kubeclient1.
Only then is the lock released and the pods start correctly on kubeclient2.
How can this problem be solved?
I would like the application that uses the PV to start on the other node automatically,
i.e. I would like the lock held by the first node to be released automatically if the node goes down
or becomes unreachable.
Many thanks for the help
Gabriele
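For the automatic-release part: besides the out-of-service taint mentioned above, Ceph-CSI ships network fencing through the csi-addons NetworkFence CRD, which blocklists a node's IP in Ceph so its stale clients lose access and their locks can be reclaimed. A hedged sketch only; verify the field names against the csi-addons version your Rook release installs, and note the node IP, secret, and clusterID below are placeholders for a typical rook-ceph setup:
apiVersion: csiaddons.openshift.io/v1alpha1
kind: NetworkFence
metadata:
  name: fence-kubeclient1
spec:
  driver: rook-ceph.cephfs.csi.ceph.com
  fenceState: Fenced
  cidrs:
    - <node-ip>/32
  secret:
    name: rook-csi-cephfs-provisioner
    namespace: rook-ceph
  parameters:
    clusterID: rook-ceph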