Cluster information:
Kubernetes version:
1.31.3
Installation method:
Host OS: Ubuntu server 24.04
CNI and version: calico/node:v3.29.0
CRI and version: containerd 1.7.24
What I did before
I used Helm to deploy Rook and Hive in the cluster. Rook uses the official rook-ceph and rook-ceph-cluster charts (both version 1.15.6); Hive uses a chart that bundles Hadoop and Hive, with a custom-built image that integrates Hadoop, Hive, Spark, and Flink.
Before switching to Rook, persistence was handled uniformly by NFS and the Hive data warehouse ran normally, so the charts and the images themselves should be fine.
Switching to CephFS
After switching from NFS to Ceph, three StorageClasses were created automatically, and Hive's persistence now uses CephFS through 4 PVCs.
$ kubectl get sc
NAME                   PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   6d13h
ceph-bucket            rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  6d13h
ceph-filesystem        rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   6d13h
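For reference, a claim against the ceph-filesystem class has roughly the following shape. This is a minimal sketch for illustration only: the claim name is taken from the values.yaml further below and the namespace from the cluster logs, but whether the chart creates the object exactly like this is an assumption.

# Minimal sketch of a PVC against the ceph-filesystem StorageClass.
# claimName comes from the values.yaml below; whether the chart renders
# the claim exactly like this is an assumption.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dfs-hadoop-hadoop-hdfs-nn
  namespace: hive
spec:
  storageClassName: ceph-filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi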
Some pods use Ceph RBD for persistence, and no issues have been found there. To provide the Hive JDBC driver JAR to Superset, a custom pod was created to upload the JAR to CephFS; after deleting that pod, the file still exists and Superset keeps running normally.
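The upload pod was a throwaway pod along these lines; a minimal sketch with assumed names (pod name, image, mount path, and the claim name are placeholders, not the manifest actually used):

# Minimal sketch of a short-lived pod that mounts a CephFS-backed PVC
# so a JAR can be copied in with `kubectl cp`. Pod name, image, mount
# path, and claimName are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: jar-uploader
  namespace: hive
spec:
  containers:
    - name: uploader
      image: busybox:1.36
      command: ["sleep", "3600"]   # keep the pod alive while copying the JAR
      volumeMounts:
        - name: shared-jars
          mountPath: /jars
  volumes:
    - name: shared-jars
      persistentVolumeClaim:
        claimName: superset-jars   # hypothetical CephFS-backed claim

After copying the JAR in and deleting the pod, the file remained on CephFS, which matches the behavior described above.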
Problem occurred
However, Hadoop's NameNode initialization failed with an error: it could not create the required directories.
Exiting with status 1: java.io.IOException: Cannot create directory /opt/apache/hadoop-3.4.1/data/hdfs/namenode/current
Exiting with status 1: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /opt/apache/hadoop-3.4.1/data/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
The pod containing the NameNode keeps crashing and cannot be accessed, so I exec'd into the ResourceManager pod (built from the same image) and ran hadoop namenode -format, which completed successfully. The difference between the two pods is that the ResourceManager pod does not use persistence, so it is not backed by CephFS.
hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ echo $HADOOP_HOME
/opt/apache/hadoop
hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ ls -l /opt/apache/|grep /opt/apache/hadoop
lrwxrwxrwx 1 hadoop hadoop 24 Dec 11 22:50 hadoop -> /opt/apache/hadoop-3.4.1
hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ ls -l /opt/apache/hadoop-3.4.1/data/hdfs/
total 8
drwxr-xr-x 5 hadoop hadoop 4096 Dec 11 22:50 datanode
drwxr-xr-x 2 hadoop hadoop 4096 Dec 11 22:50 namenode
hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ ls -l /opt/apache/hadoop-3.4.1/data/hdfs/namenode/
total 0
hadoop@hive-hadoop-yarn-rm-0:/opt/apache$ hadoop namenode -format
WARNING: Use of this script to execute namenode is deprecated.
WARNING: Attempting to execute replacement "hdfs namenode" instead.
2024-12-14 12:47:55,001 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hive-hadoop-yarn-rm-0.hive-hadoop-yarn-rm.hive.svc.cluster.local/10.244.189.104
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.4.1
STARTUP_MSG: classpath = ...... more
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1; compiled by 'mthakur' on 2024-10-09T14:57Z
STARTUP_MSG: java = 1.8.0_421
************************************************************/
2024-12-14 12:47:55,013 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2024-12-14 12:47:55,369 INFO namenode.NameNode: createNameNode [-format]
2024-12-14 12:47:57,737 INFO common.Util: Assuming 'file' scheme for path /opt/apache/hadoop/data/hdfs/namenode in configuration.
2024-12-14 12:47:57,740 INFO common.Util: Assuming 'file' scheme for path /opt/apache/hadoop/data/hdfs/namenode in configuration.
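Since the NameNode pod crash-loops before it can be inspected, one way to look at the CephFS-backed directory directly is a throwaway debug pod that mounts the same PVC so its ownership and permissions can be checked. A minimal sketch, assuming the claim name from the values.yaml below (pod name and image are placeholders):

# Minimal sketch of a debug pod that mounts the NameNode PVC so the
# ownership/permissions of the CephFS-backed directory can be checked,
# e.g. with `kubectl exec nn-pvc-debug -n hive -- ls -ln /data`.
# Pod name and image are assumptions; claimName comes from values.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: nn-pvc-debug
  namespace: hive
spec:
  containers:
    - name: debug
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: nn-data
          mountPath: /data
  volumes:
    - name: nn-data
      persistentVolumeClaim:
        claimName: dfs-hadoop-hadoop-hdfs-nn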
My configmaps
This is the values.yaml file of the Hive chart. Originally, when using NFS, both storageClass fields were set to nfs-storage; they have now been changed to ceph-filesystem, with no other modifications.
persistence:
  nameNode:
    enabled: true
    enabledStorageClass: true
    storageClass: ceph-filesystem  # Previously nfs-storage, which worked well
    accessMode: ReadWriteOnce
    size: 10Gi
    volumes:
      - name: dfs
        mountPath: /opt/apache/hadoop/data/hdfs/namenode
        persistentVolumeClaim:
          claimName: dfs-hadoop-hadoop-hdfs-nn
  dataNode:
    enabled: true
    enabledStorageClass: true
    storageClass: ceph-filesystem  # Previously nfs-storage, which worked well
    accessMode: ReadWriteOnce
    size: 100Gi
    volumes:
      - name: dfs1
        mountPath: /opt/apache/hdfs/datanode1
        persistentVolumeClaim:
          claimName: dfs1-hadoop-hadoop-hdfs-dn
      - name: dfs2
        mountPath: /opt/apache/hdfs/datanode2
        persistentVolumeClaim:
          claimName: dfs2-hadoop-hadoop-hdfs-dn
      - name: dfs3
        mountPath: /opt/apache/hdfs/datanode3
        persistentVolumeClaim:
          claimName: dfs3-hadoop-hadoop-hdfs-dn
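For context, the nameNode block above is expected to end up as the claim mounted at the HDFS namenode directory. The standalone sketch below shows that wiring in a minimal pod; the pod name, image, and container name are assumptions, not the chart's actual rendered output.

# Standalone sketch of the volume wiring described by the nameNode
# persistence values: the claim is mounted at the HDFS namenode path.
# claimName and mountPath come from the values.yaml above; everything
# else is a placeholder for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: namenode-wiring-example
  namespace: hive
spec:
  containers:
    - name: namenode
      image: busybox:1.36          # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: dfs
          mountPath: /opt/apache/hadoop/data/hdfs/namenode
  volumes:
    - name: dfs
      persistentVolumeClaim:
        claimName: dfs-hadoop-hadoop-hdfs-nn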
Thank you for your attention to this issue!