Harvester to-be-investigated issue tracking #3

Open
w13915984028 opened this issue Aug 16, 2022 · 12 comments

Comments

@w13915984028
Owner

Track those issues.

@w13915984028
Owner Author

w13915984028 commented Aug 16, 2022

Remove a node from the cluster and rejoin it:

harvester/harvester#2665 (root cause is found)
harvester/harvester#2470

At the moment, the restarted node fails at the bootstrap stage: rancherd waits for kubelet, but kubelet fails to start.

@w13915984028
Owner Author

w13915984028 commented Aug 22, 2022

A CentOS 7 vmdk file from VMware Workstation (tried with both IDE and SCSI disks) was converted to a QEMU qcow2 image. The converted image boots with
sudo qemu-system-x86_64 -m 4G -accel kvm -hda /home/jianwang/images/centos-ide.qcow2
but fails to start under KubeVirt.

kubevirt/kubevirt#8175

harvester/harvester#2561

Solved with a workaround.
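For reference, a minimal sketch of the conversion and inspection steps described above. The file names are hypothetical placeholders, and the script only builds and prints the `qemu-img` command lines rather than running them; the exact options used for the original conversion are not recorded in this thread.

```python
import shlex
from typing import List


def qemu_img_convert_cmd(src: str, dst: str) -> List[str]:
    """Build the qemu-img command line that converts a vmdk image to qcow2."""
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2", src, dst]


def qemu_img_info_cmd(image: str) -> List[str]:
    """Build the qemu-img command line that reports image format and size as JSON."""
    return ["qemu-img", "info", "--output=json", image]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    src = "centos-ide.vmdk"
    dst = "centos-ide.qcow2"
    print(shlex.join(qemu_img_convert_cmd(src, dst)))
    print(shlex.join(qemu_img_info_cmd(dst)))
```

Comparing the `qemu-img info` output of the converted image with one that is known to boot under KubeVirt may help narrow down where the two differ.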

@w13915984028
Owner Author

w13915984028 commented Aug 31, 2022

rancher-monitoring-prometheus-adapter seems busy (about 11% CPU in the ps output below); root cause TBD.

I0831 14:50:04.753431       1 httplog.go:104] "HTTP" verb="GET" URI="/healthz" latency="89.139µs" userAgent="kube-probe/1.22" audit-ID="9ba55a6d-5ecd-4f92-967c-c8e87083933f" srcIP="192.168.122.131:54022" resp=200
I0831 14:50:04.753867       1 httplog.go:104] "HTTP" verb="GET" URI="/healthz" latency="972.854µs" userAgent="kube-probe/1.22" audit-ID="ef2d53bc-b80e-4138-a45f-0bb251b4903b" srcIP="192.168.122.131:54014" resp=200
I0831 14:50:06.607622       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="13.856765ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="ad3a1b20-14db-4311-8306-761234a3bba3" srcIP="192.168.122.131:36634" resp=200
I0831 14:50:06.619239       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="10.991853ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="03d0c6e1-e30d-4e60-8299-cc76e4166865" srcIP="192.168.122.131:36634" resp=200
I0831 14:50:06.635917       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="9.884385ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="bcebc2ad-41f9-4e0c-a85a-11ac07c1670e" srcIP="192.168.122.131:36634" resp=200
I0831 14:50:06.665654       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="13.042048ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="3801e8e4-84a6-4d58-8c53-7edfb50a02d8" srcIP="192.168.122.131:36634" resp=200
harv31:~ # kk logs -n cattle-monitoring-system rancher-monitoring-prometheus-adapter-8846d4757-qphc4
harv31:~ # kk get pods -n cattle-monitoring-system rancher-monitoring-prometheus-adapter-8846d4757-qphc4 -o JSON
{

        "containers": [
            {
                "args": [
                    "/adapter",
                    "--secure-port=6443",
                    "--cert-dir=/tmp/cert",
                    "--logtostderr=true",
                    "--prometheus-url=http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090",
                    "--metrics-relist-interval=1m",
                    "--v=4",
                    "--config=/etc/adapter/config.yaml"
                ],
                "image": "rancher/mirrored-prometheus-adapter-prometheus-adapter:v0.9.0",
                "imagePullPolicy": "IfNotPresent",
harv31:~ # ps aux | grep adapter

10001    15803 11.0  0.9 1030400 200488 ?      Ssl  14:14   4:23 /adapter /adapter --secure-port=6443 --cert-dir=/tmp/cert --logtostderr=true --prometheus-url=http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090 --metrics-relist-interval=1m --v=4 --config=/etc/adapter/config.yaml
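To quantify what keeps the adapter busy, the httplog lines can be grouped by client and path. The sketch below is a rough helper, not part of any Harvester tooling; it assumes the `key="value"` layout seen in the excerpt above, and the two sample lines are copied from it. Note that the adapter runs with `--v=4`, so every request is logged; whether that verbosity contributes to the load is not established here.

```python
import re
from collections import Counter

# Matches key="quoted value" or key=bare_value pairs, e.g. verb="GET" or resp=200.
FIELD_RE = re.compile(r'([\w-]+)=(?:"([^"]*)"|(\S+))')


def parse_httplog_line(line):
    """Return the key/value fields of one httplog line, or None for other lines."""
    if "httplog.go" not in line:
        return None
    fields = {}
    for match in FIELD_RE.finditer(line):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return fields


def summarize(lines):
    """Count requests per (client name, URI path without query string)."""
    counts = Counter()
    for line in lines:
        fields = parse_httplog_line(line)
        if not fields:
            continue
        path = fields.get("URI", "").split("?", 1)[0]
        client = fields.get("userAgent", "").split("/", 1)[0]
        counts[(client, path)] += 1
    return counts


SAMPLE = [
    'I0831 14:50:04.753431       1 httplog.go:104] "HTTP" verb="GET" URI="/healthz" latency="89.139µs" userAgent="kube-probe/1.22" audit-ID="9ba55a6d-5ecd-4f92-967c-c8e87083933f" srcIP="192.168.122.131:54022" resp=200',
    'I0831 14:50:06.607622       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="13.856765ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="ad3a1b20-14db-4311-8306-761234a3bba3" srcIP="192.168.122.131:36634" resp=200',
]

if __name__ == "__main__":
    for (client, path), count in summarize(SAMPLE).most_common():
        print(count, client, path)
```

Feeding it the full output of the `kk logs` command gives a per-client request count over the logged period.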

@w13915984028
Owner Author

w13915984028 commented Nov 2, 2022

Node promotion:

harvester/harvester#3039
[BUG] Adding a third node mades the second one to fail

And a similar one:
harvester/harvester#3091

@w13915984028
Owner Author

Disk pressure caused by tmp files:
[BUG] Disk pressure caused by Rancher agent tmp files

@w13915984028
Owner Author

w13915984028 commented Feb 14, 2023

NVMe PCIe - Slow Virtual Machine Performance
harvester/harvester#3356

Uploading a large image fails at 99% due to checksum computation:
harvester/harvester#3450
longhorn/longhorn#4865
#3555 [BUG] both upload and download may fail due to LH GetFileChecksum in last step
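To illustrate why the last step can take long on a big image: a checksum requires reading the whole file once more after the transfer completes. The sketch below is a generic chunked checksum, not the Longhorn GetFileChecksum implementation; the choice of SHA-512 and the chunk size are assumptions.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1024 * 1024):
    """Compute a SHA-512 hex digest by streaming the file in fixed-size chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Small throwaway file standing in for an uploaded image.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(3 * 1024 * 1024))
    try:
        print(file_checksum(tmp.name))
    finally:
        os.remove(tmp.name)
```

The run time grows linearly with file size and is bounded by disk read throughput, so any client timeout that covers only the transfer itself can expire during this step.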

@w13915984028
Owner Author

w13915984028 commented May 31, 2023

Installation fails or times out when the network between the Harvester node and the ISO server performs poorly.
harvester/harvester#2651 FIXED
[BUG] Virtual Media Installation Hangs For 2+hrs With "containerd.sock" Connection Error

After a node reboot, the following may happen:

Troubleshooting: Stuck in 'Setting up node/Harvester'
[BUG] Stuck in 'Setting up node/Harvester' after install harvester #3844
harvester/harvester#3844 (comment)

Pod harvester-cluster-repo-67ddddf8d7-zc7zd is in ImagePullBackOff.

@w13915984028
Owner Author

Troubleshooting: upload/download of large images stuck at 99%:
harvester/harvester#3555
harvester/harvester#3450
harvester/harvester#3086

@w13915984028
Owner Author

Troubleshooting: wrong configuration of the VLAN network IP segment:

[BUG]when the vm vlan network segment is the same as the host, host can not connection to vm
harvester/harvester#3414

[BUG] iptables on Harvester hosts prevents Vms network from working correctly (no access to internet)
harvester/harvester#3852

[BUG] After the weekend, VM on one node can't connect to outside network
harvester/harvester#3745
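A quick way to check for the first case (VM VLAN segment identical to or overlapping the host segment) is to compare the two CIDRs. This is a sketch using Python's standard `ipaddress` module; the host address is taken from the logs earlier in this thread, while the `/24` prefix and the VM segments are made-up examples.

```python
import ipaddress


def segments_overlap(host_cidr, vm_cidr):
    """Return True if the host network and the VM VLAN network share addresses."""
    # strict=False accepts an interface address such as 192.168.122.131/24
    # and reduces it to its network, 192.168.122.0/24.
    host_net = ipaddress.ip_network(host_cidr, strict=False)
    vm_net = ipaddress.ip_network(vm_cidr, strict=False)
    return host_net.overlaps(vm_net)


if __name__ == "__main__":
    host = "192.168.122.131/24"  # assumed prefix length
    for vm_segment in ("192.168.122.0/24", "10.10.0.0/24"):  # example VM segments
        print(vm_segment, "overlaps host:", segments_overlap(host, vm_segment))
```

An overlap does not prove the connectivity problem on its own, but it is a cheap first check before looking at iptables rules or bridge configuration.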

@w13915984028
Owner Author

Future enhancement: keep the VM MAC address stable. Users should be able to supply a MAC address when creating a VM, and it should remain unchanged afterwards:

harvester/harvester#3602
[FEATURE] Prefix Mac Address

[Question] How to ensure permanent static IP assigned to newly built VMs?
harvester/harvester#3682

[BUG] backup restore does not carry the same MAC address
harvester/harvester#3541
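One possible way to get a MAC address that stays the same across re-creation or restore is to derive it from a stable VM identifier and supply it explicitly. The sketch below is only an illustration of that idea, not something Harvester does today; the `namespace/name` identifier and the SHA-256 derivation are assumptions.

```python
import hashlib
import re

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def stable_mac(vm_id):
    """Derive a deterministic, locally administered unicast MAC from a VM identifier."""
    octets = bytearray(hashlib.sha256(vm_id.encode("utf-8")).digest()[:6])
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01)
    # so the address cannot collide with vendor-assigned or multicast addresses.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)


if __name__ == "__main__":
    # Hypothetical VM identifier in namespace/name form.
    print(stable_mac("default/vm1"))
```

The same input always yields the same address, so a DHCP reservation keyed on the MAC would survive a backup and restore as long as the identifier does not change.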

@w13915984028
Owner Author

Data corruption / inconsistency

harvester/harvester#2522
[BUG] Windows VMs crashing

harvester/harvester#2448
[BUG] VM file system may be corrupted when Stopped via the UI

harvester/harvester#1432
[BUG] monitoring not loading - invalid checksum; corrupted block

harvester/harvester#2092
[BUG] Single-node Harvester rancher-monitoring-prometheus enter "CrashLoopBackOff" due to "reloadBlocks: corrupted block" #2092

@w13915984028
Owner Author

Recover VMs as soon as possible when the host is gone:

harvester/harvester#3864
[Question] When a harvester host unexpected shutdown or reboot, the VMs on the host will not failover to the other nodes on

https://docs.harvesterhci.io/dev/advanced/settings/#vm-force-reset-policy
