functests.3nodes fails with IndexError #1749

Open
jschmid1 opened this issue Sep 16, 2019 · 4 comments
@jschmid1
Contributor

salt-run --no-color state.orch ceph.functests.3nodes fails reproducibly with:

2019-09-13T15:40:50.841 INFO:teuthology.orchestra.run.target-ses-070.stdout:----------
2019-09-13T15:40:50.841 INFO:teuthology.orchestra.run.target-ses-070.stdout:          ID: Rebuilding on rebuild.node test
2019-09-13T15:40:50.841 INFO:teuthology.orchestra.run.target-ses-070.stdout:    Function: salt.runner
2019-09-13T15:40:50.841 INFO:teuthology.orchestra.run.target-ses-070.stdout:        Name: rebuild.node
2019-09-13T15:40:50.842 INFO:teuthology.orchestra.run.target-ses-070.stdout:      Result: False
2019-09-13T15:40:50.842 INFO:teuthology.orchestra.run.target-ses-070.stdout:     Comment: Runner function 'rebuild.node' failed.
2019-09-13T15:40:50.842 INFO:teuthology.orchestra.run.target-ses-070.stdout:     Started: 15:38:45.153140
2019-09-13T15:40:50.842 INFO:teuthology.orchestra.run.target-ses-070.stdout:    Duration: 67140.127 ms
2019-09-13T15:40:50.842 INFO:teuthology.orchestra.run.target-ses-070.stdout:     Changes:
2019-09-13T15:40:50.843 INFO:teuthology.orchestra.run.target-ses-070.stdout:              ----------
2019-09-13T15:40:50.843 INFO:teuthology.orchestra.run.target-ses-070.stdout:              return:
2019-09-13T15:40:50.843 INFO:teuthology.orchestra.run.target-ses-070.stdout:                  Exception occurred in runner rebuild.node: Traceback (most recent call last):
2019-09-13T15:40:50.843 INFO:teuthology.orchestra.run.target-ses-070.stdout:                    File "/usr/lib/python3.6/site-packages/salt/client/mixins.py", line 377, in low
2019-09-13T15:40:50.844 INFO:teuthology.orchestra.run.target-ses-070.stdout:                      data['return'] = func(*args, **kwargs)
2019-09-13T15:40:50.844 INFO:teuthology.orchestra.run.target-ses-070.stdout:                    File "/srv/modules/runners/rebuild.py", line 221, in node
2019-09-13T15:40:50.844 INFO:teuthology.orchestra.run.target-ses-070.stdout:                      rebuild.run()
2019-09-13T15:40:50.844 INFO:teuthology.orchestra.run.target-ses-070.stdout:                    File "/srv/modules/runners/rebuild.py", line 204, in run
2019-09-13T15:40:50.844 INFO:teuthology.orchestra.run.target-ses-070.stdout:                      self._check_deploy(deploy_ret, minion)
2019-09-13T15:40:50.845 INFO:teuthology.orchestra.run.target-ses-070.stdout:                    File "/srv/modules/runners/rebuild.py", line 162, in _check_deploy
2019-09-13T15:40:50.845 INFO:teuthology.orchestra.run.target-ses-070.stdout:                      if ret[minion][0][0] != 0:
2019-09-13T15:40:50.845 INFO:teuthology.orchestra.run.target-ses-070.stdout:                  KeyError: 0
2019-09-13T15:40:50.845 INFO:teuthology.orchestra.run.target-ses-070.stdout:  Name: Wait for Ceph for rebuild.node test - Function: salt.state - Result: Changed Started: - 15:39:52.293730 Duration: 6884.255 ms
jschmid1 added the bug label Sep 16, 2019
jschmid1 self-assigned this Sep 16, 2019
@jschmid1
Contributor Author

The error is raised because disks.deploy returns an empty result, which rebuild.py then indexes unconditionally.
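
For illustration (hypothetical values, not taken from the failing run): if the empty return is an empty dict, indexing it with [0] raises KeyError: 0, which matches the traceback above:

    # Hypothetical minimal reproduction of the failure mode in _check_deploy
    ret = {"data1.example.com": {}}    # empty return from disks.deploy
    ret["data1.example.com"][0][0]     # raises KeyError: 0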

The fix for this is twofold:

  1. Allow _check_deploy to handle an empty return, but raise an error when it happens (see the sketch after this list)

  2. Make osd.remove $id actually zap the disk (unmount/zap/clean)
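
A minimal sketch of what item 1 could look like, assuming only the structure visible in the traceback above (rebuild.py's _check_deploy(self, ret, minion) indexing ret[minion][0][0]); this is an illustration, not the actual patch:

    def _check_deploy(self, ret, minion):
        """Validate the disks.deploy return for one minion."""
        # Guard against an empty return instead of indexing into it blindly.
        minion_ret = ret.get(minion)
        if not minion_ret:
            raise RuntimeError(
                "disks.deploy returned nothing for {} - the disks were "
                "probably not zapped/cleaned properly".format(minion))
        # Original check preserved: a non-zero first field means failure.
        if minion_ret[0][0] != 0:
            raise RuntimeError(
                "disks.deploy failed on {}: {}".format(minion, minion_ret))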

This is due to a yet-unknown behavior in the osd.remove runner, which no longer completely destroys the LVs on the OSDs it removes:

After a zap, the LVs are still present:

data1:~ # lsblk
NAME                                                                       MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda                                                                        254:0    0  20G  0 disk
└─vda1                                                                     254:1    0  20G  0 part /
vdb                                                                        254:16   0  20G  0 disk
└─ceph--051e6039--51aa--4216--92b1--15d97b25a1f0-osd--data--141e0a0c--86f8--4082--99eb--6e526010d7f7
                                                                           253:0    0  19G  0 lvm
vdc                                                                        254:32   0  20G  0 disk
└─ceph--e6cd697b--029b--4b78--8838--e1af0aa9e1df-osd--data--c1cd2c10--ee8b--4f90--bd0a--f26b4bcfc078
                                                                           253:1    0  19G  0 lvm
vdd                                                                        254:48   0  20G  0 disk
└─ceph--29305dd3--c305--4f87--8e6d--ab3507d1c70b-osd--data--a634f37e--4a1c--483c--bd8a--312361bd779c
                                                                           253:2    0  19G  0 lvm
vde                                                                        254:64   0  20G  0 disk
└─ceph--8ddf59a4--e7bd--4d9c--b06f--11e24aec4e52-osd--data--0ec7d57e--96dc--4a6f--a9aa--d2acbb68293d
                                                                           253:3    0  19G  0 lvm
vdf                                                                        254:80   0  20G  0 disk
└─ceph--bf65731b--15d4--4c62--b04c--46a796786b29-osd--data--5a37b17c--5f56--4cb3--b6cf--9d3153c7032b
                                                                           253:4    0  19G  0 lvm
vdg                                                                        254:96   0  10G  0 disk
└─ceph--226cf2f2--5f77--41fd--917c--f84601ba9d4d-osd--data--494a7156--7c50--4c59--b750--876cd38438ce
                                                                           253:5    0   9G  0 lvm
vdh                                                                        254:112  0  10G  0 disk
└─ceph--49cb4914--6852--4ada--af04--7f58ff9ac38d-osd--data--7227cdd6--87d9--4ef1--b8e0--787129bef068
                                                                           253:6    0   9G  0 lvm

The expectation is to have clean, unmounted disks at this point.

The command used is ceph-volume lvm zap --osd-id $id --destroy.

I suspect that ceph-volume handles drives differently when it is passed --osd-id than when it is passed a raw device (/dev/sdx). The --destroy flag shouldn't change its behavior based on the input form.
This still needs to be verified, though.
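
One way to verify this would be to zap a throwaway OSD once via --osd-id and once via its raw device, then compare what lsblk reports afterwards. A rough sketch (the helper names are mine and the comparison is hypothetical; ceph-volume lvm zap accepting a device path, --osd-id and --destroy is standard):

    import subprocess

    def zap_by_id(osd_id):
        # The form currently used by the runner
        subprocess.run(
            ["ceph-volume", "lvm", "zap", "--osd-id", str(osd_id), "--destroy"],
            check=True)

    def zap_by_device(dev):
        # The form suspected to behave differently, e.g. dev="/dev/vdb"
        subprocess.run(
            ["ceph-volume", "lvm", "zap", dev, "--destroy"],
            check=True)

    def leftover_lvs():
        # LVs that survived the zap; expected to be empty if --destroy worked
        out = subprocess.run(
            ["lsblk", "-o", "NAME,TYPE", "-n"],
            stdout=subprocess.PIPE, universal_newlines=True, check=True)
        return [line for line in out.stdout.splitlines()
                if line.split()[-1:] == ["lvm"]]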

@smithfarm
Contributor

Note that this only started happening when we switched the ceph package from 14.2.2 to 14.2.3.

@smithfarm
Contributor

smithfarm commented Sep 17, 2019

Also, upstream has released (or, rather, is in the process of releasing) 14.2.4 to fix this. See ceph/ceph#30429

@smithfarm
Contributor

Confirmed - the failure doesn't happen with 14.2.4, so this is another symptom of the ceph-volume regression that found its way into 14.2.3.
