Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bin/flatcar-install: run write_to_disk from subshell #97

Merged
merged 1 commit into from
Jun 26, 2023

Conversation

tormath1
Copy link
Contributor

@tormath1 tormath1 commented Jun 14, 2023

Somehow, with the bash upgrade the trap RETURN signal is not handled correctly (it's ignored).

Adding an explicit return makes it work.


The bash upgrade and the way it's done (a single huge commit) it's too complicated to make a proper bisect but I assume the issue is regarding the changes on how trap signals are handled.

Testing done

  • Locally
  • Implementing Mantle test for this script

@tormath1 tormath1 self-assigned this Jun 14, 2023
@@ -596,7 +596,7 @@ function write_to_disk() {
echo "Failed to reread partitions on ${DEVICE}" >&2
done
[ -z "$try" ] || exit 1
udevadm settle
udevadm settle && return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work, too, because when udevadm settle fails, the script exists due to set -e?

Suggested change
udevadm settle && return
udevadm settle
return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment for this workaround would also be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion - it works fine.

Copy link
Member

@pothos pothos Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the trap? Can't the write the "done" in the last line instead of the return?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think the idea was to write the "done" as soon as the function did something, instead of the last line it should be done as first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach with the explicit return doesn't cover the case where dd failed and thus the is_modified check wouldn't work anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed @krnowak approach and use a subshell + EXIT trap instead of explicit return - but I still can't explain why it works on Equinix Metal setup.

@krnowak
Copy link
Member

krnowak commented Jun 14, 2023

I remember replacing all RETURN traps to EXIT traps in subshells because of that. See it here: flatcar/scripts@eee6b50

@tormath1
Copy link
Contributor Author

I remember replacing all RETURN traps to EXIT traps in subshells because of that. See it here: flatcar/scripts@eee6b50

Wow, that's interesting thanks for sharing. Interesting to to see that we did not get the issue with 5.1 but in 5.2...

@pothos
Copy link
Member

pothos commented Jun 14, 2023

The install script is tested in the Equinix Metal installation path, strange that we didn't see the issue there?

@tormath1 tormath1 force-pushed the tormath1/flatcar-install branch from 5074ccd to f962af0 Compare June 14, 2023 14:08
@tormath1
Copy link
Contributor Author

The install script is tested in the Equinix Metal installation path, strange that we didn't see the issue there?

Thanks, I'm now even more confused. :D

@tormath1 tormath1 force-pushed the tormath1/flatcar-install branch from f962af0 to a7c2128 Compare June 15, 2023 12:45
@tormath1 tormath1 changed the title bin/flatcar-install: explicit return bin/flatcar-install: run write_to_disk from subshell Jun 15, 2023
mkfifo -m 0600 "${WORKDIR}/disk_modified"
trap '(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &' RETURN
trap '(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &' EXIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trap '(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &' EXIT
(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &

I'm fine with the current approach, my idea was to simplify this and don't use traps.
This should work because we don't wait for the write somewhere in parallel but it's all sequential code and the marker can be written early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the trap is creating a weird behavior:

++ write_to_disk
++ mkfifo -m 0600 /tmp/flatcar-install.r3mrgIyVoA/disk_modified
+ wget --tries 10 --timeout=20 --retry-connrefused --no-verbose -O - https://beta.release.flatcar-linux.net/amd64-usr/3602.1.0/flatcar_production_image.bin.bz2
+ gpg --batch --trusted-key E25D9AED0593B34A --verify /tmp/flatcar-install.r3mrgIyVoA/flatcar_production_image.bin.bz2.sig -
+ tee /dev/fd/62
++ lbzip2 -cd
+++ blockdev --getsz /dev/sda
++ exec
++ dd conv=nocreat count=1024 if=/dev/zero of=/dev/sda seek=41942016 status=none
++ dd bs=1M conv=nocreat of=/dev/sda status=none
2023-06-23 09:40:39 URL:https://beta.release.flatcar-linux.net/amd64-usr/3602.1.0/flatcar_production_image.bin.bz2 [379416813/379416813] -> "-" [1]
gpg: Signature made Mon May 29 09:40:52 2023 UTC
gpg:                using RSA key 8D6DA7853CFE1B1ED346ED0DEDBDF411267EC954
gpg:                issuer "buildbot@flatcar-linux.org"
gpg: key E25D9AED0593B34A marked as ultimately trusted
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: Good signature from "Flatcar Buildbot (Official Builds) <buildbot@flatcar-linux.org>" [ultimate]
+ wait_for_disk
+ '[' -n '' ']'
+ read -rt 7200 _disk_status
+ write_cloudinit
+ [[ -n '' ]]
+ write_ignition
+ [[ -n '' ]]
+ [[ -n '' ]]
+ echo 'Success! Flatcar Container Linux beta 3602.1.0 is installed on /dev/sda'
Success! Flatcar Container Linux beta 3602.1.0 is installed on /dev/sda
+ rm -rf /tmp/flatcar-install.r3mrgIyVoA
+ trap - EXIT

It ends right after the dd (it does not go through the udevadm settle and the for loop) so it takes a few second before the changes being visible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, install_from_url doesn't wait for write_to_disk to finish. It even doesn't capture the error code because > >(write_to_disk) redirects the output to the input FD but it's not a pipe where pipefail would work. This should be fixed to properly fail on error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prepared a fix in #99

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to touch the waiting logic anyway now to signal the failure and removed the trap.

Copy link
Member

@pothos pothos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two other RETURN traps for umount, do you want to use the subshell trick there?

Somehow, with the bash upgrade the trap RETURN signal is not handled
correctly (it's ignored).

We run now everything from a subshell and turn RETURN trap to an EXIT
trap.

Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
@tormath1 tormath1 force-pushed the tormath1/flatcar-install branch from a7c2128 to 206f49a Compare June 26, 2023 08:43
@tormath1
Copy link
Contributor Author

@pothos done and tested with a merge of your PR and mine. ✔️

@tormath1 tormath1 marked this pull request as ready for review June 26, 2023 12:15
@tormath1 tormath1 merged commit d311e4b into flatcar-master Jun 26, 2023
@tormath1 tormath1 deleted the tormath1/flatcar-install branch June 26, 2023 12:15
pothos added a commit to flatcar/scripts that referenced this pull request Jun 26, 2023
This pulls in
flatcar/init#97
and
flatcar/init#99
to work around a bash regression and add handling for disk write errors.
pothos added a commit to flatcar/scripts that referenced this pull request Jun 27, 2023
This pulls in
flatcar/init#97
and
flatcar/init#99
to work around a bash regression and add handling for disk write errors.
pothos added a commit to flatcar/scripts that referenced this pull request Jun 27, 2023
This pulls in
flatcar/init#97
and
flatcar/init#99
to work around a bash regression and add handling for disk write errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants