-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bin/flatcar-install: run write_to_disk from subshell #97
Conversation
bin/flatcar-install
Outdated
@@ -596,7 +596,7 @@ function write_to_disk() { | |||
echo "Failed to reread partitions on ${DEVICE}" >&2 | |||
done | |||
[ -z "$try" ] || exit 1 | |||
udevadm settle | |||
udevadm settle && return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work, too, because when udevadm settle fails, the script exists due to set -e
?
udevadm settle && return | |
udevadm settle | |
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment for this workaround would also be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion - it works fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the trap? Can't the write the "done" in the last line instead of the return
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think the idea was to write the "done" as soon as the function did something, instead of the last line it should be done as first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach with the explicit return
doesn't cover the case where dd
failed and thus the is_modified
check wouldn't work anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed @krnowak approach and use a subshell + EXIT trap instead of explicit return
- but I still can't explain why it works on Equinix Metal setup.
I remember replacing all RETURN traps to EXIT traps in subshells because of that. See it here: flatcar/scripts@eee6b50 |
Wow, that's interesting thanks for sharing. Interesting to to see that we did not get the issue with 5.1 but in 5.2... |
The install script is tested in the Equinix Metal installation path, strange that we didn't see the issue there? |
5074ccd
to
f962af0
Compare
Thanks, I'm now even more confused. :D |
f962af0
to
a7c2128
Compare
bin/flatcar-install
Outdated
mkfifo -m 0600 "${WORKDIR}/disk_modified" | ||
trap '(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &' RETURN | ||
trap '(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &' EXIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trap '(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") &' EXIT | |
(exec 2>/dev/null ; echo done > "${WORKDIR}/disk_modified") & |
I'm fine with the current approach, my idea was to simplify this and don't use traps.
This should work because we don't wait for the write somewhere in parallel but it's all sequential code and the marker can be written early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the trap is creating a weird behavior:
++ write_to_disk
++ mkfifo -m 0600 /tmp/flatcar-install.r3mrgIyVoA/disk_modified
+ wget --tries 10 --timeout=20 --retry-connrefused --no-verbose -O - https://beta.release.flatcar-linux.net/amd64-usr/3602.1.0/flatcar_production_image.bin.bz2
+ gpg --batch --trusted-key E25D9AED0593B34A --verify /tmp/flatcar-install.r3mrgIyVoA/flatcar_production_image.bin.bz2.sig -
+ tee /dev/fd/62
++ lbzip2 -cd
+++ blockdev --getsz /dev/sda
++ exec
++ dd conv=nocreat count=1024 if=/dev/zero of=/dev/sda seek=41942016 status=none
++ dd bs=1M conv=nocreat of=/dev/sda status=none
2023-06-23 09:40:39 URL:https://beta.release.flatcar-linux.net/amd64-usr/3602.1.0/flatcar_production_image.bin.bz2 [379416813/379416813] -> "-" [1]
gpg: Signature made Mon May 29 09:40:52 2023 UTC
gpg: using RSA key 8D6DA7853CFE1B1ED346ED0DEDBDF411267EC954
gpg: issuer "buildbot@flatcar-linux.org"
gpg: key E25D9AED0593B34A marked as ultimately trusted
gpg: checking the trustdb
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: Good signature from "Flatcar Buildbot (Official Builds) <buildbot@flatcar-linux.org>" [ultimate]
+ wait_for_disk
+ '[' -n '' ']'
+ read -rt 7200 _disk_status
+ write_cloudinit
+ [[ -n '' ]]
+ write_ignition
+ [[ -n '' ]]
+ [[ -n '' ]]
+ echo 'Success! Flatcar Container Linux beta 3602.1.0 is installed on /dev/sda'
Success! Flatcar Container Linux beta 3602.1.0 is installed on /dev/sda
+ rm -rf /tmp/flatcar-install.r3mrgIyVoA
+ trap - EXIT
It ends right after the dd
(it does not go through the udevadm settle
and the for loop) so it takes a few second before the changes being visible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, install_from_url
doesn't wait for write_to_disk
to finish. It even doesn't capture the error code because > >(write_to_disk)
redirects the output to the input FD but it's not a pipe where pipefail
would work. This should be fixed to properly fail on error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prepared a fix in #99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to touch the waiting logic anyway now to signal the failure and removed the trap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two other RETURN
traps for umount
, do you want to use the subshell trick there?
Somehow, with the bash upgrade the trap RETURN signal is not handled correctly (it's ignored). We run now everything from a subshell and turn RETURN trap to an EXIT trap. Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
a7c2128
to
206f49a
Compare
@pothos done and tested with a merge of your PR and mine. ✔️ |
This pulls in flatcar/init#97 and flatcar/init#99 to work around a bash regression and add handling for disk write errors.
This pulls in flatcar/init#97 and flatcar/init#99 to work around a bash regression and add handling for disk write errors.
This pulls in flatcar/init#97 and flatcar/init#99 to work around a bash regression and add handling for disk write errors.
Somehow, with the bash upgrade the trap RETURN signal is not handled correctly (it's ignored).
Adding an explicit return makes it work.
The bash upgrade and the way it's done (a single huge commit) it's too complicated to make a proper bisect but I assume the issue is regarding the changes on how trap signals are handled.
Testing done