Enable lazy initialization of ext3/ext4 filesystems #38865
Conversation
Hi @codablock. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.
CC @colemickens
@k8s-bot ok to test
lgtm, this is consistent with the mkfs.ext4 default.
@rootfs any chance to get this into 1.5.x as well? As you know, Azure is affected by this, and it makes using large volumes very difficult.
would love it in 1.5 once merged into 1.6
Jenkins unit/integration failed for commit c938794e60b60dde81a8767b59ff2cfbf77b6312. Full PR test history. The magic incantation to run this job again is
@codablock have you fixed the unit test too?
Jenkins GCE e2e failed for commit c938794e60b60dde81a8767b59ff2cfbf77b6312. Full PR test history. The magic incantation to run this job again is
Force-pushed from c938794 to 13a2bc8
@rootfs Unit tests should be fixed now. I also removed the extended flags from all the bash scripts to be more consistent.
The root cause of #30752 appears to be an alignment issue in Azure. If that is the case, is this PR still necessary?
@saad-ali Alignment turned out not to be the cause of #30752. As we don't even use partitions in the Azure dynamic disk provisioning case, there is no chance of unaligned partitions. I also ran some performance tests to confirm that we don't have alignment issues. Please see #30752 (comment) for detailed information on this.
@codablock thanks for the detailed examination of the issue. @saad-ali this looks ok to me and I think we should go ahead with it. Any concerns before I LGTM? Thanks
I'm going to LGTM this. We can roll it back if @saad-ali has any further concerns. /lgtm
@brendandburns Thanks for the LGTM :) Btw, is there any documentation available that describes the whole CI and merge process of k8s?
@k8s-bot kops aws e2e test this |
Jenkins kops AWS e2e failed for commit 13a2bc8. Full PR test history. cc @codablock The magic incantation to run this job again is Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Saw various timeouts in several test cases. Re-kicking to see if it's temporary. @k8s-bot kops aws e2e test this |
@k8s-bot test this |
Jenkins verification failed for commit 13a2bc8. Full PR test history. cc @codablock The magic incantation to run this job again is Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@k8s-bot verify test this |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
@saad-ali Can we get this one cherry-picked to 1.5.x as well? |
ping @saad-ali for a 1.5.x cherry-pick. |
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Ah, I didn't realize this had been tagged... I went ahead and manually sent the PR using the
edit: Oh, no, I guess the cherry-pick just happened to get merged right around the same time I opened an automated cherry-pick of my own...
What this PR does / why we need it: It enables lazy inode table and journal initialization in ext3 and ext4.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #30752, fixes #30240
Release note:
Special notes for your reviewer:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
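As a rough illustration, the change amounts to dropping the extended options from the mkfs argument list. The sketch below is hypothetical (the helper name and structure are made up; the real kubelet mount code is organized differently), but the -E lazy_itable_init=0,lazy_journal_init=0 flags are the standard mke2fs extended options the legacy script passed:

```go
package main

import "fmt"

// mkfsArgs assembles an argument list for mkfs.ext4. Hypothetical helper
// for illustration only. Before this PR, the extended options below forced
// eager initialization of the inode table and journal; dropping them
// restores the mkfs defaults (lazy initialization).
func mkfsArgs(device string, forceEagerInit bool) []string {
	args := []string{"-F"} // -F: force, don't ask for confirmation
	if forceEagerInit {
		// The flags this PR removes (standard mke2fs extended options).
		args = append(args, "-E", "lazy_itable_init=0,lazy_journal_init=0")
	}
	return append(args, device)
}

func main() {
	fmt.Println(mkfsArgs("/dev/sdc", false)) // after this PR:  [-F /dev/sdc]
	fmt.Println(mkfsArgs("/dev/sdc", true))  // before this PR: [-F -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdc]
}
```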
These extended options come from a script that was historically located at /usr/share/google/safe_format_and_mount and was later ported to Go so that the dependency on the script could be removed. After some searching, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount
Checking the history of this script, I found the commit Disable lazy init of inode table and journal.. This commit introduces the extended flags with this description:
The problem is that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so it may be different for SSDs on these cloud providers. The result is that this "performance optimization" dramatically increases the time needed to format a disk in such cases.
When mkfs.ext4 is told not to lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance here depends heavily on free memory, and the writes also consume all of that free memory for write caching, degrading the performance of everything else in the system.
As noted in #30752, there is also something inside kubelet that further degrades the performance of all this. It's not exactly known what it is, but I'd assume it has something to do with cgroups throttling IO or memory.
I checked the kernel code for lazy inode table initialization. The nice thing is that the kernel also performs the guaranteed-zeroing-on-discard check. If zeroing is guaranteed, the kernel uses discard for the lazy initialization, which should finish in just a few seconds. If it is not guaranteed, it falls back to using bios, which do not require the write cache. The result is that free memory is neither required nor touched, so performance is maxed and the rest of the system does not suffer.
As the original reason for disabling lazy init was a performance optimization, and the kernel already performs this optimization by default (and in a much better way), I'd suggest completely removing these flags and relying on the kernel to do it in the best way.