Enable lazy initialization of ext3/ext4 filesystems #38865
Conversation
Hi @codablock. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.
CC @colemickens
@k8s-bot ok to test
lgtm, this is consistent with the mkfs.ext4 default.
@rootfs any chance to get this into 1.5.x as well? As you know, Azure is affected by this, and it makes using large volumes very difficult.
would love it in 1.5 once merged into 1.6
Jenkins unit/integration failed for commit c938794e60b60dde81a8767b59ff2cfbf77b6312. Full PR test history. The magic incantation to run this job again is
@codablock have you fixed the unit test too?
Jenkins GCE e2e failed for commit c938794e60b60dde81a8767b59ff2cfbf77b6312. Full PR test history. The magic incantation to run this job again is
Force-pushed from c938794 to 13a2bc8
@rootfs Unit tests should be fixed now. I also removed the extended flags from all the bash scripts to be more consistent.
The root cause of #30752 appears to be an alignment issue in Azure. If that is the case, is this PR still necessary?
@saad-ali Alignment turned out not to be the cause of #30752. As we don't even use partitions in the Azure dynamic disk provisioning case, there is no chance of unaligned partitions. I also ran some performance tests to confirm that we don't have alignment issues. Please see #30752 (comment) for detailed information on this.
@codablock thanks for the detailed examination of the issue. @saad-ali this looks ok to me and I think we should go ahead with it. Any concerns before I LGTM? Thanks
I'm going to LGTM this. We can roll it back if @saad-ali has any further concerns. /lgtm
@brendandburns Thanks for the LGTM :) Btw, is there any documentation available that describes the whole CI and merge process of k8s?
@k8s-bot kops aws e2e test this |
Jenkins kops AWS e2e failed for commit 13a2bc8. Full PR test history. cc @codablock The magic incantation to run this job again is Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Saw various timeouts in several test cases. Re-kicking to see if it's temporary. @k8s-bot kops aws e2e test this |
@k8s-bot test this |
Jenkins verification failed for commit 13a2bc8. Full PR test history. cc @codablock The magic incantation to run this job again is Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@k8s-bot verify test this |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
@saad-ali Can we get this one cherry-picked to 1.5.x as well? |
ping @saad-ali for a 1.5.x cherry-pick. |
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Ah, I didn't realize this had been tagged... I went ahead and manually sent the PR using the
edit: Oh, no, I guess the cherry-pick just happened to get merged right around the same time I opened an automated cherry-pick of my own...
What this PR does / why we need it: It enables lazy inode table and journal initialization in ext3 and ext4.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #30752, fixes #30240
Release note:
Special notes for your reviewer:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
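As a rough illustration, the change amounts to dropping the extended options from the mkfs argument list. The sketch below is hypothetical (the helper name and structure are made up; the real kubelet mount code is organized differently), but the -E lazy_itable_init=0,lazy_journal_init=0 flags are the standard mke2fs extended options the legacy script passed:

```go
package main

import "fmt"

// mkfsArgs assembles an argument list for mkfs.ext4. Hypothetical helper
// for illustration only. Before this PR, the extended options below forced
// eager initialization of the inode table and journal; dropping them
// restores the mkfs defaults (lazy initialization).
func mkfsArgs(device string, forceEagerInit bool) []string {
	args := []string{"-F"} // -F: force, don't ask for confirmation
	if forceEagerInit {
		// The flags this PR removes (standard mke2fs extended options).
		args = append(args, "-E", "lazy_itable_init=0,lazy_journal_init=0")
	}
	return append(args, device)
}

func main() {
	fmt.Println(mkfsArgs("/dev/sdc", false)) // after this PR:  [-F /dev/sdc]
	fmt.Println(mkfsArgs("/dev/sdc", true))  // before this PR: [-F -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdc]
}
```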
These extended options come from a script that was historically located at /usr/share/google/safe_format_and_mount and was later ported to Go so that the dependency on the script could be removed. After some searching, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount
Checking the history of this script, I found the commit Disable lazy init of inode table and journal.. This commit introduces the extended flags with this description:
The problem is that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so it may be different for SSDs on these cloud providers. The result is that this "performance optimization" dramatically increases the time needed to format a disk in such cases.
When mkfs.ext4 is told not to lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance here depends heavily on free memory, and the writes also consume all of that free memory for write caching, degrading the performance of everything else in the system.
As noted in #30752, there is also something inside kubelet that further degrades the performance of all this. It's not exactly known what it is, but I'd assume it has something to do with cgroups throttling IO or memory.
I checked the kernel code for lazy inode table initialization. The nice thing is that the kernel also performs the guaranteed-zeroing-on-discard check. If zeroing is guaranteed, the kernel uses discard for the lazy initialization, which should finish in just a few seconds. If it is not guaranteed, it falls back to using bios, which do not require the write cache. The result is that free memory is neither required nor touched, so performance is maxed and the rest of the system does not suffer.
As the original reason for disabling lazy init was a performance optimization, and the kernel already performs this optimization by default (and in a much better way), I'd suggest completely removing these flags and relying on the kernel to do it in the best way.