Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing hang in archive.CopyFileWithTar with invalid dst #17088

Merged
merged 1 commit into from
Oct 20, 2015

Conversation

swernli
Copy link
Contributor

@swernli swernli commented Oct 15, 2015

@jhowardmsft @jstarks @vbatts @tiborvass

archive.CopyFileWithTar can hit a hang if the destination path is invalid (either an invalid path name or an inaccessible folder), because archive.Untar will return the error without closing the Reader pipe but after CopyFileWithTar has started the go func that is waiting on the pipe. The fix here is to check the error of archive.Untar and if it failed, explicitly close the pipe to unblock the thread waiting on it.

This issue is hard to hit in Linux, since the unit tests run as root and there aren't any invalid directory names we could give, but in Windows simply including "ADD myfile.txt /invalid:directory/" in a Dockerfile would cause the build to hang and the handles to the files in the layer to be leaked when the daemon is killed.

Signed-off-by: Stefan J. Wernli swernli@microsoft.com

@LK4D4
Copy link
Contributor

LK4D4 commented Oct 15, 2015

LGTM

@unclejack
Copy link
Contributor

@swernli Can we also have some tests, please? It looks like it's a severe enough issue to make it worth the effort to add a test.

@tiborvass tiborvass added this to the 1.9.0 milestone Oct 15, 2015
@tiborvass tiborvass self-assigned this Oct 15, 2015
@tiborvass
Copy link
Contributor

@swernli I agree with @unclejack, if you could add a test that'd be fantastic.

@swernli
Copy link
Contributor Author

swernli commented Oct 15, 2015

@unclejack @tiborvass I spent a day or so working on a test, but found that the issue is much harder to repro in Linux than on Windows, and these tests do not yet run on Windows. Perhaps I can upload what I have and you can advise the right pattern to use?

The issue is that in order to get the problem to repro, the destination of the copy has to be invalid somehow. Linux doesn't have the strict folder naming that Windows does, so that leaves trying to copy into a folder that is inaccessible. Unfortunately, the tests appear to run as root, meaning I can't really create a folder that I can't access. I was going to look for a way to set the user ID to non-root for this part of the test to see if that would work.

Another concern is that when the failure does occur, since it causes a deadlock, go sees that no threads are progressing, and then crashes the test with a stack trace instead of giving a more reasonable test failure. Is this acceptable in a CI test? The output on my machine looks weird, since I only see messages for the previous test and then suddenly a crash, without even a clear indication that it was my test that caused it.

@swernli swernli force-pushed the sjw/archive_hang_fix branch 3 times, most recently from 79a050a to bc4dbfb Compare October 16, 2015 00:59
@swernli
Copy link
Contributor Author

swernli commented Oct 16, 2015

@unclejack @tiborvass Ok, I think I finally have a unit test that accurately reproduces the bug without my changes and confirms the fix with my changes. Any feedback on the test would be appreciated!

@swernli swernli force-pushed the sjw/archive_hang_fix branch from bc4dbfb to dea15da Compare October 16, 2015 01:04
@calavera
Copy link
Contributor

@swernli it looks like that test doesn't pass on janky 😔

@swernli
Copy link
Contributor Author

swernli commented Oct 16, 2015

@calavera Are the tests run as root? I'm trying to mark a directory as immutable using chattr, but it seems like that is the step that's failing. This works fine if I run the tests manually, but I'm not familiar with the environment the tests run in for janky. Should I expect chattr to work there?

@swernli swernli force-pushed the sjw/archive_hang_fix branch 2 times, most recently from bf2e44f to 6fba626 Compare October 16, 2015 20:54
@swernli
Copy link
Contributor Author

swernli commented Oct 16, 2015

@tiborvass @calavera I really need to get more clarity on how these tests are run in order to get a working test. I need to be able to create a directory that I cannot copy into. The solutions I've tried that work locally are creating the folder with 0000 permissions (which only works as non-root users, since root has access to the folder anyway) and modifying the folder using chattr so that it is immutable (which fails in your test environment either because chattr is unavailable or blocked). So far every strategy that I've tried for testing this only works locally, but not in your test environment.

Since this is much easier to repro on Windows (simply pick an invalid directory name, like "not:valid") perhaps this is a unit test better left for when we actually have Windows tests for archive?

@swernli swernli changed the title Fixing hang in archive.CopyWithTar with invalid dst Fixing hang in archive.CopyFileWithTar with invalid dst Oct 16, 2015
@calavera
Copy link
Contributor

Since this is much easier to repro on Windows (simply pick an invalid directory name, like "not:valid") perhaps this is a unit test better left for when we actually have Windows tests for archive?

that makes sense, we can move that test to pkg/archive/archive_windows_test.go and leave it there building only for windows.

@tiborvass
Copy link
Contributor

Yeah that makes sense.

Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
@swernli swernli force-pushed the sjw/archive_hang_fix branch from 6fba626 to a150eee Compare October 19, 2015 18:56
@swernli
Copy link
Contributor Author

swernli commented Oct 19, 2015

@tiborvass @calavera Ok, what I've done instead is add the test only in archive_windows_test.go. Right now the tests in this folder don't actually compile on Windows, but if I comment enough of them out so that this one actually gets run, it correctly tests the scenario. When we eventually get to this folder in our Windows CI work, we'll already have at least one working test in place!

@calavera
Copy link
Contributor

That sounds perfect. The more we can work towards that goal the better 😸

LGTM

jessfraz pushed a commit that referenced this pull request Oct 20, 2015
Fixing hang in archive.CopyFileWithTar with invalid dst
@jessfraz jessfraz merged commit 98c01c2 into moby:master Oct 20, 2015
@swernli
Copy link
Contributor Author

swernli commented Oct 20, 2015

Thanks!

@tiborvass tiborvass removed their assignment Oct 22, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants