-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass force=true option to rename in case of loading failure in compilecache #36638
Conversation
err = ccall(:jl_fs_rename, Int32, (Cstring, Cstring), src, dst) | ||
# on error, default to cp && rm | ||
if err < 0 | ||
# force: is already done in the mv function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what is meant by this comment? I think this looks like a very old vestige, when this function got split?
@@ -1338,7 +1338,7 @@ function compilecache(pkg::PkgId, path::String) | |||
chmod(tmppath, filemode(path) & 0o777) | |||
|
|||
# this is atomic according to POSIX: | |||
rename(tmppath, cachefile) | |||
rename(tmppath, cachefile; force=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could simply be changed to mv(tmppath, cachefile; force = true)
, and then the changes to rename()
are unnecessary.
But I'm not too happy with this solution because the reason this was added in the first place was for distributed execution on a shared filesystem, so that processes wouldn't error out because of racing to create the cache. With this fix, every process will overwrite the cache file -- the intended behavior was for all the processes that lost the race to silently fail to rename their temporary cache to the primary cache.
I was thinking of adding the following before the finally
:
catch e
if !isa(e, ArgumentError)
throw(e)
end
Or something like that but it doesn't matter much I suppose; we don't save on the system calls with the intended way and rename
is atomic so it shouldn't break anything.
Any thoughts on this @KristofferC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kpamnany Note the current error in the linked issue does not involve distributed execution. The error crops up upon any recompilation of packages, which makes it impossible to use Pkg
on master without manually deleting the procompilation files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand. The swiftest fix is to change the rename
to mv
with force = true
as I said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, might be best if you could open a PR to discuss your proposed improvement over this stop-gap solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On inspection, mv()
uses checkfor_mv_cp_cptree()
before rename()
and that function does an rm()
, thus destroying the atomicity of rename()
which is what we wanted in the first place. So your fix here is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to merge this in the meantime because I can't currently develop and use Julia packages on master with this bug.
It's probably worthwhile to explore something like in #36638 (comment) in a separate PR
…ecache (JuliaLang#36638) This seems to be affecting Windows systems in particular
As noted in #41584 and https://discourse.julialang.org/t/safe-overwriting-of-files/117758/3 `mv` is usually expected to be "best effort atomic". Currently calling `mv` with `force=true` calls `checkfor_mv_cp_cptree(src, dst, "moving"; force=true)` before renaming. `checkfor_mv_cp_cptree` will delete `dst` if exists and isn't the same as `src`. If `dst` is an existing file and julia stops after deleting `dst` but before doing the rename, `dst` will be removed but will not be replaced with `src`. This PR changes `mv` with `force=true` to first try rename, and only delete `dst` if that fails. Assuming file system support and the first rename works, julia stopping will not lead to `dst` being removed without being replaced. This also replaces a stopgap solution from #36638 (comment)
…ng#55384) As noted in JuliaLang#41584 and https://discourse.julialang.org/t/safe-overwriting-of-files/117758/3 `mv` is usually expected to be "best effort atomic". Currently calling `mv` with `force=true` calls `checkfor_mv_cp_cptree(src, dst, "moving"; force=true)` before renaming. `checkfor_mv_cp_cptree` will delete `dst` if exists and isn't the same as `src`. If `dst` is an existing file and julia stops after deleting `dst` but before doing the rename, `dst` will be removed but will not be replaced with `src`. This PR changes `mv` with `force=true` to first try rename, and only delete `dst` if that fails. Assuming file system support and the first rename works, julia stopping will not lead to `dst` being removed without being replaced. This also replaces a stopgap solution from JuliaLang#36638 (comment)
This fixes #36621 for me
What's happening is that the rename via
jl_fs_rename
fails and then we have acp
failure sinceforce=true
is not passed.