-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory corruption in 0.9.3 #183
Comments
To explain a bit more what I meant by it being invalid, this line: Line 30 in 819b7f6
is taking something (presumably an integer?) emitted from a C library and then trying to interpret it as an struct SpglibReturnCode
data::Union{Variant{:SUCCESS, (), Tuple{}}, Variant{:SPACEGROUP_SEARCH_FAILED, (), Tuple{}}, ...}
end where each of those function SpglibReturnCode(i::Int)
if i == 0
SUCCESS
elseif i == 1
SPACEGROUP_SEARCH_FAILED
elseif ...
...
else
error("invalid return code")
end
end and then you'd have a sum type you can work with I guess. |
Many thanks Mason, and also @singularitti for providing this very useful package! |
Thank you for reporting this issue @kbarros and thank you for your guidance @MasonProtter! I probably misunderstood how to use your package. However, I am not sure what the bug is: the code actually work for my tests. Could you shed light on me what specific example is breaking? @kbarros |
Yeah, that's because julia appears to use an julia> let r = Ref(0)
p::Ptr{SpglibReturnCode} = pointer_from_objref(r)
unsafe_load(p)
end
SUCCESS::SpglibReturnCode
julia> let r = Ref(1)
p::Ptr{SpglibReturnCode} = pointer_from_objref(r)
unsafe_load(p)
end
SPACEGROUP_SEARCH_FAILED::SpglibReturnCode
julia> let r = Ref(2)
p::Ptr{SpglibReturnCode} = pointer_from_objref(r)
unsafe_load(p)
end
CELL_STANDARDIZATION_FAILED::SpglibReturnCode
julia> let r = Ref(3)
p::Ptr{SpglibReturnCode} = pointer_from_objref(r)
unsafe_load(p)
end
SYMMETRY_OPERATION_SEARCH_FAILED::SpglibReturnCode This works locally, but it's definitely possible for this to cause problems in unexpected ways |
We at DFTK.jl are also noticing crashes since upgrading Spglib to 0.9.3 (see JuliaMolSim/DFTK.jl#947 (comment)). So the issues is not specific to @kbarros 's example. |
@singularitti, I have reached out on the Julia Slack server. Id be happy to work together with you to resolve this, since it might be affecting quite a lot of users. |
This fork has contains the fix proposed by @MasonProtter: https://github.com/kbarros/Spglib.jl. Specifically, here is the revised function: function get_error_code()
# The return value for the C function `spg_get_error_code` is a C enum
# called SpglibError. The C compiler guarantees that enums are internally
# stored as type `int`; its size is "implementation dependent". Typically 4
# or 8 bytes, but this could depend on the architecture and compiler details
# https://stackoverflow.com/a/366033/500314. Julia provides `CInt` to
# guarantee an integer that has the same size as the C `int` value. Manually
# map this enum tag to variants of the `@enum SpglibReturnCode`. The latter
# is defined as a Union using the Julia SumTypes library, and its
# representation details are also implementation dependent (e.g., the "tag"
# of this Union need not have the same size as a `CInt`). Note that a
# mistake here could cause memory corruption, as discussed in
# https://github.com/singularitti/Spglib.jl/issues/183.
code = @ccall libsymspg.spg_get_error_code()::CInt
if code == 0
return SUCCESS
elseif code == 1
return SPACEGROUP_SEARCH_FAILED
elseif code == 2
return CELL_STANDARDIZATION_FAILED
elseif code == 3
return SYMMETRY_OPERATION_SEARCH_FAILED
elseif code == 4
return ATOMS_TOO_CLOSE
elseif code == 5
return POINTGROUP_NOT_FOUND
elseif code == 6
return NIGGLI_FAILED
elseif code == 7
return DELAUNAY_FAILED
elseif code == 8
return ARRAY_SIZE_SHORTAGE
elseif code == 9
return NONE
end
end CI was already not passing, but I verified that this change does not cause any new test failures. I can't reproduce the crash on my Mac. |
It's not ready yet, I need to update other places where |
Given that this package is only using these return codes to pass values back and forth from a C library, it seems better to just revert the change and use |
Agreed. Updated my fork to remove SumTypes and label return types explicitly. Please see: https://github.com/kbarros/Spglib.jl/blob/main/src/error.jl Note that Spglib <= v0.9.2 also seemed to be dependent on undefined behavior (casting Cint into a Julia |
Thank you for your help @kbarros @MasonProtter! And I apologize all the trouble this bug have caused @CedricTravelletti. The only reason I want to use SumTypes.jl is that I want to switch to a mechanism where |
Testing the fork is tricky because the memory corruption only causes problems on certain machines (not my Mac, for example). We have just now tested the fork on a colleague's machine and verified that it seems to work. I will open a pull request. (Now opened: #184). @CedricTravelletti would you like to test the fork before merging? |
Resolves crashes from memory corruption: singularitti/Spglib.jl#183.
We have noticed intermittent crashes since upgrading from Spglib 0.9.2 to 0.9.3. Looks like the only real difference between these versions is the switch from enums to SumTypes: 819b7f6.
It appears that there is memory corruption when reading an spglib_jll response into a sum type, and this has been confirmed @MasonProtter on the SumTypes repo: MasonProtter/SumTypes.jl#71
The error could appear as a segfault or some other error. First found on our Github actions CI with Julia 1.9.4, Ubuntu, x86: https://github.com/SunnySuite/Sunny.jl/actions/runs/7629267203/job/20782289111?pr=217
I haven't seen the crashes yet on my Mac. On Linux/x86, reproducing might be as simple as
] add Sunny#spglib_crash
and thenusing Sunny
.The text was updated successfully, but these errors were encountered: