-
-
Notifications
You must be signed in to change notification settings - Fork 866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory access out of bounds in numpy + chrome 89 #1474
Conversation
@rth There's a new intermittent failure in build-core =(
|
Thanks for trying to fix this!
Never saw this one before so far. |
I think I've identified a general (but tedious) procedure to fix these:
|
Actually I think I can add:
and replace most memory accesses in |
@rth I suspect fixing 0.17.0 + chrome 89 + numpy may be a lost cause. Question is, why are these random memory access failures only happening in numpy? Is it loaded in an unusual way? Or maybe these load/store issues aren't just in numpy but also in any C extension in a side module? How much test code do we have that uses C extension modules? I think we have good reason to believe that this chrome 89 bug does not affect the core Pyodide since that code has a lot of coverage. I think it's possible that not just numpy but all extension modules are broken. If it really is only numpy and not all extension modules that are affected, then we would need to get to the bottom of why numpy specifically. One thing that would be useful would be to go through the benchmarks one at a time and see which ones succeed / fail. That probably would provide valuable clues. |
Hmm, well we had some of those before as well e.g. in scipy #729 (comment) even before the emscripten migration (for all browsers). So it's not a completely new occurrence, just new in numpy.
Yeah, if it's widespread patching is likely not going to be enough. How about emscripten or LLVM options for instance to prevent inlining https://github.com/emscripten-core/emscripten/blob/e91b2274b84cafcad05382f77e4c4b6e1ddc354d/src/settings.js#L292. Those can be added for numpy specifically (in meta.yaml) or for all packages.
We can run package test suites inside pyodide in REPL. For instance,
to run tests for the
So it doesn't look like all C extensions are broken. For numpy better to run those tests submodule by submodule (e.g. on Edit: for instance in Firefox,
in Chrome, the browser tab crashes for dev version so I probably need to rebuild with debug symbols as you did. |
Well that's good =)
Ideally it'd be nice to run occasional tests on other browsers too. It'd be great to have a larger test suite that would run say every two weeks but only if there's been a new commit. Sounds annoying to set up though. |
I think the core test suite should likely catch these bugs we've been seeing. |
One thing that is more worrisome about this than say #729 is that the errors in #729 are at If the troubles are only in |
Unfortunately these errors are not problems with inlining. The crashes are happening with stuff like: for(i = 0; i < len; i++){
max = MAX(array[i], max);
}
for(i = 0; i < len; i++){
x = array[i]; // crash here
// do something with x
} |
@rth The docs say:
Is there a reason we couldn't use pip to install numpy in the Dockerfile and then just use that for the host numpy rather than building it? I think this could speed up build packages by a lot and also fix all those |
It might work, but I think we still need to patch the configuration pyodide/packages/numpy/meta.yaml Line 30 in 4c315c7
In any case, #1476 would be a solution to that build issue I think. |
@rth Okay this fixes it! (Benchmarks passed in test suite.) |
Looks like there is one more segfault in the chrome packages test suite, this one in E conftest.JavascriptException: RuntimeError: memory access out of bounds
E at __static_1 (<anonymous>:wasm-function[174]:0x17748)
E at byn$fpcast-emu$__static_1 (<anonymous>:wasm-function[506]:0x87748)
E at __static_2 (<anonymous>:wasm-function[175]:0x17eca)
E at __static_199 (<anonymous>:wasm-function[339]:0x538d5)
E at byn$fpcast-emu$__static_199 (<anonymous>:wasm-function[704]:0x881f7)
E at __static_254 (http://127.0.0.1:35323/pyodide.asm.wasm:wasm-function[900]:0x1d3c92)
E at byn$fpcast-emu$__static_254 (http://127.0.0.1:35323/pyodide.asm.wasm:wasm-function[21695]:0x7bc48e)
E at _PyObject_GenericGetAttrWithDict (http://127.0.0.1:35323/pyodide.asm.wasm:wasm-function[1712]:0x2183aa)
E at PyObject_GenericGetAttr (http://127.0.0.1:35323/pyodide.asm.wasm:wasm-function[1711]:0x21815f) |
Wonderful , great work @hoodmane !
I'll try to run the tests numpy test suite later and will report how it goes. It might be worth adding a comment on how you compute those offsets. |
Working on cleaning things up now. |
Okay I think this is ready to merge when the CI finishes. |
|
The good news is as far as I can tell this issue (or at least the original one) is not happening with Chrome 91 (and possibly 90) and with browser auto-update, most users actually seem to use the latest Chrome version. So 6 month after 91 release we could very well set it as minimum requirement and remove these patches. Spend some more time digging through the Chrome issue tracker, and opened a new issue https://bugs.chromium.org/p/chromium/issues/detail?id=1200031 to make sure it doesn't go unreported (if it wasn't already fixed) |
The code looks fine to me, thanks for the documentation effort. Let me just run a few more numpy tests before merging. |
Thanks, that clearly needed to be done. I added some info there. |
@rth We good to merge this now? |
Run a few numpy broadcasting tests, to check the soundness of the re-implementation, >>> import numpy
>>> import pytest
>>> pytest.main(['--pyargs', 'numpy', '--continue-on-collection-errors', '-k', 'broadcast'])
The 3 failures are due to ctypes. To measurable impact on the overall performance either. Great work @hoodmane ! |
Still crashed in Chrome 90.0.4430.72.
|
Using |
OK, you're right. I was using the old version code. Now, after I've pulled the newest code from git, no crash appears any more. |
Great! Thanks for checking @daoxian ! |
This is an attempt to fix #1473.