-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] Initial Apple Silicon support #5891
Conversation
On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures.
… aarch64 architectures).
@tracysh , can you share some thought about the MLAS cmake change, I personally prefer adding a MacOS ARM64 in the cmake file, instead of detecting from the dumpout. Also, The failures in Unit test is caused the incorrect build script, or the arch difference? |
The build change looks okay, but Apple has tweaked the calling convention from the standard. @Craigacp, can you try modifying onnxruntime\core\mlas\lib\aarch64\QgemmU8X8KernelNeon.S to change line 26 to the below? This packs the parameters per Apple's changes. #if defined(APPLE) https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms |
@tracysh nope, that didn't work, the same tests fail. |
There's a lot of output from the errors, the start of it looks like this:
I attached the rest of the build log. |
@Craigacp, just to double-check: if you have just the below line: .equ .LGemmU8X8KernelFrame_ZeroMode, 12 The code I pasted above lost the underscores from the |
Yep I just finished running it without the ifdef and it works fine, give me a min to recheck it with the ifdef & underscores. |
All tests pass with that fix. I formatted your fix so it fits in with the rest of the .S file. Also it builds really fast on this computer. |
@Craigacp , did you build your project with --use_xcode, or not? |
My build command line is |
With
|
This specific issue could be solved by buildsytem=1, but there will be more other issues. |
/azp run Linux CPU CI Pipeline,orttraining-linux-ci-pipeline,centos7_cpu,Windows GPU TensorRT CI Pipeline,Windows GPU CI Pipeline |
Azure Pipelines successfully started running 4 pipeline(s). |
/azp run MacOS CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline, |
Azure Pipelines successfully started running 1 pipeline(s). |
Looks like the Mac test was cancelled? Windows doesn't seem to be setting |
I've updated the Java CMake architecture detection logic. To be honest it would be nice if something in CMake would consistently set the platform and architecture so I didn't have to borrow the logic from |
/azp run MacOS CI Pipeline |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux CPU CI Pipeline,orttraining-linux-ci-pipeline,centos7_cpu,Windows GPU TensorRT CI Pipeline,Windows GPU CI Pipeline |
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline, |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Linux CPU CI Pipeline |
/azp run Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp help |
Supported commands
See additional documentation. |
/azp run |
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
/azp run Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS NoContribops CI Pipeline |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-ci-pipeline |
Azure Pipelines successfully started running 4 pipeline(s). |
* Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon. On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures. * Adding Java side library loading support for Apple Silicon (and other aarch64 architectures). * Adding Qgemm fix from @tracysh * Fixes the java packaging on Windows. * Missed a check in the java platform detector. (cherry picked from commit 8b83c51)
* Update onnx (#5720) * update onnx * update docker image for testing (cherry picked from commit 705d093) * cherry pick PR 5720 * C#: Add CreateFromMemory to FixedBufferOnnxValue to allow bind user buffers and pass custom binary compatible types (#5886) Add CreateFromMemory to FixedBufferOnnxValue so users can bind their own custom binary compatible buffers to feed/fetch data. (cherry picked from commit c2d6100) * [Java] Initial Apple Silicon support (#5891) * Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon. On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures. * Adding Java side library loading support for Apple Silicon (and other aarch64 architectures). * Adding Qgemm fix from @tracysh * Fixes the java packaging on Windows. * Missed a check in the java platform detector. (cherry picked from commit 8b83c51) * Add OpenVINO EP shared lib to Py Wheel (#5920) * Add OpenVINO EP shared lib to Py Wheel Include the libonnxruntime_providers_openvino.so/.dll to the wheel * Follow libs.extend pattern as other EPs (cherry picked from commit 4092686) * Make NNAPI EP reject nodes with no-shape inputs (#5927) (cherry picked from commit 8736865) * Sahar/fix documentation shared lib (#5926) * Update OpenVINO-ExecutionProvider.Md update openvino-executionprovider.md for shared library * Update Build.md updated --build_shared_lib flag for building openvino shared provider lib * Update Dockerfile.openvino building for shared library with the new changes for openvino shared lib * Revert "Update Build.md" This reverts commit c9cf5fe. * Revert "Update Dockerfile.openvino " This reverts commit e1624e4. * Update OpenVINO-ExecutionProvider.md fix documentation to the shared library Co-authored-by: sfatimar <sahar.fatima@intel/com> (cherry picked from commit 8168c91) * Update dockerfiles (#5929) 1. Remove conda from the images. Because conda contains a file named /opt/miniconda/lib/libcrypto.so.1.0.0 which can't pass our security scan. Also, it will be easier for us to manage the third party usage registrations. 2. Remove openssh from the images. Because the official openssh package provided by Ubuntu can't pass our security scan. 3. Reduce the image size to 1/3 by using stages. Also, because it contains less packages, it will be less often needed to update. 4. Put the LICENSE-IMAGE.txt file in right place. It is missed in current images. You can see it was added to a temp folder "/code" but it got deleted afterwards. 5. Update the CPU docker image's base image to Ubuntu 18.04. The GPU one is already 18.04. It's better to keep them the same. 6. Remove the build arg ONNXRUNTIME_REPO/ONNXRUNTIME_BRANCH. Instead, the new one always uses the local source. I feel it can reduce confusion. (cherry picked from commit 1dbabb2) * Add Longformer Attention Cuda Op(#5932) Limitation: Global tokens must be at the beginning of sequence. (cherry picked from commit 31a6be3) * Bug fix for MaskRCNN and FasterRCNN (#5935) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> (cherry picked from commit e39e82b) * Fix publishing pipelines. (#5942) Fix publishing pipelines. (cherry picked from commit c4b55d2) * Fix Python Linux GPU package name (#5943) Fix Python Linux GPU package name. I accidentally added "noopenmp" to it. (cherry picked from commit 5fdd9f0) * Update BUILD.md with shared provider information (#5944) * Update build instructions to include information about shared providers (cherry picked from commit 27513d1) * [OpenVINO]Fix memory leak in `IsDebugEnabled()` under Windows (#5948) * w * w Co-authored-by: modav <modav@microsoft.com> (cherry picked from commit e207589) * Add support for Python 3.8+ on Windows when CUDA is enabled (#5956) (cherry picked from commit 015fbb3) * Support the cross compiling for Apple Silicon (#5974) * support macos_arm64 cross compiling * update the build docs * update as commented. * Update BUILD.md (cherry picked from commit 2ec211e) * Update docker files to put 'unattended-upgrades' in a right place(#5983) (cherry picked from commit 3323fb6) * Enable the xcode build for Apple Silicon (arm64 MacOS) (#5924) * fix the build script for macos/xcode * add the version check * correct the osx-arch configuration * typo (cherry picked from commit 1852ade) * Add python 3.9 support (#5874) 1. Add python 3.9 support(except Linux ARM) 2. Add Windows GPU python 3.8 to our packaging pipeline. * Revert some pipeline changes in #5874 Co-authored-by: Ashwini Khade <askhade@microsoft.com> Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Adam Pocock <craigacp@gmail.com> Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Maajid khan <n.maajidkhan@gmail.com> Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com> Co-authored-by: Moshe David <mosdav165@gmail.com> Co-authored-by: Ivan Stojiljkovic <17503404+ivanst0@users.noreply.github.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Description:
The Java library loader needed support adding for aarch64 binaries, and the cmake update to put those binaries in the right place.
I also rearranged the CPU architecture detection in
cmake/onnxruntime_mlas.cmake
, as Apple Silicon Macs report their architecture asarm64
which was previously caught by the 32-bit ARM regex.For reference the
clang -dumpmachine
test produces this output:Note this PR makes the Java tests pass on Apple Silicon, however there are test failures in the core library:
I've not looked into these yet, and I suspect it's because of some issues in the asm which I'll need some help to figure out.
Motivation and Context