[Java] Initial Apple Silicon support #5891

Craigacp · 2020-11-22T19:00:33Z

Description:
The Java library loader needed support adding for aarch64 binaries, and the cmake update to put those binaries in the right place.

I also rearranged the CPU architecture detection in cmake/onnxruntime_mlas.cmake, as Apple Silicon Macs report their architecture as arm64 which was previously caught by the 32-bit ARM regex.

For reference the clang -dumpmachine test produces this output:

$ clang -dumpmachine
arm64-apple-darwin20.1.0

Note this PR makes the Java tests pass on Apple Silicon, however there are test failures in the core library:

1: [==========] 2265 tests from 189 test suites ran. (30822 ms total)
1: [  PASSED  ] 2245 tests.
1: [  FAILED  ] 20 tests, listed below:
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test_bias
1: [  FAILED  ] DynamicQuantizeMatMul.UInt8_test
1: [  FAILED  ] DynamicQuantizeMatMul.UInt8_test_bias
1: [  FAILED  ] MatMulIntegerToFloat.Int8_test
1: [  FAILED  ] MatMulIntegerToFloat.Int8_bias_test
1: [  FAILED  ] MatMulIntegerToFloat.UInt8_test
1: [  FAILED  ] MatMulIntegerToFloat.UInt8_bias_test
1: [  FAILED  ] QAttentionTest.QAttentionBatch2
1: [  FAILED  ] QAttentionTest.QAttentionPastState_u8u8
1: [  FAILED  ] QAttentionTest.QAttentionPastState_u8s8
1: [  FAILED  ] DynamicQuantLSTMTest.SmallSize
1: [  FAILED  ] DynamicQuantLSTMTest.LargeSize
1: [  FAILED  ] MatmulIntegerOpTest.MatMulInteger_WithZero_ZeroPoint
1: [  FAILED  ] MatmulIntegerOpTest.MatMulInteger_Uint8_Int8_Scalar
1: [  FAILED  ] MatmulIntegerOpTest.MatMulInteger_Uint8_Int8_GEMV
1: [  FAILED  ] ConvIntegerTest.WithGroup_2D
1: [  FAILED  ] ConvIntegerTest.WithPadding_3D
1: [  FAILED  ] ConvIntegerTest.WithStride2_2D
1: [  FAILED  ] ConvIntegerTest.WithStride3_2D

I've not looked into these yet, and I suspect it's because of some issues in the asm which I'll need some help to figure out.

Motivation and Context

Why is this change required? What problem does it solve? Makes ONNX Runtime work on my M1 Macbook Pro.

On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures.

… aarch64 architectures).

wenbingl · 2020-11-23T19:02:26Z

@tracysh , can you share some thought about the MLAS cmake change, I personally prefer adding a MacOS ARM64 in the cmake file, instead of detecting from the dumpout.

Also, The failures in Unit test is caused the incorrect build script, or the arch difference?

tracysh · 2020-11-23T19:23:27Z

The build change looks okay, but Apple has tweaked the calling convention from the standard. @Craigacp, can you try modifying onnxruntime\core\mlas\lib\aarch64\QgemmU8X8KernelNeon.S to change line 26 to the below? This packs the parameters per Apple's changes.

#if defined(APPLE)
.equ .LGemmU8X8KernelFrame_ZeroMode, 12
#else
.equ .LGemmU8X8KernelFrame_ZeroMode, 16
#endif

https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms

Craigacp · 2020-11-23T21:29:32Z

@tracysh nope, that didn't work, the same tests fail.

Craigacp · 2020-11-23T22:06:20Z

There's a lot of output from the errors, the start of it looks like this:

1: [----------] 5 tests from DynamicQuantizeMatMul
1: [ RUN      ] DynamicQuantizeMatMul.Int8_test
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 339801.45132279396, which exceeds threshold, where
1: expected[i] evaluates to -339800,
1: output[i] evaluates to 1.4513227939605713, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 358001.86117720604, which exceeds threshold, where
1: expected[i] evaluates to 358003.3125,
1: output[i] evaluates to 1.4513227939605713, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test (12 ms)
1: [ RUN      ] DynamicQuantizeMatMul.Int8_test_bias
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 394420.71676766872, which exceeds threshold, where
1: expected[i] evaluates to -394419.40625,
1: output[i] evaluates to 1.3105176687240601, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 299807.25198233128, which exceeds threshold, where
1: expected[i] evaluates to 299808.5625,
1: output[i] evaluates to 1.3105176687240601, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test_bias (13 ms)

I attached the rest of the build log.
build_log.txt

tracysh · 2020-11-23T22:29:51Z

@Craigacp, just to double-check: if you have just the below line:

.equ .LGemmU8X8KernelFrame_ZeroMode, 12

The code I pasted above lost the underscores from the "__APPLE__".

Craigacp · 2020-11-23T22:31:01Z

Yep I just finished running it without the ifdef and it works fine, give me a min to recheck it with the ifdef & underscores.

Craigacp · 2020-11-23T22:38:39Z

All tests pass with that fix. I formatted your fix so it fits in with the rest of the .S file.

Also it builds really fast on this computer.

wenbingl · 2020-11-23T23:01:03Z

@Craigacp , did you build your project with --use_xcode, or not?

Craigacp · 2020-11-23T23:16:38Z

My build command line is ./build.sh --update --parallel --build --test --build_java. I can rerun with the --use_xcode option.

Craigacp · 2020-11-23T23:18:25Z

With --use_xcode I get this error:

CMake Error in external/onnx/CMakeLists.txt:
  The custom command generating

    /Users/craigacp/Development/onnxruntime/build/MacOS/Debug/external/onnx/onnx/onnx-operators-ml.proto

  is attached to multiple targets:

    gen_onnx_operators_proto
    onnx_proto

  but none of these is a common dependency of the other(s).  This is not
  allowed by the Xcode "new build system".

wenbingl · 2020-11-23T23:25:15Z

With --use_xcode I get this error:

CMake Error in external/onnx/CMakeLists.txt:
  The custom command generating

    /Users/craigacp/Development/onnxruntime/build/MacOS/Debug/external/onnx/onnx/onnx-operators-ml.proto

  is attached to multiple targets:

    gen_onnx_operators_proto
    onnx_proto

  but none of these is a common dependency of the other(s).  This is not
  allowed by the Xcode "new build system".

This specific issue could be solved by buildsytem=1, but there will be more other issues.
So please merge PR after validation, and xcode issue will be addressed in the another PR.

thiagocrepaldi · 2020-11-23T23:45:09Z

/azp run Linux CPU CI Pipeline,orttraining-linux-ci-pipeline,centos7_cpu,Windows GPU TensorRT CI Pipeline,Windows GPU CI Pipeline

azure-pipelines · 2020-11-23T23:45:29Z

Azure Pipelines successfully started running 4 pipeline(s).

thiagocrepaldi · 2020-11-23T23:46:47Z

/azp run MacOS CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline

azure-pipelines · 2020-11-23T23:46:59Z

Azure Pipelines successfully started running 2 pipeline(s).

thiagocrepaldi · 2020-11-23T23:49:20Z

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,

azure-pipelines · 2020-11-23T23:49:30Z

Azure Pipelines successfully started running 1 pipeline(s).

Craigacp · 2020-11-24T01:02:31Z

Looks like the Mac test was cancelled? Windows doesn't seem to be setting X86_64 like I expected, so I'll check the logic.

Craigacp · 2020-11-24T03:06:40Z

I've updated the Java CMake architecture detection logic. To be honest it would be nice if something in CMake would consistently set the platform and architecture so I didn't have to borrow the logic from cmake/onnxruntime_mlas.cmake, especially given the MSVC code path doesn't set the same environment variables as the non-MSVC path, but that can be tidied up later.

wenbingl · 2020-11-24T17:31:34Z

/azp run MacOS CI Pipeline

azure-pipelines · 2020-11-24T17:31:43Z

Azure Pipelines successfully started running 1 pipeline(s).

wenbingl · 2020-11-24T18:06:35Z

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux CPU CI Pipeline,orttraining-linux-ci-pipeline,centos7_cpu,Windows GPU TensorRT CI Pipeline,Windows GPU CI Pipeline

azure-pipelines · 2020-11-24T18:06:40Z

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

wenbingl · 2020-11-24T18:07:41Z

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,

azure-pipelines · 2020-11-24T18:07:52Z

Azure Pipelines successfully started running 1 pipeline(s).

wenbingl · 2020-11-24T19:29:28Z

/azp run Linux CPU CI Pipeline

wenbingl · 2020-11-24T19:36:08Z

/azp run Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline

azure-pipelines · 2020-11-24T19:36:17Z

Azure Pipelines successfully started running 1 pipeline(s).

snnn · 2020-11-24T19:57:49Z

/azp help

azure-pipelines · 2020-11-24T19:57:53Z

Supported commands help: Get descriptions, examples and documentation about supported commands Example: help "command_name" list: List all pipelines for this repository using a comment. Example: "list" run: Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run. Example: "run" or "run pipeline_name, pipeline_name, pipeline_name" where: Report back the Azure DevOps orgs that are related to this repository and org Example: "where" See additional documentation.

snnn · 2020-11-24T19:57:54Z

/azp run

azure-pipelines · 2020-11-24T19:57:59Z

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

snnn · 2020-11-24T19:58:35Z

/azp run Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS NoContribops CI Pipeline

azure-pipelines · 2020-11-24T19:58:50Z

Azure Pipelines successfully started running 1 pipeline(s).

wenbingl · 2020-11-24T21:59:52Z

/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-ci-pipeline

azure-pipelines · 2020-11-24T22:00:16Z

Azure Pipelines successfully started running 4 pipeline(s).

@tracysh

* Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon. On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures. * Adding Java side library loading support for Apple Silicon (and other aarch64 architectures). * Adding Qgemm fix from @tracysh * Fixes the java packaging on Windows. * Missed a check in the java platform detector. (cherry picked from commit 8b83c51)

@tracysh

* Update onnx (#5720) * update onnx * update docker image for testing (cherry picked from commit 705d093) * cherry pick PR 5720 * C#: Add CreateFromMemory to FixedBufferOnnxValue to allow bind user buffers and pass custom binary compatible types (#5886) Add CreateFromMemory to FixedBufferOnnxValue so users can bind their own custom binary compatible buffers to feed/fetch data. (cherry picked from commit c2d6100) * [Java] Initial Apple Silicon support (#5891) * Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon. On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures. * Adding Java side library loading support for Apple Silicon (and other aarch64 architectures). * Adding Qgemm fix from @tracysh * Fixes the java packaging on Windows. * Missed a check in the java platform detector. (cherry picked from commit 8b83c51) * Add OpenVINO EP shared lib to Py Wheel (#5920) * Add OpenVINO EP shared lib to Py Wheel Include the libonnxruntime_providers_openvino.so/.dll to the wheel * Follow libs.extend pattern as other EPs (cherry picked from commit 4092686) * Make NNAPI EP reject nodes with no-shape inputs (#5927) (cherry picked from commit 8736865) * Sahar/fix documentation shared lib (#5926) * Update OpenVINO-ExecutionProvider.Md update openvino-executionprovider.md for shared library * Update Build.md updated --build_shared_lib flag for building openvino shared provider lib * Update Dockerfile.openvino building for shared library with the new changes for openvino shared lib * Revert "Update Build.md" This reverts commit c9cf5fe. * Revert "Update Dockerfile.openvino " This reverts commit e1624e4. * Update OpenVINO-ExecutionProvider.md fix documentation to the shared library Co-authored-by: sfatimar <sahar.fatima@intel/com> (cherry picked from commit 8168c91) * Update dockerfiles (#5929) 1. Remove conda from the images. Because conda contains a file named /opt/miniconda/lib/libcrypto.so.1.0.0 which can't pass our security scan. Also, it will be easier for us to manage the third party usage registrations. 2. Remove openssh from the images. Because the official openssh package provided by Ubuntu can't pass our security scan. 3. Reduce the image size to 1/3 by using stages. Also, because it contains less packages, it will be less often needed to update. 4. Put the LICENSE-IMAGE.txt file in right place. It is missed in current images. You can see it was added to a temp folder "/code" but it got deleted afterwards. 5. Update the CPU docker image's base image to Ubuntu 18.04. The GPU one is already 18.04. It's better to keep them the same. 6. Remove the build arg ONNXRUNTIME_REPO/ONNXRUNTIME_BRANCH. Instead, the new one always uses the local source. I feel it can reduce confusion. (cherry picked from commit 1dbabb2) * Add Longformer Attention Cuda Op(#5932) Limitation: Global tokens must be at the beginning of sequence. (cherry picked from commit 31a6be3) * Bug fix for MaskRCNN and FasterRCNN (#5935) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> (cherry picked from commit e39e82b) * Fix publishing pipelines. (#5942) Fix publishing pipelines. (cherry picked from commit c4b55d2) * Fix Python Linux GPU package name (#5943) Fix Python Linux GPU package name. I accidentally added "noopenmp" to it. (cherry picked from commit 5fdd9f0) * Update BUILD.md with shared provider information (#5944) * Update build instructions to include information about shared providers (cherry picked from commit 27513d1) * [OpenVINO]Fix memory leak in `IsDebugEnabled()` under Windows (#5948) * w * w Co-authored-by: modav <modav@microsoft.com> (cherry picked from commit e207589) * Add support for Python 3.8+ on Windows when CUDA is enabled (#5956) (cherry picked from commit 015fbb3) * Support the cross compiling for Apple Silicon (#5974) * support macos_arm64 cross compiling * update the build docs * update as commented. * Update BUILD.md (cherry picked from commit 2ec211e) * Update docker files to put 'unattended-upgrades' in a right place(#5983) (cherry picked from commit 3323fb6) * Enable the xcode build for Apple Silicon (arm64 MacOS) (#5924) * fix the build script for macos/xcode * add the version check * correct the osx-arch configuration * typo (cherry picked from commit 1852ade) * Add python 3.9 support (#5874) 1. Add python 3.9 support(except Linux ARM) 2. Add Windows GPU python 3.8 to our packaging pipeline. * Revert some pipeline changes in #5874 Co-authored-by: Ashwini Khade <askhade@microsoft.com> Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Adam Pocock <craigacp@gmail.com> Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Maajid khan <n.maajidkhan@gmail.com> Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com> Co-authored-by: Moshe David <mosdav165@gmail.com> Co-authored-by: Ivan Stojiljkovic <17503404+ivanst0@users.noreply.github.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>

Craigacp added 2 commits November 21, 2020 23:05

Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon.

d66fb33

On an M1 Macbook Pro clang reports: $ clang -dumpmachine arm64-apple-darwin20.1.0 So the regex check needs to look for "arm64" first, as otherwise it matches 32-bit ARM and you get NEON compilation failures.

Adding Java side library loading support for Apple Silicon (and other…

004d695

… aarch64 architectures).

Craigacp requested a review from a team as a code owner November 22, 2020 19:00

Craigacp marked this pull request as draft November 22, 2020 19:00

guoyu-wang requested a review from wenbingl November 23, 2020 06:17

wenbingl requested a review from tracysh November 23, 2020 18:55

Adding Qgemm fix from @tracysh

f17e73e

Craigacp marked this pull request as ready for review November 23, 2020 22:38

wenbingl previously approved these changes Nov 23, 2020

View reviewed changes

Fixes the java packaging on Windows.

7f25d4c

Craigacp dismissed wenbingl’s stale review via 7f25d4c November 24, 2020 03:04

Missed a check in the java platform detector.

06e2a3e

faxu added the release:1.6 label Nov 24, 2020

wenbingl approved these changes Nov 24, 2020

View reviewed changes

wenbingl merged commit 8b83c51 into microsoft:master Nov 24, 2020

faxu added the triage:approved label Dec 1, 2020

faxu removed the release:1.6 label Dec 4, 2020

Craigacp deleted the apple-silicon-fix branch January 11, 2021 01:45

[Java] Initial Apple Silicon support #5891

[Java] Initial Apple Silicon support #5891

Conversation

Craigacp commented Nov 22, 2020 • edited Loading

wenbingl commented Nov 23, 2020

tracysh commented Nov 23, 2020

Craigacp commented Nov 23, 2020

Craigacp commented Nov 23, 2020 • edited Loading

tracysh commented Nov 23, 2020

Craigacp commented Nov 23, 2020 • edited Loading

Craigacp commented Nov 23, 2020 • edited Loading

wenbingl commented Nov 23, 2020

Craigacp commented Nov 23, 2020

Craigacp commented Nov 23, 2020

wenbingl commented Nov 23, 2020

thiagocrepaldi commented Nov 23, 2020

azure-pipelines bot commented Nov 23, 2020

thiagocrepaldi commented Nov 23, 2020

azure-pipelines bot commented Nov 23, 2020

thiagocrepaldi commented Nov 23, 2020

azure-pipelines bot commented Nov 23, 2020

Craigacp commented Nov 24, 2020

Craigacp commented Nov 24, 2020 • edited Loading

wenbingl commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

wenbingl commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

wenbingl commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

wenbingl commented Nov 24, 2020

wenbingl commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

snnn commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

snnn commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

snnn commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

wenbingl commented Nov 24, 2020

azure-pipelines bot commented Nov 24, 2020

Craigacp commented Nov 22, 2020 •

edited

Loading

Craigacp commented Nov 23, 2020 •

edited

Loading

Craigacp commented Nov 23, 2020 •

edited

Loading

Craigacp commented Nov 23, 2020 •

edited

Loading

Craigacp commented Nov 24, 2020 •

edited

Loading