Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Initial Apple Silicon support #5891

Merged
merged 5 commits into from
Nov 24, 2020

Conversation

Craigacp
Copy link
Contributor

@Craigacp Craigacp commented Nov 22, 2020

Description:
The Java library loader needed support adding for aarch64 binaries, and the cmake update to put those binaries in the right place.

I also rearranged the CPU architecture detection in cmake/onnxruntime_mlas.cmake, as Apple Silicon Macs report their architecture as arm64 which was previously caught by the 32-bit ARM regex.

For reference the clang -dumpmachine test produces this output:

$ clang -dumpmachine
arm64-apple-darwin20.1.0

Note this PR makes the Java tests pass on Apple Silicon, however there are test failures in the core library:

1: [==========] 2265 tests from 189 test suites ran. (30822 ms total)
1: [  PASSED  ] 2245 tests.
1: [  FAILED  ] 20 tests, listed below:
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test_bias
1: [  FAILED  ] DynamicQuantizeMatMul.UInt8_test
1: [  FAILED  ] DynamicQuantizeMatMul.UInt8_test_bias
1: [  FAILED  ] MatMulIntegerToFloat.Int8_test
1: [  FAILED  ] MatMulIntegerToFloat.Int8_bias_test
1: [  FAILED  ] MatMulIntegerToFloat.UInt8_test
1: [  FAILED  ] MatMulIntegerToFloat.UInt8_bias_test
1: [  FAILED  ] QAttentionTest.QAttentionBatch2
1: [  FAILED  ] QAttentionTest.QAttentionPastState_u8u8
1: [  FAILED  ] QAttentionTest.QAttentionPastState_u8s8
1: [  FAILED  ] DynamicQuantLSTMTest.SmallSize
1: [  FAILED  ] DynamicQuantLSTMTest.LargeSize
1: [  FAILED  ] MatmulIntegerOpTest.MatMulInteger_WithZero_ZeroPoint
1: [  FAILED  ] MatmulIntegerOpTest.MatMulInteger_Uint8_Int8_Scalar
1: [  FAILED  ] MatmulIntegerOpTest.MatMulInteger_Uint8_Int8_GEMV
1: [  FAILED  ] ConvIntegerTest.WithGroup_2D
1: [  FAILED  ] ConvIntegerTest.WithPadding_3D
1: [  FAILED  ] ConvIntegerTest.WithStride2_2D
1: [  FAILED  ] ConvIntegerTest.WithStride3_2D

I've not looked into these yet, and I suspect it's because of some issues in the asm which I'll need some help to figure out.

Motivation and Context

  • Why is this change required? What problem does it solve? Makes ONNX Runtime work on my M1 Macbook Pro.

On an M1 Macbook Pro clang reports:

$ clang -dumpmachine
arm64-apple-darwin20.1.0

So the regex check needs to look for "arm64" first, as otherwise it
matches 32-bit ARM and you get NEON compilation failures.
@Craigacp Craigacp requested a review from a team as a code owner November 22, 2020 19:00
@Craigacp Craigacp marked this pull request as draft November 22, 2020 19:00
@guoyu-wang guoyu-wang requested a review from wenbingl November 23, 2020 06:17
@wenbingl wenbingl requested a review from tracysh November 23, 2020 18:55
@wenbingl
Copy link
Member

@tracysh , can you share some thought about the MLAS cmake change, I personally prefer adding a MacOS ARM64 in the cmake file, instead of detecting from the dumpout.

Also, The failures in Unit test is caused the incorrect build script, or the arch difference?

@tracysh
Copy link
Contributor

tracysh commented Nov 23, 2020

The build change looks okay, but Apple has tweaked the calling convention from the standard. @Craigacp, can you try modifying onnxruntime\core\mlas\lib\aarch64\QgemmU8X8KernelNeon.S to change line 26 to the below? This packs the parameters per Apple's changes.

#if defined(APPLE)
.equ .LGemmU8X8KernelFrame_ZeroMode, 12
#else
.equ .LGemmU8X8KernelFrame_ZeroMode, 16
#endif

https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms

@Craigacp
Copy link
Contributor Author

@tracysh nope, that didn't work, the same tests fail.

@Craigacp
Copy link
Contributor Author

Craigacp commented Nov 23, 2020

There's a lot of output from the errors, the start of it looks like this:

1: [----------] 5 tests from DynamicQuantizeMatMul
1: [ RUN      ] DynamicQuantizeMatMul.Int8_test
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 339801.45132279396, which exceeds threshold, where
1: expected[i] evaluates to -339800,
1: output[i] evaluates to 1.4513227939605713, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 358001.86117720604, which exceeds threshold, where
1: expected[i] evaluates to 358003.3125,
1: output[i] evaluates to 1.4513227939605713, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test (12 ms)
1: [ RUN      ] DynamicQuantizeMatMul.Int8_test_bias
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 394420.71676766872, which exceeds threshold, where
1: expected[i] evaluates to -394419.40625,
1: output[i] evaluates to 1.3105176687240601, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:156: Failure
1: The difference between expected[i] and output[i] is 299807.25198233128, which exceeds threshold, where
1: expected[i] evaluates to 299808.5625,
1: output[i] evaluates to 1.3105176687240601, and
1: threshold evaluates to 9.9999997473787516e-05.
1: i:64, provider_type: CPUExecutionProvider
1: Google Test trace:
1: /Users/craigacp/Development/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.cc:14: ORT test random seed: 1017963480
1: [  FAILED  ] DynamicQuantizeMatMul.Int8_test_bias (13 ms)

I attached the rest of the build log.
build_log.txt

@tracysh
Copy link
Contributor

tracysh commented Nov 23, 2020

@Craigacp, just to double-check: if you have just the below line:

.equ .LGemmU8X8KernelFrame_ZeroMode, 12

The code I pasted above lost the underscores from the "__APPLE__".

@Craigacp
Copy link
Contributor Author

Craigacp commented Nov 23, 2020

Yep I just finished running it without the ifdef and it works fine, give me a min to recheck it with the ifdef & underscores.

@Craigacp
Copy link
Contributor Author

Craigacp commented Nov 23, 2020

All tests pass with that fix. I formatted your fix so it fits in with the rest of the .S file.

Also it builds really fast on this computer.

@Craigacp Craigacp marked this pull request as ready for review November 23, 2020 22:38
wenbingl
wenbingl previously approved these changes Nov 23, 2020
@wenbingl
Copy link
Member

@Craigacp , did you build your project with --use_xcode, or not?

@Craigacp
Copy link
Contributor Author

My build command line is ./build.sh --update --parallel --build --test --build_java. I can rerun with the --use_xcode option.

@Craigacp
Copy link
Contributor Author

With --use_xcode I get this error:

CMake Error in external/onnx/CMakeLists.txt:
  The custom command generating

    /Users/craigacp/Development/onnxruntime/build/MacOS/Debug/external/onnx/onnx/onnx-operators-ml.proto

  is attached to multiple targets:

    gen_onnx_operators_proto
    onnx_proto

  but none of these is a common dependency of the other(s).  This is not
  allowed by the Xcode "new build system".

@wenbingl
Copy link
Member

With --use_xcode I get this error:

CMake Error in external/onnx/CMakeLists.txt:
  The custom command generating

    /Users/craigacp/Development/onnxruntime/build/MacOS/Debug/external/onnx/onnx/onnx-operators-ml.proto

  is attached to multiple targets:

    gen_onnx_operators_proto
    onnx_proto

  but none of these is a common dependency of the other(s).  This is not
  allowed by the Xcode "new build system".

This specific issue could be solved by buildsytem=1, but there will be more other issues.
So please merge PR after validation, and xcode issue will be addressed in the another PR.

@thiagocrepaldi
Copy link
Contributor

/azp run Linux CPU CI Pipeline,orttraining-linux-ci-pipeline,centos7_cpu,Windows GPU TensorRT CI Pipeline,Windows GPU CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@thiagocrepaldi
Copy link
Contributor

/azp run MacOS CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@thiagocrepaldi
Copy link
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Craigacp
Copy link
Contributor Author

Looks like the Mac test was cancelled? Windows doesn't seem to be setting X86_64 like I expected, so I'll check the logic.

@Craigacp
Copy link
Contributor Author

Craigacp commented Nov 24, 2020

I've updated the Java CMake architecture detection logic. To be honest it would be nice if something in CMake would consistently set the platform and architecture so I didn't have to borrow the logic from cmake/onnxruntime_mlas.cmake, especially given the MSVC code path doesn't set the same environment variables as the non-MSVC path, but that can be tidied up later.

@wenbingl
Copy link
Member

/azp run MacOS CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wenbingl
Copy link
Member

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux CPU CI Pipeline,orttraining-linux-ci-pipeline,centos7_cpu,Windows GPU TensorRT CI Pipeline,Windows GPU CI Pipeline

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@wenbingl
Copy link
Member

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS NoContribops CI Pipeline,

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wenbingl
Copy link
Member

/azp run Linux CPU CI Pipeline

@wenbingl
Copy link
Member

/azp run Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@snnn
Copy link
Member

snnn commented Nov 24, 2020

/azp help

@azure-pipelines
Copy link

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@snnn
Copy link
Member

snnn commented Nov 24, 2020

/azp run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@snnn
Copy link
Member

snnn commented Nov 24, 2020

/azp run Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS NoContribops CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wenbingl
Copy link
Member

/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-ci-pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@wenbingl wenbingl merged commit 8b83c51 into microsoft:master Nov 24, 2020
duli2012 pushed a commit that referenced this pull request Dec 2, 2020
* Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon.

On an M1 Macbook Pro clang reports:

$ clang -dumpmachine
arm64-apple-darwin20.1.0

So the regex check needs to look for "arm64" first, as otherwise it
matches 32-bit ARM and you get NEON compilation failures.

* Adding Java side library loading support for Apple Silicon (and other aarch64 architectures).

* Adding Qgemm fix from @tracysh

* Fixes the java packaging on Windows.

* Missed a check in the java platform detector.
(cherry picked from commit 8b83c51)
duli2012 added a commit that referenced this pull request Dec 2, 2020
* Update onnx (#5720)

* update onnx

* update docker image for testing
(cherry picked from commit 705d093)

* cherry pick PR 5720

* C#: Add CreateFromMemory to FixedBufferOnnxValue to allow bind user buffers and pass custom binary compatible types (#5886)

Add CreateFromMemory to FixedBufferOnnxValue so users can bind their own custom binary compatible buffers to feed/fetch data.
(cherry picked from commit c2d6100)

* [Java] Initial Apple Silicon support (#5891)

* Rearranging checks in onnxruntime_mlas.cmake to pickup Apple Silicon.

On an M1 Macbook Pro clang reports:

$ clang -dumpmachine
arm64-apple-darwin20.1.0

So the regex check needs to look for "arm64" first, as otherwise it
matches 32-bit ARM and you get NEON compilation failures.

* Adding Java side library loading support for Apple Silicon (and other aarch64 architectures).

* Adding Qgemm fix from @tracysh

* Fixes the java packaging on Windows.

* Missed a check in the java platform detector.
(cherry picked from commit 8b83c51)

* Add OpenVINO EP shared lib to Py Wheel (#5920)

* Add OpenVINO EP shared lib to Py Wheel

Include the libonnxruntime_providers_openvino.so/.dll to the wheel

* Follow libs.extend pattern as other EPs
(cherry picked from commit 4092686)

* Make NNAPI EP reject nodes with no-shape inputs (#5927)

(cherry picked from commit 8736865)

* Sahar/fix documentation shared lib (#5926)

* Update OpenVINO-ExecutionProvider.Md

update openvino-executionprovider.md for shared library

* Update Build.md

updated --build_shared_lib flag for building openvino shared provider lib

* Update Dockerfile.openvino

building for shared library with the new changes for openvino shared lib

* Revert "Update Build.md"

This reverts commit c9cf5fe.

* Revert "Update Dockerfile.openvino "

This reverts commit e1624e4.

* Update OpenVINO-ExecutionProvider.md

fix documentation to the shared library

Co-authored-by: sfatimar <sahar.fatima@intel/com>
(cherry picked from commit 8168c91)

* Update dockerfiles (#5929)

1. Remove conda from the images. Because conda contains a file named /opt/miniconda/lib/libcrypto.so.1.0.0 which can't pass our security scan. Also, it will be easier for us to manage the third party usage registrations.
2. Remove openssh from the images. Because the official openssh package provided by Ubuntu can't pass our security scan.
3. Reduce the image size to 1/3 by using stages. Also, because it contains less packages, it will be less often needed to update.
4. Put the LICENSE-IMAGE.txt file in right place. It is missed in current images. You can see it was added to a temp folder "/code" but it got deleted afterwards.
5. Update the CPU docker image's base image to Ubuntu 18.04. The GPU one is already 18.04. It's better to keep them the same.
6. Remove the build arg ONNXRUNTIME_REPO/ONNXRUNTIME_BRANCH. Instead, the new one always uses the local source. I feel it can reduce confusion.
(cherry picked from commit 1dbabb2)

* Add Longformer Attention Cuda Op(#5932)

Limitation: Global tokens must be at the beginning of sequence.
(cherry picked from commit 31a6be3)

* Bug fix for MaskRCNN and FasterRCNN (#5935)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
(cherry picked from commit e39e82b)

* Fix publishing pipelines. (#5942)

Fix publishing pipelines.
(cherry picked from commit c4b55d2)

* Fix Python Linux GPU package name (#5943)

Fix Python Linux GPU package name. I accidentally added "noopenmp" to it.

(cherry picked from commit 5fdd9f0)

* Update BUILD.md with shared provider information (#5944)

* Update build instructions to include information about shared providers

(cherry picked from commit 27513d1)

* [OpenVINO]Fix memory leak in `IsDebugEnabled()` under Windows (#5948)

* w

* w

Co-authored-by: modav <modav@microsoft.com>
(cherry picked from commit e207589)

* Add support for Python 3.8+ on Windows when CUDA is enabled (#5956)

(cherry picked from commit 015fbb3)

* Support the cross compiling for Apple Silicon (#5974)

* support macos_arm64 cross compiling

* update the build docs

* update as commented.

* Update BUILD.md
(cherry picked from commit 2ec211e)

* Update docker files to put 'unattended-upgrades' in a right place(#5983)

(cherry picked from commit 3323fb6)

* Enable the xcode build for Apple Silicon (arm64 MacOS) (#5924)

* fix the build script for macos/xcode

* add the version check

* correct the osx-arch configuration

* typo
(cherry picked from commit 1852ade)

* Add python 3.9 support (#5874)

1. Add python 3.9 support(except Linux ARM)
2. Add Windows GPU python 3.8 to our packaging pipeline.

* Revert some pipeline changes in #5874

Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Adam Pocock <craigacp@gmail.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Maajid khan <n.maajidkhan@gmail.com>
Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com>
Co-authored-by: Moshe David <mosdav165@gmail.com>
Co-authored-by: Ivan Stojiljkovic <17503404+ivanst0@users.noreply.github.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
@faxu faxu removed the release:1.6 label Dec 4, 2020
@Craigacp Craigacp deleted the apple-silicon-fix branch January 11, 2021 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants