Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NNPA] Add numerical test for NNPA #1335

Merged
merged 108 commits into from
May 30, 2022
Merged

[NNPA] Add numerical test for NNPA #1335

merged 108 commits into from
May 30, 2022

Conversation

imaihal
Copy link
Collaborator

@imaihal imaihal commented Apr 12, 2022

This PR adds numerical tests for NNPA.

Currently tests for MatMul2D, Gemm, LSTM, and GRU are provided and run by using following command.

cmake --build . --config Release --target check-onnx-numerical-nnpa

These tests uses the same test code with numerical tests for CPU (test/modellib and test/numerial), but uses different cmake file(test/accelerator/NNPA/numerical/CMakeLists.txt).

  • Gemm
    Since alpha and beta should be one for Matmul of zDNN library, #ifdef directive TEST_GEMM_ALPHA_BETA_1 are added in test/numerical/TestGemm.cpp and set in the CMakeLists.txt (test/accelerator/NNPA/numerical/CMakeLists.txt)
  • LSTM
    Since LSTM of zDNN library does not support peephole tensor, #ifdef directive TEST_LSTM_NONEP_ONLY are added in test/numerial/TestLSTM.cpp and set in the CMakeLists.
    Currently bidirectinal LSTM is not supported in onnx-mlir for NNPA, so, it is disabled by using #ifdef directive TEST_RNN_NO_BIDIR
  • GRU
    Since GRU of zDNN library does not support LinearBeforeReset, #ifdef directive TEST_GRU_L1 are added in test/numerial/TestGRU.cpp and set in the CMakeLists.
    Currently bidirectinal LSTM is not supported in onnx-mlir for NNPA, so, it is disabled by using #ifdef directive TEST_RNN_NO_BIDIR

To use these #ifdef directives, this PR updated add_onnx_mlir_executable() in MLIR.cmake.

  • ATOL and RTOL for accuracy check
    ATOL and RTOL need to be changed for tests. They are configured by environment variable TEST_ATOL and TES_RTOL in CMakeLists.txt.

  • Verification that zDNN instruction are generated in shared library
    checkInstructionFromEnv() in ModelLibBuilder verifies that the instruction specified the environment variable is included in generated shared library. The instructions such as zdnn_matmul_op, zdnn_lstm, and zdnn_gru are specified in CMakeLists.txt.

  • Test results on a machine with NNPA (I removed the elapsed time because I don't know if I can publish it here.)

Test project /home/trl/imaihal/onnx-mlir/build/test/accelerators/NNPA/numerical
    Start 1: TestMatMul2DNNPA
1/4 Test #1: TestMatMul2DNNPA ................   Passed
    Start 2: TestGemmNNPA
2/4 Test #2: TestGemmNNPA ....................   Passed
    Start 3: TestLSTMNNPA
3/4 Test #3: TestLSTMNNPA ....................   Passed
    Start 4: TestGRUNNPA
4/4 Test #4: TestGRUNNPA .....................   Passed

100% tests passed, 0 tests failed out of 4

imaihal added 13 commits April 8, 2022 03:27
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
@imaihal imaihal marked this pull request as ready for review April 13, 2022 08:45
imaihal added 2 commits April 14, 2022 04:09
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes are good. I would like for you to add a document page on AccelNNPAHowToUseAndTest.md and list the explanation that you gave on the conversation. Over time, this page will add more info.

Please make this addition part of this PR. Thanks

@imaihal imaihal changed the title Add backend and numerical test for NNPA [NNPA] Add numerical test for NNPA Apr 19, 2022
@imaihal
Copy link
Collaborator Author

imaihal commented Apr 19, 2022

I will remove backend test from this PR because we need to discuss how to set ATOL and RTOL for backend test. We need to set them to pass backend test on NNPA.
Update: Created PR for backend test #1415

imaihal added 6 commits April 19, 2022 07:10
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
@tungld tungld marked this pull request as draft April 22, 2022 08:30
@imaihal
Copy link
Collaborator Author

imaihal commented May 24, 2022

@jenkins-droid test this please

Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its going in the right directions, and the new infrastructure about testing for the ops is very helpful. Thanks

test/numerical/TestGRU.cpp Outdated Show resolved Hide resolved
dptr = (int *)dlsym(handle, instructionName.c_str());
if (dptr == NULL) {
printf("%s\n", dlerror());
return false;
Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the code in ExecutionSession builder, there is already code that can load a library without being specific to windows or linux.

  _sharedLibraryHandle =
      llvm::sys::DynamicLibrary::getPermanentLibrary(sharedLibPath.c_str());
  if (!_sharedLibraryHandle.isValid())
    throw std::runtime_error(reportLibraryOpeningError(sharedLibPath));

The execution session also look for specific entries, so I would investigate if we can reuse the OS agnostic methods developed by LLVM.

IN particular, we also look there if a given function is found or not, for example

  _queryEntryPointsFunc = reinterpret_cast<queryEntryPointsFuncType>(
      _sharedLibraryHandle.getAddressOfSymbol(_queryEntryPointsName.c_str()));
  if (!_queryEntryPointsFunc)
    throw std::runtime_error(reportSymbolLoadingError(_queryEntryPointsName));

test/numerical/TestGRU.cpp Outdated Show resolved Hide resolved
test/numerical/TestGemm.cpp Outdated Show resolved Hide resolved
test/numerical/TestLSTM.cpp Outdated Show resolved Hide resolved
test/numerical/TestLSTM.cpp Outdated Show resolved Hide resolved
test/numerical/TestMatMul2D.cpp Outdated Show resolved Hide resolved
imaihal added 7 commits May 25, 2022 09:07
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
This reverts commit 6bb232f.

Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
@AlexandreEichenberger
Copy link
Collaborator

@imaihal let me know when you have all your changes in, and I will review again.

@imaihal
Copy link
Collaborator Author

imaihal commented May 27, 2022

@AlexandreEichenberger Yes. Now ready for review again. Thanks!

Copy link
Collaborator

@tungld tungld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tungld I created checkSharedLibInstruction() to verify instructions in shared library. We can verify that zDNN instructions are included automatically by using the function.

Thanks so much! It's useful.

The current code looks good to me with some minor changes to make things consistent. Please wait for Alex's input too.

In the future, we can improve the code with two things:

  1. checkInstructionFromEnv should support multiple instructions since one ONNX op may call multiple zDNN ops.
  2. gemm.compileAndLoad() && gemm.checkInstructionFromEnv(). compileAndLoad has loaded the shared library, no need to load it again in checkInstructionFromEnv. To do that, we should edit ExecutionSession so that we can get the loaded library from it.

std::string instructionName =
getenv("TEST_CHECK_INSTRUCTION") ? getenv("TEST_CHECK_INSTRUCTION") : "";
if (instructionName.empty())
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks redundant since checkInstruction(instructionName) also checks empty string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass the env var name as a parameter. Its easier to read all the param in the specific test file, e.g. TestMatMul2D....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@@ -92,10 +93,15 @@ int main(int argc, char *argv[]) {
const auto aTrans = *rc::gen::inRange(0, 2);
const auto bTrans = *rc::gen::inRange(0, 2);
const auto cRank = *rc::gen::inRange(1, 3);
#ifdef TEST_ALPHA_BETA_1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to use a more specific name, say TEST_GEMM_ALPHA_BETA_1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@@ -70,14 +80,23 @@ int main(int argc, char *argv[]) {
// Whether initial value of the cell(initial_c) is specified.
const auto isNoneC = *rc::gen::element(0, 1);
// Whether the weight tensor for peepholes(P) is specified.
#ifdef TEST_NONEP_ONLY
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to use TEST_LSTM_NO_P or TEST_LSTM_NO_PEEPHOLE to make it consistent with the other environment variables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

LINK_LIBS PRIVATE ${TEST_LINK_LIBS}
)

add_numerical_unittest(TestGemm-NNPA zdnn_matmul_op
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we check multiple zdnn functions here? For example, some GEMM ops will use both zdnn_matmul_op and zdnn_add_op. It's ok to implement this with another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments!
It is possible to check multiple functions, but currently it is not easy to select one function from those multiple functions in each testcase.
For example in GEMM, zdnn_matmul_op and zdnn_add_op can be checked in all testcases of GEMM, but it is not possible to select only zdnn_matmul_op in some testcases and zdnn_matmul_op and zdnn_add_op in the other testcases. I think we need another config for each testcase.
So, I would like to set this TODO for now.

@AlexandreEichenberger
Copy link
Collaborator

gemm.compileAndLoad() && gemm.checkInstructionFromEnv(). compileAndLoad has loaded the shared library, no need to load it again in checkInstructionFromEnv. To do that, we should edit ExecutionSession so that we can get the loaded library from it.

definitely, just add

llvm::sys::DynamicLibrary &getDynLibraryHandle() { return _sharedLibraryHandle };

Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there... please implement the changes suggested by Tung and I. The added functionality of handing a list of instructions to check should probably let for a subsequent PR.

I will approve it now so that you may merge it once the above changes are made. Big thanks.

std::string instructionName =
getenv("TEST_CHECK_INSTRUCTION") ? getenv("TEST_CHECK_INSTRUCTION") : "";
if (instructionName.empty())
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass the env var name as a parameter. Its easier to read all the param in the specific test file, e.g. TestMatMul2D....

if (instructionName.empty())
return true;
std::string sharedLibName = getSharedLibName(sharedLibBaseName);
llvm::sys::DynamicLibrary sharedLibraryHandle =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extract the library from the ExecutionSession model variable onnx_mlir::ExecutionSession *exec;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@@ -40,8 +40,9 @@ bool isOMConvTheSameAsNaiveImplFor(const int N, const int C, const int H,

Conv2DLibBuilder conv(SHARED_LIB_BASE.str(), N, C, H, W, kH, kW, autoPad,
pHBegin, pHEnd, pWBegin, pWEnd, stride, dilation, isDynamic);
return conv.build() && conv.compileAndLoad() && conv.prepareInputs() &&
conv.run() && conv.verifyOutputs();
return conv.build() && conv.compileAndLoad() &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks SOOO much better, thanks for the changes.

imaihal added 6 commits May 30, 2022 02:04
…ction

Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
@imaihal imaihal merged commit 042031f into onnx:main May 30, 2022
@imaihal imaihal deleted the nnpa_tests branch May 30, 2022 10:18
@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #5206 [push] [NNPA] Add numerical tes... started at 06:22

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #6084 [push] [NNPA] Add numerical tes... started at 06:19

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #6069 [push] [NNPA] Add numerical tes... started at 05:19

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #6084 [push] [NNPA] Add numerical tes... passed after 1 hr 0 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #5206 [push] [NNPA] Add numerical tes... passed after 1 hr 13 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #6069 [push] [NNPA] Add numerical tes... passed after 1 hr 14 min

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants