Informal meeting agenda and notes
A direct link is here.
Our informal meetings are held on Zoom on the first Tuesday of each month, 8-9pm EST.
Recordings of our old meetings are found in this public Box folder. New meeting recordings are under the LFX website.
Present: Alexandre, Liam, Lixi, Megan, Philip, Soren, Tong
Discussion:
- Liam: continued effort on TOSA support; some work on minimal padding for Conv/Pooling ops. Alex: please let us know if current shape inference support is lacking.
- Lixi: work on OpenMP, nearly ready for a 1-level demo.
- Philip: PR needs to be rebased to add upgrades for OpSet 19
- Megan: debug a model, add verification for type/shapes for matmul.
- Tong: custom op PR, will remove some options not yet used, added justification for usage
- Soren: draft for cast op upgrades; tried to add support for smaller floats; will ask Gong for help, if needed, disabling some tests for machines that don't support shorter floats (e.g. s390).
- Alex: IsInf cannot easily be SIMDized due to type size reduction (float -> bit), which is not really supported on most SIMD machines. Will revisit if important for perf. Will work with Philip to come up with a presentation for the ONNX meeting; Philip to attend in person, Alex to verify that his registration went through.
Present: Alexandre, Haruki, Kiyo, Lixi, Philip, Soren, Tong, Tung, Yasushi
Discussion:
- Alexandre: have to restart the rotation for LLVM updates; Soren suggested that folks & companies that have done it take the responsibility to do it or find suitable replacements. No objections. Alex to contact folks to restart work. Philip completed the current cycle (thanks).
- Tong: presented the PR for Custom Op, how shape inference is computed, how function calls are generated. Generally well received, technically sound. Minor suggestion to use dtype for one attribute (Soren); question on how to deal with operations that take shape info from an attribute (Tung; no solution at present for this). Soren/Alex asked for a motivating example, to better understand the context in which it will be used.
- Tung: reported experience on big models, with issues for large constants that boost binary size to more than 4GB. Idea to save constants in a file; suggestion to use a memory-mapped interface for constants (see the sketch after this list).
- Soren: deal with some similar issues with element attributes, want to save in files; potential to use a similar mechanism.
- Tung: also need to make the constant loading thread-safe; Tong proposed looking at the shared-pointer interface in C++ (question: we also support a C interface; a thin layer is possible).
- Lixi: reported progress on testing OpenMP in MLIR and using it within onnx-mlir.
- Philip: thanks to Gong for updating the testing support, will now be able to make progress on 1.19 ONNX ops release.
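A minimal sketch of the memory-mapped constant idea mentioned above, in Python for illustration; the file name, dtype/shape metadata, and offset handling are all hypothetical, not onnx-mlir's actual runtime interface.

```python
import mmap
import numpy as np

def load_constant(path, dtype, shape, offset=0):
    """Map a constant tensor from a file instead of embedding it in the
    binary, so multi-GB constants do not inflate the object size."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    count = int(np.prod(shape))
    # frombuffer does not copy: pages are faulted in lazily by the OS.
    arr = np.frombuffer(mm, dtype=dtype, count=count, offset=offset)
    return arr.reshape(shape)

# Hypothetical usage: weights stored out-of-line next to the model binary.
# weights = load_constant("model.constants.bin", np.float32, (4096, 4096))
```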
Present: Alexandre, Haruki, Megan, Philip, Soren, Tong, Tung, Yasushi
Present: Alexandre, Haruki, Lixi, Megan, Michael, Philip, Ram, Soren, Tung, Yasushi
Discussion:
- Philip: waiting for Gong to help with the upgrade for handling the benchmarks in the next OpSet.
- Soren: function inlining, local model functions will be inlined, Philip to add switches to select which one to keep in the future.
- Soren: new format for lit-tests using experimental format; Michael: format will stay for good as used to create model functions.
- Tung: improved dynamic analysis; added splitting of matmul for large matrices for NNPA. Groq has similar functionality when lowering to their own dialects; keeping Tung's operations at the ONNX-to-ZHigh level is fine.
- Tung: added TRILU operation.
- Yasushi: measure perf for Bidaf, need to add support for strings.
- Haruki: investigate perf of foundation models.
- Gong: starts to work on the changes for benchmarks in the next ONNX release.
- Lixi: worked on ONNX-MLIR setup for M1 Macs.
- Michael: looked at torch Dynamo exporter in PyTorch 2.0; has exporter to ONNX local functions, will provide examples for us to test.
- Michael: will look at ONNX to LinAlg; wanted to know if it's ok to start from Tong's PR.
(see video)
Present: Alexandre, Haruki, Megan, Michael, Kiyo, Philip, Ram, Soren, Tong, Tung, Yasushi
Discussion:
- Tung presented on the RunONNXModel.py and RunONNXModelZoo.py scripts, demonstrating how to run and verify models, using ORT or the results from a different set of options.
- Alex: mentioned that another script, build-run-onnx-lib.sh, is also useful when using GDB, as the model is linked into a single all-C++ test program.
- Tung: has a prototype for using opaque pointers as required by LLVM
- Philip: will work on the last opset 18 op this week. Suggests that shape inference should not assert on ops it does not know. One approach is to run shape inference with a mode (return success on unknown ops / error). Another approach is to always return success, and have a different pass that checks for ops that we don't know (see the sketch after this list).
- Soren: has a PR 2185 for constant propagation that recommends a new onnx call/return for better shape inference handling. Michael may have reservations; will check the issue.
- Megan: completed the LLVM merge; thanked Tung and someone from ByteDance for their help.
- Tong: continue with work on element wise kernel merge.
- Yasushi: has a PR 2164 and requests feedback
- Haruki: looked at Roberta performance and has a related PR.
- IBM (Charles topics to be presented by Alex):
- Proposed LLVM schedule through mid October
- ETA on next ONNX release (1.14)?
- https://onnx.ai/onnx/operators/ shows OpSet 19 even though 1.14 hasn't released yet. Is that normal or does it mean they are near release?
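A hypothetical Python sketch of the two approaches Philip describes for unknown ops in shape inference (a mode flag vs. a separate checking pass); none of these names exist in onnx-mlir.

```python
from enum import Enum

class UnknownOpMode(Enum):
    ERROR = 0    # fail on ops without a shape inference rule
    SUCCESS = 1  # skip them and keep going

def infer_shapes(ops, rules, mode):
    """Approach 1: run shape inference with a mode flag."""
    for op in ops:
        rule = rules.get(op.name)
        if rule is None:
            if mode is UnknownOpMode.ERROR:
                raise RuntimeError(f"no shape inference for {op.name}")
            continue  # return success on unknown ops
        rule(op)

def check_unknown_ops(ops, rules):
    """Approach 2: always succeed above, then report unknowns in a
    separate checking pass."""
    return [op.name for op in ops if op.name not in rules]
```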
Notes found there.
Present: Alexandre, Chen, Haruki, Gong, Kiyo, Megan, Michael, Philip, Tong, Tung, Soren, Yasushi
Discussion:
- Philip: NNPA file with a reference to the zdnn header file breaks the build. Answer: set the right CMAKE flag (-DONNX_MLIR_ACCELERATORS=NNPA); it should work. Will open an issue if it does not.
- Alex: status of opaque pointers. Tung: needed in KRNL to LLVM (for OMTensors, type and shape need to be passed via pointers), so this will break our current scheme.
- Chen (ByteDance): wants to contribute to LLVM updates. Alex pointed him to the wiki page to add his name.
- Philip: within one or two ops for release 18, has to do some NNPA work to get it finalized.
- Soren: trying to remove operations with neutral elements (e.g. add 0, mul 1, ...); PR 2114 (see the sketch after this list). Test for large constant equality: SSA is used for values and attributes, which is difficult for large constants. Agreed it's ok to have to compare explicitly, not just do a pointer compare.
- Tung: work on quantized model, adding missing ops.
- Yasushi: problem with Bidaf9 in model zoo (#2084) maybe too many strings, +70k, currently crashes. If anyone has ideas, please provide on issue.
- Tong: Worked on node location in the frontend translator.
- Haruki: investigate accuracy issues (#1933), ideas are welcome.
- Alexandre: tool to scan .s and .so files for SIMD code (utils/analyse_simd.py), and SIMD models for x86 & IBM Z.
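A minimal sketch of the neutral-element rewrite from PR 2114 discussed above (x + 0 -> x, x * 1 -> x); the op and constant representations here are hypothetical.

```python
# Neutral elements per (binary, commutative) op.
NEUTRAL = {"Add": 0.0, "Mul": 1.0}

def is_splat_constant(value, scalar):
    # Assumed helper: a constant whose elements all equal scalar.
    return value.op == "Constant" and value.splat == scalar

def simplify(op):
    """Replace x + 0 / 0 + x with x, and x * 1 / 1 * x with x."""
    neutral = NEUTRAL.get(op.name)
    if neutral is None:
        return op
    lhs, rhs = op.operands
    if is_splat_constant(rhs, neutral):
        return lhs
    if is_splat_constant(lhs, neutral):
        return rhs
    return op
```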
Present: Alexandre, Charles, Gong, Imurillo, Maximilian (and team), Megan, Mike, Philip, Soren
Discussion:
- Maximilian: LLVM update went well. Opaque pointers are now on by default, a new thing for LLVM; disabled them in MLIR for now, as opaque pointers cannot yet be used as-is in the Krnl to LLVM pass.
- Philip: would like to have mechanisms to select which patterns to use. Will take a look at TorchMLIR decompositions for the pattern they use.
- Soren: Update to opset 18: reduction uses axis attribute and input. Refactor to have code only for one (input) to avoid redundancy.
- Soren: Hybrid pattern-based shape inference works; disabled by a flag.
Present: Gong, Haruki, Kiyo, Michael, Philip, Soren, Tong, Tung
Discussion:
- Philip: working on the opset 18 transition.
- Soren, Tong: a hybrid pass for onnx-to-onnx transformation: decomposition, shape inference, op fusion, constant propagation, canonicalization
- Michael: will resume onnx-mlir work soon.
- Gong: improved onnx model zoo report to handle timeout. Timeout is set to 30 minutes by default.
- Yasushi: issue with the bidaf model; LLVM tools failed to handle a very large array of strings => out-of-memory
- Haruki: ConvTranspose is now supported in onnx-mlir. Do not decompose ConvTranspose if MHLO is enabled, since MHLO uses its own decomposition.
- Tung: Presentation of ShapeTransform operation for optimizing data transformation
Presentation of ShapeTransform operation PDF link
Present: Alex, Charles, Gong, Haruki, Kiyo, Megan, Michael, Philip, Soren, Stella, Tong, Tung, Weon
Discussion:
- Stella: plea for more MLIR/LLVM to ONNX-MLIR volunteers, ideally 2 per company. Alex suggested that we wait until Groq/AMD have gone through one round and ask if they would be willing to up it to 2 folks per cycle.
- Stella: added advice for volunteers, namely that sometimes the Windows build runs out of time. Simply re-launch the update and it will complete, as the MLIR/LLVM build, which takes a lot of time, is cached.
- Soren: Wants to decouple disposableElementAttribute from the ONNX dialect, as they need it internally at Groq for another dialect. That made sense to all of us. Soren suggested creating better interfaces to access key data; Tung/Alex suggested possibly creating a new dialect (possibly in the MLIR dir, or a new one) that has only the constant operations relating to disposable element attributes, so that they may be reused by "any" dialects.
- Tung mentioned the new ShapeTransform operation, which lets us combine data transformations. Tung will make a presentation next time.
- Philip completed the Opset 1.18 PR, with more to come for Split and another op.
- Charles mentioned creating an issue to track all the must-fix issues prior to cutting a branch associated with the new Opset.
- Alex mentioned progress on SIMD for elementary ops.
Present: Alex, Gong, Haruki, Kevin, Philip, Soren, Stella, Tong, Tung
Discussion:
- Discussion on tests and performance measurements. Windows currently does not run backend tests (consensus: probably ok as they run elsewhere). Issue for performance tests as they need dedicated servers. Gong states that all our CIs run on shared machines. Alex states that we have support for performance testing of simple operations (see make check-onnx-perf). Philip states that LLVM has support for performance estimation (MCA? Not sure if I got the acronym correctly).
- Philip: will complete PR for opset 18; Gong asked if we can migrate to 18.1 (all agree).
- Tong reported on progress with enabling Fold; Soren mentioned the approach in his old PR, Tong to look at again.
- Tung: mentioned work on Bertsquad model, issue with custom ops defined/used in ORT. Also mentioned his new work on Shape Transform to optimize consecutive transforms.
- Alex did a quick demo of utils/fixLitTest.py to analyze lit tests (the -t option runs each test in a lit file individually; -tdf xxx runs only one test (xxx) and prints the commands to manually compile and compare; and the -r flag repairs tests that fail).
Present: Alex, Charles, Gong, Philip, Maximilian, Dominic, Tiago, Luis (AMD: please let me know of the spelling, apologies if I got it wrong)
Discussion:
- Maximilian: question about whether flagging folks for reviewing is fine. Absolutely.
- Luis: question about LLVM update cadence: the goal is every two months for a given developer, meaning we would like to have about 8+ folks performing this task.
- Mike: working on LLVM update and bugs occurring on Jenkins, getting the environment working for that.
- Philip: question to AMD about their need for ONNX preprocessing. Imurillo: due to two reasons: one, handling quantization, and two, bugs in the torch to onnx converter (would like to push fixes there eventually). Luis: questions about how to get/see which model zoo models compile. Gong: scan the zoo repo for .onnx files, as the manifest is not up to date. Alex: exclude some models that have unsupported ops (many quantization and small-int benchmarks).
- Philip: is LLVM good at auto-simd? Alex: in my experience, it is very good at handling vector operations introduced by the Vector dialect, less so when doing auto-simd.
- Alex: ONNX steering committee created a new Compilers SIG group, Philip and Alex to co-lead. Will have one meeting a month, currently scheduled as part of an onnx-mlir open source meeting.
Present: Alex, Charles, Gong, Haruki, Kevin, Kiyo, Philip, Rama, Soren, Tong, Tung, Yasushi
Discussion:
- Charles: presents changes to the OMTensorList interface. No one had issues.
- Charles: discussions about releases. Philip suggests that they are timed with upgrades to opsets (meaning when we can ingest the new operations, not necessarily lower them to all backends).
- Tung: constant attributes: when emitting ONNX IR, the printout has disposable attributes, which cannot be re-imported into onnx-mlir. Soren will look into the issue.
- Tong: PR for folding exposed some issues. Soren suggested to look at his PR for inspiration.
- Gong: issue about download error; disabled and will update with ONNX (new release).
- Stella: addresses the topic of warnings as errors. Folks are generally in favor. MS has its build clean on Windows. Suggests that we try to have bots that build on Linux with our settings, to force LLVM/MLIR developers to address warnings when pushing LLVM/MLIR code. Gong was going to look into it. Stella to give pointers. Philip: LLVM sometimes gives deprecation warnings; it would force us to deprecate right away (more work now, less later).
Present: Haruki, Kiyo, Philip, Soren, Tung, Yasushi
Discussion:
- Philip: links to some models in the python backend test were broken. ONNX will provide a patch soon. Stella created a PR to temporarily disable these models in onnx-mlir.
- Haruki: talked about his PR for ConvTransposeOp. Philip & Tung suggested adding a new pass in src/Transpose/ONNX.
- Soren: constant propagation is slow for f32-f16 conversion. Will SIMDize the conversion (perhaps for x86 first).
- Tung: mentioned that Stella updated the LLVM commit in onnx-mlir. Thanks Stella!
Present: Alex, Haruki, Kiyo, Philip, Rama, Soren, Stella, Tong, Tung, Yasushi
Discussion:
- Stella: participation from 6 folks for the LLVM update, ideally we would have a few more.
- Philip: progress with Opset 18. Short discussion about the handling of functions. Current proposal is to do it in the front end under directives (opt-in and opt-out lists as strings). Stella recommended use of a config file (clang has support for this). Alex suggested that there is also high-level control (triple, cpu, machine, target can trigger a custom list).
- Tung: Issued a PR for handling of conversion pattern. No objections voiced.
- Alex: new shape inference analysis hooked to the new mechanism.
- Alex: mentioned that ONNX is considering creating a compiler SIG. No opinion voiced either way.
- Alex: the current webex is tied to an individual, not great when going on vacation. Will investigate either Zoom or an account for the meeting.
Present: Alex, Haruki, Kevin, Nathaniel, Soren, Tong, Tung, Yasushi
Discussion
- Alex for Stella: please add names/companies for the LLVM schedule here. IBM will contribute; Nathaniel indicated that he can reach out to MHLO folks; Alex will reach out to AMD folks.
- Soren: posted an issue about a possible slowdown of onnx-mlir due to printing of large arrays of constants (issue #1963). A temporary solution is listed there; Soren will have a long-term fix soon. DisposableElementsAttr are very efficient in general; when we print (e.g. for some onnx-mlir target and/or onnx-mlir-opt), these attributes are printed as DenseElementsAttr. If re-ingested, they will remain as DenseElementsAttr (along with the memory overhead they have). If constant shape analysis is performed, constants will be transformed back to DisposableElementsAttr, but the memory used by DenseElementsAttr will not be freed. There is one pass along the onnx-mlir driver that appears to use/result in some slowdown; that is what Soren will address.
- Tong mentioned that there is a bug when re-ingesting mlir files; he will open a PR for this.
- Tong is looking into one-shot buffers to be used in onnx-mlir and encountered an issue that he is solving now.
- Tung is working on better memory copying (in transpose) to move from individual load/store to blocks of memcpy. Reports good results; discussion ensued on how to possibly make the optimization more general, as many other places may benefit from it.
- Yasushi is working on the ONNX Unique op and investigating an inspector/executor approach; will also consider over-allocating data first, computing the unique elements, and then possibly copying the relevant data into a smaller memory buffer (if required).
- Haruki is working on the ONNX ConvTranspose op.
- Nathaniel has taken the first slot for porting the green LLVM build into onnx-mlir.
- Alex has added Index Expression support for float as needed for some ONNX ops (e.g. Resize). Working out a few bugs before asking for reviews.
Present: Alex, Shay, Charles, Ferdinand, Liam, Luis, Max, Megan, Philip, Soren, Stella, Tong, plus folks in meeting rooms without a direct connection.
Discussion
- AMD presented their folks as well as their onnx-mlir effort. They use onnx for their production compiler and translate onnx dialect to TOSA. Current coverage is about 90 patterns for torch-mlir to TOSA, 150 for TF to TOSA, much fewer for onnx-mlir. Currently focusing on static shapes. Interested in contributing more of their internal code to onnx-mlir, very interested in the effort to keep onnx-mlir in sync with LLVM.
- Alex: For any new backend such as TOSA, we are interested in as complete a coverage as possible, with well defined support. Ops can be documented using this script, introduced in PR 1475.
- Luis: ONNX is not able to handle many of the quantization patterns used by AMD.
- Alex: please contribute to the ONNX roadmap effort.
- Philip: How much of your code is internal and what is the intention for the future?
- Luis: We will contribute more ops based on op prioritization.
- Stella: opened issue on how to have a schedule for LLVM updates
- Philip (Groq): will drop the PR for opset 18; 2-3 ops still to update, including Split with an extra param.
Present: Alex, Haruki, Kevin, Megan, Nathaniel, Philip, Soren, Stella, Tong, Tung, Leon
Discussion
- Soren spoke about the status of his PRs, which are almost ready to be merged. Discussion about Krnl.Constants.
- Tong spoke about his experiment with Linalg and asked MS for their insight for OneShot memory handling.
- Philip spoke about the process of upgrading to OpSet 18. Proposal is that when an op changes too much, we keep a version of the old so that folks upgrading the op do not have to upgrade all different backends. Well received.
- Alex spoke about shape inference, discussion ensued about having builders that will automatically upgrade the output type of the operation.
- Stella proposed a rotation for the upgrade of LLVM. MS proposed to do 1 month at a time; Groq is onboard with this idea, and so is IBM. Proposal is to model the upgrading process on torch-mlir's, but at a cadence of once every 2 weeks. The person updating LLVM does not necessarily do all the work, as there are sometimes very difficult upgrades. Keeping a good process for communication is key, so that the expertise of others can be easily tapped.
- Stella also proposed to have a tighter system to verify packages to avoid issues that may have arisen elsewhere. Proposal will come soon.
- Stella presents their current assessment of ONNX to LinAlg transform
- Alex presents the new scheme for Shape Helpers.
- Discussion on meeting time that includes Europe.
Discussion
- Presentation by Alex of the new infrastructure for shape helper. Presentation is here.
- Presentation by Microsoft of their effort to convert ONNX dialect to Linalg, with examples for convolution, matmul.
Discussion
Follow-up discussion about the constant proposal by Soren. Tung is happy about the follow-up on constant folding. Tung suggested that we can use the constant folding method to perform the actual folding in the constant propagation phase; constant propagation is still needed for using associativity/commutativity properties of the ops.
Tong reported on his ongoing experiment with ONNX to LinAlg. Gong reported on fixing the build and performing more tests before pushing new docker images. Tung presented his interface for symbolic analysis for determining identical runtime (aka question mark) values during shape inference. Alex reported on his progress on the upgrade of the shape helper data structures.
Present: Alex, Gong, Haruki, Kiyo, Philip, Soren, Tian, Tong, Tung, Yasushi
Discussion
Present: Alex, Gong, Haruki, Kiyo, Philip, Soren, Tian, Tong, Tunk, Yasushi
Discussion
Presentation by Soren of the new approach to constants. Reported significant reduction of memory usage (both peak and average) as well as a significant speedup of the constant propagation optimization. Using the current interface for constants, Soren proposed to have both memory storage and an accessor data structure, which may include strides/operations. For example, if an operation adds "+1" to an existing constant X, then instead of generating a new constant Y, the accessor data structure "saves" the operation (+1) in its representation so that, when accessed via that accessor structure, the +1 operation is performed on the fly (see the sketch below). All constants are in a pool, and there is a garbage collection method that removes constants that are no longer accessible, by scanning the module at the end of an optimization.
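A minimal Python sketch of the lazy-accessor idea just described, where the "+1" is recorded and applied on access instead of materializing a new constant; class and field names are hypothetical.

```python
import numpy as np

class LazyConstant:
    """Shared immutable storage plus an optional recorded transform."""
    def __init__(self, storage, transform=None):
        self.storage = storage
        self.transform = transform

    def map(self, fn):
        # Compose transforms without copying the underlying storage.
        prev = self.transform
        composed = fn if prev is None else (lambda x: fn(prev(x)))
        return LazyConstant(self.storage, composed)

    def materialize(self):
        data = np.asarray(self.storage)
        return data if self.transform is None else self.transform(data)

x = LazyConstant(np.arange(4, dtype=np.float32))
y = x.map(lambda v: v + 1)  # no new buffer allocated here
print(y.materialize())      # [1. 2. 3. 4.], computed on the fly
```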
A few questions were raised. First is how to print the data: one approach is to transform it back to the regular constants, another is to create a custom "assembly" format so that it may be printed/re-ingested using a custom method.
Tung also asked about its interaction with constant folding. Soren will look into that for next week.
DisposableElementsAttr Presentation PDF link
Present: Alex, Stella, Brad, Gong, Haruki, Kiyo, Philip, Rama, Soren, Tian, Tong, Tung, Yasushi
Discussion
- Rama discussed the need for model-local functions. There is currently some support in the frontend; Tong mentioned there is support for inlining in MLIR as well.
- Discussion ensued for type and shape inference. Types need to be determined in MLIR, a local function may be called at several sites with different types. There is some need for cloning depending on types and possibly shapes.
- Philip indicated that a facility is needed to selectively inline, depending on the target.
- Alex indicated that if inlining is done within MLIR (pending useful support), it might be easier to develop target-specific sets of rules that can be applied. Agreed that support for customization for different targets is needed.
- Stella concluded that there is general agreement on supporting model local function and they will investigate ways to go forward. Stella/Alex: potentially starting to inline all functions and progressively backing off from that "safe" heuristic.
Agenda:
- Presentation on accelerator infrastructure and NNPA by Tung.
Present: Alex, Brad, Chengji, Haruki, Ian, Kiyo, Michael, Nathaniel, Philip, Soren, Stella, Tian, Tong, Tung, Yasushi
Discussion
- Tung presentation on Accelerator support for NNPA.
- Soren: new element attribute that is ready for review. It will also be shared with MLIR and used as a proof of concept.
- Tung: created analysis of unknown dimensions
- Philip: can we move faster on supporting ONNX opset 18; can we pull in the latest as it comes? Philip's goal: 18 has a lot of functions and it will be a pretty big change. Also changes
- Soren: issue opened by Stella about model-local functions. The current draft has some support for them, and we should have it backed up with tests. Some of the functions are currently caught as operations directly, such as greater-or-equal.
Agenda:
- Discussion led by Microsoft on their teams goal for onnx-mlir.
Present: Alex, Brad, Gong, Haruki, Ian, Kevin, Kiyo, Philip, Soren, Stella, Tian, Tong, Tung
Discussion
- Presentation on the desire of Microsoft to lower ONNX dialect to MLIR dialects such as Linalg (primary target) and Affine. Microsoft wants to use a common dialect for optimizations of models coming from multiple sources, including ONNX. Some discussion ensued about specific aspects, such as needs for supporting dynamic dimensions and maps needed for certain accelerators. Interest from IBM too to get to a more standard MLIR approach, requested info on how well Linalg performs for CPU targets, say for example on MatMul. Reception was generally positive. Philip suggested that we possibly ask contributors to conversion passes (ONNX to xxx) to regularly report on their progress so that we can be aware of the progress of the respective projects.
Present: Tung, Gong, Tong, Kiyo, Philip, Soren, Kevin, Haruki, Alex, Brad
Discussion
- Alex: PR https://github.com/onnx/onnx-mlir/issues/1795 about building ONNX-MLIR without kernel and rt
- Soren:
- Inlining: PR to inline, and it would be helpful because we don't really have shape inference across functions. By inlining, we get all of this.
- Hybrid analysis: Discussion about functions, loop, and scan.
- Constant propagation: dense element attributes use blobs, and we would like to free them, but there is no support for freeing them. With the resource DenseElementsAttr interface it can work more easily; it does not do anything new in terms of memory management. Soren added a resource pool, and at the end we can run through all the nodes, find them, and free them (see the sketch after this list). Will be cleaner and more like MLIR.
- Soren's idea is to use them broadly between passes with a garbage collector, which then cleans up automatically. We then need to print them in the lit tests; they are printed using the shared element attribute, hiding that they are different in the lit test print. Tung stated that there is a dialect for resource management that they may use. Soren: that does not seem to do the job that we need.
- Step 3: do lazy constant propagation; you could record the list of operations, building a tree of computation. That can do more to reduce memory. Right now, we could create a lot of intermediate results that need memory that is only freed at the end of the computation. Tung: there is also a construct-and-fold; currently we don't support it in ONNX, but it would be good.
- First step: replace buffer pool with dense attribute to be more MLIR, which should be a substitution of
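A hypothetical sketch of the mark-and-sweep cleanup described above: scan the module for attributes still referenced, then free pooled buffers that are no longer reachable. The walk/attribute API here is invented for illustration.

```python
class ResourcePool:
    def __init__(self):
        self.buffers = {}  # buffer id -> raw data

    def collect(self, module):
        """Free pooled constants not referenced anywhere in the module."""
        live = set()
        for op in module.walk():              # mark: visit every op
            for attr in op.attributes:
                if hasattr(attr, "buffer_id"):
                    live.add(attr.buffer_id)
        for bid in list(self.buffers):        # sweep: drop the rest
            if bid not in live:
                del self.buffers[bid]
```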
Present: Tung, Gong, Tong, Kiyo, Philip, Soren, Kevin, Haruki, Alex
Discussion
- Soren: shape inference: tested a hybrid mode where shape inference is implemented as rule based, so that it may be integrated with canonicalization
- Looking into adding constant propagation, but has to remove buffers. Initial idea is to have "dense element attributes" (is that right), and lower it after ONNX dialects are done.
- Discussion about passes at op, function, or module level. Tong: use of a nested pass manager does the recursion for us.
- Soren: question about code gen from protobuf to onnx; instead of calls, what about doing the inlining? Soren to create an issue to describe this.
- Tung: still working on shape inference using IndexExpr, carrying question marks as unique values.
- Tong: new PR for lowering of concatenate. Current code is correct, but when lower to loop we lose some semantic. Try to reuse same symbol to enable more optimization (e.g. loop fusion).
- Philip: of the upstreaming PRs, 7 of 11 have landed. Of the last 4, one requires a lot of exporter work so as to accept a poorly formed graph; 3 are changes for shape inference (upsample, transpose, ...). One controversial change adds some logic to conv shape inference.
Present: Tong, Gong, Kevin, Kiyo, Haruki, Tung, Alex, Philip, Soren, Tian.
Discussion
- Tung: we went from 13 failures to 28. We need a notification, with an email, when something changes. Gong: there are new tests, so it might not be regressions. Gong: will add some history; it runs once per merge (into main). Tung: we have a list of unsupported benchmarks and the new ones may need to go in that list.
- Soren: Tung explained how to do constants without buffers using dense attributes; asked whether it would be welcome; will give it a try.
- Soren: made progress on a long chain of dependences between constant prop and shapes; rewrote the shape inference pass so that it works with patterns, to combine canonicalization, constant propagation, and shape inference. It can be called as one pass. Tong: shape inference is on the function, some others are on the operations; it is possible to nest passes. Soren made progress doing shape inference operation per operation. Tong: shape inference needs context; Soren: will try to put up a PR to do it on an operation. Tung: we can call multiple passes in a single pass (https://github.com/onnx/onnx-mlir/blob/main/src/Transform/ONNX/SimplifyShapeRelatedOps.cpp).
- Philip: that was a model from hugging face; we have done a mechanism to pull stuff from hugging face and import into onnx-mlir to do some testing.
- Tung: still working on shape inference. If the output is an unknown dim, we return -1, so we cannot distinguish 2 unknown dimensions. We want to be able to carry shape information symbolically. The idea is to use IndexExpr question marks (see the sketch after this list).
- Gong: still looking into the memory leak for one of the tests, TestGemm. They call compile-module, and there are some leaks inside the compiler. Has a version with a fork: the parent does not have any leaks, the child process has more leaks. Created an issue; there may be a few issues from the model itself.
- Haruki doing work for instrumentation.
- Philip: pushed PRs to reduce divergence between forks. We have some extra optimizations that we are trying to push.
- Soren: how do we know about the dependence with ONNX. Gong the dep is inside third_party/onnx, we can do a git log there to find out.
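A small sketch of the question-mark idea above: give each unknown dimension a unique ID instead of -1, so two different unknowns are distinguishable while the same unknown can be recognized when it reappears. The class is hypothetical, loosely modeled on the IndexExpr discussion.

```python
import itertools

_ids = itertools.count()

class QuestionMark:
    """A unique symbolic value for one unknown runtime dimension."""
    def __init__(self):
        self.id = next(_ids)
    def __eq__(self, other):
        return isinstance(other, QuestionMark) and self.id == other.id
    def __hash__(self):
        return hash(self.id)
    def __repr__(self):
        return f"?{self.id}"

batch = QuestionMark()
shape_a = [batch, 512]               # (?0, 512)
shape_b = [batch, 128]               # same ?0: provably equal at runtime
print(shape_a[0] == shape_b[0])      # True
print(shape_a[0] == QuestionMark())  # False: a different unknown
```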
Agenda:
- Philip Lassen requested some time to present thoughts on upstreaming Groq's changes. Issue. May have been fully covered on Sept 20.
- Benedikt Mandelkow requested some time to present his use case. Issue 1668 includes the presentation that he will use.
Present: Alexandre, Benedikt, Gong, Haruki, Kevin, Kiyo, Philip, Soren, Tian, Tong, Tung
Notes:
- Benedikt requests types: simplify the language interface by using a typed interface. Folks suggested using JSON signatures to generate language interface calls.
- Benedikt requests output in languages that can be further compiled. Suggestion was to use/develop/integrate into the MLIR infra, as operations needing translation to high-level languages would mostly use dialects "owned" by MLIR.
- Team talked about their contributions and their next steps.
Agenda:
- Benedikt Mandelkow requested some time to present his use case. Issue 1668 includes the presentation that he will use. Did not present; possibly postponed to next time.
Present: Alexandre, Arash, Chengji Yao, Chongsong Chen, Gong, Haruki I, Kevin, Kiyo, Li-Wen Chang, Philip, Soren Lassen, Tong, Tung.
Notes:
- Philip: worked with onnx-mlir for more than a year; has a fork that diverged a bit; it would make sense to upstream.
- Issue
- decomposition at the onnx level, as opposed to the krnl level, which is too low
- more granularity on what decomposition is for opset support vs. other onnx-to-onnx optimizations
- some onnx models do not have a topological sorting.
- buffer code for constants, which are not needed anymore. Tung will open an issue for that.
- Alex: having decomposition better structured and operating on the ONNX dialect is beneficial for all paths, whereas when handling is pushed to Krnl, it is not as useful to other paths (e.g. Tosa, MHLO, ...).
- Tong: PR to handle ONNX sequences. Sequences are aggregations of tensors or maps. Tensors may have different shapes.
- describes the different sequence operations
- describes the sequence issues: lowering to memref, and issues with deallocation.
- current approach does more copies to guarantee correctness.
- optimization of copy can be done by a smarter lowering.
- there are models that have sequences in inputs and outputs. And sequences of unranked tensors. We may need to do forward/backward shape propagation (or at least rank).
- Tung: shape inference will be improved for mixed static/dynamic shapes. Basic technique is to decompose the shape into individual dimensions, each being a constant or a dynamic value, and concatenate these individual dimensions before they are used as shapes.
- Alex: introducing new support for data layout into ONNX. Question by Philip: does it make sense to do it in ONNX or should it be in a lower dialect? Alex: not sure what is best, if there are many ops working on different layout, it might be a lot of overhead to replicate many ops for both dialects. Pass will be optional, so if not called, ONNX is guaranteed to only have traditional layouts. Philip to review PR, others are welcome to provide their opinions as well.
- Bytedance folks: many PRs in flight; it will take a bit of time to use IndexExpr, but not opposed in principle.
Agenda:
- discussion on ONNX and Torch. RFC is here #1639.
Present: Krishna, Sean Silva, Philip Lassen, Quinn Dawkins, Bill Xing, Soren Lassen + IBMers Kiyo, Tian, Gong, Tong, Kevin, Tung, Mori, Yasushi, Haruki, Alexandre. (Apologies to folks not listed, my partial notes did not include all participants.)
Notes:
- High level integration of torch-mlir was discussed. Presentation showed a possible integration of the ONNX dialect converter to a dialect of the torch-mlir project: (1) the Torch Dialect satisfying the Torch Backend Contract (essentially a set of rules that determine the dialect that is used at the boundary between the frontend and the backend of the torch-mlir project; or (2) a "configurable decomposition" (probably a higher level of the same torch-dialect, but before some decomposition). Both are said to be free of "class" operations that are part of the higher (i.e. closer to frontend) dialects used in the torch-mlir project.
- Current backends (as shown in the RFC issue) include linalg, tosa, or mhlo. Currently used exclusively (one out of the three), but it does not necessarily need to be so, as they are implemented with MLIR conversion rules.
- Motivation was given why it is attractive for current torch-mlir users to translate onnx dialect to pytorch-mlir, as they have already invested in customization of operations/optimizations in the lowering from torch dialect to the backends of the torch-mlir projects. Developers stated that while a direct path from, for example, onnx-dialect to mhlo vs one going via torch dialect may not result exactly in the same graph, their performance characteristics are expected to be similar.
- Torch-mlir folks suggested starting to experiment with the onnx -> torch -> torch-mlir backends to gain experience, and then determining if the direct paths are needed (e.g. should be preserved for all of the operations, or should be used for a limited set of operations).
- Questions were asked about the preservation of high-level ONNX constructs such as RNN operations (GRU, LSTM, ...) and possibly some other high-level onnx operations. Currently pytorch/torch-mlir decomposes these instructions into smaller components. This (IBM experience) is not best for certain accelerators such as NNPA that have support for high-level ops that deliver better performance than the smaller component operations. A possible solution is to package these high-level operations as functions that can later be processed in a custom fashion by a specialized backend.
- Shape inference was also discussed: both onnx-mlir and torch-mlir have support for this. In general, IBMers have found that while we support full dynamic shapes, performance is often better with derived static shapes, esp for accelerator primitives.
- The question of synchronizing the different LLVM components from the various contributions to onnx-mlir (e.g. MLIR/TOSA/Linalg vs MHLO vs Torch-MLIR) was discussed. There seems to be a convergence on LLVM upgrades on a regular basis (1-4 times a month). IBM sometimes has Big-Endian issues that result in PRs within LLVM that are then needed. The current cadence of upgrades should be sufficient for that.
- There was a discussion of whether onnx-mlir could depend on a smaller subset of LLVM when only the higher dialects (e.g. only the onnx dialect) are used, which seems to be a possible solution to lower the dependence on LLVM in some onnx-mlir project use cases.
- Steps for reaching a decision:
- We will reach out to contributors of MHLO and TOSA to get their feedback, if any, on this proposal and reconvene at a later time for final decision. I believe that the overall feedback of folks in attendance was generally positive.
- Requested the presentation to be added to the original issue for folks to consult.
- Please continue posting questions/comments in the original issue for any questions/comments you may have.
Agenda:
- request for proposals, older PRs
Present: Alexandre, Gong, Philip, Tian, Tong, Tung
Notes:
- Will request folks from Torch to present their request for onnx-mlir integration; we should spend some time learning more about their projects.
- Discussion: how to handle constant propagation.
- Philip: Could use ONNX Runtime to compute values.
- Tung: Issue with ORT and big endian machine, which IBM use extensively.
- Tian: Could use LLVM JIT support to compile code.
- Tong: We could have operations that are partially constant, partially not. Examples?
- Alex: We could compile operations on the fly to evaluate operations with constant.
- Tung: We could generate for each op a library call using onnx-mlir to evaluate an op with constant inputs.
- Agreed on opening an issue on this.
- Philip: Migrating to Opset 15 is going well, almost there, next step is 16.
- Gong: Completed the PR that downloads benchmarks only once per CI run (as opposed to once per test, of which we have many)
- Alex: Completed the PR that treats 1x1 convolutions as matrix multiplies; speedup on Z14 of 2-6x, but still not near peak (see the sketch at the end of these notes).
- Tung: Shape inference should use the user-given shapes at lowering to Krnl too. Short discussion on how to do it.
- Tong: Progress with handling sequences, provided some thought on symbolic shape inference, in part using memref normalization.
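A small numpy check of the rewrite behind the 1x1-convolution speedup mentioned above: a 1x1 convolution over NCHW input is a matrix multiply of the kernel with the flattened spatial dimensions (shapes here are arbitrary examples).

```python
import numpy as np

N, Cin, Cout, H, W = 2, 3, 5, 4, 4
x = np.random.rand(N, Cin, H, W).astype(np.float32)
k = np.random.rand(Cout, Cin, 1, 1).astype(np.float32)

# Reference: direct 1x1 convolution (stride 1, no padding).
conv = np.einsum("nchw,oc->nohw", x, k[:, :, 0, 0])

# Rewrite: (Cout x Cin) @ (Cin x H*W), batched over N, then reshape.
mat = (k[:, :, 0, 0] @ x.reshape(N, Cin, H * W)).reshape(N, Cout, H, W)

print(np.allclose(conv, mat, atol=1e-5))  # True
```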