
[WASI-NN] tensorrt: Add support for TensorRT-LLM #3878

Draft · ibmibmibm wants to merge 2 commits into master from beststeve/wasi-nn-tensorrt

Conversation

ibmibmibm (Collaborator)

No description provided.

juntao (Member) commented Nov 18, 2024

Hello, I am a code review agent on flows.network. Here are my reviews of changed source code files in this PR.


.github/workflows/matrix-extensions.json

Potential issues

  1. Redundant Options in StableDiffusion CUDA Plugins: Both wasmedge_stablediffusion-cuda-11 and wasmedge_stablediffusion-cuda-12 plugins have the same CMAKE_CUDA_ARCHITECTURES='60;61;70', which may not cover all necessary architectures for both CUDA versions.

  2. Missing TensorRT-LLM Plugin Entry: There is no entry for a TensorRT-LLM plugin in the list, despite the subject "[WASI-NN] tensorrt: Add support for TensorRT-LLM" suggesting its addition.

  3. Inconsistent CUDA Compiler Path: The wasmedge_stablediffusion-cuda-12 and wasi_nn-tensorrt plugins both specify the CUDA compiler path as /usr/local/cuda/bin/nvcc, which may not be consistent across all build environments and should ideally be parameterized.

Summary of changes

  • Added TensorRT-LLM support for WASI-NN: Introduced a new plugin configuration for the wasi_nn-tensorrt backend with specific options and platforms (sketched after this list).
  • New build configuration options: Included -DWASMEDGE_PLUGIN_WASI_NN_BACKEND=TensorRT and specified the CUDA compiler path (/usr/local/cuda/bin/nvcc) in the options field.
  • Platform target for TensorRT-LLM: Added a new platform ubuntu2004_tensorrt_cuda12 to support building the plugin on systems with TensorRT and CUDA 12.
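
Reconstructed from the summary above, the new plugin entry would look roughly like the following. This is a sketch: the field names (plugin, options, platforms) are assumptions about the matrix file's schema, while the values come straight from the summary.

```json
{
  "plugin": "wasi_nn-tensorrt",
  "options": "-DWASMEDGE_PLUGIN_WASI_NN_BACKEND=TensorRT -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc",
  "platforms": ["ubuntu2004_tensorrt_cuda12"]
}
```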

.github/workflows/reusable-build-extensions.yml

Potential issues

  1. Issue with asset_tag naming inconsistency: The build_on_ubuntu_latest job uses ubuntu24.04-clang and ubuntu24.04-gcc for the asset_tag, which is inconsistent with the pattern used in other jobs (e.g., ubuntu2004_x86_64). This inconsistency can lead to confusion and errors in downstream processing that rely on a specific naming convention.

  2. Issue with hard-coded asset tags: The list of asset tags in the readfile step is hardcoded, which can lead to maintenance issues if new asset tags are added or existing ones are removed. It would be better to dynamically generate this list from the matrix-extensions.json file or another configuration source; a sketch of one approach follows this list.

  3. Issue with missing TensorRT-LLM specific configurations: The patch does not show any specific changes related to TensorRT-LLM in the build configurations. If TensorRT-LLM requires special handling or dependencies, these are missing, potentially leading to build failures or incorrect builds for TensorRT-LLM.
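
For issue 2, a minimal sketch of deriving the tag list from matrix-extensions.json instead of hardcoding it; the step name, output variable, and jq filter are hypothetical, since the JSON schema is not shown in this review.

```yaml
- name: Collect asset tags from matrix-extensions.json
  id: readfile
  run: |
    # Gather every "platforms" array in the file and de-duplicate it.
    tags=$(jq -c '[.. | .platforms? // empty | .[]] | unique' \
      .github/workflows/matrix-extensions.json)
    echo "asset_tags=${tags}" >> "$GITHUB_OUTPUT"
```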

Summary of changes

  • Added support for a new Docker image and asset tag specifically for TensorRT-LLM with CUDA 12 on Ubuntu 20.04.
  • Updated the workflow to include ubuntu2004_tensorrt_cuda12 in the list of asset tags for processing.
  • Introduced a new job matrix entry for building extensions using the newly added TensorRT-LLM and CUDA 12 environment.

cmake/WASINNDeps.cmake

Potential issues

N/A

Summary of changes

  • Added support for the TensorRT backend in WASI-NN.
  • Configured and included the TensorRT-LLM library using FetchContent (sketched after this list).
  • Defined the necessary compile definitions and linked the required libraries for TensorRT support.
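
A minimal sketch of the FetchContent wiring described above, assuming the upstream NVIDIA/TensorRT-LLM repository; the tag, the library target name, and the plugin target name (taken from the plugins/wasi_nn/CMakeLists.txt summary below) are assumptions.

```cmake
include(FetchContent)

# Fetch the TensorRT-LLM sources (the tag is a placeholder).
FetchContent_Declare(tensorrt_llm
  GIT_REPOSITORY https://github.com/NVIDIA/TensorRT-LLM.git
  GIT_TAG        main)
FetchContent_MakeAvailable(tensorrt_llm)

# Enable the backend and link the fetched library into the plugin target.
target_compile_definitions(WasmEdgePluginWasiNN
  PRIVATE WASMEDGE_PLUGIN_WASI_NN_BACKEND_TENSORRT)
target_link_libraries(WasmEdgePluginWasiNN
  PRIVATE tensorrt_llm)
```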

plugins/wasi_nn/CMakeLists.txt

Potential issues

  1. Incorrect String Conversion in foreach Loop: The line string(TOLOWER ${BACKEND} BACKEND) overwrites the loop variable BACKEND in place, so any later logic in the same iteration that expects the original casing sees the lowered value when WASMEDGE_PLUGIN_WASI_NN_BACKEND contains mixed-case entries; lowering into a separate variable avoids this (see the sketch after this list).

  2. Missing Include Directories for TensorRT-LLM: If TensorRT-LLM backend is added, there are no corresponding include directories specified in the target_include_directories command, which might cause compilation errors due to missing headers.

  3. Hardcoded Compiler Option: The compiler option -DWASMEDGE_PLUGIN is hardcoded in the target_compile_options command without any condition or check, which may not be suitable for all build environments or configurations.
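
For issue 1, a sketch of the safer pattern: lower-case into a separate variable instead of mutating the loop variable. The list name and file-name scheme are hypothetical.

```cmake
foreach(BACKEND ${WASMEDGE_PLUGIN_WASI_NN_BACKEND})
  # Keep the original casing in BACKEND; use the lowered copy for file names.
  string(TOLOWER ${BACKEND} BACKEND_LOWER)
  list(APPEND WASINN_SOURCES wasinn_${BACKEND_LOWER}.cpp)
endforeach()
```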

Summary of changes

  • Added wasinn_tensorrt.cpp to the list of source files for the WasmEdgePluginWasiNN library, enabling TensorRT support.
  • The new implementation file integrates TensorRT with WASI-NN, extending the module to leverage NVIDIA's TensorRT-LLM for neural network inference on edge devices.

plugins/wasi_nn/wasinn_tensorrt.cpp

Potential issues

  1. Incorrect Error Handling in Tokenizer::load: The method returns ErrNo::InvalidArgument on a std::bad_alloc; memory exhaustion should be reported with a distinct error code rather than conflated with bad input.
  2. Assuming Macro Usage: The assuming macro is used without a visible definition or include, which may break compilation. If it is a project-provided macro, include the header that defines it; otherwise replace it with a standard assertion such as assert.
  3. File Reading Errors Not Handled in load: The reads of rank0.engine, config.json, and tokenizer.json do not handle I/O errors (file not found, short or failed reads), which could lead to undefined behavior or silent failures; a sketch of a hardened read follows this list.
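
To illustrate issues 1 and 3 together, a minimal sketch of a hardened file read. The helper name and the ErrNo values are stand-ins for whatever the plugin actually uses; in particular, the MissingMemory mapping for std::bad_alloc is an assumption.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical error codes standing in for the plugin's ErrNo.
enum class ErrNo { Success, InvalidArgument, MissingMemory };

// Read a whole binary file (e.g. rank0.engine) with explicit error paths.
ErrNo readBinaryFile(const std::string &Path, std::vector<uint8_t> &Out) noexcept {
  std::ifstream File(Path, std::ios::binary | std::ios::ate);
  if (!File.is_open()) {
    return ErrNo::InvalidArgument; // file not found or not readable
  }
  const std::streamsize Size = File.tellg();
  if (Size < 0) {
    return ErrNo::InvalidArgument; // could not determine the file size
  }
  File.seekg(0, std::ios::beg);
  try {
    Out.resize(static_cast<size_t>(Size));
  } catch (const std::bad_alloc &) {
    return ErrNo::MissingMemory; // report memory exhaustion distinctly
  }
  if (!File.read(reinterpret_cast<char *>(Out.data()), Size)) {
    return ErrNo::InvalidArgument; // propagate short or failed reads
  }
  return ErrNo::Success;
}
```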

Summary of changes

  • Added TensorRT-LLM Support: Introduced support for TensorRT-LLM by integrating the necessary libraries and functionalities, including parsing configuration and tokenizer files.
  • Tokenizer Implementation: Implemented a Tokenizer class to handle loading, initialization, and parsing/unparsing tokens using JSON configurations and handling Unicode characters.
  • Graph and Context Management: Added functions to load graph resources (engine, config, tokenizer), initialize execution contexts, set inputs, and retrieve outputs specifically for the TensorRT backend.

plugins/wasi_nn/wasinn_tensorrt.h

Potential issues

  1. Tokenizer Class Initialization Issue: The Byte2U8Char and U8Char2Byte static members of the Tokenizer class are declared but never defined, which will lead to linker errors.
  2. Graph Struct Missing Destructor: The Graph struct has no destructor to manage the Executor, which could leak resources if an exception occurs during its construction or use; a sketch of both fixes follows this list.
  3. Context Constructor Parameter Usage: The Graph & parameter of the Context constructor is passed by reference but never used, suggesting either incorrect usage or dead code that should be removed for clarity.
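
A sketch of fixes for issues 1 and 2, with minimal stand-ins for the declarations in wasinn_tensorrt.h; the member types and the Executor type are assumptions.

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for the Tokenizer declaration; member types are assumed.
class Tokenizer {
public:
  static std::array<std::string, 256> Byte2U8Char;
  static std::unordered_map<std::string, uint8_t> U8Char2Byte;
};

// Issue 1: out-of-class definitions (placed in wasinn_tensorrt.cpp) so the
// linker can resolve the static members.
std::array<std::string, 256> Tokenizer::Byte2U8Char{};
std::unordered_map<std::string, uint8_t> Tokenizer::U8Char2Byte{};

struct Executor {}; // stand-in for the TensorRT-LLM executor type

// Issue 2: owning the executor through unique_ptr gives Graph an implicit
// destructor that releases it even if an exception interrupts construction.
struct Graph {
  std::unique_ptr<Executor> Exec;
};
```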

Summary of changes

  • Added TensorRT-LLM support: Introduced new classes and structures (Tokenizer, Graph, Context) within the TensorRT namespace to enable integration with TensorRT-LLM.
  • Conditional compilation for TensorRT backend: Wrapped TensorRT-specific code in #ifdef WASMEDGE_PLUGIN_WASI_NN_BACKEND_TENSORRT to ensure it's only included when the TensorRT backend is enabled.
  • Implemented tokenizer and graph execution: Added methods (load, parse, unparse) for token handling and defined structures for managing graph execution and context, including loading models and processing input/output tensors.

plugins/wasi_nn/wasinnenv.cpp

Potential issues

  1. Potential Infinite Loop: The load function reads the entire file into memory without checking for errors during the read operation, which could lead to an infinite loop if File.read() fails without setting the failbit.
  2. Memory Leak: The create function dynamically allocates a new WasiNNModule instance but does not provide a mechanism to delete it, potentially causing a memory leak.
  3. Hardcoded URI Prefix Check: The check for the "unix://" prefix in the RPC URI is case-sensitive and may fail if the URI uses uppercase letters, which could lead to unnecessary warnings.

Summary of changes

  • Added support for TensorRT by inserting {"tensorrt"sv, Backend::TensorRT} into the BackendMap.
  • Implemented a check in the load function to prevent loading files that exceed the maximum size of Data, logging an error when the file is too big (sketched after this list).
  • No change in DeviceMap.
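
A sketch of the size guard described in the second bullet; the 32-bit limit standing in for "the maximum size of Data" and the logging style are assumptions.

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <limits>
#include <system_error>

// Reject model files larger than the buffer type can represent.
bool checkModelFileSize(const std::filesystem::path &Path) {
  std::error_code EC;
  const std::uintmax_t Size = std::filesystem::file_size(Path, EC);
  if (EC) {
    std::cerr << "[WASI-NN] Cannot stat " << Path << ": " << EC.message() << '\n';
    return false;
  }
  if (Size > std::numeric_limits<uint32_t>::max()) {
    std::cerr << "[WASI-NN] File too big: " << Path << '\n';
    return false;
  }
  return true;
}
```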

plugins/wasi_nn/wasinnenv.h

Potential issues

  1. Potential Null Dereference: The get<B>() and get<T>() methods in both the Graph and Context classes use std::get_if, which returns nullptr when the variant holds a different alternative; dereferencing the result without a check can crash (see the sketch after this list).

  2. Lack of Error Handling: The __builtin_unreachable() in the default case of switch statements in getBackend() methods implies that all possible backends are covered by the defined cases. If a new backend is added without updating these methods, it could lead to undefined behavior.

  3. Thread Safety Concerns: While MdMutex protects access to MdMap, the RawMdMap and NNGraph/NNContext vectors are not protected by any mutex in their respective methods (mdBuild). Concurrent modifications or accesses to these members from different threads could lead to race conditions.
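
For issue 1, a sketch of a guarded accessor around std::get_if; the variant alternatives are placeholders for the per-backend graph data.

```cpp
#include <stdexcept>
#include <variant>

struct GgmlGraph {};     // placeholder per-backend data
struct TensorRTGraph {}; // placeholder per-backend data
using GraphData = std::variant<GgmlGraph, TensorRTGraph>;

template <typename B> B &getGraphData(GraphData &Data) {
  if (auto *Ptr = std::get_if<B>(&Data)) {
    return *Ptr;
  }
  // Fail loudly instead of dereferencing a null pointer.
  throw std::runtime_error("graph does not hold the requested backend data");
}
```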

Summary of changes

  • Added TensorRT Header: Included "wasinn_tensorrt.h" to support TensorRT integration.
  • Potential for New Functionality: This change sets the stage for adding and utilizing TensorRT-LLM specific features.
  • Modular Expansion: The inclusion of a new header file suggests an extension of the existing modular architecture to accommodate TensorRT.

plugins/wasi_nn/wasinntypes.h

Potential issues

  1. FOR_EACH_BACKEND Macro: The FOR_EACH_BACKEND macro is missing entries for Autodetect, MLX, and Whisper. This can lead to inconsistencies if the macro is used for iteration over all backends.

  2. Enum Class Device Default Case: The Device formatter does not handle the AUTO case, which can result in "Unknown" being printed for an AUTO device when it should be "AUTO".

  3. Backend TensorRT-LLM Missing: The patch description mentions support for TensorRT-LLM, but there is no entry for it in the Backend enum or the FOR_EACH_BACKEND macro, indicating incomplete implementation.

Summary of changes

  • Added TensorRT as a new backend: Introduced TensorRT to the Backend enum and included it in the FOR_EACH_BACKEND macro, enabling support for TensorRT-LLM (sketched after this list).
  • Updated FOR_EACH_BACKEND macro: Added F(TensorRT) to iterate over the new TensorRT backend alongside the existing ones.
  • Adjusted enum value: Assigned Backend::TensorRT the numeric value 13 so it does not conflict with existing backends.
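
A sketch of the wiring the summary describes; the surrounding enumerators and macro entries are elided, and only the TensorRT additions follow the summary.

```cpp
#include <cstdint>

enum class Backend : uint8_t {
  // ... existing backends occupy the values below 13 ...
  TensorRT = 13,
};

#define FOR_EACH_BACKEND(F) /* ... existing entries ... */ F(TensorRT)
```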

utils/docker/Dockerfile.ubuntu-cuda

Potential issues

  1. Issue 1: The TENSORRT build argument is compared as the string literal ON rather than parsed as a boolean flag, so values such as on or true silently disable the feature; either document the accepted values or normalize them before the comparison.
  2. Issue 2: The ENV CXXFLAGS="-Wno-error" line suppresses warnings-as-errors globally, which can hide real issues during compilation and hurt code quality and maintainability.
  3. Issue 3: The FROM base AS clean-apt stage restarts from base rather than extending the preceding setup; if the split exists only for cache efficiency, ensure it logically follows the initial setup, or remove it if unnecessary.

Summary of changes

  • Added Conditional TensorRT Installation: Introduced an ARG TENSORRT=OFF to conditionally install TensorRT and related dependencies based on the build argument (sketched after this list).
  • Expanded Dependencies for TensorRT: When TENSORRT is set to ON, additional packages including CUDA development tools, NCCL, Python bindings, and OpenMPI are installed.
  • Enhanced Build Flexibility: The patch allows building the Docker image with or without TensorRT support by setting the TENSORRT build argument.
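
A sketch of the conditional install the summary describes; the exact package names are assumptions based on the listed dependencies (CUDA tools, NCCL, Python bindings, OpenMPI).

```dockerfile
ARG TENSORRT=OFF
RUN if [ "${TENSORRT}" = "ON" ]; then \
      apt-get update && \
      apt-get install -y --no-install-recommends \
        tensorrt-dev libnccl-dev python3-dev libopenmpi-dev openmpi-bin && \
      rm -rf /var/lib/apt/lists/*; \
    fi
```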

utils/docker/docker-bake.ubuntu.hcl

Potential issues

  1. Inconsistent contexts field in cuda-tensorrt target: The cuda-tensorrt target uses the context "wasmedge/wasmedge:ubuntu-20.04-build-gcc" which is not defined within the provided code, potentially leading to build failures.

  2. Hardcoded Ubuntu version in cuda and cuda-tensorrt targets: Both targets hardcode Ubuntu 20.04; this should be parameterized or made dynamic to allow flexibility across different Ubuntu versions.

  3. Redundant and inconsistent target naming: The naming convention for some targets, especially in base-2004-clang-aarch64, is inconsistent with other targets (e.g., using -aarch64 suffix) and could be standardized to improve readability and maintainability.

Summary of changes

  • Added a new target "cuda-tensorrt" to support TensorRT integration (sketched after this list).
  • Defined Dockerfile and context for the new target with specific CUDA version (12.6).
  • Configured build arguments to enable TensorRT (TENSORRT=ON).
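
A sketch of the new bake target, reconstructed from the summary and issue 1 above; everything except TENSORRT=ON, CUDA 12.6, and the ubuntu-20.04-build-gcc context is an assumption.

```hcl
target "cuda-tensorrt" {
  dockerfile = "./utils/docker/Dockerfile.ubuntu-cuda"
  contexts = {
    # Referenced image from issue 1; it must exist or be defined as a target.
    base = "docker-image://wasmedge/wasmedge:ubuntu-20.04-build-gcc"
  }
  args = {
    CUDA_VERSION = "12.6"
    TENSORRT     = "ON"
  }
}
```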

@github-actions github-actions bot added the c-Plugin (An issue related to WasmEdge Plugin), WASI-NN (https://github.com/WebAssembly/wasi-nn), and c-CMake (An issue related to CMake mechanism or options) labels on Nov 18, 2024
@ibmibmibm ibmibmibm force-pushed the beststeve/wasi-nn-tensorrt branch from 5ca0fff to 26dc4a6 on November 18, 2024 at 10:09
codecov bot commented Nov 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.25%. Comparing base (61db304) to head (336a836).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3878   +/-   ##
=======================================
  Coverage   80.24%   80.25%           
=======================================
  Files         277      277           
  Lines       38535    38535           
  Branches     6721     6721           
=======================================
+ Hits        30924    30927    +3     
+ Misses       6030     6028    -2     
+ Partials     1581     1580    -1     

☔ View full report in Codecov by Sentry.

@ibmibmibm ibmibmibm force-pushed the beststeve/wasi-nn-tensorrt branch 2 times, most recently from 95f7564 to 0ffce71 on November 18, 2024 at 11:28
@ibmibmibm ibmibmibm force-pushed the beststeve/wasi-nn-tensorrt branch from 0ffce71 to 1410951 on November 29, 2024 at 17:12
@github-actions github-actions bot added the c-CI (An issue related to CI) label on Dec 1, 2024
Signed-off-by: Shen-Ta Hsieh <beststeve@secondstate.io>
* Add Docker image for tensorrt

Signed-off-by: Shen-Ta Hsieh <beststeve@secondstate.io>
@ibmibmibm ibmibmibm force-pushed the beststeve/wasi-nn-tensorrt branch from cc51d78 to 336a836 on December 6, 2024 at 05:25