
[Bug] Manually building RTMDet-Inst tensorRT engine fails with internal error #2236

Closed
mattiasbax opened this issue Jul 3, 2023 · 20 comments

Comments

@mattiasbax

Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

I want to parse the RTMDet-Inst model and build the serialized engine from the .onnx file representation myself, in order to control which TensorRT version is used (and not necessarily 8.2, which is the default for mmdeploy).

When I try to load the RTMDet-Inst end2end.onnx model created with mmdeploy into a TensorRT Python script to build the engine, I get the following error:

[TRT] [E] 4: [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeAnalyzerImpl::processCheck::862] Error Code 4: Internal Error (/TopK: K exceeds the maximum value allowed (3840).)

Reproduction

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, '')
runtime = trt.Runtime(logger)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
success = parser.parse_from_file("end2end.onnx")
for idx in range(parser.num_errors):
    print(parser.get_error(idx))

if not success:
    raise RuntimeError("Failed to parse end2end.onnx")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)
serialized_engine = builder.build_serialized_network(network, config) # <--- this fails with internal error

Environment

07/03 10:52:46 - mmengine - INFO - **********Environmental information**********
07/03 10:52:49 - mmengine - INFO - sys.platform: win32
07/03 10:52:49 - mmengine - INFO - Python: 3.8.16 (default, Mar  2 2023, 03:18:16) [MSC v.1916 64 bit (AMD64)]
07/03 10:52:49 - mmengine - INFO - CUDA available: True
07/03 10:52:49 - mmengine - INFO - numpy_random_seed: 2147483648
07/03 10:52:49 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
07/03 10:52:49 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6        
07/03 10:52:49 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
07/03 10:52:49 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31937 for x64  
07/03 10:52:49 - mmengine - INFO - GCC: n/a
07/03 10:52:49 - mmengine - INFO - PyTorch: 1.13.1+cu116
07/03 10:52:49 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)        
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
07/03 10:52:49 - mmengine - INFO - OpenCV: 4.7.0
07/03 10:52:49 - mmengine - INFO - MMEngine: 0.7.4
07/03 10:52:49 - mmengine - INFO - MMCV: 2.0.0
07/03 10:52:49 - mmengine - INFO - MMCV Compiler: MSVC 193431937
07/03 10:52:49 - mmengine - INFO - MMCV CUDA Compiler: 11.6
07/03 10:52:49 - mmengine - INFO - MMDeploy: 1.0.0rc3+
07/03 10:52:49 - mmengine - INFO -

07/03 10:52:49 - mmengine - INFO - **********Backend information**********
07/03 10:52:49 - mmengine - INFO - tensorrt:    8.6.1
07/03 10:52:49 - mmengine - INFO - tensorrt custom ops: Available
07/03 10:52:49 - mmengine - INFO - ONNXRuntime: 1.14.1
07/03 10:52:49 - mmengine - INFO - ONNXRuntime-gpu:   1.14.1
07/03 10:52:49 - mmengine - INFO - ONNXRuntime custom ops:     NotAvailable
07/03 10:52:49 - mmengine - INFO - pplnn:       None
07/03 10:52:49 - mmengine - INFO - ncnn:        None
07/03 10:52:49 - mmengine - INFO - snpe:        None
07/03 10:52:49 - mmengine - INFO - openvino:    None
07/03 10:52:49 - mmengine - INFO - torchscript: 1.13.1+cu116
07/03 10:52:49 - mmengine - INFO - torchscript custom ops:     NotAvailable
07/03 10:52:49 - mmengine - INFO - rknn-toolkit:      None
07/03 10:52:49 - mmengine - INFO - rknn-toolkit2:     None
07/03 10:52:49 - mmengine - INFO - ascend:      None
07/03 10:52:49 - mmengine - INFO - coreml:      None
07/03 10:52:49 - mmengine - INFO - tvm: None
07/03 10:52:49 - mmengine - INFO -

07/03 10:52:49 - mmengine - INFO - **********Codebase information**********
07/03 10:52:49 - mmengine - INFO - mmdet:       3.0.0
07/03 10:52:49 - mmengine - INFO - mmseg:       None
07/03 10:52:49 - mmengine - INFO - mmcls:       None
07/03 10:52:49 - mmengine - INFO - mmocr:       None
07/03 10:52:49 - mmengine - INFO - mmedit:      None
07/03 10:52:49 - mmengine - INFO - mmdet3d:     None
07/03 10:52:49 - mmengine - INFO - mmpose:      None
07/03 10:52:49 - mmengine - INFO - mmrotate:    None
07/03 10:52:49 - mmengine - INFO - mmaction:    None

Error traceback

[TRT] [E] 4: [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeAnalyzerImpl::processCheck::862] Error Code 4: Internal Error (/TopK: K exceeds the maximum value allowed (3840).)
@RunningLeon
Collaborator

@mattiasbax Hi, how did you get the onnx file? Could you provide sample code?

@mattiasbax
Author

Hi @RunningLeon ,

The onnx file was produced with the provided deploy.py script, i.e.:

python tools/deploy.py configs/mmdet/instance-seg/instance-seg_rtmdet-ins_onnxruntime_static-640x640.py ../mmdetection/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py ../mmdetection/checkpoints/rtmdet-ins_s_8xb32-300e_coco_20221121_212604-fdc5d7ec.pth ../mmdetection/demo/demo.jpg --work-dir ./work_dir/rtmdet-ins-ort-s --device cuda:0 --dump-info

The onnx file has been loaded into a Python script using onnxruntime, and inference ran successfully on both image and video data.
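
For completeness, the onnxruntime check looked roughly like this (a minimal sketch; the 1x3x640x640 dummy input is an assumption based on the static-640x640 config):

import numpy as np
import onnxruntime as ort

# Load the exported model and query its input name instead of hard-coding it
sess = ort.InferenceSession("end2end.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inp = sess.get_inputs()[0]

# Dummy forward pass; shape assumed from the static-640x640 deploy config
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])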

@RunningLeon
Collaborator

@mattiasbax mmdeploy performs torch2onnx with awareness of the backend configured in the deploy config. So if you want to deploy to TensorRT, you have to rerun tools/deploy.py with a TensorRT deploy config such as configs/mmdet/instance-seg/instance-seg_tensorrt_static-800x1344.py
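
For example, adapting the command above (model config, checkpoint, and image paths are the ones you already used; only the deploy config and work dir change):

python tools/deploy.py configs/mmdet/instance-seg/instance-seg_tensorrt_static-800x1344.py ../mmdetection/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py ../mmdetection/checkpoints/rtmdet-ins_s_8xb32-300e_coco_20221121_212604-fdc5d7ec.pth ../mmdetection/demo/demo.jpg --work-dir ./work_dir/rtmdet-ins-trt-s --device cuda:0 --dump-info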

@mattiasbax
Author

@RunningLeon So is there a difference in the provided .onnx file if I run deploy.py with "instance-seg_tensorrt_static-800x1344.py" versus "instance-seg_rtmdet-ins_onnxruntime_static-640x640"? The .engine file I got from running deploy.py with tensorrt_static_xyz.py was not compatible with my TensorRT version, which is why I wanted to build the engine file myself.

@RunningLeon
Collaborator

Yes, the onnx files are different. I just tested successfully with torch 1.10.0 + TensorRT 8.4.1.5 + CUDA 11.3.

@mattiasbax
Author

@RunningLeon I see, thanks.

I just tried the above script with the other .onnx file, produced using the tensorrt_static_xyz.py config. I got these errors:

[07/03/2023-11:51:42] [TRT] [E] 3: getPluginCreator could not find plugin: TRTBatchedNMS version: 1
[07/03/2023-11:51:42] [TRT] [E] ModelImporter.cpp:771: While parsing node number 463 [TRTBatchedNMS -> "/TRTBatchedNMS_output_0"]:
[07/03/2023-11:51:42] [TRT] [E] ModelImporter.cpp:772: --- Begin node ---
[07/03/2023-11:51:42] [TRT] [E] ModelImporter.cpp:773: input: "/Unsqueeze_10_output_0"
input: "/Where_output_0"
output: "/TRTBatchedNMS_output_0"
output: "/TRTBatchedNMS_output_1"
output: "/TRTBatchedNMS_output_2"
name: "/TRTBatchedNMS"
op_type: "TRTBatchedNMS"
attribute {
  i: 0
  type: INT
}
attribute {
  name: "iou_threshold"
  f: 0.6
  type: FLOAT
}
attribute {
  name: "is_normalized"
  i: 0
  type: INT
}
attribute {
  name: "keep_topk"
  i: 100
  type: INT
}
attribute {
  name: "num_classes"
  i: 80
  type: INT
}
attribute {
  name: "return_index"
  i: 1
  type: INT
}
attribute {
  name: "score_threshold"
  f: 0.05
  type: FLOAT
}
attribute {
  name: "topk"
  i: 5000
  type: INT
}
domain: "mmdeploy"

[07/03/2023-11:51:42] [TRT] [E] ModelImporter.cpp:774: --- End node ---
[07/03/2023-11:51:42] [TRT] [E] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:5404 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
In node 463 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

I guess I'm missing some details on how to load the plugins correctly.

@RunningLeon
Collaborator

Hi, please rebuild mmdeploy with TensorRT support, or use the docker image: https://mmdeploy.readthedocs.io/en/latest/01-how-to-build/build_from_docker.html#use-docker-image

@mattiasbax
Author

@RunningLeon Ok, I will try to rebuild mmdeploy. Do I have to build it with TensorRT 8.2.3, or can I build it with later versions?

@mattiasbax
Author

Hi @RunningLeon ,

I rebuilt mmdeploy in accordance with https://mmdeploy.readthedocs.io/en/latest/01-how-to-build/windows.html and successfully built mmdeploy_tensorrt_ops.dll.

I still get the above errors:

[07/04/2023-08:46:44] [TRT] [E] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
In node 463 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

How do I use mmdeploy_tensorrt_ops.dll so that my application can find the custom operations?

@RunningLeon
Collaborator

RunningLeon commented Jul 4, 2023

Hi, you could try adding the DLL's directory to the PATH environment variable.
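
Alternatively, you can preload the plugin library from the Python script itself before parsing, so it does not depend on PATH lookup. A minimal sketch (the DLL path is a placeholder, point it at your build output):

import ctypes
import tensorrt as trt

# Preload the mmdeploy custom-op library; its plugin creators register
# themselves with TensorRT when the DLL is loaded
ctypes.CDLL(r"C:\path\to\mmdeploy_tensorrt_ops.dll")  # placeholder path

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")
# ... continue with builder / parser as in the reproduction script above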

@mattiasbax
Author

@RunningLeon I already did that, unfortunately without any success. I even tried moving it directly into the TensorRT lib folder (which is on PATH).

Is there anything else I need from the built custom ops? I did not build the SDKs, only the TensorRT custom ops.

@RunningLeon
Collaborator

Hi, please check the CUDA lib, cuDNN lib, and TensorRT lib directories and add them all to PATH.

@mattiasbax
Author

@RunningLeon They are all correctly on PATH. Finding CUDA, cuDNN, or TensorRT does not seem to be the problem; it seems the DLL that contains the custom ops for the RTMDet-Inst model is simply not being loaded.

Should I try bumping to a later TensorRT version so that I can use load_library (https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Plugin/IPluginRegistry.html#tensorrt.IPluginRegistry.load_library) to specify the path to mmdeploy_tensorrt_ops.dll?
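
For reference, the call I mean would look roughly like this (untested; the DLL path is a placeholder):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

# TensorRT >= 8.6 exposes load_library on the plugin registry
registry = trt.get_plugin_registry()
handle = registry.load_library(r"C:\path\to\mmdeploy_tensorrt_ops.dll")  # placeholder path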

@mattiasbax
Author

Could you maybe send me your built mmdeploy_tensorrt_ops.dll so I can test whether it's the plugin or the environment that is faulty on my end?

@github-actions

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

@github-actions github-actions bot added the Stale label Jul 12, 2023
@github-actions

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

@github-actions github-actions bot closed this as not planned Jul 17, 2023
@Yeah-l

Yeah-l commented Aug 21, 2023

I'm experiencing the same error, looking forward to a solution!

@datinje

datinje commented Aug 27, 2023

Same for me: in my case inference with Python passes, but with C++ I get the same error:
ERROR] 4: [graphShapeAnalyzer.cpp::processCheck::862] Error Code 4: Internal Error (/model/proposal_generator/TopK: K exceeds the maximum value allowed (3840).)
Looks like a pattern.

@RunningLeon
Collaborator

same for me : in my case the inference with python passes but with C++ I am getting the same error ERROR] 4: [graphShapeAnalyzer.cpp::processCheck::862] Error Code 4: Internal Error (/model/proposal_generator/TopK: K exceeds the maximum value allowed (3840).) look like a pattern

@datinje hi, could you try this PR #2343?

@SushmaDG

SushmaDG commented Jul 2, 2024

I have the same issue when I try to generate the TensorRT engine file. I have used the tensorrt deploy config as suggested in the comment above: #2236 (comment)

Error[4]: [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeAnalyzerImpl::processCheck::862] 
Error Code 4: Internal Error (/TopK: K exceeds the maximum value allowed (3840).)

I also have the latest mmdeploy version that includes PR #2343, as suggested by @RunningLeon.
