[Bug]: High RAM usage in iGPU #28009

Open
yaniv5678 opened this issue Dec 10, 2024 · 6 comments

Labels
bug (Something isn't working) · category: GPU (OpenVINO GPU plugin) · PSE

Comments


OpenVINO Version

2024.5.0

Operating System

Windows System

Device used for inference

GPU

Framework

None

Model used

deberta-v3-mini

Issue description

Hi,
I converted deberta-v3-mini to OpenVINO using optimum-cli, with the weights compressed to int8. The file size on disk is ~160 MB.
I then compiled the model using both the Python openvino package and openvino-rs.
In both cases the compiled model took ~500 MB of RAM.
Setting the inference precision hint to "int8" for my iGPU (Intel Iris Xe, with a Core i5) didn't help; it actually took even more RAM (around 1.2 GB!).

When I compiled the same model for CPU, it somehow took only ~40 MB.

Can you help me understand why, and how to decrease RAM usage in the GPU case?
Is this a bug?

Thanks!

Step-by-step reproduction

optimum-cli export openvino --model microsoft/deberta-v3-small --weight-format int8 deberta
import openvino as ov
core = ov.Core()
compiled_model = core.compile_model("deberta/openvino_model.xml", device_name='GPU')
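
For completeness, the GPU precision-hint variant mentioned above looked roughly like this (a sketch using the standard OpenVINO properties API; the exact way I passed the hint may have differed slightly):

import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
# Same exported IR as above, compiled for the iGPU with an explicit
# int8 inference precision hint (this is the case that used ~1.2 GB).
compiled_model = core.compile_model(
    "deberta/openvino_model.xml",
    device_name="GPU",
    config={hints.inference_precision: ov.Type.i8},
)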

Relevant log output

No response

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.

Aznie-Intel commented Dec 11, 2024

Hi @yaniv5678, how did you check the memory usage when compiling the model on CPU and GPU? Also note that GPU performance relies on OpenCL kernels for the implementation; you can refer to the GPU Performance Checklist.


yaniv5678 commented Dec 11, 2024

Hi @Aznie-Intel! Thanks for your prompt response.

I checked using Task Manager. I made sure to only read and compile the model, and then put the process to sleep, to be sure the usage isn't related to other code (see the sketch at the end of this comment).
I checked it multiple times and saw the same RAM usage for my process.

  • Can you please try to reproduce it in your environment?
  • Has anyone run DeBERTa or something similar?

I've gone through the GPU performance checklist, thanks.
It didn't help; I think I'm already aligned with the tips and have tried the relevant ones,
e.g. model caching, which didn't reduce RAM consumption.

Do you know what these ~500 MB of RAM actually consist of?
Is it only the model weights, or is something else large in there? (I thought the weights are mmap-ed, so I shouldn't even see them counted against my process, which makes the ~500 MB of RAM usage even more surprising.)
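
For reference, the check can also be reproduced in code rather than through Task Manager; below is a minimal sketch using psutil to read the process RSS (psutil is just one way to do this and was not part of the original repro):

import time
import psutil
import openvino as ov

proc = psutil.Process()  # current process
print(f"RSS before compile: {proc.memory_info().rss / 2**20:.0f} MiB")

core = ov.Core()
# core.set_property({"CACHE_DIR": "./ov_cache"})  # the caching attempt; it did not reduce RAM
compiled_model = core.compile_model("deberta/openvino_model.xml", device_name="GPU")

print(f"RSS after compile: {proc.memory_info().rss / 2**20:.0f} MiB")
time.sleep(3600)  # keep the process alive so usage can also be inspected externally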


Aznie-Intel commented Dec 12, 2024

@yaniv5678 Below is my observation for both CPU and GPU.

CPU: [screenshot: process memory usage when compiled for CPU]

GPU: [screenshot: process memory usage when compiled for GPU]

I don't see a significant difference in memory consumption between CPU and GPU on my side. Can you provide the following information for further investigation:

  1. Run the Hello Query Device Python Sample to find your GPU device specification (or use the minimal query sketched below).
  2. Intel® Graphics Compute Runtime for OpenCL™ driver version.
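
If running the full sample is inconvenient, the essential part is just a property query; a minimal sketch (not the sample itself) looks like this:

import openvino as ov

core = ov.Core()
for device in core.available_devices:
    # FULL_DEVICE_NAME is a standard property reported by the CPU and GPU plugins
    print(device, ":", core.get_property(device, "FULL_DEVICE_NAME"))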


yaniv5678 commented Dec 13, 2024

@Aznie-Intel

  1. Below is the output of the "Hello Query Device" script:
[ INFO ] Available devices:
[ INFO ] CPU :
[ INFO ]        SUPPORTED_PROPERTIES:
[ INFO ]                AVAILABLE_DEVICES:
[ INFO ]                RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 1, 1
[ INFO ]                RANGE_FOR_STREAMS: 1, 12
[ INFO ]                EXECUTION_DEVICES: CPU
[ INFO ]                FULL_DEVICE_NAME: 13th Gen Intel(R) Core(TM) i5-1335U
[ INFO ]                OPTIMIZATION_CAPABILITIES: FP32, INT8, BIN, EXPORT_IMPORT
[ INFO ]                DEVICE_TYPE: Type.INTEGRATED
[ INFO ]                DEVICE_ARCHITECTURE: intel64
[ INFO ]                NUM_STREAMS: 1
[ INFO ]                INFERENCE_NUM_THREADS: 0
[ INFO ]                PERF_COUNT: False
[ INFO ]                INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]                PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ]                EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]                PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]                ENABLE_CPU_PINNING: True
[ INFO ]                SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ]                MODEL_DISTRIBUTION_POLICY: set()
[ INFO ]                ENABLE_HYPER_THREADING: True
[ INFO ]                DEVICE_ID:
[ INFO ]                CPU_DENORMALS_OPTIMIZATION: False
[ INFO ]                LOG_LEVEL: Level.NO
[ INFO ]                CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ]                DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
[ INFO ]                KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ]                AFFINITY: Affinity.HYBRID_AWARE
[ INFO ]
[ INFO ] GPU :
[ INFO ]        SUPPORTED_PROPERTIES:
[ INFO ]                AVAILABLE_DEVICES: 0
[ INFO ]                RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
[ INFO ]                RANGE_FOR_STREAMS: 1, 2
[ INFO ]                OPTIMAL_BATCH_SIZE: 1
[ INFO ]                MAX_BATCH_SIZE: 1
[ INFO ]                DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.3.0
[ INFO ]                FULL_DEVICE_NAME: Intel(R) Iris(R) Xe Graphics (iGPU)
[ INFO ]                DEVICE_UUID: *****
[ INFO ]                DEVICE_LUID: *****
[ INFO ]                DEVICE_TYPE: Type.INTEGRATED
[ INFO ]                DEVICE_GOPS: {<Type: 'float16'>: 3200.0, <Type: 'float32'>: 1600.0, <Type: 'int8_t'>: 6400.0, <Type: 'uint8_t'>: 6400.0}
[ INFO ]                OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, EXPORT_IMPORT
[ INFO ]                GPU_DEVICE_TOTAL_MEM_SIZE: 7441600512
[ INFO ]                GPU_UARCH_VERSION: 12.3.0
[ INFO ]                GPU_EXECUTION_UNITS_COUNT: 80
[ INFO ]                GPU_MEMORY_STATISTICS: {}
[ INFO ]                PERF_COUNT: False
[ INFO ]                MODEL_PRIORITY: Priority.MEDIUM
[ INFO ]                GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ]                GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ]                GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ]                GPU_ENABLE_SDPA_OPTIMIZATION: True
[ INFO ]                GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ]                GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ]                CACHE_DIR:
[ INFO ]                CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ]                PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ]                EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]                COMPILATION_NUM_THREADS: 12
[ INFO ]                NUM_STREAMS: 1
[ INFO ]                PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]                INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ]                ENABLE_CPU_PINNING: False
[ INFO ]                DEVICE_ID: 0
[ INFO ]                DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
[ INFO ]                ACTIVATIONS_SCALE_FACTOR: 0.0
[ INFO ]
  2. I couldn't find where the "Intel® Graphics Compute Runtime for OpenCL™ driver" version is stored.
    Anyway, I opened "Intel Graphics Command Center" and saw the following versions listed there:
  • DirectX 12
  • Graphics Driver 32.0.101.5972 (no update data available)
  • Shader version 6.6
  • OpenCL Runtime Version: 3.0
  • Vulkan 1.3.289
  • Graphics Output Protocol (GOP) version: 21.0.1060

@Aznie-Intel

@yaniv5678 Thanks for the information. I will check this with the relevant team and update you soon.

@avitial avitial self-assigned this Dec 16, 2024

avitial commented Dec 24, 2024

Ref. 159902

@avitial avitial added the category: GPU (OpenVINO GPU plugin) label and removed the support_request label Dec 24, 2024