High memory usage for CPU inference on variable input shapes (10x compared to pytorch 1.1) #27971
Comments
10X memory usage compared to pytorch 1.1 is bad, so I am marking this as high priority.
Paging the MKL-DNN folks, as this is almost certainly MKL-DNN related.
Thanks for the hint. Are there any environment variables or options that might influence the result? Edit: maybe #25186 could be useful here. I just tried the benchmark on an AMD Ryzen CPU and got the same results.
FWIW, ONNX Runtime looks almost unaffected by this issue, so as a workaround it's possible to use it for inference; here are benchmark results on the same machine. Model exported with (no other optimizations applied):
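For reference, a minimal export along these lines might look like the following (a sketch only; the dummy input size, input/output names, dynamic_axes choice, and opset version are assumptions, not the exact command used above):

```python
import torch
import torchvision

model = torchvision.models.resnet34(pretrained=False).eval()
dummy = torch.randn(1, 3, 320, 320)  # illustrative fixed-size dummy input

torch.onnx.export(
    model,
    dummy,
    "resnet34.onnx",
    input_names=["input"],
    output_names=["output"],
    # allow variable batch size and image height at inference time
    dynamic_axes={"input": {0: "batch", 2: "height"}},
    opset_version=11,
)
```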
And results are:
Even better, memory stops growing after about 800 iterations.
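The exported model can then be run with onnxruntime roughly like this (a sketch; the input name matches the export sketch above, and the sizes are illustrative):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet34.onnx")
x = np.random.randn(1, 3, 416, 320).astype(np.float32)  # one variable-height item
(out,) = sess.run(None, {"input": x})
print(out.shape)  # (1, 1000) for resnet34
```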
I can reproduce this on master.
You can get more information about MKLDNN by setting the env var MKLDNN_VERBOSE=1
(I don't really know what it means though XD)
@ezyang, unfortunately the verbose log does not tell us anything about memory consumption.
The observed behavior is likely the result of a caching mechanism implemented outside of the library.
@lopuhin, this is the same problem you mentioned in oneapi-src/oneDNN#489. Ideep caches MKLDNN primitives to reduce the cost of primitive creation; we support an environment variable named LRU_CACHE_CAPACITY to control the cache capacity. The default value is 1024; you can set a smaller number to reduce memory use, e.g. export LRU_CACHE_CAPACITY=<your number>. Thanks!
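For example, the variable can also be set from Python; setting it before importing torch (as also shown later in this thread) should ensure it is picked up before the first MKL-DNN primitive is created. The value 64 is only illustrative:

```python
import os

# Smaller capacity = fewer cached MKL-DNN primitives and lower memory usage,
# at the cost of re-creating primitives for shapes that fall out of the cache.
os.environ["LRU_CACHE_CAPACITY"] = "64"

import torch  # imported after setting the variable
```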
Wow, this works perfectly and solves the issue, thank you @XiagenFeng. Benchmark results:
Downgrading priority as a workaround is present. I'll keep the bug open in case anyone else notices high memory usage; we may want to reduce the default cache size (but it is hard to say without more reports).
Amplifying priority: #29809 is a duplicate report of this problem.
Another duplicate report: #29893
Time to reduce the default cache size?
Let's reduce the default cache size.
@gchanan says the recent release of MKL-DNN may have helped here.
pytorch 1.3.0 with LRU_CACHE_CAPACITY=1 set fixes the memory leak.
I cannot reproduce this with current master.
I can reproduce this with the v1.5.0 build; I cannot reproduce it with current master.
Full output for v1.5.0 build:
On current master, MKL-DNN has been upgraded to v1.2.0.
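For reference, one way to check which MKL-DNN version a given PyTorch build ships is:

```python
import torch

# The build configuration string includes the MKL-DNN / oneDNN version
print(torch.__config__.show())
print(torch.backends.mkldnn.is_available())
```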
Full output for master:
The MKL-DNN upgrade from v0.21.1 to v1.2.0 happened in gh-32422. Memory usage now is still a little higher than with PyTorch 1.1 (760 MB now for n=400, vs. 518 MB on 1.1), but that's probably expected, and it doesn't keep growing:
That upgrade also got rid of
same here for pytorch 1.3.0, fixed by:
import os
os.environ["LRU_CACHE_CAPACITY"] = "3"
@AloneGu The fix was on the master branch.
got it, thx
For the record (since I recently found this issue searching for a solution to this particular problem), the relevant environment variable is now called
So I had to go really deep on a CPU-inference memory issue for a model that has variable-sized input (audio). Here's what I found, hope it helps:
What worked
What didn't work, but maybe would work for you
Memory usage after inference of 500 items of varying sizes
🐛 Bug
In pytorch 1.3, when doing inference with resnet34 on CPU with variable input shapes, much more memory is used compared to pytorch 1.1 (both CPU-only builds on one core): 6 GB for pytorch 1.3 vs. ~0.5 GB for pytorch 1.1.
To Reproduce
Steps to reproduce the behavior:
Run the following script: https://gist.github.com/lopuhin/0d100ef7df01fdfc91d9685f6e01ff64 - it performs inference with resnet34 on images with fixed width and variable height, and reports speed and memory growth over the course of the benchmark.
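The gist boils down to something like the following condensed sketch (illustrative only; the exact sizes, iteration count, and reporting in the real script differ):

```python
import psutil
import torch
import torchvision

model = torchvision.models.resnet34(pretrained=False).eval()
proc = psutil.Process()
base_rss = proc.memory_info().rss

with torch.no_grad():
    for i in range(400):
        # fixed width, variable height: every new height is a new input shape,
        # and hence (with MKL-DNN) potentially a new cached primitive
        height = 200 + (i % 100) * 8
        x = torch.randn(1, 3, height, 320)
        model(x)
        if i % 50 == 0:
            grown_mb = (proc.memory_info().rss - base_rss) / 2**20
            print(f"iter {i}: RSS grew by {grown_mb:.0f} MiB")
```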
Running under pytorch 1.1:
Running under pytorch 1.3:
Expected behavior
Expected behavior is low memory usage, as in pytorch 1.1. Alternatively, a way to control caching would help (e.g. something which disables caching, or something like torch.cuda.clear_caches() but for CPU). As I understand it, the high memory usage happens because allocations are cached, which makes sense for fixed shapes but does not work well for variable shapes. Binning shapes is possible as a workaround, but it has a noticeable performance penalty and memory usage is still higher; a rough sketch of such binning is shown below.
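A rough sketch of the shape-binning workaround mentioned above (the bin sizes and helper name are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Pad image height up to the nearest bin so that only a handful of distinct
# input shapes (and hence cached MKL-DNN primitives) ever occur, at the cost
# of wasted computation on the padded rows.
HEIGHT_BINS = [256, 512, 768, 1024]

def pad_height_to_bin(x: torch.Tensor) -> torch.Tensor:
    h = x.shape[-2]
    target = next((b for b in HEIGHT_BINS if b >= h), h)
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    return F.pad(x, (0, 0, 0, target - h))

x = torch.randn(1, 3, 300, 320)
print(pad_height_to_bin(x).shape)  # torch.Size([1, 3, 512, 320])
```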
Environment
Environment under pytorch 1.1 (via collect_env.py script):
pytorch installed with
Environment under pytorch 1.3:
pytorch installed with
Additional context
This may be similar to oneapi-src/oneDNN#489 but here mkldnn is not used explicitly.
cc @VitalyFedyunin @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @ezyang @gchanan @zou3519