Buffer strategy for applications with a lot of threads will lead to out-of-memory errors. #73
Comments
Thank you, @StefanS52u1. I'm aware of this issue - actually, there was another one (#60) that its author had closed (I just reopened it). What would you recommend - collecting on memory pressure (for instance, aggressively collecting when overall allocations cross some limit) or collecting based on an inactivity timeout? The former seems simpler to implement, as it doesn't introduce the notion of time, but it is harder to keep active threads from suffering buffer starvation... Thank you. |
keeping a common pool of free buffers comes to mind... |
Another idea - keep the empty/total ratio at a specific level, if possible without exceeding the maximum buffer count. If the ratio is less than some low-water level and the buffer count is less than the maximum, allocate a new buffer; if it is higher than the high-water level, start deallocating free buffers instead of putting them back into the free-buffer queue... |
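For illustration, a minimal sketch of the low-/high-water-mark idea described in the comment above, assuming a shared pool guarded by a mutex. All names (buffer_pool, low_water, high_water, ...) are hypothetical and not part of micro-profiler's actual code.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

class buffer_pool
{
public:
	buffer_pool(size_t buffer_size, size_t max_buffers, double low_water, double high_water)
		: _buffer_size(buffer_size), _max_buffers(max_buffers),
			_low_water(low_water), _high_water(high_water), _total(0)
	{	}

	// Called when a thread needs a fresh buffer.
	std::unique_ptr<char[]> acquire()
	{
		std::lock_guard<std::mutex> lock(_mutex);

		// Low-water check: allocate proactively while below the cap.
		if (_total < _max_buffers && (!_total || double(_free.size()) / _total < _low_water))
		{
			_free.emplace_back(new char[_buffer_size]);
			++_total;
		}
		if (_free.empty())
			return nullptr; // at the cap - the caller must wait or drop data
		std::unique_ptr<char[]> buffer = std::move(_free.back());
		_free.pop_back();
		return buffer;
	}

	// Called when a thread has submitted a buffer's contents.
	void release(std::unique_ptr<char[]> buffer)
	{
		std::lock_guard<std::mutex> lock(_mutex);

		// High-water check: deallocate instead of hoarding free buffers.
		if (double(_free.size() + 1) / _total > _high_water)
			--_total; // 'buffer' is destroyed on return
		else
			_free.push_back(std::move(buffer));
	}

private:
	const size_t _buffer_size, _max_buffers;
	const double _low_water, _high_water;
	size_t _total;
	std::vector<std::unique_ptr<char[]>> _free;
	std::mutex _mutex;
};
```

Note that the mutex here is only taken when a buffer is acquired or released, i.e. on the cold path, not on every tracked call.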
The old double-buffer technique has its benefits (for instance, the reader can read accumulated data at any time without waiting for the writer to actually submit anything), but it is about twice as slow and, what is worse, introduces bigger deviations in latency because of the interlocked operations at every track() call. |
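For illustration only, roughly what the per-call cost mentioned in the comment above looks like: in a bipartite (double-buffer) scheme every track() has to atomically reserve a slot, so each call executes a lock-prefixed read-modify-write even without contention. The names below (buffer, next_slot, call_record) are hypothetical, not micro-profiler's actual code.

```cpp
#include <atomic>
#include <cstddef>

struct call_record { const void *callee; long long timestamp; };

const size_t buffer_capacity = 1024;
call_record buffer[buffer_capacity];
std::atomic<size_t> next_slot(0);

void track(const void *callee, long long timestamp)
{
	// Interlocked increment (on x86: 'lock xadd') on every call - this is
	// the per-call overhead the double-buffer scheme cannot avoid.
	const size_t slot = next_slot.fetch_add(1, std::memory_order_relaxed);
	if (slot < buffer_capacity)
		buffer[slot] = call_record { callee, timestamp };
}
```

Under many concurrent writers these atomic increments contend on the same cache line, which is where the larger latency deviations mentioned above come from.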
Hey @StefanS52u1 @srogatch, in here I implemented a sizable queue of buffers. I'll prepare a build soon (along with the fix for the sockets misuse) - will you guys be able to check it out? |
This is the build: https://github.com/tyoma/micro-profiler/releases/tag/v1.9.632.0 |
Hi Artem,
that's really difficult to answer. Nevertheless, profiling always has a high impact on the observed executable. It will also influence thread synchronization and more. Therefore I would prefer a simple strategy. If you have any implementations, I would support you with testing, if you want.
Best regards Stefan
P.S.: I switched back to Micro-Profiler version 1.6.615, and here the buffer strategy and the memory footprint seem to be quite OK for me.
|
Hi @StefanS52u1! Thank you for the reply - the build is here: https://github.com/tyoma/micro-profiler/releases/tag/v1.9.632.0 |
Hi @StefanS52u1, I hear what you say about the buffering. Unfortunately, the approach with bipartite buffering must use a locking operation on every call to a profiled function, and this has two significant drawbacks:
1. slowdown - the current processor must lock the bus in order to do the CAS;
2. tying threads together - due to that locking, multiple heavily loaded threads being profiled may become dependent on that sequencing.
The new approach with multiple buffers allows writing without locking anything until a buffer switch is required. And in the build I mentioned just above I implemented an adaptive mechanism that tries to keep the number of allocated buffers steady while deallocating unnecessary buffers when a thread becomes idle. I kindly ask you to run it on your application to see whether it indeed does what I intended it to do, so that I can close this (and another similar) issue. Thank you! |
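A minimal sketch of the "no locking until a buffer switch" idea described above, assuming a per-thread buffer that is handed over to a shared queue only when full. The names (thread_buffer, call_record, _ready, ...) are illustrative, not micro-profiler's actual types.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

struct call_record { const void *callee; long long timestamp; };

class thread_buffer
{
public:
	explicit thread_buffer(size_t capacity)
		: _records(capacity), _count(0)
	{	}

	// Hot path: plain stores into a thread-owned buffer, no interlocked operations.
	void track(const void *callee, long long timestamp)
	{
		_records[_count++] = call_record { callee, timestamp };
		if (_count == _records.size())
			flush(); // cold path: synchronization happens only here
	}

private:
	void flush()
	{
		std::lock_guard<std::mutex> lock(_ready_mutex);
		_ready.emplace(_records.begin(), _records.begin() + _count);
		_count = 0;
	}

private:
	std::vector<call_record> _records;
	size_t _count;

	// Shared with the reader/sender side, touched only on buffer switch.
	static std::mutex _ready_mutex;
	static std::queue< std::vector<call_record> > _ready;
};

std::mutex thread_buffer::_ready_mutex;
std::queue< std::vector<call_record> > thread_buffer::_ready;
```

Compared with the bipartite scheme, the cost of synchronization is paid once per buffer (i.e. once per many calls) rather than once per call.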
Hi Artem,
sorry, but so far I have not had time to check the changes; I will do it by the end of this week.
Best regards Stefan
|
Hi Artem,
sorry for the delayed answer: the memory footprint seems to be much better now with version 1.9.632. But I am observing new issues. Our test application now starts again (with the older version I always got bad_alloc due to too much allocated RAM), but DevStudio stops working and won't update the profile table after a short time. DevStudio then always gets stuck and no longer reacts to keystrokes.
And I still observe another issue/problem: we are using a thread pool, so the same tasks/jobs and methods of the objects are executed by different threads. Therefore we are not interested in which thread is executing which function - it doesn't matter. In this case I would prefer a view where the data is not collected per thread, as shown below. For this application the thread id doesn't matter; the data should be displayed ignoring the thread id.
Best regards Stefan
It seems that DevStudio hangs somewhere inside the micro-profiler dll...
[image: image.png]
P.S.: I will also check the current version 1.9.634.
[image: image.png]
|
Hi,
I'm also observing another issue with the micro-profiler. I'm using /GH and /Gh instrumentation with a library and dll of my own to find memory leaks. For each _penter call I collect the last address pointer in a deque. When memory is allocated, this stack is then stored together with the allocated object id. What I observed is that it was necessary to save all registers (with pushad) in the assembly code before calling the C/C++ code; otherwise I got unpredictable exceptions. I know this decreases performance, but I think not by too much.
Best regards Stefan
|
Hello Stefan! The issue with stuck updates is supposedly fixed here: https://github.com/tyoma/micro-profiler/releases/download/v1.9.633.0/micro-profiler.v1.9.634.0.vsix Thank you! P.S.: As for _penter/_pexit collection - you'll need to write a small asm piece where you know which registers are in use, and only if you need to go deeper (invoke C++ code) do you store the full set of registers (pushad is not sufficient - you'll need to store the SSE and AVX state as well). See: https://github.com/tyoma/micro-profiler/blob/master/collector/src/calls_collector_msvc.asm |
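For illustration only, a minimal x86/MSVC sketch of a _penter hook that preserves the general-purpose registers before calling into C++. The on_enter routine is hypothetical, and, as the comment above notes, pushad alone is not sufficient for full safety: the SSE/AVX state (and, depending on the surrounding code, EFLAGS) would also have to be preserved, as the actual calls_collector_msvc.asm does.

```cpp
#include <cstdio>

// Hypothetical C++ collection routine invoked from the hook.
extern "C" void __cdecl on_enter(const void *return_address)
{
	std::printf("entered, return address: %p\n", return_address);
}

// x86-only: pushad and MSVC __asm blocks are unavailable in x64 builds.
// Build this translation unit WITHOUT /Gh, otherwise the hook would call itself.
extern "C" void __declspec(naked) __cdecl _penter()
{
	__asm
	{
		pushad                  ; save all general-purpose registers
		mov  eax, [esp + 32]    ; return address of the call to _penter (8 regs * 4 bytes above it)
		push eax
		call on_enter           ; safe to enter C/C++ now that registers are saved
		add  esp, 4             ; on_enter is __cdecl, so the caller cleans up
		popad                   ; restore registers
		ret
	}
}
```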
Memory overhead is now OK even for big applications with a lot of threads. Thank you. |
With the buffer strategy of allocating 5,000,000 bytes of buffering per thread, applications with many threads (>100) will run into performance issues. For our 32-bit application, which had been running without any issues on version 1.6.615, this immediately leads to out-of-memory errors (roughly 100 threads x 5 MB is about 500 MB for the buffers alone, a large part of a 32-bit address space). For these kinds of applications (many threads), I would prefer a buffer strategy like the one in version 1.6.615.