
Buffer strategy for applications with many threads leads to out-of-memory errors #73

Closed
StefanS52u1 opened this issue Apr 29, 2021 · 14 comments

Comments

@StefanS52u1

The buffer strategy of allocating 5,000,000 bytes of buffer space per thread leads to performance issues for applications with many threads (>100). It also immediately causes out-of-memory errors in our 32-bit application, which had been running without any issues with version 1.6.615. For this kind of application (many threads), I would prefer a buffer strategy like the one used in version 1.6.615.

@tyoma
Owner

tyoma commented Apr 29, 2021

Thank you, @StefanS52u1

I'm aware of this issue; there was actually another report (#60) that its author had closed (I just reopened it).
I was thinking about reclaiming memory from less intensive threads, but to implement that I need to add this scenario to the test cases, and for that I'll probably need an externally passed allocator (so I can mock it for testing purposes).

What would you recommend: collect on memory pressure (for instance, collect aggressively once overall allocations cross some limit) or collect based on an inactivity timeout? The former seems simpler to implement, as it doesn't introduce the notion of time, but it makes it harder to keep active threads from suffering buffer starvation...
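A minimal sketch of the memory-pressure variant, using hypothetical names (`total_allocated`, `global_limit`, `collect_idle_buffers`) that are not taken from the micro-profiler code:

```cpp
#include <atomic>
#include <cstddef>
#include <new>

std::atomic<std::size_t> total_allocated(0);        // bytes currently held by all per-thread buffers
const std::size_t global_limit = 64 * 1024 * 1024;  // illustrative cap, not a real setting

void collect_idle_buffers()
{
    // Would walk the per-thread queues and release their free buffers;
    // left empty here - the policy is exactly what this issue discusses.
}

void *allocate_buffer(std::size_t size)
{
    // Collect aggressively once overall allocations cross the limit, then allocate.
    if (total_allocated.fetch_add(size) + size > global_limit)
        collect_idle_buffers();
    return ::operator new(size);
}
```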

thank you,
Artem

@tyoma
Owner

tyoma commented Apr 29, 2021

keeping a common pool of free buffers comes to mind...
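A minimal sketch of such a shared pool, assuming a hypothetical `buffer` type and illustrative names:

```cpp
#include <memory>
#include <mutex>
#include <vector>

struct buffer { char data[5000000]; };  // placeholder for the real trace buffer

class free_buffer_pool
{
public:
    std::unique_ptr<buffer> acquire()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_free.empty())
            return std::unique_ptr<buffer>(new buffer);  // grow only when the pool is exhausted
        std::unique_ptr<buffer> b = std::move(_free.back());
        _free.pop_back();
        return b;
    }

    void release(std::unique_ptr<buffer> b)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _free.push_back(std::move(b));  // a buffer freed by one thread becomes available to any other
    }

private:
    std::mutex _mutex;
    std::vector< std::unique_ptr<buffer> > _free;
};
```

The pool would only be touched when a thread fills or returns a whole buffer, so the per-call hot path stays free of locks.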

@tyoma
Owner

tyoma commented Apr 29, 2021

another idea - keep the empty/total buffer ratio at a specific level, if possible without exceeding the maximum buffer count. If the ratio is below some low-water mark and the buffer count is below the maximum, allocate a new buffer; if it is above the high-water mark, start deallocating free buffers instead of returning them to the free-buffer queue...
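A sketch of that policy, with illustrative water marks (0.25/0.50) and hypothetical names:

```cpp
#include <cstddef>

struct buffer_pool_policy
{
    std::size_t empty_count;  // free buffers currently in the queue
    std::size_t total_count;  // all buffers currently allocated
    std::size_t max_buffers;  // hard cap on total allocation

    double ratio() const
    {   return total_count ? double(empty_count) / double(total_count) : 0.0;   }

    // Below the low-water mark and under the cap: grow by allocating a new buffer.
    bool should_allocate() const
    {   return ratio() < 0.25 && total_count < max_buffers;   }

    // Above the high-water mark: destroy released buffers instead of queueing them.
    bool should_deallocate_on_release() const
    {   return ratio() > 0.50;   }
};
```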

@tyoma
Owner

tyoma commented Apr 29, 2021

the old double-buffer technique has its benefits (for instance, the reader can read the accumulated data at any time, without waiting for the writer to actually submit anything), but it is about twice as slow and, what is worse, introduces bigger deviations in latency because of the interlocked operations on every track() call.
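To make the cost concrete, a simplified sketch of the two write paths (illustrative names, not the actual micro-profiler code):

```cpp
#include <atomic>
#include <cstddef>

struct call_record { const void *callee; long long timestamp; };

// Double-buffer style: the reader may take over the active half at any moment,
// so every track() has to claim its slot with an interlocked operation.
struct double_buffered_writer
{
    call_record slots[4096];
    std::atomic<std::size_t> next{0};

    void track(const call_record &r)
    {
        const std::size_t i = next.fetch_add(1);  // bus-locked on every profiled call
        if (i < 4096)
            slots[i] = r;
    }
};

// Multi-buffer style: the writer owns its current buffer outright and only
// synchronizes when the buffer is full and must be handed over.
struct multi_buffered_writer
{
    call_record *current;
    std::size_t position, capacity;

    void track(const call_record &r)
    {
        current[position++] = r;  // plain store, no interlocked operation
        if (position == capacity)
            switch_buffer();      // the only point that touches shared state
    }

    void switch_buffer()
    {
        // Would submit the full buffer and acquire an empty one; omitted in this sketch.
        position = 0;
    }
};
```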

@tyoma
Owner

tyoma commented May 17, 2021

hey @StefanS52u1 @srogatch - in here I implemented a sizable queue of buffers. I'll prepare a build soon (along with the fix for the sockets misuse) - will you guys be able to check it out?

@tyoma
Owner

tyoma commented May 18, 2021

@StefanS52u1
Author

StefanS52u1 commented May 18, 2021 via email

@tyoma
Owner

tyoma commented May 18, 2021

Hi @StefanS52u1! Thank you for the reply - the build is here: https://github.com/tyoma/micro-profiler/releases/tag/v1.9.632.0

@tyoma
Owner

tyoma commented May 23, 2021

Hi @StefanS52u1,

I hear what you say about the buffering. Unfortunately, the bipartite-buffering approach must use a locking operation on every call to a profiled function, and this has two significant drawbacks:

  1. slowdown - the current processor must lock the bus in order to do a CAS;
  2. tying threads together - due to the locking, multiple heavily loaded threads being profiled may become dependent on that sequencing.

The new approach with multiple buffers allows writing without locking anything until a buffer switch is required.
And in the build I mentioned just above I implemented an adaptive mechanism that tries to keep the amount of allocated buffers steady, while at the same time deallocating unnecessary buffers when a thread becomes idle.
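Purely as an illustration of that idea (hypothetical names, not the actual implementation in the build above), idle-driven reclamation could look like this:

```cpp
#include <cstddef>
#include <vector>

struct buffer { char data[5000000]; };  // placeholder for the real trace buffer

struct thread_buffer_queue
{
    std::vector<buffer *> spare;      // empty buffers kept ready for this thread
    std::size_t records_since_flush;  // activity counter, reset on every flush

    // Invoked periodically from the flushing/reader side.
    void on_flush()
    {
        if (records_since_flush == 0)
        {
            // The thread was idle for a whole flush period - shrink its reserve to a single buffer.
            while (spare.size() > 1)
            {
                delete spare.back();
                spare.pop_back();
            }
        }
        records_since_flush = 0;
    }
};
```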

I kindly ask you to run it on your application to see if it indeed does what I intended, so that I can close this (and another similar) issue.

Thank you!
Artem

@StefanS52u1
Author

StefanS52u1 commented May 26, 2021 via email

@StefanS52u1
Author

StefanS52u1 commented Jul 27, 2021 via email

@StefanS52u1
Author

StefanS52u1 commented Jul 28, 2021 via email

@tyoma
Owner

tyoma commented Jul 29, 2021

Hello Stefan!

the issue with stuck updates is supposedly fixed here: https://github.com/tyoma/micro-profiler/releases/download/v1.9.633.0/micro-profiler.v1.9.634.0.vsix
It happened because the collector would send updates on a timer, and the frontend input queue could get clogged with too many updates (if you happen to have 10K+ profiled functions). The new version requests the next update only when it has received the previous one, which balances the traffic and the frontend CPU consumption. You may need, though, to make sure that the new version of the collector is used.
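A sketch of that request-driven exchange (message and type names are illustrative, not the actual micro-profiler protocol):

```cpp
#include <functional>

struct statistics_update { /* aggregated per-function statistics */ };

// Frontend side: apply one update, then explicitly ask for the next one,
// so its input queue never holds more than a single pending update.
struct frontend
{
    std::function<void ()> request_update;  // sends "send me the next update" to the collector

    void on_update(const statistics_update &u)
    {
        apply_to_views(u);
        request_update();
    }

    void apply_to_views(const statistics_update &) { /* merge into the UI model; omitted */ }
};

// Collector side: keep accumulating, but transmit only when an update has been requested.
struct collector
{
    bool update_requested = false;

    void on_request_update() { update_requested = true; }

    void on_timer()
    {
        if (update_requested)
        {
            // send_update(build_update());  // at most one transmission per request
            update_requested = false;
        }
    }
};
```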
If this resolves your request, let's close this one and I'll proceed with the other issue about the cumulative view (summing up stats from different threads).

thank you,
Artem

ps: as for penter/pexit collection - you'll need to write a small asm piece where you know which registers are in use, and only if you need to go deeper (invoke C++ code) do you store the full set of registers (pushad is not sufficient - you'll need to store the SSE and AVX state as well). See: https://github.com/tyoma/micro-profiler/blob/master/collector/src/calls_collector_msvc.asm
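For reference, a minimal x86/MSVC sketch of such a hook (compile with /Gh so the compiler inserts the _penter calls; `on_enter` is a hypothetical C++ handler, and XMM/YMM preservation is deliberately skipped here - see the real thunks in calls_collector_msvc.asm):

```cpp
extern "C" void __cdecl on_enter(const void *return_address)
{
    (void)return_address;  // illustrative stub: would record the call for the profiler
}

extern "C" __declspec(naked) void __cdecl _penter()
{
    __asm
    {
        push eax                ; save the caller-saved GP registers the C++ call may clobber
        push ecx                ; (a production hook would also save the SSE/AVX state)
        push edx
        mov  eax, [esp + 12]    ; return address pushed by the compiler-inserted call to _penter
        push eax
        call on_enter
        add  esp, 4
        pop  edx
        pop  ecx
        pop  eax
        ret
    }
}
```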

@StefanS52u1
Author

Memory overhead is now OK, even for big applications with a lot of threads. Thank you.
