
Buffer strategy for applications with many threads leads to out-of-memory errors #73

Closed
StefanS52u1 opened this issue Apr 29, 2021 · 14 comments

Comments

@StefanS52u1

The buffer strategy of allocating 5,000,000 bytes of buffer space per thread leads to performance issues for applications with many threads (>100). It also immediately causes out-of-memory errors in our 32-bit application, which had been running without any issues with version 1.6.615. For this kind of application (many threads), I would prefer a buffer strategy like the one used in version 1.6.615.

@tyoma
Owner

tyoma commented Apr 29, 2021

Thank you, @StefanS52u1

I'm aware of this issue; there was actually another report (#60) that its author had closed (I just reopened it).
I was thinking about reclaiming memory from less intensive threads, but to implement that I need to add this scenario to the test cases, and for that I'll probably need an externally passed allocator (so I can mock it for testing purposes).

What would you recommend: collect on memory pressure (for instance, collect aggressively once overall allocations cross some limit) or collect based on an inactivity timeout? The former seems simpler to implement, as it doesn't introduce the notion of time, but it makes it harder to keep active threads from suffering buffer starvation...
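A minimal sketch of the memory-pressure variant, using hypothetical names (`total_allocated`, `global_limit`, `collect_idle_buffers`) that are not taken from the micro-profiler code:

```cpp
#include <atomic>
#include <cstddef>
#include <new>

std::atomic<std::size_t> total_allocated(0);        // bytes currently held by all per-thread buffers
const std::size_t global_limit = 64 * 1024 * 1024;  // illustrative cap, not a real setting

void collect_idle_buffers()
{
    // Would walk the per-thread queues and release their free buffers;
    // left empty here - the policy is exactly what this issue discusses.
}

void *allocate_buffer(std::size_t size)
{
    // Collect aggressively once overall allocations cross the limit, then allocate.
    if (total_allocated.fetch_add(size) + size > global_limit)
        collect_idle_buffers();
    return ::operator new(size);
}
```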

thank you,
Artem

@tyoma
Owner

tyoma commented Apr 29, 2021

keeping a common pool of free buffers comes to mind...
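A minimal sketch of such a shared pool, assuming a hypothetical `buffer` type and illustrative names:

```cpp
#include <memory>
#include <mutex>
#include <vector>

struct buffer { char data[5000000]; };  // placeholder for the real trace buffer

class free_buffer_pool
{
public:
    std::unique_ptr<buffer> acquire()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_free.empty())
            return std::unique_ptr<buffer>(new buffer);  // grow only when the pool is exhausted
        std::unique_ptr<buffer> b = std::move(_free.back());
        _free.pop_back();
        return b;
    }

    void release(std::unique_ptr<buffer> b)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _free.push_back(std::move(b));  // a buffer freed by one thread becomes available to any other
    }

private:
    std::mutex _mutex;
    std::vector< std::unique_ptr<buffer> > _free;
};
```

The pool would only be touched when a thread fills or returns a whole buffer, so the per-call hot path stays free of locks.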

@tyoma
Owner

tyoma commented Apr 29, 2021

another idea - keep the empty/total buffer ratio at a specific level, if possible without exceeding the maximum buffer count. If the ratio is below some low-water mark and the buffer count is below the maximum, allocate a new buffer; if it is above the high-water mark, start deallocating free buffers instead of returning them to the free-buffer queue...
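A sketch of that policy, with illustrative water marks (0.25/0.50) and hypothetical names:

```cpp
#include <cstddef>

struct buffer_pool_policy
{
    std::size_t empty_count;  // free buffers currently in the queue
    std::size_t total_count;  // all buffers currently allocated
    std::size_t max_buffers;  // hard cap on total allocation

    double ratio() const
    {   return total_count ? double(empty_count) / double(total_count) : 0.0;   }

    // Below the low-water mark and under the cap: grow by allocating a new buffer.
    bool should_allocate() const
    {   return ratio() < 0.25 && total_count < max_buffers;   }

    // Above the high-water mark: destroy released buffers instead of queueing them.
    bool should_deallocate_on_release() const
    {   return ratio() > 0.50;   }
};
```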

@tyoma
Owner

tyoma commented Apr 29, 2021

the old double-buffer technique has its benefits (for instance, the reader can read the accumulated data at any time, without waiting for the writer to actually submit anything), but it is about twice as slow and, what is worse, introduces bigger deviations in latency because of the interlocked operations on every track() call.
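To make the cost concrete, a simplified sketch of the two write paths (illustrative names, not the actual micro-profiler code):

```cpp
#include <atomic>
#include <cstddef>

struct call_record { const void *callee; long long timestamp; };

// Double-buffer style: the reader may take over the active half at any moment,
// so every track() has to claim its slot with an interlocked operation.
struct double_buffered_writer
{
    call_record slots[4096];
    std::atomic<std::size_t> next{0};

    void track(const call_record &r)
    {
        const std::size_t i = next.fetch_add(1);  // bus-locked on every profiled call
        if (i < 4096)
            slots[i] = r;
    }
};

// Multi-buffer style: the writer owns its current buffer outright and only
// synchronizes when the buffer is full and must be handed over.
struct multi_buffered_writer
{
    call_record *current;
    std::size_t position, capacity;

    void track(const call_record &r)
    {
        current[position++] = r;  // plain store, no interlocked operation
        if (position == capacity)
            switch_buffer();      // the only point that touches shared state
    }

    void switch_buffer()
    {
        // Would submit the full buffer and acquire an empty one; omitted in this sketch.
        position = 0;
    }
};
```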

@tyoma
Owner

tyoma commented May 17, 2021

hey @StefanS52u1 @srogatch - in here I implemented a sizable queue of buffers. I'll prepare a build soon (along with the fix for the sockets misuse) - will you guys be able to check it out?

@tyoma
Owner

tyoma commented May 18, 2021

@StefanS52u1
Author

StefanS52u1 commented May 18, 2021 via email

@tyoma
Owner

tyoma commented May 18, 2021

Hi @StefanS52u1! Thank you for the reply - the build is here: https://github.com/tyoma/micro-profiler/releases/tag/v1.9.632.0

@tyoma
Owner

tyoma commented May 23, 2021

Hi @StefanS52u1,

I hear what you say about the buffering. Unfortunately, the bipartite-buffering approach must use a locking operation on every call to a profiled function, and this has two significant drawbacks:

  1. slowdown - the current processor must lock the bus in order to do a CAS;
  2. tying threads together - due to the locking, multiple heavily loaded threads being profiled may become dependent on that sequencing.

The new approach with multiple buffers allows writing without locking anything until a buffer switch is required.
And in the build I mentioned just above I implemented an adaptive mechanism that tries to keep the amount of allocated buffers steady, while at the same time deallocating unnecessary buffers when a thread becomes idle.
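Purely as an illustration of that idea (hypothetical names, not the actual implementation in the build above), idle-driven reclamation could look like this:

```cpp
#include <cstddef>
#include <vector>

struct buffer { char data[5000000]; };  // placeholder for the real trace buffer

struct thread_buffer_queue
{
    std::vector<buffer *> spare;      // empty buffers kept ready for this thread
    std::size_t records_since_flush;  // activity counter, reset on every flush

    // Invoked periodically from the flushing/reader side.
    void on_flush()
    {
        if (records_since_flush == 0)
        {
            // The thread was idle for a whole flush period - shrink its reserve to a single buffer.
            while (spare.size() > 1)
            {
                delete spare.back();
                spare.pop_back();
            }
        }
        records_since_flush = 0;
    }
};
```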

I kindly ask you to run it on your application to see if it indeed does what I intended, so that I can close this (and another similar) issue.

Thank you!
Artem

@StefanS52u1
Author

StefanS52u1 commented May 26, 2021 via email

@StefanS52u1
Author

StefanS52u1 commented Jul 27, 2021 via email

@StefanS52u1
Author

StefanS52u1 commented Jul 28, 2021 via email

@tyoma
Owner

tyoma commented Jul 29, 2021

Hello Stefan!

the issue with stuck updates is supposedly fixed here: https://github.com/tyoma/micro-profiler/releases/download/v1.9.633.0/micro-profiler.v1.9.634.0.vsix
It happened because the collector would send updates on a timer, and the frontend input queue could get clogged with too many updates (if you happen to have 10K+ profiled functions). The new version requests the next update only when it has received the previous one, which balances the traffic and the frontend CPU consumption. You may need, though, to make sure that the new version of the collector is used.
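A sketch of that request-driven exchange (message and type names are illustrative, not the actual micro-profiler protocol):

```cpp
#include <functional>

struct statistics_update { /* aggregated per-function statistics */ };

// Frontend side: apply one update, then explicitly ask for the next one,
// so its input queue never holds more than a single pending update.
struct frontend
{
    std::function<void ()> request_update;  // sends "send me the next update" to the collector

    void on_update(const statistics_update &u)
    {
        apply_to_views(u);
        request_update();
    }

    void apply_to_views(const statistics_update &) { /* merge into the UI model; omitted */ }
};

// Collector side: keep accumulating, but transmit only when an update has been requested.
struct collector
{
    bool update_requested = false;

    void on_request_update() { update_requested = true; }

    void on_timer()
    {
        if (update_requested)
        {
            // send_update(build_update());  // at most one transmission per request
            update_requested = false;
        }
    }
};
```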
If this resolves your request, let's close this one and I'll proceed with the other issue about the cumulative view (summing up stats from different threads).

thank you,
Artem

ps: as for penter/pexit collection - you'll need to write a small asm piece where you know which registers are in use, and only if you need to go deeper (invoke C++ code) do you store the full set of registers (pushad is not sufficient - you'll need to store the SSE and AVX state as well). See: https://github.com/tyoma/micro-profiler/blob/master/collector/src/calls_collector_msvc.asm
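For reference, a minimal x86/MSVC sketch of such a hook (compile with /Gh so the compiler inserts the _penter calls; `on_enter` is a hypothetical C++ handler, and XMM/YMM preservation is deliberately skipped here - see the real thunks in calls_collector_msvc.asm):

```cpp
extern "C" void __cdecl on_enter(const void *return_address)
{
    (void)return_address;  // illustrative stub: would record the call for the profiler
}

extern "C" __declspec(naked) void __cdecl _penter()
{
    __asm
    {
        push eax                ; save the caller-saved GP registers the C++ call may clobber
        push ecx                ; (a production hook would also save the SSE/AVX state)
        push edx
        mov  eax, [esp + 12]    ; return address pushed by the compiler-inserted call to _penter
        push eax
        call on_enter
        add  esp, 4
        pop  edx
        pop  ecx
        pop  eax
        ret
    }
}
```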

@StefanS52u1
Author

Memory overhead is now OK, even for big applications with a lot of threads. Thank you.
