Closed
Description
I was inspecting the memory allocations for the thread that works with MailKit and surprisingly Task<ImapToken> takes huge chunk of memory more than strings and memory streams. Not sure if this is caused by the Task itself or ImapToken inside the task. Can this be improved? May be using ValueTask<T> or ImapToken struct can help?
I am using synchronous versions of the Mailkit interface.
Activity
jstedfast commentedon Feb 17, 2022
Yea, that looks like an insane amount of memory used by
Task<ImapToken>
.jstedfast commentedon Feb 18, 2022
What tool are you using to get this info? The VS Diagnostics tools aren't showing
Task<ImapToken>
at all for me for some reason.Also, I think it's the tasks and not the tokens - but I could be wrong on that since ImapToken is a class so it'd be a "pointer" to the ImapToken instance in the
Task<T>
.Which means there's a lot of
Task<T>
's for ImapTokens floating around in your heap for some reason.Is your code sync or async? Both code paths use the same methods that return
Task<ImapToken>
, so both will create those objects - I'm just curious if it makes a difference.jstedfast commentedon Feb 18, 2022
Another question is: is this a single ImapClient connection? Or multiple? (and roughly how many?)
ekalchev commentedon Feb 18, 2022
I am using single connection and sync methods only.
To make it simple, I created a small console app to reproduce this
items
array in my account contains 17 000 items. Build with .net 6.0 (net48 memory usage is even worse +15%)This is the result
I am using JetBrains dotTrace - it has trial version - This is JetBrains performance measuring tool but it still shows memory allocations.
I tried JetBrains dotMemory which give more information about memory allocations and I see the same picture
I just did very quick find and replace of
Task<ImapToken>
withValueTask<ImapToken>
This is the result from dotMemory
with Task
with ValueTask
Even with ValueTask<T> async methods have their overhead. I can see that some allocations in ValueTask is now reported in ImapStream+<ReadTokenAsync>. I know you wrote it with
bool runAsync
for better maintenance but I am not sure this huge memory overhead worth it. Even with ValueTask there will be some unnecessary allocations for sync methods. I would rather have separate sync method implementations (at least for critical paths)I've seen zero allocation ValueTask alternatives
https://github.com/alanmcgovern/ReusableTasks
https://github.com/mgravell/PooledAwait
jstedfast commentedon Feb 18, 2022
Yea, I was worried you might have been using the sync API and the overhead was due to that :-\
You might be right... and that was one of my big fears with doing things that way. I had just hoped I was wrong :(
jstedfast commentedon Feb 19, 2022
Seems like you are looking at the number of objects created over the lifetime of the program as opposed to the number of objects that are live at any given point in time.
There are probably ways to cache the (common)
ImapTokens
, at least, and maybe the commonTask<ImapToken>
s which should reduce the number of those objects created.I'm working on a bunch of other optimizations to reduce memory usage by reusing temporary buffers which may have a bigger impact.
ekalchev commentedon Feb 20, 2022
That is correct. But the lifetime of the program is one call to Fetch method (or at least the significant part of it) which generates 130k Task<T> allocations for single method call. Allocations are not cheap and deallocations with GC are even more expensive. And at some point of time GC will reclaim this memory and that will impact app performance.
As an example from a real app - I am using 4 threads per mail account (one instance of ImapClient per thread) to synchronize the mail folders in parallel. With 2 accounts the app memory rise to 1.5gb. If I turn off Mailkit threads. App memory consumption is only 250mb.
I am already thinking how I can turn sync methods returning task to "pure" sync methods. My first thought was - source generator. It seems there are already examples of this
https://xoofx.com/blog/2018/12/26/generate-async-from-sync-code-with-roslyn/
dotnet/roslyn#12931
https://github.com/maca88/AsyncGenerator
It seems to be viable
jstedfast commentedon Feb 20, 2022
Yep, which is why I have not closed it.
I just wanted to make sure we were both on the same page (and I wasn't sure my understanding of what I was seeing in dotMemory was correct because I'm not familiar with their UI).
I've got a lot of local changes that reduce memory allocations a fair bit, altho not so much with regards to
ImapToken
andTask<ImapToken>
.Thanks for this info - that helps put this issue in perspective.
I'm not sure if you've noticed, but someone has been submitting a lot of PR's lately to MimeKit that reduce memory allocations in MimeKit as well.
Reducing memory allocations will be my main focus for a while to try to get this under better control.
I noticed last night while working on this that I was able to halve the number of
byte[]
allocations and what I'd also like to do is significantly reduce the number ofstring
allocations as well (which you've probably noticed some commits along those lines, but it's not enough).jstedfast commentedon Feb 20, 2022
FWIW, here's some screenshots of progress I made yesterday (still uncommitted because I'm trying to work out some design ideas).
Before:
After:
ekalchev commentedon Feb 20, 2022
Having byte[] and string allocations reduced will be great! I was thinking that probably
ReadOnlySpan<T>
will be huge for MailKit and MimeKit in the future since you can completely avoid making byte[] and string copies when parsing. I guess this is not yet an options since the code is still targeting older version where this is unsupported.Still any optimization for
byte[]
andstring
won't affect the allocations ofTask<T>
. I believe the easiest and safer approach for mitigatingTask<T>
allocations is to convert private and protected methods to returnValueTask<T>
. and keepTask<T>
for the public interfaceIt seems that in the latest frameworks
ValueTask<T>
is allocation freedotnet/coreclr#26310
ValueTask<T>
has some limitationshttps://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask/
but I don't think MailKit code await Tasks concurrently or multiple times.
I am planning to research the possibility to create a source generator and use it for all async methods that has
runAsync
argument to generate a sync version of the methods. If we have any success with that I'll let you know.ekalchev commentedon Feb 20, 2022
I spent a few hours to convert Fetch method and its dependency methods to pure 'sync' methods. This is the result (there are still
Task<T>
but I they come from Connect and Authenticate methodswith sync methods returning
Task<T>
- Fetch method completed in 6.4 secwith sync methods returning
T
- Fetch completed in 2.5 secTime time for which both test completed is not very accurate because it depends from the server response. However I ran multiple tests and pure 'sync' version is consistently faster.
Those are the average times:
Sync with
Task<T>
- 4.5secSync with
T
- 3.5 sec. - Uses 60mb less thatTask<T>
versionjstedfast commentedon Feb 20, 2022
That's actually one of the main things being used to reduce allocations in MimeKit right now (and I've started using it in a few places in MailKit as well) when the framework supports it.
I'm also starting to phase out older frameworks. I'll likely be dropping support for anything older than 4.6.2 in late April/early May, for example.
Correct.
Yea, that is my current thinking as well.
Correct :)
The naming convention I've been using in MimeKit and MailKit is actually
doAsync
, but I've already got some of the API "ported" to have both sync and async from the ground up (and that's the order I'm working in, ImapStream -> ImapEngine -> further up). I'm also trying to refactor things a bit to share as much logic as possible while I go so that it's not a full duplication of code.jstedfast commentedon Feb 20, 2022
While using dotMemory, I noticed that SslStream is all async under the public sync APIs so I wonder if it's possible to achieve good results w/o resorting to sync and async versions of all the underlying private I/O API's.
FWIW, for my fairly small test IMAP account, I was getting about 1.4 K
ImapToken
s instantiated at a cost of 42.2 KB memory.With bcc5791 applied, it's now 726
ImapToken
s instantiated at a cost of 22.7 KB memory.If I can do that for the
Task<ImapToken>
s as well, that would be pretty significant.jstedfast commentedon Feb 22, 2022
I refactored ImapStream to have fully sync versions (i.e. no calling an Async function with doAsync = false).
65 remaining items