Skip to content

Memory allocations in Fetch method #1335

Closed
@ekalchev

Description

image

I was inspecting the memory allocations for the thread that works with MailKit and surprisingly Task<ImapToken> takes huge chunk of memory more than strings and memory streams. Not sure if this is caused by the Task itself or ImapToken inside the task. Can this be improved? May be using ValueTask<T> or ImapToken struct can help?

I am using synchronous versions of the Mailkit interface.

Activity

jstedfast

jstedfast commented on Feb 17, 2022

@jstedfast
Owner

Yea, that looks like an insane amount of memory used by Task<ImapToken>.

jstedfast

jstedfast commented on Feb 18, 2022

@jstedfast
Owner

What tool are you using to get this info? The VS Diagnostics tools aren't showing Task<ImapToken> at all for me for some reason.

Also, I think it's the tasks and not the tokens - but I could be wrong on that since ImapToken is a class so it'd be a "pointer" to the ImapToken instance in the Task<T>.

Which means there's a lot of Task<T>'s for ImapTokens floating around in your heap for some reason.

Is your code sync or async? Both code paths use the same methods that return Task<ImapToken>, so both will create those objects - I'm just curious if it makes a difference.

jstedfast

jstedfast commented on Feb 18, 2022

@jstedfast
Owner

Another question is: is this a single ImapClient connection? Or multiple? (and roughly how many?)

ekalchev

ekalchev commented on Feb 18, 2022

@ekalchev
ContributorAuthor

I am using single connection and sync methods only.

To make it simple, I created a small console app to reproduce this

using MailKit;
using MailKit.Net.Imap;
using System;
using System.Threading.Tasks;

ImapClient client = new ImapClient();
client.Connect("---");
client.Authenticate("---", "---");
client.Inbox.Open(FolderAccess.ReadWrite);
var items = client.Inbox.Fetch(0, -1, MessageSummaryItems.UniqueId);
GC.Collect();
await Task.Delay(2000);

items array in my account contains 17 000 items. Build with .net 6.0 (net48 memory usage is even worse +15%)

This is the result

image

I am using JetBrains dotTrace - it has trial version - This is JetBrains performance measuring tool but it still shows memory allocations.

I tried JetBrains dotMemory which give more information about memory allocations and I see the same picture

image

I just did very quick find and replace of Task<ImapToken> with ValueTask<ImapToken>

This is the result from dotMemory

with Task
image

with ValueTask
image

Even with ValueTask<T> async methods have their overhead. I can see that some allocations in ValueTask is now reported in ImapStream+<ReadTokenAsync>. I know you wrote it with bool runAsync for better maintenance but I am not sure this huge memory overhead worth it. Even with ValueTask there will be some unnecessary allocations for sync methods. I would rather have separate sync method implementations (at least for critical paths)

I've seen zero allocation ValueTask alternatives
https://github.com/alanmcgovern/ReusableTasks
https://github.com/mgravell/PooledAwait

jstedfast

jstedfast commented on Feb 18, 2022

@jstedfast
Owner

Yea, I was worried you might have been using the sync API and the overhead was due to that :-\

I know you wrote it with bool runAsync for better maintenance but I am not sure this huge memory overhead worth it.

You might be right... and that was one of my big fears with doing things that way. I had just hoped I was wrong :(

jstedfast

jstedfast commented on Feb 19, 2022

@jstedfast
Owner

Seems like you are looking at the number of objects created over the lifetime of the program as opposed to the number of objects that are live at any given point in time.

There are probably ways to cache the (common) ImapTokens, at least, and maybe the common Task<ImapToken>s which should reduce the number of those objects created.

I'm working on a bunch of other optimizations to reduce memory usage by reusing temporary buffers which may have a bigger impact.

ekalchev

ekalchev commented on Feb 20, 2022

@ekalchev
ContributorAuthor

That is correct. But the lifetime of the program is one call to Fetch method (or at least the significant part of it) which generates 130k Task<T> allocations for single method call. Allocations are not cheap and deallocations with GC are even more expensive. And at some point of time GC will reclaim this memory and that will impact app performance.

As an example from a real app - I am using 4 threads per mail account (one instance of ImapClient per thread) to synchronize the mail folders in parallel. With 2 accounts the app memory rise to 1.5gb. If I turn off Mailkit threads. App memory consumption is only 250mb.

I am already thinking how I can turn sync methods returning task to "pure" sync methods. My first thought was - source generator. It seems there are already examples of this
https://xoofx.com/blog/2018/12/26/generate-async-from-sync-code-with-roslyn/
dotnet/roslyn#12931
https://github.com/maca88/AsyncGenerator

It seems to be viable

jstedfast

jstedfast commented on Feb 20, 2022

@jstedfast
Owner

Allocations are not cheap and deallocations with GC are even more expensive.

Yep, which is why I have not closed it.

I just wanted to make sure we were both on the same page (and I wasn't sure my understanding of what I was seeing in dotMemory was correct because I'm not familiar with their UI).

I've got a lot of local changes that reduce memory allocations a fair bit, altho not so much with regards to ImapToken and Task<ImapToken>.

With 2 accounts the app memory rise to 1.5gb.

Thanks for this info - that helps put this issue in perspective.

I'm not sure if you've noticed, but someone has been submitting a lot of PR's lately to MimeKit that reduce memory allocations in MimeKit as well.

Reducing memory allocations will be my main focus for a while to try to get this under better control.

I noticed last night while working on this that I was able to halve the number of byte[] allocations and what I'd also like to do is significantly reduce the number of string allocations as well (which you've probably noticed some commits along those lines, but it's not enough).

jstedfast

jstedfast commented on Feb 20, 2022

@jstedfast
Owner

FWIW, here's some screenshots of progress I made yesterday (still uncommitted because I'm trying to work out some design ideas).

Before:

dotMemoryOriginal

After:

dotMemory-2022-02-19

ekalchev

ekalchev commented on Feb 20, 2022

@ekalchev
ContributorAuthor

Having byte[] and string allocations reduced will be great! I was thinking that probably ReadOnlySpan<T> will be huge for MailKit and MimeKit in the future since you can completely avoid making byte[] and string copies when parsing. I guess this is not yet an options since the code is still targeting older version where this is unsupported.

Still any optimization for byte[] and string won't affect the allocations of Task<T>. I believe the easiest and safer approach for mitigating Task<T> allocations is to convert private and protected methods to return ValueTask<T>. and keep Task<T> for the public interface
It seems that in the latest frameworks ValueTask<T> is allocation free
dotnet/coreclr#26310

ValueTask<T> has some limitations
https://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask/
but I don't think MailKit code await Tasks concurrently or multiple times.

I am planning to research the possibility to create a source generator and use it for all async methods that has runAsync argument to generate a sync version of the methods. If we have any success with that I'll let you know.

ekalchev

ekalchev commented on Feb 20, 2022

@ekalchev
ContributorAuthor

I spent a few hours to convert Fetch method and its dependency methods to pure 'sync' methods. This is the result (there are still Task<T> but I they come from Connect and Authenticate methods

image

with sync methods returning Task<T> - Fetch method completed in 6.4 sec
image

with sync methods returning T - Fetch completed in 2.5 sec
image

Time time for which both test completed is not very accurate because it depends from the server response. However I ran multiple tests and pure 'sync' version is consistently faster.

Those are the average times:
Sync with Task<T> - 4.5sec
Sync with T - 3.5 sec. - Uses 60mb less that Task<T> version

jstedfast

jstedfast commented on Feb 20, 2022

@jstedfast
Owner

I was thinking that probably ReadOnlySpan will be huge for MailKit and MimeKit...

That's actually one of the main things being used to reduce allocations in MimeKit right now (and I've started using it in a few places in MailKit as well) when the framework supports it.

I'm also starting to phase out older frameworks. I'll likely be dropping support for anything older than 4.6.2 in late April/early May, for example.

Still any optimization for byte[] and string won't affect the allocations of Task.

Correct.

I believe the easiest and safer approach for mitigating Task allocations is to convert private and protected methods to return ValueTask. and keep Task for the public interface

Yea, that is my current thinking as well.

but I don't think MailKit code await Tasks concurrently or multiple times.

Correct :)

I am planning to research the possibility to create a source generator and use it for all async methods that has runAsync argument to generate a sync version of the methods. If we have any success with that I'll let you know.

The naming convention I've been using in MimeKit and MailKit is actually doAsync, but I've already got some of the API "ported" to have both sync and async from the ground up (and that's the order I'm working in, ImapStream -> ImapEngine -> further up). I'm also trying to refactor things a bit to share as much logic as possible while I go so that it's not a full duplication of code.

jstedfast

jstedfast commented on Feb 20, 2022

@jstedfast
Owner

While using dotMemory, I noticed that SslStream is all async under the public sync APIs so I wonder if it's possible to achieve good results w/o resorting to sync and async versions of all the underlying private I/O API's.

FWIW, for my fairly small test IMAP account, I was getting about 1.4 K ImapTokens instantiated at a cost of 42.2 KB memory.

With bcc5791 applied, it's now 726 ImapTokens instantiated at a cost of 22.7 KB memory.

If I can do that for the Task<ImapToken>s as well, that would be pretty significant.

jstedfast

jstedfast commented on Feb 22, 2022

@jstedfast
Owner

I refactored ImapStream to have fully sync versions (i.e. no calling an Async function with doAsync = false).

65 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    performanceImprovements to speed or memory consumption

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Memory allocations in Fetch method · Issue #1335 · jstedfast/MailKit