Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge multiple mallocs into one large malloc #357

Open
jimbxb opened this issue Sep 27, 2022 · 4 comments
Open

Merge multiple mallocs into one large malloc #357

jimbxb opened this issue Sep 27, 2022 · 4 comments
Labels
enhancement New feature or request performance Issues related to performance of generated code research project

Comments

@jimbxb
Copy link
Collaborator

jimbxb commented Sep 27, 2022

Another very simple optimisation that we could implement is to pull multiple heap allocations together, allocating one large structure, and using segments of the allocated memory for each of the separate allocations.

I'm not sure of the repercussions of this on the GC, as it would change the allocation pattern from lots of small allocations to less larger allocations.

This could also be extended to allocating all memory for a proc at the start of the call, pulling all allocations over every branch, using different offsets for each branch if necessary. If branches use drastically different amounts of memory, this would likely be less desirable, so some heuristic could be involved.

Paul Bone did a write up that may be useful for considerations here
https://paul.bone.id.au/blog/2016/10/08/memory-fragmentation-in-boehmgc/

Originally posted by @jimbxb in #352 (comment)

@pschachte
Copy link
Owner

We'll want to do some benchmarking of this to explore different heuristics for which allocations to merge. One interesting heuristic would be to merge allocations where the address of one allocation is stored in memory allocated by another, on the theory that when the outer one needs to be collected, the inner one will, too (though not necessarily vice versa).

@pschachte pschachte added enhancement New feature or request research project performance Issues related to performance of generated code labels Sep 27, 2022
@jimbxb
Copy link
Collaborator Author

jimbxb commented Sep 27, 2022

Linked structures are likely a good candidate also, as having the structures allocated close by in memory could be good for cache performance.

@PaulBone
Copy link

PaulBone commented Oct 3, 2022

In the general case merged allocations don't work with GCs because many GCs don't support interior pointers. That is if the GC knows it has allocated the address 0xABC0 but you have a pointer to 0xABC8, the GC might not recognise this as a pointer and might free your memory early. So you want to check if your GC supports interior pointers and what the limits are. But your specific case where A always points to B is safe if you know that A always has a longer-or-equal lifetime than B.

One optimisation some GCs support, usually those with bump-allocation for the nursery (or semispace). Is that a test-for-space can be separated from the actual allocation. If an allocation looks like this:


if (bump_pointer + size_to_allocate >= limit) {
   do_gc();
}
new_object = bump_pointer;
bump_pointer += size_to_allocate;

And this code would be inlined on every allocation site, then when there are multiple allocations you can instead put:

if (bump_pointer + total_size >= limit) {
   do_gc();
}
new_object_1 = bump_pointer;
bump_pointer += size_1;
new_object_2 = bump_pointer;
bump_pointer += size_2;
new_object_3 = bump_pointer;
bump_pointer += size_3;

There must be no calls to other functions that could allocate between the test and last allocation.

Each of these allocations is still a separate heap object so you don't save memory or other execution costs, only the test is amortised. But on the other hand the object's lifetimes can be completely unrelated.

@PaulBone
Copy link

PaulBone commented Oct 3, 2022

I checked with @jimbxb, Since Wybe uses BoehmGC the "if they have related lifetimes and one always points to the other" optimisation is available, so is some level of interior pointer support, but if you want full support it may be an extra compile time option and cause the GC to retain more memory due to having more false-positive pointers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Issues related to performance of generated code research project
Projects
None yet
Development

No branches or pull requests

3 participants