Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC-free automatic memory management #387

Open
pschachte opened this issue May 2, 2023 · 3 comments
Open

GC-free automatic memory management #387

pschachte opened this issue May 2, 2023 · 3 comments
Labels
enhancement New feature or request performance Issues related to performance of generated code research project

Comments

@pschachte
Copy link
Owner

pschachte commented May 2, 2023

This project will replace the garbage collector in the Wybe system with
compiler-mediated direct allocation and deallocation. The details are to be
determined, but there are a number of different strategies that can be employed.
Structures can be allocated on the stack if they won't escape the creating
procedure, and couldn't be deallocated much sooner than procedure exit (see #358). They can
be allocated on the heap and deallocated after the last use, malloc/free style.
When fixed number of structures will last be used about the same time, they can
be allocated and freed in a single block allocation. When a variable number of
structures will last be used around the same time, they can be allocated from a
chain of buffers that would be passed from procedure to procedure as an extra
argument. Where the analysis cannot determine how many structures maintain
pointers to a structure, the compiler can generate code to maintain a reference
count at runtime, and free the structure when the count goes to 0.

The basis of all this would be a very precise pointer analysis. One
advantage of Wybe in this is that aliasing is not relevant semantically, so the
compiler can choose to copy structures rather than sharing them, if that makes
memory management easier.

This is probably a PhD-scale project.

@pschachte pschachte added enhancement New feature or request research project performance Issues related to performance of generated code labels May 2, 2023
@PaulBone
Copy link

You may want to fall-back to GC rather than RC when a lifetime can't be proven. Without getting into very fancy optimised RC (or GC). RC is slower but has the advantage that memory is freed the moment it isn't needed, allowing you to use less memory before swapping.

Background: I worry that people chase "GC-free" but when I ask them why it's desirable they say that they want to avoid GC pauses, but they could just use a pauseless garbage collector. This is especially confusing when work is shifted onto the programmer to manage lifetimes (such as in Rust), but I think you're implying that this should be entirely automatic and the programmer shouldn't need to get involved.

@pschachte
Copy link
Owner Author

My hope would be that the pointer analysis would be good enough that reference counting wouldn't be needed for many data structures, so the performance penalty would be low. If most memory is managed another way, you don't have much reclamation to amortise the cost of GC over, so overall, RC for those few things would wind up cheaper. On the other hand, if many allocations need to be handled by reference counting, you're right, you'd be better off with GC.

Another possible trick, only possible in a language with value semantics, would be to sometimes clone a data structure rather than aliasing it. You wouldn't want to do it too often, but if you only have two references to a thing, and it's not too big, and you can precisely track the last use of each of those references, it might pay off.

@PaulBone
Copy link

PaulBone commented Jun 7, 2023

If most memory is managed another way, you don't have much reclamation to amortise the cost of GC over

Hrm, It'd be necessary to track "Non-GC objects that might contain GC pointers" as part of the GC's root set so that the GC doesn't need to walk over everything to make GC competitive. 🤷 I can't guess what would be best.

Another possible trick, only possible in a language with value semantics, would be to sometimes clone a data structure rather than aliasing it. You wouldn't want to do it too often, but if you only have two references to a thing, and it's not too big, and you can precisely track the last use of each of those references, it might pay off.

I like this idea, it can be combined with either the GC or RC fallback as an optimisation that is just available if the compiler thinks it's a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Issues related to performance of generated code research project
Projects
None yet
Development

No branches or pull requests

2 participants