
Loader: Improve physical memory mapping performance at startup #113

Open · JonasKruckenberg opened this issue on Sep 15, 2024 · 4 comments
Labels: enhancement, good first issue

@JonasKruckenberg (Owner)

The loader automatically maps all of physical memory at a chosen offset in kernel space (so we have access to page tables, can introspect the kernel's ELF file, etc.).
The current implementation is quite sub-par though, as it maps this huge range using the most fine-grained 4 KiB pages (wasting a lot of cycles and memory in the process). Instead we should be mapping with megapages or gigapages (the region allocated by `virt_alloc` for the mapping is already gigapage-aligned).

Maybe we can even figure out a smart way to map the region using the fewest pages possible: map as much as we can using gigapages, then step down to megapages and map as much as possible with those, and only use regular 4 KiB pages for the remainder of the requested range. A sketch of this greedy strategy is below.
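Something like this greedy loop, sketched with the Sv39 page sizes and a hypothetical `map_at_size` helper (illustrative only, not the loader's actual API):

```rust
/// Sv39 page sizes, largest first: 1 GiB gigapages, 2 MiB megapages, 4 KiB pages.
const PAGE_SIZES: [usize; 3] = [1 << 30, 1 << 21, 1 << 12];

/// Hypothetical helper: installs a single leaf mapping of `size` bytes
/// at `virt` -> `phys`, at the page-table level corresponding to `size`.
fn map_at_size(virt: usize, phys: usize, size: usize) {
    let _ = (virt, phys, size); // placeholder body
}

/// Greedily map `len` bytes using the fewest pages possible.
fn map_range_greedy(mut virt: usize, mut phys: usize, mut len: usize) {
    while len > 0 {
        // Pick the largest page size that both addresses are aligned to
        // and that still fits in the remaining length.
        let size = PAGE_SIZES
            .into_iter()
            .find(|&s| virt % s == 0 && phys % s == 0 && len >= s)
            .expect("virt, phys and len must be at least 4 KiB aligned");
        map_at_size(virt, phys, size);
        virt += size;
        phys += size;
        len -= size;
    }
}
```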

@JonasKruckenberg added the good first issue and enhancement labels on Sep 15, 2024
@alegnani

I was having a look at the issue and realized it would need quite a lot of changes.
The current implementation relies on `walk_mut` to traverse the page-table hierarchy and calls `on_leaf` at level 0. Since we want to map superpages, we would need to call this function earlier. If we are only performing a translation, we can get away with checking whether the attributes of the PTE match `Mode::ENTRY_FLAGS_LEAF`, but that is not going to work when we first map the page; we would need to pass additional context to force `walk_mut` to call `on_leaf` early. This could be fixed either by passing a level at which we want to stop the walk (see the sketch below), or by not using the walk function at all. The latter would probably perform better, as it doesn't require us to re-walk the page tables every time.
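For the first option, I'm picturing something along these lines; the types and helpers are made up for illustration, not the real `walk_mut` API:

```rust
/// Illustrative stand-ins for the real page-table types.
struct Entry(u64);
struct PageTable {
    entries: [Entry; 512],
}

/// Sv39 splits a virtual address into three 9-bit indices above the 12-bit page offset.
fn vpn_index(virt: usize, level: usize) -> usize {
    (virt >> (12 + 9 * level)) & 0x1ff
}

/// Hypothetical: follow (or allocate) the next-level table behind a non-leaf entry.
fn next_table_mut(entry: &mut Entry) -> &'static mut PageTable {
    let _ = entry;
    unimplemented!()
}

/// Walk the hierarchy but stop at `target_level` instead of always descending
/// to 0, so the entry at that level becomes the (super)page leaf.
fn walk_mut_to_level(
    mut table: &mut PageTable,
    virt: usize,
    target_level: usize,
    on_leaf: impl FnOnce(&mut Entry),
) {
    for level in (target_level..=2).rev() {
        let entry = &mut table.entries[vpn_index(virt, level)];
        if level == target_level {
            on_leaf(entry);
            return;
        }
        table = next_table_mut(entry);
    }
}
```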

Just wanted to get your opinion before hacking away.

@JonasKruckenberg (Owner, Author) commented Oct 26, 2024

First off: heya ✌️
Second: yeah, what you're saying sounds about right. Thinking ahead, we'll definitely benefit from being able to explicitly specify the level to map at (e.g. for wasm linear memories we'll always map 6 GiB slots).
If you can find a way to do this without having to recursively walk the page table every time, even better!

Also: the current mapping code is a bit overengineered because I thought we'd need to support mutating the page-table tree, which it turns out we don't really, so don't hesitate to make bigger sweeping changes if that's needed for faster mapping; no one is emotionally attached to that code :)

@alegnani commented Dec 8, 2024

Finally managed to work on superpage support.
As part of the `map_range` function, I implemented an algorithm that splits a memory region into a region mappable with superpages (in this case 1 GiB or 2 MiB pages) and two optional padding regions; roughly the idea sketched below.
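A minimal sketch of that split, assuming half-open `start..end` ranges (illustrative, not the exact code in my branch):

```rust
/// Split `start..end` into an optional unaligned head, a `page_size`-aligned
/// body mappable with superpages, and an optional unaligned tail.
/// Head and tail fall back to smaller page sizes.
fn split_for_superpages(
    start: usize,
    end: usize,
    page_size: usize,
) -> (Option<(usize, usize)>, Option<(usize, usize)>, Option<(usize, usize)>) {
    let body_start = start.next_multiple_of(page_size);
    let body_end = (end / page_size) * page_size;
    if body_start >= body_end {
        // No full superpage fits; the whole range is "padding".
        return (Some((start, end)), None, None);
    }
    (
        (start < body_start).then_some((start, body_start)),
        Some((body_start, body_end)),
        (body_end < end).then_some((body_end, end)),
    )
}
```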

Unfortunately, the virtual and physical memory regions that `map_physical_memory` passes to `map_range` are not at the same offset from a superpage boundary, meaning they can only be mapped with 4 KiB pages. This is because both the physical and the virtual address have to be correctly aligned to the superpage size.
A (contrived) example: physical memory 0x0..0x1000000 (aligned on a 2 MiB boundary) and virtual memory 0x1000..0x1001000 (offset 0x1000 from a 2 MiB boundary).
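In other words, a superpage of a given size is only usable when both addresses sit at the same offset within that size; a quick sketch of the check (names invented for illustration):

```rust
/// Largest Sv39 page size that can back a shared virt -> phys mapping:
/// both addresses must sit at the same offset within the page.
fn largest_common_page_size(virt: usize, phys: usize) -> usize {
    [1usize << 30, 1 << 21, 1 << 12]
        .into_iter()
        .find(|&size| virt % size == phys % size)
        .expect("addresses must at least share their offset within a 4 KiB page")
}
```

For the example above this returns 4 KiB, since 0x1000 and 0x0 differ modulo 2 MiB.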

This could be fixed in `map_physical_memory` by breaking up the physical memory into multiple mappings. Continuing the example above, we would map virtual 0x1000..0x1000000 to physical 0x1000..0x1000000 and virtual 0x1000000..0x1001000 to physical 0x0..0x1000, so that the virtual and physical offsets line up within each mapping.

Is there any other way to make this work? If not, are there any downsides to this approach?

@JonasKruckenberg (Owner, Author)

Hey @alegnani, that seems like a totally suitable solution to me, but tbh I don't fully grasp the nuances from just a description, so feel free to open a pull request and we can discuss further there ;) At the same time, I have been working on a different implementation in the meantime (because it was blocking more useful higher-level virtmem features) and came up with #158, specifically its `map_contiguous` function. If you want, have a look at that implementation and see if it aligns with yours, and maybe what we could do to improve it further 🤔
