Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanups for struct State #274

Merged
merged 1 commit into from
Jan 2, 2025

Conversation

brian-pane
Copy link

  • Remove an unused field
  • Convert fields that are only used during tree-building into locals
  • Fix some comments

* Remove an unused field
* Convert fields that are only used during tree-building into locals
* Fix some comments
@brian-pane
Copy link
Author

I started looking at this struct because it shows up a lot when I profile for L1 data cache misses. Based on how much memory the deflate algorithm has to read while looking for the longest match, my hypothesis is that parts of the state struct get evicted frequently from a typical 32KB L1D cache. Since the State struct uses C, with the order of fields in memory chosen by the programmer rather than the compiler, I'm thinking about trying some reordering in the future: finding the sets of fields that are often used together and putting them close together in the struct so they'll be in the same cache line. In the meantime, this PR just contains minor changes to try to make the struct easier to understand.

Copy link
Collaborator

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

I suspect that historically having such large values on the stack would not optimize well, but these days it should be fine (it is when I run benchmarks locally).

I suspect there are some gains to be had by reordering the fields (certainly, you can decrease performance by reordering, and what are the changes we've randomly landed on the optimal order?).

@folkertdev folkertdev merged commit 19b0b40 into trifectatechfoundation:main Jan 2, 2025
20 checks passed
@brian-pane
Copy link
Author

Can the window size ever be larger than 65536 bytes? Some of the fields that hold offsets within the window are u16 and others are usize:

    pub(crate) prev_match: u16,       /* previous match */
    pub(crate) match_available: bool, /* set if previous match exists */
    pub(crate) strstart: usize,       /* start of string to insert */
    pub(crate) match_start: usize,    /* start of matching string */

@folkertdev
Copy link
Collaborator

The window bits are capped at 15 (the value of MAX_WBITS), and we allocate twice that, so for the fields you mention here, u16 would be correct.

In general, the window allocation is actually slightly larger, the additional padding means that simd operations can be used even near the end of the window.

Also interesting would be

    pub(crate) w_size: usize,    /* LZ77 window size (32K by default) */
    pub(crate) w_bits: usize,    /* log2(w_size)  (8..16) */
    pub(crate) w_mask: usize,    /* w_size - 1 */
    pub(crate) lookahead: usize, /* number of valid bytes ahead in window */

Because instructions are much cheaper than data loads, I suspect that w_mask should just be calculated from w_bits, maybe w_size too? w_bits can be a u8, and lookahead could be a u16.

If you do start playing around with this, splitting the changes into small commits (e.g. changing the type of one field, and following the compiler errors) makes reviewing and benchmarking a lot easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants