Cleanups for struct State #274

brian-pane · 2025-01-02T20:46:43Z

Remove an unused field
Convert fields that are only used during tree-building into locals
Fix some comments

* Remove an unused field * Convert fields that are only used during tree-building into locals * Fix some comments

brian-pane · 2025-01-02T21:01:04Z

I started looking at this struct because it shows up a lot when I profile for L1 data cache misses. Based on how much memory the deflate algorithm has to read while looking for the longest match, my hypothesis is that parts of the state struct get evicted frequently from a typical 32KB L1D cache. Since the State struct uses C, with the order of fields in memory chosen by the programmer rather than the compiler, I'm thinking about trying some reordering in the future: finding the sets of fields that are often used together and putting them close together in the struct so they'll be in the same cache line. In the meantime, this PR just contains minor changes to try to make the struct easier to understand.

folkertdev

Nice, thanks!

I suspect that historically having such large values on the stack would not optimize well, but these days it should be fine (it is when I run benchmarks locally).

I suspect there are some gains to be had by reordering the fields (certainly, you can decrease performance by reordering, and what are the changes we've randomly landed on the optimal order?).

brian-pane · 2025-01-03T01:29:59Z

Can the window size ever be larger than 65536 bytes? Some of the fields that hold offsets within the window are u16 and others are usize:

    pub(crate) prev_match: u16,       /* previous match */
    pub(crate) match_available: bool, /* set if previous match exists */
    pub(crate) strstart: usize,       /* start of string to insert */
    pub(crate) match_start: usize,    /* start of matching string */

folkertdev · 2025-01-03T14:53:46Z

The window bits are capped at 15 (the value of MAX_WBITS), and we allocate twice that, so for the fields you mention here, u16 would be correct.

In general, the window allocation is actually slightly larger, the additional padding means that simd operations can be used even near the end of the window.

Also interesting would be

    pub(crate) w_size: usize,    /* LZ77 window size (32K by default) */
    pub(crate) w_bits: usize,    /* log2(w_size)  (8..16) */
    pub(crate) w_mask: usize,    /* w_size - 1 */
    pub(crate) lookahead: usize, /* number of valid bytes ahead in window */

Because instructions are much cheaper than data loads, I suspect that w_mask should just be calculated from w_bits, maybe w_size too? w_bits can be a u8, and lookahead could be a u16.

If you do start playing around with this, splitting the changes into small commits (e.g. changing the type of one field, and following the compiler errors) makes reviewing and benchmarking a lot easier.

Cleanups for struct State

c31ab54

* Remove an unused field * Convert fields that are only used during tree-building into locals * Fix some comments

folkertdev approved these changes Jan 2, 2025

View reviewed changes

folkertdev merged commit 19b0b40 into trifectatechfoundation:main Jan 2, 2025
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanups for struct State #274

Cleanups for struct State #274

brian-pane commented Jan 2, 2025

brian-pane commented Jan 2, 2025

folkertdev left a comment

brian-pane commented Jan 3, 2025

folkertdev commented Jan 3, 2025

Cleanups for struct State #274

Cleanups for struct State #274

Conversation

brian-pane commented Jan 2, 2025

brian-pane commented Jan 2, 2025

folkertdev left a comment

Choose a reason for hiding this comment

brian-pane commented Jan 3, 2025

folkertdev commented Jan 3, 2025