Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a compress-level parameter #418

Merged
merged 1 commit into from
Apr 18, 2019
Merged

Conversation

dybvig
Copy link
Member

@dybvig dybvig commented Apr 11, 2019

This patch adds a compress-level parameter that can be used to control the amount of effort expended compressing, whether with the default LZ4 compression or gzip compression. The parameter can take on one of the symbolic values low, medium, high, or maximum. The default is medium, which for LZ4 currently maps to LZ4HC_CLEVEL_MIN and is both more effective and slower (at compression, but not decompression) than the previous default. The patch also makes several changes to the LZ4-support code added in #410; in particular, it restructures the code a bit, puts the compress format directly in the thread context (which now also holds the compress level), and changes how the compressed bytevector compression format and length are stored in the header word. I have tested the changes on OSX and Linux but not Windows.

@dybvig dybvig requested a review from burgerrg April 11, 2019 02:44
@mflatt
Copy link
Contributor

mflatt commented Apr 11, 2019

These changes look good to me.

(Just to confirm, Racket-on-Chez builds fine with the patch, its boot file size goes down by about 17%, and load time remains about the same and maybe slightly faster.)

Copy link
Contributor

@burgerrg burgerrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes and for finding the missing documentation from the named condition and mutex change! I found a bug in the tag extraction of compressed bytevectors, a couple missing casts for the Windows C compiler, and a couple typos. I'm running the mats on Windows now. So far ta6nt passed o=0 and o=3.

c/externs.h Outdated Show resolved Hide resolved
c/compress-io.c Outdated Show resolved Hide resolved
c/new-io.c Outdated Show resolved Hide resolved
c/new-io.c Outdated Show resolved Hide resolved
csug/io.stex Outdated Show resolved Hide resolved
csug/io.stex Outdated Show resolved Hide resolved
s/bytevector.ss Outdated Show resolved Hide resolved
@dybvig dybvig requested a review from burgerrg April 11, 2019 17:41
c/new-io.c Outdated Show resolved Hide resolved
c/new-io.c Outdated Show resolved Hide resolved
@burgerrg
Copy link
Contributor

All the Windows mats passed!

@dybvig
Copy link
Member Author

dybvig commented Apr 12, 2019

Thanks for the corrections!

@dybvig dybvig self-assigned this Apr 12, 2019
@dybvig dybvig requested a review from burgerrg April 17, 2019 00:25
c/compress-io.c Show resolved Hide resolved
c/thread.c Outdated Show resolved Hide resolved
@owaddell
Copy link
Contributor

Thanks for all the changes on this branch! Valgrind's memcheck tool is happy now.

Note typo on line 85 of compress-io.c: "lcoal" -> "local".

… other related improvements

- added compress-level parameter to select a compression level for
  file writing and changed the default for lz4 compression to do a
  better job compressing.  finished splitting glz input routines
  apart from glz output routines and did a bit of other restructuring.
  removed gzxfile struct-as-bytevector wrapper and moved its fd
  into glzFile.  moved DEACTIVATE to before glzdopen_input calls
  in S_new_open_input_fd and S_compress_input_fd, since glzdopen_input
  reads from the file and could block.  the compress format and now
  level are now recorded directly the thread context.  replaced
  as-gz? flag bit in compressed bytevector header word with a small
  number of bits recording the compression format at the bottom of
  the header word.  flushed a couple of bytevector compression mats
  that depended on the old representation.  (these last few changes
  should make adding new compression formats easier.)  added
  s-directory build options to choose whether to compress and, if
  so, the format and level.
    compress-io.h, compress-io.c, new-io.c, equates.h, system.h,
    scheme.c, gc.c,
    io.ss, cmacros.ss, back.ss, bytevector.ss, primdata.ss, s/Mf-base,
    io.ms, mat.ss, bytevector.ms, root-experr*,
    release_notes.stex, io.stex, system.stex, objects.stex
- improved the effectiveness of LZ4 boot-file compression to within
  15% of gzip by increasing the lz4 output-port in_buffer size to
  1<<18.  With the previous size (1<<14) LZ4-compressed boot files
  were about 50% larger.  set the lz4 input-port in_buffer and
  out_buffer sizes to 1<<12 and 1<<14.  there's no clear win at
  present for larger input-port buffer sizes.
    compress-io.c
- To reduce the memory hit for the increased output-port in_buffer
  size and the corresponding increase in computed out_buffer size,
  one output-side out_buffer is now allocated (lazily) per thread
  and stored in the thread context.  The other buffers are now
  directly a part of the lz4File_out and lz4File_in structures
  rather than allocated separately.
    compress-io.c, scheme.c, gc.c,
    cmacros.ss
- split out the buffer emit code from glzwrite_lz4 into a
  separate glzemit_lz4 helper that is now also used by gzclose
  so we can avoid dealing with a NULL buffer in glzwrite_lz4.
  glzwrite_lz4 also uses it to writing large buffers directly and
  avoid the memcpy.
    compress-io.c
- replaced lz4File_out and lz4File_in mode enumeration with the
  compress format and inputp boolean.  using switch to check and
  raising exceptions for unexpected values to further simplify
  adding new compression formats in the future.
    compress-io.c
- replaced the never-defined struct lz4File pointer in glzFile
  union with the more specific struct lz4File_in_r and Lz4File_out_r
  pointers.
    compress-io.h, compress-io.c
- added free of lz4 structures to gzclose.  also changed file-close
  logic generally so that (1) port is marked closed before anything is
  freed to avoid dangling pointers in the case of an interrupt or
  error, and (2) structures are freed even in the case of a write
  or close error, before the error is reported.  also now mallocing
  glz and lz4 structures after possibility of errors have passed where
  possible and freeing them when not.
    compress-io.c,
    io.ss
- added return-value checks to malloc calls and to a couple of other
  C-library calls.
    compress-io.c
- corrected EINTR checks to look at errno rather than return codes.
    compress-io.c
- added S_ prefixes to the glz* exports
    externs.h, compress-io.c, new-io.c, scheme.c, fasl.c
- added entries for mutex-name and mutex-thread
    threads.stex
@burgerrg
Copy link
Contributor

It compiles cleanly in Microsoft Visual Studio, and so far the ta6nt mats are passing.

@dybvig dybvig merged commit 3ea6f8e into cisco:master Apr 18, 2019
@dybvig dybvig deleted the compress-level branch April 18, 2019 17:06
mflatt pushed a commit to racket/ChezScheme that referenced this pull request Mar 24, 2021
mflatt pushed a commit to mflatt/ChezScheme that referenced this pull request Oct 10, 2023
Add a compress-level parameter

Original commit: 3ea6f8e
mflatt pushed a commit to mflatt/ChezScheme that referenced this pull request Oct 10, 2023
Add a compress-level parameter

Original commit: 3ea6f8e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants