-
Notifications
You must be signed in to change notification settings - Fork 987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a compress-level parameter #418
Conversation
These changes look good to me. (Just to confirm, Racket-on-Chez builds fine with the patch, its boot file size goes down by about 17%, and load time remains about the same and maybe slightly faster.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes and for finding the missing documentation from the named condition and mutex change! I found a bug in the tag extraction of compressed bytevectors, a couple missing casts for the Windows C compiler, and a couple typos. I'm running the mats on Windows now. So far ta6nt passed o=0 and o=3.
All the Windows mats passed! |
Thanks for the corrections! |
Thanks for all the changes on this branch! Valgrind's memcheck tool is happy now. Note typo on line 85 of compress-io.c: "lcoal" -> "local". |
… other related improvements - added compress-level parameter to select a compression level for file writing and changed the default for lz4 compression to do a better job compressing. finished splitting glz input routines apart from glz output routines and did a bit of other restructuring. removed gzxfile struct-as-bytevector wrapper and moved its fd into glzFile. moved DEACTIVATE to before glzdopen_input calls in S_new_open_input_fd and S_compress_input_fd, since glzdopen_input reads from the file and could block. the compress format and now level are now recorded directly the thread context. replaced as-gz? flag bit in compressed bytevector header word with a small number of bits recording the compression format at the bottom of the header word. flushed a couple of bytevector compression mats that depended on the old representation. (these last few changes should make adding new compression formats easier.) added s-directory build options to choose whether to compress and, if so, the format and level. compress-io.h, compress-io.c, new-io.c, equates.h, system.h, scheme.c, gc.c, io.ss, cmacros.ss, back.ss, bytevector.ss, primdata.ss, s/Mf-base, io.ms, mat.ss, bytevector.ms, root-experr*, release_notes.stex, io.stex, system.stex, objects.stex - improved the effectiveness of LZ4 boot-file compression to within 15% of gzip by increasing the lz4 output-port in_buffer size to 1<<18. With the previous size (1<<14) LZ4-compressed boot files were about 50% larger. set the lz4 input-port in_buffer and out_buffer sizes to 1<<12 and 1<<14. there's no clear win at present for larger input-port buffer sizes. compress-io.c - To reduce the memory hit for the increased output-port in_buffer size and the corresponding increase in computed out_buffer size, one output-side out_buffer is now allocated (lazily) per thread and stored in the thread context. The other buffers are now directly a part of the lz4File_out and lz4File_in structures rather than allocated separately. compress-io.c, scheme.c, gc.c, cmacros.ss - split out the buffer emit code from glzwrite_lz4 into a separate glzemit_lz4 helper that is now also used by gzclose so we can avoid dealing with a NULL buffer in glzwrite_lz4. glzwrite_lz4 also uses it to writing large buffers directly and avoid the memcpy. compress-io.c - replaced lz4File_out and lz4File_in mode enumeration with the compress format and inputp boolean. using switch to check and raising exceptions for unexpected values to further simplify adding new compression formats in the future. compress-io.c - replaced the never-defined struct lz4File pointer in glzFile union with the more specific struct lz4File_in_r and Lz4File_out_r pointers. compress-io.h, compress-io.c - added free of lz4 structures to gzclose. also changed file-close logic generally so that (1) port is marked closed before anything is freed to avoid dangling pointers in the case of an interrupt or error, and (2) structures are freed even in the case of a write or close error, before the error is reported. also now mallocing glz and lz4 structures after possibility of errors have passed where possible and freeing them when not. compress-io.c, io.ss - added return-value checks to malloc calls and to a couple of other C-library calls. compress-io.c - corrected EINTR checks to look at errno rather than return codes. compress-io.c - added S_ prefixes to the glz* exports externs.h, compress-io.c, new-io.c, scheme.c, fasl.c - added entries for mutex-name and mutex-thread threads.stex
It compiles cleanly in Microsoft Visual Studio, and so far the ta6nt mats are passing. |
Add a compress-level parameter
Add a compress-level parameter Original commit: 3ea6f8e
Add a compress-level parameter Original commit: 3ea6f8e
This patch adds a
compress-level
parameter that can be used to control the amount of effort expended compressing, whether with the default LZ4 compression or gzip compression. The parameter can take on one of the symbolic valueslow
,medium
,high
, ormaximum
. The default is medium, which for LZ4 currently maps toLZ4HC_CLEVEL_MIN
and is both more effective and slower (at compression, but not decompression) than the previous default. The patch also makes several changes to the LZ4-support code added in #410; in particular, it restructures the code a bit, puts the compress format directly in the thread context (which now also holds the compress level), and changes how the compressed bytevector compression format and length are stored in the header word. I have tested the changes on OSX and Linux but not Windows.