<html><head><title>LZ4 Framing format - v1.5.0</title><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css">@import url('https://themes.googleusercontent.com/fonts/css?kit=wAPX1HepqA24RkYW1AuHYA');ol{margin:0;padding:0}.c2{font-size:10pt;font-family:"Courier New";font-weight:bold}.c10{max-width:453.6pt;background-color:#ffffff;padding:70.8pt 70.8pt 70.8pt 70.8pt}.c4{line-height:1.0;height:11pt;padding-bottom:0pt}.c0{direction:ltr;margin-left:18pt}.c13{line-height:1.0;padding-bottom:0pt}.c8{color:inherit;text-decoration:inherit}.c3{color:#1155cc;text-decoration:underline}.c6{text-decoration:underline;font-weight:bold}.c14{font-weight:bold}.c18{font-family:"Courier New"}.c16{font-size:18pt}.c12{margin-left:36pt}.c11{font-size:14pt}.c5{height:11pt}.c9{text-align:center}.c7{text-decoration:underline}.c17{color:#0000ff}.c1{direction:ltr}.c15{font-style:italic}.title{padding-top:12pt;line-height:1.15;text-align:center;color:#000000;font-size:16pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}.subtitle{padding-top:0pt;line-height:1.15;text-align:center;color:#000000;font-size:11pt;font-family:"Arial";padding-bottom:3pt}li{color:#000000;font-size:11pt;font-family:"Calibri"}p{color:#000000;font-size:11pt;margin:0;font-family:"Calibri"}h1{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:16pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}h2{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-style:italic;font-size:14pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}h3{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:13pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}h4{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:14pt;font-family:"Calibri";font-weight:bold;padding-bottom:3pt}h5{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-style:italic;font-size:13pt;font-family:"Calibri";font-weight:bo
ld;padding-bottom:3pt}h6{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:11pt;font-family:"Calibri";font-weight:bold;padding-bottom:3pt}</style></head><body class="c10"><hr><p class="c1 c9"><span class="c14 c16">LZ4 </span><span class="c14 c16">Framing </span><span class="c14 c16">Format</span></p><hr><p class="c5 c1"><span class="c14 c16"></span></p><p class="c1"><span class="c6 c11">Notices</span></p><p class="c1"><span>Copyright (c) 2013-2015 Yann Collet</span></p><p class="c1"><span>Permission is granted to copy and distribute this document for any purpose and without charge, including translations into other languages and incorporation into compilations, provided that the copyright notice and this notice are preserved, and that any substantive changes or deletions from the original are clearly marked.</span></p><p class="c1"><span class="c6 c11">Version</span></p><p class="c1"><span>1.5.0</span></p><h1 class="c1"><a name="h.2z5bl598dfq9"></a><span>Introduction</span></h1><p class="c1"><span>The purpose of this document is to define a lossless compressed data format, that is independent of CPU type, operating system, file system and character set, suitable for File compression, Pipe and streaming compression using the LZ4 algorithm : </span><span class="c7 c17"><a class="c8" href="">http://code.google.com/p/lz4/</a></span></p><p class="c1"><span>The data can be produced or consumed, even for an arbitrarily long sequentially presented input data stream, using only an a priori bounded amount of intermediate storage, and hence can be used in data communications. 
The format uses the LZ4 compression method, and optional </span><span class="c3"><a class="c8" href="http://code.google.com/p/xxhash/">xxHash-32</a></span><span> checksum method, for detection of data corruption.</span></p><p class="c1"><span>The data format defined by this specification does not attempt to allow random access to compressed data.</span></p><p class="c1"><span>This specification is intended for use by implementers of software to compress data into LZ4 format and/or decompress data from LZ4 format. The text of the specification assumes a basic background in programming at the level of bits and other primitive data representations.</span></p><p class="c1"><span>Unless otherwise indicated below, </span><span>a compliant compressor must produce data sets that conform to the specifications presented here. It doesn’t need to support all options though.</span></p><p class="c1"><span>A</span><span> compliant decompressor must be able to decompress </span><span>at least one working </span><span>set of parameters that conforms to the specifications presented here</span><span>. It may also ignore checksums. 
Whenever it does not support a specific parameter used within the compressed stream, it must produce a non-ambiguous error code and associated error message explaining which parameter is unsupported.</span></p><p class="c1"><span>Distribution of this document is unlimited.</span></p><p class="c4 c1"><span></span></p><hr style="page-break-before:always;display:none;"><p class="c1 c4"><span></span></p><p class="c1 c13"><span class="c6 c11">Summary </span><span class="c6">:</span></p><p class="c4 c1"><span></span></p><p class="c0"><span class="c3"><a class="c8" href="#h.2z5bl598dfq9">Introduction</a></span></p><p class="c0"><span class="c3">General structure of </span><span class="c3"><a class="c8" href="#h.1615sutikt7e">LZ4 Framing Format</a></span></p><p class="c12 c1"><span class="c3">Frame </span><span class="c3"><a class="c8" href="#h.uof0plru1f66">Descriptor</a></span></p><p class="c1 c12"><span class="c3"><a class="c8" href="#h.u8dkhfnwqyg">Data Blocks</a></span></p><p class="c0"><span class="c3"><a class="c8" href="#h.152pfqac8luc">Skippable </a></span><span class="c3">Frames</span></p><p class="c0"><span class="c3"><a class="c8" href="#h.ujcdmapf87vn">Legacy format</a></span></p><p class="c0"><span class="c3"><a class="c8" href="#h.zij6fhosmkvv">Appendix</a></span></p><p class="c5 c1"><span></span></p><p class="c5 c1"><span class="c6 c11"></span></p><hr style="page-break-before:always;display:none;"><p class="c5 c1"><span class="c6 c11"></span></p><h1 class="c1"><a name="h.1615sutikt7e"></a><span class="c7">General Structure of </span><span class="c7">LZ4 Framing format</span></h1><p class="c5 c9 c1"><span class="c6 c11"></span></p><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 408.00px; height: 106.00px;"><img alt="" src="images/image05.png" style="width: 408.00px; height: 
106.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c5 c1"><span class="c6 c11"></span></p><p class="c1"><span class="c6">Magic Number</span></p><p class="c1"><span>4 Bytes, </span><span class="c7">Little endian</span><span> format.<br>Value : </span><span class="c14 c18">0x184D2204</span></p><p class="c5 c1"><span class="c2"></span></p><p class="c1"><span class="c6">Frame D</span><span class="c6">escriptor</span></p><p class="c1"><span>3</span><span> to 1</span><span>1</span><span> Bytes, to be detailed </span><span>in the next part.</span><span><br>Most important part of the spec.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Data Blocks</span></p><p class="c1"><span>To be detailed later on.<br>That’s where compressed data is stored.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">EndMark</span></p><p class="c1"><span>The flow of </span><span>blocks </span><span>ends when the last data block has a size of “</span><span class="c14">0</span><span>”. </span><span><br></span><span>The size is expressed as </span><span>a </span><span>32-bit value.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Content Checksum</span></p><p class="c1"><span>The Content Checksum verifies that the full content has been decoded correctly.<br>The content checksum is the result of the </span><span class="c3"><a class="c8" href="http://code.google.com/p/xxhash/">xxh32()</a></span><span> hash function digesting the original (decoded) data as input, and a seed of zero.<br>Content checksum is only present when its </span><span class="c3"><a class="c8" href="#id.s5zerkv6retr">associated flag </a></span><span>is set in the frame descriptor. 
The Content Checksum validates that all blocks were fully transmitted in the correct order and without error, and also that the encoding/decoding process itself introduced no distortion. Its usage is recommended. </span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Frame Concatenation</span></p><p class="c1"><span>In some circumstances, it may be preferable </span><span>to append multiple frames, </span><span>for example </span><span>in order to add new data to an existing compressed file without re-framing it.</span></p><p class="c1"><span>In such a case, each frame has its own set of descriptor flags. Each frame is considered independent. The only relation between frames is their sequential order.</span></p><p class="c1"><span>The ability to decode multiple concatenated frames within a single stream or file is left outside of this specification. For example, the reference lz4 command line utility decodes all concatenated frames in their sequential order. 
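</span></p><p class="c1"><span>As an illustration of the Magic Number field described above, the following Python sketch checks whether a byte buffer begins with an LZ4 frame. A decoder that accepts concatenated frames could plausibly run such a check at each frame boundary. The function name starts_with_lz4_frame is hypothetical and not part of any LZ4 library; only the value 0x184D2204 and its little-endian layout are taken from this specification.</span></p><pre class="c18">
```python
# Magic number of an LZ4 frame, as given in this specification.
LZ4_FRAME_MAGIC = 0x184D2204


def starts_with_lz4_frame(data):
    """Return True if `data` begins with the LZ4 frame magic number.

    Hypothetical helper for illustration; it is not part of any LZ4 library.
    """
    # The magic number is stored on 4 bytes in little-endian order,
    # so the bytes in the stream are 04 22 4D 18.
    return bytes(data[:4]) == LZ4_FRAME_MAGIC.to_bytes(4, "little")
```
</pre><p class="c1"><span>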
</span></p><p class="c5 c1"><span></span></p><hr style="page-break-before:always;display:none;"><p class="c5 c1"><span></span></p><h2 class="c1"><a name="h.uof0plru1f66"></a><span class="c7">Frame </span><span class="c7">Descriptor</span></h2><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 226.00px; height: 106.00px;"><img alt="" src="images/image00.png" style="width: 226.00px; height: 106.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 606.00px; height: 125.33px;"><img alt="" src="images/image01.png" style="width: 606.00px; height: 125.33px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c1"><span>The descriptor uses a minimum of </span><span>3</span><span> bytes</span><span>, and up to 11 bytes depending on optional parameters.</span><span><br>In the picture, bit 7 is highest bit, while bit 0 is lowest.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Version Number :</span></p><p class="c1"><span>2-bits field, </span><span class="c6">must</span><span class="c14"> </span><span>be set to “</span><span class="c14">01</span><span>”.<br>Any other value cannot be decoded by this </span><span>version of the specification.</span><span><br>Other version numbers will use different flag layouts.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Block </span><span class="c6">Independence 
</span><span class="c6">flag :</span></p><p class="c1"><span>If this flag is set to “1”</span><span>, blocks are independent, and can therefore be decoded independently, in parallel.<br>If this flag is set to “</span><span>0</span><span>”, each block depends on previous ones for decoding (up to LZ4 window size, which is 64 KB). In this case, it’s necessary to decode all blocks in sequence.</span></p><p class="c1"><span>Block </span><span>dependency</span><span> improves compression ratio, especially for small blocks. On the other hand, it makes jumps or multi-threaded decoding impossible.</span></p><p class="c5 c1"><span></span></p><a href="#" name="id.r4mqxzdxswxz"></a><p class="c1"><span class="c6">Block checksum flag :</span></p><p class="c1"><span>If this flag is set, e</span><span>ach data block will be followed by a 4-byte checksum, calculated by using the xxHash-32 algorithm on the raw (compressed) data block.<br>The intention is to detect data corruption (storage or transmission errors) immediately, before decoding.<br>Block ch</span><span>ecksum usage is optional.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Content </span><span class="c6">Size flag :</span></p><p class="c1"><span>If this flag is set, the original (uncompressed) size of data included within </span><span>the frame</span><span> will be present as an 8-byte unsigned value, litt</span><span>le endian format, </span><span>after the flags.</span></p><p class="c1"><span>Recommended </span><span>value : “</span><span class="c14">0</span><span>” (not present)</span></p><p class="c1 c5"><span></span></p><p class="c1"><span class="c6">Content checksum flag :</span></p><p class="c1"><span>If this flag is set, a</span><span class="c3"><a class="c8" href="#id.q3958klk497z"> content checksum</a></span><span> will be appended after the EndMark.</span></p><p class="c1"><span>Recommended value : “</span><span class="c14">1</span><span>” (content checksum is 
present)</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Block Maximum Size :</span></p><p class="c1"><span>This information is intended to help the decoder allocate the right amount of memory.<br>Size here refers to the original (uncompressed) data size.<br>Block Maximum Size </span><span>is</span><span> one value among the fol</span><span>lowing table : </span></p><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 331.00px; height: 56.00px;"><img alt="" src="images/image03.png" style="width: 331.00px; height: 56.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c1"><span>The decoder may refuse to allocate block sizes above a (system-specific) size.<br>Unused values may be used in a future revision of the spec.<br>A decoder conformant to the current version of the spec is only able to decode blocksizes defined in this spec.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Reserved bits :</span></p><p class="c1"><span>Value of reserved bits </span><span class="c6">must </span><span>be </span><span class="c14">0</span><span> (zero).<br>Reserved bit might be used in a future version of the specification, to enable any (yet-to-decide) optional feature.<br>If this happens, a decoder respecting the current version of the specification shall not be able to decode such a frame.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Content Size</span></p><p class="c1"><span>This is the original (uncompressed) size. 
<br>This information is optional, and only present if the </span><span class="c3"><a class="c8" href="#id.tqyy099hxhnn">associated flag is set</a></span><span>.<br>Content size is provided as an unsigned 8-byte value, for a maximum of 16 Exabytes.<br>Format is Little endian.<br>This field has no impact on decoding; it just informs the decoder how much data the frame holds (for example, to display it during the decoding process, or for verification purposes). It can be safely skipped by a conformant decoder.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Header Checksum :</span></p><p class="c1"><span>One-byte checksum of all descriptor fields, including optional ones when present.<br>The byte is the second byte of </span><span class="c3"><a class="c8" href="http://code.google.com/p/xxhash/">xxh32()</a></span><span> : { (xxh32()>>8) & 0xFF } ,<br>using zero as a seed, <br>and the full Frame Descriptor as an input (</span><span class="c7 c15">including</span><span> optional fields when they are present).<br>A different checksum indicates an error in the descriptor.</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c1"><span class="c6 c11"></span></p><h2 class="c1"><a name="h.u8dkhfnwqyg"></a><span class="c7">Data Blocks</span></h2><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 317.00px; height: 90.00px;"><img alt="" src="images/image02.png" style="width: 317.00px; height: 90.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c5 c1"><span class="c6 c11"></span></p><p class="c1"><span class="c6">Block</span><span class="c6"> Size</span></p><p class="c1"><span>Th</span><span>is</span><span> field uses </span><span 
class="c14">4</span><span class="c14">-bytes, </span><span>f</span><span>ormat is </span><span class="c7">little-endian</span><span>.</span></p><p class="c1"><span>The highest bit is “</span><span class="c14">1</span><span>” if data in the block is uncompressed.</span></p><p class="c1"><span>The highest bit is “</span><span class="c14">0</span><span>” if data in the block is compressed by LZ4.</span></p><p class="c1"><span>All other bits give the size, in bytes, of the following data block (the size does not include the checksum if present).</span></p><p class="c1"><span>Block Size shall never be larger than Block Maximum Size. Such a thing could happen when the original data is incompressible. In this case, such a data block shall be passed in uncompressed format.</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Data</span></p><p class="c1"><span>Where the actual data to decode stands. It might be compressed or not, depending on previous field indications.<br>Uncompressed size of Data can be any size, up to “block maximum size”. <br>Note that data block is not necessarily </span><span>full </span><span>: an arbitrary “flush” may happen anytime. 
Any block can be </span><span>“partially filled”.</span></p><p class="c5 c1"><span></span></p><a href="#" name="id.3p4pcqe6ab8n"></a><p class="c1"><span class="c6">Block checksum :</span></p><p class="c1"><span>Only present if the </span><span class="c3"><a class="c8" href="#id.r4mqxzdxswxz">associated flag is set</a></span><span>.<br>This is a 4-bytes checksum value, in little endian format, <br>calculated by using the xxHash-32 algorithm </span><span class="c7">on the raw (undecoded) data block</span><span>, <br>and a seed of zero.</span><span><br>The intention is to detect data corruption (storage or transmission errors) </span><span class="c15">before </span><span>decoding.</span></p><p class="c1"><span>Block checksum is cumulative with Content checksum.</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c1"><span class="c6 c11"></span></p><h1 class="c1"><a name="h.152pfqac8luc"></a><span class="c7">Skippable Frames</span></h1><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 285.00px; height: 106.00px;"><img alt="LZ4 Framing Format - Skippable Frame.png" src="images/image04.png" style="width: 285.00px; height: 106.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c1"><span>Skippable frames allow the integration of user-defined data into a flow of concatenated frames.<br>Its design is pretty straightforward, with the sole objective to allow the decoder to quickly skip over user-defined data and continue decoding.</span></p><p class="c1"><span>For the purpose of facilitating identification, it is discouraged to start a flow of concatenated frames with a skippable frame. 
If there is a need to start such a flow with some user data encapsulated into a skippable frame, it’s recommended to start with a zero-byte LZ4 frame followed by a skippable frame. This makes identification easier for file type identifiers.</span></p><p class="c1"><span> </span></p><p class="c1"><span class="c6">Magic Number</span></p><p class="c1"><span>4 Bytes, </span><span class="c7">Little endian</span><span> format.<br>Value : </span><span class="c2">0x184D2A5X</span><span>, which means any value from</span><span class="c2"> 0x184D2A50 to 0x184D2A5F.</span><span> All 16 values are valid to identify a skippable frame.<br></span></p><p class="c1"><span class="c6">Frame Size</span><span class="c6"> </span></p><p class="c1"><span>This is the size, in bytes, of the following User Data (excluding the magic number and the size field itself).<br>4 Bytes, </span><span class="c7">Little endian</span><span> format, unsigned 32-bit.<br>This means User Data can’t be bigger than (2^32-1) Bytes.<br></span></p><p class="c1"><span class="c6">User Data</span></p><p class="c1"><span>User Data can be anything. Data will just be skipped by the decoder. 
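</span></p><p class="c1"><span>The skippable frame layout above (Magic Number, Frame Size, User Data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name skip_skippable_frame and its return convention (next offset, or None) are hypothetical, while the magic number range and the 4-byte little-endian size field come from this specification.</span></p><pre class="c18">
```python
def skip_skippable_frame(data, offset=0):
    """Return the offset just past the skippable frame starting at `offset`.

    Returns None if no complete skippable frame starts there.
    Hypothetical helper for illustration; it is not part of any LZ4 library.
    """
    header = bytes(data[offset:offset + 8])
    if len(header) != 8:
        return None  # not enough bytes for Magic Number and Frame Size
    magic = int.from_bytes(header[:4], "little")
    # 0x184D2A50 to 0x184D2A5F all share the same upper 28 bits.
    if magic // 16 != 0x184D2A5:
        return None
    size = int.from_bytes(header[4:], "little")  # unsigned 32-bit, little endian
    end = offset + 8 + size
    if end > len(data):
        return None  # User Data is truncated
    return end
```
</pre><p class="c1"><span>For instance, given a buffer holding the magic number 0x184D2A50, a Frame Size of 3 and three bytes of User Data, the function returns 11, the position at which the next frame would begin.</span></p><p class="c1"><span>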
</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c1"><span class="c6 c11"></span></p><h1 class="c1"><a name="h.ujcdmapf87vn"></a><span class="c7">Legacy frame</span></h1><p class="c9 c1"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 570.00px; height: 90.00px;"><img alt="" src="images/image06.png" style="width: 570.00px; height: 90.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c1"><span>The Legacy frame format was defined in the initial versions of “LZ4Demo”.<br>Newer compressors should not use this format anymore, since it is too restrictive.<br>It is recommended that decompressors be able to decode this format during the transition period.</span></p><p class="c1"><span>Main properties of legacy format :<br>- Fixed block size : </span><span>8 MB</span><span>.<br>- All blocks must be completely filled, except the last one.<br>- All blocks are always compressed, even when compression is detri</span><span>mental.</span><span><br>- The last block is detected either because it is followed by the “EOF” (End of File) mark</span><span>, or because it is followed by a known Frame Magic Number.</span><span><br>- No checksum<br>- Convention is Little endian</span></p><p class="c5 c1"><span></span></p><p class="c1"><span class="c6">Magic Number</span></p><p class="c1"><span>4 Bytes, </span><span class="c7">Little endian</span><span> format.<br>Value : </span><span class="c2">0x184C2102<br></span></p><p class="c1"><span class="c6">Block Compressed Size</span></p><p class="c1"><span>This is the size, in bytes, of the following compressed data block.<br>4 Bytes, </span><span class="c7">Little endian</span><span> format.<br></span></p><p class="c1"><span 
class="c6">Data</span></p><p class="c1"><span>Where the actual data stands. <br>Data is </span><span class="c7">always</span><span> compressed, even when compression is detrimental (i.e. larger than original size).</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c1"><span class="c6 c11"></span></p><h1 class="c1"><a name="h.zij6fhosmkvv"></a><span class="c6">Appendix </span><span> </span></h1><p class="c1"><span class="c11">Version changes</span></p><p class="c1"><span>1.4.1 : changed wording from “stream” to “frame”</span></p><p class="c1"><span>1.4 : added skippable streams, re-added stream checksum </span></p><p class="c1"><span>1.3 : modified header checksum</span></p><p class="c1"><span>1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.</span></p><p class="c1"><span>1.1 : optional fields are now part of the descriptor</span></p><p class="c1"><span>1.0 : changed “block size” specification, adding a compressed/uncompressed flag</span></p><p class="c1"><span>0.9 : reduced scale of “block maximum size” table</span></p><p class="c1"><span>0.8 : removed : high compression flag</span></p><p class="c1"><span>0.7 : removed : stream checksum</span></p><p class="c1"><span>0.6 : settled : stream size uses 8 bytes, endian convention is little endian</span></p><p class="c1"><span>0.5: added copyright notice</span></p><p class="c1"><span>0.4 : changed format to Google Doc compatible OpenDocument</span></p></body></html>