Support Mac OS Archive Utility ZIPs [feat]
overlookmotel committed May 17, 2023
1 parent b426560 commit 8c46664
Showing 31 changed files with 1,119 additions and 121 deletions.
20 changes: 20 additions & 0 deletions .github/workflows/test.yml
@@ -59,6 +59,26 @@ jobs:
      - run: npm ci
      - run: npm run test-main

  test-big-mac-zips:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        mac-big-size: [65534, 65535, 65536, 65537, 131072, 200000]

    env:
      MAC_BIG_SIZE: ${{ matrix.mac-big-size }}

    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js 18
        uses: actions/setup-node@v3
        with:
          node-version: 18
          cache: 'npm'
      - run: npm ci
      - run: npm run test-mac-big

  coverage:
    runs-on: ubuntu-latest
    steps:
50 changes: 48 additions & 2 deletions README.md
@@ -10,6 +10,7 @@ This library is a rewrite of `yauzl`, which retains all its features and careful

* Promise-based API
* Validation of CRC32 checksums to ensure data integrity (using fast Rust CRC32 calculation)
* Support for unzipping faulty ZIP files created by Mac OS Archive Utility (see [here](https://github.com/thejoshwolfe/yauzl/issues/69))
* Extract files from ZIP in parallel
* Additional options
* Bug fixes
@@ -101,7 +102,8 @@ The `reader` parameter must be an instance of a subclass of [`yauzl.Reader`](#cl
decodeStrings: true,
validateEntrySizes: true,
validateFilenames: true,
strictFilenames: false
strictFilenames: false,
supportMacArchive: true
}
```

@@ -140,6 +142,14 @@ The spec forbids filenames with backslashes, but Microsoft's `System.IO.Compress

When `strictFilenames`, `decodeStrings`, and `validateFilenames` options are all `true`, entries with backslashes in their filenames will result in an error.

#### `supportMacArchive`

When `true` (default), faulty ZIP files created by Mac OS Archive Utility can be unzipped successfully, despite being malformed.

Mac OS Archive Utility creates such faulty ZIPs when any of the following applies: (1) the ZIP's size is over 4 GiB, (2) any file in the ZIP is over 4 GiB compressed or uncompressed, or (3) the number of files in the ZIP exceeds 65535. See [yauzl#69](https://github.com/thejoshwolfe/yauzl/issues/69) for more details.

Handling these ZIPs has a slight overhead. Also, in some *extremely* rare cases, it could cause a valid ZIP to be misinterpreted. So if you're sure the ZIP was not created by Mac OS Archive Utility, you can disable this support for a very marginal performance improvement.
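
As a rough illustration of when turning the option off might make sense, the sketch below builds an options object depending on whether the caller knows where the ZIP came from. `optionsForSource()` and its `knownNotFromMacArchiveUtility` argument are hypothetical names used only for this example; only the `supportMacArchive` option itself comes from this package.

```js
// Hypothetical helper (not part of this package): choose options based on
// whether the caller knows the ZIP was not produced by Mac OS Archive Utility.
function optionsForSource(knownNotFromMacArchiveUtility) {
  return {
    // Keep the default (`true`) unless the origin of the ZIP is known
    supportMacArchive: !knownNotFromMacArchiveUtility
  };
}

// Usage sketch (assumes `yauzl` is this package and `path` points to a ZIP file):
//   const zip = await yauzl.open(path, optionsForSource(true));
//   ...
//   await zip.close();
```

If there is any doubt about the origin of the ZIP, leaving the default in place is the safer choice, since the saving from disabling it is described above as very marginal.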

### `zip.close()`

Closes file and returns Promise which resolves when underlying file/file descriptor/reader is closed.
@@ -262,6 +272,16 @@ Instances of `Zip` class are returned by `open()`, `fromFd()`, `fromBuffer()`, a

`Number`. Total number of entries in ZIP file.

#### `zip.entryCountIsCertain`

`Boolean`. `true` if `entryCount` can be relied on for accuracy.

Mac OS Archive Utility truncates `entryCount` to 16 bits (i.e. max 65535), so it can be inaccurate.

Where the ZIP file has been identified as possibly a Mac OS ZIP, and `entryCount` may be inaccurate, `entryCountIsCertain` will be `false`. In this case, the actual number of entries may be higher than reported (but not lower).

As entries are read with `readEntry()`, `entryCount` will be increased if it becomes evident that there are more entries than reported. Once `entryCount` is determined to definitely be accurate, `entryCountIsCertain` will change to `true`.
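
To make the effect of the truncation concrete, here is a minimal sketch. It assumes that truncating to 16 bits keeps only the low 16 bits of the real count, and both helper names (`truncateTo16Bits()`, `describeEntryCount()`) are hypothetical, not part of this package. Only `entryCount` and `entryCountIsCertain` are real properties.

```js
// Assumption: truncation to 16 bits keeps only the low 16 bits of the real count
function truncateTo16Bits(actualEntryCount) {
  return actualEntryCount % 0x10000;
}

// Hypothetical helper: describe the count of a `Zip`-like object,
// treating it as a lower bound while it is uncertain
function describeEntryCount(zip) {
  return zip.entryCountIsCertain
    ? `${zip.entryCount} entries`
    : `at least ${zip.entryCount} entries`;
}

console.log(truncateTo16Bits(65535)); // 65535
console.log(truncateTo16Bits(65537)); // 1
console.log(describeEntryCount({entryCount: 3392, entryCountIsCertain: false}));
// at least 3392 entries
```

Code that needs an exact total (e.g. for a progress bar) may prefer to wait until `entryCountIsCertain` is `true` rather than trust the initial value.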

#### `zip.comment`

`String`. Always decoded with `CP437` per the spec.
@@ -272,6 +292,20 @@ If `options.decodeStrings` is `false`, this field is the undecoded `Buffer` inst

`true` if ZIP file uses ZIP64 extension (allowing more than 65535 files, or file data larger than 4 GiB).

#### `zip.isMacArchive`

`Boolean`. `true` if ZIP is a faulty Mac OS Archive Utility ZIP. `false` if it's not known to be.

`zip.isMaybeMacArchive` indicates whether ZIP *may* be a Mac OS Archive Utility ZIP.

You don't need to worry about either of these properties - they're mainly for the internal logic of this package - but if you happen to be interested, the possible states are:

* `isMacArchive = true`: Definitely a faulty Mac OS Archive Utility ZIP.
* `isMaybeMacArchive = true`: ZIP possibly created by Mac OS Archive Utility (very probably it is).
* `isMaybeMacArchive = false`: ZIP definitely not created by a version of Mac OS Archive Utility which produces faulty ZIPs.

Both properties are updated by `readEntry()` and `openReadStream()`, as more about the ZIP file becomes known.
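
The three states listed above could be collapsed into a single label as in the sketch below. `macArchiveState()` is a hypothetical helper, not part of this package, and it assumes only that `isMacArchive` should take precedence when it is `true`.

```js
// Hypothetical helper: summarize the Mac OS Archive Utility state of a `Zip`-like object
function macArchiveState(zip) {
  if (zip.isMacArchive) return 'definitely';
  if (zip.isMaybeMacArchive) return 'possibly';
  return 'no';
}

console.log(macArchiveState({isMacArchive: false, isMaybeMacArchive: true})); // possibly
```

Because both properties can change as entries are read, any such label is only a snapshot of what is known at the time of the call.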

### Class: `Entry`

Instances of `Entry` class are returned by `zip.readEntry()`, `zip.readEntries()`, or using a `Zip` as an async iterator. The constructor for the class is not part of the public API.
@@ -292,7 +326,7 @@ These fields are of type `Number`:
* `internalFileAttributes`
* `externalFileAttributes`
* `fileHeaderOffset`
* `fileDataOffset` (unpopulated until [`openReadStream()`](#reading-file-data) is called)
* `fileDataOffset` (usually unpopulated until [`openReadStream()`](#reading-file-data) is called)

In addition:

@@ -307,6 +341,16 @@ If `decodeStrings` option is `false`, this field is the undecoded `Buffer` inste

NB: In original `yauzl`, this field was named `fileName` (capital `N`).

#### `entry.uncompressedSizeIsCertain`

`Boolean`. `true` if `uncompressedSize` is reliable.

Mac OS Archive Utility truncates `uncompressedSize` to 32 bits (i.e. the largest value it can record is 4 GiB minus 1 byte), so it is inaccurate for files >= 4 GiB in size.

Where the ZIP file has been identified as possibly a Mac OS ZIP, and `uncompressedSize` may be inaccurate, `uncompressedSizeIsCertain` will be `false`. In this case, the actual uncompressed size may be higher than reported (but not lower).

After `openReadStream()` has completed streaming out the file, `uncompressedSize` will be updated to reflect the accurate uncompressed data size, and `uncompressedSizeIsCertain` will change to `true`. NB: This doesn't happen if either decompression (`decompress` option) or entry size validation (`validateEntrySizes` option) is disabled. Both are enabled by default.
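
Since only the low 32 bits of the size are recorded, the real size can only differ from the reported one by a whole multiple of 4 GiB. The sketch below illustrates that arithmetic. It is not the package's implementation, and both helper names are hypothetical.

```js
const FOUR_GIB = 0x100000000; // 2 ** 32

// Hypothetical helper: could `actualSize` have been recorded as `reportedSize`
// by a writer which keeps only the low 32 bits of the size?
function matchesTruncatedSize(reportedSize, actualSize) {
  return actualSize >= reportedSize
    && (actualSize - reportedSize) % FOUR_GIB === 0;
}

// Hypothetical helper: smallest size consistent with `reportedSize`
// which is at least `bytesSeenSoFar`
function smallestConsistentSize(reportedSize, bytesSeenSoFar) {
  let size = reportedSize;
  while (size < bytesSeenSoFar) size += FOUR_GIB;
  return size;
}

console.log(matchesTruncatedSize(100, 100 + FOUR_GIB)); // true
console.log(matchesTruncatedSize(100, 101)); // false
console.log(smallestConsistentSize(100, 5e9)); // 8589934692
```

All values here stay well below `Number.MAX_SAFE_INTEGER`, so plain `Number` arithmetic is exact for realistic file sizes.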

#### `entry.extraFields`

`Array` with each entry in the form `{id, data}`, where `id` is a `Number` and `data` is a `Buffer`.
@@ -376,6 +420,8 @@ All active NodeJS release lines are supported (v16+ at time of writing). After a

Use `npm test` to run the tests. Use `npm run cover` to check coverage.

Use `npm run test-mac-big` to run additional tests on large Mac OS ZIP files. These tests are slow.

## Changelog

See [changelog.md](https://github.com/overlookmotel/yauzl-promise/blob/master/changelog.md)
101 changes: 83 additions & 18 deletions lib/entry.js
@@ -13,19 +13,22 @@ const {createInflateRaw} = require('zlib'),
assert = require('simple-invariant');

// Imports
const {dosDateTimeToDate} = require('./utils.js'),
{INTERNAL_SYMBOL} = require('./shared.js');
const {INTERNAL_SYMBOL, uncertainUncompressedSizeEntriesRegistry} = require('./shared.js'),
{dosDateTimeToDate} = require('./utils.js');

// Exports

const MAC_LFH_EXTRA_FIELDS_LENGTH = 16,
FOUR_GIB = 0x100000000; // Math.pow(2, 32)

class Entry {
/**
* Class representing ZIP file entry.
* Class is exported in public interface, for purpose of `instanceof` checks, but constructor cannot
* be called by user. This is enforced by use of private symbol `INTERNAL_SYMBOL`.
* @class
* @param {Object} testSymbol - Must be `INTERNAL_SYMBOL`
* @param {Object} props - Entry properties (see `Zip` class's `_readEntry()` method)
* @param {Object} props - Entry properties (see `Zip` class's `_readEntryAt()` method)
*/
constructor(testSymbol, props) {
assert(testSymbol === INTERNAL_SYMBOL, 'Entry class cannot be instantiated directly');
@@ -143,16 +146,35 @@ class Entry {
// Bytes 10-11: File last modification time
// Bytes 12-13: File last modification date
// Bytes 14-17: CRC32
const localCrc32 = buffer.readUInt32LE(14);
// Bytes 18-21: Compressed size
const localCompressedSize = buffer.readUInt32LE(18);
// Bytes 22-25: Uncompressed size
const localUncompressedSize = buffer.readUInt32LE(22);
// Bytes 26-27: Filename length
const filenameLength = buffer.readUInt16LE(26);
// Bytes 28-29: Extra Fields length
const extraFieldsLength = buffer.readUInt16LE(28);
// Bytes 30-... - Filename + Extra Fields

const fileDataOffset = this.fileHeaderOffset + 30 + filenameLength + extraFieldsLength;
this.fileDataOffset = fileDataOffset;

if (this.zip.isMacArchive || this.zip.isMaybeMacArchive) {
// Check properties match Mac ZIP signature
const matchesMacSignature = localCrc32 === 0
&& localCompressedSize === 0
&& localUncompressedSize === 0
&& filenameLength === this.filenameLength
&& extraFieldsLength === this.extraFields.length * MAC_LFH_EXTRA_FIELDS_LENGTH;
if (this.zip.isMacArchive) {
assert(matchesMacSignature, 'Misidentified Mac OS Archive Utility ZIP');
} else if (!matchesMacSignature) {
// Doesn't fit signature of Mac OS Archive Utility ZIP, so can't be one
this.zip._setAsNotMacArchive();
}
}

if (this.compressedSize !== 0) {
assert(
fileDataOffset + this.compressedSize <= this.zip.footerOffset,
@@ -164,12 +186,12 @@ class Entry {
// Get stream
let stream = this.zip.reader.createReadStream(fileDataOffset + start, end - start);

// Pipe stream through decompress, CRC32 validation, and/or byte count transform streams
// Pipe stream through decompress, CRC32 validation, and/or uncompressed size check transform streams
const streams = [stream];
if (decompress) {
streams.push(createInflateRaw());
// eslint-disable-next-line no-use-before-define
if (this.zip.validateEntrySizes) streams.push(new ValidateByteCountStream(this.uncompressedSize));
if (this.zip.validateEntrySizes) streams.push(new ValidateUncompressedSizeStream(this));
}

// eslint-disable-next-line no-use-before-define
@@ -187,27 +209,59 @@ class Entry {

module.exports = Entry;

/**
* Transform stream to compare bytes streamed to expected.
* @class
*/
class ValidateByteCountStream extends TransformStream {
constructor(byteCount) {
class ValidateUncompressedSizeStream extends TransformStream {
/**
* Transform stream to compare size of uncompressed stream to expected.
* If `entry.uncompressedSizeIsCertain === false`, only checks actual byte count is accurate
* in lower 32 bits. `entry.uncompressedSize` can be inaccurate in faulty Mac OS ZIPs where
* uncompressed size reported by ZIP is truncated to lower 32 bits.
* If it proves inaccurate, `entry.uncompressedSize` is updated,
* and ZIP is flagged as being Mac OS ZIP if it isn't already.
* @class
* @param {Object} entry - Entry object
*/
constructor(entry) {
super();
this.byteCount = 0;
this.expectedByteCount = byteCount;
this.expectedByteCount = entry.uncompressedSize;
this.entry = entry;
}

_transform(chunk, encoding, cb) {
this.byteCount += chunk.length;
if (this.byteCount > this.expectedByteCount) {
cb(new Error(
`Too many bytes in the stream. Expected ${this.expectedByteCount}, `
+ `got at least ${this.byteCount}.`
));
} else {
cb(null, chunk);
if (this.entry.uncompressedSizeIsCertain) {
cb(new Error(
`Too many bytes in the stream. Expected ${this.expectedByteCount}, `
+ `got at least ${this.byteCount}.`
));
return;
}

// Entry must be at least 4 GiB larger. ZIP must be faulty Mac OS ZIP.
if (this.entry.uncompressedSize === this.expectedByteCount) {
this.expectedByteCount += FOUR_GIB;
this.entry.uncompressedSize = this.expectedByteCount;
const {zip} = this.entry;
if (!zip.isMacArchive) {
if (!zip.isMaybeMacArchive) {
// It shouldn't be possible for `isMaybeMacArchive` to be `false`.
// But check here as failsafe, as the logic around handling maybe-Mac ZIPs is complex.
// If there's a mistake in logic which does cause us to get here, `_setAsMacArchive()`
// below could throw an error which would crash the whole process. Contain the damage.
cb(new Error('Logic failure. Please raise an issue.'));
return;
}
zip._setAsMacArchive(zip.numEntriesRead, zip._entryCursor);
}
} else {
// Same entry must be being streamed simultaneously on another "thread",
// and other stream overtook this one, and already increased size
this.expectedByteCount = this.entry.uncompressedSize;
}
}

cb(null, chunk);
}

_flush(cb) {
@@ -216,6 +270,17 @@ class ValidateByteCountStream extends TransformStream {
`Not enough bytes in the stream. Expected ${this.expectedByteCount}, got only ${this.byteCount}.`
));
} else {
if (!this.entry.uncompressedSizeIsCertain) {
// Uncompressed size was uncertain, but is now known.
// Record size as certain, and remove from list of uncertain-sized entries.
this.entry.uncompressedSizeIsCertain = true;
const ref = this.entry._ref;
if (ref) {
this.entry._ref = null;
this.entry.zip._uncertainUncompressedSizeEntryRefs.delete(ref);
uncertainUncompressedSizeEntriesRegistry.unregister(ref);
}
}
cb();
}
}
7 changes: 6 additions & 1 deletion lib/index.js
@@ -42,6 +42,7 @@ module.exports = {
* @param {boolean} [options.validateEntrySizes=true] - Validate entry sizes
* @param {boolean} [options.validateFilenames=true] - Validate filenames
* @param {boolean} [options.strictFilenames=false] - Don't allow backslashes (`\`) in filenames
* @param {boolean} [options.supportMacArchive=true] - Support Mac OS Archive Utility faulty ZIP files
* @returns {Zip} - `Zip` class instance
*/
async function open(path, options) {
@@ -65,6 +66,7 @@ async function open(path, options) {
* @param {boolean} [options.validateEntrySizes=true] - Validate entry sizes
* @param {boolean} [options.validateFilenames=true] - Validate filenames
* @param {boolean} [options.strictFilenames=false] - Don't allow backslashes (`\`) in filenames
* @param {boolean} [options.supportMacArchive=true] - Support Mac OS Archive Utility faulty ZIP files
* @returns {Zip} - `Zip` class instance
*/
async function fromFd(fd, options) {
@@ -88,6 +90,7 @@ async function fromFd(fd, options) {
* @param {boolean} [options.validateEntrySizes=true] - Validate entry sizes
* @param {boolean} [options.validateFilenames=true] - Validate filenames
* @param {boolean} [options.strictFilenames=false] - Don't allow backslashes (`\`) in filenames
* @param {boolean} [options.supportMacArchive=true] - Support Mac OS Archive Utility faulty ZIP files
* @returns {Zip} - `Zip` class instance
*/
async function fromBuffer(buffer, options) {
@@ -110,6 +113,7 @@ async function fromBuffer(buffer, options) {
* @param {boolean} [options.validateEntrySizes=true] - Validate entry sizes
* @param {boolean} [options.validateFilenames=true] - Validate filenames
* @param {boolean} [options.strictFilenames=false] - Don't allow backslashes (`\`) in filenames
* @param {boolean} [options.supportMacArchive=true] - Support Mac OS Archive Utility faulty ZIP files
* @returns {Zip} - `Zip` class instance
*/
async function fromReader(reader, size, options) {
@@ -133,7 +137,8 @@ function validateOptions(inputOptions) {
decodeStrings: true,
validateEntrySizes: true,
validateFilenames: true,
strictFilenames: false
strictFilenames: false,
supportMacArchive: true
};

if (inputOptions != null) {
14 changes: 11 additions & 3 deletions lib/shared.js
@@ -3,10 +3,18 @@
* Shared objects
* ------------------*/

/* global FinalizationRegistry */

'use strict';

// Exports

module.exports = {
INTERNAL_SYMBOL: {}
};
// Object used as private symbol to ensure `Zip` and `Entry` classes cannot be constructed by user
const INTERNAL_SYMBOL = {};

// Finalization registry for entries with uncertain uncompressed size
const uncertainUncompressedSizeEntriesRegistry = new FinalizationRegistry(
({zip, ref}) => zip._uncertainUncompressedSizeEntryRefs?.delete(ref)
);

module.exports = {INTERNAL_SYMBOL, uncertainUncompressedSizeEntriesRegistry};
7 changes: 3 additions & 4 deletions lib/utils.js
@@ -17,15 +17,14 @@ module.exports = {decodeBuffer, validateFilename, dosDateTimeToDate, readUInt64L
* Decode string from buffer, in either CP437 or UTF8 encoding.
* @param {Buffer} buffer - Buffer
* @param {number} start - Start position in buffer
* @param {number} end - End position in buffer
* @param {boolean} isUtf8 - `true` if UTF8 encoded
* @returns {string} - Decoded string
*/
function decodeBuffer(buffer, start, end, isUtf8) {
if (isUtf8) return buffer.toString('utf8', start, end);
function decodeBuffer(buffer, start, isUtf8) {
if (isUtf8) return buffer.toString('utf8', start);

let str = '';
for (let i = start; i < end; i++) {
for (let i = start; i < buffer.length; i++) {
str += CP437_CHARS[buffer[i]]; // eslint-disable-line no-use-before-define
}
return str;