EvanHahn/yauzl-promise

 
 


Unzip library for NodeJS

yauzl was the best unzipping library for NodeJS. Sadly, it's been unmaintained for several years now, has some buggy behavior in modern NodeJS versions, and a rather dated callback API.

This library is a rewrite of yauzl, which retains all its features and careful adherence to the ZIP spec, but with the following improvements:

  • Promise-based API
  • Validation of CRC32 checksums to ensure data integrity (using fast Rust CRC32 calculation)
  • Support for unzipping faulty ZIP files created by Mac OS Archive Utility (see the supportMacArchive option)
  • Extract files from ZIP in parallel
  • Additional options
  • Bug fixes

The API is mostly the same as yauzl's, but some options and properties are renamed to be more consistent and less verbose. Notably, the entry.filename property has a lower case n (yauzl's property is called fileName).

It passes all of yauzl's test suite.

Versions v1 - v3 were a wrapper around yauzl. v4 is a rewrite from scratch.

Usage

Installation

npm install yauzl-promise

Simple usage

Unzip all files from a ZIP file to a directory:

const yauzl = require('yauzl-promise'),
  fs = require('fs'),
  {pipeline} = require('stream/promises');

const zip = await yauzl.open('/path/to/file.zip');
try {
  for await (const entry of zip) {
    if (entry.filename.endsWith('/')) {
      await fs.promises.mkdir(`/path/to/output/${entry.filename}`, {recursive: true});
    } else {
      const readStream = await entry.openReadStream();
      const writeStream = fs.createWriteStream(
        `/path/to/output/${entry.filename}`
      );
      await pipeline(readStream, writeStream);
    }
  }
} finally {
  await zip.close();
}
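
The example above assumes that every directory has its own entry in the ZIP and that it appears before the files inside it. Not all ZIP files are laid out that way, so the following is a minimal sketch of a more defensive variant that creates parent directories as needed. The helper name extractAll is hypothetical (it is not part of this library); it relies only on async iteration, entry.filename, entry.openReadStream() and zip.close() as described in this README, and assumes the default options so that entry.filename is a validated String.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const {pipeline} = require('node:stream/promises');

// Hypothetical helper (not part of yauzl-promise): extracts every entry of an
// already-open Zip into `outputDir`, then closes the Zip.
async function extractAll(zip, outputDir) {
  try {
    for await (const entry of zip) {
      const target = path.join(outputDir, entry.filename);
      if (entry.filename.endsWith('/')) {
        // Directory entry: create it, along with any missing parents.
        await fs.promises.mkdir(target, {recursive: true});
        continue;
      }
      // File entry: the ZIP may not contain an entry for the parent directory.
      await fs.promises.mkdir(path.dirname(target), {recursive: true});
      const readStream = await entry.openReadStream();
      await pipeline(readStream, fs.createWriteStream(target));
    }
  } finally {
    // Always release the file descriptor, even if extraction fails part-way.
    await zip.close();
  }
}
```

It would be called as `await extractAll(await yauzl.open('/path/to/file.zip'), '/path/to/output')`.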

Open methods

All methods return a Promise which resolves to an instance of yauzl.Zip class.

NB: zip.close() must be called when reading from the ZIP is complete, to avoid leaking a file descriptor, or other resources.

open(path, [options])

Opens ZIP file, ready for reading.

It maintains a single file descriptor for the file throughout - fs.open() is only called once.

Details of options below.

const yauzl = require('yauzl-promise');
const zip = await yauzl.open('/path/to/file.zip');

fromFd(fd, [options])

Reads from the provided file descriptor, which is presumed to be an open .zip file.

Note that random access is required by the ZIP file specification, so the file descriptor cannot be an open socket or any other file descriptor that does not support random access.

fromBuffer(buffer, [options])

Opens ZIP file from a Buffer in memory. buffer must be a NodeJS Buffer object.

fromReader(reader, size, [options])

This method of reading a ZIP file allows clients to implement their own back-end file system. For example, a client might translate read calls into network requests.

The reader parameter must be an instance of a subclass of yauzl.Reader which implements the required methods.

size must be the total size in bytes of the ZIP file.

Open methods options

options may be omitted or null. The defaults are:

{
  decodeStrings: true,
  validateEntrySizes: true,
  validateFilenames: true,
  strictFilenames: false,
  supportMacArchive: true
}

decodeStrings

When true (default), yauzl will decode strings with CP437 or UTF8 as required by the spec.

If set to false:

  • zip.comment, entry.filename, and entry.comment will be Buffer objects instead of Strings.
  • Any Info-ZIP Unicode Path Extra Field will be ignored. See extraFields.
  • Automatic filename validation will not be performed.

validateEntrySizes

When true (default), ensures that an entry's reported uncompressed size matches its actual uncompressed size.

This check happens as early as possible - during initial reading of entry (for entries with no compression), or during openReadStream() (for compressed entries).

validateFilenames

When true (default), entry filenames are validated not to be absolute or relative paths. If validation fails, an error is thrown.

false disables validation.

When decodeStrings is false, validateFilenames has no effect.

This functionality is also available via yauzl.validateFilename().
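
If validation is switched off (or decodeStrings is false), nothing stops an entry name from pointing outside the extraction directory. The following is a minimal sketch of an independent containment check a caller might add before writing to disk. It is not this library's validation logic; resolveInside is a hypothetical name and the check uses only NodeJS's path module.

```javascript
const path = require('node:path');

// Hypothetical helper (not part of yauzl-promise): resolves `filename` against
// `outputDir` and throws unless the result lies strictly inside `outputDir`.
function resolveInside(outputDir, filename) {
  const root = path.resolve(outputDir);
  const target = path.resolve(root, filename);
  const relative = path.relative(root, target);
  const escapes =
    relative === '' || // resolves to the output directory itself
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Entry path escapes output directory: ${filename}`);
  }
  return target;
}
```

The returned absolute path can then be passed to fs.createWriteStream() in place of a string built by concatenation.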

strictFilenames

When false (default) and decodeStrings is true, all backslash (\) characters in each entry.filename are replaced with forward slashes (/).

The spec forbids filenames with backslashes, but Microsoft's System.IO.Compression.ZipFile class in .NET versions 4.5.0 until 4.6.1 creates non-conformant ZIP files with backslashes in filenames. strictFilenames is false by default so that clients can read these non-conformant ZIP files without knowing about this Microsoft-specific bug.

When strictFilenames, decodeStrings, and validateFilenames options are all true, entries with backslashes in their filenames will result in an error.

supportMacArchive

When true (default), faulty ZIP files created by Mac OS Archive Utility can be unzipped successfully, despite being malformed.

Mac OS Archive Utility creates such faulty ZIPs when either (1) the ZIP's size is over 4 GiB, (2) any file in the ZIP is over 4 GiB compressed or uncompressed, or (3) the number of files in the ZIP exceeds 65535. See yauzl#69 for more details.

Handling these ZIPs does have a slight overhead. Also, in some extremely rare cases, it could cause a valid ZIP to be misinterpreted. So if you're sure the ZIP was not created by Mac OS Archive Utility, you can disable the support for a very marginal performance improvement.

zip.close()

Closes file and returns Promise which resolves when underlying file/file descriptor/reader is closed.

Files must be closed once you have finished with them, to avoid resource leaks.

const zip = await yauzl.open('/path/to/file.zip');
// Read entries etc, and then...
await zip.close();

Reading entries

zip.readEntry()

Read next entry from ZIP file. Return value is an instance of yauzl.Entry class.

When there are no entries left, returns null.

Calling .readEntry() again returns the next entry.

const entry1 = await zip.readEntry();
const entry2 = await zip.readEntry();

zip.readEntries([numEntries])

Read up to numEntries entries, and return as an array.

const [entry1, entry2] = await zip.readEntries(2);
const [entry3, entry4] = await zip.readEntries(2);

If numEntries is null or undefined, reading will continue until all entries are read.

WARNING: This is dangerous. If the ZIP contains a large number of files, holding every entry in memory at once could crash the process with an out-of-memory error. It is safer to use async iteration instead.
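
Where entries need to be handled in groups, one option is to read them in bounded batches rather than all at once. The sketch below wraps zip.readEntries(numEntries) in an async generator. readInBatches is a hypothetical helper, and it assumes that readEntries() resolves to an empty array once no entries remain, which this README does not state explicitly.

```javascript
// Hypothetical helper (not part of yauzl-promise): yields arrays of at most
// `batchSize` entries, so only one batch is held in memory at a time.
// Assumption: readEntries() resolves to [] when the ZIP has no entries left.
async function* readInBatches(zip, batchSize) {
  while (true) {
    const batch = await zip.readEntries(batchSize);
    if (batch.length === 0) return;
    yield batch;
  }
}
```

Each batch can then be consumed with `for await (const batch of readInBatches(zip, 100)) { ... }`.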

Async iteration

Zips can be used as async iterators, iterating over entries.

const zip = await yauzl.open('/path/to/file.zip');
for await (const entry of zip) {
  // Do something with the entry
}
await zip.close();

Reading file data

zip.openReadStream(entry, [options])

Open a readable stream for the contents of a ZIP file entry. Returns a promise of a stream.

entry must be an Entry object from this Zip.

const readStream = await zip.openReadStream(entry);
readStream.pipe(writeStream);

It is possible to destroy the readStream before it has piped all of its data. To do this, call readStream.destroy(). This closes the specific stream, but not the ZIP file as a whole. The underlying file descriptor used for reading from the ZIP file remains open, so calling zip.close() is still required.
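
As an illustration of stopping early, the sketch below reads only the first few bytes of an entry (for example, to sniff a file signature) and then destroys the stream. readHead is a hypothetical helper; it uses only entry.openReadStream() and readStream.destroy() as described above, and leaves the Zip open.

```javascript
// Hypothetical helper (not part of yauzl-promise): resolves to a Buffer holding
// at most the first `maxBytes` bytes of the entry's data.
async function readHead(entry, maxBytes) {
  const readStream = await entry.openReadStream();
  const chunks = [];
  let total = 0;
  try {
    for await (const chunk of readStream) {
      chunks.push(chunk);
      total += chunk.length;
      if (total >= maxBytes) break; // enough data, stop reading
    }
  } finally {
    // Closes only this stream. zip.close() is still required later.
    readStream.destroy();
  }
  return Buffer.concat(chunks).subarray(0, maxBytes);
}
```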

entry.openReadStream([options])

As above, but called on an Entry object.

const entry = await zip.readEntry();
const readStream = await entry.openReadStream();
readStream.pipe(writeStream);

Options

zip.openReadStream() and entry.openReadStream() both take the following options:

  • decompress - Set to false to disable decompressing compressed data.
  • decrypt - Set to false to disable decrypting encrypted data.
  • validateCrc32 - Set to false to disable validation of CRC32 checksum of file data.
  • start - Stream range of file data beginning at byte index start.
  • end - Stream range of file data ending at byte index end (exclusive).

start and end options can only be used if decompress, decrypt and validateCrc32 are all false.

Specifying decrypt: false for an encrypted entry causes the readable stream to provide the raw, still-encrypted file data (including the 12-byte header described in the spec).
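
Putting the rule about start and end together with the three flags, a range request for raw stored bytes might look like the sketch below. openRawRange is a hypothetical helper. It simply follows the constraint stated above; whether every combination is accepted for every kind of entry (for instance decrypt: false on an unencrypted entry) is not covered in this README, so treat it as an assumption.

```javascript
// Hypothetical helper (not part of yauzl-promise): requests the raw, stored
// bytes in the range [start, end) of an entry, with all processing disabled
// as the README requires when `start` / `end` are used.
function openRawRange(entry, start, end) {
  return entry.openReadStream({
    decompress: false,
    decrypt: false,
    validateCrc32: false,
    start,
    end // exclusive
  });
}
```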

Utilities

dosDateTimeToDate(date, time)

Converts MS-DOS date and time data into a JavaScript Date object. Each parameter is a Number, treated as an unsigned 16-bit integer. Note that DOS date/time format does not support timezones, so the date will be interpreted as UTC.

const entry = await zip.readEntry();
const date = yauzl.dosDateTimeToDate(entry.lastModDate, entry.lastModTime);

NB: Original yauzl interpreted dates according to local timezone. UTC is used here instead to ensure consistent result when unzipping the same ZIP anywhere.
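
For reference, the MS-DOS format packs the date as year-since-1980 (7 bits), month (4 bits) and day (5 bits), and the time as hours (5 bits), minutes (6 bits) and seconds divided by two (5 bits). The sketch below decodes that layout as UTC. It is an illustration of the format only, not this library's implementation, and decodeDosDateTime is a hypothetical name.

```javascript
// Illustrative decoder for MS-DOS date/time values, interpreted as UTC.
// Not the library's own code; use yauzl.dosDateTimeToDate() in practice.
function decodeDosDateTime(date, time) {
  const day = date & 0x1f;                  // bits 0-4
  const month = (date >> 5) & 0x0f;         // bits 5-8, 1 = January
  const year = ((date >> 9) & 0x7f) + 1980; // bits 9-15, years since 1980
  const seconds = (time & 0x1f) * 2;        // bits 0-4, 2-second resolution
  const minutes = (time >> 5) & 0x3f;       // bits 5-10
  const hours = (time >> 11) & 0x1f;        // bits 11-15
  return new Date(Date.UTC(year, month - 1, day, hours, minutes, seconds));
}
```

Because seconds are stored halved, timestamps have a resolution of two seconds.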

validateFilename(filename)

Checks that filename is not an absolute or relative path, and does not contain backslashes (\). Throws an error if it does.

This function is automatically run for each entry, as long as decodeStrings and validateFilenames options are true.

Class: Zip

Instances of Zip class are returned by open(), fromFd(), fromBuffer(), and fromReader(). The constructor for the class is not part of the public API.

zip.isOpen

Boolean. true if Zip is open for reading. false if zip.close() has been called.

zip.entryCount

Number. Total number of entries in ZIP file.

zip.entryCountIsCertain

Boolean. true if entryCount can be relied on for accuracy.

Mac OS Archive Utility truncates entryCount to 16 bits (i.e. max 65535), so it can be inaccurate.

Where the ZIP file has been identified as possibly a Mac OS ZIP, and it's possible entryCount is inaccurate, entryCountIsCertain will be false. In this case, actual number of entries may be higher than reported (but not lower).

As entries are read with readEntry(), entryCount will be increased if it becomes evident that there are more entries than reported. Once entryCount is determined to definitely be accurate, entryCountIsCertain will change to true.
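
One place this matters is progress reporting: a total that may still grow should not be presented as final. A minimal sketch, using only the entryCount and entryCountIsCertain properties described above (formatProgress is a hypothetical helper):

```javascript
// Hypothetical helper (not part of yauzl-promise): formats a progress label,
// appending "+" while the total number of entries may still increase.
function formatProgress(zip, entriesRead) {
  const suffix = zip.entryCountIsCertain ? '' : '+';
  return `${entriesRead}/${zip.entryCount}${suffix}`;
}
```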

zip.comment

String. Always decoded with CP437 per the spec.

If options.decodeStrings is false, this field is the undecoded Buffer instead of a decoded String.

zip.isZip64

true if ZIP file uses ZIP64 extension (allowing more than 65535 files, or file data larger than 4 GiB).

zip.isMacArchive

Boolean. true if ZIP is a faulty Mac OS Archive Utility ZIP. false if it's not known to be.

zip.isMaybeMacArchive indicates whether ZIP may be a Mac OS Archive Utility ZIP.

You don't need to worry about either of these properties - they're mainly for the internal logic of this package - but if you happen to be interested, the possible states are:

  • isMacArchive = true: Definitely a faulty Mac OS Archive Utility ZIP.
  • isMaybeMacArchive = true: ZIP possibly created by Mac OS Archive Utility (very probably it is).
  • isMaybeMacArchive = false: ZIP definitely not created by the versions of Mac OS Archive Utility which produce faulty ZIPs.

Both properties are updated by readEntry() and openReadStream(), as more about the ZIP file becomes known.

Class: Entry

Instances of Entry class are returned by zip.readEntry(), zip.readEntries(), or using a Zip as an async iterator. The constructor for the class is not part of the public API.

Objects of this class represent ZIP file entries. Refer to the ZIP file specification for more details about these fields.

These fields are of type Number:

  • versionMadeBy
  • versionNeededToExtract
  • generalPurposeBitFlag
  • compressionMethod
  • lastModDate (MS-DOS format, see getLastMod())
  • lastModTime (MS-DOS format, see getLastMod())
  • crc32
  • compressedSize
  • uncompressedSize
  • internalFileAttributes
  • externalFileAttributes
  • fileHeaderOffset
  • fileDataOffset (usually unpopulated until openReadStream() is called)

In addition:

entry.filename

String. Following the spec, the bytes for the filename are decoded as UTF8 if generalPurposeBitFlag & 0x800, otherwise as CP437. Alternatively, this field may be populated from the Info-ZIP Unicode Path Extra Field (see extraFields).

This field is automatically validated unless decodeStrings or validateFilenames options are false.

If decodeStrings option is false, this field is the undecoded Buffer instead of a decoded String. In that case, generalPurposeBitFlag and any Info-ZIP Unicode Path Extra Field are ignored.

NB: In original yauzl, this field was named fileName (capital N).

entry.uncompressedSizeIsCertain

Boolean. true if uncompressedSize is reliable.

Mac OS Archive Utility truncates uncompressedSize to 32 bits (i.e. max size 4 GiB), so it is inaccurate for files >= 4 GiB in size.

Where the ZIP file has been identified as possibly a Mac OS ZIP, and it's possible uncompressedSize is inaccurate, uncompressedSizeIsCertain will be false. In this case, actual uncompressedSize may be higher than reported (but not lower).

After openReadStream() has completed streaming out the file, uncompressedSize will be updated to reflect the accurate uncompressed data size, and uncompressedSizeIsCertain will change to true. NB: This doesn't happen if either decompression (decompress option) or entry size validation (validateEntrySizes option) is disabled. Both are enabled by default.

entry.extraFields

Array with each entry in the form {id, data}, where id is a Number and data is a Buffer.

This library looks for and reads the ZIP64 Extended Information Extra Field (0x0001) in order to support ZIP64 format ZIP files.

This library also looks for and reads the Info-ZIP Unicode Path Extra Field (0x7075) in order to support some ZIP files that use it instead of General Purpose Bit 11 to convey UTF8 filenames. When the field is identified and verified to be reliable (see the ZIP file spec), the filename in this field is stored in the filename property, and the filename in the central directory record for this entry is ignored. When decodeStrings is false, any Info-ZIP Unicode Path Extra Fields are ignored.

None of the other fields are considered significant by this library. Fields that this library reads are left unaltered in the extraFields array.

entry.comment

String decoded with the charset indicated by generalPurposeBitFlag & 0x800, as with filename (the Info-ZIP Unicode Path Extra Field has no effect on the charset used for this field).

If decodeStrings is false, this field is the undecoded Buffer instead of a decoded String.

entry.getLastMod()

Get last mod date as a Date object. Effectively implemented as:

dosDateTimeToDate(entry.lastModDate, entry.lastModTime)

See dosDateTimeToDate().

entry.isEncrypted()

Returns whether this entry is encrypted with "Traditional Encryption". Effectively implemented as:

(entry.generalPurposeBitFlag & 0x1) !== 0

Note that "Strong Encryption" is not supported, and will result in an error.

entry.isCompressed()

Effectively implemented as:

entry.compressionMethod !== 0

NB: This differs slightly from original yauzl's behavior. yauzl would return false for an entry which is compressed, but with a compression method other than 8 (Deflate compression).
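
The bit tests quoted in this section can be collected into one small function when working with plain field values, for example when logging entries. This is only a restatement of the expressions above; describeEntry is a hypothetical helper, and its utf8 flag reflects General Purpose Bit 11 only (it does not account for the Info-ZIP Unicode Path Extra Field).

```javascript
// Hypothetical helper (not part of yauzl-promise): summarises an entry using
// the same expressions the README gives for isEncrypted() and isCompressed(),
// plus the UTF8 flag used when decoding filename and comment.
function describeEntry(entry) {
  return {
    encrypted: (entry.generalPurposeBitFlag & 0x1) !== 0,
    utf8: (entry.generalPurposeBitFlag & 0x800) !== 0,
    compressed: entry.compressionMethod !== 0
  };
}
```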

Class: Reader

This class is meant to be subclassed by clients and instantiated for the fromReader() function.

If creating your own Reader subclass, it should provide the following methods:

  • _createReadStream(start, length) (required)
  • async _read(start, length) (optional)
  • async _open() (optional)
  • async _close() (optional)

The file readers provided by yauzl for open() etc are subclasses of Reader. Their implementations can be found in lib/reader.js.
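
As a rough sketch of what a custom subclass might look like, the factory below builds a Reader that serves bytes from an in-memory Buffer. Several details are assumptions, not documented behaviour: that the Reader constructor takes no arguments, that _createReadStream() should return a NodeJS Readable stream, and that _read() should resolve to a Buffer. createBufferReaderClass and the sourceBuffer property are hypothetical names; check lib/reader.js before relying on any of this.

```javascript
const {Readable} = require('node:stream');

// Hypothetical factory (not part of yauzl-promise). `Reader` is expected to be
// the yauzl.Reader base class; it is passed in so this sketch stays standalone.
function createBufferReaderClass(Reader) {
  return class BufferReader extends Reader {
    constructor(buffer) {
      super(); // assumption: the base constructor needs no arguments
      this.sourceBuffer = buffer;
    }

    // Required. Assumption: returns a Readable yielding `length` bytes from `start`.
    _createReadStream(start, length) {
      const chunk = this.sourceBuffer.subarray(start, start + length);
      return Readable.from([chunk], {objectMode: false});
    }

    // Optional. Assumption: resolves to a Buffer of `length` bytes from `start`.
    async _read(start, length) {
      return this.sourceBuffer.subarray(start, start + length);
    }
  };
}
```

Under those assumptions it would be used as `const BufferReader = createBufferReaderClass(yauzl.Reader)` followed by `await yauzl.fromReader(new BufferReader(buffer), buffer.length)`.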

Versioning

This module follows semver. Breaking changes will only be made in major version updates.

All active NodeJS release lines are supported (v16+ at time of writing). After a release line of NodeJS reaches end of life according to Node's LTS schedule, support for that version of Node may be dropped at any time, and this will not be considered a breaking change. Support for a Node version will be dropped in a minor version update (e.g. 1.2.0 to 1.3.0). If you are using a Node version which is approaching end of life, pin your dependency on this module to patch updates only using tilde (~), e.g. ~1.2.3, to avoid breakages.

Tests

Use npm test to run the tests. Use npm run cover to check coverage.

Use npm run test-mac-big to run additional tests on large Mac OS ZIP files. These tests are slow.

Changelog

See changelog.md

Issues

If you discover a bug, please raise an issue on GitHub. https://github.com/overlookmotel/yauzl-promise/issues

Contribution

Pull requests are very welcome. Please:

  • ensure all tests pass before submitting PR
  • add tests for new features
  • document new functionality/API additions in README
  • do not add an entry to Changelog (Changelog is created when cutting releases)
