-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds zip support to the zlib
module
#45651
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 I know this is an unpopular opinion, but I'm not convinced this should live in node core. Sure it's a common file format, but then again so are tar, rar, 7z, zstd, xz, and others, which also don't belong in node core.
I feel like adding such modules to node core is further leading to feature creep. I get that other platforms like PHP and such may have zip modules, but they also include a ton of other modules that make them "kitchen sink" platforms, which I would hate to see node.js become.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd appreciate some early review to let me know places where the code isn't compatible with Node.js' standards.
Might be jumping the gun here since yeah, there should be a discussion about whether this should happen at all, but, sure, gave it a first look. I do concur with @mscdex's concerns, fwiw.
src/node_zip.cc
Outdated
} | ||
|
||
void ZipArchive::MemoryInfo(MemoryTracker* tracker) const { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(might want to fill this out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How precise does it have to be? I updated the code to track the size of the input buffer + the buffer of any file that gets added later, but it doesn't include the small-ish libzip overhead, and gets confused if the same file is modified multiple times. If the value is indicative it might be fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn’t have to be super precise, but it should be usable for debugging. You’ll probably want to track buf_
here via tracker->TrackField("buf", buf_);
. If you can’t track or estimate memory owned by libzip (including memory for added entries?) then it’s probably fine to omit it, rather than to give numbers that are potentially very inaccurate (e.g. after repeated AddEntry()
+ DeleteEntry()
calls).
src/node_zip.cc
Outdated
} | ||
|
||
zip_int64_t file_index = zip_file_add(zip->zip_, *path, file_source, ZIP_FL_OVERWRITE | ZIP_FL_ENC_UTF_8); | ||
CHECK_GE(file_index, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this call fails? Likely also applies elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now the code is overly strict and if any function fails, the program aborts on the CHECK_GE
call. I have to replace most these calls by something that would just throw instead.
Despite the comments from other reviewers, my main concern is about the usage of the |
I used |
There is an initiative to use native buffers in the new public APIs (referencing my personal talks with @addaleax and @jasnell), a @nodejs/tsc member can clarify if this is still a thing. |
@anonrig I don’t know if that’s the best way forward here, but since this feels like a very broad conversation (“Should new Node.js APIs return |
In this case, I think Buffer is fine given that it is consistent with the rest of the zlib module. I can see us eventually making a call on avoiding Buffer in the future (or standardizing on it) but this is not the place to decide that |
Is a new top level module what we want here? As opposed to adding this to zlib? I know it's not based on zlib but neither is brotli. |
I think @GeoffreyBooth had the same feedback. I don't have a strong opinion there, perhaps |
I would put it in |
Just leaving this reference here: #41588 |
That might make some sense for zip, which inherently supports compression, but if we do add other archive formats that don't, it won't fit. Also, except for zip, compression and archive formats are orthogonal even if related topics. |
archive
modulezlib
module
## Compressing multiple files together | ||
|
||
<!-- YAML | ||
added: REPLACEME | ||
--> | ||
|
||
The `zlib` library provides ways to compress individual objects, but not to | ||
aggregate multiple ones into a single file suitable for redistribution (what | ||
is often called archival). | ||
|
||
To this end, `node:zip` provides the `ZipArchive` class which allows to create, | ||
read, and modify zip archives: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this section needs updating? Because of references to node:zip
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it's a typo, the archive is now part of node:zlib
(would it make sense to have a check in lint-md that all node:something
identifiers must be valid?)
Ref #45434
This PR adds a new
ZipArchive
class to thezlib
module, which can be used to read and write content from zip archives. Its current API looks like this:Maintenance cost
I kept the feature scope limited enough to cover most of the use cases but without increasing the maintenance cost or build cost. A few things have been cut from what the libzip would allow:
Opening files directly from the filesystem isn't supported, because it would bypass
node:fs
. The current API only works with memory buffers (according to my tests it doesn't have any negative impact even when compared to the wasm API which went through file descriptors).Encryption isn't supported, because it's unclear how it should integrate with
node:crypto
. There's room for follow-up, but it didn't seem a required feature for the first iteration.Performances
Keep in mind that raw performances aren't the main reason why zip support is important to have as a native feature. The speedup is nice, the simplified garbage collection is very nice, but the real benefit is having a stable cross-platform way to bundle files between platforms. It will be useful for cache mechanisms, transfer algorithms, user CLI generation, and more.
Still, I made some reasonable checks to make sure that no use case regressed. Size of the binary before / after:
Performance-wise, using Yarn as benchmark, the results show native being ~2x faster than wasm (keep in mind the wasm implementation isn't the most popular zip library; projects using jszip will see significantly larger differences):
To Do