New Spec: Nested WACZ files? #129
Comments
Clarifying a bit more, there are two key reasons for ending up with multiple WACZ files: The solutions for these are as follows:
Options 1) and 2) are good solutions for reason A, where multiple WACZ files exist due to parallel crawling and can be quite small.
@ikreymer, for nesting, would we need a new file name and extension for nested WACZ files that is distinct from WACZ? If not, won't WACZ viewers need to account for whether the WACZ is nested and behave accordingly? If we want to consider nesting as part of WACZ, I think this would mean updating the WACZ specification to include this nesting functionality directly, or at least pointing to the separate WACZ Aggregation specification?
In the use case above, where each WACZ is individually signed, is the issue that the cert used to sign each WACZ needs to be different? Or is it simply a technical convenience to get around CDXJ merging? Or are there other issues at play?
Related to multi-WACZ / aggregated WACZ loading #112, a possible idea is to support nested WACZ files, e.g. ZIP files containing other WACZ files and a datapackage.json.
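To make the nesting idea concrete, here is a minimal Python sketch of bundling existing WACZ files, unmodified, into an outer ZIP alongside a datapackage.json. Nothing here comes from a published spec: the function name `build_nested_wacz`, the flat layout of the outer ZIP, and the fields in each resource entry are all assumptions made for illustration.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_nested_wacz(wacz_paths, out_path):
    """Bundle existing WACZ files into one outer ZIP (hypothetical layout).

    Each inner WACZ is copied byte-for-byte, so any signature it carries
    stays valid. The outer datapackage.json format is an assumption.
    """
    resources = []
    # ZIP_STORED: the inner files are already ZIPs, so they are stored as-is.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as outer:
        for path in map(Path, wacz_paths):
            # Reads the whole file into memory; fine for a sketch only.
            data = path.read_bytes()
            outer.writestr(path.name, data)
            resources.append({
                "name": path.name,
                "path": path.name,
                "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
                "bytes": len(data),
            })
        # Outer manifest listing the nested WACZ files (assumed structure).
        outer.writestr("datapackage.json", json.dumps({"resources": resources}, indent=2))
    return resources
```

Calling `build_nested_wacz(["crawler-0.wacz", "crawler-1.wacz"], "bundle.wacz")` would produce one distributable file. The sketch assumes the input file names are unique, and a viewer would still need some way to tell this outer file apart from a regular WACZ.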
The main use case for this would be parallel crawlers that produce multiple WACZ files, each signed individually. For packaging / distribution, it is still convenient to bundle the output into a single file. This makes sense if the reason for having multiple WACZ outputs is parallelism, and not size limits. Some questions to answer around this:
An alternative would be to simply merge the WACZ files (merging the CDXJ, page lists, etc.), which is also doable, but more work (both to implement and to run).
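For a sense of what the CDXJ part of such a merge involves, here is a rough Python sketch. It assumes each input index is already sorted line by line and that lines have had their trailing newlines stripped; `merge_cdxj` is a hypothetical helper name. A full WACZ merge would also have to combine page lists and keep the WARC file names referenced by the index unique, which this does not attempt.

```python
import heapq


def merge_cdxj(*sorted_line_iterables):
    """Merge already-sorted CDXJ line sequences into one sorted sequence.

    Exact duplicate lines are dropped. Inputs must each be sorted by plain
    string comparison (an assumption of this sketch).
    """
    previous = None
    # heapq.merge streams the inputs lazily, so large indexes need not fit in memory.
    for line in heapq.merge(*sorted_line_iterables):
        if line != previous:
            yield line
        previous = line
```

Because the merge is streaming, the cost is mostly I/O, but it still means rewriting the index and repackaging every WARC, which is the extra work mentioned above.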