WACZ Aggregation / Multi WACZ Specification #112

Open
@edsu

Description

Details about how to aggregate multiple WACZ files into a single WACZ need to be added to the specification. This hinges on resources in the datapackage.json using a url for a WACZ rather than a path. See the Resource Information section in the Data Package specification for details:

{
   "resources": [
      {"hash": "...", "url": "https://example.com/filename_1.wacz", "bytes": "..."},
      {"hash": "...", "url": "https://example.com/filename_2.wacz", "bytes": "..."}
   ]
   ...
}
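
As a rough illustration of the idea, here is a minimal Python sketch that builds resource entries keyed by url and picks the remote WACZ files back out of a package. The helper names (resource_entry, build_aggregate, remote_wacz_urls) are hypothetical, and the "sha256:" hash prefix and integer byte count are assumptions about how the elided "..." values might be filled in, not something this issue settles.

```python
import hashlib
import json


def resource_entry(url: str, payload: bytes) -> dict:
    """Describe one remote WACZ file as a resource entry (hypothetical helper)."""
    return {
        # Assumption: hashes are written as "<algorithm>:<hex digest>".
        "hash": "sha256:" + hashlib.sha256(payload).hexdigest(),
        "url": url,
        # Assumption: "bytes" is the size of the WACZ file as an integer.
        "bytes": len(payload),
    }


def build_aggregate(resources: list) -> dict:
    """Wrap resource entries in a minimal datapackage.json structure."""
    return {"resources": resources}


def remote_wacz_urls(package: dict) -> list:
    """Return the url of every resource that points at a WACZ by url, not path."""
    return [r["url"] for r in package.get("resources", []) if "url" in r]


if __name__ == "__main__":
    # Placeholder payloads stand in for the real WACZ file contents.
    package = build_aggregate([
        resource_entry("https://example.com/filename_1.wacz", b"placeholder 1"),
        resource_entry("https://example.com/filename_2.wacz", b"placeholder 2"),
    ])
    print(json.dumps(package, indent=3))
    print(remote_wacz_urls(package))
```

A client could use the presence of url (rather than path) on a resource to decide whether it needs to fetch another WACZ instead of reading a member of the current one.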

There should also be a Data Package profile so that clients can easily distinguish between collections and regular WACZ files. Perhaps WACZ-Aggregation?
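
To make the profile suggestion concrete, a client-side check might look like the sketch below. The profile value "wacz-aggregation" is only the name floated above, not an agreed identifier, and is_aggregation is a hypothetical helper.

```python
# Hypothetical profile name, taken from the suggestion in this issue.
AGGREGATION_PROFILE = "wacz-aggregation"


def is_aggregation(package: dict) -> bool:
    """Return True if a parsed datapackage.json declares the aggregation profile."""
    profile = package.get("profile", "")
    # Compare case-insensitively so "WACZ-Aggregation" also matches.
    return isinstance(profile, str) and profile.lower() == AGGREGATION_PROFILE


if __name__ == "__main__":
    print(is_aggregation({"profile": "WACZ-Aggregation", "resources": []}))
    print(is_aggregation({"profile": "data-package", "resources": []}))
```

With a check like this, a client can branch on the profile alone instead of inspecting every resource to guess whether it is looking at a collection.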

The specification should document that WACZ users MAY want to use the datapackage.json as a place to record additional metadata about crawls. See the browsertrix-cloud API for examples.

Labels: documentation (Improvements or additions to documentation)
