Add eStargz specification to OCI v1 (support lazy pulling) #815

Open
@ktock

Description

TL;DR

  • Standardize eStargz archive format as an optional extension to OCI Image Spec v1: https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/stargz-estargz.md
  • Define org.opencontainers.image.toc.digest annotation for enabling chunk-level content verification
  • No need to introduce a new layer media type, because eStargz is fully compatible with application/vnd.oci.image.layer.v1.tar+gzip
    • Though compression methods other than gzip are out of scope, this spec can be smoothly extended to other compression methods in the future (e.g. zstd)

Overview

Pulling is one of the time-consuming steps in the container lifecycle. One of the root causes is the tar (+gzip) archived layer format, which doesn't allow image consumers (e.g. container runtimes, builders, etc.) to run a container until the entire contents are locally available.

This proposal aims to solve this issue by enabling lazy pulling for OCI images. Lazy pulling here means that image consumers don't download the entire image during the pull operation but fetch the necessary chunks of content on demand. This reduces the time spent on the pull and lets the container start up quickly.

We propose standardizing "eStargz" (https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/stargz-estargz.md), a lazily pullable and OCI-compatible tar.gz extension developed in the containerd Stargz Snapshotter project. A recent benchmarking result shows a performance improvement on the pull operation (please also see the README for a detailed explanation).

(Figure: benchmarking result)

Because eStargz is fully compatible with the current spec,

  • it can be lazily pulled without any changes to the registry
  • it can still run on eStargz-agnostic runtimes, so the community can adopt the new spec without the risk of breaking their environments

Though this proposal focuses on the extension to the gzip-compressed layer, we believe eStargz can be smoothly extended to other compression methods in the future. Recently, the Podman community has been trying to define zstd:chunked, a zstd version of the lazily pullable format, based on the eStargz spec. Standardizing eStargz will also help standardize zstd:chunked in the future, with a minimal amount of changes to the spec. This consistency of the format across compression methods should also help runtime implementers adopt lazy pulling without unnecessary complexity.

Thanks @AkihiroSuda for the discussion about this proposal.

Goal

The goal of this proposal is to add support for lazy pulling to the OCI Image Spec by standardizing the eStargz spec (https://github.com/containerd/stargz-snapshotter/blob/v0.2.0/docs/stargz-estargz.md) as an optional extension and by defining an annotation org.opencontainers.image.toc.digest for content verification. No changes are needed to the OCI Distribution Spec, because eStargz can be lazily pulled from a registry as long as the registry supports HTTP Range Requests, which are already included in that spec.
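As a rough illustration of why plain HTTP Range Requests are enough, the following Python sketch builds (but does not send) a request for one byte range of a layer blob. The registry host, repository name, digest, offset and size are made-up values for illustration only; the URL follows the blob endpoint layout of the OCI Distribution Spec.

```python
import urllib.request


def range_request(registry, repository, layer_digest, offset, size):
    """Build a GET request for `size` bytes of a layer blob starting at `offset`.

    The request is only constructed here, not sent. All argument values used
    below are hypothetical.
    """
    url = f"{registry}/v2/{repository}/blobs/{layer_digest}"
    # HTTP byte ranges are inclusive on both ends.
    last_byte = offset + size - 1
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last_byte}"}
    )


# Hypothetical registry, repository and digest.
req = range_request(
    "https://registry.example.com",
    "library/app",
    "sha256:" + "0" * 64,
    offset=1024,
    size=512,
)
print(req.full_url)
print(req.get_header("Range"))
```

A real client would also need to handle registry authentication and redirects, which are omitted here.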

Proposed Changes

Fig 1. The Structure
Fig 2. Prefetching Support
Fig 3. Content Verification

Standardize eStargz archive format as an optional extension to application/vnd.oci.image.layer.v1.tar+gzip (Fig 1 and 2)

eStargz is compatible with application/vnd.oci.image.layer.v1.tar+gzip, so a new Media Type doesn't need to be introduced. Instead, we propose adding the eStargz spec to the OCI Image Spec as an optional extension to the +gzip Media Types.

An overview of eStargz follows. For more details, please refer to the eStargz spec.

  • Gzip-compressing each tar entry per file (or per chunk if the file is large). This enables the image consumer to decompress each tar entry selectively.
  • Adding TOC JSON to the layer tar blob. This contains the metadata and content offsets of all files. This allows image consumers to mount a layer without scanning the entire tar.gz and to extract the necessary contents selectively.
  • Adding meta entries that indicate "prioritized" files that SHOULD be prefetched when mounting the layer. This helps image consumers make sure that these files are locally available and avoid network-related overhead when reading them.
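The first two points can be sketched in Python as follows. This is a simplified illustration, not the eStargz format itself: it compresses raw file bytes rather than tar entries, keeps the table of contents as a separate dict instead of embedding it in the blob, and uses hypothetical field names (`offset`, `size`).

```python
import gzip


def build_blob(files):
    """Compress each file as its own gzip member and record where it lands.

    Simplification: real eStargz compresses tar entries and stores the TOC
    inside the blob; the "offset"/"size" fields here are hypothetical.
    """
    blob = b""
    toc = {}
    for name, data in files.items():
        member = gzip.compress(data)
        toc[name] = {"offset": len(blob), "size": len(member)}
        blob += member
    return blob, toc


def read_file(blob, toc, name):
    """Decompress only the gzip member that holds `name`."""
    entry = toc[name]
    start = entry["offset"]
    return gzip.decompress(blob[start:start + entry["size"]])


files = {"bin/app": b"binary" * 100, "etc/config": b"key=value\n"}
blob, toc = build_blob(files)

# Selective access: only one member is decompressed.
print(read_file(blob, toc, "etc/config"))

# Concatenated gzip members still form a valid gzip stream, which is why a
# consumer that knows nothing about the TOC can decompress the whole blob.
print(gzip.decompress(blob) == b"".join(files.values()))
```

Because each member is independently decompressible, a consumer that knows the offsets only needs to fetch the byte range covering the file it wants.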

Define org.opencontainers.image.toc.digest annotation (Fig 3)

In the current OCI Spec, a layer can be verified by the Digest of the layer written in the descriptor in the manifest. However, when a user lazily pulls a layer (i.e. fetches and extracts chunks separately on demand), this verification method cannot be applied because the entire layer contents haven't been acquired.

To solve this issue, eStargz can verify the contents at chunk granularity on demand. The digest of each chunk is written in the TOC JSON so that image consumers can verify each chunk separately every time they acquire file contents. The TOC JSON itself is verified by the digest written in a pre-defined annotation on the layer descriptor in the manifest, which is already verifiable with the current spec. More details of this extension are described in the eStargz definition doc.

To enable this, we propose adding the following pre-defined annotation, following the OCI naming convention for annotations.

  • org.opencontainers.image.toc.digest: OCI Digest of the TOC JSON in the layer
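The two-level verification described above might look roughly like the following Python sketch. It assumes sha256 digests and uses a hypothetical TOC layout (`entries`, `chunk`, `digest`) that is not taken from the eStargz spec; only the annotation key comes from this proposal.

```python
import hashlib
import json

TOC_DIGEST_ANNOTATION = "org.opencontainers.image.toc.digest"


def oci_digest(data: bytes) -> str:
    """Return an OCI-style digest string (sha256 assumed)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def make_layer(chunks):
    """Build a toy TOC JSON and a layer descriptor that annotates its digest.

    The TOC field names are hypothetical, chosen for illustration only.
    """
    toc = {"entries": [{"chunk": i, "digest": oci_digest(c)}
                       for i, c in enumerate(chunks)]}
    toc_bytes = json.dumps(toc, sort_keys=True).encode()
    descriptor = {"annotations": {TOC_DIGEST_ANNOTATION: oci_digest(toc_bytes)}}
    return toc_bytes, descriptor


def load_verified_toc(toc_bytes, descriptor):
    """Step 1: verify the TOC JSON against the annotation in the descriptor."""
    expected = descriptor["annotations"][TOC_DIGEST_ANNOTATION]
    if oci_digest(toc_bytes) != expected:
        raise ValueError("TOC JSON does not match the annotated digest")
    return json.loads(toc_bytes)


def verify_chunk(toc, index, data):
    """Step 2: verify one fetched chunk against the digest in the TOC."""
    return oci_digest(data) == toc["entries"][index]["digest"]


chunks = [b"first chunk", b"second chunk"]
toc_bytes, descriptor = make_layer(chunks)
toc = load_verified_toc(toc_bytes, descriptor)
print(verify_chunk(toc, 0, chunks[0]))
print(verify_chunk(toc, 1, b"tampered"))
```

The descriptor is covered by the manifest digest, so trust flows from the manifest to the TOC JSON and from the TOC JSON to each chunk, without ever needing the whole layer.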

Out-of-scope

This proposal focuses on lazy pulling and on standardizing the eStargz spec, which is used in the wild, for OCIv1. Thus some requirements discussed in OCIv2 are out of scope for this proposal, including:

Though OCIv2 is out of scope for this proposal, eStargz doesn't conflict with the OCIv2 discussion.

This proposal focuses on the extension to application/vnd.oci.image.layer.v1.tar+gzip; other compression methods (e.g. zstd) are out of scope.
