Collected notes on OCI Artifacts

tarballs, sha digests and json as far as the eye can see

Published at

OCI[1] artifacts are an extension of the OCI Image and OCI Distribution specifications that allows storing arbitrary blobs in the same content addressable manner as container image layers, configuration and manifests.

Content Addressing

Content addressing is the idea of storing information based on some function of its content rather than an arbitrary label[2]. Importantly, this function must reliably and uniquely identify content. If the function were the length of the content, then all content that is 5 bytes long would compete for the same digest, which isn't very useful. In the case of OCI registries the canonical function is sha256, although other hashing functions are supported.
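
Here is a minimal sketch of that idea in Python, assuming an in-memory dict stands in for the registry's storage. The ContentStore class and the two address functions are names I made up for illustration. It stores bytes under an address derived from the bytes themselves and shows why length makes a poor address function.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the value."""

    def __init__(self, address_fn):
        self._address_fn = address_fn
        self._blobs = {}

    def put(self, content: bytes) -> str:
        key = self._address_fn(content)
        self._blobs[key] = content  # a colliding key overwrites earlier content
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def by_sha256(content: bytes) -> str:
    return "sha256:" + hashlib.sha256(content).hexdigest()


def by_length(content: bytes) -> str:
    return f"len:{len(content)}"


good = ContentStore(by_sha256)
a = good.put(b"hello")
b = good.put(b"world")

bad = ContentStore(by_length)
c = bad.put(b"hello")
d = bad.put(b"world")  # also 5 bytes, so it lands on the same key

print(a != b, good.get(a))  # distinct addresses, original content intact
print(c == d, bad.get(c))   # one shared address, b"hello" was overwritten
```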

Digests

Digests are the combination of a hashing function's name and the result of that hash when applied to some content. If we want to store OoTR_1465294_OHBKB3ZQLY.zpf in the registry and then later retrieve it, we would need to know its digest:

HASH=$(sha256sum OoTR_1465294_OHBKB3ZQLY.zpf | cut -f1 -d$' ')
echo "sha256:${HASH}"
# sha256:4a41c27f1ea6825e770d30da574b364cf2e5df118e930e1789f46d4186c7605b

Digests always refer to a specific revision of the target because only that revision will produce that digest. All content in the registry is addressable via its digest.
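
The same calculation can be sketched in Python with the standard hashlib module. The helper names digest_of and verify are my own, and the byte strings are stand-ins rather than the real patch file.

```python
import hashlib


def digest_of(content: bytes, algorithm: str = "sha256") -> str:
    """Return a digest string in the '<algorithm>:<hex>' form."""
    return f"{algorithm}:{hashlib.new(algorithm, content).hexdigest()}"


def verify(content: bytes, digest: str) -> bool:
    """Recompute the hash named by the digest's prefix and compare."""
    algorithm, _, expected_hex = digest.partition(":")
    return hashlib.new(algorithm, content).hexdigest() == expected_hex


original = b"patch revision 1"  # stand-in bytes, not the real .zpf file
revised = b"patch revision 2"

d = digest_of(original)
print(d)
print(verify(original, d))  # True: same bytes, same digest
print(verify(revised, d))   # False: a different revision has a different digest
```

Because the algorithm name travels with the hash, a client can verify content without being told out of band which function was used.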

Tags

Tags are used as human readable aliases to a specific digest and multiple tags may reference the same digest. A tag always refers to a specific digest, but which digest a tag refers to may change over time. To emphasize: any tag may change its referenced digest over time, not just floating tags such as latest.
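
A toy model of that relationship, assuming a dict is a fair stand-in for the registry's tag table. The Repository class and the placeholder digests are invented for this sketch.

```python
class Repository:
    """Toy model: tags are mutable pointers, digests are fixed addresses."""

    def __init__(self):
        self.tags = {}  # tag name -> digest

    def tag(self, name: str, digest: str) -> None:
        self.tags[name] = digest  # silently replaces any previous target

    def resolve(self, reference: str) -> str:
        # Tags cannot contain ":", so anything with one is already a digest.
        if ":" in reference:
            return reference
        return self.tags[reference]


OLD = "sha256:" + "a" * 64  # placeholder digests, not real content
NEW = "sha256:" + "b" * 64

repo = Repository()
repo.tag("v7.1", OLD)
repo.tag("latest", OLD)      # two tags, one digest

before = repo.resolve("v7.1")
repo.tag("v7.1", NEW)        # nothing stops a "version" tag from moving
after = repo.resolve("v7.1")

print(before == OLD, after == NEW)
print(repo.resolve(OLD) == OLD)  # a digest reference never moves
```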

mediaType

A mediaType[3] is a human and machine readable instruction for how to interpret a blob. A blob is ascribed a media type when it is referenced by a descriptor and a blob can have different media types ascribed to it in different descriptors. Media types are constructed of the following portions:

  1. Type
  2. Subtype Tree
  3. Suffix
  4. Parameters

application/vnd.sudonters.zootr.settings.v7.1+json

  1. application
  2. vnd.sudonters.zootr.settings.v7.1
  3. json

In this case the media type refers to a settings file for OOTR version 7.1 that is stored in a json format. Such specific media types are extremely common. Not only is developing such taxonomies fun[4], but they double as semantic labels and versioning schemes.

Versioning is important to allow the artifact to change over time. In these examples I have decided to tie the version of my artifacts to the current, at the time of writing, stable version of the OOTR program. Using the suffix to indicate the representation may seem silly but this allows clients to request alternative representations if available through a process known as content negotiation.

application/vnd.sudonters.zootr.settings.v7.1+text

The change in this suffix indicates that this is the single line encoded representation of settings. Extensions to the program could introduce XML or YAML formatted versions of the settings. A client may prefer any of these formats instead and the server should provide that representation if available.
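
A rough sketch of the server side of that negotiation, under two simplifying assumptions: the client sends an ordered list of exact media types (no wildcards or q-values), and the stored representations live in a dict. The negotiate function and the placeholder bodies are hypothetical.

```python
from typing import Optional

# Placeholder bodies; only the media type keys matter for this sketch.
AVAILABLE = {
    "application/vnd.sudonters.zootr.settings.v7.1+json": b'{"placeholder": true}',
    "application/vnd.sudonters.zootr.settings.v7.1+text": b"PLACEHOLDER",
}


def negotiate(accept: list, available: dict) -> Optional[str]:
    """Return the first media type the client accepts that we can serve."""
    for media_type in accept:
        if media_type in available:
            return media_type
    return None  # nothing acceptable: the server would answer with an error


# The client would like YAML, will settle for the text form, then JSON.
chosen = negotiate(
    [
        "application/vnd.sudonters.zootr.settings.v7.1+yaml",
        "application/vnd.sudonters.zootr.settings.v7.1+text",
        "application/vnd.sudonters.zootr.settings.v7.1+json",
    ],
    AVAILABLE,
)
print(chosen)
```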

application/vnd.sudonters.zootr.notes.v7.1+md;charset=utf8

  1. application
  2. vnd.sudonters.zootr.notes.v7.1
  3. md
  4. charset=utf8

This media type describes a text document that is written as markdown and encoded as utf-8. Parameters are specific to the media type.
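
To make the four portions concrete, here is a small parser sketch. It is deliberately simplified: it assumes well-formed input and does not handle quoted parameter values. MediaType and parse_media_type are names I picked for the example.

```python
from typing import NamedTuple


class MediaType(NamedTuple):
    top_level: str
    subtype_tree: str
    suffix: str       # empty string when there is no "+suffix"
    parameters: dict


def parse_media_type(value: str) -> MediaType:
    essence, *raw_params = [part.strip() for part in value.split(";")]
    top_level, _, subtype = essence.partition("/")
    # The suffix is whatever follows the last "+" in the subtype.
    tree, plus, suffix = subtype.rpartition("+")
    if not plus:  # no "+": rpartition leaves the whole subtype in the last slot
        tree, suffix = subtype, ""
    parameters = dict(p.split("=", 1) for p in raw_params if p)
    return MediaType(top_level, tree, suffix, parameters)


notes = parse_media_type("application/vnd.sudonters.zootr.notes.v7.1+md;charset=utf8")
print(notes.top_level)
print(notes.subtype_tree)
print(notes.suffix)
print(notes.parameters)
```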

The spec defines several of its own media types[5].

Blobs

Blobs[6] are the actual content we want to store in the registry. If we want to store the content as is, meaning uncompressed and otherwise unmodified, we would create the digest directly from the contents:

HASH=$(sha256sum OoTR_1465294_OHBKB3ZQLY.zpf | cut -f1 -d$' ')
echo "sha256:${HASH}"
# sha256:4a41c27f1ea6825e770d30da574b364cf2e5df118e930e1789f46d4186c7605b

However, if we want to compress the content then the digest should be calculated from the compressed content:

HASH=$(gzip < OoTR_1465294_OHBKB3ZQLY.zpf | sha256sum | cut -f1 -d$' ')
echo "sha256:${HASH}"
# sha256:81a9db5f5ce6af358cbd05f0cb6e406fda09b9c3c5ce1ac0158ef03a6659125c

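This distinction is important because the OCI image specification references digests of both compressed and uncompressed content in different contexts that all refer, conceptually at least, to the same content except for the difference in compression. For example, an image configuration's rootfs.diff_ids are digests of the uncompressed layer tarballs, while the manifest's layers carry digests of the blobs as stored, which are usually compressed. Blobs are always referenced by the digest of the raw bytes at rest in the registry.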
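
The two digests can be compared side by side in Python. This is only a sketch: the bytes are a stand-in for the patch file, and passing mtime=0 to gzip.compress is my own choice so that the compressed bytes (and so their digest) come out the same on every run.

```python
import gzip
import hashlib


def digest_of(content: bytes) -> str:
    return "sha256:" + hashlib.sha256(content).hexdigest()


raw = b"stand-in for the patch file contents\n" * 100

# mtime=0 pins the timestamp in the gzip header, so identical input
# produces identical compressed bytes.
compressed = gzip.compress(raw, mtime=0)

raw_digest = digest_of(raw)          # digest of the uncompressed content
blob_digest = digest_of(compressed)  # digest of the bytes at rest in the registry
roundtrip_digest = digest_of(gzip.decompress(compressed))

print(raw_digest != blob_digest)       # True: different bytes, different address
print(roundtrip_digest == raw_digest)  # True: same content once decompressed
```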

Blobs themselves do not carry any kind of mediaType. Instead this interpretation is provided via descriptors that reference blobs. This means that blobs are strictly concerned with the raw bytes that make up the content and defer interpretation of the content to other registry resources, like manifests, that declare descriptors.

Descriptors

Descriptors are how all content is referenced by resources within the repository. Descriptors are required to carry:

  1. The mediaType of the blob
  2. The digest of the blob
  3. The size in bytes of the blob

The digest is used both to address the blob and to verify that the correct blob was downloaded when interacting with untrusted sources. The size in bytes can similarly be used as a check for the correct blob. The mediaType describes what the blob is in the context of the artifact and how the client should interpret it.

An example descriptor of the gzipped OOTR patch file:

{
  "mediaType": "application/vnd.sudonters.zootr.patch.v7.1+gzip",
  "digest": "sha256:81a9db5f5ce6af358cbd05f0cb6e406fda09b9c3c5ce1ac0158ef03a6659125c",
  "size": 447488
}
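
A sketch of producing and checking a descriptor like the one above. The describe and matches helpers are hypothetical names, and the content is a stand-in rather than the real gzipped patch.

```python
import hashlib


def describe(content: bytes, media_type: str) -> dict:
    """Build the three required descriptor fields for some bytes."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


def matches(content: bytes, descriptor: dict) -> bool:
    """Check downloaded bytes against the descriptor that referenced them."""
    # Cheap check first: a wrong size means hashing can be skipped.
    if len(content) != descriptor["size"]:
        return False
    algorithm, _, expected = descriptor["digest"].partition(":")
    return hashlib.new(algorithm, content).hexdigest() == expected


content = b"stand-in bytes for the gzipped patch"
descriptor = describe(content, "application/vnd.sudonters.zootr.patch.v7.1+gzip")

print(matches(content, descriptor))          # True
print(matches(content + b"!", descriptor))   # False: size differs
print(matches(content[::-1], descriptor))    # False: same size, different digest
```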

A descriptor may have other attributes on it such as annotations -- arbitrary metadata represented as key-value pairs -- or an artifactType, which is a secondary mediaType that accurately describes the artifact when the descriptor points to an artifact rather than an image but still carries an image mediaType[7].

There is also an "empty descriptor", reproduced below, which is intended as a kind of "null" value when an artifact does not have any content associated to a property, such as layers or configuration:

{
  "mediaType": "application/vnd.oci.empty.v1+json",
  "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
  "size": 2,
  "data": "e30="
}
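
The values in the empty descriptor are not magic. As a quick check in Python, the data field is the base64 encoding of the two bytes {} and the digest and size follow from those bytes:

```python
import base64
import hashlib

EMPTY = {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30=",
}

# "data" embeds the blob inline, base64 encoded.
content = base64.b64decode(EMPTY["data"])
computed = "sha256:" + hashlib.sha256(content).hexdigest()

print(content)                       # b'{}'
print(len(content) == EMPTY["size"])
print(computed == EMPTY["digest"])
```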

Manifests

A manifest describes a specific revision of an image or artifact by recording the blobs of the configuration and layers of that revision. The primary manifest mediaType is application/vnd.oci.image.manifest.v1+json which can be used to describe image AND non image artifacts.

Config

The manifest config property is intended as machine readable instructions for interacting with the image or artifact. When the manifest describes an OCI Image this descriptor points to configuration that describes how to construct the unionfs for the container, the default entrypoint, and other information needed to launch the image[8].

An artifact may have similar instructions. An artifact with mediaType application/vnd.sudonters.zootr.patch.v7.1 might be bundled with a configuration that indicates the settings and RNG seed used to generate the specific patch file.

If the artifact does not have any configuration it should use the "Empty Descriptor" described above.

Layers

These are descriptors of blobs within the repository. For images, layers must be an ordered collection of unionfs layers; however, the only restriction placed on artifacts is that they SHOULD have at least one layer and SHOULD use the empty descriptor instead of providing an empty layer collection. Additionally, there is no requirement that all layers in a manifest share the same media type.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.sudonters.zootr.notes.v7.1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "digest": "sha256:7c6cfc2e9e41335e46687f337998b8dcd42f03d5ac0e538176dc9f4db44bfc2f",
      "size": 79872,
      "mediaType": "application/vnd.sudonters.zootr.notes.v7.1+md"
    },
    {
      "digest": "sha256:b1f02c4db88fb8c8ced3e7dcac7407c7f1635dd4094388a00bfdd613d433bf7a",
      "size": 51577285,
      "mediaType": "application/vnd.sudonters.zootr.notes.v7.1+jpg"
    }
  ]
}
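
A manifest is itself stored by digest, and that digest is computed over the exact bytes that were pushed, not over the JSON value. The sketch below builds a manifest for the notes artifact (with the empty descriptor as its config) and serializes it two ways to show the effect. The descriptor_for helper is a name I made up.

```python
import hashlib
import json

EMPTY = {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
}

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "artifactType": "application/vnd.sudonters.zootr.notes.v7.1",
    "config": EMPTY,
    "layers": [
        {
            "digest": "sha256:7c6cfc2e9e41335e46687f337998b8dcd42f03d5ac0e538176dc9f4db44bfc2f",
            "size": 79872,
            "mediaType": "application/vnd.sudonters.zootr.notes.v7.1+md",
        },
    ],
}


def descriptor_for(body: bytes, media_type: str) -> dict:
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(body).hexdigest(),
        "size": len(body),
    }


compact = json.dumps(manifest, separators=(",", ":")).encode()
pretty = json.dumps(manifest, indent=2).encode()

compact_desc = descriptor_for(compact, manifest["mediaType"])
pretty_desc = descriptor_for(pretty, manifest["mediaType"])

print(json.loads(compact) == json.loads(pretty))        # True: same document
print(compact_desc["digest"] == pretty_desc["digest"])  # False: different bytes
```

The practical consequence is that a client should push the bytes it hashed rather than re-serializing the manifest afterwards.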

Indexes

Tags provide an alias to a specific digest. However, if we were to pull and inspect the Docker image ubuntu:22.04 on an amd64 host and an arm64 host we would see they have different digests[9]. This is accomplished via application/vnd.oci.image.index.v1+json, aka indexes, which are a collection of manifest descriptors in a single document. The tag points at the index and each client picks the manifest that suits it from that collection. Initially this was designed to support "multiarch images" but it could also be used any time a registry might offer various representations of an artifact. The specification allows for an index to reference another index in this fashion[10].

If we want to describe a particular configuration to store in the registry, we might want to store both the JSON formatted artifact and the encoded string format as separate manifests rather than as separate layers within the same manifest. After pushing every manifest that the index references, we push the index itself. The request is similar to pushing a manifest but with a different Content-Type header:

PUT /v2/zootr/stable/manifests/OHBKB3ZQLY
Content-Type: application/vnd.oci.image.index.v1+json

And the body looks like:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "artifactType": "application/vnd.sudonters.zootr.stable.settings.v7.1",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "artifactType": "application/vnd.sudonters.zootr.stable.settings.v7.1+json",
      "size": 868,
      "digest": "sha256:35458012a23557956a0b149ea5f1833397b59a9230d3b1769c9341cbccf3cb23"
    },
    {
      "mediaType":"application/vnd.oci.image.manifest.v1+json",
      "artifactType": "application/vnd.sudonters.zootr.stable.settings.v7.1+text",
      "size": 563,
      "digest": "sha256:4e27081591af61545e0d94a8274c53feeb67d94397b6e4fbf4a9714bfa856dfd"
    }
  ]
}

Clients must request the index by setting the correct Accept header:

HEAD /v2/zootr/stable/manifests/OHBKB3ZQLY
Accept: application/vnd.oci.image.index.v1+json

If an index exists at the reference then the registry will produce it. Otherwise an error is returned, which indicates either that there is nothing at the tag at all or that content exists at the tag but in a different representation.
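
Once a client has the index, picking a representation is a matter of walking the manifest descriptors. Here is a sketch that selects by artifactType. The select_manifest helper is hypothetical, and the index is a trimmed copy of the settings index in which I have written the second entry's artifactType as the settings +text type described earlier.

```python
from typing import Optional

index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "artifactType": "application/vnd.sudonters.zootr.stable.settings.v7.1+json",
            "size": 868,
            "digest": "sha256:35458012a23557956a0b149ea5f1833397b59a9230d3b1769c9341cbccf3cb23",
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "artifactType": "application/vnd.sudonters.zootr.stable.settings.v7.1+text",
            "size": 563,
            "digest": "sha256:4e27081591af61545e0d94a8274c53feeb67d94397b6e4fbf4a9714bfa856dfd",
        },
    ],
}


def select_manifest(index: dict, artifact_type: str) -> Optional[dict]:
    """Return the first manifest descriptor with the wanted artifactType."""
    for descriptor in index.get("manifests", []):
        if descriptor.get("artifactType") == artifact_type:
            return descriptor
    return None


wanted = select_manifest(
    index, "application/vnd.sudonters.zootr.stable.settings.v7.1+json"
)
missing = select_manifest(
    index, "application/vnd.sudonters.zootr.stable.settings.v7.1+yaml"
)
print(wanted["digest"])  # the digest to request next from the manifests endpoint
print(missing)           # None: no such representation in this index
```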

Supplemental Manifests

The registry I was playing with -- registry:2 -- did not support subject and referrers, which were added in OCI Distribution 1.1 and are pretty new at time of writing.

A manifest may reference another manifest[11] via the subject field:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.sudonters.zootr.notes.v7.1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "digest": "sha256:7c6cfc2e9e41335e46687f337998b8dcd42f03d5ac0e538176dc9f4db44bfc2f",
      "size": 79872,
      "mediaType": "application/vnd.sudonters.zootr.notes.v7.1+md"
    },
    {
      "digest": "sha256:b1f02c4db88fb8c8ced3e7dcac7407c7f1635dd4094388a00bfdd613d433bf7a",
      "size": 51577285,
      "mediaType": "application/vnd.sudonters.zootr.notes.v7.1+jpg"
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "artifactType": "application/vnd.sudonters.zootr.stable.patch.v7.1",
    "size": 599,
    "digest": "sha256:06fce1ebf8c29ec26851e9f4fa35c0b9e1329a0ba245efc21f1c387366391c6b"
  }
}

Clients may then discover this manifest by asking the registry for manifests that reference sha256:06fce1ebf8c29ec26851e9f4fa35c0b9e1329a0ba245efc21f1c387366391c6b -- the patch manifest. A registry that implements the referrers API MUST NOT return a 404 to these queries[12]. When responding, a registry produces an image index that holds descriptors of the referencing manifests. Clients may additionally narrow the results to a single type by appending an artifactType=${MEDIATYPE} query parameter to the request. If a client was interested in only locating notes for a given patch:

GET /v2/zootr/stable/referrers/
    sha256:06fce1ebf8c29ec26851e9f4fa35c0b9e1329a0ba245efc21f1c387366391c6b
    ?artifactType=application/vnd.sudonters.zootr.notes.v7.1
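
The request above is wrapped across lines for readability; on the wire it is a single path with a query string. A sketch of building it and of filtering the response on the client side follows. referrers_path and filter_referrers are hypothetical helpers, and the client-side filter is there on the assumption that a registry may return unfiltered results.

```python
from urllib.parse import quote, urlencode


def referrers_path(repository: str, digest: str, artifact_type: str = "") -> str:
    path = f"/v2/{repository}/referrers/{quote(digest, safe=':')}"
    if artifact_type:
        # urlencode percent-encodes "/" and "+", so a "+json" style suffix
        # is not decoded as a space on the server side.
        path += "?" + urlencode({"artifactType": artifact_type})
    return path


def filter_referrers(index: dict, artifact_type: str) -> list:
    """Keep only the descriptors of the wanted type from a referrers response."""
    return [
        d for d in index.get("manifests", [])
        if d.get("artifactType") == artifact_type
    ]


PATCH = "sha256:06fce1ebf8c29ec26851e9f4fa35c0b9e1329a0ba245efc21f1c387366391c6b"
NOTES = "application/vnd.sudonters.zootr.notes.v7.1"

path = referrers_path("zootr/stable", PATCH, NOTES)
print(path)

# A made-up response body with placeholder digests, for illustration only.
response = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"artifactType": NOTES, "digest": "sha256:" + "a" * 64},
        {"artifactType": "application/vnd.example.other", "digest": "sha256:" + "b" * 64},
    ],
}
notes_only = filter_referrers(response, NOTES)
print(len(notes_only))
```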

The canonical examples for this mechanism are SBOM[13] manifests and cryptographic signatures[14], where these manifests aren't necessary to the operation of the artifact and in fact may be generated by third parties. These additional manifests may be required by tooling within our infrastructure, e.g. a Kubernetes admission controller that requires a signature from a particular private key before allowing an image to run in the cluster.


  1. Open Container Initiative 

  2. not that a function of content is any less arbitrary 

  3. aka "Content Type" aka "MIME Type" 

  4. Maybe these are the justified hierarchies Chomsky talks about 

  5. including application/vnd.oci.descriptor.v1+json which describes a descriptor 

  6. not a joke, this is the technical term for "eh it's just some bytes who cares" 

  7. for reasons such as legacy support 

  8. The OCI Image Configuration is specified here

  9. sha256:56887c5194fddd8db7e36ced1c16b3569d89f74c801dc8a5adbf48236fb34564 and sha256:cf3cc0848a5d6241b6218bdb51d42be7a9f9bd8c505f3abe1222b9c2ce2451ac at time of writing 

  10. I'm not 100% sure of the use of this but it is nice that it is an option 

  11. Unclear if indexes are allowed to participate in this relationship 

  12. A 404 means the registry does not implement the referrers API, for example because it predates Distribution 1.1 

  13. Software Bill of Materials, a document that details what software is included in an artifact 

  14. separate from digests: digests authenticate what the bytes are, signatures authenticate that we said it's okay for those bytes to be around