Common material and utilities for publishing E-ARK specifications sites and PDFs.

DILCISBoard/spec-publisher

E-ARK Specification Publication

This repository contains a few tools used for the publication of E-ARK specifications:

Overview

Pre-requisites

The tools here are usually invoked as part of a publication workflow, e.g. from the E-ARK CSIP project. There are a few prerequisites for running the tools:

These will often need to be deployed in a continuous integration environment. For the purposes of the documentation examples we use GitHub Actions workflows. The required tools are either already available there or are made available as needed.

Assumptions

There are a few assumptions regarding the publication workflow:

All of the above can be subverted if required, but that's the path of least resistance.

What it does

You can do the following:

  • generate tables of requirements, examples and appendices from a METS profile as GitHub flavoured Markdown;
  • generate the same as Markdown suitable for conversion to PDF using Pandoc;
  • use the templates and metadata to generate a GitHub pages site and PDF documents via the Pandoc Docker image.
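
To illustrate the first item above, here is a minimal sketch of rendering a requirements table as GitHub flavoured Markdown. It is not the project's own code: the `RequirementTable` and `Row` names and the four column headings are assumptions made for the example.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the spec-publisher project. */
final class RequirementTable {

    /** One table row; the field names are assumptions, not the project's model. */
    static final class Row {
        final String id;
        final String name;
        final String description;
        final String level;

        Row(String id, String name, String description, String level) {
            this.id = id;
            this.name = name;
            this.description = description;
            this.level = level;
        }
    }

    /** Escapes pipes and collapses whitespace so a value stays inside one cell. */
    static String cell(String value) {
        return value.replace("|", "\\|").replaceAll("\\s+", " ").trim();
    }

    /** Renders a header, a separator line and one Markdown table line per row. */
    static String toMarkdown(List<Row> rows) {
        StringBuilder md = new StringBuilder();
        md.append("| ID | Name | Description | Level |\n");
        md.append("| --- | --- | --- | --- |\n");
        for (Row row : rows) {
            md.append("| ").append(cell(row.id))
              .append(" | ").append(cell(row.name))
              .append(" | ").append(cell(row.description))
              .append(" | ").append(cell(row.level))
              .append(" |\n");
        }
        return md.toString();
    }
}
```

Escaping the pipe character matters because requirement descriptions are free text, and an unescaped `|` would split a cell in two.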

Typical Publication Process

  1. Preparation. Ensure that the specification is ready for publication and that details like the version number and publication date are as required.
  2. Generate the pages site Markdown from a METS profile. This is typically done via the spec-publisher Java project using the master branch, something like: java -jar target/mets-profile-proc.jar -o ../profile/E-ARK-CSIP.xml.
  3. Create the pages site Markdown. Usually by running a Pandoc script using the Docker image, something like: docker run --rm -v "$PWD:/source" -u "$(id -u):$(id -g)" --entrypoint /source/create-site.sh eark4all/spec-pdf-publisher.
  4. Generate the PDF Markdown from a METS profile. This is typically done via the spec-publisher Java project using the feat/pdf-publication branch, something like: java -jar target/mets-profile-proc.jar -o ../profile/E-ARK-CSIP.xml.
  5. Create the PDF. Usually by running a Pandoc script using the Docker image, something like: docker run --rm -v "$PWD:/source" -u "$(id -u):$(id -g)" --entrypoint /source/create-pdf.sh eark4all/spec-pdf-publisher.
  6. Use Jekyll to generate the website. This is usually done using the GitHub pages Docker box, e.g. docker run --rm -v "$PWD"/docs:/usr/src/app -v "$PWD"/_site:/_site -u "$(id -u):$(id -g)" starefossen/github-pages jekyll build -d /_site, which uses the ./docs directory as a source and generates a site in ./_site.
  7. Publish the generated site to GitHub.

Preparing a Specification for Publication

The specification publication process produces an E-ARK specification website and the PDF specification document. These can be generated from the following sources, or a combination of sources:

  • a METS profile XML document describing the specification and its requirements (these are extracted by the spec-publisher Java project described below);
  • Markdown files for text content, e.g. schema.md, requirements.md, examples.md or appendices.md;
  • HTML or LaTeX files for the same;
  • images to accompany the text content, which can be included in the text source using the appropriate Markdown or HTML syntax; and
  • metadata in a top-level YAML metadata file, e.g. metadata.yml.

Directory Structure

The directory structure for the specification is fairly arbitrary, and the files can be concatenated in any order to form the final specification document. That said, following a convention makes the process easier to manage and share. The following is a typical structure:

archived/
  - old versions of the specification PDF documents.
examples/
  - example information packages in archive format.
profile/
  - the METS profile XML documents for all versions of the specification.
schema/
  - any supporting XML schema documents, e.g. METS extensions, the METS Profile and METS schema documents, etc.
spec-publisher/
  - the spec-publisher Java project, if required.
specification/
  - the text and image files that comprise the specification source.

Metadata

The metadata for the specification is stored in a top-level YAML file within the specification directory, e.g. specification/metadata.md. Currently the supported fields are:

  • title: the title of the specification;
  • subtitle: the subtitle of the specification;
  • abstract: a short abstract of the specification;
  • version: the version number of the specification (usually templated, see below); and
  • date: the release date of the specification (usually templated, see below).

Here's the CSIP as an example:

---
title: E-ARK CSIP
subtitle: Common Specification for Information Packages
abstract: |
        This base profile describes the Common Specification for Information
        Packages (CSIP) and the implementation of METS for packaging OAIS
        rest of abstract here...
version: ${RELEASE_VERSION}
date: ${RELEASE_DATE}
---

The templated fields ${RELEASE_VERSION} and ${RELEASE_DATE} are replaced by the publication workflow, e.g. by the GitHub Actions workflow. The values are derived from the last git tag, or the current release tag.
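
The substitution itself is simple. The following is a minimal sketch of the idea in Java, not the workflow's actual mechanism, which may well use shell tooling instead. The `MetadataTemplate` class name is hypothetical, and the version and date values in the usage note are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the publication workflow. */
final class MetadataTemplate {

    // Matches placeholders such as ${RELEASE_VERSION} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Z_]+)\\}");

    /** Replaces each ${NAME} with values.get(NAME); unknown names are left as-is. */
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement stops '$' and '\' in the value being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, `MetadataTemplate.fill("version: ${RELEASE_VERSION}", Map.of("RELEASE_VERSION", "v2.1.0"))` returns `version: v2.1.0`. In a real workflow the map values would come from the git tag rather than being hard-coded.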

The spec-publisher Java Project

Build from source

This is a Java project built with Maven. You'll need a copy of this project sub-directory, either from a git clone, e.g. git clone https://github.com/DILCISBoard/E-ARK-CSIP.git, or from a source package download.

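Note that there are effectively two forks of this project: the master branch, used to generate the pages site Markdown, and the feat/pdf-publication branch, used to generate the PDF Markdown (see the Typical Publication Process above).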

From within this project sub-directory, e.g. mets-profile-processor, issue the Maven command mvn clean package to run the tests and build the JAR.

Class overview

It's just a basic SAX processor for the profile with some Markdown output.

eu.dilcis.csip.MetsProfileProcessor

Main entry point for the fat JAR package; sequences parsing of user input and running the SAX handler.

eu.dilcis.csip.ProcessorOptions

Parses the String args array and records the user options in a dedicated class.
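
As a rough illustration of this pattern, the sketch below records command-line arguments in a dedicated options object. It is not the real ProcessorOptions class. The `CliOptions` name is hypothetical, and the rule that dash-prefixed arguments are switches while everything else is a profile path is an assumption made for the example; this README does not document what `-o` means.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical options holder; not the project's actual ProcessorOptions. */
final class CliOptions {
    final Set<String> switches = new LinkedHashSet<>();
    final List<String> profilePaths = new ArrayList<>();

    /** Sorts arguments into switches and paths (an assumed convention). */
    static CliOptions parse(String[] args) {
        CliOptions options = new CliOptions();
        for (String arg : args) {
            if (arg.startsWith("-") && arg.length() > 1) {
                options.switches.add(arg);
            } else {
                options.profilePaths.add(arg);
            }
        }
        // Fail early, before any XML parsing starts, if there is nothing to process.
        if (options.profilePaths.isEmpty()) {
            throw new IllegalArgumentException("No METS profile path given.");
        }
        return options;
    }
}
```

Keeping the parsed options in their own class means the entry point can validate input once and hand a single object to the rest of the program.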

eu.dilcis.csip.profile.MetsProfileXmlHandler

SAX event-driven handler for the METS Profile; parses requirements lists from the profile XML document.
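
For readers unfamiliar with SAX, here is a minimal sketch of an event-driven handler that collects requirements. It is not the project's MetsProfileXmlHandler. The `RequirementCollector` name is hypothetical, and the element and attribute names used (`requirement`, `ID`, `head`) are assumptions about the profile layout; real profiles may use namespaces or prefixes that this sketch ignores.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects "ID: head text" per requirement; element names are assumptions. */
final class RequirementCollector extends DefaultHandler {
    final List<String> requirements = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start buffering afresh for each element
        if ("requirement".equals(qName)) {
            currentId = attrs.getValue("ID");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("head".equals(qName) && currentId != null) {
            requirements.add(currentId + ": " + text.toString().trim());
        } else if ("requirement".equals(qName)) {
            currentId = null; // heads outside a requirement are ignored
        }
        text.setLength(0);
    }

    /** Parses an XML string and returns the collected requirement summaries. */
    static List<String> parse(String xml) {
        RequirementCollector handler = new RequirementCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse profile XML", e);
        }
        return handler.requirements;
    }
}
```

Because SAX streams the document rather than building a tree, the handler has to track its own state, here the current requirement ID and a text buffer, between events.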

eu.dilcis.csip.OutputHandler

Buffers XML element text and handles output (for now).

ToDo ?

  • Stronger data typing for eu.dilcis.csip.profile.MetsProfileXmlHandler.Requirement
  • Requirement validation, e.g. non-empty fields etc.
  • Group think for other validation activities.
  • Markdown table generation
  • index.md file template selection
  • index.md file template substitution
  • Generalise vanilla METS Profile handling to base class
  • fix SaxExceptions from OutputHandler class
