
1. Adapting the Metadata

Sytze Van Herck edited this page Jan 10, 2023 · 16 revisions

The metadata determine the structure of your Linked Data file. You can change the outcome of CoW's conversion process by adapting the metadata.json file that the tool generates.

Contents

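This section discusses the following components of the JSON schema file:

  • Base URI
  • Prefixes
  • Data Types
  • Column Titles and Descriptions
  • Triples: Subject (aboutURL), Predicate (propertyURL), and Object (valueURL / CSVW:value)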

Each section includes an exercise for you to try and change a metadata file yourself.

Base URI

The base URI determines what your URIs start with. It is set near the top of the metadata.json file. In the example below, the base URI is set to "https://iisg.amsterdam/", so all URIs in the Linked Data file start with this URI.

{
 "@context": [
  "https://raw.githubusercontent.com/CLARIAH/COW/master/csvw.json",
  {
   "@language": "en",
   "@base": "https://iisg.amsterdam/"
  },

You can create your own base URI. For example:

{
 "@context": [
  "https://raw.githubusercontent.com/CLARIAH/COW/master/csvw.json",
  {
   "@language": "en",
   "@base": "http://raw-data-now.org/trial-1/"
  },

Changing the base URI is a great idea when you are trying things out. You can easily distinguish different versions by adding trial-1, trial-2, etcetera.
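The effect of the base URI can be sketched in a few lines of Python. This is a minimal illustration that uses generic URI resolution from the standard library; it is an assumption that CoW resolves relative paths in the same way, and the helper name absolute_uri is made up for this example.

```python
from urllib.parse import urljoin


def absolute_uri(base: str, relative: str) -> str:
    """Resolve a relative path against a base URI (generic URI resolution)."""
    return urljoin(base, relative)


default_base = "https://iisg.amsterdam/"
trial_base = "http://raw-data-now.org/trial-1/"

# The same relative path ends up under a different base URI.
default_uri = absolute_uri(default_base, "0")
trial_uri = absolute_uri(trial_base, "0")

print(default_uri)
print(trial_uri)
```

With standard URI resolution, a base without the trailing slash ("http://raw-data-now.org/trial-1") would lose its last path segment, so it is safest to end the base URI with a slash.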

Exercise

Try changing the base URI yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

  2. Open and edit the metadata file. Add "trial-1/" to the base URI on line 6. Make sure to save the changes.

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. When you open this file and search for "/trial-1/" you should have 76 matches.


Prefixes

Prefixes abbreviate URIs to save you the trouble of typing full URIs. The @context part of the JSON schema also contains prefixes, as illustrated below.

{
 "@context": [
  "https://raw.githubusercontent.com/CLARIAH/COW/master/csvw.json",
  {
   "@language": "en",
   "@base": "https://iisg.amsterdam/"
  },
  {
   "aat": "http://vocab.getty.edu/aat/",
   "bibo": "http://purl.org/ontology/bibo/",
   ...
   "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
 ],
} 

A number of prefixes are provided when building the JSON schema. In addition to the provided prefixes, you can create and add your own. In the example below we want to refer to the URI: "https://prefixes.causelesstypos.com/". Let's call the prefix "typos" and add it to the list of prefixes.

{
 "@context": [
  "https://raw.githubusercontent.com/CLARIAH/COW/master/csvw.json",
  {
   "@language": "en",
   "@base": "https://iisg.amsterdam/"
  },
  {
   ...
   "xsd": "http://www.w3.org/2001/XMLSchema#",
   "typos": "https://prefixes.causelesstypos.com/"
  }
 ],
}
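How a prefix stands in for the start of a URI can be sketched as follows. This is an illustrative helper, not CoW's own code; the function name expand_prefixed_name and the local name "example" are made up for this sketch.

```python
def expand_prefixed_name(name: str, prefixes: dict) -> str:
    """Expand 'prefix:local' into a full URI; return the input if the prefix is unknown."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


# Two entries from the prefix list shown above.
prefixes = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "typos": "https://prefixes.causelesstypos.com/",
}

print(expand_prefixed_name("xsd:float", prefixes))
print(expand_prefixed_name("typos:example", prefixes))  # "example" is a hypothetical local name
```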

Exercise

Try changing the prefixes yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

  2. Open and edit the metadata file. After line 46 add the following line:

"lic": "http://opendefinition.org/licenses/"

If line 46 is the last entry in the prefix list, also add a comma at the end of line 46 so that the JSON stays valid. Then change the license id to "lic:cc-by/". Make sure to save the changes.

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. When you open this file, you should find "http://opendefinition.org/licenses/cc-by/".


Data Types

When transforming data into Linked Data, it is important to define the data type of each column. Proper data type definitions result in more flexibility when you query the data later on.

Data types are defined by XML Schema, indicated by the xsd: prefix. Common data types are:

  • xsd:string for text
  • xsd:int for whole numbers that fit in 32 bits (-2147483648 to 2147483647)
  • xsd:integer for any whole number
  • xsd:float for numbers with decimals
  • xsd:date for complete dates (YYYY-MM-DD)
  • xsd:gYear for years

By default CoW adds the xsd: prefix to any data type. CoW also assigns the data type string to all columns. Below we change the data type of the number of maids by neighbourhood ('Dienstboden') to float:

   {
    "name": "Dienstboden",
    "datatype": "float",
    "@id": "https://iisg.amsterdam/buurt.csv/column/Dienstboden"
   }
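The difference a data type makes in the output can be sketched as follows. This is a simplified illustration of how a short data type name such as "float" ends up as a full XML Schema URI on the literal; the helper typed_literal is hypothetical and not part of CoW.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value: str, datatype: str = "string") -> str:
    """Render a value as an N-Quads style literal with an xsd datatype URI."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


as_string = typed_literal("1,5")          # default data type: string
as_float = typed_literal("1,5", "float")  # after changing "datatype" to "float"

print(as_string)
print(as_float)
```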

Exercise

Try changing the data types yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

  2. Open and edit the metadata file. Change the datatype of the column "Dienstboden" from "string" to "float". Make sure to save the changes.

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. When you open this file the second line should contain "1,5"^^<http://www.w3.org/2001/XMLSchema#float> where #float refers to the correct data type.


Column Titles and Descriptions

The example file used in the exercises contains the following table:

properties_name_in_uri  Dienstboden
buurt-a                 1,5
buurt-b                 2,32
buurt-c                 1,96
buurt-d                 1,37

This table has two columns and four rows. The first column is called "properties_name_in_uri" and the second column is called "Dienstboden". The data are about neighbourhoods (buurt) and the number of maids (Dienstboden) living there.

The JSON schema file represents a column as follows:

   {
    "name": "properties_name_in_uri",
    "datatype": "string",
    "@id": "https://iisg.amsterdam/buurt.csv/column/properties_name_in_uri",
    "dc:description": "properties_name_in_uri",
    "titles": [
    "properties_name_in_uri"
    ]
   },

The column "properties_name_in_uri" is referred to by the URI "https://iisg.amsterdam/buurt.csv/column/properties_name_in_uri" (@id). Since the values of this column are text, the data type is "string". The description, name, and title of the column are all "properties_name_in_uri". The name refers to the name of the column in the original CSV file.

The description and title can be more specific than the column name. In this way the Linked Data carry documentation about the data along with the data themselves.
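Editing the description and title can also be scripted. The sketch below works on an in-memory copy of the column definition shown above; the helper describe_column is hypothetical, and reading and writing the actual metadata file is left out.

```python
import json

# A cut-down copy of the generated metadata, with one column only.
metadata = {
    "tableSchema": {
        "columns": [
            {
                "name": "properties_name_in_uri",
                "datatype": "string",
                "dc:description": "properties_name_in_uri",
                "titles": ["properties_name_in_uri"],
            }
        ]
    }
}


def describe_column(metadata: dict, name: str, description: str, title: str) -> None:
    """Set a clearer description and title on the column with the given name."""
    for column in metadata["tableSchema"]["columns"]:
        if column["name"] == name:
            column["dc:description"] = description
            column["titles"] = [title]
            return
    raise KeyError(f"no column named {name!r}")


describe_column(
    metadata,
    "properties_name_in_uri",
    "Name of neighbourhood as described in the dataset",
    "Property name of neighbourhood in the URI",
)
print(json.dumps(metadata, indent=1))
```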

Exercise

Try changing the column metadata yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

  2. Open and edit the metadata file.

Improve the description and title of the first column as follows:

   {
    "name": "properties_name_in_uri",
    "datatype": "string",
    "@id": "https://iisg.amsterdam/buurt.csv/column/properties_name_in_uri",
    "dc:description": "Name of neighbourhood as described in the dataset",
    "titles": ["Property name of neighbourhood in the URI"]
   },

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. The triple on line 47 should now contain <http://purl.org/dc/terms/description> "Name of neighbourhood as described in the dataset"@en.


Triples

Linked Data consists of triples. A triple contains a subject, a predicate, and an object (see also introduction). The elements of a triple are defined in the metadata JSON schema by the aboutUrl, the propertyUrl, and the valueUrl or csvw:value respectively.

Another notation for triples is ?s ?p ?o. In SPARQL queries ?s stands for subject, ?p for predicate, and ?o for object. For clarity, we've added this notation to the examples. The notation will not show in your example files.

Subject (aboutURL)

The default JSON schema from our example table describes the column "Dienstboden" (maids) as:

   {
        "@id": "https://iisg.amsterdam/buurt.csv/column/Dienstboden", 
        "datatype": "string", 
        "dc:description": "Dienstboden", 
        "name": "Dienstboden", 
        "titles": [
          "Dienstboden"
        ]
      }, 

The results of the JSON schema are triples like:

<https://iisg.amsterdam/26> <https://iisg.amsterdam/vocab/Dienstboden> "1,31"^^<http://www.w3.org/2001/XMLSchema#string> 
?s                          ?p                                         ?o

The subject (?s) consists of the base URI (https://iisg.amsterdam/) and the row number. The first row of the CSV file is numbered 0. By default, the row number is appended to the base URI by this part of the JSON schema:

"tableSchema": {
    "aboutUrl": "{_row}", 
    "columns": [

The subject URI consists of:

  • the base URI: defined with "@base": "https://iisg.amsterdam/",
  • and the row number: defined with "aboutUrl": "{_row}", under "tableSchema"

The code to add the row number to the aboutURL {_row} builds on Jinja. The {} indicate that CoW needs to execute Jinja. The Jinja code _row adds the row number to the aboutUrl.

<https://iisg.amsterdam/0> <https://iisg.amsterdam/vocab/properties_name_in_uri> "buurt-a"^^<http://www.w3.org/2001/XMLSchema#string>
 ?s                        ?p                                                    ?o
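How the base URI and the aboutUrl combine into a subject can be sketched in Python. This sketch uses Python's str.format as a stand-in for the template step and generic URI resolution for the base; both are assumptions for illustration, not CoW's actual implementation, and subject_uri is a hypothetical helper.

```python
from urllib.parse import urljoin


def subject_uri(base: str, about_url: str, row_number: int, row: dict) -> str:
    """Fill the aboutUrl template and resolve it against the base URI."""
    # {_row} is replaced by the row number, other {names} by column values.
    path = about_url.format(_row=row_number, **row)
    return urljoin(base, path)


base = "https://iisg.amsterdam/"
first_row = {"properties_name_in_uri": "buurt-a", "Dienstboden": "1,5"}

by_row = subject_uri(base, "{_row}", 0, first_row)
by_project = subject_uri(base, "buurt.csv/{_row}", 0, first_row)
by_value = subject_uri(base, "buurt.csv/{properties_name_in_uri}", 0, first_row)

print(by_row)
print(by_project)
print(by_value)
```

The three calls correspond to the default aboutUrl, an aboutUrl extended with a project name, and an aboutUrl built from a unique column value.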

Important note: remember to change the overall aboutUrl in the tableSchema (see the exercise below). At the very least, extend the URI with something unique, such as your project name.

Other unique values in the .csv files themselves (e.g. an id) can also be added to the aboutURL. The advantage of using values from the .csv files is the addition of meaning to the URI. The disadvantage of this approach is the loss of information on the row the information was taken from.

Choose between a substantive URI and a provenance-related URI based on the project. For a substantive URI, add unique values from the .csv file to the aboutUrl. For a provenance-related URI, keep the row number and extend the aboutUrl with something unique.

Exercise

Try changing the aboutURL or subject yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

  2. Open and edit the metadata file. First, add your project name to the aboutUrl to ensure unique subject URIs:

"tableSchema": {
    "aboutUrl": "buurt.csv/{_row}",

Second, remove the primaryKey and add the unique names of neighbourhoods to the aboutUrl:

"tableSchema": {
    "aboutUrl": "buurt.csv/{properties_name_in_uri}", 
    "columns": [

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. When you open this file, the subject of a triple should have changed to:

<https://iisg.amsterdam/buurt.csv/buurt-a>
?s


Predicate (propertyURL)

The predicate is the second element of a triple. A predicate can add information about the subject. To establish that the neighbourhoods ("Buurt") in the example reflect geographical areas, the JSON schema is adapted as follows:

   {
    "name": "properties_name_in_uri",
    "datatype": "string",
    "dc:description": "Name of neighbourhood as described in the dataset",
    "titles": ["Property name of neighbourhood in the URI"],
    "propertyUrl": "rdf:type",
    "valueUrl": "sdmx-dimension:refArea",
    "@id": "https://iisg.amsterdam/buurt.csv/column/properties_name_in_uri"
   },

The results of the JSON schema are triples like:

<https://iisg.amsterdam/buurt.csv/buurt-a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/sdmx/2009/dimension#refArea>
?s                                         ?p                                                ?o

Note: Check whether the vocabulary is added to the list of prefixes. Otherwise, add "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#", to the list.

The predicate (?p) is determined by the propertyUrl and refers to an existing vocabulary (RDF). Using existing vocabularies enhances the exchange of data.

Note: The example above does not specify a subject. Instead, the subject (?s) is determined by the global aboutUrl set in the tableSchema. To specify a different subject, create a virtual column.

When no suitable predicate exists in existing vocabularies, you can create your own. Defining "propertyUrl": "vocab/averageNrMaids" results in triples like:

<https://iisg.amsterdam/buurt.csv/buurt-a> <https://iisg.amsterdam/vocab/averageNrMaids> "1,5"^^<http://www.w3.org/2001/XMLSchema#string>
?s                                         ?p                                            ?o

The predicate (?p) consists of:

  • the base URI: defined with "@base": "https://iisg.amsterdam/",
  • and the propertyUrl: defined with "propertyUrl": "vocab/averageNrMaids"

The word "vocab" clarifies that "averageNrMaids" is a term from our own vocabulary. For a dataset-specific vocabulary, precede the propertyUrl with the name of your dataset, for example "buurt.csv/vocab/averageNrMaids".
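The two ways of writing a propertyUrl, as a prefixed name from an existing vocabulary or as a path relative to the base URI, can be sketched as follows. The helper resolve_property is hypothetical and only approximates the behaviour described above; it is not taken from CoW.

```python
from urllib.parse import urljoin


def resolve_property(property_url: str, base: str, prefixes: dict) -> str:
    """Expand a known prefix, otherwise resolve the value against the base URI."""
    prefix, sep, local = property_url.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return urljoin(base, property_url)


base = "https://iisg.amsterdam/"
prefixes = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

existing = resolve_property("rdf:type", base, prefixes)
own = resolve_property("vocab/averageNrMaids", base, prefixes)
per_dataset = resolve_property("buurt.csv/vocab/averageNrMaids", base, prefixes)

print(existing)
print(own)
print(per_dataset)
```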

Exercise

Try changing the propertyURL or predicate yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

Note: This exercise builds upon the previous exercise (see subject exercise).

  2. Open and edit the metadata file. First, add the propertyUrl and valueUrl to specify neighbourhoods as geographical areas:

   {
    "name": "properties_name_in_uri",
    ...,
    "propertyUrl": "rdf:type",
    "valueUrl": "sdmx-dimension:refArea",
    "@id": "https://iisg.amsterdam/buurt.csv/column/properties_name_in_uri"
   },

Second, create your own vocabulary for the "Dienstboden" (maids) column:

   {
    "name": "Dienstboden",
    "datatype": "string",
    "dc:description": "Dienstboden, presumably average number per household",
    "titles": ["Dienstboden"],
    "propertyUrl": "vocab/averageNrMaids",
    "@id": "https://iisg.amsterdam/buurt.csv/column/Dienstboden"
   }

Note: remember to change the data type to float as well (see data types exercise).

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. The triples on lines 2, 5, 7 and 8 should now contain <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/sdmx/2009/dimension#refArea>.


Object (valueURL / CSVW:value)

The object is the third element of a triple. An object can be a URI (valueUrl) or a value (csvw:value). When changing the predicate of the neighbourhoods ("Buurt") to reflect geographical areas, the object was set to a URI with "valueUrl": "sdmx-dimension:refArea".

The next example specifies the average number of maids as a value (csvw:value). Following Dutch convention, the example data use a comma as the decimal separator, which is not a valid notation for xsd:float. To change the decimal separator, the JSON schema is adapted as follows:

   {
    "name": "Dienstboden",
    "datatype": "float",
    "dc:description": "Dienstboden",
    "titles": ["Dienstboden"],
    "propertyUrl": "vocab/averageNrMaids",
    "csvw:value": "{{Dienstboden|replace(',', '.')}}",
    "@id": "https://iisg.amsterdam/buurt.csv/column/Dienstboden"
   }

The results of the JSON schema are triples like:

<https://iisg.amsterdam/buurt.csv/buurt-a> <https://iisg.amsterdam/vocab/averageNrMaids> "1.5"^^<http://www.w3.org/2001/XMLSchema#float>
?s                                         ?p                                            ?o

The object (?o) is determined by csvw:value and is a literal value. Its data type (xsd:float) is taken from the datatype of the column.

The code to replace the decimal separator, {{Dienstboden|replace(',', '.')}}, builds on Jinja. The double braces {{ }} indicate that CoW needs to evaluate a Jinja expression. The Jinja filter replace(',', '.') replaces the decimal separator , with . in the values of the column Dienstboden.

Remember this important distinction: to create a URI use valueUrl, to create a value such as a number or a string use csvw:value.
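The effect of the replace filter on the example values can be sketched in plain Python. This sketch mirrors the filter with str.replace rather than running Jinja, and the helper to_float_literal is made up for illustration.

```python
def to_float_literal(raw: str) -> str:
    """Swap the decimal comma for a point and render an xsd:float literal."""
    normalised = raw.replace(",", ".")
    # float() raises ValueError if the value is still not a valid number.
    float(normalised)
    return f'"{normalised}"^^<http://www.w3.org/2001/XMLSchema#float>'


# The four values from the example table.
literals = [to_float_literal(raw) for raw in ["1,5", "2,32", "1,96", "1,37"]]
for literal in literals:
    print(literal)
```

Without the replacement, a value such as "1,5" cannot be parsed as a number, which is why the data type float and the csvw:value belong together.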

Exercise

Try changing the object yourself. Download the example csv file. Copy the file path, and switch to your terminal. Move to the folder where you saved the buurt.csv file. Next, follow these steps:

  1. Generate the metadata file for the example csv file:

cow_tool build buurt.csv

The tool generates a -metadata.json file in the folder where you saved the example file.

  2. Open and edit the metadata file. Replace the decimal separator of the "Dienstboden" (maids) column when adding the csvw:value:

   {
    "name": "Dienstboden",
    ...,
    "propertyUrl": "vocab/averageNrMaids",
    "csvw:value": "{{Dienstboden|replace(',', '.')}}",
    "@id": "https://iisg.amsterdam/buurt.csv/column/Dienstboden"
   }

Note: remember to change the data type to float as well (see data types exercise).

  3. Create the Linked Data file with the following command:

cow_tool convert buurt.csv

The end result is an .nq file. The triples on lines 1, 4, 6 and 7 should now contain their values with a decimal point, for example "1.5"^^<http://www.w3.org/2001/XMLSchema#float>.

Note: with valueURL you can change an object to a URI (see predicate exercise).


Next: 2. Enriching the Metadata