Dereferenceable identifiers [RDID] #53

jpullmann · 2018-01-18T21:11:41Z

Dereferenceable identifiers [RDID]

Encode identifiers as dereferenceable HTTP URIs

Related use cases: Modeling identifiers and making them actionable [ID11]

makxdekkers · 2018-01-19T10:10:55Z

I agree but I don't think there is a need to make any changes in the DCAT specification. It's an automatic consequence of DCAT being an RDF vocabulary.

dr-shorthair · 2018-01-19T13:02:14Z

Yes - that was my impression too. No changes necessary to satisfy this requirement.

andrea-perego · 2018-01-19T18:35:53Z

This issue is strictly related to providing guidance on how to use DCAT to specify identifiers such as DOIs, ISBNs, etc. - see the related use case.
Currently, these IDs are encoded as simple strings, unless they are used as part of the primary resource URI. An option could be to encourage the use of owl:sameAs whenever the ID is resolvable once encoded as a URI (as for DOIs).
So, there may be no need to create new properties or classes, but rather to describe how to use the existing ones to address these use cases.

makxdekkers · 2018-01-19T18:57:00Z

It could be part of DCAT guidance, maybe in the usage note of https://www.w3.org/TR/vocab-dcat/#Property:dataset_identifier? In fact, the current usage note only suggests that the "identifier might be used as part of the URI of the dataset" but it would be good to mention other identifiers in the usage note as well.

andrea-perego · 2018-01-19T22:42:27Z

> It could be part of DCAT guidance, maybe in the usage note of https://www.w3.org/TR/vocab-dcat/#Property:dataset_identifier? In fact, the current usage note only suggests that the "identifier might be used as part of the URI of the dataset" but it would be good to mention other identifiers in the usage note as well.

+1 from me. One of the options mentioned in the related use case is to use dct:identifier with a datatype denoting the identifier type (DOI, etc.). But these datatypes need to be defined. There's of course also the other option of using specific properties for each type of identifier (prism:doi, bibo:doi, etc.).
But for specifying multiple identifiers as HTTP URIs we need a property such as owl:sameAs, which needs to be added to the DCAT spec.

agbeltran · 2018-01-24T17:24:39Z

Should we add the 'documentation' tag for this requirement then?

kcoyle · 2018-01-24T20:40:10Z

The library world has struggled with this same problem. There are many identifiers that are not (yet) expressed as IRIs. As these are just alpha-numeric strings, there is a need to give a context so that they are meaningful/useful. This has led to some awkward models of identifiers being at least 2-part: the identifier string, and the "provenance" of the identifier. So although one should prefer IRI forms when available, what should be done with a string like "098378297" when it is the identifier from some agency? That's the hard part.

agbeltran · 2018-01-31T20:56:26Z

@kcoyle - could you provide a pointer to a catalogue from the library world with that situation? In those examples, is it not possible to get a description of the resource being identified at all?

For the case of life science data, which I imagine would also be applicable to other scientific domains, our paper "Identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data" (https://doi.org/10.1371/journal.pbio.2001414) presents the situation and could be a useful reference.

kcoyle · 2018-02-01T03:52:46Z

Here are some (there are about a dozen common ones) (type followed by example):

US Copyright Office document ID: PA 1-060-815
ISBN: 9780060723804 (also ISSN for serials is similar)
Patent document number (requires country code): 67-SC41534
Standard technical report number: METPRO/CB/TR--74/216+PR.ENVR.WI

I'm not sure what you mean about getting a description of the resource - the bibliographic record is a description of the resource; what is problematic is that not all identifiers have a URI form. These are not "web-based identifiers" and I suspect that other data providers also have older identifiers that are not (yet) web-based. These are in library data. In addition, the identifiers for the records in a data file in library data are simple strings, like "##2001627090" (the octothorpes represent blanks). These are exported as URIs when the data is converted to RDF ("https://lccn.loc.gov/2015020100") but the majority of the data is still not in RDF.

The concept of a "data catalog" is not applied to these files of records although there are sites that provide files of the records for download. This may or may not fit into the context of DCAT.

makxdekkers · 2018-02-07T10:28:39Z

The European DCAT-AP includes a property adms:identifier with range adms:Identifier for Dataset. adms:Identifier is based on the UN/CEFACT Identifier class and consists of:

a content string which is the identifier;
an optional identifier for the identifier scheme;
an optional identifier for the version of the identifier scheme;
an optional identifier for the agency that manages the identifier scheme.
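
As a rough illustration of the four-part structure listed above, the following Python sketch models such an identifier record. The class and field names (`Identifier`, `content`, `scheme_id`, `scheme_version`, `scheme_agency`) are hypothetical names chosen for this example, not terms defined by ADMS or UN/CEFACT, and the `"ISBN"` scheme label is likewise only a placeholder. The sample values are taken from earlier comments in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical four-part identifier record mirroring the list above."""
    content: str                          # the identifier string itself (required)
    scheme_id: Optional[str] = None       # identifier of the identifier scheme
    scheme_version: Optional[str] = None  # version of the identifier scheme
    scheme_agency: Optional[str] = None   # agency that manages the scheme

    def has_context(self) -> bool:
        """True when at least one optional part gives the bare string a context."""
        return any((self.scheme_id, self.scheme_version, self.scheme_agency))


# A bare string with no context, and the same kind of record with a scheme label.
bare = Identifier("098378297")
isbn = Identifier("9780060723804", scheme_id="ISBN")
```

Keeping the three contextual parts optional reflects the list above: only the content string is mandatory, so a record like `bare` is still valid but carries no hint of where the string comes from.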

nicholascar · 2018-02-07T21:41:09Z

Whatever mechanism we decide on for handling identifiers, it should be comprehensive enough to be used in DCAT2 and also elsewhere, since we have the requirement of referring to alternate identifiers (non-HTTP URIs) for things like physical samples in SOSA ontology catalogues.

dr-shorthair · 2018-02-08T00:10:01Z

Discussed at some length in meeting https://www.w3.org/2018/02/07-dxwgdcat-minutes

AndreaPerego: main issue is that a number of other identifier systems are used for data citation, publishers, etc
... e.g. DataCite supports quite a few identifier systems
... DCAT-AP also discussed this at length
... agencies want to use their internal identifiers, not necessarily URIs
... may connect datasets using SPARQL queries, etc not just URIs
... what is needed by different communities is ability to specify different kinds of identifiers, and their type
... need to indicate that a string is an identifier
... whenever possible make identifiers resolvable by encoding them as URIs, but this does not apply to all identifier systems
... situation is quite complicated
... there are some other URI systems, but not necessarily resolvable
... case sensitivity is also an issue
... proposal made in UC is to try to address both issues
... 1. encode as http URIs where possible
... 2. encode as a string using dct:identifier property, and note the type of the identifier using ^^type indicator
... UC is about providing guidance where standard RDF http URI does not apply
... also for how to use SPARQL queries, for example
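
The two-step proposal recorded above (use an HTTP URI where possible, otherwise keep a typed string) could be sketched roughly as follows. This is only an illustration: the function name `encode_identifier`, the `HTTP_RESOLVERS` table and the `ex:` datatype prefix are assumptions made for this example, and only the DOI resolver (https://doi.org/) is listed.

```python
from typing import Tuple

# Assumption: a tiny lookup of schemes that have a public HTTP resolver.
# Only the DOI resolver is listed here; extend or replace as needed.
HTTP_RESOLVERS = {"doi": "https://doi.org/"}


def encode_identifier(value: str, id_type: str) -> Tuple[str, str]:
    """Return ("uri", http_uri) when the scheme is resolvable over HTTP,
    otherwise ("literal", typed_literal) in an illustrative ^^type notation."""
    base = HTTP_RESOLVERS.get(id_type.lower())
    if base is not None:
        return ("uri", base + value)
    # Fallback: keep the string and record its type. The "ex:" prefix is a
    # placeholder, since datatypes for identifier types are not defined yet.
    return ("literal", f'"{value}"^^ex:{id_type.lower()}')


print(encode_identifier("10.1109/5.771073", "DOI"))
print(encode_identifier("9780060723804", "ISBN"))
```

The first call yields an `https://doi.org/` URI, while the ISBN falls through to the typed-string branch because no resolver is registered for it in this sketch.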

NicholasCar: We had the same issue with physical samples.
... recommendation there is to supplement identifier field with identifier-type
... need a comprehensive schema for alternative identifiers

SimonCox: Makx suggested looking at ADMS.
... https://www.w3.org/TR/vocab-adms/#identifier

It would be beneficial to have a comprehensive handling of identifiers, inc. type and other properties, as we need to use them in many situations, not just DCAT

SimonCox: There's an adms:Identifier class there.

Yes, ADMS seems to mostly do it!

SimonCox: It is based on UN/CEFACT. So this appears to fulfil the proposal you made, AndreaPerego.
... Adopt or clone adms:Identifier

AndreaPerego: alternative proposals: PRISM, BIBO,
... specific fields for well-known identifier schemes, e.g. bibo:DOI
... these are already used by some important services, e.g. crossRef
... need to explain how these different approaches map to each other

dr-shorthair · 2018-02-15T00:37:23Z

Proposal: promote adms:Identifier to DCAT

fellahst · 2018-02-15T22:21:59Z

I suggest we clone ADMS identifier or use an ontology addressing only identifiers such as http://data.press.net/ontology/identifier/ or http://ows.usersmarts.com/owldocgen/owldoc?url=http://www.opengis.net/ont/common/identifier# . The reason to use a micro ontology for just identifiers is that it can be reused for many other purposes. It would be nice to submit this small ontology for standardization by W3C.

nicholascar · 2018-02-16T06:27:28Z

I like! I think the first micro-ontology needs a few more things though: notes on identifier formats, whether they are structured or opaque strings, etc. These could be optional.

philarcher · 2018-02-16T08:36:00Z

On this topic, one of my priorities at GS1 now is to make barcodes dereferenceable. In more formal terms, we're defining how GTINs (the numbers you see beneath a barcode) and our other less well-known identifiers, can be encoded in HTTP URIs. I mention it here because there is a close relationship between our GTINs and ISBN and ISSN (ISBNs all begin 978 or 979 but are part of the EAN/UPC/GTIN world). Therefore, if this WG has use cases for dereferenceable ISBNs, I'd be pleased to know, especially if you have any idea where they should dereference to!

makxdekkers · 2018-02-16T08:53:27Z

In fact, you could call the definition of adms:Identifier a micro-ontology: it defines a class and a set of properties to describe it, plus a note that "it may also be useful to provide further properties".

larsgsvensson · 2018-02-20T21:29:01Z

> Therefore, if this WG has use cases for dereferenceable ISBNs, I'd be pleased to know, especially if you have any idea where they should dereference to!

I don't have a use case, but my first reaction would be that ISBNs should dereference to the national library for the jurisdiction where the publication was published (or where the publisher is located). But maybe I'm biased since I work in a national library...

agbeltran · 2018-02-20T21:29:15Z

Also relevant to this discussion is the schema.org discussion on identifiers and the sdo:identifier term.

smrgeoinfo · 2018-11-14T23:57:58Z

After a review of the discussion, it looks like there are two proposals:
ADMS kind of approach-- identifiers have a datatype like skos:notation, i.e. a typed literal, and the datatype of the typed literal is the identifier type, e.g.
dcat:identifier "978-3-16-148410-0"^^<https://www.iso.org/standard/36563.html>
It's not clear to me how ADMS would serialize the other properties (version and managing authority).

schema.org, ISO19115, DATS approach-- make identifier an object/class with a code property (the identifier string), a scheme property, maybe an authority property.

Personally I think the second approach is more transparent and widely used.
Schema.org implements the identifier as a PropertyValue, which obfuscates things;
DATS uses 'identifier' and 'identifierSource' as the property names;
ISO19115-1 uses 'code', 'codespace', and 'version', with a citation for the 'authority'
DataCite has 'identifier' and 'identifierType'

proposal:
class: dcat:identifier
Properties:
dcat:code -- the identifier string; for a well formed URI this would be all that's necessary
dcat:identifierType -- literal or URI
dcat:version -- literal
authority -- foaf:Organization

makxdekkers · 2018-11-15T10:27:25Z

@smrgeoinfo ADMS also makes the identifier a class, namely adms:Identifier.
The spec at https://www.w3.org/TR/vocab-adms/#identifier indeed does not provide a full recommendation on how to express the other properties of the Identifier, but I would suggest:

the identifier string in skos:notation
the identifier scheme in skos:inScheme
the version in owl:versionInfo
the agency in dct:creator or dct:publisher

I would not be in favour of defining a dcat:Identifier class alongside the adms:Identifier class that basically does the same thing.
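
To make the property mapping suggested in this comment more concrete, here is a minimal Python sketch that renders one adms:Identifier as a Turtle blank node. It is not part of ADMS or DCAT: the function name `identifier_to_turtle` is hypothetical, the `https://example.org/...` IRI is a placeholder, prefix declarations are omitted, and no escaping is applied, so the inputs are assumed to contain no quotes or backslashes.

```python
def identifier_to_turtle(notation, scheme=None, version=None, agency=None):
    """Render one adms:Identifier as a Turtle blank node.

    Uses the properties suggested above: skos:notation, skos:inScheme,
    owl:versionInfo and dct:creator. `scheme` and `agency` are expected
    to be IRIs; `notation` and `version` are emitted as plain literals.
    """
    lines = ["[ a adms:Identifier"]
    lines.append(f'  skos:notation "{notation}"')
    if scheme:
        lines.append(f"  skos:inScheme <{scheme}>")
    if version:
        lines.append(f'  owl:versionInfo "{version}"')
    if agency:
        lines.append(f"  dct:creator <{agency}>")
    # Predicate-object pairs in a blank node are separated by " ;".
    return " ;\n".join(lines) + " ]"


# Placeholder scheme IRI, for illustration only.
print(identifier_to_turtle("10.1109/5.771073",
                           scheme="https://example.org/scheme/doi"))
```

The resulting blank node would be the object of an adms:identifier statement on the dataset; optional parts that are not supplied are simply left out.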

riccardoAlbertoni · 2018-11-15T13:30:27Z

adms:identifier is already adopted in some DCAT application profiles, so I second the idea of using it rather than introducing new terms, at least as a first attempt.

As part of action 259, which was assigned to me in last week's DCAT call, I have drafted the following wiki page, DCAT-Identifiers.

On that page, I have tried to set up a proposal based on existing adms:identifier examples.

The page is still in progress, and I certainly need to update it with @makxdekkers' latest suggestions. Though it is not yet complete and corrections might be needed, I guess it can help the discussion.

smrgeoinfo · 2018-11-15T16:14:11Z

@riccardoAlbertoni thanks, that wiki page is helpful. A couple comments:
in the Representing HTTP dereferenceable secondary identifier section, there seems to be an assumption that the ^^xsd:anyURI type implies that the literal is an HTTP URI, but the data type allows any valid RFC-3986 URI (e.g. urn:), and these might not be dereferenceable.

Also, in the example, with a doi:

 skos:notation  "10.1109/5.771073"^^dcat:doi  ;
 adms:schemeAgency "International DOI Foundation" .

I would suggest that the issuing authority of interest should be the registrant for the 10.1109 DOI space, "IEEE Xplore Digital Library"; perhaps this should be added as a dct:creator. There are two concerns-- the authority that defined the identifier scheme (DOI foundation), and the authority responsible for assigning and maintaining identifiers using that scheme (IEEE).

@makxdekkers I got the impression from the adms doco that the identifier scheme is encoded as the data type in the skos:notation typed literal, so using skos:inScheme would be redundant, and I think it's also not consistent with the intention of skos:inScheme.

riccardoAlbertoni · 2018-11-16T18:06:04Z

> in the Representing HTTP dereferenceable secondary identifier section, there seems to be an assumption that the ^^xsd:anyURI type implies that the literal is an HTTP URI, but the data type allows any valid RFC-3986 URI (e.g. urn:), and these might not be dereferenceable.

I see your point @smrgeoinfo, the title is slightly misleading.
I suspect that the only way to know whether a URI is HTTP dereferenceable is to try to resolve it, as it can be broken.

As far as I understand, indicating a URN is useful as well. Regardless of their dereferenceability, secondary IDs are given to say that others might refer to the same dataset with different IDs; they are useful to manage/group duplicates. So I have made the distinction between dereferenceable and non-dereferenceable URIs less sharp.
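
The point that an xsd:anyURI value need not be an HTTP URI can be checked syntactically, as in the small Python sketch below. The helper name `is_http_uri` is hypothetical; it relies only on `urllib.parse.urlsplit` from the standard library and deliberately makes no network request, so it cannot tell whether an HTTP URI actually resolves.

```python
from urllib.parse import urlsplit


def is_http_uri(uri: str) -> bool:
    """True only if the URI uses the http or https scheme and names a host.

    This is a purely syntactic test: a URI that passes may still be broken,
    and finding that out requires actually trying to resolve it.
    """
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in ("https://doi.org/10.1109/5.771073",
                  "urn:isbn:9780060723804",
                  "info:doi/10.1109/5.771073"):
    print(candidate, is_http_uri(candidate))
```

All three candidates are valid URIs in the RFC 3986 sense, but only the first one uses an HTTP scheme.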

riccardoAlbertoni · 2018-11-21T17:20:42Z

@smrgeoinfo wrote

> I would suggest that the issuing authority of interest should be the registrant for the 10.1109 doi space, "IEEE Xplore Digital Library", perhaps this should be added as a dct:creator. There are two concerns-- the authority that defined the identifier scheme (DOI foundation), and the authority responsible for assigning and maintaining identifiers using that scheme (IEEE).

@smrgeoinfo Please take a look at example 7. Have I correctly interpreted your suggestion?

agbeltran · 2018-11-21T19:49:12Z

To answer 'Question 1' in 'Proposal 1' from @riccardoAlbertoni's notes on the wiki, the DataCite schemas include an XSD with a list of identifier types/schemes here:

https://schema.datacite.org/meta/kernel-4.1/include/datacite-relatedIdentifierType-v4.xsd

agbeltran · 2018-11-21T19:54:46Z

Also FAIRsharing keeps a registry of identifier schemes: https://fairsharing.org/standards/?q=&selected_facets=type_exact:identifier%20schema

agbeltran · 2018-11-21T20:50:18Z

As regards @smrgeoinfo's point on identifying both the identifier scheme and the organisation minting the identifiers, it seems to me that this is a use case not covered by ADMS, as adms:schemeAgency covers the name of the "agency that manages the identifier scheme" as a literal, while dct:creator would be used to point to the representation of that same organisation rather than a separate one. Is that correct, @makxdekkers?

Apart from that interpretation of ADMS, example 7 would cover accounting for both the identifier scheme/type and the organisation maintaining it IMO.

makxdekkers · 2018-11-22T18:44:40Z

@agbeltran Yes, dct:creator and adms:schemeAgency should be for the same organisation. The literal option was provided because schema agencies might not be in Linked Data space and have no URI.

riccardoAlbertoni · 2018-11-23T14:48:30Z

> Yes, dct:creator and adms:schemeAgency should be for the same organisation. The literal option was provided because schema agencies might not be in Linked Data space and have no URI.

Then, assuming we want to distinguish between (a) the authority that defined the identifier scheme (DOI foundation), and (b) the authority responsible for assigning and maintaining identifiers using that scheme (IEEE), we need to consider a property distinct from dct:creator for (b).

I see two alternative options here:

1. Add a new DCAT property (e.g., named dcat:idMantainer/dcat:IdAuthority) to indicate (b), the authority responsible for assigning and maintaining identifiers using that scheme (IEEE).
2. Use dct:publisher to indicate (b) instead of defining a new property such as dcat:idMantainer/dcat:IdAuthority. However, I've got the impression that dct:creator / dct:publisher are used interchangeably to refer to the schema agency (i.e., the DOI Foundation in @smrgeoinfo's example), so I do not know if this is really possible.

Which of the two does the group think is more reasonable?
Does anyone see further options?

makxdekkers · 2018-11-24T12:10:37Z

@riccardoAlbertoni I am not in favour of your proposal.
As I understand it, the DOI Foundation is the schema agency for DOI. Period. The fact that DOI is organised in such a way that there are registration agencies and registrants for sub-spaces under DOI should be irrelevant. Moreover, naming the registrant goes against the philosophy of DOI where the sub-spaces are abstracted from the organisation that registers them, with the advantage that DOIs don't change when the organisation changes or the responsibility for that sub-space is handed over to someone else. Your proposal risks creating a dependency that DOI itself tries to avoid.
So, in summary, I vote against both options, and suggest using adms:Identifier as specified, allowing only a single agency.

riccardoAlbertoni · 2018-11-24T13:30:45Z

Thanks @makxdekkers for your comment.
If I have correctly interpreted your message, you are not in favour of the requirement behind my modelling attempt, namely the need to mention both
a) the authority that defined the identifier scheme (DOI foundation), and
b) the authority responsible for assigning and maintaining identifiers using that scheme (IEEE),
as suggested by @smrgeoinfo. @smrgeoinfo Have I misinterpreted your suggestion?

I find @makxdekkers' motivations convincing; I also guess that similar considerations might hold for other identifier schemes.
So I have included your motivations for not representing (b) in example 7.

makxdekkers · 2018-11-24T18:29:59Z

Correct, I am not in favour of the requirement to model more than one authority for identifiers.

smrgeoinfo · 2018-11-24T22:04:17Z

short story:

I think what a user really needs to know is what the identifier scheme is (not who defined it): in particular, whether those identifiers can be dereferenced, how they are dereferenced, and what kind of representations of the identified resource should be available. The agent defining the scheme is not the information needed for this use case. Back to the original question: if identifiers are required to be http: URIs, the base identifier scheme is known (http), but in practice various agents embed identifiers within the http URI, and the identifier scheme that matters to the user is not http but the embedded scheme, e.g. doi, ark, igsn...

details

> a) the authority that defined the identifier scheme (DOI foundation), and
> b) the authority responsible for assigning and maintaining identifiers using that scheme (IEEE),

@riccardoAlbertoni yes, you are interpreting my suggestion as intended, and I think @makxdekkers' point about the registering agent is valid.

If a registered URI type is used (following RFC-3986), the identifier scheme is part of the URI; a separate identifier scheme property is redundant in that case. If the skos:notation in the adms:identifier has type ^^xsd:anyURI, then the identifier for the scheme should be the prefix on the ID string ('http:' in the example 7).

DOI is registered as a namespace in the 'info' URI scheme (see faq #11), so it would appear that a DOI formally encoded as an RFC 3986 URI would look like 'info:doi/10.1109/5.771073'. The info namespace registry was offline when I tried to check this.
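
As a small illustration of the two URI forms mentioned in this thread, the Python sketch below converts between a bare DOI string, the 'info:doi/' form described above, and the https://doi.org/ form used earlier in the discussion. The function names `doi_forms` and `bare_doi` are hypothetical, and the sketch assumes the prefixes are written exactly as shown (no case or http/https variants are handled).

```python
DOI_INFO_PREFIX = "info:doi/"
DOI_HTTP_PREFIX = "https://doi.org/"


def doi_forms(doi: str) -> dict:
    """Return both URI encodings of a bare DOI string."""
    return {"info": DOI_INFO_PREFIX + doi, "http": DOI_HTTP_PREFIX + doi}


def bare_doi(uri: str) -> str:
    """Strip a known DOI prefix; return the input unchanged otherwise."""
    for prefix in (DOI_INFO_PREFIX, DOI_HTTP_PREFIX):
        if uri.startswith(prefix):
            return uri[len(prefix):]
    return uri


forms = doi_forms("10.1109/5.771073")
print(forms["info"])
print(forms["http"])
```

Reducing both forms to the same bare string is one way to recognise that two differently encoded identifiers refer to the same resource, which is the duplicate-grouping use mentioned earlier in the thread.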

As for dct:creator, it seems odd to me that the dct:creator property on an adms:Identifier is not the creator of the identifier instance but the creator of the identifier scheme. This would be confusing to anyone not conversant with the usage recommendations for ADMS; if that's the convention, we should stick with it.

To me, the major use case for knowing the identifier scheme is that it should tell you how you can dereference the identifier and, ideally, what kind of representations of the identified resource are available. So there is no particular need to identify the agent responsible for actually issuing and maintaining the lifecycle of the identifier. In the case of a DOI, knowing the scheme lets a user know that the registering agent is specified by the prefix part of the ID string and that there are ways to dereference it.

agbeltran · 2018-12-16T22:15:58Z

Marking this issue as 'due for closing' given PR #614

agbeltran · 2018-12-19T23:38:33Z

Closing after merging #614

jpullmann added dcat requirement referencing labels Jan 18, 2018

dr-shorthair mentioned this issue Jan 18, 2018

6.1.1 Dereferenceable identifiers [RDID] #50

Closed

This was referenced Jan 19, 2018

Identifier type [RIDT] #68

Closed

Primary and alternative identifier [RIDALT] #67

Closed

agbeltran added the identification label Jan 24, 2018

dr-shorthair added the documentation label Jan 30, 2018

davebrowning mentioned this issue Feb 7, 2018

Provide guidance in DCAT2 on how to extend Distribution #106

Closed

dr-shorthair mentioned this issue Feb 9, 2018

DCAT dependencies #111

Closed

dr-shorthair mentioned this issue Feb 15, 2018

Aligning ADMS with DCAT #113

Closed

riccardoAlbertoni closed this as completed Nov 23, 2018

riccardoAlbertoni reopened this Nov 23, 2018

riccardoAlbertoni mentioned this issue Dec 5, 2018

Dcat issue53 riccardo #614

Merged

agbeltran added the "due for closing" label Dec 16, 2018

agbeltran closed this as completed Dec 19, 2018

riccardoAlbertoni mentioned this issue Dec 20, 2018

Deleting closed issues from the DCAT document #633

Closed

riccardoAlbertoni added a commit that referenced this issue Dec 20, 2018

dcat: deleting closed issue #53

4458e66

riccardoAlbertoni mentioned this issue Dec 20, 2018

Dcat deleting closed issues53and68git riccardo #634

Merged

riccardoAlbertoni added a commit that referenced this issue Dec 20, 2018

dcat: adding issues #68 and #53 to appendix E

ee628ab

davebrowning removed the "due for closing" label Feb 28, 2019
