Dereferenceable identifiers [RDID] #53
Comments
I agree but I don't think there is a need to make any changes in the DCAT specification. It's an automatic consequence of DCAT being an RDF vocabulary. |
Yes - that was my impression too. No changes necessary to satisfy this requirement. |
This issue is strictly related to providing guidance on how to use DCAT to specify identifiers as DOIs, ISBNs, etc. - see the related use case. |
It could be part of DCAT guidance, maybe in the usage note of https://www.w3.org/TR/vocab-dcat/#Property:dataset_identifier? In fact, the current usage note only suggests that the "identifier might be used as part of the URI of the dataset" but it would be good to mention other identifiers in the usage note as well. |
+1 from me. One of options mentioned in the related use case is to use |
Should we add the 'documentation' tag for this requirement then? |
The library world has struggled with this same problem. There are many identifiers that are not (yet) expressed as IRIs. As these are just alpha-numeric strings, there is a need to give a context so that they are meaningful/useful. This has led to some awkward models of identifiers being at least 2-part: the identifier string, and the "provenance" of the identifier. So although one should prefer IRI forms when available, what should be done with a string like "098378297" when it is the identifier from some agency? That's the hard part. |
@kcoyle - could you provide a pointer to a catalogue from the library world with that situation? In those examples, is it not possible to get a description of the resource being identified at all? For the case of life science data, which I imagine would also apply to other scientific domains, our paper "Identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data" (https://doi.org/10.1371/journal.pbio.2001414) presents the situation and could be a useful reference. |
Here are some (there are about a dozen common ones) (type followed by example):
I'm not sure what you mean about getting a description of the resource - the bibliographic record is a description of the resource; what is problematic is that not all identifiers have a URI form. These are not "web-based identifiers" and I suspect that other data providers also have older identifiers that are not (yet) web-based. These are in library data. In addition, the identifiers for the records in a data file in library data are simple strings, like "##2001627090" (the octothorpes represent blanks). These are exported as URIs when the data is converted to RDF ("https://lccn.loc.gov/2015020100") but the majority of the data is still not in RDF. The concept of a "data catalog" is not applied to these files of records although there are sites that provide files of the records for download. This may or may not fit into the context of DCAT. |
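As a rough illustration of the string-to-URI export described above, here is a minimal Python sketch. It assumes that the octothorpes are simply dropped and that the remaining digits are appended to the `https://lccn.loc.gov/` base seen in the example URI. The function name `lccn_to_uri` is hypothetical, and real LCCN normalization involves more rules than this.

```python
LCCN_BASE = "https://lccn.loc.gov/"  # base taken from the example URI in the comment above


def lccn_to_uri(raw: str) -> str:
    """Turn a record identifier string into an HTTP URI (simplified sketch)."""
    # Assumption: '#' marks a blank in the exported record; drop '#' and spaces.
    compact = raw.replace("#", "").replace(" ", "")
    if not compact.isdigit():
        raise ValueError(f"unexpected characters in identifier: {raw!r}")
    return LCCN_BASE + compact


print(lccn_to_uri("##2001627090"))
```

The point is only that the bare string carries no context by itself; the base URI is what supplies it.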
The European DCAT-AP includes a property adms:identifier with range adms:Identifier for Dataset. adms:Identifier is based on the UN/CEFACT Identifier class and consists of:
|
Whatever mechanism we decide on for handling identifiers, it should be comprehensive enough to be used in a DCAT2 and also elsewhere, since we have the requirement of referring to alternate identifiers (non-HTTP URIs) for things like physical samples in SOSA ontology catalogues. |
Discussed at some length in meeting https://www.w3.org/2018/02/07-dxwgdcat-minutes
AndreaPerego: main issue is that a number of other identifier systems are used for data citation, publishers, etc.
NicholasCar: We had the same issue with physical samples.
SimonCox: Makx suggested looking at ADMS. It would be beneficial to have a comprehensive handling of identifiers, inc. type and other properties, as we need to use them in many situations, not just DCAT.
SimonCox: There's an adms:Identifier class there. Yes, ADMS seems to mostly do it!
SimonCox: It is based on UN/CEFACT. So, it appears that this fulfills the proposal you made, AndreaPerego.
AndreaPerego: alternative proposals: PRISM, BIBO, |
Proposal: promote adms:Identifier to DCAT |
I suggest we clone ADMS identifier or use an ontology addressing only identifiers such as http://data.press.net/ontology/identifier/ or http://ows.usersmarts.com/owldocgen/owldoc?url=http://www.opengis.net/ont/common/identifier# . The reason to use a micro ontology for just identifiers is that it can be reused for many other purposes. It would be nice to submit this small ontology for standardization by W3C. |
I like! I think the first micro-ontology needs a few more things though: notes on identifier formats, whether they are structured or opaque strings, etc. These could be optional. |
On this topic, one of my priorities at GS1 now is to make barcodes dereferenceable. In more formal terms, we're defining how GTINs (the numbers you see beneath a barcode) and our other less well-known identifiers, can be encoded in HTTP URIs. I mention it here because there is a close relationship between our GTINs and ISBN and ISSN (ISBNs all begin 978 or 979 but are part of the EAN/UPC/GTIN world). Therefore, if this WG has use cases for dereferenceable ISBNs, I'd be pleased to know, especially if you have any idea where they should dereference to! |
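To make the ISBN/GTIN relationship mentioned above concrete, here is a small Python sketch. It assumes the standard GTIN-13 check-digit rule (weights alternating 1 and 3 from the left) and treats the 978/979 prefix named in the comment as the only ISBN test. The function names are hypothetical, and the sketch does not look at any finer-grained prefix allocation.

```python
def gtin13_check_digit(first12: str) -> int:
    """Compute the GTIN-13 / ISBN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_isbn13(gtin: str) -> bool:
    """True if the string is a valid GTIN-13 that starts with 978 or 979."""
    digits = gtin.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return (digits.startswith(("978", "979"))
            and gtin13_check_digit(digits[:12]) == int(digits[12]))


print(is_isbn13("978-0-306-40615-7"))
```

A valid GTIN-13 outside the 978/979 range passes the check-digit test but is rejected by `is_isbn13`, which is the sense in which ISBNs are a subset of the GTIN world.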
In fact, you could call the definition of |
I don't have a use case, but my first reaction would be that ISBNs should dereference to the national library for the jurisdiction where the publication was published (or where the publisher is located). But maybe I'm biased since I work in a national library... |
Also relevant to this discussion is the schema.org discussion on identifiers and the sdo:identifier term. |
After a review of the discussion, it looks like there are two proposals: schema.org, ISO19115, DATS approach: make identifier an object/class with a code property (the identifier string), a scheme property, maybe an authority property. Personally I think the second approach is more transparent and widely used. proposal: |
@smrgeoinfo ADMS also makes the identifier a class, namely adms:Identifier.
I would not be in favour of defining a dcat:Identifier class alongside the adms:Identifier class that basically does the same thing. |
adms:identifier is already adopted in some DCAT application profiles, so I second the idea of using it rather than introducing new terms, at least as a first attempt. As part of action 259, which was assigned to me in last week's DCAT call, I have drafted the following wiki page, DCAT-Identifiers. On that page, I have tried to set up a proposal based on existing adms:identifier examples. The page is still in progress, and I certainly need to update it with the latest suggestions from @makxdekkers. Though it is not yet complete, and corrections might be needed, I guess it can help the discussion. |
@riccardoAlbertoni thanks, that wiki page is helpful. A couple comments: Also, in the example, with a doi:
I would suggest that the issuing authority of interest should be the registrant for the 10.1109 doi space, "IEEE Xplore Digital Library"; perhaps this should be added as a dct:creator. There are two concerns: the authority that defined the identifier scheme (DOI foundation), and the authority responsible for assigning and maintaining identifiers using that scheme (IEEE). @makxdekkers I got the impression from the adms doco that the identifier scheme is encoded as the data type in the skos:notation typed literal, so using skos:inScheme would be redundant, and I think it's also not consistent with the intention of skos:inScheme. |
I see your point @smrgeoinfo, the title is slightly misleading. As far as I can understand, indicating a URN is useful as well. Independently of their dereferenceability, secondary IDs are indicated to say that others might refer to the same dataset with different IDs; they are useful for managing/grouping duplicates. So I have made the distinction between dereferenceable and non-dereferenceable URIs less sharp. |
@smrgeoinfo wrote
@smrgeoinfo Please take a look at example 7. Have I correctly interpreted your suggestion? |
To answer 'Question 1' in 'Proposal 1' from @riccardoAlbertoni's notes on the wiki, the DataCite schemas include an XSD with a list of identifier types/schemes here: https://schema.datacite.org/meta/kernel-4.1/include/datacite-relatedIdentifierType-v4.xsd |
Also FAIRsharing keeps a registry of identifier schemes: https://fairsharing.org/standards/?q=&selected_facets=type_exact:identifier%20schema |
As regards @smrgeoinfo point on identifying both the identifier scheme and the organisation minting the identifiers, it seems to me that is a use case not covered by ADMS, as Apart from that interpretation of ADMS, example 7 would cover accounting for both the identifier scheme/type and the organisation maintaining it IMO. |
@agbeltran Yes, dct:creator and adms:schemaAgency should be for the same organisation. The literal option was provided because schema agencies might not be in Linked Data space and have no URI. |
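To make the shape under discussion easier to picture, the following Python sketch prints a Turtle description of an adms:Identifier with a typed skos:notation, a literal adms:schemaAgency, and an optional dct:creator URI for the same organisation. It is only an illustration under assumptions: the dataset URI, the agency URI, and the helper name `identifier_turtle` are hypothetical, the values are assumed to contain no characters that need escaping, and the wiki page's example 7 may differ in detail.

```python
PREFIXES = """\
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
"""


def identifier_turtle(dataset, notation, agency_name, agency_uri=None):
    """Build a Turtle snippet for one adms:Identifier attached to a dataset."""
    lines = [
        f"<{dataset}> adms:identifier [",
        "    a adms:Identifier ;",
        f'    skos:notation "{notation}"^^xsd:anyURI ;',
    ]
    if agency_uri:
        # Only emitted when the schema agency has a URI in Linked Data space.
        lines.append(f"    dct:creator <{agency_uri}> ;")
    # The literal form works even when the agency has no URI.
    lines.append(f'    adms:schemaAgency "{agency_name}"')
    lines.append("] .")
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


print(identifier_turtle(
    "http://example.org/dataset/1",          # hypothetical dataset URI
    "https://doi.org/10.1109/5.771073",      # DOI from the thread, as an HTTP URI
    "DOI foundation",
    agency_uri="http://example.org/agency/doi-foundation",  # hypothetical
))
```

Leaving `agency_uri` unset corresponds to the case described above, where the schema agency is not in Linked Data space and only the literal is available.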
Then, assuming we want to distinguish between (a) the authority that defined the identifier scheme (DOI foundation), and (b) the authority responsible for assigning and maintaining identifiers using that scheme (IEEE), we need to consider a property distinct from dct:creator for (b). I see two alternative options here:
Which of the two does the group think is more reasonable? |
@riccardoAlbertoni I am not in favour of your proposal. |
Thanks @makxdekkers for your comment. I've found @makxdekkers' motivations convincing, and I guess that similar considerations might hold for other identifier schemes. |
Correct, I am not in favour of the requirement to model more than one authority for identifiers. |
Short story: I think what a user really needs to know is what the identifier scheme is (not who defined it), and in particular whether those identifiers can be dereferenced, how they are dereferenced, and what kind of representations of the identified resource should be available. The agent defining the scheme is not the information needed for this use case. Back to the original question: if identifiers are required to be http: URIs, the base identifier scheme is known (http), but the practical matter is that various agents embed identifiers within the http URI, and the identifier scheme that matters to the user is not http but the embedded scheme, e.g. doi, ark, igsn...
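The distinction between the base scheme (http) and the embedded scheme can be sketched in a few lines of Python. The rules below are illustrative assumptions, not a registry: DOIs are recognised by the `doi.org` resolver host and ARKs by an `ark:` label at the start of the path. The function name `embedded_scheme` and the ARK example URI are hypothetical.

```python
from urllib.parse import urlsplit


def embedded_scheme(uri: str) -> str:
    """Guess the identifier scheme that matters to a user of this URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        # Non-HTTP URIs already name their scheme in the prefix.
        return parts.scheme
    host = parts.netloc.lower()
    path = parts.path.lstrip("/")
    # Illustrative, non-exhaustive rules (assumptions of this sketch).
    if host in ("doi.org", "dx.doi.org"):
        return "doi"
    if path.lower().startswith("ark:"):
        return "ark"
    # No embedded scheme recognised: fall back to the base scheme.
    return parts.scheme


for uri in ("https://doi.org/10.1371/journal.pbio.2001414",
            "https://n2t.net/ark:/12345/abc",       # hypothetical ARK
            "https://lccn.loc.gov/2015020100"):
    print(uri, "->", embedded_scheme(uri))
```

Any such table is necessarily incomplete, which is one argument for stating the scheme explicitly alongside the identifier rather than inferring it from the URI.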
@riccardoAlbertoni yes, you are interpreting my suggestion as intended, and I think @makxdekkers' point about the registering agent is valid. If a registered URI type is used (following RFC 3986), the identifier scheme is part of the URI; a separate identifier scheme property is redundant in that case. If the skos:notation in the adms:identifier has type ^^xsd:anyURI, then the identifier for the scheme should be the prefix on the ID string ('http:' in example 7). DOI is registered as a namespace in the 'info' URI scheme (see faq #11), so it would appear that a DOI formally encoded as an RFC 3986 URI would look like 'info:doi/10.1109/5.771073'. The info namespace registry was offline when I tried to check this. As for dct:creator, it seems odd to me that the dct:creator property on an adms:Identifier is not the creator of the identifier instance but the creator of the identifier scheme. This would be confusing if one were not conversant in the usage recommendations for ADMS; if that's the convention, we should stick with it. To me, the major use case for knowing the identifier scheme is that it should tell you how you can dereference the identifier, and ideally what kind of representations of the identified resource are available, so there is no particular need to identify the agent responsible for actually issuing and maintaining the lifecycle of the identifier. In the case of a DOI, knowing the scheme lets a user know that the registering agent is specified by the prefix part of the ID string and that there are ways to dereference it. |
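As a small check of the point that the scheme is the prefix of the URI, the Python sketch below writes the same DOI in the 'info:doi/' form quoted above and in a resolver form, then reads the scheme back with the standard library. The helper names are hypothetical, and the `https://doi.org/` resolver base is an assumption of this sketch (example 7 uses an 'http:' URI).

```python
from urllib.parse import urlsplit


def doi_as_info_uri(doi: str) -> str:
    """Encode a bare DOI in the 'info' URI scheme, as quoted in the comment above."""
    return "info:doi/" + doi


def doi_as_http_uri(doi: str) -> str:
    """Encode a bare DOI as a dereferenceable URI (resolver base is an assumption)."""
    return "https://doi.org/" + doi


def uri_scheme(uri: str) -> str:
    """Return the RFC 3986 scheme component, i.e. the prefix before the first ':'."""
    return urlsplit(uri).scheme


doi = "10.1109/5.771073"
for uri in (doi_as_info_uri(doi), doi_as_http_uri(doi)):
    print(uri_scheme(uri), uri)
```

In the resolver form the URI scheme no longer says that the identifier is a DOI, which is the redundancy-versus-information trade-off being discussed.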
Marking this issue as 'due for closing' given PR #614 |
Closing after merging #614 |
Dereferenceable identifiers [RDID]
Encode identifiers as dereferenceable HTTP URIs
Related use cases: Modeling identifiers and making them actionable [ID11]