Let’s build a dataset series using dcat:DatasetSeries!

Use case:

  1. Assuming that we have the following three instances of dcat:Dataset, already published by three different Agents:

    1. <beesBE2022> dereferenced to https://jimjyang.github.io/playground/dataset-series/beesBE2022.ttl, which is an instance of dcat:Dataset describing an open dataset, published by Agent1.

    2. <beesCZ2022> dereferenced to https://jimjyang.github.io/playground/dataset-series/beesCZ2022.ttl, which is an instance of dcat:Dataset describing an open dataset, published by Agent2.

    3. <beesNO2022> dereferenced to https://jimjyang.github.io/playground/dataset-series/beesNO2022.ttl, which is an instance of dcat:Dataset describing an open dataset, published by Agent3.

  2. Assuming further that we have the following file to be harvested into our data portal, created by Agent4 who wants to create and publish an open dataset series by reusing the abovementioned three already published dataset descriptions:

    1. https://jimjyang.github.io/playground/dataset-series/source-for-harvesting.ttl

    2. Note: Agent4 has no mandate neither to update the abovementioned three instances of dcat:Dataset nor to enforce the other agents to do so. The abovementioned instances of dcat:Dataset will thus remain as they are.

Possible implementations:

After the harvesting, in our data portal, should this instance of dcat:DatasetSeries

  1. use dcat:seriesMembers to refer directly to the abovementioned three instances of dcat:Dataset, as in https://jimjyang.github.io/playground/dataset-series/beesEU2022.ttl?

    1. Since the abovementioned dereferenced representations of the three instances of dcat:Dataset do not use the main property dcat:inSeries, this implementation uses only the inverse property dcat:seriesMember. Is this implementation compliant with the current version (Working Draft per 07 March 2023) of W3C/DCAT3 ⧉? The answer is no, it isn’t.

  2. or, not use dcat:seriesMembers, as in https://jimjyang.github.io/playground/dataset-series/beesEU2022withoutMembers.ttl?

    1. Neither this instance of dcat:DatasetSeries nor the abovementioned dereferenced representations of the three instances of dcat:Dataset say anything at all about the memberships of this dataset series, how can our (re)users and non-reasoning applications know which datasets are in this dataset series? The answer is no, they can’t.

  3. or, use dcat:seriesMembers referring to new instances of dcat:Dataset that refer to the original instances of dcat:Dataset using owl:sameAs, and these new instances of dcat:Dataset also refer back to the dataset series using dcat:inSeries, as in https://jimjyang.github.io/playground/dataset-series/beesEU2022withNewMemberURIs.ttl.

    1. This implementation is compliant with the current version of W3C/DCAT3. It also makes it possible for our (re)users and non-reasoning applications to know the members of this dataset series.

    2. However, as illustrated, this implementation generates though unnecessary overhead by creating new instances of dcat:Dataset.

Questions:

  1. In general, is it reasonable to assume or even request that the metadata/description of a dataset has to be updated with the information about where the dataset is reused, every and each time it is reused?

  2. Wouldn’t it (therefore) be better if W3C/DCAT3 had chosen dcat:seriesMember as the main property and dcat:inSeries the inverse, such that alternative A above which is the most straightforward implementation, could become compliant with W3C/DCAT3?