Advancing data sharing and reusability for restricted access data on the Web: Introducing the DataSet-Variable Ontology
In response to the increasing volume of research data being generated, more and more data portals have been designed to facilitate data findability and accessibility. However, a significant portion of this data remains confidential or restricted due to its sensitive nature, such as patient data or census microdata. While confidentiality prohibits public release, the emergence of portals supporting rich metadata can enable researchers to at least discover the existence of restricted access data, empowering them to assess the suitability of the data before requesting access. Existing standards, such as CSV on the Web and RDF Data Cube, have been adopted to facilitate the management, integration, and re-use of data on the Web. However, the current landscape still lacks adequate standards both to describe restricted access data effectively while preserving confidentiality and to facilitate its discovery. In this work, we investigate the relationships between the structural, statistical, and semantic elements of restricted access tabular data, and we explore how these relationships can be formally modeled in a way that is Findable, Accessible, Interoperable, and Reusable. We introduce the DataSet-Variable Ontology (DSV), which combines the CSV on the Web and RDF Data Cube standards, leverages semantic technologies and Linked Data principles, and introduces variable-level metadata, with the aim of capturing high-quality metadata to support the management and re-use of restricted access data on the Web. As an evaluation, we conducted a case study in which we applied DSV to four datasets from different governmental statistical agencies. We employed a set of competency questions to assess the ontology’s ability to support knowledge discovery and data exploration.
12th Knowledge Capture Conference 2023, K-CAP '23
Martorana, M., Kuhn, T., Siebes, R., & van Ossenbruggen, J. (2023). Advancing data sharing and reusability for restricted access data on the Web: Introducing the DataSet-Variable Ontology. In Proceedings of the International Conference on Knowledge Capture (pp. 83–91). doi:10.1145/3587259.3627559