Prevalence of open/FAIR data practices
History
Version | Revision date | Revision | Author |
---|---|---|---|
1.1 | 2023-07-20 | Edited & revised | T. Willemse |
1.0 | 2023-04-13 | First draft | T. Willemse |
Description
The Open Science discourse has been fuelled by the prospect of a more open and collaborative scientific effort that can accelerate scientific development and innovation. A big step in this direction is the development of Open Data practices that make it easier for scientists to share and reuse research data. This agenda is often advanced through the Findability, Accessibility, Interoperability and Reusability (FAIR) principles (Wilkinson et al. 2016a). Keeping these principles in mind helps to open up data practices in science, and thereby improves data reuse and reproducibility. Accordingly, it is important to gauge how prevalent such practices are in the scientific system, in order to get an overview of the state of data sharing and of Open Science in general.
The FAIR principles and Open Data management aim to establish a data environment in which high-quality data is accessible in the long term and can easily be discovered, evaluated and reused (Wilkinson et al. 2016b). Data can be made more findable by assigning persistent identifiers, adding rich metadata and registering the data in a searchable resource. Accessibility can be improved by making data retrievable by their identifier using a standardized communication protocol, and by keeping the metadata accessible even when the data itself is no longer available. Interoperability can be enhanced by describing the data in a formal, broadly applicable language with shared vocabularies, along with qualified references to other data. Reusability can be increased by describing the data comprehensively and unambiguously, for example with a clear usage license and detailed provenance.
Different stakeholders, such as researchers, data publishers and funding agencies, stand to benefit from these practices, and more insight into how widely FAIR data principles are applied can inform their work. Their questions typically concern how to improve the implementation and development of FAIR principles. Since improving (FAIR) data sharing practices first requires an overview of current practices, relevant questions are which FAIR data practices are used in an area of interest, where, and how.
Metrics
Level of FAIRness of data
Metrics on the level of FAIRness of data (sources) can help establish the prevalence of open/FAIR data practices. This metric aims to show in a more nuanced way where FAIR data practices are used and, in some cases, to what extent. Whether a data source follows the FAIR principles cannot be determined at a glance, but several initiatives have developed methodologies that help to assess this for (a large number of) data sources.
Measurement
Existing methodologies
Research Data Alliance
The Research Data Alliance (RDA) developed a FAIR Data Maturity Model that can help to assess whether data adheres to the FAIR principles. The model is not meant to be normative; rather, it provides guidelines for an informed assessment.
The model defines a set of indicators for each of the four FAIR principles that can be used to assess whether the principles are met. Each indicator is described in detail and annotated with its relevance (essential, important or useful). The model recommends evaluating the maturity of each indicator on the following scale:
0 – not applicable
1 – not being considered yet
2 – under consideration or in planning phase
3 – in implementation phase
4 – fully implemented
By following this methodology, one can assess the extent to which FAIR data practices are adhered to and create comprehensive overviews, for instance by showing the scores in radar charts.
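As an illustration, the scoring step could be sketched in Python. This is a minimal sketch under stated assumptions, not part of the RDA model itself: the indicator IDs such as "F-1" are hypothetical placeholders rather than the official RDA indicator identifiers, and averaging the maturity levels per principle while leaving out level 0 ("not applicable") is just one possible aggregation, chosen for simplicity.

```python
from collections import defaultdict
from statistics import mean

# Maturity levels as listed in the RDA FAIR Data Maturity Model.
MATURITY_LEVELS = {
    0: "not applicable",
    1: "not being considered yet",
    2: "under consideration or in planning phase",
    3: "in implementation phase",
    4: "fully implemented",
}


def score_by_principle(assessments):
    """Return the average maturity level per FAIR principle.

    `assessments` maps a hypothetical indicator ID (e.g. "F-1") to a tuple
    (principle, level), where principle is "F", "A", "I" or "R" and level is
    a key of MATURITY_LEVELS. Level 0 ("not applicable") is excluded from
    the average; this is an assumption of the sketch, not an RDA rule.
    """
    levels = defaultdict(list)
    for principle, level in assessments.values():
        if level not in MATURITY_LEVELS:
            raise ValueError(f"unknown maturity level: {level}")
        if level > 0:
            levels[principle].append(level)
    return {principle: mean(values) for principle, values in levels.items()}


# Illustrative, made-up assessment of a single data source.
example = {
    "F-1": ("F", 4),
    "F-2": ("F", 3),
    "A-1": ("A", 2),
    "I-1": ("I", 0),  # not applicable, so "I" receives no average
    "R-1": ("R", 1),
}
print(score_by_principle(example))
```

Per-principle averages like these are the kind of values that could be placed on the axes of a radar chart.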
Data life cycle assessment
Determining the level of FAIR data practices can involve assessing how well data adheres to the FAIR principles at each stage of the data lifecycle, from creation to sharing and reuse (Jacob 2019).
Identify the stages of the data lifecycle: The data lifecycle typically includes stages such as planning, collection, processing, analysis, curation, sharing, and reuse. Identify the stages that are relevant to the data to be assessed.
Evaluate adherence to FAIR principles at each stage: For each relevant stage of the data lifecycle, evaluate the extent to which the data adheres to the FAIR principles. For instance, use the FAIR Data Maturity Model to assign a score for each principle at each stage.
Determine the overall level of FAIR data practices: Once scores have been assigned for each principle and stage, combine them into an overall level. This can be done with a summary score that takes the scores for all principles and stages into account, or by assigning a level based on the average score across principles and stages.
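The three steps above could be sketched in Python as follows. The sketch is hypothetical: the stage names, the example scores (on the 1 to 4 maturity scale) and the rule of rounding the overall average down to a maturity level are assumptions made for illustration, not necessarily the procedure described by Jacob (2019).

```python
from statistics import mean

# Labels for maturity levels 1-4; level 0 ("not applicable") is assumed
# to be left out of the input rather than scored.
LEVEL_LABELS = {
    1: "not being considered yet",
    2: "under consideration or in planning phase",
    3: "in implementation phase",
    4: "fully implemented",
}


def lifecycle_summary(scores):
    """Summarise FAIR scores given per lifecycle stage and per principle.

    `scores` maps a stage name to a non-empty dict {principle: score}.
    Returns per-stage averages, per-principle averages, the overall
    average, and a label obtained by rounding the overall average down
    (a hypothetical rule chosen for this sketch).
    """
    per_stage = {stage: mean(by_p.values()) for stage, by_p in scores.items()}

    principles = {p for by_p in scores.values() for p in by_p}
    per_principle = {
        p: mean(by_p[p] for by_p in scores.values() if p in by_p)
        for p in sorted(principles)
    }

    overall = mean(v for by_p in scores.values() for v in by_p.values())
    label = LEVEL_LABELS[int(overall)]  # int() rounds positive numbers down
    return per_stage, per_principle, overall, label


# Illustrative, made-up scores for two lifecycle stages.
example_scores = {
    "collection": {"F": 3, "A": 2, "I": 2, "R": 1},
    "sharing": {"F": 4, "A": 4, "I": 3, "R": 2},
}
print(lifecycle_summary(example_scores))
```

Reporting the per-stage and per-principle averages alongside the single overall level keeps visible where in the lifecycle FAIR practices are weakest.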
Availability of data statement
A data availability statement in a publication describes how readers can access the data underlying the research. Having such a statement improves transparency about data availability and can thus be considered an Open Data practice. However, the presence of a data availability statement does not necessarily imply that the data is openly available, or that it is more likely to be shared (Gabelica, Bojčić, and Puljak 2022). Nevertheless, describing how to access the data in an Open Data repository, how to request access to the data, or why some data cannot be shared for ethical reasons are all Open Data practices that make data reuse more accessible and transparent (Federer et al. 2018).
Measurement
Existing methodology
All PLOS journals require publications to include a data availability statement. Moreover, PLOS strongly recommends that the statement describe how the research data can be accessed and that the data be stored in a public repository. Other practices that comply with this policy are providing the data as a file with the publication, handling data requests through an approving committee, and providing contact information for a third party that owns the data (Federer et al. 2018). Colavizza et al. (2020) describe in detail how PLOS data availability statements can be used for quantitative research.
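Measuring this indicator at scale requires classifying data availability statements. A deliberately simple, keyword-based classifier is sketched below in Python. The categories loosely follow the sharing practices listed above, but the category names and regular expressions are hypothetical assumptions, and this is not the method used by Colavizza et al. (2020).

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("repository", re.compile(
        r"\b(repository|zenodo|dryad|figshare|accession number)\b", re.I)),
    ("supplementary", re.compile(
        r"\b(supporting information|supplementary|within the (paper|manuscript))\b",
        re.I)),
    ("third party", re.compile(r"\bthird[- ]party\b", re.I)),
    ("request", re.compile(
        r"\b(up)?on (reasonable )?request\b|\b(ethics|data access) committee\b",
        re.I)),
]


def classify_statement(text):
    """Assign a data availability statement to a coarse sharing category."""
    if text is None or not text.strip():
        return "no statement"
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "other"


# Illustrative, made-up statements.
for statement in [
    "All relevant data are within the manuscript and its Supporting Information files.",
    "Data are deposited in the Dryad repository.",
    "Data are available on reasonable request.",
    "",
]:
    print(repr(statement), "->", classify_statement(statement))
```

Because the first matching rule wins, statements that mention several practices are reduced to a single category; keyword rules like these would need to be validated against a manually annotated sample before being used for measurement.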
Known correlates
Some research suggests that openly sharing data is positively related to the citation rate of publications (Piwowar, Day, and Fridsma 2007; Piwowar and Vision 2013).
References
Citation
@online{apartis2023,
author = {Apartis, S. and Catalano, G. and Consiglio, G. and Costas,
R. and Delugas, E. and Dulong de Rosnay, M. and Grypari, I. and
Karasz, I. and Klebel, Thomas and Kormann, E. and Manola, N. and
Papageorgiou, H. and Seminaroti, E. and Stavropoulos, P. and Stoy,
L. and Traag, V.A. and van Leeuwen, T. and Venturini, T. and
Vignetti, S. and Waltman, L. and Willemse, T.},
title = {PathOS - {D2.1} - {D2.2} - {Open} {Science} {Indicator}
{Handbook}},
date = {2023},
url = {https://handbook.pathos-project.eu/indicator_templates/quarto/1_open_science/prevalence_open_fair_data_practices.html},
doi = {10.5281/zenodo.8305626},
langid = {en}
}