Impact of Open Code in research

Author: P. Stavropoulos
Affiliation: Athena Research Center

Version | Revision date | Revision    | Author
1.2     | 2023-08-30    | Revisions   | Petros Stavropoulos
1.1     | 2023-07-20    | Revisions   | Petros Stavropoulos
1.0     | 2023-05-11    | First Draft | Petros Stavropoulos

Description

The impact of Open Code in research aims to capture the effect that making research code openly accessible and reusable has on the reproducibility of research results, as open and accessible code is a cornerstone of verification and validation in science.

This indicator can be used to assess the level of openness and accessibility of research code within a specific scientific community or field and to identify potential barriers to, or incentives for, the adoption of Open Code practices. It can also be used to track the reuse of Open Code and its subsequent impact on reproducibility, as well as to evaluate the effectiveness of policies and initiatives promoting Open Code practices.

Connections to Academic Indicators

This indicator examines the broader effects of making code or software openly accessible, focusing on its role in fostering transparency, collaboration, and reproducibility across the scientific community. It builds upon Use of Code in Research, which assesses the initial incorporation of code or software into research, and Reuse of Code in Research, which measures the extent to which existing code or software is adopted in subsequent studies. Together, these indicators provide a comprehensive view of how code or software contributes to research outputs, reusability, reproducibility, and the wider adoption of Open Code practices.

Metrics

NCI for publications that have introduced/reused Open Code

This metric captures the Normalised Citation Impact (NCI) for publications that have either introduced or reused Open Code. By assessing citation impact, this indicator reflects the visibility and influence of research publications that contribute to or benefit from Open Code practices. Citation-based metrics, including the NCI, are extensively discussed under the academic indicator Citation Impact. For general details on the methodology, limitations, and measurement of NCI, refer to the academic indicator and its corresponding metrics.
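For reference, the NCI of a publication is its citation count normalised by the expected citation rate of comparable publications. A minimal formulation, assuming normalisation by field and publication year, is

\mathrm{NCI}_p = \frac{c_p}{\bar{c}_{\mathrm{field}(p),\,\mathrm{year}(p)}}

where c_p is the number of citations received by publication p and the denominator is the mean citation count of publications from the same field published in the same year; values above 1 indicate above-average citation impact.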

In this metric, we focus specifically on publications that have directly contributed to reproducibility by either introducing new Open Code or reusing existing Open Code. The reuse of Open Code can be identified using methodologies and metrics outlined in the academic indicator Use of Code in Research, which provides tools and techniques for tracking code usage in research publications. Additionally, this indicator highlights publications that explicitly document and share new Open Code repositories.

To measure the NCI for publications that have introduced Open Code, we identify relevant publications through metadata analysis of code repositories, such as unique identifiers or DOIs associated with repositories like GitHub or GitLab. This process can be supported by automated tools, including the SciNoBo toolkit, which uses Deep Learning and Natural Language Processing (NLP) to extract metadata such as the repository name, version, license, and URLs. These tools enable precise identification of publications introducing Open Code, making it possible to calculate their citation impact.
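The sketch below illustrates only the simplest part of such an identification step, namely scanning publication full text for repository URLs. It is not the SciNoBo pipeline itself, and the repository in the example is hypothetical.

import re

# Illustrative pattern for GitHub/GitLab repository URLs. Real pipelines
# also resolve DOIs minted by archives such as Zenodo or Software Heritage.
REPO_URL = re.compile(
    r"https?://(?:www\.)?(?:github|gitlab)\.com/"
    r"(?P<owner>[\w-]+)/(?P<repo>[\w-]+(?:\.[\w-]+)*)",
    re.IGNORECASE,
)

def extract_repositories(fulltext: str) -> set[tuple[str, str]]:
    """Return the (owner, repository) pairs mentioned in a publication."""
    return {
        (m.group("owner"), m.group("repo").removesuffix(".git"))
        for m in REPO_URL.finditer(fulltext)
    }

# Hypothetical example of a publication introducing Open Code.
text = "Our code is available at https://github.com/example-lab/open-tool."
print(extract_repositories(text))  # {('example-lab', 'open-tool')}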

The NCI for publications that have introduced or reused Open Code is particularly relevant for reproducibility because it serves as a proxy for the level of engagement and trust that the broader research community places in these resources. Highly cited publications introducing Open Code often signal that the code has provided novel, generalizable solutions to scientific problems, enabling other researchers to replicate and extend findings. Similarly, publications with high NCI that reuse Open Code indicate that shared computational tools are not only accessible but also integral to advancing research in a transparent and reproducible manner.

By focusing on NCI, we can compare publications across disciplines and timeframes, overcoming disparities in citation practices. This ensures that the contribution of Open Code to reproducibility is evaluated on a level playing field, highlighting those practices and outputs that have the greatest impact. Furthermore, normalized metrics allow us to monitor trends in Open Code practices, assess the effectiveness of reproducibility policies, and identify fields or communities where additional incentives for Open Code adoption might be needed.
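Assuming a table of publications with citation counts, a field and year for normalisation, and a flag marking Open Code involvement (all column names below are hypothetical), such a comparison could be sketched as follows:

import pandas as pd

# Hypothetical corpus: one row per publication. The open_code flag would
# come from an identification step such as the one sketched above.
pubs = pd.DataFrame({
    "citations": [12, 3, 40, 7, 0, 25],
    "field":     ["CS", "CS", "Bio", "Bio", "CS", "Bio"],
    "year":      [2020, 2020, 2021, 2021, 2020, 2021],
    "open_code": [True, False, True, False, False, True],
})

# NCI: citations divided by the mean citations of the same field-year group.
expected = pubs.groupby(["field", "year"])["citations"].transform("mean")
pubs["nci"] = pubs["citations"] / expected

# Mean NCI of Open Code publications versus the rest of the corpus.
print(pubs.groupby("open_code")["nci"].mean())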

Code downloads/usage counts/stars from repositories

This metric captures the level of interest and impact of Open Code by measuring repository activity such as downloads, usage counts, and stars. Metrics derived from repository platforms like GitHub, GitLab, or Bitbucket can provide insight into how often code is accessed, favorited, or replicated by other users. These indicators are extensively discussed under the academic indicator Use of Code in Research, particularly in the metric “Repository statistics (# Forks/Clones/Stars/Downloads/Views).” For detailed methodologies and measurement approaches, refer to the academic indicator.
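As an illustration, such statistics for public GitHub repositories can be retrieved from the GitHub REST API; GitLab and Bitbucket expose similar endpoints. The repository named below is hypothetical, and download counts (available per release asset through separate endpoints) are omitted for brevity.

import requests

def repo_stats(owner: str, repo: str) -> dict:
    """Fetch basic engagement statistics for a public GitHub repository."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "watchers": data["subscribers_count"],
    }

print(repo_stats("example-lab", "open-tool"))  # hypothetical repository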

In the context of reproducibility, this indicator emphasizes the implications of repository usage statistics for research transparency and validation. High download counts, numerous forks, or significant stars can signal that the code is well-documented, functional, and useful for replication and extension of research findings. Such engagement often indicates that the Open Code has met the standards required for reproducibility, including accessibility and usability by other researchers.

Furthermore, widespread use of Open Code repositories reflects the extent to which shared computational tools are integrated into the research ecosystem. This integration supports cumulative science, where researchers build on existing work rather than duplicating efforts. By tracking these repository metrics, we can better understand how Open Code practices facilitate reproducibility and identify gaps or barriers preventing broader adoption and reuse.

Reuse

Open Science Impact Indicator Handbook © 2024 by PathOS is licensed under CC BY 4.0

Citation

BibTeX citation:
@online{apartis2024,
  author = {Apartis, S. and Catalano, G. and Consiglio, G. and Costas,
    R. and Delugas, E. and Dulong de Rosnay, M. and Grypari, I. and
    Karasz, I. and Klebel, Thomas and Kormann, E. and Manola, N. and
    Papageorgiou, H. and Seminaroti, E. and Stavropoulos, P. and Stoy,
    L. and Traag, V.A. and van Leeuwen, T. and Venturini, T. and
    Vignetti, S. and Waltman, L. and Willemse, T.},
  title = {Open {Science} {Impact} {Indicator} {Handbook}},
  date = {2024},
  url = {https://handbook.pathos-project.eu/sections/5_reproducibility/impact_of_open_code_in_research.html},
  doi = {10.5281/zenodo.14538442},
  langid = {en}
}
For attribution, please cite this work as:
Apartis, S., G. Catalano, G. Consiglio, R. Costas, E. Delugas, M. Dulong de Rosnay, I. Grypari, et al. 2024. “Open Science Impact Indicator Handbook.” Zenodo. 2024. https://doi.org/10.5281/zenodo.14538442.