Productivity

Author
Affiliation

V.A. Traag

Leiden University

Version | Revision date | Revision | Author
1.0 | 2024-12-06 | First draft | V.A. Traag

Description

In general, productivity estimates the amount of output relative to the amount of input. In the context of academia, outputs can be various objects, varying from publications to data, code, or peer reviews. Although productivity is of interest in itself, it should usually be considered jointly with some notion of quality: a push for higher productivity may simply stimulate more, but lower-quality, outputs. There is some evidence of such an effect (Butler 2003), although this evidence is also disputed (Besselaar, Heyman, and Sandström 2017).

Output is usually measured for only a limited set of objects, with scholarly publications being the most typical example. Nonetheless, other relevant outputs should not be ignored, and the limitations of publication-based productivity should be considered. Moreover, we should be aware of potential differences between productivity at the individual level and at the collective level. For instance, consider a research group in which one individual is tasked with data quality assurance and code review. That individual may have a lower productivity in terms of publications, yet their activities benefit the other researchers in the group, whose productivity might greatly increase as a result (Tiokhin et al. 2023).

In addition, one aspect of productivity that is usually missing is the overall input (Abramo and D’Angelo 2016). That is, we typically do not know how many people are employed at a certain institution. Even if part of that becomes visible in authorships, not every employee’s contribution will become visible in authorship. Hence, institutions that, for example, have more research assistants who are not acknowledged as authors may seem to have relatively few authors, while in reality many more people are active at the institution. Moreover, even if we know that a particular author is affiliated with a certain institution, we do not know how much time they spend at that affiliation, which is particularly challenging in the case of multiple affiliations. Going one step further, the input could also be specified in financial terms. Unfortunately, none of this data is typically available (Waltman et al. 2016). Nonetheless, this is an important limitation to take into account when considering productivity.

Avg. number of papers per author

Measurement

For a certain institution \(i\), we can count the number of authors \(a_i\) affiliated with institution \(i\) and the number of publications \(n_i\) published in a given year \(y\). The ratio \(\frac{n_i}{a_i}\) then gives the average number of papers per author, which is an indicator of productivity. We typically observe an increase in productivity over time, such that in more recent years the number of papers per author is usually larger than in earlier years.
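
For illustration, a hypothetical institution with \(a_i = 200\) authors and \(n_i = 500\) publications in year \(y\) would have an average of \(\frac{500}{200} = 2.5\) papers per author.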

One relevant aspect in the context of counting the number of papers per author is the increase in collaboration. If the total number of publications in a given year remains the same, but more of them are co-authored, each publication is credited in full to more authors and institutions, and the metric will be higher. Hence, it sometimes makes sense to use “fractional counting” for publications (Waltman and Eck 2015). This means that we consider fractions, or weights, for all publications, based on the “fraction” of their authorship. For instance, if a publication has three authors, each has a fraction of 1/3. If two of the authors are affiliated with a single institution, say institution A, that institution will have a weight of 2/3. If, in addition, the third author has two affiliations, one with the aforementioned institution A and one with institution B, we could count that author as belonging to institution A for 1/2, bringing the total for institution A to 5/6.
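
To make this concrete, the following minimal Python sketch computes the institutional weights for the hypothetical three-author publication described above; the author and affiliation data are illustrative only.

from collections import defaultdict

# Hypothetical publication with three authors; the third author
# has two affiliations, institutions A and B.
affiliations = [
    ["A"],       # author 1
    ["A"],       # author 2
    ["A", "B"],  # author 3
]

weights = defaultdict(float)
n_authors = len(affiliations)
for author_affiliations in affiliations:
    # Each author carries a fraction 1/n_authors, split equally
    # over their affiliations.
    for institution in author_affiliations:
        weights[institution] += 1 / (n_authors * len(author_affiliations))

print(dict(weights))  # {'A': 0.833..., 'B': 0.166...}, i.e. A: 5/6, B: 1/6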

If we denote by \(w_{ji}\) the fraction with which publication \(j\) belongs to institution \(i\), we can define \(n'_i = \sum_j w_{ji}\) as the fractionally counted number of publications. Similarly, if we denote by \(a_{ji}\) the fraction with which author \(j\) belongs to institution \(i\), we can define the fractionally counted number of authors as \(a'_{i} = \sum_j a_{ji}\). The productivity can then simply be specified as \(\frac{n'_i}{a'_i}\).
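
Continuing the illustration, a minimal sketch of the fractionally counted productivity for a single institution, using hypothetical weights \(w_{ji}\) and author fractions \(a_{ji}\):

# Hypothetical fractions for one institution i:
# w[j] is the fraction of publication j belonging to i,
# a[j] is the fraction of author j belonging to i.
w = {"pub1": 5 / 6, "pub2": 1.0, "pub3": 1 / 2}
a = {"author1": 1.0, "author2": 1.0, "author3": 1 / 2}

n_frac = sum(w.values())  # fractionally counted publications, n'_i
a_frac = sum(a.values())  # fractionally counted authors, a'_i
print(f"Productivity: {n_frac / a_frac:.2f}")  # (5/6 + 1 + 1/2) / 2.5 ≈ 0.93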

If input data is available, such that the total budget or the number of FTEs available is indicated by \(f_i\), the average number of publications per currency unit or per FTE can be expressed as \(\frac{n_i}{f_i}\).
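
For example, a hypothetical institution with \(n_i = 500\) publications and \(f_i = 250\) FTE would produce \(\frac{500}{250} = 2\) publications per FTE.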

Data sources

OpenAlex

OpenAlex covers publications based on previously gathered data from Microsoft Academic Graph, but mostly relies on Crossref to index new publications. OpenAlex offers a user interface that is currently still under active development, an open API, and the possibility to download the entire data snapshot. The API is rate-limited, but there is the option of a premium account. Documentation for the API is available at https://docs.openalex.org/.

It is possible to retrieve the authors of a particular publication in OpenAlex, for example by using pyalex, a third-party package for Python.

import pyalex as alx

# Identify yourself to the OpenAlex API (recommended for the polite pool)
alx.config.email = "mail@example.com"

# Retrieve a single work by its OpenAlex identifier
w = alx.Works()["W3128349626"]

# Each authorship record contains the author, their institutions,
# and the associated countries
authors = [a["author"] for a in w["authorships"]]
institutions = [a["institutions"] for a in w["authorships"]]
countries = [a["countries"] for a in w["authorships"]]

Based on this type of data, the above-mentioned metrics can be calculated. When large amounts of data need to be processed, it is recommended to download the full data snapshot and work with it directly.
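
As an illustration of how such an aggregation might look through the API, the sketch below fractionally counts publications per institution for a single year. It is a sketch only: the ROR identifier and year are placeholders, and the nested filter follows pyalex’s convention of expressing nested OpenAlex filters as dictionaries.

from collections import defaultdict
import pyalex as alx

alx.config.email = "mail@example.com"

# Placeholder query: works from 2023 with at least one author affiliated
# with a given institution, identified here by a placeholder ROR.
works = alx.Works().filter(
    publication_year=2023,
    authorships={"institutions": {"ror": "https://ror.org/027bh9e22"}},
)

frac_pubs = defaultdict(float)  # fractionally counted publications
# paginate() caps the number of results by default; adjust n_max if needed
for page in works.paginate(per_page=200):
    for work in page:
        authorships = work["authorships"]
        if not authorships:
            continue
        for authorship in authorships:
            institutions = authorship["institutions"]
            if not institutions:
                continue  # authors without a matched institution are skipped
            # Each author's 1/n share is split equally over their affiliations.
            for institution in institutions:
                frac_pubs[institution["id"]] += 1 / (
                    len(authorships) * len(institutions)
                )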

OpenAlex provides disambiguated authors, institutions, and countries. Institutions are matched to the Research Organization Registry (ROR); the country may be available even when no specific institution could be matched.

Dimensions

Dimensions is a bibliometric database that takes a comprehensive approach to indexing publications. It offers limited free access through its user interface. Paid access is available through the API and through its database on Google BigQuery. It is also possible to apply for access to the API and/or Google BigQuery for research purposes. The API is documented at https://docs.dimensions.ai/dsl.

The database is closed access, and we therefore do not provide more details about API usage.

Scopus

Scopus is a bibliometric database with relatively broad coverage. Its data is closed and is generally available only through a paid subscription. It does offer the possibility to apply for access for research purposes through the ICSR Lab. Some additional documentation of its metrics is available at https://www.elsevier.com/products/scopus/metrics, in particular in the Research Metrics Guidebook; documentation for the dataset available through the ICSR Lab is provided separately.

The database is closed access, and we therefore do not provide more details about API usage.

Web of Science

Web of Science is a bibliometric database that takes a more selective approach to indexing publications. Its data is closed and is available only through a paid subscription.

The database is closed access, and we therefore do not provide more details about API usage.

References

Abramo, Giovanni, and Ciriaco Andrea D’Angelo. 2016. “A Farewell to the MNCS and Like Size-Independent Indicators.” Journal of Informetrics 10 (2): 646–51. https://doi.org/10.1016/j.joi.2016.04.006.
Besselaar, Peter van den, Ulf Heyman, and Ulf Sandström. 2017. “Perverse Effects of Output-Based Research Funding? Butler’s Australian Case Revisited.” Journal of Informetrics 11 (3): 905–18. https://doi.org/10.1016/j.joi.2017.05.016.
Butler, Linda. 2003. “Explaining Australia’s Increased Share of ISI Publications—the Effects of a Funding Formula Based on Publication Counts.” Research Policy 32 (1): 143–55. https://doi.org/10.1016/S0048-7333(02)00007-0.
Tiokhin, Leo, Karthik Panchanathan, Paul E. Smaldino, and Daniël Lakens. 2023. “Shifting the Level of Selection in Science.” Perspectives on Psychological Science, August, 17456916231182568. https://doi.org/10.1177/17456916231182568.
Waltman, Ludo, and Nees Jan van Eck. 2015. “Field-Normalized Citation Impact Indicators and the Choice of an Appropriate Counting Method.” Journal of Informetrics 9 (4): 872–94. https://www.sciencedirect.com/science/article/pii/S1751157715300456.
Waltman, Ludo, Nees Jan van Eck, Martijn Visser, and Paul Wouters. 2016. “The Elephant in the Room: The Problem of Quantifying Productivity in Evaluative Scientometrics.” Journal of Informetrics 10 (2): 671–74. https://doi.org/10.1016/j.joi.2015.12.008.

Reuse

Open Science Impact Indicator Handbook © 2024 by PathOS is licensed under CC BY 4.0

Citation

BibTeX citation:
@online{apartis2024,
  author = {Apartis, S. and Catalano, G. and Consiglio, G. and Costas,
    R. and Delugas, E. and Dulong de Rosnay, M. and Grypari, I. and
    Karasz, I. and Klebel, Thomas and Kormann, E. and Manola, N. and
    Papageorgiou, H. and Seminaroti, E. and Stavropoulos, P. and Stoy,
    L. and Traag, V.A. and van Leeuwen, T. and Venturini, T. and
    Vignetti, S. and Waltman, L. and Willemse, T.},
  title = {Open {Science} {Impact} {Indicator} {Handbook}},
  date = {2024},
  url = {https://handbook.pathos-project.eu/sections/2_academic_impact/productivity.html},
  doi = {10.5281/zenodo.14538442},
  langid = {en}
}
For attribution, please cite this work as:
Apartis, S., G. Catalano, G. Consiglio, R. Costas, E. Delugas, M. Dulong de Rosnay, I. Grypari, et al. 2024. “Open Science Impact Indicator Handbook.” Zenodo. 2024. https://doi.org/10.5281/zenodo.14538442.