Collaboration intensity

Author
Affiliation

V.A. Traag

Leiden University

History

Version  Revision date  Revision     Author
1.0      2023-11-20     First draft  V.A. Traag

Description

Collaboration is common in academia, and many activities, such as conferences or workshops, are aimed at stimulating intellectual exchanges between scholars. Collaboration becomes visible in publications that are co-authored by multiple authors, institutions and countries, which has become increasingly common over the years. Here, we use co-authored publications as the basis for metrics that serve as indicators of collaboration.

Besides co-authorship, there are multiple other forms of collaboration that are less visible. People might have fruitful exchanges with peers that help foster certain research directions. Critical feedback and discussions at conferences might provide further input. Some collaborations may become visible in acknowledgments, and some might effectively lead to co-authorship, but co-authorship represents only one part of collaboration.

Hence, not all collaborations necessarily translate into co-authorship. We might therefore expect the metric to be relatively precise (i.e. a co-authorship most likely signals a collaboration), but not necessarily very sensitive (i.e. not all collaborations are visible through co-authorships). The sensitivity might be field-dependent: some fields have a rather stringent co-authorship culture, where only very substantial involvement in the research or the writing materialises in co-authorship, whereas other fields have a more lenient co-authorship culture, where smaller contributions might already translate into co-authorship. The precision is expected to be less field-dependent: authorship most likely signals collaboration in all fields.

Co-authorship can be measured in different ways and at different levels. It can be measured from the perspective of a set of publications, to see how collaborative it is, or from the perspective of an individual co-author. The first perspective leads to overall measures of collaboration, whereas the latter leads to measures of collaboration between various co-authors, bringing in a network perspective (Perianes-Rodriguez et al., 2016).

The most relevant levels of co-authorship are the individual author level, the institutional level and the country level. Each of these levels may be measured in various ways. The simplest approach assumes that each co-author has contributed equally to the work, resulting in so-called fractional publication counts (Waltman & van Eck, 2015). For instance, if a publication has three authors, each has a fraction of 1/3. If two of the authors are affiliated with a single institution, say institution A, that institution will have a weight of 2/3. If, in addition, the third author has two affiliations, one with the aforementioned institution A and one with institution B, we could count that author as belonging to institution A for half of their share, adding 1/6 and bringing the total for institution A to 5/6. Other forms of credit allocation are also possible (Hagen, 2008).
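
To make the fractional counting concrete, here is a minimal Python sketch, using hypothetical affiliation data matching the example above, that distributes each author's equal share over their affiliations:

from collections import defaultdict

# Hypothetical data for the example above: three authors, each with a list of
# affiliations; the third author is affiliated with both institution A and B.
affiliations = [["A"], ["A"], ["A", "B"]]

fractional_count = defaultdict(float)
n_authors = len(affiliations)
for author_affiliations in affiliations:
    # Each author holds a fraction 1/n_authors, split equally over affiliations.
    for institution in author_affiliations:
        fractional_count[institution] += 1 / (n_authors * len(author_affiliations))

print(dict(fractional_count))  # institution A: 5/6, institution B: 1/6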

We will detail three metrics of collaboration: one focusing on the number of authors, institutions or countries involved in collaboration, one on the number of joint papers, and one on the percentage of papers that show some form of collaboration.

Avg. number of co-authors

The average number of co-authors is a metric of the degree of collaboration and, as such, represents an indicator for collaboration. This is a relatively good metric for collaboration. However, the average might hide differences between publications in the degree of collaboration. For instance, some physics publications may involve many hundreds of co-authors, and if such a paper is included in the set, it may lead to misleading results.

Measurement.

Let \(k_i\) represent the number of co-authors of publication \(i\), either at the level of individual authors, institutions or countries. Let \(S\) be the set of publications of interest, with in total \(n = |S|\) publications. The average over the set of publications is then simply

\[c = \frac{1}{n} \sum_{i \in S} k_i.\]

This average can possibly be weighted by the fraction to which a publication belongs to the set of publications of interest.
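
As a minimal sketch, assuming each publication \(i\) is represented simply by its number of co-authors \(k_i\), the (optionally weighted) average can be computed as follows:

def avg_coauthors(k, weights=None):
    # Average number of co-authors over a set of publications; k[i] is the
    # number of co-authors of publication i. Optionally, weights[i] gives the
    # fraction to which publication i belongs to the set of interest.
    if weights is None:
        weights = [1.0] * len(k)
    return sum(w * ki for w, ki in zip(weights, k)) / sum(weights)

print(avg_coauthors([2, 3, 5]))               # plain average: 10/3
print(avg_coauthors([2, 3, 5], [1, 1, 0.5]))  # weighted average: 3.0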

Number of joint papers

The number of joint papers is a metric of the degree of collaboration between two co-authors and, as such, represents an indicator for collaboration. This is a relatively good metric for collaboration, and it brings in a network perspective on collaboration.

Measurement.

Let \(k_i\) represent the number of co-authors of publication \(i\), either at the level of individual authors, institutions or countries. From the perspective of a collaboration network, the strength of collaboration between a pair of co-authors is defined slightly differently (Perianes-Rodriguez et al., 2016). Let us focus on the collaboration between two co-authors \(a\) and \(b\). Let \(S_{ab}\) be the set of publications on which \(a\) and \(b\) collaborated. They then have \(n_{ab} = |S_{ab}|\) publications in common. However, perhaps \(a\) and \(b\) are only two of many co-authors on each of these publications, so that the actual collaboration between \(a\) and \(b\) is less intense than the overall number of joint publications would suggest. Consider a paper \(i\) that \(a\) and \(b\) collaborated on, which had \(k_i\) co-authors in total. Then \(a\) has collaborated with \(k_i - 1\) co-authors, of which \(b\) is one, and vice versa, \(b\) has also collaborated with \(k_i - 1\) co-authors, of which \(a\) is one. Hence, the fraction of collaboration of \(a\) with \(b\) on this paper is \(1/(k_i - 1)\). The total collaboration between \(a\) and \(b\) is then

\[w_{ab} = \sum_{i \in S_{ab}} \frac{1}{k_i - 1}.\]

When a fractionalisation is available, the fraction of publication \(i\) that belongs to \(a\), respectively \(b\), can be denoted by \(w_{ia}\) and \(w_{ib}\). In that case, the collaboration of \(a\) with \(b\) can be defined as \(w_{ib}/(1 - w_{ia})\), and hence the total collaboration as

\[w_{ab} = \sum_{i \in S_{ab}} \frac{w_{ib}}{1 - w_{ia}}.\]

Note that we indeed recover the previous equation when using \(w_{ia} = w_{ib} = 1/k_i\).

For more details, see Perianes-Rodriguez et al. (2016).
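
As a minimal sketch, assuming each publication is represented as a list of (co-author, fraction) pairs, the total collaboration weight between each ordered pair of co-authors can be computed as follows; with equal fractions \(w_{ia} = 1/k_i\), each joint paper contributes \(1/(k_i - 1)\), as above:

from collections import defaultdict
from itertools import permutations

def collaboration_weights(publications):
    # publications: list of publications, each given as a list of
    # (co-author, fraction) pairs. Returns the collaboration weight w_ab
    # of each co-author a with each co-author b.
    w = defaultdict(float)
    for pub in publications:
        for (a, w_ia), (b, w_ib) in permutations(pub, 2):
            # Collaboration of a with b on this publication: w_ib / (1 - w_ia).
            w[a, b] += w_ib / (1 - w_ia)
    return w

# Two joint papers: one with a third co-author c, one by a and b alone.
pubs = [[("a", 1 / 3), ("b", 1 / 3), ("c", 1 / 3)],
        [("a", 1 / 2), ("b", 1 / 2)]]
print(collaboration_weights(pubs)["a", "b"])  # 1/2 + 1 = 1.5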

Number/% of co-authored publications

The number or percentage of co-authored publications provides a good indicator of collaboration. However, it provides a coarser view of collaboration than the number of co-authors involved. In principle, a publication is considered co-authored if at least two co-authors are involved, but any co-authors beyond the second are immaterial to this metric. At the same time, this can also be considered an advantage: whereas the average number of co-authors is very sensitive to outliers, such as physics papers with hundreds of co-authors, this metric is more robust against them.

Measurement.

Let \(k_i\) represent the number of co-authors of publication \(i\), either at the level of individual authors, institutions or countries. Let \(c_i\) be \(1\) if \(k_i \geq 2\) and \(0\) otherwise, indicating whether a paper is collaborative. Let \(S\) be the set of publications of interest, with in total \(n = |S|\) publications. The total number of publications that are collaborative is then simply

\[\sum_{i \in S} c_i,\]

while the percentage of publications that are collaborative is simply

\[\frac{1}{n} \sum_{i \in S} c_i.\]

If we represent co-authorship at the institution level, this measures the percentage of institutional collaborations, and if we represent co-authorship at the country level, this measures the percentage of international collaborations. Although this may be evident, note that a publication authored by multiple authors is not counted as collaborative under this definition if the authors do not involve different institutions or different countries, respectively.
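
As a minimal sketch, assuming \(k_i\) counts the distinct authors, institutions or countries of publication \(i\), the number and percentage of collaborative publications follow directly:

def collaborative_share(k):
    # k[i] is the number of distinct co-authors (or institutions, or
    # countries) of publication i; it is collaborative if k[i] >= 2.
    c = [1 if ki >= 2 else 0 for ki in k]
    return sum(c), sum(c) / len(k)

# At the country level, this yields the percentage of international collaborations.
count, share = collaborative_share([1, 1, 2, 3])
print(count, share)  # 2 collaborative publications, 50%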

Data sources

OpenAlex

OpenAlex covers publications based on previously gathered data from Microsoft Academic Graph, but mostly relies on Crossref to index new publications. OpenAlex offers a user interface that is at the moment still under active development, an open API, and the possibility to download the entire data snapshot. The API is rate-limited, but there is the option of a premium account. Documentation for the API is available at https://docs.openalex.org/.

It is possible to retrieve the co-authors for a particular publication in OpenAlex, for example by using pyalex, a third-party Python package.

import pyalex as alx

# Identify yourself to the OpenAlex API (the so-called polite pool).
alx.config.email = "mail@example.com"

# Retrieve a single work by its OpenAlex identifier.
w = alx.Works()["W3128349626"]

# Author, institution and country information is nested in the authorships.
authors = [a["author"] for a in w["authorships"]]
institutions = [i for a in w["authorships"] for i in a["institutions"]]
countries = {i["country_code"] for a in w["authorships"] for i in a["institutions"]}

Based on this type of data, the above-mentioned metrics can be calculated. When large amounts of data need to be processed, it is recommended to download the full data snapshot and work with it directly.
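
As a sketch of how a set of publications could be retrieved through pyalex, the example below pages through the works of a single author (the author identifier is hypothetical) and computes the average number of co-authors:

import pyalex as alx

alx.config.email = "mail@example.com"

# Hypothetical OpenAlex author identifier; replace with the author of interest.
query = alx.Works().filter(author={"id": "A5023888391"})

k = []  # number of co-authors per publication
for page in query.paginate(per_page=200):
    for work in page:
        k.append(len(work["authorships"]))

print(sum(k) / len(k))  # average number of co-authors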

OpenAlex provides disambiguated authors, institutions and countries. Institutions are matched to the Research Organization Registry (ROR); a country might be available even if no specific institution is identified.

Dimensions

Dimensions is a bibliometric database that takes a comprehensive approach to indexing publications. It offers limited free access through its user interface. Paid access is available via its API and via its database on Google BigQuery. It also offers the possibility to apply for access to the API and/or Google BigQuery for research purposes. The API is documented at https://docs.dimensions.ai/dsl.

The database is closed access, and we therefore do not provide more details about API usage.

Scopus

Scopus is a bibliometric database with relatively broad coverage. Its data is closed and generally available only through a paid subscription. It does offer the possibility to apply for access for research purposes through the ICSR Lab. Some additional documentation of its metrics is available at https://www.elsevier.com/products/scopus/metrics, in particular in the Research Metrics Guidebook; documentation for the dataset available through the ICSR Lab is provided separately.

The database is closed access, and we therefore do not provide more details about API usage.

Web of Science

Web of Science is a bibliometric database that takes a more selective approach to indexing publications. Its data is closed and is available only through a paid subscription.

The database is closed access, and we therefore do not provide more details about API usage.

Known correlates

Collaboration is associated with citations (Larivière et al., 2015), but this is potentially driven by network effects (Schulz et al., 2018) and need not be a causal effect.

References

Hagen, N. T. (2008). Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis. PLoS One, 3(12), 1–7. https://doi.org/10.1371/journal.pone.0004021

Larivière, V., Gingras, Y., Sugimoto, C. R., & Tsou, A. (2015). Team size matters: Collaboration and scientific impact since 1900. J. Assoc. Inf. Sci. Technol., 66(7), 1323–1332. https://doi.org/10.1002/asi.23266

Perianes-Rodriguez, A., Waltman, L., & van Eck, N. J. (2016). Constructing bibliometric networks: A comparison between full and fractional counting. J. Informetr., 10(4), 1178–1195. https://doi.org/10.1016/j.joi.2016.10.006

Schulz, C., Uzzi, B., Helbing, D., & Woolley-Meza, O. (2018). A network-based citation indicator of scientific performance (arXiv:1807.04712). arXiv. http://arxiv.org/abs/1807.04712

Waltman, L., & van Eck, N. J. (2015). Field-normalized citation impact indicators and the choice of an appropriate counting method. J. Informetr., 9(4), 872–894. https://doi.org/10.1016/j.joi.2015.08.001