Collaboration intensity
History
Version | Revision date | Revision | Author |
---|---|---|---|
1.0 | 2023-11-20 | First draft | V.A. Traag |
Description
Collaboration is common in academia, and many academic activities, such as conferences and workshops, are aimed at stimulating intellectual exchange between scholars. Collaboration becomes visible in publications that are co-authored by multiple authors, institutions and countries, which has become increasingly common over the years. Here, we use co-authored publications as the basis for metrics that serve as indicators of collaboration.
Besides co-authorship, there are multiple other forms of collaboration that are less visible. People might have fruitful exchanges with peers that help foster certain research directions. Critical feedback and discussions at conferences might provide further input. Some collaborations may become visible in acknowledgments, and some might lead to co-authorship, but co-authorship represents only one part of collaboration.
Hence, not all collaborations necessarily translate into co-authorship. We might therefore expect the metric to be relatively precise (i.e. a co-authorship most likely signals a collaboration), but not necessarily very sensitive (i.e. not all collaborations might be visible through co-authorships). The sensitivity might be field-dependent: some fields have a rather stringent co-authorship culture, where only very substantial involvement in the research or the writing materialises in co-authorship, whereas other fields have a more lenient co-authorship culture, where smaller contributions might already translate into co-authorship. The precision is expected to be less field-dependent: co-authorship most likely signals collaboration in all fields.
Co-authorship can be measured in different ways and at different levels. It can be measured from the perspective of a set of publications, seeing how collaborative it is, or it can be measured from the perspective of an individual co-author. The first perspective simply leads to overall measures of collaboration, but the latter leads to measures of collaboration between various co-authors, bringing in a network perspective (Perianes-Rodriguez, Waltman, and Van Eck 2016).
The most relevant levels of co-authorship are the individual author level, the institutional level and the country level. Each of these levels may be measured in various ways. The simplest approach to collaboration assumes that each co-author has contributed equally to the work, resulting in so-called fractional publication counts (Waltman and Eck 2015). For instance, if a publication has three authors, each has a fraction of 1/3. If two of the authors are affiliated with a single institution, say institution A, that institution will have a weight of 2/3. If, in addition, the third author has two affiliations, one with the aforementioned institution A and one with institution B, we could count that author as belonging to institution A for ½, which adds 1/6 and brings the total for institution A to 5/6. Other forms of credit allocation are also possible (Hagen 2008).
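The fractional counting example above can be reproduced with a short sketch. It assumes that every author receives an equal share of the publication and that an author's share is split equally over their affiliations; the function name `institution_weights` and the input layout (one list of institution names per author) are illustrative choices, not part of any database API.

```python
from collections import defaultdict
from fractions import Fraction


def institution_weights(author_affiliations):
    """Fractional institution weights for one publication.

    author_affiliations: one list of institution names per author.
    Assumes equal credit per author, split equally over affiliations.
    """
    weights = defaultdict(Fraction)  # missing keys start at Fraction(0)
    author_share = Fraction(1, len(author_affiliations))
    for affiliations in author_affiliations:
        if not affiliations:
            continue  # an author without affiliation adds to no institution
        for institution in affiliations:
            weights[institution] += author_share / len(affiliations)
    return dict(weights)


# Three authors: two at institution A, the third affiliated with both A and B.
paper = [["A"], ["A"], ["A", "B"]]
weights = institution_weights(paper)
for institution, weight in sorted(weights.items()):
    print(institution, weight)
```

Exact fractions are used instead of floats so that the result can be compared directly with the hand calculation in the text.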
We will detail three metrics of collaboration: one focusing on the number of authors, institutes or countries involved in collaboration, one on the number of joint papers, and another focusing on the percentage of papers that show some form of collaboration.
Number of joint papers
The average number of joint papers is a metric of the degree of collaboration between two co-authors, and as such represents an indicator of collaboration. This is a relatively good metric for collaboration, and it brings in a network perspective on collaboration.
Measurement.
Let \(k_i\) represent the number of co-authors of publication \(i\), either at the level of individual authors, institutions or countries. From the perspective of a collaboration network, the strength of collaboration between a pair of co-authors is defined slightly differently (Perianes-Rodriguez, Waltman, and Van Eck 2016). Let us focus on the collaboration between two co-authors \(a\) and \(b\). Let \(S_{ab}\) be the set of publications on which \(a\) and \(b\) collaborated. They then have \(n_{ab} = |S_{ab}|\) publications in common. However, \(a\) and \(b\) may be only two of many co-authors on each of these publications, so that the actual collaboration between \(a\) and \(b\) is less intense than the number of joint publications would suggest. Consider a paper \(i\) that \(a\) and \(b\) collaborated on, which had \(k_i\) collaborators in total. Then \(a\) has collaborated with \(k_i - 1\) collaborators, of which \(b\) is one, and vice versa, \(b\) has collaborated with \(k_i - 1\) collaborators, of which \(a\) is one. Hence, the fraction of the collaboration of \(a\) on paper \(i\) that is with \(b\) is \(1/(k_i - 1)\). The total collaboration between \(a\) and \(b\) is then
\[w_{ab} = \sum_{i \in S_{ab}} \frac{1}{k_i - 1}.\]
When a fractionalisation is available, the fractions of publication \(i\) that belong to \(a\) and \(b\) can be denoted by \(w_{ia}\) and \(w_{ib}\) respectively. In that case, the collaboration of \(a\) with \(b\) on publication \(i\) can be defined as \(w_{ib}/(1 - w_{ia})\), and hence the total collaboration as
\[w_{ab} = \sum_{i \in S_{ab}} \frac{w_{ib}}{1 - w_{ia}}.\]
Note that we indeed uncover the previous equation when using \(w_{ia} = 1/k_i\).
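As a quick check of this claim, assume equal fractions \(w_{ia} = w_{ib} = 1/k_i\). The per-publication term \(w_{ib}/(1 - w_{ia})\) then reduces as follows:

\[\frac{w_{ib}}{1 - w_{ia}} = \frac{1/k_i}{1 - 1/k_i} = \frac{1/k_i}{(k_i - 1)/k_i} = \frac{1}{k_i - 1}.\]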
For more details, see (Perianes-Rodriguez, Waltman, and Van Eck 2016).
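The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each publication is represented as a dictionary mapping a co-author to its fraction \(w_{ia}\), the fractions of a publication sum to 1, and every fraction is strictly below 1. The names `collaboration_strength` and `equal_fractions` are hypothetical helpers, not taken from the cited paper.

```python
from fractions import Fraction


def equal_fractions(coauthors):
    """Give each of the k co-authors of a publication the fraction 1/k."""
    k = len(coauthors)
    return {coauthor: Fraction(1, k) for coauthor in coauthors}


def collaboration_strength(publications, a, b):
    """Sum w_ib / (1 - w_ia) over the publications shared by a and b.

    publications: list of dicts mapping co-author -> fraction w_ia.
    Assumes each fraction is strictly smaller than 1.
    """
    total = Fraction(0)
    for fractions_i in publications:
        if a in fractions_i and b in fractions_i:  # publication is in S_ab
            total += fractions_i[b] / (1 - fractions_i[a])
    return total


publications = [
    equal_fractions(["a", "b"]),       # k_i = 2, contributes 1/(2 - 1) = 1
    equal_fractions(["a", "b", "c"]),  # k_i = 3, contributes 1/(3 - 1) = 1/2
    equal_fractions(["a", "c", "d"]),  # b is absent, not part of S_ab
]
w_ab = collaboration_strength(publications, "a", "b")
print(w_ab)
```

With equal fractions the result is the same in both directions. With unequal fractions the value generally depends on the order of `a` and `b`, which follows directly from the formula.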
Datasources:
OpenAlex
OpenAlex covers publications based on previously gathered data from Microsoft Academic Graph, but mostly relies on Crossref to index new publications. OpenAlex offers a user interface that is at the moment still under active development, an open API, and the possibility to download the entire data snapshot. The API is rate-limited, but there are options of having a premium account. Documentation for the API is available at https://docs.openalex.org/.
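It is possible to retrieve the co-authors for a particular publication in OpenAlex, for example by using a third-party package for Python called pyalex.

import pyalex as alx

# Providing an email address is recommended when using the OpenAlex API.
alx.config.email = "mail@example.com"

# Retrieve a single work by its OpenAlex identifier.
w = alx.Works()["W3128349626"]

# Authors, institutions and countries are nested under "authorships".
authorships = w["authorships"]
authors = [a["author"] for a in authorships]
institutions = [i for a in authorships for i in a["institutions"]]
countries = [c for a in authorships for c in a["countries"]]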
Based on this type of data, the above-mentioned metrics can be calculated. When large amounts of data need to be processed, it is recommended to download the full data snapshot, and work with it directly.
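As a minimal sketch of such a calculation, the code below derives pairwise collaboration strengths \(1/(k_i - 1)\) from records shaped like OpenAlex works. The two sample records are hand-made and only mimic the `authorships` structure (with `author`, `institutions` and `countries` entries); the identifiers are made up, and the helper names `collaborators` and `pair_strengths` are hypothetical. As a simplification, \(k_i\) is taken to be the number of distinct authors, institutions or countries on a work, without any fractional weighting.

```python
from collections import defaultdict
from itertools import combinations


def collaborators(work, level):
    """Distinct collaborators on a work at the author, institution or country level."""
    found = set()
    for authorship in work.get("authorships", []):
        if level == "author":
            found.add(authorship["author"]["id"])
        elif level == "institution":
            found.update(inst["id"] for inst in authorship.get("institutions", []))
        elif level == "country":
            found.update(authorship.get("countries", []))
        else:
            raise ValueError(f"unknown level: {level}")
    return found


def pair_strengths(works, level):
    """Sum 1 / (k_i - 1) over the works shared by each pair of collaborators."""
    strengths = defaultdict(float)
    for work in works:
        members = sorted(collaborators(work, level))
        k = len(members)
        if k < 2:
            continue  # no collaboration at this level
        for a, b in combinations(members, 2):
            strengths[(a, b)] += 1 / (k - 1)
    return dict(strengths)


def authorship(author_id, institution_id, country):
    # Hand-made stand-in for one entry of an OpenAlex "authorships" list.
    return {"author": {"id": author_id},
            "institutions": [{"id": institution_id}],
            "countries": [country]}


works = [
    {"authorships": [authorship("A1", "I1", "NL"),
                     authorship("A2", "I1", "NL"),
                     authorship("A3", "I2", "DE")]},
    {"authorships": [authorship("A1", "I1", "NL"),
                     authorship("A3", "I2", "DE")]},
]

author_strengths = pair_strengths(works, "author")
institution_strengths = pair_strengths(works, "institution")
country_strengths = pair_strengths(works, "country")
print(author_strengths)
```

Pairs are stored once, keyed by the sorted pair of identifiers, which is sufficient here because the unweighted strength is symmetric.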
OpenAlex provides disambiguated authors, institutes and countries. The institutions are matched to the Research Organization Registry (ROR). Countries might be available even if no specific institution is available.
Dimensions
Dimensions is a bibliometric database that takes a comprehensive approach to indexing publications. It offers limited free access through its user interface. API access and access through its database via Google BigQuery can be arranged through payments. It also offers the possibility to apply for access to the API and/or Google BigQuery for research purposes. The API is documented at https://docs.dimensions.ai/dsl.
The database is closed access, and we therefore do not provide more details about API usage.
Scopus
Scopus is a bibliometric database with a relatively broad coverage. Its data is closed and is generally available only through a paid subscription. It does offer the possibility to apply for access for research purposes through the ICSR Lab. Some additional documentation of their metrics is available at https://www.elsevier.com/products/scopus/metrics, in particular in the Research Metrics Guidebook, with documentation for the dataset available through ICSR Lab being available separately.
The database is closed access, and we therefore do not provide more details about API usage.
Web of Science
Web of Science is a bibliometric database that takes a more selective approach to indexing publications. Its data is closed and is only available through a paid subscription.
The database is closed access, and we therefore do not provide more details about API usage.
Known correlates
Collaboration is associated with citations (Larivière et al. 2015), but this is potentially driven by network effects (Schulz et al. 2018) and need not be a causal effect.
References
Citation
@online{apartis2023,
author = {Apartis, S. and Catalano, G. and Consiglio, G. and Costas,
R. and Delugas, E. and Dulong de Rosnay, M. and Grypari, I. and
Karasz, I. and Klebel, Thomas and Kormann, E. and Manola, N. and
Papageorgiou, H. and Seminaroti, E. and Stavropoulos, P. and Stoy,
L. and Traag, V.A. and van Leeuwen, T. and Venturini, T. and
Vignetti, S. and Waltman, L. and Willemse, T.},
title = {PathOS - {D2.1} - {D2.2} - {Open} {Science} {Indicator}
{Handbook}},
date = {2023},
url = {https://handbook.pathos-project.eu/indicator_templates/quarto/2_academic_impact/collaboration_intensity.html},
doi = {10.5281/zenodo.8305626},
langid = {en}
}