PathOS - D2.1 - D2.2 - Open Science Indicator Handbook

S. Apartis; G. Catalano; G. Consiglio; R. Costas; E. Delugas; M. Dulong de Rosnay; I. Grypari; I. Karasz; Thomas Klebel; E. Kormann; N. Manola; H. Papageorgiou; E. Seminaroti; P. Stavropoulos; L. Stoy; V.A. Traag; T. van Leeuwen; T. Venturini; S. Vignetti; L. Waltman; T. Willemse

doi:10.5281/zenodo.8305626

Authors	Affiliation
E. Delugas	Centre for Industrial Studies
G. Catalano	Centre for Industrial Studies
S. Vignetti	Centre for Industrial Studies

Cost savings

History

Version	Revision date	Revision	Author
1.5	2024-04-26	Third draft	Delugas E., Catalano G.
1.4	2024-04-17	Peer review	V.A. Traag
1.3	2024-04-04	Second draft	Delugas E., Catalano G.
1.2	2023-09-11	Peer review	V.A. Traag
1.1	2023-07-04	Draft indicator template	Caputo A., Delugas E., Vignetti S.

Description

This cost saving indicator aims to capture the efficiency gains resulting from utilising OS resources (Fell 2019). Using OS resources, such as open repositories, can lead to significant savings in both time and money, because they reduce or eliminate certain costs associated with the scientific production process. However, familiarity with OS tools varies among users, so some initial investment in training or education may be necessary before these resources can be used effectively.

On the one hand, organisations can streamline their operations and achieve greater efficiency by optimising their use of OS resources, for example by reducing costs or completing activities in less time. For instance, enterprises might save time within their R&D departments, or even within other departments when the OS input is open software, an open tool or open data. The saving arises from reducing the time spent on given activities (e.g., working hours) or from avoiding storage and access costs. On the other hand, using OS may entail additional investment and operational costs. When researchers opt for open-access channels to publish their scientific findings, they need to tailor the content to meet the specific standards of these platforms, a step that is not necessary with “closed” publishing routes. For instance, when openly sharing data, a researcher must often adhere to specific standards that are necessary either to utilise the research outputs or to contribute effectively to their open sharing. At the same time, professional users of open access resources might need to learn how to exploit such OS resources, which requires specific skills to efficiently download, use, or interpret data.

Thus, the indicator is designed to measure the net effect of the investment cost needed to use OS resources effectively and the savings that materialise by using them. Accordingly, the proposed metrics can yield negative values under certain circumstances. The metrics often complement each other, as they capture the different sources of savings that may arise when professionals, enterprises, and researchers adopt OS practices. Since the boundaries among the benefits are often blurred, caution is needed when measuring the metrics to avoid overlapping benefits and double counting, i.e., including the same benefit in more than one metric.
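The net logic described above can be illustrated with a minimal Python sketch. The function name `net_cost_saving` and the figures are hypothetical and are not taken from the handbook; the sketch only shows that the result turns negative when the cost of using OS resources exceeds the savings.

```python
def net_cost_saving(gross_savings: float, investment_costs: float) -> float:
    """Net effect of using OS resources: savings minus the cost of using them."""
    return gross_savings - investment_costs


# Hypothetical yearly figures in EUR (illustrative only).
print(net_cost_saving(gross_savings=12_000.0, investment_costs=4_500.0))  # 7500.0
print(net_cost_saving(gross_savings=2_000.0, investment_costs=3_000.0))   # -1000.0
```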

Metrics

Access costs savings thanks to OS

The metric “access cost savings” captures the avoided costs of accessing knowledge or tools essential for generating knowledge within a closed environment. Researchers and enterprises or professionals can bypass the expenses typically associated with accessing proprietary or paid resources by utilising OS resources. This includes costs such as subscription fees, licensing fees, or pay-per-use charges that would be incurred in a traditional closed system. The extent of savings in relation to total cost-saving generated by OS depends on the beneficiaries’ needs.

Measurement.

The first suitable way of estimating the value of this metric is the avoided cost method. It relies on the principle that if the OS input were not available, certain costs would still be incurred to meet the needs or objectives the OS seeks to address. Specifically, these are the costs of accessing scientific journals, data and other research outputs. The quantification requires a market price for a similar closed-science service, which serves as the virtual price. No single data source exists from which to extrapolate an absolute value of the saved cost, since it varies depending on the beneficiaries’ demand and the type of OS considered.
The second option is to rely on stated preference techniques to elicit users’ willingness to pay for accessing different types of OS research output when market prices are not available or are not a good representation of the economic value of the savings. A Contingent Valuation survey of the users of an OS resource can be carried out to assess their willingness to pay and estimate the value of the benefit. Contingent Valuation is a technique used to estimate the value people place on goods, services, or resources without a clear market price, i.e., how much individuals are willing to pay for something that is not easily bought or sold. A similar option is the Choice Experiment, which instead allows for evaluating the single attributes of the OS input under evaluation. There are no existing data sources for this metric, since it depends on the case under analysis. These techniques are equally applicable to measuring labour cost savings (see section II.2). Indeed, although access cost savings and labour cost savings are theoretically two distinct metrics, users may struggle to distinguish the two. Thus, the analyst should avoid double counting by clearly stating whether the evaluation targets access cost savings or labour cost savings.
The third method is the Long Run Marginal Cost (LRMC). In this case, rather than valuing the money saved due to the free access, one values the saved production costs that would have been necessary to produce the equivalent research output in a scenario where the OS is unavailable. For example, the appraisal of an open database should be done by evaluating all the costs related to the in-house data collection and database-building process. To avoid double counting, this appraisal should strictly include only the costs of data collection and management to create the database (e.g., the labour cost equivalent to the time needed to collect the data) and should not include the costs associated with, e.g. the hardware to store the data, as those are related to the storage cost saving.

Existing methodologies

The avoided cost method

The avoided cost equation might change depending on which type of OS resource is under evaluation. However, a general formula can be summarised in the following way:

$$\text{Access cost savings} = \sum_{t=1}^{T} Q_t \times P_t \times N_t$$

Where:

$Q_t$ is the number of research outputs (journals, data, methods, tools), which might vary over time and is counted according to the type of OS (e.g., a single article, an annual access, etc.);
$P_t$ is the market price, which can vary depending on the type of OS and over time. For instance, an enterprise with an annual budget might opt to pay for single articles rather than the unlimited access usually chosen by research institutions. The same applies to data access, whose price might change depending on the type of data, the frequency, etc.;
$T$ is the period over which the benefit should be computed;
$N_t$ is the number of users involved, which might vary over time and can affect the price, e.g., access to a software might depend on the number of users.
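The avoided cost calculation described above can be sketched in Python. This is a minimal illustration which assumes that the per-period number of research outputs, the market price, and the number of users are multiplied and then summed over the evaluation period; the function name `avoided_access_cost` and all figures are hypothetical.

```python
from typing import Sequence


def avoided_access_cost(
    quantities: Sequence[float],
    prices: Sequence[float],
    users: Sequence[float],
) -> float:
    """Sum quantity * price * users over the periods of the evaluation span."""
    if not (len(quantities) == len(prices) == len(users)):
        raise ValueError("all inputs must cover the same number of periods")
    return sum(q * p * n for q, p, n in zip(quantities, prices, users))


# Hypothetical three-year case: articles read per user, price per article (EUR),
# and number of users in each year.
savings = avoided_access_cost(
    quantities=[40, 50, 60],
    prices=[30.0, 30.0, 35.0],
    users=[5, 5, 6],
)
print(savings)  # 26100.0
```

If the market price is already quoted for the whole organisation (e.g., a site licence), the user count for that period would be set to 1 so that users are not counted twice.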

Contingent Valuation or Choice Experiment for willingness to pay of the users to access OS resources

Contingent Valuation and Choice Experiment assess the willingness to pay of a user to access an OS research output. Contingent Valuation is a technique that allows uncovering the total economic value of a public good by exploiting a stated preference survey, i.e., asking people what economic value they attach to a public good. Choice Experiment is instead adopted when one wants to reveal the marginal value of the attributes of the good or service. Details on how to carry out these surveys will be provided in the CBA methodological note (Delugas, Catalano, and Vignetti 2023). However, a helpful reference to implement the methods is the “Contemporary Guidance for Stated Preference Studies” (Johnston et al. 2017). The validity of the assessment relies on strictly following the rules of application. Some bias in the given responses may arise, which might be mitigated by following the literature recommendations. When employing these survey techniques to estimate this metric, respondents might also include, in their appraisal, broader perceived benefits such as transaction cost savings or enablement gains due to OS. Thus, the analyst should minimise the risk of double counting by adopting precise question framing and clear delineation between overlapping benefits.

Once the willingness to pay per unit of time (e.g., yearly) has been obtained across the survey respondents, the value of the access cost savings equals:

$$\text{Access cost savings} = \overline{WTP} \times N \times T$$

Where:

$\overline{WTP}$ is the mean willingness to pay;
$N$ is the estimated number of users;
$T$ is the span of time chosen for the evaluation of the benefit.
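As a hedged illustration of this aggregation step, the sketch below averages stated willingness-to-pay answers and scales the mean by the estimated number of users and the evaluation span. The function name `wtp_access_cost_savings` and the survey answers are hypothetical; a real study would first clean the responses (e.g., protest answers) following the stated preference guidance cited above.

```python
from statistics import mean


def wtp_access_cost_savings(wtp_responses, n_users, n_periods):
    """Mean stated WTP per user and period, scaled to all users and periods."""
    if not wtp_responses:
        raise ValueError("at least one survey response is required")
    return mean(wtp_responses) * n_users * n_periods


# Hypothetical yearly WTP answers in EUR from five respondents.
responses = [100.0, 150.0, 0.0, 250.0, 200.0]
print(wtp_access_cost_savings(responses, n_users=1_000, n_periods=3))  # 420000.0
```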

Long Run Marginal Cost

The assessment of access cost savings through the LRMC involves estimating the cost of producing a research output in-house rather than accessing it via OS resources. The estimation of production costs really depends on the type of research output, but it is possible to generalise the components of the estimation:

An estimate of the number of research outputs for a specific period, which could include data sets, papers, etc;
The labour costs associated with the production of the research output, such as the time spent collecting data to build a database;
An estimate of any tangible or intangible assets required for production;
Other additional costs tied to the production process separate from labour and storage cost savings.

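Combining these components, the access cost savings can be written as:

$$\text{Access cost savings} = \sum_{t=1}^{T} Q_t \times C_t$$

Where:

$Q_t$ is the quantity of research outputs to be produced, which might vary over time;
$C_t$ is the production cost expressed per measurement unit of the research output;
$T$ is the span of time over which the benefit should be computed, which can be expressed in any unit of time.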
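The components listed above can be combined in a short Python sketch. It assumes that the unit production cost is the sum of labour, asset, and other costs per research output, and that storage costs are left out to avoid double counting; the class `PeriodCosts`, its field names, and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeriodCosts:
    """In-house production inputs for one period (illustrative field names)."""
    outputs: float            # number of research outputs produced in the period
    labour_per_output: float  # labour cost per output, e.g. data collection time
    assets_per_output: float  # tangible or intangible asset cost per output
    other_per_output: float   # other production costs, excluding storage


def lrmc_access_cost_savings(periods):
    """Sum outputs * unit production cost over all periods."""
    total = 0.0
    for p in periods:
        unit_cost = p.labour_per_output + p.assets_per_output + p.other_per_output
        total += p.outputs * unit_cost
    return total


# Hypothetical two-year case, costs in EUR.
periods = [
    PeriodCosts(outputs=2, labour_per_output=8_000.0,
                assets_per_output=1_000.0, other_per_output=500.0),
    PeriodCosts(outputs=3, labour_per_output=8_200.0,
                assets_per_output=1_000.0, other_per_output=500.0),
]
print(lrmc_access_cost_savings(periods))  # 48100.0
```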

Labour cost savings given the availability of OS resources

The labour cost saving metric aims to capture the net effect generated by the availability of OS on the working hours, which is expressed in the personnel cost equivalent for time saved.^[1] For example, for a workday saved for a single researcher, the labour cost saving would mirror the daily salary. The savings may happen because of the availability of OS resources that facilitate the reduction of research output duplications (e.g., codes, papers, data) and improve professionals’ productivity by speeding up their work, allowing for task automation. For example, the availability of open data avoids collecting the same data more than once; open code saves time by reducing the need to write code (i.e., programming) from scratch. Similarly, data mining techniques automate information collection, which would otherwise require manual effort. Working time savings also occur due to a potential decrease in transaction costs, as closed environments require more time to obtain information or involve more complex procedures. Labour cost-saving is a helpful metric to gauge the production efficiency gains facilitated by OS resources, as it assesses the variation of one of the two components of the standard productivity indicator.

Measurement.

The avoided cost method is a suitable way of quantifying labour cost savings, as it is flexible enough to adapt to several contexts. Here, the labour cost saving is estimated by forecasting or measuring the time saved by each worker thanks to the OS input and valuing it at the cost of labour. The method can be applied whenever labour costs per employee are available and the estimate of the time saved is reliable. Since time savings and personnel costs are closely tied to the beneficiaries’ roles, the main obstacle to measurement is the unavailability of data. The best data source will always be data provided by the direct beneficiaries. For salary data, however, some private companies operating in the human resources sector, e.g., Glassdoor and PayScale, might provide figures. Alternatively, wage data are usually available from national or international statistical institutes, e.g., Eurostat.
Another way of assessing the metric value is estimating the willingness-to-pay for time-saving among the OS users by means of the stated preference techniques. There is a substantial body of literature on the application of CE for the appraisal of saved time (e.g. Mahieu et al. (2014); Antoniou, Matsoukis, and Roussi (2007)). However, CV might also be adopted since there are no strict rules dictating the choice of the survey; rather, it depends on the context (Johnston et al. 2017). For instance, some weaknesses of the CE approach concern the need to collect data from a large sample to ensure statistical robustness and the respondents’ difficulty in selecting and ranking the options proposed in the survey. Please refer to access cost saving (section II.1) for additional details on these methods.

Existing methodologies

The avoided cost method

With the “avoided cost” method, organisations should provide an estimate of the time required (e.g., number of hours) to execute a specific task using OS resources, alongside the time it would have taken without OS resources. This allows the net savings to be calculated. Naturally, the scenario involving OS resources should also account for the time necessary to use these resources effectively. More time may be needed to use OS resources, for example, due to a lack of skills or specific knowledge on how to use them. To translate the net time savings into costs, the salary of the researcher undertaking the assessed tasks could be used. When this is not possible, the salary of the researcher or other employee can be approximated from their characteristics using, for instance, Glassdoor or PayScale data. When the salary associated with a specific job title cannot be retrieved either, other options exist that involve different levels of approximation. One route is to use national or European-level micro data on wages, which, when available, are also a close approximation of the actual salary. When this option is unavailable too, the salary can be approximated using country-level average salary data, which can be retrieved from national statistical institutes, though these sources involve a higher degree of approximation. When using average values, one could adjust the figures using existing estimates of wage differentials among sectors to reduce measurement error. For cross-country assessments, average salary data and labour costs can be retrieved from international statistical databases on earnings and wages (e.g., OECD or ILO databases).
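The order of preference among wage sources described above can be sketched as a simple fallback in Python. The function `select_hourly_wage`, its parameter names, and the multiplicative form of the sector adjustment are assumptions made for illustration; the handbook does not prescribe a specific adjustment formula.

```python
from typing import Optional


def select_hourly_wage(
    actual: Optional[float] = None,
    job_title_estimate: Optional[float] = None,
    microdata_estimate: Optional[float] = None,
    country_average: Optional[float] = None,
    sector_differential: float = 0.0,
) -> float:
    """Return the hourly wage from the most precise source that is available."""
    # Preference order: beneficiary data, job-title data, wage micro data.
    for candidate in (actual, job_title_estimate, microdata_estimate):
        if candidate is not None:
            return candidate
    if country_average is not None:
        # Assumed adjustment: scale the country average by a relative
        # sector wage differential (0.25 means 25% above average).
        return country_average * (1.0 + sector_differential)
    raise ValueError("no wage source available")


# Hypothetical values in EUR per hour.
print(select_hourly_wage(job_title_estimate=32.0, country_average=20.0))   # 32.0
print(select_hourly_wage(country_average=20.0, sector_differential=0.25))  # 25.0
```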

In practice, the metric “labour-cost savings” is obtained by the following:

$$\text{Labour cost savings} = \sum_{t=1}^{T} \sum_{i=1}^{N} H_{i,t} \times W_{i,t}$$

Where:

$H_{i,t}$ is the individual net time saved;
$W_{i,t}$ is the individual hourly wage;
$T$ is the span of time over which the benefit should be computed;
$N$ is the number of professionals involved in the time saving.
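A minimal Python sketch of this calculation follows. It assumes that each professional's net time saved is the time needed without OS resources minus the time needed with them, and that hours and wages stay constant across periods; the function name `labour_cost_savings`, the dictionary keys, and the figures are hypothetical.

```python
def labour_cost_savings(workers, n_periods):
    """Value the net hours saved per period at each worker's hourly wage."""
    total = 0.0
    for w in workers:
        net_hours = w["hours_without_os"] - w["hours_with_os"]
        total += net_hours * w["hourly_wage"]
    # Simplifying assumption: the same saving recurs in every period.
    return total * n_periods


workers = [
    {"hours_without_os": 120.0, "hours_with_os": 80.0, "hourly_wage": 35.0},
    # Net time saved can be negative, e.g. when learning the tool takes longer.
    {"hours_without_os": 60.0, "hours_with_os": 70.0, "hourly_wage": 28.0},
]
print(labour_cost_savings(workers, n_periods=2))  # 2240.0
```

The second worker shows how time spent learning to use OS resources lowers the total and can, in principle, make the metric negative.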

Choice Experiment (CE) or Contingent Valuation (CV) – WTP for the time saved by accessing OS outputs

Once the willingness to pay for a unit of time saved (e.g., the value attached to an hour saved) has been elicited across the survey respondents, the value of the labour cost savings equals:

$$\text{Labour cost savings} = \overline{VT} \times N \times T$$

Where:

$\overline{VT}$ is the average value of time attached to the chosen unit of time;
$N$ is the estimated number of employees that will save time thanks to OS input;
$T$ is the span of time chosen for the evaluation of the benefit.

Transaction cost savings given the availability of OS inputs

The metric “transaction cost savings” denotes the reduction in time that researchers and professionals experience when they bypass the need to navigate copyright agreements, engage in specific data or protocol access negotiations, and deal with other research outputs, all thanks to OS resources. Open resources like data, protocols, software, etc., can significantly cut down the time usually spent establishing access through agreements and procedures by offering universally shared and harmonised protocols for openness and access. Conversely, closed systems often require more bespoke and time-intensive methods to access information, involving the navigation of complex databases or compliance with detailed procedures.

Measurement.

This metric should be evaluated in the same way as the labour cost savings. Despite being related to both labour and access cost savings, given the weight of the transaction costs when research collaborations involve hundreds of facilities (Lee 2015), it is worth keeping them separate from either labour or access cost savings. For this reason, the avoided cost method is the suitable way of quantifying transaction cost savings. In this case, the avoided cost allows for estimating the transaction-cost savings by forecasting or measuring the time saved by each worker thanks to the OS resources and evaluating it using the equivalent salary. Please refer to the avoided cost method presented above for the labour cost savings.

Existing data sources for labour and transaction cost savings:

None of the listed data sources is intended to provide labour cost savings directly; rather, each of them is a potential source of the individual wage data necessary to measure the metric following the specific methodologies described above.

Glassdoor

Glassdoor is an American company that anonymously collects information and reviews on companies and their employees, including the paid salaries, at the international level. They provide salary data, including the share of bonuses or other components, taking into account variables that might influence the employee salary determination, e.g., company, country, and role. The data provided can be directly used in the avoided cost methodology. Glassdoor is accessible after having contributed to the data collection.

The primary limitation of this type of data lies in its collection method, which does not ensure statistical representativeness. Workers incentivised to provide their wage data might do so out of dissatisfaction, leading to potentially lower-than-average wages being reported. The data collection mechanism may also introduce distortion by encouraging workers to input potentially inaccurate information to access the available website statistics. On the other hand, there is no public source that offers a comparable level of detail regarding individual salaries associated with specific job titles in the private sector. This source is one of the closest reflections of the actual salaries for researchers and other professionals in the private sector who may potentially use OS resources.

PayScale

This data source is not intended to provide labour cost savings directly; rather, it is one of the potential sources for individual wage data that is necessary to measure the metric.

PayScale is an American company that anonymously collects information on individual salaries. They provide salary data, including the share of bonuses or other components, taking into account variables that might influence the employee salary determination, e.g., company, country, and role. The data provided can be directly used in the avoided cost methodology. The company offers some insights into their quality control measures, integration of different sources, and methods for processing the data before distribution. PayScale is freely accessible after having contributed to the data collection.

The primary limitation of this type of data lies in its collection method, which does not ensure statistical representativeness. Workers incentivised to provide their wage data might do so out of dissatisfaction, leading to potentially lower-than-average wages being reported. The data collection mechanism may also introduce distortion by encouraging workers to input potentially inaccurate information to access the available website statistics. On the other hand, there is no public source that offers a comparable level of detail regarding individual salaries associated with specific job titles in the private sector. This source is one of the closest reflections of the actual salaries for researchers and other professionals in the private sector who may potentially use OS resources.

Structure of Earnings Survey

The “structure of earnings survey” (SES) is conducted in EU countries, candidate countries, and European Free Trade Association (EFTA) countries by Eurostat. The survey aims to provide accurate data comparable across countries and over time on earnings. It is a large sample survey of enterprises on the relationships between the level of pay and individual characteristics of employees (sex, age, occupation, length of service, highest educational level attained, etc.) and those of their employer (economic activity, size, and location of the enterprise).

Compared to privately collected data, the SES is less detailed and often less up to date, as it is conducted every four years. On the other hand, it ensures statistical representativeness and allows for cross-country comparisons.

Similar and even more detailed surveys might be available at the national level from the national statistical institutes in several countries.

OECD and ILO data on average wages

International organisations, such as the OECD and ILO, provide data on average wages per country, which are also associated with certain personal characteristics. Although these data are openly accessible and regularly updated, they are not usually the best option for calculating this metric, since the result would be highly approximate. However, in the absence of other data, they can be adjusted by considering wage differentials between different sectors and other individual characteristics to obtain a closer approximation of the individual salary needed.

Similar and even more detailed average data might be available at the national level from the national statistical institutes in several countries.

Savings for data storage given the availability of OS inputs

The metric “savings for data storage” quantifies the net cost savings achieved by not allocating production resources to data storage. While the use of OS resources may increase the need for storage, these inputs can also reduce or eliminate the need for paid storage. This is because open access facilitates easy referencing at any time, and open repositories can serve as alternatives to standard paid or in-house physical data storage solutions. The extent of savings in relation to total cost-saving depends on the reliance of the potential beneficiary on data storage.

Measurement.

The avoided cost method is one of the suitable ways of assessing these types of savings. It is particularly helpful since it is flexible enough to adapt to several contexts. In the case of data storage savings, the avoided cost method focuses on identifying and quantifying the costs that are avoided or reduced due to the availability of the OS resources that allow for substituting private data storage with open repositories or eliminating the need to store research outputs. The avoided cost method relies on the principle that if the OS resources were not implemented, certain costs would still be incurred to meet the needs or objectives OS seeks to address, such as storing data and other research outputs. The quantification implies forecasting the storage space needed over a span of time and the existence of the market price of a data storage service. No data source exists to extrapolate an absolute value of the savings since it varies depending on the beneficiaries’ needs. The approach to take would be to conduct market research that allows for comparing the paid alternatives in terms of features with the OS to choose the price of the most similar product. An example of a data storage fee can be retrieved, e.g., from the Dryad service.
However, when a similar storage service is not available on the market, the benefit can be evaluated by adopting the Long Run Marginal Cost (LRMC) method, which measures the cost of increasing production by one additional unit, or the cost saved by reducing production by one unit of output, holding the production levels of all other goods and services constant. This means that rather than valuing the money saved due to the reduced costs of buying the storage service, one values the saved production costs that would have been necessary to produce the equivalent in-house storage in a scenario where the OS is unavailable. The LRMC of storage may include costs such as hardware, software development, personnel, and so on, and should carefully exclude all costs related to other activities, e.g., collecting the data.

Existing data sources:

Dryad

Dryad works well as an example of the market price of data storage. Their data fee is usually billed quarterly, decreases with the volume of the data, and takes effect after the 10th data publication since any user has to pay an annual membership fee. The fee ranges from 135 USD for 11-100 datasets to 55 USD for 500+ datasets.

Existing methodologies

The avoided cost method

To assess the value of the saving, one should retrieve the market price of a similar storage service by conducting market research, together with an estimate of the storage space needed over a specific period of time. When a market price for a perfect substitute storage service is unavailable, one can use the price of a product that is as similar as possible to a perfect substitute, or any other reasonable solution representing a virtual price for the storage. Then, the saving metric is equal to:

$$\text{Storage cost savings} = \sum_{t=1}^{T} Q_t \times P_t$$

Where:

$Q_t$ is the quantity of storage space needed, expressed per unit of time (e.g., 1 GB yearly, semestral, quarterly), which might vary over time;
$P_t$ is the market price, which can vary depending on the total quantity and over time;
$T$ is the span of time over which the benefit should be computed, which can be expressed in any unit of time.
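To illustrate how a quantity-dependent market price enters this calculation, the sketch below applies a volume-discount price schedule to the storage needed in each period and sums the results. The schedule in `storage_price_per_gb`, the function names, and the figures are entirely hypothetical and do not reflect the pricing of any real service.

```python
def storage_price_per_gb(total_gb: float) -> float:
    """Hypothetical volume-discount schedule, in EUR per GB per year."""
    if total_gb <= 100:
        return 0.5
    if total_gb <= 1_000:
        return 0.375
    return 0.25


def storage_cost_savings(gb_per_year):
    """Sum storage needed * market price over the years of the evaluation span."""
    return sum(gb * storage_price_per_gb(gb) for gb in gb_per_year)


# Hypothetical storage needs growing over three years (GB).
print(storage_cost_savings([80, 400, 2000]))  # 690.0
```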

Long Run Marginal Cost

The assessment of storage cost savings through LRMC entails estimating the cost of developing an in-house storage system. To estimate production costs, the following factors should be considered:

An approximation of the storage space needed for a specific period of time;
The initial hardware costs for physically storing the data, along with any significant hardware upgrades required over the period considered;
The labour costs associated with installing and maintaining the storage system, which include software development, upgrades, and operational support;
Other additional costs that are part of the production process.

The storage cost savings can then be written as:

$$\text{Storage cost savings} = \sum_{t=1}^{T} Q_t \times C_t$$

Where:

$Q_t$ is the quantity of storage space needed, expressed per unit of time (e.g., 1 GB yearly, semestral, quarterly), which might vary over time;
$C_t$ is the production cost expressed per unit of storage space;
$T$ is the span of time over which the benefit should be computed, which can be expressed in any unit of time.

Known correlates

Cost savings are directly linked to the metrics of “Innovation output” and “Industry adoption of research findings”. The rationale is that the savings accrued in both time and finances could potentially be redirected towards R&D investments. Over time, this reallocation could result in a significant increase in R&D productivity and innovation. Furthermore, the cost savings over time might trigger “Economic growth of companies” in terms of variations in productivity and assets. For instance, the money saved from lower access and storage costs can be reinvested in the company. This reinvestment could go towards R&D – which also relates to the innovation output indicator – and expanding operational capacity, which can drive revenue growth and increase the company’s assets and innovation capability, both key ingredients of company growth. Similarly, by saving time, companies can achieve more with the same or fewer resources. This means that businesses can offer more products or services or improve the quality of their offerings without a corresponding increase in costs. Over time, this contributes to economic growth by enhancing the company’s competitive edge and market share. The direct involvement of efficiency improvements in the economic growth of companies is also evident in asset optimisation. By maximising the utility of existing assets (mostly intangibles), companies can achieve higher returns on investment (ROI) over time.

[1] The market wages are typically biased due to several factors related to labour product markets. However, when considering researchers’ wages, it is often argued that they are a good approximation of the social cost of labour given the assumption (Guide to Cost-Benefit Analysis of Investment Projects: Economic Appraisal Tool for Cohesion Policy 2014-2020 2015).

References

Antoniou, Constantinos, Evangelos Matsoukis, and Penelope Roussi. 2007. “A Methodology for the Estimation of Value-of-Time Using State-of-the-Art Econometric Models.” Journal of Public Transportation 10 (3): 119. https://www.sciencedirect.com/science/article/pii/S1077291X22002995.

Delugas, Erica, Gelsomina Catalano, and Silvia Vignetti. 2023. “PathOS - D4.1 Methodological note on the CBA of open science practices,” December. https://zenodo.org/records/10277642.

Fell, Michael J. 2019. “The Economic Impacts of Open Science: A Rapid Evidence Assessment.” Publications 7 (3): 46. https://www.mdpi.com/2304-6775/7/3/46.

Guide to Cost-Benefit Analysis of Investment Projects: Economic Appraisal Tool for Cohesion Policy 2014-2020. 2015. Luxembourg: European Union.

Johnston, Robert J., Kevin J. Boyle, Wiktor (Vic) Adamowicz, Jeff Bennett, Roy Brouwer, Trudy Ann Cameron, W. Michael Hanemann, et al. 2017. “Contemporary Guidance for Stated Preference Studies.” Journal of the Association of Environmental and Resource Economists 4 (2): 319–405. https://doi.org/10.1086/691697.

Lee, Wen Hwa. 2015. “Open Access Target Validation Is a More Efficient Way to Accelerate Drug Discovery.” PLoS Biology 13 (6): e1002164. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002164.

Mahieu, Pierre-Alexandre, Henrik Andersson, Olivier Beaumais, Romain Crastes, and François-Charles Wolff. 2014. “Is Choice Experiment Becoming More Popular Than Contingent Valuation? A Systematic Review in Agriculture, Environment and Health.” FAERE-French Association of Environmental and Resource Economists Working Papers, no. 2014.12. https://ideas.repec.org/p/fae/wpaper/2014.12.html.

Citation

BibTeX citation:

@online{apartis2023,
  author = {Apartis, S. and Catalano, G. and Consiglio, G. and Costas,
    R. and Delugas, E. and Dulong de Rosnay, M. and Grypari, I. and
    Karasz, I. and Klebel, Thomas and Kormann, E. and Manola, N. and
    Papageorgiou, H. and Seminaroti, E. and Stavropoulos, P. and Stoy,
    L. and Traag, V.A. and van Leeuwen, T. and Venturini, T. and
    Vignetti, S. and Waltman, L. and Willemse, T.},
  title = {PathOS - {D2.1} - {D2.2} - {Open} {Science} {Indicator}
    {Handbook}},
  date = {2023},
  url = {https://handbook.pathos-project.eu/indicator_templates/quarto/4_economic_impact/cost_savings.html},
  doi = {10.5281/zenodo.8305626},
  langid = {en}
}

For attribution, please cite this work as:

Apartis, S., G. Catalano, G. Consiglio, R. Costas, E. Delugas, M. Dulong de Rosnay, I. Grypari, et al. 2023. “PathOS - D2.1 - D2.2 - Open Science Indicator Handbook.” Zenodo. 2023. https://doi.org/10.5281/zenodo.8305626.