Digital Curation & Digital Preservation

"Recognising Open Research Data in Research Assessment: Overview of Practices and Challenges"

The literature review aims at identifying content and key issues regarding the assessment of ORD practices nationally and internationally. It starts from the observation that research assessment needs to be reformed as it is currently biased towards scientific publications. Internationally, discussions and projects thereon have emerged. To contextualise recORD and this literature review, we first describe international and Swiss initiatives for reforming research assessment and how they include ORD recognition. The remainder of the review follows an innovative methodology as it identifies first core values in responsible research assessment, and second existing frameworks, to thirdly derive propositions to keep in mind when developing concrete ORD-specific research assessment recommendations. In a final section, the review presents further readings and useful weblinks on the recognition of ORD in research assessment.

https://zenodo.org/doi/10.5281/zenodo.11060206

"The Products and Multi-Disciplinarity of Data-Centric Tasks: Influences on Data Searchers’ Behaviors and Cognition"

The study sought to answer the following research questions:

RQ 1: How do data-centric tasks with different products and levels of multi-disciplinarity affect data search behaviors?

RQ 2: How do data-centric tasks with different products and levels of multi-disciplinarity affect the utilization of different cognitive systems?

https://doi.org/10.1016/j.lisr.2024.101302

"An Analysis of the Effects of Sharing Research Data, Code, and Preprints on Citations"

In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122’000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% on average. However, we do not find a significant citation advantage for sharing code.

https://arxiv.org/abs/2404.16171

"Health Data Sharing Attitudes Towards Primary and Secondary Use of Data: A Systematic Review"

Of 2109 studies identified through our search, 116 were included in the qualitative synthesis, yielding a total of 228,501 participants and various types of HD represented: person-generated HD (n = 17 studies and 10,771 participants), personal HD in general (n = 69 studies and 117,054 participants), biobank data (n = 7 studies and 27,073 participants), genomic data (n = 13 studies and 54,716 participants), and miscellaneous data (n = 10 studies and 18,887 participants). The majority of studies had a moderate level of quality (83 [71.6%] of 116 studies), but varying levels of quality were observed across the included studies. Overall, studies suggest that sharing intentions for primary purposes were observed to be high regardless of data type, and it was higher than sharing intentions for secondary purposes. Sharing for secondary purposes yielded variable findings, where both the highest and the lowest intention rates were observed in the case of studies that explored sharing biobank data (98% and 10%, respectively). Several influencing factors on sharing intentions were identified, such as the type of data recipient, data, consent. Further, concerns related to data sharing that were found to be mutual for all data types included privacy, security, and data access/control, while the perceived benefits included those related to improvements in healthcare. Findings regarding attitudes towards sharing varied significantly across sociodemographic factors and depended on data type and type of use. In most cases, these findings were derived from single studies and therefore warrant confirmations from additional studies. . . .

Sharing health data is a complex issue that is influenced by various factors (the type of health data, the intended use, the data recipient, among others) and these insights could be used to overcome barriers, address people’s concerns, and focus on spreading awareness about the data sharing process and benefits.

https://doi.org/10.1016/j.eclinm.2024.102551

"Seek and You May (Not) Find: A Multi-Institutional Analysis of Where Research Data Are Shared"

Research data sharing has become an expected component of scientific research and scholarly publishing practice over the last few decades, due in part to requirements for federally funded research. As part of a larger effort to better understand the workflows and costs of public access to research data, this project conducted a high-level analysis of where academic research data is most frequently shared. To do this, we leveraged the DataCite and Crossref application programming interfaces (APIs) in search of Publisher field elements demonstrating which data repositories were utilized by researchers from six academic research institutions between 2012–2022. In addition, we also ran a preliminary analysis of the quality of the metadata associated with these published datasets, comparing the extent to which information was missing from metadata fields deemed important for public access to research data. Results show that the top 10 publishers accounted for 89.0% to 99.8% of the datasets connected with the institutions in our study. Known data repositories, including institutional data repositories hosted by those institutions, were initially lacking from our sample due to varying metadata standards and practices. We conclude that the metadata quality landscape for published research datasets is uneven; key information, such as author affiliation, is often incomplete or missing from source data repositories and aggregators. To enhance the findability, interoperability, accessibility, and reusability (FAIRness) of research data, we provide a set of concrete recommendations that repositories and data authors can take to improve scholarly metadata associated with shared datasets.

https://doi.org/10.1371/journal.pone.0302426
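
As a rough illustration of the kind of tally the abstract above describes (the share of datasets held by the top publishers, and how often a metadata field is missing), here is a minimal Python sketch. It is not the authors' code: the record layout, the field names `publisher` and `affiliation`, and the sample values are all assumptions made for illustration, and no DataCite or Crossref API calls are shown.

```python
from collections import Counter

def top_publisher_share(records, n=10):
    """Fraction of records whose publisher is among the n most common publishers."""
    # Hypothetical record layout: each record is a dict with a "publisher" key.
    publishers = [r.get("publisher") or "(missing)" for r in records]
    if not publishers:
        return 0.0
    top = Counter(publishers).most_common(n)
    return sum(count for _, count in top) / len(publishers)

def missing_rate(records, field):
    """Fraction of records where the given metadata field is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get(field))
    return missing / len(records)

# Invented sample records, for illustration only.
sample = [
    {"publisher": "Zenodo", "affiliation": "Example University"},
    {"publisher": "Zenodo", "affiliation": ""},
    {"publisher": "Dryad", "affiliation": "Example University"},
    {"publisher": None, "affiliation": None},
]

print(top_publisher_share(sample, n=1))
print(missing_rate(sample, "affiliation"))
```

Treating a missing publisher as its own "(missing)" bucket keeps the denominator equal to the full sample, which is one possible design choice; dropping such records instead would raise the reported share.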

"Data Services at the Academic Library: A Natural History of Horses and Unicorns"

Methods: We used a web-based inventory of 25 academic libraries at U.S. Research 1 (R1) Carnegie institutions to assess the state of data services at university libraries. We categorized and quantified services, and tested for an effect of library resourcing on the size of library data service portfolios.

Results: Support for data management and geospatial services was relatively widespread, with increasing support in areas of data analyses and data visualization. There was significant variation among services in the modality in which they were offered (web, consult, instruction) and library resourcing had a significant effect on the number of data services a library offered.

https://doi.org/10.7191/jeslib.780

Research Data Management for Arts and Humanities: Integrating Voices of the Community

Chapter one gives an overview of the European and national policy environment which has given rise to research data management and sharing mandates, as well as the institutional support structures around them. In chapter two, which is dedicated to implementation and everyday practice, the authors of this publication share how their institutions have developed capacities to accommodate data support professions, and also share their own career paths leading to such roles. After the first two chapters have set the stage and recounted the authors’ reflections on these new roles, the rest of the publication highlights and discusses some of the key domain-specificities of research data management in the Arts and Humanities. Chapter 3.1 reflects on the implications of the lack of consensus around the notion of data within the Arts and Humanities domain through a case study of digital critical editions. Chapter 3.2 addresses the challenges around the, essentially, multilingual character of arts and humanities data, with special focus on multilingual vocabularies and thesauri. Chapter 3.3 provides support for research scenarios where open data sharing is either impossible or is difficult due to legal and ethical limitations, and navigates the complexities of intellectual property and the application of regulatory frameworks, including restrictions on text and data mining, and authentication and authorisation in an open world. Clearly, the discourse on data sharing cannot be complete without discussing the current limitations within research assessment and rewards criteria, nor highlighting initiatives which aim to incentivise and reward data sharing in the working/professional contexts of the Working Group’s members. A discussion on rewards can be found in Chapter 3.4. 
Chapter 3.5 addresses one of the most widely shared data management challenges within the domain and brings together use cases concerning successful collaborations between cultural heritage institutions and arts and humanities research teams. Finally, Chapter 3.6 showcases good practices in long-term archiving.

https://tinyurl.com/ycxbrh33

"Assessing Quality Variations in Early Career Researchers’ Data Management Plans"

This paper aims to better understand early career researchers’ (ECRs’) research data management (RDM) competencies by assessing the contents and quality of data management plans (DMPs) developed during a multi-stakeholder RDM course. We also aim to identify differences between DMPs in relation to several background variables (e.g., discipline, course track). The Basics of Research Data Management (BRDM) course has been held in two multi-faculty, research-intensive universities in Finland since 2020. In this study, 223 ECRs’ DMPs created in the BRDM of 2020 – 2022 were assessed, using the recommendations and criteria of the Finnish DMP Evaluation Guide + General Finnish DMP Guidance (FDEG). The median quality of DMPs appeared to be satisfactory. The differences in rating according to FDEG’s three-point performance criteria were statistically insignificant between DMPs developed in separate years, course tracks or disciplines. However, using content analysis, differences were found between disciplines or course tracks regarding DMP’s key characteristics such as sharing, storing, and preserving data. DMPs that contained a data table (DtDMPs) also differed highly significantly from prose DMPs. DtDMPs better acknowledged the data handling needs of different data types and improved the overall quality of a DMP. The results illustrated that the ECRs had learned the basic RDM competencies and grasped their significance to the integrity, reliability, and reusability of data. However, more focused, further training to reach the advanced competency is needed, especially in areas of handling and sharing personal data, legal issues, long-term preserving, and funders’ data policies. Equally important to the cultural change when RDM is an organic part of the research practices is to merge research support services, processes, and infrastructure into the research projects’ processes. Additionally, incentives are needed for sharing and reusing data.

https://doi.org/10.2218/ijdc.v18i1.873

Digital Scholarship and DigitalKoans Are Now 19 Years Old

Digital Scholarship and DigitalKoans were established on 4/20/2005. Digital Scholarship provides information and commentary about artificial intelligence, digital copyright, digital curation, open access, research data management, scholarly communication, and other digital information issues. Digital Scholarship is an open access noncommercial publisher. All of its publications are currently under a Creative Commons Attribution License.

DigitalKoans has published over 16,200 posts. Since 2008, over 5,600 job ads have been posted, with slightly over 4,000 of them for digital library jobs.

Digital Scholarship has published the following books and book supplements: the Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (2005; published with the Association of Research Libraries), the Scholarly Electronic Publishing Bibliography: 2008 Annual Edition (2009), Digital Scholarship 2009 (2010), Transforming Scholarly Publishing through Open Access: A Bibliography (2010), the Scholarly Electronic Publishing Bibliography 2010 (2011), the Digital Curation and Preservation Bibliography 2010 (2011), the Institutional Repository and ETD Bibliography 2011 (2011), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works (2012), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement (2013), and the Research Data Curation and Management Bibliography (2021).

It has also published and updated the following bibliographies, webliographies, and weblogs: the Scholarly Electronic Publishing Bibliography (1996-2011), the Scholarly Electronic Publishing Weblog (2001-2013), the Electronic Theses and Dissertations Bibliography (2005-2021), the Google Books Bibliography (2005-2011), the Institutional Repository Bibliography (2009-2011), the Open Access Journals Bibliography (2010), the Digital Curation and Preservation Bibliography (2010-2011), the E-science and Academic Libraries Bibliography (2011), the Digital Curation Resource Guide (2012), the Research Data Curation Bibliography (2012-2019), the Altmetrics Bibliography (2013), the Transforming Peer Review Bibliography (2014), the Academic Library as Scholarly Publisher Bibliography (2018-2023), the Research Data Sharing and Reuse Bibliography (2021), the Research Data Publication and Citation Bibliography (2022), Digital Curation Certificate and Master’s Degree Programs (2023), the Academic Libraries and Research Data Management Bibliography (2023), and the Artificial Intelligence and Libraries Bibliography (2023).

"CHORUS Forum: 12 Best Practices for Research Data Sharing — Summary And Comments"

At last month’s CHORUS Forum: 12 Best Practices for Research Data Sharing, speakers addressed the Joint Statement on Research Data Sharing by STM, DataCite and Crossref. The forum was moderated by Howard Ratner, Executive Director, CHORUS and sponsored by AIP Publishing, Association of American Publishers, Crossref, GeoScienceWorld, and STM.

https://tinyurl.com/3ubnw4db

"Open Data Ownership and Sharing: Challenges and Opportunities for Application of FAIR Principles and a Checklist for Data Managers"

The amount of data generated across various disciplines has been steadily increasing and is projected to experience exponential growth in the foreseeable future. This underscores the pressing need for proficient and streamlined data management. Data has proven to be a crucial tool in addressing complex societal challenges on a global scale. However, the challenge of producing and openly disseminating data that are easily discoverable, accessible, interoperable, and reusable (FAIR) has emerged as a significant concern for policymakers. The potential for data to be repurposed for advancing scientific research and innovation across different disciplines is contingent on its willingness to be shared. This paper employs a systematic literature review to investigate the motivating factors, advantages, and obstacles associated with open data sharing. Additionally, it explores governance frameworks that can create unique opportunities for implementing FAIR principles in real-time scientific research.

https://doi.org/10.1016/j.jafr.2024.101157

Paywall: "Changes in Digital Collections and Their Metadata: A Longitudinal Study of UIUC Digital Library"

This article showcases the evolution of digital collections and their metadata at the University of Illinois Urbana-Champaign (UIUC) Library in the last 20 years. It discusses the growth of its collections and their characteristics, examines historical changes in the use of metadata elements, and explores responses to the changing nature of digitized and born-digital materials. Based on a large-scale data analysis of the digital collections and their metadata housed in UIUC Digital Library, the paper also examines the challenges and opportunities of the curation and management of digital collections and digital libraries in the future.

https://doi.org/10.1080/19386389.2024.2338015

Paywall: "Global Status of Dataset Repositories at a Glance: Study Based on OpenDOAR"

Developed countries like the United Kingdom and the USA are primarily involved in the development of institutional open-access repositories comprising significant components of OpenDOAR. The most extensively used software is DSpace. Most data set archives are OAI-PMH compliant but do not follow open-access rules. . . . Furthermore, the study concludes that the number of data sets kept in repositories is insufficient, although the expansion of such repositories has been consistent over the years.

https://doi.org/10.1108/DLP-11-2023-0094

"The Future of Data in Research Publishing: From Nice to Have to Need to Have?"

Science policy promotes open access to research data for purposes of transparency and reuse of data in the public interest. We expect demands for open data in scholarly publishing to accelerate, at least partly in response to the opacity of artificial intelligence (AI) algorithms. Open data should be findable, accessible, interoperable, and reusable (FAIR), and also trustworthy and verifiable. The current state of open data in scholarly publishing is in transition from ‘nice to have’ to ‘need to have.’ Research data are valuable, interpretable, and verifiable only in context of their origin, and with sufficient infrastructure to facilitate reuse. Making research data useful is expensive; benefits and costs are distributed unevenly. Open data also poses risks for provenance, intellectual property, misuse, and misappropriation in an era of trolls and hallucinating AI algorithms. Scholars and scholarly publishers must make evidentiary data more widely available to promote public trust in research. To make research processes more trustworthy, transparent, and verifiable, stakeholders need to make greater investments in data stewardship and knowledge infrastructures.

https://doi.org/10.1162/99608f92.b73aae77

"Common Metadata Framework for Research Data Repository: Necessity to Support Open Science"

The present study describes the features of a select number of RDRs and analyzes their metadata practices: Harvard Dataverse, Dryad, Figshare, Zenodo, and the Open Science Framework (OSF). It further examines the total number of metadata elements, common metadata elements, required metadata elements, and item-level metadata. Results indicate that even though Harvard Dataverse has the most metadata elements, Dryad provides rich metadata concerning item level. This study suggests a common metadata framework, richer metadata elements, and more features to make the research data’s interoperability possible from one RDR to another.

https://doi.org/10.1080/19386389.2024.2329370

"The FAIR for Research Software Principles after Two Years: An Adoption Update"

It should be noted that while the many activities listed here support increasing FAIRness of research software, most of them do not address aspects of all four of the FAIRness of research software foundational principles. . . . This reflects that the FAIR4RS Principles are aspirational and high-level, and do not contain detailed guidance on how to achieve them. This is because specific technologies and tools are always changing, while the principles are intended to be long-lasting. Consequently, additional work is needed to make it simpler for people wanting to follow the FAIR4RS Principles to know how to practically do so. The following initiatives are assisting in achieving this, with some of these initiatives specifically addressing the range of opportunities for future work identified in 2022 by the FAIR4RS Working Group, which developed the FAIR4RS Principles.

https://www.researchsoft.org/blog/2024-03/

2024 Fedora Technology Assessment Report

The Fedora Program Team, in collaboration with the Technology Working Group, designed a project to understand the specific Fedora-related priorities of using institutions, along with the capacity and available resources of both individuals and institutions to contribute to the Fedora community between 2024 and 2026. They collaborated with the Research and Innovation Division at Lyrasis to survey Fedora users. Responses were collected between November 2023 and January 31, 2024, and analyzed by Leigh A. Grinstead, Senior Digital Services Consultant from Lyrasis, an independent, nonprofit, research group.

https://tinyurl.com/2s4b4rec

"Introducing HathiTrust’s New Strategic Vision"

The new strategic directions acknowledge this by prioritizing work in the following ways:

Enabling broader, more expansive access to the collection, including lawful access to copyrighted materials.

Expanding and diversifying the subject matter and sources of the collection, while reaffirming our focus on books and serial content.

Taking an ambitious, proactive approach to the stewardship of metadata and content, with investments in existing and emerging technologies to enrich the metadata, content, and user experience.

Adopting a renewed focus on the development of flexible and resilient technical and organizational infrastructure.

https://tinyurl.com/bdnnrcw7

"Publicly Shared Data: A Gap Analysis of Researcher Actions and Institutional Support throughout the Data Life Cycle"

[This report] examines research data management and sharing practices at six research-intensive academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Virginia Tech, and Washington University in St. Louis. Sponsored by the US National Science Foundation (grant #2135874) and part of ARL’s Realities of Academic Data Sharing (RADS) Initiative, this report highlights where service gaps may exist between researchers’ needs and the services and support provided by institutions.

https://tinyurl.com/mtdjvecu

"Research Data Management Sustainability: Services, Infrastructure, Accountability, and Planning"

This study aims to update on the status of RDMS service offerings, staffing and funding, and presents them according to the number of years a library has offered the service. This work also investigates RDMS service fulfillment, accountability in providing support, and planning strategies within the same institution sample. Updating the RDMS status, broadening the facets addressed, and presenting the data by cohort provides detail into how services have been maintained or developed so that institutions at a similar stage can make clearer decisions about how to keep RDMS sustainable.

https://tinyurl.com/22cexhrt

"Developing Text and Data Mining (TDM) Support within a University Research Library"

The introduction of the text and data mining (TDM) exception in 2014 led to researchers asking for support from staff within Library Services at the University of Birmingham. An initial involvement with a funded corpus linguistics project fostered an effective partnership between the Copyright and Licensing Team and the University’s Research Infrastructure Team. This case study traces the TDM journey that Library Services has subsequently undertaken. The article will look at how staff in Copyright and Licensing and the Research Skills Team identified the original service gap. It will also look at issues impacting on supporting TDM and the results of a TDM survey that was sent to researchers. It concludes with a reflection on how the service might evolve in the future — from the creation and availability of TDM datasets, to the skills development of both librarians and the university communities they support, and the impact artificial intelligence (AI) developments might have on TDM practices.

https://doi.org/10.1629/uksg.646

"The Challenges of Open Data Sharing for Qualitative Researchers"

"Open Science" advocates for open access to scientific research, as well as sharing data, analysis plans and code in order to enable replication of results. However, these requirements typically fail to account for methodological differences between quantitative and qualitative research, and serious ethical problems are raised by the suggestion that full qualitative datasets can or should be published alongside qualitative research papers. Aside from important ethical concerns, the idea of sharing qualitative data in order to enable replication is conceptually at odds with the underpinnings of most qualitative methodologies, which highlight the importance of the unique interpretative function of the researcher. The question of whether secondary analysis of qualitative data is acceptable is key, and in this commentary we argue that there are good conceptual, ethical and economic reasons to consider how funders, researchers and publishers can make better use of existing data.

https://doi.org/10.1177/13591053241237620

Open Data: From Theory to Practice: Case Studies and Commentary from Libraries, Publishers, Funders and Industry

From theory to practice is the first time in the nine-year history of The State of Open Data that a supplementary publication has expanded upon the main report’s years of survey results about open data, involving tens of thousands of researchers globally.

Each case study and commentary is told from the perspective of a research stakeholder group:

Funding bodies: The NIH Generalist Repository Ecosystem Initiative: meeting community needs for FAIR data sharing and discovery

Scholarly Publishers: Operationalize data policies through collaborative approaches – the momentum is now

University Libraries: One size does not fit all: an investigation into how institutional libraries are tailoring support to their researchers’ needs

Industry: How Open Pharma supports responsible data sharing for pharma research publications.

https://tinyurl.com/ytcxprn7

"Evaluating an Instructional Intervention for Research Data Management Training"

At a large research university in Canada, a research data management (RDM) specialist and two liaison librarians partnered to evaluate the effectiveness of an active learning component of their newly developed RDM training program. . . . This study relies on a pre- and post-test quasi-experimental intervention during introductory RDM workshops offered 12 times between February 2022 and January 2023. . . . Comparing the overall average scores for each participant pre- and post-instruction intervention, we find that workshop participants, in general, improved in proficiency. The results of a Wilcoxon signed-rank test demonstrate that the difference between the pre- and post-test observations is statistically significant with a high effect size.

https://tinyurl.com/2wvt5bhj

The Research Data Services Landscape at US and Canadian Higher Education Institutions

The following are our high-level findings:

While there are wide divergences in the number and variety of services offered both within and across Carnegie Classifications, R1 institutions offer approximately three times the number of services offered by R2s, and more than nine times the number offered by liberal arts colleges.

General research data services are the most common type offered regardless of institution type. Statistical services, geospatial services, and visualization services are also common at research universities, which typically offer a much wider range of specialized services than liberal arts colleges.

Libraries remain the largest provider of research data services at US and Canadian research universities, but IT and units associated with the research office play important collaborative roles, especially with specialized services.

Bioinformatics services are offered almost exclusively through the interdisciplinary units associated with the research office or core facilities associated with medical schools.

Consulting services are the most common mode of service provision, comprising almost three quarters of all data services.

https://doi.org/10.18665/sr.320420