Digital Curation & Digital Preservation – Page 10

"Data Journals: Where Data Sharing Policy Meets Practice"

Data journals incorporate elements of traditional scholarly communications practices (reviewing for quality and rigor through editorial and peer review) and of the data sharing/open data movement (prioritizing broad dissemination through repositories, sometimes with curation or technical checks). Their goals for dataset review and sharing are recorded in journal-based data policies and operationalized through workflows. In this qualitative, small-cohort, semi-structured interview study of eight different journals that review and publish research data, we explored (1) journal data policy requirements, (2) data review standards, and (3) implementation of standardized data evaluation workflows. Differences among the journals can be understood by considering editors’ approaches to balancing the interests of varied stakeholders. Assessing data quality for reusability is primarily conditional on fitness for use, which points to an important distinction between disciplinary and discipline-agnostic data journals.

https://doi.org/10.17615/nqtz-b568

"Who Re-Uses Data? A Bibliometric Analysis of Dataset Citations"

Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed. Understanding where and how the sharing of data between researchers, institutions, and countries takes place is essential to developing open research practices.

https://arxiv.org/abs/2308.04379
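
A minimal sketch of one way individual and institutional self-citation rates could be computed from dataset citation links, assuming a citation counts as a self-citation when the dataset's creators and the citing work's authors (or their institutions) share at least one identifier. The `Citation` structure, function names, and identifiers are hypothetical; this is not the study's data model or OpenAlex's schema.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """One dataset-to-citing-work link (hypothetical structure)."""
    dataset_authors: set[str]
    dataset_institutions: set[str]
    citing_authors: set[str]
    citing_institutions: set[str]


def overlaps(creators: set[str], citers: set[str]) -> bool:
    # Assumption: any shared identifier marks the citation as a self-citation.
    return not creators.isdisjoint(citers)


def self_citation_rates(citations: list[Citation]) -> dict[str, float]:
    """Fraction of citations that are self-citations at each level."""
    if not citations:
        return {"individual": 0.0, "institutional": 0.0}
    total = len(citations)
    individual = sum(overlaps(c.dataset_authors, c.citing_authors) for c in citations)
    institutional = sum(
        overlaps(c.dataset_institutions, c.citing_institutions) for c in citations
    )
    return {"individual": individual / total, "institutional": institutional / total}


# Made-up identifiers for illustration.
sample = [
    Citation({"A1", "A2"}, {"I1"}, {"A2", "A3"}, {"I2"}),  # shared author only
    Citation({"A4"}, {"I1"}, {"A5"}, {"I1"}),              # shared institution only
    Citation({"A6"}, {"I3"}, {"A7"}, {"I4"}),              # no overlap
    Citation({"A8"}, {"I5"}, {"A8"}, {"I5"}),              # both
]
print(self_citation_rates(sample))  # {'individual': 0.5, 'institutional': 0.5}
```

Any overlap is a deliberately loose rule; a stricter study might require the first or corresponding author to match.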

"Metadata Standard for Continuous Preservation, Discovery, and Reuse of Research Data in Repositories by Higher Education Institutions: A Systematic Review"

This systematic review synthesised existing research papers that explore the available metadata standards to enable researchers to preserve, discover, and reuse research data in repositories. It provides a broad overview of aspects that must be taken into consideration when creating and assessing metadata standards to enhance research data preservation, discoverability, and reusability strategies. Research papers on metadata standards, research data preservation, discovery and reuse, and repositories published between January 2003 and April 2023 were retrieved from five databases. Of the 1597 papers retrieved, 13 were selected for the review; these explained the creation and application of metadata standards to enhance preservation, discovery, and reuse of research data in repositories. Among them, eight presented the three main types of metadata (descriptive, structural, and administrative) to enable the preservation of research data in data repositories. We noted limited evidence on how these metadata standards can be used to enhance the discovery and reuse of research data in repositories. None of the reviewed papers identified specific higher education institutions employing metadata standards for the research data created by their researchers. Repository designs and a lack of expertise and technology know-how were among the challenges identified from the reviewed papers. The review has the potential to influence professional practice and decision-making by stakeholders, including researchers, students, librarians, information communication technologists, data managers, private and public organisations, intermediaries, research institutions, and non-profit organisations.

https://doi.org/10.3390/info14080427

"ARL Awarded Grant to Continue Research on Institutional Expenses for Public Access to Research Data"

The US Institute of Museum and Library Services (IMLS) has awarded the Association of Research Libraries (ARL), in collaboration with Duke University, the University of Minnesota, and Washington University in St. Louis, all of whom are members of the Data Curation Network (DCN), a $741,921 National Leadership Grant to examine institutional expenses for public access to research data. This research builds upon ARL’s existing Realities of Academic Data Sharing initiative.

https://tinyurl.com/378dzab6

"Images, an Overview"

Images have been historical records since the advent of photography. High-resolution photography laid the groundwork for the digitization process known today and has continued to bolster the cultural heritage sector. An overview of images in the context of library and information science (LIS) is a story of how libraries have adopted aspects of the commercial image production environment, expensive digitization equipment, and considerable information technology infrastructure to provide image resources to their users. This entry [of the Encyclopedia of Libraries, Librarianship, and Information Science] discusses images in the LIS field and considers the concepts, tools, and best practices that surround the prevalence of images as primary sources.

https://hdl.handle.net/10657/15041

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Association of Research Libraries and California Digital Library Receive Grant to Advance Data Management and Sharing"

The Association of Research Libraries (ARL) and the California Digital Library (CDL) have received a $668,048 National Leadership Grant from the US Institute of Museum and Library Services (IMLS) to assist institutions in managing and sharing federally funded research data. This project will build a machine-actionable data-management plan (maDMP) tool by enhancing and developing new DMPTool features utilizing persistent identifiers (PIDs). CDL and ARL will work together to further strengthen institutional capacity for tracking research outputs by piloting the institutional integration of maDMPs across an academic campus and building community across institutions for maDMPs.

https://tinyurl.com/35x9d45z
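
To make "machine-actionable" concrete, here is a minimal Python sketch of a plan expressed as JSON, with persistent identifiers attached to the plan, its contact, and its datasets so that other systems can follow them. The field names are loosely modeled on the RDA DMP Common Standard and are an assumption to be checked against its schema; nothing here describes the DMPTool's actual API, and all identifiers are placeholders.

```python
import json
from datetime import datetime, timezone


def minimal_madmp(title: str, dmp_doi: str, contact_name: str, contact_email: str,
                  contact_orcid: str, dataset_title: str, dataset_doi: str) -> dict:
    """Build a minimal maDMP (field names assumed, loosely after the RDA DMP Common Standard)."""
    now = datetime.now(timezone.utc).replace(microsecond=0).isoformat()
    return {
        "dmp": {
            "title": title,
            "created": now,
            "modified": now,
            "language": "eng",
            "ethical_issues_exist": "unknown",
            # Each entity carries a PID so that other systems can link to it.
            "dmp_id": {"identifier": dmp_doi, "type": "doi"},
            "contact": {
                "name": contact_name,
                "mbox": contact_email,
                "contact_id": {"identifier": contact_orcid, "type": "orcid"},
            },
            "dataset": [{
                "title": dataset_title,
                "personal_data": "no",
                "sensitive_data": "no",
                "dataset_id": {"identifier": dataset_doi, "type": "doi"},
            }],
        }
    }


def collect_pids(madmp: dict) -> list[tuple[str, str]]:
    """List (type, identifier) pairs for the plan, its contact, and its datasets."""
    dmp = madmp["dmp"]
    contact_id = dmp["contact"]["contact_id"]
    pids = [(dmp["dmp_id"]["type"], dmp["dmp_id"]["identifier"]),
            (contact_id["type"], contact_id["identifier"])]
    for dataset in dmp.get("dataset", []):
        pids.append((dataset["dataset_id"]["type"], dataset["dataset_id"]["identifier"]))
    return pids


# Placeholder values only.
plan = minimal_madmp("Example plan", "10.1234/example-dmp", "A. Researcher",
                     "a.researcher@example.edu", "0000-0000-0000-0000",
                     "Example survey data", "10.1234/example-data")
print(collect_pids(plan))
# [('doi', '10.1234/example-dmp'), ('orcid', '0000-0000-0000-0000'), ('doi', '10.1234/example-data')]
print(json.dumps(plan, indent=2))
```

Tracking research outputs across a campus then amounts to following these PID pairs rather than parsing free-text plans.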

"Progressing with Patience: An Unflinching Look at the Challenges of Digital Preservation"

Many academic libraries have devoted significant time, resources, and strategy to developing approaches that steward digital assets responsibly into the future. This paper examines how one academic library’s experience [University of Nevada, Las Vegas] with this work has progressed over nearly a decade, and compares the experience to trends in the field. The points of view of technical services, digital collections, and management are represented, and specific workflows are shared. The paper takes a close look at challenges faced, explains how strategy has evolved over time, and shares examples of how other organizations might benefit from a shift in how progress is assessed through a new perspective on success.

https://repository.ifla.org/handle/123456789/2689

"New at Dryad: Support for NIH-funded researchers"

Dryad provides a simple submission process that makes it easy for researchers to upload their datasets, apply metadata that makes them discoverable and reusable, and get a persistent identifier (DOI) they can use in grant reporting. Once submitted, datasets are made publicly accessible so they can be reused by others in order to advance scientific discovery and collaboration across disciplines. Dryad also provides an extensive library of existing datasets from various sources, including those funded by NIH grants, that are completely free to access and reuse.

https://tinyurl.com/4uu9tz2r

Paywall: "Human-AI Interaction for Exploratory Search & Recommender Systems with Application to Cultural Heritage"

This dissertation introduces three primary contributions through publicly deployed systems and datasets. First, we demonstrate how the construction of large-scale cultural heritage datasets using machine learning can answer interdisciplinary questions in library & information science and the humanities (Chapter 2). Second, based on the feedback of users of these cultural heritage datasets, we introduce open faceted search, an extension of faceted search that leverages human-AI interaction affordances to empower users to define their own facets in an open domain fashion (Chapter 3). Third, encountering similar challenges with the deluge of scientific papers, we explore the question of how to improve recommender systems through human-AI interaction and tackle the broad challenge of advice taking for opaque machine learners (Chapter 4).

https://tinyurl.com/yc59txc5

"eLife and PREreview to Enhance the ‘Publish, Review, Curate’ Ecosystem Through Adoption of COAR Notify"

The project will put in place the basic infrastructure and protocols needed for all-round and standardised connections between preprint repositories, community-led preprint review platforms, journals, and preprint review aggregation and curation platforms. The aim is to lower existing technological and cost barriers so that as many of these services as possible can more easily participate in the ‘publish, review, curate’ future for research.

https://tinyurl.com/36emyk9b
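
COAR Notify builds on W3C Linked Data Notifications and Activity Streams 2.0. The Python sketch below assumes the payload shape of a review "Offer" as described in the COAR Notify patterns (verify against the current specification). It builds such a notification from a preprint repository to a review service and prepares, without sending, the POST to the service's inbox. The `repository` and `review_service` dicts, function names, and all URLs are hypothetical.

```python
import json
import urllib.request
import uuid

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"
COAR_NOTIFY_CONTEXT = "https://purl.org/coar/notify"


def review_offer(preprint_url: str, preprint_doi: str,
                 repository: dict, review_service: dict) -> dict:
    """Build an Activity Streams 2.0 'Offer' asking a review service to review a preprint.

    repository and review_service are hypothetical dicts with 'id', 'name', and 'inbox' keys.
    """
    return {
        "@context": [AS2_CONTEXT, COAR_NOTIFY_CONTEXT],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": ["Offer", "coar-notify:ReviewAction"],
        "actor": {"id": repository["id"], "name": repository["name"], "type": "Service"},
        "origin": {"id": repository["id"], "inbox": repository["inbox"], "type": "Service"},
        "target": {"id": review_service["id"], "inbox": review_service["inbox"],
                   "type": "Service"},
        "object": {"id": preprint_url, "ietf:cite-as": preprint_doi},
    }


def as_ldn_request(notification: dict) -> urllib.request.Request:
    """Prepare (but do not send) a Linked Data Notifications POST to the target's inbox."""
    return urllib.request.Request(
        notification["target"]["inbox"],
        data=json.dumps(notification).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


repo = {"id": "https://repo.example.org/", "name": "Example Preprint Server",
        "inbox": "https://repo.example.org/inbox/"}
reviewer = {"id": "https://review.example.net/", "name": "Example Review Community",
            "inbox": "https://review.example.net/inbox/"}
offer = review_offer("https://repo.example.org/preprints/123",
                     "https://doi.org/10.1234/example", repo, reviewer)
req = as_ldn_request(offer)
print(req.get_method(), req.full_url)  # POST https://review.example.net/inbox/
```

Because every service speaks the same small set of message types to a known inbox, a new review platform only needs to implement this exchange once rather than build a bespoke integration per repository.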

"Policy Recommendations to Ensure That Research Software Is Openly Accessible and Reusable"

There is now an opportunity to expand US federal policies in similar ways and align their research software sharing aspects across agencies.

To do this, we recommend:

As part of their updated policy plans submitted in response to the 2022 OSTP memo, US federal agencies should, at a minimum, articulate a pathway for developing guidance on research software sharing, and, at a maximum, incorporate research software sharing requirements as a necessary extension of any data sharing policy and a critical strategy to make data truly FAIR (as these principles have been adapted to apply to research software [12]).

As part of sharing requirements, federal agencies should specify that research software should be deposited in trusted, public repositories that maximize discovery, collaborative development, version control, long-term preservation, and other key elements of the National Science and Technology Council’s "Desirable Characteristics of Data Repositories for Federally Funded Research" [13], as adapted to fit the unique considerations of research software.

US federal agencies should encourage grantees to use non-proprietary software and file formats, whenever possible, to collect and store data. We realize that for some research areas and specialized techniques, viable non-proprietary software may not exist for data collection. However, in many cases, files can be exported and shared using non-proprietary formats or scripts can be provided to allow others to open files.

Consistent with the US Administration’s approach to cybersecurity [14], federal agencies should provide clear guidance on measures grantees are expected to undertake to ensure the security and integrity of research software. This guidance should encompass the design, development, dissemination, and documentation of research software. Examples include the National Institute of Standards and Technology’s Secure Software Development Framework and the Linux Foundation’s Open Source Security Foundation.

As part of the allowable costs that grantees can request to help them meet research sharing requirements, US federal agencies should include reasonable costs associated with developing and maintaining research software needed to maximize data accessibility and reusability for as long as it is practical. Federal agencies should ensure that such costs are additive to proposal budgets, rather than consuming funds that would otherwise go to the research itself.

US federal agencies should encourage grantees to apply licenses to their research software that facilitate replication, reuse, and extensibility, while balancing individual and institutional intellectual property considerations. Agencies can point grantees to guidance on desirable criteria for distribution terms and approved licenses from the Open Source Initiative.

In parallel with the actions listed above that can be immediately incorporated into new public access plans, US federal agencies should also explore long-term strategies to elevate research software to co-equal research outputs and further incentivize its maintenance and sharing to improve research reproducibility, replicability, and integrity.

https://doi.org/10.1371/journal.pbio.3002204

"Trends in Research Data Management and Academic Health Sciences Libraries"

Spurred by the National Institutes of Health mandating a data management and sharing plan as a requirement of grant funding, research data management has exploded in importance for librarians supporting researchers and research institutions. This editorial examines the role and direction of libraries in this process from several viewpoints. Key markers of success include collaboration, establishing new relationships, leveraging existing relationships, accessing multiple avenues of communication, and building niche expertise and cachet as a valued and trustworthy partner. [Article includes case studies.]

https://doi.org/10.1080/02763869.2023.2218776

"How Are Exclusively Data Journals Indexed in Major Scholarly Databases? An Examination of the Web of Science, Scopus, Dimensions, and OpenAlex"

As part of the data-driven paradigm and open science movement, the data paper is becoming a popular way for researchers to publish their research data, based on academic norms that cross knowledge domains. Data journals have also been created to host this new academic genre. The growing number of data papers and journals has made them an important large-scale data source for understanding how research data is published and reused in our research system. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges for future efforts to study them quantitatively. As a result, we argue that efforts should be made by data journals and databases to improve the quality of metadata for this emerging genre.

https://arxiv.org/abs/2307.09704
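
The two measures the study examines, coverage and consistency of document type, can be sketched in Python. This assumes each database's records for the journals' DOIs have already been exported as `{doi: document type}` mappings, and that a hypothetical crosswalk maps each database's vocabulary to shared categories before comparison. The function name, database names, DOIs, and labels are placeholders, not results from the paper.

```python
from typing import Optional


def coverage_and_consistency(
    journal_dois: set[str],
    databases: dict[str, dict[str, str]],
    crosswalk: Optional[dict[str, str]] = None,
) -> tuple[dict[str, float], list[str]]:
    """Coverage share per database, plus DOIs whose document type disagrees across databases.

    databases maps a database name to {doi: document type as that database records it};
    crosswalk (hypothetical) maps raw labels to shared categories before comparison.
    """
    if not journal_dois:
        return {}, []
    crosswalk = crosswalk or {}
    coverage = {
        name: len(journal_dois & set(records)) / len(journal_dois)
        for name, records in databases.items()
    }
    inconsistent = []
    for doi in sorted(journal_dois):
        # Normalize each database's label, then check whether more than one category remains.
        labels = {crosswalk.get(records[doi], records[doi])
                  for records in databases.values() if doi in records}
        if len(labels) > 1:
            inconsistent.append(doi)
    return coverage, inconsistent


dois = {"10.1234/dp1", "10.1234/dp2"}
dbs = {
    "Database A": {"10.1234/dp1": "Data Paper", "10.1234/dp2": "Data Paper"},
    "Database B": {"10.1234/dp1": "article"},
}
coverage, mismatched = coverage_and_consistency(dois, dbs, {"Data Paper": "data paper"})
print(coverage)    # {'Database A': 1.0, 'Database B': 0.5}
print(mismatched)  # ['10.1234/dp1']
```

Without a crosswalk, differing vocabularies alone would flag nearly every record, so the mapping step carries much of the analytical weight.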

"Prevalence and Predictors of Data and Code Sharing in the Medical and Health Sciences: Systematic Review with Meta-Analysis of Individual Participant Data"

The review found that public code sharing was persistently low across medical research. Declarations of data sharing were also low, increasing over time, but did not always correspond to actual sharing of data. The effectiveness of mandatory data sharing policies varied substantially by journal and type of data, a finding that might be informative for policy makers when designing policies and allocating resources to audit compliance.

https://doi.org/10.1136/bmj-2023-075767

Directions in Digital Scholarship: Support for Digital, Data-Intensive, and Computational Research in Academic Libraries

This report of a 2023 Coalition for Networked Information (CNI) initiative takes a broad look at library engagement with digital scholarship (DS) and examines connections with data-intensive and computational research over roughly the past five years and into the future. . . . To understand trends in DS programs, including attention to the impact of the pandemic, especially with reference to the importance of physical spaces and in-person programming, evidence was gathered from several sources, including online interviews with 12 library and DS leaders, profiles of 47 libraries’ DS programs, and conversations during two online forums representing a total of 24 institutions. Findings from these sources are analyzed and synthesized in this report.

https://tinyurl.com/398nzhcx

"Signing Data Citations Enables Data Verification and Citation Persistence"

Increasingly, digital datasets are being published with assigned identifiers, then cited in papers as the basis for repeatable experiments. To help future readers find and verify data, customary citations can be extended with content signatures, which can be introduced without having to replace existing identifiers such as DOIs and ARKs. That is, signatures can be seen as complementary identifiers that help keep specific versions of cited data findable and identifiable as they evolve and change locations. For example, if a DOI identifies an evolving dataset rather than a fixed version (i.e., content drift is expected), the DOI can safely be cited for the sake of attribution, metadata linking, and citation statistics (e.g., by Crossref (https://www.crossref.org) and DataCite (https://datacite.org)), while the content signature helps the reader find the exact content that was cited, possibly with assistance from metadata linked to the DOI. Additionally, a citation that includes both the DOI (for example) and the content signature of a dataset creates a fixed mapping between the two identifiers. Then, unintentional content drift by the DOI can be detected and reported, and an alternative location may potentially be discovered by consulting public content signature registries.

https://doi.org/10.1038/s41597-023-02230-y
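
A minimal Python sketch of the idea, assuming a single-file dataset and a SHA-256 content signature written as a `hash://sha256/...` URI (a format assumed for illustration; the paper's own tooling may differ). The function names are hypothetical. The DOI stays in the citation for attribution, while the signature pins the exact bytes that were cited.

```python
import hashlib
import tempfile
from pathlib import Path


def content_signature(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, written as a hash URI (format assumed)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"hash://sha256/{digest.hexdigest()}"


def signed_citation(doi: str, path: Path) -> dict:
    """Pair the DOI (attribution, linking) with the signature (exact content)."""
    return {"doi": doi, "signature": content_signature(path)}


def has_drifted(citation: dict, retrieved: Path) -> bool:
    """True if content now resolved via the DOI no longer matches the cited signature."""
    return content_signature(retrieved) != citation["signature"]


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    data.write_bytes(b"site,count\nA,3\n")
    cite = signed_citation("10.1234/example-dataset", data)  # placeholder DOI
    print(has_drifted(cite, data))  # False
    data.write_bytes(b"site,count\nA,4\n")  # the content behind the DOI changes
    print(has_drifted(cite, data))  # True
```

A multi-file dataset would need a signature over a manifest or archive rather than over a single file.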

"Archiving Website-Based References in Academic Papers: Problems Caused by Reference Rot, Potential Solutions and Limitations"

With this background in mind, this paper has three objectives. First, it provides several examples of studies that have attempted to quantify or characterize reference rot of web-based references, and consequences of this phenomenon. Second, we provide a short practical ‘manual’ that would allow academics or editors to manually archive web-based references at the Internet Archive. Third, we assess some technical and practical suggestions for improving the landscape of digital information preservation while taking into account human and technological limitations.

https://doi.org/10.1002/leap.1560
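
The paper's manual covers archiving references by hand. For editors who want to script the same steps, here is a minimal Python sketch. It assumes the Internet Archive's public Save Page Now URL pattern (`https://web.archive.org/save/` followed by the target URL) and its Wayback Availability API (`https://archive.org/wayback/available`). It only builds URLs and parses a response already fetched, so no network call is made; the function names are hypothetical and the sample response is illustrative.

```python
from typing import Optional
from urllib.parse import urlencode

SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_request_url(target: str) -> str:
    """URL that asks Save Page Now to capture target."""
    return SAVE_ENDPOINT + target


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Query URL for the closest archived snapshot, optionally near a YYYYMMDD timestamp."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Extract the snapshot URL from a parsed availability-API response, if one exists."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


print(availability_url("example.com", "20060101"))
# https://archive.org/wayback/available?url=example.com&timestamp=20060101
sample = {
    "url": "example.com",
    "archived_snapshots": {"closest": {
        "status": "200", "available": True, "timestamp": "20060101064348",
        "url": "http://web.archive.org/web/20060101064348/http://www.example.com:80/",
    }},
}
print(closest_snapshot(sample))
# http://web.archive.org/web/20060101064348/http://www.example.com:80/
```

The snapshot URL embeds a capture timestamp, which is what lets a citation point to the page as it was when the author consulted it.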

"Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)"

ARCH helps users easily conduct and support computational research with digital collections at scale — e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive totaling 50+ PB going back to 1996, representing an extensive archive of contemporary history and communication.

https://tinyurl.com/z9c83dut

"Perceived Benefits of Open Data Are Improving but Scientists Still Lack Resources, Skills, and Rewards"

Addressing global scientific challenges requires the widespread sharing of consistent and trustworthy research data. Identifying the factors that influence widespread data sharing will help us understand the limitations and potential leverage points. We used two well-known theoretical frameworks, the Theory of Planned Behavior and the Technology Acceptance Model, to analyze three DataONE surveys published in 2011, 2015, and 2020. These surveys aimed to identify individual, social, and organizational influences on data-sharing behavior. In this paper, we report on the application of multiple factor analysis (MFA) on this combined, longitudinal, survey data to determine how these attitudes may have changed over time. The first two dimensions of the MFA were named willingness to share and satisfaction with resources based on the contributing questions and answers. Our results indicated that both dimensions are strongly influenced by individual factors such as perceived benefit, risk, and effort. Satisfaction with resources was significantly influenced by social and organizational factors such as the availability of training and data repositories. Researchers that improved in willingness to share are shown to be operating in domains with a high reliance on shared resources, are reliant on funding from national or federal sources, work in sectors where internal practices are mandated, and live in regions with highly effective communication networks. Significantly, satisfaction with resources was inversely correlated with willingness to share across all regions. We posit that this relationship results from researchers learning what resources they actually need only after engaging with the tools and procedures extensively.

https://doi.org/10.1057/s41599-023-01831-7

ITHAKA: "New Services for Academic, Research, and Cultural Institutions to Share, Preserve, and Manage Digital Collections"

Following a successful series of pilots during which over 300 institutions shared more than 1,800 collections on JSTOR, and a cohort of 40 partners helped to define preservation and collection loading needs, ITHAKA developed three services to support institutions of all sizes looking for high-impact, sustainable solutions. Institutions can now:

Share collections on JSTOR, making it possible for millions of users to discover and use content alongside a rich trove of journals, books, images, and other primary source collections while bringing greater visibility to institutions.

Preserve collections with Portico to safeguard the accessibility and usability of digital files in the long term, addressing the needs of tomorrow’s scholars.

Manage collections using JSTOR Forum, a web-based tool that makes it easy to catalog, edit metadata, and publish to JSTOR and other sites – all in one place.

https://www.ithaka.org/news/new-services/

Video: "Dryad in the Community: New Data Sharing Mandates and the Role of Academic Libraries"

In this presentation, Dryad’s Head of Community Engagement, Sarah Lippincott, is joined by fellow presenters Michael Casp, Head of Production Division at J&J Editorial; Emma Molls, Director of Open Research & Publishing at University of Minnesota Libraries; and Alberto Pepe, Director of Strategy and Innovation at Wiley and Co-founder of Authorea. Sarah reviews some pertinent highlights from the Nelson memo and NIH policies, two of the major developments that will impact data sharing over the next few years, and concludes with a discussion on how libraries can help researchers move from data sharing to data publishing.

https://tinyurl.com/bdfd7axh

"It Takes a Researcher to Know a Researcher: Academic Librarian Perspectives Regarding Skills and Training for Research Data Support in Canada "

This study demonstrates that an in-depth qualitative portrait of data-related librarians within a national academic ecosystem provides valuable new insights regarding the perceived importance of conducting original empirical research to succeed in these roles.

https://doi.org/10.18438/eblip30297

"The Use of Web Archives in Disinformation Research"

In recent years, journalists and other researchers have used web archives as an important resource for their study of disinformation. . . . We will show how web archives have been used to investigate changes to webpages, study archived social media including deleted content, and study known disinformation that has been archived.

https://arxiv.org/abs/2306.10004v1
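
One common way to investigate changes to an archived webpage is to list its captures and compare their content digests. Here is a minimal Python sketch, assuming the Wayback Machine's CDX server API (`https://web.archive.org/cdx/search/cdx`) with `output=json`, which returns a header row followed by one row per capture. The function names are hypothetical, the sample rows and digests are made up, and this is not a method attributed to the paper.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str) -> str:
    """Query URL listing captures of target as JSON rows."""
    return CDX_ENDPOINT + "?" + urlencode({"url": target, "output": "json"})


def content_changes(rows: list[list[str]]) -> list[tuple[str, str]]:
    """Return (timestamp, digest) for the first capture and for every capture whose
    digest differs from the capture before it, i.e., points where content changed."""
    if not rows:
        return []
    header, *captures = rows
    ts_col, digest_col = header.index("timestamp"), header.index("digest")
    changes: list[tuple[str, str]] = []
    previous = None
    # 14-digit timestamps sort correctly as strings.
    for row in sorted(captures, key=lambda r: r[ts_col]):
        if row[digest_col] != previous:
            changes.append((row[ts_col], row[digest_col]))
            previous = row[digest_col]
    return changes


header = ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"]
rows = [
    header,
    ["org,example)/", "20200101000000", "https://example.org/", "text/html", "200", "AAA", "1000"],
    ["org,example)/", "20200201000000", "https://example.org/", "text/html", "200", "AAA", "1000"],
    ["org,example)/", "20200301000000", "https://example.org/", "text/html", "200", "BBB", "1100"],
    ["org,example)/", "20200401000000", "https://example.org/", "text/html", "200", "AAA", "1000"],
]
print(content_changes(rows))
# [('20200101000000', 'AAA'), ('20200301000000', 'BBB'), ('20200401000000', 'AAA')]
```

A changed digest only shows that the captured bytes differ, which may reflect trivial edits such as embedded timestamps, so flagged captures still need to be compared by eye. The CDX server also documents a `collapse=digest` parameter that drops adjacent duplicates on the server side.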

University of Hawaii at Manoa: Needs Assessment for Data Management and Sharing Training Courses

Data management is an increasingly fundamental skill for graduate students and researchers in the biomedical sciences, especially as the National Institutes of Health (NIH) and other funding agencies are now beginning to require data management and sharing plans as part of research. Since the University of Hawaii at Manoa (UHM) Library provided little support in this area, and existing data management and sharing instructional content is either out of date or fails to address the unique needs of the UHM research community, the UHM Library took steps to establish data management and sharing instruction services to meet the specific needs of the UHM research community.

https://hdl.handle.net/10125/104944

"DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization"

Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users’ ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives’ and institutional repositories’ ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.

https://arxiv.org/abs/2305.18358