"Back to Basics: Considering Categories of Data Services Consults"


Consultations are fundamental to data librarianship, serving as a vital means of one-on-one support for researchers. However, the topics and forms of support unique to data services consults are not always carefully considered. This commentary addresses five common services offered by data librarians—dataset reference, data management support, data analysis and software support, data curation, and data management (and sharing) plan writing—and considers strategies for successful patron support within the boundaries of a consultation.

https://doi.org/10.7191/jeslib.931

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Promoting Data Sharing: The Moral Obligations of Public Funding Agencies"


Sharing research data has great potential to benefit science and society. However, data sharing is still not common practice. Since public research funding agencies have a particular impact on research and researchers, the question arises: Are public funding agencies morally obligated to promote data sharing? We argue from a research ethics perspective that public funding agencies have several pro tanto obligations requiring them to promote data sharing. However, there are also pro tanto obligations that speak against promoting data sharing in general as well as with regard to particular instruments of such promotion. We examine and weigh these obligations and conclude that all things considered funders ought to promote the sharing of data. Even the instrument of mandatory data sharing policies can be justified under certain conditions.

https://doi.org/10.1007/s11948-024-00491-3


"Sustaining a Spatial Collaboration: Leveraging Social Infrastructure to Support Technological Advancement in Geospatial Data Discovery"


The Big Ten Academic Alliance (BTAA) Geospatial Information Network (GIN) serves as a prime example of an enduring, successful collaboration across multiple institutions. The proliferation and significance of geospatial data have outpaced the development of adequate search tools and high-quality metadata, prompting the necessity for streamlined geospatial data discovery. In response, the BTAA-GIN established, maintains, and continuously enhances a geoportal that federates metadata from public geospatial data providers in addition to geospatial resources from member institutions. This article provides a reflective analysis of the evolution of the BTAA-GIN over the past nine years, an exploration of the ever-shifting open-source technology landscape, possible future directions and potential expansions of scope, and highlights key features contributing to its success. Examining the trajectory of the BTAA-GIN and the factors behind its achievements yields valuable insights for comparable large-scale, multi-institutional endeavors.

https://doi.org/10.1080/15420353.2024.2388576


Paywall: "Data Quality Assurance Practices in Research Data Repositories — A Systematic Literature Review"


This study conducted a systematic analysis of data quality assurance (DQA) practices in RDRs, guided by activity theory and data quality literature, resulting in conceptualizing a data quality assurance model (DQAM) for RDRs. DQAM outlines a DQA process comprising evaluation, intervention, and communication activities and categorizes 17 quality dimensions into intrinsic and product-level data quality. It also details specific improvement actions for data products and identifies the essential roles, skills, standards, and tools for DQA in RDRs. By comparing DQAM with existing DQA models, the study highlights its potential to improve these models by adding a specific DQA activity structure.

https://doi.org/10.1002/asi.24948


Paywall: "Metadata Management in Data Lake Environments: A Survey"


Data lakes are storage repositories that contain large amounts of data in its native format, either structured, semi-structured, or unstructured, to be used when needed. . . . This survey congregates different facets of metadata management in data lakes and presents a global view along with the technological implications and the required features for building successful metadata management systems. Besides, this survey summarizes and discusses research gaps, open problems and main challenges facing both industrialists and academics.

https://doi.org/10.1080/19386389.2024.2359310


"Reproducible and Attributable Materials Science Curation Practices: A Case Study"


While small labs produce much of the fundamental experimental research in Material Science and Engineering (MSE), little is known about their data management and sharing practices and the extent to which they promote trust in and transparency of the published research. In this research, a case study is conducted on a leading MSE research lab [at MIT] to characterize the limits of current data management and sharing practices concerning reproducibility and attribution. The workflows are systematically reconstructed, underpinning four research projects by combining interviews, document review, and digital forensics. Then, information graph analysis and computer-assisted retrospective auditing are applied to identify where critical research information is unavailable or at risk.

Data management and sharing practices in this leading lab protect against computer and disk failure; however, they are insufficient to ensure reproducibility or correct attribution of work, especially when a group member withdraws before the project completion. Therefore, recommendations for adjustments in MSE data management and sharing practices are proposed to promote trustworthiness and transparency by adding lightweight automated file-level auditing and automated data transfer processes.

https://doi.org/10.2218/ijdc.v18i1.940


Paywall: "What Is Research Data ‘Misuse’? And How Can It Be Prevented or Mitigated?"


In the article, we emphasize the challenge of defining misuse broadly and identify various forms that misuse can take, including methodological mistakes, unauthorized reuse, and intentional misrepresentation. We pay particular attention to underscoring the complexity of defining misuse, considering different epistemological perspectives and the evolving nature of scientific methodologies. We propose a theoretical framework grounded in the critical analysis of interdisciplinary literature on the topic of misusing research data, identifying similarities and differences in how data misuse is defined across a variety of fields, and propose a working definition of what it means to "misuse" research data. Finally, we speculate about possible curatorial interventions that data intermediaries can adopt to prevent or respond to instances of misuse.

https://doi.org/10.1002/asi.24944


"Training to Act FAIR: A Pre-Post Study on Teaching FAIR Guiding Principles to (Future) Researchers in Higher Education"


With a pre-post test design, the study evaluates the short-term effectiveness of FAIR training on students’ scientific suggestions and justifications in line with FAIR’s guiding principles. The study also assesses the influence of university legal frameworks on students’ inclination towards FAIR training. Before FAIR training, 81.1% of students suggested that scientific actions were not in line with the FAIR guiding principles. However, there is a 3.75-fold increase in suggestions that adhere to these principles after the training. Interestingly, the training does not significantly impact how students justify FAIR actions. The study observes a positive correlation between the presence of university legal frameworks on FAIR guiding principles and students’ inclination towards FAIR training.

https://doi.org/10.1007/s10805-024-09547-2


"Privacy Protection Framework for Open Data: Constructing and Assessing an Effective Approach"


This framework [Privacy Protection Framework for Open Data] aims to establish clear privacy protection measures and safeguard individuals’ privacy rights. Existing privacy protection practices were examined using content analysis, and 36 indicators across five dimensions were developed and validated through an empirical study with 437 participants. The PPFOD offers comprehensive guidelines for data openness, empowering individuals to identify privacy risks, guiding businesses to ensure legal compliance and prevent data leaks, and assisting libraries and data institutions in implementing effective privacy education and training programs, fostering a more privacy-conscious and secure data era.

https://doi.org/10.1016/j.lisr.2024.101312


"Trusted Research Environments: Analysis of Characteristics and Data Availability"


Trusted Research Environments (TREs) enable the analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is available publicly on the architecture and descriptions of their building blocks and their slight technical variations. To highlight these problems, an overview of the existing, publicly described TREs and a bibliography linking to the system description are provided. Their technical characteristics, especially in commonalities and variations, are analysed, and insight is provided into their data type characteristics and availability. The literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data predominantly via secure remote access. Statistical offices (SOs) make the majority of sensitive data records included in this study available.

https://doi.org/10.2218/ijdc.v18i1.939


"Attitudes on Data Reuse Among Internal Medicine Residents"


Results: We surveyed a population of 162 residents, and 67 residents responded, representing a 41.36% response rate. Strong majorities of residents exhibited positive views of secondary data analysis. Moreover, in our sample, those with exposure to secondary data analysis research opined that secondary data analysis takes less time and is less difficult to conduct compared to the other residents without curricular exposure to secondary analysis.

Discussion: The survey reflects that residents believe secondary data analysis is worthwhile and this highlights opportunities for data librarians. As current residents matriculate into professional roles as clinicians, educators, and researchers, libraries have an opportunity to bolster support for data curation and education.

https://doi.org/10.5195/jmla.2024.1772


Research Data Alliance: Recommendations on Open Science Rewards and Incentives


Open Science contributes to the collective building of scientific knowledge and societal progress. However, academic research currently fails to recognise and reward efforts to share research outputs. Yet it is crucial that such activities be valued, as they require considerable time, energy, and expertise to make scientific outputs usable by others, as stated by the FAIR principles. To address this challenge, several bottom-up and top-down initiatives have emerged to explore ways to assess and credit Open Science activities (e.g., Research Data Alliance, RDA) and to promote the assessment of a broad spectrum of research outputs, including datasets and software (e.g., Coalition for Advancement of Research Assessment, CoARA). As part of the RDA-SHARC (SHAring Rewards and Credit) interest group, we have developed a set of recommendations to help implement various rewarding schemes at different levels. The recommendations target a broad range of stakeholders. For instance, institutions are encouraged to provide digital services and infrastructure, organise training and cover expenses associated with making data available for the community. The funders should establish policies requiring open access to data produced by funded research and provide corresponding support. The publishers should favour open peer-review models and open access to articles, data and software. Government policymakers should set up a comprehensive Open Science strategy, as recommended by UNESCO and followed by a growing number of countries. The present work details different measures that are proposed to the stakeholders. The need to include sharing activities in research evaluation schemes as an overarching mechanism to promote Open Science practices is specifically emphasised.

https://tinyurl.com/4rhk44mn


"An Empirical Examination of Data Reuser Trust in a Digital Repository"


Most studies of trusted digital repositories have focused on the internal factors delineated in the Open Archival Information System (OAIS) Reference Model—organizational structure, technical infrastructure, and policies, procedures, and processes. Typically, these factors are used during an audit and certification process to demonstrate a repository can be trusted. The factors influencing a repository’s designated community of users to trust it remains largely unexplored. This article proposes and tests a model of trust in a data repository and the influence trust has on users’ intention to continue using it. Based on analysis of 245 surveys from quantitative social scientists who published research based on the holdings of one data repository, findings show three factors are positively related to data reuser trust—integrity, identification, and structural assurance. In turn, trust and performance expectancy are positively related to data reusers’ intentions to return to the repository for more data. As one of the first studies of its kind, it shows the conceptualization of trusted digital repositories needs to go beyond high-level definitions and simple application of the OAIS standard. Trust needs to encompass the complex trust relationship between designated communities of users that the repositories are being built to serve.

https://doi.org/10.1002/asi.24933


Paywall: "Toward Measuring Data Literacy for Higher Education: Developing and Validating a Data Literacy Self-Efficacy Scale"


This study aims to develop and validate a scale designed for measuring self-efficacy in data literacy within the context of higher education. Both exploratory and confirmatory factor analyses were conducted to determine construct validity and reliability. The resulting data literacy self-efficacy scale comprises 31 items organized into three factors: data identification, data processing, and data management and sharing. These factors represent distinct yet interconnected dimensions, highlighting the multifaceted nature of data literacy.

https://doi.org/10.1002/asi.24934


"To Share or Not to Share? Image Data Sharing in the Social Sciences and Humanities"


Introduction. The paper aims to investigate image data sharing within social science and humanities. While data sharing is encouraged as a part of the open science movement, little is known about the approaches and factors influencing the sharing of image data. This information is evident as the use of image data in these fields of research is increasing, and data sharing is context dependent. . . .

Results. The findings show that image data sharing is not an established research practice, and when it happens it is mostly done via informal means by sharing data through personal contacts. Supporting the scientific community, the open science agenda and fulfilling research funders’ requirements motivate scholars to share their data. Impeding factors relate to the qualities of data, ownership of data, data stewardship, and research integrity.

https://tinyurl.com/nt8md9cj


"DMPs as Management Tool for Intellectual Assets by SMART-metrics"


Data Management Plans (DMPs) are vital components of effective research data management (RDM). They serve not only as organisational tools but also as a structured framework dictating the collection, processing, sharing/publishing, and management of data throughout the research data life cycle. This can include existing data curation standards, the establishment of data handling protocols, and the creation, when necessary, of community curation policies. Therefore, DMPs present a unique opportunity to harmonise project management efforts for optimising the formulation and execution of project objectives.

To harness the full potential of DMPs as project management tools, the SMART approach (i.e., Specific, Measurable, Achievable, Relevant, and Time-bound) emerges as a compelling methodology. During the initial stage of the project proposal, drafted SMART metrics can offer a systematic approach to map work packages (WPs) and deliverables to the overarching project objectives. Then, the Principal Investigators (PIs) can ensure the consortia that all the project potential intellectual assets (i.e., expected research results) were considered properly, as well as their necessary timelines, resources, and execution. It becomes imperative for data stewards (DSs) and governance policymakers to educate and provide guidelines to researchers on the advantages of developing well-curated DMPs that align results with SMART metrics. This alignment ensures that every intellectual asset intended as a research result (e.g., intellectual properties, publications, datasets, and software) within the project is subject to rigorous drafted planning, execution, and accountability.

Consequently, the risk of unforeseen setbacks and/or deviations from the original objectives is minimised, increasing the traceability and transparency of the research data life cycle. In addition, the integration of Technology Readiness Levels (TRLs) into this proposed enhanced DMP provides a systematic method to evaluate the maturity and readiness of technologies across scientific disciplines. Regular TRL assessments will allow PIs: (1) to monitor the WP progress, (2) to adapt research strategies if required, and (3) to ensure the projects remain in line with the drafted SMART metrics in the enhanced DMP before the project started. The TRLs can also help PIs maintain their focus on project milestones and specific tasks aligned with the original objectives, contributing to the overall success of their endeavours, while improving the transparency for the reporting and divulgation of the research results.

The paper presents the overall framework for enhancing DMPs as project management tools for any intellectual assets using SMART metrics and TRLs, as well as introducing suggested support services for data stewardship teams to assist PIs when implementing this novel framework effectively.

https://tinyurl.com/25ymtyyk


"Biomedical Data Repository Concepts and Management Principles"


The demand for open data and open science is on the rise, fueled by expectations from the scientific community, calls to increase transparency and reproducibility in research findings, and developments such as the Final Data Management and Sharing Policy from the U.S. National Institutes of Health and a memorandum on increasing public access to federally funded research, issued by the U.S. Office of Science and Technology Policy. This paper explores the pivotal role of data repositories in biomedical research and open science, emphasizing their importance in managing, preserving, and sharing research data. Our objective is to familiarize readers with the functions of data repositories, set expectations for their services, and provide an overview of methods to evaluate their capabilities. The paper serves to introduce fundamental concepts and community-based guiding principles and aims to equip researchers, repository operators, funders, and policymakers with the knowledge to select appropriate repositories for their data management and sharing needs and foster a foundation for the open sharing and preservation of research data.

https://doi.org/10.1038/s41597-024-03449-z


"Understanding the Value of Curation: A Survey of US Data Repository Curation Practices and Perceptions"


Data curators play an important role in assessing data quality and take actions that may ultimately lead to better, more valuable data products. This study explores the curation practices of data curators working within US-based data repositories. We performed a survey in January 2021 to benchmark the levels of curation performed by repositories and assess the perceived value and impact of curation on the data sharing process. Our analysis included 95 responses from 59 unique data repositories. Respondents primarily were professionals working within repositories and examined curation performed within a repository setting. A majority 72.6% of respondents reported that "data-level" curation was performed by their repository and around half reported their repository took steps to ensure interoperability and reproducibility of their repository’s datasets. Curation actions most frequently reported include checking for duplicate files, reviewing documentation, reviewing metadata, minting persistent identifiers, and checking for corrupt/broken files. The most "value-add" curation action across generalist, institutional, and disciplinary repository respondents was related to reviewing and enhancing documentation. Respondents reported high perceived impact of curation by their repositories on specific data sharing outcomes including usability, findability, understandability, and accessibility of deposited datasets; respondents associated with disciplinary repositories tended to perceive higher impact on most outcomes. Most survey participants strongly agreed that data curation by the repository adds value to the data sharing process and that it outweighs the effort and cost. We found some differences between institutional and disciplinary repositories, both in the reported frequency of specific curation actions as well as the perceived impact of data curation. 
Interestingly, we also found variation in the perceptions of those working within the same repository regarding the level and frequency of curation actions performed, which exemplifies the complexity of a repository curation work. Our results suggest data curation may be better understood in terms of specific curation actions and outcomes than broadly defined curation levels and that more research is needed to understand the resource implications of performing these activities. We share these results to provide a more nuanced view of curation, and how curation impacts the broader data lifecycle and data sharing behaviors.

https://doi.org/10.1371/journal.pone.0301171


"The Student Staffing Advantage: Data Science Consulting Service at NC State University Libraries"


The primarily peer-to-peer, graduate student-staffed Data Science Consulting Service at NC State University Libraries, within the Data & Visualization Services (DVS) department and collaborating closely with the Data Science Academy (DSA), has established a sustainable service and staffing model focused on providing broad data science analytic support to researchers across the university community. The service addresses the needs of university researchers who possess domain knowledge in their fields of study but a skills gap in the data science competencies required for research. The literature shows that it has been difficult for libraries to cover these needs with existing staffing models. Few universities follow the model practiced at NC State University, so a scan of the current landscape of data science consulting at universities across the country was performed to establish context. The support model and its advantages are described, including partnership with the DSA, student success, model sustainability and future directions for the service. Through a summary of the DVS assessment and needs evaluation process, the service’s advantages in staying ahead of patron needs are illustrated. This scalable, sustainable, student-focused model could be implemented by similar research institutions to expand the capacity of their technical research services.

https://onlinelibrary.wiley.com/doi/10.1002/sta4.702


Paywall: "Journal Requirement for Data Sharing Statements in Clinical Trials: A Cross-Sectional Study"


Despite ICMJE [International Committee of Medical Journal Editors] recommendations, more than 27% of biomedical journals do not require clinical trials to include data sharing statements, highlighting room for improved transparency.

https://doi.org/10.1016/j.jclinepi.2024.111405


"The NIH Data Management and Sharing Policy for Non-data librarians" (Video)


The NIH Data Management and Sharing (DMS) Policy went into effect early last year. That means that the policy that so many medical data librarians have been talking about is finally in place and affecting researchers. Libraries do not need a data expert or an institutional repository to get started with supporting NIH grants with this new policy. Reference interviewing skills and a basic knowledge of the NIH DMS Plan format can be combined to walk researchers through the basics. In this session, librarians who are new to the NIH DMS Policy will learn the essentials: what is the NIH DMS policy, who is affected, and how do researchers incorporate it into an NIH grant application.

https://www.youtube.com/watch?v=6JAj5rHpFd0


"Rethinking Data Management Planning: Introducing Research Output Management Planning (ROMPi) Approach"


Data management plans (DMPs), designed to adhere to Findable, Accessible, Interoperable, Reusable (FAIR) principles, were introduced to enhance research data management (RDM) but have encountered challenges in implementation. This essay calls for a paradigm shift by introducing the ‘Research Output Management Planning (ROMPi)’ approach, aiming to integrate traditional research project management practices promoting a holistic perspective of RDM. In its essence, ROMPi reframes the DMP in the conventional project management work breakdown structure in work packages (WPs), with research outputs going through their lifecycle. It also advocates reimagining the concept of data into research outputs, acknowledging a holistic perspective of the research outcomes. We demonstrated that the research project management perspective at the early implementation stage could ultimately align DMP within the research process. ROMPi offers a practical research output management approach, fostering a holistic project-researcher-centric perspective.

https://doi.org/10.5334/dsj-2024-034


"The FAIR Assessment Conundrum: Reflections on Tools and Metrics"


Several tools for assessing FAIRness have been developed. Although their purpose is common, they use different assessment techniques, they are designed to work with diverse research products, and they are applied in specific scientific disciplines. It is thus inevitable that they perform the assessment using different metrics. This paper provides an overview of the actual FAIR assessment tools and metrics landscape to highlight the challenges characterising this task. In particular, 20 relevant FAIR assessment tools and 1180 relevant metrics were identified and analysed concerning (i) the tool’s distinguishing aspects and their trends, (ii) the gaps between the metric intents and the FAIR principles, (iii) the discrepancies between the declared intent of the metrics and the actual aspects assessed, including the most recurring issues, (iv) the technologies used or mentioned the most in the assessment metrics. The findings highlight (a) the distinguishing characteristics of the tools and the emergence of trends over time concerning those characteristics, (b) the identification of gaps at both metric and tool levels, (c) discrepancies observed in 345 metrics between their declared intent and the actual aspects assessed, pointing at several recurring issues, and (d) the variety in the technology used for the assessments, the majority of which can be ascribed to linked data solutions. This work also highlights some open issues that FAIR assessment still needs to address.

https://doi.org/10.5334/dsj-2024-033
