"Big Data-Driven Investigation into the Maturity of Library Research Data Services (RDS)"


The creation of library research data services (RDS) requires assessment of their maturity, i.e., the primary objective of this study. Its authors have set out to probe the nationwide level of library RDS maturity, based on the RDS maturity model, as proposed by Cox et al. (2019), while making use of natural language processing (NLP) tools, typical for big data analysis. The secondary objective consisted in determining the actual suitability of the above-referenced tools for this particular type of assessment.

https://doi.org/10.1016/j.acalib.2022.102646

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Federating Research Infrastructures in Europe for Fair Access to Data: Science Europe Briefing on EOSC

The European research and innovation ecosystem is going through a period of profound change. Researchers, organisations that fund or perform research, and policymakers are reshaping the research process and its outputs based on the opportunities offered by the digital transition. The findability, accessibility, interoperability, and reusability (FAIRness) of research publications, data, and software in the digital space will define research and innovation going forward. Closely related, the transition to an open research process and Open Access of its outputs is becoming the ‘new normal’. One of the most prominent initiatives in the digital and open transition of research is the European Open Science Cloud (EOSC). This federation of existing research data infrastructures in Europe aims to create a web of FAIR data and related services for research.

https://doi.org/10.5281/zenodo.7346887


"Adoption of Transparency and Openness Promotion (TOP) Guidelines across Journals"


Journal policies continuously evolve to enable knowledge sharing and support reproducible science. However, that change happens within a certain framework. Eight modular standards with three levels of increasing stringency make Transparency and Openness Promotion (TOP) guidelines which can be used to evaluate to what extent and with which stringency journals promote open science. Guidelines define standards for data citation, transparency of data, material, code and design and analysis, replication, plan and study pre-registration, and two effective interventions: "Registered reports" and "Open science badges", and levels of adoption summed up across standards define journal’s TOP Factor. In this paper, we analysed the status of adoption of TOP guidelines across two thousand journals reported in the TOP Factor metrics. We show that the majority of the journals’ policies align with at least one of the TOP’s standards, most likely "Data citation" (70%) followed by "Data transparency" (19%). Two-thirds of adoptions of TOP standard are of the stringency Level 1 (less stringent), whereas only 9% is of the stringency Level 3. Adoption of TOP standards differs across science disciplines and multidisciplinary journals (N = 1505) and journals from social sciences (N = 1077) show the greatest number of adoptions. Improvement of the measures that journals take to implement open science practices could be done: (1) discipline-specific, (2) journals that have not yet adopted TOP guidelines could do so, (3) the stringency of adoptions could be increased.

https://doi.org/10.3390/publications10040046
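The abstract above describes a journal's TOP Factor as the levels of adoption summed across standards. A minimal sketch of that arithmetic follows, assuming each standard is scored 0 (not adopted) to 3 (most stringent). The standard names are paraphrased from the abstract, and the example journal's levels are invented for illustration, not taken from the TOP Factor data.

```python
# Hypothetical illustration of the "sum of adoption levels" idea.
# Standard names are paraphrased from the abstract; the journal data is invented.
TOP_STANDARDS = (
    "data_citation",
    "data_transparency",
    "materials_transparency",
    "code_transparency",
    "design_analysis_transparency",
    "replication",
    "analysis_plan_preregistration",
    "study_preregistration",
)


def top_factor(levels: dict[str, int]) -> int:
    """Sum the adoption level (0-3) over all standards; missing ones count as 0."""
    total = 0
    for standard in TOP_STANDARDS:
        level = levels.get(standard, 0)
        if level not in (0, 1, 2, 3):
            raise ValueError(f"{standard}: level must be 0-3, got {level!r}")
        total += level
    return total


# Invented example: a journal that adopted three standards at low stringency.
example_journal = {"data_citation": 1, "data_transparency": 2, "replication": 1}
print(top_factor(example_journal))
```

Under this assumed scoring, a journal with no adopted standards scores 0, and each increase in stringency for any one standard adds one point.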


"Open Science Infrastructure as a Key Component of Open Science"


The Open Science movement is a response to the accumulated problems in scholarly communication, like the "reproducibility crisis", "serials crisis", and "peer review crisis". The European Commission defines priorities of Open Science as Findable, Accessible, Interoperable and Reusable (FAIR) data, infrastructure and services in the European Open Science Cloud (EOSC), Next generation metrics, altmetrics and rewards, the future of scientific communication, research integrity and reproducibility, education and skills and citizen science. Open Science Infrastructure is also one of four key components of Open Science defined by UNESCO.

Mainly represented among Open Science Infrastructures are institutional and thematic repositories for publications, research data, software and code. Furthermore, the Open Science Infrastructure services range may include discovery, mining, publishing, the peer review process, archiving and preservation, social networking tools, training, high-performance computing, and tools for processing and analysis. Successful Open Science Infrastructure should be based on community values and responsive to needed changes. Preferably the Open Science Infrastructure should be distributed, enabling machine-actionable tools and services, supporting reusability and reproducibility, quality FAIR data, interoperability, sustainability, long-term preservation and funding.

https://doi.org/10.7557/5.6777


"Why Don’t We Share Data and Code? Perceived Barriers and Benefits to Public Archiving Practices"


Here, we define, categorize and discuss barriers to data and code sharing that are relevant to many research fields. We explore how real and perceived barriers might be overcome or reframed in the light of the benefits relative to costs. By elucidating these barriers and the contexts in which they arise, we can take steps to mitigate them and align our actions with the goals of open science, both as individual scientists and as a scientific community.

https://doi.org/10.1098/rspb.2022.1113


"Data Quality Assurance at Research Data Repositories"


This paper presents findings from a survey on the status quo of data quality assurance practices at research data repositories.

The personalised online survey was conducted among repositories indexed in re3data in 2021. It covered the scope of the repository, types of data quality assessment, quality criteria, responsibilities, details of the review process, and data quality information and yielded 332 complete responses.

The results demonstrate that most repositories perform data quality assurance measures, and overall, research data repositories significantly contribute to data quality. Quality assurance at research data repositories is multifaceted and nonlinear, and although there are some common patterns, individual approaches to ensuring data quality are diverse. The survey showed that data quality assurance sets high expectations for repositories and requires a lot of resources. Several challenges were discovered: for example, the adequate recognition of the contribution of data reviewers and repositories, the path dependence of data review on review processes for text publications, and the lack of data quality information. The study could not confirm that the certification status of a repository is a clear indicator of whether a repository conducts in-depth quality assurance.

http://doi.org/10.5334/dsj-2022-018


Paywall: "A Comprehensive Review of Open Data Platforms, Prevalent Technologies, and Functionalities"


We will discuss seven major open data platforms, such as (1) CKAN (2) DKAN (3) Socrata (4) OpenDataSoft (5) GitHub (6) Google datasets (7) Kaggle. We will evaluate the technological commons, techniques, features, methods, and visualization offered by each tool. In addition, why are these platforms important to users such as providers, curators, and end-users? And what are the key options available on these platforms to publish open data?

https://doi.org/10.1145/3560107.3560142


"Producing Open Data"


Mainly building on our own experience as scholars from different research traditions (life sciences, social sciences and humanities), we describe best-practice approaches for opening up research data. We reflect on common barriers and strategies to overcome them, condensed into a step-by-step guide focused on actionable advice in order to mitigate the costs and promote the benefit of open data on three levels at once: society, the disciplines and individual researchers.

https://doi.org/10.3897/rio.8.e86384


"Twitter’s Potential Collapse Could Wipe Out Vast Records of Recent Human History"


Twitter’s ubiquity, its adoption by nearly a quarter of a billion users in the last 16 years, and its status as a de facto public archive, has made it a gold mine of information, says Thomas [senior analyst at the Institute for Strategic Dialogue].

"In one sense, this actually represents an enormous opportunity for future historians—we’ve never had the capacity to capture this much data about any previous era in history," she explains. But that enormous scale presents a huge storage problem for organizations.

For eight years, the US Library of Congress took it upon itself to maintain a public record of all tweets, but it stopped in 2018, instead selecting only a small number of accounts’ posts to capture. "It never, ever worked," says William Kilbride, executive director of the Digital Preservation Coalition. The data the library was expected to store was too vast, the volume coming out of the firehose too great. "Let me put that in context: it’s the Library of Congress. They had some of the best expertise on this topic. If the Library of Congress can’t do it, that tells you something quite important," he says.

https://cutt.ly/EMPxp0h


"Nature Authors Can Now Seamlessly Share Their Data"


In April of this year, Springer Nature and Figshare announced a new integrated route for data deposition at Nature Portfolio titles to help address this problem and encourage researchers to share data rather than seeing it as a hurdle to article publication.

Following the success of the pilot, this streamlined integration is now being extended. Authors submitting to the Nature Portfolio journals, including Nature, in the fields of life, health, chemical and physical sciences will now be able to easily opt into data sharing, via Figshare, as part of one integrated submission process.

https://cutt.ly/RMTKcpo


Digital Preservation Coalition: "Understanding User Needs: A Case Study from the National Library of Scotland"


This case study looks at the approaches to user engagement with National Library of Scotland (NLS) maps website users, and how this informs digital preservation decisions. After a brief description of the NLS maps website structure, it examines user expectations of the NLS maps website, how these have developed over time, and the main purposes users have for visiting the website. The main research methods which have been employed to consult with users are then outlined, including user surveys, web-analytics, mystery visitor reports, and enquiries.

http://doi.org/10.7207/twgn22-01


"Research Data Management Needs Assessment of Clemson University"


The faculty, staff, and graduate students at Clemson University were surveyed by the library about their RDM needs in the spring of 2021. The survey was based on previous surveys from 2012 and 2016 to allow for comparison, but language was updated, and additional questions were added because the field of RDM has evolved. Survey findings indicated that researchers are overall more likely to back up and share their data, but the process of cleaning and preparing the data for sharing was an obstacle. Few researchers reported including metadata when sharing or consulting the library for help with writing a Data Management Plan (DMP). Researchers want RDM resources; offering and effectively marketing those resources will enable libraries to both support researchers and encourage best practices. Understanding researcher needs and offering time-saving services and convenient training options makes following RDM best practices easier for researchers. Outreach and integrated partnerships that support the research life cycle are crucial next steps for ensuring effective data management.

https://doi.org/10.31274/jlsc.13970


Paywall: "Big Data Curation Framework: Curation Actions and Challenges"


The goal of this research is to provide a theoretical framework that identifies big data curation actions and associated curation challenges. . . . The outcome of the study includes the big data curation framework that provides overview of curation activities and concerns that are essential to perform such activities. The study also provides practical implications for libraries, archives, data repositories and other information organisations that concerns the issue of big data curation as big data presents a multidimensional array of exigencies in relation to the mission of those organisations.

https://doi.org/10.1177/01655515221133528


"How Often Do Cancer Researchers Make Their Data and Code Available and What Factors Are Associated with Sharing"


One in five studies declared data were publicly available (59/306, 19%, 95% CI: 15–24%). However, when data availability was investigated this percentage dropped to 16% (49/306, 95% CI: 12–20%), and then to less than 1% (1/306, 95% CI: 0–2%) when data were checked for compliance with key FAIR principles. While only 4% of articles that used inferential statistics reported code to be available (10/274, 95% CI: 2–6%), the odds of reporting code to be available were 5.6 times higher for researchers who shared data.

https://doi.org/10.1186/s12916-022-02644-2
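The proportions quoted in the abstract above come with 95% confidence intervals. As a rough sketch of how such an interval can be computed from the raw counts, the example below uses the Wilson score interval. This is an assumption for illustration: the abstract does not say which interval method the authors used, and other methods can differ by a percentage point on small counts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Counts taken from the abstract: 59 of 306 studies declared data publicly available.
low, high = wilson_interval(59, 306)
print(f"{59 / 306:.0%} (95% CI: {low:.0%}-{high:.0%})")
```

For the 59/306 figure, this assumed method yields bounds that round to the 15% and 24% reported in the abstract.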


Paywall: The Data Literacy Cookbook


The Data Literacy Cookbook includes a variety of approaches to and lesson plans for teaching data literacy, from simple activities to self-paced learning modules to for-credit and discipline-specific courses. . . . Many sections have overlapping learning outcomes, so you can combine recipes from multiple sections to whip up a scaffolded curriculum. The Data Literacy Cookbook provides librarians with lesson plans, strategies, and activities to help guide students as both consumers and producers in the data life cycle.

https://cutt.ly/XMhHEts


Data Primer: Making Digital Humanities Research Data Public


This Data Primer was collaboratively authored by over 30 Digital Humanities researchers and research assistants, and was peer-reviewed by data professionals. It serves as an overview of the different aspects of data curation and management best practices for digital humanities researchers. Endorsed by the National Training Expert Group of the Digital Research Alliance of Canada.

https://cutt.ly/8MhHFnO


"NLM Toolkit for the NIH Data Management and Sharing Policy"


A selection of guides, toolkits, and other resources for librarians working on addressing the NIH Data Management and Sharing Policy.

https://cutt.ly/iMyXCLp


"Open Access Books through Open Data Sources: Assessing Prevalence, Providers, and Preservation"


The results suggest reason for concern for the long tail of OA books distributed at thousands of different web domains as these include volatile cloud storage or sometimes no longer contained the files at all. Data quality issues, varying definitions of OA across services, and inconsistent implementation of unique identifiers were discovered as key challenges. The study includes recommendations for publishers, libraries, data providers, and preservation services for improving monitoring and practices for OA book preservation.

https://doi.org/10.5281/zenodo.7305489


"The French National 3D Data Repository for Humanities: Features, Feedback and Open Questions"


We introduce the French National 3D Data Repository for Humanities designed for the conservation and the publication of 3D research data in the field of Humanities and Social Sciences. We present the choices made for the data organization, metadata, standards and infrastructure towards a FAIR service.

https://arxiv.org/abs/2211.04094


"Towards Environmentally Sustainable Long-term Digital Preservation"


Digital preservation relies on technological infrastructure (information and communication technology, ICT) that can have environmental impacts. While altering technology usage can reduce the impact of digital preservation practices, this alone is not a strategy for sustainable practice. Moving toward environmentally sustainable digital preservation requires critically examining the motivations and assumptions that shape current practice. The use of scalable cloud infrastructures can reduce the environmental impacts of long-term data preservation solutions.

http://www.ijdc.net/article/view/848


Supporting Software Preservation Services in Research and Memory Organizations


Supporting Software Preservation Services in Research and Memory Organizations identifies concepts, skill sets, barriers, and future directions related to software preservation work. Although definitions of "software" can vary across preservation contexts, the study found that there appears to be wide support for inter-organizational collaboration in software preservation. The report includes 13 recommendations for broadening representation in the field, defining the field, networking and community building, informal and formal learning, and implementing shared infrastructures and model practices.

https://cutt.ly/4NJHcoF


The BitList 2022: The Global List of Digitally Endangered Species


The Global List of Digitally Endangered Species – The BitList – offers an accessible snapshot of the concerns expressed by the global digital preservation community with respect to the risks faced by diverse types of digital content in varied conditions and contexts. It provides an elementary assessment of the imminence and significance of the dangers faced by different, and at times overlapping classifications of digital materials. By identifying the urgency of action and significance of content, The BitList draws attention to those digital materials that, in the view of the global digital preservation community, require urgent action to remain viable.

http://doi.org/10.7207/dpcbitlist-22


"Finding Your Way in Academic Librarianship: Introducing the Scholarly Communication Notebook"


The SCN (https://www.oercommons.org/hubs/SCN) is an extension of an earlier, related, effort to create an open textbook about scholarly communication librarianship. That book, Scholarly Communication Librarianship and Open Knowledge, is forthcoming from ACRL in 2023. . . . Even if openly licensed, a book remains a relatively static resource. Scholarly communication is not static at all. Far from it, as many will attest and recognize through hard-won experience. Our contribution is the SCN, an online collection of contributed, modular, open content scoped to scholarly communication topics, which might complement the book or find use independent of it.

https://doi.org/10.5860/crln.83.10.444


"Data Platforms for Open Life Sciences – A Systematic Analysis of Management Instruments"


Open data platforms are interfaces between data demand of and supply from their users. Yet, data platform providers frequently struggle to aggregate data to suit their users’ needs and to establish a high intensity of data exchange in a collaborative environment. Here, using open life science data platforms as an example for a diverse data structure, we systematically categorize these platforms based on their technology intermediation and the range of domains they cover to derive general and specific success factors for their management instruments. Our qualitative content analysis is based on 39 in-depth interviews with experts employed by data platforms and external stakeholders. We thus complement peer initiatives which focus solely on data quality, by additionally highlighting the data platforms’ role to enable data utilization for innovative output. Based on our analysis, we propose a clearly structured and detailed guideline for seven management instruments. This guideline helps to establish and operationalize data platforms and to best exploit the data provided. Our findings support further exploitation of the open innovation potential in the life sciences and beyond.

https://doi.org/10.1371/journal.pone.0276204
