"The Ecosystem of Repository Migration"

Juliet L. Hardesty and Nicholas Homenda have published "The Ecosystem of Repository Migration" in Publications.

Here's an excerpt:

Indiana University was an early adopter of the Fedora repository, developing it as a home for heterogeneous digital library content from a variety of collections with unique content models. After joining the Hydra Project, now known as Samvera, in 2012, development progressed on a variety of applications that formed the foundation for digital library services using the Fedora 4 repository. These experiences have shaped migration planning to move from Fedora 3 to Fedora 4 for this large and inclusive set of digital content. Moving to Fedora 4 is not just a repository change; it is an ecosystem shift. End user interfaces for access, management systems for collection managers, and data structures are all impacted. This article shares what Indiana University has learned about migrating to Fedora 4 to help others work through their own migration considerations. This article is also meant to inspire the Fedora repository development community to offer ways to further ease migration work, sustaining Fedora users moving forward, and inviting new Fedora users to try the software and become involved in the community.

Research Data Curation Bibliography, Version 9 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Bringing Citations and Usage Metrics Together to Make Data Count"

Helena Cousijn et al. have published "Bringing Citations and Usage Metrics Together to Make Data Count" in Data Science Journal.

Here's an excerpt:

Over the last few years, many organizations have been working on infrastructure to facilitate sharing and reuse of research data. This means that researchers now have ways of making their data available, but not necessarily incentives to do so. Several Research Data Alliance (RDA) working groups have been working on ways to start measuring activities around research data to provide input for new Data Level Metrics (DLMs). These DLMs are a critical step towards providing researchers with credit for their work. In this paper, we describe the outcomes of the work of the Scholarly Link Exchange (Scholix) working group and the Data Usage Metrics working group. The Scholix working group developed a framework that allows organizations to expose and discover links between articles and datasets, thereby providing an indication of data citations. The Data Usage Metrics group works on a standard for the measurement and display of Data Usage Metrics. Here we explain how publishers and data repositories can contribute to and benefit from these initiatives. Together, these contributions feed into several hubs that enable data repositories to start displaying DLMs. Once these DLMs are available, researchers are in a better position to make their data count and be rewarded for their work.
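
To make the Scholix idea of exposing article-to-dataset links more concrete, here is a minimal Python sketch of a Scholix-style link record. It assumes a simplified subset of the schema: the field names (LinkPublicationDate, LinkProvider, RelationshipType, Source, Target) and the relationship types reflect our reading of the Scholix information model and should be checked against the official schema before use. The DOIs and provider name are hypothetical. The sketch also shows why a link only needs to be asserted once: the inverse view (dataset IsReferencedBy article) can be derived mechanically.

```python
import json
from datetime import date

# Relationship types and their inverses, per our reading of the Scholix model (assumption).
INVERSE = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "IsRelatedTo": "IsRelatedTo",
}


def make_object(doi, obj_type):
    """Describe one end of a link: a DOI-identified 'literature' or 'dataset' object."""
    return {"Identifier": [{"ID": doi, "IDScheme": "doi"}], "Type": {"Name": obj_type}}


def make_link(source, target, relation, provider, published):
    """Build a simplified Scholix-style link record (not the full schema)."""
    if relation not in INVERSE:
        raise ValueError(f"unknown relationship type: {relation}")
    return {
        "LinkPublicationDate": published.isoformat(),
        "LinkProvider": [{"Name": provider}],
        "RelationshipType": {"Name": relation},
        "Source": source,
        "Target": target,
    }


def invert_link(link):
    """Return the same assertion seen from the target's side, leaving the input unchanged."""
    inverted = dict(link)  # shallow copy; only top-level keys are reassigned below
    inverted["Source"], inverted["Target"] = link["Target"], link["Source"]
    inverted["RelationshipType"] = {"Name": INVERSE[link["RelationshipType"]["Name"]]}
    return inverted


# Hypothetical DOIs and provider, for illustration only.
article = make_object("10.1234/example-article", "literature")
dataset = make_object("10.1234/example-dataset", "dataset")
link = make_link(article, dataset, "References", "Example Publisher", date(2019, 1, 15))
print(json.dumps(invert_link(link), indent=2))
```

Deriving inverses in code, rather than storing both directions, keeps a hub from holding two records that could drift out of sync.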

"Quality Issues of CRIS [Current Research Information System] Data: An Exploratory Investigation with Universities from Twelve Countries"

Otmane Azeroual and Joachim Schöpfel have published "Quality Issues of CRIS Data: An Exploratory Investigation with Universities from Twelve Countries" in Publications.

Here's an excerpt:

Collecting, integrating, storing and analyzing data in a database system is nothing new in itself. To introduce a current research information system (CRIS) means that scientific institutions must provide the required information on their research activities and research results at a high quality. A one-time cleanup is not sufficient; data must be continuously curated and maintained. Some data errors (such as missing values, spelling errors, inaccurate data, incorrect formatting, inconsistencies, etc.) can be traced across different data sources and are difficult to find. Small mistakes can make data unusable, and corrupted data can have serious consequences. The sooner quality issues are identified and remedied, the better. For this reason, new techniques and methods of data cleansing and data monitoring are required to ensure data quality and its measurability in the long term. This paper examines data quality issues in current research information systems and introduces new techniques and methods of data cleansing and data monitoring with which organizations can guarantee the quality of their data.
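
As a rough illustration of the kinds of checks the excerpt lists (missing values, incorrect formatting, inconsistencies across sources), here is a minimal Python sketch. The record layout and the required fields (title, year, orcid, author) are hypothetical, not taken from the paper or from any specific CRIS. The ORCID check uses the ISO 7064 MOD 11-2 check character that ORCID iDs carry in their last position.

```python
import re
from collections import defaultdict

REQUIRED = ("title", "year", "orcid", "author")  # hypothetical required fields
ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_checksum_ok(orcid):
    """Verify the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def check_record(rec):
    """Return a list of quality issues found in one publication record."""
    issues = []
    for field in REQUIRED:
        if not str(rec.get(field, "")).strip():
            issues.append(f"missing {field}")
    year = str(rec.get("year", "")).strip()
    if year and not re.fullmatch(r"\d{4}", year):
        issues.append(f"badly formatted year: {year!r}")
    orcid = str(rec.get("orcid", "")).strip()
    if orcid:
        if not ORCID_FORMAT.fullmatch(orcid):
            issues.append(f"badly formatted ORCID: {orcid!r}")
        elif not orcid_checksum_ok(orcid):
            issues.append(f"ORCID checksum mismatch: {orcid!r}")
    return issues


def find_inconsistencies(records):
    """Flag ORCID iDs that different sources attach to differently written names."""
    names = defaultdict(set)
    for rec in records:
        if rec.get("orcid"):
            names[rec["orcid"]].add(rec.get("author", "").strip())
    return {orcid: sorted(n) for orcid, n in names.items() if len(n) > 1}


# Hypothetical records from two source systems feeding the CRIS.
records = [
    {"title": "Data Quality in CRIS", "year": "2018",
     "orcid": "0000-0002-1825-0097", "author": "Josiah Carberry"},
    {"title": "", "year": "18",
     "orcid": "0000-0002-1825-0097", "author": "J. Carberry"},
]
for rec in records:
    print(check_record(rec))
print(find_inconsistencies(records))
```

Running checks like these on every load, rather than once, matches the excerpt's point that a one-time cleanup is not sufficient.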

"When a Repository Is Not Enough: Redesigning a Digital Ecosystem to Serve Scholarly Communication"

Robin R. Sewell et al. have published "When a Repository Is Not Enough: Redesigning a Digital Ecosystem to Serve Scholarly Communication" in the Journal of Librarianship and Scholarly Communication.

Here's an excerpt:

INTRODUCTION: Our library's digital asset management system (DAMS) was no longer meeting digital asset management requirements or expanding scholarly communication needs. We formed a multiunit task force (TF) to (1) survey and identify existing and emerging institutional needs; (2) research available DAMS (open source and proprietary) and assess their potential fit; and (3) deploy software locally for in-depth testing and evaluation.

DESCRIPTION OF PROGRAM: We winnowed a field of 25 potential DAMS down to 5 for deployment and evaluation. The process included selection and identification of test collections and the creation of a multipart, task-based rubric based on library and campus needs assessments. Time constraints and DAMS deployment limitations prompted a move toward a new evaluation iteration: a shorter criteria-based rubric.

LESSONS LEARNED: We discovered that no single DAMS was "just right," nor was any single DAMS a static product. Changing and expanding scholarly communication and digital needs could only be met by the more flexible approach offered by a multicomponent digital asset management ecosystem (DAME), described in this study. We encountered obstacles related to testing complex, rapidly evolving software available in a range of configurations and flavors (including tiers of vendor-hosted functionality), and time and capacity constraints curtailed in-depth testing. While we anticipate long-term benefits from "going further together" by including university-wide representation in the task force, there were trade-offs in distributing responsibilities and diffusing priorities.

NEXT STEPS: Shifts in scholarly communication at multiple levels (institutional, regional, consortial, national, and international) have already necessitated continual review and adjustment of our digital systems.

"Improving the Discoverability and Web Impact of Open Repositories: Techniques and Evaluation"

George Macgregor has published "Improving the Discoverability and Web Impact of Open Repositories: Techniques and Evaluation" in Code4Lib Journal.

Here's an excerpt:

In this contribution we experiment with a suite of repository adjustments and improvements performed on Strathprints, the University of Strathclyde, Glasgow, institutional repository powered by EPrints 3.3.13. These adjustments were designed to support improved repository web visibility and user engagement, thereby improving usage. Although the experiments were performed on EPrints, it is thought that most of the adopted improvements are equally applicable to any other repository platform. Following preliminary results reported elsewhere, and using Strathprints as a case study, this paper outlines the approaches implemented, reports on comparative search traffic data and usage metrics, and delivers conclusions on the efficacy of the techniques implemented. The evaluation provides persuasive evidence that specific enhancements to technical aspects of a repository can result in significant improvements to repository visibility, resulting in a greater web impact and consequent increases in content usage. COUNTER usage grew by 33%, and traffic to Strathprints from Google and Google Scholar was found to increase by 63% and 99%, respectively. Other insights from the evaluation are also explored. The results are likely to positively inform the work of repository practitioners and open scientists.

"Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories"

Mingfang Wu et al. have published "Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories" in Data Science Journal (CC BY 4.0).

Here's an excerpt:

As data repositories make more data openly available, it becomes challenging for researchers to find what they need either from a repository or through web search engines. This study attempts to investigate data users’ requirements and the role that data repositories can play in supporting data discoverability by meeting those requirements. We collected 79 data discovery use cases (or data search scenarios), from which we derived nine functional requirements for data repositories through qualitative analysis. We then applied usability heuristic evaluation and expert review methods to identify best practices that data repositories can implement to meet each functional requirement. We propose the following ten recommendations for data repository operators to consider for improving data discoverability and users’ data search experience:

1. Provide a range of query interfaces to accommodate various data search behaviours.

2. Provide multiple access points to find data.

3. Make it easier for researchers to judge relevance, accessibility and reusability of a data collection from a search summary.

4. Make individual metadata records readable and analysable.

5. Enable sharing and downloading of bibliographic references.

6. Expose data usage statistics.

7. Strive for consistency with other repositories.

8. Identify and aggregate metadata records that describe the same data object.

9. Make metadata records easily indexed and searchable by major web search engines.

10. Follow API search standards and community adopted vocabularies for interoperability.
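
Recommendation 8 can be sketched as grouping harvested records by a normalized persistent identifier. This minimal Python version assumes each record carries a DOI, a title, and a repository name (hypothetical field names), and relies on DOI names being case-insensitive. Real aggregation would also need fuzzy matching for records that share no identifier.

```python
# URL and scheme prefixes commonly seen in front of DOIs (assumed list, not exhaustive).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Reduce a DOI in any common notation to a bare, lowercase DOI name."""
    doi = raw.strip().lower()  # DOI names are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def aggregate(records):
    """Merge records that share a normalized DOI, keeping the first title seen
    and listing every repository that holds a description of the object."""
    merged = {}
    for rec in records:
        key = normalize_doi(rec["doi"])
        entry = merged.setdefault(
            key, {"doi": key, "title": rec["title"], "repositories": []}
        )
        if rec["repository"] not in entry["repositories"]:
            entry["repositories"].append(rec["repository"])
    return list(merged.values())


# Hypothetical harvested records: the first two describe the same dataset.
records = [
    {"doi": "10.1234/ABC.5", "title": "Ocean temperatures", "repository": "Repo A"},
    {"doi": "https://doi.org/10.1234/abc.5", "title": "Ocean temps", "repository": "Repo B"},
    {"doi": "doi:10.1234/xyz", "title": "Soil cores", "repository": "Repo A"},
]
for entry in aggregate(records):
    print(entry)
```

Keeping the list of holding repositories on the merged entry lets a search interface show one result while still pointing users to every copy.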

"Recording Available: ‘Securing Community-Controlled Infrastructure: SPARC’s Plan of Action’"

DuraSpace has released "Recording Available: 'Securing Community-Controlled Infrastructure: SPARC's Plan of Action'."

Here's an excerpt:

In this webinar, Heather shared SPARC's efforts on their community-controlled infrastructure project that further explores this question. Heather highlighted what led to SPARC conducting a market analysis, which included both a financial analysis and an analysis of strategies of some of the key commercial players in the infrastructure arena, and the implications of those strategies for our community. . . .

The recordings and presentation slides of both webinars are available at https://duraspace.org/webinar/.

Academic Library as Scholarly Publisher Bibliography | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Supporting FAIR Data Principles with Fedora"

David Wilcox has published "Supporting FAIR Data Principles with Fedora" in LIBER Quarterly.

Here's an excerpt:

Making data findable, accessible, interoperable, and re-usable is an important but challenging goal. From an infrastructure perspective, repository technologies play a key role in supporting FAIR data principles. Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations. Fedora provides native linked data capabilities and a modular architecture based on well-documented APIs and ease of integration with existing applications. As both a project and a community, Fedora has been increasingly focused on research data management, making it well-suited to supporting FAIR data principles as a repository platform. Fedora provides strong support for persistent identifiers, both by minting HTTP URIs for each resource and by allowing any number of additional identifiers to be associated with resources as RDF properties. Fedora also supports rich metadata in any schema that can be indexed and disseminated using a variety of protocols and services. As a linked data server, Fedora allows resources to be semantically linked both within the repository and on the broader web. Along with these and other features supporting research data management, the Fedora community has been actively participating in related initiatives, most notably the Research Data Alliance. Fedora representatives participate in a number of interest and working groups focused on requirements and interoperability for research data repository platforms. This participation allows the Fedora project to both influence and be influenced by an international group of Research Data Alliance stakeholders. This paper will describe how Fedora supports FAIR data principles, both in terms of relevant features and community participation in related initiatives.
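
To illustrate attaching additional identifiers to a resource as RDF properties, this sketch builds a Turtle request body in which `<>` denotes the resource being created, and wraps it in a PUT request. The choice of `dcterms:identifier`, the example identifiers, and the base URL `http://localhost:8080/rest/` are assumptions for illustration: any predicate could be used, and the exact endpoint and interaction rules depend on the Fedora version, so check its API documentation. The request is only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical base URL; the real path depends on the Fedora version and deployment.
BASE_URL = "http://localhost:8080/rest/"
DCTERMS = "@prefix dcterms: <http://purl.org/dc/terms/> .\n"


def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def identifier_body(title, identifiers):
    """Turtle body in which <> is the resource itself; each extra identifier
    becomes one dcterms:identifier triple (predicate choice is an assumption)."""
    lines = [DCTERMS, "<> dcterms:title " + turtle_literal(title)]
    for ident in identifiers:
        lines.append("   ; dcterms:identifier " + turtle_literal(ident))
    return "\n".join(lines) + " .\n"


def build_put(path, body):
    """Prepare (but do not send) a PUT that would create the resource at BASE_URL + path."""
    return Request(
        BASE_URL + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )


# Hypothetical identifiers for a dataset description.
body = identifier_body(
    "Stream gauge readings, 2017", ["doi:10.1234/example", "ark:/99999/fk4example"]
)
request = build_put("datasets/gauge-2017", body)
print(request.get_method(), request.full_url)
print(body)
```

The server still mints the resource's own HTTP URI; the extra identifiers simply travel as ordinary triples, so any number of them can be added.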

"OpenAIRE and DuraSpace Partner to Support Greater Functionality in the Global Repository Network"

DuraSpace has released "OpenAIRE and DuraSpace Partner to Support Greater Functionality in the Global Repository Network."

Here's an excerpt:

Repositories collectively act as the foundation for Open Science by collecting and providing access to research outputs, and play a key role in the emerging scholarly commons. To that end, OpenAIRE and DuraSpace aim to ensure that repositories are using up-to-date technologies and adopting international standards and protocols. Through this MOU, OpenAIRE and DuraSpace have agreed to work together on a number of aspects to support their common goals. These activities include enabling DSpace systems to comply with OpenAIRE metadata guidelines, gradual adoption of next generation repository functionalities, and working together on standardized methods for measuring and aggregating usage statistics.

"Facilitating and Improving Environmental Research Data Repository Interoperability"

Corinna Gries et al. have published "Facilitating and Improving Environmental Research Data Repository Interoperability" in Data Science Journal.

Here's an excerpt:

Environmental research data repositories provide much-needed services for data preservation and data dissemination to diverse communities with domain-specific or programmatic data needs and standards. Due to independent development, these repositories serve their communities well but were developed with different technologies and data models and use different ontologies. Hence, the effectiveness and efficiency of these services can be vastly improved if repositories work together, adhering to a shared community platform that focuses on the implementation of agreed-upon standards and best practices for curation and dissemination of data. Such a community platform drives forward the convergence of technologies and practices that will advance cross-domain interoperability. It will also facilitate contributions from investigators through standardized and streamlined workflows and provide increased visibility for the role of data managers and the curation services provided by data repositories, beyond preservation infrastructure. Ten specific suggestions for such standardizations are outlined without any suggestions for priority or technical implementation. Although the recommendations are for repositories to implement, they have been chosen specifically with the data provider/data curator and synthesis scientist in mind.

"Towards Open Access Self Archiving Policies: A Case Study of COAR"

Bijan Kumar Roy et al. have published "Towards Open Access Self Archiving Policies: A Case Study of COAR" in LIBER Quarterly.

Here's an excerpt:

This paper examines Open Access (OA) self-archiving policies of different Open Access Repositories (OARs) affiliated with COAR (Confederation of Open Access Repositories) as partner institutes. The process of scrutiny includes three major activities: selection of databases to consult; comparison and evaluation of Open Access policies of repositories listed in the selected databases and attached to the COAR group; and critical examination of the available self-archiving policies of these OA repositories against a set of selected criteria. The above steps lead to the following results: key findings have been identified and highlighted; common practices have been analyzed in relation to the focus of this paper; and a best practice benchmark has been suggested for popularizing and strengthening OARs as national research systems. This paper may help administrators, funding agencies, policy makers and professional librarians in devising institute-specific self-archiving policies for their own organizations.

Search Results Ranking Using Machine-Learning Algorithms: "Best Match: New Relevance Search for PubMed"

Nicolas Fiorini et al. have published "Best Match: New Relevance Search for PubMed" in PLOS Biology.

Here's an excerpt:

PubMed is a free search engine for biomedical literature accessed by millions of users from around the world each day. With the rapid growth of biomedical literature—about two articles are added every minute on average—finding and retrieving the most relevant papers for a given query is increasingly challenging. We present Best Match, a new relevance search algorithm for PubMed that leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the traditional date sort order. The Best Match algorithm is trained with past user searches with dozens of relevance-ranking signals (factors), the most important being the past usage of an article, publication date, relevance score, and type of article. This new algorithm demonstrates state-of-the-art retrieval performance in benchmarking experiments as well as an improved user experience in real-world testing (over 20% increase in user click-through rate). Since its deployment in June 2017, we have observed a significant increase (60%) in PubMed searches with relevance sort order: it now assists millions of PubMed searches each week. In this work, we hope to increase the awareness and transparency of this new relevance sort option for PubMed users, enabling them to retrieve information more effectively.
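
Best Match itself is a learned ranking model trained on past user searches with dozens of signals; the sketch below is not that model. It is a toy pairwise perceptron, swapped in to show the general idea of learning weights for a few of the signals the excerpt names (past usage, publication date, a relevance score, and article type) from preference pairs such as "users chose A over B". The feature scaling, the articles, and the preference pairs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    pmid: str
    usage: float       # past usage, scaled to [0, 1] (assumed scaling)
    recency: float     # 1.0 = newest, decaying toward 0 with age (assumed)
    text_score: float  # query-document text relevance, scaled to [0, 1] (assumed)
    is_review: bool    # a stand-in for the article-type signal

    def features(self):
        return [self.usage, self.recency, self.text_score, 1.0 if self.is_review else 0.0]


def score(weights, article):
    """Linear relevance score: weighted sum of the article's features."""
    return sum(w * x for w, x in zip(weights, article.features()))


def train_pairwise(pairs, n_features=4, epochs=10, lr=0.1):
    """Pairwise perceptron: for each (preferred, other) pair that the current
    weights order wrongly (or tie), nudge the weights toward the preferred one."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            if score(weights, better) <= score(weights, worse):
                for i, (xb, xw) in enumerate(zip(better.features(), worse.features())):
                    weights[i] += lr * (xb - xw)
    return weights


def rank(weights, articles):
    """Order articles by descending learned score."""
    return sorted(articles, key=lambda a: score(weights, a), reverse=True)


# Hypothetical articles and click-derived preferences ("A was chosen over B", ...).
a = Article("A", usage=0.9, recency=0.2, text_score=0.6, is_review=False)
b = Article("B", usage=0.1, recency=0.9, text_score=0.7, is_review=False)
c = Article("C", usage=0.5, recency=0.5, text_score=0.2, is_review=True)
pairs = [(a, b), (a, c), (b, c)]

weights = train_pairwise(pairs)
print([art.pmid for art in rank(weights, [c, b, a])])
```

With these three pairs the learned weights order the articles A, B, C. That reflects only the toy data, not real PubMed behavior; the point is that weights come from user preferences rather than being set by hand.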

"IMLS Funds DuraSpace Fedora Investigation–Designing a Migration Path: Assessing Barriers of Upgrading to the Most Current Version of Fedora–No Collection Left Behind"

DuraSpace has released "IMLS Funds DuraSpace Fedora Investigation–Designing a Migration Path: Assessing Barriers of Upgrading to the Most Current Version of Fedora–No Collection Left Behind."

Here's an excerpt:

The Institute of Museum and Library Services has awarded DuraSpace a National Digital Platform Planning Grant for $49,279 to investigate barriers to upgrading hundreds of U.S.-based libraries and archives running unsupported versions of Fedora. In consultation with stakeholders this project will conduct an environmental scan of relevant community initiatives, and gather primary research data to inform recommendations to reduce barriers to upgrading to the most current version of Fedora.

There are approximately 240 U.S.-based libraries and archives identified as target beneficiaries of the deliverables of this project including universities, liberal arts colleges, and not-for-profit special libraries hosted by historical societies and small research institutes.

"HydraDAM2: Extending Fedora 4 and Hydra for Media Preservation"

Jon W. Dunn et al. have self-archived "HydraDAM2: Extending Fedora 4 and Hydra for Media Preservation."

Here's an excerpt:

The overarching goal of the HydraDAM2 project, funded by a grant from the National Endowment for the Humanities Preservation and Access Research and Development program, was to extend the existing HydraDAM digital asset management system, developed with prior NEH support, to be able to serve as a digital preservation repository for time-based media collections implementable at a wide range of institutions using multiple digital storage strategies. The new open source digital preservation repository system developed as part of the project by partners Indiana University (IU) and WGBH, known as Phydo, is based on the Fedora 4.x digital repository system and Samvera (formerly Hydra) repository application development framework and is intended to support storage and long-term preservation management of audio and video files and their accompanying metadata. This white paper describes the work of the HydraDAM2 project to develop the Phydo system, along with future plans.
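
Preservation repositories typically verify fixity, that is, that stored files still match previously recorded checksums. The excerpt does not describe how Phydo handles fixity, so this is a generic sketch under stated assumptions: a manifest mapping relative file paths to SHA-256 hex digests (a hypothetical format), with files hashed in 1 MiB chunks so that large audio and video files never need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest, root):
    """Compare recorded checksums (relative path -> SHA-256 hex digest) with
    freshly computed ones; report 'ok', 'mismatch', or 'missing' per file."""
    report = {}
    for rel_path, expected in manifest.items():
        path = Path(root) / rel_path
        if not path.is_file():
            report[rel_path] = "missing"
        elif sha256_of(path) != expected.lower():
            report[rel_path] = "mismatch"
        else:
            report[rel_path] = "ok"
    return report
```

A scheduled job could run `audit` against each storage location and alert on any status other than "ok"; keeping copies under multiple storage strategies then gives a source from which to restore a damaged file.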
