Paywall: "A Comprehensive Review of Open Data Platforms, Prevalent Technologies, and Functionalities"


We will discuss seven major open data platforms, such as (1) CKAN (2) DKAN (3) Socrata (4) OpenDataSoft (5) GitHub (6) Google datasets (7) Kaggle. We will evaluate the technological commons, techniques, features, methods, and visualization offered by each tool. In addition, why are these platforms important to users such as providers, curators, and end-users? And what are the key options available on these platforms to publish open data?

https://doi.org/10.1145/3560107.3560142

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"OpenStack Swift: An Ideal Bit-Level Object Storage System for Digital Preservation "


A bit-level object storage system is a foundational building block of long-term digital preservation (LTDP). To achieve the purposes of LTDP, the system must be able to: preserve the authenticity and integrity of the original digital objects; scale up with dramatically increasing demands for preservation storage; mitigate the impact of hardware obsolescence and software ephemerality; replicate digital objects among distributed data centers at different geographical locations; and to constantly audit and automatically recover from compromised states. . . . In this paper, we present OpenStack Swift, an open-source, mature and widely accepted cloud platform, as a practical and proven solution with a case study at the University of Alberta Library. We emphasize the implementation, application, cost analysis and maintenance of the system, with the purpose of contributing to the community with an exceedingly robust, highly scalable, self-healing and comparatively cost-effective bit-level object storage system for long-term digital preservation.

https://doi.org/10.2218/ijdc.v17i1.782


"The Emerging Digital Infrastructure for Research in the Humanities"


This article advances the thesis that three decades of investments by national and international funders, combined with those of scholars, technologists, librarians, archivists, and their institutions, have resulted in a digital infrastructure in the humanities that is now capable of supporting end-to-end research workflows. . . . The capabilities of the infrastructure remain unevenly distributed within and across disciplines, institutions, and regions. Moreover, the components, including the links between steps in the workflow, are generally far from user-friendly and seamless in operation. Because further refinements and additional capacities are still much needed, the article concludes with a discussion of key priorities for future work.

https://doi.org/10.1007/s00799-022-00332-3


"CADRE: A Collaborative, Cloud-Based Solution for Big Bibliographic Data Research in Academic Libraries"

https://doi.org/10.3389/fdata.2020.556282

Research Data Curation Bibliography, Version 10 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Research Computing in the Cloud: Leveling the Playing Field"

Michael Berman has published "Research Computing in the Cloud: Leveling the Playing Field" in EDUCAUSE Review.

Here's an excerpt:

The universal availability of commodity cloud services and high-speed networks can eliminate the requirement that departments must have local HPC resources. The infrastructure available from large cloud providers such as AWS dwarfs and outperforms all but the largest and most-specialized supercomputing facilities. . . .

Moving large data sets on commodity networks, or even on regional research and education networks, simply doesn't work well for hundreds of terabytes or petabytes of data, which is the scale required by modern researchers in many fields. . . .

To begin to address these issues, the Pacific Research Platform (PRP), a collaboration among research universities and CENIC (operator of the CalREN network serving California), has been funded by the National Science Foundation to support the streaming of "elephant flows."

Research Data Curation Bibliography, Version 9 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

Implementation Roadmap for the European Open Science Cloud

The European Commission has released Implementation Roadmap for the European Open Science Cloud.

Here's an excerpt from the announcement:

Overall, the document presents the results and available evidence from an extensive and conclusive consultation process that started with the publication of the Communication: European Cloud initiative (COM(2016)178) in April 2016.

The consultation upheld the intervention logic presented in the Communication, to create a fit for purpose pan-European federation of research data infrastructures, with a view to moving from the current fragmentation to a situation where data is easy to store, find, share and re-use.

On the basis of the consultation, the implementation Roadmap gives an overview of six action lines for the implementation of the EOSC:

a) architecture, b) data, c) services, d) access & interfaces, e) rules and f) governance.

Research Data Curation Bibliography, Version 8 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science"

Kyle Chard et al. have published "The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science" in PeerJ.

Here's an excerpt:

In this article, we first define the problems that research data portals address, introduce the legacy approach, and examine its limitations. We then introduce the MRDP design pattern and describe its realization via the integration of two elements: Science DMZs (Dart et al., 2013) (high-performance network enclaves that connect large-scale data servers directly to high-speed networks) and cloud-based data management and authentication services such as those provided by Globus (Chard, Tuecke & Foster, 2014). We then outline a reference implementation of the MRDP design pattern, also provided in its entirety on the companion web site, https://docs.globus.org/mrdp, that the reader can study—and, if they so desire, deploy and adapt to build their own high-performance research data portal. We also review various deployments to show how the MRDP approach has been applied in practice: examples like the National Center for Atmospheric Research's Research Data Archive, which provides for high-speed data delivery to thousands of geoscientists; the Sanger Imputation Service, which provides for online analysis of user-provided genomic data; the Globus data publication service, which provides for interactive data publication and discovery; and the DMagic data sharing system for data distribution from light sources. We conclude with a discussion of related technologies and summary.


Digital Humanities: "CSDH/SCHN Cyberinfrastructure Conversations Summary"

CSDH/SCHN has released the "CSDH/SCHN Cyberinfrastructure Conversations Summary."

Here's an excerpt:

This is a high-level summary of the outcome of a series of conversations regarding the CFI Cyberinfrastructure Initiative among Canadian Digital Humanists. The conversations emerged from CSDH/SCHN consultations that began in the Spring of 2014. The document tries to reflect the priorities and areas of emphasis that have emerged from these discussions, and suggests several areas of focus for broad-based collaborative cyberinfrastructure that would serve the needs of many in the digital humanities research community. The diversity of work in the digital humanities makes it impossible to mention every need, but in the view of the CSDH executive, this summary covers a number of pressing needs from a range of research groups across the country, and balances the need to serve existing researchers with that of expanding access to important datasets and cyberinfrastructure to leading humanities researchers who are experimenting with advanced research computing.

Digital Scholarship | Digital Scholarship Sitemap

"E-Science as a Catalyst for Transformational Change in University Research Libraries"

Mary E. Piorun has self-archived her dissertation "E-Science as a Catalyst for Transformational Change in University Research Libraries."

Here's an excerpt:

Changes in how research is conducted, from the growth of e-science to the emergence of big data, have led to new opportunities for librarians to become involved in the creation and management of research data, at the same time the duties and responsibilities of university libraries continue to evolve. This study examines those roles related to e-science while exploring the concept of transformational change and leadership issues in bringing about such a change. Using the framework established by Levy and Merry for first- and second-order change, four case studies of libraries whose institutions are members in the Association of Research Libraries (ARL) are developed.

Digital Scholarship | Digital Scholarship Publications Overview | Sitemap

"The Role of the Library in the Research Enterprise"

Christopher J. Shaffer has published "The Role of the Library in the Research Enterprise" in the latest issue of the Journal of eScience Librarianship.

Here's an excerpt:

Libraries have provided services to researchers for many years. Changes in technology and new publishing models provide opportunities for libraries to be more involved in the research enterprise. Within this article, the author reviews traditional library services, briefly describes the eScience and publishing landscape as it relates to libraries, and explores possible library programs in support of research. Many of the new opportunities require new partnerships, both within the institution and externally.


Fit for Purpose: Developing Business Cases for New Services in Research Libraries Webinar Recording

DuraSpace has released a recording of its Fit for Purpose: Developing Business Cases for New Services in Research Libraries webinar.

Here's an excerpt from the announcement:

Mike Furlough, Associate Dean of Research and Scholarly Communications, Penn State, and David Minor, Chronopolis Program Manager and Director of Digital Preservation Initiatives, University of California San Diego Library/SDSC, presented "Fit for Purpose: Developing Business Cases for New Services in Research Libraries" to participants in the DuraSpace/ARL/DLF E-Science Institute. In this webinar, the presenters discussed the CLIR/DLF-funded research project Fit for Purpose, which aims to present a structured, disciplined approach for making decisions about creating and maintaining new services in research libraries.

| Digital Curation Resource Guide | Digital Scholarship |

TechWatch: Preparing for Data-driven Infrastructure (Draft)

The JISC Observatory has released a draft for public comment of TechWatch: Preparing for Data-driven Infrastructure.

Here's an excerpt:

This report provides an overview of some concepts and approaches as well as tools, and can be used to help organisational planning. Specifically, this report:

  • describes data-centric architectures;
  • gives some examples of how data are already shared between organisations and discusses this from a data-centric perspective;
  • introduces some of the key tools and technologies that can support data-centric architectures as well as some new models of data management, including opportunities to use "cloud" services;
  • concludes with a look at the direction of travel and lists the sources cited in a References section.

| Research Data Curation Bibliography | Digital Scholarship |

Journal of eScience Librarianship Launched

The Lamar Soutter Library has launched the Journal of eScience Librarianship.

The first issue's "full-length papers" are:

| E-science and Academic Libraries Bibliography | Digital Scholarship |

Data-Intensive Research: Community Capability Model Framework (Consultation Draft)

The Community Capability Model for Data-Intensive Research project has released a consultation draft of the Community Capability Model Framework.

Here's an excerpt:

The Community Capability Model Framework is a tool developed by UKOLN, University of Bath, and Microsoft Research to assist institutions, research funders and researchers in growing the capability of their communities to perform data-intensive research by

  • profiling the current readiness or capability of the community,
  • indicating priority areas for change and investment, and
  • developing roadmaps for achieving a target state of readiness.

The Framework is comprised of eight capability factors representing human, technical and environmental issues. Within each factor are a series of community characteristics that are relevant for determining the capability or readiness of that community to perform data-intensive research.


E-science and Academic Libraries Bibliography

Digital Scholarship has released the E-science and Academic Libraries Bibliography. It includes English-language articles, books, editorials, and technical reports that are useful in understanding the broad role of academic libraries in e-science efforts. The scope of this brief selective bibliography is narrow, and it does not cover data curation and research data management issues in libraries in general. Most sources have been published from 2007 through October 18, 2011; however, a limited number of key sources published prior to 2007 are also included. The bibliography includes links to freely available versions of included works, such as e-prints and open access articles.

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

"Building Research Cyberinfrastructure at Small/Medium Research Institutions"

Anne Agee, Theresa Rowe, Melissa Woo, and David Woods have published "Building Research Cyberinfrastructure at Small/Medium Research Institutions" in EDUCAUSE Quarterly.

Here's an excerpt:

To build a respectable cyberinfrastructure, the IT organizations at small/medium research institutions need to use creativity in discovering the needs of their researchers, setting priorities for support, developing support strategies, funding and implementing cyberinfrastructure, and building partnerships to enhance research support. This article presents the viewpoints of four small-to-medium-sized research universities who have struggled with the issue of providing appropriate cyberinfrastructure support for their research enterprises. All four universities have strategic goals for raising the level of research activity and increasing extramural funding for research.

Presentations from the Digital Repository Federation International Conference 2009

Presentations from the DRF International Conference 2009: Open Access Repositories Now and in the Future—From the Global and Asia-Pacific Points of View are now available. The Digital Repository Federation is "a federation consisting of 87 universities and research institutes (as of February 2009), which aims to promote Open Access and Institutional Repository in Japan."

Here's a quick selection of presentations:

Revised NSF Software Development for Cyberinfrastructure Solicitation

The NSF has issued a revised solicitation for Software Development for Cyberinfrastructure grants (NSF 10-508). It is anticipated that $15,000,000 over a three-year period will be available for 25 to 30 awards. The full proposal deadline is February 26, 2010.

Here's an excerpt:

The FY2010 SDCI solicitation supports the development, deployment, and maintenance of software in the five software focus area listed above, i.e., software for HPC systems, software for digital data management, software for networking, middleware, and cybersecurity, and specifically focuses on cross-cutting issues of CI software sustainability, manageability and power/energy efficiency in each of these software focus areas. . . .

  1. Software for Digital Data

The Data focus area addresses software that promotes acquisition, transport, discovery, access, analysis, and preservation of very large-scale digital data in support of large scale applications or data sets transitioning to use by communities other than the ones that originally gathered the data. Examples of such datasets includes climatologic, ecologic, phonologic, observation data, sensor systems, spatial visualizations, multi-dimensional datasets correlated with metadata and so forth.

Specific focus areas in Software for Digital Data for the FY2010 SDCI solicitation include:

  • Documentation/Metadata: Tools for automated/facilitated metadata creation/acquisition, including linking data and metadata to assist in curation efforts; tools to enable the creation and application of ontologies, semantic discovery, assessment, comparison, and integration of new composite ontologies.
  • Security/Protection: Tools for data authentication, tiered/layered access systems for data confidentiality/privacy protection, replication tools to ensure data protection across varied storage systems/strategies, rules-based data security management tools, and assurance tools to test for digital forgery and privacy violations.
  • Data transport/management: Tools to enable acquisition of high data rate high volume data from varied, distributed data sources (including sensors systems and instruments), while addressing stringent space and data quality constraints; tools to assist in improved low-level management of data and transport to take better advantage of limited bandwidth.
  • Data analytics and visualization: Tools that operate in (near) real-time, not traditional batch mode, on possible streaming data, in-transit data processing, data integration and fusion.  

Data Preservation in High Energy Physics

The ICFA DPHEP International Study Group has self-archived Data Preservation in High Energy Physics in arXiv.org.

Here's an excerpt:

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. At the same time, HEP has no coherent strategy for data preservation and re-use. An inter-experimental Study Group on HEP data preservation and long-term analysis was convened at the end of 2008 and held two workshops, at DESY (January 2009) and SLAC (May 2009). This document is an intermediate report to the International Committee for Future Accelerators (ICFA) of the reflections of this Study Group.