"The Cloud, the Public Square, and Digital Public Archival Infrastructure"


Many governments have chosen to store their records in the cloud rather than invest in the increased digital infrastructure now required to manage them. . . . Yet, archivists and archival perspectives have not been much involved in public discussion of this change. . . . The shape of the emerging infrastructure underpinning the management of digital communication may well be the most significant lasting feature of the digital environment for societies and their archives. This article discusses why that development requires archival voices in the public square to address it.

https://doi.org/10.1007/s10502-023-09417-7

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Ten Lessons for Data Sharing with a Data Commons"


A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using the elastic scalability provided by cloud computing and to share data securely and compliantly, and, in this way, accelerate the pace of research. Over the past decade, a number of data commons have been developed and we discuss some of the lessons learned from this effort.

https://doi.org/10.1038/s41597-023-02029-x

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

79.3 Exabytes Capacity Sold in 2022: "Magnetic Tape Storage Is Seeing Cloud Go Back to the Future for Its Archival Data Needs"


Even then [in 1981], says Goodwin, people were saying tape was not long for this world. Those critics appear to have been silenced by recent sales figures, which show year-on-year shipments of hard disk drives (HDDs) sink by 34% in 2022, while consignments of magnetic tape drives rose by 14% — a total of 79.3 exabytes, or roughly equivalent to the entirety of data created on the internet every 32 days.

bit.ly/3ky5Trv

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "A Comprehensive Review of Open Data Platforms, Prevalent Technologies, and Functionalities"


We will discuss seven major open data platforms, such as (1) CKAN (2) DKAN (3) Socrata (4) OpenDataSoft (5) GitHub (6) Google datasets (7) Kaggle. We will evaluate the technological commons, techniques, features, methods, and visualization offered by each tool. In addition, why are these platforms important to users such as providers, curators, and end-users? And what are the key options available on these platforms to publish open data?

https://doi.org/10.1145/3560107.3560142

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Towards Environmentally Sustainable Long-term Digital Preservation"


Digital preservation relies on technological infrastructure (information and communication technology, ICT) that can have environmental impacts. While altering technology usage can reduce the impact of digital preservation practices, this alone is not a strategy for sustainable practice. Moving toward environmentally sustainable digital preservation requires critically examining the motivations and assumptions that shape current practice. The use of scalable cloud infrastructures can reduce the environmental impacts of long-term data preservation solutions.

http://www.ijdc.net/article/view/848

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"OpenStack Swift: An Ideal Bit-Level Object Storage System for Digital Preservation"


A bit-level object storage system is a foundational building block of long-term digital preservation (LTDP). To achieve the purposes of LTDP, the system must be able to: preserve the authenticity and integrity of the original digital objects; scale up with dramatically increasing demands for preservation storage; mitigate the impact of hardware obsolescence and software ephemerality; replicate digital objects among distributed data centers at different geographical locations; and to constantly audit and automatically recover from compromised states. . . . In this paper, we present OpenStack Swift, an open-source, mature and widely accepted cloud platform, as a practical and proven solution with a case study at the University of Alberta Library. We emphasize the implementation, application, cost analysis and maintenance of the system, with the purpose of contributing to the community with an exceedingly robust, highly scalable, self-healing and comparatively cost-effective bit-level object storage system for long-term digital preservation.
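The requirement to "constantly audit and automatically recover from compromised states" can be illustrated with a generic fixity check. The Python below is a minimal sketch, not OpenStack Swift's implementation (Swift runs its own background auditing and replication processes). It assumes each object has a SHA-256 checksum recorded at ingest and several replicas held in memory; the function names and site labels are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fixity value of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict, expected: str) -> list:
    """Audit every replica against the recorded checksum and overwrite any
    corrupted copy with one that still verifies (a simple 'self-heal').

    `replicas` maps a hypothetical site label to that site's stored bytes.
    Returns the labels of the replicas that were repaired.
    """
    intact = [site for site, data in replicas.items() if sha256_hex(data) == expected]
    if not intact:
        # Every copy fails verification: nothing trustworthy to copy from.
        raise RuntimeError("no intact replica available; manual recovery needed")
    good_copy = replicas[intact[0]]
    repaired = []
    for site, data in list(replicas.items()):
        if sha256_hex(data) != expected:
            replicas[site] = good_copy  # replace the damaged copy
            repaired.append(site)
    return repaired


# Usage: three hypothetical sites, one holding a silently corrupted copy.
original = b"digitized page 0001"
checksum = sha256_hex(original)  # recorded at ingest
replicas = {"site-a": original, "site-b": b"digitized page 0OO1", "site-c": original}
print(audit_and_repair(replicas, checksum))  # ['site-b']
```

Running the audit a second time finds nothing to repair, because the damaged copy has been replaced from an intact one. A production system would also need to distribute replicas geographically and schedule audits continuously, which this sketch leaves out.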

https://doi.org/10.2218/ijdc.v17i1.782

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"CADRE: A Cloud-Based Data Service for Big Bibliographic Data"

https://dl.acm.org/doi/abs/10.1145/3459637.3481898

CADRE: Collaborative Archive & Data Research Environment

Academic Library as Scholarly Publisher Bibliography, Version 2 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"CADRE: A Collaborative, Cloud-Based Solution for Big Bibliographic Data Research in Academic Libraries"

https://doi.org/10.3389/fdata.2020.556282

Research Data Curation Bibliography, Version 10 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

Implementation Roadmap for the European Open Science Cloud

The European Commission has released Implementation Roadmap for the European Open Science Cloud.

Here's an excerpt from the announcement:

Overall, the document presents the results and available evidence from an extensive and conclusive consultation process that started with the publication of the Communication: European Cloud initiative (COM(2016)178) in April 2016.

The consultation upheld the intervention logic presented in the Communication, to create a fit for purpose pan-European federation of research data infrastructures, with a view to moving from the current fragmentation to a situation where data is easy to store, find, share and re-use.

On the basis of the consultation, the implementation Roadmap gives an overview of six action lines for the implementation of the EOSC:

a) architecture, b) data, c) services, d) access & interfaces, e) rules and f) governance.

Research Data Curation Bibliography, Version 8 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Storage is a Strategic Issue: Digital Preservation in the Cloud"

Gillian Oliver and Steve Knight have published "Storage is a Strategic Issue: Digital Preservation in the Cloud."

Here's an excerpt:

Worldwide, many governments are mandating a 'cloud first' policy for information technology infrastructures. In 2013, the National Library of New Zealand's National Digital Heritage Archive (NDHA) outsourced storage of its digital collections. A case study of the decision to outsource and its consequences was conducted, involving interviews of the representatives of three key stakeholders: IT, the NDHA, and the vendor. Clear benefits were identified by interviewees, together with two main challenges. The challenges related to occupational culture tensions, and a shift in funding models. Interviewees also considered whether the cultural heritage sector had any unique requirements. A key learning was that information managers were at risk of being excluded from the detail of outsourcing, and so needed to be prepared to assert their need to know based on their stewardship mandate.

Digital Scholarship | Digital Scholarship Sitemap

Guidance on Cloud Storage and Digital Preservation: How Cloud Storage Can Address the Needs of Public Archives in the UK

The National Archives (UK) has released Guidance on Cloud Storage and Digital Preservation: How Cloud Storage Can Address the Needs of Public Archives in the UK.

Here's an excerpt:

This Guidance is focussed on the cloud and its potential role in archival storage. It aims to help public archives in the UK develop an understanding of cloud storage and its potential contribution to their digital preservation activities, and to provide a balanced overview allowing archives to understand potential benefits and risks involved and the range of options available (including not using cloud if it does not meet your requirements).

Whilst primarily targeted at public archives, the aim is to provide information that will be useful within a range of organisational contexts, and overarching advice that can be translated into the private sector where relevant.

See also the case studies.

Digital Scholarship | Digital Scholarship Publications Overview | Sitemap

DuraSpace Gets $861,000 Grant to Develop DuraCloud Data Services

DuraSpace has received a two-year $861,000 grant from the Gordon and Betty Moore Foundation to develop DuraCloud data services.

Here's an excerpt from the press release:

Currently, DuraCloud provides a reliable way to preserve and archive research materials in the cloud, a solution developed within the academic community for academic institutions. During the next phase of DuraCloud development, additional applications, features, and services will be built to extend the cloud in order to facilitate data archiving and content management. DuraSpace offers DuraCloud as a software as a service that enables archiving, preserving, and managing institutional content using cloud storage and intends to expand its service offerings in the next phase of development.

| Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works | Digital Scholarship |

"LOCKSS Boxes in the Cloud"

David S. H. Rosenthal and Daniel L. Vargas have self-archived "LOCKSS Boxes in the Cloud" at the LOCKSS website.

Here's an excerpt:

The 30-year history of raw disk costs shows a drop of at least 30% per year. The history of cloud storage costs from commercial providers shows that they drop at most 3% per year. Until there is a radical change in one or other of these cost curves, it [is] clear that cloud storage is not even close to cost-competitive with local disk storage for long-term preservation purposes in general, and LOCKSS boxes in particular.
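To see why the two rates quoted above diverge so sharply, the sketch below compounds them. It is a minimal illustration, assuming both kinds of storage start at the same price and that the quoted rates (at least 30% per year for disk, at most 3% per year for cloud) hold constant; it compares raw price trajectories only, not the total cost of running a LOCKSS box.

```python
def relative_cost(years: int, annual_drop: float) -> float:
    """Price after `years` years, relative to a starting price of 1.0,
    when the price falls by the fraction `annual_drop` every year."""
    return (1.0 - annual_drop) ** years


# Rates from the excerpt (assumed constant for this illustration).
DISK_DROP = 0.30   # raw disk: at least 30% cheaper each year
CLOUD_DROP = 0.03  # commercial cloud storage: at most 3% cheaper each year

for years in (1, 5, 10):
    disk = relative_cost(years, DISK_DROP)
    cloud = relative_cost(years, CLOUD_DROP)
    # Ratio of what the same capacity costs on each curve, starting from parity.
    print(f"year {years:2d}: disk {disk:.3f}  cloud {cloud:.3f}  cloud/disk {cloud / disk:.1f}x")
```

Under these assumptions the gap grows to roughly 26 times after ten years, which is the compounding effect behind the authors' conclusion.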

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

Digital Curation and the Cloud: Final Report

JISC has released Digital Curation and the Cloud: Final Report. This is a revised version of the draft report that was released earlier this year.

Here's an excerpt:

Digital curation involves a wide range of activities, many of which may be suitable for deployment within a cloud environment. These range from infrequent, resource-intensive tasks which will benefit from the ability to rapidly provision resources, to day-to-day collaborative activities which can be facilitated by networked cloud services. Associated benefits are offset by risks such as loss of data or service level, legal and governance incompatibilities and transfer bottlenecks. There is considerable variability across both risks and benefits according to the service and deployment models being adopted and the context in which activities are performed. Some risks, such as legal liabilities, are mitigated by the use of alternatives, for example, private cloud models, but this is typically at the expense of benefits such as resource elasticity and economies of scale.

| Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works | Digital Scholarship |

"REDDNET and Digital Preservation in the Open Cloud: Research at Texas Tech University Libraries on Long-Term Archival Storage"

James Brewer, Tracy Popp, and Joy Perrin have published "REDDNET and Digital Preservation in the Open Cloud: Research at Texas Tech University Libraries on Long-Term Archival Storage" in the latest issue of the Journal of Digital Information.

Here's an excerpt:

In open cloud systems users can develop their own software and data management, control access, and purchase their own hardware while running securely in the cloud environment. . . . It is in this context that REDDnet (Research and Education Data Depot network) is presented as the place where the Texas Tech University (TTU) Libraries have been conducting research on long-term digital archival storage. The REDDnet network by year's end will be at 1.2 petabytes (PB) with an additional 1.4 PB for a related project. . . additionally there are over 200 TB of tape storage. These numbers exclude any disk space which TTU will be purchasing during the year. National Science Foundation (NSF) funding covering REDDnet and CMS-HI was in excess of $850,000 with $850,000 earmarked toward REDDnet. In the terminology we used above, REDDnet is an open cloud system that invited TTU Libraries to participate. This means that we run software which fits the REDDnet structure. We are beginning to complete the final design of our system, and starting to move into the first stages of construction. And we have made a decision to move forward and purchase one-half petabyte of disk storage in the initial phase. The concerns, deliberations and testing are presented here along with our initial approach.

| Digital Curation and Preservation Bibliography 2010: "If you're looking for a reading list that will keep you busy from now until the end of time, this is your one-stop shop for all things digital preservation." — "Digital Preservation Reading List," Preservation Services at Dartmouth College weblog, February 21, 2012. | Digital Scholarship |

Presentations from the Curation in the Cloud Workshop

Presentations from the Curation in the Cloud Workshop are now available.

Here's an excerpt from the conference web page:

The aim of this 2-day workshop is to assess the potential and practicalities of using cloud-based solutions for the curation and long-term preservation of digital materials, focusing particularly on data that originates from research or that supports research processes. What will particularly be of value is to engage stakeholders from a number of different types and scales of organisations, encompassing those that are able to rely on established and joined-up institutional infrastructures; alongside those who may have more fragmented or immature local measures in place to manage data.

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

"Digital Curation and the Cloud"

Brian Aitken, Patrick McCann, Andrew McHugh, and Kerry Miller have self-archived "Digital Curation and the Cloud" in Enlighten.

Here's an excerpt:

Digital curation involves a wide range of activities, many of which could benefit from cloud deployment to a greater or lesser extent. These range from infrequent, resource-intensive tasks which benefit from the ability to rapidly provision resources to day-to-day collaborative activities which can be facilitated by networked cloud services. Associated benefits are offset by risks such as loss of data or service level, legal and governance incompatibilities and transfer bottlenecks. There is considerable variability across both risks and benefits according to the service and deployment models being adopted and the context in which activities are performed. Some risks, such as legal liabilities, are mitigated by the use of alternative[s], e.g., private cloud models, but this is typically at the expense of benefits such as resource elasticity and economies of scale. [An] Infrastructure as a Service model may provide a basis on which more specialised software services may be provided.

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

Cloud Computing Toolkit: Guidance for Outsourcing Information Storage to the Cloud

The Archives & Records Association and the Department of Information Studies, Aberystwyth University have released the Cloud Computing Toolkit: Guidance for Outsourcing Information Storage to the Cloud.

Here's an excerpt:

The toolkit covers four main areas that should be considered when an organisation intends to outsource business processes and information storage into a cloud environment and should help develop a consistent cloud computing strategy as well as requirements for the required cloud service. Each of the four main sections proposes questions that should be taken into consideration by the organisation or that should be addressed to the prospective cloud service provider:

  • Overview of cloud computing – Cloud computing definition, benefits and challenges
  • Preparing for the cloud – Cloud service selection and risk assessment
  • Managing the cloud – Information management, compliance, contract and cost
  • Operating in the cloud – Information security, access and availability

Read more about it at Storing Information in the Cloud: Project Report.

| Digital Scholarship | Digital Scholarship Publications Overview | Scholarly Electronic Publishing Bibliography 2010 |

Privacy Considerations in Cloud-Based Teaching and Learning Environments

The EDUCAUSE Learning Initiative has released Privacy Considerations in Cloud-Based Teaching and Learning Environments.

Here's an excerpt:

In this white paper, we outline the privacy issues relevant to using cloud-based instructional tools or cloud-based teaching and learning environments for faculty members and those supporting instruction. Our discussion of how teaching and learning in an increasingly technological environment has transformed the way we interact and interpret FERPA will help inform various choices that institutions can consider to best address the law, including policy and best-practice examples. We highlight practical suggestions for how faculty members can continue to use innovative instructional strategies and engage students while considering privacy issues. Finally, this paper discusses ways to further explore and address privacy locally and includes a comprehensive resource list for further reading.

| Digital Scholarship |

Cloud-Sourcing Research Collections: Managing Print in the Mass-Digitized Library Environment

OCLC has released Cloud-Sourcing Research Collections: Managing Print in the Mass-Digitized Library Environment.

Here's an excerpt from the press release:

The objective of the project was to examine the feasibility of outsourcing management of low-use print books held in academic libraries to shared service providers, including large-scale print and digital repositories. The study assessed the opportunity for library space saving and cost avoidance through the systematic and intentional outsourcing of local management operations for digitized books to shared service providers and progressive downsizing of local print collections in favor of negotiated access to the digitized corpus and regionally consolidated print inventory.

Some of the findings from the project that are detailed in the report include:

  • There is sufficient material in the mass-digitized library collection managed by the HathiTrust to duplicate a sizeable (and growing) portion of virtually any academic library in the United States, and there is adequate duplication between the shared digital repository and large-scale print storage facilities to enable a great number of academic libraries to reconsider their local print management operations.
  • The combination of a relatively small number of potential shared print providers, including the US Library of Congress, was sufficient to achieve more than 70% coverage of the digitized book collection, suggesting that shared service may not require a very large network of providers.
  • Substantial library space savings and cost avoidance could be achieved if academic institutions outsourced management of redundant low-use inventory to shared service providers.
  • Academic library directors can have a positive and profound impact on the future of academic print collections by adopting and implementing a deliberate strategy to build and sustain regional print service centers that can reduce the total cost of library preservation and access.

| Digital Scholarship |

Cloud Computing: TierraCloud Launches HC2 Open Source Project with Fedora Plug-in

TierraCloud has launched the HC2 Open Source Project. HC2 has a Fedora Repository plug-in.

Here's an excerpt from the press release:

Web2.0s have invented a new storage architecture that runs on industry standard x86 servers using sophisticated software to create extremely reliable and scalable storage systems. This architecture, that may be called Private Cloud Storage, is so compelling that enterprises will have no option but to use it. Although enterprise storage architectures have been fairly stable since the mid 80’s with external block and file storage, TierraCloud expects these architectures will undergo a sea-change in the next decade.

"Current mainstream solutions are ill-suited to address new private cloud storage requirements" said Sriram Rupanagunta, founder of TierraCloud. "Acquisition cost, management cost, scalability and reliability are the key requirements. With HC2’s unique advantages in the areas of automated data management, extreme data mobility, and ability to run third-party storage apps, the total-cost-of-ownership will get slashed by 10x." . . .

"It has become clear that data curation will require distributed storage and application frameworks," said Sayeed Choudhury, Associate Dean of University Libraries at Johns Hopkins University. "No single institution can develop the comprehensive, necessary infrastructure to preserve and provide access to the large amount of data being generated by all disciplines ranging from the sciences to the humanities. HC2's emphasis on hardware choices, geographically distributed data and open-source software is compelling. Most institutions will be eager to experiment with private cloud storage and HC2 represents a useful option in this regard."