"Who and What Links to the Internet Archive"

Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, and Michael L. Nelson have self-archived "Who and What Links to the Internet Archive" in arXiv.org.

Here's an excerpt:

The Internet Archive's (IA) Wayback Machine is the largest and oldest public web archive and has become a significant repository of our recent history and cultural heritage. Despite its importance, there has been little research about how it is discovered and used. Based on web access logs, we analyze what users are looking for, why they come to IA, where they come from, and how pages link to IA. We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web. About 65% of the requested archived pages no longer exist on the live web. We find that more than 82% of human sessions connect to the Wayback Machine via referrals from other web sites, while only 15% of robots have referrers. Most of the links (86%) from websites are to individual archived pages at specific points in time, and of those 83% no longer exist on the live web.

Digital Scholarship | Digital Scholarship Publications Overview | Sitemap

"Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data"

The CODATA-ICSTI Task Group on Data Citation Standards and Practices has published "Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data" (edited by Yvonne M. Socha) in the Data Science Journal.

Here's an excerpt:

The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as other functions such as attribution of credit and establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters? The traditions and good practices for maintaining the scholarly record by proper references to a work are well established and understood in regard to journal articles and other literature, but attributing credit by bibliographic references to data are not yet so broadly implemented. This report discusses the current state of data citation practices, its supporting infrastructure, a set of guiding principles for implementing data citation, challenges to implementation of good data citation practices, and open research questions.

Special Issue on Research Data Access and Preservation

The latest issue of the Bulletin of the Association for Information Science and Technology focuses on research data access and preservation.

Here's a selection of articles:

  • "Partnerships Between Institutional Repositories, Domain Repositories and Publishers"
  • "The Relevance of Research Data Sharing and Reuse Studies"
  • "Tracking Citations and Altmetrics for Research Data: Challenges and Opportunities"
  • "The Research Data Alliance: Implementing the Technology, Practice and Connections of a Data Infrastructure"
  • "The DCC's Institutional Engagements: Raising Research Data Management Capacity in UK Higher Education"

Preserving Computer-Aided Design (CAD)

The Digital Preservation Coalition has released Preserving Computer-Aided Design (CAD).

Here's an excerpt:

Computer-Aided Design (CAD) systems are used in both industry and academia to create digital models, whether of engineering designs, archaeological dig sites, or virtual worlds. These models can be of long-lasting significance and importance, particularly if they contain irreplaceable data or relate to long-lived products. This report is primarily aimed at those responsible for archives and repositories with CAD content, but may also be useful for creators of CAD content who want to make their models more amenable to preservation. It begins with an introduction to the historical development and basic concepts of CAD systems, then reviews the most pertinent issues associated with preserving CAD models, and indicates the current state of standardization work in the area. The report goes on to present some recent research of relevance to preserving CAD models before drawing conclusions and making recommendations on how archives should handle the CAD models they accept.

"An Evaluation of Caching Policies for Memento TimeMaps"

Justin F. Brunelle and Michael L. Nelson have self-archived "An Evaluation of Caching Policies for Memento TimeMaps" in arXiv.org.

Here's an excerpt:

As defined by the Memento Framework, TimeMaps are machine-readable lists of time-specific copies—called "mementos"—of an archived original resource. In theory, as an archive acquires additional mementos over time, a TimeMap should be monotonically increasing. However, there are reasons why the number of mementos in a TimeMap would decrease, for example: archival redaction of some or all of the mementos, archival restructuring, and transient errors on the part of one or more archives. We study TimeMaps for 4,000 original resources over a three month period, note their change patterns, and develop a caching algorithm for TimeMaps suitable for a reverse proxy in front of a Memento aggregator. We show that TimeMap cardinality is constant or monotonically increasing for 80.2% of all TimeMap downloads observed in the observation period. The goal of the caching algorithm is to exploit the ideally monotonically increasing nature of TimeMaps and not cache responses with fewer mementos than the already cached TimeMap. This new caching algorithm uses conditional cache replacement and a Time To Live (TTL) value to ensure the user has access to the most complete TimeMap available. Based on our empirical data, a TTL of 15 days will minimize the number of mementos missed by users, and minimize the load on archives contributing to TimeMaps.
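The conditional cache replacement described in the excerpt can be sketched as follows. This is a minimal illustration, not Brunelle and Nelson's implementation. Only two rules come from the excerpt: a fresh TimeMap with fewer mementos than the cached one is not cached, and entries carry a 15-day TTL. The class and method names, the representation of a TimeMap as a plain list of memento URIs, and the choice to accept any fresh TimeMap once the cached entry has expired (so that genuine redactions eventually reach users) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# TTL value reported in the excerpt; everything else here is a hypothetical sketch.
DEFAULT_TTL = timedelta(days=15)


@dataclass
class CachedTimeMap:
    mementos: List[str]  # assumed representation: list of memento URIs (URI-Ms)
    cached_at: datetime


class TimeMapCache:
    """Hypothetical reverse-proxy cache keyed by original resource URI (URI-R)."""

    def __init__(self, ttl: timedelta = DEFAULT_TTL) -> None:
        self.ttl = ttl
        self._entries: Dict[str, CachedTimeMap] = {}

    def _expired(self, entry: Optional[CachedTimeMap], now: datetime) -> bool:
        # A missing entry is treated the same as an expired one.
        return entry is None or now - entry.cached_at > self.ttl

    def get(self, uri_r: str, now: datetime) -> Optional[List[str]]:
        """Return the cached TimeMap, or None if absent or past its TTL."""
        entry = self._entries.get(uri_r)
        if self._expired(entry, now):
            return None
        return list(entry.mementos)

    def offer(self, uri_r: str, fresh: List[str], now: datetime) -> List[str]:
        """Conditionally replace the cached TimeMap with a freshly fetched one.

        A fresh TimeMap replaces a still-valid cached one only if it has at least
        as many mementos; a shorter response (e.g. a transient archive error) is
        ignored. Once the TTL has passed, the fresh TimeMap is accepted as-is.
        Returns the TimeMap the proxy should serve.
        """
        entry = self._entries.get(uri_r)
        if self._expired(entry, now) or len(fresh) >= len(entry.mementos):
            self._entries[uri_r] = CachedTimeMap(list(fresh), now)
            return list(fresh)
        return list(entry.mementos)


if __name__ == "__main__":
    t0 = datetime(2013, 7, 1)
    cache = TimeMapCache()
    cache.offer("http://example.com/", ["m1", "m2", "m3"], t0)
    # A shorter response within the TTL is rejected: the cached 3 mementos are served.
    print(cache.offer("http://example.com/", ["m1", "m2"], t0 + timedelta(days=1)))
    # After 15 days the cached entry has expired, so the shorter TimeMap is accepted.
    print(cache.offer("http://example.com/", ["m1", "m2"], t0 + timedelta(days=16)))
```

In this sketch the TTL is the only way a smaller TimeMap can replace a larger one, which reflects the trade-off the excerpt describes: the conditional check hides transient drops in cardinality, while the TTL bounds how long the proxy can keep serving mementos an archive has deliberately removed.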

Presentations from Open Repositories 2013

Presentations from Open Repositories 2013 are now available.

Here's a brief selection of talks:

Helping to Open Up: Improving Knowledge, Capability and Confidence in Making Research Data More Open

The Research Information and Digital Literacies Coalition has released Helping to Open Up: Improving Knowledge, Capability and Confidence in Making Research Data More Open.

Here's an excerpt from the announcement:

The report describes a framework for how to address this challenge when designing training and support for opening data, within the broader questions of RDM. Recommendations are set out, relating to:

  • putting opening data at the heart of policy
  • putting opening data at the heart of training
  • deepening and broadening the training
  • identifying and disseminating best practice in opening data
  • developing institutional and community support

"Crafting Linked Open Data for Cultural Heritage: Mapping and Curation Tools for the Linked Jazz Project"

M. Cristina Pattuelli, Matt Miller, Leanora Lange, Sean Fitzell, and Carolyn Li-Madeo have published "Crafting Linked Open Data for Cultural Heritage: Mapping and Curation Tools for the Linked Jazz Project" in the latest issue of Code4Lib Journal.

Here's an excerpt:

This paper describes tools and methods developed as part of Linked Jazz, a project that uses Linked Open Data (LOD) to reveal personal and professional relationships among jazz musicians based on interviews from jazz archives. The overarching aim of Linked Jazz is to explore the possibilities offered by LOD to enhance the visibility of cultural heritage materials and enrich the semantics that describe them. While the full Linked Jazz dataset is still under development, this paper presents two applications that have laid the foundation for the creation of this dataset: the Mapping and Curator Tool, and the Transcript Analyzer. These applications have served primarily for data preparation, analysis, and curation and are representative of the types of tools and methods needed to craft linked data from digital content available on the web. This paper discusses these two domain-agnostic tools developed to create LOD from digital textual documents and offers insight into the process behind the creation of LOD in general.

"Batch Metadata Assignment to Archival Photograph Collections Using Facial Recognition Software"

Kyle Banerjee and Maija Anderson have published "Batch Metadata Assignment to Archival Photograph Collections Using Facial Recognition Software" in the latest issue of Code4Lib Journal.

Here's an excerpt:

Useful metadata is essential to giving individual meaning and value within the context of a greater image collection as well as making them more discoverable. However, often little information is available about the photos themselves, so adding consistent metadata to large collections of digital and digitized photographs is a time consuming process requiring highly experienced staff.

By using facial recognition software, staff can identify individuals more quickly and reliably. Knowledge of individuals in photos helps staff determine when and where photos are taken and also improves understanding of the subject matter.

This article demonstrates simple techniques for using facial recognition software and command line tools to assign, modify, and read metadata for large archival photograph collections.

"Using Data Curation Profiles to Design the Datastar Dataset Registry"

Sarah J. Wright, Wendy A. Kozlowski, Dianne Dietrich, Huda J. Khan, and Gail S. Steinhart have published "Using Data Curation Profiles to Design the Datastar Dataset Registry" in the latest issue of D-Lib Magazine.

Here's an excerpt:

The development of research data services in academic libraries is a topic of concern to many. Cornell University Library's efforts in this area include the Datastar research data registry project. In order to ensure that Datastar development decisions were driven by real user needs, we interviewed researchers and created Data Curation Profiles (DCPs). Researchers supported providing public descriptions of their datasets; attitudes toward dataset citation, provenance, versioning, and domain specific standards for metadata also helped to guide development. These findings, as well as considerations for the use of this particular method for developing research data services in libraries are discussed in detail.

"Foundations of Data Curation: The Pedagogy and Practice of "Purposeful Work" with Research Data"

Carole L. Palmer, Nicholas M. Weber, Trevor Muñoz, and Allen H. Renear have published "Foundations of Data Curation: The Pedagogy and Practice of "Purposeful Work" with Research Data" in the latest issue of Archive Journal.

Here's an excerpt:

Increased interest in large-scale, publicly accessible data collections has made data curation critical to the management, preservation, and improvement of research data in the social and natural sciences, as well as the humanities. This paper explicates an approach to data curation education that integrates traditional notions of curation with principles and expertise from library, archival, and computer science. We begin by tracing the emergence of data curation as both a concept and a field of practice related to, but distinct from, both digital curation and data stewardship. This historical account, while far from definitive, considers perspectives from both the sciences and the humanities. Alongside traditional LIS and archival science practices, unique aspects of curation have informed our concept of "purposeful work" with data and, in turn, our pedagogical approach to data curation for the sciences and the humanities.

Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house

OCLC Research has released Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house.

Here's an excerpt from the announcement:

The third report, Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house, collects the assembled wisdom of experienced practitioners to help those with less experience make appropriate choices in gaining control of born-digital content. It contains discrete steps with objectives, links to available tools and software, references and resources for further research and paths to engagement with the digital archives community.

Johns Hopkins University Offers Digital Curation Certificate Program

Johns Hopkins University has established a Digital Curation Certificate program.

Here's an excerpt from the announcement:

The Johns Hopkins University Certificate in Digital Curation, offered through the online graduate program in Museum Studies, advances the education and training of museum professionals worldwide in this emerging field.

This certificate offers a specialized curriculum that is critically needed in the museum field. It will prepare current and aspiring museum professionals to manage the growing volume and variety of digital data of long-term value that museums are now producing, acquiring, storing and sharing with researchers, educators and the public. It will train students to work with digital collections, exhibitions, and research data that will ensure the longevity of our global cultural heritage of which museums are the stewards.

"The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary"

Matthew Kirschenbaum has published "The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary" in a preview issue of Digital Humanities Quarterly.

Here's an excerpt:

Here then are some specifics I have considered as to how digital humanities might usefully collaborate with those archivists even now working on born-digital collections:

  • Digital archivists need digital humanities researchers and subject experts to use born-digital collections. Nothing is more important. If humanities researchers don't demand access to born-digital materials then it will be harder to get those materials processed in a timely fashion, and we know that with the born-digital every day counts.
  • Digital humanists need the long-term perspective on data that archivists have. Today's digital humanities projects are, after all, the repository objects of tomorrow's born-digital archives. Funders are increasingly (and rightfully) insistent about the need to have a robust data management and sustainability plan built into project proposals from the outset. Therefore, there is much opportunity for collaboration and team-building around not only archiving and preservation, but the complete data curation cycle. This extends to the need to jointly plan around storage and institutional infrastructure.
  • Digital archivists and digital humanists need common and interoperable digital tools. Open source community-driven development at the intersection of the needs of digital archivists, humanities scholars, and even collections' donors should become an urgent priority.

"Unintended Consequences: New Materialist Perspectives on Library Technologies and the Digital Record"

portal: Libraries and the Academy has released an e-print of "Unintended Consequences: New Materialist Perspectives on Library Technologies and the Digital Record" by Marlene Manoff.

Here's an excerpt:

Digital technology has irrevocably altered the nature of the archive. Drawing on materialist critiques and the evolving field of media archaeology, this essay explores new strategies for understanding the implications of computer networks in libraries. Although a significant portion of the contemporary literature within Library and Information Science (LIS) addresses issues of technological change, the materialist and multidisciplinary approaches proposed here provide a theoretical basis for investigating the current state of library technologies in new ways. These methods provide insight into the proliferation of digital products and the cycles of platform adoption and replacement that have marked the past decades of library development. They also help to reframe questions about content aggregation and the licensing of digital scholarship.

"Data Management in Scholarly Journals and Possible Roles for Libraries—Some Insights from EDaWaX"

Sven Vlaeminck has published "Data Management in Scholarly Journals and Possible Roles for Libraries—Some Insights from EDaWaX" in the latest issue of LIBER Quarterly.

Here's an excerpt:

In this paper we summarize the findings of an empirical study conducted by the EDaWaX-Project. 141 economics journals were examined regarding the quality and extent of data availability policies that should support replications of published empirical results in economics. This paper suggests criteria for such policies that aim to facilitate replications. These criteria were also used for analysing the data availability policies we found in our sample and to identify best practices for data policies of scholarly journals in economics. In addition, we also evaluated the journals' data archives and checked the percentage of articles associated with research data. To conclude, an appraisal as to how scientific libraries might support the linkage of publications to underlying research data in cooperation with researchers, editors, publishers and data centres is presented.

British Library and Portico Collaborate on E-journal Preservation

The British Library and Portico will collaborate on preserving the Library's e-journal collection.

Here's an excerpt from the announcement:

The partnership will help the British Library—along with five other legal deposit libraries—to meet regulations that recently became law in the United Kingdom and that extend the practice of legal deposit from traditional print publications to non-print publications such as e-journals, blogs and websites in the UK web domain.

Portico will utilize its established workflow and processes to create standardized and uniform journal content that can be exported to the British Library. They have started with 1,500 journals from three publishers that are already preserving content with Portico. As necessary, Portico will develop new tools for processing additional publisher content.

Research Data Curation Bibliography, Version 3

Digital Scholarship has released version 3 of the Research Data Curation Bibliography. This selective bibliography includes over 230 English-language articles and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions.

The "digital curation" concept is still evolving. In "Digital Curation and Trusted Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo define digital curation as follows:

Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts.

Most sources have been published from January 2000 through June 2012; however, a limited number of earlier key sources are also included.

The bibliography includes links to freely available versions of included works. If such versions are unavailable, italicized links to the publishers' descriptions are provided.

It is available under a Creative Commons Attribution-Noncommercial 3.0 United States License.

2013 NDSA Innovation Award Winners

The National Digital Stewardship Alliance Innovation Working Group has announced the 2013 NDSA Innovation Award winners.

Here's an excerpt:

Please join us in congratulating the 2013 Innovation Award winners:

Future Steward: Martin Gengenbach, Gates Archive. Martin is recognized for his work documenting digital forensics tools and workflows, especially his paper, "The Way We Do it Here: Mapping Digital Forensics Workflows in Collecting Institutions" and his work cataloging the DFXML schema.

Individual: Kim Schroeder, Wayne State University. Kim is recognized for her work as a mentor to future digital stewards in her role as a lecturer in Digital Preservation at Wayne State University, where she helped establish the first NDSA Student Group, supported the student-led colloquium on digital preservation, and worked to facilitate collaboration between students in digital stewardship and local cultural heritage organizations.

Project: DataUp, California Digital Library. DataUp is recognized for creating an open-source tool uniquely built to assist individuals aiming to preserve research datasets by guiding them through the digital stewardship workflow process from dataset creation and description to the deposit of their datasets into public repositories.

Organization: Archive Team. The Archive Team, a self-described "loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage," is recognized both for its aggressive, vital work in preserving websites and digital content slated for deletion and for its work advocating for the preservation of digital culture within the technology and computing sectors.

Web Archiving

The Digital Preservation Coalition has released Web Archiving.

Here's an excerpt:

Web archiving technology enables the capture, preservation and reproduction of valuable content from the live web in an archival setting, so that it can be independently managed and preserved for future generations. This report introduces and discusses the key issues faced by organizations engaged in web archiving initiatives, whether they are contracting out to a third party service provider or managing the process in-house. It follows this with an overview of the main software applications and tools currently available. Selection and deployment of the most appropriate tools is contextual: organizations are advised to select the approach that best meets their business needs and drivers, and which they are able to support technically. Three case studies are included to illustrate the different operational contexts, drivers, and solutions that can be implemented.

"Making Research Data Repositories Visible: The re3data.org Registry"

Heinz Pampel et al. have self-archived "Making Research Data Repositories Visible: The re3data.org Registry" in PeerJ PrePrints.

Here's an excerpt:

Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are being increasingly summarized under the term Research Data Repositories (RDR). The project re3data.org—Registry of Research Data Repositories has begun to index research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape. Information icons help researchers to easily identify an adequate repository for the storage and reuse of their data. This article describes the RDR landscape, outlines the practicality of re3data.org as a service, and shows how this service helps to find research data.

Research Data Management in Practice

The Australian National Data Service has released Research Data Management in Practice.

Here's an excerpt:

ANDS has commissioned this "Research Data Management Practice Guide" as a practical starting point that focuses on the 'Why' and 'How' of good data and risk management, with plenty of references for further reading for readers who need more detail. . . .

The Practice Guide is aimed at research administrators in the e-research space, providing them with an overview for the planning and operations of sharing research data, thereby creating better opportunities for data re-use. It is acknowledged that no single person or even business unit is responsible for all aspects of research data management and that a collaborative approach is required. In all cases this will involve the researcher/data creator.
