"Research Data Management at the University of Warwick: Recent Steps towards a Joined-up Approach at a UK University"

Jenny Delasalle has published "Research Data Management at the University of Warwick: Recent Steps towards a Joined-up Approach at a UK University" in LIBREAS. Library Ideas.

Here's an excerpt:

This paper charts the steps taken and possible ways forward for the University of Warwick in its approach to research data management, providing a typical example of a UK research university's approach in two strands: requirements and support. The UK government approach and funding landscape in relation to research data management provided drivers for the University of Warwick to set requirements and provide support, and examples of good practice at other institutions, support from a central national body (the UK Digital Curation Centre) and learning from other universities' experiences all proved valuable to the University of Warwick. Through interviews with researchers at Warwick, various issues and challenges are revealed: perhaps the biggest immediate challenges for Warwick going forward are overcoming scepticism amongst researchers, overcoming costs, and understanding the implications of involving third party companies in research data management. Building technical infrastructure could sit alongside and beyond those immediate steps, and beyond the challenges that face one University are those that affect academia as a whole.

"A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques"

Jason Hennessey and Steven Xijin Ge have published "A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques" in BMC Bioinformatics.

Here's an excerpt:

We accessed 14,489 unique web pages found in the abstracts within Thomson Reuters' Web of Science citation index that were published between 1996 and 2010 and found that the median lifespan of these web pages was 9.3 years with 62% of them being archived. Survival analysis and logistic regression were used to find significant predictors of URL lifespan. The availability of a web page is most dependent on the time it is published and the top-level domain names. Similar statistical analysis revealed biases in current solutions: the Internet Archive favors web pages with fewer layers in the Universal Resource Locator (URL) while WebCite is significantly influenced by the source of publication. We also created a prototype for a process to submit web pages to the archives and increased coverage of our list of scientific webpages in the Internet Archive and WebCite by 22% and 255%, respectively.
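
The authors' prototype improves coverage by submitting pages to existing web archives. As a rough illustration only, the Python sketch below submits a single URL to the Internet Archive's public "Save Page Now" endpoint; it is not the authors' tool, and the use of the Content-Location response header is an assumption about how the service typically reports the snapshot location.

```python
# Illustrative sketch, not the authors' prototype: ask the Internet
# Archive's "Save Page Now" endpoint to capture a cited web page.
import requests


def archive_url(url, timeout=60):
    """Request a Wayback Machine capture of `url` and return the
    snapshot location the service reports (or the final response URL)."""
    resp = requests.get(f"https://web.archive.org/save/{url}", timeout=timeout)
    resp.raise_for_status()
    # The snapshot path is typically exposed in the Content-Location header.
    return resp.headers.get("Content-Location", resp.url)


if __name__ == "__main__":
    # Hypothetical cited URL used only for demonstration.
    print(archive_url("http://example.org/supplementary-data"))
```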

"Metadata is a Love Note to the Future—UK Higher Education Research Data Management (RDM) Survey"

Martin Hamilton has released "Metadata is a Love Note to the Future—UK Higher Education Research Data Management (RDM) Survey."

Here's an excerpt:

I'm delighted to be able to present here the results of our recent survey of the UK Higher Education community's plans for Research Data Management, along with a little initial analysis and an executive summary. To stay true to the spirit of openness, we have made a redacted version of the raw data available, along with our analysis, using the figshare cloud RDM service.

European Landscape Study of Research Data Management

SURF has released the European Landscape Study of Research Data Management.

Here's an excerpt:

This report presents the results of an online survey to establish which interventions are already being used by funding agencies, research institutions, national bodies and publishers across the European Union member states and a number of countries outside Europe in order to improve the capacity and skills of researchers in making effective use of research data infrastructures. It also makes recommendations that organisations can adopt to help their researchers. . . .

Interviews with researchers indicate that the main drivers for writing a data management plan are requirements by the funder or the publisher. Nearly half of the research funders who took part in the survey have a policy covering research data management, whilst a quarter of the funders require data management plans as part of the grant application. Data management plans should address data acquisition, use, re-use, storage and protection and the rights of ownership. Just over one third of the responding funding organisations designate a specific organisation for preservation, although no term has been identified.

"Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations"

Jonathan Zittrain, Kendra Albert, and Lawrence Lessig have self-archived "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations" in SSRN.

Here's an excerpt:

We document a serious problem of reference rot: more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs found within U.S. Supreme Court opinions do not link to the originally cited information.

Given that, we propose a solution for authors and editors of new scholarship that involves libraries undertaking the distributed, long-term preservation of link contents.
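
A first step in quantifying this kind of rot is simply checking whether cited URLs still resolve. The Python sketch below is a naive illustration of that check (it detects dead links only, not drift away from the originally cited content); the URL list is hypothetical.

```python
# Naive link-rot check: report cited URLs that no longer resolve cleanly.
# Detecting "reference rot" (content drift) would require comparing content,
# which this sketch does not attempt.
import requests

cited_urls = [  # hypothetical list of URLs cited in footnotes
    "http://example.org/cited-report.pdf",
    "http://example.org/moved-page",
]

for url in cited_urls:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=30)
        ok = resp.status_code < 400
    except requests.RequestException:
        ok = False
    print(f"{'OK  ' if ok else 'DEAD'} {url}")
```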

Presentations from Research Data Management Forum 10

Presentations from the Research Data Management Forum 10 are now available.

Here are some representative presentations:

"Who and What Links to the Internet Archive"

Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, and Michael L. Nelson have self-archived "Who and What Links to the Internet Archive" in arXiv.org.

Here's an excerpt:

The Internet Archive's (IA) Wayback Machine is the largest and oldest public web archive and has become a significant repository of our recent history and cultural heritage. Despite its importance, there has been little research about how it is discovered and used. Based on web access logs, we analyze what users are looking for, why they come to IA, where they come from, and how pages link to IA. We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web. About 65% of the requested archived pages no longer exist on the live web. We find that more than 82% of human sessions connect to the Wayback Machine via referrals from other web sites, while only 15% of robots have referrers. Most of the links (86%) from websites are to individual archived pages at specific points in time, and of those 83% no longer exist on the live web.
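
The referrer statistics above come from analyzing web access logs. The Python sketch below illustrates that kind of tally (it is not the authors' pipeline): it counts requests with and without a referrer, split by a crude robot-vs-human heuristic on the User-Agent string; the combined log format and the file name are assumptions.

```python
# Illustrative log tally, not the authors' analysis code.
import re
from collections import Counter

# Apache/nginx "combined" format ends with: status bytes "referer" "user-agent"
LOG_RE = re.compile(r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def tally(lines):
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        kind = "robot" if any(h in m["agent"].lower() for h in ROBOT_HINTS) else "human"
        counts[(kind, m["referer"] not in ("", "-"))] += 1
    return counts


if __name__ == "__main__":
    with open("wayback-access.log") as fh:  # hypothetical log file
        counts = tally(fh)
    for kind in ("human", "robot"):
        total = counts[(kind, True)] + counts[(kind, False)]
        if total:
            share = 100 * counts[(kind, True)] / total
            print(f"{kind}: {share:.1f}% of requests arrived with a referrer")
```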

"Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data"

The CODATA-ICSTI Task Group on Data Citation Standards and Practices has published "Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data" (edited by Yvonne M. Socha) in the Data Science Journal.

Here's an excerpt:

The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as other functions such as attribution of credit and establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters? The traditions and good practices for maintaining the scholarly record by proper references to a work are well established and understood in regard to journal articles and other literature, but attributing credit by bibliographic references to data is not yet so broadly implemented. This report discusses the current state of data citation practices, its supporting infrastructure, a set of guiding principles for implementing data citation, challenges to implementation of good data citation practices, and open research questions.

Special Issue on Research Data Access and Preservation

The latest issue of the Bulletin of the Association for Information Science and Technology focuses on research data access and preservation.

Here's a selection of articles:

  • "Partnerships Between Institutional Repositories, Domain Repositories and Publishers"
  • "The Relevance of Research Data Sharing and Reuse Studies"
  • "Tracking Citations and Altmetrics for Research Data: Challenges and Opportunities"
  • "The Research Data Alliance: Implementing the Technology, Practice and Connections of a Data Infrastructure"
  • "The DCC's Institutional Engagements: Raising Research Data Management Capacity in UK Higher Education"

Preserving Computer-Aided Design (CAD)

The Digital Preservation Coalition has released Preserving Computer-Aided Design (CAD).

Here's an excerpt:

Computer-Aided Design (CAD) systems are used in both industry and academia to create digital models, whether of engineering designs, archaeological dig sites, or virtual worlds. These models can be of long-lasting significance and importance, particularly if they contain irreplaceable data or relate to long-lived products. This report is primarily aimed at those responsible for archives and repositories with CAD content, but may also be useful for creators of CAD content who want to make their models more amenable to preservation. It begins with an introduction to the historical development and basic concepts of CAD systems, then reviews the most pertinent issues associated with preserving CAD models, and indicates the current state of standardization work in the area. The report goes on to present some recent research of relevance to preserving CAD models before drawing conclusions and making recommendations on how archives should handle the CAD models they accept.

"An Evaluation of Caching Policies for Memento TimeMaps"

Justin F. Brunelle and Michael L. Nelson have self-archived "An Evaluation of Caching Policies for Memento TimeMaps" in arXiv.org.

Here's an excerpt:

As defined by the Memento Framework, TimeMaps are machine-readable lists of time-specific copies—called "mementos"—of an archived original resource. In theory, as an archive acquires additional mementos over time, a TimeMap should be monotonically increasing. However, there are reasons why the number of mementos in a TimeMap would decrease, for example: archival redaction of some or all of the mementos, archival restructuring, and transient errors on the part of one or more archives. We study TimeMaps for 4,000 original resources over a three month period, note their change patterns, and develop a caching algorithm for TimeMaps suitable for a reverse proxy in front of a Memento aggregator. We show that TimeMap cardinality is constant or monotonically increasing for 80.2% of all TimeMap downloads observed in the observation period. The goal of the caching algorithm is to exploit the ideally monotonically increasing nature of TimeMaps and not cache responses with fewer mementos than the already cached TimeMap. This new caching algorithm uses conditional cache replacement and a Time To Live (TTL) value to ensure the user has access to the most complete TimeMap available. Based on our empirical data, a TTL of 15 days will minimize the number of mementos missed by users, and minimize the load on archives contributing to TimeMaps.
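
The policy is easy to express in code. The sketch below is a minimal Python rendering of conditional cache replacement with a 15-day TTL, written from the description above rather than taken from the authors' implementation; the class and method names are assumptions.

```python
# Minimal sketch of the conditional cache-replacement policy described above.
import time

TTL_SECONDS = 15 * 24 * 60 * 60  # the paper's empirically suggested 15-day TTL


class TimeMapCache:
    """Cache of TimeMaps keyed by original URI; a new TimeMap only replaces
    the cached one if the cached copy is missing, expired, or no larger."""

    def __init__(self, ttl=TTL_SECONDS):
        self.ttl = ttl
        self._store = {}  # original URI -> (list of mementos, fetch time)

    def get(self, uri):
        entry = self._store.get(uri)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None  # absent or expired

    def put(self, uri, mementos):
        entry = self._store.get(uri)
        expired = entry is None or time.time() - entry[1] >= self.ttl
        if expired or len(mementos) >= len(entry[0]):
            self._store[uri] = (list(mementos), time.time())
            return True
        return False  # keep the larger, presumably more complete, cached TimeMap
```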

Presentations from Open Repositories 2013

Presentations from Open Repositories 2013 are now available.

Here's a brief selection of talks:

Helping to Open Up: Improving Knowledge, Capability and Confidence in Making Research Data More Open

The Research Information and Digital Literacies Coalition has released Helping to Open Up: Improving Knowledge, Capability and Confidence in Making Research Data More Open.

Here's an excerpt from the announcement:

The report describes a framework for how to address this challenge when designing training and support for opening data, within the broader questions of RDM. Recommendations are set out, relating to:

  • putting opening data at the heart of policy
  • putting opening data at the heart of training
  • deepening and broadening the training
  • identifying and disseminating best practice in opening data
  • developing institutional and community support

"Crafting Linked Open Data for Cultural Heritage: Mapping and Curation Tools for the Linked Jazz Project"

M. Cristina Pattuelli, Matt Miller, Leanora Lange, Sean Fitzell, and Carolyn Li-Madeo have published "Crafting Linked Open Data for Cultural Heritage: Mapping and Curation Tools for the Linked Jazz Project" in the latest issue of Code4Lib Journal.

Here's an excerpt:

This paper describes tools and methods developed as part of Linked Jazz, a project that uses Linked Open Data (LOD) to reveal personal and professional relationships among jazz musicians based on interviews from jazz archives. The overarching aim of Linked Jazz is to explore the possibilities offered by LOD to enhance the visibility of cultural heritage materials and enrich the semantics that describe them. While the full Linked Jazz dataset is still under development, this paper presents two applications that have laid the foundation for the creation of this dataset: the Mapping and Curator Tool, and the Transcript Analyzer. These applications have served primarily for data preparation, analysis, and curation and are representative of the types of tools and methods needed to craft linked data from digital content available on the web. This paper discusses these two domain-agnostic tools developed to create LOD from digital textual documents and offers insight into the process behind the creation of LOD in general.
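
To make the linked-data output concrete, here is a small rdflib sketch that records a single musician-to-musician relationship as RDF triples; the example URIs and the use of foaf:knows are assumptions for illustration, not the Linked Jazz project's actual vocabulary or data.

```python
# Illustrative only: one Linked Jazz-style relationship expressed as RDF.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
g.bind("foaf", FOAF)

# Hypothetical resource URIs for two musicians named in an interview transcript.
mingus = URIRef("http://example.org/person/charles_mingus")
ellington = URIRef("http://example.org/person/duke_ellington")

g.add((mingus, RDF.type, FOAF.Person))
g.add((mingus, FOAF.name, Literal("Charles Mingus")))
g.add((ellington, RDF.type, FOAF.Person))
g.add((ellington, FOAF.name, Literal("Duke Ellington")))
g.add((mingus, FOAF.knows, ellington))  # relationship surfaced from the transcript

print(g.serialize(format="turtle"))
```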

"Batch Metadata Assignment to Archival Photograph Collections Using Facial Recognition Software"

Kyle Banerjee and Maija Anderson have published "Batch Metadata Assignment to Archival Photograph Collections Using Facial Recognition Software" in the latest issue of Code4Lib Journal.

Here's an excerpt:

Useful metadata is essential to giving individual images meaning and value within the context of a greater image collection as well as making them more discoverable. However, often little information is available about the photos themselves, so adding consistent metadata to large collections of digital and digitized photographs is a time-consuming process requiring highly experienced staff.

By using facial recognition software, staff can identify individuals more quickly and reliably. Knowledge of individuals in photos helps staff determine when and where photos are taken and also improves understanding of the subject matter.

This article demonstrates simple techniques for using facial recognition software and command line tools to assign, modify, and read metadata for large archival photograph collections.
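
As a concrete illustration of the batch-assignment step (not necessarily the authors' exact toolchain), the Python sketch below takes a hypothetical mapping of images to recognized names and writes each name into the file's embedded metadata by calling the exiftool command-line utility.

```python
# Illustrative batch step: write recognized names into image metadata
# with exiftool. The mapping below is hypothetical example data.
import subprocess
from pathlib import Path

identifications = {
    "box12/folder03/img_0041.tif": ["Jane Example", "John Example"],
    "box12/folder03/img_0042.tif": ["Jane Example"],
}

for image, names in identifications.items():
    if not Path(image).exists():
        continue  # skip files that are not present locally
    cmd = ["exiftool", "-overwrite_original"]
    cmd += [f"-Keywords+={name}" for name in names]  # append each name as an IPTC keyword
    cmd.append(image)
    subprocess.run(cmd, check=True)
```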

"Using Data Curation Profiles to Design the Datastar Dataset Registry"

Sarah J. Wright, Wendy A. Kozlowski, Dianne Dietrich, Huda J. Khan, and Gail S. Steinhart have published "Using Data Curation Profiles to Design the Datastar Dataset Registry" in the latest issue of D-Lib Magazine.

Here's an excerpt:

The development of research data services in academic libraries is a topic of concern to many. Cornell University Library's efforts in this area include the Datastar research data registry project. In order to ensure that Datastar development decisions were driven by real user needs, we interviewed researchers and created Data Curation Profiles (DCPs). Researchers supported providing public descriptions of their datasets; attitudes toward dataset citation, provenance, versioning, and domain specific standards for metadata also helped to guide development. These findings, as well as considerations for the use of this particular method for developing research data services in libraries are discussed in detail.

"Foundations of Data Curation: The Pedagogy and Practice of "Purposeful Work" with Research Data"

Carole L. Palmer, Nicholas M. Weber, Trevor Muñoz, and Allen H. Renear have published "Foundations of Data Curation: The Pedagogy and Practice of "Purposeful Work" with Research Data" in the latest issue of Archive Journal.

Here's an excerpt:

Increased interest in large-scale, publicly accessible data collections has made data curation critical to the management, preservation, and improvement of research data in the social and natural sciences, as well as the humanities. This paper explicates an approach to data curation education that integrates traditional notions of curation with principles and expertise from library, archival, and computer science. We begin by tracing the emergence of data curation as both a concept and a field of practice related to, but distinct from, both digital curation and data stewardship. This historical account, while far from definitive, considers perspectives from both the sciences and the humanities. Alongside traditional LIS and archival science practices, unique aspects of curation have informed our concept of "purposeful work" with data and, in turn, our pedagogical approach to data curation for the sciences and the humanities.

Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house

OCLC Research has released Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house.

Here's an excerpt from the announcement:

The third report, Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house, collects the assembled wisdom of experienced practitioners to help those with less experience make appropriate choices in gaining control of born-digital content. It contains discrete steps with objectives, links to available tools and software, references and resources for further research and paths to engagement with the digital archives community.

Johns Hopkins University Offers Digital Curation Certificate Program

Johns Hopkins University has established a Digital Curation Certificate program.

Here's an excerpt from the announcement:

The Johns Hopkins University Certificate in Digital Curation, offered through the online graduate program in Museum Studies, advances the education and training of museum professionals worldwide in this emerging field.

This certificate offers a specialized curriculum that is critically needed in the museum field. It will prepare current and aspiring museum professionals to manage the growing volume and variety of digital data of long-term value that museums are now producing, acquiring, storing and sharing with researchers, educators and the public. It will train students to work with digital collections, exhibitions, and research data that will ensure the longevity of our global cultural heritage of which museums are the stewards.

"The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary"

Matthew Kirschenbaum has published "The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary" in a preview issue of Digital Humanities Quarterly.

Here's an excerpt:

Here then are some specifics I have considered as to how digital humanities might usefully collaborate with those archivists even now working on born-digital collections:

  • Digital archivists need digital humanities researchers and subject experts to use born-digital collections. Nothing is more important. If humanities researchers don't demand access to born-digital materials then it will be harder to get those materials processed in a timely fashion, and we know that with the born-digital every day counts.
  • Digital humanists need the long-term perspective on data that archivists have. Today's digital humanities projects are, after all, the repository objects of tomorrow's born-digital archives. Funders are increasingly (and rightfully) insistent about the need to have a robust data management and sustainability plan built into project proposals from the outset. Therefore, there is much opportunity for collaboration and team-building around not only archiving and preservation, but the complete data curation cycle. This extends to the need to jointly plan around storage and institutional infrastructure.
  • Digital archivists and digital humanists need common and interoperable digital tools. Open source community-driven development at the intersection of the needs of digital archivists, humanities scholars, and even collections' donors should become an urgent priority.

"Unintended Consequences: New Materialist Perspectives on Library Technologies and the Digital Record"

portal: Libraries and the Academy has released an e-print of "Unintended Consequences: New Materialist Perspectives on Library Technologies and the Digital Record" by Marlene Manoff.

Here's an excerpt:

Digital technology has irrevocably altered the nature of the archive. Drawing on materialist critiques and the evolving field of media archaeology, this essay explores new strategies for understanding the implications of computer networks in libraries. Although a significant portion of the contemporary literature within Library and Information Science (LIS) addresses issues of technological change, the materialist and multidisciplinary approaches proposed here provide a theoretical basis for investigating the current state of library technologies in new ways. These methods provide insight into the proliferation of digital products and the cycles of platform adoption and replacement that have marked the past decades of library development. They also help to reframe questions about content aggregation and the licensing of digital scholarship.

"Data Management in Scholarly Journals and Possible Roles for Libraries—Some Insights from EDaWaX"

Sven Vlaeminck has published "Data Management in Scholarly Journals and Possible Roles for Libraries—Some Insights from EDaWaX" in the latest issue of LIBER Quarterly.

Here's an excerpt:

In this paper we summarize the findings of an empirical study conducted by the EDaWaX-Project. 141 economics journals were examined regarding the quality and extent of data availability policies that should support replications of published empirical results in economics. This paper suggests criteria for such policies that aim to facilitate replications. These criteria were also used for analysing the data availability policies we found in our sample and to identify best practices for data policies of scholarly journals in economics. In addition, we also evaluated the journals' data archives and checked the percentage of articles associated with research data. To conclude, an appraisal as to how scientific libraries might support the linkage of publications to underlying research data in cooperation with researchers, editors, publishers and data centres is presented.
