Curating for Quality: Ensuring Data Quality to Enable New Science

The UNC School of Information & Library Science has released Curating for Quality: Ensuring Data Quality to Enable New Science.

Here's an excerpt:

The National Science Foundation sponsored a workshop on September 10 and 11, 2012, in Arlington, Virginia on "Curating for Quality: Ensuring Data Quality to Enable New Science." Individuals from government, academic and industry settings gathered to discuss issues, strategies and priorities for ensuring quality in collections of data. This workshop aimed to define data quality research issues and potential solutions. The workshop objectives were organized into four clusters: (1) data quality criteria and contexts, (2) human and institutional factors, (3) tools for effective and painless curation, and (4) metrics for data quality. . . .

The workshop identified several key challenges that include:

  • selection strategies—how to determine what is most valuable to preserve
  • how much and which context to include—how to ensure that data is interpretable and usable in the future, what metadata to include
  • tools and techniques to support painless curation—creating and sharing tools and techniques that apply across disciplines
  • cost and accountability models—how to balance selection, context decisions with cost constraints.

| Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works | Digital Scholarship |

SURA Research Data Management Group Releases "A Step-By-Step Guide to Data Management"

The SURA Research Data Management Group has released "A Step-By-Step Guide to Data Management."

Here's an excerpt from the press release:

SURA has launched an institutional tool for Research Data Management (RDM), developed by a working group formed with the Association of Southeastern Research Libraries (ASERL). The working group brings together CIOs and library professionals from SURA member institutions to explore collaborations for improving their ability to manage the rapidly growing volume of research data.

The working group produced an institutional "Step-By-Step Guide to Data Management," which is being used to identify gaps in existing RDM processes and guide future efforts of the group. The group has also built a discipline specific metadata scheme directory to assist researchers in finding existing metadata models for their research data.

| Digital Curation Resource Guide | Digital Scholarship |

Thomson Reuters Launches Data Citation Index

Thomson Reuters has launched the Data Citation Index within the Web of Knowledge.

Here's an excerpt from the press release:

This new research resource from Thomson Reuters creates a single source of discovery for scientific, social sciences and arts and humanities information. It provides a single access point to discover foundational research within data repositories around the world in the broader context of peer-reviewed literature in journals, books, and conference proceedings already indexed in the Web of Knowledge. . . .

The Thomson Reuters Data Citation Index makes research within the digital universe discoverable, citable and viewable within the context of the output the data has informed. Thomson Reuters partnered with numerous data repositories worldwide to capture bibliographic records and cited references for digital research, facilitating visibility, author attribution, and ultimately the measurement of impact of this growing body of scholarship.

"Public Availability of Published Research Data in High-Impact Journals"

Alawi A. Alsheikh-Ali et al. have published "Public Availability of Published Research Data in High-Impact Journals" in PLOS ONE.

Here's an excerpt:

We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.
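The percentages in this excerpt can be reproduced from the counts it reports. The short check below is only a sketch: the variable names are illustrative, and rounding to the nearest whole percent is an assumption about how the authors presented their figures.

```python
# Counts exactly as reported in the excerpt above.
journals_total = 50
journals_with_statement = 44
papers_total = 500
papers_no_policy = 149
papers_not_adhering = 208
papers_full_raw_data = 47

# Derived counts: papers covered by some policy, and those that adhered to it.
papers_with_policy = papers_total - papers_no_policy
papers_adhering = papers_with_policy - papers_not_adhering


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer (assumed rounding)."""
    return round(100 * part / whole)


summary = {
    "journals_with_statement_pct": pct(journals_with_statement, journals_total),
    "papers_no_policy_pct": pct(papers_no_policy, papers_total),
    "papers_with_policy": papers_with_policy,
    "papers_not_adhering_pct": pct(papers_not_adhering, papers_with_policy),
    "papers_adhering": papers_adhering,
    "papers_full_raw_data_pct": pct(papers_full_raw_data, papers_total),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the 59% figure is computed against the 351 papers covered by a policy, while the 9% figure is computed against all 500 papers.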

Intellectual Property Rights for Digital Preservation

The Digital Preservation Coalition has released Intellectual Property Rights for Digital Preservation.

Here's an excerpt:

While a number of legal issues colour contemporary approaches to, and practices of, digital preservation, it is arguable that intellectual property law, represented principally by copyright and its related rights, has been by far the most dominant, and often intractable, influence. It is thus essential for those engaging in digital preservation to understand the letter of the law as it applies to digital preservation, but equally important to be able to identify and implement practical and pragmatic strategies for handling legal risks relating to intellectual property rights in the pursuit of preservation objectives. . . .

This report is aimed primarily at depositors, archivists and researchers/re-users of digital works, but will provide a concise introduction to the subject matter for policymakers and the general public.

"A Sample of Research Data Curation and Management Courses"

Andrew T. Creamer et al. have published "A Sample of Research Data Curation and Management Courses" in the latest issue of the Journal of eScience Librarianship.

Here's an excerpt:

This paper identifies a sample of research data curation and management courses available at American Library Association-accredited Library and Information Science (LIS) Programs in North America. . . .

Only 13 (22%) of LIS programs currently offer a course focused on the management and curation of research data. . . .

Although the literature supports LIS professionals adopting new roles and engaging in eScience and data management, most LIS data-related programs do not have a separate course solely focused on research data management. More LIS programs will need to adapt their curricula in order to help students and practicing professionals develop the needed competencies in research data curation and management.

California Digital Library and Partners Launch DataUp Data Management Tool

The California Digital Library and its partners have launched the DataUp data management tool.

Here's an excerpt from the press release:

Researchers struggling to meet new data management requirements from funders, journals and their own institutions now can use the DataUp Web application and a Microsoft Excel add-in to document and archive their tabular data. . . .

The DataUp add-in operates within a program many researchers already use: Microsoft Excel. The Web application allows users to upload tabular data in either Excel format or comma-separated value (CSV) format. Both the add-in and the Web application allow users to:

  • Perform a "best practices check" to ensure data are well-formatted and organized
  • Create standardized metadata, or a description of the data, using a wizard-style template
  • Retrieve a unique identifier for their dataset from their data repository
  • Post their datasets and associated metadata to the repository.

Although hundreds of data repositories are available for archiving, many scientific researchers either are unaware of their existence or do not know how to access them. One of the major outcomes of the DataUp project is the ONEShare repository, created specifically for DataUp, where users can deposit tabular data and metadata directly from the tool.

An added advantage of ONEShare is its connection to the DataONE network of repositories. DataONE links existing data centers and enables users to search for data across participating repositories by using a single search interface. Data deposited into ONEShare will be indexed and made available by any DataONE user, facilitating collaboration and enabling data re-use.
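The press release does not say what DataUp's "best practices check" actually tests. As a purely hypothetical illustration of the kind of check a tool could run on tabular CSV data, the sketch below flags blank or duplicate column names, rows with the wrong number of cells, and empty cells. The function name `check_tabular_data` and the rules themselves are assumptions, not DataUp's implementation.

```python
import csv
import io


def check_tabular_data(csv_text):
    """Return a list of human-readable problems found in CSV text.

    Hypothetical rules for illustration only; not taken from DataUp.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    header, body = rows[0], rows[1:]
    names = [name.strip() for name in header]
    problems = []

    if any(not name for name in names):
        problems.append("blank column name in header")
    if len(set(names)) != len(names):
        problems.append("duplicate column names in header")

    # Line numbers assume one record per line (no embedded newlines in cells).
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
        elif any(not cell.strip() for cell in row):
            problems.append(f"line {line_no}: empty cell")

    return problems


sample = "site,date,temp_c\nA,2012-09-10,21.5\nB,2012-09-11\nC,,19.0\n"
for problem in check_tabular_data(sample):
    print(problem)
```

For the sample data, the check reports a short row on line 3 and an empty cell on line 4.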

| Research Data Curation Bibliography | Digital Scholarship |

"Academic Libraries as Data Quality Hubs"

Michael Joseph Giarlo has self-archived a preprint of "Academic Libraries as Data Quality Hubs" in ScholarSphere.

Here's an excerpt:

This position paper argues that academic libraries have a critical role to play serving as data quality hubs on campus, based on the need for increased data quality for "e-science" and on academic libraries' record of providing digital curation and preservation services. Scientific data are shown to be sufficiently at risk to demonstrate a clear niche for such services to be provided. Data quality measurements are defined, and digital curation processes are explained and mapped to these measurements in order to establish that academic libraries already have sufficient competencies "in-house" to provide data quality services. Opportunities for improvement and challenges are identified as areas that are fruitful for future research and exploration.

"The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation"

Matthew S. Mayernik, G. Sayeed Choudhury, Tim DiLauro, Elliot Metsger, Barbara Pralle, Mike Rippin, and Ruth Duerr have published "The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation" in the latest issue of D-LIB Magazine.

Here's an excerpt:

Digital research data can only be managed and preserved over time through a sustained institutional commitment. Research data curation is a multi-faceted issue, requiring technologies, organizational structures, and human knowledge and skills to come together in complementary ways. This article provides a high-level description of the Data Conservancy Instance, an implementation of infrastructure and organizational services for data collection, storage, preservation, archiving, curation, and sharing. While comparable to institutional repository systems and disciplinary data repositories in some aspects, the DC Instance is distinguished by featuring a data-centric architecture, discipline-agnostic data model, and a data feature extraction framework that facilitates data integration and cross-disciplinary queries. The Data Conservancy Instance is intended to support, and be supported by, a skilled data curation staff, and to facilitate technical, financial, and human sustainability of organizational data curation services. The Johns Hopkins University Data Management Services (JHU DMS) are described as an example of how the Data Conservancy Instance can be deployed.

Middleware and Managing Data and Knowledge in a Data-Rich World

The Trans-European Research and Education Networking Association has released Middleware and Managing Data and Knowledge in a Data-Rich World.

Here's an excerpt:

This report explores the important aspects of data handling and storage in the context of future research networks and the associated services. The study encompasses networking requirements, storage, middleware, data policies, and data origin, each of which is considered from the standpoint of five disciplines: Genomics, High Energy Physics, Digital Cultural Heritage, Radio Astronomy, and Distributed Music Performance.

Key Digital Preservation Standard Updated: Open Archival Information System (OAIS)

ISO has published ISO 14721:2012: Space Data and Information Transfer Systems—Open Archival Information System (OAIS)—Reference Model. A PDF version with marked changes is available from the Consultative Committee for Space Data Systems.

Here's an excerpt:

This reference model:

  • provides a framework for the understanding and increased awareness of archival concepts needed for Long Term digital information preservation and access;
  • provides the concepts needed by non-archival organizations to be effective participants in the preservation process;
  • provides a framework, including terminology and concepts, for describing and comparing architectures and operations of existing and future Archives;
  • provides a framework for describing and comparing different Long Term Preservation strategies and techniques;
  • provides a basis for comparing the data models of digital information preserved by Archives and for discussing how data models and the underlying information may change over time;
  • provides a framework that may be expanded by other efforts to cover Long Term Preservation of information that is NOT in digital form (e.g., physical media and physical samples);

"Ten Recommendations for Libraries to Get Started with Research Data Management"

The LIBER working group on E-Science has released "Ten Recommendations for Libraries to Get Started with Research Data Management."

Here's an excerpt:

LIBER installed the 'E-Science working group' in 2010 to investigate the role libraries can and should play in the field of E-Science. The group decided to focus on research data as it was felt to be the most urgent element of e-science that is of relevance to the community of (research) libraries. The group has held three workshops, the first during the LIBER conference 2011 in Barcelona, the second during the IDCC 2011 conference in Bristol and the third and last one during the LIBER conference 2012 in Tartu. The results of the first two workshops were used as a basis for compiling recommendations to the LIBER community. The "10 recommendations for libraries to support research data management" (see side bar) were finalized and prioritized during the final workshop at the LIBER-conference in Tartu.

"Researchers’ Attitudes towards Data Discovery: Implications for a UCLA Data Registry"

Rachel A. Mandell has self-archived her thesis, "Researchers' Attitudes towards Data Discovery: Implications for a UCLA Data Registry," in SSRN.

Here's an excerpt:

The UCLA Data Registry is a tool designed to serve the greater UCLA research community by collecting and making available surrogate records of research datasets. To figure out how to build this system in accordance with the needs of the community, a total of 20 researchers from disparate disciplines were interviewed about their data and metadata practices. The results indicate that researchers' attitudes and behaviors towards making their work discoverable depend on their concept and definition of data. Given that the UCLA Library will build the UCLA Data Registry, it is important to consider the other possible tools that researchers could use in conjunction with the registry to enhance the discoverability of their data. The Data Registry will be built utilizing a basic metadata schema rather than very specific descriptive fields. The interviews also demonstrated that the culture of publishing and venues for data dissemination are shifting away from the traditional journal article publication, especially in emerging areas such as the digital humanities.

Best Practices for Citability of Data and Evolving Roles in Scholarly Communication

Opportunities for Data Exchange has released Best Practices for Citability of Data and Evolving Roles in Scholarly Communication.

Here's an excerpt:

This report sets out the current thinking on data citation best practice and presents the results of a survey of librarians asking how new support roles could and should be developed. The findings presented here build on the extensive desk research carried out for the report "Integration of Data and Publication" (Reilly, Schallier, Schrimpf, Smit, & Wilkinson, Sept 2011), which identified that data citation was an area of opportunity for both researchers and libraries. That report also recounted the findings of a workshop held at the LIBER 2011 Conference in Barcelona. . . . This previous work is supported here with further information gathered through extensive desk research, structured interviews and an online survey of LIBER members to explore best practice in data citation and evolving support roles for libraries.

Sharing Research Data: Compilation of Results on Drivers and Barriers and New Opportunities

Opportunities for Data Exchange has released Compilation of Results on Drivers and Barriers and New Opportunities.

Here's an excerpt:

Opportunities for Data Exchange (ODE) is a FP7 Project carried out by members of the Alliance for Permanent Access (APA), which is gathering evidence to support strategic investment in the emerging e-Infrastructure for data sharing, re-use and preservation. The ODE Conceptual Model has been developed within the Project to characterise the process of data sharing and the factors which give rise to variations in data sharing for different parties involved. Within the overall Conceptual Model there can be identified models of process, of context, and of drivers, barriers and enablers. The Conceptual Model has been evolved on the basis of existing knowledge and expertise, and draws on research conducted both outside of the ODE Project and in earlier stages of the Project itself (Sections 1-2).

"De-Mystifying the Data Management Requirements of Research Funders"

Dianne Dietrich, Trisha Adamus, Alison Miner, and Gail Steinhart have published "De-Mystifying the Data Management Requirements of Research Funders" in the latest issue of Issues in Science and Technology Librarianship.

Here's an excerpt:

Research libraries have sought to apply their information management expertise to the management of digital research data. This focus has been spurred in part by the policies of two major funding agencies in the United States, which require grant recipients make research outputs, including publications and research data, openly available. As many academic libraries are beginning to offer or are already offering assistance in writing and implementing data management plans, it is important to consider how best to support researchers. Our research examined the current data management requirements of major US funding agencies to better understand data management requirements facing researchers and the implications for libraries offering data management services for researchers.

Syracuse University’s School of Information Studies Offers Certificate of Advanced Study in Data Science

Syracuse University's School of Information Studies is offering a Certificate of Advanced Study in Data Science.

Here's an excerpt from the program web page:

Data scientists collect, organize, store, analyze and share big data. In other words, they know where data lives and can find it. They keep it in an accessible format ready for query. They look at data and see patterns and trends. Most importantly, they share what they find with partners, collaborators and, in many cases, the world.

The iSchool is helping lead the dialogue in defining data science within the academic community and within industry. In doing so, students in this CAS program have the rare opportunity to place their fingerprint on the first wave of standards. This will help institutions and affiliates clarify the murky definitions of data science as it infiltrates public consciousness over the next five to ten years. Professionals with this CAS are particularly poised to lead this field. Our students gain hard, technical skills but also possess the soft, theoretical skills that organizations desperately need.

Digital Curation Resource Guide

Digital Scholarship has released the Digital Curation Resource Guide.

This resource guide presents over 200 selected English-language websites and documents that are useful in understanding and conducting digital curation. It covers academic programs, discussion lists and groups, glossaries, file formats and guidelines, metadata standards and vocabularies, models, organizations, policies, research data management, serials and blogs, services and vendor software, software and tools, and training. It is available under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

The Digital Curation Resource Guide complements the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which was released in June.

It is also available as an EPUB file (see How to Read EPUB Files).

The Future of Big Data

The Pew Research Center's Internet & American Life Project has released The Future of Big Data.

Here's an excerpt:

Imagine where we might be in 2020. The Pew Research Center's Internet & American Life Project and Elon University's Imagining the Internet Center asked digital stakeholders to weigh two scenarios for 2020, select the one most likely to evolve, and elaborate on the choice. One sketched out a relatively positive future where Big Data are drawn together in ways that will improve social, political, and economic intelligence. The other expressed the view that Big Data could cause more problems than it solves between now and 2020.

Respondents to our query rendered a decidedly split verdict.

The Journal of Heredity Joins Growing Number of Journals Mandating Data Archiving

The American Genetic Association has mandated the Joint Data Archiving Policy for the Journal of Heredity. The Joint Data Archiving Policy (JDAP) page lists other journals that mandate data archiving.

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

Managing Research Data in Big Science

Norman Gray, Tobia Carozzi, and Graham Woan have self-archived Managing Research Data in Big Science in arXiv.org.

Here's an excerpt:

The project which led to this report was funded by JISC in 2010-2011 as part of its 'Managing Research Data' programme, to examine the way in which Big Science data is managed, and produce any recommendations which may be appropriate. . . .

This project has explored these differences using as a case-study Gravitational Wave data generated by the LSC [LIGO Scientific Collaboration], and has produced recommendations intended to be useful variously to JISC, the funding council (STFC) and the LSC community.

In Sect. 1 we define what we mean by 'big science', describe the overall data culture there, laying stress on how it necessarily or contingently differs from other disciplines.

In Sect. 2 we discuss the benefits of a formal data-preservation strategy, and the cases for open data and for well-preserved data that follow from that. . . .

In Sect. 3 we briefly discuss the LIGO data management plan, and pull together whatever information is available on the estimation of digital preservation costs.

After UK’s RCUK Policy, European Commission Announces Another Major Open Access Policy

Yesterday DigitalKoans reported on the Research Councils UK's new open access policy. Today, the European Commission has announced another major open access policy.

Here's an excerpt from the press release:

The European Commission today outlined measures to improve access to scientific information produced in Europe. Broader and more rapid access to scientific papers and data will make it easier for researchers and businesses to build on the findings of public-funded research. This will boost Europe's innovation capacity and give citizens quicker access to the benefits of scientific discoveries. In this way, it will give Europe a better return on its €87 billion annual investment in R&D. The measures complement the Commission's Communication to achieve a European Research Area (ERA), also adopted today.

As a first step, the Commission will make open access to scientific publications a general principle of Horizon 2020, the EU's Research & Innovation funding programme for 2014-2020. As of 2014, all articles produced with funding from Horizon 2020 will have to be accessible:

  • articles will either immediately be made accessible online by the publisher ('Gold' open access)—up-front publication costs can be eligible for reimbursement by the European Commission; or
  • researchers will make their articles available through an open access repository no later than six months (12 months for articles in the fields of social sciences and humanities) after publication ('Green' open access).

The Commission has also recommended that Member States take a similar approach to the results of research funded under their own domestic programmes. The goal is for 60% of European publicly-funded research articles to be available under open access by 2016.

The Commission will also start experimenting with open access to the data collected during publicly funded research (e.g. the numerical results of experiments), taking into account legitimate concerns related to the fundee's commercial interests or to privacy.
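The 'Green' open access rule above amounts to a simple date calculation: six months after publication, or twelve for the social sciences and humanities. The sketch below shows one way to compute that latest deposit date. The function names, the `"ssh"` field key, and the end-of-month clamping are assumptions made for illustration; the press release does not define how months are counted.

```python
import calendar
from datetime import date

# Embargo lengths in months, as stated in the press release.
# The dictionary keys are hypothetical labels chosen for this sketch.
EMBARGO_MONTHS = {"default": 6, "ssh": 12}


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter, the day is clamped to its last day
    (an assumption; the policy text does not specify this).
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def green_oa_deadline(published, field="default"):
    """Latest repository deposit date under the 'Green' route."""
    return add_months(published, EMBARGO_MONTHS[field])


print(green_oa_deadline(date(2014, 3, 10)))
print(green_oa_deadline(date(2014, 1, 15), field="ssh"))
```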

| Transforming Scholarly Publishing through Open Access: A Bibliography | Digital Scholarship |