Presentations from Main Drivers on Successful Re-Use of Primary Research Data Workshop

Presentations from the Main Drivers on Successful Re-Use of Primary Research Data Workshop are now available.

Here's an excerpt from the press release:

On 23-24 September 2009 an international discussion workshop, prepared and organised by Knowledge Exchange, was held in Berlin. The main focus of the workshop was on the benefits, challenges and obstacles of re-using data from a researcher's perspective. The most important message from the wide variety of presentations was that successful re-use requires efforts directed not only at technical issues, but also at providing incentives for researchers, both to share and to re-use data. . . .

At the workshop the use cases presented by researchers from a variety of disciplines were supplemented by two keynotes and selected presentations by specialists from infrastructure institutions, publishers, and national and European funding bodies. Thanks to this broad approach it became clear that certain challenges and obstacles are comparable across disciplines and organisations. As a general recommendation, the participants agreed that it is time to cooperate in more ambitious international activities to establish reliable and sustainable support for initiatives in the field of data-related research infrastructure—as Prof. John Wood (Imperial College London) put it in his keynote: "You can no longer separate the data management issue from the planning and running of research infrastructures."

"Building a Sustainable Framework for Open Access to Research Data through Information and Communication Technologies"

Gideon Emcee Christian has self-archived "Building a Sustainable Framework for Open Access to Research Data through Information and Communication Technologies" in SSRN.

Here's an excerpt:

The growth in information and communication technology (ICT) has brought about an increased pace in information and knowledge exchange. This increased pace is being fueled in large part by the open exchange of information. The pressure for open access to research data is gaining momentum in virtually every field of human endeavour. Data is the lifeblood of science, and quite unsurprisingly data repositories are rapidly becoming an essential component of the infrastructure of the global science system. Improved access to data will transform the way research is conducted. It will create new opportunities and avenues for improved efficiency in dealing with social, economic and scientific challenges facing humanity.

Despite the admitted benefits of open access to research data, the concept is still weighed down by a series of factors, both legal and ethical, which must be resolved in order to derive the maximum benefits arising from open access to data. The resolution of these issues will require the development of a sustainable framework to facilitate access to and use of research data by researchers, academic institutions, private individuals and other users. This research paper examines the legal and ethical issues affecting open access to research data. It also examines various frameworks for enhancing open access to research data, including the open data contract, open content licenses, and open data commons.

Data Dimensions: Disciplinary Differences in Research Data Sharing, Reuse and Long Term Viability

The Digital Curation Centre has released Data Dimensions: Disciplinary Differences in Research Data Sharing, Reuse and Long Term Viability: A Comparative Review Based on Sixteen Case Studies.

Here's an excerpt:

This synthesis study, commissioned by the Digital Curation Centre from Key Perspectives Ltd, forms a major output from the DCC SCARP Project, which investigated attitudes and approaches to data deposit, sharing and reuse, curation and preservation, over a range of research fields in differing disciplines. The aim was to investigate research practitioners’ perspectives and practices in caring for their research data, and the methods and tools they use to that end. Objectives included identification and promotion of ‘good practice’ in the selected research domains, as expressed in DCC tools and resources. The approach combined case study methods with a survey of the literature relevant to digital curation in the selected fields. . . .

This synthesis report (which drew on the SCARP case studies plus a number of others, identified in the Appendix) identifies factors that help explain how curation practices in research groups differ in disciplinary terms. This provides a backdrop to different digital curation approaches. However, the case studies illustrate that "the discipline" is too broad a level at which to understand data curation practices or requirements. The diversity of data types, working methods, curation practices and content skills found even within specialised domains means that requirements should be defined at this or an even finer-grained level, such as the research group.

Research Data: Unseen Opportunities

The Canadian Association of Research Libraries has released Research Data: Unseen Opportunities.

Here's an excerpt from the press release:

The purpose of the toolkit is to enable research library directors to raise awareness of the issues of data management with administrators and researchers on campus.

Data are valuable assets that in some cases have an unlimited potential for reuse. The awareness toolkit underscores the need to ensure that research data are managed throughout the data lifecycle so that they are understandable and usable.

"This is a very timely document," says Marnie Swanson (University of Victoria), Chair of the CARL Data Management Sub-Committee. "More than ever, data are a critical component of the research endeavor and this toolkit will help libraries raise awareness in the scholarly community of the importance of data stewardship."

Research Data: Unseen Opportunities provides readers with a general understanding of the current state of research data in Canada and internationally. It is organized into eight sections: The Big Picture; Major Benefits of Data Management; Current Context; Case Studies; Gaps in Data Stewardship in Canada; Data Management Policies in Canada; Responses to Faculty/Administrative Concerns; What Can Be Done on Campus?

Data Preservation in High Energy Physics

The ICFA DPHEP International Study Group has self-archived Data Preservation in High Energy Physics in arXiv.org.

Here's an excerpt:

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. At the same time, HEP has no coherent strategy for data preservation and re-use. An inter-experimental Study Group on HEP data preservation and long-term analysis was convened at the end of 2008 and held two workshops, at DESY (January 2009) and SLAC (May 2009). This document is an intermediate report to the International Committee for Future Accelerators (ICFA) of the reflections of this Study Group.

NSF Awards $20 Million to DataONE (Observation Network for Earth) Project

The National Science Foundation has awarded a $20 million grant to the DataONE (Observation Network for Earth) Project, which reports to both the Office of the Vice President of Research and the University Libraries at the University of New Mexico. William Michener, professor and director of e-science initiatives at University Libraries, is directing the project.

Here's an excerpt from the press release:

Researchers at UNM have partnered with dozens of other universities and agencies to create DataONE, a global data access and preservation network for earth and environmental scientists that will support breakthroughs in environmental research.

DataONE is designed to provide universal access to data about life on Earth and the environment that sustains it. The underlying technologies will provide open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.

Expected users include scientists, educators, librarians, resource managers, and the public. By providing easy and open access to a broad range of science data, as well as tools for managing, analyzing, and visualizing data, DataONE will be transformative in the speed with which researchers will be able to assemble and analyze data sets and in the types of problems they will be able to address. . . .

DataONE is one of two $20 million awards made this year as part of the National Science Foundation's (NSF) DataNet program. The collaboration of universities and government agencies coalesced to address the mounting need for organizing and serving up vast amounts of highly diverse and inter-related but often-incompatible scientific data. Resulting studies will range from research that illuminates fundamental environmental processes to identifying environmental problems and potential solutions. . . .

The DataONE team will study how a vast digital data network can provide secure and permanent access into the future, and also encourage scientists to share their information. The team will help determine data citation standards, as well as create the tools for organizing, managing, and publishing data.

The resulting computing and processing "cyberinfrastructure" will be made permanently available for use by the broader national and international science communities. DataONE is led by the University of New Mexico, and includes additional partner organizations across the United States as well as from Europe, Africa, South America, Asia, and Australia.

This grant is important nationally, and especially locally for the UNM research community. University Libraries Dean Martha Bedard said, "The University Libraries are key partners in UNM research initiatives, and are excited and committed to supporting the emerging area of data curation, which this grant seeks to support in sophisticated ways."

DataONE will build a set of geographically distributed Coordinating Nodes that play an important role in facilitating all of the activities of the global network, as well as a network of Member Nodes that host relevant data and tools. The initial three Coordinating Nodes will be at the University of New Mexico, UC Santa Barbara (housed at the Davidson Library), and at the University of Tennessee/Oak Ridge National Laboratory. Member Nodes will be located in association with universities, libraries, research networks, and agencies worldwide.

ARL Releases E-Science Survey Preliminary Results and Resources

The Association of Research Libraries has released preliminary results and resources from an e-science survey of its members.

Here's an excerpt from the press release:

The Association of Research Libraries (ARL) E-Science Working Group surveyed ARL member libraries in the fall of 2009 to gather data on the state of engagement with e-science issues. An overview of initial survey findings was presented by E-Science Working Group Chair Wendy Lougee, University Librarian, McKnight Presidential Professor, University of Minnesota Libraries, at the October ARL Membership Meeting. Lougee's briefing explored contrasting approaches among research institutions, particularly in regard to data management. The briefing also summarized survey findings on topics such as library services, organizational structures, staffing patterns and staff development, and involvement in research grants, along with perspectives on pressure points for service development. To better explicate the findings, Lougee reviewed specific cases of activities at six research institutions. . . .

A full report of the survey findings is being prepared and will be published in 2010 by ARL through its Occasional Papers series.

Open Science at Web-Scale: Optimising Participation and Predictive Potential

JISC has released Open Science at Web-Scale: Optimising Participation and Predictive Potential.

Here's an excerpt:

This report has attempted to draw together and synthesise evidence and opinion from a wide range of sources. Examples of data-intensive science at extremes of scale and complexity, which enable forecasting and predictive assertions, have been described, together with compelling exemplars where an open and participative culture is transforming science practice. It is perhaps worth noting that the pace of change in this area is such that it has been a challenging piece to compose, and at best it can only serve as a subjective snapshot of a very dynamic data space. . . .

The perspective of openness as a continuum is helpful in positioning the range of behaviours and practices observed in different disciplines and contexts. By separating the twin aspects of openness (access and participation), we can begin to understand the full scope and potential of the open science vision. Whilst a listing of the perceived values and benefits of open science is given, further work is required to provide substantive and tangible evidence to justify and support these assertions. Available evidence suggests that transparent data sharing and data re-use are far from commonplace. The peer production approaches to data curation which have been described are really in their infancy, but offer considerable promise as scalable models which could be migrated to other disciplines. The more radical open notebook science methodologies are currently on the "fringe" and it is not clear whether uptake and adoption will grow in other disciplines and contexts.

Duke, NC State, and UNC Data Sharing Cloud Computing Project Launched

Duke University, North Carolina State University, and the University of North Carolina at Chapel Hill have launched a two-year project to share digital data.

Here's an excerpt from the press release:

An initiative that will determine how Triangle area universities access, manage, and share ever-growing stores of digital data launched this fall with funding from the Triangle Universities Center for Advanced Studies, Inc. (TUCASI).

The two-year TUCASI data-Infrastructure Project (TIP) will deploy a federated data cyberinfrastructure—or data cloud—that will manage and store digital data for Duke University, NC State University, UNC Chapel Hill, and the Renaissance Computing Institute (RENCI) and allow the campuses to more seamlessly share data with each other, with national research projects, and with private sector partners in Research Triangle Park and beyond.

RENCI and the Data Intensive Cyber Environments (DICE) Center at UNC Chapel Hill manage the $2.7 million TIP. The provosts, heads of libraries and chief information officers at the three campuses signed off on the project just before the start of the fall semester.

"The TIP focuses on federation, sharing and reuse of information across departments and campuses without having to worry about where the data is physically stored or what kind of computer hardware or software is used to access it," said Richard Marciano, TIP project director, and also professor at UNC's School of Information and Library Science (SILS), executive director of the DICE Center, and a chief scientist at RENCI. "Creating infrastructure to support future Triangle collaboratives will be very powerful."

The TIP includes three components—classroom capture, storage, and future data and policy—which will be implemented in three phases. In phase one, each campus and RENCI will upgrade their storage capabilities, and a platform-independent system for capturing and sharing classroom lectures and activities will be developed. . . .

In phase two, the TIP team will develop policies and practices for short- and long-term data storage and access. Once developed, the policies and practices will guide the research team as it creates a flexible, sustainable digital archive, which will connect to national repositories and national data research efforts. Phase three will establish policies for adding new collections to the TIP data cloud and for securely sharing research data, a process that often requires various restrictions. "Implementation of a robust technical and policy infrastructure for data archiving and sharing will be key to maintaining the Triangle universities' position as leaders in data-intensive, collaborative research," said Kristin Antelman, lead researcher for the future data and policy working group and associate director for the Digital Library at NC State.

The tasks of the TIP research team will include designing a model for capturing, storing and accessing course content, determining best practices for search and retrieval, and developing mechanisms for sharing archived content among the TIP partners, across the Triangle area and with national research initiatives. Campus-approved social media tools, such as YouTube and iTunesU, will be integrated into the system.

The Fourth Paradigm: Data-Intensive Scientific Discovery

Microsoft Research has released The Fourth Paradigm: Data-Intensive Scientific Discovery.

Of particular interest is the "Scholarly Communication" chapter.

Here are some selections from that chapter:

  • "Jim Gray’s Fourth Paradigm and the Construction of the Scientific Record," Clifford Lynch
  • "Text in a Data-Centric World," Paul Ginsparg
  • "All Aboard: Toward a Machine-Friendly Scholarly Communication System," Herbert Van de Sompel and Carl Lagoze
  • "I Have Seen the Paradigm Shift, and It Is Us," John Wilbanks

Digital Videos: Presentations from Access 2009 Conference

Presentations from the Access 2009 Conference are now available. Digital videos and presentation slides (if available) are synched.

Here's a quick selection:

  1. Dan Chudnov, "Repository Development at the Library of Congress"
  2. Cory Doctorow, "Copyright vs Universal Access to All Human Knowledge and Groups Without Cost: The State of Play in the Global Copyfight"
  3. Mark Jordan & Brian Owen, "COPPUL's LOCKSS Private Network / Software Lifecycles & Sustainability: a PKP and reSearcher Update"
  4. Dorothea Salo, "Representing and Managing the Data Deluge"
  5. Roy Tennant, "Inspecting the Elephant: Characterizing the Hathi Trust Collection"

Johns Hopkins University Sheridan Libraries' Data Conservancy Project Funded by $20 Million NSF Grant

The Johns Hopkins University Sheridan Libraries' Data Conservancy project has been funded by a $20 million NSF grant.

Here's an excerpt from the press release:

The Johns Hopkins University Sheridan Libraries have been awarded $20 million from the National Science Foundation (NSF) to build a data research infrastructure for the management of the ever-increasing amounts of digital information created for teaching and research. The five-year award, announced this week, was one of two for what is being called "data curation."

The project, known as the Data Conservancy, involves individuals from several institutions, with Johns Hopkins University serving as the lead and Sayeed Choudhury, Hodson Director of the Digital Research and Curation Center and associate dean of university libraries, as the principal investigator. In addition, seven Johns Hopkins faculty members are associated with the Data Conservancy, including School of Arts and Sciences professors Alexander Szalay, Bruce Marsh, and Katalin Szlavecz; School of Engineering professors Randal Burns, Charles Meneveau, and Andreas Terzis; and School of Medicine professor Jef Boeke. The Hopkins-led project is part of a larger $100 million NSF effort to ensure preservation and curation of engineering and science data.

Beginning with the life, earth, and social sciences, project members will develop a framework to more fully understand data practices currently in use and arrive at a model for curation that allows ease of access both within and across disciplines.

"Data curation is not an end but a means," said Choudhury. "Science and engineering research and education are increasingly digital and data-intensive, which means that new management structures and technologies will be critical to accommodate the diversity, size, and complexity of current and future data sets and streams. Our ultimate goal is to support new ways of inquiry and learning. The potential for the sharing and application of data across disciplines is incredible. But it’s not enough to simply discover data; you need to be able to access it and be assured it will remain available."

The Data Conservancy grant represents one of the first awards related to the Institute of Data Intensive Engineering and Science (IDIES), a collaboration between the Krieger School of Arts and Sciences, the Whiting School of Engineering, and the Sheridan Libraries. . . .

In addition to the $20 million grant announced today, the Libraries received a $300,000 grant from NSF to study the feasibility of developing, operating and sustaining an open access repository of articles from NSF-sponsored research. Libraries staff will work with colleagues from the Council on Library and Information Resources (CLIR) and the University of Michigan Libraries to explore the potential for the development of a repository (or set of repositories) similar to PubMed Central, the open-access repository that features articles from NIH-sponsored research. This grant for the feasibility study will allow Choudhury's group to evaluate how to integrate activities under the framework of the Data Conservancy and will result in a set of recommendations for NSF regarding an open access repository.

"Empirical Study of Data Sharing by Authors Publishing in PLoS Journals"

Caroline J. Savage and Andrew J. Vickers have published "Empirical Study of Data Sharing by Authors Publishing in PLoS Journals" in PLoS One.

Here's an excerpt:

We requested data from ten investigators who had published in either PLoS Medicine or PLoS Clinical Trials. All responses were carefully documented. In the event that we were refused data, we reminded authors of the journal's data sharing guidelines. If we did not receive a response to our initial request, a second request was made. Following the ten requests for raw data, three investigators did not respond, four authors responded and refused to share their data, two email addresses were no longer valid, and one author requested further details. A reminder of PLoS's explicit requirement that authors share data did not change the reply from the four authors who initially refused. Only one author sent an original data set. . . .

We received only one of ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.

eSciDoc Infrastructure Version 1.1 Released

Version 1.1 of the eSciDoc Infrastructure has been released.

Here's an excerpt from the announcement:

  • Improved Ingest with support for pre-set states (e.g., ingest objects in status 'released'). Ingest performance has been improved significantly.
  • Support for user preferences added
  • Group policies extend the existing authorization options and allow for better support of collaborative working environments
  • Support for Japanese character sets in full-text and metadata searches, including the extraction of Japanese text from PDF documents
  • Support for OAI-PMH with dynamic sets based on filters
  • Improved and extended functionality for the Admin Tool, which now comes with a web-based GUI
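The OAI-PMH support noted above can be illustrated with a short sketch of how a harvester would query such an endpoint. The base URL and set name below are invented for illustration; only the protocol's verb and argument names (verb, metadataPrefix, set) come from OAI-PMH itself:

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH endpoint; an actual eSciDoc base URL will differ.
BASE_URL = "http://example.org/escidoc/oai"

def build_oai_request(verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    params = {"verb": verb, **arguments}
    return BASE_URL + "?" + urlencode(params)

# Harvest Dublin Core records from a dynamic, filter-defined set
# ("released-items" is an invented set name):
url = build_oai_request("ListRecords",
                        metadataPrefix="oai_dc",
                        set="released-items")
```

A harvester would then fetch this URL and page through results using the resumptionToken that OAI-PMH returns for large sets.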

Here's a brief description of the eSciDoc Core Services, which are part of a larger software suite (see the General Concepts page for further information):

The eSciDoc Core Services form a middleware for e-Research applications. The Core Services encapsulate a repository (Fedora Commons) and implement a broad range of commonly used functionalities. The service-oriented architecture fosters the creation of autonomous services, which can be re-used independently from the rest of the infrastructure. The multi-disciplinary nature of the existing Solutions built on top of the Core Services ensures the coverage of a broad range of generic and discipline-specific requirements.

"Adding eScience Assets to the Data Web"

Herbert Van de Sompel, Carl Lagoze, Michael L. Nelson, Simeon Warner, Robert Sanderson, and Pete Johnston have self-archived "Adding eScience Assets to the Data Web" on arXiv.org.

Here's an excerpt:

Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Archives Initiative – Object Reuse and Exchange (OAI-ORE) project. The OAI-ORE specifications are based on the principles of the Architecture of the World Wide Web, the Semantic Web, and the Linked Data effort. Therefore, their incorporation into the cyberinfrastructure that supports eScholarship will ensure the integration of the products of scholarly research into the Data Web.
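The aggregation model the abstract describes can be sketched in a few lines. The URIs below are invented; ore:describes and ore:aggregates are actual terms from the OAI-ORE vocabulary:

```python
# Namespace for the OAI-ORE vocabulary.
ORE = "http://www.openarchives.org/ore/terms/"

# Invented URIs: a Resource Map document that describes an Aggregation
# of two Web resources (e.g., a dataset and the paper derived from it).
resource_map = "http://example.org/rem/dataset-1"
aggregation = "http://example.org/aggregation/dataset-1"
aggregated = [
    "http://example.org/data/readings.csv",
    "http://example.org/docs/paper.pdf",
]

def triples():
    """Yield (subject, predicate, object) triples for the Resource Map."""
    yield (resource_map, ORE + "describes", aggregation)
    for resource in aggregated:
        yield (aggregation, ORE + "aggregates", resource)

ts = list(triples())
```

In practice these triples would be serialized as RDF/XML or Atom per the ORE specifications; the sketch only shows the underlying graph shape.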

Australian National Data Service Launches Two Research Data Services

The Australian National Data Service has launched two research data services: Identify My Data and Register My Data.

Here's an excerpt from the announcement:

The Register My Data services allow you to register descriptions of your research data. These descriptions are then published in a number of discovery environments. The first of these is the Research Data Australia gateway (to be launched by ANDS in July) which aspires to include any Australian publicly funded data relevant to research and enable innovative cross-disciplinary re-use. Data descriptions registered with ANDS are also fed into other data discovery portals in Australia and internationally, including the big search engines such as Google. The Identify My Data services allocate persistent identifiers to data. These identifiers enable continuity of access even when the location of the data on the internet changes.

Curating Atmospheric Data for Long Term Use: Infrastructure and Preservation Issues for the Atmospheric Sciences Community

The Digital Curation Centre has released Curating Atmospheric Data for Long Term Use: Infrastructure and Preservation Issues for the Atmospheric Sciences Community, SCARP Case Study No. 2.

Here's an excerpt:

DCC SCARP aims to understand disciplinary approaches to data curation through substantial case studies based on an immersive approach. As part of the SCARP project we engaged with a number of archives, including the British Atmospheric Data Centre, the World Data Centre Archive at the Rutherford Appleton Laboratory and the European Incoherent Scatter Scientific Association (EISCAT). We developed a preservation analysis methodology which is discipline-independent in application but nonetheless capable of identifying and drawing out discipline-specific preservation requirements and issues. In this case study report we present the methodology along with its application to the Mesospheric Stratospheric Tropospheric (MST) radar dataset, which is currently supported by and accessed through the British Atmospheric Data Centre. We suggest strategies for the long term preservation of the MST data and make recommendations for the wider community.

Keeping Research Data Safe 2: The Identification of Long-lived Digital Datasets for the Purposes of Cost Analysis: Project Plan

Charles Beagrie has released Keeping Research Data Safe 2: The Identification of Long-lived Digital Datasets for the Purposes of Cost Analysis: Project Plan.

Here's an excerpt from the project home page:

The Keeping Research Data Safe 2 project commenced on 31 March 2009 and will complete in December 2009. The project will identify and analyse sources of long-lived data and develop longitudinal data on associated preservation costs and benefits. We believe these outcomes will be critical to developing preservation costing tools and cost benefit analyses for justifying and sustaining major investments in repositories and data curation.

DISC-UK DataShare Project: Final Report

JISC has released DISC-UK DataShare Project: Final Report.

Here's an excerpt:

The DISC-UK DataShare Project was funded from March 2007-March 2009 as part of JISC's Repositories and Preservation programme, Repositories Enhancement strand. It was led by EDINA and Edinburgh University Data Library in partnership with the University of Oxford and the University of Southampton. The project built on the existing informal collaboration of UK data librarians and data managers who formed DISC-UK (Data Information Specialists Committee–UK).

This project has brought together the distinct communities of data support staff in universities and institutional repository managers in order to bridge gaps and exploit the expertise of both to advance the current provision of repository services for accommodating datasets, and thus to explore new pathways to assist academics at our institutions who wish to share their data over the Internet. The project's overall aim was to contribute to new models, workflows and tools for academic data sharing within a complex and dynamic information environment which includes increased emphasis on stewardship of institutional knowledge assets of all types; new technologies to enhance e-Research; new research council policies and mandates; and the growth of the Open Access / Open Data movement.

With three institutions taking part plus the London School of Economics as an associate partner, a range of exemplars have emerged from the establishment of institutional data repositories and related services. Part of the variety in the exemplars is a result of the different repository platforms used by the three project partners: DSpace (Edinburgh DataShare), ePrints (e-Prints Soton) and Fedora (Oxford University Research Archive, ORA)–all open source software. LSE took another route and is using the distributed Dataverse repository network for data, linking to publications in LSE Research Online. Also, different approaches were taken in setting up the repositories. All three institutions had an existing, well-used institutional repository, but two chose to incorporate datasets within the same system as the publications, and one (Edinburgh DataShare) was a paired repository exclusively for datasets, designed to interoperate with the publications repository (Edinburgh Research Archive). The approach took a major turn midway through the project when an apparent solution to the problem of a lack of voluntary deposits arose, in the form of the Data Audit Framework. Edinburgh participated as a partner in the DAF Development project which created the methodology for the framework, and also won a bid to carry out its own DAF Implementation project. Later, the other two partners conducted their own versions of the data audit framework under the auspices of the DataShare project.

A number of scoping activities were carried out by the partners with the goal of informing repository enhancement as well as broader dissemination. These included a State-of-the-Art Review to determine what had been learned by previous repository projects in the UK that had forayed into the data arena. This resulted in a list of benefits and barriers to deposit of datasets by researchers to inform our outreach activities. A Data Sharing Continuum diagram was developed to illustrate where the projects were aiming to fit into the curation landscape, and the range of curation steps that could be taken, from simple backup to online visualization. Later on, a specialized metadata schema was explored (Data Documentation Initiative or DDI) in terms of how it might be incorporated into repository systems, though repository development in this area was not taken up. Instead, a dataset application profile was developed based on qualified Dublin Core (dcterms). This was implemented in the Edinburgh DataShare repository and adapted by Southampton for their next release. The project wished to explore wider issues with open data and web publishing, and therefore produced two briefing papers to do with data mashups–on numeric data and geospatial data. Finally, the project staff and consultant distilled what they had learned in terms of policy development for data repositories in a training guide. A number of peer reviewed posters, papers, and articles were written by DISC-UK members about various aspects of the project during the period.
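A qualified Dublin Core application profile of the kind the report mentions might describe a dataset with fields like the following. The element names are standard dcterms, but the particular selection and all of the values are invented for illustration, not taken from DataShare's actual profile:

```python
# A hypothetical dataset description using qualified Dublin Core (dcterms)
# elements, in the spirit of a dataset application profile.
record = {
    "dcterms:title": "Soil moisture readings, field site A",
    "dcterms:creator": "Example Researcher",
    "dcterms:type": "Dataset",
    "dcterms:issued": "2009-03-01",
    "dcterms:format": "text/csv",
    "dcterms:license": "http://creativecommons.org/licenses/by/3.0/",
}

# A minimal completeness check a repository might run before deposit
# (the required-field set is an assumption, not DataShare policy):
required = {"dcterms:title", "dcterms:creator", "dcterms:type"}
is_valid = required.issubset(record)
```

The advantage of staying within dcterms, rather than a richer schema like DDI, is that existing repository platforms such as DSpace can store and index the fields without modification.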

Key conclusions were that 1) data management motivation is a better bottom-up driver for researchers than data sharing, but is not sufficient to create culture change; 2) data librarians, data managers, and data scientists can help bridge communication between repository managers and researchers; and 3) institutional repositories can improve the impact of sharing data over the internet.

Digital Preservation: PARSE.Insight Project Reports on First Year Achievements

In "Annual Review Year 1: Goals and Achievements," the PARSE.Insight (Permanent Access to the Records of Science in Europe) project reports on its first-year achievements. This post includes links to a number of longer documents, including the PARSE.Insight Deliverable D2.1 Draft Roadmap.

Here's an excerpt from the PARSE.Insight Deliverable D2.1 Draft Roadmap:

The purpose of this document is to provide an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for science data. The infrastructure components presented here are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects, whether separated by discipline or time. Thus the infrastructure components are intended to play a general, unifying role in science data. While developed in the context of a Europe-wide infrastructure, there would be great advantages to making these types of infrastructure components available much more widely.

U.S. Federal Government Launches Data.gov

The U.S. Federal Government has launched Data.gov.

Here's an excerpt from the home page:

The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data.

Read more about it at "Data.gov Launched by Federal Government"; "Data.gov Launches to Mixed Reviews"; and "Data.gov Now Live; Looks Nice But Short on Data."

Dryad Repository Gets $2.18 Million Grant from the National Science Foundation

The Dryad Repository has received a $2.18 million grant from the National Science Foundation.

Here's an excerpt from the press release:

The repository, called Dryad, is designed to archive data that underlie published findings in evolutionary biology, ecology and related fields and allow scientists to access and build on each other’s findings.

The grant recipients are:

The National Evolutionary Synthesis Center and the Metadata Research Center have been developing Dryad in coordination with a large group of Journals and Societies in evolutionary biology and ecology. With the new grant, the additional team members are contributing to the development of the repository. . . .

Currently, a tremendous amount of information underlying published research findings is lost, researchers say. The lack of data sharing and preservation makes it impossible for the data to be examined or re-used by future investigators.

Dryad addresses these shortcomings and allows scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, integrate data across studies and look for trends through statistical meta-analysis.

"The Dryad project seeks to enable scientists to generate new knowledge using existing data," said Kathleen Smith, Ph.D., principal investigator for the grant, a biology professor at Duke and director of the National Evolutionary Synthesis Center. "The key to Dryad in our view is making data deposition a routine and easy part of the publication process."

Digital Repositories Roadmap Review: Towards a Vision for Research and Learning in 2013

JISC has released Digital Repositories Roadmap Review: Towards a Vision for Research and Learning in 2013.

Here's an excerpt from the announcement:

The review is structured into two parts. Firstly it makes a number of recommendations targeted at the JISC Executive. The review then goes on to identify a number of milestones of relevance to the wider community that might act as a measure of progress towards the wider vision of enhanced scholarly communication. Achievement of these milestones would be assisted by JISC through its community work and funding programmes. The review addresses repositories for research outputs, research data and learning materials in separate sections.

CLARION (Chemical Laboratory Repository In/Organic Notebooks) Project Funded

JISC has funded the CLARION (Chemical Laboratory Repository In/Organic Notebooks) project.

Here's an excerpt from the announcement:

So an important part of CLARION will be developing the means for working with scientists to expose their data at the appropriate time. CLARION will expand to include a variety of spectral data, both from central analytical services and from individual labs. Another key aspect of CLARION is that we shall be integrating it with a commercial electronic laboratory notebook (eLNb). We're in the process of evaluating offerings and expect to make an announcement soon. This will be a key opportunity to see how feasible it is to integrate a standard system with the needs of a departmental repository. The protocols may be harder, but we'll have the experience from the crystallography and spectroscopy. An important aspect is that we are keen to develop the Open Data idea globally, and we'd be very interested to hear from other groups who are doing, or thinking of doing, similar things.