E-science and Academic Libraries Bibliography

Digital Scholarship has released the E-science and Academic Libraries Bibliography. It includes English-language articles, books, editorials, and technical reports that are useful in understanding the broad role of academic libraries in e-science efforts. The scope of this brief selective bibliography is narrow, and it does not cover data curation and research data management issues in libraries in general. Most sources have been published from 2007 through October 18, 2011; however, a limited number of key sources published prior to 2007 are also included. The bibliography includes links to freely available versions of included works, such as e-prints and open access articles.


"Building Research Cyberinfrastructure at Small/Medium Research Institutions"

Anne Agee, Theresa Rowe, Melissa Woo, and David Woods have published "Building Research Cyberinfrastructure at Small/Medium Research Institutions" in EDUCAUSE Quarterly.

Here's an excerpt:

To build a respectable cyberinfrastructure, the IT organizations at small/medium research institutions need to use creativity in discovering the needs of their researchers, setting priorities for support, developing support strategies, funding and implementing cyberinfrastructure, and building partnerships to enhance research support. This article presents the viewpoints of four small-to-medium-sized research universities who have struggled with the issue of providing appropriate cyberinfrastructure support for their research enterprises. All four universities have strategic goals for raising the level of research activity and increasing extramural funding for research.

Presentations from the Digital Repository Federation International Conference 2009

Presentations from the DRF International Conference 2009: Open Access Repositories Now and in the Future—From the Global and Asia-Pacific Points of View are now available. The Digital Repository Federation is "a federation consisting of 87 universities and research institutes (as of February 2009), which aims to promote Open Access and Institutional Repository in Japan."

Here's a quick selection of presentations:

Revised NSF Software Development for Cyberinfrastructure Solicitation

The NSF has issued a revised solicitation for Software Development for Cyberinfrastructure grants (NSF 10-508). It is anticipated that $15,000,000 over a three-year period will be available for 25 to 30 awards. The full proposal deadline is February 26, 2010.

Here's an excerpt:

The FY2010 SDCI solicitation supports the development, deployment, and maintenance of software in the five software focus areas listed above, i.e., software for HPC systems, software for digital data management, software for networking, middleware, and cybersecurity, and specifically focuses on cross-cutting issues of CI software sustainability, manageability and power/energy efficiency in each of these software focus areas. . . .

  1. Software for Digital Data

The Data focus area addresses software that promotes acquisition, transport, discovery, access, analysis, and preservation of very large-scale digital data in support of large-scale applications or data sets transitioning to use by communities other than the ones that originally gathered the data. Examples of such datasets include climatologic, ecologic, phenologic, and observational data, sensor systems, spatial visualizations, multi-dimensional datasets correlated with metadata, and so forth.

Specific focus areas in Software for Digital Data for the FY2010 SDCI solicitation include:

  • Documentation/Metadata: Tools for automated/facilitated metadata creation/acquisition, including linking data and metadata to assist in curation efforts; tools to enable the creation and application of ontologies, semantic discovery, assessment, comparison, and integration of new composite ontologies.
  • Security/Protection: Tools for data authentication, tiered/layered access systems for data confidentiality/privacy protection, replication tools to ensure data protection across varied storage systems/strategies, rules-based data security management tools, and assurance tools to test for digital forgery and privacy violations.
  • Data transport/management: Tools to enable acquisition of high-data-rate, high-volume data from varied, distributed data sources (including sensor systems and instruments), while addressing stringent space and data quality constraints; tools to assist in improved low-level management of data and transport to take better advantage of limited bandwidth.
  • Data analytics and visualization: Tools that operate in (near) real-time, not traditional batch mode, on possibly streaming data; in-transit data processing; and data integration and fusion.
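The last bullet's contrast between batch mode and (near) real-time stream processing can be made concrete with a small sketch. This is a generic illustration, not any specific SDCI tool: a batch mean needs the complete dataset in hand, while a streaming mean updates its estimate as each datum arrives.

```python
# Batch vs. streaming computation of a mean.
# Generic illustration of stream-mode processing; not tied to any SDCI tool.

def batch_mean(values):
    """Batch mode: requires the complete dataset before computing."""
    return sum(values) / len(values)

class StreamingMean:
    """Stream mode: updates a running estimate as each datum arrives."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        # Incremental (Welford-style) update: no need to store past data.
        self.mean += (x - self.mean) / self.count
        return self.mean

stream = StreamingMean()
for x in [2.0, 4.0, 6.0]:
    stream.update(x)  # an up-to-date estimate is available after every datum

# stream.mean now equals batch_mean([2.0, 4.0, 6.0]), but was computed
# one datum at a time, as a streaming tool must.
```

The same update-per-datum pattern underlies in-transit processing, where data are reduced or summarized while still moving from instrument to archive.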

Data Preservation in High Energy Physics

The ICFA DPHEP International Study Group has self-archived Data Preservation in High Energy Physics in arXiv.org.

Here's an excerpt:

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. At the same time, HEP has no coherent strategy for data preservation and re-use. An inter-experimental Study Group on HEP data preservation and long-term analysis was convened at the end of 2008 and held two workshops, at DESY (January 2009) and SLAC (May 2009). This document is an intermediate report to the International Committee for Future Accelerators (ICFA) of the reflections of this Study Group.

NSF Awards $20 Million to DataONE (Observation Network for Earth) Project

The National Science Foundation has awarded a $20 million grant to the DataONE (Observation Network for Earth) Project, which reports to both the Office of the Vice President of Research and the University Libraries at the University of New Mexico. William Michener, professor and director of e-science initiatives at University Libraries, is directing the project.

Here's an excerpt from the press release:

Researchers at UNM have partnered with dozens of other universities and agencies to create DataONE, a global data access and preservation network for earth and environmental scientists that will support breakthroughs in environmental research.

DataONE is designed to provide universal access to data about life on Earth and the environment that sustains it. The underlying technologies will provide open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.

Expected users include scientists, educators, librarians, resource managers, and the public. By providing easy and open access to a broad range of science data, as well as tools for managing, analyzing, and visualizing data, DataONE will be transformative in the speed with which researchers will be able to assemble and analyze data sets and in the types of problems they will be able to address. . . .

DataONE is one of two $20 million awards made this year as part of the National Science Foundation's (NSF) DataNet program. The collaboration of universities and government agencies coalesced to address the mounting need for organizing and serving up vast amounts of highly diverse and inter-related but often-incompatible scientific data. Resulting studies will range from research that illuminates fundamental environmental processes to identifying environmental problems and potential solutions. . . .

The DataONE team will study how a vast digital data network can provide secure and permanent access into the future, and also encourage scientists to share their information. The team will help determine data citation standards, as well as create the tools for organizing, managing, and publishing data.

The resulting computing and processing "cyberinfrastructure" will be made permanently available for use by the broader national and international science communities. DataONE is led by the University of New Mexico, and includes additional partner organizations across the United States as well as from Europe, Africa, South America, Asia, and Australia.

This grant is important nationally, and especially locally for our research community. University Libraries Dean Martha Bedard said, "The University Libraries are key partners in UNM research initiatives, and are excited and committed to supporting the emerging area of data curation, which this grant seeks to support in sophisticated ways."

DataONE will build a set of geographically distributed Coordinating Nodes that play an important role in facilitating all of the activities of the global network, as well as a network of Member Nodes that host relevant data and tools. The initial three Coordinating Nodes will be at the University of New Mexico, UC Santa Barbara (housed at the Davidson Library), and at the University of Tennessee/Oak Ridge National Laboratory. Member Nodes will be located in association with universities, libraries, research networks, and agencies worldwide.

ARL Releases E-Science Survey Preliminary Results and Resources

The Association of Research Libraries has released preliminary results and resources from an e-science survey of its members.

Here's an excerpt from the press release:

The Association of Research Libraries (ARL) E-Science Working Group surveyed ARL member libraries in the fall of 2009 to gather data on the state of engagement with e-science issues. An overview of initial survey findings was presented by E-Science Working Group Chair Wendy Lougee, University Librarian, McKnight Presidential Professor, University of Minnesota Libraries, at the October ARL Membership Meeting. Lougee's briefing explored contrasting approaches among research institutions, particularly in regard to data management. The briefing also summarized survey findings on topics such as library services, organizational structures, staffing patterns and staff development, and involvement in research grants, along with perspectives on pressure points for service development. To better explicate the findings, Lougee reviewed specific cases of activities at six research institutions. . . .

A full report of the survey findings is being prepared and will be published in 2010 by ARL through its Occasional Papers series.

Open Science at Web-Scale: Optimising Participation and Predictive Potential

JISC has released Open Science at Web-Scale: Optimising Participation and Predictive Potential.

Here's an excerpt:

This Report has attempted to draw together and synthesise evidence and opinion from a wide range of sources. Examples of data intensive science at extremes of scale and complexity, which enable forecasting and predictive assertions, have been described, together with compelling exemplars where an open and participative culture is transforming science practice. It is perhaps worth noting that the pace of change in this area is such that it has been a challenging piece to compose and, at best, it can only serve as a subjective snapshot of a very dynamic data space. . . .

The perspective of openness as a continuum is helpful in positioning the range of behaviours and practices observed in different disciplines and contexts. By separating the twin aspects of openness (access and participation), we can begin to understand the full scope and potential of the open science vision. Whilst a listing of the perceived values and benefits of open science is given, further work is required to provide substantive and tangible evidence to justify and support these assertions. Available evidence suggests that transparent data sharing and data re-use are far from commonplace. The peer production approaches to data curation which have been described are still in their infancy, but offer considerable promise as scalable models which could be migrated to other disciplines. The more radical open notebook science methodologies are currently on the "fringe" and it is not clear whether uptake and adoption will grow in other disciplines and contexts.

Duke, NC State, and UNC Data Sharing Cloud Computing Project Launched

Duke University, North Carolina State University, and the University of North Carolina at Chapel Hill have launched a two-year project to share digital data.

Here's an excerpt from the press release:

An initiative that will determine how Triangle area universities access, manage, and share ever-growing stores of digital data launched this fall with funding from the Triangle Universities Center for Advanced Studies, Inc. (TUCASI).

The two-year TUCASI data-Infrastructure Project (TIP) will deploy a federated data cyberinfrastructure—or data cloud—that will manage and store digital data for Duke University, NC State University, UNC Chapel Hill, and the Renaissance Computing Institute (RENCI) and allow the campuses to more seamlessly share data with each other, with national research projects, and with private-sector partners in Research Triangle Park and beyond.

RENCI and the Data Intensive Cyber Environments (DICE) Center at UNC Chapel Hill manage the $2.7 million TIP. The provosts, heads of libraries and chief information officers at the three campuses signed off on the project just before the start of the fall semester.

"The TIP focuses on federation, sharing and reuse of information across departments and campuses without having to worry about where the data is physically stored or what kind of computer hardware or software is used to access it," said Richard Marciano, TIP project director, and also professor at UNC's School of Information and Library Science (SILS), executive director of the DICE Center, and a chief scientist at RENCI. "Creating infrastructure to support future Triangle collaboratives will be very powerful."

The TIP includes three components—classroom capture, storage, and future data and policy—which will be implemented in three phases. In phase one, each campus and RENCI will upgrade their storage capabilities, and a platform-independent system for capturing and sharing classroom lectures and activities will be developed. . . .

In phase two, the TIP team will develop policies and practices for short- and long-term data storage and access. Once developed, the policies and practices will guide the research team as it creates a flexible, sustainable digital archive, which will connect to national repositories and national data research efforts. Phase three will establish policies for adding new collections to the TIP data cloud and for securely sharing research data, a process that often requires various restrictions. "Implementation of a robust technical and policy infrastructure for data archiving and sharing will be key to maintaining the Triangle universities' position as leaders in data-intensive, collaborative research," said Kristin Antelman, lead researcher for the future data and policy working group and associate director for the Digital Library at NC State.

The tasks of the TIP research team will include designing a model for capturing, storing, and accessing course content; determining best practices for search and retrieval; and developing mechanisms for sharing archived content among the TIP partners, across the Triangle area, and with national research initiatives. Campus-approved social media tools, such as YouTube and iTunes U, will be integrated into the system.

The Fourth Paradigm: Data-Intensive Scientific Discovery

Microsoft Research has released The Fourth Paradigm: Data-Intensive Scientific Discovery.

Of particular interest is the "Scholarly Communication" chapter.

Here are some selections from that chapter:

  • "Jim Gray’s Fourth Paradigm and the Construction of the Scientific Record," Clifford Lynch
  • "Text in a Data-Centric World," Paul Ginsparg
  • "All Aboard: Toward a Machine-Friendly Scholarly Communication System," Herbert Van de Sompel and Carl Lagoze
  • "I Have Seen the Paradigm Shift, and It Is Us," John Wilbanks

Papers from the European Research Area 2009 Conference

Papers from the European Research Area 2009 Conference are now available.

Here's a selection from the "Open Access and Preservation" session:

7 Things You Should Know About Cloud Computing

EDUCAUSE has released 7 Things You Should Know About Cloud Computing.

Here's the abstract:

Cloud computing is the delivery of scalable IT resources over the Internet, as opposed to hosting and operating those resources locally, such as on a college or university network. Those resources can include applications and services, as well as the infrastructure on which they operate. By deploying IT infrastructure and services over the network, an organization can purchase these resources on an as-needed basis and avoid the capital costs of software and hardware. With cloud computing, IT capacity can be adjusted quickly and easily to accommodate changes in demand. Cloud computing also allows IT providers to make IT costs transparent and thus match consumption of IT services to those who pay for such services. Operating in a cloud environment requires IT leaders and staff to develop different skills, such as managing contracts, overseeing integration between in-house and outsourced services, and mastering a different model of IT budgets.

eSciDoc Infrastructure Version 1.1 Released

Version 1.1 of the eSciDoc Infrastructure has been released.

Here's an excerpt from the announcement:

  • Improved Ingest with support for pre-set states (e.g., ingest objects in status 'released'). Ingest performance has been improved significantly.
  • Support for user preferences added
  • Group policies extend the existing authorization options and allow for better support of collaborative working environments
  • Support for Japanese character sets in full-text and metadata searches, including the extraction of Japanese text from PDF documents
  • Support for OAI-PMH with dynamic sets based on filters
  • Improved and extended functionality for the Admin Tool, which now comes with a web-based GUI
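The OAI-PMH support noted above is exercised over plain HTTP. Here is a minimal harvesting sketch using only the Python standard library; the endpoint URL and set name are hypothetical placeholders, not actual eSciDoc values:

```python
# Minimal OAI-PMH harvesting sketch (Python standard library only).
# The base URL and set name are invented placeholders, not real eSciDoc values.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_listrecords_url(base_url, set_spec, prefix="oai_dc"):
    """Compose a ListRecords request; the 'set' argument selects a set."""
    query = urlencode({"verb": "ListRecords",
                       "metadataPrefix": prefix,
                       "set": set_spec})
    return f"{base_url}?{query}"

def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(f"{OAI_NS}identifier")]

# A harvester would fetch this URL with urllib.request and page through
# resumption tokens; here we only build the request.
url = build_listrecords_url("https://repo.example.org/oai/provider",
                            "released-items")

# A trimmed-down sample response, for illustration:
sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:item:1</identifier></header></record>
    <record><header><identifier>oai:example.org:item:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
```

Dynamic sets based on filters mean the repository computes set membership at request time, so a harvester simply passes the set name as above and receives whatever currently matches the filter.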

Here's a brief description of the eSciDoc Core Services, which are part of a larger software suite (see the General Concepts page for further information):

The eSciDoc Core Services form a middleware for e-Research applications. The Core Services encapsulate a repository (Fedora Commons) and implement a broad range of commonly used functionalities. The service-oriented architecture fosters the creation of autonomous services, which can be re-used independently from the rest of the infrastructure. The multi-disciplinary nature of the existing Solutions built on top of the Core Services ensures the coverage of a broad range of generic and discipline-specific requirements.

“Adding eScience Assets to the Data Web”

Herbert Van de Sompel, Carl Lagoze, Michael L. Nelson, Simeon Warner, Robert Sanderson, and Pete Johnston have self-archived "Adding eScience Assets to the Data Web" on arXiv.org.

Here's an excerpt:

Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Archives Initiative – Object Reuse and Exchange (OAI-ORE) project. The OAI-ORE specifications are based on the principles of the Architecture of the World Wide Web, the Semantic Web, and the Linked Data effort. Therefore, their incorporation into the cyberinfrastructure that supports eScholarship will ensure the integration of the products of scholarly research into the Data Web.
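The aggregation model described in the excerpt can be made concrete with a few RDF triples. In this minimal sketch, the `ore:describes` and `ore:aggregates` terms come from the OAI-ORE vocabulary; every URI is invented for illustration:

```python
# Sketch of an OAI-ORE Resource Map as plain RDF triples.
# Only the ORE vocabulary terms are real; all URIs are hypothetical.
ORE = "http://www.openarchives.org/ore/terms/"

resource_map = "http://example.org/rem/expt-42"  # the Resource Map document
aggregation = "http://example.org/agg/expt-42"   # the Aggregation it describes

triples = [
    # The Resource Map describes the Aggregation...
    (resource_map, ORE + "describes", aggregation),
    # ...and the Aggregation groups a dataset and the paper about it
    # into one citable, machine-readable scholarly object.
    (aggregation, ORE + "aggregates", "http://example.org/data/expt-42.csv"),
    (aggregation, ORE + "aggregates", "http://example.org/paper/expt-42.pdf"),
]

def aggregated_resources(triples, agg):
    """Return the resources that a given aggregation aggregates."""
    return [o for (s, p, o) in triples
            if s == agg and p == ORE + "aggregates"]
```

Because the Resource Map is itself a Web resource with a URI, the whole compound object (data plus paper) can be linked, crawled, and cited like any other node in the Data Web, which is the integration the authors describe.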

Free Cloud Services from Amazon: AWS in Education

Amazon is offering academic community members free cloud services in its AWS in Education program.

Here's an excerpt from the press release:

Amazon.com, Inc. announces AWS in Education, a set of programs that enable the academic community to easily leverage the benefits of Amazon Web Services for teaching and research. With AWS in Education, educators, academic researchers, and students worldwide can obtain free usage credits to tap into the on-demand infrastructure of Amazon Web Services to teach advanced courses, tackle research endeavors and explore new projects. . . AWS in Education also provides self-directed learning resources on cloud computing for students.

Read more about it at "AWS in Education FAQs."

NSF Awards about $5 Million to 14 Universities to Participate in the IBM/Google Cloud Computing University Initiative

The National Science Foundation has awarded about $5 million in grants to 14 universities to participate in the IBM/Google Cloud Computing University Initiative.

Here's an excerpt from the press release:

The initiative will provide the computing infrastructure for leading-edge research projects that could help us better understand our planet, our bodies, and pursue the limits of the World Wide Web.

In 2007, IBM and Google announced a joint university initiative to help computer science students gain the skills they need to build cloud applications. Now, NSF is using the same infrastructure and open source methods to award CLuE grants to universities around the United States. Through this program, universities will use software and services running on an IBM/Google cloud to explore innovative research ideas in data-intensive computing. These projects cover a range of activities that could lead not only to advances in computing research, but also to significant contributions in science and engineering more broadly.

NSF awarded Cluster Exploratory (CLuE) program grants to Carnegie-Mellon University, Florida International University, the Massachusetts Institute of Technology, Purdue University, University of California-Irvine, University of California-San Diego, University of California-Santa Barbara, University of Maryland, University of Massachusetts, University of Virginia, University of Washington, University of Wisconsin, University of Utah and Yale University.

Open Grid Forum Digital Repositories Research Group Established

The Open Grid Forum has established a Digital Repositories Research Group.

Here's an excerpt from the home page:

The goal of the Digital Repositories Research Group (DR-RG) is to analyze how digital repositories can be built on top of federated storage infrastructure, focusing on the exploitation of existing data-related standards and the identification of need for new or revised data-related standards.

Draft Roadmap for Science Data Infrastructure

PARSE.Insight has released Draft Roadmap for Science Data Infrastructure.

Here's an excerpt from the announcement:

The draft roadmap provides an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for scientific data. The infrastructure components are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects. Thus the infrastructure components are intended to play a general, unifying role in scientific data. While developed in the context of a Europe-wide infrastructure, there would be great advantages for these types of infrastructure components to be available much more widely.

Working Together or Apart: Promoting the Next Generation of Digital Scholarship

The Council on Library and Information Resources has released Working Together or Apart: Promoting the Next Generation of Digital Scholarship: Report of a Workshop Cosponsored by the Council on Library and Information Resources and the National Endowment for the Humanities.

Here's an excerpt from the Executive Summary:

On September 15, 2008, CLIR, in cooperation with the National Endowment for the Humanities (NEH), held a symposium to explore research topics arising at the intersection of humanities, social sciences, and computer science. The meeting addressed two fundamental questions: (1) how do the new media advance and transform the interpretation and analysis of text, image, and other sources of interest to the humanities and social sciences and enable new expression and pedagogy?, and (2) how do those processes of inquiry pose questions and challenges for research in computer science as well as in the humanities and social sciences?

Working Together or Apart considers these two questions. The volume opens with an essay by CLIR Director of Programs Amy Friedlander, which contextualizes and synthesizes the day's discussion. It is followed by six papers prepared for the meeting, and a summary of a report on digital humanities centers commissioned by CLIR and written by Diane Zorich.

Digital Video: Open Science: Good For Research, Good For Researchers? at Columbia

A digital video of the panel presentation "Open Science: Good for Research, Good for Researchers?," held at Columbia University, is now available.

Here's the description from the Web page:

Open science refers to information-sharing among researchers and encompasses a number of initiatives to remove access barriers to data and published papers, and to use digital technology to more efficiently disseminate research results. Advocates for this approach argue that openly sharing information among researchers is fundamental to good science, speeds the progress of research, and increases recognition of researchers. Panelists: Jean-Claude Bradley, Associate Professor of Chemistry and Coordinator of E-Learning for the School of Arts and Sciences at Drexel University; Barry Canton, founder of Ginkgo BioWorks and the OpenWetWare wiki, an online community of life science researchers committed to open science that has over 5,300 users; Bora Zivkovic, Online Discussion Expert for the Public Library of Science (PLoS) and author of "A Blog Around the Clock."

Sun Microsystems Releases Open APIs for the Sun Open Cloud Platform

Sun Microsystems has released open APIs for its Open Cloud Platform.

Here's an excerpt from the press release:

Today at its CommunityOne developer event, Sun Microsystems, Inc. . . . showcased the Sun Open Cloud Platform, the company's open cloud computing infrastructure, powered by industry-leading software technologies from Sun, including Java, MySQL, OpenSolaris and Open Storage. Signaling a massive opportunity to open the world's nascent cloud market, Sun also outlined that a core element of its strategy is to offer public clouds and previewed plans to launch the Sun Cloud, its first public cloud service targeted at developers, students, and startups. . . .

As part of the company's commitment to building communities, Sun also announced the release of a core set of Open APIs, unveiled broad partner support for its cloud platform and demonstrated innovative features of the Sun Cloud. Sun is opening its cloud APIs for public review and comment, so that others building public and private clouds can easily design them for compatibility with the Sun Cloud. Sun's Cloud API specifications are published under the Creative Commons license, which essentially allows anyone to use them in any way. Developers will be able to deploy applications to the Sun Cloud immediately, by leveraging pre-packaged VMIs (virtual machine images) of Sun's open source software, eliminating the need to download, install and configure infrastructure software. To participate in the discussion and development of Sun's Cloud APIs, go to sun.com/cloud.

In related news, according to the Wall Street Journal, IBM is negotiating to acquire Sun Microsystems.

Herbert Van de Sompel et al. on “Adding eScience Assets to the Data Web”

Herbert Van de Sompel et al.'s paper on "Adding eScience Assets to the Data Web" is now available on the Linked Data on the Web (LDOW2009) Web site.

Here's an excerpt:

Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Archives Initiative-Object Reuse and Exchange (OAI-ORE) project. The OAI-ORE specifications are based on the principles of the Architecture of the World Wide Web, the Semantic Web, and the Linked Data effort. Therefore, their incorporation into the cyberinfrastructure that supports eScholarship will ensure the integration of the products of scholarly research into the Data Web.

Presentations from the 9th International Bielefeld Conference

Presentations from the 9th International Bielefeld Conference are now available.

Here are a few quick selections:

  • Communicating the Results of Research: How Much Does It Cost, and Who Pays?, Michael Jubb (slides) (audio)
  • IR Also Means Institutional Responsibility, Leo Waaijers (slides) (audio)
  • University Investment in the Library: What's the Return?, Carol Tenopir (slides) (audio)

Cloud Computing: DuraSpace Report to Mellon Foundation

The Andrew W. Mellon Foundation has released a progress report from the DuraSpace project, a joint project of the DSpace Foundation and Fedora Commons. (Thanks to RepositoryMan.)

Here's an excerpt from "DSpace Foundation and Fedora Commons Receive Grant from the Mellon Foundation for DuraSpace" that describes the project:

Over the next six months funding from the planning grant will allow the organizations to jointly specify and design "DuraSpace," a new web-based service that will allow institutions to easily distribute content to multiple storage providers, both "cloud-based" and institution-based. The idea behind DuraSpace is to provide a trusted, value-added service layer to augment the capabilities of generic storage providers by making stored digital content more durable, manageable, accessible and sharable.