Two JISC Open Archives Initiative Object Reuse and Exchange Projects

JISC is funding two projects to do small-scale OAI-ORE tests:

TheOREM (Theses with ORE Metadata), at the University of Cambridge, aims to:

  • Test the applicability of the ORE standard in a realistic scholarly setting—thesis description, submission and publication.
  • Demonstrate the advantages of the ORE approach in complex object publication, by combining it with existing web-standards compliant technologies.
  • Provide examples to fully exercise the ORE specifications in order to provide validation and future direction.

FORESITE (Functional Object Reuse and Exchange: Supporting Information Topology Experiments) will create Resource Map descriptions of JSTOR's holdings and then ingest them into the DSpace institutional repository system via the SWORD protocol, creating external references back to the original files. The description work will be automated, and the system for doing so will be implemented at the University of Liverpool. HP Labs will implement the SWORD protocol within DSpace, along with any other necessary extensions.

For further information, see the FORESITE proposal, A Preview of the TheOREM Project, and the TheOREM proposal.

Isilon's IQ Clustered Storage System Chosen by Michigan and Rice for Digital Repository Storage

Isilon Systems has announced that its IQ Clustered Storage System will be used to support the Michigan Digitization Project and the Rice Digital Scholarship Archive.

Here's an excerpt from the press release about Michigan:

Isilon Systems . . . today announced that the University of Michigan (U-M) has selected Isilon's IQ clustered storage system as the primary repository for its Michigan Digitization Project. In partnership with Google, the University of Michigan and its Michigan Digitization Project are digitizing more than 7.5 million books, ensuring these valuable resources are available to the public into perpetuity. This enormous undertaking includes the storage of digital copies of all unique books within the libraries of the entire Big-Ten Conference and directly supports Google Book Search, which aims to create a single, comprehensive, searchable, virtual card catalog of all books in all languages. The University of Michigan, in partnership with Indiana University (IU), is leveraging Isilon's IQ clustered storage system to create a Shared Digital Repository (SDR) of the universities' published library materials. Using Isilon IQ, U-M and IU are able to consolidate digital copies of millions of books into one, single, shared pool of storage to meet the rapidly growing storage demand of its massive book digitization project. . . .

In conjunction with the Committee for Institutional Cooperation (CIC), an academic partnership formed by the universities of the Big-Ten Conference and the University of Chicago, the University of Michigan and Indiana University are working to create a Shared Digital Repository (SDR) which will mirror the content from U-M and the CIC libraries found in Google Book Search. Using Isilon IQ clustered storage, featuring its OneFS® operating system software, U-M has eliminated disparate data silos to create a shared pool of storage for the digitization efforts of these partner institutions. Each digitized book is approximately 55 MB in size, downloading at a rate of 3 MB/second, 24 hours a day, 7 days a week, for the entire six-year duration of the project. Isilon IQ reduces storage management time, enabling U-M to accelerate the book scanning process, preserve valuable materials, and ultimately expand the research and learning capabilities for millions of users across the globe.
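The storage figures in that excerpt invite a quick sanity check. The short Python sketch below simply multiplies out the quoted numbers (55 MB per book, 3 MB/second, around the clock for six years). It assumes decimal units, 365-day years, and a perfectly sustained rate, none of which the press release actually specifies.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_estimate(book_mb=55.0, rate_mb_per_s=3.0, years=6, days_per_year=365):
    """Multiply out the press-release figures.

    Assumes decimal units (1 TB = 1,000,000 MB) and a rate that never drops,
    which is an idealization, not a reported measurement.
    """
    mb_per_day = rate_mb_per_s * SECONDS_PER_DAY
    total_mb = mb_per_day * days_per_year * years
    return {
        "seconds_per_book": book_mb / rate_mb_per_s,
        "books_per_day": mb_per_day / book_mb,
        "total_tb": total_mb / 1_000_000,
        "total_books": total_mb / book_mb,
    }


if __name__ == "__main__":
    for name, value in transfer_estimate().items():
        print(f"{name}: {value:,.1f}")
```

Taken at face value, that works out to one book roughly every 18 seconds, about 4,700 books a day, and on the order of 570 TB over six years, which would hold more than 10 million books of that size. That is somewhat more than the 7.5 million books mentioned, so the quoted rate is probably best read as a nominal figure.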

Here's an excerpt from the press release about Rice:

Isilon . . . today announced that Rice University has selected Isilon's IQ clustered storage system as its central repository for digital multimedia, including video of selected speeches by international dignitaries and musical performances from the Shepherd School of Music. In an effort to preserve the many historic events held at these prestigious venues and ensure the productions are available to the public into perpetuity, Rice has deployed Isilon clustered storage to consolidate hundreds of recorded musical performances and keynote speeches into a single, highly scalable and reliable shared pool of storage for the Rice Digital Scholarship Archive, an institutional repository based on the DSpace software platform. . . .

Through a cooperative effort between Rice University's Digital Library Initiative, Fondren Library and Central IT department, the university has created a central repository for all its critical multi-media content, enabling a variety of departments to execute on vital, content-driven projects simultaneously, activity that was impossible with traditional storage. Prior to using Isilon IQ, Rice's storage management for the Digital Scholarship archiving system was unable to effectively support management of large digital video and audio files that required streaming for delivery. These assets, therefore, were stored on a variety of streaming servers by various groups across campus, creating multiple access bottlenecks that led to inefficient storage management and undue IT cost and complexity. By unifying all of its digital content onto one, easy to use, "pay as you grow" clustered storage system, Rice University has removed costly data access and management barriers and dramatically simplified its storage architecture. Additionally, using Isilon's SmartQuotas provisioning and quota management software application, Rice is also storing its Language Center's multi-media course work and its Central IT department's webcasts on Isilon IQ, delivering immediate, concurrent data access to multiple users and user groups, further reducing storage management costs to maximize system efficiency.

Rice University will stream its collection of musical performances from the Shepherd School, as well as its video library of the many world leaders and dignitaries that have spoken at the Baker Institute, to thousands of users online. This operation necessitates the use of multiple media servers, using Windows, Quicktime and Real Player formats. Isilon clustered storage communicates natively over CIFS, NFS, FTP, and HTTP, as well as interoperating with Windows, Mac and Linux environments, enabling seamless integration with Rice's variety of server formats and enabling all content to be streamed from one, central, easily and immediately accessible storage system. With Isilon IQ, Rice's entire collection of multi-media is accessible to all its servers 24x7x365, ensuring that the media streaming operations are not only efficient and cost-effective, but prepared to meet high user demand.

Summary of Experiences with E-Journal Publishing Software and Institutional Repositories

Sunny Yoon, Digital Resources Coordinator at the City University of New York, posted a query on the CODE4LIB list about the use of e-journal publishing software and its integration into institutional repositories.

She has now posted an interesting summary of responses to her query.

You can also read the replies that were posted to the list under the heading "e-journal publishing software."

Repository Interface for Overlaid Journal Archives: Results from an Online Questionnaire Survey

The RIOJA project has released Repository Interface for Overlaid Journal Archives: Results from an Online Questionnaire Survey.

Here's an excerpt from the "Introduction":

The Repository Interface for Overlaid Journal Archives (RIOJA) project (http://www.ucl.ac.uk/ls/rioja) is an international partnership of members of academic staff, librarians and technologists from UCL (University College London), the University of Cambridge, the University of Glasgow, Imperial College London and Cornell University. It aims to address some of the issues around the development and implementation of a new publishing model, that of the overlay journal – defined, for the purposes of the project, as a quality-assured journal whose content is deposited to and resides in one or more open access repositories. The project is funded by the Joint Information Systems Committee (JISC, http://www.jisc.ac.uk/) and runs from April 2007 to June 2008.

The RIOJA project will create an interoperability toolkit to enable the overlay of certification onto papers housed in subject repositories. The intention is that the tool will be generic, helping any repository to realise its potential to act as a more complete scholarly resource. The project will also create a demonstrator overlay journal, using the arXiv repository and OJS software, with interaction between the two facilitated by the RIOJA toolkit.

To inform and shape the project, a survey of Astrophysics and Cosmology researchers has been conducted. The findings from that survey form the basis of this report.

The project team will also undertake formal and informal discussion with publishers and with academic and managing members of editorial boards. The survey and supplementary discussions will help to ensure that the RIOJA outputs address the needs and expectations of the research community. Finally, the overall long-term sustainability of a repository-overlay journal will be assessed. The project will examine the costs of adding peer review to arXiv deposits, of implementing and maintaining the functionality which the survey shows to be most valued by researchers, and of providing long-term preservation of content, and will aim to identify and appraise possible cost-recovery business models.

Open Repositories 2008 Presentations

Presentations from the Open Repositories 2008 conference are available in the OR08 Publications repository.

The easiest way to find presentations is to use the Browse by Subject capability; however, both simple and advanced search functions are available as well.

Currently, the repository holds over 90 documents. You can track new additions at the Latest Additions to OR08 Publications page (RSS feed). It's anticipated that all documents will be available by 4/13/08.

Here's a brief selection of available presentations:

Project Reports from the Andrew W. Mellon Foundation's 2008 Research in Information Technology Retreat

Project reports from the Andrew W. Mellon Foundation's 2008 Research in Information Technology retreat are now available.

Here are selected project briefing reports:

Weblog Reports from Open Repositories 2008

Below are selected Weblog reports from Open Repositories 2008.

Ball State University Libraries Move Ahead with Ambitious Digital Initiative Program

The Ball State Libraries have nurtured an ambitious digital initiatives program that has established an institutional repository, a CONTENTdm system for managing digital assets, a Digital Media Repository with over 102,000 digital objects, a Digitization Center and Mobile Digitization Unit, an e-Archives for university records, and a virtual press (among other initiatives). Future goals are equally ambitious.

Read more about it at "Goals for Ball State University Libraries' Digital Initiative."

Tracking Deposit Growth: UK Repository Records Statistics

Chris Keene, Technical Development Manager at the University of Sussex Library, has released UK Repository Records Statistics, which provides U.K. institutional repository record growth data from July 2006 onwards based on ROAR statistics. For example, the site has a table showing monthly record totals.
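Monthly record totals like these are cumulative counts, so deposit activity in a given month is the difference between consecutive totals. Here is a minimal Python sketch of that calculation, assuming the totals have already been copied out of the table as (month, total) pairs in date order. The function name is hypothetical, and the sample figures are made up for illustration; they are not ROAR data.

```python
def monthly_growth(totals):
    """Turn cumulative (month, total_records) pairs into per-month additions.

    Returns (month, records_added, percent_growth) for every month after the
    first; percent_growth is None when the previous total was zero.
    """
    rows = []
    for (_, prev_total), (month, total) in zip(totals, totals[1:]):
        added = total - prev_total
        pct = 100.0 * added / prev_total if prev_total else None
        rows.append((month, added, pct))
    return rows


# Hypothetical totals, for illustration only.
sample = [("2006-07", 1000), ("2006-08", 1100), ("2006-09", 1320)]
for month, added, pct in monthly_growth(sample):
    print(month, added, f"{pct:.1f}%")
```

Looking at additions rather than totals makes it easier to see whether deposit rates are actually accelerating or just accumulating.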

Repository Planning Checklist and Guidance Released: Presents Planning Tool for Trusted Electronic Repositories (PLATTER)

DigitalPreservationEurope has released Repository Planning Checklist and Guidance.

Here's an excerpt from the "Executive Summary and Introduction to Platter":

The purpose of this document is to present a tool, the Planning Tool for Trusted Electronic Repositories (PLATTER) which provides a basis for a digital repository to plan the development of its goals, objectives and performance targets over the course of its lifetime in a manner which will contribute to the repository establishing trusted status amongst its stakeholders. PLATTER is not in itself an audit or certification tool but is rather designed to complement existing audit and certification tools by providing a framework which will allow new repositories to incorporate the goal of achieving trust into their planning from an early stage. A repository planned using PLATTER will find itself in a strong position when it subsequently comes to apply one of the existing auditing tools to confirm the adequacy of its procedures for maintaining the long term usability of and access to its material. . . .

The PLATTER process is centred around a group of Strategic Objective Plans (SOPs) through which a repository specifies its current objectives, targets, or key performance indicators in those areas which have been identified as central to the process of establishing trust. In the future, PLATTER can and should be used as the basis for an electronic tool in which repositories will be able to compare their targets with those adopted by other similar (suitably anonymised) repositories. The intention is that the SOPs should be living documents which evolve with the repository, and PLATTER therefore defines a planning cycle through which the SOPs can develop symbiotically with the repository organisation.

OAI-ORE for Fedora: Oreprovider Released

Oskar Grenholm of the National Library of Sweden has released oreprovider, an open-source Java application that "will let you disseminate digital objects stored in a Fedora repository as OAI-ORE Resource Maps."

In the announcement, he says:

The idea behind it all is that you have a Java web application (oreprovider.war) that, on the fly, will generate Resource Maps serialized as Atom feeds (using OAI4J) for objects in Fedora. All you have to do in Fedora is to add information in RELS-EXT about which datastreams belong to which Resource Map (exactly how to do this can be seen at the project's web page).
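To make the idea of a Resource Map serialized as Atom a little more concrete, here is a small Python sketch using only the standard library. It is not oreprovider's code or output. It loosely follows the entry-based Atom serialization in the later ORE 1.0 specification, which differs from the draft serialization available when oreprovider was announced. The function name and all URIs are hypothetical placeholders, and the exact element choices should be checked against the specification.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE = "http://www.openarchives.org/ore/terms/"
ET.register_namespace("atom", ATOM)


def resource_map_entry(rem_uri, aggregation_uri, title, aggregated,
                       author_name, updated):
    """Build an Atom entry acting as the Resource Map for one Aggregation.

    `aggregated` is a list of (uri, mime_type) pairs, for example the
    datastreams grouped into the aggregation. `updated` is an RFC 3339 timestamp.
    """
    def atom(parent, tag, **attrs):
        return ET.SubElement(parent, f"{{{ATOM}}}{tag}", attrs)

    entry = ET.Element(f"{{{ATOM}}}entry")
    # Any stable IRI works for atom:id; the ReM URI is reused for simplicity.
    atom(entry, "id").text = rem_uri
    atom(entry, "title").text = title
    atom(atom(entry, "author"), "name").text = author_name
    atom(entry, "updated").text = updated
    # The entry itself is the Resource Map, available at its own URI...
    atom(entry, "link", rel="self", type="application/atom+xml", href=rem_uri)
    # ...and it describes the Aggregation, which has a different URI.
    atom(entry, "link", rel=ORE + "describes", href=aggregation_uri)
    atom(entry, "category", scheme=ORE, term=ORE + "Aggregation",
         label="Aggregation")
    # One ore:aggregates link per aggregated resource.
    for uri, mime in aggregated:
        atom(entry, "link", rel=ORE + "aggregates", href=uri, type=mime)
    return entry


if __name__ == "__main__":
    entry = resource_map_entry(
        "http://example.org/rem/obj1",
        "http://example.org/aggregation/obj1",
        "Example object",
        [("http://example.org/fedora/get/demo:1/PDF", "application/pdf")],
        author_name="Example Repository",
        updated="2008-04-01T00:00:00Z",
    )
    print(ET.tostring(entry, encoding="unicode"))
```

The key design point is the split between the Resource Map (the Atom document) and the Aggregation it describes, each with its own URI.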

DSpace Version 1.5 Released

Version 1.5 of DSpace, which is a major upgrade, has been released.

Here's an excerpt from the announcement:

The DSpace community is pleased to announce the release of DSpace 1.5! This is an important release of DSpace with many new features, including a completely new theme-able Manakin user interface, SWORD integration, many new configurable options, and scalability improvements. . . .

New Features:

  • Maven: DSpace 1.5 introduces a new Maven-based build system. Maven is a software tool from Apache that allows developers to compile and distribute software projects. Maven also enables DSpace to be more modular by arranging the software into sub-components. In addition, it makes customizations easier by giving developers the tools to maintain customizations, and provides the ability to manage new features as DSpace continues its accelerating growth rate. . . .
  • Manakin: Customize your repository look-and-feel with the new Manakin theme-able user interface. Manakin introduces a new modular framework, enabling an institution to customize their interface according to the specific needs of the particular repository, community, or collection. . . .
  • Light Network Interface: Integrate DSpace with legacy or local systems that need to manage content in the repository through the new Light Network Interface. This interface provides a programmatic mechanism to manage content within the repository through a WebDAV or SOAP based protocol. . . .
  • SWORD: Integrate with the new SWORD (Simple Web-service Offering Repository Deposit) protocol. Based upon the Atom Publishing Protocol, this interface allows for cross-repository deposit of new content. This protocol may enable future tools that will provide for 'one click' deposit. . . .
  • Browsing: The browsing system has been completely re-implemented to provide improved scalability and configuration. The new browsing system enables administrators to easily create new browse indexes. . . .
  • Submissions: The item submission system is now more configurable by managing the steps a user follows when submitting a new item to the repository. The new submission system allows for these steps to be rearranged, removed, and even allows for new steps to be added. . . .
  • Events: Another under-the-hood improvement introduced in DSpace 1.5 is the event system, which improves scalability and modularity by introducing an event model to the architecture. This feature will allow future add-ons to automatically manage content in the repository based upon when an object has been added, modified, or removed from the system.
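Because SWORD builds on the Atom Publishing Protocol, a deposit is essentially an HTTP POST of a package to a collection URI, plus a few extra headers. The Python sketch below prepares, but does not send, such a request with the standard library. The header names follow the SWORD 1.x profile as I understand it, and the packaging URI is the one associated with DSpace's METS packages. Both should be verified against the SWORD documentation. The function name, collection URI, and credentials are hypothetical.

```python
import base64
import urllib.request

# Packaging identifier associated with DSpace METS packages in SWORD 1.x;
# an assumption to verify against the SWORD profile.
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_sword_deposit(collection_uri, package_bytes, filename, user, password,
                        no_op=True):
    """Prepare (but do not send) a SWORD 1.x style deposit request:
    an Atom Publishing Protocol POST of a zipped package to a collection."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": METS_DSPACE_SIP,
        # X-No-Op asks the server to check the deposit without storing it.
        "X-No-Op": "true" if no_op else "false",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(collection_uri, data=package_bytes,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = build_sword_deposit(
        "http://repository.example.edu/sword/deposit/123456789/2",  # hypothetical
        b"zip bytes would go here", "item.zip", "depositor", "secret")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

If the deposit succeeds, a SWORD server responds with an Atom entry describing the new item. That response is what makes 'one click' deposit tools feasible.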

Microsoft to Unveil Research-Output Repository Platform at Open Repositories 2008

Microsoft will unveil its Windows-based research-output repository platform in early April at Open Repositories 2008. Initially, the software will be used internally to support a repository for Microsoft Research. At a later date, it will be made available for public download, possibly as open-source software.

Here's an excerpt from "Microsoft and 'Research-Output' Repositories":

The platform has a "semantic computing" flavor. The concepts of "resource" and "relationship" are first-class citizens in our platform API. We do offer a number of "research-output"-related entities for those who want to use them (e.g. "technical report", "thesis", "book", "software download", "data", etc.), all of which inherit from "resource". However, new entities can be introduced into the system (even programmatically) while the existing ones can be further extended through the addition of properties. . . .

We are already well into the process of developing a collection of tools and interfaces on top of the platform as tangible examples of how to use it. We already have implementations of OAI-PMH, BibTeX import/export, customized feed syndication service, ASP.NET controls providing access to the repository, and are working on Search and a simple Web UI. We are also working on WPF and Silverlight tools for visualizing the relationships between the resources within our repository. . . .

At the Open Repositories 2008 conference, we will formally unveil our work in advance of its official release and initiate interactions/exchanges with the DSpace, EPrints, Fedora, and other players in the repository community. This is crucial to us because—like every other project our group undertakes—we are intensely focused on interoperability.

I want to be very transparent here: our effort is intended to provide a repository option to those institutions/organizations that already license or have access to Microsoft software (including the free versions of the products, like SQL Server Express). Our platform is intended to sit on top of the existing Microsoft "stack". By providing this new research-output repository platform at no cost, we can offer added value for our existing (and future) customers in the academic and research space. It is critical to point out that we are making every effort to ensure our platform is optimized to make the best use of Microsoft technologies AND to also interoperate with all other existing systems and platforms in the repository ecosystem. We are actively seeking engagement and feedback from the community!

Read more about it at "Microsoft Famulus: New IR Software."
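The "resource" and "relationship" design described in the excerpt can be illustrated with a toy model. The Python sketch below is not Microsoft's API. Its names, such as Repository.relate and define_entity_type, are hypothetical. It only shows the general pattern: typed entities inheriting from a common resource type, extra properties attached to existing entities, new entity types created programmatically, and relationships stored as subject/predicate/object triples.

```python
from dataclasses import dataclass, field, make_dataclass
from typing import Any


@dataclass
class Resource:
    """Base type: every repository entity is a resource."""
    uri: str
    title: str
    # Open-ended extra properties, so existing types can be extended.
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class TechnicalReport(Resource):
    report_number: str = ""


@dataclass(frozen=True)
class Relationship:
    subject: str    # URI of the source resource
    predicate: str  # e.g. "cites", "isVersionOf"
    obj: str        # URI of the target resource


class Repository:
    def __init__(self):
        self.resources: dict[str, Resource] = {}
        self.relationships: list[Relationship] = []

    def add(self, resource: Resource) -> Resource:
        self.resources[resource.uri] = resource
        return resource

    def relate(self, subject: Resource, predicate: str, obj: Resource) -> None:
        self.relationships.append(Relationship(subject.uri, predicate, obj.uri))

    def related(self, subject: Resource, predicate: str) -> list[Resource]:
        return [self.resources[r.obj] for r in self.relationships
                if r.subject == subject.uri and r.predicate == predicate]


def define_entity_type(name: str, **fields: type) -> type:
    """Create a new Resource subtype at run time; new fields default to None."""
    spec = [(fname, ftype, field(default=None)) for fname, ftype in fields.items()]
    return make_dataclass(name, spec, bases=(Resource,))


if __name__ == "__main__":
    repo = Repository()
    Thesis = define_entity_type("Thesis", degree=str)
    thesis = repo.add(Thesis("urn:example:thesis1", "A thesis", degree="PhD"))
    report = repo.add(TechnicalReport("urn:example:tr1", "A report",
                                      report_number="TR-1"))
    repo.relate(thesis, "cites", report)
    print([r.title for r in repo.related(thesis, "cites")])
```

Keeping relationships as separate triples, rather than as fields on each entity, is what allows arbitrary links between any two resource types.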

Microsoft Developing Authoring Add-in for Microsoft Office Word 2007 with NLM DTD Support

Microsoft is developing an Article Authoring Add-in for Microsoft Office Word 2007, which will support the NLM DTD. A Technology Preview of the Add-in is available.

Here's an excerpt from the Technical Computing @ Microsoft—Scholarly Publishing page:

In support of the increased emphasis on electronic publishing and archiving of scholarly articles, Microsoft has developed the Article Authoring Add-in for Microsoft Office Word 2007. This add-in will support the XML format from the National Library of Medicine (NLM), which is commonly used in the scientific, technical, and medical (STM) publishing market as part of the publishing workflow and as the format used for the archiving of articles. Pre-release versions of this add-in will target the staff at STM journals and publishers, at information repositories, and in-house and commercial software developers supporting the STM market.

The Article Authoring Add-in for Word 2007 will enable or simplify a number of activities that are part of the authoring and scholarly publishing process, such as:

  • gathering information about the authors and article content at the time the article is written;
  • enabling journals to provide authors with templates containing the structure for articles, and information for self-classification of the articles by the authors;
  • enabling access to the authors and article metadata contained in the Word file through the use of the NLM format and OpenXML document structure;
  • enabling the editorial staff to have access to the article and journal metadata directly within Word; and
  • enabling two-way conversion between Office OpenXML and the NLM format.
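The author and article metadata gathered by the add-in ultimately maps to the front matter of an NLM-format article. The Python sketch below builds a minimal fragment of that structure with the standard library. It is not the add-in's code, and the function name is hypothetical. The element names are taken from the NLM Journal Publishing tag set, but the fragment omits required parts such as journal-meta and has not been validated against the DTD.

```python
import xml.etree.ElementTree as ET


def nlm_front(article_title, authors, abstract=None):
    """Build a minimal NLM-style <article><front> fragment.

    `authors` is a list of (surname, given_names) pairs. The result omits
    journal-meta and other required parts, so it will not validate as is.
    """
    article = ET.Element("article", {"article-type": "research-article"})
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = article_title
    contribs = ET.SubElement(meta, "contrib-group")
    for surname, given in authors:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    if abstract:
        ET.SubElement(ET.SubElement(meta, "abstract"), "p").text = abstract
    return article


if __name__ == "__main__":
    front = nlm_front("An Example Article", [("Smith", "Jane")], "A short abstract.")
    print(ET.tostring(front, encoding="unicode"))
```

Capturing this structure at authoring time, instead of reconstructing it later from a formatted document, is the main benefit the add-in promises.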

Greg Tananbaum consulted with Microsoft on the development of the tool.

Preserving Mixed Analog/Digital AV Archives: PrestoSpace Project Case Study

The Digital Curation Centre has published DCC Case Study—PrestoSpace: Preservation towards Storage and Access. Standardised Practices for Audiovisual Contents in Europe.

Here's the "Executive Summary":

Explicit strategies are needed to manage 'mixed' audio visual (AV) archives that contain both analogue and digital materials. The PrestoSpace Project brings together industry leaders, research institutes, and other stakeholders at a European level, to provide products and services for effective automated preservation and access solutions for diverse AV collections. The Project’s main objective is to develop and promote flexible, integrated and affordable services for AV preservation, restoration, and storage with a view to enabling migration to digital formats in AV archives.

Presentations from the Open Access Collections Workshop Now Available

Presentations from the Australian Partnership for Sustainable Repositories' Open Access Collections workshop are now available. Presentations are in HTML/PDF, MP3, and digital video formats. The workshop was held in association with the Queensland University Libraries Office of Cooperation and the University of Queensland Library.

Planets Project Releases White Paper: Representation Information Registries

The Planets (Preservation and Long-term Access through Networked Services) project has released White Paper: Representation Information Registries.

Here's the "Executive Summary":

This document is a report on the state-of-the-art in the field of Representation Information Registries (RIRs). RIRs are widely recognised as a critical component of digital preservation architecture in general, and a number of such registries are being developed as part of the Planets architecture in particular. This document discusses the development of the concept of representation information, and of the use of registries as a means of exposing that information for use by digital preservation services; it describes the RIR implementations which currently exist or are under development globally; it assesses planned and potential future developments in this area; it discusses the role which RIRs play within the Planets project, and concludes with recommendations for future areas of research within Planets and beyond.

Dealing with Research Data in a Federated Digital Repository: Oxford University Planning Document Released

The Oxford e-Research Centre has released Scoping Digital Repository Services for Research Data Management, a project plan for determining the requirements for handling data in a federated digital repository at Oxford University.

Here's an excerpt from the "Aims and Objectives" section:

Objectives:

  • Capture and document researchers’ requirements for digital repository services to handle research data.
  • Participate actively in the development of an interoperability framework for the federated digital repository at Oxford.
  • Make recommendations to improve and coordinate the provision of digital repository services for research data.
  • Initiate and develop collaborations with the different repository activities already occurring to ensure that communication takes place between them.
  • Raise awareness at Oxford of the importance and advantages of the active management of research data.
  • Communicate significant national and international developments in repositories to relevant Oxford stakeholders, in order to stimulate the adoption of best practices.

Essays from the Core Functions of the Research Library in the 21st Century Meeting

The Council on Library and Information Resources has released essays from its recent Core Functions of the Research Library in the 21st Century meeting.

Here's an excerpt from the meeting home page that lists the essays:

"The Future of the Library in the Research University," by Paul Courant

"Accelerating Learning and Discovery: Refining the Role of Academic Librarians," by Andrew Dillon

"A New Value Equation Challenge: The Emergence of eResearch and Roles for Research Libraries," by Richard E. Luce

"Co-teaching: The Library and Me," by Stephen G. Nichols

"Groundskeepers to Gatekeepers: How to Change Faculty Perceptions of Librarians and Ensure the Future of the Research Library," by Daphnee Rentfrow

"The Research Library in the 21st Century: Collecting, Preserving, and Making Accessible Resources for Scholarship," by Abby Smith

"The Role of the Library in 21st Century Scholarly Publishing," by Kate Wittenberg

"Leveraging Digital Technologies in Service to Culture and Society: The Role of Libraries as Collaborators," by Lee Zia

SEASR (Software Environment for the Advancement of Scholarly Research)

The Andrew W. Mellon Foundation-funded SEASR (Software Environment for the Advancement of Scholarly Research) project is building digital humanities cyberinfrastructure.

Here's an excerpt about the project from its home page:

What can SEASR do for scholars?

  • help scholars to access existing large data stores more readily
  • provide scholars with enhanced data synthesis and query analysis: from focused data retrieval and data integration, to intelligent human-computer interactions for knowledge access, to semantic data enrichment, to entity and relationship discovery, to knowledge discovery and hypothesis generation
  • empower collaboration among scholars by enhancing and innovating virtual research environments

What kind of innovations does SEASR provide for the humanities?

  • a complete, fully integrated, state-of-the-art software environment for managing structured and unstructured data and analyzing digital libraries, repositories and archives, as well as educational platforms
  • an open source, end-to-end software system that enables researchers to develop, evolve, and maintain data interoperability, evaluation, analysis, and visualization

Read more about it at "Placing SEASR within the Digital Library Movement."

Helping Researchers Understand and Label Article Versions: VERSIONS Toolkit Released

The VERSIONS (Versions of Eprints—A User Requirements Study and Investigation Of the Need for Standards) project has released the VERSIONS Toolkit.

Here's an excerpt from the "Introduction":

If you are an experienced researcher you are likely to be disseminating your work on a personal website, in a subject archive, or in an institutional repository already. This toolkit aims to:

  • provide peer-to-peer advice about managing personal versions and revisions in order to keep your options open for future use of your work
  • clarify areas of uncertainty among researchers about agreements with publishers and how these relate to different versions of research outputs
  • suggest ways to identify your work clearly when placing it on the web in order to guide your readers to the latest and best versions of your work
  • direct you to further resources about making versions of your work openly accessible

The toolkit draws on the results of a survey of researchers’ attitudes and current practice when creating, storing and disseminating different versions of their research. As such the guidance in the toolkit represents the views of active researchers. Survey respondents were predominantly from economics and related disciplines.