Digital Repositories

Interview with Tony Hey, Corporate Vice President of Microsoft’s External Research Division

Jon Udell's wide-ranging Perspectives interview with Tony Hey, Corporate Vice President of Microsoft’s External Research Division, is now available.

July/August NewSpace Newsletter from DSpace

The DSpace Foundation has published the July/August NewSpace newsletter.

Major Upgrade: Fedora 3.0 Released

Fedora Commons has released Fedora 3.0, which "completes all general release features."

Here's an excerpt from the press release:

Dan Davis, Chief Software Architect, Fedora Commons, explained, "We are pleased to offer a Fedora 3.0 that is a foundational step towards a model-driven content architecture." He went on to say, "Users will find it simpler to maintain and operate their repositories with version 3.0 - it's more scalable and fits better into the Web."

Fedora 3.0 features the Content Model Architecture (CMA), an integrated structure for persisting and delivering the essential characteristics of digital objects in Fedora. The software is available at http://www.fedora-commons.org/ and at http://sourceforge.net/projects/fedora-commons. The Fedora CMA plays a central role in the Fedora architecture, in many ways forms the over-arching conceptual framework for future development of Fedora Repositories. Fedora 3.0 features include:

Overview of new Features in Fedora 3.0 Release

Content Model Architecture—Provides a model-driven approach for persisting and delivering the essential characteristics of digital content in Fedora

Fedora REST API—A new API that exposes a subset of the Access and Management API using a RESTful Web interface contributed by MediaShelf

Mulgara Support—Fedora supports the Mulgara 2.0 Semantic Triplestore replacing Kowari

Migration Utility—Provides an update utility to convert existing collections for Content Model Architecture compatibility

Relational Index Simplification—The Fedora schema was simplified making changes easier without having to reload the database and significantly increasing scalability

Dynamic Behaviors—Objects may be added or removed dynamically from the system moving system checks into run-time errors

Error Reporting—Provides improved run-time error details

Multiple Owner as a CSV String—Enables using a CSV string as ownerID and in XACML policies

Java 6 Compatibility—Fedora may be optionally compiled using Java 6 while retaining support for Java Enterprise Edition 1.5 deployments

Relationships API—API-M has been extended to enable adding, removing, and discovering RDF relations between Fedora objects

Revised Fedora Object XML Schemas—The new schemas are simpler, supporting the CMA and removing Disseminators

Atom Support—Fedora objects can now be imported and exported in the Atom format

Messaging Support—Integrates JMS messaging for sending notification of important events

Validation Framework—Provides system operators a way to validate all or part of their repository, based on content models

3.0-Compatible Service Releases—New versions of the OAI Provider and GSearch services are compatible with Fedora 3.0. The GSearch release also enables messaging support for GSearch, which allows for more robust and seamless integration with the Fedora repository.

Many new enhancements—see the Release Notes:
http://www.fedora-commons.org/documentation/3.0b2/userdocs/distribution/release-notes.html.

New CONTENTdm Add-on: OCLC Web Harvester

OCLC has announced the availability of Web Harvester, which allows CONTENTdm sites to import Web content into their systems.

Here's an excerpt from the press release:

OCLC's Web Harvester evolved from collaboration with several state libraries, state archives and universities over a period of seven years. Participants emphasized the increasing importance of collecting and managing Web-based content as information resources move online yet remain within libraries' and archives' collection scopes.

The Web Harvester is integrated into library workflows, allowing library staff to capture content as part of the cataloging process. The captured content is then sent to the organization's digital collections where it can be managed with other CONTENTdm digital content. . . .

The Web Harvester is accessed via the Connexion client, OCLC's powerful cataloging service, and captures content ranging from single, Web-based documents to entire Web sites. Once retrieved, users can review the captured Web content and add it to a collection managed by OCLC's CONTENTdm software, a complete solution for storing, managing and delivering a library's digital collections to the Web. Once in CONTENTdm, the Web content can be accessed and managed in conjunction with other digital collections. Harvested items are discoverable from WorldCat.org, WorldCat Local and the CONTENTdm Web interface.

For additional security, master files of the captured content also can be ingested to the OCLC Digital Archive, the service for long-term storage of originals and master files from libraries' digital collections.

OpenDOAR/Google Maps Mashup

OpenDOAR is mapping repository data using Google Maps.

Here's an excerpt from the announcement:

SHERPA is pleased to announce the addition of a Google Maps extension to OpenDOAR, its directory of open access repositories (http://www.opendoar.org/find). Just run any search of the directory, and then change the output format from "Summaries" to "Google Map".

Here are a few examples:

1. http://www.opendoar.org/find?format=gmap&cID=jp
—Repositories in Japan . . .

3. http://www.opendoar.org/find?format=gmap&cID=us&ctID=6
—United States repositories holding theses & dissertations

4. http://www.opendoar.org/find?format=gmap&search=Nottingham
—Keyword search for "Nottingham"

5. http://www.opendoar.org/find?format=gmap&rSoftWareName=CONTENTdm
—Repositories using CONTENTdm software
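
The query patterns in these example URLs can be sketched as a small URL builder. This is an illustrative sketch only: the parameter names (format, cID, ctID, search, rSoftWareName) are copied from the examples above, the helper name gmap_url is hypothetical, and nothing is sent over the network, so it makes no claim about which other parameter combinations the directory accepts.

```python
from urllib.parse import urlencode

# Search endpoint as it appears in the announcement's example URLs.
BASE_URL = "http://www.opendoar.org/find"


def gmap_url(**filters):
    """Build a directory search URL that asks for the "Google Map" output.

    `filters` are passed through as query parameters; only the names seen
    in the announcement's examples are known to be meaningful.
    """
    params = {"format": "gmap"}
    params.update(filters)
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(gmap_url(cID="jp"))                   # repositories in Japan
    print(gmap_url(cID="us", ctID=6))           # US theses and dissertations
    print(gmap_url(search="Nottingham"))        # keyword search
    print(gmap_url(rSoftWareName="CONTENTdm"))  # repositories by software
```

Building the query string with urlencode rather than by hand keeps keyword searches that contain spaces or punctuation correctly escaped.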

DSpace Foundation and Fedora Commons Announce Decision to Collaborate

The DSpace Foundation and Fedora Commons have announced that they will collaborate on future digital repository initiatives.

Here's an excerpt from the press release:

Today two of the largest providers of open source software for managing and providing access to digital content, the DSpace Foundation and Fedora Commons, announced plans to combine strengths to work on joint initiatives that will more closely align their organizations' goals and better serve both open source repository communities in the coming months. . . .

The collaboration is expected to benefit over 500 organizations from around the world who are currently using either DSpace (examples include MIT, Rice University, Texas Digital Library and University of Toronto) or Fedora (examples include the National Library of France, New York Public Library, Encyclopedia of Chicago and eSciDoc) open source software to create repositories for a wide variety of purposes. . . .

The decision to collaborate came out of meetings held this spring where members of DSpace and Fedora Commons communities discussed multiple dimensions of cooperation and collaboration between the two organizations. Ideas included leveraging the power and reach of open source knowledge communities by using the same services and standards in the future. The organizations will also explore opportunities to provide new capabilities for accessing and preserving digital content, developing common web services, and enabling interoperability across repositories.

In the spirit of advancing open source software, Fedora Commons and DSpace will look at ways to leverage and incubate ideas, community and culture to:

Provide the best technology and services to open source repository framework communities.

Evaluate and synchronize, where possible, both organizations' technology roadmaps to enable convergence and interoperability of key architectural components.

Demonstrate how the DSpace and Fedora open source repository frameworks offer a unique value proposition compared to proprietary solutions.

The announcement came on the heels of an event sponsored by the Joint Information Systems Committee's (JISC) Common Repository Interface Group (CRIG) held at the Library of Congress. The event, known as "RepoCamp," was a forum where developers gathered to discuss innovative approaches to improving interoperability and web-orientation for digital repositories. Sandy Payette, Executive Director of Fedora Commons, and Michele Kimpton, Executive Director of the DSpace Foundation, reiterated their commitment to collaboration and encouraged input and participation from both communities as work gets underway.

Oxford Releases Report on Digital Repository Services for Research Data Management

The Oxford University Office of the Director of IT has released Findings of the Scoping Study Interviews and the Research Data Management Workshop: Scoping Digital Repository Services for Research Data Management.

Here's an excerpt from the report's Web page:

The scoping study interviews aimed to document data management practices from Oxford researchers as well as to capture their requirements for services to help them manage their data more effectively. In order to do this, 37 face-to-face interviews were conducted between May and June with researchers from 27 colleges, departments and faculties. In addition to this, the Research Data Management Workshop was organised to complement the findings of the scoping study interviews.

APSR Releases Investigating Data Management Practices in Australian Universities

The Australian Partnership for Sustainable Repositories has released Investigating Data Management Practices in Australian Universities.

Here's an excerpt from the report's Web page:

In late 2007, The University of Queensland undertook a survey of data management practices among the university’s researchers. This was done in response to the increasing realisation that repositories need to include research data, in addition to the research outputs in print form already included, and to provide information which would enhance the support provided for those engaged in eResearch.

The survey was carried out using the Apollo software developed at The Australian National University and adapted by APSR. Two other universities, The University of Melbourne and the Queensland University of Technology, have now replicated the survey among their own communities, while adding some questions of local interest.

The survey covers questions such as the types of digital data being created (spreadsheets, documents, experimental data, images, fieldwork data, etc), the size of the data collection, software used for data analysis, data storage and backup, application of a data management plan, roles and responsibilities around data management, copyright frameworks, usage of high capacity computing, and much more.

Microsoft’s Free Digital Tools for Scholars

At the ninth annual Microsoft Research Faculty Summit, Tony Hey, Corporate Vice President of Microsoft’s External Research Division, discussed a variety of digital tools for scholars.

Here's an excerpt from the press release:

Add-ins. The Article Authoring Add-in for Word 2007 enables metadata to be captured at the authoring stage to preserve document structure and semantic information throughout the publishing process, which is essential for enabling search, discovery and analysis in subsequent stages of the life cycle. The Creative Commons Add-in for Office 2007 allows authors to embed Creative Commons licenses directly into an Office document (Word, Excel or PowerPoint) by linking to the Creative Commons site via a Web service.

The Microsoft e-Journal Service. This offering provides a hosted, full-service solution that facilitates easy self-publishing of online-only journals to facilitate the availability of conference proceedings and small and medium-sized journals.

Research Output Repository Platform. This platform helps capture and leverage semantic relationships among academic objects—such as papers, lectures, presentations and video—to greatly facilitate access to these items in exciting new ways.

The Research Information Centre. In close partnership with the British Library, this collaborative workspace will be hosted via Microsoft Office SharePoint Server 2007 and will allow researchers to collaborate throughout the entire research project workflow, from seeking research funding to searching and collecting information, as well as managing data, papers and other research objects throughout the research process.

Here's a list that indicates availability.

Article Authoring Add-in version 1.0 for Microsoft Office Word 2007 (download)
Creative Commons Add-in version 1.0 for Microsoft Office (download)
Microsoft Math Add-in for Microsoft Office Word 2007 (download)
Microsoft eJournal Service (alpha preview)
Research Output Repository Platform ("Currently in a limited alpha release, an open beta version will be available later in 2008.")
Research Information Centre ("This service is currently in beta testing. Microsoft intends to share the code widely by the end of the year.")

Blog Reports from JISC Innovation Forum 2008 "Research Data—Whose Problem Is It?" Session

Live blogging reports are available from the "Research Data—Whose Problem Is It?" session at the JISC Innovation Forum 2008.

DSpace Can Support One Million Items

A paper by researchers from the National Library of Medicine ("Testing the Scalability of a DSpace-based Archive") finds that DSpace can support an archive with a million items. The tested system "is built upon MIT's DSpace software (Version 1.4), with some modifications and enhancements to better facilitate batch based processing."

Here's an excerpt from the conclusion:

We conclude that the version of DSpace used in SPER (with MySQL database) shows acceptable ingest performance for a million-item archive. . . .

The experimental results shown here pertain to items with mostly one or two monochrome TIFF images, though a few items have up to 100 images. However, a number of inferences may be derived from these results.

No real problems were found in ingesting a million items to the archive, using a Sun X4500 server machine, in terms of either performance or reliability of the SPER/DSpace software architecture and implementation. . . .

With the increase in archive size, the average ingest time of an item increases in a smooth and predictable way.

With increasing number of TIFF images, the ingest time (per item) increases by three to four percent for each additional image.

If color TIFF images were used, the ingest times would increase slightly due to the overhead of copying additional data to the upload area, and to the archive's asset storage. However, other archival overheads should not change.

JISC Asks: What's a Repository?

JISC's Information Environment team is using IdeaScale to crowdsource a definition of repositories.

Texas Digital Library Hosts Second E-Journal

The Texas Digital Library is hosting the Journal of Virtual Worlds Research. The first issue is now available.

Articles in the Journal of Virtual Worlds Research are freely available in PDF format under a Creative Commons Attribution-No Derivative Works 3.0 United States License.

The journal is edited by Jeremiah Spence, a doctoral student at the University of Texas at Austin's College of Communication.

The Texas Digital Library also hosts the Journal of Digital Information. Articles in the Journal of Digital Information are freely available in the PDF or HTML formats, and authors retain the copyright to them. Supported by the Texas A&M University Libraries, it is edited by Cliff McKnight, Professor of Information Studies at Loughborough University, and Scott Phillips, Research and Development Coordinator at the Texas A&M University Libraries' Digital Initiatives department.

Personalized Repository Service: DRIVER Adds MyDriver

DRIVER (Digital Repository Infrastructure Vision for European Research) has added the MyDriver service, which allows registered users to create both saved searches that trigger new content alerts via e-mail and personal search filters. It also allows registered users to join communities.

Name Authority Service for Institutional and Subject Repositories: The Names Project

The JISC-funded Names Project is working toward the development of a prototype name authority service for UK repositories that "will reliably and uniquely identify individuals and institutions."

The Names Project has just released the Software Requirements Specification for the Names Project Prototype.

Read more about the project at "What’s in a Name? Prototyping a Name Authority Service for UK Repositories."

SPARC Europe and DRIVER to Collaborate on Promoting Repositories

SPARC Europe and DRIVER have signed a Memorandum of Agreement to collaborate on promoting digital repositories in Europe.

Here's an excerpt from the press release:

SPARC Europe and DRIVER today confirmed a need for cooperation in order to progress and enhance the provision, visibility and application of European research outputs through digital repositories, in systems providing access to texts, data or other types of content. DRIVER is a joint initiative of European stakeholders, co-financed by the European Commission, setting up a technical infrastructure for digital repositories and facilitating the building of an umbrella organisation for digital repositories. DRIVER relies on research libraries for the sustainable operation of repositories and provision of high quality content through digital repositories. SPARC Europe and DRIVER share the vision that research institutions should contribute actively and cooperatively to a common, pan-European data and service infrastructure based on digital repositories. . . .

Collaboration between SPARC Europe and DRIVER is framed by their joint support for an Open Access model for repositories in research institutions. They will present a common lobby at a national and international level to leverage change through the scholarly community within respective institutions and countries. Their reciprocal support will ensure wider access to standards for interoperability between repositories, and the adoption of emerging technical standards to facilitate open archiving. This agreement demonstrates their joint commitment to promote a European network of repositories offering access to research outputs across institutional and national boundaries.

David Prosser, Director of SPARC Europe, said, "Europe is well placed to take a leading role internationally in the development of institutional repositories. A combination of institutional interest, progressive policies from funding bodies, and strong support from the European Commission creates the perfect conditions to foster an open research environment. DRIVER is a key component in underpinning the European repository infrastructure and we are very pleased to cement our already close relationship by signing this agreement."

Interview with Brad McLean, DSpace Foundation's New Technology Director

An interview with Brad McLean, DSpace Foundation's New Technology Director, has been published in the latest issue of NewSpace.

Australian National University's Harvester Service Released

The Australian National University has released its Harvester Service.

Here's an excerpt from the announcement:

The Harvester Service is a proxy harvester for processing and routing OAI-PMH Data Provider responses to various applications. It is intended to be used for integration with other applications requiring a harvesting service.
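
To make "processing and routing OAI-PMH Data Provider responses" more concrete, here is a minimal sketch of the general idea. It is not the Harvester Service's actual design: the sample response is invented, the function names are hypothetical, and routing records by their setSpec value is an assumption chosen purely for illustration. Only the OAI-PMH and Dublin Core XML namespaces come from the published standards.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented ListRecords response used only for this sketch.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <setSpec>theses</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:2</identifier>
        <setSpec>articles</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract identifier, set and title from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "set": rec.findtext("oai:header/oai:setSpec", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records


def route_by_set(records):
    """Group record identifiers by set (an assumed routing rule)."""
    routed = {}
    for record in records:
        routed.setdefault(record["set"], []).append(record["identifier"])
    return routed


if __name__ == "__main__":
    print(route_by_set(parse_records(SAMPLE_RESPONSE)))
```

A real proxy harvester would fetch responses over HTTP and follow resumption tokens; both are left out here to keep the sketch self-contained.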

EM-Loader: Making Self-Archiving Easier

Building on the work of the SWORD Project, the EM-Loader project will build software that allows authors to use the metadata from PublicationsList.org to deposit works in the Depot.

Here's an excerpt from the announcement:

We will show proof of concept at an early stage by building a web service module that connects two existing services: the Depot, the JISC repository for researchers who do not have other provision; and PublicationsList.org, a service for researchers to build a web page listing their publications. Instead of recreating interoperability standards from scratch, the project has adopted and expanded the SWORD Deposit API.

In our revised approach we suggest that depositing papers into repositories can be made easier and rewarding for researchers by concentrating initially on compiling a personal publications list with complete metadata and then performing a batch submission to the repository.

Traditionally stage 1—compiling a personal bibliography—is by manual entry, but this can be made much easier with batch search and select of items from citation databases such as Web of Science and PubMed, and import from personal bibliography tools such as BibTeX, EndNote and Reference Manager. Full text of papers can be uploaded and attached to metadata in stage 2 (typically in PDF or DOC formats).

Functionality for stages 1 and 2 already exists and is provided to this project through PublicationsList.org. The main focus of our project activity is to build the workflow to enable all the structured metadata to be forwarded to the appropriate repository, alongside the associated digital object (full text) where available.

Read more about it at: EM-Loader and the EM-Loader proposal.

(Thanks to Open Access News.)

Repositories Support Project Releases Repository Planning Checklists

The Repositories Support Project has released three repository planning checklists.

D-NET Version 1.0 Released by Digital Repository Infrastructure Vision for European Research

DRIVER (Digital Repository Infrastructure Vision for European Research) has released version 1.0 of D-NET.

Here's an excerpt from the announcement:

The first of its kind, this open source software offers a tool-box for deploying a customizable distributed system featuring tools for harvesting and aggregating heterogeneous data sources. A variety of end-user functionalities are applied over this integration, ranging from search, recommendation, collections, profiling to innovative tools for repository manager users. . . .

The DRIVER D-NET v. 1.0 software is released under the Open Source Apache license with accompanying documentation, and with (limited to capacity) technical support by the DRIVER Consortium technical partners. . . .

In particular, the DRIVER software can be used for two main reasons:

Deploying new services on top of an operational DRIVER infrastructure: Running instances of the DRIVER Infrastructure can be enriched in any moment with new service instances so as to empower or expand the available functionalities. Examples are:

Deployment and configuration of customized portals for designated communities over the aggregated data (e.g. a portal over national repositories or over subject-driven content, such as RECOLECTA and DART Europe DEEP above);

Deployment of new aggregation services so as to distribute and delegate harvesting and aggregating activities to specialized DRIVER National or Community Correspondents, carrying out their tasks over an assigned selection of repositories.

Deploying a new DRIVER infrastructure to serve other service providers and communities

CNI Spring 2008 Task Force Meeting Presentations

Presentations and project briefings from the CNI Spring 2008 Task Force Meeting are available. Podcast interviews with a few attendees are also available.

Here's a selection of project briefings:

Assessing Research Cyberinfrastructure Needs at the University of Minnesota
Ann Hill Duin, University of Minnesota
Eric F. Celeste, University of Minnesota
John T. Butler, University of Minnesota
Kemal Badur, University of Minnesota

NIH Public Access Policy: Campus Implementation Strategies
Joan Giesecke, University of Nebraska at Lincoln
Wendy Pradt Lougee, University of Minnesota
Karla Hahn, Association of Research Libraries

An OAI-ORE Aggregation for the National Virtual Observatory
David Reynolds, Johns Hopkins University

Starting an Institutional Repository Program in Two Months or Less: The Good, the Bad, and the Ugly
Abby Clobridge, Bucknell University

Streaming from the Institutional Repository
Geneva Henry, Rice University
Diane C. Butler, Rice University

Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library

Cornell University Library's Data Working Group has deposited its Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library report in the eCommons@Cornell repository.

Here's the abstract:

Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, are giving rise to new modes of conducting research. Infrastructure to promote and support the curation of digital research data is not yet fully-developed in all research disciplines, scales, and contexts. Organizations of all kinds are examining and staking out their potential roles in the areas of cyberinfrastructure development, data-driven scholarship, and data curation. The purpose of the Cornell University Library's (CUL) Data Working Group (DaWG) is to exchange information about CUL activities related to data curation, to review and exchange information about developments and activities in data curation in general, and to consider and recommend strategic opportunities for CUL to engage in the area of data curation. This white paper aims to fulfill this last element of the DaWG's charge.

Solr Search Engine Plug-In for Fedora Released

The DRAMA team has released a Solr plug-in for Fedora.

Here's a description of Solr from its home page:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

CIC Shared Digital Repository Project Update

A recently updated description of the Committee on Institutional Cooperation's Shared Digital Repository Project is available at Indiana University's Project: Shared Digital Repository page.

Here's an excerpt:

Description: The Shared Digital Repository (SDR) leverages the tradition of leadership in collaboration among the institutions of the Committee on Institutional Cooperation (CIC). The SDR operates under the leadership of the Repository Administrators (Indiana University and the University of Michigan), which also provide a large part of the funding. Additional governance and financial support are provided by the charter participating libraries of the CIC, and by other libraries and library consortia wishing to archive digital content.

Outcome: The SDR offers persistent and high-availability storage for digitized book and journal content, beginning with the Google content from the CIC members and later extending to other digitized content. The SDR will leverage technology investments and developments at the University of Michigan to build (through IU/UM collaboration) more generalized versions of Michigan's services and gain efficiencies from Michigan's investments. . . .

Milestones and status:

As of April 11, 2008, the SDR contains:

1,122,007 volumes

791,460 titles

approximately 393 million pages

213,379 individual volumes in the public domain (19% of the total)

Timeline:

Early 2008: Bloomington backup storage installed

January-March 2008: Page turner mechanism with branding; ability to publish virtual collections (UM-specific version); assessment of global searching functionality; access mechanisms for persons with visual disabilities

September-December 2008: Mechanism for direct ingest of non-Google content; compliance with the required elements in the "Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist"