The Greenstone Digital Library project has released an alpha version of an OAI-PMH metadata analysis tool that can be used to "generate statistics and visualisations of OAI repositories." Several sample reports are available, including one for the University of Illinois IDEAL repository.
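To give a sense of what such a tool does, here is a minimal sketch of one kind of analysis it might perform: counting the records an OAI-PMH repository exposes, grouped by datestamp year. The endpoint URL and the choice of ListIdentifiers with oai_dc are illustrative assumptions, not details of the Greenstone tool.

```python
# Count records per datestamp year in an OAI-PMH repository (illustrative sketch).
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://example.org/oai"  # hypothetical OAI-PMH endpoint

def list_identifiers(base_url, metadata_prefix="oai_dc"):
    """Yield <header> elements, following resumption tokens until exhausted."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        with urlopen(base_url + "?" + urlencode(params)) as response:
            tree = ET.parse(response)
        for header in tree.iter(OAI + "header"):
            yield header
        token = tree.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}

records_per_year = Counter(
    header.findtext(OAI + "datestamp", default="")[:4]
    for header in list_identifiers(BASE_URL)
)
for year, count in sorted(records_per_year.items()):
    print(year, count)
```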
Presentations from eResearch Australasia 2007
Presentations from eResearch Australasia 2007 are now available.
Here are selected presentations:
- "Andrew Treloar: Supporting the e-Research Lifecycle from Acquisition through to Annotation: the DART/ARCHER Experience"
- "Brian Fitzgerald and Scott Kiel-Chisholm: The Legal Framework for e-Research Project"
- "Ian Johnson: Beyond Bibliographies: Integrating Research Data in a Unified Collaborative Framework"
- "Jane Hunter: Harvesting Community Tags and Annotations to Augment Institutional Repository Metadata"
- "Keith Webster: eResearch and the Future of Research Libraries"
- "Kerry Kilner: The Resource for Australian Literature"
- "Paul Arthur: Going Digital: Humanities and the eResearch Revolution"
- "Richard Levy and Austin McLean, ProQuest: Institutional Repositories: A Gateway to e-Research"
- "Ross Coleman: A Maturing Partnership: eHumanities and the Digital Library"
- "Sarah Howard and John Byron: Humanities Technologies: Research Methods and ICT Use by Humanities Researchers"
- "Toby Burrows and Elzbieta Majocha: Building Infrastructures for Web-based Collaboration in Humanities Research Networks"
ResearcherID.com and NISO Institutional Identifier
As scholarly digital information has proliferated in many formats and versions on the Internet, it has become increasingly difficult to identify works that are by the same author or by the same institution. Recently, Thomson Scientific has begun work on author and institution identifiers.
Here's an excerpt from "Thomson Scientific Tagging Researchers: ResearcherID.com."
Thomson Scientific (http://scientific.thomson.com) has opened up a new web service called ResearcherID.com (www.researcherid.com) that allows researchers to establish their own identities and, with some restrictions, to identify their writings. . . .
Currently, all the registrants must have authorized access to Thomson Scientific's Web of Knowledge. In addition, all the registrants on the site are there by invitation only, but Pringle expects the service will be open to all Web of Knowledge users by the end of the month. Since Thomson estimates the access to that service to be 20 million users worldwide, this restriction would still make the service broad-based, if researchers choose to use it.
Here's an excerpt from "But What About Corporate Authors? NISO Institutional Identifier Project Underway."
Thomson Scientific (http://scientific.thomson.com) has joined an effort with the National Information Standards Organization (NISO; www.niso.org) to build an open standard for identifying institutions. The initial NISO effort will focus on academic and research institutions, the kind often referred to in author affiliation or corporate author fields. . . .
The charge from the voting membership to the new working group is to study and propose an identifier that will uniquely identify institutions and describe relationships between entities within institutions. In the course of developing a proposed identifier, the group will consider the minimum set of data consistent with account privacy and security issues, as well as other data used to support different business models.
Digital Asset Management Database Released: DAM Built on FileMaker Pro
Museums and the Online Archive of California (MOAC) has released the IMLS-funded Digital Asset Management Database (DAMD), a digital asset management system.
Here's an excerpt from the MOAC homepage:
Building on previous successful work in the areas of standards and online collections access, the new MOAC software tool, the Digital Asset Management Database (DAMD), has been developed as both a utilitarian tool and as a test case for exploring more general issues of content sharing and community tool development. This tool has two primary functions that can be used together or separately: it provides basic digital asset management for simple to complex media objects and it easily transforms collections information into an extensible variety of standards-based XML formats, such as METS and OAI, to allow even small organizations without technical staff to share their collections broadly and participate in building a national network of culture. DAMD was developed as an "open solution," built on FileMaker Pro software (8.5 or above) because of the broad base of installed users of FileMaker in the museum and arts communities. DAMD is available for free to cultural organizations. The tool, and its unique export/transform functions (detailed in the documentation), are open-ended, allowing organizations to customize the tool for themselves or the community to improve the tool for all.
Machine Services for Metadata Discovery and Aggregation—metadata+ Report
JISC has released Machine Services for Metadata Discovery and Aggregation—metadata+.
Here's an excerpt from the Executive Summary:
The main aim of the project is to develop an interoperability demonstrator to explore the technical aspects of providing a service-oriented infrastructure to facilitate metadata discovery and aggregation. The project developed a test bed that exposes metadata through standard search and linking protocols. Metadata mapping work was undertaken to enable the test bed to provide search responses in multiple metadata schemas that are widely used in digital libraries and e-learning.
The core of the test bed consists of an open source digital repository—Fedora. Off-the-shelf, the repository provides web services for metadata searching and substantial content management and security features particularly suitable for real-life use scenarios. Since the search protocol considered in this project requires additional features that are not available from the repository, modifications to the repository source code were made. The modifications also involve incorporating the metadata mapping requirement such that search responses from different metadata formats can be facilitated.
A basic demonstrator (project website) has been created to exemplify how the search protocol can be used for discovering and aggregating metadata, as well as presenting them in coherent formats relevant to the intended presentation contexts. The metadata sources include publisher and digital libraries providing both bibliographic and user-generated (enrichment) metadata such as reviews and recommendations. In addition, the project demonstrated a novel use of the search protocol to dynamically create e-learning content packages, digital library metadata collection and news feeds.
Several digital library initiatives have evaluated the test bed infrastructure for real use scenarios. These libraries are an extended form of the test bed demonstrator and provide relevant facilities such as a metadata wiki (editor) and annotation services for gathering enrichment metadata (reviews, ratings and recommendations) from users. They will continue the objectives of this project, particularly improving the test bed infrastructure and exploring the aggregated use of enrichment metadata, to enable the academic and research user communities to add value to bibliographic metadata from the publisher and library communities.
Alpha Release of the ORE Specification and User Guide
The Open Archives Initiative Object Reuse and Exchange has released an alpha version of the ORE Specification and User Guide. Comments can be made on the OAI-ORE discussion group or via email to ore@openarchives.org.
Here's an excerpt from the introduction:
The World Wide Web is built upon the notion of atomic units of information called resources that are identified with URIs such as http://www.openarchives.org/ore/0.1/toc (this page). In addition to these atomic units, aggregations of resources are often units of information in their own right. . . .
A mechanism to associate identities with these aggregations and describe them in a machine-readable manner would make them visible to Web agents, both humans and machines. This could be useful for a number of applications and contexts. For example:
- Crawler-based search engines could use such descriptions to index information and provide search results sets at the granularity of the aggregations rather than their individual parts.
- Browsers could leverage them to provide users with navigation aids for the aggregated resources, in the same manner that machine-readable site maps provide navigation clues for crawlers.
- Other automated agents such as preservation systems could use these descriptions as guides to understand a "whole document" and determine the best preservation strategy.
- Systems that mine and analyze networked information for citation analysis/bibliometrics could achieve better accuracy with knowledge of aggregation structure contained in these descriptions.
- These machine-readable descriptions could provide the foundation for advanced scholarly communication systems that allow the flexible reuse and refactoring of rich scholarly artifacts and their components [Value Chains].
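To make the idea of a machine-readable description concrete, here is a minimal sketch that uses Python and rdflib to build a resource map describing an aggregation of three parts of a scholarly work. The URIs are invented, and the terms used (ore:ResourceMap, ore:describes, ore:Aggregation, ore:aggregates) follow the ORE vocabulary as it was later standardised; the alpha specification's exact serialisation may differ.

```python
# Describe an aggregation of resources as RDF (illustrative sketch).
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")

rem = URIRef("http://example.org/rem/article-1")                  # the resource map
aggregation = URIRef("http://example.org/aggregation/article-1")  # the aggregation it describes
parts = [
    URIRef("http://example.org/article-1/article.pdf"),
    URIRef("http://example.org/article-1/dataset.csv"),
    URIRef("http://example.org/article-1/metadata.xml"),
]

g = Graph()
g.bind("ore", ORE)
g.add((rem, RDF.type, ORE.ResourceMap))
g.add((rem, ORE.describes, aggregation))
g.add((aggregation, RDF.type, ORE.Aggregation))
for part in parts:
    g.add((aggregation, ORE.aggregates, part))

print(g.serialize(format="turtle"))
```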
University of Michigan Libraries Make over 100,000 Records for Digitized Books Available for Harvesting
The University of Michigan Libraries have made over 100,000 metadata records from their MBooks collection available for OAI-PMH harvesting. The records are for digitized books in the public domain.
Here's an excerpt from the announcement:
The University of Michigan Library is pleased to announce that records from our MBooks collection are available for OAI harvesting. The MBooks collection consists of materials digitized by Google in partnership with the University of Michigan.
http://quod.lib.umich.edu/cgi/o/oai/oai?verb=Identify
Only records for MBooks available in the public domain are exposed. We have split these into sets containing public domain items according to U.S. copyright law, and public domain items worldwide. There are currently over 100,000 records available for harvesting. We anticipate having 1 million records available when the entire U-M collection has been digitized by Google.
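For readers who want to try harvesting these records, here is a minimal sketch that requests the first page of records from the base URL announced above. The setSpec value "mbooks:pd" is a placeholder (a ListSets request to the repository reports the real set names), oai_dc is assumed as the metadata format, and resumption tokens would be followed as in the earlier statistics sketch.

```python
# Fetch one page of MBooks records over OAI-PMH (illustrative sketch).
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
BASE_URL = "http://quod.lib.umich.edu/cgi/o/oai/oai"

# "mbooks:pd" is a hypothetical set name for the U.S. public domain set.
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": "mbooks:pd"}
with urlopen(BASE_URL + "?" + urlencode(params)) as response:
    tree = ET.parse(response)

for record in tree.iter(OAI + "record"):
    identifier = record.findtext(f".//{OAI}identifier")
    title = record.findtext(f".//{DC}title")
    print(identifier, title)
```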
TASI Updates Digital Imaging Documents
The Technical Advisory Service for Images (TASI) has updated several of its documents dealing with digital imaging issues.
Draft Report on the Future of Bibliographic Control Released for Comment
The Library of Congress has released a draft of the Report on the Future of Bibliographic Control for comment. Comments should be received by December 15.
Here's an excerpt from the "Introduction":
The recommendations fall into five general areas:
- Increase the efficiency of bibliographic production for all libraries through increased cooperation and increased sharing of bibliographic records, and by maximizing the use of data produced throughout the entire “supply chain” for information resources.
- Transfer effort into higher-value activity. In particular, expand the possibilities for knowledge creation by “exposing” rare and unique materials held by libraries that are currently hidden from view and, thus, underused.
- Position our technology for the future by recognizing that the World Wide Web is both our technology platform and the appropriate platform for the delivery of our standards. Recognize that people are not the only users of the data we produce in the name of bibliographic control, but so too are machine applications that interact with those data in a variety of ways.
- Position our community for the future by facilitating the incorporation of evaluative and other user-supplied information into our resource descriptions. Work to realize the potential of the FRBR framework for revealing and capitalizing on the various relationships that exist among information resources.
- Strengthen the library profession through education and the development of metrics that will inform decision-making now and in the future.
RLG Programs Descriptive Metadata Practices Survey Results Published
RLG Programs has published RLG Programs Descriptive Metadata Practices Survey Results and RLG Programs Descriptive Metadata Practices Survey Results: Data Supplement.
Here's an excerpt from the announcement:
We conducted this survey in July and August 2007 among 18 RLG partners in the United States and the United Kingdom, selected because they had "multiple metadata creation centers" on campus that included libraries, archives, and museums and had some interaction among them. Our objective was to gain a baseline understanding of current descriptive metadata practices and dependencies, the first project in our program to change metadata creation processes.
The report summarizes the descriptive practices used across a variety of applications, the data structure and data content standards followed, the audiences for the metadata created, and some organization patterns. The data from the 89 respondents is reported in a series of charts and graphs that are open to interpretation. RLG Programs offers its own interpretation in the prefatory narrative, flagging questions for follow up and goals for future projects. Although we saw some expected variations in practice across libraries, archives and museums, we were struck by the high levels of customization and local tool development, the limited extent to which tools and practices are, or can be, shared (both within and across institutions), the lack of confidence institutions have in the effectiveness of their tools, and the disconnect between their interest in creating metadata to serve their primary audiences and the inability to serve that audience within the most commonly used discovery systems (such as Google, Yahoo, etc.).
DLC-MODS Workbook 1.2: A Tool to Create MODS Metadata Records
The University of Tennessee Digital Library Center has released version 1.2 of the DLC-MODS Workbook under a GNU General Public License. A demo is also available.
Here's an excerpt from the diglib announcement:
The DLC-MODS Workbook provides a series of web pages that enable users to easily generate complex, valid MODS metadata records that meet the 1-4 levels of specification outlined in the Digital Library Federation Implementation Guidelines for Shareable MODS Records (DLF Aquifer Guidelines, November 2006).
Developed by programmer Christine Haygood Deane under the direction of metadata librarian Melanie Feltner-Reichert, this open source client-side software provides control of date formats and other problematic fields at the point of creation, while shielding creators from the need to work in XML. Records can be partially created, saved to the desktop, then reloaded and completed at a later date. Final versions can be downloaded or cut-and-pasted into text editors for use elsewhere.
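As a rough illustration of the kind of record the workbook produces through its web forms, here is a minimal sketch that assembles a very simple MODS record in Python. The field values are invented, and a record meeting a particular DLF Aquifer level would need additional elements.

```python
# Build a minimal MODS record with ElementTree (illustrative sketch).
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"

mods = ET.Element(q("mods"), version="3.2")
title_info = ET.SubElement(mods, q("titleInfo"))
ET.SubElement(title_info, q("title")).text = "An Illustrative Title"
name = ET.SubElement(mods, q("name"), type="personal")
ET.SubElement(name, q("namePart")).text = "Doe, Jane"
ET.SubElement(mods, q("typeOfResource")).text = "text"
origin = ET.SubElement(mods, q("originInfo"))
ET.SubElement(origin, q("dateIssued"), encoding="w3cdtf").text = "2007"

print(ET.tostring(mods, encoding="unicode"))
```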
Paul Courant on Michigan’s Mass Digitization Project with Google
In "On Being in Bed with Google," Paul N. Courant, University Librarian and Dean of Libraries at the University of Michigan, vigorously rebuts arguments against research libraries participating in the Google Books Library Project.
Here's an excerpt:
Since 2005, Siva Vaidhyanathan has been making and refining the argument that libraries should be digitizing their collections independently, without corporate financing or participation, and that those who don’t are failing to uphold their responsibility to the public. "Libraries should not be relinquishing their core duties to private corporations for the sake of expediency."
"Expediency" is a bit of a dirty word. Vaidhyanathan’s phrase suggests that good people don’t do things simply because they are "expedient." But I view large-scale digitization as expeditious. We have a generation of students who will not find valuable scholarly works unless they can find them electronically. At the rate that OCA is digitizing things (and I say the more the merrier and the faster the better) that generation will be dandling great-grandchildren on its knees before these great collections can be found electronically. At Michigan, the entire collection of bound print will be searchable, by anyone in the world, about when children born today start kindergarten.
Library of Congress and Xerox Team Up to Build Large JPEG 2000 Image Repository
The Library of Congress and Xerox will work together to build a repository of around 1 million JPEG 2000 images of public domain works.
Here's an excerpt from the press release:
The two organizations are studying the potential of using the JPEG 2000 format in large repositories of digital cultural heritage materials such as those held by the Library and other federal agencies. The eventual outcome may be leaner, faster systems that institutions around the country can use to store their riches and to make their collections widely accessible.
The project, designed to help develop guidelines and best practices for digital content, is especially relevant to the Library’s National Digital Information Infrastructure and Preservation Program, which has been working with several other federal agencies on digitization standards.
The trial will include up to 1 million digitized, public domain prints, photographs, maps and other content from the Library’s extraordinary collections. Scientists in the Xerox Innovation Group will work with these materials to create an image repository that they will use to develop and test approaches for the management of large image collections.
The images to be used from the Library’s collection are already digitized (primarily in TIFF format), but JPEG 2000, a newer format for representing and compressing images, could make them easier to store, transfer and display. According to Michael Stelmach, manager of Digital Conversion Services in the Library’s Office of Strategic Initiatives, JPEG 2000 holds promise in the areas of visual presentation, simplified file management and decreased storage costs. It offers rich and flexible support for metadata, which can describe the image and provide information on the provenance, intellectual property and technical data relating to the image itself.
Xerox scientists will develop the parameters for converting TIFF files to JPEG 2000 and will build and test the system, then turn over the specifications and best practices to the Library of Congress. The specific outcome will be development of JPEG 2000 profiles, which describe how to use JPEG 2000 most effectively to represent photographic content as well as content digitized from maps. The Library plans to make the results available on a public Web site.
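As a rough sketch of the conversion step itself, the following Python uses Pillow (which relies on OpenJPEG for JPEG 2000 output) to turn a directory of TIFF masters into JP2 derivatives. The compression settings are illustrative guesses; choosing the right parameters and profiles is exactly what the Library and Xerox intend to work out.

```python
# Convert TIFF masters to JPEG 2000 derivatives with Pillow (illustrative sketch).
from pathlib import Path
from PIL import Image

def tiff_to_jp2(src: Path, dst: Path) -> None:
    with Image.open(src) as img:
        img.save(
            dst,
            format="JPEG2000",
            quality_mode="rates",         # interpret quality_layers as compression ratios
            quality_layers=[40, 20, 10],  # three quality layers, illustrative values
            irreversible=True,            # lossy 9/7 wavelet transform
            num_resolutions=6,            # resolution levels for fast zooming/panning
        )

Path("derivatives").mkdir(exist_ok=True)
for tiff in Path("masters").glob("*.tif"):
    tiff_to_jp2(tiff, Path("derivatives") / (tiff.stem + ".jp2"))
```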
Four National Libraries Agree to Coordinate RDA Implementation
The Library of Congress, the British Library, Library and Archives Canada, and the National Library of Australia have agreed to coordinate their implementation of RDA: Resource Description and Access.
Here's an excerpt from the press release:
These national libraries, together with representatives from professional library associations in Canada, the United Kingdom and the United States, are members of the Committee of Principals which oversees the work of the Joint Steering Committee for Development of RDA, which is responsible for developing RDA. . . .
RDA addresses the needs of the future by providing a flexible framework for describing all types of resources of interest to libraries. RDA guidelines will be easy and efficient to use, as well as compatible with internationally established principles, models and standards. In addition, RDA will maintain continuity with the past, as data created using RDA will be compatible with existing records in online library catalogs.
The libraries plan to implement RDA by the end of 2009. To ensure a smooth transition to RDA, the four national libraries will work together where possible on implementation matters such as training, documentation and any national application decisions. Regular updates will be issued by the group to keep the library communities in their countries informed on RDA implementation progress and policy decisions.
Digital Archive for Architecture: CDWA for DSpace
The Art Institute of Chicago has developed the Digital Archive for Architecture (DAArch) to support the use of the Categories for the Description of Works of Art (CDWA) metadata schema in DSpace. The software runs under BSD/UNIX/Linux; is written in Java, JSP, and PHP; utilizes PostgreSQL; and is under a BSD License.
Keller Discusses the Sun PASIG
Campus Technology has published an interview with Michael Keller about the Sun Preservation and Archiving Special Interest Group.
Sun Preservation and Archiving Special Interest Group Formed
Sun has formed the Sun PASIG (Sun Preservation and Archiving Special Interest Group).
Here's an excerpt from the press release:
Addressing the need for better collaboration on best practices around global standards in large data set and metadata preservation, the Sun PASIG will help provide support for organizations challenged with preserving and archiving important research and cultural heritage materials. Founding members of the Sun PASIG include The Alberta Library, The British Library, Johns Hopkins University, University of Oxford, Stanford University, The Texas Digital Library, and other leading global libraries and universities. . . .
At globally located semi-annual meetings, group members will share knowledge of storage technology trends, services-oriented architecture and software code, and discuss best practices of both commercial and community-developed solutions. Working groups will hold discussions on architectures, use cases and business drivers, storage, access and security, and operating policies, with the goal of providing common case studies and solutions for digital archiving. The Sun PASIG will focus on both collaborating with leading institutions in the EPrints, Fedora, and DSpace communities to create replicable solutions and exchanging expertise on global developments around the Open Archival Information System (OAIS) architecture model.
"Libraries and universities around the world face a common problem: how to best capture and archive valuable knowledge. Global discussion is the first step towards finding solutions that meet institutions' individualized preservation needs," said Michael Keller, University Librarian, Director of Academic Information Resources, Stanford University. "With the formation of Sun PASIG, we are looking forward to working with our peers to discover and create the best digital preservation options available, from infrastructure to interfaces."
DCMI Scholarly Communications Community
The Dublin Core Metadata Initiative has established the DCMI Scholarly Communications Community, which currently includes a mailing list and a wiki.
Here's an excerpt from the home page:
The DCMI Scholarly Communications Community is a forum for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing research papers, scholarly texts, data objects and other resources created and used within scholarly communications. This includes providing a forum for discussion around the Eprints Application Profile, also known as the Scholarly Works Application Profile (SWAP) and for other existing and future application profiles created to describe items of scholarly communication.
History of Metadata Timeline
Metadata Services at the Cornell University Library has created a History of Metadata timeline as part of its extensive Resources directory.
Metadata SPEC Kit from ARL
ARL has published Metadata, SPEC Kit 298 by Jin Ma. The front matter and Executive Summary are freely available.
RFC for Dublin Core (RFC 5013) Published
John A. Kunze has announced on DC-GENERAL that the RFC for Dublin Core (RFC 5013) has just been published.
He notes that it "contains the same element definitions as the recently revised NISO standard, Z39.85-2007, but is freely accessible in one click via a global set of mirrored repositories used by the highly technical audiences that support and define Internet infrastructure."
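For reference, the fifteen elements defined in RFC 5013 (and in Z39.85-2007) are short enough to list, and a few lines of Python can check that a simple record sticks to them; the sample record below is purely illustrative.

```python
# The fifteen Dublin Core Metadata Element Set elements, per RFC 5013 / Z39.85-2007.
DCMES = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

record = {  # illustrative record
    "title": "An Illustrative Report",
    "creator": "Doe, Jane",
    "date": "2007",
    "type": "Text",
}

unknown = set(record) - DCMES
if unknown:
    raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
print("Record uses only DCMES elements.")
```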
A Portal for Doctoral E-Theses in Europe
The SURFfoundation has released A Portal for Doctoral E-Theses in Europe: Lessons Learned from a Demonstrator Project by M. P. J. P. Vanderfeesten. The portal project was funded by JISC, the National Library of Sweden, and the SURFfoundation, which also ran it.
Here’s an excerpt from the "Management Summary":
For the first time various repositories with doctoral e-theses have been harvested on an international scale. This report describes a small pilot project which tested the interoperability of repositories for e-theses and has set up a freely accessible European portal with over 10,000 doctoral e-theses.
Five repositories from five different countries in Europe were involved: Denmark, Germany, the Netherlands, Sweden and the UK. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was the common protocol used to test the interoperability. Based upon earlier experiences and tools (harvester, search engine) developed for the national DAREnet service in the Netherlands, SURFfoundation could establish a prototype for this European e-theses Demonstrator relatively quickly and simply.
Nevertheless, some critical issues and problems occurred. They can be categorised into the following topics:
a) Generic issues related to repositories: the language used in the metadata fields differs per repository. . . . Furthermore, the quality of the data presented differs. . . . A further issue is the semantic and syntactic differences in metadata between repositories, which means that the format and content of the information exchange requests are not unambiguously defined. . . .
b) E-theses specific issues: to be able to harvest doctoral theses, the service provider needs to be able to filter on this document type. Up to now there is no commonly agreed format, which makes semantic interoperability possible [specific Dublin core recommendations omitted]. . . .
c) Issues related to data providers and service providers: besides the use of the OAI-protocol for metadata harvesting and the use of Dublin Core it is recommended for data providers to further standardise on the semantic interoperability by using the DRIVER guidelines with an addition of the e-Theses specific recommendations described above. To be able to offer more than basic services for e-Theses, one has to change the metadata format from simple Dublin Core to a richer and e-Theses specific one. . . . We needed to fix, normalise and crosswalk the differences between every repository to get a standard syntactic and semantic metadata structure. . . . The scaling up is a big issue. To stimulate the broad take up of various services, data providers have to work on implementing standards that create interoperability on syntactic and semantic levels.
d) Cultural and educational differences: In every country the educational processes are different. . . . Not only the graduation and publication process differs, but also the duration of the research process. Therefore the quality of the results in a cross-European search of doctoral theses may vary enormously.
(Thanks to Open Access News.)
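Point (c) above describes fixing, normalising, and crosswalking each repository's metadata into one standard structure before aggregation. Here is a minimal sketch of what such a normalisation step can look like; the field names, language codes, and date conventions are illustrative, not the project's actual mapping tables.

```python
# Crosswalk a harvested record into a shared target structure (illustrative sketch).
from datetime import datetime

FIELD_MAP = {"dc:title": "title", "dc:creator": "creator",
             "dc:date": "date_issued", "dc:language": "language"}
LANGUAGE_MAP = {"en": "eng", "eng": "eng", "de": "ger", "deu": "ger", "nl": "dut"}

def normalise(raw_record: dict) -> dict:
    """Map repository-specific fields and value conventions onto the target schema."""
    record = {}
    for source_field, value in raw_record.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            continue  # drop fields the target schema does not cover
        if target == "language":
            value = LANGUAGE_MAP.get(value.lower(), value)
        if target == "date_issued":
            for fmt in ("%Y-%m-%d", "%d-%m-%Y", "%Y"):  # accept common datestamp styles
                try:
                    value = datetime.strptime(value, fmt).strftime("%Y-%m-%d")
                    break
                except ValueError:
                    pass
        record[target] = value
    return record

print(normalise({"dc:title": "An Example Thesis", "dc:language": "EN", "dc:date": "2007"}))
```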
Metadata Extraction Tool Version 3.2
The National Library of New Zealand has released version 3.2 of its open-source Metadata Extraction Tool.
Written in Java and XML, the Metadata Extraction Tool has a Windows interface, and it runs under UNIX in command line mode. Batch processing is supported.
Here’s an excerpt from the project home page:
The Tool builds on the Library’s work on digital preservation, and its logical preservation metadata schema. It is designed to:
- automatically extract preservation-related metadata from digital files
- output that metadata in a standard format (XML) for use in preservation activities. . . .
The Metadata Extraction Tool includes a number of ‘adapters’ that extract metadata from specific file types. Extractors are currently provided for:
- Images: BMP, GIF, JPEG and TIFF.
- Office documents: MS Word (version 2, 6), Word Perfect, Open Office (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.
- Audio and Video: WAV and MP3.
- Markup languages: HTML and XML.
If a file type is unknown, the tool applies a generic adapter, which extracts data that the host system ‘knows’ about any given file (such as size, filename, and date created).
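As a rough illustration of that generic fallback, here is a minimal sketch that reads what the file system knows about a file and emits it as XML; the element names are invented, not the tool's actual output schema.

```python
# A generic "adapter": emit file-system metadata for an unknown file type (illustrative sketch).
from datetime import datetime, timezone
from pathlib import Path
import xml.etree.ElementTree as ET

def generic_adapter(path: Path) -> ET.Element:
    """Return basic host-system metadata (name, size, modification date) as XML."""
    stat = path.stat()
    meta = ET.Element("file")
    ET.SubElement(meta, "name").text = path.name
    ET.SubElement(meta, "size").text = str(stat.st_size)
    ET.SubElement(meta, "modified").text = (
        datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()
    )
    return meta

print(ET.tostring(generic_adapter(Path(__file__)), encoding="unicode"))
```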
Using the Open Archives Initiative Protocol for Metadata Harvesting
Libraries Unlimited has released Using the Open Archives Initiative Protocol for Metadata Harvesting by Timothy W. Cole and Muriel Foulonneau.
Here’s an excerpt from the publisher’s description:
Through a series of case studies, Cole and Foulonneau guide the reader through the process of conceiving, implementing and maintaining an OAI-compliant repository. Its applicability to both institutional archives and discipline-based aggregators is covered, with equal attention paid to the technical and organizational aspects of creating and maintaining such repositories.
ONIX for Serials Coverage Statement Draft Release 0.9
EDItEUR has released "ONIX for Serials Coverage Statement Draft Release 0.9 (june 2007)" for comment through September 2007.
Here’s an excerpt from the draft’s Web page:
ONIX for Serials Coverage Statement is an XML structure capable of carrying simple or complex statements of holdings of serial resources, in paper or electronic form, to be included in ONIX for Serials messages for a variety of applications; for example, to express:
- The holdings of a particular serial version by a library
- The coverage of a particular serial version supplied by an online content hosting system
- The coverage of a particular serial version included in a subscription or offering
EDItEUR has also released "SOH: Serials Online Holdings Release 1.1 (Draft June 2007)" for comment.