The Greenstone Digital Library project has released an alpha version of an OAI-PMH metadata analysis tool that can be used to "generate statistics and visualisations of OAI repositories." Several sample reports are available, including one for the University of Illinois IDEAL repository.
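To give a sense of what such a tool does, here is a minimal sketch of one kind of analysis it might perform: counting the records an OAI-PMH repository exposes, grouped by datestamp year. The endpoint URL and the choice of ListIdentifiers with oai_dc are illustrative assumptions, not details of the Greenstone tool.

```python
# Count records per datestamp year in an OAI-PMH repository (illustrative sketch).
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://example.org/oai"  # hypothetical OAI-PMH endpoint

def list_identifiers(base_url, metadata_prefix="oai_dc"):
    """Yield <header> elements, following resumption tokens until exhausted."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        with urlopen(base_url + "?" + urlencode(params)) as response:
            tree = ET.parse(response)
        for header in tree.iter(OAI + "header"):
            yield header
        token = tree.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}

records_per_year = Counter(
    header.findtext(OAI + "datestamp", default="")[:4]
    for header in list_identifiers(BASE_URL)
)
for year, count in sorted(records_per_year.items()):
    print(year, count)
```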
Presentations from eResearch Australasia 2007
Presentations from eResearch Australasia 2007 are now available.
Here are selected presentations:
- "Andrew Treloar: Supporting the e-Research Lifecycle from Acquisition through to Annotation: the DART/ARCHER Experience"
- "Brian Fitzgerald and Scott Kiel-Chisholm: The Legal Framework for e-Research Project"
- "Ian Johnson: Beyond Bibliographies: Integrating Research Data in a Unified Collaborative Framework"
- "Jane Hunter: Harvesting Community Tags and Annotations to Augment Institutional Repository Metadata"
- "Keith Webster: eResearch and the Future of Research Libraries"
- "Kerry Kilner: The Resource for Australian Literature"
- "Paul Arthur: Going Digital: Humanities and the eResearch Revolution"
- "Richard Levy and Austin McLean, ProQuest: Institutional Repositories: A Gateway to e-Research"
- "Ross Coleman: A Maturing Partnership: eHumanities and the Digital Library"
- "Sarah Howard and John Byron: Humanities Technologies: Research Methods and ICT Use by Humanities Researchers"
- "Toby Burrows and Elzbieta Majocha: Building Infrastructures for Web-based Collaboration in Humanities Research Networks"
ResearcherID.com and NISO Institutional Identifier
As scholarly digital information has proliferated in many formats and versions on the Internet, it has become increasingly difficult to identify works that are by the same author or by the same institution. Recently, Thomson Scientific has begun work on author and institution identifiers.
Here's an excerpt from "Thomson Scientific Tagging Researchers: ResearcherID.com."
Thomson Scientific (http://scientific.thomson.com) has opened up a new web service called ResearcherID.com (www.researcherid.com) that allows researchers to establish their own identities and, with some restrictions, to identify their writings. . . .
Currently, all the registrants must have authorized access to Thomson Scientific's Web of Knowledge. In addition, all the registrants on the site are there by invitation only, but Pringle expects the service will be open to all Web of Knowledge users by the end of the month. Since Thomson estimates the access to that service to be 20 million users worldwide, this restriction would still make the service broad-based, if researchers choose to use it.
Here's an excerpt from "But What About Corporate Authors? NISO Institutional Identifier Project Underway."
Thomson Scientific (http://scientific.thomson.com) has joined an effort with the National Information Standards Organization (NISO; www.niso.org) to build an open standard for identifying institutions. The initial NISO effort will focus on academic and research institutions, the kind often referred to in author affiliation or corporate author fields. . . .
The charge from the voting membership to the new working group is to study and propose an identifier that will uniquely identify institutions and describe relationships between entities within institutions. In the course of developing a proposed identifier, the group will consider the minimum set of data consistent with account privacy and security issues, as well as other data used to support different business models.
Digital Asset Management Database Released: DAM Built on FileMaker Pro
Museums and the Online Archive of California (MOAC) has released the IMLS-funded Digital Asset Management Database (DAMD), a digital asset management system.
Here's an excerpt from the MOAC homepage:
Building on previous successful work in the areas of standards and online collections access, the new MOAC software tool, the Digital Asset Management Database (DAMD), has been developed as both a utilitarian tool and as a test case for exploring more general issues of content sharing and community tool development. This tool has two primary functions that can be used together or separately: it provides basic digital asset management for simple to complex media objects and it easily transforms collections information into an extensible variety of standards-based XML formats, such as METS and OAI, to allow even small organizations without technical staff to share their collections broadly and participate in building a national network of culture. DAMD was developed as an "open solution," built on FileMaker Pro software (8.5 or above) because of the broad base of installed users of FileMaker in the museum and arts communities. DAMD is available for free to cultural organizations. The tool, and its unique export/transform functions (detailed in the documentation), are open-ended, allowing organizations to customize the tool for themselves or the community to improve the tool for all.
Machine Services for Metadata Discovery and Aggregation—metadata+ Report
JISC has released Machine Services for Metadata Discovery and Aggregation—metadata+.
Here's an excerpt from the Executive Summary:
The main aim of the project is to develop an interoperability demonstrator to explore the technical aspects of providing a service-oriented infrastructure to facilitate metadata discovery and aggregation. The project developed a test bed that exposes metadata through standard search and linking protocols. Metadata mapping work was undertaken to enable the test bed to provide search responses in multiple metadata schemas that are widely used in digital libraries and e-learning.
The core of the test bed consists of an open source digital repository—Fedora. Off-the-shelf, the repository provides web services for metadata searching and substantial content management and security features particularly suitable for real-life use scenarios. Since the search protocol considered in this project requires additional features that are not available from the repository, modifications to the repository source code were made. The modifications also involve incorporating the metadata mapping requirement such that search responses from different metadata formats can be facilitated.
A basic demonstrator (project website) has been created to exemplify how the search protocol can be used for discovering and aggregating metadata, as well as presenting them in coherent formats relevant to the intended presentation contexts. The metadata sources include publisher and digital libraries providing both bibliographic and user-generated (enrichment) metadata such as reviews and recommendations. In addition, the project demonstrated a novel use of the search protocol to dynamically create e-learning content packages, digital library metadata collection and news feeds.
Several digital library initiatives have evaluated the test bed infrastructure for real use scenarios. These libraries are an extended form of the test bed demonstrator and provide relevant facilities such as a metadata wiki (editor) and annotation services for gathering enrichment metadata (reviews, ratings and recommendations) from users. They will continue the objectives of this project, particularly improving the test bed infrastructure and exploring the aggregated use of enrichment metadata, to enable the academic and research user communities to add value to bibliographic metadata from the publisher and library communities.
Alpha Release of the ORE Specification and User Guide
The Open Archives Initiative Object Reuse and Exchange has released an alpha version of the ORE Specification and User Guide. Comments can be made on the OAI-ORE discussion group or via email to ore@openarchives.org.
Here's an excerpt from the introduction:
The World Wide Web is built upon the notion of atomic units of information called resources that are identified with URIs such as http://www.openarchives.org/ore/0.1/toc (this page). In addition to these atomic units, aggregations of resources are often units of information in their own right. . . .
A mechanism to associate identities with these aggregations and describe them in a machine-readable manner would make them visible to Web agents, both humans and machines. This could be useful for a number of applications and contexts. For example:
- Crawler-based search engines could use such descriptions to index information and provide search results sets at the granularity of the aggregations rather than their individual parts.
- Browsers could leverage them to provide users with navigation aids for the aggregated resources, in the same manner that machine-readable site maps provide navigation clues for crawlers.
- Other automated agents such as preservation systems could use these descriptions as guides to understand a "whole document" and determine the best preservation strategy.
- Systems that mine and analyze networked information for citation analysis/bibliometrics could achieve better accuracy with knowledge of aggregation structure contained in these descriptions.
- These machine-readable descriptions could provide the foundation for advanced scholarly communication systems that allow the flexible reuse and refactoring of rich scholarly artifacts and their components [Value Chains].
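To make the idea of a machine-readable description concrete, here is a minimal sketch that uses Python and rdflib to build a resource map describing an aggregation of three parts of a scholarly work. The URIs are invented, and the terms used (ore:ResourceMap, ore:describes, ore:Aggregation, ore:aggregates) follow the ORE vocabulary as it was later standardised; the alpha specification's exact serialisation may differ.

```python
# Describe an aggregation of resources as RDF (illustrative sketch).
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")

rem = URIRef("http://example.org/rem/article-1")                  # the resource map
aggregation = URIRef("http://example.org/aggregation/article-1")  # the aggregation it describes
parts = [
    URIRef("http://example.org/article-1/article.pdf"),
    URIRef("http://example.org/article-1/dataset.csv"),
    URIRef("http://example.org/article-1/metadata.xml"),
]

g = Graph()
g.bind("ore", ORE)
g.add((rem, RDF.type, ORE.ResourceMap))
g.add((rem, ORE.describes, aggregation))
g.add((aggregation, RDF.type, ORE.Aggregation))
for part in parts:
    g.add((aggregation, ORE.aggregates, part))

print(g.serialize(format="turtle"))
```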
University of Michigan Libraries Make over 100,000 Records for Digitized Books Available for Harvesting
The University of Michigan Libraries have made over 100,000 metadata records from their MBooks collection available for OAI-PMH harvesting. The records are for digitized books in the public domain.
Here's an excerpt from the announcement:
The University of Michigan Library is pleased to announce that records from our MBooks collection are available for OAI harvesting. The MBooks collection consists of materials digitized by Google in partnership with the University of Michigan.
http://quod.lib.umich.edu/cgi/o/oai/oai?verb=Identify
Only records for MBooks available in the public domain are exposed. We have split these into sets containing public domain items according to U.S. copyright law, and public domain items worldwide. There are currently over 100,000 records available for harvesting. We anticipate having 1 million records available when the entire U-M collection has been digitized by Google.
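For readers who want to try harvesting these records, here is a minimal sketch that requests the first page of records from the base URL announced above. The setSpec value "mbooks:pd" is a placeholder (a ListSets request to the repository reports the real set names), oai_dc is assumed as the metadata format, and resumption tokens would be followed as in the earlier statistics sketch.

```python
# Fetch one page of MBooks records over OAI-PMH (illustrative sketch).
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
BASE_URL = "http://quod.lib.umich.edu/cgi/o/oai/oai"

# "mbooks:pd" is a hypothetical set name for the U.S. public domain set.
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": "mbooks:pd"}
with urlopen(BASE_URL + "?" + urlencode(params)) as response:
    tree = ET.parse(response)

for record in tree.iter(OAI + "record"):
    identifier = record.findtext(f".//{OAI}identifier")
    title = record.findtext(f".//{DC}title")
    print(identifier, title)
```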
TASI Updates Digital Imaging Documents
The Technical Advisory Service for Images (TASI) has updated several of its documents dealing with digital imaging issues.
Draft Report on the Future of Bibliographic Control Released for Comment
The Library of Congress has released a draft of the Report on the Future of Bibliographic Control for comment. Comments should be received by December 15.
Here's an excerpt from the "Introduction":
The recommendations fall into five general areas:
- Increase the efficiency of bibliographic production for all libraries through increased cooperation and increased sharing of bibliographic records, and by maximizing the use of data produced throughout the entire “supply chain” for information resources.
- Transfer effort into higher-value activity. In particular, expand the possibilities for knowledge creation by “exposing” rare and unique materials held by libraries that are currently hidden from view and, thus, underused.
- Position our technology for the future by recognizing that the World Wide Web is both our technology platform and the appropriate platform for the delivery of our standards. Recognize that people are not the only users of the data we produce in the name of bibliographic control, but so too are machine applications that interact with those data in a variety of ways.
- Position our community for the future by facilitating the incorporation of evaluative and other user-supplied information into our resource descriptions. Work to realize the potential of the FRBR framework for revealing and capitalizing on the various relationships that exist among information resources.
- Strengthen the library profession through education and the development of metrics that will inform decision-making now and in the future.
RLG Programs Descriptive Metadata Practices Survey Results Published
RLG Programs has published RLG Programs Descriptive Metadata Practices Survey Results and RLG Programs Descriptive Metadata Practices Survey Results: Data Supplement.
Here's an excerpt from the announcement:
We conducted this survey in July and August 2007 among 18 RLG partners in the United States and the United Kingdom, selected because they had "multiple metadata creation centers" on campus that included libraries, archives, and museums and had some interaction among them. Our objective was to gain a baseline understanding of current descriptive metadata practices and dependencies, the first project in our program to change metadata creation processes.
The report summarizes the descriptive practices used across a variety of applications, the data structure and data content standards followed, the audiences for the metadata created, and some organization patterns. The data from the 89 respondents is reported in a series of charts and graphs that are open to interpretation. RLG Programs offers its own interpretation in the prefatory narrative, flagging questions for follow up and goals for future projects. Although we saw some expected variations in practice across libraries, archives and museums, we were struck by the high levels of customization and local tool development, the limited extent to which tools and practices are, or can be, shared (both within and across institutions), the lack of confidence institutions have in the effectiveness of their tools, and the disconnect between their interest in creating metadata to serve their primary audiences and the inability to serve that audience within the most commonly used discovery systems (such as Google, Yahoo, etc.).
DLC-MODS Workbook 1.2: A Tool to Create MODS Metadata Records
The University of Tennessee Digital Library Center has released version 1.2 of the DLC-MODS Workbook under a GNU General Public License. A demo is also available.
Here's an excerpt from the diglib announcement:
The DLC-MODS Workbook provides a series of web pages that enable users to easily generate complex, valid MODS metadata records that meet the 1-4 levels of specification outlined in the Digital Library Federation Implementation Guidelines for Shareable MODS Records (DLF Aquifer Guidelines, November 2006).
Developed by programmer Christine Haygood Deane under the direction of metadata librarian Melanie Feltner-Reichert, this open source client-side software provides control of date formats and other problematic fields at the point of creation, while shielding creators from the need to work in XML. Records can be partially created, saved to the desktop, then reloaded and completed at a later date. Final versions can be downloaded or cut-and-pasted into text editors for use elsewhere.
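As a rough illustration of the kind of record the workbook produces through its web forms, here is a minimal sketch that assembles a very simple MODS record in Python. The field values are invented, and a record meeting a particular DLF Aquifer level would need additional elements.

```python
# Build a minimal MODS record with ElementTree (illustrative sketch).
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"

mods = ET.Element(q("mods"), version="3.2")
title_info = ET.SubElement(mods, q("titleInfo"))
ET.SubElement(title_info, q("title")).text = "An Illustrative Title"
name = ET.SubElement(mods, q("name"), type="personal")
ET.SubElement(name, q("namePart")).text = "Doe, Jane"
ET.SubElement(mods, q("typeOfResource")).text = "text"
origin = ET.SubElement(mods, q("originInfo"))
ET.SubElement(origin, q("dateIssued"), encoding="w3cdtf").text = "2007"

print(ET.tostring(mods, encoding="unicode"))
```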
Paul Courant on Michigan’s Mass Digitization Project with Google
In "On Being in Bed with Google," Paul N. Courant, University Librarian and Dean of Libraries at the University of Michigan, vigorously rebuts arguments against research libraries participating in the Google Books Library Project.
Here's an excerpt:
Since 2005, Siva Vaidhyanathan has been making and refining the argument that libraries should be digitizing their collections independently, without corporate financing or participation, and that those who don’t are failing to uphold their responsibility to the public. "Libraries should not be relinquishing their core duties to private corporations for the sake of expediency."
"Expediency" is a bit of a dirty word. Vaidhyanathan’s phrase suggests that good people don’t do things simply because they are "expedient." But I view large-scale digitization as expeditious. We have a generation of students who will not find valuable scholarly works unless they can find them electronically. At the rate that OCA is digitizing things (and I say the more the merrier and the faster the better) that generation will be dandling great-grandchildren on its knees before these great collections can be found electronically. At Michigan, the entire collection of bound print will be searchable, by anyone in the world, about when children born today start kindergarten.
Library of Congress and Xerox Team Up to Build Large JPEG 2000 Image Repository
The Library of Congress and Xerox will work together to build a repository of around 1 million JPEG 2000 images of public domain works.
Here's an excerpt from the press release:
The two organizations are studying the potential of using the JPEG 2000 format in large repositories of digital cultural heritage materials such as those held by the Library and other federal agencies. The eventual outcome may be leaner, faster systems that institutions around the country can use to store their riches and to make their collections widely accessible.
The project, designed to help develop guidelines and best practices for digital content, is especially relevant to the Library’s National Digital Information Infrastructure and Preservation Program, which has been working with several other federal agencies on digitization standards.
The trial will include up to 1 million digitized, public domain prints, photographs, maps and other content from the Library’s extraordinary collections. Scientists in the Xerox Innovation Group will work with these materials to create an image repository that they will use to develop and test approaches for the management of large image collections.
The images to be used from the Library’s collection are already digitized (primarily in TIFF format), but JPEG 2000, a newer format for representing and compressing images, could make them easier to store, transfer and display. According to Michael Stelmach, manager of Digital Conversion Services in the Library’s Office of Strategic Initiatives, JPEG 2000 holds promise in the areas of visual presentation, simplified file management and decreased storage costs. It offers rich and flexible support for metadata, which can describe the image and provide information on the provenance, intellectual property and technical data relating to the image itself.
Xerox scientists will develop the parameters for converting TIFF files to JPEG 2000 and will build and test the system, then turn over the specifications and best practices to the Library of Congress. The specific outcome will be development of JPEG 2000 profiles, which describe how to use JPEG 2000 most effectively to represent photographic content as well as content digitized from maps. The Library plans to make the results available on a public Web site.
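As a rough sketch of the conversion step itself, the following Python uses Pillow (which relies on OpenJPEG for JPEG 2000 output) to turn a directory of TIFF masters into JP2 derivatives. The compression settings are illustrative guesses; choosing the right parameters and profiles is exactly what the Library and Xerox intend to work out.

```python
# Convert TIFF masters to JPEG 2000 derivatives with Pillow (illustrative sketch).
from pathlib import Path
from PIL import Image

def tiff_to_jp2(src: Path, dst: Path) -> None:
    with Image.open(src) as img:
        img.save(
            dst,
            format="JPEG2000",
            quality_mode="rates",         # interpret quality_layers as compression ratios
            quality_layers=[40, 20, 10],  # three quality layers, illustrative values
            irreversible=True,            # lossy 9/7 wavelet transform
            num_resolutions=6,            # resolution levels for fast zooming/panning
        )

Path("derivatives").mkdir(exist_ok=True)
for tiff in Path("masters").glob("*.tif"):
    tiff_to_jp2(tiff, Path("derivatives") / (tiff.stem + ".jp2"))
```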
Four National Libraries Agree to Coordinate RDA Implementation
The Library of Congress, the British Library, Library and Archives Canada, and the National Library of Australia have agreed to coordinate their implementation of RDA: Resource Description and Access.
Here's an excerpt from the press release:
These national libraries, together with representatives from professional library associations in Canada, the United Kingdom and the United States, are members of the Committee of Principals which oversees the work of the Joint Steering Committee for Development of RDA, which is responsible for developing RDA. . . .
RDA addresses the needs of the future by providing a flexible framework for describing all types of resources of interest to libraries. RDA guidelines will be easy and efficient to use, as well as compatible with internationally established principles, models and standards. In addition, RDA will maintain continuity with the past, as data created using RDA will be compatible with existing records in online library catalogs.
The libraries plan to implement RDA by the end of 2009. To ensure a smooth transition to RDA, the four national libraries will work together where possible on implementation matters such as training, documentation and any national application decisions. Regular updates will be issued by the group to keep the library communities in their countries informed on RDA implementation progress and policy decisions.
Digital Archive for Architecture: CDWA for DSpace
The Art Institute of Chicago has developed the Digital Archive for Architecture (DAArch) to support the use of the Categories for the Description of Works of Art (CDWA) metadata schema in DSpace. The software runs under BSD/UNIX/Linux; is written in Java, JSP, and PHP; utilizes PostgreSQL; and is under a BSD License.
Keller Discusses the Sun PASIG
Campus Technology has published an interview with Michael Keller about the Sun Preservation and Archiving Special Interest Group.
Sun Preservation and Archiving Special Interest Group Formed
Sun has formed the Sun PASIG (Sun Preservation and Archiving Special Interest Group).
Here's an excerpt from the press release:
Addressing the need for better collaboration on best practices around global standards in large data set and metadata preservation, the Sun PASIG will help provide support for organizations challenged with preserving and archiving important research and cultural heritage materials. Founding members of the Sun PASIG include The Alberta Library, The British Library, Johns Hopkins University, University of Oxford, Stanford University, The Texas Digital Library, and other leading global libraries and universities. . . .
At globally located semi-annual meetings, group members will share knowledge of storage technology trends, services-oriented architecture and software code, and discuss best practices of both commercial and community-developed solutions. Working groups will hold discussions on architectures, use cases and business drivers, storage, access and security, and operating policies, with the goal of providing common case studies and solutions for digital archiving. The Sun PASIG will focus on both collaborating with leading institutions in the EPrints, Fedora, and DSpace communities to create replicable solutions and exchanging expertise on global developments around the Open Archival Information System (OAIS) architecture model.
"Libraries and universities around the world face a common problem: how to best capture and archive valuable knowledge. Global discussion is the first step towards finding solutions that meet institutions' individualized preservation needs," said Michael Keller, University Librarian, Director of Academic Information Resources, Stanford University. "With the formation of Sun PASIG, we are looking forward to working with our peers to discover and create the best digital preservation options available, from infrastructure to interfaces."
DCMI Scholarly Communications Community
The Dublin Core Metadata Initiative has established the DCMI Scholarly Communications Community, which currently includes a mailing list and a wiki.
Here's an excerpt from the home page:
The DCMI Scholarly Communications Community is a forum for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing research papers, scholarly texts, data objects and other resources created and used within scholarly communications. This includes providing a forum for discussion around the Eprints Application Profile, also known as the Scholarly Works Application Profile (SWAP) and for other existing and future application profiles created to describe items of scholarly communication.
History of Metadata Timeline
Metadata Services at the Cornell University Library has created a History of Metadata timeline as part of its extensive Resources directory.
Metadata SPEC Kit from ARL
ARL has published Metadata, SPEC Kit 298 by Jin Ma. The front matter and Executive Summary are freely available.
RFC for Dublin Core (RFC 5013) Published
John A. Kunze has announced on DC-GENERAL that the RFC for Dublin Core (RFC 5013) has just been published.
He notes that it "contains the same element definitions as the recently revised NISO standard, Z39.85-2007, but is freely accessible in one click via a global set of mirrored repositories used by the highly technical audiences that support and define Internet infrastructure."
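For reference, the fifteen elements defined in RFC 5013 (and in Z39.85-2007) are short enough to list, and a few lines of Python can check that a simple record sticks to them; the sample record below is purely illustrative.

```python
# The fifteen Dublin Core Metadata Element Set elements, per RFC 5013 / Z39.85-2007.
DCMES = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

record = {  # illustrative record
    "title": "An Illustrative Report",
    "creator": "Doe, Jane",
    "date": "2007",
    "type": "Text",
}

unknown = set(record) - DCMES
if unknown:
    raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
print("Record uses only DCMES elements.")
```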
A Portal for Doctoral E-Theses in Europe
The SURFfoundation has released A Portal for Doctoral E-Theses in Europe: Lessons Learned from a Demonstrator Project by M. P. J. P. Vanderfeesten. The portal project was funded by JISC, the National Library of Sweden, and the SURFfoundation, which also ran it.
Here’s an excerpt from the "Management Summary":
For the first time various repositories with doctoral e-theses have been harvested on an international scale. This report describes a small pilot project which tested the interoperability of repositories for e-theses and has set up a freely accessible European portal with over 10,000 doctoral e-theses.
Five repositories from five different countries in Europe were involved: Denmark, Germany, the Netherlands, Sweden and the UK. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was the common protocol used to test the interoperability. Based upon earlier experiences and tools (harvester, search engine) developed for the national DAREnet service in the Netherlands, SURFfoundation could establish a prototype for this European e-theses Demonstrator relatively quickly and simply.
Nevertheless, some critical issues and problems occurred. They can be categorised into the following topics:
a) Generic issues related to repositories: the language used in the metadata fields differs per repository. . . . Furthermore, the quality of the data presented differs. . . . A further issue is the semantic and syntactic differences in metadata between repositories, which means that the format and content of the information exchange requests are not unambiguously defined. . . .
b) E-theses specific issues: to be able to harvest doctoral theses, the service provider needs to be able to filter on this document type. Up to now there is no commonly agreed format, which makes semantic interoperability possible [specific Dublin core recommendations omitted]. . . .
c) Issues related to data providers and service providers: besides the use of the OAI-protocol for metadata harvesting and the use of Dublin Core it is recommended for data providers to further standardise on the semantic interoperability by using the DRIVER guidelines with an addition of the e-Theses specific recommendations described above. To be able to offer more than basic services for e-Theses, one has to change the metadata format from simple Dublin Core to a richer and e-Theses specific one. . . . We needed to fix, normalise and crosswalk the differences between every repository to get a standard syntactic and semantic metadata structure. . . . The scaling up is a big issue. To stimulate the broad take up of various services, data providers have to work on implementing standards that create interoperability on syntactic and semantic levels.
d) Cultural and educational differences: In every country the educational processes are different. . . . Not only the graduation and publication process differs, but also the duration of the research process. Therefore the quality of the results in a cross-European search of doctoral theses may vary enormously.
(Thanks to Open Access News.)
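Point (c) above describes fixing, normalising, and crosswalking each repository's metadata into one standard structure before aggregation. Here is a minimal sketch of what such a normalisation step can look like; the field names, language codes, and date conventions are illustrative, not the project's actual mapping tables.

```python
# Crosswalk a harvested record into a shared target structure (illustrative sketch).
from datetime import datetime

FIELD_MAP = {"dc:title": "title", "dc:creator": "creator",
             "dc:date": "date_issued", "dc:language": "language"}
LANGUAGE_MAP = {"en": "eng", "eng": "eng", "de": "ger", "deu": "ger", "nl": "dut"}

def normalise(raw_record: dict) -> dict:
    """Map repository-specific fields and value conventions onto the target schema."""
    record = {}
    for source_field, value in raw_record.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            continue  # drop fields the target schema does not cover
        if target == "language":
            value = LANGUAGE_MAP.get(value.lower(), value)
        if target == "date_issued":
            for fmt in ("%Y-%m-%d", "%d-%m-%Y", "%Y"):  # accept common datestamp styles
                try:
                    value = datetime.strptime(value, fmt).strftime("%Y-%m-%d")
                    break
                except ValueError:
                    pass
        record[target] = value
    return record

print(normalise({"dc:title": "An Example Thesis", "dc:language": "EN", "dc:date": "2007"}))
```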
Metadata Extraction Tool Version 3.2
The National Library of New Zealand has released version 3.2 of its open-source Metadata Extraction Tool.
Written in Java and XML, the Metadata Extraction Tool has a Windows interface, and it runs under UNIX in command line mode. Batch processing is supported.
Here’s an excerpt from the project home page:
The Tool builds on the Library’s work on digital preservation, and its logical preservation metadata schema. It is designed to:
- automatically extract preservation-related metadata from digital files
- output that metadata in a standard format (XML) for use in preservation activities. . . .
The Metadata Extraction Tool includes a number of ‘adapters’ that extract metadata from specific file types. Extractors are currently provided for:
- Images: BMP, GIF, JPEG and TIFF.
- Office documents: MS Word (version 2, 6), Word Perfect, Open Office (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.
- Audio and Video: WAV and MP3.
- Markup languages: HTML and XML.
If a file type is unknown, the tool applies a generic adapter, which extracts data that the host system ‘knows’ about any given file (such as size, filename, and date created).
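As a rough illustration of that generic fallback, here is a minimal sketch that reads what the file system knows about a file and emits it as XML; the element names are invented, not the tool's actual output schema.

```python
# A generic "adapter": emit file-system metadata for an unknown file type (illustrative sketch).
from datetime import datetime, timezone
from pathlib import Path
import xml.etree.ElementTree as ET

def generic_adapter(path: Path) -> ET.Element:
    """Return basic host-system metadata (name, size, modification date) as XML."""
    stat = path.stat()
    meta = ET.Element("file")
    ET.SubElement(meta, "name").text = path.name
    ET.SubElement(meta, "size").text = str(stat.st_size)
    ET.SubElement(meta, "modified").text = (
        datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()
    )
    return meta

print(ET.tostring(generic_adapter(Path(__file__)), encoding="unicode"))
```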
Using the Open Archives Initiative Protocol for Metadata Harvesting
Libraries Unlimited has released Using the Open Archives Initiative Protocol for Metadata Harvesting by Timothy W. Cole and Muriel Foulonneau.
Here’s an excerpt from the publisher’s description:
Through a series of case studies, Cole and Foulonneau guide the reader through the process of conceiving, implementing and maintaining an OAI-compliant repository. Its applicability to both institutional archives and discipline-based aggregators is covered, with equal attention paid to the technical and organizational aspects of creating and maintaining such repositories.
ONIX for Serials Coverage Statement Draft Release 0.9
EDItEUR has released "ONIX for Serials Coverage Statement Draft Release 0.9 (june 2007)" for comment through September 2007.
Here’s an excerpt from the draft’s Web page:
ONIX for Serials Coverage Statement is an XML structure capable of carrying simple or complex statements of holdings of serial resources, in paper or electronic form, to be included in ONIX for Serials messages for a variety of applications; for example, to express:
- The holdings of a particular serial version by a library
- The coverage of a particular serial version supplied by an online content hosting system
- The coverage of a particular serial version included in a subscription or offering
EDItEUR has also released "SOH: Serials Online Holdings Release 1.1 (Draft June 2007)" for comment.