ARL has published Metadata, SPEC Kit 298 by Jin Ma. The front matter and Executive Summary are freely available.
RFC for Dublin Core (RFC 5013) Published
John A. Kunze has announced on DC-GENERAL that the RFC for Dublin Core (RFC 5013) has just been published.
He notes that it "contains the same element definitions as the recently revised NISO standard, Z39.85-2007, but is freely accessible in one click via a global set of mirrored repositories used by the highly technical audiences that support and define Internet infrastructure."
A Portal for Doctoral E-Theses in Europe
The SURFfoundation has released A Portal for Doctoral E-Theses in Europe: Lessons Learned from a Demonstrator Project by M. P. J. P. Vanderfeesten. The project, which the SURFfoundation ran, was funded by JISC, the National Library of Sweden, and the SURFfoundation itself.
Here’s an excerpt from the "Management Summary":
For the first time various repositories with doctoral e-theses have been harvested on an international scale. This report describes a small pilot project which tested the interoperability of repositories for e-theses and has set up a freely accessible European portal with over 10,000 doctoral e-theses.
Five repositories from five different countries in Europe were involved: Denmark, Germany, the Netherlands, Sweden and the UK. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was the common protocol used to test the interoperability. Based upon earlier experiences and developed tools (harvester, search engine) of the national DAREnet service in the Netherlands, SURFfoundation could establish a prototype for this European e-theses Demonstrator relatively fast and simple.
Nevertheless, some critical issues and problems occurred. They can be categorised into the following topics:
a) Generic issues related to repositories: the language used in the metadata fields differs per repository. . . . Furthermore, the quality of the data presented differs. . . . A further issue is the semantic and syntactic differences in metadata between repositories, which means that the format and content of the information exchange requests are not unambiguously defined. . . .
b) E-theses specific issues: to be able to harvest doctoral theses, the service provider needs to be able to filter on this document type. Up to now there is no commonly agreed format, which makes semantic interoperability possible [specific Dublin Core recommendations omitted]. . . .
c) Issues related to data providers and service providers: besides the use of the OAI-protocol for metadata harvesting and the use of Dublin Core it is recommended for data providers to further standardise on the semantic interoperability by using the DRIVER guidelines with an addition of the e-Theses specific recommendations described above. To be able to offer more than basic services for e-Theses, one has to change the metadata format from simple Dublin Core to a richer and e-Theses specific one. . . . We needed to fix, normalise and crosswalk the differences between every repository to get a standard syntactic and semantic metadata structure. . . . The scaling up is a big issue. To stimulate the broad take up of various services, data providers have to work on implementing standards that create interoperability on syntactic and semantic levels.
d) Cultural and educational differences: In every country the educational processes are different. . . . Not only the graduation and publication process differs, but also the duration of the research process. Therefore the quality of the results in a cross-European search of doctoral theses may vary enormously.
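For readers who haven't tried this, here's a minimal sketch (in Python, which the project did not necessarily use) of the harvest-and-filter step the report describes. The repository base URL is hypothetical, and, as the report notes, the dc:type strings that mark a doctoral thesis vary from repository to repository, so the matching below is guesswork a real harvester would have to tune; it would also need to follow resumptionTokens to retrieve more than the first batch of records.

```python
# Sketch of an OAI-PMH harvest that keeps only thesis-like records.
# BASE_URL is hypothetical; the dc:type strings tested are guesses.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://repository.example.edu/oai"  # hypothetical data provider

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

url = BASE_URL + "?verb=ListRecords&metadataPrefix=oai_dc"
tree = ET.parse(urlopen(url))

for record in tree.iterfind(".//oai:record", NS):
    # There is no agreed vocabulary for flagging doctoral theses, so the
    # strings matched here would differ per repository in practice.
    types = [t.text or "" for t in record.iterfind(".//dc:type", NS)]
    if any("doctoral" in t.lower() or "thesis" in t.lower() for t in types):
        print(record.findtext(".//dc:title", default="(no title)", namespaces=NS))
```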
(Thanks to Open Access News.)
Metadata Extraction Tool Version 3.2
The National Library of New Zealand has released version 3.2 of its open-source Metadata Extraction Tool.
Written in Java and XML, the Metadata Extraction Tool has a Windows interface, and it runs under UNIX in command line mode. Batch processing is supported.
Here’s an excerpt from the project home page:
The Tool builds on the Library’s work on digital preservation, and its logical preservation metadata schema. It is designed to:
- automatically extract preservation-related metadata from digital files
- output that metadata in a standard format (XML) for use in preservation activities. . . .
The Metadata Extraction Tool includes a number of ‘adapters’ that extract metadata from specific file types. Extractors are currently provided for:
- Images: BMP, GIF, JPEG and TIFF.
- Office documents: MS Word (version 2, 6), Word Perfect, Open Office (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.
- Audio and Video: WAV and MP3.
- Markup languages: HTML and XML.
If a file type is unknown the tool applies a generic adapter, which extracts data that the host system ‘knows’ about any given file (such as size, filename, and date created).
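The generic-adapter fallback is a nice touch. The tool itself is written in Java, and what follows is not its actual API, just a hedged Python sketch of the dispatch pattern the page describes: look up a format-specific extractor, and fall back to a generic one that reports what the file system knows.

```python
# Not the New Zealand tool's real (Java) API -- a Python illustration of
# the adapter pattern it describes: type-specific extractors keyed by file
# extension, with a generic fallback for unknown types.
import os
from datetime import datetime

def extract_jpeg(path):
    # Stand-in for a real format-specific adapter.
    return {"format": "JPEG", "source": path}

ADAPTERS = {".jpg": extract_jpeg, ".jpeg": extract_jpeg}

def generic_adapter(path):
    # What the host system "knows" about any file: size, name, dates.
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "size": stat.st_size,
        "created": datetime.fromtimestamp(stat.st_ctime).isoformat(),
    }

def extract(path):
    ext = os.path.splitext(path)[1].lower()
    adapter = ADAPTERS.get(ext, generic_adapter)  # unknown type -> generic
    return adapter(path)
```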
Using the Open Archives Initiative Protocol for Metadata Harvesting
Libraries Unlimited has released Using the Open Archives Initiative Protocol for Metadata Harvesting by Timothy W. Cole and Muriel Foulonneau.
Here’s an excerpt from the publisher’s description:
Through a series of case studies, Cole and Foulonneau guide the reader through the process of conceiving, implementing and maintaining an OAI-compliant repository. Its applicability to both institutional archives and discipline-based aggregators is covered, with equal attention paid to the technical and organizational aspects of creating and maintaining such repositories.
ONIX for Serials Coverage Statement Draft Release 0.9
EDItEUR has released "ONIX for Serials Coverage Statement Draft Release 0.9 (June 2007)" for comment through September 2007.
Here’s an excerpt from the draft’s Web page:
ONIX for Serials Coverage Statement is an XML structure capable of carrying simple or complex statements of holdings of serial resources, in paper or electronic form, to be included in ONIX for Serials messages for a variety of applications; for example, to express:
- The holdings of a particular serial version by a library
- The coverage of a particular serial version supplied by an online content hosting system
- The coverage of a particular serial version included in a subscription or offering
EDItEUR has also released "SOH: Serials Online Holdings Release 1.1 (Draft June 2007)" for comment.
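To make the idea of a coverage statement concrete, here is a rough sketch that builds one as XML. To be clear, the element names below are invented for illustration, not the tag names the EDItEUR draft actually defines; consult the draft itself for the real structure.

```python
# Illustrative only: these element names are NOT the actual ONIX for
# Serials tags. The sketch just shows the kind of simple statement such a
# message might carry -- a serial version held from one enumeration point
# to another.
import xml.etree.ElementTree as ET

stmt = ET.Element("CoverageStatement")        # hypothetical tag
ET.SubElement(stmt, "SerialVersion").text = "Online"
start = ET.SubElement(stmt, "CoverageStart")  # hypothetical tag
ET.SubElement(start, "Volume").text = "1"
ET.SubElement(start, "Issue").text = "1"
end = ET.SubElement(stmt, "CoverageEnd")      # hypothetical tag
ET.SubElement(end, "Volume").text = "19"
ET.SubElement(end, "Issue").text = "3"

print(ET.tostring(stmt, encoding="unicode"))
```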
Compound Information Objects: An OAI-ORE Perspective
The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) effort has released Compound Information Objects: An OAI-ORE Perspective by Carl Lagoze and Herbert Van de Sompel.
Here’s an excerpt from the document’s "Introduction and Motivation" section:
In summary, the web architecture expresses the notion of linked URI-identified resources. Information systems can leverage this architecture to publish the components of a compound object and thereby make them available to web clients and services. But due to the absence of commonly accepted standards, the notion of an identified compound object with a distinct boundary and typed relationships among its component resources is lost.
The absence of these standards affects the functionality of a number of existing and possible web services and applications. Crawler-based search engines might be more useful if the granularity of their result sets corresponded to compound objects (a book or chapter, in this example) rather than individual resources (single pages). The ranking algorithms of these search engines might improve if the links among the components of a compound object were treated differently than links to the object as a whole, or if the number of in-links to the various component resources was accumulated to the level of the compound object instead of counted separately. Citation analysis systems would also benefit from a mechanism for citing the compound object itself, rather than arbitrary parts of the object. Finally, a standard for representing compound objects might enable a new class of "whole object" services such as "preserve a compound object".
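The argument is easier to see with a toy example. The sketch below is not the OAI-ORE data model, which the document is only beginning to define; it just shows a compound object as a bounded set of URI-identified components with typed relationships among them, using hypothetical URIs.

```python
# A generic illustration (not the OAI-ORE vocabulary) of a compound object
# with its own identity and boundary, plus typed relationships among its
# component resources. All URIs are hypothetical.
compound = {
    "id": "http://example.org/objects/book-42",
    "type": "Book",
    "relations": [
        ("http://example.org/objects/book-42", "hasPart",
         "http://example.org/objects/book-42/chapter-1"),
        ("http://example.org/objects/book-42/chapter-1", "hasPart",
         "http://example.org/objects/book-42/chapter-1/page-1"),
        ("http://example.org/objects/book-42/chapter-1", "followedBy",
         "http://example.org/objects/book-42/chapter-2"),
    ],
}

def components(obj):
    """Every resource inside the object's boundary, minus the object itself."""
    nodes = set()
    for subject, _predicate, target in obj["relations"]:
        nodes.update((subject, target))
    return nodes - {obj["id"]}

# A search engine aware of this structure could roll in-links to any of
# these components up to the compound object, as the excerpt suggests.
print(components(compound))
```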
Implementing the PREMIS Data Dictionary: A Survey of Approaches
The Library of Congress’ Network Development and MARC Standards Office has released Implementing the PREMIS Data Dictionary: A Survey of Approaches.
Here is an excerpt from the report’s preface:
The Preservation Metadata: Implementation Strategies (PREMIS) Working Group developed the Data Dictionary for Preservation Metadata, which is a specification containing a set of "core" preservation metadata elements that has broad applicability within the digital preservation community. The PREMIS Data Dictionary (PDD) was released in May 2005 along with a set of XML schemas to support its implementation. Since that time, institutions have begun to implement preservation metadata by providing content for semantic units expressed in the data dictionary or comparing it with planned or existing systems for long-term preservation. . . .
The Library of Congress, as part of the PREMIS maintenance activity, commissioned Deborah Woodyard-Robinson to provide this study to explore how institutions have implemented the PREMIS semantic units. . . . In this study sixteen repositories have been surveyed about their interpretation and application of the PDD, with an analysis then made on how the PREMIS core fits with the functions of a preservation repository and which PDD semantic units will be most relevant to certain types of repositories.
Dublin Core Standard Renewed and Updated
The Dublin Core Metadata Initiative has announced that the Dublin Core Metadata Element Set has been renewed and updated as ANSI/NISO standard Z39.85-2007.
In other Dublin Core news, the DCMI Abstract Model has been approved as a DCMI Recommendation and a new DCMI Task Group has been established for collaborative work on Resource Description and Access (RDA).
MIDESS (Management of Images in a Distributed Environment with Shared Services) Project
The JISC-funded MIDESS Project is examining issues related to the management of digital audio, images, video, and other digital content in distributed digital repositories as well as at the national level. It is being conducted by the London School of Economics, University College London, the University of Birmingham, and the University of Leeds.
Here is an excerpt from the "Aims and Objectives of the MIDESS Project" page:
- The MIDESS project will be building digital content databases at three of the partner institutions . . .
- These databases will be populated with digital content which has already been created, or is currently under creation, by the partner institutions. . . .
- Opportunities for the sharing and re-use of digital collections across institutions will be explored . . .
- Metadata standards will be established, and metadata developed, for each collection added to the repositories. . . .
- MIDESS will explore the role of digital content databases with a particular focus on interoperability with enterprise content management architectures.
- MIDESS will also aim to establish how distributed digital repositories could encourage the wider exposure and sharing of content across institutions through an evaluation of requirements for centralised metadata harvesting services.
- MIDESS will seek to pilot an infrastructure which could serve as a model for future distributed national digitisation activities.
The project has produced a number of interesting documents, especially the detailed workpackages, which deal with issues such as digital preservation, enterprise storage, intellectual property, and user requirements.
Report on Embedding and Reusing PerX in a VLE
The PerX (Pilot Engineering Repository Xsearch) project has released its Report on Embedding and Reusing PerX in a VLE. (A "VLE" is a virtual learning environment.)
Here’s an excerpt from the introduction:
This report presents the reusable middleware we have used to embed PerX functionality into the University VLE, VISION, a commercial Blackboard VLE system. We have done our best to use service oriented architectures (SOA) as far as possible. We argue that by using open source and open standards approaches rather than software and practices developed specifically for a particular VLE product, it is possible to obtain open reusable middleware that can simplify the DLVLE integration and bridge the functionality of both environments. We hope that our methodology can provide a common foundation on which a variety of institutions may build their own customized middleware to integrate scholarly objects in VLEs.
Here’s a brief description of the PerX project from its home page:
The PerX project has developed a pilot service which provides subject resource discovery across a series of repositories of interest to the engineering learning and research communities. This pilot was used as a test-bed to explore the practical issues that would be encountered when considering the possibility of full scale subject resource discovery services.
DLF and OCLC Release Registry of Digital Masters Record Creation Guidelines
The Digital Library Federation and OCLC have released their Registry of Digital Masters Working Group’s Registry of Digital Masters Record Creation Guidelines.
Here is an excerpt from the Purpose section of the document:
By recording materials in the Registry, institutions are signaling the intent to preserve and maintain the accessibility of the described materials over an extended timeframe. This implies that materials were born digital or have been converted to digital form, that the digital objects are stored in professionally managed systems, and that the institution is committed to retain and preserve them. . . .
These guidelines detail which MARC 21 elements should be used to carry Registry-required information. Registry records describe materials that an institution intends to digitize, either from existing paper- and/or microfilm-based materials (“intent to digitize”), as well as born digital materials, and to indicate the standards by which the registered objects have been digitized.
A Registry record also provides information about whether a specific item has already been digitized, and if so, whether the digitization has been done at an adequate level such that another digital copy is not required, what institution is responsible for the digitization, what institution is responsible for the preservation of the digital content, and what specific materials are available.
Report on Ingest Tools for Digital Repositories
The Cairo Project has released Cairo Tools Survey: A Survey of Tools Applicable to the Preparation of Digital Archives for Ingest into a Preservation Repository. It has also released a related report, Cairo Use Cases: A Survey of User Scenarios Applicable to the Cairo Ingest Tool.
Here’s a description of the Cairo Project from its home page:
Cairo will develop a tool for ingesting complex collections of born-digital materials, with basic descriptive, preservation and relationship metadata, into a preservation repository. The project is based on needs identified by the JISC-funded Paradigm project and the Wellcome Library’s Digital Curation in Action project. It is a key building block in the partner institutions’ strategy to develop digital repository architectures which can support the development of digital collections over the long-term.
Irish Virtual Research Library and Archive Project Workbook
The Irish Virtual Research Library has released its Project Workbook, which provides detailed information about its policies and procedures.
Here’s an excerpt from the Irish Virtual Research Library’s home page that describes the project:
The Irish Virtual Research Library & Archive (IVRLA) is a major digitisation and digital object management project launched in UCD in January 2005. The project was conceived as a means to preserve elements of UCD’s main repositories and increase and facilitate access to this material through the adoption of digitisation technologies.
Additionally the project will undertake dedicated research into the area of interacting with and enhancing the use of digital objects in a research environment through the development of a digital repository. When fully implemented, the IVRLA will be one of the first comprehensive digital primary source repositories in Ireland, and will advance the research agenda into the use and challenges affecting this new method of research, and of digital curation over the coming years.
Best Practices for Digital Collections at UM Libraries
Digital Collections and Resources at the University of Maryland Libraries has released the second edition of its Best Practices for Digital Collections at UM Libraries.
While these wide-ranging guidelines are primarily intended for the UM Libraries, others may find this 81-page document to be helpful as well.
Summary of PerX Project Findings About OAI-PMH and Repository Metadata Challenges
Roderick A. MacLeod has posted a useful summary of some of the key documents and findings of the PerX (Pilot Engineering Repository Xsearch) project on JISC-REPOSITORIES. He notes: "These documents may help to dispel possible myths concerning the ease of service provision, ease of reharvesting metadata, surfacing digital repository content in third-party services, etc."
Here’s an excerpt from the project’s About page that describes it:
PerX is a two-year (June 2005-May 2007) JISC Digital Repositories Programme project, to develop a pilot service which provides subject resource discovery across a series of repositories of interest to the engineering learning and research community. This pilot will then be used as a test-bed to explore the practical issues that would be encountered when considering the possibility of a full scale subject resource discovery service.
(Prior posting about PerX.)
Persistent Identifier Linking Infrastructure (PLIN) Project
ARROW and the University of Southern Queensland have established the Persistent Identifier Linking Infrastructure (PLIN) Project.
As outlined in the project’s Executive Summary, its goals are to:
- Support adoption and use of persistent identifiers and shared persistent identifier management services by the project stakeholders.
- Plan for a sustainable, shared identifier management infrastructure that enables persistence of identifiers and associated services over archival lengths of time.
The project’s anticipated outcomes are:
- Best practice and policy guides for the use of persistent identifiers in Australian e-learning, e-research, and e-science communities.
- Use cases describing community requirements for identifiers and business process analysis relating to these use cases.
- E-Framework representations of persistent identifier management services that support the business requirements for identifiers.
- A "pilot" shared persistent identifier management infrastructure usable by the project stakeholders over the lifetime of the project. The pilot infrastructure will include services for creating, accessing and managing persistent digital identifiers over their lifetime. The pilot infrastructure will interoperate with other DEST funded systemic infrastructure. The development phase of the pilot will use an agile development methodology that will allow the inclusion of "value-added" services for managing resources using persistent identifiers to be included in the development program if resources permit.
- Software tools to help applications use the shared persistent identifier infrastructure more easily.
- Report on options and proposals for sustaining, supporting (including outreach) and governing shared persistent identifier management infrastructure.
The PLIN Project will base its work on the CNRI Handle System. The excerpt below, from the Handle System home page, describes its primary features:
The Handle System® is a general purpose distributed information system that provides efficient, extensible, and secure identifier and resolution services for use on networks such as the Internet. It includes an open set of protocols, a namespace, and a reference implementation of the protocols. The protocols enable a distributed computer system to store identifiers, known as handles, of arbitrary resources and resolve those handles into the information necessary to locate, access, contact, authenticate, or otherwise make use of the resources. This information can be changed as needed to reflect the current state of the identified resource without changing its identifier, thus allowing the name of the item to persist over changes of location and other related state information.
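In everyday use, most people meet the Handle System through its HTTP proxy at hdl.handle.net rather than the native protocol. Here's a minimal sketch of resolution via that proxy, using a made-up handle:

```python
# Resolve a handle through the public HTTP proxy. The handle itself is
# hypothetical; a real one would redirect to the identified resource.
from urllib.request import urlopen

handle = "1234/abc"  # hypothetical handle
with urlopen("http://hdl.handle.net/" + handle) as resp:
    print(resp.geturl())  # final URL after the proxy's redirect
```

Because clients hold only the handle, the stored location can change without breaking any links, which is the persistence property the excerpt describes.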
EAD 2002 Schema Released
The EAD Schema Working Group (SAA/EADWG) has released the EAD 2002 Schema.
Two syntaxes are available: Relax NG Schema (RNG) and W3C Schema (XSD; requires the EAD XLink Schema).
Version 1.0 to Version 2002 conversion tools are available at EAD v1 to EAD v2002 Conversion.
For further information about the Encoded Archival Description (EAD), see the EAD Help Pages.
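For those wanting to validate finding aids against the new schema, here is one possible approach using the third-party lxml library for Python; the file names are placeholders. A command-line validator such as jing would work equally well for the Relax NG syntax.

```python
# Validate an EAD 2002 instance against the Relax NG syntax with lxml.
# File names are assumptions; the W3C Schema syntax could be checked the
# same way via etree.XMLSchema.
from lxml import etree

schema = etree.RelaxNG(etree.parse("ead.rng"))  # the RNG schema file
doc = etree.parse("finding-aid.xml")            # an EAD instance

if schema.validate(doc):
    print("valid EAD 2002")
else:
    print(schema.error_log)
```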
DLF/NSDL OAI Best Practices Wiki
The OAI Best Practices Wiki of the Digital Library Federation and NSDL OAI and Shareable Metadata Best Practices Working Group offers a number of resources relevant to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and related metadata issues.
The Tools and Strategies for Using and Enhancing/Extending the OAI Protocol section is of particular interest. It includes information about OAI-PMH data provider and service provider registries, software solutions and packages, and static repositories and gateways; metadata management and added value tools as well as OAI and character validation tools; and using SRU/W, collection description schema, and NSDL safe transforms.
Is OAI-PMH Too Labor-Intensive?
OAI-PMH permits metadata harvesting from disciplinary archives, institutional repositories, and other digital archives, allowing specialized search services to be built on the harvested metadata. OAI-PMH is a key technology for the open access movement, but does it require too much human intervention?
An interesting message on JISC-REPOSITORIES by Santy Chumbe, Technical Officer of the PerX project, suggests that it may. He says:
We have learned that in spite of its relative simplicity, an OAI-PMH service can be harder to implement and maintain than expected. We have spent a lot of effort harvesting, normalising and maintaining metadata obtained from OAI data providers. In particular the issue of metadata quality is an important factor here. A summary of our experiences dealing with OAI-PMH can be found at http://eprints.rclis.org/archive/00006394. . . . A final report outlining the maintenance issues involved in the project is in progress but the experience gained suggests that successful ongoing maintenance of OAI targets would require a mixture of automated and manual approaches and that the level of ongoing maintenance is high.
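A toy example may make the normalisation burden Chumbe mentions clearer. Harvested Dublin Core values arrive in per-repository variants that a service provider must map onto a single vocabulary; the variant strings and target terms below are invented, but the unmapped case is the point, since every new variant needs a human decision.

```python
# Invented mapping table illustrating why OAI maintenance stays partly
# manual: each repository's dc:type (or dc:date, dc:language, ...) values
# must be crosswalked to one controlled vocabulary.
TYPE_MAP = {
    "journal article": "article",
    "artículo": "article",
    "preprint": "preprint",
    "e-print": "preprint",
}

def normalise_type(raw):
    key = raw.strip().lower()
    # Unmapped values need a human decision before they can be indexed.
    return TYPE_MAP.get(key, "UNMAPPED:" + raw)

assert normalise_type("Journal Article") == "article"
```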
Test Driving the CrossRef Simple-Text Query Tool for Finding DOIs
CrossRef has made a DOI finding tool publicly available. It’s called Simple-Text Query. You can get the details at Barbara Quint’s article "Linking Up Bibliographies: DOI Harvesting Tool Launched by CrossRef."
What caught my eye in Quint’s article was this: "Users can enter whole bibliographies with citations in almost any bibliographic format and receive back the matching Digital Object Identifiers (DOIs) for these references to insert into their final bibliographies."
Well, not exactly. I cut and pasted just the "9 Repositories, E-Prints, and OAI" section of the Scholarly Electronic Publishing Bibliography into Simple-Text Query. Result: an error message. I had exceeded the 15,360-character limit. So, suggestion one: put the limit on the Simple-Text Query page.
So then I counted out 15,360 characters of the section and pasted that. Just kidding. I pasted the first six references. Result?
Alexander, Martha Latika, and J. N. Gautam. “Institutional Repositories for Scholarly Communication: Indian Initiatives.” Serials: The Journal for the Serials Community 19, no. 3 (2006): 195-201.
No doi match found.

Allard, Suzie, Thura R. Mack, and Melanie Feltner-Reichert. “The Librarian’s Role in Institutional Repositories: A Content Analysis of the Literature.” Reference Services Review 33, no. 3 (2005): 325-336.
doi:10.1108/00907320510611357
http://dx.doi.org/10.1108/00907320510611357

Allen, James. “Interdisciplinary Differences in Attitudes towards Deposit in Institutional Repositories.” Manchester Metropolitan University, 2005.
http://eprints.rclis.org/archive/00005180/
Reference not parsed

Allinson, Julie, and Roddy MacLeod. “Building an Information Infrastructure in the UK.” Research Information (October/November 2006).
http://www.researchinformation.info/rioctnov06digital.html
Reference not parsed

Anderson, Greg, Rebecca Lasher, and Vicky Reich. “The Computer Science Technical Report (CS-TR) Project: A Pioneering Digital Library Project Viewed from a Library Perspective.” The Public-Access Computer Systems Review 7, no. 2 (1996): 6-26.
http://epress.lib.uh.edu/pr/v7/n2/ande7n2.html
Reference not parsed

Andreoni, Antonella, Maria Bruna Baldacci, Stefania Biagioni, Carlo Carlesi, Donatella Castelli, Pasquale Pagano, Carol Peters, and Serena Pisani. “The ERCIM Technical Reference Digital Library: Meeting the Requirements of a European Community within an International Federation.” D-Lib Magazine 5 (December 1999).
http://www.dlib.org/dlib/december99/peters/12peters.html
Reference not parsed
Hmmm. According to Quint’s article:
I asked Brand if CrossRef could reach open access material. She assured me it could, but it clearly did not give the free and sometimes underdefined material any preference.
Looks like the open access capabilities may need some fine-tuning. D-Lib Magazine and The Public-Access Computer Systems Review are not exactly obscure e-journals. Since my references are formatted in the Chicago style by EndNote, I don’t think that the reference format is the issue. In fact, Quint’s article says: "The Simple-Text Query can retrieve DOIs for journal articles, books, and chapters in any reference citation style, although it works best with standard styles."
Conclusion: I’ll play with it some more, but Simple-Text Query may be best for conventional, mainstream journal references.
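In the meantime, one workaround for the character cap would be to split a long bibliography into submissions that stay under the limit, breaking on the blank lines between references rather than mid-citation. A rough sketch, using the 15,360 figure reported above:

```python
# Split a bibliography into chunks under the Simple-Text Query limit,
# breaking only on blank lines so no reference is cut in half.
LIMIT = 15360

def chunk_bibliography(text, limit=LIMIT):
    chunks, current = [], ""
    for ref in text.split("\n\n"):  # one reference per paragraph
        candidate = (current + "\n\n" + ref).strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = ref
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```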
Collex: Remixable Metadata for Humanists to Create Collections and Exhibits
What is Collex? The project’s About page describes it in part as follows:
Collex is a set of tools designed to aid students and scholars working in networked archives and federated repositories of humanities materials: a sophisticated COLLections and EXhibits mechanism for the semantic web.
Collex allows users to collect, annotate, and tag online objects and to repurpose them in illustrated, interlinked essays or exhibits. It functions within any modern web browser without recourse to plugins or downloads and is fully networked as a server-side application. By saving information about user activity (the construction of annotated collections and exhibits) as ‘remixable’ metadata, the Collex system writes current practice into the scholarly record and permits knowledge discovery based not only on the characteristics or ‘facets’ of digital objects, but also on the contexts in which they are placed by a community of scholars.
A detailed description of the project is available in "COLLEX: Semantic Collections & Exhibits for the Remixable Web."
You can see Collex in action at the NINES (a Networked Interface for Nineteenth-Century Electronic Scholarship) project, which also uses IVANHOE ("a shared, online playspace for readers interested in exploring how acts of interpretation get made and reflecting on what those acts mean or might mean") and Juxta ("a cross-platform tool for collating and analyzing any kind or number of textual objects").
The About 9s page identifies key objectives of the NINES project as follows:
- It will create a robust framework to support the authority of digital scholarship and its relevance in tenure and other scholarly assessment procedures.
- It will help to establish a real, practical publishing alternative to the paper-based academic publishing system, which is in an accelerating state of crisis.
- It will address in a coordinated and practical way the question of how to sustain scholarly and educational projects that have been built in digital forms.
- It will establish a base for promoting new modes of criticism and scholarship promised by digital tools.
People Metadata
A message by Liddy Nevile on DC-GENERAL has spawned an interesting thread about the need for a metadata scheme that describes people. Other participants note related efforts, such as BIO, the FOAF Vocabulary Specification, GEDCOM, the North Carolina Encoded Archival Context (EAC) Project, and the XHTML Friends Network.
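Of the efforts mentioned, FOAF is probably the easiest to try out. Here's a small taste of describing a person with the FOAF vocabulary, built with the third-party rdflib library for Python; the person and URIs are, of course, made up.

```python
# Describe a (fictional) person with the FOAF vocabulary using rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
me = URIRef("http://example.org/people/jdoe#me")  # hypothetical URI
g.add((me, RDF.type, FOAF.Person))
g.add((me, FOAF.name, Literal("Jane Doe")))
g.add((me, FOAF.mbox, URIRef("mailto:jdoe@example.org")))

print(g.serialize(format="turtle"))
```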
DOIs for Books Gain Ground
According to CrossRef, an official DOI registration agency, over a half-million DOIs have been assigned to books or book chapters, and twenty of its members are using DOIs in this fashion.
What’s a DOI? Here’s a short description from CrossRef:
The DOI, or digital object identifier, serves as a persistent, actionable identifier for intellectual property online. DOIs can be assigned at any level of granularity, and therefore provide publishers with an extensible platform for a variety of applications. And DOI links don’t break. Even if a publisher needs to migrate publications from one system to another, or if the content moves from one publisher to another, the DOI never changes.
While the use of DOIs for book chapters is especially interesting, DOIs can be utilized for even smaller book sections, as this example of an entry for Ian Fleming in the Oxford Dictionary of National Biography illustrates. (Notice the DOI, "Ian Lancaster Fleming (1908–1964): doi:10.1093/ref:odnb/33168," at the bottom of the entry.)
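Resolution works the same at any granularity, which is easy to verify: the Fleming DOI above resolves through the dx.doi.org proxy just like any book- or article-level DOI. A minimal sketch:

```python
# Resolve the chapter-level DOI quoted above through the DOI proxy.
from urllib.request import urlopen

doi = "10.1093/ref:odnb/33168"  # the Fleming entry
with urlopen("http://dx.doi.org/" + doi) as resp:
    print(resp.geturl())  # wherever the publisher currently hosts the entry
```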
New Digital Image Documentation from TASI
The Technical Advisory Service for Images (TASI) has issued new documentation dealing with digital image issues:
- "Challenges of Describing Images"
- "Controlling Your Language—Links to Metadata Vocabularies"
- "Getting Practical with Metadata"
- "Metadata Overview"
- "Metadata Standards and Interoperability"
- "Putting Things in Order: Links to Metadata Schemas and Related Standards"
TASI has also created new guides to assist users in identifying appropriate materials: